• Что бы вступить в ряды "Принятый кодер" Вам нужно:
    Написать 10 полезных сообщений или тем и Получить 10 симпатий.
    Для того кто не хочет терять время,может пожертвовать средства для поддержки сервеса, и вступить в ряды VIP на месяц, дополнительная информация в лс.

  • Пользаватели которые будут спамить, уходят в бан без предупреждения. Спам сообщения определяется администрацией и модератором.

  • Гость, Что бы Вы хотели увидеть на нашем Форуме? Изложить свои идеи и пожелания по улучшению форума Вы можете поделиться с нами здесь. ----> Перейдите сюда
  • Все пользователи не прошедшие проверку электронной почты будут заблокированы. Все вопросы с разблокировкой обращайтесь по адресу электронной почте : info@guardianelinks.com . Не пришло сообщение о проверке или о сбросе также сообщите нам.

Add AI superpower to your Delphi & C++Builder apps part 3: multimodal LLM use

Sascha Оффлайн

Sascha

Заместитель Администратора
Команда форума
Администратор
Регистрация
9 Май 2015
Сообщения
1,483
Баллы
155
TMS Software Delphi  Components


This is part 3 of our blog series on adding AI superpower to your Delphi & C++Builder apps. We already had the

Пожалуйста Авторизируйтесь или Зарегистрируйтесь для просмотра скрытого текста.

and the second article about

Пожалуйста Авторизируйтесь или Зарегистрируйтесь для просмотра скрытого текста.

. In these first two articles, we dealt with textual information. In this third installment, we shift to multimodal LLMs. That is LLMs with the capabilities to deal also with other information than "simple prompts". In other words, providing files as context for the LLMs that contain ngimages, video, audio, documents ...

Embracing Multimodal LLMs in Delphi: Describe, Compare, Extract, Summarize, Translate All in One


AI has quickly moved beyond just text generation. With the rise of multimodal large language models (LLMs), Delphi developers can now leverage image understanding, OCR, file summarization, and translation all with minimal code and maximum flexibility. And thanks to the TTMSFNCCloudAI component, switching between AI providers like OpenAI, Claude, Mistral, Gemini, DeepSeek, Ollama, Grok, or Perplexity becomes seamless.

TMS Software Delphi  Components


Why Multimodal Matters


Traditional LLMs focused on text. Todays advanced models can process both text and images, enabling workflows such as:


  • Automatically describing image content


  • Performing OCR on photos or scanned documents


  • Comparing two pictures and identifying visual differences


  • Summarizing lengthy documents


  • Translating files between languages

All of these tasks are achievable with the same API structure, just by adjusting context instructions. And best of all, you remain in control of the backend AI servicewhether hosted or local.

A Unified Approach with TTMSFNCCloudAI


Heres how you use it:

1. Describe an Image


Whether its a scenic photo or a complex chart, supported AI models can return a natural language summary of whats in the image.Here is an example showing an amazing result, that it even detected a half readable bottle label and could correctly identify it as Jules Mumm champagne!
TMS Software Delphi  Components


2. Compare Two Pictures


Ideal for visual regression tests, UI comparisons, or even spotting differences in scanned documents or maps. In our testing, the Claude LLM seemed to provide the most accurate and knowledgable answer.

TMS Software Delphi  Components


3. Perform OCR (Optical Character Recognition)


Forget hard-coded OCR libraries just describe the task and let the LLM handle everything. Here the test performed was with a picture taken from the back of the

Пожалуйста Авторизируйтесь или Зарегистрируйтесь для просмотра скрытого текста.

(I had the honor to meet a few times back in Scotts Valley). Here credits go to OpenAI that was not only extremely accurate but was also smart enough to see the two column layout and properly put the text under each other. Up till the ISBN number of the book, everything is correct.
TMS Software Delphi  Components


4. Summarize a Text File

Perfect for making sense of long reports, log files, or any dense document.

5. Translate Text


Build multilingual applications with just a few lines of Delphi code.

Abstracting the Complexity


One of the biggest strengths of TTMSFNCCloudAI is abstraction. You don't need to learn every provider's API or worry about changing your code when switching services. The interface stays the same. Just configure your model and endpoint.

This allows developers to:


  • Prototype with OpenAI, then move to Claude for privacy


  • Use local models with Ollama during development


  • Compare results from Gemini or Grok with just a config change
Vision Models Required


Note: Some providers require specific models that support image understanding. For example:


  • Ollama: Only models like llava or bakllava support vision


  • Grok and Mistral: Need to be paired with multimodal-capable backends


  • Claude, OpenAI (GPT-4o), and Gemini Pro Vision support image input natively

Always ensure the model you choose understands the data type you're sending.

A Future-Proof Way to Integrate AI


With TTMSFNCCloudAI, you're not locked into one vendor or use case. You build once, and switch as needed. The multimodal revolution is here, and Delphi developers now have a first-class way to participate.

Start experimenting. Start integrating. Start building smarter Delphi apps today.

Explore TTMSFNCCloudAI and redefine how your applications interact with the world.

In upcoming articles, well dive deeper into RAG, agents, MCP servers & clients.
If you have an active

Пожалуйста Авторизируйтесь или Зарегистрируйтесь для просмотра скрытого текста.

license, you can now get also access to the first test version of

Пожалуйста Авторизируйтесь или Зарегистрируйтесь для просмотра скрытого текста.

that uses the TTMSFNCCloudAI component but also has everything on board to let you build MCP servers and clients.
Register now to participate in this testing via this

Пожалуйста Авторизируйтесь или Зарегистрируйтесь для просмотра скрытого текста.

.


Пожалуйста Авторизируйтесь или Зарегистрируйтесь для просмотра скрытого текста.





Источник:

Пожалуйста Авторизируйтесь или Зарегистрируйтесь для просмотра скрытого текста.

 
Вверх Снизу