Add AI superpower to your Delphi & C++Builder apps part 3: multimodal LLM use

Sascha · 23 Май 2025

This is part 3 of our blog series on adding AI superpower to your Delphi & C++Builder apps. We already had the

Пожалуйста Авторизируйтесь или Зарегистрируйтесь для просмотра скрытого текста.

and the second article about

Пожалуйста Авторизируйтесь или Зарегистрируйтесь для просмотра скрытого текста.

. In these first two articles, we dealt with textual information. In this third installment, we shift to multimodal LLMs. That is LLMs with the capabilities to deal also with other information than "simple prompts". In other words, providing files as context for the LLMs that contain ngimages, video, audio, documents ...

Embracing Multimodal LLMs in Delphi: Describe, Compare, Extract, Summarize, Translate All in One

AI has quickly moved beyond just text generation. With the rise of multimodal large language models (LLMs), Delphi developers can now leverage image understanding, OCR, file summarization, and translation all with minimal code and maximum flexibility. And thanks to the TTMSFNCCloudAI component, switching between AI providers like OpenAI, Claude, Mistral, Gemini, DeepSeek, Ollama, Grok, or Perplexity becomes seamless.

Why Multimodal Matters

Traditional LLMs focused on text. Todays advanced models can process both text and images, enabling workflows such as:

Automatically describing image content
Performing OCR on photos or scanned documents
Comparing two pictures and identifying visual differences
Summarizing lengthy documents
Translating files between languages

All of these tasks are achievable with the same API structure, just by adjusting context instructions. And best of all, you remain in control of the backend AI servicewhether hosted or local.

A Unified Approach with TTMSFNCCloudAI

Heres how you use it:

1. Describe an Image

Whether its a scenic photo or a complex chart, supported AI models can return a natural language summary of whats in the image.Here is an example showing an amazing result, that it even detected a half readable bottle label and could correctly identify it as Jules Mumm champagne!

2. Compare Two Pictures

Ideal for visual regression tests, UI comparisons, or even spotting differences in scanned documents or maps. In our testing, the Claude LLM seemed to provide the most accurate and knowledgable answer.

3. Perform OCR (Optical Character Recognition)

Forget hard-coded OCR libraries just describe the task and let the LLM handle everything. Here the test performed was with a picture taken from the back of the

Пожалуйста Авторизируйтесь или Зарегистрируйтесь для просмотра скрытого текста.

(I had the honor to meet a few times back in Scotts Valley). Here credits go to OpenAI that was not only extremely accurate but was also smart enough to see the two column layout and properly put the text under each other. Up till the ISBN number of the book, everything is correct.

4. Summarize a Text File

Perfect for making sense of long reports, log files, or any dense document.

5. Translate Text

Build multilingual applications with just a few lines of Delphi code.

Abstracting the Complexity

One of the biggest strengths of TTMSFNCCloudAI is abstraction. You don't need to learn every provider's API or worry about changing your code when switching services. The interface stays the same. Just configure your model and endpoint.

This allows developers to:

Prototype with OpenAI, then move to Claude for privacy
Use local models with Ollama during development
Compare results from Gemini or Grok with just a config change

Vision Models Required

Note: Some providers require specific models that support image understanding. For example:

Ollama: Only models like llava or bakllava support vision
Grok and Mistral: Need to be paired with multimodal-capable backends
Claude, OpenAI (GPT-4o), and Gemini Pro Vision support image input natively

Always ensure the model you choose understands the data type you're sending.

A Future-Proof Way to Integrate AI

With TTMSFNCCloudAI, you're not locked into one vendor or use case. You build once, and switch as needed. The multimodal revolution is here, and Delphi developers now have a first-class way to participate.

Start experimenting. Start integrating. Start building smarter Delphi apps today.

Explore TTMSFNCCloudAI and redefine how your applications interact with the world.

In upcoming articles, well dive deeper into RAG, agents, MCP servers & clients.
If you have an active

Пожалуйста Авторизируйтесь или Зарегистрируйтесь для просмотра скрытого текста.

license, you can now get also access to the first test version of

Пожалуйста Авторизируйтесь или Зарегистрируйтесь для просмотра скрытого текста.

that uses the TTMSFNCCloudAI component but also has everything on board to let you build MCP servers and clients.
Register now to participate in this testing via this

Пожалуйста Авторизируйтесь или Зарегистрируйтесь для просмотра скрытого текста.

.

Пожалуйста Авторизируйтесь или Зарегистрируйтесь для просмотра скрытого текста.

Источник:

Пожалуйста Авторизируйтесь или Зарегистрируйтесь для просмотра скрытого текста.

Add AI superpower to your Delphi & C++Builder apps part 3: multimodal LLM use

Sascha

Заместитель Администратора