- Регистрация
- 1 Мар 2015
- Сообщения
- 1,481
- Баллы
- 155
Alright folks, settle in. We've been exploring the Vercel AI SDK v5 canary in this series, and today we're tackling a big one: Tools. If you're building any kind of agentic behavior, or just need your AI to interact with the outside world (or your own app's functions), this is where the rubber meets the road. v5 brings some serious structure and developer experience improvements to how tools are defined, called, and represented in the UI.
As a quick reminder, we're building on concepts from previous posts: UIMessage and UIMessagePart (the new message anatomy), v5's streaming capabilities, V2 Model Interfaces (especially LanguageModelV2FunctionTool), and the overall client/server architecture. If those are new to you, you might want to glance back.
?? A Note on Process & Curation: While I didn't personally write every word, this piece is a product of my dedicated curation. It's a new concept in content creation, where I've guided powerful AI tools (like Gemini Pro 2.5 for synthesis, git diff main vs canary v5 informed by extensive research including OpenAI's Deep Research, spent 10M+ tokens) to explore and articulate complex ideas. This method, inclusive of my fact-checking and refinement, aims to deliver depth and accuracy efficiently. I encourage you to see this as a potent blend of human oversight and AI capability. I use them for my own LLM chats on Thinkbuddy, and doing some make-ups and pushing to there too.
Let's get into how v5 makes tool calls first-class citizens.
1. Tool Invocation Lifecycle Recap: From AI Request to UI Update
Before we dive into the v5 specifics, let's quickly refresh the general lifecycle of an AI tool call, as this sets the stage for understanding the structured approach v5 brings.
Why this matters? (Context & Pain-Point)
Tool calling is a fundamental capability for making LLMs more than just text generators. They allow models to fetch real-time data, perform actions, or interact with other systems. Historically, managing this interaction – the request from the AI, executing the tool, getting the result back to the AI, and updating the UI – could be a bit loose, often involving custom logic and parsing.
How it generally works (Pre-v5 context):
v5 Enhancement Teaser:
This general flow remains, but what Vercel AI SDK v5 does is provide a much more structured, typed, and integrated way to represent and manage this entire lifecycle, especially within the chat UI (using ToolInvocationUIPart) and the data flow between client and server. It turns these steps into well-defined parts of your UIMessage objects and the v5 UI Message Stream. That's what we're about to unpack.
Take-aways / Migration Checklist Bullets
To empower your LLM with tools in Vercel AI SDK v5, you first need to define them on the server-side using the LanguageModelV2FunctionTool interface, with Zod playing a crucial role in schema definition and argument validation.
Why this matters? (Context & Pain-Point)
If an LLM is going to use a tool, it needs to know a few things: what the tool is called, what it does, and exactly what arguments it expects. Without a clear, machine-readable definition, the LLM might try to call tools that don't exist or pass arguments in the wrong format, leading to errors and frustration. As you probably know from v4 or other systems, getting this "contract" right is key. v5 standardizes this with LanguageModelV2FunctionTool and strongly encourages schema validation, most commonly with Zod.
How it’s solved in v5? (Step-by-step, Code, Diagrams)
When you're working with V2 model interfaces in AI SDK v5 (which you should be for new development), tools are defined using the LanguageModelV2FunctionTool interface. This structure is found in the @ai-sdk/provider package (specifically packages/provider/src/language-model/v2/language-model-v2.ts if you're digging in the monorepo).
Here's its TypeScript interface:
// From packages/provider/src/language-model/v2/language-model-v2.ts
export interface LanguageModelV2FunctionTool<NAME extends string, ARGS, RESULT> {
readonly type: 'function'; // Discriminator, always 'function' for this type
readonly function: {
readonly name: NAME; // The name the LLM will use to call the tool
readonly description?: string; // Natural language description for the LLM
readonly parameters: ZodSchema<ARGS>; // Typically a Zod schema for arguments
readonly execute?: (args: ARGS) => PromiseLike<RESULT>; // Optional server-side execution function
};
}
[FIGURE 1: Diagram showing the structure of LanguageModelV2FunctionTool and its fields]
Let's break down the key fields within function:
Example with Zod:
Let's define a server-side tool to get weather information. This example draws from concepts in <extra_details>, Section 8 ("Chatbot Tool Usage" API route) and the general "Tool Calling" examples in the V4 docs which illustrate Zod usage.
import { z } from 'zod';
// Make sure this import path is correct for your SDK version,
// it's usually from '@ai-sdk/provider' or a similar core package
import { LanguageModelV2FunctionTool } from '@ai-sdk/provider';
// 1. Define the Zod schema for the tool's arguments
const weatherParamsSchema = z.object({
city: z.string().describe("The city for which to get the weather, e.g., 'San Francisco', 'Tokyo'"),
unit: z.enum(['celsius', 'fahrenheit'])
.optional() // Make unit optional
.default('celsius') // Default to celsius if not provided
.describe("The temperature unit, either 'celsius' or 'fahrenheit'."),
});
// Infer the TypeScript type for ARGS from the Zod schema
type WeatherParams = z.infer<typeof weatherParamsSchema>;
// Define the V2 tool
const getWeatherTool: LanguageModelV2FunctionTool<
'getWeatherInformation', // Tool name (must be a string literal type)
WeatherParams, // Inferred ARGS type
string // RESULT type (we'll return the weather report as a string)
> = {
type: 'function',
function: {
name: 'getWeatherInformation',
description: 'Fetches the current weather conditions for a specified city. Includes temperature and a brief forecast.',
parameters: weatherParamsSchema, // Our Zod schema
execute: async ({ city, unit }) => { // args are automatically typed as WeatherParams
console.log(`Executing getWeatherInformation for city: ${city}, unit: ${unit}`);
// In a real application, you'd call an actual weather API here.
// const report = await fetchWeatherFromAPI(city, unit);
// For this example, we'll simulate:
const simulatedTemperature = Math.floor(Math.random() * 25) + 5; // Temp between 5 and 29
const conditions = ['sunny', 'partly cloudy', 'cloudy with chance of rain', 'windy'];
const simulatedCondition = conditions[Math.floor(Math.random() * conditions.length)];
// The string returned here is the RESULT
return `The weather in ${city} is currently ${simulatedTemperature}°${unit === 'celsius' ? 'C' : 'F'} and ${simulatedCondition}.`;
},
},
};
// How you might use this tool in a server-side API route with streamText:
// (Assuming `modelMessages` is your conversation history in ModelMessage[] format)
//
// const result = await streamText({
// model: openai('gpt-4o-mini'), // Your V2 model instance
// messages: modelMessages,
// tools: {
// getWeatherInformation: getWeatherTool // Register the tool with the SDK
// },
// toolChoice: 'auto' // Let the model decide when to use tools
// });
//
// // Then, result.toUIMessageStreamResponse() would handle streaming to client
LLM Hints via .describe():
Notice the use of .describe() on the Zod schema properties (city and unit). This is a great practice. The AI SDK (or the underlying model provider's integration) often uses these descriptions to provide more context to the LLM about what each parameter means, helping it generate more accurate arguments. This technique was also recommended in the V4 Prompt Engineering docs for tools.
By defining tools this way, you give the AI SDK and the LLM a clear, validated, and executable contract for extending the AI's capabilities. The automatic validation via Zod is a significant step up in building robust and secure tool-using AI applications.
Take-aways / Migration Checklist Bullets
When an LLM calls a server-defined tool that includes an execute function, Vercel AI SDK v5 orchestrates a seamless flow: validating arguments, running your function, and then automatically feeding the results back to the LLM for further processing, all while keeping the client UI informed via streamed updates.
Why this matters? (Context & Pain-Point)
Imagine the alternative: the LLM says "call tool X," your server gets this request, and then you have to manually parse the arguments, call your tool's code, format the result, construct a new message for the LLM, and send it back. That's a lot of boilerplate! For tools that can be executed entirely on the server without direct user interaction for that step, v5's automated flow is a huge time-saver and reduces potential error points.
How it’s solved in v5? (Step-by-step, Code, Diagrams)
Let's consider the scenario where you've called streamText (or a similar V2 core function) on your server and provided a LanguageModelV2FunctionTool (like our getWeatherTool from Section 2) that has an execute method defined.
Here's the typical flow:
Simultaneously, as this server-side execution is happening, the result.toUIMessageStreamResponse() method (which you call in your API route after streamText) is hard at work streaming updates to the client. This ensures the UI can reflect what the AI is doing:
[FIGURE 3: Diagram showing UIMessageStreamParts ('tool-call', 'tool-result') flowing to the client, updating a ToolInvocationUIPart]
3.2 Injecting results for model follow-up (Automatic if maxSteps > 1)
As mentioned in step 4, if streamText is configured for multi-step interactions (e.g., maxSteps: 5), the SDK doesn't just stop after getting a tool result.
This seamless server-side loop, combined with rich UI updates, is a cornerstone of building powerful, tool-using agents with AI SDK v5 when tools don't require direct client-side interaction for their execution.
Take-aways / Migration Checklist Bullets
Sometimes, a tool needs to be executed directly in the user's browser—to access browser APIs like geolocation, interact with a browser extension, or simply to ask the user for a confirmation via window.confirm. Vercel AI SDK v5's useChat hook facilitates this through its onToolCall prop.
Why this matters? (Context & Pain-Point)
Not all tools make sense to run on the server. If the AI needs to know the user's current location, asking the server to figure that out (e.g., via IP geolocation) is often less accurate or less privacy-preserving than using the browser's navigator.geolocation API. Similarly, if the AI suggests an action that requires explicit user confirmation ("Book a flight to Paris?"), that confirmation dialog (window.confirm or a custom modal) must happen on the client. In V4, handling these client-side "tool effects" could be a bit manual.
How it’s solved in v5? (Step-by-step, Code, Diagrams)
The primary mechanism for this in v5 is the onToolCall callback prop you can provide to the useChat hook.
Scenario:
The LLM, running on the server via streamText, decides to call a tool.
In both these cases, if there's no server-side execute to intercept the call, streamText().toUIMessageStreamResponse() will stream a 'tool-call' UIMessageStreamPart to the client. On the client, useChat will process this, resulting in a ToolInvocationUIPart being added to the assistant's UIMessage, with its state set to 'call'.
Now, your client-side onToolCall function comes into play. This is a prop you pass when setting up useChat:
// Client-side React component using useChat
import { useChat, UIMessage, ToolCall } from '@ai-sdk/react'; // Ensure ToolCall is imported if needed directly
// ... other imports
function MyChatComponent() {
const { messages, input, handleInputChange, handleSubmit, addToolResult } = useChat({
// ... other useChat options (api, id, etc.)
// The onToolCall handler!
onToolCall: async ({ toolCall }: { toolCall: ToolCall }) => {
// toolCall is an object of type ToolCall (from @ai-sdk/core/tool or similar path)
// It contains:
// - toolCallId: string (unique ID for this specific call attempt)
// - toolName: string (the name of the tool the AI wants to use)
// - args: any (the parsed arguments provided by the AI for the tool)
console.log('Client onToolCall invoked:', toolCall);
if (toolCall.toolName === 'getClientGeolocation') {
try {
// Accessing a browser API
const position = await new Promise<GeolocationPosition>((resolve, reject) =>
navigator.geolocation.getCurrentPosition(resolve, reject, {
timeout: 10000, // 10 second timeout
})
);
// IMPORTANT: Return an object with the original toolCallId and the result
return {
toolCallId: toolCall.toolCallId,
result: {
latitude: position.coords.latitude,
longitude: position.coords.longitude,
accuracy: position.coords.accuracy,
}
};
} catch (error: any) {
console.error('Error getting geolocation:', error);
return {
toolCallId: toolCall.toolCallId,
error: error.message || 'Failed to get geolocation.'
};
}
} else if (toolCall.toolName === 'askUserConfirmation') {
// Example: A tool that requires user interaction via window.confirm
// The 'args' would typically contain the message to confirm.
const messageToConfirm = (toolCall.args as { message: string }).message || "Are you sure?";
const userConfirmed = window.confirm(messageToConfirm);
return {
toolCallId: toolCall.toolCallId,
result: { confirmed: userConfirmed }, // Send back whether the user confirmed
};
}
// If this onToolCall doesn't handle the tool, you can return nothing (or a specific error)
// The SDK might have other mechanisms or this might signal an unhandled tool.
// For this example, let's return an error for unhandled tools.
return {
toolCallId: toolCall.toolCallId,
error: `Tool '${toolCall.toolName}' not implemented or handled on the client.`
};
}
});
// ... rest of your component (rendering messages, input form, etc.)
// The UI would render the ToolInvocationUIPart in its 'call' state initially,
// then it would update to 'result' or 'error' after onToolCall completes.
return ( /* ... JSX ... */ );
}
[FIGURE 4: Diagram showing client-side onToolCall flow: Server streams 'tool-call' -> Client useChat -> onToolCall executes -> Result updates ToolInvocationUIPart -> (Optional) Resubmit to server]
What onToolCall Receives:
As shown, it receives an object, and the key property is toolCall. This toolCall object (type ToolCall from @ai-sdk/core/tool or a similar core path) gives you:
What onToolCall Must Return:
It must return a Promise that resolves to an object with the following structure:
{ toolCallId: string; result?: any; error?: string; }
SDK Action After onToolCall Completes:
Automatic Resubmission to Server:
This is a key part of the loop. If:
The server then runs streamText again, the LLM gets the client-side tool results, and the conversation continues. This creates a seamless flow even for tools that need browser-specific execution.
4.1 Browser APIs (geolocation example)
The getClientGeolocation example above is a prime use case. navigator.geolocation.getCurrentPosition() is an async browser API. onToolCall allows you to await its result and feed it back into the AI conversation loop.
4.2 UX patterns for confirmation dialogs (using addToolResult for manual submission)
Sometimes, a tool call isn't meant to be automatically executed by onToolCall. Instead, the AI requests something that first needs explicit user interaction and confirmation after the AI has made its request. For example: "AI wants to book a flight to Paris for $500. [Confirm] [Cancel]".
Flow:
* If the user clicks "Cancel", you'd call `addToolResult({ toolCallId, result: "User cancelled booking." })` or perhaps `addToolResult({ toolCallId, error: "User declined." })`.
This addToolResult pattern is perfect for tools where the "execution" is actually a user making a choice in the UI after the AI has prompted for it via a tool call. It decouples the AI's request from the user's asynchronous response. The example from <extra_details>, Section 8 (client-side page showing confirmation buttons) illustrates this interactive pattern.
Take-aways / Migration Checklist Bullets
Tool calls, like any external interaction, can fail. Vercel AI SDK v5 provides mechanisms for these errors to propagate through the system and offers strategies for recovery, ensuring your application can handle hiccups gracefully.
Why this matters? (Context & Pain-Point)
When an AI tries to use a tool, things can go wrong at multiple stages:
How it’s solved in v5? (Step-by-step, Code, Diagrams)
v5 surfaces tool-related errors through specific error types and by updating the state of the ToolInvocationUIPart.
* **Client-Side Tools:** For tools handled by `onToolCall` on the client, if you want similar pre-validation, you'd typically perform Zod parsing within your `onToolCall` handler itself and return an error if it fails.
[FIGURE 5: Flowchart showing different points where tool errors can occur and how they propagate to the UI]
Recovery Strategies:
Once an error related to a tool call occurs and is displayed in the UI, how can the user or the system recover?
Take-aways / Migration Checklist Bullets
When integrating tools with LLMs, security is paramount. Always validate arguments provided by the LLM before tool execution and sanitize results from tools before displaying them or feeding them back to the LLM. Vercel AI SDK v5's emphasis on Zod schemas for tool parameters is a key enabler for input validation.
Why this matters? (Context & Pain-Point)
This section deserves CRUCIAL EMPHASIS. LLMs generate text, and that text can sometimes be unpredictable or, if influenced by malicious user input (prompt injection), actively harmful. If an LLM calls your tools with unchecked arguments, or if your tools return data from untrusted sources that you then render or process without care, you open yourself up to serious security vulnerabilities.
Think about it:
These aren't just theoretical; they are real risks when bridging generative AI with functional code and external data. As the <context>, Section 10.3, rightly points out, preventing prompt injection that leads to harmful tool arguments is vital.
How it’s solved in v5? (Practices & SDK Features)
Vercel AI SDK v5, particularly with its V2 LanguageModelV2FunctionTool definitions, provides strong mechanisms and encourages practices to mitigate these risks.
6.1 Validating LLM-Generated Arguments (Input Validation for Tools)
The flow of data isn't just one way. Your tools produce results, and these results also need scrutiny.
[FIGURE 6: Diagram illustrating the two-way validation: LLM args -> Tool (Input Validation) and Tool result -> UI/LLM (Output Validation/Sanitization)]
Security is not an afterthought; it's a continuous process. The Vercel AI SDK v5 provides better primitives for this, but the responsibility for secure implementation rests with you, the developer.
Take-aways / Migration Checklist Bullets
Vercel AI SDK v5 excels at facilitating multi-step conversational flows where the AI might call a tool, get a result, then call another tool or generate text, all within a single user turn. This is powered by the maxSteps option on both the server (streamText) and client (useChat), along with the structured streaming of tool interactions.
Why this matters? (Context & Pain-Point)
Simple Q&A is one thing, but true conversational agents often need to perform a sequence of actions or reasoning steps. For example: "What's the weather in the capital of France, and can you then find me a good bistro there that's open now?" This requires:
Manually orchestrating such chains can be complex, involving careful state management and multiple back-and-forth calls. v5 aims to simplify this.
How it’s solved in v5? (Step-by-step, Code, Diagrams)
7.1 maxSteps in streamText (Server-Side Multi-Step)
This feature, recapped from <extra_details>, Section 8 (Chatbot Tool Usage - Server-side Multi-Step Calls) and the V4 docs (Tool Calling - Multi-Step Calls), is key for server-driven tool chains.
[FIGURE 7: Diagram of server-side multi-step flow with maxSteps: LLM -> ToolA -> ResultA -> LLM -> ToolB -> ResultB -> LLM -> Final Text. All streamed to client.]
7.2 maxSteps in useChat (Client-Involved Multi-Step)
The useChat hook on the client also has a maxSteps option (mentioned in <extra_details>, Section 8's client-side example). This controls how many "rounds" of interaction useChat will automatically manage when client-side tools are involved. A "round" here typically means: User sends message -> Server (LLM) -> Client (tool call handled) -> Server (LLM with tool result) -> Client (final response or another tool call).
Flow with Client-Side Tools (e.g., handled by onToolCall):
This client-side maxSteps enables building conversational agents that can seamlessly weave in browser-based actions or user confirmations without you having to manually code all the resubmission logic.
7.3 StepStartUIPart for UI Delineation
When streamText().toUIMessageStreamResponse() processes these multi-step server-side tool calls (or even complex client-involved chains), it might automatically insert 'step-start' UIMessageStreamParts into the v5 UI Message Stream being sent to the client.
By combining maxSteps on both server and client with the structured streaming of UIMessageParts (including ToolInvocationUIPart and StepStartUIPart), Vercel AI SDK v5 provides a powerful toolkit for building sophisticated, chained conversational interactions.
Take-aways / Migration Checklist Bullets
To illustrate the power of v5's structured tool handling and multi-step chains, let's imagine building a conceptual calendar-booking assistant. This example will highlight how server-side tools, client-side interactions, and UI updates come together.
Why this matters? (Context & Pain-Point)
Booking a meeting often involves multiple back-and-forth steps: checking availability, presenting options, getting confirmation, and then finalizing. Trying to model this with a simple request-response LLM call is difficult. We need the AI to guide the user through a process, using tools at each stage. This is a perfect showcase for v5's capabilities.
The Scenario:
A user wants to book a meeting: "Book a 1-hour meeting with Jane for next Tuesday afternoon."
Conceptual Tools We'll Define:
Walkthrough of the Interaction (v5 Features Highlighted):
Let's assume useChat and streamText are configured with maxSteps: 5 to allow these chained interactions.
v5 Features Highlighted in this Showcase:
This kind of complex, multi-turn, tool-using interaction is precisely what Vercel AI SDK v5 is designed to simplify and make robust.
Take-aways / Migration Checklist Bullets
Wrapping up our deep dive into Vercel AI SDK v5's tool capabilities, let's consolidate the main advantages and best practices that emerge from this new, more structured approach.
Why this matters? (Context & Pain-Point)
Working with AI tools can quickly become complex. Vercel AI SDK v5 brings a significant level of organization and power to this domain. Understanding these core principles will help you build more robust, maintainable, and user-friendly tool-using AI applications.
Actionable Takeaways & Best Practices:
Teasing Post 9: Persisting Rich Chat Histories
With tool calls now behaving as first-class citizens within our rich UIMessages, and our AI assistants capable of complex, multi-step interactions, a critical question arises: how do we reliably save and restore these intricate conversations?
Post 9 in our "Inside Vercel AI SDK 5" series will explore just that: "Persisting Rich UIMessage Histories: The v5 'Persist Once, Render Anywhere' Model." We'll dive into strategies for database schema design, best practices for saving UIMessage arrays with all their parts and metadata, and how v5 facilitates high-fidelity restoration of these complex chat states. Stay tuned!
As a quick reminder, we're building on concepts from previous posts: UIMessage and UIMessagePart (the new message anatomy), v5's streaming capabilities, V2 Model Interfaces (especially LanguageModelV2FunctionTool), and the overall client/server architecture. If those are new to you, you might want to glance back.
?? A Note on Process & Curation: While I didn't personally write every word, this piece is a product of my dedicated curation. It's a new concept in content creation, where I've guided powerful AI tools (like Gemini Pro 2.5 for synthesis, git diff main vs canary v5 informed by extensive research including OpenAI's Deep Research, spent 10M+ tokens) to explore and articulate complex ideas. This method, inclusive of my fact-checking and refinement, aims to deliver depth and accuracy efficiently. I encourage you to see this as a potent blend of human oversight and AI capability. I use them for my own LLM chats on Thinkbuddy, and doing some make-ups and pushing to there too.
Let's get into how v5 makes tool calls first-class citizens.
1. Tool Invocation Lifecycle Recap: From AI Request to UI Update
Before we dive into the v5 specifics, let's quickly refresh the general lifecycle of an AI tool call, as this sets the stage for understanding the structured approach v5 brings.
Why this matters? (Context & Pain-Point)
Tool calling is a fundamental capability for making LLMs more than just text generators. They allow models to fetch real-time data, perform actions, or interact with other systems. Historically, managing this interaction – the request from the AI, executing the tool, getting the result back to the AI, and updating the UI – could be a bit loose, often involving custom logic and parsing.
How it generally works (Pre-v5 context):
- AI Decides to Use a Tool: It all starts with the LLM. After processing the user's input and the conversation history, the model determines that it can't answer directly or perform the requested action without external help. It decides it needs to call a specific function or tool you've made available to it.
- AI Specifies Tool and Arguments: The LLM then generates a "tool call" request. This isn't just a vague wish; it specifies the exact toolName it wants to use and the args (arguments) for that tool, structured as the tool expects (often as JSON).
- Application Executes Tool: Your application (this could be server-side logic or, in some cases, client-side code) receives this tool call request. It then executes the named tool, passing in the arguments provided by the LLM.
- Result Fed Back to AI: The output from the tool execution – whether it's the data fetched, a confirmation of an action, or even an error if something went wrong – is then packaged up and sent back to the LLM.
- AI Generates Final Response: The LLM takes this tool result, incorporates it into its understanding, and then formulates its final response to the user. This response often explains what it found or did using the tool.
v5 Enhancement Teaser:
This general flow remains, but what Vercel AI SDK v5 does is provide a much more structured, typed, and integrated way to represent and manage this entire lifecycle, especially within the chat UI (using ToolInvocationUIPart) and the data flow between client and server. It turns these steps into well-defined parts of your UIMessage objects and the v5 UI Message Stream. That's what we're about to unpack.
Take-aways / Migration Checklist Bullets
- Tool calling involves the AI requesting, your app executing, and the AI using the result.
- v5 brings enhanced structure to this lifecycle, especially for UI and data flow.
To empower your LLM with tools in Vercel AI SDK v5, you first need to define them on the server-side using the LanguageModelV2FunctionTool interface, with Zod playing a crucial role in schema definition and argument validation.
Why this matters? (Context & Pain-Point)
If an LLM is going to use a tool, it needs to know a few things: what the tool is called, what it does, and exactly what arguments it expects. Without a clear, machine-readable definition, the LLM might try to call tools that don't exist or pass arguments in the wrong format, leading to errors and frustration. As you probably know from v4 or other systems, getting this "contract" right is key. v5 standardizes this with LanguageModelV2FunctionTool and strongly encourages schema validation, most commonly with Zod.
How it’s solved in v5? (Step-by-step, Code, Diagrams)
When you're working with V2 model interfaces in AI SDK v5 (which you should be for new development), tools are defined using the LanguageModelV2FunctionTool interface. This structure is found in the @ai-sdk/provider package (specifically packages/provider/src/language-model/v2/language-model-v2.ts if you're digging in the monorepo).
Here's its TypeScript interface:
// From packages/provider/src/language-model/v2/language-model-v2.ts
export interface LanguageModelV2FunctionTool<NAME extends string, ARGS, RESULT> {
readonly type: 'function'; // Discriminator, always 'function' for this type
readonly function: {
readonly name: NAME; // The name the LLM will use to call the tool
readonly description?: string; // Natural language description for the LLM
readonly parameters: ZodSchema<ARGS>; // Typically a Zod schema for arguments
readonly execute?: (args: ARGS) => PromiseLike<RESULT>; // Optional server-side execution function
};
}
[FIGURE 1: Diagram showing the structure of LanguageModelV2FunctionTool and its fields]
Let's break down the key fields within function:
- name: NAME (string): This is the unique name your LLM will use to refer to and call this tool. Choose something descriptive and unambiguous.
- description?: string: This is super important for the LLM. It's a natural language description of what the tool does, when it should be used, and what kind of information it returns. Writing good descriptions is a bit of an art – think of it as prompt engineering for the LLM's tool-choosing capabilities. The V4 docs had some good tips on "Prompts for Tools" in the Prompt Engineering section, and those principles still apply. The better the description, the more likely the LLM will use the tool correctly.
- parameters: ZodSchema<ARGS>: This is arguably the most critical part for robustness and security. You define a Zod schema that describes the expected structure and types of the arguments (ARGS) for your tool.
- Why Zod? While the type annotation says ZodSchema, the underlying system often converts this to a JSON Schema to inform the LLM about the required arguments and their format. Zod is excellent for this because it provides both static typing for your ARGS in TypeScript and runtime validation.
- Automatic Validation: When the LLM decides to call this tool and provides arguments, the AI SDK (if the tool has a server-side execute function, or sometimes even before passing to client-side handlers) will use this Zod schema to validate the arguments before your execute function is ever called. This is a massive win for security (preventing malformed or malicious arguments from the LLM from hitting your tool logic directly) and for robustness (catching errors early). We'll talk more about security in Section 6. This validation was highlighted as a key feature for ToolInvocationUIPart in <extra_details> Section 4 and touched upon for V2 models in Post 3, Section 4 of this series.
- execute?: (args: ARGS) => PromiseLike<RESULT>: This is an optional server-side function.
- If you provide an execute function, the AI SDK can automatically run this tool when the LLM calls it. The function receives the validated args (typed according to your Zod schema) and should return a Promise that resolves to the RESULT of the tool's operation.
- If you don't provide execute (or if the tool is specifically meant for client-side execution, which we'll cover in Section 4), the tool call information (name and args) is streamed down to the client for handling there.
Example with Zod:
Let's define a server-side tool to get weather information. This example draws from concepts in <extra_details>, Section 8 ("Chatbot Tool Usage" API route) and the general "Tool Calling" examples in the V4 docs which illustrate Zod usage.
import { z } from 'zod';
// Make sure this import path is correct for your SDK version,
// it's usually from '@ai-sdk/provider' or a similar core package
import { LanguageModelV2FunctionTool } from '@ai-sdk/provider';
// 1. Define the Zod schema for the tool's arguments
const weatherParamsSchema = z.object({
city: z.string().describe("The city for which to get the weather, e.g., 'San Francisco', 'Tokyo'"),
unit: z.enum(['celsius', 'fahrenheit'])
.optional() // Make unit optional
.default('celsius') // Default to celsius if not provided
.describe("The temperature unit, either 'celsius' or 'fahrenheit'."),
});
// Infer the TypeScript type for ARGS from the Zod schema
type WeatherParams = z.infer<typeof weatherParamsSchema>;
// Define the V2 tool
const getWeatherTool: LanguageModelV2FunctionTool<
'getWeatherInformation', // Tool name (must be a string literal type)
WeatherParams, // Inferred ARGS type
string // RESULT type (we'll return the weather report as a string)
> = {
type: 'function',
function: {
name: 'getWeatherInformation',
description: 'Fetches the current weather conditions for a specified city. Includes temperature and a brief forecast.',
parameters: weatherParamsSchema, // Our Zod schema
execute: async ({ city, unit }) => { // args are automatically typed as WeatherParams
console.log(`Executing getWeatherInformation for city: ${city}, unit: ${unit}`);
// In a real application, you'd call an actual weather API here.
// const report = await fetchWeatherFromAPI(city, unit);
// For this example, we'll simulate:
const simulatedTemperature = Math.floor(Math.random() * 25) + 5; // Temp between 5 and 29
const conditions = ['sunny', 'partly cloudy', 'cloudy with chance of rain', 'windy'];
const simulatedCondition = conditions[Math.floor(Math.random() * conditions.length)];
// The string returned here is the RESULT
return `The weather in ${city} is currently ${simulatedTemperature}°${unit === 'celsius' ? 'C' : 'F'} and ${simulatedCondition}.`;
},
},
};
// How you might use this tool in a server-side API route with streamText:
// (Assuming `modelMessages` is your conversation history in ModelMessage[] format)
//
// const result = await streamText({
// model: openai('gpt-4o-mini'), // Your V2 model instance
// messages: modelMessages,
// tools: {
// getWeatherInformation: getWeatherTool // Register the tool with the SDK
// },
// toolChoice: 'auto' // Let the model decide when to use tools
// });
//
// // Then, result.toUIMessageStreamResponse() would handle streaming to client
LLM Hints via .describe():
Notice the use of .describe() on the Zod schema properties (city and unit). This is a great practice. The AI SDK (or the underlying model provider's integration) often uses these descriptions to provide more context to the LLM about what each parameter means, helping it generate more accurate arguments. This technique was also recommended in the V4 Prompt Engineering docs for tools.
By defining tools this way, you give the AI SDK and the LLM a clear, validated, and executable contract for extending the AI's capabilities. The automatic validation via Zod is a significant step up in building robust and secure tool-using AI applications.
Take-aways / Migration Checklist Bullets
- Define server-side tools using the LanguageModelV2FunctionTool interface from @ai-sdk/provider.
- Use Zod schemas for function.parameters to define argument structure and get automatic validation.
- The function.execute method is optional; if provided, the SDK can run the tool on the server.
- Write clear function.description fields for the LLM.
- Use .describe() on Zod schema properties to give hints to the LLM.
- This structure is for V2 model interfaces. If migrating from V4, ensure your tool definitions are updated.
When an LLM calls a server-defined tool that includes an execute function, Vercel AI SDK v5 orchestrates a seamless flow: validating arguments, running your function, and then automatically feeding the results back to the LLM for further processing, all while keeping the client UI informed via streamed updates.
Why this matters? (Context & Pain-Point)
Imagine the alternative: the LLM says "call tool X," your server gets this request, and then you have to manually parse the arguments, call your tool's code, format the result, construct a new message for the LLM, and send it back. That's a lot of boilerplate! For tools that can be executed entirely on the server without direct user interaction for that step, v5's automated flow is a huge time-saver and reduces potential error points.
How it’s solved in v5? (Step-by-step, Code, Diagrams)
Let's consider the scenario where you've called streamText (or a similar V2 core function) on your server and provided a LanguageModelV2FunctionTool (like our getWeatherTool from Section 2) that has an execute method defined.
Here's the typical flow:
LLM Decides to Call the Tool: Based on the user's prompt and conversation history, the LLM (e.g., GPT-4o-mini) decides to use your defined tool (e.g., getWeatherInformation). It generates the tool call request, including the toolName and the args it thinks are appropriate. In OpenAI's terminology, this often comes through as tool_calls in the LLM's response.
V2 Provider Adapter Parses the Request: The Vercel AI SDK's provider adapter (e.g., @ai-sdk/openai) receives the raw response from the LLM API and parses out this tool call information.
AI SDK Core Logic Takes Over (within streamText or similar):
- Identifies Tool: The SDK core logic identifies that the LLM wants to call getWeatherInformation.
- Validates Arguments: This is crucial. The SDK looks up the getWeatherTool definition you provided in the tools option. It takes the parameters Zod schema (e.g., weatherParamsSchema) from your tool definition and uses it to validate the args supplied by the LLM.
- If validation fails (e.g., LLM provided a string for a number, or missed a required field not caught by optional().default()), an error is typically generated. The V4 docs mentioned errors like InvalidToolArgumentsError (from <ai_sdk_docs_v4>, Errors section), and similar structured errors are expected in v5. This prevents your execute function from receiving garbage data.
- Calls execute(args): If argument validation passes, the SDK calls your tool's execute method. The args passed to your function are now typed (thanks to z.infer) and validated. For our example, it would be getWeatherTool.function.execute({ city: "London", unit: "celsius" }).
- Awaits Result: Your execute function returns a Promise. The SDK awaits this promise to get the tool's result (e.g., "The weather in London is currently 15°C and cloudy.").
[FIGURE 2: Sequence diagram of server-side tool execution: LLM -> SDK -> Validate Args -> Execute Tool -> Get Result -> SDK]
Result Sent Back to LLM (for multi-step interaction):
- The SDK constructs a new ModelMessage with role: 'tool'. This message contains one or more LanguageModelV2ToolResultPart(s). Each part includes the toolCallId (an ID the LLM generated for its specific call attempt, crucial for matching results to requests if there are multiple parallel tool calls), the toolName, and the result obtained from your execute function.
- If your streamText call is configured for multi-step interactions (e.g., maxSteps > 1 as mentioned in <extra_details>, Section 8 and ai_sdk_docs_v4, Tool Calling, or if toolChoice forces further interaction), the SDK automatically sends this new tool message back to the LLM as part of the ongoing streamText operation.
- The LLM then processes this tool result and generates its next response, which could be the final text answer for the user, or even another tool call if needed. This automated "loop" is a key benefit.
Simultaneously, as this server-side execution is happening, the result.toUIMessageStreamResponse() method (which you call in your API route after streamText) is hard at work streaming updates to the client. This ensures the UI can reflect what the AI is doing:
Streaming the Tool Call Intention:
- A 'tool-call-delta' UIMessageStreamPart might be streamed first if toolCallStreaming is enabled in streamText options (as per <ai_sdk_docs_v4> Chatbot Tool Usage). This allows the UI to show the arguments being "typed out" by the AI.
- This is followed by a 'tool-call' UIMessageStreamPart. This part contains the toolCallId, toolName, and the complete (stringified JSON) args.
- On the client, useChat (via processUIMessageStream) uses these stream parts to create or update a ToolInvocationUIPart within the assistant's UIMessage. This part will initially have state: 'call' (or 'partial-call' then 'call').
- UI Indication: This allows your UI to display something like: "AI is using tool: getWeatherInformation with arguments: { city: 'London', unit: 'celsius' }..." or a loading spinner for that tool.
Streaming the Tool Result:
- After your server-side execute function completes and returns its result, toUIMessageStreamResponse() streams a 'tool-result' UIMessageStreamPart. This part includes the toolCallId, toolName, and the (stringified JSON) result from your tool.
- Client-side, processUIMessageStream uses this to update the same ToolInvocationUIPart (identified by toolCallId), transitioning its state from 'call' to 'result' and populating it with the actual result data.
- UI Indication: The UI can now update to show the outcome: "Tool getWeatherInformation result: The weather in London is currently 15°C and cloudy."
[FIGURE 3: Diagram showing UIMessageStreamParts ('tool-call', 'tool-result') flowing to the client, updating a ToolInvocationUIPart]
3.2 Injecting results for model follow-up (Automatic if maxSteps > 1)
As mentioned in step 4, if streamText is configured for multi-step interactions (e.g., maxSteps: 5), the SDK doesn't just stop after getting a tool result.
- It automatically appends the tool role ModelMessage (containing the tool's output) to the internal conversation history it maintains for the current streamText operation.
- It then makes another call to the LLM, providing this augmented history.
- The LLM now has the tool's output and can generate its next response. This could be:
- The final text answer for the user.
- A call to another tool.
- More reasoning text.
- This entire loop (LLM -> tool call -> execute -> tool result -> LLM -> text/tool call) continues until the LLM generates a final text response without further tool calls, or the maxSteps limit is reached.
- All these intermediate steps (further tool calls, tool results, reasoning text) are also streamed to the client as appropriate UIMessageStreamParts, building up the rich UIMessage on the client.
This seamless server-side loop, combined with rich UI updates, is a cornerstone of building powerful, tool-using agents with AI SDK v5 when tools don't require direct client-side interaction for their execution.
Take-aways / Migration Checklist Bullets
- If a server-side tool has an execute function, streamText can automatically validate args and run it.
- Argument validation uses the Zod schema from LanguageModelV2FunctionTool.parameters.
- The SDK streams 'tool-call' and 'tool-result' UIMessageStreamParts to the client, updating the ToolInvocationUIPart.
- If maxSteps > 1 (or other conditions), the SDK automatically sends tool results back to the LLM for continued processing.
- This provides a seamless server-side execution loop for many common tool use cases.
Sometimes, a tool needs to be executed directly in the user's browser—to access browser APIs like geolocation, interact with a browser extension, or simply to ask the user for a confirmation via window.confirm. Vercel AI SDK v5's useChat hook facilitates this through its onToolCall prop.
Why this matters? (Context & Pain-Point)
Not all tools make sense to run on the server. If the AI needs to know the user's current location, asking the server to figure that out (e.g., via IP geolocation) is often less accurate or less privacy-preserving than using the browser's navigator.geolocation API. Similarly, if the AI suggests an action that requires explicit user confirmation ("Book a flight to Paris?"), that confirmation dialog (window.confirm or a custom modal) must happen on the client. In V4, handling these client-side "tool effects" could be a bit manual.
How it’s solved in v5? (Step-by-step, Code, Diagrams)
The primary mechanism for this in v5 is the onToolCall callback prop you can provide to the useChat hook.
Scenario:
The LLM, running on the server via streamText, decides to call a tool.
- Case 1: This tool was defined in the tools option on the server without an execute function.
- Case 2: The LLM calls a tool name that wasn't defined on the server at all, but you intend to handle it purely on the client.
In both these cases, if there's no server-side execute to intercept the call, streamText().toUIMessageStreamResponse() will stream a 'tool-call' UIMessageStreamPart to the client. On the client, useChat will process this, resulting in a ToolInvocationUIPart being added to the assistant's UIMessage, with its state set to 'call'.
Now, your client-side onToolCall function comes into play. This is a prop you pass when setting up useChat:
// Client-side React component using useChat
import { useChat, UIMessage, ToolCall } from '@ai-sdk/react'; // Ensure ToolCall is imported if needed directly
// ... other imports
function MyChatComponent() {
const { messages, input, handleInputChange, handleSubmit, addToolResult } = useChat({
// ... other useChat options (api, id, etc.)
// The onToolCall handler!
onToolCall: async ({ toolCall }: { toolCall: ToolCall }) => {
// toolCall is an object of type ToolCall (from @ai-sdk/core/tool or similar path)
// It contains:
// - toolCallId: string (unique ID for this specific call attempt)
// - toolName: string (the name of the tool the AI wants to use)
// - args: any (the parsed arguments provided by the AI for the tool)
console.log('Client onToolCall invoked:', toolCall);
if (toolCall.toolName === 'getClientGeolocation') {
try {
// Accessing a browser API
const position = await new Promise<GeolocationPosition>((resolve, reject) =>
navigator.geolocation.getCurrentPosition(resolve, reject, {
timeout: 10000, // 10 second timeout
})
);
// IMPORTANT: Return an object with the original toolCallId and the result
return {
toolCallId: toolCall.toolCallId,
result: {
latitude: position.coords.latitude,
longitude: position.coords.longitude,
accuracy: position.coords.accuracy,
}
};
} catch (error: any) {
console.error('Error getting geolocation:', error);
return {
toolCallId: toolCall.toolCallId,
error: error.message || 'Failed to get geolocation.'
};
}
} else if (toolCall.toolName === 'askUserConfirmation') {
// Example: A tool that requires user interaction via window.confirm
// The 'args' would typically contain the message to confirm.
const messageToConfirm = (toolCall.args as { message: string }).message || "Are you sure?";
const userConfirmed = window.confirm(messageToConfirm);
return {
toolCallId: toolCall.toolCallId,
result: { confirmed: userConfirmed }, // Send back whether the user confirmed
};
}
// If this onToolCall doesn't handle the tool, you can return nothing (or a specific error)
// The SDK might have other mechanisms or this might signal an unhandled tool.
// For this example, let's return an error for unhandled tools.
return {
toolCallId: toolCall.toolCallId,
error: `Tool '${toolCall.toolName}' not implemented or handled on the client.`
};
}
});
// ... rest of your component (rendering messages, input form, etc.)
// The UI would render the ToolInvocationUIPart in its 'call' state initially,
// then it would update to 'result' or 'error' after onToolCall completes.
return ( /* ... JSX ... */ );
}
[FIGURE 4: Diagram showing client-side onToolCall flow: Server streams 'tool-call' -> Client useChat -> onToolCall executes -> Result updates ToolInvocationUIPart -> (Optional) Resubmit to server]
What onToolCall Receives:
As shown, it receives an object, and the key property is toolCall. This toolCall object (type ToolCall from @ai-sdk/core/tool or a similar core path) gives you:
- toolCall.toolCallId: string: A unique ID for this specific invocation. You must include this ID in your return value.
- toolCall.toolName: string: The name of the tool.
- toolCall.args: any (or more specifically, JSONValue): The arguments, already parsed by the SDK from the LLM's output. You'll likely cast this to an expected type (e.g., as { city: string }).
What onToolCall Must Return:
It must return a Promise that resolves to an object with the following structure:
{ toolCallId: string; result?: any; error?: string; }
- toolCallId: Crucially, this must be the same toolCallId you received. This allows the SDK to match the result to the correct pending tool call.
- result?: any: If the tool executed successfully, provide its output here. This will be JSON-stringified when sent back to the LLM.
- error?: string: If the tool execution failed, provide an error message here.
SDK Action After onToolCall Completes:
- The SDK takes the result (or error) you returned.
- It finds the corresponding ToolInvocationUIPart in the assistant's UIMessage (using the toolCallId).
- It updates that ToolInvocationUIPart's state to 'result' (and populates the result field) or 'error' (and populates errorMessage). This change will reactively update your UI.
Automatic Resubmission to Server:
This is a key part of the loop. If:
- All pending tool calls requested by the AI in that turn now have results (either because your onToolCall provided them, or through addToolResult which we'll see next), AND
- useChat is configured for multi-step interactions (e.g., you've set maxSteps in the useChat options, as mentioned in <extra_details>, Section 8's client-side example), THEN useChat will automatically take the updated messages array (which now includes your client-side tool results, internally formatted as tool role messages) and POST them back to your server API endpoint.
The server then runs streamText again, the LLM gets the client-side tool results, and the conversation continues. This creates a seamless flow even for tools that need browser-specific execution.
4.1 Browser APIs (geolocation example)
The getClientGeolocation example above is a prime use case. navigator.geolocation.getCurrentPosition() is an async browser API. onToolCall allows you to await its result and feed it back into the AI conversation loop.
4.2 UX patterns for confirmation dialogs (using addToolResult for manual submission)
Sometimes, a tool call isn't meant to be automatically executed by onToolCall. Instead, the AI requests something that first needs explicit user interaction and confirmation after the AI has made its request. For example: "AI wants to book a flight to Paris for $500. [Confirm] [Cancel]".
Flow:
- LLM streams a 'tool-call' for a tool like requestFlightBookingConfirmation, with args like { flightDetails: "Paris, $500", actionPrompt: "Do you want to book this flight?" }.
- Your UI renders the ToolInvocationUIPart for this tool. It's in state: 'call'. Your rendering logic for this specific tool name might display the actionPrompt and "Confirm"/"Cancel" buttons.
- Your onToolCall for requestFlightBookingConfirmation might not be implemented, or it might simply log the request and do nothing, because the action depends on the user clicking a button in the UI, not on onToolCall directly resolving it.
User Clicks a Button:
If the user clicks "Confirm", your button's event handler calls addToolResult() (a function returned by useChat):
// Inside your component, assuming you have the toolCallId for 'requestFlightBookingConfirmation'
// const { addToolResult } = useChat(...);
const handleConfirmBooking = (toolCallIdForBooking: string) => {
addToolResult({
toolCallId: toolCallIdForBooking,
result: "User confirmed booking."
// You could also pass structured data: result: { confirmed: true, details: "..." }
});
};
* If the user clicks "Cancel", you'd call `addToolResult({ toolCallId, result: "User cancelled booking." })` or perhaps `addToolResult({ toolCallId, error: "User declined." })`.
- SDK Action after addToolResult:
- Calling addToolResult updates the specified ToolInvocationUIPart to state: 'result' (or 'error').
- Just like with onToolCall, if this resolves all pending tool calls for the AI's turn and maxSteps allows, useChat will automatically resubmit the messages to the server.
This addToolResult pattern is perfect for tools where the "execution" is actually a user making a choice in the UI after the AI has prompted for it via a tool call. It decouples the AI's request from the user's asynchronous response. The example from <extra_details>, Section 8 (client-side page showing confirmation buttons) illustrates this interactive pattern.
Take-aways / Migration Checklist Bullets
- Use useChat's onToolCall prop to handle tools that must execute in the browser.
- onToolCall receives { toolCall: ToolCall } and must return a Promise of { toolCallId: string; result?: any; error?: string; }.
- Return the original toolCallId.
- The SDK updates the ToolInvocationUIPart's state in messages.
- If maxSteps allows, useChat automatically resubmits messages with tool results to the server.
- For user confirmations or UI-driven tool completions, render the 'call' state and use addToolResult() from button handlers.
- Remember to handle potential errors within your onToolCall logic and return them in the error field.
Tool calls, like any external interaction, can fail. Vercel AI SDK v5 provides mechanisms for these errors to propagate through the system and offers strategies for recovery, ensuring your application can handle hiccups gracefully.
Why this matters? (Context & Pain-Point)
When an AI tries to use a tool, things can go wrong at multiple stages:
- The LLM might generate invalid arguments.
- The tool's execute function (server-side or client-side via onToolCall) might throw an error.
- The LLM might try to call a tool that doesn't exist. Without proper error handling, these issues can crash your app or leave the user (and the AI) in a broken state. Robust applications need to catch these errors, display them meaningfully, and ideally, offer ways to recover or retry.
How it’s solved in v5? (Step-by-step, Code, Diagrams)
v5 surfaces tool-related errors through specific error types and by updating the state of the ToolInvocationUIPart.
Schema Validation Errors (InvalidToolArgumentsError):
- Scenario: The LLM generates arguments for a tool that don't match the Zod schema defined in your LanguageModelV2FunctionTool.parameters.
- SDK Action (Server-Side Tools): If the tool has a server-side execute function, the AI SDK core (within streamText or similar) will automatically validate the LLM-provided arguments against this schema before calling execute. If validation fails, it typically throws or generates an error like InvalidToolArgumentsError (this error type was mentioned in the V4 docs, Errors section, and similar behavior is expected for v5's Zod-based validation).
- Propagation to Client: This server-side validation error is then streamed to the client, usually as a 'tool-error' UIMessageStreamPart. This stream part contains the toolCallId, toolName, and the errorMessage.
UI Update: processUIMessageStream (used by useChat) updates the corresponding ToolInvocationUIPart by setting its state to 'error' and populating its errorMessage field with the validation error details. Your UI can then render this error.
// Example ToolInvocationUIPart after a validation error
{
"type": "tool-invocation",
"toolInvocation": {
"state": "error",
"toolCallId": "tool_xyz456",
"toolName": "setAppointment",
"args": { "date": "Tomorrow", "time": "Morning" }, // LLM provided invalid args
"errorMessage": "Invalid arguments: 'date' must be in YYYY-MM-DD format."
}
}
* **Client-Side Tools:** For tools handled by `onToolCall` on the client, if you want similar pre-validation, you'd typically perform Zod parsing within your `onToolCall` handler itself and return an error if it fails.
Tool Execution Errors (ToolExecutionError):
- Scenario (Server-Side): Your server-side execute function (e.g., for getWeatherTool) throws an unhandled exception during its operation (e.g., the external weather API is down).
- SDK Action (Server-Side): The SDK catches this error. It's often wrapped as a ToolExecutionError.
- Propagation to Client: Like validation errors, this execution error is streamed to the client as a 'tool-error' UIMessageStreamPart.
- UI Update: The ToolInvocationUIPart on the client transitions to state: 'error' with the relevant errorMessage.
- Scenario (Client-Side): Your onToolCall function on the client either throws an error or returns an object like { toolCallId, error: "API fetch failed" }.
- SDK Action (Client-Side): useChat processes this, updates the ToolInvocationUIPart to state: 'error' with the provided errorMessage.
Tool Not Found Errors (NoSuchToolError):
- Scenario: The LLM hallucinates and tries to call a tool name (e.g., flyToMars) that you haven't defined in the tools option passed to streamText (for server-side tools) or that isn't handled by your client-side onToolCall.
- SDK Action: The SDK will recognize that the requested tool isn't available. This typically results in a NoSuchToolError.
- Propagation & UI: Similar to other errors, this would likely be streamed as a 'tool-error' part (or a general stream 'error' part if it happens before tool-specific streaming starts), leading to an error state in the UI.
SDK Error Handling & Repair (Experimental - ToolCallRepairError):
- The V4 docs mentioned an experimental experimental_repairToolCall option for generateText/streamText. This allows developers to provide a function to attempt to fix invalid tool calls from the LLM (e.g., if an argument name is slightly misspelled).
- If such a feature persists and is enhanced in v5, and if this repair function itself fails, a ToolCallRepairError might be thrown.
- This shows the SDK's ongoing efforts to make tool calling more resilient, even in the face of imperfect LLM outputs. The primary defense, however, remains strong schema validation.
[FIGURE 5: Flowchart showing different points where tool errors can occur and how they propagate to the UI]
Recovery Strategies:
Once an error related to a tool call occurs and is displayed in the UI, how can the user or the system recover?
Retry with reload():
- The useChat hook returns a reload() function. If a tool interaction sequence leads to an error that sets useChat's general error state or if the user simply wants to try the last prompt again, calling reload() will resend the last user message.
- This gives the LLM another chance. It might:
- Try calling the same tool but with different (hopefully corrected) arguments.
- Choose a different tool.
- Attempt to answer without using a tool.
- This is a user-driven recovery mechanism.
AI Self-Correction (in multi-step flows):
- If a ToolInvocationUIPart ends up in an 'error' state (e.g., due to invalid arguments or an execution error from the tool itself), and this result (the error message) is sent back to the LLM as part of a multi-step conversation (because maxSteps > 1 or similar):
- A sufficiently sophisticated LLM might be able to understand the error message.
- It could then attempt to call the same tool again, but this time with corrected arguments. For instance, if the error was "Invalid date format, use YYYY-MM-DD", the LLM might reformat the date and retry.
- Or, it might decide the tool is unsuitable and try an alternative approach.
- This relies on the LLM's capabilities and how informative your tool's error messages are.
- If a ToolInvocationUIPart ends up in an 'error' state (e.g., due to invalid arguments or an execution error from the tool itself), and this result (the error message) is sent back to the LLM as part of a multi-step conversation (because maxSteps > 1 or similar):
Clear UI Feedback:
- This is crucial. Your UI must clearly display tool errors. The ToolInvocationUIPart having an 'error' state and an errorMessage property is designed precisely for this.
- Render the errorMessage so the user understands what went wrong (e.g., "Sorry, I couldn't fetch the weather right now because the city name was unclear. Could you try again with a specific city?").
- This feedback helps the user decide whether to rephrase their request, try reload(), or abandon that line of inquiry.
Take-aways / Migration Checklist Bullets
- Anticipate errors: LLM argument errors, tool execution failures, tool not found.
- v5 surfaces these via specific error types (e.g., InvalidToolArgumentsError) and updates ToolInvocationUIPart.state to 'error' with an errorMessage.
- Errors are typically streamed to the client via 'tool-error' UIMessageStreamParts.
- Implement UI rendering for ToolInvocationUIPart when state === 'error' to show users what happened.
- Use useChat().reload() to allow users to retry their last prompt.
- For multi-step flows, informative error messages sent back to the LLM can enable AI self-correction.
- Explore experimental repair mechanisms if available and appropriate for your risk tolerance.
When integrating tools with LLMs, security is paramount. Always validate arguments provided by the LLM before tool execution and sanitize results from tools before displaying them or feeding them back to the LLM. Vercel AI SDK v5's emphasis on Zod schemas for tool parameters is a key enabler for input validation.
Why this matters? (Context & Pain-Point)
This section deserves CRUCIAL EMPHASIS. LLMs generate text, and that text can sometimes be unpredictable or, if influenced by malicious user input (prompt injection), actively harmful. If an LLM calls your tools with unchecked arguments, or if your tools return data from untrusted sources that you then render or process without care, you open yourself up to serious security vulnerabilities.
Think about it:
- What if a user tricks the LLM into crafting arguments for your executeSQL tool that result in DROP TABLE users;?
- What if your fetchWebpageTitle tool returns a title containing <script>alert('XSS')</script>, and you render that directly in your UI?
- What if a tool result includes an enormous payload designed to crash the LLM or your application when sent back?
These aren't just theoretical; they are real risks when bridging generative AI with functional code and external data. As the <context>, Section 10.3, rightly points out, preventing prompt injection that leads to harmful tool arguments is vital.
How it’s solved in v5? (Practices & SDK Features)
Vercel AI SDK v5, particularly with its V2 LanguageModelV2FunctionTool definitions, provides strong mechanisms and encourages practices to mitigate these risks.
6.1 Validating LLM-Generated Arguments (Input Validation for Tools)
- The Golden Rule: Always validate arguments provided by the LLM before executing any tool. This is especially true for server-side tools that might perform sensitive actions, interact with databases, call other APIs, or touch the file system.
Zod Schemas are Your Best Friend:
- As we saw in Section 2, when you define a LanguageModelV2FunctionTool, the parameters field is a Zod schema (e.g., weatherParamsSchema).
- For server-side tools with an execute function, the AI SDK automatically uses this Zod schema to validate the arguments generated by the LLM before your execute function is called. If validation fails, an error (like InvalidToolArgumentsError) is raised, and your execute function is typically not even invoked with the bad data.
This is a powerful, built-in defense mechanism. Leverage it fully. Define your Zod schemas to be as strict and precise as possible.
// Recap: Tool definition with Zod schema
// const weatherParamsSchema = z.object({
// city: z.string().min(1).max(100), // Add more constraints
// unit: z.enum(['celsius', 'fahrenheit']).optional().default('celsius'),
// });
// const getWeatherTool: LanguageModelV2FunctionTool = {
// type: 'function',
// function: { /* ..., */ parameters: weatherParamsSchema, execute: async (args) => { /* args are validated */ } },
// };
Manual Validation for Client-Side Tools (in onToolCall):
- If you're handling tool execution on the client using onToolCall, the AI SDK provides the parsed toolCall.args. While the SDK might do some basic parsing, the robust Zod schema validation as described above is primarily for server-side execute.
Therefore, within your onToolCall function, if the tool performs critical operations or if the args structure is complex and could be manipulated, you should manually validate toolCall.args against an expected schema (again, Zod is excellent for this on the client too).
// Client-side onToolCall example with manual Zod validation
// const clientToolArgsSchema = z.object({ action: z.string(), targetId: z.string().uuid() });
// onToolCall: async ({ toolCall }) => {
// if (toolCall.toolName === 'performClientAction') {
// try {
// const validatedArgs = clientToolArgsSchema.parse(toolCall.args); // Throws if invalid
// // Proceed with validatedArgs...
// return { toolCallId: toolCall.toolCallId, result: "Action performed" };
// } catch (validationError) {
// return { toolCallId: toolCall.toolCallId, error: "Invalid arguments for client tool." };
// }
// }
// }
Why is this so important? Preventing Injections:
- The primary threat here is prompt injection leading to malicious argument generation. A user might craft their input to trick the LLM. For instance, if you have a tool that queries a database:
- User: "Find orders for customer O'Malley. Also, the SQL comment for this is --; DROP TABLE orders; --"
- LLM (if not properly sandboxed/instructed) might naively generate arguments for your runSQLQuery tool like: args: { query: "SELECT * FROM orders WHERE customer_name = 'O''Malley'; --; DROP TABLE orders; --" }
- If your runSQLQuery tool directly executes this query string without validation or parameterization, you're in big trouble.
- Strict schema validation on args (e.g., ensuring query doesn't contain suspicious SQL keywords or structure if it's not supposed to be raw SQL) is one layer of defense. Secure coding practices within the tool itself (like using parameterized queries) are another.
- The primary threat here is prompt injection leading to malicious argument generation. A user might craft their input to trick the LLM. For instance, if you have a tool that queries a database:
The flow of data isn't just one way. Your tools produce results, and these results also need scrutiny.
The Golden Rule: If tool results come from external APIs, user-influenced processes, or any untrusted source, validate or sanitize these results *before*:
- Displaying them directly in the UI: To prevent Cross-Site Scripting (XSS) if the result inadvertently contains HTML, JavaScript, or CSS that could be executed by the browser.
- Sending them back to the LLM: To prevent the LLM from being fed malicious, excessively large, or malformed data that could disrupt its subsequent processing, lead to further vulnerabilities, or cause denial-of-service.
Examples:
- XSS Prevention: Your getWebpageTitle tool fetches a title. The title is "Welcome! <img src=x onerror=alert(1)>". If you render this directly into HTML (e.g., div.innerHTML = toolResult), you've got an XSS vulnerability.
- Solution: Treat the result as plain text (React does this by default in JSX {toolResult}), or if you must render it as HTML (e.g., for rich formatting from a trusted tool), use a robust HTML sanitization library like DOMPurify.
- LLM Input Poisoning/Denial of Service: Your tool fetches user comments from a forum. One comment is 10MB of random characters or contains confusing control sequences.
- Solution: Validate the structure and size of the tool result. Truncate excessively long text. Sanitize or escape control characters before sending the result back to the LLM. If the tool returns JSON, validate it against an expected schema (Zod again!).
- XSS Prevention: Your getWebpageTitle tool fetches a title. The title is "Welcome! <img src=x onerror=alert(1)>". If you render this directly into HTML (e.g., div.innerHTML = toolResult), you've got an XSS vulnerability.
[FIGURE 6: Diagram illustrating the two-way validation: LLM args -> Tool (Input Validation) and Tool result -> UI/LLM (Output Validation/Sanitization)]
Security is not an afterthought; it's a continuous process. The Vercel AI SDK v5 provides better primitives for this, but the responsibility for secure implementation rests with you, the developer.
Take-aways / Migration Checklist Bullets
- SECURITY IS PARAMOUNT for tool interactions.
- ALWAYS validate LLM-generated arguments before executing tools. Leverage Zod schemas in LanguageModelV2FunctionTool.parameters for automatic server-side validation.
- Manually validate arguments in client-side onToolCall handlers if they perform sensitive actions.
- ALWAYS validate or sanitize tool results before rendering them in the UI (prevent XSS) or sending them back to the LLM.
- Be especially careful with tools that execute code, query databases, or interact with the file system. Use the principle of least privilege.
- Stay informed about prompt injection techniques and defenses.
Vercel AI SDK v5 excels at facilitating multi-step conversational flows where the AI might call a tool, get a result, then call another tool or generate text, all within a single user turn. This is powered by the maxSteps option on both the server (streamText) and client (useChat), along with the structured streaming of tool interactions.
Why this matters? (Context & Pain-Point)
Simple Q&A is one thing, but true conversational agents often need to perform a sequence of actions or reasoning steps. For example: "What's the weather in the capital of France, and can you then find me a good bistro there that's open now?" This requires:
- Identifying "capital of France" (possibly a tool or internal knowledge).
- Getting weather for Paris (tool call).
- Searching for bistros in Paris based on weather and current time (another tool call, potentially using results from the first).
- Synthesizing a final answer.
Manually orchestrating such chains can be complex, involving careful state management and multiple back-and-forth calls. v5 aims to simplify this.
How it’s solved in v5? (Step-by-step, Code, Diagrams)
7.1 maxSteps in streamText (Server-Side Multi-Step)
This feature, recapped from <extra_details>, Section 8 (Chatbot Tool Usage - Server-side Multi-Step Calls) and the V4 docs (Tool Calling - Multi-Step Calls), is key for server-driven tool chains.
- Scenario: You call streamText on the server, provide tools with execute functions, and set maxSteps to a value greater than 1 (e.g., maxSteps: 5).
- Automatic SDK Orchestration:
- User sends a message. Your server calls streamText.
- LLM (Turn 1): Decides to call ServerToolA.
- SDK: Validates args for ServerToolA, calls its execute function, gets resultA.
- SDK: Constructs a tool role ModelMessage with resultA and automatically sends it back to the LLM (along with the prior history) as part of the same streamText operation.
- LLM (Turn 2): Processes resultA. It might now:
- Generate a final text response.
- Call ServerToolB.
- Call ServerToolA again with different arguments.
- SDK: If another tool is called, repeats step 3-4.
- This loop continues until the LLM generates a text response without a tool call, or maxSteps is reached.
- Streaming to Client: Throughout this server-side loop, result.toUIMessageStreamResponse() is streaming all the intermediate 'tool-call' and 'tool-result' UIMessageStreamParts to the client. The client UI sees the ToolInvocationUIParts updating live, showing the AI "working" through its steps.
- Benefit: Complex server-side agentic behavior with minimal manual orchestration.
[FIGURE 7: Diagram of server-side multi-step flow with maxSteps: LLM -> ToolA -> ResultA -> LLM -> ToolB -> ResultB -> LLM -> Final Text. All streamed to client.]
7.2 maxSteps in useChat (Client-Involved Multi-Step)
The useChat hook on the client also has a maxSteps option (mentioned in <extra_details>, Section 8's client-side example). This controls how many "rounds" of interaction useChat will automatically manage when client-side tools are involved. A "round" here typically means: User sends message -> Server (LLM) -> Client (tool call handled) -> Server (LLM with tool result) -> Client (final response or another tool call).
Flow with Client-Side Tools (e.g., handled by onToolCall):
- User: Sends a message. useChat POSTs to your server API.
- Server: streamText runs. LLM decides to call ClientToolX (which has no server-side execute or is meant for client). Server streams a 'tool-call' UIMessageStreamPart for ClientToolX to the client.
- Client (useChat):
- Receives the 'tool-call' part.
- Its onToolCall handler for ClientToolX runs and returns resultX.
- The ToolInvocationUIPart for ClientToolX in messages updates to state: 'result' with resultX.
- Client (useChat Automatic Resubmission):
- If maxSteps (in useChat options) allows further interaction (e.g., maxSteps: 3, and this was the first round), AND
- All tool calls requested by the AI in this "turn" now have results (from onToolCall or addToolResult),
- THEN useChat automatically POSTs the updated messages array (which now includes resultX internally formatted as a tool role message) back to your server API endpoint.
- Server (Again): Your API route receives this new POST. streamText runs again, now with resultX as part of the ModelMessage history.
- LLM: Processes resultX and can now:
- Generate its final text response.
- Call another tool (server-side or client-side).
- The loop continues, streaming updates to the client, until a final text response is generated or useChat's maxSteps limit is hit for these automated client-server rounds.
This client-side maxSteps enables building conversational agents that can seamlessly weave in browser-based actions or user confirmations without you having to manually code all the resubmission logic.
7.3 StepStartUIPart for UI Delineation
When streamText().toUIMessageStreamResponse() processes these multi-step server-side tool calls (or even complex client-involved chains), it might automatically insert 'step-start' UIMessageStreamParts into the v5 UI Message Stream being sent to the client.
- Purpose: These are simple, contentless marker parts: { type: 'step-start'; }.
- Function: They indicate to the client UI that a new logical "step" or "phase" in the AI's multi-part generation process is beginning. For example, one might be emitted before a tool call is detailed, another after its result, and another before the final textual summary.
- UI Rendering: Your client-side rendering logic for UIMessages can look for StepStartUIPart (as shown in <extra_details>, Section 8 "Step start parts"). When encountered, you might render a visual separator like a horizontal rule (<hr>), a small heading like "Step 2:", or just add some extra spacing.
- Benefit: This helps the user visually follow the AI's process when it's performing multiple actions or reasoning steps to fulfill a complex request, making the interaction feel more transparent and less like a monolithic black box.
By combining maxSteps on both server and client with the structured streaming of UIMessageParts (including ToolInvocationUIPart and StepStartUIPart), Vercel AI SDK v5 provides a powerful toolkit for building sophisticated, chained conversational interactions.
Take-aways / Migration Checklist Bullets
- Use maxSteps in server-side streamText for automated multi-step tool call chains with server-executable tools.
- Use maxSteps in client-side useChat options to enable automatic resubmission of client-side tool results to the server.
- The SDK streams all intermediate tool calls and results to the client as ToolInvocationUIParts.
- Look for StepStartUIPart in the message parts to visually delineate steps in the UI.
- This architecture supports building complex conversational agents that can string together multiple tool uses and reasoning phases.
To illustrate the power of v5's structured tool handling and multi-step chains, let's imagine building a conceptual calendar-booking assistant. This example will highlight how server-side tools, client-side interactions, and UI updates come together.
Why this matters? (Context & Pain-Point)
Booking a meeting often involves multiple back-and-forth steps: checking availability, presenting options, getting confirmation, and then finalizing. Trying to model this with a simple request-response LLM call is difficult. We need the AI to guide the user through a process, using tools at each stage. This is a perfect showcase for v5's capabilities.
The Scenario:
A user wants to book a meeting: "Book a 1-hour meeting with Jane for next Tuesday afternoon."
Conceptual Tools We'll Define:
checkAvailability(person: string, dateRange: string, durationHours: number)
- Type: Server-side (has an execute function).
- Args Schema (Zod): z.object({ person: z.string(), dateRange: z.string(), durationHours: z.number().int().positive() })
- Action: Queries a calendar backend to find available slots for person within dateRange for the given durationHours.
- Result: An array of available slot strings (e.g., ["Tuesday 2:00 PM", "Tuesday 3:00 PM", "Tuesday 4:00 PM"]) or an empty array if none.
displaySlotOptions(slots: string[], prompt: string)
- Type: Client-side interaction (no server execute; handled by UI rendering of ToolInvocationUIPart and addToolResult via user click).
- Args Schema (Zod): z.object({ slots: z.array(z.string()), prompt: z.string() })
- Action (AI's intent): The AI calls this to tell the UI to present the slots to the user as clickable options, along with a prompt (e.g., "Please select a time:").
- Result (from user click via addToolResult): The selected slot string (e.g., "Tuesday 3:00 PM").
confirmBooking(person: string, selectedSlot: string, durationHours: number)
- Type: Server-side (has an execute function).
- Args Schema (Zod): z.object({ person: z.string(), selectedSlot: z.string(), durationHours: z.number().int().positive() })
- Action: Attempts to book the meeting on the calendar backend.
- Result: A confirmation message (e.g., "Meeting booked successfully!") or an error message.
Walkthrough of the Interaction (v5 Features Highlighted):
Let's assume useChat and streamText are configured with maxSteps: 5 to allow these chained interactions.
User: "Book a 1-hour meeting with Jane for next Tuesday afternoon."
- (Client: useChat sends this to the server API.)
AI (Turn 1 - Server):
- LLM processes the request. Decides it needs to check Jane's availability.
- Calls server-side tool checkAvailability(person: "Jane", dateRange: "next Tuesday afternoon", durationHours: 1).
- (SDK: Validates args, calls getAvailabilityTool.function.execute(...).)
- (Client UI: streamText().toUIMessageStreamResponse() sends 'tool-call' for checkAvailability. ToolInvocationUIPart appears, state: 'call' -> shows "Checking Jane's availability...").
Server (checkAvailability executes):
- execute function runs, queries calendar backend.
- Returns result: ["Tuesday 2:00 PM", "Tuesday 3:00 PM", "Tuesday 4:00 PM"].
- (SDK: This result is sent back to the LLM as part of the ongoing streamText operation because maxSteps > 1.)
- (Client UI: Server streams 'tool-result' for checkAvailability. ToolInvocationUIPart updates to state: 'result', UI might show "Found 3 available slots for Jane.").
AI (Turn 2 - Server, after processing checkAvailability result):
- LLM sees the available slots. Now it needs to present these options to the user.
- Decides to call the client-interaction tool: displaySlotOptions(slots: ["Tuesday 2:00 PM", "Tuesday 3:00 PM", "Tuesday 4:00 PM"], prompt: "Okay, I found these times for Jane next Tuesday. Which one works for you?").
- (SDK: Since displaySlotOptions has no server execute, this tool call is streamed to the client.)
- (Client UI: Server streams 'tool-call' for displaySlotOptions. A new ToolInvocationUIPart appears with state: 'call'.)
- Your UI rendering logic for displaySlotOptions (when state: 'call') would parse its args and display: "Okay, I found these times for Jane next Tuesday. Which one works for you?" [Button: Tuesday 2:00 PM] [Button: Tuesday 3:00 PM] [Button: Tuesday 4:00 PM] [FIGURE 8: Mockup of UI showing these clickable slot buttons within a ToolInvocationUIPart]
User (Client-Side Interaction):
- User sees the options and clicks the "Tuesday 3:00 PM" button.
- (Client: The button's onClick handler calls addToolResult({ toolCallId: id_for_displaySlotOptions, result: "Tuesday 3:00 PM" }).)
- (Client useChat: ToolInvocationUIPart for displaySlotOptions updates to state: 'result'. Since maxSteps on useChat allows, and this resolves the pending tool call, useChat automatically POSTs the updated messages (including this tool result) back to the server API.)
AI (Turn 3 - Server, after processing "Tuesday 3:00 PM" selected):
- LLM receives the user's selection. Now it needs to confirm the booking.
- Calls server-side tool confirmBooking(person: "Jane", selectedSlot: "Tuesday 3:00 PM", durationHours: 1).
- (SDK: Validates args, calls confirmBookingTool.function.execute(...).)
- (Client UI: Server streams 'tool-call' for confirmBooking. A new ToolInvocationUIPart appears, state: 'call' -> shows "Confirming your meeting with Jane for Tuesday 3:00 PM...").
Server (confirmBooking executes):
- execute function runs, interacts with the calendar backend to book the slot.
- Returns result: "Great! I've booked your 1-hour meeting with Jane for next Tuesday at 3:00 PM.".
- (SDK: Result sent back to LLM.)
- (Client UI: Server streams 'tool-result' for confirmBooking. ToolInvocationUIPart updates to state: 'result', UI might show "Booking confirmed!" and the message).
AI (Turn 4 - Server, after processing confirmBooking result):
- LLM sees the successful booking confirmation.
- Generates a final text response to the user: "Great! I've booked your 1-hour meeting with Jane for next Tuesday at 3:00 PM."
- (SDK: No more tool calls, so this text is the final output for this streamText operation.)
- (Client UI: Server streams 'text' parts. The final assistant message appears.)
v5 Features Highlighted in this Showcase:
- Structured ToolInvocationUIPart: Each tool interaction (check availability, display options, confirm booking) is a distinct part in an assistant's UIMessage, evolving through states ('call', 'result').
- Mix of Tool Types: We used server-side auto-executed tools (checkAvailability, confirmBooking) and a client-side UI-interactive tool (displaySlotOptions).
- maxSteps (Server & Client): This option on both streamText (server) and useChat (client) facilitated the automated chaining of these steps without excessive manual coding of the conversation flow.
- addToolResult: Crucial for the client-side displaySlotOptions tool, allowing user interaction to provide the "result" for that tool call.
- Clear UI Feedback: The changing states of ToolInvocationUIPart allow the UI to continuously inform the user about what the AI assistant is doing.
This kind of complex, multi-turn, tool-using interaction is precisely what Vercel AI SDK v5 is designed to simplify and make robust.
Take-aways / Migration Checklist Bullets
- v5's tool features enable complex, multi-step conversational agents.
- Combine server-side tools (with execute) for backend logic and client-side tools (onToolCall / addToolResult) for UI interactions.
- maxSteps on both server and client helps automate the conversational chain.
- ToolInvocationUIPart is key for representing tool state in the UI.
- Design your tools and prompts to guide the AI through the desired workflow.
Wrapping up our deep dive into Vercel AI SDK v5's tool capabilities, let's consolidate the main advantages and best practices that emerge from this new, more structured approach.
Why this matters? (Context & Pain-Point)
Working with AI tools can quickly become complex. Vercel AI SDK v5 brings a significant level of organization and power to this domain. Understanding these core principles will help you build more robust, maintainable, and user-friendly tool-using AI applications.
Actionable Takeaways & Best Practices:
Structured is Better:
- The shift from ad-hoc tool data or simple string parsing to v5's ToolInvocationUIPart (within UIMessage.parts) and the server-side V2 LanguageModelV2FunctionTool definition provides a robust, typed, and stateful way to handle tools. This structured approach is far superior for managing the lifecycle of tool calls, displaying their state in the UI, and ensuring data integrity.
Schema is Your Friend (Embrace Zod):
- For LanguageModelV2FunctionTool.parameters, always use Zod schemas (or other compatible schema types if supported by your specific V2 provider adapter). This gives you:
- Clear definition of expected arguments for the LLM.
- Automatic server-side validation of arguments provided by the LLM before your execute function runs.
- Type safety for tool arguments in your TypeScript code.
- Don't skimp on defining these schemas accurately.
- For LanguageModelV2FunctionTool.parameters, always use Zod schemas (or other compatible schema types if supported by your specific V2 provider adapter). This gives you:
Clear Client vs. Server Execution Strategy:
- Consciously decide where each tool should run.
- Use server-side execute functions within your LanguageModelV2FunctionTool definitions for tools that:
- Depend on backend resources (databases, internal APIs).
- Require secure execution environments or API keys.
- Don't need direct user interaction for their execution.
- Use client-side handling via useChat's onToolCall prop or UI-driven addToolResult calls for tools that:
- Need to access browser APIs (geolocation, local storage, etc.).
- Require explicit user confirmation or interaction within the UI after the AI requests the tool.
Design for Multi-Step Interactions:
- Leverage the maxSteps option in both server-side streamText and client-side useChat.
- This allows you to build conversational agents that can chain multiple tool calls, process results, and continue reasoning or acting, all within a more automated flow.
- Think about your user journeys as potential multi-step sequences that tools can facilitate.
Build Rich, Informative UIs for Tools:
- The different states of ToolInvocationUIPart ('partial-call', 'call', 'result', 'error') are there for a reason. Use them!
- Provide clear visual feedback to the user:
- When the AI is calling a tool.
- When arguments are being streamed or processed.
- When a tool is executing.
- Display tool results (or errors) in a user-friendly way.
- This transparency greatly improves the user experience and helps manage expectations during complex AI operations.
Security First, Always:
- Validate LLM-generated arguments before execution. Zod schemas for server-side tools help immensely. Manually validate on the client if needed.
- Sanitize tool results before displaying them in the UI (to prevent XSS) and before sending them back to the LLM (to prevent sending harmful or excessively large data).
- Treat any data coming from the LLM or from external tool executions as potentially untrusted until validated or sanitized.
Teasing Post 9: Persisting Rich Chat Histories
With tool calls now behaving as first-class citizens within our rich UIMessages, and our AI assistants capable of complex, multi-step interactions, a critical question arises: how do we reliably save and restore these intricate conversations?
Post 9 in our "Inside Vercel AI SDK 5" series will explore just that: "Persisting Rich UIMessage Histories: The v5 'Persist Once, Render Anywhere' Model." We'll dive into strategies for database schema design, best practices for saving UIMessage arrays with all their parts and metadata, and how v5 facilitates high-fidelity restoration of these complex chat states. Stay tuned!