Voice to Voice AI with Amazon Nova Sonic

Lomanu4 · 16 May 2025

Amazon Nova Sonic is a state-of-the-art speech-to-speech model that delivers real-time, human-like voice conversations with industry-leading price performance and low latency. Available through a bidirectional streaming API on Bedrock, Nova Sonic lets developers create natural, human-like AI agents that do not require users to type in their requests. What excites me most is that this capability opens AI access to many people who otherwise might struggle to use it.

Nova Sonic has both masculine-sounding and feminine-sounding voices, and can produce American and British English accents.

Nova Sonic can be used in agentic workflows. It can consult knowledge bases using RAG to ground the information it gives to the user, and it supports function calling, also called tool use. Since tools are supported, we are just a step away from utilising MCP servers with Nova Sonic.
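To make tool use a bit more concrete, here is a minimal sketch of the client side: a registry of local functions, plus a handler that takes a tool request from the model and builds a result to send back. The field names (toolName, toolUseId, content) and the get_order_status tool are my assumptions for illustration, so check the Bedrock documentation for the exact event schema.

```python
import json

# Registry of functions the model is allowed to call, keyed by tool name.
TOOLS = {}


def tool(fn):
    """Register a plain Python function as a callable tool."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def get_order_status(order_id: str) -> dict:
    # Hypothetical tool: a real one would query a backend system.
    return {"order_id": order_id, "status": "shipped"}


def handle_tool_use(event: dict) -> dict:
    """Run the requested tool and build a result payload.

    Assumed (not confirmed) event shape:
    {"toolName": str, "toolUseId": str, "content": "<JSON arguments>"}
    """
    name = event["toolName"]
    args = json.loads(event["content"])
    if name not in TOOLS:
        result = {"error": f"unknown tool: {name}"}
    else:
        result = TOOLS[name](**args)
    # Echo the request ID so the model can match the result to its request.
    return {"toolUseId": event["toolUseId"], "content": json.dumps(result)}
```

The same registry pattern is what makes MCP attractive here: instead of hand-written functions, the registry entries could forward the call to an MCP server.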

Amazon Nova Sonic uses a persistent bidirectional connection that allows simultaneous event streaming in both directions. The demo uses WebSockets. This means that the conversation can flow very naturally: we can continuously stream the audio, and input can be processed while output is being generated. Just like humans, Nova Sonic can even respond without needing to wait for complete utterances from the user.
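The "input processed while output is generated" idea is easier to see in code. The sketch below does not talk to Nova Sonic at all; it uses two in-memory asyncio queues and a stand-in fake_model coroutine (all names are mine) to show a sender and a receiver running concurrently, so replies start arriving before the last input chunk has been sent.

```python
import asyncio


async def send_audio(chunks, uplink, timeline):
    """Stream input chunks without waiting for any reply."""
    for chunk in chunks:
        await uplink.put(chunk)
        timeline.append(f"sent:{chunk}")
        await asyncio.sleep(0)  # yield so the other tasks can run
    await uplink.put(None)  # sentinel: end of input


async def fake_model(uplink, downlink):
    """Stand-in for the model: answers each chunk as soon as it arrives."""
    while (chunk := await uplink.get()) is not None:
        await downlink.put(f"reply-to-{chunk}")
    await downlink.put(None)  # sentinel: end of output


async def receive_audio(downlink, log, timeline):
    """Consume output while the sender is still producing input."""
    while (out := await downlink.get()) is not None:
        log.append(out)
        timeline.append(f"recv:{out}")


async def run_duplex(chunks):
    uplink, downlink = asyncio.Queue(), asyncio.Queue()
    log, timeline = [], []
    # All three coroutines run concurrently on one event loop.
    await asyncio.gather(
        send_audio(chunks, uplink, timeline),
        fake_model(uplink, downlink),
        receive_audio(downlink, log, timeline),
    )
    return log, timeline
```

Calling asyncio.run(run_duplex(["c0", "c1", "c2"])) returns the replies plus a timeline in which "sent" and "recv" entries are interleaved, which is the behaviour a real bidirectional stream gives you over the network.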

Nova Sonic is event-driven: the client and the model exchange structured JSON events, and those events control the session lifecycle, audio streaming, text responses, and tool interactions.
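As a rough illustration of what "event-driven" means on the client, the sketch below wraps each event in a JSON envelope and routes incoming events to handlers by name. The envelope shape ({"event": {name: payload}}) and the event names used in the usage note are my assumptions, not a copy of the official schema.

```python
import json


def make_event(name, payload):
    """Serialise one event as a JSON string (assumed envelope shape)."""
    return json.dumps({"event": {name: payload}})


def parse_event(raw):
    """Return (name, payload) from a JSON event with exactly one entry."""
    body = json.loads(raw)["event"]
    ((name, payload),) = body.items()
    return name, payload


class EventRouter:
    """Maps event names to handler functions."""

    def __init__(self):
        self.handlers = {}

    def on(self, name):
        """Decorator that registers a handler for one event name."""
        def register(fn):
            self.handlers[name] = fn
            return fn
        return register

    def dispatch(self, raw):
        """Call the matching handler; unknown events are ignored."""
        name, payload = parse_event(raw)
        handler = self.handlers.get(name)
        return handler(payload) if handler else None
```

In use, you would register one handler per event type, for example @router.on("textOutput"), and call router.dispatch(raw) for every message that arrives on the stream.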

How do you use Nova Sonic? AWS SDKs in several languages, including Java, JavaScript, C++, Kotlin, and Swift, support the new bidirectional InvokeModelWithBidirectionalStream API. The Python SDK, which uses async features to do this, is experimental, but it covers the basics well.

You will do the following (Python example, but the same applies elsewhere):

1. Create a Sonic client.
2. Create function(s) that define how you will handle each event, such as ContentStart and ContentEnd.
3. Start a session with the client.
4. Call the InvokeModelWithBidirectionalStream API above with await (in the experimental Python SDK).
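The steps above can be sketched as follows. To keep it runnable without credentials, this uses a StubSonicClient that replays canned events; it is a hypothetical stand-in, not the real SDK, and its method name, the event names, and the model ID are all assumptions that you should check against the SDK you actually use.

```python
import asyncio
import json

MODEL_ID = "amazon.nova-sonic-v1:0"  # assumed model ID; verify in Bedrock


class StubSonicClient:
    """Hypothetical stand-in for a Bedrock client. NOT the real SDK."""

    async def invoke_bidirectional_stream(self, model_id, input_events):
        # Record what the caller sent, then replay canned output events.
        self.sent = list(input_events)

        async def output_events():
            for name in ("contentStart", "textOutput", "contentEnd"):
                yield json.dumps({"event": {name: {"role": "ASSISTANT"}}})

        return output_events()


async def run_session(client, handlers):
    # Step 3: the session opens with a start event (assumed name).
    session_events = [json.dumps({"event": {"sessionStart": {}}})]
    # Step 4: await the bidirectional invoke call to obtain the stream.
    stream = await client.invoke_bidirectional_stream(MODEL_ID, session_events)
    seen = []
    async for raw in stream:
        ((name, payload),) = json.loads(raw)["event"].items()
        seen.append(name)
        # Step 2: route each event to its handler, if one is registered.
        if name in handlers:
            handlers[name](payload)
    return seen


def on_content_start(payload):
    print("content started:", payload)


def on_content_end(payload):
    print("content ended:", payload)


# Step 1: create the client, then run with
# asyncio.run(run_session(StubSonicClient(), HANDLERS)).
HANDLERS = {"contentStart": on_content_start, "contentEnd": on_content_end}
```

With the real experimental Python SDK the overall shape should be similar, but the client construction, credentials, and stream objects will differ from this stub.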

You can also get started with the Nova Workshop codebase.
