Sarvam AI Speech to Text API
Sarvam AI Speech to Text API transcribes speech across 22 Indian languages with speaker diarization and code-mixing support.
About Sarvam AI Speech to Text API
Sarvam AI's Speech to Text API transcribes audio in 22 Indian languages, including Hindi, Bengali, Tamil, Telugu, Gujarati, Kannada, Malayalam, Marathi, Punjabi, and Odia, as well as English spoken with an Indian accent. It is built on the Saarika v2 model and is designed to stay accurate on difficult audio, such as recordings with background noise, cross-talk between speakers, or poor connection quality.
Speaker diarization automatically identifies and labels the different speakers in a recording, which is useful for meeting transcription, interviews, and call center analytics. The API also handles code-mixing, where a speaker switches between Hindi, English, and regional languages mid-sentence. This matters because such switching is common in natural Indian-language conversation.
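Diarized output is usually consumed as a list of speaker-labeled segments that an application then groups into readable turns. The sketch below is a minimal illustration, not Sarvam's response format: the `Segment` class, its fields (speaker label, start and end times in seconds, text), and the `SPEAKER_00`-style labels are all hypothetical, and the real API schema may differ.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    # Hypothetical shape of one diarized segment; the real API schema may differ.
    speaker: str
    start_s: float
    end_s: float
    text: str


def merge_turns(segments: list[Segment]) -> list[Segment]:
    """Collapse consecutive segments from the same speaker into a single turn."""
    turns: list[Segment] = []
    for seg in sorted(segments, key=lambda s: s.start_s):
        if turns and turns[-1].speaker == seg.speaker:
            last = turns[-1]
            # Keep the later end time in case segments overlap (e.g. cross-talk).
            turns[-1] = Segment(last.speaker, last.start_s,
                                max(last.end_s, seg.end_s),
                                f"{last.text} {seg.text}")
        else:
            turns.append(Segment(seg.speaker, seg.start_s, seg.end_s, seg.text))
    return turns


def format_transcript(turns: list[Segment]) -> str:
    """Render turns as 'SPEAKER [start-end]: text' lines."""
    return "\n".join(
        f"{t.speaker} [{t.start_s:.1f}-{t.end_s:.1f}]: {t.text}" for t in turns
    )


if __name__ == "__main__":
    # Illustrative code-mixed exchange between two speakers (made-up data).
    segments = [
        Segment("SPEAKER_00", 0.0, 2.0, "Namaste,"),
        Segment("SPEAKER_00", 2.0, 3.5, "how are you?"),
        Segment("SPEAKER_01", 3.5, 5.0, "Main theek hoon, thanks."),
    ]
    print(format_transcript(merge_turns(segments)))
```

Sorting by start time first keeps the merge correct even if segments arrive out of order; merging only adjacent same-speaker segments preserves the back-and-forth structure of a conversation.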
The API accepts audio in MP3, WAV, AAC, OGG, Opus, FLAC, M4A, AMR, WMA, and WebM formats, which covers most common recording sources. Three API options cover different workflows: the REST API for audio shorter than 30 seconds, the Batch API for audio up to 1 hour long with full diarization and timestamps, and the Streaming API for real-time transcription over WebSocket.
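The limits above translate into a simple routing rule. The sketch below is a hypothetical client-side helper, not part of any Sarvam SDK: `choose_option`, `is_supported_format`, and `SttOption` are illustrative names, it assumes each listed format corresponds to a file extension of the same name, and it assumes requests needing speaker labels should go to the Batch API because the description associates full diarization and timestamps with that option.

```python
from enum import Enum
from pathlib import Path

# Formats listed in the description; assumes each maps to a same-named file extension.
SUPPORTED_EXTENSIONS = frozenset(
    {".mp3", ".wav", ".aac", ".ogg", ".opus", ".flac", ".m4a", ".amr", ".wma", ".webm"}
)
REST_LIMIT_S = 30          # REST API: audio shorter than 30 seconds
BATCH_LIMIT_S = 60 * 60    # Batch API: audio up to 1 hour


class SttOption(Enum):
    REST = "rest"
    BATCH = "batch"
    STREAMING = "streaming"


def is_supported_format(filename: str) -> bool:
    """True if the file extension is one of the listed formats (case-insensitive)."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


def choose_option(duration_s: float, realtime: bool = False,
                  need_diarization: bool = False) -> SttOption:
    """Hypothetical helper: map a request to one of the three API options."""
    if realtime:
        # Live audio goes over the WebSocket-based Streaming API.
        return SttOption.STREAMING
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    if duration_s < REST_LIMIT_S and not need_diarization:
        # Short clips that only need a transcript fit within the REST limit.
        return SttOption.REST
    if duration_s <= BATCH_LIMIT_S:
        # Longer audio, or any request that needs speaker labels and timestamps.
        return SttOption.BATCH
    raise ValueError("audio exceeds 1 hour; split it into chunks first")
```

For example, a 12.5-second voicemail maps to REST, the same clip with diarization requested maps to Batch, and a clip of exactly 30 seconds also maps to Batch because the REST limit is strictly under 30 seconds.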
The API is designed for straightforward developer integration and enterprise-scale workloads, making it suitable for production multilingual speech applications aimed at Indian language markets.