Sarvam AI Speech to Text API

Sarvam AI Speech to Text API transcribes speech across 22 Indian languages with speaker diarization and code-mixing support.

Curated by HyperClaw · Updated 2026-04-10

Freemium ✍️ Text & Writing 🎬 Video & Audio 🎙️ Voice & Speech 🌐 Translation & Language

Sarvam AI Speech to Text API Overview

Pricing: Freemium
Key strengths: Supports 22 Indian languages with accurate code-mixing capabilities · Speaker diarization for meeting transcriptions and interview analysis · Handles multiple audio formats and robust background noise performance

About Sarvam AI Speech to Text API

Sarvam AI's Speech to Text API transcribes speech in 22 Indian languages, including Hindi, Bengali, Tamil, Telugu, Gujarati, Kannada, Malayalam, Marathi, Punjabi, Odia, and Indian-accented English. Built on the Saarika v2 model, it is designed to stay accurate on difficult audio with background noise, cross-talk, and poor connections. Speaker diarization automatically identifies and labels the different speakers in a recording, which is useful for meeting transcription, interviews, and call center analytics. The API also handles code-mixing, that is, mid-sentence switches between Hindi, English, and regional languages, which are common in everyday Indian speech.

The platform accepts MP3, WAV, AAC, OGG, Opus, FLAC, M4A, AMR, WMA, and WebM audio. Three API options cover different workflows: a REST API for files under 30 seconds, a Batch API for audio up to 1 hour with full diarization and timestamps, and a Streaming API for real-time transcription over WebSocket. The API is aimed at developers and enterprises building multilingual speech applications for Indian language markets.
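
The three API options and the format list described above suggest a simple pre-flight check before submitting audio. The sketch below is illustrative only: the function names, the returned labels ("rest", "batch", "streaming"), and the file-extension spellings are assumptions made for this example, not part of Sarvam AI's SDK, and the limits are taken from this listing rather than verified against the official documentation.

```python
from pathlib import Path

# Limits and formats as stated in this listing (assumed, not verified
# against Sarvam AI's own documentation).
REST_MAX_SECONDS = 30        # REST API: files under 30 seconds
BATCH_MAX_SECONDS = 60 * 60  # Batch API: up to 1 hour
SUPPORTED_EXTENSIONS = {
    ".mp3", ".wav", ".aac", ".ogg", ".opus",
    ".flac", ".m4a", ".amr", ".wma", ".webm",
}


def is_supported_format(filename: str) -> bool:
    """Return True if the file extension matches a listed audio format."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


def choose_api(duration_seconds: float, realtime: bool = False) -> str:
    """Pick a hypothetical label for the API option that fits the job."""
    if realtime:
        # Live audio goes to the Streaming API (WebSocket).
        return "streaming"
    if duration_seconds < 0:
        raise ValueError("duration must be non-negative")
    if duration_seconds < REST_MAX_SECONDS:
        return "rest"
    if duration_seconds <= BATCH_MAX_SECONDS:
        return "batch"
    # The listing names no option for recordings longer than 1 hour.
    raise ValueError("audio longer than 1 hour is outside the listed limits")
```

With these assumptions, `choose_api(12.5)` returns `"rest"`, a 45-minute recording (`choose_api(2700)`) returns `"batch"`, and `is_supported_format("call.MP3")` returns `True`.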

Pros

👍 Supports 22 Indian languages with accurate code-mixing capabilities
👍 Speaker diarization for meeting transcriptions and interview analysis
👍 Handles multiple audio formats and robust background noise performance
👍 Real-time and batch processing options with flexible API endpoints

Cons

👎 REST API limited to files under 30 seconds in duration
👎 Primarily optimized for Indian language accents and contexts
👎 Batch API processing speeds not specified in documentation