Mistral releases a new open-source model for speech generation | TechCrunch
Briefly

""Our customers have been asking for a speech model. So we built a small-sized speech model that can fit on a smartwatch, a smartphone, a laptop, or other edge devices. The cost of it is a fraction of anything else on the market, but it offers state-of-the-art performance," Pierre Stock, vp of science operations at Mistral AI, told TechCrunch during a phone interview."
"The model can adapt a custom voice with a sample of less than five seconds, capturing characteristics like subtle accents, inflections, intonations, and irregularities in the flow of speech."
"The model has a time-to-first-audio (TTFA) of 90ms for a 10-second sample of 500 characters and a real-time factor (RTF) of 6x, allowing it to render a 10-second clip in roughly 1.6 seconds."
Mistral introduced Voxtral TTS, an open-source text-to-speech model designed for voice AI assistants and enterprise applications like customer support. The model supports nine languages and can adapt a custom voice from a sample of less than five seconds. It has a time-to-first-audio of 90ms and a real-time factor of 6x, which lets it render a 10-second clip in roughly 1.6 seconds. Mistral says the model is small enough to run on edge devices such as smartwatches, smartphones, and laptops, and positions it as part of a broader suite of voice products for enterprise customer engagement and sales.
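The latency figures quoted above can be checked with a little arithmetic. The sketch below is illustrative only: it assumes that "a real-time factor of 6x" means audio is synthesized six times faster than it plays back, and the function and variable names are hypothetical, not part of any Mistral API.

```python
def render_seconds(audio_seconds: float, speedup: float) -> float:
    """Wall-clock time needed to synthesize a clip.

    Assumption: `speedup` is "times faster than real-time playback",
    which is how the quoted 6x figure is read here.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return audio_seconds / speedup


def chars_per_audio_second(num_chars: int, audio_seconds: float) -> float:
    """Average number of input characters spoken per second of audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return num_chars / audio_seconds


# Figures taken from the quoted excerpt: a 10-second, 500-character sample at 6x.
full_clip = render_seconds(audio_seconds=10.0, speedup=6.0)
density = chars_per_audio_second(num_chars=500, audio_seconds=10.0)

print(f"full clip rendered in about {full_clip:.2f} s")
print(f"text density: {density:.0f} characters per second of audio")
```

Under that reading, a 10-second clip takes 10 / 6 seconds to render, about 1.67 s, in line with the "roughly 1.6 seconds" quoted. The 90ms time-to-first-audio is a separate measurement: it describes when playback can begin, not when the whole clip is finished.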
Read at TechCrunch