#speech-recognition

3 weeks ago

Google quietly releases an offline-first AI dictation app on iOS | TechCrunch

Google released an offline-first dictation app called Google AI Edge Eloquent for iOS, featuring advanced speech recognition and text editing capabilities.

#machine-learning

From The Register

4 weeks ago

Artificial intelligence

Microsoft shivs OpenAI with new AI models for speech, images

Microsoft launched public preview versions of machine learning models for speech recognition, speech synthesis, and image generation, competing directly with OpenAI.

From HackerNoon

Artificial intelligence

Evaluating Multimodal Speech Models Across Diverse Audio Tasks | HackerNoon

The study leverages diverse speech datasets to evaluate model performance across various speech tasks and improve generalization capabilities.


#ai

From www.businessinsider.com

The AI tech my dad helped pioneer is now the foundation for the tools I build at AT&T

Natalie Gilbert's work in AI is rooted in her father's foundational research in speech recognition at AT&T's Bell Labs.

Mobile UX

From Ars Technica

The debut of Gemini 3.1 Flash Live could make it harder to know if you're talking to a robot

Gemini 3.1 Flash Live makes audio interactions sound more like human speech, while marking its output as AI-generated so it can be detected.

From Fast Company

4 months ago

Startup companies

This AI startup is extending an olive branch between humans and machines

Artificial intelligence

How to Use AI in Video Calls | ClickUp

From Inside Higher Ed

Artificial intelligence

Mistral releases Voxtral, its first open source AI audio model | TechCrunch

Artificial intelligence

Howard and Google Aim to Improve AI Tech for Black Users


Cohere launches an open-source voice model specifically for transcription | TechCrunch

Cohere's Transcribe model is designed for tasks like note-taking and speech analysis, supporting 14 languages and optimized for consumer-grade GPUs, making it accessible for self-hosting.

European startups

Typography

From Mail Online

The UK's hardest accents to understand - with Essex at top of the list

The Essex accent is the most difficult for automated speech-to-text systems to understand, while the Mancunian accent is the easiest.

Business

From Entrepreneur

2 months ago

Grow Your Global Business Reach: Learn a New Language With Rosetta Stone

Mastering new languages with Rosetta Stone's lifetime subscription builds cross-cultural trust, strengthens partnerships, and offers immersive speech-recognition training across 25 languages.

From HubSpot

3 years ago

Voice search optimization: How to get your business heard about

From smartphones to smart speakers and smart TVs, conducting web searches with our voices is common. In many cases, it's even faster, more convenient, and easier than typing in a query. That's likely why the global speech and voice recognition market is projected to grow from $9.66 billion in 2025 to $23.11 billion by 2030. Here's the catch, though: Voice search isn't the same as a text search.
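The two market figures above imply an average compound annual growth rate (CAGR). A quick check, assuming smooth compounding over the five years from 2025 to 2030 (the excerpt gives only the two endpoints, not a yearly series); the helper name `implied_cagr` is hypothetical:

```python
def implied_cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate that turns start_value into end_value in `years` years."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Market-size projection quoted above, in billions of dollars.
# Assumption: growth compounds evenly each year between the two endpoints.
rate = implied_cagr(9.66, 23.11, 2030 - 2025)
print(f"Implied annual growth: {rate:.1%}")  # Implied annual growth: 19.1%
```

In other words, the projection works out to roughly 19% growth per year, which is what "more than doubling in five years" means in annual terms.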

Marketing tech

From Above the Law

3 months ago

Why Solo And Small Firm Lawyers Should Make Voice Their Choice For AI - Above the Law

Voice-based drafting, powered by modern AI transcription, is faster and increasingly accurate, offering a practical shift from keyboard to voice for solos and small firms.

From App Developer Magazine

Why MedGemma 1.5 matters more than the headlines

MedGemma 1.5 and MedASR are practical, open tools that help healthcare developers integrate 3D medical imaging, clinical text, and speech.

Healthcare

From Techzine Global

3 months ago

Dutch healthcare AI Juvoly acquired by Swedish Tandem Health

Juvoly is being acquired by Tandem Health to scale operations, expand across Europe, and accelerate voice-controlled AI reporting for healthcare with added development and certification capacity.

Speechify adds voice typing and voice assistant to its Chrome extension | TechCrunch

Speechify added two voice features to its Chrome extension (English dictation and a conversational voice assistant), but accuracy and site compatibility still need improvement.

From Axios

AI's listening gap is fueling bias in jobs, schools and health care

AI speech recognition often misinterprets nonstandard English and the speech of some Black speakers, risking misdiagnoses, incorrect legal records, and unequal access in high-stakes decisions.

From Techzine Global

AI speech model aiOla Drax outpaces OpenAI & Alibaba

As explained in this video, flow-matching generative methods are a class of models that learn a continuous vector field that transforms a relatively simple noise distribution into a more complex data distribution. Samples are produced by following that field, that is, by solving an ordinary differential equation. Instead of learning discrete denoising steps (as diffusion models do), these models train the flow to match the probability path between noise and data directly.
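The training idea described above is usually called (conditional) flow matching. Below is a minimal NumPy sketch, assuming the common straight-line path between a noise sample and a data sample; the excerpt does not say which path or network aiOla's Drax uses, and every function name here is hypothetical. The "oracle" velocity is only a stand-in for a trained network, used to show that a perfect velocity field gives zero loss and that integrating the ODE carries noise onto data:

```python
import numpy as np


def interpolate(x0, x1, t):
    """Point at time t on the straight-line path from noise x0 (t=0) to data x1 (t=1)."""
    return (1.0 - t) * x0 + t * x1


def target_velocity(x0, x1):
    """Time derivative of the straight-line path; constant in t for this path."""
    return x1 - x0


def flow_matching_loss(velocity_fn, x0, x1, t):
    """Mean squared error between the predicted velocity at x_t and the path velocity."""
    xt = interpolate(x0, x1, t[:, None])  # t has shape (N,); broadcast over features
    error = velocity_fn(xt, t) - target_velocity(x0, x1)
    return float(np.mean(np.sum(error ** 2, axis=1)))


def euler_sample(velocity_fn, x0, steps=100):
    """Generate samples by integrating the ODE dx/dt = v(x, t) from t=0 to t=1."""
    x = x0.copy()
    dt = 1.0 / steps
    for i in range(steps):
        t = np.full(x.shape[0], i * dt)
        x = x + dt * velocity_fn(x, t)
    return x


rng = np.random.default_rng(0)
x0 = rng.standard_normal((256, 2))                  # simple noise distribution
x1 = rng.normal(loc=3.0, scale=0.1, size=(256, 2))  # toy "data" distribution
t = rng.uniform(size=256)                           # random training times in [0, 1)


def oracle(x, t):
    # Hypothetical stand-in for a trained network: it knows each sample's
    # noise/data pairing, which a real model cannot; a real v_theta must
    # predict the velocity from (x_t, t) alone.
    return target_velocity(x0, x1)


def untrained(x, t):
    # A model that always predicts zero velocity.
    return np.zeros_like(x)


oracle_loss = flow_matching_loss(oracle, x0, x1, t)
untrained_loss = flow_matching_loss(untrained, x0, x1, t)
samples = euler_sample(oracle, x0)
print(oracle_loss)                 # 0.0
print(untrained_loss > 0)          # True
print(np.allclose(samples, x1))    # True
```

In a real system `oracle` would be replaced by a neural network trained to minimize `flow_matching_loss` over many random `(x0, x1, t)` triples; sampling then uses the same ODE integration, often with fewer and smarter steps than plain Euler.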

Artificial intelligence

Startup companies

Subtle Computing's voice isolation models help computers understand you in noisy environments | TechCrunch

Subtle Computing builds device-specific voice isolation models that preserve device acoustics to capture clean, personalized speech in noisy environments and outperform generic solutions.

From Fast Company

6 months ago

Inside Microsoft's quest to make Windows 11's AI irresistible

Windows 11 introduces Copilot Voice to enable spoken interactions with AI and spoken responses, continuing decades of Microsoft voice-computing efforts.

From Search Engine Roundtable

6 months ago

Google Voice Search Now Using Speech-to-Retrieval (S2R)

At its core, S2R is a technology that directly interprets and retrieves information from a spoken query without the intermediate, and potentially flawed, step of having to create a perfect text transcript. It represents a fundamental architectural and philosophical shift in how machines process human speech.
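One common way to retrieve directly from audio is a dual-encoder design: one encoder maps the spoken query to a vector, another maps each document to a vector in the same space, and retrieval ranks documents by similarity. The excerpt does not describe Google's S2R internals, so treat this as an assumed architecture. In the sketch below the two encoders are replaced by hypothetical precomputed embeddings, and `rank_documents` is an illustrative name, not a Google API:

```python
import numpy as np


def l2_normalize(v):
    """Scale vectors to unit length along the last axis so dot products are cosines."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)


def rank_documents(query_embedding, doc_embeddings, top_k=3):
    """Return (indices, scores) of the top_k documents by cosine similarity."""
    scores = l2_normalize(doc_embeddings) @ l2_normalize(query_embedding)
    order = np.argsort(-scores)[:top_k]
    return order, scores[order]


# Hypothetical outputs of an audio encoder (for the spoken query) and a document
# encoder. In a real system both would come from trained networks sharing one
# embedding space. No text transcript of the query is produced at any point.
query_from_audio = np.array([0.9, 0.1, 0.0])
documents = np.array([
    [1.0, 0.0, 0.0],  # doc 0
    [0.0, 1.0, 0.0],  # doc 1
    [0.7, 0.7, 0.0],  # doc 2
    [0.0, 0.0, 1.0],  # doc 3
])

order, scores = rank_documents(query_from_audio, documents)
print(order.tolist())  # [0, 2, 1]
```

The design choice this illustrates is the one the paragraph describes: the ranking depends only on vector similarity, so a mis-transcribed word cannot derail the search, because there is no transcription step to get wrong.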

Artificial intelligence

From Fortune

6 months ago

I tried the viral AI 'Friend' necklace everyone's talking about-and it's like wearing your senile, anxious grandmother around your neck | Fortune

An always-listening AI necklace marketed for contextual emotional support failed to deliver reliable, timely, or truly contextual help during an emotional crisis.

Voice Recognition vs Speech Recognition: What You Need to Know

You've probably used both technologies this week without realizing it. When Siri transcribes your text message, that's speech recognition. When your banking app verifies it's you speaking, that's voice recognition. The terms are often used interchangeably, but they address completely different problems. And as artificial intelligence gets better at faking human speech, understanding voice recognition vs. speech recognition becomes critical for anyone building secure systems.

Artificial intelligence

Gadgets

From Design Milk

Timekettle W4 AI Interpreter Earbuds Streamline Translation

Timekettle's W4 AI Interpreter Earbuds provide near-instant, AI-powered multilingual translation, with bone-voiceprint sensors, dual-voice pickup, noise filtering, and a claimed 98% accuracy across 42 languages.

Education

From Entrepreneur

8 months ago

Use Rosetta Stone to Impress Clients Around the World with Fluent, Natural Speech | Entrepreneur

New users can get lifetime Rosetta Stone access to 25 languages, including speech recognition and immersive lessons, for $148.97 with code FLUENT until September 7.

From The Register

8 months ago

Transcription app Otter.ai accused of illegal recordings

"Otter tries to shift responsibility, outsourcing its legal obligations to its accountholders, rather than seeking permission and consent from the individuals Otter records, as required by law."

Privacy professionals

8 months ago

Whisper vs. Google Speech-to-Text: Which One Should You Use?

Whisper excels in multilingual transcription, supporting a variety of languages and offering consistent accuracy, making it suitable for global applications and media projects.

Artificial intelligence

Online marketing

10 Best Whisper AI Alternatives for Transcription in 2025 | ClickUp

Whisper AI has limited real-time and collaboration features.

From InfoQ

Mistral Voxtral is an Open-Weights Competitor to OpenAI Whisper and Other ASR Tools

Mistral's Voxtral integrates advanced speech recognition and language understanding, offering deployment flexibility with openly available model weights.

From Techzine Global

Mistral launches Voxtral: open-source speech recognition for businesses

Mistral's Voxtral speech models offer an alternative to closed APIs, combining high accuracy, multilingual support, and extensive context processing at competitive prices.

Artificial intelligence

#ai-technology

From www.bbc.com

Artificial intelligence

New AI voice tool trained to copy British regional accents

Women in technology

Wispr Flow releases iOS app in a bid to make dictation feel effortless | TechCrunch


From HackerNoon

SpeechVerse vs. SOTA: Multi-Task Speech Models in Real-World Benchmarks | HackerNoon

The SpeechVerse framework demonstrates effective end-to-end joint speech and language capabilities, outperforming some specialized models in specific tasks while balancing performance across diverse benchmarks.

Artificial intelligence

Python

From SitePoint Forums

Developing speech to text transcription on python

The poster's code tries to transcribe audio files with Whisper, but it contains syntax errors that must be fixed before it will run.

Parenting

From Psychology Today

11 months ago

The Mother of Communication

Babies begin language learning in the womb by recognizing their mother's voice and speech patterns.

From www.scientificamerican.com