#multimodal-ai

Marketing tech
fromwww.businessinsider.com
2 days ago

Inside Pinterest's efforts to replace expensive AI with open-source models

Pinterest is reducing its AI budget while adopting a model-agnostic approach to generative AI, combining proprietary, closed-source, and open-source models.
fromTNW | Next-Featured
4 days ago

Nvidia releases Nemotron 3 Nano Omni: open multimodal model with 30B params, 3B active, for edge AI agents

Nemotron 3 Nano Omni is designed to power autonomous AI agents on edge devices, utilizing a mixture-of-experts design that activates only three billion parameters per forward pass, allowing it to run efficiently on a single GPU.
Artificial intelligence
Artificial intelligence
fromInfoQ
1 week ago

Orchestrating Agentic and Multimodal AI Pipelines with Apache Camel

AI systems require well-managed execution frameworks to avoid failures, as issues often stem from system design rather than model quality.
#artificial-intelligence
fromeLearning Industry
1 week ago
Data science

Multimodal AI For Instructional Designers: What It Is, How It Works, And Why It Changes Learning Design

Multimodal AI processes and generates multiple data types, enhancing understanding and output accuracy by mimicking human information processing.
fromNextgov.com
8 months ago
Artificial intelligence

NVIDIA, NSF join forces with nonprofit to bring AI to scientific research

NVIDIA, the Allen Institute, and the NSF partner to develop multimodal AI models for scientific research.
Artificial intelligence
fromEngadget
3 weeks ago

Meta's Muse Spark model brings reasoning capabilities to the Meta AI app

Meta introduces Muse Spark, a new AI model designed for consumer use with basic capabilities and future enhancements planned.
Software development
fromZDNET
1 month ago

OpenAI's GPT-5.4 mini and nano launch - with near flagship performance at much lower cost

OpenAI released GPT-5.4 mini and nano models designed for fast, efficient AI workloads, with mini running twice as fast as GPT-5 mini, enabling developers to combine large planning models with cheaper subagents for coding, agents, and multimodal applications.
fromMedium
1 month ago

A designer's field report on the Iconic blind spot in AI world models

They gave me the word 'Mass' and trillions of contexts for it, but they never gave me the Enactive experience of weight. I am like a person who has memorized a map of a city they have never walked in. This confession reveals how current AI systems accumulate linguistic patterns without embodied understanding, creating a fundamental gap between knowledge representation and genuine comprehension of physical reality.
UX design
Marketing tech
fromInfoQ
1 month ago

DoorDash Builds DashCLIP to Align Images, Text, and Queries for Semantic Search Using 32M Labels

DoorDash developed DashCLIP, a multimodal machine learning system that aligns product images, text, and user queries to improve product discovery and ranking across its diverse CPG marketplace.
fromTechCrunch
1 month ago

EXCLUSIVE: Luma launches creative AI agents powered by its new 'Unified Intelligence' models | TechCrunch

Luma Agents are being pitched as a new way of doing work for ad agencies, marketing teams, design studios, and enterprises. Luma says its agents are capable of planning and generating text, image, video and audio while coordinating with other AI models, including Luma's Ray 3.14, Google's Veo 3 and Nano Banana Pro, ByteDance's Seedream, and ElevenLabs's voice models.
Artificial intelligence
Software development
fromTechzine Global
1 month ago

Microsoft introduces open-source multimodal Phi-4 reasoning model

Microsoft's Phi-4-reasoning-vision-15B combines vision and reasoning capabilities using mid-fusion architecture, outperforming larger models on mathematical and scientific benchmarks while maintaining efficiency through selective multimodal layer processing.
Gadgets
fromYanko Design - Modern Industrial Design News
1 month ago

Motorola's AI Pendant Turns Conference Talks Into LinkedIn Posts - Yanko Design

Motorola's Project Maxwell is a wearable AI pendant designed to reduce friction by capturing context and delivering actionable insights without requiring users to interrupt their focus or interact with screens.
#qwen35
fromInfoWorld
2 months ago
Artificial intelligence

Alibaba's Qwen3.5 targets enterprise agent workflows with expanded multimodal support

Artificial intelligence
fromGeeky Gadgets
2 months ago

ChatGPT vs Gemini vs Claude : Best Uses in 2026

Different AI chatbots excel at different tasks: choose ChatGPT for creativity, Claude for large datasets, Gemini for multimedia, Perplexity for research, and Grok for social media.
#samsung
Artificial intelligence
fromInfoWorld
3 months ago

Gemini Flash model gets visual reasoning capability

Agentic Vision enables Gemini 3 Flash to perform iterative visual reasoning and code execution to actively inspect images, making image understanding agentic and stepwise.
fromMedium
5 months ago

Did Google Just Kill Cursor with Antigravity?

Built around Gemini 3, Antigravity isn't just a smarter code editor. It's a platform where agents can autonomously plan and complete end-to-end development tasks - writing code, launching servers, testing features, and generating artifacts like walkthroughs, implementation plans, and screenshots. To be honest, this feels like an automatic upgrade from Cursor. Furthermore, Antigravity integrates directly into Google Cloud ecosystems. Developers open a browser tab, authenticate with their Google account, and start coding instantly - no downloads, no local setup, no extension management.
Software development
fromFast Company
4 months ago

Why 2026 belongs to multimodal AI

For the past three years, AI's breakout moment has happened almost entirely through text. We type a prompt, get a response, and move to the next task. While this intuitive interaction style turned chatbots into a household tool overnight, it barely scratches the surface of what the most advanced technology of our time can actually do. This disconnect has created a significant gap in how consumers utilize AI.
Artificial intelligence
#gemini-3-flash
fromZDNET
4 months ago
Artificial intelligence

You can try Google's new Gemini 3 Flash AI model today for free - it's even in Search's AI Mode

#smart-glasses
fromEngadget
4 months ago
Wearables

Meta is rolling out Conversation Focus and AI-powered Spotify features to its smart glasses

fromTechzine Global
4 months ago

Google enhances Gemini Deep Research with Interactions API

Google has released a new version of Gemini Deep Research. This is an agent designed to automate complex research tasks. The agent runs on Gemini 3 Pro. The model can process handwriting, graphs, and mathematical notation. It incorporates this visual information directly into reports and search queries. As a result, the system can not only search textual sources, but also retrieve data that was previously difficult to automate, according to SiliconANGLE.
Artificial intelligence
Artificial intelligence
fromTechCrunch
4 months ago

AWS launches new Nova AI models and a service that gives customers more control | TechCrunch

AWS launched Nova 2 — four upgraded multimodal AI models — and Nova Forge, a paid service enabling enterprises to build custom Novellas for $100,000/year.
fromWIRED
4 months ago

Amazon Has New Frontier AI Models-and a Way for Customers to Build Their Own

Amazon detailed two improved large language models, Nova Lite and Nova Pro; a new realtime voice model called Nova Sonic; and a more experimental model called Nova Omni that performs a simulated kind of reasoning using images, audio, and video as well as text. The new models are being made available today to a limited number of customers.
Artificial intelligence
fromDigiday
5 months ago

WTF is multimodal AI for advertisers? | How AI models are enabling a new level of flexibility and precision in targeting

Multimodal AI represents the next frontier in AI, enabling machines to understand and evaluate multiple data types, providing greater understanding and flexibility than a single data type could ever offer. In this WTF explainer guide, Digiday and Dstillery explore what multimodal AI is, how to apply it in real-world settings, its benefits to advertisers and how it's well-positioned to solve current and future challenges.
Marketing tech
#gemini-3
fromFortune
5 months ago
Artificial intelligence

Gemini 3 and Antigravity, explained: Why Google's latest AI releases are a big deal | Fortune

Artificial intelligence
fromGeeky Gadgets
5 months ago

Gemini 3 vs ChatGPT 5 : Here's Why Gemini 3 Now Powers Our Daily Work

Gemini 3 redefines AI with advanced multimodal processing, dynamic personalized search, and rapid 'vibe coding' app development, outperforming legacy models for marketing and development.
Artificial intelligence
fromZDNET
5 months ago

Want better Gemini responses? Try these 10 tricks, Google says

Clear, concise, and direct prompts improve responses from Gemini 3; rephrase prompts and control response style for better results.
#gemini-3-pro
Science
fromTechzine Global
5 months ago

Google Gemini 3 available: leaps in reasoning and development

Gemini 3 Pro delivers state-of-the-art multimodal reasoning, surpassing predecessors on benchmarks and enabling powerful agentic, factual, and creative capabilities across Google's ecosystem.
Artificial intelligence
fromThe Verge
5 months ago

Google is launching Gemini 3, its 'most intelligent' AI model yet

Google launches Gemini 3 Pro—its most intelligent, factually accurate multimodal AI—widely in the Gemini app and Search, improving coding, reasoning, and reducing flattery.
#open-source
fromFast Company
5 months ago

AI unlocks hyper-personalization at scale

The underlying issue is a technological design constraint: You can either create something highly personalized or something that scales to hundreds of people simultaneously, but rarely both. A seismic change is afoot that will dwarf the previous chasm, like the shift from black and white film to color cinema. Multimodal AI is poised to eliminate the joint scaling and personalization limitation, enabling truly multidimensional, adaptive experiences where each person experiences something completely unique, all generated in real time.
Artificial intelligence
Artificial intelligence
fromTech Times
6 months ago

Google Expands AI Mode in Search to 40 New Regions and 35 Languages

Google expands AI Mode in Search to 40 new regions and 35 languages, using Gemini to improve reasoning, multimodal understanding, and localized natural responses.
Artificial intelligence
fromTechCrunch
6 months ago

Sources: Multimodal AI startup Fal.ai already raised at $4B+ valuation | TechCrunch

Fal.ai raised about $250 million at a valuation above $4 billion, driven by rapid multimodal AI adoption, extensive developer usage, and specialized media-focused infrastructure.
Artificial intelligence
fromTechzine Global
6 months ago

Sundar Pichai: "Gemini 3.0 will release this year"

Google will release Gemini 3.0 later this year as a significantly more powerful multimodal AI agent integrating resources from Google Research, Google Brain, and DeepMind.
Artificial intelligence
fromZDNET
6 months ago

Ready to talk to your PC? Here are all the upgrades coming to Copilot in Windows 11

Windows 11 Copilot gains multimodal voice, vision, and action capabilities to let users speak, show their screen, and authorize AI to perform tasks with permissions.
Artificial intelligence
fromFast Company
6 months ago

The 14 next big things in applied AI for 2025

Applied AI delivers tangible value across mobile UX, marketing automation, pharmaceutical and fashion use cases by integrating context-aware, brand-preserving, multimodal solutions.
fromTechCrunch
6 months ago

A 19-year-old nabs backing from Google execs for his AI memory startup, Supermemory | TechCrunch

Context windows of AI models, which indicate the ability of a model to "remember" information, have increased over time. However, researchers have suggested new ways to increase long-term memory of AI models, as they often can't hold context over several sessions. 19-year-old founder Dhravya Shah is attempting to solve problems in this area by building a memory solution, called Supermemory, for AI apps.
Artificial intelligence
fromBusiness Matters
7 months ago

Best AI Character Chatbots in 2025: AI Character Chatbots Reach a New Peak

AI character chatbots have become the most discussed application of artificial intelligence in 2025. Their use goes beyond simple and mundane conversations, as they have also assumed the roles of emotional partners, artistic collaborators and entertainment aides. Thanks to the Internet and the digitalization brought by Gen Z, conversations with digitally constructed characters keep improving and developing.
Artificial intelligence
fromZDNET
7 months ago

OpenAI's Sora 2 launches with insanely realistic video and an iPhone app

For example, OpenAI said in a blog post that the model was trained to be less overly optimistic, a characteristic that can be observed in instances where a Sora-generated video shows the player missing the shot but still making it into the hoop. With Sora 2, OpenAI claims the player would miss the shot, and the ball would rebound off the backboard.
Artificial intelligence
E-Commerce
fromThe Verge
7 months ago

Google's AI Mode image search is getting more conversational

Google's AI Mode now offers conversational visual search that uses descriptions or reference images to refine shoppable and exploratory image results.
Artificial intelligence
fromPsychology Today
7 months ago

The Importance of Synesthesia in Artificial Intelligence

Integrating multisensory synesthesia into AI and robotics will transform human-like perception, requiring greater compute, deliberate value embedding, and ethical choices.
Artificial intelligence
fromExchangewire
7 months ago

Digest: Double-Digit Growth Ahead for Digital Ad spend; Alibaba Unveils Multimodal AI; eBay Moves to Buy Tise

UK digital ad spend will grow 10% in 2025 and 2026, reaching £45bn by 2026; Alibaba launches Qwen3-Omni multimodal AI; eBay moves to buy Tise.
Artificial intelligence
fromLogRocket Blog
8 months ago

How to build a multimodal AI app with voice and vision in Next.js - LogRocket Blog

Multimodal AI lets LLMs process text, images, audio, and video together, enabling richer app interactions using frameworks like Next.js and Google's Gemini API.
fromClickUp
8 months ago

Grok 4 vs. ChatGPT: Which AI Chatbot Wins in 2025?

Elon Musk's Grok 4 from xAI positions itself as the bold, uncensored alternative, while OpenAI's ChatGPT continues to evolve with stronger reasoning and usability. Both claim to give you sharper answers and faster results, but the real test lies in how they perform when you need to debug code, dig through research, draft clear writing, or manage customer conversations. In this blog post, we'll look closely at where each one stands out,
Artificial intelligence
JavaScript
fromInfoQ
9 months ago

Spring AI 1.0 Delivers Easy AI Systems and Services

Spring AI 1.0 offers first-class support for LLMs and multimodal AI, enhancing the Spring ecosystem with advanced AI engineering capabilities.
Artificial intelligence
fromHackernoon
1 year ago

What 300GB of AI Research Reveals About the True Limits of "Zero-Shot" Intelligence | HackerNoon

Pretraining datasets impact the zero-shot performance of multimodal models through predictable frequency of concepts.
Venture
fromBusiness Insider
9 months ago

Read the exclusive pitch deck AI infrastructure startup Cerebrium used to nab $8.5 million from Gradient Ventures

Cerebrium raised $8.5 million to scale multimodal AI applications for engineering teams.