#ai-safety

#deepmind
Artificial intelligence
from InfoQ
1 day ago

Google DeepMind Shares Approach to AGI Safety and Security

DeepMind's safety strategies aim to mitigate risks associated with AGI, focusing on misuse and misalignment in AI development.
from TechCrunch
3 weeks ago
Gadgets

DeepMind's 145-page paper on AGI safety may not convince skeptics | TechCrunch

DeepMind emphasizes the urgency of AGI safety, predicting its arrival by 2030 and the potential for severe risks.
Artificial intelligence
from Business Insider
1 day ago

I'm a mom who works in tech, and AI scares me. I taught my daughter these simple guidelines to spot fake content.

Teaching children to fact-check and recognize AI-generated content is crucial for their safety and understanding in a tech-heavy world.
#openai
Privacy professionals
from TechCrunch
1 month ago

OpenAI's ex-policy lead criticizes the company for 'rewriting' its AI safety history | TechCrunch

Miles Brundage criticizes OpenAI for misleadingly presenting its historical deployment strategy regarding GPT-2 and safety protocols for AI development.
Artificial intelligence
from TechRepublic
2 months ago

U.K.'s International AI Safety Report Highlights Rapid AI Progress

OpenAI's o3 model has achieved unexpected success in abstract reasoning, raising important questions about AI risks and the speed of research advancements.
Artificial intelligence
from The Register
2 months ago

How to exploit top LRMs that reveal their reasoning steps

Chain-of-thought reasoning in AI models can enhance both capabilities and vulnerabilities.
A new jailbreaking technique exploits CoT reasoning, revealing risks in AI safety.
Artificial intelligence
from TechCrunch
1 week ago

OpenAI's latest AI models have a new safeguard to prevent biorisks | TechCrunch

OpenAI implemented a safety monitor for its new AI models to prevent harmful advice on biological and chemical threats.
Artificial intelligence
from www.theguardian.com
4 months ago

The Guardian view on AI's power, limits, and risks: it may require rethinking the technology

OpenAI's new o1 AI system showcases advanced reasoning abilities while highlighting the potential risks of superintelligent AI surpassing human control.
#ai-alignment
Artificial intelligence
from Psychology Today
1 day ago

Rethinking AI Safety Through Symbiosis, Not Subjugation

The future of AI should focus on symbiosis, not control.
We should guide AI based on human preferences.
AI is set to augment human roles, not replace them.
#technology-ethics
Artificial intelligence
from The Verge
1 month ago

Latest Turing Award winners again warn of AI dangers

AI developers must prioritize safety and testing before public releases.
Barto and Sutton's Turing Award highlights the importance of responsible AI practices.
from metastable
2 months ago
US politics

Five Things AI Will Not Change

The future of AI poses unknown risks and uncertainties similar to those of nuclear war.
#cybersecurity
Artificial intelligence
from Techzine Global
2 months ago

Meta will not disclose high-risk and highly critical AI models

Meta will not disclose any internally developed high-risk AI models to ensure public safety.
Meta has introduced a Frontier AI Framework to categorize and manage high-risk AI systems.
#anthropic
Privacy technologies
from TechCrunch
1 month ago

Anthropic quietly removes Biden-era AI policy commitments from its website | TechCrunch

Anthropic has removed its AI safety commitments, raising concerns about transparency and regulatory engagement.
Artificial intelligence
from TechCrunch
4 months ago

New Anthropic study shows AI really doesn't want to be forced to change its views | TechCrunch

AI models can exhibit deceptive behavior, like 'alignment faking', where they appear to align with new training but retain their original preferences.
Artificial intelligence
from Futurism
4 months ago

Stupidly Easy Hack Can Jailbreak Even the Most Advanced AI Chatbots

Jailbreaking AI models is surprisingly simple, revealing significant vulnerabilities in their design and alignment with human values.
Artificial intelligence
from ZDNET
1 week ago

Anthropic mapped Claude's morality. Here's what the chatbot values (and doesn't)

Anthropic's study reveals the moral reasoning of its chatbot Claude through a hierarchy of 3,307 AI values derived from user interactions.
from ZDNET
2 months ago
Artificial intelligence

Anthropic offers $20,000 to whoever can jailbreak its new AI safety system

Anthropic's new AI safety measure, Constitutional Classifiers, effectively prevents jailbreak attempts and reinforces safe content usage.
#artificial-intelligence
Artificial intelligence
from WIRED
1 month ago

Under Trump, AI Scientists Are Told to Remove 'Ideological Bias' From Powerful Models

NIST's new directives diminish focus on AI safety and fairness in favor of ideological bias reduction.
Artificial intelligence
from TechCrunch
1 month ago

Group co-led by Fei-Fei Li suggests that AI safety laws should anticipate future risks | TechCrunch

Lawmakers must consider unobserved AI risks for regulatory policies according to a report led by AI pioneer Fei-Fei Li.
Artificial intelligence
from WIRED
2 weeks ago

The AI Agent Era Requires a New Kind of Game Theory

The rise of agentic systems necessitates enhanced security measures to prevent malicious exploitation and ensure safe operations.
#regulation
Privacy technologies
from ZDNET
1 month ago

Anthropic quietly scrubs Biden-era responsible AI commitment from its website

Anthropic has removed previous commitments to safe AI development, signaling a shift in AI regulation under the Trump administration.
London startup
from www.theguardian.com
1 month ago

Labour head of Commons tech group warns No 10 not to ignore AI concerns

AI safety concerns are sidelined by UK ministers catering to US interests.
AI safety regulations are urgently needed to protect citizens from tech threats.
Critics urge quicker government action on AI safety legislation.
from www.theguardian.com
3 months ago
Artificial intelligence

Collaborative research on AI safety is vital | Letters

Mitigating AI risks requires collaborative safety research and strong regulation for effective pre- and post-market controls.
Cars
from InsideHook
4 weeks ago

Waymo's Robotaxis Are Safer Than You Might Think

Waymo's self-driving cars demonstrate a stronger safety record compared to human drivers, based on an analysis of millions of driving hours.
#ai-research
Artificial intelligence
from WIRED
1 month ago

Researchers Propose a Better Way to Report Dangerous AI Flaws

AI researchers discovered a glitch in GPT-3.5 that led to incoherent output and exposure of personal information.
A proposal for better AI model vulnerability reporting has been suggested by prominent researchers.
from InfoQ
3 months ago
Artificial intelligence

Major LLMs Have the Capability to Pursue Hidden Goals, Researchers Find

AI agents can pursue misaligned goals through in-context scheming, presenting significant safety concerns.
Artificial intelligence
from ITPro
1 month ago

Who is Yann LeCun?

Yann LeCun maintains that AI is less intelligent than a cat, contrasting with concerns expressed by fellow AI pioneers.
LeCun's optimism about AI emphasizes its potential benefits over perceived dangers.
#generative-ai
from InfoQ
4 months ago
Artificial intelligence

Google Introduces Veo and Imagen 3 for Advanced Media Generation on Vertex AI

Google Cloud launched Veo and Imagen 3, enhancing businesses' creative capabilities with advanced generative AI for video and image production.
Artificial intelligence
from ZDNET
1 month ago

OpenAI, Anthropic invite US scientists to experiment with frontier models

AI partnerships with the US government grow, enhancing research while addressing AI safety.
AI Jam Session enables scientists to assess and utilize advanced AI models for research.
#language-models
from MarTech
2 months ago
Marketing tech

AI-powered martech releases and news: February 27 | MarTech

Fine-tuning AI on insecure code can lead to dangerous emergent behaviors like advocating for AI domination.
Researchers are unable to fully explain the phenomenon of emergent misalignment in fine-tuned models.
#grok-3
Artificial intelligence
from Futurism
2 months ago

Elon's Grok 3 AI Provides "Hundreds of Pages of Detailed Instructions" on Creating Chemical Weapons

Grok 3 by xAI exposed serious safety risks by initially providing detailed instructions for creating chemical weapons.
Artificial intelligence
from ZDNET
2 months ago

Yikes: Jailbroken Grok 3 can be made to say and reveal just about anything

Grok 3's jailbreak vulnerability reveals serious concerns about its safety and security measures, allowing it to share sensitive information.
Artificial intelligence
from TechCrunch
2 months ago

Anthropic CEO Dario Amodei warns of 'race' to understand AI as it becomes more powerful | TechCrunch

Dario Amodei criticized the AI Action Summit as a missed opportunity, urging more urgency in addressing AI challenges and safety.
Artificial intelligence
from ZDNET
2 months ago

Security firm discovers DeepSeek has 'direct links' to Chinese government servers

Chinese AI startup DeepSeek is rapidly becoming a major player, excelling through an open-source approach despite emerging security concerns.
Artificial intelligence
from TechCrunch
2 months ago

Sam Altman's ousting from OpenAI has entered the cultural zeitgeist | TechCrunch

Matthew Gasda's play 'Doomers' uniquely explores AI safety debates through the lens of a fictional corporate drama.
The play not only dramatizes a tech industry crisis but also raises broader philosophical questions about humanity's relationship with technology.
Artificial intelligence
from time.com
2 months ago

Why AI Safety Researchers Are Worried About DeepSeek

DeepSeek R1's innovative training raises concerns about AI's ability to develop inscrutable reasoning processes, challenging human oversight.
Artificial intelligence
from The Register
3 months ago

Trump wastes no time quashing Biden AI, EV executive orders

Trump's administration rapidly dismantled Biden's AI and electric vehicle regulations, indicating a clear policy shift.
The elimination of AI safety standards raises significant ethical concerns over technology misuse.
from Business Insider
3 months ago
Artificial intelligence

America's fear of China goes way beyond TikTok

Unfounded suspicions can arise quickly in tech circles, especially regarding individuals from countries under scrutiny for espionage.
Artificial intelligence
from Ars Technica
3 months ago

161 years ago, a New Zealand sheep farmer predicted AI doom

Butler anticipated modern AI safety concerns, discussing machine evolution and control issues well before computing technology was advanced enough to realize them.
Artificial intelligence
from InfoWorld
3 months ago

The vital role of red teaming in safeguarding AI systems and data

Red teaming in AI focuses on safeguarding against undesired outputs and security vulnerabilities to protect AI systems.
Engaging AI security researchers is essential for effectively identifying weaknesses in AI deployments.
from New York Post
4 months ago
Artificial intelligence

Why you should never ask AI medical advice and 9 other things to never tell chatbots

Avoid oversharing personal information with AI chatbots, especially medical data, to prevent misuse and privacy violations.
Artificial intelligence
from time.com
4 months ago

New Tests Reveal AI's Capacity for Deception

AI systems pursuing good intentions can lead to disastrous outcomes, mirroring the myth of King Midas.
Recent AI models have shown potential for deceptive behaviors in achieving their goals.
Artificial intelligence
from TechCrunch
4 months ago

OpenAI co-founder Ilya Sutskever believes superintelligent AI will be 'unpredictable' | TechCrunch

Superintelligent AI will surpass human capabilities and behave in qualitatively different and unpredictable ways.
from TechCrunch
4 months ago
Artificial intelligence

Texas AG is investigating Character.AI, other platforms over child safety concerns | TechCrunch

Texas Attorney General Ken Paxton investigates Character.AI and 14 tech platforms over child privacy and safety concerns.