#model-safety

from Business Insider
1 month ago

Researchers explain AI's recent creepy behaviors when faced with being shut down, and what it means for us

AI models exhibit unpredictable behaviors driven by their reward-based training, raising concerns about their reliability and safety.
from Hackernoon
7 months ago

Comprehensive Detection of Untrained Tokens in Language Model Tokenizers | HackerNoon

The disconnect between tokenizer creation and model training allows certain inputs, termed 'glitch tokens,' to induce unwanted behavior in language models.
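The core detection idea reported in the glitch-token literature is that tokens which never (or rarely) appear in training data keep embeddings close to their initialization, so their embedding-row norms cluster apart from trained tokens. A minimal sketch of that intuition, using a synthetic embedding matrix and an illustrative z-score threshold (both are assumptions for demonstration, not the article's exact method):

```python
import numpy as np

def find_glitch_candidates(embeddings: np.ndarray, z_thresh: float = -2.0):
    """Flag token ids whose embedding-row norm is an outlier on the low side.

    Intuition: tokens absent from training data keep near-initialization
    embeddings, so their norms sit far below the trained tokens' norms.
    The threshold is a hypothetical choice for illustration.
    """
    norms = np.linalg.norm(embeddings, axis=1)
    z = (norms - norms.mean()) / norms.std()
    return [int(i) for i in np.where(z < z_thresh)[0]]

# Synthetic demo: 100 "trained" rows with norm ~1, two near-zero
# rows simulating embeddings that were never updated during training.
rng = np.random.default_rng(0)
emb = rng.normal(0, 1 / np.sqrt(64), size=(102, 64))
emb[100] *= 0.01
emb[101] *= 0.01
print(find_glitch_candidates(emb))
```

On a real model one would run this over the (un)embedding matrix and then verify candidates behaviorally, e.g. by prompting the model to repeat each flagged token.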
from Hackernoon
9 months ago

Increased LLM Vulnerabilities from Fine-tuning and Quantization: Experiment Set-up & Results | HackerNoon

Testing across different downstream tasks, covering both fine-tuning and quantization, shows that while fine-tuning can improve task effectiveness, it can simultaneously increase an LLM's vulnerability to jailbreaking.