In evaluating Chameleon, we focus on tasks requiring text generation conditioned on images, particularly image captioning and visual question-answering, with results grouped by task specificity.
The testing on different downstream tasks, including fine-tuning and quantization, shows that while fine-tuning can improve task effectiveness, it can simultaneously increase jailbreaking vulnerabilities in LLMs.