In one of my courses at Stanford Medical School, my classmates and I were tasked with using a secure AI model for a thought experiment. We asked it to generate a clinical diagnosis from a fictional patient case: "Diabetic retinopathy," the chatbot said. When we asked for supporting evidence, it produced a tidy list of academic citations. The problem? The authors didn't actually exist. The journals were fabricated. The AI chatbot had hallucinated.
OpenAI has been clear in its messaging that different models perform differently. But my recent testing has shown that different interaction modes, even with the same model, also perform differently. As it turns out, ChatGPT in Voice Mode (both Standard and Advanced) is considerably less accurate than the web version. The reason? It doesn't take time to think, because pausing to reason would slow down the conversation.
Hallucination is fundamental to how transformer-based language models work. In fact, it's their greatest asset: it's the mechanism by which these models draw connections between sometimes disparate concepts. But hallucination becomes a curse when language models are applied in domains where the truth matters, from questions about health care policy to code that correctly uses third-party APIs.
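To see why, it helps to remember that a language model produces text one token at a time by sampling from a probability distribution over plausible continuations; nothing in that loop checks whether the result is true. The sketch below is a deliberately simplified illustration of that sampling step, not any real model: the candidate tokens, their scores, and the invented journal name are all made up for the example.

```python
# A minimal sketch of next-token sampling: the model scores candidate
# continuations, converts the scores to probabilities, and samples one.
# Plausibility, not truth, is the only criterion.
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical candidates and scores a model might assign after the prompt
# "The diagnosis is supported by a study in the journal ..."
# The last entry is an invented journal, included only to make the point.
candidates = ["Ophthalmology", "Retina", "JAMA", "Journal of Ocular Findings"]
logits = np.array([2.1, 1.8, 1.5, 1.4])

def sample_next_token(logits: np.ndarray, temperature: float = 1.0) -> int:
    """Softmax over the scores, then sample an index; higher temperature = more random."""
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

for _ in range(5):
    print(candidates[sample_next_token(logits)])
# A fabricated source can be sampled just as readily as a real one,
# because it looks statistically plausible given what came before.
```

That, in miniature, is how a chatbot can hand you a tidy list of citations from authors who never existed.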