Why AI needs to learn new languages
Briefly

The latest version of OpenAI's model, GPT-4, scored 85% on a common question-and-answer test. In other languages it is less impressive. When taking the test in Telugu, an Indian language spoken by around 100m people, for instance, it scored just 62%.
Large language models (LLMs) are trained on text scraped from the internet, where English is the lingua franca. Around 93% of GPT-3's training data was in English. In Common Crawl, just one of the datasets on which the model was trained, English makes up 47% of the corpus, with other (mostly related) European languages accounting for a further 38%. Chinese and Japanese combined, by contrast, make up just 9%. Telugu is not even a rounding error.
Read at The Economist