Real polling asks real people, unlike synthetic sampling
Briefly

Synthetic sampling uses models to "survey" fake respondents. G. Elliott Morris and Verasight compared real polling data against the synthetic variety and found the latter error-prone:

"We find that the AIs cannot successfully replicate real-world data. Across models, the LLMs missed real population proportions for Trump approval and the generic ballot by between 4 and 23 percentage points."
"Even the best model we tested overstated disapproval of Trump, and almost never produced "don't know" responses despite ~3% of humans choosing it. For core demographic subgroups, the average absolute subgroup error was ~8 points; errors for some key groups (e.g., Black respondents) were as large as 15 points on Trump disapproval, and smaller groups had larger errors still (30 percentage points for Pacific Islanders)."
Synthetic sampling using LLMs fails to replicate real-world polling data with acceptable accuracy. Across tested models, errors in population proportions for Trump approval and the generic ballot ranged from 4 to 23 percentage points. The best model overstated Trump disapproval and nearly never generated "don't know" responses, despite about 3% of human respondents selecting that option. Average absolute subgroup error was about 8 points, with errors up to 15 points for Black respondents on Trump disapproval and as large as 30 points for small groups like Pacific Islanders. These error magnitudes render synthetic sampling unusable for serious polling analysis.
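For context, "average absolute subgroup error" is just the mean of the per-group gaps between the synthetic and real proportions, expressed in percentage points. A minimal sketch of that calculation, using hypothetical placeholder numbers rather than the study's actual data:

```python
# Illustrative only: how mean absolute subgroup error is computed.
# All proportions below are hypothetical placeholders, not figures
# from the Morris/Verasight comparison.

real = {"Group A": 0.55, "Group B": 0.48, "Group C": 0.62}
synthetic = {"Group A": 0.63, "Group B": 0.41, "Group C": 0.70}

# Per-group absolute error, converted to percentage points
errors = {g: abs(synthetic[g] - real[g]) * 100 for g in real}
mean_abs_error = sum(errors.values()) / len(errors)

print(f"Mean absolute subgroup error: {mean_abs_error:.1f} points")
```

A single large miss on a small subgroup (like the 30-point error for Pacific Islanders) can coexist with a modest-looking average, which is why the piece reports both the average and the worst-case subgroup errors.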
Read at FlowingData