
"An audit of 2.5 million academic papers has identified nearly 3,000 biomedical-science papers that contain fake references - ones that could not be traced to known publications. The findings, published in The Lancet on 7 May, are contained in the first academic study to estimate the scale of fake citations in the biomedical literature."
"The authors designed an automated pipeline to screen papers from PubMed Central - a database of publicly accessible biomedical articles - published between January 2023 and February 2026. Their work suggests that the contamination of papers with fake citations is a rapidly growing problem in biomedicine. There were 12 times more publications with fabricated citations in 2025 compared with 2023."
"The findings are 'conservative underestimates', says study co-author Maxim Topaz, an AI researcher at Columbia University in New York. 'What we identified is the lower bound of true prevalence. We're scratching the tip of the iceberg,' he adds. Kathryn Weber-Boer, director of scientometrics at the London-based company Digital Science, agrees."
"In their study, Topaz and his colleagues developed a system to inspect the 125.6 million references cited by 2.5 million papers. They focused the analysis on 97 million references that had valid Digital Object Identifiers (DOIs) - unique strings of letters and numbers assigned by publishers and preprint repositories - or an ID assigned by the database PubMed."
An audit of 2.5 million biomedical-science papers identified nearly 3,000 that contain fake references, meaning references that could not be traced to any known publication. The study estimated the scale of fake citations in the biomedical literature by screening papers in PubMed Central published between January 2023 and February 2026. An automated pipeline inspected the 125.6 million references cited by those papers and focused on the 97 million that had a valid DOI or PubMed ID. The results indicate rapidly growing contamination: there were 12 times more publications with fabricated citations in 2025 than in 2023. The authors describe the reported counts as conservative underestimates, a lower bound on the true prevalence. A separate analysis estimated that about 1.6% of 2025 publications contained at least one non-existent reference.
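The article does not say how the pipeline decided which references had a valid DOI or PubMed ID. As a rough illustration only, the Python sketch below shows one way such a first-pass filter might look. It assumes that "valid" means merely "syntactically well-formed", and the field names (`doi`, `pmid`), the patterns, and the sample records are all hypothetical. It is not the study's method.

```python
import re

# Assumption: "valid" is approximated as "well-formed". A real pipeline would
# also need to resolve each identifier against a registry, which is omitted here.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")   # common shape of modern DOIs
PMID_PATTERN = re.compile(r"^\d{1,8}$")          # PubMed IDs are short integers


def has_checkable_id(reference: dict) -> bool:
    """Return True if the reference carries a well-formed DOI or PubMed ID."""
    doi = (reference.get("doi") or "").strip()
    pmid = (reference.get("pmid") or "").strip()
    return bool(DOI_PATTERN.match(doi) or PMID_PATTERN.match(pmid))


def split_references(references):
    """Partition references into (checkable, skipped) lists."""
    checkable, skipped = [], []
    for ref in references:
        (checkable if has_checkable_id(ref) else skipped).append(ref)
    return checkable, skipped


# Hypothetical records, invented for illustration only.
sample = [
    {"title": "A", "doi": "10.1000/xyz123"},
    {"title": "B", "pmid": "12345678"},
    {"title": "C", "doi": "not-a-doi"},
    {"title": "D"},
]
checkable, skipped = split_references(sample)
print(len(checkable), len(skipped))
```

In the study, the corresponding split left 97 million of 125.6 million references (about 77%) in the analyzed group. References without a usable identifier were outside the analysis, which is one reason the reported counts are described as a lower bound.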
Read at Nature