Featured paper: Delving into LLM-assisted writing in biomedical publications through excess vocabulary

Disclaimer: This content was generated by NotebookLM and has been reviewed for accuracy by Dr. Tram.

Imagine a future where a significant chunk of the scientific papers you read might have been co-written or heavily edited by an artificial intelligence. Well, according to a groundbreaking new study, that future is already here, and it’s happening at an unprecedented rate. Researchers recently delved into millions of scientific abstracts and found a clear “fingerprint” of AI-assisted writing - one whose effect on scientific vocabulary exceeds even that of major global events like the COVID-19 pandemic.

The Rise of AI in Academia

Large Language Models, or LLMs, like ChatGPT, burst onto the scene in November 2022, offering an incredible ability to generate and refine text that sounds remarkably human. Scientists quickly saw the potential for these tools to help with their demanding writing tasks. LLMs can improve grammar, make writing clearer, translate text into English for non-native speakers, and even quickly summarize information. This promised to make scientific writing more efficient and potentially more accessible for many.

However, the rapid adoption of AI also brought worries. LLMs aren’t perfect; they can produce inaccurate information, make up references, and even reinforce existing biases. There’s also the risk of “paper mills” (groups that produce fake research) misusing these powerful tools. These concerns led scientists to try to figure out just how much LLM-assisted writing was truly happening in academic literature.

Hunting for the AI Footprint

Previous efforts to track AI writing mostly relied on specific “AI detectors” or comparing AI-generated text to human-written text. But these methods had a big problem: they needed “ground-truth” examples – essentially, knowing exactly which texts were human and which were AI. This often meant using older human texts and newer AI texts generated in specific ways, which could introduce bias and make the results less reliable.

That’s where the new study by Kobak and colleagues comes in. They decided to take a different, data-driven approach. Instead of trying to detect individual AI-written papers, they looked for patterns across a massive collection of scientific abstracts. They were inspired by studies of “excess mortality” during the COVID-19 pandemic, which looked at deaths above what was expected. Here, they looked for “excess words” - words that suddenly appeared much more frequently than expected after ChatGPT was released.

What Did They Study?

The researchers examined a massive dataset: more than 15 million English-language biomedical abstracts published between 2010 and 2024, all indexed by PubMed. This huge amount of data allowed them to spot subtle shifts in language usage over time.

Their method was clever: for each word, they tracked how often it appeared each year. Then, for 2024, they predicted how often a word should have appeared based on its frequency in 2021 and 2022 (they deliberately skipped 2023, because LLMs might have already influenced it). The difference between the actual frequency and the expected frequency was a word’s “excess usage”. They measured this in two ways: the “frequency gap” (observed minus expected, best suited to common words) and the “frequency ratio” (observed divided by expected, best suited to rarer words).
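To make this concrete, here is a minimal sketch of the excess-vocabulary calculation in Python. The yearly frequencies below are hypothetical, and the simple linear extrapolation stands in for the paper’s counterfactual projection:

```python
# Sketch of the excess-usage measures described above.
# The frequencies below are hypothetical; the study computed them
# from over 15 million PubMed abstracts (2010-2024).

def expected_frequency(freq_2021: float, freq_2022: float) -> float:
    """Project a word's 2024 frequency from its 2021 and 2022 values.

    A simple linear extrapolation two years past 2022; 2023 is
    deliberately skipped because LLMs may already have shaped it.
    """
    yearly_change = freq_2022 - freq_2021
    return freq_2022 + 2 * yearly_change

def excess_usage(observed_2024: float, expected_2024: float) -> dict:
    """Return both excess measures used in the study."""
    return {
        "frequency_gap": observed_2024 - expected_2024,    # common words
        "frequency_ratio": observed_2024 / expected_2024,  # rarer words
    }

# Hypothetical rare style word whose usage exploded in 2024:
expected = expected_frequency(freq_2021=0.00010, freq_2022=0.00011)
print(excess_usage(observed_2024=0.00364, expected_2024=expected))
# -> a frequency ratio of 28, the kind of jump the study saw for "delves"
```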

The Unprecedented Shift: Style Over Substance

The findings were striking. They discovered an abrupt increase in the frequency of certain words in 2023-2024. What kind of words were these? Not new scientific terms, but rather stylistic words that are often unrelated to the actual content of the paper.

For example, the word “delves” appeared 28 times more often than expected (a frequency ratio of 28!), and “underscores” and “showcasing” also saw huge increases. More common words like “potential,” “findings,” and “crucial” showed significant “excess usage” as well.

To put this in perspective, the study compared these changes to the impact of the COVID-19 pandemic. During 2020-2022, words like “coronavirus” and “pandemic” obviously saw massive spikes. This was considered an “unprecedented effect” on biomedical publishing at the time. But the impact of LLMs in 2024 was even greater! The number of “excess words” in 2024 reached 454, compared to 190 at the peak of the COVID-19 pandemic in 2021.

Crucially, the type of excess words was different. During the COVID pandemic, the excess words were almost all “content words” - nouns directly related to the disease, like “respiratory” or “remdesivir”. In contrast, the excess vocabulary in 2024 consisted almost entirely of “style words”. These were often verbs (66%) and adjectives (14%), reflecting a shift in how scientists were writing, rather than what they were writing about. This strongly suggests that LLMs, which are designed to generate fluent and often elaborate prose, were influencing writing style.

How Much AI Is Really Out There?

By combining the “excess usage” of these AI-preferred style words, the researchers could estimate a lower bound for LLM usage. They found that at least 13.5% of all PubMed abstracts published in 2024 were processed with LLMs.

Think about that: out of roughly 1.5 million papers indexed in PubMed each year, this means LLMs assisted in writing at least 200,000 papers in 2024 alone. And remember, this is just a lower bound. The true number is likely higher, since many LLM-assisted texts won’t contain any of the specific “marker words” the study identified. In fact, the estimated scale of LLM influence on the 2024 literature was at least twice that of COVID-19 at the pandemic’s publishing peak.
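To see where those numbers come from, here is a back-of-the-envelope sketch of the lower-bound logic. The observed and expected shares are invented to reproduce the headline figures; the study’s actual estimator constructs its groups of marker words more carefully:

```python
# Back-of-the-envelope lower bound, following the logic described above.
# The shares below are hypothetical, chosen to match the headline numbers.

TOTAL_ABSTRACTS_2024 = 1_500_000  # approximate yearly PubMed volume

# Share of 2024 abstracts containing at least one marker word,
# observed vs. the counterfactual expected from pre-LLM years.
observed_share = 0.200
expected_share = 0.065

# Every abstract in the excess must have been LLM-processed at least
# once, so the gap is a lower bound: LLM-edited abstracts that happen
# to use none of the marker words are invisible to this estimator.
lower_bound = observed_share - expected_share
print(f"Lower bound on LLM-assisted share: {lower_bound:.1%}")
print(f"That is at least {lower_bound * TOTAL_ABSTRACTS_2024:,.0f} abstracts per year")
```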

Uneven Adoption: Who’s Using AI Most?

The study also revealed that LLM usage isn’t uniform across the scientific world. There’s significant variation across different research fields, countries, and even journals.

  • Computational fields like bioinformatics showed higher usage, with roughly 20% of abstracts showing AI influence. This makes sense, as computer scientists might be more familiar with and willing to adopt AI technology.
  • Among countries, non-English speaking nations like China, South Korea, and Taiwan showed higher rates (around 20%), suggesting LLMs are being heavily used there to help authors write in English.
  • When looking at individual journals, open-access journals with faster or simplified review processes, like Sensors (25%) and Cureus (20%), showed very high AI adoption. In contrast, highly selective and prestigious journals like Nature, Science, and Cell had much lower rates (around 7-10%). This might suggest that stricter review processes or higher expectations at these journals lead authors to be more careful about leaving detectable AI writing styles in their text.
  • In some very specific areas, the estimated LLM usage was even higher: over 40% in computational papers from China, and even 50% in clusters of papers on deep learning-based object detection, primarily from Chinese affiliations and published in MDPI’s Sensors journal. The researchers suggest that true LLM usage may be closer to these highest observed figures, which come from corners of the literature where the AI’s influence is least deliberately hidden.

The AI Challenge: What Now?

The widespread adoption of LLMs in scientific writing is a double-edged sword. While they offer benefits like improved readability and grammar, they also pose serious risks to research integrity, including the generation of inaccurate information, fabricated references, and the potential for increased plagiarism. LLM outputs can also be less diverse and novel, potentially leading to a homogenization of scientific writing and missed opportunities for innovation.

Given these challenges, the academic community and publishers are grappling with how to respond. Many publishers and funding agencies have already started putting policies in place, for example, banning LLMs as co-authors or from being used in peer review without disclosure.

This study highlights just how critical these conversations and policies are. By providing an unbiased and data-driven way to measure LLM usage, it gives the scientific community a powerful tool to monitor whether these policies are being followed. The “excess word” approach developed by Kobak and colleagues can help track the ongoing impact of AI on scientific writing and inform the crucial debate about how we use these powerful tools responsibly in the future. It’s clear that LLMs are not just a passing fad; they are fundamentally changing how science is communicated, and this trend will likely continue to grow.

