Generative AI analyzes medical data faster than human research teams. That was the central finding from a new study led by scientists at the University of California, San Francisco and Wayne State University, offering one of the earliest real-world tests of generative AI in health research.
In a head-to-head comparison, researchers assigned identical analytical tasks to multiple groups. Some relied entirely on human expertise. Others paired scientists with generative AI tools. The challenge: predict preterm birth using data from more than 1,000 pregnant women.
Human experts had previously spent months analyzing the same information. By contrast, AI-supported teams generated functioning analytical code in minutes.
A Junior Team, Powered by AI
Notably, even a junior research pair (a UCSF master’s student and a high school student) successfully developed prediction models with AI assistance. The advantage stemmed from AI’s ability to write analytical code based on short but highly specific prompts.
Performance, however, was not universal. Only four of the eight AI chatbots tested produced usable code. Still, the systems that succeeded did not require large teams of specialists to guide them.
Because of this acceleration, the junior researchers were able to complete their experiments, verify their findings, and submit results to a journal within just a few months.
“These AI tools could relieve one of the biggest bottlenecks in data science: building our analysis pipelines,” said Marina Sirota, professor of Pediatrics who is the interim director of the Bakar Computational Health Sciences Institute (BCHSI) at UCSF and the principal investigator of the March of Dimes Prematurity Research Center at UCSF. “The speed-up couldn’t come sooner for patients who need help now.”
The study was published in Cell Reports Medicine on Feb. 17.
Why Preterm Birth Research Matters
The implications extend beyond technical efficiency. Preterm birth remains the leading cause of newborn death and contributes significantly to long-term motor and cognitive challenges in children. In the United States alone, roughly 1,000 babies are born prematurely each day.
Despite decades of research, scientists still do not fully understand what causes preterm birth. To investigate potential risk factors, the research team compiled microbiome data from approximately 1,200 pregnant women across nine separate studies.
“This kind of work is only possible with open data sharing, pooling the experiences of many women and the expertise of many researchers,” said Tomiko T. Oskotsky, MD, co-director of the March of Dimes Preterm Birth Data Repository, associate professor in UCSF BCHSI, and co-author of the paper.
However, analyzing such vast and complex datasets presents its own bottlenecks. To address this, researchers turned to DREAM (Dialogue on Reverse Engineering Assessment and Methods), a global crowdsourcing competition designed to tackle difficult biomedical data challenges.
More than 100 teams worldwide participated in pregnancy-related DREAM challenges, developing machine learning models to detect patterns linked to preterm birth. Most teams completed their work within three months. Yet consolidating findings and publishing results ultimately took nearly two years.
Testing AI on Pregnancy and Microbiome Data
Curious whether generative AI could shorten that timeline, the UCSF team partnered with researchers at Wayne State. Together, they instructed eight AI systems to independently generate algorithms using the same DREAM datasets, without direct human coding.
The AI chatbots received carefully written natural language prompts, guiding them to analyze vaginal microbiome data for signs of preterm birth and to examine blood or placental samples to estimate gestational age.
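The study’s actual prompts, datasets, and generated pipelines are not reproduced here. As a rough illustration of the kind of analysis code such a prompt might elicit, the sketch below trains a simple classifier on a synthetic stand-in for relative-abundance microbiome data and scores it with the area under the ROC curve; all sample sizes, taxa counts, and the simulated association are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for vaginal microbiome relative abundances:
# 200 samples x 50 taxa (the study pooled data from ~1,200 pregnancies).
n_samples, n_taxa = 200, 50
X = rng.dirichlet(np.ones(n_taxa), size=n_samples)  # each row sums to 1

# Simulate an association: preterm cases skew toward the first 5 taxa.
risk = X[:, :5].sum(axis=1)
y = (risk + rng.normal(0, 0.05, n_samples) > np.median(risk)).astype(int)

def auc(scores, labels):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) formula."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    return (ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# Hold out the last 50 samples, then fit a logistic model by gradient descent.
split = 150
X_tr, X_te, y_tr, y_te = X[:split], X[split:], y[:split], y[split:]
w = np.zeros(n_taxa)
for _ in range(500):
    p = 1 / (1 + np.exp(-X_tr @ w))       # predicted probabilities
    w -= 0.5 * X_tr.T @ (p - y_tr) / split  # gradient step on log-loss

test_auc = auc(X_te @ w, y_te)
print(f"held-out AUC: {test_auc:.2f}")
```

A chatbot given a short, specific prompt would typically emit something of this shape, differing mainly in which library it reaches for and how it validates the model.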
Pregnancy dating is almost always an estimate, yet it determines the type of care women receive as pregnancies progress. Inaccurate estimates can complicate labor preparation and clinical decision-making.
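Estimating gestational age from blood or placental samples is a regression rather than a classification task. The minimal sketch below, again on invented synthetic data, fits closed-form ridge regression and reports mean absolute error in weeks; the feature counts and noise levels are assumptions for illustration, not values from the study.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in for blood/placental molecular profiles:
# 120 samples x 30 features (sizes invented for this sketch).
n, p = 120, 30
X = rng.normal(size=(n, p))
true_w = rng.normal(size=p)

# Simulated gestational ages, roughly centered on 33 weeks, plus noise.
y = 33 + X @ true_w * 0.8 + rng.normal(0, 1.0, n)

# Hold out the last 30 samples and fit ridge regression in closed form:
# w = (X'X + lambda*I)^-1 X'(y - mean(y)), predicting deviations from the mean.
split = 90
X_tr, X_te, y_tr, y_te = X[:split], X[split:], y[:split], y[split:]
lam = 1.0
w = np.linalg.solve(X_tr.T @ X_tr + lam * np.eye(p), X_tr.T @ (y_tr - y_tr.mean()))
pred = y_tr.mean() + X_te @ w

mae = np.mean(np.abs(pred - y_te))
print(f"held-out error: {mae:.2f} weeks")
```

Reporting error in weeks makes the clinical stakes concrete: an estimate off by several weeks can change how labor and delivery are managed.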
When researchers executed the AI-generated code, four of the eight tools produced models that matched the performance of human teams. In some cases, the AI models performed better. Importantly, the entire generative AI effort, from inception to submission of a paper, took just six months.
AI as a Research Accelerator
Scientists caution that AI systems still require careful oversight. Generative models can produce misleading or flawed outputs, and human expertise remains essential for interpretation and validation.
Still, the productivity gains are difficult to ignore.
“Thanks to generative AI, researchers with a limited background in data science won’t always need to form wide collaborations or spend hours debugging code,” said Adi L. Tarca, PhD, professor at Wayne State University. “They can focus on answering the right biomedical questions.”
For investors and healthcare stakeholders, the study signals more than incremental improvement. If generative AI continues to compress research timelines, it could materially accelerate drug discovery, diagnostics development, and translational medicine.
In a sector where time-to-insight often translates directly into patient outcomes and commercial opportunity, that shift may prove transformative.
