A study published in The Lancet Digital Health and led by Emily R. Pfaff, PhD, assistant professor in the Division of Endocrinology and Metabolism, shows how the National COVID Cohort Collaborative used XGBoost machine learning (ML) models to better define long COVID and to identify potential long-COVID patients with a high degree of accuracy.
Clinical scientists used ML models to explore de-identified electronic health record (EHR) data in the National COVID Cohort Collaborative (N3C), a national clinical database funded by the National Institutes of Health, to discern the characteristics of people with long COVID and the factors that may help identify such patients from their medical records.
The findings have the potential to improve clinical research on long COVID and inform a more standardized care regimen for the condition.
“Characterizing, diagnosing, treating and caring for long-COVID patients has proven to be a challenge due to the list of characteristic symptoms continuously evolving over time,” said Pfaff. “We needed to gain a better understanding of the complexities of long COVID, and for that it made sense to take advantage of modern data analysis tools and a unique big data resource like N3C, where many features of long COVID are represented.”