Big data is coming to medical records. (AP Photo/Kiichiro Sato)
French and Lebanese health data scientists are experimenting with a new system for integrating individual patients' medical records with massive networks of data from clinical trials and public databases.
In a study published March 15 in Computers and Industrial Engineering, the research team detailed its big data sorting platform, "EMR2vec." The name refers to electronic medical records, which have become more common in hospitals as digital technologies replace older filing systems. These new ways of cataloguing information have led to an influx of data about patients' symptoms and outcomes, which could help epidemiologists and doctors better understand pandemic trends and other public health concerns.
But the digital transition presents one major problem: hospitals and health care clinics have adopted different standardization and naming conventions, so it's difficult for researchers to analyze multiple datasets at once.
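One way to picture the naming problem: two hospitals may record the same condition under different labels, and a shared vocabulary maps both labels to a single concept. The short Python sketch below is illustrative only and is not taken from the study. The lookup table, the function names and the `CONCEPT-001` style codes are hypothetical placeholders, not real identifiers from any medical vocabulary.

```python
# Hypothetical synonym table. The codes are placeholders invented for this
# example and are NOT real identifiers from any medical vocabulary.
SYNONYM_TO_CONCEPT = {
    "heart attack": "CONCEPT-001",
    "myocardial infarction": "CONCEPT-001",
    "mi": "CONCEPT-001",
    "high blood pressure": "CONCEPT-002",
    "hypertension": "CONCEPT-002",
}


def normalize_term(term):
    """Map a free-text label to a shared concept code, or None if unknown."""
    key = " ".join(term.lower().split())  # lowercase and collapse whitespace
    return SYNONYM_TO_CONCEPT.get(key)


def shared_concepts(record_a, record_b):
    """Return the concept codes that two differently worded records share."""
    codes_a = {normalize_term(t) for t in record_a} - {None}
    codes_b = {normalize_term(t) for t in record_b} - {None}
    return codes_a & codes_b


# Two hospitals describe the same condition with different wording.
hospital_a = ["Heart attack", "Hypertension"]
hospital_b = ["myocardial infarction"]
print(shared_concepts(hospital_a, hospital_b))
```

Without the shared table, a plain string comparison would find nothing in common between the two records, which is the kind of mismatch that makes multi-hospital analysis difficult.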
The current norm for identifying trial candidates, manually combing through medical records in individual hospital databases, is also time-consuming and leaves room for mistakes. Those mistakes can cost lives when researchers overlook patients who would have been eligible for experimental treatments.
The researchers cited a 2001 Journal of Clinical Oncology study indicating that fewer than five percent of cancer patients enter clinical trials, even when these experimental treatments could provide a vital step toward recovery.
Houssein Dhayne, a health care data integration researcher at Saint Joseph University in Beirut and a lead author of the paper, told The Academic Times that the team merged multiple computing tools to overcome some of the disparities between individual patient data and clinical trial results.
"We tried to benefit from new techniques such as artificial intelligence and machine learning, and at the same time, use traditional techniques, like ontology," Dhayne said, referring to the naming conventions that information scientists use to categorize data.
At the core of the project, the team attempted to accurately categorize 10,000 patients' medical records and 10,000 clinical trial datasets using complex geometric algorithms. The researchers also drew on existing tools such as SNOMED-CT, a project to create a universal, multilingual medical taxonomy that has around 350,000 unique terms. According to the study, the team's system reached an average precision of 0.86, compared with about 0.56 for a similar platform.
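The article does not spell out how the study computed its precision figures. For readers unfamiliar with the metric, the Python sketch below shows one standard information-retrieval definition of average precision, applied to a toy ranking of trials by cosine similarity to a patient vector. The vectors, trial IDs and function names are invented for illustration, and nothing here should be read as the EMR2vec method itself.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_trials(patient_vec, trial_vecs):
    """Return trial IDs ordered from most to least similar to the patient."""
    scored = [(tid, cosine_similarity(patient_vec, vec))
              for tid, vec in trial_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [tid for tid, _ in scored]


def average_precision(ranked_ids, relevant_ids):
    """Mean of precision@k taken at each rank k where a relevant item appears."""
    if not relevant_ids:
        return 0.0
    hits = 0
    total = 0.0
    for rank, tid in enumerate(ranked_ids, start=1):
        if tid in relevant_ids:
            hits += 1
            total += hits / rank
    return total / len(relevant_ids)


# Toy data, invented for this example.
patient = [1.0, 0.0]
trials = {"T1": [1.0, 0.0], "T2": [0.0, 1.0], "T3": [1.0, 1.0]}
relevant = {"T1", "T2"}  # trials the patient is assumed to qualify for

ranking = rank_trials(patient, trials)
ap = average_precision(ranking, relevant)
print(ranking, round(ap, 2))  # ['T1', 'T3', 'T2'] 0.83
```

A score of 1.0 would mean every relevant trial was ranked ahead of every irrelevant one; lower scores mean relevant trials were buried further down the list.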
The coronavirus pandemic revealed issues within the global health care system, from poor testing infrastructure to mask manufacturing shortages. But behind the scenes, health researchers wrestled with vast swaths of data to better understand the virus' global effects.
And even as the pandemic spread around the world, it took months for clinical trials to get off the ground. In one analysis, only 11 out of 75 coronavirus trials had begun to recruit candidates by March 2020 and none had been completed, leaving an information gap during the pandemic's early days. Researchers say these sorts of delays could be mitigated if there were more efficient ways to dig through already available data.
Most clinical trials include eligibility requirements centered on a patient's identity, ethnicity or genetic makeup to better isolate the results of a particular study. But medical data researchers have had trouble linking those large-scale medical trials with the individual experiences of their patients. While clinical trial data is often sorted into broad, demographic-based categories, a subject's personal medical data can be highly specific and technical.
In recent years, the medical community has embraced the concept of evidence-based medicine, prioritizing personalized treatment plans that identify each patient's unique circumstances and symptoms. In the coming years, better systems for digging through clinical trial data, including EMR2vec, could allow doctors to search through a centralized database to compare the treatment plans of patients with similar genetic makeups and demographic backgrounds.
Although the experiment helped explore new digital organizing principles and mathematical computing models, the data scientists who worked on the project say their core goal is to help ordinary patients find more viable treatment methods.
"[EMR2vec] can be very effective and efficient in treating patients suffering life-threatening diseases and hence can play a significant role in saving lives," the researchers wrote.
The study, "EMR2vec: Bridging the Gap Between Patient Data and Clinical Trial," published March 15 in Computers and Industrial Engineering, was authored by Houssein Dhayne and Rima Kilany, Saint Joseph University, Beirut; Rafiqul Haque, Intelligencia Company; and Yehia Taher, Versailles Saint-Quentin-en-Yvelines University.