Machine-learning model predicts lung and liver cancer through genome sequence changes


Researchers have developed a machine-learning approach that may predict early-stage lung or liver cancer by detecting changes in repetitive sequences in the genome.

As they detail in an article in Science Translational Medicine, the researchers from Johns Hopkins University School of Medicine in Baltimore developed their machine-learning model, dubbed Artemis for Analysis of RepeaT EleMents in dISease, to identify tumor-specific changes in the repetitive sequences of the genome and so predict the development of lung or liver cancer from liquid biopsies.

More than half of the human genome comprises repetitive sequences; research has implicated shifts in these sequences in the development of cancer. Artemis was designed to be alignment-free and genomewide -- it could "consider the entirety of the genome, rather than only that from the ~60 to 85% of reads from next-generation sequencing that can be aligned with high quality," the authors wrote.
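To make the "alignment-free" idea concrete, the following is a minimal sketch of counting repeat-associated sequences directly in raw reads, with no step that maps reads to a reference genome. The article does not describe how Artemis does this internally, so everything here is an assumption for illustration: the use of short signature k-mers, the function name `count_repeat_kmers`, the repeat labels, and the toy reads are all hypothetical.

```python
from collections import Counter


def count_repeat_kmers(reads, signature_kmers, k):
    """Count signature k-mers in raw reads without aligning them.

    reads: iterable of DNA strings.
    signature_kmers: dict mapping a k-mer to a (hypothetical) repeat label.
    k: k-mer length; every key in signature_kmers is assumed to have length k.
    Returns (per-repeat counts, total number of k-mer windows scanned).
    """
    counts = Counter()
    total_windows = 0
    for read in reads:
        # Slide a window of length k across the read, one base at a time.
        for i in range(len(read) - k + 1):
            total_windows += 1
            label = signature_kmers.get(read[i:i + k])
            if label is not None:
                counts[label] += 1
    return counts, total_windows


# Toy data, invented for illustration only.
signature_kmers = {"ACGT": "repeat_A", "TTAG": "repeat_B"}
reads = ["ACGTACGT", "GGTTAGGG", "CCCCCC"]

counts, total_windows = count_repeat_kmers(reads, signature_kmers, k=4)
# Normalize by the number of windows so samples of different depth are comparable.
fractions = {label: n / total_windows for label, n in counts.items()}
print(counts, total_windows, fractions)
```

Because every read is scanned, reads that would fail high-quality alignment still contribute to the counts, which is the property the authors emphasize.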

Using Artemis, the team analyzed 1.2 billion sequences from 1,290 types of repeats across the entire genome. The sequences came from 2,837 tissue and plasma samples taken from 1,975 patients, including those with lung, breast, colorectal, ovarian, liver, gastric, head and neck, bladder, cervical, thyroid, and prostate cancers.

They determined that Artemis could predict disease in patients with early-stage liver or lung cancer. The research team identified tumor-specific changes in 1,280 repeat element types from the LINE, SINE, LTR, transposable element, and human satellite families. These included changes to known repeats as well as 820 elements -- almost two-thirds of the total number -- that were not previously known to be linked to cancer.

The researchers then validated Artemis in four cohorts of 532 patients with and without lung cancer and 208 patients at high risk for liver cancer. Artemis classified the patients with lung cancer with an area under the curve (AUC) of 0.82 (95% confidence interval [CI], 0.78 to 0.87), and the liver cancer high-risk cohort with an AUC of 0.87 (95% CI, 0.82 to 0.93); when a set of fragmentation features was added, the AUCs were even higher.
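For readers unfamiliar with the metric, an AUC can be read as the probability that a randomly chosen patient with cancer receives a higher model score than a randomly chosen patient without it. The sketch below computes that quantity by direct pairwise comparison. The scores are invented toy values, not data from the study, and the function name `auc` is a hypothetical helper; the article does not say how the authors computed their AUCs or confidence intervals.

```python
def auc(scores_pos, scores_neg):
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (positive, negative) pairs in which the positive
    case scores higher; ties count as half a win.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Toy scores, invented for illustration only.
cancer_scores = [0.9, 0.8, 0.4]
non_cancer_scores = [0.7, 0.3, 0.2]

toy_auc = auc(cancer_scores, non_cancer_scores)
print(round(toy_auc, 3))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, so the reported values of 0.82 and 0.87 sit well above chance but short of perfect discrimination.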

The researchers acknowledge limitations to their current work with Artemis: a need for larger study populations, the inability of the approach to directly identify the specific location of changes in repeat elements, and its reliance on the evaluation of changes in repeat sequences that are "inherently variable among the germline of individuals." These, they conclude, should be addressed in future work.

Nevertheless, they write, "ARTEMIS alone or in combination with other genome-wide features may provide an avenue for noninvasive detection, monitoring, and tissue of origin determination of cancer."
