Machine Learning Ranks Cancer Drugs Based On Proteome Of Tumors

By Deborah Borfitz 

April 14, 2021 | Kinomica, a U.K.-based proteomic-data science and diagnostics company, is contemplating development of companion diagnostics (CDx) that ask an open question about which oncology drug should be used to treat a patient—a reversal of the usual yes-or-no answer provided by such tests that are tied to a specific drug or biological product. Answers would come from laboratory analysis of the proteins produced by tumors and list viable therapeutic alternatives ranked by their predicted effectiveness, according to company co-founder Pedro Cutillas, who is also professor of cell signaling and proteomics at Queen Mary University of London. 

If the approach sounds familiar, it should. In late 2017, Foundation Medicine (now owned by Roche) made headlines with a genetic test called FoundationOne CDx that matches cancer patients to the best treatment. It was the first FDA-approved CDx for solid tumors.  

As noted in a study newly published in Nature Communications (DOI: 10.1038/s41467-021-22170-8), anecdotal evidence suggests that proteomic-derived features of tumors may be able to better predict drug responses than genomic alternatives. 

The workhorse here is Drug Ranking Using Machine Learning (DRUML), an ensemble of predictive models trained for drugs with different modes of action that can help clinicians pick the most appropriate cancer drugs for individual patients from among most (412 of 659) of the options now at their disposal, says Cutillas. Many of the drugs are in phase 2 or 3 clinical trials and about a quarter of them are on the market.  

The potential of the technology is vast: it could stratify drug targets in different phenotypes and define the biomarkers needed for companion diagnostics, determine eligibility for clinical studies, inform the design of basket trials in which patients are assigned to different therapies based on molecular markers of their tumor, and serve as a clinical decision support tool for physicians at the point of care. Any cancer can be analyzed by DRUML with a biopsy sample from the tumor, says Cutillas.

Unsurprisingly, “drugs appear to be high ranking in different tumors irrespective of the tissue of origin,” he says. Pharmaceutical companies routinely develop cancer drugs across tumor types.  

Training Exercise 

Mass spectrometry was used to analyze 48 leukemia, esophagus, and liver cancer cell line models, which produced the proteomics and phosphoproteomics data for training the machine learning algorithm, Cutillas says. Thereafter, when new samples are obtained from patients, the first step is to quantify the proteins present in the tumor (that is, their expression). When that data is fed into the algorithm, which is stored in the cloud, the output is a list of drugs ordered by their predicted effectiveness in reducing cancer cell growth. The models built with cell lines from the two solid tumor types did a good job of predicting drug efficacy in other tumor types, including breast, lung, and colorectal cancer, says Cutillas. The verification proteomics data was obtained from a set of 53 cancer cell models profiled by 12 other laboratories, in addition to a clinical dataset of 36 primary acute myeloid leukemia samples.
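For readers who think in code, the ranking step described above can be pictured roughly as follows. This is a minimal sketch and not Kinomica's or the paper's implementation: the drug names, protein names, weights, and the simple linear stand-in for a trained model are all hypothetical, chosen only to show how one model per drug can turn a single tumor's protein measurements into an ordered list.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical type: a trained per-drug model maps protein abundances to a
# predicted reduction in cancer cell growth (higher means more effective).
DrugModel = Callable[[Dict[str, float]], float]


def make_linear_model(weights: Dict[str, float], bias: float = 0.0) -> DrugModel:
    """Stand-in for a trained model; the weights used below are made up."""
    def predict(proteome: Dict[str, float]) -> float:
        # Proteins missing from the sample contribute nothing to the score.
        return bias + sum(w * proteome.get(p, 0.0) for p, w in weights.items())
    return predict


def rank_drugs(
    proteome: Dict[str, float], models: Dict[str, DrugModel]
) -> List[Tuple[str, float]]:
    """Score every drug for one sample and sort from best to worst."""
    scores = {drug: model(proteome) for drug, model in models.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Illustrative ensemble: one (hypothetical) model per drug.
models = {
    "drug_A": make_linear_model({"protein_1": 0.5, "protein_2": -0.25}),
    "drug_B": make_linear_model({"protein_2": 0.5}),
    "drug_C": make_linear_model({"protein_3": 0.5}, bias=0.125),
}

# Illustrative protein quantities for a single tumor sample.
sample = {"protein_1": 2.0, "protein_2": 1.0, "protein_3": 0.5}

ranking = rank_drugs(sample, models)
for drug, score in ranking:
    print(f"{drug}: {score:.3f}")
```

In a real system each stand-in would be replaced by a model trained on proteomics and drug response data; the sorting step would stay the same.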

Traditionally, machine learning algorithms that have worked well in datasets used for training and validation do not work nearly as well when tested in independent datasets for verification, notes Cutillas. External labs are invited to further test the DRUML approach with their own datasets by downloading the package.

Making DRUML possible was the availability of data from drug response profiles for many cell lines and drugs as well as improvements in omics techniques (i.e., liquid chromatography coupled to tandem mass spectrometry) and label-free methods of quantifying proteins and their chemically modified counterparts (phosphoproteomics), says Cutillas. Random forest, a supervised learning algorithm, was found to perform better than more advanced deep learning methods, which came as a surprise, he adds.

As new cancer drugs emerge, Cutillas says, they will likely be added to DRUML in collaboration with the pharma companies behind their development. Gathering the data needed to retrain the models, by measuring the drug response in a large panel of cancer cells and then obtaining proteomics data from the same cell lines, would likely be a one- to two-month process. With those two datasets in hand, the retraining exercise itself could be accomplished in a matter of a few hours.

The recently published paper is expected to most immediately spark interest among pharma companies and investigators in using DRUML in the design of oncology clinical trials, including eligibility criteria, to identify optimal treatments for individual cancer patients.