December 7, 2023 | When artificial intelligence (AI) was used to assess breast cancer properties in patients given the worst prognosis, many of them were recategorized into a low-risk category where the best outcomes are expected. That’s the crux of a study led by researchers at Northwestern University that was recently published in Nature Medicine (DOI: 10.1038/s41591-023-02643-7).
For decades, pathologists have used a tumor grading system to guide oncologists’ decisions about the type of treatment patients will receive, including the length of chemotherapy. But noncancerous elements in the tumor microenvironment, which were considered by the AI-based scoring of survival risk, may have been an important missing link all along, according to Lee Cooper, Ph.D., associate professor of pathology at Northwestern University Feinberg School of Medicine.
It is known from cancer biology studies that nontumor cells, including those from the immune system and others providing form and structure for the tissue, can play an important role in sustaining or inhibiting cancer growth, he says. AI is useful in measuring those elements in a precise and reproducible way.
For the overall prognostic score, the algorithm factored in 26 features of impact. A lot of texture in the space between stromal cells tended to indicate a good prognosis, for instance, which is “observable with the human eye but it is probably something that is not easy for a pathologist to assign a number to,” says Cooper. On the other hand, it was a bad sign if immune cells tended to form clusters around tumor cells rather than be more evenly diffused.
The AI tool was able to identify breast cancer patients who are currently classified as high or intermediate risk but who become long-term survivors. The duration or intensity of their chemotherapy could therefore be reduced and, along with it, the unpleasant and harmful side effects.
Notably, the so-called Histomic Prognostic Signature (HiPS) reliably outperformed pathologists in predicting survival outcomes. This was driven mostly by stromal and immune features, says Cooper.
The problem with tumor grading, which describes how abnormal cancer cells and tissue look under a microscope when compared to healthy cells, is twofold, Cooper says. First, it focuses on the properties of the tumor and so it is not holistic. Second, pathologists, unlike a computer, find it difficult to perform their evaluation in a reproducible way.
So that the HiPS might be embraced as an alternative way to gauge a cancer’s likelihood to grow and spread, the development team took pains to ensure it generated outputs that would make sense to pathologists and could be explained in plain language. “We spent several years conducting markups of cells and tissue structures and images and then we trained the algorithm to use that data to look at a [digital] slide and map all the components of the tumor,” explains Cooper.
The comprehensive map enables measurement of all types of patterns that might be of interest to pathologists, including the degree to which lymphocytes had infiltrated a tumor and the proximity of different types of cells to each other. Survival risk scores are based on interactions between cellular and tissue structures that are occurring in the breast tumor microenvironment.
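To make the idea of measuring cell proximity concrete, here is a minimal sketch of one possible spatial measurement: the average distance from each lymphocyte to its nearest tumor cell. It is an illustration only and not the HiPS method. The function name, the coordinates, and the choice of metric are all assumptions made for this example, and the cell positions are invented rather than taken from the study.

```python
import math


def mean_nearest_distance(sources, targets):
    """Average distance from each source cell to its closest target cell."""
    if not sources or not targets:
        raise ValueError("both cell lists must be non-empty")
    total = 0.0
    for sx, sy in sources:
        # Distance from this source cell to the nearest target cell.
        total += min(math.hypot(sx - tx, sy - ty) for tx, ty in targets)
    return total / len(sources)


# Hypothetical cell centroids in micrometers (made-up values, not study data).
tumor_cells = [(0.0, 0.0), (10.0, 0.0)]
clustered_lymphocytes = [(0.0, 3.0), (10.0, 4.0)]    # close to tumor cells
diffuse_lymphocytes = [(0.0, 30.0), (10.0, 40.0)]    # spread farther away

clustered_score = mean_nearest_distance(clustered_lymphocytes, tumor_cells)
diffuse_score = mean_nearest_distance(diffuse_lymphocytes, tumor_cells)
print(clustered_score, diffuse_score)
```

A smaller value means the lymphocytes sit closer to tumor cells. A real pipeline would work from cell positions detected on a digital slide and would likely combine many such measurements.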
The HiPS was developed using a population-level dataset from the Cancer Prevention Study II sponsored by the American Cancer Society (ACS) and validated in three independent cohorts—the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial of the National Cancer Institute (NCI), the Cancer Prevention Study-3 of the ACS, and The Cancer Genome Atlas, a joint effort between NCI and the National Human Genome Research Institute.
Importantly, the scoring methodology includes a “visual dictionary” to ensure the results are readily comprehensible by pathologists even if they don’t specialize in breast cancer. The high and low values are reported in their language rather than in technical or mathematical terms.
Since this was an observational study, investigators did not evaluate how the biomarker might be used to determine the intensity and duration of chemotherapy, Cooper points out. Clinical trials will be required to establish that HiPS, in the hands of pathologists, would improve patient outcomes and “move the needle on survival.”
But if validated clinically, breast cancer patients in the future could be scored with these criteria and their pathologist would get a “customized explanation” as to why their prognosis is good or bad, says Cooper. Ultimately, with more training, the AI tool might also prove useful in evaluating response to treatment as tissue gets sampled over time to determine tumor regression or progression.
The algorithm could be particularly impactful in community settings where specialty pathologists are in short supply, he adds. HiPS could potentially be made openly and freely available to democratize the ability to generate an accurate prognosis. The user base might logically include community hospitals with limited access to molecular testing of cancer risk (e.g., Oncotype DX), which would only need a glass slide, camera, and an internet connection to make the same sort of assessment.
But first, Cooper says, better datasets need to be utilized for training purposes to eliminate lingering concerns about unintentional bias. The algorithm was built on data about patients from over 423 U.S. counties and digital image annotations by an international network of medical students and pathologists across several continents. The ACS acknowledges that the dataset tapped for the study does not represent the full racial diversity of the country, a situation that is being actively addressed.
Like many institutions around the country, Northwestern Memorial Hospital is making the transition from glass slides to digital images. That creates opportunities to put AI tools like HiPS in the hands of pathologists and oncologists, and to demonstrate the operational capacity to deploy them in real-world clinical care, says Cooper.
In addition to looking at more diverse clinical trial datasets, the Northwestern team has started developing prognostic models for specific types of breast cancers, notably triple-negative and HER2-positive. “We realize these different subtypes of cancer can behave very differently—certainly their response to treatment is different—and so we expect some of the tissue patterns that are predictive [of survival risk] will look different,” he says.
Industry collaboration will be essential to bringing the prognostic tools into clinical practice, says Cooper, who consults for AI-enabled precision medicine company Tempus and serves as an advisor to both global diagnostics company Veracyte and biotechnology tools provider Targeted Bioscience. It remains to be seen whether HiPS eventually goes through the Food and Drug Administration’s approval pathway for Software as a Medical Device.
“We would be open to partnerships to explore the opportunity to go through the approval process,” he says. “The first step is to use it as a lab developed test... [where] pathology has some degree of freedom.”
Data coming out of the pathology lab has rich clinical and scientific value and is “one of the areas where AI can have the biggest impact in medicine,” Cooper says. “Academic medical centers can play a leading role in developing AI tools,... [but] to bring these to the world we need industry partnerships.”