AI Tool Enhances Tissue Images, Aims to Improve Surgical Diagnoses

By Paul Nicolaus 

February 14, 2023 | Researchers at Harvard University and Brigham and Women’s Hospital, along with collaborators from Bogazici University in Turkey, have unveiled new technology intended to help surgical teams make timely decisions during operations. Their method uses artificial intelligence (AI) to enhance the image quality of frozen tissue samples in hopes of improving diagnostic speed and accuracy. 

With some surgeries, it is possible to perform a biopsy and obtain a diagnosis before the operation takes place, explained Faisal Mahmood, an associate professor of pathology at Harvard Medical School. In other scenarios, like brain surgery, it is not always possible to determine the type of disease or figure out how aggressive it is until the surgery begins.  

In these instances, a sample may be taken and sent off to a pathology lab. “And this is happening while the patient is still on the operating table,” he told Diagnostics World. During this process, an expert makes a diagnosis and reports back to the surgeon so that critical decisions can be made, such as how much of the tumor—or brain—should be removed.  

To arrive at a more immediate diagnosis, pathologists cannot evaluate the tissue in its raw form, Mahmood explained. Instead, a process called cryosectioning is used. This quicker alternative, which involves freezing tissue and cutting it into thin sections that can be analyzed with a microscope, is an important tool for intra-operative decision-making.  

It trims the time involved from hours or days to minutes and can be used for tasks like assessing tumor margins or differentiating between malignant and benign lesions. However, there is a trade-off that comes along with this increased speed. “The issue is that freezing creates a lot of artifacts,” he said, “and it makes the cells difficult to see.”  

Freezing and cutting specimens can introduce issues that compromise image quality and diagnostic accuracy, such as distortion of details at the cellular level, loss of tissue due to ice crystal formation, folding or tearing of sections, or variances in staining due to changes in section thickness, the researchers noted in their paper published in Nature Biomedical Engineering (DOI: 10.1038/s41551-022-00952-9). These artifacts can disguise malignant cells or make benign cells appear atypical, they added. 

These are the existing options, Mahmood explained. One route makes it possible to arrive at an answer quickly, but the tissue appears distorted and contains artifacts. The other avenue, which utilizes formalin-fixed, paraffin-embedded (FFPE) tissue samples—a method that preserves tissue content and structure far more faithfully—can yield high-quality images that make it easier to arrive at a diagnosis. However, that process typically takes anywhere from 12 to 48 hours.  

“Deep learning has been used to address a variety of different tasks in diagnostic pathology, and deep-learning-based generative adversarial networks (GANs) have been used in many different areas of medicine including image segmentation, resolution enhancement, domain adaptation, virtual staining of histology slides, and stain transfer between different stains,” the study authors noted.  

But enhancing the image quality of frozen tissues is an area that has remained largely unexplored, so they set out to pursue this possibility. “What we were thinking is that there’s all this work in artificial intelligence in generating synthetic images,” Mahmood said. He and colleagues wondered how this type of technology could help bridge the gap between quick tissue prep and the gold standard method.  

AI Tool Shows Early Promise, Raises Crucial Question 

Their AI model is designed to translate between these two approaches and arrive at enhanced images in about 2 to 4 minutes. According to the study, the technology “efficiently corrects” a variety of artifacts in lung and brain sections. The authors noted that its ability to correct various artifact types shows its versatility, which is also revealed in its capacity to “highlight patterns of diagnostic importance” for different kinds of tissue and tumor.  
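The article names GANs but does not spell out the model’s architecture, so the following is only a minimal sketch of the general adversarial recipe, written in PyTorch: a generator translates frozen-section patches toward an FFPE-like appearance while a discriminator tries to tell translated patches from real FFPE patches. Every module, layer size, and loss here is an assumption, and random tensors stand in for real training data.

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Toy translator: frozen-section patch in, FFPE-like patch out."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1), nn.Tanh(),
        )

    def forward(self, x):
        return self.net(x)

class Discriminator(nn.Module):
    """Toy critic: scores whether a patch looks like real FFPE tissue."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 1),
        )

    def forward(self, x):
        return self.net(x)

G, D = Generator(), Discriminator()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

# Random tensors stand in for unpaired frozen-section and FFPE patches.
frozen = torch.rand(8, 3, 64, 64)
ffpe = torch.rand(8, 3, 64, 64)

# Discriminator step: push real FFPE toward 1, translated patches toward 0.
fake = G(frozen).detach()
loss_d = bce(D(ffpe), torch.ones(8, 1)) + bce(D(fake), torch.zeros(8, 1))
opt_d.zero_grad(); loss_d.backward(); opt_d.step()

# Generator step: translated patches should fool the discriminator.
loss_g = bce(D(G(frozen)), torch.ones(8, 1))
opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```

In a real pipeline the generator would also be constrained to preserve tissue content (published unpaired-translation methods add such losses); the plain adversarial objective above is kept deliberately minimal.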

The thought behind their efforts is that making frozen section tissue look like FFPE tissue could lead to images that are easier to interpret and, ultimately, faster and more accurate diagnoses. Another benefit is that various AI models for pathology are trained mainly on FFPE tissue. Some of those models could potentially be applied to frozen section tissue that has been transformed to resemble FFPE tissue images, Mahmood added.  
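As a hedged illustration of that reuse idea (not the authors’ pipeline), an FFPE-trained classifier would be applied to a frozen-section patch only after translation. Both modules below are untrained placeholders with invented sizes.

```python
import torch
import torch.nn as nn

# Toy stand-ins: a translator (frozen -> FFPE-like) and a subtype
# classifier assumed to have been trained only on FFPE-style images.
translator = nn.Conv2d(3, 3, 3, padding=1)
ffpe_classifier = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 2),
)

frozen_patch = torch.rand(1, 3, 64, 64)           # fake frozen-section patch
with torch.no_grad():
    ffpe_like = translator(frozen_patch)          # translate first...
    subtype_logits = ffpe_classifier(ffpe_like)   # ...then reuse the FFPE model
print(subtype_logits)
```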

“This is fabulous technology,” said Ulysses Balis, a professor of pathology informatics at the University of Michigan who was not involved with this research. However, the paper raises an important question: How does this type of image content alteration impact diagnostic accuracy? Answering that would require additional research, he added. 

International Dataset Used to Test AI Tool 

To validate their technology, the researchers chose two disease models that commonly involve frozen sections—one for the brain and another for the lungs—and focused on subtyping glioma and non-small-cell lung cancer.  

These are relatively simple tasks, Mahmood acknowledged, but they can alter the course of a surgery. “We compared the performance of multiple pathologists when they were looking at just frozen section tissue and when they were looking at both the frozen section tissue as well as the transferred form that came from the AI,” Mahmood explained, “and found that they agreed more with each other when they also could see the AI.”  
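The article does not say which agreement statistic was used; in reader studies, inter-observer agreement is often summarized with Cohen’s kappa, which corrects for chance agreement. A minimal sketch with invented subtype labels:

```python
# Illustrative only: the study may have used a different agreement measure.
from sklearn.metrics import cohen_kappa_score

# Hypothetical subtype calls from two pathologists on ten slides.
reader_a = ["astro", "oligo", "astro", "gbm", "astro",
            "oligo", "gbm", "astro", "oligo", "gbm"]
reader_b = ["astro", "oligo", "gbm", "gbm", "astro",
            "astro", "gbm", "astro", "oligo", "gbm"]

kappa = cohen_kappa_score(reader_a, reader_b)
print(f"Cohen's kappa: {kappa:.2f}")  # 1.0 = perfect agreement, 0 = chance level
```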

Slide images can vary because of different standards and protocols for tissue processing, slide preparation, and digitization. With this in mind, the researchers indicated that it was crucial to see if their AI models trained on frozen section and FFPE tissue from The Cancer Genome Atlas (TCGA) could be generalized to clinical data from other sources.  

They scanned 132 brain samples and 166 lung samples from Turkey that were used as independent test sets to evaluate performance. “Essentially, we wanted to get data from a relatively low resource setting to see how it would work when the data is not prepared using the same kind of mechanical tools we often use here,” Mahmood said. The whole idea of using an external test set was to see how the AI model could adapt across an international cohort with tissue that has been prepared differently. “And we found that it adapts quite well.”  

The researchers conducted a reader study using slide images from the independent cohort. They found that the AI tool improves the odds of accurate tumor subtyping by 19% for grade 2 and grade 4 gliomas and by 16% for lung squamous cell carcinoma and lung adenocarcinoma.  
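Note that “improving the odds” is a statement about odds, not raw accuracy, and the two are easy to conflate. The arithmetic below shows how a 19% improvement in odds (an odds ratio of 1.19) would shift accuracy under an assumed baseline; the baseline value is invented purely for illustration.

```python
# Convert a reported odds ratio into a change in accuracy,
# given a hypothetical baseline accuracy (not from the study).
def boosted_accuracy(baseline_acc: float, odds_ratio: float) -> float:
    odds = baseline_acc / (1 - baseline_acc)   # probability -> odds
    new_odds = odds * odds_ratio               # apply the reported improvement
    return new_odds / (1 + new_odds)           # odds -> probability

# e.g., an assumed 80% baseline with a 1.19 odds ratio -> about 82.6%
print(boosted_accuracy(0.80, 1.19))
```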

Mahmood and colleagues have indicated that it may be possible to apply this type of technology more broadly to other forms of cancer and perhaps other diseases because the methods are, in a sense, generic. “So if you were to use this with, for example, thyroid cancer, you would need to retrain the algorithm on that tissue,” he said. Yet the techniques they have developed would essentially remain the same.  

How Is Diagnostic Accuracy Impacted? 

At a high level, the findings detailed in this paper are not unexpected, considering adversarial networks are known to be capable of imputing improved image quality, according to Balis, the University of Michigan professor who was not involved with the study. “But the thing that’s interesting about this is that it is scientifically an extrapolation, and that isn’t as well emphasized in the paper as it should be,” he told Diagnostics World.  

The upside is that “this is a tremendous opportunity for taking baseline medical imagery, in this case frozen section histopathology, and improving it with the presumptive expectation that this simplifies the pathologist’s life by removing the distractions of artifacts,” Balis continued. He called the quality of the revised images “superb” and said “you could argue that, at least aesthetically, the resultant images are free of distractions and are perhaps more pleasing to look at.”  

Yet he wonders if this veers into “hammer looking for a nail” territory. In this case, GANs have been applied to images to remove factors “perceived to be an intrusion on the diagnostic process,” he said. But “pathologists are trained—in doing frozen section consultative work—to expect and recognize the artifacts and just deal with it,” Balis pointed out, “so I don’t know that the artifacts necessarily are really so intrusive to the diagnostic process.”  

When considering the real-world, clinical practice utility of this technology, he said it is important to note that the primary data—the digital whole slide image—is being changed into a derivative work, which means the information presented to the pathologist is altered. Balis explained that changing the pixels changes the diagnostic content of the images, which raises the question: What effect does that have on diagnostic accuracy? “That has to be front and center of any approach that modifies the original image,” he added. 

“It is totally appropriate to be exploring the use of GANs, and I’m sure in the near future people will be applying generative models to improve these images, but I think the question is still largely unanswered,” he added. “Is the application of this technology to the frozen section diagnosis actually helpful, or is it simply creating an aesthetic continuum for the pathologist that, while beautiful to look at, is not adding diagnostic value?” 

Next Steps and Lingering Challenges  

Mahmood said he and colleagues are looking into a number of different ways to build upon this line of work. For example, they are exploring the possibility of building a 3D-printed microscope that would allow the user to see a corrected version of a frozen section slide.  

He noted they are also looking into the possibility of a larger clinical trial that would evaluate the efficacy of using transferred tissue to make diagnoses at scale. It would essentially amount to a much larger version of their small-scale reader study to arrive at a population-level evaluation.  

Although the researchers trained their AI model on samples from the TCGA—an extensive dataset that comes from multiple institutions in the United States—he and colleagues are interested in expanding this “to be much, much larger from many more institutions, potentially across the country and across the world to have something that would really be applicable to all kinds of data,” Mahmood explained.  

The variability in data depends upon how samples are scanned and prepared. They mitigated some of this by using an independent test cohort from Turkey, but there is a need for further validation and testing. “So it’s a first step,” he said. “It’s a proof of concept for what is possible.” 

One of the biggest lingering challenges is having data that are diverse enough to train the models effectively. “Of course, we can show that it works very well in this constrained setting. And we’re showing that it’s adaptable with an independent test cohort, but we really need to test this more extensively,” Mahmood said.  

In addition, evaluating these models is inherently difficult because the work involves synthetic data generation. There isn’t a great way to assess these AI models aside from showing the outputs to pathologists and asking questions, he added, so better techniques are needed to help evaluate efficacy downstream. 
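Automated image-similarity metrics do exist, but they score pixel-level fidelity rather than diagnostic value, which is part of the gap Mahmood describes. A minimal sketch using the structural similarity index (SSIM) from scikit-image, with random arrays standing in for a translated patch and a matched reference:

```python
# Illustrative only: SSIM measures structural similarity between two
# images, a weak proxy for whether a translated slide is diagnostically
# faithful. The arrays are random stand-ins, not real tissue patches.
import numpy as np
from skimage.metrics import structural_similarity

rng = np.random.default_rng(0)
translated = rng.random((256, 256))  # AI-enhanced frozen-section patch (fake)
reference = rng.random((256, 256))   # matched FFPE reference patch (fake)

score = structural_similarity(translated, reference, data_range=1.0)
print(f"SSIM: {score:.3f}")  # 1.0 = identical images
```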

Paul Nicolaus is a freelance writer specializing in science, nature, and health. Learn more at www.nicolauswriting.com.