By Dana Barberio
September 27, 2016 | Researchers at Stanford University and UCLA are bringing the latest in machine learning and digital image processing to bear on cancer diagnostics, expanding the toolkit for physicians.
Stanford University researchers are using machine learning to help evaluate lung cancer tissue on slides. The research, published on August 16 in Nature Communications (doi:10.1038/ncomms12474), was led by Daniel Rubin, Director of Biomedical Informatics at Stanford, and Michael Snyder, who directs the Stanford Center for Genomics and Personalized Medicine.
Distinguishing adenocarcinoma from squamous cell carcinoma of the lung is critical for chemotherapeutic selection and choice of targeted therapy; however, differentiating the two cancers visually is difficult, and whole slide tissue samples pose informatics challenges.
To overcome these problems, the Stanford researchers extracted quantitative histopathology features from lung cancer samples using a fully automated image-segmentation software pipeline, then used machine learning to turn the raw feature data into a mathematical model. This helped them pinpoint the tissue features that most powerfully predicted diagnosis and prognosis.
To test their pipeline, they acquired publicly available histopathology-stained, whole-slide images of lung adenocarcinoma and squamous cell carcinoma, along with corresponding prognostic data: 2186 samples from The Cancer Genome Atlas (TCGA), and 294 additional images from the Stanford Tissue Microarray (TMA) Database.
Using their image-segmentation pipeline, they captured ten thousand characteristics: “some of which the human eye can see, [but] most of which the human eye cannot see,” Daniel Rubin told Diagnostics World News. They then applied machine learning methods to select features that would best distinguish short-term and long-term survivors in stage I adenocarcinoma or squamous cell carcinoma using TCGA data as the training set. “There were 15 or so features, or classifiers, that were most informative for survivability and differentiating tumor types,” Rubin said.
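The selection step can be illustrated with a toy sketch. This is not the authors' code: the feature names, values, and the simple mean-separation score below are invented for illustration, standing in for the more sophisticated statistical selection used in the study.

```python
import statistics

# Toy data: each sample is a dict of quantitative image features.
# Feature names and values are invented for illustration.
short_term = [
    {"nucleus_area": 58.0, "texture_var": 9.1, "cell_density": 0.81},
    {"nucleus_area": 59.0, "texture_var": 6.0, "cell_density": 0.60},
    {"nucleus_area": 57.5, "texture_var": 8.0, "cell_density": 0.75},
]
long_term = [
    {"nucleus_area": 42.0, "texture_var": 5.2, "cell_density": 0.55},
    {"nucleus_area": 43.0, "texture_var": 7.5, "cell_density": 0.70},
    {"nucleus_area": 41.5, "texture_var": 6.1, "cell_density": 0.58},
]

def separation(feature):
    """Gap between group means, scaled by the pooled standard deviation."""
    a = [s[feature] for s in short_term]
    b = [s[feature] for s in long_term]
    pooled = statistics.pstdev(a + b)
    return abs(statistics.mean(a) - statistics.mean(b)) / pooled

# Rank features by how cleanly they separate the two survival groups,
# then a classifier would be trained on only the top-ranked ones.
features = sorted(short_term[0], key=separation, reverse=True)
print(features[0])  # the most discriminative feature in this toy data
```

In the real pipeline this ranking happened over roughly ten thousand features and two tumor types, reducing them to the handful Rubin describes.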
Next, the researchers tested the machine learning model on the TMA images as an independent test set, and successfully predicted patient outcomes.
“The nice thing about the method is that it is all computational so it doesn’t require additional costly lab work. Most of the cost is getting at the data if it’s not publicly available,” said Rubin. Researchers do face data-sharing hurdles in acquiring digital images of pathology slides, and datasets that integrate molecular and image data are limited. But they push forward. Next in line for testing: breast cancer samples, as there is a wealth of data available.
Machine learning seems to hold promise as one piece of the puzzle for cancer diagnostics. “We have found correlations between the molecular processes (reflected in the gene expression data) and these histopathology quantitative image features,” Rubin said. “We will probably do best when we combine these three pieces [i.e. gene expression, histopathology, and clinical data] into an integrated diagnostic.”
Fast Cells, Fast Answers
At UCLA, researchers are also pairing advanced image analysis with an ultrafast microscope, one that works like a slow-motion camera for light, to find cancer cells early.
A photonic time-stretch microscope developed by Bahram Jalali, professor and Northrop Grumman Opto-Electronic Chair in Electrical Engineering at UCLA, can photograph flowing blood at an astounding 36 million images per second, making it possible to distinguish rare cancer cells in a milieu of healthy white blood cells. The microscope detects these cancer cells without labeling them, a process that can damage the cells. Jalali and his team present this novel label-free imaging flow cytometry technique in a March Scientific Reports article (doi:10.1038/srep21471).
Photonic time-stretch has applications in particle accelerators and in the characterization of explosions; here, Jalali applies the technology to flow cytometry. “You can think of time-stretch as a slow motion technology,” similar to slow motion video, he explained. In the flow cytometry technique, cells flow rapidly through a tiny channel only one cell wide. The time-stretch microscope, using specially designed optics that boost image clarity, slows the image-carrying optical signal in time and captures images of the cells as they stream past.
“Much like a camera flash is used to take pictures recorded digitally, the technology uses flashes in the form of quadrillionth-of-a-second laser pulses,” said Jalali. The resulting signal is too weak and too fast to be detected with conventional instrumentation. Each flash captures a frame; the device stretches the signal in time so that it can be digitized, and amplifies it so that the image can be detected.
To make use of the millions of images the microscope captures, the time-stretch technology is coupled with deep learning analysis to distinguish cancer cells from healthy white blood cells. Deep learning, a subset of machine learning, builds algorithms that learn from unstructured data (data that is traditionally hard for a computer to process, such as images or audio) and then recognize new examples. For instance, to train a computer to recognize a picture of a cat, you would show it thousands of pictures of cats. This is the first application of deep learning to the label-free classification of cells, Jalali said.
“You end up with this big data challenge” using time-stretch technology, “something we initially hadn’t appreciated until this started to work,” said Jalali. The images generate some 100 GB of data. To distinguish cell types from each other using multiple characteristics, “you need a statistical method such as a machine-learning technique in which you can create a model that can be trained to distinguish cell types from each other,” he said. For this application, the technique extracted 16 physical characteristics of each cell, including biomass, size, and granularity, which could be used to distinguish healthy cells from cancerous ones.
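As a simplified stand-in for the group's deep-learning classifier, the sketch below trains a nearest-centroid classifier on invented per-cell feature vectors. The real system learns from 16 measured characteristics per cell; three made-up ones (size, biomass, granularity) suffice to show the idea.

```python
import math
import random

random.seed(0)

# Toy per-cell feature vectors (invented). Each cell is described by
# [size, biomass, granularity]; the cluster centers are assumptions.
def make_cells(center, spread, n):
    return [[random.gauss(c, spread) for c in center] for _ in range(n)]

healthy = make_cells([10.0, 1.0, 0.3], 0.5, 50)
cancer = make_cells([14.0, 2.5, 0.9], 0.5, 50)

# Nearest-centroid rule: a far simpler stand-in for a deep network.
def centroid(cells):
    return [sum(col) / len(cells) for col in zip(*cells)]

c_healthy, c_cancer = centroid(healthy), centroid(cancer)

def classify(cell):
    if math.dist(cell, c_cancer) < math.dist(cell, c_healthy):
        return "cancer"
    return "healthy"

correct = sum(classify(c) == "cancer" for c in cancer) + \
          sum(classify(c) == "healthy" for c in healthy)
print(f"accuracy on toy data: {correct / 100:.0%}")
```

On well-separated toy clusters like these, even this crude rule classifies nearly every cell correctly; the deep network earns its keep on real data, where the 16 features overlap far more.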
The results? The researchers identified colon cancer cells against a background of T cells (a type of white blood cell). In another application, they distinguished different lipid-accumulating algal strains, which are important in biofuel production. Their technology captured images of cells flowing at the unprecedented rate of 100,000 cells per second with minimal distortion, faster and 17% more accurate than conventional size-based techniques.
The deep learning program identified cancer cells with over 95% accuracy. Jalali believes that, together with time-stretch technology, deep learning will have many medical, biotech, and research applications. The UCLA group is supported by NantWorks as it moves into the pre-commercialization stage.