By Allison Proffitt
November 17, 2015 | Boston Children’s Hospital has announced the results of the CLARITY Undiagnosed challenge, the second CLARITY crowdsourced competition sponsored by the Manton Center for Orphan Disease Research at Boston Children's and the Department of Biomedical Informatics (DBMI) at Harvard Medical School.
The first CLARITY challenge, launched in 2012, focused on identifying mutations underlying rare disorders. CLARITY Undiagnosed asked teams to interpret DNA sequence data and medical information to identify molecular diagnoses for five families with as-yet undiagnosed conditions, and return clinically useful reports to doctors and families.
Nationwide Children's Hospital (Columbus, Ohio) was named the winner and awarded $25,000. Invitae Corporation (San Francisco) and WuXi NextCODE Genomics (Cambridge, Mass.) were named runners-up. Twenty-six teams from seven countries registered to participate; 21 teams submitted reports.
Each team received whole genome FASTQ files sequenced on an Illumina platform at about 30x-35x coverage, said Alan Beggs, co-organizer of the Challenge and director of Boston Children's Manton Center. Most teams used the Picard aligner and a “large number” used GATK to annotate the BAM files to generate VCFs.
“Our conclusion last time was that the informatics aspects of this were starting to converge,” said Beggs. But CLARITY 2 exposed disparity in the ways variants are translated into medical findings, especially in particularly challenging cases.
None of the patient families that participated in the challenge received what Beggs called a “smoking gun” mutation, and most of what was returned were variants of unknown significance.
That’s not too surprising; the cases were chosen because they were uncommonly challenging. “Two-thirds of cases don’t get solved among all chronic conditions and we had five cases that were particularly difficult,” Beggs said, “but in a number of cases there were some interesting candidates that are now useful for follow up.”
But there was not strong concordance among the different groups.
“What it says is that when groups are left searching for something, what you decide to report as potentially significant is very variable,” Beggs said. “It depends partly on how you look and it depends in large part on the particular views… or algorithm that’s being used to relate genes to phenotypes.” How clinicians weight phenotype is also very significant, he added.
Variant classification was also inconsistent between teams. “If six or seven teams identified a particular variant in a particular gene, what we found was that maybe four of them classified it as a VUS, one of them said possibly pathogenic, and the other one said uncertain significance. They’re all sort of saying the same thing, but one of them is weighting it a little bit more,” Beggs said.
Reporting Best Practices
Beggs said that the process of digging deeply into the 21 submitted reports is only just beginning; the CLARITY organizers plan to publish their findings, and he expects that to take several months. Some best practices, though, are already clear, particularly around what information should be included in reports and how it should be presented.
“It’s very important to list in a tabular format the variants that you’re reporting on,” Beggs said. “You need a gene symbol; you need a chromosomal location; you need to specify which version of the genome you are using: is it hg19 or something else? If you cite the effect of that variant on a protein sequence, you need to list the accession number for your reference sequence. For example, amino acid 40 in one transcript might be amino acid 72 in another transcript.”
There should be an explanation for why a variant is listed, Beggs continued, a rationale for how a gene might be responsible for the patient’s phenotype and how a variant in that gene might be pathogenic.
“Without all this information, it can be impossible to properly interpret or reproduce some of the findings,” he said.
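The reporting fields Beggs lists can be pictured as one row of a table. The sketch below is only an illustration of that idea, not a format used by the challenge or by any team: the class name, field names, and helper functions are hypothetical, and the example values are placeholders rather than a real variant.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass(frozen=True)
class ReportedVariant:
    """One row of a variant table; field names are illustrative, not a standard."""
    gene_symbol: str
    chromosome: str
    position: int              # coordinate on the stated genome build
    genome_build: str          # e.g. "hg19"
    transcript_accession: str  # reference sequence behind the protein change
    protein_change: str
    classification: str        # e.g. "VUS"
    rationale: str             # why the variant is listed at all


def missing_fields(variant: ReportedVariant) -> List[str]:
    """Return the names of fields left empty, so incomplete rows can be flagged."""
    return [f.name for f in fields(variant) if getattr(variant, f.name) in ("", None)]


def to_table(variants: List[ReportedVariant]) -> str:
    """Render the variants as tab-separated text with a header row."""
    header = [f.name for f in fields(ReportedVariant)]
    rows = ["\t".join(header)]
    for v in variants:
        rows.append("\t".join(str(getattr(v, name)) for name in header))
    return "\n".join(rows)


# Placeholder values only; this is not a variant from the challenge.
example = ReportedVariant(
    gene_symbol="GENE1",
    chromosome="chrX",
    position=123456,
    genome_build="hg19",
    transcript_accession="NM_000000.0",
    protein_change="p.Arg40Trp",
    classification="VUS",
    rationale="Placeholder: gene function overlaps the patient's phenotype.",
)

print(to_table([example]))
```

Keeping the genome build and transcript accession in every row means a reader never has to guess which coordinate system or transcript a position refers to, which is the ambiguity Beggs describes.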
The winning teams were distinguished by their reports, Beggs said. Nationwide Children’s Hospital issued a packet of reports for each family, which Beggs said set their entry apart, and “really did an outstanding job of explaining things.”
Each packet included a report for the referring physician that was technically written but “still quite understandable,” a counseling letter addressed to the family that was written by the hospital’s genetic counselor, and an overview of the process that listed the variants that had passed the various testing criteria, Beggs explained.
Combined, the reports were thorough and useful. “Theoretically a referring physician or a researcher who was highly motivated and wanted to explore further could go in and look at more information,” he said.
Taking On the Toughest of the Tough
Peter White, director of the Biomedical Genomics Core and director of Molecular Bioinformatics at The Research Institute at Nationwide Children's Hospital, led the Nationwide Children’s team. Nationwide has been sequencing about 300 whole exome samples a month, and 5-10 whole genomes, but emphasis is shifting toward clinical whole genomes, he said.
White viewed the CLARITY challenge as an opportunity for Nationwide to test its abilities and learn from others. “It would be good for us as an institute to put a team together and start working as a team on solving some of these challenges around genomics,” he reasoned, “and I think we could also learn from some of the other teams that enter.”
White said he was “stunned” when Nationwide won. In fact, the team at Nationwide Children’s Hospital almost gave up.
To tackle the five CLARITY patients, White assembled a group of human geneticists, molecular geneticists, a neurologist, big data scientists, bioinformaticians, computer scientists, and three genetic counselors. Various parts of the team had worked together before, but never all on the same project.
“It’s great to start these conversations between the clinical folks and the research folks,” White said, “because as we move forward with the challenges around diagnosing patients with these more complicated genetic disorders it really is going to take a team approach like this.”
The Nationwide team used their own algorithm for secondary analysis: Churchill, a pipeline that Nationwide offers free to other academic groups and is licensed as a SaaS offering through GenomeNext.
Churchill can go from FASTQ files to VCF in about 70 minutes, White explained, and the algorithm gets “much higher accuracy in the variant calls than with the Illumina pipeline.” The annotation helped the team do a better job of producing comprehensive reports, he said.
But even then, at one point White said the team was frustrated with not finding any results.
“In the case of the CLARITY challenge, these were really difficult cases. There was definitely one point in the competition where most of the team was like, ‘We’re not finding anything. Maybe we should just give up. This is too hard!’” White said. “I’m glad I kept them going,” he laughed.
In the end, Nationwide’s work earned it $25,000 and top honors.
Looking ahead, White hopes to apply some of the lessons learned from the CLARITY challenge to the clinic at Nationwide, but he admits that a team the size of the one he assembled here is not scalable.
Nationwide does want to take on more families with undiagnosed disease, initially aiming for 100-300 samples a year, and White said that the CLARITY team will serve as a model for future teams. But he thinks the size and makeup of the current team will only scale to a few hundred cases a year.
“Once you start doing thousands of patients, it’s obviously going to have to be far more automated,” he said. “In the future if we are sequencing every patient this definitely wouldn’t scale.”
Negative Findings
Invitae was named one of the competition’s two runners-up. Beggs said he and the team were particularly impressed that their clinical reports described the limitations of their findings, including lists of candidate genes they examined and ruled out.
“They listed some of the candidate genes that they had ruled out. In a number of [other] cases, teams didn’t do that,” Beggs explained. “Negative information can be just as important to patients as positive information, even though they know negative information doesn’t rule something out.”
We all know that the sensitivity of these tests is unknown and well below 100%, Beggs continued, but practically naming the genes that were ruled out can be so helpful for patients. For one of the CLARITY 2 patients, a 64-year-old former mountain climber with unexplained loss of motor function, the competition was able to rule out a diagnosis of amyotrophic lateral sclerosis (ALS).
“The CLARITY challenge represents the future of genome diagnostics, bringing together academic and commercial labs to share best practices. We learned a lot through this and were happy to be able to participate,” said Michele Cargill, a co-founder of Invitae.
Back to the Raw Data
WuXi NextCODE was the other runner-up. WuXi NextCODE enjoyed the challenge, Hannes Smárason, the company’s CEO said, and was pleased that it drew attention to the need for genomic diagnostics.
“What sets our approach apart is making solving tough cases scalable. So we were very happy to have the chance to demonstrate our unique approach and world-class genome interpretation system as one of the winners,” Smárason said via email. “We hope that doing so helps to advance that goal of bringing genome diagnostics closer to being standard of care in undiagnosed diseases.”
Alan Beggs said that WuXi NextCODE’s report stood out because it identified a possible intragenic deletion in a gene on the X chromosome in a male patient. “The deletion—if it exists—is a mosaic, de novo deletion,” Beggs said. “The reason they were able to do that is they looked at the individual reads and calculated read density across a number of genes. They went back to the raw data!”
The deletion is only a possibility, but it’s one of the findings Beggs is most excited about. “We are now experimentally confirming it,” he said. “Whether or not it’s real, it’s very meaningful that they identified it and flagged it… WuXi NextCODE is the only team that reported that and we happen to think it’s a strong candidate. It may still turn out not to be real; it hasn’t been confirmed. But it’s suspicious enough and relevant enough that they got credit for that.”
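As a rough illustration of what comparing read density across genes can look like, the sketch below scales per-region read counts by each sample's total and flags regions where the patient falls well below a comparison sample. It is not WuXi NextCODE's method, which the article does not describe in detail. The function names, the region labels, the counts, and the 0.8 cutoff are all assumptions made for the example.

```python
from typing import Dict, List


def normalized_density(read_counts: Dict[str, int]) -> Dict[str, float]:
    """Scale each region's read count by the sample total so samples are comparable."""
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {region: count / total for region, count in read_counts.items()}


def flag_low_density(
    patient: Dict[str, int],
    reference: Dict[str, int],
    max_ratio: float = 0.8,  # arbitrary illustrative cutoff, not a validated threshold
) -> List[str]:
    """Return regions whose patient/reference density ratio falls below max_ratio."""
    patient_density = normalized_density(patient)
    reference_density = normalized_density(reference)
    flagged = []
    for region in sorted(patient_density):
        ref = reference_density.get(region, 0.0)
        if ref == 0.0:
            continue  # no comparison possible without reference coverage
        if patient_density[region] / ref < max_ratio:
            flagged.append(region)
    return flagged


# Made-up counts: regionC has roughly half the expected reads in the patient,
# the kind of partial drop a mosaic deletion might produce.
patient_counts = {"regionA": 1000, "regionB": 1000, "regionC": 550}
reference_counts = {"regionA": 1000, "regionB": 1000, "regionC": 1000}

print(flag_low_density(patient_counts, reference_counts))
```

A flagged region in a sketch like this is only a lead. As Beggs notes, a candidate deletion still has to be confirmed experimentally.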
Focused on Caring
WuXi NextCODE’s finding was for a patient named Jeremy, a seven-year-old boy whose medical issues began at birth. Jeremy was tiny when he was born, and didn’t gain weight well. He became blind by 4 months old due to craniosynostosis, in which the plates that form the skull fuse too soon, putting pressure on the brain, which has no space to expand. Jeremy has had 38 surgeries to date, many to separate the skull bones and relieve the pressure. After surgery, his symptoms are relieved for a time and his vision returns, but his skull bones have repeatedly re-fused, causing the pressure in his head to increase again.
Angela Hobbs, Jeremy’s mother, said the family has crisscrossed the country looking for answers from doctors in Texas, Washington, D.C., Ohio and more over the past seven years. The family had already had many genetic tests done including whole exome sequencing, but the CLARITY challenge marked their first whole genome sequencing. Jeremy, his parents, and one of his siblings were sequenced.
The Hobbses have enrolled Jeremy in the Manton Center, where Beggs said they hope to confirm the mutation in his blood, saliva and possibly a skin sample, and then to determine the degree of mosaicism and the mutation’s role in Jeremy's health problems.
Angela Hobbs said the family is surprised and excited about the path forward. “He’s just had so much testing done before, and we were used to getting: ‘Sorry we didn’t find anything.’ We didn’t want to get our hopes up.”
The reports generated as part of the challenge haven’t been returned to the families, but Angela sees other best practices in the CLARITY model that she hopes will catch on in rare disease diagnosis. Her experience with CLARITY was different from previous research studies Jeremy was a part of that she described as “competitive and closed.”
“We’ve been in other research studies that we were excited about, then when it happened they wouldn’t give us any information, or tell any of our doctors any information because they said, ‘It’s a research study; we would only tell you if we were sure about a finding.’ I understand they don’t have anything major to tell us, but we wanted some information. We think, ‘Why did we put our son through this?’” Hobbs said. “This study has been so focused on caring for our son, and not focused on research groups wanting to write a research paper and keep everything completely silent. But this study, immediately from the start, as soon as they had sequencing results they forwarded it to our doctors. It’s just been so great.”