Tuesday, October 17, 2017

Announcement: Two new collaborations

To continue advancing towards our goals of improving patient health via integrated genomic and phenotypic data, the Monarch Initiative has begun two new, exciting collaborations.

Dr. David Osumi-Sutherland and Dr. Helen Parkinson, Head of Molecular Archival Resources at EMBL-EBI, are delighted to announce the beginning of a new collaboration between the EBI and the Monarch Initiative.

This collaboration will focus on integration of systematic phenotyping data from model organisms and biosamples, improvement of patient diagnosis using more deeply integrated model data, and improved rigor of semantic data integration for invertebrates, neural connectivity data, behavior, and molecular phenotypes. The Monarch team has collaborated with Drs. Parkinson and Osumi-Sutherland over many years on a variety of projects such as the International Mouse Phenotyping Consortium and the Global Alliance for Genomics and Health, and we are very excited to have funding from the NIH Office of the Director to support robust collaboration and integration with EBI resources.

Dr. Osumi-Sutherland is an expert in semantic data modeling and has directed and contributed to the Gene Ontology, the Virtual FlyBrain, FlyBase, and many other semantic standards. Dr. Parkinson is a trained geneticist and an expert on bioinformatics, biomedical ontologies, and knowledge engineering. The two will bring their vast experience with ontologies, biomedical data, model organisms, and biological data integration to assist in the development of the Monarch platform for variant prioritization and disease mechanism discovery.

The Monarch Initiative is also proud to announce our selection as a Driver Project for the Global Alliance for Genomics and Health (GA4GH) and their Connect Strategic Plan. GA4GH Connect will impact discovery, analysis, and interpretation of genomic data to enhance responsible sharing of this data by 2020.

Monarch is one of 13 international genomics groups selected as a Driver Project. Driver Projects were sourced from existing initiatives that have been leading genomic medicine research and data analysis and that were actively working with GA4GH in previous engagements. Monarch is thrilled to work closely with GA4GH as a Driver Project! Monarch’s leadership in developing the Human Phenotype Ontology (an IRDiRC recommended resource), the Exomiser variant prioritization tool that leverages cross-species genotype-phenotype data, Phenopackets, the patient-centered phenotyping tool Phenotypr, and the ClinGen/Monarch variant evidence modeling project is relevant to the various new GA4GH workstreams.

These Projects will focus on creating data sharing frameworks and standards for complex genetic data, with goals of identifying ways to responsibly and securely share genomic data as openly as possible. These new standards and frameworks will advance research, improve clinical management, and lead to the creation of new interoperable tools for data analysis between different groups, communities, and countries. A primary aim is to encourage collaboration so genomic datasets can be used, analyzed, and managed in a way that advances scientific discoveries that will benefit patients with rare and complex diseases.

“Healthcare is harnessing the power of genomics to make better diagnoses and treatment decisions in rare disease and cancer across the world,” said Ewan Birney, Director of EMBL-EBI and Chair of the GA4GH Steering Committee, when discussing GA4GH. “We have a responsibility to enable this future for everyone, and to harness the resulting data for further research on human health and fundamental biology.”

Dr. Melissa Haendel will co-lead the Phenotype and Clinical Capture technical workstream, and Dr. Peter Robinson will direct the Monarch driver project.

The full announcement was unveiled today at the GA4GH 5th Plenary Meeting. Also at this meeting, Monarch PI Dr. Melissa Haendel presented the Monarch driver project on October 16th. “The Monarch Initiative is ecstatic about our collaborations with EBI and GA4GH. There is a great opportunity to impact human health by building standards and tooling for genomic data interpretation using a wide variety of already available public data sources,” said Dr. Haendel. “These new collaborations and improved data sharing will push us closer to our goal of disease diagnosis and discovery.”

Friday, August 25, 2017

Tuesday, June 13, 2017

Redefining disease: an update from the NIH Undiagnosed Diseases Program

There are approximately 30 million Americans living with a rare disease, and only about 5% of those diseases have a known treatment. To better understand rare diseases and conditions and to support the diagnosis of patients, the NIH Undiagnosed Diseases Program (UDP) was created in 2008. The UDP has worked with over 700 patients, of whom about 40% are children. Between 25% and 50% of these patients now have a diagnosis, although time to diagnosis has varied from one week to four years[1]. The outlook is worse for patients outside of the UDP: time to diagnosis is 4.8 years on average but can take up to 20 years[2]. Historically, the UDP has selectively accepted only about 100 new patients a year, focusing on the hardest-to-diagnose patients, with the goal of diagnosing and treating these patients to improve their quality of life.

A recent article[3] by Boerkoel et al., co-written by several members of the Monarch Initiative, describes recent advances made at the NIH UDP. The article, published in Frontiers in Medicine, is titled “Defining Disease, Diagnosis, and Translational Medicine within a Homeostatic Perturbation Paradigm: The National Institutes of Health Undiagnosed Diseases Program Experience.” Historically, diseases have been defined by experts who write about them in prose. However, definitions in free text cannot easily be used by computer algorithms. The authors have pioneered new definitions of disease that capture logical relationships between diseases and symptoms; these rigorous definitions can then be leveraged in diagnostic software. This cutting-edge software is being used in the UDP diagnostic process, improving the speed and precision of diagnosis.

To perform the integrated analysis, the UDP group used a precise vocabulary of symptoms (the Human Phenotype Ontology, HPO) together with diagnostic software such as Exomiser, PhenIX, and Exome Walker. These tools allowed comparison of patient symptoms to HPO terms, and comparison of UDP patients' phenotypic profiles to those of humans and model organisms with known diseases. Compared to manual diagnosis alone, incorporation of Exomiser software boosted rates of diagnosis by 10-20%.
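To give a feel for what "comparing phenotypic profiles" means computationally, here is a minimal Python sketch. It is not the algorithm used by Exomiser or any other tool mentioned in this post. It swaps in a plain Jaccard set overlap between HPO term IDs, and the disease names and profiles are made up for illustration. Real tools also use the structure of the ontology, so that related but non-identical terms can still match.

```python
# Toy illustration only: Jaccard overlap is a simplified stand-in,
# NOT the scoring method used by Exomiser or other Monarch tools.

def jaccard(terms_a, terms_b):
    """Share of terms in common between two phenotype profiles (0.0 to 1.0)."""
    a, b = set(terms_a), set(terms_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_diseases(patient_terms, disease_profiles):
    """Return (disease name, score) pairs, best match first."""
    scored = [
        (name, jaccard(patient_terms, terms))
        for name, terms in disease_profiles.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Patient profile as a set of HPO term IDs:
# HP:0001250 Seizure, HP:0001263 Global developmental delay,
# HP:0000252 Microcephaly
patient = {"HP:0001250", "HP:0001263", "HP:0000252"}

# Hypothetical disease profiles, invented for this example.
profiles = {
    "toy_disease_A": {"HP:0001250", "HP:0001263", "HP:0000252", "HP:0001249"},
    "toy_disease_B": {"HP:0000365", "HP:0001250"},
}

ranking = rank_diseases(patient, profiles)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

The idea carries over to model organisms: once a mouse or fly phenotype is mapped into a shared vocabulary, the same kind of comparison can be made across species.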

To further complicate disease diagnosis, it is estimated that each person carries between 5 and 50 genetic variations that could eventually lead to a disease[4]. This amount of genetic variation can make it hard to figure out what disease the patient has. For a group of UDP patients with abnormalities of the nervous system but for whom there was no certain disease-causing gene, Exomiser prioritized 11 genes that were likely candidates for causing nervous system disease in several of the UDP patients; however, the gene variations observed in patients were not previously described in the scientific literature. To test the hypothesis, laboratory experiments were done in fruit flies engineered to have the same gene variations as the patients. In each case, the altered gene led to behavioral defects and a shortened lifespan in the fly. Taken together, these results suggest that each of the 11 genes is important for nervous system function in humans. These results also support the need for laboratory animals in rare disease research, both for diagnosis and for the development of therapies.

Importantly, the UDP group focused on making their research translational and scalable by building a “village” of global experts. The group created virtual communities to enable collaboration across the globe to increase knowledge of rare diseases and improve patient diagnosis. While the UDP is less than a decade old, their work will have a long-lasting impact on previously undiagnosed patients and their families, within the UDP and beyond.

A portion of the UDP data is made publicly available on the Monarch Initiative website, with a focus on comparing UDP data to model organisms and known diseases.

Monday, April 17, 2017

Biocuration 2017 Conference Highlights

Several members of the Monarch Initiative converged on sunny Palo Alto, CA for the International Society for Biocuration (ISB) annual meeting. The meeting, held at Stanford University from March 26-29, 2017, brought together experts in the field of biocuration and included presentations, posters, and workshops on the topics of curating biological data, database creation and maintenance, community annotation, and education. Between brief, but glorious, moments of sun soaking (reminder: several of us are from rainy Portland), we enjoyed meeting our fellow biocurators and presenting talks and posters. Here’s a brief overview of the presentations from four Monarch team members, including links to the slides and posters in case you missed them!

Melissa Haendel, Monarch co-PI, presented a talk titled “How Open is Open?” discussing the open science principles of FAIR TLC, which stands for making data Findable, Accessible, Interoperable, Reusable, Traceable, Licensed, and Connected. Melissa discussed how we can use the FAIR TLC principles to evaluate open biological databases and repositories and go beyond traditional evaluation metrics, such as publication numbers. You can view her talk slides here: bit.ly/biocurMH

New Monarch team member, Lilly Winfree, presented in the precision medicine session on how Monarch uses various ontologies to semantically integrate genotype and phenotype data from multiple species with a goal of disease diagnosis for patients. Lilly explained how Monarch semantically integrates this data using the ontologies GENO (for genotypes) and SEPIO (Scientific Evidence and Provenance Information Ontology), which have been spearheaded by Monarch ontologist Matt Brush. Lilly also showed a use case of the Exomiser tool, which is currently being utilized by Genomics England to identify pathogenic variants. You can view Lilly’s slides here.


Chris Mungall, Monarch co-PI, was awarded the ISB Exceptional Contributions to Biocuration Award. Congrats, Chris! In his acceptance talk, Chris described his career path, including his brief stint as a chicken farmer! Or maybe not quite a chicken farmer...you’ll have to ask Chris for the details. We also learned that Chris and fellow award-winner Marc Feuermann share a love of science fiction, and have both authored sci-fi stories.


Nicole Vasilevsky, Monarch project manager, presented two posters, both of which won a best poster award at the conference! Nicole also won an award for a community annotation contest, hosted by GigaScience, using the iCLiKVAL and Hypothes.is tools. Great work, Nicole!

Nicole’s poster titled “Training future biocurators through data science trainings and open educational resources” was co-authored by several OHSU faculty: Ted Laderas (DMICE), Jackie Wirz, Bjorn Pederson (DMICE), David Dorr (DMICE), Bill Hersh (DMICE), Shannon McWeeney (DMICE) and Melissa Haendel. The poster, available here, described development of in-person data science trainings offered as short courses to OHSU students, and the development of Open Educational Resources (OERs) that are available online (dmice.ohsu.edu/bd2k). Conference attendees were particularly interested in the BD2K tutorials on topics related to biocuration (such as BDK05 on Data Standards and BDK12 on Data annotation and Curation), as there is a lack of formal training in biocuration. Several biocuration efforts discussed at the conference involved crowd-sourced efforts, so these tutorials will be useful for training contributors to these community databases. I encourage you to check out these interesting tutorials!

The second award-winning poster titled “A need for better data sharing policies: a review of data sharing policies in biomedical journals” described a project led by Robin Champieux and co-authored by Nicole, Jessica Minnier (from the OHSU-PSU School of Public Health) and Melissa Haendel, and is available here. This poster described an analysis of biomedical journal data sharing policies. It is widely agreed that data sharing is important for ensuring transparency of research results and scientific reproducibility (and data sharing will certainly facilitate biocuration efforts to extract information from the literature into databases). This analysis showed that approximately 40% of journals (in our sample) either required or strongly encouraged data sharing upon publication. The data from this analysis is shared here (which includes a list of journals that require or encourage data sharing) and we hope that researchers will publish in those journals that require data sharing. A preprint is available here, and the manuscript has been accepted for publication in PeerJ and will be available soon.

Photos from the conference can be viewed

Biocuration F1000 channel presentations:

Written by Nicole Vasilevsky and Lilly Winfree

Monday, March 20, 2017

Meeting of the Minds: Monarch All Hands Meeting 2017

At the end of February, the global members of the Monarch Initiative convened at the Jackson Laboratory in Farmington, Connecticut for our annual All-Hands Meeting. This collaborative meeting allowed us to set goals for 2017, have hands-on working time for various projects, bond over an epic hibachi meal, and compete in giant Jenga. Since the Monarch team works around the globe, the All-Hands Meeting was a unique chance for everyone to gather in the same room. As a new member of the team, I particularly enjoyed meeting the rest of my coworkers in person instead of over video chat! The meeting was a big success, yet it ended on a dramatic note when several of our flights were canceled, resulting in three-hour-long taxi rides and an impromptu trip to Waffle House.

In this post I will mention some of the highlights from the meeting as well as the goals we discussed for the upcoming year and how these goals fit into three main themes: ontologies, tools, and collaborations.

A hallmark of the Monarch Initiative is work on the underlying ontologies, which was reflected during the meeting with discussions on several phenotype and disease ontologies. Monarch team members and colleagues presented work on the Human Phenotype Ontology (HPO), the Mammalian Phenotype Ontology, and Upheno, which is the "uber phenotype ontology" that combines all species-specific ontologies. We also heard about advances with the Disease Ontology and the Merged ONtology of Disease Objects (MonDO). Goals for the upcoming year include further developing the HPO and MonDO and increasing the ontology browsing capability on the Monarch webpage. The Monarch Team is also composing two ontology-rich manuscripts that will be published in the near future: one detailing the inclusion of lay person synonyms into the HPO, and the second explaining the development of the Genotype Ontology GENO. You can read more about how we incorporate these ontologies here: https://monarchinitiative.org/page/about.

Nicole Vasilevsky, Monarch Project Manager, describes MonDO as being "created using novel mechanisms to semi-automatically merge multiple disease resources to yield a coherent merged ontology. This ontology should aid users to find relevant information related to diseases of interest." Importantly, Nicole was designated as the MVP of the meeting as she actively contributed to Monarch ontologies during the meeting! Good work, Nicole!

During the meeting, there were several fascinating demos on Monarch tools as well, such as the Exomiser. Clinicians and researchers can use the Exomiser, which is a Java tool, to analyze genomic information for disease-causing variants and also compare phenotypes across species using the PhenoDigm algorithm. We also learned how to use the Monarch tools JAnnotator and PhenoTyper, which automatically select phenotype terms from journal article text, allowing those terms to then be used in phenotypic comparisons. Another interesting tool developed by the Monarch team is PhenoPackets, which I wrote about for Open Data Day. Many of these tools are still being actively designed, and we would love feedback (via email or a GitHub issue) if you or your group have used these tools.

During the first day of our meeting, we had the pleasure of hearing from several of Monarch’s guests and collaborators. First, Sanford Imagenetics discussed precision public health and the role that Monarch tools can play in advancing patient health. Next, David Adams from the Undiagnosed Diseases Program spoke about how Monarch and the NIH can work together to improve diagnosis of rare diseases. Jean-Philippe “JP” Gourdine, research associate in glycobiology and metabolomics at OHSU, gave an interesting talk in which he introduced the use of metabolomics data for gene prioritization in the context of the Undiagnosed Diseases Network. JP and Matt Brush, Monarch Ontologist, also discussed the integration of the molecular glycophenotypes ontology (MGPO) into the HPO. Our next guest came all the way from Paris to represent Orphanet; Annie Olry explained the ongoing analytical work behind Orphanet’s portal on rare diseases. The next three presentations came from members of the Jackson Laboratory, our gracious host for the meeting. Judith Blake and Cindy Smith demonstrated how the Mouse Genome Informatics resource is incorporating the Disease Ontology. The next presenter was Elissa Chesler, who showed us GeneWeaver, which is an interesting tool that can integrate heterogeneous genomic data. To round out the session, Sue Mockus discussed Jackson’s Clinical Knowledgebase tool for exploring genomic profiles. These presentations highlighted the important role that collaboration plays in the Monarch Initiative -- we thank our guests for joining us and helping make this Monarch All Hands meeting a success!


Photos from the meeting: epic Jenga battle, leadership dinner, code time, sporting Monarch T-shirts, before the taxi rides, and hibachi dinner.

Sunday, March 5, 2017

Open Data Day spotlight on PhenoPackets

To celebrate Open Data Day 2017, I want to highlight one of the Monarch Initiative’s innovative data sharing tools: the PhenoPacket. At Monarch we playfully refer to the PhenoPacket concept as a “bag of phenotypes” to describe patients. If you aren’t a researcher or clinician, you are probably wondering what a “phenotype” is. A phenotype can be simply defined as the patient signs and symptoms associated with a disease, or more technically defined as the physical manifestation of the combined effects of a person’s genes and their environment. PhenoPackets are a novel way to systematically organize and share the data associated with a patient’s phenotypes.

Currently, data about a patient’s phenotypes is collected by doctors and researchers and can be found in publications, databases, electronic health records, clinical trials, and even social media. This wide variation in data creation leads to diverse data that is not standardized or in a central location, so it is very difficult to see patterns and connections between patients among this data. PhenoPackets were designed to solve these large data integration and computation problems that researchers and clinicians face when collecting or working with phenotype data.

Creating phenotype data that is sharable and standardized allows clinicians and researchers to use this data to improve patient diagnosis. A goal of PhenoPackets is to create data that is more uniform and therefore computationally useful. For a doctor diagnosing a patient, this searchable, comparable phenotype data will allow the clinician to compare their patient to known diseases and other patients around the world, increasing the accuracy of diagnosis. When a patient comes to see their doctor, their PhenoPacket could be created from the phenotypes the doctor sees, and this information could follow the patient to other clinics and could also be published (without personal health information) in journals and databases or even on the Web directly.

What kind of data actually makes up a PhenoPacket? Each patient would have a collection of phenotypes that would be coupled with any available genomic information (e.g. sequencing data), and the data in their PhenoPacket would include: age of onset, symptoms, family history, and quantitative values related to specific phenotypes, with evidence for each of these categories. The phenotypes are described with common terms using the Human Phenotype Ontology (HPO), allowing for integration of data and computational analysis. The HPO is now incorporated into the Unified Medical Language System (UMLS). Importantly, PhenoPackets do not contain any personal health information, so each patient can remain anonymous in publications or databases. On the technical side, PhenoPackets are encoded in JSON or YAML, but are programming-language neutral, with implementation tools in several languages. You can read more about the code behind PhenoPackets and try out a tutorial here: https://github.com/phenopackets/phenopacket-format/wiki/Getting-Started
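For readers who like to see things concretely, here is a rough Python sketch of the "bag of phenotypes" idea encoded as JSON. The field names (`age_of_onset`, `family_history`, `term_id`, `evidence`) and the `build_packet` helper are assumptions made up for this illustration and do not follow the official PhenoPacket schema; please see the tutorial linked above for the real format.

```python
import json


def build_packet(packet_id, age_of_onset, phenotypes, family_history=None):
    """Assemble a toy phenotype packet.

    Field names are hypothetical and chosen for readability;
    they are not the official PhenoPacket schema.
    """
    return {
        "id": packet_id,  # pseudonymous ID, no personal health information
        "age_of_onset": age_of_onset,
        "family_history": family_history or [],
        "phenotypes": [
            {"term_id": term_id, "label": label, "evidence": evidence}
            for term_id, label, evidence in phenotypes
        ],
    }


packet = build_packet(
    "patient-1",
    "infantile",
    [
        ("HP:0001250", "Seizure", "clinician observation"),
        ("HP:0001263", "Global developmental delay", "clinician observation"),
    ],
)

# Serialize to JSON for sharing, then read it back to show that the
# round trip preserves the data.
packet_json = json.dumps(packet, indent=2, sort_keys=True)
restored = json.loads(packet_json)
print(packet_json)
```

Because each phenotype is stored as an HPO term ID and not as free text, two packets written by different clinics can be compared term by term.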

PhenoPackets are supported by the Monarch Initiative and other communities of researchers to aid in disease diagnosis and personalized medicine, as well as model organism research. The creation of more standardized phenotype data that is openly shared amongst researchers and clinicians will lead to more large scale phenotypic data analysis, which can improve patient diagnosis and outcomes and mechanistic discovery.

Read more about the HPO here: