Biocuration 2017 Conference Highlights

Several members of the Monarch Initiative converged on sunny Palo Alto, CA for the International Society for Biocuration (ISB) annual meeting. The meeting, held at Stanford University from March 26-29, 2017, brought together experts in the field of biocuration and included presentations, posters, and workshops on curating biological data, database creation and maintenance, community annotation, and education. Between brief, but glorious, moments of sun soaking (reminder: several of us are from rainy Portland), we enjoyed meeting our fellow biocurators and presenting talks and posters. Here’s a brief overview of the presentations from four Monarch team members, including links to the slides and posters in case you missed them!

Melissa Haendel, Monarch co-PI, presented a talk titled “How Open is Open?” discussing the open science principles of FAIR TLC. Here, FAIR TLC stands for making data Findable, Accessible, Interoperable, Reusable, Traceable, Licensed, and Connected. Melissa discussed how we can use the FAIR TLC principles to evaluate open biological databases and repositories and go beyond traditional evaluation metrics, such as publication numbers. You can view her talk slides here:

New Monarch team member, Lilly Winfree, presented in the precision medicine session on how Monarch uses various ontologies to semantically integrate genotype and phenotype data from multiple species with a goal of disease diagnosis for patients. Lilly explained how Monarch semantically integrates this data using the ontologies GENO (for genotypes) and SEPIO (Scientific Evidence and Provenance Information Ontology), which have been spearheaded by Monarch ontologist Matt Brush. Lilly also showed a use case of the Exomiser tool, which is currently being utilized by Genomics England to identify pathogenic variants. You can view Lilly’s slides here.


Chris Mungall, Monarch co-PI, was awarded the ISB Exceptional Contributions to Biocuration Award. Congrats, Chris! In his acceptance talk, Chris described his career path, including his brief stint as a chicken farmer! Or maybe not quite a chicken farmer; you’ll have to ask Chris for the details. We also learned that Chris and fellow award-winner Marc Feuermann share a love of science fiction, and have both authored sci-fi stories.


Nicole Vasilevsky, Monarch project manager, presented two posters, both of which won best poster awards at the conference! Nicole also won an award in a community annotation contest, hosted by GigaScience, using the iCLiKVAL and tools. Great work, Nicole!

Nicole’s poster titled “Training future biocurators through data science trainings and open educational resources” was co-authored by several OHSU faculty: Ted Laderas (DMICE), Jackie Wirz, Bjorn Pederson (DMICE), David Dorr (DMICE), Bill Hersh (DMICE), Shannon McWeeney (DMICE) and Melissa Haendel. The poster, available here, described the development of in-person data science trainings offered as short courses to OHSU students, and the development of Open Educational Resources (OERs) that are available online. Conference attendees were particularly interested in the BD2K tutorials on topics related to biocuration (such as BDK05 on Data Standards and BDK12 on Data Annotation and Curation), as there is a lack of formal training in biocuration. Several biocuration efforts discussed at the conference involved crowd-sourcing, so these tutorials will be useful for training contributors to these community databases. I encourage you to check out these interesting tutorials!

The second award-winning poster titled “A need for better data sharing policies: a review of data sharing policies in biomedical journals” described a project led by Robin Champieux and co-authored by Nicole, Jessica Minnier (from the OHSU-PSU School of Public Health) and Melissa Haendel, and is available here. This poster described an analysis of biomedical journal data sharing policies. It is widely agreed that data sharing is important for ensuring transparency of research results and scientific reproducibility (and data sharing will certainly facilitate biocuration efforts to extract information from the literature into databases). This analysis showed that approximately 40% of journals (in our sample) either required or strongly encouraged data sharing upon publication. The data from this analysis is shared here (which includes a list of journals that require or encourage data sharing) and we hope that researchers will publish in those journals that require data sharing. A preprint is available here, and the manuscript has been accepted for publication in PeerJ and will be available soon.

Meeting of the Minds: Monarch All Hands Meeting 2017

At the end of February, the global members of the Monarch Initiative convened at the Jackson Laboratory in Farmington, Connecticut, for our annual All-Hands Meeting. This collaborative meeting allowed us to set goals for 2017, have hands-on working time for various projects, bond over an epic Hibachi meal, and compete in giant Jenga. Since the Monarch team works around the globe, the All-Hands Meeting was a unique chance for everyone to gather in the same room. As a new member of the team, I particularly enjoyed meeting the rest of my coworkers in person instead of over video chat! The meeting was a big success, yet it ended on a dramatic note when several of our flights were canceled, resulting in three-hour-long taxi rides and an impromptu trip to Waffle House.

In this post I will mention some of the highlights from the meeting as well as the goals we discussed for the upcoming year and how these goals fit into three main themes: ontologies, tools, and collaborations.

A hallmark of the Monarch Initiative is work on the underlying ontologies, which was reflected during the meeting with discussions on several phenotype and disease ontologies. Monarch team members and colleagues presented work on the Human Phenotype Ontology (HPO), the Mammalian Phenotype Ontology, and uPheno, which is the "uber phenotype ontology" that combines all species-specific ontologies. We also heard about advances with the Disease Ontology and the Merged ONtology of Disease Objects (MonDO). Goals for the upcoming year include further developing the HPO and MonDO and increasing the ontology browsing capability on the Monarch webpage. The Monarch Team is also composing two ontology-rich manuscripts that will be published in the near future: one detailing the inclusion of layperson synonyms into the HPO, and the second explaining the development of the Genotype Ontology GENO. You can read more about how we incorporate these ontologies here:

Nicole Vasilevsky, Monarch Project Manager, describes MonDO as being "created using novel mechanisms to semi-automatically merge multiple disease resources to yield a coherent merged ontology. This ontology should aid users to find relevant information related to diseases of interest." Importantly, Nicole was designated as the MVP of the meeting as she actively contributed to Monarch ontologies during the meeting! Good work, Nicole!

During the meeting, there were several fascinating demos on Monarch tools as well, such as the Exomiser. Clinicians and researchers can use the Exomiser, which is a Java tool, to analyze genomic information for disease-causing variants and also compare phenotypes across species using the PhenoDigm algorithm. We also learned how to use the Monarch tools JAnnotator and PhenoTyper, which automatically select phenotype terms from journal article text, allowing those terms to then be used in phenotypic comparisons. Another interesting tool developed by the Monarch team is PhenoPackets, which I wrote about for Open Data Day. Many of these tools are still being actively designed, and we would love feedback (via email or a GitHub issue) if you or your group have used these tools.

During the first day of our meeting, we had the pleasure of hearing from several of Monarch’s guests and collaborators. First, Sanford Imagenetics discussed precision public health and the role that Monarch tools can play in advancing patient health. Next, David Adams from the Undiagnosed Diseases Program spoke about how Monarch and the NIH can work together to improve diagnosis of rare diseases. Jean-Philippe “JP” Gourdine, research associate in glycobiology and metabolomics at OHSU, gave an interesting talk in which he introduced the use of metabolomics data for gene prioritization in the context of the Undiagnosed Diseases Network. JP and Matt Brush, Monarch Ontologist, also discussed the integration of the molecular glycophenotypes ontology (MGPO) into the HPO. Our next guest came all the way from Paris to represent Orphanet; Annie Olry explained the ongoing analytical work behind Orphanet’s portal on rare diseases. The next three presentations came from members of the Jackson Laboratory, our gracious host for the meeting. Judith Blake and Cindy Smith demonstrated how the Mouse Genome Informatics resource is incorporating the Disease Ontology. The next presenter was Elissa Chesler, who showed us GeneWeaver, which is an interesting tool that can integrate heterogeneous genomic data. To round out the session, Sue Mockus discussed Jackson’s Clinical Knowledgebase tool for exploring genomic profiles. These presentations highlighted the important role that collaboration plays in the Monarch Initiative -- we thank our guests for joining us and helping make this Monarch All Hands meeting a success!


Open Data Day spotlight on PhenoPackets

To celebrate Open Data Day 2017, I want to highlight one of the Monarch Initiative’s innovative data sharing tools: the PhenoPacket. At Monarch we playfully refer to the PhenoPacket concept as a “bag of phenotypes” to describe patients. If you aren’t a researcher or clinician, you are probably wondering what a “phenotype” is. A phenotype can be simply defined as the patient signs and symptoms associated with a disease, or more technically defined as the physical manifestation of the combined effects of a person’s genes and their environment. PhenoPackets are a novel way to systematically organize and share the data associated with a patient’s phenotypes.

Currently, data about a patient’s phenotypes is collected by doctors and researchers and can be found in publications, databases, electronic health records, clinical trials, and even social media. This wide variation in data creation leads to diverse data that is not standardized or in a central location, so it is very difficult to see patterns and connections between patients among this data. PhenoPackets were designed to solve these large data integration and computation problems that researchers and clinicians face when collecting or working with phenotype data.

Creating phenotype data that is sharable and standardized allows clinicians and researchers to use this data to improve patient diagnosis. A goal of PhenoPackets is to create data that is more uniform and therefore computationally useful. For a doctor diagnosing a patient, this searchable, comparable phenotype data will allow the clinician to compare their patient to known diseases and other patients around the world, increasing the accuracy of diagnosis. When a patient comes to see their doctor, their PhenoPacket could be created from the phenotypes the doctor sees, and this information could follow the patient to other clinics and could also be published (without personal health information) in journals and databases or even on the Web directly.

What kind of data actually makes up a PhenoPacket? Each patient would have a collection of phenotypes that would be coupled with any available genomic information (e.g. sequencing data), and the data in their PhenoPacket would include: age of onset, symptoms, family history, and quantitative values related to specific phenotypes, with evidence for each of these categories. The phenotypes are described with common terms using the Human Phenotype Ontology (HPO), allowing for integration of data and computational analysis. The HPO is now incorporated into the Unified Medical Language System (UMLS). Importantly, PhenoPackets do not contain any personal health information, so each patient can remain anonymous in publications or databases. On the technical side, PhenoPackets are encoded in JSON or YAML, but are programming language neutral with implementation tools in several languages. You can read more about the code behind PhenoPackets and try out a tutorial here:

PhenoPackets are supported by the Monarch Initiative and other communities of researchers to aid in disease diagnosis and personalized medicine, as well as model organism research. The creation of more standardized phenotype data that is openly shared amongst researchers and clinicians will lead to more large scale phenotypic data analysis, which can improve patient diagnosis and outcomes and mechanistic discovery.
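
To make the "bag of phenotypes" idea a little more concrete, here is a minimal Python sketch of what a PhenoPacket-like record might look like when encoded as JSON. Please treat it as an illustration only: the field names (`id`, `phenotypes`, `term_id`, `age_of_onset`, `evidence`, and so on), the helper functions, and the example patient are all assumptions made up for this post, not the official PhenoPackets schema or tooling. The HPO identifier is included only to show the shape of an ontology term and should be checked against the HPO itself.

```python
import json

# Hypothetical list of keys we treat as personal health information (PHI).
# This is an illustrative assumption, not an official or complete list.
PHI_FIELDS = {"name", "address", "date_of_birth", "medical_record_number"}


def build_phenopacket(patient_id, phenotypes, genomic_data=None, family_history=None):
    """Bundle a patient's phenotypes into one plain dictionary.

    The structure is a made-up example, not the real PhenoPackets format.
    """
    return {
        "id": patient_id,  # anonymous identifier, never a patient name
        "phenotypes": phenotypes,
        "genomic_data": genomic_data or [],
        "family_history": family_history or [],
    }


def contains_phi(packet):
    """Return True if any top-level key looks like personal health information."""
    return any(key in PHI_FIELDS for key in packet)


example_packet = build_phenopacket(
    patient_id="patient-001",
    phenotypes=[
        {
            "term_id": "HP:0001250",  # HPO-style identifier (illustrative)
            "label": "Seizure",
            "age_of_onset": "childhood",
            "evidence": "clinician observation",
        },
    ],
    family_history=["sibling with similar symptoms"],
)

# Encode as JSON for sharing, then decode again to show the round trip.
encoded = json.dumps(example_packet, indent=2, sort_keys=True)
decoded = json.loads(encoded)
```

Because the record is just standardized terms plus evidence, with no names or addresses, the same JSON could in principle follow a patient between clinics or be attached to a publication, and another group could load it back with any JSON library.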

Read more about the HPO here:

Rare Disease Day 2017

Diagnosing diseases is a tricky business requiring a formidable breadth and depth of knowledge and the skill to apply that knowledge. Patient diagnosis becomes even more difficult for rare diseases: quality reference data may not exist and a physician might only see one such patient in her entire career. According to the National Institutes of Health, there are between 6,000 and 7,000 rare diseases affecting from 25 to 30 million Americans, making it likely that most, if not all, health care professionals have seen these patients in their practice but may not have known it. Oftentimes, a patient with a rare disease gets misdiagnosed as having a more common disease with a similar set of symptoms. In such cases, the misdiagnosis can lead to ineffective, or even harmful treatment; this is a danger even for patients who have rarer forms of a common disease.

Next Tuesday is Rare Disease Day, a day devoted to raising awareness of rare diseases, learning from the patients and families living with these diseases, and promoting the research that is being done to find treatments.

What is a rare disease?
A disease is considered rare if it affects fewer than 200,000 people in the US, or fewer than 1 in 2,000 people in Europe. The more than 6,000 known rare diseases range from several kinds of cancer to neurological disorders, skin disorders, and diseases affecting the lungs. According to Orphanet, a portal for rare diseases, most of these have no cure. Of the known rare diseases, about 80% have an established genetic cause, while the remaining 20% are thought to be caused by environmental factors, like infections. Approximately half of the people with a rare disease are children.

What is Rare Disease Day?
Rare Disease Day was launched in Europe in 2008 by EURORDIS and the Council of National Alliances as a day to raise awareness about rare diseases and the people living with them. The day also raises awareness among policy makers and health professionals. It occurs on the last day of February, which is Tuesday the 28th this year. Rare Disease Day has grown since its inception: last year there were events in 84 countries.

What are some important issues affecting people with rare diseases?
One of the largest problems facing patients living with rare diseases is a delay in diagnosis. The time from a patient's first visit to a doctor to a correct diagnosis averages 4.8 years, but can be as long as 20 years. During this time, on average, the patient will see more than 7 physicians. There are several reasons why this diagnosis can take so long.
  • Rare diseases are, by definition, rare, meaning that a particular doctor may have never seen a similar patient before. In this situation, the patient might leave the doctor without a diagnosis, or be sent to another physician’s office to aid in the diagnosis.
  • To further complicate diagnosis, many rare diseases have symptoms that are similar to those associated with a more common disease, so misdiagnosis is very common among these patients.
  • Also, symptoms of the same rare disease can present themselves in different ways in different patients, leading to confusion. These misdiagnoses can be costly, potentially harmful, and frustrating for the patients and their families.
  • Sometimes a physician will correctly identify the rare disease, only to inform the patient that there is no cure for the disease, or there is an incomplete treatment.
Rare diseases pose a public health problem: each rare disease may be uncommon, but rare diseases combined affect a large population of patients, their families, and their caregivers.

This year is the 10th Rare Disease Day and has a special theme of research. The slogan, “With research, possibilities are limitless,” focuses on the important role that research plays in diagnosing and treating rare diseases. Researchers studying rare diseases can increase knowledge of how a particular disease occurs, which parts of the body are affected, or how the disease could be treated. This information could directly impact a patient’s life by improving diagnosis rate and increasing treatment options. Further, researching rare diseases can also provide insights into more common diseases. The Rare Disease Day press release highlights this point by saying: “Research is key. It brings hope to the millions of people living with a rare disease across the world and their families.”

Rare Disease Day shines a light on the need for international collaboration between researchers and clinicians. Since there are so few patients with a particular rare disease, increasing communication around the world can link these isolated patients together, building a support system and increasing knowledge about their shared disease. “Rare Disease Day 2017 is therefore an opportunity to call upon researchers, universities, students, companies, policy makers and clinicians to do more research and to make them aware of the importance of research for the rare disease community.” - Rare Disease Day press release

I recommend watching the 2017 Rare Disease Day video and learning more about individuals living with rare diseases by reading their stories here:

Monarch Initiative members recently attended the International Rare Diseases Research Consortium (IRDiRC) conference, and you can read more about this important conference here.


