Open Data Day spotlight on PhenoPackets

To celebrate Open Data Day 2017, I want to highlight one of the Monarch Initiative’s innovative data sharing tools: the PhenoPacket. At Monarch we playfully refer to the PhenoPacket concept as a “bag of phenotypes” to describe patients. If you aren’t a researcher or clinician, you are probably wondering what a “phenotype” is. A phenotype can be simply defined as the patient signs and symptoms associated with a disease, or more technically defined as the physical manifestation of the combined effects of a person’s genes and their environment. PhenoPackets are a novel way to systematically organize and share the data associated with a patient’s phenotypes.

Currently, data about a patient’s phenotypes is collected by doctors and researchers and can be found in publications, databases, electronic health records, clinical trials, and even social media. Because the data is created in so many different ways, it is neither standardized nor stored in a central location, which makes it very difficult to see patterns and connections between patients. PhenoPackets were designed to solve the large data integration and computation problems that researchers and clinicians face when collecting or working with phenotype data.

Creating phenotype data that is sharable and standardized allows clinicians and researchers to use this data to improve patient diagnosis. A goal of PhenoPackets is to create data that is more uniform and therefore computationally useful. For a doctor diagnosing a patient, this searchable, comparable phenotype data will allow the clinician to compare their patient to known diseases and other patients around the world, increasing the accuracy of diagnosis. When a patient comes to see their doctor, their PhenoPacket could be created from the phenotypes the doctor sees, and this information could follow the patient to other clinics and could also be published (without personal health information) in journals and databases or even on the Web directly.
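To give a feel for what "comparable" phenotype data makes possible, here is a minimal sketch in Python. It assumes each patient or disease profile is just a set of HPO term IDs and scores overlap with a simple Jaccard index. This is a stand-in chosen for brevity, not the method Monarch's tools use; real phenotype-matching tools typically use ontology-aware similarity measures. The function names and the reference profiles (`profile_A`, `profile_B`) are hypothetical.

```python
def jaccard(terms_a, terms_b):
    """Return the Jaccard index (shared terms / all terms) of two term sets."""
    a, b = set(terms_a), set(terms_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def rank_profiles(patient_terms, reference_profiles):
    """Rank reference profiles by overlap with the patient, best match first."""
    scores = [
        (name, jaccard(patient_terms, terms))
        for name, terms in reference_profiles.items()
    ]
    # Sort by descending score, then by name so ties are deterministic.
    return sorted(scores, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical patient described with three HPO terms.
patient = {
    "HP:0001250",  # Seizure
    "HP:0000252",  # Microcephaly
    "HP:0001263",  # Global developmental delay
}

# Hypothetical reference profiles, invented for illustration only.
references = {
    "profile_A": {"HP:0001250", "HP:0000252", "HP:0001249"},
    "profile_B": {"HP:0000365", "HP:0000505"},
}

ranking = rank_profiles(patient, references)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

The point is only that once everyone describes phenotypes with the same term IDs, comparing a patient to other profiles becomes a small, mechanical computation rather than a manual reading exercise.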

What kind of data actually makes up a PhenoPacket? Each patient would have a collection of phenotypes coupled with any available genomic information (e.g. sequencing data). The data in their PhenoPacket would include age of onset, symptoms, family history, and quantitative values related to specific phenotypes, with evidence for each of these categories. The phenotypes are described with common terms from the Human Phenotype Ontology (HPO), allowing for integration of data and computational analysis. The HPO is now incorporated into the Unified Medical Language System (UMLS). Importantly, PhenoPackets do not contain any personal health information, so each patient can remain anonymous in publications or databases. On the technical side, PhenoPackets are encoded in JSON or YAML, but are programming-language neutral, with implementation tools in several languages. You can read more about the code behind PhenoPackets and try out a tutorial here:
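To make the description above concrete, here is a minimal sketch in Python of what a "bag of phenotypes" might look like once serialized to JSON. The field names (`subject`, `phenotypes`, `onset`, `evidence`, and so on), the helper functions, and the example values are assumptions made for illustration; they are not the official PhenoPacket schema, so check the PhenoPackets documentation for the real structure.

```python
import json


def make_phenotype(term_id, label, onset=None, value=None, evidence=None):
    """Describe one phenotype with an HPO term plus optional details."""
    entry = {"type": {"id": term_id, "label": label}}
    if onset is not None:
        entry["onset"] = onset
    if value is not None:
        entry["value"] = value
    if evidence is not None:
        entry["evidence"] = evidence
    return entry


def make_packet(pseudonym, phenotypes, family_history=None, genomic_data=None):
    """Bundle phenotypes for one patient under an anonymous identifier."""
    return {
        # Only a pseudonymous ID is stored: no name or birth date.
        "subject": {"id": pseudonym},
        "phenotypes": list(phenotypes),
        "family_history": family_history or [],
        "genomic_data": genomic_data or [],
    }


# Hypothetical patient; every value here is invented for illustration.
packet = make_packet(
    "patient-001",
    [
        make_phenotype(
            "HP:0001250",
            "Seizure",
            onset={"age": "P2Y"},  # ISO 8601 duration: two years old
            evidence=["clinical examination"],
        ),
        make_phenotype(
            "HP:0000252",
            "Microcephaly",
            value={"measurement": "head circumference", "value": 44.0, "unit": "cm"},
            evidence=["clinical examination"],
        ),
    ],
    family_history=["affected sibling"],
)

packet_json = json.dumps(packet, indent=2, sort_keys=True)
print(packet_json)
```

Because the result is plain JSON, the same packet could be read by tools written in any language, which is the sense in which the format is programming-language neutral.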

PhenoPackets are supported by the Monarch Initiative and other communities of researchers to aid in disease diagnosis and personalized medicine, as well as model organism research. Standardized phenotype data that is openly shared among researchers and clinicians will enable more large-scale phenotypic data analysis, which can improve patient diagnosis and outcomes and advance mechanistic discovery.

Read more about the HPO here:
