Wednesday, May 27, 2015

Why the Human Phenotype Ontology?

We've often been asked, why should we use the Human Phenotype Ontology to describe patient phenotypes, rather than a more widely-used clinical vocabulary such as ICD or SNOMED? Here are the answers to some of these frequently asked questions:

1. We should use what other big NIH projects, like ClinVar, are using.

ClinVar uses HPO terms to describe phenotypes. This is done in collaboration with MedGen, which has imported the HPO terms.


There are now many bioinformatics tools that use the HPO to empower exome diagnostics. The Monarch team has recently published two of these:
1) Exomiser (Robinson et al., 2014, Genome Res.) => for discovering new disease genes via model organism data, with several successful use cases at the UDP and elsewhere.
2) PhenIX (Zemojtel et al., 2014, Science Translational Medicine) => for clinical diagnostics of “difficult” cases. This paper was featured in Russ Altman's year-in-review at AMIA this year.

A number of other groups are also converging on the HPO because, in contrast to SNOMED, ICD, and other terminologies, it is a formal ontology that supports powerful computational algorithms. Among these groups are:

1) Stephen Kingsmore (Kansas City) is now using the Phenomizer to prioritize genes (currently unpublished, but see PMID: 23035047 for Stephen’s previous work).
2) The Yandell group in Utah: Singleton MV, et al. Phevor combines multiple biomedical ontologies for accurate identification of disease-causing alleles in single individuals and small nuclear families. Am J Hum Genet. 2014 Apr 3;94(4):599-610.
3) The Moreau group in Leuven: Sifrim A, et al. eXtasy: variant prioritization by genomic data fusion. Nat Methods. 2013 Nov;10(11):1083-4.
4) The Children’s Hospital of Philadelphia group around Jeff Pennington (BMC Genomics, accepted).
5) The Lévesque group at McGill University Health Centre: Trakadis et al. PhenoVar: a phenotype-driven approach in clinical genomics for the diagnosis of polymalformative syndromes. BMC Med Genomics. 2014 May 12;7:22. doi: 10.1186/1755-8794-7-22.

The five examples above cover only the use of the HPO for gene prioritization in NGS studies; numerous other bioinformatics and genetics databases also use the HPO (see the HPO homepage for a partial list).

The reasons for choosing the HPO over other terminologies therefore include interoperability with current major genetics databases. A partial list:
(i) UDP intramural
(ii) The UK 100,000 Genomes Project
(iii) The Thrombogenomics group: https://haemgen.haem.cam.ac.uk/thrombogenomics
(iv) ECARUCA
(v) The Sanger Institute’s DECIPHER (chromosomal database)
(vi) The Sanger Institute's DDD (Exome) database
(vii) Genome Canada (Care4Rare)
(viii) CINEAS and other genetics databases in the Netherlands, including databases at Nijmegen
(ix) Emerging collaborations with databases for non-Mendelian (complex) disease, including EULAR (the European League Against Rheumatism)
(x) Orphanet
(xi) NCBI resources mentioned above (e.g. ClinVar)

Multiple other resources also use the HPO. This is reflected in the fact that the HPO homepage is the top Google hit for the search “human phenotype”; in fact, eight of the top ten hits point to various HPO resources. Even a search on “phenotype” alone places the HPO on the second page, preceded only by dictionary definitions of the word “phenotype” or encyclopedia entries about genotype-phenotype correlations. Furthermore, because the phenotype ontologies are interoperable across species, the Model Organism Databases have been working to link models to diseases based on phenotype descriptions.


2. We'll just map from ICD or SNOMED to HPO.

Preliminary analysis by Winnenburg and Bodenreider showed that the current UMLS has relatively poor coverage of phenotype terms: only 54% of HPO classes are covered in the UMLS as a whole, and only 30% in SNOMED CT. Mapping from ICD or SNOMED therefore cannot capture the full specificity achieved by recording phenotypes natively in the HPO. The HPO is designed to treat the patient as a biological subject.

We have initiated a collaboration with the UMLS team to integrate the HPO into the UMLS, and we have agreed to maintain the mapping as the resources grow (it will be reviewed and updated for each six-monthly release of the UMLS). The UMLS mapping will make it easier for resources such as MedGen to keep their versions of the HPO up to date and cross-linked among the various NCBI resources that use the HPO, including ClinVar, the Genetic Testing Registry, MedGen, and to some extent dbGaP (via ISCA and some other projects). Once the integration into the UMLS is released (expected in November 2015), it will be straightforward to cross-walk between the HPO and the other terminologies in the UMLS.
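To make the coverage argument concrete, here is a minimal sketch of what a cross-walk through shared concept identifiers looks like. It assumes two toy lookup tables (source code to concept, target code to concept); every identifier in it is a hypothetical placeholder, not a real SNOMED CT, HPO, or UMLS code, and the `crosswalk` function is illustrative only, not part of any UMLS tooling.

```python
# Hypothetical vocabulary -> concept mappings. All identifiers are made-up
# placeholders, NOT real SNOMED CT, HPO, or UMLS codes.
SNOMED_TO_CUI = {
    "snomed:example-1": "cui:example-x",
    "snomed:example-2": "cui:example-y",  # concept with no HPO counterpart
}
HPO_TO_CUI = {
    "hpo:example-a": "cui:example-x",
    "hpo:example-b": "cui:example-z",  # HPO-only concept, no SNOMED code
}


def crosswalk(source_to_cui, target_to_cui):
    """Map each source code to the target codes that share its concept ID.

    Source codes whose concept has no counterpart in the target vocabulary
    map to an empty list; target-only concepts are never reached at all.
    """
    cui_to_targets = {}
    for target, cui in target_to_cui.items():
        cui_to_targets.setdefault(cui, []).append(target)
    return {src: cui_to_targets.get(cui, []) for src, cui in source_to_cui.items()}


if __name__ == "__main__":
    mapping = crosswalk(SNOMED_TO_CUI, HPO_TO_CUI)
    for code, targets in mapping.items():
        print(code, "->", targets or "no HPO equivalent")
```

In this toy setup `hpo:example-b` can never be produced from the SNOMED side, which is the sense in which mapping from a terminology with partial coverage cannot recover phenotype detail that was never recorded in it.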

The International Consortium of Human Phenotype Terminologies (ICHPT) terminology was co-developed by Peter Robinson, Segolene Ayme, the Orphanet team, and Ada Hamosh. It is a list of about 2700 terms with mappings among the HPO, PhenoDB, MeSH, SNOMED, the London Dysmorphology Database, POSSUM, and Orphanet. It is not a formal ontology, and it would need to be used within one of the other resources to allow computational analysis.

3. We don't know much about HPO. Why is it the best choice?

(a) Purpose-built. The HPO was designed for clinical phenotyping purposes to aid rare disease diagnosis. It was designed to make human clinical abnormalities “computable” in a way that would allow sophisticated bioinformatics analysis (see the above publications for an overview of current possibilities). No other resource allows this.

(b) Open. It is available under an open license, and the license is simple and extremely liberal. If desired, the HPO could also be made available under a Creative Commons license for users who prefer one.

(c) Intuitive. It is not easy to get quality, structured data from anyone, least of all a busy clinician. The HPO was built for use by clinicians to reflect their normal clinical phenotyping processes, rather than for billing or more complex EHR recording. The HPO has been integrated into tools such as PhenoTips (a Monarch partner) for easy annotation, and our experiences thus far have shown that clinicians find HPO and PhenoTips very easy to use.

(d) Interoperable. One important feature of the HPO is that it is logically interoperable with the Gene Ontology, the Cell Ontology, anatomy ontologies, etc., to support sophisticated bioinformatics workflows using modern semantic standards. For example, this logical interoperability enables more sophisticated phenotype profile matching, both against known diseases and across species, to identify candidate genes. This logic is the basis of the comparisons implemented in tools such as PhenIX, Exomiser, and PhenomeCentral. We and others are taking advantage of this feature for diagnostic tool development and mechanistic analysis.
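A minimal sketch of why an ontology (rather than a flat list of codes) makes profile matching possible: related but non-identical terms still earn credit through their shared ancestors. It assumes a tiny hypothetical is-a hierarchy and made-up disease annotations (not real HPO content), and it uses one generic technique from the literature, Resnik similarity (information content of the most informative common ancestor) combined with a best-match average. It is not necessarily the exact scoring used by PhenIX, Exomiser, or PhenomeCentral.

```python
import math

# Hypothetical toy is-a hierarchy (child -> parents). Term names are
# illustrative only; this is NOT an excerpt of the real HPO.
PARENTS = {
    "Phenotypic abnormality": [],
    "Abnormality of the skeleton": ["Phenotypic abnormality"],
    "Abnormality of the hand": ["Abnormality of the skeleton"],
    "Long fingers": ["Abnormality of the hand"],
    "Short fingers": ["Abnormality of the hand"],
    "Abnormality of the eye": ["Phenotypic abnormality"],
    "Lens dislocation": ["Abnormality of the eye"],
}

# Hypothetical disease annotations (disease -> annotated terms).
DISEASES = {
    "Disease A": {"Long fingers", "Lens dislocation"},
    "Disease B": {"Short fingers"},
    "Disease C": {"Lens dislocation"},
}


def ancestors(term):
    """Return the term together with all of its is-a ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content():
    """IC(t) = log(N / n_t), where n_t counts diseases annotated to t or to
    any descendant of t (annotations propagate up the hierarchy)."""
    counts = {t: 0 for t in PARENTS}
    for terms in DISEASES.values():
        for t in set().union(*(ancestors(x) for x in terms)):
            counts[t] += 1
    total = len(DISEASES)
    return {t: math.log(total / c) for t, c in counts.items() if c > 0}


def resnik(t1, t2, ic):
    """Term similarity: IC of the most informative common ancestor."""
    return max(ic[a] for a in ancestors(t1) & ancestors(t2))


def best_match_average(patient, disease, ic):
    """Match each patient term to its best disease term, then average."""
    scores = [max(resnik(p, d, ic) for d in disease) for p in patient]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    ic = information_content()
    patient = {"Long fingers", "Abnormality of the eye"}
    scores = {d: best_match_average(patient, t, ic) for d, t in DISEASES.items()}
    for disease, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{disease}: {score:.3f}")
```

Here Disease A ranks first: it matches "Long fingers" exactly and matches the patient's general "Abnormality of the eye" through its more specific "Lens dislocation" annotation. Disease B still receives partial credit because "Short fingers" shares the ancestor "Abnormality of the hand" with the patient's "Long fingers"; an exact-code lookup would score it zero.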

(e) Community-driven. The HPO has an active, responsive developer community and is on a monthly release cycle. This is reflected in the author list of a recent update paper, which had about 40 coauthors from outside the Monarch team, in addition to the numerous other people who have made occasional contributions to the HPO.

Köhler S, Doelken SC, Mungall CJ, Bauer S, Firth HV, Bailleul-Forestier I,
Black GC, Brown DL, Brudno M, Campbell J, FitzPatrick DR, Eppig JT, Jackson AP,
Freson K, Girdea M, Helbig I, Hurst JA, Jähn J, Jackson LG, Kelly AM, Ledbetter
DH, Mansour S, Martin CL, Moss C, Mumford A, Ouwehand WH, Park SM, Riggs ER,
Scott RH, Sisodiya S, Van Vooren S, Wapner RJ, Wilkie AO, Wright CF, Vulto-van
Silfhout AT, de Leeuw N, de Vries BB, Washington NL, Smith CL, Westerfield M,
Schofield P, Ruef BJ, Gkoutos GV, Haendel M, Smedley D, Lewis SE, Robinson PN.
The Human Phenotype Ontology project: linking molecular biology and disease
through phenotype data. Nucleic Acids Res. 2014 Jan;42(Database issue):D966-74.
doi: 10.1093/nar/gkt1026. Epub 2013 Nov 11. PubMed PMID: 24217912; PubMed Central
PMCID: PMC3965098.

(f) Extensible. The HPO is easy to extend in terms of both content and logic. A TermGenie, similar to the one used for the Gene Ontology and Cell Ontology, has been set up for the HPO to allow large numbers of terms to be added in a logical fashion. A community portal is also being constructed for basic term requests, which currently go through a SourceForge tracker system. Experts are consulted to extend specific areas of the ontology, for example glycomics, skeletal dysplasias, and craniofacial malformations.

4. How hard is it to enter HPO terms reliably and correctly?
We have found that it is straightforward to enter HPO terms using the PhenoTips tool. As part of the Monarch Initiative, we have integrated a specificity meter that tells the clinician entering the terms when they have reached satisfactory breadth and depth for their patient profile. PhenoTips also has a variety of user features such as tool tips, co-annotation suggestions, etc., to help the user enter the terms correctly. The clinicians we have surveyed using PhenoTips find it easy to use.

5. What guarantee is there that HPO will continue to be supported, and who is keeping it financially afloat?
Whilst HPO development is currently funded by a number of grants (including grants to Monarch and PhenoTips, German BMBF grant 0313911, and the European Community’s FP7 grant 602300), the main guarantee is that the HPO is essentially a community standard that relies in part on crowd-sourcing for extension and maintenance. It is part of a suite of ontologies, such as the Gene Ontology and Uberon, that have been maintained in the same way for many years and managed by many of the same key community members. It has a diverse user base in both academia and the private sector. Part of the reason it has staying power is that the HPO developers work closely with specific communities to enhance particular aspects of the ontology. Our development strategies for coordination and community contribution are similar to those of the GO, and key members of the GO development team are also intimately involved in HPO development. We believe that projects with significant community support are more long-lived, as the GO consortium has demonstrated.

6. What technologies exist to help support HPO documentation?

The HPO is natively edited in the Web Ontology Language (OWL) and is internally documented using standardized annotation properties. These properties are displayed wherever the file is loaded into an ontology browser, such as the NCBO BioPortal, Ontobee, PhenoTips, and the Monarch website. There is also a website for the HPO, www.human-phenotype-ontology.org, that contains further documentation on development and annotation. A community portal for basic term requests is being constructed. The Monarch Initiative website provides further documentation and services for accessing the ontologies, related data, and other ontology operations.