Saturday, June 6, 2015

Save the Date: Symposium on Linking Disease Model Phenotypes to Human Conditions


Monarch is co-hosting an NIH symposium titled “Linking Disease Model Phenotypes to Human Conditions” on September 10-11, 2015, at the Fishers Lane Auditorium, NIH, Rockville, MD.

The purpose of the meeting is to convene a colloquium on the current status of phenomics and its role in closing the gap between biomedical research and clinical medical practice. The wealth of whole-organism, cellular, and molecular data generated in the research laboratory must be translated into clinically relevant knowledge that enables the physician to make the best possible treatment decisions. Phenomics is gaining momentum due to the availability of complete genomes for many organisms, as well as higher-throughput methods to genetically modify model organism genomes and to observe and record phenotypes. Disease models are among the most important tools of biomedical research; their efficacy rests on the principles of evolutionary conservation between species, including conservation of pathogenic disease mechanisms. The lack of alignment of phenotypes between model species and humans has been a historic impediment to understanding disease processes. Further progress depends upon the integration of clinical, biological, and genomic data and the development of tools for the identification and analysis of specific and amenable disease-causing molecular phenotypes.

Session Topics: 
  • Current status of the human clinical phenotype ontology and terminology, and associated data annotation and use
  • Cross-species phenotype analysis and ontology
  • Large scale high throughput analysis of disease model phenotyping data and annotation of gene function
  • Linking disease-relevant phenotypes with physiologically relevant molecular pathways and networks 
  • Clinical and experimental biology data integration, and the positioning of molecular phenotypes in the emerging field of precision medicine 
  • Resources for submission, representation, analysis, and sharing of phenotypic and genomic information; development of resource identification and tracking to improve reproducibility and to track resource utilization and trends


Organizing Committee: Oleg Mirochnitchenko (ORIP/NIH, MD), Harold Watson (ORIP/NIH, MD), Melissa Haendel (Oregon Health & Science University, OR), Olga Troyanskaya (Princeton University, NJ), Olivier Bodenreider (National Library of Medicine, NIH, MD), Yves Lussier (University of Arizona, AZ), Phil Bourne (DS/NIH, MD), Janan Eppig (The Jackson Laboratory, ME), Mary Mullins (University of Pennsylvania, PA).

The symposium program and registration for this event are available at:

Please register early; space is limited!

For more information on meeting logistics and registration, contact: Mark A. Dennis at mdennis@scgcorp.com,

P: (301) 670-4990, ext. 237
F: (301) 670-3815

For programmatic questions contact Oleg Mirochnitchenko at Oleg.Mirochnitchenko@nih.gov,
P: (301) 435-0748

Confirmed speakers include: 
Peter N. Robinson (Max Planck Institute for Molecular Genetics, Germany)
Carol M. Hamilton (PhenX, NC)
Rachel Richesson (Duke University, NC)
Rex L. Chisholm (Northwestern University, IL)
Melissa Haendel (Oregon Health and Science University, OR)
Damian Smedley (Wellcome Trust Sanger Institute, UK)
Chris Mungall (Lawrence Berkeley Laboratory, CA)
Caleb Webber (Oxford University, UK)
Janan Eppig (The Jackson Laboratory, ME)
Elissa Chesler (The Jackson Laboratory, ME)
Derek Stemple (Wellcome Trust Sanger Institute, UK)
Ross Cagan (The Mount Sinai Hospital, NY)
Olga Troyanskaya (Princeton University, NJ)
Kara Dolinski (Princeton University, NJ) 
John Quackenbush (Dana-Farber Cancer Institute, MA)
Jason Moore (Dartmouth College, NH)
Yves Lussier (Univ. of Arizona, AZ)
Razelle Kurzrock (University of California San Diego, CA)
Gail Herman (Nationwide Children's Hospital in Columbus, OH)
Calum MacRae (Harvard Medical School and Brigham and Women's Hospital, MA)
Maryann Martone (University of California San Diego, CA)
Nicole Washington (Lawrence Berkeley National Laboratory, CA)
Paul Thomas (University of Southern California, CA)
Mary Mullins (University of Pennsylvania, PA)
Olivier Bodenreider (National Library of Medicine, NIH, Bethesda, MD)
Phil Bourne (NIH, MD).  

Individuals with disabilities who need Sign Language Interpreters and/or reasonable accommodation to participate in this event should contact RegenerativeMedicine@lmbps.com or the Federal Relay (1-800-877-8339).


This symposium is sponsored by the Office of Research Infrastructure Programs, DPCPSI/OD/NIH.

Wednesday, May 27, 2015

Why the Human Phenotype Ontology?

We've often been asked, why should we use the Human Phenotype Ontology to describe patient phenotypes, rather than a more widely-used clinical vocabulary such as ICD or SNOMED? Here are the answers to some of these frequently asked questions:

1. We should use what other big NIH projects, like ClinVar, are using.

ClinVar uses HPO terms to describe phenotypes. This is done in collaboration with MedGen, which has imported HPO terms.
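
As a small illustration (not part of the original post), one can see this linkage programmatically by asking NCBI's E-utilities whether a given HPO identifier retrieves MedGen concept records. The endpoint and database name below are the standard Entrez ones; the specific HPO ID and the query syntax are assumptions made for the sketch.

```python
# Minimal sketch: check whether an HPO term appears in MedGen via NCBI E-utilities.
# The HPO ID used here (HP:0001631, Atrial septal defect) is only an illustration.
import requests

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

def medgen_uids_for_hpo(hpo_id: str):
    """Return MedGen UIDs whose records mention the given HPO identifier."""
    params = {"db": "medgen", "term": hpo_id, "retmode": "json"}
    reply = requests.get(f"{EUTILS}/esearch.fcgi", params=params, timeout=30)
    reply.raise_for_status()
    return reply.json()["esearchresult"]["idlist"]

if __name__ == "__main__":
    uids = medgen_uids_for_hpo("HP:0001631")
    print(f"MedGen records matching HP:0001631: {uids}")
```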


There are now many bioinformatics tools that use the HPO to empower exome diagnostics. The Monarch team has published two of these recently:
1) Exomiser (Robinson et al., 2014, Genome Res.) => for discovering new disease genes via model organism data, with several successful use cases at the UDP and elsewhere
2) PhenIX (Zemojtel et al., 2014, Science Translational Medicine) => for clinical diagnostics of “difficult” cases; this paper was featured in Russ Altman's year-in-review at AMIA this year.
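
The basic idea these tools share can be sketched in a few lines: combine a variant-level pathogenicity score with a measure of how well the patient's HPO profile matches the phenotypes annotated to each candidate gene. The toy example below is not Exomiser's or PhenIX's actual algorithm; the genes, scores, and the simple equal weighting are invented purely for illustration.

```python
# Toy illustration of phenotype-aware gene prioritization (not the real Exomiser algorithm).
# Each candidate gene gets a combined score from (a) variant pathogenicity and
# (b) overlap between the patient's HPO terms and the gene's known phenotype annotations.

def jaccard(a: set, b: set) -> float:
    """Simple set overlap; real tools use ontology-aware semantic similarity."""
    return len(a & b) / len(a | b) if (a or b) else 0.0

def prioritize(patient_hpo, candidates, weight=0.5):
    """candidates: {gene: (pathogenicity 0..1, set of HPO terms annotated to the gene)}."""
    scored = {
        gene: weight * patho + (1 - weight) * jaccard(patient_hpo, gene_hpo)
        for gene, (patho, gene_hpo) in candidates.items()
    }
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical input: two genes surviving variant filtering.
patient = {"HP:0001631", "HP:0001263"}               # patient phenotype profile
genes = {
    "GENE_A": (0.90, {"HP:0001631", "HP:0001263"}),  # damaging variant, matching phenotype
    "GENE_B": (0.95, {"HP:0000365"}),                # more damaging variant, unrelated phenotype
}
print(prioritize(patient, genes))  # GENE_A ranks first despite the lower variant score
```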

Also, a number of other groups are converging on use of the HPO since, in contrast to SNOMED, ICD, and other terminologies, it is a formal ontology that allows powerful computational algorithms to be applied. Amongst these are:

1) Stephen Kingsmore (Kansas City) is now using the Phenomizer to prioritize genes (currently unpublished, but see PMID: 23035047 for Stephen’s previous work)
2) The Yandell group in Utah: Singleton MV et al. Phevor combines multiple biomedical ontologies for accurate identification of disease-causing alleles in single individuals and small nuclear families. Am J Hum Genet. 2014 Apr 3;94(4):599-610.
3) The Moreau group in Leuven: Sifrim A et al. eXtasy: variant prioritization by genomic data fusion. Nat Methods. 2013 Nov;10(11):1083-4.
4) The Children’s Hospital of Philadelphia group around Jeff Pennington (BMC Genomics, accepted)
5) The Lévesque group at McGill University Health Centre: Trakadis et al. PhenoVar: a phenotype-driven approach in clinical genomics for the diagnosis of polymalformative syndromes. BMC Med Genomics. 2014 May 12;7:22. doi: 10.1186/1755-8794-7-22.

The above five citations cover only the use of the HPO for gene prioritization in NGS studies; numerous other bioinformatics and genetics databases use the HPO (see the HPO homepage for a partial list).

Therefore, the reasons for choosing HPO over other terminologies include interoperability with current major genetics databases; a partial list is:
(i) The NIH UDP (intramural)
(ii) The UK 100,000 Genomes Project
(iii) The Thrombogenomics group: https://haemgen.haem.cam.ac.uk/thrombogenomics
(iv) ECARUCA
(v) The Sanger Institute’s DECIPHER (chromosomal database)
(vi) The Sanger Institute's DDD (Exome) database
(vii) Genome Canada (Care4Rare)
(viii) CINEAS and other genetics databases in the Netherlands, including databases at Nijmegen
(ix) Beginning collaborations with databases for non-Mendelian (complex) disease, including EULAR - The European League Against Rheumatism.
(x) Orphanet
(xi) NCBI resources mentioned above (e.g. ClinVar)

There are multiple other resources that use the HPO. This is reflected in the fact that the HPO homepage is the top hit in Google for the search “human phenotype”; in fact, eight of the top ten hits point to various HPO resources. Even a search on “phenotype” alone places the HPO on the second page, with the entries ranked above it being either dictionary definitions of the word “phenotype” or encyclopedia entries about genotype-phenotype correlations. Furthermore, due to the interoperability of the ontologies across species, the Model Organism Databases have been working on linking models to diseases based on phenotype descriptions.


2. We'll just map from ICD or SNOMED to HPO.

Preliminary analysis performed by Winnenburg and Bodenreider showed that the current UMLS has relatively poor coverage of phenotype terms. Their analysis suggests that coverage of HPO classes is 54% in the UMLS as a whole and only 30% in SNOMED CT. Therefore, mappings from ICD or SNOMED will not provide the full specificity of natively recording phenotypes in the HPO. The HPO is designed to treat the patient as a biological subject.
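
For readers who want a feel for how such a coverage estimate can be made, here is a rough sketch. It simply counts HPO term labels with an exact string match in a local UMLS MRCONSO.RRF file, which is a cruder criterion than the mapping-based analysis cited above; the file path, the use of the obonet library, and the column positions (taken from the documented Rich Release Format) are assumptions to verify against your own UMLS release.

```python
# Rough sketch: estimate what fraction of HPO term labels have an exact string match
# in the UMLS (optionally restricted to one source such as SNOMED CT).
import obonet  # pip install obonet

def load_umls_strings(mrconso_path, source=None):
    strings = set()
    with open(mrconso_path, encoding="utf-8") as fh:
        for line in fh:
            cols = line.rstrip("\n").split("|")
            if cols[1] != "ENG":
                continue                       # English atoms only
            if source and cols[11] != source:  # e.g. source="SNOMEDCT_US"
                continue
            strings.add(cols[14].lower())      # STR field
    return strings

# Load HPO term labels from the current OBO release.
hpo = obonet.read_obo("http://purl.obolibrary.org/obo/hp.obo")
labels = {data["name"].lower() for _, data in hpo.nodes(data=True) if "name" in data}

umls = load_umls_strings("MRCONSO.RRF")  # placeholder path to a local UMLS install
covered = sum(1 for lbl in labels if lbl in umls)
print(f"Exact-label coverage: {covered}/{len(labels)} = {covered/len(labels):.0%}")
```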

We have initiated a collaboration with the UMLS team to integrate the HPO into the UMLS. We have agreed to maintain the mapping as the resources grow (the mapping will be reviewed and updated with every 6-month release of the UMLS). The UMLS mapping will improve the ability of resources such as MedGen to keep their versions of the HPO up to date and cross-linked amongst the various NCBI resources that use the HPO, including ClinVar, the Genetic Testing Registry, MedGen, and to some extent dbGaP (via the ISCA and some other projects). Following the integration into the UMLS, expected to be released in November 2015, it will be straightforward to cross-walk between the HPO and other UMLS source vocabularies.

The International Consortium of Human Phenotype Terminologies (ICHPT) was co-developed by Peter Robinson, Segolene Ayme, the Orphanet team, and Ada Hamosh. This is a list of about 2700 terms with mappings between HPO, PhenoDB, MeSH, SNOMED, London Dysmorphology Database, POSSUM, Orphanet. It is not a formal ontology and would need to be utilized in the context of one of the other resources to allow computational analysis. 

3. We don't know much about HPO.  Why is it the best choice?

(a) Purpose-built. The HPO was designed for clinical phenotyping purposes to aid rare disease diagnosis. It was designed to make human clinical abnormalities “computable” in a way that would allow sophisticated bioinformatics analysis (see the above publications for an overview of current possibilities). No other resource allows this.

(b) Open. It is available under an open license that is simple and extremely liberal. If desired, the HPO could also be licensed under Creative Commons if a user prefers.

(c) Intuitive. It is not easy to get quality, structured data from anyone, least of all a busy clinician. The HPO was built for use by clinicians to reflect their normal clinical phenotyping processes, rather than for billing or more complex EHR recording. The HPO has been integrated into tools such as PhenoTips (a Monarch partner) for easy annotation, and our experiences thus far have shown that clinicians find HPO and PhenoTips very easy to use.

(d) Interoperable. One important feature of the HPO is that it is logically interoperable with the Gene Ontology, Cell ontology, Anatomy ontologies, etc. to support sophisticated bioinformatics workflows using modern semantic standards. For example, the logical interoperability in HPO enables more sophisticated phenotype profile matching both against known diseases as well as across species to identify candidates. This logic is the basis of comparisons that have been implemented in tools such as PhenIX, Exomiser, PhenomeCentral, etc. We and others are taking advantage of this important feature for diagnostic tool development and mechanistic analysis.
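
For intuition, the kind of ontology-aware comparison this enables can be reduced to a toy example: two phenotype profiles match to the extent that their terms share ancestors in the ontology graph. The ancestor table and the simple overlap measure below are invented purely for illustration; the tools named above use the real ontologies and information-content-weighted similarity measures.

```python
# Toy subsumption-based profile comparison (illustrative only; real systems use the
# full HPO/cross-species ontologies and information-content-weighted similarity).

# Invented ancestor closures: each term maps to itself plus its superclasses.
ANCESTORS = {
    "HP:0001631": {"HP:0001631", "HP:0001671", "HP:0001626"},  # a human cardiac term and its ancestors
    "MP:0010403": {"MP:0010403", "HP:0001671", "HP:0001626"},  # a mouse term mapped into shared classes
    "HP:0000365": {"HP:0000365", "HP:0000364"},                # an unrelated branch
}

def profile_similarity(profile_a, profile_b):
    """Average best-pair Jaccard overlap of ancestor sets (a crude stand-in for simGIC/Resnik)."""
    def best(term, others):
        return max(
            len(ANCESTORS[term] & ANCESTORS[o]) / len(ANCESTORS[term] | ANCESTORS[o])
            for o in others
        )
    return sum(best(t, profile_b) for t in profile_a) / len(profile_a)

patient = ["HP:0001631"]
mouse_model = ["MP:0010403"]
print(profile_similarity(patient, mouse_model))  # nonzero score driven by shared ancestors
```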

(e) Community-driven. The HPO has an active, responsive developer community and is on a monthly release cycle. This is reflected in the author list of a recent update paper, which had about 40 coauthors from outside the Monarch team, in addition to the numerous other people who have made occasional contributions to the HPO:

Köhler S, Doelken SC, Mungall CJ, Bauer S, Firth HV, Bailleul-Forestier I, Black GC, Brown DL, Brudno M, Campbell J, FitzPatrick DR, Eppig JT, Jackson AP, Freson K, Girdea M, Helbig I, Hurst JA, Jähn J, Jackson LG, Kelly AM, Ledbetter DH, Mansour S, Martin CL, Moss C, Mumford A, Ouwehand WH, Park SM, Riggs ER, Scott RH, Sisodiya S, Van Vooren S, Wapner RJ, Wilkie AO, Wright CF, Vulto-van Silfhout AT, de Leeuw N, de Vries BB, Washington NL, Smith CL, Westerfield M, Schofield P, Ruef BJ, Gkoutos GV, Haendel M, Smedley D, Lewis SE, Robinson PN. The Human Phenotype Ontology project: linking molecular biology and disease through phenotype data. Nucleic Acids Res. 2014 Jan;42(Database issue):D966-74. doi: 10.1093/nar/gkt1026. Epub 2013 Nov 11. PubMed PMID: 24217912; PubMed Central PMCID: PMC3965098.

(f) Extensible. The HPO is easy to extend in terms of both content and logic. A TermGenie instance, similar to that used for the Gene Ontology and Cell Ontology, has been set up for the HPO to allow addition of large numbers of terms in a logically consistent fashion. A community portal is also being constructed for basic term requests, which currently go through a SourceForge tracker system. Consultation with experts is performed to extend specific areas of the ontology, for example glycomics, skeletal dysplasias, and craniofacial malformations.

4. How hard is it to enter HPO terms reliably and correctly?
We have found that it is straightforward to enter HPO terms using the PhenoTips tool. As part of the Monarch Initiative, we have integrated a specificity meter that tells the clinician entering the terms when they have reached satisfactory breadth and depth for their patient profile. PhenoTips also has a variety of user features such as tool tips, co-annotation suggestions, etc., to help the user enter the terms correctly. The clinicians we have surveyed using PhenoTips find it easy to use.
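
The exact scoring behind the specificity meter is not described here, but the general idea can be sketched as follows: reward profiles whose terms are rare (informative) and that include enough terms to be useful. Everything in this sketch, including the annotation counts, the weighting, and the thresholds, is hypothetical.

```python
# Hypothetical sketch of a profile "specificity" score based on term information content
# (the real PhenoTips meter may differ; annotation counts and thresholds are made up).
import math

DISEASES_ANNOTATED = {          # how many diseases carry each term (invented numbers)
    "HP:0000118": 7000,         # Phenotypic abnormality (root-level, uninformative)
    "HP:0001631": 250,          # Atrial septal defect
    "HP:0011662": 12,           # a more specific descendant term
}
TOTAL_DISEASES = 7000

def information_content(term):
    return -math.log(DISEASES_ANNOTATED[term] / TOTAL_DISEASES)

def profile_specificity(terms, target_ic=3.0, target_breadth=5):
    """Score 0..1 combining average term IC (depth) with number of terms (breadth)."""
    depth = min(1.0, sum(map(information_content, terms)) / (target_ic * len(terms)))
    breadth = min(1.0, len(terms) / target_breadth)
    return round(0.5 * depth + 0.5 * breadth, 2)

print(profile_specificity(["HP:0000118"]))                 # vague profile -> low score
print(profile_specificity(["HP:0001631", "HP:0011662"]))   # more specific -> higher score
```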

5. What guarantee is there that HPO will continue to be supported/who is keeping it financially afloat?
Whilst HPO development is currently funded via a number of grants (such as to Monarch and PhenoTips, from the German BMBF grant 0313911, and from the European Community’s FP7 grant 602300), the main guarantee is that the HPO is essentially a community standard that relies in part on crowd-sourcing for extension and maintenance. It is part of a suite of ontologies, such as the Gene Ontology and Uberon, that have been maintained for many years in the same way and are managed by many of the same key community members. It has a diverse user base, both academic and private-sector. Part of the reason it has staying power is that the HPO developers work closely with specific communities to enhance particular aspects of the ontology. Our development strategies for coordination and community contribution are similar to those of the GO, and key members of the GO development team are also intimately involved in HPO development. We believe that projects with significant community support have greater longevity, as has been demonstrated by the GO Consortium.

6.  What technologies exist to help support HPO documentation?

The HPO is natively edited in the Web Ontology Language (OWL) and is internally documented using standardized annotation properties. These properties are displayed wherever the file is used, for example in browsers such as the NCBO BioPortal and Ontobee, within PhenoTips, and on the Monarch website. There is also a website for the HPO, www.human-phenotype-ontology.org, that contains further documentation regarding development and annotation. A community portal for basic term requests is being constructed. The Monarch Initiative website provides further documentation and services for accessing the ontologies, related data, and other ontology operations.
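
As a small illustration of how that embedded documentation can be consumed programmatically, the sketch below loads the OBO release of the HPO with the obonet library and prints the label, textual definition, and synonyms for one term. The choice of library is ours, and the field names are the standard OBO tags rather than anything specific to the HPO.

```python
# Minimal sketch: read the HPO's embedded documentation (label, definition, synonyms)
# from its OBO release using obonet (pip install obonet).
import obonet

graph = obonet.read_obo("http://purl.obolibrary.org/obo/hp.obo")

term_id = "HP:0001631"                     # Atrial septal defect, used as an example
term = graph.nodes[term_id]

print("label:     ", term.get("name"))
print("definition:", term.get("def"))
print("synonyms:  ", term.get("synonym", []))
```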

Thursday, April 23, 2015

What NLM should think about

Monarch replied to the 2015 Request for Information  “Soliciting Input into the Deliberations of the Advisory Committee to the NIH Director (ACD) Working Group on the National Library of Medicine (NLM)”. The RFI sought input regarding the strategic vision for the NLM to ensure that it remains an international leader in biomedical data and health information. 

Below are the Monarch consortium's thoughts. Our comments are primarily informed by our work on the development of information resources in support of translational biomedical informatics.

Dr. Melissa Haendel
Dr. Peter Robinson
Dr. Chris Mungall
Dr. Harry Hochheiser
Dr. David Eichmann
Dr. Michel Dumontier

Training

The Biomedical Informatics Research Training Program is perhaps the single most valuable contribution to the research community, providing considerable value to all of the NLM’s constituencies. At a time when informatics positions are going unfilled and demand is expected to continue to grow, the NLM-funded training programs educate students to become practitioners and researchers who will develop solutions to challenging medical informatics problems at all levels. Students from these programs go on to work for health care systems, insurers, industry, and academic institutions, where they develop and evaluate information systems ranging from personal health records to nationwide big-data translational data warehouses. Given the importance of a well-educated workforce that understands both data science and health care, the Biomedical Informatics training programs should be supported and expanded, particularly in directions that will encourage promising young students to enter the field.

Massive Open Online Courses (MOOCs) in bioinformatics and medical informatics should also be funded, as is occurring within the BD2K educational award program. These MOOCs will give many clinicians and researchers on-the-job training that is relevant to their current and evolving informatics learning needs.

Meeting anticipated needs for informatics professionals will also require training efforts that extend beyond graduate-level fellowships. The NLM should actively support and participate in programs at the undergraduate level (and earlier) that expose young students to potential opportunities in the field. This might include reaching out to undergraduate programs in related areas (biology, information science, computer science, etc.) and supporting programs like AMIA's high-school development program (https://www.amia.org/news-and-publications/press-release/high-school-students-present-national-informatics-symposium).

Ideally, the NLM programs would be integrated with the newly emerging educational resources and coursework being developed in the BD2K program. NLM has an opportunity to coordinate these types of training across all of NIH and beyond, and we see this as a key role for NLM.

Standards and tools

The pioneering resources developed and maintained by the NLM through the National Center for Biotechnology Information (NCBI) and related efforts are invaluable to the research community. Conducting modern biomedical research without tools like PubMed, the NCBI Taxonomy, MeSH, the UMLS, and many others is almost unthinkable. Charting a course that makes effective use of limited resources to ensure the future utility and viability of these tools should be a top priority for the NLM. Specifically, NLM leadership should initiate a review of the coverage and compatibility of available resources, with an eye towards both improving existing tools and identifying unmet needs. For example, PubMed and all of the Entrez databases have great value, but the UCSC, IGB, and JBrowse genome browsers have become de facto standards, and NLM’s genome browser efforts may need to be evaluated in light of this development. Another example is vocabulary interoperability, which NLM could facilitate by developing, promoting, and funding better tools to support technical development in this area. The current tools are very poor, and it is no wonder that a myriad of data integration challenges stem from this problem alone.

The time is also ripe for a re-envisioning of PubMed. Some immensely valuable PubMed resources are difficult to find or to use effectively. For example, the LinkOuts within PubMed are so well hidden that they are of little use to the community, yet they have the potential to be of enormous value. Specifically, one should be able to see how a LinkOut is attached to a publication directly on the abstract. The community should have a more sophisticated mechanism for contributing LinkOuts, and users should be able to filter and facet on the ones of interest to them. In the end, browsing PubMed could and should involve review of the most salient metadata associated with each paper; otherwise the sheer volume of the literature contained in PubMed is simply too massive to rely on text-based searches alone. The addition of affiliation to the author construct in MEDLINE 2015 was a significant step forward in the disambiguation of researchers, but it is only partially realized, as there is no further decomposition of the unstructured affiliation string. Further structuring in this area would allow retrieval by institution and department, opening new avenues for understanding the relationships inherent in the science enterprise. Binding these entities to authority records leads to clear identification of related work in cognate disciplines.
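
LinkOut data is in fact already exposed programmatically through the E-utilities elink endpoint (cmd=llinks); the sketch below retrieves the external LinkOut providers for a single article. The PMID is arbitrary (it happens to be the HPO paper cited elsewhere in this post), and the keys used to parse the JSON response are assumptions about its shape rather than documented guarantees.

```python
# Sketch: list LinkOut providers for one PubMed record via E-utilities elink (cmd=llinks).
import requests

ELINK = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi"
params = {"dbfrom": "pubmed", "id": "24217912", "cmd": "llinks", "retmode": "json"}

data = requests.get(ELINK, params=params, timeout=30).json()
for linkset in data.get("linksets", []):
    for idurl in linkset.get("idurllist", []):
        for objurl in idurl.get("objurls", []):
            provider = objurl.get("provider", {}).get("name", "unknown provider")
            url = objurl.get("url", {}).get("value", "")
            print(f"{provider}: {url}")
```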

We are pleased that the NLM has invested significant time and effort into releasing MeSH as Linked Data, thereby demonstrating a forward-thinking agenda that aligns with emerging standards for data publication and interoperability. NLM now joins organizations such as the European Bioinformatics Institute (EBI), the Swiss Institute of Bioinformatics (SIB), and the Database Center for Life Science (DBCLS), as well as grassroots efforts such as Bio2RDF, in creating an ever greater federation of data. However, much more must be done to make the vast array of NLM resources available as Linked Data. The graph of data must be fully connected not only within NLM but also with these other stakeholders, so as to reduce the barrier to discovery and reuse. The NLM could lead by fostering conversations, coordinating efforts, and providing funding towards data interoperability on a massive scale. It must be responsive to social and technical issues relating to knowledge representation, data publication, data interlinking, and data reuse.
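
To make this concrete, the MeSH RDF release can already be queried with standard Semantic Web tooling. The sketch below runs a simple SPARQL query against NLM's public MeSH endpoint; the endpoint URL and the meshv: vocabulary are as published by NLM, but the specific query is only an illustration.

```python
# Sketch: query NLM's public MeSH RDF SPARQL endpoint for descriptors whose label
# mentions "phenotype" (pip install SPARQLWrapper).
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://id.nlm.nih.gov/mesh/sparql")
sparql.setReturnFormat(JSON)
sparql.setQuery("""
PREFIX rdfs:  <http://www.w3.org/2000/01/rdf-schema#>
PREFIX meshv: <http://id.nlm.nih.gov/mesh/vocab#>

SELECT ?descriptor ?label FROM <http://id.nlm.nih.gov/mesh> WHERE {
  ?descriptor a meshv:TopicalDescriptor ;
              rdfs:label ?label .
  FILTER(CONTAINS(LCASE(STR(?label)), "phenotype"))
} LIMIT 10
""")

for row in sparql.query().convert()["results"]["bindings"]:
    print(row["label"]["value"], row["descriptor"]["value"])
```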

NLM should also work to exploit community efforts such as FORCE11 that are exploring new visions of the scholarly publication process. NLM support via inclusion in PubMed and via NLM tools such as the E-utilities could be invaluable for these efforts. The NLM should also work with publishers to leverage community-driven annotation standards so that authors and publishers can tag parts of the text in scientific publications as clinically relevant. These tags could then be used by PubMed and other interfaces to summarize the text and present the end user with more relevant results for evidence-based medicine. For example, a review paper or a manuscript describing a randomized controlled trial may contain fewer than 10 sentences with very high impact on clinical practice and many more that are generic; identifying (tagging/annotating) those high-impact sentences at the publisher level, much as abstracts and keywords are handled now, would greatly benefit clinicians looking for clinically relevant information in the literature.

NLM should work with industry partners like Google, Amazon, and Microsoft to create cloud computing standards (e.g., API standards) that can be used across these platforms and will enable researchers to utilize a combination of platforms for big-data research. Currently, such efforts are being performed by numerous third parties and are not well coordinated. How many external resources take MEDLINE content, transform it in some way, and make it available back to the community in some enhanced form? No one really knows, but a survey of this landscape would find many common requirements being met in slightly different ways. Such work could (a) greatly inform requirements for future NLM development and (b), if coordinated with the third parties, reduce downstream labor. A word of caution, however: if the NLM does not coordinate well with these third parties, it will only increase the downstream labor.

NLM should have a transparent mechanism for evaluation. At the recent BD2K standards workshop, one of the key issues that the community unanimously agreed upon was the need to understand when a standard has outlived its utility or has become outmoded. An example of this is MeSH. Despite the widespread use of MeSH, every bioinformatician currently has hacks for making MeSH, and the content annotated with it, more usable. Perhaps it is time to evolve MeSH toward a more modern semantic standard? Although MeSH has clearly been of tremendous value to the community, it is now failing to reach its full potential because of its limited interoperability and semantic foundation. NLM could greatly increase its impact by adding enough semantics to allow, say, computers to distinguish between entries that represent human or animal diseases, phenotypic findings or other complications of diseases, and items such as "Cadaver" (which is currently described as a pathologic process). MeSH could be much more valuable if evolved into something more computable and interoperable.

Cross-NIH Collaboration

NLM has a great opportunity to aid coordination within NIH, across the US agencies, and internationally. One example is the coordination of standards development, such as for Common Data Elements, which are currently buried within different ICs. Coordination between the ICs and the BD2K program is also a must if we are to realize the goals of the BD2K program and related efforts such as the National Cancer Informatics Program of the National Cancer Institute, whose goals are not dissimilar to what one might want for the future of NLM.

Further, the collection, storage, and use of biomedical data by the research community should be supported by a linked and navigable landscape of data, papers, software, and other resources. The proposed NIH commons, the data discovery index, the software discovery index, the Resource Identification Initiative, and the many other related intra- and extramural efforts, both nationally and internationally, must interoperate to support maximal finding and use of content. While this must extend beyond any NLM silos, NLM is in a great position to help support the creation of such a landscape. This goal can be realized via the promotion of open access models for biomedical data and scientific literature creation, annotation, and tools; via collaborations amongst the community on computational methods for content indexing and knowledge derivation; and via manual indexing efforts happening in a distributed and crowd-sourced manner.

Community engagement

Meaningful community engagement must be a key component of the NLM of the future. As the primary consumers of NLM services, biomedical researchers are well acquainted with the strengths, weaknesses, and opportunities associated with NLM tools. Better understanding of user information needs and of requirements for specific content, integration between data types, and search and discovery will help inform the redesign of tools and data models. Consideration of the needs of diverse populations, including clinicians, educators, researchers, and patients, can drive improvements that will benefit all classes of users, while also potentially identifying new opportunities.

Listening to and learning from researchers as consumers of NLM services is an important first step, but it is not sufficient. Informatics researchers, particularly those funded by the NLM, have extensive research experience and have developed numerous artifacts that are directly relevant to the information services provided by the NLM and NCBI. Community expertise in requirements analysis, evaluation, ontology development, standards processes, and many other emerging areas of informatics has much to offer NCBI efforts. Specific relevant efforts include visionary attempts to redefine scholarly publishing, resource identification efforts aimed at increasing research reproducibility, dataset archiving and identification tools, and annotation infrastructure for extracting key passages from papers, drug labels, and other text resources that are currently inaccessible to computational approaches. Researchers working in these areas have much to offer NLM efforts. Resources currently being allocated to overlapping or potentially redundant efforts might be repurposed to support the inclusion of community efforts, whether through workshops or, ideally, even shared staff.

Unfortunately, the possibility of close collaboration between extramural researchers and the groups developing and maintaining intramural NLM tools is all too often a missed opportunity. The NLM is often perceived as less than transparent in terms of priorities, contact points, goals, and needs. For example, we have been puzzled by the appearance of NLM tools that seem out of step with current practices in the research community, using data in ways that have diverged from standards and lead to poor interoperability. Limitations on the accessibility of tools, in the form of cumbersome licenses for UMLS components and a lack of available source code for NCBI tools, contribute to the perception that the NLM is not supportive of active collaboration within the biomedical research community. Attempts to collaborate or provide feedback to NLM regarding some resources generally go through a help-desk contact email, and others’ requests, which may be similar, are opaque to the community. Conversely, new NLM standards sometimes appear to the surprise of the community because the need for their development was not communicated and they overlap with existing community standards. This further complicates the data integration landscape and increases the siloing of NLM.

Why doesn’t NLM use a tracker system like other open source projects? Why doesn’t NLM provide files according to modern version control systems? We have in the past had to scrape HTML pages to get content from NCBI. This does not reflect well upon NLM, a purported leader in information science.

The biomedical research community and the NLM have the potential to significantly increase joint impact on medicine and public health. Realizing this potential will require concrete commitments to increased interaction, collaboration, and transparency, specifically involving:

  • Greater transparency through publication of plans and goals for infrastructure development efforts. Because the biomedical community has little insight into the development agenda for NLM tools, contributing to that development, either through direct participation or through identification of relevant technologies, vocabularies, etc., is very difficult. Early discussion and engagement with the community, through mechanisms ranging from formal NIH requests for information to blog posts and other less formal methods, will invite feedback, increase engagement with developers both intramurally and extramurally, and facilitate development plans that best meet researcher needs.

  • Enhanced opportunities for feedback and community engagement. Modern web technologies have introduced numerous successful models for online community engagement, including synchronous chat sessions, audio/video meetings, focused expertise-sharing sites such as Stack Overflow, and code-sharing tools such as GitHub. The NLM should embrace these tools as means of helping users, soliciting feedback, engaging software developers, and leveraging extramural efforts.

  • Targeted outreach activities. Contests, hackathons, and other “challenge” events have become a popular tool for encouraging focused efforts, particularly from students, on interesting problems. Taking inspiration from established efforts like the DREAM Challenges (http://dreamchallenges.org/) and newer programs like the AMIA Student Design Challenge (https://www.amia.org/amia2015/student-design-challenge), NLM should invite students and others to jump into biomedical informatics work. These efforts might be integrated with the NLM’s training mission, perhaps including events at the NLM training program annual meeting.

  • A more integrated science landscape and attribution. With the new biosketch and SciENcv system, NLM has the opportunity to create a much more richly linked landscape of research activity. This can provide better attribution for non-traditional contributions, better research profiling, better-informed funding-body decision making, and simply a deeper understanding of the science being done and of the outcomes of funding and programs. This should necessarily include an improved value system whereby all contributions can be considered and non-traditional scientists have a more prominent role in review processes and decision-making activities. NLM is uniquely positioned to support a cross-disciplinary, team-science approach and improved collaboration.