Tuesday, February 21, 2017

What's in a (gene) name? That which we call a gene by any other name would confuse a researcher

If you had told me that I would spend my PhD years studying a gene called Falafel, I probably would not have believed you. Yet that is exactly what happened (I also briefly studied a gene called Bazooka). When working with fruit flies, researchers often come up with entertaining names for newly discovered genes; however, the names of these same genes in mammals can be quite different. For instance, Falafel is called PP4r3 in humans. This discrepancy in gene names (also called gene symbols) can be confusing, and part of the Monarch mission is to ease cross-talk between genotype data from different species. As a researcher, it can be hard to remember what a gene is called in different species, and the problem becomes harder still if a gene name is changed. Thankfully, gene names are changed infrequently, and there are groups committed to ensuring that gene names are systematic and regulated. Recently, however, I was prompted to think of alternative names for MARCH7, a gene discovered by Monarch Principal Investigator, Melissa Haendel, in the 1990s.

Why does the name of a gene change? There are several reasons why a gene name might be changed or updated. For instance, if a newly discovered gene has no known function but is later found to belong to a family of genes, it can be renamed to match that family. This is the case for the gene MARCH7, discovered by Haendel during her PhD work. Haendel originally named the gene Axotrophin, but Axotrophin was later discovered to be a member of the MARCH (membrane associated ring-CH-type finger) family of genes, and was renamed. However, MARCH7 is about to be renamed yet again. The HUGO Gene Nomenclature Committee has recently determined that MARCH7, along with several other genes, will be renamed because the gene symbol MARCH7 gets corrupted when used within Microsoft Excel (a tool popular among researchers).

The Excel corruption issue occurs when a gene symbol is recognized as a date, and the original text string is irrevocably overwritten. For example, in the MARCH family of genes, MARCH7 is converted to the number 42801, which is then visually rendered as 7-March. Because downstream software no longer recognizes 42801 as a gene name at all, later analyses come out wrong. This formatting error befalls other gene families as well: SEP, SEPT, APR, MAR, DEC, NOV, and OCT. While HUGO recognizes that this is not a traditional reason to change the name of a gene, the change has been deemed necessary.
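To see where a number like 42801 comes from, here is a minimal Python sketch of how an Excel date serial number maps to a calendar date. It assumes Excel's default 1900 date system and uses the common convention of counting days from 30 December 1899, which holds for any date from March 1900 onward. The function names are my own, chosen for illustration.

```python
from datetime import date, timedelta

# Day zero commonly used to convert serial numbers from Excel's default
# (1900) date system; valid for serials from 61 (1 March 1900) onward.
EXCEL_DAY_ZERO = date(1899, 12, 30)

def excel_serial_to_date(serial: int) -> date:
    """Convert an Excel 1900-system serial number to a calendar date."""
    return EXCEL_DAY_ZERO + timedelta(days=serial)

def date_to_excel_serial(d: date) -> int:
    """Convert a calendar date back to its Excel serial number."""
    return (d - EXCEL_DAY_ZERO).days

print(excel_serial_to_date(42801))             # 2017-03-07
print(date_to_excel_serial(date(2017, 3, 7)))  # 42801
```

In other words, 42801 is simply 7 March 2017 stored as a day count. Nothing in that number records that the cell originally contained the text MARCH7.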

There is another formatting issue in Excel that affects a subset of genes: those named with RIKEN identifiers. These identifiers take the form “nnnnnnnenn”, where n is a digit, for example, 3400000e12. Excel reads such identifiers as numbers in scientific notation and converts them into floating-point numbers; for instance, 3400000e12 would be converted into 3.4E+18. These conversions are irreversible; once the value has changed, the user can no longer get the original gene name back.
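The reason these identifiers are vulnerable is that they are also valid scientific notation. The short Python sketch below illustrates this under my own assumptions: the regular expression is a hypothetical rendering of the “nnnnnnnenn” form described above (seven digits, the letter e, two digits), and Python's float() stands in for any software that eagerly parses numbers.

```python
import re

# Hypothetical pattern for the "nnnnnnnenn" form: seven digits,
# the letter e (either case), then two digits.
RIKEN_LIKE = re.compile(r"\d{7}[eE]\d{2}")

def looks_like_riken_id(text: str) -> bool:
    """True if the whole string has the RIKEN-style shape."""
    return RIKEN_LIKE.fullmatch(text) is not None

def parses_as_number(text: str) -> bool:
    """True if the string is also a valid floating-point literal."""
    try:
        float(text)
        return True
    except ValueError:
        return False

for symbol in ["3400000e12", "MARCH7"]:
    print(symbol, looks_like_riken_id(symbol), parses_as_number(symbol))
```

Any identifier for which both checks are true is at risk of being silently stored as a number. MARCH7 fails both checks here because it is damaged by a different mechanism, date recognition.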

Blaming Excel for these errors might be the easy thing to do, but researchers have the responsibility to ensure that their data is accurate. There are several workarounds that researchers can use to limit these identifier errors. In 2004, Zeeberg and colleagues published steps to stop the automatic reformatting of gene names and also shared a script that can detect whether a gene name has accidentally been converted into a date or into a floating-point number. But it seems that researchers are not taking advantage of these resources. A recent article by Ziemann et al. examined gene lists in papers published in 18 journals over the last 10 years and found that almost 20% of papers with gene lists had erroneous gene names in those lists. Ultimately, HUGO has decided that the best solution for this gene symbol debacle is to change the names of these problematic genes.
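To give a flavor of what such a check can look like, here is a rough Python sketch that scans a list of gene symbols for values that look like Excel dates or scientific-notation numbers. This is not the script shared by Zeeberg and colleagues; the patterns, the function name find_mangled_symbols, and the example values are my own assumptions for illustration.

```python
import re

# Values such as "7-Mar", "1-Sep" or "7-March": a day number, a hyphen,
# then a month name (abbreviated or in full).
DATE_LIKE = re.compile(
    r"\d{1,2}-(?:jan(?:uary)?|feb(?:ruary)?|mar(?:ch)?|apr(?:il)?|may|"
    r"june?|july?|aug(?:ust)?|sep(?:t(?:ember)?)?|oct(?:ober)?|"
    r"nov(?:ember)?|dec(?:ember)?)",
    re.IGNORECASE,
)

# Values such as "3.4E+12": scientific notation as a spreadsheet shows it.
FLOAT_LIKE = re.compile(r"\d(?:\.\d+)?E\+?\d+", re.IGNORECASE)

def find_mangled_symbols(values):
    """Return (value, reason) pairs for entries that look converted."""
    suspects = []
    for value in values:
        text = value.strip()
        if DATE_LIKE.fullmatch(text):
            suspects.append((text, "date"))
        elif FLOAT_LIKE.fullmatch(text):
            suspects.append((text, "float"))
    return suspects

# Hypothetical column of gene symbols exported from a spreadsheet.
column = ["TP53", "7-Mar", "2.31E+13", "MARCH7", "1-Sep"]
for value, reason in find_mangled_symbols(column):
    print(f"{value}: looks like a {reason}, check the original symbol")
```

A check like this can only flag suspicious entries. It cannot restore the original symbol, which is why stopping the conversion in the first place matters.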

So now the researchers who are most familiar with the MARCH family of genes have been tasked with renaming these gene symbols. What should the new symbol for MARCH7 be? One suggestion is MAUL; our own Melissa Haendel supports this name because, as she said, “Axotrophin killed everything I put it in!” While the semantic future of MARCH7 is yet to be determined, we do know that these gene symbol changes will have far-reaching effects. In my blog post next week, I will discuss some of these ramifications and delve deeper into the problems that are caused by divergent gene symbols.