Custom Genome Annotation

To learn the functions of all expressed gene products, whether for fundamental research or commercial use, genomes must be annotated. This requires aligning new, uncharacterized gene sequences with sequences that have already been annotated. The most reliable, but slowest, way to identify a gene's function is to test what its product does under experimental conditions. Sequences with experimentally validated functions form the foundation of bioinformatics: they can be aligned with uncharacterized sequences to find those that are likely to have similar functions. Assigning functions to sequences by computational alignment is straightforward, as long as the predictions do not need to be validated experimentally.
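A toy sketch of the alignment-based function transfer described above, assuming a simple scheme chosen only for illustration: each uncharacterized query is globally aligned (Needleman-Wunsch) against a small set of reference sequences with validated functions, and the best hit's function is transferred if its percent identity clears a cutoff. Otherwise the query keeps a generic "uncharacterized" label. The reference sequences, function names, scoring values (match +1, mismatch -1, gap -2) and the 40% identity cutoff are all hypothetical. Production pipelines typically rely on dedicated tools such as BLAST or HMMER with substitution matrices rather than code like this.

```python
from dataclasses import dataclass

# Hypothetical fallback label for queries with no sufficiently similar reference.
PUP_LABEL = "Putative uncharacterized protein"


@dataclass
class Reference:
    """A reference protein whose function was validated experimentally (toy data)."""
    name: str
    sequence: str
    function: str


def alignment_identity(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> float:
    """Globally align a and b (Needleman-Wunsch) and return the fraction of
    alignment columns that hold identical residues."""
    n, m = len(a), len(b)

    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    # Fill the dynamic-programming score matrix.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Trace back one optimal alignment, counting columns and identical residues.
    i, j, columns, identical = n, m, 0, 0
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            if a[i - 1] == b[j - 1]:
                identical += 1
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1  # gap in b
        else:
            j -= 1  # gap in a
        columns += 1
    return identical / columns if columns else 0.0


def annotate(query: str, references: list[Reference], min_identity: float = 0.4) -> tuple[str, float]:
    """Transfer the function of the most similar reference, or fall back to PUP_LABEL."""
    best, best_identity = None, 0.0
    for ref in references:
        identity = alignment_identity(query, ref.sequence)
        if identity > best_identity:
            best, best_identity = ref, identity
    if best is None or best_identity < min_identity:
        return PUP_LABEL, best_identity
    return best.function, best_identity


# Toy reference set: sequences and functions are placeholders, not real proteins.
references = [
    Reference("ref_A", "ACDEFGHIKLMNPQRSTVY", "Function A"),
    Reference("ref_B", "MMMMKKKKLLLLPPPP", "Function B"),
]

print(annotate("ACDEFGHIKLMNPQRSTVY", references))  # ('Function A', 1.0)
print(annotate("WWWWWWWWWW", references))           # ('Putative uncharacterized protein', 0.0)
```

Global alignment is used here only because it is short to write; local alignment (Smith-Waterman) is the more usual choice when two proteins share only a domain rather than their full length.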

In the last decade, a massive influx of newly sequenced genomes and metagenomes has dramatically changed the traditional, slower approach to genome annotation: the process has become almost completely automated, and experimental validation has been abandoned except in rare, specific cases.

Automated genome annotation is less thorough than manual annotation, and it has produced a rapidly growing number of unrecognized gene products designated "Putative Uncharacterized Proteins" (PUPs). The majority of these PUPs are functional; however, recognizing those functions requires time-consuming manual annotation, which in turn demands an ever-growing staff of annotators and experimentalists to validate the results.

Another drawback of automated genome annotation is that many gene products are named after their protein class rather than their particular function, with classes defined by common structural motifs, domains, generic functions, intracellular locations, and so on. This type of classification can serve as a starting point for further manual analysis, but it is irrelevant to metabolic reconstruction technology, which assembles proteins into complex reaction networks.

