The ultimate goal of genome sequencing projects is to understand how expressed genes and metabolic networks give rise to life.
The main steps towards deciphering life are: genome sequencing, gene function annotation, metabolic and functional reconstruction and, finally, conversion of the reconstruction into a mathematical model that describes cellular metabolism in motion. The model must be simplified and then analyzed in terms of both expected and unexpected kinetic or dynamic behavior. Unexpected behavior may be a good indication of a new and important phenomenon worth experimental verification.
Specific problems arise along these steps:
- Successful conversion (mapping) of a genome onto the best matching metabolic reconstruction strictly requires a common nomenclature of gene names, enzymes, transporters, their subunits, isoforms, cellular localizations, coenzymes, cofactors, and any other signature features that would help to identify and discriminate closely related pathways.
- Unfortunately, the vast majority of genomes submitted to the public sequence databases do not follow a common nomenclature. This requires an additional step of manual standardization: unifying gene names, renaming protein functions, assigning EC numbers, predicting possible sub-cellular localization, etc.
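One small part of the standardization step, checking whether a record already carries a usable EC number, can be sketched as follows. This is only an illustrative sketch: the function name `ec_status` and the three status labels are hypothetical, and the pattern assumes the plain four-field EC format with "-" for unspecified levels. Preliminary EC numbers (for example, those with an "n" in the last field) are not handled and would be reported as invalid.

```python
import re

# Four dot-separated fields; each is a number or "-" (level not specified).
EC_PATTERN = re.compile(r"^(\d+|-)\.(\d+|-)\.(\d+|-)\.(\d+|-)$")


def ec_status(ec):
    """Classify an EC string as 'complete', 'partial', or 'invalid'.

    Hypothetical helper: 'partial' means at least one level is "-",
    so the enzyme is assigned to a class but not to a specific reaction.
    """
    if ec is None:
        return "invalid"
    text = ec.strip()
    # Tolerate an optional "EC" or "EC:" prefix.
    if text.upper().startswith("EC"):
        text = text[2:].lstrip(" :")
    match = EC_PATTERN.match(text)
    if match is None:
        return "invalid"
    if "-" in match.groups():
        return "partial"
    return "complete"


if __name__ == "__main__":
    for value in ["EC 2.7.1.1", "1.1.1.-", "kinase"]:
        print(value, "->", ec_status(value))
```

Records reported as partial or invalid are the ones that would go on to manual EC assignment.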
In recent years, public repositories have started processing an enormous influx of next-generation sequencing and metagenomic data. Owing to this, genome annotation has become mostly automatic. As a result, traditional specific function assignment has been replaced by "ORF calling", and non-specific function names have accumulated that indicate only membership in a class or family of proteins sharing a common sequence motif. Such names are worthless for metabolic reconstructions, and sequences annotated using this method require time-consuming manual analysis and re-annotation.
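A first pass at finding such non-specific names can be automated before the manual work begins. The sketch below is a minimal illustration, not a description of any particular pipeline: the keyword list, the function names `needs_reannotation` and `flag_for_review`, and the input format (a dictionary from gene identifier to product name) are all assumptions made for the example.

```python
import re

# Hypothetical, non-exhaustive list of phrases that signal a
# non-specific product name.
NONSPECIFIC_PATTERNS = [
    r"\bhypothetical\b",
    r"\buncharacterized\b",
    r"\bputative\b",
    r"\bunknown function\b",
    r"\bdomain[- ]containing\b",
    r"\bfamily protein\b",
    r"\bDUF\d+\b",
]
_NONSPECIFIC = re.compile("|".join(NONSPECIFIC_PATTERNS), re.IGNORECASE)


def needs_reannotation(product_name):
    """Return True if the name is empty or matches a non-specific phrase."""
    if not product_name.strip():
        return True
    return _NONSPECIFIC.search(product_name) is not None


def flag_for_review(annotations):
    """Return the sorted gene ids whose product names need manual review.

    `annotations` maps a gene identifier to its product name.
    """
    return sorted(
        gene_id
        for gene_id, name in annotations.items()
        if needs_reannotation(name)
    )


if __name__ == "__main__":
    example = {
        "g1": "hexokinase",
        "g2": "putative uncharacterized protein",
        "g3": "ABC transporter family protein",
    }
    print(flag_for_review(example))
```

A keyword filter of this kind only narrows the list; deciding the actual function of each flagged sequence still requires the manual analysis described above.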
Automatic annotation also causes errors to propagate and accumulate in the databases, resulting in BLAST outputs filled with perfect matches among "putative uncharacterized proteins" (PUP). Most of these initial errors were simply fused or frame-shifted ORFs whose function could not be recognized automatically.
Some of these were due to multifunctional "promiscuous" enzymes, which are very often seen in smaller genomes. These multifunctional enzymes require additional sequence and literature analysis to predict and re-annotate their additional functions. Since the possible range of such promiscuity is a matter of guesswork, the requirements for these additional functions can be established only via metabolic reconstruction running in parallel with the re-annotation process.
This problem is well recognized within the genomics community, and steps are being taken to resolve it over the next few years. To aid in the effort, Genome Designs plans to publish complete functional annotations for a number of model genomes to help build a "gold standard" for re-annotation. These genomes can be used later to improve automated function assignments across the entire phylogenetic tree.
In that regard, we are looking for partnerships with academic groups applying for government funding for sequencing and annotation of bacteria, fungi and plants of high significance.
Genome Designs has proprietary tools and techniques that make it possible to reconstruct not only the metabolic machinery of an organism, but also its regulation mechanisms. A number of patents are being filed to protect this know-how.
Email us for more information.