ORION-VIRCAT: A Tool for Mapping ICTV and NCBI Taxonomies

 
 
 

 

Motivation: Viruses, viroids and prions are the smallest infectious biological entities that depend on their host for replication. The number of pathogenic viruses is large, and their impact on global human health is well documented. Currently, the International Committee on Taxonomy of Viruses (ICTV) has classified about 4,379 virus species, while the National Center for Biotechnology Information Viral Genomes Resource (NCBI-VGR) database has mapped 617,705 proteins to eight large taxonomic groups. Despite these efforts, no automated approach is available for mapping the ICTV master list and its officially accepted virus names to the NCBI-VGR taxonomical classification. With metagenomic sequencing, the discovery and naming of new viral species is likely to increase at least tenfold. Unfortunately, existing viral databases are not adequately prepared to scale, or to automatically maintain and annotate ultra-high-throughput sequences and place this information into specific taxonomic categories.

Results: ORION-VIRCAT is a scalable and interoperable object-relational database designed to serve as a resource for the integration and verification of taxonomical classifications generated by the ICTV and NCBI-VGR. The current release (v1.0) of ORION-VIRCAT is implemented in PostgreSQL and has been extended to Oracle, MySQL, and Sybase. ORION-VIRCAT automatically mapped and joined 617,705 entries from the NCBI-VGR to the viral naming of the ICTV. This analysis revealed that 399,095 entries from the NCBI-VGR can be mapped to the ICTV classification and that 1 order, 10 families, 35 genera and 503 species listed in the ICTV disagree with the NCBI-VGR classification schema. Nevertheless, we were able to correct several discrepancies, mapping 234,000 additional entries.
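The kind of name-based join described above can be illustrated with a minimal Python sketch. This is not the ORION-VIRCAT implementation, which is a relational database accompanied by Perl parsers; the function names, the normalization rule, the entry identifiers and the toy records below are all assumptions made for illustration only.

```python
def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so trivially different spellings match.
    (Assumed rule for this sketch; the real matching criteria are not stated.)"""
    return " ".join(name.lower().split())


def map_entries(ictv_species, ncbi_entries):
    """Join NCBI entries to ICTV species names by normalized name.

    ictv_species: iterable of official species names (hypothetical input).
    ncbi_entries: iterable of (entry_id, species_name) pairs (hypothetical input).
    Returns (mapped, unmapped): a dict of entry_id -> official ICTV name,
    and a list of the (entry_id, species_name) pairs that found no match.
    """
    # Index the official names once so each lookup is a dictionary hit.
    index = {normalize(n): n for n in ictv_species}
    mapped, unmapped = {}, []
    for entry_id, name in ncbi_entries:
        official = index.get(normalize(name))
        if official is not None:
            mapped[entry_id] = official
        else:
            unmapped.append((entry_id, name))
    return mapped, unmapped


# Toy records, invented for illustration; the IDs are not real accessions.
ictv = ["Tobacco mosaic virus", "Hepatitis B virus"]
ncbi = [
    ("P1", "tobacco  mosaic virus"),
    ("P2", "Hepatitis B virus"),
    ("P3", "Unclassified phage X"),
]
mapped, unmapped = map_entries(ictv, ncbi)
print(len(mapped), "mapped;", len(unmapped), "unmapped")
```

Entries left in the unmapped list correspond to the kind of discrepancy that would need manual review or a correction rule before they can be assigned to an ICTV name.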

PERL ICTV-NCBI Mapping Scripts

ICTV2NCBI_PERL_parser ICTV_PERL_parser

 

ICTV-NCBI Mapping Taxonomical Discrepancies (Alphabetic)

ICTV_Orders.txt ICTV_Orders_Alph.txt
ICTV_Families.txt ICTV_Families_Alph.txt
ICTV_Subfamilies.txt ICTV_Subfamilies_Alph.txt
ICTV_Genuses.txt ICTV_Genuses_Alph.txt
ICTV_Species.txt ICTV_Species_Alph.txt
ICTV_Strains.txt ICTV_Strains_Alph.txt
ICTV_Master.txt ICTV_NCBI_errors.txt
Master_Species_List-2008.csv Master_Species_List-2008-2.csv

