Approaches for Annotation of Endogenous Retrovirus Elements

Bioinformatics Internship Presentation

Olumuyiwa Daramola (Mentor: Dr. Arifa Khan, CBER, U.S. Food and Drug Administration)

August 30th, 2016, 2:00pm, Room 1300, Harris Building

Endogenous retroviral sequences arise from the retroviral replication cycle and are stably present in high copy number in the genomes of all host species. Retroviruses replicate by reverse transcribing their RNA genomes into cDNA using an RNA-dependent DNA polymerase, also called reverse transcriptase (RT); the cDNA is then stably integrated into the host-cell genome as an endogenous element called the provirus. The laboratory is developing a new reference virus database (rVDB) for accurate detection of viral sequences, including endogenous retroelements. Because endogenous retroviruses are integrated in the host genome, they are associated with host-cell sequences and are often poorly annotated or not annotated at all. The aim of the project was to annotate such sequences obtained from GenBank in order to determine viral/cellular integration junctions and to identify retroviral sub-genomic regions (5’ LTR, gag, pol, env, and 3’ LTR), which would facilitate BLAST searches for accurate virus detection.

The blastn tool was used to develop a step-wise strategy for mapping viral and host-cell sequences. Initially, a well-annotated, full-length genome of feline leukemia virus (FeLV, GI 9630707) was used as a reference to map the viral/host sequence junctions in a selected list of queries consisting of well-annotated, complete and partial endogenous FeLVs obtained from rVDB. Individual FeLV sub-genomic regions (5’ LTR, gag, pol, env, and 3’ LTR) were used to determine the homologous regions in each query. In addition, the junctions between the integrated virus and the host DNA were determined by running blastn with the terminal regions containing the 5’ and 3’ LTRs. Next, a well-annotated, full-length genome of porcine endogenous retrovirus (PERV, GI 300825687) was used to map the viral sequences in a selected list of annotated PERVs and poorly annotated PERV-related sequences from other Sus species.
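The junction-mapping step can be illustrated with a minimal sketch. It is not the laboratory's actual pipeline. It assumes blastn was run with tabular output (-outfmt 6), with the endogenous sequence as the query and the reference sub-genomic regions as subjects. The names Hit, parse_outfmt6 and viral_span, and the region labels, are hypothetical and introduced only for illustration.

```python
from collections import namedtuple

# One blastn alignment, reduced to the fields needed for junction mapping.
Hit = namedtuple("Hit", "query region q_start q_end bitscore")


def parse_outfmt6(text):
    """Parse blastn tabular output (-outfmt 6, default 12 columns).

    Assumption: the subject ID (column 2) names a reference sub-genomic
    region such as '5LTR', 'gag', 'pol', 'env' or '3LTR'.
    """
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        # Columns 7 and 8 are the alignment start and end in the query.
        q_start, q_end = sorted((int(fields[6]), int(fields[7])))
        hits.append(Hit(fields[0], fields[1], q_start, q_end, float(fields[11])))
    return hits


def viral_span(hits, query_id, query_length):
    """Estimate the viral region and flanking host regions of one query.

    The viral span runs from the leftmost to the rightmost aligned query
    position; anything outside it is treated as candidate host sequence.
    Coordinates are 1-based and inclusive, as in BLAST output.
    """
    own = [h for h in hits if h.query == query_id]
    if not own:
        return None
    start = min(h.q_start for h in own)
    end = max(h.q_end for h in own)
    return {
        "viral": (start, end),
        "host_5prime": (1, start - 1) if start > 1 else None,
        "host_3prime": (end + 1, query_length) if end < query_length else None,
    }
```

In this sketch the outermost hits, which would normally come from the 5’ and 3’ LTRs, set the candidate integration junctions. Real data would also need filtering by e-value or percent identity before the span is trusted.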

The project established the initial steps for annotating endogenous retroviral sequences in rVDB. The use of well-annotated sequences (FeLV and PERV) confirmed that the strategy could be used to annotate some poorly characterized endogenous retroviral sequences from related host species (Sus scrofa and Sus barbatus). In the future, these strategies can be applied to annotating unknown sequences, such as retrotransposons, in rVDB.