Extracting “legacy” genes from an invertebrate sequence capture dataset to complement phylogenomic datasets

Caroline Miller

Authors: Caroline Miller, Michael Forthman, Rebecca T. Kimball, Christine W. Miller

Faculty Mentor: Dr. Michael Forthman

College: College of Agricultural and Life Sciences


The field of phylogenetics has greatly benefited from the introduction of next-generation sequencing (NGS) and genome reduction approaches, which allow molecular datasets to include thousands of loci from model and non-model species. These phylogenetic studies produce rich datasets composed of targeted regions of the genome, but the datasets can also include off-target reads. Molecular datasets built from NGS genome reduction approaches are superseding Sanger-based datasets that target a few well-known loci (“legacy markers”). However, integrating the two types of datasets is of interest because legacy markers can include different types of loci (e.g., mitochondrial, ribosomal, and nuclear protein-coding) across a potentially larger sample of species from past phylogenetic studies. Here, I use existing legacy data for a group of leaf-footed bugs (Hemiptera: Coreoidea), a model group for sexual selection studies, to recover legacy markers from off-target sequences in an existing NGS genome-reduction dataset composed of protein-coding ultraconserved elements. Specifically, I use two bioinformatic resources to extract legacy markers from off-target sequences: (1) MitoFinder to retrieve mitochondrial loci and (2) BLAST to retrieve nuclear protein-coding and ribosomal loci.
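
As a rough illustration of the BLAST step, the sketch below parses BLAST tabular output (the standard 12-column `-outfmt 6` layout) and keeps the highest-scoring hit for each query. It is not the pipeline used in this study. It assumes that legacy marker reference sequences were the queries and assembled contigs the subjects, and the function name `best_hits` and the `max_evalue` and `min_length` thresholds are hypothetical values chosen only for the example.

```python
import csv
import io

# Standard column order of BLAST tabular output (-outfmt 6).
BLAST_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def best_hits(handle, max_evalue=1e-10, min_length=100):
    """Return the highest-bitscore hit per query sequence.

    max_evalue and min_length are illustrative thresholds (assumptions),
    not values taken from the study.
    """
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and comment lines.
        if not row or row[0].startswith("#"):
            continue
        hit = dict(zip(BLAST_COLUMNS, row))
        # Discard weak or short alignments.
        if float(hit["evalue"]) > max_evalue or int(hit["length"]) < min_length:
            continue
        current = best.get(hit["qseqid"])
        if current is None or float(hit["bitscore"]) > float(current["bitscore"]):
            best[hit["qseqid"]] = hit
    return best


if __name__ == "__main__":
    # Made-up rows for demonstration only; not real results.
    demo = io.StringIO(
        "COI_ref\tcontig_12\t95.0\t600\t30\t0\t1\t600\t50\t649\t1e-150\t900\n"
        "COI_ref\tcontig_40\t88.0\t300\t36\t0\t1\t300\t10\t309\t1e-60\t350\n"
    )
    for marker, hit in best_hits(demo).items():
        print(marker, hit["sseqid"], hit["bitscore"])
```

Keeping only the top-scoring contig per marker is a simplification; a real workflow would likely also check for paralogs and contamination before adding a sequence to an alignment.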

