Calling Structural Variants with Confidence from Short-Read Data in Wild Bird Populations. Journal Article

Overview

abstract

  • Comprehensive characterization of structural variation in natural populations has only become feasible in the last decade. To investigate the population genomic nature of structural variation, reproducible and high-confidence structural variation callsets are first required. We created a population-scale reference of the genome-wide landscape of structural variation across 33 Nordic house sparrows (Passer domesticus). To produce a consensus callset across all samples using short-read data, we compare heuristic-based quality filtering and visual curation (Samplot/PlotCritic and Samplot-ML) approaches. We demonstrate that curation of structural variants is important for reducing putative false positives and that the time invested in this step outweighs the potential costs of analyzing short-read-discovered structural variation data sets that include many potential false positives. We find that even a lenient manual curation strategy (e.g. applied by a single curator) can reduce the proportion of putative false positives by up to 80%, thus enriching the proportion of high-confidence variants. Crucially, in applying a lenient manual curation strategy with a single curator, nearly all (>99%) variants rejected as putative false positives were also classified as such by a more stringent curation strategy using three additional curators. Furthermore, variants rejected by manual curation failed to reflect the expected population structure from SNPs, whereas variants passing curation did. Combining heuristic-based quality filtering with rapid manual curation of structural variants in short-read data can therefore become a time- and cost-effective first step for functional and population genomic studies requiring high-confidence structural variation callsets.

publication date

  • April 2, 2024

has restriction

  • gold

Date in CU Experts

  • March 22, 2024 3:45 AM

Full Author List

  • David G; Bertolotti A; Layer R; Scofield D; Hayward A; Baril T; Burnett HA; Gudmunds E; Jensen H; Husby A

author count

  • 10

Other Profiles

Electronic International Standard Serial Number (EISSN)

  • 1759-6653

Additional Document Info

volume

  • 16

issue

  • 4