Selective Sweep Detection in Genomics: From Statistical Methods to Deep Learning Approaches
Sanchit Pal Singh *
Indian Veterinary Research Institute, Izatnagar, Uttar Pradesh, India.
*Author to whom correspondence should be addressed.
Abstract
Selective sweeps, which arise when beneficial alleles rapidly fix within a population and reduce genetic diversity in flanking chromosomal regions, are among the most informative signatures of positive selection in the genome. Their detection has broad implications across evolutionary biology, population genomics, and disease genetics, enabling researchers to identify loci underlying adaptation, resistance, and medically or economically important traits. Since the hitchhiking effect was first formalised by Maynard Smith and Haigh in 1974, detection methodology has advanced considerably: from early neutrality tests such as Tajima’s D and Fay and Wu’s H, through haplotype-based statistics including EHH, iHS, H12 and their extensions, to contemporary machine learning frameworks capable of detecting subtle, ancient, and complex sweep signatures. This review traces that methodological progression, with particular focus on two recent convolutional neural network-based approaches: FlexSweep and the Domain Adaptive Neural Network (DANN). FlexSweep integrates eleven complementary summary statistics across multiple genomic scales, enabling detection of diverse sweeps up to 5,000 human generations old in modern genomic datasets. DANN introduces domain adaptation to population genomics via a gradient reversal layer, enabling robust sweep detection and classification in ancient DNA by actively correcting for simulation misspecification. A key conclusion of this review is that the two methods are best understood as complementary tools occupying distinct niches: FlexSweep excels across diverse sweep types in modern genomic data, whereas DANN addresses the technically demanding problem of sweep detection in ancient DNA. The limitations of current machine learning approaches, including demographic confounding, simulation misspecification, and computational demands, are discussed alongside prospects for future application in livestock and non-model organism genomics.
Keywords: Selection signature, selective sweep, hitchhiking, FlexSweep, domain adaptive neural network