Reveel: a large-scale population genotyper using low-coverage sequencing data
Please contact Lin Huang <email@example.com> for questions or comments
What is Reveel?
Reveel is an ultrafast tool for single nucleotide variant calling and genotyping of large cohorts that have been sequenced at low coverage. Cohorts can range from tens of individuals to tens of thousands, or even more. Reveel leverages the underlying complex linkage disequilibrium (LD) structure with a simplified model that scales linearly with the number of individuals in a cohort for a given number of imputed SNPs, while producing highly accurate genotype calls for both high- and low-frequency SNPs.
The citation for Reveel is:
Lin Huang, Bo Wang, Ruitang Chen, Sivan Bercovici, and Serafim Batzoglou, "Reveel: large-scale population genotyping using low-coverage sequencing data", Bioinformatics, vol. 32, no. 11, pp. 1686-1696, Jun. 2016.
Which release should I use?
The latest release is recommended for all scenarios because it is more user-friendly, scalable, and stable.
The second release accepts input in four commonly used formats: compressed BCF, uncompressed BCF, compressed VCF, and uncompressed VCF. If you have BAM files of the query genomes, we recommend calculating genotype likelihoods with SAMtools. By contrast, the initial version of Reveel takes BAM files as inputs.
The second release produces the genotypes of the query genomes in BCF format. The REF field of the output BCF is consistent with that field of the
The second release allows users to partition the chromosomes in two ways: by the number of markers and by position. Because the partitions can be processed separately, this feature lets Reveel be run in parallel.
The second release has been tested on large datasets with up to 3,910 whole genomes.
The second release can be used to build a pipeline that can infer the genotypes of very small datasets.
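The two partitioning modes mentioned above (by number of markers and by position) can be pictured with a short Python sketch. This is not Reveel's code or its command-line interface: the function names, the fixed-width window definition of "by position", and the example coordinates are all assumptions made for illustration.

```python
def partition_by_marker_count(positions, markers_per_chunk):
    """Split sorted marker positions into consecutive chunks that each
    hold at most `markers_per_chunk` markers (hypothetical helper)."""
    if markers_per_chunk <= 0:
        raise ValueError("markers_per_chunk must be positive")
    return [
        positions[i:i + markers_per_chunk]
        for i in range(0, len(positions), markers_per_chunk)
    ]


def partition_by_position(positions, window_bp):
    """Group sorted 1-based marker positions into fixed-width windows of
    `window_bp` base pairs (hypothetical helper).

    Returns a dict that maps each window's 1-based start coordinate to
    the markers falling inside it; empty windows are omitted.
    """
    if window_bp <= 0:
        raise ValueError("window_bp must be positive")
    windows = {}
    for pos in positions:
        # Window index for a 1-based coordinate, then its start coordinate.
        start = ((pos - 1) // window_bp) * window_bp + 1
        windows.setdefault(start, []).append(pos)
    return windows


# Made-up marker coordinates on a single chromosome.
markers = [100, 250, 1200, 1800, 5000]
by_count = partition_by_marker_count(markers, 2)
by_position = partition_by_position(markers, 1000)
```

Partitioning by marker count gives chunks with a similar amount of work, whereas partitioning by position gives chunks with predictable genomic boundaries but uneven marker counts. In both cases each chunk could be handed to a separate job.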