Reveel: a large-scale population genotyper using low-coverage sequencing data

Please contact Lin Huang <linhuang@cs.stanford.edu> for questions or comments

What is Reveel?

Reveel is an ultrafast tool for single nucleotide variant calling and genotyping of large cohorts that have been sequenced at low coverage. Cohorts can range from tens to tens of thousands of individuals, or even more. Reveel leverages the underlying complex linkage disequilibrium (LD) structure with a simplified model that scales linearly with the number of individuals in a cohort for a given number of imputed SNPs, while producing highly accurate genotype calls for both high- and low-frequency SNPs.

Highlighted features

  • Ultrafast

  • High genotyping accuracy

Publication

The citation for Reveel is

Lin Huang, Bo Wang, Ruitang Chen, Sivan Bercovici, and Serafim Batzoglou, "Reveel: large-scale population genotyping using low-coverage sequencing data", Bioinformatics, vol. 32, no. 11, pp. 1686-1696, Jun. 2016.

Latest release

  • October 13, 2016: The second release of Reveel is now available here. The user manual is available here.

Previous release

Which release shall I use?

Short answer:

The latest release is recommended for all scenarios because it is more user-friendly, scalable, and stable.

Long answer:

The second release accepts four commonly used input formats: compressed BCF, uncompressed BCF, compressed VCF, and uncompressed VCF. Your input files can be in any of these formats. If you have BAM files of the query genomes, we recommend that you calculate genotype likelihoods using SAMtools. By contrast, the initial release takes BAM files as input.
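As an illustration of the four input formats listed above, here is a minimal Python sketch that tells them apart by inspecting the first bytes of a file. It is not part of Reveel; the function name `detect_variant_format` is hypothetical. It assumes that compressed files are gzip/BGZF streams, that BCF data begins with the bytes `BCF`, and that VCF text begins with `##fileformat=VCF`.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip/BGZF stream


def detect_variant_format(path: str) -> str:
    """Classify a file as compressed/uncompressed BCF or VCF.

    Hypothetical helper for illustration only; Reveel's own input
    handling may work differently.
    """
    with open(path, "rb") as handle:
        head = handle.read(32)

    compressed = head.startswith(GZIP_MAGIC)
    if compressed:
        # Look at the first bytes of the decompressed payload instead.
        with gzip.open(path, "rb") as handle:
            head = handle.read(32)

    if head.startswith(b"BCF"):
        kind = "BCF"
    elif head.startswith(b"##fileformat=VCF"):
        kind = "VCF"
    else:
        raise ValueError(f"{path}: not recognised as VCF or BCF")

    return ("compressed " if compressed else "uncompressed ") + kind
```

A check like this can be run before starting a long job, so that an unexpected file type is reported early.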

The second release produces the genotypes of the query genomes in BCF format. The REF field of the output BCF is consistent with the REF field of the qry.vcf file. Sites that are invariant across the query cohort are also reported in the output BCF file. These features are not supported in the initial release.

The second release allows users to partition the chromosomes in two ways: by the number of markers and by position. This feature makes it possible to run Reveel in parallel, one partition at a time.
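The two partitioning strategies can be sketched as follows. This is a conceptual Python illustration, not Reveel's implementation or command-line interface. The function names, the `markers_per_chunk` and `window_bp` parameters, and the choice to skip windows that contain no markers are all assumptions made for the example. Marker positions are assumed to be sorted.

```python
from bisect import bisect_left
from typing import List, Tuple

Region = Tuple[int, int]  # (start, end), both inclusive


def partition_by_marker_count(positions: List[int],
                              markers_per_chunk: int) -> List[Region]:
    """Split sorted marker positions into chunks of a fixed marker count."""
    if markers_per_chunk <= 0:
        raise ValueError("markers_per_chunk must be positive")
    regions = []
    for i in range(0, len(positions), markers_per_chunk):
        block = positions[i:i + markers_per_chunk]
        regions.append((block[0], block[-1]))
    return regions


def partition_by_position(positions: List[int],
                          window_bp: int) -> List[Region]:
    """Split sorted marker positions into fixed-width windows.

    Windows that contain no markers are skipped (an assumption of this sketch).
    """
    if window_bp <= 0:
        raise ValueError("window_bp must be positive")
    regions: List[Region] = []
    if not positions:
        return regions
    start, last = positions[0], positions[-1]
    while start <= last:
        end = start + window_bp - 1
        first_idx = bisect_left(positions, start)
        stop_idx = bisect_left(positions, end + 1)
        if stop_idx > first_idx:  # at least one marker falls in this window
            regions.append((start, min(end, last)))
        start = end + 1
    return regions
```

Partitioning by marker count gives each job a similar amount of work, while partitioning by position gives regions of predictable genomic width. In either case the resulting regions are independent and can be handed to separate processes.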

The second release has been tested on large datasets with up to 3,910 whole genomes.

The second release can be used to build a pipeline that can infer the genotypes of very small datasets.