Manual of the latest release

Please contact Lin Huang <linhuang@cs.stanford.edu> for questions or comments

Contents

  • I/O format

  • Step 1: Indexing

  • Step 2: Calling genotypes

  • Step 3: Finalizing genotypes

  • Sample usages

  • Notes

    I/O format

    Reveel requires a mandatory input file qry and accepts an optional input file ref. Each file can be in any of the following formats: compressed BCF, uncompressed BCF, compressed VCF, or uncompressed VCF. The qry file contains the genotype likelihoods of the query genomes in the PL field. The ref file contains the genotypes of the reference genomes in the GT field. The input file(s) should be indexed using tabix.
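
    As a reminder of what the PL field encodes, the following sketch converts Phred-scaled likelihoods to normalized linear-scale values, using the VCF convention that a PL value p corresponds to a likelihood of 10^(-p/10). The function name pl_to_likelihoods is hypothetical and is not part of Reveel; how Reveel processes these values internally is not described in this manual.

```python
def pl_to_likelihoods(pl):
    """Convert Phred-scaled genotype likelihoods (PL) to normalized likelihoods.

    Illustrative helper only (hypothetical name, not part of Reveel).
    """
    # A PL value p corresponds to a relative likelihood of 10^(-p/10).
    raw = [10 ** (-p / 10.0) for p in pl]
    total = sum(raw)
    # Normalize so that the values sum to 1.
    return [r / total for r in raw]


# Example: PL = 0,30,60 for the genotypes 0/0, 0/1 and 1/1 at a biallelic site.
probs = pl_to_likelihoods([0, 30, 60])
print(probs)
```

    A PL of 0 marks the most likely genotype; each additional 10 units lowers the relative likelihood by a factor of 10.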

    Reveel produces a BCF file containing the inferred genotypes of the query genomes. Compared to our previous release, we made the following improvements: (1) the REF field of the output BCF matches the REF field of the qry file; (2) the sites that are invariant across the query cohort are reported in the output BCF file.

    Indexing

    This step partitions a chromosome into a series of non-overlapping segments.

    Usage

    The command line is

    reveel_caller index [options] <qry>

    Arguments

    Argument name   Type     Details
    qry             string   the genotype likelihoods of query genomes in VCF/BCF format

    Output

    <qry>.blk      the start and end positions of the genomic segments (for internal use only)

    Options of command  reveel_caller index

    Option name   Type   Details
    -m            flag   chunk a chromosome by the number of markers [default: chunk by positions]
    -u            int    maximum number of markers per segment, valid if -m is set [default: 10000]
    -b            int    segment length in bps, valid if -m is *not* set [default: 1000000]
    -d            int    SNP desert length in bps [default: 200000]
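
    The difference between the two partitioning modes can be illustrated with a simplified sketch. This is not Reveel's implementation: the function names are hypothetical, the SNP desert option (-d) is ignored, and the exact segment boundaries written to <qry>.blk may differ. Marker positions are assumed to be sorted.

```python
def chunk_by_markers(positions, max_markers=10000):
    """Group sorted marker positions into runs of at most max_markers (cf. -m -u)."""
    segments = []
    for i in range(0, len(positions), max_markers):
        chunk = positions[i:i + max_markers]
        # Report each segment by its first and last marker position.
        segments.append((chunk[0], chunk[-1]))
    return segments


def chunk_by_position(positions, segment_bp=1000000):
    """Group sorted marker positions into fixed-width windows of segment_bp (cf. -b)."""
    windows = {}
    for pos in positions:
        # 1-based positions: window 0 covers 1..segment_bp, window 1 the next block, etc.
        windows.setdefault((pos - 1) // segment_bp, []).append(pos)
    return [(hits[0], hits[-1]) for _, hits in sorted(windows.items())]


# Toy example with six markers.
toy = [100, 250, 900, 1500, 1600, 2400]
print(chunk_by_markers(toy, max_markers=4))
print(chunk_by_position(toy, segment_bp=1000))
```

    Chunking by markers keeps the number of sites per segment bounded, whereas chunking by position keeps the physical length bounded; which one suits a data set better depends on how evenly its markers are distributed.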

    Calling genotypes

    This step infers the genotypes of the query genomes from genotype likelihoods, incorporating the reference panel, if one is provided, into the genotyping process.

    Usage

    With a reference panel, the command line is

    reveel_caller shortlist [options] <ref> <qry> <output>

    Without a reference panel, the command line is

    reveel_caller shortlist -n [options] <qry> <qry> <output>

    Arguments

    Argument name   Type     Details
    ref             string   the genotypes of reference genomes in VCF/BCF format
    qry             string   the genotype likelihoods of query genomes in VCF/BCF format
    output          string   the prefix of output filenames

    Output

    <output>.iter<i>             the inferred genotypes of the query genomes in an internal format
    <output>.iter<i>.pr          an input file for Beagle (Beagle mode only)
    <output>.iter<i>.marker      an input file for Beagle (Beagle mode only)

    Options of command  reveel_caller shortlist

    Option name   Type     Details
    -n            flag     no reference panel [default: using a reference panel]
    -r            string   restrict the computation to a region given as chr:beg-end
    -M            int      mode, 0: regular; 2: with Beagle [default: 0]. This release supports Beagle 3.3.2.
    -c            int      minimum number of SNPs processed in a batch [default: 5000]
    -t            float    candidate threshold [default: 0.5]. This threshold separates likely polymorphic sites from constant sites. If the data set includes thousands of samples, we recommend setting this parameter to 1. A higher threshold gives better precision but lower recall in SNP calling.
    -p            float    per-base error rate [default: 0.001]. A rough estimate is sufficient, because our experiments show that the tool is not sensitive to errors in this estimate. We used 0.001 in our experiments, and users can start with this value.
    -i            int      number of refinement iterations [default: 10]
    -m            flag     special mode for memory usage estimation [default: genotype inference]. If this flag is set, Reveel allocates and frees memory as usual without doing the computation, which makes it possible to estimate memory usage efficiently.
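
    When the shortlist step is driven from a script, the argument list can be assembled programmatically. The sketch below is a minimal example that uses only the options documented above; the helper name shortlist_cmd and its keyword arguments are hypothetical. It follows the usage lines in passing <qry> twice when no reference panel is given.

```python
def shortlist_cmd(qry, output, ref=None, region=None, mode=0,
                  iterations=None, memory_estimate=False):
    """Build the argument list for `reveel_caller shortlist` (hypothetical helper)."""
    cmd = ["reveel_caller", "shortlist"]
    if memory_estimate:
        cmd.append("-m")          # memory estimation only, no genotype inference
    if ref is None:
        cmd.append("-n")          # no reference panel
    cmd += ["-M", str(mode)]      # 0: regular, 2: with Beagle
    if region is not None:
        cmd += ["-r", region]     # region given as chr:beg-end
    if iterations is not None:
        cmd += ["-i", str(iterations)]
    # Without a reference panel, the usage line passes <qry> in place of <ref>.
    cmd += [ref if ref is not None else qry, qry, output]
    return cmd


# With and without a reference panel, on the region used in the samples.
with_ref = shortlist_cmd("qry", "filename", ref="ref", region="20:43000000-48000000")
without_ref = shortlist_cmd("qry", "filename", region="20:43000000-48000000")
print(" ".join(with_ref))
print(" ".join(without_ref))
```

    The returned list can be passed to subprocess.run without going through a shell, which avoids quoting problems with file names.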

    Finalizing genotypes

    If Beagle is used, this step merges the outputs of Reveel and Beagle. Otherwise, this step converts the output of reveel_caller shortlist to VCF format.

    Usage

    The command line is

    reveel_caller merge <qry> <output>.iter<i> <output> [beagle_output.dose.gz]

    Arguments

    Argument name           Type     Details
    qry                     string   the genotype likelihoods of query genomes in VCF format
    <output>.iter<i>        string   the output of reveel_caller shortlist
    <output>                string   the prefix of output filenames. This should be identical to the <output> argument of reveel_caller shortlist
    beagle_output.dose.gz   string   dose.gz file(s) produced by Beagle. This argument is required only if Beagle is used.

    Output

    <output>.wref.bcf      genotypes of query genomes in BCF format
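
    The file names that connect the shortlist and merge steps can be derived from the output prefix. The following sketch assumes that <i> is the number of the final refinement iteration (10 by default, as in the sample usages); the helper names merge_cmd and merged_bcf are hypothetical.

```python
def merge_cmd(qry, output, iteration=10, beagle_dose=()):
    """Build the argument list for `reveel_caller merge` (hypothetical helper).

    Assumes <i> in <output>.iter<i> is the final refinement iteration.
    """
    iter_file = f"{output}.iter{iteration}"
    # The dose.gz file(s) are appended only when Beagle was used.
    return ["reveel_caller", "merge", qry, iter_file, output, *beagle_dose]


def merged_bcf(output):
    """Name of the BCF file written by the merge step."""
    return f"{output}.wref.bcf"


regular = merge_cmd("qry", "filename")
with_beagle = merge_cmd("qry", "filename",
                        beagle_dose=["prob.filename.20.iter10.pr.dose.gz"])
print(" ".join(regular))
print(" ".join(with_beagle))
print(merged_bcf("filename"))
```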

    Sample usages

  • Sample 1: regular mode; with a reference panel; partition by the number of markers, no more than 12000 markers per segment

    reveel_caller index -m -u 12000 qry
    reveel_caller shortlist -M 0 -r 20:43000000-48000000 ref qry filename
    reveel_caller merge qry filename.iter10 filename

  • Sample 2: regular mode; without a reference panel; partition by position, no longer than 500000 bps per segment

    reveel_caller index -b 500000 qry
    reveel_caller shortlist -n -M 0 -r 20:43000000-48000000 qry qry filename
    reveel_caller merge qry filename.iter10 filename

  • Sample 3: Beagle mode; without a reference panel; partition by position, no longer than 1000000 bps per segment

    reveel_caller index qry
    reveel_caller shortlist -n -M 2 -r 20:43000000-48000000 qry qry filename
    java -jar beagle.jar like=filename.20.iter10.pr markers=filename.20.iter10.marker out=prob
    reveel_caller merge qry filename.iter10 filename prob.filename.20.iter10.pr.dose.gz

  • Sample 4: estimate the memory usage of reveel_caller shortlist

    reveel_caller shortlist -m -n -M 2 -r 20:43000000-48000000 qry qry filename

  • Sample 5: run Reveel in parallel

    Start with index on the whole chromosome.

    Apply the shortlist + Beagle (if applicable) + merge pipeline to N non-overlapping regions, one region per thread.

    Concatenate the output BCF files using BCFtools.
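
    The parallel workflow above can be sketched as follows. The helper split_region divides an interval into N non-overlapping chr:beg-end strings for the -r option, and concat_cmd builds a bcftools concat call that writes compressed BCF. Both helper names and the per-region output prefixes are hypothetical. The sketch treats beg and end as inclusive coordinates and splits the interval evenly; whether region boundaries should coincide with the segments produced by index is not stated in this manual.

```python
def split_region(chrom, beg, end, n):
    """Split the inclusive interval [beg, end] into n non-overlapping regions."""
    length = end - beg + 1
    if n < 1 or n > length:
        raise ValueError("n must be between 1 and the interval length")
    regions = []
    start = beg
    for k in range(n):
        # Spread the remainder over the first (length % n) regions.
        size = length // n + (1 if k < length % n else 0)
        stop = start + size - 1
        regions.append(f"{chrom}:{start}-{stop}")
        start = stop + 1
    return regions


def concat_cmd(bcf_files, out_bcf):
    """Build a `bcftools concat` call that writes compressed BCF (-O b)."""
    return ["bcftools", "concat", "-O", "b", "-o", out_bcf, *bcf_files]


# Four regions, each with its own (hypothetical) output prefix part<k>.
regions = split_region("20", 43000000, 48000000, 4)
outputs = [f"part{k}.wref.bcf" for k in range(len(regions))]
print(regions)
print(" ".join(concat_cmd(outputs, "merged.bcf")))
```

    Each region string can then be passed to its own shortlist and merge run with a distinct output prefix, so that the per-region files do not overwrite one another.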

    Notes

    The released executable was compiled on 64-bit Ubuntu 12.04.4.

    This release supports Beagle 3.3.2.

    The memory usage and computation overhead of index and merge are much lower than those of shortlist.
