nf-core/eager
A fully reproducible and state-of-the-art ancient DNA analysis pipeline
Version 2.3.2. The latest stable release is 2.5.3.
Define where the pipeline should find input data, and additional metadata.

Either paths or URLs to FASTQ/BAM data (must be surrounded with quotes). For paired-end data, the path must use ‘{1,2}’ notation to specify read pairs. Alternatively, a path to a TSV file (ending .tsv) containing file paths and sequencing/sample metadata. Allows for merging of multiple lanes/libraries/samples. Please see documentation for template.
Type: string; default: null

Specifies whether you have UDG-treated libraries. Set to ‘half’ for partial treatment, or ‘full’ for full UDG treatment. If not set, libraries are assumed to have no UDG treatment (‘none’). Not required for TSV input.
Type: string

Specifies that libraries are single stranded. Always affects MALTExtract but will be ignored by pileupCaller with TSV input. Not required for TSV input.
Type: boolean

Specifies that the input is single-end reads. Not required for TSV input.
Type: boolean

Specifies which Illumina sequencing chemistry was used. Used to inform whether to poly-G trim if turned on (see below). Not required for TSV input. Options: 2, 4.
Type: integer; default: 4

Specifies that the input is in BAM format. Not required for TSV input.
Type: boolean

Additional options regarding input data.
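The ‘{1,2}’ notation required for paired-end input paths can be illustrated with a minimal Python sketch. It only mimics the expansion of a single brace group into one path per read file. The helper name expand_read_pairs and the example path are hypothetical, and this is not how the pipeline itself resolves its input.

```python
import re
from typing import List


def expand_read_pairs(pattern: str) -> List[str]:
    """Expand the first '{a,b}' group in a path into one path per alternative.

    Illustrative only: a path without a brace group is returned unchanged.
    """
    match = re.search(r"\{([^{}]+)\}", pattern)
    if match is None:
        return [pattern]
    head = pattern[:match.start()]
    tail = pattern[match.end():]
    return [head + alternative + tail for alternative in match.group(1).split(",")]


# Hypothetical paired-end input path using the '{1,2}' notation.
for path in expand_read_pairs("data/sample_R{1,2}.fastq.gz"):
    print(path)
```

A path written this way stands for both the forward and the reverse read file, which is why it has to be quoted on the command line: the quotes stop the shell from expanding the braces itself.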
If the library is the result of SNP capture, path to a BED file containing SNP positions on the reference genome.
Type: string

Turns on conversion of an input BAM file into FASTQ format to allow re-preprocessing (e.g. AdapterRemoval etc.).
Type: boolean

Specify locations of references and, optionally, additional pre-made indices.
Path or URL to a FASTA reference file (required if not iGenome reference). File suffixes can be: ‘.fa’, ‘.fn’, ‘.fna’, ‘.fasta’.
Type: string

Name of iGenomes reference (required if not FASTA reference). Requires argument --igenomes_ignore false, as iGenomes is ignored by default in nf-core/eager.
Type: string

Directory / URL base for iGenomes references.
Type: string; default: s3://ngi-igenomes/igenomes/

Do not load the iGenomes reference config.
Type: boolean

Path to directory containing pre-made BWA indices (i.e. everything before the endings ‘.amb’, ‘.ann’, ‘.bwt’; most likely the same path as --fasta). If not supplied, these will be made for you.
Type: string

Path to directory containing pre-made Bowtie2 indices (i.e. everything before the endings, e.g. ‘.1.bt2’, ‘.2.bt2’, ‘.rev.1.bt2’; most likely the same value as --fasta). If not supplied, these will be made for you.
Type: string

Path to samtools FASTA index (typically ending in ‘.fai’). If not supplied, it will be made for you.
Type: string

Path to picard sequence dictionary file (typically ending in ‘.dict’). If not supplied, it will be made for you.
Type: string

Specify to generate more recent ‘.csi’ BAM indices. If your reference genome is larger than 3.5GB, this is recommended due to more efficient data handling with the ‘.csi’ format over the older ‘.bai’.
Type: boolean

If not already supplied by user, turns on saving of generated reference genome indices for later re-usage.
Type: boolean

Specify where to put output files and optional saving of intermediate files.
The output directory where the results will be saved.
Type: string; default: ./results

Method used to save pipeline results to output directory.
Type: string

Less common options for the pipeline, typically set in a config file.
Display help text.
Type: boolean

Email address for completion summary.
Type: string; pattern: ^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$

Email address for completion summary, only when pipeline fails.
Type: string; pattern: ^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$

Send plain-text email instead of HTML.
Type: boolean

File size limit when attaching MultiQC reports to summary emails.
Type: string; default: 25.MB

Do not use coloured log outputs.
Type: boolean

Custom config file to supply to MultiQC.
Type: string

Directory to keep pipeline Nextflow logs and reports.
Type: string; default: ${params.outdir}/pipeline_info

Show all params when using --help.
Type: boolean

Parameter used to check that conda channels are set correctly.
Type: boolean

Whether to validate parameters against the schema at runtime.
Type: boolean; default: true

String to specify ignored parameters for parameter validation.
Type: string; default: genomes

String to describe the config profile that is run.
Type: string

Set the top limit for requested resources for any single job.
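The pattern listed for the two email address options can be tried out directly. The following is a minimal Python sketch that applies the pattern exactly as given above. The names EMAIL_PATTERN and is_valid_email are hypothetical, and it is an assumption that Python's re module interprets the pattern the same way as the pipeline's own schema validation.

```python
import re

# Pattern copied from the email options above.
EMAIL_PATTERN = re.compile(
    r"^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$"
)


def is_valid_email(address: str) -> bool:
    """Return True if the address matches the documented pattern."""
    return EMAIL_PATTERN.match(address) is not None


# Example addresses are made up for illustration.
for candidate in ("user@example.com", "user@example", "user@example.museum"):
    print(candidate, is_valid_email(candidate))
```

Note that the final group only allows two to five letters, so an address whose top-level domain is longer than five characters does not match this pattern.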
Maximum number of CPUs that can be requested for any single job.
Type: integer; default: 16

Maximum amount of memory that can be requested for any single job.
Type: string; default: 128.GB

Maximum amount of time that can be requested for any single job.
Type: string; default: 240.h

Parameters used to describe centralised config profiles. These generally should not be edited.
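The effect of the per-job resource limits (16 CPUs and 128.GB of memory by default) can be sketched as a simple capping rule. This is an illustrative Python sketch, not the pipeline's own logic: the function names are hypothetical, and the binary (1024-based) unit multipliers are an assumption made for the example.

```python
# Assumption: units are binary multiples (1 KB = 1024 B, and so on).
UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}


def parse_memory(text: str) -> int:
    """Convert a string such as '128.GB' into a number of bytes."""
    value, unit = text.split(".")
    return int(value) * UNITS[unit.upper()]


def cap_memory(requested: str, limit: str = "128.GB") -> str:
    """Return the request unchanged, or the limit if the request exceeds it."""
    if parse_memory(requested) > parse_memory(limit):
        return limit
    return requested


def cap_cpus(requested: int, limit: int = 16) -> int:
    """Return the smaller of the requested and the maximum CPU count."""
    return min(requested, limit)


print(cap_memory("200.GB"))  # request above the limit
print(cap_memory("64.GB"))   # request below the limit
print(cap_cpus(32))
```

Under this reading, a job never receives more than the configured maximum, however much it requests.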
Git commit id for Institutional configs.
Type: string; default: master

Base directory for Institutional configs.
Type: string; default: https://raw.githubusercontent.com/nf-core/configs/master

Institutional configs hostname.
Type: string

Institutional config description.
Type: string

Institutional config contact information.
Type: string

Institutional config URL link.
Type: string

The AWSBatch JobQueue that needs to be set when running on AWSBatch.
Type: string

The AWS Region for your AWS Batch job to run on.
Type: string; default: eu-west-1

Path to the AWS CLI tool.
Type: string

Skip any of the mentioned steps.
Six options, each of type boolean.

Processing of Illumina two-colour chemistry data.
Turn on running poly-G removal on FASTQ files. Will only be performed on libraries sequenced on 2-colour chemistry machines.
Type: boolean

Specify the minimum poly-G length for clipping to be performed.
Type: integer; default: 10

Options for adapter clipping and paired-end merging.
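The role of the minimum poly-G length (default 10) can be shown with a deliberately simplified Python sketch: a trailing run of G bases is removed only when it is at least that long. The function name trim_poly_g is hypothetical, and real trimming tools typically tolerate mismatches within the run, which this sketch does not.

```python
def trim_poly_g(sequence: str, min_length: int = 10) -> str:
    """Remove a trailing run of 'G' bases if it is at least min_length long.

    Simplified illustration: the run must consist of uninterrupted 'G's.
    """
    stripped = sequence.rstrip("G")
    run_length = len(sequence) - len(stripped)
    return stripped if run_length >= min_length else sequence


# Made-up reads: one with a 12-base poly-G tail, one with a 3-base tail.
print(trim_poly_g("ACGTACGT" + "G" * 12))
print(trim_poly_g("ACGTACGT" + "G" * 3))
```

Short G runs are left alone because they are plausible genuine sequence, whereas long tails are characteristic of two-colour chemistry reads that ran out of signal.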
Specify adapter sequence to be clipped off (forward strand).
Type: string; default: AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC

Specify adapter sequence to be clipped off (reverse strand).
Type: string; default: AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA

Specify read minimum length to be kept for downstream analysis.
Type: integer; default: 30

Specify minimum base quality for trimming off bases.
Type: integer; default: 20

Specify minimum adapter overlap required for clipping.
Type: integer; default: 1

Skip merging of forward and reverse reads, and turn on paired-end alignment for downstream mapping. Only applicable for paired-end libraries.
Type: boolean

Skip adapter and quality trimming.
Type: boolean

Skip quality base trimming (n, score, window) of the 5 prime end.
Type: boolean

Only use merged reads downstream (un-merged reads and singletons are discarded).
Type: boolean

Specify the maximum Phred score used in input FASTQ files.
Type: integer; default: 41

Options for reference-genome mapping.
Specify which mapper to use. Options: ‘bwaaln’, ‘bwamem’, ‘circularmapper’, ‘bowtie2’.
Type: string

Specify the -n parameter for BWA aln, i.e. number of allowed mismatches in the alignment.
Type: number; default: 0.04

Specify the -k parameter for BWA aln, i.e. maximum edit distance allowed in a seed.
Type: integer; default: 2

Specify the -l parameter for BWA aln, i.e. the length of seeds to be used.
Type: integer; default: 1024

Specify the number of bases to extend reference by (circularmapper only).
Type: integer; default: 500

Specify the FASTA header of the target chromosome to extend (circularmapper only).
Type: string; default: MT

Turn on to remove reads that did not map to the circularised genome (circularmapper only).
Type: boolean

Specify the bowtie2 alignment mode. Options: ‘local’, ‘end-to-end’.
Type: string

Specify the level of sensitivity for the bowtie2 alignment mode. Options: ‘no-preset’, ‘very-fast’, ‘fast’, ‘sensitive’, ‘very-sensitive’.
Type: string

Specify the -N parameter for bowtie2 (mismatches in seed). This will override defaults from alignmode/sensitivity.
Type: integer

Specify the -L parameter for bowtie2 (length of seed substrings). This will override defaults from alignmode/sensitivity.
Type: integer

Specify number of bases to trim off from 5’ (left) end of read before alignment.
Type: integer

Specify number of bases to trim off from 3’ (right) end of read before alignment.
Type: integer

Options for production of host-read removed FASTQ files for privacy reasons.
Turn on per-library creation of pre-AdapterRemoval FASTQ files without reads that mapped to the reference (e.g. for public upload of privacy-sensitive non-host data).
Type: boolean

Host removal mode. Remove mapped reads completely from FASTQ (remove) or just mask the sequence of mapped reads with N (replace).
Type: string

Options for quality filtering and how to deal with off-target unmapped reads.
Turn on filtering of mapping quality, read lengths, or unmapped reads of BAM files.
Type: boolean

Minimum mapping quality for reads filter.
Type: integer

Specify minimum read length to be kept after mapping.
Type: integer

Defines whether to discard all unmapped reads, keep them only in BAM format, keep them only in FASTQ format, or keep them in both formats. Options: ‘discard’, ‘bam’, ‘fastq’, ‘both’.
Type: string

Options for removal of PCR amplicon duplicates that can artificially inflate coverage.
Deduplication method to use. Options: ‘markduplicates’, ‘dedup’.
Type: string

Turn on treating all reads as merged reads.
Type: boolean

Options for calculating library complexity (i.e. how many unique reads are present).
Specify the step size of Preseq.
Type: integer; default: 1000

Options for calculating and filtering for characteristic ancient DNA damage patterns.
Specify length filter for DamageProfiler.
Type: integer; default: 100

Specify number of bases of each read to consider for DamageProfiler calculations.
Type: integer; default: 15

Specify the maximum misincorporation frequency that should be displayed on damage plot. Set to 0 to ‘autoscale’.
Type: number; default: 0.3

Turn on PMDtools.
Type: boolean

Specify range of bases for PMDTools to scan for damage.
Type: integer; default: 10

Specify PMDScore threshold for PMDTools.
Type: integer; default: 3

Specify a path to reference mask for PMDTools.
Type: string

Specify the maximum number of reads to consider for metrics generation.
Type: integer; default: 10000

Turn on damage rescaling of BAM files using mapDamage2 to probabilistically remove damage.
Type: boolean

Length of read for mapDamage2 to rescale from 5p end.
Type: integer; default: 12

Length of read for mapDamage2 to rescale from 3p end.
Type: integer; default: 12

Options for getting reference annotation statistics (e.g. gene coverages).
Turn on ability to calculate number of reads, depth and breadth coverage of features in reference.
Type: boolean

Path to GFF or BED file containing positions of features in reference file (--fasta). Path should be enclosed in quotes.
Type: string

Options for trimming of aligned reads (e.g. to remove damage prior to genotyping).
Turn on BAM trimming. Will only run on non-UDG or half-UDG libraries.
Type: boolean

Specify the number of bases to clip off reads from ‘left’ end of read for half-UDG libraries.
Type: integer; default: 1

Specify the number of bases to clip off reads from ‘right’ end of read for half-UDG libraries.
Type: integer; default: 1

Specify the number of bases to clip off reads from ‘left’ end of read for non-UDG libraries.
Type: integer; default: 1

Specify the number of bases to clip off reads from ‘right’ end of read for non-UDG libraries.
Type: integer; default: 1

Turn on using softclip instead of hard masking.
Type: boolean

Options for variant calling.
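The BAM trimming options can be read as a small lookup: half-UDG and non-UDG libraries each have their own left and right clip lengths (all 1 by default), and full-UDG libraries are not trimmed. The Python sketch below illustrates hard masking of read ends under that reading. The names CLIP_LENGTHS and mask_read_ends are hypothetical, and representing masking as replacement with N is a simplifying assumption.

```python
# (left, right) clip lengths per UDG treatment, using the defaults listed above.
CLIP_LENGTHS = {"half": (1, 1), "none": (1, 1)}


def mask_read_ends(sequence: str, udg_type: str) -> str:
    """Mask the ends of a read with 'N' for non-UDG and half-UDG libraries.

    Reads from other library types (e.g. 'full') are returned unchanged.
    """
    if udg_type not in CLIP_LENGTHS:
        return sequence
    left, right = CLIP_LENGTHS[udg_type]
    kept = sequence[left:len(sequence) - right]
    return "N" * left + kept + "N" * right


# Made-up read sequence for illustration.
print(mask_read_ends("ACGTACGT", "half"))
print(mask_read_ends("ACGTACGT", "full"))
```

Trimming targets the read ends because that is where ancient DNA damage is concentrated, so removing a few terminal bases reduces false variant calls in genotyping.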
Turn on genotyping of BAM files.
Type: boolean

Specify which genotyper to use: GATK UnifiedGenotyper, GATK HaplotypeCaller, FreeBayes, pileupCaller, or ANGSD. Options: ‘ug’, ‘hc’, ‘freebayes’, ‘pileupcaller’, ‘angsd’.
Type: string

Specify which input BAM to use for genotyping. Options: ‘raw’, ‘trimmed’, ‘pmd’ or ‘rescaled’.
Type: string

Specify GATK phred-scaled confidence threshold.
Type: integer; default: 30

Specify GATK organism ploidy.
Type: integer; default: 2

Maximum depth coverage allowed for genotyping before down-sampling is turned on.
Type: integer; default: 250

Specify VCF file for SNP annotation of output VCF files. Optional. Gzip not accepted.
Type: string

Specify GATK HaplotypeCaller output mode. Options: ‘EMIT_VARIANTS_ONLY’, ‘EMIT_ALL_CONFIDENT_SITES’, ‘EMIT_ALL_ACTIVE_SITES’.
Type: string

Specify HaplotypeCaller mode for emitting reference confidence calls. Options: ‘NONE’, ‘BP_RESOLUTION’, ‘GVCF’.
Type: string

Specify GATK UnifiedGenotyper output mode. Options: ‘EMIT_VARIANTS_ONLY’, ‘EMIT_ALL_CONFIDENT_SITES’, ‘EMIT_ALL_SITES’.
Type: string

Specify UnifiedGenotyper likelihood model. Options: ‘SNP’, ‘INDEL’, ‘BOTH’, ‘GENERALPLOIDYSNP’, ‘GENERALPLOIDYINDEL’.
Type: string

Specify to keep the BAM output of re-alignment around variants from GATK UnifiedGenotyper.
Type: boolean

Supply a default base quality if a read is missing a base quality score. Setting to -1 turns this off.
Type: string

Specify minimum required supporting observations to consider a variant.
Type: integer; default: 1

Specify to skip over regions of high depth by discarding alignments overlapping positions where total read depth is greater than specified in --freebayes_C.
Type: integer

Specify ploidy of sample in FreeBayes.
Type: integer; default: 2

Specify path to SNP panel in BED format for pileupCaller.
Type: string

Specify path to SNP panel in EIGENSTRAT format for pileupCaller.
Type: string

Specify calling method to use. Options: ‘randomHaploid’, ‘randomDiploid’, ‘majorityCall’.
Type: string

Specify the calling mode for transitions. Options: ‘AllSites’, ‘TransitionsMissing’, ‘SkipTransitions’.
Type: string

Specify which ANGSD genotyping likelihood model to use. Options: ‘samtools’, ‘gatk’, ‘soapsnp’, ‘syk’.
Type: string

Specify the output format for ANGSD genotyping likelihood results. Options: ‘text’, ‘binary’, ‘binary_three’, ‘beagle’.
Type: string

Turn on creation of FASTA from ANGSD genotyping likelihood.
Type: boolean

Specify which type of ‘base calling’ to use for ANGSD FASTA generation. Options: ‘random’, ‘common’.
Type: string

Options for creation of a per-sample FASTA sequence useful for downstream analysis (e.g. multi sequence alignment).
Turns on ability to create a consensus sequence FASTA file based on a UnifiedGenotyper VCF file and the original reference (only considers SNPs).
Type: boolean

Specify name of the output FASTA file containing the consensus sequence. Do not include .vcf in the file name.
Type: string

Specify the header name of the consensus sequence entry within the FASTA file.
Type: string

Minimum depth coverage required for a call to be included (else N will be called).
Type: integer; default: 5

Minimum genotyping quality of a call to be called. Else N will be called.
Type: integer; default: 30

Minimum fraction of reads supporting a call to be included. Else N will be called.
Type: number; default: 0.8

Options for creation of a SNP table useful for downstream analysis (e.g. estimation of cross-mapping of different species and multi-sequence alignment).
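The three consensus thresholds (minimum depth 5, minimum genotyping quality 30, minimum supporting fraction 0.8) combine into a single rule: a base is kept only if it passes all three, otherwise N is called. The following Python sketch expresses that rule. The function name consensus_base is hypothetical, and treating each threshold as inclusive is an assumption.

```python
def consensus_base(
    base: str,
    depth: int,
    quality: float,
    fraction: float,
    min_depth: int = 5,
    min_quality: float = 30,
    min_fraction: float = 0.8,
) -> str:
    """Return the called base if it passes every threshold, otherwise 'N'."""
    passes = (
        depth >= min_depth
        and quality >= min_quality
        and fraction >= min_fraction
    )
    return base if passes else "N"


# Made-up positions: (base, depth, genotype quality, supporting fraction).
positions = [("A", 12, 45, 0.95), ("C", 3, 45, 0.95), ("G", 12, 45, 0.60)]
print("".join(consensus_base(*position) for position in positions))
```

Raising any of the thresholds makes the consensus more conservative at the cost of more N positions, which matters for low-coverage ancient DNA libraries.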
Turn on MultiVCFAnalyzer. Note: This currently only supports diploid GATK UnifiedGenotyper input.
Type: boolean

Turn on writing of allele frequencies in the SNP table.
Type: boolean

Specify the minimum genotyping quality threshold for a SNP to be called.
Type: integer; default: 30

Specify the minimum number of reads by which a position needs to be covered to be considered for base calling.
Type: integer; default: 5

Specify the minimum allele frequency that a base requires to be considered a ‘homozygous’ call.
Type: number; default: 0.9

Specify the minimum allele frequency that a base requires to be considered a ‘heterozygous’ call.
Type: number; default: 0.9

Specify paths to additional pre-made VCF files to be included in the SNP table generation. Use wildcard(s) for multiple files.
Type: string

Specify path to the reference genome annotations in ‘.gff’ format. Optional.
Type: string; default: NA

Specify path to the positions to be excluded in ‘.gff’ format. Optional.
Type: string; default: NA

Specify path to the output file from SNP effect analysis in ‘.txt’ format. Optional.
Type: string; default: NA

Options for the calculation of ratio of reads to one chromosome/FASTA entry against all others.
Turn on mitochondrial to nuclear ratio calculation.
Type: boolean

Specify the name of the reference FASTA entry corresponding to the mitochondrial genome (up to the first space).
Type: string; default: MT

Options for the calculation of biological sex of human individuals.
Turn on sex determination for human reference genomes. This will run on single- and double-stranded variants of a library separately.
Type: boolean

Specify path to SNP panel in BED format for error bar calculation. Optional (see documentation).
Type: string

Options for the estimation of contamination of human DNA.
Turn on nuclear contamination estimation for human reference genomes.
Type: boolean

The name of the X chromosome in your BAM/FASTA header. ‘X’ for hs37d5, ‘chrX’ for HG19.
Type: string; default: X

Options for metagenomic screening of off-target reads.
Turn on removal of low-sequence complexity reads for metagenomic screening with bbduk.
Type: boolean

Specify the entropy threshold below which a sequencing read will be filtered out as low complexity. This should be between 0 and 1.
Type: number; default: 0.3

Turn on metagenomic screening module for reference-unmapped reads.
Type: boolean

Specify which classifier to use. Options: ‘malt’, ‘kraken’.
Type: string

Specify path to classifier database directory. For Kraken2 this can also be a .tar.gz of the directory.
Type: string

Specify the minimum number of reads (of the sample total) that a taxon is required to have to be retained. Not compatible with --malt_min_support_mode ‘percent’.
Type: integer; default: 1

Percent identity value threshold for MALT.
Type: integer; default: 85

Specify which alignment mode to use for MALT. Options: ‘Unknown’, ‘BlastN’, ‘BlastP’, ‘BlastX’, ‘Classifier’.
Type: string

Specify alignment method for MALT. Options: ‘Local’, ‘SemiGlobal’.
Type: string

Specify the percent for LCA algorithm for MALT (see MEGAN6 CE manual).
Type: integer; default: 1

Specify whether to use percent or raw number of reads for minimum support required for taxon to be retained for MALT. Options: ‘percent’, ‘reads’.
Type: string

Specify the minimum percentage of reads (of the sample total) that a taxon is required to have to be retained for MALT.
Type: number; default: 0.01

Specify the maximum number of queries a read can have for MALT.
Type: integer; default: 100

Specify the memory load method. Do not use ‘map’ with GPFS file systems for MALT, as it can be very slow. Options: ‘load’, ‘page’, ‘map’.
Type: string

Specify to also produce SAM alignment files. Note that these include both aligned and unaligned reads and are gzipped, and that this will result in very large file sizes.
Type: boolean

Options for authentication of metagenomic screening performed by MALT.
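The two minimum-support options for metagenomic screening are alternatives: a taxon is retained either by raw read count (default 1) or by percentage of the sample total (default 0.01), depending on the support mode. A minimal Python sketch of that decision follows. The function name taxon_retained is hypothetical, and computing the percentage against all reads in the sample follows the wording above rather than any inspection of MALT itself.

```python
def taxon_retained(
    taxon_reads: int,
    total_reads: int,
    mode: str,
    min_percent: float = 0.01,
    min_reads: int = 1,
) -> bool:
    """Decide whether a taxon has enough support to be retained.

    mode is 'percent' (share of the sample total) or 'reads' (raw count).
    """
    if mode == "percent":
        if total_reads <= 0:
            return False
        return 100.0 * taxon_reads / total_reads >= min_percent
    if mode == "reads":
        return taxon_reads >= min_reads
    raise ValueError("mode must be 'percent' or 'reads'")


# Made-up counts: 5 reads for a taxon out of 100000 reads in the sample.
print(taxon_retained(5, 100000, "percent"))  # 0.005 percent of the total
print(taxon_retained(5, 100000, "reads"))
```

The same taxon can therefore pass in one mode and fail in the other, which is why the read-count option is described as incompatible with the ‘percent’ mode.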
Turn on MaltExtract for MALT aDNA characteristics authentication.
Type: boolean

Path to a text file with taxa of interest (one taxon per row, NCBI taxonomy name format).
Type: string

Path to directory containing NCBI resource files (ncbi.tre and ncbi.map; available at https://github.com/rhuebler/HOPS/).
Type: string

Specify which MaltExtract filter to use. Options: ‘def_anc’, ‘ancient’, ‘default’, ‘crawl’, ‘scan’, ‘srna’, ‘assignment’.
Type: string

Specify percent of top alignments to use.
Type: number; default: 0.01

Turn off destacking.
Type: boolean

Turn off downsampling.
Type: boolean

Turn off duplicate removal.
Type: boolean

Turn on exporting alignments of hits in BLAST format.
Type: boolean

Turn on export of MEGAN summary files.
Type: boolean

Minimum percent identity that alignments are required to have to be reported. Recommended to set to the same value as the MALT parameter.
Type: number; default: 85

Turn on using top alignments per read after filtering.
Type: boolean