nf-core/pangenome
Renders a collection of sequences into a pangenome graph. https://doi.org/10.1093/bioinformatics/btae609.
1.1.2). The latest stable release is 1.1.3.
Define where the pipeline should find input data and save output data.
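The input FASTA entry in this group is constrained by the pattern ^\S+\.fn?a(sta)?(\.gz)?$. As a minimal sketch, the check can be reproduced in Python before launching a run. The function name and the example file names are hypothetical, and the pattern only inspects the file name; it cannot tell whether the file is really BGZIP-compressed.

```python
import re

# Pattern copied from the input FASTA entry of this section.
FASTA_PATTERN = re.compile(r"^\S+\.fn?a(sta)?(\.gz)?$")


def is_valid_fasta_path(path: str) -> bool:
    """Return True if `path` matches the documented input pattern.

    Hypothetical helper: it checks the name only, not the file contents.
    """
    return FASTA_PATTERN.match(path) is not None


if __name__ == "__main__":
    # Example file names are made up for illustration.
    for candidate in ["cohort.fa.gz", "cohort.fasta", "cohort.fastq.gz", "my cohort.fa.gz"]:
        print(candidate, is_valid_fasta_path(candidate))
```

Paths containing whitespace are rejected because the pattern starts with \S+.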
Path to BGZIPPED input FASTA to build the pangenome graph from.
  Type: string; pattern: ^\S+\.fn?a(sta)?(\.gz)?$
The number of haplotypes in the input FASTA.
  Type: number
The output directory where the results will be saved. You have to use absolute paths to storage on Cloud infrastructure.
  Type: string
Email address for completion summary.
  Type: string; pattern: ^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$
MultiQC report title. Printed as page header, used for filename if not otherwise specified.
  Type: string

Options for the all versus all alignment phase.
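One entry in this group, the fraction of mappings to keep, is a string that is either auto or a decimal matching (auto|[01]\.\d+). The sketch below shows one way to interpret such a value in Python. The function name is hypothetical, and two points are assumptions rather than documented behaviour: that the whole string must match the pattern, and that a numeric value must lie in (0, 1].

```python
import re
from typing import Optional

# Pattern copied from the "Keep this fraction of mappings" entry.
KEEP_FRACTION_PATTERN = re.compile(r"(auto|[01]\.\d+)")


def parse_keep_fraction(value: str) -> Optional[float]:
    """Return None for 'auto', otherwise the fraction as a float.

    Assumptions (not stated in the source): the whole string must match
    the pattern, and a numeric fraction must lie in (0, 1].
    """
    if KEEP_FRACTION_PATTERN.fullmatch(value) is None:
        raise ValueError(f"not 'auto' or a decimal fraction: {value!r}")
    if value == "auto":
        return None
    fraction = float(value)
    if not 0.0 < fraction <= 1.0:
        raise ValueError(f"fraction outside (0, 1]: {value!r}")
    return fraction


if __name__ == "__main__":
    print(parse_keep_fraction("auto"))
    print(parse_keep_fraction("0.25"))
```

Under this reading a bare 1 is rejected because the pattern requires a decimal point, which is consistent with the default being written as 1.0.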
Percent identity in the wfmash mashmap step.
  Type: number; default: 90
Segment length for mapping.
  Type: string; default: 5000
Minimum block length filter for mapping.
  Type: string
Kmer size for mashmap.
  Type: integer; default: 19
Ignore the top % most-frequent kmers.
  Type: number; default: 0.001
Keep this fraction of mappings (auto for giant component heuristic).
  Type: string; default: 1.0; pattern: (auto|[01]\.\d+)
Merge successive mappings.
  Type: boolean
Disable splitting of input sequences during mapping.
  Type: boolean
Skip mappings between sequences with the same name prefix before the given delimiter character. This can be helpful if several sequences originate from the same chromosome. It is recommended that the sequence names follow the PanSN specification (https://github.com/pangenome/PanSN-spec). In future versions of the pipeline it will be required that the sequence names follow this specification.
  Type: string
Set the directory where temporary files should be stored. Since everything runs in containers, we don’t usually set this argument.
  Type: string
The number of files to generate from the approximate wfmash mappings to scale across a whole cluster. It is recommended to set this to the number of available nodes. If only one machine is available, leave it at 1.
  Type: integer; default: 1
If this parameter is set, only the wfmash alignment step of the pipeline is executed. This option is offered for users who want to run wfmash on a cluster.
  Type: boolean
Filter out mappings unlikely to be this Average Nucleotide Identity (ANI) less than the best mapping.
  Type: integer; default: 30
Number of mappings for each segment. [default: n_haplotypes - 1].
  Type: integer
Ignores exact matches below this length.
  Type: integer; default: 23
Number of base pairs to use for transitive closure batch.
  Type: string; default: 10000000
Keep this randomly selected fraction of input matches.
  Type: number
Set the directory where temporary files should be stored. Since everything runs in containers, we don’t usually set this argument.
  Type: string
Input PAF file. The wfmash alignment step is skipped.
  Type: string

Options for graph smoothing phase.
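Two entries in this group take comma-separated strings: the list of maximum POA sequence lengths (default 700,900,1100, one SMOOTHXG execution per integer) and the POA score parameters (default 1,19,39,3,81,1, also available as the preset asm5). A minimal Python sketch of how such values could be parsed follows. The function names are hypothetical, and only the asm5 preset is resolved, because the values behind asm10, asm15 and asm20 are not given in this document.

```python
from typing import List, Tuple

# Only asm5 is spelled out in this section; the values of the other
# presets (asm10, asm15, asm20) are not listed, so they are not filled in.
KNOWN_PRESETS = {"asm5": (1, 19, 39, 3, 81, 1)}


def parse_poa_lengths(value: str) -> List[int]:
    """Split a comma-separated length list such as '700,900,1100'."""
    return [int(item) for item in value.split(",")]


def parse_poa_params(value: str) -> Tuple[int, ...]:
    """Resolve 'match,mismatch,gap1,ext1,gap2,ext2' or a known preset."""
    if value in KNOWN_PRESETS:
        return KNOWN_PRESETS[value]
    scores = tuple(int(item) for item in value.split(","))
    if len(scores) != 6:
        raise ValueError(f"expected 6 comma-separated scores, got {len(scores)}")
    return scores


if __name__ == "__main__":
    # One smoothing pass per length in the list.
    for run, length in enumerate(parse_poa_lengths("700,900,1100"), start=1):
        print(f"smoothing pass {run}: maximum POA sequence length {length}")
    print(parse_poa_params("asm5"))
```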
Skip the graph smoothing step of the pipeline.
  Type: boolean
Maximum path jump to include in the block.
  Type: integer
Maximum edge jump before a block is broken.
  Type: integer
Maximum sequence length to put into POA, given as a comma-separated list. SMOOTHXG will be executed once for each integer.
  Type: string; default: 700,900,1100
Minimum edit-based identity to cluster sequences.
  Type: string
Minimum ‘smallest / largest’ sequence length ratio to cluster in a block.
  Type: integer
Path depth at which we don’t pad the POA problem.
  Type: integer; default: 100
Pad each end of each sequence in POA with ‘smoothxg_poa_padding * longest_poa_seq’ base pairs.
  Type: number; default: 0.001
Score parameters for POA in the form of ‘match,mismatch,gap1,ext1,gap2,ext2’. It may also be given as presets: ‘asm5’, ‘asm10’, ‘asm15’, ‘asm20’. [default: 1,19,39,3,81,1 = asm5].
  Type: string; default: 1,19,39,3,81,1
Write MAF output representing merged POA blocks.
  Type: boolean
Use this prefix for consensus path names.
  Type: string; default: Consensus_
Set the directory where temporary files should be stored. Since everything runs in containers, we don’t usually set this argument.
  Type: string
Keep intermediate graphs during SMOOTHXG.
  Type: boolean
Run abPOA. [default: SPOA].
  Type: boolean
Run the POA in global mode. [default: local mode].
  Type: boolean
Number of CPUs for the potentially very memory expensive POA phase of SMOOTHXG. Default is ‘task.cpus’.
  Type: integer

Options for calling variants against reference(s).
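The --vcf_spec value in this group follows the grammar REF[:LEN][,REF[:LEN]]*. The sketch below splits such a string into (REF, LEN) pairs in Python. The function name and the reference names in the example are hypothetical; treating LEN as a non-negative integer, and assuming REF contains neither a comma nor a colon, are assumptions, since this document gives only the grammar.

```python
from typing import List, Optional, Tuple


def parse_vcf_spec(spec: str) -> List[Tuple[str, Optional[int]]]:
    """Parse 'REF[:LEN][,REF[:LEN]]*' into (REF, LEN or None) pairs.

    Assumptions: LEN is a non-negative integer and REF contains neither
    ',' nor ':'. The source gives only the grammar, not these details.
    """
    entries = []
    for item in spec.split(","):
        ref, sep, length = item.partition(":")
        if not ref:
            raise ValueError(f"empty REF in {item!r}")
        if sep and not length.isdigit():
            raise ValueError(f"LEN must be an integer in {item!r}")
        entries.append((ref, int(length) if sep else None))
    return entries


if __name__ == "__main__":
    # Reference names below are made up for illustration.
    print(parse_vcf_spec("refA:1000,refB"))
```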
Specify a set of VCFs to produce with --vcf_spec "REF[:LEN][,REF[:LEN]]*".
  Type: string

Options to run the partition algorithm for community detection.
Enable community detection.
  Type: boolean

Parameters used to describe centralised config profiles. These should not be edited.
Git commit id for Institutional configs.
  Type: string; default: master
Base directory for Institutional configs.
  Type: string; default: https://raw.githubusercontent.com/nf-core/configs/master
Institutional config name.
  Type: string
Institutional config description.
  Type: string
Institutional config contact information.
  Type: string
Institutional config URL link.
  Type: string

Set the top limit for requested resources for any single job.
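The memory and time limits in this group are written as strings such as 128.GB and 240.h and are checked against the patterns listed with each entry. As a minimal sketch, the same checks can be run in Python. The function names are hypothetical and the patterns are copied from this section.

```python
import re

# Patterns copied from the memory and time entries of this section.
MEMORY_PATTERN = re.compile(r"^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$")
TIME_PATTERN = re.compile(r"^(\d+\.?\s*(s|m|h|d|day)\s*)+$")


def is_valid_memory(value: str) -> bool:
    """Return True if `value` matches the documented memory pattern."""
    return MEMORY_PATTERN.match(value) is not None


def is_valid_time(value: str) -> bool:
    """Return True if `value` matches the documented time pattern."""
    return TIME_PATTERN.match(value) is not None


if __name__ == "__main__":
    for memory in ["128.GB", "128GB", "128", "128.gb"]:
        print(memory, is_valid_memory(memory))
    for duration in ["240.h", "2d 12h", "240"]:
        print(duration, is_valid_time(duration))
```

Both patterns are case-sensitive and require a unit, so a bare number such as 128 or 240 does not match.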
Maximum number of CPUs that can be requested for any single job.
  Type: integer; default: 16
Maximum amount of memory that can be requested for any single job.
  Type: string; default: 128.GB; pattern: ^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$
Maximum amount of time that can be requested for any single job.
  Type: string; default: 240.h; pattern: ^(\d+\.?\s*(s|m|h|d|day)\s*)+$

Less common options for the pipeline, typically set in a config file.
Display help text.
  Type: boolean
Display version and exit.
  Type: boolean
Method used to save pipeline results to output directory.
  Type: string
Email address for completion summary, only when pipeline fails.
  Type: string; pattern: ^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$
Send plain-text email instead of HTML.
  Type: boolean
File size limit when attaching MultiQC reports to summary emails.
  Type: string; default: 25.MB; pattern: ^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$
Do not use coloured log outputs.
  Type: boolean
Incoming hook URL for messaging service.
  Type: string
Custom config file to supply to MultiQC.
  Type: string
Custom logo file to supply to MultiQC. File name must also be set in the MultiQC config file.
  Type: string
Custom MultiQC yaml file containing HTML including a methods description.
  Type: string
Boolean whether to validate parameters against the schema at runtime.
  Type: boolean; default: true
Show all params when using --help.
  Type: boolean
Validation of parameters fails when an unrecognised parameter is found.
  Type: boolean
Validation of parameters in lenient mode.
  Type: boolean; default: true
Do we want to display hidden parameters?
  Type: boolean
Do we want to display hidden parameters?
  Type: string; default: igenomes_base
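Both email entries in this document (the completion summary address and the failure-only address) carry the same pattern. The Python sketch below applies it to a few made-up addresses. The function name is hypothetical.

```python
import re

# Pattern copied from the two email entries in this document.
EMAIL_PATTERN = re.compile(
    r"^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$"
)


def is_valid_email(address: str) -> bool:
    """Return True if `address` matches the documented email pattern."""
    return EMAIL_PATTERN.match(address) is not None


if __name__ == "__main__":
    # Addresses below are made up for illustration.
    for address in ["first.last@example.org", "user@example", "user@lab.university"]:
        print(address, is_valid_email(address))
```

The final group allows only two to five letters, so an address whose last domain label is longer than that does not match.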