`-c ${custom_config}`, where `-c` is the command-line flag and `${custom_config}` is the path to the custom config file.
`-params-file` option (e.g., `-params-file params.yml`, where the YAML file contains all custom parameters). Discussions linked to a `params.yml` file can be found at https://github.com/nf-core/viralrecon/issues/283 and https://github.com/nf-core/rnaseq/issues/754.

Upon pipeline completion, output files generated in the `work`
directory are copied over to the resulting directory (typically called `results`). Deleting the `work` directory may be desired to avoid duplication of files; however, deleting this directory also removes the log files (the `.command.*` files) and prevents you from resuming a pipeline execution (with `-resume`). If you do not need to resume a pipeline run and would like to remove large files from the `work` directory while keeping the logs, you may clean this directory with the command `nextflow clean -k -f` (we recommend performing a dry run beforehand with `nextflow clean -k -n`).

We recommend users join the nf-core Slack workspace to search for
questions across pipeline-specific channels (each pipeline has its own channel, e.g., `#rnaseq`, `#atacseq`). You may find the link to join the Slack workspace at https://nf-co.re/join. In addition, nf-core has other beginner-friendly channels, such as `#nostupidquestions`, which may be useful for more general questions spanning multiple pipelines. Please follow their guidelines when asking a question, along with the best practices nicely summarized by Stack Overflow at https://stackoverflow.com/help/how-to-ask (e.g., use Slack’s search function to check whether your problem has already been discussed; provide a synopsis of the issue along with a reproducible example; etc.). Please refrain from direct-messaging pipeline developers.
`-profile cheaha`. Full documentation of the profile can be found in the nf-core configs GitHub repository: https://github.com/nf-core/configs/blob/master/docs/cheaha.md

We recommend using this profile when processing multiple samples in the pipeline, as it aids parallel processing. If your run only contains a small number of samples, standard `local` mode (on a compute node) is usually sufficient. Feel free to let us know if additional customization would benefit the community.
nf-core contains a library of pipelines, which can be found at https://nf-co.re/pipelines. Below are notes and tips for pipelines which U-BDS has tested in depth:
Although a user choice, we generally recommend enabling Salmon’s `--seqBias --gcBias` flags. For versions >= 3.10, users can enable these flags with the following pipeline parameter:

```
extra_salmon_quant_args: "--seqBias --gcBias"
```
Note for versions < 3.10: a `custom.config` file is required to enable these flags. An example is provided below for version 3.6 (this configuration provides the flags both for the Salmon run with STAR inputs and for the Salmon run where quasi-mapping was performed by Salmon):
```
// tool params not linked to direct pipeline params
process {
    withName: '.*:QUANTIFY_STAR_SALMON:SALMON_QUANT' {
        ext.args = "--seqBias --gcBias"
    }
    withName: '.*:QUANTIFY_SALMON:SALMON_QUANT' {
        ext.args = "--seqBias --gcBias"
    }
}
```
Note that the syntax for custom parameters in the `custom.config` may differ from pipeline to pipeline, or even between versions. We recommend following syntax similar to what is present in the `modules.config` of the pipeline of interest, such as this one for v3.6. If you have questions about this, please feel free to ask during our data science office hours.
`transcript_fasta` is not a required pipeline parameter. If it is not provided by the user, it will be generated by the process called `RSEM_PREPAREREFERENCE_TRANSCRIPTS` (which may be renamed in the near future, as this process is executed whenever the user does not provide a transcriptome, regardless of the choice of `--aligner`). For more information, reference here.

If you provide a `transcript_fasta`,
which is used at the Salmon steps, it should be a transcriptome file generated with `gffread`, as opposed to one downloaded from GENCODE or Ensembl. Issues linked to Ensembl transcriptomes derive from the fact that the transcriptome files (even after concatenating coding and non-coding RNA) contain fewer transcripts than the GTF file. The issues linked to GENCODE files stem from a bug in the pipeline which prevents it from correctly processing the off-the-shelf transcript FASTA due to the sequence name format (despite enabling the `--gencode` parameter). For more information, reference here and here.
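As a quick sanity check of the mismatch described above, you can compare the number of sequences in a transcript FASTA against the number of transcript records in its GTF. The snippet below is a sketch using tiny hypothetical files (real inputs would be the Ensembl transcriptome FASTA and its matching GTF):

```shell
# Hypothetical toy files illustrating the FASTA-vs-GTF transcript count check.
printf '>TX1\nACGTACGT\n>TX2\nACGTACGT\n' > toy_transcripts.fa

# Three transcript records in the (tab-separated) GTF, but only two in the FASTA
{
    printf 'chr1\thavana\ttranscript\t1\t100\t.\t+\t.\ttranscript_id "TX1";\n'
    printf 'chr1\thavana\ttranscript\t200\t300\t.\t+\t.\ttranscript_id "TX2";\n'
    printf 'chr1\thavana\ttranscript\t400\t500\t.\t+\t.\ttranscript_id "TX3";\n'
} > toy.gtf

# Count FASTA records vs. GTF transcript records
fa_count=$(grep -c '^>' toy_transcripts.fa)
gtf_count=$(awk -F'\t' '$3 == "transcript"' toy.gtf | wc -l)
echo "FASTA: ${fa_count} transcripts, GTF: ${gtf_count} transcripts"
# → FASTA: 2 transcripts, GTF: 3 transcripts
```

A mismatch between the two counts (here, 2 vs. 3) indicates the FASTA does not match the GTF and should be regenerated with `gffread` as described below.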
Alternatively, you may provide only `--fasta` and `--gtf` along with `--save_reference` the first time you run the pipeline, and then let it generate all of the downstream artifacts (such as the transcriptome and indices) to be reused thereafter by including them as params in the YAML. This is the approach recommended by the authors per the nf-core/rnaseq Slack channel. For more information, reference here.

`gffread` example:
```
samtools faidx Mus_musculus.GRCm38.dna.primary_assembly.fa
gffread -w Mus_musculus.GRCm38_GTF_matched_transcripts.fa -g ./Mus_musculus.GRCm38.dna.primary_assembly.fa Mus_musculus.GRCm38.102.gtf
```
From the example above, the newly generated transcriptome `Mus_musculus.GRCm38_GTF_matched_transcripts.fa` would be passed on to the pipeline.

We recommend creating a conda environment containing both dependencies (`samtools` and `gffread`) and recording their versions.
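A sketch of such an environment is shown below (channel and package names are assumed from Bioconda; pin exact versions as appropriate for your project):

```
conda create -n gffread_env -c conda-forge -c bioconda samtools gffread
conda activate gffread_env
samtools --version | head -n 1   # record versions for reproducibility
gffread --version
```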
`gffread` documentation can be found at: http://ccb.jhu.edu/software/stringtie/gff.shtml#gffread

Issue discussing `transcript_fasta`: https://github.com/nf-core/rnaseq/issues/753

Container override example:
```
process {
    withName: NANOPLOT {
        container = 'https://depot.galaxyproject.org/singularity/nanoplot:1.32.1--py_0'
    }
}
```
Place the above lines in a file called `custom.config`. When the pipeline is run, add the command-line option `-c path/to/custom.config` to the nextflow command, where `path/to/custom.config` is the path to the custom config file.
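Putting the pieces together, a hypothetical invocation might look like the following (the pipeline name, version, and file paths are placeholders for illustration):

```
nextflow run nf-core/rnaseq -r 3.6 \
    -profile cheaha \
    -params-file params.yml \
    -c path/to/custom.config
```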
The file provided to `--input` must end in `.txt`.
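For example, an input file such as `ids.txt` could list one accession per line (the accessions below are placeholders, not real datasets):

```
SRR0000001
SRR0000002
```

This file would then be passed to the pipeline with `--input ids.txt`.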
```
process {
    withName: SRA_FASTQ_FTP {
        ext.args = '--retry 5 --continue-at -'
    }
}
```
Place the following parameters in a `params.yml` file, and pass it to the pipeline by adding `-params-file params.yml` to the nextflow run command:

```
enable_conda: true
force_sra_downloads: true
```
Place the following in a `custom.conf` file and pass it to the pipeline by adding `-c custom.conf` to the nextflow run command. Note that this configuration sets the maximum download size to 50 GB; tweak this number to accommodate your downloads:

```
process {
    withName: SRATOOLS_PREFETCH {
        ext.args = '--max-size 50G'
    }
}
```
the `/tmp` directory on the cluster, which is apt to run out of space while the process is running, causing the MACS2 output files to be empty (`/tmp` is by design a small file system; it would be preferable to use user scratch or user data). Since this version of the pipeline is DSL1, there is no way to modify the parameters of this process to store temporary files in a different directory without modifying the source code and executing it appropriately (this goes beyond the scope of this guide; if more information is needed, we recommend stopping by our data science office hours). As a result, we recommend running this pipeline on a local/personal computer rather than on the cluster for the aforementioned versions.
```
Error in read.table(PeakFiles[idx], sep = "\t", header = FALSE) :
  no lines available in input
Execution halted
```
`alevin_qc`, which is a collection of various stats and graphs about your dataset. These QC results are only produced when using Alevin as the aligner.