Details of the loop with your submission script
#!/bin/bash
#SBATCH -p express
#SBATCH -t 1:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --mail-user=<your.email.address@univ.edu>
#SBATCH --mail-type=end
module load trimmomatic
module load jre/1.8.0_221
for infile in *_1.fastq.gz
do base=$(basename ${infile} _1.fastq.gz)
java -jar /opt/trimmomatic/0.38/prebuilt/trimmomatic-0.38.jar PE -threads 2 ${infile} ${base}_2.fastq.gz ${base}_1.trim.fastq.gz ${base}_1un.trim.fastq.gz ${base}_2.trim.fastq.gz ${base}_2un.trim.fastq.gz SLIDINGWINDOW:4:20 MINLEN:25 ILLUMINACLIP:NexteraPE-PE.fa:2:40:15
done
The submission script part
By now we should know that the following lines set up the submission: the express partition (queue), one node, one task (processor) on that node, a one-hour time limit, and an email notification when the job ends:
#!/bin/bash
#SBATCH -p express
#SBATCH -t 1:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --mail-user=<your.email.address@univ.edu>
#SBATCH --mail-type=end
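For reference, the same header can be written with Slurm's long option names, which some readers find easier to scan. This is a sketch rather than a required change. The `--cpus-per-task=2` line is an assumption added here, not part of the original script: it would ask Slurm for two CPU cores so the request matches the `-threads 2` given to Trimmomatic further down. The `threads` variable is likewise a hypothetical helper.

```shell
#!/bin/bash
#SBATCH --partition=express       # same as -p express
#SBATCH --time=1:00:00            # same as -t 1:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=2         # assumption: not in the original script
#SBATCH --mail-user=<your.email.address@univ.edu>
#SBATCH --mail-type=end

# Slurm sets SLURM_CPUS_PER_TASK when --cpus-per-task is requested;
# fall back to 1 when the variable is unset (e.g. outside a Slurm job).
threads=${SLURM_CPUS_PER_TASK:-1}
echo "Trimmomatic could be started with -threads ${threads}"
```

Whether extra cores are available on the express partition depends on the cluster, so check the local documentation before changing the request.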
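The line module load trimmomatic loads the Trimmomatic environment module, which sets up the environment the job needs to use Trimmomatic. The line module load jre/1.8.0_221 loads the Java runtime that the java -jar command in the loop relies on.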
Now the loop!
for infile in *_1.fastq.gz
do base=$(basename ${infile} _1.fastq.gz)
java -jar /opt/trimmomatic/0.38/prebuilt/trimmomatic-0.38.jar PE -threads 2 ${infile} ${base}_2.fastq.gz ${base}_1.trim.fastq.gz ${base}_1un.trim.fastq.gz ${base}_2.trim.fastq.gz ${base}_2un.trim.fastq.gz SLIDINGWINDOW:4:20 MINLEN:25 ILLUMINACLIP:NexteraPE-PE.fa:2:40:15
done
- First we set up the variable ${infile} as we iterate through the folder looking for all files ending in _1.fastq.gz. On each pass through the loop, ${infile} holds the COMPLETE filename of one of those files (e.g. SRR2589044_1.fastq.gz). Remember that for every _1.fastq.gz file, there is a corresponding _2.fastq.gz file.
- The command base=$(basename ${infile} _1.fastq.gz), which follows do, derives the sample name. We've seen the use of basename before, so we know that it removes _1.fastq.gz from the value of ${infile} and leaves only the name of the SAMPLE (e.g. SRR2589044, SRR2584863, SRR2584866, etc.), which is stored in the variable ${base}.
- The trimmomatic command combines the ${infile} variable, which holds the complete name of the _1.fastq.gz file, with the ${base} variable to fill in the other filenames, for example ${base}_2.fastq.gz, ${base}_1.trim.fastq.gz, or ${base}_2.trim.fastq.gz.
Below is a detailed description of the variables, values, and resulting filenames for our loop.
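The basename step can be tried on its own at the command line before trusting it inside a job. The sketch below uses one sample name from this lesson. The `alt` variable is a hypothetical extra that shows bash's built-in suffix removal, which gives the same result here because the filename has no directory part.

```shell
# One filename, as ${infile} would hold it during a single pass of the loop
infile="SRR2589044_1.fastq.gz"

# basename strips the given suffix (and any leading directory path)
base=$(basename "${infile}" _1.fastq.gz)
echo "${base}"                   # sample name only
echo "${base}_2.fastq.gz"        # name of the paired input file
echo "${base}_1un.trim.fastq.gz" # name of one of the output files

# Alternative without an external command: remove the suffix with ${var%suffix}
alt="${infile%_1.fastq.gz}"
echo "${alt}"
```

Quoting "${infile}" makes no difference for these filenames, but it protects the command if a filename ever contains spaces.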
Broken down:
First time through loop:
VARIABLE | VARIABLE VALUE | RESULT |
---|---|---|
${infile} | SRR2589044_1.fastq.gz | SRR2589044_1.fastq.gz |
${base}_2.fastq.gz | SRR2589044 | SRR2589044_2.fastq.gz |
${base}_1.trim.fastq.gz | SRR2589044 | SRR2589044_1.trim.fastq.gz |
${base}_1un.trim.fastq.gz | SRR2589044 | SRR2589044_1un.trim.fastq.gz |
${base}_2.trim.fastq.gz | SRR2589044 | SRR2589044_2.trim.fastq.gz |
${base}_2un.trim.fastq.gz | SRR2589044 | SRR2589044_2un.trim.fastq.gz |
Second time through loop:
VARIABLE | VARIABLE VALUE | RESULT |
---|---|---|
${infile} | SRR2584863_1.fastq.gz | SRR2584863_1.fastq.gz |
${base}_2.fastq.gz | SRR2584863 | SRR2584863_2.fastq.gz |
${base}_1.trim.fastq.gz | SRR2584863 | SRR2584863_1.trim.fastq.gz |
${base}_1un.trim.fastq.gz | SRR2584863 | SRR2584863_1un.trim.fastq.gz |
${base}_2.trim.fastq.gz | SRR2584863 | SRR2584863_2.trim.fastq.gz |
${base}_2un.trim.fastq.gz | SRR2584863 | SRR2584863_2un.trim.fastq.gz |
etc. (This could go on for hundreds of files!)
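One way to check the filenames for every sample before submitting is a dry run: the same loop, with the Trimmomatic command replaced by echo. The sketch below is an illustration only. It assumes empty placeholder files in a scratch directory stand in for the real reads, and the check for a missing _2.fastq.gz partner and the `pairs` counter are additions that are not in the lesson's script.

```shell
#!/bin/bash
# Dry run: list the files each pass of the loop would use; Trimmomatic is not run.
# Assumption: empty placeholder files in a scratch directory stand in for real reads.
workdir=$(mktemp -d)
cd "${workdir}" || exit 1
touch SRR2589044_1.fastq.gz SRR2589044_2.fastq.gz \
      SRR2584863_1.fastq.gz SRR2584863_2.fastq.gz \
      SRR2584866_1.fastq.gz          # deliberately missing its _2 partner

pairs=0
for infile in *_1.fastq.gz
do
    base=$(basename "${infile}" _1.fastq.gz)
    # Skip samples whose second read file is absent instead of failing mid-job
    if [ ! -e "${base}_2.fastq.gz" ]; then
        echo "skipping ${base}: ${base}_2.fastq.gz not found" >&2
        continue
    fi
    echo "inputs:  ${infile} ${base}_2.fastq.gz"
    echo "outputs: ${base}_1.trim.fastq.gz ${base}_1un.trim.fastq.gz ${base}_2.trim.fastq.gz ${base}_2un.trim.fastq.gz"
    pairs=$((pairs + 1))
done
echo "complete pairs found: ${pairs}"
```

To use this on real data, run the loop in the directory that holds the reads and drop the mktemp, cd, and touch lines. The shell expands the *_1.fastq.gz pattern in sorted order, so samples are visited alphabetically.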
Go Back to the Genomics Trimming and Filtering Lesson to look at your Trimmomatic outputs.