seqtools


CountSplicedReads

Counts the number of spliced reads (reads that map across splice junctions) in a given RNA-Seq genomic alignment BAM file.

USAGE:   perl  CountSplicedReads.pl   input_bam   remove_temporary_files

Arguments Description
input_bam Input BAM file
remove_temporary_files Whether to remove temporary files: T or F (default: ‘T’, true)

Note: This script requires Picard tools and the bamUtil binary.
The paths to the Picard and bamUtil directories can be set in the script (default: /usr/bin/).
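For illustration only, here is a minimal Python sketch of one common definition of a spliced read: a mapped, primary alignment whose CIGAR string contains an N (a skipped reference region, i.e. an intron). It assumes SAM text lines as input, for example from `samtools view input.bam`, instead of going through Picard and bamUtil as CountSplicedReads does, so its counts may differ from the script's. The function name count_spliced_reads is hypothetical.

```python
def count_spliced_reads(sam_lines):
    """Count mapped, primary alignments whose CIGAR contains an 'N'.

    sam_lines: an iterable of SAM text lines (assumed input format, e.g. the
    output of `samtools view input.bam`). Header lines starting with '@'
    and blank lines are ignored.
    """
    skip_flags = 0x4 | 0x100 | 0x800  # unmapped, secondary, supplementary
    count = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag, cigar = int(fields[1]), fields[5]
        if flag & skip_flags:
            continue
        if "N" in cigar:  # 'N' marks a skipped region of the reference
            count += 1
    return count
```

Each alignment record is counted once, so both mates of a pair are counted separately when both are spliced. To use it on a BAM file, pipe `samtools view input.bam` into a small script that calls `count_spliced_reads(sys.stdin)`.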



DetermineFastqQualityEncoding

Determines the quality value encoding format of a given fastq file.

USAGE:   perl   DetermineFastqQualityEncoding.pl   fq

Arguments Description
fq Input fastq file
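A common way to detect the encoding is to look at the lowest quality character in a sample of reads. Below is a minimal Python sketch of that heuristic, not necessarily the method DetermineFastqQualityEncoding uses. It assumes unwrapped 4-line fastq records and returns the Picard quality_format names used by FqToSamPicard. The function names are hypothetical. A Phred+33 file whose qualities are all high (every character at '@' or above) would be misclassified, so scan enough reads.

```python
from itertools import islice


def quality_strings(lines, max_records=10000):
    """Yield the quality line of up to max_records 4-line fastq records."""
    it = iter(lines)
    for _ in range(max_records):
        record = list(islice(it, 4))
        if len(record) < 4:  # end of input (or a truncated last record)
            break
        yield record[3].rstrip("\n")


def guess_quality_format(quals):
    """Map the lowest quality character seen to a Picard quality format name."""
    lowest = min((ord(c) for q in quals for c in q), default=None)
    if lowest is None:
        raise ValueError("no quality values found")
    if lowest < 59:   # below ';' only occurs with a Phred+33 offset
        return "Standard"
    if lowest < 64:   # ';' to '?' are Solexa+64 scores -5 to -1
        return "Solexa"
    return "Illumina"  # Phred+64 (Illumina 1.3+ / 1.5+)
```

Usage: `with open("reads.fq") as fh: print(guess_quality_format(quality_strings(fh)))`.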


FastqPairedEndValidator

Validates the order of paired-end reads in the given fastq files, i.e. checks that both files list the mates of each pair in the same order.

USAGE:   perl   FastqPairedEndValidator.pl   fq1   fq2

Arguments Description
fq1 Fastq format file for the first (left) pair
fq2 Fastq format file for the second (right) pair
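The check can be sketched as comparing read names record by record after stripping the pair suffix. This is a minimal Python sketch under the assumption of unwrapped 4-line records and a trailing /1, /2, \1 or \2 tag; it is not the script's own code, and the function names are hypothetical.

```python
import re
from itertools import islice, zip_longest

PAIR_SUFFIX = re.compile(r"[/\\][12]$")  # trailing /1, /2, \1 or \2


def read_names(lines):
    """Yield each record's read name without '@', comment or pair suffix."""
    it = iter(lines)
    while True:
        record = list(islice(it, 4))
        if not record:
            return
        if len(record) < 4:
            raise ValueError("truncated fastq record")
        name = record[0][1:].split()[0]  # drop '@' and anything after whitespace
        yield PAIR_SUFFIX.sub("", name)


def first_mismatch(lines1, lines2):
    """Return the 0-based index of the first out-of-order pair, or None."""
    pairs = zip_longest(read_names(lines1), read_names(lines2))
    for i, (a, b) in enumerate(pairs):
        if a != b:  # different names, or one file has extra reads (None)
            return i
    return None
```

A result of None means every pair lines up; any index points at the first record where the two files disagree.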


AddPairedEndSuffix

Adds a \1 or \2 suffix (tag) to the read names of the first (left) or second (right) mates, respectively, in a given fastq file.

USAGE:   perl   AddPairedEndSuffix.pl   fq_in   fq_out   pair_tag

Arguments Description
fq_in Input fastq format file
fq_out Output fastq format file
pair_tag Paired-end tag to add (1 for the left mate, 2 for the right mate)
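A minimal Python sketch of the same idea, assuming unwrapped 4-line fastq records and that the tag is appended directly to the read name, before any comment. The separator defaults to a backslash to match the \1/\2 tags described here; many tools use /1 and /2 instead, so it is a parameter. The function name add_pair_suffix is hypothetical.

```python
import re

HEADER = re.compile(r"(\S+)(.*)", re.DOTALL)  # read name, then comment + newline


def add_pair_suffix(lines, pair_tag, sep="\\"):
    """Yield fastq lines with sep + pair_tag appended to every read name."""
    if pair_tag not in (1, 2):
        raise ValueError("pair_tag must be 1 (left) or 2 (right)")
    for i, line in enumerate(lines):
        if i % 4 == 0:  # header line of each 4-line record
            m = HEADER.match(line)
            line = f"{m.group(1)}{sep}{pair_tag}{m.group(2)}"
        yield line
```

Usage: `with open(fq_in) as src, open(fq_out, "w") as dst: dst.writelines(add_pair_suffix(src, 1))`.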


FqToSamPicard

Converts paired-end fastq files into a single merged SAM file sorted by read name.

USAGE:   perl   FqToSamPicard.pl   fq1   fq2   out_tag   quality_format

Arguments Description
fq1 Fastq format file for the first pair
fq2 Fastq format file for the second pair
out_tag Prefix for the output SAM file name. The resulting file will be out_tag.merged.sorted.sam
quality_format Fastq quality scale (“Standard”, “Solexa”, “Illumina”)

Note: This program requires Picard tools. The path to the Picard directory can be set in the script (default: /usr/bin/).

Picard’s FastqToSam automatically converts the quality values to the “Standard” (Phred) scale.
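To show what converting to the “Standard” scale involves, here is a minimal Python sketch of the two conversions: Phred+64 (“Illumina”) only shifts the ASCII offset, while Solexa scores use the standard Solexa-to-Phred formula Q_phred = 10·log10(10^(Q_solexa/10) + 1). This illustrates the arithmetic and is not Picard's code; the function names are hypothetical.

```python
import math


def phred64_to_standard(qual):
    """Illumina 1.3+/1.5+ (Phred+64) -> Standard (Phred+33): same score, new offset."""
    return "".join(chr(ord(c) - 64 + 33) for c in qual)


def solexa_to_standard(qual):
    """Solexa+64 -> Standard (Phred+33) using the Solexa-to-Phred formula."""
    out = []
    for c in qual:
        q_solexa = ord(c) - 64
        q_phred = 10 * math.log10(10 ** (q_solexa / 10) + 1)
        out.append(chr(round(q_phred) + 33))  # round to the nearest integer score
    return "".join(out)
```

The two scales agree closely at high qualities (Solexa 40 becomes Phred 40) and differ most at the low end (Solexa -5 becomes Phred 1).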



UnmappedSamToFastq

Restores the order of paired-end reads in fastq files using a merged SAM file, and separates out unpaired reads.

The merged SAM file is generated from the paired-end fastq files with Picard’s FastqToSam program followed by its MergeSamFiles program. See the instructions below.

USAGE:   perl   UnmappedSamToFastq.pl   mergedFqSam   out_tag

Arguments Description
mergedFqSam SAM file, merged and sorted by read name, generated from the raw (unordered) fastq files.
out_tag Prefix for the output fastq file names. Resulting files will be: out_tag_1.fastq (left mates), out_tag_2.fastq (right mates) and out_tag.fastq (unpaired reads).
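The core step can be sketched as a single pass over the name-sorted SAM records: when two adjacent records share a base name and carry different tags, they form a pair; anything left over is unpaired. This minimal Python sketch assumes the \1/\2 (or /1, /2) tag is still on each QNAME, that reads are unmapped (FLAG 0x10 unset, so SEQ needs no reverse-complementing), and that mates sort next to each other. It keeps the tag in the output names and collects results in memory for simplicity; the actual script may differ on all of these points, and the function names are hypothetical.

```python
import re

TAGGED = re.compile(r"^(.+)[/\\]([12])$")  # base name + trailing 1/2 tag


def fastq_records(sam_lines):
    """Yield (base_name, tag, fastq_text) for each SAM alignment line."""
    for line in sam_lines:
        if line.startswith("@"):  # SAM header; QNAMEs never start with '@'
            continue
        f = line.rstrip("\n").split("\t")
        m = TAGGED.match(f[0])
        if m is None:
            raise ValueError(f"read name lacks a 1/2 tag: {f[0]}")
        yield m.group(1), m.group(2), f"@{f[0]}\n{f[9]}\n+\n{f[10]}\n"


def split_pairs(sam_lines):
    """Split name-sorted records into left mates, right mates and unpaired reads."""
    left, right, unpaired = [], [], []
    pending = None
    for rec in fastq_records(sam_lines):
        if pending and pending[0] == rec[0] and pending[1] != rec[1]:
            first, second = sorted((pending, rec), key=lambda r: r[1])
            left.append(first[2])
            right.append(second[2])
            pending = None
        else:
            if pending:
                unpaired.append(pending[2])
            pending = rec
    if pending:
        unpaired.append(pending[2])
    return left, right, unpaired
```

Because left and right mates are appended together, the two output lists are always in the same order, which is exactly what FastqPairedEndValidator checks.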

Fixing the order of paired-end reads in fastq files:

Suppose the files fastq1 and fastq2 contain the left and right mates of the paired-end reads, respectively.

  1. Convert each fastq file to a SAM file individually using Picard’s FastqToSam.
  2. Merge the two SAM files into one using Picard’s MergeSamFiles, with the sort order set to “queryname”.
  3. Supply the merged SAM file to UnmappedSamToFastq to obtain the paired-end fastq files and a third fastq file with the unpaired reads (reads whose mate was not found).
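Steps 1 and 2 can be sketched as Picard command lines built from Python. This assumes a single picard.jar at a hypothetical path and Picard's classic KEY=VALUE argument style (FASTQ, OUTPUT, SAMPLE_NAME and QUALITY_FORMAT for FastqToSam; INPUT, OUTPUT and SORT_ORDER for MergeSamFiles). The intermediate file names and the sample name are hypothetical, and the sketch has not been checked against a particular Picard release.

```python
import shlex
import subprocess

PICARD_JAR = "/usr/bin/picard.jar"  # hypothetical path; point this at your install


def picard_commands(fq1, fq2, out_tag, quality_format, sample="sample"):
    """Build the FastqToSam (x2) and MergeSamFiles commands for steps 1-2."""
    picard = ["java", "-jar", PICARD_JAR]
    sams = [f"{out_tag}_left.sam", f"{out_tag}_right.sam"]  # hypothetical names
    cmds = [
        picard + ["FastqToSam", f"FASTQ={fq}", f"OUTPUT={sam}",
                  f"SAMPLE_NAME={sample}", f"QUALITY_FORMAT={quality_format}"]
        for fq, sam in zip((fq1, fq2), sams)
    ]
    cmds.append(picard + ["MergeSamFiles", f"INPUT={sams[0]}", f"INPUT={sams[1]}",
                          f"OUTPUT={out_tag}.merged.sorted.sam", "SORT_ORDER=queryname"])
    return cmds


def run(cmds):
    """Echo and execute each command, stopping at the first failure."""
    for cmd in cmds:
        print(shlex.join(cmd))
        subprocess.run(cmd, check=True)
```

Step 3 is then `perl UnmappedSamToFastq.pl out_tag.merged.sorted.sam out_tag`.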

Steps 1 and 2 can be performed with FqToSamPicard (described above).
To check whether the reads in a pair of fastq files are in the correct order, run FastqPairedEndValidator (described above).
Please note that UnmappedSamToFastq expects “\1” and “\2” tags in the read names to distinguish left-end from right-end reads, respectively. If your fastq files do not
contain these tags, run AddPairedEndSuffix (described above) to add them to the reads in your fastq files.
FqToSamPicard also requires the quality encoding format of the fastq files in order to run Picard tools. This format can be determined
with DetermineFastqQualityEncoding (described above).

(Sanger or Illumina 1.9+ => “Standard”, Illumina 1.5+ => “Illumina”, Illumina 1.3+ => “Illumina”, Solexa => “Solexa”)
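Putting these notes together, the full sequence of seqtools commands can be built from the USAGE lines above. In this Python sketch the encoding label is assumed to be read off DetermineFastqQualityEncoding's output by hand; the dictionary keys only mirror the mapping above and are hypothetical labels, since the script's exact output text is not documented here. The tagged intermediate file names are also hypothetical. Skip the two AddPairedEndSuffix commands if your reads already carry the tags. Each list can be passed to `subprocess.run(cmd, check=True)`.

```python
# Hypothetical labels for the encodings listed above -> FqToSamPicard quality_format.
ENCODING_TO_PICARD = {
    "Sanger": "Standard",
    "Illumina 1.9+": "Standard",
    "Illumina 1.5+": "Illumina",
    "Illumina 1.3+": "Illumina",
    "Solexa": "Solexa",
}


def workflow_commands(fq1, fq2, out_tag, encoding):
    """Return the seqtools command lines, in order, for re-pairing fq1/fq2."""
    fmt = ENCODING_TO_PICARD[encoding]
    tagged1, tagged2 = f"{out_tag}.tagged_1.fq", f"{out_tag}.tagged_2.fq"  # hypothetical
    return [
        ["perl", "AddPairedEndSuffix.pl", fq1, tagged1, "1"],
        ["perl", "AddPairedEndSuffix.pl", fq2, tagged2, "2"],
        ["perl", "FqToSamPicard.pl", tagged1, tagged2, out_tag, fmt],
        ["perl", "UnmappedSamToFastq.pl", f"{out_tag}.merged.sorted.sam", out_tag],
        ["perl", "FastqPairedEndValidator.pl", f"{out_tag}_1.fastq", f"{out_tag}_2.fastq"],
    ]
```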

The overall workflow is shown below: