Parse Paired-End FASTQ Files — parse_fastq

Reads paired-end FASTQ files (R1 and R2), validates the primer sequences, and stores the read information (IDs, sequences, quality scores, and overlap length) in a data frame. Reads that do not match the provided primer sequences within the specified mismatch tolerance are filtered out.
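The description does not say how FASTQ records are read or how mates are paired. As an illustration only, the base-R sketch below parses 4-line FASTQ records and joins R1 and R2 mates on a shared base ID. read_fastq_records is a hypothetical helper, and the ID rule (text before the first whitespace, minus a trailing /1 or /2) is an assumption about the header format, not the package's documented behavior.

```r
# Hypothetical helper (not the package's implementation): parse a FASTQ
# file into a data frame with one row per 4-line record.
read_fastq_records <- function(path) {
  lines <- readLines(path)
  # Drop trailing blank lines so that the record count is exact
  lines <- lines[seq_len(max(which(nzchar(lines)), 0))]
  if (length(lines) %% 4 != 0) {
    stop("FASTQ file does not contain complete 4-line records")
  }
  # Line positions of each record's header line
  idx <- seq(1, by = 4, length.out = length(lines) / 4)
  headers <- sub("^@", "", lines[idx])  # assumption: the leading "@" is dropped
  data.frame(
    original_id = headers,
    # Assumed base ID: text before the first whitespace, minus a trailing /1 or /2
    read_id = sub("/[12]$", "", sub("\\s.*$", "", headers)),
    sequence = lines[idx + 1],
    quality = lines[idx + 3],
    stringsAsFactors = FALSE
  )
}

# Write a one-record R1/R2 pair to temporary files and parse both
r1_path <- tempfile(fileext = ".fastq")
r2_path <- tempfile(fileext = ".fastq")
writeLines(c("@pair1/1", "ACGTTGCA", "+", "IIIIIIII"), r1_path)
writeLines(c("@pair1/2", "TTGCAACG", "+", "HHHHHHHH"), r2_path)
r1 <- read_fastq_records(r1_path)
r2 <- read_fastq_records(r2_path)

# Join mates on the shared base ID; the suffixes yield names such as
# sequence_r1, quality_r2 and original_id_r2
pairs <- merge(r1, r2, by = "read_id", suffixes = c("_r1", "_r2"))
```

Joining on the base ID rather than on file position keeps mates correctly paired even if one file has had records removed.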

parse_fastq_pairs(
  fastq_file_r1,
  fastq_file_r2,
  forward_primer,
  reverse_primer,
  missmatch_tolerance = 2
)

Arguments

fastq_file_r1: Path to the R1 FASTQ file.
fastq_file_r2: Path to the R2 FASTQ file.
forward_primer: Forward primer sequence to validate against R1 reads.
reverse_primer: Reverse primer sequence to validate against R2 reads.
missmatch_tolerance: Maximum number of mismatches allowed when matching each primer (default: 2).
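A minimal sketch of primer validation with a mismatch tolerance, assuming the primer must appear at the very start of the read and that only substitutions are counted (no insertions or deletions, no expansion of IUPAC degenerate bases). primer_matches is a hypothetical helper; under these assumptions, R1 reads would be checked against forward_primer and the raw (not yet reverse-complemented) R2 reads against reverse_primer.

```r
# Hypothetical helper: TRUE when the first nchar(primer) bases of `read`
# differ from `primer` at no more than `tolerance` positions.
# Assumptions: primer at the read start, substitutions only, no IUPAC codes.
primer_matches <- function(read, primer, tolerance = 2) {
  n <- nchar(primer)
  if (nchar(read) < n) return(FALSE)  # read too short to contain the primer
  read_bases <- strsplit(toupper(substr(read, 1, n)), "")[[1]]
  primer_bases <- strsplit(toupper(primer), "")[[1]]
  # Hamming distance over the primer-length prefix
  sum(read_bases != primer_bases) <= tolerance
}

reads <- c("ACGTACGGTT",   # exact primer match
           "ACCTAGGGTT",   # 2 mismatches in the primer region
           "TTTTTTGGTT",   # 5 mismatches
           "ACG")          # shorter than the primer
keep <- vapply(reads, primer_matches, logical(1),
               primer = "ACGTAC", tolerance = 2, USE.NAMES = FALSE)
keep
# [1]  TRUE  TRUE FALSE FALSE
```

With tolerance = 1, the second read would also be rejected, which is how a smaller missmatch_tolerance tightens the filter.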

Value

A data frame with one row per retained read pair and the following columns:

read_id: Base identifier for the read pair
sequence_r1: Forward read sequence
quality_r1: Forward read quality scores
sequence_r2: Reverse read sequence (reverse complemented)
quality_r2: Reverse read quality scores
original_id_r1: Original R1 read identifier
original_id_r2: Original R2 read identifier
overlap_length: Length of the overlap between the R1 and R2 sequences
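Because sequence_r2 holds the reverse complement of R2, it reads in the same orientation as R1, so the overlap can be measured directly. The sketch below uses hypothetical helpers revcomp and overlap_length. It assumes an exact-match overlap between the end of R1 and the start of the reverse-complemented R2 (the usual geometry when the amplicon is longer than each read) and a hypothetical min_overlap cutoff; the package's actual overlap criterion, for example whether it tolerates mismatches, is not documented here.

```r
# Hypothetical helpers (not the package's implementation).
# Reverse complement of a DNA string containing A/C/G/T/N.
revcomp <- function(seq) {
  comp <- chartr("ACGTN", "TGCAN", toupper(seq))
  paste(rev(strsplit(comp, "")[[1]]), collapse = "")
}

# Length of the longest suffix of r1 that exactly equals a prefix of r2_rc
# (the reverse-complemented R2 read); 0 if no overlap reaches min_overlap.
overlap_length <- function(r1, r2_rc, min_overlap = 10) {
  max_k <- min(nchar(r1), nchar(r2_rc))
  if (max_k < min_overlap) return(0L)
  # Try the longest candidate first so the first hit is the maximum overlap
  for (k in max_k:min_overlap) {
    if (substr(r1, nchar(r1) - k + 1, nchar(r1)) == substr(r2_rc, 1, k)) {
      return(as.integer(k))
    }
  }
  0L
}

# A 16 bp amplicon read as 10 bp from each end
r1 <- "ACGTTGCAAG"              # first 10 bases of ACGTTGCAAGGCTTAC
r2 <- "GTAAGCCTTG"              # raw R2: reverse complement of the last 10 bases
r2_rc <- revcomp(r2)            # "CAAGGCTTAC", the orientation stored in sequence_r2
overlap_length(r1, r2_rc, min_overlap = 3)
# [1] 4
```

Requiring an exact match keeps the sketch short; real pipelines often allow a few mismatches in the overlap because read ends tend to have lower base quality.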