Reads a pair of paired-end FASTQ files (R1 and R2), checks the reads against the expected primer sequences, and returns the read information (ID, sequence, quality scores, and overlap length) in a data frame. Reads that do not match the provided primer sequences within the specified mismatch tolerance are filtered out.

Usage

parse_fastq_pairs(
  fastq_file_r1,
  fastq_file_r2,
  forward_primer,
  reverse_primer,
  missmatch_tolerance = 2
)

Arguments

fastq_file_r1

Path to the R1 FASTQ file.

fastq_file_r2

Path to the R2 FASTQ file.

forward_primer

Forward primer sequence to validate against R1 reads.

reverse_primer

Reverse primer sequence to validate against R2 reads.

missmatch_tolerance

Maximum number of mismatched bases allowed when matching a primer against a read (default: 2).
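
To illustrate what a mismatch tolerance means, the following base R sketch counts position-wise mismatches between a primer and the start of a read. It is not the package's implementation: the helper names count_mismatches and primer_matches are hypothetical, and the sketch assumes the primer sits at the very start of the read and contains no IUPAC ambiguity codes.

```r
# Hypothetical helper: number of positions at which the primer differs
# from the first nchar(primer) bases of the read (Hamming distance).
count_mismatches <- function(read, primer) {
  n <- nchar(primer)
  if (nchar(read) < n) {
    return(NA_integer_)  # read is too short to contain the primer
  }
  read_bases <- strsplit(substr(read, 1, n), "")[[1]]
  primer_bases <- strsplit(primer, "")[[1]]
  sum(read_bases != primer_bases)
}

# Hypothetical helper: TRUE when the read starts with the primer,
# allowing at most `tolerance` mismatched bases.
primer_matches <- function(read, primer, tolerance = 2) {
  mismatches <- count_mismatches(read, primer)
  !is.na(mismatches) && mismatches <= tolerance
}

primer <- "ACGTACGT"
primer_matches("ACGTACGTTTGGCC", primer)  # exact match
primer_matches("ACCTACGATTGGCC", primer)  # 2 mismatches, within tolerance
primer_matches("TCCTACGATTGGCC", primer)  # 3 mismatches, would be filtered out
```

With the default tolerance of 2, the first two reads would be kept and the third discarded. A real implementation may also search for the primer at an offset or handle degenerate bases, which this sketch does not attempt.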

Value

A data frame with columns:

  • read_id: Base identifier for the read pair

  • sequence_r1: Forward read sequence

  • quality_r1: Forward read quality scores

  • sequence_r2: Reverse read sequence (reverse complemented)

  • quality_r2: Reverse read quality scores

  • original_id_r1: Original R1 read identifier

  • original_id_r2: Original R2 read identifier

  • overlap_length: Length of overlap between R1 and R2 sequences
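
The sequence_r2 column holds the reverse complement of the R2 read, which puts it in the same orientation as R1. A minimal base R sketch of that transformation is shown here; the helper name reverse_complement is hypothetical, the sketch handles only A, C, G, T, and N, and the package may well use a different routine internally.

```r
# Hypothetical helper: reverse complement of a DNA sequence.
# Only A, C, G, T, and N (upper or lower case) are translated;
# any other character is left unchanged.
reverse_complement <- function(sequence) {
  complemented <- chartr("ACGTNacgtn", "TGCANtgcan", sequence)
  paste(rev(strsplit(complemented, "")[[1]]), collapse = "")
}

reverse_complement("AACGT")  # complement is "TTGCA", reversed gives "ACGTT"
```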