Reads paired-end FASTQ files (R1 and R2), checks R1 reads against the forward primer and R2 reads against the reverse primer, and returns the read information (identifiers, sequences, quality scores, and R1/R2 overlap length) as a data frame. Reads that do not match the provided primer sequences within the specified mismatch tolerance are filtered out.
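As a rough sketch of the reading and pairing step, the base R code below parses two FASTQ files and joins R1 and R2 records on a shared identifier. It assumes uncompressed four-line records and that the pair identifier is the header up to the first whitespace with any "/1" or "/2" suffix removed. The helpers `read_fastq_records`, `base_read_id`, and `pair_fastq_records` are hypothetical and not part of the package, which may derive `read_id` differently.

```r
# Hypothetical helper: parse an uncompressed FASTQ file in which every record
# spans exactly four lines (header, sequence, "+", quality) with no wrapping.
read_fastq_records <- function(path) {
  lines <- readLines(path)
  if (length(lines) %% 4 != 0) {
    stop("expected a multiple of 4 lines in ", path)
  }
  starts <- 4 * seq_len(length(lines) %/% 4) - 3   # first line of each record
  data.frame(
    original_id = sub("^@", "", lines[starts]),
    sequence    = lines[starts + 1],
    quality     = lines[starts + 3],
    stringsAsFactors = FALSE
  )
}

# Hypothetical helper: derive the shared pair identifier by keeping the header
# text before the first whitespace and dropping a trailing "/1" or "/2".
base_read_id <- function(original_id) {
  sub("/[12]$", "", sub("\\s.*$", "", original_id))
}

# Hypothetical helper: join R1 and R2 records on their shared base identifier.
# Records present in only one file are dropped by the inner join.
pair_fastq_records <- function(path_r1, path_r2) {
  r1 <- read_fastq_records(path_r1)
  r2 <- read_fastq_records(path_r2)
  r1$read_id <- base_read_id(r1$original_id)
  r2$read_id <- base_read_id(r2$original_id)
  # merge() appends the _r1/_r2 suffixes to columns present in both inputs
  merge(r1, r2, by = "read_id", suffixes = c("_r1", "_r2"))
}

# Usage with two tiny temporary files whose records are in different orders
r1_path <- tempfile(fileext = ".fastq")
r2_path <- tempfile(fileext = ".fastq")
writeLines(c("@read1/1", "ACGTACGT", "+", "IIIIIIII",
             "@read2/1", "TTTTGGGG", "+", "IIIIIIII"), r1_path)
writeLines(c("@read2/2", "CCCCAAAA", "+", "IIIIIIII",
             "@read1/2", "GGGGCCCC", "+", "IIIIIIII"), r2_path)
pairs <- pair_fastq_records(r1_path, r2_path)
pairs[, c("read_id", "sequence_r1", "sequence_r2")]
```

Joining on the identifier rather than on row position keeps the pairing correct even when the two files list records in different orders.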
parse_fastq_pairs(
  fastq_file_r1,
  fastq_file_r2,
  forward_primer,
  reverse_primer,
  missmatch_tolerance = 2
)

fastq_file_r1: Path to the R1 FASTQ file.
fastq_file_r2: Path to the R2 FASTQ file.
forward_primer: Forward primer sequence to validate against R1 reads.
reverse_primer: Reverse primer sequence to validate against R2 reads.
missmatch_tolerance: Maximum number of mismatches allowed when matching each primer (default: 2).
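The exact matching rule behind `missmatch_tolerance` is not spelled out here. One plausible reading, sketched below in base R, counts substitutions (Hamming distance) between the primer and the first bases of the read and accepts the read when that count is at most the tolerance. The helper `primer_matches` and its `max_mismatches` argument are hypothetical; the package may also handle insertions and deletions, degenerate IUPAC bases, or primers that do not start at the first base of the read.

```r
# Hypothetical helper: does each read start with `primer`, allowing up to
# `max_mismatches` substitutions (no insertions or deletions)?
# Comparison is case-insensitive; reads shorter than the primer never match.
primer_matches <- function(reads, primer, max_mismatches = 2) {
  primer_chars <- strsplit(toupper(primer), "")[[1]]
  n <- length(primer_chars)
  vapply(reads, function(read) {
    if (nchar(read) < n) return(FALSE)
    prefix_chars <- strsplit(toupper(substr(read, 1, n)), "")[[1]]
    sum(prefix_chars != primer_chars) <= max_mismatches
  }, logical(1), USE.NAMES = FALSE)
}

reads <- c("ACGTACGTTT",  # exact primer match
           "ACGAACTTTT",  # 2 substitutions in the primer region
           "TTTTACGTTT",  # 3 substitutions in the primer region
           "ACG")         # shorter than the primer
primer_matches(reads, "ACGTACGT", max_mismatches = 2)
# returns c(TRUE, TRUE, FALSE, FALSE)
```

Under this reading, raising the tolerance keeps more reads with sequencing errors in the primer region, at the cost of admitting more off-target amplicons.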
Returns a data frame with one row per retained read pair and the following columns:
read_id: Base identifier for the read pair
sequence_r1: Forward read sequence
quality_r1: Forward read quality scores
sequence_r2: Reverse read sequence (reverse complemented)
quality_r2: Reverse read quality scores
original_id_r1: Original R1 read identifier
original_id_r2: Original R2 read identifier
overlap_length: Length of overlap between R1 and R2 sequences
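The `sequence_r2` and `overlap_length` columns imply that R2 is reverse complemented so that its start lines up with the end of R1. A minimal exact-match sketch of that idea follows. The helpers `reverse_complement` and `overlap_length` are hypothetical, and the package's actual overlap method (mismatch handling, minimum overlap) is not documented here.

```r
# Hypothetical helper: reverse complement of a DNA string (A/C/G/T/N only).
reverse_complement <- function(seq) {
  complemented <- chartr("ACGTNacgtn", "TGCANtgcan", seq)
  paste(rev(strsplit(complemented, "")[[1]]), collapse = "")
}

# Hypothetical helper: length of the longest suffix of r1 that exactly equals
# a prefix of r2_rc (R2 already reverse complemented); 0 if none is at least
# min_overlap bases long. Longer overlaps are tried first.
overlap_length <- function(r1, r2_rc, min_overlap = 1L) {
  max_len <- min(nchar(r1), nchar(r2_rc))
  if (max_len < min_overlap) return(0L)
  for (k in max_len:min_overlap) {
    r1_suffix <- substr(r1, nchar(r1) - k + 1, nchar(r1))
    if (r1_suffix == substr(r2_rc, 1, k)) return(k)
  }
  0L
}

# Fragment ACGTTGCAAGGC read as 8 bases from each end
r1 <- "ACGTTGCA"               # fragment positions 1-8
r2 <- "GCCTTGCA"               # reverse complement of positions 5-12
r2_rc <- reverse_complement(r2)  # "TGCAAGGC"
overlap_length(r1, r2_rc)        # 4 (positions 5-8, "TGCA")
```

Exact matching with a small minimum overlap can report spurious short overlaps, which is why merging tools usually combine a larger minimum with a mismatch allowance. If you reverse-complement R2 yourself, reverse its quality string as well (without complementing it) so each score stays aligned with its base; the description above does not say whether `quality_r2` is reversed.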