Identifies valid amplicons using an expected length range and quality criteria. Because paired-end reads may overlap, the function aligns the two reads of each pair to account for that overlap.

detect_valid_amplicon(
  reads_df,
  expected_length = 400,
  length_tolerance = 50,
  min_quality_score = 30,
  min_overlap_length = 10,
  max_mismatch_rate = 0.1
)
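To make the length criterion concrete, here is a minimal sketch of how expected_length and length_tolerance could combine. It assumes the tolerance is symmetric and inclusive, which this page does not state; in_length_window is a hypothetical helper for illustration, not part of the package.

```r
# Hypothetical helper (not part of the package): TRUE where a length falls
# inside expected_length +/- length_tolerance.
# Assumption: the window is symmetric and includes both endpoints.
in_length_window <- function(lengths,
                             expected_length = 400,
                             length_tolerance = 50) {
  abs(lengths - expected_length) <= length_tolerance
}

# With the defaults, the accepted window under this assumption is 350 to 450.
in_length_window(c(349, 350, 400, 450, 451))
# FALSE TRUE TRUE TRUE FALSE
```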

Arguments

reads_df

A data frame of read pairs, as returned by parse_fastq_pairs

expected_length

Expected amplicon length (default: 400)

length_tolerance

Allowed deviation from the expected length (default: 50)

min_quality_score

Minimum quality score threshold (default: 30)

min_overlap_length

Minimum overlap length between the paired reads for the overlap to be considered (default: 10)

max_mismatch_rate

Maximum allowed mismatch rate in overlap region (default: 0.1)
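The overlap arguments can be illustrated with a small base R sketch. It is not the package's implementation: it swaps in a simple ungapped suffix/prefix comparison for whatever alignment the function actually performs, and rev_comp and find_overlap are hypothetical names introduced here. The sketch looks for the longest overlap of at least min_overlap_length bases whose mismatch rate does not exceed max_mismatch_rate.

```r
# Hypothetical helper: reverse-complement a DNA string using base R only.
rev_comp <- function(seq) {
  bases <- rev(strsplit(seq, "", fixed = TRUE)[[1]])
  paste(chartr("ACGTN", "TGCAN", bases), collapse = "")
}

# Hypothetical helper: longest ungapped overlap between the end of the forward
# read and the start of the reverse-complemented reverse read.
# Returns the overlap length, or NA if no overlap meets both thresholds.
find_overlap <- function(fwd_read, rev_read,
                         min_overlap_length = 10,
                         max_mismatch_rate = 0.1) {
  f <- strsplit(fwd_read, "", fixed = TRUE)[[1]]
  r <- strsplit(rev_comp(rev_read), "", fixed = TRUE)[[1]]
  max_len <- min(length(f), length(r))
  if (max_len < min_overlap_length) return(NA_integer_)
  # Try the longest candidate overlap first, then shrink.
  for (k in seq(max_len, min_overlap_length, by = -1)) {
    tail_f <- f[(length(f) - k + 1):length(f)]
    head_r <- r[seq_len(k)]
    if (mean(tail_f != head_r) <= max_mismatch_rate) return(as.integer(k))
  }
  NA_integer_
}

# Toy 30-base fragment sequenced from both ends with 20-base reads,
# so the two reads share the middle 10 bases.
fragment <- "ACGTTGCAAGGCTTACCGATATGGCCTTAA"
fwd_read <- substr(fragment, 1, 20)
rev_read <- rev_comp(substr(fragment, 11, 30))

overlap <- find_overlap(fwd_read, rev_read)
overlap                                      # 10
nchar(fwd_read) + nchar(rev_read) - overlap  # 30, the inferred amplicon length
```

Under this sketch, the inferred amplicon length is the sum of the two read lengths minus the overlap, which is the quantity a length check would then compare against the expected range.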

Value

A logical vector indicating which reads are classified as valid amplicons
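A logical vector like this is typically used to subset or summarise the input. The sketch below assumes the result has one element per row of reads_df, which this page does not state explicitly; the read_id column and the is_valid values are made up for illustration and stand in for a real call to detect_valid_amplicon.

```r
# Toy stand-in for the input; the read_id column is illustrative only.
reads_df <- data.frame(read_id = c("r1", "r2", "r3"), stringsAsFactors = FALSE)

# Stand-in for the value returned by detect_valid_amplicon(reads_df).
# Assumption: one logical element per row of reads_df.
is_valid <- c(TRUE, FALSE, TRUE)

# Keep only the rows classified as valid amplicons.
valid_reads <- reads_df[is_valid, , drop = FALSE]

sum(is_valid)   # number of rows classified as valid
mean(is_valid)  # fraction of rows classified as valid
```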