Runs the complete analysis workflow for amplicon sequencing data. It identifies primer dimers, valid amplicons, and off-target products, then generates a statistical summary and visualizations.

analyze_amplicons(
  fastq_file_r1,
  fastq_file_r2,
  forward_primer = "ACGTACGTACGT",
  reverse_primer = "TGCATGCATGCA",
  expected_length = 400,
  length_tolerance = 50,
  max_dimer_length = 100,
  min_quality_score = 30,
  output_dir = ".",
  write_to_disk = TRUE,
  separate_plots = FALSE
)

Arguments

fastq_file_r1

Path to the R1 FASTQ file

fastq_file_r2

Path to the R2 FASTQ file

forward_primer

Forward primer sequence (default: "ACGTACGTACGT")

reverse_primer

Reverse primer sequence (default: "TGCATGCATGCA")

expected_length

Expected amplicon length in base pairs (default: 400)

length_tolerance

Allowed deviation from expected_length in base pairs (default: 50)

max_dimer_length

Maximum length in base pairs for a product to be classified as a primer dimer (default: 100)

min_quality_score

Minimum quality score threshold (default: 30)

output_dir

Directory for output files (default: current directory)

write_to_disk

Logical indicating whether to write the resulting plots and summary file to the directory given by output_dir (default: TRUE)

separate_plots

Logical indicating whether to create separate plots for each sample (default: FALSE)
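
To illustrate how expected_length, length_tolerance, and max_dimer_length might interact, here is a minimal sketch of a length-based classification. It is not the package's implementation: the helper classify_by_length() and its category labels are hypothetical, and it assumes that products at or below max_dimer_length count as primer dimers, products within expected_length plus or minus length_tolerance count as valid amplicons, and everything else counts as off-target.

```r
# Hypothetical helper, NOT part of the package: classifies products by length only.
# Assumption: boundaries are inclusive and the tolerance applies in both directions.
classify_by_length <- function(lengths,
                               expected_length = 400,
                               length_tolerance = 50,
                               max_dimer_length = 100) {
  lower <- expected_length - length_tolerance
  upper <- expected_length + length_tolerance

  # Check the dimer rule first, then the valid window; the rest is off-target.
  category <- ifelse(
    lengths <= max_dimer_length, "primer_dimer",
    ifelse(lengths >= lower & lengths <= upper, "valid_amplicon", "off_target")
  )

  # Fixed factor levels keep empty categories visible in tables.
  factor(category, levels = c("primer_dimer", "valid_amplicon", "off_target"))
}

# Made-up lengths chosen to hit each boundary of the default thresholds.
lengths <- c(60, 100, 250, 350, 400, 450, 451)
table(classify_by_length(lengths))
```

With the defaults, the valid window in this sketch runs from 350 to 450; the real function may additionally use primer matches and quality scores when classifying.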

Value

A list containing:

summary_stats

Data frame with statistical summary

length_distribution_plot

ggplot object showing read length distribution(s)
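
A call and the handling of the returned list could look like the sketch below. The file names are placeholders rather than files shipped with the package, and the call is guarded so that the snippet does nothing when the function or the input files are unavailable.

```r
# Placeholder input paths (assumption); replace with real paired-end FASTQ files.
r1 <- "sample_R1.fastq"
r2 <- "sample_R2.fastq"

result <- NULL

# Run only if the function is loaded and both input files exist.
if (exists("analyze_amplicons") && file.exists(r1) && file.exists(r2)) {
  result <- analyze_amplicons(
    fastq_file_r1    = r1,
    fastq_file_r2    = r2,
    expected_length  = 400,
    length_tolerance = 50,
    write_to_disk    = FALSE  # keep results in memory, write nothing to output_dir
  )

  print(result$summary_stats)             # data frame with the statistical summary
  print(result$length_distribution_plot)  # printing a ggplot object draws it
}
```

Setting write_to_disk = FALSE is convenient for interactive exploration, since both elements remain available in the returned list.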