The amplicon package provides tools for analyzing pair
end amplicon sequencing data, with a focus on identifying and
characterizing PCR artifacts such as primer dimers and off-target
amplification products. This vignette will guide you through the main
features and typical workflow of the package.
The first step is to load your paired-end FASTQ files using the
parse_fastq_pairs function, as well as define the sequence
of the used primers (both in the same reading direction). Feel free to
use the provided (simulated) toy data! The primer sequences are defined
accordingly.
# Example with FASTQ files
fp <- "ACGTACGTACGT"
rp <- "TGCATGCATGCA"
forward_reads_path <- system.file("extdata", "r1_test.fastq", package = "amplicon")
reverse_reads_path <- system.file("extdata", "r2_test.fastq", package = "amplicon")
fastq_files <- parse_fastq_pairs(
fastq_file_r1 = forward_reads_path,
fastq_file_r2 = reverse_reads_path,
forward_primer = fp,
reverse_primer = rp
)After loading your data, you can identify potential primer dimers:
# Detect primer dimers with default parameters
primer_dimers <- detect_primer_dimer(
reads_df = fastq_files,
max_dimer_length = 100, # Maximum length for primer dimer classification
forward_primer = fp,
reverse_primer = rp
)
# View summary of primer dimers
summary(primer_dimers)Next, analyze your sequences for off-target amplification products:
# Detect off-target amplification
offtargets <- detect_offtargets(
reads_df = fastq_files,
expected_length = 400, # Expected amplicon length
length_tolerance = 50, # Allowed deviation from expected length
max_dimer_length = 100 # Maximum length for primer dimer classification
)
# Access the results
print(paste("Too short sequences:", sum(offtargets$too_short)))
print(paste("Too long sequences:", sum(offtargets$too_long)))Check if your amplicons meet the expected criteria:
# Validate amplicons
valid_amplicons <- detect_valid_amplicon(
reads_df = fastq_files,
min_quality = 30, # Minimum quality score
expected_length = 400, # Expected amplicon length
length_tolerance = 50 # Allowed deviation from expected length
)Perform a complete analysis of your amplicon sequencing data. You can choose between creating a combined plot with all samples (when sample number is limited) or separate plots for each sample:
# Run full analysis with combined plot (default)
results_combined <- analyze_amplicons(
forward_reads_path,
reverse_reads_path,
expected_length = 400,
length_tolerance = 50,
min_quality = 30,
max_dimer_length = 100,
separate_plots = FALSE # default behavior
)
# Run analysis with separate plots for each sample
results_separate <- analyze_amplicons(
forward_reads_path,
reverse_reads_path,
expected_length = 400,
length_tolerance = 50,
min_quality = 30,
max_dimer_length = 100,
separate_plots = TRUE # create individual plots
)
# Access different components of the analysis
print(results_combined$summary_stats)The package provides visualization options to help interpret your results. You can either create a combined plot with all samples using facets, or separate plots for each sample:
# For combined plot (default behavior)
plot(results_combined$length_distribution_plot)
# For separate plots
# results_separate$length_distribution_plot contains a list of plots, one per sample
# Plot a specific sample
plot(results_separate$length_distribution_plot[[1]]) # First sample
# Plot all samples in sequence
for (sample_plot in results_separate$length_distribution_plot) {
plot(sample_plot)
}When using separate_plots = TRUE, individual PDF files
are automatically saved in your output directory with names like
length_distribution_<sample_name>.pdf. With
separate_plots = FALSE (default), a single combined plot is
saved as length_distribution.pdf.
You can customize various parameters to suit your specific needs:
# Example with custom parameters
results <- analyze_amplicons(
fastq_file_r1, # Pair end read paths
fastq_file_r2,
forward_primer, # The primers used in your experiment
reverse_primer,
output_dir, # Custom output directory
expected_length = 400,
length_tolerance = 30, # Stricter length tolerance
min_quality_score = 35, # Higher quality threshold
max_dimer_length = 80, # Stricter primer dimer definition
separate_plots = TRUE # Generate individual plots per sample
)This vignette covered the basic workflow and main features of the
amplicon package. For more detailed information about
specific functions, please refer to the function documentation using
?function_name in R.
sessionInfo()
#> R version 4.5.0 (2025-04-11)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.2 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
#> [4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
#> [7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
#> [10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] amplicon_0.0.0.9000
#>
#> loaded via a namespace (and not attached):
#> [1] digest_0.6.37 desc_1.4.3 R6_2.6.1 fastmap_1.2.0
#> [5] xfun_0.52 cachem_1.1.0 knitr_1.50 htmltools_0.5.8.1
#> [9] rmarkdown_2.29 lifecycle_1.0.4 cli_3.6.5 pkgdown_2.1.2
#> [13] sass_0.4.10 textshaping_1.0.1 jquerylib_0.1.4 systemfonts_1.2.3
#> [17] compiler_4.5.0 tools_4.5.0 ragg_1.4.0 evaluate_1.0.3
#> [21] bslib_0.9.0 yaml_2.3.10 jsonlite_2.0.0 rlang_1.1.6
#> [25] fs_1.6.6