Supporting data for "Atria: An Ultra-fast and Accurate Trimmer for Adapter and Quality Trimming"
Dataset type: Genomic, Software, Transcriptomic, Bioinformatics
Data released on September 29, 2021
As Next Generation Sequencing takes a dominant role in terms of output capacity and sequence length, adapters attached to the reads and low-quality bases hinder downstream analysis both directly and indirectly, for example by producing false-positive single nucleotide polymorphisms (SNPs) and fragmented assemblies. A fast trimming algorithm that removes adapters precisely, especially in read tails with relatively low quality, is therefore in demand.
We present a trimming program named Atria. Atria matches adapters in paired reads and finds possible overlapping regions using a fast, carefully designed byte-based matching algorithm (O(n) time with O(1) space). Atria also supports single-end reads and implements multi-threading in both sequence processing and file compression.
Atria compares favorably with other cutting-edge trimmers in various trimming and runtime benchmarks on both simulated and real data. We also provide the ultra-fast and lightweight byte-based matching algorithm, which can be used in a broad range of short-sequence matching applications, such as primer search and seed scanning before alignment.
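To make the idea of byte-based matching concrete, the following is a minimal Python sketch. It is not Atria's implementation. It assumes a one-hot byte code per base, so that a bitwise AND of two aligned byte strings followed by a bit count gives the number of matching bases. The names `ONE_HOT`, `encode`, `matches_at`, and `find_adapter`, and the parameters `probe_len`, `min_overlap`, and `max_mismatches`, are hypothetical and chosen for illustration only.

```python
# Illustrative sketch only; the encoding and thresholds are assumptions,
# not taken from the Atria source code.

# One-hot byte per base: identical bases share a set bit, different bases do not.
# N is encoded as 0x00, so in this sketch it never counts as a match.
ONE_HOT = {"A": 0x01, "C": 0x02, "G": 0x04, "T": 0x08, "N": 0x00}


def encode(seq: str) -> bytes:
    """Encode a DNA string as one byte per base."""
    return bytes(ONE_HOT[base] for base in seq.upper())


def matches_at(read: bytes, probe: bytes, offset: int) -> int:
    """Count matching bases when the probe is aligned at read[offset:]."""
    window = read[offset:offset + len(probe)]
    a = int.from_bytes(window, "little")
    b = int.from_bytes(probe[:len(window)], "little")
    # Each matching position leaves exactly one bit set after the AND.
    return bin(a & b).count("1")


def find_adapter(read_seq, adapter_seq, probe_len=16, min_overlap=8, max_mismatches=2):
    """Return the leftmost offset where the adapter prefix matches, or None."""
    read = encode(read_seq)
    probe = encode(adapter_seq[:probe_len])
    for offset in range(len(read)):
        overlap = min(len(probe), len(read) - offset)
        if overlap < min_overlap:
            break  # the remaining tail is too short to call an adapter
        if overlap - matches_at(read, probe, offset) <= max_mismatches:
            return offset
    return None


def trim(read_seq, adapter_seq):
    """Cut the read at the detected adapter start, if any."""
    pos = find_adapter(read_seq, adapter_seq)
    return read_seq if pos is None else read_seq[:pos]
```

Because the probe has a fixed length, the scan does a bounded amount of work per read position, and it needs no tables beyond the encoded probe. Python integers are arbitrary precision, so the whole window is compared in one operation here; a fixed-width implementation would presumably process a machine word of bytes at a time. The same routine could be pointed at a primer instead of an adapter.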
The Atria executables, source code, and benchmark scripts are available at https://github.com/cihga39871/Atria under the MIT license.