Evaluation of analytical protocols of alignment mapping tools using high throughput next-generation genome sequencing data
DOI:
https://doi.org/10.12775/TRVS.2020.005
Keywords
Next-generation sequencing, NGS, Illumina, Aligners, Alignments, Mapping, Algorithm, Reads, Genome
Abstract
Background: Since the first next-generation genome sequencer (NGS) was developed in 2005, high-throughput next-generation genome sequencing (HT-NGS) techniques have developed rapidly, and the tools used in genetics and genomics have become much more convenient and cheaper. The result is a massive amount of data that requires detailed analysis, which is impossible without appropriate bioinformatics tools. One of the crucial steps in the analysis of NGS data is mapping reads to a reference sequence. Although Illumina sequencing-by-synthesis (SBS) technology has dominated in recent years, the choice of tools is hampered by the variety of input data and reference genomes. Moreover, the choice of tool strongly affects the result files and further analysis.
Methods: This paper examines the three most frequently used alignment mapping programs, each of which can work with data from many platforms: BWA, Bowtie2 and SMALT. The task of the tested aligners is to match short sequences produced by NGS to reference sequences. The most popular, BWA and Bowtie2, use the Burrows-Wheeler transform for this purpose, whereas SMALT maps sequences using hashing and dynamic programming. The paper aims to compare the quality and efficiency of these programs according to three criteria: i) the quality of the alignments for sequences of different lengths and from different platforms; ii) the rate of incorrectly aligned sequences; iii) the computational resources used.
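To illustrate the Burrows-Wheeler transform mentioned in the Methods, the following is a minimal Python sketch. It is a toy version built from sorted rotations, not the indexing code of BWA or Bowtie2, which rely on far more memory-efficient index structures derived from the transform. The function names `bwt` and `inverse_bwt` and the `$` end-of-text sentinel are assumptions made for this example.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Return the Burrows-Wheeler transform of text (naive rotation sort)."""
    if sentinel in text:
        raise ValueError("text must not contain the sentinel character")
    s = text + sentinel
    # Build every cyclic rotation, sort them, and keep the last column.
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last_column: str, sentinel: str = "$") -> str:
    """Recover the original text from its transform (naive, quadratic)."""
    n = len(last_column)
    table = [""] * n
    # Repeatedly prepend the last column and re-sort to rebuild the rotations.
    for _ in range(n):
        table = sorted(last_column[i] + table[i] for i in range(n))
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]


if __name__ == "__main__":
    transformed = bwt("GATTACA")
    print(transformed)
    print(inverse_bwt(transformed))
```

The transform is reversible and tends to group identical characters together, which is what makes compact, searchable indexes of a reference sequence possible.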
Results: A comparison of the mapping results for all the programs used shows that the least popular one, SMALT, performs best. It obtains the highest percentage of mapped reads for each platform while maintaining the lowest memory usage, which makes it the optimal choice.
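As a rough illustration of how a percentage of mapped reads can be computed from aligner output, the sketch below counts reads in SAM-format text using the FLAG field. It is an assumption-laden example rather than the procedure used in the paper: the helper name `mapped_percentage` is hypothetical, and the decision to skip secondary and supplementary records is one possible convention.

```python
from typing import Iterable

UNMAPPED = 0x4          # SAM FLAG bit: segment unmapped
SECONDARY = 0x100       # SAM FLAG bit: secondary alignment
SUPPLEMENTARY = 0x800   # SAM FLAG bit: supplementary alignment


def mapped_percentage(sam_lines: Iterable[str]) -> float:
    """Percentage of primary SAM records that are mapped (hypothetical helper)."""
    total = 0
    mapped = 0
    for line in sam_lines:
        # Skip blank lines and header lines, which start with '@'.
        if not line.strip() or line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])
        # Assumption: count each read once by ignoring non-primary records.
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue
        total += 1
        if not flag & UNMAPPED:
            mapped += 1
    return 100.0 * mapped / total if total else 0.0


if __name__ == "__main__":
    example = [
        "@HD\tVN:1.6\tSO:unsorted",
        "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
        "r2\t16\tchr1\t200\t60\t4M\t*\t0\t0\tACGT\tIIII",
        "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
        "r1\t256\tchr2\t300\t0\t4M\t*\t0\t0\tACGT\tIIII",
    ]
    print(round(mapped_percentage(example), 2))
```

Running the same calculation on the output of each aligner gives directly comparable figures, provided the same input reads and the same counting convention are used throughout.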
Conclusions: The results presented in this paper can be used to verify and rebuild NGS data analysis pipelines that have so far been based on other tools. We conclude that using the tools under appropriate conditions makes it possible to improve the quality of the analyses, speed them up and reduce their cost.