Benchmark of algorithms for multiple DNA sequence alignment across livestock species

Artur Bąk; Grzegorz Migdałek; Chandra Shekhar Pareek; Kacper Żukowski

doi:10.12775/TRVS.2020.009

Authors

Artur Bąk Department of Cattle Breeding and Genetics, National Research Institute of Animal Production, Balice https://orcid.org/0000-0001-6623-0587
Grzegorz Migdałek Institute of Biology, Pedagogical University of Cracow https://orcid.org/0000-0003-1458-2673
Chandra Shekhar Pareek Department of Fundamental and Preclinical Sciences, Faculty of Biology and Veterinary Sciences, Nicolaus Copernicus University in Toruń https://orcid.org/0000-0002-0329-787X
Kacper Żukowski Department of Fundamental and Preclinical Sciences, Faculty of Biology and Veterinary Sciences, Nicolaus Copernicus University in Toruń https://orcid.org/0000-0002-5690-3634


Keywords

multiple sequence alignment, ClustalO, ClustalW, Kalign, MAFFT, MUSCLE, ProbCons, T-Coffee, bioinformatics pipeline, livestock

Abstract

Background: Because of the growing amount of biological data, it is often necessary to select the optimal method for aligning DNA sequences across livestock species. One of the most important tasks in genomics is modelling homology between the DNA sequences under consideration. Multiple sequence alignment is a powerful tool in molecular and evolutionary biology, and several programs and algorithms are available for this purpose. The aim of this study was to evaluate the most commonly used DNA alignment algorithms and to select the optimal tool for short sequences.

Methods: A four-step bioinformatics pipeline was used to benchmark the algorithms for multiple DNA sequence alignment across livestock species: 1) selection of sequences with a low E-value from the reference genomes ARS1.2 (cattle), EquCab3.0 (horse) and vicPac2 (alpaca) using tBLASTn; 2) removal of gaps from these sequences; 3) alignment of the obtained sequences with each examined algorithm; 4) assessment of alignment quality by comparing the aligned sequences with the reference genome sequences using additional software. Computation time was recorded for the whole analysis. Seven programs, each based on a different alignment algorithm, were evaluated: ClustalO, ClustalW, Kalign, MAFFT, MUSCLE, ProbCons and T-Coffee.
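The pipeline steps above can be sketched in Python as follows. This is a minimal, hypothetical sketch and not the authors' code: the function names, the E-value cutoff `max_evalue=1e-10`, and the simplified column score used for step 4 are all assumptions, since the abstract names neither the threshold nor the quality-assessment software. It assumes tBLASTn tabular output in the default 12-column `-outfmt 6` layout (E-value in column 11) and that the test and reference alignments list the same sequences in the same order. Only MAFFT is shown for step 3, with its FFT-NS-1 options (`--retree 1 --maxiterate 0`); the other six programs would be wrapped and timed in the same way.

```python
import shutil
import subprocess
import time


def filter_hits(blast_tab_lines, max_evalue=1e-10):
    """Step 1: keep subject IDs whose tBLASTn E-value is at or below max_evalue.

    Assumes the default 12-column `-outfmt 6` layout, where column 11 is the E-value.
    The 1e-10 cutoff is a hypothetical choice, not the study's threshold.
    """
    keep = []
    for line in blast_tab_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        if float(fields[10]) <= max_evalue:
            keep.append(fields[1])
    return keep


def degap(seq):
    """Step 2: strip gap characters ('-' and '.') so each aligner starts from raw sequence."""
    return seq.replace("-", "").replace(".", "")


def run_mafft_fftns1(fasta_path):
    """Step 3, one aligner: MAFFT FFT-NS-1; returns (aligned FASTA text, wall-clock seconds)."""
    if shutil.which("mafft") is None:
        raise RuntimeError("mafft not found on PATH")
    start = time.perf_counter()
    result = subprocess.run(
        ["mafft", "--retree", "1", "--maxiterate", "0", fasta_path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout, time.perf_counter() - start


def residue_columns(alignment):
    """Describe each column as a tuple of per-row residue indices (None where the row has '-')."""
    counters = [0] * len(alignment)
    columns = []
    for col in range(len(alignment[0])):
        entry = []
        for i, row in enumerate(alignment):
            if row[col] == "-":
                entry.append(None)
            else:
                entry.append(counters[i])
                counters[i] += 1
        columns.append(tuple(entry))
    return columns


def column_score(test_aln, ref_aln):
    """Step 4 (simplified, hypothetical): share of reference columns reproduced exactly.

    Rows of both alignments must hold the same sequences in the same order;
    columns that are gaps in every row are ignored.
    """
    test_cols = set(residue_columns(test_aln))
    ref_cols = [c for c in residue_columns(ref_aln) if any(x is not None for x in c)]
    return sum(c in test_cols for c in ref_cols) / len(ref_cols)
```

For example, `column_score(["ACG-T", "ACAGT"], ["AC-GT", "ACAGT"])` returns 0.6, because three of the five reference columns are reproduced exactly. Comparing whole columns by residue indices, rather than by characters, means two alignments only agree when they place the same residues in the same column.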

Results: The fastest tools were progressive algorithms such as Kalign and MUSCLE-FAST, whereas iterative algorithms such as MAFFT and MUSCLE produced higher-quality alignments. The T-Coffee and ProbCons programs were computationally expensive: they produced alignments of medium quality in a relatively long time. The best alignment quality was obtained with the iterative variants of MAFFT, although these were relatively slow. Kalign was the fastest algorithm, aligning much faster than its competitors but achieving only average alignment quality. Among the analyzed algorithms, the progressive version of MAFFT, NS1, offered an intermediate ratio of speed to alignment quality.
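The results describe a trade-off between speed and alignment quality. One hedged way to make "best balance" explicit is a Pareto front: keep only the tools that no other tool beats on both time and quality. This is an illustrative technique swapped in for discussion, not the method used in the study; the `Run` type, the function names and all numbers below are hypothetical and are not results from the paper.

```python
from typing import NamedTuple


class Run(NamedTuple):
    name: str
    seconds: float  # wall-clock time of the alignment step
    quality: float  # e.g. a column score against the reference; higher is better


def dominates(a: Run, b: Run) -> bool:
    """True if a is at least as fast and as accurate as b, and strictly better in one."""
    return (a.seconds <= b.seconds and a.quality >= b.quality
            and (a.seconds < b.seconds or a.quality > b.quality))


def pareto_front(runs):
    """Runs that no other run dominates, sorted from fastest to slowest."""
    front = [r for r in runs if not any(dominates(o, r) for o in runs if o is not r)]
    return sorted(front, key=lambda r: r.seconds)


# Hypothetical numbers for illustration only; they are not results from the study.
runs = [
    Run("fast_progressive", 1.0, 0.70),
    Run("balanced", 4.0, 0.85),
    Run("slow_iterative", 30.0, 0.92),
    Run("slow_medium", 40.0, 0.80),
]
print([r.name for r in pareto_front(runs)])  # ['fast_progressive', 'balanced', 'slow_iterative']
```

Here `slow_medium` drops out because `balanced` is both faster and more accurate, which mirrors the situation the abstract describes for tools that are slow yet deliver only medium quality. A Pareto front avoids committing to a single weighting of speed against quality; choosing one point on the front still depends on the user's time budget.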

Conclusions: The results of this study can be used to guide the re-alignment of variant primers in new livestock genome releases.

Translational Research in Veterinary Science