Quality control assessment of the RNA-Seq data generated from liver and pituitary transcriptome of Hereford bulls using StrandNGS software

Chandra Shekhar Pareek; Mateusz Sachajko; Adrian Szczepański; Magdalena Buszewska-Forajta; Katarzyna Żarczyńska; Przemysław Sobiech; Edyta Juszczuk-Kubiak; Qaisar Shahzad; Yang Qing Lu; Magdalena Ogłuszka; Ewa Polawska; Mariusz Pierzchała

doi:10.12775/TRVS.2019.001

Authors

Chandra Shekhar Pareek Centre for Modern Interdisciplinary Technologies, Nicolaus Copernicus University, Toruń, Poland Centre of Veterinary Sciences, Inter-university Centre of Veterinary Medicine, Nicolaus Copernicus University, Toruń https://orcid.org/0000-0002-0329-787X
Mateusz Sachajko Centre for Modern Interdisciplinary Technologies, Nicolaus Copernicus University, Toruń, Poland Centre of Veterinary Sciences, Inter-university Centre of Veterinary Medicine, Nicolaus Copernicus University, Toruń https://orcid.org/0000-0003-1901-6101
Adrian Szczepański Voluntary Author https://orcid.org/0000-0002-5928-5499
Magdalena Buszewska-Forajta Voluntary Author https://orcid.org/0000-0003-1401-2558
Katarzyna Żarczyńska Department and Clinic of Internal Diseases, Faculty of Veterinary Medicine, University of Warmia and Mazury in Olsztyn https://orcid.org/0000-0003-4969-8887
Przemysław Sobiech Department and Clinic of Internal Diseases, Faculty of Veterinary Medicine, University of Warmia and Mazury in Olsztyn https://orcid.org/0000-0001-9595-5907
Edyta Juszczuk-Kubiak Voluntary Author https://orcid.org/0000-0001-5093-5320
Qaisar Shahzad State Key Laboratory for Conservation and Utilisation of Subtropical Agro-bioresources, Guangxi University, Nanning, Guangxi, 530004 https://orcid.org/0000-0003-1418-0340
Yang Qing Lu State Key Laboratory for Conservation and Utilisation of Subtropical Agro-bioresources, Guangxi University, Nanning, Guangxi, 530004 https://orcid.org/0000-0003-1641-6142
Magdalena Ogłuszka Institute of Genetics and Animal Breeding, Polish Academy of Sciences, Jastrzębiec https://orcid.org/0000-0001-6226-4114
Ewa Polawska Institute of Genetics and Animal Breeding, Polish Academy of Sciences, Jastrzębiec https://orcid.org/0000-0002-6097-9826
Mariusz Pierzchała Institute of Genetics and Animal Breeding, Polish Academy of Sciences, Jastrzębiec https://orcid.org/0000-0001-7001-1336

DOI:

https://doi.org/10.12775/TRVS.2019.001

Keywords

RNA-seq, NGS, quality control, Bos taurus, cattle, liver, pituitary gland, Hereford, transcriptome, StrandNGS

Abstract

Background: Quality control (QC) assessment is a critical step in high-throughput RNA-seq data analysis, because it shows how reliably the sequenced transcriptome can be assembled against a given reference genome. It gives quick insight into RNA-seq data quality, allowing early identification of good and bad samples, and it verifies alignment quality before essential downstream bioinformatics analyses such as the identification of novel genetic variants, differentially expressed genes (DEGs), gene networks and metabolic pathways.

Method: Total RNA was isolated from liver (n=15) and pituitary gland (n=15) tissues of young Hereford bulls. The pooled total RNA (n=30) was fragmented using the GeneRead rRNA depletion kit (Qiagen, Hilden, Germany), and cDNA libraries were prepared using the ScriptSeq™ v2 RNA-Seq library preparation kit (Epicentre, Illumina, USA). The combined liver and pituitary transcriptome was then sequenced using the MiSeq reagent kit v2 (Illumina, USA) to obtain high-quality paired-end RNA-seq reads of 251 base pairs (bp). In this paper, QC assessment of the raw RNA-seq data and post-alignment QC of the processed RNA-seq data of the combined liver and pituitary transcriptome (n=30) of Hereford bulls were performed using the Strand NGS software v1.3 (Agilent; http://www.strand-ngs.com/) data analysis package. The reads were aligned with Bowtie using default settings against both the bull and cow genome assemblies.

Results: Two runs on the MiSeq platform yielded a total of over 60 million paired-end RNA-seq reads, which were submitted to the NCBI SRA (https://www.ncbi.nlm.nih.gov/sra?linkname=bioproject_sra_all&from_uid=312148). The library complexity plot revealed 72.02% duplicate reads, corresponding to a low library complexity value of 0.28. Pre-alignment QC of the raw RNA-seq data showed that read lengths ranged from 35 to 251 bp, with more than 50% of all reads longer than 200 bp and 10% of reads shorter than 100 bp.
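The duplicate rate, library complexity and read-length fractions reported above can be illustrated with a minimal Python sketch. It is not the Strand NGS implementation. It assumes that library complexity is the number of distinct reads divided by the total number of reads (which is consistent with the reported 1 - 0.7202 ≈ 0.28) and that duplicates are reads with identical sequences, whereas a production tool may define duplicates by alignment position instead. The function names `fastq_sequences` and `qc_summary` and the toy records are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator


def fastq_sequences(lines: Iterable[str]) -> Iterator[str]:
    """Yield the sequence line (2nd of every 4 lines) of each FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip()


def qc_summary(sequences: Iterable[str]) -> Dict[str, float]:
    """Summarise duplication and read-length distribution.

    Assumption: complexity = distinct reads / total reads, with duplicates
    detected by exact sequence identity.
    """
    counts = Counter(sequences)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads supplied")
    distinct = len(counts)
    # Weight each distinct sequence by how many times it was observed.
    over_200 = sum(n for seq, n in counts.items() if len(seq) > 200)
    below_100 = sum(n for seq, n in counts.items() if len(seq) < 100)
    return {
        "total_reads": total,
        "library_complexity": distinct / total,
        "duplicate_fraction": 1 - distinct / total,
        "fraction_over_200bp": over_200 / total,
        "fraction_below_100bp": below_100 / total,
        "min_length": min(len(seq) for seq in counts),
        "max_length": max(len(seq) for seq in counts),
    }


# Toy FASTQ content (hypothetical reads, not data from the study).
toy_fastq = [
    "@r1", "A" * 251, "+", "I" * 251,
    "@r2", "A" * 251, "+", "I" * 251,  # exact duplicate of r1
    "@r3", "C" * 35, "+", "I" * 35,
    "@r4", "G" * 150, "+", "I" * 150,
]
summary = qc_summary(fastq_sequences(toy_fastq))
print(summary)
```

For real data the same functions could be fed from an open FASTQ file handle, although tens of millions of 251 bp reads would call for a more memory-efficient duplicate counter than an in-memory `Counter`.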

Conclusion: In this test of the RNA-seq methodology on the Illumina platform, two MiSeq sequencing runs each yielded 30 million high-quality sequencing reads. Our initial pre-alignment and post-alignment QC analysis of the RNA-seq data showed that the Hereford liver and pituitary gland transcriptome was successfully mapped to the reference Bos taurus genome; however, reads longer than 200 bp accounted for more than 50% of all reads recovered. The results therefore indicate that liver and pituitary transcriptome sequencing with the rRNA depletion method is less effective than the mRNA RNA-seq method.


Translational Research in Veterinary Science
