Quality control assessment of the RNA-Seq data generated from liver and pituitary transcriptome of Hereford bulls using StrandNGS software

Chandra Shekhar Pareek, Mateusz Sachajko, Adrian Szczepański, Magdalena Buszewska-Forajta, Katarzyna Żarczyńska, Przemysław Sobiech, Edyta Juszczuk-Kubiak, Qaisar Shahzad, Yang Qing Lu, Magdalena Ogłuszka, Ewa Polawska, Mariusz Pierzchała

DOI: http://dx.doi.org/10.12775/TRVS.2019.001


Background: Quality control (QC) assessment is the most critical step in high-throughput RNA-seq data analysis for understanding how well the sequenced transcriptome aligns to a given reference genome. It gives quick insight into RNA-seq data quality, allowing early identification of good and bad samples, and it verifies the alignment before further high-throughput bioinformatics analyses such as the identification of novel genetic variants, differentially expressed genes (DEGs), gene networks and metabolic pathways.

Method: Total RNA was isolated from liver (n=15) and pituitary gland (n=15) tissues of young Hereford bulls. The pooled total RNA (n=30) was fragmented using the GeneRead rRNA depletion kit (Qiagen, Hilden, Germany), and cDNA libraries were prepared using the ScriptSeq™ v2 RNA-Seq library preparation kit (Epicentre, Illumina, USA). The combined liver and pituitary transcriptome was then sequenced using the MiSeq reagent kit v2 (Illumina, USA) to obtain high-quality paired-end RNA-seq reads of 251 base pairs (bp). In this paper, QC assessment of the raw RNA-seq data and post-alignment QC of the processed RNA-seq data of the combined liver and pituitary transcriptome (n=30) of Hereford bulls were performed using the Strand NGS software v1.3 (Agilent; http://www.strand-ngs.com/) data analysis package. The reads were aligned with Bowtie, using default settings, against both the bull and cow genome assemblies.
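The pre-alignment QC step described above can be illustrated with a minimal Python sketch that summarizes read lengths from FASTQ records. It is not the Strand NGS implementation: the function names, the 200 bp and 100 bp cutoffs used as defaults, and the toy reads are assumptions made for illustration only.

```python
import io
from typing import Iterable, Iterator, TextIO


def read_lengths(handle: TextIO) -> Iterator[int]:
    """Yield the sequence length of each 4-line FASTQ record."""
    while True:
        header = handle.readline()
        if not header:
            break  # end of file
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line
        handle.readline()  # quality string (ignored in this sketch)
        yield len(sequence)


def length_summary(lengths: Iterable[int],
                   long_cutoff: int = 200,
                   short_cutoff: int = 100) -> dict:
    """Summarize a read-length distribution (cutoffs are assumed defaults)."""
    values = list(lengths)
    n = len(values)
    if n == 0:
        raise ValueError("no reads found")
    return {
        "n_reads": n,
        "min_len": min(values),
        "max_len": max(values),
        "frac_over_long": sum(v > long_cutoff for v in values) / n,
        "frac_under_short": sum(v < short_cutoff for v in values) / n,
    }


# Hypothetical toy data: four reads of 35, 251, 210 and 150 bases.
toy_fastq = "".join(
    f"@read{i}\n{'A' * n}\n+\n{'I' * n}\n"
    for i, n in enumerate([35, 251, 210, 150])
)
summary = length_summary(read_lengths(io.StringIO(toy_fastq)))
print(summary)
```

For real data, the in-memory handle would be replaced by an opened FASTQ file; the parser assumes strict four-line records without wrapped sequences.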

Results: Two runs on the MiSeq platform yielded a total of over 60 million paired-end RNA-seq reads, which were submitted to the NCBI SRA (https://www.ncbi.nlm.nih.gov/sra?linkname=bioproject_sra_all&from_uid=312148). The library complexity plot revealed 72.02% duplicate reads and a low library complexity value of 0.28. Pre-alignment QC of the raw RNA-seq data showed read lengths ranging from 35 to 251 bp, with more than 50% of all reads longer than 200 bp and 10% of reads shorter than 100 bp.
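The reported duplicate rate and library complexity value are consistent with complexity being the fraction of non-duplicate reads (1 − 0.7202 = 0.2798 ≈ 0.28). The sketch below works through that arithmetic under this assumption; the exact definition used by Strand NGS is not stated here, and duplicates are identified by identical sequence purely for simplicity. The function name and the toy reads are hypothetical.

```python
from typing import Iterable, Tuple


def library_complexity(sequences: Iterable[str]) -> Tuple[float, float]:
    """Return (complexity, duplicate_fraction) for a collection of reads.

    Assumption: complexity = distinct reads / total reads, and every copy
    of a sequence beyond the first counts as a duplicate.
    """
    reads = list(sequences)
    total = len(reads)
    if total == 0:
        raise ValueError("no reads supplied")
    distinct = len(set(reads))
    return distinct / total, (total - distinct) / total


# Hypothetical toy library: 25 reads, of which 7 are distinct.
toy_reads = ["ACGT"] * 19 + ["TTGA", "GGCC", "CATG", "AGTC", "TGCA", "CCGG"]
complexity, dup_fraction = library_complexity(toy_reads)
print(f"complexity={complexity:.2f}, duplicates={dup_fraction:.2%}")

# Cross-check against the reported figures: 72.02% duplicates.
reported_complexity = round(1 - 0.7202, 2)
print(reported_complexity)
```

In practice, duplicates in aligned paired-end data are usually identified from mapping coordinates rather than raw sequence identity, so the numbers from a sequence-based count may differ.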

Conclusion: In testing the RNA-seq methodology on the Illumina platform, two MiSeq sequencing runs yielded 30 million high-quality sequencing reads per run. Our initial pre-alignment and post-alignment QC analysis showed that the Hereford liver and pituitary gland transcriptome was successfully mapped to the reference Bos taurus genome; however, reads longer than 200 bp accounted for more than 50% of all reads recovered. We therefore conclude that liver and pituitary transcriptome sequencing with the rRNA depletion method is less effective than the mRNA RNA-seq method.


Keywords: RNA-seq; NGS; quality control; Bos taurus; cattle; liver; pituitary gland; Hereford; transcriptome; strandNGS





