Modern transcriptome data processing algorithms: a review of methods and results of approbation

https://doi.org/10.21122/2309-4923-2021-2-54-62

Abstract

The analysis of bioinformatics data is a pressing problem in modern computational biology and applied mathematics. As biotechnology and the tools for obtaining and processing such data have advanced, open questions have emerged about how to develop and apply new algorithms and software.

The authors propose practical algorithms and methods for processing transcriptomic data that support efficient annotation, visualization, and interpretation of bioinformatics data.

About the Authors

M. V. Sprindzuk
United Institute of Informatics Problems of the NAS of Belarus
Belarus
Minsk


L. P. Titov
Republican Research and Practical Center for Epidemiology and Microbiology
Belarus
Minsk


A. P. Konchits
Forest Institute of the National Academy of Sciences of Belarus
Belarus
Gomel


L. V. Mozharovskaya
Forest Institute of the National Academy of Sciences of Belarus
Belarus
Gomel


For citations:


Sprindzuk M.V., Titov L.P., Konchits A.P., Mozharovskaya L.V. Modern transcriptome data processing algorithms: a review of methods and results of approbation. «System analysis and applied information science». 2021;(2):54-62. (In Russ.) https://doi.org/10.21122/2309-4923-2021-2-54-62


This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2309-4923 (Print)
ISSN 2414-0481 (Online)