Peer-Reviewed

Discovering Escherichia coli K-12 Promoter Features Using Convolutional Neural Network

Received: 24 May 2020     Accepted: 8 June 2020     Published: 20 June 2020
Abstract

The mechanism of prokaryotic gene expression remains incompletely understood. Promoters are regions of the genome located upstream of genes that regulate gene expression. Although a growing number of E. coli K-12 promoter sequences have been obtained experimentally, and some regions such as the -10 region and the -30 region have been described, the features of promoter sequences are far from explicitly characterized. Here, we address this challenge using an approach based on a deep convolutional neural network (CNN). We collected six classes of E. coli K-12 promoter sequences from the RegulonDB database, each annotated with strong evidence and belonging to only one promoter class. We then applied the CNN model to recognize the six classes of promoters; the model achieved an accuracy above 97% for all six classes. Next, we extracted the weight matrix of the last convolutional layer of the CNN with the Grad-CAM algorithm and converted it to an information content matrix. Finally, we visualized the information content matrix as promoter logos using the Logomaker tool and identified the promoter features in the six classes of promoters. Our approach not only recovers the previously described promoter feature regions but can also discover promoter features with better sensitivity and accuracy. We provide a novel computational approach to discovering features in biological sequences.
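
The abstract does not specify how the weight matrix is converted to an information content matrix. The sketch below shows one common way this step could be done, under assumptions that are not taken from the article: each position's four weights (columns A, C, G, T) are clipped to be non-negative and normalized to probabilities, and the standard Shannon information content for DNA (at most 2 bits per position) is used to scale each base. The function names, the pseudocount, and the example weights are hypothetical.

```python
import math

BASES = "ACGT"  # assumed column order of the weight matrix


def to_probabilities(weights, pseudocount=1e-6):
    """Clip each position's weights at zero and normalize them to sum to 1.

    A positive pseudocount avoids division by zero when a row is all zeros.
    """
    probs = []
    for row in weights:
        clipped = [max(w, 0.0) + pseudocount for w in row]
        total = sum(clipped)
        probs.append([w / total for w in clipped])
    return probs


def information_content(probs):
    """Return per-position letter heights in bits: p * (2 - entropy)."""
    matrix = []
    for row in probs:
        # Shannon entropy of the position; terms with p == 0 contribute nothing.
        entropy = -sum(p * math.log2(p) for p in row if p > 0)
        ic = 2.0 - entropy  # log2(4) = 2 bits is the maximum for DNA
        matrix.append([p * ic for p in row])
    return matrix


# Hypothetical 3-position weight matrix (columns: A, C, G, T).
example_weights = [
    [1.0, 0.0, 0.0, 0.0],      # fully conserved A
    [0.25, 0.25, 0.25, 0.25],  # uninformative position
    [0.5, 0.0, 0.0, 0.5],      # A or T with equal weight
]
ic_matrix = information_content(to_probabilities(example_weights, pseudocount=0.0))
for position, row in enumerate(ic_matrix):
    print(position, dict(zip(BASES, row)))
```

A matrix of this shape, with one row per position and one column per base, is the kind of input a sequence logo tool such as Logomaker draws, with letter heights proportional to the values in each row.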

Published in Computational Biology and Bioinformatics (Volume 8, Issue 1)
DOI 10.11648/j.cbb.20200801.13
Page(s) 15-19
Creative Commons

This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.

Copyright

Copyright © The Author(s), 2020. Published by Science Publishing Group

Keywords

Convolutional Neural Network (CNN), Promoter, Biological Sequence, Features

Cite This Article
  • APA Style

    Mengmeng Zhang, Lu Wang, Ping Wan. (2020). Discovering Escherichia coli K-12 Promoter Features Using Convolutional Neural Network. Computational Biology and Bioinformatics, 8(1), 15-19. https://doi.org/10.11648/j.cbb.20200801.13

    ACS Style

    Mengmeng Zhang; Lu Wang; Ping Wan. Discovering Escherichia coli K-12 Promoter Features Using Convolutional Neural Network. Comput. Biol. Bioinform. 2020, 8(1), 15-19. doi: 10.11648/j.cbb.20200801.13

    AMA Style

    Mengmeng Zhang, Lu Wang, Ping Wan. Discovering Escherichia coli K-12 Promoter Features Using Convolutional Neural Network. Comput Biol Bioinform. 2020;8(1):15-19. doi: 10.11648/j.cbb.20200801.13

  • @article{10.11648/j.cbb.20200801.13,
      author = {Mengmeng Zhang and Lu Wang and Ping Wan},
      title = {Discovering Escherichia coli K-12 Promoter Features Using Convolutional Neural Network},
      journal = {Computational Biology and Bioinformatics},
      volume = {8},
      number = {1},
      pages = {15-19},
      doi = {10.11648/j.cbb.20200801.13},
      url = {https://doi.org/10.11648/j.cbb.20200801.13},
      eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.cbb.20200801.13},
     year = {2020}
    }
    

  • TY  - JOUR
    T1  - Discovering Escherichia coli K-12 Promoter Features Using Convolutional Neural Network
    AU  - Mengmeng Zhang
    AU  - Lu Wang
    AU  - Ping Wan
    Y1  - 2020/06/20
    PY  - 2020
    N1  - https://doi.org/10.11648/j.cbb.20200801.13
    DO  - 10.11648/j.cbb.20200801.13
    T2  - Computational Biology and Bioinformatics
    JF  - Computational Biology and Bioinformatics
    JO  - Computational Biology and Bioinformatics
    SP  - 15
    EP  - 19
    PB  - Science Publishing Group
    SN  - 2330-8281
    UR  - https://doi.org/10.11648/j.cbb.20200801.13
    VL  - 8
    IS  - 1
    ER  - 

Author Information
  • College of Life Sciences, Capital Normal University, Beijing, China