Progress of bio-sequence modifications in Machine Learning

01.

Post-translational modification in protein

Fifteen protein post-translational modifications, including phosphorylation, acetylation, methylation, glycosylation, pupylation, palmitoylation, S-sulfenylation, and S-nitrosylation.

02.

Post-transcriptional modification in RNA

RNA post-transcriptional modifications, mainly N6-methyladenosine, N1-methyladenosine, 5-methylcytosine, pseudouridine, adenosine-to-inosine (A-to-I), 2'-O-methylation, and 5-hydroxymethylcytosine.

03.

Post-replication modification in DNA

DNA post-replication modifications, mainly N4-methylcytosine, N6-methyladenine, and 5-methylcytosine.

References:

  1. Ruheng Wang, Yi Jiang, Junru Jin, Chenglin Yin, Haoqing Yu, Fengsheng Wang, Jiuxin Feng, Ran Su, Kenta Nakai, Quan Zou*, Leyi Wei*. DeepBIO: An automated and interpretable deep-learning platform for high-throughput biological sequence prediction, functional annotation, and visualization analysis. Nucleic Acids Research. 2023, 51(7): 3017-3029.
  2. Zhibin Lv, Mingxin Li, Yansu Wang, Quan Zou*. Editorial: Machine Learning For Biological Sequence Analysis. Frontiers in Genetics. 2023, 14: 1150688.
  3. Chunyan Ao, Shihu Jiao, Yansu Wang, Liang Yu*, Quan Zou*. Biological sequence classification: a review on data and general methods. Research. 2022, 2022: 0011.
  4. Chunyan Ao, Liang Yu*, Quan Zou*. Prediction of bio-sequence modifications and the associations with diseases. Briefings in Functional Genomics. 2021, 20(1): 1-18.

DNA modification references:

  1. Yijie Ding, Prayag Tiwari*, Fei Guo*, Quan Zou*. Multi-Correntropy Fusion based Fuzzy System for Predicting DNA N4-methylcytosine Sites. Information Fusion. 2023, 100: 101911.
  2. Yijie Ding, Wenying He, Jijun Tang, Quan Zou*, Fei Guo*. Laplacian Regularized Sparse Representation based Classifier for Identifying DNA N4-methylcytosine Sites via L2,1/2-matrix Norm. IEEE/ACM Transactions on Computational Biology and Bioinformatics. 2023, 20(1): 500-511.
  3. Yijie Ding, Prayag Tiwari*, Quan Zou*, Fei Guo*, Hari Mohan Pandey*. C-loss based Higher-order Fuzzy Inference Systems for Identifying DNA N4-methylcytosine Sites. IEEE Transactions on Fuzzy Systems. 2022, 30(11): 4754-4765.
  4. Junru Jin, Yingying Yu, Ruheng Wang, Xin Zeng, Chao Pang, Yi Jiang, Zhongshen Li, Yutong Dai, Ran Su, Quan Zou, Kenta Nakai*, Leyi Wei*. iDNA-ABF: multi-scale deep biological language learning model for the interpretable prediction of DNA methylations. Genome Biology. 2022, 23: 219.
  5. Haoyu Zhang, Quan Zou, Ying Ju, Chenggang Song, Dong Chen*. Distance-based Support Vector Machine to Predict DNA N6-methyladenine Modification. Current Bioinformatics. 2022, 17(5): 473-482.
  6. Mobeen Ur Rehman, Hilal Tayara, Quan Zou*, Kil To Chong*. i6mA-Caps: A CapsuleNet-based framework for identifying DNA N6-methyladenine sites. Bioinformatics. 2022, 38(16): 3885-3891.
  7. Jhabindra Khanal, Hilal Tayara, Quan Zou*, Kil To Chong*. Identifying DNA N4-methylcytosine Sites in the Rosaceae Genome with a Deep Learning Model relying on Distributed Feature Representation. Computational and Structural Biotechnology Journal. 2021, 19: 1612-1619.
  8. Zhibin Lv, Hui Ding, Lei Wang, Quan Zou*. A Convolutional Neural Network Using Dinucleotide One-hot Encoder for identifying DNA N6-Methyladenine Sites in the Rice Genome. Neurocomputing. 2021, 422: 214-221.
  9. Qianfei Huang, Jun Zhang, Leyi Wei, Fei Guo*, Quan Zou*. 6mA-RicePred: A method for identifying DNA N6-methyladenine sites in the rice genome based on feature fusion. Frontiers in Plant Science. 2020, 11: 4.
  10. Leyi Wei, Shasha Luan, Luis Eijy Nagai, Ran Su*, Quan Zou*. Exploring sequence-based features for the improved prediction of DNA N4-methylcytosine sites in multiple species. Bioinformatics. 2019, 35(8): 1326-1333.
  11. Wenying He, Cangzhi Jia*, Quan Zou*. 4mCPred: Machine Learning Methods for DNA N4-methylcytosine sites Prediction. Bioinformatics. 2019, 35(4): 593-601.

RNA modification references:

  1. Zeeshan Abbas, Mobeen ur Rehman, Hilal Tayara, Quan Zou*, Kil To Chong*. XGBoost Framework with Feature Selection for the Prediction of RNA N5-methylcytosine sites. Molecular Therapy. 2023, 31(8): 2543-2551.
  2. Chunyan Ao, Xiucai Ye, Tetsuya Sakurai, Quan Zou*, Liang Yu*. m5U-SVM: Identification of RNA 5-methyluridine modification sites based on multi-view features of physicochemical features and distributed representation. BMC Biology. 2023, 21: 93.
  3. Chunyan Ao, Quan Zou*, Liang Yu*. NmRF: Identification of multispecies RNA 2'-O-methylation modification sites from RNA sequences. Briefings in Bioinformatics. 2022, 23(1): bbab480.
  4. Chao Wang, Ying Ju, Quan Zou*, Chen Lin*. DeepAc4C: A convolutional neural network model with hybrid features composed of physico-chemical patterns and distributed representation information for identification of N4 acetylcytidine in mRNA. Bioinformatics. 2022, 38(1): 52-57.
  5. Chunyan Ao, Quan Zou, Liang Yu*. RFhy-m2G: Identification of RNA N2-methylguanosine Modification Sites Based on Random Forest and Hybrid Features. Methods. 2022, 203: 32-39.
  6. Juntao Chen, Quan Zou*, Jing Li. DeepM6ASeq-EL: Prediction of Human N6-Methyladenosine (m6A) Sites with LSTM and Ensemble Learning. Frontiers of Computer Science. 2022, 16(2): 162302.
  7. Di Chai, Cangzhi Jia*, Jia Zheng, Quan Zou, Fuyi Li*. Staem5: a novel computational approach for accurate prediction of m5C site. Molecular Therapy: Nucleic Acids. 2021, 26: 1027-1034.
  8. Zeeshan Abbas, Hilal Tayara, Quan Zou*, Kil To Chong*. TS-m6A-DL: Tissue-specific identification of N6-methyladenosine sites using a universal deep learning model. Computational and Structural Biotechnology Journal. 2021, 19: 4619-4625.
  9. Jing Li, Shida He, Fei Guo*, Quan Zou*. HSM6AP: A high-precision predictor for the Homo sapiens N6-methyladenosine (m6A) based on multiple weights and feature stitching. RNA Biology. 2021, 18(11): 1882-1892.
  10. Zhibin Lv, Jun Zhang*, Hui Ding, Quan Zou*. RF-PseU: A Random Forest Predictor for RNA Pseudouridine Sites. Frontiers in Bioengineering and Biotechnology. 2020, 8: 134.
  11. Leyi Wei, Ran Su, Bing Wang, Xiuting Li, Quan Zou*, Xing Gao*. Integration of deep feature representations and handcrafted features to improve the prediction of N6-methyladenosine sites. Neurocomputing. 2019, 324: 3-9.
  12. Quan Zou*, Pengwei Xing, Leyi Wei, Bin Liu*. Gene2vec: Gene Subsequence Embedding for Prediction of Mammalian N6-Methyladenosine Sites from mRNA. RNA. 2019, 25(2): 205-218.
  13. Wei Chen, Pengwei Xing, Quan Zou*. Detecting N6-methyladenosine sites from RNA transcriptomes using ensemble Support Vector Machines. Scientific Reports. 2017, 7: 40242.

Protein modification references:

  1. Shihu Jiao, Xiucai Ye*, Chunyan Ao, Tetsuya Sakurai, Quan Zou, Lei Xu*. Adaptive learning embedding features to improve the predictive performance of SARS-CoV-2 phosphorylation sites. Bioinformatics. 2023, 39(11): btad627.
  2. Lijun Dou, Zilong Zhang, Lei Xu*, Quan Zou*. iKcr_CNN: a novel computational tool for imbalance classification of human nonhistone crotonylation sites based on convolutional neural networks with focal loss. Computational and Structural Biotechnology Journal. 2022, 20: 3268-3279.
  3. Jhabindra Khanal, Hilal Tayara, Quan Zou*, Kil To Chong*. DeepCap-Kcr: accurate identification and investigation of protein lysine crotonylation sites based on capsule network. Briefings in Bioinformatics. 2022, 23(1): bbab492.
  4. Lijun Dou, Fenglong Yang, Lei Xu*, Quan Zou*. A Comprehensive Review of the Imbalance Classification of Protein Post-Translational Modifications. Briefings in Bioinformatics. 2021, 22(5): bbab089.
  5. Yujia Xiang, Quan Zou*, Lilin Zhao*. VPTMdb: a viral post-translational modification database. Briefings in Bioinformatics. 2021, 22(4): bbaa251.
  6. Yun Zuo, Jianyuan Lin, Xiangxiang Zeng*, Quan Zou, Xiangrong Liu*. CarSite-II: an integrated classification algorithm for identifying carbonylated sites based on K-means similarity based undersampling and synthetic minority oversampling techniques. BMC Bioinformatics. 2021, 22: 216.
  7. Chunyan Ao, Shunshan Jin, Yuan Lin*, Quan Zou*. Review of Progress in Predicting Protein Methylation Sites. Current Organic Chemistry. 2019, 23(15): 1663-1670.
  8. Quan Zou, Chi-Wei Chen, Hao-Chen Chang, Yen-Wei Chu. Identifying Cleavage Sites of Gelatinases A and B by Integrating Feature Computing Models. Journal of Universal Computer Science. 2018, 24(6): 711-724.
  9. Leyi Wei, Pengwei Xing, Gaotao Shi, Zhiliang Ji*, Quan Zou*. Fast prediction of protein methylation sites using a sequence-based feature selection technique. IEEE/ACM Transactions on Computational Biology and Bioinformatics. 2019, 16(4): 1264-1273.
  10. Wenying He, Leyi Wei, Quan Zou*. Research Progress in Protein Post-Translational Modification Site Prediction. Briefings in Functional Genomics. 2019, 18(4): 220-229.
  11. Cangzhi Jia*, Yun Zuo, Quan Zou*. O-GlcNAcPRED-II: an integrated classification algorithm for identifying O-GlcNAcylation sites based on fuzzy undersampling and a K-means PCA oversampling technique. Bioinformatics. 2018, 34(12): 2029-2036.
  12. Cangzhi Jia, Wenying He, Quan Zou*. DephosSitePred: a high accuracy predictor for protein dephosphorylation sites. Combinatorial Chemistry & High Throughput Screening. 2017, 20(2): 153-157.
  13. Leyi Wei, Pengwei Xing, Jijun Tang, Quan Zou*. PhosPred-RF: a novel sequence-based predictor for phosphorylation sites using sequential information only. IEEE Transactions on NanoBioscience. 2017, 16(4): 240-247.
E-mail address: acy196707@163.com.