DNA methylation is an important epigenetic modification that is closely related to the development and progression of cancer. Because DNA methylation is a cancer-related biomarker, the effective identification of methylation sites has important implications for understanding the pathogenesis of cancer, for cancer diagnosis, and for drug development.


HSM6AP can accurately identify m6A sites. The supported file formats are FASTA and TXT.
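For readers preparing input, the following is a minimal sketch of how a FASTA file can be read into sequences before submission. It is not HSM6AP's own parser; the function name `read_fasta` and the example records are hypothetical, and the layout expected for TXT input is not specified here, so only FASTA is shown.

```python
from typing import Dict, Iterable


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {header: sequence} mapping."""
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A '>' line starts a new record; the rest of the line is its header.
            header = line[1:].strip()
            records[header] = ""
        elif header is not None:
            # Sequence lines may be wrapped; join them and normalise to upper case.
            records[header] += line.upper()
    return records


# Hypothetical input, given as a list of lines instead of an open file.
example = [">seq1 hypothetical record", "GGACU", "aaccg", "", ">seq2", "UUGGACA"]
parsed = read_fasta(example)
```

The same function accepts an open file handle, for example `read_fasta(open("input.fasta"))`, since a file object iterates over its lines.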



The original training sequences and the original gene data are extracted from the original gene model. The original gene data were assigned different weights (2, 3, 4, and 5), while the negative samples were given a weight of 1. The negative samples were clustered with a Gaussian Mixture Model (GMM) and then sampled at random; the result constitutes the true training sequence and gene datasets. Features are extracted from the true training sequences with four sequence-based methods and four physicochemical methods, and from the training gene data with 14 gene-based feature extraction methods. The features generated by these 22 feature extraction methods are selected using Max-Relevance-Max-Distance (MRMD). The selected features and NCP are spliced to generate the final feature vectors. XGBoost is used to train the model with 5-fold cross-validation.
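The last two steps of the workflow (splicing with NCP and 5-fold cross-validation) can be sketched roughly as follows. This is an illustration under stated assumptions, not the HSM6AP implementation: it assumes NCP means the nucleotide chemical property encoding commonly used in m6A predictors, with the three bits standing for ring structure, functional group, and hydrogen bonding. The functions `ncp_encode`, `splice`, and `k_fold_indices` are hypothetical names, and the GMM clustering, MRMD selection, and XGBoost training are not shown.

```python
from typing import Dict, List, Tuple

# Assumed NCP table: (ring structure, functional group, hydrogen bonding).
NCP: Dict[str, Tuple[int, int, int]] = {
    "A": (1, 1, 1),
    "C": (0, 1, 0),
    "G": (1, 0, 0),
    "U": (0, 0, 1),
}


def ncp_encode(seq: str) -> List[int]:
    """Encode a sequence as a flat NCP vector with three values per base."""
    vec: List[int] = []
    for base in seq.upper().replace("T", "U"):
        # Bases outside A/C/G/U (e.g. N) become all zeros in this sketch.
        vec.extend(NCP.get(base, (0, 0, 0)))
    return vec


def splice(selected: List[float], seq: str) -> List[float]:
    """Concatenate already-selected features with the NCP encoding."""
    return list(selected) + [float(v) for v in ncp_encode(seq)]


def k_fold_indices(n: int, k: int = 5) -> List[Tuple[List[int], List[int]]]:
    """Return (train, validation) index lists for k interleaved folds."""
    folds = [list(range(i, n, k)) for i in range(k)]
    splits = []
    for i in range(k):
        # Every fold except fold i is used for training.
        train = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        splits.append((train, folds[i]))
    return splits


vector = splice([0.5], "AU")      # one selected feature plus 2 x 3 NCP values
splits = k_fold_indices(10, k=5)  # five (train, validation) index pairs
```

The folds here are interleaved without shuffling to keep the sketch deterministic; a real run would normally shuffle the samples first and pass each training split to the classifier.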


  1. Kunqi Chen, Zhen Wei, Qing Zhang, Xiangyu Wu, Rong Rong, Zhiliang Lu, Jionglong Su, João Pedro de Magalhães, Daniel J Rigden, Jia Meng, WHISTLE: a high-accuracy map of the human N6-methyladenosine (m6A) epitranscriptome predicted using a machine learning approach, Nucleic Acids Research.

all rights reserved@ 2019 | Quan Zou, Ph.D. & Professor
Last modified date:26/12/2019