EnDeep4mC: A dual-adaptive feature encoding framework in deep ensembles for predicting DNA N4-methylcytosine sites
A deep learning ensemble predictor for DNA 4mC modification sites.
Combines CNN, BLSTM and Transformer models with advanced feature engineering.

Introduction

DNA N4-methylcytosine (4mC) is a key epigenetic modification that regulates DNA repair and replication. Because experimental detection has practical limitations, efficient computational methods are needed to identify 4mC sites. Machine learning predictors have been proposed, but their performance could be improved through systematic optimization of feature encoding schemes.

Here, we propose EnDeep4mC, a dual-adaptive framework that integrates species-specific modeling with ensemble deep learning architectures to systematically optimize feature encoding schemes. Evaluated across six species, EnDeep4mC achieves a mean accuracy (ACC) of 95.28%, significantly outperforming current state-of-the-art predictors.

Cross-species validation confirms its robust transferability, with an AUC of 0.80 when transferring from animal to microbe. Evolutionary analysis further uncovers the functional differentiation of 4mC sequences in biological evolution: prokaryotic 4mC relies on stable patterns, whereas eukaryotes achieve regulatory plasticity through dynamic sequence combinations. This provides experimental evidence for species-adaptive encoding strategies.

Availability: The source code, pretrained models, and datasets are publicly available at https://github.com/RaySYZhang/EnDeep4mC

Method Overview

1. EnDeep4mC Workflow Diagram


Complete workflow of the EnDeep4mC prediction framework showing the three-tier ensemble architecture and dual-adaptive encoding system.
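As a rough illustration of how per-model outputs from an ensemble such as CNN, BLSTM and Transformer could be combined, the sketch below uses plain weighted soft voting over predicted probabilities. This README does not specify how the three tiers are actually combined, so soft voting is swapped in here purely for illustration; the function names, model keys, and probability values are all hypothetical.

```python
def soft_vote(prob_by_model, weights=None):
    """Weighted average of per-model positive-class probabilities.

    prob_by_model maps a model name to a list of probabilities,
    one per input sequence. All lists must have the same length.
    """
    models = sorted(prob_by_model)
    if weights is None:
        # Assumption: equal weights when none are given.
        weights = {m: 1.0 for m in models}
    total_weight = sum(weights[m] for m in models)
    n_samples = len(prob_by_model[models[0]])
    return [
        sum(weights[m] * prob_by_model[m][i] for m in models) / total_weight
        for i in range(n_samples)
    ]


def predict_labels(probs, threshold=0.5):
    """Turn ensemble probabilities into 0/1 labels (1 = predicted 4mC site)."""
    return [1 if p >= threshold else 0 for p in probs]


# Hypothetical base-model outputs for two sequences.
base_probs = {
    "cnn": [0.9, 0.2],
    "blstm": [0.6, 0.4],
    "transformer": [0.9, 0.3],
}
ensemble_probs = soft_vote(base_probs)
labels = predict_labels(ensemble_probs)
```

A stacked meta-learner trained on the base-model outputs would be a natural alternative to fixed weights; the repository should be consulted for the combination rule EnDeep4mC actually uses.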

2. Dual-adaptive encoding framework


Dual-adaptive encoding framework with species-adaptive and model-adaptive feature processing mechanisms.
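To make the idea of species-adaptive and model-adaptive encoding more concrete, here is a minimal sketch in which the set of feature encodings applied to a sequence is looked up by (species, model) pair. It is an assumption-laden illustration, not the EnDeep4mC implementation: the two encoders (one-hot and 2-mer frequency), the `ENCODING_PLAN` table, and the species and model keys are hypothetical placeholders.

```python
from itertools import product

BASES = "ACGT"


def one_hot(seq):
    """Flattened one-hot encoding, 4 values per base.

    Bases outside ACGT (for example N) become all zeros.
    """
    return [1.0 if base == b else 0.0 for base in seq.upper() for b in BASES]


def kmer_freq(seq, k=2):
    """Normalized frequencies over all 4**k possible k-mers."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product(BASES, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip k-mers containing ambiguous bases
            counts[kmer] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]


ENCODERS = {"one_hot": one_hot, "kmer2": kmer_freq}

# Hypothetical plan: which encodings each (species, model) pair receives.
# In practice such a table would be chosen by validation performance.
ENCODING_PLAN = {
    ("species_a", "cnn"): ["one_hot"],
    ("species_a", "blstm"): ["one_hot", "kmer2"],
}
DEFAULT_PLAN = ["one_hot"]


def encode(seq, species, model):
    """Concatenate the encodings selected for this species and model."""
    names = ENCODING_PLAN.get((species, model), DEFAULT_PLAN)
    features = []
    for name in names:
        features.extend(ENCODERS[name](seq))
    return features


cnn_features = encode("ACGT", "species_a", "cnn")
blstm_features = encode("ACGT", "species_a", "blstm")
```

With this layout, the same sequence yields a different feature vector depending on both the species and the downstream model, which is the core of the dual-adaptive idea described above.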

Details of model architecture and the full prediction interface