Balanced data
Group | Type | Literature | Year | Data | Acc | URL | |
---|---|---|---|---|---|---|---|
Function (unequal-length sequences) | DNA enhancers | | 2021 | Training: 742 strong enhancers, 742 weak enhancers, 1484 non-enhancers; testing: 100 strong enhancers, 100 weak enhancers, 200 non-enhancers | 5-CV: layer I 76.18%, layer II 62.53%; independent testing: layer I 79.75%, layer II 85.00% | iEnhancer-RF |
Function (unequal-length sequences) | DNA enhancers | | 2021 | Same as above | 10-CV: layer I 81.10%, layer II 66.74%; independent testing: layer I 75.75%, layer II 63.50% | iEnhancer-XG |
Function (unequal-length sequences) | DNA enhancers | | 2018 | Same as above | Jackknife test: layer I 78.03%, layer II 65.03%; independent testing: layer I 74.75%, layer II 61.00% | iEnhancer-EL |
Function (unequal-length sequences) | DNA enhancers | | 2016 | Training: 742 strong enhancers, 742 weak enhancers, 1484 non-enhancers | Jackknife test: layer I 77.39%, layer II 68.19% | EnhancerPred |
Function (unequal-length sequences) | DNA enhancers | | 2015 | Same as above | Jackknife test: layer I 76.89%, layer II 61.93% | iEnhancer-2L |
Function (unequal-length sequences) | Promoters | | 2019 | Training: 3382 promoter sequences, 3382 non-promoters | 5-CV: 84.06% | iPSW(2L)-PseKNC |
Function (unequal-length sequences) | Nucleosome | | 2018 | Training: S1 (C. elegans): 2567 nucleosome-forming, 2608 nucleosome-inhibiting; S2 (D. melanogaster): 2900 nucleosome-forming, 2850 nucleosome-inhibiting | Jackknife test: S1 92.29%, S2 88.26% | NucPosPred |
Function (unequal-length sequences) | DNA sequence | | 2021 | Large dataset. Training: 4400000 sequences; validation: 8000 sequences; test: 455024 sequences | AVAUROC: 0.94519; AVAUPR: 0.39522 | DeepATT |
Modification (equal-length sequences) | N4-methylcytosine (4mC) | | 2021 | F. vesca: training P-3457/N-3457; testing P-864/N-864, 4320, 12960. R. chinensis: training P-1938/N-1938; testing P-483/N-483, 2415, 7245 | 5-CV: F. vesca 0.8697, R. chinensis 0.8541; independent testing: F. vesca: P/N=1:1 0.8632, P/N=1:5 0.8632, P/N=1:15 0.8412; R. chinensis: P/N=1:1 0.8490, P/N=1:5 0.8400, P/N=1:15 0.8477 | 4mC-w2vec |
Modification (equal-length sequences) | N4-methylcytosine (4mC) | | 2020 | E. coli: training P-388/N-388; testing P-134/N-134 | 10-CV (training set): 85.4%; independent testing: 83.2% | EC4mC-SVM |
Modification (equal-length sequences) | N4-methylcytosine (4mC) | | 2020 | Large dataset. A. thaliana: P-20000/N-20000; C. elegans: P-20000/N-20000; D. melanogaster: P-20000/N-20000 | 10-CV: A. thaliana 84.4%; C. elegans 89.3%; D. melanogaster 87.1% | Deep4mcPred |
Modification (equal-length sequences) | N4-methylcytosine (4mC) | | 2019 | Dataset I: C. elegans P-1554/N-1554; D. melanogaster P-1769/N-1769; A. thaliana P-1978/N-1978; E. coli P-388/N-388; G. subterraneus P-905/N-905; G. pickeringii P-569/N-569 | Cross-validation: C. elegans 0.880; D. melanogaster 0.874; A. thaliana 0.825; E. coli 0.894; G. subterraneus 0.886; G. pickeringii 0.907 | 4mcPred-IFL |
Modification (equal-length sequences) | N4-methylcytosine (4mC) | | 2019 | Same as above | 10-CV: C. elegans 0.815; D. melanogaster 0.830; A. thaliana 0.787; E. coli 0.833; G. subterraneus 0.837; G. pickeringii 0.860 | 4mcPred-SVM |
Modification (equal-length sequences) | N4-methylcytosine (4mC) | | 2019 | Same as above | Jackknife test: C. elegans 87.71%; D. melanogaster 87.79%; A. thaliana 83.37%; E. coli 94.97%; G. subterraneus 91.04%; G. pickeringii 90.89%; independent testing: C. elegans 82.21%; D. melanogaster 82.63%; A. thaliana 76.52%; E. coli 82.69%; G. subterraneus 83.33%; G. pickeringii 77.63% | 4mCPred |
Modification (equal-length sequences) | N4-methylcytosine (4mC) | | 2019 | Training: Dataset I; testing: C. elegans P-750/N-750; D. melanogaster P-1000/N-1000; A. thaliana P-1250/N-1250; E. coli P-134/N-134; G. subterraneus P-350/N-350; G. pickeringii P-200/N-200 | 10-CV: C. elegans 0.826; D. melanogaster 0.842; A. thaliana 0.792; E. coli 0.848; G. subterraneus 0.855; G. pickeringii 0.891; independent testing: C. elegans 0.870; D. melanogaster 0.906; A. thaliana 0.855; E. coli 0.825; G. subterraneus 0.850; G. pickeringii 0.850 | Meta-4mCpred |
Modification (equal-length sequences) | N4-methylcytosine (4mC) | | 2019 | Mouse genome: training P-800/N-800; testing P-180/N-180 | 10-CV (training dataset): 0.795±0.001; independent testing: 0.798±0.011 | 4mCpred-EL |
Modification (equal-length sequences) | N6-methyladenine (6mA) | | 2022 | Large dataset. Benchmark datasets: Rice-Chen P-154000/N-154000; Rice-Lv P-880/N-880. Imbalanced datasets: Rice-Chen(1:5) P-176/N-880; Rice-Chen(1:10) P-88/N-880; Rice-Chen(1:20) P-44/N-880; Rice-Lv(1:5) P-30800/N-154000; Rice-Lv(1:10) P-15400/N-154000; Rice-Lv(1:20) P-7700/N-154000. Independent datasets: NIP_10000 P-10000/N-10000; A. thaliana P-15937/N-15937; D. melanogaster P-11191/N-11191; R. chinensis P-11815/N-11815 | 10-CV: Rice-Chen 97.00%; Rice-Lv 96.00%; independent testing: NIP_10000 88.00%; A. thaliana 77.00%; D. melanogaster 82.00%; R. chinensis 86.00% | MGF6mARice |
Modification (equal-length sequences) | N6-methyladenine (6mA) | | 2021 | Large dataset. Rice: P-154000/N-154000; A. thaliana: P-98483/N-98483; F. vesca: P-1417/N-1417; R. chinensis: P-5733/N-5733 | 5-CV: Rice ACC 94.01%; A. thaliana AUC 0.954; F. vesca AUC 0.982; R. chinensis AUC 0.961 | Deep6mA |
Modification (equal-length sequences) | N6-methyladenine (6mA) | | 2021 | Large dataset. Rice genome: Dataset-I P-154000/N-154000; Dataset-II P-880/N-880 | 5-CV (Dataset-I): 93.82%; independent testing (Dataset-II): 96.19% | iRicem6A-CNN |
Modification (equal-length sequences) | N6-methyladenine (6mA) | | 2020 | Same as above | 10-CV (Dataset-II): 87.27%; independent testing (Dataset-I): 85.65% | 6mA-RicePred |
Modification (equal-length sequences) | N6-methyladenine (6mA) | | 2019 | Cross-species: P-2768/N-2716; Rice: P-880/N-880; M. musculus: P-1934/N-1934 | 5-CV: cross-species 0.824; Rice 0.875; M. musculus 0.969 | iIM-CNN |
Modification (equal-length sequences) | N6-methyladenine (6mA) | | 2019 | Cross-species: training P-2214/N-2214, testing P-554/N-502; Rice: P-880/N-880; M. musculus: P-1934/N-1934 | 5-CV: cross-species (training dataset) 0.799; Rice 0.861; M. musculus 0.966; independent testing: cross-species 0.813 | csDMA |

Note: 5-CV: 5-fold cross validation; 10-CV: 10-fold cross validation; jackknife test: leave-one-out cross validation; P: positive samples; N: negative samples;
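
The evaluation protocols named in the Acc column (5-CV, 10-CV, and the jackknife test, which is leave-one-out cross validation) differ only in how samples are split into training and test parts. The following is a minimal Python sketch of those splits, not the procedure of any tool listed above; the function names `k_fold_indices` and `jackknife_indices`, the shuffling, and the fixed seed are assumptions made for illustration.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) index lists for k-fold cross validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Shuffle once so folds do not follow the original sample order (assumed).
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def jackknife_indices(n_samples):
    """Yield leave-one-out splits: each sample is the test set exactly once."""
    for i in range(n_samples):
        yield [j for j in range(n_samples) if j != i], [i]


if __name__ == "__main__":
    # 5-CV on ten hypothetical samples: every fold holds out two of them.
    for train_idx, test_idx in k_fold_indices(10, 5):
        print(len(train_idx), len(test_idx))
```

With a balanced set such as P-880/N-880, the same index lists would be applied to the concatenated positive and negative samples; stratifying the folds by class is a common refinement that this sketch leaves out.
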

Imbalanced data
Group | Type | Literature | Year | Data | Imbalance Algorithms | Acc | URL | |
---|---|---|---|---|---|---|---|---|
Function (unequal-length sequences) | Promoters | | 2021 | Promoter sequences: σ24: 484; σ28: 134; σ32: 291; σ38: 163; σ54: 94; σ70: 1694; non-promoter sequences: 2860 | SMOTE | 5-CV layer I: 90.05%; 5-CV layer II: σ24: 97.75%; σ28: 99.84%; σ32: 98.66%; σ38: 99.06%; σ54: 99.94%; σ70: 94.19% | iPro2L-PSTKNC |
Function (unequal-length sequences) | Promoters | | 2018 | Same as above | IHTS | 5-CV layer I: 81.68%; 5-CV layer II: σ24: 93.50%; σ28: 96.85%; σ32: 94.41%; σ38: 94.69%; σ54: 94.04%; σ70: 80.66% | iPromoter-2L |
Modification (equal-length sequences) | 5-methylcytosine (5mC) | | 2020 | Large dataset. Training: P-55800/N-13950; testing: P-658861/N-164715 | DSM | 5-CV (training dataset): 90.16%; independent testing: 90.22% | iPromoter-5mC |
Modification (equal-length sequences) | 5-methylcytosine (5mC) | | 2015 | Methylation samples: 787; non-methylation samples: 1639 | NCR/SMOTE | Jackknife test: 77.49% | iDNA‐Methyl |

Note: 3-CV: 3-fold cross validation; 5-CV: 5-fold cross validation; jackknife test: leave-one-out cross validation; P: positive samples; N: negative samples;
SMOTE: Synthetic Minority Oversampling Technique;
IHTS: Inserting Hypothetical Training Samples;
DSM: Down-Sampling Method;
NCR: Neighborhood Cleaning Rule;
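
Two of the balancing ideas named above can be illustrated on plain numeric feature vectors. The sketch below is hedged: it shows random down-sampling of the majority class and a simplified SMOTE-style interpolation between minority neighbours, and it is not the implementation used by any listed tool. The function names `down_sample` and `smote_like`, the parameter `k`, and the assumption that sequences have already been encoded as numeric vectors are all illustrative; IHTS and NCR are not covered.

```python
import random


def down_sample(majority, minority, seed=0):
    """Randomly keep as many majority samples as there are minority samples."""
    rng = random.Random(seed)
    # Assumes len(minority) <= len(majority); rng.sample draws without replacement.
    return rng.sample(majority, len(minority)), list(minority)


def smote_like(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority points by interpolating between a
    minority point and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        neighbours = sorted(
            others, key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p))
        )[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to mate
        synthetic.append([a + gap * (b - a) for a, b in zip(base, mate)])
    return synthetic


if __name__ == "__main__":
    # Hypothetical 2-D feature vectors: ten majority and three minority samples.
    majority = [[float(i), 0.0] for i in range(10)]
    minority = [[0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
    kept, _ = down_sample(majority, minority)
    print(len(kept), len(smote_like(minority, 7, k=2)))
```

Down-sampling discards data and suits very large sets such as P-658861/N-164715, whereas oversampling keeps every real sample and adds synthetic ones, which is the usual motivation when the minority class is small.
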