Balanced data |
Group | Type | Literature | Year | Data | Acc | URL | |
---|---|---|---|---|---|---|---|
Function Unequal length sequence |
eukaryotic mRNA | 2021 | Download ↓ Training dataset: Cytoplasm: 5310; Endoplasmic reticulum: 1185; Extracellular region: 710; Mitochondria: 350; Nucleus: 4855; Testing dataset1: Cytoplasm: 1066; Endoplasmic reticulum: 241; Extracellular region: 145; Mitochondria: 71; Nucleus: 976; Testing dataset2: Cytosol: 91; Nucleus: 148; Testing dataset3: Endoplasmic reticulum: 131; Nucleus: 131; |
5-CV Training set 0.659±0.006 |
Independent testing1 0.601; Independent testing2 0.506; Independent testing3 0.37; |
SubLocEP | |
lncRNA | 2021 | Download ↓ Dataset: Positive Dataset: 18000; Negative Dataset: 18000; (training set (80%) and test data (20%)) |
5-CV 96.5% |
PlncRNA-HDeep | |||
2019 | Download ↓ Human: Main: P-35760/ N-20299; Independent: P-1500/ N-1500; Mouse: Main: P-23987/ N-11746; Independent: P-1500/ N-1500; CPPred: Human-Testing: P-8557/ N-8241; Mouse-Testing: P-31102/ N-19930; Zebrafish-Testing: P-15594/ N-10662; Fruit-fly-Testing: P-17400/ N-4098; S.cerevisiae-Testing: P-6713/ N-413; Integrate-Testing: P-13903/ N-13903; |
10-CV Human-Main: 0.895; Mouse-Main: 0.914; |
Independent testing Human-Main: Human-Testing: 0.968; Mouse-Testing: 0.941; Zebrafish-Testing: 0.901; Fruit-fly-Testing: 0.940; S.cerevisiae-Testing: 0.960; Integrate-Testing: 0.907; Mouse-Main: Human-Testing: 0.887; Mouse-Testing: 0.944; Zebrafish-Testing: 0.843; Fruit-fly-Testing: 0.917; S.cerevisiae-Testing: 0.942; Integrate-Testing: 0.871; |
PredLnc-GFStack | |||
Non-Coding RNAs | 2021 | Download ↓ 6320 ncRNAs sequence [PS: One part of each family is randomly selected to form a test set and the rest to form a train set, so that all ncRNAs sequences can form 10-fold cross-validation train sets and test sets] |
10-CV 0.7972 |
ncRFP | |||
2021 | Download ↓ Training dataset: 6320; Testing dataset: 2600; |
10-CV 0.9519 |
Independent testing 0.9510 |
ncRDense | |||
2020 | 10-CV 88.04% |
ncRDeep | |||||
pre-miRNA | 2022 | Download ↓ miRNA training dataset: P-376/ N-376; Pre-miRNA training dataset: P-251/ N-251; Pre-miRNA+miRNA training dataset: 251 stress-responsive Pre-miRNAs/ 251 non-stress-responsive Pre-miRNAs; 251 stress-responsive miRNAs/ 251 non-stress-responsive miRNAs; Independent dataset: 70 stress-responsive miRNAs/ 100 non-stress-responsive miRNA; 70 stress-responsive Pre-miRNAs/ 100 non-stress-responsive Pre-miRNA |
5-CV miRNA: 65.33%; Pre-miRNA: 66.40%; miRNA + Pre-miRNA: 71.40%; |
Independent testing miRNA: 62.33%; Pre-miRNA: 64.85%; miRNA + Pre-miRNA: 69.21%; |
ASRmiRNA | ||
2020 | Download ↓ Training dataset: 2408 sequences; Validation dataset: 602 sequences; Testing dataset: 752 sequences; |
10-CV CNN: 87.24±1.80%; RNN: 88.44±1.80% |
DNNPreMiR | ||||
circRNA |
2020 | Download ↓ circRNA vs PCG: Training dataset: circRNA-10000/ PCG-8000; Testing dataset: circRNA-4084/ PCG-1533; circRNA vs lncRNA: Training dataset: circRNA-10000/ lncRNA-10000; Testing dataset: circRNA-4084/ lncRNA-9722; Stem cell vs not: Training dataset: circRNA-1800/non-circRNA-1800; Testing dataset: circRNA-282/non-circRNA-282; |
10-CV circRNA vs PCG: 0.815; circRNA vs lncRNA: 0.802; Stem cell vs not: 0.782; |
Independent testing circRNA vs PCG: 0.827; circRNA vs lncRNA: 0.854; Stem cell vs not: 0.812; |
CirRNAPL | ||
tRNA | 2015 | Download ↓ Training dataset: Positive tRNA sequences: 623 Negative tRNA sequences: 1183 |
10-CV 95.1% |
tRNA-Predict∗ | |||
piRNA |
2020 | Download ↓ Training dataset: First Layer: 709/709; Second Layer: 1418/1418 |
5-CV First Layer: 93.59%; Second Layer: 90.13%; |
2S-piRCNN | |||
2017 | 5-CV First Layer: 86.1%; Second Layer: 77.6%; |
2L-piRNA | |||||
Modification Equal length sequence |
N1-methyladenosine (m1A) 5-methylcytosine (m5C) N6-methyladenosine (m6A) Pseudouridine Adenosine-to-inosine (A-to-I) |
2020 | Download ↓ H.sapiens: m1A: P-6366/ N-6366; m5C: P-120/ N-120; m6A: P-1130/ N-1130; Pseudouridine: P-495/ N-495; A-to-I: P-3000/ N-3000; S.cerevisiae: m1A: P-483/ N-483; m5C: P-211/ N-211; m6A: P-1307/ N-1307; Pseudouridine: P-313/ N-314; M.musculus: m1A: P-1064/ N-1064; m5C: P-97/ N-97; m6A: P-725/ N-725; Pseudouridine: P-472/ N-472; |
10-CV H.sapiens: m1A: 99.47%; m5C: 93.33%; m6A: 91.28%; Pseudouridine: 66.47%; A-to-I: 91.73%; S.cerevisiae: m1A: 98.87%; m5C: 100%; m6A: 78.41%; Pseudouridine: 71.91%; M.musculus: m1A: 99.29%; m5C: 99.00%; m6A: 89.59%; Pseudouridine: 74.48%; |
iMRM | ||
Pseudouridine | 2020 | Download ↓ H. sapiens: Training dataset: P-495/N-495; Testing dataset: P-100/ N-100; S. cerevisiae: Training dataset: P-314/N-314; Testing dataset: P-100/ N-100; M. musculus: Training dataset: P-472/N-472; |
10-CV H. sapiens: 64.3%; S. cerevisiae: 74.8%; M. musculus: 74.8%; |
Independent testing H. sapiens: 75.0%; S. cerevisiae: 77.0%; |
RF-PseU | ||
2019 | 5-CV H. sapiens: 66.68%; S. cerevisiae: 68.15%; M. musculus: 71.81%; |
Independent testing H. sapiens: 69.00%; S. cerevisiae: 73.50%; |
/ | ||||
2020 | Download ↓ H. sapiens: Training dataset: P-495/N-495; Testing dataset: P-100/ N-100; S. cerevisiae: Training dataset: P-319/N-319; Testing dataset: P-100/ N-100; M. musculus: Training dataset: P-495/ N-495; |
5-CV H. sapiens: 62.73%; S. cerevisiae: 70.54%; M. musculus: 71.72%; |
Independent testing H. sapiens: 60.20%; S. cerevisiae: 77.0%; |
/ | |||
5-methylcytosine (m5C) |
2020 | Download ↓ A. thaliana: Training dataset: P-5289 /N-5289 Testing dataset: P-1000/ N-1000 |
10-CV A. thaliana: 73.06% |
Independent testing A. thaliana: 80.15% |
/ | ||
2020 | Download ↓ H. sapiens: P-120/ N-120; M. musculus: P-97/ N-97; S. cerevisiae: P-211/ N-211; A. thaliana: Training dataset: P-5289/ N-5289; Testing dataset: P-1000/N-1000; |
jackknife test H. sapiens: 90.8%; M. musculus: 100%; S. cerevisiae: 100%; 10-CV A.thaliana: 70.7%; |
Independent testing A.thaliana: 74.0% |
iRNA-m5C | |||
5-hydroxymethylcytosine (5hmC) |
2021 | Download ↓ Training dataset: P-662/ N-662 |
5-CV 0.81 |
iRhm5CNN | |||
2020 | 5-CV 65.48% |
iRNA5hmC | |||||
N6-methyladenosine (m6A) |
2021 | Download ↓ Human: Brain: Training dataset: P-4605/ N-4605; Testing dataset: P-4604/ N-4604; Kidney: Training dataset: P-4574/ N-4574; Testing dataset: P-4573/ N-4573; Liver: Training dataset: P-2634/ N-2634; Testing dataset: P-2634/ N-2634; Mouse: Brain: Training dataset: P-8025/ N-8025; Testing dataset: P-8025/ N-8025; Heart: Training dataset: P-2201/ N-2201; Testing dataset: P-2200/ N-2200; Kidney: Training dataset: P-3953/ N-3953; Testing dataset: P-3952/ N-3952; Liver: Training dataset: P-4133/ N-4133; Testing dataset: P-4133/ N-4133; Testis: Training dataset: P-4704/ N-4704; Testing dataset: P-4706/ N-4706; Rat: Brain: Training dataset: P-2352/ N-2352; Testing dataset: P-2351/ N-2351; Kidney: Training dataset: P-3433/ N-3433; Testing dataset: P-3432/ N-3432; Liver: Training dataset: P-1762/ N-1762; Testing dataset: P-1762/ N-1762; |
5-CV Human: Brain: 0.7378; Kidney: 0.8048; Liver: 0.8130; Mouse: Brain: 0.7936; Heart: 0.7617; Kidney: 0.8196; Liver: 0.7358; Testis: 0.7662; Rat: Brain: 0.7827; Kidney: 0.8338; Liver: 0.8263; |
Independent testing Human: Brain: 0.7327; Kidney: 0.7989; Liver: 0.8096; Mouse: Brain: 0.7859; Heart: 0.511; Kidney: 0.8087; Liver: 0.7295; Testis: 0.7712; Rat: Brain: 0.7799; Kidney: 0.8304; Liver: 0.8164; |
DNN-m6A | ||
2020 | 5-CV Human: Brain: 71.26%; Kidney: 78.99%; Liver: 80.13%; Mouse: Brain: 78.75%; Heart: 72.79%; Kidney: 79.98%; Liver: 70.59%; Testis: 74.40%; Rat: Brain: 75.96%; Kidney: 81.78%; Liver: 80.90%; |
Independent testing Human: Brain: 71.1%; Kidney: 77.76%; Liver: 79.01%; Mouse: Brain: 78.26%; Heart: 71.3%; Kidney: 79.31%; Liver: 68.79%; Testis: 73.54%; Rat: Brain: 75.14%; Kidney: 81.42%; Liver: 79.85%; |
iRNA-m6A | ||||
2021 | Large Data Download ↓ H.sapiens: Training dataset: P-14025/N-14025; Independent dataset1: P-23478/ N-23478; Independent dataset2: P-40742/ N-40742; Independent dataset3: P-15696/ N-15696; |
5-CV Full transcript: A549: 0.973; CD8T: 0.968; HEK293-abacm: 0.978; HEK293_sysy: 0.985; Hela: 0.984; MOLM13: 0.968; Mature: A549: 0.908; CD8T: 0.89; HEK293-abacm: 0.97; HEK293_sysy: 0.899; Hela: 0.901; MOLM13: 0.967; |
Independent testing1 Full transcript: AUC: 0.976; Mature mRNA: AUC: 0.899; Independent testing2 Full transcript: AUC: 0.981; Mature mRNA: AUC: 0.914; Independent testing3 Full transcript: AUC: 0.96776; Mature mRNA: AUC: 0.890; |
HSM6AP | |||
2019 | Download ↓ A549: P-23478/N-234780; CD8T: P-19677/N-196770; Hela: P-37183/N-371830; HEK293(sys): P-12051/N-120510; HEK293(abcm): P-9536/N-95360; |
5-CV Full transcript: AUC (A549: 0.977; CD8T: 0.976; Hela: 0.976; HEK293(sys): 0.976; HEK293(abacm): 0.975; MOLM13: 0.979;) Mature mRNA: AUC (A549: 0.938; CD8T: 0.940; Hela: 0.936; HEK293(sysy): 0.941; HEK293(abacm): 0.942; MOLM13: 0.943); |
Independent testing Full transcript: AUC (A549: 0.973; CD8T: 0.947; Hela: 0.962; HEK293(sysy): 0.954; HEK293(abacm): 0.976; MOLM13: 0.947); Mature mRNA: AUC (A549: 0.923; CD8T: 0.922; Hela: 0.911; HEK293(sysy): 0.904; HEK293(abacm): 0.857; MOLM13: 0.856); |
WHISTLE | |||
2020 | Download ↓ S.cerevisiae: Dataset-1-M6A2614: P-1307/ N-1307 Dataset-2-M6A6540: P-3270/ N-3270 |
10-CV Dataset-1-M6A2614: 89.19%; Dataset-2-M6A6540: 87.44%; |
iMethyl-Deep | ||||
2018 | 10-CV Dataset-1-M6A2614: 80.50% |
DeepM6APred | |||||
2016 | jackknife test Dataset-1-M6A2614: 78.35% |
RAM-ESVM | |||||
2019 | Large Data Download ↓ Training dataset: P-11799/ N-11799 Testing dataset: P-2976/ N-2976 |
5-CV Training dataset: 0.726 |
Independent Testing 0.727 |
HLMethy | |||
2018 | Large Data Download ↓ Training dataset: 88579(P:N=1:1) Testing dataset: 88227(P:N=1:10) |
10-CV H. sapiens: AUC 0.8414; M. musculus: AUC 0.8145; |
Independent testing AUC 0.8414 |
Gene2vec | |||
2018 | Download ↓ S. cerevisiae: Training dataset: P-1307/ N-1307; Testing dataset: P-207/ N-207; A. thaliana: Training dataset: P-2100/ N-2100; Testing dataset: P-418/ N-418; Mammalian: Training dataset: P-40000/ N-400000; Testing dataset: P-10000/ N-100000; |
10-CV S. cerevisiae: 71.26%; 5-CV A. thaliana: 85.95%; Mammalia: Full transcript: 87.80%; Mature mRNA: 86.14%; |
Independent testing S. cerevisiae: 68.84%; A. thaliana: 87.20%; Mammalia: Full transcript: 88.34%; Mature mRNA: 86.32%; |
BERMP | |||
2’-O-methylation | 2022 | Download ↓ H. sapiens: P-261/ N-329; S.cerevisiae: P-89/ N-189; M.musculus: P-10/ N-35; |
10-CV H. sapiens: 89.069%; S.cerevisiae: 93.885%; |
Independent testing H. sapiens: 86.88% |
NmRF | ||
2019 | Download ↓ H. sapiens: Training dataset: P-147/ N-147 |
5-CV 98.27% |
/ | ||||
2018 | 5-CV 97.95% |
iRNA-2OM | |||||
N7-methylguanosine (m7G) | 2021 | Download ↓ Training dataset: P-741/ N-741 |
10-CV 95.42% |
jackknife test 95.55% |
m7G-DPP | ||
2020 | 10-CV 92.50% |
m7G-IFL | |||||
2019 | jackknife test 89.88% |
iRNAm7G | |||||
2020 | Download ↓ Training dataset: P-595/ N-595 Testing dataset: P-150/ N-150 |
10-CV 86.00% |
Independent testing 86.00% |
m7GPredictor |
Note: 5-CV: 5-fold cross-validation; 10-CV: 10-fold cross-validation; P: positive samples; N: negative samples.
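
For readers unfamiliar with the 5-CV and 10-CV protocols named in the note above, the following is a minimal sketch of how a k-fold split and an accuracy score can be computed. It is an illustration only, not the procedure used by any tool listed in the table; the function names `k_fold_indices` and `accuracy` are hypothetical, and individual papers may stratify by class or shuffle differently.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Hypothetical helper: samples are shuffled once, then cut into k
    folds whose sizes differ by at most one.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # The first (n_samples % k) folds receive one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

Calling `k_fold_indices(n, 5)` or `k_fold_indices(n, 10)` gives the 5-CV and 10-CV splits; setting `k` equal to `n` holds out one sample per fold, which corresponds to the jackknife (leave-one-out) test reported for some entries. The per-fold accuracies are then typically averaged.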
Imbalanced data |
Group | Type | Literature | Year | Data | Imbalance Algorithms | Acc | URL | |
---|---|---|---|---|---|---|---|---|
Function Unequal length sequence |
sgRNA | 2021 | Download ↓ G17 sequences: Training dataset: P-830/ N-900; Testing dataset: P-229/ N-3351; Gr sequences: Training dataset: P-550/ N-320; Testing dataset: P-181/ N-118; Gnr sequences: Training dataset: P-536/ N-180; Testing dataset: P-135/ N-57; Gm sequences: Training dataset: P-664/ N-180; Testing dataset: P-166/ N-51; G5 sequences: hek293t: Training dataset: P-1615/ N-428; Testing dataset: P-404/ N-108; hct116: Training dataset: P-3090/ N-428; Testing dataset: P-783/ N-108; hela: Training dataset: P-5923/ N-428; Testing dataset: P-782/ N-108; h160: Training dataset: P-1973/ N-428; Testing dataset: P-494/ N-108; |
CS-Smote | 10-CV G17: 84.7%; Gr: 69.4%; G5: hct116: 96.9%; hek293t: 94.0%; hela: 97.74%; h160: 94.10%; |
Independent testing G17: 86.3%; Gr: 91.6%; Gnr: 89.4%; Gm: 93.8%; G5: hct116: 96.50%; hek293t: 78.7%; hela: 97.9%; h160: 97.3%; |
SgRNA-RF | |
Modification Equal length sequence |
N2-methylguanosine (m2G) |
2021 | Download ↓ Training set: H. sapiens: P-41/ N-541; M. musculus: P-27/ N-427; S. cerevisiae: P-60/ N-283; Testing dataset: H. sapiens: P-5/ N-60; M. musculus: P-3/ N-47; S. cerevisiae: P-7/ N-31; |
SMOTE | 5-CV H. sapiens: 0.9982; M. musculus: 1.0; S. cerevisiae: 0.9965; |
Independent testing H. sapiens: 0.9417; M. musculus: 0.9043; S. cerevisiae: 1.0; |
/ | |
N6-methyladenosine (m6A) Pseudouridine N1-methyladenosine (m1A) N6,2′-O-dimethyladenosine (m6Am) 5-methylcytosine (m5C) 2′-O-methyladenosine (Am) 2′-O-methylcytidine (Cm) 2′-O-methylguanosine (Gm) 2′-O-methyluridine (Um) N7-methylguanosine (m7G) 5-Methyluridine (m5U) Inosine (I) |
2021 | Large Data Download ↓ m6A: 65178; Pseudouridine: 3137; m1A: 16380; m6Am: 2447; Am: 1591; Cm: 1878; Gm: 1471; Um: 2253; m5C: 12936; m7G: 1036; m5U: 1696; I: 52618; |
OHEM and UW | Am: 78.00%; Cm: 82.00%; Gm: 89.00%; Um: 82.00%; m1A: 72.00%; m5C: 85.00%; m5U: 92.00%; m6A: 80.00%; m6Am: 83.00%; m7G: 65.00%; Pseudouridine: 84.00%; I: 70.00%; | MultiRM | |||
Dihydrouridine (D) |
2021 | Download ↓ Multi-species Training dataset: P-140/ N-298 Testing dataset: P-36/ N-76 |
SMOTEENN | jackknife test Training dataset: 97.24% |
Independent testing 93.75% |
/ | ||
5-methylcytosine (m5C) |
2020 | Download ↓ H. sapiens: Met240: Training dataset: P-120/ N-120; Met935: Training dataset: P-127/ N-808; Testing dataset: Test1157: P-157/ N-1000; |
SMOTEENN | jackknife test Met935: 82.20%; |
Independent testing Test1157: 74.85% |
/ |
Note: 5-CV: 5-fold cross-validation; 10-CV: 10-fold cross-validation; P: positive samples; N: negative samples.
SMOTE: Synthetic Minority Over-sampling Technique;
IHTS: Inserting Hypothetical Training Samples;
SMOTEENN: SMOTE over-sampling combined with Edited Nearest Neighbours (ENN) under-sampling;
OHEM: Online Hard Example Mining;
UW: Uncertainty Weighting.
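
To make the SMOTE entry in the legend above concrete, here is a minimal sketch of its core idea: each synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its nearest minority neighbours. This is a simplified illustration under stated assumptions (numeric feature vectors, Euclidean distance, a hypothetical function name `smote_sketch`), not the implementation used by any tool in the table, and it omits the ENN cleaning step that SMOTEENN adds.

```python
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic samples from a list of minority vectors.

    Hypothetical helper illustrating SMOTE-style interpolation.
    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base` (excluding `base` itself),
        # ranked by squared Euclidean distance.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic
```

For a dataset with far fewer positives than negatives, `n_new` would usually be chosen so that the positive class grows toward the size of the negative class. Over-sampling is normally applied to the training folds only, so that synthetic points never leak into the test data.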