Classification based on RNA sequence


Balanced data

Group Type Literature Year Data Acc URL
Function

Unequal length sequence
eukaryotic mRNA 2021 Download ↓

Training dataset: Cytoplasm: 5310; Endoplasmic reticulum: 1185; Extracellular region: 710; Mitochondria: 350; Nucleus: 4855;
Testing dataset1: Cytoplasm: 1066; Endoplasmic reticulum: 241; Extracellular region: 145; Mitochondria: 71; Nucleus: 976;
Testing dataset2: Cytosol: 91; Nucleus: 148;
Testing dataset3: Endoplasmic reticulum: 131; Nucleus: 131;
5-CV
Training set 0.659±0.006
Independent testing1
0.601;
Independent testing2
0.506;
Independent testing3
0.37;
SubLocEP
lncRNA 2021 Download ↓

Dataset:
Positive Dataset: 18000; Negative Dataset: 18000;
(split into training set (80%) and test set (20%))
5-CV
96.5%
PlncRNA-HDeep
2019 Download ↓

Human:
Main: P-35760/ N-20299;
Independent: P-1500/ N-1500;
Mouse:
Main: P-23987/ N-11746;
Independent: P-1500/ N-1500;
CPPred:
Human-Testing: P-8557/ N-8241; Mouse-Testing: P-31102/ N-19930; Zebrafish-Testing: P-15594/ N-10662; Fruit-fly-Testing: P-17400/ N-4098; S.cerevisiae-Testing: P-6713/ N-413; Integrate-Testing: P-13903/ N-13903;
10-CV
Human-Main: 0.895;
Mouse-Main: 0.914;
Independent testing
Human-Main:
Human-Testing: 0.968; Mouse-Testing: 0.941; Zebrafish-Testing: 0.901; Fruit-fly-Testing: 0.940; S.cerevisiae-Testing: 0.960; Integrate-Testing: 0.907;
Mouse-Main:
Human-Testing: 0.887; Mouse-Testing: 0.944; Zebrafish-Testing: 0.843; Fruit-fly-Testing: 0.917; S.cerevisiae-Testing: 0.942; Integrate-Testing: 0.871;
PredLnc-GFStack
Non-Coding RNAs 2021 Download ↓

6320 ncRNA sequences
[Note: part of each family is randomly selected to form the test set and the rest forms the training set, so that all ncRNA sequences are divided into 10-fold cross-validation training and test sets]
10-CV
0.7972
ncRFP
2021 Download ↓

Training dataset: 6320;
Testing dataset: 2600;
10-CV
0.9519
Independent testing
0.9510
ncRDense
2020 10-CV
88.04%
ncRDeep
pre-miRNA 2022 Download ↓

miRNA training dataset:
P-376/ N-376;
Pre-miRNA training dataset:
P-251/ N-251;
Pre-miRNA+miRNA training dataset:
251 stress-responsive Pre-miRNAs/ 251 non-stress-responsive Pre-miRNAs; 251 stress-responsive miRNAs/ 251 non-stress-responsive miRNAs;
Independent dataset:
70 stress-responsive miRNAs/ 100 non-stress-responsive miRNAs; 70 stress-responsive Pre-miRNAs/ 100 non-stress-responsive Pre-miRNAs;
5-CV
miRNA: 65.33%;
Pre-miRNA: 66.40%;
miRNA + Pre-miRNA: 71.40%;
Independent testing
miRNA: 62.33%;
Pre-miRNA: 64.85%;
miRNA + Pre-miRNA: 69.21%;
ASRmiRNA
2020 Download ↓

Training dataset: 2408 sequences;
Validation dataset: 602 sequences;
Testing dataset: 752 sequences;
10-CV
CNN: 87.24±1.80%;
RNN: 88.44±1.80%
DNNPreMiR
circRNA

2020 Download ↓

circRNA vs PCG:
Training dataset: circRNA-10000/ PCG-8000; Testing dataset: circRNA-4084/ PCG-1533;
circRNA vs lncRNA:
Training dataset: circRNA-10000/ lncRNA-10000; Testing dataset: circRNA-4084/ lncRNA-9722;
Stem cell vs not:
Training dataset: circRNA-1800/non-circRNA-1800; Testing dataset: circRNA-282/non-circRNA-282;
10-CV
circRNA vs PCG: 0.815;
circRNA vs lncRNA: 0.802;
Stem cell vs not: 0.782;
Independent testing
circRNA vs PCG: 0.827;
circRNA vs lncRNA: 0.854;
Stem cell vs not: 0.812;
CirRNAPL
tRNA 2015 Download ↓

Training dataset:
Positive tRNA sequences: 623
Negative tRNA sequences: 1183
10-CV
95.1%
tRNA-Predict
piRNA

2020 Download ↓

Training dataset:
First Layer: 709/709
Second Layer: 1418/1418
5-CV
First Layer: 93.59%;
Second Layer: 90.13%;
2S-piRCNN
2017 5-CV
First Layer: 86.1%;
Second Layer: 77.6%;
2L-piRNA
Modification

Equal length sequence
N1-methyladenosine (m1A)
5-methylcytosine (m5C)
N6-methyladenosine (m6A)
Pseudouridine
Adenosine-to-inosine (A-to-I)


2020 Download ↓

H.sapiens:
m1A: P-6366/ N-6366;
m5C: P-120/ N-120;
m6A: P-1130/ N-1130;
Pseudouridine: P-495/ N-195;
A-to-I: P-3000/N-3000;
S.cerevisiae:
m1A: P-483/ N-483;
m5C: P-211/ N-211;
m6A: P-1307/ N-1307;
Pseudouridine: P-313/ N-314;
M.musculus:
m1A: P-1064/ N-1064;
m5C: P-97/ N-97;
m6A: P-725/ N-725;
Pseudouridine: P-472/ N-472;
10-CV
H.sapiens:
m1A: 99.47%; m5C: 93.33%; m6A: 91.28%; Pseudouridine: 66.47%; A-to-I: 91.73%;
S.cerevisiae:
m1A: 98.87%; m5C: 100%; m6A: 78.41%; Pseudouridine: 71.91%;
M.musculus:
m1A: 99.29%; m5C: 99.00%; m6A: 89.59%; Pseudouridine: 74.48%;
iMRM
Pseudouridine 2020 Download ↓

H. sapiens:
Training dataset: P-495/N-495;
Testing dataset: P-100/ N-100;
S. cerevisiae:
Training dataset: P-314/N-314;
Testing dataset: P-100/ N-100;
M. musculus:
Training dataset: P-472/N-472;
10-CV
H. sapiens: 64.3%;
S. cerevisiae: 74.8%;
M. musculus: 74.8%;
Independent testing
H. sapiens: 75.0%
S. cerevisiae: 77.0%
RF-PseU
2019 5-CV
H. sapiens: 66.68%;
S. cerevisiae: 68.15%;
M. musculus: 71.81%;
Independent testing
H. sapiens: 69.00%
S. cerevisiae: 73.50%
/
2020 Download ↓

H. sapiens:
Training dataset: P-495/N-495;
Testing dataset: P-100/ N-100;
S. cerevisiae:
Training dataset: P-319/N-319;
Testing dataset: P-100/ N-100;
M. musculus:
Training dataset: P-495/ N-495;
5-CV
H. sapiens: 62.73%;
S. cerevisiae: 70.54%;
M. musculus: 71.72%;
Independent testing
H. sapiens: 60.20%
S. cerevisiae: 77.0%
/
5-methylcytosine
(m5C)
2020 Download ↓

A. thaliana:
Training dataset: P-5289 /N-5289
Testing dataset: P-1000/ N-1000
10-CV
A. thaliana: 73.06%
Independent testing
A. thaliana: 80.15%
/
2020 Download ↓

H. sapiens: P-120/ N-120;
M. musculus: P-97/ N-97;
S. cerevisiae: P-211/ N-211;
A. thaliana:
Training dataset: P-5289/ N-5289;
Testing dataset: P-1000/N-1000;
jackknife test
H. sapiens: 90.8%;
M. musculus: 100%;
S. cerevisiae: 100%;

10-CV
A.thaliana: 70.7%;
Independent testing
A.thaliana: 74.0%
iRNA-m5C
5-hydroxymethylcytosine
(5hmC)
2021 Download ↓

Training dataset: P-662/ N-662
5-CV
0.81
iRhm5CNN
2020 5-CV
65.48%
iRNA5hmC
N6-methyladenosine
(m6A)
2021 Download ↓

Human
Brain:
Training dataset:
P-4605/ N-4605;
Testing dataset: P-4604/ N-4604;
Kidney:
Training dataset:
P-4574/ N-4574;
Testing dataset: P-4573/ N-4573;
Liver:
Training dataset:
P-2634/ N-2634;
Testing dataset: P-2634/ N-2634;
Mouse
Brain:
Training dataset:
P-8025/ N-8025;
Testing dataset: P-8025/ N-8025;
Heart:
Training dataset:
P-2201/ N-2201;
Testing dataset: P-2200/ N-2200;
Kidney:
Training dataset:
P-3953/ N-3953;
Testing dataset: P-3952/ N-3952;
Liver:
Training dataset:
P-4133/ N-4133;
Testing dataset: P-4133/ N-4133;
Testis:
Training dataset:
P-4704/ N-4704;
Testing dataset: P-4706/ N-4706;
Rat
Brain:
Training dataset:
P-2352/ N-2352;
Testing dataset: P-2351/ N-2351;
Kidney:
Training dataset:
P-3433/ N-3433;
Testing dataset: P-3432/ N-3432;
Liver:
Training dataset:
P-1762/ N-1762;
Testing dataset: P-1762/ N-1762;
5-CV
Human:
Brain: 0.7378;
Kidney: 0.8048;
Liver: 0.8130;
Mouse:
Brain: 0.7936;
Heart: 0.7617;
Kidney: 0.8196;
Liver: 0.7358;
Testis: 0.7662;
Rat
Brain: 0.7827;
Kidney: 0.8338;
Liver: 0.8263;
Independent testing
Human:
Brain: 0.7327;
Kidney: 0.7989;
Liver: 0.8096;
Mouse:
Brain: 0.7859;
Heart: 0.511;
Kidney: 0.8087;
Liver: 0.7295;
Testis: 0.7712;
Rat
Brain: 0.7799;
Kidney: 0.8304;
Liver: 0.8164;
DNN-m6A
2020 5-CV
Human:
Brain: 71.26%;
Kidney: 78.99%;
Liver: 80.13%;
Mouse:
Brain: 78.75%;
Heart: 72.79%;
Kidney: 79.98%;
Liver: 70.59%;
Testis: 74.40%;
Rat
Brain: 75.96%;
Kidney: 81.78%;
Liver: 80.90%;
Independent testing
Human:
Brain: 71.1%;
Kidney: 77.76%;
Liver: 79.01%;
Mouse:
Brain: 78.26%;
Heart: 71.3%;
Kidney: 79.31%;
Liver: 68.79%;
Testis: 73.54%;
Rat
Brain: 75.14%;
Kidney: 81.42%;
Liver: 79.85%;
iRNA-m6A
2021 Large Data
Download ↓

H.sapiens:
Training dataset: P-14025/N-14025;
Independent dataset1: P-23478/ N-23478;
Independent dataset2: P-40742/ N-40742;
Independent dataset3: P-15696/ N-15696;
5-CV
Full transcript:
A549: 0.973; CD8T: 0.968; HEK293-abacm: 0.978; HEK293_sysy: 0.985; Hela: 0.984; MOLM13: 0.968;
Mature mRNA:
A549: 0.908; CD8T: 0.89; HEK293-abacm: 0.97; HEK293_sysy: 0.899; Hela: 0.901; MOLM13: 0.967;
Independent testing1
Full transcript:
AUC: 0.976;
Mature mRNA:
AUC: 0.899;
Independent testing2
Full transcript:
AUC: 0.981;
Mature mRNA:
AUC: 0.914;
Independent testing3
Full transcript:
AUC: 0.96776;
Mature mRNA:
AUC: 0.890;
HSM6AP
2019 Download ↓

A549: P-23478/N-234780; CD8T: P-19677/N-196770; Hela: P-37183/N-371830; HEK293(sysy): P-12051/N-120510; HEK293(abacm): P-9536/N-95360;
5-CV
Full transcript:
AUC (A549: 0.977; CD8T: 0.976; Hela: 0.976; HEK293(sysy): 0.976; HEK293(abacm): 0.975; MOLM13: 0.979);
Mature mRNA: AUC (A549: 0.938; CD8T: 0.940; Hela: 0.936; HEK293(sysy): 0.941; HEK293(abacm): 0.942; MOLM13: 0.943);
Independent Testing
Full Transcript:
AUC (A549: 0.973; CD8T: 0.947; Hela: 0.962; HEK293(sysy): 0.954; HEK293(abacm): 0.976; MOLM13: 0.947);
Mature mRNA: AUC (A549: 0.923; CD8T: 0.922; Hela: 0.911; HEK293(sysy): 0.904; HEK293(abacm): 0.857; MOLM13: 0.856);
WHISTLE
2020 Download ↓

S.cerevisiae:
Dataset-1-M6A2614: P-1307/ N-1307
Dataset-2-M6A6540: P-3270/ N-3270
10-CV
Dataset-1-M6A2614: 89.19%;
Dataset-2-M6A6540: 87.44%;
iMethyl-Deep
2018 10-CV
Dataset-1-M6A2614: 80.50%
DeepM6APred
2016 jackknife test
Dataset-1-M6A2614: 78.35%
RAM-ESVM
2019 Large Data
Download ↓

Training dataset: P-11799/ N-11799
Testing dataset: P-2976/ N-2976
5-CV
Training dataset: 0.726
Independent Testing
0.727
HLMethy
2018 Large Data
Download ↓

Training dataset: 88579(P:N=1:1)
Testing dataset: 88227(P:N=1:10)
10-CV
H. sapiens: AUC 0.8414;
M. musculus: AUC 0.8145;
Independent testing
AUC 0.8414
Gene2vec
2018 Download ↓

S. cerevisiae:
Training dataset: P-1307/ N-1307;
Testing dataset: P-207/ N-207;
A. thaliana:
Training dataset: P-2100/ N-2100;
Testing dataset: P-418/ N-418;
Mammalian:
Training dataset: P-40000/ N-400000;
Testing dataset: P-10000/ N-100000;
10-CV
S. cerevisiae: 71.26%;
5-CV
A. thaliana: 85.95%;
Mammalia:
Full transcript: 87.80%; Mature mRNA: 86.14%;
Independent testing
S. cerevisiae: 68.84%;
A. thaliana: 87.20%;
Mammalia:
Full transcript: 88.34%;
Mature mRNA: 86.32%;
BERMP
2’-O-methylation 2022 Download ↓

H. sapiens: P-261/ N-329;
S.cerevisiae: P-89/ N-189;
M.musculus: P-10/ N-35;
10-CV
H. sapiens: 89.069%;
S.cerevisiae: 93.885%;
Independent testing
H. sapiens: 86.88%
NmRF
2019 Download ↓

H. sapiens:
Training dataset: P-147/ N-147
5-CV
98.27%
/
2018 5-CV
97.95%
iRNA-2OM
N7-methylguanosine (m7G) 2021 Download ↓

Training dataset: P-741/ N-741
10-CV
95.42%
jackknife test
95.55%
m7G-DPP
2020 10-CV
92.50%
m7G-IFL
2019 jackknife test
89.88%
iRNAm7G
2020 Download ↓

Training dataset: P-595/ N-595
Testing dataset: P-150/ N-150
10-CV
86.00%
Independent testing
86.00%
m7GPredictor

Note: 5-CV: 5-fold cross-validation; 10-CV: 10-fold cross-validation; P: Positive samples; N: Negative samples;
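The evaluation schemes abbreviated in the note can be sketched as index splits. The following is a minimal illustration in plain Python, not the procedure of any listed paper: the function names `k_fold_indices` and `jackknife_indices` are hypothetical, and the jackknife test that appears in the table is assumed here to mean leave-one-out cross-validation.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle once, reproducibly
    # Slice with stride k so fold sizes differ by at most one sample.
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def jackknife_indices(n_samples):
    """Leave-one-out splits: every sample is the test set exactly once."""
    for i in range(n_samples):
        yield [j for j in range(n_samples) if j != i], [i]


if __name__ == "__main__":
    # 5-CV on ten samples: each fold holds out two samples for testing.
    for train_idx, test_idx in k_fold_indices(n_samples=10, k=5):
        print(len(train_idx), len(test_idx))
```

A reported 5-CV or 10-CV accuracy is then typically the mean of the per-fold accuracies obtained by training on `train_idx` and scoring on `test_idx`.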



Imbalanced data

Group Type Literature Year Data Imbalance Algorithms Acc URL
Function

Unequal length sequence
sgRNA 2021 Download ↓

G17 sequences:
Training dataset: P-830/ N-900;
Testing dataset: P-229/ N-3351;
Gr sequences:
Training dataset: P-550/ N-320;
Testing dataset: P-181/ N-118;
Gnr sequences:
Training dataset: P-536/ N-180;
Testing dataset: P-135/ N-57;
Gm sequences:
Training dataset: P-664/ N-180;
Testing dataset: P-166/ N-51;
G5 sequences:
hek293t:
Training dataset: P-1615/ N-428;
Testing dataset: P-404/ N-108;
hct116:
Training dataset: P-3090/ N-428;
Testing dataset: P-783/ N-108;
hela:
Training dataset: P-5923/ N-428;
Testing dataset: P-782/ N-108;
h160:
Training dataset: P-1973/ N-428;
Testing dataset: P-494/ N-108;
CS-Smote 10-CV
G17: 84.7%;
Gr: 69.4%;
G5: hct116: 96.9%; hek293t: 94.0%; hela: 97.74%; h160: 94.10%;
Independent testing
G17: 86.3%; Gr: 91.6%; Gnr: 89.4%; Gm: 93.8%; G5: hct116: 96.50%; hek293t: 78.7%; hela: 97.9%; h160: 97.3%;
SgRNA-RF
Modification

Equal length sequence
N2-methylguanosine
(m2G)

2021 Download ↓

Training set: H. sapiens: P-41/ N-541;
M. musculus: P-27/ N-427;
S. cerevisiae: P-60/ N-283;
Testing dataset: H. sapiens: P-5/ N-60;
M. musculus: P-3/ N-47;
S. cerevisiae: P-7/ N-31;
SMOTE 5-CV
H. sapiens: 0.9982;
M. musculus: 1.0;
S. cerevisiae: 0.9965;
Independent testing
H. sapiens: 0.9417;
M. musculus: 0.9043;
S. cerevisiae: 1.0;
/
N6-methyladenosine (m6A)
Pseudouridine
N1-methyladenosine (m1A)
N6,2′-O-dimethyladenosine (m6Am)
5-methylcytosine (m5C)
2′-O-methyladenosine (Am)
2′-O-methylcytidine (Cm)
2′-O-methylguanosine (Gm)
2′-O-methyluridine (Um)
N7-methylguanosine (m7G)
5-Methyluridine (m5U)
Inosine (I)
2021 Large Data
Download ↓

m6A: 65178; Pseudouridine: 3137; m1A: 16380; m6Am: 2447; Am: 1591; Cm: 1878; Gm: 1471; Um: 2253; m5C: 12936; m7G: 1036; m5U: 1696; I: 52618;
OHEM and UW
Am: 78.00%; Cm: 82.00%; Gm: 89.00%; Um: 82.00%; m1A: 72.00%; m5C: 85.00%; m5U: 92.00%; m6A: 80.00%; m6Am: 83.00%; m7G: 65.00%; Pseudouridine: 84.00%; I: 70.00%;
MultiRM
Dihydrouridine
(D)
2021 Download ↓

Multi-species
Training dataset: P-140/ N-298
Testing dataset: P-36/ N-76
SMOTEENN jackknife test
Training dataset: 97.24%
Independent testing
93.75%
/
5-methylcytosine
(m5C)
2020 Download ↓

H. sapiens:
Met240: Training dataset: P-120/ N-120;
Met935:
Training dataset: P-127/ N-808;
Testing dataset: Test1157: P-157/ N-1000;
SMOTEENN jackknife test
Met935: 82.20%;
Independent testing
Test1157: 74.85%
/

Note: 5-CV: 5-fold cross-validation; 10-CV: 10-fold cross-validation; P: Positive samples; N: Negative samples;
SMOTE: Synthetic Minority Oversampling Technique;
IHTS: Inserting Hypothetical Training Samples;
SMOTEENN: SMOTE oversampling combined with Edited Nearest Neighbours (ENN) undersampling;
OHEM: online hard example mining;
UW: uncertainty weighting;
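To make the SMOTE entry concrete, here is a minimal sketch of its core idea in plain Python: a synthetic minority sample is placed at a random point on the segment between a minority sample and one of its nearest minority neighbours. The function name `smote`, its parameters, and the toy feature vectors are illustrative assumptions; the listed papers may use library implementations with different details, and the ENN cleaning step of SMOTEENN is not shown.

```python
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points from a list of minority feature vectors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority sample as the base point.
        i = rng.randrange(len(minority))
        x = minority[i]
        # Rank the other minority samples by squared Euclidean distance to x.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)))
        # Choose one of the k nearest neighbours and interpolate towards it.
        neighbour = rng.choice(others[:k])
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(x, neighbour)])
    return synthetic


if __name__ == "__main__":
    # Toy minority class with four 2-D feature vectors (hypothetical data).
    positives = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for point in smote(positives, n_new=3, k=2):
        print(point)
```

Oversampling of this kind is normally applied to the training folds only, so that synthetic points never leak into the test data.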