Classification based on Protein sequence


Balanced data

Group Type Literature Year Data Acc URL
Function

Unequal length sequence
DNA-Binding Proteins 2021 Download ↓

UniSwiss-Tr: DBPs: 4500; Non-DBPs: 4500;
UniSwiss-Tst: DBPs: 381; Non-DBPs: 381;
jackknife test
86.84%
Independent testing
85.83%
TargetDBP plus
2020 Download ↓

Training dataset-PDB1075: DBPs: 525; Non-DBPs: 525;
Independent testing-PDB186: DBPs: 93; Non-DBPs: 93;
jackknife test
PDB1075: 86.05%
Independent testing
PDB186: 75.30%
PSSMEI
Gram-negative bacteria 2021 Download ↓

T1SS: 195; T2SS: 83; T3SS: 1194; T4SS: 713; T6SS: 181;
/ BastionHub
Bacterial Type III Secreted Effectors 2021 Download ↓

Training dataset: P-379/ N-1112;
Testing dataset: P-108/ N-108;
5-CV
0.967±0.002
Independent testing
0.963±0.034
iT3SE-PX
2020 Download ↓

Training dataset: P-309/ N-310;
Testing dataset: P-42/ N-34;
10-CV
0.941±0.011
Independent testing
0.83
T3SEpp
Soluble protein 2022 Download ↓

Training dataset: P-5718/ N-5718;
Testing dataset: P-1550/ N-1550;
5-CV
0.70±0.02
Independent testing
0.728
NetSolP
2021 5-CV
76.0%
Independent testing
58.5%
SoluProt
Secretory Proteins 2021 Download ↓

Secretory proteins: 252
Non-secretory protein: 252
10-CV
97.8%
CRCF
Cancerlectin 2021 Download ↓

Cancerlectin sequences: 176;
Non-Cancerlectin sequences: 227;
Lectins sequences: 462;
Non-Lectins sequences: 435;
5-CV
Cancerlectin: 98.54%
Lectin: 95.38%
/
2020 Download ↓

Cancerlectin sequences: 178
Non-Cancerlectin sequences: 226
10-CV
83.91%
Cancerlectins
Phage virion protein 2020 Download ↓

Training dataset: P-99/ N-208
Testing dataset: P-30/ N-60
jackknife test
0.8958
Independent testing
0.7980
SVM-PVPData
Cell wall lytic enzymes 2021 Download ↓

Lyases: 68
Non-Lyases: 307
10-CV
96.09%
/
2020 jackknife test
0.955
CWLy-SVM
Thermophilic Proteins 2020 Download ↓

Thermophilic proteins: 915
Non-thermophilic proteins: 793
10-CV
98.02%
Tool
2020 cross-validation
96.02%
/
2019 10-CV
93.03%
Tool
Major Histocompatibility Complex 2019 Download ↓

Dataset1:
Training dataset: MHC protein: 4370; Non-MHC protein: 4370;
Testing dataset: MHC protein: 2342; Non-MHC protein: 2342;
Dataset2: MHC I: 3350; MHC II: 3362;
10-CV
Dataset1: 91.66%;
Dataset2: 94.224%
Independent testing
Dataset1: 93.17%
ELM-MHC
Amyloid Proteins 2021 Download ↓

Amyloid proteins: 165
Non-amyloid proteins: 382
(ps: 80% Training dataset, 20% Testing dataset)
10-CV
0.895
Independent testing
0.827
iAMY-SCM
2020 10-CV
91.59%
PredAmyl-MLP
2018 10-CV
89.19%
Independent testing
89.19%
RFAmyloid
Modification

Equal length sequence
Multiple PTMs 2021 Download ↓

S1(Ace): 3114; S2(Glyca): 1399; S3(Malon): 1224; S4(Meth): 1147; S5(Succ): 1645; S6(Sumo): 1174; S7(Ubiq): 3185;
(ps: 4/5 of all samples were used as training samples and the other 1/5 was utilized as an independent test set)
10-CV
0.8589
Independent testing
0.8549
MultiLyGAN
Acetylation 2020 Download ↓

Archaea:
Training dataset: P-193/ N-193;
Testing dataset: P-21/ N-21;
B.subtilis:
Training dataset: P-1040/ N-1040;
Testing dataset: P-115/ N-115;
C.glutamicum:
Training dataset: P-1021/ N-1021;
Testing dataset: P-113/ N-113;
E.amylovora:
Training dataset: P-95/ N-95;
Testing dataset: P-10/ N-10;
E.coli:
Training dataset: P-1919/ N-1919;
Testing dataset: P-213/ N-213;
G.kaustophilus:
Training dataset: P-189/ N-189;
Testing dataset: P-21/ N-21;
M.tuberculosis:
Training dataset: P-866/ N-866;
Testing dataset: P-96/ N-96;
S.typhimurium:
Training dataset: P-174/ N-174;
Testing dataset: P-19/ N-19;
V.parahemolyticus:
Training dataset: P-1065/ N-1065;
Testing dataset: P-118/ N-118;

Negative data are taken from the ProAcePred dataset
10-CV
Archaea: 84.74%; B.subtilis: 73.89%; C.glutamicum: 75.38%; E.amylovora: 96.89%; E.coli: 63.08%; G.kaustophilus: 89.15%; M.tuberculosis: 76.62%; S.typhimurium: 90.51%; V.parahemolyticus: 75.46%;
Independent testing
Archaea: 90.00%;
B.subtilis: 98.26%;
C.glutamicum: 92.88%;
E.amylovora: 90.00%;
E.coli: 86.18%;
G.kaustophilus: 97.50%;
M.tuberculosis: 96.44%;
S.typhimurium: 95.00%;
V.parahemolyticus: 94.02%;
DNNAce
2018 10-CV
Archaea: 0.9;
B.subtilis: 0.796;
C.glutamicum: 0.8;
E.amylovora: 0.983;
E.coli: 0.69;
G.kaustophilus: 0.897;
M.tuberculosis: 0.834;
S.typhimurium: 0.874;
V.parahemolyticus: 0.802;
Independent testing
Archaea: 0.81;
B.subtilis: 0.952;
C.glutamicum: 0.872;
E.amylovora: 0.9;
E.coli: 0.899;
G.kaustophilus: 0.881;
M.tuberculosis: 0.88;
S.typhimurium: 0.816;
V.parahemolyticus: 0.869;
ProAcePred
2019 Large Data
Download ↓

Training dataset: P-12886/N-12886
Testing dataset: P-3221/ N-3221
10-CV
84.95%
Independent testing
84.87%
DeepAcet
Hydroxylation 2020 Download ↓

Hydroxylysine sites: 185
Non-Hydroxylysine sites: 497
jackknife test
97.24%
/
Malonylation 2020 Download ↓

Malonylation sites: 1735
Non-malonylation sites: 1735
(ps: 80% Training dataset, 20% Testing dataset)
5-CV
91.24%
Independent testing
90.65%
Mal-Prec
2019 Download ↓

H. sapiens
Training dataset: P-3585/ N-3585;
Testing dataset: P-300/ N-300;
M. musculus
Training dataset: P-2606/ N-2606;
Testing dataset: P-300/ N-300;
E. coli
Training dataset: P-1453/ N-1453;
Testing dataset: P-100/ N-100;
10-CV
E. coli:
0.801±0.002
M. musculus:
0.825±0.002
H. sapiens:
0.835±0.002
Independent testing
E. coli: 0.845
M. musculus: 0.833
H. sapiens: 0.860
KMAL-SP
Methylation 2020 Download ↓

Single-methylarginine: 1465
Double-methylarginine: 474
Negative samples: 39980
10-CV
Single: 82.1%
Double: 82.5%
Code
2017 Download ↓

Methylation site R
Dataset-1: P-185/ N-185; Dataset-2: P-185/ N-185; Dataset-3: P-185/ N-185; Dataset-4: P-185/ N-185; Dataset-5: P-185/ N-185; Dataset-6: P-185/ N-186; Dataset-7: P-185/ N-185;
Methylation site K
Dataset-1: P-226/ N-217; Dataset-2: P-226/ N-217; Dataset-3: P-226/ N-217; Dataset-4: P-226/ N-217; Dataset-5: P-226/ N-218; Dataset-6: P-226/ N-217; Dataset-7: P-226/ N-217;
jackknife test
Methylation site R
Dataset-1: 80.3%; Dataset-2: 80.3%; Dataset-3: 78.4%; Dataset-4: 80.8%; Dataset-5: 82.2%; Dataset-6: 82.7%; Dataset-7: 80.5%; Average: 80.7%;
Methylation site K
Dataset-1: 67.7%; Dataset-2: 68.4%; Dataset-3: 72.7%; Dataset-4: 70.0%; Dataset-5: 69.0%; Dataset-6: 69.3%; Dataset-7: 72.5%; Average: 69.9%;
MePred-RF
Palmitoylation 2019 Download ↓

S-palmitoylation sites: 436
Non-S-palmitoylation sites: 500
Self-consistency testing
99.79%
10-CV
97.22%
SPalmitoylC-PseAAC
Phosphorylation 2021 Download ↓

Training dataset: S: P-4316/ N-4316; T: P-1551/ N-1551; Y: P-553/ N-553;
Testing dataset: S: P-2773/ N-17118; T: P-941/ N-6258; Y: P-210/ N-1296;
10-CV
S: 80.38%;
T: 80.01%;
Y: 77.76%;
Independent testing
S: 78.91%;
T: 84.81%;
Y: 82.73%;
DeepPPSite
2021 Download ↓

Training dataset: S/T: P-4308/ N-4308; Y: P-81/ N-81;
Testing dataset: S/T: P-1079/ N-1079; Y: P-21/ N-21
5-CV
S/T: 80.45%
Y: 75.22%
Independent testing
S/T: 80.63%
Y: 83.33%
DeepIPs
2019 Download ↓

SerD: P-300/ N-300; TyrD: P-110/ N-110; ThrD: P-100/ N-100;
jackknife test
SerD: AUC 0.904
TyrD: AUC 0.992
ThrD: AUC 0.990
iPhoPred
2017 Download ↓

Training dataset:
S-type: P-4316/ N-4316; T-type: P-1551/ N-1551; Y-type: P-553/ N-553;
Testing dataset:
S-type: P-2273/ N-17618; T-type: P-941/ N-6258; Y-type: P-296/ N-1210;
10-CV
S-type: AUC 0.851
T-type: AUC 0.818
Y-type: AUC 0.761
Independent testing
S-type: AUC 0.715
T-type: AUC 0.683
Y-type: AUC 0.654
PhosPred-RF
Pupylation 2021 Download ↓

Training dataset: Pupylated lysine: 186; Non-pupylated lysine: 186;
Testing dataset: Pupylated lysine: 87; Non-pupylated lysine: 191;
10-CV
0.884
Independent testing
0.82
PUP-Fuse
Succinylation 2021 Download ↓

Training dataset: Succinylation sites: 6512; Non-Succinylation sites: 6512;
Testing dataset: Succinylation sites: 1479; Non-Succinylation sites: 16457;
10-CV
79.9%
NA LSTMCNNsucc
Tyrosine sulfation 2019 Download ↓

Training dataset: P-200/ N-420
Testing dataset: P-80/ N-80
10-CV
94.2%
Independent testing
85.63%
/
Ubiquitination 2021 Download ↓

H.sapiens: P-31162/ N-31162; M.musculus: P-7746/ N-7748; S.cerevisiae: P-3506/ N-3506; R.norvegicus: P-1226/ N-1226; A.nidulans: P-2299/ N-2299; A.thaliana: P-586/ N-587; T.gondii: P-424/ N-424; O.sativa: P-308/ N-308;
(ps: Training dataset 90% and Independent test dataset 10%)
H. sapiens: 0.578; M.musculus: 0.604;
R.norvegicus: 0.578;
S.cerevisiae: 0.593;
A. thaliana: 0.525;
O. sativa: 0.500;
T.gondii: 0.538;
A. nidulans: 0.636;
Independent testing
H.sapiens: AUC 0.753; M.musculus: AUC 0.789; R.norvegicus: AUC 0.72; S.cerevisiae: AUC 0.772; T.gondii: AUC 0.824; A.thaliana: AUC 0.814;
DeepTL-Ubi
2019 Download ↓

Set1: P-150/ N-150;
Set2: P-3418/ N-3418;
Set3: P-6117/ N-6117;
5-CV
Set1: 98.33%; Set2: 81.12%; Set3: 76.90%
UbiSitePred

Note: 5-CV: 5-fold cross-validation; 10-CV: 10-fold cross-validation; P: Positive samples; N: Negative samples;
/: Web-server/Code is not available at the time of writing
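
The validation schemes named in the tables (5-CV, 10-CV, jackknife test) all hold out part of the samples and train on the rest. The following is a minimal Python sketch of that partitioning, not the procedure of any listed tool. The function names `k_fold_indices` and `jackknife_indices` are hypothetical, and the sketch assumes the jackknife test is the usual leave-one-out scheme (k equal to the number of samples).

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) index lists for k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        # Every sample outside the held-out fold is used for training.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def jackknife_indices(n_samples):
    """Leave-one-out splits: k-fold with one sample per held-out fold."""
    return k_fold_indices(n_samples, n_samples)
```

For instance, `k_fold_indices(504, 10)` would produce ten splits of a balanced set such as the 252 secretory and 252 non-secretory proteins listed above. A stratified split, which keeps the positive/negative ratio equal in every fold, is a common refinement that this sketch does not implement.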



Imbalanced data

Group Type Literature Year Data Imbalance Algorithms Acc URL
Function

Unequal length sequence
Antioxidant proteins 2021 Download ↓

Antioxidant proteins: 253
Non-antioxidant proteins: 1552
SMOTE 10-CV
97.00%
ORS-Pred
2020 Download ↓

Antioxidant proteins: 434
Non-antioxidant proteins: 1550
/ 8-CV
0.982
/
2019 Download ↓

Antioxidant proteins: 253
Non-antioxidant proteins: 1552
/ jackknife test
0.942
AOPs-SVM
Non-classical secreted proteins(NCSPs) 2021 Download ↓

Training dataset: P-141/ N-446;
Testing dataset: P-34/ N-34;
SMOTE 5-CV
0.896±0.019
Independent testing
0.8088
ASPIRER
2020 / 10-CV
93.23%
Independent testing
86.76%
NonClasGP-Pred
2020 / 5-CV
0.900±0.016
Independent testing
0.779
PeNGaRo
Type IV secreted effector proteins 2022 Download ↓

Training data: P-518/ N-1584;
Testing dataset: P-20/ N-150;
/ 5-CV
90.4±1.4%
Independent testing
96.5%
T4SEfinder
2021 Download ↓

T4SE_train1502: P-390/ N-1112;
T4SE_test180: P-30/ N-150;
Train915: P-305/ N-610;
Test850: P-75/ N-775;
/ jackknife test
Train915: 0.924;
Train1502: 0.950;
Independent testing
Test850: 0.956
Test180: 0.966
iT4SE-EP
2019 Download ↓

Training dataset: P-390/ N-1112;
Testing dataset: P-30/ N-150;
/ 5-CV
0.957±0.025
Independent testing
0.953±0.014
Bastion4
Bioluminescent Proteins 2021 Download ↓

BLP_General: P-4604/ N-7093;
BLP_Archaea: P-66/ N-748;
BLP_Bacteria: P-4362/ N-4919;
BLP_Eukaryota: P-176/ N-1426;
(ps: 70% training dataset, 30% testing dataset)
/ 10-CV
0.939
Independent testing
0.934
/
2021 Download ↓

BLP_General: P-7956/ N-7093;
BLP_Archaea: P-45/ N-748;
BLP_Bacteria: P-748/ N-4919;
BLP_Eukaryota: P-70/ N-1426;
(ps: 70% training dataset, 30% testing dataset)
/ 10-CV
0.850
Independent testing
0.884
iBLP
Cell wall lytic enzymes 2021 Download ↓

Lyases: 68
Non-Lyases: 307
SMOTE jackknife test
99.19%
/
Electron Transport Proteins 2020 Download ↓

Electron transport proteins: 1299
General transport proteins: 4559
/ 5-CV
98.5%
Independent testing
96.82%
FastET
2019 Download ↓

Electron transport proteins: 2678
Non-Electron transport proteins: 9630
/ 10-CV
84.0%
Independent testing
86.9%
/
RNA-Binding proteins 2021 Large Data
Download ↓

RBPs sequences: 72226
Negative sequences: 137003
/ 10-CV
Macro_AUC: 0.932; Micro_AUC: 0.966
rBPDL
2020 Download ↓

Training dataset: RBPs: 2780; Non-RBPs: 7093
Testing dataset: Human: RBPs: 967; Non-RBPs: 597; S. cerevisiae: RBPs: 354; Non-RBPs: 135; A. thaliana: RBPs: 456; Non-RBPs: 37;
SMOTE 10-CV
97.43%
Independent testing
Human: 95.63%;
S. cerevisiae: 88.82%;
A. thaliana: 92.35%;
RBPro-RF
2019 Download ↓

Training dataset:
Human(9606): RBPs: 1625; Non-RBPs: 10834; Salmonella(590): RBPs: 275; Non-RBPs: 1273; E.coli(561): RBPs: 460; Non-RBPs: 3404;
Testing dataset:
Human(9606): RBPs: 181; Non-RBPs: 1204; Salmonella(590): RBPs: 31; Non-RBPs: 142; E.coli(561): RBPs: 52; Non-RBPs: 379;
/ 10-CV
Human(9606): AUC 0.83;
Salmonella(590): AUC 0.86;
E.coli(561): AUC 0.92;
/ TriPepSVM
Plant pentatricopeptide repeat 2021 Download ↓

Training dataset: P-487/ N-9590
/ 10-CV
Training set AUC: 0.966; F1: 0.974
/
2019 / 10-CV
Training set AUC: 0.9848; F-measure: 0.9554
MixedPPR
Sub-Golgi protein 2020 Download ↓

D3-Training dataset: cis-Golgi proteins: 101; trans-Golgi proteins: 217;
D5-Training dataset: cis-Golgi proteins: 135; trans-Golgi proteins: 1063;
D4-Testing dataset: cis-Golgi proteins: 19; trans-Golgi proteins: 51;
SMOTE LOO
D3 92.6%
D5 99.2%
Independent testing
D3-model: 98.4%
D5-model: 96.42%
isGP-DRLF
Type III secretion systems 2020 Download ↓

Training dataset1: P-283/ N-313;
Training dataset2: P-379/ N-1112;
Independent dataset 1: P-35/ N-86;
Independent dataset 2: P-83/ N-14;
Independent dataset 3: P-108/ N-108;
Independent dataset 4: P-226/ N-913;
SMOTE Independent testing
EP3_1_model:
Dataset 1: 0.967; Dataset 2: 0.887; Dataset 3: 0.773; Dataset 4: 0.895;
EP3_2_model:
Dataset 1: 0.818; Dataset 2: 0.629; Dataset 3: 0.922; Dataset 4: 0.838
EP3
Secretory Proteins 2019 Download ↓

Secretory proteins: 35
Non-secretory protein: 266
SMOTE 5-CV
91.6%
Independent testing
86.00%
SecProMTB
Modification

Equal length sequence
Multiple PTMs 2020 Large Data
Download ↓

Training dataset:
Phosphoserine/threonine: P-135556/ N-2803647;
Phosphotyrosine: P-9427/ N-93291;
N-linked glycosylation: P-90344/ N-511755;
O-linked glycosylation: P-4216/ N-103771;
N6-acetyllysine: P-22355/ N-274668;
Methylarginine: P-4675/ N-99946;
Methyllysine: P-2781/ N-45524;
S-palmitoylation-cysteine: P-3812/ N-26573;
Pyrrolidone-carboxylic-acid: P-1394/ N-10528;
Ubiquitination: P-3707/ N-49963;
SUMOylation: P-1225/ N-23932;
Hydroxylysine: P-356/ N-2650;
Hydroxyproline: P-2773/ N-11761;
Testing dataset:
Phosphoserine/threonine: P-8759/ N-230755;
Phosphotyrosine: P-499/ N-5540;
N-linked glycosylation: P-20522/ N-120384;
O-linked glycosylation: P-218/ N-16248;
N6-acetyllysine: P-683/ N-11371;
Methylarginine: P-269/ N-6859;
Methyllysine: P-154/ N-2001;
S-palmitoylation-cysteine: P-151/ N-684;
Pyrrolidone-carboxylic-acid: P-230/ N-8918;
Ubiquitination: P-514/ N-6621;
SUMOylation: P-65/ N-1310;
Hydroxylysine: P-9/ N-37;
Hydroxyproline: P-422/ N-814;
/ 10-CV
NA
Independent testing
Phosphoserine/threonine: AUC 0.896;
Phosphotyrosine: AUC 0.958;
N-linked glycosylation: AUC 0.993;
O-linked glycosylation: AUC 0.943;
N6-acetyllysine: AUC 0.978;
Methylarginine: AUC 0.941;
Methyllysine: AUC 0.951;
S-palmitoylation-cysteine: AUC 0.961;
Pyrrolidone-carboxylic-acid: AUC 0.979;
Ubiquitination: AUC 0.804;
SUMOylation: AUC 0.990;
Hydroxylysine: AUC 0.982;
Hydroxyproline: AUC 0.732;
MusiteDeep
Acetylation 2019 Download ↓

Acetylation sites samples: 725
Non-acetylation sites samples: 2715
/ 5-CV
77.10%
iAcetyp
2019 Large Data
Download ↓

E.coli:
Training dataset: P-6592/ N-15060;
Testing dataset: P-361/ N-1384;
C.glutamicum:
Training dataset: P-1052/ N-6129;
Testing dataset: P-83/ N-830;
M.tuberculosis:
Training dataset: P-865/ N-5167;
Testing dataset: P-68/ N-576;
B.subtilis:
Training dataset: P-1571/ N-12173;
Testing dataset: P-125/ N-1165;
S.typhimurium:
Training dataset: P-198/ N-2477;
Testing dataset: P-10/ N-217;
G.kaustophilus:
Training dataset: P-206/ N-1812;
Testing dataset: P-17/ N-192;
/ 10-CV
E.coli: 0.772;
C.glutamicum: 0.756;
M.tuberculosis: 0.783;
B.subtilis: 0.719;
S.typhimurium: 0.821;
G.kaustophilus: 0.807;
Independent testing
E.coli: 0.851;
C.glutamicum: 0.793;
M.tuberculosis: 0.827;
B.subtilis: 0.831;
S.typhimurium: 0.795;
G.kaustophilus: 0.809;
PAPred
Carbonylation 2021 Large Data
Download ↓

K
Training dataset: P-618/ N-26995;
Testing dataset: P-117/ N-7439;
P
Training dataset: P-162/ N-22418;
Testing dataset: P-16/ N-5318;
R
Training dataset: P-204/ N-22849;
Testing dataset: P-54/ N-5966;
T
Training dataset: P-191/ N-24271;
Testing dataset: P-24/ N-6507;
SMOTE-KSU 10-CV
K: 82.73%;
P: 82.72%;
R: 83.16%;
T: 85.37%;
Independent testing
K: 98.21%;
P: 97.92%;
R: 97.96%;
T: 98.94%;
CarSite‑II
2021 Download ↓

K
Training dataset: P-266/ N-1802;
Testing dataset: P-34/ N-147;
P
Training dataset: P-114/ N-716;
Testing dataset: P-12/ N-76;
R
Training dataset: P-119/ N-754;
Testing dataset: P-17/ N-93;
T
Training dataset: P-116/ N-702;
Testing dataset: P-5/ N-30;
/ 10-CV
K: AUC 0.789;
P: AUC 0.814;
R: AUC 0.726;
T: AUC 0.790;
Independent testing
K: AUC 0.756;
P: AUC 0.752;
R: AUC 0.6495;
T: AUC 0.8400;
iCarPS
2020 Download ↓

K: P-300/ N-1949; P: P-126/ N-792; R: P-136/ N-847; T: P-121/ N-732;
/ 10-CV
K: 84.43%; P: 86.79%; R: 84.23%; T: 86.17%;
iCar-PseCp
Glutarylation 2021 Download ↓

Training dataset: P-400/ N-1703
Testing dataset: P-44/ N-203
SMOTE-Tomek 10-CV
79.98%
Independent testing
72.07%
/
Glycosylation 2021 Download ↓

Training dataset:
N-linked glycosylation sites: 495; Non-N-linked glycosylation sites: 5018;
Testing dataset: N-linked glycosylation sites: 103; Non-N-linked glycosylation sites: 1019;
/ 5-CV
93.4%
Independent testing
92.9%
NIonPred
2019 Download ↓

First-stage Training dataset: P-629/ N-5566;
Second-stage Training dataset: P-2050/ N-1030;
Testing dataset: P-167/ N-280;
/ 10-CV
0.733
Independent testing
0.740
N-GlyDE
Hydroxylation 2016 Download ↓

HyP sites
Hydroxyproline sites: 851
Non-Hydroxyproline sites: 3505
HyL sites
Hydroxylysine sites: 142
Non-Hydroxylysine sites: 980
/ jackknife test
HyP sites: 96.58%
HyL sites: 97.08%
iHyd-PseCp
Malonylation 2020 Download ↓

Mammalian
Training dataset: P-5006/ N-76264;
Testing dataset: P-1252/ N-19066;
Plant
Training dataset: P-196/ N-2394;
Testing dataset: P-82/ N-1195;
/ 10-CV
Mammalian: 0.764
Plant: 0.660
Independent testing
Mammalian: 0.866
Plant: 0.691
Kmalo
Methylation 2020 Download ↓

Training dataset: P-8344/ N-244600
Testing dataset: P-2085/ N-61150
Undersampling 5-CV
0.76
Independent testing
0.75
DeepRMethylSite
Palmitoylation 2020 Download ↓

S-palmitoylation sites: 3089
Non-S-palmitoylation sites: 18992
/ 10-CV
Human: AUC 0.900
Mouse: AUC 0.897
GPS-Palm
Phosphorylation 2020 Large Data
Download ↓

Training dataset:
S/T: P-165787/ N-879507; Y: P-28965/ N-134997;
Testing dataset:
S/T: P-18588/ N-102113; Y: P-3248/ N-14504;
/ S/T: AUC 0.82;
Y: AUC 0.73;
NA DeepPSP
Pupylation 2018 Download ↓

Training dataset: Pupylated lysine: 183; Non-pupylated lysine: 2258;
Testing dataset: Pupylated lysine: 29; Non-pupylated lysine: 408;
/ 10-CV
95.09%
Independent testing
83.75%
/
S-nitrosylation 2019 Download ↓

Training dataset: SNO sites: 33833; Non-SNO sites: 17165;
Testing dataset: SNO sites: 351; Non-SNO sites: 3168;
/ 5-CV
0.70
Independent testing
0.752
PreSNO
2019 Download ↓

Training dataset: SNO sites: 731; Non-SNO sites: 810;
Testing dataset: SNO sites: 124; Non-SNO sites: 221;
/ 5-CV
83.11%
Independent testing
73.17%
/
2018 Download ↓

Training dataset:
Tyrosine nitration: P-1210/ N-8043;
Tryptophan nitration: P-66/ N-155;
S-nitrosylation: P-3409/ N-17453;
Testing dataset:
Tyrosine nitration: P-189/ N-1182;
S-nitrosylation: P-485/ N-4947;
/ 10-CV
Tyrosine nitration: AUC 0.65;
Tryptophan nitration: AUC 0.80;
S-nitrosylation: AUC 0.70
Independent testing
Tyrosine nitration: AUC 0.6879;
Tryptophan nitration: AUC 0.8428;
S-nitrosylation: AUC 0.70
DeepNitro
S-sulfenylation 2021 Download ↓

Training dataset: S-sulfenylation sites: 900; Non-S-sulfenylation sites: 6856;
Testing dataset: S-sulfenylation sites: 145; Non-S-sulfenylation sites: 268;
/ 10-CV
79.9%
Independent testing
77.09%
Sulf-DNN
2018 / 10-CV
89.0%
Independent testing
74.0%
SVM-SulfoSite
Succinylation 2019 Download ↓

Training dataset: Succinylation sites: 3216; Non-Succinylation sites: 16412;
Testing dataset: Succinylation sites: 218; Non-Succinylation sites: 2621;
/ 10-CV
85.68%
Independent testing
86.79%
CNN-SuccSite
SUMOylation 2020 Download ↓

Training dataset: SUMOylation sites: 3363; Non-SUMOylation sites: 123131
/ 10-CV
AUC 0.8472
mUSP
Ubiquitination 2021 Download ↓

Training dataset:
Set1: P-150/ N-150; Set2: P-3419/ N-3419; Set3: P-6118/ N-6118; Set4: P-263/ N-4345; Set5: P-131/ N-639; Set6: P-37/ N-639;
Testing dataset:
Independent test1: P-92/ N-301;
Independent test2: P-176/ N-475;
Independent test3: P-96/ N-666;
SMOTE 5-CV
Set1: AUC 0.8258;
Set2: AUC 0.7592;
Set3: AUC 0.7853;
Set4: AUC 0.9777;
Set5: AUC 0.9782;
Set6: AUC 0.9860;
Independent testing
Independent test1: 78.09%;
Independent test2: 74.19%;
Independent test3: 87.13%;
UbiSite-XGBoost
2021 Download ↓

Training dataset: Ubiquitination sites: 2043; Non-ubiquitination sites: 6130;
Testing dataset: Ubiquitination sites: 511; Non-ubiquitination sites: 1533;
/ 5-CV
85.38%
Independent testing
85.36%
CNNAthUbi

Note: 5-CV: 5-fold cross-validation; 10-CV: 10-fold cross-validation; LOO: leave-one-out cross-validation; P: Positive samples; N: Negative samples;
SMOTE: Synthetic Minority Oversampling Technique;
SMOTE-KSU: K-means similarity-based undersampling combined with the Synthetic Minority Oversampling Technique;
SMOTE-Tomek: Synthetic Minority Oversampling Technique (SMOTE) combined with Tomek-link undersampling;
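
The SMOTE family named in the note above balances a dataset by synthesising extra minority-class samples. Each new sample is interpolated between a minority sample and one of its nearest minority-class neighbours. The sketch below illustrates only that core idea in plain Python. It is not the implementation used by any tool in the table. The function name `smote`, its parameters, and the assumption that sequences have already been encoded as numeric feature vectors are all illustrative.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic points interpolated between minority-class neighbours.

    minority: list of equal-length numeric feature vectors (assumed encoding).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        # Place the new point at a random position on the connecting segment.
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic
```

As a usage illustration, the SecProMTB entry lists 35 secretory against 266 non-secretory proteins, so a call such as `smote(positive_vectors, 266 - 35)` would add 231 synthetic positives, where `positive_vectors` is a hypothetical list of encoded positive samples. In practice oversampling is normally applied to the training folds only, so that synthetic points do not leak into the test data.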