Balanced data |
Group | Type | Literature | Year | Data | Acc | URL | |
---|---|---|---|---|---|---|---|
Function Unequal length sequence |
|||||||
DNA-Binding Proteins | 2021 | Download ↓ UniSwiss-Tr: DBPs: 4500; Non-DBPs: 4500; UniSwiss-Tst: DBPs: 381; Non-DBPs: 381; |
jackknife test 86.84% |
Independent testing 85.83% |
TargetDBP plus | ||
2020 | Download ↓ Training dataset-PDB1075: DBPs: 525; Non-DBPs: 525; Independent testing-PDB186: DBPs: 93; Non-DBPs: 93; |
jackknife test PDB1075: 86.05% |
Independent testing PDB186: 75.30% |
PSSMEI | |||
Gram-negative bacteria | 2021 | Download ↓ T1SS: 195; T2SS: 83; T3SS: 1194; T4SS: 713; T6SS: 181; |
/ | BastionHub | |||
Bacterial Type III Secreted Effectors | 2021 | Download ↓ Training dataset: P-379/ N-1112; Testing dataset: P-108/ N-108; |
5-CV 0.967±0.002 |
Independent testing 0.963±0.034 |
iT3SE-PX | ||
2020 | Download ↓ Training dataset: P-309/ N-310; Testing dataset: P-42/ N-34; |
10-CV 0.941±0.011 |
Independent testing 0.83 |
T3SEpp | |||
Soluble protein | 2022 | Download ↓ Training dataset: P-5718/ N-5718; Testing dataset: P-1550/ N-1550; |
5-CV 0.70±0.02 |
Independent testing 0.728 |
NetSolP | ||
2021 | 5-CV 76.0% |
Independent testing 58.5% |
SoluProt | ||||
Secretory Proteins | 2021 | Download ↓ Secretory proteins: 252 Non-secretory protein: 252 |
10-CV 97.8% |
CRCF | |||
Cancerlectin | 2021 | Download ↓ Cancerlectin sequences: 176; Non-Cancerlectin sequences: 227; Lectins sequences: 462; Non-Lectins sequences: 435; |
5-CV Cancerlectin: 98.54% Lectin: 95.38% |
/ | |||
2020 | Download ↓ Cancerlectin sequences: 178 Non-Cancerlectin sequences: 226 |
10-CV 83.91% |
Cancerlectins | ||||
Phage virion protein | 2020 | Download ↓ Training dataset: P-99/ N-208 Testing dataset: P-30/ N-60 |
jackknife test 0.8958 |
Independent testing 0.7980 |
SVM-PVPData | ||
Cell wall lytic enzymes | 2021 | Download ↓ Lyases: 68 Non-Lyases: 307 |
10-CV 96.09% |
/ | |||
2020 | jackknife test 0.955 |
CWLy-SVM | |||||
Thermophilic Proteins | 2020 | Download ↓ Thermophilic proteins: 915 Non-thermophilic proteins: 793 |
10-CV 98.02% |
Tool | |||
2020 | cross-validation 96.02% |
/ | |||||
2019 | 10-CV 93.03% |
Tool | |||||
Major Histocompatibility Complex | 2019 | Download ↓ Dataset1: Training dataset: MHC protein: 4370; Non-MHC protein: 4370; Testing dataset: MHC protein: 2342; Non-MHC protein: 2342; Dataset2: MHC I: 3350; MHC II: 3362; |
10-CV Dataset1: 91.66%; Dataset2: 94.224% |
Independent testing Dataset1: 93.17% |
ELM-MHC | ||
Amyloid Proteins | 2021 | Download ↓ Amyloid proteins: 165 Non_amyloid proteins: 382 (ps: 80% Training dataset, 20% Testing dataset) |
10-CV 0.895 |
Independent testing 0.827 |
iAMY-SCM | ||
2020 | 10-CV 91.59% |
PredAmyl-MLP | |||||
2018 | 10-CV 89.19% |
Independent testing 89.19% |
RFAmyloid | ||||
Modification Equal length sequence |
Multiple PTMs | 2021 | Download ↓ S1(Ace): 3114; S2(Glyca): 1399; S3(Malon): 1224; S4(Meth): 1147; S5(Succ): 1645; S6(Sumo): 1174; S7(Ubiq): 3185; (ps: 4/5 of all samples were used as training samples and the other 1/5 was utilized as an independent test set) |
10-CV 0.8589 |
Independent testing 0.8549 |
MultiLyGAN | |
Acetylation | 2020 | Download ↓ Archaea: Training dataset: P-193/ N-193; Testing dataset: P-21/ N-21; B.subtilis: Training dataset: P-1040/ N-1040; Testing dataset: P-115/ N-115; C.glutamicum: Training dataset: P-1021/ N-1021; Testing dataset: P-113/ N-113; E.amylovora: Training dataset: P-95/ N-95; Testing dataset: P-10/ N-10; E.coli: Training dataset: P-1919/ N-1919; Testing dataset: P-213/ N-213; G.kaustophilus: Training dataset: P-189/ N-189; Testing dataset: P-21/ N-21; M.tuberculosis: Training dataset: P-866/ N-866; Testing dataset: P-96/ N-96; S.typhimurium: Training dataset: P-174/ N-174; Testing dataset: P-19/ N-19; V.parahemolyticus: Training dataset: P-1065/ N-1065; Testing dataset: P-118/ N-118; ProAcePred Negative data using this dataset |
10-CV Archaea: 84.74%; B.subtilis: 73.89%; C.glutamicum: 75.38%; E.amylovora: 96.89%; E.coli: 63.08%; G.kaustophilus: 89.15%; M.tuberculosis: 76.62%; S.typhimurium: 90.51%; V.parahemolyticus: 75.46%; |
Independent testing Archaea: 90.00%; B.subtilis: 98.26%; C.glutamicum: 92.88%; E.amylovora: 90.00%; E.coli: 86.18%; G.kaustophilus: 97.50%; M.tuberculosis: 96.44%; S.typhimurium: 95.00%; V.parahemolyticus: 94.02%; |
DNNAce | ||
2018 | 10-CV Archaea: 0.9; B.subtilis: 0.796; C.glutamicum: 0.8; E.amylovora: 0.983; E.coli: 0.69; G.kaustophilus: 0.897; M.tuberculosis: 0.834; S.typhimurium: 0.874; V.parahemolyticus: 0.802; |
Independent testing Archaea: 0.81; B.subtilis: 0.952; C.glutamicum: 0.872; E.amylovora: 0.9; E.coli: 0.899; G.kaustophilus: 0.881; M.tuberculosis: 0.88; S.typhimurium: 0.816; V.parahemolyticus: 0.869; |
ProAcePred∗ | ||||
2019 | Large Data Download ↓ Training dataset: P-12886/N-12886 Testing dataset: P-3221/ N-3221 |
10-CV 84.95% |
Independent testing 84.87% |
DeepAcet | |||
Hydroxylation | 2020 | Download ↓ Hydroxylysine sites: 185 Non-Hydroxylysine sites: 497 |
jackknife test 97.24% |
/ | |||
Malonylation | 2020 | Download ↓ Malonylation sites: 1735 Non-malonylation sites: 1735 (ps: 80% Training dataset, 20% Testing dataset) |
5-CV 91.24% |
Independent testing 90.65% |
Mal-Prec | ||
2019 | Download ↓ H. sapiens Training dataset: P-3585/ N-3585; Testing dataset: P-300/ N-300; M. musculus Training dataset: P-2606/ N-2606; Testing dataset: P-300/ N-300; E. coli Training dataset: P-1453/ N-1453; Testing dataset: P-100/ N-100; |
10-CV E. coli: 0.801±0.002 M. musculus: 0.825±0.002 H. sapiens: 0.835±0.002 |
Independent testing E. coli: 0.845 M. musculus: 0.833 H. sapiens: 0.860 |
KMAL-SP | |||
Methylation | 2020 | Download ↓ Single-methylarginine: 1465 double-methylarginine: 474 Negative samples: 39980 |
10-CV Single: 82.1% Double: 82.5% |
Code | |||
2017 | Download ↓ Methylation site R Dataset-1: P-185/ N-185; Dataset-2: P-185/ N-185; Dataset-3: P-185/ N-185; Dataset-4: P-185/ N-185; Dataset-5: P-185/ N-185; Dataset-6: P-185/ N-186; Dataset-7: P-185/ N-185; Methylation site K Dataset-1: P-226/ N-217; Dataset-2: P-226/ N-217; Dataset-3: P-226/ N-217; Dataset-4: P-226/ N-217; Dataset-5: P-226/ N-218; Dataset-6: P-226/ N-217; Dataset-7: P-226/ N-217; |
jackknife test Methylation site R Dataset-1: 80.3%; Dataset-2: 80.3%; Dataset-3: 78.4%; Dataset-4: 80.8%; Dataset-5: 82.2%; Dataset-6: 82.7%; Dataset-7: 80.5%; Average: 80.7%; Methylation site K Dataset-1: 67.7%; Dataset-2: 68.4%; Dataset-3: 72.7%; Dataset-4: 70.0%; Dataset-5: 69.0%; Dataset-6: 69.3%; Dataset-7: 72.5%; Average: 69.9%; |
MePred-RF | ||||
Palmitoylation | 2019 | Download ↓ S-palmitoylation sites: 436 Non-S-palmitoylation sites: 500 |
Self-consistency testing 99.79% |
10-CV 97.22% |
SPalmitoylC-PseAAC∗ | ||
Phosphorylation | 2021 | Download ↓ Training dataset: S: P-4316/ N-4316; T: P-1551/ N-1551; Y: P-553/ N-553; Testing dataset: S: P-2773/ N-17118; T: P-941/ N-6258; Y: P-210/ N-1296; |
10-CV S: 80.38%; T: 80.01%; Y: 77.76%; |
Independent testing S: 78.91%; T: 84.81%; Y: 82.73%; |
DeepPPSite | ||
2021 | Download ↓ Training dataset: S/T: P-4308/ N-4308; Y: P-81/ N-81; Testing dataset: S/T: P-1079/ N-1079; Y: P-21/ N-21 |
5-CV S/T: 80.45% Y: 75.22% |
Independent testing S/T: 80.63% Y: 83.33% |
DeepIPs | |||
2019 | Download ↓ SerD: P-300/ N-300; TyrD: P-110/ N-110; ThrD: P-100/ N-100; |
jackknife test SerD: AUC 0.904 TyrD: AUC 0.992 ThrD: AUC 0.990 |
iPhoPred | ||||
2017 | Download ↓ Training dataset: S-type: P-4316/ N-4316; T-type: P-1551/ N-1551; Y-type: P-553/ N-553; Testing dataset: S-type: P-2273/ N-17618; T-type: P-941/ N-6258; Y-type: P-296/ N-1210; |
10-CV S-type: AUC 0.851 T-type: AUC 0.818 Y-type: AUC 0.761 |
Independent testing S-type: AUC 0.715 T-type: AUC 0.683 Y-type: AUC 0.654 |
PhosPred-RF | |||
Pupylation | 2021 | Download ↓ Training dataset: Pupylated lysine: 186; Non-pupylated lysine: 186; Testing dataset: Pupylated lysine: 87; Non-pupylated lysine: 191; |
10-CV 0.884 |
Independent testing 0.82 |
PUP-Fuse | ||
Succinylation | 2021 | Download ↓ Training dataset: Succinylation sites: 6512; Non-Succinylation sites: 6512; Testing dataset: Succinylation sites: 1479; Non-Succinylation sites: 16457; |
10-CV 79.9% |
NA | LSTMCNNsucc | ||
Tyrosine sulfation | 2019 | Download ↓ Training dataset: P-200/ N-420 Testing dataset: P-80/ N-80 |
10-CV 94.2% |
Independent testing 85.63% |
/ | ||
Ubiquitination | 2021 | Download ↓ H.sapiens: P-31162/ N-31162; M.musculus: P-7746/ N-7748; S.cerevisiae: P-3506/ N-3506; R.norvegicus: P-1226/ N-1226; A.nidulans: P-2299/ N-2299; A.thaliana: P-586/ N-587; T.gondii: P-424/ N-424; O.sativa: P-308/ N-308; (ps: Training dataset 90% and Independent test dataset 10%) |
H. sapiens: 0.578; M.musculus: 0.604; R.norvegicus: 0.578; S.cerevisiae: 0.593; A. thaliana: 0.525; O. sativa: 0.500; T.gondii: 0.538; A. nidulans: 0.636; |
Independent testing H.sapiens: AUC 0.753; M.musculus: AUC 0.789; R.norvegicus: AUC 0.72; S.cerevisiae: AUC 0.772; T.gondii: AUC 0.824; A.thaliana: AUC 0.814; |
DeepTL-Ubi | ||
2019 | Download ↓ Set1: P-150/ N-150; Set2: P-3418/ N-3418; Set3: P-6117/ N-6117; |
5-CV Set1: 98.33%; Set2: 81.12%; Set3: 76.90% |
UbiSitePred |
Note: 5-CV: 5-fold cross-validation; 10-CV: 10-fold cross-validation; P: positive samples; N: negative samples;
∗Web-server/Code is not available at the time of writing
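
The validation schemes named in the table (5-CV, 10-CV, jackknife test) differ only in how the samples are partitioned. The following is a minimal sketch, not taken from any of the listed tools: the function names `k_fold_indices` and `jackknife_indices` are hypothetical, and it assumes the jackknife test is understood as leave-one-out, which is the usual meaning in this literature.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at i,
    # so fold sizes differ by at most one sample.
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def jackknife_indices(n_samples):
    """Jackknife (leave-one-out): each sample is the test set exactly once."""
    return k_fold_indices(n_samples, n_samples)


# Illustrative toy sizes, not one of the datasets in the table.
splits_5cv = list(k_fold_indices(10, 5))
splits_jk = list(jackknife_indices(10))
```

For the balanced datasets above, a plain shuffle usually keeps class proportions close to equal in each fold; a stratified split would be a safer choice when the classes are uneven.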
Imbalanced data |
Group | Type | Literature | Year | Data | Imbalance Algorithms | Acc | URL | |
---|---|---|---|---|---|---|---|---|
Function Unequal length sequence |
Antioxidant proteins | 2021 | Download ↓ Antioxidant proteins: 253 Non-antioxidant proteins: 1552 |
SMOTE | 10-CV 97.00% |
ORS-Pred | ||
2020 | Download ↓ Antioxidant proteins: 434 Non-antioxidant proteins: 1550 |
/ | 8-CV 0.982 |
/ | ||||
2019 | Download ↓ Antioxidant proteins: 253 Non-antioxidant proteins: 1552 |
/ | jackknife test 0.942 |
AOPs-SVM | ||||
Non-classical secreted proteins(NCSPs) | 2021 | Download ↓ Training dataset: P-141/ N-446; Testing dataset: P-34/ N-34; |
SMOTE | 5-CV 0.896±0.019 |
Independent testing 0.8088 |
ASPIRER | ||
2020 | / | 10-CV 93.23% |
Independent testing 86.76% |
NonClasGP-Pred | ||||
2020 | / | 5-CV 0.900±0.016 |
Independent testing 0.779 |
PeNGaRo | ||||
Type IV secreted effector proteins | 2022 | Download ↓ Training data: P-518/ N-1584; Testing dataset: P-20/ N-150; |
/ | 5-CV 90.4±1.4% |
Independent testing 96.5% |
T4SEfinder | ||
2021 | Download ↓ T4SE_train1502: P-390/ N-1112; T4SE_test180: P-30/ N-150; Train915: P-305/ N-610; Test850: P-75/ N-775; |
/ | jackknife test Train915: 0.924; Train1502: 0.950; |
Independent testing Test850: 0.956 Test180: 0.966 |
iT4SE-EP | |||
2019 | Download ↓ Training dataset: P-390/ N-1112; Testing dataset: P-30/ N-150; |
/ | 5-CV 0.957±0.025 |
Independent testing 0.953±0.014 |
Bastion4 | |||
Bioluminescent Proteins | 2021 | Download ↓ BLP_General: P-4604/ N-7093; BLP_Archaea: P-66/ N-748; BLP_Bacteria: P-4362/ N-4919; BLP_Eukaryota: P-176/ N-1426; (ps: 70% training dataset, 30% testing dataset) |
/ | 10-CV 0.939 |
Independent testing 0.934 |
/ | ||
2021 | Download ↓ BLP_General: P-7956/ N-7093; BLP_Archaea: P-45/ N-748; BLP_Bacteria: P-748/ N-4919; BLP_Eukaryota: P-70/ N-1426; (ps: 70% training dataset, 30% testing dataset) |
/ | 10-CV 0.850 |
Independent testing 0.884 |
iBLP | |||
Cell wall lytic enzymes | 2021 | Download ↓ Lyases: 68 Non-Lyases: 307 |
SMOTE | jackknife test 99.19% |
/ | |||
Electron Transport Proteins | 2020 | Download ↓ Electron transport proteins: 1299 General transport proteins: 4559 |
/ | 5-CV 98.5% |
Independent testing 96.82% |
FastET | ||
2019 | Download ↓ Electron transport proteins: 2678 Non-Electron transport proteins: 9630 |
/ | 10-CV 84.0% |
Independent testing 86.9% |
/ | |||
RNA-Binding proteins | 2021 | Large Data Download ↓ RBPs sequences: 72226 Negative sequences: 137003 |
/ | 10-CV Macro_AUC: 0.932; Micro_AUC: 0.966 |
rBPDL | |||
2020 | Download ↓ Training dataset: RBPs: 2780; Non-RBPs: 7093 Testing dataset: Human: RBPs: 967; Non-RBPs: 597; S. cerevisiae RBPs: 354; Non-RBPs: 135; A. thaliana RBPs: 456; Non-RBPs: 37; |
SMOTE | 10-CV 97.43% |
Independent testing Human: 95.63%; S. cerevisiae: 88.82%; A. thaliana: 92.35%; |
RBPro-RF | |||
2019 | Download ↓ Training dataset: Human(9606): RBPs: 1625; Non-RBPs: 10834; Salmonella(590): RBPs: 275; Non-RBPs: 1273; E.Coli(561): RBPs: 460; Non-RBPs: 3404; Testing dataset: Human(9606): RBPs: 181; Non-RBPs: 1204; Salmonella(590): RBPs: 31; Non-RBPs: 142; E.Coli(561): RBPs: 52; Non-RBPs: 379; |
/ | 10-CV Human(9606): AUC 0.83; Salmonella(590): AUC 0.86; E.Coli(561): AUC 0.92; |
/ | TriPepSVM | |||
Plant pentatricopeptide repeat | 2021 | Download ↓ Training dataset: P-487/ N-9590 |
/ | 10-CV Training set AUC: 0.966; F1: 0.974 |
/ |
2019 | / | 10-CV Training set AUC: 0.9848; F-measure: 0.9554 |
MixedPPR | |||||
Sub-Golgi protein | 2020 | Download ↓ D3-Training dataset: cis-Golgi proteins: 101; trans-Golgi proteins: 217; D5-Training dataset: cis-Golgi proteins: 135; trans-Golgi proteins: 1063; D4-Testing dataset: cis-Golgi proteins: 19; trans-Golgi proteins: 51; |
SMOTE | LOO D3 92.6% D5 99.2% |
Independent testing D3-model: 98.4% D5-model: 96.42% |
isGP-DRLF | ||
Type III secretion systems | 2020 | Download ↓ Training dataset1: P-283/ N-313; Training dataset2: P-379/ N-1112; Independent dataset 1: P-35/ N-86; Independent dataset 2: P-83/ N-14; Independent dataset 3: P-108/ N-108; Independent dataset 4: P-226/ N-913; |
SMOTE | Independent testing EP3_1_model: Dataset 1: 0.967; Dataset 2: 0.887; Dataset 3: 0.773; Dataset 4: 0.895; EP3_2_model: Dataset 1: 0.818; Dataset 2: 0.629; Dataset 3: 0.922; Dataset 4: 0.838 |
EP3 | |||
Secretory Proteins | 2019 | Download ↓ Secretory proteins: 35 Non-secretory protein: 266 |
SMOTE | 5-CV 91.6% |
Independent testing 86.00% |
SecProMTB | ||
Modification Equal length sequence |
Multiple PTMs | 2020 | Large Data Download ↓ Training dataset: Phosphoserine/threonine: P-135556/ N-2803647; Phosphotyrosine: P-9427/ N-93291; N-linked glycosylation: P-90344/ N-511755; O-linked glycosylation: P-4216/ N-103771; N6-acetyllysine: P-22355/ N-274668; Methylarginine: P-4675/ N-99946; Methyllysine: P-2781/ N-45524; S-palmitoylation-cysteine: P-3812/ N-26573; Pyrrolidone-carboxylic-acid: P-1394/ N-10528; Ubiquitination: P-3707/ N-49963; SUMOylation: P-1225/ N-23932; Hydroxylysine: P-356/ N-2650; Hydroxyproline: P-2773/ N-11761; Testing dataset: Phosphoserine/threonine: P-8759/ N-230755; Phosphotyrosine: P-499/ N-5540; N-linked glycosylation: P-20522/ N-120384; O-linked glycosylation: P-218/ N-16248; N6-acetyllysine: P-683/ N-11371; Methylarginine: P-269/ N-6859; Methyllysine: P-154/ N-2001; S-palmitoylation-cysteine: P-151/ N-684; Pyrrolidone-carboxylic-acid: P-230/ N-8918; Ubiquitination: P-514/ N-6621; SUMOylation: P-65/ N-1310; Hydroxylysine: P-9/ N-37; Hydroxyproline: P-422/ N-814; |
/ | 10-CV NA |
Independent testing Phosphoserine/threonine: AUC 0.896; Phosphotyrosine: AUC 0.958; N-linked glycosylation: AUC 0.993; O-linked glycosylation: AUC 0.943; N6-acetyllysine: AUC 0.978; Methylarginine: AUC 0.941; Methyllysine: AUC 0.951; S-palmitoylation-cysteine: AUC 0.961; Pyrrolidone-carboxylic-acid: AUC 0.979; Ubiquitination: AUC 0.804; SUMOylation: AUC 0.990; Hydroxylysine: AUC 0.982; Hydroxyproline: AUC 0.732; |
MusiteDeep | |
Acetylation | 2019 | Download ↓ Acetylation sites samples: 725 Non-acetylation sites samples: 2715 |
/ | 5-CV 77.10% |
iAcetyp | |||
2019 | Large Data Download ↓ E.coli: Training dataset: P-6592/ N-15060; Testing dataset: P-361/ N-1384; C.glutamicum: Training dataset: P-1052/ N-6129; Testing dataset: P-83/ N-830; M.tuberculosis: Training dataset: P-865/ N-5167; Testing dataset: P-68/ N-576; B.subtilis: Training dataset: P-1571/ N-12173; Testing dataset: P-125/ N-1165; S.typhimurium: Training dataset: P-198/ N-2477; Testing dataset: P-10/ N-217; G.kaustophilus: Training dataset: P-206/ N-1812; Testing dataset: P-17/ N-192; |
/ | 10-CV E.coli: 0.772; C.glutamicum: 0.756; M.tuberculosis: 0.783; B.subtilis: 0.719; S.typhimurium: 0.821; G.kaustophilus: 0.807; |
Independent testing E.coli: 0.851; C.glutamicum: 0.793; M.tuberculosis: 0.827; B.subtilis: 0.831; S.typhimurium: 0.795; G.kaustophilus: 0.809; |
PAPred∗ | |||
Carbonylation | 2021 | Large Data Download ↓ K Training dataset: P-618/ N-26995; Testing dataset: P-117/ N-7439; P Training dataset: P-162/ N-22418; Testing dataset: P-16/ N-5318; R Training dataset: P-204/ N-22849; Testing dataset: P-54/ N-5966; T Training dataset: P-191/ N-24271; Testing dataset: P-24/ N-6507; |
SMOTE-KSU | 10-CV K: 82.73%; P: 82.72%; R: 83.16%; T: 85.37%; |
Independent testing K: 98.21%; P: 97.92%; R: 97.96%; T: 98.94%; |
CarSite‑II | ||
2021 | Download ↓ K Training dataset: P-266/ N-1802; Testing dataset: P-34/ N-147; P Training dataset: P-114/ N-716; Testing dataset: P-12/ N-76; R Training dataset: P-119/ N-754; Testing dataset: P-17/ N-93; T Training dataset: P-116/ N-702; Testing dataset: P-5/ N-30; |
/ | 10-CV K: AUC 0.789; P: AUC 0.814; R: AUC 0.726; T: AUC 0.790; |
Independent testing K: AUC 0.756; P: AUC 0.752; R: AUC 0.6495; T: AUC 0.8400; |
iCarPS | |||
2020 | Download ↓ K P-300/ N-1949; P P-126/ N-792; R P-136/ N-847; T P-121/ N-732; |
/ | 10-CV K: 84.43%; P: 86.79%; R: 84.23%; T: 86.17%; |
iCar-PseCp | ||||
Glutarylation | 2021 | Download ↓ Training dataset: P-400/ N-1703 Testing dataset: P-44/ N-203 |
SMOTE-Tomek | 10-CV 79.98% |
Independent testing 72.07% |
/ | ||
Glycosylation | 2021 | Download ↓ Training dataset: N-linked glycosylation site: 495; Non-N-linked glycosylation site: 5018; Testing dataset: N-linked glycosylation site: 103; Non-N-linked glycosylation site: 1019; |
/ | 5-CV 93.4% |
Independent testing 92.9% |
NIonPred | ||
2019 | Download ↓ First_stage Training dataset: P-629/ N-5566; Second-stage Training dataset: P-2050/ N-1030 Testing dataset: P-167/ N-280; |
/ | 10-CV 0.733 |
Independent testing 0.740 |
N-GlyDE | |||
Hydroxylation | 2016 | Download ↓ HyP sites Hydroxyproline sites: 851 Non-Hydroxyproline sites: 3505 HyL sites Hydroxylysine sites: 142 Non-Hydroxylysine sites: 980 |
/ | jackknife test HyP sites: 96.58% HyL sites: 97.08% |
iHyd-PseCp | |||
Malonylation | 2020 | Download ↓ Mammalian Training dataset: P-5006/ N-76264; Testing dataset: P-1252/ N-19066; Plant Training dataset: P-196/ N-2394; Testing dataset: P-82/ N-1195; |
/ | 10-CV Mammalian: 0.764 Plant: 0.660 |
Independent testing Mammalian: 0.866 Plant: 0.691 |
Kmalo | ||
Methylation | 2020 | Download ↓ Training dataset: P-8344/ N-244600 Testing dataset: P-2085/ N-61150 |
Undersampling | 5-CV 0.76 |
Independent testing 0.75 |
DeepRMethylSite | ||
Palmitoylation | 2020 | Download ↓ S-palmitoylation sites: 3089 Non-S-palmitoylation sites: 18992 |
/ | 10-CV Human: AUC 0.900 Mouse: AUC 0.897 |
GPS-Palm | |||
Phosphorylation | 2020 | Large Data Download ↓ Training dataset: S/T: P-165787/ N-879507; Y: P-28965/ N-134997; Testing dataset: S/T: P-18588/ N-102113; Y: P-3248/ N-14504; |
/ | S/T: AUC 0.82; Y: AUC 0.73; |
NA | DeepPSP | ||
Pupylation | 2018 | Download ↓ Training dataset: Pupylated sites: 183; Non-pupylated lysine: 2258; Testing dataset: Pupylated lysine: 29; Non-pupylated lysine: 408; |
/ | 10-CV 95.09% |
Independent testing 83.75% |
/ | ||
S-nitrosylation | 2019 | Download ↓ Training dataset: SNO sites: 33833; Non-SNO sites: 17165; Testing dataset: SNO sites: 351; Non-SNO sites: 3168; |
/ | 5-CV 0.70 |
Independent testing 0.752 |
PreSNO | ||
2019 | Download ↓ Training dataset: SNO sites: 731; Non-SNO sites: 810; Testing dataset: SNO sites: 124; Non-SNO sites: 221; |
/ | 5-CV 83.11% |
Independent testing 73.17% |
/ | |||
2018 | Download ↓ Training dataset: Tyrosine nitration: P-1210/ N-8043; Tryptophan nitration: P-66/ N-155; S-nitrosylation: P-3409/ N-17453; Testing dataset: Tyrosine nitration: P-189/ N-1182; S-nitrosylation: P-485/ N-4947; |
/ | 10-CV Tyrosine nitration: AUC 0.65; Tryptophan nitration: AUC 0.80; S-nitrosylation: AUC 0.70 |
Independent testing Tyrosine nitration: AUC 0.6879; Tryptophan nitration: AUC 0.8428; S-nitrosylation: AUC 0.70 |
DeepNitro | |||
S-sulfenylation | 2021 | Download ↓ Training dataset: S-sulfenylation sites: 900; Non-S-sulfenylation sites: 6856; Testing dataset: S-sulfenylation sites: 145; Non-S-sulfenylation sites: 268; |
/ | 10-CV 79.9% |
Independent testing 77.09% |
Sulf-DNN | ||
2018 | / | 10-CV 89.0% |
Independent testing 74.0% |
SVM-SulfoSite | ||||
Succinylation | 2019 | Download ↓ Training dataset: Succinylation sites: 3216; Non-Succinylation sites: 16412; Testing dataset: Succinylation sites: 218; Non-Succinylation sites: 2621; |
/ | 10-CV 85.68% |
Independent testing 86.79% |
CNN-SuccSite∗ | ||
SUMOylation | 2020 | Download ↓ Training dataset: SUMOylation sites: 3363; Non-SUMOylation sites: 123131 |
/ | 10-CV AUC 0.8472 |
mUSP∗ | |||
Ubiquitination | 2021 | Download ↓ Training dataset: Set1: P-150/ N-150; Set2: P-3419/ N-3419; Set3: P-6118/ N-6118; Set4: P-263/ N-4345; Set5: P-131/ N-639; Set6: P-37/ N-639; Testing dataset: Independent test1: P-92/ N-301; Independent test2: P-176/ N-475; Independent test3: P-96/ N-666; |
SMOTE | 5-CV Set1: AUC 0.8258; Set2: AUC 0.7592; Set3: AUC 0.7853; Set4: AUC 0.9777; Set5: AUC 0.9782; Set6: AUC 0.9860; |
Independent testing Independent test1: 78.09%; Independent test2: 74.19%; Independent test3: 87.13%; |
UbiSite-XGBoost | ||
2021 | Download ↓ Training dataset: Ubiquitination site: 2043; Non-ubiquitination sites: 6130; Testing dataset: Ubiquitination site: 511; Non-ubiquitination sites: 1533; |
/ | 5-CV 85.38% |
Independent testing 85.36% |
CNNAthUbi | |||
The other imbalanced classification data of Protein post-translational modifications: ImbClassi_PTMs |
Note: 5-CV: 5-fold cross-validation; 10-CV: 10-fold cross-validation; LOO: leave-one-out cross-validation; P: positive samples; N: negative samples;
SMOTE: Synthetic Minority Oversampling Technique;
SMOTE-KSU: K-means similarity-based undersampling combined with the Synthetic Minority Oversampling Technique;
SMOTE-Tomek: Synthetic Minority Oversampling Technique (SMOTE) combined with Tomek-links undersampling;
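
To make the SMOTE entry in the note concrete, here is a minimal sketch of its core step: each synthetic minority sample is placed at a random point on the segment between a minority sample and one of its nearest minority neighbours. This is an illustration under stated assumptions, not the implementation used by any tool in the table. The function name `smote`, its parameters, and the toy feature vectors are hypothetical, and the undersampling halves of SMOTE-KSU and SMOTE-Tomek are not shown.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic points by interpolating between minority samples.

    minority: list of feature vectors (lists of floats), at least two samples.
    k: number of nearest minority neighbours to choose from.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest minority neighbours of x by Euclidean distance, excluding x itself.
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: math.dist(x, p),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from x to nb
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic


# Toy minority class: the four corners of the unit square (illustrative only).
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority, n_new=6, k=2)
```

Because every synthetic point is a convex combination of two real minority samples, it stays inside the region the minority class already occupies. Oversampling of this kind is normally applied to the training folds only, so that test samples remain untouched real data.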