Summary of Imbalance Classification of Protein PTMs (ImbClassi_PTMs)
Columns: Protein PTMs; Tools/Web-servers; Datasets (Pos/Neg, Resources, Cut-off, Len); Features (Extraction, Selection); Imbalance; Algorithms; Performance (Exp., Sn(%), Sp(%), Acc(%), MCC, AUC, Others/Note); DOI; Year
Lysine Glutarylation BiPepGlut All: 723/1923 PLMD 0.4 21 Bi-Peptide-Based Evolutionary Features ET Over-sampling
ET
10-CV
Inde
70.0
84.8
92.9
95.6
81.5
92.0
0.64
0.82
F1: 0.79
F1: 0.88
10.3390/genes11091023 2020
iGlu_AdaBoost Train: 400/1703;
Test: 44/203
PLMD, NCBI, SWISS-Prot 0.4 23 188D, CKSAAP, EAAC CHI2, IFS SMOTE-Tomek
AdaBoost
10-CV
Inde
87.48
72.73
72.49
71.92
79.98
72.07
0.61
0.36
0.89
0.63
Pre: 76.07%, F1: 0.81
Pre: 35.96%, F1: 0.48
10.1021/acs.jproteome.0c00314 2021
RF-GlutarySite * Train: 400/1703;
Test: 44/203
PLMD, NCBI, SWISS-Prot 0.4 23 AAIndex, AAfactor, FEBS XGBoost Under-sampling
RF
10-CV
Inde
81.00
73.00
68.00
70.00
75.00
72
0.50
0.43
0.81
0.81
10.1039/C9MO00028C 2019
MDDGlutar * Train: 430/860
Test: 46/92
PLMD 0.4 21 AAC, AAPC, CKSAAP MDD
SVM
5-CV
Inde
0.677
0.677
0.619
0.619
0.638
0.638
0.28
0.28
10.1186/s12859-018-2394-9 2019
Lysine Succinylation HybridSucc Train: 21,770/165,071
Predict: 8710 proteins (23866 sites)
PLMD3.0, PhosphoSitePlus, dbPTM 0.4 21 PseAAC, CKSAAP, OBC, AAIndex, ACF, GPS, PSSM, ASA, SS, BTA PLR
DNN
10-CV General: AUC: 0.885; Human-specific: AUC: 0.952 10.1016/j.gpb.2019.11.010 2020
Inspector Train: 4755/50,565
Test: 254/2977
Pos: UniProtKB/Swiss-Prot; Neg: NCBI
0.3 35 DBPB, PSDAAP, PseAAC, PWAA, EGAAC, CKSAAGP F-score, IFS ENN, ADASYN
RF
10-CV
Inde
0.876
0.693
0.917
0.717
90.4
0.715
0.981
0.238
10.1016/j.ab.2020.113592 2020
SSKM_Succ Train: 4695/47,027(Unlabelled)
Validation:1815/24509(Unlabelled)
Test: 2608/1050(Unlabelled)
Train:PLMD,UniProt;
Others:dbPTM
0.3 21 proximal PTMs, Grey model, DNC, PSAAP RF Semi-supervised (Clustering SVM) Inde 82.21 75.15 80.19 0.546 10.1109/TCBB.2020.3006144 2020
CNN-SuccSite * Train: 3216/16,412
Test: 218/2621
PLMD 0.3 31 PSAAC, CKSAAP(Top 400 by mRMR), PSSM MDD
CNN
10-CV
Inde
86.94
84.40
85.43
86.99
85.68
86.79
0.608
0.489
10.1038/s41598-019-52552-4 2019
PSuccE Train: 4755/50,565
Test: 254/2977
Pos: UniProtKB/Swiss-Prot; Neg: NCBI 0.3 35 AAC, BE, PCP, GPAAC IG Bootstrap Sampling
Ensemble SVM
10-CV
Inde
84.31
88.6
93.73
37.5
89.14
84.5
0.79
0.2
10.1186/s12859-018-2249-4 2018
Cysteine
S-sulfenylation
SulSite-GTB Train: 1031/8028
Test: 216/1418
Carroll Lab, RedoxDB, UniProtKB 0.4 21 AAC, DC, EBGW, KNN, PSAAP, PsePSSM, PWAAC LASSO SMOTE
Gradient tree boosting
5-CV
Inde
87.21
84.10
98.61
93.02
92.86
88.53
0.8631
0.774
0.9706
0.9425
10.1007/s00521-020-04792-z 2020
Butt's work * Train: 900/6856 NCBI
Test: 216/1418
Train: NCBI
Test: Carroll Lab, RedoxDB, UniProtKB
0.4 21 SVV, M-matrix, PRIM, R‑PRIM, FV, AAPIV, RAAPIV NN (GDAL) 10-CV
Inde
88.72
0.8653
98.09
0.8333
96.89
0.56
0.8618
0.9309
10.1007/s10989-019-09931-2 2019
Fu-SulfPred Xu: Train: 900/6856
      Test: 145/268
Bui: Train: 1031/8028
      Test: 216/1418
PRESS: Train:-
            Test: 44/266
NCBI(Xu)
Carroll Lab, RedoxDB, UniProtKB(Bui)
PDB(PRESS)
0.4 21 PSAAP, AAPPI Category-based Resampling
Multi-Forest
Inde
Inde
Inde
85.15
80.79
91.99
68.62
67.20
84.09
79.34
79.00
90.87
0.5437
0.3736
0.6809
10.1016/j.jtbi.2018.10.046 2019
Sulf_FSVM * Train: 900/6856
Test: 216/1418
Carroll Lab, RedoxDB, UniProtKB 0.4 21 AAC, BE, CKSAAP mRMR Fuzzy SVM 10-CV
Inde
73.26
79.81
70.78
79.69
71.07
0.2971
0.4461


10.1016/j.jtbi.2018.08.022 2018
PredCSO Train: 228/757
Test: 51/200
PDB 0.9 11 PSAAP, PSSM, ASA, FBS2P mR, SBE Bootstrap resampling
Ensemble GTB by voting
10-CV
Inde
87.85
88.2
87.0
89.0
87.2
88.8
0.684
0.702


Pre: 67.1%, F1: 0.762
10.1039/c8mo00089a 2018
Citrullination PCSPred_SC Train: 116/232
Test: 138/150
Train: Zhang2017
Test: UniProt
21 BE, PSAAP, PseAAC, PP t-SNE ADASYN
SVM
10-CV
Inde
94.80
92.8
93.10
93.3
93.70
93.1
0.862
0.861
0.997
0.995

10.1109/ACCESS.2020.2992672 2020

Lysine Glycation
BPB_GlySite * Train: 223/446 CPLM, UniProt 0.4 15 BPB SVM 10-CV 63.68 72.60 69.63 0.3499 0.7622 10.1016/j.compbiolchem.2017.10.004 2017
Gly-PseAAC Train: 223/446 CPLM, UniProt 0.4 15 PSAAP SVM 10-CV 57.48 74.30 68.69 0.3166 0.7199 10.1016/j.gene.2016.11.021 2017
Lysine Sumoylation C-iSUMO Train: 780/21353 CPLM 0.4 31 ASA, TA NearMiss
AdaBoost
10-CV 0.719 0.758 0.738 0.478 10.1016/j.compbiolchem.2020.107235 2020
SUMO-Forest Train: 755/9944 UniProt 0.4 21 PSAAP, PseAAP, BK, SP Cascade Forest 10-CV
10-CV
90.72
92.05
99.29
99.03
98.69
98.54
0.8953
0.8915
98.13
99.05
SUMO-Forest-FM
SUMO-Forest-CM
10.1016/j.gene.2020.144536 2020
Lysine Formylation CKSAAP_FormSite * Train: 182/1637 PLMD 0.4 23 CKSAAP(Top300) F-score Biased SVM 10-CV 71.04 77.10 76.50 0.3228 10.1016/j.ygeno.2019.05.027 2020
LFPred Train: 184/1640(Unlabelled) UniProt, PLMD, dbPTM 0.4 41 AAC, BPF, AAI KNN Jackknife 80.4 78.9 79.3 0.55 10.1016/j.jtbi.2019.03.011 2019

Serine/Threonine O-GlcNAcylation
O-GlcNAcPRED-II Train: 945/50914
Test: 368/27139
Train: dbOGAP, Jochmann2014
Test: Wu2014, Li2015
0.3 23 AAC, DAA, BPB, ANBPB, DBPB, PSAAP, PSDAAP, PSTAAP KPCA-FUS
Rotation Forest (KNN, RF, NB, SVM)
5-CV
Inde
77.89
67.12
98.38
72.46
91.90
72.39
0.8112
0.1012


10.1093/bioinformatics/bty039 2018
Lysine Khib KhibPred * Train: 4659/60117
Test: 273/3946
Huang2018 0.4 29 CKSAAP, BE, AAF Ensemble SVM 10-CV
Inde
75.57
77.66
68.40
62.85
70.92
63.81
0.4207
0.2036
0.7937
0.7777

10.1016/j.compbiolchem.2020.107280 2020
KhibPred' * All: S. cerevisiae: 960/8673
P. patens: 9600/48691
H. cells: 5444/51834
R. seeds: 8110/40776
UniProtKB/Swiss-Prot, Yu2017, Xue2020, Huang2017, Huang2018 0.3 21 BE, CKSAAP, PCP(AAIndex), KNN RF DT, GBR, GNB, KNN, RF, SVM Inde S. cerevisiae: AUC = 0.807
P. patens: 0.781
R. Seeds: 0.825
H. cells: 0.831
10.1016/j.ab.2020.113793 2020
iLys-Khib * Train: 4659/4659
Test: 273/3946
Huang2018 0.4 35 AAF, BE, CKSAAP mRMR, IFS Fuzzy SVM 10-CV
Inde
74.48
72.53
65.77
64.27
70.12
64.80
0.4040
0.1864
0.7702
0.7557
Pre: 68.51%
Pre: 12.31%
10.1016/j.chemolab.2019.06.009 2019

Lysine Crotonylation
LightGBM-CroSite Train: 159/847 # UniProt 31 BE, PWAA, EBGW, KNN, PsePSSM Elastic Net SMOTE
LightGBM
Jackknife 98.86 99.11 98.99 0.9798 0.9996 10.1016/j.ab.2020.113903 2020
Wang's work Mamm: Train: 167/388
            Test:711/7458
Plant: Train: 2548/26859
           Test: 669/26859
Mamm: UniProt
Plant: Liu2018
0.3 31 AAC, AAPC, BE, CKSAAP, EAAC, EGAAC CHI2 RUS (Train and Test)
RF, SVM
Mamm
Plant
92
77
88
83
90
70
0.80
0.54


10.1038/s41598-020-77173-0 2020
pKcr Train: 2742/29676
Test: 711/7458
Liu2018 0.3 29 Word Embedding CNN 10-CV
Inde
51.69
53.67
90.00
90.00
86.78
85.64
0.339
0.335
0.855
0.853

10.1109/ACCESS.2020.2966592 2020
iCrotoK-PseAAC Train: 378/500 UniProt 0.6 41 SVV, SM, PRIM, RPRIM, FV, AAPIV, RAAPIV CNN 10-CV 99.17 99.40 99.53 0.98 10.1371/journal.pone.0223993 2019
Qiu's work Train: 159/847 # UniProt 31 PWAA SVM Jackknife 71.69 98.7 94.43 0.778 10.1016/j.artmed.2017.02.007 2017
Phosphorylation MusiteDeep
Serine(S), Threonine (T), Tyrosine (Y)
General:
S/T: Train: 34401/677157
      Test: 2074/60880
Y: Train: 1883/128,007
   Test: 47/9174
Kinase-specific(S/T):
CDK: 315/15878
PKA: 354/20231
CK2: 303/9687
MAPK: 399/16572
PKC: 456/19779
General: Swiss-Prot
Kinase-specific: Swiss-Prot, RegPhos
0.5 33 One-of-K CNN with 2D attention mechanism General: AUC: 50% improvement
Kinase-specific: competitive results
10.1093/bioinformatics/btx496 2020
GPS 5.0
S,T,Y
ALL:
ssKSRs: 23,195
GPS 2.1, PhosphoSitePlus 61 PWD, SMO LR AUC score:
CDK: 0.9501; CK2: 0.9405; PKA: 0.9357; MAPK: 0.9432
PKs only:PKD: 0.9527; Tec: 0.9429; DYRK_ST: 0.9656; DYRK_Y: 0.9985
10.1016/j.gpb.2020.01.001 2020
DeepPPSite *
S,T,Y
S: Train: 4316/4316
   Test: 2773/17118
T: Train: 1551/1551
   Test: 941/6258
Y: Train: 553/553
   Test:210/1296 #
Train: ELM;
Test:PPA
0.3 S:29
T:19
15
BE, EBGW, CKSAAP, PSPM, IPCP F-score RNN-LSTM 10-CV
Inde
10-CV
Inde
10-CV
Inde
80.79
63.46
80.14
76.14
46.75
30.40
79.98
81.41
80.00
79.39
90.54
95.53





0.608
0.356
0.602
0.558
0.358
0.350
0.881
0.796
0.877
0.859
0.821
0.772





10.1016/j.ab.2020.113955 2020
HPhosPPred
S,T,Y
Train: 638/1132 UniProtKB, Swiss-Prot 0.5 PsePSSM, PSSM-ACT, NMBAC LFDA SVM 10-CV 88.34 63.64 79.43 0.544 10.1016/j.chemolab.2020.104066 2020
PROSPECT
Histidine (H)
Train: 219/1277
Test: 25/143
UniProt 0.4 27 One-of-K, EGAAC, CKSAAGP Weighted sum of two CNN and one RF classifiers 10-CV
Inde
35.16
48.00
90.00
90.2

0.260
0.37
0.770
0.821

10.1142/S0219720020500183 2020
Scansite
SWISSPROT, TrEMBL, Genpept and Ensembl PSSM 10.1093/nar/gkg584 2003
Cysteine
S-glutathionylation
Anashkina's work Train: 221/1047 RedoxDB 7 PSM; Glutathionylation Propensity Scores Jackknife 0.766 0.400 0.467 0.133 10.1186/s12859-020-03571-w 2020
Lysine Malonylation Kmalo Mamm:
Train: 5006/76,264
Test:1252/19,066
Inde Test: 460/10,289(Kmal-sp tool)+1251/19,061(LEMP)
Plant:
Train: 196/2394
Test: 82/1195
Inde Test: 82/1195
PLMD, LEMP, Liu2018 0.4 11-39 BE, AAIndex, PSSM, AAC, PAAC Mamm: Ensemble CNNs
Plant: Combination of CNNs and RF
10-CV
Inde
10-CV
Inde
0.653
0.910
0.653
0.682
0.661
0.864
0.661
0.692
0.764
0.866
0.660
0.691
0.174
0.480
0.174
0.195
0.742
0.943
0.742
0.772
10.1038/s41598-020-67384-w 2020

Lysine Ubiquitination
DeepTL-Ubi ALL: 94518 sites, 10% for Testing PhosphoSitePlus, mUbiSida, PLMD 0.4 31 BE, Deep Feature Extractor CNN Average AUC: 0.793 10.1016/j.ymeth.2020.08.003 2020

Lysine Acetylation
Ning's work Train: 1956/18064 CPLM, PLMD, PhosphoSitePlus, RCSB
HPRD
0.4 19 PCP, PSSM, AC, Kmer, ASA, SS mRMR, IFS Cascade SVMs Inde 76.71 72.19 74.45 0.19 10.1186/s12859-019-2938-7 2019
LAceP Train: 5910/82974
Test: 300/300
SysPTM2, PhosphoSitePlus 0.7 21 AAPP, TPM, PSSC RUS,Boosting 10-CV
Inde
68.01
61.33
69.95
75.40
68.98
68.37
0.3797
0.3788
10.1371/journal.pone.0089575 2014

Glycosylation
SPRINT-Gly H: Train: 11175(N-),687(O-)
Test: 1260(N-),79(O-)
Mouse: Train: 13428(N-),839 (O-)
Test: 1500(N-),100 (O-)
Unsampling results: 1:1(N-),1:3(O-)
UniProt, dbPTM, GlycoProtDB, UniPep, UniCarbKB, Chauhan2012, Gupta2004, Hansen1998 0.3 N-:5
O-:9
AAS,EI DNN,SVM
Inde

Inde

98

99

Pre: 93%

Pre: 99%
10.1093/bioinformatics/btz215 2019

Nitration and Nitrosylation
DeepNitro Train(Pos/Neg);Test(Pos/Neg):
Y nitration: 1210/8043; 189/1182
W nitration: 66/155
S-nitrosylation: 3409/17453; 485/4947
PubMed 0.4 41 One-hot, PFR, k-space spectrum, PSSM DNN
10-CV
Inde
AUC score:
Y nitration: 0.65; W nitration: 0.80; C nitrosylation: 0.70
Y nitration: 0.6727; C nitrosylation: 0.7437
10.1016/j.gpb.2018.04.007 2018

Cleavage
Procleave All:
Train: 3759 sites
Test: 198 sites
MEROPS 0.3 sequence features, 3D structure features LOWESS smoothing Conditional Random Field (CRF) Cathepsin
Caspase-3
Caspase-6
MMP-2
Granzyme B
0.898
1.000
0.974
0.953
0.940
0.98
1.000
1.000
0.953
1.000
0.939
1.000
0.987
0.953
0.970
0.880
1.000
0.974
0.906
0.941
0.973
1.000
0.990
0.979
0.991
Pre: 0.978
Pre: 1.000
Pre: 1.000
Pre: 0.953
Pre: 1.000
10.1016/j.gpb.2019.08.002 2020
EvoCleave
HIV-1 Protease Cleavage
ALL:301-Dataset: 62/239
746-Dataset: 401/345
1625-Dataset: 374/1251
Impens-Dataset: 149/798
Schilling-Dataset: 434/2328
UCI machine learning repository Weighting Coevolutionary Patterns SVM 10-CV 301-Dataset: Sn:0.63, Pre: 1, AUC: 0.9518, PR: 0.89
746-Dataset: Sn:0.9, Pre: 0.88, AUC: 0.9393, PR: 0.95
1625-Dataset: Sn:0.84, Pre: 0.94, AUC: 0.9695, PR: 0.93
Impens-Dataset: Sn:0.62, Pre: 0.74, AUC: 0.9251, PR: 0.7292
Schilling-Dataset: Sn:0.73, Pre: 0.81 AUC: 0.9661, PR: 0.84
10.1109/tcbb.2019.2914208 2019
iProt-Sub All:
Aspartic: 1854
Cysteine: 901
Metallo: 1343
Serine: 2539
MEROPS, UniProt 0.7 Binary, CKSAAP, PSSM, BLOSUM, KNN, AAC, AAIndex, CHR, SS, SA, DISO mRMR SVR
5-CV
Inde
AUC score:
Caspase-3: 1.000; Caspase-7: 1.000; Caspase-6: 0.998; Caspase-8: 1.000; MMP-2: 0.968; MMP-3: 0.966; Granzyme B: 0.992(human), 0.998(mouse)
Caspase-3: 0.993; Caspase-7: 0.989; Caspase-6: 0.980; Caspase-8: 0.990; MMP-2: 0.724; MMP-3: 0.864; Granzyme B: 0.948(human), 0.972(mouse)
10.1093/bib/bby028 2019
DeepCalpain Train: 442/160,698 PubMed, UniProt 31 AAC, Binary, PSSM, CKSAAP Deep Neural Network, PSO AUC score: Calpain: 0.9044; μ-calpain: 0.9136; m-calpain: 0.9068 10.3389/fgene.2019.00715 2019
Multiple PTMs MusiteDeep Train(Pos/Neg);Test(Pos/Neg):
S/T Phos: 135,556/2,803,647; 8759/230,755
Y Phos: 9427/93,291;499/5540
N-Glyco: 90,344/511,755; 20522/120,384
O-Glyco: 4216/103,771; 218/6248
K N6-acetyl: 22,355/274,668; 683/11,371
R Methyl: 4675/99,946; 269/6859
K Methyl: 2781/45,524; 154/2001
C S-palmitoyl: 3812/26,573; 151/684
Pyrrolidone-carboxylic-acid: 1394/10,528; 230/891
Ubi: 3707/49,963; 514/6621
SUMO: 1225/23,932; 65/1310
K Hydroxy: 356/2650; 9/37
P Hydroxy: 2773/11,761; 422/814
UniProtKB/Swiss-Prot 33 One-of-K Bootstrapping
Weighted average of MultiCNN and CapsNet
Transfer learning for small training samples
AUC/PR results:
S/T Phos: 0.896/0.329
Y Phos: 0.958/0.864
N-Glyco: 0.993/0.937
O-Glyco: 0.943/0.539
K N6-acetyl: 0.978/0.858
R Methyl: 0.941/0.844
K Methyl: 0.951/0.850
C S-palmitoyl: 0.961/0.922
Pyrrolidone-carboxylic-acid: 0.979/0.947
Ubi: 0.804/0.279
SUMO: 0.990/0.881
K Hydroxy: 0.982/0.930
P Hydroxy: 0.732/0.627
10.1093/nar/gkaa275 2020
CapsNet_PTM Train(Test):
S/T Phos:36395(12,177)
Y Phos:2141(826)
N N-Glyco:10,218(6564)
K N6-acetyl:6376(1907)
R Methyl:2241(455)
C S-Palmitoyl:572(266)
Q Pyrrolidone-carboxylic-acid:623(154)
K SUMO:334(108)
UniProtKB/Swiss-Prot 0.3 33 Based on 237 physicochemical properties Bootstrapping
Capsule Network
AUC/PR results:
S/T Phos: 0.8470/0.3437;
Y Phos: 0.7171/0.2620
N N-Glyco: 0.9808/0.8382;
K N6-acetyl: 0.7280/0.1970
R Methyl: 0.9891/0.9352;
C S-palmitoyl: 0.5003/0.5003
Q Pyrrolidone-carboxylic-acid: 0.9256/0.8333
K SUMO: 0.8680/0.5717
10.1093/bioinformatics/bty977 2019
lysineSGT
Acetyl, Crotonyl, Methyl, Succinyl
Has PTM(%)/No PTM(%):
Acetyl: 62/38
Crotonyl:2/98
Methyl: 2/98
Succinyl: 18/82
No PTM: 27/73
UniProt - 27 Unigram, Bigram, Trigram, CKSAAP, k-space with chemical property, Interaction CNNs, Sequence graph transform (SGT) 5-CV
Inde
Aiming: 83.91%, Coverage: 83.91%, Acc: 82.75%, Abs True: 85.21%, Abs False: 4.27%
Aiming: 65.00%, Coverage: 65.00%, Acc: 65.00%, Abs True: 85.00%, Abs False: 5.00%
10.1016/j.chemolab.2020.104171 2019
Note:
* Web-server is not available.
Phrases in italics: abbreviations of species, such as Mamm (Mammalian), Plant, H (H. sapiens), M (M. musculus), T (T. aestivum).
Len: sequence length.
Thres: Threshold of redundant samples.
# Dataset address is not available/working at the time of writing.
Pos/Neg: Number of positive/negative samples.
Cut-off: Cut-off value of redundant samples.
10-CV: 10-fold cross-validation; Jack: Jackknife test; Inde: Independent test results.
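The performance columns (Sn, Sp, Acc, MCC, and the Pre/F1 values listed under Others/Note) all derive from a binary confusion matrix. As a minimal sketch, the Python function below computes them from raw counts. The function name `binary_metrics` is illustrative, and the counts TP=32, FN=12, TN=146, FP=57 are an assumption: they are not quoted from any paper but were back-calculated from the iGlu_AdaBoost independent-test row (44 positives, 203 negatives, Sn 72.73%, Sp 71.92%).

```python
import math


def _ratio(num, den):
    """Safe division: return 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def binary_metrics(tp, fn, tn, fp):
    """Compute Sn, Sp, Acc, Pre, F1 and MCC from confusion-matrix counts."""
    sn = _ratio(tp, tp + fn)                    # sensitivity (recall)
    sp = _ratio(tn, tn + fp)                    # specificity
    acc = _ratio(tp + tn, tp + fn + tn + fp)    # accuracy
    pre = _ratio(tp, tp + fp)                   # precision
    f1 = _ratio(2 * tp, 2 * tp + fp + fn)       # harmonic mean of Pre and Sn
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = _ratio(tp * tn - fp * fn, mcc_den)    # Matthews correlation coefficient
    return {"Sn": sn, "Sp": sp, "Acc": acc, "Pre": pre, "F1": f1, "MCC": mcc}


# Assumed counts, inferred from the reported percentages (not from the paper).
metrics = binary_metrics(tp=32, fn=12, tn=146, fp=57)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these assumed counts the output agrees with the reported row to within rounding. It also shows how a classifier can reach roughly 72% Sn, Sp, and Acc on a test set with about 4.6 negatives per positive while precision stays near 36%, which is why MCC, F1, or precision are worth reading alongside Acc for imbalanced PTM datasets.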
Last updated: 2021-02-17
E-mail address: doulijun777@163.com.