Summary of Imbalance Classification of Protein PTMs (ImbClassi_PTMs)
Columns: Protein PTMs; Tools/Web-servers; Datasets (Pos/Neg, Resources, Similarity cut-off, Length); Features (Extraction, Selection); Imbalance Algorithms; Performance; DOI; Year
Lysine Glutarylation BiPepGlut All: 723/1923 PLMD 0.4 21 Bi-Peptide-Based Evolutionary Features ET Over-sampling; ET 10-CV:Sn=70.0%,Sp=92.9%,Acc=81.5%,MCC=0.64,F1=0.79;
Inde:84.8%,95.6%,92.0%,0.82,0.88
10.3390/genes11091023 2020
iGlu_AdaBoost Train: 400/1703;
Test: 44/203
PLMD, NCBI, Swiss-Prot 0.4 23 188D, CKSAAP, EAAC CHI2, IFS SMOTE-Tomek; AdaBoost 10-CV: Sn=87.48%, Sp=72.49%, Acc=79.98%, MCC=0.61, AUC=0.89, Pre=76.07%, F1=0.81
Inde: Sn=72.73%, Sp=71.92%, Acc=72.07%, MCC=0.36, AUC=0.63, Pre=35.96%, F1=0.48
10.1021/acs.jproteome.0c00314 2021
RF-GlutarySite * Train: 400/1703;
Test: 44/203
PLMD, NCBI, Swiss-Prot 0.4 23 AAIndex, AAfactor, FEBS XGBoost Under-sampling; RF 10-fold CV: Sn=81.00%, Sp=68.00%, Acc=75.00%, MCC=0.50, AUC=0.81
Inde: Sn=73.00%, Sp=70.00%, Acc=72%, MCC=0.43, AUC=0.81
10.1039/C9MO00028C 2019
MDDGlutar * Train: 430/860
Test: 46/92
PLMD 0.4 21 AAC, AAPC, CKSAAP - MDD; SVM 5-CV: Sn=0.677, Sp=0.619, Acc=0.638, MCC=0.28
Inde:Sn=0.652, Sp=0.739, Acc=0.71, MCC=0.38
10.1186/s12859-018-2394-9 2019
Lysine Succinylation HybridSucc Train: 21,770/165,071
Predict: 8710 proteins (23866 sites)
PLMD3.0, PhosphoSitePlus, dbPTM 0.4 21 PseAAC, CKSAAP, OBC, AAIndex, ACF, GPS, PSSM, ASA, SS, BTA - PLR; DNN 10-CV: General: AUC=0.885; Human-specific: AUC=0.952 10.1016/j.gpb.2019.11.010 2020
Inspector Train: 4755/50,565
Test: 254/2977
Pos: UniProtKB/Swiss-Prot; Neg: NCBI
0.3 35 DBPB, PSDAAP, PseAAC, PWAA, EGAAC, CKSAAGP F-score, IFS ENN, ADASYN; RF 10-CV:Sn=0.876,Sp=0.917, Acc=90.4, MCC=0.981
Inde:Sn=0.693, Sp=0.717, Acc=0.715, MCC=0.238
10.1016/j.ab.2020.113592 2020
SSKM_Succ Train: 4695/47,027(Unlabelled)
Validation:1815/24509(Unlabelled)
Test: 2608/1050(Unlabelled)
Train:PLMD,UniProt;
Others:dbPTM
0.3 21 proximal PTMs, Grey model, DNC, PSAAP RF Semi-supervised (Clustering SVM) Inde: Sn=82.21%, Sp=75.15%, Acc=80.19%, MCC=0.546 10.1109/TCBB.2020.3006144 2020
CNN-SuccSite * Train: 3216/16,412 [4 neg deleted]
Test: 218/2621
PLMD 0.3 31 PSAAC, CKSAAP(Top 400 by mRMR), PSSM - MDD; CNN 10-CV: Sn=86.94%, Sp=85.43%, Acc=85.68%, MCC=0.608
Inde: Sn=84.40%; Sp=86.99%, Acc=86.79%, MCC=0.489
10.1038/s41598-019-52552-4 2019
PSuccE Train: 4755/50,565
Test: 254/2977
Pos: UniProtKB/Swiss-Prot; Neg: NCBI 0.3 35 AAC, BE, PCP, GPAAC IG Bootstrap Sampling; Ensemble SVM 10-CV: Sn=84.31%, Sp=93.73%, Acc=89.14%, MCC=0.79
Inde: Sp=88.6%, Sn=37.5%, Acc=84.5%, MCC=0.2
10.1186/s12859-018-2249-4 2018

Cysteine S-sulfenylation
SulSite-GTB Train: 1031/8028
Test: 216/1418
Carroll Lab, RedoxDB, UniProtKB 0.4 21 AAC, DC, EBGW, KNN, PSAAP, PsePSSM, PWAAC LASSO SMOTE; Gradient tree boosting 5-CV: Acc=92.86%, Sn=87.21%, Sp=98.61%, MCC=0.8631, AUC=0.9706
Inde: Acc=88.53%, Sn=84.10%, Sp=93.02%, MCC=0.7740, AUC=0.9425
10.1007/s00521-020-04792-z 2020
Butt's work * Train: 900/6856 NCBI
Test: 216/1418
Train:NCBI
Test: Carroll Lab, RedoxDB, UniProtKB
0.4 21 SVV, M-matrix, PRIM, R‑PRIM, FV, AAPIV, RAAPIV - NN (GDAL) 10-CV: Sn=88.72%, Sp=98.09%,Acc=96.89%, MCC=0.8618, AUC=0.9309
Inde: Sn=0.8653,Sp=0.8333,MCC=0.56
10.1007/s10989-019-09931-2 2019
Fu-SulfPred Train:900/6856(Xu);1031/8028(Bui)
Test: 145/268(Xu);216/1418(Bui);44/266(PRESS)
NCBI(Xu)
Carroll Lab, RedoxDB, UniProtKB(Bui)
PDB(PRESS)
0.4 21 PSAAP, AAPPI - Category-based Resampling; Multi-Forest Inde: Sn=85.15%, Sp=68.62%, Acc=79.34%, MCC=0.5437(Xu); Sn=80.79%, Sp=67.20%, Acc=79.00%, MCC=0.3736(Bui); Sn=91.99%, Sp=84.09%, Acc=90.87%, MCC=0.6809(PRESS) 10.1016/j.jtbi.2018.10.046 2019
Sulf_FSVM * Train: 900/6856
Test: 216/1418
Carroll Lab, RedoxDB, UniProtKB 0.4 21 AAC, BE, CKSAAP mRMR fuzzy SVM 10-CV: Sn=73.26%, Sp=70.78%, Acc=71.07%, MCC=0.2971
Inde: Sn=0.7981, Sp=0.7969, MCC=0.4461
10.1016/j.jtbi.2018.08.022 2018
PredCSO Train: 228/757
Test: 51/200
PDB 0.9 11 PSAAP, PSSM, ASA, FBS2P mR, SBE Bootstrap resampling; Ensemble GTB by voting 10-CV: Sn=87.85%, Sp=87.0%, Acc=87.2%, MCC=0.684
Inde: Sn=88.2%, Sp=89.0%, Acc=88.8%, MCC=0.702, Pre=67.1%, F1=0.762
10.1039/c8mo00089a 2018
Citrullination PCSPred_SC Train: 116/232
Test: 138/150
Train: Zhang2017
Test: UniProt
- 21 BE, PSAAP, PseAAC, PP t-SNE ADASYN; SVM 10-CV: Sn=94.80%, Sp=93.10%, Acc=93.70%, MCC=0.862, AUC=0.997
Inde: Sn=92.8%, Sp=93.3%, Acc=93.1%, MCC=0.861, AUC=0.995
10.1109/ACCESS.2020.2992672 2020

Lysine Glycation
BPB_GlySite * Train: 223/446 CPLM, UniProt 0.4 15 BPB - SVM 10-CV: Sn=63.68%, Sp=72.60%, Acc=69.63%, MCC=0.3499, AUC=0.7622 10.1016/j.compbiolchem.2017.10.004 2017
Gly-PseAAC Train: 223/446 CPLM, UniProt 0.4 15 PSAAP - SVM 10-CV: Sn=57.48%, Sp=74.30%, Acc=68.69%, MCC=0.3166, AUC=0.7199 10.1016/j.gene.2016.11.021 2017
Lysine Sumoylation C-iSUMO Train: 780/21353 CPLM 0.4 31 ASA, TA - NearMiss; AdaBoost 10-CV: Sn=0.719, Sp=0.758, Acc=0.738, MCC=0.478 10.1016/j.compbiolchem.2020.107235 2020
SUMO-Forest Train: 755/9944 UniProt 0.4 21 PSAAP, PseAAP, BK, SP - Cascade Forest 10-CV: SUMO-Forest-FM: Acc=98.69%, Sp=99.29%, Sn=90.72%, MCC=89.53%, AUC=98.13%, F1=95.01% SUMO-Forest-CM: Acc=98.54%, Sp=99.03%, Sn=92.05%, MCC=89.15%, AUC=99.05%, F1=94.56% 10.1016/j.gene.2020.144536 2020
Lysine Formylation CKSAAP_FormSite * Train: 182/1637 PLMD 0.4 23 CKSAAP(Top300) F-score Biased SVM 10-CV: Sn=71.04%, Sp=77.10%, Acc=76.50%, MCC=32.28% 10.1016/j.ygeno.2019.05.027 2020
LFPred Train: 184/1640(Unlabelled) UniProt, PLMD, dbPTM 0.4 41 AAC, BPF, AAI - KNN Jackknife: Sn=80.4%, Sp=78.9%, Acc=79.3, MCC=0.55 10.1016/j.jtbi.2019.03.011 2019

Serine/Threonine O-GlcNAcylation
O-GlcNAcPRED-II Train: 945/50914
Test: 368/27139
Train: dbOGAP, Jochmann2014
Test: Wu2014, Li2015
0.3 23 AAC, DAA, BPB, ANBPB, DBPB, PSAAP, PSDAAP, PSTAAP - KPCA-FUS; Rotation Forest (KNN, RF, NB, SVM) 5-CV: Sn=77.89%, Sp=98.38%, Acc=91.90%, MCC=0.8112
Inde: Sn=67.12%, Sp=72.46%, Acc=72.39, MCC=0.1012
10.1093/bioinformatics/bty039 2018
Lysine Khib KhibPred * Train: 4659/60117 (only train available)
Test: 273/3946
Huang2018 0.4 29 CKSAAP, BE, AAF - Ensemble SVM 10-CV: Sn=75.57%, Sp=68.40%, Acc=70.92%, MCC=0.4207, AUC= 0.7937
Inde: Sn=77.66%, Sp=62.85%, ACC=63.81%, MCC=0.2036, AUC=0.7777
10.1016/j.compbiolchem.2020.107280 2020
KhibPred' * All: S. cerevisiae: 960/8673
P. patens: 9600/48691
H. cells: 5444/51834
R. seeds: 8110/40776
UniProtKB/Swiss-Prot, Yu2017, Xue2020, Huang2017, Huang2018 0.3 21 BE, CKSAAP, PCP(AAIndex), KNN RF DT,GBR,GNB,KNN,RF,SVM Inde: S. cerevisiae: AUC=0.807
P. patens: 0.781
R. seeds: 0.825
H. cells: 0.831
10.1016/j.ab.2020.113793 2020
iLys-Khib * Train:4659/4659
Test:273/3946
Huang2018 0.4 35 AAF, BE, CKSAAP mRMR, IFS Fuzzy SVM 10-CV: Sn=74.48, Sp=65.77, Pre=68.51, Acc=70.12, MCC=0.4040, AUC=0.7702
Inde: Sn=72.53%,Sp=64.27%,Pre=12.31,Acc=64.80,MCC=0.1864,AUC=0.7557
10.1016/j.chemolab.2019.06.009 2019

Lysine Crotonylation
LightGBM-CroSite Train: 159/847 # UniProt - 31 BE, PWAA, EBGW, KNN, PsePSSM Elastic Net SMOTE; LightGBM Jackknife: Acc= 98.99%, Sn= 98.86%, Sp= 99.11%, MCC= 0.9798, AUC= 0.9996 10.1016/j.ab.2020.113903 2020
Wang's work Train: 2548/26859(Plant);167/388(Mammalian)
Test: 669/26859(P);711/7458(M)
Mammalian: UniProt
Plant: Liu2018;
0.3 31 AAC, AAPC, BE, CKSAAP, EAAC, EGAAC CHI2 RUS(Train and Test); RF, SVM Mammalian: Sn = 92%, Sp = 88%, Acc = 90%, MCC= 0.80;
Plant: Sn = 77%, Sp = 83%, Acc = 70%, MCC = 0.54
10.1038/s41598-020-77173-0 2020
pKcr Train: 2742/29676
Test: 711/7458
Liu2018 0.3 29 Word Embedding - CNN 10-CV: Sn=51.69%,Sp=90.00%, Acc=86.78%, MCC=0.339, AUC=0.855, AUC01=0.030
Inde: Sn=53.67%, Sp=90.00%, Acc=85.64, MCC=0.335, AUC=0.853
10.1109/ACCESS.2020.2966592 2020
iCrotoK-PseAAC Train: 378/500 UniProt 0.6 41 SVV, SM, PRIM, RPRIM, FV, AAPIV, RAAPIV - CNN 10-CV: Acc=99.17%, Sn=99.40%, Sp=99.53%, MCC=0.98 10.1371/journal.pone.0223993 2019
Qiu's work Train: 159/847 # UniProt - 31 PWAA - SVM Jackknife: Sn=71.69%, Sp=98.7%, Acc=94.43%, MCC=0.778 10.1016/j.artmed.2017.02.007 2017
Phosphorylation DeepPPSite *
serine(S), threonine (T), tyrosine (Y)
Train: 4316/4316(S);1551/1551(T);553/553(Y)
Test: 2773/17118(S);941/6258(T);210/1296(Y) #
Train: ELM;
Test:PPA
0.3 S:29
T:19
Y:15
BE, EBGW, CKSAAP, PSPM, IPCP F-score RNN-LSTM 10-CV: Sn=80.79%, Sp=79.98%, MCC=0.608, AUC= 0.881(S); Sn=80.14%, Sp=80.00%, MCC=0.602, AUC=0.877(T); Sn=76.14%, Sp=79.39%, MCC=0.558, AUC=0.859(Y)
Inde: Sn=63.46%, Sp=81.41%, MCC=0.356, AUC=0.796(S); Sn=46.75%, Sp=90.54%, MCC=0.358, AUC= 0.821(T); Sn=30.40%, Sp=95.53%, MCC=0.350, AUC=0.772(Y)
10.1016/j.ab.2020.113955 2020
HPhosPPred
S,T,Y
Train: 638/1132(S,T,Y) UniProtKB, Swiss-Prot 0.5 - PsePSSM, PSSM-ACT, NMBAC LFDA SVM 10-CV: Acc=79.43%,Sn=88.34%,Sp=63.64%,MCC=0.544 10.1142/S0219720020500183 2020
PROSPECT
Histidine (H)
Train: 219/1277
Test: 25/143
UniProt 0.4 27 One-of-K, EGAAC, CKSAAGP - Weighted sum of two CNN and one RF classifiers 10-CV: Sn=35.16%, Sp=90.00%, MCC=0.260, AUC=0.770
Inde: Sn=48.00%, Sp=90.2%, MCC=0.37, AUC=0.821
10.1142/S0219720020500183 2020
MusiteDeep
S,T,Y
General:
Train: 34401/677157(S/T),1883/128007(Y)
Test: 2074/60880(S/T),47/9174(Y)
Kinase-specific(S/T):
CDK: 315/15878; PKA: 354/20231; CK2: 303/9687; MAPK: 399/16572; PKC:456/19779
General:Swiss-Prot
Kinase-specific:Swiss-Prot,RegPhos
0.5 33 One-of-K - CNN with 2D attention mechanism General: AUPRC: 50% improvement
Kinase-specific: competitive results
10.1093/bioinformatics/btx496 2017

Cysteine S-glutathionylation
Anashkina's work Train: 221/1047 RedoxDB - 7 PSM; Glutathionylation Propensity Scores Jackknife resampling: Sn=0.766, Sp=0.400, Acc=0.467, Balanced Acc=0.583, MCC=0.133 10.1186/s12859-020-03571-w 2020
Lysine Malonylation Kmalo Train: 5006/76,264(H, M); 196/2394(T)
Test: 1252/19,066(H, M); 82/1195(T)
Inde Test: 460/10,289(Kmal-sp tool)+1251/19,061(LEMP)(H & M);82/1195(T)
H: H. sapiens, M: M. musculus, T: T. aestivum
PLMD, LEMP, Liu2018 0.4 11-39 BE, AAIndex, PSSM, AAC, PAAC - Mammalian: Ensemble CNNs; Plant: Combination of CNNs and RF 10-CV: Acc=0.764, Sn=0.653, Sp=0.661, MCC=0.174, AUC=0.742 (Mammalian); Acc=0.660, Sn=0.653, Sp=0.661, MCC=0.174, AUC=0.742 (Plant)
Inde: Acc=0.866, Sn=0.910, Sp=0.864, MCC=0.480, AUC=0.943 (mammalian); Acc=0.691, Sn=0.682, Sp=0.692, MCC=0.195,AUC=0.772 (plant)
10.1038/s41598-020-67384-w 2020

Lysine Ubiquitination
DeepTL-Ubi ALL: 94518 sites, 10% for Testing PhosphoSitePlus, mUbiSida, PLMD (8 species) 0.4 31 BE, Deep Feature Extractor - CNN Average AUC=0.793 10.1016/j.ymeth.2020.08.003 2020

Glycosylation
SPRINT-Gly Train: Human:11175(N-),687(O-);Mouse:13428,839
Test: Human:1260,79;Mouse:1500,100
Under-sampling for train: N-:1:1,O-:1:3
UniProt, dbPTM, GlycoProtDB, UniPep, UniCarbKB, Chauhan2012, Gupta2004, Hansen1998 0.3 N-:5
O-:9
AAS,EI - DNN,SVM Inde: Human: Sn=98%, Pre= 93%
Mouse: Sn=99%,Pre=99%
10.1093/bioinformatics/btz215 2019

Nitration and Nitrosylation
DeepNitro Train: 1210/8043(Y nitration);66/155(W nitration);3409/17453(S-nitrosylation)
Test:189/1182(Y nitration);485/4947(S-nitrosylation)
PubMed 0.4 41 One-hot, PFR, k-space spectrum, PSSM - DNN 10-CV: ROC=0.65(Y nitration), 0.80(W nitration), 0.70(C nitrosylation) Inde: ROC=0.6727(Y nitration), 0.7437(C nitrosylation) 10.1016/j.gpb.2018.04.007 2018
Multiple PTMs MusiteDeep S/T Phos:Train:135,556/2,803,647;Test:8759/230,755
Y Phos: 9427/93,291;499/5540
N-Glyco:90,344/511,755;20522/120,384
O-Glyco:4216/103,771;218/6248
K N6-acetyl:22,355/274,668;683/11,371
R Methyl:4675/99,946;269/6859
K Methyl:2781/45,524;154/2001
C S-palmitoyl:3812/26,573;151/684
Pyrrolidone-carboxylic-acid:1394/10,528;230/891
Ubi:3707/49,963;514/6621; SUMO:1225/23,932;65/1310
K Hydroxy:356/2650;9/37
P Hydroxy:2773/11,761;422/814
UniProtKB/Swiss-Prot - 33 One-of-K - Bootstrapping; Weighted average of MultiCNN and CapsNet, and transfer learning for small training samples S/T Phos:AUC/PR=0.896/0.329; Y Phos:0.958/0.864
N-Glyco:0.993/0.937; O-Glyco:0.943/0.539
K N6-acetyl:0.978/0.858
R Methyl:0.941/0.844; K Methyl:0.951/0.850
C S-palmitoyl:0.961/0.922
Pyrrolidone-carboxylic-acid:0.979/0.947
Ubi:0.804/0.279; SUMO:0.990/0.881
K Hydroxy:0.982/0.930; P Hydroxy:0.732/0.627
10.1093/nar/gkaa275 2020
CapsNet_PTM Train(Test): S/T Phos:36395(12,177);Y Phos:2141(826)
N N-Glyco:10,218(6564);K N6-acetyl:6376(1907)
R Methyl:2241(455);C S-Palmitoyl:572(266)
Q Pyrrolidone-carboxylic-acid:623(154)
K SUMO:334(108)
UniProtKB/Swiss-Prot 0.3 33 Based on 237 physicochemical properties - Bootstrapping;Capsule Network S/T Phos:AUC/PR=0.8470/0.3437; Y Phos: 0.7171/0.2620
N N-Glyco:0.9808/0.8382; K N6-acetyl:0.7280/0.1970
R Methyl:0.9891/0.9352; C S-palmitoyl:0.5003/0.5003
Q Pyrrolidone-carboxylic-acid:0.9256/0.8333
K SUMO:0.8680/0.5717
10.1093/bioinformatics/bty977 2019
Note: * Web-server is not available;
          # Dataset address is not available/working at the time of writing.
          Pos/Neg: Number of positive/negative samples.
          Cut-off: Threshold value of redundant samples.
          10-CV: 10-fold cross validation; Jack: Jackknife test; Inde: Independent test results.
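The Performance column reports Sn, Sp, Acc, Pre, F1 and MCC without defining them. The sketch below assumes the standard confusion-matrix definitions (Sn = sensitivity/recall, Sp = specificity, Pre = precision) and shows how they relate on an imbalanced test set. The function name `binary_metrics` and the counts are illustrative assumptions: the counts are not reported in the table, but were back-calculated from the iGlu_AdaBoost independent-test row (44 positives, 203 negatives, Sn=72.73%, Sp=71.92%).

```python
from math import sqrt


def binary_metrics(tp, fn, tn, fp):
    """Compute common binary-classification metrics from confusion counts.

    Assumes the standard definitions; individual papers in the table
    may round or define some metrics slightly differently.
    """
    sn = tp / (tp + fn) if (tp + fn) else 0.0    # sensitivity (recall)
    sp = tn / (tn + fp) if (tn + fp) else 0.0    # specificity
    acc = (tp + tn) / (tp + fn + tn + fp)        # overall accuracy
    pre = tp / (tp + fp) if (tp + fp) else 0.0   # precision
    f1 = 2 * pre * sn / (pre + sn) if (pre + sn) else 0.0
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "Acc": acc, "Pre": pre, "F1": f1, "MCC": mcc}


# Hypothetical counts inferred from a 44/203 (Pos/Neg) test set:
# 32 of 44 positives recovered, 146 of 203 negatives rejected.
m = binary_metrics(tp=32, fn=12, tn=146, fp=57)
for name, value in m.items():
    print(f"{name}={value:.4f}")
```

With these assumed counts, Sn and Sp both sit near 72%, yet Pre is only about 36% and MCC about 0.36, because the 57 false positives outnumber the 32 true positives. This is why Acc alone is a weak summary for the imbalanced datasets listed above, and why MCC, F1 or Pre are worth comparing across tools.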
Last updated: 2021-01-08
E-mail address: doulijun777@163.com.