PPIPre

Introduction

In this study, two negative data sets were constructed artificially to overcome the shortcomings of the ready-made PPNI database. Proteins were encoded from three aspects: primary structure information, secondary structure information, and the physical and chemical properties of the protein. For primary structure information, the n-gram-split and k-skip-2-gram feature extraction methods were proposed as improvements on the n-gram method. The PSIPRED tool was used to obtain secondary structure features. The 188-D feature extraction method, based on the physical and chemical properties of proteins, was adopted to improve the performance of PPI prediction.
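The primary-structure encodings above can be illustrated with a minimal sketch of plain n-gram and k-skip-2-gram frequency vectors. This is not PPIPre's code: the function names are hypothetical, and the k-skip-2-gram definition used here (residue pairs with at most k residues skipped between them, as in the usual k-skip-n-gram definition) is an assumption. The n-gram-split variant is not reproduced because its details are not given here.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def ngram_features(seq: str, n: int = 2) -> list[float]:
    """Hypothetical helper: normalized frequency of every possible n-gram (20**n dims)."""
    grams = [seq[i:i + n] for i in range(len(seq) - n + 1)]
    counts = Counter(grams)
    total = max(len(grams), 1)
    # n-grams containing non-standard residues count toward the total
    # but have no dimension of their own.
    return [counts["".join(p)] / total for p in product(AMINO_ACIDS, repeat=n)]


def k_skip_2gram_features(seq: str, k: int = 2) -> list[float]:
    """Hypothetical helper: normalized frequency of residue pairs (a, b) in which
    b follows a after skipping at most k residues (400 dims).
    ASSUMPTION: 'at most k' skips; the original method may define k differently."""
    pairs = [
        seq[i] + seq[j]
        for i in range(len(seq))
        # j - i - 1 residues are skipped, so j ranges over i+1 .. i+k+1.
        for j in range(i + 1, min(i + k + 2, len(seq)))
    ]
    counts = Counter(pairs)
    total = max(len(pairs), 1)
    return [counts[a + b] / total for a, b in product(AMINO_ACIDS, repeat=2)]


def pair_features(seq_a: str, seq_b: str, k: int = 2) -> list[float]:
    """ASSUMPTION: a protein pair is encoded by concatenating both proteins' vectors."""
    return k_skip_2gram_features(seq_a, k) + k_skip_2gram_features(seq_b, k)


if __name__ == "__main__":
    vec = k_skip_2gram_features("MKTAYIAKQR", k=2)
    print(len(vec))  # 400
```

With k = 0 no residues are skipped, so the k-skip-2-gram vector reduces to the ordinary 2-gram (dipeptide) vector; larger k lets the encoding capture residue pairs that are close but not adjacent in the sequence.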

In the second stage, we compared five classifiers: RF, J48, AdaBoost, Bagging, and LibD3C. On the true data set, RF and LibD3C gave better and more stable results across the different feature extraction methods. On the RandomPairs data set, LibD3C achieved acceptable results. However, the results on the RecombinePairs data set are unsatisfactory and need further investigation. The experiments also show that feature extraction methods based on secondary structure information and on physical and chemical properties have significant advantages over the other methods. Moreover, feature selection and Z-test analysis indicate that features based on primary structure information are effective for PPI prediction.
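The Z-test analysis mentioned above can be sketched as a per-feature two-sample z test comparing a feature's mean value on interacting versus non-interacting pairs. The exact test used in this study is not specified here, so this is an assumed, illustrative form; the function names are hypothetical.

```python
import math
from statistics import mean, variance


def two_sample_z(pos: list[float], neg: list[float]) -> float:
    """Hypothetical helper: z statistic for the difference in a feature's mean
    between positive (interacting) and negative (non-interacting) samples.
    Uses sample variances, the usual large-sample approximation."""
    se = math.sqrt(variance(pos) / len(pos) + variance(neg) / len(neg))
    return (mean(pos) - mean(neg)) / se


def two_sided_p(z: float) -> float:
    """Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|))."""
    return math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    z = two_sample_z([1, 2, 3, 4, 5], [0, 1, 2, 3, 4])
    print(z, two_sided_p(z))
```

A feature whose |z| is large (small p-value) separates the two classes well on its own, which is one simple way to judge whether primary-structure features carry useful signal.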

A Web Server Demo