Download the .whl file of SBSM.
Then, use the following command to install SBSM onto your computer:
pip install sbsm-xxx.whl
After a successful installation, first import sbsm in your Python file:
import sbsm
To run faster, SBSM uses the multiprocessing module for parallelization. Because of this, the code that calls SBSM must be placed under an if __name__ == "__main__" guard, and freeze_support() should be called at the start of that block to avoid errors:
if __name__ == "__main__":
    from multiprocessing import freeze_support
    freeze_support()
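As a rough illustration of why the guard matters, here is a minimal sketch that uses only the standard library and does not touch SBSM itself. The function names sequence_length and lengths_in_parallel are hypothetical. Worker processes may re-import the main module, so anything that starts a pool has to sit behind the guard, otherwise each worker could try to start workers of its own.

```python
from multiprocessing import Pool, freeze_support


def sequence_length(seq):
    """Toy per-sequence task standing in for real work."""
    return len(seq)


def lengths_in_parallel(seqs, processes=2):
    # The worker function must live at module level so that
    # worker processes can find it when they import this module.
    with Pool(processes=processes) as pool:
        return pool.map(sequence_length, seqs)


if __name__ == "__main__":
    # Has no effect unless the script runs as a frozen Windows executable.
    freeze_support()
    print(lengths_in_parallel(["MKV", "ACDEFG"]))
```

Without the guard, platforms that start workers by spawning a fresh interpreter would run the pool-creating line again in every worker.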
Next, create an SBSM object. Parameters can be specified at this point:
cls = sbsm.SBSM(c=64)
Train the model using the fit() method, where the input is the raw protein sequence of the training samples and their corresponding labels.
cls.fit(X_train, y_train)
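Before calling fit(), it may help to check the training data. The sketch below is not part of SBSM. It assumes the sequences are plain strings over the 20 standard amino-acid letters and that there is one label per sequence. The helper name check_training_data is hypothetical, and SBSM itself may accept a wider alphabet (for example X or U).

```python
# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def check_training_data(sequences, labels):
    """Raise ValueError if the data looks malformed; return the sample count."""
    if len(sequences) != len(labels):
        raise ValueError("number of sequences and labels differ")
    for i, seq in enumerate(sequences):
        if not seq:
            raise ValueError(f"sequence {i} is empty")
        unknown = set(seq.upper()) - AMINO_ACIDS
        if unknown:
            raise ValueError(f"sequence {i} has unexpected letters: {sorted(unknown)}")
    return len(sequences)


# Toy data, for illustration only.
X_train = ["MKVLAAGIVG", "ACDEFGHIK"]
y_train = [1, 0]
n_samples = check_training_data(X_train, y_train)
```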
Predict using the predict() method, where the input is the raw protein sequence of the prediction samples.
y_predict = cls.predict(X_test)
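If true labels for the test samples are available, the predictions can be scored. The sketch below computes plain accuracy in pure Python. The accuracy function is a hypothetical helper, not part of SBSM, and it assumes predict() returns one label per input sequence in the same order.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("cannot score an empty label list")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Toy labels, for illustration only.
score = accuracy([1, 0, 1, 1], [1, 0, 0, 1])
```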
The complete usage example is shown below:
import sbsm
import numpy as np

if __name__ == '__main__':
    from multiprocessing import freeze_support
    freeze_support()

    # Load your own data.
    # read_fasta and read_txt are placeholders for your own loading functions.
    X_train = read_fasta('train_sample.fasta')  # raw protein sequences for training
    y_train = read_txt('train_label.txt')       # corresponding training labels
    X_test = read_fasta('test_sample.fasta')    # raw protein sequences for testing
    # y_test = read_txt('test_label.txt')       # corresponding testing labels

    # Train and test the SBSM
    cls = sbsm.SBSM(c=10)             # define the SBSM object
    cls.fit(X_train, y_train)         # train the SBSM
    y_predict = cls.predict(X_test)   # predict labels for the test sequences
    print(y_predict)                  # print the results
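The example calls read_fasta and read_txt without defining them. One possible way to write them is sketched below. This is an assumption about the file layout, not SBSM's own loader: the FASTA file has ">" header lines followed by one or more sequence lines, and the label file has one integer label per line.

```python
def read_fasta(path):
    """Return the sequences in a FASTA file as a list of strings."""
    sequences, chunks = [], []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # A new header closes the previous record, if any.
                if chunks:
                    sequences.append("".join(chunks))
                    chunks = []
            else:
                chunks.append(line)
    if chunks:
        sequences.append("".join(chunks))
    return sequences


def read_txt(path):
    """Return one integer label per non-empty line."""
    with open(path) as handle:
        return [int(line) for line in handle if line.strip()]
```

Sequences that span several lines are joined into a single string, and header text is discarded.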
samples: Raw protein sequences.
labels: Corresponding labels.
C: Float. Penalty parameter that sets the cost of misclassification. A larger value tolerates fewer errors and may lead to overfitting; a smaller value tolerates more errors and may lead to underfitting. Usually positive; the default is 64.0.
alpha: Float. Control parameter for kernel parameterization. Typically positive; the default is 1.0.
match_score: Integer. Score given when two characters match. Usually positive; the default is 2.
mismatch_score: Integer. Score given when two characters do not match. Typically negative; the default is -1.
gap_score: Integer. Cost of introducing a gap, deducted when a new gap is initiated. Typically negative; the default is -2.
k_neighbors: Integer. In HCKDM, k is the number of neighboring kernels chosen for local kernel selection. Usually positive; the default is 15.
lambda: Float. Ratio given to the global kernel alignment when combining the global and local kernel alignments. Must be greater than 0 and less than 1; the default is 0.9.
nu1: Float. Regularization parameter for the Laplacian graph regularization term. Prevents certain kernel weights from nearing 0. Typically positive; the default is 0.001.
nu2: Float. Regularization parameter for the L2 regularization term. Prevents kernel weights from nearing the average kernel weight. Typically positive; the default is 0.001.
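To give a feel for how match_score, mismatch_score and gap_score interact, here is a minimal sketch of a Smith-Waterman-style local alignment score. It is an illustration only and is not taken from the SBSM source. The function name local_alignment_score is hypothetical, and the sketch charges gap_score for every gap position, which may differ from how SBSM applies the gap cost.

```python
def local_alignment_score(a, b, match_score=2, mismatch_score=-1, gap_score=-2):
    """Best local alignment score between strings a and b (linear gap cost)."""
    prev = [0] * (len(b) + 1)  # previous row of the scoring table
    best = 0
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            diag = prev[j - 1] + (match_score if ch_a == ch_b else mismatch_score)
            up = prev[j] + gap_score        # gap in b
            left = curr[j - 1] + gap_score  # gap in a
            cell = max(0, diag, up, left)   # local alignment never drops below 0
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


identical = local_alignment_score("ACGT", "ACGT")
unrelated = local_alignment_score("AAAA", "TTTT")
```

With the defaults, four matching characters score 4 × 2 = 8, while sequences with no characters in common score 0. A more negative gap_score makes the alignment prefer mismatches over gaps.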
Download the manual of SBSM.
Download manual