Support Bio-sequence Machine

Install

Download the .whl file of SBSM.

Download code

Then, use the following command to install SBSM onto your computer:

pip install sbsm-xxx.whl

Usage

After a successful installation, first import sbsm in your Python file:

import sbsm

To achieve higher running speeds, SBSM uses the multiprocessing module for parallelization. Your script must therefore run its main code under an if __name__ == "__main__" guard and call the freeze_support() method inside that guard to avoid errors:

if __name__ == "__main__":
    from multiprocessing import freeze_support
    freeze_support()

Next, create an SBSM object. Parameters can be specified at this point.

  cls = sbsm.SBSM(c=64)

Train the model using the fit() method. Its inputs are the raw protein sequences of the training samples and their corresponding labels.

  cls.fit(X_train, y_train)

Predict using the predict() method. Its input is the raw protein sequences of the samples to be predicted.

  y_predict = cls.predict(X_test)

The complete usage example is shown below:

import sbsm
import numpy as np

if __name__ == '__main__':
    from multiprocessing import freeze_support
    freeze_support()

    # Load your own data (read_fasta and read_txt stand for your own loading functions)
    X_train = read_fasta('train_sample.fasta')  # raw protein sequences of training
    y_train = read_txt('train_label.txt')  # corresponding labels of training
    X_test = read_fasta('test_sample.fasta')  # raw protein sequences of testing
    # y_test = read_txt('test_label.txt')  # corresponding labels of testing

    # Train and test the SBSM
    cls = sbsm.SBSM(c=10)  # define the SBSM object
    cls.fit(X_train, y_train)  # train the SBSM
    y_predict = cls.predict(X_test)  # test the SBSM
    print(y_predict)  # print the results
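
The example above calls read_fasta and read_txt without defining them. The sketch below shows one possible way to write these two helpers. It is not part of SBSM, and it assumes a standard FASTA file (header lines starting with ">", sequences possibly wrapped over several lines) and a label file with one integer label per line; adapt it if your data is stored differently.

```python
from pathlib import Path


def read_fasta(path):
    """Return the sequences of a FASTA file as a list of strings (hypothetical helper)."""
    sequences = []
    current = []       # sequence lines of the record being read
    in_record = False  # becomes True once the first header line is seen
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A header starts a new record; store the previous one first.
            if in_record:
                sequences.append("".join(current))
            current = []
            in_record = True
        elif in_record:
            current.append(line)
    if in_record:
        sequences.append("".join(current))
    return sequences


def read_txt(path):
    """Return one integer label per non-empty line (assumes integer labels)."""
    lines = Path(path).read_text().splitlines()
    return [int(line) for line in lines if line.strip()]
```

With these helpers defined above the if __name__ == '__main__' guard, the complete usage example can load its own files as written.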

Parameters description

samples: Raw protein sequences

labels: Corresponding labels

C: Float type. Penalty parameter that determines the penalty for misclassifications. A larger value implies less tolerance for errors and might lead to overfitting; a smaller value implies greater tolerance for errors, possibly leading to underfitting. Usually positive, default value is 64.0.

alpha: Float type. Control parameter for kernel parameterization. Typically positive, default value is 1.0.

match_score: Integer type. Score given when two characters match perfectly. It's usually positive, default value is 2.

mismatch_score: Integer type. Score given when two characters do not match. Typically negative, default value is -1.

gap_score: Integer type. Cost for introducing a gap. This cost is deducted when a new gap is initiated. Typically negative, default value is -2.

k_neighbors: Integer type. In HCKDM, k represents the number of neighboring kernels chosen for local kernel selection. It's usually positive, default value is 15.

lambda: Float type. Ratio parameter that weights the global kernel alignment against the local kernel alignment. Value should be greater than 0 and less than 1, default value is 0.9.

nu1: Float type. Regularization parameter for the Laplacian graph regularization term. Prevents certain kernel weights from nearing 0. Typically positive, default value is 0.001.

nu2: Float type. Regularization parameter for the L2 regularization term. Prevents kernel weights from nearing the average kernel weight. Typically positive, default value is 0.001.
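
To give a feel for how match_score, mismatch_score and gap_score interact, here is a generic global alignment score (Needleman-Wunsch style) using the default values listed above. This is only an illustration under stated assumptions: the function name global_alignment_score is hypothetical, the sketch charges gap_score for every gap position, and SBSM's internal alignment procedure may differ (for example, it may charge the gap cost only when a gap is opened).

```python
def global_alignment_score(a, b, match_score=2, mismatch_score=-1, gap_score=-2):
    """Best global alignment score of a and b with a per-position gap penalty."""
    # prev[j] is the best score for aligning the processed prefix of a with b[:j].
    prev = [j * gap_score for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap_score]  # a[:i] aligned against an empty prefix of b
        for j in range(1, len(b) + 1):
            pair = match_score if a[i - 1] == b[j - 1] else mismatch_score
            curr.append(max(
                prev[j - 1] + pair,       # align a[i-1] with b[j-1]
                prev[j] + gap_score,      # gap in b
                curr[j - 1] + gap_score,  # gap in a
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(global_alignment_score("AC", "AC"))  # two matches
    print(global_alignment_score("AC", "AG"))  # one match, one mismatch
```

Making mismatch_score or gap_score more negative makes substitutions or gaps more costly, which changes which alignment scores highest and therefore how similar two sequences appear.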

Manual

Download the manual of SBSM.

Download manual
