API Reference

The SBSM class provides a protein sequence classification model using Support Vector Machines (SVM) with precomputed custom kernels. It supports hybrid kernel fusion techniques and cross-validation for robust sequence classification.

`SBSM.compute_kernels(X_train, X_test, y_train)`

Generates the training and testing kernel matrices using multiple substitution matrices and fusion strategies.

Arguments:

X_train (List[str]): List of training protein sequences.
X_test (List[str]): List of testing protein sequences.
y_train (List[int]): Training labels, used to supervise kernel construction.

Returns:

train_kernel (np.ndarray): Kernel matrix for training (n_train x n_train).
test_kernel (np.ndarray): Kernel matrix for testing (n_test x n_train).
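
The shape convention above (rows are the sequences being scored, columns are always the training sequences) can be sketched with a toy kernel. The `toy_kernel` and `kmer_counts` helpers below are hypothetical stand-ins based on k-mer counts; they are not SBSM's substitution-matrix kernels and only illustrate the expected matrix layout.

```python
from collections import Counter

import numpy as np


def kmer_counts(seq, k=2):
    """Count overlapping k-mers in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def toy_kernel(rows, cols, k=2):
    """Stand-in kernel (NOT SBSM's): dot product of k-mer count vectors."""
    row_counts = [kmer_counts(s, k) for s in rows]
    col_counts = [kmer_counts(s, k) for s in cols]
    K = np.zeros((len(rows), len(cols)))
    for i, a in enumerate(row_counts):
        for j, b in enumerate(col_counts):
            K[i, j] = sum(a[m] * b[m] for m in a)
    return K


X_train = ["MKVLAA", "MKVLGA", "GGSPPT"]
X_test = ["MKVLAG", "GGSPPA"]

train_kernel = toy_kernel(X_train, X_train)  # shape (n_train, n_train)
test_kernel = toy_kernel(X_test, X_train)    # shape (n_test, n_train)
```

The training kernel is square and symmetric, while the test kernel has one row per test sequence and one column per training sequence.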

`SBSM.compute_kernels_cv(X_train, y_train, k_fold=5, shuffle=True, random_state=None)`

Computes the fused training kernel and splits it into k folds for cross-validation.

Arguments:

X_train (List[str]): List of training protein sequences.
y_train (List[int]): Training labels.
k_fold (int): Number of folds for CV. Default = 5.
shuffle (bool): Whether to shuffle data before splitting. Default = True.
random_state (int, optional): Seed for reproducibility.

Returns:

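List[Tuple[np.ndarray, np.ndarray, np.ndarray, np.ndarray]]: One tuple per fold, in the order (K_train, K_test, y_train, y_test):

K_train (np.ndarray): Training kernel for the fold.
K_test (np.ndarray): Test kernel for the fold.
y_train (np.ndarray): Training labels for the fold.
y_test (np.ndarray): Test labels for the fold.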
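
One plausible way to obtain per-fold kernels from a single precomputed kernel is to slice it by index, as sketched below. The `fold_kernel` helper is hypothetical and uses a plain (non-stratified) split; how SBSM actually splits the data internally is not specified here, so treat this as an illustration of the returned structure only.

```python
import numpy as np


def fold_kernel(K, y, k_fold=5, shuffle=True, random_state=None):
    """Split a full (n x n) kernel into per-fold train/test sub-kernels."""
    y = np.asarray(y)
    indices = np.arange(len(y))
    if shuffle:
        indices = np.random.default_rng(random_state).permutation(len(y))
    folds = []
    for test_idx in np.array_split(indices, k_fold):
        train_idx = np.setdiff1d(indices, test_idx)
        K_train = K[np.ix_(train_idx, train_idx)]  # (n_fold_train, n_fold_train)
        K_test = K[np.ix_(test_idx, train_idx)]    # (n_fold_test, n_fold_train)
        folds.append((K_train, K_test, y[train_idx], y[test_idx]))
    return folds


rng = np.random.default_rng(0)
features = rng.normal(size=(10, 4))
K = features @ features.T  # toy linear kernel standing in for the fused kernel
y = np.array([0, 1] * 5)

folds = fold_kernel(K, y, k_fold=5, random_state=0)
```

Each tuple can then be unpacked as `K_train, K_test, y_train, y_test` and passed to a classifier that accepts precomputed kernels.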

`SBSM.fit(train_kernel, y_train, alpha=None, C=None)`

Fits the SVM classifier using the precomputed training kernel.

Arguments:

train_kernel (np.ndarray): Training kernel matrix (n_train x n_train).
y_train (List[int] | np.ndarray): Training labels.
alpha (float, optional): Override for the kernel parameter. Default = None.
C (float, optional): Override for the penalty parameter. Default = None.

Returns:

self: The fitted SBSM instance.

`SBSM.predict(test_kernel)`

Predicts class labels for the test set using the trained classifier and kernel.

Arguments:

test_kernel (np.ndarray): Test kernel matrix (n_test x n_train).

Returns:

y_pred (np.ndarray): Predicted class labels.
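
As a rough illustration of the fit and predict contract with precomputed kernels, the sketch below uses scikit-learn's `SVC(kernel="precomputed")` directly rather than SBSM. Whether SBSM wraps `SVC` in exactly this way is an assumption, and the toy feature vectors and linear kernel are stand-ins for sequence kernels.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Two well-separated toy clusters standing in for sequence features.
X_train = np.vstack([rng.normal(-2, 0.5, (10, 3)), rng.normal(2, 0.5, (10, 3))])
y_train = np.array([0] * 10 + [1] * 10)
X_test = np.vstack([rng.normal(-2, 0.5, (3, 3)), rng.normal(2, 0.5, (3, 3))])

train_kernel = X_train @ X_train.T  # (n_train, n_train)
test_kernel = X_test @ X_train.T    # (n_test, n_train)

# Fit on the square training kernel, predict from the rectangular test kernel.
clf = SVC(kernel="precomputed", C=1.0)
clf.fit(train_kernel, y_train)
y_pred = clf.predict(test_kernel)
```

The columns of the test kernel must refer to the same training sequences, in the same order, as the kernel used for fitting.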

`SBSM.predict_proba(test_kernel)`

Predicts class probabilities for the test set using the trained classifier and kernel. Columns follow the ascending label order used by scikit-learn: for binary labels 0 and 1, the first column is the probability of label 0 and the second column is the probability of label 1.

Arguments:

test_kernel (np.ndarray): Test kernel matrix (n_test x n_train).

Returns:

y_proba (np.ndarray): Predicted probabilities for each class (n_test x n_classes).
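
The ascending column order can be made concrete with a small NumPy sketch. The `y_proba` values below are made up for illustration and do not come from SBSM; only the indexing pattern matters.

```python
import numpy as np

y_train = np.array([1, 0, 1, 0])
classes = np.unique(y_train)  # sorted ascending: [0, 1]

# Hypothetical probability output: one row per test sequence.
y_proba = np.array([
    [0.90, 0.10],
    [0.30, 0.70],
    [0.45, 0.55],
])

# Column index that holds P(label == 1) under the ascending-order convention.
col_of_label_1 = int(np.searchsorted(classes, 1))
p_positive = y_proba[:, col_of_label_1]

# Map the most probable column back to its class label.
y_pred = classes[np.argmax(y_proba, axis=1)]
```

Note that in scikit-learn's own `SVC`, probabilities are estimated by a separate calibration step, so the most probable class can occasionally differ from the output of `predict`; whether the same applies to SBSM is not stated here.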
