API Reference
The SBSM class provides a protein sequence classification model using Support Vector Machines (SVM) with precomputed custom kernels. It supports hybrid kernel fusion techniques and cross-validation for robust sequence classification.
SBSM.compute_kernels(X_train, X_test, y_train)
Generates the training and testing kernel matrices using multiple substitution matrices and fusion strategies.
Arguments:
X_train (List[str]): List of training protein sequences.
X_test (List[str]): List of testing protein sequences.
y_train (List[int]): Training labels for kernel supervision.
Returns:
train_kernel (np.ndarray): Kernel matrix for training (n_train x n_train).
test_kernel (np.ndarray): Kernel matrix for testing (n_test x n_train).
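The shapes of the two returned matrices can be illustrated with a small sketch. The kernel used here is a toy k-mer spectrum kernel chosen only for illustration; it is not SBSM's substitution-matrix kernel, and the helper names kmer_counts and toy_kernel are hypothetical.

```python
from collections import Counter

import numpy as np


def kmer_counts(seq, k=2):
    """Count overlapping k-mers in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def toy_kernel(seqs_a, seqs_b, k=2):
    """Stand-in spectrum kernel: dot product of k-mer count vectors.

    Assumption: this only illustrates matrix shapes and is not the
    kernel that SBSM.compute_kernels actually computes.
    """
    counts_a = [kmer_counts(s, k) for s in seqs_a]
    counts_b = [kmer_counts(s, k) for s in seqs_b]
    K = np.zeros((len(seqs_a), len(seqs_b)))
    for i, ca in enumerate(counts_a):
        for j, cb in enumerate(counts_b):
            # Counter returns 0 for k-mers that are absent from cb.
            K[i, j] = sum(n * cb[kmer] for kmer, n in ca.items())
    return K


X_train = ["MKTAYIAK", "MKTLLVAG", "GAVLIPFW"]
X_test = ["MKTAYLAG", "GAVLIPYW"]

# Rows and columns both index training sequences: (n_train x n_train).
train_kernel = toy_kernel(X_train, X_train)
# Rows index test sequences, columns index training sequences: (n_test x n_train).
test_kernel = toy_kernel(X_test, X_train)

print(train_kernel.shape, test_kernel.shape)
```

The test kernel always has n_train columns because each test sequence is compared against every training sequence, which is the layout a precomputed-kernel SVM expects at prediction time.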
SBSM.compute_kernels_cv(X_train, y_train, k_fold=5, shuffle=True, random_state=None)
Computes the fused training kernel and performs k-fold cross-validation.
Arguments:
X_train (List[str]): List of training protein sequences.
y_train (List[int]): Training labels.
k_fold (int): Number of folds for CV. Default: 5.
shuffle (bool): Whether to shuffle data before splitting. Default: True.
random_state (int, optional): Seed for reproducibility.
Returns:
List[Tuple[np.ndarray, np.ndarray, np.ndarray, np.ndarray]]: For each fold:
  K_train: Training kernel (folded)
  K_test: Test kernel (folded)
  y_train: Training labels (folded)
  y_test: Testing labels (folded)
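One way to picture the per-fold tuples is as slices of a single precomputed kernel matrix. The sketch below is a minimal NumPy illustration under that assumption; the function fold_precomputed_kernel is hypothetical, and whether SBSM stratifies the folds or splits them this way is not stated in this reference.

```python
import numpy as np


def fold_precomputed_kernel(K, y, k_fold=5, shuffle=True, random_state=None):
    """Slice a full (n x n) kernel into per-fold train/test kernels.

    Hypothetical helper for illustration; not SBSM's implementation.
    """
    y = np.asarray(y)
    indices = np.arange(len(y))
    if shuffle:
        rng = np.random.default_rng(random_state)
        rng.shuffle(indices)
    folds = []
    for test_idx in np.array_split(indices, k_fold):
        train_idx = np.setdiff1d(indices, test_idx)
        # Training kernel: training rows against training columns.
        K_train = K[np.ix_(train_idx, train_idx)]
        # Test kernel: held-out rows against training columns.
        K_test = K[np.ix_(test_idx, train_idx)]
        folds.append((K_train, K_test, y[train_idx], y[test_idx]))
    return folds


# Toy data: a linear kernel on random features stands in for the fused kernel.
rng = np.random.default_rng(0)
features = rng.normal(size=(10, 4))
K = features @ features.T
y = np.array([0, 1] * 5)

folds = fold_precomputed_kernel(K, y, k_fold=5, random_state=0)
for K_train, K_test, y_tr, y_te in folds:
    print(K_train.shape, K_test.shape, y_tr.shape, y_te.shape)
```

Slicing a kernel computed once avoids recomputing pairwise similarities for every fold, which is presumably why the method returns folded kernels rather than folded sequences.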
SBSM.fit(train_kernel, y_train, alpha=None, C=None)
Fits the SVM classifier using the precomputed training kernel.
Arguments:
train_kernel (np.ndarray): Training kernel matrix (n x n).
y_train (List[int] | np.ndarray): Training labels.
alpha (float, optional): Override of the kernel parameter.
C (float, optional): Override of the penalty parameter.
Returns:
self: The fitted SBSM instance.
SBSM.predict(test_kernel)
Predicts class labels for the test set using the trained classifier and kernel.
Arguments:
test_kernel (np.ndarray): Test kernel matrix (n_test x n_train).
Returns:
y_pred (np.ndarray): Predicted class labels.
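The fit and predict contract described above mirrors how a precomputed-kernel SVM is used in scikit-learn. The sketch below uses sklearn.svm.SVC directly as a stand-in; that SBSM wraps SVC in this way is an assumption, and the linear kernel on synthetic features replaces the real sequence kernel. The alpha parameter is left out because its meaning is not specified here.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in for sequence features: two well-separated clusters.
X_tr = np.vstack([rng.normal(-2, 0.5, (10, 3)), rng.normal(2, 0.5, (10, 3))])
y_tr = np.array([0] * 10 + [1] * 10)
X_te = np.vstack([rng.normal(-2, 0.5, (3, 3)), rng.normal(2, 0.5, (3, 3))])

# A linear kernel plays the role of the fused kernel.
train_kernel = X_tr @ X_tr.T  # (n_train x n_train)
test_kernel = X_te @ X_tr.T   # (n_test x n_train)

# C is the penalty parameter; 1.0 is scikit-learn's default.
clf = SVC(kernel="precomputed", C=1.0)
clf.fit(train_kernel, y_tr)

y_pred = clf.predict(test_kernel)
print(y_pred)
```

The column order of test_kernel must match the row order of the training kernel used in fit, since each column refers to one specific training sample.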
SBSM.predict_proba(test_kernel)
Predicts class probabilities for the test set using the trained classifier and kernel. The output columns follow the scikit-learn convention of ascending class-label order: for binary labels 0 and 1, the first column corresponds to label 0 and the second column to label 1.
Arguments:
test_kernel (np.ndarray): Test kernel matrix (n_test x n_train).
Returns:
y_proba (np.ndarray): Predicted probabilities for each class.
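The column ordering can be checked with scikit-learn itself. This sketch again uses sklearn.svm.SVC with a precomputed linear kernel as a stand-in for SBSM (an assumption), and passes the labels with class 1 first to show that the column order depends on the sorted label values, not on the order in which labels appear.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Labels are deliberately listed with class 1 first.
X_tr = np.vstack([rng.normal(2, 0.5, (20, 3)), rng.normal(-2, 0.5, (20, 3))])
y_tr = np.array([1] * 20 + [0] * 20)
X_te = np.vstack([rng.normal(2, 0.5, (2, 3)), rng.normal(-2, 0.5, (2, 3))])

train_kernel = X_tr @ X_tr.T
test_kernel = X_te @ X_tr.T

# probability=True enables predict_proba on SVC.
clf = SVC(kernel="precomputed", probability=True, random_state=0)
clf.fit(train_kernel, y_tr)

y_proba = clf.predict_proba(test_kernel)  # shape (n_test, n_classes)

# classes_ holds the labels in ascending order; column i belongs to classes_[i].
positive_col = list(clf.classes_).index(1)
p_positive = y_proba[:, positive_col]
print(clf.classes_, y_proba.shape)
```

Looking up the column through classes_ rather than hard-coding index 1 keeps downstream code correct if the label set changes.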