API Reference
The SBSM class provides a protein sequence classification model using Support Vector Machines (SVM) with precomputed custom kernels. It supports hybrid kernel fusion techniques and cross-validation for robust sequence classification.
SBSM.compute_kernels(X_train, X_test, y_train)
Generates the training and testing kernel matrices using multiple substitution matrices and fusion strategies.
Arguments:
X_train (List[str]): List of training protein sequences.
X_test (List[str]): List of testing protein sequences.
y_train (List[int]): Training labels for kernel supervision.
Returns:
train_kernel (np.ndarray): Kernel matrix for training (n_train x n_train).
test_kernel (np.ndarray): Kernel matrix for testing (n_test x n_train).
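The shape convention above (train kernel is n_train x n_train, test kernel is n_test x n_train) can be illustrated with a toy kernel. This is a minimal sketch, not the SBSM kernel: `spectrum_kernel` and `kmer_counts` are hypothetical helpers that use a plain k-mer count dot product in place of the substitution-matrix fusion described here.

```python
from collections import Counter

import numpy as np


def kmer_counts(seq, k=2):
    """Count overlapping k-mers in a sequence (hypothetical helper)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(A, B, k=2):
    """Toy stand-in kernel: dot product of k-mer count vectors.

    Returns a matrix of shape (len(A), len(B)). This is NOT the SBSM kernel.
    """
    counts_a = [kmer_counts(s, k) for s in A]
    counts_b = [kmer_counts(s, k) for s in B]
    K = np.zeros((len(A), len(B)))
    for i, a in enumerate(counts_a):
        for j, b in enumerate(counts_b):
            # Counter returns 0 for k-mers that are absent from b
            K[i, j] = sum(n * b[kmer] for kmer, n in a.items())
    return K


X_train = ["MKVLAA", "MKVLGA", "GGSSGG"]
X_test = ["MKVLAG", "GGSSAG"]

# Rows index the sequences being scored, columns index the training sequences.
train_kernel = spectrum_kernel(X_train, X_train)  # shape (n_train, n_train)
test_kernel = spectrum_kernel(X_test, X_train)    # shape (n_test, n_train)
```

The test kernel is computed against the training sequences, not against the test sequences themselves, which is why its second dimension is n_train.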
SBSM.compute_kernels_cv(X_train, y_train, k_fold=5, shuffle=True, random_state=None)
Computes the fused training kernel and performs k-fold cross-validation.
Arguments:
X_train (List[str]): List of training protein sequences.
y_train (List[int]): Training labels.
k_fold (int): Number of folds for cross-validation. Default: 5.
shuffle (bool): Whether to shuffle data before splitting. Default: True.
random_state (int, optional): Seed for reproducibility.
Returns:
List[Tuple[np.ndarray, np.ndarray, np.ndarray, np.ndarray]]: One tuple per fold, containing:
K_train: Training kernel (folded)
K_test: Test kernel (folded)
y_train: Training labels (folded)
y_test: Testing labels (folded)
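One way to picture how a precomputed kernel is "folded" is index slicing on the full training kernel. The sketch below is an assumption about the general technique, not the SBSM implementation: `fold_precomputed_kernel` is a hypothetical helper, it uses a plain (non-stratified) split, and the actual method may split differently.

```python
import numpy as np


def fold_precomputed_kernel(K, y, k_fold=5, shuffle=True, random_state=None):
    """Split a full (n x n) kernel into per-fold train/test blocks.

    Hypothetical helper: plain k-fold, no stratification.
    """
    y = np.asarray(y)
    idx = np.arange(len(y))
    if shuffle:
        idx = np.random.default_rng(random_state).permutation(idx)

    folds = []
    for test_idx in np.array_split(idx, k_fold):
        train_idx = np.setdiff1d(idx, test_idx)
        # Train block: train rows x train columns
        K_train = K[np.ix_(train_idx, train_idx)]
        # Test block: test rows x train columns
        K_test = K[np.ix_(test_idx, train_idx)]
        folds.append((K_train, K_test, y[train_idx], y[test_idx]))
    return folds


# Toy data: a linear kernel over random features stands in for the fused kernel.
rng = np.random.default_rng(0)
F = rng.normal(size=(10, 4))
K = F @ F.T
y = np.array([0, 1] * 5)

folds = fold_precomputed_kernel(K, y, k_fold=5, random_state=0)
for K_train, K_test, y_tr, y_te in folds:
    print(K_train.shape, K_test.shape, y_tr.shape, y_te.shape)
```

Each fold's test block keeps only the columns of that fold's training samples, so it can be passed directly to a classifier fitted on the matching training block.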
SBSM.fit(train_kernel, y_train, alpha=None, C=None)
Fits the SVM classifier using the precomputed training kernel.
Arguments:
train_kernel (np.ndarray): Training kernel matrix (n x n).
y_train (List[int] | np.ndarray): Training labels.
alpha (float, optional): Optional override of the kernel parameter.
C (float, optional): Optional override of the penalty parameter.
Returns:
self: The fitted SBSM instance.
SBSM.predict(test_kernel)
Predicts class labels for the test set using the trained classifier and kernel.
Arguments:
test_kernel (np.ndarray): Test kernel matrix (n_test x n_train).
Returns:
y_pred (np.ndarray): Predicted class labels.
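The fit and predict steps follow the usual precomputed-kernel SVM pattern. The sketch below uses scikit-learn's `SVC(kernel="precomputed")` directly rather than SBSM, on the assumption that SBSM behaves similarly. The random features and the linear kernel are illustrative stand-ins for real sequence kernels.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Two well-separated clusters stand in for features of two protein classes.
F_train = np.vstack([rng.normal(-2, 0.5, (10, 3)), rng.normal(2, 0.5, (10, 3))])
y_train = np.array([0] * 10 + [1] * 10)
F_test = np.vstack([rng.normal(-2, 0.5, (3, 3)), rng.normal(2, 0.5, (3, 3))])

# Linear kernel as a placeholder for the fused kernel.
train_kernel = F_train @ F_train.T  # (n_train, n_train)
test_kernel = F_test @ F_train.T    # (n_test, n_train)

clf = SVC(kernel="precomputed", C=1.0)
clf.fit(train_kernel, y_train)      # fit on the square training kernel
y_pred = clf.predict(test_kernel)   # predict from test-vs-train similarities
print(y_pred)
```

The columns of `test_kernel` must be in the same order as the rows of `train_kernel` used in `fit`, since each column is the similarity to one specific training sample.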
SBSM.predict_proba(test_kernel)
Predicts class probabilities for the test set using the trained classifier and kernel. Columns follow the scikit-learn convention of ordering classes by ascending label: for binary labels 0 and 1, the first column holds the probability of label 0 and the second column holds the probability of label 1.
Arguments:
test_kernel (np.ndarray): Test kernel matrix (n_test x n_train).
Returns:
y_proba (np.ndarray): Predicted probabilities for each class.
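The column ordering can be checked on a plain scikit-learn classifier. This is a sketch under the assumption that SBSM wraps an estimator comparable to `SVC(kernel="precomputed", probability=True)`, which this reference does not confirm. The data and the linear kernel are illustrative only.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Overlapping clusters so that probabilities are not all near 0 or 1.
F_train = np.vstack([rng.normal(-1, 1.0, (20, 3)), rng.normal(1, 1.0, (20, 3))])
y_train = np.array([0] * 20 + [1] * 20)
F_test = rng.normal(0, 1.0, (5, 3))

clf = SVC(kernel="precomputed", probability=True, random_state=0)
clf.fit(F_train @ F_train.T, y_train)
y_proba = clf.predict_proba(F_test @ F_train.T)  # shape (n_test, n_classes)

# classes_ lists the labels in the same order as the columns of y_proba.
pos_col = list(clf.classes_).index(1)
p_positive = y_proba[:, pos_col]  # probability of label 1 for each test sample
print(clf.classes_, y_proba.shape)
```

Looking the column up through `classes_` instead of hard-coding index 1 keeps downstream code correct if the label set changes.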