API Reference

The SBSM class implements a protein sequence classifier built on a Support Vector Machine (SVM) with precomputed custom kernels. It supports hybrid kernel fusion and k-fold cross-validation.

SBSM.compute_kernels(X_train, X_test, y_train)

Generates the training and testing kernel matrices using multiple substitution matrices and fusion strategies.

Arguments:

  • X_train (List[str]): List of training protein sequences.
  • X_test (List[str]): List of testing protein sequences.
  • y_train (List[int]): Training labels, used to supervise kernel construction.

Returns:

  • train_kernel (np.ndarray): Kernel matrix for training (n_train x n_train).
  • test_kernel (np.ndarray): Kernel matrix for testing (n_test x n_train).
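
The two return shapes can be illustrated with a stand-in kernel. The sketch below does not reproduce SBSM's substitution-matrix kernels or its fusion strategy; it uses a simple k-mer spectrum kernel, and the names kmer_counts and toy_kernel, as well as the sequences, are hypothetical. It only shows why the training kernel is square and why the test kernel has one column per training sequence.

```python
from collections import Counter

import numpy as np


def kmer_counts(seq, k=2):
    """Count the overlapping k-mers in one sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def toy_kernel(rows, cols, k=2):
    """Spectrum kernel: dot product of k-mer count vectors.

    Hypothetical stand-in, not the kernel SBSM computes.
    """
    row_counts = [kmer_counts(s, k) for s in rows]
    col_counts = [kmer_counts(s, k) for s in cols]
    K = np.zeros((len(rows), len(cols)))
    for i, a in enumerate(row_counts):
        for j, b in enumerate(col_counts):
            # Counter returns 0 for k-mers missing from b.
            K[i, j] = sum(a[m] * b[m] for m in a)
    return K


# Made-up sequences, for illustration only.
X_train = ["MKTAYIAK", "MKTAYLAK", "GGSGGSGG", "GGSGGAGG"]
X_test = ["MKTAYIAR", "GGSGGSGA"]

train_kernel = toy_kernel(X_train, X_train)  # shape (n_train, n_train)
test_kernel = toy_kernel(X_test, X_train)    # shape (n_test, n_train)
print(train_kernel.shape, test_kernel.shape)
```

Row i of test_kernel holds the similarity of test sequence i to every training sequence, in the same order the training sequences were given.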

SBSM.compute_kernels_cv(X_train, y_train, k_fold=5, shuffle=True, random_state=None)

Computes the fused training kernel and partitions it, together with the labels, into k-fold cross-validation splits.

Arguments:

  • X_train (List[str]): List of training protein sequences.
  • y_train (List[int]): Training labels.
  • k_fold (int): Number of folds for CV. Default = 5.
  • shuffle (bool): Whether to shuffle data before splitting. Default = True.
  • random_state (int, optional): Seed for reproducibility.

Returns:

  • List[Tuple[np.ndarray, np.ndarray, np.ndarray, np.ndarray]]: For each fold, a tuple of:
      • K_train: Training kernel (folded)
      • K_test: Test kernel (folded)
      • y_train: Training labels (folded)
      • y_test: Testing labels (folded)
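
How a precomputed kernel is "folded" can be sketched as follows. This is an assumption about the general technique, not SBSM's actual implementation: the helper fold_precomputed_kernel is hypothetical, the split is not stratified, and a random Gram matrix stands in for the fused kernel. The key point is that each fold slices both rows and columns of the full kernel.

```python
import numpy as np


def fold_precomputed_kernel(K, y, k_fold=5, shuffle=True, random_state=None):
    """Split a full (n x n) kernel into per-fold train/test kernels.

    Hypothetical helper; SBSM may split differently (e.g. stratified).
    """
    y = np.asarray(y)
    indices = np.arange(len(y))
    if shuffle:
        rng = np.random.default_rng(random_state)
        indices = rng.permutation(indices)
    folds = []
    for test_idx in np.array_split(indices, k_fold):
        train_idx = np.setdiff1d(indices, test_idx)
        K_train = K[np.ix_(train_idx, train_idx)]  # (n_fold_train, n_fold_train)
        K_test = K[np.ix_(test_idx, train_idx)]    # (n_fold_test, n_fold_train)
        folds.append((K_train, K_test, y[train_idx], y[test_idx]))
    return folds


# Placeholder kernel: a Gram matrix of random features.
rng = np.random.default_rng(0)
features = rng.normal(size=(10, 3))
K = features @ features.T
y = np.array([0, 1] * 5)

folds = fold_precomputed_kernel(K, y, k_fold=5, random_state=0)
for K_train, K_test, y_tr, y_te in folds:
    print(K_train.shape, K_test.shape, len(y_tr), len(y_te))
```

Each K_test keeps only the columns of that fold's training samples, so it can be passed to a classifier fitted on the matching K_train.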

SBSM.fit(train_kernel, y_train, alpha=None, C=None)

Fits the SVM classifier using the precomputed training kernel.

Arguments:

  • train_kernel (np.ndarray): Training kernel matrix (n_train x n_train).
  • y_train (List[int] | np.ndarray): Training labels.
  • alpha (float, optional): Overrides the kernel parameter when provided.
  • C (float, optional): Overrides the penalty parameter when provided.

Returns:

  • self: The fitted SBSM instance.
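
Fitting an SVM on a precomputed kernel can be sketched with scikit-learn's SVC directly. This is an illustration of the underlying technique, not SBSM's internals: the random features and the linear kernel are placeholders, and treating C as the SVC penalty parameter is an assumption. The alpha parameter is not modelled here because the reference does not say what it controls.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Hypothetical features standing in for sequences: two well-separated clusters.
features = np.vstack([
    rng.normal(-2.0, 0.5, size=(10, 2)),
    rng.normal(2.0, 0.5, size=(10, 2)),
])
y_train = np.array([0] * 10 + [1] * 10)

# A linear kernel is used only as a placeholder for a fused kernel.
train_kernel = features @ features.T  # shape (n_train, n_train)

# kernel="precomputed" tells SVC that the input is a Gram matrix, not features.
clf = SVC(kernel="precomputed", C=1.0)
clf.fit(train_kernel, y_train)

train_pred = clf.predict(train_kernel)
print(clf.classes_, clf.n_support_)
```

The matrix passed to fit must be square, with row i and column i referring to the same training sample.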

SBSM.predict(test_kernel)

Predicts class labels for the test set using the trained classifier and kernel.

Arguments:

  • test_kernel (np.ndarray): Test kernel matrix (n_test x n_train).

Returns:

  • y_pred (np.ndarray): Predicted class labels.
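
The shape requirement on test_kernel can be made concrete with a small sketch. It again uses scikit-learn's SVC and a placeholder linear kernel rather than SBSM itself, and all data are made up. The columns of the test kernel must line up with the training samples used in fit.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)

# Placeholder training data: two clusters labelled 0 and 1.
train_features = np.vstack([
    rng.normal(-2.0, 0.5, size=(8, 2)),
    rng.normal(2.0, 0.5, size=(8, 2)),
])
y_train = np.array([0] * 8 + [1] * 8)
test_features = np.array([[-2.0, -2.0], [2.0, 2.0], [1.5, 2.5]])

train_kernel = train_features @ train_features.T  # (n_train, n_train)
test_kernel = test_features @ train_features.T    # (n_test, n_train)

clf = SVC(kernel="precomputed").fit(train_kernel, y_train)

# One predicted label per row of test_kernel.
y_pred = clf.predict(test_kernel)
print(test_kernel.shape, y_pred)
```

Passing an (n_test x n_test) matrix instead would fail, because the classifier expects one column per training sample.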

SBSM.predict_proba(test_kernel)

Predicts class probabilities for the test set using the trained classifier and kernel. Columns follow scikit-learn's convention of ascending class-label order: for binary labels 0 and 1, the first column is the probability of label 0 and the second column is the probability of label 1.

Arguments:

  • test_kernel (np.ndarray): Test kernel matrix (n_test x n_train).

Returns:

  • y_proba (np.ndarray): Predicted probabilities for each class (n_test x n_classes).
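
The column ordering can be checked rather than assumed. The sketch below uses scikit-learn's SVC with probability=True as a stand-in for SBSM (whether SBSM exposes a classes_ attribute is not stated in this reference), with made-up data and a placeholder linear kernel. It looks up the column for a label through classes_ instead of hard-coding its position.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(2)

# Placeholder training data: two clusters labelled 0 and 1.
train_features = np.vstack([
    rng.normal(-2.0, 0.5, size=(15, 2)),
    rng.normal(2.0, 0.5, size=(15, 2)),
])
y_train = np.array([0] * 15 + [1] * 15)
test_features = np.array([[-2.0, -2.0], [2.0, 2.0]])

train_kernel = train_features @ train_features.T  # (n_train, n_train)
test_kernel = test_features @ train_features.T    # (n_test, n_train)

# probability=True enables predict_proba on scikit-learn's SVC.
clf = SVC(kernel="precomputed", probability=True, random_state=0)
clf.fit(train_kernel, y_train)

y_proba = clf.predict_proba(test_kernel)  # shape (n_test, n_classes)

# classes_ is sorted ascending, so column j belongs to clf.classes_[j].
col_of_label_1 = list(clf.classes_).index(1)
p_label_1 = y_proba[:, col_of_label_1]
print(clf.classes_, y_proba.shape)
```

Looking the column up by label keeps downstream code correct even when labels are not exactly 0 and 1.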