Machine learning

Wang Chao (王 超)



    Description: A bioinformatics tool for classifying fungal ITS barcodes to the species level. An ITS database covering more than 25,000 species across a broad range of fungal taxa was assembled. A word embedding algorithm was used to represent each ITS sequence as a dense, low-dimensional vector, and a random forest classifier was built on these vectors for species identification. Benchmarking showed that our model achieved accuracy comparable to that of several state-of-the-art predictors and, more importantly, could handle large datasets while greatly reducing feature dimensionality.

    Datasets: Fungal ITS

    Source code: Python scripts
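
    Illustrative sketch: the description above mentions representing an ITS sequence as a dense low-dimensional vector. The Python below is a minimal sketch of that idea, not the tool's actual code. It assumes the sequence is split into overlapping k-mer "words" and that the sequence vector is the average of the per-k-mer vectors. The names `kmers`, `build_embedding_table`, and `embed_sequence`, and the values `K = 3` and `DIM = 8`, are hypothetical. The embedding table is filled with random numbers as a stand-in for learned embeddings; a real pipeline would train the embeddings and pass the resulting vectors to a random forest classifier.

```python
import itertools
import random

K = 3    # k-mer ("word") length; an assumed value, not taken from the tool
DIM = 8  # embedding dimension; an assumed value, not taken from the tool


def kmers(seq, k=K):
    """Split a sequence into overlapping k-mer "words"."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_embedding_table(k=K, dim=DIM, seed=0):
    """Random stand-in for a learned k-mer embedding table (hypothetical)."""
    rng = random.Random(seed)
    table = {}
    for letters in itertools.product("ACGT", repeat=k):
        table["".join(letters)] = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    return table


def embed_sequence(seq, table, k=K, dim=DIM):
    """Average the vectors of all k-mers found in the table."""
    vectors = [table[w] for w in kmers(seq, k) if w in table]
    if not vectors:
        # No usable k-mer (too short, or ambiguous bases only).
        return [0.0] * dim
    return [sum(col) / len(vectors) for col in zip(*vectors)]


table = build_embedding_table()
vec = embed_sequence("ACGTTGCAACGT", table)
print(len(vec))  # length of the sequence vector, independent of sequence length
```

    Whatever the length of the input sequence, the output has `DIM` components, which is the dimensionality reduction the description refers to.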


    Description: A fungal effector predictor, FunEffector-Pred, was designed to learn effectively from an imbalanced dataset. A granular support vector-based under-sampling (GSV-US) strategy combined with a genetic algorithm was used to sample the majority class. When evaluated on an independent test dataset, FunEffector-Pred significantly outperformed existing predictors for fungal effector identification.

    Datasets: Data

    Source code: Python scripts
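
    Illustrative sketch: the description above relies on under-sampling the majority class to cope with class imbalance. The Python below shows only the general idea, using plain random under-sampling as a swapped-in technique. It is not GSV-US, it involves no genetic algorithm, and it is not the predictor's actual code. The function name `undersample_majority` and the toy data are hypothetical.

```python
import random


def undersample_majority(samples, labels, majority_label, seed=0):
    """Randomly keep as many majority samples as there are minority samples."""
    rng = random.Random(seed)
    majority = [i for i, y in enumerate(labels) if y == majority_label]
    minority = [i for i, y in enumerate(labels) if y != majority_label]
    keep = min(len(minority), len(majority))
    # Sort the kept indices so the original sample order is preserved.
    kept = sorted(rng.sample(majority, keep) + minority)
    return [samples[i] for i in kept], [labels[i] for i in kept]


# Toy data: 3 effectors (label 1) and 9 non-effectors (label 0).
X = [[float(i)] for i in range(12)]
y = [1, 1, 1] + [0] * 9
X_bal, y_bal = undersample_majority(X, y, majority_label=0)
print(y_bal.count(1), y_bal.count(0))
```

    Random selection discards majority samples without regard to how informative they are; guided strategies such as the one named in the description aim to choose which majority samples to keep.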