The mAML webserver is a web-based machine learning (ML) system that automatically generates optimized and interpretable models for binary or multi-class classification tasks in microbiome studies. Once the input files are uploaded, the pipeline searches for the best combination of preprocessors and non-tree-based classifiers while simultaneously optimizing the hyperparameters of all classifiers. When the task is complete, the compressed results are automatically sent to the e-mail address provided by the user. Users can also feed new data to an existing model, or upload a previously trained model, to make new predictions.
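Reusing a trained model amounts to persisting its learned state and reloading it later. mAML's actual model file format is not described here, so the sketch below simply assumes the model's parameters are serialized with Python's standard `pickle` module (scikit-learn's model-persistence documentation covers the same idea with `pickle` and `joblib`). The `predict` function and the dictionary keys are hypothetical.

```python
import pickle


def predict(model, rows):
    """Hypothetical model: label a sample 1 if its selected feature exceeds the stored threshold, else 0."""
    j, t = model["feature_index"], model["threshold"]
    return [1 if row[j] > t else 0 for row in rows]


# Hypothetical "trained" model: only its learned parameters are stored.
model = {"feature_index": 0, "threshold": 0.5}

# Serialize once (e.g. to keep or upload later) ...
blob = pickle.dumps(model)

# ... then restore it and feed it new samples without retraining.
restored = pickle.loads(blob)
new_samples = [[0.9, 0.1], [0.2, 0.7]]
predictions = predict(restored, new_samples)
print(predictions)  # [1, 0]
```

Only load pickled models from trusted sources, since unpickling can execute arbitrary code.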
The pipeline includes four steps:
- Filter out low-prevalence features.
- Select a feature subset using the distal_DBA, mRMR, HFE, or univariate_feature_selection method.
- Perform over-sampling on imbalanced data using RandomOverSampler, ADASYN, or SMOTE.
- Select the best performing combination of data preprocessors and hyperparameter-optimized classifiers.
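Steps 1 and 3 above can be illustrated in plain Python. This is a minimal sketch, not mAML's code: the function names `filter_low_prevalence` and `random_oversample`, the 0.3 prevalence cutoff, and the toy table are all assumptions. The over-sampler mimics only the basic idea behind RandomOverSampler (duplicating randomly chosen minority-class samples), not ADASYN or SMOTE, which synthesize new samples instead.

```python
import random


def filter_low_prevalence(matrix, min_prevalence=0.1):
    """Keep feature columns that are non-zero in at least `min_prevalence` of samples.

    Returns the filtered matrix and the indices of the retained columns.
    """
    n_samples = len(matrix)
    n_features = len(matrix[0])
    kept = [
        j for j in range(n_features)
        if sum(1 for row in matrix if row[j] > 0) / n_samples >= min_prevalence
    ]
    return [[row[j] for j in kept] for row in matrix], kept


def random_oversample(matrix, labels, seed=0):
    """Duplicate randomly chosen samples of each smaller class until every class
    has as many samples as the largest class (plain random over-sampling)."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(matrix, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(rows) for rows in by_class.values())
    out_x, out_y = [], []
    for label, rows in by_class.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for row in rows + extra:
            out_x.append(list(row))
            out_y.append(label)
    return out_x, out_y


# Toy abundance table: 6 samples x 4 taxa; taxon 3 appears in only one sample.
X = [
    [5, 0, 2, 0],
    [3, 1, 0, 0],
    [0, 4, 1, 0],
    [2, 0, 3, 7],
    [1, 2, 0, 0],
    [4, 1, 1, 0],
]
y = ["case", "case", "control", "control", "control", "control"]

X_filtered, kept = filter_low_prevalence(X, min_prevalence=0.3)
X_balanced, y_balanced = random_oversample(X_filtered, y)
print(kept)                      # [0, 1, 2]
print(y_balanced.count("case"))  # 4
```

Filtering runs before over-sampling here so that duplicated samples cannot inflate a rare feature's apparent prevalence.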
The pipeline was developed with the Python machine learning package scikit-learn. The built-in parameters of all 13 classifiers and 10 feature preprocessing methods can be edited on the web interface, and 11 different metrics are available for evaluating model performance. Because it is data-driven, the pipeline can also be applied to classification tasks in other domains, as long as a domain-specific feature matrix is supplied.
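The combination search and metric-based evaluation described above can be sketched as an exhaustive loop over (preprocessor, classifier, hyperparameter) choices, each scored by leave-one-out balanced accuracy. Everything here is a hypothetical stand-in: the two preprocessors, the k-NN and nearest-centroid classifiers, the tiny hyperparameter grid, and the choice of balanced accuracy as the metric. The real pipeline builds on scikit-learn and may search and score candidates differently.

```python
import itertools
import math


def identity(rows):
    return [list(r) for r in rows]


def relative_abundance(rows):
    # Scale each sample so its features sum to 1 (row-wise, so nothing is fitted).
    out = []
    for r in rows:
        s = sum(r)
        out.append([v / s for v in r] if s else [0.0] * len(r))
    return out


def knn_predict(train_x, train_y, x, k):
    # Majority vote among the k nearest training samples (Euclidean distance).
    nearest = sorted(zip(train_x, train_y), key=lambda p: math.dist(p[0], x))[:k]
    votes = [label for _, label in nearest]
    return max(sorted(set(votes)), key=votes.count)  # sorted() makes ties deterministic


def centroid_predict(train_x, train_y, x):
    # Assign x to the class whose mean feature vector is closest.
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        members = [r for r, l in zip(train_x, train_y) if l == label]
        centroid = [sum(col) / len(members) for col in zip(*members)]
        d = math.dist(centroid, x)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label


def balanced_accuracy(y_true, y_pred):
    # Mean per-class recall, which is less sensitive to class imbalance than accuracy.
    recalls = []
    for label in set(y_true):
        idx = [i for i, t in enumerate(y_true) if t == label]
        recalls.append(sum(y_pred[i] == label for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


def leave_one_out_score(rows, labels, predict):
    preds = []
    for i in range(len(rows)):
        train_x = rows[:i] + rows[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        preds.append(predict(train_x, train_y, rows[i]))
    return balanced_accuracy(labels, preds)


PREPROCESSORS = {"none": identity, "relative_abundance": relative_abundance}
# Each k-NN entry fixes one value of the hyperparameter k.
CLASSIFIERS = {
    "knn_k1": lambda tx, ty, x: knn_predict(tx, ty, x, k=1),
    "knn_k3": lambda tx, ty, x: knn_predict(tx, ty, x, k=3),
    "nearest_centroid": centroid_predict,
}


def search(rows, labels):
    # Score every (preprocessor, classifier) pair and return the best one.
    results = {}
    for (p_name, prep), (c_name, clf) in itertools.product(
        PREPROCESSORS.items(), CLASSIFIERS.items()
    ):
        results[(p_name, c_name)] = leave_one_out_score(prep(rows), labels, clf)
    best = max(results, key=results.get)  # first pair wins on ties
    return best, results[best]


X = [[9, 1], [8, 2], [10, 1], [1, 9], [2, 8], [1, 10]]
y = ["case"] * 3 + ["control"] * 3
best, score = search(X, y)
print(best, score)  # ('none', 'knn_k1') 1.0
```

An exhaustive grid like this is only practical for small search spaces; larger spaces are typically explored with randomized or model-based search.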
Fenglong Yang, Quan Zou*, mAML: an automated machine learning pipeline with a microbiome repository for human disease classification, Database, Volume 2020, 2020, baaa050, https://doi.org/10.1093/database/baaa050. (SCI, IF2018=3.683)