Taxonomic Classification Tools Collection

This table lists taxonomic classification tools, organized by classification type, method, and sub-method, together with each tool's release year, databases or datasets, and download links.

| Classification Type | Method | Sub-Method | Software Tool | Year | Database/Dataset | Download Link |
| --- | --- | --- | --- | --- | --- | --- |
| DNA | Database-Based | k-mer | Kraken | 2014 | minikraken_20171019_4GB; minikraken_20171019_8GB; Timing data; Accuracy data | software and document |
| DNA | Database-Based | k-mer | Kraken 2 | 2020 | Standard (RefSeq archaea, bacteria, viral, plasmid, human, UniVec_Core); PlusPFP (Standard plus RefSeq protozoa, fungi and plant); nt Database (very large collection, inclusive of GenBank, RefSeq, TPA and PDB) | software and document |
| DNA | Database-Based | k-mer | Centrifuge | 2016 | RefSeq: bacteria, archaea, viral, human (8g); NCBI: nucleotide non-redundant sequences (64g) | software and document |
| DNA | Database-Based | k-mer | CLARK | 2015 | Download using the software | software and document |
| DNA | Database-Based | k-mer | CLARK-S | 2016 | Download using the software | software and document |
| DNA | Database-Based | k-mer | KrakenUniq | 2018 | Standard_377GB_kdb (archaea, bacteria, viral, human, UniVec_Core); Standard_377GB_tar_gz (download both); MicrobialDB_384GB_kdb (archaea, bacteria, viral, human, UniVec_Core, eukaryotic pathogen genomes with contaminants removed); MicrobialDB_384GB_tar_gz (download both) | source code |
| DNA | Database-Based | k-mer | k-SLAM | 2017 | Download using the software | source code; software online |
| DNA | Database-Based | k-mer | MegaBlast | 2008 | Index and software package (FTP) | software (FTP); alternatively: sudo apt-get install ncbi-blast+ (provides blastn and blastp) |
| DNA | Database-Based | k-mer | MetaOthello | 2018 | The database index download link provided by the author has expired; build the index yourself | source code; source code (GitHub) |
| DNA | Database-Based | k-mer | PathSeq | 2018 | Use Git LFS to download the pre-built index from the software's GitHub page | source code |
| DNA | Database-Based | k-mer | taxMaps | 2018 | Index (FTP) | source code |
| Protein | Database-Based | k-mer | DIAMOND | 2015 | Use existing BLAST databases directly, or build a database by following the provided tutorial | source code; direct installation: conda install -c bioconda -c conda-forge diamond |
| Protein | Database-Based | k-mer | Kaiju | 2016 | nr_177g (subset of the NCBI BLAST nr database containing archaea, bacteria and viruses); nr_euk_204g (like nr, but additionally including fungi and microbial eukaryotes); refseq_ref_116g (protein sequences from archaea and bacteria in NCBI RefSeq representative assemblies, plus viral protein sequences from NCBI RefSeq) | software and document; Web server; source code |
| Protein | Database-Based | k-mer | MMseqs2 | 2017 | Data Resources from the MMseqs2 Family | source code |
| DNA | Database-Based | Marker | MetaPhlAn2 | 2015 | Run the 'metaphlan --install' command to download | source code |
| DNA | Database-Based | Marker | MetaPhlAn3 | 2021 | Run the 'metaphlan --install' command to download | source code |
| DNA | Database-Based | Marker | MetaPhlAn 4 | 2023 | Recommended software download | source code |
| DNA | Database-Based | Marker | mOTUs3 | 2022 | Data Sets | source code |
| Virus | Machine Learning-Based | Traditional | vConTACT | 2019 | Reference database is included with the downloaded software | software and document |
| Virus | Machine Learning-Based | Traditional | VirusTaxo | 2022 | test data | source code |
| Virus | Machine Learning-Based | Deep Learning | Virtifier | 2022 | RefSeq genomes; The CAMI Challenge Dataset; real human gut metagenomes | source code |
| Virus | Machine Learning-Based | Traditional | VirFinder | 2017 | trained model for predicting both prokaryotic and eukaryotic viruses | source code |
| DNA | Machine Learning-Based | Traditional | IDTAXA | 2018 | Training sets for Classification | software and document |
| Virus | Machine Learning-Based | Deep Learning | ViBE | 2022 | pre-trained; BPDR150; BPDR250; DNA150; DNA250; RNA150; RNA250 | source code |
| RNA | Machine Learning-Based | Traditional | RdRpBin | 2022 | reference dataset and taxonomy files | source code |
| DNA | Machine Learning-Based | Deep Learning | BERTax | 2022 | With Git LFS installed, download the model: git clone https://github.com/f-kretschmer/bertax.git | source code |
| DNA | Machine Learning-Based | Deep Learning | DeepMicrobes | 2020 | DeepMicrobes-data | source code |
| Virus | Machine Learning-Based | Deep Learning | CHEER | 2021 | pre-trained model | source code |
| DNA | Machine Learning-Based | Traditional | ML-DSP | 2019 | Database | source code |
| DNA | Database-Based | k-mer | CDKAM | 2020 | A sample of Nanopore MinION sequencing data; Zymo mock dataset | source code |
| DNA | Machine Learning-Based | Traditional | QIIME 2 | 2019 | Silva 138 99% OTUs full-length sequences; Silva 138 99% OTUs from 515F/806R region of sequences; Greengenes2 2022.10 full length sequences; Greengenes2 2022.10 from 515F/806R region of sequences; Weighted Silva 138 99% OTUs full-length sequences; Weighted Greengenes 13_8 99% OTUs full-length sequences; Weighted Greengenes 13_8 99% OTUs from 515F/806R region of sequences | software and document |
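
Most of the database-based tools listed here fall under the k-mer sub-method. As a rough illustration of what that label means, the toy Python sketch below indexes the k-mers of a few reference sequences and assigns a read to the taxon that shares the most k-mers with it. Everything in it is an assumption made for illustration: the function names, the two made-up reference sequences, and the simple majority vote are not taken from any tool above. Real classifiers such as Kraken map each k-mer to the lowest common ancestor of the genomes containing it and rely on far more compact index structures.

```python
from collections import Counter, defaultdict


def kmers(seq, k):
    """Yield every overlapping k-mer of seq (upper-cased)."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(references, k):
    """Map each k-mer to the set of taxa whose reference contains it."""
    index = defaultdict(set)
    for taxon, seq in references.items():
        for kmer in kmers(seq, k):
            index[kmer].add(taxon)
    return index


def classify(read, index, k):
    """Return the taxon with the most k-mer hits, or None if there is
    no hit at all or the top two taxa are tied."""
    votes = Counter()
    for kmer in kmers(read, k):
        for taxon in index.get(kmer, ()):
            votes[taxon] += 1
    if not votes:
        return None
    ranked = votes.most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None  # ambiguous: leave the read unclassified
    return ranked[0][0]


# Hypothetical toy references; real databases hold whole genomes.
REFERENCES = {
    "taxon_A": "ACGTACGTTAGCCGATAGCTAGGCTA",
    "taxon_B": "TTGACCGGTAACCGTTAGGACCATGA",
}

if __name__ == "__main__":
    index = build_index(REFERENCES, k=5)
    # This read shares a couple of 5-mers with taxon_B but far more with taxon_A.
    print(classify("CGTTAGCCGATAG", index, k=5))
```

The vote count tolerates k-mers shared between taxa, which is why the example read is still resolved even though part of it also occurs in the second reference.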