Hadoop Phylogenetic Tree
Hadoop Phylogenetic Tree is a package of multi-platform Java software tools for constructing phylogenetic trees from large-scale sets of similar DNA/RNA sequences, such as the output of a multiple sequence alignment.
Because tree construction is itself a clustering process, Hadoop Phylogenetic Tree uses a one-pass clustering algorithm as its basic preprocessing step for grouping the data. After preprocessing, the input sequences have been grouped into several clusters.
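The one-pass idea can be sketched as follows. This is a minimal illustration, not the tool's actual implementation: the class name OnePassClustering, the p-distance measure (fraction of mismatching positions), the use of the first member of a cluster as its leader, and the threshold value are all assumptions made for the example, and the sequences are assumed to be aligned to equal length.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of one-pass (leader) clustering; names are illustrative only.
class OnePassClustering {

    // Fraction of positions at which two aligned, equal-length sequences differ.
    static double pDistance(String a, String b) {
        if (a.length() != b.length()) {
            throw new IllegalArgumentException("sequences must be aligned to equal length");
        }
        if (a.isEmpty()) {
            return 0.0;
        }
        int diff = 0;
        for (int i = 0; i < a.length(); i++) {
            if (a.charAt(i) != b.charAt(i)) {
                diff++;
            }
        }
        return (double) diff / a.length();
    }

    // Reads each sequence exactly once. A sequence joins the cluster whose leader
    // (first member) is nearest, provided that distance is within the threshold;
    // otherwise it starts a new cluster. Clusters hold indices into seqs.
    static List<List<Integer>> cluster(List<String> seqs, double threshold) {
        List<List<Integer>> clusters = new ArrayList<>();
        for (int i = 0; i < seqs.size(); i++) {
            int best = -1;
            double bestDist = Double.POSITIVE_INFINITY;
            for (int c = 0; c < clusters.size(); c++) {
                String leader = seqs.get(clusters.get(c).get(0));
                double d = pDistance(seqs.get(i), leader);
                if (d < bestDist) {
                    bestDist = d;
                    best = c;
                }
            }
            if (best >= 0 && bestDist <= threshold) {
                clusters.get(best).add(i);
            } else {
                List<Integer> fresh = new ArrayList<>();
                fresh.add(i);
                clusters.add(fresh);
            }
        }
        return clusters;
    }

    public static void main(String[] args) {
        List<String> seqs = Arrays.asList("ACGTACGT", "ACGTACGA", "TTTTGGGG", "TTTTGGGC");
        System.out.println(cluster(seqs, 0.25)); // prints [[0, 1], [2, 3]]
    }
}
```

Because every sequence is compared only against the current cluster leaders, the data is scanned a single time, which is what makes this kind of preprocessing attractive for large inputs.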
To speed up the computation, the algorithm is deployed on Hadoop clusters. After preprocessing, the Neighbor Joining algorithm is run on each cluster to construct a subtree. Once the subtrees are complete, the root of each subtree is chosen as its representative, and the final phylogenetic tree is built from these representatives, again with the Neighbor Joining algorithm. The input file can be either a file of sequences that have not been through multiple sequence alignment (MSA) or the output file of an MSA. The tool runs on any operating system with a JVM.
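For readers unfamiliar with Neighbor Joining, the sketch below shows the standard algorithm applied to a small distance matrix and printed in Newick format. It is an illustration under stated assumptions, not the package's code: the class name NeighborJoiningSketch, the method build, and the example matrix are hypothetical, and the Hadoop distribution of the work is left out. The same routine could in principle be applied first to each cluster and then to the cluster representatives.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of the standard Neighbor Joining algorithm; names are illustrative only.
class NeighborJoiningSketch {

    // Builds a Newick string from taxon names and a symmetric distance matrix.
    static String build(String[] names, double[][] dist) {
        int n = names.length;
        if (n == 0) throw new IllegalArgumentException("no taxa");
        if (n == 1) return names[0] + ";";
        List<String> labels = new ArrayList<>(Arrays.asList(names));
        double[][] d = new double[n][];
        for (int i = 0; i < n; i++) d[i] = dist[i].clone(); // work on a copy

        while (n > 2) {
            // Row sums of the current distance matrix.
            double[] r = new double[n];
            for (int i = 0; i < n; i++)
                for (int k = 0; k < n; k++) r[i] += d[i][k];

            // Pick the pair (bi, bj) minimising Q(i,j) = (n-2)*d(i,j) - r_i - r_j.
            int bi = 0, bj = 1;
            double bestQ = Double.POSITIVE_INFINITY;
            for (int i = 0; i < n; i++)
                for (int j = i + 1; j < n; j++) {
                    double q = (n - 2) * d[i][j] - r[i] - r[j];
                    if (q < bestQ) { bestQ = q; bi = i; bj = j; }
                }

            // Branch lengths from the two joined nodes to their new parent.
            double dij = d[bi][bj];
            double li = 0.5 * dij + (r[bi] - r[bj]) / (2.0 * (n - 2));
            double lj = dij - li;
            String merged = String.format(Locale.ROOT, "(%s:%.3f,%s:%.3f)",
                    labels.get(bi), li, labels.get(bj), lj);

            // The new parent takes slot bi; update its distance to every other node.
            for (int k = 0; k < n; k++) {
                if (k == bi || k == bj) continue;
                double nd = 0.5 * (d[bi][k] + d[bj][k] - dij);
                d[bi][k] = nd;
                d[k][bi] = nd;
            }
            labels.set(bi, merged);

            // Drop slot bj by moving the last node into it.
            int last = n - 1;
            if (bj != last) {
                for (int k = 0; k < n; k++) {
                    d[bj][k] = d[last][k];
                    d[k][bj] = d[k][last];
                }
                d[bj][bj] = 0.0;
                labels.set(bj, labels.get(last));
            }
            labels.remove(last);
            n--;
        }
        // Connect the two remaining nodes with the remaining distance.
        return String.format(Locale.ROOT, "(%s,%s:%.3f);", labels.get(0), labels.get(1), d[0][1]);
    }

    public static void main(String[] args) {
        String[] names = {"A", "B", "C", "D"};
        double[][] dist = {
            {0, 3, 5, 6},
            {3, 0, 6, 7},
            {5, 6, 0, 3},
            {6, 7, 3, 0}
        };
        System.out.println(build(names, dist));
    }
}
```

The example matrix is additive (it comes from a tree with leaf branches A:1, B:2, C:1, D:2 and an internal branch of length 3), so the branch lengths in the output reproduce that tree exactly.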
Datasets
Mitochondrial genomes
- Ref: Tanaka, M., et al. (2004) Mitochondrial genome variation in eastern Asia and the peopling of Japan. Genome Res, 14(10a), 1832-1850
- Download (zipped file): 1x (219KB) 20x (4.27MB) 50x (10.666MB) 100x (21.325MB)
16S rRNA
- Ref: DeSantis, T. Z., et al. (2006) NAST: a multiple sequence alignment server for comparative analysis of 16S rRNA genes. Nucleic Acids Res, 34, W394-399
- Download (zipped file): small (21.864MB) big (197.224MB)
Copyright © 2015 Dr. Quan Zou. All Rights Reserved.
Last modified on 2016/7/26