Phylogenetic Tree

Hadoop Phylogenetic Tree

Hadoop Phylogenetic Tree is a package of multi-platform Java software tools for constructing phylogenetic trees from large-scale multiple sequence alignment output of similar DNA/RNA sequences.

Because tree construction is essentially a clustering process, Hadoop Phylogenetic Tree uses a one-pass clustering algorithm as its basic preprocessing step for grouping the data. After preprocessing, the input sequences are partitioned into several clusters.
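To make the preprocessing step concrete, here is a minimal single-machine sketch of one-pass (leader-style) clustering in Java. It is an illustration under stated assumptions, not the package's actual code: the class name OnePassClustering, the use of p-distance on already aligned, equal-length sequences, the choice of the first member as cluster representative, and the threshold parameter are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of one-pass clustering; not the tool's real implementation.
final class OnePassClustering {

    // Assumed distance measure: fraction of positions at which two aligned,
    // equal-length sequences differ (p-distance).
    static double pDistance(String a, String b) {
        if (a.length() != b.length()) {
            throw new IllegalArgumentException("sequences must be aligned to the same length");
        }
        if (a.isEmpty()) {
            return 0.0;
        }
        int diff = 0;
        for (int i = 0; i < a.length(); i++) {
            if (a.charAt(i) != b.charAt(i)) {
                diff++;
            }
        }
        return (double) diff / a.length();
    }

    // Reads each sequence exactly once. A sequence joins the cluster whose
    // representative (assumed here: its first member) is nearest, provided that
    // distance is within the threshold; otherwise it starts a new cluster.
    static List<List<String>> cluster(List<String> sequences, double threshold) {
        List<List<String>> clusters = new ArrayList<>();
        for (String seq : sequences) {
            List<String> best = null;
            double bestDist = Double.MAX_VALUE;
            for (List<String> c : clusters) {
                double d = pDistance(seq, c.get(0));
                if (d < bestDist) {
                    bestDist = d;
                    best = c;
                }
            }
            if (best != null && bestDist <= threshold) {
                best.add(seq);
            } else {
                List<String> fresh = new ArrayList<>();
                fresh.add(seq);
                clusters.add(fresh);
            }
        }
        return clusters;
    }
}
```

For example, calling OnePassClustering.cluster with the sequences AAAA, AAAT, GGGG, GGGC and a threshold of 0.3 yields two clusters of two sequences each. Because every sequence is compared only against cluster representatives, the result depends on input order, which is the usual trade-off of a one-pass scheme.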

To speed up the computation, we deploy the algorithm on Hadoop clusters. After preprocessing, the Neighbor Joining algorithm is run on each cluster to construct a subtree. Once the subtrees are complete, the root of each subtree is chosen as its representative for constructing the final tree, which is also built with the Neighbor Joining algorithm. The input file can be either a sequence file that has not been through multiple sequence alignment (MSA) or the output file of an MSA. The tool runs on any OS with a JVM.
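The Neighbor Joining step used for both the subtrees and the final tree can be sketched as follows. This is a plain, single-machine textbook version written for illustration, not the package's Hadoop code. The class name NeighborJoining, the Newick output, the three-decimal branch lengths, and the choice to split the last remaining edge at its midpoint are assumptions of this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of textbook Neighbor Joining; not the tool's real implementation.
final class NeighborJoining {

    // names[i] labels row/column i of the symmetric distance matrix dist
    // (zero diagonal, at least one taxon). Returns a Newick string.
    static String build(String[] names, double[][] dist) {
        int total = names.length;
        if (total == 1) {
            return names[0] + ";";
        }
        int cap = 2 * total;                 // room for leaves plus internal nodes
        double[][] d = new double[cap][cap];
        String[] label = new String[cap];
        List<Integer> active = new ArrayList<>();
        for (int i = 0; i < total; i++) {
            label[i] = names[i];
            active.add(i);
            System.arraycopy(dist[i], 0, d[i], 0, total);
        }
        int next = total;                    // index of the next internal node
        while (active.size() > 2) {
            int n = active.size();
            // r[a] = sum of distances from node a to all other active nodes
            double[] r = new double[cap];
            for (int a : active) {
                for (int b : active) {
                    r[a] += d[a][b];
                }
            }
            // Pick the pair minimising Q(a,b) = (n-2)*d(a,b) - r(a) - r(b)
            int bi = -1, bj = -1;
            double bestQ = Double.POSITIVE_INFINITY;
            for (int x = 0; x < n; x++) {
                for (int y = x + 1; y < n; y++) {
                    int a = active.get(x), b = active.get(y);
                    double q = (n - 2) * d[a][b] - r[a] - r[b];
                    if (q < bestQ) {
                        bestQ = q;
                        bi = a;
                        bj = b;
                    }
                }
            }
            // Branch lengths from the joined pair to their new parent node u
            double li = 0.5 * d[bi][bj] + (r[bi] - r[bj]) / (2.0 * (n - 2));
            double lj = d[bi][bj] - li;
            int u = next++;
            // Distances from u to every remaining node
            for (int k : active) {
                if (k == bi || k == bj) {
                    continue;
                }
                d[u][k] = d[k][u] = 0.5 * (d[bi][k] + d[bj][k] - d[bi][bj]);
            }
            label[u] = String.format(Locale.ROOT, "(%s:%.3f,%s:%.3f)",
                    label[bi], li, label[bj], lj);
            active.remove(Integer.valueOf(bi));
            active.remove(Integer.valueOf(bj));
            active.add(u);
        }
        // Two nodes remain; splitting their edge at the midpoint is an assumption
        // of this sketch, since Neighbor Joining itself yields an unrooted tree.
        int a = active.get(0), b = active.get(1);
        double half = d[a][b] / 2.0;
        return String.format(Locale.ROOT, "(%s:%.3f,%s:%.3f);",
                label[a], half, label[b], half);
    }
}
```

In a two-level scheme like the one described above, a routine of this kind would be called once per cluster on that cluster's distance matrix, and once more on the distances between the chosen representatives. How the package distributes these calls across Hadoop tasks is not shown here.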

Datasets

mitochondrial genomes

Ref: Tanaka M., et al. (2004) Mitochondrial genome variation in eastern Asia and the peopling of Japan. Genome Res, 14(10a), 1832-1850
Download (zipped file): 1x (219KB) 20x (4.27MB) 50x (10.666MB) 100x (21.325MB)

16S rRNA

Ref: DeSantis, T. Z., et al. (2006) NAST: a multiple sequence alignment server for comparative analysis of 16S rRNA genes. Nucleic Acids Res, 34, W394-399
Download (zipped file): small (21.864MB) big (197.224MB)