Using the PASTA program

PASTA download: https://github.com/smirarab/pasta

PASTA is currently available only for Linux and Mac; it cannot be used on Windows.

 

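1. Installing the Linux version from source

The following need to be installed:

Python (usually preinstalled on most systems)

Java (most people already have it installed)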

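DendroPy (https://pythonhosted.org/DendroPy/)

DendroPy is a Python library for phylogenetic computing. It provides classes and methods for simulating, processing, and manipulating phylogenetic trees, and it reads and writes phylogenetic data in many formats, such as NEXUS, NEWICK, NeXML, Phylip, and FASTA.

DendroPy package download: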

http://pypi.python.org/packages/source/D/DendroPy/DendroPy-3.12.0.tar.gz

$ tar -zxvf DendroPy-3.12.0.tar.gz

$ cd DendroPy-3.12.0

$ sudo python setup.py install

 

Next, install PASTA itself.

$ mkdir pasta-code  (create a folder)

Copy the downloaded PASTA archive (download: https://github.com/smirarab/pasta/archive/master.zip) into this folder

$ unzip pasta-master.zip  (unpack the archive)

Then download the SATe tools (download: https://github.com/smirarab/sate-tools-linux/archive/master.zip), put the archive in the pasta-code folder, and unpack it

$ unzip sate-tools-linux-master.zip

Then move the SATe tools into the pasta-master folder

$ mv sate-tools-linux-master pasta-master/sate-tools-linux 

Then change into the pasta-master folder and run:

$ python setup.py install    

PASTA is now essentially installed.

 

2. Input data

The data used here is the sample data shipped in the data directory of the PASTA package.

data/anolis.fasta    data/anolis.tre

 

Format of data/anolis.fasta:

>Anolisahli

ATGAGCCCAATAATATACACAATTATACTATCAAGCCTAGCAACAGGCACTATCGTTACC

ATAACGAGCTACCACTGACTCCTAGCCTGAATCGGACTAGAAATAAACACTTTATCAATTATTCCAATTATTTCTACCATACACCACCCACGATCAACAGAGGCCGCCACAAAATACT

>Anolisaliniger

ATGAGCCCTACAGTTTATTCAATTATTTTGTCAAGCCTACCAACAGGCACAGTTATTACT

ATAACCAGCTACCATTGATTAATAGCCTGAGTCGGGCTAGAAATTAACACACTCGCAATTATTCCTGTTGTTTCAATACAACATCACCCACGGTCCACAGAAGCCGCCACAAAATATT
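To make the format concrete, here is a minimal sketch of a FASTA reader in plain Python. The helper name read_fasta is made up for illustration and is not part of PASTA or DendroPy. It assumes every record starts with a ">" header line and that a sequence may be wrapped over several lines, possibly separated by blank lines, as in the sample above (the sequences below are shortened).

```python
def read_fasta(lines):
    """Parse FASTA text lines into a dict of {name: sequence}.

    Hypothetical helper for illustration only; not part of PASTA.
    """
    records = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines between or inside records
        if line.startswith(">"):
            name = line[1:].strip()  # header line: text after '>' is the name
            records[name] = []
        elif name is not None:
            records[name].append(line)  # one wrapped piece of the sequence
    # Join the wrapped pieces into a single string per record.
    return {n: "".join(parts) for n, parts in records.items()}


# Shortened versions of the first two records of data/anolis.fasta.
example = """>Anolisahli
ATGAGCCCAATAATATACACAATT

ATAACGAGCTACCACTGACTCCTA
>Anolisaliniger
ATGAGCCCTACAGTTTATTCAATT
"""

seqs = read_fasta(example.splitlines())
for name, seq in seqs.items():
    print(name, len(seq))
```

For real analyses a tested parser (for example the one in DendroPy) is probably the safer choice; the sketch only shows how names and sequences are laid out.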

 

Format of data/anolis.tre:

((((Anolisaliniger:0.1806,Anolisbahorucoensis:0.16601):0.07853,Anolisbarahonae:0.22174):0.02313,Anolisahli:0.27636):0.05616,Anolisalutaceus:0.22857,Anolisangusticeps:0.19573);
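The tree above is in Newick format: nested parentheses group the taxa, and the number after each colon is a branch length. As a rough sketch, the leaf names can be pulled out with a regular expression. The function newick_leaf_names is a hypothetical helper, and it assumes unquoted labels and no labels on internal nodes, which holds for this example but not for Newick files in general.

```python
import re


def newick_leaf_names(newick):
    """Return the leaf labels of a simple Newick string, in order of appearance.

    Assumes unquoted labels and unlabeled internal nodes (illustration only).
    """
    # A leaf label directly follows '(' or ',' and runs until ':', ',', ')' or ';'.
    return re.findall(r"[(,]\s*([^():,;\s]+)", newick)


tree = (
    "((((Anolisaliniger:0.1806,Anolisbahorucoensis:0.16601):0.07853,"
    "Anolisbarahonae:0.22174):0.02313,Anolisahli:0.27636):0.05616,"
    "Anolisalutaceus:0.22857,Anolisangusticeps:0.19573);"
)

leaves = newick_leaf_names(tree)
print(len(leaves), leaves)
```

The leaf names found this way should match the sequence names in the FASTA file.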

 

The command to run is:

$ python run_pasta.py -i data/anolis.fasta -t data/anolis.tre --auto

Here anolis.tre is the starting tree. If no starting tree is given, the program estimates one as described in the paper.
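If the run is launched from a script, the command line can be assembled as follows. This is only a sketch: build_pasta_command is a hypothetical helper, it uses just the three options that appear in the command above (-i, -t, --auto), and it assumes the script is started from the pasta-master folder.

```python
import subprocess


def build_pasta_command(fasta_path, tree_path=None):
    """Build the argument list for run_pasta.py (hypothetical helper)."""
    cmd = ["python", "run_pasta.py", "-i", fasta_path]
    if tree_path is not None:
        cmd += ["-t", tree_path]  # optional starting tree
    cmd.append("--auto")
    return cmd


cmd = build_pasta_command("data/anolis.fasta", "data/anolis.tre")
print(" ".join(cmd))

# To actually launch PASTA from the pasta-master folder, uncomment:
# subprocess.run(cmd, check=True)
```

Leaving out tree_path drops the -t option, so the program would estimate the starting tree itself.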

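The procedure is summarized as follows:

1. Randomly select a subset of the sequences (of size 100)

2. Align the subset with MAFFT-linsi

3. Build an HMMER model from the aligned subset

4. Using this subset as the backbone, align the remaining sequences to it one by one

5. Run FastTree on the resulting alignment to build the starting tree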
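Step 1 of the list above can be sketched in a few lines of Python. This is not PASTA's own code: split_backbone is a hypothetical helper, the seed parameter is added only to make the example reproducible, and sequence names are assumed to be unique. Steps 2 to 5 call external tools (MAFFT, HMMER, FastTree) and are not shown.

```python
import random


def split_backbone(names, backbone_size=100, seed=None):
    """Randomly pick backbone sequences; return (backbone, remaining).

    Illustration of step 1 only; assumes the names are unique.
    """
    rng = random.Random(seed)
    k = min(backbone_size, len(names))  # small inputs: everything is backbone
    backbone = rng.sample(list(names), k)
    chosen = set(backbone)
    # The rest would later be aligned to the backbone one by one (step 4).
    remaining = [n for n in names if n not in chosen]
    return backbone, remaining


names = ["seq%d" % i for i in range(250)]  # made-up sequence names
backbone, remaining = split_backbone(names, backbone_size=100, seed=0)
print(len(backbone), len(remaining))  # 100 150
```

When the input has 100 sequences or fewer, the whole set becomes the backbone and nothing is left for step 4.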

 

3. Output

The output is shown below, with explanatory notes added to individual lines.

 

SATe INFO: Reading input sequences from 'data/anolis.fasta'...   (reads the input FASTA file)

SATe INFO: Configuration written to "/home/maoyaozong/pasta-code/pasta-master/data/satejob3_temp_sate_config.txt".  (the SATe configuration is written to satejob3_temp_sate_config.txt)

SATe INFO: Directory for temporary files created at /home/maoyaozong/.sate/satejob/tempkaSq1Y  (creates a directory for temporary files)

SATe INFO: Reading input sequences from 'data/anolis.fasta'...

SATe INFO: Reading starting trees from "data/anolis.tre"...  (reads the starting tree file)

SATe INFO: Name translation information saved to /home/maoyaozong/pasta-code/pasta-master/data/satejob3_temp_name_translation.txt as safe name, original name, blank line format.  (name translation information)

SATe INFO: Starting SATe algorithm on initial tree...  (starts the SATe algorithm on the starting tree)

SATe INFO: Max subproblem set to 3      (maximum subproblem size is 3; this run goes on to perform three iterations, Steps 0 to 2)

SATe INFO: Step 0. Realigning with decomposition strategy set to centroid 

SATe INFO: Step 0. Alignment obtained. Tree inference beginning...

SATe INFO: realignment accepted and score improved.

SATe INFO: current score: -5779.3102, best score: -5779.3102  (score after round 1)

SATe INFO: Step 1. Realigning with decomposition strategy set to centroid

SATe INFO: Step 1. Alignment obtained. Tree inference beginning...

SATe INFO: realignment accepted and score improved.

SATe INFO: current score: -5779.302, best score: -5779.302   (score after round 2)

SATe INFO: Step 2. Realigning with decomposition strategy set to centroid

SATe INFO: Step 2. Alignment obtained. Tree inference beginning...

SATe INFO: realignment accepted and despite the score not improving.

SATe INFO: current score: -5779.3124, best score: -5779.302  (score after round 3)

SATe INFO: Writing resulting alignment to /home/maoyaozong/pasta-code/pasta-master/data/satejob3.marker001.anolis.aln  (writes the alignment result to a file)

SATe INFO: Writing resulting tree to /home/maoyaozong/pasta-code/pasta-master/data/satejob3.tre  (writes the resulting tree to a file)

SATe INFO: Writing resulting likelihood score to /home/maoyaozong/pasta-code/pasta-master/data/satejob3.score.txt  (writes the likelihood score to a file)

SATe INFO: The resulting alignment (with the names in a "safe" form) was first written as the file "/home/maoyaozong/pasta-code/pasta-master/data/satejob3_temp_iteration_2_seq_alignment.txt"

SATe INFO: The resulting tree (with the names in a "safe" form) was first written as the file "/home/maoyaozong/pasta-code/pasta-master/data/satejob3_temp_iteration_2_tree.tre"

SATe INFO: Total time spent: 25.2676310539s  (total running time)
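To compare the rounds, the scores can be pulled out of the log with a regular expression. The sketch below is my own illustration, not a PASTA feature: parse_scores is a hypothetical helper, and it assumes the "current score: ..., best score: ..." wording shown in the log above.

```python
import re

# Matches the score lines exactly as they appear in the log above.
SCORE_RE = re.compile(
    r"current score: (-?\d+(?:\.\d+)?), best score: (-?\d+(?:\.\d+)?)"
)


def parse_scores(log_lines):
    """Return a list of (current, best) score pairs, one per iteration."""
    scores = []
    for line in log_lines:
        m = SCORE_RE.search(line)
        if m:
            scores.append((float(m.group(1)), float(m.group(2))))
    return scores


log = [
    "SATe INFO: current score: -5779.3102, best score: -5779.3102",
    "SATe INFO: current score: -5779.302, best score: -5779.302",
    "SATe INFO: current score: -5779.3124, best score: -5779.302",
]

scores = parse_scores(log)
# A higher (less negative) score is better.
best_step = max(range(len(scores)), key=lambda i: scores[i][0])
print(scores)
print("best step:", best_step)  # best step: 1
```

Step 1 has the highest score, which matches the log: the best score stays at -5779.302 after Step 2.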

 

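Each iteration produces 3 files; there are 3 iterations here, so 3*3 = 9 files are produced

One configuration file: satejob3_temp_sate_config.txt

One alignment result file: satejob3.marker001.anolis.aln

One tree file: satejob3.tre

Likelihood score file: satejob3.score.txt

Name translation file: satejob3_temp_name_translation.txt

Program output file: satejob3.out.txt

15 files in total (the 9 per-iteration files plus the 6 files listed above)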
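A quick way to check a finished run is to look for the six named files and redo the count. The sketch below is only an illustration: expected_outputs and missing_outputs are hypothetical helpers, and the job, marker, and dataset parts of the file names are assumed to follow the pattern seen in this one run (satejob3, marker001, anolis).

```python
import os


def expected_outputs(job="satejob3", marker="marker001", dataset="anolis"):
    """List the six non-iteration files named in this write-up.

    The naming pattern is inferred from a single run and may differ elsewhere.
    """
    return [
        job + "_temp_sate_config.txt",            # configuration
        "%s.%s.%s.aln" % (job, marker, dataset),  # final alignment
        job + ".tre",                             # final tree
        job + ".score.txt",                       # likelihood score
        job + "_temp_name_translation.txt",       # name translation
        job + ".out.txt",                         # program output
    ]


def missing_outputs(directory, job="satejob3"):
    """Return the expected files that are not present in the directory."""
    return [
        name for name in expected_outputs(job)
        if not os.path.exists(os.path.join(directory, name))
    ]


iterations, files_per_iteration = 3, 3
total = iterations * files_per_iteration + len(expected_outputs())
print(total)  # 15
```

Calling missing_outputs("data") from the pasta-master folder would then list any of the six files that were not written.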