Using PASTA
PASTA download: https://github.com/smirarab/pasta
PASTA currently provides only Linux and Mac versions and cannot be used on Windows.
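1. Installing from source on Linux:
Prerequisites:
Python (usually preinstalled on the system)
Java (already installed on most machines)
DendroPy (https://pythonhosted.org/DendroPy/)
DendroPy is a Python library for phylogenetic computing. It provides classes and methods for simulating, processing, and manipulating phylogenetic trees, and it reads and writes many phylogenetic data formats, such as NEXUS, NEWICK, NeXML, Phylip, and FASTA.
DendroPy package download:
http://pypi.python.org/packages/source/D/DendroPy/DendroPy-3.12.0.tar.gz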
$ tar -zxvf DendroPy-3.12.0.tar.gz
$ cd DendroPy-3.12.0
$ sudo python setup.py install
Next, install PASTA itself.
$ mkdir pasta-code    # create a working directory
Copy the downloaded PASTA archive (https://github.com/smirarab/pasta/archive/master.zip) into this directory.
$ unzip pasta-master.zip    # extract the archive
Then download the SATe tools (https://github.com/smirarab/sate-tools-linux/archive/master.zip), put the archive in the pasta-code directory, and extract it:
$ unzip sate-tools-linux-master.zip
Then move the extracted sate-tools directory into pasta-master:
$ mv sate-tools-linux-master pasta-master/sate-tools-linux
Then cd into pasta-master and run:
$ python setup.py install
PASTA is now essentially installed.
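As a quick sanity check after installation, the sketch below tests whether the prerequisites can be found. It assumes Python 3, that the Python packages are importable as dendropy and pasta, and that Java is on the PATH as java. The function name check_prerequisites is made up for this example.

```python
import importlib.util
import shutil


def check_prerequisites(modules=("dendropy", "pasta"), executables=("java",)):
    """Return a dict mapping each prerequisite name to True (found) or False."""
    status = {}
    for name in modules:
        # find_spec returns None when a top-level module cannot be located.
        status[name] = importlib.util.find_spec(name) is not None
    for name in executables:
        # shutil.which returns None when the program is not on PATH.
        status[name] = shutil.which(name) is not None
    return status


if __name__ == "__main__":
    for name, found in check_prerequisites().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

A MISSING entry only means the name was not found under the assumed module or program name; it does not by itself prove the installation failed.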
2. Input data
The data used here are the sample files in the data directory of the PASTA package:
data/anolis.fasta and data/anolis.tre
Format of data/anolis.fasta:
>Anolisahli
ATGAGCCCAATAATATACACAATTATACTATCAAGCCTAGCAACAGGCACTATCGTTACC
ATAACGAGCTACCACTGACTCCTAGCCTGAATCGGACTAGAAATAAACACTTTATCAATTATTCCAATTATTTCTACCATACACCACCCACGATCAACAGAGGCCGCCACAAAATACT
>Anolisaliniger
ATGAGCCCTACAGTTTATTCAATTATTTTGTCAAGCCTACCAACAGGCACAGTTATTACT
ATAACCAGCTACCATTGATTAATAGCCTGAGTCGGGCTAGAAATTAACACACTCGCAATTATTCCTGTTGTTTCAATACAACATCACCCACGGTCCACAGAAGCCGCCACAAAATATT
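To make the FASTA layout above concrete, here is a minimal reader written for this tutorial (it is not part of PASTA). It assumes each record starts with a ">" line holding the name and that the sequence may be spread over several lines. The function name parse_fasta is hypothetical.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict of {name: sequence}."""
    chunks = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A header line starts a new record.
            name = line[1:].strip()
            chunks[name] = []
        elif name is not None:
            # Sequence lines are appended to the current record.
            chunks[name].append(line)
    # Join the wrapped lines of each record into one sequence string.
    return {n: "".join(parts) for n, parts in chunks.items()}


if __name__ == "__main__":
    # Shortened sample in the same layout as data/anolis.fasta.
    sample = ">Anolisahli\nATGAGC\nCCAATA\n>Anolisaliniger\nATGAGC\nCCTACA\n"
    for seq_name, seq in parse_fasta(sample).items():
        print(seq_name, len(seq))
```

For real work a library reader such as DendroPy is a better choice; this sketch only illustrates the format.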
Format of data/anolis.tre:
((((Anolisaliniger:0.1806,Anolisbahorucoensis:0.16601):0.07853,Anolisbarahonae:0.22174):0.02313,Anolisahli:0.27636):0.05616,Anolisalutaceus:0.22857,Anolisangusticeps:0.19573);
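The tree above is in Newick format: nested parentheses, with each leaf written as name:branch_length. The sketch below pulls the leaf names out of such a string with a regular expression. It assumes simple unquoted labels like the ones shown here, and the function name newick_leaf_names is made up for this example.

```python
import re


def newick_leaf_names(newick):
    """Return the leaf labels of a simple Newick string, in order of appearance."""
    # A leaf label directly follows "(" or ","; anything after ")" belongs
    # to an internal node and is therefore not captured.
    return re.findall(r"[(,]\s*([^(),:;\s]+)", newick)


if __name__ == "__main__":
    tree = "((A:0.1,B:0.2):0.3,C:0.4);"
    print(newick_leaf_names(tree))
```

One practical use is to check that every leaf name in the starting tree also appears as a sequence name in the FASTA file before running the alignment.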
The command to run is:
$ python run_pasta.py -i data/anolis.fasta -t data/anolis.tre --auto
Here anolis.tre is the starting tree. If no starting tree is given, the program estimates one as described in the paper.
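The procedure is summarized as follows:
1. Randomly select a subset of the sequences (of size 100).
2. Align the subset with MAFFT-linsi.
3. Build an HMMER model from the subset alignment.
4. Using the subset alignment as a backbone, align each of the remaining sequences to it in turn.
5. Build a starting tree from the resulting alignment with FastTree.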
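Only step 1 of the list above can be shown without the external tools, so the sketch below covers just that step: splitting the sequence names into a random backbone subset and the remaining sequences. It is an illustration, not PASTA's own code. The function name split_backbone and the seed parameter are assumptions made for this example, and the sequence names are assumed to be unique.

```python
import random


def split_backbone(names, backbone_size=100, seed=None):
    """Split sequence names into (backbone, remaining) by random sampling."""
    rng = random.Random(seed)  # a seed makes the choice reproducible
    names = list(names)
    if len(names) <= backbone_size:
        # Too few sequences: everything goes into the backbone.
        return names, []
    backbone = rng.sample(names, backbone_size)
    chosen = set(backbone)
    # Keep the remaining names in their original order.
    remaining = [n for n in names if n not in chosen]
    return backbone, remaining


if __name__ == "__main__":
    all_names = [f"seq{i}" for i in range(250)]
    backbone, remaining = split_backbone(all_names, backbone_size=100, seed=1)
    print(len(backbone), len(remaining))
```

The backbone would then be aligned with MAFFT-linsi, and the remaining sequences would be added to it one at a time (steps 2 to 4).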
3. Output
The output is shown below, with explanations (the text in parentheses is annotation added to the log):
SATe INFO: Reading input sequences from 'data/anolis.fasta'... (reads the input FASTA file)
SATe INFO: Configuration written to "/home/maoyaozong/pasta-code/pasta-master/data/satejob3_temp_sate_config.txt". (the SATe configuration is written to satejob3_temp_sate_config.txt)
SATe INFO: Directory for temporary files created at /home/maoyaozong/.sate/satejob/tempkaSq1Y (creates a directory for temporary files)
SATe INFO: Reading input sequences from 'data/anolis.fasta'...
SATe INFO: Reading starting trees from "data/anolis.tre"... (reads the starting tree file)
SATe INFO: Name translation information saved to /home/maoyaozong/pasta-code/pasta-master/data/satejob3_temp_name_translation.txt as safe name, original name, blank line format. (name translation information)
SATe INFO: Starting SATe algorithm on initial tree... (starts the SATe algorithm on the starting tree)
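SATe INFO: Max subproblem set to 3 (the maximum subproblem size, i.e. the largest subset of sequences aligned at once, is 3; this run performs 3 iterations, Step 0 to Step 2)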
SATe INFO: Step 0. Realigning with decomposition strategy set to centroid
SATe INFO: Step 0. Alignment obtained. Tree inference beginning...
SATe INFO: realignment accepted and score improved.
SATe INFO: current score: -5779.3102, best score: -5779.3102 (score after iteration 1)
SATe INFO: Step 1. Realigning with decomposition strategy set to centroid
SATe INFO: Step 1. Alignment obtained. Tree inference beginning...
SATe INFO: realignment accepted and score improved.
SATe INFO: current score: -5779.302, best score: -5779.302 (score after iteration 2)
SATe INFO: Step 2. Realigning with decomposition strategy set to centroid
SATe INFO: Step 2. Alignment obtained. Tree inference beginning...
SATe INFO: realignment accepted and despite the score not improving.
SATe INFO: current score: -5779.3124, best score: -5779.302 (score after iteration 3)
SATe INFO: Writing resulting alignment to /home/maoyaozong/pasta-code/pasta-master/data/satejob3.marker001.anolis.aln (writes the resulting alignment to a file)
SATe INFO: Writing resulting tree to /home/maoyaozong/pasta-code/pasta-master/data/satejob3.tre (writes the resulting tree to a file)
SATe INFO: Writing resulting likelihood score to /home/maoyaozong/pasta-code/pasta-master/data/satejob3.score.txt (writes the likelihood score to a file)
SATe INFO: The resulting alignment (with the names in a "safe" form) was first written as the file "/home/maoyaozong/pasta-code/pasta-master/data/satejob3_temp_iteration_2_seq_alignment.txt"
SATe INFO: The resulting tree (with the names in a "safe" form) was first written as the file "/home/maoyaozong/pasta-code/pasta-master/data/satejob3_temp_iteration_2_tree.tre"
SATe INFO: Total time spent: 25.2676310539s (total running time)
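To follow how the score changes from one iteration to the next, the "current score / best score" messages can be pulled out of the log. The sketch below assumes the message wording shown in the log above and collapses whitespace first, so it also works when a message is wrapped across lines. The names SCORE_RE and parse_scores are made up for this example.

```python
import re

# Matches e.g. "current score: -5779.3102, best score: -5779.3102"
SCORE_RE = re.compile(
    r"current score:\s*(-?\d+(?:\.\d+)?),\s*best score:\s*(-?\d+(?:\.\d+)?)"
)


def parse_scores(log_text):
    """Return a list of (current, best) score pairs, one per iteration."""
    # Collapse all whitespace so messages split across lines still match.
    flat = " ".join(log_text.split())
    return [(float(cur), float(best)) for cur, best in SCORE_RE.findall(flat)]


if __name__ == "__main__":
    log = (
        "SATe INFO: current score:\n-5779.3102, best score: -5779.3102\n"
        "SATe INFO: current score: -5779.302, best score: -5779.302\n"
        "SATe INFO: current score: -5779.3124, best score: -5779.302\n"
    )
    for step, (cur, best) in enumerate(parse_scores(log)):
        print(f"step {step}: current={cur} best={best}")
```

In the run above, the third iteration scores -5779.3124, which is lower than the best score of -5779.302. That is why the log reports that the realignment was accepted despite the score not improving.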
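Each iteration produces 3 files; with 3 iterations here, that makes 3*3 = 9 files.
In addition, the run produces:
Configuration file: satejob3_temp_sate_config.txt
Alignment result file: satejob3.marker001.anolis.aln
Tree file: satejob3.tre
Likelihood score file: satejob3.score.txt
Name translation file: satejob3_temp_name_translation.txt
Program output file: satejob3.out.txt
That is 9 + 6 = 15 files in total.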