WMSA 2

WMSA 2: A multiple DNA/RNA sequence alignment tool implemented with accurate progressive mode and a fast win-win mode combining the center star and progressive strategies.

Multiple sequence alignment is widely used in sequence analysis, for example to identify important sites and to support phylogenetic analysis. Traditional methods, such as progressive alignment, are time-consuming. To address this issue, we introduce StarTree, a novel method that quickly constructs a guide tree by combining sequence clustering and hierarchical clustering. Furthermore, we develop a new heuristic algorithm for detecting similar regions using the FM-index, and we apply k-banded dynamic programming to the profile alignment. We also introduce a win-win alignment algorithm that applies the center star strategy within clusters to speed up the alignment process and then uses the progressive strategy to align the center-aligned profiles, which guarantees the accuracy of the final alignment. We present WMSA 2, which is based on these improvements, and compare its speed and accuracy with those of other popular methods. The results show that the guide tree built by the StarTree clustering method leads to better accuracy than that of PartTree while consuming less time and memory than the UPGMA and mBed methods on datasets with thousands of sequences. On simulated datasets, WMSA 2 consumes less time and memory while ranking at the top in Q and TC scores. On real datasets, WMSA 2 remains more efficient in time and memory and ranks at the top in average sum-of-pairs score. For the alignment of 1 million SARS-CoV-2 genomes, the win-win mode of WMSA 2 significantly reduces the running time compared with the previous version. Source code and data: https://github.com/malabz/WMSA2.
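To make the k-banded idea mentioned above concrete, here is a minimal Python sketch of banded global alignment scoring. It is an illustration only, not the WMSA 2 implementation: it aligns two plain sequences rather than profiles, and the function name `banded_alignment_score` and the scoring values (match, mismatch, gap) are assumptions chosen for the example.

```python
def banded_alignment_score(a, b, k, match=1, mismatch=-1, gap=-2):
    """Global alignment score of a and b using only cells with |i - j| <= k.

    Illustrative sketch: the scoring scheme is an assumption, and real tools
    store only the 2k+1 band cells per row instead of a full-width row.
    """
    n, m = len(a), len(b)
    if abs(n - m) > k:
        raise ValueError("band too narrow to reach the final cell")
    neg = float("-inf")
    # Row 0: only columns 0..k lie inside the band.
    prev = [neg] * (m + 1)
    for j in range(min(m, k) + 1):
        prev[j] = j * gap
    for i in range(1, n + 1):
        cur = [neg] * (m + 1)
        lo, hi = max(0, i - k), min(m, i + k)
        for j in range(lo, hi + 1):
            if j == 0:
                cur[j] = i * gap  # first column: i gaps
                continue
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # Cells outside the band stay at -inf and never win the max.
            cur[j] = max(prev[j - 1] + sub, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev[m]


if __name__ == "__main__":
    print(banded_alignment_score("ACGT", "AGT", k=1))
```

Only about (2k+1) cells per row are evaluated, so the scoring work grows with n·k instead of n·m. The result equals the unbanded score whenever an optimal alignment stays within k of the main diagonal.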

Software

Latest release of WMSA 2:

WMSA2.jar

More releases

Data

Hierarchical tree simulated datasets

Ref: HAlign 3: Fast Multiple Alignment of Ultra-Large Numbers of Similar DNA/RNA Sequences.
Download: (more information)
sars_cov_2_like_diff_similarity.tar.xz
sars_cov_2_like_diff_treelength.tar.xz
mt_like_diff_similarity.tar.xz
mt_like_diff_treelength.tar.xz

Simulated RNA

Download: (more information)
RNA-255
RNA-511
RNA-1023
RNA-2047
RNA-4095
RNA-8191

Ancient human partial mitochondrial DNA

Ref: Ancient DNA Reveals Key Stages in the Formation of Central European Mitochondrial Genetic Diversity. DOI: 10.1126/science.1241844
Download: mt.zip

SARS-CoV-2 genome

Ref: HAlign 3: Fast Multiple Alignment of Ultra-Large Numbers of Similar DNA/RNA Sequences.
Download:
SARS-CoV-2_1020.zip
SARS-CoV-2_1M.tar.xz (for more information, see this link)

Related Tools

Score scripts

MSA tools

For OSX/Linux/Windows

1. Download and install JDK (version >= 11) for your system from here.
2. Download WMSA 2 from releases.

Usages

java -jar WMSA2.jar [-m mode] [-t tree] [-s similarity] -i path -o path

required arguments:
    -i  Input file path (nucleotide sequences in fasta format)
    -o  Output file path

optional arguments:
    -m  alignment mode (default: Pro)
        1. Pro        progressive alignment with the StarTree method
        2. Win        combined center star and progressive alignment with the CluStar method
        3. StarTree   only output the guide tree
    -t  guide tree option for MSA (default: StarTree)
        1. StarTree
        2. UPGMA
    -s  the similarity of the cluster (used in Win mode; default: 0.9)
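As a rough picture of how a cluster similarity value such as `-s` can drive grouping, the sketch below clusters sequences greedily against a threshold. This is not the CluStar method. The k-mer Jaccard similarity, the greedy first-fit rule, and the names `kmer_set`, `jaccard` and `greedy_cluster` are all assumptions made for illustration.

```python
def kmer_set(seq, k=4):
    """Set of all overlapping k-mers in seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def greedy_cluster(seqs, threshold=0.9, k=4):
    """Assign each sequence to the first cluster whose representative is
    at least `threshold` similar; otherwise start a new cluster.

    Returns a list of clusters, each a list of indices into seqs.
    Assumed similarity measure: Jaccard index over k-mer sets.
    """
    reps = []      # k-mer set of each cluster's first member
    clusters = []  # indices of the sequences in each cluster
    for idx, seq in enumerate(seqs):
        ks = kmer_set(seq, k)
        for c, rep in enumerate(reps):
            if jaccard(ks, rep) >= threshold:
                clusters[c].append(idx)
                break
        else:
            reps.append(ks)
            clusters.append([idx])
    return clusters


if __name__ == "__main__":
    print(greedy_cluster(["ACGTACGTAC", "ACGTACGTAC", "TTTTGGGGCC"], 0.9))
```

Under this assumed scheme, a higher threshold yields more, tighter clusters, and a lower threshold yields fewer, looser ones.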

Example

1. Download and uncompress the data.
2. Align the data with WMSA 2.

Use Pro mode to align the data.

java -jar WMSA2.jar -i test.fasta -o test_aligned_WMSA2.fasta

Use Win mode to align the data (the default similarity value is 0.9; this example sets it to 0.95).

java -jar WMSA2.jar -m Win -s 0.95 -i test.fasta -o test_aligned_WMSA2.fasta
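To run the commands above from a script, the following Python sketch assembles the argument list from the options documented in Usages. The helper `build_wmsa2_command` is hypothetical and not part of WMSA 2. It assumes `java` is on the PATH and that `WMSA2.jar` is in the working directory unless another path is given.

```python
def build_wmsa2_command(input_path, output_path, mode="Pro",
                        similarity=None, tree=None, jar="WMSA2.jar"):
    """Return the argument list for a WMSA 2 run (hypothetical helper).

    Uses only the documented flags: -m, -t, -s, -i and -o.
    """
    if mode not in ("Pro", "Win", "StarTree"):
        raise ValueError("mode must be Pro, Win or StarTree")
    if tree is not None and tree not in ("StarTree", "UPGMA"):
        raise ValueError("tree must be StarTree or UPGMA")
    cmd = ["java", "-jar", jar]
    if mode != "Pro":            # Pro is the documented default
        cmd += ["-m", mode]
    if tree is not None:
        cmd += ["-t", tree]
    if similarity is not None:   # documented as used in Win mode
        cmd += ["-s", str(similarity)]
    cmd += ["-i", str(input_path), "-o", str(output_path)]
    return cmd


if __name__ == "__main__":
    cmd = build_wmsa2_command("test.fasta", "test_aligned_WMSA2.fasta",
                              mode="Win", similarity=0.95)
    print(" ".join(cmd))
    # To execute it: import subprocess; subprocess.run(cmd, check=True)
```

Passing the command as a list to `subprocess.run` avoids shell quoting problems with file paths that contain spaces.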

WMSA 2 is free software, licensed under the MIT License.

If you use this software, please cite:

WMSA 2: A multiple DNA/RNA sequence alignment tool implemented with accurate progressive mode and a fast win-win mode combining the center star and progressive strategies (under review).

The software tools are developed and maintained by ZOU's lab. If you find a bug, please report it on the issues page. More tools and information are available on our GitHub.
