MOE possesses a powerful and flexible facility for multiple sequence and multiple structure alignment of protein chains. A unique feature of MOE's protein alignment tool, MOE-Align, is that it allows mixed structural and non-structured data. The foundations of the alignment procedure are:
In this article we will describe MOE's multiple sequence and multiple structure alignment procedure, its output as well as the user interface of the application.
The input to the alignment process is a collection of protein chains. Typically, these chains are the results of a sequence search of an unstructured database (e.g., PIR) and the results of a search of a structured database (e.g., PDB). The objective of MOE-Align is to produce a single multiple sequence alignment and a simultaneous superposition of all structured chains in the input collection.
Multiple sequence alignment is implemented with a flexible three-stage protocol, based upon pairwise alignment of alignments, comprising the following stages: Pile-Up, Round Robin, and Random Refinement.
The Round Robin and Random Refinement stages, in almost all cases, significantly improve the multiple sequence alignment. However, each stage is optional, which allows, in particular, the refinement of existing alignments. The entire process is controlled from a single graphical interface consisting of the following control panel.

The fundamental sequence alignment algorithm of MOE is capable of producing a pairwise alignment of two sequences given an arbitrary similarity matrix. For sequence data, MOE-Align uses residue identity based matrices (roughly 20x20). For structured data, MOE-Align uses an MxN similarity matrix with each entry populated by a similarity measure of the atomic coordinates. The similarity matrix is populated with values derived from the spatial displacement of residue pairs in a 3D superposition of the structures. This raises a subtle point in that a residue with missing coordinate data cannot be compared to structured residues in the other chain.
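The idea of aligning two chains from an arbitrary MxN similarity matrix can be sketched with a standard global dynamic-programming alignment. This is a minimal illustration and not MOE's implementation: the names `align_pair` and `identity_matrix`, the linear gap penalty, and the match/mismatch scores are all assumptions made for the example.

```python
from typing import List, Sequence, Tuple


def identity_matrix(a: str, b: str, match: float = 1.0,
                    mismatch: float = -1.0) -> List[List[float]]:
    """Toy residue-identity similarity matrix (len(a) x len(b)); scores are assumed."""
    return [[match if x == y else mismatch for y in b] for x in a]


def align_pair(sim: Sequence[Sequence[float]],
               gap: float = -1.0) -> Tuple[float, List[Tuple[int, int]]]:
    """Global alignment driven only by an MxN similarity matrix.

    Returns the optimal score and the list of aligned (i, j) residue index pairs.
    A linear gap penalty is assumed for simplicity.
    """
    m = len(sim)
    n = len(sim[0]) if m else 0
    # score[i][j]: best score for the first i residues of A vs. the first j of B
    score = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        score[i][0] = i * gap
    for j in range(1, n + 1):
        score[0][j] = j * gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            score[i][j] = max(score[i - 1][j - 1] + sim[i - 1][j - 1],  # match
                              score[i - 1][j] + gap,                    # gap in B
                              score[i][j - 1] + gap)                    # gap in A
    # Trace back from the corner to recover the aligned pairs.
    pairs: List[Tuple[int, int]] = []
    i, j = m, n
    while i > 0 and j > 0:
        if score[i][j] == score[i - 1][j - 1] + sim[i - 1][j - 1]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return score[m][n], pairs
```

Because `align_pair` sees only the matrix, the same routine serves both cases described above: pass a residue-identity matrix for sequence data, or a matrix derived from atomic displacements for structured data.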
In light of this problem, MOE-Align classifies each input chain into one of three classes:
Given such a partition of the input, it is necessary to use separate metrics (depending on the classes) to align each pair of chains. MOE-Align will use a 3D structure similarity matrix only if both chains have been classified as Complete Structured, and a residue identity similarity matrix in all other cases.
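The selection rule can be illustrated with a short sketch. Everything named here is hypothetical: the `Chain` container, the class labels returned by `classify` (in particular the label "sequence" for a chain with no coordinates), the scale `d0`, and the formula that converts a displacement into a similarity. The structural branch also assumes the two chains have already been superposed.

```python
import math
from typing import List, Optional, Sequence, Tuple

Coord = Tuple[float, float, float]


class Chain:
    """Residue letters plus one optional coordinate per residue (None = missing)."""

    def __init__(self, residues: str,
                 coords: Optional[Sequence[Optional[Coord]]] = None):
        self.residues = residues
        self.coords = list(coords) if coords is not None else [None] * len(residues)


def classify(chain: Chain) -> str:
    """Return 'complete', 'incomplete' or 'sequence' (labels are assumptions)."""
    known = sum(c is not None for c in chain.coords)
    if known == 0:
        return "sequence"
    return "complete" if known == len(chain.coords) else "incomplete"


def similarity_matrix(a: Chain, b: Chain, d0: float = 3.0) -> List[List[float]]:
    """Pick the metric by class: structural only if both chains are complete."""
    if classify(a) == "complete" and classify(b) == "complete":
        # Assumed form: similarity decays with the displacement of the residue
        # pair in the current superposition (1.0 at zero distance).
        return [[1.0 / (1.0 + (math.dist(p, q) / d0) ** 2) for q in b.coords]
                for p in a.coords]
    # Fallback for every other pairing: plain residue identity.
    return [[1.0 if x == y else 0.0 for y in b.residues] for x in a.residues]
```

A chain with even one missing coordinate falls through to the identity branch, which mirrors the point made earlier that a residue without coordinate data cannot be compared structurally.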
If two or more Complete Structured chains are present, MOE-Align performs a complete structural subalignment before proceeding with the alignment of the other chains. In the subsequent alignments the structured data will always be treated as a single block, effectively reducing the problem to a situation in which there is at most one Complete Structured chain (or alignment). Thus, we obtain the following alignment protocol:
The Structural Refinement stage starts from an initial alignment: the sequence-only multiple sequence alignment produced by the first steps of the above procedure. The Structural Refinement stage is a Round Robin and Random Refinement protocol, except that an MxN structural similarity matrix is used. In each of these stages, the Complete Structured chains are partitioned into two sets in preparation for a pairwise alignment of alignments. This pairwise alignment proceeds as follows.
This procedure is iterated, and MOE-Align successively refines a (possibly) non-optimal initial alignment (e.g., one generated by a multiple sequence alignment) for the Complete Structured chains.
The procedures so far are sufficient to produce a multiple sequence alignment of mixed structured and unstructured protein chains. In addition, the Complete Structured chains will have been superposed. However, one detail remains: the superposition of the Incomplete Structured coordinates. For each Incomplete Structured chain in the input collection, MOE-Align performs the following:
In this way a collection of mixed sequence and structural data can be simultaneously aligned and superposed in such a way that the structural data affects the sequence alignment and vice versa.
The output of the entire procedure is written to the MOE terminal window. The report consists of the global RMSD of the superposed chains and a pairwise assessment of similarity and alignment quality. As an example, we take structures from the Cytochrome C family with PDB codes 256B, 1RCP, 2CCY, and 1BBH.
To form a basis for comparison, we disable the structured alignment components of MOE-Align and apply the procedure (Pile-Up, Round Robin, and Random Refinement) using residue identities only. This produces the output:
pro_Align score (sum of pairs): 5632.0 (pileup)
pro_Align score (sum of pairs): 5680.0 (round robin)
pro_Align score (sum of pairs): 5681.0 (shuffle #8)
pro_Align score (sum of pairs): 5694.0 (shuffle #9)
pro_Align: pairwise percentage residue identity
          256B.A  1RCP.A  2CCY.A  1BBH.A
256B.A:    100.0    28.3    22.6    31.1
1RCP.A:     23.3   100.0    25.6    24.8
2CCY.A:     18.9    26.0   100.0    24.4
1BBH.A:     25.2    24.4    23.7   100.0
256B.A : ADL--EDNME T----L-ND- -NL-KV---I EKADNAAQVK DA-LTKMRAA
1RCP.A : ADT--KEVLE ARE-AY-FK- -SLGGS---M KAMTGVAKAF DAEAAKVEAA
2CCY.A : QSK-PEDLLK LRQ-GL-MQ- -TL-KSQW-V PIAGFAAGKA DL-PADAAQR
1BBH.A : AGLSPEEQIE TRQAGYEFMG WNMGKIKANL EGEYNAAQV- EA-AANVIAA
256B.A : ALDAQKAT-- -P---PK-LE --D-K-SPDS PE------MK DFRHGFDILV
1RCP.A : KLEKILATDV AP-LFPAGTS STDLP-GQTE AKAAIWANMD DFGAKGKAMH
2CCY.A : AENMAMVAKL APIGWAKGTE --ALPNGETK PE-AFGSKSA EFLEGWKALA
1BBH.A : IANSGMGALY GP-GTDKNVG --DVK-TRVK PE--FFQNME DVGKIAREFV
256B.A : GQIDDALKLA NEGKVKEAQA AAEQLKTTRN AYHQKYR---
1RCP.A : EAGGAVIAAA NAGDGAAFGA ALQKLGGTCK ACHDDYREED
2CCY.A : TESTKLAAAA KAGP-DALKA QAAATGKVCK ACHEEFK-QD
1BBH.A : GAANTLAEVA ATGEAEAVKT AFGDVGAACK SCHEKYR-AK
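The identity table above is not symmetric (for example, 28.3 for 256B.A against 1RCP.A but 23.3 the other way round). That is consistent with each row being normalized by the length of that row's chain, although this reading is an inference from the numbers rather than documented behaviour. A minimal sketch under that assumption, with the hypothetical function name `percent_identity`:

```python
def percent_identity(row_seq: str, col_seq: str, gap: str = "-") -> float:
    """Percentage of residues in row_seq aligned to an identical residue in col_seq.

    Both arguments are rows of the same alignment (equal length, gaps as '-').
    Normalizing by the ungapped length of row_seq is an assumption.
    """
    # Drop the cosmetic spaces that separate blocks of ten columns.
    row_seq = row_seq.replace(" ", "")
    col_seq = col_seq.replace(" ", "")
    if len(row_seq) != len(col_seq):
        raise ValueError("aligned rows must have the same number of columns")
    identical = sum(1 for x, y in zip(row_seq, col_seq) if x == y and x != gap)
    row_len = sum(1 for x in row_seq if x != gap)
    return 100.0 * identical / row_len if row_len else 0.0
```

Calling the function with the arguments swapped gives the mirror entry of the table; the two values differ whenever the chains have different lengths.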
When the structural alignment components are enabled in order to refine the above alignment, MOE-Align produces the following output.
pro_Align: pairwise percentage residue identity
          256B.A  1RCP.A  2CCY.A  1BBH.A
256B.A:    100.0    20.8    16.0    21.7
1RCP.A:     17.1   100.0    19.4    22.5
2CCY.A:     13.4    19.7   100.0    18.1
1BBH.A:     17.6    22.1    17.6   100.0
pro_Align global RMSD: 6.463
pro_Align global RMSD: 3.092
pro_Align global RMSD: 2.936
pro_Align global RMSD: 2.738
pro_Align global RMSD: 2.685
Pairwise RMSD
- upper triangle under optimal pairwise superposition
- lower triangle under optimal global superposition
          256B.A  1RCP.A  2CCY.A  1BBH.A
256B.A:    0.000   3.497   3.218   3.255
1RCP.A:    3.506   0.000   2.555   2.111
2CCY.A:    3.219   2.570   0.000   2.309
1BBH.A:    3.258   2.190   2.335   0.000
256B.A : ---------- ADLEDNMETL NDNLKVIEKA -----DNAAQ VKDALTKMRA
1RCP.A : --ADTK-EVL EAREAYFKSL GGSMKAMTGV AK--AFDAEA AKVEAAKLEK
2CCY.A : --QSKPEDLL KLRQGLMQTL KSQWVPIAGF AAGKADLPAD AAQRAENMAM
1BBH.A : AGLSPE-EQI ETRQAGYEFM GWNMGKIKAN L-EGEYNAAQ VEAAANVIAA
256B.A : AALDAQK-AT PPKLED---- ------KSPD SPEMKDFRHG FDILVGQIDD
1RCP.A : ILATDVAPLF PAGTSSTDLP GQTEA-KAAI WANMDDFGAK GKAMHEAGGA
2CCY.A : VAKLAPIGWA KGTEA----L PNGETKPEAF GSKSAEFLEG WKALATESTK
1BBH.A : IANSGMGALY GPGTDKNVGD VKTRVKPEFF Q-NMEDVGKI AREFVGAANT
256B.A : ALKLANEGKV KEAQAAAEQL KTTRNAYHQK YR---
1RCP.A : VIAAANAGDG AAFGAALQKL GGTCKACHDD YREED
2CCY.A : LAAAAKA-GP DALKAQAAAT GKVCKACHEE FKQD-
1BBH.A : LAEVAATGEA EAVKTAFGDV GAACKSCHEK YRAK-
Notice that the global RMSD of all of the structures falls from 6.463 (when the pre-structural alignment is used) to 2.685 after structural refinement. Also notice that the gaps at positions 20 and 50 have been cleaned up.
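For readers who want to reproduce figures of this kind, the sketch below computes an RMSD over the aligned residue pairs of chains that have already been superposed. It is an illustration only: the function names are hypothetical, and the article does not state how MOE-Align pools chains into its global RMSD, so `pooled_rmsd` simply pools the squared displacements over all chain pairs as one plausible convention.

```python
import math
from itertools import combinations
from typing import List, Optional, Sequence, Tuple

Coord = Tuple[float, float, float]
Column = Optional[Coord]  # None marks a gap or missing coordinates


def _squared_displacements(a: Sequence[Column], b: Sequence[Column]) -> List[float]:
    """Squared distances for alignment columns where both chains have coordinates."""
    return [math.dist(p, q) ** 2 for p, q in zip(a, b)
            if p is not None and q is not None]


def pair_rmsd(a: Sequence[Column], b: Sequence[Column]) -> float:
    """RMSD of two superposed chains over their aligned residue pairs."""
    sq = _squared_displacements(a, b)
    return math.sqrt(sum(sq) / len(sq)) if sq else float("nan")


def pooled_rmsd(chains: Sequence[Sequence[Column]]) -> float:
    """Assumed 'global' RMSD: pool aligned pairs from every pair of chains."""
    sq: List[float] = []
    for a, b in combinations(chains, 2):
        sq.extend(_squared_displacements(a, b))
    return math.sqrt(sum(sq) / len(sq)) if sq else float("nan")
```

Each chain is given as one entry per alignment column, so gapped columns drop out of the sum automatically. Evaluating `pair_rmsd` under a pairwise-optimal superposition and under the single global superposition would correspond to the upper and lower triangles of the table, respectively.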