Intro to multiple sequence alignments

by Aidan Budd

Presentation

Making an MSA is Easy?!

The act of taking a set of sequences, inputting them to an automatic MSA tool, and obtaining a result is usually trivial.

Demonstration

Download this set of protein sequences, in FASTA format, that are very similar to the human SRC protein (UniProt: SRC_HUMAN, P12931)
Copy them into the EBI MUSCLE web server
Press submit
Enjoy the resulting MSA
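
Before pasting sequences into a web server, it can help to check that the FASTA file contains what you expect. The following is a minimal sketch in plain Python, not part of the original demonstration; the function name read_fasta and the two short example records are hypothetical and used only for illustration.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header line closes the previous record, if there was one
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence lines may be wrapped, so collect and join them
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Hypothetical two-record example, not the real tutorial data
example = """>seq1 made-up example
MGSNKSKPKD
ASQRRR
>seq2 made-up example
MGSSKSKPKD
"""

records = read_fasta(example)
for header, sequence in records:
    print(header, len(sequence))
```

Running this on the downloaded file's contents should list one header per sequence, which is a quick way to confirm how many sequences you are about to submit.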

Exercise

To see how easy this is, work through the demonstration above yourself, as quickly as you can.

“Manual” Pairwise Sequence Alignments using JalView - All Files Containing only 2 Sequences

Demonstration

We will demonstrate aligning these two tubulin sequences to each other using JalView. The demonstration will involve:

Loading the sequences into JalView using both
- File->Input Alignment->from File
- File->Input Alignment->from Textbox
Changing the residue colouring scheme to:
- Percentage Identity
- Clustalx
Inserting gaps into a sequence using the mouse
Saving your work

Note that choosing/building an alignment is “simply” about deciding where gaps belong and where they do not belong.

We will also automatically build pairwise alignments of the sequences using three different approaches:

and compare the results of these approaches to the alignment we built “manually”.
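
To give a feel for what an automatic pairwise aligner does internally, here is a minimal sketch of global alignment by dynamic programming (the Needleman-Wunsch approach). It is not necessarily what any of the tools used in the demonstration implement: the scoring values (match +1, mismatch -1, gap -2) are assumptions chosen for simplicity, whereas real tools typically use substitution matrices and separate gap-opening and gap-extension penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align strings a and b; return (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair_score(x, y):
        return match if x == y else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),  # align two residues
                score[i - 1][j] + gap,  # gap in b
                score[i][j - 1] + gap,  # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and
                score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1])):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


# Toy nucleotide example (made up for illustration)
aligned_a, aligned_b, total = needleman_wunsch("ACGT", "AGT")
print(aligned_a)
print(aligned_b)
print(total)
```

Changing the gap penalty changes where, and how readily, the algorithm places gaps, which is the same decision you make by hand when aligning in JalView.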

Exercises

Try building pairwise alignments yourself using JalView in a similar way with the pairs of sequences found in the following files.

Try also one (or more) of the automatic pairwise alignment tools listed above.

Compare your “manual” alignment with the automatic alignments and:

identify any differences between them
try to decide which (the manual or one of the automatic alignments) you think is better

While carrying out the manual alignments, write down:

Features that describe a relatively “good” alignment, thinking in terms of
- sizes of gaps
- numbers of gaps
- properties of residues in the same column (the same as each other? different?)
Instructions on how to change a “bad” alignment into a better one
Characteristics of sequences that are more difficult/take more time to align than others
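
The features listed above (number of gaps, sizes of gaps, and whether residues in the same column are identical) can also be counted mechanically once you have two aligned rows. The sketch below is one hedged way of doing so: the function name alignment_summary and the toy rows are hypothetical, and "percent identity" here means identical pairs divided by columns with no gap, which is only one of several conventions in use.

```python
import re


def alignment_summary(row_a, row_b, gap="-"):
    """Summarise gap and identity features of a pairwise alignment."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")

    # Each run of consecutive gap characters in a row counts as one gap
    gap_pattern = re.escape(gap) + "+"
    gap_lengths = [
        len(run.group())
        for row in (row_a, row_b)
        for run in re.finditer(gap_pattern, row)
    ]

    # Columns in which both sequences have a residue
    aligned = [(x, y) for x, y in zip(row_a, row_b) if x != gap and y != gap]
    identical = sum(1 for x, y in aligned if x == y)

    return {
        "columns": len(row_a),
        "gap_count": len(gap_lengths),
        "gap_lengths": gap_lengths,
        "aligned_pairs": len(aligned),
        "identical_pairs": identical,
        "percent_identity": 100.0 * identical / len(aligned) if aligned else 0.0,
    }


# Toy aligned rows (made up for illustration)
summary = alignment_summary("ACGT--TA", "A-GTCCTA")
for name, value in summary.items():
    print(name, value)
```

Comparing these numbers for your manual alignment and an automatic one gives a concrete starting point for deciding which you think is better, although the numbers alone do not settle the question.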

The sequences are listed below. If possible, try all of them: they were chosen to illustrate a range of different issues, and it should be useful for you to have encountered each one.

The same two tubulin sequences used in the demonstration above
the coding regions of the cDNA for the same two tubulin sequences - note that to do the automatic alignments here you’ll need to choose the nucleotide implementations of the tools
fragments of mouse and rat collagen 18s
ACTB_CERPY and ACTB_TRIVU
SRC_HUMAN and SRK3_SPOLA kinase domains
ARPM1_MOUSE and ACTB_TRIVU
SRC-like Kinases SRC_MOUSE and STK_HYDAT
full-length mouse and rat collagen 18s (for those who fancy a challenge)