Intro to multiple sequence alignments
Presentation
Making an MSA is Easy?!
Taking a set of sequences, submitting them to an automatic MSA tool and obtaining a result is usually trivial.
Demonstration
- Download this set of protein sequences, in FASTA format, that are very similar to the human Src protein (UniProt: SRC_HUMAN, P12931)
- Paste them into the EBI MUSCLE web server
- Press submit
- Enjoy the resulting MSA
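If you want to check the downloaded file before pasting it into the MUSCLE form, a few lines of Python are enough to list the sequences it contains and their lengths. This is a minimal sketch, not part of MUSCLE or the EBI tools: `read_fasta` is a hypothetical helper, and the example sequences are made up.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples.

    Hypothetical helper: header lines start with '>', sequence lines may be
    wrapped over several lines, and blank lines are skipped.
    Sequences are upper-cased.
    """
    records = []
    header, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:  # finish the previous record
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line.upper())
    if header is not None:  # finish the last record
        records.append((header, "".join(chunks)))
    return records


# Made-up example input; replace it with the contents of the downloaded file.
example = """>toy1 made-up sequence
MKTAYIAKQR
QISFVKSH
>toy2 made-up sequence
MKTAYIAKQR
"""
for name, seq in read_fasta(example):
    print(name, len(seq))
```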
Exercise
To see how easy this is, try the demonstration above yourself and carry it out as quickly as possible.
“Manual” Pairwise Sequence Alignments using JalView (each file contains only 2 sequences)
Demonstration
We will demonstrate aligning these two tubulin sequences to each other using JalView. The demonstration will cover:
- Loading the sequences into JalView using both:
  - File->Input Alignment->from File
  - File->Input Alignment->from Textbox
- Changing the residue colouring scheme to:
  - Percentage Identity
  - Clustalx
- Inserting gaps into a sequence using the mouse
- Saving your work
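The Percentage Identity colour scheme colours residues that agree with their column's consensus, shaded by the percentage of residues in the column that agree with it. The sketch below computes that per-column figure for a small made-up alignment. `column_percent_identity` is a hypothetical helper, and counting gaps in the denominator is an assumption of this sketch; JalView's own handling of gaps may differ.

```python
from collections import Counter


def column_percent_identity(alignment, gap="-"):
    """Return (consensus residue, percent of rows carrying it) for each column.

    Gaps are ignored when choosing the consensus but still count in the
    denominator, so mostly-gapped columns score low (an assumption here).
    """
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all aligned rows must have the same length")
    n_rows = len(alignment)
    result = []
    for column in zip(*alignment):
        counts = Counter(c for c in column if c != gap)
        if not counts:  # an all-gap column has no consensus residue
            result.append((gap, 0.0))
            continue
        consensus, count = counts.most_common(1)[0]
        result.append((consensus, 100.0 * count / n_rows))
    return result


toy_alignment = ["MK-V", "MKAV", "MRAV"]  # made-up aligned rows
for consensus, pct in column_percent_identity(toy_alignment):
    print(consensus, round(pct))
```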
Note that choosing/building an alignment is “simply” about deciding where gaps belong and where they do not belong.
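To see that the alignment is determined entirely by where the gaps go, the sketch below places a single gap into the same toy sequence at two different positions and counts the identical columns each time. `insert_gaps` and `count_identities` are hypothetical helpers, and the sequences are made up.

```python
def insert_gaps(seq, gap_positions, gap="-"):
    """Insert one gap before each 0-based index of the ungapped seq.

    Repeated indices give longer gaps; indices >= len(seq) put gaps at the end.
    """
    if any(p < 0 for p in gap_positions):
        raise ValueError("gap positions must be non-negative")
    positions = sorted(gap_positions)
    out, k = [], 0  # k points at the next gap position still to be placed
    for idx, residue in enumerate(seq):
        while k < len(positions) and positions[k] == idx:
            out.append(gap)
            k += 1
        out.append(residue)
    out.append(gap * (len(positions) - k))  # leftover gaps go after the last residue
    return "".join(out)


def count_identities(row_a, row_b, gap="-"):
    """Number of columns where both rows carry the same (non-gap) residue."""
    return sum(1 for a, b in zip(row_a, row_b) if a == b and a != gap)


seq_a = "MKTAYIAK"  # made-up toy sequences
seq_b = "MKAYIAK"
for positions in ([7], [2]):
    row_b = insert_gaps(seq_b, positions)
    print(seq_a, row_b, count_identities(seq_a, row_b))
```

With the gap placed after `MK`, seven columns are identical; moving the same gap to the end leaves only two.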
We will also automatically build pairwise alignments of the sequences using three different approaches:
- Smith-Waterman at the EBI (using Water from the EMBOSS package) - this link is to the protein implementation of this tool
- Needleman-Wunsch at the EBI (using Needle from the EMBOSS package) - this link is to the protein implementation of this tool
- BLAST2Sequences at the NCBI - this link is to the protein implementation of the tool
and compare the results of these approaches to the alignment we built “manually”.
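The two EBI tools differ mainly in what they align: Needleman-Wunsch (Needle) produces a global alignment covering both sequences end to end, while Smith-Waterman (Water) reports only the best-scoring local region, because its cell scores are never allowed to drop below zero. The sketch below shows both dynamic-programming recurrences side by side. It is deliberately simplified: it uses a flat match/mismatch score and a linear gap penalty, whereas the EMBOSS tools use a substitution matrix and separate gap-opening and gap-extension penalties, so its output will not match theirs. The function names are hypothetical.

```python
def _sub(x, y, match, mismatch):
    """Score for placing residues x and y in the same column."""
    return match if x == y else mismatch


def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment with a linear gap penalty. Returns (score, row_a, row_b)."""
    n, m = len(a), len(b)
    # s[i][j] = best score for aligning a[:i] with b[:j]
    s = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        s[i][0] = i * gap  # a[:i] aligned against gaps only
    for j in range(1, m + 1):
        s[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s[i][j] = max(s[i - 1][j - 1] + _sub(a[i - 1], b[j - 1], match, mismatch),
                          s[i - 1][j] + gap,
                          s[i][j - 1] + gap)
    # Trace back from the bottom-right corner, so every residue of both is used.
    row_a, row_b, i, j = [], [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and s[i][j] == s[i - 1][j - 1] + _sub(a[i - 1], b[j - 1], match, mismatch):
            row_a.append(a[i - 1])
            row_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and s[i][j] == s[i - 1][j] + gap:
            row_a.append(a[i - 1])
            row_b.append("-")
            i -= 1
        else:
            row_a.append("-")
            row_b.append(b[j - 1])
            j -= 1
    return s[n][m], "".join(reversed(row_a)), "".join(reversed(row_b))


def smith_waterman(a, b, match=1, mismatch=-1, gap=-2):
    """Local alignment with a linear gap penalty. Returns (score, row_a, row_b)."""
    n, m = len(a), len(b)
    s = [[0] * (m + 1) for _ in range(n + 1)]  # first row and column stay 0
    best, bi, bj = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # The extra 0 lets a local alignment start afresh at any cell.
            s[i][j] = max(0,
                          s[i - 1][j - 1] + _sub(a[i - 1], b[j - 1], match, mismatch),
                          s[i - 1][j] + gap,
                          s[i][j - 1] + gap)
            if s[i][j] > best:
                best, bi, bj = s[i][j], i, j
    # Trace back from the best cell until the score falls to 0.
    row_a, row_b, i, j = [], [], bi, bj
    while i > 0 and j > 0 and s[i][j] > 0:
        if s[i][j] == s[i - 1][j - 1] + _sub(a[i - 1], b[j - 1], match, mismatch):
            row_a.append(a[i - 1])
            row_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif s[i][j] == s[i - 1][j] + gap:
            row_a.append(a[i - 1])
            row_b.append("-")
            i -= 1
        else:
            row_a.append("-")
            row_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(row_a)), "".join(reversed(row_b))


print(needleman_wunsch("MKTAYIAK", "MKAYIAK"))
print(smith_waterman("XXXGATTACAXXX", "YYGATTACAYY"))
```

On the second (made-up) pair, the local alignment reports only the shared `GATTACA` core and ignores the unrelated flanks, whereas a global alignment would have to account for every residue.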
Exercises
Try building pairwise alignments yourself in JalView in the same way, using the pairs of sequences in the files listed below.
Also try one (or more) of the automatic pairwise alignment tools listed above.
Compare your “manual” alignment with the automatic alignments and:
- identify any differences between them
- try to decide which (the manual or one of the automatic alignments) you think is better
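One simple way to put a number on how much a “manual” alignment differs from an automatic one is to list which residue pairs each alignment places in the same column and measure the overlap. This sketch assumes both alignments are of the same two ungapped sequences; `aligned_pairs` and `pair_agreement` are hypothetical helpers, and the alignments are made up.

```python
def aligned_pairs(row_a, row_b, gap="-"):
    """Set of (i, j) index pairs (0-based, ungapped) placed in the same column."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    pairs, i, j = set(), 0, 0
    for x, y in zip(row_a, row_b):
        if x != gap and y != gap:
            pairs.add((i, j))
        if x != gap:
            i += 1
        if y != gap:
            j += 1
    return pairs


def pair_agreement(reference, other, gap="-"):
    """Fraction of the reference alignment's residue pairs also found in other."""
    ref = aligned_pairs(*reference, gap=gap)
    oth = aligned_pairs(*other, gap=gap)
    return len(ref & oth) / len(ref) if ref else 0.0


manual = ("MKTAYIAK", "MK-AYIAK")     # made-up "manual" alignment
automatic = ("MKTAYIAK", "M-KAYIAK")  # made-up "automatic" alignment
print(pair_agreement(manual, automatic))
# Residue pairs present only in the manual alignment:
print(sorted(aligned_pairs(*manual) - aligned_pairs(*automatic)))
```

Listing the pairs that differ, rather than just the overall fraction, points you directly at the region where the two alignments disagree.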
While carrying out the manual alignments, write down:
- Features that describe a relatively “good” alignment, thinking in terms of:
  - sizes of gaps
  - numbers of gaps
  - properties of residues in the same column (the same as each other? different?)
- Instructions on how to change a “bad” alignment into a better one
- Characteristics of sequences that are more difficult/take more time to align than others
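The features listed above can be tallied for any pairwise alignment you build. The sketch below counts gap openings and gap lengths, identical columns, and columns holding different but similar residues. The residue groups in `GROUPS` are a rough, illustrative assumption rather than a standard classification (substitution matrices such as BLOSUM62 capture similarity much more finely), and `alignment_summary` is a hypothetical helper.

```python
import re

# Illustrative residue groups: an assumption for this sketch, not a standard.
GROUPS = ["AVLIM", "FWY", "KRH", "DE", "STNQ", "G", "P", "C"]
GROUP_OF = {aa: group for group in GROUPS for aa in group}


def gap_runs(row, gap="-"):
    """Lengths of each contiguous run of gap characters in one aligned row."""
    return [len(run) for run in re.findall(re.escape(gap) + "+", row)]


def alignment_summary(row_a, row_b, gap="-"):
    """Count gap openings/lengths, identical columns and similar-residue columns."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    aligned = identical = similar = 0
    for x, y in zip(row_a, row_b):
        if x == gap or y == gap:
            continue  # gapped columns are described by gap_runs instead
        aligned += 1
        if x == y:
            identical += 1
        elif x in GROUP_OF and GROUP_OF[x] == GROUP_OF.get(y):
            similar += 1
    runs = gap_runs(row_a, gap) + gap_runs(row_b, gap)
    return {
        "aligned_columns": aligned,
        "identical": identical,
        "similar_not_identical": similar,
        "gap_openings": len(runs),
        "gap_lengths": runs,
    }


print(alignment_summary("MKTAYIAK", "MRSAY--K"))  # made-up alignment
```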
Below are the sequences. If possible, try all of them: they have been chosen to illustrate a range of different issues, and it will be useful for you to have encountered each of them.
- The same two tubulin sequences used in the demonstration above
- The coding regions of the cDNAs for the same two tubulin sequences (note that for the automatic alignments here you will need to choose the nucleotide implementations of the tools)
- Fragments of the mouse and rat collagen XVIII sequences
- ACTB_CERPY and ACTB_TRIVU
- SRC_HUMAN and SRK3_SPOLA kinase domains
- ARPM1_MOUSE and ACTB_TRIVU
- SRC-like kinases SRC_MOUSE and STK_HYDAT
- Full-length mouse and rat collagen XVIII sequences (for those who fancy a challenge)
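For the tubulin cDNA pair, it can help to relate the nucleotide alignment back to the protein one. The sketch below translates a coding sequence with the standard genetic code and builds a codon-level alignment row by replacing each residue of an aligned protein row with its codon (and each gap with three gaps). It assumes each cDNA string is exactly the in-frame coding region, starting at the first codon; `translate` and `thread_codons` are hypothetical helpers.

```python
BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons ordered TTT, TTC, TTA, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def translate(cds):
    """Translate an in-frame coding sequence; unknown codons become 'X'."""
    cds = cds.upper().replace("U", "T")
    return "".join(CODON_TABLE.get(cds[p:p + 3], "X")
                   for p in range(0, len(cds) - 2, 3))


def thread_codons(protein_row, cds, gap="-"):
    """Build a nucleotide alignment row by swapping each aligned residue for its codon."""
    cds = cds.upper().replace("U", "T")
    out, k = [], 0  # k = index of the next unused codon in cds
    for residue in protein_row:
        if residue == gap:
            out.append(gap * 3)
            continue
        codon = cds[3 * k:3 * k + 3]
        if CODON_TABLE.get(codon) != residue.upper():
            raise ValueError(f"codon {k} ({codon!r}) does not encode {residue!r}")
        out.append(codon)
        k += 1
    return "".join(out)


print(translate("ATGAAATAA"))
print(thread_codons("M-K", "ATGAAA"))
```

Comparing a row built this way with the nucleotide alignment produced by Needle or Water shows whether the automatic tool has kept gaps in multiples of three.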