by Holger Dinkel

Objective: Get familiar with the Phospho.ELM resource.

Searching via protein id

  1. Go to Phospho.ELM and enter protein accession ‘CASP9_HUMAN’ in the second form field.
    1. How many annotated sites/residues do you find? Which sites have information from multiple references?
    2. How many are annotated as high throughput experiments?
    3. What information is given on the surface accessibility of the annotated residues?
    4. (OPTIONAL) Later in this course, once you have learned to use a 3D viewer, use it to visualize the structure (PDB entry 1NW9), highlight the individual phosphorylatable residues and investigate their surface accessibility.
  2. Query Phospho.ELM for phosphorylation sites of protein ‘Cyclin dependent kinase inhibitor 1B’ (use the accession P46527)
    1. How many phosphorylation sites are annotated for this protein?
    2. Which sites are both well conserved and located in a disordered region? (Tip: Click the column heads to sort)
    3. Are there any (MINT) interactions annotated for this protein?
    4. (OPTIONAL) Which of these interactions at MINT describe the phosphorylation reaction?
  3. Query Phospho.ELM for SRRM2_HUMAN to see a protein with a very large number of phosphorylation sites. Can you estimate how many there are?
    1. How many of these annotations stem from low-throughput experiments (i.e. those not having ‘HTP’ as Source)?
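
For a protein with this many sites, counting by hand is tedious; the same tally can be made on a table of annotations. The following is a minimal Python sketch, assuming hypothetical records with `position`, `residue` and `source` fields. These field names and all values are invented for illustration and do not reproduce the actual Phospho.ELM export format or real annotations.

```python
from collections import Counter

# Hypothetical records; field names and values are made up for illustration
# and are not real Phospho.ELM annotations.
rows = [
    {"position": 100, "residue": "S", "source": "HTP"},
    {"position": 100, "residue": "S", "source": "LTP"},
    {"position": 205, "residue": "T", "source": "HTP"},
    {"position": 310, "residue": "Y", "source": "LTP"},
    {"position": 415, "residue": "S", "source": "HTP"},
]


def count_by_source(records):
    """Count annotation rows per source label (e.g. 'HTP' or 'LTP')."""
    return Counter(r["source"] for r in records)


def sites_with_ltp(records):
    """Return the sorted positions that have at least one non-HTP annotation."""
    return sorted({r["position"] for r in records if r["source"] != "HTP"})


counts = count_by_source(rows)
ltp_sites = sites_with_ltp(rows)
print(counts["HTP"], counts["LTP"], ltp_sites)
```

Note that a site can appear in several rows, so the number of annotation rows and the number of distinct sites are not the same thing.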

Searching via gene name

Recently, you’ve learned about the protein ‘EPSIN’ and you are curious to know if there are any phosphorylation sites annotated for this protein.

  1. Go to Phospho.ELM and enter protein name ‘epsin’ in the first form field (‘gene name’).
    1. You should see a table showing multiple hits in the database. Why?
    2. Select ‘Epsin 1’. Now you’re at the results page. However, if you scroll down, you see results from multiple proteins. Why?
    3. Which protein has the most annotations?
    4. Go back to the Phospho.ELM start page and query the database so that you’ll receive only results for this one protein.
  2. Start the Jalview Plugin and look at the conservation of the sequence (You’re looking for high conservation in non-structured regions).
    1. Can you spot any?

Using BLAST to find homologous information

  1. Search for the UniProt ID ‘ABL1_MOUSE’.
    1. How many annotated phosphorylation sites do you find?
  2. Next, run PhosphoBlast on the sequence or ID of ‘ABL1_MOUSE’.
    1. How many phosphorylation sites are found?
    2. Why the difference?
    3. In which proteins / organisms?


PhosphoSitePlus:

Get familiar with PhosphoSitePlus:

This resource has an enormous collection of PTMs, referenced to the source literature. It has a symbiotic relationship with Cell Signaling Technology and links to their antibodies.

The interface is a bit confusing at first glance, so the exercises will show you how to find information.

  1. Open PhosphoSitePlus and search for p53 using the available search window
    1. How many proteins are retrieved?
    2. Which modifications are found in p53? (Mouse over to show the legend)
    3. Click on the links to find out what you get.
    4. Use that experience to click on the human p53 page.
      • Find the graphic with modifications.
      • Are PTMs mostly in the ~100AA disordered termini or in the folded DNA-binding domain?
    5. Some PTM sites can’t be clicked on in the graphic.
      • Why might that be? To find out, track down where the PTM evidence comes from for the Cysteine residue C229.
    6. Examine the large table of PTMs from different organisms.
      • What is LTP and what is HTP?
      • Which evidence type is more reliable?
      • Is it good to have both?
      • Is the residue with the most evidence important for function?
    7. How many p53 sites are sumoylated?
      • Is there any overlap of sumoylation and other PTMs?
    8. Click to get the multiple alignment.
      • Does mouse p53 have more experimental sites than human?
  2. (OPTIONAL) We would like to find proteins which are phosphorylated by the CSK kinase. For this we use the pattern from the ELM class MOD_TYR_CSK. Go back to the PhosphoSitePlus home page and click on ‘Protein, Sequence or Reference search’
    1. In the motif search box type in the reg-exp for MOD_TYR_CSK like this: [TAD][EA]xQY[QE]x[GQA][PEDLS]
    2. Click search
    3. The result page will give you the list of the protein matches
    4. Use the links to the human JMJD3 and BPAG1 proteins to try to find out which of these are likely to actually be phosphorylated by CSK. Have a look at the number of LTP vs. HTP references. Also follow the link to STRING to see if there are any interactions with CSK annotated.
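
To see what such a motif search does, the pattern can be tried on a sequence locally. The sketch below is a minimal Python example under the assumption that a lowercase `x` in the pattern stands for any residue. The helper names `motif_to_regex` and `find_motif` and the toy peptide are invented for illustration and are not part of PhosphoSitePlus or ELM.

```python
import re


def motif_to_regex(pattern):
    """Translate a motif in which 'x' means 'any residue' into a Python regex."""
    return pattern.replace("x", ".")


def find_motif(pattern, sequence):
    """Return (1-based start, matched peptide) for every match, overlaps included."""
    # A lookahead with a capture group reports overlapping matches as well.
    regex = re.compile("(?=(" + motif_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]


CSK_PATTERN = "[TAD][EA]xQY[QE]x[GQA][PEDLS]"

# Made-up peptide for illustration only; not a real protein sequence.
toy_sequence = "MKLTEPQYQPGEKLSS"

hits = find_motif(CSK_PATTERN, toy_sequence)
print(hits)
```

A pattern match only says that the sequence fits the motif; whether the tyrosine is actually phosphorylated by CSK still has to be judged from the experimental evidence, as in the exercise.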

Objective: Get familiar with ELM

ELM (Eukaryotic Linear Motif) prediction tool.

  1. Search protein SRC_HUMAN (accession P12931) for ELMs using the following parameters:
    • Cell Compartment: Not specified
    • Motif Probability Cutoff: 100
    • Context information: (leave blank)
    1. How many instances do you find?
    2. Redo the search (again accession P12931), now using these parameters:
      • Cell Compartment: cytosol
      • Motif Probability Cutoff: 0.01
      • Context information: Homo sapiens
    3. How many instances do you find now?
    4. How many of the instances are ‘annotated’?
    5. Do the structural predictors/filters (SMART, GlobPlot, IUPRED, Secondary Structure) agree in terms of which regions are structured/disordered?
    6. Compare the location of the annotated instances with structural information at hand (IUPRED, Secondary Structure).
    7. For the annotated instances, which of the ELM classes require a phosphorylation at a certain residue of the motif? (Hint: This information can be found in the description of the ELM class)
    8. Which residue in SRC_HUMAN corresponds to this and can you find evidence for a phosphorylation of this residue (using Phospho.ELM)?
  2. Submit the following sequence to ELM, using default parameters, no cellular compartment.
    1. Compare the results with a search for the same sequence when using the cellular compartment ‘plasma membrane’
    2. Are there any phosphorylations annotated for the carboxy terminus of this protein? Why?
  3. There are three annotated instances of the ELM class LIG_NRBOX in the protein NCOA2_HUMAN. Do they reside in ordered or disordered regions (according to IUPred and SMART)?

  4. Search ELM for each of the following proteins and familiarize yourself with their modular organization. Focus on the different types of protein architecture, on how much information is available from the different resources, and on where functional motifs are located (in which part of the protein would you expect them?). Feel free to follow the links to resources such as ‘Uniprot’ or ‘SMART’ for more information about the selected protein.
    1. EGFR_HUMAN
    2. CASP9_HUMAN
    3. EPN1_HUMAN
    4. SMAD3_HUMAN
    5. SOS1_HUMAN
    6. PTN3_HUMAN
    7. SRRM2_HUMAN
    8. KMT2D_HUMAN
    9. KI67_HUMAN
    10. AP180_HUMAN
    11. Q94833_TRIVA
    12. AMPH_HUMAN
    13. MAPK2_HUMAN
    14. CTNB1_HUMAN
    15. JUN_HUMAN
  1. Search elm.eu.org using the protein name ‘MDM4_HUMAN’ and look for the ‘USP binding motif’ DOC_USP7_1.
    1. How many such motif instances are found in this protein sequence?
    2. Try to assess the biological relevance of each of these instances.
    3. (OPTIONAL) Repeat this exercise with protein ‘AMPH_HUMAN’ and ELM class ‘LIG_Clathr_ClatBox_1’
    4. (OPTIONAL) Is the annotation for the biological relevance in accordance with the globular structure?
  2. You’re studying the cell surface expression of a receptor and find that one isoform is expressed at the surface (Q05586-2) while another is retained in the endoplasmic reticulum (Q05586-5). You want to investigate a possible role of linear motifs in this phenomenon.
    1. First, align these sequences to see which parts are similar/identical and which are different (go to http://www.uniprot.org/uniprot/Q05586, scroll down to ‘sequences’, select isoforms 2 and 5 and click ‘align’).
    2. Then use http://elm.eu.org to scan these sequences for linear motifs, using cell compartment filter ‘cytosol’. You’re looking for targeting motifs (TRG_*).
    3. By looking through the annotations of these targeting ELM classes, can you find motif instances that might be responsible for the different behaviour of the isoforms?
    4. If there are multiple instances of that motif found per protein, can you use differential information (comparing the motifs found in the different isoforms) to narrow down the number of candidate instances?
    5. Next, you sequence another isoform (Q05586-4) which also features this motif at a homologous position, but strangely does not get expressed at the cell surface. You discuss this with your colleague and he tells you that he recently found out that this protein also binds to the PDZ domain of DLG4. Can you come up with a hypothesis how this all fits together?
  3. (OPTIONAL) Search for linear motifs in the protein sequence SMAD3_HUMAN.
    1. (OPTIONAL) Which annotated instances can you see?
    2. (OPTIONAL) Click on one of the annotated docking motifs to read about a switching mechanism involving these motifs. How does this switch work?
  1. (OPTIONAL) Caspase 9 is a mainly globular protein. Focus on the carboxy-terminal globular domain (CASc Caspase, interleukin-1 beta converting enzyme) and try to assess which modification sites (MOD_) are surface accessible. (Hint: Mouse over to find high accessibility scores with low p-values)
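
The differential comparison used in the isoform exercise above (keep only the motif matches that occur in the retained isoform but not in the surface-expressed one) can be sketched in Python. The sequences below are toy strings and the pattern `R.R` is a deliberately simplified stand-in, not the regular expression of any ELM class; for the actual exercise, take the pattern from the relevant ELM class page.

```python
import re


def motif_hits(regex, sequence):
    """Return the set of matched peptides (overlapping matches included)."""
    lookahead = re.compile("(?=(" + regex + "))")
    return {m.group(1) for m in lookahead.finditer(sequence)}


def differential_hits(regex, retained, surface):
    """Peptides matching in the retained isoform but absent from the surface one."""
    return motif_hits(regex, retained) - motif_hits(regex, surface)


# Toy sequences, invented for illustration; these are not the Q05586 isoforms.
retained_isoform = "MSTRKRLAAGRRRSSKDT"
surface_isoform = "MSTRKRLAAGQQQSSKDT"

TOY_PATTERN = "R.R"  # hypothetical, simplified motif

candidates = differential_hits(TOY_PATTERN, retained_isoform, surface_isoform)
print(sorted(candidates))
```

The sketch compares matched peptides rather than positions, because isoforms of different length shift the numbering. For real sequences, the alignment from the first step is the more reliable way to decide which matches are shared.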

Objective: Get familiar with the ELM database

The ELM (Eukaryotic Linear Motif) database.

Instances

  1. (OPTIONAL) Search protein UNG_HUMAN (P13051) for ELMs.
    1. For the annotated instances, which of these ELM classes require a phosphorylation at a certain residue of the motif? (Hint: This information can be found in the description of the ELM class)
    2. Which amino acid residue in UNG_HUMAN corresponds to this and can you find evidence for a phosphorylation of this residue (using Phospho.ELM)?
  2. (OPTIONAL) Get all annotated instances that contain the search term “retinoblastoma” (using the URL http://elm.eu.org/elms/browse_instances.html)
    1. Compare the number of human instances with the number of viral instances.
    2. Read the abstract for the ELM class LIG_Rb_LxCxE_1 to find out why so many viral proteins interact with Rb.
  3. (OPTIONAL) Search Pubmed for the terms “noonan syndrome” AND “motif” (you should get exactly one resulting publication, if not, make sure you use quotes around ‘“noonan syndrome”’)
    1. Find the protein sequence that was analysed in this publication, retrieve the sequence from UniProt and submit it to ELM. Can you find the two mutation hotspots that are responsible for the syndrome described in the publication?
  4. (OPTIONAL) Get all annotated instances for Homo sapiens that contain the search term ‘ciliar’ (Hint: Use http://elm.eu.org/elms/browse_instances.html)
    1. How many are there?
    2. Which experimental evidence is annotated and how reliable is this evidence?
    3. Try to download these instances as a TSV (tab-separated values) file.
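
A tab-separated file of instances can be read with Python’s `csv` module. The sketch below parses an inline sample rather than a real download. The column names and rows are assumptions made for illustration and should be checked against the header of the file you actually obtain; lines starting with `#` are treated as comments.

```python
import csv
from collections import Counter

# Invented sample; column names and values are assumptions, not real ELM data.
SAMPLE_TSV = (
    "# hypothetical comment line\n"
    "ELMIdentifier\tPrimary_Acc\tStart\tEnd\tOrganism\n"
    "TOY_MOTIF_1\tP00001\t10\t15\tHomo sapiens\n"
    "TOY_MOTIF_1\tP00002\t42\t47\tMus musculus\n"
    "TOY_MOTIF_2\tP00003\t7\t12\tHomo sapiens\n"
)


def read_instances(text):
    """Parse tab-separated text into a list of dicts, skipping '#' comment lines."""
    lines = [ln for ln in text.splitlines() if ln and not ln.startswith("#")]
    return list(csv.DictReader(lines, delimiter="\t"))


instances = read_instances(SAMPLE_TSV)
per_organism = Counter(row["Organism"] for row in instances)
print(len(instances), per_organism["Homo sapiens"])
```

For a downloaded file, pass its contents to `read_instances`, for example by reading the file into a string first. All values come back as strings, so convert `Start` and `End` with `int()` before doing arithmetic on them.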

Pathways

  1. (OPTIONAL) Go to the ELM Pathways page (ELM db -> ELM Pathways) and type “jak” into the search field.
    1. From the search results, click on the link “Jak STAT signaling pathway” for the human pathway (hsa04630)
    2. In the KEGG pathway map, try to locate the “SOS” protein.
    3. Which color is it? What does that mean?
    4. Go back to the ELM database and search for annotated instances in the SOS1_HUMAN protein (use the quick search at the top right of the ELM page)
    5. Click on the instance with startposition 1151. Which interaction has been annotated for this instance? Do you find this interactor in the KEGG pathway? Which domain(s) does this protein consist of? Which motif in SOS1_HUMAN does it interact with?

switches.elm.eu.org:

  1. Search Phospho.ELM for transcription factor ‘Fos’ (id P01101).
  2. You would like to find out which of these annotated phosphorylation sites are involved in switching mechanisms annotated at Switches.ELM.
    1. For this, enter P01101 into the switches.elm search box (‘Search database’) and click submit.
    2. How many switching motifs do you find?
    3. What type are these?
    4. Click on the first one and investigate the role of the phosphorylation.
  3. Use the “Analyse” function at switches.elm and submit the accession number O70601.
    1. Select residues 170-180 and click submit.
    2. Using the visualization to the right and focussing on the area highlighted by the Motif of interest, what can you find out about:
      1. modified residues?
      2. region of interest?
      3. possible motifs mediating an interaction?
