by Scooter Morris, Nadya Doncheva, Lars Juhl Jensen

For this session you will be using the new STRING App for Cytoscape. Install this app if you have not done so already. If you already installed it earlier, please make sure that you have the latest version installed.

Network visualization of proteomics data

For this practical we will use data from the paper “Temporal proteomics of NGF-TrkA signaling identifies an inhibitory role for the E3 ligase Cbl-b in neuroblastoma cell differentiation”. Specifically, we will use Supplementary Table S1.

If you have another suitable dataset yourself, you are of course welcome to use that instead.

Obtaining a STRING network for a disease

Besides explicitly querying STRING for a list of proteins, the app can also obtain the list of proteins from other sources. One option is to have it query the DISEASES database for proteins associated with a disease of interest and fetch a STRING network for these.

  • From “Import Network from Public Databases” select “STRING: disease query”.
  • Query for the disease of interest, e.g. “Alzheimer’s”, and select the correct disease term from the list.
  • Specify the desired number of proteins for the disease and the STRING confidence cutoff.
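The same query can in principle be scripted rather than clicked through. The sketch below only builds a command string for the Cytoscape command line. It assumes the app exposes a `string disease query` command with `disease`, `limit` and `cutoff` arguments; the helper name and the default values are hypothetical, so check `help string` in your own installation before relying on it.

```python
def disease_query_command(disease, limit=100, cutoff=0.4):
    """Build a Cytoscape command line for a STRING disease query.

    Assumption: the command name and argument names match your installed
    version of the app. The defaults are illustrative, not recommendations.
    """
    if not 0.0 <= cutoff <= 1.0:
        raise ValueError("cutoff must be between 0 and 1")
    return f'string disease query disease="{disease}" limit={limit} cutoff={cutoff}'


print(disease_query_command("Alzheimer's disease", limit=100, cutoff=0.4))
```

The resulting line can be pasted into the Cytoscape command line if the command is available in your version.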

This is all it takes to obtain a network for a disease. You can use the “Expand network” function of the app to add proteins that are functionally associated with the proteins linked to the disease (these might be new candidates). You may want to color the nodes in the network based on, for example, their tissue expression (“nervous system” would make sense for Alzheimer’s disease) or target class (“pharos family”).

You can use multiple disease queries in conjunction with Cytoscape’s merge function to identify proteins that may explain comorbidities. For example, Alzheimer’s disease shows comorbidity with glaucoma. Retrieve a network for each disease and find the intersection.
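Conceptually, the intersection is just the set of proteins present in both networks. The sketch below illustrates this on exported node lists; the protein names are placeholders and not real query results, and the function name is hypothetical.

```python
def shared_proteins(network_a, network_b):
    """Return the proteins present in both node lists, sorted by name."""
    return sorted(set(network_a) & set(network_b))


# Placeholder node lists for illustration only; not real query results.
alzheimer_nodes = ["PROT_A", "PROT_B", "PROT_C", "PROT_D"]
glaucoma_nodes = ["PROT_C", "PROT_E", "PROT_B"]

print(shared_proteins(alzheimer_nodes, glaucoma_nodes))
# prints ['PROT_B', 'PROT_C']
```

Inside Cytoscape the merge function does this for you and also keeps the edges between the shared proteins.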

Obtaining a network of proteins associated with a PubMed query

The second way to have the app produce a protein list itself is to base it on a PubMed query. This works by first querying PubMed (via its REST interface) to obtain the list of PMIDs that match the query of interest. Next we search for proteins that are overrepresented in this list of PMIDs relative to PubMed as a whole. This is possible because we have already text mined the entire PubMed database; we thus know which abstracts mention which proteins. The list of the most overrepresented proteins is then used to fetch a STRING network.
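The two steps described above can be sketched as follows. The first function builds a request URL for the NCBI E-utilities `esearch` endpoint, which returns PMIDs for a query. The second part ranks proteins by overrepresentation. The text does not say which statistic the app uses, so a hypergeometric tail probability is swapped in here purely as an assumption; the function names and the toy data are hypothetical as well.

```python
from math import comb
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(query, retmax=100):
    """Build an E-utilities URL that returns matching PMIDs as JSON."""
    params = {"db": "pubmed", "term": query, "retmode": "json", "retmax": retmax}
    return ESEARCH + "?" + urlencode(params)


def enrichment_p(hits, matched, mentions, total):
    """Hypergeometric upper tail P(X >= hits).

    matched:  number of abstracts returned by the query
    mentions: number of abstracts in the whole corpus mentioning the protein
    total:    number of abstracts in the whole corpus
    """
    upper = min(matched, mentions)
    tail = sum(
        comb(mentions, k) * comb(total - mentions, matched - k)
        for k in range(hits, upper + 1)
    )
    return tail / comb(total, matched)


def rank_proteins(matched_pmids, protein_to_pmids, total, limit):
    """Return up to `limit` proteins, most overrepresented first."""
    matched = set(matched_pmids)
    scored = []
    for protein, pmids in protein_to_pmids.items():
        pmids = set(pmids)
        hits = len(matched & pmids)
        if hits:  # proteins never mentioned in the matched abstracts are skipped
            p = enrichment_p(hits, len(matched), len(pmids), total)
            scored.append((p, protein))
    return [protein for _, protein in sorted(scored)[:limit]]


# Toy corpus of 20 abstracts; placeholder data, not real text-mining results.
mentions = {
    "PROT_A": [1, 2, 3],
    "PROT_B": [1, 5, 6, 7, 8, 9, 10, 11, 12, 13],
    "PROT_C": [14, 15],
}
print(rank_proteins([1, 2, 3, 4], mentions, total=20, limit=2))
# prints ['PROT_A', 'PROT_B']
```

PROT_A ranks first because three of its three mentions fall within the four matched abstracts, whereas PROT_B is mentioned widely across the corpus and only once in the matched set.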

  • From “Import Network from Public Databases” select “STRING: PubMed query”.
  • Choose the organism of interest, e.g. “Arabidopsis thaliana”.
  • Enter the PubMed query, e.g. “arabidopsis stress response” (without quotes).
  • Specify the desired number of proteins and the STRING confidence score.

When querying this way, it is always wise to first use the PubMed web interface to fine-tune the query. Once you are satisfied with the set of abstracts retrieved, simply copy the query into the app. Having retrieved the network, you can of course again use all of the Cytoscape functionality, for example, to color the nodes based on their subcellular compartment (“mitochondrion” would make sense for the example query).