Tutorial 7: Modules and complexes
Biological networks have a modular architecture. A network module is a group of nodes in the network that work together to execute some common function. Once you have identified the nodes in a module, you can intuitively reduce the complexity of your network by replacing the individual nodes with one large parent node, as illustrated in the conceptual diagram below. This will allow you to focus on the interactions with the module, and not worry about its internal operation.
This tutorial will cover methods for finding modules as well as complexes, a special type of module in which several individual proteins are assembled into one larger macromolecular machine. In this tutorial, you will learn:
- How to identify putative complexes in two ways: through network connectivity, and through connectivity and coexpression.
- How to use expression data to identify the putative modules or pathways with significant response to the experimental conditions.
This tutorial features the following plugins, all available via the Cytoscape plugins page.
- The MCODE Plugin, developed by Gary Bader at the University of Toronto and published in BMC Bioinformatics, 2003. The plugin and supporting documentation are available at http://baderlab.org/Software/MCODE.
- (not updated) The Dynamic Expression Plugin, developed by Iliana Avila-Campillo at the Institute for Systems Biology.
- The jActiveModules Plugin, developed by the Ideker Lab at the Department of Bioengineering at UCSD, and published in Bioinformatics, 2002. This plugin is available on the Cytoscape plugin page.
- The BiNGO Plugin, developed by the Computational Biology Division, Dept. of Plant Systems Biology, Flanders Interuniversity Institute for Biotechnology (VIB), and published in Bioinformatics in 2005. The plugin and supporting documentation can be found at http://www.psb.ugent.be/cbd/papers/BiNGO/index.htm.
and the following data files:
- galFiltered.sif, a model of the galactose utilization pathway in yeast.
- gal.5936x20.mrna, a companion expression dataset with the results of twenty genetic perturbation experiments.
- galExpData.pvals, another companion expression dataset. This dataset contains p-values that describe the significance of each observed change in expression.
Before starting, please download these files to your local computer by right-clicking on the hyperlinks.
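The .sif file listed above uses Cytoscape's Simple Interaction Format, in which each line names a source node, an interaction type, and one or more target nodes. If you would like to inspect the network outside Cytoscape, the following is a minimal sketch of reading such lines into an adjacency map. The function name parse_sif and the sample lines are made up for illustration, node names are assumed to contain no spaces, and every interaction is treated as undirected.

```python
from collections import defaultdict


def parse_sif(lines):
    """Build an undirected adjacency map from SIF-style lines.

    Each line is expected to look like:
        source  interaction_type  target [target ...]
    A line holding a single name declares an isolated node.
    Assumption: names contain no spaces, so whitespace splitting is safe.
    """
    adjacency = defaultdict(set)
    for raw in lines:
        tokens = raw.split()
        if not tokens:
            continue  # skip blank lines
        source = tokens[0]
        adjacency[source]  # touch the key so isolated nodes are kept
        for target in tokens[2:]:  # tokens[1] is the interaction type
            adjacency[source].add(target)
            adjacency[target].add(source)
    return dict(adjacency)


# Made-up sample lines, not taken from galFiltered.sif.
sample = ["A pp B", "B pd C D", "", "E"]
network = parse_sif(sample)
print(sorted(network["B"]))  # ['A', 'C', 'D']
```

Because parse_sif accepts any iterable of lines, an open file handle for your downloaded copy of galFiltered.sif can be passed to it directly.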
This tutorial and accompanying lectures were delivered at CSC, the Finnish IT center for science. The lecture slides of background material and an accompanying video presentation are available courtesy of the CSC at http://www.csc.fi/english/research/sciences/bioscience/Courses_and_events/cytoscape/index_html.
Identify complexes by connectivity: MCODE and BiNGO
Complexes are a special type of module: groups of proteins that interact to form a single piece of cellular machinery, such as the ribosome or the spliceosome. One way to find candidate complexes is MCODE, which relies on the principle that highly connected regions (or clusters) of an interaction network often correspond to complexes.
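To get a feel for what "highly connected region" means, here is a small sketch that peels a network down to its k-core, the largest subgraph in which every node has at least k neighbors within the subgraph. This is not the MCODE algorithm itself, which uses a more elaborate node weighting and cluster expansion scheme; it is only a simpler illustration of the same intuition. The function name k_core and the toy network are invented for this example.

```python
def k_core(adjacency, k):
    """Return the set of nodes in the k-core of an undirected graph.

    adjacency maps each node to the set of its neighbors and is
    assumed to be symmetric (if B is in adjacency[A], A is in adjacency[B]).
    """
    # Work on a copy so the caller's graph is not modified.
    remaining = {node: set(nbrs) for node, nbrs in adjacency.items()}
    changed = True
    while changed:
        changed = False
        # Collect the nodes that currently have too few neighbors.
        sparse = [n for n, nbrs in remaining.items() if len(nbrs) < k]
        for node in sparse:
            for nbr in remaining.pop(node):
                if nbr in remaining:
                    remaining[nbr].discard(node)
            changed = True
    return set(remaining)


# Toy network: a fully connected group A-B-C-D with a tail D-E-F.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"),
         ("B", "D"), ("C", "D"), ("D", "E"), ("E", "F")]
toy = {}
for u, v in edges:
    toy.setdefault(u, set()).add(v)
    toy.setdefault(v, set()).add(u)

print(sorted(k_core(toy, 3)))  # ['A', 'B', 'C', 'D']
```

The sparsely connected tail is stripped away and only the dense group survives, which is the kind of region a clustering tool would report as a putative complex.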
- Start Cytoscape.
- Load the network file galFiltered.sif and apply your favorite layout algorithm.
- Go to Plugins → MCODE → Start MCODE. A new panel will appear in CytoPanel 1, as shown below:
- Click Analyze to start finding clusters using the default settings. To change various parameters, click on the Advanced Options tab.
The results of the MCODE analysis will appear in CytoPanel 3, as shown:
- Click on the results for the first complex. Notice that on the Cytoscape canvas, the corresponding nodes are selected.
- If these nodes are a portion of a complex, then there should be some process in which they all operate. Thus, if we explore GO term enrichment using the BiNGO plugin, we should see some biological process with significant enrichment for these nodes.
- Go to Plugins → BiNGO 2.0.
- In the BiNGO Settings dialog box, fill in the following:
- A network name of your choice (in this example, I used the highly-creative name of "b").
- Leave the box Get cluster from network checked.
- Select the Hypergeometric statistics test with the FDR multiple testing correction.
- Select a high cutoff p-value of 0.05. Why? A higher cutoff value will give us more data that we can review in detail below.
- Select the GO categories overrepresented after correction for visualization.
- Under Select Reference Set, select Test cluster versus network. Why choose that and not Test cluster versus complete annotation? Because this network is a portion of the yeast galactose utilization pathway, and thus any random collection of genes in the network is probably involved in galactose utilization. If we want to know what specific role is played by a portion of the network, we need to look for enrichment relative to the rest of the network.
- Select an ontology of GO_Biological_Process and the organism Saccharomyces cerevisiae.
- Click Start BiNGO. You should see a graph that appears like the one below:
- Notice the dark color of the nodes "peroxisome organization and biogenesis" and "protein-peroxisome targeting". Recall that dark colors imply significant enrichment. What is the p-value? Find out by selecting the adjustedPValue and description parameters that are followed by an underscore and the name of the BiNGO cluster. Note that according to the P values, the enrichment is most significant for "peroxisome organization and biogenesis". With further investigation, you would see that this MCODE complex prediction contains all the genes in S. cerevisiae with this GO term. Thus, this was probably a significant hit.
- For contrast, return to your MCODE results, select putative cluster #10, and run BiNGO on this cluster. You should see a graph like the one shown below, and no P value of comparable significance (verify this).
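If you are curious about what the Hypergeometric test and the FDR correction selected above compute, the sketch below works through both with the Python standard library. It assumes that the FDR option corresponds to the Benjamini-Hochberg procedure, and it is not BiNGO's own code. The function names and all numbers are hypothetical: N stands for the size of the reference set (here, the network), X for the reference genes carrying a given GO term, n for the cluster size, and x for the cluster genes carrying the term.

```python
from math import comb


def hypergeom_pvalue(x, n, X, N):
    """Probability of seeing at least x annotated genes in a cluster of
    n genes drawn from N reference genes, X of which are annotated."""
    upper = min(n, X)
    favorable = sum(comb(X, i) * comb(N - X, n - i)
                    for i in range(x, upper + 1))
    return favorable / comb(N, n)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical counts: 3 of 10 reference genes carry the term,
# and 2 of the 4 cluster genes carry it.
p = hypergeom_pvalue(x=2, n=4, X=3, N=10)
print(round(p, 4))  # 0.3333

# Hypothetical raw p-values for four GO terms tested on one cluster.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Note how the choice of reference set enters only through N and X: testing against the network rather than the complete annotation changes both numbers, and with them the p-value.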
Identify complexes by coexpression: Dynamic Expression and BiNGO
The Dynamic Expression plugin is being updated for use with Cytoscape 2.5. It is not currently operational in Cytoscape 2.4 or 2.4.1.
Identify perturbed complexes using expression data: jActiveModules
This section will illustrate application of the Cytoscape jActiveModules plugin to find subnetworks of nodes where all or most nodes show substantial responses to the same experimental conditions.
- Return to the network galFiltered.sif.
- Load the expression data matrix galExpData.pvals using the File → Import → Attribute/Expression Matrix... function, and assign values to nodes using ID. This file contains results from three expression experiments, each involving perturbation of a transcription factor in the yeast galactose utilization pathway. This file also contains a necessary ingredient for jActiveModules: p-values indicating the significance of each expression value.
- Go to Plugins → jActiveModules → Active Modules: Set Parameters. Select all three expression conditions (gal1RGsig, gal4RGsig, and gal80Rsig) for analysis by checking their respective boxes. Notice that the Number of Paths field is set to 5. This means that five putative hits will be returned, even if only one good one is found. Click the Dismiss button to close the window.
- Go to Plugins → jActiveModules → Active Modules: Find Modules. This will run jActiveModules. Shortly, you should see a Conditions vs. Pathways window similar to the one shown below. You might not get exactly the same results, because jActiveModules involves random sampling, as we shall discuss below.
What do these results mean?
- The plugin found five putative modules.
- Module #1 contains 14 nodes, has a respectable score of roughly 3.7, and appears to be significant in all three experimental conditions (gal1RG, gal4RG, and gal80R). This looks like a significant hit.
- Module #5 contains four nodes, with a moderate score of about 2.6, significant in all three experimental conditions. This might not be an interesting hit, but can be explored further later.
- Under the Module 1 column, go down to the experimental conditions column, and click on one of the red bars indicating which experimental conditions yielded significant results. On the Cytoscape canvas, the nodes belonging to this module should be selected and highlighted in yellow, as shown:
- Click on the bars corresponding to another module, and the nodes of that module should be highlighted instead. What do these subnetworks represent? Each is a set of connected nodes that together showed significant expression changes in the same experiment. So, this is a subnetwork with an overall significant response to the experimental conditions. As described in Ideker et al., Bioinformatics 2002 18:S233-S240, such subnetworks tend to correspond to known pathways in the literature.
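Where do module scores like the ones above come from? The sketch below roughly follows the scoring idea described in the Ideker et al. paper: each node's p-value is converted to a z-score, the z-scores of a candidate subnetwork are combined into one aggregate value, and that value is compared with randomly drawn node sets of the same size. It handles a single condition only, does not reproduce the plugin's search procedure, and is not the plugin's code. The function names and p-values are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev


def aggregate_z(pvalues):
    """Combine per-node p-values into a single z-score for a subnetwork.

    Each p-value must lie strictly between 0 and 1, otherwise
    NormalDist.inv_cdf raises an error.
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    z_scores = [std_normal.inv_cdf(1.0 - p) for p in pvalues]
    return sum(z_scores) / sqrt(len(z_scores))


def calibrated_score(module_p, background_p, trials=1000, seed=0):
    """Express the module's aggregate z-score relative to random
    node sets of the same size drawn from the whole network."""
    rng = random.Random(seed)  # fixed seed keeps the sketch repeatable
    k = len(module_p)
    null = [aggregate_z(rng.sample(background_p, k)) for _ in range(trials)]
    return (aggregate_z(module_p) - mean(null)) / stdev(null)


# Hypothetical p-values for one condition, spread evenly over (0, 1).
background = [(i + 0.5) / 200 for i in range(200)]
module = background[:5]  # the five most significant nodes

print(aggregate_z(module) > 0, calibrated_score(module, background) > 0)
# True True
```

Changing the seed changes the random node sets and therefore the calibrated score slightly, which gives some intuition for why a method built on random sampling can return slightly different results from run to run.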
Congratulations! By now, you are almost ready to go out into the Systems Biology world and do great things! First, have a cup of coffee to celebrate.
For comments or suggestions, please post to the cytoscape-discuss mailing list.