Tutorial 7: Modules and complexes

Biological networks have a modular architecture. A network module is a group of nodes in the network that work together to execute some common function. Once you have identified the nodes in a module, you can intuitively reduce the complexity of your network by replacing the individual nodes with one large parent node, as illustrated in the conceptual diagram below. This will allow you to focus on the interactions with the module, and not worry about its internal operation.
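To make the idea of a parent node concrete, here is a minimal sketch in plain Python. It is not how Cytoscape implements grouping; the function name `collapse_module`, the toy edge list, and the parent label "M" are all hypothetical and chosen only for illustration. Edges inside the module disappear, and edges that cross the module boundary are re-attached to the parent node.

```python
def collapse_module(edges, module, parent):
    """Replace every node in `module` with the single node `parent`.

    Edges are treated as undirected. Edges whose two ends both fall
    inside the module (and any self-loops) are dropped; edges crossing
    the module boundary are re-attached to the parent node.
    """
    module = set(module)
    collapsed = set()
    for a, b in edges:
        a2 = parent if a in module else a
        b2 = parent if b in module else b
        if a2 == b2:
            continue  # internal edge or self-loop: hidden inside the parent
        collapsed.add(tuple(sorted((a2, b2))))
    return sorted(collapsed)


# Hypothetical toy network: A, B and C form the module.
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("D", "E"), ("B", "E")]
print(collapse_module(edges, {"A", "B", "C"}, "M"))
```

After collapsing, only the interactions between the module and the rest of the network remain, which is the simplification described above.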


This tutorial will cover methods for finding modules as well as complexes, a special type of module in which several individual proteins are assembled into one larger macromolecular machine. In this tutorial, you will learn:

This tutorial features the following plugins, all available via the Cytoscape plugins page.

and the following data files:

Before starting, please download these files to your local computer by right-clicking on the hyperlinks.

This tutorial and its accompanying lectures were delivered at CSC, the Finnish IT Center for Science. The lecture slides of background material and an accompanying video presentation are available courtesy of CSC at http://www.csc.fi/english/research/sciences/bioscience/Courses_and_events/cytoscape/index_html.

Identify complexes by connectivity: MCODE and BiNGO

Complexes are a special type of module: they are a group of proteins that interact to form one single piece of cellular machinery, such as the ribosome or the spliceosome. One method to determine complexes is by using MCODE, which follows the principle that highly-connected regions (or clusters) of interaction networks are often complexes.
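The principle that dense regions suggest complexes can be illustrated with a k-core decomposition, a standard graph technique. This is only a sketch of the underlying idea, not MCODE itself: MCODE's own procedure is more elaborate, and the protein names and edges below are made up for the example. A k-core is the largest subgraph in which every node has at least k neighbors inside the subgraph.

```python
from collections import defaultdict


def k_core(edges, k):
    """Return the set of nodes in the k-core of an undirected graph.

    Nodes with fewer than k neighbors among the remaining nodes are
    removed repeatedly until no such node is left.
    """
    adj = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-loops
            adj[a].add(b)
            adj[b].add(a)
    nodes = set(adj)
    changed = True
    while changed:
        changed = False
        for n in list(nodes):
            if len(adj[n] & nodes) < k:
                nodes.discard(n)
                changed = True
    return nodes


# Hypothetical toy network: four fully connected proteins plus a sparse tail.
edges = [
    ("P1", "P2"), ("P1", "P3"), ("P1", "P4"),
    ("P2", "P3"), ("P2", "P4"), ("P3", "P4"),
    ("P4", "X"), ("X", "Y"),
]
print(sorted(k_core(edges, 3)))
```

In this toy network the sparse tail is peeled away and only the four tightly interconnected proteins remain, which is the kind of region a connectivity-based method would flag as a candidate complex.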

  1. Start Cytoscape.
  2. Load the network file galFiltered.sif and apply your favorite layout algorithm.
  3. Go to Plugins → MCODE → Start MCODE. A new MCODE panel will appear in CytoPanel 1, as shown below:

    • small_mcode_open.png

  4. Click Analyze to start finding clusters using the default settings. To change various parameters, click on the Advanced Options tab.
  5. The results of the MCODE analysis will appear in CytoPanel 3, as shown:

    • small_mcode_results.png

    Thirteen putative complexes are listed, giving the score and the number of nodes and edges in each. Not all of these results will be significant. A significant result is one with a high score (greater than one) and a decent number of nodes and edges. In this case, the first three results may be significant, while the others are more dubious.
  6. Click on the results for the first complex. Notice that on the Cytoscape canvas, the corresponding nodes are selected.
  7. If these nodes are a portion of a complex, then there should be some process in which they all operate. Thus, if we explore GO term enrichment using the BiNGO plugin, we should see some biological process with significant enrichment for these nodes.
  8. Go to Plugins → BiNGO 2.0.

  9. In the BiNGO Settings dialog box, fill in the following:
    • A network name of your choice (in this example, I used the highly-creative name of "b").
    • Leave the box Get cluster from network checked.
    • Select the Hypergeometric statistics test with the FDR multiple testing correction.
    • Select a relatively high p-value cutoff of 0.05. Why? A higher cutoff returns more GO terms, which we can review in detail below.
    • Select the GO categories overrepresented after correction for visualization.
    • Under Select Reference Set, select Test cluster versus network. Why choose that and not Test cluster versus complete annotation? Because this network is a portion of the yeast galactose utilization pathway, and thus any random collection of genes in the network is probably involved in galactose utilization. If we want to know what specific role is played by a portion of the network, we need to look for enrichment relative to the rest of the network.
    • Select an ontology of GO_Biological_Process and the organism Saccharomyces cerevisiae.
    The BiNGO Settings dialog box should appear as follows:
    • small_bingo.png

  10. Click Start BiNGO. You should see a graph that appears like the one below:
    • small_b_results.png

  11. Notice the dark color of the nodes "peroxisome organization and biogenesis" and "protein-peroxisome targeting". Recall that dark colors imply significant enrichment. What is the p-value? Find out by selecting the adjustedPValue and description attributes (their names end with an underscore followed by the name of the BiNGO cluster). According to the p-values, the enrichment is most significant for "peroxisome organization and biogenesis". With further investigation, you would see that this MCODE complex prediction contains all the genes in S. cerevisiae with this GO term. Thus, this was probably a significant hit.
  12. For contrast, return to your MCODE results, select putative cluster #10, and run BiNGO on this cluster. You should see a graph like the one shown below, and no p-value of comparable significance (verify this).
    • small_q_results.png
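To see what the hypergeometric test and the FDR correction selected in step 9 compute, here is a minimal sketch in plain Python. It is not BiNGO's code. It assumes the FDR correction is the Benjamini-Hochberg procedure, and all counts and p-values in the example are invented for illustration. The test asks: if a cluster of n genes were drawn at random from a reference set of N genes, of which X carry a given GO term, how likely is it to see x or more annotated genes in the cluster?

```python
from math import comb


def hypergeom_pvalue(x, n, X, N):
    """Upper-tail hypergeometric p-value.

    x: annotated genes in the cluster    n: cluster size
    X: annotated genes in the reference  N: reference size
    Returns P(at least x annotated genes in a random draw of n genes).
    """
    upper = min(n, X)
    tail = sum(comb(X, i) * comb(N - X, n - i) for i in range(x, upper + 1))
    return tail / comb(N, n)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical counts: 5 of 6 cluster genes carry a term that only
# 8 of the 300 genes in the reference network carry.
p = hypergeom_pvalue(5, 6, 8, 300)
print(p)

# Hypothetical raw p-values for three GO terms tested on the same cluster.
print(benjamini_hochberg([0.01, 0.04, 0.03]))
```

With Test cluster versus network selected, the reference set corresponds to the nodes of the network rather than the whole genome, which is why the same cluster can give different p-values under the two reference options.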

Identify complexes by coexpression: Dynamic Expression and BiNGO

The Dynamic Expression plugin is being updated for use with Cytoscape 2.5. It is not currently operational in Cytoscape 2.4 or 2.4.1.

Identify perturbed complexes using expression data: jActiveModules

This section will illustrate application of the Cytoscape jActiveModules plugin to find subnetworks of nodes where all or most nodes show substantial responses to the same experimental conditions.

  1. Return to the network galFiltered.sif.
  2. Load the expression data matrix galExpData.pvals using the File → Import → Attribute/Expression Matrix... function, and assign values to nodes using ID. This file contains results from three expression experiments, each involving perturbation of one of three transcription factors in the yeast galactose utilization pathway. This file also contains a necessary ingredient for jActiveModules: p-values indicating the significance of each expression value.

  3. Go to Plugins → jActiveModules → Active Modules: Set Parameters. Select all three expression conditions (gal1RGsig, gal4RGsig, and gal80Rsig) for analysis by checking their respective boxes. Notice that the Number of Paths field is set to 5. This means that five putative hits will be returned, even if only one good one is found. Click the Dismiss button to close the window.

    • small_am_param.png

  4. Go to Plugins → jActiveModules → Active Modules: Find Modules. This will run jActiveModules. Shortly, you should see a Conditions vs. Pathways window similar to the one shown below. You might not get exactly the same results, because jActiveModules involves random sampling, as we shall discuss below.

    • am_results.png
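The role of the p-values and of random sampling can be sketched as follows. This is a simplified illustration of the general scoring idea behind active-module methods for a single condition, not the plugin's actual code, and it leaves out the search over subnetworks entirely. The node names and p-values are invented. Each p-value is converted to a z-score, the z-scores of a candidate subnetwork are combined, and the result is compared against randomly sampled node sets of the same size.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev


def z_from_p(p, eps=1e-12):
    """Convert a p-value to a z-score; small p-values give large z-scores."""
    p = min(max(p, eps), 1.0 - eps)  # inv_cdf requires 0 < probability < 1
    return NormalDist().inv_cdf(1.0 - p)


def aggregate_z(pvalues):
    """Combine the z-scores of k nodes as sum(z) / sqrt(k)."""
    z = [z_from_p(p) for p in pvalues]
    return sum(z) / sqrt(len(z))


def calibrated_score(module, node_pvalues, samples=1000, seed=0):
    """Score a module relative to random node sets of the same size."""
    rng = random.Random(seed)
    nodes = list(node_pvalues)
    k = len(module)
    observed = aggregate_z([node_pvalues[n] for n in module])
    background = [
        aggregate_z([node_pvalues[n] for n in rng.sample(nodes, k)])
        for _ in range(samples)
    ]
    return (observed - mean(background)) / stdev(background)


# Hypothetical per-node p-values for one expression condition.
node_pvalues = {
    "n1": 0.001, "n2": 0.004, "n3": 0.010, "n4": 0.350,
    "n5": 0.600, "n6": 0.420, "n7": 0.800, "n8": 0.150,
}
print(calibrated_score(["n1", "n2", "n3"], node_pvalues))
```

Because the background is built from random draws, the calibrated score changes slightly with the seed, which is one reason two runs need not return identical results.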

What do these results mean?

Congratulations! By now, you are almost ready to go out into the Systems Biology world and do great things! First, have a cup of coffee to celebrate.

For comments or suggestions, please post to the cytoscape-discuss mailing list.


Presentations/07_Complexes (last edited 2009-02-12 01:03:33 by localhost)

Funding for Cytoscape is provided by a federal grant from the U.S. National Institute of General Medical Sciences (NIGMS) of the National Institutes of Health (NIH) under award number GM070743-01. Corporate funding is provided through a contract from Unilever PLC.
