Support for Modules in Cytoscape

Introduction

In many cases, biologists are first interested in the relationships between functional modules, and only then look into the details inside the modules, i.e., the actual interactions. Cytoscape can partially handle this using the Group API and some related plugins, but it is not a standard feature. There is also no simple file format for representing hierarchies or substructures. In this project, we are going to implement a mechanism that handles the output of module-finding algorithms as a hierarchy of subnetworks and makes the Cytoscape core module-aware.

Use Cases

Module Finding Plugins

Currently, several plugins find functional modules in large networks. In some cases they produce subnetworks, but there is no universal UI or function for keeping that structure in the session or in other files.

Pathway Overview

Cytoscape has a function to import KEGG/Reactome data as attributes. It would be useful if Cytoscape could also generate substructures automatically based on such annotations.
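As a rough illustration of generating substructures from annotations, the sketch below groups node names by an annotation value. It is plain Python, not Cytoscape API code; the function name `modules_from_annotation` and the pathway names are hypothetical, and the input is assumed to be a mapping from node name to a list of annotation values.

```python
from collections import defaultdict


def modules_from_annotation(node_annotations):
    """Group node names by annotation value (e.g. a pathway name).

    node_annotations maps a node name to a list of annotation values.
    A node with several values lands in several modules.
    """
    modules = defaultdict(set)
    for node, values in node_annotations.items():
        for value in values:
            modules[value].add(node)
    return dict(modules)


# Hypothetical annotation table; the pathway names are made up.
annotations = {
    "node0": ["pathwayX"],
    "node1": ["pathwayX", "pathwayY"],
    "node2": ["pathwayY"],
    "node3": [],  # unannotated nodes belong to no module
}
modules = modules_from_annotation(annotations)
```

Because a node may carry several annotations, the resulting modules can overlap, which is the same situation as node0 and node1 in the sample further down.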

Implementation

File Formats

Most popular XML graph file formats support hierarchical structures.

DOT

This is the standard file format in Graphviz. It is a plain-text format with a nested, XML-like structure, and it can represent substructures.
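To make the nesting concrete, here is a minimal sketch that writes modules as DOT `subgraph` blocks. The function `to_dot` is hypothetical and not part of Cytoscape or Graphviz. The `cluster_` name prefix is used because Graphviz layout engines that support clusters treat such subgraphs specially. The sketch assumes node and module names are valid unquoted DOT identifiers.

```python
def to_dot(edges, modules, graph_name="G"):
    """Return DOT text with one `subgraph cluster_<name>` block per module.

    edges: list of (source, target) pairs.
    modules: dict of module name -> iterable of member node names.
    """
    lines = [f"digraph {graph_name} {{"]
    for name, members in modules.items():
        lines.append(f"  subgraph cluster_{name} {{")
        lines.append(f'    label="{name}";')
        for node in sorted(members):
            lines.append(f"    {node};")
        lines.append("  }")
    for source, target in edges:
        lines.append(f"  {source} -> {target};")
    lines.append("}")
    return "\n".join(lines)


dot_text = to_dot(
    edges=[("node7", "node3"), ("node5", "node4")],
    modules={"moduleB": ["node3", "node7"], "moduleD": ["node2", "node4", "node5"]},
)
print(dot_text)
```

The example uses two disjoint modules on purpose: how overlapping modules (a node listed in more than one subgraph) should be drawn is a separate question that this sketch does not address.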

Pajek

This program uses three types of files to represent substructures: partition, cluster, and hierarchy files.

Therefore, the user needs to load four files (network, partition, cluster, hierarchy) to reconstruct the saved substructures.

igraph

This package does not have its own file format for representing subgraphs. Instead, it supports GraphML and DOT.

Cytoscape New File Format for Subgraphs

Standard XML file formats such as GraphML or XGMML can represent substructures. However, they are not easy for many biologists to edit by hand. We need to implement a simple table/text-style file format that is editable in spreadsheet programs.

Sample

sample_modules.png

moduleA.png moduleB.png

moduleC.png moduleD.png

Option 1: Two Files

This approach needs two files: a standard SIF file and a subnetwork definition file (SDF?). This is similar to Pajek, but fewer files are required to rebuild the substructures and hierarchies.

Extended SIF

We need to introduce two special values for the interaction edge attribute: child_of and interact_with.

The first column is an edge attribute that defines which module contains the edge in that row.

The sample graph above can be represented in the following:

moduleB  node7    DirectedEdge   node3
moduleA  node6    DirectedEdge   node2
moduleD  node5    DirectedEdge   node4
moduleD  node4    DirectedEdge   node2
         node4    DirectedEdge   node3
moduleA  node1    DirectedEdge   node2
moduleC  node1    DirectedEdge   node0
moduleA  node0    DirectedEdge   node6
moduleA  node0    DirectedEdge   node2
         moduleC  child_of       moduleA
         moduleD  interact_with  moduleA
         moduleD  interact_with  moduleB

If the parser finds either of these two keywords (child_of or interact_with) in the interaction column, it treats the row as a module relationship definition.
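The parsing rule can be sketched as follows. This is a minimal illustration, not the Cytoscape SIF reader: it assumes the four columns are tab-separated with an empty first field when an edge belongs to no module, and the function name `parse_extended_sif` is hypothetical.

```python
# Interaction values that mark a row as a module relationship.
MODULE_KEYWORDS = {"child_of", "interact_with"}


def parse_extended_sif(text):
    """Split extended-SIF rows into ordinary edges and module relations.

    Assumes four tab-separated columns: module, source, interaction, target.
    """
    edges = []
    module_relations = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = [field.strip() for field in line.split("\t")]
        if len(fields) != 4:
            raise ValueError(f"expected 4 tab-separated columns: {line!r}")
        module, source, interaction, target = fields
        if interaction in MODULE_KEYWORDS:
            module_relations.append((source, interaction, target))
        else:
            # An empty module column means the edge is in no module.
            edges.append((module or None, source, interaction, target))
    return edges, module_relations


sample = (
    "moduleB\tnode7\tDirectedEdge\tnode3\n"
    "\tnode4\tDirectedEdge\tnode3\n"
    "\tmoduleC\tchild_of\tmoduleA\n"
    "\tmoduleD\tinteract_with\tmoduleB\n"
)
edges, module_relations = parse_extended_sif(sample)
```

Keeping the two kinds of rows in separate lists means the ordinary edges can still be handed to an unmodified SIF-style loader.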

SDF

This file defines members of each subnetwork.

moduleA node0 node1 node2 node6
moduleB node3 node7
moduleC node0 node1
moduleD node2 node4 node5

Based on the information above, Cytoscape can rebuild the module relation map:

moduleDAG.png
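One way to read the SDF and cross-check it against the declared relations is sketched below. It assumes whitespace-separated fields with the module name first. Treating strict set containment as evidence for child_of is an assumption made for this illustration, not a rule stated in the proposal, and both function names are hypothetical.

```python
def parse_sdf(text):
    """Return a dict of module name -> set of member node names."""
    modules = {}
    for line in text.splitlines():
        fields = line.split()
        if fields:
            modules[fields[0]] = set(fields[1:])
    return modules


def infer_child_of(modules):
    """Return sorted (child, parent) pairs where the child's members
    are a strict subset of the parent's members."""
    return sorted(
        (child, parent)
        for child, child_members in modules.items()
        for parent, parent_members in modules.items()
        if child_members < parent_members
    )


sdf_text = """\
moduleA node0 node1 node2 node6
moduleB node3 node7
moduleC node0 node1
moduleD node2 node4 node5
"""
modules = parse_sdf(sdf_text)
child_of = infer_child_of(modules)
```

For the sample data the only strict containment is moduleC inside moduleA, which agrees with the child_of row in the extended SIF.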

Pros: simple and biologist-editable.

Cons: extra files needed.

The alternative is to separate the first column from the SIF file and read it as a regular edge attribute. In that case, three files are needed.

Option 2: One File (PSI-MI TAB 2.5-like table format)

This approach is similar to PSI-MI TAB 2.5. A MI-TAB file contains both node and edge attributes in one file. For example, the data above converted into this format looks like this:

part_of  source   source members     interaction    target   target members
moduleB  node7                       DirectedEdge   node3
moduleA  node6                       DirectedEdge   node2
moduleD  node5                       DirectedEdge   node4
moduleD  node4                       DirectedEdge   node2
         node4                       DirectedEdge   node3
moduleA  node1                       DirectedEdge   node2
moduleC  node1                       DirectedEdge   node0
moduleA  node0                       DirectedEdge   node6
moduleA  node0                       DirectedEdge   node2
         moduleC  node0,node1        child_of       moduleA  node0,node1,node2,node6
         moduleD  node2,node4,node5  interact_with  moduleA  node0,node1,node2,node6
         moduleD  node2,node4,node5  interact_with  moduleB  node3,node7

Pros: only one file; Excel-editable.

Cons: redundant module member information.
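A reader for this one-file layout might look like the sketch below. It assumes a tab-separated file with the header row shown above and uses Python's standard `csv` module. The function names are hypothetical. Because member lists are repeated on every module row, the sketch also rejects a module whose member list differs between rows.

```python
import csv
import io


def record_members(members, module, member_field):
    """Store a module's member set, rejecting conflicting repeats."""
    nodes = set(member_field.split(","))
    if members.setdefault(module, nodes) != nodes:
        raise ValueError(f"inconsistent member list for {module}")


def parse_module_table(text):
    """Return (edges, relations, members) from the tab-separated table."""
    edges, relations, members = [], [], {}
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        source_members = (row["source members"] or "").strip()
        target_members = (row["target members"] or "").strip()
        if source_members and target_members:
            # Rows with member lists describe module-to-module relations.
            record_members(members, row["source"], source_members)
            record_members(members, row["target"], target_members)
            relations.append((row["source"], row["interaction"], row["target"]))
        else:
            edges.append(
                (row["part_of"] or None, row["source"], row["interaction"], row["target"])
            )
    return edges, relations, members


table = "\n".join([
    "part_of\tsource\tsource members\tinteraction\ttarget\ttarget members",
    "moduleA\tnode1\t\tDirectedEdge\tnode2\t",
    "\tmoduleC\tnode0,node1\tchild_of\tmoduleA\tnode0,node1,node2,node6",
    "\tmoduleD\tnode2,node4,node5\tinteract_with\tmoduleA\tnode0,node1,node2,node6",
])
edges, relations, members = parse_module_table(table)
```

The consistency check is one possible way to soften the redundancy drawback: the duplication remains in the file, but conflicting copies are caught at load time.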

Option 3: Simply Support DOT

As far as I know, DOT is the most complete non-XML file format for representing modules. If we support this format, we can use it to represent

User Interface

ModuleFinding (last edited 2009-09-29 18:07:05 by KeiichiroOno)
