Diff for "groupAPI" - Cytoscape Wiki

Differences between revisions 25 and 26

RFC Name : Grouping API

About this document

This is an official Request for Comment (RFC) for supporting groups in Cytoscape. This RFC encompasses and leverages the MetaNodes RFC (RFC 8) and the SimplifiedMetaNodeDataStructureRFC (RFC 9) by extending the notion of a metanode to a more general group concept.

For details on RFCs in general, check out the [http://www.answers.com/main/ntquery?method=4&dsid=2222&dekey=Request+for+Comments&gwp=8&curtab=2222_1&linktext=Request%20for%20Comments Wikipedia Entry: Request for Comments (RFCs)]

Status

Open for public comment

How to Comment

To view or add comments, click on any of the 'Comment' links below. Adding your ideas to the Wiki directly lets us organize everyone's ideas more easily and keep clear records. Be sure to include today's date and your name for each comment. Here is an example to get things started: ["/Comment"].

Try to keep your comments as concrete and constructive as possible. For example, if you find a part of the RFC makes no sense, please say so, but don't stop there. Take the extra step and propose alternatives.

Proposal

The goal is to provide a new package, e.g. cytoscape.groups, that supplants all direct calls to the giny metanodes methods and extends the concept of metanodes in a structured manner. Similar to the MetaNodes implementation, there is a [http://www.cgl.ucsf.edu/Research/cytoscape/groupAPI/doc/edu/ucsf/groups/GroupManager.html GroupManager] class that should be the main interface for most developers. There are also three interfaces that allow a group to have different abstraction models and different visual properties. The three interfaces are:

[http://www.cgl.ucsf.edu/Research/cytoscape/groupAPI/doc/edu/ucsf/groups/model/GroupModel.html GroupModel] - provides an interface for classes that handle the model associated with presentation of grouped nodes,
[http://www.cgl.ucsf.edu/Research/cytoscape/groupAPI/doc/edu/ucsf/groups/view/GroupViewer.html GroupViewer] - an interface for classes that handle the actual presentation of groups, and
[http://www.cgl.ucsf.edu/Research/cytoscape/groupAPI/doc/edu/ucsf/groups/data/GroupAttributesHandler.html GroupAttributesHandler] - an interface for classes that handle the node and edge attributes of groups.
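To make the division of labour between the three interfaces concrete, here is a minimal sketch in Java. Only the interface names come from this RFC. Every method signature, and the SketchGroup class, is an assumption made for illustration and is not taken from the linked API documentation.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: the interface names come from the RFC, but every
// method below is an assumption made for illustration only.
interface GroupModel {
    // Decides which member nodes are exposed for the current state.
    Set<String> visibleMembers(Set<String> members, boolean collapsed);
}

interface GroupViewer {
    // Turns a group and its visible members into some presentation.
    String render(String groupName, Set<String> visibleMembers);
}

interface GroupAttributesHandler {
    // Derives one group-level attribute value from the member values.
    double aggregate(List<Double> memberValues);
}

// Hypothetical group object that delegates to a model and a viewer.
final class SketchGroup {
    final String name;
    final Set<String> members = new LinkedHashSet<>();
    boolean collapsed = true;

    SketchGroup(String name) {
        this.name = name;
    }

    // The group itself holds no presentation logic.
    String present(GroupModel model, GroupViewer viewer) {
        return viewer.render(name, model.visibleMembers(members, collapsed));
    }
}
```

Keeping the three concerns behind separate interfaces means a different viewer (for example, a boxed view instead of a collapsed node) could be swapped in without touching the model or the attribute handling.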

This package must be in the core to provide direct, consistent access to the grouping API for the XGMML reader/writer, the Cytoscape Editor, and the Metanode Plugin (which will still be provided as one interface to grouping). An [http://www.cgl.ucsf.edu/Research/cytoscape/groupAPI/doc Overview] of the proposed API is available for comment.

There are three significant assumptions that underlie this proposal:

A Group exists in only one CyNetwork. This is really a matter of user expectations. Groups are different than CyNodes and CyEdges in that they have a visual state (grouped or ungrouped). Attempting to maintain different states in different CyNetworks that might have different collections of nodes and edges visible would be difficult. The API explicitly provides a method to perform a shallow copy of a group.
For the subnetwork given to create a group, all Edges and Nodes in this subnetwork are used in creating a Group.
This API will take advantage of events to inform it of the deletion of nodes and edges that it cares about. The groupAPI will also issue events to inform interested listeners of significant changes to the groups. These will be documented as part of the API.
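The three assumptions can be sketched together in a few lines of Java. All names here (EventedGroup, GroupChangeListener, nodeDeleted, shallowCopy) are hypothetical, since the actual events and methods are to be documented as part of the API. In particular, copying into a second network is only one guess at what the shallow copy is for.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical listener; the RFC promises events but does not name them here.
interface GroupChangeListener {
    void groupChanged(String groupName, String change);
}

final class EventedGroup {
    private final String name;
    private final String networkId; // assumption 1: exactly one network per group
    private final Set<String> nodes = new LinkedHashSet<>();
    private final List<GroupChangeListener> listeners = new ArrayList<>();

    EventedGroup(String name, String networkId, Set<String> subnetworkNodes) {
        this.name = name;
        this.networkId = networkId;
        this.nodes.addAll(subnetworkNodes); // assumption 2: every node of the subnetwork joins
    }

    void addListener(GroupChangeListener listener) {
        listeners.add(listener);
    }

    // Assumption 3: the network tells the group about deletions, and the
    // group in turn tells its own listeners about the resulting change.
    void nodeDeleted(String network, String nodeId) {
        if (!networkId.equals(network)) {
            return; // deletions in other networks do not concern this group
        }
        if (nodes.remove(nodeId)) {
            for (GroupChangeListener l : listeners) {
                l.groupChanged(name, "removed " + nodeId);
            }
        }
    }

    // Shallow copy: a new group object over the same node identifiers.
    // Listeners are deliberately not carried over.
    EventedGroup shallowCopy(String otherNetworkId) {
        return new EventedGroup(name, otherNetworkId, nodes);
    }

    Set<String> nodes() {
        return new LinkedHashSet<>(nodes);
    }
}
```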

Biological Questions / Use Cases

Each use case should be expanded in a separate page by the person (or group) designated in italics. Please use the Use Case template, which has the following elements:

Name of use case
1 paragraph summary
Step-by-step user action
Visual mockup & storyboard
Requirements-met & missing in existing Cytoscape implementation
Frequency of use/importance e.g. every time we analyze data X
Give examples in other programs, or papers

Due Date: November 15th

1. Clustering - Biomodules Gary

[:groupAPI/UseCase 1:Use Case 1]
Biological application: Group proteins in a graph of protein-protein interactions that have a collective function in the cell (http://www.genome.org/cgi/content/abstract/14/3/380) in order to discern higher levels of organization in the biological network. In this case, there might be overlap between two clusters or modules.
Group solution: A group of proteins can be visualized by a single node that has visual and topological characteristics that reflect the underlying group of proteins. For example, the size of the node is proportional to the number of proteins it represents, its connections to other proteins reflect connections from its inside proteins to other proteins, its color represents the average expression levels of its members for a certain condition, etc. See the image in http://labs.systemsbiology.net/galitski/projs/biomodules/index.html. Round nodes are metanodes.
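As a rough illustration of how such a metanode's visual and topological properties might be derived from its members, consider the Java sketch below. The class name, method names and the unitSize parameter are all assumptions. Node identifiers are plain strings and each edge is a two-element array, purely to keep the example self-contained.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class MetanodeVisuals {
    // Size proportional to member count; unitSize is an illustrative constant.
    static double size(int memberCount, double unitSize) {
        return unitSize * memberCount;
    }

    // Mean expression of the members for one condition.
    static double meanExpression(List<Double> memberValues) {
        if (memberValues.isEmpty()) {
            throw new IllegalArgumentException("group has no members");
        }
        double sum = 0.0;
        for (double v : memberValues) {
            sum += v;
        }
        return sum / memberValues.size();
    }

    // Outside nodes the metanode should connect to: any non-member that
    // shares an edge with a member. Edges between two members are ignored.
    static Set<String> externalNeighbors(Set<String> members, List<String[]> edges) {
        Set<String> result = new LinkedHashSet<>();
        for (String[] e : edges) {
            boolean firstIn = members.contains(e[0]);
            boolean secondIn = members.contains(e[1]);
            if (firstIn && !secondIn) {
                result.add(e[1]);
            }
            if (secondIn && !firstIn) {
                result.add(e[0]);
            }
        }
        return result;
    }
}
```

The mean would then be passed through whatever color mapping the viewer uses; that mapping is left out here.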

2. Protein Complexes - Pico/GenMAPP (note by Cline/Pasteur) GenMAPP

[:groupAPI/UseCase 2A:UseCase 2A]
Biological application: Group proteins in a pathway that are known to form complexes in order to simplify visualization and store known associations in the data model.
Group solution: Ideally, there could be two views of protein complexes. (1) A collapsed view, similar to that used in Biomodules above, but with a default size (not scaled by number of members) and a label that is unique to the metanode (i.e., PKA complex). (2) A stacked view, where all the children nodes are visible and simply stacked (like gene boxes in GenMAPP).
Extensions: Note that the solution should also fit for protein domains since the particular boundaries between protein domains in a single chain and between proteins in a complex is rather arbitrary, a matter of evolutionary fate. The solution should also extend to the grouping of paralogs and splice variants.
Further note (Cline/Pasteur): these extensions become especially interesting, now that there are high-throughput platforms to measure separate expression levels of genomic features. Biologically, the likelihood of a given interaction will depend on the isoforms produced in the cell, and whether or not the protein features involved in the interactions are expressed.
  - One Group implementation is for each child node to represent a different component of the gene or protein, such as an exon or a protein domain. Where the right data is available, interactions would be tied to the components involved in the interaction, much as is done now in the Domain Network plugin.
  - A second Group implementation would have each child node represent a protein isoform of the parent. Again, where the right data is available, interactions would be associated with the isoforms that can interact. Here, some thought should go into how to handle the edges. If one metanode in an interaction has N child nodes, representing N protein isoforms, and the second has M nodes representing M isoforms, having up to N*M different edges represents a lot of complexity.
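One possible way to tame the N*M edge count, offered only as a sketch and not as a decision of this RFC, is to collapse isoform-level edges into a single weighted edge per pair of parent metanodes. The class and method names below are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class IsoformEdges {
    // Upper bound on isoform-level edges between two metanodes.
    static int worstCaseEdges(int n, int m) {
        return Math.multiplyExact(n, m);
    }

    // Collapses isoform-level edges into one counted edge per parent pair.
    // parentOf maps an isoform id to its parent metanode id; each edge is
    // a two-element array of isoform ids.
    static Map<String, Integer> collapse(List<String[]> isoformEdges,
                                         Map<String, String> parentOf) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String[] e : isoformEdges) {
            String a = parentOf.get(e[0]);
            String b = parentOf.get(e[1]);
            if (a == null || b == null) {
                throw new IllegalArgumentException("isoform without a parent");
            }
            // Order the pair so that A-B and B-A share one key.
            String key = a.compareTo(b) <= 0 ? a + "--" + b : b + "--" + a;
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }
}
```

The count on the collapsed edge could drive edge width in the collapsed view, while the individual isoform edges remain available when the metanodes are expanded.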
Note: The stacked view is not merely a visualization problem that can be solved by having the ability to view different sections within a single node (coloring them differently, etc.). This is because we wish to have edges connected to each individual component of the stack. For this reason, this is indeed a biological application that can be solved using metanodes.

3. Intragenic Features - Pico/GenMAPP GenMAPP

[:groupAPI/UseCase 3A:UseCase 3A]
Biological application: Associate features such as exon structure, promoter regions and SNP positions with proteins in a pathway. These features are quickly becoming the preferred level of abstraction for microarray analysis and other high-throughput methods. We must be able to translate these massive datasets into biological context (i.e., pathways) in an efficient manner.
Group solution: By associating these feature-level nodes with a protein node in a parent-child relationship, we could efficiently map these data types to the biology at the pathway level. These associations might be best viewed as collapsed nodes colored by specified algorithms that consider the data type (e.g., a splicing analysis on all exon data mapped to the whole gene). And instead of expanding the node on the same network, perhaps we could restrict the expansion to a new network (like a small pop-up window) that displays the feature-level nodes and direct data mapping, e.g., expression level for each exon associated with the protein. See [attachment:HighResWindow.jpg attached image] for an illustration of the pop-up window.
Note: This type of metanode seems to be qualitatively different and might require separate terminology. Then, again, maybe not?
Note2: After a group discussion, it was decided that this biological application does not require a metanode solution. It can be solved by implementing a Cytoscape plugin that makes use of already existing Cytoscape functionality.

4. Boxing of groups.

An additional desired way of visualizing groups of nodes is to box them. The box itself is not a node; it is just an enclosing area for a group that can be dragged around while the nodes move with it. In this case, the metanodes can be used as a mechanism to group nodes, to keep track of these groupings, and to modify these groupings (removing or adding membership). But the metanode itself is not visualized. All of the biological applications above can be solved using this boxing visualization.

5. Alternate paralogs in pathways. GenMAPP

[:groupAPI/UseCase 5A:UseCase 5A]
Might have multiple nodes which perform the same function (sort of a logical "OR"). Would want to see these as a group.

6. Protein superfamily networks Scooter

Protein superfamilies can be represented as large networks where the nodes represent the proteins and the edges represent the relationships between them (defined by BLAST e-value, structural relationships (RMSD), etc.). These networks can contain thousands of nodes, but there is often a defined hierarchy: the superfamily contains several subgroups, which contain families, which contain proteins.
Group solution: The idea is to be able to group various levels of the hierarchy to present a simpler (more abstract) view, and allow the user to be able to "drill down" into the next level of the hierarchy to provide a more detailed view. One possible implementation is to implement all of the nodes contained within the group as a subnetwork (a normal CyNetwork). The user should be able to either "ungroup" the nodes (i.e. display all contained nodes as part of the current view) or be able to open the contained nodes (and edges) up into a new network (view).
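The drill-down behaviour described above can be sketched as a tree of nested groups, where the set of currently expanded groups determines what the view shows. This is a simplified illustration in Java: HierGroup and its methods are hypothetical names, and opening a group into a separate network view is not modelled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class HierGroup {
    final String name;
    final List<HierGroup> children = new ArrayList<>();

    HierGroup(String name) {
        this.name = name;
    }

    HierGroup add(HierGroup child) {
        children.add(child);
        return this;
    }

    // Names visible in the current view: a collapsed group shows as one
    // node, an expanded ("ungrouped") group is replaced by its children.
    List<String> visible(Set<String> expanded) {
        List<String> out = new ArrayList<>();
        collect(expanded, out);
        return out;
    }

    private void collect(Set<String> expanded, List<String> out) {
        if (children.isEmpty() || !expanded.contains(name)) {
            out.add(name);
            return;
        }
        for (HierGroup c : children) {
            c.collect(expanded, out);
        }
    }
}
```

With nothing expanded, only the superfamily node is shown. Adding a group's name to the expanded set drills down exactly one level below that group.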

7. Named list of genes. Piet & GenMAPP

[:groupAPI/UseCase 7A:UseCase 7A]
Similar to geWorkbench's idea of a "panel", which can hold an arbitrary group of nodes (e.g. a process, or the cytoplasm). Such named lists are used in GOMiner, MAPPFinder, etc. (e.g. Apoptosis).
Use case: Group all nodes belonging to a certain Gene Ontology category
- Annotate network with GO-category: cellcycle
- Annotate network with GO-category: apoptosis
- Group nodes belonging to one or both of the two categories
- Extend this to more categories
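The steps of this use case amount to a membership test over node annotations. A minimal Java sketch follows, assuming annotations are available as a map from node identifier to category names. CategoryGrouping and membersOf are hypothetical names, not part of any existing Cytoscape API.

```java
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

final class CategoryGrouping {
    // annotations maps a node id to the categories it is annotated with.
    // Returns the nodes annotated with at least one requested category.
    static Set<String> membersOf(Map<String, Set<String>> annotations,
                                 Collection<String> categories) {
        Set<String> group = new LinkedHashSet<>();
        for (Map.Entry<String, Set<String>> e : annotations.entrySet()) {
            for (String c : categories) {
                if (e.getValue().contains(c)) {
                    group.add(e.getKey());
                    break; // one matching category is enough
                }
            }
        }
        return group;
    }
}
```

Extending to more categories only requires passing a longer category collection. Because a node can match several categories, groups built per category may overlap.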

8. Black-box pathways. Ethan/Ben/Gary

[:groupAPI/UseCase8:Use Case 8]
Similar to #7 above, but includes connectivity between the group of nodes.

9. States of a protein/generics. For example, grouping together splice variants, PTMs, etc. Ethan/Ben/Gary

10. Groups of graphical elements that are not necessarily nodes or edges. GenMAPP

[:groupAPI/UseCase 10A:UseCase 10A]

11. General collapse/expand paradigm for reducing complexity by hiding Piet

Not necessarily any biological semantics
Use case: Collapse all nodes having edges with the same source nodes
- Biological networks tend to be scale free; a few hubs target a large number of genes. These networks are not very clear, and it is of interest to see which nodes have hubs in common
- A backbone network of hubs is visible immediately
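A sketch of how the "same source nodes" collapse could be computed is given below, in Java. It assumes directed edges given as {source, target} pairs and treats two targets as collapsible only when their source sets are identical. The names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class SharedSourceGrouping {
    // Each edge is {source, target}. Targets whose incoming edges come from
    // exactly the same set of sources end up in the same group.
    static Map<Set<String>, List<String>> groupBySources(List<String[]> edges) {
        // First pass: collect the full source set of every target.
        Map<String, Set<String>> sourcesOf = new LinkedHashMap<>();
        for (String[] e : edges) {
            sourcesOf.computeIfAbsent(e[1], k -> new TreeSet<>()).add(e[0]);
        }
        // Second pass: the finished source sets serve as group keys.
        Map<Set<String>, List<String>> groups = new LinkedHashMap<>();
        for (Map.Entry<String, Set<String>> e : sourcesOf.entrySet()) {
            groups.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
        }
        return groups;
    }
}
```

Each resulting group can be collapsed into one node, leaving the hubs and the collapsed groups as the backbone network.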

12. Topological grouping Piet

Hide "downstream" components. Similar to number 11, but selection, construction, and collapse of group would be automatic based on some topological value (e.g. node neighborhood, downstream nodes, etc.)
Use case: A researcher must decide which knockout cell line to create for a gene that participates in a number of pathways, and wants to see which genes are hypothetically expected to be affected and which are not
- Create large network from existing pathways
- Assign knockout gene(s)
- Create groups affected by / not affected by based on directionality of edges
- Explore by expanding / collapsing
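The "affected by / not affected by" split in the steps above can be sketched as a breadth-first traversal along edge direction. The Java below is an illustration under the assumption that "affected" means reachable downstream of a knockout gene. KnockoutGrouping and its methods are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class KnockoutGrouping {
    // Each edge is {source, target}. Returns every node reachable from a
    // knockout gene by following edge direction (knockouts excluded).
    static Set<String> affected(List<String[]> edges, Collection<String> knockouts) {
        Map<String, List<String>> targetsOf = new LinkedHashMap<>();
        for (String[] e : edges) {
            targetsOf.computeIfAbsent(e[0], k -> new ArrayList<>()).add(e[1]);
        }
        Set<String> seen = new LinkedHashSet<>(knockouts);
        Deque<String> queue = new ArrayDeque<>(knockouts);
        while (!queue.isEmpty()) {
            String node = queue.removeFirst();
            for (String next : targetsOf.getOrDefault(node, Collections.<String>emptyList())) {
                if (seen.add(next)) {
                    queue.addLast(next);
                }
            }
        }
        seen.removeAll(knockouts);
        return seen;
    }

    // The complementary group: nodes that are neither knocked out nor downstream.
    static Set<String> unaffected(List<String[]> edges, Collection<String> knockouts) {
        Set<String> all = new LinkedHashSet<>();
        for (String[] e : edges) {
            all.add(e[0]);
            all.add(e[1]);
        }
        all.removeAll(affected(edges, knockouts));
        all.removeAll(knockouts);
        return all;
    }
}
```

The two returned sets correspond to the two groups the user would then expand and collapse while exploring.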

13. Quick Find and Group Node Jim

Quick Find and group nodes should be modified so that nodes nested (and hidden) within a group node can be found by a search.
- See: ["groupAPI/QuickFindAndGroups"]
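One way such a search could work, sketched here in Java with hypothetical names, is a depth-first walk through nested groups that returns the chain of enclosing groups. The chain tells the caller which groups would need to be expanded to reveal the match.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class NestedNode {
    final String name;
    final List<NestedNode> children = new ArrayList<>();

    NestedNode(String name) {
        this.name = name;
    }

    NestedNode add(NestedNode child) {
        children.add(child);
        return this;
    }

    // Depth-first search for a node by name. Returns the chain of names from
    // this node down to the match, or an empty list when nothing matches.
    List<String> findPath(String target) {
        if (name.equals(target)) {
            List<String> path = new ArrayList<>();
            path.add(name);
            return path;
        }
        for (NestedNode c : children) {
            List<String> sub = c.findPath(target);
            if (!sub.isEmpty()) {
                sub.add(0, name); // prepend the enclosing group
                return sub;
            }
        }
        return Collections.emptyList();
    }
}
```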

Implementation Plan

[:groupAPI/Implementation Plan:Implementation Plan]

Comments

PageComment2

-  ← Revision 25 as of 2006-11-15 21:26:15 →
  Size: 13853
  Editor: GaryBader
  Comment:
+  ← Revision 26 as of 2006-11-15 21:30:19 →
  Size: 13887
  Editor: GaryBader
  Comment: