Differences between revisions 28 and 29
Revision 28 as of 2007-05-29 22:29:51
Size: 25031
Editor: 142
Comment:
Revision 29 as of 2009-02-12 01:04:12
Size: 25051
Editor: localhost
Comment: converted to 1.6 markup
Line 5: Line 5:
[[TableOfContents([2])]] <<TableOfContents([2])>>
Line 10: Line 10:
For details on RFCs in general, check out the [http://www.answers.com/main/ntquery?method=4&dsid=2222&dekey=Request+for+Comments&gwp=8&curtab=2222_1&linktext=Request%20for%20Comments Wikipedia Entry: Request for Comments (RFCs)] For details on RFCs in general, check out the [[http://www.answers.com/main/ntquery?method=4&dsid=2222&dekey=Request+for+Comments&gwp=8&curtab=2222_1&linktext=Request%20for%20Comments|Wikipedia Entry: Request for Comments (RFCs)]]
Line 17: Line 17:
To view/add comments, click on any of 'Comment' links below. By adding your ideas to the Wiki directly, we can more easily organize everyone's ideas, and keep clear records. Be sure to include today's date and your name for each comment. Here is an example to get things started: ["/Comment"]. To view/add comments, click on any of 'Comment' links below. By adding your ideas to the Wiki directly, we can more easily organize everyone's ideas, and keep clear records. Be sure to include today's date and your name for each comment. Here is an example to get things started: [[/Comment]].
Line 22: Line 22:
The goal is to provide a new package, e.g. cytoscape.groups, that suplants all direct calls to the giny metanodes methods, and extends the concept of metanodes in a structured manner. Similar to the MetaNodes implementation, there is a [http://www.cgl.ucsf.edu/Research/cytoscape/groupAPI/doc/edu/ucsf/groups/GroupManager.html GroupManager] class that should be the main interface for most developer. There are also three interfaces to allow a ''group'' to have different abstraction models and different visual properties. The three interfaces are:

 * [http://www.cgl.ucsf.edu/Research/cytoscape/groupAPI/doc/edu/ucsf/groups/model/GroupModel.html GroupModel] - provides an interface for classes that handle the model associated with presentation of grouped nodes,
 * [http://www.cgl.ucsf.edu/Research/cytoscape/groupAPI/doc/edu/ucsf/groups/view/GroupViewer.html GroupViewer] - an interface for classes that handle the actual presentation of groups, and
 * [http://www.cgl.ucsf.edu/Research/cytoscape/groupAPI/doc/edu/ucsf/groups/data/GroupAttributesHandler.html GroupAttributesHandler] - an interface for classes that handle the node and edge attributes of groups.
This package must be in the core to provide direct, consistent access to the grouping API for the XGMML reader/writer, the Cytoscape Editor, and the Metanode Plugin (which will still be provided as one interface to grouping). An [http://www.cgl.ucsf.edu/Research/cytoscape/groupAPI/doc Overview] of the proposed API is available for comment.
The goal is to provide a new package, e.g. cytoscape.groups, that suplants all direct calls to the giny metanodes methods, and extends the concept of metanodes in a structured manner. Similar to the MetaNodes implementation, there is a [[http://www.cgl.ucsf.edu/Research/cytoscape/groupAPI/doc/edu/ucsf/groups/GroupManager.html|GroupManager]] class that should be the main interface for most developer. There are also three interfaces to allow a ''group'' to have different abstraction models and different visual properties. The three interfaces are:

 * [[http://www.cgl.ucsf.edu/Research/cytoscape/groupAPI/doc/edu/ucsf/groups/model/GroupModel.html|GroupModel]] - provides an interface for classes that handle the model associated with presentation of grouped nodes,
 * [[http://www.cgl.ucsf.edu/Research/cytoscape/groupAPI/doc/edu/ucsf/groups/view/GroupViewer.html|GroupViewer]] - an interface for classes that handle the actual presentation of groups, and
 * [[http://www.cgl.ucsf.edu/Research/cytoscape/groupAPI/doc/edu/ucsf/groups/data/GroupAttributesHandler.html|GroupAttributesHandler]] - an interface for classes that handle the node and edge attributes of groups.
This package must be in the core to provide direct, consistent access to the grouping API for the XGMML reader/writer, the Cytoscape Editor, and the Metanode Plugin (which will still be provided as one interface to grouping). An [[http://www.cgl.ucsf.edu/Research/cytoscape/groupAPI/doc|Overview]] of the proposed API is available for comment.
Line 47: Line 47:
 * [:groupAPI/UseCase 1:Use Case 1]  * [[groupAPI/UseCase_1|Use Case 1]]
Line 52: Line 52:
 * [:groupAPI/UseCase 2A:UseCase 2A]  * [[groupAPI/UseCase_2A|UseCase 2A]]
Line 62: Line 62:
 * [:groupAPI/UseCase 3A:UseCase 3A]  * [[groupAPI/UseCase_3A|UseCase 3A]]
Line 73: Line 73:
 * [:groupAPI/UseCase 5A:UseCase 5A]  * [[groupAPI/UseCase_5A|UseCase 5A]]
Line 81: Line 81:
 * [:groupAPI/UseCase 7A:UseCase 7A]  * [[groupAPI/UseCase_7A|UseCase 7A]]
Line 90: Line 90:
 * [:groupAPI/UseCase8:Use Case 8]  * [[groupAPI/UseCase8|Use Case 8]]
Line 96: Line 96:
 * [:groupAPI/UseCase 10A:UseCase 10A]  * [[groupAPI/UseCase_10A|UseCase 10A]]
Line 116: Line 116:
  * See: ["groupAPI/QuickFindAndGroups"]   * See: [[groupAPI/QuickFindAndGroups]]
Line 118: Line 118:
 * [:groupAPI/Implementation Plan:Implementation Plan]  * [[groupAPI/Implementation_Plan|Implementation Plan]]

RFC Name : Grouping API

Editor(s): ScooterMorris


About this document

This is an official Request for Comment (RFC) for supporting groups in Cytoscape. This RFC encompasses and leverages the MetaNodes RFC (RFC 8) and the SimplifiedMetaNodeDataStructureRFC (RFC 9) by extending the notion of a metanode to a more general group concept.

For details on RFCs in general, check out the Wikipedia Entry: Request for Comments (RFCs)

Status

Open for public comment

How to Comment

To view/add comments, click on any of the 'Comment' links below. By adding your ideas to the Wiki directly, we can more easily organize everyone's ideas and keep clear records. Be sure to include today's date and your name for each comment. Here is an example to get things started: /Comment.

Try to keep your comments as concrete and constructive as possible. For example, if you find a part of the RFC makes no sense, please say so, but don't stop there. Take the extra step and propose alternatives.

Proposal

The goal is to provide a new package, e.g. cytoscape.groups, that supplants all direct calls to the giny metanodes methods and extends the concept of metanodes in a structured manner. Similar to the MetaNodes implementation, there is a GroupManager class that should be the main interface for most developers. There are also three interfaces to allow a group to have different abstraction models and different visual properties. The three interfaces are:

  • GroupModel - provides an interface for classes that handle the model associated with presentation of grouped nodes,

  • GroupViewer - an interface for classes that handle the actual presentation of groups, and

  • GroupAttributesHandler - an interface for classes that handle the node and edge attributes of groups.

This package must be in the core to provide direct, consistent access to the grouping API for the XGMML reader/writer, the Cytoscape Editor, and the Metanode Plugin (which will still be provided as one interface to grouping). An Overview of the proposed API is available for comment.
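To make the division of labour concrete, here is a minimal sketch of how a manager class might delegate to the three interfaces. The interface names are taken from the proposal above, but every method signature, the GroupManagerSketch class, and the use of plain strings for node identifiers are assumptions made for illustration; the linked API overview is the authority on the real signatures.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Interface names come from the RFC; every method below is an assumption.
interface GroupModel {
    void groupCreated(String groupName, List<String> members);
}

interface GroupViewer {
    String render(String groupName, List<String> members);
}

interface GroupAttributesHandler {
    double aggregate(List<Double> memberValues);
}

// Hypothetical single entry point that delegates to the three pluggable pieces.
class GroupManagerSketch {
    private final Map<String, List<String>> groups = new LinkedHashMap<>();
    private final GroupModel model;
    private final GroupViewer viewer;
    private final GroupAttributesHandler attributes;

    GroupManagerSketch(GroupModel model, GroupViewer viewer, GroupAttributesHandler attributes) {
        this.model = model;
        this.viewer = viewer;
        this.attributes = attributes;
    }

    void createGroup(String name, List<String> members) {
        groups.put(name, new ArrayList<>(members));
        model.groupCreated(name, members);
    }

    String show(String name) {
        return viewer.render(name, groups.get(name));
    }

    // Collects the members' values and lets the handler derive a group value.
    double groupValue(String name, Map<String, Double> nodeValues) {
        List<Double> values = new ArrayList<>();
        for (String member : groups.get(name)) {
            Double value = nodeValues.get(member);
            if (value != null) values.add(value);
        }
        return attributes.aggregate(values);
    }

    public static void main(String[] args) {
        GroupManagerSketch manager = new GroupManagerSketch(
            (name, members) -> { },
            (name, members) -> name + " [" + members.size() + " members]",
            values -> values.size());
        manager.createGroup("complex", Arrays.asList("nodeA", "nodeB"));
        System.out.println(manager.show("complex"));
    }
}
```

Keeping the model, viewer, and attribute handling behind separate interfaces means a plugin could replace one of them (for example, a boxed view instead of a collapsed node) without touching the other two.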

There are three significant assumptions that underlie this proposal:

  1. A Group exists in only one CyNetwork. This is really a matter of user expectations. Groups are different than CyNodes and CyEdges in that they have a visual state (grouped or ungrouped). Attempting to maintain different states in different CyNetworks that might have different collections of nodes and edges visible would be difficult. The API explicitly provides a method to perform a shallow copy of a group.

  2. For the subnetwork given to create a group, all Edges and Nodes in this subnetwork are used in creating a Group.

  3. This API will take advantage of events to inform it of the deletion of nodes and edges that it cares about. The groupAPI will also issue events to inform interested listeners of significant changes to the groups. These will be documented as part of the API.
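Assumption 3 can be sketched as a pair of listener interfaces: one through which the group code hears about deletions, and one through which it announces group changes. The listener types, the NetworkSketch stand-in, and all method names here are hypothetical, since the RFC states that the actual events will be documented as part of the API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical listener types; the RFC does not name the event classes.
interface NodeDeletionListener {
    void nodeDeleted(String nodeId);
}

interface GroupChangeListener {
    void groupChanged(String groupName, String removedNodeId);
}

// Stand-in for the network: it only knows about deletion listeners.
class NetworkSketch {
    private final List<NodeDeletionListener> listeners = new ArrayList<>();

    void addListener(NodeDeletionListener listener) { listeners.add(listener); }

    void deleteNode(String nodeId) {
        for (NodeDeletionListener listener : listeners) listener.nodeDeleted(nodeId);
    }
}

// The group registry reacts to deletions and re-broadcasts group changes.
class GroupRegistrySketch implements NodeDeletionListener {
    private final Map<String, Set<String>> groups = new LinkedHashMap<>();
    private final List<GroupChangeListener> listeners = new ArrayList<>();

    void createGroup(String name, Set<String> members) {
        groups.put(name, new LinkedHashSet<>(members));
    }

    void addGroupChangeListener(GroupChangeListener listener) { listeners.add(listener); }

    Set<String> members(String name) { return groups.get(name); }

    @Override
    public void nodeDeleted(String nodeId) {
        for (Map.Entry<String, Set<String>> entry : groups.entrySet()) {
            // remove() returns true only if the node was a member of this group
            if (entry.getValue().remove(nodeId)) {
                for (GroupChangeListener listener : listeners) {
                    listener.groupChanged(entry.getKey(), nodeId);
                }
            }
        }
    }

    public static void main(String[] args) {
        NetworkSketch network = new NetworkSketch();
        GroupRegistrySketch registry = new GroupRegistrySketch();
        network.addListener(registry);
        registry.createGroup("g1", new LinkedHashSet<>(Arrays.asList("a", "b")));
        registry.addGroupChangeListener((group, node) ->
            System.out.println(group + " lost " + node));
        network.deleteNode("a");
    }
}
```

In this arrangement the code that deletes a node never references the group machinery directly; it only fires its usual event.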

Biological Questions / Use Cases

Each use case should be expanded in a separate page by the person (or group) designated in italics. Please use the Use Case template, which has the following elements:

  • Name of use case
  • 1 paragraph summary
  • Step-by-step user action
  • Visual mockup & storyboard

  • Requirements-met & missing in existing Cytoscape implementation

  • Frequency of use/importance e.g. every time we analyze data X
  • Give examples in other programs, or papers

Due Date: November 15th

1. Clustering - Biomodules Gary

  • Use Case 1

  • Biological application: Group proteins in a graph of protein-protein interactions that have a collective function in the cell (http://www.genome.org/cgi/content/abstract/14/3/380) in order to discern higher levels of organization in the biological network. In this case, there might be overlap between two clusters or modules.

  • Group solution: A group of proteins can be visualized by a single node that has visual and topological characteristics that reflect the underlying group of proteins. For example, the size of the node is proportional to the number of proteins it represents, its connections to other proteins reflect connections from its inside proteins to other proteins, its color represents the average expression levels of its members for a certain condition, etc. See the image in http://labs.systemsbiology.net/galitski/projs/biomodules/index.html. Round nodes are metanodes.
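As a rough illustration of the visual mapping described in use case 1, the sketch below derives a metanode's size from its member count and a colour-driving value from the average expression of its members. The class name, the BASE_SIZE constant, and the choice of a plain arithmetic mean are assumptions for illustration only.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: derives metanode visual values from its members.
class MetanodeAppearanceSketch {
    static final double BASE_SIZE = 10.0; // assumed size contributed by each member

    // Size grows in proportion to the number of proteins represented.
    static double size(int memberCount) {
        return BASE_SIZE * memberCount;
    }

    // Arithmetic mean of member expression levels; 0.0 for an empty group.
    static double averageExpression(List<Double> values) {
        if (values.isEmpty()) return 0.0;
        double sum = 0.0;
        for (double value : values) sum += value;
        return sum / values.size();
    }

    public static void main(String[] args) {
        List<Double> expression = Arrays.asList(1.0, 2.0, 3.0);
        System.out.println("size = " + size(expression.size()));
        System.out.println("mean expression = " + averageExpression(expression));
    }
}
```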

2. Protein Complexes - Pico/GenMAPP (note by Cline/Pasteur) GenMAPP

  • UseCase 2A

  • Biological application: Group proteins in a pathway that are known to form complexes in order to simplify visualization and store known associations in the data model.
  • Group solution: Ideally, there could be two views of protein complexes. (1) A collapsed view, similar to that used in Biomodules above, but with a default size (not scaled by number of members) and a label that is unique to the metanode (i.e., PKA complex). (2) A stacked view, where all the children nodes are visible and simply stacked (like gene boxes in GenMAPP).

  • Extensions: Note that the solution should also fit for protein domains since the particular boundaries between protein domains in a single chain and between proteins in a complex is rather arbitrary, a matter of evolutionary fate. The solution should also extend to the grouping of paralogs and splice variants.
  • Further note (Cline/Pasteur): these extensions become especially interesting, now that there are high-throughput platforms to measure separate expression levels of genomic features. Biologically, the likelihood of a given interaction will depend on the isoforms produced in the cell, and whether or not the protein features involved in the interactions are expressed.
    • One Group implementation is for each child node to represent a different component of the gene or protein: an exon or a protein domain. Where the right data is available, interactions would be tied to the components involved in the interaction, much as is done now in the Domain Network plugin.

    • A second Group implementation would have each child node represent a protein isoform of the parent. Again, where the right data is available, interactions would be associated with the isoforms that can interact. Here, some thought should go into how to handle the edges. If one metanode in an interaction has N child nodes, representing N protein isoforms, and the second has M nodes representing M isoforms, having up to N*M different edges represents a lot of complexity.

  • Note: The stacked view is not merely a visualization problem that can be solved by having the ability of viewing different sections within a single node (coloring them differently, etc). This is because we wish to have edges connected to each individual component of the stack. Because of this reason, this is indeed a biological application that can be solved using metanodes.
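One possible way to tame the N*M edge problem raised above is to collapse child-level edges into a single counted edge per pair of parent groups when the groups are shown collapsed. This is only a sketch of that idea, not something the RFC prescribes; the class name, the string identifiers, and the "A--B" key format are all assumptions.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: collapses child-level edges into counted group-level edges.
class MetaEdgeSketch {
    /**
     * edges holds {source, target} pairs between child nodes; parentOf maps a
     * child to its group. Nodes without a parent stand for themselves.
     */
    static Map<String, Integer> collapse(String[][] edges, Map<String, String> parentOf) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String[] edge : edges) {
            String a = parentOf.getOrDefault(edge[0], edge[0]);
            String b = parentOf.getOrDefault(edge[1], edge[1]);
            if (a.equals(b)) continue; // edge is internal to one group
            // Order the endpoints so that A--B and B--A share one key.
            String key = a.compareTo(b) < 0 ? a + "--" + b : b + "--" + a;
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, String> parentOf = new HashMap<>();
        parentOf.put("A1", "A");
        parentOf.put("A2", "A");
        parentOf.put("B1", "B");
        parentOf.put("B2", "B");
        String[][] edges = { {"A1", "B1"}, {"A1", "B2"}, {"A2", "B1"}, {"A1", "A2"} };
        System.out.println(collapse(edges, parentOf));
    }
}
```

The count could drive edge width or a label, so the collapsed view still hints at how many isoform-level interactions lie underneath.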

3. Intragenic Features - Pico/GenMAPP GenMAPP

  • UseCase 3A

  • Biological application: Associate features such as exon structure, promoter regions and SNP positions with proteins in a pathway. These features are quickly becoming the preferred level of abstraction for microarray analysis and other high-throughput methods. We must be able to translate these massive datasets into biological context (i.e., pathways) in an efficient manner.
  • Group solution: By associating these feature-level nodes with a protein node in a parent-child relationship, we could efficiently map these data types to the biology at the pathway level. These associations might be best viewed as collapsed nodes colored by specified algorithms that consider the data type (e.g., a splicing analysis on all exon data mapped to the whole gene). And instead of expanding the node on the same network, perhaps we could restrict the expansion to a new network (like a small pop up window) that displays the feature-level nodes and direct data mapping, e.g., expression level for each exon associated with the protein.
  • Note: This type of metanode seems to be qualitatively different and might require separate terminology. Then, again, maybe not?
  • Note2: After a group discussion, it was decided that this biological application does not require a metanode solution. It can be solved by implementing a Cytoscape plugin that makes use of already existing Cytoscape functionality.

4. Boxing of groups.

  • An additional desired way of visualizing groups of nodes is to box them. The box itself is not a node; it is just an enclosing area for a group that can be dragged around while the nodes move with it. In this case, the metanodes can be used as a mechanism to group nodes, to keep track of these groupings, and to modify these groupings (removing or adding membership). But the metanode itself is not visualized. All of the biological applications above can be solved using this boxing visualization.

5. Alternate paralogs in pathways. GenMAPP

  • UseCase 5A

  • Might have multiple nodes which perform the same function (sort of a logical "OR"). Would want to see these as a group.

6. Protein superfamily networks Scooter

  • Protein superfamilies can be represented as large networks where the nodes represent the proteins, and the edges represent the relationship (defined by BLAST e-value, structural relationships (RMSD), etc.). These networks can contain 1,000s of nodes, but there is often a defined hierarchy -- the superfamily contains several subgroups, which contain families, which contain proteins.
  • Group solution: The idea is to be able to group various levels of the hierarchy to present a simpler (more abstract) view, and allow the user to be able to "drill down" into the next level of the hierarchy to provide a more detailed view. One possible implementation is to implement all of the nodes contained within the group as a subnetwork (a normal CyNetwork). The user should be able to either "ungroup" the nodes (i.e. display all contained nodes as part of the current view) or be able to open the contained nodes (and edges) up into a new network (view).

7. Named list of genes. Piet & GenMAPP

  • UseCase 7A

  • Similar to geWorkbench's idea of a "panel", which can have an arbitrary group of nodes e.g. process, cytoplasm. As used in GOMiner, MAPPFinder, etc. e.g. Apoptosis
  • Use case: Group all nodes belonging to a certain Gene Ontology category

    • Annotate network with GO-category: cellcycle
    • Annotate network with GO-category: apoptosis
    • Group nodes belonging to one or both of the two categories
    • Extend this to more categories
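The steps in use case 7 amount to inverting a node-to-category annotation and then taking unions over the chosen categories. A minimal sketch, assuming annotations are available as a map from node name to a set of category names (the class and method names are hypothetical):

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper: builds named groups from per-node category annotations.
class CategoryGroupingSketch {
    // Inverts node -> categories into category -> nodes.
    static Map<String, Set<String>> byCategory(Map<String, Set<String>> annotations) {
        Map<String, Set<String>> groups = new TreeMap<>();
        for (Map.Entry<String, Set<String>> entry : annotations.entrySet()) {
            for (String category : entry.getValue()) {
                groups.computeIfAbsent(category, k -> new TreeSet<>()).add(entry.getKey());
            }
        }
        return groups;
    }

    // Nodes belonging to at least one of the requested categories.
    static Set<String> inAny(Map<String, Set<String>> groups, String... categories) {
        Set<String> result = new TreeSet<>();
        for (String category : categories) {
            result.addAll(groups.getOrDefault(category, Collections.<String>emptySet()));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> annotations = new HashMap<>();
        annotations.put("n1", new HashSet<>(Arrays.asList("cellcycle")));
        annotations.put("n2", new HashSet<>(Arrays.asList("apoptosis")));
        annotations.put("n3", new HashSet<>(Arrays.asList("cellcycle", "apoptosis")));
        Map<String, Set<String>> groups = byCategory(annotations);
        System.out.println(groups);
        System.out.println(inAny(groups, "cellcycle", "apoptosis"));
    }
}
```

Because inAny takes a variable number of category names, extending the grouping to more categories needs no further code.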

8. Black-box pathways. Ethan/Ben/Gary

  • Use Case 8

  • Similar to #7 above, but includes connectivity between the group of nodes.

9. States of a protein/generics. For example, grouping together splice variants, PTMs, etc. Ethan/Ben/Gary

10. Groups of graphical elements that are not necessarily nodes or edges. GenMAPP

11. General collapse/expand paradigm for reducing complexity by hiding Piet

  • Not necessarily any biological semantics
  • Use case: Collapse all nodes having edges with the same source nodes

    • Biological networks tend to be scale free; a few hubs target a large number of genes. These networks are not very clear, and it is of interest to see which nodes have hubs in common.
    • A backbone network of hubs is visible immediately
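The "same source nodes" collapse in use case 11 can be sketched as a two-pass grouping: first collect the set of sources for each target, then bucket targets by that set. The class name and the representation of edges as {source, target} string pairs are assumptions.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper: groups target nodes that share exactly the same sources.
class SharedSourceGroupingSketch {
    /** edges holds {source, target} pairs; the result maps a source set to its targets. */
    static Map<Set<String>, Set<String>> groupBySources(String[][] edges) {
        // Pass 1: collect the full source set of every target.
        Map<String, Set<String>> sourcesOf = new TreeMap<>();
        for (String[] edge : edges) {
            sourcesOf.computeIfAbsent(edge[1], k -> new TreeSet<>()).add(edge[0]);
        }
        // Pass 2: bucket targets by source set (the sets are no longer mutated).
        Map<Set<String>, Set<String>> groups = new HashMap<>();
        for (Map.Entry<String, Set<String>> entry : sourcesOf.entrySet()) {
            groups.computeIfAbsent(entry.getValue(), k -> new TreeSet<>()).add(entry.getKey());
        }
        return groups;
    }

    public static void main(String[] args) {
        String[][] edges = {
            {"hub1", "a"}, {"hub1", "b"}, {"hub2", "b"}, {"hub1", "c"}, {"hub2", "d"}
        };
        System.out.println(groupBySources(edges));
    }
}
```

Each bucket with more than one target is a candidate group; collapsing those buckets leaves the hubs and their shared-target groups as the visible backbone.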

12. Topological grouping Piet

  • Hide "downstream" components. Similar to number 11, but selection, construction, and collapse of group would be automatic based on some topological value (e.g. node neighborhood, downstream nodes, etc.)
  • Use case: A decision must be made about which knockout cell line to create for a gene that participates in a number of pathways: which genes are hypothetically expected to be affected, and which are not

    • Create large network from existing pathways
    • Assign knockout gene(s)
    • Create groups affected by / not affected by based on directionality of edges
    • Explore by expanding / collapsing
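The "affected by" group in use case 12 could be computed with a breadth-first traversal that follows edge direction from the knockout gene. This is a sketch under the assumption that the network is available as an adjacency map from each node to its direct targets; the class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: finds all nodes reachable downstream of a knockout gene.
class DownstreamSketch {
    static Set<String> downstream(Map<String, List<String>> targetsOf, String knockout) {
        Set<String> affected = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(knockout);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            List<String> targets = targetsOf.getOrDefault(current, Collections.<String>emptyList());
            for (String next : targets) {
                // add() returns false for already-visited nodes, which stops cycles
                if (!next.equals(knockout) && affected.add(next)) queue.add(next);
            }
        }
        return affected;
    }

    public static void main(String[] args) {
        Map<String, List<String>> targetsOf = new HashMap<>();
        targetsOf.put("g1", Arrays.asList("g2", "g3"));
        targetsOf.put("g2", Arrays.asList("g4"));
        targetsOf.put("g5", Arrays.asList("g1"));
        System.out.println("affected by g1: " + downstream(targetsOf, "g1"));
    }
}
```

Every node outside the returned set (other than the knockout itself) would fall into the "not affected by" group.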

13. Quick Find and Group Node Jim

  • Quick find and group nodes should be modified to allow nodes nested (and hidden) within a group node to be searched for.
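Searching for nodes hidden inside groups is essentially a recursive descent through nested membership. The sketch below returns the chain of enclosing groups so a caller could expand them in order; the GroupNodeSketch class is a hypothetical stand-in and does not reflect how Quick Find indexes nodes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a node that may itself contain nested nodes.
class GroupNodeSketch {
    final String name;
    final List<GroupNodeSketch> children = new ArrayList<>();

    GroupNodeSketch(String name, GroupNodeSketch... kids) {
        this.name = name;
        children.addAll(Arrays.asList(kids));
    }

    /** Returns the path from this node down to the match, or an empty list if absent. */
    List<String> find(String target) {
        if (name.equals(target)) {
            List<String> path = new ArrayList<>();
            path.add(name);
            return path;
        }
        for (GroupNodeSketch child : children) {
            List<String> path = child.find(target);
            if (!path.isEmpty()) {
                path.add(0, name); // prepend the enclosing group
                return path;
            }
        }
        return new ArrayList<>();
    }

    public static void main(String[] args) {
        GroupNodeSketch root = new GroupNodeSketch("superfamily",
            new GroupNodeSketch("family", new GroupNodeSketch("proteinX")),
            new GroupNodeSketch("other"));
        System.out.println(root.find("proteinX"));
    }
}
```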

Implementation Plan

Comments

MichaelCreech 2006-09-08 08:10:43 Assumptions A few assumptions you might want to explicitly state: 1) A Group exists in only one CyNetwork

  • a) Corollary: The Group's identifying node exists in only one CyNetwork (as far as the Group machinery is concerned).

2) For the subnetwork given to create a group, all Edges and Nodes in this subnetwork are used in creating a Group.

Issues 1) The current model requires all Cytoscape-based code to explicitly tell GroupUtils when a node or edge is added or deleted (through GroupUtils.deleteEdgeNotify() and deleteNodeNotify()) in order to keep the Group consistent with its underlying Cytoscape structure.

  • I think this is a major problem for several reasons:

    a) It requires all core code and plugins to change their code to call these GroupUtils methods.

    b) It also forces the Group API implementation to be part of the core rather than a plugin; otherwise these calls would not work if the plugin were not loaded.

    c) It leads to strong coupling between the various components that make up Cytoscape, since all the components that change Nodes and Edges must now know about and explicitly reference GroupUtils.

    The alternate approach is for Groups to use event handling and track when nodes and edges are added or deleted. There may be performance and other issues with the current event handling implementation that must be fixed before this can be used as a solution.

2) Should only CyNode, CyEdge, and CyNetwork be referenced or should their underlying interfaces be referenced, such as Node, Edge, and GraphPerspective? CyNode and CyEdge are also implementation classes, not interfaces.

  • Unless there are specific methods used in CyNode, CyEdge, and CyNetwork that aren't a part of their underlying interfaces, the API would be more flexible referencing the underlying interfaces--with the *big* caveat that the underlying interfaces aren't being removed.

Suggestions 1) Based on assumption one, add a GroupUtils.getCyNetwork (CyNode group_node).

  • This would return the CyNetwork to which a given group node belongs.

2) Remove CyNetwork parameter to all operations where it is not needed.

3) Change implementation-specific parameters to more general interfaces.

  • Example: ArrayList is used as a parameter to several operations (GroupAttributesHandler.setAttributes()) versus a List, Collection, or Iterator parameter type.

4) Use only standard java data structures.

5) Clarify mutability of List return values or change to Iterator.

  • Example: GroupUtils.getGroupMembers(). Such methods should clearly state if the List returned is safely modifiable. Another approach is to return an immutable Iterator, which leaves more flexibility for the implementation.

6) Fix inconsistencies in GroupAbstractionModel:

  • a) addEdgeNotify() states: "Inform the group model abstraction that an edge has been added." However, GroupAbstractionModel.addNodeNotify() states: "Inform the group model abstraction that an edge has been deleted."

  • b) There is a deleteEdgeNotify() but no deleteNodeNotify().

7) Possibly remove duplicate methods that perform the same operation for nodes and edges.

  • For example, there is a GroupAbstractionModel.addEdgeNotify() and GroupAbstractionModel.addNodeNotify(). Why not have one addGraphObjectNotify()? I know why this is the case-it's been done this way in the past. However, there is an abstraction in Giny of a GraphObject, of which both a Node and Edge are extensions. Thus, you could have an addGraphObjectNotify (GraphObject obj).

8) Change the name GroupUtils to something less misleading (maybe Group or GroupManager)?

  • Usually 'utils' implies lower-level miscellaneous utility operations for some API versus the main top-level mechanism of interaction with the API.

9) Change GroupUtils.getGroupMembers() to return the CyNetwork that is the sub-network representing the Group versus returning a List of Nodes.

  • This would be useful because it also gives what edges are in the Group. Otherwise, you probably need two different operations: getGroupNodes() and getGroupEdges(), or change getGroupMembers() to return a heterogeneous List of Nodes and Edges.

10) Allow null network parameter to GroupUtils.getGroupNodes().

  • When the network is null, this would mean to return *all* groups (across all CyNetworks) to which the given member belongs.

11) Add a GroupUtils.getSubGroups (CyNode groupNode)

  • This would return a Collection or Iterator of all the subgroups contained within a Group.

12) Drop 'Abstraction' from GroupAbstractionModel and GroupAbstractionViewer

13) Add GroupUtils.getGroupModel() and getGroupAttributesHandler()

  • If a group is created without specifying a specific handler and model, there is no way to get at the handler and model.

14) Change setGroupAbstraction() to setGroupAbstractionModel()

  • Or setGroupModel() if suggestion 12 is used.

15) Add GroupUtils.setGroupAttributesHandler()

  • This is then consistent with the existing setGroupAbstraction(Model).
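Suggestions 5 and 7 above can be illustrated together. The sketch uses a single notification method for nodes and edges via a shared supertype, and returns members as a read-only view so that mutability is unambiguous. The stand-in types are hypothetical and deliberately do not use the real giny interfaces; only the method names addGraphObjectNotify() and getGroupMembers() are taken from the suggestions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Stand-ins for the giny types; only the shared supertype matters here.
interface GraphObjectSketch {
    String id();
}

class NodeSketch implements GraphObjectSketch {
    private final String id;
    NodeSketch(String id) { this.id = id; }
    public String id() { return id; }
}

class EdgeSketch implements GraphObjectSketch {
    private final String id;
    EdgeSketch(String id) { this.id = id; }
    public String id() { return id; }
}

class GroupMembersSketch {
    private final List<GraphObjectSketch> members = new ArrayList<>();

    // Suggestion 7: one notification covers nodes and edges alike.
    void addGraphObjectNotify(GraphObjectSketch obj) {
        members.add(obj);
    }

    // Suggestion 5: callers receive a read-only view of the member list.
    List<GraphObjectSketch> getGroupMembers() {
        return Collections.unmodifiableList(members);
    }

    public static void main(String[] args) {
        GroupMembersSketch group = new GroupMembersSketch();
        group.addGraphObjectNotify(new NodeSketch("n1"));
        group.addGraphObjectNotify(new EdgeSketch("e1"));
        for (GraphObjectSketch member : group.getGroupMembers()) {
            System.out.println(member.id());
        }
    }
}
```

Attempting to add to or remove from the returned list throws UnsupportedOperationException, which makes the contract explicit without requiring a defensive copy on every call.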


MichaelCreech 2006-09-08 08:25:11

16) Need to be able to Add and Delete from a Group

  • Might add new API functions for these operations or tell users to directly modify the sub-group, if you catch add/remove events.

17) Need a way to copy a group


MichaelCreech 2006-09-08 14:56:27 Another suggestion:

18) Change variable references in the API to method references.

  • There are a few variable references in the API, namely GroupUtils.defaultModel, GroupUtils.defaultAttHandler, and GroupAttributesHandler.DEFAULT_NODE_LABEL_ATTRIBUTE. It would be better to access these *only* through equivalently named methods, thus allowing more flexibility and better encapsulation in the implementation. Possible methods are GroupUtils.getDefaultModel(), GroupUtils.getDefaultAttHandler(), and GroupAttributesHandler.getDefaultNodeLabelAttribute().
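A small sketch of what suggestion 18 buys: once the default is reached through a method, the implementation is free to create it lazily (or swap it) without callers noticing. The accessor name getDefaultModel() comes from the suggestion; the surrounding class, the placeholder model type, and the lazy-creation strategy are assumptions.

```java
// Placeholder for whatever the default model class turns out to be.
class DefaultModelSketch {
}

// Hypothetical illustration of accessor methods replacing public fields.
class GroupUtilsSketch {
    // Private, so the representation can change without breaking callers.
    private static DefaultModelSketch defaultModel;

    // Created on first use; a public field could not defer this work.
    static synchronized DefaultModelSketch getDefaultModel() {
        if (defaultModel == null) {
            defaultModel = new DefaultModelSketch();
        }
        return defaultModel;
    }

    public static void main(String[] args) {
        DefaultModelSketch first = getDefaultModel();
        DefaultModelSketch second = getDefaultModel();
        System.out.println("same instance: " + (first == second));
    }
}
```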


MikeSmoot 2006-10-03 16:34:26 Assumptions:

#1 I'm not sure about this. If the nodes in a group exist in more than one network, why wouldn't the group?

Issues:

#1 I wholly agree. We're going to need decent event handling to support undo as well, so I think this would be time well invested.

#2 Not sure about this one. Some people argue that it's better to use CyNode, CyEdge, etc. because they are specific to cytoscape whereas giny Nodes and Edges are not. If we were to ever move away from giny, the use of Nodes and Edges as opposed to CyNodes and CyEdges would complicate things. That said, I think it's highly unlikely we'd ever do anything other than subsume giny into cytoscape. If others agree, then we should definitely code to the interface.

Suggestions:

#1,2 Not sure about assumption 1, so I'm not sure if this is a good idea or not.

#3 Absolutely.

#4 Maybe. It depends on how the structures are being used and what kind of performance is needed. Since all of the core data structures use ints, it might be better to use the non-standard int data structures rather than the standard Integer ones. Even if we change the group api, these int specific data structures are used elsewhere.

#5 Agreed.

#6 Agreed.

#7 Yeah, but see my hesitation on Issue #2. If we agree that we can use giny, then I agree with this change.

#8 Absolutely.

#9 Agreed. I would prefer that we return a CyNetwork rather than a heterogeneous list of Nodes and Edges.

#10 Disagree. In general, I don't like methods that have "hidden" behavior that isn't apparent from the interface. I'd prefer two methods getAllGroups(), and getGroups(CyNetwork).

#11 Ok.

#12 Absolutely.

#13 Ok.

#14 See #12.

#15 Ok.

#16 Ok.

#17 Ok.

#18 Agreed.
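To illustrate the response to suggestion #10 above, here is a minimal sketch of the two-method alternative, in which a null argument is rejected rather than silently meaning "all networks". The method names getAllGroups() and getGroups() come from that response; the class, the use of strings for networks and groups, and the choice of exception are assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical lookup class: explicit methods instead of a null "wildcard".
class GroupLookupSketch {
    private final Map<String, List<String>> groupsByNetwork = new LinkedHashMap<>();

    void addGroup(String network, String group) {
        groupsByNetwork.computeIfAbsent(network, k -> new ArrayList<>()).add(group);
    }

    // Groups in one network; null is rejected so the behavior is never hidden.
    List<String> getGroups(String network) {
        if (network == null) {
            throw new IllegalArgumentException("network must not be null; use getAllGroups()");
        }
        return new ArrayList<>(
            groupsByNetwork.getOrDefault(network, Collections.<String>emptyList()));
    }

    // Groups across every network, in insertion order.
    List<String> getAllGroups() {
        List<String> all = new ArrayList<>();
        for (List<String> groups : groupsByNetwork.values()) {
            all.addAll(groups);
        }
        return all;
    }

    public static void main(String[] args) {
        GroupLookupSketch lookup = new GroupLookupSketch();
        lookup.addGroup("net1", "groupA");
        lookup.addGroup("net2", "groupB");
        System.out.println(lookup.getGroups("net1"));
        System.out.println(lookup.getAllGroups());
    }
}
```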


MichaelCreech 2006-10-04 06:48:55 Suggestion:

19) Separate user event handling from the GroupAbstractionModel.

  • It looks like the only way to get event information about changes to a Group is to create or extend the GroupAbstractionModel and override the appropriate XXXnotify() methods (e.g., addEdgeNotify()). Getting information about changes to a group should be independent of the need to create a new Group Model.


AdityaVailaya 2006-10-04 10:49:32 Assumptions #1 It is my understanding that the GenMAPP group prefers not to have a group automatically created across networks, but to have a group be local to a particular network. Further, it was suggested at the mini-retreat (early September in San Francisco) that a copy (note: no sharing) of a group would be made if it is to be reused in another network. The "copy" allows for different properties to be attached to a group in different networks.


ScooterMorris 2006-10-04 11:44:12 Regarding assumption #1: the issue is one of user expectations. If a user is looking at a network that has been created from another network, and they create a group, would they expect to see the nodes in the other original network group? Our conclusion was that they would not. In addition, since the collection of non-grouped nodes might be different in the two networks, the implementation would be more difficult. That being said, we also talked about the user expectations of creating a new network from a group of nodes in an existing network, when that group of nodes includes a groupNode. In that case, we felt that the right thing would be to make a copy of the group (see Aditya's comment above).


MikeSmoot 2006-11-17 12:12:45 I know we debated this for hours at the retreat, but I don't recall that we ever agreed on Assumption #1.

In discussing this, I also think we need to be careful about distinguishing a group model from a group view.

If the nodes and edges contained in a group are represented by a CyNetwork, then that CyNetwork will be just like any other CyNetwork.

If a group model is represented in a CyNetwork by a CyNode with a pointer to another CyNetwork, then that CyNode is just like any other CyNode. This means there is one representation of that CyNode in the rootgraph and as such, that CyNode can be included in more than one CyNetwork.

That said, we might consider supporting multiple views of networks, something we don't do right now, but is (theoretically) supported by GINY.

Or we might tie a group view to a network view. Even if (normal) nodes are in the same network, their view is different depending on the network view they are in. The same could be accomplished with group views.

Also note that the necessity of copying groups - which seems to motivate this assumption - is not captured in any of the use cases.

groupAPI (last edited 2009-09-22 02:04:12 by GaryBader)

Funding for Cytoscape is provided by a federal grant from the U.S. National Institute of General Medical Sciences (NIGMS) of the National Institutes of Health (NIH) under award number GM070743-01. Corporate funding is provided through a contract from Unilever PLC.
