## page was renamed from RFCTemplate
## This template may be useful for creating new RFCs (Requests for Comments)
## This is a wiki comment - leave these in so that people can see them when editing the page
|| '''RFC Name''' : Grouping API || '''Editor(s)''': ScooterMorris ||

== About this document ==
This is an official Request for Comment (RFC) for supporting ''groups'' in Cytoscape. This RFC encompasses and leverages the MetaNodes RFC (RFC 8) and the SimplifiedMetaNodeDataStructureRFC (RFC 9) by extending the notion of a metanode to a more general ''group'' concept.

For details on RFCs in general, check out the [[http://www.answers.com/main/ntquery?method=4&dsid=2222&dekey=Request+for+Comments&gwp=8&curtab=2222_1&linktext=Request%20for%20Comments|Wikipedia Entry: Request for Comments (RFCs)]]

== Status ==
##Put the date and the status. Status can be e.g. "Not yet completely written", "Open for public comment", "Closed for public comment". There could be some explanation of the status
Open for public comment

== How to Comment ==
To view/add comments, click on any of the 'Comment' links below. By adding your ideas to the Wiki directly, we can more easily organize everyone's ideas, and keep clear records. Be sure to include today's date and your name for each comment. Here is an example to get things started: [[/Comment]].

'''Try to keep your comments as concrete and constructive as possible. For example, if you find a part of the RFC makes no sense, please say so, but don't stop there. Take the extra step and propose alternatives.'''

== Proposal ==
The goal is to provide a new package, e.g. cytoscape.groups, that supplants all direct calls to the giny metanodes methods, and extends the concept of metanodes in a structured manner. Similar to the MetaNodes implementation, there is a [[http://www.cgl.ucsf.edu/Research/cytoscape/groupAPI/doc/edu/ucsf/groups/GroupManager.html|GroupManager]] class that should be the main interface for most developers.
There are also three interfaces to allow a ''group'' to have different abstraction models and different visual properties. The three interfaces are:
 * [[http://www.cgl.ucsf.edu/Research/cytoscape/groupAPI/doc/edu/ucsf/groups/model/GroupModel.html|GroupModel]] - an interface for classes that handle the model associated with presentation of grouped nodes,
 * [[http://www.cgl.ucsf.edu/Research/cytoscape/groupAPI/doc/edu/ucsf/groups/view/GroupViewer.html|GroupViewer]] - an interface for classes that handle the actual presentation of groups, and
 * [[http://www.cgl.ucsf.edu/Research/cytoscape/groupAPI/doc/edu/ucsf/groups/data/GroupAttributesHandler.html|GroupAttributesHandler]] - an interface for classes that handle the node and edge attributes of groups.

This package must be in the core to provide direct, consistent access to the grouping API for the XGMML reader/writer, the Cytoscape Editor, and the Metanode Plugin (which will still be provided as one interface to grouping).

An [[http://www.cgl.ucsf.edu/Research/cytoscape/groupAPI/doc|Overview]] of the proposed API is available for comment.

There are three significant assumptions that underlie this proposal:
 1. '''A Group exists in only one CyNetwork.''' This is really a matter of user expectations. Groups are different from CyNodes and CyEdges in that they have a visual state (grouped or ungrouped). Attempting to maintain different states in different CyNetworks that might have different collections of nodes and edges visible would be difficult. The API explicitly provides a method to perform a shallow copy of a group.
 1. '''For the subnetwork given to create a group, all Edges and Nodes in this subnetwork are used in creating a Group.'''
 1. '''This API will take advantage of events to inform it of the deletion of nodes and edges that it cares about.''' The groupAPI will also issue events to inform interested listeners of significant changes to the groups. These will be documented as part of the API.
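
The first and third assumptions above can be illustrated with a minimal sketch. It does not use the proposed API: `SketchGroupManager`, `GroupChangeListener`, `nodeDeleted()` and `copyGroup()` are hypothetical names, and plain strings stand in for CyNetwork and CyNode. It only shows a manager that ties each group to exactly one network, reacts to node-deletion events rather than being called explicitly by every caller, and copies a group's membership into another network.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical listener; the RFC only says that group events will be issued. */
interface GroupChangeListener {
    void memberRemoved(String groupName, String nodeId);
}

/** Hypothetical stand-in for a group manager; strings replace CyNetwork and CyNode. */
final class SketchGroupManager {
    // Assumption 1: each group is recorded against exactly one network.
    private final Map<String, String> networkOfGroup = new LinkedHashMap<>();
    private final Map<String, Set<String>> membersOfGroup = new LinkedHashMap<>();
    private final List<GroupChangeListener> listeners = new ArrayList<>();

    void addListener(GroupChangeListener listener) {
        listeners.add(listener);
    }

    void createGroup(String groupName, String networkId, Set<String> nodeIds) {
        if (networkOfGroup.containsKey(groupName)) {
            throw new IllegalArgumentException("group already exists: " + groupName);
        }
        networkOfGroup.put(groupName, networkId);
        membersOfGroup.put(groupName, new LinkedHashSet<>(nodeIds));
    }

    /** Copies the membership (not the nodes themselves) into a new group in another network. */
    void copyGroup(String groupName, String copyName, String targetNetworkId) {
        Set<String> members = membersOfGroup.get(groupName);
        if (members == null) {
            throw new IllegalArgumentException("no such group: " + groupName);
        }
        createGroup(copyName, targetNetworkId, members);
    }

    /** Assumption 3: invoked by the graph's event source, not by every plugin. */
    void nodeDeleted(String networkId, String nodeId) {
        for (Map.Entry<String, Set<String>> entry : membersOfGroup.entrySet()) {
            String groupName = entry.getKey();
            boolean sameNetwork = networkId.equals(networkOfGroup.get(groupName));
            if (sameNetwork && entry.getValue().remove(nodeId)) {
                for (GroupChangeListener listener : listeners) {
                    listener.memberRemoved(groupName, nodeId);
                }
            }
        }
    }

    Set<String> members(String groupName) {
        Set<String> members = membersOfGroup.get(groupName);
        return members == null
                ? Collections.<String>emptySet()
                : Collections.unmodifiableSet(members);
    }
}

final class GroupEventSketch {
    public static void main(String[] args) {
        SketchGroupManager manager = new SketchGroupManager();
        manager.addListener((group, node) -> System.out.println(group + " lost " + node));
        manager.createGroup("complexA", "net1", new LinkedHashSet<>(Arrays.asList("p1", "p2")));
        manager.copyGroup("complexA", "complexA-copy", "net2");
        manager.nodeDeleted("net1", "p1"); // only the group in net1 is affected
        System.out.println(manager.members("complexA-copy"));
    }
}
```

In this sketch the copy keeps its own membership set, so deleting a node from one network leaves the copied group in the other network untouched. Whether the real shallow copy should behave this way is one of the points open for comment.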
##The sections below may be useful when creating an RFC, delete the ones that are not
== Biological Questions / Use Cases ==
Each use case should be expanded in a separate page by the person (or group) designated in italics. Please use the Use Case template, which has the following elements:
 * Name of use case
 * 1 paragraph summary
 * Step-by-step user action
 * Visual mockup & storyboard
 * Requirements-met & missing in existing Cytoscape implementation
 * Frequency of use/importance e.g. every time we analyze data X
 * Give examples in other programs, or papers

==== Due Date: November 15th ====
 1. Clustering - Biomodules ''Gary''
  * [[groupAPI/UseCase_1|Use Case 1]]
  * Biological application: Group proteins in a graph of protein-protein interactions that have a collective function in the cell (http://www.genome.org/cgi/content/abstract/14/3/380) in order to discern higher levels of organization in the biological network. In this case, there might be overlap between two clusters or modules.
  * Group solution: A group of proteins can be visualized by a single node that has visual and topological characteristics that reflect the underlying group of proteins. For example, the size of the node is proportional to the number of proteins it represents, its connections to other proteins reflect connections from its inside proteins to other proteins, its color represents the average expression levels of its members for a certain condition, etc. See the image in http://labs.systemsbiology.net/galitski/projs/biomodules/index.html. Round nodes are metanodes.
 2. Protein Complexes - Pico/GenMAPP (note by Cline/Pasteur) ''GenMAPP''
  * [[groupAPI/UseCase_2A|UseCase 2A]]
  * Biological application: Group proteins in a pathway that are known to form complexes in order to simplify visualization and store known associations in the data model.
  * Group solution: Ideally, there could be two views of protein complexes.
  (1) A '''''collapsed''''' view, similar to that used in Biomodules above, but with a default size (not scaled by number of members) and a label that is unique to the metanode (e.g., PKA complex). (2) A '''''stacked''''' view, where all the children nodes are visible and simply stacked (like gene boxes in GenMAPP).
  * Extensions: Note that the solution should also fit protein domains, since the particular boundaries between protein domains in a single chain and between proteins in a complex are rather arbitrary, a matter of evolutionary fate. The solution should also extend to the grouping of paralogs and splice variants.
  * Further note (Cline/Pasteur): these extensions become especially interesting now that there are high-throughput platforms to measure separate expression levels of genomic features. Biologically, the likelihood of a given interaction will depend on the isoforms produced in the cell, and whether or not the protein features involved in the interactions are expressed.
   * One Group implementation is for each child node to represent a different component of the gene or protein - an exon, or a protein domain. Where the right data is available, interactions would be tied to the components involved in the interaction - much as is done now in the Domain Network plugin.
   * A second Group implementation would have each child node represent a protein isoform of the parent. Again, where the right data is available, interactions would be associated with the isoforms that ''can'' interact. Here, some thought should go into how to handle the edges. If one metanode in an interaction has N child nodes, representing N protein isoforms, and the second has M nodes representing M isoforms, having up to N*M different edges represents a lot of complexity.
  * Note: The '''''stacked''''' view is not merely a visualization problem that can be solved by having the ability to view different sections within a single node (coloring them differently, etc).
  This is because we wish to have edges connected to each individual component of the stack. For this reason, this is indeed a biological application that can be solved using metanodes.
 3. Intragenic Features - Pico/GenMAPP ''GenMAPP''
  * [[groupAPI/UseCase_3A|UseCase 3A]]
  * Biological application: Associate features such as exon structure, promoter regions and SNP positions with proteins in a pathway. These features are quickly becoming the preferred level of abstraction for microarray analysis and other high-throughput methods. We must be able to translate these massive datasets into biological context (i.e., pathways) in an efficient manner.
  * Group solution: By associating these feature-level nodes with a protein node in a parent-child relationship, we could efficiently map these data types to the biology at the pathway level. These associations might be best viewed as collapsed nodes colored by specified algorithms that consider the data type (e.g., a splicing analysis on all exon data mapped to the whole gene). And instead of expanding the node on the same network, perhaps we could restrict the expansion to a new network (like a small pop up window) that displays the feature-level nodes and direct data mapping, e.g., expression level for each exon associated with the protein.
  * Note: This type of metanode seems to be qualitatively different and might require separate terminology. Then, again, maybe not?
  * Note2: After a group discussion, it was decided that this biological application does not require a metanode solution. It can be solved by implementing a Cytoscape plugin that makes use of already existing Cytoscape functionality.
 4. Boxing of groups.
  * An additional desired way of visualizing groups of nodes is to box them. The box itself is not a node, it is just an enclosing area for a group that can be dragged around while the nodes move with it.
  In this case, the metanodes can be used as a mechanism to group nodes, to keep track of these groupings, and to modify these groupings (removing or adding membership). But the metanode itself is not visualized. All of the biological applications above can be solved using this boxing visualization.
 5. Alternate paralogs in pathways. ''GenMAPP''
  * [[groupAPI/UseCase_5A|UseCase 5A]]
  * Might have multiple nodes which perform the same function (sort of a logical "OR"). Would want to see these as a group.
 6. Protein superfamily networks ''Scooter''
  * Protein superfamilies can be represented as large networks where the nodes represent the proteins, and the edges represent the relationship (defined by BLAST e-value, structural relationships (RMSD), etc.). These networks can contain 1,000s of nodes, but there is often a defined hierarchy -- the superfamily contains several subgroups, which contain families, which contain proteins.
  * Group solution: The idea is to be able to group various levels of the hierarchy to present a simpler (more abstract) view, and allow the user to "drill down" into the next level of the hierarchy to get a more detailed view. One possible implementation is to implement all of the nodes contained within the group as a subnetwork (a normal CyNetwork). The user should be able to either "ungroup" the nodes (i.e. display all contained nodes as part of the current view) or open the contained nodes (and edges) up into a new network (view).
 7. Named list of genes. ''Piet & GenMAPP''
  * [[groupAPI/UseCase_7A|UseCase 7A]]
  * Similar to geWorkbench's idea of a "panel", which can have an arbitrary group of nodes e.g. process, cytoplasm. As used in GOMiner, MAPPFinder, etc. e.g.
  Apoptosis
  * __Use case__: Group all nodes belonging to a certain Gene Ontology category
   * Annotate network with GO-category: cellcycle
   * Annotate network with GO-category: apoptosis
   * Group nodes belonging to one or both of the two categories
   * Extend this to more categories
 8. Black-box pathways. ''Ethan/Ben/Gary''
  * [[groupAPI/UseCase8|Use Case 8]]
  * Similar to #7 above, but includes connectivity between the group of nodes.
 9. States of a protein/generics. For example, grouping together splice variants, PTMs, etc. ''Ethan/Ben/Gary''
 10. Groups of graphical elements that are not necessarily nodes or edges. ''GenMAPP''
  * [[groupAPI/UseCase_10A|UseCase 10A]]
 11. General collapse/expand paradigm for reducing complexity by hiding ''Piet''
  * Not necessarily any biological semantics
  * __Use case__: Collapse all nodes having edges with the same source nodes
   * Biological networks tend to be scale free; a few hubs target a large number of genes. These networks are not very clear, and it is of interest to see which nodes have hubs in common
   * A backbone network of hubs is visible immediately
 12. Topological grouping ''Piet''
  * Hide "downstream" components. Similar to number 11, but selection, construction, and collapse of the group would be automatic based on some topological value (e.g. node neighborhood, downstream nodes, etc.)
  * __Use case__: It has to be decided which knockout cell line should be created for a gene participating in a number of pathways; which genes are hypothetically expected to be affected and which are not
   * Create large network from existing pathways
   * Assign knockout gene(s)
   * Create groups affected by / not affected by based on directionality of edges
   * Explore by expanding / collapsing
 13. Quick Find and Group Node ''Jim''
  * Quick find and group nodes should be modified to allow nodes nested (and hidden) within a group node to be searched for.
  * See: [[groupAPI/QuickFindAndGroups]]

== Implementation Plan ==
 * [[groupAPI/Implementation_Plan|Implementation Plan]]

== Comments ==
MichaelCreech 2006-09-08 08:10:43

Assumptions

A few assumptions you might want to explicitly state:

1) A Group exists in only one CyNetwork

 a) Corollary: The Group's identifying node exists in only one CyNetwork (as far as the Group machinery is concerned).

2) For the subnetwork given to create a group, all Edges and Nodes in this subnetwork are used in creating a Group.

Issues

1) The current model requires all Cytoscape-based code to explicitly tell GroupUtils when a node or edge is added or deleted (through GroupUtils.deleteEdgeNotify() and deleteNodeNotify()) in order to keep the Group consistent with its underlying Cytoscape structure. I think this is a major problem for several reasons:

 a) It requires all core code and plugins to change their code to call these GroupUtils methods.

 b) It also forces the Group API implementation to be part of the core rather than a plugin; otherwise these calls would not work if the plugin were not loaded.

 c) It leads to strong coupling between the various components that make up Cytoscape since now all the various components that change Nodes and Edges must know about and explicitly reference GroupUtils.

The alternate approach is for Groups to use event handling and track when nodes and edges are added or deleted. There may be performance and other issues with the current event handling implementation that must be fixed before this can be used as a solution.

2) Should only CyNode, CyEdge, and CyNetwork be referenced or should their underlying interfaces be referenced, such as Node, Edge, and GraphPerspective? CyNode and CyEdge are also implementation classes, not interfaces.
Unless there are specific methods used in CyNode, CyEdge, and CyNetwork that aren't a part of their underlying interfaces, the API would be more flexible referencing the underlying interfaces--with the *big* caveat that the underlying interfaces aren't being removed.

Suggestions

1) Based on assumption one, add a GroupUtils.getCyNetwork (CyNode group_node). This would return the CyNetwork to which a given group node belongs.

2) Remove the CyNetwork parameter from all operations where it is not needed. Because of assumption one, and the use of suggestion one, many of the existing API operations don't need the CyNetwork parameter. For example, GroupUtils.isGrouped(), GroupUtils.regroupGroup(), GroupUtils.removeGroup(), GroupUtils.ungroupGroup() and corresponding operations in GroupAbstractionModel.

3) Change implementation-specific parameters to more general interfaces. Example: ArrayList is used as a parameter to several operations (GroupAttributesHandler.setAttributes()) versus a List, Collection, or Iterator parameter type.

4) Use only standard java data structures. For example, use java.util.Map instead of AbstractIntIntMap as a parameter in GroupAttributesHandler.setEdgeAttributes().

5) Clarify mutability of List return values or change to Iterator. Example: GroupUtils.getGroupMembers(). Such methods should clearly state if the List returned is safely modifiable. Another approach is to return an immutable Iterator, which leaves more flexibility for the implementation.

6) Fix inconsistencies in GroupAbstractionModel:

 a) addEdgeNotify() states: "Inform the group model abstraction that an edge has been added." However, GroupAbstractionModel.addNodeNotify() states: "Inform the group model abstraction that an edge has been deleted."

 b) There is a deleteEdgeNotify() but no deleteNodeNotify().

7) Possibly remove duplicate methods that perform the same operation for nodes and edges.
For example, there is a GroupAbstractionModel.addEdgeNotify() and GroupAbstractionModel.addNodeNotify(). Why not have one addGraphObjectNotify()? I know why this is the case: it's been done this way in the past. However, there is an abstraction in Giny of a GraphObject, of which both a Node and Edge are extensions. Thus, you could have an addGraphObjectNotify (GraphObject obj).

8) Change the name GroupUtils to something less misleading (maybe Group or GroupManager)? Usually 'utils' implies lower-level miscellaneous utility operations for some API versus the main top-level mechanism of interaction with the API.

9) Change GroupUtils.getGroupMembers() to return the CyNetwork that is the sub-network representing the Group versus returning a List of Nodes. This would be useful because it also gives what edges are in the Group. Otherwise, you probably need two different operations: getGroupNodes() and getGroupEdges(), or change getGroupMembers() to return a heterogeneous List of Nodes and Edges.

10) Allow a null network parameter to GroupUtils.getGroupNodes(). When the network is null, this would mean to return *all* groups (across all CyNetworks) to which the given member belongs.

11) Add a GroupUtils.getSubGroups (CyNode groupNode). This would return a Collection or Iterator of all the sub groups contained within a Group.

12) Drop 'Abstraction' from GroupAbstractionModel and GroupAbstractionViewer. Just call them GroupModel and GroupViewer.

13) Add GroupUtils.getGroupModel() and getGroupAttributesHandler(). If a group is created without specifying a specific handler and model, there is no way to get at the handler and model.

14) Change setGroupAbstraction() to setGroupAbstractionModel(), or setGroupModel() if suggestion 12 is used.

15) Add GroupUtils.setGroupAttributesHandler(). This is then consistent with the existing setGroupAbstraction(Model).
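
As a rough illustration of suggestions 3 and 5 above, plus the accessor-method idea later raised as suggestion 18, here is a minimal sketch. `SketchGroup`, `addMembers()` and the `"label"` default value are hypothetical placeholders, not the actual GroupUtils or GroupAttributesHandler signatures. Strings stand in for nodes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

/** Hypothetical class used only to illustrate the API-shape suggestions. */
final class SketchGroup {
    // Default kept private and exposed through a method instead of a public field.
    // The value "label" is a placeholder chosen for this sketch.
    private static final String DEFAULT_NODE_LABEL_ATTRIBUTE = "label";

    private final List<String> memberIds = new ArrayList<>();

    static String getDefaultNodeLabelAttribute() {
        return DEFAULT_NODE_LABEL_ATTRIBUTE;
    }

    /** Suggestion 3: accept the general Collection type rather than ArrayList. */
    void addMembers(Collection<String> ids) {
        memberIds.addAll(ids);
    }

    /**
     * Suggestion 5: state the mutability contract. The returned list is a
     * read-only view; attempts to modify it throw UnsupportedOperationException.
     */
    List<String> getGroupMembers() {
        return Collections.unmodifiableList(memberIds);
    }
}

final class GroupApiShapeSketch {
    public static void main(String[] args) {
        SketchGroup group = new SketchGroup();
        group.addMembers(Arrays.asList("n1", "n2"));
        System.out.println(group.getGroupMembers());
        System.out.println(SketchGroup.getDefaultNodeLabelAttribute());
    }
}
```

Returning a read-only view keeps callers from changing membership behind the manager's back, so any add or delete has to go through an operation that can also fire the corresponding event.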
----
MichaelCreech 2006-09-08 08:25:11

16) Need to be able to Add and Delete from a Group. Might add new API functions for these operations or tell users to directly modify the sub-group, if you catch add/remove events.

17) Need a way to copy a group. Maybe GroupUtils.copyGroup (CyNode groupNode, CyNetwork newParent)

----
MichaelCreech 2006-09-08 14:56:27

Another suggestion:

18) Change variable references in the API to method references. There are a few variable references in the API, namely GroupUtils.defaultModel, GroupUtils.defaultAttHandler, and GroupAttributesHandler.DEFAULT_NODE_LABEL_ATTRIBUTE. It would be better to access these *only* through equivalently named methods--thus allowing more flexibility and better encapsulation in the implementation. Possible methods are GroupUtils.getDefaultModel(), GroupUtils.getDefaultAttHandler(), and GroupAttributesHandler.getDefaultNodeLabelAttribute().

----
MikeSmoot 2006-10-03 16:34:26

Assumptions:

#1 I'm not sure about this. If the nodes in a group exist in more than one network, why wouldn't the group?

Issues:

#1 I wholly agree. We're going to need decent event handling to support undo as well, so I think this would be time well invested.

#2 Not sure about this one. Some people argue that it's better to use CyNode, CyEdge, etc. because they are specific to cytoscape whereas giny Nodes and Edges are not. If we were to ever move away from giny, the use of Nodes and Edges as opposed to CyNodes and CyEdges would complicate things. That said, I think it's highly unlikely we'd ever do anything other than subsume giny into cytoscape. If others agree, then we should definitely code to the interface.

Suggestions:

#1,2 Not sure about assumption 1, so I'm not sure if this is a good idea or not.

#3 Absolutely.

#4 Maybe. It depends on how the structures are being used and what kind of performance is needed.
Since all of the core data structures use ints, it might be better to use the non-standard int data structures rather than the standard Integer ones. Even if we change the group api, these int specific data structures are used elsewhere.

#5 Agreed.

#6 Agreed.

#7 Yeah, but see my hesitation on Issue #2. If we agree that we can use giny, then I agree with this change.

#8 Absolutely.

#9 Agreed. I would prefer that we return a CyNetwork rather than a heterogeneous list of Nodes and Edges.

#10 Disagree. In general, I don't like methods that have "hidden" behavior that isn't apparent from the interface. I'd prefer two methods: getAllGroups() and getGroups(CyNetwork).

#11 Ok.

#12 Absolutely.

#13 Ok.

#14 See #12.

#15 Ok.

#16 Ok.

#17 Ok.

#18 Agreed.

----
MichaelCreech 2006-10-04 06:48:55

Suggestion:

19) Separate user event handling from the GroupAbstractionModel. It looks like the only way to get event information about changes to a Group is to create or extend the GroupAbstractionModel and override the appropriate XXXnotify() methods (e.g., addEdgeNotify()). Getting information about changes to a group should be independent of the need to create a new Group Model.

----
AdityaVailaya 2006-10-04 10:49:32

Assumptions

#1 It is my understanding that the GenMAPP group prefers not to have a group automatically created across networks, but to have a group be local to a particular network. Further, it was suggested at the mini-retreat (early September in San Francisco) that a copy (note: no sharing) of a group would be made if it is to be reused in another network. The "copy" allows for different properties to be attached to a group in different networks.

----
ScooterMorris 2006-10-04 11:44:12

Regarding assumption #1: the issue is one of user expectations. If a user is looking at a network that has been created from another network, and they create a group, would they expect to see the nodes grouped in the original network as well? Our conclusion was that they would not.
In addition, since the collection of non-grouped nodes might be different in the two networks, the implementation would be more difficult. That being said, we also talked about the user expectations of creating a new network from a group of nodes in an existing network, when that group of nodes includes a groupNode. In that case, we felt that the right thing would be to make a copy of the group (see Aditya's comment above).

----
MikeSmoot 2006-11-17 12:12:45

I know we debated this for hours at the retreat, but I don't recall that we ever agreed on Assumption #1. In discussing this, I also think we need to be careful about distinguishing a group model from a group view. If the nodes and edges contained in a group are represented by a CyNetwork, then that CyNetwork will be just like any other CyNetwork. If a group model is represented in a CyNetwork by a CyNode with a pointer to another CyNetwork, then that CyNode is just like any other CyNode. This means there is one representation of that CyNode in the rootgraph and as such, that CyNode can be included in more than one CyNetwork.

That said, we might consider supporting multiple views of networks, something we don't do right now, but which is (theoretically) supported by GINY. Or we might tie a group view to a network view. Even if (normal) nodes are in the same network, their view is different depending on the network view they are in. The same could be accomplished with group views.

Also note that the necessity of copying groups - which seems to motivate this assumption - is not captured in any of the use cases.
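
The model/view distinction in the last comment could look roughly like the sketch below. It is only an illustration under stated assumptions: `SharedGroupModel`, `GroupViewState`, the string view identifiers and the EXPANDED default are all hypothetical and do not come from the proposed API or from GINY.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-view state of a group. */
enum GroupViewState { COLLAPSED, EXPANDED }

/** Hypothetical group model shared by several network views. */
final class SharedGroupModel {
    private final String name;
    // One visual state per network view; the model itself is stored once.
    private final Map<String, GroupViewState> stateByNetworkView = new HashMap<>();

    SharedGroupModel(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }

    void setState(String networkViewId, GroupViewState state) {
        stateByNetworkView.put(networkViewId, state);
    }

    /** Views that never set a state are treated as EXPANDED (an assumption of this sketch). */
    GroupViewState getState(String networkViewId) {
        return stateByNetworkView.getOrDefault(networkViewId, GroupViewState.EXPANDED);
    }
}

final class GroupViewSketch {
    public static void main(String[] args) {
        SharedGroupModel group = new SharedGroupModel("PKA complex");
        group.setState("view-of-net1", GroupViewState.COLLAPSED);
        System.out.println(group.getState("view-of-net1"));
        System.out.println(group.getState("view-of-net2"));
    }
}
```

Under this arrangement the same group could appear collapsed in one network view and expanded in another without being copied, which is the alternative to Assumption #1 that the comment raises.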