## page was renamed from Cytoscape_3.0/Model
|| '''Discussion Title''' : Cytoscape 3.0 Model || '''Editor(s)''': ScooterMorris ||

== About this document ==
This document should serve to begin the discussion about the Cytoscape 3.0 model. At the 2007 Retreat, it was agreed that we would investigate a new class model for Cytoscape objects to replace the complicated combination of cytoscape, giny, ding, and fing models that currently provides the API for Cytoscape. Our goal at this point should be to design a model that provides a clean interface to Cytoscape objects and the cleanest interface we can imagine for plugin writers.

== General Notes ==

== References ==
I have included a [[attachment:Cytoscape3.cys|Cytoscape session file]] that demonstrates the class hierarchy. The session includes a custom Link Out URL that will pull up the [[http://lewis.compbio.ucsf.edu/Cytoscape3.0/doc/index.html|JavaDoc]] file for that interface. I have currently written proposed interfaces for CyNetwork, CyNode, CyEdge, CyModeObject, CyGroup, CyProject, and CyModelObject. Here is an image from the session:

{{attachment:Cytoscape3ClassHierarchy.png}}

== Discussion ==
===== Note From Conference Call 2007 Nov 29 =====
ScooterMorris presented the Javadocs for the model. At this stage, it seemed one would still be querying the network for information about node and edge relationships, a choice Scooter made in part to allow for the possibility of CyNodes participating in multiple graphs. BrianTurner and SarahKillcoyne pointed out that the discussions in Amsterdam indicated a desire for the model API to more closely reflect the way graphs are generally modeled. In this sense CyNodes would know about their edges, even at the price of losing the idea of a CyNode participating in multiple graphs, which, like the root graph, seemed to be another unpopular concept with plugin developers at the retreat.
However, as GaryBader pointed out and everyone agreed, this is about the API and about providing an intuitive way to interrogate the model and its components, not the underlying implementation. That is to say, we would like to be able to ask a CyNode about its edges and neighbors, but the actual implementation might nonetheless have the network itself managing and knowing about those relationships.

MikeSmoot - I'm very concerned by the push to include lots of information in CyNodes, specifically information about adjacent edges, neighbors, etc. This is because graph data structures simply aren't built this way. In almost all cases I'm aware of, fast and space-efficient graph data structures are implemented as adjacency lists of integers (and for those with lots and lots of space, adjacency matrices). I'm all for syntactic sugar in the right places, but if we deviate too far from the implementation, then we risk performance problems and issues with conceptual integrity. We need to be very careful about making design decisions that allow or encourage uses of the graph data structure that don't conform well with its behavior. As nice as it would be to ignore the implementation of the underlying graph, I don't think we have the luxury of doing this.
 . ScooterMorris - Can you give some examples? Certainly prefuse includes methods on a Node to access edges, as does JUNG (through implicit methods to get siblings). I thought that the strong sentiment at the Retreat was that Nodes should reference their edges. This allows us to have degree methods on nodes, adjacency methods on nodes, etc. Note that I think we can still have an underlying implementation that consists of adjacency lists.
 . MikeSmoot - [[http://www.boost.org/libs/graph/doc/index.html|boost/graph]] and [[http://www.algorithmic-solutions.info/leda_manual/graph.html|Leda]] are two examples, but perhaps even more important is our implementation of DynamicGraph.
I understand that there are arguments for including that information, but I don't think that the complexity they add is worth the convenience they provide. I just don't see the huge advantage of node.getNeighbors() over network.getNeighbors(node), especially when all that a Node object would do is call parentNetwork.getNeighbors(this).
 . ScooterMorris - I think we're going to have to get a broader audience involved in this discussion. We really need to come to a consensus on this since I really can't proceed with the new interfaces without it. At any rate, here is my perspective, for what it's worth. I think that if we define things in the most natural way for plugin writers, we may find opportunities for significant performance improvements. In the example above, we might have a node keep a cache of its neighbors, particularly if the API is such that the connections between nodes are made through a CyNode interface. This would significantly speed up certain algorithms. In this case, a call to node.getNeighbors() would not result in a call to parentNetwork.getNeighbors(this), even though the entire connectivity graph (for performance reasons) would be maintained within the parentNetwork.

==== Overall model ====
GaryBader - We should add CyHyperedge. Also, CyGroup and CyHyperedge should extend CyModelObject. This will make it easier to write general methods like getSelected, which can apply to any network object. CyNetwork should not extend CyModelObject since you don't want to add a network to a network. Also, maybe change CyModelObject to CyNetworkObject.
 . MikeSmoot - I don't know if I agree that CyHyperEdge should be part of the base model. Couldn't we have an extension of CyNetwork that handles this?
 . ScooterMorris - Given the upcoming change to CyNode, where nodes will maintain a reference to their edges, I think that the right solution to handling HyperEdges is probably to extend CyEdge.
I would imagine that the way we would traverse the network is to get the first node, then get the list of neighbors from that node, and so on. To extend this to hyperedges, all we would need to do would be to add CyHyperEdge support to the getNeighbors routine.

MikeSmoot - Nodes will almost certainly '''NOT''' contain references to edges. Any method that does this will only appear to do this and instead query the graph object. See my comment above for more detail.
 . ScooterMorris - I really think that for now we need to focus on our best judgment, based on our collective experience as plugin writers, as to what the model should look like. If we later need to change the model because we can't achieve the desired performance, then we'll change the model. I don't think we should restrict ourselves at this early stage.
 . MikeSmoot - My point here is that we need to be very careful about how we think of these things. Since Edges aren't stored in Nodes (as a practical matter, regardless of what the interface looks like), it worries me that we'd be making design decisions based on this.

MikeSmoot - Given what CyModelObject does now, I think it's probably OK for CyNetwork to extend it.

==== CyModelObject ====
GaryBader - is clone a deep copy?

ScooterMorris - I think it should be.

GaryBader - me too - just note it in the docs.

GaryBader - getIdentifier() - will this be unique across all CyModelObjects within a Cytoscape session? This could be useful for e.g. storing objects in hashes.
 . ScooterMorris - Yes. I think we need to separate getIdentifier() from getName, which would give us much more flexibility. An identifier, in this case, is an internal value that (in general) the user should never see.
 . GaryBader - I think we should only have one unique ID across a session, probably an integer to prevent people using it to store gene names. Where is getName? We probably shouldn't have getName if we have an identifier concept. Just store the name in the attributes.
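
The identifier proposal in the exchange above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the proposed interface: {{{getIdentifier()}}} and {{{getAttribute(String)}}} are taken from the discussion, while {{{ModelObjectSketch}}}, {{{SimpleModelObject}}}, {{{setAttribute}}}, the session-wide counter, and the {{{"name"}}} attribute key are hypothetical names introduced here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for the proposed CyModelObject interface. */
interface ModelObjectSketch {
    /** Internal, session-unique value that users should never need to see. */
    int getIdentifier();

    /** Returns the named attribute of this object, or null if it is not set. */
    Object getAttribute(String attributeName);

    /** Hypothetical setter; the discussion only mentions getAttribute. */
    void setAttribute(String attributeName, Object value);
}

/** Hypothetical implementation: identifiers come from one session-wide counter. */
class SimpleModelObject implements ModelObjectSketch {
    // One counter shared by every model object (nodes, edges, groups, ...),
    // so an identifier is never reused within the session.
    private static final AtomicInteger NEXT_ID = new AtomicInteger();

    private final int identifier = NEXT_ID.getAndIncrement();
    private final Map<String, Object> attributes = new HashMap<>();

    @Override
    public int getIdentifier() {
        return identifier;
    }

    @Override
    public Object getAttribute(String attributeName) {
        return attributes.get(attributeName);
    }

    @Override
    public void setAttribute(String attributeName, Object value) {
        attributes.put(attributeName, value);
    }
}

class IdentifierSketchDemo {
    public static void main(String[] args) {
        SimpleModelObject node = new SimpleModelObject();
        SimpleModelObject edge = new SimpleModelObject();

        // The display name lives in the attributes, not in the identifier.
        node.setAttribute("name", "geneA");

        // The integer identifier is what gets used as a hash key.
        Map<Integer, SimpleModelObject> byId = new HashMap<>();
        byId.put(node.getIdentifier(), node);
        byId.put(edge.getIdentifier(), edge);

        System.out.println(byId.get(node.getIdentifier()).getAttribute("name"));
    }
}
```

Because the identifier is an integer handed out by the model, it cannot be used to carry a gene name, which is the concern GaryBader raises; whether a separate getName survives is left open, as in the discussion.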
GaryBader - What's the use case for getAttribute(java.lang.String attributeName)?
 . ScooterMorris - it returns the named attribute for this object. The idea is to separate global attributes from attributes that are specific to an object.
 . GaryBader - does this support all attribute types: int, float, list, etc.?

ScooterMorris - one thing I would like to add back to CyModelObject is some form of userDataObject support. I know that it was there at one point and not heavily used, but I think that we have a lot more plugin developers today who are implementing a much wider variety of applications. Providing them "hooks" at this level would be potentially very useful.
 . GaryBader - the problem with this is that it was too hard for Cytoscape to save their data in a general way. It would probably be cleaner for them to save/load if they maintained their own data structures and handled their own saving/loading directly from those data structures, rather than having to scan Cytoscape model objects for their data.
 . ScooterMorris - I think an alternative would be to allow users to store UserDataObjects, which must implement the UserData interface. This interface would include a call to serialize the object in some reasonable way that could be stored in the session. Alternatively, we could provide a series of hooks to allow users to store things in sessions, but it sounds much easier, IMHO, to allow users to take advantage of the existing mechanisms and not have to worry about I/O. In any case, we need to provide solutions to plugin writers' need to save and restore data associated with networks, nodes, and edges in the session. To date, we've forced users to use attributes exclusively. Maybe that's the right solution, but I think it's worth a broader dialog.
 . MikeSmoot - UserDataObjects sounds a lot like CyAttributes...
I understand the utility of having a getObject()/setObject() method and very much like the idea of building in flexibility, but until there's a specific use case, I don't want to add this back in given the problems it has presented in the past. What use cases does CyAttributes fail to meet? How could we improve it to make it easier to use?
 . ScooterMorris - I think that the issue (in my mind) with CyAttributes is that we've begun to overload them significantly. Initially, CyAttributes was the mechanism for mapping data onto the network. Once we had an efficient way to do that, we started using them for storing arbitrary pieces of information ''about'' network objects, which led to us needing hidden attributes, non-editable attributes, and so on. Ideally, I would like to see a separation between CyAttributes, which are data values users can see and manipulate (and which are available to the vizmapper, etc.), and the more housekeeping-oriented things we maintain (such as group membership). In fact, I would imagine these might be implemented under the covers in the same way (using a multihashmap), but the user perspective would be very different.

==== CyNetwork ====
GaryBader - Should there be a remove change listener?
 . MikeSmoot - I don't know that we want to concern ourselves with event listeners just yet. We just need to know that events will need to be supported.
 . ScooterMorris - No, but I see no reason not to capture the placeholders. I actually like the suggestion of moving all of these to CyModelObjects.

GaryBader - Naming: use List instead of plurals? E.g. addEdgeList instead of addEdges. A list can contain one object and addEdge is very close to addEdges, so it could be confusing.

GaryBader - Should there be an addGroupList method?

GaryBader - Should the CyNetwork be a factory for nodes, edges, etc.? If so, shouldn't the object be immediately added to the network? (Some of the javadoc is not clear on this policy.)
 . MikeSmoot - My initial reaction is NO, CyNetwork should NOT be a factory for nodes and edges. I think that the CyNetwork ''interface'' should only be concerned with how one interacts with a network once it's created and shouldn't be concerned with issues of creation.

GaryBader - Hyperedges should not be included in edges - they should be their own object, just like groups are separate. Overloading CyEdge with hyperedge will make simple graph algorithms much more complex to implement because you will have to check in many places to make sure you're not using a hyperedge. We should evaluate the Agilent hyperedge object for model inclusion as CyHyperedge.
 . ScooterMorris - on the other hand, if there are hyperedges in the network, all of the standard graph algorithms will break. I'm now thinking that a CyHyperEdge extends CyEdge and that internally we provide mechanisms to handle it appropriately. We know that there is a valid use case for hyperedges, so I think we should consider including them in the model.
 . GaryBader - Agilent spent some time creating hyperedges, but that includes the concept of roles for the nodes that are part of the hyperedge and also hyperedge view code. It is likely we would need hyperedge views, just like we have group views.

GaryBader - getDegree must include undirected, in, out. In general, we should probably have a clearer separation between undirected and directed edge methods, e.g. getDegree would give you everything, while getDegreeIn, getDegreeOut, and getDegreeUndirected (or an enum used as a filter) would give you specific results. The same goes for all edge methods: edgecount, getedgelist. Right now you can only filter with some methods and you can't ask for only undirected edges.
 . MikeSmoot - We need to clarify if we're supporting a mixed graph or just directed/undirected. We may want sub-interfaces to support the different cases.
 . GaryBader - I think in Amsterdam people wanted the mixed mode.
I'm still somewhat concerned that it adds more complexity in the model than it's worth (as evidenced by my comment above).

GaryBader - Naming: getEdgesList -> getEdgeList (avoid all use of plurals in all packages).

GaryBader - Why is hide part of CyNetwork - should it only be part of the view? We may want to remove this altogether and just use remove/add.

GaryBader - Use case for isNeighbor? It could be useful, but maybe better as a utility method.
 . MikeSmoot - This should definitely be part of CyNetwork or CyNode as this is something that only the graph or node knows about. In general I'm opposed to utility classes with utility methods.
 . GaryBader - do you think that asking one node if it is the neighbor of another node is a common query?
 . MikeSmoot - Perhaps not. What I very much want to avoid are the cases where simple code gets written over and over. If you need to get all neighbors of a node, iterate through the list, and test each node, the code is simple but will get duplicated wherever you need to do this. I'd rather have a fatter interface than duplicated code. Utility methods are OK, but why spread the functionality out between more than one object? I guess I don't see the benefit.

GaryBader - Why not use the CyModelObject here? - can you not select groups and hyperedges?
 . {{{boolean isSelected(CyEdge edge)}}}
 . {{{boolean isSelected(CyNode node)}}}

GaryBader - same thing goes for add, remove, unselect, etc.
 . MikeSmoot - good point!

GaryBader - if we use CyModelObject here, then I don't think CyNetwork should be a CyModelObject (and CyModelObject should be changed to CyNetworkMember or similar).

GaryBader - Don't think we need selectAllNodes, edges in there, as they are utility methods. Same with unselect. You can easily select all using setSelectedEdgeState + getNodeList.

GaryBader - createEdge - change 'interaction' parameter to edgeType, or similar.

MikeSmoot - createNetworkView should definitely '''NOT''' be part of this interface.
 . ScooterMorris - OK, then we should add methods for "addNetworkView and removeNetworkView".
 . MikeSmoot - No, the point is that the model knows ''nothing'' about the view. The view knows all about the model, changes the model, responds to model events, and does things based on model events, but the model does '''not''' react to anything that happens in the view. Think of it this way: as soon as we have to add an {{{import org.cytoscape.view.CyNetworkView}}} statement anywhere in the model package, we've created a cyclic dependency. You should be able to write code that operates strictly on the topology of the network (e.g. a graph traversal algorithm) that only requires you to use the CyNetwork jar without needing to import or use CyNetworkView. This is a '''''really, really''''' important point, so please keep the discussion going if this doesn't make sense.
 . ScooterMorris - I thought we agreed that CyNodeView, CyEdgeView, and CyNetworkView were going to be part of the model, didn't we? If not, then I totally agree with you, and these shouldn't even be in the image above. If they are part of the model, and there is something else that will actually map the information to a "view", then I think the above methods make sense. Perhaps we need to rename these to CyNodePresAttr, CyEdgePresAttr, and CyNetworkPresAttr (where PresAttr == Presentation Attributes)?
 . MikeSmoot - I think there need to be (at least) three layers: 1. The topology of the graph. 2. A presentation or view interface that defines things like x,y position, node shape, node color, etc. 3. The application layer that provides an implementation of 2. The goal of the first layer is to hide the implementation of the graph data structure and to allow users to operate strictly on the topology of a network (e.g. for statistical tests, getting neighbors, etc.). The second layer provides an abstraction of a view.
The goal for this is to be able to specify the visual attributes without needing the actual visualization, such as with headless mode or with an AJAX mode (i.e. where we won't have swing components popping up). So 2 is kind of like a view model, and I understand why you're thinking along those lines. However, I think that layers 1 and 2 need to be separate so that 1 can be used independently and so that we can have multiple instances of 2 for a single 1. Clear as mud?
 . ScooterMorris - If we follow along these lines, we would need to create a network, create a view model for that network, and then create a view for the view model -- right? I'm just wondering if that's the direction we really want to go. I agree that we need to have multiple instances of the view model for each network object -- and I can imagine where we might want to have multiple views for each view model. As a plugin writer, though, it sounds like it will be pretty complicated to traverse through all three of those layers to get from the view to the network, or from the network to the view. Often what I really want to do is just update the color or X,Y position (for example) and have the view reflect that (through events?).

==== CyNode ====
GaryBader - Group methods should only be in CyGroup, not in CyNode.
 . ScooterMorris - not sure what you mean. Asking a CyNode if it's a group, or what groups it is a member of, is much easier than trying to maintain a list of CyGroup static methods. I think that this is somewhat the same argument as having nodes "know" about their edges.
 . GaryBader - I see your point, just that the more types we add to the model, the more of these 'inter model' methods get created, in an n-squared fashion, unless we make good use of a class hierarchy.

GaryBader - Some of the methods in various packages are missing their return type.

==== CyEdge ====
GaryBader - Again, remove hyperedge methods to simplify this.

GaryBader - getInteraction() - should be getEdgeType.
How will this be related to the identifier? Is the edge type part of the ID like in 2.x? I don't think it should be, since this causes a lot of confusion.
 . ScooterMorris - nice suggestion.

GaryBader - Need to be able to filter by edge type in getTarget type methods - source, target and none for the undirected case. This would be another good place for a global edge enum type: in, out, undirected.

==== CyGroup ====
GaryBader - GetState - what is this?
 . ScooterMorris - the current state of the group. The specific semantics are implemented by the CyGroupViewer, but the state needs to be saved and restored, so it needs to be part of the CyGroup itself.

GaryBader - getViewer() returns string - should this be a groupviewer?
 . ScooterMorris - No, because there is no way I can know whether the groupViewer has been registered yet.

==== CyGroupViewer ====
GaryBader - Shouldn't a viewer just register for a group change event? Maybe the change listener should be on CyModelObject?

==== CyNetworkChangeListener ====
GaryBader - Should we have more enums for types of changes, like CyGroupViewer.ChangeType? We could just have all of these in one place.
 . MikeSmoot - I agree that we probably want to consider consolidating all ChangeTypes into a single enum.

==== CyProject ====
GaryBader - addManipulation - too granular? Can we use comments for this? If necessary, we could add a type of comment, allowing attribute/value pairs.
 . MikeSmoot - Maybe we want the granularity to be at the command layer? Or maybe the granularity should be events?

GaryBader - I like the idea of automatically storing a chain of commands in the project, as a history, rather than storing modification strings.

GaryBader - What does addFile do with the file once it's added?
 . MikeSmoot - addFile sounds a lot like an IO thing. Should this be in the model?
 . ScooterMorris - My idea is just that the project will maintain a list of all of the files that were used to create the session (network files, ontologies, attributes, etc.).
This is just the way to tell the project that a file was added to the session. Perhaps this could be replaced with a special type of comment, if we follow the suggestion above of extending comments.

GaryBader - getname, but no setname?

GaryBader - Should we be able to have subprojects? E.g. most IDEs have this.

GaryBader - Where is the active state of the project set? There is no setActive method.

GaryBader - setProjectPath - is this the name of the file storing the project?

==== CyAttributes ====
GaryBader - Has anything changed here from the current CyAttributes?
 . MikeSmoot - I'd like to transition to using enums to define attribute types rather than ints like we've got now.
 . GaryBader - sounds like a great idea.
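
MikeSmoot's suggestion of enum attribute types might look something like the sketch below. It is a minimal illustration, not the CyAttributes API: the {{{AttributeType}}} enum, its constants, and the {{{TypedAttributes}}} class are all hypothetical names chosen for this example, and the set of types simply mirrors the "int, float, list, etc." mentioned earlier in the discussion.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical replacement for int type codes: one enum constant per attribute type. */
enum AttributeType {
    BOOLEAN(Boolean.class),
    INTEGER(Integer.class),
    FLOATING(Double.class),
    STRING(String.class),
    LIST(List.class);

    private final Class<?> valueClass;

    AttributeType(Class<?> valueClass) {
        this.valueClass = valueClass;
    }

    /** True if the value is an instance of the Java class backing this type. */
    boolean accepts(Object value) {
        return valueClass.isInstance(value);
    }
}

/** Hypothetical attribute table that records a declared type per attribute name. */
class TypedAttributes {
    private final Map<String, AttributeType> types = new HashMap<>();
    private final Map<String, Object> values = new HashMap<>();

    /** Stores the value, rejecting it if it does not match the declared type. */
    void set(String name, AttributeType type, Object value) {
        if (!type.accepts(value)) {
            throw new IllegalArgumentException(name + " is not of type " + type);
        }
        types.put(name, type);
        values.put(name, value);
    }

    AttributeType getType(String name) {
        return types.get(name);
    }

    Object get(String name) {
        return values.get(name);
    }
}

class AttributeTypeDemo {
    public static void main(String[] args) {
        TypedAttributes attrs = new TypedAttributes();
        attrs.set("degree", AttributeType.INTEGER, 3);

        // Case labels must be valid enum constants, so a mistyped type
        // fails at compile time instead of silently matching a wrong int.
        switch (attrs.getType("degree")) {
            case INTEGER:
                System.out.println("integer attribute");
                break;
            default:
                System.out.println("other attribute");
                break;
        }
    }
}
```

Compared with int codes, the enum gives each type a readable name in stack traces and debuggers and lets behaviour such as the {{{accepts}}} check live on the type itself.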