Contents

Model Layer definition
1. Component Modules
Requirements

This is the homepage of the Cytoscape 3.0 Model layer design discussion.

Model Layer definition

The model layer contains the object/data model for core data structures useful for Cytoscape or Cytoscape-like software.

Component Modules

Graph
Attributes
Identifier policy (for objects that need to be referenced, like nodes/edges)
Hypergraph? (is this different from Graph? see open issues)
Groups? (is this different from Graph? see open issues)
Project?

Requirements

Core structure of the Model layer

APIs should be easy to use. Use case: to reduce bugs and duplicated code (from cyto2 experience)
Minimize dependencies between APIs. Use case: to reduce maintenance work (from cyto2 experience)
API dependencies must be acyclic!!!!
- (SK - I do not agree that this is true for API's within a module. It is absolutely true between modules however)
APIs should be non-redundant (i.e. don't create multiple APIs for the same model unless there is a clear use case). Use case: easier for core developers and plugin writers to work with (from cyto2 experience)

Open issues

Should the APIs be loosely coupled (e.g. just via shared IDs) or strongly coupled e.g. by imports and shared data structure use?
Should there be a plugin writer API on top of a streamlined core API or just one API for everyone.
- Gary: I strongly suggest having just one API for everyone. There is no real differentiation between core and plugin developers, as they both need easy to use APIs to avoid bugs and maintenance headaches - a single, clean, powerful API will keep the core clean and the plugins clean.

Network/Graph

Is extremely fast and memory efficient for creating and updating. Use case: large graphs up to millions of nodes
Support for a multi-graph. Use case: represent basic networks with graph, multiple types of edges in a protein interaction network with a multi-graph.
Support for hypergraph. Use case: biochemical reactions using hypergraph.
Support for nested graphs. Use case: protein complexes using nested graph.
Ability for user to determine what type of graph they are working with: e.g. graph, multi-graph, hypergraph, nested graph (contains groups). Use case: algorithm writer needs to know this so they avoid running an algorithm on an incompatible input data structure.

Open issues

How do we implement more complex graphs e.g. hypergraphs at the same time as regular graphs?
- Possible solutions are:
  - have different classes for each that inherit from the most general class;
  - overlay all types on one simple graph model e.g. use one graph model with special flags for nodes and edges; have a loose association of classes linked only by node and edge IDs, where the more complex model maintains its own consistency if related models are changed. Note: a hypergraph is not a type of graph - it is the other way around.
  - Don't support hypergraphs at all. I think that hypergraphs are a very complicated abstraction and possibly not the correct one for biochemical reactions.
How should these related aspects of the model be kept consistent e.g. are events needed in the core? Yes, events are needed in the core for the network model to communicate state changes to classes that depend on the network (but not vice versa).
Should nodes be shared between networks or not?
Should the node label be in attributes or in the node?
What is the largest size of network we should consider in our design (order of magnitude)? Millions of nodes/edges?

Design Ideas

Have both a fast core graph implementation and a higher-level object oriented network API that makes it easier to manipulate networks for plugin writers. The latter is more memory and CPU intensive, so would have a lazy implementation.

Two proposals have been made but not fully integrated and neither is necessarily final:

Model API Proposal #1

Here is a proposed network API that includes an multi-graph implementation, but not a hypergraph or nested-graph. http://chianti.ucsd.edu/svn/csplugins/trunk/ucsd/mes/api/
See GraphPerspectiveToCyNetwork for details on other issues.

Model API Proposal #2

JavaDocs http://chianti.ucsd.edu/svn/csplugins/trunk/ucsf/scooter/cy3PluginAPI This proposal includes all of the interfaces that the graph module is expected to contain, including graph objects, attributes, projects and potentially groups. The basic idea is a two-level API.

Top level (org.cytoscape.model) provides a clean model for accessing and manipulating Cytoscape networks. It includes the familiar CyNetwork, CyNode, CyEdge, and CyAttribute interfaces, in addition to support for CyHyperEdge and CyGroups. CyProject is an interface that is a first step towards documenting data provenance, and providing an interface to the information that is currently held within CytoscapeSession.
Lower-level API (org.cytoscape.model.graph) consists of a single object: Graph. A Graph is a very low level interface, providing access to integer arrays of nodes and arrays of edges, and basic connectivity. A Graph can be used when performance is critical, but in general plugin writers will not have to deal with/think about it.
All networks, nodes, and edges are represented by integer identifiers that are unique to this session. CyNetwork does not provide an interface to the associated Graph -- they are implicitly linked by their shared network identifiers. Similarly the correspondence between CyNodes and nodes in the Graph is solely their shared node identifiers. CyNetwork does not inherit from Graph, but the CyNetwork implementation is based on the Graph API. One of the strengths of this approach is that CyNodes and CyEdges can be created lazily by calls to CyNetwork. This allows us to read in very large graphs without the overhead of CyNode and CyEdge objects for every node and edge in the graph. Note that the inclusion of getNodesByAttribute and getEdgesByAttribute methods in CyNetwork allows us to only create CyNodes and CyEdges for subsets of the Graph.

A couple of other things to note:

Flags: Saw a requirement to include the capability to set, clear, and test flags on nodes and edges. We may want to broaden this. Currently, these flags can be used to determine some information about the nodes and edges, including whether this node represents a group, or is a housekeeping node (such as a hyperEdge connector node), or whether this Edge is actually a HyperEdge. As I thought about this, it may provide us with a way to add special nodes and edges in the future without needing significant changes to the API. They are also important in that they allow us to generalize some of the special features of Groups and HyperEdges for other things.
Groups: A CyGroup is implemented as an extension of a CyNode. This has some interesting advantages. For example, if we are dealing with a "metaNode", that is, a node that represents a subnetwork, we can set the HOUSEKEEPING flag on the node when it's not collapsed, and it will not be included in node traversal (which is shouldn't be). This is similar in concept to our current "hidden" node approach without the confusing semantics that mix model with view. Another point of discussion: does it make sense to be able to ask a node what groups it's a member of? Since nodes can be in multiple groups, getting the list of groups for a node turns out to be a very common operation, on the other hand, having a getGroupList method in the node interface ties CyNode to CyGroup in a way that might not be appropriate.
HyperEdges: similarly, a CyHyperEdge is an edge that extends CyEdge by adding additional sources and targets. The connecting node that provides the connectivity could be marked as a "HOUSEKEEPING" node and excluded from node lists, etc. In the case of HyperEdges in particular, I'm proposing that the methods in CyNode and CyNetwork that depend on edge connectivity be smart enough to support hyperedges (e.g. getNeighborList would not return the node providing hyperedge connectivity, but would return the targets of that node).
CyAttributes: there is a lot of discussion about CyAttributes, so what I have here should be considered only a placeholder. I do think that it makes sense to have both local and global CyAttributes. Local CyAttributes are pretty easy to figure out -- they are just indexed by the network, node, edge integer. Global CyAttributes are particularly difficult because there is no standard key we can use, so they will probably need to be indexed by some well-known local attribute (CANONICAL NAME?).

Attributes

The CyAttributes interface shouldn't be terribly different from what it is now. It basically has 3 different methods, similar to a Map interface:

getAttribute()
setAttribute()
containsAttribute()

In one way or another, the interface will need to support the use of primitive types but also limit the allowed types of attributes to those pre-defined values. In general we want to support 4 primitive types which facilitate inter-plugin communication:

int
float
String
boolean

Then

Lists of primitive types.
Maps of primitive types.

Finally, we would also like to support the recursive definition of these classes to support arbitrary types composed of the 4 primitive types:

Lists of Lists or Maps of primitive types.
Maps of Lists or Maps of primitive types.

Other requirements include:

Is fast for writing and reading and is memory efficient. Use case: large gene expression data sets
Ability to optionally backend attributes to a database. Use case: large gene expression data sets

Note: several API's were proposed through email attachments and have not yet been added here. Be sure to look back through email discussions on attributes to find them.

Open issues

The central dilemma of the Model API design is how the Attributes module relates to the Network module. In 2.x era Cytoscape, there is no explicit relationship between these modules. Attributes don't (explicitly) depend on Network and Network doesn't depend on Attributes. However, there is an implicit dependency connecting the two through the use of string identifiers. This means that Attributes effectively depends on a Network model that uses strings to identify graph objects.

What requirements are not met by current CyAttributes class (it seems to meet most current requirements)
- Large amounts of boilerplate code.
  - Any time where case statements are needed to evaluate how to behave with a certain attribute type. A brief check tells me this happens in the vizmapper, layouts, several parsers, and all over the implementation of attributes. This is especially problematic because most case statements don't account for all attribute types, which leads to things being mishandled.
  - The implementation of CyAttributes where all of the setAttribute methods are nearly identical. And so are the getAttribute methods. This is exactly the sort of thing that generics were designed for.
- The current global CyAttributes objects result in many very problematic dependencies that aren't apparent in interfaces. Because access to the CyAttributes objects are from a single static method, anyone can use these classes. Dependencies are OK when explicit (i.e. in the method or constructor signature), but when they're hidden through the use of static methods they make the code much less modular and much harder to test. Fixing this would be a great use case for a Dependency Injection pattern (i.e. Guice or Spring).
How should we implement local vs. global attributes?
- What is the use case for global attributes? If we've decided that nodes and edges are specific to networks and that they won't be re-used, what need will we have for global attributes?
How should special attributes be implemented? E.g. hidden.

Design Ideas

There are essentially two approaches to linking the Attributes module and the Network module:

Attributes depends on Network.
Network depends on Attributes.

Approach 1: Attributes depends on Network. Leave things pretty much as they are, but use an SUID instead of a string identifier. Attributes will at once be both global (i.e. accessible in any context) and local since the SUID is specific to a node/edge/network and not reused in multiple networks (as nodes and edges are now). In this case, Attributes depends on a Network modelthat uses SUIDs, so the dependency is much like it currently is, just not with strings.

For this approach the method interfaces would look something like this:

getAttribute(int SUID, String attrName);
setAttribute(int SUID, String attrName, Object attrValue);
containsAttribute(String attrName)

where the SUID is used to identify which object the attribute is bound to.

Advantages:

Conceptually simple: an SUID accesses attributes.
Similar to what we have now, so there won't be many conceptual hurdles for plugin developers.
Would require relatively few changes.
Still easy for attibute browser and other to see all attributes.

Disadvantages:

Extra steps are involved for getting the attribute, e.g.
- attrs.getAttribute( node.getSUID(), ...);
Saving state might be harder since the mapping from attribute to graph object is ephemeral (SUID).
???

Approach 2: Network depends on Attributes. Reverse the dependency so that Networks depend on Attributes and make local Attribute objects part of the CyNetwork/CyNode/CyEdge classes. Instead of getting attributes from one location, attribute objects will become available from individual objects, such as nodes, edges, or networks.

For this approach the method interfaces would look something like this:

getAttribute(String attrName);
setAttribute(String attrName, Object attrValue);
containsAttribute(String attrName)

where the object binding is implicit to the object.

Advantages:

Conceptually simple: an object has attributes.
Simple access to attributes.
- node.getAttribute(...), edge.getAttribute(...)

Disadvantages:

A big change since this is opposite of how we do things now.
A conceptual shift for plugin developers.
Perhaps more difficult to track what attributes currently exist since you'd have to know what objects have attributes.
???

Resolution

We've decided to begin with approach 2 as it seems to make the most sense to people that attributes are connected directly to their graph objects.
We've initially decided to keep the concept of CyDataTable separate from CyAttributes.

Identifiers

Gary's ID proposal from Dec.15.2007:

Graph objects = nodes, edges, groups, hyperedges (these are children of GraphObject)
Each graph object has a session unique integer ID (G-SUID) (not globally unique, just unique for a given Cytoscape session)
Each attribute row in a table has a SUID (A-SUID - not the same space as the G-SUID)
A map exists between graph object SUIDs (G-SUIDs) and attribute table row SUIDs (A-SUIDs). This is simple and allows a lot of flexibility. The presence of SUIDs mean you don't have to track IDs in any context, which is the simplest option. This also allows multiple attribute tables (think relational database tables). For instance, we can implement user attributes in one table and hidden attributes in another table. In this case of >1 attribute tables, you need to store table context in the map. Further, we can have multiple attribute tables in the same space e.g. load up 2 gene expression datasets in 2 different tables. This gets rid of the need for empty attribute table cells. I don't think we should really implement this latter option, as the GUI is more complex and we already have an efficient CyAttribute data structure (MultiHashMap), but having it is only a change in how the graph object SUID to attribute SUID map is used. Also, the presence of a GraphObject superclass will make it easier for more general algorithm code to be written (really important!) and provides a place to track G-SUIDs. Also, the GraphObject class can have additional children if we need to add more things in the future. The advantage of this design is that there is no coupling in the model i.e. network doesn't know about groups leaving the model layer to be very flexible, however, composition (bringing different parts of the model together) in the application layer is facilitated. What about network attributes? It's just another SUID map. Networks are separate from other objects that have SUIDs because of the above point about algorithm generality. You don't want graph algorithms to work on networks as a GraphObject (and semantically, networks should not be network objects, otherwise we introduce a high level of complexity into the model by having hierarchy in there by default - you could be tempted to implement groups like this, but I'm not sure it would be a good idea due to the complexity issue).
Graph objects are not shared between graphs. This is important for reducing complexity. If you want to have network specific attributes or shared attributes across networks, just update the SUID map. Challenges: keeping the map up to date is work, unlike in our current model. It could be implemented either using an event based manager design or building a layer on top of the simple model APIs that handles adding attributes to graph objects and keeping the map current. This is all application layer work, though, and the application sets the policy of attribute sharing among graph objects.

Notes:

Persistence: The only rule for persistence would be uniqueness within a session or file, so you can write or read IDs however you want, as long as you maintain uniqueness per session. This would likely involve renumbering the SUIDs of all graphobjects upon reading from a session file.

Network merging: you can't use SUIDs to determine equivalence between graph objects. This needs to be done at the application layer using additional information, such as attributes of those graph objects.

Open Issues

Should we implement multiple maps to handle local vs. global vs. hidden attributes?
Should users be able to use this like a database and just create their own tables? (with Cytoscape having default tables defined for graph object attributes?)
Should we use an existing database layer to implement this or just use a basic map?
Need to decide a data sharing policy for nodes e.g. no sharing of attributes (memory intensive in the worst case) vs. sharing by default (current case, breaks in some cases e.g. local attributes). The simplest level of the proposal does not deal with shared graph objects or attributes (1:1 mapping). If you want to have 1:many mapping from graph objects to attributes (shared attributes/node), you have to do extra work (though Sarah had a good point that databases may already do some of that work). You could also have other mapping relationships i.e. many:1 (shared nodes/attribute) or many:many, but I don't think we want those, at least initially, because their complexity doesn't justify their use in terms of our use cases.

Resolution

We've decided to keep the CyAttributes concept and CyDataTable concept separate for now.
All nodes, edges, networks, etc. will have SUIDs.

Conceptual Split of CyAttributes and CyDataTable

Trey has suggested that one of the major advantages of Cytoscape is that it provides a link between network data and non-network data, without necessarily constraining either side. As we've discussed how to provide both per-object CyAttributes as well as "global" CyAttributes, an approach has emerged that might help us conceptualize what we're trying to achieve. What follows is an articulation of that conceptualization. The main point is to separate the concept of a CyAttribute, which would be bound to a GraphObject and a CyDataTable, which would be explicitly unbound. This has several implications:

CyAttributes would behave pretty much as they do now, although they would always be "local". Because they are explicitly local, we could also provide methods in GraphObject to get the attributes for that object. For new developers, this would be a more natural layering, I think.
CyDataTable could be implemented as a MultiHashMap, but it could also be implemented as an interface to a relational database, either an in-memory implementation or a client-server implementation.
If we want to provide multiple backends for CyDataTable (a good idea, I think), we should think very hard about the interface a back-end would need to provide and how the features of the backend can be appropriately exposed.
There needs to be a mechanism to explicitly link a CyAttribute to a cell in a CyDataTable. There are two pretty straightforward approaches to this:
1. There could be an application-level function that would allow the users to bind a CyAttribute to a CyDataTable. This would essentially just copy the data. This has several advantages:
  - CyAttribute and CyDataTable would be completely separate at the Model layer. No danger of cyclic dependencies.
  - CyAttributes are pretty fast, and we could maintain that speed for all of the things we use CyAttributes for (e.g. VizMapper).
  But, the application layer interface would need to listen for events to update the CyAttribute if the data in the CyDataTable changes.
2. Alternatively, we could have a special kind of CyAttribute that is an explicit link to a row and column in a CyDataTable (or a single-cell result from a query into a CyDataTable?). This would be very cool, and provide significant expansion possibilities, but it seems to me that it could also make the overall performance of Cytoscape dependent on the performance of the CyDataTable backend, which is risky.
In either approach, this resolve some of the ID mapping issues discussed above -- there becomes an explicit way (and a clear conceptual model) for the developer and user to map CyAttribute (easy, since they are explicitly bound to a GraphObject) and CyDataTable data.

Open Issues

Backwards compatibility: would we need to predefine three data tables for unbound node, edge, and network attributes?
What is the user interface to a CyDataTable? Could we just extend the Data Browser to include a tab for every loaded data table, or do we need something more elaborate?
How do expose the underlying capabilities of a backend? If our default implementation is based on MultiHashMap, I really don't think we want to try to implement an SQL interface to MultiHashMap. On the other hand, if we've got an embedded interface, we may certainly want to expose an SQL interface.

Resolution

We've initially decided to keep the concept of CyDataTable separate from CyAttributes.

Events

Define a low level event mechanism that can be used by modules to communicate their state changes to other modules that depend on them, but not vice versa.
We should isolate all data access to the Event object instead of the Listener interface. The goal is to allow the type of data returned or accessible by the event to be changed in a painless way. If the interface encodes this information then to change the type of data returned, we'd need to change every implementation of the interface!!! Obviously, this isn't a flexible design. An example of where this is done incorrectly is the MulitHashMapListener. Contrast this with the GraphPerspectiveChangeListener which demands that only one method be implemented. All of the data that needs to be passed along for a GraphPerspective change is captured in the GraphPerspectiveChangeEvent, which would be relatively easy to add methods to without impacting anyone as long as the existing interface is supported.
An OSGi best practice is to use the whiteboard pattern and ServiceRegistry to listen for events. The general idea is that the Listener interface becomes a service and an event Listener there registers itself as a service provider with the ServiceRegistry. This means that an event producer does not keep track of a list of listeners itself, instead it simply queries the ServiceRegistry to find all Listener services. Once it has them, it calls the appropriate Listener methods. This completely decouples event Producers and even Listeners.

Concerns

The whiteboard pattern registers a listener interface globally. For instance a NetworkListener interface would add itself to the service registry, but then that listener would hear events for any network that fires events. If the listener only wanted to hear events for a specific network, then it would be forced to listen for all events and check each one if it is for the proper network. If we used the standard listener pattern, then Listeners would register themselves with specific networks and would only hear events from that specific network. However, if listeners were forced to register with each network individually, then they would also need to listen for NetworkCreated events to get the networks which they could then register themselves with.

Synchronous vs. Asynchronous events.

Synchronous
- events are simple and simply coded.
- There is a HUGE potential for listeners to severely impact the performance of Cytoscape by Listeners doing some intensive computation in the listener method.
- This potential performance issue exists now and while it hasn't been a severe problem, it has bitten us a few times.
Asynchronous
- This eliminates the performance issue of Listeners but introduces the need for threads and thread safety.
- Creates lots of opportunities for thread weirdness and out-of-order execution problems.
- Any module that emits asynchronous events would need to be thread safe. Or is is merely those classes referenced by the event object itself which would need to be thread safe? Either way, this is a VERY big change from how we've been doing things, but this is likely necessary for other reasons as well.
- Creating a network. Right now the network gets created, then the view, layouts are applied, etc., and this all happens in "control" code in Cytoscape.java. This will go away, but will need to be replaced with a mechanism that is similarly effective and provides a simple interface. Call createNetwork on CyNetwork, when it's finished, it fires a NetworkCreated event, which the View hears, creates a view, etc., then fires a ViewCreated event, the layout hears the ViewCreated event and modifies, the view accordingly and the same happens with the VizMap and anyone else who's listening. As each listener to the ViewCreated event updates the View, the view fires a ViewChanged event which the Renderer hears and redraws the Graph. Could this work with asynchronous events?

What if we allow thread safe libs to emit asynchronous events and force non-thread safe libs to fire synchronous events? Can we even have a mix of synchronous and asynchronous events?

Group related events using inheritance. If a network fires 4 network change events (add node, add edge, delete node, delete edge), but all you care about is that the network changed and not about specific nodes or edges, then instead of listening for 4 different types of events it would be useful to just listen for a single "change" event. Can we provide support for this that doesn't result in lots of duplicate events being fired?

Design Ideas

Event type

The Event type model means there is one Event object that is differentiated by a String or some other identifier. The data that defines the state of the event is generally captured in some sort of untyped Object payload like Object getNewValue(); or Object getOldValue();. This is how PropertyChangeEvents work.

Advantages

It is easy to write a listener since you only have to write one method to handle all Event types.
New events are easy to define, you just give them a new identifier.

Disadvantages

Your one listener method must differentiate between the possible Event types, which can lead to big (ugly) if/case statements.
It's hard to know what events are available since the identifiers are not captured in any single place.
Potential performance hit if the end user is not careful about which Event types s/he handles.

Inheritance

Instead of differentiating Event objects by a String or identifier, you differentiate based on object type. The important aspect of this difference is that differently typed objects can customized access to state data. So, instead of Object getNewValue(); you can have CyNetwork getNewNetwork();, which is less prone to error. An inheritance event model allows for two types of Listener models: a single listener that listens for events of the base type or (many) typed listeners that listen for events of a specific type.

Single Listener Advantages

It is easy to write a listener since you only have to write one method to handle all Event types.
New events are easy to define, you just need to extend the base class.

Single Listener Disadvantages

Your one listener method must differentiate between the possible Event types, which can lead to big (ugly) if/case statements.
It's hard to know what events are available since the events are not defined in any single place. However, javadoc handles this by capturing the inheritance relationships between events, which can very helpful.
Potential performance hit if the end user is not careful about which Event types s/he handles.
Listeners are less type safe.

Multiple Type-specific Listener Advantages

You only listen to the events you're interested in.
New events are easy to define, you just need to extend the base class.
Less chance of performance issues since you only listen to the events you're interested in.
Events are more type safe.
Listeners are more type safe.

Multiple Type-specific Listener Disadvantages

You'll need to write at least one listener method for each Listener interface you implement.
New events will require a new Listener interface as well.
It's hard to know what events are available since the events are not defined in any single place. However, javadoc handles this by capturing the inheritance relationships between events, which can very helpful.
It's hard to know what Listeners are available since they are not defined in any single place. Javadoc doesn't help here.

Resolution

We've decided that we'd prefer to use the inheritance model rather than the event type model.
We'd prefer multiple listener interfaces, but we'd also like to explore using inheritance to group similar events.
We will provide facilities for both Synchronous and Asynchronous events. The challenge will be that any event (and any object referenced by the event) that is fired asynchronously will need to be thread safe. Since almost nothing in Cytoscape is currently thread safe, we will initially fire most events synchronously.
We will initially use the OSGi Service Whiteboard pattern for registering event listeners. The primary concern is with large numbers of events and potential impacts on performance. We will monitor this closely.
Event listeners can be grouped using inheritance and by using reflection we can fire events both listeners and super classes of listeners. We will implement this as needed.

An initial attempt at the event handling API is available here: http://chianti.ucsd.edu/svn/csplugins/trunk/ucsd/mes/api/src/main/java/org/cytoscape/event/, with some sample event and listener interfaces defined here: http://chianti.ucsd.edu/svn/csplugins/trunk/ucsd/mes/api/src/main/java/org/cytoscape/network/events/

Groups

A Group is a concept where multiple nodes and edges get collapsed into a single node in a network view. There are several use cases for groups. Here is a link to 13 detailed use cases.

ScooterMorris: In the current implementation, the concept of a Group is a collection of Nodes. We intentionally avoided limiting the concept of groups to subnetworks or metanodes since there were other use cases that did not involve collapsing and expanding functionality, which we considered more of a View function. The current implementation supports a separation between the way a group is viewed and the group object itself. Group viewing is implemented by a CyGroupViewer, of which I am aware of three examples: namedSelection, metanodePlugin2, and bubbleRouter. Of these, only the metanodePlugin2 viewer supports expand/contract.

My conceptual model for groups derives from biological pathways, where you want to represent a collection of proteins as a single node (e.g., a protein complex or set of paralogs). You may want to view the collection as a metanode or as an expanded (stacked or boxed) group. In terms of the model, it is critical that groups facilitate the mapping of attributes from the children nodes to the group node to allow, for example, the display of an average value on the group node calculated from the children. Here the children nodes are conceptually part of the network, even when collapsed. For example, if I ask how many proteins are in the pathway, I expect the children of every group to be counted. Use case 14
The cluster code we're developing creates groups for each cluster. In the case of a hierarchical cluster, we build a hierarchical series of groups, which are visualized using the namedSelection plugin. In the case of an MCL or K-Means cluster, we build groups which use the metanode viewer. Use case 15
For our large superfamily networks, each subgroup and family is represented as a group and typically visualized using the metanode plugin. This allows us to simplify a complex network by collapsing it to see just the families or subgroups. Use case 16

MikeSmoot: The 13 use cases above and described here as well as Scooter's three motivations (henceforth use cases 14-16) can be abstracted into two different classes:

The collapse-expand, metanode view where multiple nodes in a network can be collapsed into a single view element the reverse where the single view element can be expanded into the original, multiple nodes. This subsumes use cases 1,2,3,4(?),5,6,8,11,12,14. Use case 13 can be generalized to the question of how algorithms that operate on networks support metanodes, i.e. do the algorithms see an expanded or collapsed network?
Nodes are "grouped", which is to say identified in some way as special, but no particular visualization semantics are attached the group. This includes use cases 4(?),7,9,15,16.

Use case 10 and the issue of background (and foreground) graphics is, I believe, a totally separate issue.

To my mind the second class of use cases is solved by attributes. To identify a "group" of nodes that are special in some way, just use one or more attributes. This is not to say that there shouldn't be support code to make life easier for people using these attributes (i.e. supporting hierarchical relationships), but I don't see a need for a new data representation when the same functionality is already supported using existing classes.

So that leaves the design for the first class of use cases, or the metanode/collapse-expand problem.

Open Issue

Currently, Groups are modeled as Nodes. I believe that this is the wrong abstraction. A Group consists of nodes and edges just like a network, the only difference being that Groups have the concept of Inner and Outer edges. Inner edges connect two nodes which are both within the group whereas Outer edges connect one node that is within the group and one node that is outside of the group.

NOTE: Currently, a Group is actually not modeled as a Node, a group is represented by a node in the network, which in the metanode case allows groups to have edges, and in other ways provided a way to add attributes to groups, which is critical. It also provided a way to find groups by iterating over all nodes, which was critical for serialization.

Design Ideas

Model Groups as a subclass of CyNetwork with additional methods that return just inner edges and just outer edges.
Nodes can be associated with Groups using attributes that point to network SUIDs.
Several approaches to handling groups:
- We could abandon the current concept of Groups, and implement subnetworks throughout the architecture. This would mean that a Network contains Nodes, Edges, and Networks. This might involve a fair amount of work in the renderer, but this may be the cleanest approach and would offer a reasonably consistent model. It really doesn't meet some of our current requirements, however, since when a subnetwork gets expanded we want nodes within the subnetwork to connect to nodes in it's parent. This is not easily modeled with a hierarchical network without additional support, which complicates the model. I suppose we could add Outer edges to the Network, which has some advantages (Networks could now be modeled as parts of a larger whole).
- Alternatively, we could implement a CyGroup in the model along the lines of the current group concept. A group would not be either a node or a network -- it would be a group, which would be a subclass of GraphObject, which would mean that groups would have attributes just like nodes, edges, and networks. The visual semantics of a group (collapse/expand, selection, visual boxing, etc.) would be handled at the view level.
- We could punt and not provide for groups at the model layer and implement them at the application layer. This would require a lot of tracking at the application layer to maintain node membership, edges, etc., and I think it would be very complicated to implement without some support in the model.

Hyperedges

A hyperedge is an edge in a hypergraph that can connect two or more nodes at once. A multigraph is subset of a hypergraph where one edge may connect a maximum of two nodes. The requirement for hyperedges is the ability to render biochemical interaction diagrams where reactions are modeled as edges between nodes and these reactions are modified by connecting another edge to the existing reaction edge. The appearance is roughly that of one edge connecting 3 nodes, which would imply that a hyperedge is needed.

Open Issues

I think the Hyperedge model is the incorrect abstraction for biochemical interaction diagrams. The problem is that in a hypergraph a hyperedge is a single entity with no distinct parts. In the biochemical interaction diagram example the reaction edge is a distinguishable entity from the modifier edge, meaning that the single hyperedge connecting the three nodes is at least two edges (reaction and modifier). If a true hypergraph were used, the distinction between reaction and modifier would not be possible since a true hyperedge is a single entity. I think the fundamental problem here is that as biochemical interaction diagrams are currently conceptualized, they are not graphs at all!
An implementation of hyperedge currently exists which models hyperedges using connector nodes. That means a reaction edge would actually be two edges connected to an intermediate connector node. Then, any modifier of the reaction is also connected to the connector node. What this effectively does is convert the non-graph biochemical interaction diagram into a simple multigraph.
- (AllanK: one way to conceptualize the distinction between reaction and modifier is to think of each Node as playing a Role in an interaction/reaction/edge. In a simple edge, you'd have one Node with a Role of Source and one Node with a Role of Target. In a hyperedge, you'd have Nodes with Roles of Substrate, Product, Modifier, zero or more of each. In the hyperedge plugin, this Role is represented in the 'interaction' attribute of each of the edges that run from a participant Node to the connector node. If there were a true hypergraph model in which hyperedges were atomic, as Mike describes above, then the semantics of a Role would probably need to be associated with the Node object in order to distinguish between reaction and modifier. This, however, would add complexity to the Node object).
Here are some other pathway types that may benefit from hyperedges:
- Molecular interaction maps, where node-node relationships are not enough, they also use node-edge relationships (e.g. node x inhibits edge y).
- GPML pathways: GPML supports edge-edge interactions, see example. In this example, the blue edge would be a hyperedge, representing the dimerization of P1 and P2.

MikeSmoot: I think in both the case of MIMs and GPML a hypergraph is the wrong abstraction for thinking about node-edge or edge-edge relationships. For example, if the dimerization of P1 and P2 in the previous example happens before inhibition, then I don't think that a hyperedge can capture that information because all the edge knows about are source and target nodes.

Design Ideas

Instead of implementing a true Hypergraph in Cytoscape, I would advocate a variation of the current Hyperedge plugin. Instead of adding (incorrect) hyperedge semantics to the current network model, I would change the way that we present the plugin. The idea is to think of the plugin of a way of simply transforming a biochemical interaction diagram into a multigraph. The advantage of this approach is that we maintain a simple multigraph network model and should allow us to use much of the code that already exists.
- (ThomasKelder: I like this solution, my only request is to make the HyperEdge API more visible by putting it in the core, so that other plugins can easily use it (and are encouraged to do so). Currently, most plugins don't deal with hyperedges, since it's not part of the Cytoscape API. So, if you perform layout or run a network analysis plugin, they treat the internal hyperedge nodes as regular nodes, which is usually not what you want. Perhaps the hyperedge model could be included in the core API as an extra layer over the simple graph model, or, as you suggest, a module that translates between two representations).

Outdated_Cytoscape_3.0/ModelDiscussions (last edited 2011-02-24 15:36:39 by PietMolenaar)