This is the homepage of the Cytoscape 3.0 Model layer design discussion.

Model Layer definition

The model layer contains the object/data model for core data structures useful for Cytoscape or Cytoscape-like software.

Component Modules

Requirements

Core structure of the Model layer

Open issues


Network/Graph

Open issues

Design Ideas

Two proposals have been made but not fully integrated and neither is necessarily final:

Model API Proposal #1

Model API Proposal #2

JavaDocs http://chianti.ucsd.edu/svn/csplugins/trunk/ucsf/scooter/cy3PluginAPI This proposal includes all of the interfaces that the graph module is expected to contain, including graph objects, attributes, projects and potentially groups. The basic idea is a two-level API.

A couple of other things to note:


Attributes

The CyAttributes interface shouldn't be terribly different from what it is now. It basically has 3 different methods, similar to a Map interface:

In one way or another, the interface will need to support the use of primitive types but also limit the allowed types of attributes to those pre-defined values. In general we want to support 4 primitive types which facilitate inter-plugin communication:

Then

Finally, we would also like to support the recursive definition of these classes to support arbitrary types composed of the 4 primitive types:

Other requirements include:

Note: several APIs were proposed through email attachments and have not yet been added here. Be sure to look back through the email discussions on attributes to find them.

Open issues

The central dilemma of the Model API design is how the Attributes module relates to the Network module. In 2.x era Cytoscape, there is no explicit relationship between these modules. Attributes don't (explicitly) depend on Network and Network doesn't depend on Attributes. However, there is an implicit dependency connecting the two through the use of string identifiers. This means that Attributes effectively depends on a Network model that uses strings to identify graph objects.

Design Ideas

There are essentially two approaches to linking the Attributes module and the Network module:

  1. Attributes depends on Network.
  2. Network depends on Attributes.

Approach 1: Attributes depends on Network. Leave things pretty much as they are, but use an SUID instead of a string identifier. Attributes will be both global (i.e. accessible in any context) and local, since the SUID is specific to a node/edge/network and is not reused in multiple networks (as nodes and edges are now). In this case, Attributes depends on a Network model that uses SUIDs, so the dependency is much like it currently is, just not with strings.

For this approach the method interfaces would look something like this:

where the SUID is used to identify which object the attribute is bound to.
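
As an illustration only, Approach 1 could be sketched as below. The names SuidAttributes, MapSuidAttributes, set, get and remove are hypothetical and are not taken from either proposal; the in-memory map is an assumption made purely to keep the example runnable.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of Approach 1: one globally accessible attribute store,
// where every call names the graph object by its SUID.
interface SuidAttributes {
    <T> void set(int suid, String name, T value);
    <T> T get(int suid, String name, Class<T> type);
    void remove(int suid, String name);
}

// Simple in-memory implementation, used only to illustrate the interface.
class MapSuidAttributes implements SuidAttributes {
    // SUID -> (attribute name -> value)
    private final Map<Integer, Map<String, Object>> rows = new HashMap<>();

    public <T> void set(int suid, String name, T value) {
        rows.computeIfAbsent(suid, k -> new HashMap<>()).put(name, value);
    }

    public <T> T get(int suid, String name, Class<T> type) {
        Map<String, Object> row = rows.get(suid);
        // Returns null when the object or the attribute is unknown.
        return row == null ? null : type.cast(row.get(name));
    }

    public void remove(int suid, String name) {
        Map<String, Object> row = rows.get(suid);
        if (row != null) {
            row.remove(name);
        }
    }
}
```

Note that the store never touches a node or edge object; it only sees integers, which is the same implicit coupling the 2.x string identifiers have.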

Advantages:

Disadvantages:

Approach 2: Network depends on Attributes. Reverse the dependency so that Networks depend on Attributes and make local Attribute objects part of the CyNetwork/CyNode/CyEdge classes. Instead of getting attributes from one location, attribute objects will become available from individual objects, such as nodes, edges, or networks.

For this approach the method interfaces would look something like this:

where the object binding is implicit to the object.
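
For comparison, a minimal sketch of Approach 2 follows. LocalAttributes, MapLocalAttributes and SketchNode are hypothetical names; SketchNode only stands in for CyNode and is not the proposed class.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of Approach 2: the attribute object belongs to one graph
// object, so no identifier appears in the method signatures.
interface LocalAttributes {
    <T> void set(String name, T value);
    <T> T get(String name, Class<T> type);
}

class MapLocalAttributes implements LocalAttributes {
    private final Map<String, Object> values = new HashMap<>();

    public <T> void set(String name, T value) {
        values.put(name, value);
    }

    public <T> T get(String name, Class<T> type) {
        // Returns null when the attribute has not been set.
        return type.cast(values.get(name));
    }
}

// Stand-in for CyNode: the node hands out its own attributes.
class SketchNode {
    private final LocalAttributes attrs = new MapLocalAttributes();

    LocalAttributes attrs() {
        return attrs;
    }
}
```

Here the dependency is reversed: the node type has to know about the attribute type, but the attribute type knows nothing about networks.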

Advantages:

Disadvantages:

Resolution


Identifiers

Gary's ID proposal from Dec.15.2007:

  1. Graph objects = nodes, edges, groups, hyperedges (these are children of GraphObject)

  2. Each graph object has a session unique integer ID (G-SUID) (not globally unique, just unique for a given Cytoscape session)
  3. Each attribute row in a table has a SUID (A-SUID - not the same space as the G-SUID)
  4. A map exists between graph object SUIDs (G-SUIDs) and attribute table row SUIDs (A-SUIDs). This is simple and allows a lot of flexibility. The presence of SUIDs means you don't have to track IDs in any context, which is the simplest option.

     This also allows multiple attribute tables (think relational database tables). For instance, we can implement user attributes in one table and hidden attributes in another table. With more than one attribute table, you need to store table context in the map. Further, we can have multiple attribute tables in the same space, e.g. load up 2 gene expression datasets in 2 different tables. This gets rid of the need for empty attribute table cells. I don't think we should really implement this latter option, as the GUI is more complex and we already have an efficient CyAttribute data structure (MultiHashMap), but having it is only a change in how the graph object SUID to attribute SUID map is used.

     Also, the presence of a GraphObject superclass will make it easier to write more general algorithm code (really important!) and provides a place to track G-SUIDs. The GraphObject class can also have additional children if we need to add more things in the future. The advantage of this design is that there is no coupling in the model, i.e. network doesn't know about groups, which leaves the model layer very flexible, while composition (bringing different parts of the model together) in the application layer is still facilitated.

     What about network attributes? It's just another SUID map. Networks are separate from other objects that have SUIDs because of the above point about algorithm generality. You don't want graph algorithms to work on networks as a GraphObject (and semantically, networks should not be network objects, otherwise we introduce a high level of complexity into the model by having hierarchy in there by default; you could be tempted to implement groups like this, but I'm not sure it would be a good idea due to the complexity issue).

  5. Graph objects are not shared between graphs. This is important for reducing complexity. If you want to have network specific attributes or shared attributes across networks, just update the SUID map. Challenges: keeping the map up to date is work, unlike in our current model. It could be implemented either using an event based manager design or building a layer on top of the simple model APIs that handles adding attributes to graph objects and keeping the map current. This is all application layer work, though, and the application sets the policy of attribute sharing among graph objects.
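
The G-SUID to A-SUID map described in points 2 to 5 could be sketched roughly as follows. SuidGenerator, SuidMap, bind and rowFor are hypothetical names, and keying the map by a table name is only one assumed way of storing the "table context".

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical: hands out session-unique integer IDs for one ID space.
class SuidGenerator {
    private int next = 1;

    int nextSuid() {
        return next++;
    }
}

// Hypothetical: maps a graph object SUID (G-SUID) to an attribute row SUID
// (A-SUID), separately for each attribute table.
class SuidMap {
    // table name -> (G-SUID -> A-SUID)
    private final Map<String, Map<Integer, Integer>> byTable = new HashMap<>();

    void bind(String table, int gSuid, int aSuid) {
        byTable.computeIfAbsent(table, k -> new HashMap<>()).put(gSuid, aSuid);
    }

    // Returns the bound row SUID, or null if the object has no row in that table.
    Integer rowFor(String table, int gSuid) {
        Map<Integer, Integer> rows = byTable.get(table);
        return rows == null ? null : rows.get(gSuid);
    }
}
```

Under this sketch, sharing attributes between two graph objects in different networks amounts to binding both G-SUIDs to the same A-SUID, and making them network specific amounts to binding them to different rows.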

Notes:

Persistence: The only rule for persistence would be uniqueness within a session or file, so you can write or read IDs however you want, as long as you maintain uniqueness per session. This would likely involve renumbering the SUIDs of all graphobjects upon reading from a session file.

Network merging: you can't use SUIDs to determine equivalence between graph objects. This needs to be done at the application layer using additional information, such as attributes of those graph objects.

Open Issues

Resolution


Conceptual Split of CyAttributes and CyDataTable

Trey has suggested that one of the major advantages of Cytoscape is that it provides a link between network data and non-network data, without necessarily constraining either side. As we've discussed how to provide both per-object CyAttributes as well as "global" CyAttributes, an approach has emerged that might help us conceptualize what we're trying to achieve. What follows is an articulation of that conceptualization. The main point is to separate the concept of a CyAttribute, which would be bound to a GraphObject, from that of a CyDataTable, which would be explicitly unbound. This has several implications:

Open Issues

  1. Backwards compatibility: would we need to predefine three data tables for unbound node, edge, and network attributes?
  2. What is the user interface to a CyDataTable? Could we just extend the Data Browser to include a tab for every loaded data table, or do we need something more elaborate?

  3. How do we expose the underlying capabilities of a backend? If our default implementation is based on MultiHashMap, I really don't think we want to try to implement an SQL interface to MultiHashMap. On the other hand, if we've got an embedded interface, we may certainly want to expose an SQL interface.

Resolution


Events

Concerns

The whiteboard pattern registers a listener interface globally. For instance, a NetworkListener would add itself to the service registry, but that listener would then hear events from every network that fires them. If the listener only wanted to hear events for a specific network, it would be forced to listen for all events and check whether each one is for the proper network. If we used the standard listener pattern, listeners would register themselves with specific networks and would only hear events from those networks. However, if listeners were forced to register with each network individually, they would also need to listen for NetworkCreated events in order to discover the networks with which they could then register.
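
One possible way to soften the "hear everything" cost is a filtering wrapper, sketched below. ListenerRegistry is only a stand-in for a real service registry, and the NetworkListener signature, the String network identifier and FilteredListener are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical listener signature; the real interface is not defined on this page.
interface NetworkListener {
    void networkChanged(String networkId);
}

// Stand-in for a service registry: every registered listener hears every event.
class ListenerRegistry {
    private final List<NetworkListener> listeners = new ArrayList<>();

    void register(NetworkListener listener) {
        listeners.add(listener);
    }

    void fire(String networkId) {
        for (NetworkListener listener : listeners) {
            listener.networkChanged(networkId);
        }
    }
}

// Wrapper that drops events from all networks except the one of interest.
class FilteredListener implements NetworkListener {
    private final String wanted;
    private final NetworkListener delegate;

    FilteredListener(String wanted, NetworkListener delegate) {
        this.wanted = wanted;
        this.delegate = delegate;
    }

    public void networkChanged(String networkId) {
        if (wanted.equals(networkId)) {
            delegate.networkChanged(networkId);
        }
    }
}
```

The check still runs for every event, so this only moves the filtering out of the listener's own code; it does not remove the dispatch cost that the standard listener pattern avoids.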

Synchronous vs. Asynchronous events.

What if we allow thread safe libs to emit asynchronous events and force non-thread safe libs to fire synchronous events? Can we even have a mix of synchronous and asynchronous events?

Group related events using inheritance. If a network fires 4 network change events (add node, add edge, delete node, delete edge), but all you care about is that the network changed and not about specific nodes or edges, then instead of listening for 4 different types of events it would be useful to just listen for a single "change" event. Can we provide support for this that doesn't result in lots of duplicate events being fired?

Design Ideas

Event type

The Event type model means there is one Event object that is differentiated by a String or some other identifier. The data that defines the state of the event is generally captured in some sort of untyped Object payload like Object getNewValue(); or Object getOldValue();. This is how PropertyChangeEvents work.
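
For reference, this is roughly what the event type model looks like with the standard java.beans classes. TitleHolder and the "title" property are hypothetical; PropertyChangeSupport, PropertyChangeListener and PropertyChangeEvent are the real JDK types.

```java
import java.beans.PropertyChangeEvent;
import java.beans.PropertyChangeListener;
import java.beans.PropertyChangeSupport;

// Hypothetical bean that fires one kind of event object for every change,
// distinguished only by the property name string.
class TitleHolder {
    private final PropertyChangeSupport support = new PropertyChangeSupport(this);
    private String title = "untitled";

    void addListener(PropertyChangeListener listener) {
        support.addPropertyChangeListener(listener);
    }

    void setTitle(String newTitle) {
        String old = title;
        title = newTitle;
        // The payload travels as two untyped Objects.
        support.firePropertyChange("title", old, newTitle);
    }
}
```

A listener has to compare evt.getPropertyName() against a string and cast evt.getNewValue() itself, so a misspelled name or a wrong cast only shows up at run time.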

Advantages

Disadvantages

Inheritance

Instead of differentiating Event objects by a String or identifier, you differentiate based on object type. The important aspect of this difference is that differently typed objects can provide customized access to state data. So, instead of Object getNewValue(); you can have CyNetwork getNewNetwork();, which is less prone to error. An inheritance event model allows for two types of Listener models: a single listener that listens for events of the base type, or (many) typed listeners that listen for events of a specific type.
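
A minimal sketch of the inheritance model follows. It also suggests one possible answer to the duplicate-event concern: each event object is fired once and delivered to listeners registered for its own type or for any supertype. All class names here (NetworkChangeEvent, NodeAddedEvent, EdgeAddedEvent, TypedDispatcher) are hypothetical and are not the interfaces in the linked event API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical base event: "something in this network changed".
class NetworkChangeEvent {
    private final String networkTitle;

    NetworkChangeEvent(String networkTitle) {
        this.networkTitle = networkTitle;
    }

    String getNetworkTitle() {
        return networkTitle;
    }
}

// Specific event with a typed accessor instead of an Object payload.
class NodeAddedEvent extends NetworkChangeEvent {
    private final int nodeSuid;

    NodeAddedEvent(String networkTitle, int nodeSuid) {
        super(networkTitle);
        this.nodeSuid = nodeSuid;
    }

    int getNodeSuid() {
        return nodeSuid;
    }
}

class EdgeAddedEvent extends NetworkChangeEvent {
    EdgeAddedEvent(String networkTitle) {
        super(networkTitle);
    }
}

// Delivers each fired event once to every listener whose registered type
// matches the event's class or one of its superclasses.
class TypedDispatcher {
    private static final class Entry {
        final Class<?> type;
        final Consumer<Object> handler;

        Entry(Class<?> type, Consumer<Object> handler) {
            this.type = type;
            this.handler = handler;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    <E> void listen(Class<E> type, Consumer<? super E> handler) {
        entries.add(new Entry(type, e -> handler.accept(type.cast(e))));
    }

    void fire(NetworkChangeEvent event) {
        for (Entry entry : entries) {
            if (entry.type.isInstance(event)) {
                entry.handler.accept(event);
            }
        }
    }
}
```

A listener registered for NetworkChangeEvent hears every add or delete, while a listener registered for NodeAddedEvent hears only node additions and can call getNodeSuid() without a cast.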

Single Listener Advantages

Single Listener Disadvantages

Multiple Type-specific Listener Advantages

Multiple Type-specific Listener Disadvantages

Resolution

An initial attempt at the event handling API is available here: http://chianti.ucsd.edu/svn/csplugins/trunk/ucsd/mes/api/src/main/java/org/cytoscape/event/, with some sample event and listener interfaces defined here: http://chianti.ucsd.edu/svn/csplugins/trunk/ucsd/mes/api/src/main/java/org/cytoscape/network/events/


Groups

A Group is a concept where multiple nodes and edges get collapsed into a single node in a network view. There are several use cases for groups. Here is a link to 13 detailed use cases.

ScooterMorris: In the current implementation, the concept of a Group is a collection of Nodes. We intentionally avoided limiting the concept of groups to subnetworks or metanodes since there were other use cases that did not involve collapsing and expanding functionality, which we considered more of a View function. The current implementation supports a separation between the way a group is viewed and the group object itself. Group viewing is implemented by a CyGroupViewer, of which I am aware of three examples: namedSelection, metanodePlugin2, and bubbleRouter. Of these, only the metanodePlugin2 viewer supports expand/contract.

MikeSmoot: The 13 use cases above and described here as well as Scooter's three motivations (henceforth use cases 14-16) can be abstracted into two different classes:

  1. The collapse-expand, metanode view, where multiple nodes in a network can be collapsed into a single view element, and the reverse, where the single view element can be expanded into the original, multiple nodes. This subsumes use cases 1,2,3,4(?),5,6,8,11,12,14. Use case 13 can be generalized to the question of how algorithms that operate on networks support metanodes, i.e. do the algorithms see an expanded or collapsed network?
  2. Nodes are "grouped", which is to say identified in some way as special, but no particular visualization semantics are attached to the group. This includes use cases 4(?),7,9,15,16.

Use case 10 and the issue of background (and foreground) graphics is, I believe, a totally separate issue.

To my mind the second class of use cases is solved by attributes. To identify a "group" of nodes that are special in some way, just use one or more attributes. This is not to say that there shouldn't be support code to make life easier for people using these attributes (i.e. supporting hierarchical relationships), but I don't see a need for a new data representation when the same functionality is already supported using existing classes.
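
As a rough illustration of the attribute-based view of grouping, the sketch below recovers group membership from a single attribute. The attribute (here a plain map from node SUID to a group label) and the AttributeGroups helper are assumptions for the example, not existing Cytoscape classes.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical support code: derive "groups" from the values of one attribute.
class AttributeGroups {
    // Input: node SUID -> value of an assumed "group" attribute.
    // Nodes without the attribute are simply absent from the input map.
    // Output: group label -> SUIDs of the nodes carrying that label.
    static Map<String, Set<Integer>> groupBy(Map<Integer, String> groupAttribute) {
        Map<String, Set<Integer>> groups = new TreeMap<>();
        for (Map.Entry<Integer, String> entry : groupAttribute.entrySet()) {
            groups.computeIfAbsent(entry.getValue(), k -> new TreeSet<>())
                  .add(entry.getKey());
        }
        return groups;
    }
}
```

Nothing here needs a new data representation; the group exists only as long as the attribute values do.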

So that leaves the design for the first class of use cases, or the metanode/collapse-expand problem.

Open Issue

NOTE: Currently, a Group is not actually modeled as a Node. Instead, a group is represented by a node in the network. In the metanode case this allows groups to have edges, and more generally it provides a way to add attributes to groups, which is critical. It also provides a way to find groups by iterating over all nodes, which was critical for serialization.

Design Ideas


Hyperedges

A hyperedge is an edge in a hypergraph that can connect two or more nodes at once. A multigraph is a subset of a hypergraph in which one edge may connect a maximum of two nodes. The requirement for hyperedges comes from the need to render biochemical interaction diagrams, where reactions are modeled as edges between nodes and these reactions are modified by connecting another edge to the existing reaction edge. The appearance is roughly that of one edge connecting 3 nodes, which would imply that a hyperedge is needed.
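
To make the definition concrete, a hyperedge can be sketched as nothing more than a set of two or more distinct node identifiers. SketchHyperedge is a hypothetical class, and using integer SUIDs for the nodes is an assumption.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch: a hyperedge is a set of two or more distinct nodes.
class SketchHyperedge {
    private final Set<Integer> nodeSuids;

    SketchHyperedge(Integer... nodeSuids) {
        Set<Integer> distinct = new LinkedHashSet<>(Arrays.asList(nodeSuids));
        if (distinct.size() < 2) {
            throw new IllegalArgumentException("a hyperedge connects at least two distinct nodes");
        }
        this.nodeSuids = Collections.unmodifiableSet(distinct);
    }

    Set<Integer> nodes() {
        return nodeSuids;
    }

    // An ordinary (multigraph) edge is the special case with exactly two nodes.
    boolean isOrdinaryEdge() {
        return nodeSuids.size() == 2;
    }
}
```

The sketch stores only which nodes take part, with no ordering or edge-to-edge relationship between them.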

Open Issues

MikeSmoot: I think in both the case of MIMs and GPML a hypergraph is the wrong abstraction for thinking about node-edge or edge-edge relationships. For example, if the dimerization of P1 and P2 in the previous example happens before inhibition, then I don't think that a hyperedge can capture that information because all the edge knows about are source and target nodes.

Design Ideas


Outdated_Cytoscape_3.0/ModelDiscussions (last edited 2011-02-24 15:36:39 by PietMolenaar)

Funding for Cytoscape is provided by a federal grant from the U.S. National Institute of General Medical Sciences (NIGMS) of the National Institutes of Health (NIH) under award number GM070743-01. Corporate funding is provided through a contract from Unilever PLC.
