CytoscapeLayerRefactor - Cytoscape Wiki

RFC Name : Layering of Cytoscape Code

Editor(s): Sarah Killcoyne, John Boyle, Mike Smoot

Date: August 16, 2007

Status: Open for comment

Proposal

Many Cytoscape clones are available that utilize both web and server based technologies, use advanced information visualizations of graphs, and support a plugin architecture. While Cytoscape is still preferred, we can improve its functionality in a number of ways, including: clear scripting/macro capability; web based delivery; high throughput asynchronous message based analysis; server/distributed functionality; uniform access to data sources; and a componentized code base (e.g. to change the rendering code). While it may be possible to do many of these things with the current code base, all of them would be simpler and cleaner to implement if the code were relayered to separate the functionality. This would also enable cleaner builds and help keep the code maintainable in the future. This proposal builds upon the following RFCs: (38) Code Layering, (40) Scripting in Cytoscape and (6) Cytoscape Headless Mode Operation.

In order to accomplish the goal of relayering we will introduce use cases/features, including those that require scripting, web front ends and easier maintenance. We will also discuss the package structure and build dependencies. The plan is to move iteratively towards a fully layered architecture by meeting several milestones, which will also allow some forward development of Cytoscape to continue during this work. The current package structure will first be refactored into an intermediate structure that mimics the model/view relayered code we desire; a set of temporary interfaces will then be created to handle the dependencies that remain, which meets the first milestone. We can then take that structure and set up each part as an OSGi service with the registry to meet the second milestone. Finally we can go back and remove the various artificial interfaces and dependencies we introduced initially in order to have a fully layered system. At this point it will be much simpler to set up interfaces for plugins, scripting and command-line control of Cytoscape.

Use Cases

Allows for headless mode/scripting/macros to be implemented fairly simply
Easily change/replace giny backend with other rendering code
State capture
Clear logging
Switch around front ends: swing gui/Spotfire/Web site
Portlets
High throughput analysis (via headless/server/scripting)
Better plugin interfaces using each layer only as required
Distributed/Server based functionality
Robust code
Easier maintenance and testing

Build Dependencies

The proposal is to initially introduce three layers into the code: a Model layer, which contains the business logic; a View layer, containing all the graphical/presentation code; and an IO layer, which controls readers/writers. These will be built as subsystems, with the Model layer having no dependencies on the other layers. The View and IO systems will depend on Model, and Application will depend on all three. Temporarily we will introduce a “Common” subsystem, which all other systems will depend upon, to allow us to factor out dependencies more simply. As the relayering progresses, interfaces from the Common system will be refactored to remove this dependency.
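
The dependency rules above can be checked mechanically. The following is a minimal sketch, not part of the RFC: the class LayerGraph and its methods are hypothetical names, and the layer names are simply the subsystems listed above. It records which layer depends on which, derives a build order, and fails if a cycle is ever introduced.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper that records layer dependencies and derives a build order. */
final class LayerGraph {
    private final Map<String, Set<String>> deps = new LinkedHashMap<>();

    /** Declares that the given layer depends on each of the layers in "on". */
    LayerGraph depends(String layer, String... on) {
        deps.computeIfAbsent(layer, k -> new LinkedHashSet<>()).addAll(Arrays.asList(on));
        for (String d : on) {
            deps.computeIfAbsent(d, k -> new LinkedHashSet<>());
        }
        return this;
    }

    /** Returns the layers so that every layer appears after its dependencies. */
    List<String> buildOrder() {
        List<String> order = new ArrayList<>();
        Set<String> done = new HashSet<>();
        Set<String> visiting = new HashSet<>();
        for (String layer : deps.keySet()) {
            visit(layer, done, visiting, order);
        }
        return order;
    }

    // Depth-first traversal; a layer seen again while still "visiting" is a cycle.
    private void visit(String layer, Set<String> done, Set<String> visiting, List<String> order) {
        if (done.contains(layer)) {
            return;
        }
        if (!visiting.add(layer)) {
            throw new IllegalStateException("Dependency cycle at " + layer);
        }
        for (String d : deps.get(layer)) {
            visit(d, done, visiting, order);
        }
        visiting.remove(layer);
        done.add(layer);
        order.add(layer);
    }

    /** The layering proposed in this section, including the temporary Common subsystem. */
    static LayerGraph proposed() {
        return new LayerGraph()
                .depends("common")
                .depends("model", "common")
                .depends("view", "model", "common")
                .depends("io", "model", "common")
                .depends("application", "model", "view", "io", "common");
    }
}
```

Calling LayerGraph.proposed().buildOrder() yields a compile order that starts with common and model and ends with application. Once the Common subsystem is removed, dropping it from the declarations leaves model with no dependencies at all.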

Package Structure Overview

The packages will be broken into four main groups as described above: model, view, application and io. Each of these packages will contain classes specific to that part of the system.

Model will contain two main packages: network, for the model objects required to create a network and handle network events; and attribute, for the model objects required to create attributes and handle attribute events. The model package will have no dependencies outside of itself (except temporarily the common package).
View will contain two main packages: network, for viewing the entire network at the macro and micro level (and any other views we may wish to add to a network); attribute, for viewing object attributes using tables, notes or other views.
IO/Comms will again contain the two packages network and attribute, for the finders and producers of the two model objects. These will initially be filesystem based only; this can eventually be expanded to handle database or internet finders and producers.
Application will contain several packages that will be specific to the swing gui application including: dialogs, to contain all of the various dialogs required for user interaction or messaging; init, to set up and start the application; process, to handle the tasks various parts of the application need to do; widgets, for bits like the visual styles and menu actions; util, for any utility classes necessary.

Model Layer

The model layer contains the basic data structures necessary for the underlying graph, including networks, nodes, edges and attributes.

Model Definition

The network package structure will be a set of interfaces for the basic network object types. All will inherit from a single interface containing the basic information each network object needs (see below).

BaseNetworkModel: interface for the basic information an object in the network package should contain including unique identifier, name, attributes and selection state
CyNetwork: interface (extends BaseNetworkModel) for additional requirements of a network such as creating, adding and removing a node/edge or groups of nodes/edges
CyNetworkArray: interface (extends BaseNetworkModel) for requirements of lists of networks such as adding, removing and getting a CyNetwork from the list
CyNode and CyEdge are interfaces (extending BaseNetworkModel) for the Node and Edge object types
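
As a rough illustration of the interface hierarchy listed above, here is a minimal sketch. Only the interface names come from this RFC; every method name, and the Simple* in-memory classes, are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Basic information shared by every object in the network package (method names assumed). */
interface BaseNetworkModel {
    String getIdentifier();
    String getName();
    Map<String, Object> getAttributes();
    boolean isSelected();
    void setSelected(boolean selected);
}

interface CyNode extends BaseNetworkModel { }

interface CyEdge extends BaseNetworkModel {
    CyNode getSource();
    CyNode getTarget();
}

interface CyNetwork extends BaseNetworkModel {
    void addNode(CyNode node);
    void addEdge(CyEdge edge);
    boolean removeNode(CyNode node);
    boolean removeEdge(CyEdge edge);
    List<CyNode> getNodes();
    List<CyEdge> getEdges();
}

interface CyNetworkArray extends BaseNetworkModel {
    void addNetwork(CyNetwork network);
    boolean removeNetwork(CyNetwork network);
    CyNetwork getNetwork(String identifier);
}

/** Shared state for the illustrative in-memory implementations below. */
abstract class SimpleModel implements BaseNetworkModel {
    private final String id;
    private final String name;
    private final Map<String, Object> attributes = new HashMap<>();
    private boolean selected;

    SimpleModel(String id, String name) { this.id = id; this.name = name; }
    public String getIdentifier() { return id; }
    public String getName() { return name; }
    public Map<String, Object> getAttributes() { return attributes; }
    public boolean isSelected() { return selected; }
    public void setSelected(boolean selected) { this.selected = selected; }
}

class SimpleNode extends SimpleModel implements CyNode {
    SimpleNode(String id, String name) { super(id, name); }
}

class SimpleEdge extends SimpleModel implements CyEdge {
    private final CyNode source;
    private final CyNode target;

    SimpleEdge(String id, CyNode source, CyNode target) {
        super(id, source.getName() + "-" + target.getName());
        this.source = source;
        this.target = target;
    }
    public CyNode getSource() { return source; }
    public CyNode getTarget() { return target; }
}

class SimpleNetwork extends SimpleModel implements CyNetwork {
    private final List<CyNode> nodes = new ArrayList<>();
    private final List<CyEdge> edges = new ArrayList<>();

    SimpleNetwork(String id, String name) { super(id, name); }
    public void addNode(CyNode node) { nodes.add(node); }
    public void addEdge(CyEdge edge) { edges.add(edge); }
    public boolean removeEdge(CyEdge edge) { return edges.remove(edge); }
    public boolean removeNode(CyNode node) {
        // Removing a node also removes the edges attached to it.
        edges.removeIf(e -> e.getSource() == node || e.getTarget() == node);
        return nodes.remove(node);
    }
    public List<CyNode> getNodes() { return nodes; }
    public List<CyEdge> getEdges() { return edges; }
}
```

Whether removing a node should also remove its edges, as this sketch does, is a design choice the RFC does not settle.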

Model Events

The network event package structure is based on the standard Java event pattern (producers firing events to a list of consumers). The definition of, and reason for, the event objects being passed between objects depend on the type of event, and are detailed in the description below. Each model contains a list of its consumers; however, to allow multiple selection propagation mechanisms to be attached to one model, a series of selection delegates can be attached to the model (each containing a separate list of listeners).

The CyNetworkArray can fire the following events to the related listeners:
- NetworkArrayModelSelectionEvent: Contains a selection object holding the set of identities of the network models that have been selected. The reason can be one of SELECTED or DESELECTED.
- NetworkArrayModelCreateEvent: Contains the identity of the new network model that has been created.
- NetworkArrayModelDestroyEvent: Contains the identity of the network model that has been destroyed. This could possibly be a vetoable action.
- NetworkArrayModelModifyEvent: The reason can be one of NAMEMODIFY, STRUCTUREMODIFY, IDMODIFY, ATTRIBUTEMODIFY. NAMEMODIFY and IDMODIFY mean that the string representations have changed, STRUCTUREMODIFY means that the order of models has changed, and ATTRIBUTEMODIFY means that the attributes/descriptions associated with the NetworkArrayModel have changed.
The CyNetwork can fire the following events to the related listeners:
- NetworkModelSelectionEvent: Contains a list of the edges and nodes that have been selected. The reason can be one of SELECTED or DESELECTED.
- NetworkModelCreateEvent: Contains a list of the new nodes or edges that have been created. The reason that is passed with the event is one of NODECREATE, EDGECREATE or NODESANDEDGESCREATE.
- NetworkModelDestroyEvent: Contains a list of the nodes or edges that were destroyed. The reason for the event can be one of NODEDESTROY, EDGEDESTROY, or NODESANDEDGESDESTROY.
- NetworkModelModifyEvent: The reason for the event can be one of NAMEMODIFY, IDMODIFY or STRUCTUREMODIFY. NAMEMODIFY and IDMODIFY reflect changes to these fields. STRUCTUREMODIFY means that the attached list of nodes and edges has changed; these are changes to their attributes or display properties.
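
The producer/event/consumer pattern and the selection delegates described in this section could look roughly like the sketch below. NetworkModelSelectionEvent and its SELECTED/DESELECTED reasons come from the list above; the listener interface, SelectionDelegate, SelectableNetwork and all method names are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.EventListener;
import java.util.EventObject;
import java.util.List;

/** Event carrying the identifiers of the nodes and edges whose selection changed. */
class NetworkModelSelectionEvent extends EventObject {
    enum Reason { SELECTED, DESELECTED }

    private final Reason reason;
    private final List<String> identifiers;

    NetworkModelSelectionEvent(Object source, Reason reason, List<String> identifiers) {
        super(source);
        this.reason = reason;
        this.identifiers = Collections.unmodifiableList(new ArrayList<>(identifiers));
    }
    Reason getReason() { return reason; }
    List<String> getIdentifiers() { return identifiers; }
}

/** Consumer side of the pattern (hypothetical name). */
interface NetworkModelSelectionListener extends EventListener {
    void selectionChanged(NetworkModelSelectionEvent event);
}

/** Hypothetical selection delegate: each delegate keeps its own list of listeners. */
class SelectionDelegate {
    private final List<NetworkModelSelectionListener> listeners = new ArrayList<>();

    void addListener(NetworkModelSelectionListener listener) { listeners.add(listener); }
    void removeListener(NetworkModelSelectionListener listener) { listeners.remove(listener); }

    void fire(NetworkModelSelectionEvent event) {
        // Iterate over a copy so a listener may unregister itself during dispatch.
        for (NetworkModelSelectionListener listener : new ArrayList<>(listeners)) {
            listener.selectionChanged(event);
        }
    }
}

/** Minimal producer: forwards selection changes to every attached delegate. */
class SelectableNetwork {
    private final List<SelectionDelegate> delegates = new ArrayList<>();

    void attach(SelectionDelegate delegate) { delegates.add(delegate); }

    void select(List<String> identifiers) {
        NetworkModelSelectionEvent event = new NetworkModelSelectionEvent(
                this, NetworkModelSelectionEvent.Reason.SELECTED, identifiers);
        for (SelectionDelegate delegate : delegates) {
            delegate.fire(event);
        }
    }
}
```

Two views that should not share selection state would each register with a different delegate, while views that should stay in sync would share one.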

Attribute Definition

The attribute package structure would contain the basic interface for an attribute and set of attributes.

AttributeModel: interface to define the basic information for an attribute including id, name, namespace, parent namespace and data.
AttributeArrayModel: interface to handle sets of attributes with the basic get, set and remove operations.
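
A minimal sketch of these two interfaces follows. The interface names and the listed fields (id, name, namespace, parent namespace, data) come from the text above; the method names, the size() helper and the Simple* classes are assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Basic information for one attribute, as listed above (method names assumed). */
interface AttributeModel {
    String getId();
    String getName();
    String getNamespace();
    String getParentNamespace();
    Object getData();
}

/** A set of attributes with basic get, set and remove operations. */
interface AttributeArrayModel {
    AttributeModel get(String id);
    void set(AttributeModel attribute);
    AttributeModel remove(String id);
    int size();
}

/** Immutable illustrative implementation; not part of the RFC. */
final class SimpleAttribute implements AttributeModel {
    private final String id;
    private final String name;
    private final String namespace;
    private final String parentNamespace;
    private final Object data;

    SimpleAttribute(String id, String name, String namespace,
                    String parentNamespace, Object data) {
        this.id = id;
        this.name = name;
        this.namespace = namespace;
        this.parentNamespace = parentNamespace;
        this.data = data;
    }
    public String getId() { return id; }
    public String getName() { return name; }
    public String getNamespace() { return namespace; }
    public String getParentNamespace() { return parentNamespace; }
    public Object getData() { return data; }
}

/** Map-backed illustrative implementation, keyed by attribute id. */
final class SimpleAttributeArray implements AttributeArrayModel {
    private final Map<String, AttributeModel> byId = new LinkedHashMap<>();

    public AttributeModel get(String id) { return byId.get(id); }
    // Setting an attribute with an existing id replaces the previous value.
    public void set(AttributeModel attribute) { byId.put(attribute.getId(), attribute); }
    public AttributeModel remove(String id) { return byId.remove(id); }
    public int size() { return byId.size(); }
}
```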

The attribute events package would contain events for create, destroy, modify and selection events (patterned similarly to network events).

The AttributeModel can fire the following events to the related listeners:
- AttributeModelCreateEvent: Contains the identity of the new attribute model that has been created.
- AttributeModelDestroyEvent: Contains the identity of the attribute model that has been destroyed.
- AttributeModelSelectionEvent: Contains a selection object with a list of attribute models that have been selected. The reason is either SELECTED or UNSELECTED.
- AttributeModelModifyEvent: The reason can be one of NAMEMODIFY, NAMESPACEMODIFY, DATAMODIFY. NAMEMODIFY means that the string representation has changed, NAMESPACEMODIFY means that the namespace of the attribute or its parent has been modified, and DATAMODIFY means that the data contained by the attribute has been modified.

View Layer

The View layer is dependent on the model. It provides visual representations of the model, and will include additional state information. Controller classes will not be used in the first instance, but will be introduced when/if a web based system is introduced. The planned views include:

Macro network view:
- The current network view that a user interacts with in the desktop version of Cytoscape
- Web based network view for user interaction
Micro network view:
- A small view like the current “birds-eye-view” in the desktop
Attribute table view
- Table containing the attribute information for each object (network, node, edge) via a swing gui or an html table
Attribute notes view
- View of a single attribute visually attached to the object it is annotating, similar to Mac Stickies, a table or pop-up html
Group (networks, nodes, edges, attributes) views
- Groups of nodes or edges could be viewed with shapes around the grouped objects in a swing or html view, as separate sub-networks connected to a single parent node or groups of attribute notes (e.g. Mac Stickies)
Portlets
- View a network or set of networks (macro or micro) and associated attributes in separate portlets

IO Layer

IO/Comms layer handles network creation from both remote and local locations. The system is designed to be extensible to a number of different scenarios (including web service, J2EE, database and file system based loading). A basic single login security model and location “property” are suggested.

Producers are used to access single Models (either CyNetwork or CyNetworkArray) and so require a unique id, whilst finders provide “searching” functionality. The searching specification can either be through a generic method or through specific method signatures. The security model takes login and location information (e.g. JNDI resource, file name). Initially only a file system based implementation is planned.
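
To make the producer/finder split concrete, here is a sketch of a file system based variant. None of these types are defined by the RFC: Producer, Finder, Location, FileFinder and FileLinesProducer are hypothetical names, the "generic method" form of searching is assumed, and the produced model is simplified to the raw lines of a file rather than a CyNetwork.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Produces a single model object given its unique id. */
interface Producer<T> {
    T produce(String id) throws IOException;
}

/** Searches a location and returns the ids of the matching models. */
interface Finder {
    List<String> find(String nameFragment) throws IOException;
}

/** Login and location information; both fields are illustrative. */
final class Location {
    final String login;
    final Path root;

    Location(String login, Path root) {
        this.login = login;
        this.root = root;
    }
}

/** File system finder: ids are the file names directly under the location root. */
final class FileFinder implements Finder {
    private final Location location;

    FileFinder(Location location) { this.location = location; }

    public List<String> find(String nameFragment) throws IOException {
        // Files.list returns a stream that must be closed, hence try-with-resources.
        try (Stream<Path> files = Files.list(location.root)) {
            return files.map(p -> p.getFileName().toString())
                        .filter(n -> n.contains(nameFragment))
                        .sorted()
                        .collect(Collectors.toList());
        }
    }
}

/** File system producer: returns the raw lines of the file with the given id. */
final class FileLinesProducer implements Producer<List<String>> {
    private final Location location;

    FileLinesProducer(Location location) { this.location = location; }

    public List<String> produce(String id) throws IOException {
        return Files.readAllLines(location.root.resolve(id));
    }
}
```

A database or web service implementation would provide the same two interfaces with a different Location, which is the extensibility this section asks for.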

Additional Packages (Layers)

The Application package will contain anything specific to the gui application (e.g. Swing for a desktop application).

Dialogs required for user interaction with any part of the view
- Menus/Menu bars
- Various window panels
Initialization classes to start the application and associated models/views.
Process classes to handle the various tasks a gui application requires such as downloads, creating network views etc.
Widgets for the discrete bits of the application that are specialized
- Vizmapper (visual styles)
- Actions to handle menu interaction
- Layout classes to handle laying out the visualized networks
Util classes

Common package will contain all the glue interfaces required as we separate out the layers. This would be a temporary package that will eventually be removed when the final layered system is complete.

Project Management

Project Plan

Excel version of the plan: RefactorProjectPlan.xls

Tasks and Milestones: All time estimates based on 1 FTE.

Milestone/Deliverable 1: Completed modular systems.
1. Refactor the packages (estimated at 1 month): move the code into the new package structure. Starting with the model each package should compile in the order demonstrated above. The model should have no dependencies outside itself. View, IO and Plugin depend on the model and the application will depend on Model, View and IO.
2. Introduce interfaces (estimated at 3 months): use ‘artificial’ interfaces to break dependencies between classes in different packages. Starting with the model, write interfaces to allow other layers to access the model and remove all dependencies to classes outside the model. Once the model has no dependencies, work on other refactoring to ensure dependencies are correct and unidirectional.
3. Unit test redevelopment and documentation (estimated at 2 months).
Milestone/Deliverable 2: Components based runtime
1. Produce componentized build (estimated at 1 month): build the system as a series of artifacts (using Maven or similar).
2. Produce and register the relevant OSGi services (estimated at 1 month): Register each of the new components as services.
Milestone/Deliverable 3: First relayered system.
1. Remove dependencies (estimated at 6 months): Recode Cytoscape to reduce and repackage the artificial interfaces.
2. Redo build and services/components (estimated at 2 months). First release of new componentized Cytoscape.
3. Unit test redevelopment and documentation (estimated at 1 month).

Project Dependencies

The following projects depend on the successful completion of the milestones in this project.

Plugin API - after the 3rd Milestone
- includes OSGi wrapping
Data integration architecture - after the 1st Milestone
- Web services such as id mapping
- See RFC 39
Web based prototype - after 3rd Milestone
- SVG or google toolkit UI
- Contain some data tools for accessing public interaction data
Headless - after 2nd Milestone
- Command-line interface, requires good interface for commands that are available
- See RFC 6
Scripting interface prototype - after 2nd Milestone
- User defined macros
- Ruby, Python, etc
- Maybe use similar interface to the command line interface implemented in Headless mode
- See RFC 40
Cytoscape server prototype - after 3rd Milestone
- Include database of (some) interaction information
- Lay web prototype over top for some gui functionality
Basic network analysis functionality - after 1st Milestone
- See RFC 41

Issues

There are a few issues that need to be resolved:

Graph Layout manager: is there a requirement to have two views share the same layout instance? If so, then a layout manager/handler with graph update events is needed (via a delegate/handler system attached to the model). Additionally, should graph layout be done through a separate model attached to the view, allowing for easier maintainability of graph layout occurring in a separate thread?
Process Management: resource management is going to be needed if we wish to enforce “good behavior” of the system (and the plugins). If a process manager is needed, which will provide a basic state machine for thread based operations, then a process package should be introduced. This issue is related to the Plugin API RFC.
Attribute change propagations should be channeled through a high level object to keep the number of (delegate) objects small. These could be through the AttributeArrayModel or through the CyNetwork.
NetworkArrayModelModifyEvent could alternatively use a strongly typed system aligned with swing ListModel (including positional based insert based reasons).
Setting up all network objects to be parts of lists (node lists, edge lists) to be able to create specific groups of these objects based on some parameter
Determine what properties (attributes) belong to the model natively. Need to discuss specifically what belongs to a node: X, Y, Z, T? Others?

Alternative #1

Here is an alternative package structure. It looks complicated, but the dependencies are acyclic. Some existing class names are added for clarity.

At the bottom is the model package which contains the network model interfaces and within a subpackage or as an OSGi bundle, the implementations.
The view package contains interfaces that define the visualizations of the network model interfaces. The view package is only dependent on the network model. It is unclear to me whether implementations of the view (e.g. DingNetworkView) would be part of this package, an implementation subpackage, or part of the application.
The attributes package contains the interfaces that define the binding of arbitrary attribute data to the network model. This package only depends on the network model.
Instead of a distinction between view attributes and model attributes, the vizmap package defines how the attributes map to the view.
layouts depends on both the view and attributes packages. It may also need to depend on the model package.
io depends on several packages. The primary reason for this is that most graph/network file formats conflate the network topology with the network visualization. If this weren't the case then io could be hidden in respective subpackages which would hide topology or visualization specific code.
actions is meant to provide basic commands for cytoscape such as loadNetwork, layoutNetwork, destroyNetwork, createView, etc. There are no dependency arrows for this package because I'm undecided on how the dependencies should flow. I'm beginning to think that there should be an Action interface that each package can provide implementations of and the actions package wouldn't actually contain anything but the interface and an action broker or manager. This would mean that every package would depend on actions. This approach would have the nice effect of isolating model specific actions in the model package, layout specific actions in the layout package, etc.
application will contain the application specific code. Initially, this will be most of the GUI code in cytoscape. There could eventually be alternative implementations, such as an AJAX interface, or a command line interface. There are no dependency arrows here because this package will depend on everything.
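
The Action interface and broker idea described for the actions package could be sketched as follows. This is one possible reading, not a settled design: the method names, the string-keyed argument map, ActionBroker and LayoutNetworkAction are all assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** A named command that any package can implement. */
interface Action {
    String getName();
    void execute(Map<String, String> arguments);
}

/** Registry that looks actions up by name (hypothetical broker/manager). */
final class ActionBroker {
    private final Map<String, Action> actions = new LinkedHashMap<>();

    void register(Action action) {
        // Reject a second action with the same name instead of silently replacing it.
        if (actions.putIfAbsent(action.getName(), action) != null) {
            throw new IllegalArgumentException("Duplicate action: " + action.getName());
        }
    }

    void invoke(String name, Map<String, String> arguments) {
        Action action = actions.get(name);
        if (action == null) {
            throw new IllegalArgumentException("Unknown action: " + name);
        }
        action.execute(arguments);
    }

    Set<String> names() { return actions.keySet(); }
}

/** Example of an action the layouts package might contribute (hypothetical). */
final class LayoutNetworkAction implements Action {
    public String getName() { return "layoutNetwork"; }

    public void execute(Map<String, String> arguments) {
        System.out.println("Laying out network " + arguments.get("network"));
    }
}
```

With this shape, the actions package holds only Action and ActionBroker, and a command line or scripting front end needs nothing more than the broker to drive the application by action name.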

Comments

How to Comment

Edit the page and add your comments under the provided header. By adding your ideas to the Wiki directly, we can more easily organize everyone's ideas, and keep clear records. Be sure to include today's date and your name for each comment. Try to keep your comments as concrete and constructive as possible. For example, if you find a part of the RFC makes no sense, please say so, but don't stop there. Take the extra step and propose alternatives.

AllanK: I have a couple of questions:

1. Can the architecture be utilized to provide some orchestration of user-level event handling, for example multiple, independently-written plugins that might compete over processing of a DropTarget? If so, then how would we do that?

SarahKillcoyne Aug 20, 2007 My understanding of this issue is the use case in which two or more plugins attempt to handle a particular event. I think the organization of this resides in the View layer, but that the Plugin API(s) should be written to handle resource allocation, rather than the view itself.

2. What does the network model assume about whether nodes, edges, groups, attributes are shared vs. copied when they appear in multiple networks? Our current set of conventions is quite complex. Can we achieve some simplification during the process of re-layering?

SarahKillcoyne Aug 20, 2007 I'm basing this on the idea that only parent/child/sibling networks share nodes and edges. If any two networks can share nodes and edges I'm not sure how I would make this simpler except to say that only parent/child/sibling networks should do it. With that assumption I can think of two use cases and ways to handle them that I think should be simpler than what happens now.
- User has a large network, wants to view a selection of that network and make modifications. In this case I think we don't make an entirely new network model. We make a view for that selection, any changes made in the view propagate to the network and any other views of that network. This kind of does away with the subnetwork idea by offering views into the network instead.
- User has a large network and chooses to create a duplicate network (maybe with less/more nodes/edges). This should be a new network model with new nodes and edges and made clear to the user that this is the case.
I think this is simpler because in neither case do we need to handle sharing of any model objects. We would need to make very clear options for what the user does (creating a view vs a new network) but I think it's a neat idea too in that you could pop up certain groups from a network in a view and see changes back in the larger network. Clearly this is a departure from what we have done and needs discussion.
MikeSmoot Aug 23, 2007
I don't know that the model actually needs to know anything about shared nodes/edges. I think the fact that the RootGraph is currently exposed in a lot of interfaces is a design flaw. I think the use of a RootGraph should be an implementation decision. I'd probably keep the RootGraph, but I'm not sure that the Cytoscape application needs to know about it.

3. Moving forward, we need to support arbitrary graphics, i.e. graphical shapes that are not attached to a particular node or edge in a network. Do we need an additional Interface for this? If so, then should we add a fourth interface to the model package?

SarahKillcoyne Aug 20, 2007 This would be an issue for the View package interfaces rather than the model.

ScooterMorris: This is a very nice start! Thanks so much for getting this moving. A couple of comments:

1. The model probably needs to include CyGroups as a separate model element, or we have to bind a group to a network, which I'm not sure is what we want (although there is room for discussion about that).

SarahKillcoyne Aug 20, 2007 I agree, I think groups make sense as part of the model.

ScooterMorris:

2. I've been increasingly wondering if Attributes really should be as separate from the model as we currently make them. I believe that that is a result of the implementation, not a desired architecture. Personally, I would like to be able to find all the attributes of this CyNetwork (or CyNode or CyEdge) without having to search a separate data structure. If we're going to go through a full relayering, I would certainly like to have the discussion about why CyAttributes aren't "attributes" of the objects they annotate.

SarahKillcoyne Aug 20, 2007:
I agree that attributes are part of the object they annotate. With the interfaces outlined in the model here, an attribute is a separate object in the model, but all network objects (CyNetwork, CyNode, CyEdge) would be required via the interface to handle their own attributes.
MikeSmoot Aug 23, 2007: I'm not at all persuaded that attributes should be tied to particular networks. Right now Cytoscape treats these as entirely separate entities, which gives us a lot of flexibility. This decoupling of attributes allows us to do things like import a large attributes file (e.g. of synonyms) but have smaller networks that just use subsets of the available attributes. This decoupling reduces redundancy. What happens when you create a new network from selected nodes? Should we duplicate all attributes? What if we then modify the attribute in one network, but not the other?
As things are designed now, all attributes belong to this network, it's just a matter of whether the attribute maps to the identifiers. I think attribute namespaces might alleviate some of this confusion.
SarahKillcoyne 9/12/07: I think we might have a lot less redundancy to worry about if we have the idea of "views into the network" (like making current child networks views into the big network) versus "entirely new network that just happens to share nodes/edges". Views into a network wouldn't have to worry about being separate networks and having their own sets of attributes. It would use and modify the objects of the larger network, but let the user see just a subpart of the big one. Whereas if you have an entirely new network you don't want to have any attributes that it just happens to share with other networks altered by any other network. Personally I think it might make more sense than the current way where loading attributes for one network shows them on all networks making it sometimes difficult to find the attributes you want and sometimes confusing (yes, this is a UI thing but it appears to tie into the idea that all attributes belong to all networks).
MikeSmoot 9/18/07:
What you're describing here is the RootGraph concept. All nodes and edges in Cytoscape are in one giant graph called the root graph and each network is then a perspective on the RootGraph. Assuming I'm understanding what you're saying, then what we already do may be what you're proposing. Please clarify if I'm missing something.
SarahKillcoyne 9/24/07:
I think what I'm describing is just a little more fine grained and I'm not stuck on this, it's just how I understand it most easily. If I understand what you said, then each instance of Cytoscape really has only one ultimate Network model (RootGraph) and all the "networks" a user sees are just views into that global graph? I think what I'm proposing is that each network model is a data model of its own (as if each network had its own RootGraph though this doesn't really need to be the case in the implementation) and that each of these can then have views ("perspectives") that let you delve into subparts and share/change data on the large network. Perhaps it's just a matter of making this separation of view vs network clearer to a user, but it is true that right now changing a child network doesn't change the parent network right? I suppose at this point it becomes a matter of semantics, but in this case I think it makes most sense to have attributes that belong to a single parent network. It seems as if namespaces will be used to do just this anyhow?

ScooterMorris:

3. It's not entirely clear that layout should be dependent on the view or just the model. If a CyNode had an X,Y[,Z,t?] attribute then there are many very useful things we could do without having to create an entire view. One of those things would be to layout the Network, then we could apply a view to map the layout to a specific representation or media. Notice here that I'm separating out the concept of a node's location from other graphic ideas such as color and opacity, which are more clearly (at least in my mind) part of the view.

SarahKillcoyne Aug 20, 2007: I'm having some trouble with layouts, maybe we need some specific use cases from the point of view of how you might use them via scripting or headless mode?