RFC Name: Cytoscape Projects
Editor(s): ScooterMorris & SarahKillcoyne
Date: Oct 23, 2007
Status: Draft
Proposal
Extend the concept of sessions to a more flexible, project-based system in which all of the data involved in a given project is transparent and accessible for altering or copying to another project. This would let biologists work the way they are used to, where one project may build on another and reuse many of the same properties or data files. For example, a Project would "remember" that a network (saved as an XGMML file) was created by reading in a SIF file and associating three node attribute files and two edge attribute files with it. The goal would be data provenance, not necessarily the ability to recreate the initial data sets. Moving forward, Projects could include logs (Cytoscape 3.0 logging), user-created scripts/macros, workflows, etc. While the Projects panel might list all available projects, only one project would be open at a time. As a result, a project could replace a Cytoscape Session as the main mechanism for saving state.
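The "remembering" described above could be modeled as a small provenance record attached to each saved network. The following is only a sketch under assumptions: the types `SourceKind`, `DataSource` and `NetworkProvenance` are hypothetical names invented for illustration and are not part of any existing Cytoscape API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: none of these types exist in Cytoscape.
enum SourceKind { NETWORK_FILE, NODE_ATTRIBUTES, EDGE_ATTRIBUTES }

/** One file that contributed to a network. */
final class DataSource {
    final SourceKind kind;
    final String path;

    DataSource(SourceKind kind, String path) {
        this.kind = kind;
        this.path = path;
    }
}

/** Provenance for one saved network: the files it was built from. */
final class NetworkProvenance {
    private final String savedAs;
    private final List<DataSource> sources = new ArrayList<>();

    NetworkProvenance(String savedAs) {
        this.savedAs = savedAs;
    }

    /** Records a contributing file; returns this so calls can be chained. */
    NetworkProvenance add(SourceKind kind, String path) {
        sources.add(new DataSource(kind, path));
        return this;
    }

    String savedAs() {
        return savedAs;
    }

    /** Read-only view of the recorded sources, in the order they were added. */
    List<DataSource> sources() {
        return Collections.unmodifiableList(sources);
    }

    /** Number of recorded sources of the given kind. */
    long count(SourceKind kind) {
        return sources.stream().filter(s -> s.kind == kind).count();
    }
}
```

With such a record, the example in the proposal would be one `NetworkProvenance` for the XGMML file holding one `NETWORK_FILE` entry for the SIF file, three `NODE_ATTRIBUTES` entries and two `EDGE_ATTRIBUTES` entries. The record stores only where the data came from, in keeping with the goal of provenance rather than reconstruction.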
Background
Scientists are accustomed to thinking about their work in terms of projects, where all of the various types of data and analysis they have done are stored together but remain separately accessible for use in other work. Many pieces of open and proprietary software used by scientists (such as CPAS and SBEAMS) already use this concept to help organize and share work. Cytoscape does nothing to help users track their data provenance. Where did these attributes come from? How was this network created? What steps were performed, and what data sources were used, to expand the network? All of these are legitimate questions that Cytoscape currently gives users no means of answering.
Use Cases
- Tracking what data and sources were used to create or expand a network
- Easily copying a visual style from one Project to another, allowing users to keep using their existing work (rather than recreating it de novo each time)
- Sharing all of the data, properties and attributes easily
- Saving the full state of Cytoscape in a given project including:
  - Properties
  - Data
  - Plugins/Themes loaded or used
  - Macros/scripts
  - Others?
Implementation Plan
There are two major parts to implementing a project system in Cytoscape.
Expand the definition of a session to include data provenance information
Both parts will be simplified by the proposed refactor of Cytoscape. In a relayered system, hooks can be added to the loaders in the IO layer to track the data imported into a project from any source. Hooks added to the Application layer would likewise allow scripts, macros or workflows to be associated with a particular project.
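One way such loader hooks might look is a plain listener interface that a project registers with each loader. This is a minimal sketch under assumptions: `LoadListener`, `TrackedLoader` and `ProjectLog` are hypothetical names, and the actual IO layer interfaces would be defined by the refactor.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; these types are not part of any Cytoscape API.

/** Callback invoked by a loader each time it imports something. */
interface LoadListener {
    void dataLoaded(String dataType, String location);
}

/** Stand-in for an IO-layer loader that reports every import it performs. */
final class TrackedLoader {
    private final List<LoadListener> listeners = new ArrayList<>();

    void addLoadListener(LoadListener listener) {
        listeners.add(listener);
    }

    void load(String dataType, String location) {
        // Real parsing of the file or remote source would happen here.
        for (LoadListener listener : listeners) {
            listener.dataLoaded(dataType, location);
        }
    }
}

/** A project-side recorder that keeps one entry per import. */
final class ProjectLog implements LoadListener {
    private final List<String> entries = new ArrayList<>();

    @Override
    public void dataLoaded(String dataType, String location) {
        entries.add(dataType + " <- " + location);
    }

    /** Returns a copy so callers cannot modify the log. */
    List<String> entries() {
        return new ArrayList<>(entries);
    }
}
```

Because the loader knows only the listener interface, the same mechanism could later carry events from the Application layer (scripts, macros, workflows) without the loaders depending on the project code.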
Visual representation of data provenance
This visual representation can be dealt with in the View and Application layers. The views on a network and information about the data in that network can be provided to the user in a format similar to that of the Eclipse IDE, where projects contain trees of information that can include the type, date and path of any particular piece of data within the tree. This sort of system would make it easy to add new features to a project, such as a log file that could function as a sort of laboratory notebook, recording the execution and results of Cytoscape actions (including plugins) and notes recorded by the user.
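The tree of information described above could be backed by a simple node type that carries the type, date and path of each item. The sketch below is illustrative only: `ProjectNode` and its fields are hypothetical, and the text rendering stands in for whatever tree widget the panel would actually use.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not an existing Cytoscape class.
final class ProjectNode {
    final String label;
    final String type;
    final LocalDate date; // null for grouping nodes such as the project root
    final String path;    // null when the item has no file behind it
    private final List<ProjectNode> children = new ArrayList<>();

    ProjectNode(String label, String type, LocalDate date, String path) {
        this.label = label;
        this.type = type;
        this.date = date;
        this.path = path;
    }

    /** Adds a child and returns this node so calls can be chained. */
    ProjectNode addChild(ProjectNode child) {
        children.add(child);
        return this;
    }

    /** Renders the subtree as indented text, one item per line. */
    String render() {
        StringBuilder out = new StringBuilder();
        render(out, 0);
        return out.toString();
    }

    private void render(StringBuilder out, int depth) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        out.append(label).append(" [").append(type).append(']');
        if (date != null) {
            out.append(' ').append(date);
        }
        if (path != null) {
            out.append(' ').append(path);
        }
        out.append('\n');
        for (ProjectNode child : children) {
            child.render(out, depth + 1);
        }
    }
}
```

A log or notebook entry would simply be another child node with its own type, which is what makes this structure easy to extend.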
Prior to relayering
Hooks into the current network and attribute loaders can be added. The current network view panel can be replaced by a Project panel that initially lists the network and attributes loaded and, potentially, the visual style currently applied. This particular work is not necessary for a 2.x release but could help prepare users for the switch.
Project Management
Project Timeline
This work will depend on the relayering of Cytoscape and can be started during the refactoring of packages (See Milestone 1).
Tasks and Milestones
- All estimates are in addition to the current estimates from Milestone 1
- Milestone 1: Track Loaded Data (est. 6 weeks)
  - Add hooks into the IO package to track loading of data
- Milestone 2: Project View (est. 4 weeks)
  - Create a view of the project, covering the data loaded in IO, that allows various sources to register with it (for easy extension to actions, macros, logging, etc. at a later point)
- Milestone 3: Project Panel (est. 4 weeks)
  - Create a panel in the Application package to display the Project View
Project Dependencies
Outline any projects that depend on this project, link to relevant RFCs, and note at what point dependent projects could be started.
Issues
Comments
How to Comment
Edit the page and add your comments under the provided header. By adding your ideas to the Wiki directly, we can more easily organize everyone's ideas, and keep clear records. Be sure to include today's date and your name for each comment. Try to keep your comments as concrete and constructive as possible. For example, if you find a part of the RFC makes no sense, please say so, but don't stop there. Take the extra step and propose alternatives.