Diff for "CytoscapeProject"

Differences between revisions 5 and 9 (spanning 4 versions)

RFC Name: Cytoscape Projects

Editor(s): ScooterMorris & SarahKillcoyne

Date: Oct 23, 2007

Status: Draft

Contents

Proposal
Use Cases
Implementation Plan
Project Management
Issues
Comments

Proposal

Extend the concept of sessions to a more flexible, project-based system in which all of the data involved in a given project is transparent and accessible for altering or copying to another project. This would let biologists work the way they are used to, where one project may build on another and reuse many of the same properties or data files. For example, a Project would "remember" that a network (saved as an XGMML file) was created by reading in a SIF file, and that three node attribute files and two edge attribute files were associated with it. The goal would be data provenance, not necessarily the ability to recreate the initial data sets. Moving forward, Projects could include logs (Cytoscape 3.0 logging), user-created scripts/macros, workflows, etc. While the Projects panel might list all available projects, only one project would be open at a time. As a result, a project could replace a Cytoscape Session as the main mechanism for saving state.

Background

Scientists are accustomed to thinking about their work in terms of projects, where all of the data and analyses they have produced are stored together but remain separately accessible for use in other work. Many open-source and proprietary tools used by scientists, such as CPAS and SBEAMS, already use this concept to help organize and share work. Cytoscape, by contrast, does nothing to help users track their data provenance. Where did these attributes come from? How was this network created? What steps, and which data sources, were used to expand the network? These are all legitimate questions, and Cytoscape provides no means of helping users remember the answers.

Use Cases

- Tracking which data and sources were used to create or expand a network
- Easily copying a visual style from one Project to another, so users can continue using their work rather than creating it de novo each time
- Easily sharing all of the data, properties and attributes
- Saving the full state of Cytoscape in a given project, including:
  - Properties
  - Data
  - Plugins/Themes loaded or used
  - Macros/scripts
  - Others?

Implementation Plan

There are two major parts to implementing a project system in Cytoscape.

Expand the definition of a session to include data provenance information

Both parts will be simplified by the proposed refactor of Cytoscape (see CytoscapeLayerRefactor). In a relayered system, hooks can be added to the loaders in the IO layer to track the data imported into a project from any source. Hooks added to the Application layer would also allow scripts, macros or workflows to be associated with a particular project.
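To make the IO-layer hooks more concrete, the following Java sketch shows one possible shape for them, assuming each loader notifies registered listeners once per file it imports. The names `DataKind`, `ProvenanceRecord`, `ImportListener`, `ProjectProvenance` and `HookedLoader` are hypothetical and are not part of any Cytoscape API; file parsing is omitted entirely.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical categories of imported data (not an existing Cytoscape type).
enum DataKind { NETWORK, NODE_ATTRIBUTES, EDGE_ATTRIBUTES }

// One immutable provenance entry: what was loaded, from which file, into which network, and when.
final class ProvenanceRecord {
    final DataKind kind;
    final String sourcePath;
    final String networkId;
    final Instant loadedAt;

    ProvenanceRecord(DataKind kind, String sourcePath, String networkId, Instant loadedAt) {
        this.kind = kind;
        this.sourcePath = sourcePath;
        this.networkId = networkId;
        this.loadedAt = loadedAt;
    }
}

// Hook that IO-layer loaders would notify after each import (assumed design).
interface ImportListener {
    void dataImported(ProvenanceRecord record);
}

// Project-side listener that accumulates provenance records so they can be shown or saved.
final class ProjectProvenance implements ImportListener {
    private final List<ProvenanceRecord> records = new ArrayList<>();

    @Override
    public void dataImported(ProvenanceRecord record) {
        records.add(record);
    }

    // Records for one network, in the order they were imported.
    List<ProvenanceRecord> recordsFor(String networkId) {
        return records.stream()
                .filter(r -> r.networkId.equals(networkId))
                .collect(Collectors.toList());
    }

    // Read-only view of every record in the project.
    List<ProvenanceRecord> allRecords() {
        return Collections.unmodifiableList(records);
    }
}

// Stand-in for an IO-layer loader: file parsing is omitted, only the provenance hook is shown.
final class HookedLoader {
    private final List<ImportListener> listeners = new ArrayList<>();

    void addImportListener(ImportListener listener) {
        listeners.add(listener);
    }

    void load(DataKind kind, String path, String networkId) {
        // A real loader would parse the file here and only notify listeners on success.
        ProvenanceRecord record = new ProvenanceRecord(kind, path, networkId, Instant.now());
        for (ImportListener listener : listeners) {
            listener.dataImported(record);
        }
    }
}
```

With this design, the example from the Proposal (a SIF network plus three node attribute files and two edge attribute files) would leave six records for that network, independent of how the network is later saved. Using a listener keeps the loaders unaware of projects, which seems to suit a layered design, but other designs are equally possible.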

Visual representation of data provenance

This visual representation can be handled in the View and Application layers. The views on a network, and information about the data in that network, can be presented to the user in a format similar to the Eclipse IDE, where each project contains a tree of information that can include the type, date and path of any piece of data in the tree. Such a system would make it easy to add new features to a project, such as a log file that functions as a kind of laboratory notebook, recording the execution and results of Cytoscape actions (including plugins) as well as notes entered by the user.
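A tree of this kind could be modelled with Swing's `javax.swing.tree.DefaultMutableTreeNode`, which a `JTree` in a Project panel can display directly. The sketch below builds only the tree model, with no GUI; the `DataEntry` payload, the `ProjectTreeBuilder` helper and the project, network, "Views" and "Data" layout are illustrative assumptions rather than a settled design.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import javax.swing.tree.DefaultMutableTreeNode;
import javax.swing.tree.TreeNode;

// Hypothetical leaf payload: the type, date and path of one piece of data in a project.
final class DataEntry {
    final String type;
    final LocalDate date;
    final String path;

    DataEntry(String type, LocalDate date, String path) {
        this.type = type;
        this.date = date;
        this.path = path;
    }

    // JTree labels a node with its user object's toString(), so this is what users would see.
    @Override
    public String toString() {
        return type + " | " + date + " | " + path;
    }
}

final class ProjectTreeBuilder {
    // Builds: project -> network -> {"Views", "Data"}, with one leaf per view or data entry.
    static DefaultMutableTreeNode build(String project, String network,
                                        List<String> views, List<DataEntry> data) {
        DefaultMutableTreeNode root = new DefaultMutableTreeNode(project);
        DefaultMutableTreeNode net = new DefaultMutableTreeNode(network);
        DefaultMutableTreeNode viewsNode = new DefaultMutableTreeNode("Views");
        DefaultMutableTreeNode dataNode = new DefaultMutableTreeNode("Data");
        for (String view : views) {
            viewsNode.add(new DefaultMutableTreeNode(view));
        }
        for (DataEntry entry : data) {
            dataNode.add(new DefaultMutableTreeNode(entry));
        }
        net.add(viewsNode);
        net.add(dataNode);
        root.add(net);
        return root;
    }

    // Text outline of the tree (pre-order, indented two spaces per level), useful for logs or tests.
    static List<String> outline(DefaultMutableTreeNode root) {
        List<String> lines = new ArrayList<>();
        Enumeration<TreeNode> nodes = root.preorderEnumeration();
        while (nodes.hasMoreElements()) {
            DefaultMutableTreeNode node = (DefaultMutableTreeNode) nodes.nextElement();
            lines.add("  ".repeat(node.getLevel()) + node.getUserObject());
        }
        return lines;
    }
}
```

Because each data leaf carries its own type, date and path, new branches (for example a log) could be added later as further children of the network or project node without changing the existing leaves.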

Prior to relayering

Hooks can be added to the current network and attribute loaders. The current network view panel can be replaced by a Project panel that initially lists the loaded network and attributes, and potentially the visual style currently applied. This work is not necessary for a 2.x release, but could help prepare users for the eventual switch.

Project Management

Project Timeline

This work will depend on the relayering of Cytoscape and can be started during the refactoring of packages (see Milestone 1 of CytoscapeLayerRefactor).

Tasks and Milestones

All estimates are in addition to the current estimates for Milestone 1 of CytoscapeLayerRefactor.
1. Milestone 1: Track Loaded Data (est. 6 weeks)
  1. Add hooks into the IO package to track loading of data
2. Milestone 2: Project View (est. 4 weeks)
  1. Create a view of the project that tracks the data loaded through IO and allows registration by various sources (so that actions, macros, logging, etc. can easily be added later)
3. Milestone 3: Project Panel (est. 4 weeks)
  1. Create a panel in the Application package to display the Project View
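One way to read the Milestone 2 requirement that the Project View allow registration by various sources is a small registry keyed by source name. In the Java sketch below, `ProjectSource`, `ProjectView` and `LoadedDataSource` are hypothetical names, and the assumption is that each source can list its entries on demand.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Anything that can contribute items to a project: IO loaders first, macros or logs later.
interface ProjectSource {
    List<String> entries();
}

final class ProjectView {
    // LinkedHashMap keeps sources in registration order for a stable display.
    private final Map<String, ProjectSource> sources = new LinkedHashMap<>();

    // Registers a source under a unique name; returns false if the name is already taken.
    boolean register(String name, ProjectSource source) {
        return sources.putIfAbsent(name, source) == null;
    }

    void unregister(String name) {
        sources.remove(name);
    }

    // Immutable snapshot of every source's current entries, grouped by source name.
    Map<String, List<String>> snapshot() {
        Map<String, List<String>> view = new LinkedHashMap<>();
        for (Map.Entry<String, ProjectSource> e : sources.entrySet()) {
            view.put(e.getKey(), List.copyOf(e.getValue().entries()));
        }
        return Collections.unmodifiableMap(view);
    }
}

// Example source backed by a list that IO hooks would append to (hypothetical).
final class LoadedDataSource implements ProjectSource {
    private final List<String> loaded = new ArrayList<>();

    void recordLoad(String path) {
        loaded.add(path);
    }

    // Returns the live list; ProjectView.snapshot() copies it, so callers never see later changes.
    @Override
    public List<String> entries() {
        return loaded;
    }
}
```

Since `ProjectSource` has a single method, a later macro or logging source could register with a lambda, for example `view.register("Macros", () -> List.of("my-macro"))`, without any change to `ProjectView`.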

Project Dependencies

Outline any projects that depend on this project, link to relevant RFCs, and note at what point dependent projects could be started.

Issues

Comments

How to Comment

Edit the page and add your comments under the provided header. Adding your ideas directly to the wiki makes it easier to organize everyone's ideas and keep clear records. Be sure to include today's date and your name with each comment. Try to keep your comments as concrete and constructive as possible. For example, if you find that part of the RFC makes no sense, please say so, but don't stop there: take the extra step and propose alternatives.

-  Revision 5 as of 2007-10-24 00:42:17
  Size: 4822
  Editor: ScooterMorris
  Comment:
+  Revision 9 as of 2010-05-17 03:55:33
  Size: 5769
  Editor: GaryBader
  Comment:
-Deletions are marked like this.
+Additions are marked like this.
 Line 7:
-[[TableOfContents([2])]]
+<<TableOfContents(2)>>
 Line 10:
-Expand the concept of Session and the presentation of networks in the Control Panel to introduce the concept of a Project.  Initially, a Project would include nothing more than the information associated with a session, including a limited amount of information about the files that were loaded to create the session.  For example, a Project would "remember" that a network (saved as an XGMML file) was created by reading in a SIF file, and associating three node attribute files and two edge attribute files with it.  The goal would be data provenance, not to be able to recreate the initial data sets.  Moving forward, Projects could include logs (Cytoscape 3.0 logging), user created scripts, workflows, etc.  While the Projects panel might list all available projects, only one project would be opened at a time.  As a result, a project could replace a Cytoscape Session as the main mechanism for saving state.
+Extend the concept of sessions to a more flexible project based system where all of the data involved in a given project is transparent and accessible for altering or copying to another project.  This would help biologists to work in the way they are used to, where one project may build off of another and use many of the same properties or data files. For example, a Project would "remember" that a network (saved as an XGMML file) was created by reading in a SIF file, and associating three node attribute files and two edge attribute files with it. The goal would be data provenance, not necessarily to be able to recreate the initial data sets. Moving forward, Projects could include logs (Cytoscape 3.0 logging), user created scripts/macros, workflows, etc. While the Projects panel might list all available projects, only one project would be opened at a time. As a result, a project could replace a Cytoscape Session as the main mechanism for saving state.
 Line 13:
-It seems pretty clear that our users think of their work in terms of projects.  This was apparent in the information we received from the users who were interviewed at ISB in preparation for the Cytoscape 3.0 discussion, and anecdotally from other user communities.  In addition, Cytoscape does nothing to help users track their data provenance.  Where did these attributes come from?  How was this network created?  What steps were performed to expand the network?  All of these are legitimate questions that Cytoscape provides no means of assisting users to remember.
+Scientists are accustomed to thinking about their work in terms of projects where all of the various types of data and analysis they have done are stored together, but separately accessible for use in other work.  Many pieces of open and proprietary software used by scientists use this concept now to help organize and share work (such as CPAS and SBEAMS). Cytoscape does nothing to help users track their data provenance. Where did these attributes come from? How was this network created? What steps (including what data sources used) were performed to expand the network? All of these are legitimate questions that Cytoscape provides no means of assisting users to remember.
 Line 16:
-~-''Provide examples of how the products of this project will be used.''-~
+  * Tracking what data and sources were used to create or expand a network

  * Easily copying a visual style from one Project to another allowing users to continue using their work (rather than creating it de novo each time)

  * Sharing all of the data, properties and attributes easily

  * Saving the full state of Cytoscape in a given project including:

    * Properties

    * Data

    * Plugins/Themes loaded or used

    * Macros/scripts

    * Others?
-Line 19:
+Line 28:
-There are two major pieces to this implementation and a phased approach is certainly viable in this instance.  The general approach to the implementation is to begin by expanding the definition of a session to include some limited data provenance information.  Specifically, this would involve including hooks into the network loaders, Cytoscape editor, attribute loaders and the attribute browser editor that would update a network attribute that lists the sources that make up the network.
+There are two major parts to implementing a project system in Cytoscape.
-Line 21:
+Line 30:
-The second major piece is to provide a visual representation of the data provenance in the project panel (which would replace the network panel).    The vision is to have a JTree-style interface where underneath the network itself, you would have the views on that network, as well as the information about what data made up that network.  The data fields would provide limited information (date, time, path?), but would still serve as a useful reminder of the elements of the network.  Future enhancements might include adding a network-specific log which would function as a sort of laboratory notebook.  In additional to potentially recording the execution and results of Cytoscape actions, it could also take user-generated comments.  Beyond that, it might be reasonable to associate scripts and workflows with specific networks.
+=== Expand the definition of a session to include data provenance information ===

Both will be simplified through the proposed [[CytoscapeLayerRefactor| refactor]] of Cytoscape.  In a relayered system the hooks into loaders in the IO layer can be added to track the data imported into a project from any source.  Hooks added to the Application layer would allow for the association of scripts, macros or workflows with a particular project as well.



=== Visual representation of data provenance ===

This visual representation can be dealt with in the View and Application layers.  The views on a network and information about the data in that network can be provided to the user in a format similar to that of the Eclipse IDE where projects contain trees of information that can include the type, date and path of any particular piece of data within the tree.  This sort of system would make it easy to add new features to a project such as a log file that could function as a sort of laboratory notebook, recording the exectution and results of Cytoscape actions (including plugins) and notes recorded by the user.



'' '''Prior to relayering''' ''



Hooks into the current network and attribute loaders can be added.  The current network view panel can be replaced by a Project panel that can initially list the network and attributes loaded and potentially the visual style currently applied.  This panel can also start by listing the network, attributes and visual style.  This particular work is not necessary to a 2.x release but could help prepare users for the switch.
-Line 26:
+Line 43:
-~-''Provide a timeline for implementation. Insert a graphic if you can. Try this free online tool for making project timelines -> [http://www.helpuplan.com/index.asp Help-u-Plan] (create a new chart; modify; right-click to save gif; then attach to this page)''-~
+This work will depend on the relayering of Cytoscape and can be started during the refactoring of packages (See [[http://cytoscape.org/cgi-bin/moin.cgi/CytoscapeLayerRefactor#head-60085f879c96ab49cb30ebe4ab5174161920863b|Milestone 1]]).
-Line 30:
+Line 46:
-~-''Outline the major milestones and tasks involved in implementation.''-~



 1. '''Milestone 1: Backend Design''' 

  1. Task 1:  

  1. Task 2: ...

 1. '''Milestone 2: …'''
+ All estimates are in addition to the current estimates from Milestone 1

  1. Milestone 1: Track Loaded Data (est. 6 weeks)

    a.	Add hooks into the IO package to track loading of data

  1.	Milestone 2: Project View (est. 4 weeks)

    a.	Create view of project that allows registration by various sources  (for easy extension for actions, macros, logging etc at a later point) for the data loaded in IO

  1.	Milestone 3: Project Panel (est. 4 weeks)

    a.	Create a panel in the Application package to display the Project View
-Line 40:
+Line 57:
-== Related RFCs ==

~-''Link to other related RFCs''-~
+== Issues ==
-Line 43:
+Line 59:
-== Issues ==

~-''List any issues, conflict, or dependencies raised by this proposal''-~
-Line 47:
+Line 61:
-##If you want to create a separate subpage for Comments, then provide this link:  ["/Comment"]
-Line 49:
+Line 62:
- *''Add comment here…''