Hackathon2005 - Cytoscape Wiki

This is just a brainstorming session right now. We can prioritize discussion and ideas for projects later.

Discuss the difference between node hide and delete concepts -> Goal: get conceptual clarity and a plan to document/implement any necessary changes based on this. Also select/deselect vs. flag/unflag

HACKATHON NOTES: Delete

Concepts

GINY is the API for all graph-related stuff (view and model). It is independent of Cytoscape. It contains:
- RootGraph: contains nodes/edges
- GraphPerspective: contains nodes/edges
- GraphView
FING implements GINY. PHOEBE implements the view part of GINY. But, this is hidden from users and Cytoscape.
Cytoscape builds on GINY by extending or implementing.
Cytoscape components:
- CyNetwork extends GraphPerspective
- CyNetworkView extends GraphView
- CyNode implements cytoscape.giny.Node extends giny.model.Node
- CyEdge same as CyNode
- CyAttributes - we are happy with this
- cytoscape.visual: Mapping, Visual Style, Calculators, Manager, etc.
- Filters API

How things are right now with delete stuff

Cytoscape API: removeNode/removeEdge with a "permanent" argument
Cytoscape API: addNode/addEdge with a "create" argument
Cytoscape API: createNode creates a node but does not add it to any CyNetwork. We like this.
GINY API: hide. The object is no longer in the GP (GraphPerspective) but is still in the RG (RootGraph)
GINY API: restore. The object is added back to the GP
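
The difference between hide/restore and a permanent remove can be illustrated with a minimal sketch. It is written in Python purely for illustration; the class and method names below are hypothetical stand-ins and are not the GINY or Cytoscape API.

```python
class RootGraph:
    """Hypothetical stand-in for the root graph: owns every node that exists."""

    def __init__(self):
        self.nodes = set()

    def create_node(self, node_id):
        # Creating a node does not add it to any perspective.
        self.nodes.add(node_id)
        return node_id

    def remove_node(self, node_id, perspectives):
        # Permanent (global) removal: drop from the root and from all perspectives.
        self.nodes.discard(node_id)
        for gp in perspectives:
            gp.visible.discard(node_id)


class GraphPerspective:
    """Hypothetical stand-in for a perspective: a subset of the root's nodes."""

    def __init__(self, root):
        self.root = root
        self.visible = set()

    def restore(self, node_id):
        # Restore only works for nodes that still exist in the root graph.
        if node_id not in self.root.nodes:
            raise KeyError(f"{node_id!r} is not in the root graph")
        self.visible.add(node_id)

    def hide(self, node_id):
        # Hide: gone from this perspective, still present in the root graph.
        self.visible.discard(node_id)


root = RootGraph()
gp = GraphPerspective(root)
root.create_node("a")
gp.restore("a")

gp.hide("a")
hidden_still_in_root = "a" in root.nodes and "a" not in gp.visible

gp.restore("a")
root.remove_node("a", [gp])
removed_everywhere = "a" not in root.nodes and "a" not in gp.visible
```

In this sketch a hidden node can always be restored, while a permanently removed node cannot.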

Meeting proposals about delete

Proposal, Global remove: Cytoscape.remove()
Proposal, Local remove: CyNetwork.remove()
Encourage coders to use Cytoscape methods, not GINY methods? CONTROVERSIAL...
Do we hide GINY from coders?
(not related to delete) Proposal for a GraphObject class that Edge and Node extend. This would simplify the API by not having methods like this: removeNode, removeEdge, addNode, addEdge, etc. High priority for 2.3!
Final conclusion: OK, yes the OO design of CyNetwork is ugly, but, we are not going to fix it now. We are going to document.
Delete and attributes. Local and global attributes? Tag attributes into types (user loaded or computed)? It is clear that we want local network attributes. If a network is destroyed, then its local attributes would also be deleted. We could have a UI to separate attributes that users want to keep, and attributes that users do not want to keep (when nodes/edges/nets are deleted).

Conclusion on delete and attributes: Local (CyNetwork), global (Cytoscape). Seems to solve most of the problems in that you can choose whether attributes are local or global.
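
A minimal sketch of the local/global split, assuming a single store keyed by network for local attributes. The AttributeStore class and its methods are hypothetical and only illustrate that destroying a network drops its local attributes while global ones survive.

```python
class AttributeStore:
    """Hypothetical store: global attributes plus per-network local attributes."""

    def __init__(self):
        self.global_attrs = {}  # {(object_id, name): value}
        self.local_attrs = {}   # {network_id: {(object_id, name): value}}

    def set_global(self, object_id, name, value):
        self.global_attrs[(object_id, name)] = value

    def set_local(self, network_id, object_id, name, value):
        self.local_attrs.setdefault(network_id, {})[(object_id, name)] = value

    def get_global(self, object_id, name):
        return self.global_attrs.get((object_id, name))

    def get_local(self, network_id, object_id, name):
        return self.local_attrs.get(network_id, {}).get((object_id, name))

    def destroy_network(self, network_id):
        # Local attributes die with their network; global ones are untouched.
        self.local_attrs.pop(network_id, None)


store = AttributeStore()
store.set_global("n1", "species", "yeast")
store.set_local("net1", "n1", "score", 0.9)
store.destroy_network("net1")
```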
Separate hairy issue: deleting and plugins. A plugin could delete attributes or graph objects that other plugins need. Plugins should never try to talk to each other directly; if they do, there is a lot of potential for bugs. We need Plugin "Good Citizen" guidelines. If plugins do not follow these rules, they risk not being used. This is a documentation issue.

Concepts on flagging

Why does the view have select methods and the model has flag methods? This is because originally, all graphs had views. But in 2.1, this is not true. Some graphs may not have views. Graphs without views need to be able to have selected nodes/edges, hence, flag methods in CyNetwork.
Flagging is a way by which plugins talk to each other. For example, filters flag nodes that pass a filter. Then, a separate tool can act on the selected nodes (like create subgraph).
Clipboard could be helpful, but flagging is pretty much the same thing.
Final conclusion on selection vs. flag: only have "select" method at the CyNetwork level. Deprecate CyNetwork.flag and CyNetworkView.select.
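
The filter-then-act pattern with selection kept on the model can be sketched as follows. This is an illustrative Python sketch only; Network, degree_filter and create_subnetwork are hypothetical names, and the "degree" attribute is an assumption made for the example.

```python
class Network:
    """Hypothetical model-level network that tracks selection without a view."""

    def __init__(self, nodes):
        self.nodes = dict(nodes)  # {node_id: attribute dict}
        self.selected = set()

    def select(self, node_ids):
        # Only nodes that belong to this network can be selected.
        self.selected.update(n for n in node_ids if n in self.nodes)

    def deselect_all(self):
        self.selected.clear()


def degree_filter(network, min_degree):
    """A 'filter' tool: selects every node whose degree passes the threshold."""
    network.select(
        n for n, attrs in network.nodes.items() if attrs["degree"] >= min_degree
    )


def create_subnetwork(network):
    """A separate tool: acts on whatever is currently selected."""
    return Network({n: network.nodes[n] for n in network.selected})


net = Network({"a": {"degree": 1}, "b": {"degree": 3}, "c": {"degree": 5}})
degree_filter(net, min_degree=3)
sub = create_subnetwork(net)
```

The two tools never call each other; they only share the selection state held by the network.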

Improve conceptual clarity and better document the core and graph model - Is the root graph a multigraph or a graph? Is the root graph directed, undirected or mixed? How about graph perspective? Can nodes or edges be duplicated? etc.

HACKATHON NOTES: SEE METANODES NOTES.

Discuss switching to a numbered node system instead of the current node ID as a string system
- This may be required for some graph editing use cases (having nodes with no name yet) and two nodes with the same name.

HACKATHON NOTES: Switch to number node IDs

Summary of final conclusions

Use a unique String ID that is generated by Cytoscape (maybe Strings parsable as numbers).
These IDs are NOT visible to users (for example in the attribute browser).
CANONICAL_NAME, COMMON_NAME should go away, instead use:
LABEL attribute.
SIF reader has to ensure that there are no duplicate labels. SIF reuses a label if it can find one. We are going to write SIF files.
Many implementation issues, the persons who deal with this will have to think about importing and exporting. Import: OK to use old attributes file. Export: Not expected to use old attribute file format.
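
A minimal sketch of these conclusions, assuming generated IDs are numeric strings and that a SIF line is whitespace-delimited as "source interaction target [target ...]" (a line with a single token being an isolated node). NodeRegistry and read_sif are hypothetical names, and labels containing spaces are not handled here.

```python
import itertools


class NodeRegistry:
    """Hypothetical registry: generated String IDs, labels need not be unique."""

    def __init__(self):
        self._counter = itertools.count(1)
        self.label_by_id = {}
        self._id_by_label = {}  # first node created for each label

    def create_node(self, label):
        node_id = str(next(self._counter))  # a String parsable as a number
        self.label_by_id[node_id] = label
        self._id_by_label.setdefault(label, node_id)
        return node_id

    def get_or_create_by_label(self, label):
        # What a SIF reader would call: reuse an existing label if there is one.
        existing = self._id_by_label.get(label)
        return existing if existing is not None else self.create_node(label)


def read_sif(lines, registry):
    """Return edges as (source_id, interaction, target_id) tuples."""
    edges = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        source_id = registry.get_or_create_by_label(fields[0])
        for target in fields[2:]:
            target_id = registry.get_or_create_by_label(target)
            edges.append((source_id, fields[1], target_id))
    return edges


registry = NodeRegistry()
edges = read_sif(["A pp B", "B pd C D", "E"], registry)
duplicate_id = registry.create_node("A")  # an editor may still create a second "A"
```

Reading by label never duplicates a node, while code that works with IDs can still hold two nodes that share a label.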

Brainstorming ideas, not final conclusions!

Problem: canonical name is being used as unique id, label, and DB key. We need to separate these concepts. Use numeric IDs (Strings)!
BUT: Currently, node and edge indices are not persistent across Cytoscape sessions. If you save a SIF, this does not matter. For GML it might.
BUT: Currently, node and edge indices are reused if nodes or edges are deleted. ???
Proposal: use unique numerical ids for nodes, each node has a label, which is not necessarily unique. Separate discussion: get rid of CANONICAL_NAME, use LABEL. Numerical String IDs are used as keys for attributes. So they are available to programmers, but hidden to users.
API proposal: Only use graph objects on methods. Not IDs. For example, a lot of methods look like: int [] getAdjacentEdges (int nodeIndex);
Use case: nodes with the same label, but different molecule type. SIF cannot handle this. GML can. Maybe we would need to handle this for SIF.
Cytoscape uses attributes to set labels. It does not use CyNode.label. We should only store the label either in attributes or in CyNode.label. Which one should it be? Big discussion follows.

Currently, labels get stored in an attribute. There is some agreement that the label should be stored in CyNode.label. But this presents problems for the attribute browser, because users like to view the label in a column, and attributes can exist separately from the graph. To solve this, the attribute browser would display CyNode.label as a column. This means that the browser can only exist if there is a non-empty root graph (ugly?). In terms of usage, it means that you cannot view expression data that you loaded if there is no graph loaded.

SIF. We need a new more informative SIF format. We still want to support the old SIF. The mechanism to read this old SIF should be the same as it is currently. If a GML is loaded first, and then a SIF, then nodes would not be duplicated (just as it is now!).

Discussion of undo manager

HACKATHON NOTES: undo

Summary

Allan described his undo data structure (a stack).
Complications: multiple networks, global operations AND local operations.
Proposal: Visual undo manager that shows operations so that users can select actions to undo. Have a global stack and a stack per network.
Suggestion: only make certain actions undoable, not all. We need to find use cases.
Currently undoable: attribute browser cell editing, restore deleted nodes.
Undo layouts???
We need requests about what actions should be undoable!!! (cytostaff list)
- add/remove nodes and/or edges
- edit attribute value
- destroy network
- destroy nodes/edges
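
The "global stack plus a stack per network" proposal can be sketched as below. This is an assumption-laden Python illustration, not Allan's data structure; UndoManager and its methods are hypothetical, and each recorded action simply carries a callable that reverses it.

```python
class UndoManager:
    """Hypothetical undo manager: one global stack and one stack per network."""

    def __init__(self):
        # Key None holds the global stack; other keys are network IDs.
        self._stacks = {}

    def record(self, description, undo_fn, network_id=None):
        self._stacks.setdefault(network_id, []).append((description, undo_fn))

    def history(self, network_id=None):
        # Descriptions a visual undo manager could show to the user.
        return [description for description, _ in self._stacks.get(network_id, [])]

    def undo(self, network_id=None):
        stack = self._stacks.get(network_id)
        if not stack:
            return None
        description, undo_fn = stack.pop()
        undo_fn()
        return description


manager = UndoManager()
attrs = {"n1": "old"}

# Local operation on network "net1": edit an attribute value.
attrs["n1"] = "new"
manager.record("edit attribute n1", lambda: attrs.__setitem__("n1", "old"), "net1")

# Global operation: destroy a network.
networks = {"net1", "net2"}
networks.discard("net2")
manager.record("destroy network net2", lambda: networks.add("net2"))

undone_local = manager.undo("net1")
```

Only actions that are explicitly recorded become undoable, which matches the suggestion to support a chosen set of actions rather than all of them.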

Prioritize subsystems for refactoring PROPOSAL FOR RETREAT

Discuss new documentation options e.g. wiki based, which can be translated to PDF and Java help. DISCUSS HT

HACKATHON NOTES: Documentation

Wiki based
Table of contents with links to chapters/sections
HTML -> PDF tools could be used.
Seems like Wiki is enough for aesthetic purposes.
Mike suggested using DocBook, but it is not as easy to use as Wiki. On the other hand, DocBook has a lot of file type conversion capabilities, including to Java Doc type.
It would be useful to have at the end of each page a discuss link that allows users to enter their own advice, documentation, comments, etc.
Ideal solution would be to convert Wiki to DocBook using a script. Googling resulted in several possibilities of converters.

Plan core code cleanup - removing old libraries, old classes, clean up of package structure.
How can we package code so it is easier for developers to load - we have a problem with core plugin and library code being in too many places.
Clarify and document the difference between core and non-core and decide on policies for making future decisions on this.

HACKATHON NOTES: What is core, and what is plugin?

Action items:

We need to come up with a basic review process by which plugins become core plugins: not biological, peer reviewed (not buggy, stable), useful to users. End point of review process would be to decide where to include the new plugin/library. We will come up with a formal proposal.
We need to have one single Java Doc that contains everything that coders need.
Wiki that contains links to Java Docs and downloading sites of all libraries that Cytoscape uses.
We do not HAVE TO reorganize the core (right now). But, looking forward, we should try to keep things more organized.

General discussion

We need to know where new code fits in: core, plugins, core plugins, library???
Library vs. plugin. Plugins that are used as libraries should be very modular (API separate from the plugin class). Plugins that make use of these libraries include the library jar in their paths.
lib/ : what should it include?
what is core???
Most of the group agrees that all source that we own should be included in the core. Ant can have separate targets for each component. But then anyone can change code that someone else wrote.
(SIDENOTE) Have a controlled vocabulary area for bio-semantics (for example, PROTEIN, DNA, etc). This should probably be a "semantics plugin".

Review community development process and core coding conventions, etc -> goal is to increase quality of our codebase and application. ETHAN RELATED, RETREAT

Discuss user interface standardization issues (Benno brought this up 2 years ago).

HACKATHON NOTES:

All agree that this is a good goal. No time to do it right now. We have bigger problems.

MetaNodes
Discussion of how to handle "compound" node types -- e.g. complexes, families (sets) (related to MetaNodes above...) (Note: I'd like to make this a high priority for discussion. Alex Pico of the GenMAPP project will be participating in the Hackathon and can go into considerable depth on the requirements for these constructs. It would be very useful to have this discussion while we have the opportunity to pick Alex's brain).

HACKATHON NOTES: Metanodes and hyperedges.

Alex Pico could not get on a plane. So we had the discussion without him.

Summary

We agreed that we are no longer dealing with a 'traditional' graph (directed or undirected, one level). We have mixed types of edges (directed, undirected) and metanodes.
New terminology for our complex graphs (meta-mixed-networks?):
- multi-edges between nodes
- mixed edge direction
- metanodes
Work group should do some research to see what's out there, what has been done, etc.
Have 'renderers'/'modelers'/'converters'/'mappers' (we did not agree on a term) that depict meta-networks in different ways. Each interpretation of a meta-network would be a GraphPerspective. The GraphPerspectives need to be connected to each other so that they correctly reflect changes to the model. This plugin/code would be a layer on top of Nerius' renderer.
This is a very BIG job. We need to find use cases and take care of those.
Need a work group!!!! Show of hands: Allan, Iliana and Melissa want to get really involved in this. Gary wants to review progress and have some input.
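
To make the terminology concrete, here is a rough Python sketch of a network with multi-edges, mixed edge direction and metanodes, plus one possible "interpretation" that collapses members into their metanode. All names are hypothetical, and the sketch assumes each node belongs to at most one metanode; it is not a proposal for the actual data structure.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    directed: bool
    interaction: str


@dataclass
class MetaNetwork:
    nodes: set = field(default_factory=set)
    edges: list = field(default_factory=list)     # a list permits multi-edges
    children: dict = field(default_factory=dict)  # metanode name -> member nodes

    def add_edge(self, source, target, directed, interaction):
        self.nodes.update((source, target))
        self.edges.append(Edge(source, target, directed, interaction))

    def add_metanode(self, name, members):
        self.children[name] = set(members)

    def collapsed_view(self):
        """One interpretation: replace each member node by its metanode."""
        owner = {m: meta for meta, members in self.children.items() for m in members}
        nodes = {owner.get(n, n) for n in self.nodes}
        edges = []
        for e in self.edges:
            source = owner.get(e.source, e.source)
            target = owner.get(e.target, e.target)
            if source == target and e.source != e.target:
                continue  # edge is internal to a metanode, so it is not drawn
            edges.append(Edge(source, target, e.directed, e.interaction))
        return nodes, edges


net = MetaNetwork()
net.add_edge("A", "B", directed=False, interaction="pp")
net.add_edge("A", "B", directed=True, interaction="pd")  # second edge, same pair
net.add_edge("B", "C", directed=True, interaction="pd")
net.add_metanode("M", ["A", "B"])
collapsed_nodes, collapsed_edges = net.collapsed_view()
```

Because the collapsed view is computed from the model, recomputing it after a model change keeps the interpretation consistent with the underlying network.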

Discuss new rendering engine that Nerius is building

HACKATHON NOTES: New rendering

Summary

Nerius showed us his new rendering tool. There was general agreement that it was impressive and fast.
His API only contains static methods.
The static render method takes objects that specify node and edge details (positions, colors, etc). This facilitates communication to databases that contain network information.
Nerius says that we will use this new renderer for 2.3! Cool.

Simplifying file formats and implementing the save session feature. Does .sif need to have two ways to specify interactions? Do we need GML if we have a save session file?

HACKATHON NOTES: File formats

Current types of file formats in Cytoscape:

sif, network, i/o
gml, network, i/o
noa, attr, i/o
eda, attr, i/o
expression, TP, i/o
.onto, .anno, .syno, .obo (BioDataServer)
GO .onto, .anno, .syno (Kei)

Issues on each file format:

expression file: should be a noa file
GO: information gets loaded into runtime memory, not reasonable for big species (different topic: databases)
GML: We are not respecting the format. Node integer IDs are ignored.

Action items:

Test Rowan's file format for attributes
Other items under general heading: Questions/Issues/TODO on saving state below