CytoscapeRetreat2006/Hackathon/HackathonNotes

Notes from Cytoscape Hackathon

Observations on Cytoscape Usability (Melissa)

Motivations: assess Cytoscape usability from biologists' perspective, address stumbling blocks in the UI, assess biologists' response to vizmapper and filters.

Interviews with 9 users, conducted in their workspace, presented in context of users' data and analyses; observed how users worked with the software and explored new functionality.

Ran a brainstorming/prototyping workshop with users from usability studies, 2 groups, each group selected one topic to prototype with screen-dumps, post-its, etc. and were videotaped.

Issues raised were:

Documentation:

PDF-format manual too monolithic, HTML preferred

Online tutorials not used -- they want tutorials oriented around biological tasks, not software functionality

Question: how does this fit with Cytoscape's being domain-neutral? Suggestion: have a top-level set of domain-specific tutorials, with pointers to more detailed software functionality tutorials.
Issue: given the funding structure, is there enough incentive for groups to develop good documentation/tutorials? Suggestion: be creative in supplying a foundation that users can extend with Help documentation. Also, ISB may have some funding for this kind of work -- this should be brought up at the board meeting.
Also, a glossary would be critical
Suggestion: incorporate the tutorials more with the software

Data Import and Export

exporting data from Excel into Cytoscape is a big bottleneck.

PSI-MI now has an excel template that we could use when importing protein-protein interactions.

input file requirements not clearly understood

suggestion: have the input dialog indicate what programs can be used to view the different types of data files

why don't users look at the sample data?

some examples too large to be viable -- unfortunately BIND_human and BIND_yeast show up first in the list of sample files. Suggestion: have a demo directory. GalFiltered files are quite useful, should appear first.
Distinguish files created by people from those created by machines.
ACTION ITEMS generate a demo directory
users discouraged when they see examples from another domain. Suggestion: work with user community to expand the breadth of the examples. Suggestion: add wording to top-level Cytoscape description to make clear that Cytoscape supports multiple organisms.

Connectivity to External Databases

all users would benefit from improved connectivity for network and attribute data. Benefits: user doesn't have to search web for data, format translation performed automatically. Suggestion: provide generalized database wrappers.

GenMAPP to provide 'back-page' functionality to multiple databases for Cytoscape. Much of this can come from Ensembl, which has become very efficient for database provision. GenMAPP would provide a Gene database.

General User Interface Issues

lots of confusion with the menu system. typically they scan the menus from right to left to relocate menu options. Suggestion: move more things to right-click menu and simplify global menu.

not enough feedback in operations like selection or filtering; in import operations, make clear when the import is done.

some ambiguity, can close a task bar and an import will still work.

lots of menu items that don't do anything unless some conditions are met. Suggestions: disable menu items when input conditions are not met. Would want a library of tools that handles enabling/disabling of menu items.
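
Sketch (not from the meeting): one way a small helper library could handle enabling/disabling of menu items, using the standard Swing `Action.setEnabled` mechanism. The class name `MenuEnabler` and its methods are hypothetical; only `javax.swing.Action` is real API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BooleanSupplier;
import javax.swing.Action;

// Hypothetical helper: pairs each menu Action with the condition it needs.
final class MenuEnabler {
    private final Map<Action, BooleanSupplier> conditions = new LinkedHashMap<>();

    // Registers an action and immediately applies its precondition.
    void register(Action action, BooleanSupplier precondition) {
        conditions.put(action, precondition);
        action.setEnabled(precondition.getAsBoolean());
    }

    // Call whenever application state changes (e.g. the selection changed);
    // menu items bound to a disabled Action are greyed out by Swing.
    void refresh() {
        for (Map.Entry<Action, BooleanSupplier> entry : conditions.entrySet()) {
            entry.getKey().setEnabled(entry.getValue().getAsBoolean());
        }
    }
}
```

A menu item such as "delete selected nodes" would register with a condition like "at least one node is selected", and the code that changes the selection would call `refresh()`.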

Vizmapper intimidates new users

"Calculators" and "Map attributes" reflect programmer thinking, not user thinking. Suggestion: provide direct controls tying attribute values to visual properties.

"Define", "Delete", "Duplicate" visual style scares users

Filters

Everybody wants them but no one -- not one -- has been successful with them. Incremental layering of filters (a la topological filter) is natural for computer types, but not for biologists.

Search/filtering are similar. Cytoscape design should reflect this.

Graph Layout

Every single user was comfortable with graph layout est

Unresolved use case: how to represent a conditional edge -- advanced user studying closely-related networks, certain edges exist in only some experiments. Some sort of visual filter might be useful. Or use some gradation via translucency.

Plugin architecture

users not aware of plugins: suggestion: incorporate some plugin documentation into manual. Or provide something along the lines of "eclipse update".

users not comfortable installing JAR files. Suggestion: provide template install scripts.

users find the separation of "core" and "plugin" strange. It's all Cytoscape. Suggestion: break the convention that plugins use plugin menu. Need some structuring of plugins. Perhaps another look at menu items to see if all types of plugins can be handled by a small set of menu items.

Suggestion: have a plugin management system that maintains information about the plugin and handles loading of plugins.

Suggestion: use the term "Extension" rather than "plugin". Mimic Firefox's handling of plugins. Have a "search for new features" capability.

Suggestion: provide feedback about what plugins (and versions of them) are loaded.

Advanced User workarounds

One user who wanted to view conditional edges used Cytoscape to generate GML, then used a script to construct a PDB file from the GML, then compared the PDB files with a molecular modeling viewer.

Another user had a plugin that specified a node by name; the node is fetched from a larger network, and with the right-click menu the user can extend the network with first neighbors. Node border indicates if there are any neighbors not shown. Edge type is used to reference the publication documenting interactions. Visual properties are assigned using edge classes.

users choosing not to update their Cytoscape installation

many users at ISB still run Cytoscape 1.X

why? key plugins no longer compatible with core. Changes to input file specs. Other changes to key concepts, not always obvious.

Brainstorming/Prototyping workshop

30 minutes of open brainstorming, followed by construction with screen shots, markers, post-it notes.

include users that are already familiar with the process.

main categories: filtering/selection/attributes, data import/export/management, installation and architecture, graph layout, combining multiple networks and multiple heterogeneous networks.

brainstorming ideas: dynamic filtering, global network annotation, integration w/ public dbs, multiple heterogeneous networks, multi-level analysis (e.g. motifs/domains in the proteins of the network).

multi-level network (network, protein complex, protein, domain, peptide, GO term). Nodes replaced by corresponding domains. Similar domains merged into one node.

Conclusions

ideas are simple but not communicated clearly

simplify the process of import

simplify visual property management, even if it means losing some flexibility

integrate plugins more closely with the core.

2 - Foundation Architecture (GINY Refactoring) - Mike Smoot

Edge IDs should be addressable, just like node IDs
- General sense is that edges should be addressable. IDs should be unique
- Concepts: 1) root graph integer identifier, 2) edge string ID, constructed from node IDs
- Assumption: it makes no sense to have two edges with the same nodes, direction and interaction type. - 3 attributes make an edge unique: source, target and type.
  - Action item: clarify this in the documentation.
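
Sketch (not from the meeting): the uniqueness rule above (source, target, interaction type) combined with the "source (edge type) target" naming pattern mentioned under Issues. `EdgeIds` and `build` are hypothetical names, not Cytoscape API.

```java
import java.util.Objects;

// Hypothetical helper, not part of the Cytoscape API.
final class EdgeIds {
    private EdgeIds() {}

    // Builds an identifier of the form "source (type) target".
    // Edges are assumed directed, so swapping source and target gives a different ID.
    static String build(String sourceId, String interactionType, String targetId) {
        Objects.requireNonNull(sourceId, "sourceId");
        Objects.requireNonNull(interactionType, "interactionType");
        Objects.requireNonNull(targetId, "targetId");
        return sourceId + " (" + interactionType + ") " + targetId;
    }
}
```

Because the ID is derived from the interaction type, changing the type after creation would leave the ID stale, which is why an immutable type attribute matters.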

Issues:

However, type is an attribute and this can be changed by the user, which is not good for the purpose of computing edge identifiers.
- Solution, action item: make the edge type attribute immutable
Edge name is annoying to create in edge attribute files i.e. "source (edge type) target" - this is solvable by import code (which can support other file formats for edge attributes)
There is a create identifier method in CyNode and CyEdge classes, but these can only be used once - do we need these methods? This should be refactored so that there is a clear constructor like Cytoscape.getCyEdge.

Action item: getCyEdge needs to be refactored - it has a for loop, which makes it slow. Maybe a hashmap would be better?

        public static CyEdge getCyEdge(Node source, Node target, String attribute, Object attribute_value, boolean create, boolean directed)
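
Sketch (not from the meeting): what a hash-based lookup could look like in place of the loop. This is a simplified stand-in keyed on source, type, and target; `EdgeIndex` and its methods are hypothetical, and the real `getCyEdge` signature above takes different arguments.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical index; the real getCyEdge signature and data structures differ.
final class EdgeIndex {
    // Maps "source (type) target" to an integer edge index.
    private final Map<String, Integer> byKey = new HashMap<>();
    private int nextIndex = 0;

    private static String key(String source, String type, String target) {
        return source + " (" + type + ") " + target;
    }

    // Returns the existing edge index, creates one if 'create' is true,
    // or returns null when the edge is absent and 'create' is false.
    Integer getEdge(String source, String target, String type, boolean create) {
        String k = key(source, type, target);
        Integer existing = byKey.get(k);
        if (existing != null || !create) {
            return existing;
        }
        int created = nextIndex++;
        byKey.put(k, created);
        return created;
    }

    int size() {
        return byKey.size();
    }
}
```

Lookup cost no longer grows with the number of edges, at the price of keeping the map in sync whenever edges are added or removed.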

We should ensure that all of Cytoscape currently assumes directed edges
- Answer: yes! (for now, undirected edges are not implemented)
how/when do we deal with undirected edges?
- Answer: we don't - action item: refactor Cytoscape to deprecate methods that contain boolean directed in the signature (and add a new method that doesn't have the directed flag to replace them).
GINY metanode refactoring: remove meta methods - all functionality will be present in the metanode/group API. This will significantly simplify the GINY API and data structure allowing developers to be more confident that their algorithms will work on CyNetwork in general.
- Decision: yes - deprecate all meta methods when we have a replacement of the GroupAPI - only for 2.5. For 2.4, we keep the meta methods active. Issue, current metanode and xgmml reader uses these methods. They should be deprecated rather quickly, since not many people are using these.
Refactor the GINY interface/inheritance architecture: e.g. should we encapsulate GINY in CyNetwork? See CyNode_Identification
- Decision: Yes. Action items:
  - Ensure CyEdge, CyNode, CyNetwork are interfaces.
  - Encapsulate GINY classes inside Cy* classes. Cy* classes use, but don't extend GINY classes - GINY should not be visible, but we may need a method to get the encapsulated data structures, like the root graph (we need to make a decision later about whether to make these visible or not)
  - Need to figure out what plugins really need (e.g. warning sent to the discussion list)
  - Make a recommendation now to use CyNode and CyEdge now (and not Node and Edge)
  - Start by removing the implements on CyNode and CyEdge and see what happens from there.
  - More unit tests for giny to help us make these changes
- Decision: save this for 3.0 (potential big impact)
Clean up deprecated methods
- Decision: make a task to check for removing deprecated methods at every release.
Clean up Cytoscape class – duplicated methods in CyNetwork vs. Cytoscape class and the general bloat of the Cytoscape class.
- Decision: check how much each method is used by plugins in SVN and make decisions about which classes should be moved. Best not mess with highly used methods
  - Methods like confirmQuit, etc. should be moved to CytoscapeInit
  - Many variables could be made enumerations
  - Code that is used only by the core can be moved out.
  - Part of the clean up is to give it a clearer definition of what belongs in there - it should just contain the commonly used methods - the main part of the API.
- Decision: this is a 3.0 change (potential big impact)
GINY lives on Sourceforge - we have free rein to do whatever we want with GINY as far as our use, but the GINY sourceforge site will exist as interfaces (according to recent discussion with Rowan, by Mike)
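
Sketch (not from the meeting): two of the decisions above, encapsulating GINY behind Cy* interfaces and replacing methods that carry a boolean directed flag. All type names here (`GinyNodeStub`, `SketchNode`, `SketchNodeImpl`, `SketchEdges`) are hypothetical stand-ins so the example compiles on its own.

```java
// Stand-in for a GINY node; hypothetical, only here so the sketch compiles.
final class GinyNodeStub {
    private final int rootGraphIndex;
    GinyNodeStub(int rootGraphIndex) { this.rootGraphIndex = rootGraphIndex; }
    int getRootGraphIndex() { return rootGraphIndex; }
}

// Public-facing interface: nothing from GINY appears in it.
interface SketchNode {
    String getIdentifier();
}

// The implementation holds a GINY object (composition) instead of extending it.
final class SketchNodeImpl implements SketchNode {
    private final String identifier;
    private final GinyNodeStub delegate;

    SketchNodeImpl(String identifier, GinyNodeStub delegate) {
        this.identifier = identifier;
        this.delegate = delegate;
    }

    @Override
    public String getIdentifier() { return identifier; }

    // Package-private access to the wrapped structure; whether to expose
    // this at all is the open decision mentioned in the notes.
    int rootGraphIndex() { return delegate.getRootGraphIndex(); }
}

final class SketchEdges {
    private SketchEdges() {}

    /** @deprecated all edges are treated as directed; use describe(source, target). */
    @Deprecated
    static String describe(String source, String target, boolean directed) {
        return describe(source, target); // the flag is ignored
    }

    static String describe(String source, String target) {
        return source + " -> " + target;
    }
}
```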

2a - Network specific (local) node and edge attributes - Mike Smoot

Use case from GenMAPP group
- Use case: a gene that can exist both within the nucleus and cytoplasm and you want to display them at the same time.
  - Requirement: some attributes are shared between nodes. Decision: this can be accomplished with current Cytoscape functionality
- Use case: Agilent LitSearch - requirement: store different attribute values for the same edge e.g. a search with the same genes using different contexts, one 'cancer' and one 'diabetes', which provide different sentences on the same edges
- Use case: scoring edges using the same algorithm with different parameters
- Decision: don't need to implement this. The workaround to create multiple edges with different edge types is fine. It would be very complex to create network specific node and edge attributes.

2b - Memory management- Mike Smoot

When networks are destroyed, nodes, edges and their attributes are not actually deleted, but maybe there should be some garbage collection if users want. Also, the current behavior allows attributes to pop back up for new nodes that are identically named to previously deleted nodes
- Decision: it would be good to free memory when nodes and edges are not used any more (including all attributes referenced by the nodes and edges).
- Issue: we don't know how to do this yet. Action item: investigate the code to figure out how this works. Need an implementation proposal.
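
Sketch (not from the meeting, and not a proposal for the actual implementation, which the notes leave open): one possible direction is to track which networks still use a node and drop its attributes when the last one is destroyed. `NodeUsageTracker` and all its methods are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical bookkeeping, not how Cytoscape stores nodes or attributes.
final class NodeUsageTracker {
    // node ID -> names of networks that still contain the node
    private final Map<String, Set<String>> networksByNode = new HashMap<>();
    // node ID -> attribute name -> value
    private final Map<String, Map<String, Object>> attributes = new HashMap<>();

    void addNode(String network, String nodeId) {
        networksByNode.computeIfAbsent(nodeId, k -> new HashSet<>()).add(network);
    }

    void setAttribute(String nodeId, String name, Object value) {
        attributes.computeIfAbsent(nodeId, k -> new HashMap<>()).put(name, value);
    }

    Object getAttribute(String nodeId, String name) {
        Map<String, Object> attrs = attributes.get(nodeId);
        return attrs == null ? null : attrs.get(name);
    }

    // Removes the network from every node; nodes left without any network
    // lose their attributes, so a later node with the same ID starts clean.
    void destroyNetwork(String network) {
        networksByNode.entrySet().removeIf(entry -> {
            entry.getValue().remove(network);
            if (entry.getValue().isEmpty()) {
                attributes.remove(entry.getKey());
                return true;
            }
            return false;
        });
    }
}
```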

2c - Plugin Architecture- Mike Smoot

How can plugins know which attributes are relevant to which network?
- Not sure what this means - who added this?
How can plugins depend on other plugins?
- Some discussion of 2 phase loading
What is a plugin if the core developers are maintaining it?
- Still open for discussion.
Action item: The Cytoscape plugin project (like sourceforge) - plugin writers could register themselves with the community and a bunch of services would be available e.g. loads, run unit tests, etc. - we just run automatic testing and report the results. We don't actually do any manual testing. Benefit to plugin writer, you show up on the list of plugins that the user sees and can download. Goal is to create a nice automated test site. Benefit to Cytoscape is it is easier to track community, and to know which methods are in use by the community. Benefit to users, they know more about which cytoscape plugin versions work with which version of cytoscape. Different levels of compliance - documentation, ant file, etc.
- Action item: for Ethan - how well would Maven work for this?
Data-aux.jar - Ethan: are all of these still necessary
Decision: create a plugin/libs directory for all plugin libraries
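
Sketch (not from the meeting): one reading of "2 phase loading" for plugin dependencies. Phase 1 registers every plugin with the names it depends on; phase 2 initializes them with dependencies first. `TwoPhaseLoader` and the way dependencies are declared are assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical loader; plugin names and dependency declarations are assumptions.
final class TwoPhaseLoader {
    private final Map<String, List<String>> dependencies = new LinkedHashMap<>();

    // Phase 1: register every plugin and its dependencies, without initializing.
    void register(String plugin, List<String> dependsOn) {
        dependencies.put(plugin, new ArrayList<>(dependsOn));
    }

    // Phase 2: compute an initialization order in which dependencies come first.
    List<String> initializationOrder() {
        List<String> order = new ArrayList<>();
        Map<String, Boolean> state = new LinkedHashMap<>(); // false = visiting, true = done
        for (String plugin : dependencies.keySet()) {
            visit(plugin, state, order);
        }
        return order;
    }

    private void visit(String plugin, Map<String, Boolean> state, List<String> order) {
        Boolean s = state.get(plugin);
        if (Boolean.TRUE.equals(s)) {
            return;
        }
        if (Boolean.FALSE.equals(s)) {
            throw new IllegalStateException("Dependency cycle at " + plugin);
        }
        List<String> deps = dependencies.get(plugin);
        if (deps == null) {
            throw new IllegalStateException("Missing plugin " + plugin);
        }
        state.put(plugin, Boolean.FALSE);
        for (String dep : deps) {
            visit(dep, state, order);
        }
        state.put(plugin, Boolean.TRUE);
        order.add(plugin);
    }
}
```

Missing plugins and dependency cycles surface as errors before any plugin is initialized.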

Further discussion of overall package architecture. Mike showed a Cytoscape representation of the dependencies among packages (made using jdepend). We should go through the libraries that we use to see how much we are using them. If we are only using a small part of a library, we may want to reimplement that part and save on using the whole library.

3 - BioDataServer

URL based ontology and annotation readers.
One window for loading all information.
Users can import local or remote files.
Consider auto-sizing columns - implies horizontal scrollbars (will be included in 2.4)
Make delimiters use a combination of delimiters rather than just one, e.g. tab and space.
Very cool!
Consider using the same text preview for importing network text (will be included in 2.4)
- Specify one column as source,
- one column would be target,
- other columns could be interaction or edge attributes
- Also, users can select other columns as source or target (node) attributes.
Consider loading the whole file and allowing sorting based on columns. (will be included in 2.4)
OR, say "first 100 lines of X shown"
What if the first 100 lines are different from lines 100 through X?
Have a check box that says "show all lines" so that users can see all of their data in the preview, instead of just the first 100 lines.
The "Attribute Data Type" box in the "Set Attribute Name and type" dialog should be hidden behind an advanced checkbox so that users don't see it each time.
Column highlight needs to be fixed.
Ontology root should not exist, particularly if no ontologies have been loaded.
Ontologies should only be shown as network if a checkbox in the import window has been clicked. Otherwise, don't show the ontology as a network. This implies that CyNetwork will need an option so that it will not be displayed.
Ideally there will be a separate panel (next to Network and Editor panels in Cytopanel 1) with a treeview of the ontology along with whatever other information might exist.
We can import anything in OBO.
We should probably only show the first 5 (or so) ontologies, although provide the option to see all of them.
BioJava appears to fit well and fulfill the requirements
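
Sketch (not from the meeting): the network text import ideas above, accepting tab and space together as delimiters and letting the user pick which column is source, target, and interaction. `EdgeLineParser` is hypothetical; the column indices would come from the import dialog.

```java
// Hypothetical parser; column roles would come from the import dialog.
final class EdgeLineParser {
    static final class ParsedEdge {
        final String source;
        final String target;
        final String interaction;

        ParsedEdge(String source, String target, String interaction) {
            this.source = source;
            this.target = target;
            this.interaction = interaction;
        }
    }

    private final int sourceCol;
    private final int targetCol;
    private final int interactionCol;

    EdgeLineParser(int sourceCol, int targetCol, int interactionCol) {
        this.sourceCol = sourceCol;
        this.targetCol = targetCol;
        this.interactionCol = interactionCol;
    }

    // Splits on runs of tabs and/or spaces, so both delimiters are accepted together.
    static String[] split(String line) {
        return line.trim().split("[\\t ]+");
    }

    ParsedEdge parse(String line) {
        String[] cols = split(line);
        int needed = Math.max(sourceCol, Math.max(targetCol, interactionCol));
        if (cols.length <= needed) {
            throw new IllegalArgumentException("Too few columns: " + line);
        }
        return new ParsedEdge(cols[sourceCol], cols[targetCol], cols[interactionCol]);
    }
}
```

Collapsing runs of delimiters means empty cells cannot be represented, so a real importer would probably need this to be a user option.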

Bookmarks

Where should bookmarks go?
Maybe JAXB for bookmarks is too complicated?
Not sure what the resolution for bookmarks is.
Will be a part of more flexible properties system in Cytoscape 2.5.

4 - Event handling - Mike Creech

Goal is to stimulate thinking about EventHandling
- Problem: event handling is spotty, inconsistent and undocumented - difficult to trace
- Documentation may be the first order of business
Complexities (see linked page for details):
- Should there be before and after notifications?
- Listener needs to know the state and consistency of state per object (e.g., node deletion)
General agreement that fixes are direly needed, but how to approach the problems
Possible solution strategies and guidelines (see linked page for details):
- The state and consistency of objects should be easily determined and required when firing notifications
  - Major concern here related to deleting nodes; real deletion should be recorded in a state manager
- Hack around performance issues:
  - Plug all listeners during major functions like session loading
  - Look at state first and only fire if state matches, allowing one to wait until a function is done or to batch
- Gary: caWorkbench has developed an event handling model: 20041001_caWorkbench_presentation.ppt
  - Communication Models: Asynchronous
  - Java-based model to improve performance for a delegated listener system
Implementation ideas:
- Gary: Do we want to adopt an event model and convert all, or do we want to carefully pick through and fix according to guidelines?
- We need to transform these ideas into proposals for group feedback.
- These changes need to be slated for a major revision, i.e., 3.0
- What we can do now is work on serious documentation, i.e., 2.5 action items:
  - We should all make an effort to document current/future projects
    - Add custom tag for Java Doc, e.g., @fires
  - We should write up clear developer guidelines
  - Dedicate a registry/package for event handling
  - Begin initial design of event model
- In addition to documentation, we can begin to tackle big, batch items that will have the largest effect on performance
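
Sketch (not from the meeting): the "plug all listeners during major functions" idea, where events fired while a long operation runs are queued and delivered as a batch afterwards. `BatchingDispatcher` is hypothetical and events are plain strings for brevity; the `@fires` tag in the comment is the custom Javadoc tag proposed above, not a standard one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical dispatcher, not Cytoscape's event system.
final class BatchingDispatcher {
    private final List<Consumer<String>> listeners = new ArrayList<>();
    private final List<String> pending = new ArrayList<>();
    private boolean held = false;

    void addListener(Consumer<String> listener) {
        listeners.add(listener);
    }

    // "Plug" the listeners: events are queued instead of delivered.
    void hold() {
        held = true;
    }

    // Deliver everything queued while held, in order, then resume normal delivery.
    void release() {
        held = false;
        List<String> batch = new ArrayList<>(pending);
        pending.clear();
        for (String event : batch) {
            deliver(event);
        }
    }

    /**
     * Queues or delivers an event, depending on whether listeners are held.
     * @fires the given event name to every registered listener
     */
    void fire(String event) {
        if (held) {
            pending.add(event);
        } else {
            deliver(event);
        }
    }

    private void deliver(String event) {
        for (Consumer<String> listener : listeners) {
            listener.accept(event);
        }
    }
}
```

Session loading would call `hold()` first and `release()` at the end, so listeners see a consistent state rather than every intermediate step.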

5 - Integration of Hyperedges and Groups APIs into the core

Big Questions: does a particular hyperedge or group exist only in one network?
- If yes, then this vastly simplifies the model
- If no, then we have a can of worms
Answer: yes.
- When a group is copied it is a new node with independent set of attributes
- Changes to one copy of a group do not affect the other copies
- Problem: edges to a group will be lost when copied
Answer: no.
- When a group is copied it is a shared copy where attributes (e.g., child nodes) are global
Group nodes have a dual nature: node (shared) and network (copy)
Shared implementation will be easy to do and preferable as a first stage. It'll be easier to add "copy" features to a "shared" implementation versus the other way around.
With a copy solution we still need an abstracted group editor window which is shared with the parent (e.g., Scooter's use case)
Default should be to share; option should be to copy

Conclusions

Shared version of groups will be identical networks; if changes are desired in one copy, the group is made into a copy and they are no longer shared. The shared implementation will be easier to implement with copy functionality added on top.

We really need a thorough write-up of the use cases and implementation details (i.e., at each level: root, network, view).
We do not want a group or hyperedge to exist outside of a CyNetwork; it must exist in at least one network.
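
Sketch (not from the meeting): the "share by default, copy as an option" behavior described above. `SketchGroup` is a hypothetical model, not the GroupAPI; members are node IDs only.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical model of a group; not the GroupAPI.
final class SketchGroup {
    private final String name;
    private final Set<String> members;

    SketchGroup(String name) {
        this(name, new LinkedHashSet<>());
    }

    private SketchGroup(String name, Set<String> members) {
        this.name = name;
        this.members = members;
    }

    String getName() { return name; }

    void addMember(String nodeId) { members.add(nodeId); }

    boolean contains(String nodeId) { return members.contains(nodeId); }

    // Default: share. Every network holding this group sees the same member set.
    SketchGroup share() { return this; }

    // Option: copy. The new group starts with the same members, then diverges.
    SketchGroup copy() {
        return new SketchGroup(name, new LinkedHashSet<>(members));
    }
}
```

Starting from the shared form, adding `copy()` is a small step, which matches the observation that copy features are easier to add to a shared implementation than the other way around.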

Brainstorm Session

Update use cases and design phase on GroupAPI page. Due date: November 15th!

VizMapper discussion

Mike reviewed the existing VizMapper UI
Suggestion: link style within the attribute browser
- Right-click on column would provide the option to map a style for that attribute
- Could also right-click on a cell to provide specific style for that cell
Suggestion: assign visual styles for filter results
General point: visual style, filtering, and searching are all interrelated
Question: should style be attribute based or style based
Suggestion: should also be able to set a style for a particular node/edge or all nodes with similar attributes
Question: do we want different interfaces for different use cases?
Mike: Vizmapper is really over-engineered. Mapping and mapping types are very confusing. In a lot of cases, you could infer the type of mapping.
Ethan: Attribute first allows us to make the UI data-aware.
Suggestion: does it make sense to use a property-sheet or style-sheet metaphor? This might break a number of things, though.
- Might just have the "style sheet" interface create a new attribute.
Ethan: it might be best just to redesign the current basic VizMapper, then worry about the override.
Melissa: perhaps do a phased approach -- phase 1 might be to take the current interface and move it to a CytoPanel to make it easy to access it for users.
Ethan: Currently can't combine attributes (except through the Attribute Modification interface, which is very difficult)
Perhaps expanding the Attribute Modification interface to allow calculations between attributes
Override: seems clear that you are going to right-click on a node and change the values, how do we change the current vizMapper?
Suggestion: would like to see an overview of visual mappings that are currently set at any point in time
Suggestion: get a list of attributes -> get a sample of the values, and can set the desired visual representation
Suggestion: use a four column view: attribute - property - value - sample
- Might add a checkbox for "use for legend"
- There is some argument for having properties first, then attribute
- There could be a "+" button to ask for another attribute or property
Issue: color wheel is sometimes difficult to reproduce -- could display the RGB or color value to make it reproducible
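
Sketch (not from the meeting): Mike's point that the mapping type could often be inferred, here from a sample of attribute values. The rule used (all numeric values suggest a continuous mapping, anything else a discrete one) is an assumption for illustration, and `MappingInference` is a hypothetical name.

```java
import java.util.List;

// Hypothetical inference rule; the real VizMapper may need more cases.
final class MappingInference {
    enum MappingType { CONTINUOUS, DISCRETE }

    private MappingInference() {}

    // Numeric-only samples suggest a continuous mapping (e.g. a color gradient);
    // anything else, including an empty sample, falls back to discrete.
    static MappingType infer(List<?> sampleValues) {
        if (sampleValues.isEmpty()) {
            return MappingType.DISCRETE;
        }
        for (Object value : sampleValues) {
            if (!(value instanceof Number)) {
                return MappingType.DISCRETE;
            }
        }
        return MappingType.CONTINUOUS;
    }
}
```

This fits the attribute-first approach: the UI looks at the data, proposes a mapping type, and the user only overrides it when the guess is wrong.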

Annotation discussion

Add support to have an arbitrary number of canvases (not just background & foreground)
Consider each canvas to be a network, use the network rendering capabilities already in place to support rendering
A layer manager class / api should be created to manage the multiple layers
By default basic set of canvases/attributes are created (ie, default background annotation & foreground legend)
Have advanced settings for advanced users
Create layer manager view (akin to network manager panel/view - see adobe illustrator)
Considerations:
- add things like stretching later on
- how do arbitrary icons/images interact with vizmapper
- how are canvases differentiated
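
Sketch (not from the meeting): a layer manager that keeps an ordered list of canvases and starts with a default set. Canvases are identified by name here, which is only one possible answer to the open question of how they are differentiated; `LayerManager` and the layer names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical layer manager; layers are identified by name for simplicity.
final class LayerManager {
    // Index 0 is painted first (bottom), the last entry is painted on top.
    private final List<String> layers = new ArrayList<>();

    LayerManager() {
        // Default set, so basic users never have to manage layers themselves.
        layers.add("background-annotation");
        layers.add("network");
        layers.add("foreground-legend");
    }

    // Inserts a new layer directly above an existing one.
    void addLayerAbove(String existing, String newLayer) {
        int index = layers.indexOf(existing);
        if (index < 0) {
            throw new IllegalArgumentException("Unknown layer: " + existing);
        }
        if (layers.contains(newLayer)) {
            throw new IllegalArgumentException("Duplicate layer: " + newLayer);
        }
        layers.add(index + 1, newLayer);
    }

    // Read-only snapshot of the order in which layers would be painted.
    List<String> paintOrder() {
        return Collections.unmodifiableList(new ArrayList<>(layers));
    }
}
```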

Layout discussion

Layout plugins should automatically get loaded
A basic set of layout algorithms (Cytoscape layouts) should get moved into the core
Plugin writers should be able to determine what layouts are available
Ethan: we should use jung library as reference api implementation (it has cool things like ability to debug layout outside of gui)
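
Sketch (not from the meeting): a registry that lets plugin writers find out which layouts are available, with a layout interface that runs without any GUI so it can be debugged and tested on its own. All names (`SketchLayout`, `LayoutRegistry`, `LineLayout`) are hypothetical, and this does not use the JUNG API.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical layout contract: pure computation, no GUI dependency.
interface SketchLayout {
    String getName();

    // Returns an {x, y} position for each node ID.
    Map<String, double[]> layout(Set<String> nodeIds);
}

// Hypothetical registry that core and plugins could both query.
final class LayoutRegistry {
    private final Map<String, SketchLayout> layouts = new TreeMap<>();

    void register(SketchLayout layout) {
        layouts.put(layout.getName(), layout);
    }

    Set<String> availableLayouts() {
        return Collections.unmodifiableSet(layouts.keySet());
    }

    SketchLayout get(String name) {
        return layouts.get(name);
    }
}

// Trivial example layout: nodes on a horizontal line, sorted by ID.
final class LineLayout implements SketchLayout {
    @Override
    public String getName() { return "line"; }

    @Override
    public Map<String, double[]> layout(Set<String> nodeIds) {
        Map<String, double[]> positions = new TreeMap<>();
        int i = 0;
        for (String id : new TreeSet<>(nodeIds)) {
            positions.put(id, new double[] { i * 50.0, 0.0 });
            i++;
        }
        return positions;
    }
}
```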

CytoscapeRetreat2006/Hackathon/HackathonNotes (last edited 2009-02-12 01:03:44 by localhost)