Differences between revisions 2 and 3
Revision 2 as of 2008-04-17 18:07:39
Size: 8213
Editor: GaryBader
Comment:
Revision 3 as of 2008-05-01 18:11:33
Size: 9410
Editor: nebbiolo
Comment:

This is the homepage of the Cytoscape 3.0 Model layer design discussion.

Model Layer definition

The model layer contains the object/data model for core data structures useful for Cytoscape or Cytoscape-like software.

Component Modules

  • Graph
  • Attributes
  • Identifier policy (for objects that need to be referenced, like nodes/edges)
  • Hypergraph? (is this different from Graph? see open issues)
  • Groups? (is this different from Graph? see open issues)
  • Project?

Requirements

Core structure of the Model layer

  • APIs should be easy to use. Use case: to reduce bugs and duplicated code (from cyto2 experience)
  • Minimize dependencies between APIs. Use case: to reduce maintenance work (from cyto2 experience)
  • API dependencies must be acyclic.

  • APIs should be non-redundant (i.e. don't create multiple APIs for the same model unless there is a clear use case). Use case: easier for core developers and plugin writers to work with (from cyto2 experience)
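The acyclicity requirement can be checked automatically, for example as a build step. A minimal sketch, assuming module dependencies are available as a map from each module name to the modules it depends on (the module names in the usage are hypothetical):

```python
from typing import Dict, List, Optional

def find_cycle(deps: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one dependency cycle (first module repeated at the end), or None if acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited / on the current DFS path / fully explored
    color: Dict[str, int] = {}
    path: List[str] = []

    def visit(module: str) -> Optional[List[str]]:
        color[module] = GREY
        path.append(module)
        for dep in deps.get(module, []):
            state = color.get(dep, WHITE)
            if state == GREY:
                # Back edge: the cycle is the part of the current path starting at dep.
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        color[module] = BLACK
        return None

    for module in deps:
        if color.get(module, WHITE) == WHITE:
            cycle = visit(module)
            if cycle:
                return cycle
    return None
```

A layered set of modules passes, while two modules that depend on each other are reported with the offending path.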


Network/Graph

  • Is extremely fast and memory efficient for creating and updating. Use case: large graphs up to millions of nodes
  • Support for multi-graphs. Use case: basic networks are represented as simple graphs, but a protein interaction network with multiple types of edges needs a multi-graph.
  • Support for hypergraphs. Use case: representing biochemical reactions as hyperedges.
  • Support for nested graphs. Use case: representing protein complexes as nested graphs.
  • Ability for users to determine what type of graph they are working with (graph, multi-graph, hypergraph, or nested graph, i.e. one containing groups). Use case: algorithm writers need this to avoid running an algorithm on an incompatible input data structure.
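To make the "what type of graph is this?" requirement concrete, here is a minimal sketch, assuming a hypothetical model in which each edge is a tuple of node IDs (two IDs for an ordinary edge, more for a hyperedge) and groups are recorded as a parent-to-children map. None of these names come from an agreed API.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Set, Tuple

@dataclass
class Network:
    nodes: Set[int]
    edges: List[Tuple[int, ...]]  # 2 endpoints = ordinary edge, more = hyperedge
    children: Dict[int, Set[int]] = field(default_factory=dict)  # group node -> members

def graph_features(net: Network) -> FrozenSet[str]:
    """Report which structural features a network uses."""
    features = set()
    if any(len(e) > 2 for e in net.edges):
        features.add("hypergraph")
    # Edges are treated as directed: a repeated (source, target) pair means a multi-graph.
    pair_counts = Counter(e for e in net.edges if len(e) == 2)
    if any(count > 1 for count in pair_counts.values()):
        features.add("multigraph")
    if any(net.children.values()):
        features.add("nested")
    return frozenset(features) if features else frozenset({"graph"})
```

An algorithm that only handles simple graphs could refuse to run unless `graph_features(net) == frozenset({"graph"})`.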

Open issues

  • How do we implement more complex graphs e.g. hypergraphs at the same time as regular graphs?
    • Possible solutions are:
      • have different classes for each that inherit from the most general class;
      • overlay all types on one simple graph model, e.g. use one graph model with special flags for nodes and edges;
      • have a loose association of classes linked only by node and edge IDs, where the more complex model maintains its own consistency if related models are changed. Note: a hypergraph is not a type of graph; it is the other way around.
      • Don't support hypergraphs at all. I think that hypergraphs are a very complicated abstraction and possibly not the correct one for biochemical reactions.
  • How should these related aspects of the model be kept consistent, e.g. are events needed in the core? Yes, events are needed in the core for the network model to communicate state changes to classes that depend on the network (but not vice versa).
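One possible reading of the "overlay with special flags" option is to store each hyperedge as a flagged node joined to its members by ordinary edges. The sketch below assumes that reading; the class and method names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

class OverlayGraph:
    """One simple graph; each hyperedge is a node flagged HYPEREDGE
    and joined to its members by ordinary binary edges."""
    NODE, HYPEREDGE = "node", "hyperedge"

    def __init__(self) -> None:
        self._next_id = 0
        self.kind: Dict[int, str] = {}          # node ID -> flag
        self.edges: List[Tuple[int, int]] = []  # only binary edges are stored

    def _new(self, kind: str) -> int:
        nid = self._next_id
        self._next_id += 1
        self.kind[nid] = kind
        return nid

    def add_node(self) -> int:
        return self._new(self.NODE)

    def add_edge(self, source: int, target: int) -> None:
        self.edges.append((source, target))

    def add_hyperedge(self, members: Set[int]) -> int:
        h = self._new(self.HYPEREDGE)
        for m in sorted(members):
            self.edges.append((h, m))
        return h

    def hyperedge_members(self, h: int) -> Set[int]:
        return {t for s, t in self.edges if s == h}

    def plain_nodes(self) -> Set[int]:
        # Algorithms written for ordinary graphs must filter on the flag.
        return {n for n, k in self.kind.items() if k == self.NODE}
```

The cost of this option shows in `plain_nodes`: every algorithm that only understands ordinary graphs has to know about, and filter on, the flag.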

Design Ideas

  • Have both a fast core graph implementation and a higher-level, object-oriented network API that makes it easier for plugin writers to manipulate networks. The latter is more memory- and CPU-intensive, so it would be implemented lazily.
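A minimal sketch of the two-level idea, assuming the core stores edges as parallel integer arrays and the object-oriented layer creates `Node` wrappers only when they are first requested. All names are hypothetical, and `neighbors()` uses a linear scan for brevity where a production core would likely keep per-node adjacency indexes.

```python
from array import array
from typing import Dict, Iterator

class CoreGraph:
    """Fast, compact core: nodes are plain ints, edges live in two parallel int arrays."""
    def __init__(self) -> None:
        self.node_count = 0
        self.src = array("i")  # source node of each edge
        self.tgt = array("i")  # target node of each edge

    def add_node(self) -> int:
        self.node_count += 1
        return self.node_count - 1

    def add_edge(self, s: int, t: int) -> int:
        self.src.append(s)
        self.tgt.append(t)
        return len(self.src) - 1

class Node:
    """Object-oriented view of one core node; stores only its network and index."""
    def __init__(self, network: "Network", index: int) -> None:
        self.network, self.index = network, index

    def neighbors(self) -> Iterator["Node"]:
        core = self.network.core
        # Computed on demand by scanning the core arrays; nothing is copied.
        for s, t in zip(core.src, core.tgt):
            if s == self.index:
                yield self.network.node(t)
            elif t == self.index:
                yield self.network.node(s)

class Network:
    """Plugin-facing API; Node objects are created lazily, the first time they are requested."""
    def __init__(self, core: CoreGraph) -> None:
        self.core = core
        self._views: Dict[int, Node] = {}

    def node(self, index: int) -> Node:
        if index not in self._views:
            self._views[index] = Node(self, index)
        return self._views[index]
```

Code that never asks for a `Node` pays only for the integer arrays; the wrapper objects exist only for the parts of the network a plugin actually touches.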


Attributes

  • Is fast for writing and reading and is memory efficient. Use case: large gene expression data sets
  • Uses simple types, which ease inter-layer communication and keep the core simple, and which advanced users can combine into more complex types. Use case: simple communication of data structures between plugins and modules (from Cyto1 and Cyto2 experience).
  • Ability to optionally back attributes with a database. Use case: large gene expression data sets
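A minimal sketch of an attribute store restricted to simple types, assuming the simple types are bool, int, float and str and that a flat list of one simple type is the first step toward compound types. The column-per-attribute, sparse layout is an assumption, not an agreed design.

```python
from typing import Any, Dict, List, Tuple

SIMPLE_TYPES = (bool, int, float, str)  # assumed set of simple types

class AttributeTable:
    """Column-oriented, sparse attribute storage restricted to simple types
    and flat lists of a simple type."""
    def __init__(self) -> None:
        self.schema: Dict[str, Tuple[type, bool]] = {}  # name -> (element type, is_list)
        self.columns: Dict[str, Dict[int, Any]] = {}    # name -> {row ID: value}

    def define(self, name: str, typ: type, is_list: bool = False) -> None:
        if typ not in SIMPLE_TYPES:
            raise TypeError(f"{typ.__name__} is not a simple type")
        self.schema[name] = (typ, is_list)
        self.columns[name] = {}

    def set(self, row: int, name: str, value: Any) -> None:
        typ, is_list = self.schema[name]
        if is_list and not isinstance(value, list):
            raise TypeError(f"{name} expects a list")
        items: List[Any] = value if is_list else [value]
        # Exact type match, so e.g. True is not accepted where an int is expected.
        if not all(type(v) is typ for v in items):
            raise TypeError(f"{name} expects {typ.__name__} values")
        self.columns[name][row] = value

    def get(self, row: int, name: str, default: Any = None) -> Any:
        # Rows without a value are simply absent: no empty cells are stored.
        return self.columns[name].get(row, default)
```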

Open issues

  • What requirements are not met by the current CyAttributes class? (It seems to meet most current requirements.)

    • Large amounts of boilerplate code.
    • The current global CyAttributes objects result in many very problematic dependencies that aren't apparent in interfaces.

  • How should we implement local vs. global attributes?
  • How should special attributes be implemented? E.g. hidden.
  • What is the largest size of network we should consider in our design (order of magnitude)? Millions of nodes/edges?


Identifiers

  • Gary's ID proposal from Dec. 15, 2007:

1. Graph objects = nodes, edges, groups, hyperedges (these are children of GraphObject)
2. Each graph object has a session unique integer ID (G-SUID) (not globally unique, just unique for a given Cytoscape session)
3. Each attribute row in a table has a SUID (A-SUID - not the same space as the G-SUID)
4. A map exists between graph object SUIDs (G-SUIDs) and attribute table row SUIDs (A-SUIDs)

This is simple and allows a lot of flexibility. The presence of SUIDs means you don't have to track IDs in any context, which is the simplest option. This also allows multiple attribute tables (think relational database tables). For instance, we can implement user attributes in one table and hidden attributes in another. With more than one attribute table, the map must also store the table context. Further, we can have multiple attribute tables in the same space, e.g. load two gene expression datasets into two different tables. This removes the need for empty attribute table cells. I don't think we should actually implement this latter option, as the GUI is more complex and we already have an efficient CyAttribute data structure (MultiHashMap), but supporting it only changes how the graph object SUID to attribute SUID map is used.
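A minimal sketch of points 1 to 4, assuming Python dicts stand in for the attribute tables and the map. `GraphObject` comes from the proposal; `SUIDFactory` and `AttributeStore` are hypothetical names. Graph objects and attribute rows draw IDs from separate counters, and the map key carries the table name, so one node can have a row in a user table and another in a hidden table.

```python
import itertools
from typing import Dict, Tuple

class SUIDFactory:
    """Hands out integers that are unique within one session, not globally."""
    def __init__(self) -> None:
        self._counter = itertools.count(1)

    def new_id(self) -> int:
        return next(self._counter)

class GraphObject:
    """Common superclass for nodes, edges, groups and hyperedges; holds the G-SUID."""
    def __init__(self, suids: SUIDFactory) -> None:
        self.suid = suids.new_id()

class Node(GraphObject):
    pass

class Edge(GraphObject):
    def __init__(self, suids: SUIDFactory, source: Node, target: Node) -> None:
        super().__init__(suids)
        self.source, self.target = source, target

class AttributeStore:
    """Several named attribute tables whose rows carry A-SUIDs from their own ID space."""
    def __init__(self) -> None:
        self._row_ids = SUIDFactory()  # A-SUIDs, independent of G-SUIDs
        self.tables: Dict[str, Dict[int, dict]] = {}
        # (table name, G-SUID) -> A-SUID: with more than one table the map carries table context.
        self.g_to_a: Dict[Tuple[str, int], int] = {}

    def attach(self, table: str, obj: GraphObject) -> dict:
        a_suid = self._row_ids.new_id()
        row: dict = {}
        self.tables.setdefault(table, {})[a_suid] = row
        self.g_to_a[(table, obj.suid)] = a_suid
        return row

    def row(self, table: str, obj: GraphObject) -> dict:
        return self.tables[table][self.g_to_a[(table, obj.suid)]]
```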

Also, the presence of a GraphObject superclass will make it easier to write more general algorithm code (really important!) and provides a place to track G-SUIDs. The GraphObject class can also gain additional children if we need to add more things in the future. The advantage of this design is that there is no coupling in the model (i.e. the network doesn't know about groups), which leaves the model layer very flexible, while composition (bringing different parts of the model together) is facilitated in the application layer.

What about network attributes? They are just another SUID map. Networks are kept separate from other objects that have SUIDs because of the above point about algorithm generality. You don't want graph algorithms to work on networks as GraphObjects. Semantically, too, networks should not be graph objects; otherwise we introduce a high level of complexity into the model by building hierarchy into it by default. You could be tempted to implement groups like this, but I'm not sure it would be a good idea due to the complexity issue.

5. Graph objects are not shared between graphs.

This is important for reducing complexity. If you want network-specific attributes or attributes shared across networks, just update the SUID map.
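Because graph objects are never shared, sharing happens only in the map. A minimal sketch, assuming a single attribute table; `AttributeMap`, `share` and `detach` are hypothetical names, and the G-SUIDs 101 and 202 used below are made-up values for the same gene's node in two networks.

```python
import itertools
from typing import Dict

class AttributeMap:
    """G-SUID -> A-SUID map over a single attribute table."""
    def __init__(self) -> None:
        self._a_suids = itertools.count(1)
        self.rows: Dict[int, dict] = {}   # A-SUID -> attribute row
        self.g_to_a: Dict[int, int] = {}  # G-SUID -> A-SUID

    def _new_row(self, values: dict) -> int:
        a = next(self._a_suids)
        self.rows[a] = dict(values)
        return a

    def share(self, *g_suids: int) -> int:
        """Point several graph objects at one attribute row."""
        a = self._new_row({})
        for g in g_suids:
            self.g_to_a[g] = a
        return a

    def detach(self, g_suid: int) -> int:
        """Give one graph object a private copy of its current row."""
        a = self._new_row(self.rows[self.g_to_a[g_suid]])
        self.g_to_a[g_suid] = a
        return a

    def row(self, g_suid: int) -> dict:
        return self.rows[self.g_to_a[g_suid]]
```

For example, after `m.share(101, 202)` an attribute written through node 101 is visible through node 202; calling `m.detach(202)` then makes node 202's attributes network-specific without touching either graph object.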

Challenges: unlike in our current model, keeping the map up to date takes work. It could be implemented either with an event-based manager or by building a layer on top of the simple model APIs that handles adding attributes to graph objects and keeping the map current. This is all application layer work, though, and the application sets the policy of attribute sharing among graph objects.
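The second option, a layer on top of the simple model API, might look like the minimal sketch below. `SimpleGraph` is a hypothetical stand-in for the model API and `AttributedNetwork` is a hypothetical application-layer wrapper; the "one private row per new node" rule is just one example of an application-chosen sharing policy.

```python
import itertools
from typing import Dict, Set

class SimpleGraph:
    """Stand-in for the simple model API: it knows nothing about attributes."""
    def __init__(self) -> None:
        self._suids = itertools.count(1)
        self.nodes: Set[int] = set()

    def add_node(self) -> int:
        g = next(self._suids)
        self.nodes.add(g)
        return g

    def remove_node(self, g: int) -> None:
        self.nodes.discard(g)

class AttributedNetwork:
    """Application-layer wrapper that keeps the G-SUID -> A-SUID map in step with the graph."""
    def __init__(self, graph: SimpleGraph) -> None:
        self.graph = graph
        self._a_suids = itertools.count(1)
        self.rows: Dict[int, dict] = {}
        self.g_to_a: Dict[int, int] = {}

    def add_node(self, **attrs) -> int:
        g = self.graph.add_node()
        a = next(self._a_suids)
        self.rows[a] = dict(attrs)
        self.g_to_a[g] = a  # sharing policy: every new node gets its own row
        return g

    def remove_node(self, g: int) -> None:
        self.graph.remove_node(g)
        a = self.g_to_a.pop(g)
        # Drop the row only if no other graph object still maps to it (linear scan for brevity).
        if a not in self.g_to_a.values():
            del self.rows[a]
```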

Notes:

Persistence: The only rule for persistence would be uniqueness within a session or file, so you can write or read IDs however you want, as long as you maintain uniqueness per session. This would likely involve renumbering the SUIDs of all graph objects when reading a session file.
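Renumbering on read can be sketched as follows, assuming a file stores node IDs plus edges as (source, target) pairs of those IDs, and that the session hands out SUIDs from a counter. The function name is hypothetical.

```python
import itertools
from typing import Dict, Iterable, Iterator, Tuple

def load_with_fresh_suids(
    file_nodes: Iterable[int],
    file_edges: Iterable[Tuple[int, int]],
    session_suids: Iterator[int],
) -> Tuple[Dict[int, int], Dict[int, Tuple[int, int]]]:
    """Renumber graph objects read from a file into the current session's SUID space.

    File IDs only need to be unique within the file; they are used solely to
    resolve edge endpoints and are then discarded.
    """
    node_suid = {file_id: next(session_suids) for file_id in file_nodes}
    # Edges are graph objects too, so each gets a fresh SUID as well.
    edges = {next(session_suids): (node_suid[s], node_suid[t]) for s, t in file_edges}
    return node_suid, edges

# Usage: this session has already issued SUIDs 1-3; the file uses its own IDs 10 and 20.
session = itertools.count(4)
nodes, edges = load_with_fresh_suids([10, 20], [(10, 20)], session)
# nodes == {10: 4, 20: 5}; edges == {6: (4, 5)}
```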

Network merging: you can't use SUIDs to determine equivalence between graph objects.  This needs to be done at the application layer using additional information, such as attributes of those graph objects.
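A minimal sketch of matching graph objects by attributes rather than SUIDs, assuming a single identifying attribute (here a hypothetical "name" column) is good enough; in practice a merge may need several attributes or an identifier-mapping step.

```python
from typing import Dict

def match_nodes_by_attribute(rows_a: Dict[int, dict], rows_b: Dict[int, dict], key: str) -> Dict[int, int]:
    """Pair nodes of network B with nodes of network A whose `key` attribute is equal.

    rows_a / rows_b map each node's G-SUID to its attribute row. Returns
    {suid_in_b: suid_in_a}; SUIDs are never compared with each other.
    """
    by_value: Dict[object, int] = {}
    for suid, row in rows_a.items():
        if key in row:
            by_value.setdefault(row[key], suid)  # first match wins if A has duplicates
    return {suid: by_value[row[key]]
            for suid, row in rows_b.items()
            if key in row and row[key] in by_value}
```

The same gene can carry SUID 2 in one network and SUID 7 in another; only the attribute value links them.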

Open Issues

  • Should we implement multiple maps to handle local vs. global vs. hidden attributes?
  • Should users be able to use this like a database and just create their own tables? (with Cytoscape having default tables defined for graph object attributes?)
  • Should we use an existing database layer to implement this or just use a basic map?
  • Need to decide a data sharing policy for nodes e.g. no sharing of attributes (memory intensive in the worst case) vs. sharing by default (current case, breaks in some cases e.g. local attributes). The simplest level of the proposal does not deal with shared graph objects or attributes (1:1 mapping). If you want to have 1:many mapping from graph objects to attributes (shared attributes/node), you have to do extra work (though Sarah had a good point that databases may already do some of that work). You could also have other mapping relationships i.e. many:1 (shared nodes/attribute) or many:many, but I don't think we want those, at least initially, because their complexity doesn't justify their use in terms of our use cases.


Events

  • Define a low level event mechanism that can be used by modules to communicate their state changes to other modules that depend on them, but not vice versa.

Open Issues

  • Use an inheritance model or Event type model?
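To make the open issue concrete, here is a minimal sketch of the event type model, assuming listeners register per event class with a shared bus. `EventBus`, `NodeAdded` and `NodeRemoved` are hypothetical names. The network publishes events but never imports or calls the modules that depend on it, which keeps the dependency one-directional.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, DefaultDict, List, Set

@dataclass(frozen=True)
class NodeAdded:
    suid: int

@dataclass(frozen=True)
class NodeRemoved:
    suid: int

Listener = Callable[[object], None]

class EventBus:
    """Listeners register per event class; publishers only know the bus and the event types."""
    def __init__(self) -> None:
        self._listeners: DefaultDict[type, List[Listener]] = defaultdict(list)

    def subscribe(self, event_type: type, listener: Listener) -> None:
        self._listeners[event_type].append(listener)

    def publish(self, event: object) -> None:
        # Dispatch on the exact event class.
        for listener in list(self._listeners[type(event)]):
            listener(event)

class Network:
    """Model module: fires events but has no reference to its dependents."""
    def __init__(self, bus: EventBus) -> None:
        self.bus = bus
        self.nodes: Set[int] = set()

    def add_node(self, suid: int) -> None:
        self.nodes.add(suid)
        self.bus.publish(NodeAdded(suid))
```

An inheritance-based model would instead deliver an event to listeners registered for any of its base classes, for example by walking `type(event).__mro__` in `publish`.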


I/O

  • Read and write networks as XGMML.
  • Read and write attributes as part of XGMML.
  • Design general interfaces that support exporting different aspects of the Cytoscape system.
    • Export just network topology.
    • Export network topology AND graphical information.
    • Export attribute data.
    • Export images of networks.
    • Export Cytoscape session files.
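A minimal sketch of a general export interface, assuming each exporter receives the network and a text stream. The class names are hypothetical, and the tab-separated edge list and CSV formats are placeholders rather than XGMML; XGMML, image and session exporters would implement the same interface but are not sketched here.

```python
import csv
import io
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Dict, List, TextIO, Tuple

@dataclass
class Network:
    nodes: Dict[int, dict] = field(default_factory=dict)        # SUID -> attributes
    edges: List[Tuple[int, int]] = field(default_factory=list)  # (source SUID, target SUID)

class Exporter(ABC):
    """General export interface; each implementation covers one aspect of the system."""
    @abstractmethod
    def export(self, net: Network, out: TextIO) -> None:
        ...

class TopologyExporter(Exporter):
    """Network topology only: one 'source<TAB>target' line per edge."""
    def export(self, net: Network, out: TextIO) -> None:
        for s, t in net.edges:
            out.write(f"{s}\t{t}\n")

class AttributeExporter(Exporter):
    """Attribute data only: CSV with one row per node."""
    def export(self, net: Network, out: TextIO) -> None:
        columns = sorted({k for row in net.nodes.values() for k in row})
        writer = csv.writer(out, lineterminator="\n")
        writer.writerow(["suid"] + columns)
        for suid, row in sorted(net.nodes.items()):
            writer.writerow([suid] + [row.get(c, "") for c in columns])

# Usage: the same network written by two different exporters.
net = Network({1: {"name": "ACT1"}, 2: {"name": "CDC28", "score": 0.5}}, [(1, 2)])
topology, attributes = io.StringIO(), io.StringIO()
TopologyExporter().export(net, topology)
AttributeExporter().export(net, attributes)
```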

Open Issues

Outdated_Cytoscape_3.0/ModelDiscussions (last edited 2011-02-24 15:36:39 by PietMolenaar)

Funding for Cytoscape is provided by a federal grant from the U.S. National Institute of General Medical Sciences (NIGMS) of the National Institutes of Health (NIH) under award number GM070743-01. Corporate funding is provided through a contract from Unilever PLC.
