Differences between revisions 5 and 8 (spanning 3 versions)
Revision 5 as of 2006-02-07 00:23:20
Size: 5229
Editor: GaryBader
Comment:
Revision 8 as of 2006-02-08 22:30:50
Size: 6440
Editor: mskresolve-b
Comment:
RFC Name : CyNode identifier

Editor(s): Ben Gross


About this document

This is an official Request for Comment (RFC) for CyNode Identification

For details on RFCs in general, check out the [http://www.answers.com/main/ntquery?method=4&dsid=2222&dekey=Request+for+Comments&gwp=8&curtab=2222_1&linktext=Request%20for%20Comments Wikipedia Entry: Request for Comments (RFCs)]

Status

This RFC is still under construction and open for public comment. (01/17/06 -Ben)

How to Comment

To view or add comments, click on any of the 'Comment' links below. Adding your ideas directly to the Wiki makes it easier to organize everyone's ideas and keep clear records. Be sure to include today's date and your name with each comment. Here is an example to get things started: ["/Comment"].

Try to keep your comments as concrete and constructive as possible. For example, if you find a part of the RFC makes no sense, please say so, but don't stop there. Take the extra step and propose alternatives.

Proposal

Switch from the current system, in which a node's ID is a string, to a numbered node system. A node's ID and its label should be clearly distinct.

  • Remove m_identifier string from CyNode

  • CyNode unique ID string is generated by CyNode

  • User gets unique ID string for CyNode from CyNode

  • Split Cytoscape.getCyNode(string alias, boolean create) into createCyNode(..) and getCyNode(string uid)
  • Add LABEL attribute as a default node attribute
  • CyAttributes is keyed using unique ID String generated by respective CyNode

  • Any method that takes a string id to identify a CyNode should instead take a CyNode

  • Remove CANONICAL_NAME and COMMON_NAME ["/Comment"]
  • Only use graph objects, not node indices, as parameters to methods, e.g. int[] getAdjacentEdges(Node node);
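
The bullets above can be illustrated with a minimal sketch. It is not the Cytoscape API: SketchNode, SketchRegistry, the "node-N" ID format, and the method names are hypothetical stand-ins for CyNode, the Cytoscape facade, and CyAttributes, written in present-day Java. It only shows the intended shape: the node generates its own unique ID string, creation and lookup are separate calls, attributes are keyed by the generated ID, and LABEL is an ordinary attribute that need not be unique or even present.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for CyNode: the node generates its own unique ID string. */
final class SketchNode {
    private static final AtomicLong NEXT = new AtomicLong();
    private final String uid;

    SketchNode() {
        // Assumption: any scheme that never repeats within a session would do.
        this.uid = "node-" + NEXT.getAndIncrement();
    }

    String getUid() {
        return uid;
    }
}

/** Hypothetical stand-in for the Cytoscape facade plus CyAttributes. */
final class SketchRegistry {
    static final String LABEL = "LABEL";

    private final Map<String, SketchNode> nodesByUid = new HashMap<>();
    // Attributes are keyed by the ID string generated by the node itself.
    private final Map<String, Map<String, String>> attributes = new HashMap<>();

    /** Always creates a new node; the label is optional and need not be unique. */
    SketchNode createNode(String label) {
        SketchNode node = new SketchNode();
        nodesByUid.put(node.getUid(), node);
        if (label != null) {
            setAttribute(node, LABEL, label);
        }
        return node;
    }

    /** Pure lookup by unique ID; returns null if no such node exists. */
    SketchNode getNode(String uid) {
        return nodesByUid.get(uid);
    }

    /** Takes the node object rather than a string identifying it. */
    void setAttribute(SketchNode node, String name, String value) {
        attributes.computeIfAbsent(node.getUid(), k -> new HashMap<>()).put(name, value);
    }

    String getAttribute(SketchNode node, String name) {
        Map<String, String> values = attributes.get(node.getUid());
        return values == null ? null : values.get(name);
    }
}

final class NodeIdSketch {
    public static void main(String[] args) {
        SketchRegistry registry = new SketchRegistry();
        SketchNode first = registry.createNode("geneA");
        SketchNode second = registry.createNode("geneA"); // same label, distinct node
        SketchNode unlabeled = registry.createNode(null);  // no label at all

        System.out.println(first.getUid().equals(second.getUid()));
        System.out.println(registry.getAttribute(second, SketchRegistry.LABEL));
        System.out.println(registry.getAttribute(unlabeled, SketchRegistry.LABEL));
    }
}
```

In this sketch, two nodes with the same label and a node with no label coexist without conflict, which corresponds to the graph editing use cases listed in this RFC.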

Biological Questions / Use Cases

Graph Editing:

  • Ability to manipulate nodes that have not been labeled.
  • Ability to have two or more unique nodes with the same label.

General Notes

In this RFC, the terms "label" and "node label" are used in place of the older terms "name" and "node name".

Requirements

  • Cytoscape nodes have a unique identifier (ID).
  • Cytoscape nodes have a standard way of attaching a string label.
  • Cytoscape must be able to import from and export to SIF and GML.

Analysis

Cytoscape subsystems and their node identifier semantics

  • CyNetwork: Node ID in CyNode is a unique string (a unique integer GINY root graph index is also maintained and can be accessed by users). Semantics: the unique string is often expected to encode a gene name, but gene names are known not to be unique. This is a semantic conflict.

  • CyAttributes: A MultiHashMap using a unique string as a key.

  • SIF: Node ID is a unique string, often expected to encode a gene name. This is mapped to the CyNode unique string ID. This is a semantic conflict.

  • GML: Node ID is a unique integer. A label attribute is available per node that is a string (does not have to be unique). This string label is mapped to CyNode unique string ID. This is a datatype conflict.

  • Node attributes file format: same semantics as SIF.
  • Expression data file format: same semantics as SIF, but more often used to store expression IDs, like Affymetrix probeset IDs, while SIF often stores gene names. SIF and expression data IDs are difficult to map to each other.

  • Attribute browser: currently displays CyNode string ID, canonical name, common name, and aliases (these are often the same string, i.e. duplicated data).

  • Merge plugin: merges based on CyNode string ID. This only works for networks where CyNode string IDs are part of the same 'namespace'.

  • CyEdge: uses 2 CyNode string IDs as part of its ID.

  • BioDataServer synonym table: Formerly used for mapping IDs, e.g. between SIF and expression data IDs. It is still part of the code but is currently non-functional.

Open Issues

Unique ID generation:

  • Should IDs be unique across the Cytoscape community (say, if stored in a SIF file)?
  • Should this unique ID be stored/represented internally as a String?

SIF File Format:

  • What new information (if any) has to be written to SIF?
  • The SIF reader has to ensure that there are no duplicate labels.
  • The SIF reader reuses a label if it can find one.
  • What effect does this have on the new Cytoscape session saving subsystem?
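
One way to read the two SIF reader bullets above is sketched below: the reader keeps a label-to-ID table, so a label that has already been seen maps to the existing node instead of creating a duplicate. This is an illustration, not the existing InteractionsReader. SifLabelSketch, its fields, the integer IDs, and the plain whitespace tokenizing are all assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical SIF reader sketch: one node per distinct label in the file. */
final class SifLabelSketch {
    /** Label -> generated node ID, in order of first appearance. */
    final Map<String, Integer> idByLabel = new LinkedHashMap<>();
    /** Each edge as {sourceId, targetId}; edgeTypes holds the matching interaction type. */
    final List<int[]> edges = new ArrayList<>();
    final List<String> edgeTypes = new ArrayList<>();

    /** Reuses the node for a known label, otherwise generates a new ID. */
    private int nodeFor(String label) {
        Integer existing = idByLabel.get(label);
        if (existing != null) {
            return existing;
        }
        int id = idByLabel.size();
        idByLabel.put(label, id);
        return id;
    }

    /** Parses one line of the form: source interactionType target [target ...] */
    void readLine(String line) {
        // Assumption: tokens are separated by whitespace only.
        String[] tokens = line.trim().split("\\s+");
        if (tokens[0].isEmpty()) {
            return; // blank line
        }
        int source = nodeFor(tokens[0]);
        for (int i = 2; i < tokens.length; i++) {
            edges.add(new int[] {source, nodeFor(tokens[i])});
            edgeTypes.add(tokens[1]);
        }
    }

    public static void main(String[] args) {
        SifLabelSketch reader = new SifLabelSketch();
        reader.readLine("geneA pp geneB");
        reader.readLine("geneB pd geneA geneC");
        System.out.println(reader.idByLabel);    // three distinct labels
        System.out.println(reader.edges.size()); // three edges
    }
}
```

Under this reading, labels are unique within a single SIF file, but the generated IDs carry no meaning outside it, which is why the question of what else must be written to SIF remains open.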

Edges:

  • Should edges have an ID, or is a nodeID-edgeType-nodeID triple good enough to uniquely identify edges?
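
To make the trade-off in this question concrete, the sketch below builds an edge identifier from the nodeID-edgeType-nodeID triple. EdgeIdSketch, compositeId, and the "-" separator are hypothetical and not taken from CyEdge. It shows that two parallel edges of the same type between the same pair of nodes produce the same identifier, so the triple alone cannot tell them apart.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch of a composite edge identifier. */
final class EdgeIdSketch {
    /** Builds the identifier from the triple; the separator is an assumption. */
    static String compositeId(String sourceId, String edgeType, String targetId) {
        return sourceId + "-" + edgeType + "-" + targetId;
    }

    public static void main(String[] args) {
        Set<String> seen = new HashSet<>();
        // Set.add returns false when the identifier is already present.
        boolean firstIsNew = seen.add(compositeId("n1", "pp", "n2"));
        boolean parallelIsNew = seen.add(compositeId("n1", "pp", "n2"));
        boolean otherTypeIsNew = seen.add(compositeId("n1", "pd", "n2"));
        System.out.println(firstIsNew + " " + parallelIsNew + " " + otherTypeIsNew);
        // prints "true false true"
    }
}
```

If parallel edges of the same type are a valid use case, edges would need their own generated ID, analogous to the one proposed for nodes.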

GraphMerge

  • Does graph merge depend on a unique string ID? Does it assume all identical nodes have the same root graph ID?

Backward Compatibility

Importing/Exporting:

  • Ok to import old attributes file.
  • Exporting should use new format.

Current Implementation Notes (2.2)

  • cytoscape.graph.dynamic.util.DynamicGraphRepresentation.nodeCreate() creates a unique integer node ID.

  • cytoscape.giny.CyNodeDepot creates new CyNode using nodeID obtained from cytoscape.graph.dynamic.util.DynamicGraphRepresentation.nodeCreate()

  • Nodes are created/retrieved through Cytoscape.getCyNode(), using cytoscape.giny.CytoscapeFingRootGraph (which extends fing.model.FRootGraph). cytoscape.giny.CytoscapeFingRootGraph handles the mapping of node identifiers as strings to node IDs as integers via com.sosnoski.util.hashmap.StringIntHashMap() (it translates the string to an ID before forwarding the request to fing.model.FRootGraph).

  • Cytoscape.getCyNode() is called from:
    • cytoscape.data.readers.GMLReader.read()
    • cytoscape.data.readers.GMLReader2.createGraph()
    • cytoscape.data.readers.InteractionsReader.createRootGraphFromInteractionData()

    • cytoscape.editor.CytoscapeEditorManager.addNode()
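
The string-to-integer translation described in these notes can be sketched roughly as follows. This is not the CytoscapeFingRootGraph code: java.util.HashMap stands in for StringIntHashMap, and the class name, the index numbering, and the -1 "not found" value are assumptions made for illustration. The get-or-create flag mirrors the getCyNode(string alias, boolean create) signature quoted in the proposal.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a string identifier -> integer node index table. */
final class StringToIndexSketch {
    private final Map<String, Integer> indexByIdentifier = new HashMap<>();
    private int nextIndex = 0; // assumption: the real index numbering may differ

    /**
     * Returns the index for the identifier. If it is unknown, a new index is
     * generated when create is true; otherwise -1 is returned (assumed sentinel).
     */
    int getNodeIndex(String identifier, boolean create) {
        Integer index = indexByIdentifier.get(identifier);
        if (index != null) {
            return index;
        }
        if (!create) {
            return -1;
        }
        int created = nextIndex++;
        indexByIdentifier.put(identifier, created);
        return created;
    }

    public static void main(String[] args) {
        StringToIndexSketch table = new StringToIndexSketch();
        System.out.println(table.getNodeIndex("geneA", true));  // creates a node
        System.out.println(table.getNodeIndex("geneA", true));  // same index again
        System.out.println(table.getNodeIndex("geneB", false)); // unknown, not created
    }
}
```

Because one call both looks up and creates, a second request for an existing string silently returns the existing node. That is the behavior the proposal separates into createCyNode(..) and getCyNode(string uid).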

Implementation Plan

  • Resolve open issues within RFC/clarify specification. Code analysis to support final RFC proposal.

CyNode_Identification (last edited 2009-02-12 01:03:31 by localhost)

Funding for Cytoscape is provided by a federal grant from the U.S. National Institute of General Medical Sciences (NIGMS) of the National Institutes of Health (NIH) under award number GM070743-01. Corporate funding is provided through a contract from Unilever PLC.
