Differences between revisions 8 and 11 (spanning 3 versions)
Revision 8 as of 2006-02-08 22:30:50
Size: 6440
Editor: mskresolve-b
Comment:
Revision 11 as of 2006-02-09 02:31:47
Size: 7660
Editor: GaryBader
Comment:
Line 43: Line 43:
'''Graph Editing:'''

 * Ability to manipulate nodes that have not been labeled.
 * Ability to have two or more unique nodes with the same label.
 * User manipulates a node that has not been labeled e.g. when first created in the editor. (We need this now in the graph editor)
 * User views two or more nodes with the same unique identifier. (not yet implemented, but desired)
 * User wants to map nodes defined using UniProt IDs to nodes defined using Affymetrix IDs. (requires separation of externally defined node IDs from the unique CyNode IDs)
Line 53: Line 52:
 * Cytoscape nodes have a unique identifier (ID).
 * Cytoscape nodes have a standard way of attaching a string label.
 * Cytoscape must be able to import from and export to SIF and GML.
Cytoscape needs to have:
 * nodes with a unique identifier (ID) that is unique for a session. It does not have to be globally unique or persist across sessions.
 * a standard way of attaching a string label to a CyNode that can be used to store e.g. the gene name
 * separation of the label from the unique identifier.
 * ability to import from and export to SIF, GML, node/edge attributes, gene expression data files.
Line 60: Line 61:
 * CyNetwork: Node ID in CyNode is a '''unique string''' (a GINY root graph index '''unique integer''' is also maintained and can be accessed by users). Semantics: the unique string is often expected to encode a gene name.  Gene names are known not to be unique. '''This is a semantic conflict'''.
 * CyAttributes: A MultiHashMap using a '''unique string''' as a key
 * SIF: Node ID is a '''unique string''', often expected to encode a gene name. This is mapped to the CyNode unique string ID. '''This is a semantic conflict'''.
 * GML: Node ID is a '''unique integer'''. A label attribute is available per node that is a string (does not have to be unique). This string label is mapped to CyNode unique string ID. '''This is a datatype conflict'''.
 * Node attributes file format: same semantics as SIF.
 * Expression data file format: same semantics as SIF, but more often used to store expression IDs, like Affymetrix probeset IDs, while SIF often stored gene names. '''SIF and expression data IDs are difficult to map to each other.'''
 * Attribute browser: currently displays CyNode string ID, canonical name, common name, aliases (these are often the same string - '''duplicated data''')
 * Merge plugin: merges based on CyNode string ID. This only works for networks where CyNode string IDs are part of the same 'namespace'.
 * CyNetwork: Node ID in CyNode is a ''unique string'' (a GINY root graph index ''unique integer'' is also maintained and can be accessed by users). Semantics: the unique string is often expected to encode a gene name. '''Problems: two unique IDs are maintained (confusing for the developer), the unique string is used for gene names, which are not unique'''.
 * CyAttributes: A MultiHashMap using a ''unique string'' as a key
 * SIF: Node ID is a ''unique string'', often expected to encode a gene name. This is mapped to the CyNode unique string ID. '''Problem: user can define what the unique CyNode identifier is, thus the CyNode ID assumes different semantics depending on the user, which is a problem for graph merging'''.
 * GML: Node ID is a ''unique integer''. A label attribute is available per node that is a string (does not have to be unique). This string label is mapped to CyNode unique string ID. '''Problem: a non-unique label is mapped to a unique string leading to potential conflicts'''.
 * Node attributes file format: Node ID is a ''unique string'', often expected to encode a gene name. This is mapped to the CyNode unique string ID. '''Problem: user can define what the unique CyNode identifier is, thus the CyNode ID assumes different semantics depending on the user, which is a problem for matching node IDs in node attribute files to nodes defined in SIF files'''.
 * Expression data file format: same semantics as SIF, but more often used to store expression IDs, like Affymetrix probeset IDs, while SIF often stores gene names. '''Problem: SIF and expression data IDs are often difficult to map to each other.'''
 * Attribute browser: currently displays CyNode string ID, canonical name, common name, aliases (these are often the same string - '''Problems: duplicated data, confusing for the user''')
 * Merge plugin: merges based on CyNode unique string ID. '''Problem: this only works for networks where CyNode string IDs can be sensibly matched. If this is not the case, distinct nodes could be mistakenly merged'''. (TODO: need to check this with Ryan/Trey)
Line 69: Line 70:
 * BioDataServer synonym table: Was formerly used for mapping IDs e.g. between SIF and expression data IDs, but is currently part of the code, but non-functional.
 * BioDataServer synonym table: Was formerly used for mapping IDs e.g. between SIF and expression data IDs, but is currently part of the code. '''Problem: synonym mapping is currently non-functional'''.
 * Visual mapper: Uses nodeAttributes.getCanonicalName(node) to identify nodes.
Line 72: Line 74:

'''Unique ID generation:'''
 * Should ID's be unique across Cytoscape community (say if stored in SIF file) ?
 * Should this unique id be stored/represented internally as a String ?
Line 107: Line 105:
attachment:CyNodeObjectModel.png

RFC Name : CyNode identifier

Editor(s): Ben Gross


About this document

This is an official Request for Comment (RFC) for CyNode Identification

For details on RFCs in general, check out the [http://www.answers.com/main/ntquery?method=4&dsid=2222&dekey=Request+for+Comments&gwp=8&curtab=2222_1&linktext=Request%20for%20Comments Wikipedia Entry: Request for Comments (RFCs)]

Status

This RFC is still under construction and open for public comment. (01/17/06 -Ben)

How to Comment

To view or add comments, click on any of the 'Comment' links below. By adding your ideas to the Wiki directly, we can more easily organize everyone's ideas and keep clear records. Be sure to include today's date and your name with each comment. Here is an example to get things started: ["/Comment"].

Try to keep your comments as concrete and constructive as possible. For example, if you find a part of the RFC makes no sense, please say so, but don't stop there. Take the extra step and propose alternatives.

Proposal

Switch to a numbered node system instead of the current system, in which the node ID is a string. There should be a clear distinction between a node's ID and its label.

  • Remove m_identifier string from CyNode

  • CyNode unique ID string is generated by CyNode

  • User gets unique ID string for CyNode from CyNode

  • Split Cytoscape.getCyNode(String alias, boolean create) into createCyNode(..) and getCyNode(String uid)
  • Add LABEL attribute as a default node attribute
  • CyAttributes is keyed using unique ID String generated by respective CyNode

  • Any method that takes a string id to identify a CyNode should instead take a CyNode

  • remove CANONICAL_NAME and COMMON_NAME ["/Comment"]
  • only use graph objects, not node indices, as parameters to methods, e.g. int[] getAdjacentEdges(Node node);
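
The bullets above can be sketched roughly as follows. This is a minimal illustration, not Cytoscape code: SketchNode, SketchNodeRegistry, the "n" + counter ID scheme, and all method names are hypothetical stand-ins for CyNode, Cytoscape, and CyAttributes. It assumes the ID only has to be unique within one registry (one session).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for CyNode: the ID is generated for the node, never chosen by the user.
final class SketchNode {
    private final String uid;

    SketchNode(String uid) { this.uid = uid; }

    String getUid() { return uid; }
}

// Hypothetical stand-in for the Cytoscape/CyAttributes pair described in the proposal.
final class SketchNodeRegistry {
    static final String LABEL = "LABEL"; // default node attribute named in the proposal

    private final AtomicInteger counter = new AtomicInteger();
    private final Map<String, SketchNode> nodesByUid = new HashMap<>();
    // Attributes keyed by the generated uid: uid -> (attribute name -> value).
    private final Map<String, Map<String, String>> attributes = new HashMap<>();

    // Creation half of the split: always makes a new node with a fresh ID.
    SketchNode createNode() {
        String uid = "n" + counter.incrementAndGet(); // assumed ID scheme
        SketchNode node = new SketchNode(uid);
        nodesByUid.put(uid, node);
        attributes.put(uid, new HashMap<>());
        return node;
    }

    // Lookup half of the split: never creates; returns null for an unknown uid.
    SketchNode getNode(String uid) { return nodesByUid.get(uid); }

    // Methods take the node object rather than a string identifier.
    void setLabel(SketchNode node, String label) {
        attributes.get(node.getUid()).put(LABEL, label);
    }

    String getLabel(SketchNode node) {
        return attributes.get(node.getUid()).get(LABEL);
    }
}
```

In this sketch a freshly created node has no label at all, and two nodes may carry the same label without their IDs clashing, which is the separation the proposal asks for.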

Biological Questions / Use Cases

  • User manipulates a node that has not been labeled e.g. when first created in the editor. (We need this now in the graph editor)
  • User views two or more nodes with the same unique identifier. (not yet implemented, but desired)
  • User wants to map nodes defined using UniProt IDs to nodes defined using Affymetrix IDs. (requires separation of externally defined node IDs from the unique CyNode IDs)
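
The third use case could be served along these lines once external IDs are kept apart from the internal node ID. The sketch is an assumption-laden illustration: ExternalIdIndex, IdMappingSketch, the namespace string "uniprot", and the probe-to-UniProt table are all hypothetical, and any IDs passed in are placeholders supplied by the caller.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical index from externally defined IDs to internal node uids.
final class ExternalIdIndex {
    // namespace -> (external ID -> internal node uid)
    private final Map<String, Map<String, String>> index = new HashMap<>();

    void register(String namespace, String externalId, String nodeUid) {
        index.computeIfAbsent(namespace, k -> new HashMap<>()).put(externalId, nodeUid);
    }

    // Returns the internal uid, or null when the external ID is unknown in that namespace.
    String lookup(String namespace, String externalId) {
        Map<String, String> byId = index.get(namespace);
        return byId == null ? null : byId.get(externalId);
    }
}

final class IdMappingSketch {
    // Resolve an expression-data row keyed by a probeset ID to the node that was
    // defined with a UniProt ID, using a caller-supplied synonym table.
    static String nodeForProbe(String probeId,
                               Map<String, String> probeToUniprot,
                               ExternalIdIndex index) {
        String uniprotId = probeToUniprot.get(probeId);
        return uniprotId == null ? null : index.lookup("uniprot", uniprotId);
    }
}
```

Because the internal uid never doubles as a UniProt or Affymetrix ID here, the same node can be reached from either namespace without renaming it.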

General Notes

In this RFC, the term "label" or "node label" has been used in place of the more historic term "name" or "node name".

Requirements

Cytoscape needs to have:

  • nodes with a unique identifier (ID) that is unique for a session. It does not have to be globally unique or persist across sessions.
  • a standard way of attaching a string label to a CyNode that can be used to store e.g. the gene name

  • separation of the label from the unique identifier.
  • ability to import from and export to SIF, GML, node/edge attributes, gene expression data files.

Analysis

Cytoscape subsystems and their node identifier semantics

  • CyNetwork: Node ID in CyNode is a unique string (a GINY root graph index unique integer is also maintained and can be accessed by users). Semantics: the unique string is often expected to encode a gene name. Problems: two unique IDs are maintained (confusing for the developer), the unique string is used for gene names, which are not unique.

  • CyAttributes: A MultiHashMap using a unique string as a key

  • SIF: Node ID is a unique string, often expected to encode a gene name. This is mapped to the CyNode unique string ID. Problem: user can define what the unique CyNode identifier is, thus the CyNode ID assumes different semantics depending on the user, which is a problem for graph merging.

  • GML: Node ID is a unique integer. A label attribute is available per node that is a string (does not have to be unique). This string label is mapped to CyNode unique string ID. Problem: a non-unique label is mapped to a unique string leading to potential conflicts.

  • Node attributes file format: Node ID is a unique string, often expected to encode a gene name. This is mapped to the CyNode unique string ID. Problem: user can define what the unique CyNode identifier is, thus the CyNode ID assumes different semantics depending on the user, which is a problem for matching node IDs in node attribute files to nodes defined in SIF files.

  • Expression data file format: same semantics as SIF, but more often used to store expression IDs, like Affymetrix probeset IDs, while SIF often stores gene names. Problem: SIF and expression data IDs are often difficult to map to each other.

  • Attribute browser: currently displays CyNode string ID, canonical name, common name, aliases (these are often the same string). Problems: duplicated data, confusing for the user.

  • Merge plugin: merges based on CyNode unique string ID. Problem: this only works for networks where CyNode string IDs can be sensibly matched. If this is not the case, distinct nodes could be mistakenly merged. (TODO: need to check this with Ryan/Trey)

  • CyEdge: uses 2 CyNode string IDs as part of its ID.

  • BioDataServer synonym table: Was formerly used for mapping IDs, e.g. between SIF and expression data IDs, and is still part of the code. Problem: synonym mapping is currently non-functional.

  • Visual mapper: Uses nodeAttributes.getCanonicalName(node) to identify nodes.
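
The GML problem in the list above (a non-unique label mapped to a unique string) can be illustrated with a small sketch. It does not reproduce the actual GML reader; LabelCollisionSketch and its methods are hypothetical and only show the general failure mode of keying nodes by label.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not the actual GML reader logic.
final class LabelCollisionSketch {

    // Key each GML node by its label, as happens when the label becomes the node's string ID.
    static int countKeyedByLabel(int[] gmlIds, String[] labels) {
        Map<String, Integer> byLabel = new HashMap<>();
        for (int i = 0; i < gmlIds.length; i++) {
            byLabel.put(labels[i], gmlIds[i]); // a repeated label overwrites the earlier entry
        }
        return byLabel.size();
    }

    // Key each GML node by its own integer ID and keep the label as plain data.
    static int countKeyedById(int[] gmlIds, String[] labels) {
        Map<Integer, String> byId = new HashMap<>();
        for (int i = 0; i < gmlIds.length; i++) {
            byId.put(gmlIds[i], labels[i]);
        }
        return byId.size();
    }
}
```

With three GML nodes of which two share a label, keying by label leaves two distinct entries while keying by ID keeps all three, so distinct nodes collapse as soon as the label is treated as the identifier.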

Open Issues

SIF File Format:

  • What new information (if any) has to be written to SIF ?
  • SIF reader has to ensure that there are no duplicate labels.
  • SIF reuses a label if it can find one.
  • What effect does this have on new Cytoscape session saving subsystem ?

Edges:

  • Should edges have an ID, or is a nodeID-edgeType-nodeID good enough to uniquely identify edges?
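
One way to probe this question is to build the composite key and see when it stops being unique. The sketch below is hypothetical: EdgeKeySketch and the "source (type) target" string layout are assumptions chosen for illustration, not a statement of how CyEdge builds its ID.

```java
// Hypothetical composite edge key built from nodeID-edgeType-nodeID.
final class EdgeKeySketch {

    // The exact string layout is an assumption; only the three components matter.
    static String compositeKey(String sourceId, String edgeType, String targetId) {
        return sourceId + " (" + edgeType + ") " + targetId;
    }
}
```

Two edges of the same type between the same pair of nodes produce identical keys under this scheme, so the composite is sufficient only if such parallel edges never need to be told apart; a generated edge ID would be needed otherwise.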

GraphMerge

  • Does graph merge depend on a unique string ID? Does it assume all identical nodes have the same root graph ID?

Backward Compatibility

Importing/Exporting:

  • Ok to import old attributes file.
  • Exporting should use new format.

Current Implementation Notes (2.2)

  • cytoscape.graph.dynamic.util.DynamicGraphRepresentation.nodeCreate() creates a unique node id as integer.

  • cytoscape.giny.CyNodeDepot creates new CyNode using nodeID obtained from cytoscape.graph.dynamic.util.DynamicGraphRepresentation.nodeCreate()

  • nodes are created/retrieved through Cytoscape.getCyNode(), using cytoscape.giny.CytoscapeFingRootGraph (which extends fing.model.FRootGraph). cytoscape.giny.CytoscapeFingRootGraph handles mapping of node identifiers as strings to node IDs as integers via com.sosnoski.util.hashmap.StringIntHashMap() (it translates the string to an ID before forwarding the request to fing.model.FRootGraph).

  • Cytoscape.getCyNode() is called from:
    • cytoscape.data.readers.GMLReader.read()
    • cytoscape.data.readers.GMLReader2.createGraph()
    • cytoscape.data.readers.InteractionsReader.createRootGraphFromInteractionData()

    • cytoscape.editor.CytoscapeEditorManager.addNode()
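
The string-to-integer lookup described in these notes can be approximated as below. This is a simplified sketch, not the 2.2 source: StringKeyedRootGraphSketch is hypothetical, java.util.HashMap stands in for StringIntHashMap, the counter stands in for nodeCreate(), and the NOT_FOUND sentinel is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of a root graph that accepts string identifiers.
final class StringKeyedRootGraphSketch {
    static final int NOT_FOUND = Integer.MIN_VALUE; // assumed sentinel for a failed lookup

    private final Map<String, Integer> idByIdentifier = new HashMap<>();
    private int nextId = 0;

    // Stand-in for the integer ID handed out by the underlying graph representation.
    private int nodeCreate() {
        return nextId++;
    }

    // Combined get-or-create: the string is translated to an integer ID first,
    // and a new node is created only when it is unknown and create is true.
    int getNode(String identifier, boolean create) {
        Integer existing = idByIdentifier.get(identifier);
        if (existing != null) {
            return existing;
        }
        if (!create) {
            return NOT_FOUND;
        }
        int id = nodeCreate();
        idByIdentifier.put(identifier, id);
        return id;
    }
}
```

The sketch makes the coupling visible: whatever string a reader passes in (a SIF name, a GML label) decides whether an existing node is reused or a new one is created.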

attachment:CyNodeObjectModel.png

Implementation Plan

  • Resolve open issues within RFC/clarify specification. Code analysis to support final RFC proposal.

CyNode_Identification (last edited 2009-02-12 01:03:31 by localhost)

Funding for Cytoscape is provided by a federal grant from the U.S. National Institute of General Medical Sciences (NIGMS) of the National Institutes of Health (NIH) under award number GM070743-01. Corporate funding is provided through a contract from Unilever PLC.
