Diff for "DataIntegration"

Differences between revisions 8 and 10 (spanning 2 versions)

RFC Name : Cytoscape Data Integration

Editor(s): Sarah Killcoyne, Alex Pico

Status: Open for Comments

Proposal

Improve Cytoscape's data integration by creating a set of APIs for generic services that would be useful to users and plugin developers. Create a registry and an associated service taxonomy so that users can search for and choose the service they want to use. Add a description of the services and APIs to the wiki or manual for others to use when creating a new service.

Proposed Services

Synonym mapping

Users request this frequently. If Cytoscape provides the connectivity and the API for such a service, others could create services that can be registered and discovered within Cytoscape, allowing a user to pick the service they are interested in using.
This would effectively deal with the requests we often get for a service based on the organism a researcher is working with (e.g. Yeast).
Particularly with regard to protein ID services, I think this is entirely doable: T1Dbase has put together a database that maps between various gene and protein IDs. I think this would break down into gene-gene, gene-protein and protein-protein mapping services.
Description of current ISB services: attachment:ISB_SynonymService.doc
Mammalian WSDL: attachment:ISB_Mammalian_WSDL.xml
Use cases specific to the service from Bader lab [http://baderlab.org/IdentifierMapping Bader Lab ID Mapping page]:
- Identifier translation. Some analysis methods require specific translations from one set of identifiers to another. For instance, our ‘activity centers’ analysis requires translation from protein or gene identifiers in a pathway database to Affymetrix probe set identifiers or other gene expression array platform identifiers. Finding a favorite gene name (e.g. Hugo or even lab-specific) is another type of identifier translation.
- Unification during dataset merging. During a merge operation e.g. of two protein-protein interaction datasets from independently created databases, it is vital to recognize that two protein objects, one from each data source, represent the same protein molecule, even if the protein objects don’t share any database accession numbers. Unification requires knowledge of record type e.g. you cannot reliably use a gene ID to unify proteins (mostly because splice variants exist).
- Link out to related references. When presenting information about a protein to a user on a web page, it is useful to display links to related information about the protein, such as further information about the protein sequence and sequence feature annotations (e.g. in UniProt), Gene Ontology annotations, domain annotations (InterPro), etc.
- Translating via orthology. Special case of identifier translation between species via orthology links. See HomoloGene.
Example mapping plugin:

attachment:ISB_idmapper2.png
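
The identifier translation use case could be sketched as follows. This is only a minimal illustration under stated assumptions: `SynonymService`, `add_mapping` and `translate` are hypothetical names, the lookup table is held in memory rather than served over a web service, and the identifiers are placeholders rather than real database records. It is not the ISB or Bader lab implementation.

```python
from dataclasses import dataclass, field


@dataclass
class SynonymService:
    """Hypothetical in-memory stand-in for a remote synonym mapping service."""

    organism: str
    # Maps (source_type, source_id) -> {target_type: [target_ids]}.
    table: dict = field(default_factory=dict)

    def add_mapping(self, source_type, source_id, target_type, target_id):
        """Record that source_id (of source_type) maps to target_id (of target_type)."""
        targets = self.table.setdefault((source_type, source_id), {})
        targets.setdefault(target_type, []).append(target_id)

    def translate(self, source_type, source_ids, target_type):
        """Translate each source ID; unknown IDs map to an empty list."""
        return {
            sid: list(self.table.get((source_type, sid), {}).get(target_type, []))
            for sid in source_ids
        }


# Placeholder identifiers (assumptions for illustration only).
service = SynonymService(organism="yeast")
service.add_mapping("gene", "GENE_A", "protein", "PROT_A1")
service.add_mapping("gene", "GENE_A", "protein", "PROT_A2")
result = service.translate("gene", ["GENE_A", "GENE_B"], "protein")
print(result)
```

Returning a list per identifier reflects the point made under unification: one gene ID can correspond to several proteins (for example splice variants), so a gene-protein mapping is not one-to-one.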

Network creation

Several plugins already do this on their own: cPath, AgilentLiteratureSearch, BioNetBuilder. If a common API could be agreed on for requesting and receiving network information, the rest of the work could continue to be done by the respective servers instead of expecting plugins to build the networks as well.
This would allow groups that currently offer their own data (e.g. Reactome) via web start to easily connect with Cytoscape users without having to create multiple web starts.
Use cases specific to the service:
- Search for a network based on a set of identifiers or terms
- Select the data to search, e.g. Reactome, Agilent or cPath
- Retrieve network with annotation information. The data returned should be able to encompass annotation/expression data along with the interaction data
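
The three use cases above could be sketched as a single request/response pair. The names `NetworkRequest`, `Network` and `build_network` are hypothetical, the interaction rows are placeholder data, and the filtering here runs locally only to show the shape of the exchange; under the proposal the respective servers would do this work.

```python
from dataclasses import dataclass, field


@dataclass
class NetworkRequest:
    identifiers: list  # identifiers or terms to search for
    sources: list      # data sources to search, e.g. "Reactome"


@dataclass
class Network:
    nodes: dict = field(default_factory=dict)  # node id -> annotation dict
    edges: list = field(default_factory=list)  # (node_a, node_b, interaction type)


def build_network(request, interactions, annotations):
    """Keep interactions from the selected sources that touch a requested identifier."""
    wanted = set(request.identifiers)
    network = Network()
    for source_name, a, b, kind in interactions:
        if source_name not in request.sources:
            continue
        if a in wanted or b in wanted:
            network.edges.append((a, b, kind))
            for node in (a, b):
                # Attach annotation/expression data where available.
                network.nodes.setdefault(node, dict(annotations.get(node, {})))
    return network


# Placeholder rows: (source, node_a, node_b, interaction type).
interactions = [
    ("Reactome", "P1", "P2", "binds"),
    ("cPath", "P2", "P3", "binds"),
    ("Agilent", "P1", "P4", "cooccurrence"),
]
annotations = {"P1": {"expression": 2.5}}
request = NetworkRequest(identifiers=["P1"], sources=["Reactome", "cPath"])
network = build_network(request, interactions, annotations)
print(network.edges, network.nodes)
```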

Others?

Any data tool that can be generic enough for a single API (WSDL)

Requirements

A WSDL to be agreed upon for each service (see attached examples) as well as a standard representation (XML) for each requested object. REST may be the easiest to support and for others to implement, but either SOAP or REST will use a WSDL. We could use a wiki to edit and maintain the WSDL over time (though this is in essence changing the API). These services would be tested for interoperability (e.g. document/literal WS-I Basic Profile).
A taxonomy to describe services, with a manager to handle them (see registry). At the ISB we use three taxonomies (administrative, functional and descriptive) to describe different facets of a service.
A registry to allow for search and discovery of services. This will give users the flexibility to choose the service they wish to use (e.g. a yeast group will prefer to use the yeast synonym mapping service). See the following:
- S-Moby [attach the S-Moby PDF]
- UDDI [attach PDF from Oasis]
  - ISB UDDI browser plugin:

attachment:ISB_uddibrowser.png
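
How a registry and taxonomy might work together can be sketched as below. This is not UDDI or S-Moby; `ServiceEntry`, `ServiceRegistry` and `find` are hypothetical names, the endpoints are placeholders, and the terms filed under each of the three facets are assumptions, since the RFC does not define what each taxonomy contains.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceEntry:
    name: str
    endpoint: str
    # facet name -> set of terms; the facet names follow the three ISB taxonomies.
    taxonomy: dict = field(default_factory=dict)


class ServiceRegistry:
    """Hypothetical in-memory registry searched by taxonomy facet."""

    def __init__(self):
        self._entries = []

    def register(self, entry):
        self._entries.append(entry)

    def find(self, **facets):
        """Return names of entries that carry every requested facet term."""
        return [
            entry.name
            for entry in self._entries
            if all(
                term in entry.taxonomy.get(facet, set())
                for facet, term in facets.items()
            )
        ]


registry = ServiceRegistry()
registry.register(ServiceEntry(
    name="yeast-synonyms",
    endpoint="http://example.org/yeast",  # placeholder URL
    taxonomy={
        "administrative": {"ISB"},
        "functional": {"synonym mapping"},
        "descriptive": {"yeast"},
    },
))
registry.register(ServiceEntry(
    name="mammalian-synonyms",
    endpoint="http://example.org/mammalian",  # placeholder URL
    taxonomy={
        "administrative": {"ISB"},
        "functional": {"synonym mapping"},
        "descriptive": {"mammalian"},
    },
))
matches = registry.find(functional="synonym mapping", descriptive="yeast")
print(matches)
```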

Implementation Plan

These steps would be required for each service. The steps outlined below are specific to the Synonym Service because it is the first one we would like to implement. All time estimates are for 1 FTE. Time estimates will vary for the other services, though after implementing one, the others should be simpler.

Create WSDL for each service (start with id mapping as ISB and UCSF/Gladstone have working ones already)
1. WSDL basically defines the API of a service
2. Time: 2 weeks
3. Deliverable - WSDL
Create taxonomy to describe synonym services
1. Time: 1 week
2. Deliverable - Service taxonomy
Create at least one service that implements the WSDL and taxonomy.
1. ISB has a prototype in place that could be made public
2. UCSF/Gladstone has ID mapping services it would like to reimplement as a web service
3. Time: 2 weeks if we adopt ours or another existing service, 16 weeks to create one otherwise
4. Deliverable - Usable services
Choose and implement registry to locate services (see ISB’s UDDI browser plugin screenshots)
1. Third-party software for this is already available and is used in the ISB browser
2. Time: 2 weeks
3. Deliverable - registry search backend
Implement UI
1. Time: 1 week
2. Deliverable - UI
Document WSDL, taxonomy and registry for other servers, unit tests
1. Contact others who might be interested in implementing the service?
2. Time: 3 weeks
3. Deliverable - documentation and unit tests
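
Summing the per-step estimates above gives a rough total for the Synonym Service. The sketch assumes the steps run one after another with 1 FTE; the plan does not say whether any of them could overlap.

```python
# Weeks per step, taken from the estimates listed above (1 FTE).
fixed_weeks = {
    "WSDL": 2,
    "taxonomy": 1,
    "registry": 2,
    "UI": 1,
    "documentation and unit tests": 3,
}
# The service step depends on whether an existing service is adopted.
service_weeks = {"adopt existing": 2, "create new": 16}

totals = {
    option: sum(fixed_weeks.values()) + weeks
    for option, weeks in service_weeks.items()
}
print(totals)  # {'adopt existing': 11, 'create new': 25}
```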

Comments

How to Comment

Edit the page and add your comments under the provided header. By adding your ideas to the Wiki directly, we can more easily organize everyone's ideas, and keep clear records. Be sure to include today's date and your name for each comment. Try to keep your comments as concrete and constructive as possible. For example, if you find a part of the RFC makes no sense, please say so, but don't stop there. Take the extra step and propose alternatives.

-  ← Revision 8 as of 2007-07-19 17:58:59 →
  Size: 7792
  Editor: pix39
  Comment:
+  ← Revision 10 as of 2007-07-20 18:18:31 →
  Size: 7616
  Editor: AlexPico
  Comment:
 Line 5:
-|| '''RFC Name''' : Cytoscape Data Integration || '''Editor(s)''': Sarah Killcoyne, Alex Pico ||
+##Fill in RFC Name, primary authors and editors, and the status of the RFC

##Example Status states include: Work in Progress, Open for Comment, Closed



|| '''RFC Name''' : Cytoscape Data Integration || '''Editor(s)''': Sarah Killcoyne, Alex Pico || '''Status''': Open for Comments ||
-Line 8:
+Line 11:
-== About this document ==



This is an official Request for Comment (RFC) for '''Add your text here'''.



For details on RFCs in general, check out the [http://www.answers.com/main/ntquery?method=4&dsid=2222&dekey=Request+for+Comments&gwp=8&curtab=2222_1&linktext=Request%20for%20Comments Wikipedia Entry:  Request for Comments (RFCs)]



== Status ==



July 19, 2007 In progress, open for comments



== How to Comment ==



To view/add comments, click on any of 'Comment' links below.  By adding your ideas to the Wiki directly, we can more easily organize everyone's ideas, and keep clear records.  Be sure to include today's date and your name for each comment.  Here is an example to get things started:  ["/Comment"].



'''Try to keep your comments as concrete and constructive as possible.  For example, if you find a part of the RFC makes no sense, please say so, but don't stop there.  Take the extra step and propose alternatives.'''
-Line 100:
+Line 87:
+##If you want to create a separate subpage for Comments, then provide this link:  ["/Comment"]



=== How to Comment ===

Edit the page and add your comments under the provided header. By adding your ideas to the Wiki directly, we can more easily organize everyone's ideas, and keep clear records.  Be sure to include today's date and your name for each comment.  '''Try to keep your comments as concrete and constructive as possible.  For example, if you find a part of the RFC makes no sense, please say so, but don't stop there.  Take the extra step and propose alternatives.'''