Outdated_Cytoscape_3.0/IODiscussions

Cytoscape 3 Data I/O Layer

(Under Construction)

Data Sources

Files
- Should be URI-based.
Web Service
- All wrapped services (in 2.6, they are called CyWebServiceClient) will be registered with the OSGi Service Registry.
- WebServiceManager will be an intermediate service broker for other bundles (i.e., plugins).
- For more detail, please read this section.
Database
- Should support round-tripping between the Network/Attribute Model (objects) and major relational databases (MySQL/Derby/PostgreSQL)
- O/R Mapper will be used (Hibernate?)

Cytoscape as a Data Source (Server Version)

Use the Cytoscape Data Model as its backend
Publish as web services

Use Cases / Requirements

Read and write networks as XGMML/SIF/GML/PSI-MI (TAB/XML)/SBML.
Read and write attributes as part of XGMML.
The ImportHandler has worked really well in the 2.x versions, so we should keep that pattern in mind when developing 3.0.
Instead of relying on a homegrown ImportHandler, we should leverage OSGi and the service registry. This means that all IO interaction will be through interfaces.
We'll probably still need something like CyFileFilter to define and distinguish file types, but instead of returning a GraphReader, it should probably return an extension of a Reader interface. The Reader interface could then be extended to return whatever sort of data we need.
Design general interfaces that support exporting different aspects of the Cytoscape system. There should be mirror Import and Export systems. It isn't necessary that every import/export file type supports its opposite operation, but it would be nice in cases where it makes sense (e.g. SBML).
- Export just network topology.
- Export network topology AND graphical information.
- Export attribute data.
- Export images of networks.
- Export Cytoscape session files.

Open Issues

Support for other popular network file formats
- GraphML
- Binary Matrix (sometimes called
- DOT (used by Graphviz)
- NET (used by Pajek)
- Edge List (simple 2-column text file supported by igraph and many other applications)
- GXL
- RDF (we should have a general RDF parser)

Design Ideas

Import

The concept is to compose Reader objects using different, independent interfaces that define what gets read. The benefit of this approach is that different Readers can be tailored to work for precisely the right kind of data.

There would be a core interface that would trigger the read (of all data types) and possibly specify the input source.

import java.net.URI;

public interface CyReader {
   public void read();
   public void setInput(URI uri);  // Or similar; there just needs to be a general way to specify the input.
}

The CyReader object could then be supplemented by one or more interfaces that identify what the CyReader will read. The return values of methods indicate what sort of objects will be produced from a given file. If the CyReader is capable of producing a CyNetwork from a given type of input, then it should implement an interface like:

import java.util.List;

public interface CyNetworkProducer {
   public List<CyNetwork> getReadNetworks();
}

A network reader service would then implement both the CyReader and the CyNetworkProducer interfaces. The getReadNetworks() method could then be called after the read() method has completed.
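
As a rough illustration of the paragraph above, here is a minimal sketch of a reader that implements both interfaces. Everything beyond the two interface signatures is an assumption made for the example: the CyNetwork class is a stand-in that only stores edges, SifLikeReader is a hypothetical name, and the input is simplified to whitespace-separated "source interaction target" lines rather than the full SIF format. I/O failures are wrapped in an unchecked exception so that read() keeps the signature proposed above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Stand-in for the real network model (hypothetical, illustration only).
class CyNetwork {
    // Each entry is {source, interaction, target}.
    final List<String[]> edges = new ArrayList<>();
}

interface CyReader {
    void read();
    void setInput(URI uri);
}

interface CyNetworkProducer {
    List<CyNetwork> getReadNetworks();
}

// Hypothetical reader for simplified "source interaction target" lines.
class SifLikeReader implements CyReader, CyNetworkProducer {
    private URI input;
    private final List<CyNetwork> networks = new ArrayList<>();

    @Override
    public void setInput(URI uri) {
        this.input = uri;
    }

    @Override
    public void read() {
        if (input == null) {
            throw new IllegalStateException("setInput() must be called before read()");
        }
        CyNetwork network = new CyNetwork();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(input.toURL().openStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                String[] parts = line.trim().split("\\s+");
                // Lines that do not have exactly three fields are skipped.
                if (parts.length == 3) {
                    network.edges.add(parts);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        networks.add(network);
    }

    // Only meaningful after read() has completed.
    @Override
    public List<CyNetwork> getReadNetworks() {
        return networks;
    }
}
```

A caller would invoke setInput(), then read(), then getReadNetworks(), in that order.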

If the input data contains attribute data in addition to network topology data that we'd like to read, then we could provide an additional interface for the Reader to implement:

import java.util.List;

public interface CyDataTableProducer {
   public List<CyDataTable> getReadCyDataTables();
}

This separation between a CyNetworkProducer and a CyDataTableProducer is useful because a SIF file doesn't contain any additional attribute data, so it wouldn't make sense for a SIF reader to implement a method like getReadCyDataTables(). However, such a method would be necessary for an XGMML reader.

This approach is extremely flexible because it allows Reader objects to be composed of several different interfaces. If we ever introduce a new data type (e.g. reading a background image) then we would create a new interface to support that data type. Existing readers that can supply the new data type would then implement the new interface and its method; all other readers would remain unchanged.
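
One possible way for calling code to take advantage of this composition is to check which producer interfaces a reader implements after read() has run. The sketch below is only an assumption about how that might look: ImportResult, ImportRunner and the two stub readers are hypothetical names, the model classes are empty stand-ins, and setInput(URI) is left out of CyReader for brevity.

```java
import java.util.ArrayList;
import java.util.List;

// Empty stand-ins for the real model classes (hypothetical).
class CyNetwork {}
class CyDataTable {}

interface CyReader { void read(); }  // setInput(URI) omitted for brevity
interface CyNetworkProducer { List<CyNetwork> getReadNetworks(); }
interface CyDataTableProducer { List<CyDataTable> getReadCyDataTables(); }

// Hypothetical container for whatever a reader was able to produce.
class ImportResult {
    final List<CyNetwork> networks = new ArrayList<>();
    final List<CyDataTable> tables = new ArrayList<>();
}

class ImportRunner {
    // Runs the reader, then collects only the data types it declares it can produce.
    static ImportResult run(CyReader reader) {
        reader.read();
        ImportResult result = new ImportResult();
        if (reader instanceof CyNetworkProducer) {
            result.networks.addAll(((CyNetworkProducer) reader).getReadNetworks());
        }
        if (reader instanceof CyDataTableProducer) {
            result.tables.addAll(((CyDataTableProducer) reader).getReadCyDataTables());
        }
        return result;
    }
}

// Stub for a topology-only format such as SIF.
class TopologyOnlyReader implements CyReader, CyNetworkProducer {
    private final List<CyNetwork> networks = new ArrayList<>();

    @Override
    public void read() { networks.add(new CyNetwork()); }

    @Override
    public List<CyNetwork> getReadNetworks() { return networks; }
}

// Stub for a format that also carries attributes, such as XGMML.
class TopologyAndTableReader extends TopologyOnlyReader implements CyDataTableProducer {
    private final List<CyDataTable> tables = new ArrayList<>();

    @Override
    public void read() {
        super.read();
        tables.add(new CyDataTable());
    }

    @Override
    public List<CyDataTable> getReadCyDataTables() { return tables; }
}
```

Supporting a new data type in this scheme would mean adding one more interface and one more instanceof branch, without touching readers that do not produce that type.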

Import Alternative 1:

Instead of the Reader object producing an object (e.g. a CyNetwork), it could allow an existing object to be updated. The parameters of the methods in the interfaces would provide the objects that get updated when the CyReader.read() method is called. So to read a network, you might have an interface that looks like this:

public interface NetworkUpdater {
   public void updateNetwork(CyNetwork n);
}

The updateNetwork() method would be called first to provide the network object (e.g. an empty network) to the reader; then the read() method would be executed, which would update the network based on whatever input was provided. A benefit of the update paradigm is that existing networks or tables could easily be supplemented instead of new ones being created. A disadvantage is that you'd have to know beforehand how many networks would need to be updated. Perhaps that is a fatal flaw.
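
To make the update paradigm concrete, here is a minimal sketch under several assumptions: CyNetwork is a stand-in that only holds node names, MergingReader is a hypothetical name, and the "input" is a list of node names passed to the constructor in place of real parsing. setInput(URI) is omitted from CyReader for brevity.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for the real network model (hypothetical): just a list of node names.
class CyNetwork {
    final List<String> nodes = new ArrayList<>();
}

interface CyReader { void read(); }  // setInput(URI) omitted for brevity

interface NetworkUpdater {
    void updateNetwork(CyNetwork n);
}

// Hypothetical reader that adds parsed nodes to a network supplied by the caller.
class MergingReader implements CyReader, NetworkUpdater {
    private final List<String> parsedNodes;  // stands in for data parsed from the input
    private CyNetwork target;

    MergingReader(List<String> parsedNodes) {
        this.parsedNodes = new ArrayList<>(parsedNodes);
    }

    // Must be called before read() so the reader knows what to update.
    @Override
    public void updateNetwork(CyNetwork n) {
        this.target = n;
    }

    @Override
    public void read() {
        if (target == null) {
            throw new IllegalStateException("updateNetwork() must be called before read()");
        }
        for (String node : parsedNodes) {
            // Supplement the existing network; nodes already present are left alone.
            if (!target.nodes.contains(node)) {
                target.nodes.add(node);
            }
        }
    }
}
```

The sketch also shows the limitation noted above: the caller has to hand over exactly one network before read() runs, so an input that turns out to contain several networks has nowhere to put the rest.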

Export

Export could use a similar composition strategy to Import. There would be a core Export interface:

import java.io.File;

public interface CyExporter {
  public void export(File f);
}

Then the CyExporter would be supplemented with other interfaces, like CyNetworkExporter, that specify which networks are to be exported:

import java.util.List;

public interface CyNetworkExporter {
  public void setNetworks(List<CyNetwork> n);
}
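
A sketch of how an exporter might combine the two interfaces follows. It assumes a stand-in CyNetwork that stores edges as source/target pairs and a hypothetical EdgeListExporter that writes a two-column, tab-separated edge list; I/O failures are wrapped in an unchecked exception so that export() keeps the signature proposed above.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;

// Stand-in for the real network model (hypothetical): edges as {source, target}.
class CyNetwork {
    final List<String[]> edges = new ArrayList<>();
}

interface CyExporter {
    void export(File f);
}

interface CyNetworkExporter {
    void setNetworks(List<CyNetwork> n);
}

// Hypothetical exporter that writes one "source<TAB>target" line per edge.
class EdgeListExporter implements CyExporter, CyNetworkExporter {
    private List<CyNetwork> networks = new ArrayList<>();

    @Override
    public void setNetworks(List<CyNetwork> n) {
        // Copy the list so later changes by the caller do not affect the export.
        this.networks = new ArrayList<>(n);
    }

    @Override
    public void export(File f) {
        List<String> lines = new ArrayList<>();
        for (CyNetwork network : networks) {
            for (String[] edge : network.edges) {
                lines.add(edge[0] + "\t" + edge[1]);
            }
        }
        try {
            Files.write(f.toPath(), lines, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

As with import, the caller would first supply the data through setNetworks() and then trigger the write with export().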

Implementation

Bundles

The IO module will consist of three main bundles:

org.cytoscape.io-api - General file I/O API packages.
org.cytoscape.io-impl - Implementations of file IO API.
org.cytoscape.io.service - Remoting and Web Services

These bundles are available in the Core3 SVN repository.

File IO

File readers/writers are exported as services.
CyReaderFactory listens for the reader/writer service events.
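
The listening pattern described above could look roughly like the following. This is a plain-Java stand-in and deliberately does not use the OSGi API: SimpleReaderRegistry and ReaderServiceListener are hypothetical names that only imitate a service registry delivering add/remove events, and the CyReaderFactory shown here is a guess at the behaviour, not the actual implementation in the bundles.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

interface CyReader { void read(); }  // setInput(URI) omitted for brevity

// Hypothetical callback interface, standing in for service registry events.
interface ReaderServiceListener {
    void readerAdded(CyReader reader);
    void readerRemoved(CyReader reader);
}

// Hypothetical in-memory registry; a real deployment would rely on the OSGi service registry.
class SimpleReaderRegistry {
    private final List<CyReader> readers = new CopyOnWriteArrayList<>();
    private final List<ReaderServiceListener> listeners = new CopyOnWriteArrayList<>();

    void addListener(ReaderServiceListener listener) {
        listeners.add(listener);
        // Replay readers that were registered before the listener arrived.
        for (CyReader reader : readers) {
            listener.readerAdded(reader);
        }
    }

    void register(CyReader reader) {
        readers.add(reader);
        for (ReaderServiceListener listener : listeners) {
            listener.readerAdded(reader);
        }
    }

    void unregister(CyReader reader) {
        if (readers.remove(reader)) {
            for (ReaderServiceListener listener : listeners) {
                listener.readerRemoved(reader);
            }
        }
    }
}

// Keeps track of the readers that are currently available as services.
class CyReaderFactory implements ReaderServiceListener {
    private final List<CyReader> available = new CopyOnWriteArrayList<>();

    @Override
    public void readerAdded(CyReader reader) { available.add(reader); }

    @Override
    public void readerRemoved(CyReader reader) { available.remove(reader); }

    // Returns a snapshot so callers cannot modify the factory's internal list.
    List<CyReader> getAvailableReaders() { return new ArrayList<>(available); }
}
```

The point of the pattern is that bundles providing readers can come and go at runtime, and the factory's view of available readers follows along without the factory knowing any concrete reader class.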

Outdated_Cytoscape_3.0/IODiscussions (last edited 2011-02-24 15:34:05 by PietMolenaar)