Remote Job API

Scooter Morris, Barry Demchak

2015-06-04

Initial

Remote Job API

Kei Ono, Barry Demchak

2015-06-05

Comments

Proposal

Beginning in Cytoscape 3.3, we hope to enable the execution of long-running jobs remote from Cytoscape. As with PSICQUIC, a connection to an external service requires custom Cytoscape code (either core or app, but called app for the purpose of this discussion) to manage the overall workflow and the dataflow between the service and the Cytoscape model. In this RFC, we propose mechanisms for assisting the dataflow, while leaving specific handshaking and interfacing to the app and service themselves.

Specifically, this proposal intends to be agnostic as to the specific protocols for interfacing with remote computations. (At this time, we foresee interfacing with both Opal-based services and CI-based services, both of which share basic protocol philosophy, but differ in many details. We intend to accommodate services whose interface details we don't know about yet, too.) It focuses on generic Cytoscape infrastructure that assists app code in interfacing with the Cytoscape model and remote computation.

Background

Provide a brief description of the background of project.

Sample Use Case

As a base case, suppose a remote execution platform (aka service) that is capable of accepting a number of computation parameters (possibly including one or more networks), returning a token identifying the computation, returning status for the computation, and returning a result (possibly including one or more networks or other data). The computation would execute detached from Cytoscape until it completes or until the computation is aborted.

Once Cytoscape initiates the computation, it would poll the status until the result is ready, and would then download the result and process it.
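The polling step could be sketched roughly as follows. This is only an illustration: `JobPoller`, `PollStatus`, and the `Supplier`-based status source are hypothetical names, not part of any existing Cytoscape API, and a real app would query the remote service inside the supplier.

```java
import java.util.function.Supplier;

// Hypothetical sketch: these names are illustrative, not an existing Cytoscape API.
final class JobPoller {
    enum PollStatus { RUNNING, FINISHED, FAILED }

    /**
     * Asks the status source up to maxPolls times, sleeping delayMillis
     * between attempts, and returns the last status observed. Stops early
     * as soon as the status is anything other than RUNNING.
     */
    static PollStatus pollUntilDone(Supplier<PollStatus> source, int maxPolls, long delayMillis) {
        if (maxPolls < 1) {
            throw new IllegalArgumentException("maxPolls must be >= 1");
        }
        PollStatus last = PollStatus.RUNNING;
        for (int i = 0; i < maxPolls; i++) {
            last = source.get();
            if (last != PollStatus.RUNNING) {
                return last;
            }
            if (i < maxPolls - 1) {
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException e) {
                    // Preserve the interrupt flag and give up polling.
                    Thread.currentThread().interrupt();
                    return last;
                }
            }
        }
        return last;
    }
}
```

Bounding the number of polls keeps the sketch from looping forever; a caller that still sees `RUNNING` can simply try again later, for example after the session is reloaded.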

Variations of this scenario may include (for example):

Whichever variation the computation presents, we assume custom Cytoscape code will be built accordingly.

In a typical scenario:

  1. User executes an app that communicates with the remote computation
  2. App displays dialog box that gets execution parameters from user
    • As Tunables or a custom UI
    • If the external service requires authentication, the user should enter credentials in this step
  3. App combines parameters and a network/table residing in the Cytoscape model
  4. App initiates computation, passing parameters and network/table
  5. Computation returns job token to Cytoscape
    • CyJob
    • Q. Does the job token include user information?
  6. Cytoscape stores job token as part of the Cytoscape session
    • Q. Does Cytoscape automatically create a new file to save a snapshot of the state?
    • If exactly the same state is required for merging the result, the session may need a unique ID, such as an MD5 hash.
  7. User saves Cytoscape session, including the job token
  8. User terminates Cytoscape
  9. User starts Cytoscape some time later
  10. User initiates app to begin polling for computation completion
  11. Upon completion, app downloads result
    • CyJobFetcher
    • The state may differ if the session is not a snapshot of the state at the time the user started the job.
    • If the state differs from the original, the app may need to display a warning
  12. App integrates result into Cytoscape session (as a new network, a merged network, a new display, or in some other way)
  13. App purges result
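One way to detect the changed-state situation raised in step 11 is to store a fingerprint of the session state alongside the job token and compare it when the result arrives. The sketch below assumes the state can be serialized to a string; `StateFingerprint` and its methods are hypothetical names, and MD5 is used only because it is the example mentioned in step 6.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: not part of Cytoscape. Assumes the relevant session
// state has already been serialized into a string.
final class StateFingerprint {

    /** Returns the lowercase hexadecimal MD5 digest of the serialized state. */
    static String md5Hex(String serializedState) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(serializedState.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide MD5.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** True if the current state no longer matches the fingerprint stored at submission. */
    static boolean stateChanged(String fingerprintAtSubmit, String currentState) {
        return !fingerprintAtSubmit.equals(md5Hex(currentState));
    }
}
```

An app could call `stateChanged` just before merging a result and show the warning only when it returns true.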

Other Scenario (Stateless Tasks):

  1. (Optional) User enters an ID and password for a service. These are saved as encrypted data in the CytoscapeConfig directory
  2. User selects a menu item to start a service caller
  3. User enters parameters for the job
  4. App submits the job to the external service
  5. Job IDs are saved in the CytoscapeConfig directory
  6. User continues working in Cytoscape as desired
  7. User quits Cytoscape
  8. User starts Cytoscape again
  9. App loads the job list from the CytoscapeConfig directory
  10. If the list is not empty, app checks the status of each job
  11. Once a job is finished, app asks the user whether to fetch or discard the result
  12. If the user wants the result, app fetches it regardless of the current session state
  13. App removes the job ID from the list

Note: There are many use cases for this scenario. In general, if an external service generates new networks or tables, its results can be merged into any session.
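The job list in the stateless scenario could be kept in a small text file, one job ID per line. The following is a minimal sketch under that assumption; `JobListStore` and the file name `remote-jobs.txt` are hypothetical, and the directory is passed in rather than looked up through any Cytoscape service.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: stores pending job IDs, one per line, in a config directory.
final class JobListStore {
    private final Path file;

    JobListStore(Path configDir) {
        // The file name is an assumption made for this example.
        this.file = configDir.resolve("remote-jobs.txt");
    }

    /** Loads the saved job IDs, or an empty list if nothing has been saved yet. */
    List<String> load() {
        List<String> ids = new ArrayList<>();
        if (!Files.exists(file)) {
            return ids;
        }
        try {
            for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
                String id = line.trim();
                if (!id.isEmpty()) {
                    ids.add(id);
                }
            }
            return ids;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Adds a job ID after submission (duplicates are ignored). */
    void add(String jobId) {
        List<String> ids = load();
        if (!ids.contains(jobId)) {
            ids.add(jobId);
        }
        save(ids);
    }

    /** Removes a job ID once its result has been fetched or discarded. */
    void remove(String jobId) {
        List<String> ids = load();
        ids.remove(jobId);
        save(ids);
    }

    private void save(List<String> ids) {
        try {
            Files.createDirectories(file.getParent());
            Files.write(file, ids, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the list lives outside the session file, it survives across sessions, which matches the idea that results can be fetched whatever the current state is.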

Technical Proposal

Generally, executing a remote computation from Cytoscape involves supporting the following steps (called the Cytoscape internal workflow):

  1. marshalling model data and execution parameters
  2. executing the remote job
  3. tracking job status
  4. retrieving job results
  5. unmarshalling result data into the model or elsewhere

We propose an API that app code can use to implement a link between Cytoscape and a remote computation. The general idea is that each step of the Cytoscape internal workflow could be implemented differently for different remote computations (which implicitly means possibly different job execution environments), while some implementations should be shared or reused. The new API therefore has the following objects:

CyJob is an instance of an external execution that will (possibly) receive data from Cytoscape, perform some external task, and (possibly) return data to Cytoscape. In general, a CyJob is returned from a CyJobExecutor.

CyJobData contains the results from a data marshaller or fetcher to be handed to a CyJobExecutor. There are two types of CyJobData objects: CyJobBinaryData and CyJobStringData.

CyJobViewMarshaller, CyJobNetworkMarshaller, and CyJobTableMarshaller will implement the actual marshalling of data. Note that to marshall a series of network views together with their underlying models, including tables, you would use only the CyJobViewMarshaller, which extends the CyJobNetworkMarshaller. The CyJobNetworkMarshaller in turn extends the CyJobTableMarshaller, the idea being that a view marshaller will also need to marshall the tables, etc.
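The inheritance chain described above might look like the sketch below. The RFC names the interfaces but not their methods, so every method signature here is an assumption, the `...Sketch` names are deliberately distinct from the proposed API, and plain strings stand in for Cytoscape views, networks, and tables.

```java
import java.util.List;

// Hypothetical method shapes: the RFC does not define them.
interface TableMarshallerSketch {
    String marshallTables(List<String> tableNames);
}

// A network marshaller can do everything a table marshaller can.
interface NetworkMarshallerSketch extends TableMarshallerSketch {
    String marshallNetwork(String networkName, List<String> tableNames);
}

// A view marshaller can do everything a network marshaller can.
interface ViewMarshallerSketch extends NetworkMarshallerSketch {
    String marshallView(String viewName, String networkName, List<String> tableNames);
}

// Toy implementation that nests the output of each level inside the next.
final class SimpleViewMarshaller implements ViewMarshallerSketch {
    @Override
    public String marshallTables(List<String> tableNames) {
        return "tables=" + String.join(",", tableNames);
    }

    @Override
    public String marshallNetwork(String networkName, List<String> tableNames) {
        return "network=" + networkName + ";" + marshallTables(tableNames);
    }

    @Override
    public String marshallView(String viewName, String networkName, List<String> tableNames) {
        return "view=" + viewName + ";" + marshallNetwork(networkName, tableNames);
    }
}
```

With this layout, an app that needs views, networks, and tables holds a single view marshaller, while an app that only sends tables can depend on the narrowest interface.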

CyJobExecutor will implement the necessary exchange with the remote environment to submit or execute the job. This could logically be run as part of a task so that users can select the parameters and objects for the execution. However, once the CyJobExecutor completes, it is assumed that the task will complete and that the job runs asynchronously.

CyJobStatusChecker checks on the status of the job.

CyJobFetcher fetches the results of a job.

CyJobUnmarshaller unmarshalls the data and (possibly) registers the results with the appropriate managers.
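To show how the roles above could fit together across the five workflow steps, here is a toy end-to-end sketch. It is not the proposed API: `InMemoryJobService` stands in for a remote service and plays the executor, status checker, and fetcher roles, the "computation" just counts the submitted node names, and all names and signatures are assumptions.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a remote service.
final class InMemoryJobService {
    private final Map<String, String> inputs = new HashMap<>();
    private final Map<String, Integer> pollsRemaining = new HashMap<>();
    private int nextId = 1;

    /** Executor role: accepts marshalled data and returns a job token. */
    String execute(String marshalledData) {
        String token = "job-" + nextId++;
        inputs.put(token, marshalledData);
        pollsRemaining.put(token, 2); // pretend the job needs two polls to finish
        return token;
    }

    /** Status checker role: reports whether the job has finished. */
    boolean isFinished(String token) {
        Integer left = pollsRemaining.get(token);
        if (left == null) {
            throw new IllegalArgumentException("Unknown job token: " + token);
        }
        if (left > 0) {
            pollsRemaining.put(token, left - 1);
            return false;
        }
        return true;
    }

    /** Fetcher role: returns the raw result of a finished job. */
    String fetch(String token) {
        Integer left = pollsRemaining.get(token);
        if (left == null || left > 0) {
            throw new IllegalStateException("Result not ready for " + token);
        }
        return String.valueOf(inputs.get(token).split(",").length);
    }
}

final class RemoteWorkflowSketch {
    /** Runs the five workflow steps and returns the unmarshalled node count. */
    static int run(InMemoryJobService service, List<String> nodeNames) {
        String marshalled = String.join(",", nodeNames);   // 1. marshall model data
        String token = service.execute(marshalled);        // 2. execute the remote job
        while (!service.isFinished(token)) {               // 3. track job status
            // A real app would wait between polls instead of spinning.
        }
        String raw = service.fetch(token);                 // 4. retrieve job results
        return Integer.parseInt(raw);                      // 5. unmarshall the result
    }
}
```

Keeping each step behind its own method is what lets one step, such as marshalling, be reused while another, such as execution, is swapped per service.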

Q. Do we need a ServiceUser interface to access user credentials if authentication is required? Or is it part of CyJob?

Implementation Plan

Outline and describe the process and major issues related to implementing this proposal. Illustrate your plan when possible. Try this free online tool for making diagrams -> Best4c (draw; save; then insert hyperlink into this page)

Project Management

Project Timeline

Provide a timeline for implementation. Insert a graphic if you can. Try this free online tool for making project timelines -> Help-u-Plan (create a new chart; modify; right-click to save gif; then attach to this page)

Tasks and Milestones

Outline the major milestones and tasks involved in implementation.

  1. Milestone 1: …

    1. Task 1: ...
    2. Task 2: ...
  2. Milestone 2: …

Project Dependencies

Outline any projects that depend on this project, link to relevant RFCs, and note at what point dependent projects could be started.

Link to other related RFCs

Issues

List any issues, conflict, or dependencies raised by this proposal

Comments

  // Possible states of a remote job, each with a human-readable label.
  public enum Status {
    SUBMITTED("Submitted"),
    QUEUED("In queue"),
    RUNNING("Running"),
    TERMINATED("Terminated"),
    FINISHED("Successfully finished"),
    FAILED("Failed"),
    ERROR("Finished with errors or warnings");

    private final String name;

    Status(String n) {
      this.name = n;
    }

    // Return the human-readable label rather than the constant name.
    @Override
    public String toString() {
      return name;
    }
  }

We could certainly add a PURGED value.
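As a sketch of what that might look like, the variant below adds PURGED and a helper that tells a poller when to stop. The name `JobStatusSketch`, the "Purged" label, and the choice of which states count as terminal are all assumptions made for illustration.

```java
// Hypothetical variant of the Status enum above, with PURGED added.
enum JobStatusSketch {
    SUBMITTED("Submitted", false),
    QUEUED("In queue", false),
    RUNNING("Running", false),
    TERMINATED("Terminated", true),
    FINISHED("Successfully finished", true),
    FAILED("Failed", true),
    ERROR("Finished with errors or warnings", true),
    PURGED("Purged", true);

    private final String label;
    private final boolean terminal;

    JobStatusSketch(String label, boolean terminal) {
        this.label = label;
        this.terminal = terminal;
    }

    /** True if no further status change is expected, so polling can stop. */
    boolean isTerminal() {
        return terminal;
    }

    @Override
    public String toString() {
        return label;
    }
}
```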

How to Comment

Edit the page and add your comments under the provided header. By adding your ideas to the Wiki directly, we can more easily organize everyone's ideas, and keep clear records. Be sure to include today's date and your name for each comment. Try to keep your comments as concrete and constructive as possible. For example, if you find a part of the RFC makes no sense, please say so, but don't stop there. Take the extra step and propose alternatives.

RemoteJobAPI (last edited 2015-06-05 21:54:40 by bdemchak)

Funding for Cytoscape is provided by a federal grant from the U.S. National Institute of General Medical Sciences (NIGMS) of the National Institutes of Health (NIH) under award number GM070743-01. Corporate funding is provided through a contract from Unilever PLC.
