Remote Job API: …
Scooter Morris, Barry Demchak: …
2015-06-04: …
Initial: …
Proposal
Beginning in Cytoscape 3.3, we hope to enable the execution of long-running jobs remotely from Cytoscape. As with PSICQUIC, a connection to an external service requires custom Cytoscape code (either core or app) to manage the overall workflow and the dataflow between the service and the Cytoscape model. In this RFC, we propose mechanisms to assist with the dataflow, while leaving the specific handshaking and interfacing to the custom Cytoscape code and the service themselves.
Background
Provide a brief description of the background of the project.
Sample Use Case
As a base case, consider a remote execution platform (a service) that can accept a number of computation parameters (possibly including one or more networks), return a token identifying the computation, report the computation's status, and return a result (possibly including one or more networks or other data). The computation would execute detached from Cytoscape until it completes or is aborted.
Once Cytoscape initiates the computation, it would poll the status until the result is ready, and would then download and process the result.
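The base case amounts to a three-operation contract (submit, check status, fetch result) plus a client-side polling loop. The Java sketch below illustrates that contract under stated assumptions. `RemoteService`, `FakeService`, and `BaseCase` are hypothetical names invented for illustration. The network is reduced to a string, and the fake service simply reports completion after three status checks.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical contract for the base-case service (not a Cytoscape API).
interface RemoteService {
    String submit(Map<String, String> parameters, String network); // returns a job token
    boolean isComplete(String token);                              // status check
    String result(String token);                                   // result download
}

// In-memory stand-in that reports completion after three status checks.
class FakeService implements RemoteService {
    private final Map<String, Integer> checksLeft = new HashMap<>();
    private final Map<String, String> results = new HashMap<>();

    @Override
    public String submit(Map<String, String> parameters, String network) {
        String token = UUID.randomUUID().toString();
        checksLeft.put(token, 3);
        results.put(token, "result-for-" + network);
        return token;
    }

    @Override
    public boolean isComplete(String token) {
        // Decrement the remaining-check counter; the job is done once it reaches zero.
        return checksLeft.merge(token, -1, Integer::sum) <= 0;
    }

    @Override
    public String result(String token) {
        return results.get(token);
    }
}

class BaseCase {
    // Poll until the job completes, then download the result.
    // Returns null if the polling thread is interrupted.
    static String runToCompletion(RemoteService service, String token, long delayMillis) {
        while (!service.isComplete(token)) {
            try {
                Thread.sleep(delayMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return null;
            }
        }
        return service.result(token);
    }
}
```

The variations listed next would change this contract. A service without tokens would collapse `submit` and `result` into a single blocking call.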
Variations of this scenario may include (for example):
- the computation not returning a token, but executing until completion and then returning the result immediately
- the computation providing management facilities for users to inspect and manipulate job queues independent of Cytoscape
- the computation providing facilities for custom Cytoscape code to manipulate job queues from within Cytoscape
- the computation requiring user credentials, which the custom Cytoscape code could procure, or which may be procured via other means
- the computation being able to send e-mail to notify the user when the computation is complete
Whichever variation the computation presents, we assume custom Cytoscape code will be built accordingly.
In a typical scenario:
- User runs an app that communicates with the remote computation
- App displays a dialog box that collects execution parameters from the user
- App combines parameters and a network residing in the Cytoscape model
- App initiates computation, passing parameters and network
- Computation returns job token to Cytoscape
- Cytoscape stores job token as part of the Cytoscape session
- User saves Cytoscape session, including the job token
- User terminates Cytoscape
- User starts Cytoscape some time later
- User initiates app to begin polling for computation completion
- Upon completion, app downloads result
- App integrates result into Cytoscape session (as a new network, a merged network, a new display, or in some other way)
- App purges result
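The steps from storing the job token through restarting Cytoscape depend on the token surviving a session save and reload. The sketch below models only that part and assumes the session can hold simple key/value properties. `SessionState` and the `remoteJob.token` key are hypothetical. `java.util.Properties` stands in for whatever session-persistence mechanism a real app would use.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical stand-in for session state; not Cytoscape's session API.
class SessionState {
    static final String TOKEN_KEY = "remoteJob.token"; // hypothetical key name

    // "Save session": serialize the properties, including the job token.
    static String save(Properties session) {
        StringWriter out = new StringWriter();
        try {
            session.store(out, "session");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    // "Restart Cytoscape and reopen the session": rebuild the properties from the saved text.
    static Properties load(String saved) {
        Properties session = new Properties();
        try {
            session.load(new StringReader(saved));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return session;
    }
}
```

After `load`, the app would read the token back with `getProperty(SessionState.TOKEN_KEY)` and resume polling.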
Technical Proposal
Generally, executing a remote computation from Cytoscape involves supporting the following steps (called the Cytoscape internal workflow):
- marshalling model data and execution parameters
- executing the remote job
- tracking job status
- retrieving job results
- unmarshalling result data into the model or elsewhere
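One way to read these five steps is as a pipeline of independent, swappable stages, so that two remote computations can share a marshaller while using different executors. The following generic Java sketch assumes that reading. The `Workflow` class, its stage parameters, and `withService` are hypothetical and are not part of the proposed API. The blocking poll loop also stands in for the asynchronous monitoring a real implementation would use.

```java
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical: each step of the internal workflow is a separate, swappable stage.
// M = model input, P = payload exchanged with the service, R = unmarshalled result.
class Workflow<M, P, R> {
    private final Function<M, P> marshal;       // model data + parameters -> payload
    private final Function<P, String> execute;  // payload -> job token
    private final Predicate<String> isDone;     // token -> has the job finished?
    private final Function<String, P> retrieve; // token -> result payload
    private final Function<P, R> unmarshal;     // result payload -> model-side result

    Workflow(Function<M, P> marshal, Function<P, String> execute, Predicate<String> isDone,
             Function<String, P> retrieve, Function<P, R> unmarshal) {
        this.marshal = marshal;
        this.execute = execute;
        this.isDone = isDone;
        this.retrieve = retrieve;
        this.unmarshal = unmarshal;
    }

    // Runs the five steps in order. The busy-wait loop is a simplification;
    // in Cytoscape, status tracking would happen asynchronously.
    R run(M model) {
        String token = execute.apply(marshal.apply(model));
        while (!isDone.test(token)) {
            Thread.onSpinWait();
        }
        return unmarshal.apply(retrieve.apply(token));
    }

    // Reuse: a workflow that shares this one's marshalling and unmarshalling
    // stages but talks to a different service.
    Workflow<M, P, R> withService(Function<P, String> newExecute, Predicate<String> newIsDone,
                                  Function<String, P> newRetrieve) {
        return new Workflow<>(marshal, newExecute, newIsDone, newRetrieve, unmarshal);
    }
}
```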
We propose an API that custom Cytoscape code can use to implement a link between Cytoscape and a remote computation. The general idea behind the API is that each step of the Cytoscape internal workflow could be implemented differently for different remote computations (which implicitly means possibly different job execution environments), but sometimes the implementations should be shared or reused. The new API therefore defines the following objects:
CyJob represents an instance of an external execution that will (possibly) receive data from Cytoscape, perform some external task, and (possibly) return data to Cytoscape. In general, a CyJob is returned from a CyJobExecutor.
CyJobData contains the output of a data marshaller (to be handed to a CyJobExecutor) or of a CyJobFetcher (to be handed to a CyJobUnmarshaller). There are two types of CyJobData objects: CyJobBinaryData and CyJobStringData.
CyJobViewMarshaller, CyJobNetworkMarshaller, and CyJobTableMarshaller will implement the actual marshalling of data. Note that to marshall a series of network views along with their underlying models, including tables, you would use only the CyJobViewMarshaller, which extends the CyJobNetworkMarshaller. The CyJobNetworkMarshaller in turn extends the CyJobTableMarshaller, the idea being that a view marshaller will also need to marshall the underlying networks, tables, etc.
CyJobExecutor will implement the necessary exchange with the remote environment to submit or execute the job. This could logically be run as part of a task so that users can select the parameters and objects for the execution. However, once the CyJobExecutor completes, the task is assumed to complete as well, while the job itself runs asynchronously.
CyJobStatusChecker checks on the status of the job.
CyJobFetcher fetches the results of a job.
CyJobUnmarshaller unmarshalls the data and (possibly) registers the results with the appropriate managers.
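The object roles above can be sketched as Java types. Only the type names and their relationships come from this RFC: there are two CyJobData variants, and CyJobViewMarshaller extends CyJobNetworkMarshaller, which extends CyJobTableMarshaller. Several parts of the sketch are assumptions made for illustration and are not the actual Cytoscape API:

- every method signature
- the placeholder `Table`, `Network`, and `View` classes
- the toy `NameListMarshaller`

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Placeholder model types (assumptions) standing in for Cytoscape tables, networks, and views.
class Table {
    final String name;
    Table(String name) { this.name = name; }
}
class Network {
    final List<Table> tables;
    Network(List<Table> tables) { this.tables = tables; }
}
class View {
    final Network network;
    View(Network network) { this.network = network; }
}

// CyJobData comes in two variants, per the RFC.
interface CyJobData {}
class CyJobStringData implements CyJobData {
    final String value;
    CyJobStringData(String value) { this.value = value; }
}
class CyJobBinaryData implements CyJobData {
    final byte[] value;
    CyJobBinaryData(byte[] value) { this.value = value; }
}

// Marshaller hierarchy from the RFC; method names are assumptions.
interface CyJobTableMarshaller {
    CyJobData marshallTables(List<Table> tables);
}
interface CyJobNetworkMarshaller extends CyJobTableMarshaller {
    CyJobData marshallNetworks(List<Network> networks);
}
interface CyJobViewMarshaller extends CyJobNetworkMarshaller {
    CyJobData marshallViews(List<View> views);
}

// Remaining roles; all signatures are assumptions.
interface CyJob { String jobId(); }
interface CyJobExecutor { CyJob execute(CyJobData input); }
interface CyJobStatusChecker { boolean isDone(CyJob job); }
interface CyJobFetcher { CyJobData fetch(CyJob job); }
interface CyJobUnmarshaller { void unmarshall(CyJobData result); }

// Toy view marshaller: views delegate to networks, networks delegate to tables.
class NameListMarshaller implements CyJobViewMarshaller {
    @Override
    public CyJobData marshallTables(List<Table> tables) {
        return new CyJobStringData(tables.stream().map(t -> t.name).collect(Collectors.joining(",")));
    }

    @Override
    public CyJobData marshallNetworks(List<Network> networks) {
        List<Table> all = new ArrayList<>();
        for (Network n : networks) all.addAll(n.tables);
        return marshallTables(all);
    }

    @Override
    public CyJobData marshallViews(List<View> views) {
        List<Network> networks = new ArrayList<>();
        for (View v : views) networks.add(v.network);
        return marshallNetworks(networks);
    }
}
```

Because of the inheritance chain, a single view marshaller can be passed anywhere a network or table marshaller is expected. That is the reuse the hierarchy is meant to enable.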
Implementation Plan
Outline and describe the process and major issues related to implementing this proposal. Illustrate your plan when possible. Try this free online tool for making diagrams -> Best4c (draw; save; then insert hyperlink into this page)
Project Management
Project Timeline
Provide a timeline for implementation. Insert a graphic if you can. Try this free online tool for making project timelines -> Help-u-Plan (create a new chart; modify; right-click to save gif; then attach to this page)
Tasks and Milestones
Outline the major milestones and tasks involved in implementation.
Milestone 1: …
- Task 1: ...
- Task 2: ...
Milestone 2: …
Project Dependencies
Outline any projects that depend on this project, link to relevant RFCs, and note at what point dependent projects could be started.
Related RFCs
Link to other related RFCs
Issues
List any issues, conflicts, or dependencies raised by this proposal
Comments
[Scooter] I think this covers pretty much all of the cases I can think of for the various kinds of job execution environments. I'm pretty sure we can even write a CyJobExecutor that will launch binaries on the local machine, monitor them for completion, and then unmarshall the resulting data. I'm still not happy with the marshalling step. An alternative would be a more generic CyJobMarshaller interface that took views, networks, and tables in a single input statement and then just "did the best that it could". Anyone have a better approach? At any rate, I can land this in develop so people can take a look at it before I start adding any implementation. From the core perspective, the only implementation will be an implementation of the CyJobManager, which monitors and manages all of the jobs. When a job completes, the CyJobManager will execute a task that asks the user whether they want to fetch the data (or dispense with it), then fetches the data and unmarshalls it.
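The manager behavior described in this comment can be sketched in Java. The manager monitors all jobs. When a job completes, it asks the user whether to fetch or discard the data, then fetches and unmarshalls it. `SimpleJobManager`, `TrackedJob`, and `pollOnce` are hypothetical names. The user prompt is reduced to a `Predicate` so the sketch runs without a UI.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical job manager; names and signatures are illustrative only.
class SimpleJobManager {
    // A tracked job: its token plus the service-specific status, fetch, and unmarshall steps.
    static final class TrackedJob {
        final String token;
        final Predicate<String> isDone;
        final Function<String, String> fetch;
        final Consumer<String> unmarshall;

        TrackedJob(String token, Predicate<String> isDone,
                   Function<String, String> fetch, Consumer<String> unmarshall) {
            this.token = token;
            this.isDone = isDone;
            this.fetch = fetch;
            this.unmarshall = unmarshall;
        }
    }

    private final List<TrackedJob> jobs = new ArrayList<>();
    // Stands in for the dialog asking the user: true = fetch the data, false = discard it.
    private final Predicate<String> askUserToFetch;

    SimpleJobManager(Predicate<String> askUserToFetch) {
        this.askUserToFetch = askUserToFetch;
    }

    void register(TrackedJob job) { jobs.add(job); }

    int pending() { return jobs.size(); }

    // One monitoring pass. Completed jobs are removed, then fetched and
    // unmarshalled only if the user agrees; unfinished jobs stay tracked.
    void pollOnce() {
        Iterator<TrackedJob> it = jobs.iterator();
        while (it.hasNext()) {
            TrackedJob job = it.next();
            if (!job.isDone.test(job.token)) {
                continue;
            }
            it.remove();
            if (askUserToFetch.test(job.token)) {
                job.unmarshall.accept(job.fetch.apply(job.token));
            }
        }
    }
}
```

A real manager could call `pollOnce` periodically, for example via `ScheduledExecutorService.scheduleWithFixedDelay`. That would also require making the job list thread-safe.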
How to Comment
Edit the page and add your comments under the provided header. By adding your ideas to the Wiki directly, we can more easily organize everyone's ideas, and keep clear records. Be sure to include today's date and your name for each comment. Try to keep your comments as concrete and constructive as possible. For example, if you find a part of the RFC makes no sense, please say so, but don't stop there. Take the extra step and propose alternatives.