Differences between revisions 1 and 2
Revision 1 as of 2009-12-22 17:44:34
Size: 3016
Editor: malbec
Comment:
Revision 2 as of 2009-12-22 18:25:49
Size: 6998
Editor: malbec
Comment:
RFC64: Equation Attributes

Editor(s): Johannes Ruscheinski

Date: 2009-12-22

Status: Open for comment

Proposal

Add to Cytoscape the ability to define attributes that are derived or computed from an expression, which may involve references to other attributes.

Background

This capability would be a natural extension of the attribute browser. Especially for numerical-valued attributes, Cytoscape users are likely to want to derive new quantities from the values of other attributes. A similar argument holds for boolean attributes.

Use Cases

  • Allows users to easily colour a node based on a threshold value. Example: Given a Double attribute "someValue" and a threshold called "someThreshold" we could create a new boolean attribute called "exceedsThreshold" which would be defined by an equation attribute that is "$(someValue) > someThreshold".

  • Assume we have an attribute called "someAttrib" and would like to assign colour values based on the log of this attribute's value. We could then define a new Double attribute "shiftedLog" as "log10($(someAttrib)) + 22.7".
  • Consider an attribute called "maybeMissing" that has missing values, and assume we would like to provide a derived attribute that has a default value of -10.2 whenever "maybeMissing" is not set. We could then create a new attribute called "neverMissing" defined by "$(maybeMissing=-10.2)".

Many more complex examples can easily be constructed.

Implementation Plan

  1. Define a syntax for the expressions (Cull this as a subset from some programming language's grammar?)
  2. Write a parser (LL(1) grammar, recursive descent (http://en.wikipedia.org/wiki/Recursive_descent_parser)) implementing the syntax and error handling. (Pay special attention to easy extension with more built-in functions.)

  3. Test and debug the stand-alone parser
  4. Identify the location in the Cytoscape code base for integration and integrate the parser
  5. Test and debug the new capability within Cytoscape
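As an illustration of step 2, the following is a minimal sketch of a recursive-descent evaluator for a small arithmetic subset. It is not the proposed implementation: the grammar, the names `BUILTINS`, `Parser` and `evaluate`, and the choice to make "^" right-associative are all assumptions made for this example. Built-in functions live in a lookup table, so adding one means adding a single entry.

```python
import math
import re

# Hypothetical registry of built-ins; extending the language means adding an entry.
BUILTINS = {"abs": abs, "log10": math.log10, "exp": math.exp}

# A number, a name, or any other single character (operators, parentheses).
TOKEN_RE = re.compile(r"\s*(?:(\d+\.?\d*)|([A-Za-z_]\w*)|(.))")


def tokenize(text):
    tokens = []
    for number, name, op in TOKEN_RE.findall(text):
        if number:
            tokens.append(("num", float(number)))
        elif name:
            tokens.append(("name", name))
        elif op.strip():  # skip stray whitespace matches
            tokens.append(("op", op))
    tokens.append(("end", None))
    return tokens


class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def advance(self):
        token = self.tokens[self.pos]
        self.pos += 1
        return token

    def expect(self, op):
        if self.advance() != ("op", op):
            raise SyntaxError(f"expected {op!r}")

    # expr := term (("+" | "-") term)*
    def expr(self):
        value = self.term()
        while self.peek() in (("op", "+"), ("op", "-")):
            if self.advance()[1] == "+":
                value += self.term()
            else:
                value -= self.term()
        return value

    # term := power ("*" power)*
    def term(self):
        value = self.power()
        while self.peek() == ("op", "*"):
            self.advance()
            value *= self.power()
        return value

    # power := factor ("^" power)?   (assumed right-associative)
    def power(self):
        base = self.factor()
        if self.peek() == ("op", "^"):
            self.advance()
            return base ** self.power()
        return base

    # factor := number | "-" factor | name "(" expr ")" | "(" expr ")"
    def factor(self):
        kind, value = self.advance()
        if kind == "num":
            return value
        if (kind, value) == ("op", "-"):
            return -self.factor()
        if kind == "name":
            if value not in BUILTINS:
                raise SyntaxError(f"unknown function {value!r}")
            self.expect("(")
            arg = self.expr()
            self.expect(")")
            return BUILTINS[value](arg)
        if (kind, value) == ("op", "("):
            inner = self.expr()
            self.expect(")")
            return inner
        raise SyntaxError(f"unexpected token {value!r}")


def evaluate(text):
    parser = Parser(text)
    result = parser.expr()
    if parser.peek()[0] != "end":
        raise SyntaxError("trailing input")
    return result
```

Each grammar rule maps to one method, and each method decides what to do by looking at a single upcoming token, which is what the LL(1) restriction makes possible.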

Project Timeline

  1. Define syntax (est. 1/2 day to 1 day)
  2. Implement and test stand-alone parser (3 to 4 days)
  3. Integrate the parser into the Cytoscape code base (3 to 4 days)

Total time: 6 1/2 to 9 days.

Tasks and Milestones

  1. Define syntax
    1. Cull a subset of the "C" syntax that corresponds to what we want to implement
    2. Check for violations of LL(1) and transform to LL(1), if necessary
  2. Implement and test stand-alone parser
    1. Write a scanner/tokenizer (GetToken/UngetToken)

    2. Convert the syntax into a series of recursive function calls
    3. Add type checking and error handling
    4. Create unit tests
    5. Implement the built-in functions (Possibly add the capability to register Java function objects as built-in functions.)
    6. Test and debug
  3. Integrate the parser into the Cytoscape code base
    1. Find where in the Cytoscape code base would be the natural location for integration
    2. Actually integrate the new code
    3. Create unit tests for the integrated code
    4. Test the new code as part of Cytoscape
    5. Manually test and debug the new code and possibly extend the unit tests
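The scanner task (GetToken/UngetToken) could look roughly like the sketch below. The class name `Scanner`, the token kinds, and the exact set of two-character operators are assumptions for illustration only. The single slot of pushback is what lets a recursive-descent parser look at one token and hand it back if the current rule does not consume it.

```python
class Scanner:
    """Character-level scanner with one token of pushback."""

    TWO_CHAR_OPS = ("==", "!=", "<=", ">=", "&&", "||")

    def __init__(self, text):
        self.text = text
        self.pos = 0
        self.pushed_back = None

    def get_token(self):
        # Return a previously ungotten token first, if there is one.
        if self.pushed_back is not None:
            token, self.pushed_back = self.pushed_back, None
            return token
        text = self.text
        while self.pos < len(text) and text[self.pos].isspace():
            self.pos += 1
        if self.pos >= len(text):
            return ("EOF", "")
        start = self.pos
        ch = text[start]
        if ch.isdigit():
            while self.pos < len(text) and (text[self.pos].isdigit() or text[self.pos] == "."):
                self.pos += 1
            return ("NUMBER", text[start:self.pos])
        if ch.isalpha() or ch == "_":
            while self.pos < len(text) and (text[self.pos].isalnum() or text[self.pos] == "_"):
                self.pos += 1
            return ("IDENT", text[start:self.pos])
        two = text[start:start + 2]
        if two in self.TWO_CHAR_OPS:
            self.pos += 2
            return ("OP", two)
        self.pos += 1
        return ("OP", ch)

    def unget_token(self, token):
        if self.pushed_back is not None:
            raise RuntimeError("only one token of pushback is supported")
        self.pushed_back = token
```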

Project Dependencies

This project does not depend on any other projects.

Informal Description of Equation Syntax

In addition to String, Integer, Double and List attributes, it is also possible to define "attributes" that are functions of one or more other attributes. These functions may be string-, integer-, double-, or list-valued and refer to other attributes. In the case of Integer and Double attributes, you may use arbitrary arithmetic expressions built from addition, subtraction, multiplication, exponentiation (using "^"), and a small set of predefined mathematical functions, with grouping and nesting using parentheses. Currently the supported functions are:

  • abs - absolute value
  • log10 - logarithm to base 10
  • exp - e raised to a power
  • round - nearest integer
  • trunk - returns the integer part of a number, e.g. trunk(-1.3) = trunk(-1.9) = -1
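For comparison, the listed functions map onto Python's standard library roughly as follows. This is only a sketch of the intended semantics: the table name `MATH_FUNCTIONS` is hypothetical, and the RFC does not say how "round" treats exact halves, so using Python's built-in `round` (which rounds halves to the nearest even integer) is an assumption.

```python
import math

# Hypothetical mapping from equation function names to Python callables.
MATH_FUNCTIONS = {
    "abs": abs,
    "log10": math.log10,
    "exp": math.exp,
    "round": round,       # tie-breaking behaviour is an assumption
    "trunk": math.trunc,  # drops the fractional part, rounding toward zero
}

# trunk discards the fraction, so negative values move toward zero.
assert MATH_FUNCTIONS["trunk"](-1.3) == MATH_FUNCTIONS["trunk"](-1.9) == -1
```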

For String attributes you may use "+" for concatenation and the functions:

  • tolower - convert a string to lowercase
  • toupper - convert a string to uppercase
  • substring - takes 3 arguments, (string, startIndex, length), and returns the substring of "string" starting at 0-based offset "startIndex" and of length "length". This function is robust in that, if "startIndex" is beyond the end of "string", an empty string will be returned, and if "length" would extend beyond the end of "string", as much as possible of "string" will be returned. On the other hand, a negative "startIndex" will result in an error.
  • format - convert an arithmetic value or expression to a string. The arguments to "format" are (arithmeticExpression, width, precision [,padding [, type]]) where padding and type are optional and default to " " for padding and NE for type. Padding must be a string literal of length 1 and type can be either E (for exponential) or NE (for non-exponential). Padding will be used to extend a result, if its width is less than the specified width. Use a width of 0 if you do not want any padding. Precision specifies the number of significant digits in the generated string. When using the E format, generated strings will look like "2.3e5" rather than "23000".
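The substring rules above can be sketched as follows. Only "substring" is shown, because the description of "format" leaves some details open. The error type and the rejection of a negative "length" are assumptions; the RFC only says that a negative "startIndex" is an error.

```python
def substring(string, start_index, length):
    """Return up to `length` characters of `string` starting at `start_index`."""
    if start_index < 0:
        raise ValueError("startIndex must not be negative")
    if length < 0:
        # Not covered by the RFC; rejected here as an assumption.
        raise ValueError("length must not be negative")
    # Python slicing already clamps to the end of the string, which gives the
    # "as much as possible" and "empty string" behaviour described above.
    return string[start_index:start_index + length]
```

For example, `substring("Cytoscape", 4, 100)` returns "scape" and `substring("Cytoscape", 20, 3)` returns an empty string.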

String literals are also allowed and need to be enclosed in double-quotes. Double-quotes, newlines or backslashes may be embedded in a string literal by preceding them with a backslash.
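A decoder for such literals might look like the sketch below. It assumes C-style escapes, so that an embedded newline is written as a backslash followed by "n"; the RFC does not spell this out, and the function name `parse_string_literal` is hypothetical.

```python
def parse_string_literal(source):
    """Decode a double-quoted literal, including its quotes, into its value."""
    if len(source) < 2 or source[0] != '"' or source[-1] != '"':
        raise ValueError("string literal must be enclosed in double quotes")
    body = source[1:-1]
    # Assumed escape set: \" for a quote, \\ for a backslash, \n for a newline.
    escapes = {'"': '"', "\\": "\\", "n": "\n"}
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\":
            if i + 1 >= len(body) or body[i + 1] not in escapes:
                raise ValueError("invalid escape sequence")
            out.append(escapes[body[i + 1]])
            i += 2
        elif ch == '"':
            raise ValueError("unescaped double quote inside literal")
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```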

Boolean functions may compare arithmetic values using the operators == (equal), != (not equal), <, <=, >, and >=, and may combine the results using ! (not), && (and), and || (or). Again, arbitrary nesting will be supported through the use of parentheses.

References to other attributes are written like this:

$(AttributeName)

or, when you would like to provide a default value for when an attribute value is missing, like this:

$(AttributeName=DefaultValue)
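Resolving such a reference against a node's attribute values could be sketched as follows. The function `resolve_reference`, the use of a plain dictionary for the attributes, and the restriction to numeric defaults are all assumptions made for this example.

```python
import re

# Matches $(Name) or $(Name=Default); the default part is optional.
REFERENCE_RE = re.compile(r"\$\(\s*([A-Za-z_]\w*)\s*(?:=\s*([^)]*?)\s*)?\)")


def resolve_reference(reference, attributes):
    """Look up an attribute reference, falling back to its default if given."""
    match = REFERENCE_RE.fullmatch(reference)
    if match is None:
        raise ValueError(f"not an attribute reference: {reference!r}")
    name, default = match.group(1), match.group(2)
    if attributes.get(name) is not None:
        return attributes[name]
    if default is not None:
        return float(default)  # assumption: defaults are numeric
    raise KeyError(f"attribute {name!r} is missing and has no default")
```

With an empty attribute dictionary, `resolve_reference("$(maybeMissing=-10.2)", {})` falls back to the default, while `resolve_reference("$(someValue)", {})` raises an error because no default was given.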

Type errors and arithmetic errors are reported in the field that contains the equation, for example if the equation attempts to add a string and an integer, or if it references a non-existent or empty attribute without providing a default.

Comments

  • Add comment here…

How to Comment

Edit the page and add your comments under the provided header. By adding your ideas to the Wiki directly, we can more easily organize everyone's ideas, and keep clear records. Be sure to include today's date and your name for each comment. Try to keep your comments as concrete and constructive as possible. For example, if you find a part of the RFC makes no sense, please say so, but don't stop there. Take the extra step and propose alternatives.

Cytoscape_2.8/EquationAttributes (last edited 2010-05-19 15:45:08 by malbec)

Funding for Cytoscape is provided by a federal grant from the U.S. National Institute of General Medical Sciences (NIGMS) of the National Institutes of Health (NIH) under award number GM070743-01. Corporate funding is provided through a contract from Unilever PLC.
