RFC65: Equation Attributes

Editor(s): Johannes Ruscheinski

Date: 2009-12-22

Status: Open for comment

Proposal

Add the ability to Cytoscape to describe attributes as being derived or computed from an expression that may involve references to other attributes.

Background

This capability would seem to be a natural extension of the attribute browser. Especially for numerical-valued attributes, it seems obvious that Cytoscape users might want to derive new quantities based on the values of other attributes. A similar argument holds for boolean attributes.

Use Cases

  • Allows users to easily colour a node based on a threshold value. Example: Given a Double attribute "someValue" and a threshold called "someThreshold" we could create a new boolean attribute called "exceedsThreshold" which would be defined by an equation attribute that is "$(someValue) > someThreshold".

  • Assume we have an attribute called "someAttrib" and we would like to assign colour values based on the log of this attribute's value. We could then create a new Double attribute "shiftedLog" defined by "log10($(someAttrib)) + 22.7"
  • Consider an attribute called "maybeMissing" that has missing values, and assume we would like to provide a derived attribute that has a default value of -10.2 whenever "maybeMissing" is not set. We could then create a new attribute called "neverMissing" defined by "$(maybeMissing=-10.2)"

Many more complex examples can easily be constructed.
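To make the three use cases concrete, the following sketch computes the same results in plain Java. It is only an illustration of the intended semantics: the class name UseCaseSketch, the map-based attribute store and the treatment of "someThreshold" as a plain constant are assumptions made here and are not part of Cytoscape.

```java
import java.util.Map;

/** Hypothetical illustration of the three use cases; not Cytoscape code. */
final class UseCaseSketch {
    /** Mirrors "$(someValue) > someThreshold", with someThreshold assumed to be a constant. */
    static boolean exceedsThreshold(Map<String, Double> attribs, double someThreshold) {
        return require(attribs, "someValue") > someThreshold;
    }

    /** Mirrors "log10($(someAttrib)) + 22.7". */
    static double shiftedLog(Map<String, Double> attribs) {
        return Math.log10(require(attribs, "someAttrib")) + 22.7;
    }

    /** Mirrors "$(maybeMissing=-10.2)": the default is used when no value is set. */
    static double neverMissing(Map<String, Double> attribs) {
        Double value = attribs.get("maybeMissing");
        return value != null ? value : -10.2;
    }

    /** An unconditional reference fails when the attribute has no value. */
    private static double require(Map<String, Double> attribs, String name) {
        Double value = attribs.get(name);
        if (value == null) {
            throw new IllegalArgumentException("attribute \"" + name + "\" has no value");
        }
        return value;
    }
}
```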

Implementation Plan

  1. Define a syntax for the expressions (Cull this as a subset from some programming language's grammar?)
  2. Write a parser (LL(1) grammar, recursive descent (http://en.wikipedia.org/wiki/Recursive_descent_parser)) implementing the syntax and error handling (Pay special attention to easy extension with more built-in functions.)

  3. Test and debug the stand-alone parser
  4. Identify the location in the Cytoscape code base for integration and integrate the parser
  5. Test and debug the new capability within Cytoscape
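As a rough idea of what step 2 could look like, here is a minimal recursive-descent sketch for a tiny arithmetic subset (numbers, parentheses, unary signs and the five arithmetic operators). The grammar, the class name ArithmeticSketch and the decision to evaluate directly instead of building a reusable parse result are assumptions made for illustration; the real syntax is still to be defined in step 1.

```java
/** Hypothetical recursive-descent evaluator for a small arithmetic subset. */
final class ArithmeticSketch {
    private final String src;
    private int pos = 0;

    private ArithmeticSketch(String src) { this.src = src; }

    static double evaluate(String text) {
        ArithmeticSketch p = new ArithmeticSketch(text);
        double result = p.expr();
        p.skipSpaces();
        if (p.pos < p.src.length()) {
            throw new IllegalArgumentException("unexpected input at offset " + p.pos);
        }
        return result;
    }

    // expr := term (('+' | '-') term)*      (left-associative)
    private double expr() {
        double value = term();
        while (true) {
            if (accept('+')) value += term();
            else if (accept('-')) value -= term();
            else return value;
        }
    }

    // term := power (('*' | '/') power)*    (left-associative)
    private double term() {
        double value = power();
        while (true) {
            if (accept('*')) value *= power();
            else if (accept('/')) value /= power();
            else return value;
        }
    }

    // power := unary ('^' power)?           (right-associative)
    private double power() {
        double base = unary();
        return accept('^') ? Math.pow(base, power()) : base;
    }

    // unary := ('+' | '-')? primary         (binds tighter than '^')
    private double unary() {
        if (accept('-')) return -primary();
        accept('+'); // an optional unary plus changes nothing
        return primary();
    }

    // primary := NUMBER | '(' expr ')'
    private double primary() {
        if (accept('(')) {
            double value = expr();
            if (!accept(')')) throw new IllegalArgumentException("missing ')' at offset " + pos);
            return value;
        }
        skipSpaces();
        int start = pos;
        while (pos < src.length() && (Character.isDigit(src.charAt(pos)) || src.charAt(pos) == '.')) pos++;
        if (start == pos) throw new IllegalArgumentException("number expected at offset " + pos);
        return Double.parseDouble(src.substring(start, pos));
    }

    // Consumes the character c if it is next (one character of lookahead).
    private boolean accept(char c) {
        skipSpaces();
        if (pos < src.length() && src.charAt(pos) == c) { pos++; return true; }
        return false;
    }

    private void skipSpaces() {
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) pos++;
    }
}
```

Each grammar rule maps to one method, so adding a rule for function calls or attribute references would mean adding one more alternative in primary().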

Project Timeline

  1. Define syntax (est. 1/2 day to 1 day)
  2. Implement and test stand-alone parser (3 to 4 days)
  3. Integrate the parser into the Cytoscape code base (3 to 4 days)

Total time: 6 1/2 to 9 days.

Tasks and Milestones

  1. Define syntax
    1. Cull a subset of the "C" syntax that corresponds to what we want to implement
    2. Potentially rearrange parts of the syntax to make it correspond to our precedence requirements
    3. Check for violations of LL(1) and transform to LL(1), if necessary
  2. Implement and test stand-alone parser
    1. Write a scanner/tokenizer (GetToken/UngetToken)

    2. Convert the syntax into a series of recursive function calls
    3. Add type checking and error handling
    4. Create unit tests
    5. Implement the built-in functions (Possibly add the capability to register Java function objects as built-in functions.)
    6. Test and debug
  3. Integrate the parser into the Cytoscape code base
    1. Find where in the Cytoscape code base would be the natural location for integration
    2. Actually integrate the new code
    3. Create unit tests for the integrated code
    4. Test the new code as part of Cytoscape
    5. Manually test and debug the new code and possibly extend the unit tests
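Task 2.1 mentions a scanner with GetToken/UngetToken. A possible shape for it is sketched below; the class name TokenizerSketch, the token kinds and the exact character classes are assumptions chosen for illustration, and string literals are left out to keep the sketch short.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical scanner with token pushback; token kinds are illustrative only. */
final class TokenizerSketch {
    enum Kind { NUMBER, IDENTIFIER, SYMBOL, EOS }

    static final class Token {
        final Kind kind;
        final String text;
        Token(Kind kind, String text) { this.kind = kind; this.text = text; }
    }

    private final String src;
    private int pos = 0;
    private final Deque<Token> pushedBack = new ArrayDeque<>();

    TokenizerSketch(String src) { this.src = src; }

    /** Returns the next token, preferring tokens handed back via ungetToken. */
    Token getToken() {
        if (!pushedBack.isEmpty()) return pushedBack.pop();
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) pos++;
        if (pos >= src.length()) return new Token(Kind.EOS, "");
        int start = pos;
        char c = src.charAt(pos);
        if (Character.isDigit(c)) {
            while (pos < src.length() && (Character.isDigit(src.charAt(pos)) || src.charAt(pos) == '.')) pos++;
            return new Token(Kind.NUMBER, src.substring(start, pos));
        }
        if (Character.isLetter(c)) {
            while (pos < src.length() && Character.isLetterOrDigit(src.charAt(pos))) pos++;
            return new Token(Kind.IDENTIFIER, src.substring(start, pos));
        }
        pos++;
        // Two-character comparison operators: ==, !=, <=, >=
        if (pos < src.length() && src.charAt(pos) == '=' && "=!<>".indexOf(c) >= 0) pos++;
        return new Token(Kind.SYMBOL, src.substring(start, pos));
    }

    /** Hands a token back so that the next getToken call returns it again. */
    void ungetToken(Token token) { pushedBack.push(token); }
}
```

The pushback gives the parser its single token of lookahead: it can read a token, decide that the current grammar rule does not apply, and return the token untouched.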

Project Dependencies

This project does not depend on any other projects.

Informal Description of Equation Syntax and Functions

Attribute References

References to existing, possibly empty attributes can take two forms:

  1. Unconditional references look like $(attribName). This type of reference results in an error should the attribute "attribName" not exist or if it exists but has no assigned value.

  2. Conditional references look like $(attribName=defaultValue). This type of reference results in an error should the attribute "attribName" not exist or if the type of "defaultValue" is inconsistent with the type of "attribName". If attribName exists but is empty, the default value "defaultValue" will be substituted instead.

Initially only references to Double, Int, String and Boolean attributes will be allowed. In the case of Boolean attribute references the two predefined constants true and false will be the only possible default values.
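The two reference forms could be resolved along the following lines. This is a sketch under stated assumptions: the class ReferenceSketch and the two maps (declared types and current values) are hypothetical stand-ins for whatever attribute store Cytoscape provides, and a null default models an unconditional reference.

```java
import java.util.Map;

/** Hypothetical resolver for $(attribName) and $(attribName=defaultValue). */
final class ReferenceSketch {
    /**
     * @param declaredTypes attribute name to declared type; a missing key means the attribute does not exist
     * @param values        attribute name to current value; a missing key means the attribute is empty
     * @param defaultValue  null for an unconditional reference
     */
    static Object resolve(Map<String, Class<?>> declaredTypes, Map<String, Object> values,
                          String name, Object defaultValue) {
        Class<?> type = declaredTypes.get(name);
        if (type == null) {
            throw new IllegalArgumentException("attribute \"" + name + "\" does not exist");
        }
        if (defaultValue != null && !type.isInstance(defaultValue)) {
            throw new IllegalArgumentException(
                "default value for \"" + name + "\" is not of type " + type.getSimpleName());
        }
        Object value = values.get(name);
        if (value != null) return value;
        if (defaultValue == null) {
            throw new IllegalArgumentException("attribute \"" + name + "\" has no assigned value");
        }
        return defaultValue;
    }
}
```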

Expressions

Expressions are composed of constants, operators, function calls and attribute references. Their syntax follows that of various programming languages and everyday expectations for arithmetic expressions. The result type of an expression can be either a Double or a Boolean value. Grouping and nesting of subexpressions may be accomplished by using parentheses.

Built-In Functions

|| Function name || Description || Argument Type(s) || Return Type || Comments ||
|| abs || absolute value || Double/Int || Double || ||
|| log10 || logarithm to base 10 || Double/Int || Double || ||
|| exp || e raised to a power || Double/Int || Double || ||
|| round || nearest integer || Double/Int || Double || ||
|| trunk || returns the integer part of a number, e.g. trunk(-1.3) = trunk(-1.9) = -1 || Double/Int || Double || ||
|| tolower || convert a string to lowercase || String || String || ||
|| toupper || convert a string to uppercase || String || String || ||
|| substring || returns the substring of "string" starting at 0-based offset "startIndex" and of length "length" || (string: String, startIndex: Int, length: Int) || String || This function is robust in that, if "startIndex" is beyond the end of "string", an empty string will be returned, and if "length" would extend beyond the end of "string", as much as possible of "string" will be returned. On the other hand, a negative "startIndex" will result in an error. ||
|| format || convert an arithmetic value or expression to a string || (arithmeticExpression: Double/Int, width: Int, precision: Int [, padding: String [, type: (E|NE)]]) || String || "padding" and "type" are optional and default to " " for padding and NE for type. Padding must be a string literal of length 1, and type can be either E (for exponential) or NE (for non-exponential). Padding will be used to extend a result if its width is less than the specified width. Use a width of 0 if you do not want any padding. Precision specifies the number of significant digits in the generated string. When using the E format, generated strings will look like "2.3e5" rather than "23000". ||
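The behaviour described for "trunk" and "substring" can be sketched in Java as follows. The class name BuiltInSketch is hypothetical, and rejecting a negative "length" is an assumption made here, since the table only specifies the error for a negative "startIndex".

```java
/** Hypothetical implementations of two of the proposed built-in functions. */
final class BuiltInSketch {
    /** Integer part of x, rounding toward zero, so trunk(-1.9) is -1. */
    static double trunk(double x) {
        return x < 0 ? Math.ceil(x) : Math.floor(x);
    }

    /** Lenient substring: clips at the end of the string instead of failing. */
    static String substring(String string, int startIndex, int length) {
        if (startIndex < 0) {
            throw new IllegalArgumentException("startIndex must not be negative");
        }
        if (length < 0) {
            // Assumption: the proposal does not say what a negative length means.
            throw new IllegalArgumentException("length must not be negative");
        }
        if (startIndex >= string.length()) return "";
        // Compute the end offset in long arithmetic so a huge length cannot overflow.
        int end = (int) Math.min((long) startIndex + length, string.length());
        return string.substring(startIndex, end);
    }
}
```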

Operators

The usual complement of arithmetic, comparison and boolean operators will be provided. Operator precedence will probably follow the C/Java/FORTRAN conventions. Arithmetic operator precedence will follow that of common mathematics, i.e. unary plus and minus have the highest precedence, followed by exponentiation and then the other four arithmetic operators. Comparison operators will have intermediate precedence and boolean operators the lowest precedence.

|| Symbol || Description || Operand Type(s) || Result Type || Associativity || Comments ||
|| "-" || unary minus || Double/Int || Double || non-associative || ||
|| "+" || unary plus || Double/Int || Double || non-associative || ||
|| "+" || addition || Double/Int, Double/Int || Double || left || may result in -Inf or +Inf ||
|| "-" || subtraction || Double/Int, Double/Int || Double || left || may result in -Inf or +Inf ||
|| "*" || multiplication || Double/Int, Double/Int || Double || left || may result in NaN, +Inf, or -Inf ||
|| "/" || division || Double/Int, Double/Int || Double || left || may result in NaN, +Inf, or -Inf ||
|| "^" || exponentiation || Double/Int, Double/Int || Double || right || may result in NaN, +Inf, or -Inf ||
|| "==" || equality || Double/Int, Double/Int || Boolean || non-associative || ||
|| "==" || equality || String, String || Boolean || non-associative || ||
|| "!=" || inequality || Double/Int, Double/Int || Boolean || non-associative || ||
|| "!=" || inequality || String, String || Boolean || non-associative || ||
|| "not" || logical inversion || Boolean || Boolean || non-associative || ||
|| "and" || logical conjunction || Boolean, Boolean || Boolean || non-associative || ||
|| "or" || logical disjunction || Boolean, Boolean || Boolean || non-associative || ||
|| "<" || less than || Double/Int, Double/Int || Boolean || non-associative || ||
|| ">" || greater than || Double/Int, Double/Int || Boolean || non-associative || ||
|| "<=" || less than or equal || Double/Int, Double/Int || Boolean || non-associative || ||
|| ">=" || greater than or equal || Double/Int, Double/Int || Boolean || non-associative || ||
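The Inf and NaN results noted in the table fall out naturally if the binary arithmetic operators are evaluated with Java double arithmetic, as in the sketch below. The class OperatorSketch and its apply method are hypothetical; mapping "^" to Math.pow is an assumption about how exponentiation might be implemented.

```java
/** Hypothetical evaluation of the binary arithmetic operators on doubles. */
final class OperatorSketch {
    /** Int operands are widened to double by the caller, so the result is always a double. */
    static double apply(char op, double left, double right) {
        switch (op) {
            case '+': return left + right;
            case '-': return left - right;
            case '*': return left * right;
            case '/': return left / right;          // no exception: yields Inf or NaN
            case '^': return Math.pow(left, right); // assumed mapping for exponentiation
            default: throw new IllegalArgumentException("unknown operator: " + op);
        }
    }
}
```

With this sketch, apply('/', 1, 0) yields +Inf and apply('/', 0, 0) yields NaN instead of throwing. Right associativity of "^" means that 2 ^ 3 ^ 2 groups as apply('^', 2, apply('^', 3, 2)), which is 512.0, whereas left grouping would give 64.0.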

String Constants a.k.a. String Literals

String literals are sequences of characters enclosed in double-quotes. Double-quotes, newlines or backslashes may be embedded in a string literal by preceding them with a backslash. Arbitrary Unicode characters may be embedded in a string with a \uXXXX escape sequence where each X stands for a hexadecimal digit. This allows for the portable embedding of characters from foreign scripts or rarely used English characters with diacritical marks, ligatures, etc.
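A decoder for the escape rules above might look like the following sketch. The class StringLiteralSketch is hypothetical, and two points are assumptions: "preceding a newline with a backslash" is read literally (a backslash followed by an actual newline character), and any other character after a backslash is rejected.

```java
/** Hypothetical decoder for the body of a string literal (without the enclosing quotes). */
final class StringLiteralSketch {
    static String unescape(String body) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < body.length(); i++) {
            char c = body.charAt(i);
            if (c != '\\') {
                out.append(c);
                continue;
            }
            if (++i >= body.length()) {
                throw new IllegalArgumentException("dangling backslash at end of literal");
            }
            char escaped = body.charAt(i);
            if (escaped == 'u') {
                // Exactly four hexadecimal digits must follow the letter u.
                if (i + 4 >= body.length()) {
                    throw new IllegalArgumentException("incomplete Unicode escape");
                }
                int code = 0;
                for (int k = 1; k <= 4; k++) {
                    int digit = Character.digit(body.charAt(i + k), 16);
                    if (digit < 0) throw new IllegalArgumentException("invalid hex digit in Unicode escape");
                    code = code * 16 + digit;
                }
                out.append((char) code);
                i += 4;
            } else if (escaped == '"' || escaped == '\\' || escaped == '\n') {
                out.append(escaped); // the escaped character stands for itself
            } else {
                // Assumption: escapes not listed in the proposal are treated as errors.
                throw new IllegalArgumentException("unknown escape: \\" + escaped);
            }
        }
        return out.toString();
    }
}
```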

Error Reporting

Type errors and arithmetic errors are reported in the field that contains the equation, for example if an equation attempts to add a string and an integer, or if it references a non-existent or empty attribute without providing a default. (Note: This may need some more thought; for example, we may consider providing an empty value in case of an error.)

Java API

The Java interface will be defined by the following class:

class EquationEvaluator extends CyAttribute {
      /** Parses the equation specified by "equation" and generates a "skeleton" which can later be evaluated by specifying current values for attribute references.
        * @param equation the text of the equation to parse
        * @throws IllegalArgumentException if the basic syntax of "equation" is invalid.
        */
      public EquationEvaluator(final String equation) throws IllegalArgumentException;

      /** Evaluates the equation passed into the constructor by filling in the attribute references it contains.
        * @param currentAttribs the attributes that supply the current values for the references
        * @return a CyAttribute resulting from the equation evaluated with the values provided by "currentAttribs".
        * @throws IllegalArgumentException if referenced attributes are missing, if attributes that are required to be non-empty are empty, or if the types of
        * referenced attributes are incompatible with the context within which they are used.
        * @throws ArithmeticException if an arithmetic error occurred during the evaluation of the equation specified as the constructor argument.
        */
      CyAttribute eval(final CyAttribute[] currentAttribs) throws IllegalArgumentException, ArithmeticException;
}

Comments

  • Add comment here…

How to Comment

Edit the page and add your comments under the provided header. By adding your ideas to the Wiki directly, we can more easily organize everyone's ideas, and keep clear records. Be sure to include today's date and your name for each comment. Try to keep your comments as concrete and constructive as possible. For example, if you find a part of the RFC makes no sense, please say so, but don't stop there. Take the extra step and propose alternatives.

Cytoscape_2.8/EquationAttributes (last edited 2010-05-19 15:45:08 by malbec)

Funding for Cytoscape is provided by a federal grant from the U.S. National Institute of General Medical Sciences (NIGMS) of the National Institutes of Health (NIH) under award number GM070743-01. Corporate funding is provided through a contract from Unilever PLC.
