EthanCerami, 10/1/2005 - While we are at it, perhaps we should consider creating a class called CyAttributeNames that would contain a list of recommended / officially supported attribute names. This would be similar to a controlled vocabulary list, but would not actually be enforced by CyAttributes. For example, CyAttributeNames would include constants for attributes such as COMMON_NAME, CANONICAL_NAME, SPECIES, COMMENT, DATABASE_SOURCE, SEQUENCE, etc. If we all use these recommended terms, it's a bit easier to share data between plugins, and we can also create more meaningful attribute browsers. We can then modify the javadocs for each of the setters in CyAttributes to say something like: 'You are free to use any attribute name you like. However, for maximal benefit, consider using one of the recommended attribute names in CyAttributeNames.' I think we have talked about doing this for a really long time, and now is probably the time to do it.
Here is a proposed first draft of CyAttributeNames:
package cytoscape.data;

/**
 * Comments go here...
 */
public class CyAttributeNames {
    public static final String CANONICAL_NAME = "canonicalName";
    public static final String COMMON_NAME = "commonName";
    public static final String ORGANISM_SPECIES_NAME = "organism_species";
    public static final String ORGANISM_COMMON_NAME = "organism_common_name";
    public static final String ORGANISM_NCBI_TAXONOMY_ID =
            "organism_ncbi_taxonomy_id";
    public static final String SEQUENCE = "sequence";
    public static final String MOLECULE_TYPE = "molecule_type";
    public static final String GO_MOLECULAR_FUNCTION = "molecular_function";
    public static final String GO_BIOLOGICAL_PROCESS = "biological_process";
    public static final String GO_CELLULAR_COMPONENT = "cellular_component";
    public static final String COMMENT = "comment";
    public static final String DESCRIPTION = "description";
    public static final String PUB_MED_ID = "pub_med_id";
}
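To make the "recommended but not enforced" idea concrete, here is a minimal sketch of how a plugin might use such constants. AttributeStore is a hypothetical stand-in for CyAttributes (it is not the real API), the node id and values are made up, and only two of the draft constants are repeated so that the example compiles on its own.

```java
import java.util.HashMap;
import java.util.Map;

// Trimmed copy of the draft constants, repeated so this sketch is self-contained.
final class CyAttributeNames {
    static final String COMMON_NAME = "commonName";
    static final String SEQUENCE = "sequence";

    private CyAttributeNames() {
        // constants holder, not meant to be instantiated
    }
}

// Hypothetical stand-in for CyAttributes: node id -> (attribute name -> value).
final class AttributeStore {
    private final Map<String, Map<String, String>> data = new HashMap<>();

    void setAttribute(String id, String attributeName, String value) {
        data.computeIfAbsent(id, k -> new HashMap<>()).put(attributeName, value);
    }

    String getAttribute(String id, String attributeName) {
        Map<String, String> attrs = data.get(id);
        return attrs == null ? null : attrs.get(attributeName);
    }
}

final class AttributeNamesDemo {
    public static void main(String[] args) {
        AttributeStore store = new AttributeStore();

        // Recommended name: other plugins can find this value without guessing the key.
        store.setAttribute("node1", CyAttributeNames.COMMON_NAME, "Example Gene");

        // Any other name is still accepted; nothing is enforced.
        store.setAttribute("node1", "myPluginScore", "0.87");

        System.out.println(store.getAttribute("node1", CyAttributeNames.COMMON_NAME));
        System.out.println(store.getAttribute("node1", "myPluginScore"));
    }
}
```

The store accepts both calls in the same way; the only benefit of the constant is that two plugins writing a common name end up using the same key.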
IlianaAvila - 10/3/05 - We have cytoscape.data.Semantics, which contains static names like CANONICAL_NAME, COMMON_NAME, INTERACTION, etc. Would we deprecate it?
GaryBader - 10/4/05 - I think most of these semantics should be moved into semantic plugin packages so that people can swap in the PSI-MI semantics for the BioPAX semantics for the xyz semantics, etc. We would need to better organize these attribute names to avoid generating a huge flat list in CyAttributeNames. Maybe some basic attribute names should be available. As Iliana mentioned, these are in data.Semantics - we should re-evaluate that class to see if it's accessible enough for users.
EthanCerami - 10/4/05 - At yesterday's meeting, Iliana agreed to look into data.Semantics some more, and see if we could just re-use it as much as possible. Two things to consider, however: 1) data.Semantics has a bunch of String constants, but some of them (e.g. COMMON_NAME) are for attribute names, and some of them (e.g. DNA) are for attribute values. So, we can't just modify CyAttributes to say something like, 'go check out data.Semantics for a list of possible attribute names', because this is not true per se; and 2) data.Semantics is overloaded with lots of functionality, including attribute names, attribute values, biodata server, species query methods, node alias methods, synonym methods, etc. I think that confuses the issue, and it would be much clearer if we had one class or one set of classes that held attribute names only.
IlianaAvila - 10/4/05 -- Here are my first impressions of cytoscape.data.Semantics:
WHAT IS cytoscape.data.Semantics?
1. Container for static variables CANONICAL_NAME, COMMON_NAME, ALIASES, SPECIES and INTERACTION (among questionable others, like "DNA").
2. Container for 9 static methods whose function is mostly related to one of these: synonym handling or species handling. Most of these methods make use of BioDataServer and the (soon to be removed) GraphObjAttributes. Only 4 of these 9 methods are used in the core (5 classes use them). Some of the methods are dumb, like public static String getInteractionType(cytoscape.CyNetwork network, Edge e), which simply calls a single GraphObjAttributes method.
WHAT SHOULD WE DO WITH cytoscape.data.Semantics for 2.3?
1. CANONICAL_NAME, COMMON_NAME, ALIASES, SPECIES and INTERACTION are used in several places in the core. I personally use them too in my plugins, so I am guessing other plugins use them too. Therefore, these static variables need to be present in the core, either in cytoscape.data.Semantics or in the class that replaces cytoscape.data.Semantics (in which case, they would be deprecated in Semantics).
(BTW, I am not sure I like "SPECIES", since we are supposed to be biological-semantics free. Maybe we should consider replacing "SPECIES" with some other name, like "CATEGORY", "CLASS", or something like that, although these last ones could also replace "MOLECULE_TYPE". In short, I would like to see biological vocabulary eliminated.)
2. The BioDataServer is a whole different beast that we decided to think about for 2.4. We need to decide how to handle synonyms and species for 2.4. So I suggest that we leave cytoscape.data.Semantics as it is (don't deprecate the whole class yet) until 2.4.
3. I think that the static variables should be in a separate class from the methods for synonyms and species handling. So, for 2.3, if we decide to introduce the new class CyAttributeNames, we would need to deprecate the static variables in cytoscape.data.Semantics and direct the user to the new class which would be located in the new package cytoscape.data.attr.
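A rough sketch of what point 3 could look like, assuming the old constants simply forward to the new class so existing plugins keep compiling. Both classes here are simplified stand-ins (no packages, only two constants each), not the actual Cytoscape sources.

```java
// Stand-in for the proposed new class (the thread suggests the package cytoscape.data.attr).
final class CyAttributeNames {
    static final String CANONICAL_NAME = "canonicalName";
    static final String COMMON_NAME = "commonName";

    private CyAttributeNames() {
        // constants holder, not meant to be instantiated
    }
}

// Stand-in for cytoscape.data.Semantics: the constants stay, but are deprecated
// and forward to the new class, so there is a single source of truth.
final class Semantics {
    /** @deprecated Use {@link CyAttributeNames#CANONICAL_NAME} instead. */
    @Deprecated
    static final String CANONICAL_NAME = CyAttributeNames.CANONICAL_NAME;

    /** @deprecated Use {@link CyAttributeNames#COMMON_NAME} instead. */
    @Deprecated
    static final String COMMON_NAME = CyAttributeNames.COMMON_NAME;

    private Semantics() {
        // the synonym and species methods would remain here until they are revisited
    }
}

final class DeprecationDemo {
    @SuppressWarnings("deprecation")
    public static void main(String[] args) {
        // Old and new call sites resolve to the same attribute name.
        System.out.println(Semantics.CANONICAL_NAME.equals(CyAttributeNames.CANONICAL_NAME));
    }
}
```

Code that still references Semantics.CANONICAL_NAME keeps working and gets a compiler deprecation warning pointing at the new class.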
James McIninch - 10/6/05 -- A few things... First, I would say that the namespace for things associated with attributes ought to be separate. Further, there is a distinction between a controlled vocabulary (CV) for keys and one for values that should be recognized. I'd almost go so far as to say that the constants should be typesafe enumerations.
I would prefer cytoscape.node.attribute.Key.NCBI_TAXON_ID and cytoscape.node.attribute.Value.HOMO_SAPIENS as the form for namespaces associated with these sorts of constants/controlled vocabularies. In the ideal case, these things would be externalized so that they could be localized.
Also, I'd like a little more care given to the conventions for these identifiers. We are already mixing 'canonicalName', Java-style, with 'molecular_function', C-style key names. And 'organism_ncbi_taxon_id' seems a little redundant when 'ncbi_taxon_id' would do, yet 'molecular_function' is too generic if you really mean 'go_molecular_function'. I'd like to see both a little more rationale behind the naming and documentation that conveys that rationale to later developers. If we must, maybe we should use QNames in place of Strings.
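For illustration only, the typesafe-enumeration idea with separate vocabularies for keys and values might look roughly like the following. The enum members, the stored strings, and the TypedAttributes holder are all assumptions made for this sketch; nothing here is an agreed naming scheme or an existing Cytoscape class.

```java
import java.util.EnumMap;
import java.util.Map;

// Controlled vocabulary for attribute keys; each constant carries its stored name.
enum Key {
    NCBI_TAXON_ID("ncbi_taxon_id"),
    SPECIES("species"),
    MOLECULE_TYPE("molecule_type");

    private final String attributeName;

    Key(String attributeName) {
        this.attributeName = attributeName;
    }

    String attributeName() {
        return attributeName;
    }
}

// Separate controlled vocabulary for attribute values.
enum Value {
    HOMO_SAPIENS("Homo sapiens"),
    DNA("DNA");

    private final String label;

    Value(String label) {
        this.label = label;
    }

    String label() {
        return label;
    }
}

// Hypothetical holder: the compiler rejects a Value where a Key is expected, and vice versa.
final class TypedAttributes {
    private final Map<Key, Value> values = new EnumMap<>(Key.class);

    void set(Key key, Value value) {
        values.put(key, value);
    }

    Value get(Key key) {
        return values.get(key);
    }
}

final class TypedAttributesDemo {
    public static void main(String[] args) {
        TypedAttributes attrs = new TypedAttributes();
        attrs.set(Key.SPECIES, Value.HOMO_SAPIENS);
        attrs.set(Key.MOLECULE_TYPE, Value.DNA);

        System.out.println(Key.SPECIES.attributeName() + " = " + attrs.get(Key.SPECIES).label());
        // attrs.set(Value.DNA, Key.SPECIES); // would not compile: keys and values cannot be swapped
    }
}
```

Compared with plain String constants, this keeps names and values from being mixed up, at the cost of making free-form attribute names harder to support.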
RowanChristmas 10/06/05 -- I would also like to see canonicalName go away and get replaced with species_systematic_name.
GaryBader - Oct.12.2005 - This is a very large issue and I don't think we have the resources to deal with it rigorously right now without delaying the next release. I think we should table this and deal with it for the next version (2.3). This means leave the Semantics class as is and don't add any new attribute name classes now.
EthanCerami - October 13, 2005 -- The Cytoscape developers group voted to accept the CyAttributes API, but to defer the issue of attribute names until a later release.