Tutorial 5: Using the Agilent Literature Search plugin

In cases where there are few measured interactions, text mining can be a useful mechanism for inferring network data. The Agilent Literature Search plugin for Cytoscape provides a flexible, interactive platform for mining text and assessing the results in a network context. This tutorial explores the use of this plugin.

The plugin searches public literature repositories such as !PubMed for articles matching user-specified queries, and then builds a network based on putative associations suggested in the text of the articles. Putative associations are sentences that contain two or more gene or protein names together with a verb that suggests interaction, such as "catalyzes", "is repressed by", or "regulates". For a description of the algorithm, see [[http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=15608051&query_hl=1&itool=pubmed_docsum|this article]].

Assuming you have completed the basic Cytoscape tutorials, in this tutorial you will:
 * Learn how to apply the Agilent Literature Search plugin to build a network of putative molecular associations from a set of literature search terms.
 * Explore the associations generated by the plugin, and learn how to remove any that you judge to be in error.
 * Learn how to refine your search by using context information.

This tutorial and accompanying lectures were delivered at [[http://www.csc.fi/suomi/info/index.phtml.en|CSC]], the Finnish IT center for science. The lecture slides of background material and an accompanying video presentation are available courtesy of the CSC at [[http://www.csc.fi/english/research/sciences/bioscience/Courses_and_events/cytoscape/index_html]].

'''Important:''' Public literature repositories such as !PubMed are always changing. The illustrations shown here are based on the public literature databases as they existed when this document was written.
At another time, after more papers have been published, the composition of the databases will change, and the exact search results will change. Consequently, ''your own search results will probably not look exactly like the ones shown here.''

If you have not already done so, download and install Cytoscape on your local computer, following the instructions given in the Cytoscape manual. Download and install the Agilent Literature Search plugin, as follows:
 1. Go to the [[http://www.cytoscape.org/download_agilent_literature_search_v2.4.php?file=litsearch_v2.4|plugin license page]].
 2. Read the license agreement and fill out the short form below, clicking the checkbox to accept the terms of the agreement.
 3. Click on the Proceed to Download button, proceeding from there to the plugin download page.
 4. Select the installer for your platform (Windows, Mac OS X, or Linux), click on the appropriate download link, and follow the installation instructions provided via the adjacent View link.
 5. If you are currently running Cytoscape, exit.

== Basic operation ==
 1. Start up Cytoscape. Under the Plugins menu, select Agilent Literature Search. The Agilent Literature Search Agreement will appear. Check the "Don't show this dialog again" box and click the Accept button.
 2. The following window will appear.
 {{attachment:small_proxy.png}}
 If you are using a proxy server to access the Internet, click Yes and enter your proxy settings. Otherwise, click No.
 3. The following window should appear.
 {{attachment:small_litsearch.png}}
 4. In the Terms window, enter P53. The term "P53" should appear in the Query Editor, and the forward arrow just below should turn blue to indicate it is available. Click on the forward arrow to begin searching.
 5. After a brief interval, the search results should appear in two places:
   i. Under Query Matches, there should appear a numbered list of articles labeled Results, as shown below.
   A slider at the right side of the window allows you to scroll through the list of selected articles. Each article should be listed along with a URL, and a hyperlink for jumping directly to that URL. Since the number of matches was set to ten, up to ten articles will be displayed. In general, these are the most recent articles that match your search terms. If you bring up a web browser and search !PubMed on the same search term (P53) you should see exactly the same matches.
   {{attachment:small_query.png}}
   i. A network should appear on the Cytoscape canvas, showing interactions inferred from sentences in the selected articles.
   {{attachment:small_network.png}}

In this case, the canvas shows tp53 connected directly to eight nodes, with another four connected to yy1 but not tp53. How did these other four nodes and their interactions get into a network generated by a search on P53? This is a consequence of the two-step process used by the Literature Search plugin:
 * First, it selects articles according to your query terms. In general, this retrieves articles based on comparing the search term to the abstract and keywords.
 * Second, it scans the complete text of the articles for sentences describing putative interactions between genes or proteins. It then adds all putative interactions to the network, whether or not they involve any of the search terms. Why? If a putative interaction appears in an article that relates to the search terms, then the interaction is expected to relate to those terms in some way.
 * In consequence, when you perform a literature search, you will typically get a network with many genes or proteins that you did not search for specifically. In some cases, the genes or proteins that you did search for might not appear in your network. This is not a bug or a usage error: it simply reflects which putative interactions were found in the articles retrieved. You can usually avoid this by increasing the number of matches.
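
The second step of the process described above can be illustrated with a minimal Python sketch. This is not the plugin's actual algorithm: the tiny lexicons, the function name `putative_interactions`, and the example sentences are all invented for illustration, and the matching is deliberately naive. It only shows why interactions between genes you did not search for can end up in the network.

```python
import re
from itertools import combinations

# Hypothetical toy lexicons (assumption): the plugin's real concept and
# interaction lexicons are far larger and are not reproduced here.
GENE_NAMES = {"tp53", "yy1", "mdm2", "sp1"}
INTERACTION_VERBS = ("regulates", "catalyzes", "is repressed by")


def putative_interactions(sentences):
    """Return sorted gene pairs from sentences that contain an
    interaction verb and two or more known gene names."""
    edges = set()
    for sentence in sentences:
        text = sentence.lower()
        # Skip sentences without any interaction verb.
        if not any(verb in text for verb in INTERACTION_VERBS):
            continue
        tokens = re.findall(r"[a-z0-9]+", text)
        genes = sorted({t for t in tokens if t in GENE_NAMES})
        # Every pair of genes named in the sentence becomes an edge;
        # a sentence with fewer than two genes yields no pairs.
        edges.update(combinations(genes, 2))
    return sorted(edges)


# Invented sentences standing in for the text of articles matched by "P53".
article_sentences = [
    "TP53 is repressed by MDM2 in unstressed cells.",
    "YY1 regulates SP1 at several promoters.",  # no search term in this one
    "TP53 was sequenced in all samples.",       # one gene, no interaction verb
]

edges = putative_interactions(article_sentences)
print(edges)
```

In this toy run the yy1/sp1 pair is added even though neither name was a search term, simply because the sentence occurs in a retrieved article. This mirrors the nodes in the network above that are connected to yy1 but not to tp53.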
== Validating, refining, and saving your search results ==
You can explore the sentences that were selected as evidence of interaction between these nodes, as follows:
 1. Go to your Cytoscape canvas and right-click on an edge. A menu should appear listing the interaction and a sub-menu labeled Evidence from Literature. This sub-menu should have four options, labeled Show Sentences from the Literature, Gather Evidence from the Literature, Extend Network from the Literature, and Highlight Search Terms.
 2. Select Show Sentences from the Literature. A window should appear, listing a number of sentences as shown:
 {{attachment:small_sentences.png}}
 3. Recall that each of these sentences is ''predicted'' to represent an interaction between these two proteins. What if you disagree with one of these predictions? Right-click on the sentence, and a menu should appear with the option Delete Sentence. Click on the option to delete the sentence, or click elsewhere to keep it.
 4. Delete all the sentences for these two nodes. When you are done, notice that the canvas has changed: the edge between these nodes has been removed. If this leaves the two nodes with no connections, the nodes are also deleted.

In addition to exploring individual sentences, you can explore the evidence from entire articles, as follows:
 1. Return to the Query Matches section of the Agilent Literature Search window.
 2. Right-click on the first match.
 3. A popup menu should appear with the option Delete Match. Click to remove the match to the first article, along with any interactions supported by that article only. After the match is deleted, any nodes whose edges were all supported only by that article will also be deleted.
 4. If the article has a small Cytoscape logo next to the title, the right-click menu will also show the option "Highlight Match". Selecting this option will highlight the matches derived from this article on the Cytoscape canvas.
 The nodes should turn yellow, and the edges between them should turn red.
 5. Under the File menu of the Agilent Literature Search window, you will see options labeled Load Search Results and Save Search Results. If you want to save a set of search results for later analysis, these options will allow you to do so.

== Refining your search ==
The Agilent Literature Search window offers a number of basic search controls, as described here.
 * There is a pull-down menu to select an organism ("Concept Lexicon").
 * There is a threshold on the maximum number of matches per search engine ("Max Engine Matches"). '''Out of courtesy to the public search engines, always try to use a low threshold!''' If you are experimenting with the use of the plugin, or have just started a new line of analysis, always start with a small number of matches and increase that gradually as needed.
 * Under Extraction Controls, there is a menu labeled Interaction Lexicon with a choice of ''limited'' and ''relaxed''. This controls the set of verbs that identify putative protein interaction sentences: ''limited'' selects a high-confidence set of verbs (such as "activate", "methylate" and "cleave"), while ''relaxed'' selects a more permissive set (including "join", "augment", and "induce"). Repeat your search on P53 with the Interaction Lexicon set to relaxed. How has the network changed? Can you identify any new edges? Compare the sentences associated with the old and the new edges.
 * Under the View menu of the Agilent Literature Search window, you will find the option Engine Selections. When you click on Engine Selections, you should see the following menu:
 {{attachment:small_engines.png}}
 Repeat your query with OMIM and USPTO selected in addition. How does the network change? Return to this menu, and turn off querying OMIM and USPTO for the moment.
 * Under the Search Controls section of the window, there is a button labeled Use Aliases.
 Click on this button, and in the Query Editor you should see your search term of "p53" change to "(p53 OR trp53 OR tp53)". This is a very useful option, because gene names have many aliases. The only time it is not valuable is when you believe that two aliases actually identify distinct macromolecules. In such cases, you can still edit your query in the Query Editor to remove any alias you do not wish to use.
   i. Repeat the search using aliases. How did your network change?
   i. In the Query Editor, modify the query so that it reads "(p53 OR tp53)" and repeat your search. How did the network change this time?
 * You can specify multiple search terms. In the Terms window, below P53, add the oncogenes BCLX and SRC. Note that each term should be on a separate line. Run the search by clicking on the blue forward arrow. Your query window should appear as follows:
 {{attachment:small_multiple.png}}
 When performing a complex query, you should see up to ten matches per search term returned, and a larger network on the canvas such as the one shown:
 {{attachment:small_complex.png}}
 * Specifying a context provides a valuable way to refine your search. This can yield a network that is more specific to your biological question, and potentially just as large.
   i. Set your list of terms to P53.
   i. In the Context window, enter "Cancer".
   i. Click on the Use Context button in the Search Controls section. Your search window should appear as follows:
   {{attachment:small_context.png}}
   Notice that in your query window, your query specifies P53 AND Cancer. In other words, the context acts as a filter.
   i. Experiment with adding some additional search and context terms (one per line) to see how the query changes. Note that you can also enter or modify your query in the Query Editor.
   i. Perform a search on P53 AND Cancer. You should get back ten query matches and the corresponding network.
   i. Enter a new search on P53, this time with the context "dna repair".
   '''Note: when a search term consists of two or more consecutive words, the term should be put in quotation marks.''' This should produce a different set of ten articles and a different network.
 * Context searching can also be used to control such search parameters as what specific journals are searched. Suppose you are interested in P53 in cancer, but only as published in the journals Science or Nature. You can do this search as follows:
   i. Add the following lines to your context list:
   {{{
Science[ta]
Nature[ta]
}}}
   i. This results in (P53) AND (Cancer OR Science[ta] OR Nature[ta]). Unfortunately, this is not quite what we want: instead of returning articles on P53 and cancer published in the selected journals, it would return articles on P53 that either involved cancer or were published in the selected journals. So, in your query editor, revise the query to read {{{(P53) AND Cancer AND (Science[ta] OR Nature[ta])}}}
   i. Issue your query, and explore the query results as outlined above.

The search context can be used in similar ways to limit !PubMed searches by !MeSH term, publication date, and other attributes. For more information, see this !PubMed [[http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=helppubmed.box.pubmedhelp.Box_1_Search_Field_D|help document]].

== Gathering Supporting Evidence ==
In addition to mining the literature for associations, you can use the Agilent Literature Search plugin to verify other interactions against the literature. This section illustrates how.
 1. Download [[attachment:BINDHumanSubset.sif|BINDHumanSubset.sif]] to your local computer by right-clicking on the hyperlink and saving the file. This is a sample set of interactions from the [[http://bind.ca/|BIND database]] provided by BOND.
 '''Important:''' This functionality relies on parameters defined in the last use of the Literature Search plugin, in this session of Cytoscape.
 If you have taken a break since the previous steps, and have just started a new Cytoscape session, then before continuing, you must do a literature search with Homo sapiens selected as the species.
 2. Load the network BINDHumanSubset.sif. Lay out the network with the yFiles Organic layout. Your Cytoscape canvas should appear as shown:
 {{attachment:small_human.png}}
 3. Select by name the node Sp1, and zoom in to get a closer look at the other nodes that interact with it. Your canvas should appear as shown:
 {{attachment:small_sp1.png}}
 4. Right-click on the edge between Sp1 and Myc. You should see a menu containing the option Evidence from Literature. This item should take you to another menu, with the option Gather Evidence from the Literature. Select this option, and a literature search will be executed. When it is done, a message will appear at the bottom of the Cytoscape Desktop indicating whether any new evidence was found.
 5. Go to the Edge Attribute Browser and click on the Select Attributes button. You should see several new attributes:
   * HasTSI: Indicates whether the interaction is supported by literature searching
   * !NumberOfSources: Indicates the number of distinct articles supporting the interaction
   * nbrSentences: Indicates the number of distinct sentences supporting the interaction
 How much support is there for this interaction?
 6. Go back to the Cytoscape Desktop, and right-click on the edge between Myc and Sp1 again. If any evidence was found in the literature for this interaction, then under Evidence from Literature, there should be a new option: Show Sentences from the Literature. Selecting this option will bring up a window listing the supporting sentences, as shown below.
 {{attachment:small_support.png}}
 7. Exactly what was the search that generated these results?
 The Gather Evidence function performs a search of ten articles per search term, using the two nodes as search terms and using the species and interaction lexicon used in the last full literature search. For example, if your last full literature search was done on mouse with a relaxed interaction lexicon, then the Gather Evidence function would search for the two nodes in mouse, using a relaxed interaction lexicon.

You can also gather evidence on all the edges for a given node. For example, select the node YY1 and right-click on it. Select Gather Evidence from the Literature in the resulting menu. This will perform a literature search on all the edges of the node.

Congratulations! You have not only completed one more tutorial, but you've also learned how to use a powerful tool that is also fun to play with!

For comments or suggestions, please post to the [[http://cytoscape.org/community.php|cytoscape-discuss]] mailing list.

Return to the Cytoscape [[Presentations/Advanced|advanced tutorials]].