= Tutorial 4: Basic Expression Analysis =

If you have completed the [[Presentations/01_Get_Started|Getting Started]] and [[Presentations/02_Filter_Edit|Filters and Editor]] tutorials, this tutorial will show you some of the expression analysis basics available with Cytoscape. It will introduce you to:

 * Input formats for expression data
 * Coloring nodes by expression data values
 * Assessing expression data in the context of a biological network

This tutorial and its accompanying lectures were delivered at [[http://www.csc.fi/suomi/info/index.phtml.en|CSC]], the Finnish IT center for science. The lecture slides of background material and an accompanying video presentation are available courtesy of the CSC at http://www.csc.fi/english/research/sciences/bioscience/Courses_and_events/cytoscape/index_html.

This tutorial features the following data files:

 * [[attachment:galFiltered.sif]], also distributed in the Cytoscape testData directory. This network contains protein-protein and protein-DNA interactions associated with galactose metabolism in yeast.
 * [[attachment:galExpData.pvals]], also distributed in the Cytoscape testData directory. This file contains gene expression measurements for three perturbation experiments. In each experiment, the level of one key protein was perturbed artificially.
 * [[attachment:galExpData.mrna]], '''not''' distributed in the Cytoscape testData directory. This file contains a subset of the data from galExpData.pvals.

For further information on these datasets, see ''Science'' 2001, '''292''':929-34.

Before starting, please download these data files to your local disk by right-clicking on the links and saving them. Ensure that the filenames do not have ".txt" appended to them.

Begin by clicking here: [[attachment:cyto.jnlp|WEB START]] (approximate download size: 22 MB)

This starts Cytoscape on your own computer, after downloading the program and annotations from our website.
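If your browser did append ".txt" to any of the downloaded data files, a short script can restore the expected names. The following is a minimal Python sketch, not part of Cytoscape: the function name `restore_expected_names` and the idea of pointing it at a single download directory are assumptions made for illustration.

```python
from pathlib import Path

# The three data files named in this tutorial.
EXPECTED = ["galFiltered.sif", "galExpData.pvals", "galExpData.mrna"]


def restore_expected_names(directory, expected=EXPECTED):
    """Rename '<name>.txt' back to '<name>' where needed.

    Hypothetical helper: returns the list of expected files that are
    still missing from the directory after any renaming.
    """
    directory = Path(directory)
    missing = []
    for name in expected:
        target = directory / name
        stray = directory / (name + ".txt")
        # Only rename when the correctly named file is absent.
        if not target.exists() and stray.exists():
            stray.rename(target)
        if not target.exists():
            missing.append(name)
    return missing
```

Calling `restore_expected_names(".")` from the directory holding the downloads returns an empty list once all three files are present under their expected names.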
== Loading expression data ==

Here, we will explore the basic structures for applying expression data to a Cytoscape network. In this section, you will learn about the expression data formats expected by Cytoscape, so that you can format the nodes in your network according to expression values.

 1. Start Cytoscape and load the network galFiltered.sif. After detaching the Data Panel and Results Panel, maximizing the canvas, and applying the spring-embedded layout, you should see a network similar to the one below.
  . {{attachment:small_Fig1.jpg}}
 1. Using your favorite text editor, open the file galExpData.mrna. The first few lines of the file are as follows:
  . {{{
GENE COMMON gal1RG gal1RG gal80R
YHR051W COX6 -0.034 -0.034 -0.304
YHR124W NDT80 -0.090 -0.090 -0.348
YKL181W PRS1 -0.167 -0.167 0.112
YGR072W UPF3 0.245 0.245 0.787
}}}
  The file structure is as follows:
  * The first line consists of labels.
  * All columns are separated by a single whitespace character, such as a space or a tab.
  * The first column contains node names, which must match the names of the nodes in your network '''exactly!'''
  * The second column contains common locus names. This column is optional, and the data is not currently used by Cytoscape, but including it makes the format consistent with the output of many microarray analysis packages and makes the file easier to read.
  * The remaining columns contain experimental data, one column per experiment and one line per node. In this case, there are three expression results per node.
 1. Under the File menu, select Import → Attribute/Expression Matrix... and import galExpData.mrna. After a brief load, a status window will appear, indicating how many experimental conditions were found (three) and what type of significance values were included (none). Click the Close button.
  . {{attachment:loadgalexp.png}}
 1. Now we will use the Node Attribute Browser to browse through the expression data, as follows.
  i. Select a node on the Cytoscape canvas.
  i. In the Node Attribute Browser, click the Select Attributes {{attachment:select.png}} button, and select the attributes gal1RGexp, gal4RGexp, and gal80Rexp by left-clicking on them. Right-click to close the menu.
  i. Under the Node Attribute Browser, you should see your node listed with its expression values, as shown.
   . {{attachment:galbrowse.png}}

== Coloring nodes ==

Probably the most common use of expression data in Cytoscape is to set the visual attributes of the nodes in a network according to expression data. This creates a powerful visualization, portraying functional relationships and experimental response at the same time. Here, we will walk through the steps for doing this.

 1. Open the !VizMapper by clicking on its icon: {{attachment:vizmapper.png}}
 1. Create a new visual style named Gal80 by clicking on the Copy existing Visual style button to duplicate the default style.
 1. Define the node color of this visual style as follows:
  i. Under the Node Color tab, set the Mapping type to continuous.
  i. In the dropdown list Node Color, select the attribute gal80Rexp. This specifies that each node will be colored on a color continuum according to Gal80 expression:
   * Large negative values (indicating high repression) are colored red
   * Small negative values (indicating slight repression) are colored pink
   * Values close to zero are colored white
   * Small positive values (indicating slight induction) are colored light green
   * Large positive values (indicating high induction) are colored bright green
   * Extreme values are colored blue (negative values less than -2.5) or black (positive values greater than 2.1)
  i. Note that the default node color of pink falls within this spectrum. A useful trick is to choose a color outside this spectrum, so that nodes with no defined expression value can be distinguished from those with slight repression. Under Default, click on Change Default, and select a default color of grey.
 1. Click on Apply.
 You should see most nodes colored pink, green, or white, with a few grey nodes and a few black nodes.

== A biological analysis scenario ==

This section presents one scenario of how expression data can be combined with network data to tell a biological story. First, here is some background on your data. You are working with yeast, and the genes Gal1, Gal4, and Gal80 are all yeast transcription factors. Your expression experiments all involve some perturbation of these transcription factor genes. Gal1, Gal4, and Gal80 are also represented in your interaction network, where they are labeled according to yeast locus tags: Gal1 corresponds to YBR020W, Gal4 to YPL248C, and Gal80 to YML051W.

Your network contains a combination of protein-protein (pp) and protein-DNA (pd) interactions. Here, we shall filter out the protein-protein interactions to focus on the protein-DNA interactions.

 1. Create a filter that selects edges whose text attribute interaction matches the pattern pp. For more information, see the tutorial on [[Presentations/Basic|filters and editing]].
 1. Click on Apply selected filter. This should select 251 of the 362 edges.
 1. Under the Edit menu, select Delete Selected Nodes and Edges, and then apply a graph layout algorithm to see the edges that remain. Using the yFiles Organic layout, your network should now appear as follows:
  . {{attachment:organic.png}}
  Notice that all three black (highly induced) nodes are in the same region of the graph. Zoom into the graph to see more details.
 1. Notice that there are two nodes that interact with all three black nodes: YPL248C and YOL051W. Select these two nodes and their immediate neighbors, and copy them to a new network. This makes it easier to focus on the interactions involving these nodes. With some layout and zooming, this new network should appear similar to the one shown:
  . {{attachment:small_Fig7.jpg}}
 1. With a little exploration in the Node Attribute Browser, you should see the following:
  * The two nodes that interact with all three black nodes are YOL051W (Gal11, a general transcription cofactor with many interactions) and YPL248C (Gal4).
  * Both nodes show fairly small changes in expression, and neither change is statistically significant: they are rendered as light-colored circles. These slight changes in expression suggest that the critical change affecting the black nodes might be somewhere else in the network, and not in either of these nodes.
  * YPL248C interacts with YML051W (Gal80), which shows a significant level of repression: it is depicted as a reddish square.
  * Note that while YML051W shows evidence of significant repression, most nodes interacting with YPL248C show significant levels of induction: they are rendered as green or black squares.
 1. Go to the NCBI website (http://www.ncbi.nlm.nih.gov/), and search the Gene database for YPL248C. The items returned should include Gal4. Click on the link for Gal4 to get more information.
 1. Reading the description of Gal4, you will see that it is a transcription factor that is repressed by Gal80.
 1. Putting all of this together, we see that the transcriptional activation activity of Gal4 is repressed by Gal80. So, repression of Gal80 increases the transcriptional activation activity of Gal4. Even though the expression of Gal4 itself did not change much, the Gal4 protein was much more likely to be an active transcription factor when Gal80 was repressed. This explains why there is so much up-regulation in the vicinity of Gal4.

Good work! Network analysis and expression data are a powerful combination, and now you have the skills to do some substantial analysis. Go reward yourself with a good cup of coffee.

For comments or suggestions, please post to the [[http://cytoscape.org/community.php|cytoscape-discuss]] mailing list.

Return to Cytoscape [[Presentations/Basic|introductory tutorials]].
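
As a recap of the expression matrix format described under Loading expression data and the color continuum described under Coloring nodes, here is a minimal Python sketch. It is an illustration only and does not use or reproduce any Cytoscape API. The function names, the RGB values chosen for each color, the linear interpolation between colors, and the placement of pure red at -2.5, white at 0, and bright green at 2.1 are all assumptions; the tutorial itself only states that values below -2.5 are blue and values above 2.1 are black.

```python
# Excerpt of galExpData.mrna exactly as shown earlier in this tutorial.
SAMPLE = """GENE COMMON gal1RG gal1RG gal80R
YHR051W COX6 -0.034 -0.034 -0.304
YHR124W NDT80 -0.090 -0.090 -0.348
YKL181W PRS1 -0.167 -0.167 0.112
YGR072W UPF3 0.245 0.245 0.787
"""

# Assumed RGB values for the colors named in the tutorial.
RED, WHITE, GREEN = (255, 0, 0), (255, 255, 255), (0, 255, 0)
BLUE, BLACK, GREY = (0, 0, 255), (0, 0, 0), (128, 128, 128)
LOW, HIGH = -2.5, 2.1  # thresholds for the "extreme" colors


def parse_expression_matrix(text):
    """Return (condition labels, {node name: [values]}).

    Assumes the layout described above: a label line, then one line per
    node with the node name, the common name, and one value per experiment.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    conditions = lines[0].split()[2:]  # skip GENE and COMMON
    table = {}
    for ln in lines[1:]:
        fields = ln.split()
        table[fields[0]] = [float(v) for v in fields[2:]]
    return conditions, table


def lerp_color(c0, c1, t):
    """Linearly interpolate between two RGB colors, t in [0, 1]."""
    return tuple(round(a + (b - a) * t) for a, b in zip(c0, c1))


def node_color(value):
    """Map an expression value to an RGB color (assumed continuum)."""
    if value is None:
        return GREY  # default color for nodes without expression data
    if value < LOW:
        return BLUE
    if value > HIGH:
        return BLACK
    if value < 0:
        return lerp_color(RED, WHITE, (value - LOW) / (0 - LOW))
    return lerp_color(WHITE, GREEN, value / HIGH)


if __name__ == "__main__":
    conditions, table = parse_expression_matrix(SAMPLE)
    for node, values in table.items():
        # The third data column is the gal80R experiment.
        print(node, values[2], node_color(values[2]))
```

Values are stored as a list per node rather than keyed by condition label, because the label line in the excerpt repeats a label and a dictionary keyed by label would silently drop a column.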