--- output: pdf_document --- # Protocol to Build Interaction Network from Co-Expression Data Set knitr global properties ```{r setup, include=F} knitr::opts_chunk$set(echo=T, message=F, warning=F) ``` ## load common functions defined in Utils.R ```{r load Utils.R} source("Utils.R") ``` ## Package list needed igraph: to ultimately build a co-expression graph reshape: function cast() ```{r package list} CRAN.packages <- c("igraph", "reshape") ``` ## Package install and library loading ```{r libraries} install.packages.if.necessary(CRAN.packages) ``` ## Set data directory, create it if necessary ```{r set data dir} data.dir <- "networks" if (!file.exists(data.dir)) { dir.create(data.dir) } ``` ## Build Co-Expression Network from RNA-seq expression data Site: http://www.proteinatlas.org/about/download Work on RNA data: "rna_tissue.csv.zip" and "rna_celline.csv.zip" expression data file from Human Protein Atlas in the "networks" directory These data correspond to RNA-seq expression levels in 45 cell lines and 32 tissues. It contains Ensembl genes, corresponding FPKM values (Fragments Per Kilobase Of Exon Per Million Fragments Mapped) and HGNC canonical names ### Fetch data from the internet, build data.frame, work on interesting columns (Gene.name, Sample, Value), ```{r fetch data} tissue.url <- "http://www.proteinatlas.org/download/rna_tissue.csv.zip" data.tissue <- get.data.from.zip(download.if.necessary(tissue.url, data.dir)) data.tissue <- data.tissue[, c(2,3,4)] celline.url <- "http://www.proteinatlas.org/download/rna_celline.csv.zip" data.celline <- get.data.from.zip(download.if.necessary(celline.url, data.dir)) data.celline <- data.celline[, c(2,3,4)] ``` ### Merge both data sets ```{r merging data} #data <- rbind(data.tissue, data.celline) # too heavy, hence use tissue-only as a workaround data <- data.tissue ``` ### Find the mean of Value by Sample for each Gene.name. Output a matrix with Gene.name as first column and, in every other column compute the mean Value for each given Sample Then transform "Gene.name"" column into row names ```{r co-expr matrix} coexpr <- cast(data, Gene.name ~ Sample, mean, value="Value") rownames(coexpr) <- coexpr[, 1] coexpr[, 1] <- NULL ``` ### Compute Spearman correlation (correlation between ranks of values) Need to take the transposed matrix (rows <-> columns) Values rounded to the 2nd decimal Be aware it is a 3GB matrix ```{r threshold} matrix.coexpr <- as.matrix(coexpr) CR <- round(cor(t(matrix.coexpr), method="spearman"), 2) ``` ### Select correlation threshold and build a network The correlation threshold is set to 0.7. All pairs of genes that have a correlation above this threshold (positive or negative) are considered as linked in the Co-Expression network. Select indexes over the threshold Retrieve names matching the index Construct a network from indexes Transform matrix into data.frame And change column names ```{r build graph as matrix} inds <- which(abs(CR[1:nrow(CR), ]) >= 0.9, arr.ind=T) rnames <- rownames(CR)[inds[, 1]] cnames <- colnames(CR)[inds[, 2]] net <- apply(cbind(rnames, cnames), 1, function(x) unname(unlist(x))) net <- as.data.frame(t(net)) colnames(net) <- c("Gene1", "Gene2") ``` ### Graph conversion, simplification and export to file ```{r build graph as igraph} net <- build.network(as.matrix(net)) ecount(net) gorder(net) write.graph(net, file.path(data.dir, "Co-Expression.gr"), format="ncol", weights=NULL) ```