---
output: pdf_document
---

# Protocol to Build Interaction Network from PPI Data Set

knitr global properties

```{r setup, include=FALSE}
# Echo the code in the rendered document, hide messages and warnings
knitr::opts_chunk$set(echo=TRUE, message=FALSE, warning=FALSE)
```

## Load common functions defined in Utils.R

```{r load Utils.R}
source("Utils.R")
```

## Package list needed

* igraph: for function ecount()
* PSICQUIC: to fetch interaction data from public databases

```{r package list}
CRAN.packages <- c("igraph")
bioconductor.packages <- c("PSICQUIC")
```

## Package install and library loading

```{r libraries}
install.packages.if.necessary(CRAN.packages, bioconductor.packages)
```

## Set data directory, create it if necessary

```{r set data dir}
data.dir <- "networks"
if (!file.exists(data.dir)) {
  dir.create(data.dir)
}
```

## Protein-protein interaction Network from PSICQUIC + CCSB Interactome Database

The goal is to construct a protein-protein interaction network containing binary, direct, physical protein interactions. Two sources are used:

* The PSICQUIC portal, which allows retrieval of interaction data from the databases participating in the IMEx consortium. These databases describe interactions in the PSI-MI format, which makes it possible to select direct interactions between human proteins.
  Reference: del-Toro, N., Dumousseau, M., Orchard, S., Jimenez, R. C., Galeota, E., Launay, G., … Hermjakob, H. (2013). A new reference implementation of the PSICQUIC web service. Nucleic Acids Research, 41(Web Server issue), W601–6.
* The Center for Cancer Systems Biology (CCSB), which generates human interactome data. In particular, it has released the HI-II-2014 dataset, which contains validated yeast two-hybrid interactions between human proteins.
  Reference: Rolland, T., Taşan, M., Charloteaux, B., Pevzner, S. J., Zhong, Q., Sahni, N., … Vidal, M. (2014). A Proteome-Scale Map of the Human Interactome Network. Cell, 159(5), 1212–1226.
  doi:10.1016/j.cell.2014.10.050

The code below creates a network from each of these 2 sources and merges them.

### Human PPI from PSICQUIC databases

Fetch human interactions (NCBI taxon ID 9606) of type "direct interaction" from a given list of databases. The query can optionally be restricted to specific detection methods.

```{r fetch from PSICQUIC}
psicquic <- PSICQUIC()
# providers(psicquic) lists all available DBs
DB.list <- c("BioGrid", "IntAct", "MatrixDB", "MBInfo", "MINT", "Reactome", "Reactome-FIs", "UniProt")
tbl.big <- PSICQUIC::interactions(psicquic, species="9606", provider=DB.list,
                                  # detectionMethod="experimental interaction detection",
                                  type="direct interaction")
```

Mapping gene names to HGNC

The databases use different gene name formats. Everything is converted to HGNC gene symbols.

```{r map names using biomart}
tbl.big <- addGeneInfo(IDMapper("9606"), tbl.big)
```

* Extract the columns of interest (protein A / protein B) and construct the binary network
* Replace empty cells by NA
* Remove incomplete lines

```{r clean HGNC network}
# Rename the symbol columns so that they match the CCSB column names
colnames(tbl.big)[names(tbl.big) == "A.name"] <- "Symbol.A"
colnames(tbl.big)[names(tbl.big) == "B.name"] <- "Symbol.B"
NetHGNC <- clean.network(tbl.big)
```

### Human PPI from CCSB interactome database

Download the data file from "http://interactome.dfci.harvard.edu/H_sapiens/index.php"

```{r fetch CCSB data}
interactome.file <- "http://interactome.dfci.harvard.edu/H_sapiens/download/HI-II-14.tsv"
CCSB <- read.table(download.if.necessary(interactome.file, data.dir),
                   header=TRUE, sep="\t", as.is=TRUE)
```

* Select the columns of interest
* Replace empty cells by NA
* Remove incomplete lines

```{r clean CCSB network}
CCSB <- clean.network(CCSB)
```

### Merge the 2 PPI Networks

```{r merge}
net <- rbind(CCSB, NetHGNC)
```

### Graph conversion, simplification and export to file

```{r build igraph}
net <- build.network(as.matrix(net))
ecount(net)  # number of edges
gorder(net)  # number of vertices
write.graph(net, file.path(data.dir, "PPI.gr"), format="ncol", weights=NULL)
```
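
### Illustration of the cleaning and merging steps

The helpers `clean.network()` and `build.network()` are defined in Utils.R, which is not reproduced in this document, so their exact behaviour is not shown here. The sketch below is only a guess at the cleaning, merging and simplification steps described above, written in base R on made-up toy rows. The function name `clean.network.sketch`, its `cols` argument and the toy tables are hypothetical; the only things taken from the protocol are the column names `Symbol.A` and `Symbol.B` and the use of `rbind()`.

```r
# Hypothetical stand-in for clean.network() from Utils.R (assumption: the real
# helper keeps the two symbol columns, turns empty cells into NA and drops
# incomplete rows, as the text of the protocol describes).
clean.network.sketch <- function(tbl, cols = c("Symbol.A", "Symbol.B")) {
  net <- tbl[, cols, drop = FALSE]
  # Treat empty or whitespace-only cells as missing values
  net[] <- lapply(net, function(x) {
    x <- trimws(as.character(x))
    x[!is.na(x) & x == ""] <- NA
    x
  })
  # Keep only rows where both interaction partners are known
  net <- net[complete.cases(net), , drop = FALSE]
  rownames(net) <- NULL
  net
}

# Toy tables with made-up rows, for illustration only
toy.psicquic <- data.frame(Symbol.A = c("TP53", "", "BRCA1"),
                           Symbol.B = c("MDM2", "EGFR", NA),
                           type = "direct interaction",
                           stringsAsFactors = FALSE)
toy.ccsb <- data.frame(Symbol.A = c("MDM2", "AKT1"),
                       Symbol.B = c("TP53", "AKT1"),
                       stringsAsFactors = FALSE)

# Merge the two cleaned edge lists, as in the "merge" chunk
toy.net <- rbind(clean.network.sketch(toy.ccsb),
                 clean.network.sketch(toy.psicquic))

# Base-R illustration of "simplification": order each pair alphabetically,
# then drop self-loops and duplicated undirected edges
a <- pmin(toy.net$Symbol.A, toy.net$Symbol.B)
b <- pmax(toy.net$Symbol.A, toy.net$Symbol.B)
edges <- unique(data.frame(A = a, B = b, stringsAsFactors = FALSE)[a != b, ])
print(edges)
```

On a real igraph object, `igraph::simplify()` removes multiple edges and self-loops; whether `build.network()` relies on it is an assumption, since Utils.R is not shown.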