# Network visualization – part 6: D3 and R (networkD3)

(This article was first published on Fun with R, and kindly contributed to R-bloggers.)

I was never that much into JavaScript until I was introduced to D3.js. This open-source JS library provides features for dynamic data manipulation and visualization, and it allows users to become active participants in the data visualization process. As such, D3.js plots are not a static "as is" representation of the data: users can explore data points and hierarchies among the data, filter data by groups, and so on.

As somebody who often uses R for data analysis, I was excited to see some of the libraries that link R and D3, such as plotly and networkD3 (or, previously, d3Network: https://cran.r-project.org/web/packages/d3Network/). Here, I will focus on the networkD3 package.

As before, the network I will use as an illustration is a weighted network of characters' coappearances in Victor Hugo's novel "Les Miserables" (from D. E. Knuth, The Stanford GraphBase: A Platform for Combinatorial Computing, Addison-Wesley, Reading, MA, 1993). It consists of 77 nodes, corresponding to characters, and 254 weighted edges, where each weight is the number of times two characters appear together in the same chapter of the book. I used four properties to characterize this network (for the sole purpose of making the visualization more interesting). The nodes were characterized by two properties, degree and betweenness centrality, and the edges were characterized by two properties, weight and Dice similarity (for more details about these properties, see the Network Visualization part 1 blog post). To calculate the network properties, I used the igraph package.

The networkD3 package provides a function called `igraph_to_networkD3`, which converts an igraph object into a format that networkD3 uses to create a network representation.
As I used an igraph object to store my network, including node and edge properties, I was hoping that I would only need this one function to create a visualization of my network. However, the function does not work exactly like that (which is not that surprising, given the differences between how D3.js works and how an igraph object is defined). It extracts the lists of nodes and edges from the igraph object, but not the information about all node and edge properties. The exception is a priori specified information about the nodes' membership in groups/clusters, which can be derived from one or more network properties (e.g., node degree). Additionally, the `igraph_to_networkD3` function does not plot the network itself; it only extracts the parameters that are later passed to the `forceNetwork` function, which plots the network.

So let's focus on the `forceNetwork` function instead. This function creates a D3.js force-directed network graph from two data frames, one containing information about the network nodes and the other containing information about the network edges. In our case, these data frames, denoted `nodeList` and `edgeList`, respectively, contain the following columns (for more details, see the code at the end of this post):

- `nodeList`: `ID`, `nName`, `nodeDegree`, and `nodeBetweenness`
- `edgeList`: `SourceID`, `SourceName`, `TargetID`, `TargetName`, `Weight`, and `diceSim`

Given the information about nodes and edges stored in these data frames, we will use the `forceNetwork` function to create a network in which node size corresponds to the node betweenness value, node color corresponds to node degree, the distance between two nodes and the edge thickness correspond to edge weight, and edge color corresponds to the Dice similarity. Each node will be labeled with its name. The `forceNetwork` function expects the edge list to contain pairs of interacting nodes given as node IDs (starting from zero).
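The zero-based ID requirement can be illustrated with a small base R sketch. The toy edge list and its weights below are made up for illustration (they are not read from the Les Miserables file), and `match()` is used here as one possible way to look up IDs; the full listing at the end of the post uses its own helper for this step.

```r
# Toy edge list: hypothetical names and weights, for illustration only
edges <- data.frame(
  SourceName = c("Valjean", "Valjean", "Javert"),
  TargetName = c("Javert", "Cosette", "Cosette"),
  Weight     = c(17, 31, 1),
  stringsAsFactors = FALSE
)

# Node list with IDs starting at 0, in order of first appearance
node_names <- unique(c(edges$SourceName, edges$TargetName))
nodes <- data.frame(
  ID    = seq_along(node_names) - 1,
  nName = node_names,
  stringsAsFactors = FALSE
)

# match() returns 1-based positions, so subtract 1 to get zero-based IDs
edges$SourceID <- match(edges$SourceName, nodes$nName) - 1
edges$TargetID <- match(edges$TargetName, nodes$nName) - 1

print(edges)
```

Because the IDs are derived from row positions in `nodes`, row `i` of the node data frame always describes node `i - 1`, which is the ordering the plotting function relies on.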
Node attributes, stored in the `nodeList` data frame, are expected to be sorted in increasing order of node ID, starting with the first node (ID = 0). Based on this ordering, the `forceNetwork` function knows which node ID to map to the specified node property (in our example, we used the node name, `nName`). To show more than a single property, one has to combine two existing node properties into a new one. In practice, this means creating a new column in the `nodeList` data frame, because the `forceNetwork` function takes a column name (a string) to specify this property.

Node color is defined by the `Group` argument: all nodes assigned to the same group get the same color. Hence, if we want every node to have a different color, each node has to be assigned to a different group (one can use the node ID as the group number). In our case, we colored nodes based on their degree (`nodeDegree`), so all nodes with the same degree share a color. The `Nodesize` argument defines the size of the node; we used the `nodeBetweenness` column for it.

To define the link distance, we used a JS function that calculates the distance between two nodes from the edge weight. Such a function can only use variables already defined inside the `forceNetwork` function (e.g., Value or Nodesize), not variables or column names from the node and edge data frames. Thus, if you plan to use JS to perform any kind of mathematical operation, the choice of which columns are assigned to these arguments is important (and also a limiting factor).
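As a rough sketch of the "new column" idea, the snippet below builds a combined label and a coarser color group in base R. The degree values, the `nLabel` and `degreeGroup` column names, and the bin boundaries are all assumptions made for this illustration; they are not part of the original analysis.

```r
# Small node data frame with illustrative (assumed) degree values
nodes <- data.frame(
  ID         = 0:2,
  nName      = c("Valjean", "Javert", "Cosette"),
  nodeDegree = c(36, 17, 11),
  stringsAsFactors = FALSE
)

# Combine two properties into one label column (hypothetical column name)
nodes$nLabel <- paste0(nodes$nName, " (degree ", nodes$nodeDegree, ")")

# Optionally bin degree into a few groups so that fewer colors are needed
# (bin boundaries are arbitrary and chosen only for this example)
nodes$degreeGroup <- cut(
  nodes$nodeDegree,
  breaks = c(0, 12, 24, Inf),
  labels = c("low", "medium", "high")
)

print(nodes)
```

With columns like these in place, one would pass the new column names as strings, for example `NodeID = "nLabel"` and `Group = "degreeGroup"`, instead of `"nName"` and `"nodeDegree"`.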
To define link color, we will interpolate edge colors based on their Dice similarity values, using the `colorRampPalette` function (something similar was done and explained in one of the previous visualization blog posts, Network Visualization Part 2):

    F2 <- colorRampPalette(c("#FFFF00", "#FF0000"), bias = nrow(edgeList), space = "rgb", interpolate = "linear")
    colCodes <- F2(length(unique(edgeList$diceSim)))
    edges_col <- sapply(edgeList$diceSim, function(x) colCodes[which(sort(unique(edgeList$diceSim)) == x)])

We can also define node opacity, the opacity of node labels when they are inactive (no mouse over their corresponding nodes), and the ability to zoom. Finally, the `forceNetwork` function provides an option to add further functionality through a character string containing a JavaScript expression that is evaluated when a node is clicked.

While the presented options allow us to create the network representation described above, the networkD3 package still lacks a number of features that the full D3.js library has, which limits its possible applications. For example, we cannot use different types of nodes (besides circles), edges (directed or undirected), or line styles (dashed, curved, etc.); we cannot assign edge labels or use multiple node labels; and there are no filtering or zoom-in/zoom-out options that would account for different network structures (node clusters as a high-level visualization vs. nodes within clusters as a low-level, in-depth visualization).

Let's go back to our example.
Given the node and edge data frames defined above, we can create a D3 object (denoted `D3_network_LM`) as follows:

    D3_network_LM <- networkD3::forceNetwork(Links = edgeList, Nodes = nodeList,
                                             Source = "SourceID", Target = "TargetID",
                                             Value = "Weight", NodeID = "nName",
                                             Nodesize = "nodeBetweenness", Group = "nodeDegree",
                                             height = 500, width = 1000, fontSize = 20,
                                             linkDistance = networkD3::JS("function(d) { return 10*d.value; }"),
                                             linkWidth = networkD3::JS("function(d) { return d.value/5; }"),
                                             opacity = 0.85, zoom = TRUE, opacityNoHover = 0.1,
                                             linkColour = edges_col)

To see the network we created, we just need to type its name:

    D3_network_LM

[Figure: D3 network representation]

Since we enabled the zoom option, a double click on any node will zoom in on the network and let us see that node and its neighborhood in more detail:

[Figure: D3 network representation, zoomed in]

Alternatively, we can save it as an HTML file:

    networkD3::saveNetwork(D3_network_LM, "D3_LM.html", selfcontained = TRUE)

and use it independently of R (the original post links to the network exported as an HTML file).

This example demonstrates that it is relatively easy to create a simple but still visually descriptive D3 network visualization from R with the networkD3 package. The ease of visualizing a network with networkD3 may be enough to make one overlook the lack of some features that are available when working directly with D3, since using those would require significant time spent learning D3 and designing a custom network visualization.
Finally, here is the code used to create the network:

    ####################################################################################
    # Plotting networks in R - an example of how to plot a network and
    # customize its appearance using the networkD3 library
    ####################################################################################
    # Clear workspace
    # rm(list = ls())
    ####################################################################################
    # Read a data set.
    # Data format: dataframe with 3 variables; variables 1 & 2 correspond to interactions;
    # variable 3 is the weight of the interaction
    edgeList <- read.table("lesmis.txt", header = FALSE, sep = "\t")
    colnames(edgeList) <- c("SourceName", "TargetName", "Weight")

    # Create a graph. Use simplify to ensure that there are no duplicated edges or self loops
    gD <- igraph::simplify(igraph::graph.data.frame(edgeList, directed=FALSE))

    # Create a node list object (actually a data frame object) that will contain information about nodes
    nodeList <- data.frame(ID = c(0:(igraph::vcount(gD) - 1)), # because networkD3 library requires IDs to start at 0
                           nName = igraph::V(gD)$name)

    # Map node names from the edge list to node IDs
    getNodeID <- function(x){
      which(x == igraph::V(gD)$name) - 1 # to ensure that IDs start at 0
    }
    # And add them to the edge list
    edgeList <- plyr::ddply(edgeList, .variables = c("SourceName", "TargetName", "Weight"),
                            function (x) data.frame(SourceID = getNodeID(x$SourceName),
                                                    TargetID = getNodeID(x$TargetName)))

    ####################################################################################
    # Calculate some node properties and node similarities that will be used to illustrate
    # different plotting abilities and add them to the edge and node lists

    # Calculate degree for all nodes
    nodeList <- cbind(nodeList, nodeDegree=igraph::degree(gD, v = igraph::V(gD), mode = "all"))

    # Calculate betweenness for all nodes
    betAll <- igraph::betweenness(gD, v = igraph::V(gD), directed = FALSE) / (((igraph::vcount(gD) - 1) * (igraph::vcount(gD)-2)) / 2)
    betAll.norm <- (betAll - min(betAll))/(max(betAll) - min(betAll))
    # We are scaling the value by multiplying it by 100 for visualization purposes only (to create larger nodes)
    nodeList <- cbind(nodeList, nodeBetweenness=100*betAll.norm)
    rm(betAll, betAll.norm)

    # Calculate Dice similarities between all pairs of nodes
    dsAll <- igraph::similarity.dice(gD, vids = igraph::V(gD), mode = "all")

    # Look up the similarity for each edge (+1 because R indices start at 1, IDs at 0)
    F1 <- function(x) {data.frame(diceSim = dsAll[x$SourceID +1, x$TargetID + 1])}
    edgeList <- plyr::ddply(edgeList, .variables=c("SourceName", "TargetName", "Weight", "SourceID", "TargetID"),
                            function(x) data.frame(F1(x)))

    rm(dsAll, F1, getNodeID, gD)

    ####################################################################################
    # We will also create a set of colors for each edge, based on their dice similarity values.
    # We'll interpolate edge colors using the "colorRampPalette" function, which returns a
    # function that generates a palette with the requested number of colors (here, one color
    # per unique dice similarity value).
    # Note: "bias" does not set the number of colors; it is passed on to colorRamp(), where
    # higher values space the colors more widely at the high end.
    F2 <- colorRampPalette(c("#FFFF00", "#FF0000"), bias = nrow(edgeList), space = "rgb", interpolate = "linear")
    colCodes <- F2(length(unique(edgeList$diceSim)))
    edges_col <- sapply(edgeList$diceSim, function(x) colCodes[which(sort(unique(edgeList$diceSim)) == x)])

    rm(colCodes, F2)

    ####################################################################################
    # Let's create a network
    D3_network_LM <- networkD3::forceNetwork(Links = edgeList, # data frame that contains info about edges
                            Nodes = nodeList, # data frame that contains info about nodes
                            Source = "SourceID", # ID of source node
                            Target = "TargetID", # ID of target node
                            Value = "Weight", # value from the edge list (data frame) that will be used to value/weight relationship amongst nodes
                            NodeID = "nName", # value from the node list (data frame) that contains node description we want to use (e.g., node name)
                            Nodesize = "nodeBetweenness",  # value from the node list (data frame) that contains value we want to use for a node size
                            Group = "nodeDegree",  # value from the node list (data frame) that contains value we want to use for node color
                            height = 500, # Size of the plot (vertical)
                            width = 1000,  # Size of the plot (horizontal)
                            fontSize = 20, # Font size
                            linkDistance = networkD3::JS("function(d) { return 10*d.value; }"), # Function to determine distance between any two nodes, uses variables already defined in forceNetwork function (not variables from a data frame)
                            linkWidth = networkD3::JS("function(d) { return d.value/5; }"), # Function to determine link/edge thickness, uses variables already defined in forceNetwork function (not variables from a data frame)
                            opacity = 0.85, # opacity
                            zoom = TRUE, # ability to zoom when click on the node
                            opacityNoHover = 0.1, # opacity of labels when static
                            linkColour = edges_col) # edge colors

    # Plot network
    D3_network_LM

    # Save network as html file
    networkD3::saveNetwork(D3_network_LM, "D3_LM.html", selfcontained = TRUE)

    ####################################################################################
    # sessionInfo()
    #
    # R version 3.3.1 (2016-06-21)
    # Platform: x86_64-redhat-linux-gnu (64-bit)
    # Running under: Fedora 24 (Workstation Edition)
    #
    # locale:
    #  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8
    #  [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
    #  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C
    # [10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
    #
    # attached base packages:
    # [1] stats     graphics  grDevices utils     datasets  methods   base
    #
    # loaded via a namespace (and not attached):
    # [1] htmlwidgets_0.7  plyr_1.8.4       magrittr_1.5     htmltools_0.3.5  tools_3.3.1      igraph_1.0.1
    # [7] yaml_2.1.13      Rcpp_0.12.7      jsonlite_1.1     digest_0.6.10    networkD3_0.2.13
    #
    ####################################################################################
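The edge-color step in the listing above maps each unique Dice similarity value to one color by rank. A minimal sketch of the same idea on a toy vector is shown below. It swaps the `sapply`/`which` lookup for `match()` and leaves `bias` at its default, since `bias` only changes how the colors are spaced and not how many are produced. The variable names and the toy values are assumptions made for this illustration.

```r
# Toy similarity values (hypothetical), one per edge
dice <- c(0.2, 0.8, 0.2, 0.5)

# Palette generator from yellow to red, default bias
palette_fn <- colorRampPalette(c("#FFFF00", "#FF0000"))

# One color per distinct similarity value, ordered from low to high
levels_sorted <- sort(unique(dice))
col_codes <- palette_fn(length(levels_sorted))

# Look up each edge's color by the rank of its similarity value
edge_cols <- col_codes[match(dice, levels_sorted)]

print(data.frame(dice = dice, color = edge_cols))
```

Edges with equal similarity receive the same color, the lowest value gets the first palette color, and the highest gets the last. Note that this rank-based mapping ignores how far apart the similarity values are; whether that is desirable depends on the data.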

Posted by ★狼也无聊★ on 2016-10-26 11:48:08