技术控

    今日:109| 主题:49179
收藏本版 (1)
最新软件应用技术尽在掌握

[其他] Network visualization – part 6: D3 and R (networkD3)

[复制链接]
薛友珍 发表于 2016-10-5 22:37:23
127 1

立即注册CoLaBug.com会员,免费获得投稿人的专业资料,享用更多功能,玩转个人品牌!

您需要 登录 才可以下载或查看,没有帐号?立即注册

x
(This article was first published on Fun with R , and kindly contributed toR-bloggers)
  I was never that much into JavaScript until I was introduced to D3.js. This open source JS library provides the features for dynamic data manipulation and visualization and allows users to become active participants in data visualization process. As such, D3.js plots are not a static “as it is” data representation, but allow users to explore data points, hierarchies among the data, filter data by groups, and similar.
   As somebody who uses R often for data analysis, I was excited to see the some of the libraries that link R and D3, such as plotly and networkD3 (or previously https://cran.r-project.org/web/packages/d3Network/). Here, I will focus on the networkD3 package.
   As before, the network I will use as an illustration is a weighted network of characters’ coappearances in Victor Hugo’s novel “Les Miserables” (from D. E. Knuth, The Stanford GraphBase: A Platform for Combinatorial Computing, Addison-Wesley, Reading, MA, 1993) that consists of 77 nodes, corresponding to characters, and 254 weighted edges, corresponding to the number of characters coappearances in the same chapter of the book. I used four properties to characterize this network (for the sole purpose of making visualization more interesting) – the network nodes were characterized with two properties: degree and betweenness centrality, and the network edges were characterized with two properties: weight and Dice similarity (to see more details about these properties, see Network Visualization part 1 blog post). For calculation of network properties, I used the igraph package.
   The networkD3 package provides a function called igraph_to_networkD3 , that uses an igraph object to convert it into a format that networkD3 uses to create a network representation. As I used igraph object to store my network, including node and edge properties, I was hoping that I may only need to use this function to create a visualization of my network. However, this function does not work exactly like that (which is not that surprising, given the differences in how D3.js works and how igraph object is defined). Instead, it extracts lists of nodes and edges from the igraph object, but not the information about all node and edges properties (the exception is a priori specified information about nodes membership groups/clusters, which can be derived from one or more network properties, e.g., node degree). Additionally, the igraph_to_networkD3 function does not plot the network itself, but only extracts parameters that are later used in the forceNetwork function that plots the network.
   So let’s focus on the forceNetwork function instead. This function creates a D3.js force directed network graph from two data frames, one containing information about network nodes and the other one containing information about network edges. In our case, these data frames, denoted as nodeList and edgeList, respectively, contain the following columns (for more details see the code at the end of this post):
   nodeList:“ ID “, “ nName “, “ nodeDegree “, and “ nodeBetweenness
   edgeList:“ SourceID “, “ SourceName “, “ TargetID “, “ TargetName “, “ Weight “, and “ diceSim
   Given the information about nodes and edges stored in these data frame, we will use the forceNetwork function to create a network in which node size corresponds to the node betweenness value, node color corresponds to node degree, distance between two nodes and edge thickness corresponds to edge weight, and edge color corresponds to the dice similarity. Each node will be described by its name. The forceNetwork function expects edge list to contain pairs of interactions in form of their IDs (starting from zero). Node attributes, stored in the nodeList data frame, are expected to be ordered in the increasing order starting with the first node (ID = 0). Based on this ordering, the forceNetwork function will know which node ID to map to specified node property (in our example, we used node name, “ nName “). To use more than a single property, one will have to combine two existing node properties into a new one. Basically, this means that one needs to create a new column in the nodeList data frame, as the forceNetwork function uses column name (string) to specify this property. Node color is defined by the “ Group ” variable – all nodes assigned to the same group will be colored the same. Hence, if we want all nodes to be colored differently, each node will be assigned to different group (one can use node ID as a group number). In our case, we colored nodes based on their degree (“ nodeDegree “) – all nodes with the same degree will be colored the same color. Variable Nodesize is used to define the size of the node. We used “ nodeBetweenness ” column to define node size. To define node link, we used a JS function to calculate distance between two nodes based on the value of the edge weight. The function uses variables already defined in the forceNetwork function (e.g., Value, or Nodesize), not variables/column names from the node and edge data frame. Thus, if you plan to use JS to perform any type of mathematical operations, selection of variables assign to the function is important (and also limiting factor). To define link color, we will interpolate edge colors based on their dice similarity values, using the “colorRampPalette” function (similar has been done and explained in one of the previous visualization blog posts ( Network Visualization Part 2 ):
   F2 <- colorRampPalette(c("#FFFF00", "#FF0000"), bias = nrow(edgeList), space = "rgb", interpolate = "linear")
colCodes <- F2(length(unique(edgeList$diceSim)))
edges_col <- sapply(edgeList$diceSim, function(x) colCodes[which(sort(unique(edgeList$diceSim)) == x)])
   We can also define node opacity, opacity of node labels when they are inactive (no mouse over their corresponding nodes), and ability to zoom. Finally, the forceNetwork function provides an option to include additional functionalities, as a character string with a JavaScript expression that will be evaluated when there is a click on the node.
  While the presented options allow us to create network representation described above, networkD3 package still lacks a number of features full D3.js library has and as such, has possible application limitations. For example, we cannot use different types of nodes (beside circles), edges (directed or undirected) or line styles (dashed, curved, etc)., we cannot assign edge labels or use multiple node labels, there are no filtering or zoom-in-zoom-out options that would accounted for different network structures (node clusters as a high-level visualization vs nodes within clusters as a low-level, in-depth visualization), etc.
  Let's go back to our example. Given the above defined node and edge data frames, we can create a D3 object (denoted as D3_network_LM) as follows:
  D3_network_LM <- networkD3::forceNetwork(Links = edgeList, Nodes = nodeList, Source = "SourceID", Target = "TargetID",Value = "Weight", NodeID = "nName", Nodesize = "nodeBetweenness", Group = "nodeDegree", height = 500, width = 1000, fontSize = 20, linkDistance = networkD3::JS("function(d) { return 10*d.value; }"), linkWidth = networkD3::JS("function(d) { return d.value/5; }"),opacity = 0.85, zoom = TRUE, opacityNoHover = 0.1, linkColour = edges_col)
  To see the network we created, we just need to type its name:
  D3_network_LM
     
Network visualization – part 6: D3 and R (networkD3)-1 (previously,published,somebody,provides,features)
   D3 network representation
     Since we allowed the zoom option, double click on any node will zoom in the network and allow us to see that node and its neighborhood in more details:

Network visualization – part 6: D3 and R (networkD3)-2 (previously,published,somebody,provides,features)
   D3 network representation - zoomed in
    Alternatively, we can save it as html file:
  networkD3::saveNetwork(D3_network_LM, "D3_LM.html", selfcontained = TRUE)   and use it independently from R: click here to see the network exported as html file .
  This example demonstrated that it is relatively easy to create a simple but still visually descriptive D3 network visualization from R with the networkD3 package. The simplicity to visualize network with networkD3 may be enough to make one ignore the lack of some features that would be available when working directly with D3, but would require significant time spent in learning D3 and designing a custom network visualization.
  Finally, here is the code used to create the network:
  [code]############################################################################################
############################################################################################
# Plotting networks in R - an example how to plot a network and
# customize its appearance using networkD3 library
############################################################################################
############################################################################################
# Clear workspace
# rm(list = ls())
############################################################################################

# Read a data set.
# Data format: dataframe with 3 variables; variables 1 & 2 correspond to interactions; variable 3 is weight of interaction
edgeList <- read.table("lesmis.txt", header = FALSE, sep = "\t")
colnames(edgeList) <- c("SourceName", "TargetName", "Weight")

# Create a graph. Use simplyfy to ensure that there are no duplicated edges or self loops
gD <- igraph::simplify(igraph::graph.data.frame(edgeList, directed=FALSE))

# Create a node list object (actually a data frame object) that will contain information about nodes
nodeList <- data.frame(ID = c(0:(igraph::vcount(gD) - 1)), # because networkD3 library requires IDs to start at 0
                       nName = igraph::V(gD)$name)

# Map node names from the edge list to node IDs
getNodeID <- function(x){
  which(x == igraph::V(gD)$name) - 1 # to ensure that IDs start at 0
}
# And add them to the edge list
edgeList <- plyr::ddply(edgeList, .variables = c("SourceName", "TargetName", "Weight"),
                        function (x) data.frame(SourceID = getNodeID(x$SourceName),
                                                TargetID = getNodeID(x$TargetName)))

############################################################################################
# Calculate some node properties and node similarities that will be used to illustrate
# different plotting abilities and add them to the edge and node lists

# Calculate degree for all nodes
nodeList <- cbind(nodeList, nodeDegree=igraph::degree(gD, v = igraph::V(gD), mode = "all"))

# Calculate betweenness for all nodes
betAll <- igraph::betweenness(gD, v = igraph::V(gD), directed = FALSE) / (((igraph::vcount(gD) - 1) * (igraph::vcount(gD)-2)) / 2)
betAll.norm <- (betAll - min(betAll))/(max(betAll) - min(betAll))
nodeList <- cbind(nodeList, nodeBetweenness=100*betAll.norm) # We are scaling the value by multiplying it by 100 for visualization purposes only (to create larger nodes)
rm(betAll, betAll.norm)

#Calculate Dice similarities between all pairs of nodes
dsAll <- igraph::similarity.dice(gD, vids = igraph::V(gD), mode = "all")

F1 <- function(x) {data.frame(diceSim = dsAll[x$SourceID +1, x$TargetID + 1])}
edgeList <- plyr::ddply(edgeList, .variables=c("SourceName", "TargetName", "Weight", "SourceID", "TargetID"),
                           function(x) data.frame(F1(x)))

rm(dsAll, F1, getNodeID, gD)

############################################################################################
# We will also create a set of colors for each edge, based on their dice similarity values
# We'll interpolate edge colors based on the using the "colorRampPalette" function, that
# returns a function corresponding to a collor palete of "bias" number of elements (in our case, that
# will be a total number of edges, i.e., number of rows in the edgeList data frame)
F2 <- colorRampPalette(c("#FFFF00", "#FF0000"), bias = nrow(edgeList), space = "rgb", interpolate = "linear")
colCodes <- F2(length(unique(edgeList$diceSim)))
edges_col <- sapply(edgeList$diceSim, function(x) colCodes[which(sort(unique(edgeList$diceSim)) == x)])

rm(colCodes, F2)
############################################################################################
# Let's create a network

D3_network_LM <- networkD3::forceNetwork(Links = edgeList, # data frame that contains info about edges
                        Nodes = nodeList, # data frame that contains info about nodes
                        Source = "SourceID", # ID of source node
                        Target = "TargetID", # ID of target node
                        Value = "Weight", # value from the edge list (data frame) that will be used to value/weight relationship amongst nodes
                        NodeID = "nName", # value from the node list (data frame) that contains node description we want to use (e.g., node name)
                        Nodesize = "nodeBetweenness",  # value from the node list (data frame) that contains value we want to use for a node size
                        Group = "nodeDegree",  # value from the node list (data frame) that contains value we want to use for node color
                        height = 500, # Size of the plot (vertical)
                        width = 1000,  # Size of the plot (horizontal)
                        fontSize = 20, # Font size
                        linkDistance = networkD3::JS("function(d) { return 10*d.value; }"), # Function to determine distance between any two nodes, uses variables already defined in forceNetwork function (not variables from a data frame)
                        linkWidth = networkD3::JS("function(d) { return d.value/5; }"),# Function to determine link/edge thickness, uses variables already defined in forceNetwork function (not variables from a data frame)
                        opacity = 0.85, # opacity
                        zoom = TRUE, # ability to zoom when click on the node
                        opacityNoHover = 0.1, # opacity of labels when static
                        linkColour = edges_col) # edge colors

# Plot network
D3_network_LM
         
# Save network as html file
networkD3::saveNetwork(D3_network_LM, "D3_LM.html", selfcontained = TRUE)

################################################################################
# sessionInfo()
#
# R version 3.3.1 (2016-06-21)
# Platform: x86_64-redhat-linux-gnu (64-bit)
# Running under: Fedora 24 (Workstation Edition)
#
# locale:
#   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8      
# [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
# [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
# [10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C      
#
# attached base packages:
#   [1] stats     graphics  grDevices utils     datasets  methods   base     
#
# loaded via a namespace (and not attached):
#   [1] htmlwidgets_0.7  plyr_1.8.4       magrittr_1.5     htmltools_0.3.5  tools_3.3.1      igraph_1.0.1   
# [7] yaml_2.1.13      Rcpp_0.12.7      jsonlite_1.1     digest_0.6.10    networkD3_0.2.13
#
################################################################################[/code]
友荐云推荐




上一篇:MariaDB High Availability: Replication Manager
下一篇:How LinkedIn Mobile App Spurred A Complete Code Cleanup
酷辣虫提示酷辣虫禁止发表任何与中华人民共和国法律有抵触的内容!所有内容由用户发布,并不代表酷辣虫的观点,酷辣虫无法对用户发布内容真实性提供任何的保证,请自行验证并承担风险与后果。如您有版权、违规等问题,请通过"联系我们"或"违规举报"告知我们处理。

★狼也无聊★ 发表于 2016-10-26 11:48:08
不作不死,No zUo No Die
回复 支持 反对

使用道具 举报

*滑动验证:
您需要登录后才可以回帖 登录 | 立即注册

本版积分规则

我要投稿

推荐阅读

扫码访问 @iTTTTT瑞翔 的微博
回页顶回复上一篇下一篇回列表手机版
手机版/CoLaBug.com ( 粤ICP备05003221号 | 文网文[2010]257号 )|网站地图 酷辣虫

© 2001-2016 Comsenz Inc. Design: Dean. DiscuzFans.

返回顶部 返回列表