Graph Convolutional Networks – An introduction to neural networks on graphs

   Multi-layer Graph Convolutional Network (GCN) with first-order filters.
    Overview

  Many important real-world datasets come in the form of graphs or networks: social networks, knowledge graphs, protein-interaction networks, the World Wide Web, etc. (just to name a few). Yet, until recently, very little attention has been devoted to the generalization of neural network models to such structured datasets.
   In the last couple of years, a number of papers have revisited the problem of generalizing neural networks to work on arbitrarily structured graphs (Bruna et al., ICLR 2014; Henaff et al., 2015; Duvenaud et al., NIPS 2015; Li et al., ICLR 2016; Defferrard et al., NIPS 2016; Kipf & Welling, 2016), some of them now achieving very promising results in domains that have previously been dominated by, e.g., kernel-based methods, graph-based regularization techniques, and others.
  In this post, I will give a brief overview of recent developments in this field and point out strengths and drawbacks of various approaches. The discussion here will mainly focus on two recent papers:
  
       
  • Kipf & Welling (2016), Semi-Supervised Classification with Graph Convolutional Networks (disclaimer: I'm the first author)   
  • Defferrard et al. (NIPS 2016), Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering  
   and on a review/discussion post by Ferenc Huszar, How powerful are Graph Convolutions?, which discusses some limitations of these kinds of models. I wrote a short comment on Ferenc's review (at the very end of this post).
  Outline

  
       
  • Short introduction to neural network models on graphs   
  • Spectral graph convolutions and Graph Convolutional Networks (GCNs)   
  • Demo: Graph embeddings with a simple 1st-order GCN model   
  • GCNs as differentiable generalization of the Weisfeiler-Lehman algorithm  
   If you're already familiar with GCNs and related methods, you might want to jump directly to Embedding the karate club network.
  How powerful are Graph Convolutional Networks?

  Recent literature

   Generalizing well-established neural models like RNNs or CNNs to work on arbitrarily structured graphs is a challenging problem. Some recent papers introduce problem-specific specialized architectures (e.g. Duvenaud et al., NIPS 2015; Li et al., ICLR 2016); others make use of graph convolutions known from spectral graph theory (Bruna et al., ICLR 2014; Henaff et al., 2015) to define parameterized filters that are used in a multi-layer neural network model, akin to the "classical" CNNs that we know and love.
   More recent work focuses on bridging the gap between fast heuristics and the slow, but somewhat more principled, spectral approach. Defferrard et al. (NIPS 2016) approximate smooth filters in the spectral domain using Chebyshev polynomials with free parameters that are learned in a neural network-like model. They achieve convincing results on regular domains (like MNIST), closely approaching those of a simple 2D CNN model.
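   To make the Chebyshev construction a bit more concrete, here is a minimal NumPy sketch of a K-term Chebyshev graph filter (this is not the authors' implementation; the scaled Laplacian, the number of terms, and the scalar coefficients `theta` are illustrative assumptions):

```python
import numpy as np

def chebyshev_filter(X, L, lam_max, theta):
    """Apply a Chebyshev-polynomial graph filter: sum_k theta_k T_k(L_tilde) X.

    X: (N, D) node features, L: (N, N) graph Laplacian,
    lam_max: largest eigenvalue of L, theta: list of K scalar coefficients.
    """
    N = L.shape[0]
    L_tilde = 2.0 * L / lam_max - np.eye(N)        # rescale eigenvalues to [-1, 1]
    Tx = [X, L_tilde @ X]                          # T_0(L~) X and T_1(L~) X
    for _ in range(2, len(theta)):
        Tx.append(2 * L_tilde @ Tx[-1] - Tx[-2])   # Chebyshev recurrence
    return sum(t * T for t, T in zip(theta, Tx))
```

   In the full model, the coefficients play the role of trainable filter weights; a K-term filter combines information from nodes up to K hops away.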
   In Kipf & Welling (2016), we take a somewhat similar approach and start from the framework of spectral graph convolutions, yet introduce simplifications (we will get to those later in the post) that in many cases allow both for significantly faster training times and higher predictive accuracy, reaching state-of-the-art classification results on a number of benchmark graph datasets.
  GCNs Part I: Definitions

   Currently, most graph neural network models have a somewhat universal architecture in common. I will refer to these models as Graph Convolutional Networks (GCNs); convolutional, because filter parameters are typically shared over all locations in the graph (or a subset thereof, as in Duvenaud et al., NIPS 2015).
   For these models, the goal is then to learn a function of signals/features on a graph \(\mathcal{G}=(\mathcal{V}, \mathcal{E})\) which takes as input:
  
       
  • A feature description \(x_i\) for every node \(i\), summarized in an \(N\times D\) feature matrix \(X\) (\(N\): number of nodes, \(D\): number of input features)   
  • A representative description of the graph structure in matrix form; typically in the form of an adjacency matrix \(A\) (or some function thereof)  
   and produces a node-level output \(Z\) (an \(N\times F\) feature matrix, where \(F\) is the number of output features per node). Graph-level outputs can be modeled by introducing some form of pooling operation (see, e.g., Duvenaud et al., NIPS 2015).
   Every neural network layer can then be written as a non-linear function \[ H^{(l+1)} = f(H^{(l)}, A) \, ,\] with \(H^{(0)} = X\) and \(H^{(L)} = Z\) (or \(z\) for graph-level outputs), \(L\) being the number of layers. The specific models then differ only in how \(f(\cdot, \cdot)\) is chosen and parameterized.
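   As a minimal illustration (a sketch, not any particular library's API), the whole framework amounts to composing per-layer functions of the form \(f(H, A)\):

```python
import numpy as np

def gcn_forward(X, A, layers):
    """Compose layer functions: H^(l+1) = f_l(H^(l), A), with H^(0) = X."""
    H = X
    for f in layers:
        H = f(H, A)
    return H   # H^(L) = Z, the N x F node-level output
```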
  GCNs Part II: A simple example

  As an example, let's consider the following very simple form of a layer-wise propagation rule:
  \[f(H^{(l)}, A) = \sigma\left( AH^{(l)}W^{(l)}\right) \, ,\]
   where \(W^{(l)}\) is a weight matrix for the \(l\)-th neural network layer and \(\sigma(\cdot)\) is a non-linear activation function like the \(\text{ReLU}\). Despite its simplicity, this model is already quite powerful (we'll come to that in a moment).
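   A minimal NumPy sketch of this rule (the toy graph, the feature dimensions, and the random weights are illustrative assumptions, not part of the original post):

```python
import numpy as np

def simple_gcn_layer(H, A, W):
    """f(H, A) = ReLU(A H W) -- the naive propagation rule from above."""
    return np.maximum(A @ H @ W, 0)

# Toy example: 4 nodes in a path graph, 2 input features, 3 hidden units
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
H = np.random.randn(4, 2)                # H^(0) = X
W = np.random.randn(2, 3)                # W^(0)
print(simple_gcn_layer(H, A, W).shape)   # (4, 3)
```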
   But first, let us address two limitations of this simple model: multiplication with \(A\) means that, for every node, we sum up all the feature vectors of all neighboring nodes but not the node itself (unless there are self-loops in the graph). We can "fix" this by enforcing self-loops in the graph: we simply add the identity matrix to \(A\).
   The second major limitation is that \(A\) is typically not normalized and therefore the multiplication with \(A\) will completely change the scale of the feature vectors (we can understand this by looking at the eigenvalues of \(A\)). Normalizing \(A\) such that all rows sum to one, i.e. \(D^{-1}A\), where \(D\) is the diagonal node degree matrix, gets rid of this problem. Multiplying with \(D^{-1}A\) now corresponds to taking the average of neighboring node features. In practice, dynamics get more interesting when we use a symmetric normalization, i.e. \(D^{-\frac{1}{2}}AD^{-\frac{1}{2}}\) (as this no longer amounts to mere averaging of neighboring nodes). Combining these two tricks, we essentially arrive at the propagation rule introduced in Kipf & Welling (2016):
  \[f(H^{(l)}, A) = \sigma\left( \hat{D}^{-\frac{1}{2}}\hat{A}\hat{D}^{-\frac{1}{2}}H^{(l)}W^{(l)}\right) \, ,\]
   with \(\hat{A} = A + I\), where \(I\) is the identity matrix and \(\hat{D}\) is the diagonal node degree matrix of \(\hat{A}\).
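   In code, the two tricks amount to a preprocessing step on \(A\), as in this NumPy sketch of the propagation rule above (a sketch under the stated shapes, not the reference implementation):

```python
import numpy as np

def normalize_adjacency(A):
    """Return D_hat^{-1/2} (A + I) D_hat^{-1/2} for a dense adjacency matrix A."""
    A_hat = A + np.eye(A.shape[0])           # add self-loops
    d_hat = A_hat.sum(axis=1)                # node degrees of A_hat
    D_inv_sqrt = np.diag(d_hat ** -0.5)      # D_hat^{-1/2}
    return D_inv_sqrt @ A_hat @ D_inv_sqrt

def gcn_layer(H, A_norm, W):
    """One layer of the renormalized propagation rule with a ReLU activation."""
    return np.maximum(A_norm @ H @ W, 0)
```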
   In the next section, we will take a closer look at how this type of model operates on a very simple example graph: Zachary's karate club network (make sure to check out the Wikipedia article!).
  GCNs Part III: Embedding the karate club network

    Karate club graph, colors denote communities obtained via modularity-based clustering (Brandes et al., 2008).
   Let's take a look at how our simple GCN model (see previous section or Kipf & Welling, 2016) works on a well-known graph dataset: Zachary's karate club network (see Figure above).
   We take a 3-layer GCN with randomly initialized weights. Now, even before training the weights, we simply insert the adjacency matrix of the graph and \(X = I\) (i.e. the identity matrix, as we don't have any node features) into the model. The 3-layer GCN now performs three propagation steps during the forward pass and effectively convolves the 3rd-order neighborhood of every node (all nodes up to 3 "hops" away). Remarkably, the model produces an embedding of these nodes that closely resembles the community structure of the graph (see Figure below). Remember that we have initialized the weights completely at random and have not performed any training updates so far!
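   A minimal sketch of this experiment (not the code behind the original figures; networkx for the graph, the tanh non-linearity, the hidden sizes, and the random seed are illustrative assumptions):

```python
import numpy as np
import networkx as nx

G = nx.karate_club_graph()
A = nx.to_numpy_array(G)                        # 34 x 34 adjacency matrix
N = A.shape[0]

# Renormalized adjacency: D_hat^{-1/2} (A + I) D_hat^{-1/2}
A_hat = A + np.eye(N)
D_inv_sqrt = np.diag(A_hat.sum(axis=1) ** -0.5)
A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt

rng = np.random.default_rng(42)
X = np.eye(N)                                   # no node features: X = I
sizes = [N, 4, 4, 2]                            # three layers, 2-D final embedding
H = X
for l in range(3):
    W = rng.normal(size=(sizes[l], sizes[l + 1]))  # random, untrained weights
    H = np.tanh(A_norm @ H @ W)                    # one propagation step per layer

print(H.shape)   # (34, 2): a 2-D embedding per node, before any training
```

   Plotting the two output dimensions of \(H\) per node already reveals groupings that resemble the communities in the figure above, even though the weights are untrained.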