Motivation: Because the underlying network structure is complex, shallow models cannot capture the highly non-linear network structure, which leads to sub-optimal network representations. How to find a method that effectively captures the highly non-linear network structure while preserving both the global and local structure is therefore an open and important problem. The paper proposes the Structural Deep Network Embedding method, namely SDNE, a semi-supervised deep model. The SDNE model has multiple layers of non-linear functions and is thereby able to capture the highly non-linear network structure. It then exploits the first-order and second-order proximity jointly to preserve the network structure:
The second-order proximity is used by the unsupervised component to capture the global network structure.
The first-order proximity is used as the supervised information in the supervised component to preserve the local network structure.
High non-linearity: the underlying structure of the network is highly non-linear.
Structure-preserving: The underlying structure of the network is very complex. The similarity of vertexes is dependent on both the local and global network structure. Therefore, how to simultaneously preserve the local and global structure is a tough problem.
Sparsity: Many real-world networks are often so sparse that utilizing only the very limited observed links is not enough to reach a satisfactory performance.
In this section, we first define the problem, then introduce the proposed semi-supervised deep model SDNE, and finally give some discussion and analysis of the model.

Definition 1. (Graph) A graph is denoted as G=(V,E), where V={v1,...,vn} represents n vertexes and E={ei,j} (i,j=1,...,n) represents the edges. Each edge ei,j is associated with a weight si,j≥0. For vi and vj not linked by an edge, si,j=0. Otherwise, for an unweighted graph si,j=1, and for a weighted graph si,j>0.

Definition 2. (First-Order Proximity) The first-order proximity describes the pairwise proximity between vertexes. For any pair of vertexes, if si,j>0, there exists positive first-order proximity between vi and vj. Otherwise, the first-order proximity between vi and vj is 0.

Definition 3. (Second-Order Proximity) The second-order proximity between a pair of vertexes describes the proximity of the pair's neighborhood structure. Let Nu={su,1,…,su,|V|} denote the first-order proximity between vu and the other vertexes. Then the second-order proximity is determined by the similarity of Nu and Nv.

Definition 4. (Network Embedding) Given a graph G=(V,E), network embedding aims to learn a mapping function f: vi ↦ yi ∈ R^d, where d≪|V|. The objective of the function is to make the similarity between yi and yj explicitly preserve the first-order and second-order proximity of vi and vj.

Recall: for more on these definitions, see the notes on the LINE paper (《LINE: Large-scale Information Network Embedding》).
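The two proximities in Definitions 2 and 3 can be illustrated with a small sketch. The toy adjacency matrix is made up for illustration, and cosine similarity is one assumed choice for comparing the neighborhood vectors Nu and Nv; the definitions themselves do not fix a particular similarity measure:

```python
import numpy as np

# Toy undirected, unweighted graph as adjacency matrix S (hypothetical example).
S = np.array([
    [0., 1., 1., 0.],
    [1., 0., 1., 0.],
    [1., 1., 0., 1.],
    [0., 0., 1., 0.],
])

def first_order_proximity(S, i, j):
    """First-order proximity is just the edge weight s_ij (0 if no edge)."""
    return S[i, j]

def second_order_proximity(S, u, v):
    """Second-order proximity: similarity of the neighborhood vectors
    N_u = (s_u1, ..., s_u|V|) and N_v (cosine similarity assumed here)."""
    nu, nv = S[u], S[v]
    denom = np.linalg.norm(nu) * np.linalg.norm(nv)
    return float(nu @ nv / denom) if denom > 0 else 0.0

print(first_order_proximity(S, 0, 3))   # 0.0: no direct edge
print(second_order_proximity(S, 0, 1))  # positive: v0 and v1 share neighbor v2
```

Note that v0 and v3 have zero first-order proximity but nonzero second-order proximity through their shared neighbor v2, which is exactly why preserving both captures more structure than either alone.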
3.2 The Model
3.2.1 Framework
SDNE is a semi-supervised deep model whose framework is shown in Figure 2.
3.2.2 Loss Functions
Table 1 defines the terms and notation that will be used later.
First, recall the key idea of the deep autoencoder:
It consists of two parts, the encoder and the decoder:
The encoder consists of multiple non-linear functions that map the input data to the representation space, and the decoder consists of multiple non-linear functions that map the representations back to the reconstruction space.
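A minimal sketch of this encoder/decoder structure, assuming toy layer sizes (|V|=6, embedding dimension d=2) and random stand-in weights; an actual model would learn W and b by minimizing the SDNE loss:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigma(x):
    # Sigmoid non-linearity applied between layers.
    return 1.0 / (1.0 + np.exp(-x))

# Encoder layer sizes |V| -> 4 -> d; weights are random stand-ins for
# what training would actually learn (hypothetical toy dimensions).
enc_sizes = [6, 4, 2]
W_enc = [rng.standard_normal((a, b)) * 0.1 for a, b in zip(enc_sizes, enc_sizes[1:])]
b_enc = [np.zeros(n) for n in enc_sizes[1:]]

# Decoder mirrors the encoder: d -> 4 -> |V|.
dec_sizes = enc_sizes[::-1]
W_dec = [rng.standard_normal((a, b)) * 0.1 for a, b in zip(dec_sizes, dec_sizes[1:])]
b_dec = [np.zeros(n) for n in dec_sizes[1:]]

def forward(x, Ws, bs):
    # Y^(k) = sigma(Y^(k-1) W^(k) + b^(k)), with Y^(0) = x.
    y = x
    for Wk, bk in zip(Ws, bs):
        y = sigma(y @ Wk + bk)
    return y

x = np.array([0., 1., 1., 0., 0., 1.])     # adjacency row of one vertex
y = forward(x, W_enc, b_enc)               # embedding in R^d
x_hat = forward(y, W_dec, b_dec)           # reconstruction of x
print(y.shape, x_hat.shape)                # (2,) (6,)
```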
Since Y = σ(Y(K−1)W(K) + b(K)), the second term ∂Y/∂W(K) is easy to compute. For the first term ∂L1st/∂Y, we have:

∂L1st/∂Y = 2(L + LT)·Y    (11)
Similarly, using back-propagation, we can iteratively obtain the partial derivatives of L1st with respect to the remaining parameters.
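Eq. (11) can be sanity-checked numerically, assuming the first-order term L1st = Σi,j si,j‖yi − yj‖² = 2·tr(YᵀLY) with the graph Laplacian L = D − S; the toy graph and embedding below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy symmetric weighted adjacency S and embedding matrix Y (n=3 vertexes, d=2).
S = np.array([[0., 1., 0.],
              [1., 0., 2.],
              [0., 2., 0.]])
D = np.diag(S.sum(axis=1))
L = D - S                      # graph Laplacian
Y = rng.standard_normal((3, 2))

def L1st(Y):
    # First-order loss: sum_ij s_ij * ||y_i - y_j||^2 == 2 * tr(Y^T L Y)
    return 2.0 * np.trace(Y.T @ L @ Y)

analytic = 2.0 * (L + L.T) @ Y          # gradient from Eq. (11)

# Central finite-difference check of the analytic gradient.
eps = 1e-6
numeric = np.zeros_like(Y)
for i in range(Y.shape[0]):
    for j in range(Y.shape[1]):
        Yp = Y.copy(); Yp[i, j] += eps
        Ym = Y.copy(); Ym[i, j] -= eps
        numeric[i, j] = (L1st(Yp) - L1st(Ym)) / (2 * eps)

print(np.max(np.abs(analytic - numeric)))  # small: the gradients agree
```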
Now that we have the partial derivatives with respect to the parameters, the proposed deep model can be optimized with SGD after initializing the parameters. Note that because the model is highly non-linear, there are many local optima in the parameter space. Therefore, to find a good region of the parameter space, we first pretrain the parameters with a Deep Belief Network, which has been shown in the literature to be an essential parameter initialization for deep learning.
The complete SDNE algorithm is presented in Alg. 1.
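As a rough sketch of the gradient-descent optimization applied to the first-order term alone (the full SDNE loss also includes the reconstruction and regularization terms, and the paper optimizes the network weights rather than Y directly; this toy setup only illustrates that following the Eq. (11) gradient pulls linked vertexes together):

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy graph: Laplacian of a small weighted adjacency matrix (hypothetical).
S = np.array([[0., 1., 0.],
              [1., 0., 2.],
              [0., 2., 0.]])
L = np.diag(S.sum(axis=1)) - S
Y = rng.standard_normal((3, 2))    # embeddings, optimized directly here

lr = 0.01
for _ in range(200):
    grad = 2.0 * (L + L.T) @ Y     # gradient of L1st from Eq. (11)
    Y -= lr * grad

# Embeddings of linked vertexes are pulled together, so the first-order
# loss 2 * tr(Y^T L Y) shrinks toward 0.
print(2.0 * np.trace(Y.T @ L @ Y))
```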
3.3 Analysis and Discussions
New vertexes
A practical issue for network embedding is how to learn representations for newly arriving vertexes. For a new vertex vk, if its connections to the existing vertexes are known, we can obtain its adjacency vector x, where si,k denotes the similarity between the existing vi and the new vertex vk. We can then simply feed x into our deep model and use the trained parameters θ to obtain the representation of vk. For the case where the adjacency vector is unknown, neither SDNE nor the state-of-the-art methods can handle it.
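A sketch of this out-of-sample step, with a hypothetical trained two-layer encoder whose parameters θ are random stand-ins (in practice these would come from training SDNE on the existing graph):

```python
import numpy as np

def sigma(x):
    return 1.0 / (1.0 + np.exp(-x))

# Stand-in "trained" encoder parameters theta for a 4-vertex graph,
# embedding dimension d=2 (random placeholders for learned weights).
rng = np.random.default_rng(2)
W1, b1 = rng.standard_normal((4, 3)) * 0.1, np.zeros(3)
W2, b2 = rng.standard_normal((3, 2)) * 0.1, np.zeros(2)

def embed(x):
    """Feed an adjacency vector through the trained encoder."""
    return sigma(sigma(x @ W1 + b1) @ W2 + b2)

# New vertex v_k whose links to the existing vertexes are known:
# x_k[i] = s_{i,k}, the weight of the edge between v_i and v_k.
x_k = np.array([1., 0., 1., 0.])
y_k = embed(x_k)     # representation of the new vertex, no retraining needed
print(y_k.shape)     # (2,)
```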
Training Complexity:
It is easy to see that the training complexity of our model is O(ncdI), where n is the number of vertexes, d is the maximum dimension of the hidden layers, c is the average degree of the network, and I is the number of iterations.
4. EXPERIMENTS
4.1 Datasets
Three social networks, one citation network, and one language network are used for three real-world applications, i.e., multi-label classification, link prediction, and visualization.
Detailed statistics of the datasets are shown in Table 2.
4.2 Baseline Algorithms
DeepWalk: It adopts random walks and the skip-gram model to generate network representations.
LINE: It defines loss functions that preserve the first-order and second-order proximity separately. After optimizing the loss functions, it concatenates the two representations.
GraRep: It extends the approach to high-order proximity and uses SVD to train the model. It also directly concatenates the first-order and high-order representations.
Laplacian Eigenmaps (LE) : It generates network representations by factorizing the Laplacian matrix of the adjacency matrix. It only exploits the first-order proximity to preserve the network structure.
Common Neighbor : It only uses the number of common neighbors to measure the similarity between vertexes. It is used as the baseline only in the task of link prediction.