Graph Embedding

카테고리 없음

Graph Embedding - I

Jacob_ 2022. 8. 24. 09:08

그래프 임베딩 추천 강의
https://www.boostcourse.org/ai211/joinLectures/323421?isDesc=false

Node Embedding
기계학습의 다양한 툴은 벡터로 표현된 데이터를 입력으로 받습니다. 이러한 툴들을 이용해서 그래프를 분석할 수 있을까요? 그래프를 벡터로 나타낼 수 있으면 가능하겠죠? 이번 강의에서는 그래프의 정점(Node)을 벡터로 표현하는 방법인 정점 임베딩(Node Embedding)에 대해 배웁니다. 정점을 어떻게 벡터로 표현하는지, 정점 사이의 유사성을 어떻게 표현하는지 집중하면서 강의를 들어봅시다!

그래프 임베딩 요약 블로그 (published in WATCHA)

https://medium.com/watcha/%EA%B7%B8%EB%9E%98%ED%94%84-%EC%9E%84%EB%B2%A0%EB%94%A9-%EC%9A%94%EC%95%BD-bc2732048999

그래프 임베딩 요약

이 글은 Primož Godec의 Graph Embeddings — The Summary를 허락받고 번역한 글입니다.

medium.com

위 글의 원문 (아래는 원문 영문, 위는 번역본 한글)
https://towardsdatascience.com/graph-embeddings-the-summary-cc6075aba007

Graph Embeddings — The Summary

This article present what graph embeddings are, their use, and the comparison of the most commonly used graph embedding approaches.

towardsdatascience.com

쉽게 참고할 만한 자료
개쉬운 그래프 뉴럴 네트워크 Graph Neural Network - 유튜브 동영상

https://www.youtube.com/watch?v=9eMbvfRM9_8

Neural Graph Collaborative Filtering 논문

https://arxiv.org/pdf/1905.08108v2.pdf

Introduction to Node Embeddings

What are node embeddings?
How to generate node embeddings?
When can we even use node embeddings?

임베딩(embedding)은 영어 사전에 따르면 물질이나 단단한 물체에 무언가를 고정하는 것을 의미합니다. 그래프를 사용하면 전체 그래프를 N차원 공간에 매핑하는 것을 의미합니다. 아래의 예를 살펴보십시오. 이 예에서는 2차원 공간의 모든 노드를 매핑했습니다. 이제 그래프에 두 개의 클러스터(또는 커뮤니티)가 있음이 분명해야 합니다. 우리 인간에게는 2차원 공간에서 클러스터를 식별하는 것이 더 쉽습니다. 이 예에서는 그래프 레이아웃에서 클러스터를 쉽게 찾을 수도 있습니다.

To embed, per the English dictionary, means to fix something in a substance or solid object. With graphs, it would mean to map the whole graph in N-dimensional space. Take a look at the example below. In this example, we mapped all the nodes in 2-dimensional space. Now, it should be obvious we have two clusters (or communities) in the graph. For us humans, it is easier to identify clusters in 2-dimensional space. In this example, it is also easy to spot clusters just from the graph layout, but imagine the graph having 1000 nodes - things aren’t as straightforward anymore.

그러나 1000개의 노드가 있는 그래프를 상상해 보세요. 상황이 더 이상 간단하지 않습니다. 또한 컴퓨터의 경우 노드 임베딩(숫자의 벡터)으로 작업하는 것이 더 쉽습니다. 그래프에서만 계산하는 것보다 N차원 공간의 임베딩에서 2개의 노드가 얼마나 유사한(공간에서 가까운) 계산하는 것이 더 쉽기 때문입니다. 반면에 그래프만으로는 두 노드의 근접도를 계산할 수 있는 적절한 방법이 없습니다. 최단 경로 알고리즘과 같은 것을 사용할 수 있지만 그 자체로는 충분하지 않습니다. 벡터를 사용하면 더 쉽습니다. 가장 자주 사용되는 메트릭을 코사인 유사도라고 합니다.

Furthermore, for a computer, it is easier to work with node embeddings (vectors of numbers), because it is easier to calculate how similar (close in space) 2 nodes are from embeddings in N-dimensional space than it would be to calculate from a graph only. On the other hand, there is no proper way how we could calculate the closeness of two nodes just from the graph. You could use something like the shortest path algorithm, but that itself is not representative enough. With vectors, it’s easier. The most often used metric is called cosine similarity.

How to generate node embeddings?
Researches have divided these methods into three broad categories:
- Factorization based
- Random Walk based
- Deep Learning based

https://memgraph.com/blog/introduction-to-node-embedding

Introduction to Node Embedding

Graphs for stream devs: Find out what node embeddings are, how to generate them and where they can be used

memgraph.com

그래프 관련 논문 자료

Graph Convolutional Neural Networks

그래프에서 직접 작동하는 컨볼루션 신경망의 효율적인 변형을 기반으로 하는 그래프 구조 데이터에 대한 반 지도 학습을 위한 확장 가능한 접근 방식을 제시합니다. 스펙트럼 그래프 컨볼루션의 지역화된 1차 근사를 통해 컨볼루션 아키텍처를 선택하도록 동기를 부여합니다. 우리 모델은 그래프 에지의 수에서 선형으로 확장되고 로컬 그래프 구조와 노드의 기능을 모두 인코딩하는 은닉층 표현을 학습합니다. 인용 네트워크 및 지식 그래프 데이터 세트에 대한 여러 실험에서 우리의 접근 방식이 관련 방법보다 상당한 차이가 있음을 보여줍니다.

Graph Attention Network

우리는 그래프 구조 데이터에서 작동하는 새로운 신경망 아키텍처인 그래프 주의 네트워크(Graph Attention Networks, GAT)를 제시하며, 마스크된 자기 주의 레이어를 활용하여 그래프 컨볼루션 또는 그 근사를 기반으로 하는 이전 방법의 단점을 해결합니다. 노드가 이웃의 기능을 처리할 수 있는 레이어를 쌓음으로써 값비싼 행렬 연산(예: 반전)이 필요하지 않거나 그래프를 아는 데 의존하지 않고도 이웃의 다른 노드에 다른 가중치를 (암시적으로) 지정할 수 있습니다. 구조를 미리. 이러한 방식으로 스펙트럼 기반 그래프 신경망의 몇 가지 주요 문제를 동시에 해결하고 귀납적 문제와 변환적 문제에 쉽게 적용할 수 있는 모델을 만듭니다.

We present graph attention networks (GATs), novel neural network architectures that operate on graph-structured data, leveraging masked self-attentional layers to address the shortcomings of prior methods based on graph convolutions or their approximations. By stacking layers in which nodes are able to attend over their neighborhoods' features, we enable (implicitly) specifying different weights to different nodes in a neighborhood, without requiring any kind of costly matrix operation (such as inversion) or depending on knowing the graph structure upfront. In this way, we address several key challenges of spectral-based graph neural networks simultaneously, and make our model readily applicable to inductive as well as transductive problems. Our GAT models have achieved or matched state-of-the-art results across four established transductive and inductive graph benchmarks: the Cora, Citeseer and Pubmed citation network datasets, as well as a protein-protein interaction dataset (wherein test graphs remain unseen during training).

GraphSage

큰 그래프에서 노드의 저차원 임베딩은 콘텐츠 추천에서 단백질 기능 식별에 이르기까지 다양한 예측 작업에서 매우 유용한 것으로 입증되었습니다. 그러나 대부분의 기존 접근 방식에서는 임베딩 교육 중에 그래프의 모든 노드가 있어야 합니다. 이러한 이전 접근 방식은 본질적으로 변환적이며 자연적으로 보이지 않는 노드로 일반화되지 않습니다. 여기에서는 노드 기능 정보(예: 텍스트 속성)를 활용하여 이전에 볼 수 없었던 데이터에 대한 노드 임베딩을 효율적으로 생성하는 일반적인 귀납적 프레임워크인 GraphSAGE를 제시합니다. 각 노드에 대한 개별 임베딩을 훈련하는 대신 노드의 로컬 이웃에서 기능을 샘플링하고 집계하여 임베딩을 생성하는 함수를 학습합니다. 우리의 알고리즘은 세 가지 귀납적 노드 분류 벤치마크에서 강력한 기준선을 능가합니다.