# Translating Embeddings (TransE)

In this post I present **Translating Embeddings (TransE)**, a method for the prediction of missing relationships in knowledge graphs. Rather than digging into the mathematics, I focus on presenting the intuition behind this method by the help of explanatory animations using the excellent WebGL library Mathbox^2. Doing so, I hope that even a reader with limited mathematical or machine learning background will be able to understand the essence of TransE. Readers interested by a formal description can report to the original article.

I like to consider TransE as a textbook case model to describe link prediction in multi-relational data (knowledge graphs). It has a straight geometric visualisation which makes it particularly intuitive and easy to understand. TransE is a method which models relationships by interpreting them as **translations** operating not on the graph structure directly but on a learned **low-dimensional embedding** of the knowledge graph entities. The method has been published in NIPS 2013 by Antoine Bordes *et al.* and inspired from the work of Tomas Mikolov: word2vec.

Throughout the post, I will use an example from the Freebase data set (now replaced by wikidata) about book authors, their mutual influence and genre. The example is small enough to be easily visualised.

## 1. It all started with a Knowledge Graph

Think about your network of friends; a list of countries, their capitals and regions; or the books on your shelf, their authors and genre. This information can be conveniently represented as a graph connecting people, things, places in the world. A knowledge graph is composed of facts/statements about inter-related concrete or abstract entities.

There is a natural representation of the fact “*J.K. Rowling is influenced by C.S. Lewis*” in the form of a graph with two nodes/vertices “*J.K. Rowling*” and “*C.S. Lewis*” linked by a relationship/edge labeled “*is influenced by*” from the subject vertex “*J.K. Rowling*” to the object “*C.S. Lewis*“. Naturally, other facts can be added to this knowledge graph, involving different relation types such as the fact that both “*J.K. Rowling*” and “*C.S. Lewis*” have for writing genre “*Fantasy*“.

The visualisation below, illustrates the creation of our knowledge graph example composed of 9 vertices, 11 edges and 2 relation types. **Green vertices** represent entities of type “** Genre**“,

**blue vertices**vertices of type “

*“.*

**Person****Light green edges**represent the relationship type “

**” while**

*genre***light blue edges**represent relationships of type “

**“.**

*is influenced by*## 2. Link Prediction in Knowledge Graphs

Although knowledge graphs (KGs) contain a large number of facts (Freebase has around 2 billion triples), they are missing some entities and links. **Incompleteness** of KGs can substantially affect the efficiency of systems relying on them. Recent research effort is focusing on the development of automatic methods for the prediction of new implicit facts. An activity called **Link Prediction** or **Knowledge Graph completion**.

Link prediction models learn from local and global connectivity patterns from entities and relationships of different types at the same time. Relation predictions are then performed by using the learned patterns to generalise observed relationships between an entity of interest and all others. In our toy KG, we can see how local connectivity patterns may help the prediction of implicit genre and influence of “*J.K. Rowling*“.

## 3. Essence of Translating Embeddings (TransE)

Manipulating large knowledge graph can be challenging. To address **scalability **and performance issues, a state-of-the-art solution is to **embed knowledge graph** components (entities and relation types) in a low dimensional continuous vector space while preserving some properties of the original graph of importance for a given discovery task.

In TransE, relationships are represented as **translations **in the embedding space: if a fact “*subject, relation, object*” holds, then the embedding of the object entity **should be close** to the embedding of the subject entity plus some vector representing the relationship type. We want the learned embedding of \mathbf{subject+relation\;type\approx object} for a given fact in the KG while \mathbf{subject+relation} should be far away from the \mathbf{object} if the fact is wrong.

The visualisation below illustrates the iterative process by which the method learns entities and relationships embeddings to conform to the translation property. In this example, “*J.K. Rowling*” and “*C.S. Lewis*” will end up having a similar vector as their translation by the vector representing “*genre*” relationship type must be close to the embedding of “*Fantasy*” genre.

Once the model has learned an embedding vector for each entity and relationship type, predictions will be performed using the same translation approach in the embedding space. The prediction for a given subject – relation\;type is evaluated by picking the **nearest neighbor entity** of the translation of the subject by the relation\;type. In other terms, the prediction task corresponds to the query: “What is the most likely object associated to a given subject entity and a given relation type?”. The prediction in the embedding space can then be transformed into a **new triple fact** in the knowledge graph.

For example, if we want to predict the literature genre of “*Lloyd Alexander*“, we can use the translation of its embedding by the “*genre*” relationship type vector. The closest entity “*Fantasy*” is the most likely true answer.

## 4. Learning TransE

#### – Model optimisation objective:

The embeddings are learned over several iterations where the system will try to find particular vectors configuration for entities and relationships that follows the intuition: **the “error of translation” of a true fact (that exists in the KG) should be smaller by a certain margin than the “error of translation” of a wrong fact (that does not exist in the KG)**. The “error of translation” is called dissimilarity measure of a KG fact. In order to generate wrong facts, the system corrupt the subject or the object of an existing fact by replacing it with any other randomly picked entity in the KG.

#### – Initialisation in low dimensionsal space:

The learning process starts with the **random initialisation** of all entities and relationships embeddings in a space with low dimensions (e.g. 50 dimensions). This initialisation is meant to set them in an optimal range for applying optimisation and to avoid the particular case of having embeddings equal to zero

#### – Optimisation epochs:

Each learning iteration, also called “epoch”, starts with normalising the embeddings (this prevents the method to trivially optimise the model by increasing the relative distance between the entity embeddings). As in many Machine Learning models, the system iterates over a portion of the KG facts (training set) and perform an **optimisation task of the model objective**. After each iteration, the entities and relationships embeddings are updated so to optimise this objective. The learning process stops based on the model performance on a portion of the KG facts left apart (validation set).

An illustration of the learning phase of the model is presented in the next visualisation (click on pause to explore manually the 3D visualisation). “*J.K. Rowling*” and several literature genre entities embeddings are shown at epochs 0 (initialisaiton), 40, 80, 120, 160 an 200. The arrow materialise the “*genre*” relationship type vector for “*J.K. Rowling*” entity. At the 200th epoch, the most likely genre of J.K. Rowling are in descendent order of likelihood: **[Fantasy, Science fiction, Mystery fiction, Horror, Autobiography]**. For each genre is associated a score and a rank (descending relative distance compared to other object entities). Note that the model automatically learned that “*Autobiography*” is far away from “*Fantasy*” compared to relative proximity of “*Fantasy*” and “*Science fiction*“.