
Understanding Semantic Analysis – NLP

An introduction to semantic analysis in Natural Language Processing.

Semantic Analysis is a subfield of Natural Language Processing (NLP) that attempts to understand the meaning of Natural Language. Understanding Natural Language might seem a straightforward process to us as humans. However, due to the vast complexity and subjectivity involved in human language, interpreting it is quite a complicated task for machines. Semantic Analysis of Natural Language captures the meaning of the given text while taking into account context, the logical structuring of sentences, and grammatical roles.

Parts of Semantic Analysis

Semantic Analysis of Natural Language can be classified into two broad parts:

1. Lexical Semantic Analysis: Lexical Semantic Analysis involves understanding the meaning of each word of the text individually. It essentially amounts to fetching the dictionary meaning that a word in the text is intended to carry.

2. Compositional Semantics Analysis: Although knowing the meaning of each word of the text is essential, it is not sufficient to completely understand the meaning of the text.

For example, consider the following two sentences:

  • Sentence 1: Students love GeeksforGeeks.
  • Sentence 2: GeeksforGeeks loves Students.

Although both these sentences 1 and 2 use the same set of root words {student, love, geeksforgeeks}, they convey entirely different meanings.

Hence, under Compositional Semantics Analysis, we try to understand how combinations of individual words form the meaning of the text.

Tasks involved in Semantic Analysis

In order to understand the meaning of a sentence, the following are the major processes involved in Semantic Analysis:

  • Word Sense Disambiguation
  • Relationship Extraction

Word Sense Disambiguation:

In Natural Language, the meaning of a word may vary as per its usage in sentences and the context of the text. Word Sense Disambiguation involves interpreting the meaning of a word based upon the context of its occurrence in a text.

For example, the word ‘Bark’ may mean ‘the sound made by a dog’ or ‘the outermost layer of a tree.’

Likewise, the word ‘rock’ may mean ‘a stone’ or ‘a genre of music’; hence, the accurate meaning of the word is highly dependent upon its context and usage in the text.

Thus, the ability of a machine to overcome the ambiguity involved in identifying the meaning of a word based on its usage and context is called Word Sense Disambiguation.
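To make this concrete, here is a minimal, hedged sketch of dictionary-based Word Sense Disambiguation using the Lesk algorithm from NLTK. It assumes NLTK is installed along with its wordnet and punkt resources, and the example sentences are purely illustrative.

    # A minimal Word Sense Disambiguation sketch with the Lesk algorithm (NLTK).
    # Assumes: pip install nltk, plus nltk.download('wordnet') and nltk.download('punkt').
    from nltk.wsd import lesk
    from nltk.tokenize import word_tokenize

    sentences = [
        "The dog began to bark at the stranger near the gate",
        "The bark of the old oak tree was rough and cracked",
    ]

    for sent in sentences:
        tokens = word_tokenize(sent)
        sense = lesk(tokens, "bark")   # pick the WordNet sense that best overlaps the context
        print(sent)
        print("  sense:", sense, "-", sense.definition() if sense else "no sense found")

Lesk is a simple overlap heuristic, so the chosen sense will not always match intuition, but it illustrates how the surrounding context tokens drive the disambiguation.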

Relationship Extraction:

Another important task involved in Semantic Analysis is Relationship Extraction. It involves first identifying the various entities present in a sentence and then extracting the relationships between those entities (a small extraction sketch follows the figure below).

For example, consider the following sentence: 

Semantic Analysis is a topic of NLP which is explained on the GeeksforGeeks blog. The entities involved in this text, along with their relationships, are shown below.

[Figure: Entities in the example sentence and the relationships between them]
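As a rough illustration of the idea (not the exact pipeline behind the figure above), the sketch below uses spaCy to pull out named entities and a naive subject-verb-object triple. It assumes spaCy and its small English model en_core_web_sm are installed.

    # A minimal sketch of entity and relationship extraction with spaCy.
    # Assumes: pip install spacy && python -m spacy download en_core_web_sm
    import spacy

    nlp = spacy.load("en_core_web_sm")
    doc = nlp("Semantic Analysis is a topic of NLP which is explained on the GeeksforGeeks blog.")

    # Entities recognised by the statistical NER component.
    for ent in doc.ents:
        print("entity:", ent.text, "->", ent.label_)

    # A very naive relation: (subject, verb, object/attribute) triples from the dependency parse.
    for token in doc:
        if token.dep_ in ("nsubj", "nsubjpass"):
            verb = token.head
            objs = [child for child in verb.children if child.dep_ in ("dobj", "attr", "pobj")]
            for obj in objs:
                print("relation:", (token.text, verb.lemma_, obj.text))

Real relation extraction systems use far richer patterns or learned models, but the shape of the output, entities plus typed links between them, is the same.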

Elements of Semantic Analysis

Some of the critical elements of Semantic Analysis that must be scrutinized and taken into account while processing Natural Language are:

  • Hyponymy: Hyponymy refers to a term that is an instance of a generic term. The relationship can be understood using a class-object analogy. For example: ‘color’ is a hypernym, while ‘grey’, ‘blue’, ‘red’, etc., are its hyponyms.
  • Homonymy: Homonymy refers to two or more lexical terms with the same spelling but completely distinct meanings. For example: ‘rose’ might mean ‘the past form of rise’ or ‘a flower’; same spelling but different meanings, hence ‘rose’ is a homonym.
  • Synonymy: When two or more lexical terms that are spelt differently have the same or a similar meaning, they are called synonyms. For example: (Job, Occupation), (Large, Big), (Stop, Halt).
  • Antonymy: Antonymy refers to a pair of lexical terms with contrasting meanings, symmetric about a semantic axis. For example: (Day, Night), (Hot, Cold), (Large, Small).
  • Polysemy: Polysemy refers to a lexical term that has the same spelling but multiple closely related meanings. It differs from homonymy in that the meanings of a homonym need not be related. For example: ‘man’ may mean ‘the human species’, ‘a male human’, or ‘an adult male human’; since all these meanings bear a close association, the lexical term ‘man’ is polysemous.
  • Meronomy: Meronomy refers to a relationship wherein one lexical term is a constituent of some larger entity. For example: ‘wheel’ is a meronym of ‘automobile’.

Meaning Representation

While, as humans, it is pretty simple for us to understand the meaning of textual information, it is not so in the case of machines. Thus, machines tend to represent the text in specific formats in order to interpret its meaning. This formal structure that is used to understand the meaning of a text is called meaning representation.

Basic Units of Semantic System:

In order to accomplish Meaning Representation in Semantic Analysis, it is vital to understand the building units of such representations. The basic units of semantic systems are explained below:

  • Entity: An entity refers to a particular unit or individual, such as a person or a location. For example: GeeksforGeeks, Delhi, etc.
  • Concept: A concept may be understood as a generalization of entities. It refers to a broad class of individual units. For example: Learning Portals, Cities, Students.
  • Relations: Relations establish relationships between entities and concepts. For example: ‘GeeksforGeeks is a Learning Portal’, ‘Delhi is a City’, etc.
  • Predicate: Predicates represent the verb structures of sentences.

In Meaning Representation, we employ these basic units to represent textual information.
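As an illustration of how these units might be combined in code, here is a small, hypothetical sketch; the Entity, Concept, and Relation classes are invented for this example and are not part of any standard library.

    # A hypothetical, minimal encoding of entities, concepts, relations, and predicates.
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Entity:
        name: str                  # a particular individual, e.g. "GeeksforGeeks"

    @dataclass(frozen=True)
    class Concept:
        name: str                  # a class of individuals, e.g. "LearningPortal"

    @dataclass(frozen=True)
    class Relation:
        predicate: str             # verb structure linking the arguments
        subject: object
        obj: object

    facts = [
        Relation("is_a", Entity("GeeksforGeeks"), Concept("LearningPortal")),
        Relation("is_a", Entity("Delhi"), Concept("City")),
        Relation("love", Concept("Student"), Entity("GeeksforGeeks")),
    ]

    for f in facts:
        print(f.predicate, "(", f.subject.name, ",", f.obj.name, ")")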

Approaches to Meaning Representations:

Now that we are familiar with the basics of Meaning Representation, here are some of the most popular approaches to meaning representation (a worked FOPL example follows the list):

  • First-order predicate logic (FOPL)
  • Semantic Nets
  • Conceptual dependency (CD)
  • Rule-based architecture
  • Case Grammar
  • Conceptual Graphs
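As a small worked example of the first approach, the sentence "Students love GeeksforGeeks" could be written in first-order predicate logic roughly as follows; this is one plausible formalization among several.

    % One plausible FOPL reading of "Students love GeeksforGeeks".
    \forall x \, \big( \mathit{Student}(x) \rightarrow \mathit{Loves}(x, \mathit{GeeksforGeeks}) \big)
    % Compare "GeeksforGeeks loves Students", whose arguments are reversed:
    \forall x \, \big( \mathit{Student}(x) \rightarrow \mathit{Loves}(\mathit{GeeksforGeeks}, x) \big)

Note how the two sentences from the earlier example share the same predicate and constants but differ in argument order, which is exactly the difference Compositional Semantics must capture.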

Semantic Analysis Techniques

Based upon the end goal one is trying to accomplish, Semantic Analysis can be used in various ways. Two of the most common Semantic Analysis techniques are:

Text Classification

In Text Classification, our aim is to label the text according to the insights we intend to gain from the textual data; a minimal classification sketch follows the examples below.

For example:

  • In Sentiment Analysis, we try to label a text with the prominent emotion it conveys. It is highly beneficial when analyzing customer reviews for improvement.
  • In Topic Classification, we try to categorize a text into predefined categories. For example: identifying whether a research paper belongs to Physics, Chemistry or Maths.
  • In Intent Classification, we try to determine the intent behind a text message. For example: identifying whether an e-mail received at a customer care service is a query, complaint or request.
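Here is a minimal sketch of such a classifier using scikit-learn; the tiny training set and labels are illustrative assumptions standing in for real labeled reviews.

    # A minimal text (sentiment) classification sketch with scikit-learn.
    # Assumes: pip install scikit-learn
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Tiny illustrative dataset; a real system would use thousands of labeled reviews.
    texts = [
        "I love this product, it works great",
        "Absolutely fantastic experience, highly recommend",
        "Terrible quality, it broke after one day",
        "Worst purchase ever, very disappointed",
    ]
    labels = ["positive", "positive", "negative", "negative"]

    model = make_pipeline(TfidfVectorizer(), LogisticRegression())
    model.fit(texts, labels)

    print(model.predict(["the product is great and I recommend it"]))   # expected: ['positive']

The same pipeline shape applies to topic or intent classification; only the labels and the training texts change.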

Text Extraction

In Text Extraction, we aim at obtaining specific information from our text; a minimal keyword-extraction sketch follows the examples below.

For Example, 

  • In Keyword Extraction, we try to obtain the essential words that define the entire document.
  • In Entity Extraction, we try to obtain all the entities involved in a document.
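A minimal keyword-extraction sketch using TF-IDF scores is shown below; picking the top-weighted terms as "keywords" is only one simple heuristic among many, and the small corpus is an illustrative assumption.

    # A minimal keyword extraction sketch: rank terms of one document by TF-IDF weight.
    # Assumes: pip install scikit-learn
    from sklearn.feature_extraction.text import TfidfVectorizer

    corpus = [
        "Semantic analysis captures the meaning of text using context and grammar",
        "Machine learning models learn representations from large text corpora",
        "Semantic analysis and relationship extraction are common NLP tasks",
    ]

    vectorizer = TfidfVectorizer(stop_words="english")
    tfidf = vectorizer.fit_transform(corpus)

    doc_index = 2                                   # extract keywords for the third document
    row = tfidf[doc_index].toarray().ravel()
    terms = vectorizer.get_feature_names_out()
    top = sorted(zip(terms, row), key=lambda t: t[1], reverse=True)[:5]
    print([term for term, score in top if score > 0])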

Significance of Semantic Analysis

Semantic Analysis is a crucial part of Natural Language Processing (NLP). In the ever-expanding era of textual information, it is important for organizations to draw insights from such data to fuel their businesses. Semantic Analysis helps machines interpret the meaning of texts and extract useful information, thus providing invaluable data while reducing manual effort.

Besides, Semantic Analysis is also widely employed to facilitate automated answering systems such as chatbots, which answer user queries without any human intervention.


Natural Language Processing - Semantic Analysis

The purpose of semantic analysis is to draw the exact meaning, or dictionary meaning, from the text. The job of a semantic analyzer is to check the text for meaningfulness.

We already know that lexical analysis also deals with the meaning of words, so how is semantic analysis different from lexical analysis? Lexical analysis is based on smaller tokens, whereas semantic analysis focuses on larger chunks. That is why semantic analysis can be divided into the following two parts −

Studying meaning of individual word

It is the first part of the semantic analysis in which the study of the meaning of individual words is performed. This part is called lexical semantics.

Studying the combination of individual words

In the second part, the individual words will be combined to provide meaning in sentences.

The most important task of semantic analysis is to get the proper meaning of the sentence. For example, analyze the sentence “Ram is great.” In this sentence, the speaker is talking either about Lord Ram or about a person whose name is Ram. That is why the job of the semantic analyzer, to get the proper meaning of the sentence, is important.

Elements of Semantic Analysis

Following are some important elements of semantic analysis −

Hyponymy

It may be defined as the relationship between a generic term and instances of that generic term. Here the generic term is called the hypernym and its instances are called hyponyms. For example, the word color is a hypernym and the colors blue, yellow, etc. are its hyponyms.

Homonymy

It may be defined as words having the same spelling or form but different and unrelated meanings. For example, the word “bat” is a homonym because it can refer to an implement used to hit a ball or to a nocturnal flying mammal.

Polysemy

Polysemy is a Greek word meaning “many signs”. It refers to a word or phrase with different but related senses. In other words, a polysemous word has the same spelling but different and related meanings. For example, the word “bank” is a polysemous word with the following meanings −

A financial institution.

The building in which such an institution is located.

A synonym for “to rely on” (as in “bank on someone”).

Difference between Polysemy and Homonymy

Both polysemous and homonymous words have the same syntax or spelling. The main difference between them is that in polysemy the meanings of the word are related, whereas in homonymy they are not. For example, if we take the same word “bank” with the meanings ‘a financial institution’ and ‘a river bank’, it would be an example of homonymy, because the two meanings are unrelated to each other.
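A quick way to see these related and unrelated senses side by side is to look up a word in WordNet via NLTK. The sketch below assumes NLTK and its wordnet corpus are installed.

    # A minimal sketch: listing WordNet senses of "bank" to contrast polysemy and homonymy.
    # Assumes: pip install nltk and nltk.download('wordnet')
    from nltk.corpus import wordnet as wn

    for synset in wn.synsets("bank")[:6]:           # first few senses only
        print(synset.name(), "-", synset.definition())

Some of the returned senses (the financial institution and the building that houses it) are related, while others (the sloping land beside a river) are not, which is exactly the polysemy versus homonymy distinction described above.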

Synonymy

It is the relation between two lexical items having different forms but expressing the same or a close meaning. Examples are ‘author/writer’ and ‘fate/destiny’.

Antonymy

It is the relation between two lexical items having symmetry between their semantic components relative to an axis. The scope of antonymy is as follows −

Application of a property or not − for example, ‘life/death’, ‘certitude/incertitude’

Application of a scalable property − for example, ‘rich/poor’, ‘hot/cold’

Application of a usage − for example, ‘father/son’, ‘moon/sun’

Meaning Representation

Semantic analysis creates a representation of the meaning of a sentence. But before getting into the concepts and approaches related to meaning representation, we need to understand the building blocks of the semantic system.

Building Blocks of Semantic System

In word representation or representation of the meaning of the words, the following building blocks play an important role −

Entities − It represents an individual such as a particular person, location, etc. For example, Haryana, India, and Ram are all entities.

Concepts − It represents the general category of individuals, such as a person, city, etc.

Relations − It represents the relationship between entities and concepts. For example, ‘Ram is a person.’

Predicates − It represents the verb structures. For example, semantic roles and case grammar are examples of predicates.

Now, we can understand that meaning representation shows how to put together the building blocks of semantic systems. In other words, it shows how to combine entities, concepts, relations, and predicates to describe a situation. It also enables reasoning about the semantic world.

Approaches to Meaning Representations

Semantic analysis uses the following approaches for the representation of meaning −

  • First order predicate logic (FOPL)
  • Semantic Nets
  • Conceptual dependency (CD)
  • Rule-based architecture
  • Case Grammar
  • Conceptual Graphs

Need of Meaning Representations

A question that arises here is: why do we need meaning representation? The following are the reasons −

Linking of linguistic elements to non-linguistic elements

The very first reason is that with the help of meaning representation the linking of linguistic elements to the non-linguistic elements can be done.

Representing variety at lexical level

With the help of meaning representation, unambiguous, canonical forms can be represented at the lexical level.

Can be used for reasoning

Meaning representation can be used to reason for verifying what is true in the world as well as to infer the knowledge from the semantic representation.

Lexical Semantics

The first part of semantic analysis, studying the meaning of individual words, is called lexical semantics. It includes words, sub-words, affixes (sub-units), compound words, and phrases. All the words, sub-words, etc. are collectively called lexical items. In other words, lexical semantics is the relationship between lexical items, the meaning of sentences, and the syntax of sentences.

Following are the steps involved in lexical semantics −

  • Classification of lexical items like words, sub-words, affixes, etc.
  • Decomposition of lexical items like words, sub-words, affixes, etc.
  • Analysis of the differences as well as the similarities between various lexical semantic structures.

Meaning Representations for Natural Languages: Design, Models and Applications

  • Jeffrey Flanigan, Ishan Jindal, +3 authors, Nianwen Xue
  • Published at the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
  • Computer Science, Linguistics
  • DOI: 10.18653/v1/2022.emnlp-tutorials.1


Welcome to the Uniform Meaning Representation Project!

The Uniform Meaning Representation (UMR) project is a collaborative research program between faculty and students at the University of Colorado, Boulder, Brandeis University, and the University of New Mexico, whose goal is to design a meaning representation that can be used to annotate the semantic content of a text in any language.

UMR extends AMR to other languages, particularly morphologically complex, low-resource languages. UMR also adds features to AMR that are critical to semantic interpretation and enhances AMR by proposing a companion document-level representation that captures linguistic phenomena such as coreference as well as temporal and modal dependencies that potentially go beyond sentence boundaries. UMR is intended to be scalable, learnable, and cross-linguistically plausible. It is designed to support both lexical and logical inference.

Latest News

  • June 14, 2024: UMR Parsing Workshop at the University of Colorado, Boulder.
  • February, 2024: The 5th International Workshop on Designing Meaning Representations will be held in Torino, Italy, on May 21, 2024.
  • February, 2024: The UMR Summer School will be running June 9 - 15, 2024! Applications have been extended to February 9.
  • January, 2024: Julia Bonn and Jin Zhao hold a 2-day UMR tutorial at Georgetown University.
  • May 1, 2021: The UMR website is now up and running.

The UMR project is a collaboration between the University of Colorado, Boulder , Brandeis University , and the University of New Mexico , with funding from the National Science Foundation , IIS Division (Awards No. 1763926, 1764048, 1764091) and CNS Division (Awards No. 2213804, 2213805).


Meaning Representations for Natural Languages: Design, Models and Applications

This tutorial reviews the design of common meaning representations, SoTA models for predicting meaning representations, and the applications of meaning representations in a wide range of downstream NLP tasks and real-world applications. Reporting by a diverse team of NLP researchers from academia and industry with extensive experience in designing, building, and using meaning representations, our tutorial has three components: (1) an introduction to common meaning representations, including basic concepts and design challenges; (2) a review of SoTA methods on building models for meaning representations; and (3) an overview of applications of meaning representations in downstream NLP tasks and real-world applications.

Authors: Nianwen Xue, Jeffrey Flanigan, Tim O'Gorman, Ishan Jindal


Representation Learning and NLP

Zhiyuan Liu, Yankai Lin, and Maosong Sun

Open access chapter, first published online on 04 July 2020.

Natural languages are typical unstructured information. Conventional Natural Language Processing (NLP) heavily relies on feature engineering, which requires careful design and considerable expertise. Representation learning aims to learn representations of raw data as useful information for further classification or prediction. This chapter presents a brief introduction to representation learning, including its motivation and basic idea, and also reviews its history and recent advances in both machine learning and NLP.


1.1 Motivation

Machine learning addresses the problem of automatically learning computer programs from data. A typical machine learning system consists of three components [5]: data representation, an objective function, and an optimization algorithm.

That is, to build an effective machine learning system, we first transform useful information on raw data into internal representations such as feature vectors. Then by designing appropriate objective functions, we can employ optimization algorithms to find the optimal parameter settings for the system.

Data representation determines how much useful information can be extracted from raw data for further classification or prediction. If there is more useful information transformed from raw data to feature representations, the performance of classification or prediction will tend to be better. Hence, data representation is a crucial component to support effective machine learning.

Conventional machine learning systems adopt careful feature engineering as preprocessing to build feature representations from raw data. Feature engineering needs careful design and considerable expertise, and a specific task usually requires customized feature engineering algorithms, which makes feature engineering labor intensive, time consuming, and inflexible.

Representation learning aims to learn informative representations of objects from raw data automatically. The learned representations can be further fed as input to machine learning systems for prediction or classification. In this way, machine learning algorithms will be more flexible and desirable while handling large-scale and noisy unstructured data, such as speech, images, videos, time series, and texts.

Deep learning [ 9 ] is a typical approach for representation learning, which has recently achieved great success in speech recognition, computer vision, and natural language processing. Deep learning has two distinguishing features:

Distributed Representation . Deep learning algorithms typically represent each object with a low-dimensional real-valued dense vector, which is named as distributed representation . As compared to one-hot representation in conventional representation schemes (such as bag-of-words models), distributed representation is able to represent data in a more compact and smoothing way, as shown in Fig.  1.1 , and hence is more robust to address the sparsity issue in large-scale data.

Deep Architecture . Deep learning algorithms usually learn a hierarchical deep architecture to represent objects, known as multilayer neural networks. The deep architecture is able to extract abstractive features of objects from raw data, which is regarded as an important reason for the great success of deep learning for speech recognition and computer vision.

[Fig. 1.1: Distributed representation of words and entities in human languages]

Currently, the improvements caused by deep learning for NLP may still not be so significant as compared to speech and vision. However, deep learning for NLP has been able to significantly reduce the work of feature engineering in NLP in the meantime of performance improvement. Hence, many researchers are devoting to developing efficient algorithms on representation learning (especially deep learning) for NLP.

In this chapter, we will first discuss why representation learning is important for NLP and introduce the basic ideas of representation learning. Afterward, we will briefly review the development history of representation learning for NLP, introduce typical approaches of contemporary representation learning, and summarize existing and potential applications of representation learning. Finally, we will introduce the general organization of this book.

1.2 Why Representation Learning Is Important for NLP

NLP aims to build linguistic-specific programs for machines to understand languages. Natural language texts are typical unstructured data, with multiple granularities, multiple tasks, and multiple domains, which make NLP challenging to achieve satisfactory performance.

Multiple Granularities . NLP concerns about multiple levels of language entries, including but not limited to characters, words, phrases, sentences, paragraphs, and documents. Representation learning can help to represent the semantics of these language entries in a unified semantic space, and build complex semantic relations among these language entries.

Multiple Tasks . There are various NLP tasks based on the same input. For example, given a sentence, we can perform multiple tasks such as word segmentation, part-of-speech tagging, named entity recognition, relation extraction, and machine translation. In this case, it will be more efficient and robust to build a unified representation space of inputs for multiple tasks.

Multiple Domains . Natural language texts may be generated from multiple domains, including but not limited to news articles, scientific articles, literary works, and online user-generated content such as product reviews. Moreover, we can also regard texts in different languages as multiple domains. Conventional NLP systems have to design specific feature extraction algorithms for each domain according to its characteristics. In contrast, representation learning enables us to build representations automatically from large-scale domain data.

In summary, as shown in Fig.  1.2 , representation learning can facilitate knowledge transfer across multiple language entries, multiple NLP tasks, and multiple application domains, and significantly improve the effectiveness and robustness of NLP performance.

[Fig. 1.2: Distributed representation can provide a unified semantic space for multi-grained language entries and for multiple NLP tasks]

1.3 Basic Ideas of Representation Learning

In this book, we focus on the distributed representation scheme (i.e., embedding), and talk about recent advances of representation learning methods for multiple language entries, including words, phrases, sentences, and documents, and their closely related objects including sememe-based linguistic knowledge, entity-based world knowledge, networks, and cross-modal entries.

By distributed representation learning, all objects that we are interested in are projected into a unified low-dimensional semantic space. As demonstrated in Fig.  1.1 , the geometric distance between two objects in the semantic space indicates their semantic relatedness; the semantic meaning of an object is related to which objects are close to it. In other words, it is the relative closeness with other objects that reveals an object’s meaning rather than the absolute position.

1.4 Development of Representation Learning for NLP

In this section, we introduce the development of representation learning for NLP, also shown in Fig.  1.3 . To study representation schemes in NLP, words would be a good start, since they are the minimum units in natural languages. The easiest way to represent a word in a computer-readable way (e.g., using a vector) is one-hot vector , which has the dimension of the vocabulary size and assigns 1 to the word’s corresponding position and 0 to others. It is apparent that one-hot vectors hardly contain any semantic information about words except simply distinguishing them from each other.
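A minimal sketch of one-hot word vectors with NumPy is shown below; the toy vocabulary is an illustrative assumption.

    # A minimal one-hot representation: vocabulary-size vectors with a single 1.
    import numpy as np

    vocab = ["i", "love", "natural", "language", "processing"]
    word_to_index = {word: i for i, word in enumerate(vocab)}

    def one_hot(word: str) -> np.ndarray:
        vec = np.zeros(len(vocab))
        vec[word_to_index[word]] = 1.0
        return vec

    print(one_hot("language"))                      # [0. 0. 0. 1. 0.]
    # Any two distinct one-hot vectors are orthogonal, so they carry no similarity information.
    print(one_hot("love") @ one_hot("language"))    # 0.0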

One of the earliest ideas of word representation learning can date back to n -gram models [ 15 ]. It is easy to understand: when we want to predict the next word in a sequence, we usually look at some previous words (and in the case of n -gram, they are the previous \(n-1\) words). And if going through a large-scale corpus, we can count and get a good probability estimation of each word under the condition of all combinations of \(n-1\) previous words. These probabilities are useful for predicting words in sequences, and also form vector representations for words since they reflect the meanings of words.
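The counting idea behind n-gram models can be sketched in a few lines; the tiny corpus below is an illustrative assumption, and real models would add smoothing for unseen combinations.

    # A minimal bigram (n = 2) model: estimate P(next word | previous word) by counting.
    from collections import Counter, defaultdict

    corpus = "students love geeksforgeeks and students love learning nlp".split()

    bigram_counts = defaultdict(Counter)
    for prev, nxt in zip(corpus, corpus[1:]):
        bigram_counts[prev][nxt] += 1

    def prob(nxt: str, prev: str) -> float:
        total = sum(bigram_counts[prev].values())
        return bigram_counts[prev][nxt] / total if total else 0.0

    print(prob("love", "students"))     # 1.0: "students" is always followed by "love" here
    print(prob("learning", "love"))     # 0.5: "love" is followed by "geeksforgeeks" or "learning"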

[Fig. 1.3: The timeline of the development of representation learning in NLP. With growing computing power and large-scale text data, distributed representation trained with neural networks on large corpora has become the mainstream]

The idea of n-gram models is coherent with the distributional hypothesis: linguistic items with similar distributions have similar meanings [7]. In other words, “a word is characterized by the company it keeps” [6]. This became the fundamental idea behind many NLP models, from word2vec to BERT.

Another example of the distributional hypothesis is Bag-Of-Words (BOW) models [ 7 ]. BOW models regard a document as a bag of its words, disregarding the orders of these words in the document. In this way, the document can be represented as a vocabulary-size vector, in which each word that has appeared in the document corresponds to a unique and nonzero dimension. Then a score can be further computed for each word (e.g., the numbers of occurrences) to indicate the weights of these words in the document. Though very simple, BOW models work great in applications like spam filtering, text classification, and information retrieval, proving that the distributions of words can serve as a good representation for text.
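A minimal bag-of-words sketch with scikit-learn's CountVectorizer is shown below; the two example documents are illustrative.

    # A minimal bag-of-words representation: each document becomes a vector of word counts.
    # Assumes: pip install scikit-learn
    from sklearn.feature_extraction.text import CountVectorizer

    docs = [
        "students love geeksforgeeks",
        "geeksforgeeks loves students and students love learning",
    ]

    vectorizer = CountVectorizer()
    bow = vectorizer.fit_transform(docs)

    print(vectorizer.get_feature_names_out())   # vocabulary learned from the corpus
    print(bow.toarray())                        # word-count vector per document; word order is lost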

In the above cases, each value in the representation clearly matches one entry (e.g., word scores in BOW models). This one-to-one correspondence between concepts and representation elements is called local representation or symbol-based representation , which is natural and simple.

In distributed representation , on the other hand, each entity (or attribute) is represented by a pattern of activation distributed over multiple elements, and each computing element is involved in representing multiple entities [ 11 ]. Distributed representation has been proved to be more efficient because it usually has low dimensions that can prevent the sparsity issue. Useful hidden properties can be learned from large-scale data and emerged in distributed representation. The idea of distributed representation was originally inspired by the neural computation scheme of humans and other animals. It comes from neural networks (activations of neurons), and with the great success of deep learning, distributed representation has become the most commonly used approach for representation learning.

One of the pioneering practices of distributed representation in NLP is the Neural Probabilistic Language Model (NPLM) [1]. A language model predicts the joint probability of sequences of words (n-gram models are simple language models). NPLM first assigns a distributed vector to each word, then uses a neural network to predict the next word. By going through the training corpora, NPLM successfully learns how to model the joint probability of sentences, while producing word embeddings (i.e., low-dimensional word vectors) as learned parameters. Though it is hard to tell what each element of a word embedding actually means, the vectors indeed encode semantic information about the words, as verified by the performance of NPLM.

Inspired by NPLM, there came many methods that embed words into distributed representations and use the language modeling objective to optimize them as model parameters. Famous examples include word2vec [ 12 ], GloVe [ 13 ], and fastText [ 3 ]. Though differing in detail, these methods are all very efficient to train, utilize large-scale corpora, and have been widely adopted as word embeddings in many NLP models. Word embeddings in the NLP pipeline map discrete words into informative low-dimensional vectors, and help to shine a light on neural networks in computing and understanding languages. It makes representation learning a critical part of natural language processing.
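A minimal sketch of training word embeddings with gensim's word2vec implementation is shown below; the toy corpus is far too small to learn useful vectors and is only meant to show the API shape (it assumes gensim 4.x is installed).

    # A minimal word2vec training sketch with gensim (toy corpus, illustrative only).
    # Assumes: pip install gensim  (4.x API: vector_size instead of the older size argument)
    from gensim.models import Word2Vec

    sentences = [
        ["students", "love", "natural", "language", "processing"],
        ["machines", "learn", "representations", "from", "text"],
        ["word", "embeddings", "capture", "semantic", "similarity"],
    ]

    model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, sg=1, epochs=50)

    print(model.wv["language"][:5])                   # first few dimensions of a learned vector
    print(model.wv.most_similar("language", topn=3))  # nearest neighbours in the embedding space

With a real corpus of millions of sentences, the nearest neighbours returned by most_similar start to reflect genuine semantic relatedness, which is what makes these embeddings useful as inputs to other NLP models.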

The research on representation learning in NLP took a big leap when ELMo [14] and BERT [4] came out. Besides using larger corpora, more parameters, and more computing resources than word2vec, they also take the complicated context in text into consideration: instead of assigning each word a fixed vector, ELMo and BERT use multilayer neural networks to calculate dynamic representations for words based on their context, which is especially useful for words with multiple meanings. Moreover, BERT popularized (though did not originate) the pretraining and fine-tuning pipeline. Previously, word embeddings were simply adopted as input representations. After BERT, it became common practice to keep the same neural network structure, such as BERT, in both pretraining and fine-tuning, that is, taking the parameters of BERT for initialization and fine-tuning the model on downstream tasks (Fig. 1.4).

[Fig. 1.4: How word embeddings and pre-trained language models work in NLP pipelines. Both learn distributed representations for language entries (e.g., words) through pretraining objectives and transfer them to target tasks; pre-trained language models can also transfer model parameters]
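A minimal sketch of this pretraining and fine-tuning pipeline with the Hugging Face transformers library is shown below; it only loads a pretrained BERT and attaches a fresh classification head, leaving out the actual fine-tuning loop (the model name and label count are illustrative assumptions).

    # A minimal sketch: reuse pretrained BERT parameters and add a classification head
    # that would then be fine-tuned on a downstream task.
    # Assumes: pip install transformers torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=2   # pretrained encoder + freshly initialized classifier head
    )

    inputs = tokenizer("GeeksforGeeks explains semantic analysis.", return_tensors="pt")
    outputs = model(**inputs)               # contextual representations flow into the new head
    print(outputs.logits)                   # untrained head: logits are meaningless until fine-tuning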

Though not a big theoretical breakthrough, BERT-like models (also known as Pre-trained Language Models (PLMs), for they are pretrained with a language modeling objective on large corpora) have attracted wide attention in the NLP and machine learning communities, for they have been so successful and have achieved state-of-the-art results on almost every NLP benchmark. These models show what large-scale data and computing power can lead to, and new research on PLMs is emerging rapidly. Probing experiments demonstrate that PLMs implicitly encode a variety of linguistic knowledge and patterns inside their multilayer network parameters [8, 10]. All these significant performances and interesting analyses suggest that there are still a lot of open problems to explore in PLMs, as the future of representation learning for NLP.

Based on the distributional hypothesis, representation learning for NLP has evolved from symbol-based representation to distributed representation. Starting from word2vec, word embeddings trained from large corpora have shown significant power in most NLP tasks. Recently, emerged PLMs (like BERT) take complicated context into word representation and start a new trend of the pretraining fine-tuning pipeline, bringing NLP to a new level. What will be the next big change in representation learning for NLP? We hope the contents of this book can give you some inspiration.

1.5 Learning Approaches to Representation Learning for NLP

People have developed various effective and efficient approaches to learn semantic representations for NLP. Here we list some typical approaches.

Statistical Features : As introduced before, semantic representations for NLP in the early stage often come from statistics, instead of emerging from the optimization process. For example, in n -gram or bag-of-words models, elements in the representation are usually frequencies or numbers of occurrences of the corresponding entries counted in large-scale corpora.

Hand-craft Features : In certain NLP tasks, syntactic and semantic features are useful for solving the problem. For example, types of words and entities, semantic roles and parse trees, etc. These linguistic features may be provided with the tasks or can be extracted by specific NLP systems. In a long period before the wide use of distributed representation, researchers used to devote lots of effort into designing useful features and combining them as the inputs for NLP models.

Supervised Learning : Distributed representations emerge from the optimization process of neural networks under supervised learning. In the hidden layers of neural networks, the different activation patterns of neurons represent different entities or attributes. With a training objective (usually a loss function for the target task) and supervised signals (usually the gold-standard labels for training instances of the target tasks), the networks can learn better parameters via optimization (e.g., gradient descent). With proper training, the hidden states will become informative and generalized as good semantic representations of natural languages.

For example, to train a neural network for a sentiment classification task, the loss function is usually set as the cross-entropy of the model predictions with respect to the gold-standard sentiment labels as supervision. While optimizing the objective, the loss gets smaller, and the model performance gets better. In the meantime, the hidden states of the model gradually form good sentence representations by encoding the necessary information for sentiment classification inside the continuous hidden space.
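The sketch below illustrates this setup in PyTorch: a tiny classifier over fixed sentence vectors, trained with cross-entropy against gold labels. All tensors here are random stand-ins for real encoded sentences and labels.

    # A minimal supervised-learning sketch: hidden states optimized via cross-entropy loss.
    # Assumes: pip install torch; inputs are random stand-ins for encoded sentences.
    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    sentence_vectors = torch.randn(32, 100)          # 32 "sentences", 100-dim input features
    gold_labels = torch.randint(0, 2, (32,))         # binary sentiment labels as supervision

    model = nn.Sequential(
        nn.Linear(100, 64), nn.ReLU(),               # hidden layer: learned sentence representation
        nn.Linear(64, 2),                            # output layer: class scores
    )
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = nn.CrossEntropyLoss()

    for step in range(100):
        optimizer.zero_grad()
        logits = model(sentence_vectors)
        loss = loss_fn(logits, gold_labels)          # smaller loss -> better fit to the gold labels
        loss.backward()
        optimizer.step()

    print("final training loss:", loss.item())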

Self-supervised Learning : In some cases, we simply want to get good representations for certain elements, so that these representations can be transferred to other tasks. For example, in most neural NLP models, words in sentences are first mapped to their corresponding word embeddings (maybe from word2vec or GloVe) before sent to the networks. However, there are no human-annotated “labels” for learning word embeddings. To acquire the training objective necessary for neural networks, we need to generate “labels” intrinsically from existing data. This is called self-supervised learning (one way for unsupervised learning).

For example, language modeling is a typical “self-supervised” objective, for it does not require any human annotations. Based on the distributional hypothesis, using the language modeling objective can lead to hidden representations that encode the semantics of words. You may have heard of a famous equation: \(\mathbf {w}(\mathtt {king}) - \mathbf {w}(\mathtt {man}) + \mathbf {w}(\mathtt {woman}) = \mathbf {w}(\mathtt {queen})\) , which demonstrates the analogical properties that the word embeddings have possessed through self-supervised learning.

We can see another angle of self-supervised learning in autoencoders. It is also a way to learn representations for a set of data. Typical autoencoders have a reduction (encoding) phase and a reconstruction (decoding) phase. In the reduction phase, an item from the data is encoded into a low-dimensional representation, and in the reconstruction phase, the model tries to reconstruct the item from the intermediate representation. Here, the training objective is the reconstruction loss, derived from the data itself. During the training process, meaningful information is encoded and kept in the latent representation, while noise signals are discarded.
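A minimal autoencoder sketch in PyTorch is shown below; the random data stands in for real feature vectors, and the reconstruction loss is derived from the data itself, with no external labels.

    # A minimal autoencoder: encode to a low-dimensional representation, then reconstruct.
    # Assumes: pip install torch; random vectors stand in for real data.
    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    data = torch.randn(256, 50)                             # 256 items, 50-dimensional features

    encoder = nn.Sequential(nn.Linear(50, 8), nn.ReLU())    # reduction phase -> 8-dim latent code
    decoder = nn.Linear(8, 50)                              # reconstruction phase

    optimizer = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
    loss_fn = nn.MSELoss()

    for epoch in range(200):
        optimizer.zero_grad()
        latent = encoder(data)
        reconstruction = decoder(latent)
        loss = loss_fn(reconstruction, data)         # self-supervised: the "label" is the input itself
        loss.backward()
        optimizer.step()

    print("final reconstruction loss:", loss.item())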

Self-supervised learning has made a great success in NLP, for the plain text itself contains abundant knowledge and patterns about languages, and self-supervised learning can fully utilize the existing large-scale corpora. Nowadays, it is still the most exciting research area of representation learning for natural languages, and researchers continue to put their efforts into this direction.

Besides, many other machine learning approaches have also been explored in representation learning for NLP, such as adversarial training, contrastive learning, few-shot learning, meta-learning, continual learning, and reinforcement learning. How to develop more effective and efficient approaches to representation learning for NLP, and to better take advantage of large-scale and complicated corpora and computing power, is still an important research topic.

1.6 Applications of Representation Learning for NLP

In general, there are two kinds of applications of representation learning for NLP. In one case, the semantic representation is trained in a pretraining task (or designed by human experts) and is transferred to the model for the target task. Word embedding is an example of this kind of application: it is trained with a language modeling objective and is taken as input for other downstream NLP models. In this book, we will also introduce sememe knowledge representation and world knowledge representation, which can be integrated into some NLP systems as additional knowledge augmentation to enhance their performance in certain aspects.

In other cases, the semantic representation lies within the hidden states of the neural model and directly aims for better performance on target tasks in an end-to-end fashion. For example, many NLP tasks need to semantically compose sentence or document representations: tasks like sentiment classification, natural language inference, and relation extraction require sentence representations, and tasks like question answering need document representations. As shown in the latter part of the book, many representation learning methods have been developed for sentences and documents and benefit these NLP tasks.

1.7 The Organization of This Book

We start the book from word representation. By giving a thorough introduction to word representation, we hope the readers can grasp the basic ideas for representation learning for NLP. Based on that, we further talk about how to compositionally acquire the representation for higher level language components, from sentences to documents.

As shown in Fig. 1.5 , representation learning will be able to incorporate various types of structural knowledge to support a deep understanding of natural languages, named as knowledge-guided NLP. Hence, we next introduce two forms of knowledge representation that are closely related to NLP. On the one hand, sememe representation tries to encode linguistic and commonsense knowledge in natural languages. Sememe is defined as the minimum indivisible unit of semantic meaning [ 2 ]. With the help of sememe representation learning, we can get more interpretable and more robust NLP models. On the other hand, world knowledge representation studies how to encode world facts into continuous semantic space. It can not only help with knowledge graph tasks but also benefit knowledge-guided NLP applications.

[Fig. 1.5: The architecture of knowledge-guided NLP]

Besides, the network is also a natural way to represent objects and their relationships. In the network representation section, we study how to embed vertices and edges in a network and how these elements interact with each other. Through the applications, we further show how network representations can help NLP tasks.

Another interesting topic related to NLP is the cross-modal representation, which studies how to model unified semantic representations across different modalities (e.g., text, audios, images, videos, etc.). Through this section, we review several cross-modal problems along with representative models.

At the end of the book, we introduce some useful resources to the readers, including deep learning frameworks and open-source codes. We also share some views about the next big topics in representation learning for NLP. We hope that the resources and the outlook can help our readers have a better understanding of the content of the book, and inspire our readers about how representation learning in NLP would further develop.

References

[1] Yoshua Bengio, Réjean Ducharme, Pascal Vincent, and Christian Jauvin. A neural probabilistic language model. Journal of Machine Learning Research, 3(Feb):1137–1155, 2003.

[2] Leonard Bloomfield. A set of postulates for the science of language. Language, 2(3):153–164, 1926.

[3] Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics, 5:135–146, 2017.

[4] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of NAACL, 2019.

[5] Pedro Domingos. A few useful things to know about machine learning. Communications of the ACM, 55(10):78–87, 2012.

[6] John R. Firth. A synopsis of linguistic theory, 1930–1955. 1957.

[7] Zellig S. Harris. Distributional structure. Word, 10(2–3):146–162, 1954.

[8] John Hewitt and Christopher D. Manning. A structural probe for finding syntax in word representations. In Proceedings of NAACL-HLT, 2019.

[9] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016.

[10] Nelson F. Liu, Matt Gardner, Yonatan Belinkov, Matthew E. Peters, and Noah A. Smith. Linguistic knowledge and transferability of contextual representations. In Proceedings of NAACL-HLT, 2019.

[11] James L. McClelland, David E. Rumelhart, PDP Research Group, et al. Parallel distributed processing. Explorations in the Microstructure of Cognition, 2:216–271, 1986.

[12] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. Distributed representations of words and phrases and their compositionality. In Proceedings of NeurIPS, 2013.

[13] Jeffrey Pennington, Richard Socher, and Christopher Manning. GloVe: Global vectors for word representation. In Proceedings of EMNLP, 2014.

[14] Matthew Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. Deep contextualized word representations. In Proceedings of NAACL-HLT, pages 2227–2237, 2018.

[15] Claude E. Shannon. A mathematical theory of communication. Bell System Technical Journal, 27(3):379–423, 1948.

About this chapter

Liu, Z., Lin, Y., Sun, M. (2020). Representation Learning and NLP. In: Representation Learning for Natural Language Processing. Springer, Singapore. https://doi.org/10.1007/978-981-15-5573-2_1
