NLTK – WordNet

WordNet is a semantically oriented dictionary of English that groups words into sets of synonyms and records relations such as antonymy, along with brief definitions. The English WordNet bundled with NLTK contains 155,287 words and 117,659 synonym sets.

This tutorial walks through the main features of WordNet with examples.

Senses and Synonyms

WordNet organizes words into synsets, collections of synonymous words, also called "lemmas". Let's look at how synonyms are accessed in WordNet.

>>> from nltk.corpus import wordnet
>>> wordnet.synsets("computer")
[Synset('computer.n.01'), Synset('calculator.n.01')]

The word computer belongs to two synonym sets, each called a synset. A synset groups words that share one meaning, and a single word can appear in several synsets. Let's explore the first synset.

>>> wordnet.synset("computer.n.01").lemma_names()
['computer', 'computing_machine', 'computing_device', 'data_processor', 'electronic_computer', 'information_processing_system']

In this sense, the word computer can also be called a 'computing_machine', 'computing_device', 'data_processor', 'electronic_computer' or 'information_processing_system'.

Let’s get all lemmas for each synset.

>>> for e in wordnet.synsets("computer"):
...     print(f'{e} --> {e.lemma_names()}')
... 
Synset('computer.n.01') --> ['computer', 'computing_machine', 'computing_device', 'data_processor', 'electronic_computer', 'information_processing_system']
Synset('calculator.n.01') --> ['calculator', 'reckoner', 'figurer', 'estimator', 'computer']

A synset also provides a definition and, for many synsets, example sentences, which help a human reader understand the meaning of the word. Let's see the definition of 'computer':

>>> wordnet.synset("computer.n.01").definition()
'a machine for performing calculations automatically'

Let’s print all lemmas for a given synset –

>>> wordnet.synset('computer.n.01').lemmas()

[Lemma('computer.n.01.computer'), Lemma('computer.n.01.computing_machine'), Lemma('computer.n.01.computing_device'), Lemma('computer.n.01.data_processor'), Lemma('computer.n.01.electronic_computer'), Lemma('computer.n.01.information_processing_system')]

You can get the synset corresponding to a lemma –

>>> wordnet.lemma('computer.n.01.computing_device').synset()
Synset('computer.n.01')

Get the name of the lemma –

>>> wordnet.lemma('computer.n.01.computing_device').name()
'computing_device'

Hyponyms

WordNet also encodes lexical relations between synsets, such as hypernyms, hyponyms, meronyms, holonyms and antonyms.

Hyponyms are concepts that are more specific than a given synset (the inverse relation, hypernyms, gives more general concepts).

Let's look at the hyponyms of computer:

>>> syn = wordnet.synset('computer.n.01')
>>> syn.hyponyms()
[Synset('analog_computer.n.01'), Synset('digital_computer.n.01'), Synset('home_computer.n.01'), Synset('node.n.08'), Synset('number_cruncher.n.02'), Synset('pari-mutuel_machine.n.01'), Synset('predictor.n.03'), Synset('server.n.03'), Synset('turing_machine.n.01'), Synset('web_site.n.01')]

Let's get all lemmas of each hyponym synset –

>>> [lemma.name() for synset in syn.hyponyms() for lemma in synset.lemmas()]

['analog_computer', 'analogue_computer', 'digital_computer', 'home_computer', 'node', 'client', 'guest', 'number_cruncher', 'pari-mutuel_machine', 'totalizer', 'totaliser', 'totalizator', 'totalisator', 'predictor', 'server', 'host', 'Turing_machine', 'web_site', 'website', 'internet_site', 'site']

Antonyms

Antonymy is a lexical relation between lemmas (not synsets) that links a word to its opposite in meaning.

>>> wordnet.lemma('buy.v.01.buy').antonyms()
[Lemma('sell.v.01.sell')]

>>> wordnet.lemma('horizontal.a.01.horizontal').antonyms()
[Lemma('vertical.a.01.vertical'), Lemma('inclined.a.02.inclined')]

Semantic Similarity

In Natural Language Processing, semantic similarity measures between words are often useful, for example when indexing documents. WordNet provides several ways to find synsets with related meanings.

Let’s discover the semantic similarity in WordNet:

>>> vehicle = wordnet.synset('vehicle.n.01')
>>> train = wordnet.synset("train.n.01")
>>> car = wordnet.synset('car.n.01')
>>> bike = wordnet.synset('motorcycle.n.01')
>>> care = wordnet.synset("care.n.01")
>>>
>>> car.lowest_common_hypernyms(vehicle)
[Synset('vehicle.n.01')]
>>>
>>> train.lowest_common_hypernyms(vehicle)
[Synset('conveyance.n.03')]
>>>
>>> care.lowest_common_hypernyms(car)
[Synset('entity.n.01')]
>>>
>>> car.lowest_common_hypernyms(bike)
[Synset('motor_vehicle.n.01')]

Here, motor_vehicle (the lowest common hypernym of car and motorcycle) is quite specific, while entity (the lowest common hypernym of care and car) is completely general: the deeper the lowest common hypernym, the more closely related the two synsets are.

WordNet also provides several other features based on lexical relations, similarity measures, etc. Using dir(), you can get the list of methods defined on a synset.

>>> dir(wordnet.synset('car.n.01'))

['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__slots__', '__str__', '__subclasshook__', '__unicode__', '__weakref__', '_all_hypernyms', '_definition', '_examples', '_frame_ids', '_hypernyms', '_instance_hypernyms', '_iter_hypernym_lists', '_lemma_names', '_lemma_pointers', '_lemmas', '_lexname', '_max_depth', '_min_depth', '_name', '_needs_root', '_offset', '_pointers', '_pos', '_related', '_shortest_hypernym_paths', '_wordnet_corpus_reader', 'also_sees', 'attributes', 'causes', 'closure', 'common_hypernyms', 'definition', 'entailments', 'examples', 'frame_ids', 'hypernym_distances', 'hypernym_paths', 'hypernyms', 'hyponyms', 'in_region_domains', 'in_topic_domains', 'in_usage_domains', 'instance_hypernyms', 'instance_hyponyms', 'jcn_similarity', 'lch_similarity', 'lemma_names', 'lemmas', 'lexname', 'lin_similarity', 'lowest_common_hypernyms', 'max_depth', 'member_holonyms', 'member_meronyms', 'min_depth', 'name', 'offset', 'part_holonyms', 'part_meronyms', 'path_similarity', 'pos', 'region_domains', 'res_similarity', 'root_hypernyms', 'shortest_path_distance', 'similar_tos', 'substance_holonyms', 'substance_meronyms', 'topic_domains', 'tree', 'unicode_repr', 'usage_domains', 'verb_groups', 'wup_similarity']
