NLTK – WordNet

WordNet is a semantically oriented dictionary of English that groups words into sets of synonyms and records relations such as antonymy, along with brief definitions. The English WordNet bundled with NLTK contains 155,287 words and 117,659 synonym sets.

This tutorial walks through the main features of WordNet with examples.

Senses and Synonyms

WordNet organizes words into synsets, collections of synonymous words, also called "lemmas". Let's look at how synonyms are accessed in WordNet.

>>> from nltk.corpus import wordnet
>>> wordnet.synsets("computer")
[Synset('computer.n.01'), Synset('calculator.n.01')]

The word computer belongs to two synonym sets, each called a synset. A synset groups words that share one meaning, and a single word can appear in several synsets. Let's explore the first synset.

>>> wordnet.synset("computer.n.01").lemma_names()
['computer', 'computing_machine', 'computing_device', 'data_processor', 'electronic_computer', 'information_processing_system']

In this sense, the word computer can also be called a 'computing_machine', 'computing_device', 'data_processor', 'electronic_computer' or 'information_processing_system'.

Let’s get all lemmas for each synset.

>>> for e in wordnet.synsets("computer"):
...     print(f'{e} --> {e.lemma_names()}')
... 
Synset('computer.n.01') --> ['computer', 'computing_machine', 'computing_device', 'data_processor', 'electronic_computer', 'information_processing_system']
Synset('calculator.n.01') --> ['calculator', 'reckoner', 'figurer', 'estimator', 'computer']

A synset also provides a definition and, for many synsets, example sentences, which help a human reader understand the meaning of the word. Let's see the definition of 'computer':

>>> wordnet.synset("computer.n.01").definition()
'a machine for performing calculations automatically'

Let’s print all lemmas for a given synset –

>>> wordnet.synset('computer.n.01').lemmas()

[Lemma('computer.n.01.computer'), Lemma('computer.n.01.computing_machine'), Lemma('computer.n.01.computing_device'), Lemma('computer.n.01.data_processor'), Lemma('computer.n.01.electronic_computer'), Lemma('computer.n.01.information_processing_system')]

You can get the synset corresponding to a lemma –

>>> wordnet.lemma('computer.n.01.computing_device').synset()
Synset('computer.n.01')

Get the name of the lemma –

>>> wordnet.lemma('computer.n.01.computing_device').name()
'computing_device'

Hyponyms

WordNet also encodes lexical relations between synsets, such as hypernyms, hyponyms, meronyms, holonyms and antonyms.

Hyponyms are concepts that are more specific than a given synset (the inverse relation, hypernyms, gives more general concepts).

Let's look at the hyponyms of computer:

>>> syn = wordnet.synset('computer.n.01')
>>> syn.hyponyms()
[Synset('analog_computer.n.01'), Synset('digital_computer.n.01'), Synset('home_computer.n.01'), Synset('node.n.08'), Synset('number_cruncher.n.02'), Synset('pari-mutuel_machine.n.01'), Synset('predictor.n.03'), Synset('server.n.03'), Synset('turing_machine.n.01'), Synset('web_site.n.01')]

Let's get all lemmas of each hyponym synset –

>>> [lemma.name() for synset in syn.hyponyms() for lemma in synset.lemmas()]

['analog_computer', 'analogue_computer', 'digital_computer', 'home_computer', 'node', 'client', 'guest', 'number_cruncher', 'pari-mutuel_machine', 'totalizer', 'totaliser', 'totalizator', 'totalisator', 'predictor', 'server', 'host', 'Turing_machine', 'web_site', 'website', 'internet_site', 'site']

Antonyms

Antonymy is a lexical relation between lemmas (not synsets) that links a word to its opposite in meaning.

>>> wordnet.lemma('buy.v.01.buy').antonyms()
[Lemma('sell.v.01.sell')]

>>> wordnet.lemma('horizontal.a.01.horizontal').antonyms()
[Lemma('vertical.a.01.vertical'), Lemma('inclined.a.02.inclined')]

Semantic Similarity

In Natural Language Processing, semantic similarity measures between words are often useful, for example when indexing documents. WordNet provides several ways to find synsets with related meanings.

Let’s discover the semantic similarity in WordNet:

>>> vehicle = wordnet.synset('vehicle.n.01')
>>> train = wordnet.synset("train.n.01")
>>> car = wordnet.synset('car.n.01')
>>> bike = wordnet.synset('motorcycle.n.01')
>>> care = wordnet.synset("care.n.01")
>>>
>>> car.lowest_common_hypernyms(vehicle)
[Synset('vehicle.n.01')]
>>>
>>> train.lowest_common_hypernyms(vehicle)
[Synset('conveyance.n.03')]
>>>
>>> care.lowest_common_hypernyms(car)
[Synset('entity.n.01')]
>>>
>>> car.lowest_common_hypernyms(bike)
[Synset('motor_vehicle.n.01')]

Here, motor_vehicle (the lowest common hypernym of car and motorcycle) is quite specific, while entity (the lowest common hypernym of care and car) is completely general: the deeper the lowest common hypernym, the more closely related the two synsets are.

WordNet also provides several other features based on lexical relations, similarity measures, etc. Using dir(), you can get the list of methods defined on a synset.

>>> dir(wordnet.synset('car.n.01'))

['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__slots__', '__str__', '__subclasshook__', '__unicode__', '__weakref__', '_all_hypernyms', '_definition', '_examples', '_frame_ids', '_hypernyms', '_instance_hypernyms', '_iter_hypernym_lists', '_lemma_names', '_lemma_pointers', '_lemmas', '_lexname', '_max_depth', '_min_depth', '_name', '_needs_root', '_offset', '_pointers', '_pos', '_related', '_shortest_hypernym_paths', '_wordnet_corpus_reader', 'also_sees', 'attributes', 'causes', 'closure', 'common_hypernyms', 'definition', 'entailments', 'examples', 'frame_ids', 'hypernym_distances', 'hypernym_paths', 'hypernyms', 'hyponyms', 'in_region_domains', 'in_topic_domains', 'in_usage_domains', 'instance_hypernyms', 'instance_hyponyms', 'jcn_similarity', 'lch_similarity', 'lemma_names', 'lemmas', 'lexname', 'lin_similarity', 'lowest_common_hypernyms', 'max_depth', 'member_holonyms', 'member_meronyms', 'min_depth', 'name', 'offset', 'part_holonyms', 'part_meronyms', 'path_similarity', 'pos', 'region_domains', 'res_similarity', 'root_hypernyms', 'shortest_path_distance', 'similar_tos', 'substance_holonyms', 'substance_meronyms', 'topic_domains', 'tree', 'unicode_repr', 'usage_domains', 'verb_groups', 'wup_similarity']
