Python NLTK: Working with WordNet [Natural Language Processing (NLP)]


This article shows how you can use the WordNet lexical database in NLTK (Natural Language Toolkit).

We cover the basic usage of WordNet, as well as finding the synonyms, antonyms, hypernyms, hyponyms and holonyms of words. We also look at measuring the similarity between two words.

As the name suggests, WordNet is a network of words: words are connected to each other through linguistic relations such as synonymy, hypernymy and hyponymy.

WordNet contains a large collection of English words. These words are related to each other and are grouped into sets.

Nouns, verbs, adjectives, and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept. Synsets are interlinked by means of conceptual-semantic and lexical relations.

WordNet is available in NLTK as one of its corpora.


Loading WordNet Corpus

Here, we look up a particular word, good.


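import nltk
# nltk.download('wordnet')  # run once to fetch the WordNet data if it is not installed yet
from nltk.corpus import wordnet as wn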

print (wn.synsets('good'))
'''
Output:

[Synset('good.n.01'), Synset('good.n.02'), Synset('good.n.03'), Synset('commodity.n.01'), Synset('good.a.01'), Synset('full.s.06'), Synset('good.a.03'), Synset('estimable.s.02'), Synset('beneficial.s.01'), Synset('good.s.06'), Synset('good.s.07'), Synset('adept.s.01'), Synset('good.s.09'), Synset('dear.s.02'), Synset('dependable.s.04'), Synset('good.s.12'), Synset('good.s.13'), Synset('effective.s.04'), Synset('good.s.15'), Synset('good.s.16'), Synset('good.s.17'), Synset('good.s.18'), Synset('good.s.19'), Synset('good.s.20'), Synset('good.s.21'), Synset('well.r.01'), Synset('thoroughly.r.02')]
'''

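The synsets() function returns all the synsets (senses) in which the word good occurs. A synset is a set of synonyms that share a common meaning. Each synset has a three-part name of the form word.pos.nn, where word is one of the synset's lemmas (not necessarily the word you looked up, as commodity.n.01 shows), pos is the part of speech, and nn is a two-digit sense number.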
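Because the name has a fixed three-part shape, it can be split back into its parts. The sketch below assumes only plain synset-name strings such as those printed above; it splits from the right so that a word which itself contains a dot still parses. With real NLTK objects you would normally just call the synset's name() and pos() methods instead. The helper name parse_synset_name is hypothetical.

```python
def parse_synset_name(name):
    """Split a synset name like 'good.n.01' into (word, pos, sense_number).

    parse_synset_name is a hypothetical helper for illustration. Splitting
    from the right keeps any dots inside the word part intact.
    """
    word, pos, number = name.rsplit(".", 2)
    return word, pos, int(number)


print(parse_synset_name("good.n.01"))       # ('good', 'n', 1)
print(parse_synset_name("commodity.n.01"))  # ('commodity', 'n', 1)
```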

The synsets() function also takes an optional second parameter, pos, which restricts the results to a single part of speech.

The part-of-speech (pos) constants ADJ, ADJ_SAT, ADV, NOUN and VERB correspond to ‘a’, ‘s’, ‘r’, ‘n’ and ‘v’ respectively. ADJ_SAT stands for adjective satellite.
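To make the mapping concrete, the following plain-Python sketch (no WordNet lookup involved) stores the constants listed above in a dictionary and uses them to filter a subset of the synset names printed earlier for good. Filtering on 'n' reproduces the four noun synsets that pos='n' returns below. Note that this only inspects names, whereas NLTK's pos argument filters during the lookup itself. POS_CODES, GOOD_SYNSETS and filter_by_pos are hypothetical names.

```python
# One-letter part-of-speech codes used in WordNet synset names.
POS_CODES = {"ADJ": "a", "ADJ_SAT": "s", "ADV": "r", "NOUN": "n", "VERB": "v"}

# A subset of the names printed for wn.synsets('good') above.
GOOD_SYNSETS = [
    "good.n.01", "good.n.02", "good.n.03", "commodity.n.01",
    "good.a.01", "full.s.06", "good.a.03", "estimable.s.02",
    "well.r.01", "thoroughly.r.02",
]


def filter_by_pos(synset_names, pos):
    """Keep the names whose middle field (the POS code) equals pos."""
    return [name for name in synset_names if name.rsplit(".", 2)[1] == pos]


print(filter_by_pos(GOOD_SYNSETS, POS_CODES["NOUN"]))
# ['good.n.01', 'good.n.02', 'good.n.03', 'commodity.n.01']
print(filter_by_pos(GOOD_SYNSETS, POS_CODES["ADJ_SAT"]))
# ['full.s.06', 'estimable.s.02']
```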


# print (wn.synsets('good', pos=wn.NOUN))
print (wn.synsets('good', pos='n'))
'''
Output:

[Synset('good.n.01'), Synset('good.n.02'), Synset('good.n.03'), Synset('commodity.n.01')]
'''

my_word = wn.synset('good.n.01') 
print (my_word.definition()) # Output: benefit
print (my_word.examples())
'''
Output:

['for your own good', "what's the good of worrying?"]
'''

my_word = wn.synset('good.n.02') 
print (my_word.definition()) # Output: moral excellence or admirableness
print (my_word.examples())
'''
Output:

['there is much good to be found in people']
'''

my_word = wn.synset('good.n.03') 
print (my_word.definition()) # Output: that which is pleasing or valuable or useful
print (my_word.examples())
'''
Output:

['weigh the good against the bad', 'among the highest goods of all are happiness and self-realization']
'''

my_word = wn.synset('good.a.01') 
print (my_word.definition()) # Output: having desirable or positive qualities especially those suitable for a thing specified
print (my_word.examples())
'''
Output:

['good news from the hospital', 'a good report card', 'when she was good she was very very good', 'a good knife is one good for cutting', 'this stump will make a good picnic table', 'a good check', 'a good joke', 'a good exterior paint', 'a good secretary', 'a good dress for the office']
'''

my_word = wn.synset('good.a.03') 
print (my_word.definition()) # Output: morally admirable
print (my_word.examples()) # Output: []

SYNONYMS & ANTONYMS

A synset's lemmas() method returns its lemmas, i.e. the synonymous words that express that particular sense.

Synonyms


my_word = wn.synset('good.n.01') 
print (my_word.lemmas()) # Output: [Lemma('good.n.01.good')]
print (my_word.lemmas()[0].name()) # Output: good
print (my_word.lemmas()[0].antonyms()) # Output: []

my_word = wn.synset('good.n.02') 
print (my_word.lemmas()) # Output: [Lemma('good.n.02.good'), Lemma('good.n.02.goodness')]
print (my_word.lemmas()[0].name()) # Output: good
print (my_word.lemmas()[1].name()) # Output: goodness
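A common next step is to gather the synonyms of a word across all of its senses. The sketch below only calls lemmas() and name(), the same methods used above, so it should accept the list returned by wn.synsets(...). To keep it runnable without the WordNet data, it is demonstrated on two small stand-in classes (hypothetical, not part of NLTK) that mimic the lemmas of good.n.01 and good.n.02 shown above.

```python
class StandInLemma:
    """Hypothetical stand-in for an NLTK Lemma; only name() is provided."""

    def __init__(self, name):
        self._name = name

    def name(self):
        return self._name


class StandInSynset:
    """Hypothetical stand-in for an NLTK Synset; only lemmas() is provided."""

    def __init__(self, *lemma_names):
        self._lemmas = [StandInLemma(n) for n in lemma_names]

    def lemmas(self):
        return self._lemmas


def collect_synonyms(synsets):
    """Return the distinct lemma names of all synsets, in first-seen order."""
    names = {}
    for synset in synsets:
        for lemma in synset.lemmas():
            names[lemma.name()] = None  # dict keys keep insertion order
    return list(names)


# Mimics the lemmas shown above for good.n.01 and good.n.02.
senses = [StandInSynset("good"), StandInSynset("good", "goodness")]
print(collect_synonyms(senses))  # ['good', 'goodness']
```

With NLTK and the WordNet data installed, the same function can be called as collect_synonyms(wn.synsets('good')).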

Antonyms

In WordNet, antonyms are linked between lemmas (individual words) rather than between whole synsets. So we first get the lemmas of a synset with lemmas(), and then call antonyms() on each lemma.


my_word = wn.synset('good.n.02')
print (my_word.lemmas()) # Output: [Lemma('good.n.02.good'), Lemma('good.n.02.goodness')]

print (my_word.lemmas()[0].name()) # Output: good
print (my_word.lemmas()[0].antonyms()) # Output: [Lemma('evil.n.03.evil')]
print (my_word.lemmas()[0].antonyms()[0].name()) # Output: evil

print (my_word.lemmas()[1].name()) # Output: goodness
print (my_word.lemmas()[1].antonyms()) # Output: [Lemma('evil.n.03.evilness')]
print (my_word.lemmas()[1].antonyms()[0].name()) # Output: evilness
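Because antonyms() is called on each lemma, it can be convenient to collect (lemma, antonym) pairs for a whole synset in one pass. The sketch below assumes only the lemmas(), name() and antonyms() methods used above; the stand-in classes are hypothetical and simply mirror the good.n.02 output shown above, so the example runs without the WordNet data.

```python
class StandInLemma:
    """Hypothetical stand-in for an NLTK Lemma (name() and antonyms())."""

    def __init__(self, name, antonyms=()):
        self._name = name
        self._antonyms = list(antonyms)

    def name(self):
        return self._name

    def antonyms(self):
        return self._antonyms


class StandInSynset:
    """Hypothetical stand-in for an NLTK Synset (lemmas() only)."""

    def __init__(self, lemmas):
        self._lemmas = list(lemmas)

    def lemmas(self):
        return self._lemmas


def antonym_pairs(synset):
    """Return (lemma name, antonym name) pairs for every lemma of a synset."""
    return [
        (lemma.name(), antonym.name())
        for lemma in synset.lemmas()
        for antonym in lemma.antonyms()
    ]


# Mirrors good.n.02 above: good <-> evil, goodness <-> evilness.
good_n_02 = StandInSynset([
    StandInLemma("good", [StandInLemma("evil")]),
    StandInLemma("goodness", [StandInLemma("evilness")]),
])
print(antonym_pairs(good_n_02))  # [('good', 'evil'), ('goodness', 'evilness')]
```

With NLTK, antonym_pairs(wn.synset('good.n.02')) should give the same pairs, while good.n.01, whose only lemma has no antonyms, gives an empty list.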

SIMILARITY BETWEEN TWO WORDS

NLTK provides several similarity measures:

1) Path Similarity: Return a score denoting how similar two word senses are, based on the shortest path that connects the senses in the is-a (hypernym/hyponym) taxonomy.

2) Leacock-Chodorow (LCH) Similarity: Return a score denoting how similar two word senses are, based on the shortest path that connects the senses (as Path Similarity) and the maximum depth of the taxonomy in which the senses occur.

3) Wu-Palmer (WUP) Similarity: Return a score denoting how similar two word senses are, based on the depth of the two senses in the taxonomy and that of their Least Common Subsumer (most specific ancestor node).

4) Resnik (RES) Similarity: Return a score denoting how similar two word senses are, based on the Information Content (IC) of the Least Common Subsumer (most specific ancestor node).

5) Jiang-Conrath (JCN) Similarity: Return a score denoting how similar two word senses are, based on the Information Content (IC) of the Least Common Subsumer (most specific ancestor node) and that of the two input Synsets.

6) Lin Similarity: Return a score denoting how similar two word senses are, based on the Information Content (IC) of the Least Common Subsumer (most specific ancestor node) and that of the two input Synsets.

Source: http://www.nltk.org/howto/wordnet.html
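To see how the three path-based scores are computed, here is a minimal sketch on a tiny, hand-made is-a tree (hypothetical, not WordNet's real hierarchy). It uses the textbook forms: path similarity is 1 / (shortest path length + 1); Leacock-Chodorow is -log((path length + 1) / (2 * D)), where D is the maximum depth of the taxonomy; Wu-Palmer is 2 * depth(LCS) / (depth(s1) + depth(s2)). Counting depth in nodes for Wu-Palmer and D in edges are assumptions made for this sketch.

```python
import math

# Hypothetical toy taxonomy: each node maps to its single parent (is-a link).
PARENT = {
    "animal": "entity",
    "carnivore": "animal",
    "canine": "carnivore",
    "feline": "carnivore",
    "dog": "canine",
    "cat": "feline",
}


def chain_to_root(node):
    """Return [node, parent, grandparent, ..., root]."""
    chain = [node]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def depth(node):
    """Depth counted in nodes, so the root has depth 1."""
    return len(chain_to_root(node))


def lowest_common_subsumer(a, b):
    """Most specific node that is an ancestor of (or equal to) both a and b."""
    b_ancestors = set(chain_to_root(b))
    return next(n for n in chain_to_root(a) if n in b_ancestors)


def path_length(a, b):
    """Number of edges on the path a -> LCS -> b."""
    lcs = lowest_common_subsumer(a, b)
    return chain_to_root(a).index(lcs) + chain_to_root(b).index(lcs)


def path_similarity(a, b):
    return 1 / (path_length(a, b) + 1)


def lch_similarity(a, b, max_depth):
    # max_depth: deepest node's distance from the root, in edges (assumption).
    return -math.log((path_length(a, b) + 1) / (2 * max_depth))


def wup_similarity(a, b):
    return 2 * depth(lowest_common_subsumer(a, b)) / (depth(a) + depth(b))


MAX_DEPTH = max(depth(n) for n in PARENT) - 1  # 4 edges: entity -> ... -> dog

print(lowest_common_subsumer("dog", "cat"))  # carnivore
print(path_similarity("dog", "cat"))         # 0.2
print(wup_similarity("dog", "cat"))          # 0.6
print(lch_similarity("dog", "cat", MAX_DEPTH))
```

Walking up a single-parent chain keeps the sketch short; a real WordNet synset can have several hypernyms (dog.n.01 has two, as the hypernyms() output later shows), so NLTK has to consider more than one path.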


print (wn.synsets('bad'))
'''
Output:

[Synset('bad.n.01'), Synset('bad.a.01'), Synset('bad.s.02'), Synset('bad.s.03'), Synset('bad.s.04'), Synset('regretful.a.01'), Synset('bad.s.06'), Synset('bad.s.07'), Synset('bad.s.08'), Synset('bad.s.09'), Synset('bad.s.10'), Synset('bad.s.11'), Synset('bad.s.12'), Synset('bad.s.13'), Synset('bad.s.14'), Synset('badly.r.05'), Synset('badly.r.06')]
'''

word_1 = wn.synset('good.n.01')
word_2 = wn.synset('bad.n.01')
print (word_1.wup_similarity(word_2)) # Output: 0.666666666667
print (word_2.wup_similarity(word_1)) # Output: 0.666666666667

word_1 = wn.synset('good.n.01')
word_2 = wn.synset('evil.n.01')
print (word_1.wup_similarity(word_2)) # Output: 0.25

word_1 = wn.synset('bad.n.01')
word_2 = wn.synset('evil.n.01')
print (word_1.wup_similarity(word_2)) # Output: 0.285714285714

print (wn.synsets('eat'))
'''
Output:

[Synset('eat.v.01'), Synset('eat.v.02'), Synset('feed.v.06'), Synset('eat.v.04'), Synset('consume.v.05'), Synset('corrode.v.01')]
'''

print (wn.synsets('sleep'))
'''
Output:

[Synset('sleep.n.01'), Synset('sleep.n.02'), Synset('sleep.n.03'), Synset('rest.n.05'), Synset('sleep.v.01'), Synset('sleep.v.02')]
'''

word_1 = wn.synset('eat.v.01')
word_2 = wn.synset('sleep.v.01')
print (word_1.wup_similarity(word_2)) # Output: 0.25


word_1 = wn.synset('dog.n.01')
word_2 = wn.synset('cat.n.01')
print (word_1.wup_similarity(word_2)) # Output: 0.857142857143
print (word_1.path_similarity(word_2)) # Output: 0.2
print (word_1.lch_similarity(word_2)) # Output: 2.02814824729

word_1 = wn.synset('ship.n.01')
word_2 = wn.synset('boat.n.01')
print (word_1.wup_similarity(word_2)) # Output: 0.909090909091
print (word_1.path_similarity(word_2)) # Output: 0.333333333333
print (word_1.lch_similarity(word_2)) # Output: 2.53897387106
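The examples above only use the path-based measures. The information-content measures from the list (Resnik, Jiang-Conrath, Lin) also need IC values, IC(c) = -log P(c), estimated from corpus counts; in NLTK these are loaded through the nltk.corpus.wordnet_ic reader (for example wordnet_ic.ic('ic-brown.dat')) and passed to methods such as res_similarity(). The sketch below skips the corpus and uses made-up probabilities (hypothetical numbers) to show only how the three formulas combine IC values. The function names echo NLTK's method names but are our own, and the least common subsumer is passed in by hand.

```python
import math

# Hypothetical probabilities of encountering each concept (or anything
# below it in the taxonomy) in a corpus; real values come from corpus counts.
PROBABILITY = {"carnivore": 0.01, "dog": 0.002, "cat": 0.001}


def information_content(concept):
    """IC(c) = -log P(c): rarer, more specific concepts carry more information."""
    return -math.log(PROBABILITY[concept])


def res_similarity(lcs):
    """Resnik: the IC of the least common subsumer."""
    return information_content(lcs)


def jcn_similarity(a, b, lcs):
    """Jiang-Conrath: 1 / (IC(a) + IC(b) - 2 * IC(lcs)).

    This sketch does not handle the zero-distance case (a == b == lcs).
    """
    distance = (information_content(a) + information_content(b)
                - 2 * information_content(lcs))
    return 1 / distance


def lin_similarity(a, b, lcs):
    """Lin: 2 * IC(lcs) / (IC(a) + IC(b))."""
    return 2 * information_content(lcs) / (
        information_content(a) + information_content(b))


# 'carnivore' is taken as the least common subsumer of 'dog' and 'cat'.
print(res_similarity("carnivore"))
print(jcn_similarity("dog", "cat", "carnivore"))
print(lin_similarity("dog", "cat", "carnivore"))
```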

HYPERNYMS, HYPONYMS, & HOLONYMS

All synsets are connected to other synsets by means of semantic relations. Some of these relations are:

Hypernyms = Y is a hypernym of X if every X is a (kind of) Y
Hyponyms = Y is a hyponym of X if every Y is a (kind of) X
Holonyms = Y is a holonym of X if X is a part of Y
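These definitions can be expressed directly in code. The sketch below stores a few hand-written hypernym and member-holonym links (hypothetical data with names borrowed from the dog.n.01 output shown further down, not loaded from WordNet), derives hyponyms by inverting the hypernym links and meronyms by inverting the holonym links, and follows hypernym links transitively to answer "is every X a kind of Y?". NLTK offers a similar traversal through Synset.closure(), e.g. dog.closure(lambda s: s.hypernyms()).

```python
from collections import deque

# Hypothetical hand-written links (child -> parents); not read from WordNet.
HYPERNYMS = {
    "dog": ["canine", "domestic_animal"],
    "basenji": ["dog"],
    "corgi": ["dog"],
    "canine": ["carnivore"],
}
# Hypothetical member-holonym links (member -> groups it belongs to).
MEMBER_HOLONYMS = {"dog": ["canis", "pack"]}


def invert(relation):
    """Reverse every X -> Y link into Y -> X."""
    inverse = {}
    for source, targets in relation.items():
        for target in targets:
            inverse.setdefault(target, []).append(source)
    return inverse


HYPONYMS = invert(HYPERNYMS)               # Y is a hyponym of X if every Y is an X
MEMBER_MERONYMS = invert(MEMBER_HOLONYMS)  # the inverse of member holonymy


def all_hypernyms(word):
    """Every Y such that 'word is a (kind of) Y', breadth-first, no repeats."""
    found, queue = [], deque(HYPERNYMS.get(word, []))
    while queue:
        parent = queue.popleft()
        if parent not in found:
            found.append(parent)
            queue.extend(HYPERNYMS.get(parent, []))
    return found


print(HYPONYMS["dog"])           # ['basenji', 'corgi']
print(MEMBER_MERONYMS["canis"])  # ['dog']
print(all_hypernyms("basenji"))  # ['dog', 'canine', 'domestic_animal', 'carnivore']
```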

In the example code below, we can see the following:

1) Canine refers to the dog family as a whole, a broader group that also includes wolves and foxes.

So, from the above definition of hypernym, Canine (Y) is a hypernym of Dog (X) because every Dog (X) is a (kind of) Canine (Y).

2) Basenji is a breed of hunting dog.

So, from the above definition of hyponym, Basenji (Y) is a hyponym of Dog (X) because every Basenji (Y) is a (kind of) Dog (X).

3) Canis is a genus of the Canidae containing multiple extant species, such as wolves, dogs and coyotes. Species of this genus are distinguished by their moderate to large size, their massive, well-developed skulls and dentition, long legs, and comparatively short ears and tails. (Source: Canis – Wikipedia)

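So, from the above definition of holonym, Canis (Y) is a holonym of Dog (X) because Dog (X) is a member of Canis (Y). This kind of holonym is a member holonym, which is why the code below calls member_holonyms(); NLTK also provides part_holonyms() and substance_holonyms() for the other kinds.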


dog = wn.synset('dog.n.01')

print (dog.hypernyms())
'''
Output:

[Synset('canine.n.02'), Synset('domestic_animal.n.01')]
'''

print (dog.hyponyms())
'''
Output:

[Synset('basenji.n.01'), Synset('corgi.n.01'), Synset('cur.n.01'), Synset('dalmatian.n.02'), Synset('great_pyrenees.n.01'), Synset('griffon.n.02'), Synset('hunting_dog.n.01'), Synset('lapdog.n.01'), Synset('leonberg.n.01'), Synset('mexican_hairless.n.01'), Synset('newfoundland.n.01'), Synset('pooch.n.01'), Synset('poodle.n.01'), Synset('pug.n.01'), Synset('puppy.n.01'), Synset('spitz.n.01'), Synset('toy_dog.n.01'), Synset('working_dog.n.01')]
'''

print (dog.member_holonyms())
'''
Output:

[Synset('canis.n.01'), Synset('pack.n.06')]
'''

Hope this helps. Thanks.
