This article shows how to use the WordNet lexical database through NLTK (Natural Language Toolkit).
We cover basic usage of WordNet and how to find the synonyms, antonyms, hypernyms, hyponyms, and holonyms of a word. We also look at how to measure the similarity between two words.
WordNet is a network of words: the words are connected to one another through linguistic relations such as synonymy, hypernymy, and hyponymy.
WordNet contains a large collection of words from the English language. These words are related to each other and are grouped into sets.
Nouns, verbs, adjectives, and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept. Synsets are interlinked by means of conceptual-semantic and lexical relations.
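WordNet is available as one of the NLTK corpora. If the data is not installed yet, it can be downloaded once with nltk.download('wordnet').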
Loading WordNet Corpus
Here, we look up a particular word.
from nltk.corpus import wordnet as wn
print (wn.synsets('good'))
'''
Output:
[Synset('good.n.01'), Synset('good.n.02'), Synset('good.n.03'), Synset('commodity.n.01'), Synset('good.a.01'), Synset('full.s.06'), Synset('good.a.03'), Synset('estimable.s.02'), Synset('beneficial.s.01'), Synset('good.s.06'), Synset('good.s.07'), Synset('adept.s.01'), Synset('good.s.09'), Synset('dear.s.02'), Synset('dependable.s.04'), Synset('good.s.12'), Synset('good.s.13'), Synset('effective.s.04'), Synset('good.s.15'), Synset('good.s.16'), Synset('good.s.17'), Synset('good.s.18'), Synset('good.s.19'), Synset('good.s.20'), Synset('good.s.21'), Synset('well.r.01'), Synset('thoroughly.r.02')]
'''
The synsets function returns every synset that contains the given word good. A synset is a set of synonyms that share a common meaning. Each synset has a 3-part name of the form word.pos.nn: the word (the first lemma of the synset), the part of speech, and the sense number.
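To make the naming scheme concrete, here is a minimal sketch that splits a synset name into its three parts using plain string handling. The helper name split_synset_name is made up for this illustration and is not part of NLTK.

```python
def split_synset_name(name):
    """Split a name such as 'good.n.01' into (word, pos, sense_number).

    Hypothetical helper for illustration; not an NLTK function.
    """
    # Split from the right so that a word containing dots stays intact.
    word, pos, number = name.rsplit(".", 2)
    return word, pos, int(number)


print(split_synset_name("good.n.01"))       # ('good', 'n', 1)
print(split_synset_name("commodity.n.01"))  # ('commodity', 'n', 1)
```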
The synsets function takes a second parameter: the part of speech (pos) tag.
The part of speech (pos) constants ADJ, ADJ_SAT, ADV, NOUN, and VERB have the values 'a', 's', 'r', 'n', and 'v' respectively. ADJ_SAT stands for Adjective Satellite.
# print (wn.synsets('good', pos=wn.NOUN))
print (wn.synsets('good', pos='n'))
'''
Output:
[Synset('good.n.01'), Synset('good.n.02'), Synset('good.n.03'), Synset('commodity.n.01')]
'''
my_word = wn.synset('good.n.01')
print (my_word.definition()) # Output: benefit
print (my_word.examples())
'''
Output:
['for your own good', "what's the good of worrying?"]
'''
my_word = wn.synset('good.n.02')
print (my_word.definition()) # Output: moral excellence or admirableness
print (my_word.examples())
'''
Output:
['there is much good to be found in people']
'''
my_word = wn.synset('good.n.03')
print (my_word.definition()) # Output: that which is pleasing or valuable or useful
print (my_word.examples())
'''
Output:
['weigh the good against the bad', 'among the highest goods of all are happiness and self-realization']
'''
my_word = wn.synset('good.a.01')
print (my_word.definition()) # Output: having desirable or positive qualities especially those suitable for a thing specified
print (my_word.examples())
'''
Output:
['good news from the hospital', 'a good report card', 'when she was good she was very very good', 'a good knife is one good for cutting', 'this stump will make a good picnic table', 'a good check', 'a good joke', 'a good exterior paint', 'a good secretary', 'a good dress for the office']
'''
my_word = wn.synset('good.a.03')
print (my_word.definition()) # Output: morally admirable
print (my_word.examples()) # Output: []
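The lookups above repeat the same three calls for each sense. As a rough sketch, they can be wrapped in a loop over all synsets of a word. The helper describe_senses is a hypothetical name, and it takes the WordNet reader as an argument instead of importing it, which is a choice made for this example only.

```python
def describe_senses(word, wordnet, pos=None):
    """Return (name, definition, examples) for each synset of `word`.

    `wordnet` is expected to be NLTK's WordNet corpus reader.
    describe_senses is a hypothetical helper, not an NLTK function.
    """
    return [
        (synset.name(), synset.definition(), synset.examples())
        for synset in wordnet.synsets(word, pos=pos)
    ]


try:
    from nltk.corpus import wordnet as wn

    for name, definition, examples in describe_senses("good", wn, pos="n"):
        print(name, "|", definition, "|", examples)
except (ImportError, LookupError):
    # NLTK or the WordNet data is not installed in this environment.
    print("WordNet is not available; skipping the demo.")
```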
SYNONYMS & ANTONYMS
We can use the lemmas() function of a synset. It returns the lemmas of that synset, that is, the words that share its meaning and are therefore synonyms of one another.
Synonyms
my_word = wn.synset('good.n.01')
print (my_word.lemmas()) # Output: [Lemma('good.n.01.good')]
print (my_word.lemmas()[0].name()) # Output: good
print (my_word.lemmas()[0].antonyms()) # Output: []
my_word = wn.synset('good.n.02')
print (my_word.lemmas()) # Output: [Lemma('good.n.02.good'), Lemma('good.n.02.goodness')]
print (my_word.lemmas()[0].name()) # Output: good
print (my_word.lemmas()[1].name()) # Output: goodness
Antonyms
We first get the lemmas (synonyms) of a synset using the lemmas() function. After that, we can call antonyms() on each lemma to find its antonyms.
my_word = wn.synset('good.n.02')
print (my_word.lemmas()) # Output: [Lemma('good.n.02.good'), Lemma('good.n.02.goodness')]
print (my_word.lemmas()[0].name()) # Output: good
print (my_word.lemmas()[0].antonyms()) # Output: [Lemma('evil.n.03.evil')]
print (my_word.lemmas()[0].antonyms()[0].name()) # Output: evil
print (my_word.lemmas()[1].name()) # Output: goodness
print (my_word.lemmas()[1].antonyms()) # Output: [Lemma('evil.n.03.evilness')]
print (my_word.lemmas()[1].antonyms()[0].name()) # Output: evilness
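The examples above look at one synset at a time. A common pattern is to gather the synonyms and antonyms of a word across all of its synsets. The following is a minimal sketch of that idea; the function name synonyms_and_antonyms is hypothetical, and it only relies on the synsets(), lemmas(), name(), and antonyms() calls already used in this article.

```python
def synonyms_and_antonyms(word, wordnet):
    """Collect lemma names and antonym names over every synset of `word`.

    Hypothetical helper; `wordnet` is expected to be NLTK's WordNet reader.
    """
    synonyms, antonyms = set(), set()
    for synset in wordnet.synsets(word):
        for lemma in synset.lemmas():
            synonyms.add(lemma.name())
            # Antonyms are attached to lemmas, not to synsets.
            for antonym in lemma.antonyms():
                antonyms.add(antonym.name())
    return synonyms, antonyms


try:
    from nltk.corpus import wordnet as wn

    syns, ants = synonyms_and_antonyms("good", wn)
    print(sorted(syns))
    print(sorted(ants))
except (ImportError, LookupError):
    # NLTK or the WordNet data is not installed in this environment.
    print("WordNet is not available; skipping the demo.")
```

Sets are used so that a name appearing in several synsets is reported only once.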
SIMILARITY BETWEEN TWO WORDS
NLTK provides several similarity measures. They are:
1) Path Similarity: Return a score denoting how similar two word senses are, based on the shortest path that connects the senses in the is-a (hypernym/hyponym) taxonomy.
2) Leacock-Chodorow (LCH) Similarity: Return a score denoting how similar two word senses are, based on the shortest path that connects the senses (as Path Similarity) and the maximum depth of the taxonomy in which the senses occur.
3) Wu-Palmer (WUP) Similarity: Return a score denoting how similar two word senses are, based on the depth of the two senses in the taxonomy and that of their Least Common Subsumer (most specific ancestor node).
4) Resnik (RES) Similarity: Return a score denoting how similar two word senses are, based on the Information Content (IC) of the Least Common Subsumer (most specific ancestor node).
5) Jiang-Conrath (JCN) Similarity: Return a score denoting how similar two word senses are, based on the Information Content (IC) of the Least Common Subsumer (most specific ancestor node) and that of the two input Synsets. The score is 1 / (IC(s1) + IC(s2) - 2 * IC(lcs)).
6) Lin Similarity: Return a score denoting how similar two word senses are, based on the Information Content (IC) of the Least Common Subsumer (most specific ancestor node) and that of the two input Synsets. The score is 2 * IC(lcs) / (IC(s1) + IC(s2)).
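To show how the first and third measures work, here is a small sketch that computes path similarity and Wu-Palmer similarity by hand on a toy taxonomy. The taxonomy and all function names are assumptions made for this illustration. It is far shallower than WordNet, so the Wu-Palmer score differs from what NLTK reports for the real synsets. The formulas are 1 / (shortest path length + 1) for path similarity and 2 * depth(lcs) / (depth(s1) + depth(s2)) for Wu-Palmer, with the root at depth 1.

```python
# Toy is-a taxonomy (child -> parent). Hypothetical and much smaller than WordNet.
PARENT = {
    "carnivore": "animal",
    "canine": "carnivore",
    "feline": "carnivore",
    "dog": "canine",
    "cat": "feline",
}


def ancestors(node):
    """Return the chain [node, parent, ..., root]."""
    chain = [node]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def depth(node):
    """Number of nodes from the root down to `node` (the root has depth 1)."""
    return len(ancestors(node))


def least_common_subsumer(a, b):
    """Most specific node that is an ancestor of (or equal to) both a and b."""
    b_chain = set(ancestors(b))
    for node in ancestors(a):  # walks upward from a, so the first hit is the deepest
        if node in b_chain:
            return node
    return None


def path_similarity(a, b):
    lcs = least_common_subsumer(a, b)
    # In a tree, the shortest path between a and b passes through their LCS.
    distance = (depth(a) - depth(lcs)) + (depth(b) - depth(lcs))
    return 1 / (distance + 1)


def wup_similarity(a, b):
    lcs = least_common_subsumer(a, b)
    return 2 * depth(lcs) / (depth(a) + depth(b))


print(path_similarity("dog", "cat"))  # 0.2
print(wup_similarity("dog", "cat"))   # 0.5
```

Here dog and cat are four edges apart (dog, canine, carnivore, feline, cat), so the path similarity is 1 / 5. Their least common subsumer is carnivore at depth 2, while both words sit at depth 4, which gives 2 * 2 / (4 + 4).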
print (wn.synsets('bad'))
'''
Output:
[Synset('bad.n.01'), Synset('bad.a.01'), Synset('bad.s.02'), Synset('bad.s.03'), Synset('bad.s.04'), Synset('regretful.a.01'), Synset('bad.s.06'), Synset('bad.s.07'), Synset('bad.s.08'), Synset('bad.s.09'), Synset('bad.s.10'), Synset('bad.s.11'), Synset('bad.s.12'), Synset('bad.s.13'), Synset('bad.s.14'), Synset('badly.r.05'), Synset('badly.r.06')]
'''
word_1 = wn.synset('good.n.01')
word_2 = wn.synset('bad.n.01')
print (word_1.wup_similarity(word_2)) # Output: 0.666666666667
print (word_2.wup_similarity(word_1)) # Output: 0.666666666667
word_1 = wn.synset('good.n.01')
word_2 = wn.synset('evil.n.01')
print (word_1.wup_similarity(word_2)) # Output: 0.25
word_1 = wn.synset('bad.n.01')
word_2 = wn.synset('evil.n.01')
print (word_1.wup_similarity(word_2)) # Output: 0.285714285714
print (wn.synsets('eat'))
'''
Output:
[Synset('eat.v.01'), Synset('eat.v.02'), Synset('feed.v.06'), Synset('eat.v.04'), Synset('consume.v.05'), Synset('corrode.v.01')]
'''
print (wn.synsets('sleep'))
'''
Output:
[Synset('sleep.n.01'), Synset('sleep.n.02'), Synset('sleep.n.03'), Synset('rest.n.05'), Synset('sleep.v.01'), Synset('sleep.v.02')]
'''
word_1 = wn.synset('eat.v.01')
word_2 = wn.synset('sleep.v.01')
print (word_1.wup_similarity(word_2)) # Output: 0.25
word_1 = wn.synset('dog.n.01')
word_2 = wn.synset('cat.n.01')
print (word_1.wup_similarity(word_2)) # Output: 0.857142857143
print (word_1.path_similarity(word_2)) # Output: 0.2
print (word_1.lch_similarity(word_2)) # Output: 2.02814824729
word_1 = wn.synset('ship.n.01')
word_2 = wn.synset('boat.n.01')
print (word_1.wup_similarity(word_2)) # Output: 0.909090909091
print (word_1.path_similarity(word_2)) # Output: 0.333333333333
print (word_1.lch_similarity(word_2)) # Output: 2.53897387106
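Note that all of the calls above compare two synsets, not two words, so the sense of each word has to be picked by hand. One possible way to compare two words directly is to score every pair of senses and keep the best one. The sketch below does this with wup_similarity; the helper best_sense_pair is a hypothetical name, and taking the maximum over sense pairs is only one of several reasonable choices.

```python
from itertools import product


def best_sense_pair(word_a, word_b, wordnet, pos=None):
    """Return (score, synset_a, synset_b) for the most similar pair of senses.

    Hypothetical helper; `wordnet` is expected to be NLTK's WordNet reader.
    Returns (None, None, None) when no pair of senses can be scored.
    """
    best = (None, None, None)
    senses_a = wordnet.synsets(word_a, pos=pos)
    senses_b = wordnet.synsets(word_b, pos=pos)
    for sense_a, sense_b in product(senses_a, senses_b):
        score = sense_a.wup_similarity(sense_b)
        # The score can be None when the two senses are not connected.
        if score is not None and (best[0] is None or score > best[0]):
            best = (score, sense_a, sense_b)
    return best


try:
    from nltk.corpus import wordnet as wn

    print(best_sense_pair("ship", "boat", wn, pos="n"))
except (ImportError, LookupError):
    # NLTK or the WordNet data is not installed in this environment.
    print("WordNet is not available; skipping the demo.")
```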
HYPERNYMS, HYPONYMS, & HOLONYMS
All synsets are connected to other synsets by means of semantic relations. Some of these relations are:
Hypernyms = Y is a hypernym of X if every X is a (kind of) Y
Hyponyms = Y is a hyponym of X if every Y is a (kind of) X
Holonyms = Y is a holonym of X if X is a part of Y
In the example code below, we can see the following:
1) A dog is a kind of canine.
So, from the above definition of hypernym, Canine (Y) is a hypernym of Dog (X) because every Dog (X) is a (kind of) Canine (Y).
2) Basenji is a breed of hunting dog.
So, from the above definition of hyponym, Basenji (Y) is a hyponym of Dog (X) because every Basenji (Y) is a (kind of) Dog (X).
3) Canis is a genus of the Canidae containing multiple extant species, such as wolves, dogs and coyotes. Species of this genus are distinguished by their moderate to large size, their massive, well-developed skulls and dentition, long legs, and comparatively short ears and tails. (Source: Canis – Wikipedia)
So, from the above definition of holonym, Canis (Y) is a holonym of Dog (X) because Dog (X) is a part (here, a member) of Canis (Y). This is why the code uses member_holonyms().
dog = wn.synset('dog.n.01')
print (dog.hypernyms())
'''
Output:
[Synset('canine.n.02'), Synset('domestic_animal.n.01')]
'''
print (dog.hyponyms())
'''
Output:
[Synset('basenji.n.01'), Synset('corgi.n.01'), Synset('cur.n.01'), Synset('dalmatian.n.02'), Synset('great_pyrenees.n.01'), Synset('griffon.n.02'), Synset('hunting_dog.n.01'), Synset('lapdog.n.01'), Synset('leonberg.n.01'), Synset('mexican_hairless.n.01'), Synset('newfoundland.n.01'), Synset('pooch.n.01'), Synset('poodle.n.01'), Synset('pug.n.01'), Synset('puppy.n.01'), Synset('spitz.n.01'), Synset('toy_dog.n.01'), Synset('working_dog.n.01')]
'''
print (dog.member_holonyms())
'''
Output:
[Synset('canis.n.01'), Synset('pack.n.06')]
'''
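The hypernyms() call returns only the direct parents of a synset. To see how a word fits into the whole taxonomy, we can keep following hypernyms until we reach a synset that has none. The sketch below does this by always taking the first hypernym; the helper hypernym_chain is a hypothetical name, and following only the first parent is a simplification, since a synset such as dog.n.01 has more than one hypernym.

```python
def hypernym_chain(synset):
    """Follow the first hypernym of each synset upward until none is left.

    Hypothetical helper; works on any object with a hypernyms() method,
    such as an NLTK Synset.
    """
    chain = [synset]
    seen = {synset}
    while True:
        parents = chain[-1].hypernyms()
        # Stop at the root, or if a parent repeats (guards against cycles).
        if not parents or parents[0] in seen:
            return chain
        chain.append(parents[0])
        seen.add(parents[0])


try:
    from nltk.corpus import wordnet as wn

    chain = hypernym_chain(wn.synset("dog.n.01"))
    print(" -> ".join(s.name() for s in chain))
except (ImportError, LookupError):
    # NLTK or the WordNet data is not installed in this environment.
    print("WordNet is not available; skipping the demo.")
```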
Hope this helps. Thanks.