Since last month, NLTK's dispersion_plot seems to draw its y (vertical) axis in reversed order on my machine. This is likely something about my software versions (I am on a school virtual machine).
versions: nltk 3.8.1 matplotlib 3.7.2 Python 3.9.13
code:
from nltk.draw.dispersion import dispersion_plot
words=['aa','aa','aa','bbb','cccc','aa','bbb','aa','aa','aa','cccc','cccc','cccc','cccc']
targets=['aa','bbb', 'f', 'cccc']
dispersion_plot(words, targets)
expected: aa is present at the beginning, and cccc at the end. actual: it's backwards! Also notice that f should be completely absent; instead bbb is absent.
conclusion: Y axis is backwards.
I found the source code for nltk.draw.dispersion and it seems there is a mistake.
def dispersion_plot(text, words, ignore_case=False, title="Lexical Dispersion Plot"):
    """
    Generate a lexical dispersion plot.

    :param text: The source text
    :type text: list(str) or iter(str)
    :param words: The target words
    :type words: list of str
    :param ignore_case: flag to set if case should be ignored when searching text
    :type ignore_case: bool
    :return: a matplotlib Axes object that may still be modified before plotting
    :rtype: Axes
    """
    try:
        import matplotlib.pyplot as plt
    except ImportError as e:
        raise ImportError(
            "The plot function requires matplotlib to be installed. "
            "See https://matplotlib.org/"
        ) from e

    word2y = {
        word.casefold() if ignore_case else word: y
        for y, word in enumerate(reversed(words))  # <--- HERE
    }
    xs, ys = [], []
    for x, token in enumerate(text):
        token = token.casefold() if ignore_case else token
        y = word2y.get(token)
        if y is not None:
            xs.append(x)
            ys.append(y)

    _, ax = plt.subplots()
    ax.plot(xs, ys, "|")
    ax.set_yticks(list(range(len(words))), words, color="C0")  # <--- HERE
    ax.set_ylim(-1, len(words))
    ax.set_title(title)
    ax.set_xlabel("Word Offset")
    return ax


if __name__ == "__main__":
    import matplotlib.pyplot as plt
    from nltk.corpus import gutenberg

    words = ["Elinor", "Marianne", "Edward", "Willoughby"]
    dispersion_plot(gutenberg.words("austen-sense.txt"), words)
    plt.show()
It calculates word2y using reversed(words):
for y, word in enumerate(reversed(words))
but later it calls ax.set_yticks() with words in the original order, where it should use reversed(words):
ax.set_yticks(list(range(len(words))), words, color="C0")
(Alternatively, it could calculate word2y without reversed().) So the marks are drawn on the correct rows, but each row carries the label of a different word.
I added # <--- HERE comments in the code above to mark these places.
This may need to be reported as an issue.
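To see the mismatch without drawing anything, here is a small sketch that only mirrors the two marked lines from the source above. The names buggy_labels and fixed_labels are mine, used for illustration; they are not part of NLTK.

```python
# Mirror of the two marked lines in dispersion_plot, without matplotlib.
targets = ["aa", "bbb", "f", "cccc"]

# Row assignment as in the source: enumerate(reversed(...)) puts the
# first target on the highest row and the last target on row 0.
word2y = {word: y for y, word in enumerate(reversed(targets))}

# Labels as passed to set_yticks in the affected code: tick i gets targets[i].
buggy_labels = dict(enumerate(targets))

# Labels that agree with word2y: tick i gets the i-th item of reversed(targets).
fixed_labels = dict(enumerate(reversed(targets)))

for word in targets:
    y = word2y[word]
    print(
        f"{word!r} is drawn on row {y}, "
        f"labelled {buggy_labels[y]!r} (affected) vs {fixed_labels[y]!r} (corrected)"
    )
```

With the targets from the question, the row that holds the marks for bbb is labelled f, and the empty row for f is labelled bbb, which matches what the plot shows.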
For now, you can take the returned ax and call set_yticks with the labels reversed to correct it. In your code the list is targets instead of words:
ax = dispersion_plot(words, targets)
ax.set_yticks(list(range(len(targets))), reversed(targets), color="C0")
Full working code:
import matplotlib.pyplot as plt
from nltk.draw.dispersion import dispersion_plot
words = ['aa','aa','aa','bbb','cccc','aa','bbb','aa','aa','aa','cccc','cccc','cccc','cccc']
targets = ['aa','bbb', 'f', 'cccc']
ax = dispersion_plot(words, targets)
ax.set_yticks(list(range(len(targets))), reversed(targets), color="C0")
plt.show()
EDIT: It seems this problem was reported a few months ago, and they added reversed() to the code on GitHub, so it will probably work in the next version:
dispersion plot not working properly · Issue #3133 · nltk/nltk
dispersion plot not working properly by Apros7 · Pull Request #3134 · nltk/nltk