I'm trying to identify all instances of a specific syntactic pattern found in a text: RB + NN|NNS|NP|PP. That is to say, I'm looking for adverbs that are immediately followed by nouns. I've tagged my text using TreeTagger. The tagged text is stored in a list called 'tags' that looks like this:
how WRB
hard JJ
it PP
was VBD
This is the relevant part of my code:
adverb = re.compile(r'RB$')
noun = re.compile(r'NN')
for n in range(len(tags)):
    w = tags[n]
    if adverb.search(w) != None and noun.search(w[n+1]) != None:
        print(' '.join(tags[n-2 : n+3]))
My problem is that the fifth line produces the following error:
if adverb.search(w) != None and noun.search(w[n+1]) != None:
IndexError: string index out of range
If I change the fifth line of code to this...
if adverb.search(w) != None:
...then a list of adverbs is returned.
I'm really lost as to 1) why I am getting this error and 2) how I can fix it. Any guidance you guys can offer would be super appreciated.
Your problem is this:
w[n+1]
You are confusing your list tags with a string in that list, w. If you want to access another item in the list, you need to use tags[...], not w[...]. Also, you should make sure that the index you are using is inside the range of the list.
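Here is a minimal sketch of how the loop could look with both fixes applied. It assumes every item in tags is a single "word TAG" string, as in your sample. The sample list, the function name find_adverb_noun and the context parameter are made up for illustration. I also tightened the two patterns, which is my own assumption about what you want: a bare RB$ also matches WRB (as in "how WRB"), and NN alone does not cover NP or PP.

```python
import re

# Hypothetical sample data in the same "word TAG" shape as the question's list.
tags = ["she PP", "ran VBD", "very RB", "fast RB", "home NN", "today NN"]

# \b keeps RB from matching inside WRB; the noun pattern covers NN, NNS, NP and PP.
adverb = re.compile(r'\bRB$')
noun = re.compile(r'\b(?:NNS?|NP|PP)$')

def find_adverb_noun(tags, context=2):
    """Return each adverb + noun hit with up to `context` items on either side."""
    matches = []
    # Stop one item early so that tags[n + 1] always exists.
    for n in range(len(tags) - 1):
        if adverb.search(tags[n]) and noun.search(tags[n + 1]):
            # Clamp the start at 0: a negative start would count from the end of the list.
            start = max(n - context, 0)
            matches.append(' '.join(tags[start:n + context + 1]))
    return matches

for match in find_adverb_noun(tags):
    print(match)
```

Looping over range(len(tags) - 1) is what keeps tags[n + 1] inside the list. The upper end of the slice needs no such care, because Python slices simply stop at the end of the list.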