I'm trying to identify whether a date occurs in an arbitrary string. Here's my code:
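```python
import nltk

txts = ['Submitted on 1st January',
        'Today is 1/3/15']

def chunk(t):
    w_tokens = nltk.word_tokenize(t)   # split the sentence into word tokens
    pt = nltk.pos_tag(w_tokens)        # part-of-speech tag each token
    ne = nltk.ne_chunk(pt)             # run NLTK's named-entity chunker
    print(ne)

for t in txts:
    print(t)
    chunk(t)
```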
The output I'm getting is:

```
Submitted on 1st January
(S (GPE Submitted/NNP) on/IN 1st/CD January/NNP)
Today is 1/3/15
(S Today/NN is/VBZ 1/3/15/CD)
```
Clearly, the dates are not being tagged. Does anyone know how to get dates tagged?
Thanks
I took the date example 1/1/70 from your comment, but this regex will also find dates written in other formats, such as 1970/01/20 or 2-21-79:
```python
import re

x = 'asdfasdf sdf5sdf asd78fsadf 1/1/70 dfsdg fghdfgh 1970/01/20 gfh5fghh sdfgsdg 2-21-79 sdfgsdgf'
# digits, one non-whitespace separator, digits, separator, digits
print(re.findall(r'\d+\S\d+\S\d+', x))
```

Output:

```
['1/1/70', '1970/01/20', '2-21-79']
```
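One caveat: `\S` matches any non-whitespace character, letters included, so the pattern above also accepts strings like `1a2b3` or mixed separators like `1/2-3`. A slightly stricter sketch, assuming dates only ever use `/` or `-` as the separator and use the same one twice (the helper name `find_numeric_dates` is just for illustration):

```python
import re

# Assumption: separator is "/" or "-", and the same separator is used twice
# (enforced by the backreference \1). \b keeps matches on whole tokens.
NUMERIC_DATE = re.compile(r'\b\d{1,4}([/-])\d{1,2}\1\d{1,4}\b')

def find_numeric_dates(text):
    # finditer + group(0): findall() would return only the captured separator
    return [m.group(0) for m in NUMERIC_DATE.finditer(text)]

x = 'asdfasdf sdf5sdf asd78fsadf 1/1/70 dfsdg fghdfgh 1970/01/20 gfh5fghh sdfgsdg 2-21-79 sdfgsdgf'
print(find_numeric_dates(x))                        # ['1/1/70', '1970/01/20', '2-21-79']
print(find_numeric_dates('part 1a2b3 and 1/2-3'))   # []
```

Neither version can tell day-first from month-first order: 1/3/15 may be 1 March or January 3, so interpreting a match still needs knowledge of the source's convention.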
Or, to match a capitalised word (such as a month name) followed by a number:

```python
y = 'Asdfasdf Ddf5sdf asd78fsadf Jan 3 dfsdg fghdfgh February 10 sdfgsdgf'
print(re.findall(r'[A-Z]\w+\s\d+', y))
```

Output:

```
['Jan 3', 'February 10']
```
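The `[A-Z]\w+\s\d+` pattern accepts any capitalised word followed by a number (for example `Room 12`), and it misses the day-first form `1st January` from the question. A sketch that, as an assumption, only accepts English month names or their common abbreviations, in either order, with an optional ordinal suffix on the day (`find_month_dates` is a hypothetical helper name):

```python
import re

# Assumption: English month names, full or abbreviated, optionally followed by "."
MONTH = (r'(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|'
         r'Jul(?:y)?|Aug(?:ust)?|Sep(?:t(?:ember)?)?|Oct(?:ober)?|'
         r'Nov(?:ember)?|Dec(?:ember)?)\.?')
# Day of the month: one or two digits with an optional ordinal suffix
DAY = r'\d{1,2}(?:st|nd|rd|th)?'

# Month-first ("February 10") or day-first ("1st January"); (?!\w) rejects
# partial matches such as "Jan 345". All groups are non-capturing, so
# findall() returns the whole matched text.
MONTH_DAY = re.compile(r'\b(?:' + MONTH + r'\s+' + DAY + r'|'
                       + DAY + r'\s+' + MONTH + r')(?!\w)')

def find_month_dates(text):
    return MONTH_DAY.findall(text)

y = 'Asdfasdf Ddf5sdf asd78fsadf Jan 3 dfsdg fghdfgh February 10 sdfgsdgf'
print(find_month_dates(y))                               # ['Jan 3', 'February 10']
print(find_month_dates('Submitted on 1st January'))      # ['1st January']
print(find_month_dates('Room 12 is booked on Sept. 9'))  # ['Sept. 9']
```

Together with the numeric `\d+\S\d+\S\d+` pattern, this covers both strings in the question's `txts` list; it is still a heuristic rather than a full date parser.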