I have a database of mostly correct datetimes, but a few are broken like so: Sat Dec 22 12:34:08 PST 20102015
Without the invalid year, this was working for me:
import time
from datetime import datetime
# soup is the BeautifulSoup document parsed earlier
end_date = soup('tr')[4].contents[1].renderContents()
end_date = time.strptime(end_date, "%a %b %d %H:%M:%S %Z %Y")
end_date = datetime.fromtimestamp(time.mktime(end_date))
But once I hit an object with an invalid year, I get ValueError: unconverted data remains: 2, which is great, but I'm not sure how best to strip the bad characters out of the year. They range from 2 to 6 unconverted characters.
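A quick sketch of where the leftover characters come from: in CPython, %Y consumes exactly four digits, so strptime parses the first four digits of the year and reports whatever follows as unconverted data. The sample string leaves out "PST" because %Z only accepts UTC, GMT and the zone names in time.tzname on the local machine; with the zone included, a machine not set to Pacific time fails earlier with a "does not match format" error instead. The helper name unconverted_tail is hypothetical.

```python
from datetime import datetime


def unconverted_tail(date_str, fmt="%a %b %d %H:%M:%S %Y"):
    """Hypothetical helper: return strptime's error message, or None if date_str parses."""
    try:
        datetime.strptime(date_str, fmt)
    except ValueError as exc:
        return str(exc)
    return None


# No zone abbreviation here: %Z would reject "PST" on most machines.
print(unconverted_tail("Sat Dec 22 12:34:08 20102015"))
print(unconverted_tail("Sat Dec 22 12:34:08 2010"))
```

The first call should report the four extra digits after "2010"; the second parses cleanly and returns None.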
Any pointers? I would just slice end_date, but I'm hoping there is a datetime-safe strategy.
Yeah, I'd just chop off the extra digits. Assuming they are always appended to the year at the end of the date string, something like this would work:
# Keep only the first four characters of the last field (the year), then rejoin
end_date = end_date.split(" ")
end_date[-1] = end_date[-1][:4]
end_date = " ".join(end_date)
I was going to try to get the number of excess digits from the exception, but on my installed versions of Python (2.6.6 and 3.1.2) that information isn't actually there; it just says that the data does not match the format. Of course, you could just continue lopping off digits one at a time and re-parsing until you don't get an exception.
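The "lop off one digit and re-parse" idea can be sketched like this. parse_lopping_digits and its max_extra cap are hypothetical names; the cap of 6 comes from the question's note that the junk ranges from 2 to 6 characters. The format deliberately omits %Z (strptime only accepts UTC, GMT and the local machine's zone names there), so this assumes the zone abbreviation has already been removed from the string.

```python
from datetime import datetime

# Assumed format: the question's format with %Z removed.
FMT = "%a %b %d %H:%M:%S %Y"


def parse_lopping_digits(date_str, fmt=FMT, max_extra=6):
    """Hypothetical helper: drop one trailing character per attempt until strptime succeeds."""
    candidate = date_str
    # Attempt 0 uses the string as-is; attempts 1..max_extra each drop one more character.
    for _ in range(max_extra + 1):
        try:
            return datetime.strptime(candidate, fmt)
        except ValueError:
            candidate = candidate[:-1]
    raise ValueError("could not parse %r" % date_str)


print(parse_lopping_digits("Sat Dec 22 12:34:08 20102015"))
```

Each bad record costs at most seven strptime calls here, but note that the loop will also quietly accept any other trailing junk up to max_extra characters long, which may or may not be what you want.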
You could also write a regex that will match only valid dates, including the right number of digits in the year, but that seems like overkill.
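For comparison, a sketch of the regex route, assuming every record has the layout shown in the question. This is a variant rather than a full "valid dates only" regex: the pattern checks the overall shape and captures only the first four digits of the year, and calendar validity (no Feb 30) is left to strptime, which raises ValueError for impossible dates. DATE_RE and parse_with_regex are hypothetical names, and the zone abbreviation is returned separately instead of being parsed, because %Z only accepts UTC, GMT and local zone names.

```python
import re
from datetime import datetime

# Assumed layout: "Sat Dec 22 12:34:08 PST 2010", optionally with up to six
# junk digits glued onto the year. Only the first four year digits are captured.
DATE_RE = re.compile(
    r"^(?P<stamp>[A-Z][a-z]{2} [A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<zone>[A-Z]{3,4}) (?P<year>\d{4})\d{0,6}$"
)


def parse_with_regex(date_str):
    """Hypothetical helper: return (naive datetime, zone abbreviation)."""
    m = DATE_RE.match(date_str.strip())
    if m is None:
        raise ValueError("unrecognised date: %r" % date_str)
    # Rebuild the string without the zone and parse it without %Z.
    naive = datetime.strptime(m.group("stamp") + " " + m.group("year"),
                              "%a %b %d %H:%M:%S %Y")
    return naive, m.group("zone")


print(parse_with_regex("Sat Dec 22 12:34:08 PST 20102015"))
```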