I have a list that contains patterned strings like this:
['"Bandcamp" (2014)\t\t\t\t\ttv-mini-series',
'"ByMySide" (2012){The Happening (#1.3)}\t\t\t\t\ttwitter-hashtag-in-title',
'"Elmira" (2014)\t\t\t\t\telmira-new-york',
'"Elmira" (2014){The Happening (#1.3)}\t\t\tfriend',
...]
Now I am trying to extract substrings from each line and turn them into a data frame like this:
Movie     Year  Keyword
Bandcamp  2014  tv-mini-series
ByMySide  2012  twitter-hashtag-in-title
Elmira    2014  elmira-new-york
Elmira    2014  friend
...
Here you go:
>>> a
['"Bandcamp" (2014)\t\t\t\t\ttv-mini-series', '"ByMySide" (2012){The Happening (#1.3)}\t\t\t\t\ttwitter-hashtag-in-title', '"Elmira" (2014)\t\t\t\t\telmira-new-york', '"Elmira" (2014){The Happening (#1.3)}\t\t\tfriend']
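>>> import re
>>> data = []
>>> for x in a:
...     # Groups: quoted title, year in parentheses, keyword after the final run of 2-5 tabs
...     data.append(re.findall(r'"(\w+)" \((\d+)\).*\t{2,5}(\S+)', x)[0])
...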
>>> import pandas as pd
>>> pd.DataFrame(data, columns=['Movie', 'Year', 'Keyword'])
      Movie  Year                   Keyword
0  Bandcamp  2014            tv-mini-series
1  ByMySide  2012  twitter-hashtag-in-title
2    Elmira  2014           elmira-new-york
3    Elmira  2014                    friend
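If the real list is large or messier than the sample, the same extraction can be done in one step with pandas' `Series.str.extract`, which turns named regex groups into column names. This is a sketch under a few assumptions not stated in the question: every line has the layout `"Title" (Year){optional episode}<tabs>keyword`, years are exactly four digits, and titles may contain spaces (which `\w+` above would reject, raising `IndexError`). `PATTERN` and `lines` are illustrative names.

```python
import re  # used only to show that PATTERN is an ordinary Python regex
import pandas as pd

# Sample lines copied from the question. Assumed layout of every line:
# "Title" (Year){optional episode info}<one or more tabs>keyword
lines = [
    '"Bandcamp" (2014)\t\t\t\t\ttv-mini-series',
    '"ByMySide" (2012){The Happening (#1.3)}\t\t\t\t\ttwitter-hashtag-in-title',
    '"Elmira" (2014)\t\t\t\t\telmira-new-york',
    '"Elmira" (2014){The Happening (#1.3)}\t\t\tfriend',
]

# Named groups become the DataFrame's column names.
# [^"]+ allows spaces in titles; [^\t]* skips the optional {episode} part;
# \t+ consumes however many tabs precede the keyword.
# Assumption: years are always four digits.
PATTERN = r'^"(?P<Movie>[^"]+)" \((?P<Year>\d{4})\)[^\t]*\t+(?P<Keyword>\S+)$'

df = pd.Series(lines).str.extract(PATTERN)

# A line that does not match produces a row of NaN instead of an exception;
# drop such rows and renumber the index.
df = df.dropna().reset_index(drop=True)
df["Year"] = df["Year"].astype(int)
print(df)
```

Unlike the loop above, a malformed line here is silently dropped rather than stopping the whole run, and `Year` comes out as an integer column instead of strings.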