[SOLVED] Regex to Extract #hashtags from MMD metadata in Python

I'm trying to extract all the #hashtags from the "Tags: #tag1 #tag2" line of a MultiMarkdown plain-text file. (I'm in Python multiline mode.)

I've tried using lookaheads:

^(?=Tags:\s.*)#(\w+)\b

and lookbehinds:

#(\w+)\b(?<=Tags:^\s)

Plain vanilla #(\w+)\b works, except it picks up any #hashtag that might appear later in the document.

Any hints, help, instruction appreciated.

Solution

import re

text = "\n\n#bogus\nTags: #foo #bar\n"

First, you need to get the line:

line = re.findall(r'Tags:.+\n', text)
# line = ['Tags: #foo #bar\n']

Then you need to get the tags from the line. Use a capture group if you want the names without the leading #:

tags = re.findall(r'#(\w+)', line[0])
# tags = ['foo', 'bar']

or drop the group to keep the #:

tags = re.findall(r'#\w+', line[0])
# tags = ['#foo', '#bar']
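The two steps can also be wrapped into one helper. This is only a minimal sketch, not part of the original answer: the name `extract_tags` is hypothetical, and it assumes the metadata line begins with `Tags:` at the start of a line. It uses `re.MULTILINE` so that `^` and `$` match at line boundaries, which also covers a `Tags:` line at the very end of the file with no trailing newline, and it returns an empty list when no such line exists.

```python
import re

def extract_tags(text):
    """Return hashtag names (without '#') from the first 'Tags:' line.

    Hypothetical helper; assumes the line starts with 'Tags:'.
    """
    # With re.MULTILINE, ^ and $ match at the start and end of each line.
    match = re.search(r'^Tags:(.*)$', text, flags=re.MULTILINE)
    if match is None:
        return []  # no metadata line found
    # Only the remainder of the Tags: line is searched for hashtags.
    return re.findall(r'#(\w+)', match.group(1))

text = "\n\n#bogus\nTags: #foo #bar\n\nBody with a #later hashtag.\n"
print(extract_tags(text))  # ['foo', 'bar']
```

Because the second `findall` only ever sees the captured line, hashtags elsewhere in the document (such as `#bogus` or `#later` above) are ignored.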

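A lookbehind won't work here: Python's re module only accepts fixed-width lookbehind patterns, and the text between Tags: and each hashtag varies in length. The lookahead attempt fails for a different reason: a lookahead consumes no characters, so after it succeeds at the start of the line the pattern still expects # at that same position, where the line actually has T.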
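To see the lookbehind limitation concretely, here is a small sketch. The pattern `(?<=Tags:.*)#\w+` is an illustrative guess at what a lookbehind-based approach would need, not something taken from the question. Python's built-in `re` module should refuse to compile it because the lookbehind has no fixed width.

```python
import re

try:
    # The lookbehind would have to cover "Tags:" plus any number of
    # characters before the hashtag, so its width is not fixed.
    re.compile(r'(?<=Tags:.*)#\w+')
    compiled = True
except re.error as exc:
    compiled = False
    print("re.error:", exc)
```

Extracting the line first and then searching within it, as shown in the solution, avoids the problem entirely.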