I have a markdown text document with several sections, and just below each section heading there is a line of hashtags. The hashtags are in the form #oneword# or #multiple words hashtag#.
I need to extract the sections and their hashtags in Ruby.
Example
# Section 1
#hash1# #hash tag 2# #hashtag3#
Some text
# Section 2
#hash1# #hash tag 4# #hash tag2#
Some text too
I want to get
{"Section 1"=>["hash1", "hash tag 2", "hashtag3"],
"Section 2"=>["hash1", "hash tag 4", "hash tag2"]}
Can we get it from grep?
When faced with a problem like this, I tend to prefer the builder pattern. It is a little verbose, but it is normally very readable and very flexible.
The main idea is that you have a "reader" that scans your input for "tokens" (in this case, lines). When it finds a token it recognizes, it informs the builder that it found a token of interest. The builder then builds another object based on the input from the "reader". Here is an example of a "DocumentBuilder" that takes input from a "MarkdownReader" and builds the Hash you are looking for.
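class MarkdownReader
  attr_reader :builder

  def initialize(builder)
    @builder = builder
  end

  # Classifies each line and tells the builder about the ones that matter.
  def parse(lines)
    lines.each do |line|
      case line
      when /^#[^#]+$/
        # A single leading "#" and no other "#": a section heading.
        builder.convert_section(line)
      when /^#.+\#$/
        # Starts and ends with "#": a line of hashtags.
        builder.convert_hashtag(line)
      end
    end
  end
end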
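class DocumentBuilder
  attr_reader :document

  def initialize
    @document = {}
  end

  # Starts a new section and remembers its name for the hashtag lines that follow.
  def convert_section(line)
    line =~ /^#\s*(.+)$/
    @section_name = $1
    document[@section_name] = []
  end

  # Splits on "#" and drops the blank pieces between tags.
  # Note: _1 needs Ruby 2.7+, and a hashtag line is expected to follow a section heading.
  def convert_hashtag(line)
    hashtags = line.split("#").reject { _1.strip.empty? }
    document[@section_name] += hashtags
  end
end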
lines = File.readlines("markdown.md")
builder = DocumentBuilder.new
reader = MarkdownReader.new(builder)
reader.parse(lines)
p builder.document
# => {"Section 1"=>["hash1", "hash tag 2", "hashtag3"], "Section 2"=>["hash1", "hash tag 4", "hash tag2"]}
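To illustrate the flexibility mentioned above, here is a minimal sketch of a second builder driven by the same reader. `TagIndexBuilder` and its `index` reader are hypothetical names invented for this illustration. It builds the inverse mapping, from each hashtag to the sections that use it. The reader is repeated and the sample lines are inlined, so the snippet runs on its own without assuming a file on disk.

```ruby
# Same reader as above, repeated so this sketch is self-contained.
class MarkdownReader
  attr_reader :builder

  def initialize(builder)
    @builder = builder
  end

  def parse(lines)
    lines.each do |line|
      case line
      when /^#[^#]+$/ then builder.convert_section(line)
      when /^#.+\#$/  then builder.convert_hashtag(line)
      end
    end
  end
end

# Hypothetical alternative builder (not part of the original answer):
# maps each hashtag to the list of sections in which it appears.
class TagIndexBuilder
  attr_reader :index

  def initialize
    # Unknown tags start with an empty list of sections.
    @index = Hash.new { |hash, tag| hash[tag] = [] }
  end

  def convert_section(line)
    @section_name = line[/^#\s*(.+)$/, 1]
  end

  def convert_hashtag(line)
    line.split("#").reject { |part| part.strip.empty? }.each do |tag|
      @index[tag] << @section_name
    end
  end
end

# The example document from the question, one string per line.
lines = [
  "# Section 1\n",
  "#hash1# #hash tag 2# #hashtag3#\n",
  "Some text\n",
  "# Section 2\n",
  "#hash1# #hash tag 4# #hash tag2#\n",
  "Some text too\n"
]

index_builder = TagIndexBuilder.new
MarkdownReader.new(index_builder).parse(lines)
p index_builder.index
```

With the sample input, "hash1" ends up listed under both sections, while every other tag maps to a single section. The reader did not change at all: any object that responds to `convert_section` and `convert_hashtag` can be plugged in.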