Consider this scenario (godbolt): I have a text buffer and a function that at runtime tells me its encoding:
enum class Enc {UTF8, UTF16LE, UTF16BE, UTF32LE, UTF32BE};
Enc detect_encoding_of(std::string_view buf);
Then I have a series of functions that extract the codepoints from the text buffer according to its encoding. I have organized them as specializations of a function template parameterized on the enum above:
template<Enc enc> char32_t extract_next_codepoint(const std::string_view buf, std::size_t& pos);
template<> char32_t extract_next_codepoint<Enc::UTF8>(const std::string_view buf, std::size_t& pos);
template<> char32_t extract_next_codepoint<Enc::UTF16LE>(const std::string_view buf, std::size_t& pos);
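For illustration, one such specialization might look like the sketch below. It assumes UTF-32LE (four bytes per codepoint, least significant byte first). The error policy is an arbitrary choice made for this example, not something taken from the code above: when fewer than four bytes remain, it returns U+FFFD and moves pos to the end of the buffer.

```cpp
#include <cstddef>
#include <string_view>

enum class Enc {UTF8, UTF16LE, UTF16BE, UTF32LE, UTF32BE};

template<Enc enc> char32_t extract_next_codepoint(const std::string_view buf, std::size_t& pos);

// Sketch of the UTF-32LE specialization: every codepoint occupies four bytes,
// least significant byte first.
template<> char32_t extract_next_codepoint<Enc::UTF32LE>(const std::string_view buf, std::size_t& pos)
{
    // Assumed error policy (hypothetical): on a truncated tail, consume the
    // rest of the buffer and return the replacement character U+FFFD.
    if (pos > buf.size() || buf.size() - pos < 4)
    {
        pos = buf.size();
        return U'\uFFFD';
    }
    // Convert through unsigned char so bytes >= 0x80 are not sign-extended.
    const auto byte = [&](std::size_t i) {
        return static_cast<char32_t>(static_cast<unsigned char>(buf[pos + i]));
    };
    const char32_t cp = byte(0) | (byte(1) << 8) | (byte(2) << 16) | (byte(3) << 24);
    pos += 4;
    return cp;
}
```

The other encodings would follow the same shape, differing only in how many bytes they consume and how they combine them.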
In order to parse the text buffer I have to select the proper function depending on the detected encoding:
const std::string_view buf; // filled at runtime
const Enc buf_enc = detect_encoding_of(buf);
std::size_t pos = 0;
switch( buf_enc )
{
case Enc::UTF8:
// parse using extract_next_codepoint<Enc::UTF8>(buf,pos)
break;
case Enc::UTF16LE:
// parse using extract_next_codepoint<Enc::UTF16LE>(buf,pos)
break;
// ...
}
The extract_next_codepoint() functions are called many times, which is why I'm avoiding runtime polymorphism here.
The downside of my current solution is that I have to write and maintain a lot of repeated, almost identical code for each supported encoding.
Is there a way to write less and let the compiler give a little help?
You can use one of the following solutions:
Assign the matching function to a function pointer, then call through it. https://gcc.godbolt.org/z/Pffz97E4x
char32_t (*next)(const std::string_view buf, std::size_t& pos) = nullptr; // assigned in the switch below
switch (buf_enc)
{
case Enc::UTF8:
next = extract_next_codepoint<Enc::UTF8>;
break;
case Enc::UTF16LE:
next = extract_next_codepoint<Enc::UTF16LE>;
break;
// ...
}
while (pos<buf.size())
{
const char32_t codepoint = next(buf, pos);
fmt::print("{} at {} got {}\n", buf, pos, (int)codepoint);
}
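If the goal is mainly to shrink the switch, the pointer can also be looked up in a table indexed by the enum value. This is a minimal sketch, assuming the enumerators keep their default values 0 to 4 and that every specialization is defined. The names extract_fn and select_extractor are made up for this example, and the primary template body is only a stand-in so that the example links; it is not a real decoder.

```cpp
#include <array>
#include <cstddef>
#include <string_view>

enum class Enc {UTF8, UTF16LE, UTF16BE, UTF32LE, UTF32BE};

// Stand-in body (hypothetical): consumes one byte and returns the encoding's
// enum value, just to make visible which instantiation was called.
template<Enc enc> char32_t extract_next_codepoint(const std::string_view buf, std::size_t& pos)
{
    (void)buf;
    ++pos;
    return static_cast<char32_t>(enc);
}

using extract_fn = char32_t (*)(std::string_view, std::size_t&);

// Returns the extractor for a runtime encoding value.
// The table order must match the declaration order of Enc.
inline extract_fn select_extractor(Enc enc)
{
    static constexpr std::array<extract_fn, 5> table = {
        extract_next_codepoint<Enc::UTF8>,
        extract_next_codepoint<Enc::UTF16LE>,
        extract_next_codepoint<Enc::UTF16BE>,
        extract_next_codepoint<Enc::UTF32LE>,
        extract_next_codepoint<Enc::UTF32BE>,
    };
    // at() throws std::out_of_range if enc holds a value outside the enum.
    return table.at(static_cast<std::size_t>(enc));
}
```

With this in place the switch above reduces to `next = select_extractor(buf_enc);`. Each call still goes through a pointer, so the per-call cost is the same as in the switch version.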
Move your logic into a separate function template that calls the corresponding specialization. https://gcc.godbolt.org/z/nzGnoPnnW
template <Enc enc> void handle(const std::string_view &buf)
{
for (std::size_t pos = 0; pos < buf.size(); )
{
const char32_t codepoint = extract_next_codepoint<enc>(buf, pos);
fmt::print("{} at {} got {}\n", buf, pos, (int)codepoint);
}
}
...
switch (buf_enc)
{
case Enc::UTF8:
handle<Enc::UTF8>(buf);
break;
case Enc::UTF16LE:
handle<Enc::UTF16LE>(buf);
break;
// ...
}
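The second solution seems likely to be faster to me, because each call to extract_next_codepoint<enc> is a direct call that the compiler can inline, whereas a call through a function pointer usually is not. I haven't measured it, though, and the two may well perform the same.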
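If several parsing routines need the same dispatch, the switch itself can be written once and reused by passing the encoding to a generic lambda as a compile-time tag. This is a sketch under assumptions, requiring C++17: dispatch_encoding, enc_constant and decode_all are hypothetical names, and the primary template body is again only a stand-in rather than a real decoder.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string_view>
#include <type_traits>
#include <utility>
#include <vector>

enum class Enc {UTF8, UTF16LE, UTF16BE, UTF32LE, UTF32BE};

// Stand-in body (hypothetical): consumes one byte and returns the encoding's
// enum value, just to make visible which instantiation was called.
template<Enc enc> char32_t extract_next_codepoint(const std::string_view buf, std::size_t& pos)
{
    (void)buf;
    ++pos;
    return static_cast<char32_t>(enc);
}

// Tag type that carries an Enc value in its type.
template<Enc enc> using enc_constant = std::integral_constant<Enc, enc>;

// The only switch in the program: turns the runtime value into a tag and
// hands it to f. Every branch must return the same type.
template<typename F> decltype(auto) dispatch_encoding(Enc enc, F&& f)
{
    switch (enc)
    {
        case Enc::UTF8:    return std::forward<F>(f)(enc_constant<Enc::UTF8>{});
        case Enc::UTF16LE: return std::forward<F>(f)(enc_constant<Enc::UTF16LE>{});
        case Enc::UTF16BE: return std::forward<F>(f)(enc_constant<Enc::UTF16BE>{});
        case Enc::UTF32LE: return std::forward<F>(f)(enc_constant<Enc::UTF32LE>{});
        case Enc::UTF32BE: return std::forward<F>(f)(enc_constant<Enc::UTF32BE>{});
    }
    throw std::invalid_argument("unknown encoding");
}

// Example routine: the loop body is written once and instantiated per encoding.
inline std::vector<char32_t> decode_all(const std::string_view buf, const Enc enc)
{
    return dispatch_encoding(enc, [buf](auto tag) {
        constexpr Enc e = decltype(tag)::value; // compile-time encoding
        std::vector<char32_t> out;
        for (std::size_t pos = 0; pos < buf.size(); )
            out.push_back(extract_next_codepoint<e>(buf, pos));
        return out;
    });
}
```

Inside the lambda the encoding is a constant expression, so the calls to extract_next_codepoint remain direct calls as in the second solution, and supporting a new encoding means adding one case to dispatch_encoding rather than to every switch.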