I had a few thousand lines of code containing pieces like this:
opencanmanager.GetObjectDict()->ReadDataFrom(0x1234, 1).toInt()
that I needed to convert to another library that uses syntax like this:
ReadFromOD<int>(0x1234, 1)
Basically I need to search for
[whatever1]opencanmanager.GetObjectDict()->ReadDataFrom([whatever2]).toInt()[whatever3]
across all the lines of a text file and replace every occurrence of it with
[whatever1]ReadFromOD<int>([whatever2])[whatever3]
and then do the same for a few other data types.
Doing that manually would have meant a few days of terribly dull work, but none of the editors I know of offer search-and-replace functions smart enough for this kind of refactoring.
I have now solved the problem using GNU AWK with the script below:
#!/usr/bin/awk -f
# Rewrite opencanmanager.GetObjectDict()->ReadDataFrom(ARGS).toX()
# as ReadFromOD<T>(ARGS), where X is String, UInt or Int.
BEGIN {
    spl1 = "opencanmanager.GetObjectDict()->ReadDataFrom("
    spl2 = ").to"
    spl2_1 = ").toString()"
    spl2_2 = ").toUInt()"
    spl2_3 = ").toInt()"
    min_spl2_len = length(spl2_3)
    repl_start = "ReadFromOD<"
    repl_mid1 = "QString"
    repl_mid2 = "uint"
    repl_mid3 = "int"
    repl_end = ">("
    repl_after = ")"
}
# Replace the first occurrence in str; return str unchanged if there is none.
function replacer(str)
{
    pos1 = index(str, spl1)
    if (!pos1) {
        return str
    }
    mid_start_pos = pos1 + length(spl1)
    # Look for the closing ").to" only after the opening pattern.
    pos2 = index(substr(str, mid_start_pos), spl2)
    if (!pos2) {
        return str
    }
    pos2 += mid_start_pos - 1
    # awk strings are 1-indexed, so substr() must start at 1, not 0.
    strbegin = substr(str, 1, pos1 - 1)
    strkey = substr(str, pos2, min_spl2_len)
    key1 = substr(spl2_1, 1, min_spl2_len)
    key2 = substr(spl2_2, 1, min_spl2_len)
    key3 = substr(spl2_3, 1, min_spl2_len)
    strmid = substr(str, mid_start_pos, pos2 - mid_start_pos)
    if (strkey == key1) {
        repl_mid = repl_mid1; spl2_fact = spl2_1
    } else if (strkey == key2) {
        repl_mid = repl_mid2; spl2_fact = spl2_2
    } else if (strkey == key3) {
        repl_mid = repl_mid3; spl2_fact = spl2_3
    } else {
        print "ERROR!!! Found", spl1, "but not any of", spl2_1, spl2_2, spl2_3 "!" > "/dev/stderr"
        exit 1    # awk has no EXIT_FAILURE constant; an unset variable would exit with 0
    }
    str_remainder = substr(str, pos2 + length(spl2_fact))
    return strbegin repl_start repl_mid repl_end strmid repl_after str_remainder
}
{
    resultstr = $0
    # Repeat until a pass changes nothing; every successful pass removes one spl1.
    do {
        prevstr = resultstr
        resultstr = replacer(resultstr)
    } while (resultstr != prevstr)
    print resultstr
}
and everything works fine, but it still bugs me somewhat. My solution feels too complicated for a job that must be very common and must have an easy standard solution that I just don't know about for some reason.
I am prepared to just let it go, but if you know a more elegant, quick one-liner or a specific tool for this kind of smart code modification, I would definitely like to know.
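One way the script could be condensed while staying in awk is a table-driven loop built on match(), which reports where a regular expression matched through RSTART and RLENGTH. The following is only a sketch under stated assumptions: the wrapper name convert_reads_awk and all variable names are hypothetical, and the argument list is assumed never to contain nested parentheses.

```shell
# Hypothetical helper: reads source text on stdin, writes the converted text to stdout.
convert_reads_awk() {
  awk '
    BEGIN {
      # Conversion suffix -> template argument, mirroring the script above.
      type["String"] = "QString"
      type["UInt"]   = "uint"
      type["Int"]    = "int"
      head    = "opencanmanager.GetObjectDict()->ReadDataFrom("
      # The same prefix as a regex; bracket expressions make . ( ) literal.
      head_re = "opencanmanager[.]GetObjectDict[(][)]->ReadDataFrom[(]"
    }
    {
      line = $0
      for (suffix in type) {
        tail = ").to" suffix "()"
        # Assumption: the arguments contain no parentheses of their own.
        re = head_re "[^()]*[)][.]to" suffix "[(][)]"
        # Each pass rewrites one match, so the loop terminates.
        while (match(line, re)) {
          args = substr(line, RSTART + length(head), RLENGTH - length(head) - length(tail))
          line = substr(line, 1, RSTART - 1) "ReadFromOD<" type[suffix] ">(" args ")" substr(line, RSTART + RLENGTH)
        }
      }
      print line
    }'
}
```

It could be run as convert_reads_awk < input_file > output_file. Supporting another data type would then only need one more entry in the type table.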
If sed is an option, you can try this solution, which should produce both output examples from input such as this.
$ cat input_file
opencanmanager.GetObjectDict()->ReadDataFrom(0x1234, 1).toInt()
power1 = opencanmanager.GetObjectDict()->ReadDataFrom(0x1234, 1).toInt() * opencanmanager.GetObjectDict()->ReadDataFrom(0x5678, 1).toUInt() * FACTOR1;
power2 = opencanmanager.GetObjectDict()->ReadDataFrom(0x5678, 1).toUInt() / 2;
$ sed -E 's/ReadDataFrom/ReadFromOD<int>/g;s/int/uint/2;s/(.*= )?[^>]*>([^\.]*)[^\*|/]*?(\*|\/.{2,})?[^\.]*?[^>]*?>?([^\.]*)?[^\*]*?(.*)?/\1\2 \3 \4 \5/' input_file
ReadFromOD<int>(0x1234, 1)
power1 = ReadFromOD<int>(0x1234, 1) * ReadFromOD<uint>(0x5678, 1) * FACTOR1;
power2 = ReadFromOD<int>(0x5678, 1) / 2;
s/ReadDataFrom/ReadFromOD<int>/g - The first part of the command is a simple global substitution that replaces every occurrence of ReadDataFrom with ReadFromOD<int>.
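s/int/uint/2 - The second part replaces only the second occurrence of int on the line with uint, if there is one. The type is therefore chosen by position rather than by the original .toInt() or .toUInt() suffix, which is why the power2 line in the output above ends up as ReadFromOD<int> even though its input used .toUInt().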
s/(.*= )?[^>]*>([^\.]*)[^\*|/]*?(\*|\/.{2,})?[^\.]*?[^>]*?>?([^\.]*)?[^\*]*?(.*)?/\1\2 \3 \4 \5/ - The third part uses sed grouping and back references.
(.*= )? - Group one, available as back reference \1, captures everything up to and including an = character and the space after it. The ? makes the group optional, so the rest of the pattern can still match when it is absent.
[^>]*> - This part is not captured, as it is not within parentheses (). It matches everything continuing from the space after the = character up to the next >, and the literal > then consumes that character too, so all of it is dropped from the output. It is not optional and must match.
([^\.]*) - Continuing from the uncaptured match, this captures everything up to the first full stop (.) and is available as back reference \2. It is not optional and must match.
[^\*|/]*? - Not captured. It matches everything up to the first *, | or /; inside a bracket expression, | is an ordinary character rather than alternation. The trailing ? makes it optional, so it does not have to match.
(\*|\/.{2,})? - Continuing from the uncaptured match, this captures either a literal * or a / followed by two or more ({2,}) characters. It is available as back reference \3 and is optional (?).
[^\.]*?[^>]*?>? - Optional uncaptured matches. They match everything up to a literal full stop (.), then everything up to >, and then the > itself.
([^\.]*)? - Optional group matching everything up to a full stop (.). It is available as back reference \4.
[^\*]*? - Not captured. It continues matching up to the next *.
(.*)? - Everything after the final * is captured and available as back reference \5, if it exists (?).
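For comparison, the same grouping and back-reference mechanism can be keyed on the conversion suffix itself, with one substitution per data type, so that each call keeps the type it was read with. This is only a sketch: convert_reads and the call variable are hypothetical names, and it assumes that the argument list never contains nested parentheses.

```shell
# Hypothetical helper: reads source text on stdin, writes the converted text to stdout.
convert_reads() {
  # Shared prefix of every call. ([^()]*) captures the argument list as \1
  # (assumption: the arguments contain no parentheses of their own).
  call='opencanmanager\.GetObjectDict\(\)->ReadDataFrom\(([^()]*)\)'
  # One substitution per conversion suffix; g rewrites every call on a line.
  sed -E \
    -e "s/${call}\.toString\(\)/ReadFromOD<QString>(\1)/g" \
    -e "s/${call}\.toUInt\(\)/ReadFromOD<uint>(\1)/g" \
    -e "s/${call}\.toInt\(\)/ReadFromOD<int>(\1)/g"
}
```

It could be run as convert_reads < input_file > output_file. Because [^()]* cannot cross a parenthesis, each substitution stays within a single call, and text around the call is left untouched.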