scala, escaping, coding-efficiency

Most efficient way to process string escaping - Scala


I'm using these 2 ways of processing string escaping a lot in the code:

1.

if (Seq("\\", "{", "\"", "\"\"").exists(str.contains)) {
  str.replace("\"", "")
    .replace("{", "-")
    .replace("\\", "-")
    .replace("\"\"", "-")
}

2.

if (Seq("|", "\"").exists(str.contains)) s""""${str.replace("\"", "\"\"")}"""" else str

It runs inside a Spark cluster, and execution time is very important. Is this the best way of doing it, or is there a more efficient one?


Solution

  • The code you provided does something like this:

    Given some banned strings such as ["a", "b", "c"], if my string contains any of these strings, go replace all the "a"s, "b"s and "c"s.

    So the checking part (I mean the Seq(...).exists(...) part) is actually redundant: whenever your string does contain one of the banned strings, it gets scanned once for the check and again for the replacements, which roughly doubles the work. If you want to do it using Scala functions and UDFs, I suggest you do this:

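    str
      .replaceAll("\"", "")        // remove every ", as the original code does first
      .replaceAll("[{\\\\]", "-")  // then replace each { or \ with -
      // no commas inside [...]: a comma there is a literal and would be replaced too
      // no step for "" is needed: once every " is removed, "" can no longer occur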
    

    You can also chain two regexp_replace calls from the Spark API (org.apache.spark.sql.functions) instead. Either of the two approaches works, so pick whichever fits your code.
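
    If you want to convince yourself that dropping the check does not change the result, here is a minimal sketch that compares both versions on a few inputs. The names EscapeSketch, guarded and unguarded are made up for illustration, and the `else str` branch is my assumption about what the first snippet is meant to return. It uses literal String.replace instead of a regex, which is one option among several and not necessarily the fastest on your data.

```scala
object EscapeSketch {
  // The question's first snippet; the `else str` branch is an assumption,
  // added so the function returns a value in both cases.
  def guarded(str: String): String =
    if (Seq("\\", "{", "\"", "\"\"").exists(s => str.contains(s))) {
      str.replace("\"", "")
        .replace("{", "-")
        .replace("\\", "-")
        .replace("\"\"", "-") // never matches: every " is already gone
    } else str

  // Same result without the pre-check, using literal (non-regex) replaces.
  def unguarded(str: String): String =
    str.replace("\"", "").replace('{', '-').replace('\\', '-')

  def main(args: Array[String]): Unit = {
    val samples = Seq("plain", "a{b", "a\\b", "say \"hi\"", "\"\"", "x,y|z", "")
    // Both versions must agree on every sample.
    samples.foreach(s => assert(guarded(s) == unguarded(s), s))
    println(unguarded("a{b\\c\"d")) // prints a-b-cd
  }
}
```

    Whether this is actually faster inside your Spark job is something to measure on real data rather than assume.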