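Any ideas how to remove comments from Scala code so that nested comments are removed completely and comment markers inside string and character literals are left alone?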
Here is example code with comments:
object TestCode {
  val a = "A" // a = "AA"
  val b = "B" /* b = "BB" */
  val c = "C" /* multi line comment
    /* c = "CC" nested */ // FOO
  */ // c = "CCC"
  val d = """D""" // d = """DD /* """
  val e = '"' // e = '"' = char literal
  val f = '\"' // f = '\"' = char literal
  val codeStr = " \" \"\" \"\"\"/* This is literal */\"\"\" val x = \"\"\"5\"\"\" \" "
  "/* This is a literal */" // This is a comment 3
  "// This is a literal with extra comment end string */" // This is a comment 4
  "/* This is a literal with extra comment begin string" // This is a comment 5
}
The code compiles (with warnings about pure expressions).
The C preprocessor gets quite close but fails on nested comments: C block comments do not nest, so the first */ ends the comment and a stray */ is left behind:
object TestCode {
  val a = "A"
  val b = "B"
  val c = "C"
  */
  val d = """D"""
  val e = '"'
  val f = '\"'
  val codeStr = " \" \"\" \"\"\"/* This is literal */\"\"\" val x = \"\"\"5\"\"\" \" "
  "/* This is a literal */"
  "// This is a literal with extra comment end string */"
  "/* This is a literal with extra comment begin string"
}
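The stray */ comes from the C rule that a block comment ends at the first */. Scala block comments nest instead, so every /* needs its own */. A minimal sketch contrasting the two rules; CommentEnd and blockCommentEnd are hypothetical names, and string literals are deliberately ignored here to keep the nesting logic visible:

```scala
object CommentEnd {
  // Returns the index just past the block comment that starts at `start`
  // (src must contain "/*" at `start`), or src.length if it is unterminated.
  // nested = false: C rule, the first "*/" closes the comment.
  // nested = true:  Scala rule, each "/*" must be matched by its own "*/".
  def blockCommentEnd(src: String, start: Int, nested: Boolean): Int = {
    var depth = 1
    var i = start + 2 // skip the opening "/*"
    while (i < src.length && depth > 0) {
      if (nested && src.startsWith("/*", i)) { depth += 1; i += 2 }
      else if (src.startsWith("*/", i)) { depth -= 1; i += 2 }
      else i += 1
    }
    i
  }
}
```

For the input "/* a /* b */ c */ d", the C rule stops right after "b */" and leaves " c */ d" behind, which is the same kind of leftover the preprocessor produced, while the nesting rule consumes the whole comment and leaves only " d".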
I also tried this regex-based solution, but it fails on character literals containing a double quote and on nested comments, as you can see:
str.replaceAll("//.*|/\\*(?s:.*?)\\*/|(\"(?:(?<!\\\\)(?:\\\\\\\\)*\\\\\"|[^\r\n\"])*\")", "$1")
res1: String = """object TestCode {
  val a = "A"
  val b = "B"
  val c = "C"
  */
  val d = """D"""
  val e = '"' // e = '"' = char literal
  val f = '\"' // f = '\"' = char literal
  val codeStr = " \" \"\" \"\"\"/* This is literal */\"\"\" val x = \"\"\"5\"\"\" \" "
  "/* This is a literal */"
  "// This is a literal with extra comment end string */"
  "/* This is a literal with extra comment begin string"
}"""
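Both failures have the same root cause: finding where a comment or literal ends needs a little state (which kind of literal the scanner is in, and how deeply comments are nested), and a regular expression cannot count arbitrary nesting depth. A minimal hand-written scanner sketch, assuming ordinary Scala 2.13 syntax, that keeps that state: it copies string, triple-quoted string and character literals verbatim, drops // comments up to the line break, and counts depth for block comments. This is a swapped-in technique, not the Scalameta approach used in the answer; ScalaCommentStripper, strip and charLiteralEnd are hypothetical names, and interpolated strings whose ${...} blocks contain their own quotes are not handled.

```scala
object ScalaCommentStripper {

  /** Removes line comments and (possibly nested) block comments from Scala
    * source, copying string and character literals verbatim.
    * Sketch only: quotes inside ${...} of interpolated strings are not handled.
    */
  def strip(src: String): String = {
    val out = new StringBuilder
    val n = src.length
    var i = 0
    while (i < n) {
      if (src.startsWith("//", i)) {
        // Line comment: drop everything up to, but not including, the line break.
        while (i < n && src(i) != '\n' && src(i) != '\r') i += 1
      } else if (src.startsWith("/*", i)) {
        // Block comment: Scala block comments nest, so track the depth.
        var depth = 1
        i += 2
        while (i < n && depth > 0) {
          if (src.startsWith("/*", i)) { depth += 1; i += 2 }
          else if (src.startsWith("*/", i)) { depth -= 1; i += 2 }
          else i += 1
        }
        // Insert a space if dropping the comment would glue two tokens (a/**/b).
        if (out.nonEmpty && !out.last.isWhitespace && i < n && !src(i).isWhitespace)
          out.append(' ')
      } else if (src.startsWith("\"\"\"", i)) {
        // Triple-quoted string: no escapes; extra closing quotes belong to the literal.
        val end = src.indexOf("\"\"\"", i + 3)
        val stop =
          if (end < 0) n
          else {
            var e = end + 3
            while (e < n && src(e) == '"') e += 1
            e
          }
        out.append(src.substring(i, stop))
        i = stop
      } else if (src(i) == '"') {
        // Ordinary string literal: a backslash escapes the next character.
        var j = i + 1
        while (j < n && src(j) != '"' && src(j) != '\n') {
          j += (if (src(j) == '\\') 2 else 1)
        }
        val stop = math.min(j + 1, n)
        out.append(src.substring(i, stop))
        i = stop
      } else {
        // Character literal such as '"' or '\"'; otherwise copy one character.
        val stop = if (src(i) == '\'') charLiteralEnd(src, i) else -1
        if (stop > 0) { out.append(src.substring(i, stop)); i = stop }
        else { out.append(src(i)); i += 1 }
      }
    }
    out.toString
  }

  // Index just past a character literal starting at i, or -1 if the quote at i
  // does not start one (for example a Scala 2 symbol literal such as 'foo).
  private def charLiteralEnd(src: String, i: Int): Int = {
    val n = src.length
    if (i + 2 < n && src(i + 1) != '\\' && src(i + 2) == '\'') i + 3 // plain char
    else if (i + 3 < n && src(i + 1) == '\\' && src(i + 3) == '\'') i + 4 // simple escape
    else if (i + 7 < n && src.startsWith("\\u", i + 1) && src(i + 7) == '\'') i + 8 // six-char unicode escape
    else -1
  }
}
```

Because literals are consumed whole before the scanner looks for comment markers, "/*" or "//" inside a string or after a '"' character literal is never mistaken for a comment. Like the Scalameta output, it leaves the whitespace around removed comments in place, so lines can end with trailing spaces.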
The Scala compiler can do the job, but as far as I understand there is no compiler option that performs only the comment removal.
Following Mateusz Kubuszok's suggestion, I implemented it with Scalameta: parse the file, then rebuild the text from all tokens except the comment tokens.
This is the scala-cli script file: SourceCodeCommentRemover.scala
//> using scala "2.13.5"
//> using lib "org.scalameta::scalameta:4.9.7"

import scala.meta._
import java.io.{File, PrintWriter}

object CommentRemover {
  def main(args: Array[String]): Unit = {
    if (args.length != 2) {
      println("Usage: CommentRemover <input file> <output file>")
      sys.exit(1)
    }

    val inputFile = new File(args(0))
    val outputFile = new File(args(1))

    if (!inputFile.exists()) {
      println(s"Input file ${inputFile.getAbsolutePath} does not exist.")
      sys.exit(1)
    }

    // scala.io.Source is imported locally because scala.meta._ also brings
    // a Source (the syntax tree type) into scope. The file is closed after reading.
    val sourceCode = {
      import scala.io.Source
      val src = Source.fromFile(inputFile)
      try src.mkString finally src.close()
    }
    println(s"Original source: BEGIN\n${sourceCode}\nEND")

    // Here Source is scala.meta.Source: parse the whole file into a syntax tree.
    val tree = sourceCode.parse[Source] match {
      case parsers.Parsed.Success(tree) => tree
      case parsers.Parsed.Error(_, msg, _) =>
        println(s"Failed to parse the input file: $msg")
        sys.exit(1)
    }

    // Rebuild the text from every token except comments. Whitespace and line
    // breaks are tokens too, so the original layout is kept.
    val codeWithoutComments = tree.tokens.collect {
      case token if !token.is[Token.Comment] => token.text
    }.mkString
    println(s"Comments removed: BEGIN\n${codeWithoutComments}\nEND")

    val writer = new PrintWriter(outputFile)
    try {
      writer.write(codeWithoutComments)
    } finally {
      writer.close()
    }
    println(s"Comments removed. Output written to ${outputFile.getAbsolutePath}.")
  }
}
This is the test input file: StackOverflowTestCode.scala
object TestCode {
  val a = "A" // a = "AA"
  val b = "B" /* b = "BB" */
  val c = "C" /* multi line comment
    /* c = "CC" nested */ // FOO
  */ // c = "CCC"
  val d = """D""" // d = """DD /* """
  val e = '"' // e = '"' = char literal
  val f = '\"' // f = '\"' = char literal
  val codeStr = " \" \"\" \"\"\"/* This is literal */\"\"\" val x = \"\"\"5\"\"\" \" "
  "/* This is a literal */" // This is a comment 3
  "// This is a literal with extra comment end string */" // This is a comment 4
  "/* This is a literal with extra comment begin string" // This is a comment 5
}
Run the script:
scala-cli run SourceCodeCommentRemover.scala -- StackOverflowTestCode.scala out.scala
cat out.scala
object TestCode {
  val a = "A"
  val b = "B"
  val c = "C"
  val d = """D"""
  val e = '"'
  val f = '\"'
  val codeStr = " \" \"\" \"\"\"/* This is literal */\"\"\" val x = \"\"\"5\"\"\" \" "
  "/* This is a literal */"
  "// This is a literal with extra comment end string */"
  "/* This is a literal with extra comment begin string"
}
Scala CLI version:
scala-cli --version
Scala CLI version: 1.4.1
Scala version (default): 3.4.2