GREP - finding all occurrences of a string


I am tasked with white-labeling an application so that it contains no references to our company, website, etc. The problem I am running into is that I have many different patterns to look for, and I would like to guarantee that all of them are removed. Since the application was not (entirely) developed in-house, we cannot simply look for occurrences in messages.properties and be done; we must go through JSPs, Java code, and XML.

I am using grep to filter results like this:

grep -ir 'SOME_PATTERN' . | grep -v import | grep -v '//' | grep -vF '/*' ...

The patterns are escaped when I use them on the command line; however, I don't feel this filtering is very robust. Each grep -v drops any line that merely contains the excluded text, so a real occurrence would be missed if its line also contained import (unlikely) or even /* (the beginning of a javadoc comment).
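A less brittle variant is to skip a line only when it starts with an import or comment marker, rather than dropping every line that merely contains import, // or /*. (Note that grep -v // also discards any line holding a URL such as "http://...", which is exactly where a company website is likely to appear.) The Python sketch below assumes conventionally formatted Java; find_matches, SKIP_PREFIXES and the "acme" sample lines are hypothetical.

```python
import re

# Hypothetical prefixes for lines that should not be reported: Java imports
# and lines that begin a comment or continue a javadoc/block comment.
SKIP_PREFIXES = ("import ", "//", "/*", "*")

def find_matches(lines, pattern):
    """Yield (line_number, line) for lines matching pattern, skipping lines
    that *start* with an import or comment marker (hypothetical helper)."""
    regex = re.compile(pattern, re.IGNORECASE)  # case-insensitive, like grep -i
    for number, line in enumerate(lines, start=1):
        if line.lstrip().startswith(SKIP_PREFIXES):
            continue
        if regex.search(line):
            yield number, line.rstrip("\n")

# Hypothetical sample input.
sample = [
    "import com.acme.util.Helper;\n",
    "// Acme internal helper\n",
    'String URL = "http://www.acme.com";\n',
]
for number, line in find_matches(sample, "acme"):
    print(number, line)
```

Only the third line is reported; a pipeline ending in grep -v // would have dropped it. A trailing comment such as int x = 1; // Acme is still reported, which errs on the side of caution.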

All of the text output to the screen must come from a string declaration somewhere or a constants file. So, I can assume I will find something like:

public static final String SOME_CONSTANT = "SOME_PATTERN is currently unavailable";

I would like to find that occurrence as well as one where the declaration spans two lines:

public static final String SOME_CONSTANT =
    "SOME_PATTERN blah blah blah";
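Since all user-visible text ends up in string literals, another option is to search only inside Java double-quoted literals. Matching the literal itself, not the declaration, also covers the split-declaration case, because the literal still sits on one line. This is a minimal Python sketch; STRING_LITERAL and literals_containing are hypothetical names, and the regex assumes no Java text blocks and no quote characters inside comments (a comment containing " would produce false literals).

```python
import re

# A Java double-quoted string literal: a quote, then any run of characters
# that are not a quote, backslash or newline, or backslash escapes, then a quote.
STRING_LITERAL = re.compile(r'"(?:[^"\\\n]|\\.)*"')

def literals_containing(source, pattern):
    """Return (line_number, literal) for every string literal in source that
    contains pattern (case-insensitive). Hypothetical helper."""
    needle = re.compile(pattern, re.IGNORECASE)
    hits = []
    for match in STRING_LITERAL.finditer(source):
        if needle.search(match.group(0)):
            # Line number = newlines before the literal's start, plus one.
            line_number = source.count("\n", 0, match.start()) + 1
            hits.append((line_number, match.group(0)))
    return hits

# Hypothetical sample covering both declaration styles from the question.
java = '''public static final String SOME_CONSTANT = "SOME_PATTERN is currently unavailable";
public static final String OTHER_CONSTANT =
    "SOME_PATTERN blah blah blah";
// SOME_PATTERN in a comment is not reported
'''
for line_number, literal in literals_containing(java, "SOME_PATTERN"):
    print(line_number, literal)
```

Both declarations are found (lines 1 and 3), while the mention in the comment is ignored because it is not inside a literal.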

Alternatively, if we had an internal crawler or automated tests, I could simply pull back the XHTML for each page and check its source to make sure it is clean.
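If a list of page URLs were available from such a crawler or test suite, the check on the rendered markup could be as small as the Python sketch below, which uses only the standard library. PAGE_URLS (including the localhost address), leaked_patterns and check_pages are hypothetical; the patterns are treated as case-insensitive regular expressions.

```python
import re
import urllib.request

# Hypothetical list of pages to check; in practice this would come from a
# crawler or the automated test suite.
PAGE_URLS = ["http://localhost:8080/app/index.jsp"]

def leaked_patterns(html, patterns):
    """Return the patterns (case-insensitive regexes) found anywhere in html."""
    return [p for p in patterns if re.search(p, html, re.IGNORECASE)]

def check_pages(urls, patterns):
    """Fetch each page and map its URL to the patterns found in its source."""
    report = {}
    for url in urls:
        with urllib.request.urlopen(url, timeout=10) as response:
            # Fall back to UTF-8 when the server does not declare a charset.
            charset = response.headers.get_content_charset() or "utf-8"
            html = response.read().decode(charset, errors="replace")
        found = leaked_patterns(html, patterns)
        if found:
            report[url] = found
    return report
```

Calling check_pages(PAGE_URLS, ["acme"]) would return an empty dict for a clean site. Note that this only covers pages the crawler actually reaches, so it complements rather than replaces a source scan.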


Solution

  • I would use sed, not grep! sed performs basic text transformations on an input stream. Try its s/regexp/replacement/ command.

    You can also try awk. Its -F option sets the field separator, so you can use -F';' to split the lines of your files at each ;.

    The best solution, however, would be a simple script in Perl or Python.
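The simple script suggested above could look something like this Python sketch. It walks a source tree, checks every .java, .jsp, .xml and .properties file against all patterns at once, and offers a whole-tree equivalent of sed's s/regexp/replacement/g. PATTERNS, EXTENSIONS, scan, replace_all and main are hypothetical names. The "acme" patterns are placeholders, and the code assumes UTF-8 files (older .properties files are often ISO-8859-1, in which case replace_all would raise UnicodeDecodeError).

```python
import re
from pathlib import Path

# Hypothetical patterns; list the more specific ones first, because regex
# alternation takes the first alternative that matches at a position.
PATTERNS = [r"Acme Corp", r"acme\.com", r"acme"]
# File types named in the question, plus properties files (an assumption).
EXTENSIONS = {".java", ".jsp", ".xml", ".properties"}

def combined_regex(patterns):
    """Join all patterns into one case-insensitive alternation."""
    return re.compile("|".join(f"(?:{p})" for p in patterns), re.IGNORECASE)

def candidate_files(root, extensions=EXTENSIONS):
    """Yield every file under root whose extension is in extensions."""
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in extensions:
            yield path

def scan(root, patterns=PATTERNS):
    """Return (path, line_number, line) for each line matching any pattern."""
    regex = combined_regex(patterns)
    hits = []
    for path in candidate_files(root):
        # Undecodable bytes are replaced so one odd file does not stop the scan.
        with open(path, encoding="utf-8", errors="replace") as f:
            for number, line in enumerate(f, start=1):
                if regex.search(line):
                    hits.append((str(path), number, line.rstrip("\n")))
    return hits

def replace_all(root, patterns, replacement):
    """Rewrite matching files in place, like sed 's/regexp/replacement/g'
    applied to every file; returns the total number of substitutions."""
    regex = combined_regex(patterns)
    total = 0
    for path in candidate_files(root):
        # newline="" keeps the original line endings when writing back.
        with open(path, encoding="utf-8", newline="") as f:
            text = f.read()
        new_text, count = regex.subn(replacement, text)
        if count:
            with open(path, "w", encoding="utf-8", newline="") as f:
                f.write(new_text)
            total += count
    return total

def main(root="."):
    """Print grep-style hits; return 1 while any reference remains, else 0."""
    hits = scan(root)
    for path, number, line in hits:
        print(f"{path}:{number}: {line}")
    return 1 if hits else 0
```

Saved as a file, adding if __name__ == "__main__": sys.exit(main(sys.argv[1])) (with import sys) turns it into a command-line check whose non-zero exit status can fail a build while any reference is left, which gets close to the "guarantee" the question asks for. Running scan before replace_all lets you review the hits first.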