I have a string from which I need to remove every \u0000 sequence. That includes u0000 preceded by any number of backslashes: \u0000, \\u0000, \\\u0000, and so on.
Note that the string may contain other escaped characters, such as quotation marks, and these must not be removed. Other Unicode escapes, such as \u0001 or \uFFFF, must also be left alone.
I have tried the following patterns
But none of them worked.
I expect to get the following results:
HELLO\u0000WORLD -- HELLOWORLD
HELLO\\u0000WORLD -- HELLOWORLD
HELLO\\\u0000WORLD -- HELLOWORLD
HELLO\u0000"world" -- HELLO"world"
HELLO\u0000"WORLD UWU" -- HELLO"WORLD UWU"
If this problem cannot be solved with a regex, what other working options are there?
UPD: for example, tests with the provided pattern failed:
import org.junit.jupiter.params.ParameterizedTest;
import org.junit.jupiter.params.provider.Arguments;
import org.junit.jupiter.params.provider.MethodSource;

import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;

import static org.assertj.core.api.AssertionsForClassTypes.assertThat;
import static org.junit.jupiter.params.provider.Arguments.arguments;

final class NulMatchTest {
    private static final Pattern pattern = Pattern.compile("\\\\+u0{4}");

    static String stripNul(CharSequence input) {
        StringBuilder output = new StringBuilder();
        Matcher matcher = pattern.matcher(input);
        while (matcher.find()) matcher.appendReplacement(output, "");
        matcher.appendTail(output);
        return output.toString();
    }

    @ParameterizedTest
    @MethodSource
    void stripNulTest(String input, String expected) {
        String actual = stripNul(input);
        assertThat(actual).isEqualTo(expected);
    }

    private static Stream<Arguments> stripNulTest() {
        return Stream.of(
            arguments("HELLO\\\u0000WORLD", "HELLOWORLD"),
            arguments("HELLO\\\\u0000WORLD", "HELLOWORLD"),
            arguments("HELLO\\\\\\u0000WORLD", "HELLOWORLD"),
            arguments("HELLO\\u0000\"world\"", "HELLO\"world\""),
            arguments("HELLO\\u0000\"WORLD UWU\"", "HELLO\"WORLD UWU\""),
            arguments("HELLO\\u0020WORLD", "HELLO\\u0020WORLD"));
    }
}
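You need to double up your backslashes.
Backslash is interpreted by regex as an escape for metacharacters like +. So if you want to match a literal \, you need to escape it as \\.
But Java string literals also use backslash as an escape character. So to include \ in a string literal, you need to escape it as \\.
Together, this requires four backslashes in your pattern: "\\\\+u0{4}"
The same doubling applies to the test inputs. A Java string literal needs an even number of backslashes before u0000; with an odd number, the compiler reads the last backslash plus u0000 as a Unicode escape and inserts a real NUL character, which the pattern does not match.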
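To make the two escaping layers concrete, here is a minimal sketch. The class name EscapeLayersDemo and its strip helper are made up for illustration; the only assumption beyond the text above is the Java Language Specification rule that a backslash preceded by an even number of backslashes can start a Unicode escape in source code.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original test.
final class EscapeLayersDemo {
    // Java source "\\\\+u0{4}" becomes the 8-character regex \\+u0{4}:
    // one or more literal backslashes followed by the text u0000.
    static final String REGEX = "\\\\+u0{4}";

    static String strip(String input) {
        return Pattern.compile(REGEX).matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(REGEX);                 // what the regex engine receives

        // Two backslashes in source: one backslash plus the text u0000 (6 chars).
        String escapedText = "\\u0000";
        // Three backslashes in source: one backslash plus a real NUL char (2 chars),
        // because the compiler translates the Unicode escape before parsing the literal.
        String realNul = "\\\u0000";

        System.out.println(escapedText.length());  // 6
        System.out.println(realNul.length());      // 2
        System.out.println(strip("HELLO" + escapedText + "WORLD"));      // HELLOWORLD
        System.out.println(strip("HELLO" + realNul + "WORLD").length()); // 12, nothing removed
    }
}
```

The last line shows why a test input written with an odd number of backslashes cannot pass: the string no longer contains the text u0000 at all, so there is nothing for the pattern to match.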
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;

import org.junit.jupiter.params.ParameterizedTest;
import org.junit.jupiter.params.provider.Arguments;
import org.junit.jupiter.params.provider.MethodSource;

import static org.assertj.core.api.Assertions.assertThat;
import static org.junit.jupiter.params.provider.Arguments.arguments;

final class NulMatchTest {
    // Regex \\+u0{4}: one or more literal backslashes followed by the text u0000.
    private static final Pattern pattern = Pattern.compile("\\\\+u0{4}");

    static String stripNul(CharSequence input) {
        StringBuilder output = new StringBuilder();
        Matcher matcher = pattern.matcher(input);
        // Replace each match with the empty string; unmatched text is copied through.
        while (matcher.find()) matcher.appendReplacement(output, "");
        matcher.appendTail(output);
        return output.toString();
    }

    @ParameterizedTest
    @MethodSource
    void stripNulTest(String input, String expected) {
        String actual = stripNul(input);
        assertThat(actual).isEqualTo(expected);
    }

    // Every literal backslash in an input is written as two backslashes in Java source,
    // so the count before u0000 is always even and no Unicode escape is triggered.
    private static Stream<Arguments> stripNulTest() {
        return Stream.of(
            arguments("HELLO\\\\\\u0000WORLD", "HELLOWORLD"),
            arguments("HELLO\\u0000WORLD", "HELLOWORLD"),
            arguments("HELLO\\\\u0000WORLD", "HELLOWORLD"),
            arguments("HELLO\\u0000\"world\"", "HELLO\"world\""),
            arguments("HELLO\\u0000\"WORLD UWU\"", "HELLO\"WORLD UWU\""),
            arguments("HELLO\\u0020WORLD", "HELLO\\u0020WORLD"));
    }
}
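As for the question about options other than regex, a hand-written scan is one possibility. The following is only a sketch under the same assumption as the pattern above, namely that the target is the text u0000 preceded by one or more backslashes; the class NulScanner and its method name are hypothetical.

```java
// Hypothetical regex-free alternative; mirrors the behavior of the pattern above.
final class NulScanner {
    private static final String TARGET = "u0000";

    // Removes every run of one or more backslashes that is immediately
    // followed by the text u0000; everything else is copied unchanged.
    static String stripEscapedNul(String input) {
        StringBuilder out = new StringBuilder(input.length());
        int i = 0;
        while (i < input.length()) {
            if (input.charAt(i) != '\\') {
                out.append(input.charAt(i++));
                continue;
            }
            // Find the end of this run of backslashes.
            int runEnd = i;
            while (runEnd < input.length() && input.charAt(runEnd) == '\\') {
                runEnd++;
            }
            if (input.startsWith(TARGET, runEnd)) {
                i = runEnd + TARGET.length();  // drop the backslashes and u0000
            } else {
                out.append(input, i, runEnd);  // keep other escapes intact
                i = runEnd;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(stripEscapedNul("HELLO\\\\\\u0000WORLD")); // HELLOWORLD
        System.out.println(stripEscapedNul("HELLO\\u0020WORLD"));     // printed unchanged
    }
}
```

Like the regex, this leaves real U+0000 characters in place. If those also need to go, an extra check for '\0' in the loop would likely cover it.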