java regex xml

Unable to parse multi-line XML message using Java "Pattern" and "Matcher"


I am unable to parse a multi-line XML message payload using Pattern.compile(regex). However, if I put the same message on a single line, I get the expected result. For example, if I parse

<Document> <RGOrdCust50K50F> AccName AccNo AccAddress </RGOrdCust50K50F> </Document>

it gives me the <RGOrdCust50K50F> tag value as: AccName AccNo AccAddress. But if I use multiple lines, like

<Document> <RGOrdCust50K50F>AccNo 
 AccName 
 AccAddress   </RGOrdCust50K50F></Document>

it throws java.lang.IllegalStateException: No match found

The test case code I am using is below:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ParseXMLMessage {
    public static void main(String[] args) {
        String fldName = "RGOrdCust50K50F";
        // A Java string literal cannot span lines, so the line breaks are written as \n.
        String message = "<?xml version=1.0 encoding=UTF-8?> <Document><RGOrdCust50K50F>1234\n"
                + "     ABCD\n"
                + "     LONDON,UK </RGOrdCust50K50F></Document>";
        String fldValue = getTagValue(fldName, message);
        System.out.println("fldValue:" + fldValue);
    }

    private static String getTagValue(String tagName, String message) {
        // Lookbehind for the opening tag, lazy match, lookahead for the closing tag.
        String regex = "(?<=<" + tagName + ">).*?(?=</" + tagName + ">)";
        System.out.println("regex:" + regex);
        Pattern pattern = Pattern.compile(regex);
        System.out.println("pattern:" + pattern);
        Matcher matcher = pattern.matcher(message);
        System.out.println("matcher:" + matcher);
        // The result of find(0) is ignored, so group() throws when nothing matched.
        matcher.find(0);
        String tagValue = null;
        try {
            tagValue = matcher.group();
        } catch (IllegalStateException isex) {
            System.out.println("No Tag/Match found " + isex.getMessage());
        }
        return tagValue;
    }
}

As a business requirement, the message has to be multi-line, but when I make it multi-line I get the exception. I am unable to fix this issue. Is there a problem with the regex I am using? Do I need to use '\n' in the regex to resolve this? Kindly assist.


Solution

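  • The issue is caused by the '.' metacharacter: by default it does not match line terminators, so .*? cannot span the line breaks in your message. See http://docs.oracle.com/javase/tutorial/essential/regex/pre_char_classes.html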

    .   Any character (may or may not match line terminators)
    

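    Try the following code (Pattern.DOTALL is the flag that lets '.' match line terminators; Pattern.MULTILINE only changes how ^ and $ behave, and this regex does not use them):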

    Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE| Pattern.DOTALL);
    

    See also the following topic: java regex string matches and multiline delimited with new line
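For illustration, here is a minimal sketch of the suggestion above applied to the question's getTagValue method. The class name MultiLineTagValue is made up for this example; the regex and sample message are taken from the question. It compiles the pattern with Pattern.DOTALL only, and checks the result of find() instead of relying on the exception.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name, used only for this sketch.
public class MultiLineTagValue {

    // Returns the text between <tagName> and </tagName>, or null if nothing matches.
    static String getTagValue(String tagName, String message) {
        String regex = "(?<=<" + tagName + ">).*?(?=</" + tagName + ">)";
        // DOTALL makes '.' match line terminators as well.
        Pattern pattern = Pattern.compile(regex, Pattern.DOTALL);
        Matcher matcher = pattern.matcher(message);
        // Only call group() when find() reports a match.
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        String message = "<Document><RGOrdCust50K50F>1234\n"
                + "ABCD\n"
                + "LONDON,UK </RGOrdCust50K50F></Document>";
        System.out.println("fldValue:" + getTagValue("RGOrdCust50K50F", message));
    }
}
```

The same effect can be had without the flag argument by prefixing the regex with the embedded flag (?s). Note that the returned value keeps the line breaks and any surrounding whitespace, so it may need trimming depending on what the caller expects.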