java apache-stringutils

Get text between two tags using the substringBetween() method of StringUtils


I have an input like:

<address>
    <addressLine>280 Flinders Mall</addressLine>
    <geoCodeGranularity>PROPERTY</geoCodeGranularity>
</address>
<address type="office">
    <addressLine>IT Park</addressLine>
    <geoCodeGranularity>office Space</geoCodeGranularity>
</address>

I want to capture everything between each pair of opening and closing address tags.

I tried:

File file = new File("test.html");
String testHtml = FileUtils.readFileToString(file); 
String title = StringUtils.substringBetween(testHtml, "<address>", "</address>");

This does not work in all cases, because the opening address tag may carry attributes (for example, <address type="office">), so the literal string "<address>" never matches it. How can I extract the text in such cases?


Solution

  • Read the file into a String, then find the start and end index of each desired substring, as below:

    import java.io.File;
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    
    public class Address {
    
        public static void main(String[] args) throws IOException {
    
            // Path to the input file
            File file = new File("\\..\\..\\Test.html");
    
            // Read the whole file into a String (UTF-8 is assumed here)
            String data = new String(Files.readAllBytes(file.toPath()), StandardCharsets.UTF_8);
    
            // Walk through every <address ...> opening tag in the file.
            // Note: the prefix "<address" also matches tags such as <addressLine>
            // if they appear outside an <address> element.
            int index = data.indexOf("<address");
            while (index >= 0) {
    
                // The content starts right after the '>' that closes the opening tag
                int tagEnd = data.indexOf(">", index);
                if (tagEnd < 0) {
                    break; // malformed opening tag, nothing more to extract
                }
                int startIndex = tagEnd + 1;
    
                // The content ends where the closing tag begins
                int endIndex = data.indexOf("</address>", startIndex);
                if (endIndex < 0) {
                    break; // no closing tag, avoid substring() throwing
                }
    
                String content = data.substring(startIndex, endIndex);
                // Replace the line below with the desired logic
                System.out.println(content.trim());
    
                // The next address starts after the end of the current one
                index = data.indexOf("<address", endIndex + 1);
            }
        }
    }
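As an alternative to manual index arithmetic, the same extraction can be sketched with java.util.regex. This is a swapped-in technique, not something from the original answer, and the class name AddressExtractor and the method extractAddresses are hypothetical names chosen for illustration. The sketch assumes the address elements are not nested and that attribute values never contain a '>' character. For less regular markup, a real HTML/XML parser is likely the more robust choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
public class AddressExtractor {

    // "<address", then a word boundary so that <addressLine> is not matched,
    // then optional attributes up to '>'. DOTALL lets '.' span line breaks,
    // and the reluctant (.*?) stops at the first closing tag.
    private static final Pattern ADDRESS =
        Pattern.compile("<address\\b[^>]*>(.*?)</address>", Pattern.DOTALL);

    // Returns the trimmed content of every <address ...>...</address> element.
    public static List<String> extractAddresses(String html) {
        List<String> result = new ArrayList<>();
        Matcher matcher = ADDRESS.matcher(html);
        while (matcher.find()) {
            result.add(matcher.group(1).trim());
        }
        return result;
    }

    public static void main(String[] args) {
        String html =
            "<address>\n"
            + "    <addressLine>280 Flinders Mall</addressLine>\n"
            + "</address>\n"
            + "<address type=\"office\">\n"
            + "    <addressLine>IT Park</addressLine>\n"
            + "</address>\n";

        // Print each extracted block, separated by a marker line
        for (String address : extractAddresses(html)) {
            System.out.println(address);
            System.out.println("----");
        }
    }
}
```

The word boundary after "address" is what lets the pattern accept both the plain tag and the one with a type attribute while skipping addressLine.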