Tags: java, regex, sorting, filereader

Using pattern matching to sort from a file, Java


So I've gotten my program to the point where it correctly separates the lines of the text file and can even match the pattern for the first field (the package ID). I also need to detect and separate the address part of each line and sort the addresses based on their direction or street/Broadway, but I can't even get the initial pattern to detect the addresses. Am I using regex wrong, and is that why the address portion won't be detected?

CODE

package csi311;

// Import some standard Java libraries.
import java.io.BufferedReader;
import java.io.FileReader;
import java.util.regex.Pattern;
import java.util.regex.Matcher;
import java.util.ArrayList;
/**
 * Hello world example.  Shows passing in command line arguments, in this case a filename. 
 * If the filename is given, read in the file and echo it to stdout.
 */
public class HelloCsi311 {

    /**
     * Class constructor.
     */
    public HelloCsi311() {
    }


    /**
     * @param filename the name of a file to read in 
     * @throws Exception on anything bad happening 
     */
    public void run(String filename) throws Exception {
        if (filename != null) {
            readFile(filename); 
        }
    }


    /**
     * @param filename the name of a file to read in 
     * @throws Exception on anything bad happening 
     */
    private void readFile(String filename) throws Exception {
        System.out.println("Dumping file " + filename); 
        // Open the file and connect it to a buffered reader.
        BufferedReader br = new BufferedReader(new FileReader(filename));  
        ArrayList<String> foundaddr = new ArrayList<String>();
        String line = null;  
        String pattern = "^\\d\\d\\d-[A-Za-z][A-Za-z][A-Za-z]-\\d\\d\\d\\d";
        String address[] = new String[4];
        address[0] = "\\d{1,3}\\s\\[A-Za-z]{1,20}";
        address[1] = "\\d{1,3}\\s\\[A-Za-z]{1,20}\\s\\d{1,3}\\[A-Za-z]{1,20}\\s\\[A-Za-z]{1,20}";
        address[2] = "\\d{1,3}\\s\\d{1,3}\\[A-Za-z]{1,20}\\s\\[A-Za-z]{1,20}";
        address[3] = "\\d\\d\\s\\[A-Za-z]{1,20}";
        Pattern r = Pattern.compile(pattern);
        // Get lines from the file one at a time until there are no more.
        while ((line = br.readLine()) != null) {
            if(line.trim().isEmpty()) {
                continue;
            }
            String sample = line.replaceAll("\\s+,", ",").replaceAll(",+\\s",",");
            String[] result = sample.split(",");
            String pkgId = result[0].trim().toUpperCase();
            String pkgAddr = result[1].trim();


            Float f = Float.valueOf(result[2]);
            for(String str : result){
                // Trying to match for different types
                for(String pat : address){
                    if(str.matches(pat)){
                        System.out.println(pat);
                    }
                }



                if(f < 50 && !pkgId.matches(pattern)) {
                    Matcher m = r.matcher(str);
                    if(m.find()) {
                        foundaddr.add(str);
                    }
                }
            }
        }

        if(foundaddr != null) {
            System.out.println(foundaddr.size());
        }   

        // Close the buffer and the underlying file.
        br.close();
    }



    /**
     * @param args filename
     */
    public static void main(String[] args) {
        // Make an instance of the class.
        HelloCsi311 theApp = new HelloCsi311();
        String filename = null; 
        // If a command line argument was given, use it as the filename.
        if (args.length > 0) {
            filename = args[0]; 
        }
        try { 
            // Run the run(), passing in the filename, null if not specified.
            theApp.run(filename);
        }
        catch (Exception e) {
            // If anything bad happens, report it.
            System.out.println("Something bad happened!");
            e.printStackTrace();
        }    
    }
}

Text File

123-ABC-4567, 15 W. 15th St., 50.1
456-BGT-9876,22 Broadway,24
QAZ-456-QWER, 100 East 20th Street,50
Q2Z-457-QWER, 200 East 20th Street, 49
6785-FGH-9845 ,45 5th Ave, 12.2,
678-FGH-9846 ,45 5th Ave, 12.2

123-ABC-9999, 46 Foo Bar, 220.0
347-poy-3465, 101 B'way,24

Below are the lines of code that should process the addresses. The fields are split correctly (the print statement above the address loop shows each address on its own), but none of the addresses are detected as matches for any of the address patterns, and I'm confused as to why.

The code the issue is with:

  for(String str : result){
      //System.out.println(str);
      // Trying to match for different types
      for(String pat : address){
          if(str.matches(pat)){
              System.out.println(pat);
          }
      }
      // ... rest of the loop body
  }

Desired output (edited as requested):

22 Broadway
45 5th Ave
101 B'way

Solution

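  • I believe the problem is with your regex. \\d\\d\\s\\[A-Za-z]{1,20}, for example, becomes \d\d\s\[A-Za-z]{1,20} once the Java string escaping is removed. This breaks down as follows:

      \d       a digit
      \d       a digit
      \s       a whitespace character
      \[       a literal [ character
      A-Za-z   the literal text A-Za-z (outside a character class these are ordinary characters)
      ]{1,20}  one to twenty literal ] characters

    So the pattern only matches text like 12 [A-Za-z], never an address such as 22 Broadway.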

    The regex you probably want is \d\d\s[A-Za-z]{1,20}, which as a Java string literal is \\d\\d\\s[A-Za-z]{1,20}. Notice that there's no \ before the [. The same fix applies to every pattern in your address array.
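    To see the difference concretely, here is a small self-contained check (the class name EscapingDemo is only for illustration) that runs the question's pattern and the corrected one against two inputs with Pattern.matches, which tests the whole string:

```java
import java.util.regex.Pattern;

public class EscapingDemo {
    // Pattern as written in the question: "\\[" in the Java source becomes \[ in the regex,
    // a literal '[' character, so A-Za-z is literal text and ]{1,20} means one to twenty ']' characters.
    static final String BROKEN = "\\d\\d\\s\\[A-Za-z]{1,20}";

    // Corrected pattern: no backslash before '[', so [A-Za-z] is a character class of letters.
    static final String FIXED = "\\d\\d\\s[A-Za-z]{1,20}";

    public static void main(String[] args) {
        String[] inputs = {"22 Broadway", "12 [A-Za-z]"};
        for (String input : inputs) {
            // Pattern.matches compiles the regex and requires it to match the entire input.
            System.out.printf("%-12s broken=%-5b fixed=%b%n",
                    input,
                    Pattern.matches(BROKEN, input),
                    Pattern.matches(FIXED, input));
        }
    }
}
```

    The broken pattern rejects 22 Broadway but accepts the literal text 12 [A-Za-z]; the corrected pattern does the opposite.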

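    Something else to keep in mind is that a regular expression used with Matcher.find() can match anywhere in the string. For example, the regex a would match the string a but would also match abc, bac, abracadabra, etc. To avoid this with find(), use the anchors ^ and $ to match the start and end of the string respectively; your regex then becomes ^\\d\\d\\s[A-Za-z]{1,20}$. Note that String.matches() (which your address loop uses) and Matcher.matches() already require the entire string to match, so the anchors are redundant there, though harmless.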
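    A minimal sketch of the difference between whole-string matching and searching, using only java.util.regex (the class and method names are hypothetical helpers for illustration). The input 145 Main Street is chosen because the unanchored pattern occurs inside it ("45 Main") without matching all of it:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchesVsFind {
    // True only if the regex matches the entire input (same rule as String.matches).
    static boolean wholeMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // True if the regex matches some substring of the input.
    static boolean foundAnywhere(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find();
    }

    public static void main(String[] args) {
        String unanchored = "\\d\\d\\s[A-Za-z]{1,20}";
        String anchored = "^\\d\\d\\s[A-Za-z]{1,20}$";
        String input = "145 Main Street";

        // matches(): the whole string must fit, and "145 Main Street" does not.
        System.out.println("matches, unanchored: " + wholeMatch(unanchored, input));
        // find(): succeeds because "45 Main" appears inside the string.
        System.out.println("find, unanchored:    " + foundAnywhere(unanchored, input));
        // find() with ^ and $: the match must span the whole string again.
        System.out.println("find, anchored:      " + foundAnywhere(anchored, input));
    }
}
```

    This prints false, true, false: only the unanchored find() accepts the partial match.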

    I also noticed that you're matching every column against the regexes in the for(String str : result){ loop. It seems to me that you should only be matching against result[1], i.e. pkgAddr.
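    Here is a sketch of what matching only the address column could look like, with the question's four address patterns minus the stray backslash. Two things are assumptions, not from the question: the selection rule in wantedAddresses (a well-formed ###-XXX-#### ID after upper-casing, and a weight under 50) is inferred from the sample data and desired output, and the class and method names are hypothetical. The sample lines are hard-coded instead of read through a BufferedReader to keep it self-contained:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class AddressColumnMatcher {
    // The question's address patterns with "\\[" corrected to "[".
    static final Pattern[] ADDRESS = {
        Pattern.compile("\\d{1,3}\\s[A-Za-z]{1,20}"),                                            // e.g. 22 Broadway
        Pattern.compile("\\d{1,3}\\s[A-Za-z]{1,20}\\s\\d{1,3}[A-Za-z]{1,20}\\s[A-Za-z]{1,20}"), // e.g. 100 East 20th Street
        Pattern.compile("\\d{1,3}\\s\\d{1,3}[A-Za-z]{1,20}\\s[A-Za-z]{1,20}"),                  // e.g. 45 5th Ave
        Pattern.compile("\\d\\d\\s[A-Za-z]{1,20}")
    };

    // Assumed package ID format: three digits, three letters, four digits.
    static final Pattern PKG_ID = Pattern.compile("\\d{3}-[A-Z]{3}-\\d{4}");

    // Index of the first address pattern that matches the whole address, or -1 if none does.
    static int addressType(String address) {
        for (int i = 0; i < ADDRESS.length; i++) {
            if (ADDRESS[i].matcher(address).matches()) {
                return i;
            }
        }
        return -1;
    }

    // Keeps only the address column (result[1] / pkgAddr in the question)
    // for lines that pass the assumed ID and weight rule.
    static List<String> wantedAddresses(List<String> lines) {
        List<String> found = new ArrayList<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // Skip blank lines, as the question does.
            }
            String[] cols = line.split(",");
            if (cols.length < 3) {
                continue; // Skip lines without id, address and weight.
            }
            String pkgId = cols[0].trim().toUpperCase();
            String pkgAddr = cols[1].trim();
            float weight = Float.parseFloat(cols[2].trim());
            if (PKG_ID.matcher(pkgId).matches() && weight < 50) {
                found.add(pkgAddr);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "123-ABC-4567, 15 W. 15th St., 50.1",
            "456-BGT-9876,22 Broadway,24",
            "QAZ-456-QWER, 100 East 20th Street,50",
            "Q2Z-457-QWER, 200 East 20th Street, 49",
            "6785-FGH-9845 ,45 5th Ave, 12.2,",
            "678-FGH-9846 ,45 5th Ave, 12.2",
            "",
            "123-ABC-9999, 46 Foo Bar, 220.0",
            "347-poy-3465, 101 B'way,24");
        for (String addr : wantedAddresses(lines)) {
            System.out.println(addr + " (address pattern " + addressType(addr) + ")");
        }
    }
}
```

    On the sample data this prints the three addresses from the desired output. 101 B'way is reported as pattern -1 because none of the patterns allow an apostrophe, so you would need to widen a character class (for example [A-Za-z']) if such addresses should be classified.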

    A final note: take a look at Regex 101. It lets you test a regular expression against a set of inputs to see which ones match.
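    If you'd rather check against Java's own regex engine, a tiny command-line harness does a similar job. This is a hypothetical helper, not part of any library; it reports whole-string matches, the same test String.matches performs:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

public class RegexCheck {
    // Maps each input to whether the regex matches the entire input, preserving input order.
    static Map<String, Boolean> check(String regex, String... inputs) {
        Pattern p = Pattern.compile(regex);
        Map<String, Boolean> results = new LinkedHashMap<>();
        for (String input : inputs) {
            results.put(input, p.matcher(input).matches());
        }
        return results;
    }

    public static void main(String[] args) {
        // A regex given on the command line is taken as-is (no Java string escaping);
        // otherwise fall back to an example pattern written as a Java literal.
        String regex = args.length > 0 ? args[0] : "\\d{1,3}\\s[A-Za-z]{1,20}";
        check(regex, "22 Broadway", "45 5th Ave", "101 B'way")
            .forEach((input, ok) -> System.out.println((ok ? "MATCH    " : "no match ") + input));
    }
}
```

    After compiling, running java RegexCheck '\d{1,3}\s[A-Za-z]{1,20}' from a Unix shell passes the regex with single backslashes, which is a handy reminder that the doubled backslashes exist only in Java source code.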