php arrays string text-parsing

Parse lines of formatted text which have values containing the delimiting character


I am stuck looking for a way to prepare strings like these so that I can parse them into arrays.

Nissan Micra 1.2dm3 80KM 2015r. 103 000km 
Hyundai Tucson 2dm3 141KM 2005r. 199 000km 
Volkswagen Passat 2dm3 140KM 2014r. 138 000km 
Nissan Note 1.4dm3 88KM 2007r. 120 000km 

It looks simple: split the string on " " and it's done. However, there may be longer names such as "Land Rover Range Rover", and the mileage (e.g. "103 000km") would also be split, so I don't know how to handle that.

Should I loop over each string so that my script checks for multi-word names and handles them correctly, or is there another way to split these strings into arrays correctly?


Solution

  • Assuming the format of the text is consistent, this is pretty easily solved with a regular expression.

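    $text = "Nissan Micra 1.2dm3 80KM 2015r. 103 000km";
    // Lazy (.*?) takes the name up to the first "<number>dm3" token;
    // [\d ]+ lets the mileage contain a space as a thousands separator.
    $result = preg_match('/(.*?) ([\d.]+dm3) (\d+KM) (\d+r\.) ([\d ]+km)/', $text, $matches);
    var_dump($matches);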
    

    Output:

    array(6) {
      [0] =>
      string(41) "Nissan Micra 1.2dm3 80KM 2015r. 103 000km"
      [1] =>
      string(12) "Nissan Micra"
      [2] =>
      string(6) "1.2dm3"
      [3] =>
      string(4) "80KM"
      [4] =>
      string(6) "2015r."
      [5] =>
      string(9) "103 000km"
    }
    

    Demo: https://regex101.com/r/gY0HeK/1

    If you don't need units like km or dm3, you can always place them outside the parentheses like ([\d.]+)dm3, ([\d ]+)km, etc.
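
As a rough sketch of that last suggestion applied to several lines at once: the version below moves the units outside the groups, uses named groups, and converts the captured values to numbers. The group names, the `$cars` and `$unparsed` variables, and the numbers in the "Land Rover Range Rover" line are made up for illustration; it also assumes every line follows the same format as the samples in the question.

```php
<?php
// Sample lines from the question; the last line is invented to show a multi-word name.
$lines = [
    'Nissan Micra 1.2dm3 80KM 2015r. 103 000km ',
    'Hyundai Tucson 2dm3 141KM 2005r. 199 000km ',
    'Land Rover Range Rover 3dm3 258KM 2013r. 150 000km',
];

// Units (dm3, KM, r., km) sit outside the groups, so only the values are captured.
// The group names are illustrative, not part of the original answer.
$pattern = '/^(?<name>.+?) (?<engine>[\d.]+)dm3 (?<power>\d+)KM (?<year>\d+)r\. (?<mileage>\d[\d ]*)km$/';

$cars = [];
$unparsed = [];
foreach ($lines as $line) {
    // trim() drops the trailing space present in the sample data.
    if (preg_match($pattern, trim($line), $m)) {
        $cars[] = [
            'name'       => $m['name'],
            'engine_dm3' => (float) $m['engine'],
            'power_km'   => (int) $m['power'],
            'year'       => (int) $m['year'],
            // Remove the thousands separator before casting: "103 000" becomes 103000.
            'mileage_km' => (int) str_replace(' ', '', $m['mileage']),
        ];
    } else {
        // Keep lines that do not fit the assumed format for later inspection.
        $unparsed[] = $line;
    }
}

print_r($cars);
```

Because the name group is lazy and must be followed by a `<number>dm3` token, a name with any number of words is captured whole, as long as the name itself contains no such token.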