I have a data file that is read in using file(), and I iterate over each row. I need to split each line into an array of "columns". The problem is that the columns are not all the same width (60 chars, 24 chars, 16 chars), and it seems like all the functions for this expect every column to be the same fixed size.
This will be performed on a large data file quite regularly, so optimal performance is desired.
Example of data:
XXXXXXXXXXXXXXXXXXXXXXXXXX XXXXXXXXXXXXX XX XXXXXX
XXXXXXXXX XXX XXX X XXX
XXXXXXXXXXXXXXX XXXXXXXXXXXXX XX XXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXX XXXXXXXXXXXXX XX XXXXXX
XXXXXXXXX XXX XXX X XXX
XXXXXXXXXXXXXXX XXXXXXXXXXXXX XX XXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXX XXXXXXXXXXXXX XX XXXXXX
XXXXXXXXX XXX XXX X XXX
XXXXXXXXXXXXXXX XXXXXXXXXXXXX XX XXXXXX
Desired result:
array (
0 =>
array (
0 => 'XXXXXXXXXXXXXXXXXXXXXXXXXX ',
1 => 'XXXXXXXXXXXXX ',
2 => 'XX XXXXXX',
),
1 =>
array (
0 => 'XXXXXXXXX ',
1 => 'XXX XXX ',
2 => 'X XXX',
),
2 =>
array (
0 => 'XXXXXXXXXXXXXXX ',
1 => 'XXXXXXXXXXXXX ',
2 => 'XX XXXXXX',
),
3 =>
array (
0 => 'XXXXXXXXXXXXXXXXXXXXXXXXXX ',
1 => 'XXXXXXXXXXXXX ',
2 => 'XX XXXXXX',
),
4 =>
array (
0 => 'XXXXXXXXX ',
1 => 'XXX XXX ',
2 => 'X XXX',
),
5 =>
array (
0 => 'XXXXXXXXXXXXXXX ',
1 => 'XXXXXXXXXXXXX ',
2 => 'XX XXXXXX',
),
6 =>
array (
0 => 'XXXXXXXXXXXXXXXXXXXXXXXXXX ',
1 => 'XXXXXXXXXXXXX ',
2 => 'XX XXXXXX',
),
7 =>
array (
0 => 'XXXXXXXXX ',
1 => 'XXX XXX ',
2 => 'X XXX',
),
8 =>
array (
0 => 'XXXXXXXXXXXXXXX ',
1 => 'XXXXXXXXXXXXX ',
2 => 'XX XXXXXX',
),
)
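The straightforward method is to use substr() to cut each line into its columns:
$rows = array();
// FILE_IGNORE_NEW_LINES keeps the trailing newline out of the last column
foreach (file($fn, FILE_IGNORE_NEW_LINES) as $i => $line) {
    // column widths from the question: 60, 24 and 16 characters
    $rows[$i] = array(substr($line, 0, 60), substr($line, 60, 24), substr($line, 84, 16));
}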
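If the widths might change, it can help to keep them in one list and derive the offsets instead of hard-coding 0, 60 and 84. The following is a minimal sketch, assuming the 60/24/16 widths from the question; `split_fixed_width()` is a hypothetical helper, not a PHP built-in.

```php
<?php
// Hypothetical helper (not a PHP built-in): cuts one line into columns
// whose widths are given in order, e.g. [60, 24, 16].
// substr() counts bytes, which suits byte-oriented fixed-width files.
function split_fixed_width(string $line, array $widths): array
{
    $columns = [];
    $offset = 0;
    foreach ($widths as $width) {
        // Past the end of the line substr() returns "" (PHP 8) or false (PHP 7);
        // the cast turns both into an empty string.
        $columns[] = (string) substr($line, $offset, $width);
        $offset += $width;
    }
    return $columns;
}

// Build a sample 100-byte line: 60 + 24 + 16.
$line = str_pad('first', 60) . str_pad('second', 24) . str_pad('third', 16);
$columns = split_fixed_width($line, [60, 24, 16]);
```

Inside the loop this would be called as `$rows[$i] = split_fixed_width($line, [60, 24, 16]);`. The extra function call per line probably costs a little compared with three inline substr() calls, so for the hot path the hard-coded version may still be preferable.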
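But contrary to common wisdom, it can be faster to use PCRE and a single regular expression over the whole file contents to split up the lines:
preg_match_all('/^(.{60})(.{24})(.{16})\K/m', file_get_contents($fn), $rows, PREG_SET_ORDER);
The disadvantage here is that each row contains an empty [0] (\K discards the matched text from the overall match; without it, [0] would contain the whole line), so the data columns start at index [1]. Also, a line shorter than the full 100 characters does not match at all and is silently skipped, whereas substr() just returns shorter columns.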
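If the empty [0] gets in the way, it can be stripped after matching. Below is a minimal sketch, assuming the 60/24/16 widths from the question; `fixed_width_pattern()` and `split_fixed_width_all()` are hypothetical helpers, and array_slice() is used to drop the empty element.

```php
<?php
// Hypothetical helper: builds '/^(.{60})(.{24})(.{16})\K/m' from a list of widths.
function fixed_width_pattern(array $widths): string
{
    $groups = '';
    foreach ($widths as $width) {
        $groups .= '(.{' . (int) $width . '})';
    }
    // ^ with the m modifier anchors at each line start; \K resets the start
    // of the reported match, so element [0] of every set comes back empty.
    return '/^' . $groups . '\K/m';
}

// Hypothetical helper: parses the whole text in one preg_match_all() call,
// then removes the empty [0] so the columns are indexed from 0.
function split_fixed_width_all(string $text, array $widths): array
{
    preg_match_all(fixed_width_pattern($widths), $text, $matches, PREG_SET_ORDER);
    return array_map(function (array $set): array {
        return array_slice($set, 1); // integer keys are renumbered from 0
    }, $matches);
}

$text = str_pad('a', 60) . str_pad('b', 24) . str_pad('c', 16) . "\n"
      . str_pad('d', 60) . str_pad('e', 24) . str_pad('f', 16) . "\n";
$rows = split_fixed_width_all($text, [60, 24, 16]);
```

With a real file this would be `split_fixed_width_all(file_get_contents($fn), [60, 24, 16])`. The array_map() pass adds a second loop over the results, so whether this still beats the plain substr() loop is worth measuring on the actual data.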