I am trying to parse an FDF file using PHP and regex, but I just can't get my head around regex. I am stuck parsing the file to generate an array.
%FDF-1.2
%âãÏÓ
1 0 obj
<<
/FDF
<<
/Fields [
<<
/V (email@email.com)
/T (field_email)
>>
<<
/V (John)
/T (field_name)
>>
<<
/V ()
/T (field_reference)
>>]
>>
>>
endobj
trailer
<<
/Root 1 0 R
>>
%%EOF
Current function (source: http://php.net/manual/en/ref.fdf.php):
function parse2($file) {
    if (!preg_match_all("/<<\s*\/V([^>]*)>>/x", $file,$out,PREG_SET_ORDER))
        return;
    for ($i=0;$i<count($out);$i++) {
        $pattern = "<<.*/V\s*(.*)\s*/T\s*(.*)\s*>>";
        $thing = $out[$i][1];
        if (eregi($pattern,$out[$i][0],$regs)) {
            $key = $regs[2];
            $val = $regs[1];
            $key = preg_replace("/^\s*\(/","",$key);
            $key = preg_replace("/\)$/","",$key);
            $key = preg_replace("/\\\/","",$key);
            $val = preg_replace("/^\s*\(/","",$val);
            $val = preg_replace("/\)$/","",$val);
            $matches[$key] = $val;
        }
    }
    return $matches;
}
Result:
Array
(
[field_email)
] => email@email.com)
[field_name)
] => John)
[field_reference)
] => )
)
Why does it include the ) and the newline? I know this problem is trivial for someone who understands regular expressions, so any help would be appreciated.
Your initial expression simply finds the entire block of text that represents each key and value set. Then, in your clean-up section, you look for a closing parenthesis that is immediately followed by the end of the string, \)$, but I'm sure there are additional characters (such as a line break) between the closing parenthesis and the end of the string.
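To illustrate the point, here is a minimal sketch. It assumes the captured key ends in a Windows-style line break ("\r\n"); that is a guess about the input file, not something the question confirms.

```php
<?php
// Assumption: the captured text ends with "\r\n" (Windows line endings).
$key = " (field_email)\r\n";

// "$" matches only at the very end of the string or just before a final "\n".
// The "\r" after ")" therefore prevents a match, and nothing is removed.
$unchanged = preg_replace('/\)$/', '', $key);
var_dump($unchanged === $key); // bool(true)

// Allowing trailing whitespace before the end removes ")" and the line break.
$cleaned = preg_replace('/\)\s*$/', '', $key);
$cleaned = preg_replace('/^\s*\(/', '', $cleaned);
var_dump($cleaned); // string(11) "field_email"
```

If the file turns out to use other trailing characters, the same idea applies: the clean-up pattern has to account for whatever sits between the closing parenthesis and the end of the string.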
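Instead, I'd handle all this in one operation. This expression will capture the value and the field name of each pair, and strip the field_ substring off the name (the multiline flag is needed so that ^ matches at the start of each line):
^\/V\s\(([^)]*)\)[\r\n]*^\/T\s\(field_([^)]*)\)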
Sample Text
%FDF-1.2
%âãÏÓ
1 0 obj
<<
/FDF
<<
/Fields [
<<
/V (email@email.com)
/T (field_email)
>>
<<
/V (John)
/T (field_name)
>>
<<
/V ()
/T (field_reference)
>>]
>>
>>
endobj
trailer
<<
/Root 1 0 R
>>
%%EOF
Matches
[0][0] = /V (email@email.com)
/T (field_email)
[0][1] = email@email.com
[0][2] = email
[1][0] = /V (John)
/T (field_name)
[1][1] = John
[1][2] = name
[2][0] = /V ()
/T (field_reference)
[2][1] =
[2][2] = reference
If you want to retain the field_ substring, simply remove it from the expression, like so:
^\/V\s\(([^)]*)\)[\r\n]*^\/T\s\(([^)]*)\)
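As a rough sketch of how this expression might be used from PHP to build the array the question asks for: the function name `parseFdfFields` and the inline sample string are hypothetical, and the `m` modifier is added so that `^` matches at the start of each line.

```php
<?php
// Hypothetical helper; the name is not part of the original answer.
function parseFdfFields(string $fdf): array
{
    // "m" (multiline) lets ^ match at the start of every line.
    $pattern = '/^\/V\s\(([^)]*)\)[\r\n]*^\/T\s\(([^)]*)\)/m';
    $fields = [];
    if (preg_match_all($pattern, $fdf, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            // Group 2 is the /T field name, group 1 is the /V value.
            $fields[$m[2]] = $m[1];
        }
    }
    return $fields;
}

// Shortened sample in the same layout as the FDF shown above.
$fdf = "<<\n/V (email@email.com)\n/T (field_email)\n>>\n"
     . "<<\n/V (John)\n/T (field_name)\n>>\n"
     . "<<\n/V ()\n/T (field_reference)\n>>]\n";

print_r(parseFdfFields($fdf));
```

This returns an array keyed by field_email, field_name and field_reference. Note that `[^)]*` stops at the first closing parenthesis, so a value that itself contains `)` would be cut short; handling that case would need a more careful pattern or a real FDF parser.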