I am writing a script to parse a LinkedIn CV and I am stuck at the work experience section. I can already extract the work experience text from the PDF, but I have an issue with the location entry because it is optional. The extracted lines look like this:
Array
(
[0] => Company 1
[1] => Software Engineer
[2] => July 2020 - Present (1 month)
[3] => Pretoria, Gauteng, South Africa //this key is optional
[4] => Company 2
[5] => CTO
[6] => September 2016 - Present (3 years 11 months)
[7] => Pretoria, South Africa //this key is optional
)
Each entry follows the format shown above. I tried using array_chunk($array, 4), but that only works if every entry in the array includes a location.
My other attempt was to search the entire array for a country name, but that is tricky because some company names contain a country, e.g. MTN - South Africa.
My last attempt was to write a regex that checks for the location pattern. LinkedIn formats it as City, Province, Country for South Africa, but as City, Country for other countries. I have not been able to get this right. I tried preg_match('#\((,*?)\)#', $value, $match), where $value is the string for the current iteration.
I would like to have an array for each work experience which could either include location or not. For example:
Array
(
[0] => Array
(
[0] => Company 1
[1] => Software Engineer
[2] => July 2020 - Present (1 month)
[3] => Pretoria, Gauteng, South Africa
)
[1] => Array
(
[0] => Company 2
[1] => CTO
[2] => September 2016 - Present (3 years 11 months)
[3] => Pretoria Area, South Africa
)
)
I appreciate your help.
EDIT:
Main String (work experience)
$string = 'Company 1
Software Engineer
July 2020 - Present (1 month)
Pretoria, Gauteng, South Africa
Company 2
CTO
September 2016 - Present (3 years 11 months)
Pretoria Area, South Africa';
$array = splitNewLine($string);
function splitNewLine($text) {
    $code = preg_replace('/\n$/', '', preg_replace('/^\n/', '', preg_replace('/[\r\n]+/', "\n", $text)));
    return explode("\n", $code);
}
You could grab lines 4 at a time, then check the location with a proper regular expression, and then adjust the position of the next chunk accordingly:
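function computeExperiences(array $lines): array
{
    $experiences = [];
    $position = 0;
    // Look at up to 4 lines at a time; an empty slice ends the loop.
    while ($chunkLines = array_slice($lines, $position, 4)) {
        // Company, title and date range are always present.
        $experience = array_slice($chunkLines, 0, 3);
        // The 4th line counts as a location if it looks like "City, Country"
        // or "City, Province, Country"; otherwise it starts the next entry.
        $locationIsPresent = isset($chunkLines[3]) && preg_match('/\w+,\s\w+(?:,\s\w+)?/', $chunkLines[3]);
        if ($locationIsPresent) {
            $experience[] = $chunkLines[3];
            $position += 4;
        } else {
            $position += 3;
        }
        $experiences[] = $experience;
    }
    return $experiences;
}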
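Note that the location check above would also accept a company name containing a comma (something like "Acme, Inc."). If that is a concern, one alternative is to anchor on the date line instead of the location. The following is only a sketch under an assumption that is not confirmed by the question: that every date line ends with a parenthesised duration such as "(1 month)" or "(3 years 11 months)". The function name groupByDateLine and the pattern are hypothetical and would need adjusting to the real export.

```php
<?php
/**
 * Groups CV lines into experiences by locating the date-range lines.
 * Assumption: each entry is company, title, date line, optional location,
 * and each date line ends with a duration in parentheses.
 */
function groupByDateLine(array $lines): array
{
    // Hypothetical pattern for "(1 month)", "(2 years)", "(3 years 11 months)".
    $datePattern = '/\(\d+\s+(?:year|month)s?(?:\s+\d+\s+months?)?\)\s*$/';

    $lines = array_values($lines);
    // preg_grep keeps the original keys, so these are the date line indexes.
    $dateIndexes = array_keys(preg_grep($datePattern, $lines));

    $experiences = [];
    foreach ($dateIndexes as $i => $dateIndex) {
        if ($dateIndex < 2) {
            continue; // no room for company and title above this date line
        }
        // An entry starts two lines above its date line...
        $start = $dateIndex - 2;
        // ...and ends two lines above the next date line, or at the end of input.
        $end = isset($dateIndexes[$i + 1]) ? $dateIndexes[$i + 1] - 2 : count($lines);
        $experiences[] = array_slice($lines, $start, $end - $start);
    }
    return $experiences;
}

// Usage: the first entry has no location, and the second company name contains a country.
$lines = [
    'Company 1',
    'Software Engineer',
    'July 2020 - Present (1 month)',
    'MTN - South Africa',
    'CTO',
    'September 2016 - Present (3 years 11 months)',
    'Pretoria Area, South Africa',
];
print_r(groupByDateLine($lines));
```

With this approach the location never has to be recognised at all: whatever sits between one date line and the next entry's company line is kept with the earlier entry. The trade-off is that it depends entirely on the date line format being stable.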