I am trying to convert some HTML into an array and then into a JSON string.
I am developing based on this reference: https://www.codeproject.com/Tips/1074174/Simple-Way-to-Convert-HTML-Table-Data-into-PHP-Arr
This is the basic HTML table that I want to convert to JSON.
<table class="table-list table table-responsive table-striped">
<thead>
<tr>
<th class="coll-1 name">name</th>
<th class="coll-2">height</th>
<th class="coll-3">weight</th>
<th class="coll-date">date</th>
<th class="coll-4"><span class="info">info</span></th>
<th class="coll-5">country</th>
</tr>
</thead>
<tbody>
<tr>
<td class="coll-1 name">
<a href="/username/Jhon Doe/" class="icon"><i class="flaticon-user"></i></a>
<a href="/username/Jhon Doe/">Jhon Doe</a>
</td>
<td class="coll-2 height">45</td>
<td class="coll-3 weight">50</td>
<td class="coll-date">9am May. 16th</td>
<td class="coll-4 size mob-info">abcd</td>
<td class="coll-5 country"><a href="/country/CA/">CA</a></td>
</tr>
<tr>
<td class="coll-1 name">
<a href="/username/Kasim Shk/" class="icon"><i class="flaticon-user"></i></a>
<a href="/username/Kasim Shk/">Kasim Shk</a>
</td>
<td class="coll-2 height">33</td>
<td class="coll-3 weight">54</td>
<td class="coll-date">Mar. 14th '18</td>
<td class="coll-4 size mob-info">ijkl</td>
<td class="coll-5 country"><a href="/country/UAE/">UAE</a></td>
</tr>
</tbody>
</table>
I want JSON output like this:
[
    {
        "user_link": "/username/Jhon Doe/",
        "name": "Jhon Doe",
        "height": "45",
        "weight": "50",
        "date": "Apr. 01st '18",
        "info": "abcd",
        "country": "CA"
    },
    {
        "user_link": "/username/Kasim Shk/",
        "name": "Kasim Shk",
        "height": "33",
        "weight": "54",
        "date": "Mar. 14th '18",
        "info": "ijkl",
        "country": "UAE"
    }
]
This is what I have tried in PHP.
However, it does not fetch the user's link, and the name does not come out clean in the JSON.
(The HTML table is the same as above, served from http://rudraproduction.in/htmltable.php)
header("Access-Control-Allow-Origin: *");
header('Content-Type: application/json');
error_reporting(E_ERROR | E_PARSE);
$time_start = microtime(true);
$All = [];
$link = 'http://rudraproduction.in/htmltable.php';
$jsonData = file_get_contents($link);
//echo $jsonData;
$dom = new DOMDocument;
$dom->loadHTML($jsonData);
$tables = $dom->getElementsByTagName('table');
$tr = $dom->getElementsByTagName('tr');
foreach ($tr as $element1) {
    for ($i = 0; $i < count($element1); $i++) {
        //Not able to fetch the user's link :(
        $link = $element1->getElementsByTagName('td')->item(0)->getElementsByTagName('a'); // To fetch user link
        $name = $element1->getElementsByTagName('td')->item(0)->textContent; // To fetch name
        $height = $element1->getElementsByTagName('td')->item(1)->textContent; // To fetch height
        $weight = $element1->getElementsByTagName('td')->item(2)->textContent; // To fetch weight
        $date = $element1->getElementsByTagName('td')->item(3)->textContent; // To fetch date
        $info = $element1->getElementsByTagName('td')->item(4)->textContent; // To fetch info
        $country = $element1->getElementsByTagName('td')->item(5)->textContent; // To fetch country
        array_push($All, array(
            "user_link" => $link,
            "name" => $name,
            "height" => $height,
            "weight" => $weight,
            "date" => $date,
            "info" => $info,
            "country" => $country
        ));
    }
}
echo json_encode($All, JSON_PRETTY_PRINT);
Output of my PHP code is:
[
    {
        "user_link": "http:\/\/rudraproduction.in\/htmltable.php",
        "name": null,
        "height": null,
        "weight": null,
        "date": null,
        "info": null,
        "country": null
    },
    {
        "user_link": "http:\/\/rudraproduction.in\/htmltable.php",
        "name": "\r\n\t\t\r\n\t\tJhon Doe\r\n\t",
        "height": "45",
        "weight": "50",
        "date": "9am May. 16th",
        "info": "abcd",
        "country": "CA"
    },
    {
        "user_link": "http:\/\/rudraproduction.in\/htmltable.php",
        "name": "\r\n\t\t\r\n\t\tKasim Shk\r\n\t",
        "height": "33",
        "weight": "54",
        "date": "Mar. 14th '18",
        "info": "ijkl",
        "country": "UAE"
    }
]
I prefer to use XPath with DOMDocument because of the utility and ease of the syntax. By targeting only the <tr> elements inside the <tbody> tag, you can access all of the required data.
With the exception of the href value, the final "all-letters" substring in each <td> class value represents your desired key for the associated value. For this I am using preg_match() to extract the final "word" in the class attribute.
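To make the key extraction concrete, here is a minimal sketch that applies the same pattern to a few class strings taken from the sample table. The helper name classToKey() is hypothetical and only wraps the preg_match() call for illustration; 'coll-5' is a header class, included to show the fallback.

```php
<?php
// Hypothetical helper (illustration only): return the last run of lowercase
// letters in a class attribute, or 'no_class' when the value does not end in letters.
function classToKey(string $class): string
{
    return preg_match('~[a-z]+$~', $class, $out) ? $out[0] : 'no_class';
}

// Class values copied from the sample table.
foreach (['coll-1 name', 'coll-date', 'coll-4 size mob-info', 'coll-5'] as $class) {
    echo $class, ' => ', classToKey($class), "\n";
}
// coll-1 name => name
// coll-date => date
// coll-4 size mob-info => info
// coll-5 => no_class
```

Because the pattern is anchored to the end of the string, only the trailing word matters, which is why "coll-4 size mob-info" yields info rather than size.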
When the $key is name, the href attribute value must be stored with the hardcoded key: user_link.
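Your sample date values require some preparation to yield the desired format. As your input data varies, you may need to modify the regular expression to allow strtotime() to properly handle the date expression. In the output below, the second row's date comes out as Jan. 01st '70, which indicates that strtotime() failed on that value.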
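If the 1970 fallback is a concern, one option is to parse the pieces explicitly instead of relying on strtotime(). The following is only a sketch under assumptions: the helper name normaliseDate() is hypothetical, month names are assumed to be three-letter English abbreviations, a two-digit year is assumed to mean 20xx, and the year to use when none is present is passed in by the caller.

```php
<?php
// Hypothetical helper: normalise the two sample date shapes to "M. dS 'y".
// Returns null when the text cannot be parsed, rather than silently yielding 1970.
function normaliseDate(string $raw, int $defaultYear): ?string
{
    // Drop a leading time such as "9am " and the dot after the month abbreviation.
    $clean = trim(preg_replace('~\d+[ap]m\s*|\.~', '', $raw));

    // Expect: month abbreviation, day with ordinal suffix, optional '-prefixed two-digit year.
    $pattern = '~^([A-Za-z]{3})\s+(\d{1,2})(?:st|nd|rd|th)(?:\s+\'(\d{2}))?$~';
    if (!preg_match($pattern, $clean, $m)) {
        return null;
    }

    // Assumption: two-digit years belong to the 2000s.
    $year = isset($m[3]) ? 2000 + (int) $m[3] : $defaultYear;

    // "!" resets unspecified fields so the current time does not leak into the result.
    $date = DateTime::createFromFormat('!M j Y', "{$m[1]} {$m[2]} {$year}");

    return $date ? $date->format("M. dS 'y") : null;
}

echo normaliseDate("9am May. 16th", 2018), "\n"; // May. 16th '18
echo normaliseDate("Mar. 14th '18", 2018), "\n"; // Mar. 14th '18
```

Returning null for unrecognised input makes a parsing failure visible to the caller, who can then decide whether to keep the raw text or skip the row.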
Code: (Demo)
$html = <<<HTML
<!DOCTYPE html>
<html>
<head>
<title></title>
</head>
<body>
<table class="table-list table table-responsive table-striped" border="1">
<thead>
<tr>
<th class="coll-1 name">name</th>
<th class="coll-2">height</th>
<th class="coll-3">weight</th>
<th class="coll-date">date</th>
<th class="coll-4"><span class="info">info</span></th>
<th class="coll-5">country</th>
</tr>
</thead>
<tbody>
<tr>
<td class="coll-1 name">
<a href="/username/Jhon Doe/" class="icon"><i class="flaticon-user"></i></a>
<a href="/username/Jhon Doe/">Jhon Doe</a>
</td>
<td class="coll-2 height">45</td>
<td class="coll-3 weight">50</td>
<td class="coll-date">9am May. 16th</td>
<td class="coll-4 size mob-info">abcd</td>
<td class="coll-5 country"><a href="/country/CA/">CA</a></td>
</tr>
<tr>
<td class="coll-1 name">
<a href="/username/Kasim Shk/" class="icon"><i class="flaticon-user"></i></a>
<a href="/username/Kasim Shk/">Kasim Shk</a>
</td>
<td class="coll-2 height">33</td>
<td class="coll-3 weight">54</td>
<td class="coll-date">Mar. 14th '18</td>
<td class="coll-4 size mob-info">ijkl</td>
<td class="coll-5 country"><a href="/country/UAE/">UAE</a></td>
</tr>
</tbody>
</table>
</body>
</html>
HTML;
$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$result = []; // collect one associative array per table row
foreach ($xpath->query('//tbody/tr') as $tr) {
    $tmp = []; // reset the temporary array so previous entries are removed
    foreach ($xpath->query("td[@class]", $tr) as $td) {
        // use the final run of lowercase letters in the class attribute as the key
        $key = preg_match('~[a-z]+$~', $td->getAttribute('class'), $out) ? $out[0] : 'no_class';
        if ($key === "name") {
            // the name cell also carries the user's link in its icon anchor
            $tmp['user_link'] = $xpath->query("a[@class = 'icon']", $td)->item(0)->getAttribute('href');
        }
        $tmp[$key] = trim($td->textContent);
    }
    // strip dots and a leading time (e.g. "9am ") before handing the text to strtotime()
    $tmp['date'] = date("M. dS 'y", strtotime(preg_replace('~\.|\d+[ap]m *~', '', $tmp['date'])));
    $result[] = $tmp;
}
var_export($result);
echo "\n----\n";
echo json_encode($result);
Output: (as multidim array, then json encoded string)
array (
  0 =>
  array (
    'user_link' => '/username/Jhon Doe/',
    'name' => 'Jhon Doe',
    'height' => '45',
    'weight' => '50',
    'date' => 'May. 16th \'18',
    'info' => 'abcd',
    'country' => 'CA',
  ),
  1 =>
  array (
    'user_link' => '/username/Kasim Shk/',
    'name' => 'Kasim Shk',
    'height' => '33',
    'weight' => '54',
    'date' => 'Jan. 01st \'70',
    'info' => 'ijkl',
    'country' => 'UAE',
  ),
)
----
[{"user_link":"\/username\/Jhon Doe\/","name":"Jhon Doe","height":"45","weight":"50","date":"May. 16th '18","info":"abcd","country":"CA"},{"user_link":"\/username\/Kasim Shk\/","name":"Kasim Shk","height":"33","weight":"54","date":"Jan. 01st '70","info":"ijkl","country":"UAE"}]
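The encoded string above escapes every forward slash, whereas the desired output in the question shows plain slashes and indentation. Assuming that difference matters to you, json_encode() accepts flags for both; the short sketch below uses made-up data shaped like one row of $result.

```php
<?php
// Illustrative data only, shaped like a single row of the result array.
$result = [
    ['user_link' => '/username/Jhon Doe/', 'name' => 'Jhon Doe', 'country' => 'CA'],
];

// JSON_UNESCAPED_SLASHES keeps "/" as-is; JSON_PRETTY_PRINT adds line breaks and indentation.
$json = json_encode($result, JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES);
echo $json;
```

Both forms decode to the same data, so the flags only affect how the string reads.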