I have written Apps Script code for Html Parsing using XmlParse. It works fine at my end, my browser and system language both are English as well as my Google Account's. But when I shared the same script with Client in Italy, it gives error
The markup in the document following the root element must be well-formed!
I am confused if its the same HTML code then why is it giving error at client's end.
I am wondering if ther is any regional or language settings required to be updated for this?
I already tried changing locale of the associated Google Sheet.
Following is the HTML Structure
<div dir=""ltr"">
<div dir=""ltr"">
<div align=""center"">
<div style=""font:20pt Times New Roman""><b>TITLE</b></div>
<br></br>
<table cellspacing=""1"" cellpadding=""3"" border=""0"">
<tbody>
<tr align=""left"">
<td colspan=""2""><b>Account: </b></td>
<td colspan=""5""><b>Name: a</b></td>
<td colspan=""2""><b>Currency: USD</b></td>
<td colspan=""2""><b>Leverage: </b></td>
<td colspan=""3"" align=""right""><b>2024 June 22, 09:14</b></td>
</tr>
<tr align=""left""><td colspan=""13""><b>Closed Transactions:</b></td></tr>
<tr align=""center"" bgcolor=""#C0C0C0"">
<td>Ticket</td>
<td>Open Time</td>
<td>Type</td>
<td>Size</td>
<td>Item</td>
<td>Price</td>
<td>S / L</td>
<td>T / P</td>
<td>Close Time</td>
<td>Price</td>
<td>Commission</td>
<td>Taxes</td>
<td>Swap</td>
<td>Profit</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
Below is my appscript function
function sanitizeHtml(htmlContent) {
return htmlContent.replace(/<meta([^>]*?)\/?>/g, '<meta$1></meta>')
.replace(/<img([^>]*?)\/?>/g, '<img$1></img>')
.replace(/<br([^>]*?)\/?>/g, '<br$1></br>')
.replace(/<hr([^>]*?)\/?>/g, '<hr$1></hr>')
.replace(/ /g, ' ')
.replace(/\s+nowrap/gi, '')
.replace(/\\(\#|0|,|\.|\\)/g, '$1');
}
function extractDateFromHtml(htmlContent) {
// Parse the HTML content
var sanitizedHtml = sanitizeHtml(htmlContent);
var doc = XmlService.parse(sanitizedHtml);
}
I tried your sample HTML file by creating the file "index.html" in the script editor, after editing your double quotion marks to single and used the below script for getting the date-time value in the table.
index.html file is;
<!DOCTYPE html>
<html>
<body>
<div dir="ltr">
<div dir="ltr">
<div align="center">
<div style="font:20pt Times New Roman"><b>TITLE</b></div>
<br></br>
<table cellspacing="1" cellpadding="3" border="0">
<tbody>
<tr align="left">
<td colspan="2"><b>Account: </b></td>
<td colspan="5"><b>Name: a</b></td>
<td colspan="2"><b>Currency: USD</b></td>
<td colspan="2"><b>Leverage: </b></td>
<td colspan="3" align="right"><b>2024 June 22, 09:14</b></td>
</tr>
<tr align="left"><td colspan="13"><b>Closed Transactions:</b></td></tr>
<tr align="center" bgcolor="#C0C0C0">
<td>Ticket</td>
<td>Open Time</td>
<td>Type</td>
<td>Size</td>
<td>Item</td>
<td>Price</td>
<td>S / L</td>
<td>T / P</td>
<td>Close Time</td>
<td>Price</td>
<td>Commission</td>
<td>Taxes</td>
<td>Swap</td>
<td>Profit</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
</body>
</html>
and the script where I used "XmlService" is;
function getDateData() {
var html = HtmlService.createTemplateFromFile('index').evaluate().getContent();
html = html.replace(/(\r\n|\n|\r)/gm,"");
var tablesMatch = html.match(/<table(.*?)<\/table>/gm);
if (tablesMatch) {
var myTable = tablesMatch[0];
// Logger.log(myTable);
var doc = XmlService.parse(myTable);
var rows = doc.getDescendants().filter(function(c) {
var element = c.asElement();
return element && element.getName() == "tr";
});
var data = rows.slice(0).map(function(row) {
return row.getChildren("td").map(function(cell) {
return cell.getValue();
});
});
Logger.log(data[0][4]);
}
}
The screenshot after rıunning the script is added below;