I am new to JS. I am scraping a URL with X-ray. The tags are removed when scraped, as expected, but I want each <br> tag to be replaced with something like ;
For example, if I scrape 'span#scraped-portion' from the following HTML:
<span id="scraped-portion"><span class="bold">NodeJS</span><br>
<span class="bold">Version:</span> 8<br><span class="bold">Date released:</span> 2017 Jan<br><span class="bold">Description:</span>Some other text
</span>
I get a result similar to the following:
NodeJS \n Version: 8Date released: 2017 JanDescription: Some other text
The text on either side of each <br> tag gets concatenated, which makes it difficult to tell what is what.
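The concatenation is easy to reproduce without X-Ray. This sketch uses a hypothetical naiveText helper that deletes every tag with a regular expression, a rough stand-in for taking an element's text content (adequate for this small fragment, not a real HTML parser):

```javascript
// Hypothetical stand-in for "take the text content": removes every tag.
// Only suitable for small, trusted fragments like this one.
function naiveText(html) {
  return html.replace(/<[^>]*>/g, '');
}

const fragment =
  '<span class="bold">Version:</span> 8<br><span class="bold">Date released:</span> 2017 Jan';

// The <br> disappears, so " 8" and "Date released:" run together
console.log(naiveText(fragment)); // "Version: 8Date released: 2017 Jan"
```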
So I want each <br> tag to be replaced with something like ;.
Is this possible, or should I use another library?
UPDATE
I found a pure X-Ray solution that doesn't require replacing the <br> tags in the HTML before passing it to X-Ray (see the original solution below).
This approach combines X-Ray's filter functions with nesting one X-Ray call inside another.
First, we replace the <br> tags in the original HTML using a custom filter function (called replaceLineBreak) registered with X-Ray.
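On its own, a filter like replaceLineBreak is just a function that receives the scraped value and returns a new one. A minimal sketch, assuming the value arrives as a string of HTML; the broader pattern (also matching <br/>, <br /> and <BR>) and the pass-through for non-string values are assumptions, not X-Ray requirements:

```javascript
// Filter-shaped function: takes the scraped value, returns the new value.
// Matches <br>, <br/>, <br /> in any letter case; non-strings pass through.
function replaceLineBreak(value) {
  if (typeof value !== 'string') return value;
  return value.replace(/<br\s*\/?>/gi, ';');
}

console.log(replaceLineBreak('NodeJS<br>Version: 8<br/>Date<BR />2017'));
// "NodeJS;Version: 8;Date;2017"
```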
Second, we rebuild the original HTML structure from that result (by wrapping it in <span id="scraped-portion"> again) and pass it as the first argument of a second X-Ray call.
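The effect of the two steps can be checked without X-Ray. In this sketch, stripTags is a hypothetical regex-based helper standing in for X-Ray's text extraction of '#scraped-portion', and the input is a shortened version of the example's inner HTML:

```javascript
// Hypothetical stand-in for X-Ray's text extraction (tag stripping only)
const stripTags = (html) => html.replace(/<[^>]*>/g, '');
const replaceLineBreak = (value) => value.replace(/<br\s*\/?>/gi, ';');

// Step 1: inner HTML of #scraped-portion, with each <br> replaced by ';'
const innerHtml =
  '<span class="bold">NodeJS</span><br>' +
  '<span class="bold">Version:</span> 8<br>' +
  '<span class="bold">Date released:</span> 2017 Jan';
const replaced = replaceLineBreak(innerHtml);

// Step 2: re-wrap in the outer span so '#scraped-portion' matches again,
// then take the text; the ';' separators survive as ordinary text
const rebuilt = `<span id="scraped-portion">${replaced}</span>`;
console.log(stripTags(rebuilt)); // "NodeJS;Version: 8;Date released: 2017 Jan"
```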
Hope you'll like it!
var Xray = require('x-ray');

var x = Xray({
  filters: {
    // Custom filter: replace each <br> (also <br/> and <br />) with ';'
    replaceLineBreak: function (value) {
      return typeof value === 'string' ? value.replace(/<br\s*\/?>/gi, ';') : value;
    }
  }
});

var html = `
<span id="scraped-portion"><span class="bold">NodeJS</span><br>
<span class="bold">Version:</span> 8<br><span class="bold">Date released:</span> 2017 Jan<br><span class="bold">Description:</span>Some other text
</span>
`;

// `res` is assumed to be the response object of the surrounding Express route handler
x(html, '#scraped-portion@html | replaceLineBreak')(function (err, obj) { // inner HTML with '<br>' replaced by ';'
  if (err) return res.status(500).end(String(err));
  // Restore the original structure: wrap in the outer span with id 'scraped-portion'
  x(`<span id="scraped-portion">${obj}</span>`, '#scraped-portion')(function (err2, obj2) {
    if (err2) return res.status(500).end(String(err2));
    res.header("Content-Type", "text/html; charset=utf-8");
    res.write(obj2);
    res.end();
  });
});
This produces the following string:
NodeJS; Version: 8;Date released: 2017 Jan;Description:Some other text
ORIGINAL SOLUTION
Why not replace all occurrences of <br> tags before X-Ray processes the HTML?
var Xray = require('x-ray');

function tst(req, res) {
  var x = Xray();
  // Replace every <br> with ';' before X-Ray parses the HTML
  var html = `
<span id="scraped-portion"><span class="bold">NodeJS</span><br>
<span class="bold">Version:</span> 8<br><span class="bold">Date released:</span> 2017 Jan<br><span class="bold">Description:</span>Some other text
</span>
`.replace(/<br\s*\/?>/gi, ';');

  // Scrape the text of every matching span into an array
  x(html, ['span#scraped-portion'])(function (err, obj) {
    if (err) return res.status(500).end(String(err));
    res.header("Content-Type", "text/html; charset=utf-8");
    res.write(JSON.stringify(obj, null, 4));
    res.end();
  });
}
Your code would then produce something like this:
NodeJS;\n Version: 8;Date released: 2017 Jan;Description:Some other text\n
which seems to meet your requirements.
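The leftover \n and spaces around each ; can be tidied afterwards. A minimal sketch with a hypothetical tidySeparators helper (not part of X-Ray), assuming ; never occurs inside the scraped values themselves:

```javascript
// Hypothetical post-processing helper: splits on the separator, collapses
// whitespace inside each part, drops empty parts, and rejoins with "; ".
function tidySeparators(text, separator = ';') {
  return text
    .split(separator)
    .map((part) => part.replace(/\s+/g, ' ').trim())
    .filter((part) => part.length > 0)
    .join(separator + ' ');
}

console.log(tidySeparators('NodeJS;\n Version: 8;Date released: 2017 Jan;Description:Some other text\n'));
// "NodeJS; Version: 8; Date released: 2017 Jan; Description:Some other text"
```

Empty parts, for example from two consecutive <br> tags, are dropped rather than kept as blank fields.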