I'm trying to fetch some HTML source from an internal forum. To stay independent, we are experimenting with Node.js, Express and similar tools.
When I open the page directly, I get the following HTML back:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="content-type" content="text/html; charset=us-ascii" />
<meta name="description" content="myForum" />
<meta name="viewport" content="width=320; user-scalable=no" />
<title>myForum</title>
</head>
<body>
<table>
<tr>
<td align="left" valign="top" width="100%">
<center>
<h1><img class="banner" src=
"./img/myForum.jpg" width="730"
height="117" border="0" alt="myForum" /></h1>
</center>
<hr />
<center>
[ <a href="answer.php?id=975710">Antworten</a> ] [
<a href="index.php">Forum</a> ] [ <a href=
"newEntries.php">Neue Beiträge</a> ]
</center>
<hr />
<h1>sCHween</h1>geschrieben von <font color=
"#FFFFFF">User1</font> am 18.06.2014 um 21:26:15
<hr />
This is my text! It could contain images and links!
<img src="http://images.google.ch/intl/en_ALL/images/srpr/logo11w.png" /><br />
<a href="http://www.google.com/">Google</a>
<br />
<hr />
<b>Antworten:</b><br />
<a href="thread.php?id=9752">Re:
sCHween</a> - <b><font color=
"#FFFFFF">User2</font></b> - 18.06.2014 22:56:27<br />
<a href="showentry.php?id=9756">Re:
sCHween</a> - <b><font color=
"#FFFFFF">User2</font></b> - 18.06.2014 23:14:44<br />
<a href="showentry.php?id=9753">Re:
sCHween</a> - <b><font color=
"#FFFFFF">User1</font></b> - 18.06.2014 23:02:21<br />
<a href="showentry.php?id=975713">Re:
sCHween</a> - <b><font color=
"#FFFFFF">User1</font></b> - 18.06.2014 21:46:13<br />
<a href="showentry.php?id=9720">Re:
sCHween</a> - <b><font color=
"#FFFFFF">User3</font></b> - 18.06.2014 22:22:25<br />
<a href="showentry.php?id=9755">Re:
sCHween</a> - <b><font color=
"#FFFFFF">User4</font></b> - 18.06.2014 21:52:51<br />
<hr />
<span>
<a href="answer.php?id=975">Antworten</a><br />
<a href="recent.php">Neue Beiträge</a><br />
</span>
<hr />
</td>
</tr>
</table>
</body>
</html>
What we want to extract is the HTML source between the two hr tags that enclose the post body (the third and fourth hr on the page):
This is my text! It could contain images and links!
<img src="http://images.google.ch/intl/en_ALL/images/srpr/logo11w.png" /><br />
<a href="http://www.google.com/">Google</a>
Is there an easy way to get the source between these two hr tags? If not, what would be the cleanest way to extract this content?
Not sure if this is what you want:
jQuery:
// Walk every child node of the cell, including text nodes.
var allContent = $("td").contents();
var hrCount = 0;
var addContent = false;
var result = "";
allContent.each(function () {
  if (this.nodeName === "HR") {
    hrCount++;
    // The post body sits between the third and fourth <hr>.
    addContent = (hrCount === 3);
  } else if (addContent) {
    if (this.nodeType === 1) {
      // Element node: keep its full markup.
      result += this.outerHTML;
    } else {
      // Text node: keep the raw text.
      result += $(this).text();
    }
  }
});
alert(result);
There must be a cleaner solution...
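Since the question mentions Node.js, where there is no DOM by default, here is a rough sketch of a string-based alternative: split the fetched markup on hr tags and take the segment you need. The helper name htmlBetweenHrs and the sample string are made up for illustration, and the regex is an assumption that only holds for simple pages like this one (it would also match an hr inside a comment or script), so a real HTML parser is the safer choice for anything less predictable.

```javascript
// Hypothetical helper: returns the markup between the n-th and the
// (n+1)-th <hr> tag (n is 1-based), or null if those tags do not exist.
function htmlBetweenHrs(html, n) {
  // Matches <hr>, <hr/>, <hr /> and <hr ...attributes...>, case-insensitively.
  const parts = html.split(/<hr\b[^>]*>/i);
  // parts[0] is everything before the first <hr>;
  // parts[n] is everything between the n-th and (n+1)-th <hr>.
  if (n < 1 || n + 1 >= parts.length) {
    return null;
  }
  return parts[n].trim();
}

// Made-up sample that mimics the layout of the forum page.
const sample = [
  "<h1>Title</h1>",
  "<hr />",
  "navigation",
  "<hr />",
  "header",
  "<hr />",
  'This is my text! <a href="http://www.google.com/">Google</a>',
  "<br />",
  "<hr />",
  "replies",
].join("\n");

// The post body sits between the third and fourth <hr>.
console.log(htmlBetweenHrs(sample, 3));
```

The segment index is hard-coded to the page layout, just like the hrCount checks in the jQuery version, so it breaks if the forum adds or removes an hr above the post body.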