php simple-html-dom

How to crawl the title and content of a bulletin board using PHP's Simple HTML DOM?


<?php
# Include simplehtmldom
include('./simplehtmldom/simple_html_dom.php');

# Fetch the page by URL
$html = file_get_html('https://sample-ex.com/bbs/board.php?bo_table=free');
# Empty array to hold the results
$parsing = [];

# Loop over each .na-table li and extract its contents
foreach ($html->find('.na-table li') as $li) {
  # Temporary array for this row's values
  $tmp = [];

  $number = str_replace('번호', '', trim($li->find('div', 0)->text()));
  $title = trim($li->find('a', 0)->text());
  $writer = trim($li->find('.sv_member', 0)->text());
  $wrtieDate = str_replace('등록일', '', trim($li->find('div', 5)->text()));
  $count = str_replace('조회', '', trim($li->find('div', 6)->text()));
  $link = $li->find('a', 0)->href;

  $detail_html = file_get_html($link);
  echo $detail_html->find('.view-content', 0)->text();

  $tmp['number'] = $number;
  $tmp['title'] = $title;
  $tmp['writer'] = $writer;
  $tmp['wrtiedate'] = $wrtieDate;
  $tmp['count'] = $count;
  $tmp['link'] = $link;

  # Append to $parsing[]
  $parsing[] = $tmp;
}

# Clear $html to prevent memory leaks
$html->clear();
unset($html);
?>
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>Document</title>
  <link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/bootstrap@4.6.2/dist/css/bootstrap.min.css" crossorigin="anonymous">
</head>
<body>
  <div class="container">
    <table class="table table-hover">
      <thead>
        <tr>
          <th>No.</th>
          <th>Title</th>
          <th>Name</th>
          <th>Date</th>
          <th>Views</th>
          <th>Link</th>
        </tr>
      </thead>
      <tbody>
        <?php foreach ($parsing as $data) { ?>
        <tr>
          <td><?php echo $data['number'] ?></td>
          <td><?php echo $data['title'] ?></td>
          <td><?php echo $data['writer'] ?></td>
          <td><?php echo $data['wrtiedate'] ?></td>
          <td><?php echo $data['count'] ?></td>
          <td><?php echo $data['link'] ?></td>
        </tr>
        <?php } ?>
      </tbody>
    </table>
  </div>
</body>
</html>

Hello, I hope you have a wonderful day.

I have a question. I am studying by referring to the official documentation of Simple HTML DOM. I successfully crawled the list of posts on a bulletin board, but I failed when attempting to crawl the content of each individual post through their respective links.

When I add the following code:

$detail_html = file_get_html($link);
echo $detail_html->find('.view-content', 0)->text();

I encounter the error:

Fatal error: Uncaught Error: Call to a member function text() on null in /home/simple/public_html/test.php:23 Stack trace: #0 {main} thrown in /home/simple/public_html/test.php on line 23

If you have any solutions or advice, please let me know. As this question has been translated through ChatGPT, it might sound a bit awkward. Thank you.

I tried to crawl the content using the link to the post.


Solution

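  • The href you read from the list page is still HTML-escaped, so it looks like this:

    https://sample-ex.com/bbs/board.php?bo_table=free&amp;wr_id=238

    Requested as-is, the server most likely does not see a wr_id parameter, so the page that comes back has no .view-content element and find() returns null. Convert &amp; back to & with str_replace before fetching the page:

    $detail_html = file_get_html(str_replace('&amp;', '&', $link));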
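
If you want something a little more defensive, here is a minimal sketch of the same idea. It swaps str_replace for PHP's built-in html_entity_decode (which also handles other entities) and checks for a failed fetch or a missing element before calling text(). The helper names decode_href and fetch_post_content are hypothetical and not part of Simple HTML DOM; the .view-content selector is taken from the question and may differ on other boards.

```php
<?php
// Hypothetical helpers for illustration; neither is part of Simple HTML DOM.

// Turn an HTML-escaped href attribute into a URL that can be requested.
function decode_href(string $href): string
{
    // html_entity_decode converts &amp; to &, along with other entities
    return html_entity_decode($href, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

// Assumes simple_html_dom.php has already been included by the caller.
// Returns the post body text, or null if the page or element is missing.
function fetch_post_content(string $href): ?string
{
    $detail_html = file_get_html(decode_href($href));
    if (!$detail_html) {
        return null; // the request or the parse failed
    }

    $node = $detail_html->find('.view-content', 0);
    $content = $node ? trim($node->text()) : null;

    $detail_html->clear(); // release the parsed DOM, as the question's code does
    return $content;
}

// Quick check of the decoding step (no network access needed)
echo decode_href('https://sample-ex.com/bbs/board.php?bo_table=free&amp;wr_id=238'), PHP_EOL;
// prints https://sample-ex.com/bbs/board.php?bo_table=free&wr_id=238
```

Inside the foreach loop you could then write $tmp['content'] = fetch_post_content($link); so that a post whose body cannot be found yields null instead of a fatal error.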