phpsqlmysqlmeta

SQL apostrophe resulting in faulty description on a web crawler


I've been working on a web crawler. Let me describe the problem.

Let's say that web crawler reads the title as Stack Overflow when you put the url http://stackoverflow.com

Now, here's the faulty part, let's say that the meta description of the page is:

Jason's Stack Overflow is the largest online community for programmers to learn, share their knowledge, and advance their careers.

Whenever I run my crawler, as soon as it sees the apostrophe it stops and only crawls until Jason. So, the entire description appears as:

Jason

Here's the crawler's code. Tell me what's wrong, and hopefully how to fix it, because it's driving me insane. I'm missing something super small...

foreach($url as $k) {
$url = parse_url($k);
if(!isset($url['path'])) {
    $selectData = "SELECT * FROM web WHERE url = '$k'";
    if(mysql_fetch_row(mysql_query($selectData)) === false) {
        $content = getUrl($k);
        preg_match('#<title>(.*)</title>#i', $content, $title);
        preg_match_all('/<img src=.([`^"\']+)/', $content, $img);
        preg_match('/<head>.+<meta name="description" content=.([`^"\']+)/is', $content, $description);
        preg_match('/<head>.+<meta name="author" content=.([`^"\']+)/is', $content, $author);
        #preg_match_all('/href=.([`^"\']+)/i', $content, $anchor);
        preg_match('/<body.*?>(.*?)<\/body>/is', $content, $body);
        if(!empty($title[1])  AND !empty($description[1]) || !empty($body[1])) {
            echo 'Titlu: '; @print_r($title[1]);
            echo "\n";
            $body_trim = trim(preg_replace("/&#?[a-z0-9]+;/i",'',(strip_tags(@$body[0])))); $bodyContent = substr(preg_replace('/\s+/', ' ', $body_trim), 0, 255);

            $description_trim = trim(preg_replace("/&#?[a-z0-9]+;/i",'',(strip_tags(@$description[1])))); $descContent = substr(preg_replace('/\s+/', ' ',$description_trim), 0, 255);

            $bodyContent = str_replace('\'', '', $bodyContent);
            $descContent = str_replace('\'', '', $descContent);

            echo 'Description: '; @print_r($descContent);
            echo "\n";
            echo 'Author: '; @print_r($author[1]);
            echo "\n";
            echo 'URL: '; @print_r($k); $date = date("d M Y");
            echo "\n";
            echo "\n---------------------------------------------------------------------------\n";
            $insertData = "INSERT INTO `web` (`url` ,  `title` ,  `description` ,  `body` ,  `author`, `date`) VALUES ('".$k."', '".@$title[1]."', '".@$descContent."', '".@$bodyContent."', '".@$author[1]."', '".$date."')";
            #echo $insertData;
            mysql_query($insertData);
        }
    }
}
}

Solution

  • You got exact SQL-Injection case, because you concatenate input values into SQL string

    ... VALUES ('".$k."', '".@$title[1]."'...
    

    Perhaps @$title[1] (or any other variable) has ' like: "It's SQL Injection" it will make SQL syntax error, because final SQL query string will look like

    VALUES ('something', 'It's SQL Injection', ...
    

    so actually you got SQL syntax error

    as @Bramar said - use prepared statement with parameters. It will guarantee you from real SQL Injections and will handle any SQL reserved characters inside your values properly