Quick Perl question with hopefully a simple answer. I'm trying to split a string containing non-breaking spaces (&nbsp;). The string comes from reading in an HTML page using HTML::TreeBuilder::XPath and retrieving the title with $titleString = $tree->findvalue('/html/head/title'):
use strict;
use warnings;
use HTML::TreeBuilder::XPath;
my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse_file( "filename" );
my $titleString = $tree->findvalue('/html/head/title');
print "$titleString\n";
Pasted below is the original string and below that the string that gets printed:
Mr&nbsp;Dan Perkins&nbsp;(Active)
Mr?Dan Perkins?(Active)
I've tried splitting $titleString with @parts = split('\?',$titleString); and also with the original &nbsp;, though neither has worked. My hunch is that there's a simple piece of encoding code to be added somewhere?
HTML code:
<html>
<head>
<title>Dan Perkins (Active)</title>
</head>
</html>
You shouldn't have to know how the text in the document is encoded. Accordingly, findvalue returns an actual non-breaking space (U+00A0) when the document contains &nbsp;, so you'd split on that character using any of the following:
split(/\xA0/, $title_string)
-or-
split(/\x{00A0}/, $title_string)
-or-
split(/\N{U+00A0}/, $title_string)
-or-
split(/\N{NBSP}/, $title_string)
-or-
split(/\N{NO-BREAK SPACE}/, $title_string)
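For illustration, here is a minimal sketch of the split in isolation. It assumes the title string looks like the one in the question, and it uses a hard-coded string as a stand-in for the value returned by findvalue, so it runs without HTML::TreeBuilder::XPath installed. The binmode call is an extra assumption about the output side: it encodes the output as UTF-8, which is usually what a terminal needs to display non-ASCII characters instead of "?".

```perl
use strict;
use warnings;

# Stand-in (assumption) for the value findvalue() returns: the &nbsp;
# entities in the HTML have already been decoded to U+00A0.
my $title_string = "Mr\x{A0}Dan Perkins\x{A0}(Active)";

# Split on the non-breaking space character itself,
# not on "?" and not on the literal text "&nbsp;".
my @parts = split /\xA0/, $title_string;

# Encode output as UTF-8 so any non-ASCII characters print correctly.
binmode(STDOUT, ':encoding(UTF-8)');

print "$_\n" for @parts;
# Prints:
# Mr
# Dan Perkins
# (Active)
```

The ordinary space inside "Dan Perkins" is left alone, because only U+00A0 is used as the separator.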