regexperlwhitespace

Whitespace woes in Regex


I am using a simple Perl script to parse XML and convert it to usable SQL. My current SQL lines go something like this:

INSERT INTO table VALUES ('data1', 'data2', 'data3', ); 

Obviously I need to remove the comma at the end there. Sounds simple but I just can't get regex to find it. I tried s/,\s+\)/\)/ but that doesn't change anything when I run it. Strangely, s/,\s+/WTF/ doesn't modify anything either, when it should be replacing all the commas and the spaces next to them. BUT when I run s/\s+\)/something/ it correctly finds and replaces the close parentheses at the end of the line. So apparently the whitespace character right after the commas is some strange ghost character that I can't find by any means. Not even with the . expression.

What's really weird though is when I use Find on the document in Notepad++ with the Regular Expression option, it finds all of them perfectly when I enter ,\s+\) yet the exact same sequence in Perl regex will not find them.

I suspected it was something with \r (I'm using Windows) since I previously removed the \n characters but it won't find a \r in the whole sql file.

Thank you in advance for your help this is really puzzling me.


Solution

  • First off,

    $ perl -E 'my $foo = "bar, baz"; $foo =~ s/,\s+/WTF/; say $foo'
    barWTFbaz
    

    It does work. (For perl 5.8 and before, change that to -e and print "$foo\n")

    Second, you're doing it wrong. Instead of doing something like:

    $values = "'$values[0]', ";
    $values .= "'$values[1]', ";
    ⋮
    

    you should do:

    $values = join(q{,}, map("'$_'", @values)); # map adds 'quotes'; join adds commas
    

    Third, you shouldn't even do that, you should use placeholders:

    # note specifying the column names, its a good idea! Schema change.
    my $query = "INSERT INTO table (col1, col2, col3) VALUES (?,?,?)";
    my $sth = $dbh->prepare($query);
    $sth->execute(@values);