
How to use Rust regex on bytes (Vec<u8> or &[u8])?


I have a &[u8] and I need to check whether it conforms to some pattern. There are examples of using regexes on &[u8] in the Regex documentation and in the module documentation. I took the code from the examples section, put it inside a main(), and added a few declarations:

extern crate regex;
use regex::Regex;

fn main() {
    let re = Regex::new(r"'([^']+)'\s+\((\d{4})\)").unwrap();
    let text = b"Not my favorite movie: 'Citizen Kane' (1941).";
    let caps = re.captures(text).unwrap();
    assert_eq!(&caps[1], &b"Citizen Kane"[..]);
    assert_eq!(&caps[2], &b"1941"[..]);
    assert_eq!(&caps[0], &b"'Citizen Kane' (1941)"[..]);
    // You can also access the groups by index using the Index notation.
    // Note that this will panic on an invalid index.
    assert_eq!(&caps[1], b"Citizen Kane");
    assert_eq!(&caps[2], b"1941");
    assert_eq!(&caps[0], b"'Citizen Kane' (1941)");
}

I don't see how this example differs from regular string matching, which I have no trouble with, and indeed the compiler complains that it expected a &str. Nothing in the code hints at what makes it byte-oriented.

I presume I did something basic wrong, such as a missing or insufficiently precise import. I am in a guessing game here, as the docs fail to provide working examples (as they regularly do), and this time the compiler also fails to nudge me in the right direction.

Here are the compiler messages:

error[E0308]: mismatched types
 --> src/main.rs:7:28
  |
7 |     let caps = re.captures(text).unwrap();
  |                            ^^^^ expected str, found array of 45 elements
  |
  = note: expected type `&str`
             found type `&[u8; 45]`

error[E0277]: the trait bound `str: std::cmp::PartialEq<[u8]>` is not satisfied
 --> src/main.rs:8:5
  |
8 |     assert_eq!(&caps[1], &b"Citizen Kane"[..]);
  |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ can't compare `str` with `[u8]`
  |
  = help: the trait `std::cmp::PartialEq<[u8]>` is not implemented for `str`
  = note: required because of the requirements on the impl of `std::cmp::PartialEq<&[u8]>` for `&str`
  = note: this error originates in a macro outside of the current crate

error[E0277]: the trait bound `str: std::cmp::PartialEq<[u8]>` is not satisfied
 --> src/main.rs:9:5
  |
9 |     assert_eq!(&caps[2], &b"1941"[..]);
  |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ can't compare `str` with `[u8]`
  |
  = help: the trait `std::cmp::PartialEq<[u8]>` is not implemented for `str`
  = note: required because of the requirements on the impl of `std::cmp::PartialEq<&[u8]>` for `&str`
  = note: this error originates in a macro outside of the current crate

error[E0277]: the trait bound `str: std::cmp::PartialEq<[u8]>` is not satisfied
  --> src/main.rs:10:5
   |
10 |     assert_eq!(&caps[0], &b"'Citizen Kane' (1941)"[..]);
   |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ can't compare `str` with `[u8]`
   |
   = help: the trait `std::cmp::PartialEq<[u8]>` is not implemented for `str`
   = note: required because of the requirements on the impl of `std::cmp::PartialEq<&[u8]>` for `&str`
   = note: this error originates in a macro outside of the current crate

error[E0277]: the trait bound `str: std::cmp::PartialEq<[u8; 12]>` is not satisfied
  --> src/main.rs:13:5
   |
13 |     assert_eq!(&caps[1], b"Citizen Kane");
   |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ can't compare `str` with `[u8; 12]`
   |
   = help: the trait `std::cmp::PartialEq<[u8; 12]>` is not implemented for `str`
   = note: required because of the requirements on the impl of `std::cmp::PartialEq<&[u8; 12]>` for `&str`
   = note: this error originates in a macro outside of the current crate

error[E0277]: the trait bound `str: std::cmp::PartialEq<[u8; 4]>` is not satisfied
  --> src/main.rs:14:5
   |
14 |     assert_eq!(&caps[2], b"1941");
   |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ can't compare `str` with `[u8; 4]`
   |
   = help: the trait `std::cmp::PartialEq<[u8; 4]>` is not implemented for `str`
   = note: required because of the requirements on the impl of `std::cmp::PartialEq<&[u8; 4]>` for `&str`
   = note: this error originates in a macro outside of the current crate

error[E0277]: the trait bound `str: std::cmp::PartialEq<[u8; 21]>` is not satisfied
  --> src/main.rs:15:5
   |
15 |     assert_eq!(&caps[0], b"'Citizen Kane' (1941)");
   |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ can't compare `str` with `[u8; 21]`
   |
   = help: the trait `std::cmp::PartialEq<[u8; 21]>` is not implemented for `str`
   = note: required because of the requirements on the impl of `std::cmp::PartialEq<&[u8; 21]>` for `&str`
   = note: this error originates in a macro outside of the current crate

Solution

  • "and added a few declarations"

    Unfortunately, you added the wrong ones. The documentation you linked to is for the struct regex::bytes::Regex, not regex::Regex. They are two different types: regex::Regex searches a &str, while regex::bytes::Regex searches a &[u8].

    extern crate regex;
    use regex::bytes::Regex;
    //         ^^^^^
    
    fn main() {
        let re = Regex::new(r"'([^']+)'\s+\((\d{4})\)").unwrap();
        let text = b"Not my favorite movie: 'Citizen Kane' (1941).";
        let caps = re.captures(text).unwrap();
    
        assert_eq!(&caps[1], &b"Citizen Kane"[..]);
        assert_eq!(&caps[2], &b"1941"[..]);
        assert_eq!(&caps[0], &b"'Citizen Kane' (1941)"[..]);
    
        assert_eq!(&caps[1], b"Citizen Kane");
        assert_eq!(&caps[2], b"1941");
        assert_eq!(&caps[0], b"'Citizen Kane' (1941)");
    }
    

    "as the docs fail to provide working examples (as they regularly do)"

    Note that rustdoc compiles and runs the code blocks in documentation as tests by default, so in my experience it's pretty rare for the examples not to work.