regex bash

How to match tabs and spaces and exclude forward slash in bash


The following test string is a git tags string. I want to match exactly 40 "[a-zA-Z0-9]" characters, followed by one or more "TAB"s or "SPACE"s, followed by "refs/tags/", and finally one or more "NON-FORWARD-SLASH" characters.

Here is the test script:

test_regex() {
    local test_string="123456f43b5c5ec291df504ca037d428792fb37d refs/tags/v1.2.3-rc1"
    
    local hello_regex="^[a-zA-Z0-9]{40}.*refs/tags/.*$"

    if [[ "$test_string" =~ $hello_regex ]]
    then
        echo "hello_regex: yes, matched"
    else
        echo "hello_regex: no, not matched"
    fi

    # TAB or SPACE is not working in "[\t\ ]+"
    # Don't know how to exclude the "/" in "[.]+"
    local expected_regex="^[a-zA-Z0-9]{40}[\t\ ]+refs/tags/[.]+$"

    if [[ "$test_string" =~ $expected_regex ]]
    then
        echo "expected_regex: yes, matched"
    else
        echo "expected_regex: no, not matched"
    fi
}

Here is the output:

hello_regex: yes, matched
expected_regex: no, not matched

What's wrong with expected_regex, and how do I fix it?


Solution

  • The mistake is the [.]+ in your expected_regex. Inside a bracket expression the dot is a literal dot, so [.]+ matches one or more dots, not any character.

    To exclude the slash character, use the negated bracket expression [^/]. Try:

    local expected_regex="^[a-zA-Z0-9]{40}[\t\ ]+refs/tags/[^/]+$"
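
    To see the difference between the two bracket expressions in isolation, here is a minimal sketch. The helper name tail_matches and the sample strings are made up for illustration; only the two patterns come from the discussion above.

```shell
# Hypothetical helper: succeeds if string $1 matches the regex in $2.
tail_matches() {
    [[ $1 =~ $2 ]]
}

dots_only='^[.]+$'   # one or more literal dots
no_slash='^[^/]+$'   # one or more characters that are not "/"

for sample in "..." "v1.2.3-rc1" "release/v1"; do
    for regex in "$dots_only" "$no_slash"; do
        if tail_matches "$sample" "$regex"; then
            echo "$sample  $regex  matched"
        else
            echo "$sample  $regex  not matched"
        fi
    done
done
```

    Only the string made of dots matches [.]+, while [^/]+ accepts the tag name and rejects the sample that contains a slash.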
    

    You could also use the following regular expression, which uses \s to match any whitespace character (including TAB):

    local expected_regex="^[a-zA-Z0-9]{40}\s+refs/tags/[^/]+$"
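
    One caveat on the whitespace part: in a POSIX bracket expression the backslash is an ordinary character, so [\t\ ] matches a backslash, the letter t, or a space, but not a TAB. \s is not part of POSIX extended regular expressions either, so whether it works depends on the system's regex library. A sketch that should be more portable uses the POSIX class [[:blank:]] (space and tab). The function name is_tag_ref and the sample lines are illustrative assumptions.

```shell
# Hypothetical helper: succeeds if the line has the form
# "<40 alphanumerics><spaces/tabs>refs/tags/<name without a slash>".
is_tag_ref() {
    local regex='^[a-zA-Z0-9]{40}[[:blank:]]+refs/tags/[^/]+$'
    [[ $1 =~ $regex ]]
}

sha="123456f43b5c5ec291df504ca037d428792fb37d"
space_line="$sha refs/tags/v1.2.3-rc1"
tab_line="$sha"$'\t'"refs/tags/v1.2.3-rc1"      # $'\t' is a real TAB character
mixed_line="$sha"$' \t '"refs/tags/v1.2.3-rc1"  # space, TAB, space
nested_line="$sha refs/tags/release/v1"         # extra slash after refs/tags/

for line in "$space_line" "$tab_line" "$mixed_line" "$nested_line"; do
    if is_tag_ref "$line"; then
        echo "matched:     $line"
    else
        echo "not matched: $line"
    fi
done
```

    The first three lines match and the last one does not, because [^/]+ cannot cross the second slash.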
    

    I have tried this:

    str="123456f43b5c5ec291df504ca037d428792fb37d refs/tags/v.1.2.3-rc1"
    reg="^[a-zA-Z0-9]{40}\s+refs/tags/[^/]+$"
    [[ $str =~ $reg ]] && echo matched
    

    It prints matched.