How to extract branch name using regex and sed?


How can I extract the branch name from a string using bash? For example, I have the following command:

branch=$(git branch -a --contains "$sha")

This may return:

  1. * branch-1.0 (the prefix is always an asterisk)

  2. branch-2.0 remotes/origin/branch-2.0 (a newline may appear instead of the space)

  3. master remotes/origin/master (a newline may appear instead of the space)

I need only the branch name, and only once: master, branch-2.0, or branch-1.0. I know this can be done with the sed command, but I can't figure out how.

I use the following regex: (branch-[0-9].[0-9])|(master)


Solution

  • This is how it can be done in Bash, without using an external regex parser:

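    # Read the first reference path into an array, splitting it on /
    # (read consumes a single line, so only the first matching branch is used)
    IFS=/ read -ra refname < <(
      # Print the full reference path of each branch that contains this sha
      git branch --format='%(refname)' --contains="$sha"
    )
    
    # The branch name is the last array element
    # (negative subscripts need Bash 4.2 or later)
    branchname="${refname[-1]}"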
    
    printf 'The git branch name for sha: %s\nis: %s\n' "$sha" "$branchname"
    

    Or, using only POSIX shell syntax:

    # Read reference path
    refname=$(
      # Obtain full branch reference path that contains this sha
      git branch --format='%(refname)' --contains="$sha"
    )
    
    # Strip everything up to and including the last / to keep only the branch name
    branchname="${refname##*/}"
    
    printf 'The git branch name for sha: %s\nis: %s\n' "$sha" "$branchname"
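
    To see what the ${refname##*/} expansion does on its own, here is a minimal sketch that runs it on literal reference paths, with no repository needed. The helper names last_component and strip_heads_prefix are hypothetical, chosen only for this illustration. It also shows one caveat: a branch name that itself contains a slash is cut down to its last component, so removing only the refs/heads/ prefix may be the safer choice for local branches.

```shell
#!/bin/sh
# Hypothetical helper: keep only the text after the last "/".
# "##*/" removes the longest prefix that matches "*/".
last_component() {
  printf '%s\n' "${1##*/}"
}

# Hypothetical helper: remove only the literal "refs/heads/" prefix,
# so slashes that belong to the branch name are preserved.
strip_heads_prefix() {
  printf '%s\n' "${1#refs/heads/}"
}

last_component 'refs/heads/branch-1.0'          # prints: branch-1.0
last_component 'refs/heads/feature/login'       # prints: login
strip_heads_prefix 'refs/heads/feature/login'   # prints: feature/login
```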
    

    EDIT:

    As Philippe mentioned, --format='%(refname:short)' returns the branch name directly, without the path, so no further processing is needed to extract it from the full reference path.

    branchname=$(git branch --format='%(refname:short)' --contains="$sha")
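
    Since the question asked about sed specifically, here is a rough sketch of how the plain `git branch -a --contains` output could be reduced to unique branch names. The function name branch_names is hypothetical. The sketch assumes that remote names contain no slash, that the only line prefixes are spaces, `*`, or `+`, and that symbolic lines such as `remotes/origin/HEAD -> origin/master` should be dropped; adjust it if your output differs.

```shell
#!/bin/sh
# Hypothetical filter: reads `git branch -a --contains` style lines on stdin
# and prints each branch name once.
branch_names() {
  # 1. drop symbolic-ref lines such as "remotes/origin/HEAD -> origin/master"
  # 2. strip the leading marker ("*" or "+") and indentation
  # 3. strip a leading "remotes/<remote>/" (assumes <remote> has no slash)
  sed -e '/ -> /d' \
      -e 's/^[*+ ]*//' \
      -e 's|^remotes/[^/]*/||' |
    sort -u   # a branch listed both locally and remotely appears once
}

# Demonstration on literal sample lines shaped like the ones in the question:
printf '%s\n' \
  '* branch-1.0' \
  '  branch-2.0' \
  '  remotes/origin/branch-2.0' | branch_names

# Intended use inside a repository (not run here):
#   git branch -a --contains "$sha" | branch_names
```

    If several different branches contain the commit, this prints one name per line, so the caller still has to decide which one to use.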