bash awk

How to parse a simplified YAML file using AWK?


I have a file (data.txt) that looks like this:

apple:
  weight: 15
  color: red
banana:
  weight: 30
  length: 12
grape:
  weight: 5
peach:
  weight: 25
  color: orange
  length: 4

Using bash and awk (and/or sed), how would I pick out the values for the child keys here? I want a bash function such as:

   get_value() { ... }   # arguments: parent_key child_key

   #Test 1
   expected=15
   actual=$(get_value apple weight)  # Apples have a weight

   #Test 2
   expected=red
   actual=$(get_value apple color)   # Apples have a color

   #Test 3
   expected=""
   actual=$(get_value apple length)  # apples don't have a length

   #Test 4
   expected=""
   actual=$(get_value banana color)  # no color defined

Sadly I don't have access to Perl or Python. This isn't really a YAML file, just data with colons separating keys and values. The file only ever has two levels: parent keys and child keys.

I've tried something like this:

awk "/$key"'/ {found=1} found && /'"$child_key:"'/{print; found=0}' data.txt

But this doesn't handle the case where a child key isn't defined for the given parent.
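To make the failure concrete, here is a small reproduction using the sample data above written to a temporary file (the `data` variable and the temporary file are only for illustration). Because `found` is never reset when the next parent starts, asking for apple's length keeps scanning and picks up banana's:

```shell
#!/usr/bin/env bash
# Sample data copied from the question, written to a temporary file.
data=$(mktemp)
cat > "$data" <<'EOF'
apple:
  weight: 15
  color: red
banana:
  weight: 30
  length: 12
grape:
  weight: 5
peach:
  weight: 25
  color: orange
  length: 4
EOF

key=apple
child_key=length
# The attempted one-liner, unchanged apart from reading "$data".
actual=$(awk "/$key"'/ {found=1} found && /'"$child_key:"'/{print; found=0}' "$data")
printf 'actual="%s"\n' "$actual"   # prints: actual="  length: 12"
rm -f "$data"
```

The match comes from the banana block, and the whole line is printed rather than just the value.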


Solution

  • Using any POSIX awk:

    $ cat tst.sh
    #!/usr/bin/env bash
    
    get_value() {
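        awk -v tgt_parent_key="$1" -v tgt_child_key="$2" '
            {
                # Strip leading and trailing white space from the line
                gsub(/^[[:space:]]+|[[:space:]]+$/,"")
                key = val = $0
                sub(/[[:space:]]*:.*/,"",key)       # key = text before the first colon
                sub(/[^:]*:[[:space:]]*/,"",val)    # val = text after the first colon
    
                if ( val == "" ) {
                    # No value after the colon: a new parent block starts here
                    parent_key = key
                    parent_val = val
                    child_key = child_val = ""
                }
                else {
                    # A value is present: this is a child of the current parent
                    child_key = key
                    child_val = val
                }
            }
            # Print the value only when both the parent and the child key match
            (parent_key == tgt_parent_key) && (child_key == tgt_child_key) {
                print child_val
            }
        ' "$infile"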
    }
    
    infile="$1"
    
    #Test 1
    expected=15
    actual=$(get_value 'apple' 'weight') # Apples have a weight
    printf 'expected="%s", actual="%s"\n' "$expected" "$actual"
    
    #Test 2
    expected=red
    actual=$(get_value 'apple' 'color')  # Apples have a color
    printf 'expected="%s", actual="%s"\n' "$expected" "$actual"
    
    #Test 3
    expected=""
    actual=$(get_value 'apple' 'length')  # apples don't have a length
    printf 'expected="%s", actual="%s"\n' "$expected" "$actual"
    
    #Test 4
    expected=""
    actual=$(get_value 'banana' 'color')  # no color defined
    printf 'expected="%s", actual="%s"\n' "$expected" "$actual"
    

    $ ./tst.sh file
    expected="15", actual="15"
    expected="red", actual="red"
    expected="", actual=""
    expected="", actual=""
    

    The above assumes that the way to tell a parent from a child key is that a parent doesn't have a value after the : while a child does. If that's wrong, then edit your example to include a case where a parent has a value after the : and explain how to tell the two apart.
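If it turns out that indentation, rather than an empty value, is what marks a child, a variant along these lines might work. This is only a sketch under that assumption (parents start in column 1, children are indented with spaces or tabs); the name `get_value_by_indent` and the third argument for the file name are hypothetical and not part of the script above:

```shell
#!/usr/bin/env bash

# Hypothetical variant: a line starting in column 1 is a parent,
# an indented line is a child of the most recent parent.
# Usage: get_value_by_indent parent_key child_key file
get_value_by_indent() {
    awk -v tgt_parent_key="$1" -v tgt_child_key="$2" '
        {
            is_child = ($0 ~ /^[ \t]/)          # indented => child line
            line = $0
            gsub(/^[ \t]+|[ \t]+$/, "", line)   # trim both ends
            if ( line == "" ) { next }          # skip blank lines
            key = val = line
            sub(/[ \t]*:.*/, "", key)           # text before the first colon
            sub(/[^:]*:[ \t]*/, "", val)        # text after the first colon
            if ( !is_child ) {                  # remember the parent, then move on
                parent_key = key
                next
            }
        }
        (parent_key == tgt_parent_key) && (key == tgt_child_key) {
            print val
            exit                                # first match is enough
        }
    ' "$3"
}
```

With the sample file from the question, `get_value_by_indent apple weight data.txt` prints 15 and `get_value_by_indent apple length data.txt` prints nothing. A parent line that carries its own value would still be treated as a parent here, since only the indentation is consulted.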