tensorflow ragged-tensors

Strip and filter strings in a ragged tensor


I would like to know whether there is a clean, idiomatic TensorFlow way to do the following conversion. Each string (row) contains several comma-separated words, and each word carries a numeric suffix such as "%1". The goal is to strip the suffixes and keep only the words whose suffix value is <= a given target value.

This is not hard to do in plain Python, but I want to add the step to a TF computational graph, so a solution built from TensorFlow ops is preferred.

# From
text = tf.constant(['a1%0,a2%0,a3%1,a4%2,a5%3,a6%4','a7%3,a8%4',...])  # shape (n,), n is large

# If target = 1, the result will be
res = tf.ragged.constant([["a1", "a2", "a3"], [],...])
# If target = 3, the result will be
res = tf.ragged.constant([["a1", "a2", "a3", "a4", "a5"], ["a7"],...])


Solution

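  • You can build two ragged tensors from the same input, one holding the words and one holding the suffix values, and then mask the first with a comparison on the second (tested in TensorFlow 2.9):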

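    import tensorflow as tf

    text = tf.constant(['a1%0,a2%0,a3%1,a4%2,a5%3,a6%4','a7%3,a8%4',...])
    target = 1

    # Words: remove every "%<digits>" suffix, then split each row on commas.
    a = tf.strings.split(tf.strings.regex_replace(text, r"%\d+", ""), ",")
    # Suffix values: remove everything up to each "%", split, then parse as numbers.
    b = tf.strings.split(tf.strings.regex_replace(text, r"[^,]*%", ""), ",")
    b = tf.strings.to_number(b)

    # Keep only the words whose suffix value is <= target.
    c = tf.ragged.boolean_mask(a, (b<=target))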
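
Since the question asks for something that can live inside a graph, here is a minimal sketch of the same steps wrapped in a `tf.function`. The function name `filter_by_suffix` and the choice of an `input_signature` are my own assumptions for illustration, not part of the original answer. It also assumes every word has exactly one `%<integer>` suffix and that words contain no other `%` or `,` characters.

```python
import tensorflow as tf


# Hypothetical helper name; the input_signature is an illustrative choice
# so that different target values do not trigger retracing.
@tf.function(input_signature=[
    tf.TensorSpec(shape=[None], dtype=tf.string),
    tf.TensorSpec(shape=[], dtype=tf.int32),
])
def filter_by_suffix(text, target):
    # Words: drop every "%<digits>" suffix, then split each row on commas.
    words = tf.strings.split(tf.strings.regex_replace(text, r"%\d+", ""), ",")
    # Suffix values: drop everything up to each "%", then split on commas.
    suffixes = tf.strings.split(
        tf.strings.regex_replace(text, r"[^,]*%", ""), ",")
    # Parse the suffixes as integers so they compare directly with target.
    values = tf.strings.to_number(suffixes, out_type=tf.int32)
    # Keep only the words whose suffix value is <= target.
    return tf.ragged.boolean_mask(words, values <= target)


text = tf.constant(['a1%0,a2%0,a3%1,a4%2,a5%3,a6%4', 'a7%3,a8%4'])
print(filter_by_suffix(text, 1).to_list())
# [[b'a1', b'a2', b'a3'], []]
print(filter_by_suffix(text, 3).to_list())
# [[b'a1', b'a2', b'a3', b'a4', b'a5'], [b'a7']]
```

Both ragged tensors come from splitting the same rows on the same delimiter, so they have identical row partitions, which is what `tf.ragged.boolean_mask` needs. Note that string tensors come back as Python `bytes` when converted with `to_list()`.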