regextableau-desktop

Attempting to clean a string value using regex in tableau desktop designer


For the value: "SM-N981U Galaxy Note 20 5G 128GB Bronze - T-Mobile" I am looking to remove everything after the word Bronze. I have 3 variations of values. Ones that have a "-" in t-mobile, ones that are samsung models and show as "SM-xxxx" and others that only have 1 "-" separating the color and the carrier.

Essentially, what I would like to do is:

  1. For anything SM-xxxx, trim everything from the second - and over, eliminating the carrier name.
  2. For other devices, remove everything from the first - and over.

Any help appreciated. Originally I was able to remove everything fine on all devices but then I realized I was getting "SM" for the Samsung devices, so my lacking knowledge I believe is of # qualifier. To note, this is being done in Tableau Designer so if there is a better function that can accomplish it outside of regex, that works perfect. Not using Python here, otherwise this would be easy :)

Did a good amount of searching here for an answer. Most are using a programming language which makes this process much easier. Since this is being done in tableau (not a huge expert there), I've hit the wall. So far, I've gotten one expression working until I realized Samsung devices have their own "-" and thus removing everything after the SM. Any guidance would be appreciated!


Solution

  • Tableau Desktop, and its companion tool, Tableau Prep Builder include a calculation expression language that is useful for transformations of data. They can also call Python, R or Matlab functions when needed.

    Tableau Prep Builder in particular is well suited for data cleaning.

    Both tools support several functions that work with regular expressions that can be used for the types of transforms you requested. There are also functions that can partition a string into sections based on a delimiter, such as a dash in your example.

    Read the help to learn about the various functions available https://help.tableau.com/current/pro/desktop/en-us/functions.htm

    The SPLIT() function can be paired with the "+" operator (for concatenation) to split apart a string at each dash, and then manipulate the parts. Or you can achieve a lot with regular expressions using the functions that start with REGEXP, such as REGEXP_EXTRACT() and REGEXP_REPLACE() -- documented in the additional functions sections.