unidatamultivalue-database

Unidata - Extract Count of strings from MultiValued fields


I'm a bit of a newbie when it comes to Unidata.

My problem is this;

I have lots of records with 20 fields. I want to extract a TOTAL count of the different values used in 1 of those fields; the field in question is multi-valued, so the data looks like "CSR²STR²CD2" etc.

I want to output to excel so the data looks something like the below, containing the COUNT for each value and how many times it appears in that field, across all records.

Column1 Column2
CSR 234235
STR 987346
CD2 736252

I've been told I can achieve this using a dictionary that calculates this or a subroutine. But I have very little technical grounding in Unidata, so where to start is a bit beyond me.


Solution

  • It sounds like you need an exploding sort? Multivalued (MV) fields, while they are very convenient in Basic code, can be sort of a pain to deal with in the query language. There is explicit support for them, but it's not as simple as non-MV data. The key is to do an "exploding" sort, which will flatten the MV fields - I think of it as creating "virtual" rows for each one of the MV fields. If there are multiple MV fields and they are associated correctly with each other, they will be linked together in the virtual rows. Single valued and non-associated MV fields will be duplicated in each of the virtual rows. It's difficult to explain, but for your question here's an example query:

    The query and results:

    >sort IAN.TEMP BY.EXP ATB BREAK.ON ATB TOTAL COUNTER ID.SUP DET.SUP
    
    ATB.. COUNT
    CD2       3
    CD3       3
    CD4       2
    CSR       3
    IAN       2
          =====
    TOTAL    13
    13 records listed
    

    Setting up the dictionary:

    >AE DICT IAN.TEMP ATB COUNTER
    < 1 > Top of "ATB" in "DICT IAN.TEMP", 7 lines, 11 characters.
    *--: P
    001: D
    002: 1
    003:
    004:
    005: 5L
    006: M  <--- This is very important!  Must be M (or MV) for BY.EXP to work
    007:    <--- If there are associated MV fields, this needs to be populated
    Bottom.
    *--: EX
    Quit "ATB" in file "DICT IAN.TEMP" unchanged.
    < 2 > Top of "COUNTER" in "DICT IAN.TEMP", 6 lines, 10 characters.
    *--: P
    001: I
    002: 1  <-- This just returns "1" for every row in the output, to help with totals
    003:
    004:
    005: 5R
    006: S
    Bottom.
    *--: EX
    Quit "COUNTER" in file "DICT IAN.TEMP" unchanged.
    

    Setting up the sample data:

    >AE IAN.TEMP *
    
    4 record(s) selected.
    
    < 1 > Top of "3" in "IAN.TEMP", 1 line, 7 characters.
    *--: P
    001: CD2▒IAN
    Bottom.
    *--: EX
    Quit "3" in file "IAN.TEMP" unchanged.
    < 2 > Top of "1" in "IAN.TEMP", 1 line, 15 characters.
    *--: P
    001: CSR▒CD2▒CD3▒IAN
    Bottom.
    *--: EX
    Quit "1" in file "IAN.TEMP" unchanged.
    < 3 > Top of "4" in "IAN.TEMP", 1 line, 15 characters.
    *--: P
    001: CSR▒CD2▒CD3▒CD4
    Bottom.
    *--: EX
    Quit "4" in file "IAN.TEMP" unchanged.
    < 4 > Top of "2" in "IAN.TEMP", 1 line, 11 characters.
    *--: P
    001: CD4▒CD3▒CSR
    Bottom.
    *--: EX
    Quit "2" in file "IAN.TEMP" unchanged.