string padding dolphindb

How to efficiently pad odd-length strings with leading zeros in DolphinDB?


Does DolphinDB have a built-in function equivalent to Python's str.rjust() or str.zfill() for left-padding a string? (Python's str.ljust() pads on the right, which is not what is needed here.)

I have a DolphinDB table tmpCode with a string column Code containing values of varying lengths (both odd and even). I need to standardize all values to even lengths by prepending a '0' to odd-length strings.
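
For reference, the intended transformation can be sketched in Python. This is only an illustration of the desired result, not DolphinDB code; the function name pad_to_even and the sample values are hypothetical.

```python
def pad_to_even(code: str) -> str:
    """Prepend a single '0' when the length is odd; leave even lengths unchanged."""
    # len(code) % 2 is 1 for odd lengths and 0 for even ones, so the target
    # width is the smallest even number that is >= the current length.
    width = len(code) + len(code) % 2
    # str.rjust right-justifies, i.e. it adds the fill character on the left.
    return code.rjust(width, "0")


codes = ["7", "42", "123", "2024"]
print([pad_to_even(c) for c in codes])  # ['07', '42', '0123', '2024']
```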

My current approach uses a conditional expression, written either with iif or with case when:

// Version 1: Using iif
updatedCode = select iif(strlen(Code) % 2 == 0, Code, "0" + Code) from tmpCode
// Result: High memory usage (5GB) and slow execution

// Version 2: Using case-when
updatedCode = select case when strlen(Code) % 2 == 0 then Code else "0" + Code end from tmpCode
// Result: No significant performance improvement

Both versions consume 5 GB of memory and run very slowly, even though the table is only 30 MB.

What is a more memory-efficient and faster way to achieve this in DolphinDB?


Solution

  • First, create an in-memory table tb with 2 million rows for testing. The code column in tb holds the strings "1", "2", "3", ..., "2000000".

    n = 2000000
    tb = table(string(1..n) as code)
    

    There are three approaches to address this problem:

    Approach 1: Use iif to conditionally pad with zeros.
    The iif function evaluates the condition over the whole column at once and prepends a zero only where the string length is odd.

    tb.update!(`paddedCode, iif(tb.code.strlen() % 2 == 0, tb.code, `0 + tb.code))
    

    Approach 2: Use the lpad function to left-pad strings with odd lengths.
    Define a custom function evenZeroPad that computes the smallest even length greater than or equal to the string length (twice the ceiling of half the length) and pads the string to that length with the built-in lpad function. Then use the each higher-order function (written :E) to apply evenZeroPad to every value in the code column.

    def evenZeroPad(x): lpad(x, 2 * ceil strlen(x) \ 2, `0)
    tb.update!(`paddedCode, evenZeroPad:E(tb.code))
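
    The target-length arithmetic in Approach 2 can be checked with a small Python sketch. It only illustrates the logic; the names even_target_length and even_zero_pad are hypothetical, and str.rjust stands in for lpad.

```python
import math


def even_target_length(n: int) -> int:
    """Smallest even number >= n, computed as 2 * ceil(n / 2)."""
    return 2 * math.ceil(n / 2)


def even_zero_pad(code: str) -> str:
    """Left-pad with '0' up to the even target length (stand-in for lpad)."""
    return code.rjust(even_target_length(len(code)), "0")


# Odd lengths round up to the next even number; even lengths stay as they are.
print([(n, even_target_length(n)) for n in range(1, 7)])
# [(1, 2), (2, 2), (3, 4), (4, 4), (5, 6), (6, 6)]
print(even_zero_pad("123"))  # 0123
```

    Because the target length is never more than one character longer than the input, at most one zero is added.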
    

    Approach 3: Pad all strings with one zero, then use substr to trim.
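    Pad every string with a leading zero. Then define a custom function evenZeroPad that trims the padded string with the substr function, using the lowest bit of the padded length as the start offset. If the padded length is even (the original length was odd), the offset is 0 and the leading zero is kept; if the padded length is odd (the original length was even), the offset is 1 and the extra zero is dropped.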

    def evenZeroPad(x): substr(x, strlen(x) & 1)
    tb.update!(`paddedCode, evenZeroPad:E(`0 + tb.code))
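
    The pad-then-trim logic of Approach 3 can be traced in a few lines of Python. This is a sketch of the idea only; pad_then_trim is a hypothetical name, and slicing stands in for substr.

```python
def pad_then_trim(code: str) -> str:
    """Prepend '0' unconditionally, then drop it again if it was not needed."""
    padded = "0" + code
    # len(padded) & 1 is the lowest bit of the padded length:
    #   0 -> padded length is even (original was odd): keep from offset 0
    #   1 -> padded length is odd (original was even): start at offset 1
    offset = len(padded) & 1
    return padded[offset:]


print(pad_then_trim("123"))   # 0123
print(pad_then_trim("2024"))  # 2024
```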
    

    Performance Test

    Dataset: 2 million rows, 1 column, approximately 45.8 MB.

    Timing: Measured using the timer function.

    Memory Usage: Difference in DolphinDB memory usage before and after execution, measured via top.

    Approach    Time (s)    Memory Usage (MB)
    1           0.08        100.96
    2           2.13        183.02
    3           1.34        274.66

    In this test, Approach 1 is both the fastest and the most memory-efficient of the three.