Does DolphinDB have a built-in function equivalent to Python's str.ljust() for string left-justification?
I have a DolphinDB table tmpCode with a string column Code containing values of varying lengths (both odd and even). I need to standardize all values to even lengths by prepending a '0' to odd-length strings.
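For reference, the intended transformation can be sketched in Python, the language the question borrows its terminology from. This is only an illustration of the semantics, not DolphinDB code, and the helper name pad_to_even is hypothetical. Note that in Python it is str.rjust that pads on the left, while str.ljust pads on the right.

```python
def pad_to_even(code: str) -> str:
    """Prepend one '0' when the length is odd; leave even-length strings alone."""
    # len(code) % 2 is 1 for odd lengths and 0 for even ones, so the target
    # width is the next even number at or above the current length.
    # str.rjust pads on the left; str.ljust would pad on the right instead.
    return code.rjust(len(code) + len(code) % 2, "0")


if __name__ == "__main__":
    for sample in ["7", "42", "123", "2000"]:
        print(sample, "->", pad_to_even(sample))
```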
My current approach uses a conditional expression, written either with iif or as a case-when statement:
// Version 1: Using iif
updatedCode = select iif(strlen(Code) % 2 == 0, Code, "0" + Code) from tmpCode
// Result: High memory usage (5GB) and slow execution
// Version 2: Using case-when
updatedCode = select case(strlen(Code) % 2 == 0) when true then Code else "0" + Code end from tmpCode
// Result: No significant performance improvement
However, both queries consume 5GB of memory and run very slowly on a 30MB table.
What is a more memory-efficient and faster way to achieve this in DolphinDB?
First, create an in-memory table tb with 2 million rows. The code column in tb contains the string values "1", "2", "3", ..., "2000000".
n = 2000000
tb = table(string(1..n) as code)
There are three approaches to address this problem:
Approach 1: Use iif to conditionally pad with zeros.
Use the iif function to add a leading zero to strings with odd lengths.
tb.update!(`paddedCode, iif(tb.code.strlen() % 2 == 0, tb.code, `0 + tb.code))
Approach 2: Use the lpad function to left-pad strings with odd lengths.
Define a custom function evenZeroPad that calculates the nearest even length equal to or greater than the string length, and then pads to that length using the built-in lpad function. Use the each higher-order function to apply evenZeroPad to each row in the code column.
def evenZeroPad(x): lpad(x, 2 * ceil strlen(x) \ 2, `0)
tb.update!(`paddedCode, evenZeroPad:E(tb.code))
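The length arithmetic behind Approach 2 can be checked outside DolphinDB. The following is a minimal Python sketch of the same idea, not the DolphinDB implementation: the name even_zero_pad_lpad is hypothetical, and str.rjust stands in for lpad.

```python
import math


def even_zero_pad_lpad(code: str) -> str:
    """Left-pad with '0' up to the smallest even length >= len(code)."""
    # 2 * ceil(n / 2) rounds n up to the nearest even number: 3 -> 4, 4 -> 4.
    target = 2 * math.ceil(len(code) / 2)
    # rjust pads on the left, which is the role lpad plays in the answer.
    return code.rjust(target, "0")


if __name__ == "__main__":
    for sample in ["1", "12", "123", "2000000"]:
        print(sample, "->", even_zero_pad_lpad(sample))
```

Because the target length exceeds the original by at most one, at most a single zero is ever added.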
Approach 3: Pad all strings with one zero, then use substr to trim.
Pad every string with a leading zero. Then, define a custom function evenZeroPad that trims the padded string using the substr function. If the padded length is odd (i.e., the original length was even), take the substring from offset 1 to drop the added zero; otherwise, start from offset 0 and keep it.
def evenZeroPad(x): substr(x, strlen(x) & 1)
tb.update!(`paddedCode, evenZeroPad:E(`0 + tb.code))
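The parity trick in Approach 3 is easy to get backwards, so here is a small Python sketch of the same logic. It is an illustration only: the name even_zero_pad_trim is hypothetical, and Python slicing stands in for substr.

```python
def even_zero_pad_trim(code: str) -> str:
    """Prepend '0' unconditionally, then drop it again if it was not needed."""
    padded = "0" + code
    # len(padded) & 1 is 1 when the padded length is odd, meaning the original
    # length was even and the extra zero must be removed; otherwise it is 0.
    offset = len(padded) & 1
    return padded[offset:]


if __name__ == "__main__":
    for sample in ["1", "12", "123", "2000000"]:
        print(sample, "->", even_zero_pad_trim(sample))
```

The offset is computed from the padded string, not the original, which is why an odd result means "trim".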
Dataset: 2 million rows, 1 column, approximately 45.8 MB.
Timing: Measured using the timer function.
Memory Usage: Difference in DolphinDB memory usage before and after execution, measured via top.
| Approach | Time (s) | Memory Usage (MB) |
|---|---|---|
| 1 | 0.08 | 100.96 |
| 2 | 2.13 | 183.02 |
| 3 | 1.34 | 274.66 |