At work I've come across an old process written in InfoSphere DataStage 11.7. I'm trying to migrate it to another framework, but no one seems to know how it works (including me, of course).
The ETL process takes as input a fixed-length UTF-8 .txt file. The file structure is as follows:
300000007200_0000000000004 20230524123704000000 P4 0001000073873269002000005091662 0001000073875269002 8 1
700202909 2160000000005967 20230524041311000000 P1 000100005606358000700000236541000045097 2502000501851 0001000066063680007 1000 000000237540000000 1
30000008810000000000003279 20230524041319000000 P2 00010000681349580050000366684071497 6204131926 00000100006813496800464000000000000000036768100000 1
At some point the program evaluates null values using the IsNull function at a certain position in each row, like:
If IsNull(Substrings(Input.reg,1,1)) then " " else Substrings(Input.reg,1,1)
I'm familiar with null values in pandas or SQL, but I'm not sure what a null value means in this context. From my understanding there can never be a null value, since there is always a character (including spaces) at every position in the file.
Am I right?
The developer's expression extracts Substrings(Input.reg,1,1), which in theory can return null, even though this particular input can never actually produce one. The point is the target column: if it is defined as not nullable, the expression feeding it must be guaranteed to return a non-null value. So they wrap the input in this IsNull check, or alternatively use the NullToValue() or NullToEmpty() functions, so that the expression itself guarantees a non-null output.
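For reference, here is a minimal sketch of the three equivalent derivations in DataStage transformer expression syntax. The column Input.reg comes from the job above; the space replacement value is an assumption carried over from the original expression.

The explicit guard used in the job, substituting a space when the input is null:

If IsNull(Substrings(Input.reg,1,1)) Then " " Else Substrings(Input.reg,1,1)

The same guarantee with NullToValue(), which takes the replacement value as its second argument:

NullToValue(Substrings(Input.reg,1,1), " ")

And with NullToEmpty(), which substitutes an empty string instead:

NullToEmpty(Substrings(Input.reg,1,1))

All three always return a non-null result, so any of them satisfies a not-nullable target column. Note that NullToEmpty() yields "" rather than " ", which is not equivalent in a fixed-width layout unless the output stage pads the field.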