Replace all 'NA' with 0 in complete DT (Python Datatable)


Hi, I am working with the Python datatable package and need to replace all the 'NA' values that appear after joining two frames.

Sample data (written in R data.table syntax):

DT = data.table(x=rep(c("b","a","c"),each=3), y=c(1,3,6), v=1:9)
X = data.table(x=c("c","b"), v=8:7, foo=c(4,2))

X[DT, on="x"]

The code below replaces every 1 with 0:

DT.replace(1, 0)

How should I adapt it to replace 'NA'? Or is there an option to make the join pad with 0 instead of 'NA'? Thank you.


Solution

  • Here is the same example built with Python's native data structures:

    from datatable import dt, join
    
    DT = dt.Frame(x = ["b"]*3 + ["a"]*3 + ["c"]*3,
                  y = [1, 3, 6] * 3,
                  v = range(1, 10))
    
    X = dt.Frame({"x": ('c', 'b'),
                  "v": (8, 7),
                  "foo": (4, 2)})
    
    X.key = "x"  # join() requires the joined frame to be keyed on ``x``
    
    merger = DT[:, :, join(X)]  # left join: rows of DT with no match in X get NA
    merger
    
        x   y   v   v.0 foo
    0   b   1   1   7   2
    1   b   3   2   7   2
    2   b   6   3   7   2
    3   a   1   4   NA  NA
    4   a   3   5   NA  NA
    5   a   6   6   NA  NA
    6   c   1   7   8   4
    7   c   3   8   8   4
    8   c   6   9   8   4
    

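    NA is how datatable displays a missing value (None), so it can be replaced with 0 directly. Note that replace() modifies the frame in place and returns nothing, so display the frame again afterwards:

    merger.replace(None, 0)
    merger
    
        x   y   v   v.0 foo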
    0   b   1   1   7   2
    1   b   3   2   7   2
    2   b   6   3   7   2
    3   a   1   4   0   0
    4   a   3   5   0   0
    5   a   6   6   0   0
    6   c   1   7   8   4
    7   c   3   8   8   4
    8   c   6   9   8   4