python sframe

Convert an SFrame into a scipy.sparse csr_matrix


I have an SFrame like this:

x = sf.SFrame({'users': [{'123': 1.0, '122': 5},
{'134': 3.0, '123': 10}]})

I want to convert it into a scipy.sparse csr_matrix without invoking GraphLab Create, using only sframe and Python.

How can I do this?


Solution

  • Assuming you want the row number to be the row index in the output sparse matrix, the only tricky step is using SFrame.stack; from there you should be able to construct a csr_matrix directly.

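    import sframe as sf
    from scipy.sparse import csr_matrix

    x = sf.SFrame({'users': [{'123': 1.0, '122': 5},
                             {'134': 3.0, '123': 10}]})
    # Record each row's position so it survives the stack step.
    x = x.add_row_number('row_id')
    # Expand each dict into one row per key/value pair (columns X2 and X3 here).
    x = x.stack('users')
    # Build the matrix from (values, (row indices, column indices)).
    A = csr_matrix((x['X3'], (x['row_id'], x['X2'])),
                   shape=(2, 135))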
    

    I'm also hard-coding the dimensions of the matrix here, but that's probably something you'd want to figure out programmatically.
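
As a rough sketch of working out the shape from the data instead of hard-coding it, the snippet below flattens the dicts into (row, column, value) triples and derives the dimensions from them. It operates on a plain list of dicts (for example `list(x['users'])`, assuming the column iterates to ordinary Python dicts) rather than on the stacked SFrame. `coo_triples` is a hypothetical helper name, not part of sframe or scipy, and it assumes every key is a string holding a non-negative integer.

```python
def coo_triples(records):
    """Flatten a list of {column_key: value} dicts into COO-style triples.

    Hypothetical helper: assumes each key parses as a non-negative integer
    and is used directly as the column index.
    """
    rows, cols, vals = [], [], []
    for row_id, record in enumerate(records):
        for key, value in record.items():
            rows.append(row_id)        # position of the dict becomes the row index
            cols.append(int(key))      # '123' -> column 123
            vals.append(float(value))
    n_rows = len(records)
    # Largest column index plus one, so the last column fits in the matrix.
    n_cols = max(cols) + 1 if cols else 0
    return rows, cols, vals, (n_rows, n_cols)


records = [{'123': 1.0, '122': 5},
           {'134': 3.0, '123': 10}]
rows, cols, vals, shape = coo_triples(records)
```

The result can then be passed along as `csr_matrix((vals, (rows, cols)), shape=shape)`. Using the keys directly as column indices leaves the lower-numbered columns empty, just as in the snippet above; mapping keys to consecutive indices would give a more compact matrix if the original ids do not need to be preserved.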