Tags: python, dictionary, lambda, jupyter-notebook, graphlab

Value of specific key from a dictionary using lambda?


I have a products table that looks like this:

+---------------------------+--------------------------------+--------------------------------+
|    name                   |  review                        | word_count                     |
+---------------------------+--------------------------------+--------------------------------+
|                           |                                | {'and': 5, 'wipes': 1,         |
| Planetwise                |  These flannel wipes are OK,   | 'stink': 1, 'because' : 2, ... |
| Flannel Wipes             |  but in my opinion ...         |                                |
|                           |                                |                                |
+---------------------------+--------------------------------+--------------------------------+
|                           |                                | {'and': 3, 'love': 1,          |
| Planetwise                |  it came early and was not     | 'it': 2, 'highly': 1, ...      |
| Wipes Pouch               |  disappointed. i love ...      |                                |
|                           |                                |                                |
+---------------------------+--------------------------------+--------------------------------+
|                           |                                | {'shop': 1, 'noble': 1,        |
|                           |                                | 'is': 1, 'it': 1, 'as': ...    |
| A Tale of Baby's Days     |  Lovely book, it's bound       |                                |
|  with Peter Rabbit ...    |  tightly so you may no ...     |                                |
|                           |                                |                                |
+---------------------------+--------------------------------+--------------------------------+

The word_count column holds a dictionary (word : count) giving the number of times each word occurs in that row's review.

Now I want to build a new column named and. For each row, it should contain the value stored under the key and in the word_count dictionary if that key exists, and 0 if it does not.

For the first three rows, the new and column would look like this:

+------------+
|    and     |
+------------+
|            |
| 5          |
|            |
|            |
+------------+
|            |
| 3          |
|            |
|            |
+------------+
|            |
| 0          |
|            |
|            |
+------------+

I wrote this code and it's working correctly:

def wordcount(x):
    if 'and' in x:
        return x['and']
    else:
        return 0

products['and'] = products['word_count'].apply(wordcount);

My question: is there a way to do this with a lambda instead?

What I've done so far is:

products['and'] = products['word_count'].apply(lambda x : 'and' in x.keys());

This only puts 0 or 1 in the column. What should I change in the line above so that products['and'] holds the value stored under the key and whenever that key exists in products['word_count']?

I'm using IPython Notebook and GraphLab.


Solution

  • You have the right idea. Use a conditional expression so the lambda returns x['and'] when the key exists and 0 otherwise.

    For example, using a pandas DataFrame:

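    import pandas as pd

    # Two sample rows: the first has the key 'and', the second does not.
    data = {"word_count": [{"foo": 1, "and": 5},
                           {"foo": 1}]}
    df = pd.DataFrame(data)
    # Return the count for 'and' when the key is present, otherwise 0.
    df.word_count.apply(lambda x: x['and'] if 'and' in x else 0)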
    

    Output:

    0    5
    1    0
    Name: word_count, dtype: int64
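
A shorter variant is possible with dict.get, which takes a default to return when the key is missing. The sketch below uses a plain list of dictionaries so it runs without pandas or GraphLab; the names word_counts and count_and are made up for illustration. The same lambda should work when passed to products['word_count'].apply(...), assuming each cell is an ordinary Python dict.

```python
# Sample dictionaries standing in for rows of the word_count column.
word_counts = [
    {"and": 5, "wipes": 1, "stink": 1},
    {"and": 3, "love": 1, "it": 2},
    {"shop": 1, "noble": 1, "is": 1},
]

# dict.get returns the value stored under the key,
# or the supplied default (0 here) when the key is absent.
count_and = lambda x: x.get("and", 0)

# Apply the lambda to every row, as .apply(...) would do on a column.
and_column = [count_and(wc) for wc in word_counts]
print(and_column)  # [5, 3, 0]
```

This avoids looking the key up twice (once for the membership test and once for the value), and it reads a little closer to the stated intent: the count if present, otherwise 0.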