I have a products array that looks like the table below:
+---------------------------+--------------------------------+--------------------------------+
| name                      | review                         | word_count                     |
+---------------------------+--------------------------------+--------------------------------+
|                           |                                | {'and': 5, 'wipes': 1,         |
| Planetwise                | These flannel wipes are OK,    | 'stink': 1, 'because': 2, ...  |
| Flannel Wipes             | but in my opinion ...          |                                |
|                           |                                |                                |
+---------------------------+--------------------------------+--------------------------------+
|                           |                                | {'and': 3, 'love': 1,          |
| Planetwise                | it came early and was not      | 'it': 2, 'highly': 1, ...      |
| Wipes Pouch               | disappointed. i love ...       |                                |
|                           |                                |                                |
+---------------------------+--------------------------------+--------------------------------+
|                           |                                | {'shop': 1, 'noble': 1,        |
|                           |                                | 'is': 1, 'it': 1, 'as': ...    |
| A Tale of Baby's Days     | Lovely book, it's bound        |                                |
| with Peter Rabbit ...     | tightly so you may no ...      |                                |
|                           |                                |                                |
+---------------------------+--------------------------------+--------------------------------+
Basically, the word_count column contains a dictionary (key: value) of word occurrences in the text of the review column.
Now I want to build a new column named 'and', which should contain the value of the 'and' key in the word_count dictionary: if 'and' exists as a key in the word_count column, then its value; if it doesn't exist as a key, then 0.
For the first 3 rows, the new 'and' column looks something like this:
+------------+
| and |
+------------+
| |
| 5 |
| |
| |
+------------+
| |
| 3 |
| |
| |
+------------+
| |
| 0 |
| |
| |
+------------+
I wrote this code and it's working correctly:
def wordcount(x):
    if 'and' in x:
        return x['and']
    else:
        return 0

products['and'] = products['word_count'].apply(wordcount);
My question: Is there any way I can do this using a lambda?
What I've done so far is:
products['and'] = products['word_count'].apply(lambda x : 'and' in x.keys());
This returns only 0 or 1 in the column. What can I add to the line above so that products['and'] contains the value of the 'and' key when it exists as a key in products['word_count']?
I'm using IPython Notebook and GraphLab.
You have the right idea. Just return the value of x['and'] if it exists, otherwise 0.
For example:
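import pandas as pd

# Two sample rows: one with an 'and' key, one without.
data = {"word_count": [{"foo": 1, "and": 5},
                       {"foo": 1}]}
df = pd.DataFrame(data)
# Return the count for 'and' if the key exists, otherwise 0.
df.word_count.apply(lambda x: x['and'] if 'and' in x else 0)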
Output:
0    5
1    0
Name: word_count, dtype: int64
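As a shorter alternative, the conditional can probably be replaced with dict.get, which returns a default value when the key is missing. The sketch below uses a plain Python list instead of a DataFrame so that it runs without pandas or GraphLab. The names rows, count_and and and_counts are made up for illustration, and it assumes each cell of word_count is an ordinary Python dict.

```python
# Hypothetical sample data standing in for the word_count column.
rows = [{"foo": 1, "and": 5}, {"foo": 1}]

# dict.get returns the value stored under 'and',
# or the default 0 when the key is absent.
count_and = lambda x: x.get('and', 0)

# Apply the lambda to every dictionary, as a column-wise apply would.
and_counts = [count_and(row) for row in rows]
print(and_counts)  # [5, 0]
```

Assuming the column's apply passes each dictionary to the function, the same lambda should work in the original line, i.e. products['word_count'].apply(lambda x: x.get('and', 0)).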