My program scrapes a website and creates two lists, one for categories and the other for content. I then use a dict(zip(...)) call to pair them up and put them into a dict.
Something like this:
complete_dict=dict(zip(category_list,info_list))
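For reference, here is a minimal sketch of how dict(zip(...)) pairs the two lists. It uses made-up category and info values in place of the scraped data and assumes Python 3. zip stops at the shorter input, so pairing the categories with an empty info list yields an empty dict.

```python
# Hypothetical stand-ins for the scraped lists
category_list = [u'category1', u'category2', u'category3']
info_list = [u'info1', u'info2', u'info3']

# zip pairs the i-th category with the i-th info item;
# dict() turns those pairs into key/value entries
complete_dict = dict(zip(category_list, info_list))
print(complete_dict)  # {'category1': 'info1', 'category2': 'info2', 'category3': 'info3'}

# zip stops at the shorter list, so extra categories are silently dropped...
short_dict = dict(zip(category_list, info_list[:2]))
print(short_dict)  # {'category1': 'info1', 'category2': 'info2'}

# ...and pairing with an empty list produces an empty dict
empty_dict = dict(zip(category_list, []))
print(empty_dict)  # {}
```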
I ran into a problem: my program reads empty elements into both lists (category and info). That would be fine if I could remove them afterwards, but I haven't found a way to do so. When printed, both lists contain empty elements; not empty strings, but something more like an empty list within a list. I tried removing them both from the lists and from the dictionary after zipping, using commands like:
category_list=filter(None, category_list)
or:
info_list=[x for x in info_list if x != []]
Of course, I applied each operation to both lists.
None of these worked. I then tried doing it on the dictionary with:
dict((k, v) for k, v in complete_list.iteritems() if v)
What else can I try at this point?
I tried filtering, but either my conditions are wrong or filtering simply doesn't solve the problem. I'm looking for another approach, so this is not a duplicate of the other thread (although that thread has some useful information).
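A quick check with made-up data (Python 3 assumed, where filter returns an iterator and is wrapped in list()) suggests the filtering code itself is fine: both attempts remove [] elements that sit inside a list. What they cannot do is help when the [] in the output is the whole list, which would happen if info_list itself comes back empty on some iterations.

```python
# Hypothetical list that contains empty-list elements
info_list = [u'info1', [], u'info2', []]

# filter(None, ...) drops every falsy element ([], '', 0, None, ...)
filtered = list(filter(None, info_list))
# The comprehension drops only elements equal to []
comprehension = [x for x in info_list if x != []]
print(filtered)       # ['info1', 'info2']
print(comprehension)  # ['info1', 'info2']

# If the list itself is empty, there is nothing to filter: the result is still []
empty_info_list = []
print(list(filter(None, empty_info_list)))  # []
```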
What I'm getting right now is:
[u'info1', u'info2', u'info3', u'info4', ...]
[]
[]
[]
[]
[u'info1', u'info2', u'info3', u'info4', ...]
[]
[]
[]
[u'info1', u'info2', u'info3', u'info4', ...]
info1 through info4 (there are actually more elements) are content scraped from the website. Sorry, I can't reveal what they are, but this shows the idea. This is one of the lists (info_list), and I'm trying to remove all the []s stuck in the middle, so that the result looks like:
[u'info1', u'info2', u'info3', u'info4', ...]
[u'info1', u'info2', u'info3', u'info4', ...]
[u'info1', u'info2', u'info3', u'info4', ...]
and so on
This is what my result looks like after dict(zip(...)):
{u'category1': u'info1', u'category2': u'info2', ...}
{}
{}
{u'category1': u'info1', u'category2': u'info2', ...}
{u'category1': u'info1', u'category2': u'info2', ...}
{}
{}
{}
and so on.
You say the empty elements are more like an empty list within a list. Assuming that is guaranteed, you can do:
# make sure value is not "[]" or "[[]]"
{k: v for k, v in complete_list.iteritems() if v and v[0]}
Example:
complete_list = {'x': [[]], 'y': [], 'z': [[1]]}
{k: v for k, v in complete_list.iteritems() if v and v[0]}
# returns {'z': [[1]]}
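The comprehension above uses dict.iteritems(), which exists only in Python 2. A minimal Python 3 equivalent, using the same example data, swaps in .items():

```python
complete_list = {'x': [[]], 'y': [], 'z': [[1]]}

# Keep an entry only if its value is non-empty AND its first element is non-empty,
# which drops values like [] and [[]]
cleaned = {k: v for k, v in complete_list.items() if v and v[0]}
print(cleaned)  # {'z': [[1]]}
```

Note that v[0] only inspects the first element, so this relies on the "empty list within a list" shape being consistent.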
EDIT
From your updated question, I see you are zipping lists together after scraping from a website like so:
complete_dict=dict(zip(category_list,info_list))
It looks like your info_list is empty in some cases, so just do:
if info_list:
    complete_dict = dict(zip(category_list, info_list))
to ensure you don't zip category_list with an empty list.
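Here is a sketch of how that guard might sit inside a scraping loop. The real scraper isn't shown in the question, so the pages data (one hypothetical (category_list, info_list) pair per scraped page) and the results list are illustrative stand-ins only; Python 3 is assumed.

```python
# Hypothetical per-page results standing in for the real scraper output:
# each entry is a (category_list, info_list) pair
pages = [
    ([u'category1', u'category2'], [u'info1', u'info2']),
    ([u'category1', u'category2'], []),  # a page where nothing was scraped
    ([u'category1', u'category2'], [u'info3', u'info4']),
]

results = []  # hypothetical container for the non-empty dicts
for category_list, info_list in pages:
    # Skip pages with no info, so no empty dicts are ever created
    if info_list:
        complete_dict = dict(zip(category_list, info_list))
        results.append(complete_dict)

for complete_dict in results:
    print(complete_dict)
```

Checking the list before zipping avoids having to clean up empty dicts afterwards.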