[SOLVED] BeautifulSoup find() in Python behaves unexpectedly with tuples

I am practicing web crawling, and yesterday I got an unexpectedly correct result from code that I don't think should work.

I used soup.find(id=i) to find the element whose id attribute is i. I thought i had to be a string, but when I passed a tuple whose first element is the string key, I was surprised that it still returned the correct result.

Let's say '01' is the value of the id attribute. The code below gave exactly the same result as id='01':

    tup = ('01', 'Revenue')
    acc = soup.find(id=tup).text.strip().split('\n')

Can anyone with experience in this explain it to me? Thank you so much.

What I tried:

    tup = ('01', 'Revenue')
    acc = soup.find(id=tup).text.strip().split('\n')

I expected a KeyError because I passed a tuple instead of a string to id.

Solution

I searched the BeautifulSoup source code and found 3 occurrences where it checks whether something is a tuple.

I haven't gone through the whole chain of calls, but it seems to me that whenever you pass a tuple or list as an argument, BeautifulSoup turns it into a space-separated string and then checks each value.
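
For what it's worth, the Beautiful Soup documentation describes passing a list as a filter that matches any item in the list, and a tuple seems to be handled the same way. Here is a minimal sketch to check that. The HTML snippet and the second id '02' are made up for illustration; only the id '01' and the tuple come from the question.

```python
from bs4 import BeautifulSoup

# Made-up HTML for illustration; only the id "01" comes from the question.
html = """
<table>
  <tr id="01"><td>Revenue</td></tr>
  <tr id="02"><td>Cost</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

by_string = soup.find(id="01")
by_tuple = soup.find(id=("01", "Revenue"))

# Both lookups should land on the same <tr> element, because "01"
# is one of the candidates in the tuple.
print(by_tuple is by_string)

# Every item in the tuple is tried, so this should return both rows.
rows = soup.find_all(id=("01", "02"))
print([row["id"] for row in rows])
```

If that holds, 'Revenue' in the original tuple is just a second candidate id that never matches anything, so the call behaves like find(id='01').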

I think this part is the actual unpacking / conversion (from the source code):

    def _attr_value_as_string(self, value, default=None):
        """Force an attribute value into a string representation.

        A multi-valued attribute will be converted into a
        space-separated stirng.
        """
        value = self.get(value, default)
        if isinstance(value, list) or isinstance(value, tuple):
            value =" ".join(value)
        return value
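
To see what that method does in isolation, here is a standalone sketch of the same join logic. FakeTag and its sample attributes are hypothetical stand-ins and not BeautifulSoup classes. Note that the first argument is looked up with get(), so it reads as an attribute name, and the join applies to the value stored on the tag.

```python
# Standalone sketch of the join logic quoted above.
# FakeTag is a hypothetical stand-in, not a BeautifulSoup class.
class FakeTag:
    def __init__(self, attrs):
        self.attrs = attrs

    def get(self, key, default=None):
        # Look up an attribute by name, like a dict lookup.
        return self.attrs.get(key, default)

    def attr_value_as_string(self, key, default=None):
        # Same steps as the quoted method: fetch the attribute,
        # then join it if it holds several values.
        value = self.get(key, default)
        if isinstance(value, (list, tuple)):
            value = " ".join(value)
        return value


tag = FakeTag({"id": "01", "class": ["row", "total"]})
print(tag.attr_value_as_string("class"))  # row total
print(tag.attr_value_as_string("id"))     # 01
print(tag.attr_value_as_string("href"))   # None
```

So this helper would matter for multi-valued attributes such as class, which is why I am not fully sure it is the step that handles the tuple passed to find().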