python, beautifulsoup

BeautifulSoup find() in Python behaves unexpectedly with tuples


I am practicing web crawling, and yesterday my code returned a correct result even though I don't think it should have worked.

I used soup.find(id=i) to find the element whose id attribute equals i, and I thought i had to be a string. But when I passed a tuple whose first element is the string key, I was surprised that it still returned the correct result.

Let's say '01' is the value of the id attribute. Passing tup = ('01', 'Revenue') gives exactly the same result as id='01'.

Could someone with experience here explain why this happens? Thank you so much.

What I tried:

    tup = ('01', 'Revenue')
    acc = soup.find(id=tup).text.strip().split('\n')

I expected a KeyError because I passed a tuple instead of a string to id.
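A minimal, self-contained reproduction of the behaviour described above might look like this. The markup and ids are hypothetical (the real page isn't shown), and it assumes bs4 is installed and uses Python's built-in "html.parser".

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the page being crawled.
html = """
<div id="01">
Revenue
100
</div>
<div id="02">
Cost
40
</div>
"""

soup = BeautifulSoup(html, "html.parser")

by_string = soup.find(id="01")
by_tuple = soup.find(id=("01", "Revenue"))  # tuple instead of a string

# Both calls return the very same Tag object from the parse tree.
print(by_string is by_tuple)                     # True
print(by_tuple.text.strip().split("\n"))         # ['Revenue', '100']
```

Note that if nothing matched, find() would return None, so the chained .text would raise AttributeError rather than KeyError.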


Solution

  • I searched the BeautifulSoup source code and found 3 places where it checks whether something is a tuple.

    I haven't followed the whole chain of calls, but it seems to me that whenever you pass a tuple or a list as an argument, BeautifulSoup turns it into a space-separated string and then checks each of its values.

    I think this is the part that does the actual unpacking/conversion (from the source code):

        def _attr_value_as_string(self, value, default=None):
            """Force an attribute value into a string representation.
    
            A multi-valued attribute will be converted into a
            space-separated stirng.
            """
            value = self.get(value, default)
            if isinstance(value, list) or isinstance(value, tuple):
                value =" ".join(value)
            return value
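To see what find() actually does with a tuple, a quick experiment like the one below may help. The markup is hypothetical, and it assumes bs4 with the built-in "html.parser". It is consistent with the Beautiful Soup documentation, which says that when you pass a list as a filter, a match against any item in the list is allowed: the tuple acts as a set of alternatives rather than "use the first element". The quoted snippet reads the tag's own attribute via self.get(...), which is where multi-valued attributes such as class come in; the last two lines show the space-joined form of such an attribute being matched.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, made up for this experiment.
html = """
<p id="Revenue">first</p>
<p id="01">second</p>
<p class="total net">third</p>
"""
soup = BeautifulSoup(html, "html.parser")

# An element matches if its id equals ANY item of the tuple. find() returns
# the first match in document order, so this is the id="Revenue" paragraph.
print(soup.find(id=("01", "Revenue")).text)          # first
print(soup.find(id=("missing", "01")).text)          # second
print(soup.find(id=("missing", "also-missing")))     # None

# "class" is a multi-valued attribute, stored on the tag as a list...
p = soup.find("p", class_="total")
print(p["class"])                                    # ['total', 'net']
# ...and a search string can also match its space-joined value.
print(soup.find("p", class_="total net").text)       # third
```

So in the question, ('01', 'Revenue') worked only because no element earlier in the page had id="Revenue"; if one did, find() would have returned that element instead.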