I'm curious how Python performs string comparisons under the hood. For example if
if s1 == s2:
print(True)
else:
print(False)
is the same as
condition= True
for x,y in zip(s1, s2):
if x != y:
condition = False
print(condition)
Perhaps under the hood python is able to use ord values more efficiently than O(n) traversals?
Python's string compare is implemented in unicodeobject.c
. After a few checks such as string length and "kind" (Python may use 1, 2 or 4 bytes per character depending on unicode USC character size), it's just a call to the C lib memcmp
.
With a quick change to your Python code
condition = True
if len(s1) == len(s2):
for x,y in zip(s1, s2):
if x != y:
condition = False
break
else:
condition = False
The Python code has the same O(n) time complexity as memcmp, it's just that Python has a much bigger O. Time complexity doesn't say anything about how long an operation takes, just how an operation scales with a larger input set n
.
memcmp
is much faster than the python version because of inherent language overhead. But it scales the same. And when you think about it, each of the if x != y:
compares in the second example runs the exact same code as the single s1 == s2
compare in the first.