We're using Python’s difflib.SequenceMatcher to compare strings in a production system. Here's the simplified relevant code:
from difflib import SequenceMatcher

similarity = SequenceMatcher(
    None,
    normalized_transcript,
    normalized_expected
).ratio()
Until 4:10 PM UTC today, the above code was returning a similarity ratio above our internal threshold for a specific string comparison.
After that time, without any change in our code, server configuration, or environment, the same comparison started returning a lower similarity score, failing the threshold check.
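For reference, the ratio feeds a threshold gate along these lines; the 0.9 cutoff and the function name are illustrative placeholders, not our real values:

```python
from difflib import SequenceMatcher

# Illustrative cutoff -- the real threshold is an internal setting.
SIMILARITY_THRESHOLD = 0.9

def passes_check(normalized_transcript: str, normalized_expected: str) -> bool:
    """Return True when the two normalized strings are similar enough."""
    ratio = SequenceMatcher(None, normalized_transcript, normalized_expected).ratio()
    return ratio >= SIMILARITY_THRESHOLD
```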
Some key facts:
- The behavior changed consistently across all environments, both development and production.
- Servers run a mix of Windows (dev) and Unix (dev + prod), so this is unlikely to be an OS-specific issue.
- There were no code deployments, no dependency changes, and no environment variable alterations.
- We are aware that SequenceMatcher runs entirely locally; there are no third-party requests or models involved.
- We've validated that the inputs to SequenceMatcher are identical to the previous values (confirmed via logs; see the verification sketch below).
Important detail: the string normalized_transcript comes from OpenAI API completions. That’s the only potentially "variable" external component in the system. However, the strings in question are very short, and we’ve historically seen consistent outputs from OpenAI for this prompt setup.
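To make "identical" concrete: the check below is a simplified sketch of what we run against the logged values (the helper names are illustrative, not our production code). It renders each string through repr() plus a codepoint dump, so invisible differences such as a non-breaking space or a curly quote would show up immediately.

```python
def describe(s: str) -> str:
    """Render a string so that invisible differences become visible."""
    # repr() exposes whitespace and escape differences; the codepoint dump
    # catches lookalikes such as U+00A0 (no-break space) vs. U+0020 (space).
    codepoints = " ".join(f"U+{ord(ch):04X}" for ch in s)
    return f"{s!r}  [{codepoints}]"

def assert_identical(current: str, previous: str) -> None:
    """Fail loudly if the two logged strings differ in any way."""
    if current != previous:
        raise AssertionError(
            "inputs differ:\n"
            f"  current : {describe(current)}\n"
            f"  previous: {describe(previous)}"
        )
```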
This behavior is baffling. Is there any known edge case, time-sensitive internal optimization, or anything else that could explain this sudden change in SequenceMatcher's behavior?
As I mentioned in a comment, I wrote the difflib code in question, and "it's entirely self-contained and purely functional (the results depend solely on the sequences passed to it)." It knows nothing about time, which platform it's running on, anything else in its environment, or how or when the sequences passed to it were obtained.
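If you want to see that for yourself, hammer .ratio() with any fixed pair of strings (the ones below are made up) and note that exactly one distinct float ever comes back:

```python
from difflib import SequenceMatcher

# Made-up inputs purely for demonstration; substitute your real strings.
a = "turn the living room lights off"
b = "turn off the living room lights"

# The same inputs always produce the exact same float, so the set
# collapses to a single element.
results = {SequenceMatcher(None, a, b).ratio() for _ in range(1_000)}
print(len(results))  # 1
```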
So more information is needed. Not about your environment, but about the symptom itself: what result did you get? what result did you expect? which inputs were passed? We haven't yet been told anything relevant.
> Important detail: the string normalized_transcript comes from OpenAI API completions. That's the only potentially "variable" external component in the system.
Then that's the only guess I have.
> However, the strings in question are very short,
Why would their lengths be relevant?
> and we've historically seen consistent outputs from OpenAI for this prompt setup.
Past performance is no guarantee of future results ;-)
At a bare minimum, show us the precise strings that were passed, and what .ratio() returned on your box(es). Then we can at least see whether people can reproduce your results. And as the algorithm's creator, I may be able to guess non-obvious (to others) things from the precise floating-point result .ratio() returned.
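Something along these lines, with the placeholders replaced by your real values, would give us everything needed to try reproducing it:

```python
from difflib import SequenceMatcher

# Placeholders -- paste the exact normalized strings from your logs here.
normalized_transcript = "...actual transcript string..."
normalized_expected = "...actual expected string..."

# repr() exposes invisible differences (whitespace, quote characters,
# Unicode escapes) that a plain print() would hide.
print(repr(normalized_transcript))
print(repr(normalized_expected))

ratio = SequenceMatcher(None, normalized_transcript, normalized_expected).ratio()
# In Python 3, repr() of a float preserves enough digits to round-trip
# the exact value, so post this output verbatim.
print(repr(ratio))
```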
But, as is, we're all flying blind here.