pythonsqlpandassqlite

Two seemingly identical SQL queries with python sqlite3 return different results


I am attempting to unit test a function of mine that checks if something is in a database based on a bunch of "identifiers". This is my very first time doing anything with sqlite3 or SQL at all so it's possible I'm making some very stupid mistake. This function has worked when I ran it on actual data, but for some reason is failing in this testing version. I am using Python 3.9, and sqlite3.

After adding several statements to try and debug, I've ended up with this in my code: (there's actually more columns but I shortened it, it just adds more AND to the WHERE statement, also these are obviously not the variable names in my actual code)

import sqlite3

class TestObject:
  def __init__(self, db_loc):
    self.con = sqlite3.connect(db_loc)
    self.cur = self.con.cursor()

  def get_thingy(self, row):
    # this is applied to a pandas dataframe with .apply(lambda row: self.get_thingy(row), axis=1) so row is a pd.Series
    params = (row.col1, row.col2, str(row.col3), row.col4, row.col5)
    print(params)
    res = self.cur.execute("""
    SELECT thingy FROM thingies
    WHERE (col1 = ? or col2 = ?) AND col3 = ? AND col4 = ? AND col5 = ?""", params).fetchone()
    print(res)
    print(self.cur.execute("""SELECT thingy FROM thingies 
      WHERE (col1 = 1 or col2 = 2) AND col3 = '3' AND col4 = 4 AND col5 = 5""").fetchone())
    # and then more code that does stuff with the retrieved thingy

  def get_thingies(self):
    self.thingy_table['thingy'] = self.thingy_table.apply(lambda row: self.get_thingy(row), axis=1)

while my test looks like

import pytest
import os
import pandas as pd
from [path to the other file] import TestObject

class TestClass:
  @pytest.fixture
  def testobj(self):
    db_path = 'test.db'
    test = TestObject(db_path)
    yield test
    if test.con: test.con.close()
    os.remove(db_path)
 
  def test_get_thingy(self, testobj):
    testobj.cur.execute("""CREATE TABLE thingies(col1, col2, col3, col4, col5, thingy)""")
    testobj.cur.execute("""INSERT INTO thingies VALUES (1, 2, '3', 4, 5, "yay thingy")""")
    testobj.con.commit()
    testobj.get_thingy(pd.Series({'col1': 1, 'col2': 2, 'col3': 3, 'col4': 4, 'col5': 5}))

Running this test prints

(1, 2, '3', 4, 5)
None
('yay thingy',)

I have tried copy/pasting the printed params list in to replace the params keyword in the .execute() command, and this caused both queries to return yay thingy. So something is strange about the way that I am accessing these values.

Why does it not work with params as an input, but if I print params and copy what is printed, that works? Shouldn't it be the exact same? Is something strange about taking these values from a pd.Series? The get_thingies function worked on actual data in a DataFrame, is a row of a DataFrame not actually a Series like I thought it was? I'm losing my mind.


Solution

  • Found my solution, posting as an answer in case anyone else finds this through Google or something.

    Anon Coward's comment about needing .item() did make the test work, however it broke the actual code when I ran it on data, so obviously I had to find another solution.

    It turns out, all I had to do was make the numbers in the test into floats instead of integers - so the INSERT and final lines of my test turned into

    testobj.cur.execute("""INSERT INTO thingies VALUES (1.0, 2.0, "3.0", 4.0, 5.0, "yay thingy")""")
    

    and

    testobj.get_thingy(pd.Series({'col1': 1.0, 'col2': 2.0, 'col3': 3.0, 'col4': 4.0, 'col5': 5.0}))
    

    This makes the test work without having to change the code I'm testing.