Background: I have an SFrame that contains numbers indicating how close a dog image is to other images. Usually the dog image should be closest to another dog image, but the point is to test the evaluation method.
My SFrame is called dog_distances (1000 rows x 4 columns):
dog-automobile dog-bird dog-cat dog-dog
41.9579761457 41.7538647304 36.4196077068 33.4773590373
46.0021331807 41.3382958925 38.8353268874 32.8458495684
42.9462290692 38.6157590853 36.9763410854 35.0397073189
41.6866060048 37.0892269954 34.5750072914 33.9010327697
39.2269664935 38.272288694 34.778824791 37.4849250909
40.5845117698 39.1462089236 35.1171578292 34.945165344
I want to write a function that checks whether dog-dog is the lowest number in a row, and then apply this function to the whole SFrame.
Accessing a row of an SFrame by index, sframe_name[row#], normally returns a dict, so a single value is sframe_name[row#]['column_name'].
Adding .values() to the row lookup, sframe_name[row#].values(), returns just the values. This lets you apply functions like min() or max(), which is useful for writing the function is_dog_correct.
Thus my function is:
def is_dog_correct(row):
    # check whether dog-dog is the smallest value in the row
    if dog_distances[row]['dog-dog'] == min(dog_distances[row].values()):
        return 1
    else:
        return 0
My function takes row as an input and returns 1 if the value of dog-dog for that row is equal to the min value in that row. It returns 0 otherwise.
Running is_dog_correct(0) outputs 1. We expect this because, as you can see above, the value in dog-dog for the zeroth row is the smallest number in that row.
Running is_dog_correct(4) outputs 0. We expect this because the value in dog-dog for row 4 is NOT the smallest number in that row (dog-cat is smaller).
So the function is_dog_correct works perfectly on a row by row basis!
When I run it on the whole SFrame as suggested: dog_distances.apply(is_dog_correct)
I get an attribute error:
'SFrame' object has no attribute 'values'
Can someone please explain why the function works row by row but not on the whole SFrame?
I figured out the solution:
The problem, I think, is that all the documentation says .apply() goes row by row. I assumed this meant that, as it ran the function on a given row, the variable passed in was the row number as an integer.
In fact, what .apply() passes to your function is the row itself, i.e. the same dict you get from sframe_name[row_#].
So inside your function, where you would otherwise write:
sframe_name[row_#]['column_name']
you index the passed variable directly. A generic form would be this:
passed_variable['column_name']
Just to be utterly transparent, in my function the exact code was:
if dog_distances[row]['dog-bird'] <= dog_distances[row]['dog-dog']:
When the code should have been:
if row['dog-bird'] <= row['dog-dog']:
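For anyone landing here later, here is a minimal sketch of the original is_dog_correct rewritten along those lines. It assumes, as described above, that .apply() hands the function each row as a dict of column name to value. The sample_rows list is just a stand-in I made up from two of the rows shown in the question so the snippet runs without an SFrame; the list comprehension only imitates what .apply() would do.

```python
def is_dog_correct(row):
    """Return 1 if 'dog-dog' is the smallest distance in this row, else 0.

    `row` is assumed to be a dict mapping column name -> distance,
    i.e. the row itself rather than an integer row number.
    """
    return 1 if row['dog-dog'] == min(row.values()) else 0


# Plain dicts standing in for rows 0 and 4 of dog_distances (hypothetical
# stand-in data copied from the table in the question).
sample_rows = [
    {'dog-automobile': 41.9579761457, 'dog-bird': 41.7538647304,
     'dog-cat': 36.4196077068, 'dog-dog': 33.4773590373},
    {'dog-automobile': 39.2269664935, 'dog-bird': 38.272288694,
     'dog-cat': 34.778824791, 'dog-dog': 37.4849250909},
]

# Imitates applying the function row by row.
results = [is_dog_correct(r) for r in sample_rows]
print(results)  # [1, 0]

# With the real SFrame the call would be (not run here):
# dog_distances.apply(is_dog_correct)
```

Note that the function no longer refers to dog_distances at all, so it works on whatever row it is given.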