Tags: python, apply, graphlab, sframe

Issues using the apply() method on an SFrame


Background: I have an SFrame that contains numbers indicating how close a dog image is to other images. Usually the dog image should be closest to another dog image, but the point is to test the evaluation method.

My SFrame is called dog_distances (1000 rows x 4 columns):

dog-automobile  dog-bird             dog-cat    dog-dog
41.9579761457   41.7538647304   36.4196077068   33.4773590373
46.0021331807   41.3382958925   38.8353268874   32.8458495684
42.9462290692   38.6157590853   36.9763410854   35.0397073189
41.6866060048   37.0892269954   34.5750072914   33.9010327697
39.2269664935   38.272288694    34.778824791    37.4849250909
40.5845117698   39.1462089236   35.1171578292   34.945165344

I want to write a function that checks whether dog-dog is the lowest number in a row, and apply this function to the whole SFrame.

Accessing a row of an SFrame with sframe_name[row#] normally returns a dict, so a single value is read as sframe_name[row#]['column_name'].

Adding .values() to the row, as in sframe_name[row#].values(), returns just the values in a list. This lets you use functions like min() or max(), which is useful for writing the function is_dog_correct.
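As a small illustration of the row-as-dict idea, here is a sketch that uses a plain Python dict as a stand-in for dog_distances[0]. The dict is an assumption made so the snippet runs without GraphLab; the numbers are copied from the first row of the table above.

```python
# Plain dict standing in for dog_distances[0] (assumption: no GraphLab needed here).
row = {
    'dog-automobile': 41.9579761457,
    'dog-bird': 41.7538647304,
    'dog-cat': 36.4196077068,
    'dog-dog': 33.4773590373,
}

# .values() drops the column names and keeps only the distances.
smallest = min(row.values())

# dog-dog is the smallest distance in this row.
print(row['dog-dog'] == smallest)  # True
```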

Thus my function is:

def is_dog_correct(row):
    #checking if dog-dog is smallest value
    if dog_distances[row]['dog-dog'] == min(dog_distances[row].values()):
        return 1
    else:
        return 0

My function takes row as an input and returns 1 if the value of dog-dog for that row is equal to the minimum value in that row. It returns 0 otherwise.

Running is_dog_correct(0) outputs 1. We expect this because, as you can see above, the value in dog-dog for the zeroth row is the smallest number in that row.

Running is_dog_correct(4) outputs 0. We expect this because the value in dog-dog for row 4 is NOT the smallest number in that row (dog-cat is smaller).

So the function is_dog_correct works perfectly on a row by row basis!

When I run it on the whole SFrame, as suggested: dog_distances.apply(is_dog_correct)

I get an attribute error:

'SFrame' object has no attribute 'values'

Can someone please explain why the function works row by row but not on the whole SFrame?


Solution

  • I figured out the solution:

    The problem, I think, is that all the documentation says .apply() goes row by row. I assumed this meant that, as it ran a function on a given row, the variable passed to the function was the row number as an integer.

    In fact, what .apply() passes to your function is the row itself, i.e. sframe_name[row_#].

    So if, inside your function, you want to access or act on the value you would otherwise write as

    sframe_name[row_#]['column_name']
    

    the generic form is this:

    passed_variable['column_name']
    

    Just to be utterly transparent, in my function the exact code was:

    if dog_distances[row]['dog-bird'] <= dog_distances[row]['dog-dog']:
    

    When the code should have been:

    if row['dog-bird'] <= row['dog-dog']:
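
Putting this together, a corrected version of is_dog_correct from the question might look like the sketch below. It assumes each row arrives as a dict of column name to value, as described above. The list `rows` is a hypothetical stand-in for dog_distances (rows 0 and 4 of the table), used only so the example runs without GraphLab.

```python
# Hypothetical stand-in for dog_distances: one dict per row (rows 0 and 4 of the table).
rows = [
    {'dog-automobile': 41.9579761457, 'dog-bird': 41.7538647304,
     'dog-cat': 36.4196077068, 'dog-dog': 33.4773590373},
    {'dog-automobile': 39.2269664935, 'dog-bird': 38.272288694,
     'dog-cat': 34.778824791, 'dog-dog': 37.4849250909},
]

def is_dog_correct(row):
    # row is the row itself (a dict), not a row number,
    # so it is indexed by column name directly.
    if row['dog-dog'] == min(row.values()):
        return 1
    else:
        return 0

# Mimic a row-by-row apply by calling the function once per row dict.
results = [is_dog_correct(row) for row in rows]
print(results)  # [1, 0]
```

With the real SFrame the call stays the same as in the question, dog_distances.apply(is_dog_correct); only the body of the function changes. Note that min(row.values()) compares across every column in the row, so this version relies on the SFrame holding only the four distance columns.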