I have the following data:
I am trying to use the pyjarowinkler library to find the distance between strings. My hello-world code works:
#Hello World
d1 = distance.get_jaro_distance("Hello", "hello", winkler=True, scaling=0.1)
d1
When I try to iterate over each row or use apply, my code fails. Can someone please point me in the right direction?
#Import data
import pandas
df = pandas.read_csv('data.csv')
from pyjarowinkler import distance
score=df.apply(distance.get_jaro_distance(df[S1],df[Stores]))
# iterating over rows using iterrows() function
for i, j in df.iterrows():
    print(i, j, distance.get_jaro_distance(i, j, winkler=True, scaling=0.1))
    print()
Error:
JaroDistanceException: Cannot calculate distance from NoneType (int, Series)
The expected output is:
I think you should be able to do:
df['distance'] = df.apply(lambda d: distance.get_jaro_distance(d['S1'], d['store'], winkler=True, scaling=0.1), axis=1)
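Note the axis=1 parameter being passed to .apply; this tells it to operate on the df row-wise rather than column-wise, so the lambda receives one row at a time. As for the loop, iterrows() yields (index, row) pairs, so i is the integer index and j is the whole row as a Series, which is what the (int, Series) in the error message refers to.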
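Here is a rough, self-contained sketch of both approaches, in case it helps. The data is made up, the column names S1 and store are taken from the line above (adjust them to your CSV), and similarity() is a stand-in built on difflib so the snippet runs without pyjarowinkler installed. It is not the Jaro-Winkler metric; in your code you would call distance.get_jaro_distance(..., winkler=True, scaling=0.1) in its place.

```python
from difflib import SequenceMatcher

import pandas as pd


def similarity(a, b):
    # Stand-in for distance.get_jaro_distance (assumption for this sketch);
    # this is difflib's ratio, NOT the Jaro-Winkler metric.
    return SequenceMatcher(None, a, b).ratio()


# Hypothetical data; replace with pandas.read_csv('data.csv') in practice.
df = pd.DataFrame({"S1": ["Hello", "world"], "store": ["hello", "word"]})

# axis=1 hands each row to the lambda as a Series keyed by column name.
df["distance"] = df.apply(lambda d: similarity(d["S1"], d["store"]), axis=1)

# iterrows() yields (index, row) pairs, so read the two strings from the row
# instead of passing the index and the whole row to the distance function.
for idx, row in df.iterrows():
    print(idx, row["S1"], row["store"], similarity(row["S1"], row["store"]))
```

If some cells are empty, it is probably worth dropping or filling them first (for example with df.dropna(subset=['S1', 'store'])), since a missing value is not a string either.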