I have these two dataframes:
transjakarta_lines = gpd.read_file('https://raw.githubusercontent.com/lokalhangatt/stackoverlow/refs/heads/main/dataviz_day13/transjakarta_lines.geojson')
transjakarta_data = pd.read_excel("https://github.com/lokalhangatt/stackoverlow/raw/refs/heads/main/dataviz_day13/TJ_Agustus_2020.xlsx", index_col=0).T.reset_index() #transposed
It shows something like this:
I am trying to merge these two dataframes, but it raises an error.
Merging:
epsg_jkt = 5330
transjakarta_lines['koridor'] = transjakarta_lines['koridor'].apply(int)
transjakarta_data['koridor'] = transjakarta_data['index'].apply(int)
transjakarta = pd.merge(transjakarta_lines, transjakarta_data)
transjakarta = gpd.GeoDataFrame(transjakarta)
transjakarta.crs = transjakarta_lines.crs
transjakarta_planar = transjakarta.to_crs(epsg=epsg_jkt)
Error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[225], line 3
1 # gabungkan data keduanya
2 transjakarta_lines['koridor'] = transjakarta_lines['koridor'].apply(int)
----> 3 transjakarta_data['koridor'] = transjakarta_data['index'].apply(int)
4 transjakarta = pd.merge(transjakarta_lines, transjakarta_data)
6 # convert kembali ke geodataframe
File c:\Program Files\Python313\Lib\site-packages\pandas\core\series.py:4924, in Series.apply(self, func, convert_dtype, args, by_row, **kwargs)
4789 def apply(
4790 self,
4791 func: AggFuncType,
(...)
4796 **kwargs,
4797 ) -> DataFrame | Series:
4798 """
4799 Invoke function on values of Series.
4800
(...)
4915 dtype: float64
4916 """
4917 return SeriesApply(
4918 self,
4919 func,
4920 convert_dtype=convert_dtype,
4921 by_row=by_row,
4922 args=args,
4923 kwargs=kwargs,
-> 4924 ).apply()
File c:\Program Files\Python313\Lib\site-packages\pandas\core\apply.py:1427, in SeriesApply.apply(self)
1424 return self.apply_compat()
1426 # self.func is Callable
-> 1427 return self.apply_standard()
File c:\Program Files\Python313\Lib\site-packages\pandas\core\apply.py:1507, in SeriesApply.apply_standard(self)
1501 # row-wise access
1502 # apply doesn't have a `na_action` keyword and for backward compat reasons
1503 # we need to give `na_action="ignore"` for categorical data.
1504 # TODO: remove the `na_action="ignore"` when that default has been changed in
1505 # Categorical (GH51645).
1506 action = "ignore" if isinstance(obj.dtype, CategoricalDtype) else None
-> 1507 mapped = obj._map_values(
1508 mapper=curried, na_action=action, convert=self.convert_dtype
1509 )
1511 if len(mapped) and isinstance(mapped[0], ABCSeries):
1512 # GH#43986 Need to do list(mapped) in order to get treated as nested
1513 # See also GH#25959 regarding EA support
1514 return obj._constructor_expanddim(list(mapped), index=obj.index)
File c:\Program Files\Python313\Lib\site-packages\pandas\core\base.py:921, in IndexOpsMixin._map_values(self, mapper, na_action, convert)
918 if isinstance(arr, ExtensionArray):
919 return arr.map(mapper, na_action=na_action)
--> 921 return algorithms.map_array(arr, mapper, na_action=na_action, convert=convert)
File c:\Program Files\Python313\Lib\site-packages\pandas\core\algorithms.py:1743, in map_array(arr, mapper, na_action, convert)
1741 values = arr.astype(object, copy=False)
1742 if na_action is None:
-> 1743 return lib.map_infer(values, mapper, convert=convert)
1744 else:
1745 return lib.map_infer_mask(
1746 values, mapper, mask=isna(values).view(np.uint8), convert=convert
1747 )
File lib.pyx:2972, in pandas._libs.lib.map_infer()
ValueError: invalid literal for int() with base 10: 'Rata-rata Harlan'
If I don't transpose transjakarta_data and call apply(int) on the index instead, I get this error:
pd.read_excel("https://github.com/lokalhangatt/stackoverlow/raw/refs/heads/main/dataviz_day13/TJ_Agustus_2020.xlsx",
index_col=0).reset_index().index.apply(int)
Error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[255], line 1
----> 1 pd.read_excel("https://github.com/lokalhangatt/stackoverlow/raw/refs/heads/main/dataviz_day13/TJ_Agustus_2020.xlsx", index_col=0).reset_index().index.apply(int)
AttributeError: 'RangeIndex' object has no attribute 'apply'
How can I solve this?
I just checked the data myself. It looks like you are converting the column named 'index', rather than the index itself, into a new column called 'koridor'. Transposing and then resetting the index creates that 'index' column, and it holds the labels {Rata-rata Harlan, rata-rata Weekdata, Rata-rata Weekend}, which is what int() fails on. If you want the data aligned by 'koridor', with one row per koridor (13 in total), here is the code:
import pandas as pd
import geopandas as gpd
transjakarta_lines = gpd.read_file('https://raw.githubusercontent.com/lokalhangatt/stackoverlow/refs/heads/main/dataviz_day13/transjakarta_lines.geojson')
transjakarta_data = pd.read_excel("https://github.com/lokalhangatt/stackoverlow/raw/refs/heads/main/dataviz_day13/TJ_Agustus_2020.xlsx", index_col=0).reset_index() # no transposing required in this case
transjakarta = pd.merge(transjakarta_lines, transjakarta_data, on='koridor') # after resetting the index, both dataframes have a matching 'koridor' column, so they can be merged on it
transjakarta
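To illustrate why the transposed version fails, here is a minimal sketch with made-up data. The layout (one row per koridor, index named 'koridor'), the column names avg_daily and avg_weekend, and the idea that the GeoJSON stores koridor as text are all assumptions for illustration, not the real file contents.

```python
import pandas as pd

# Made-up stand-in for the Excel sheet (assumed layout: one row per koridor).
data = pd.DataFrame(
    {"avg_daily": [100, 200, 300], "avg_weekend": [80, 150, 250]},
    index=pd.Index([1, 2, 3], name="koridor"),
)

# Transposing turns the column labels into rows, so reset_index() puts
# those text labels into a column called 'index'. int() cannot parse them.
transposed = data.T.reset_index()
labels = transposed["index"].tolist()

# Without transposing, reset_index() restores 'koridor' as a regular column.
flat = data.reset_index()

# Made-up stand-in for the GeoJSON attribute table (koridor assumed to be text).
lines = pd.DataFrame({"koridor": ["1", "2", "3"], "name": ["A", "B", "C"]})
lines["koridor"] = lines["koridor"].astype(int)  # align dtypes before merging

# validate= makes pandas raise if a koridor appears more than once on either side.
merged = lines.merge(flat, on="koridor", validate="one_to_one")

print(labels)
print(merged)
```

If the key columns have different dtypes (text on one side, integers on the other), the merge either raises or matches nothing, so it is worth checking both with .dtypes first.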
Please correct me if I have misunderstood the intent of your code.
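On the second error: .apply is a Series/DataFrame method, so calling it on an Index raises AttributeError. Also, after reset_index() the .index is just row positions, not the koridor numbers. A small sketch with made-up data (the 'passengers' column and the text koridor values are hypothetical):

```python
import pandas as pd

# Made-up frame whose index holds koridor numbers stored as text.
df = pd.DataFrame(
    {"passengers": [100, 200, 300]},
    index=pd.Index(["1", "2", "3"], name="koridor"),
)

# After reset_index(), .index is a fresh RangeIndex of row positions;
# the koridor values now live in the 'koridor' column.
flat = df.reset_index()
positions = list(flat.index)

# Columns are Series, so astype() (or apply) works on them.
flat["koridor"] = flat["koridor"].astype(int)

# Index objects have no .apply, but they do support .map and .astype.
as_int_index = df.index.map(int)

print(positions)
print(flat["koridor"].tolist())
print(list(as_int_index))
```

So the conversion should target the 'koridor' column after reset_index(), or use .map / .astype if you want to convert the index directly.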