
Dataframe columns can't be found


I have these two dataframes:

import pandas as pd
import geopandas as gpd

transjakarta_lines = gpd.read_file('https://raw.githubusercontent.com/lokalhangatt/stackoverlow/refs/heads/main/dataviz_day13/transjakarta_lines.geojson')
transjakarta_data = pd.read_excel("https://github.com/lokalhangatt/stackoverlow/raw/refs/heads/main/dataviz_day13/TJ_Agustus_2020.xlsx", index_col=0).T.reset_index()  # transposed

I am trying to merge these two dataframes, but it raises an error.

Merging:

epsg_jkt = 5330

transjakarta_lines['koridor'] = transjakarta_lines['koridor'].apply(int)
transjakarta_data['koridor'] = transjakarta_data['index'].apply(int)
transjakarta = pd.merge(transjakarta_lines, transjakarta_data)

transjakarta = gpd.GeoDataFrame(transjakarta)

transjakarta.crs = transjakarta_lines.crs

transjakarta_planar = transjakarta.to_crs(epsg=epsg_jkt)

Error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[225], line 3
      1 # gabungkan data keduanya
      2 transjakarta_lines['koridor'] = transjakarta_lines['koridor'].apply(int)
----> 3 transjakarta_data['koridor'] = transjakarta_data['index'].apply(int)
      4 transjakarta = pd.merge(transjakarta_lines, transjakarta_data)
      6 # convert kembali ke geodataframe

File c:\Program Files\Python313\Lib\site-packages\pandas\core\series.py:4924, in Series.apply(self, func, convert_dtype, args, by_row, **kwargs)
   4789 def apply(
   4790     self,
   4791     func: AggFuncType,
   (...)
   4796     **kwargs,
   4797 ) -> DataFrame | Series:
   4798     """
   4799     Invoke function on values of Series.
   4800 
   (...)
   4915     dtype: float64
   4916     """
   4917     return SeriesApply(
   4918         self,
   4919         func,
   4920         convert_dtype=convert_dtype,
   4921         by_row=by_row,
   4922         args=args,
   4923         kwargs=kwargs,
-> 4924     ).apply()

File c:\Program Files\Python313\Lib\site-packages\pandas\core\apply.py:1427, in SeriesApply.apply(self)
   1424     return self.apply_compat()
   1426 # self.func is Callable
-> 1427 return self.apply_standard()

File c:\Program Files\Python313\Lib\site-packages\pandas\core\apply.py:1507, in SeriesApply.apply_standard(self)
   1501 # row-wise access
   1502 # apply doesn't have a `na_action` keyword and for backward compat reasons
   1503 # we need to give `na_action="ignore"` for categorical data.
   1504 # TODO: remove the `na_action="ignore"` when that default has been changed in
   1505 #  Categorical (GH51645).
   1506 action = "ignore" if isinstance(obj.dtype, CategoricalDtype) else None
-> 1507 mapped = obj._map_values(
   1508     mapper=curried, na_action=action, convert=self.convert_dtype
   1509 )
   1511 if len(mapped) and isinstance(mapped[0], ABCSeries):
   1512     # GH#43986 Need to do list(mapped) in order to get treated as nested
   1513     #  See also GH#25959 regarding EA support
   1514     return obj._constructor_expanddim(list(mapped), index=obj.index)

File c:\Program Files\Python313\Lib\site-packages\pandas\core\base.py:921, in IndexOpsMixin._map_values(self, mapper, na_action, convert)
    918 if isinstance(arr, ExtensionArray):
    919     return arr.map(mapper, na_action=na_action)
--> 921 return algorithms.map_array(arr, mapper, na_action=na_action, convert=convert)

File c:\Program Files\Python313\Lib\site-packages\pandas\core\algorithms.py:1743, in map_array(arr, mapper, na_action, convert)
   1741 values = arr.astype(object, copy=False)
   1742 if na_action is None:
-> 1743     return lib.map_infer(values, mapper, convert=convert)
   1744 else:
   1745     return lib.map_infer_mask(
   1746         values, mapper, mask=isna(values).view(np.uint8), convert=convert
   1747     )

File lib.pyx:2972, in pandas._libs.lib.map_infer()

ValueError: invalid literal for int() with base 10: 'Rata-rata Harlan'
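To see where the string 'Rata-rata Harlan' comes from, here is a minimal reproduction on a small made-up table. It assumes (as the answer below describes) that the Excel sheet has one row per koridor, with the koridor number as the first column and the "Rata-rata ..." statistics as column headers; the numbers are invented for illustration.

```python
import pandas as pd

# Hypothetical stand-in for TJ_Agustus_2020.xlsx read with index_col=0:
# koridor numbers in the index, statistics as columns (values are made up).
raw = pd.DataFrame(
    {
        "Rata-rata Harlan": [100, 200, 300],
        "Rata-rata Weekend": [80, 150, 250],
    },
    index=pd.Index([1, 2, 3], name="koridor"),
)

# .T swaps the axes: the statistic names become the row labels, and
# reset_index() moves them into a new column called 'index'.
transposed = raw.T.reset_index()
print(transposed["index"].tolist())  # ['Rata-rata Harlan', 'Rata-rata Weekend']

# The koridor numbers now live in the column labels instead.
print(list(transposed.columns))  # ['index', 1, 2, 3]

# Calling int() on those labels fails exactly like the traceback above.
try:
    transposed["index"].apply(int)
except ValueError as err:
    print(err)  # invalid literal for int() with base 10: 'Rata-rata Harlan'
```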

If I don't transpose transjakarta_data and call apply(int) on the index instead, I get this error:

pd.read_excel("https://github.com/lokalhangatt/stackoverlow/raw/refs/heads/main/dataviz_day13/TJ_Agustus_2020.xlsx", 
              index_col=0).reset_index().index.apply(int)

Error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[255], line 1
----> 1 pd.read_excel("https://github.com/lokalhangatt/stackoverlow/raw/refs/heads/main/dataviz_day13/TJ_Agustus_2020.xlsx", index_col=0).reset_index().index.apply(int)

AttributeError: 'RangeIndex' object has no attribute 'apply'
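The second error is about the object type rather than the data: a pandas Index has no .apply() method, but it does have .map(). Also note that after reset_index() the old index (the koridor numbers) has become an ordinary column, and the new index is just a RangeIndex 0..n-1, so converting that index would not give the koridor values anyway. A short sketch on a made-up table, assuming the koridor values are stored as strings:

```python
import pandas as pd

# Hypothetical table: koridor numbers stored as strings in the index
raw = pd.DataFrame(
    {"Rata-rata Harlan": [100, 200, 300]},
    index=pd.Index(["1", "2", "3"], name="koridor"),
)

# Index objects have .map() (not .apply()) for element-wise conversion
print(raw.index.map(int))

# After reset_index(), 'koridor' is a regular column and the index is a
# plain RangeIndex, so convert the column rather than the index.
flat = raw.reset_index()
print(type(flat.index).__name__)  # RangeIndex
flat["koridor"] = flat["koridor"].astype(int)
print(flat["koridor"].tolist())  # [1, 2, 3]
```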

How can I solve this?


Solution

  • I checked the data myself. It looks like you are converting the 'index' column (which reset_index() creates after the transpose) into the new koridor column, instead of the koridor numbers themselves. Transposing flips the table, so the column headers {Rata-rata Harlan, rata-rata Weekdata, Rata-rata Weekend} become row labels, and reset_index() then moves them into that 'index' column. int() cannot parse strings like 'Rata-rata Harlan', hence the ValueError. If you want the data aligned by koridor, with 13 rows (one per koridor), you don't need to transpose at all:

    import pandas as pd
    import geopandas as gpd
    transjakarta_lines = gpd.read_file('https://raw.githubusercontent.com/lokalhangatt/stackoverlow/refs/heads/main/dataviz_day13/transjakarta_lines.geojson')
    transjakarta_data = pd.read_excel("https://github.com/lokalhangatt/stackoverlow/raw/refs/heads/main/dataviz_day13/TJ_Agustus_2020.xlsx", index_col=0).reset_index()  # no transposing required in this case
    transjakarta = pd.merge(transjakarta_lines, transjakarta_data, on='koridor')  # after resetting the index, both dataframes share a 'koridor' column, so they can be merged on it
    transjakarta
    

    Please correct me if I have misunderstood what the code is meant to do.
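If the merge complains about mismatched key types (the question converts transjakarta_lines['koridor'] with apply(int), which suggests it may be stored as text in the GeoJSON), casting both keys to the same dtype first is a safe extra step. A minimal sketch with small made-up frames standing in for the real data; the 'name' column and all values are hypothetical:

```python
import pandas as pd

# Hypothetical stand-ins: koridor as text on the "lines" side,
# as an integer index on the Excel side.
lines = pd.DataFrame({"koridor": ["1", "2", "3"], "name": ["K1", "K2", "K3"]})
data = pd.DataFrame(
    {"Rata-rata Harlan": [100, 200, 300]},
    index=pd.Index([1, 2, 3], name="koridor"),
).reset_index()  # moves 'koridor' from the index into a regular column

# Give both join keys the same dtype so the merge compares like with like
lines["koridor"] = lines["koridor"].astype(int)
data["koridor"] = data["koridor"].astype(int)

# validate="one_to_one" raises a MergeError if either side has duplicate keys
merged = pd.merge(lines, data, on="koridor", how="inner", validate="one_to_one")
print(merged)
```

With the real data, transjakarta_lines would be the GeoDataFrame on the left; according to the GeoPandas documentation, pd.merge keeps a GeoDataFrame result when the GeoDataFrame is the left argument, after which the CRS steps from the question (setting crs and calling to_crs(epsg=5330)) can be applied.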