python-3.xdataframetype-conversionhex

Converting Pandas dataframe of hex numbers to integers and floats and split it into single columns


I would like to convert a pandas dataframe which contains space separated hex numbers into integers and floats (some columns only contain integers, some columns are floats). The dataframe has an index column (its a time variable).

The dataframe looks like this:

print(selected_df.XData)
DataSrvTime
2021-07-08T08:43:29.0616419     C7 10 2 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0...
2021-07-08T08:43:30.0866790     C2 16 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0...
2021-07-08T08:43:31.1107931     CB E 2 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...
2021-07-08T08:43:32.1398927     BF 13 2 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0...
2021-07-08T08:43:33.1697282     BA 15 2 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0...
                                                     ...                        
2021-07-08T11:12:51.1695194     4 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0...
2021-07-08T11:12:52.2000730     5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0...
2021-07-08T11:12:53.2248873     4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0...
2021-07-08T11:12:54.2574457     2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0...
2021-07-08T11:12:56.3157504     6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0...
Name: XData, Length: 7799, dtype: object

First of all I split the dataset -each into an own column. The first character of the dataframe is a delimiter. I dismissed the first column therefor to only keep the data containing columns and add a column name:

# Defining the column names:
header = ["00","01","02","03","04","05","06","07","08","09","10","11","12","13","14","15","16","17","18","19","20","21","22","23","MTF1","MTF3","MTF5","MTF7","SP","SFR","T","RH","PM_1","PM_2","PM_3","#RG","#RL","#RR","#RC","LS","Checksum"] 

# split data into single columns
x_df = selected_df.XData.str.split(' ', expand=True)

# dismiss first delimiter column
x_df.drop(0, inplace=True, axis=1)

#add column names
x_df.columns = header

Now, I tried different ways to convert the hex data into integers and/or floats which all of them result in errors. Maybe anyone of you has an idea to make it better than me..

# Simply test apply solution without header yet:
res = x_df.apply(int, base = 16)

Results in this error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-107-004c8b72cd04> in <module>
---> 39 res = x_df.apply(int, base = 16)

c:\python38\lib\site-packages\pandas\core\frame.py in apply(self, func, axis, raw, result_type, args, **kwargs)
   8831             kwargs=kwargs,
   8832         )
-> 8833         return op.apply().__finalize__(self, method="apply")
   8834 
   8835     def applymap(

c:\python38\lib\site-packages\pandas\core\apply.py in apply(self)
    725             return self.apply_raw()
    726 
--> 727         return self.apply_standard()
    728 
    729     def agg(self):

c:\python38\lib\site-packages\pandas\core\apply.py in apply_standard(self)
    849 
    850     def apply_standard(self):
--> 851         results, res_index = self.apply_series_generator()
    852 
    853         # wrap results

c:\python38\lib\site-packages\pandas\core\apply.py in apply_series_generator(self)
    865             for i, v in enumerate(series_gen):
    866                 # ignore SettingWithCopy here in case the user mutates
--> 867                 results[i] = self.f(v)
    868                 if isinstance(results[i], ABCSeries):
    869                     # If we have a view on v, we need to make a copy because

c:\python38\lib\site-packages\pandas\core\apply.py in f(x)
    136 
    137             def f(x):
--> 138                 return func(x, *args, **kwargs)
    139 
    140         else:

TypeError: int() can't convert non-string with explicit base

Doing a print(x_df.dtypes) shows that all columns are of "object" type. I thougt that the str.split made already strings of the splitted columns?

Then, the resulting datatypes of the first 24 columns should be integer while the datatype of the rest is float - except of the last one, which is a simple checksum.

Do I have to solve that with a loop?

Thanks for reading


Solution

  • I found a semi-solution (integer values only in this step):

    With a for loop I iterate through the columns (in this case I overwrite them; if not intented I could keep the original data and create a new dataframe with manipulated = x_df.copy() )

    To avoid the error I got earlier I have to add the iteration variable to specify the current column:

    # convert hex data to int
    for column in x_df:
        x_df[column] = x_df[column].apply(int, base=16)
    print(x_df)
    

    Remark: But still this only works if there are no NaN-values somewhere in the data. NaNs have to be dropped before.