pythonpython-3.xcoordinatespython-polarspolars

Detect coordinate precision in polars floats?


I have some coordinate data; some of it high precision, some of it low precision thanks to multiple data sources and other operational realities. I want to have a column that indicates the relative precision of the coordinates. So far, what I want is to essentially count digits after the decimal; in my case more digits indicates higher precision data. In my case I usually get data like the data in the example; its either coming with five to six digits precision or just one digit. Both are useful; but we can do more analysis on higher precision data as you may imagine.

This code does what I want, but it seems .... wordy, inelegant; as if I'm being paid by the line of code. Is there a simpler way to do this?

import polars as pl

df = pl.DataFrame(
    {
            "lat":  [ 43.6425047,  43.6,  40.688966,  40.6],
            "lng":  [-79.3861057, -79.3, -74.044438, -74.0],
    }
)

df = (df.with_columns(
    pl.col("lat").cast(pl.String)
    .str.split_exact(".", 1)
    .struct.rename_fields(["lat_major", "lat_minor"])
    .alias("lat_temp"))
 .unnest("lat_temp")
 .with_columns(
    pl.col("lat_minor")
    .str.len_chars()
    .alias("lat_precision"))
 .drop("lat_major", "lat_minor")
 .with_columns(
    pl.col("lng").cast(pl.String)
    .str.split_exact(".", 1)
    .struct.rename_fields(["lng_major", "lng_minor"])
    .alias("lng_temp"))
    .unnest("lng_temp")
 .with_columns(
    pl.col("lng_minor")
    .str.len_chars()
    .alias("lng_precision"))
 .drop("lng_major", "lng_minor")
 .with_columns(
    pl.col("lat_precision")
    .add(pl.col("lng_precision"))
    .alias("precision"))
 .drop("lat_precision", "lng_precision")
 )
df.head()

results in

shape: (4, 3)
┌───────────┬────────────┬───────────┐
│ lat       ┆ lng        ┆ precision │
│ ---       ┆ ---        ┆ ---       │
│ f64       ┆ f64        ┆ u32       │
╞═══════════╪════════════╪═══════════╡
│ 43.642505 ┆ -79.386106 ┆ 14        │
│ 43.6      ┆ -79.3      ┆ 2         │
│ 40.688966 ┆ -74.044438 ┆ 12        │
│ 40.6      ┆ -74.0      ┆ 2         │
└───────────┴────────────┴───────────┘

later I might pull out records with precision over 5, for instance, as my source data tends to be either one decimal point precision or four+ decimal points precision per coordinate.


Solution

  • You can extract the minor fields directly without the need for temp columns and unnesting.

    df.with_columns(
        pl.col("lat", "lng").cast(pl.String)
          .str.split_exact(".", 1)
          .struct.field("field_1")
          .str.len_chars()
          .name.suffix("_minor")
    )
    
    shape: (4, 4)
    ┌───────────┬────────────┬───────────┬───────────┐
    │ lat       ┆ lng        ┆ lat_minor ┆ lng_minor │
    │ ---       ┆ ---        ┆ ---       ┆ ---       │
    │ f64       ┆ f64        ┆ u32       ┆ u32       │
    ╞═══════════╪════════════╪═══════════╪═══════════╡
    │ 43.642505 ┆ -79.386106 ┆ 7         ┆ 7         │
    │ 43.6      ┆ -79.3      ┆ 1         ┆ 1         │
    │ 40.688966 ┆ -74.044438 ┆ 6         ┆ 6         │
    │ 40.6      ┆ -74.0      ┆ 1         ┆ 1         │
    └───────────┴────────────┴───────────┴───────────┘
    

    We're using a single pl.col("lat", "lng") call here which will go through an "expansion" step, i.e.

    pl.col("lat", "lng").foo().bar()
    

    is expanded into individual expressions.

    pl.col("lat").foo().bar(),
    pl.col("lng").foo().bar()
    

    pl.sum_horizontal() can be used if you just want the totals.

    df.with_columns(
        pl.sum_horizontal(
            pl.col("lat", "lng").cast(pl.String)
              .str.split_exact(".", 1)
              .struct.field("field_1")
              .str.len_chars()
        )
        .alias("precision")
    )
    
    shape: (4, 3)
    ┌───────────┬────────────┬───────────┐
    │ lat       ┆ lng        ┆ precision │
    │ ---       ┆ ---        ┆ ---       │
    │ f64       ┆ f64        ┆ u32       │
    ╞═══════════╪════════════╪═══════════╡
    │ 43.642505 ┆ -79.386106 ┆ 14        │
    │ 43.6      ┆ -79.3      ┆ 2         │
    │ 40.688966 ┆ -74.044438 ┆ 12        │
    │ 40.6      ┆ -74.0      ┆ 2         │
    └───────────┴────────────┴───────────┘