pythonpandaspowerbi

How to stop Python script in PowerBI auto-converting columns to float


I have a table with a 19-digit numerical ID column, formatted as text, and another text column, fizz.

id   fizz
1003674024206094336   foobarpingpong

When I run a Python script in PowerQuery which manipulates the data in fizz, I've noticed that the ID column gets auto-converted to float even when I don't reference that column in my script.

When I run the script it looks like this:

1.00367E+18

I noticed that after running the script, PowerQuery automatically inserts a step that converts the ID column to Int64 - but even when I delete that step and prevent that happening, the Python script itself still converts the column to this:

1.0036740242060943e+18

How can I prevent the Python script from doing anything to that column?

This is my code - you can see it doesn't touch the ID column:

import re
import pandas as pd

# find any text that starts with https until it hits a whitespace char or line end
dataset['urls'] = dataset['fizz'].str.findall(r'https\S*')

# explode the returned list so each item has its own row
# also drop any rows that have no values
dataset = (dataset.explode('urls').dropna(axis=0,subset=['urls']))

I've tried commenting out the explode line but that's not the reason.


Solution

  • The solution to this was that the ID column, despite being formatted as text in the data model, was not formatted as text properly.

    When I tried to make it the key column of the table, PowerBI told me that it had nulls, and therefore it couldn't be used.

    I removed the null, and hey presto - the Python script worked as it should.