I have a table with a 19-digit numerical ID column, formatted as text, and another text column, fizz.
id fizz
1003674024206094336 foobarpingpong
When I run a Python script in PowerQuery which manipulates the data in fizz, I've noticed that the ID column gets auto-converted to float even when I don't reference that column in my script.
When I run the script it looks like this:
1.00367E+18
I noticed that after running the script, PowerQuery automatically inserts a step that converts the ID column to Int64 - but even when I delete that step and prevent that happening, the Python script itself still converts the column to this:
1.0036740242060943e+18
How can I prevent the Python script from doing anything to that column?
This is my code - you can see it doesn't touch the ID column:
import re
import pandas as pd
# find any text that starts with https until it hits a whitespace char or line end
dataset['urls'] = dataset['fizz'].str.findall(r'https\S*')
# explode the returned list so each item has its own row
# also drop any rows that have no values
dataset = (dataset.explode('urls').dropna(axis=0,subset=['urls']))
I've tried commenting out the explode
line but that's not the reason.
The solution to this was that the ID column, despite being formatted as text in the data model, was not formatted as text properly.
When I tried to make it the key column of the table, PowerBI told me that it had nulls, and therefore it couldn't be used.
I removed the null, and hey presto - the Python script worked as it should.