I am trying to query a table in BigQuery with python with the sample code given in the documentation:
query_job = bq_client.run_async_query(str(uuid.uuid4()), sql_query)
query_job.use_legacy_sql = False
query_job.begin()
query_job.result()
destination_table = query_job.destination
destination_table.reload()
query_result = destination_table.fetch_data()
for row in query_result:
print row
I get results looking like:
('?S\x9e\xbe\x9b\xadB\xd8\x92**5\xcck\xee]', 15, 28, 28, datetime.date(2017, 9, 5))
('%SwP\xe6xMK\x99T\xa1\x7f\xbbk>\xff', 15, 45, 19, datetime.date(2017, 9, 21))
('\xd7\x99>\x05(\x94M\xd8\x92\x0e\xe77\x8b\xcc\x08\xf0', 15, 18, 18, datetime.date(2017, 9, 4))
('\x0f+\xe7\xe0\xba5Ei\xa3\xb0\xfd\xd8\x1a\xa2wy', 15, 16, 16, datetime.date(2017, 9, 21))
('\xa3C\x0b\xdb\xebqM\xa1\xb3f\r\x8f#\x85\x93<', 15, 20, 16, datetime.date(2017, 9, 21))
(':\x1cZ?\x13\xf1A\xf5\x8e\xba\xfeYL.|v', 15, 15, 15, datetime.date(2017, 9, 6))
('\x1bP\x88\xd7\x1a\xfbGC\xb5$\x10\x97gx<\xb2', 15, 15, 15, datetime.date(2017, 9, 19))
('(B\xbc\xb7\xe9\xc0D\x89\xb0\x82jfW;,\x1e', 15, 18, 14, datetime.date(2017, 9, 19))
('\xd8\xbaw\x88\x89<Oh\x81]v\xa8!-\x83\x7f', 15, 17, 13, datetime.date(2017, 9, 6))
("\x94\x1f'\xf1\xd1$C\x9b\xb4o\x81H\x17\xf4\xa5S", 10, 14, 12, datetime.date(2017, 9, 5))
('\x949\x17\xbf\x90\xd7L\x04\x98\xe9+5\x9d\x1a\xb4\xe4', 15, 12, 12, datetime.date(2017, 9, 21))
The first field is a byte type that's supposed to be a b64 uuid. The actual result in the table, the UI, with the java wrapper, and in command line (with the same query) is:
P1OevputQtiSKio1zGvuXQ== 15 28 28 2017-09-05
JVN3UOZ4TUuZVKF/u2s+/w== 15 45 19 2017-09-21
15k+BSiUTdiSDuc3i8wI8A== 15 18 18 2017-09-04
o0ML2+txTaGzZg2PI4WTPA== 15 20 16 2017-09-21
Dyvn4Lo1RWmjsP3YGqJ3eQ== 15 16 16 2017-09-21
G1CI1xr7R0O1JBCXZ3g8sg== 15 15 15 2017-09-19
OhxaPxPxQfWOuv5ZTC58dg== 15 15 15 2017-09-06
KEK8t+nARImwgmpmVzssHg== 15 18 14 2017-09-19
2Lp3iIk8T2iBXXaoIS2Dfw== 15 17 13 2017-09-06
lB8n8dEkQ5u0b4FIF/SlUw== 10 14 12 2017-09-05
FJZXBYCAR8mQQwEeuuKKhQ== 15 12 12 2017-09-19
APVxsNTHSrCSU0z6QWdXSw== 15 12 12 2017-09-05
lDkXv5DXTASY6Ss1nRq05A== 15 12 12 2017-09-21
I can't seem to have the full byte array in the python answer, whereas I get it with every other wrapper. Here's the query:
SELECT *
FROM (SELECT c.rich_log.user_id,
w.topic_id, count(*) as event_count,
count(CASE WHEN c.header.timestamp > (UNIX_MILLIS(CURRENT_TIMESTAMP()) - 2506000000)
then 1 else null end) as event_count_in_lookback_window,
date(TIMESTAMP_MILLIS(max(c.header.timestamp))) as last_event
FROM `table1_*` as c
join `table2` as w on w.id = c.rich_log.website_id
where w.topic_id in {}
and c.header.timestamp > UNIX_MILLIS(CURRENT_TIMESTAMP()) - 3456000000
group by c.rich_log.user_id, w.topic_id)
WHERE event_count_in_lookback_window > 0
ORDER BY event_count_in_lookback_window DESC
LIMIT 100
Does anyone see why this is happening? my byte field seems to be truncated.
Thanks,
Anton
Nothing is truncated here. Instead, python is showing the raw binary data, while the other interfaces are base64 encoding the data for you. To show the result is equivalent in the two formats:
$ printf '?S\x9e\xbe\x9b\xadB\xd8\x92**5\xcck\xee]' | base64
P1OevputQtiSKio1zGvuXQ==
This is happening as your UUID is being stored in a BYTES field. This is the most efficient and correct way to store such data, but does mean different systems will by default convert to strings in different way.
In python, there is a base64 module that will covert to base64 string if you want.
Python also has a uuid module that you can use to change the bytes directly into a UUID object:
>>> import uuid
>>> print uuid.UUID(bytes=b'?S\x9e\xbe\x9b\xadB\xd8\x92**5\xcck\xee]')
3f539ebe-9bad-42d8-922a-2a35cc6bee5d