sqlgoogle-bigquerydistincthyperloglog

Possible to do a pivot outside of BigQuery?


Let's say I'm looking to build the following pivot table:

// count by age

 age     male     female
 1-25    18       23         
 26-100  19       10

To do this, I can do a basic aggregation like this:

SELECT age, gender, count(*) GROUP BY age, gender

And get results such as:

gender   age    count
male     1-25   18    
male     26-100 19    
male     1-25   23    
male     26-100 10

And then I can 'build the results' outside of BigQuery/SQL (such as in javascript or even pandas) to get the pivoted result.

However, I run into trouble when the values can't be simply "added up" or calculated in some straightforward fashion. For example:

// distinct users by country

country    male       female
us         192,293    64,000
jp         1,203,203  1,000,000

Is there a way to grab a value in BigQuery so that we can do this calculation outside of BigQuery? I've posted a preceding question to this here where it seems like you cannot use the HLL_COUNT, as that's not available outside. [Or is there some other type of algorithm that can be applied to build some sort of unique sketch that can be merged outside SQL?].

Or is there another approach to building pivot results (outside of BigQuery) that I'm missing?

Update: The above examples are more to give an overview of the issue. The actual pivot queries would be like this: Get the top patent countries, codes in a BQ public dataset, on both the X and Y axis.


Solution

  • I don't understand. Why not do this inside BigQuery?

    SELECT age,
           COUNTIF( gender = 'male' ) as males,
           COUNTIF( gender = 'female' ) as females
    GROUP BY age;