I'm trying to use python's olap framework cubes on a very simple database, but I am having some trouble joining tables.
My schema looks like this:
Users table
ID | name
Products table
ID | name | price
Purchases table
ID | user_id | product_id | date
And the cubes model:
{
'dimensions': [
{'name': 'user_id'},
{'name': 'product_id'},
{'name': 'date'},
],
'cubes': [
{
'name': 'purchases',
'dimensions': ['user_id', 'product_id', 'date'],
'measures': ['price']
'mappings': {
'purchases.user_id': 'users.id',
'purchases.product_id': 'products.id',
'purchases.price': 'products.price'
},
'joins': [
{
'master': 'purchases.user_id',
'detail': 'users.id'
},
{
'master': 'purchases.product_id',
'detail': 'products.id'
}
]
}
]
}
Now I would like to display all the purchases, showing the product's name, user's name and purchase date. I can't seem to find a way to do this. The documentation is a bit scarce.
Thank you
First let's fix the model a bit. In your schema you have more attributes per dimension: id and name, you might end up having more details in the future. You can add them by specifying attributes as a list: "attriubtes": ["id", "name"]
. Note also that the dimension is named as entity product
not as a key id_product
. The key id_product
is just an attribute of the product
dimension, as is name
or in the future maybe category
. Dimension reflects analysts point of view.
For the time being, we ignore the fact that date should be a special dimension and consider date as single-value key, for example a year, not to make things complicated here.
"dimensions": [
{"name": "user", "attributes": ["id", "name"]},
{"name": "product", "attributes": ["id", "name"]},
{"name": "date"}
],
Because we changed names of the dimensions, we have to change them in the cube's dimension list:
"cubes": [
{
"name": "purchases",
"dimensions": ["user", "product", "date"],
...
Your schema reflects classic transactional schema, not traditional data warehouse schema. In this case, you have to be explicit, as you were, and mention all necessary mappings. The rule is: if the attribute belongs to a fact table (logical view), then the key is just attribute
, such as price
, no table specification. If the attribute belongs to a dimension, such as product.id
, then the syntax is dimension.attribute
. The value of the mappings dictionary is physical table and physical column. See more information about mappings. Mappings for your schema look like:
"mappings": {
"price": "products.price",
"product.id": "products.id",
"product.name": "products.name",
"user.id": "users.id",
"user.name": "users.name"
}
You would not have to write mappings if your schema was:
fact purchases
id | date | user_id | product_id | amount
dimension product
id | name | price
dimension user
id | name
In this case you will need only joins, because all dimension attributes are in their respective dimension tables. Note the amount
in the fact table, which in your case, as you do not have count
of purchased products per purchase, would be the same as price
in product
.
Here is the updated model for your model:
{
"dimensions": [
{"name": "user", "attributes": ["id", "name"]},
{"name": "product", "attributes": ["id", "name"]},
{"name": "date"}
],
"cubes": [
{
"name": "purchases",
"dimensions": ["user", "product", "date"],
"measures": ["price"],
"mappings": {
"price": "products.price",
"product.id": "products.id",
"product.name": "products.name",
"user.id": "users.id",
"user.name": "users.name"
},
"joins": [
{
"master": "purchases.user_id",
"detail": "users.id"
},
{
"master": "purchases.product_id",
"detail": "products.id"
}
]
}
]
}
You can try the model without writing any Python code, just by using the slicer
command. For that you will need slicer.ini
configuration file:
[server]
backend: sql
port: 5000
log_level: info
prettyprint: yes
[workspace]
url: sqlite:///data.sqlite
[model]
path: model.json
Change url
in [workspace]
to point to your database and change path
in [model]
to point to your model file. Now you can try:
curl "http://localhost:5000/aggregate"
Also try to drill-down:
curl "http://localhost:5000/aggregate?drilldown=product"
If you need any further help, just let me know, I'm the Cubes author.