I have a dataset with the current stock for some products:
+--------------+-------+
| Product | Stock |
+--------------+-------+
| chocolate | 300 |
| coal | 70 |
| orange juice | 400 |
+--------------+-------+
and the sales for every product over the years for the current month and the next month in another dataset:
+--------------+------+-------+-------+
| Product | Year | Month | Sales |
+--------------+------+-------+-------+
| chocolate | 2017 | 05 | 55 |
| chocolate | 2017 | 04 | 250 |
| chocolate | 2016 | 05 | 70 |
| chocolate | 2016 | 04 | 200 |
| | | | | | | | |
| coal | 2017 | 05 | 40 |
| coal | 2017 | 04 | 30 |
| coal | 2016 | 05 | 50 |
| coal | 2016 | 04 | 20 |
| | | | | | | | |
| orange juice | 2017 | 05 | 400 |
| orange juice | 2017 | 04 | 350 |
| orange juice | 2016 | 05 | 400 |
| orange juice | 2016 | 04 | 300 |
+--------------+--------------+-------+
I want to compute the stock that I will need to order for the next month, by computing the expected sales over the current month and the next month, using the following formula:
ExpectedSales = max(salesMaxCurrentMonth) + max(salesMaxNextMonth)
The orders will then be
Orders = ExpectedSales * (1 + margin) - Stock
Where margin is, for example, 10%.
I tried to group by several columns using GroupBy
, as in the following, but it seems to aggregate by Stock
instead of Product
:
salesDataset
.groupBy(Columns.col("Month"), Columns.col(“Product”))
.agg(Columns.max(“Sales”).as(“SalesMaxPerMonth”))
.agg(Columns.sum(“SalesMaxPerMonth”).as(SalesPeriod))
.withColumn(
“SalesExpected”,
Columns.col(“SalesPeriod”).multiply(Columns.literal(1 + margin)))
.withColumn(
“Orders”,
Columns.col(“SalesExpected”).minus(Columns.col(“Stock”)))
.withColumn(
“Orders”,
Columns.col(“Orders”).map((Double a) -> a >= 0 ? a: 0))
.doNotAggregateAbove()
.toCellSet()
.show();
You got the logic correct in terms of aggregation but there is another way to build your CellSet
, where you provide a map to describe the location of the query which generates it.
salesDataset
.groupBy(Columns.col("Month"), Columns.col(“Product”))
.agg(Columns.max(“Sales”).as(“SalesMaxPerMonth”))
.agg(Columns.sum(“SalesMaxPerMonth”).as(SalesPeriod))
.withColumn(
“SalesExpected”,
Columns.col(“SalesPeriod”).multiply(Columns.literal(1 + margin)))
.withColumn(“Orders”, Columns.col(“SalesExpected”).minus(Columns.col(“Stock”)))
.withColumn(“Orders”, Columns.col(“Orders”).map((Double a) -> a >= 0 ? a: 0))
.doNotAggregateAbove()
.toCellSet(
Empty.<String, Object>map()
.put(“Product”,null)
.put(“Stock”, null))
.show();
Where null
in a location represents the wildcard *
.