My task is:
"To take transactions table, grouped row by transaction date and calculate statuses. This manipulations will be formed statistics, which will be rendered on the page".
This is my method of this statistics generation:
public static function getStatistics(Website $website = null)
{
if($website == null) return [];
$query = \DB::table('transactions')->where("website_id", $website->id)->orderBy("dt", "desc")->get();
$transitions = collect(static::convertDate($query))->groupBy("dt");
$statistics = collect();
dd($transitions);
foreach ($transitions as $date => $trans) {
$subscriptions = $trans->where("status", 'subscribe')->count();
$unsubscriptions = $trans->where("status", 'unsubscribe')->count();
$prolongations = $trans->where("status", 'rebilling')->count();
$redirections = $trans->where("status", 'redirect_to_lp')->count();
$conversion = $redirections == 0 ? 0 : ((float) ($subscriptions / $redirections));
$earnings = $trans->sum("pay");
$statistics->push((object)[
"date" => $date,
"subscriptions" => $subscriptions,
'unsubscriptions' => $unsubscriptions,
'prolongations' => $prolongations,
'redirections' => $redirections,
'conversion' => round($conversion, 2),
'earnings' => $earnings,
]);
}
return $statistics;
}
if count of transaction rows below 100,000 - it's all right. But, if count is above 150-200k - nginx throws a 502 bad gateway. What can you advise to me? I'm don't have any experience in bigdata handling. Maybe, my implementation has a fundamental error?
After several days of researching information on this question, I found the right answer:
NOT to use PHP for handling raw data. It's better to use SQL!
In my case, we are using PostgreSQL.
Below, i'll write sql-query which worked for me, maybe it will help someone else.
WITH
cte_range(dt) AS
(
SELECT
generate_series('2016-04-01 00:00:00'::timestamp with time zone, '{$date} 00:00:00'::timestamp with time zone, INTERVAL '1 day')
),
cte_data AS
(
SELECT
date_trunc('day', dt) AS dt,
COUNT(*) FILTER (WHERE status = 'subscribe') AS count_subscribes,
COUNT(*) FILTER (WHERE status = 'unsubscribe') AS count_unsubscribes,
COUNT(*) FILTER (WHERE status = 'rebilling') AS count_rebillings,
COUNT(*) FILTER (WHERE status = 'redirect_to_lp') AS count_redirects_to_lp,
SUM(pay) AS earnings,
CASE
WHEN COUNT(*) FILTER (WHERE status = 'redirect_to_lp') > 0 THEN 100.0 * COUNT(*) FILTER (WHERE status = 'subscribe')::float / COUNT(*) FILTER (WHERE status = 'redirect_to_lp')::float
ELSE 0
END
AS conversion_percent
FROM
transactions
WHERE
website_id = {$website->id}
GROUP BY
date_trunc('day', dt)
)
SELECT
to_char(cte_range.dt, 'YYYY-MM-DD') AS day,
COALESCE(cte_data.count_subscribes, 0) AS count_subscribe,
COALESCE(cte_data.count_unsubscribes, 0) AS count_unsubscribes,
COALESCE(cte_data.count_rebillings, 0) AS count_rebillings,
COALESCE(cte_data.count_redirects_to_lp, 0) AS count_redirects_to_lp,
COALESCE(cte_data.conversion_percent, 0) AS conversion_percent,
COALESCE(cte_data.earnings, 0) AS earnings
FROM
cte_range
LEFT JOIN
cte_data
ON cte_data.dt = cte_range.dt
ORDER BY
cte_range.dt DESC