laravellaravel-5.4lumenlumen-5.4

Laravel - Collection with relations take a lot of time


We are developing an API with LUMEN. Today we had a confused problem with getting the collection of our "TimeLog"-model. We just wanted to get all time logs with additional informationen from the board model and task model. In one row of time log we had a board_id and a task_id. It is a 1:1 relation on both.

This was our first code for getting the whole data. This took a lot of time and sometimes we got a timeout: BillingController.php

public function byYear() {

       $timeLog = TimeLog::get(); 

        $resp = array(); 

        foreach($timeLog->toArray() as $key => $value) {  

            if(($timeLog[$key]->board_id && $timeLog[$key]->task_id) > 0 ) {      

                 array_push($resp, array(
                    'board_title' => isset($timeLog[$key]->board->title) ? $timeLog[$key]->board->title : null,
                    'task_title' => isset($timeLog[$key]->task->title) ? $timeLog[$key]->task->title : null,
                    'id' => $timeLog[$key]->id
                )); 
            }
        }


        return response()->json($resp);
    }   

The TimeLog.php where the relation has been made.

public function board()
        {
            return $this->belongsTo('App\Board', 'board_id',  'id');
        }

        public function task()
        {
            return $this->belongsTo('App\Task', 'task_id',  'id');
        }

Our new way is like this: BillingController.php

 public function byYear() {



            $timeLog = TimeLog::
join('oc_boards', 'oc_boards.id', '=', 'oc_time_logs.board_id')
                            ->join('oc_tasks', 'oc_tasks.id', '=', 'oc_time_logs.task_id')
                            ->join('oc_users', 'oc_users.id', '=', 'oc_time_logs.user_id')
                            ->select('oc_boards.title AS board_title', 'oc_tasks.title AS task_title','oc_time_logs.id','oc_time_logs.time_used_sec','oc_users.id AS user_id')
                            ->getQuery()
                            ->get(); 

            return response()->json($timeLog);
        }   

We deleted the relation in TimeLog.php, cause we don't need it anymore. Now we have a load time about 1 sec, which is fine! There are about 20k entries in the time log table.

My questions are:

  1. Why is the first method out of range (what causes the timeout?)
  2. What does getQuery(); exactly do?

If you need more information just ask me.


Solution

  • --First Question--

    One of the issues you might be facing is having all those huge amount of data in memory, i.e:

    $timeLog = TimeLog::get();
    

    This is already enormous. Then when you are trying to convert the collection to array:

    1. There is a loop through the collection.
    2. Using the $timeLog->toArray() while initializing the loop based on my understanding is not efficient (I might not be entirely correct about this though)
    3. Thousands of queries are made to retrieve the related models

    So what I would propose are five methods (one which saves you from hundreds of query), and the last which is efficient in returning the result as customized:

    1. Since you have many data, then chunk the result ref: Laravel chunk so you have this instead:

      $timeLog = TimeLog::chunk(1000, function($logs){
          foreach ($logs as $log) {
          // Do the stuff here
          }
      }); 
      
    2. Other way is using cursor (runs only one query where the conditions match) the internal operation of cursor as understood is using Generators.

      foreach (TimeLog::where([['board_id','>',0],['task_id', '>', 0]])->cursor() as $timelog) {
        //do the other stuffs here
      }
      
    3. This looks like the first but instead you have already narrowed your query down to what you need:

      TimeLog::where([['board_id','>',0],['task_id', '>', 0]])->get()
      
    4. Eager Loading would already present the relationship you need on the fly but might lead to more data in memory too. So possibly the chunk method would make things more easier to manage (even though you eagerload related models)

      TimeLog::with(['board','task'],  function ($query) {
          $query->where([['board_id','>',0],['task_id', '>', 0]]);
      }])->get();
      
    5. You can simply use Transformer

      • With transformer, you can load related model, in elegant, clean and more controlled methods even if the size is huge, and one greater benefit is you can transform the result without having to worry about how to loop round it You can simply refer to this answer in order to perform a simple use of it. However incase you don't need to transform your response then you can take other options.

    Although this might not entirely solve the problem, but because the main issues you face is based on memory management, so the above methods should be useful.

    --Second question--

    Based on Laravel API here You could see that:

    enter image description here

    It simply returns the underlying query builder instance. To my observation, it is not needed based on your example.

    UPDATE

    For question 1, since it seems you want to simply return the result as response, truthfully, its more efficient to paginate this result. Laravel offers pagination The easiest of which is SimplePaginate which is good. The only thing is that it makes some few more queries on the database, but keeps a check on the last index; I guess it uses cursor as well but not sure. I guess finally this might be more ideal, having:

    return TimeLog::paginate(1000);