I wrote a script that imports data from an Excel file into the database, but it runs extremely slowly. It worked fine with the sample Excel file that had 20 rows, but I will need to import files with 50k-100k+ rows. Is there a way to make my script run a lot faster?
This is my first attempt at writing an import script, so any advice would be appreciated.
<?php
namespace App\Services\Import;
use App\Models\Tags;
use App\Models\Videos;
use GuzzleHttp\Client;
use App\Models\Stars;
use App\Models\Categories;
use Illuminate\Support\Str;
use Illuminate\Support\Facades\DB;
use Illuminate\Support\Facades\Log;
use Maatwebsite\Excel\Facades\Excel;
use Illuminate\Support\Facades\Storage;
use Spatie\MediaLibrary\MediaCollections\Models\Media;
class DataImport
{
    public function import()
    {
        $path = Storage::disk('public')->path('database.xlsx');
        $data = Excel::toArray([], $path);

        for ($i = 1; $i < count($data[0]); $i++) {
            $row = $data[0][$i];

            $video = Videos::where('url', $row[0])->first();
            if (!$video) {
                // Use a separate counter for the slug suffix; reusing $i here
                // clobbers the row counter and restarts the outer loop.
                $slug = Str::slug($row[1], '-');
                $suffix = 1;
                while (Videos::where('slug', $slug)->exists()) {
                    $slug = Str::slug($row[1], '-') . '-' . $suffix;
                    $suffix++;
                }

                $video = new Videos();
                $video->url = $row[0];
                $video->title = $row[1];
                $video->slug = $slug;
                $video->length = intval($row[2]);
                $video->code = intval($row[7]);
                $video->embed_url = $row[4];
                $video->channel = $row[9];
                $video->save();
            }

            $title = str_replace('_', ' ', $row[8]);
            $slug = strtolower(str_replace(' ', '-', $title));
            $title = ucfirst(strtolower($title));
            $category = Categories::firstOrCreate(['title' => $title], ['slug' => $slug]);
            $video->categories()->syncWithoutDetaching([$category->id]);

            $names = explode(',', $row[6]);
            foreach ($names as $name) {
                $name = trim($name);
                $slug = strtolower(str_replace(' ', '-', $name));
                $star = Stars::firstOrCreate(['name' => $name], ['slug' => $slug]);
                $video->stars()->syncWithoutDetaching([$star->id]);
            }

            // Reset per row; otherwise tag ids accumulate across all rows.
            $tagIds = [];
            $tags = explode(',', $row[5]);
            foreach ($tags as $tagName) {
                $tagName = str_replace('-', ' ', $tagName);
                $tag = Tags::firstOrCreate(['name' => $tagName], ['slug' => Str::slug($tagName)]);
                $tagIds[] = $tag->id;
            }
            $video->tags()->syncWithoutDetaching($tagIds);

            $media = Media::where('file_name', $row[3])->first();
            if (!$media) {
                $media = new Media();
                $media->file_name = basename($row[3]);
                $media->model_type = 'App\Models\Videos';
                $media->model_id = $video->id;
                $media->collection_name = 'video_index_images';
                $media->name = pathinfo($row[3], PATHINFO_FILENAME);
                $media->disk = 'public';
                $media->conversions_disk = 'public';
                $media->manipulations = [];
                $media->custom_properties = [];
                $media->generated_conversions = ['video_index_images' => true];
                $media->responsive_images = [];
                $media->mime_type = 'image/jpeg';
                try {
                    $client = new Client();
                    $response = $client->get($row[3], ['http_errors' => false]);
                    $image = (string) $response->getBody();
                    $media->size = strlen($image);
                    $media->uuid = (string) Str::uuid();
                    $media->save();
                    Storage::disk('public')->put($media->id . '/' . $media->file_name, $image);
                } catch (\GuzzleHttp\Exception\ConnectException $e) {
                    Log::error('Could not connect to ' . $row[3] . ': ' . $e->getMessage());
                    continue;
                }
            }

            $video->media()->touch();
        }
    }
}
You can speed up your import, at least somewhat, by reducing the number of queries the script makes: as written, a full import will issue at least 100k queries against the database.
For example, you can build a video map with the URL as the key and the model as the value before you start looping, like this:
$videoMap = [];
$videosCollection = Videos::all();
foreach ($videosCollection as $video) {
    $videoMap[$video->url] = $video;
}
Inside the loop you then replace the $video = Videos::where('url', $row[0])->first(); query with an in-memory lookup, which gives you something like this:
$video = $videoMap[$row[0]] ?? null;
After this you also need to update the map whenever you save a new video to the database, so that later rows with the same URL find it.
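The slug uniqueness check is another query you pay on every new video: the while loop hits the database once per candidate slug. The same preload-and-look-up idea works there too: load the existing slugs once (for example with Videos::pluck('slug')) into a set and keep the set updated as you go. A minimal sketch in plain PHP, with no framework so it is easy to test; the helper name is hypothetical:

```php
<?php
// In-memory slug deduplication. $existingSlugs is a set keyed by slug,
// preloaded once before the import loop starts (assumption: all slugs
// fit comfortably in memory, which holds for a few hundred thousand rows).
function uniqueSlug(array &$existingSlugs, string $base): string
{
    $slug = $base;
    $suffix = 1;
    while (isset($existingSlugs[$slug])) {
        $slug = $base . '-' . $suffix;
        $suffix++;
    }
    // Record the chosen slug so later rows in the same run see it too.
    $existingSlugs[$slug] = true;
    return $slug;
}
```

This turns N existence queries per new video into zero, at the cost of one pluck() query up front.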
I hope I've helped you.
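The same idea also covers the Categories, Stars and Tags lookups, which currently cost at least three firstOrCreate queries per row. You can preload each table into a map (for example Tags::all()->keyBy('name') in Laravel) and only touch the database for genuinely new entries. A plain-PHP sketch of an in-memory firstOrCreate; the helper name and the callable-based shape are my own, not from the original script:

```php
<?php
// In-memory firstOrCreate: return the cached value for $key, or call
// $create exactly once for a key that is not in the map yet and cache
// the result. $create is where the single INSERT would happen.
function firstOrCreateInMap(array &$map, string $key, callable $create)
{
    if (!isset($map[$key])) {
        $map[$key] = $create($key);
    }
    return $map[$key];
}
```

With this in place, a row with five already-known tags costs zero tag queries instead of five SELECTs.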