php · laravel · performance · populate · database-optimization

Is there a better approach to update or chunk-populate a new column on a table with millions of rows?


My query is taking too long to run when I use general filters, but it's fine with specific filters. I suspect it's an optimization issue.

To fix it, I'm trying to create a non-clustered index on a new column that will hold a specific value extracted from a JSON attributes column.

The table has over 2 million rows. The migration creates the new column instantly, but then hangs indefinitely while populating the data. I also plan to create a composite index on three columns, including this new one.

What is the best way to backfill a column on a large table without causing the migration to fail or hang?

<?php

use Illuminate\Database\Migrations\Migration;
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\DB;
use Illuminate\Support\Facades\Schema;

return new class extends Migration
{
    public function up(): void
    {
        Schema::table('answer_tests', function (Blueprint $table) {
            $table->bigInteger('unity_id')->after('institution_id')->nullable();
        });

        // One UPDATE per row inside chunkById: this is the step that hangs on 2M+ rows.
        DB::table('answer_tests')->orderBy('id')->chunkById(100, function ($rows) {
            foreach ($rows as $row) {
                DB::table('answer_tests')
                    ->where('id', $row->id)
                    ->update([
                        'unity_id' => DB::raw("JSON_UNQUOTE(JSON_EXTRACT(attributes, '$.class_school.unity.id'))"),
                    ]);
            }
        });

        Schema::table('answer_tests', function (Blueprint $table) {
            $table->index(['institution_id', 'unity_id', 'created_at'], 'idx_unity_institution_created_at');
        });

    }

    public function down(): void
    {
        Schema::table('answer_tests', function (Blueprint $table) {
            $table->dropIndex('idx_unity_institution_created_at');
            $table->dropColumn('unity_id');
        });
    }
};


Solution

  • Running the backfill logic inside the migration file will most likely lock your database or cause it to hang, because you are issuing millions of individual updates in a single migration.
    Try these steps instead:
    1. Keep the migration limited to adding the column, so it stays simple and fast (a sketch of this trimmed-down migration is shown after the backfill sample below).
    2. Create an artisan command that runs the backfill with raw SQL in chunks of 5k-10k rows, or whatever size suits your workload (a sketch of the command class also follows below).
    3. Create a new migration that adds the index only after the backfill command has finished (also sketched below).

    The chunked backfill logic for the artisan command looks like this:

    $batchSize = 5000;
    $lastId = 0;
    // Walk the full id range up to the current max, so gaps in the id sequence
    // (or already-backfilled batches) don't end the loop early.
    $maxId = (int) DB::table('answer_tests')->max('id');

    while ($lastId < $maxId) {
        DB::update("
            UPDATE answer_tests
            SET unity_id = JSON_UNQUOTE(JSON_EXTRACT(attributes, '$.class_school.unity.id'))
            WHERE id > ? AND id <= ? AND unity_id IS NULL
        ", [$lastId, $lastId + $batchSize]);

        $lastId += $batchSize;
    }
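
    For reference, a minimal sketch of the artisan command wrapping that loop might look like the following. The class name, the app:backfill-unity-id signature and the --chunk option are hypothetical choices, not something from the original post; rename them to fit your project.

    <?php

    namespace App\Console\Commands;

    use Illuminate\Console\Command;
    use Illuminate\Support\Facades\DB;

    class BackfillUnityId extends Command
    {
        // Hypothetical signature; the chunk size defaults to 5000 and can be overridden per run.
        protected $signature = 'app:backfill-unity-id {--chunk=5000}';

        protected $description = 'Backfill answer_tests.unity_id from the JSON attributes column in chunks';

        public function handle(): int
        {
            $batchSize = (int) $this->option('chunk');
            $lastId = 0;
            $maxId = (int) DB::table('answer_tests')->max('id');

            while ($lastId < $maxId) {
                // Raw chunked UPDATE; the unity_id IS NULL guard makes re-runs safe.
                DB::update("
                    UPDATE answer_tests
                    SET unity_id = JSON_UNQUOTE(JSON_EXTRACT(attributes, '$.class_school.unity.id'))
                    WHERE id > ? AND id <= ? AND unity_id IS NULL
                ", [$lastId, $lastId + $batchSize]);

                $lastId += $batchSize;
                $this->info("Processed ids up to {$lastId}");
            }

            return Command::SUCCESS;
        }
    }

    Run it with php artisan app:backfill-unity-id once the column exists; because of the IS NULL guard it can be interrupted and resumed without re-updating rows that are already filled.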
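
    And, assuming the same table, column and index names as in the question, the up() bodies of the two surrounding migrations can be this small (down() methods omitted):

    // Migration 1: add the column only. No data is touched, so it runs instantly.
    Schema::table('answer_tests', function (Blueprint $table) {
        $table->bigInteger('unity_id')->after('institution_id')->nullable();
    });

    // Migration 2, run only after the backfill command has finished: add the composite index.
    Schema::table('answer_tests', function (Blueprint $table) {
        $table->index(['institution_id', 'unity_id', 'created_at'], 'idx_unity_institution_created_at');
    });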