mysqlperformancegomariadbmysql-slow-query-log

Is there a way to parallelise time.Sleep but keeping the effective execution time in Go?


I'm developing a slow query logs parser package associated with a slow query log replayer in golang. For the replayer, have the following piece of code (in which I added comments for readability):

for {
        // method from my package that returns a Query object, containing headers values
        // and the query itself
        q := p.GetNext()
        if q == (query.Query{}) {
            break
        }
        db.logger.Tracef("query: %s", q.Query)

        // we send the SQL query to a chan that is read by workers.
        // workers just execute the query on the database, that's all.
        // results from the db query are handled in another piece of the code, it doesn't really
        // matter here
        queries <- q.Query

        // We need a reference time
        if firstPass {
            firstPass = false
            previousDate = q.Time
            continue
        }
        
        // Time field contains the Time: field value in the query header
        now := q.Time
        sleeping := now.Sub(previousDate)
        db.logger.Tracef("next sleeping time: %s", sleeping)
        time.Sleep(sleeping) // Here is my issue. For MariaDB the value is < 0 so no sleep is done

        // For MariaDB, when there is multiple queries in a short amount of
        // time, the Time field is not repeated, so we do not have to update
        // the previous date.
        if now != (time.Time{}) {
            previousDate = now
        }
    }

I ran into an interesting issue: In MariaDB slow query logs, if 2 (or more) queries are close to each other, there is no Time: field in the header, which reduce the number of time.Sleep(sleeping) in the previous code snippet. However, with MySQL-style slow query logs, there is always a Time: field in the query header, which means that the sleep is done for each query (even for µs sleep duration).

I noticed a huge replaying time difference between MariaDB and MySQL logs; MariaDB replaying duration was quite similar to the real time (time difference between the first and the last query of the log file), but on the other hand MySQL replaying time were quite higher that IRL. After playing with pprof I noticed that the problem came from time.Sleep, especially from runtime.Futex that is CPU time consuming.

I did some benchmarking and the duration results correlate the number of time.Sleep that is done (which is higher with MySQL than with MariaDB).

So instead of doing all the time.Sleep in a single thread, I'm looking for a different way to execute them in parallel without altering the effective time, but I can't figure a way to do this.


Solution

  • The solution I came along with is the following:

    type job struct {
        query string
        idle  time.Time
    }
    
    ...
        var reference time.Time
        start := time.Now()
        for {
            q := p.GetNext()
            if q == (query.Query{}) {
                s.Stop()
                break
            }
            db.logger.Tracef("query: %s", q.Query)
    
            r.queries++
            s.Suffix = " queries replayed: " + strconv.Itoa(r.queries)
    
            // We need a reference time
            if firstPass {
                firstPass = false
                reference = q.Time
            }
    
            var j job
            delta := q.Time.Sub(reference)
            j.idle = start.Add(delta)
            j.query = q.Query
            db.logger.Tracef("next sleeping time: %s", j.idle)
            jobs <- j
        }
    
    ...
    
    func (db database) worker(jobs chan job, errors chan error, wg *sync.WaitGroup) {
        defer wg.Done()
        for {
            j, ok := <-jobs
            if !ok {
                db.logger.Trace("channel closed, worker exiting")
                return
            }
            sleep := time.Until(j.idle)
            if sleep > 0 {
                time.Sleep(sleep)
            }
            rows, err := db.drv.Query(j.query)
            if err != nil {
                errors <- err
                db.logger.Debugf("failed to execute query:\n%s\nerror: %s", j.query, err)
            }
            if rows != nil {
                rows.Close()
            }
        }
    }
    

    Explaination:

    We save the start of the program in a variable (here start). Next, we set a reference time (in reference) which is the very first timestamp of the slow query logs file. It will never change.

    Then, at each new query, we calculate the duration between reference and the current query timestamp q.Time. Let's store it in delta.

    We add delta to start and we have a timestamp in our timeline (not in the past as in the slow query log file). We send this timestamp to the worker next to the query in a new struct we created called job.

    When the worker receives the job via the channel, he calculates the time to wait until he can do the query. If it <= 0, it does the query immediately, otherwise he waits.