node.jssql-servernode-mssql

SQL Server Connections Exhausted


I do not know how to best diagnose or where to go regarding my issue, and would appreciate any help that could be offered.

This question addresses the node-mssql Node.JS library.

Primary Issue

I am running a Node.JS application based on a Socket.IO instance. Data is being pumped through to Socket.IO (the app), and then MSSQL is taking the received data and writing it to a database.

My issue is that scaling up the program to 300 concurrent Socket.IO connections, all feeding data, results in a bottleneck with database queries. This results in a non-descriptive timeout error:

Error: operation timed out for an unknown reason
    at C:\Users\jeanm\WebstormProjects\TikTokResearchNode\node_modules\tarn\dist\PendingOperation.js:17:27
    at runNextTicks (node:internal/process/task_queues:61:5)
    at listOnTimeout (node:internal/timers:528:9)
    at processTimers (node:internal/timers:502:7)

My Queries

Here are my queries that my project is making use of: https://pastebin.com/sr5AtNhS

There is one javascript-embedded query: https://pastebin.com/w7gYeCQQ

Finally, there is a dynamic generator for queries. It's all the same type of query, but with different tables & fields:

    static #build_query(table_name, keys = [], return_id = false) {

        return `
            INSERT INTO
                "${table_name}" (${keys.join(', ')})
            ${return_id ? 'OUTPUT INSERTED.id' : ''}
            VALUES
                (${"?".repeat(keys.length).split("").join(", ")}) 
        `

My Database

The following is my database design: https://i.sstatic.net/H3ezN.jpg

What I've Tried

I read that increasing the connection pool is a valid strategy, and for me, with about 1500 queries per second if things worked the way they should, the pool having a max of 10 was obviously not enough.

This is my config:

{
        connectionString: 'Driver=ODBC Driver 17 for SQL Server;Server=localhost;Database=ResearchTest;Trusted_Connection=yes;',
        driver: 'msnodesqlv8',
        pool: {
            max: 16384,
            min: 256,
            idleTimeoutMillis: 30000
        }
    }

I just went for it, and gave it everything, to see if it was a matter of not enough connections. Well, surprise surprise, no, it's not.

What I've Debugged

Debugging what was happening showed me that the pool was steadily filling up to 5-10 "free"connections:

https://cdn.isaackogan.com/7zIjT

This would stay stable for about 30 seconds. Then, within 5 seconds of the last screenshot, it spiked:

https://cdn.isaackogan.com/y9peJ

What I believe this is telling me is that connections stopped being returned to the pool, so node-mssql tried to combat this by creating new connections, except those were also not being returned. And then, this just continued for a long time since the 'max' was 16384. Eventually, after 30 seconds, the timeout errors start being thrown.

What I Think The Cause Is

I do not have a lot of experience with MSSQL (in fact, the last few weeks are my first). In general, I am not very skilled with SQL, but my project requires it and I am not afraid of a learning curve.

My observation is that connections are not being returned to the pool, and this causes timeouts. My hypothesis is that this would happen if a table is locked and hangs, though I am not sure.

An alternative hypothesis would be that really expensive queries cause a build-up that exhausts all connections.

My Questions

Essentially, I'm curious what people with more experience think. What are the potential reasons for what I've outlined is happening, and do you have any ideas what I can do to fix it? I know it is not an issue with the library, but my code, and I am willing to provide any information if you feel that you can help.

Is this a case of a table lock gone wrong? How would I diagnose and go about fixing that? If it's possible it's something else, what, and how can I find out?

I am also unsure how to debug something like this when there are thousands of queries interacting with the DB at a given time, how would I pin down the 1 where everything goes wrong to even KNOW what has gone wrong?

Any help/a direction to go in would be greatly appreciated!


Solution

  • My project is now running with 300 concurrent users/Socket.IO connections.

    I made the following changes:

    1. Setting validateConnection: false config option
    2. Halve the number of 'requests' to the database by combining queries
    3. Reduce the max pool to 50 (knowing how connections work now, it's funny I had it so large)

    I am glad that I have 'extra' cards to pull if there is a scale issue in the future. These include (for my own self-reference):

    1. Warming up the pool
    2. Using Redis as temporary storage for batching
    3. Sharding the storage script (1 data collection script -> n # of instances of the Node.JS storage script)

    The bottleneck was absolutely sockets and not table locking. For more info, see the issue that helped me: https://github.com/tediousjs/node-mssql/issues/1473