[SOLVED] What is database pooling?

What is database pooling?

I just wanted to know the concept of database connection pooling and how it is achieved.

Solution

Database connection pooling is a method used to keep database connections open so they can be reused by others.

Typically, opening a database connection is an expensive operation, especially if the database is remote. You have to open up network sessions, authenticate, have authorisation checked, and so on. Pooling keeps the connections active so that, when a connection is later requested, one of the active ones is used in preference to having to create another one.

Refer to the following diagram for the next few paragraphs:

  +---------+
  |         |
  | Clients |
+---------+ |
|         |-+  (1) +------+ (3) +----------+
| Clients | ==#==> | Open | ==> | RealOpen |
|         |   |    +------+     +----------+
+---------+   |       ^
              |       |(2)
              |    /------\     +---------+ (6) +-----------+
              |    | Pool | --> | Cleaner | ==> | RealClose |
              |    \------/     +---------+     +-----------+
          (4) |       ^
              |       |(5)
              |    +-------+
              #==> | Close |
                   +-------+

In it's simplest form, it's just an API call (1) to an Open API call which is similar to the "real" one, RealOpen. This first checks the pool for a suitable connection (2) and, if one is available, that's given to the client. Otherwise a new one is created (3) and given to the client.

A "suitable connection" is just one that already has access to the database using the correct information (such as database instance, credentials, and possibly other things).

Similarly, there's a Close API call (4) which doesn't actually call the real RealClose, rather it puts the connection back into the pool (5) for later use. At some point, connections in the pool may be actually closed (6). This could be done by a thread that continuously monitors the pool and calls RealClose if they are old enough or certain other conditions are met.

That's a pretty simplistic explanation. Real implementations may be arbitrarily more complex such as the previously mentioned:

handling connections to multiple servers and using multiple user accounts;
using arbitrary rules to decide when connections should be really shut down, things like its age, how many similar connections there are, and so on.

Database connection pooling is a specific case for a more general one, that of maintaining cached things where they may be expensive to start. As you see from a similar answer of mine, it can apply to thread pools as well (or pools of backup tapes, communication devices, laser cutters, or dozens of other different things).

In all cases, it's best if it's "expensive" to bring another resource online rather than use one that had previously been online.

That linked answer also shows one possible algorithm for deciding when to start up or (fully) shut down a resource.