sqlutilization

Does a SQL query with fewer attributes cost less?


My question is very simple - Does a SQL query with fewer attributes cost less?

Example: Let's say our users table have 10 columns like userId, name, phone, email, ...

SELECT name, phone FROM users WHERE userId='id'

is cheapier than this

SELECT * FROM users WHERE userId='id'

Is it true in the perspective of resource utilization?


Solution

  • It depends.

    It is certainly possible that limiting the number of columns in the projection improves performance but it depends on what indexes are available. If we assume that userId is either the primary key or at least an indexed column, you'd expect database's optimizer to determine which row(s) to fetch by doing a lookup using an index that has userId as the leading column.

    If there is an index on (user_id, phone) or if phone is an included column on the index if your database supports that concept, the database can get the phone from the index it used to find the row(s) to return. In this way, the database never has to visit the actual table to fetch the phone. An index that has all the information the database needs to process the query without visiting the table is known as a "covering index". Roughly speaking, it is probably roughly as costly to search the index for the rows to return as it is to visit the table to fetch additional columns for the projection. If you can limit the number of columns in the projection in order to use a covering index, that to may significantly reduce the cost of the query. Even more significantly if visiting the table to fetch every column involves doing multiple reads because of chained rows or out-of-line LOB columns in Oracle, TOAST-able data types in PostgreSQL, etc.

    Reducing the number of columns in the projection will also decrease the amount of data that needs to be sent over the network and the amount of memory required on the client to process that data. This tends to be most significant when you have larger fields. For example, if one of the columns in the users table happened to be an LDAP path for the user's record, that could easily be hundreds of characters in length and account for half the network bandwidth consumed and half the memory used on the middle tier. Those things probably aren't critical if you're building a relatively low traffic internal line of business application that needs to serve a few hundred users. It is probably very critical if you're building a high volume SaaS application that needs to serve millions of users.