A shorter post this week, but an important one.
Last week, Erik Darling commented on my post saying that we shouldn’t use
SELECT *, which was both amusing and accurate. Amusing, because a number of the example T-SQL queries in that post made use of this construct.
Why was Erik’s comment accurate? A lot of SQL Server and other RDBMS (relational database management systems) folks advise never to use
SELECT *, and it has become a running gag in my talks when I use
SELECT * in a demo and then tell my attendees not to use it.
Let’s go through the main reasons why it’s not a good idea to use
SELECT *, specifically in a production environment.
Querying the system tables
When we write
SELECT * FROM table, the database engine has to go into the system tables to read the column metadata in order to materialize the results. This has a small but measurable performance impact when reading the system tables. If lots of queries use
SELECT *, this can cause noticeable locking on the system tables.
SELECT * returns the columns in the order they were created. This might result in a surprise if a particular order is assumed from output in the past, but the columns have been created in a different order during application upgrades and modifications, which can be reasonably common. Imagine a scenario where one or more columns is appended to the end to avoid rebuilding the whole table, however in a clean install of the application those columns might be in a different order. A
SELECT * query will therefore return the columns in a different order depending on how that table was created and / or modified.
Do we really need all the columns, all the time? By limiting the columns returned, we can make better use of indexes that consume less space in memory when our queries execute. This is by far the best reason to limit the columns in a
SELECT statement. Less memory means fewer storage reads, fewer CPU cycles, and faster queries. Since most databases are accessed over a network, this is another major performance bottleneck we can avoid.
When should we?
As with all best practices, there are exceptions to the rule which is why you’ll often hear a consultant say “it depends.”
If our application is a database design tool for example (like phpMyAdmin for MySQL and MariaDB), we should probably bring back all the columns all the time, and make use of row limits and caching to ensure that the application brings back only what it needs.
Another (common) exception is in a development and testing environment, or if we need to troubleshoot a problem in production. There are times when it does make sense to use
SELECT *. These decisions should be based on the best available information available to us, and only in the appropriate circumstances.
Share your favourite
SELECT * story with me in the comments below.
Never thoughts about performance issues ! This will speed up some features I’m working on
I’m commenting a month late, but when devs use ‘SELECT *’ it eliminates any possibility of creating a covering index.