An often overlooked cause of poor performance in databases is when the data types don’t match between the column and the value it is being compared against in an SQL query. In some cases, this can be obvious (comparing an integer value against a varchar column). In other cases, it can be much more subtle. In this article we will cover some examples that illustrate why matching data types are important for performance, and the subtle ways in which a mismatch can slip under the radar.
Here are some examples of the kind of problems our MySQL support consultants regularly run into on various servers. Consider the following table:
CREATE TABLE test ( PK int auto_increment, FK varchar(250), data1 varchar(100), data2 varchar(100), data3 varchar(100), PRIMARY KEY(PK), KEY idx_FK (FK) );
Consider the following query:
SELECT * FROM test WHERE FK >= 0;
Can you think of why this query might not use the index idx_FK? If you are inferring it from the title of the article, you are right – the type doesn’t match! You are comparing a varchar to an integer type. If you are using a varchar for a foreign key, as the name implies, you should probably be rethinking your entity relationships, but this sort of a problem can be quite well hidden yet put a huge amount of extra strain on the server. A query like this on a relatively small table can run in 150ms and not raise any red flags. But if it is called hundreds of thousands of times per day, the overhead compared to running indexed can amount go many extra CPU-hours consumed on the server. This can make a very substantial cost difference if you are trying to reduce your AWS RDS costs!
Now let’s consider another query against the same table:
SELECT * FROM test WHERE PK = 3456789012;
You run an explain on it, and to your horror, you discover that MySQL has decided to execute a full table scan! Why would it possibly do that? After all, you are matching an integer against an integer value, and it is even the table’s primary key! The type is definitely the same, right? Right?
Well done if you have spotted it. The PK is defined as a signed integer, which has a range -2147483648 to 2147483647. So the value we are comparing against doesn’t fit. It will try to cast the value we put in the query, fail do to an overflow, and decide to do a full table scan as a last resort attempt!
This kind of thing can be rather difficult to spot, but it can have devastating consequences if you really have anything approaching that many rows in a table.