MariaDB / MySQL Performance Tuning and Optimization: Sub-SELECTs are Evil

While most experienced DBAs already know this, it is seldom that an example this clear comes up to illustrate the point. That alone makes this story worth sharing, as a lesson to never use a sub-SELECT if you can achieve the same thing with a JOIN. Not heeding that warning is likely to line you up for pain further down the line, long after you forgot that you wrote a particular query with a sub-SELECT.

So with that intro out of the way, let’s dig into the actual story. Recently we were engaged to provide MySQL support. The client was experiencing performance issues with CiviCRM. Virtually every operation on a particular subsystem was taking seconds. Our suspicion turned to a query running in a trigger that was supporting CiviCRM’s auditing functionality. It was an UPDATE query, but rewriting it as a SELECT to test the query plan was showing that the query was executing in a few milliseconds. So surely it couldn’t have been a problem in that UPDATE query. After all, in SELECT form, it was running in milliseconds rather than seconds. Or could it?

Well, it turns out, it could! This form of the query was taking ~ 4 seconds:

UPDATE civicrm_case
SET    modified_date = CURRENT_TIMESTAMP
WHERE  id IN (SELECT ca.case_id FROM civicrm_case_activity ca WHERE ca.activity_id = OLD.id);

But rephrasing the exact same query using a JOIN rather than a sub-SELECT, resulted in the query taking about 4ms:

UPDATE civicrm_case,
       civicrm_case_activity
SET    civicrm_case.modified_date = CURRENT_TIMESTAMP
WHERE  civicrm_case.id = civicrm_case_activity.case_id AND
       civicrm_case_activity.activity_id = OLD.id;

We escalated this database performance optimization fix for triage with the CiviCRM development team and upstreaming to CiviCRM. This anomaly seems to manifest both on MySQL and MariaDB, but if you stick to best practices for MySQL performance you will be a lot less likely to run into performance anomalies like this one.