MariaDB / MySQL Performance Tuning and Optimization: How to Delete Faster on MySQL and MariaDB

So, you want to delete a large amount of data from your database?……

Good for you!

Not many people appreciate that databases are not supposed to hold infinite amounts of data and a sensible data retention policy is always a good thing.

But how would you delete all those rows of data? Here are some of the techniques our MySQL support consultants use on a regular basis.

The Slow Way

The first option could be just to outright delete the data with one statement:

DELETE FROM whydidiletthistablegetsobig WHERE last_update < NOW() - INTERVAL 90 DAY;

The problem is that it might take a very long time – depending on the size of the table – and it might lock parts of the table as it is running.

Where is your Limit?

Instead of deleting so many rows at once, let’s add a LIMIT and batch the delete process. This has a benefit of reducing locking issues and would seem less like a ‘black box’ long running statement.

-- start loop
DELETE FROM whydidiletthistablegetsobig WHERE last_update < NOW() - INTERVAL 90 DAY
LIMIT 5000;
SELECT ROW_COUNT();
-- is it 0? exit. Else, continue.
-- end loop

Why does this take over 100 seconds?

Sometimes, the batched DELETE statement takes a long time, in and off itself. This means that it still locks and now, the whole process takes even longer to complete. This could be because of index inefficiencies and you do not retain the (primary key) position of where you were in removing rows in the table.

What you can do, is use a separate SELECT statement to locate a range of PRIMARY KEY ids and then use that range to feed into the DELETE statement.

-- start loop
SELECT min(id), max(id) FROM
(SELECT id FROM whydidiletthistablegetsobig WHERE last_update < NOW() - INTERVAL 90 DAY  AND id > max_id
LIMIT 5000) as foo;
-- if null, exit
DELETE FROM whydidiletthistablegetsobig WHERE id BETWEEN min_id AND max_id
AND last_update < NOW() - INTERVAL 90 DAY;
-- remember max_id and use in SELECT above
-- end loop

This is pretty fast

More Cowbell

For these large DELETEs you need a file system that can handle those better. The ZFS file system handles DELETEs 40% faster. You’ll be wondering why you haven’t used this before.

But can I do it instantly?

Yes, you can – if you use PARTITIONs.

If you setup your table, to have partitions – let’s say by month – you can then drop these partitions when you want to prune old data.

ALTER TABLE whydidiletthistablegetsobig
DROP PRIMARY KEY, ADD PRIMARY KEY(id, last_update),
RANGE COLUMNS (last_update) (
PARTITION pNULL VALUES LESS THAN ('2020-11-01 00:00:00'),
PARTITION p202011 VALUES LESS THAN ('2020-12-01 00:00:00'),
PARTITION p202012 VALUES LESS THAN ('2021-01-01 00:00:00'),
PARTITION p202101 VALUES LESS THAN ('2021-02-01 00:00:00'),
PARTITION p202102 VALUES LESS THAN ('2021-03-01 00:00:00'),
PARTITION p202103 VALUES LESS THAN ('2021-04-01 00:00:00'),
PARTITION p202104 VALUES LESS THAN ('2021-05-01 00:00:00'),
PARTITION p202105 VALUES LESS THAN ('2021-06-01 00:00:00'),
PARTITION p202106 VALUES LESS THAN ('2021-07-01 00:00:00'),
PARTITION p202107 VALUES LESS THAN ('2021-08-01 00:00:00'),
PARTITION p202108 VALUES LESS THAN ('2021-09-01 00:00:00'),
PARTITION p202109 VALUES LESS THAN ('2021-10-01 00:00:00'),
PARTITION p202110 VALUES LESS THAN ('2021-11-01 00:00:00'),
PARTITION p202111 VALUES LESS THAN ('2021-12-01 00:00:00'),
PARTITION pMAX VALUES LESS THAN (MAXVALUE));

ALTER TABLE whydidiletthistablegetsobig DROP PARTITION pNULL;