How can I rebuild indexes and update stats in MySQL InnoDB?

I have experience with MS SQL Server, where it is possible and useful to update statistics and rebuild indexes. I can't find such an option in MySQL InnoDB; is there one? If not, how does MySQL create an execution plan? Does MySQL update indexes and statistics with every UPDATE and INSERT?


This is done with

ANALYZE TABLE table_name;

Read more about it in the MySQL manual: https://dev.mysql.com/doc/refman/8.0/en/analyze-table.html

ANALYZE TABLE analyzes and stores the key distribution for a table. During the analysis, the table is locked with a read lock for MyISAM, BDB, and InnoDB. This statement works with MyISAM, BDB, InnoDB, and NDB tables.

Why? One almost never needs to update the statistics. Rebuilding an index is even more rarely needed.

OPTIMIZE TABLE tbl; will rebuild the indexes and do ANALYZE; it takes time.

ANALYZE TABLE tbl; is fast for InnoDB; it only rebuilds the statistics. As of 5.6.6, when persistent statistics became the default, it is needed even less often.

(Note: The above applies to regular INDEXes; InnoDB's FULLTEXT does require periodic rebuilding.)

You can also use the provided CLI tool mysqlcheck to run the optimizations. It's got a ton of switches but at its most basic you just pass in the database, username, and password.
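At its most basic it looks like this (mydb here is just a placeholder database name):

mysqlcheck --optimize -u root -p mydb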

Adding this to cron or the Windows Task Scheduler can make this an automated process. (MariaDB's mysqlcheck documentation applies as well; it's basically the same tool.)
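For example, a minimal crontab sketch, assuming the credentials live in ~/.my.cnf so no password appears on the command line:

# every Sunday at 03:00, optimize (and implicitly analyze) all databases
0 3 * * 0 mysqlcheck --optimize --all-databases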

To date (MySQL 8.0.18) there is no dedicated statement inside MySQL to re-create indexes.
Since MySQL 8.0, MyISAM has slowly been phasing into deprecated status; InnoDB is the current main storage engine.
In most practical cases InnoDB is the best choice, and it's supposed to keep indexes working well.
In most practical cases it also does a good job: you do not need to recreate indexes. Almost always.

When it comes to large tables with hundreds of GB of data and rows, and a lot of writing, the situation changes: indexes can degrade in performance.
In my personal case I've seen the performance of a COUNT(*) using a secondary index drop from ~15 minutes to 4300 minutes after 2 months of writing to the table, with a linear increase over time.
After recreating the index, performance goes back to ~15 minutes.

To date we have two options to do that (minimal sketches of both follow below):
1) OPTIMIZE TABLE (or ALTER TABLE)
InnoDB doesn't support in-place optimization, so in both cases the entire table will be read and re-created.
This means you need storage for the temporary copy and, depending on the table, a lot of time (I've seen cases where an OPTIMIZE takes a week to complete). This will compact the data and rebuild all indexes.
Despite it not being officially recommended, I highly recommend the OPTIMIZE process on write-heavy tables up to 100 GB in size.

2) ALTER TABLE DROP KEY -> ALTER TABLE ADD KEY
You manually drop the key by name and manually create it again. In a production environment you'll want to create the new index first (under a different name, since index names must be unique within a table), then drop the old version.
The upside: this can be a lot faster than OPTIMIZE. The downside: you need to write the syntax manually.
"SHOW CREATE TABLE" can be used to quickly see which indexes exist and what they are named.

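Minimal sketches of both options, assuming a hypothetical InnoDB table events with a secondary index idx_created on the column created_at:

-- Option 1: full rebuild. For InnoDB, OPTIMIZE TABLE is mapped to a table rebuild:
OPTIMIZE TABLE events;
-- equivalent explicit form:
ALTER TABLE events ENGINE=InnoDB;

-- Option 2: re-create a single index; build the new copy first, then drop the old one:
SHOW CREATE TABLE events;  -- look up the exact index definition first
ALTER TABLE events ADD INDEX idx_created_new (created_at);
ALTER TABLE events DROP INDEX idx_created;
ALTER TABLE events RENAME INDEX idx_created_new TO idx_created;  -- MySQL 5.7+
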
Appendix:
1) To just update statistics you can use the already mentioned "ANALYZE TABLE".
2) If you experience performance degradation on write-heavy servers, you might need to restart MySQL. There are a couple of bugs in current MySQL (8.0) that can cause significant slowdown without showing up in the error log. Eventually those slowdowns lead to a server crash, but it can take weeks or even months to build up to it; along the way the server gets slower and slower to respond.
3) If you wish to re-create a large table where the rebuild takes weeks to complete or fails after hours due to internal data integrity problems, you should do a CREATE TABLE ... LIKE, then INSERT INTO ... SELECT *, then an atomic RENAME TABLE swap (see the sketch after this list).
4) If the INSERT INTO ... SELECT * takes hours to days to complete on huge tables, you can speed up the process by about 20-30 times using a multi-threaded approach: "partition" the table into chunks and run the INSERT INTO ... SELECT * statements in parallel.
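A minimal sketch of the copy-and-swap from points 3) and 4), again using a hypothetical table events with an integer primary key id:

CREATE TABLE events_new LIKE events;

-- single-threaded copy:
INSERT INTO events_new SELECT * FROM events;

-- or chunked by primary key range, running each statement in its own session/thread:
INSERT INTO events_new SELECT * FROM events WHERE id >= 0       AND id < 1000000;
INSERT INTO events_new SELECT * FROM events WHERE id >= 1000000 AND id < 2000000;
-- ...and so on, one chunk per worker

-- atomic swap, then drop the old copy:
RENAME TABLE events TO events_old, events_new TO events;
DROP TABLE events_old;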

For basic cleanup and re-analyzing you can run "OPTIMIZE TABLE ..."; it will compact out the overhead in the indexes and run ANALYZE TABLE too, but it's not going to re-sort them and make them as small and efficient as they could be.

https://dev.mysql.com/doc/refman/8.0/en/optimize-table.html

However, if you want the indexes completely rebuilt for best performance, you can:

  1. drop / re-add indexes (obviously)
  2. dump / reload the table
  3. ALTER TABLE to "change" the table to the storage engine it already uses (e.g. ALTER TABLE tbl ENGINE=InnoDB)
  4. REPAIR TABLE (only works for MyISAM, ARCHIVE, and CSV)

https://dev.mysql.com/doc/refman/8.0/en/rebuilding-tables.html
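As a quick sketch of option 2, a dump and reload of a single table can look like this (mydb and mytable are placeholder names):

mysqldump mydb mytable > mytable.sql
mysql mydb < mytable.sql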

If you do an ALTER TABLE on a field (that is part of an index) and change its type, then it will also fully rebuild the related index(es).
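For example (orders and customer_id are hypothetical names), widening an indexed column rebuilds every index that contains it:

ALTER TABLE orders MODIFY customer_id BIGINT NOT NULL;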

As indicated in the MySQL manual, there are multiple ways of rebuilding tables. If you are not changing the MySQL server version and you want to support tables with various engines (MyISAM, InnoDB), then a stored procedure such as the following might come in handy:

DELIMITER $$

CREATE PROCEDURE rebuildTables(IN dbName VARCHAR(100))
BEGIN
    -- flag marking cursor end
    DECLARE done INT DEFAULT FALSE;
    DECLARE tableName VARCHAR(255) DEFAULT "";
    DECLARE tableEngine VARCHAR(100) DEFAULT "";

    -- declare cursor for table names and engines; the columns are qualified with
    -- an alias so they are not shadowed by the local variables of the same name
    DECLARE curTables CURSOR FOR
        SELECT t.TABLE_NAME, t.ENGINE
        FROM information_schema.tables t
        WHERE t.table_schema = dbName AND t.table_type = 'BASE TABLE';

    -- declare NOT FOUND handler (fires when the cursor is exhausted)
    DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;

    OPEN curTables;

    rebuildTables: LOOP
        FETCH curTables INTO tableName, tableEngine;
        -- test the flag before using the fetched values, otherwise the
        -- last table's name would be reported a second time
        IF done THEN
            LEAVE rebuildTables;
        END IF;
        SELECT CONCAT('Processing ', tableName, ' engine ', tableEngine);
        -- rebuild table as advised in https://dev.mysql.com/doc/refman/5.7/en/rebuilding-tables.html,
        -- qualifying with dbName so the call works regardless of the current default schema
        SET @query = CONCAT('ALTER TABLE `', dbName, '`.`', tableName, '` ENGINE=', tableEngine);
        PREPARE stmt FROM @query;
        EXECUTE stmt;
        DEALLOCATE PREPARE stmt;
    END LOOP;
    CLOSE curTables;
END$$

DELIMITER ;

In order to call it, simply:

CALL rebuildTables('yourDbName');

Mind that this process might take a lot of time, especially for large tables.

Using the CLI:

mysqlcheck -u root -p --auto-repair --optimize --all-databases