索引

2021-05-02

存储过程生成 100W 数据

1.创建数据库 oldboy

1mysql> create database oldboy;

2.创建存储过程，并且创建 100W 条数据 use oldboy

 1use oldboy
 2
 3create table t100w (id int,num int,k1 char(2),k2 char(4),dt timestamp) charset utf8mb4 collate utf8mb4_bin;
 4
 5delimiter //
 6create  procedure rand_data(in num int)
 7begin
 8declare str char(62) default 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789';
 9declare str2 char(2);
10declare str4 char(4);
11declare i int default 0;
12while i<num do
13set str2=concat(substring(str,1+floor(rand()*61),1),substring(str,1+floor(rand()*61),1));
14set str4=concat(substring(str,1+floor(rand()*61),2),substring(str,1+floor(rand()*61),2));
15set i=i+1;
16insert into t100w values (i,floor(rand()*num),str2,str4,now());
17end while;
18end;
19//
20delimiter ;
21
22插入100w条数据：
23call rand_data(1000000);
24commit;

压力测试

 1[root@localhost ~]# mysqlslap --defaults-file=/etc/my.cnf \
 2> --concurrency=100 --iterations=1 --create-schema='oldboy' \
 3> --query="select * from oldboy.t100w where k2='xyMN'" engine=innodb \
 4> --number-of-queries=2000 -uroot -p123 -verbose
 5mysqlslap: [Warning] Using a password on the command line interface can be insecure.
 6Benchmark
 7	Running for engine rbose
 8	Average number of seconds to run all queries: 339.064 seconds
 9	Minimum number of seconds to run all queries: 339.064 seconds
10	Maximum number of seconds to run all queries: 339.064 seconds
11	Number of clients running queries: 100
12	Average number of queries per client: 20
13
14

索引的作用

提供类似于书中目录的作用，目的是为了优化查询

索引的种类（算法）

B 树索引
Hash 索引
R 树索引
Full text
GIS

B 树索引的算法

MySQL 现在用的是 B*Tree。

辅助索引(S)构建 B 树结构

Secondary Index

(1)辅助索引是基于表的列进行生成的
(2)取出索引列的所有值（取出所有键值）,临时表中存储
(3)进行所有键值的排序
(4)将所有的键值按顺序落到 BTree 索引的叶子节点上
(5)进而生成枝节点和根节点
(6)叶子节点除了存储键值之外，还存储了相邻叶子节点的指针，另外还会保存原表数据的指针

聚集索引(C)构建 B 树结构

Cluster Index

(1) 建表时有主键列（ID）
(2) 表中进行数据存储，会按照 ID 列的顺序，有序的存储一行一行的数据到数据页上(这个动作叫做聚集索引组织表)
(3) 表的数据页被作为聚集索引的叶子节点
(4) 把叶子节点的主键值生成上层枝节点和根节点。

聚集索引和辅助索引构成区别总结

1聚集索引只能有一个,非空唯一,一般时主键
2辅助索引,可以有多个,是配合聚集索引使用的
3聚集索引叶子节点,就是磁盘的数据行存储的数据页
4MySQL是根据聚集索引,组织存储数据,数据存储时就是按照聚集索引的顺序进行存储数据
5辅助索引,只会提取索引键值,进行自动排序生成B树结构

辅助索引细分

联合多列辅助索引(覆盖索引)
单列的辅助索引
唯一索引

索引树的高度

关于索引树的高度受什么影响？

（1）数据行多，分表
（2）索引列字符长度，前缀索引
（3）char varchar ，表设计
（4）enum 优化索引高度，能用则用。

索引的管理

辅助索引：单列的辅助索引

 1# 查看t100w表的结构
 2mysql> desc t100w;
 3+-------+-----------+------+-----+-------------------+-----------------------------+
 4| Field | Type      | Null | Key | Default           | Extra                       |
 5+-------+-----------+------+-----+-------------------+-----------------------------+
 6| id    | int(11)   | YES  |     | NULL              |                             |
 7| num   | int(11)   | YES  |     | NULL              |                             |
 8| k1    | char(2)   | YES  |     | NULL              |                             |
 9| k2    | char(4)   | YES  |     | NULL              |                             |
10| dt    | timestamp | NO   |     | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
11+-------+-----------+------+-----+-------------------+-----------------------------+
125 rows in set (0.00 sec)
13
14# 查看t100w表在系统中占用的空间 (查询条件为k2)
15[root@localhost ~]# ll -h /data/mysql/data/oldboy/
16total 57M
17-rw-r-----. 1 mysql mysql   65 Apr 28 22:03 db.opt
18-rw-r-----. 1 mysql mysql 8.5K Apr 28 22:06 t100w.frm
19-rw-r-----. 1 mysql mysql  56M Apr 28 22:24 t100w.ibd
20
21# 并发100，2000次查询消耗时间 339.064 seconds
22[root@localhost ~]# mysqlslap --defaults-file=/etc/my.cnf \
23--concurrency=100 --iterations=1 --create-schema='oldboy' \
24--query="select * from oldboy.t100w where k2='xyMN'" engine=innodb \
25--number-of-queries=2000 -uroot -p123 -verbose
26mysqlslap: [Warning] Using a password on the command line interface can be insecure.
27Benchmark
28	Running for engine rbose
29	Average number of seconds to run all queries: 339.064 seconds
30	Minimum number of seconds to run all queries: 339.064 seconds
31	Maximum number of seconds to run all queries: 339.064 seconds
32	Number of clients running queries: 100
33	Average number of queries per client: 20
34
35# 建立索引 (会锁表，短暂的锁表，最好在业务不繁忙的时候建立索引)
36mysql> alter table t100w add index idx_k2(k2);
37Query OK, 0 rows affected (1.41 sec)
38Records: 0  Duplicates: 0  Warnings: 0
39
40# 再次进行压测
41[root@localhost ~]# mysqlslap --defaults-file=/etc/my.cnf \
42> --concurrency=100 --iterations=1 --create-schema='oldboy' \
43> --query="select * from oldboy.t100w where k2='xyMN'" engine=innodb \
44> --number-of-queries=2000 -uroot -p123 -verbose
45mysqlslap: [Warning] Using a password on the command line interface can be insecure.
46Benchmark
47	Running for engine rbose
48	Average number of seconds to run all queries: 0.488 seconds
49	Minimum number of seconds to run all queries: 0.488 seconds
50	Maximum number of seconds to run all queries: 0.488 seconds
51	Number of clients running queries: 100
52	Average number of queries per client: 20
53
54# 优化效果大约在800倍左右
55有无索引查询效果很明显
56
57# 查看表中的索引(MUL是辅助索引)
58mysql> desc t100w;
59+-------+-----------+------+-----+-------------------+-----------------------------+
60| Field | Type      | Null | Key | Default           | Extra                       |
61+-------+-----------+------+-----+-------------------+-----------------------------+
62| id    | int(11)   | YES  |     | NULL              |                             |
63| num   | int(11)   | YES  |     | NULL              |                             |
64| k1    | char(2)   | YES  |     | NULL              |                             |
65| k2    | char(4)   | YES  | MUL | NULL              |                             |
66| dt    | timestamp | NO   |     | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
67+-------+-----------+------+-----+-------------------+-----------------------------+
685 rows in set (0.00 sec)
69
70# 更详细的查看表中的索引
71mysql> show index from t100w\G
72*************************** 1. row ***************************
73        Table: t100w
74   Non_unique: 1
75     Key_name: idx_k2
76 Seq_in_index: 1
77  Column_name: k2
78    Collation: A
79  Cardinality: 3880
80     Sub_part: NULL
81       Packed: NULL
82         Null: YES
83   Index_type: BTREE
84      Comment: 
85Index_comment: 
861 row in set (0.00 sec)

辅助索引：唯一索引

列中的数据不能重复

 1# 判断想建索引的列中有没有重复的数据：使用distinct，对比两次的查询行数是否一致。
 2mysql> select count(k1) from t100w;
 3+-----------+
 4| count(k1) |
 5+-----------+
 6|   1000000 |
 7+-----------+
 81 row in set (0.29 sec)
 9
10mysql> select count(distinct(k1)) from t100w;
11+---------------------+
12| count(distinct(k1)) |
13+---------------------+
14|                3721 |
15+---------------------+
161 row in set (0.58 sec)
17
18
19# 当然，分组统计也可以，大于1的就代表重复。
20mysql> select k1,count(k1) from t100w group by k1 having count(k1)>1 order by count(k1) desc limit 5;
21+------+-----------+
22| k1   | count(k1) |
23+------+-----------+
24| OH   |       328 |
25| K4   |       326 |
26| A3   |       322 |
27| kd   |       319 |
28| N6   |       318 |
29+------+-----------+
305 rows in set (0.56 sec)
31
32# 其实直接创建唯一索引也行，因为如果有重复的行，创建会失败，哈哈。
33mysql> alter table t100w add unique index idx_k1(k1);
34ERROR 1062 (23000): Duplicate entry 'pi' for key 'idx_k1'

辅助索引：前缀索引

1# 建立前缀索引：前5个字符（从左到右）
2mysql> alter table city add index idx_name(name(5));
3Query OK, 0 rows affected (0.01 sec)
4Records: 0  Duplicates: 0  Warnings: 0

辅助索引：联合索引

(1) where A、GROUP BY B、ORDER BY C ---> (A,B,C)
(2) where A B C
(2.1) 都是等值 ,在 5.5 以后无关索引顺序，创建索引的原则：唯一值多的列放在联合索引的最左侧
(2.2) 如果有不等值，例如以下情况
select where A= and B> and C=
索引顺序，ACB ，语句改写为 ACB

1# 创建联合索引
2mysql> alter table city add index idx_co_po(countrycode,population);
3Query OK, 0 rows affected (0.02 sec)
4Records: 0  Duplicates: 0  Warnings: 0
5

删除索引

1#　删除city表中的idx_co_po索引
2mysql> alter table city drop index idx_co_po;
3Query OK, 0 rows affected (0.00 sec)
4Records: 0  Duplicates: 0  Warnings: 0

执行计划

作用

上线新的查询语句之前，进行提前预估语句的性能
在出现性能问题时，找到合理的解决思路

获取执行计划(desc = explain 功能一样)

 1mysql> desc  select * from oldboy.t100w where k2='EF12'\G
 2mysql> explain  select * from oldboy.t100w where k2='EF12'\G
 3*************************** 1. row ***************************
 4           id: 1
 5  select_type: SIMPLE
 6        table: t100w
 7   partitions: NULL
 8         type: ref
 9possible_keys: idx_k2
10          key: idx_k2
11      key_len: 17
12          ref: const
13         rows: 257
14     filtered: 100.00
15        Extra: NULL
161 row in set, 1 warning (0.00 sec)
17
18# 解析
19table: t100w	       # 多表查询的时候，特别有用
20type: ref              # 索引的应用级别
21possible_keys: idx_k2  # 可能会使用到的索引  
22key: idx_k2	       # 实际上使用的索引
23key_len: 17 	       # 联合索引覆盖长度
24rows: 293	       # 查询的行数（越少越好）
25Extra: NULL	       # 额外的信息
26
27
28
29

type

索引的应用级别

ALL : 全表扫描，不走索引

1# 没建立索引！！
2
3# 建立索引不走的()！！！！
4mysql> desc select * from t100w;
5mysql> desc select * from t100w where k1='aa';	# k2做的索引列，拿k1去查询，当然走不了索引
6mysql> desc select * from t100w where k2 != 'aaaa'; # 对于辅助索引来说，不等于是不走索引的
7mysql> desc select * from t100w where k2 like '%xt%'; # 模糊查找不走索引。 除非'xt%'才走索引。

Index ：全索引扫描

1mysql> desc select k2 from t100w;

range ：索引范围扫描

 1# 辅助索引 : > < >= <= like ,   in   or
 2mysql> desc select * from world.city where countrycode like 'C%'
 3mysql> desc select * from world.city where countrycode in ('CHN','USA');
 4改写为：
 5desc
 6select * from world.city where countrycode='CHN'
 7union all 
 8select * from world.city where countrycode='USA';
 9
10# 主键： !=
11mysql> desc select * from world.city where id!=3000;
12mysql> desc select * from world.city where id>3000;

ref : 辅助索引等值查询

1mysql> desc select * from city where countrycode='CHN'

eq_ref ：在多表连接查询是 on 的条件列是唯一索引或主键

1# 左边 ALL  右边 eq_ref
2mysql> desc select a.name,b.name ,b.surfacearea 
3from city as a 
4join country as b 
5on a.countrycode=b.code 
6where a.population <100;

const,system ：主键或唯一键等值查询

1mysql> DESC SELECT * from city where id=10

null：查询不到数据的时候，就是性能最好的时候

1mysql> desc select * from city where id=1000000000;

Extra

额外信息

 1# Using filesort 
 2mysql> desc select * from city where countrycode="CHN" order by population\G
 3*************************** 1. row ***************************
 4           id: 1
 5  select_type: SIMPLE
 6        table: city
 7   partitions: NULL
 8         type: ref
 9possible_keys: CountryCode
10          key: CountryCode
11      key_len: 12
12          ref: const
13         rows: 363
14     filtered: 100.00
15        Extra: Using index condition; Using filesort
161 row in set, 1 warning (0.00 sec)
17
18# where 和 orderby 同时出现需要使用联合索引进行优化
19mysql> alter table city add index idx_co_po(countrycode,population);
20Query OK, 0 rows affected (0.01 sec)
21Records: 0  Duplicates: 0  Warnings: 0
22
23# 再次查询发现 Extra 中已经没有 Using filesort
24mysql> desc select * from city where countrycode="CHN" order by population\G
25*************************** 1. row ***************************
26           id: 1
27  select_type: SIMPLE
28        table: city
29   partitions: NULL
30         type: ref
31possible_keys: CountryCode,idx_co_po
32          key: idx_co_po
33      key_len: 12
34          ref: const
35         rows: 363
36     filtered: 100.00
37        Extra: Using index condition
381 row in set, 1 warning (0.00 sec)

explain(desc)使用场景

公司业务慢,请你从数据库的角度分析原因：

（1）应急性的慢：突然夯住

1处理过程:
21.show processlist;  获取到导致数据库hang的语句
32. explain 分析SQL的执行计划,有没有走索引,索引的类型情况
43. 建索引,改语句

（2）一段时间慢(持续性的):

11. 记录慢日志slowlog,分析slowlog，获取到导致数据库slow的语句
22. explain 分析SQL的执行计划,有没有走索引,索引的类型情况
33. 建索引,改语句

索引管理原则

为了使索哪些引的使用效率更高，在创建索引时，必须考虑在字段上创建索引和创建什么类型的索引。

建表时一定要有主键,一般是个无关列

选择唯一性索引

 1唯一性索引的值是唯一的，可以更快速的通过该索引来确定某条记录。
 2例如，学生表中学号是具有唯一性的字段。为该字段建立唯一性索引可以很快的确定某个学生的信息。
 3如果使用姓名的话，可能存在同名现象，从而降低查询速度。
 4
 5优化方案:
 6(1) 如果非得使用重复值较多的列作为查询条件(例如:男女),可以将表逻辑拆分
 7(2) 可以将此列和其他的查询类,做联和索引
 8select count(*) from world.city;
 9select count(distinct countrycode) from world.city;
10select count(distinct countrycode,population ) from world.city;
11

为经常需要 where 、ORDER BY、GROUP BY,join on 等操作的字段是创建索引，排序操作会浪费很多时间。

 1where  A B C      ----》 A  B  C
 2in 
 3where A   group by B  order by C
 4A,B，C
 5
 6where A、GROUP BY B、ORDER BY C
 71. 联合索引 
 82. (A,B,C) 
 9
10如果为其建立索引，优化查询
11注：如果经常作为条件的列，重复值特别多，可以建立联合索引。

尽量使用前缀来索引：如果索引字段的值很长，最好使用值的前缀来索引。
限制索引的数目：索引的数目不是越多越好。

1可能会产生的问题:	percona-toolkit中有个工具,专门分析索引是否有用
2(1) 每个索引都需要占用磁盘空间，索引越多，需要的磁盘空间就越大。
3(2) 修改表时，对索引的重构和更新很麻烦。越多的索引，会使更新表变得很浪费时间。
4(3) 优化器的负担会很重,有可能会影响到优化器的选择.

删除不再使用或者很少使用的索引(percona toolkit)：pt-duplicate-key-checker

1表中的数据被大量更新，或者数据的使用方式被改变后，原有的一些索引可能不再需要。数据库管理
2员应当定期找出这些索引，将它们删除，从而减少索引对更新操作的影响。

大表加索引,要在业务不繁忙期间操作
尽量少在经常更新值的列上建索引

不走索引的情况

没有查询条件，或者查询条件没有建立索引

1select * from tab;       全表扫描。
2select  * from tab where 1=1;

查询结果集是原表中的大部分数据，应该是 25％以上。（查询的结果集，超过了总数行数 25%，优化器觉得就没有必要走索引了。）
索引本身失效，统计数据不真实

1索引有自我维护的能力。
2对于表内容变化比较频繁的情况下，有可能会出现索引失效。
3一般是删除重建
4
5现象:
6有一条select语句平常查询时很快,突然有一天很慢,会是什么原因
7select?  --->索引失效,，统计数据不真实
8DML ?   --->锁冲突

查询条件使用函数在索引列上，或者对索引列进行运算，运算包括(+，-，*，/，! 等)

1错误的例子：select * from test where id-1=9;
2正确的例子：select * from test where id=10;
3算术运算
4函数运算
5子查询

隐式转换导致索引失效.
<> ，not in 不走索引（辅助索引）
like "%_" 百分号在最前面不走

1EXPLAIN SELECT * FROM teltab WHERE telnum LIKE '31%'  走range索引扫描
2EXPLAIN SELECT * FROM teltab WHERE telnum LIKE '%110'  不走索引
3%linux%类的搜索需求，可以使用elasticsearch+mongodb 专门做搜索服务的数据库产品

作者：Soulboy