alter table 组合索引

转载

mob64ca13f8eecb 2024-09-07 10:59:39

文章标签 alter table 组合索引数据库 php 字段 sql 文章分类 数据仓库大数据

一、关于索引的知识

要写出运行效率高的sql，需要对索引的机制有一定了解，下面对索引的基本知识做一介绍。

1、索引的优点和局限
索引可以提高查询的效率，但会降低dml操作的效率。
所以建立索引时需要权衡。对于dml操作比较频繁的表，索引的个数不宜太多。

2、什么样的列需要建索引？
经常用于查询、排序和分组的列（即经常在where、order或group by子句中出现的列）。

3、主键索引和复合索引
对于一张表的主键，系统会自动为其建立索引。
如果一张表的几列经常同时作为查询条件，可为其建立复合索引。

4、建立索引的语句

create  index  i_staff  on  staff  (empno);
create  index  i_agent  on  agent  (empno, start_date);

5、删除索引的语句

drop  index  I_staff;
drop  index  I_agent;

6、查询索引的语句
法一：利用数据字典
表一：all_indexes 查看一张表有哪些索引以及索引状态是否有效
主要字段： index_name, table_name, status
例如：select index_name, status

from  all_indexes
        where  table_name=’STAFF_INFO’;
      INDEX_NAME        STATUS
      ---------------------       -----------
      I_STAFF             VALID

表二：all_ind_columns 查看一张表在哪些字段上建了索引
主要字段： table_name, index_name, column_name, column_position
例如： select index_name, column_name, column_position

from  all_ind_columns
         where  table_name=’AGENT’
      INDEX_NAME        COLUMN_NAME     COLUMN_POSITON
      ---------------------       -----------------------      --------------------------
     I_AGENT             EMPNO              1
     I_AGENT             START_DATE         2

由此可见，agent表中有一个复合索引（empno, start_date ）
法二：利用toad工具
toad用户界面比sql*plus友好，并且功能强大。你可以在toad编辑器中键入表名，按F4，便可见到这张表的表结构以及所有索引列等基本信息。

7、索引的一些特点
1): 不同值较多的列上可建立检索，不同值少的列上则不要建。比如在雇员表的“性别”列上只有“男”与“女”两个不同值，因此就没必要建立索引。如果建立索引不但不会提高查询效率，反而会严重降低更新速度。
2): 如果在索引列上加表达式，则索引不能正常使用
例如：b1,c1分别是表b,c的索引列

select  *  from  b  where  b1/30< 1000 ;
select  *  from  c  where  to_char(c1,’YYYYMMDD HH24:MI:SS’) = ‘200203 14:01:01’;

以上都是不正确的写法
        3): where子句中如果使用in、or、like、!=，均会导致索引不能正常使用
           例如：select * from b where b1=30 or b1=40;
      4): 使用复合索引进行查询时必须使用前置列
         例如表a上有一个复合索引（c1,c2,c3）,则c1为其前置列
         如果用c1或c1+c2或c1+c2+c3为条件进行查询，则该复合索引可以发挥作用，反之，用c2或c3或c2+c3进行查询，则该索引不能起作用。

二. 书写sql注意事项：

1、        避免给sql语句中引用的索引列添加表达式：
典型实例：
b1,c1分别是表b,c的索引列：
1）        select * from b where b1/30< 1000 ;
2）        select * from c where to_char(c1,’YYYYMMDD HH24:MI:SS’) = ‘200203 14:01:01’;
替代方案：
1)        select * from b where b1 < 30000;
2)        select * from c where c1 = to_date(‘20020301 14:01:01’, ‘YYYYMMDD HH24:MI:SS’);
注：在lbs中有两个重要字段，pol_info中的undwrt_date和prem_info中的payment_date，这两个日期是带时分秒的，所以经常有同事用to_char 来查询某一时间段的数据。
例如：select count(*) from pol_info where to_char(undwrt_date,’YYYYMMDD’)=’20020416’;
      select count(*) from prem_info where to_char(undwrt_date,’YYYYMM’)=’200203’;
替代方案：
select count(*) from pol_info
where undwrt_date>=to_date(’20020416’,’YYYYMMDD’) and
           undwrt_date<to_date(’20020417’,’YYYYMMDD’);
select count(*) from prem_info
where payment_date>=to_date(’20020301’,’YYYYMMDD’) and
         payment_date<to_date(’20020401’,’YYYYMMDD’);

2、        避免在where子句中使用in、or、like、!=
典型实例：
a1是a表上的索引列：
1) select * from a
   where ( a1 = ‘0’ and ...) or (a1 = ‘1’ and ...);
2) select count(*) from a where a1 in (‘0’,’1’) ;
替代方案：
1）        select * from a where a1 = ‘0’ and ...
union
select * from a where a1 = ‘1’ and ...
2) select count(*) from a where a1 = ‘0’;
select count(*) from a where a1 = ‘1’;
   然后做一次加法运算；或者直接用存储过程来实现；
小结：
对字段使用了 ‘in，or，like’ 做条件、对字段使用了不等号 ‘!=’，均会使索引失效；如果不产生大量重复值，可以考虑把子句拆开；拆开的子句中应该包含索引，或者使用union连结符代替。另一种方式是使用存储过程，它使SQL变得更加灵活和高效。

3、建立适当的索引
曾经接过开发的一个统计sql， select … from tablea where cola=… and …
运行效率非常慢，经查tablea数据量巨大，再查all_ind_columns，发现cola是tablea的一个复合索引中的一列，但不是前置列。象这种情况，就需要与开发商量，是否针对cola建一个索引。

4、 like和substr
对于‘like’和‘substr’，其效率并没有多大分别。但是，当所搜索的值不存在时，使用‘like’的速度明显大于‘substr’。
所以：select * from a where substr(a1,1,4) = '5378' 可以用like替代
select * from a where a1 like ‘5378%’;

5、写where条件时，有索引字段的判断在前，其它字段的判断在后；如果where条件中用到复合索引，按照索引列在复合索引中出现的顺序来依次写where条件；

6、使用多表连接时，在from子句中，将记录数少的表放在后面，可提高执行效率；

7、避免使用not in
not in 是效率极低的写法，尽量使用minus或外连接加以替代
典型实例：
1) select col1 from tab1 where col1 not in (select col1 from tab2);
2) select sum(col2) from tab1 where col1 not in (select col1 from tab2);
替代方案

select col1 from tab1 minus  select col1 from tab2;
    select  sum(a.col2)  from  tab1 a, tab2  b 
where a.col1=b.col2(+) and b.col1 is null;

8、多表查询时，如果其中一个表的记录数量明显大于其他表，则可以先对此表进行查询后，再与其他小表进行表连接。
典型实例：

select  a.plan_code,  b.dno,  c,tno,  sum(a.tot_modal_prem),
from  prem_info a,  dept_ref b,  plan_type c
where  substr(a.deptno,1,7) = substr(b.deptno,1,7)
and a.plan_code = c.plan_code
group by b.dno,  c.tno,  a.plan_code;

替代方案：

select  b.dno,  c.tno,  a.plan_code,  a.tot_amount
from  (select  plan_code,  deptno,  sum(tot_modal_prem)  tot_amount
from  prem_info
group  by  deptno,  plan_code) a
dept_ref  b,
plan_type  c
  where  substr(a.deptno,1,7) = substr(b.deptno,1,7)
       and  a.plan_code = c.plan_code
group  by  b.dno,  c.tno,  a.plan_code;

小结：
由于prem_info表的记录数远远大于dept_ref表和plan_type表中的记录数，所以首先从prem_info表中查询需要的记录，此时记录数已经被大量缩小，然后再和其他两个表连接，速度会得到很大改善！

9、查询数量较大时，使用表连接代替IN，EXISTS，NOT IN，NOT EXISTS等。
典型实例：
a、使用IN：

select sum(col2) from tab1 where col1 in (select col1 from tab2);

使用EXISTS:：

select sum(col2) from tab1 a 
where exists ( select * from tab2 where col1=a.col1);

b、使用NOT IN：

select sum(col2) from tab1 where col1 not in (select col1 from tab2);

使用NOT EXISTS：

select sum(col2) from tab1 a 
where not exists ( select * from tab2 where col1=a.col1);

替代方案：
a、使用连接：

select sum(a.col2) from tab1 a,tab2 b where a.col1=b.col2;

b、使用外连接：

select sum(a.col2) from tab1 a,tab2 b 
where a.col1=b.col2(+) and b.col1 is null;

7、索引的一些特点
1): 不同值较多的列上可建立检索，不同值少的列上则不要建。比如在雇员表的“性别”列上只有“男”与“女”两个不同值，因此就没必要建立索引。如果建立索引不但不会提高查询效率，反而会严重降低更新速度。
对于以上的说法，我不以为然。不同值较少的列上如果建立位图索引对于查询是很有好处的（默认的是B 树索引）.

对于2
对字段使用了 ‘in，or，like’ 做条件、对字段使用了不等号 ‘!=’，均会使索引失效
-----------
如果用in,ORACLE也可能会利用上索引的,如

select * from tab1 where title in ('aa','bb') 如果在title上建立索引,则oracle在内部可能会执行以下操作
select * from tab1 where title ='aa'  union  select * from tab2 where title='bb'

从而利用上索引

本文章为转载内容，我们尊重原作者对文章享有的著作权。如有内容错误或侵权问题，欢迎原作者联系我们进行内容更正或删除文章。