is not null 不走索引怎么优化 not in走索引吗

转载

云端行者 2024-03-25 20:34:14

文章标签 Oracle 数据结构 OS 算法 Windows 文章分类 数据仓库大数据

oracle中关于in和exists，not in 和 not exists

in和exists

in 是把外表和内表作hash 连接，而exists是对外表作loop循环，每次loop循环再对内表进行查询。

not exists:做NL，对子查询先查，有个虚表，有确定值，所以就算子查询有NULL最终也有值返回

not in:做hash，对子查询表建立内存数组，用外表匹配，那子查询要是有NULL那外表没的匹配最终无值返回。一直以来认为exists比in效率高的说法是不准确的。

如果查询的两个表大小相当，那么用in和exists差别不大。

如果两个表中一个较小，一个是大表，则子查询表大的用exists，子查询表小的用in：

例如：表A（小表），表B（大表）1：select * from A where cc in (select cc from B)

效率低，用到了A表上cc列的索引；select * from A where exists(select cc from B where cc=A.cc)

效率高，用到了B表上cc列的索引。

相反的2：select * from B where cc in (select cc from A)

效率高，用到了B表上cc列的索引；select * from B where exists(select cc from A where cc=B.cc)

效率低，用到了A表上cc列的索引。

not in 和not exists如果查询语句使用了not in 那么内外表都进行全表扫描，没有用到索引；而not extsts 的子查询依然能用到表上的索引。所以无论那个表大，用not exists都比not in要快。

一直听到的都是说尽量用exists不要用in，因为exists只判断存在而in需要对比值，所以exists比较快，但看了看网上的一些东西才发现根本不是这么回事。下面这段是抄的Select * from T1 where x in ( select y from T2 )执行的过程相当于:select * from t1, ( select distinct y from t2 ) t2 where t1.x = t2.y;select * from t1 where exists ( select null from t2 where y = x )执行的过程相当于:for x in ( select * from t1 ) loop if ( exists ( select null from t2 where y = x.x ) then OUTPUT THE RECORD end ifend loop从我的角度来说，in的方式比较直观，exists则有些绕，而且in可以用于各种子查询，而exists好像只用于关联子查询（其他子查询当然也可以用，可惜没意义）。由于exists是用loop的方式，所以，循环的次数对于exists影响最大，所以，外表要记录数少，内表就无所谓了，而in用的是hash join，所以内表如果小，整个查询的范围都会很小，如果内表很大，外表如果也很大就很慢了，这时候exists才真正的会快过in的方式。

也就是说，in和exists需要具体情况具体分析，not in和not exists就不用分析了，尽量用not exists就好了。

典型的连接类型共有3种：排序 - - 合并连接(Sort Merge Join (SMJ) ) 嵌套循环(Nested Loops (NL) ) 哈希连接(Hash Join) 嵌套循环和哈希连接的算法还是有不同，在理论上哈希连接要快过排序和nl，当然实际情况比理论上有复杂的多，不过两者还是有差异的.

1 关联子查询与非关联子查询

关联子查询需要在内部引用外部表，而非关联子查询不要引用外部表。对于父查询中处理的记录来说，一个关联子查询是每行计算一次，然而一个非关联子查询只会执行一次，而且结果集被保存在内存中（如果结果集比较小），或者放在一张oracle临时数据段中（如果结果集比较大）。一个“标量”子查询是一个非关联子查询，返回唯一记录。如果子查询仅仅返回一个记录，那么oracle优化器会将结果缩减为一个常量，而且这个子查询只会执行一次。

/*select * from emp where deptno in (select deptno from dept where dept_name='admin');*/

2.如何选择？

根据外部查询，以及子查询本身所返回的记录的数目。如果两种查询返回的结果是相同的，哪一个效率更好？

关联子查询的系统开销：对于返回到外层查询的记录来说，子查询会每次执行一次。因此，必须保证任何可能的时候子查询都要使用索引。

非关联子查询的系统开销：子查询只会执行一次，而且结果集通常是排好序的，并保存在临时数据段中，其中每一个记录在返回时都会被父级查询引用，在子查询返回大量记录的情况下，将这些结果集排序回增大系统的开销。

所以：如果父查询只返回较少的记录，那么再次执行子查询的开销不会非常大，如果返回很多数据行，那么直查询就会执行很多次。如果子查询返回较少的记录，那么为内存中保存父查询的结果集的系统开销不会非常大，如果子查询返回多行，那么需要将结果放在临时段上，然后对数据段排序，以便为负查询中的每个记录服务。

3结论：1）在使用一个关联子查询是，使用in 或者 exists子句的子查询执行计划通常都相同

2）exists子句通常不适于子查询

3）在外部查询返回相对较少记录时，关联子查询比非关联子查询执行得要更快。

4）如果子查询中只有少量的记录，则非关联子查询会比关联子查询执行得更快。

4 子查询转化：子查询可以转化为标准连接操作

1）使用in的非关联子查询（子查询唯一）

条件：1）在整个层次结构中最底层数据表上定义唯一主键的数据列存在于子查询的select列表中

2）至少有个定义了唯一主键的数据列在select列表中，而且定义唯一主键的其他数据列都必须有指定的相等标准，不管是直接指定，还是间接指定。

2）使用exists子句的关联子查询

条件：对于相关条件来说，该子查询只能返回一个记录。

5。not in和not exists调整

1）not in 非关联子查询：转化为in写法下的minus子句

2）not exists关联子查询：这种类型的反连接操作会为外部查询中每一个记录进行内部查询，除了不满足子查询中where条件的内部数据表以外，他会过滤掉所有记录。

可以重写：在一个等值连接中指定外部链接条件，然后添加select distinct

eg:select distinct ... from a,b where a.col1 = b.col1(+) and b.col1 is null

6。在子查询中使用all any

1. 1. 简介

本文简要介绍了关联子查询、非关联子查询、IN & EXISTS 、 NOT IN & NOT EXISTS之间的区别；同时对不同数据库版本下CBO对IN & EXISTS & NOT IN & NOT EXISTS的处理做了一定的阐述。

2. os、数据库版本以及测试数据

os：windows 2000 server sp4
db：oracle 10.1.0.2
 
 
set time on
set timing on
 
 
drop table outer_large_t
/
create table outer_large_t
(id number,
c1 varchar2(100),
c2 varchar2(100)
)
/
 
 
create index idx_outer_large_t on outer_large_t(id)
/
 
 
drop table outer_small_t
/
create table outer_small_t
as select *from outer_large_t 
where 1=2
/
create index idx_outer_small_t_id on outer_small_t(id)
/
 
 
drop table inner_large_t
/
create table inner_large_t
(id number,
c3 varchar2(100),
c4 varchar2(100)
)
/
create index idx_inner_large_t_1 on inner_large_t(id,c3)
/
drop table inner_small_t
/
create table inner_small_t
(id number,
c3 varchar2(100),
c4 varchar2(100)
)
/
create index idx_inner_small_t on inner_small_t(id,c3)
/

3. 2.关联子查询和非关联子查询

测试数据:

truncate table outer_large_t
/
truncate table inner_large_t
/
 
declare
begin
  for i in 1..50000 loop
    insert into outer_large_t values (i,'test','test');    
  end loop;
  for i in 30000..100000 loop
    insert into inner_large_t values (i,'test','test');    
  end loop;
  commit;
end;
/
 
analyze table outer_large_t compute statistics for table for all indexes
/
analyze table inner_large_t compute statistics for table for all indexes
/

非关联子查询形如：

select count(*) from outer_large_t
where id not in 
(select id from inner_large_t)
/

子查询与父查询没有关联。

关联子查询形如：

select count(*) from outer_large_t outer_t
where not exists
(select id from inner_large_t where id = outer_t.id)
/

子查询与父查询存在关联id = outer_t.id。

非关联子查询对于exists和not exists是没有意义的。

看如下实验：

11:17:00 test@GZSERVER> select count(*) from outer_large_t
11:17:02   2  where id not in
11:17:02   3  (select id from inner_large_t)
11:17:02   4  /
 
 
  COUNT(*)
----------
     29999

已用时间: 00: 00: 00.04

11:17:02 test@GZSERVER> select count(*) from outer_large_t
11:17:02   2  where id in
11:17:02   3  (select id from inner_large_t)
11:17:02   4  /
 
 
  COUNT(*)
----------
     20001

已用时间: 00: 00: 00.01

11:17:02 test@GZSERVER> select count(*) from outer_large_t outer_t
11:17:02   2  where not exists
11:17:02   3  (select id from inner_large_t)
11:17:02   4  /
 
 
  COUNT(*)
----------
         0

已用时间: 00: 00: 00.00

11:17:02 test@GZSERVER> select count(*) from outer_large_t outer_t
11:17:02   2  where exists
11:17:02   3  (select id from inner_large_t)
11:17:02   4  /
 
  COUNT(*)
----------
     50000

已用时间: 00: 00: 00.00

11:17:03 test@GZSERVER>

非关联子查询使用not exists的话父查询总是返回0，使用exists总是返回父查询的查询结果集。

本文章为转载内容，我们尊重原作者对文章享有的著作权。如有内容错误或侵权问题，欢迎原作者联系我们进行内容更正或删除文章。