Understanding Joins
1.Joins combine tables horizontally (side by side) by combining rows. The tables being joined are not required to have the same number of rows or columns. (被join的表不需要行或列与join表的相同)
2.When any type of join is processed, PROC SQL starts by generating a Cartesian product, which contains all possible combinations of rows from all tables.In all types of joins, PROC SQL generates a Cartesian product first, and then eliminates rows that do not meet any subsetting criteria that you have specified.(在所有的join过程中都是先建立笛卡尔积,再去一个个按照你表明的条件去删除!表中重复的列在join中是不会自动合并的,需手动合并)。
3.连接最多包括32张表,不计算视图数量,只计算视图中的表的数量。
4.连接必须要类型相同,变量名不一定的相同
2.最简单的join,不指定where选择子集,则会生成一个最基本的笛卡尔积(包括两个表所有可能的join)
理解连接的过程!!!!!!For all table
? builds a Cartesian product of rows from the indicated tables
? evaluates each row in the Cartesian product, based on the join conditions specified in
the WHERE clause (along with any other subsetting conditions), and removes any rows
that do not meet the specified conditions
? if summary functions are specified, summarizes the applicable rows
? returns the rows that are to be displayed in output.
有这个过程后,就能完全了解一对多,多对多,多对一连接后的结果了
反正全部都是进行一次所有行的笛卡尔积的生成,然后再按条件进行筛选,
而笛卡尔积的生成过程是主表对应附表行对行的一一对应(扫描)连接。但是具体过程更为复杂,涉及到分块等情况
3:inner join
def:An inner join combines and displays only the rows from the first table that match rows from the second table, based on the matching criteria (内连接只会对两表中基于准则的行进行组合和显示),In an inner join, a WHERE clause is added to restrict the rows of the Cartesian product that will be displayed in output. (在内连接中,where从句是限制在笛卡尔输出集中显示的行的数量)
proc sql;
select one.x, a, b /*select one.* , b* one.*表示表one中所有的列/
from one, two
where one.x = two.x;
quit;
3.1:在标准内连接中,出现两个表都含有重复的值的情况,内连接会对所有满足条件的观测行进行一一对应的笛卡尔积
4:Outer Join
You can think of an outer join as an augmentation of an inner join:an outer join returns all rows generated by an inner join, plus additional (nonmatching) rows.
(外连接是内连接的一个augmentation,除了交的部分,还含有并的某些或全部)
4.1Using a Left Outer Join,左表变量顺序保持不变
A left outer join retrieves all rows that match across tables, based on the specified matching criteria (join conditions), plus nonmatching rows from the left table (the first table specified in the FROM clause).(左连接会将所有满足ON条件的行进行连接,并会额外加上左表中所有不满足条件的行)In all three types of outer joins (left, right, and full), the columns in the result (combined) row that are from the unmatched row are set to missing values. (未满足条件的右表的行被置为缺失值)
4.2:Using a Right Outer Join,右表变量顺序保持不变
A right outer join retrieves all rows that match across tables, based on the specified matching criteria (join conditions), plus nonmatching rows from the right table (the second table specified in the FROM clause). (右连接会将所有满足ON条件的行进行连接,并会额外加上左表中所有不满足条件的行)
4.3:Using a Full Outer Join
A full outer join retrieves both matching rows and nonmatching rows from both tables. (full join把所有满足和不满足条件的行全部列出来)
如果要得出和merge一样的效果,需要加入coalesce函数
COALESCE(argument-1<..., argument-n>)
The COALESCE function checks the value of each argument in the order in which they are listed and returns the first non-missing value。
这个函数也可以对left和right/join用,但是只能得出left或right的结果
5:Using In-Line Views <<nested query>>(Unlike other queries, an in-line view cannot contain an
An in-line view is a nested query that is specified in the outer query's FROM clause.
与subquery的区别:子查询返回的是值,In-Line Views返回的是临时表,子查询在where后,In-Line Views在From后
优点:使用In-Line Views,在某些情况下效率会更高
6,Merge/Join的比较
join 不用排序、不用名字一样、并且条件不限制在等号
当是处于一对一的情况时,和Inner Join对应产生的结果一样。
data merged;
merge one two;
by x;
run;
proc print data=merged noobs;
title ’Table Merged’;
run;
proc sql;
title ’Table Merged’;
select one.x, a, b
from one, two
where one.x = two.x
order by x;
当处于有不对应的情况时,和Outer Join对应。
data merged;
merge three four;
by x;
run;
proc print data=merged noobs;
title ’Table Merged’;
run;
proc sql;
title ’Table Merged’;
select coalesce(three.x, four.x)
as X, a, b
from three
full join
four
on three.x = four.x;