Issue Symptoms

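While installing a SQL Server 2012 Standard failover cluster on Windows Server 2012, the Add Node step stayed on the following screen for more than 4 hours without advancing: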


"Please wait while Microsoft SQL Server 2012 Service Pack 1 Setup processes the current operation."


The next day the wizard had moved on to another screen, but the already-installed cluster could not be selected on it.



Cause

1. The setup log showed the following:

Details.txt

(01) 2014-05-16 19:05:52 Slp: Running discovery on local machine

(01) 2014-05-16 19:05:53 Slp: Discovery on local machine is complete

(01) 2014-05-16 19:05:53 Slp: Running discovery on remote machine:1xx P

(01) 2014-05-16 23:35:56 Slp: Discovery on 1xxP is complete

(01) 2014-05-16 23:35:56 Slp: Completed Action: RunRemoteDiscoveryAction, returned True

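The "Running discovery on remote machine" operation that started at 19:05:53 the previous evening did not complete until 23:35:56, about 4.5 hours later. A duration like that looks more like a timeout than real work.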
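The elapsed time can be checked directly from the two log lines. The following is a minimal sketch that assumes every Details.txt line has the layout `(NN) <date> <time> Slp: <message>` seen in the excerpt; `parse_timestamp` is a hypothetical helper name, not part of any SQL Server tooling.

```python
from datetime import datetime

# Two lines copied from the Details.txt excerpt above.
START_LINE = "(01) 2014-05-16 19:05:53 Slp: Running discovery on remote machine:1xx P"
END_LINE = "(01) 2014-05-16 23:35:56 Slp: Discovery on 1xxP is complete"


def parse_timestamp(line: str) -> datetime:
    """Return the timestamp that follows the '(NN) ' prefix of a log line."""
    # Assumed layout: '(01) <date> <time> Slp: <message>'
    _, date_part, time_part, _ = line.split(" ", 3)
    return datetime.strptime(f"{date_part} {time_part}", "%Y-%m-%d %H:%M:%S")


elapsed = parse_timestamp(END_LINE) - parse_timestamp(START_LINE)
elapsed_hours = elapsed.total_seconds() / 3600
print(f"Remote discovery took {elapsed} ({elapsed_hours:.2f} hours)")
# prints: Remote discovery took 4:30:03 (4.50 hours)
```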

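A closer look at this action showed that it calls the Remote Registry service to read registry keys and values on 1xxP.

The Remote Registry service was checked on both nodes and was in the Started state on each.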


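2. DNS and network adapter settings were checked:

The DNS suffix is the same on both nodes, but the network adapters use NIC teaming. I suspected that the problem was related to the network configuration and was causing poor network performance.

3. Further research turned up the following two KB articles, which identify this as a known issue: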

SQL Server 2008 failover cluster installation can take a long time on Windows Server 2008

http://support.microsoft.com/kb/2000219


You encounter poor performance when you use the SMB 2.0 protocol to perform network-related operations, such as ADMT migration, on computers that are running Windows Server 2008 or Windows Vista

http://support.microsoft.com/kb/950836


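These KB articles are written for Windows Server 2008 (and Windows Vista) rather than Windows Server 2012, but the symptom matches: a similar problem had been seen earlier when installing SQL Server 2008 on Windows Server 2008.

4. Following the KB, the SMB 2.0 protocol was disabled for the LAN Manager Server service (LanmanServer) and both nodes were restarted. This resolved the problem, and the cluster node was added successfully.

  1. Root cause analysis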

Network-related operations in Windows Vista and in Windows Server 2008 are based on Server Message Block (SMB) 2.0. When an SMB 2.0 server receives a request that will take a long time to process, the server processes the request differently. In this case, the server sends an interim response to the client, and then the server switches to processing the request asynchronously. However, the SMB 2.0 server waits synchronously for TCP to deliver this interim response. This behavior may produce a 200-millisecond delay because of the ACK response interval of TCP.

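After .120 receives a request over SMB, it sends an interim response to .119 and switches to processing the request asynchronously, but it waits synchronously for TCP to deliver that interim response. As a result, each registry read can incur a delay of at least 200 ms, and these delays add up until the operation finally times out. This problem was fixed in Windows Server 2008.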
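A rough back-of-the-envelope calculation shows how a 200 ms stall per request could add up to the observed duration. This is only a sketch: it assumes every remote registry read pays exactly one 200 ms delay and that nothing else contributes to the elapsed time, neither of which the log confirms. The function names are hypothetical.

```python
# Figures taken from the text above; the one-delay-per-read model is an assumption.
DISCOVERY_SECONDS = 4 * 3600 + 30 * 60 + 3  # 19:05:53 -> 23:35:56
DELAY_PER_REQUEST_SECONDS = 0.2             # 200 ms stall per request


def requests_to_fill(total_seconds: float, delay_seconds: float) -> int:
    """Number of delayed requests needed to account for total_seconds."""
    return round(total_seconds / delay_seconds)


def total_delay_hours(request_count: int, delay_seconds: float) -> float:
    """Cumulative delay, in hours, for request_count delayed requests."""
    return request_count * delay_seconds / 3600


needed = requests_to_fill(DISCOVERY_SECONDS, DELAY_PER_REQUEST_SECONDS)
print(f"{needed} delayed requests would account for the whole discovery phase")
print(f"10000 delayed requests alone cost {total_delay_hours(10000, 0.2) * 60:.0f} minutes")
```

Under these assumptions, roughly eighty thousand delayed round trips would be enough to explain the 4.5 hours, which is plausible for a discovery pass that enumerates many registry keys and values one at a time.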

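In the current environment, however, both nodes run Windows Server 2012 and use SMB 3.0, so why does the same problem appear? Check whether the domain controller runs Windows Server 2008. If it does, the problem is easier to explain:

Communication between Windows Server 2012 and Windows Server 2008 still has to use SMB 2.0.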

http://blogs.technet.com/b/josebda/archive/2013/10/02/windows-server-2012-r2-which-version-of-the-smb-protocol-smb-1-0-smb-2-0-smb-2-1-smb-3-0-or-smb-3-02-you-are-using.aspx

http://technet.microsoft.com/en-us/library/hh831795.aspx


Resolution


1. Click Start, type regedit in the Start Search box, and then press ENTER.

2. Locate and then click the following registry subkey:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\LanmanServer\Parameters

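3. On the Edit menu, point to New, and then click DWORD Value.

4. Type SMB2, and then press ENTER.

5. On the Edit menu, click Modify.

6. Type 0 (zero), and then click OK.

7. Exit Registry Editor.

8. Restart the computer.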

http://support.microsoft.com/kb/950836
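When the change has to be made on both nodes, the manual steps above could be captured in a .reg file instead. The sketch below only generates the file text for the same subkey, value name (SMB2) and data (0) used in the steps; `build_reg_file` is a hypothetical helper name, and saving and importing the file (and the restart) are left to the administrator.

```python
KEY_PATH = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\LanmanServer\Parameters"


def build_reg_file(key_path: str, value_name: str, dword_value: int) -> str:
    """Return .reg file text that sets one DWORD value under key_path."""
    if not 0 <= dword_value <= 0xFFFFFFFF:
        raise ValueError("DWORD values must fit in 32 bits")
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{key_path}]",
        # DWORD data is written as eight hexadecimal digits.
        f'"{value_name}"=dword:{dword_value:08x}',
        "",
    ]
    return "\r\n".join(lines)


# Same value as steps 3 to 6: SMB2 = 0 disables SMB 2.0 on the server side.
reg_text = build_reg_file(KEY_PATH, "SMB2", 0)
print(reg_text)
```

As with the manual procedure, the setting takes effect only after the computer is restarted.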