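[Troubleshooting] TNS-00517 and Broken pipe
1.2.1 Reader's guide and notes
Topics covered:
① Linux Error: 32: Broken pipe
② TNS-12518: TNS:listener could not hand off client connection
Tips:
① All code used in this article is available at http://blog.itpub.net/26736162/viewspace-1624453/
③ A PDF version of the document is available for reading.
④ Code output is generally placed in a one-row, one-column table.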
DB version: 11.2.0.3.0
OS version and login error:
The error reported was as follows:
Reset it with setasmgidwrap:
[root@orcltest bin]# ll /u01/app/oracle/product/11.2.0/dbhome_1/bin/oracle
[root@orcltest bin]# ll /u01/app/oracle/product/11.2.0/dbhome_1/bin/oracle
[root@orcltest bin]# stat /u01/app/oracle/product/11.2.0/dbhome_1/bin/oracle
Access: (6751/-rwsr-s--x) Uid: ( 501/ oracle) Gid: ( 504/asmadmin)
Change: 2017-03-16 12:33:15.733816820 +0800
[oracle@orcltest ~]$ sqlplus 'sys/"l@h\r/0"'@LHRDB as sysdba
With the Partitioning, Automatic Storage Management, OLAP, Data Mining
SYS@LHRDB>
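Symptoms include "TNS-12518: TNS:listener could not hand off client connection", ASM disk errors, and so on. The fix is simple: correct the permissions and owner of the $ORACLE_HOME/bin/oracle executable. The executable must have the correct owner and must carry the shared (s) permission bits.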
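The permission requirement above can be checked mechanically. The following is only a minimal sketch: `check_oracle_mode` is a hypothetical helper name, not an Oracle tool, and it assumes the target mode is 6751 as shown in the `stat` output earlier. It takes an octal mode string and reports which of the setuid/setgid bits are missing.

```shell
# Hypothetical helper (not part of Oracle): report whether an octal mode
# string carries the setuid and setgid bits expected on
# $ORACLE_HOME/bin/oracle (6751 in this article).
check_oracle_mode() {
    mode=$1
    # POSIX shell arithmetic treats a leading 0 as an octal constant.
    bits=$((0$mode))
    missing=""
    [ $((bits & 04000)) -ne 0 ] || missing="$missing setuid"
    [ $((bits & 02000)) -ne 0 ] || missing="$missing setgid"
    if [ -z "$missing" ]; then
        echo "ok"
    else
        echo "missing:$missing"
    fi
}
```

On Linux with GNU coreutils, the mode can be obtained with `stat -c '%a' "$ORACLE_HOME/bin/oracle"` and passed to the function, for example `check_oracle_mode "$(stat -c '%a' "$ORACLE_HOME/bin/oracle")"`.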
1.6 References
1.6.1.1 Troubleshooting Guide for TNS-12518 TNS listener could not hand off client connection (Doc ID 550859.1)
Troubleshooting Steps
Section II: Commonly Known Errors:
Error: Connection Pooling limit reached
Error: 2: No such file or directory
Error: 10022: Unknown error
Let us have a small discussion about how actually database connections are made:
In Dedicated mode, the database client contacts the listener and supplies the SERVICE NAME of the database. The listener then spawns a dedicated server process and hands off the client connection to this dedicated server process. TNS-12518 indicates a problem while handing off the client connection to the server process.
$ lsnrctl status
- - -
- - -
Listener Parameter File /ora10g/home_ora10g/network/admin/listener.ora
Listener Log File /ora10g/home_ora10g/network/log/listener.log
- - -
- - -
In the above example, listener log shows the complete error stack, the bottom error being 32 is the OS error. It also shows that the jdbc client from IP 10.10.10.3 has tried to connect to the database service 'test.oracle.com' and failed with the error 12518.
The highlighted state should be 'ready' for the connection to be successful. If the state is 'blocked', connections are not possible. A handler can be in the blocked state in the following scenarios:
i. The database parameter processes reached its value.
ii. The database is in the process of startup or shutting down.
In shared server mode, the number of dispatchers should be set according to the load that you expect. The 'lsnrctl services' output shows the maximum number of connections that the dispatcher will accept (max:997) and the number of connections refused by this dispatcher (refused:0). If any connections have been refused by the dispatcher, consider increasing the number of dispatchers.
If you are using a PFILE, edit init.ora and increase the dispatchers parameter. If you are using an SPFILE, you can increase the dispatchers parameter dynamically with the 'alter system set' command.
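The refused-connection check can be scripted. The sketch below assumes that each handler line in the `lsnrctl services` output contains a `refused:N` token, as in the (refused:0) example above; `count_refused` is a hypothetical helper name. It sums all refused counters read from standard input.

```shell
# Hypothetical helper: sum every "refused:N" counter found on stdin.
# The input is assumed to be saved `lsnrctl services` output.
count_refused() {
    awk '{
        line = $0
        # Walk through every refused:N token on the line.
        while (match(line, /refused:[0-9]+/)) {
            # "refused:" is 8 characters long; the rest of the match is the number.
            total += substr(line, RSTART + 8, RLENGTH - 8)
            line = substr(line, RSTART + RLENGTH)
        }
    } END { print total + 0 }'
}
```

A typical use would be `lsnrctl services | count_refused`; a non-zero result suggests looking at the dispatchers setting.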
Step 4. Is a local BEQ connection successful
Check if local BEQ connection to the database works fine. It also verifies if the database is up and in good condition to accept the connection. If the database is down or in a hung state then a connection request to the database by the listener will not be possible.
Connect to the database server via telnet or ssh and check if a local bequeath SQL*Plus connection works. In other words, issue:
sqlplus username/password [Enter]
This connection bypasses the listener and directly connects to the database via the BEQ (bequeath) protocol. If this fails, then the TNS-12518 listener error is simply a result of the database issue.
One such error is:
ORA-12560: TNS:protocol adapter error
A possible cause for this error on Microsoft Windows servers, is that the Windows Database Service has not yet been created (common when creating a "standby" instance).
Resolution for this would be to create the Windows Service first by using the "oradim" command (see the Database Admin guide for details on oradim and service creation).
Step 5. Has number of processes reached its limit?
If the local BEQ connection is successful, check the query below.
In the above example, the processes parameter has been set to 250. Its MAX_UTILIZATION has reached the limit value of 250, so the processes parameter should be increased to accommodate the number of incoming connections.
Edit the init.ora and set the processes parameter to a higher value. By default, increasing processes alone is enough; the sessions value is increased automatically.
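The comparison of MAX_UTILIZATION against LIMIT_VALUE can be automated over saved query output. This is a rough sketch only: `exhausted_resources` is a hypothetical helper, and it assumes five whitespace-separated columns in the order RESOURCE_NAME, CURRENT_UTILIZATION, MAX_UTILIZATION, INITIAL_ALLOCATION, LIMIT_VALUE.

```shell
# Hypothetical helper: print the names of resources whose MAX_UTILIZATION
# has reached a numeric, non-zero LIMIT_VALUE. Header lines, separator
# lines and UNLIMITED rows are skipped because they are not numeric.
exhausted_resources() {
    awk 'NF == 5 && $3 ~ /^[0-9]+$/ && $5 ~ /^[0-9]+$/ && $5 > 0 && $3 + 0 >= $5 + 0 { print $1 }'
}
```

Fed the 250/250 example described above, the function would print `processes`, which is the signal to raise the parameter.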
If the alert log contains any memory or process related errors at the time TNS-12518 is logged in the listener log, focus on those alert log errors and resolve them first, because they are the underlying cause of the TNS-12518 in the listener log.
However, errors in the alert log are outside the scope of this article and are not discussed here.
(SID_LIST=
(SID_DESC =
(GLOBAL_DBNAME = ORCL.oracle.com)
(SID_NAME = ORCL)
(PROGRAM=extproc)
(ORACLE_HOME = D:\oracle\product\10.2.0\db_1)))
_______________________________________________________________________________________________________________________________________
Error: 32: Broken pipe
Error stack in listener log:
TNS-12518: TNS:listener could not hand off client connection
TNS-12549: TNS:operating system resource quota exceeded
TNS-12560: TNS:protocol adapter error
TNS-00519: Operating system resource quota exceeded
IBM/AIX RISC System/6000 Error: 11: Resource temporarily unavailable
Cause:
As the error indicates, an operating system resource quota has been exceeded.
Action:
1. Increase the appropriate OS kernel parameters for 'maximum number of processes allowed per user'.
For example for HP-UX the parameters are maxuprc and nproc.
______________________________________________________________________________________________________________________________________
Error: 12: Not enough space
Error stack in listener log:
TNS-12518: TNS:listener could not hand off client connection
TNS-12560: TNS:protocol adapter error
TNS-00530: Protocol adapter error
Linux Error: 24: Too many open files
Error Description:
Out of file descriptors
Action:
See Note 1527483.1 11.2 : ORA-12518 Listener Hangs and Reports "Too Many Open Files"
Use prescribed workaround OR apply one-off patch to your environment if available.
_______________________________________________________________________________________________________________________________________
Cause:
The listener doesn't have adequate permission on socket files
Error stack in listener log:
TNS-12518: TNS:listener could not hand off client connection
TNS-12560: TNS:protocol adapter error
TNS-00530: Protocol adapter error
HPUX Error: 1: Not owner
Action: Clear /var/tmp/.oracle/ directory
IMPORTANT NOTE: In RAC environment, please use caution when removing existing socket files. See Note 2099377.1 How to remove Network socket files in a RAC Environment for Cluster/Resource startup issues
It is also recommended that you refer to Section I above for a generic troubleshooting approach to the error TNS-12518.
This section briefly describes the errors encountered on the Windows operating system. TNS-12518 most commonly occurs on 32-bit OS because of its memory constraints, but it can occur on 64-bit OS as well. See Note 873752.1 for more information on Windows memory addressing and the 3GB switch.
_______________________________________________________________________________________________________________________________________
Error: 2: No such file or directory
Error stack in listener log:
TNS-12518: TNS:listener could not hand off client connection
TNS-12560: TNS:protocol adapter error
TNS-00530: Protocol adapter error
32-bit Windows Error: 233: Unknown error
Error Description:
ERROR_PIPE_NOT_CONNECTED
233
No process is on the other end of the pipe.
Cause:
The communication has been broken while the listener is trying to hand off the client connection to the server process or dispatcher process.
Action:
Refer Note 371983.1
Error stack in listener log:
TNS-12518: TNS:listener could not hand off client connection
TNS-12560: TNS:protocol adapter error
TNS-00534: Failed to grant connection ownership to child
64-bit Windows Error: 10022: Unknown error
Error Description:
Error: 10022: Invalid Argument
Cause:
An invalid argument was supplied.
Note that in 12c, the account would be the Oracle Home User.
Note that a JDBC Thin connection from a Windows client might also yield the 12518:
[Macromedia][Oracle JDBC Driver][Oracle]Connection refused, (DESCRIPTION=(TMP=)(VSNNUM=186646784)(ERR=12518)(ERROR_STACK= (ERROR=(CODE=12518)(EMFI=4))(ERROR=(CODE=12560)(EMFI=4))(ERROR=(CODE=530)(EMFI=4))(ERROR=(BUF='64-bit Windows Error:203: Unknown error'))))
REFERENCES
- 11.2 : ORA-12518 Listener Hangs and Reports "Too Many Open Files"
Symptoms
References |
Oracle Database - Enterprise Edition - Version 11.2.0.3 and later
Information in this document applies to any platform.
SYMPTOMS
// *Cause: "End of file" condition has been reached; partner has disconnected.
// *Action: None needed; this is an information message.
Before the error is reported, connection could hang for a while, and a core file may also be generated.
- listener_scann.log or listener.log
INFO: Starting Output Reader Threads for process /ocw/grid/bin/kfod
INFO: Parsing KFOD-00300: OCI error [-1] [OCI error] [ORA-12547: TNS:lost contact
INFO: Parsing ] [12547]
INFO: Parsing
INFO: The process /ocw/grid/bin/kfod exited with code 1
..
SEVERE: [FATAL] [INS-30502] No ASM disk group found.
CAUSE: There were no disk groups managed by the ASM instance +ASM1.
CAUSE
2. Oracle binary in database home has wrong permission:
/home/oracle on /dev/dsk/diskoracle read/write/nosuid..
As listener owner:
Check that the RDBMS $ORACLE_HOME is set to 755.
This can be seen from an OS trace such as strace or truss when using it to trace the CRS user running the "oracle" executable which fails with the "Permission denied" error.
Also:
a) Log in as the "GRID" user on each node, and issue the following (on each directory under the RDBMS Home) :-
***NOTE: the Oracle directory has 700 for the permissions, which should be changed to 755:
[grid@orcl002:+ASM2 ~]$ ll /home/oracle/app
drwx------. 8 oracle oinstall 4096 Oct 12 08:38 oracle
[grid@orcl002:+ASM2 ~]$ ll /home/oracle/app
drwxr-xr-x. 8 oracle oinstall 4096 Oct 12 08:38 oracle
b) Likewise the /product directory has 700 perms, so change to 755 -->
drwx------. 3 oracle oinstall 4096 Oct 12 08:58 product
[grid@orcl002:+ASM2 ~]$ ls -al /u01/app/oracle
drwxr-xr-x. 3 oracle oinstall 4096 Oct 12 08:58 product
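The directory listings above can be scanned for the same problem. The following is a minimal sketch under the assumption that the input is `ls -l` style output with the permission string in the first field and the name in the last; `dirs_blocking_others` is a hypothetical helper name.

```shell
# Hypothetical helper: read `ls -l` style lines on stdin and print the names
# of directories that "other" users cannot read and traverse (e.g. mode 700).
dirs_blocking_others() {
    awk '
        /^d/ {
            other = substr($1, 8, 3)      # permission triplet for "other"
            r = substr(other, 1, 1)
            x = substr(other, 3, 1)       # "t" means execute plus sticky bit
            if (r != "r" || (x != "x" && x != "t")) print $NF
        }
    '
}
```

For example, `ls -l /u01/app/oracle | dirs_blocking_others` would list any directory that still needs to be changed to 755.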
NOTE:422173.1
NOTE:975457.1
NOTE:970619.1
TNS Listener Crash with Core Dump (Doc ID 549932.1)
***Checked for relevance on 6-JUL-2016***
Listener Log:
-------------
tnslsnr[5841]: segfault at 0000000000000018 rip 0000003eab66854d rsp 0000007fbfff9230 error 4
tnslsnr[6469]: segfault at 0000000000000018 rip 0000003eab66854d rsp 0000007fbfff9420 error 4
tnslsnr[7375]: segfault at 0000000000000018 rip 0000003eab668bb3 rsp 0000007fbfff9c70 error 4
The core indicates that the program terminated with signal 11, Segmentation fault.
SIGSEGV is reported for improper memory handling. The default action for a program upon receiving SIGSEGV is abnormal termination. This action will end the process.
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
2 0 0 27124 3940 1160 74528 0 1304 956 5360 272 191 51 21 98
3 0 0 27080 3756 1180 72580 0 260 2552 388 218 429 90 10 89
14 2 1 26808 5096 1188 69868 84 1108 2016 9064 490 567 59 22 96
4 0 0 25548 5912 1192 73032 0 0 436 0 478 736 50 50 0
5 0 0 25548 3940 1192 73548 0 0 1560 0 301 385 93 7 0
1 1 1 25548 3336 1192 71800 8 176 2848 432 258 147 91 9 88
1 0 0 25544 4124 1200 70480 116 60 836 60 171 200 97 3 92
Extensive paging/swapping activity is a clear indication that the system is running out of physical memory.
SOLUTION
3. Configure Hugepages on the OS. Ref : Note 361323.1
BUG:6752308
NOTE:361323.1
潇湘隐者
Over the past week, the listener service on one Oracle database server suddenly crashed a few minutes after 2 a.m. This had never happened before, but it occurred twice in the past week. Checking turned up the following information:
[Screenshot: first error message] [Screenshot: second error message]
Cause:
1. One reason would be the processes parameter being set too low, which can be verified through the v$resource_limit view.
4. Memory resource is also another cause for this issue. Check the swap and memory usage of the OS.
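Swap activity in saved vmstat output can be counted instead of read by eye. This is only a sketch: `swap_active_samples` is a hypothetical helper, and it assumes the output contains a header row naming the `si` and `so` columns, as in the vmstat sample quoted from the note. Column positions are taken from that header, so both the older layout (with a `w` column) and the newer one should work.

```shell
# Hypothetical helper: count vmstat sample rows on stdin with non-zero
# swap-in (si) or swap-out (so). Column positions come from the header row.
swap_active_samples() {
    awk '
        !found {
            si = 0; so = 0
            for (i = 1; i <= NF; i++) {
                if ($i == "si") si = i
                if ($i == "so") so = i
            }
            if (si && so) found = 1
            next
        }
        $si ~ /^[0-9]+$/ && $so ~ /^[0-9]+$/ && ($si + $so > 0) { n++ }
        END { print n + 0 }
    '
}
```

For example, `vmstat 5 12 | swap_active_samples` reports how many of the samples showed swapping; a consistently non-zero count supports the memory-shortage diagnosis.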
Column
RESOURCE_NAME
Name of the resource
Current usage of the resource |
NUMBER |
INITIAL_ALLOCATION |
Initial allocation. This will be equal to the value specified for the resource in the initialization parameter file (UNLIMITED for infinite allocation). |
Resource value configured in the system |
SQL> select * from v$resource_limit;

RESOURCE_NAME         CURRENT_UTILIZATION MAX_UTILIZATION INITIAL_ALLOCATION LIMIT_VALUE
--------------------- ------------------- --------------- ------------------ ------------
processes                             113             152 170                170
sessions                              115             154 192                192
enqueue_locks                          57             117 2480               2480
enqueue_resources                      40              86 1064               UNLIMITED
ges_procs                               0               0 0                  0
ges_ress                                0               0 0                  UNLIMITED
ges_locks                               0               0 0                  UNLIMITED
ges_cache_ress                          0               0 0                  UNLIMITED
ges_reg_msgs                            0               0 0                  UNLIMITED
ges_big_msgs                            0               0 0                  UNLIMITED
ges_rsv_msgs                            0               0 0                  0

RESOURCE_NAME         CURRENT_UTILIZATION MAX_UTILIZATION INITIAL_ALLOCATION LIMIT_VALUE
--------------------- ------------------- --------------- ------------------ ------------
gcs_resources                           0               0 0                  0
gcs_shadows                             0               0 0                  0
dml_locks                               0              76 844                UNLIMITED
temporary_table_locks                   0               3 UNLIMITED          UNLIMITED
transactions                            2              12 211                UNLIMITED
branches                                0               1 211                UNLIMITED
cmtcallbk                               0               1 211                UNLIMITED
sort_segment_locks                     39             131 UNLIMITED          UNLIMITED
max_rollback_segments                  11              11 211                65535
max_shared_servers                      1               1 UNLIMITED          UNLIMITED
parallel_max_servers                    0               0 0                  3600

22 rows selected.

SQL>
Checking with lsnrctl services showed that no dispatcher had refused any connections, so too few dispatchers can also be ruled out as the cause, as the screenshot below shows.
Dec 7 04:02:13 ceglnx01 syslogd 1.4.1: restart.
Dec 8 07:53:22 ceglnx01 avahi-daemon[3706]: Invalid query packet.
Dec 8 08:20:16 ceglnx01 last message repeated 9 times
APPLIES TO:
***Checked for relevance on 22-MAR-2013***
· The number of sessions in the database is well below the upper or maximum limit defined in the parameter file.
Listener Log:
.....
TNS-12571: TNS:packet writer failure
Linux Error: 104: Connection reset by peer
TNS-12547: TNS:lost contact
Linux Error: 32: Broken pipe
The Operating system log (/var/log/messages) may show the following :
tnslsnr[7375]: segfault at 0000000000000018 rip 0000003eab668bb3 rsp 0000007fbfff9c70 error 4
Program terminated with signal 11, Segmentation fault.
.........
#1 0x00000032b74691f6 in free () from /lib64/tls/libc.so.6
#4 0x00000000004061cb in main ()
SIGSEGV is abnormal termination. This action will end the process.
-------------
Note: You may also use the top command to check the system memory usage.
SOLUTION
OR
3. Configure Hugepages on the OS. Ref : Note 361323.1
# cat /proc/meminfo |grep Hugepagesize
Hugepagesize: 2048 kB
BUG:6752308 - LISTENER DIED BY SEGFAULT AFTER TNS ERROR
1: Increase the system's physical memory.
Because the operating system had not been restarted for more than 100 days, the Linux server was rebooted at 2014-12-12 23:00. It has now been running for 3 days and the error has not recurred so far. Memory leaks can also lead to a shortage of memory, for example TNSListener Leaking Memory Using Dedicated Server (Doc ID 785742.1). For that reason options 2 and 3 above have not been applied yet; the plan is to let the system run for a while to verify this idea, and to try options 2 and 3 if the error appears again.
Looking at it from another angle, the error occurs a few minutes after 2 a.m. because two fairly large jobs run at that time. They consume a lot of server resources, which indirectly confirms the memory shortage.
The listener on a production system stopped abnormally, and listener.log reported the following error:
The operating system log /var/log/messages also showed errors like the following:
tnslsnr[5841]: segfault at 0000000000000018 rip 0000003eab66854d rsp 0000007fbfff9230 error 4
Metalink has this document: 549932.1
Symptoms:
1. Increase the physical memory of the system.
OR
2. Apply the Patch 6139856 for unpublished Bug 6139856 if available for your platform.
OR
3. Configure Hugepages on the OS. Ref : Note 361323.1
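Moreover, the operating system log shows Linux itself killing processes (this is a retrospective summary, and the information sits on an internal network with restricted access, so the log content cannot be posted).
The reason for saying it "just happened" to be the listener process is that /var/log/messages also shows earlier entries of oracle processes being killed, yet the listener did not stop at that time. So the process killed then was probably not the Oracle listener but some other non-local process.
Reference for my solution: http://www.itpub.net/thread-1870217-1-1.html
Problem description: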
SQL*Plus: Release 11.2.0.1.0 Production on Tue Jun 3 16:42:05 2014
[oracle@jkxtrac1 bin]$ sqlplus / as sysdba
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
Environment: operating system
Distributor ID: RedHatEnterpriseServer
Codename: Tikanga
Listener log:
TNS-12518: TNS:listener could not hand off client connection
TNS-00517: Lost contact
Alert log:
[root@jkxtrac1 log]# tail -n50 messages
Jun 3 14:27:42 jkxtrac1 avahi-daemon[8081]: Registering new address record for 10.199.102.23 on eth0.
Jun 3 14:28:56 jkxtrac1 gconfd (root-10454): 正在启动(版本 2.14.0),pid 10454 用户“root”
Jun 3 14:28:56 jkxtrac1 gconfd (root-10454): 地址“xml:readonly:/etc/gconf/gconf.xml.defaults”解析为位于 2 的只读配置源
Jun 3 14:28:58 jkxtrac1 pcscd: winscard.c:304:SCardConnect() Reader E-Gate 0 0 Not Found
Jun 3 14:28:59 jkxtrac1 nm-system-settings: ifcfg-rh: parsing /etc/sysconfig/network-scripts/ifcfg-lo ...
Jun 3 14:28:59 jkxtrac1 nm-system-settings: ifcfg-rh: parsing /etc/sysconfig/network-scripts/ifcfg-usb0 ...
Jun 3 14:28:59 jkxtrac1 nm-system-settings: ifcfg-rh: read connection 'System eth0'
Jun 3 14:53:04 jkxtrac1 kernel: qla2xxx 0000:8b:00.1: scsi(4:1:3): Abort command issued -- 1 34f 2002.
Jun 3 15:49:19 jkxtrac1 scim-bridge: The lockfile is destroied
Jun 3 16:14:00 jkxtrac1 kernel: qla2xxx 0000:8b:00.1: scsi(4:1:3): Abort command issued -- 1 cd7 2002.
Jun 3 16:29:00 jkxtrac1 scim-bridge: Panel client has not yet been prepared
Jun 3 16:29:00 jkxtrac1 scim-bridge: Cleanup, done. Exitting...
Jun 3 16:29:27 jkxtrac1 gconfd (root-10454): 退出
Jun 3 16:32:13 jkxtrac1 gconfd (root-1975): 地址“xml:readonly:/etc/gconf/gconf.xml.mandatory”解析为位于 0 的只读配置源
Jun 3 16:32:14 jkxtrac1 pcscd: winscard.c:304:SCardConnect() Reader E-Gate 0 0 Not Found
Jun 3 16:32:14 jkxtrac1 hcid[7308]: Default passkey agent (:1.18, /org/bluez/applet) registered
Tasks: 665 total, 2 running, 662 sleeping, 0 stopped, 1 zombie
Swap: 34933332k total, 0k used, 34933332k free, 3341088k cached
471 root 10 -5 0 0 0 S 0.9 0.0 0:36.27 kacpid
9342 grid 15 0 1323m 57m 40m S 0.7 0.2 0:48.72 oracle
8763 root 16 0 292m 35m 15m S 0.5 0.1 0:35.45 orarootagent.bi
7042 root 10 -5 0 0 0 S 0.2 0.0 0:03.46 kondemand/15
8798 root RT 0 249m 92m 56m S 0.2 0.3 0:04.35 cssdmonitor
30771 oracle 15 0 12.8g 51m 36m S 0.2 0.2 0:09.80 oracle
------------------------------ ------------------- --------------- ---------------------------------------- ----------------------------------------
enqueue_locks 39 52 19523 19523
ges_ress 9448 9448 32571 UNLIMITED
ges_reg_msgs 112 134 2730 UNLIMITED
RESOURCE_NAME CURRENT_UTILIZATION MAX_UTILIZATION INITIAL_ALLOCATION LIMIT_VALUE
gcs_shadows 36561 36563 UNLIMITED UNLIMITED
smartio_metadata_memory 0 0 0 UNLIMITED
temporary_table_locks 0 0 UNLIMITED UNLIMITED
cmtcallbk 0 2 1689 UNLIMITED
max_rollback_segments 11 11 1689 65535
max_shared_servers 1 1 UNLIMITED UNLIMITED
Problem solved; closing the thread.
Method:
1. Change the permissions of the oracle executable to 6751:
[oracle@jkxtrac1 bin]$ ls -l ./oracle
-rwxr-s--x 1 oracle asmadmin 239627031 Jan 21 18:59 ./oracle
[oracle@jkxtrac1 bin]$ ls -l oracle
-rwxr-s--x 1 oracle asmadmin 239627031 Jan 21 18:59 oracle
[oracle@jkxtrac1 bin]$ chmod 6751 oracle
[oracle@jkxtrac1 bin]$ ls -l oracle
-rwsr-x--x 1 oracle asmadmin 239627031 Jan 21 18:59 oracle
2. Reset the oracle executable with setasmgidwrap:
[root@jkxtrac1 ~]# cd /data/app/11.2.0/grid_1/bin/
[root@jkxtrac1 bin]# ./setasmgidwrap o=/home/oracle/app/oracle/product/11.2.0/dbhome_1/bin/oracle
Thanks to everyone for the help.