Tomcat log file directory: capturing log entries with a regular expression
1. Create the Hive table: apachelog
The statement is as follows:
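CREATE TABLE apachelog (
host STRING,       -- client address
identity STRING,
t_user STRING,
time STRING,       -- timestamp without the time zone offset
type STRING,       -- HTTP method, e.g. GET
http STRING,       -- request path
http_type STRING,  -- protocol, e.g. HTTP/1.1
status STRING,
agent STRING       -- last field of the line (the response size in the sample log)
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
-- RegexSerDe needs one capture group per column: 9 groups for 9 columns
"input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) \\[(.*?) .*?\\] \"([^ ]*) ([^ ]*) ([^ ]*)\" ([^ ]*) ([^ ]*)"
)
STORED AS TEXTFILE;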
Finally, load the log file:
LOAD DATA LOCAL INPATH '<absolute path of the log>' INTO TABLE apachelog;
2. A scheduled task can be added to run the log collection every hour:
crontab -e
0 * * * * /usr/sbin/sh <shell script>
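The script run by the cron entry is not shown here. The following is only a minimal sketch of what it might look like: the `LOG_FILE` default, the `HIVE_BIN` variable, and the function names are hypothetical placeholders; only `hive -e` and the `LOAD DATA` statement are taken from Hive itself.

```shell
#!/bin/sh
# Hypothetical collection script; adjust the assumed paths to your installation.
LOG_FILE="${LOG_FILE:-/path/to/tomcat/logs/access_log.txt}"  # assumed location
HIVE_BIN="${HIVE_BIN:-hive}"                                  # assumed to be on PATH

# Build the HiveQL statement that loads one local file into apachelog.
build_load_statement() {
    printf "LOAD DATA LOCAL INPATH '%s' INTO TABLE apachelog;" "$1"
}

# Load a log file, refusing to call Hive when the file does not exist.
load_log() {
    if [ ! -f "$1" ]; then
        echo "log file not found: $1" >&2
        return 1
    fi
    "$HIVE_BIN" -e "$(build_load_statement "$1")"
}

# Only touch Hive when explicitly asked to, e.g. "sh collect.sh run".
if [ "${1:-}" = "run" ]; then
    load_log "$LOG_FILE"
fi
```

Because `LOAD DATA` without `OVERWRITE` appends to the table, loading the same growing file every hour would add its earlier rows again; rotating the log before each load is one way to avoid that.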
The log format can be as follows:
127.0.0.1 - - [24/Apr/2016:09:55:45 +0800] "GET / HTTP/1.1" 200 11418
127.0.0.1 - - [24/Apr/2016:09:55:47 +0800] "GET / HTTP/1.1" 200 11418
127.0.0.1 - - [24/Apr/2016:09:57:52 +0800] "GET / HTTP/1.1" 200 11418
0:0:0:0:0:0:0:1 - - [24/Apr/2016:09:57:56 +0800] "GET / HTTP/1.1" 200 11418
0:0:0:0:0:0:0:1 - - [24/Apr/2016:09:57:56 +0800] "GET /tomcat.css HTTP/1.1" 200 5926
0:0:0:0:0:0:0:1 - - [24/Apr/2016:09:57:56 +0800] "GET /tomcat.png HTTP/1.1" 200 5103
0:0:0:0:0:0:0:1 - - [24/Apr/2016:09:57:56 +0800] "GET /bg-nav.png HTTP/1.1" 200 1401
0:0:0:0:0:0:0:1 - - [24/Apr/2016:09:57:56 +0800] "GET /asf-logo.png HTTP/1.1" 200 17811
0:0:0:0:0:0:0:1 - - [24/Apr/2016:09:57:56 +0800] "GET /bg-middle.png HTTP/1.1" 200 1918
0:0:0:0:0:0:0:1 - - [24/Apr/2016:09:57:56 +0800] "GET /bg-button.png HTTP/1.1" 200 713
0:0:0:0:0:0:0:1 - - [24/Apr/2016:09:57:56 +0800] "GET /bg-upper.png HTTP/1.1" 200 3103
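RegexSerDe returns NULL for every column of a line that does not match the pattern, so it can help to check a few lines before loading them. The sketch below splits a line into nine fields, one per column of apachelog, using `sed -E`. The function name `parse_line` and the `|` separator are illustrative choices, and the pattern is a POSIX ERE approximation of the Java regex (ERE has no non-greedy `.*?`), not the exact expression Hive evaluates.

```shell
#!/bin/sh
# Print the nine fields of one access-log line separated by "|".
# Prints nothing when the line does not have the expected shape.
parse_line() {
    printf '%s\n' "$1" | sed -E -n \
        's/^([^ ]*) ([^ ]*) ([^ ]*) \[([^ ]*) [^]]*\] "([^ ]*) ([^ ]*) ([^ ]*)" ([^ ]*) ([^ ]*)$/\1|\2|\3|\4|\5|\6|\7|\8|\9/p'
}
```

For example, `parse_line '127.0.0.1 - - [24/Apr/2016:09:55:45 +0800] "GET / HTTP/1.1" 200 11418'` prints `127.0.0.1|-|-|24/Apr/2016:09:55:45|GET|/|HTTP/1.1|200|11418`, while a line with a different layout produces no output.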