1. How to view table information in Hive
1.1 Viewing a table's columns
hive> desc smart_test.hdm;
OK
id string
prod_inst_id string
age int
money double
address string
day_id string
Time taken: 0.298 seconds, Fetched: 6 row(s)
1.2 Viewing the columns plus detailed table metadata, including the storage location
hive> desc extended smart_test.hdm;
OK
id string
prod_inst_id string
age int
money double
address string
day_id string
Detailed Table Information Table(tableName:hdm, dbName:smart_test, owner:bdp, createTime:1577604128, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:id, type:string, comment:null), FieldSchema(name:prod_inst_id, type:string, comment:null), FieldSchema(name:age, type:int, comment:null), FieldSchema(name:money, type:double, comment:null), FieldSchema(name:address, type:string, comment:null), FieldSchema(name:day_id, type:string, comment:null)], location:hdfs://host66:8020/user/hive/warehouse/smart_test.db/hdm, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=1}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[], parameters:{transient_lastDdlTime=1577604128}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE)
Time taken: 0.154 seconds, Fetched: 8 row(s)
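The `Detailed Table Information` line above is one long serialized struct, so individual fields are easiest to extract with a filter. A minimal sketch (the struct text below is an abbreviated copy of the output above; in a real session you would pipe `hive -S -e 'desc extended smart_test.hdm'` into the same `grep`):

```shell
# Abbreviated copy of the "Detailed Table Information" struct from above.
desc_line='Detailed Table Information Table(tableName:hdm, dbName:smart_test, owner:bdp, sd:StorageDescriptor(location:hdfs://host66:8020/user/hive/warehouse/smart_test.db/hdm, inputFormat:org.apache.hadoop.mapred.TextInputFormat))'

# Pull out just the location:... field (everything up to the next comma).
echo "$desc_line" | grep -o 'location:[^,]*'
# prints location:hdfs://host66:8020/user/hive/warehouse/smart_test.db/hdm
```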
1.3 Viewing the same information in a formatted, human-readable layout
hive> desc formatted smart_test.hdm;
OK
# col_name data_type comment
id string
prod_inst_id string
age int
money double
address string
day_id string
# Detailed Table Information
Database: smart_test
Owner: bdp
CreateTime: Sun Dec 29 07:22:08 GMT 2019
LastAccessTime: UNKNOWN
Protect Mode: None
Retention: 0
Location: hdfs://host66:8020/user/hive/warehouse/smart_test.db/hdm
Table Type: MANAGED_TABLE
Table Parameters:
transient_lastDdlTime 1577604128
# Storage Information
SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat: org.apache.hadoop.mapred.TextInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Storage Desc Params:
serialization.format 1
Time taken: 4.084 seconds, Fetched: 31 row(s)
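Because `desc formatted` prints one `Key: value` pair per line, fields are easy to script against. A sketch, simulating three lines of the output shown above (in practice, pipe `hive -S -e 'desc formatted smart_test.hdm'` into the awk):

```shell
# Simulate a few lines of "desc formatted" output and extract
# the value of the Location: field (second whitespace-separated field).
printf '%s\n' \
  'Owner:               bdp' \
  'Location:            hdfs://host66:8020/user/hive/warehouse/smart_test.db/hdm' \
  'Table Type:          MANAGED_TABLE' |
  awk '/^Location:/ {print $2}'
# prints hdfs://host66:8020/user/hive/warehouse/smart_test.db/hdm
```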
2. Checking how much space a table occupies
2.1 Size of an ordinary (non-partitioned) table, in bytes
[bdp@host66 ~]$ hadoop fs -ls /user/hive/warehouse/smart_test.db/hdm | awk '{a += $5} END {print a}'
48
This saves you from adding the file sizes up yourself. The following command lists the table's files in detail:
[bdp@host66 ~]$ hadoop fs -ls /user/hive/warehouse/smart_test.db/hdm
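The summation can be checked against a fake `hadoop fs -ls` listing, where the file size is the fifth whitespace-separated field. A minimal sketch (the file names and sizes below are made up for illustration; the awk totalling matches the command above):

```shell
# Simulate two lines of "hadoop fs -ls" output (size is field 5)
# and sum the sizes; 30 + 18 = 48 bytes, as in the example above.
printf '%s\n' \
  '-rw-r--r--   3 bdp hive         30 2019-12-29 07:22 /user/hive/warehouse/smart_test.db/hdm/000000_0' \
  '-rw-r--r--   3 bdp hive         18 2019-12-29 07:22 /user/hive/warehouse/smart_test.db/hdm/000001_0' |
  awk '{a += $5} END {print a}'
# prints 48
```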
2.2 Size of a single partition of a partitioned table, in GB
[bdp@host66 ~]$ hadoop fs -ls /user/hive/warehouse/smart_test.db/hdm/yyyymm=201601 | awk '{a += $5} END {print a/(1024*1024*1024)}'
39.709
This saves you from adding the file sizes up yourself. The following command lists the partition's files in detail:
[bdp@host66 ~]$ hadoop fs -ls /user/hive/warehouse/smart_test.db/hdm/yyyymm=201601
2.3 Total size of the table, in GB
[bdp@host66 ~]$ hadoop fs -du /user/hive/warehouse/smart_test.db/hdm | awk '{ SUM += $1 } END { print SUM/(1024*1024*1024) }'
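`hadoop fs -du` prints one byte count per file, which the awk above totals and converts to GB. (The `-du` command also accepts `-s` for a single summary line and `-h` for human-readable units, i.e. `hadoop fs -du -s -h <path>`.) The byte-to-GB conversion itself can be sanity-checked on fake `-du` output:

```shell
# Simulate "hadoop fs -du" output (byte count in field 1) and
# convert the summed total to GB: 1 GB + 2 GB = 3 GB.
printf '%s\n' \
  '1073741824  /user/hive/warehouse/smart_test.db/hdm/000000_0' \
  '2147483648  /user/hive/warehouse/smart_test.db/hdm/000001_0' |
  awk '{ SUM += $1 } END { print SUM/(1024*1024*1024) }'
# prints 3
```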
3. Using fsck to inspect file and block information on HDFS, including total file count and size
[bdp@host66 ~]$ hdfs fsck /user/hive/warehouse/wid_bigdata_1049.db/app_table_1584090796207
Status: HEALTHY
Total size: 2470065 B
Total dirs: 1
Total files: 601
Total symlinks: 0
Total blocks (validated): 600 (avg. block size 4116 B)
Minimally replicated blocks: 600 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 600 (100.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 2.0
Corrupt blocks: 0
Missing replicas: 600 (33.333332 %)
Number of data-nodes: 2
Number of racks: 1
Field descriptions:
Status: the overall result of this fsck check
Total size: total size of all files under the checked path
Total dirs: number of directories under the checked path
Total files: number of files under the checked path
Total symlinks: number of symbolic links under the checked path
Total blocks (validated): number of valid blocks under the checked path
Minimally replicated blocks: blocks that have at least the minimum required number of replicas
Over-replicated blocks: blocks with more replicas than the target replication factor
Under-replicated blocks: blocks with fewer replicas than the target replication factor
Mis-replicated blocks: blocks whose replica placement violates the block placement policy
Default replication factor: 3 means the default is three replicas per block (the block itself plus two copies)
Missing replicas: total number of missing replicas
Number of data-nodes: number of DataNodes in the cluster
Number of racks: number of racks in the cluster
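The percentages in the report follow directly from the counts. A worked check using the sample report above (600 blocks, default replication factor 3, average block replication 2.0; the variable names are just for illustration):

```shell
# 600 blocks should each have 3 replicas but average only 2,
# so 600*(3-2) = 600 replicas are missing out of 600*3 = 1800
# expected, i.e. 600/1800 = 33.3 %, matching the fsck report.
blocks=600; default_rf=3; avg_rf=2
missing=$(( blocks * (default_rf - avg_rf) ))
echo "missing replicas: $missing"
# prints missing replicas: 600
awk -v m="$missing" -v t=$(( blocks * default_rf )) \
  'BEGIN { printf "%.1f%%\n", 100 * m / t }'
# prints 33.3%
```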