A Simple Guide to Using Impala


wx63118e2bb7416 2022-09-02 13:44:14 Category: BigData


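© Copyright belongs to the author: this is an original work by 51CTO blog author wx63118e2bb7416. Contact the author for permission before reposting; unauthorized reposting may result in legal action.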

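Advantages of Impala


 1. Computation is memory-based: intermediate results do not need to be written to disk, which saves I/O.
 2. Queries do not need to be converted into MapReduce jobs. Impala accesses data in HDFS and HBase directly and schedules the work itself, so it is fast.
 3. It uses an I/O scheduling mechanism that supports data locality: data and computation are placed on the same machine wherever possible, which reduces network overhead.
 4. It supports a variety of file formats, such as TEXTFILE, SEQUENCEFILE, RCFile, and Parquet. // ORC is not supported
 5. It can access the Hive metastore and analyze Hive data directly.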

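Disadvantages of Impala


 1. It relies heavily on memory and depends completely on Hive.
 2. When new records or files are added to a table's data directory in HDFS, the table must be refreshed.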

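Impala architecture


 1. Impalad:
    Receives client requests, executes queries, and returns results to the central coordinator node;
    Runs as the daemon on each worker node, keeping in contact with the statestore and reporting its status.
  2. Catalog:
    Distributes table metadata to every impalad;
    Receives all requests from the statestore.
  3. Statestore:
      Collects resource information from each impalad process and the status of each node.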

A small impala-shell example


Options:
  -h, --help            show this help message and exit
  -i IMPALAD, --impalad=IMPALAD
                        <host:port> of impalad to connect to
                        [default: hadoop106:21000]
  -q QUERY, --query=QUERY
                        Execute a query without the shell [default: none]
  -f QUERY_FILE, --query_file=QUERY_FILE
                        Execute the queries in the query file, delimited by ;
                        [default: none]
  -k, --kerberos        Connect to a kerberized impalad [default: False]
  -o OUTPUT_FILE, --output_file=OUTPUT_FILE
                        If set, query results are written to the given file.
                        Results from multiple semicolon-terminated queries
                        will be appended to the same file [default: none]
  -B, --delimited       Output rows in delimited mode [default: False]
  --print_header        Print column names in delimited mode when pretty-
                        printed. [default: False]
  --output_delimiter=OUTPUT_DELIMITER
                        Field delimiter to use for output in delimited mode
                        [default: \t]
  -s KERBEROS_SERVICE_NAME, --kerberos_service_name=KERBEROS_SERVICE_NAME
                        Service name of a kerberized impalad [default: impala]
  -V, --verbose         Verbose output [default: True]
  -p, --show_profiles   Always display query profiles after execution
                        [default: False]
  --quiet               Disable verbose output [default: False]
  -v, --version         Print version information [default: False]
  -c, --ignore_query_failure
                        Continue on query failure [default: False]
  -r, --refresh_after_connect
                        Refresh Impala catalog after connecting
                        [default: False]
  -d DEFAULT_DB, --database=DEFAULT_DB
                        Issues a use database command on startup
                        [default: none]
  -l, --ldap            Use LDAP to authenticate with Impala. Impala must be
                        configured to allow LDAP authentication.
                        [default: False]
  -u USER, --user=USER  User to authenticate with. [default: root]
  --ssl                 Connect to Impala via SSL-secured connection
                        [default: False]
  --ca_cert=CA_CERT     Full path to certificate file used to authenticate
                        Impala's SSL certificate. May either be a copy of
                        Impala's certificate (for self-signed certs) or the
                        certificate of a trusted third-party CA. If not set,
                        but SSL is enabled, the shell will NOT verify Impala's
                        server certificate [default: none]
  --config_file=CONFIG_FILE
                        Specify the configuration file to load options. File
                        must have case-sensitive '[impala]' header. Specifying
                        this option within a config file will have no effect.
                        Only specify this as a option in the commandline.
                        [default: /root/.impalarc]
  --live_summary        Print a query summary every 1s while the query is
                        running. [default: False]
  --live_progress       Print a query progress every 1s while the query is
                        running. [default: False]
  --auth_creds_ok_in_clear
                        If set, LDAP authentication may be used with an
                        insecure connection to Impala. WARNING: Authentication
                        credentials will therefore be sent unencrypted, and
                        may be vulnerable to attack. [default: none]
  --ldap_password_cmd=LDAP_PASSWORD_CMD
                        Shell command to run to retrieve the LDAP password
                        [default: none]
  --var=KEYVAL          Define variable(s) to be used within the Impala
                        session. [default: none]

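Commonly used impala-shell command-line options

-h: show the help message
-i: host of the impalad daemon to connect to; the default port is 21000
-q: run a single SQL statement
-f: run a SQL script file
-o: write query results to the given file
-c: ignore failed statements and continue with the remaining SQL
-r: refresh Impala metadata after connecting
-p: display the query profile after each query executes
-B: output rows in delimited (unformatted) mode
--output_delimiter=character: field delimiter to use in delimited mode
--print_header: print column names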
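
To show how these options combine, here is a minimal Python sketch that assembles an impala-shell command line for a single query with delimited output. It only builds the argument list and does not contact a cluster. The host `hadoop106:21000` is the default shown in the help output above; the function name `build_impala_shell_cmd`, the table `demo_db.demo_table`, and the output path are hypothetical names used for illustration.

```python
import shlex


def build_impala_shell_cmd(query, impalad="hadoop106:21000",
                           output_file=None, delimiter=","):
    """Build an impala-shell argument list for one delimited-output query."""
    cmd = [
        "impala-shell",
        "-i", impalad,                       # impalad to connect to
        "-q", query,                         # run one statement, then exit
        "-B",                                # delimited (unformatted) output
        "--output_delimiter=" + delimiter,   # field separator in delimited mode
        "--print_header",                    # include column names
    ]
    if output_file is not None:
        cmd += ["-o", output_file]           # write results to a file
    return cmd


cmd = build_impala_shell_cmd(
    "select * from demo_db.demo_table limit 10",  # hypothetical table
    output_file="/tmp/result.csv",                # hypothetical path
)
print(shlex.join(cmd))  # shell-quoted form, handy for copy and paste
```

On a machine where impala-shell is installed, the list could be passed to `subprocess.run(cmd, check=True)` to actually run the query.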

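Commands inside impala-shell

help: show help information
shell: run HDFS or Linux commands without leaving impala-shell, to work with the HDFS or Linux file system
connect: connect to an impalad host
refresh: incrementally refresh metadata
invalidate metadata: reload all metadata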
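
The choice between the two metadata commands can be sketched as a small helper. This is an illustration under a stated assumption, not an official rule: `REFRESH` is used for a table Impala already knows about that only gained new data files, and `INVALIDATE METADATA` for everything else (for example, a table created in Hive). The function name `metadata_statement` and the table names are hypothetical.

```python
def metadata_statement(table=None, table_known_to_impala=True):
    """Pick the metadata statement to run inside impala-shell.

    Assumption: REFRESH always needs a table name, while
    INVALIDATE METADATA can target one table or the whole catalog.
    """
    if table is not None and table_known_to_impala:
        # Cheap, incremental: pick up new data files for one table.
        return f"REFRESH {table}"
    if table is not None:
        # Table created or altered outside Impala: reload it in full.
        return f"INVALIDATE METADATA {table}"
    # No table given: reload metadata for everything (expensive).
    return "INVALIDATE METADATA"


print(metadata_statement("demo_db.logs"))
print(metadata_statement("demo_db.new_table", table_known_to_impala=False))
print(metadata_statement())
```

Because a full invalidation reloads every table, the incremental form is usually the better default when only data files have changed.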

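Hive operations that Impala does not support

1. Impala's basic data types do not include BINARY.
2. Impala does not support WITH DBPROPERTIES() when creating a database.
3. Support for complex data types is incomplete. // best avoided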

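Dropping a database

Impala does not support the ALTER DATABASE syntax for modifying a database.
A database that is currently in use cannot be dropped, not even forcibly with CASCADE.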

Loading data into a table

LOAD DATA LOCAL is not supported, so data cannot be loaded from the local file system.
EXPORT and IMPORT are not supported.
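
One common workaround for the missing local load is to copy the file into HDFS first and then load it from there. The sketch below only builds the two steps as data and runs nothing; it assumes the `hdfs dfs -put` command and Impala's `LOAD DATA INPATH` statement are available in your environment. The function name `local_load_steps`, the paths, and the table name are hypothetical.

```python
import posixpath


def local_load_steps(local_file, hdfs_staging_dir, table):
    """Return (upload command, LOAD DATA statement) for a two-step load."""
    # Keep the original file name under the HDFS staging directory.
    hdfs_path = posixpath.join(hdfs_staging_dir, posixpath.basename(local_file))
    # Step 1: copy the local file into HDFS.
    upload = ["hdfs", "dfs", "-put", local_file, hdfs_path]
    # Step 2: move the HDFS file into the table's data directory.
    load_sql = f"LOAD DATA INPATH '{hdfs_path}' INTO TABLE {table}"
    return upload, load_sql


upload, load_sql = local_load_steps(
    "/home/user/data.csv",   # hypothetical local file
    "/tmp/staging",          # hypothetical HDFS directory
    "demo_db.demo_table",    # hypothetical table
)
print(" ".join(upload))
print(load_sql)
```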

Dropping partitions

Dropping multiple partitions in a single statement is not supported.
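
Given that restriction, a script can emit one `ALTER TABLE ... DROP PARTITION` statement per partition instead. The following is a minimal sketch that generates those statements as strings; the function name `drop_partition_statements`, the table `demo_db.logs`, and the partition column `dt` are hypothetical.

```python
def drop_partition_statements(table, partitions):
    """Generate one DROP PARTITION statement per partition spec.

    Each spec is a dict mapping partition column to value; string values
    are quoted, other values (for example integers) are written as-is.
    """
    statements = []
    for spec in partitions:
        parts = []
        for column, value in spec.items():
            literal = f"'{value}'" if isinstance(value, str) else str(value)
            parts.append(f"{column}={literal}")
        statements.append(
            f"ALTER TABLE {table} DROP IF EXISTS PARTITION ({', '.join(parts)})"
        )
    return statements


stmts = drop_partition_statements(
    "demo_db.logs",
    [{"dt": "2022-09-01"}, {"dt": "2022-09-02"}],
)
# Join with semicolons so the result can be saved as a script for `-f`.
print(";\n".join(stmts) + ";")
```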

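Queries

1. CLUSTER BY, DISTRIBUTE BY, and SORT BY are not supported. // because queries are not converted into MapReduce jobs
2. Impala does not support bucketed tables.
3. collect_set(col) and explode(col) are not supported. // array types are poorly supported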

Impala storage formats

Storing data in ORC format is not supported.

Summary:

Apart from the differences from Hive listed above, the rest of the syntax is much the same as Hive's.
