Introduction

        Ever since Kerberos was enabled on the Hadoop cluster, life has been painful: ops and dev complain about it constantly. The original reason for enabling Kerberos was to prevent accidental misuse of Hive and Impala, but Kerberos only provides authentication. Hive and Impala do authorization through their own command lines, and that authorization is coarse-grained. Is there a fine-grained option? That is exactly what Sentry was built for.

        Sentry provides a unified platform for securing data and relies on the existing Hadoop Kerberos implementation for authentication (this article leaves Kerberos out). Authorization is managed with commands (GRANT/REVOKE statements). Sentry controls access to each schema object in the Hive Metastore through a set of privileges such as SELECT and INSERT. Schema objects are the entities commonly found in data management, for example SERVER, DATABASE, TABLE, COLUMN, and URI, i.e. a file location in HDFS. Cloudera Search has its own set of privileges (such as QUERY) and objects (such as COLLECTION).

        Sentry is normally configured to deny access to services and data by default. Until a user is placed into a group that has been assigned the appropriate roles, that user has only limited access to the system.

        Sentry provides:

a hierarchy of objects, where lower-level objects automatically inherit privileges from the objects above them;
rules (roles) that bundle multiple object/privilege pairs;
groups that can be granted one or more roles;
users that can be assigned to one or more groups; a short HiveQL sketch of this model follows.
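
        As a minimal illustration of that model, the HiveQL below creates a role, grants it to a group, and gives it a database-level privilege (the analyst_role role and the analyst group are hypothetical and used only for this sketch; the real roles used in this article are created in the test section below):

-- privileges are granted to roles, and roles are granted to groups (never directly to users)
CREATE ROLE analyst_role;
GRANT ROLE analyst_role TO GROUP analyst;
-- a DATABASE-level grant is inherited by every table and column under that database
GRANT SELECT ON DATABASE filtered TO ROLE analyst_role;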



Planning

Host      | hive         | Sentry       | DB
hadoop1   |              |              |
hadoop2   |              |              |
hadoop3   |              |              |
hadoop4   |              |              |
hadoop5   | hive-server2 |              |
hadoop6   |              | sentry-store | mysql



Deployment



    Install Sentry

yum install sentry sentry-store sentry-hdfs-plugin -y

   


 



    Configure Sentry



        /etc/sentry/conf/sentry-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <property>
        <name>sentry.service.security.mode</name>
        <value>none</value>
    </property>

    <property> 
        <name>sentry.service.server.principal</name>  
        <value></value> 
    </property>  

    <property> 
        <name>sentry.service.server.keytab</name>  
        <value></value> 
    </property>  

    <property>
        <name>sentry.service.admin.group</name>
        <value>hive,impala,hue,admin</value>
    </property>

    <property>
        <name>sentry.service.reporting</name>
        <value>JMX</value>
    </property>

    <property>
        <name>sentry.service.web.enable</name>
        <value>true</value>
    </property>

    <property> 
        <name>sentry.service.web.port</name>  
        <value>51000</value> 
    </property>  

    <property> 
        <name>sentry.service.web.authentication.type</name>  
        <value>NONE</value> 
    </property> 

    <property>
        <name>sentry.service.allow.connect</name>
        <value>impala,hive,solr,hue</value>
    </property>

    <property>
        <name>sentry.verify.schema.version</name>
        <value>true</value>
    </property>

    <property>
        <name>sentry.service.server.rpc-address</name>
        <value>hadoop6</value>
    </property>

    <property>
        <name>sentry.service.server.rpc-port</name>
        <value>8038</value>
    </property>

    <property>
        <name>sentry.store.jdbc.url</name>
        <value>jdbc:mysql://172.31.217.156:3306/sentry</value>
    </property>

    <property>
        <name>sentry.store.jdbc.user</name>
        <value>root</value>
    </property>

    <property>
        <name>sentry.store.jdbc.password</name>
        <value>123456</value>
    </property>

    <property>
        <name>sentry.store.jdbc.driver</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>

    <property>
        <name>sentry.hive.server</name>
        <value>hadoop5</value>
    </property>

    <property>
        <name>sentry.store.group.mapping</name>
        <value>org.apache.sentry.provider.common.HadoopGroupMappingService</value>
    </property>
</configuration>

        Note: add the hive, impala, and hue groups to sentry.service.admin.group in sentry-site.xml. If an end user belongs to one of these admin groups, that user has administrative rights on the Sentry server.
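
        Because sentry.store.group.mapping is set to HadoopGroupMappingService, group membership is resolved through Hadoop's group mapping, which by default falls back to the local OS groups. A quick way to check which groups a user is seen in (a sketch, assuming the HDFS client is configured and the default shell-based group mapping is in use):

# groups as resolved by Hadoop's group mapping (queried via the NameNode)
hdfs groups hive
# local OS groups, which the default shell-based mapping reads
id hive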



    Initialize Sentry

        Because MySQL is used as the backing database, Sentry needs the MySQL JDBC connector. Copy Hive's mysql-connector-java.jar into Sentry's lib directory:

cp /usr/lib/hive/lib/mysql-connector-java.jar /usr/lib/sentry/lib/



        Initialize the DB

cd /etc/sentry/conf
sentry --command schema-tool --conffile ./sentry-site.xml --dbType mysql --initSchema
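
        As a quick sanity check (assuming a mysql client is available on the node you run it from), confirm that the Sentry schema tables were created in the sentry database configured above:

mysql -h 172.31.217.156 -u root -p123456 -e 'SHOW TABLES FROM sentry;'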




    Start Sentry

/etc/init.d/sentry-store restart




Test the Sentry web UI

http://172.31.217.156:51000/
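
        If you have no browser access to this network, a minimal command-line check (just verifying that the web service answers on port 51000) could look like this:

curl -s -o /dev/null -w '%{http_code}\n' http://172.31.217.156:51000/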


 



Adjust the Hive configuration



    /etc/hive/conf/hive-site.xml

<configuration>

    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://172.31.217.156:3306/metastore</value>
    </property>

    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>

    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>hive</value>
    </property>

    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>123456</value>
    </property>
    
    <property>
        <name>datanucleus.readOnlyDatastore</name>
        <value>false</value>
    </property>

    <property> 
        <name>datanucleus.fixedDatastore</name>
        <value>false</value> 
    </property>

    <property>
        <name>datanucleus.autoCreateSchema</name>
        <value>true</value>
    </property>

    <property>
        <name>datanucleus.autoCreateTables</name>
        <value>true</value>
    </property>

    <property>
        <name>datanucleus.autoCreateColumns</name>
        <value>true</value>
    </property>

    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>

    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>hadoop1:23125,hadoop6:23125</value>
    </property>

    <property>
        <name>hive.auto.convert.join</name>
        <value>true</value>
    </property>

    <property>
        <name>hive.metastore.schema.verification</name>
        <value>false</value>
    </property>

    <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/user/hive/warehouse</value>
    </property>

    <property>
        <name>hive.warehouse.subdir.inherit.perms</name>
        <value>true</value>
    </property>

    <property>
        <name>hive.metastore.uris</name>
        <value>thrift://hadoop5:9083</value>
    </property>

    <property>
        <name>hive.metastore.client.socket.timeout</name>
        <value>36000</value>
    </property>

    <property>
        <name>hive.zookeeper.quorum</name>
        <value>hadoop2:2181,hadoop3:2181,hadoop4:2181</value>
    </property>

    <property>
        <name>hive.server2.thrift.port</name>
        <value>10000</value>
    </property>

    <property>
        <name>hive.server2.thrift.bind.host</name>
        <value>0.0.0.0</value>
    </property>

    <property>
        <name>hive.server2.thrift.min.worker.threads</name>
        <value>2</value>
    </property>

    <property>
        <name>hive.server2.thrift.max.worker.threads</name>
        <value>10</value>
    </property>

    <property>  
        <name>hive.metastore.authorization.storage.checks</name>  
        <value>true</value>  
    </property>  

    <property>
        <name>dfs.client.read.shortcircuit</name>
        <value>true</value>
    </property>

    <property>
        <name>dfs.domain.socket.path</name>
        <value>/var/lib/hadoop-hdfs/dn_socket</value>
    </property>

    <property>
        <name>hive.execution.engine</name>
        <value>spark</value>
    </property>

    <property>
        <name>hive.enable.spark.execution.engine</name>
        <value>true</value>
    </property>

    <property>
        <name>spark.home</name>
        <value>/opt/programs/spark_1.6.0</value>
    </property>


    <!-- Sentry Hiveserver2 config-->
    <property> 
        <name>hive.sentry.conf.url</name>  
        <value>file:///etc/hive/conf/sentry-site.xml</value> 
    </property> 

    <property>
        <name>hive.server2.session.hook</name>
        <value>org.apache.sentry.binding.hive.HiveAuthzBindingSessionHook</value>
    </property>

    <property>
        <name>hive.security.authorization.task.factory</name>
        <value>org.apache.sentry.binding.hive.SentryHiveAuthorizationTaskFactoryImpl</value>
    </property>

    <!-- Sentry hivemeastore config -->
    <property>
        <name>hive.metastore.filter.hook</name>
        <value>org.apache.sentry.binding.metastore.SentryMetaStoreFilterHook</value>
    </property>

    <property>  
        <name>hive.metastore.pre.event.listeners</name>  
        <value>org.apache.sentry.binding.metastore.MetastoreAuthzBinding</value>  
    </property>

    <property>
        <name>hive.metastore.event.listeners</name>  
        <value>org.apache.sentry.binding.metastore.SentryMetastorePostEventListener</value>  
    </property>

</configuration>

        Note: pay particular attention to the Sentry-related settings above.



    /etc/hive/conf/sentry-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
<property>
    <name>hive.sentry.server</name>
    <value>hadoop6</value>
</property>
<property>
    <name>sentry.service.security.mode</name>
    <value>none</value>
</property>
<property>
    <name>sentry.hive.provider.backend</name>
    <value>org.apache.sentry.provider.db.SimpleDBProviderBackend</value>
</property>
<property>
    <name>sentry.service.client.server.rpc-address</name>
    <value>hadoop6</value>
</property>
<property> 
    <name>sentry.service.client.server.rpc-connection-timeout</name>  
    <value>200000</value> 
</property> 
<property>
    <name>sentry.service.client.server.rpc-port</name>
    <value>8038</value>
</property>
<property>
    <name>hive.sentry.provider</name>
    <value>org.apache.sentry.provider.file.HadoopGroupResourceAuthorizationProvider</value>
</property>
<property>
    <name>hive.sentry.failure.hooks</name>
    <value>com.cloudera.navigator.audit.hive.HiveSentryOnFailureHook</value>
</property>
<property> 
    <name>sentry.hive.server</name>  
    <value>hadoop5</value> 
</property>  
<property> 
    <name>sentry.hive.testing.mode</name>  
    <value>true</value> 
</property> 
</configuration>

        Note: this sentry-site.xml is not the Sentry server configuration on hadoop6; it is a separate Sentry client configuration for Hive on hadoop5.



    Add the library jars

        Copy the sentry*.jar and shiro-*.jar files from /usr/lib/sentry/lib/ on hadoop6 to /usr/lib/hive/lib/ on hadoop5, for example as sketched below.
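
        A minimal sketch, run on hadoop6 and assuming SSH access to hadoop5 as root:

# run on hadoop6: copy the Sentry binding jars and shiro jars to the HiveServer2 host
scp /usr/lib/sentry/lib/sentry*.jar /usr/lib/sentry/lib/shiro-*.jar root@hadoop5:/usr/lib/hive/lib/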



    Note!!!

        The official documentation says that to enable the Sentry service, the Hive warehouse directory permissions must be 771, i.e. you would run:

hadoop fs -chmod -R 771 /user/hive/warehouse
hadoop fs -chown -R hive:hive /user/hive/warehouse

        Note: for reasons specific to this setup, that step is not performed here; the relevant directories keep their 1777 permissions instead.
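
        To confirm the current mode of the warehouse directory before deciding (assuming the HDFS client is configured on the host):

hadoop fs -ls -d /user/hive/warehouse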



Restart the Hive services



    Restart the metastore

/etc/init.d/hive-metastore restart



    Restart hiveserver2

/etc/init.d/hive-server2 restart



Hive permission tests

        Note: you must log in as the hive user.



    View current permissions

        beeline


        show current roles


        show roles

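
        Put together, the session looks roughly like this (a sketch; with no authentication configured the password is effectively ignored):

beeline -u "jdbc:hive2://172.31.217.155:10000" -n hive -p hive
> show current roles;
> show roles;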



    Create the admin role

CREATE ROLE admin_role;
GRANT ROLE admin_role TO GROUP admin;
GRANT ALL ON server hadoop5 to role admin_role;

        




    Grant the admin role to the hive group

GRANT ROLE admin_role TO GROUP hive;


        Note: this is needed because, under Sentry, users connecting in have no privileges at all by default.



    Prepare the test data

        /tmp/sample.csv

10.1.2.3,US,android,createNote
10.200.88.99,FR,windows,updateNote
10.1.2.3,US,android,updateNote
10.200.88.77,FR,ios,createNote
10.1.4.5,US,windows,updateTag



    Load the test data into sensitive

create database sensitive;
create table sensitive.events (ip STRING, country STRING, client STRING, action STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
load data local inpath '/tmp/sample.csv' overwrite into table sensitive.events;




    Load the test data into filtered

create database filtered;
create table filtered.events as select country, client, action from sensitive.events;
create table filtered.events_usonly as select * from filtered.events where country = 'US';
0: jdbc:hive2://172.31.217.155:10000> create database filtered;
INFO  : Compiling command(queryId=hive_20170314200303_b3042205-49f1-475b-972c-9255e0faf9c0): create database filtered
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
INFO  : Completed compiling command(queryId=hive_20170314200303_b3042205-49f1-475b-972c-9255e0faf9c0); Time taken: 0.083 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing command(queryId=hive_20170314200303_b3042205-49f1-475b-972c-9255e0faf9c0): create database filtered
INFO  : Starting task [Stage-0:DDL] in serial mode
INFO  : Completed executing command(queryId=hive_20170314200303_b3042205-49f1-475b-972c-9255e0faf9c0); Time taken: 0.029 seconds
INFO  : OK
No rows affected (0.124 seconds)
0: jdbc:hive2://172.31.217.155:10000> create table filtered.events as select country, client, action from sensitive.events;
INFO  : Compiling command(queryId=hive_20170314200303_4fc58355-18bb-4f2b-87de-a894cf8688ec): create table filtered.events as select country, client, action from sensitive.events
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:country, type:string, comment:null), FieldSchema(name:client, type:string, comment:null), FieldSchema(name:action, type:string, comment:null)], properties:null)
INFO  : Completed compiling command(queryId=hive_20170314200303_4fc58355-18bb-4f2b-87de-a894cf8688ec); Time taken: 0.137 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing command(queryId=hive_20170314200303_4fc58355-18bb-4f2b-87de-a894cf8688ec): create table filtered.events as select country, client, action from sensitive.events
INFO  : Query ID = hive_20170314200303_4fc58355-18bb-4f2b-87de-a894cf8688ec
INFO  : Total jobs = 1
INFO  : Launching Job 1 out of 1
INFO  : Starting task [Stage-1:MAPRED] in serial mode
INFO  : In order to change the average load for a reducer (in bytes):
INFO  :   set hive.exec.reducers.bytes.per.reducer=<number>
INFO  : In order to limit the maximum number of reducers:
INFO  :   set hive.exec.reducers.max=<number>
INFO  : In order to set a constant number of reducers:
INFO  :   set mapreduce.job.reduces=<number>
INFO  : Starting Spark Job = ccae0717-542b-4827-a411-800bc660910b
INFO  : =====Spark Job[ccae0717-542b-4827-a411-800bc660910b] statistics=====
INFO  : HIVE
INFO  : 	RECORDS_OUT_1_filtered.events: 5
INFO  : 	CREATED_FILES: 1
INFO  : 	RECORDS_IN: 5
INFO  : 	DESERIALIZE_ERRORS: 0
INFO  : Spark Job[ccae0717-542b-4827-a411-800bc660910b] Metrics
INFO  : 	ExecutorDeserializeTime: 1558
INFO  : 	ExecutorRunTime: 1149
INFO  : 	ResultSize: 2077
INFO  : 	JvmGCTime: 181
INFO  : 	ResultSerializationTime: 1
INFO  : 	MemoryBytesSpilled: 0
INFO  : 	DiskBytesSpilled: 0
INFO  : 	BytesRead: 3259
INFO  : Execution completed successfully
INFO  : Starting task [Stage-0:MOVE] in serial mode
INFO  : Moving data to: hdfs://dev-dalu:8020/user/hive/warehouse/filtered.db/events from hdfs://dev-dalu:8020/user/hive/warehouse/filtered.db/.hive-staging_hive_2017-03-14_20-03-52_904_3142690968931776036-2/-ext-10001
INFO  : Starting task [Stage-3:DDL] in serial mode
INFO  : Starting task [Stage-2:STATS] in serial mode
INFO  : Table filtered.events stats: [numFiles=1, numRows=5, totalSize=105, rawDataSize=100]
INFO  : Completed executing command(queryId=hive_20170314200303_4fc58355-18bb-4f2b-87de-a894cf8688ec); Time taken: 19.663 seconds
INFO  : OK
No rows affected (19.809 seconds)
0: jdbc:hive2://172.31.217.155:10000> create table filtered.events_usonly as select * from filtered.events where country = 'US';
INFO  : Compiling command(queryId=hive_20170314200404_31d0b74a-9bd8-45e7-b24e-ebd5b19c10cf): create table filtered.events_usonly as select * from filtered.events where country = 'US'
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:events.country, type:string, comment:null), FieldSchema(name:events.client, type:string, comment:null), FieldSchema(name:events.action, type:string, comment:null)], properties:null)
INFO  : Completed compiling command(queryId=hive_20170314200404_31d0b74a-9bd8-45e7-b24e-ebd5b19c10cf); Time taken: 0.133 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing command(queryId=hive_20170314200404_31d0b74a-9bd8-45e7-b24e-ebd5b19c10cf): create table filtered.events_usonly as select * from filtered.events where country = 'US'
INFO  : Query ID = hive_20170314200404_31d0b74a-9bd8-45e7-b24e-ebd5b19c10cf
INFO  : Total jobs = 1
INFO  : Launching Job 1 out of 1
INFO  : Starting task [Stage-1:MAPRED] in serial mode
INFO  : In order to change the average load for a reducer (in bytes):
INFO  :   set hive.exec.reducers.bytes.per.reducer=<number>
INFO  : In order to limit the maximum number of reducers:
INFO  :   set hive.exec.reducers.max=<number>
INFO  : In order to set a constant number of reducers:
INFO  :   set mapreduce.job.reduces=<number>
INFO  : Starting Spark Job = c14fb371-f3ff-43a0-bd47-28c369edd92c
INFO  : =====Spark Job[c14fb371-f3ff-43a0-bd47-28c369edd92c] statistics=====
INFO  : HIVE
INFO  : 	CREATED_FILES: 1
INFO  : 	RECORDS_OUT_1_filtered.events_usonly: 3
INFO  : 	RECORDS_IN: 5
INFO  : 	DESERIALIZE_ERRORS: 0
INFO  : Spark Job[c14fb371-f3ff-43a0-bd47-28c369edd92c] Metrics
INFO  : 	ExecutorDeserializeTime: 1589
INFO  : 	ExecutorRunTime: 1488
INFO  : 	ResultSize: 2095
INFO  : 	JvmGCTime: 194
INFO  : 	ResultSerializationTime: 1
INFO  : 	MemoryBytesSpilled: 0
INFO  : 	DiskBytesSpilled: 0
INFO  : 	BytesRead: 3480
INFO  : Execution completed successfully
INFO  : Starting task [Stage-0:MOVE] in serial mode
INFO  : Moving data to: hdfs://dev-dalu:8020/user/hive/warehouse/filtered.db/events_usonly from hdfs://dev-dalu:8020/user/hive/warehouse/filtered.db/.hive-staging_hive_2017-03-14_20-04-21_023_473130083580993490-2/-ext-10001
INFO  : Starting task [Stage-3:DDL] in serial mode
INFO  : Starting task [Stage-2:STATS] in serial mode
INFO  : Table filtered.events_usonly stats: [numFiles=1, numRows=3, totalSize=65, rawDataSize=62]
INFO  : Completed executing command(queryId=hive_20170314200404_31d0b74a-9bd8-45e7-b24e-ebd5b19c10cf); Time taken: 4.362 seconds
INFO  : OK
No rows affected (4.507 seconds)

        Note: loading into filtered produces this much output because Hive here uses Spark as its execution engine and the data is copied over from sensitive, so the SQL is translated into a Spark job.



    Create the test role

create role test_role;
GRANT ALL ON DATABASE filtered TO ROLE test_role;
use sensitive;
GRANT SELECT(ip) on TABLE sensitive.events TO ROLE test_role;
GRANT ROLE test_role TO GROUP test;




    Privileges granted to the test role


        Note:

                admin_role has administrator privileges, can read and write every database, and is granted to the admin and hive groups (which correspond to groups on the operating system).
                test_role can only read and write the filtered database and can only read the ip column of the events table in the sensitive database; it is granted to the test group.
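
        To double-check what each role actually holds, list its grants from beeline (run as a user in one of the Sentry admin groups):

SHOW GRANT ROLE admin_role;
SHOW GRANT ROLE test_role;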



    Log in to Hive as admin and check

        Log in

beeline -u "jdbc:hive2://172.31.217.155:10000" -n admin -p admin

        Current roles


 



    Log in to Hive as test and check

        Log in

beeline -u "jdbc:hive2://172.31.217.155:10000" -n test -p test

        Current roles




        Queries



            admin queries filtered

beeline -u "jdbc:hive2://172.31.217.155:10000" -n admin -p admin
>use filtered;
>select * from events;

                


    

 



            test queries sensitive

beeline -u "jdbc:hive2://172.31.217.155:10000" -n test -p test
>use sensitive;
>select * from events;
>select ip from events;




        The test user tries to grant itself privileges

GRANT ROLE admin_role TO GROUP test;
