Hive CREATE AS

Hive is a data warehouse infrastructure built on top of Hadoop that provides tools for data summarization, querying, and analysis. One of its useful features is the CREATE TABLE ... AS SELECT statement (often abbreviated CTAS, and called CREATE AS in this article), which creates a new table and populates it with the results of a query in a single step.

Syntax

The syntax for the CREATE AS statement is as follows:

CREATE TABLE new_table_name AS
SELECT column1, column2, ...
FROM existing_table_name
WHERE condition;
  • new_table_name is the name of the new table that will be created.
  • column1, column2, ... are the columns that will be selected from the existing table. If you want to include all columns, you can use *.
  • existing_table_name is the name of the existing table from which the data will be selected.
  • WHERE condition is an optional clause that filters the rows copied into the new table. If it is omitted, every row returned by the query is copied.
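
As a rough sketch of the variations described above, the first statement below copies every column with * and omits the optional WHERE clause. The second assumes you also want to choose a storage format, which Hive allows by placing a STORED AS clause before AS. The table name new_table_name_orc and the choice of ORC are illustrative assumptions, not part of the syntax shown earlier.

```sql
-- Copy all columns and all rows (no WHERE clause).
CREATE TABLE new_table_name AS
SELECT *
FROM existing_table_name;

-- Hypothetical variant: same query, but store the result as ORC.
-- new_table_name_orc is an illustrative name.
CREATE TABLE new_table_name_orc
STORED AS ORC
AS
SELECT *
FROM existing_table_name;
```

In both cases the new table's column names and types are derived from the SELECT list.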

Example

Let's say we have a table called employees with the following columns and data:

id   name    age   salary
1    John    30    50000
2    Alice   25    40000
3    Bob     35    60000
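
To try the example yourself, the sample table could be set up roughly as follows. The column types are assumptions (the table above lists only the values), and INSERT ... VALUES requires Hive 0.14 or later.

```sql
-- Assumed column types; adjust to match your own data.
CREATE TABLE employees (
  id     INT,
  name   STRING,
  age    INT,
  salary INT
);

-- Load the three sample rows.
INSERT INTO TABLE employees VALUES
  (1, 'John',  30, 50000),
  (2, 'Alice', 25, 40000),
  (3, 'Bob',   35, 60000);
```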

Now, we want to create a new table called top_earning_employees to store the employees with a salary greater than 45000. We can use the CREATE AS statement to achieve this:

CREATE TABLE top_earning_employees AS
SELECT id, name, age, salary
FROM employees
WHERE salary > 45000;

The above query will create a new table top_earning_employees with the same four columns and the following contents:

id   name    age   salary
1    John    30    50000
3    Bob     35    60000

The new table will only contain the rows where the salary is greater than 45000.
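
One way to check the result, sketched here with standard Hive statements, is to inspect the new table's columns and then read its rows. The ORDER BY is only there to make the output order predictable.

```sql
-- Show the column names and types Hive derived from the SELECT list.
DESCRIBE top_earning_employees;

-- Read back the copied rows.
SELECT id, name, age, salary
FROM top_earning_employees
ORDER BY id;
-- With the sample data above, this returns the rows for John and Bob.
```

Note that the new table is a snapshot: rows added to employees later do not appear in top_earning_employees unless the table is rebuilt.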

Benefits of Using CREATE AS

The CREATE AS statement in Hive offers several benefits:

  1. Efficiency: Because the query results are stored in the new table, later queries can read them directly. You avoid rewriting and rerunning the same calculations or transformations every time the data is needed.

  2. Data Manipulation: The CREATE AS statement allows you to manipulate and transform the data during the creation of the new table. You can use functions, aggregations, and filters to extract only the necessary data.

  3. Data Security: Creating a new table that holds only a subset of the rows or columns of an existing table can help limit exposure of sensitive data. You can grant users access to the new table alone, so that they see only the information they need.
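
The second and third points can be sketched against the employees table from the example. The table names salary_by_age_band and employee_directory and the user name analyst are hypothetical, and the GRANT statement only applies if Hive authorization is enabled on your cluster.

```sql
-- Data manipulation: aggregate while creating the table.
-- Aliases give the derived columns readable names.
CREATE TABLE salary_by_age_band AS
SELECT
  CASE WHEN age < 30 THEN 'under_30' ELSE '30_and_over' END AS age_band,
  COUNT(*)    AS employee_count,
  AVG(salary) AS avg_salary
FROM employees
GROUP BY CASE WHEN age < 30 THEN 'under_30' ELSE '30_and_over' END;

-- Data security: expose only non-sensitive columns (no salary).
CREATE TABLE employee_directory AS
SELECT id, name
FROM employees;

-- Assumes authorization is enabled; analyst is a hypothetical user.
GRANT SELECT ON TABLE employee_directory TO USER analyst;
```

With the three sample rows, salary_by_age_band would hold two rows: one band containing only Alice, and one containing John and Bob.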

Conclusion

In this article, we have learned about the CREATE AS statement in Hive, which allows users to create a new table based on the results of a query. We have seen the syntax of the statement and how to use it with an example. The CREATE AS statement in Hive provides efficiency, data manipulation, and data security benefits, making it a powerful tool for data analysis and management.

erDiagram
    employees ||--|| top_earning_employees : "CREATE AS"
