Hive ETL: Loading JSON, XML, Text Data Examples
Hive, as an ETL and data warehousing tool on top of the Hadoop ecosystem, provides functionalities like...
Tables, Partitions, and Buckets are the components of Hive data modeling.
Hive Partitions are a way to organize a table by dividing it into parts based on the values of partition keys.
Partitioning is helpful when a table has one or more partition keys. Partition keys are the basic elements that determine how the data is stored in the table.
For example:
Suppose a client has e-commerce data for its India operations, with the data for each state (38 states in this example) kept together as a whole. If we take the state column as the partition key and partition the India data on it, we get 38 partitions, one per state, so that each state's data can be viewed separately in its own partition.
Sample code snippet for partitions:
create table allstates(state string, district string, enrolments string) row format delimited fields terminated by ',';
load data local inpath '/home/hduser/Desktop/AllStates.csv' into table allstates;
create table state_part(district string, enrolments string) partitioned by (state string);
For dynamic partitioning we have to set these properties:
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table state_part partition(state) select district, enrolments, state from allstates;
The code above creates a staging table, loads the CSV file into it, creates a partitioned table, enables dynamic partitioning, and then inserts the data so that Hive writes one partition per state.
Buckets in Hive are used to segregate Hive table data into multiple files or directories; bucketing enables efficient querying and sampling.
Step 1) Creating a bucketed table, as shown below.
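Since the original screenshot is not available, here is a sketch of what the bucketed table definition might look like. It assumes a table named sample_bucket with the employee columns mentioned in Step 2 and the 4 buckets referred to in Step 3; the bucketing column and exact types are illustrative, not taken from the original screenshots.

```sql
-- Sketch: create a table clustered into 4 buckets on first_name.
-- Table name, bucketing column, and column types are assumptions.
CREATE TABLE sample_bucket (
    first_name STRING,
    job_id     INT,
    department STRING,
    salary     INT,
    country    STRING
)
CLUSTERED BY (first_name) INTO 4 BUCKETS
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
```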
Step 2) Loading data into the bucketed table
Assuming the "employees" table has already been created in the Hive system, this step loads data from the employees table into the sample_bucket table. Before moving the employees data into buckets, make sure the table contains columns named first_name, job_id, department, salary, and country.
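The load step might look like the following sketch. The table and column names follow the assumptions above; hive.enforce.bucketing is a real Hive property that must be enabled on Hive versions prior to 2.0 so that writes respect the declared bucket count.

```sql
-- On Hive versions before 2.0, enable bucketed writes explicitly:
SET hive.enforce.bucketing = true;

-- Copy rows from the existing employees table into the bucketed table;
-- Hive hashes the bucketing column to distribute rows across 4 files.
INSERT OVERWRITE TABLE sample_bucket
SELECT first_name, job_id, department, salary, country
FROM employees;
```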
Step 3) Displaying the 4 buckets created in Step 1
After the insert completes, the data from the employees table is distributed across the 4 bucket files created in Step 1.
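To verify the bucket files, you can list the table's warehouse directory from the Hive CLI. The path below is an assumption; it depends on hive.metastore.warehouse.dir and the database the table was created in.

```sql
-- Each bucket is written as a separate file (e.g. 000000_0 .. 000003_0).
-- The warehouse path shown is illustrative; adjust it for your setup.
dfs -ls /user/hive/warehouse/sample_bucket;
```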