
In continuation of the previous blog, Part-1, I am going to cover some more Hive changes introduced in the CDP (Cloudera Data Platform) upgrade.

Beeline

The Hive CLI was deprecated from CDH 5.16 onwards and has been removed in CDP Private Cloud Base. Hive in CDP has switched completely to Beeline, so any former Hive CLI calls should be replaced with Beeline and tested. Beeline uses a JDBC connection to Hive on Tez to execute commands.
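
For example, a script that previously called the Hive CLI can be moved to Beeline as sketched below. The HiveServer2 host, port, and user are placeholders; substitute your own endpoint:

# Connect interactively (host/port/user are assumptions; use your HiveServer2 endpoint)
beeline -u "jdbc:hive2://hs2-host.example.com:10000/default" -n myuser

# Run a single statement non-interactively, the way 'hive -e' used to
beeline -u "jdbc:hive2://hs2-host.example.com:10000/default" -n myuser -e "SHOW DATABASES;"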

Handling table reference syntax

Hive 3.x rejects `db.table` in SQL queries, as described in the HIVE-16907 bug fix. A dot (.) is not allowed in table names. To reference the database and table in a table name, enclose both in backticks, as follows: `db`.`table`.
Steps

  1. Find a table having the problematic table reference.
    For example, math.students appears in a CREATE TABLE statement.
  2. Enclose the database name and the table name in backticks.
    CREATE TABLE `math`.`students` (name VARCHAR(64), age INT, gpa DECIMAL(3,2));

After the CDP upgrade, I also faced this issue while running a Sqoop command, because I had specified the table as
"--hive-table db.table" and got the below error:

SemanticException: Table or database name may not contain dot(.) character

Sqoop passes it to Hive as `db.table`. Use --hive-database and --hive-table to specify the Hive database and table individually in your Sqoop import/export. For example:

--hive-database db 
--hive-table table
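
Put together, a full import might look like the following sketch. The JDBC URL, credentials, and table names are placeholders:

sqoop import \
  --connect jdbc:mysql://source-db.example.com:3306/sourcedb \
  --username etl_user -P \
  --table students \
  --hive-import \
  --hive-database math \
  --hive-table students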

Casting timestamps

Before Upgrade to CDP
Casting a numeric value to a timestamp produced a result that reflected the time zone of the cluster. For example, 1597217764557 corresponds to 2020-08-12 00:36:04 PDT. Running the following query casts the numeric value to a timestamp in PDT:

> SELECT CAST(1597217764557 AS TIMESTAMP); 
| 2020-08-12 00:36:04 |       

After Upgrade to CDP
Casting a numeric value to a timestamp produces a result that reflects UTC rather than the time zone of the cluster. Running the following query casts the numeric value to a timestamp in UTC:

> SELECT CAST(1597217764557 AS TIMESTAMP); 
| 2020-08-12 07:36:04.557  |          

Action Required
Change applications. Do not cast a numeric value to obtain a local-time result. The built-in functions from_utc_timestamp and to_utc_timestamp can be used to mimic the pre-upgrade behavior.
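
For instance, to reproduce the old cluster-local result for the example above (assuming the cluster's time zone was America/Los_Angeles; substitute yours):

> SELECT from_utc_timestamp(CAST(1597217764557 AS TIMESTAMP), 'America/Los_Angeles');
| 2020-08-12 00:36:04.557 |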

Handling output of greatest and least functions

To calculate the greatest (or least) of a list of column values, you need to work around a problem that occurs when any of those columns holds a NULL value.

Before Upgrade to CDP
The greatest function returned the highest value of the list of values. The least function returned the lowest value of the list of values.

After Upgrade to CDP
The greatest and least functions return NULL when one or more arguments are NULL.

Action Required
Use NULL filters or the nvl function on the columns you use as arguments to the greatest or least functions.

SELECT greatest(nvl(col1, 0), nvl(col2, 0)) FROM mytable; -- replace 0 with a default value suitable for your data

TRUNCATE TABLE on an external table

Hive 3 does not support TRUNCATE TABLE on external tables. Truncating an external table results in an error. You can truncate an external table if you change your applications to set a table property to purge data.

Before Upgrade to CDP
Some legacy versions of Hive supported TRUNCATE TABLE on external tables.

After Upgrade to CDP Private Cloud Base
By default, TRUNCATE TABLE is supported only on managed tables. Attempting to truncate an external table results in the following error:

Error: org.apache.spark.sql.AnalysisException: Operation not allowed: TRUNCATE TABLE on external tables

Action Required
Change applications. Do not attempt to run TRUNCATE TABLE on an external table.
Alternatively, change applications to alter a table property to set external.table.purge to true to allow truncation of an external table:

ALTER TABLE mytable SET TBLPROPERTIES ('external.table.purge'='true');
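
With the property set, truncation goes through (mytable is the same placeholder table name as above):

TRUNCATE TABLE mytable;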

File structure changes in HDFS

  1. Empty table
    Before the upgrade to CDP, creating an empty table produced a 0 KB file at the given HDFS location.

    After the upgrade to CDP, no file is created at the given HDFS location when an empty table is created.
  2. UNION ALL
    Before the upgrade to CDP, a table created via UNION ALL produced files directly at the given HDFS location, without any folder structure.

    After the upgrade to CDP, the files are written into per-branch subfolders at the given HDFS location, as sketched below.
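
A minimal sketch for illustration; the table and column names are placeholders, and the HIVE_UNION_SUBDIR naming reflects Hive on Tez behavior:

CREATE TABLE union_example AS
SELECT id FROM t1
UNION ALL
SELECT id FROM t2;

-- The table location now holds one subfolder per UNION ALL branch, e.g.:
--   <table location>/HIVE_UNION_SUBDIR_1/000000_0
--   <table location>/HIVE_UNION_SUBDIR_2/000000_0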

I hope you have enjoyed this post. Please like and share and feel free to comment if you have any suggestions or feedback.
