In continuation of the previous blog, Part-1, I am going to cover some more Hive changes introduced in the CDP (Cloudera Data Platform) upgrade.
Beeline
The Hive CLI was deprecated from CDH 5.16 onwards and has been removed in CDP PvC (Private Cloud) Base. Hive in CDP has switched completely to Beeline, so any former Hive CLI calls should be replaced with Beeline and tested. Beeline uses a JDBC connection to Hive on Tez to execute commands.
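For example, a former Hive CLI invocation can be replaced with an equivalent Beeline invocation over JDBC. A minimal sketch, assuming a HiveServer2 instance at hs2-host:10000 (the host, port, and query are placeholders):
# Former Hive CLI call
hive -e "SELECT COUNT(*) FROM students;"
# Equivalent Beeline call over JDBC to HiveServer2
beeline -u "jdbc:hive2://hs2-host:10000/default" -e "SELECT COUNT(*) FROM students;"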
Handling table reference syntax
Hive 3.x rejects `db.table` in SQL queries, as described by the HIVE-16907 bug fix. A dot (.) is not allowed in table names. To reference the database and table in a table name, enclose both in backticks as follows: `db`.`table`.
Steps
- Find a table having the problematic table reference. For example, math.students appears in a CREATE TABLE statement.
- Enclose the database name and the table name in backticks:
CREATE TABLE `math`.`students` (name VARCHAR(64), age INT, gpa DECIMAL(3,2));
After the CDP upgrade, I also faced the same issue while running a Sqoop command, because I had specified the table as "--hive-table db.table" and got the below error:
SemanticException: Table or database name may not contain dot(.) character
Sqoop sends it as `db.table`. Use "--hive-database" and "--hive-table" to specify individually the Hive database and table that you want to use in your Sqoop import/export. For example:
--hive-database db
--hive-table table
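Putting it together, a sketch of the corrected Sqoop import (the JDBC connect string, credentials path, and table names are placeholders, not values from my original command):
sqoop import \
  --connect jdbc:mysql://db-host:3306/sourcedb \
  --username dbuser \
  --password-file /user/dbuser/db.password \
  --table students \
  --hive-import \
  --hive-database math \
  --hive-table students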
Casting timestamps
Before Upgrade to CDP
Casting a numeric type value into a timestamp could be used to produce a result that reflected the time zone of the cluster. For example, 1597217764557 is 2020-08-12 00:36:04 PDT. Running the following query casts the numeric to a timestamp in PDT:
> SELECT CAST(1597217764557 AS TIMESTAMP);
| 2020-08-12 00:36:04 |
After Upgrade to CDP
Casting a numeric type value into a timestamp produces a result that reflects UTC instead of the time zone of the cluster. Running the following query casts the numeric to a timestamp in UTC:
> SELECT CAST(1597217764557 AS TIMESTAMP);
| 2020-08-12 07:36:04.557 |
Action Required
Change applications. Do not cast from a numeral to obtain a local time zone. The built-in functions from_utc_timestamp and to_utc_timestamp can be used to mimic the behavior before the upgrade.
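As a sketch, assuming the cluster formerly ran in the America/Los_Angeles (PDT) time zone, from_utc_timestamp reproduces the pre-upgrade result:
> SELECT from_utc_timestamp(CAST(1597217764557 AS TIMESTAMP), 'America/Los_Angeles');
| 2020-08-12 00:36:04.557 |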
Handling output of greatest and least functions
To calculate the greatest (or least) value in a column, you need to work around a problem that occurs when the column has a NULL value.
Before Upgrade to CDP
The greatest function returned the highest value of the list of values. The least function returned the lowest value of the list of values.
After Upgrade to CDP
After the upgrade, greatest and least return NULL when one or more arguments are NULL.
Action Required
Use NULL filters or the nvl function on the columns you use as arguments to the greatest or least functions.
SELECT greatest(nvl(col1, 0), nvl(col2, 0)) FROM mytable; -- replace 0 with the default value to use in case of NULL
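Alternatively, a minimal sketch of the NULL-filter approach, which skips rows with NULL arguments instead of substituting a default (mytable, col1, and col2 are placeholder names):
SELECT greatest(col1, col2)
FROM mytable
WHERE col1 IS NOT NULL AND col2 IS NOT NULL;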
TRUNCATE TABLE on an external table
Hive 3 does not support TRUNCATE TABLE on external tables. Truncating an external table results in an error. You can truncate an external table if you change your applications to set a table property to purge data.
Before Upgrade to CDP
Some legacy versions of Hive supported TRUNCATE TABLE on external tables.
After Upgrade to CDP Private Cloud Base
By default, TRUNCATE TABLE is supported only on managed tables. Attempting to truncate an external table results in the following error:
Error: org.apache.spark.sql.AnalysisException: Operation not allowed: TRUNCATE TABLE on external tables
Action Required
Change applications. Do not attempt to run TRUNCATE TABLE on an external table.
Alternatively, change applications to alter a table property, setting external.table.purge to true to allow truncation of an external table:
ALTER TABLE mytable SET TBLPROPERTIES ('external.table.purge'='true');
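With the property set, the truncation then succeeds on the external table from the statement above:
TRUNCATE TABLE mytable;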
File structure changes in HDFS
- Empty Table
Before Upgrade to CDP, a 0 KB file was created at the given HDFS location when an empty table was created.
After Upgrade to CDP, no file is created at the given HDFS location when an empty table is created.
- UNION ALL
Before Upgrade to CDP, files were created with no folder structure at the given HDFS location when a table was created with UNION ALL.
After Upgrade to CDP, the files are created in subfolders (one per branch of the union) at the given HDFS location when a table is created with UNION ALL.
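As an illustration, assuming a table created with UNION ALL as below, Hive on Tez typically writes each branch of the union into its own subdirectory (subdirectory names such as HIVE_UNION_SUBDIR_1 are typical but can vary by version and configuration; the table and paths are placeholders):
CREATE TABLE all_students AS
SELECT name FROM students_2019
UNION ALL
SELECT name FROM students_2020;
-- Before upgrade: .../all_students/000000_0, .../all_students/000001_0
-- After upgrade:  .../all_students/HIVE_UNION_SUBDIR_1/000000_0
--                 .../all_students/HIVE_UNION_SUBDIR_2/000000_0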
I hope you have enjoyed this post. Please like and share and feel free to comment if you have any suggestions or feedback.