HPLSQL - procedural SQL on Hadoop
What:
Hive Hybrid
Procedural SQL On Hadoop (HPL/SQL) is a tool that implements procedural SQL for
Hive, Open
Source(Apache License 2.0)
, included in
Hive 2.0
Who:
The
author of the tool is Dmitry Tolpeko.
Its supported by hive user community - user@hive.apache.org
Code available - https://github.com/apache/hive/tree/master/hplsql
Why:
To brought procedural capability to hive with pl/sql
skills.
Overview
- HPSQL Brings stored procedure programming to the Hadoop world.
- HPL/SQL is an easiest and quickest way to migrate PL/SQL code
- Make SQL-on-Hadoop More Dynamic
- HPL/SQL allows you to work with multiple systems in a single script, so you can take the best of all worlds for different types of workloads and easily integrate them.
- For developers coming from Oracle and SQL Server, these procedures will feel very familiar and will allow you to port a lot of your existing PL/SQL
Requirements
- Java 1.6 or higher
- Hadoop 2.x
- included with Hive 2.0 but its not really need hive to run
Syntax
hplsql -e 'query' | -f file [-main procname] [-d | --define | -hiveconf | -hivevar var=value ...] [-version | --version] [-trace | --trace] [-H | --help]
if -main option is not specified, HPL/SQL starts executing all statements from the beginning of the script
Features
Built-in Functions
- Temporary tables
- Stored Procedures, Functions and Packages
- Exception and Condition Handling
- Processing Engine
- MR
- Tez
- Spark(not supported by Mapr)
- Exception and Condition Handling
- On-the-fly SQL Conversion
- Polyglot Persistence
- UDF
HPL/SQL key features:
Build in function - http://www.hplsql.org/doc
Code sample -
creating sample procedure and function
Temporary tables
/opt/mapr/hive/hive-2.1/bin/hplsql
-f /opt/mapr/hive/hive-2.1/bin/temp_table.sql
Cursor support
/opt/mapr/hive/hive-2.1/bin/hplsql
-f /opt/mapr/hive/hive-2.1/bin/iterate_table.sql
sql generation on the fly
/opt/mapr/hive/hive-2.1/bin/hplsql
-f /opt/mapr/hive/hive-2.1/bin/create_dept_table.sql
PLSQL style Packages support
/opt/mapr/hive/hive-2.1/bin/hplsql
-f /opt/mapr/hive/hive-2.1/bin/pck.sql
Exception and Condition Handling
hplsql.onerror =
exception | seterror |
stop;
DECLARE <exception>
CONDITION;
DECLARE EXIT HANDLER FOR <
<exception>
PRINT
‘<exception
message>';
/opt/mapr/hive/hive-2.1/bin/hplsql
-f \
/opt/mapr/hive/hive- 2.1/bin/TDA_demo_table_eh.sql
Working with multiple sources - mysql, hive, hdfs
Map Object
/opt/mapr/hive/hive-2.1/bin/hplsql -f /opt/mapr/hive/hive-2.1/bin/access_mysql.sql
UDF in hive
write function in /opt/mapr/hive/hive-2.1/bin/hplsqlrc file and just add the file in hive.
Steps below
- Hive – start shell
- ADD FILE /opt/mapr/hive/hive-2.1/bin/hplsqlrc
- CREATE TEMPORARY FUNCTION hplsql AS 'org.apache.hive.hplsql.Udf';
- use pawan_hplsql;
- SELECT hello(name) FROM orders;
Challenges
- HPL/SQL offers the fastest way to start working with Hadoop. Later you can re-design and implement advanced data processing workflows using Spark, Tez, Storm, Flink and other frameworks, but right now you can use your current skills and existing code to run your business logic on Hadoop.
- Mapr hplsql only support mr and tez as processing engine.
- its still a new product. Documentation is limited and there is not much help available on internet.
- Database connectivity on mapr
- Mysql worked
- Oracle ??
- Phoenix failed