Thursday, September 14, 2017

Importing a relational database table to an HBase table - Sqoop

Sqoop is the best tool for importing a relational database table into an HBase table.

It's a two-step process:

1. Create the HBase table

2. Import the data from the relational table into the HBase table using sqoop import

  • HBase - create the table in the HBase shell

[mapr@hostname lib]$ hbase shell

hbase(main):002:0> create 'DV_PRODUCTION_BRANCHES', 'DETAILS'

  • Sqoop - copy the Oracle JDBC driver and load the data into the HBase table using the commands below.

1. Copy the Oracle JDBC driver to the Sqoop lib directory - /opt/mapr/sqoop/sqoop-1.4.6/lib/

2. Run sqoop import
sqoop import --connect jdbc:oracle:thin:@LOCALHOST:1521/DVTST --username DV_PRD --password dv_tst \
--table DV_PRODUCTION_BRANCHES  \
--columns "PRODUCTION_BRANCH_ID,NAME" \
--hbase-table 'DV_PRODUCTION_BRANCHES' \
--column-family DETAILS \
--hbase-row-key PRODUCTION_BRANCH_ID \
-m 1

Points to be noted -
  • DV_PRODUCTION_BRANCHES is the relational table in the DVTST database.
  • DV_PRODUCTION_BRANCHES is also the name of the HBase table, with column family DETAILS and row key PRODUCTION_BRANCH_ID.
  • Sqoop did not permit multiple column families at the time this blog was written. You need to create the HBase table with multiple column families first and then run a separate Sqoop import for each column family.
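
To sanity-check the import, a quick look from the HBase shell (assuming the table and column family created above) should show the imported rows:

[mapr@hostname lib]$ hbase shell
hbase(main):001:0> scan 'DV_PRODUCTION_BRANCHES', {COLUMNS => ['DETAILS'], LIMIT => 5}
hbase(main):002:0> count 'DV_PRODUCTION_BRANCHES'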

Saturday, September 2, 2017

Apache Phoenix


What:
  • Apache Phoenix is a SQL skin over a NoSQL database (HBase).
  • It enables OLTP and operational analytics on HBase.
  • Apache Phoenix is fully integrated with other Hadoop products such as Spark, Hive, Pig, Flume, and MapReduce.

Who/When:
  • The project was created by Salesforce.
  • It was originally open-sourced on GitHub and became a top-level Apache project on 22 May 2014. 


Challenge:
  • The challenge today is blending the scale and performance of NoSQL with the ease of use of SQL.



  HBase
  • HBase is a NoSQL column-family database
  • Stores data as sorted maps of byte arrays
  • Exposes only low-level APIs
  • To read or write data, you write Java code or use the low-level API

Why Phoenix

  • A relational database engine supporting OLTP for Hadoop, using Apache HBase as its backing store.
  • Transforms SQL queries into HBase calls, hiding the complexities of the NoSQL layer.
  • Low latency.
  • Reduces the amount of code users need to write.
  • Performance optimizations transparent to the user (scans, server-side filters, secondary indexes).

Phoenix and HBase together

Accessing HBase data with Phoenix can be easier than using the HBase API directly:

SELECT column FROM table WHERE column > 30

Versus


scan 'table', {FILTER => "ValueFilter(>, 'binary:30')", LIMIT => 10}

Versus

Configuration conf = HBaseConfiguration.create();
HTable t = new HTable(conf, "table");
Scan scan = new Scan();
scan.setFilter(new ValueFilter(CompareFilter.CompareOp.GREATER,
        new BinaryComparator(Bytes.toBytes(30))));
ResultScanner s = t.getScanner(scan);
Result r;
while ((r = s.next()) != null) {
    // read the value of one column from the Result object
    byte[] value = r.getValue(Bytes.toBytes("columnfamily"), Bytes.toBytes("column"));
    System.out.println(Bytes.toString(value));
}
s.close();
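
For comparison, the SQL version is typically issued through Phoenix's standard JDBC driver. A minimal sketch (the ZooKeeper quorum, table name, and column name are placeholders, not from the original query):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class PhoenixQueryExample {
    public static void main(String[] args) throws Exception {
        // URL format is jdbc:phoenix:<zookeeper quorum>; localhost:2181 is a placeholder
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost:2181");
             Statement stmt = conn.createStatement();
             // my_table / val are placeholder names standing in for the generic SQL above
             ResultSet rs = stmt.executeQuery("SELECT val FROM my_table WHERE val > 30")) {
            while (rs.next()) {
                System.out.println(rs.getInt(1));
            }
        }
    }
}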


Architecture



  • Phoenix pushes processing to the server - most "by hand" API accesses do not use co-processors
  • Phoenix supports and uses secondary indexes
  • Phoenix uses "every trick in the book" based on various factors: the HBase version, metadata and query, reverse scans, small scans, skip scans, etc.
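
A quick way to see which of these optimizations a given query picks up is Phoenix's EXPLAIN, which can be run on any statement, for example against the us_zip table created in the next section (the exact plan output varies by version and statistics):

EXPLAIN SELECT zip, city FROM us_zip WHERE zip = '06095';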


A few Phoenix SQL statements:

Create table  -

CREATE TABLE IF NOT EXISTS us_zip (
      zip CHAR(5) NOT NULL,
      city VARCHAR NOT NULL,
      hhc BIGINT
      CONSTRAINT my_pk PRIMARY KEY (zip, city))
DATA_BLOCK_ENCODING='NONE', VERSIONS=5, MAX_FILESIZE=2000000,
COMPRESSION='LZ4', TRANSACTIONAL=true
SPLIT ON ('a', 'p', 'r');

Insert a few records -

UPSERT INTO us_zip VALUES('60500', 'New York', 81431);
UPSERT INTO us_zip VALUES('60501', 'Dallas', 91233);
UPSERT INTO us_zip VALUES('06095', 'Windsor', 92433);

Query -
SELECT zip, city, hhc
FROM us_zip
WHERE hhc = (SELECT MAX(hhc) FROM us_zip);
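
With the three rows upserted above, this should return the Windsor row ('06095', 'Windsor', 92433), since it has the largest hhc value.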

Views on top of an HBase table -

CREATE VIEW "test_log" ( k VARCHAR PRIMARY KEY, "name"."value" VARCHAR);

CREATE VIEW my_view ( new_col SMALLINT )
    AS SELECT * FROM my_table WHERE k = 100;


Phoenix benefits
  • SQL data types built in
  • Schemas, DDL, HBase properties
  • Composite primary keys
  • Map existing HBase tables
  • Write to HBase, read from Phoenix
  • Salting
  • Indexes
  • Client-side rewriting
  • Master-slave configuration
  • Parallel scanning with a final client-side merge sort
  • RPC batching
  • Use of secondary indexes if available
  • Rewrites for multi-tenant tables

  • Server-side push down
  • Filters
  • Skip scans
  • Parallel aggregation
  • TopN (coprocessor)

  • Joins and subqueries
  • Inner, left, right, full outer join, cross join, semi join, anti join

  • Query Server - similar to a REST server
  • Protobuf 3.0 over HTTP
  • JDBC thin client (see the connection sketch below)
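
A minimal sketch of connecting through that thin client (assumes the Phoenix thin-client jar is on the classpath; the Query Server host is a placeholder and 8765 is the usual default port):

import java.sql.Connection;
import java.sql.DriverManager;

public class ThinClientExample {
    public static void main(String[] args) throws Exception {
        // The thin-client URL goes through the Phoenix Query Server over HTTP
        Connection conn = DriverManager.getConnection(
                "jdbc:phoenix:thin:url=http://queryserver-host:8765;serialization=PROTOBUF");
        conn.close();
    }
}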

Spark Integration

Secondary indexes
  • At query time, the optimizer will use an index if it contains all columns referenced in the query and produces the most efficient execution plan

Global index
  • CREATE INDEX my_idx ON sales.opportunity(last_updated_date DESC)
  • Read-optimized (use case)
  • Separate table per index
  • Indexes on transactional tables have ACID guarantees

Local index
  • CREATE LOCAL INDEX my_index ON my_table (v1)
  • Write-optimized (use case)
  • Uses the same table - different column family

Functional index
  • Supported

By default, unless hinted, an index will not be used for a query that references a column that isn’t part of the index. 
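
For example, with the index on v1 above and a hypothetical column v2 that is not covered by that index, the index can still be forced with a hint (syntax per the Phoenix secondary indexing documentation):

-- Without the hint, the index would normally be skipped because v2 is not part of it
SELECT /*+ INDEX(my_table my_index) */ v2 FROM my_table WHERE v1 = 'foo';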



  Future road map
  • Calcite support
  • No generic YARN or Tez layer
  • Offset support
  • Local index re-implementation
