What is it : Drill is a distributed SQL query engine. Its closest counterpart is Spark SQL: Spark contains many subprojects, and Spark SQL is the one that compares directly with Drill. If you need to perform complex math, statistics, or machine learning, Apache Spark is a good place to start.
SQL vs SQL-like : Drill supports ANSI SQL:2003 rather than a SQL-like dialect.
Data Formats : Apache Drill can discover the schema on the fly as you query data. It integrates with several data sources, such as Hive, HBase, MongoDB, file systems, and RDBMSs. Input formats such as Avro, CSV, TSV, PSV, Parquet, Hadoop sequence files, and many others can be used in Drill with ease. The best format for Drill is Parquet.
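To make the schema-on-the-fly idea concrete, here is a minimal Python sketch that builds a Drill query directly against a file, with no table definition created beforehand. It assumes the file is reachable through the `dfs` storage plugin; the helper name, the file path, and the column names are hypothetical and only for illustration.

```python
def file_query(path, columns=("*",), limit=10, plugin="dfs"):
    """Build a Drill SQL statement that reads a file in place.

    Drill addresses a file as <plugin>.`<path>`; the backticks let the
    path contain slashes and dots. No CREATE TABLE step is involved,
    because Drill infers the schema while it reads the data.
    """
    if "`" in path:
        raise ValueError("path must not contain backticks")
    column_list = ", ".join(columns)
    return f"SELECT {column_list} FROM {plugin}.`{path}` LIMIT {int(limit)}"


# Hypothetical file and columns, for illustration only.
print(file_query("/data/logs/events.json", columns=("user_id", "event"), limit=5))
# prints: SELECT user_id, event FROM dfs.`/data/logs/events.json` LIMIT 5
```

The resulting string can be pasted into the Drill shell or sent through any of the access methods Drill offers.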
Access : There are multiple options for accessing Drill: the Drill shell, the web interface, the REST interface, or JDBC/ODBC drivers.
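As an illustration of the REST option, the sketch below prepares a query request using only the Python standard library. It assumes a Drillbit whose web interface listens on `localhost:8047` and accepts SQL at `/query.json`; adjust the host and port for your cluster. The helper names `build_query_request` and `run_query` are made up for this example.

```python
import json
import urllib.request


def build_query_request(sql, base_url="http://localhost:8047"):
    """Prepare (but do not send) a POST request for Drill's REST query endpoint."""
    payload = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/query.json",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(sql, base_url="http://localhost:8047"):
    """Send the query to a running Drillbit and return the decoded JSON reply."""
    with urllib.request.urlopen(build_query_request(sql, base_url)) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the request needs no running cluster, so it is safe to inspect.
request = build_query_request("SELECT * FROM sys.version")
print(request.get_method(), request.full_url)
# prints: POST http://localhost:8047/query.json
```

With a Drillbit running, `run_query("SELECT * FROM sys.version")` should return a dictionary; in the versions I have seen, the result rows sit under a `rows` key.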
Security : Views can aggregate data from several sources and hide the underlying complexity of those sources. Security through impersonation leverages views: together, impersonation and views offer fine-grained security at the file level. A data owner can create a view that selects a limited set of data from the raw data source, then grant file system privileges that let other users execute that view. Those users can query the underlying data through the view without being able to read the underlying file directly.
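The view-based approach can be sketched as follows. This small Python helper only generates the `CREATE OR REPLACE VIEW` statement a data owner might run; it assumes the writable `dfs.tmp` workspace, and the view name, columns, and source file are hypothetical. Granting access to the view itself happens in the file system and is not shown here.

```python
def create_view_sql(view_name, columns, source, workspace="dfs.tmp"):
    """Build a Drill statement that exposes only selected columns of a source.

    The view lives in a workspace; users who may execute the view see
    just the listed columns, not the raw file behind it.
    """
    column_list = ", ".join(columns)
    return (
        f"CREATE OR REPLACE VIEW {workspace}.`{view_name}` AS "
        f"SELECT {column_list} FROM {source}"
    )


# Hypothetical example: publish name and city, keep other columns hidden.
statement = create_view_sql(
    "customer_public",
    ["name", "city"],
    "dfs.`/data/raw/customers.parquet`",
)
print(statement)
```

Other users would then query `dfs.tmp.customer_public` instead of the Parquet file.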
The table below describes, at a high level, some of the key considerations for picking the right "SQL-on-Hadoop" technology:
| | | Drill | Hive | Impala | Spark SQL |
|---|---|---|---|---|---|
| Key Use Cases | | Self-service Data Exploration, Interactive BI / Ad-hoc queries | Batch / ETL / Long-running jobs | Interactive BI / Ad-hoc queries | SQL as part of Spark pipelines / Advanced analytic workflows |
| Data Sources | Files Support | Parquet, JSON, Text, all Hive file formats | Yes (all Hive file formats) | Yes (Parquet, Sequence, RC, Text, AVRO ...) | Parquet, JSON, Text, all Hive file formats |
| | HBase/MapR-DB | Yes | Yes | Yes | Yes |
| | Beyond Hadoop | Yes | No | No | Yes |
| Data Types | Relational | Yes | Yes | Yes | Yes |
| | Complex/Nested | Yes | Limited | No | Limited |
| Metadata | Schema-less/Dynamic schema | Yes | No | No | Limited |
| | Hive Meta store | Yes | Yes | Yes | Yes |
| SQL / BI tools | SQL support | ANSI SQL | HiveQL | HiveQL | ANSI SQL (limited) & HiveQL |
| | Client support | ODBC/JDBC | ODBC/JDBC | ODBC/JDBC | ODBC/JDBC |
| | Beyond Memory | Yes | Yes | Yes | Yes |
| | Optimizer | Limited | Limited | Limited | Limited |
| Platform | Latency | Low | Medium | Low | Low (in-memory) / Medium |
| | Concurrency | High | Medium | High | Medium |
| Decentralized Granular Security | | Yes | No | No | No |
You may want to edit conf/drill-env.sh for memory settings and conf/drill-override.conf for the cluster ID and the ZooKeeper host and port.
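To show roughly what those two files contain, the Python sketch below renders the relevant fragments as text. It assumes the `drill.exec` block with `cluster-id` and `zk.connect` in drill-override.conf, and the `DRILL_HEAP` and `DRILL_MAX_DIRECT_MEMORY` variables in drill-env.sh; check the templates shipped with your Drill version, since names can differ between releases. The cluster ID, host names, and memory sizes are placeholders.

```python
def render_drill_override(cluster_id, zk_hosts):
    """Return a drill-override.conf fragment with cluster ID and ZooKeeper quorum."""
    zk_connect = ",".join(zk_hosts)  # host:port pairs, comma separated
    return (
        "drill.exec: {\n"
        f'  cluster-id: "{cluster_id}",\n'
        f'  zk.connect: "{zk_connect}"\n'
        "}\n"
    )


def render_drill_env(heap, direct_memory):
    """Return drill-env.sh lines that set heap and direct memory limits."""
    return (
        f'export DRILL_HEAP="{heap}"\n'
        f'export DRILL_MAX_DIRECT_MEMORY="{direct_memory}"\n'
    )


# Placeholder values, for illustration only.
print(render_drill_override("drillbits1", ["localhost:2181"]))
print(render_drill_env("4G", "8G"))
```

Every Drillbit that should join the same cluster needs the same cluster ID and ZooKeeper connection string.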