Apache Drill -
Apache Drill is a schema-free SQL query engine for Hadoop, NoSQL, and cloud storage. It lets us explore, visualize, and query different datasets without first having to fix a schema or run ETL jobs.
Apache Drill can also analyze multi-structured and nested data directly in non-relational data stores, without transforming or restricting the data.
Apache Drill is the first distributed SQL query engine with a schema-free JSON model, similar to the data model of:
✓ Elasticsearch
✓ MongoDB
✓ Other NoSQL databases
✓ And so on
Apache Drill is very useful for professionals who already work with SQL databases and BI tools such as Pentaho, Tableau, and QlikView.
Apache Drill also supports:
✓ RESTful APIs
✓ ANSI SQL
✓ JDBC/ODBC drivers
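As a rough illustration of the RESTful interface, the sketch below builds a request for Drill's REST query endpoint using only the Python standard library. It assumes a Drillbit running locally on the default web port 8047; the helper names `build_query_request` and `run_query` are made up for this example and are not part of Drill.

```python
import json
import urllib.request

# Assumption: a Drillbit is running locally with the REST API on port 8047.
DRILL_URL = "http://localhost:8047/query.json"


def build_query_request(sql, url=DRILL_URL):
    """Build (but do not send) a POST request for Drill's REST query endpoint."""
    payload = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(sql, url=DRILL_URL):
    """Send the query and return the decoded JSON response (needs a live Drill)."""
    with urllib.request.urlopen(build_query_request(sql, url)) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage, only against a running Drill instance:
#   result = run_query("SELECT full_name, salary FROM cp.`employee.json` LIMIT 3")
#   for row in result.get("rows", []):
#       print(row)
```

The query in the usage comment reads `employee.json`, a sample file that ships on Drill's classpath, so it should work on a fresh install without any storage configuration.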
Great Features of Apache Drill -
Its main features are:
✓ Schema-free JSON document model, similar to MongoDB and Elasticsearch
✓ Code reusability
✓ Easy to use and developer friendly
✓ High-performance Java-based API
✓ Memory management system
✓ Industry-standard APIs: ANSI SQL, ODBC/JDBC, and RESTful APIs
How does Drill achieve performance?
✓ Distributed query optimization and execution
✓ Columnar execution
✓ Optimistic execution
✓ Pipelined execution
✓ Runtime compilation and code generation
✓ Vectorization
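To make the columnar idea in the list above more concrete, here is a small conceptual sketch. It is not Drill's implementation, and plain Python gains none of the CPU-level benefits of real vectorization. It only shows the layout change: rows are pivoted into one typed array per column, and an aggregate then scans only the columns it needs. The sample rows and function names are hypothetical.

```python
from array import array

# Hypothetical sample rows; a row-oriented store keeps each record together.
rows = [
    {"id": 1, "price": 10.0, "qty": 2},
    {"id": 2, "price": 4.5, "qty": 10},
    {"id": 3, "price": 7.25, "qty": 4},
]


def to_columns(records):
    """Pivot row records into one typed, contiguous array per column."""
    return {
        "id": array("q", (r["id"] for r in records)),
        "price": array("d", (r["price"] for r in records)),
        "qty": array("q", (r["qty"] for r in records)),
    }


def total_revenue(columns):
    """Aggregate by scanning only the two columns the query touches."""
    return sum(p * q for p, q in zip(columns["price"], columns["qty"]))


columns = to_columns(rows)
revenue = total_revenue(columns)
print(revenue)
```

Note that the `id` column is never read by `total_revenue`. Skipping untouched columns is one reason columnar layouts suit analytic queries.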
What Datastores does Drill support?
Drill is mainly focused on non-relational data stores, including Hadoop, NoSQL, and cloud storage.
The supported datastores include:
✓ NoSQL - HBase and MongoDB
✓ Cloud Storage - Amazon S3, Google Cloud Storage, Azure Blob Storage, and Swift
✓ Hadoop - MapR, CDH, and Amazon EMR
What are the Similarities between Spark SQL and Apache Drill?
✓ Both Apache Drill and Spark SQL are open source.
✓ Neither requires a Hadoop cluster to get started.
✓ Both SQL-on-Hadoop tools can easily be run inside a VM.
✓ Both support multiple data formats and sources: JSON, Parquet, MongoDB, Avro, MySQL, and so on.
What are the Main Differences between Spark SQL and Apache Drill?
Spark SQL supports only a subset of SQL, whereas Apache Drill supports ANSI SQL.
Spark SQL queries are usually written inside a program in a language such as Java, Scala, or Python, whereas Apache Drill is queried with plain SQL, much as you would query MySQL or Oracle.
Is Spark SQL similar to Drill?
No!
How does Drill support queries on self-describing data?
✓ JSON data model
✓ On-the-fly schema discovery
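The idea of on-the-fly schema discovery can be sketched in a few lines. This is only an illustration of the concept, not how Drill does it internally. The sample records and the `discover_schema` helper are hypothetical. The function reads JSON records one by one and collects the fields and types it encounters, instead of requiring a schema up front.

```python
import json

# Hypothetical records: fields vary between documents and may be nested.
RAW = """
{"name": "Ann", "age": 31}
{"name": "Bob", "address": {"city": "Pune", "zip": "411001"}}
{"name": "Cy", "age": "unknown", "tags": ["a", "b"]}
"""


def discover_schema(lines):
    """Return {dotted field path: set of type names seen} across all records."""
    schema = {}

    def walk(value, path):
        if isinstance(value, dict):
            # Descend into nested objects, extending the dotted path.
            for key, child in value.items():
                walk(child, f"{path}.{key}" if path else key)
        else:
            schema.setdefault(path, set()).add(type(value).__name__)

    for line in lines:
        if line.strip():
            walk(json.loads(line), "")
    return schema


schema = discover_schema(RAW.splitlines())
for field in sorted(schema):
    print(field, sorted(schema[field]))
```

Here the `age` field shows up as both `int` and `str`. A schema-free engine has to cope with exactly this kind of variation while reading the data.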
Do I need to load data into Drill to start querying it?
No! Drill can query data in situ, that is, where it already resides.
Apache Spark -
Apache Spark is an open source, very fast, in-memory, general-purpose data processing engine used for processing large amounts of data.
Apache Spark is a cluster-computing framework.
The Advantages of Spark -
✓ Ease of use
✓ Open source
✓ Spark does in-memory cluster computing, so it is very fast.
✓ Combines SQL, streaming, and complex analytics
✓ Spark runs everywhere: on Hadoop, on Mesos, standalone, and so on.
✓ Supports multiple languages
Spark is not a modified version of Hadoop; it uses Hadoop for:
✓ Storage
✓ Data processing
Spark supports the following languages:
✓ Java
✓ Python
✓ Scala
✓ R
✓ Clojure (through community libraries rather than an official Spark API)
Is Apache Spark going to replace Hadoop?
My answer is yes! What is your opinion?
Both Apache Spark and Hadoop are big-data frameworks, and I expect Spark to replace Hadoop.
Spark is one of the favourite choices of data scientists. Apache Spark is growing very quickly and is replacing MapReduce.