Apache Drill - Why, What, How
© MapR Technologies, confidential 
© 2014 MapR Technologies 
M. C. Srivas, CTO and Founder
© MapR Technologies, confidential 
MapR is Unbiased Open Source
© MapR Technologies, confidential 
Linux Is Unbiased 
• Linux provides choice 
– MySQL 
– PostgreSQL 
– SQLite 
• Linux provides choice 
– Apache httpd 
– Nginx 
– Lighttpd
© MapR Technologies, confidential 
MapR Is Unbiased 
• MapR provides choice 
                 MapR Distribution for Hadoop          Distribution C        Distribution H 
Spark            Spark (all of it) and SparkSQL        Spark only            No 
Interactive SQL  Impala, Drill, Hive/Tez, SparkSQL     One option (Impala)   One option (Hive/Tez) 
Scheduler        YARN, Mesos                           One option (YARN)     One option (YARN) 
Versions         Hive 0.10, 0.11, 0.12, 0.13;          One version           One version 
                 Pig 0.11, 0.12; HBase 0.94, 0.98
© MapR Technologies, confidential 
MapR Distribution for Apache Hadoop 
APACHE HADOOP AND OSS ECOSYSTEM 
• Execution engines 
– Batch: Pig, Cascading, Spark 
– Streaming: Spark Streaming, Storm* 
– NoSQL & Search: HBase, Solr 
– ML, Graph: Mahout, MLLib, GraphX 
– SQL: Hive, Impala, Shark, Drill* 
– MapReduce v1 & v2; Tez*; Accumulo* 
• Data governance and operations 
– Workflow & Data Governance, Data Integration & Access: Sqoop, Flume, Oozie, Falcon*, HttpFS, Hue, Knox*, Sentry*, Whirr, ZooKeeper 
– Provisioning & Coordination: Juju, Savannah* 
• YARN, Security 
MapR Data Platform (Random Read/Write): Enterprise Grade, Data Hub, Operational 
• MapR-FS (POSIX) and MapR-DB (High-Performance NoSQL) 
• Access: NFS, HDFS API, HBase API, JSON API 
MapR Control System (Management and Monitoring): CLI, REST API, GUI 
* In Roadmap for inclusion/certification
© MapR Technologies, confidential
© MapR Technologies, confidential 
Hadoop as an augmentation for the EDW: Why?
© MapR Technologies, confidential
© MapR Technologies, confidential
© MapR Technologies, confidential
© MapR Technologies, confidential
© MapR Technologies, confidential
© MapR Technologies, confidential 
But inside, it looks like this …
© MapR Technologies, confidential 
And this …
© MapR Technologies, confidential 
And this …
© MapR Technologies, confidential 
Consolidating schemas is very hard.
© MapR Technologies, confidential 
Consolidating schemas is very hard, and causes silos
© MapR Technologies, confidential 
Silos make analysis very difficult 
• How do I identify a 
unique {customer, 
trade} across data sets? 
• How can I guarantee 
the lack of anomalous 
behavior if I can’t see 
all data?
© 2014 MapR Technologies 19 
Hard to know what’s of value a priori
© 2014 MapR Technologies 20 
Hard to know what’s of value a priori
© 2014 MapR Technologies 21 
Why Hadoop
© MapR Technologies, confidential 
Rethink SQL for Big Data 
Preserve 
• ANSI SQL 
– Familiar and ubiquitous 
• Performance 
– Interactive nature crucial for BI/Analytics 
• One technology 
– Painful to manage different technologies 
• Enterprise ready 
– System-of-record, HA, DR, Security, Multi-tenancy, …
© MapR Technologies, confidential 
Rethink SQL for Big Data 
Preserve 
• ANSI SQL 
– Familiar and ubiquitous 
• Performance 
– Interactive nature crucial for BI/Analytics 
• One technology 
– Painful to manage different technologies 
• Enterprise ready 
– System-of-record, HA, DR, Security, Multi-tenancy, … 
Invent 
• Flexible data-model 
– Allow schemas to evolve rapidly 
– Support semi-structured data types 
• Agility 
– Self-service possible when developer and DBA are the same person 
• Scalability 
– In all dimensions: data, speed, schemas, processes, management
© MapR Technologies, confidential 
SQL is here to stay
© MapR Technologies, confidential 
Hadoop is here to stay
© MapR Technologies, confidential 
YOU CAN’T HANDLE REAL SQL
© MapR Technologies, confidential 
SQL 
select * from A 
where exists ( 
select 1 from B where B.b < 100 ); 
• Did you know the usual SQL-on-Hadoop engines cannot compute this query? 
– e.g., Hive, Impala, Spark/Shark
© MapR Technologies, confidential 
Self-described Data 
select cf.month, cf.year 
from hbase.table1; 
• Did you know normal SQL cannot handle the above? 
• Nor can Hive and its variants like Impala and Shark
© MapR Technologies, confidential 
Self-described Data 
select cf.month, cf.year 
from hbase.table1; 
• Why? 
• Because there’s no meta-store definition available
© MapR Technologies, confidential 
Self-Describing Data Ubiquitous 
Centralized schema 
- Static 
- Managed by the DBAs 
- In a centralized repository 
- Long, meticulous data preparation process (ETL, create/alter schema, etc.) – can take 6-18 months 
Self-describing, or schema-less, data 
- Dynamic/evolving 
- Managed by the applications 
- Embedded in the data 
- Less schema, more suitable for data that has higher volume, variety and velocity
Apache Drill
© MapR Technologies, confidential 
A Quick Tour through Apache Drill
© MapR Technologies, confidential 
Data Source is in the Query 
select timestamp, message 
from dfs1.logs.`AppServerLogs/2014/Jan/p001.parquet` 
where errorLevel > 2
© MapR Technologies, confidential 
Data Source is in the Query 
select timestamp, message 
from dfs1.logs.`AppServerLogs/2014/Jan/p001.parquet` 
where errorLevel > 2 
This is a cluster in Apache Drill 
- DFS 
- HBase 
- Hive meta-store
© MapR Technologies, confidential 
Data Source is in the Query 
select timestamp, message 
from dfs1.logs.`AppServerLogs/2014/Jan/p001.parquet` 
where errorLevel > 2 
This is a cluster in Apache Drill 
- DFS 
- HBase 
- Hive meta-store 
A work-space 
- Typically a sub-directory 
- HIVE database
© MapR Technologies, confidential 
Data Source is in the Query 
select timestamp, message 
from dfs1.logs.`AppServerLogs/2014/Jan/p001.parquet` 
where errorLevel > 2 
This is a cluster in Apache Drill 
- DFS 
- HBase 
- Hive meta-store 
A work-space 
- Typically a sub-directory 
- HIVE database 
A table 
- pathnames 
- Hbase table 
- Hive table
© MapR Technologies, confidential 
Combine data sources on the fly 
• JSON 
• CSV 
• ORC (i.e., all Hive types) 
• Parquet 
• HBase tables 
• … and you can combine them: 
select USERS.name, USERS.emails.work 
from 
  dfs.logs.`/data/logs` LOGS, 
  dfs.users.`/profiles.json` USERS 
where 
  LOGS.uid = USERS.uid and 
  errorLevel > 5 
order by count(*);
© MapR Technologies, confidential 
Can be an entire directory tree 
// On a file 
select errorLevel, count(*) 
from dfs.logs.`/AppServerLogs/2014/Jan/part0001.parquet` 
group by errorLevel;
© MapR Technologies, confidential 
Can be an entire directory tree 
// On a file 
select errorLevel, count(*) 
from dfs.logs.`/AppServerLogs/2014/Jan/part0001.parquet` 
group by errorLevel; 
// On the entire data collection: all years, all months 
select errorLevel, count(*) 
from dfs.logs.`/AppServerLogs` 
group by errorLevel
© MapR Technologies, confidential 
Can be an entire directory tree 
// On a file 
select errorLevel, count(*) 
from dfs.logs.`/AppServerLogs/2014/Jan/part0001.parquet` 
group by errorLevel; 
// On the entire data collection: all years, all months 
select errorLevel, count(*) 
from dfs.logs.`/AppServerLogs` 
group by errorLevel 
dirs[1] dirs[2] refer to the directory levels under the workspace (here, the year and month sub-directories)
© MapR Technologies, confidential 
Can be an entire directory tree 
// On a file 
select errorLevel, count(*) 
from dfs.logs.`/AppServerLogs/2014/Jan/part0001.parquet` 
group by errorLevel; 
// On the entire data collection: all years, all months 
select errorLevel, count(*) 
from dfs.logs.`/AppServerLogs` 
where dirs[1] > 2012 
group by errorLevel, dirs[2] 
// dirs[1] and dirs[2] are the directory levels: year and month
© MapR Technologies, confidential 
Querying JSON 
// donuts.json 
{ "name": "classic", 
  "fillings": [ 
    { "name": "sugar", "cal": 400 } ] } 
{ "name": "choco", 
  "fillings": [ 
    { "name": "sugar", "cal": 400 }, 
    { "name": "chocolate", "cal": 300 } ] } 
{ "name": "bostoncreme", 
  "fillings": [ 
    { "name": "sugar", "cal": 400 }, 
    { "name": "cream", "cal": 1000 }, 
    { "name": "jelly", "cal": 600 } ] }
© MapR Technologies, confidential 
Cursors inside Drill 
DrillClient drill = new DrillClient().connect( … ); 
ResultReader r = drill.runSqlQuery( "select * from `donuts.json`" ); 
while( r.next()) { 
  String donutName = r.reader( "name").readString(); 
  ListReader fillings = r.reader( "fillings"); 
  while( fillings.next()) { 
    int calories = fillings.reader( "cal").readInteger(); 
    if (calories > 400) 
      print( donutName, calories, fillings.reader( "name").readString()); 
  } 
}
{ "name": "classic", 
  "fillings": [ 
    { "name": "sugar", "cal": 400 } ] } 
{ "name": "choco", 
  "fillings": [ 
    { "name": "sugar", "cal": 400 }, 
    { "name": "chocolate", "cal": 300 } ] } 
{ "name": "bostoncreme", 
  "fillings": [ 
    { "name": "sugar", "cal": 400 }, 
    { "name": "cream", "cal": 1000 }, 
    { "name": "jelly", "cal": 600 } ] }
© MapR Technologies, confidential 
Direct queries on nested data 
// Flattening maps in JSON, Parquet and other nested records 
select name, flatten(fillings) as f 
from dfs.users.`/donuts.json` 
where f.cal < 300; 
// lists the fillings < 300 calories
{ "name": "classic", 
  "fillings": [ 
    { "name": "sugar", "cal": 400 } ] } 
{ "name": "choco", 
  "fillings": [ 
    { "name": "sugar", "cal": 400 }, 
    { "name": "chocolate", "cal": 300 } ] } 
{ "name": "bostoncreme", 
  "fillings": [ 
    { "name": "sugar", "cal": 400 }, 
    { "name": "cream", "cal": 1000 }, 
    { "name": "jelly", "cal": 600 } ] }
© MapR Technologies, confidential 
Complex Data Using SQL or Fluent API 
// SQL 
Result r = drill.sql( "select name, flatten(fillings) from `donuts.json` where fillings.cal < 300" ); 
// or Fluent API 
Result r = drill.table( "donuts.json" ) 
              .lt( "fillings.cal", 300 ).all(); 
while( r.next()) { 
  String name = r.get( "name").string(); 
  List fillings = r.get( "fillings").list(); 
  while( fillings.next()) { 
    print( name, calories, fillings.get( "name").string()); 
  } 
}
{ "name": "classic", 
  "fillings": [ 
    { "name": "sugar", "cal": 400 } ] } 
{ "name": "choco", 
  "fillings": [ 
    { "name": "sugar", "cal": 400 }, 
    { "name": "plain", "cal": 280 } ] } 
{ "name": "bostoncreme", 
  "fillings": [ 
    { "name": "sugar", "cal": 400 }, 
    { "name": "cream", "cal": 1000 }, 
    { "name": "jelly", "cal": 600 } ] }
© MapR Technologies, confidential 
Queries on embedded data 
// embedded JSON value inside column donut-json, inside column-family cf1, 
// of an hbase table donuts 
select d.name, count( d.fillings) 
from ( 
  select convert_from( cf1.donut-json, json) as d 
  from hbase.user.`donuts` );
© MapR Technologies, confidential 
Queries inside JSON records 
// Each JSON record itself can be a whole database 
// example: get all donuts with at least 1 filling with > 300 calories 
select d.name, count( d.fillings), 
       max( d.fillings.cal) within record as maxcal 
from ( select convert_from( cf1.donut-json, json) as d 
       from hbase.user.`donuts` ) 
where maxcal > 300;
© MapR Technologies, confidential 
• Schema can change over the course of a query 
• Operators are able to reconfigure themselves on schema change events 
– Minimize flexibility overhead 
– Support more advanced execution optimization based on actual data characteristics
© MapR Technologies, confidential 
De-centralized metadata 
// count the number of tweets per customer, where the customers are in Hive, and 
// their tweets are in HBase. Note that the hbase data has no meta-data information
select c.customerName, hb.tweets.count 
from hive.CustomersDB.`Customers` c 
join hbase.user.`SocialData` hb 
on c.customerId = convert_from( hb.rowkey, UTF-8);
© MapR Technologies, confidential 
So what does this all mean?
© MapR Technologies, confidential 
A Drill Database 
• What is a database with Drill/MapR?
© MapR Technologies, confidential 
A Drill Database 
• What is a database with Drill/MapR? 
• Just a directory, with a bunch of related files
© MapR Technologies, confidential 
A Drill Database 
• What is a database with Drill/MapR? 
• Just a directory, with a bunch of related files 
• There’s no need for artificial boundaries 
– No need to bunch a set of tables together to call it a “database”
© MapR Technologies, confidential 
A Drill Database 
/user/srivas/work/bugs 
BugList: 
  symptom       version  date     bugid  dump-name 
  impala crash  3.1.1    14/7/14  12345  cust1.tgz 
  cldb slow     3.1.0    12/7/14  45678  cust2.tgz 
Customers: 
  name  rep    se       dump-name 
  xxxx  dkim   junhyuk  cust1.tgz 
  yyyy  yoshi  aki      cust2.tgz
© MapR Technologies, confidential 
Queries are simple 
select b.bugid, b.symptom, b.date 
from dfs.bugs.`/Customers` c, dfs.bugs.`/BugList` b 
where c.dump-name = b.dump-name
© MapR Technologies, confidential 
Queries are simple 
select b.bugid, b.symptom, b.date 
from dfs.bugs.`/Customers` c, dfs.bugs.`/BugList` b 
where c.dump-name = b.dump-name 
Let's say I want to cross-reference against your list: 
select bugid, symptom 
from dfs.bugs.`/BugList` b, dfs.yourbugs.`/YourBugFile` b2 
where b.bugid = b2.xxx
© MapR Technologies, confidential 
What does it mean?
© MapR Technologies, confidential 
What does it mean? 
• No ETL 
• Reach out directly to the particular table/file 
• As long as the permissions are fine, you can do it 
• No need to have the meta-data 
– None needed
© MapR Technologies, confidential 
Another example 
select d.name, count( d.fillings) 
from ( select convert_from( cf1.donut-json, json) as d 
       from hbase.user.`donuts` );
• convert_from( xx, json) invokes the json parser inside Drill
© MapR Technologies, confidential 
Another example 
select d.name, count( d.fillings) 
from ( select convert_from( cf1.donut-json, json) as d 
       from hbase.user.`donuts` );
• convert_from( xx, json) invokes the json parser inside Drill 
• What if you could plug in any parser?
© MapR Technologies, confidential 
Another example 
select d.name, count( d.fillings) 
from ( select convert_from( cf1.donut-json, json) as d 
       from hbase.user.`donuts` );
• convert_from( xx, json) invokes the json parser inside Drill 
• What if you could plug in any parser 
– XML? 
– Semi-conductor yield-analysis files? Oil-exploration readings? 
– Telescope readings of stars? 
– RFIDs of various things?
© MapR Technologies, confidential 
No ETL 
• Basically, Drill is querying the raw data directly 
• Joining with processed data 
• NO ETL 
• Folks, this is very, very powerful 
• NO ETL
© MapR Technologies, confidential 
Seamless integration with Apache Hive 
• Low latency queries on Hive tables 
• Support for 100s of Hive file formats 
• Ability to reuse Hive UDFs 
• Support for multiple Hive metastores in a single query
© MapR Technologies, confidential 
Underneath the Covers
© MapR Technologies, confidential 
Basic Process 
Cluster layout: ZooKeeper for coordination; a Drillbit (with its distributed cache) runs alongside DFS/HBase on each node 
1. Query comes to any Drillbit (JDBC, ODBC, CLI, protobuf) 
2. Drillbit generates an execution plan based on query optimization & locality 
3. Fragments are farmed out to individual nodes 
4. Result is returned to the driving node
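To make the submission path concrete, here is a minimal sketch of sending a query to a Drillbit over JDBC. It assumes the Drill JDBC driver (org.apache.drill.jdbc.Driver) is on the classpath and a Drillbit reachable at localhost; the connection URL and the query are only illustrative. 

import java.sql.Connection; 
import java.sql.DriverManager; 
import java.sql.ResultSet; 
import java.sql.Statement; 

public class DrillJdbcSketch { 
  public static void main(String[] args) throws Exception { 
    // Direct connection to a single Drillbit; for a cluster the ZooKeeper form 
    // of the URL (jdbc:drill:zk=host:2181) is typically used instead. 
    try (Connection conn = DriverManager.getConnection("jdbc:drill:drillbit=localhost"); 
         Statement stmt = conn.createStatement(); 
         // The query names its data source directly, as in the slides. 
         ResultSet rs = stmt.executeQuery( 
             "select errorLevel, count(*) as cnt " + 
             "from dfs.logs.`/AppServerLogs` group by errorLevel")) { 
      while (rs.next()) { 
        System.out.println(rs.getInt("errorLevel") + " -> " + rs.getLong("cnt")); 
      } 
    } 
  } 
}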
© MapR Technologies, confidential 
Stages of Query Planning 
On the Foreman (the Drillbit that received the query): 
SQL Query → Parser → Logical Planner (heuristic and cost based) → Physical Planner (cost based) → plan fragments sent to Drillbits
© MapR Technologies, confidential 
Query Execution 
Components of a Drillbit: 
• Endpoints: JDBC, ODBC, RPC 
• Foreman 
• Parsers (SQL, HiveQL, Pig) → Logical Plan 
• Optimizer → Physical Plan 
• Scheduler, Operators, Distributed Cache 
• Storage Engine Interface: HDFS, HBase, Mongo, Cassandra
© MapR Technologies, confidential 
A Query engine that is… 
• Columnar/Vectorized 
• Optimistic/pipelined 
• Runtime compilation 
• Late binding 
• Extensible
© MapR Technologies, confidential 
Columnar representation 
Row layout: fields A B C D E stored together, record by record 
Columnar layout (on disk): all values of column A stored together, then B, then C, then D, then E
© MapR Technologies, confidential 
Columnar Encoding 
• Values in a column stored next to one another 
– Better compression 
– Range-map: save min-max, can skip if not present 
• Only retrieve columns participating in the query 
• Aggregations can be performed without decoding
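As a toy illustration of these bullets (not Drill code; the names and the block size are invented): the scan reads only the one column it needs, and a per-block min/max "range map" lets whole blocks be skipped before any values are examined. 

public class ColumnarScanSketch { 
  static final int BLOCK = 4; 

  // Precomputed per-block maxima, as if stored alongside the column on disk. 
  static int[] blockMax(int[] column) { 
    int blocks = (column.length + BLOCK - 1) / BLOCK; 
    int[] max = new int[blocks]; 
    java.util.Arrays.fill(max, Integer.MIN_VALUE); 
    for (int i = 0; i < column.length; i++) 
      max[i / BLOCK] = Math.max(max[i / BLOCK], column[i]); 
    return max; 
  } 

  // Only this one column is read; other columns of the table are never touched. 
  static long sumWhereGreaterThan(int[] column, int[] blockMax, int threshold) { 
    long sum = 0; 
    for (int b = 0; b < blockMax.length; b++) { 
      if (blockMax[b] <= threshold) continue;      // range map lets us skip the block 
      int end = Math.min((b + 1) * BLOCK, column.length); 
      for (int i = b * BLOCK; i < end; i++) 
        if (column[i] > threshold) sum += column[i]; 
    } 
    return sum; 
  } 

  public static void main(String[] args) { 
    int[] errorLevel = {1, 2, 1, 0, 7, 6, 8, 5, 2, 3, 1, 2}; 
    System.out.println(sumWhereGreaterThan(errorLevel, blockMax(errorLevel), 5)); // 7 + 6 + 8 = 21 
  } 
}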
© MapR Technologies, confidential 
Run-length-encoding & Sum 
• Dataset encoded as <val> <run-length>: 
– 2, 4 (4 2’s) 
– 8, 10 (10 8’s) 
• Goal: sum all the records 
• Normally: 
– Decompress: 2, 2, 2, 2, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8 
– Add: 2 + 2 + 2 + 2 + 8 + 8 + 8 + 8 + 8 + 8 + 8 + 8 + 8 + 8 
• Optimized work: 2 * 4 + 8 * 10 
– Less memory, less operations
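A minimal sketch of the optimized path, with the run-length pairs hard-coded from the example above; the encoding layout is invented for illustration. 

public class RleSumSketch { 
  // Run-length-encoded column: pairs of (value, runLength). 
  static long sumRle(long[][] runs) { 
    long sum = 0; 
    for (long[] run : runs) { 
      sum += run[0] * run[1];   // value * runLength, no decompression 
    } 
    return sum; 
  } 
  public static void main(String[] args) { 
    long[][] runs = { {2, 4}, {8, 10} };   // four 2's followed by ten 8's 
    System.out.println(sumRle(runs));      // 2*4 + 8*10 = 88 
  } 
}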
© MapR Technologies, confidential 
Bit-packed Dictionary Sort 
• Dataset encoded with a dictionary and bit-positions: 
– Dictionary: [Rupert, Bill, Larry] {0, 1, 2} 
– Values: [1,0,1,2,1,2,1,0] 
• Normal work 
– Decompress & store: Bill, Rupert, Bill, Larry, Bill, Larry, Bill, Rupert 
– Sort: ~24 comparisons of variable width strings 
• Optimized work 
– Sort dictionary: {Bill: 1, Larry: 2, Rupert: 0} 
– Sort bit-packed values 
– Work: max 3 string comparisons, ~24 comparisons of fixed-width dictionary bits
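A small sketch of the optimized path, using the dictionary and code values from the example above; the data structures are invented for illustration and are not Drill's. 

import java.util.Arrays; 
import java.util.Comparator; 

public class DictionarySortSketch { 
  public static void main(String[] args) { 
    String[] dictionary = {"Rupert", "Bill", "Larry"};   // codes 0, 1, 2 
    int[] values = {1, 0, 1, 2, 1, 2, 1, 0};             // bit-packed codes in practice 

    // Rank each dictionary entry once (a handful of string comparisons) ... 
    Integer[] codesByName = {0, 1, 2}; 
    Arrays.sort(codesByName, Comparator.comparing(c -> dictionary[c])); 
    int[] rank = new int[dictionary.length]; 
    for (int r = 0; r < codesByName.length; r++) rank[codesByName[r]] = r; 

    // ... then sort the fixed-width codes by rank, never touching the strings again. 
    int[] sorted = Arrays.stream(values).boxed() 
        .sorted(Comparator.comparingInt(v -> rank[v])) 
        .mapToInt(Integer::intValue).toArray(); 

    for (int code : sorted) System.out.print(dictionary[code] + " "); 
    // Bill Bill Bill Bill Larry Larry Rupert Rupert 
  } 
}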
© MapR Technologies, confidential 
Drill 4-value semantics 
• SQL’s 3-valued semantics 
– True 
– False 
– Unknown 
• Drill adds fourth 
– Repeated
© MapR Technologies, confidential 
Vectorization 
• Drill operates on more than one record at a time 
– Word-sized manipulations 
– SIMD-like instructions 
• GCC, LLVM and JVM all do various optimizations automatically 
– Manually code algorithms 
• Logical Vectorization 
– Bitmaps allow lightning fast null-checks 
– Avoid branching to speed CPU pipeline
© MapR Technologies, confidential 
Runtime Compilation is Faster 
• JIT is smart, but more gains with runtime compilation 
• Janino: Java-based Java compiler 
(benchmark from http://bit.ly/16Xk32x)
© MapR Technologies, confidential 
Drill compiler 
• CodeModel generates code for the query 
• Janino compiles it into runtime byte-code 
• The runtime byte-code is merged with precompiled byte-code templates 
• The merged result is the loaded class that executes
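As a hedged illustration of the runtime-compilation step, a minimal example using Janino's ExpressionEvaluator (assuming its classic org.codehaus.janino API); the expression and parameter names are made up, and this is not how Drill's code generator is actually assembled. 

import org.codehaus.janino.ExpressionEvaluator; 

public class RuntimeCompileSketch { 
  public static void main(String[] args) throws Exception { 
    // Compile a small expression at runtime instead of interpreting it row by row. 
    ExpressionEvaluator ee = new ExpressionEvaluator(); 
    ee.setParameters(new String[] {"errorLevel", "threshold"}, 
                     new Class[] {int.class, int.class}); 
    ee.setExpressionType(boolean.class); 
    ee.cook("errorLevel > threshold");   // byte-code is generated here 

    // The compiled filter can then be applied to every row of a batch. 
    int[] batch = {1, 7, 3, 9}; 
    for (int level : batch) { 
      boolean keep = (Boolean) ee.evaluate(new Object[] {level, 5}); 
      System.out.println(level + " -> " + keep); 
    } 
  } 
}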
© MapR Technologies, confidential 
Optimistic 
[Chart: speed vs. check-pointing, comparing Apache Drill (no need to checkpoint) with systems that checkpoint frequently]
© MapR Technologies, confidential 
Optimistic Execution 
• Recovery code trivial 
– Running instances discard the failed query’s intermediate state 
• Pipelining possible 
– Send results as soon as batch is large enough 
– Requires barrier-less decomposition of query
© MapR Technologies, confidential 
Batches of Values 
• Value vectors 
– List of values, with same schema 
– With the 4-value semantics for each value 
• Shipped around in batches 
– max 256k bytes in a batch 
– max 64K rows in a batch 
• RPC designed for multiple replies to a request
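A rough sketch of the value-vector idea, not Drill's actual classes: a fixed-width column paired with a validity bitmap, so null checks and null counts are plain bit operations over 64 rows at a time. 

public class IntVectorSketch { 
  private final int[] values;       // fixed-width data, one slot per row 
  private final long[] validity;    // 1 bit per row: set = value is present 

  IntVectorSketch(int rowCount) { 
    values = new int[rowCount]; 
    validity = new long[(rowCount + 63) / 64]; 
  } 

  void set(int row, int value) { 
    values[row] = value; 
    validity[row >> 6] |= 1L << (row & 63); 
  } 

  boolean isNull(int row) { 
    return (validity[row >> 6] & (1L << (row & 63))) == 0; 
  } 

  // Null accounting touches only the bitmap, 64 rows per machine word. 
  int nonNullCount() { 
    int count = 0; 
    for (long word : validity) count += Long.bitCount(word); 
    return count; 
  } 

  public static void main(String[] args) { 
    IntVectorSketch v = new IntVectorSketch(100); 
    v.set(0, 42); 
    v.set(70, 7); 
    System.out.println(v.isNull(1) + " " + v.nonNullCount());   // true 2 
  } 
}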
© MapR Technologies, confidential 
Pipelining 
• Record batches are pipelined 
between nodes 
– ~256kB usually 
• Unit of work for Drill 
– Operators works on a batch 
• Operator reconfiguration happens 
at batch boundaries 
© MapR Technologies, confidential 
Pipelining Record Batches 
Record batches flow between the Drillbit components: 
• Endpoints: JDBC, ODBC, RPC 
• Foreman 
• Parsers (SQL, HiveQL, Pig) → Logical Plan 
• Optimizer → Physical Plan 
• Scheduler, Operators, Distributed Cache 
• Storage Engine Interface: HDFS, HBase, Mongo, Cassandra
© MapR Technologies, confidential 
Pipelining 
• Random access: sort without copy or restructuring 
• Avoids serialization/deserialization 
• Off-heap (no GC woes when lots of memory) 
• Full specification + off-heap + batch 
– Enables C/C++ operators (fast!) 
• Read/write to disk when data is larger than memory (memory overflow uses disk)
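A small illustration of the off-heap point using the JDK's direct ByteBuffers; Drill uses its own off-heap allocator, so this is only an analogy, and the 256 KB batch size is taken from the slides. 

import java.nio.ByteBuffer; 
import java.nio.ByteOrder; 

public class OffHeapBatchSketch { 
  public static void main(String[] args) { 
    // 256 KB batch allocated outside the Java heap: not scanned or moved by the GC. 
    ByteBuffer batch = ByteBuffer.allocateDirect(256 * 1024) 
                                 .order(ByteOrder.LITTLE_ENDIAN); 

    // Write a few fixed-width values straight into the buffer ... 
    for (int i = 0; i < 1000; i++) batch.putInt(i * 4, i * 2); 

    // ... and read them back with plain offsets; no Java objects are created per value. 
    long sum = 0; 
    for (int i = 0; i < 1000; i++) sum += batch.getInt(i * 4); 
    System.out.println(sum);   // 0 + 2 + 4 + ... + 1998 = 999000 
  } 
}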
© MapR Technologies, confidential 
Cost-based Optimization 
• Using Optiq, an extensible framework 
• Pluggable rules, and cost model 
• Rules for distributed plan generation 
• Insert Exchange operator into physical plan 
• Optiq enhanced to explore parallel query plans 
• Pluggable cost model 
– CPU, IO, memory, network cost (data locality) 
– Storage engine features (HDFS vs HIVE vs HBase) 
© MapR Technologies, confidential 
Distributed Plan Cost 
• Operators have distribution property 
• Hash, Broadcast, Singleton, … 
• Exchange operator to enforce distributions 
• Hash: HashToRandomExchange 
• Broadcast: BroadcastExchange 
• Singleton: UnionExchange, SingleMergeExchange 
• Enumerate all, use cost to pick best 
• Merge Join vs Hash Join 
• Partition-based join vs Broadcast-based join 
• Streaming Aggregation vs Hash Aggregation 
• Aggregation in one phase or two phases 
• partial local aggregation followed by final aggregation 
Example plan: Data → HashToRandomExchange → Sort → Streaming-Aggregation
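To make "enumerate all, use cost to pick best" concrete, here is a deliberately simplified toy cost model for one of the choices above (broadcast vs. partition-based join); the row counts and cost formulas are invented for illustration and are not Drill's planner. 

public class JoinCostSketch { 
  // Toy network cost: rows that must cross the wire for each strategy. 
  static long broadcastCost(long buildRows, int nodes) { 
    return buildRows * nodes;            // small side copied to every node 
  } 
  static long hashPartitionCost(long probeRows, long buildRows) { 
    return probeRows + buildRows;        // both sides reshuffled once 
  } 
  public static void main(String[] args) { 
    long probeRows = 1_000_000_000L;     // big fact table 
    long buildRows = 50_000L;            // small dimension table 
    int nodes = 20; 
    long broadcast = broadcastCost(buildRows, nodes); 
    long partition = hashPartitionCost(probeRows, buildRows); 
    System.out.println(broadcast < partition 
        ? "choose broadcast join (cost " + broadcast + " vs " + partition + ")" 
        : "choose hash-partitioned join (cost " + partition + " vs " + broadcast + ")"); 
  } 
}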
© MapR Technologies, confidential 
Interactive SQL-on-Hadoop options 
                 Drill 1.0                          Hive 0.13 w/ Tez             Impala 1.x 
Latency          Low                                Medium                       Low 
Files            Yes (all Hive file formats,        Yes (all Hive file formats)  Yes (Parquet, Sequence, …) 
                 plus JSON, Text, …) 
HBase/MapR-DB    Yes                                Yes, perf issues             Yes, with issues 
Schema           Hive or schema-less                Hive                         Hive 
SQL support      ANSI SQL                           HiveQL                       HiveQL (subset) 
Client support   ODBC/JDBC                          ODBC/JDBC                    ODBC/JDBC 
Hive compat      High                               High                         Low 
Large datasets   Yes                                Yes                          Limited 
Nested data      Yes                                Limited                      No 
Concurrency      High                               Limited                      Medium
© MapR Technologies, confidential 
Apache Drill Roadmap 
1.0: Data exploration / ad-hoc queries 
• Low-latency SQL 
• Schema-less execution 
• Files & HBase/M7 support 
• Hive integration 
• BI and SQL tool support via ODBC/JDBC 
1.1: Advanced analytics and operational data 
• HBase query speedup 
• Nested data functions 
• Advanced SQL functionality 
2.0: Operational SQL 
• Ultra low latency queries 
• Single row insert/update/delete 
• Workload management
© MapR Technologies, confidential 
MapR Distribution for Apache Hadoop 
APACHE HADOOP AND OSS ECOSYSTEM 
• Execution engines 
– Batch: Pig, Cascading, Spark 
– Streaming: Spark Streaming, Storm* 
– NoSQL & Search: HBase, Solr 
– ML, Graph: Mahout, MLLib, GraphX 
– SQL: Hive, Impala, Shark, Drill* 
– MapReduce v1 & v2; Tez*; Accumulo* 
• Data governance and operations 
– Workflow & Data Governance, Data Integration & Access: Sqoop, Flume, Oozie, Falcon*, HttpFS, Hue, Knox*, Sentry*, Whirr, ZooKeeper 
– Provisioning & Coordination: Juju, Savannah* 
• YARN, Security 
MapR Data Platform (Random Read/Write): Enterprise Grade, Data Hub, Operational 
• MapR-FS (POSIX) and MapR-DB (High-Performance NoSQL) 
• Access: NFS, HDFS API, HBase API, JSON API 
MapR Control System (Management and Monitoring): CLI, REST API, GUI 
* In Roadmap for inclusion/certification
© MapR Technologies, confidential 
Apache Drill Resources 
• Drill 0.5 released last week 
• Getting started with Drill is easy 
– just download tarball and start running SQL queries on local files 
• Mailing lists 
– drill-user@incubator.apache.org 
– drill-dev@incubator.apache.org 
• Docs: https://cwiki.apache.org/confluence/display/DRILL/Apache+Drill+Wiki 
• Fork us on GitHub: http://github.com/apache/incubator-drill/ 
• Create a JIRA: https://issues.apache.org/jira/browse/DRILL
© MapR Technologies, confidential 
Active Drill Community 
• Large community, growing rapidly 
– 35-40 contributors, 16 committers 
– Microsoft, LinkedIn, Oracle, Facebook, Visa, Lucidworks, Concurrent, and many universities
• In 2014 
– over 20 meet-ups, many more coming soon 
– 3 hackathons, with 40+ participants 
• Encourage you to join, learn, contribute and have fun …
© MapR Technologies, confidential 
Drill at MapR 
• World-class SQL team, ~20 people 
• 150+ years combined experience building commercial 
databases 
• Oracle, DB2, ParAccel, Teradata, SQLServer, Vertica 
• Team works on Drill, Hive, Impala 
• Fixed some of the toughest problems in Apache Hive
© MapR Technologies, confidential 
Thank you! 
M. C. Srivas 
srivas@mapr.com 
Did I mention we are hiring…

Mais conteúdo relacionado

Mais procurados

Drilling into Data with Apache Drill
Drilling into Data with Apache DrillDrilling into Data with Apache Drill
Drilling into Data with Apache DrillMapR Technologies
 
Scaling HDFS to Manage Billions of Files with Key-Value Stores
Scaling HDFS to Manage Billions of Files with Key-Value StoresScaling HDFS to Manage Billions of Files with Key-Value Stores
Scaling HDFS to Manage Billions of Files with Key-Value StoresDataWorks Summit
 
Hadoop & HDFS for Beginners
Hadoop & HDFS for BeginnersHadoop & HDFS for Beginners
Hadoop & HDFS for BeginnersRahul Jain
 
An introduction to apache drill presentation
An introduction to apache drill presentationAn introduction to apache drill presentation
An introduction to apache drill presentationMapR Technologies
 
Putting Apache Drill into Production
Putting Apache Drill into ProductionPutting Apache Drill into Production
Putting Apache Drill into ProductionMapR Technologies
 
Hive Anatomy
Hive AnatomyHive Anatomy
Hive Anatomynzhang
 
Hadoop and Spark for the SAS Developer
Hadoop and Spark for the SAS DeveloperHadoop and Spark for the SAS Developer
Hadoop and Spark for the SAS DeveloperDataWorks Summit
 
Introduction to Apache HBase, MapR Tables and Security
Introduction to Apache HBase, MapR Tables and SecurityIntroduction to Apache HBase, MapR Tables and Security
Introduction to Apache HBase, MapR Tables and SecurityMapR Technologies
 
Understanding the Value and Architecture of Apache Drill
Understanding the Value and Architecture of Apache DrillUnderstanding the Value and Architecture of Apache Drill
Understanding the Value and Architecture of Apache DrillDataWorks Summit
 
Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)
Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)
Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)Hari Shankar Sreekumar
 
Difference between hadoop 2 vs hadoop 3
Difference between hadoop 2 vs hadoop 3Difference between hadoop 2 vs hadoop 3
Difference between hadoop 2 vs hadoop 3Manish Chopra
 
Drill into Drill – How Providing Flexibility and Performance is Possible
Drill into Drill – How Providing Flexibility and Performance is PossibleDrill into Drill – How Providing Flexibility and Performance is Possible
Drill into Drill – How Providing Flexibility and Performance is PossibleMapR Technologies
 
Cloudera Impala, updated for v1.0
Cloudera Impala, updated for v1.0Cloudera Impala, updated for v1.0
Cloudera Impala, updated for v1.0Scott Leberknight
 
Hadoop And Their Ecosystem
 Hadoop And Their Ecosystem Hadoop And Their Ecosystem
Hadoop And Their Ecosystemsunera pathan
 

Mais procurados (20)

Drilling into Data with Apache Drill
Drilling into Data with Apache DrillDrilling into Data with Apache Drill
Drilling into Data with Apache Drill
 
Scaling HDFS to Manage Billions of Files with Key-Value Stores
Scaling HDFS to Manage Billions of Files with Key-Value StoresScaling HDFS to Manage Billions of Files with Key-Value Stores
Scaling HDFS to Manage Billions of Files with Key-Value Stores
 
M7 and Apache Drill, Micheal Hausenblas
M7 and Apache Drill, Micheal HausenblasM7 and Apache Drill, Micheal Hausenblas
M7 and Apache Drill, Micheal Hausenblas
 
Hadoop & HDFS for Beginners
Hadoop & HDFS for BeginnersHadoop & HDFS for Beginners
Hadoop & HDFS for Beginners
 
An introduction to apache drill presentation
An introduction to apache drill presentationAn introduction to apache drill presentation
An introduction to apache drill presentation
 
Introduction to Apache Drill
Introduction to Apache DrillIntroduction to Apache Drill
Introduction to Apache Drill
 
Putting Apache Drill into Production
Putting Apache Drill into ProductionPutting Apache Drill into Production
Putting Apache Drill into Production
 
Hive Anatomy
Hive AnatomyHive Anatomy
Hive Anatomy
 
Hadoop and Spark for the SAS Developer
Hadoop and Spark for the SAS DeveloperHadoop and Spark for the SAS Developer
Hadoop and Spark for the SAS Developer
 
Introduction to Apache HBase, MapR Tables and Security
Introduction to Apache HBase, MapR Tables and SecurityIntroduction to Apache HBase, MapR Tables and Security
Introduction to Apache HBase, MapR Tables and Security
 
Understanding the Value and Architecture of Apache Drill
Understanding the Value and Architecture of Apache DrillUnderstanding the Value and Architecture of Apache Drill
Understanding the Value and Architecture of Apache Drill
 
Hadoop 1.x vs 2
Hadoop 1.x vs 2Hadoop 1.x vs 2
Hadoop 1.x vs 2
 
6.hive
6.hive6.hive
6.hive
 
Hadoop architecture by ajay
Hadoop architecture by ajayHadoop architecture by ajay
Hadoop architecture by ajay
 
Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)
Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)
Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)
 
Difference between hadoop 2 vs hadoop 3
Difference between hadoop 2 vs hadoop 3Difference between hadoop 2 vs hadoop 3
Difference between hadoop 2 vs hadoop 3
 
Drill into Drill – How Providing Flexibility and Performance is Possible
Drill into Drill – How Providing Flexibility and Performance is PossibleDrill into Drill – How Providing Flexibility and Performance is Possible
Drill into Drill – How Providing Flexibility and Performance is Possible
 
Cloudera Impala, updated for v1.0
Cloudera Impala, updated for v1.0Cloudera Impala, updated for v1.0
Cloudera Impala, updated for v1.0
 
Apache Drill
Apache DrillApache Drill
Apache Drill
 
Hadoop And Their Ecosystem
 Hadoop And Their Ecosystem Hadoop And Their Ecosystem
Hadoop And Their Ecosystem
 

Semelhante a Apache Drill - Why, What, How

Webinar: Selecting the Right SQL-on-Hadoop Solution
Webinar: Selecting the Right SQL-on-Hadoop SolutionWebinar: Selecting the Right SQL-on-Hadoop Solution
Webinar: Selecting the Right SQL-on-Hadoop SolutionMapR Technologies
 
Big Data Everywhere Chicago: SQL on Hadoop
Big Data Everywhere Chicago: SQL on Hadoop Big Data Everywhere Chicago: SQL on Hadoop
Big Data Everywhere Chicago: SQL on Hadoop BigDataEverywhere
 
Analyzing Real-World Data with Apache Drill
Analyzing Real-World Data with Apache DrillAnalyzing Real-World Data with Apache Drill
Analyzing Real-World Data with Apache DrillTomer Shiran
 
Real Time and Big Data – It’s About Time
Real Time and Big Data – It’s About TimeReal Time and Big Data – It’s About Time
Real Time and Big Data – It’s About TimeMapR Technologies
 
Real Time and Big Data – It’s About Time
Real Time and Big Data – It’s About TimeReal Time and Big Data – It’s About Time
Real Time and Big Data – It’s About TimeDataWorks Summit
 
Big Data Everywhere Chicago: Getting Real with the MapR Platform (MapR)
Big Data Everywhere Chicago: Getting Real with the MapR Platform (MapR)Big Data Everywhere Chicago: Getting Real with the MapR Platform (MapR)
Big Data Everywhere Chicago: Getting Real with the MapR Platform (MapR)BigDataEverywhere
 
The Future of Hadoop: MapR VP of Product Management, Tomer Shiran
The Future of Hadoop: MapR VP of Product Management, Tomer ShiranThe Future of Hadoop: MapR VP of Product Management, Tomer Shiran
The Future of Hadoop: MapR VP of Product Management, Tomer ShiranMapR Technologies
 
Get most out of Spark on YARN
Get most out of Spark on YARNGet most out of Spark on YARN
Get most out of Spark on YARNDataWorks Summit
 
Tez: Accelerating Data Pipelines - fifthel
Tez: Accelerating Data Pipelines - fifthelTez: Accelerating Data Pipelines - fifthel
Tez: Accelerating Data Pipelines - fifthelt3rmin4t0r
 
Hadoop and NoSQL joining forces by Dale Kim of MapR
Hadoop and NoSQL joining forces by Dale Kim of MapRHadoop and NoSQL joining forces by Dale Kim of MapR
Hadoop and NoSQL joining forces by Dale Kim of MapRData Con LA
 
Cleveland Hadoop Users Group - Spark
Cleveland Hadoop Users Group - SparkCleveland Hadoop Users Group - Spark
Cleveland Hadoop Users Group - SparkVince Gonzalez
 
Self-Service Data Exploration with Apache Drill
Self-Service Data Exploration with Apache DrillSelf-Service Data Exploration with Apache Drill
Self-Service Data Exploration with Apache DrillMapR Technologies
 
Spark One Platform Webinar
Spark One Platform WebinarSpark One Platform Webinar
Spark One Platform WebinarCloudera, Inc.
 
Tez Data Processing over Yarn
Tez Data Processing over YarnTez Data Processing over Yarn
Tez Data Processing over YarnInMobi Technology
 
Visual Mapping of Clickstream Data
Visual Mapping of Clickstream DataVisual Mapping of Clickstream Data
Visual Mapping of Clickstream DataDataWorks Summit
 
AWS Partner Webcast - Hadoop in the Cloud: Unlocking the Potential of Big Dat...
AWS Partner Webcast - Hadoop in the Cloud: Unlocking the Potential of Big Dat...AWS Partner Webcast - Hadoop in the Cloud: Unlocking the Potential of Big Dat...
AWS Partner Webcast - Hadoop in the Cloud: Unlocking the Potential of Big Dat...Amazon Web Services
 

Semelhante a Apache Drill - Why, What, How (20)

Webinar: Selecting the Right SQL-on-Hadoop Solution
Webinar: Selecting the Right SQL-on-Hadoop SolutionWebinar: Selecting the Right SQL-on-Hadoop Solution
Webinar: Selecting the Right SQL-on-Hadoop Solution
 
Big Data Everywhere Chicago: SQL on Hadoop
Big Data Everywhere Chicago: SQL on Hadoop Big Data Everywhere Chicago: SQL on Hadoop
Big Data Everywhere Chicago: SQL on Hadoop
 
Analyzing Real-World Data with Apache Drill
Analyzing Real-World Data with Apache DrillAnalyzing Real-World Data with Apache Drill
Analyzing Real-World Data with Apache Drill
 
2014 08-20-pit-hug
2014 08-20-pit-hug2014 08-20-pit-hug
2014 08-20-pit-hug
 
Real Time and Big Data – It’s About Time
Real Time and Big Data – It’s About TimeReal Time and Big Data – It’s About Time
Real Time and Big Data – It’s About Time
 
Real Time and Big Data – It’s About Time
Real Time and Big Data – It’s About TimeReal Time and Big Data – It’s About Time
Real Time and Big Data – It’s About Time
 
Big Data Everywhere Chicago: Getting Real with the MapR Platform (MapR)
Big Data Everywhere Chicago: Getting Real with the MapR Platform (MapR)Big Data Everywhere Chicago: Getting Real with the MapR Platform (MapR)
Big Data Everywhere Chicago: Getting Real with the MapR Platform (MapR)
 
Drilling on JSON
Drilling on JSONDrilling on JSON
Drilling on JSON
 
The Future of Hadoop: MapR VP of Product Management, Tomer Shiran
The Future of Hadoop: MapR VP of Product Management, Tomer ShiranThe Future of Hadoop: MapR VP of Product Management, Tomer Shiran
The Future of Hadoop: MapR VP of Product Management, Tomer Shiran
 
Get most out of Spark on YARN
Get most out of Spark on YARNGet most out of Spark on YARN
Get most out of Spark on YARN
 
Tez: Accelerating Data Pipelines - fifthel
Tez: Accelerating Data Pipelines - fifthelTez: Accelerating Data Pipelines - fifthel
Tez: Accelerating Data Pipelines - fifthel
 
Hadoop and NoSQL joining forces by Dale Kim of MapR
Hadoop and NoSQL joining forces by Dale Kim of MapRHadoop and NoSQL joining forces by Dale Kim of MapR
Hadoop and NoSQL joining forces by Dale Kim of MapR
 
Apache Spark Overview
Apache Spark OverviewApache Spark Overview
Apache Spark Overview
 
Cleveland Hadoop Users Group - Spark
Cleveland Hadoop Users Group - SparkCleveland Hadoop Users Group - Spark
Cleveland Hadoop Users Group - Spark
 
Self-Service Data Exploration with Apache Drill
Self-Service Data Exploration with Apache DrillSelf-Service Data Exploration with Apache Drill
Self-Service Data Exploration with Apache Drill
 
Spark One Platform Webinar
Spark One Platform WebinarSpark One Platform Webinar
Spark One Platform Webinar
 
Tez Data Processing over Yarn
Tez Data Processing over YarnTez Data Processing over Yarn
Tez Data Processing over Yarn
 
Visual Mapping of Clickstream Data
Visual Mapping of Clickstream DataVisual Mapping of Clickstream Data
Visual Mapping of Clickstream Data
 
Drill 1.0
Drill 1.0Drill 1.0
Drill 1.0
 
AWS Partner Webcast - Hadoop in the Cloud: Unlocking the Potential of Big Dat...
AWS Partner Webcast - Hadoop in the Cloud: Unlocking the Potential of Big Dat...AWS Partner Webcast - Hadoop in the Cloud: Unlocking the Potential of Big Dat...
AWS Partner Webcast - Hadoop in the Cloud: Unlocking the Potential of Big Dat...
 

Último

Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAroojKhan71
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxolyaivanovalion
 
ALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxolyaivanovalion
 
Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxolyaivanovalion
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxolyaivanovalion
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxolyaivanovalion
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...amitlee9823
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...Pooja Nehwal
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramMoniSankarHazra
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxolyaivanovalion
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...shambhavirathore45
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightDelhi Call girls
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 

Último (20)

Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
ALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptx
 
Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptx
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics Program
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 

Apache Drill - Why, What, How

  • 1. © MapR Technologies, confidential © 2014 MapR Technologies M. C. Srivas, CTO and Founder
  • 2. © MapR Technologies, confidential MapR is Unbiased Open Source
  • 3. © MapR Technologies, confidential Linux Is Unbiased • Linux provides choice – MySQL – PostgreSQL – SQLite • Linux provides choice – Apache httpd – Nginx – Lighttpd
  • 4. © MapR Technologies, confidential MapR Is Unbiased • MapR provides choice MapR Distribution for Hadoop Distribution C Distribution H Spark Spark (all of it) and SparkSQL Spark only No Interactive SQL Impala, Drill, Hive/Tez, SparkSQL One option (Impala) One option (Hive/Tez) Scheduler YARN, Mesos One option (YARN) One option (YARN) Versions Hive 0.10, 0.11, 0.12, 0.13 Pig 0.11, 012 HBase 0.94, 0.98 One version One version
  • 5. © MapR Technologies, confidential MapR Distribution for Apache Hadoop MapR Data Platform (Random Read/Write) Enterprise Grade Data Hub Operational MapR-FS (POSIX) MapR-DB (High-Performance NoSQL) Security YARN Pig Cascading Spark Batch Spark Streaming Storm* Streaming HBase Solr NoSQL & Search Juju Provisioning & Coordination Savannah* Mahout MLLib ML, Graph GraphX MapReduc e v1 & v2 APACHE HADOOP AND OSS ECOSYSTEM EXECUTION ENGINES DATA GOVERNANCE AND OPERATIONS Workflow & Data Tez* Governance Accumulo* Hive Impala Shark Drill* SQL Sqoop Sentry* Oozie ZooKeeper Flume Knox* Falcon* Whirr Data Integration & Access HttpFS Hue NFS HDFS API HBase API JSON API MapR Control System (Management and Monitoring) * In Roadmap for inclusion/certification CLI REST API GUI
  • 6. © MapR Technologies, confidential
  • 7. © MapR Technologies, confidential Hadoop an augmentation for EDW—Why?
  • 8. © MapR Technologies, confidential
  • 9. © MapR Technologies, confidential
  • 10. © MapR Technologies, confidential
  • 11. © MapR Technologies, confidential
  • 12. © MapR Technologies, confidential
  • 13. © MapR Technologies, confidential But inside, it looks like this …
  • 14. © MapR Technologies, confidential And this …
  • 15. © MapR Technologies, confidential And this …
  • 16. © MapR Technologies, confidential Consolidating schemas is very hard.
  • 17. © MapR Technologies, confidential Consolidating schemas is very hard, causes SILOs
  • 18. © MapR Technologies, confidential Silos make analysis very difficult • How do I identify a unique {customer, trade} across data sets? • How can I guarantee the lack of anomalous behavior if I can’t see all data?
  • 19. © 2014 MapR Technologies 19 Hard to know what’s of value a priori
  • 20. © 2014 MapR Technologies 20 Hard to know what’s of value a priori
  • 21. © 2014 MapR Technologies 21 Why Hadoop
  • 22. © MapR Technologies, confidential Rethink SQL for Big Data Preserve •ANSI SQL • Familiar and ubiquitous • Performance • Interactive nature crucial for BI/Analytics • One technology • Painful to manage different technologies • Enterprise ready • System-of-record, HA, DR, Security, Multi-tenancy, …
  • 23. © MapR Technologies, confidential Rethink SQL for Big Data Preserve •ANSI SQL • Familiar and ubiquitous • Performance • Interactive nature crucial for BI/Analytics • One technology • Painful to manage different technologies • Enterprise ready • System-of-record, HA, DR, Security, Multi-tenancy, … Invent • Flexible data-model • Allow schemas to evolve rapidly • Support semi-structured data types • Agility • Self-service possible when developer and DBA is same • Scalability • In all dimensions: data, speed, schemas, processes, management
  • 24. © MapR Technologies, confidential SQL is here to stay
  • 25. © MapR Technologies, confidential Hadoop is here to stay
  • 26. © MapR Technologies, confidential YOU CAN’T HANDLE REAL SQL
  • 27. © MapR Technologies, confidential SQL select * from A where exists ( select 1 from B where B.b < 100 ); • Did you know Apache HIVE cannot compute it? – eg, Hive, Impala, Spark/Shark
  • 28. © MapR Technologies, confidential Self-described Data select cf.month, cf.year from hbase.table1; • Did you know normal SQL cannot handle the above? • Nor can HIVE and its variants like Impala, Shark?
  • 29. © MapR Technologies, confidential Self-described Data select cf.month, cf.year from hbase.table1; • Why? • Because there’s no meta-store definition available
  • 30. © MapR Technologies, confidential Self-Describing Data Ubiquitous Centralized schema - Static - Managed by the DBAs - In a centralized repository Long, meticulous data preparation process (ETL, create/alter schema, etc.) – can take 6-18 months Self-describing, or schema-less, data - Dynamic/evolving - Managed by the applications - Embedded in the data Less schema, more suitable for data that has higher volume, variety and velocity Apache Drill
  • 31. © MapR Technologies, confidential A Quick Tour through Apache Drill
  • 32. © MapR Technologies, confidential Data Source is in the Query select timestamp, message from dfs1.logs.`AppServerLogs/2014/Jan/p001.parquet` where errorLevel > 2
  • 33. © MapR Technologies, confidential Data Source is in the Query select timestamp, message from dfs1.logs.`AppServerLogs/2014/Jan/p001.parquet` where errorLevel > 2 This is a cluster in Apache Drill - DFS - HBase - Hive meta-store
  • 34. © MapR Technologies, confidential Data Source is in the Query select timestamp, message from dfs1.logs.`AppServerLogs/2014/Jan/p001.parquet` where errorLevel > 2 This is a cluster in Apache Drill - DFS - HBase - Hive meta-store A work-space - Typically a sub-directory - HIVE database
  • 35. © MapR Technologies, confidential Data Source is in the Query select timestamp, message from dfs1.logs.`AppServerLogs/2014/Jan/p001.parquet` where errorLevel > 2 This is a cluster in Apache Drill - DFS - HBase - Hive meta-store A work-space - Typically a sub-directory - HIVE database A table - pathnames - Hbase table - Hive table
  • 36. © MapR Technologies, confidential Combine data sources on the fly • JSON • CSV • ORC (ie, all Hive types) • Parquet • HBase tables • … can combine them Select USERS.name, USERS.emails.work from dfs.logs.`/data/logs` LOGS, dfs.users.`/profiles.json` USERS, where LOGS.uid = USERS.uid and errorLevel > 5 order by count(*);
  • 37. © MapR Technologies, confidential Can be an entire directory tree // On a file select errorLevel, count(*) from dfs.logs.`/AppServerLogs/2014/Jan/part0001.parquet` group by errorLevel;
  • 38. © MapR Technologies, confidential Can be an entire directory tree // On a file select errorLevel, count(*) from dfs.logs.`/AppServerLogs/2014/Jan/part0001.parquet` group by errorLevel; // On the entire data collection: all years, all months select errorLevel, count(*) from dfs.logs.`/AppServerLogs` group by errorLevel
  • 39. © MapR Technologies, confidential Can be an entire directory tree // On a file select errorLevel, count(*) from dfs.logs.`/AppServerLogs/2014/Jan/part0001.parquet` group by errorLevel; // On the entire data collection: all years, all months select errorLevel, count(*) from dfs.logs.`/AppServerLogs` group by errorLevel dirs[1] dirs[2]
  • 40. © MapR Technologies, confidential Can be an entire directory tree // On a file select errorLevel, count(*) from dfs.logs.`/AppServerLogs/2014/Jan/part0001.parquet` group by errorLevel; // On the entire data collection: all years, all months select errorLevel, count(*) from dfs.logs.`/AppServerLogs` group by errorLevel where dirs[1] > 2012 , dirs[2] dirs[1] dirs[2]
  • 41. © MapR Technologies, confidential Querying JSON { name: classic fillings: [ { name: sugar cal: 400 }]} { name: choco fillings: [ { name: sugar cal: 400 } { name: chocolate cal: 300 }]} { name: bostoncreme fillings: [ { name: sugar cal: 400 } { name: cream cal: 1000 } { name: jelly cal: 600 }]} donuts.json
  • 42. © MapR Technologies, confidential Cursors inside Drill DrillClient drill = new DrillClient().connect( …); ResultReader r = drill.runSqlQuery( "select * from `donuts.json`"); while( r.next()) { String donutName = r.reader( “name").readString(); ListReader fillings = r.reader( "fillings"); while( fillings.next()) { int calories = fillings.reader( "cal").readInteger(); if (calories > 400) print( donutName, calories, fillings.reader( "name").readString()); } } { name: classic fillings: [ { name: sugar cal: 400 }]} { name: choco fillings: [ { name: sugar cal: 400 } { name: chocolate cal: 300 }]} { name: bostoncreme fillings: [ { name: sugar cal: 400 } { name: cream cal: 1000 } { name: jelly cal: 600 }]}
  • 43. © MapR Technologies, confidential Direct queries on nested data // Flattening maps in JSON, parquet and other nested records select name, flatten(fillings) as f from dfs.users.`/donuts.json` where f.cal < 300; // lists the fillings < 300 calories { name: classic fillings: [ { name: sugar cal: 400 }]} { name: choco fillings: [ { name: sugar cal: 400 } { name: chocolate cal: 300 }]} { name: bostoncreme fillings: [ { name: sugar cal: 400 } { name: cream cal: 1000 } { name: jelly cal: 600 }]}
  • 44. © MapR Technologies, confidential Complex Data Using SQL or Fluent API // SQL Result r = drill.sql( "select name, flatten(fillings) from `donuts.json` where fillings.cal < 300`); // or Fluent API Result r = drill.table(“donuts.json”) .lt(“fillings.cal”, 300).all(); while( r.next()) { String name = r.get( “name").string(); List fillings = r.get( “fillings”).list(); while(fillings.next()) { print(name, calories, fillings.get(“name”).string()); } } { name: classic fillings: [ { name: sugar cal: 400 }]} { name: choco fillings: [ { name: sugar cal: 400 } { name: plain: 280 }]} { name: bostoncreme fillings: [ { name: sugar cal: 400 } { name: cream cal: 1000 } { name: jelly cal: 600 }]}
  • 45. © MapR Technologies, confidential Queries on embedded data // embedded JSON value inside column donut-json inside column-family cf1 of an hbase table donuts select d.name, count( d.fillings), from ( select convert_from( cf1.donut-json, json) as d from hbase.user.`donuts` );
  • 46. © MapR Technologies, confidential Queries inside JSON records // Each JSON record itself can be a whole database // example: get all donuts with at least 1 filling with > 300 calories select d.name, count( d.fillings), max(d.fillings.cal) within record as mincal from ( select convert_from( cf1.donut-json, json) as d from hbase.user.`donuts` ) where mincal > 300;
  • 47. © MapR Technologies, confidential a • Schema can change over course of query • Operators are able to reconfigure themselves on schema change events – Minimize flexibility overhead – Support more advanced execution optimization based on actual data characteristics
  • 48. © MapR Technologies, confidential De-centralized metadata // count the number of tweets per customer, where the customers are in Hive, and their tweets are in HBase. Note that the hbase data has no meta-data information select c.customerName, hb.tweets.count from hive.CustomersDB.`Customers` c join hbase.user.`SocialData` hb on c.customerId = convert_from( hb.rowkey, UTF-8);
  • 49. © MapR Technologies, confidential So what does this all mean?
  • 50. © MapR Technologies, confidential A Drill Database • What is a database with Drill/MapR?
  • 51. © MapR Technologies, confidential A Drill Database • What is a database with Drill/MapR? • Just a directory, with a bunch of related files
  • 52. © MapR Technologies, confidential A Drill Database • What is a database with Drill/MapR? • Just a directory, with a bunch of related files • There’s no need for artificial boundaries – No need to bunch a set of tables together to call it a “database”
  • 53. © MapR Technologies, confidential A Drill Database /user/srivas/work/bugs symptom version date bugid dump-name impala crash 3.1.1 14/7/14 12345 cust1.tgz cldb slow 3.1.0 12/7/14 45678 cust2.tgz BugList Customers name rep se dump-name xxxx dkim junhyuk cust1.tgz yyyy yoshi aki cust2.tgz
  • 54. © MapR Technologies, confidential Queries are simple select b.bugid, b.symptom, b.date from dfs.bugs.’/Customers’ c, dfs.bugs.’/BugList’ b where c.dump-name = b.dump-name
  • 55. © MapR Technologies, confidential Queries are simple select b.bugid, b.symptom, b.date from dfs.bugs.’/Customers’ c, dfs.bugs.’/BugList’ b where c.dump-name = b.dump-name Let’s say I want to cross-reference against your list: select bugid, symptom from dfs.bugs.’/Buglist’ b, dfs.yourbugs.’/YourBugFile’ b2 where b.bugid = b2.xxx
  • 56. © MapR Technologies, confidential What does it mean?
  • 57. © MapR Technologies, confidential What does it mean? • No ETL • Reach out directly to the particular table/file • As long as the permissions are fine, you can do it • No need to have the meta-data – None needed
• 58. © MapR Technologies, confidential Another example

select d.name, count(d.fillings)
from (select convert_from(cf1.donut-json, json) as d
      from hbase.user.`donuts`);

• convert_from(xx, json) invokes the JSON parser inside Drill
• 59. © MapR Technologies, confidential Another example

select d.name, count(d.fillings)
from (select convert_from(cf1.donut-json, json) as d
      from hbase.user.`donuts`);

• convert_from(xx, json) invokes the JSON parser inside Drill
• What if you could plug in any parser?
• 60. © MapR Technologies, confidential Another example

select d.name, count(d.fillings)
from (select convert_from(cf1.donut-json, json) as d
      from hbase.user.`donuts`);

• convert_from(xx, json) invokes the JSON parser inside Drill
• What if you could plug in any parser? – XML? – Semi-conductor yield-analysis files? Oil-exploration readings? – Telescope readings of stars? – RFIDs of various things?
  • 61. © MapR Technologies, confidential No ETL • Basically, Drill is querying the raw data directly • Joining with processed data • NO ETL • Folks, this is very, very powerful • NO ETL
  • 62. © MapR Technologies, confidential Seamless integration with Apache Hive • Low latency queries on Hive tables • Support for 100s of Hive file formats • Ability to reuse Hive UDFs • Support for multiple Hive metastores in a single query
  • 63. © MapR Technologies, confidential Underneath the Covers
• 64. © MapR Technologies, confidential Basic Process

(Diagram: a query arrives at a cluster of Drillbits, each with a distributed cache, co-located with DFS/HBase and coordinated by ZooKeeper)

1. Query comes to any Drillbit (JDBC, ODBC, CLI, protobuf)
2. Drillbit generates an execution plan based on query optimization & locality
3. Fragments are farmed out to individual nodes
4. Result is returned to the driving node
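As an illustration of step 1, here is a minimal sketch of submitting a query to any Drillbit over JDBC. It assumes the Apache Drill JDBC driver is on the classpath; the ZooKeeper address, cluster id (drillbits1), and file path are placeholders, not values from the deck.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class DrillJdbcExample {
    public static void main(String[] args) throws Exception {
        // Connect through ZooKeeper so the query can land on any available Drillbit
        String url = "jdbc:drill:zk=zkhost:2181/drill/drillbits1";   // placeholder quorum and cluster id
        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             // Query a raw JSON file directly -- no ETL, no metadata registration
             ResultSet rs = stmt.executeQuery("select name from dfs.`/data/donuts.json`")) {
            while (rs.next()) {
                System.out.println(rs.getString("name"));
            }
        }
    }
}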
• 65. © MapR Technologies, confidential Stages of Query Planning

SQL Query → Parser → Logical Planner (heuristic and cost based) → Physical Planner (cost based) → Foreman → plan fragments sent to Drillbits
• 66. © MapR Technologies, confidential Query Execution

(Drillbit architecture diagram — components: JDBC/ODBC/RPC endpoints, Foreman, SQL/HiveQL/Pig parsers, Logical Plan, Optimizer, Physical Plan, Scheduler, Operators, Storage Engine Interface (HDFS, HBase, Mongo, Cassandra), Distributed Cache)
  • 67. © MapR Technologies, confidential A Query engine that is… • Columnar/Vectorized • Optimistic/pipelined • Runtime compilation • Late binding • Extensible
• 68. © MapR Technologies, confidential Columnar representation

(Diagram: a record with fields A B C D E, and the on-disk columnar layout where each column's values are stored contiguously)
• 69. © MapR Technologies, confidential Columnar Encoding • Values in a column stored next to one another – Better compression – Range-map: save min-max, can skip if not present • Only retrieve columns participating in the query • Aggregations can be performed without decoding
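To make the range-map idea concrete, here is a small generic sketch (illustrative class names, not Drill's actual code) of per-chunk min/max metadata that lets a scan skip column chunks that cannot contain the requested value.

import java.util.List;

// A column chunk carrying range metadata, as a generic illustration of min/max skipping
class IntColumnChunk {
    final int min, max;
    final int[] values;

    IntColumnChunk(int[] values) {
        this.values = values;
        int lo = Integer.MAX_VALUE, hi = Integer.MIN_VALUE;
        for (int v : values) { lo = Math.min(lo, v); hi = Math.max(hi, v); }
        this.min = lo;
        this.max = hi;
    }
}

class RangeMapScan {
    // Count rows equal to 'target', skipping chunks whose [min, max] range cannot contain it
    static long countEquals(List<IntColumnChunk> chunks, int target) {
        long count = 0;
        for (IntColumnChunk chunk : chunks) {
            if (target < chunk.min || target > chunk.max) continue; // skip: value cannot be present
            for (int v : chunk.values) {
                if (v == target) count++;
            }
        }
        return count;
    }
}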
• 70. © MapR Technologies, confidential Run-length encoding & Sum • Dataset encoded as <val> <run-length>: – 2, 4 (4 2's) – 8, 10 (10 8's) • Goal: sum all the records • Normally: – Decompress: 2, 2, 2, 2, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8 – Add: 2 + 2 + 2 + 2 + 8 + 8 + 8 + 8 + 8 + 8 + 8 + 8 + 8 + 8 • Optimized work: 2 * 4 + 8 * 10 – Less memory, fewer operations
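A minimal sketch of the optimization described above: summing directly over (value, run-length) pairs instead of decompressing first. The encoding shown is generic, not Drill's on-disk format.

class RleSum {
    // Runs stored as parallel arrays: values[i] is repeated runLengths[i] times
    static long sum(long[] values, int[] runLengths) {
        long total = 0;
        for (int i = 0; i < values.length; i++) {
            total += values[i] * runLengths[i];   // one multiply-add per run, not per record
        }
        return total;
    }

    public static void main(String[] args) {
        // The slide's dataset: four 2's and ten 8's
        System.out.println(sum(new long[]{2, 8}, new int[]{4, 10}));   // prints 88
    }
}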
  • 71. © MapR Technologies, confidential Bit-packed Dictionary Sort • Dataset encoded with a dictionary and bit-positions: – Dictionary: [Rupert, Bill, Larry] {0, 1, 2} – Values: [1,0,1,2,1,2,1,0] • Normal work – Decompress & store: Bill, Rupert, Bill, Larry, Bill, Larry, Bill, Rupert – Sort: ~24 comparisons of variable width strings • Optimized work – Sort dictionary: {Bill: 1, Larry: 2, Rupert: 0} – Sort bit-packed values – Work: max 3 string comparisons, ~24 comparisons of fixed-width dictionary bits
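Here is a generic sketch of the same trick: sort the small dictionary once, remap the bit-packed codes to ranks, then sort the fixed-width integers instead of variable-width strings. The class is illustrative, not Drill internals.

import java.util.Arrays;

class DictionarySort {
    public static void main(String[] args) {
        // Dictionary and bit-packed codes from the slide
        String[] dictionary = {"Rupert", "Bill", "Larry"};   // codes 0, 1, 2
        int[] codes = {1, 0, 1, 2, 1, 2, 1, 0};

        // Sort dictionary indices by their string values (at most ~3 string comparisons here)
        Integer[] order = {0, 1, 2};
        Arrays.sort(order, (a, b) -> dictionary[a].compareTo(dictionary[b]));

        // rank[oldCode] = position of that dictionary entry in sorted order
        int[] rank = new int[dictionary.length];
        for (int pos = 0; pos < order.length; pos++) rank[order[pos]] = pos;

        // Remap codes to ranks and sort fixed-width integers instead of strings
        int[] ranked = Arrays.stream(codes).map(c -> rank[c]).toArray();
        Arrays.sort(ranked);

        // Decode only for display: Bill x4, Larry x2, Rupert x2
        for (int r : ranked) System.out.println(dictionary[order[r]]);
    }
}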
  • 72. © MapR Technologies, confidential Drill 4-value semantics • SQL’s 3-valued semantics – True – False – Unknown • Drill adds fourth – Repeated
• 73. © MapR Technologies, confidential Vectorization • Drill operates on more than one record at a time – Word-sized manipulations – SIMD-like instructions • GCC, LLVM and the JVM all do various optimizations automatically – key algorithms can also be hand-coded • Logical Vectorization – Bitmaps allow lightning-fast null checks – Avoid branching to speed the CPU pipeline
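As a rough illustration of "logical vectorization", the sketch below checks 64 validity flags at a time by testing one word of a null bitmap; it is a generic example, not Drill's value-vector code.

class NullBitmapScan {
    // Validity bitmap: bit i set => record i is non-null; unused trailing bits are assumed zero
    static long countNonNull(long[] validityWords) {
        long nonNull = 0;
        for (long word : validityWords) {
            if (word == 0L) continue;          // 64 nulls detected with a single comparison
            nonNull += Long.bitCount(word);    // popcount covers 64 records in one instruction, no per-record branch
        }
        return nonNull;
    }
}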
• 74. © MapR Technologies, confidential Runtime Compilation is Faster • JIT is smart, but more gains with runtime compilation • Janino: Java-based Java compiler (source: http://bit.ly/16Xk32x)
• 75. © MapR Technologies, confidential Drill compiler

(Diagram: CodeModel generates code → Janino compiles the runtime byte-code → merged with precompiled byte-code templates → loaded class)
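For a feel of what Janino-style runtime compilation looks like, here is a small sketch that compiles a filter expression into byte-code at query time using Janino's ExpressionEvaluator; it only illustrates the idea and is not Drill's code-generation pipeline (the expression and parameter names are made up for the example).

import org.codehaus.janino.ExpressionEvaluator;

public class RuntimeCompiledFilter {
    public static void main(String[] args) throws Exception {
        // Compile a filter expression at runtime into real byte-code, so evaluation needs no reflection
        ExpressionEvaluator filter = new ExpressionEvaluator(
                "cal < 300",                       // expression text assembled at planning time
                boolean.class,                     // result type
                new String[] { "cal" },            // parameter names
                new Class[] { int.class });        // parameter types

        int[] calories = { 400, 280, 1000, 120 };
        for (int cal : calories) {
            boolean keep = (Boolean) filter.evaluate(new Object[] { cal });
            System.out.println(cal + " -> " + keep);
        }
    }
}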
• 76. © MapR Technologies, confidential Optimistic

(Chart: speed vs. check-pointing — "No need to checkpoint" (Apache Drill) vs. "Checkpoint frequently")
  • 77. © MapR Technologies, confidential Optimistic Execution • Recovery code trivial – Running instances discard the failed query’s intermediate state • Pipelining possible – Send results as soon as batch is large enough – Requires barrier-less decomposition of query
  • 78. © MapR Technologies, confidential Batches of Values • Value vectors – List of values, with same schema – With the 4-value semantics for each value • Shipped around in batches – max 256k bytes in a batch – max 64K rows in a batch • RPC designed for multiple replies to a request
• 79. © MapR Technologies, confidential Pipelining • Record batches are pipelined between nodes – ~256kB usually • Unit of work for Drill – Operators work on a batch • Operator reconfiguration happens at batch boundaries (Diagram: record batches flowing between Drillbits)
• 80. © MapR Technologies, confidential Pipelining Record Batches

(Same Drillbit architecture diagram as slide 66: JDBC/ODBC/RPC endpoints, Foreman, SQL/HiveQL/Pig parsers, Logical Plan, Optimizer, Physical Plan, Scheduler, Operators, Storage Engine Interface (HDFS, HBase, Mongo, Cassandra), Distributed Cache)
• 81. © MapR Technologies, confidential Pipelining • Random access: sort without copy or restructuring • Avoids serialization/deserialization • Off-heap (no GC woes when lots of memory) • Full specification + off-heap + batch – enables C/C++ operators (fast!) • Read/write to disk when data is larger than memory (Diagram: a Drillbit spilling memory overflow to disk)
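A tiny sketch of the off-heap point: batches held in direct buffers live outside the Java heap, so a large working set does not create GC pressure. This is a generic java.nio example, not Drill's memory allocator.

import java.nio.ByteBuffer;

class OffHeapBatch {
    public static void main(String[] args) {
        // ~256 KB batch allocated outside the JVM heap
        ByteBuffer batch = ByteBuffer.allocateDirect(256 * 1024);

        // Write fixed-width values directly into off-heap memory
        for (int i = 0; i < 1000; i++) {
            batch.putInt(i * 2);
        }

        // Read them back with random access, no serialization/deserialization step
        batch.flip();
        long sum = 0;
        while (batch.hasRemaining()) {
            sum += batch.getInt();
        }
        System.out.println(sum);   // 0 + 2 + ... + 1998 = 999000
    }
}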
• 82. © MapR Technologies, confidential Cost-based Optimization • Using Optiq, an extensible framework • Pluggable rules and cost model • Rules for distributed plan generation • Insert Exchange operators into the physical plan • Optiq enhanced to explore parallel query plans • Pluggable cost model – CPU, IO, memory, network cost (data locality) – Storage engine features (HDFS vs Hive vs HBase)
• 83. © MapR Technologies, confidential Distributed Plan Cost • Operators have a distribution property – Hash, Broadcast, Singleton, … • Exchange operators enforce distributions – Hash: HashToRandomExchange – Broadcast: BroadcastExchange – Singleton: UnionExchange, SingleMergeExchange • Enumerate all alternatives, use cost to pick the best – Merge Join vs Hash Join – Partition-based join vs Broadcast-based join – Streaming Aggregation vs Hash Aggregation – Aggregation in one phase or two phases (partial local aggregation followed by final aggregation) (Diagram: Data → HashToRandomExchange → Sort → Streaming Aggregation)
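To illustrate how a pluggable cost model picks between the alternatives listed above, here is a toy sketch comparing the network cost of a broadcast join against a hash-distributed (partition-based) join. The cost formulas and sizes are simplified assumptions for the example, not Optiq's or Drill's actual model.

class JoinCostSketch {
    // Simplified network-cost model: bytes shipped across the cluster
    static long broadcastCost(long smallSideBytes, int nodes) {
        return smallSideBytes * nodes;               // copy the small table to every node
    }

    static long hashDistributeCost(long leftBytes, long rightBytes) {
        return leftBytes + rightBytes;               // reshuffle both sides once by join key
    }

    public static void main(String[] args) {
        long customers = 50L * 1024 * 1024;          // 50 MB dimension table (assumed)
        long tweets    = 500L * 1024 * 1024 * 1024;  // 500 GB fact table (assumed)
        int nodes = 20;

        long broadcast = broadcastCost(customers, nodes);
        long hash = hashDistributeCost(customers, tweets);

        // The planner enumerates both plans and keeps the cheaper one
        System.out.println(broadcast < hash ? "BroadcastExchange" : "HashToRandomExchange");
    }
}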
• 84. © MapR Technologies, confidential Interactive SQL-on-Hadoop options

                 | Drill 1.0                                       | Hive 0.13 w/ Tez            | Impala 1.x
Latency          | Low                                             | Medium                      | Low
Files            | Yes (all Hive file formats, plus JSON, Text, …) | Yes (all Hive file formats) | Yes (Parquet, Sequence, …)
HBase/MapR-DB    | Yes                                             | Yes, perf issues            | Yes, with issues
Schema           | Hive or schema-less                             | Hive                        | Hive
SQL support      | ANSI SQL                                        | HiveQL                      | HiveQL (subset)
Client support   | ODBC/JDBC                                       | ODBC/JDBC                   | ODBC/JDBC
Hive compat      | High                                            | High                        | Low
Large datasets   | Yes                                             | Yes                         | Limited
Nested data      | Yes                                             | Limited                     | No
Concurrency      | High                                            | Limited                     | Medium
• 85. © MapR Technologies, confidential Apache Drill Roadmap

1.0 – Data exploration/ad-hoc queries: low-latency SQL; schema-less execution; files & HBase/M7 support; Hive integration; BI and SQL tool support via ODBC/JDBC
1.1 – Advanced analytics and operational data: HBase query speedup; nested data functions; advanced SQL functionality
2.0 – Operational SQL: ultra low latency queries; single-row insert/update/delete; workload management
• 86. © MapR Technologies, confidential MapR Distribution for Apache Hadoop MapR Data Platform (Random Read/Write) Enterprise Grade Data Hub Operational MapR-FS (POSIX) MapR-DB (High-Performance NoSQL) Security YARN Pig Cascading Spark Batch Spark Streaming Storm* Streaming HBase Solr NoSQL & Search Juju Provisioning & Coordination Savannah* Mahout MLLib ML, Graph GraphX MapReduce v1 & v2 APACHE HADOOP AND OSS ECOSYSTEM EXECUTION ENGINES DATA GOVERNANCE AND OPERATIONS Workflow & Data Tez* Governance Accumulo* Hive Impala Shark Drill* SQL Sqoop Sentry* Oozie ZooKeeper Flume Knox* Falcon* Whirr Data Integration & Access HttpFS Hue NFS HDFS API HBase API JSON API MapR Control System (Management and Monitoring) * In Roadmap for inclusion/certification CLI REST API GUI
  • 87. © MapR Technologies, confidential Apache Drill Resources • Drill 0.5 released last week • Getting started with Drill is easy – just download tarball and start running SQL queries on local files • Mailing lists – drill-user@incubator.apache.org – drill-dev@incubator.apache.org • Docs: https://cwiki.apache.org/confluence/display/DRILL/Apache+Drill+Wiki • Fork us on GitHub: http://github.com/apache/incubator-drill/ • Create a JIRA: https://issues.apache.org/jira/browse/DRILL
• 88. © MapR Technologies, confidential Active Drill Community • Large community, growing rapidly – 35-40 contributors, 16 committers – Microsoft, LinkedIn, Oracle, Facebook, Visa, Lucidworks, Concurrent, many universities • In 2014 – over 20 meet-ups, many more coming soon – 3 hackathons, with 40+ participants • Encourage you to join, learn, contribute and have fun …
  • 89. © MapR Technologies, confidential Drill at MapR • World-class SQL team, ~20 people • 150+ years combined experience building commercial databases • Oracle, DB2, ParAccel, Teradata, SQLServer, Vertica • Team works on Drill, Hive, Impala • Fixed some of the toughest problems in Apache Hive
  • 90. © MapR Technologies, confidential Thank you! M. C. Srivas srivas@mapr.com Did I mention we are hiring…