Cloudera Impala
SD Big Data Monthly Meetup #2
August 13th 2014

Maxime Dumas
Systems Engineer

Thirty Seconds About Max
•  Systems Engineer
    •  aka Sales Engineer
•  SoCal, AZ, NV
•  former coder of PHP
•  teaches meditation + yoga
•  from Montreal, Canada

What Does Cloudera Do?
•  product
    •  distribution of Hadoop components, Apache licensed
    •  enterprise tooling
•  support
•  training
•  services (aka consulting)
•  community

What This Talk Isn’t About
•  deploying
    •  Puppet, Chef, Ansible, homegrown scripts, intern labor
•  sizing & tuning
    •  depends heavily on data and workload
•  coding
    •  unless you count XML or CSV or SQL
•  algorithms

Image: Public Domain, IFCAR

What is Cloudera Impala?

cloud·e·ra im·pal·a

/kloudˈi(ə)rə imˈpalə/
noun
a modern, open source, MPP SQL query engine for Apache Hadoop.

“Cloudera Impala provides fast, ad hoc SQL query capability for Apache Hadoop, complementing traditional MapReduce batch processing.”

Quick and dirty, for context.
The Apache Hadoop Ecosystem

Why “Ecosystem?”
•  In the beginning, just Hadoop
    •  HDFS
    •  MapReduce
•  Today, dozens of interrelated components
    •  I/O
    •  Processing
    •  Specialty Applications
    •  Configuration
    •  Workflow

HDFS
•  Distributed, highly fault-tolerant filesystem
•  Optimized for large streaming access to data
•  Based on Google File System
    •  http://research.google.com/archive/gfs.html

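To make "distributed, highly fault-tolerant" concrete, here is a toy Python model of a file being cut into fixed-size blocks, each stored on several DataNodes. The 128 MB block size and replication factor of 3 are assumptions (common Hadoop 2 defaults, not figures from this talk), and the round-robin placement is a deliberate simplification of HDFS's rack-aware placement policy.

```python
from dataclasses import dataclass

BLOCK_SIZE = 128 * 1024 * 1024  # assumed block size in bytes (a common Hadoop 2 default)
REPLICATION = 3                 # assumed replication factor (the usual default)


@dataclass
class Block:
    index: int
    length: int
    datanodes: list


def place_blocks(file_size, datanodes, block_size=BLOCK_SIZE, replication=REPLICATION):
    """Split file_size bytes into blocks and give each block `replication` distinct DataNodes."""
    if replication > len(datanodes):
        raise ValueError("need at least as many DataNodes as replicas")
    blocks = []
    offset, index = 0, 0
    while offset < file_size:
        length = min(block_size, file_size - offset)  # the last block may be shorter
        # Simplified round-robin placement; real HDFS placement is rack-aware.
        nodes = [datanodes[(index + r) % len(datanodes)] for r in range(replication)]
        blocks.append(Block(index, length, nodes))
        offset += length
        index += 1
    return blocks


def survives_loss(blocks, dead_node):
    """True if every block still has a live replica after `dead_node` fails."""
    return all(any(n != dead_node for n in b.datanodes) for b in blocks)


if __name__ == "__main__":
    nodes = ["dn1", "dn2", "dn3", "dn4", "dn5"]
    for b in place_blocks(300 * 1024 * 1024, nodes):  # a 300 MB file
        print(b.index, b.length // (1024 * 1024), "MB on", b.datanodes)
    # → 0 128 MB on ['dn1', 'dn2', 'dn3']
    #   1 128 MB on ['dn2', 'dn3', 'dn4']
    #   2 44 MB on ['dn3', 'dn4', 'dn5']
```

Because each block lives on three different nodes, losing any single machine still leaves two copies of every block, which is what lets HDFS run on commodity hardware.
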
Lots of Commodity Machines
Image: Yahoo! Hadoop cluster [OSCON ’07]

MapReduce (MR)
•  Programming paradigm
•  Batch oriented, not real-time
•  Works well with distributed computing
•  Lots of Java, but other languages supported
•  Based on Google’s paper
    •  http://research.google.com/archive/mapreduce.html

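The map, shuffle, and reduce phases can be sketched in a single Python process with the classic word-count example. This only mirrors the programming model: it is not Hadoop's Java API, nothing runs in parallel, and the function names are illustrative.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper: emit (word, 1) for every word in one input record."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle/sort: group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key, values):
    """Reducer: combine all values seen for one key."""
    return key, sum(values)


def word_count(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    print(word_count(["the quick fox", "The lazy dog", "the end"]))
    # → {'dog': 1, 'end': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 3}
```

In Hadoop, many mappers and reducers would run these functions on different machines, with the framework moving the grouped pairs over the network; that batch-oriented shuffle is a large part of why MapReduce jobs are not real-time.
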
Apache Hive
•  Abstraction of Hadoop’s Java API
•  HiveQL “compiles” down to MR
    •  a “SQL-like” language
•  Eases analysis using MapReduce

Apache Hive Metastore
•  Maps HDFS files to DB-like resources
    •  Databases
    •  Tables
    •  Column/field names, data types
    •  Roles/users
    •  InputFormat/OutputFormat

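As a rough illustration of what a metastore entry records and how it enables schema-on-read, the sketch below keeps a hypothetical table definition (location, columns, types, delimiter) separate from raw files in a pretend HDFS, and applies the schema only when the table is read. TableDef, FAKE_HDFS, and the warehouse paths are invented for the example; this is not the Metastore's real API, and turning unparseable values into None is a choice of this sketch.

```python
from dataclasses import dataclass

# Pretend HDFS (hypothetical): path -> raw, untyped file contents.
FAKE_HDFS = {
    "/user/hive/warehouse/web_logs/part-00000": "2014-08-13,/index.html,200\n2014-08-13,/about,404\n",
    "/user/hive/warehouse/web_logs/part-00001": "2014-08-14,/index.html,200\n2014-08-14,/demo,oops\n",
}

CASTS = {"STRING": str, "INT": int}


@dataclass
class TableDef:
    database: str
    name: str
    location: str        # HDFS directory that holds the table's files
    columns: list        # [(column_name, type_name), ...]
    field_delim: str = ","


def read_table(table):
    """Apply the table's schema to every file under its location at read time."""
    rows = []
    for path in sorted(FAKE_HDFS):
        if not path.startswith(table.location + "/"):
            continue
        for line in FAKE_HDFS[path].splitlines():
            fields = line.split(table.field_delim)
            row = {}
            for (col, typ), raw in zip(table.columns, fields):
                try:
                    row[col] = CASTS[typ](raw)
                except ValueError:
                    row[col] = None  # unparseable value becomes NULL in this sketch
            rows.append(row)
    return rows


if __name__ == "__main__":
    logs = TableDef("default", "web_logs", "/user/hive/warehouse/web_logs",
                    [("day", "STRING"), ("url", "STRING"), ("status", "INT")])
    for r in read_table(logs):
        print(r)
```

Because the schema lives in the metastore rather than in the files, a second definition over the same directory (for example, one exposing only the day column) works without rewriting any data.
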
Sqoop
•  SQL to Hadoop
•  Tool to import/export data between any JDBC-supported database and Hadoop
•  Transfer data between Hadoop and external databases or EDW
•  High performance connectors for some RDBMS
    •  Oracle, Teradata, Netezza
•  Developed at Cloudera
©2011 Cloudera, Inc. All Rights Reserved.

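Sqoop's parallel imports are usually described as splitting a numeric key column's MIN..MAX range into one slice per mapper, each mapper writing delimited text into HDFS. The sketch below shows that idea in simplified form, assuming sqlite3 as a stand-in for a JDBC-reachable database and Python lists as stand-ins for HDFS part files; the function names are hypothetical and this is not Sqoop's code.

```python
import sqlite3


def compute_splits(lo, hi, num_mappers):
    """Divide the closed integer range [lo, hi] into at most num_mappers contiguous ranges."""
    size = hi - lo + 1
    step = -(-size // num_mappers)  # ceiling division
    splits, start = [], lo
    while start <= hi:
        end = min(start + step - 1, hi)
        splits.append((start, end))
        start = end + 1
    return splits


def sqoop_like_import(conn, table, split_col, num_mappers=4):
    """Return one list of comma-delimited lines per simulated mapper ("part file")."""
    # Identifiers cannot be bound as parameters, so they are interpolated here;
    # only pass trusted table/column names to this illustration.
    lo, hi = conn.execute(f"SELECT MIN({split_col}), MAX({split_col}) FROM {table}").fetchone()
    if lo is None:  # empty table: nothing to import
        return []
    part_files = []
    for start, end in compute_splits(lo, hi, num_mappers):
        # In a real job, each range would be fetched by a separate, parallel mapper.
        rows = conn.execute(
            f"SELECT * FROM {table} WHERE {split_col} BETWEEN ? AND ? ORDER BY {split_col}",
            (start, end),
        ).fetchall()
        part_files.append([",".join(str(v) for v in row) for row in rows])
    return part_files


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stand-in for an external RDBMS
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)",
                     [(i, f"cust{i}") for i in range(1, 11)])
    for n, part in enumerate(sqoop_like_import(conn, "customers", "id")):
        print(f"part-m-{n:05d}:", part)  # Hadoop-style output file names
```

Splitting on a key keeps each mapper's query independent, so the import parallelizes without coordination; it works best when the key values are evenly distributed.
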
Familiar interface, but more powerful.
Cloudera Impala

Cloudera Impala
Interactive SQL for Hadoop
§ Responses in seconds
§ Nearly ANSI-92 standard SQL, compatible with Hive SQL
Native MPP Query Engine
§ Purpose-built for low-latency queries
§ Separate runtime from MapReduce
§ Designed as part of the Hadoop ecosystem
Open Source
§ Apache-licensed

Benefits of Impala
More & Faster Value from “Big Data”
§ Interactive BI/Analytics experience via SQL
§ No delays from data migration
Flexibility
§ Query across existing data
§ Select best-fit file formats (Parquet, Avro, etc.)
§ Run multiple frameworks on the same data at the same time
Cost Efficiency
§ Reduce movement, duplicate storage & compute
§ 10% to 1% of the cost of an analytic DBMS
Full Fidelity Analysis
§ No loss from aggregations or fixed schemas

Impala Use Cases
Cost-effective, ad hoc query environment that offloads the data warehouse for:
•  Interactive BI/analytics on more data
•  Asking new questions – exploration, ML
•  Data processing with tight SLAs
•  Query-able archive w/full fidelity

Our Design Strategy
One pool of (open) data
One metadata model
One security framework
One set of system resources

Diagram: An Integrated Part of the Hadoop System. Engines (Batch Processing: MapReduce, Hive & Pig, …; In-Memory Processing & Streaming: Spark; Interactive SQL: Cloudera Impala; Interactive Search: Cloudera Search; Machine Learning: Mahout, ClouderaML, Oryx; Math & Statistics: SAS, R) share Resource Management, Metadata, Integration, and Security layers over Storage (HDFS, HBase; TEXT, RCFILE, PARQUET, AVRO, etc. records).

Impala Key Features
Fast
§ In-memory data transfers
§ Partitioned joins
§ Fully distributed aggregations
Flexible
§ Query data in HDFS & HBase
§ Supports multiple file formats & compression algorithms
§ Java & Native UDFs, UDAFs
Secure
§ Integrated with Hadoop security
§ Kerberos authentication
§ Authorization (Sentry)
Easy to Implement
§ Leverages Hive’s ODBC/JDBC connectors, metastore & SQL syntax
§ Open source
Easy to Use
§ Interact with data via SQL
§ Certified with leading BI tools
Simple to Manage
§ Deploy, configure & monitor with Cloudera Manager
§ Integrated with Hadoop resource management

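Partitioned joins and distributed aggregations both rely on sending rows with the same key to the same node. Here is a minimal Python sketch of a partitioned (shuffle) hash join over three simulated nodes; the node count, table shapes, and data are assumptions, and it illustrates the general technique rather than Impala's C++ implementation.

```python
from collections import defaultdict

NUM_NODES = 3  # hypothetical cluster size


def partition(rows, key, num_nodes=NUM_NODES):
    """Route each row to the node chosen by hashing its join key."""
    parts = [[] for _ in range(num_nodes)]
    for row in rows:
        parts[hash(row[key]) % num_nodes].append(row)
    return parts


def local_hash_join(build_rows, probe_rows, key):
    """On one node: build a hash table from one input, then probe it with the other."""
    table = defaultdict(list)
    for b in build_rows:
        table[b[key]].append(b)
    return [{**b, **p} for p in probe_rows for b in table.get(p[key], [])]


def partitioned_join(build, probe, key, num_nodes=NUM_NODES):
    build_parts = partition(build, key, num_nodes)
    probe_parts = partition(probe, key, num_nodes)
    # Equal keys hash to the same node, so every node can join its slice independently.
    result = []
    for node in range(num_nodes):
        result.extend(local_hash_join(build_parts[node], probe_parts[node], key))
    return result


if __name__ == "__main__":
    customers = [{"cust_id": i, "name": f"c{i}"} for i in range(5)]
    orders = [{"order_id": 100 + j, "cust_id": j % 4} for j in range(8)]
    for row in partitioned_join(customers, orders, "cust_id"):
        print(row)
```

The same routing trick applies to a distributed GROUP BY: hash on the grouping key, aggregate locally on each node, and no node ever needs the whole table.
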
  
What’s Coming?*
In the Near Term:
•  SQL 2003-Compliant Analytic Window Functions
•  Additional Authentication Mechanisms
•  User Defined Table Functions
•  Intra-node Parallelized Aggregations & Joins
•  Nested Data
•  Enhanced YARN-Integrated Resource Manager
•  Dynamic Partition Pruning
*On the roadmap… no guarantees

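For readers new to analytic window functions, here is a minimal Python sketch of what ROW_NUMBER() and a running SUM() over a PARTITION BY ... ORDER BY window compute. The column names are hypothetical, the sketch assumes a ROWS-based running frame, and it says nothing about how Impala will implement the feature.

```python
from itertools import groupby
from operator import itemgetter


def row_number_and_running_sum(rows, partition_key, order_key, value_key):
    """Emulate, for hypothetical columns region/day/sales:

        SELECT region, day, sales,
               ROW_NUMBER() OVER (PARTITION BY region ORDER BY day),
               SUM(sales) OVER (PARTITION BY region ORDER BY day
                                ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
        FROM daily_sales;

    Unlike GROUP BY, every input row is kept and gets its own computed values.
    """
    out = []
    # Sort by partition, then by the ORDER BY key, so each partition is contiguous.
    ordered = sorted(rows, key=itemgetter(partition_key, order_key))
    for _, partition_rows in groupby(ordered, key=itemgetter(partition_key)):
        running = 0
        for rn, row in enumerate(partition_rows, start=1):
            running += row[value_key]
            out.append({**row, "row_number": rn, "running_sum": running})
    return out


if __name__ == "__main__":
    data = [
        {"region": "west", "day": 2, "sales": 5},
        {"region": "east", "day": 1, "sales": 10},
        {"region": "west", "day": 1, "sales": 7},
        {"region": "east", "day": 2, "sales": 3},
    ]
    for r in row_number_and_running_sum(data, "region", "day", "sales"):
        print(r)
```
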
  	
  
Impala Plays Well with Others
BI Partners: Building on the Enterprise Standard
POWERED BY IMPALA

Not All SQL On Hadoop Is Created Equal
•  Batch MapReduce: make MapReduce faster
    •  Slow, still batch
•  Remote Query: pull data from HDFS over the network to the DW compute layer
    •  Slow, expensive
•  Siloed DBMS: load data into a proprietary database file
    •  Rigid, siloed data, slow ETL
•  Impala: native MPP query engine that’s integrated into Hadoop
    •  Fast, flexible, cost-effective

More Detail On Alternative Approaches
Batch MapReduce
§ Batch-oriented
§ High latency
Remote Query
§ Network bottleneck
§ 2x the hardware
§ Duplicate metadata, security, SQL, etc.
Siloed DBMS
§ RDBMS rigidity
§ Query subset of data
§ Duplicate storage, metadata, security, SQL, etc.

Diagrams: side-by-side Hadoop and DBMS stacks, each with its own storage and compute; proprietary DBMS metadata and security contrasted with Hadoop’s standard & shared storage (HDFS, HBase), metadata, security, resource management, and engines (MapReduce, Hive, Pig, Impala, etc.).

Other Sexy New Big Data MPP Tools
Presto
Purpose-Built MPP Engine; Similar Architecture to Impala; Few Performance Comparisons, but Impala Anecdotally 5x-10x Faster

Shark
Hive-Compatible Data Warehouse for Spark; Great Performance until Required to Go to Disk, at Which Point Impala Is Better; With HDFS Caching, Impala Will Perform on Par from a Memory Perspective

Drill
Open Source Version of Dremel; Another MPP Engine; Multiple Data Formats and Sources

Phoenix – Sort Of
SQL Skin over HBase (and Only HBase); Subset of SQL Standard

What About an EDW/RDBMS?
“Right Tool for the Right Job”

EDW/RDBMS Great For:
•  Complex OLTP transactions
•  Highly planned and optimized known workloads
•  Operational reports and repeated known queries

Impala Great For:
•  Exploratory analytics with previously-unknown queries
•  Queries on big and growing data sets

EDW/RDBMS Can’t:
•  Dump in raw data, then later define a schema and query what you want
•  Evolve schemas without an expensive schema upgrade planning process
•  Scale simply by adding industry-standard servers
•  Store at < $1k/TB instead of $10-150k/TB

Impala Technical Details

The Impala Advantage
Where does the Performance Come From?
No MapReduce; No JVM; All Native
In-Memory Data Transfers
Saturate Disks on Reads
Optimized File Format (e.g. Parquet)
In-Memory HDFS Caching
Cost-Based Join Order Optimization – Frees User from Having to Guess the Correct Join Order

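To see why the join order matters, here is a toy cost-based join ordering in Python: it enumerates left-deep join orders and keeps the one with the smallest total intermediate-result size. The row counts, selectivities, and cost model are invented for illustration; Impala's planner works from table and column statistics and does not necessarily search this way.

```python
from itertools import permutations
from math import prod

# Hypothetical table statistics (row counts).
ROWS = {"sales": 1_000_000, "customers": 50_000, "stores": 100, "regions": 5}

# Hypothetical join predicates: pair of tables -> fraction of row pairs that match.
SELECTIVITY = {
    frozenset({"sales", "customers"}): 1 / 50_000,
    frozenset({"sales", "stores"}): 1 / 100,
    frozenset({"stores", "regions"}): 1 / 5,
}


def join_size(tables):
    """Estimated rows after joining `tables`; a pair with no predicate is a cross product."""
    size = prod(ROWS[t] for t in tables)
    for pair, sel in SELECTIVITY.items():
        if pair <= set(tables):
            size *= sel
    return size


def plan_cost(order):
    """Cost of a left-deep plan: total size of every intermediate result it produces."""
    return sum(join_size(order[:i]) for i in range(2, len(order) + 1))


def best_join_order(tables):
    """Try every order (fine for a handful of tables) and keep the cheapest."""
    return min(permutations(tables), key=plan_cost)


if __name__ == "__main__":
    as_written = ("sales", "customers", "stores", "regions")
    best = best_join_order(as_written)
    print("as written:", as_written, round(plan_cost(as_written)))
    print("cost-based:", best, round(plan_cost(best)))
```

With these made-up numbers, joining the two small tables first keeps the first intermediate result tiny, while the order as written starts from the million-row table; exhaustive search is only practical for a few tables, which is why real planners use heuristics or dynamic programming.
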
  
Impala and Hive
Share Everything Client-Facing
§ Metadata (table definitions)
§ ODBC/JDBC drivers
§ SQL syntax (Hive SQL)
§ Flexible file formats
§ Machine pool
§ Hue GUI
But Built for Different Purposes
§ Hive: runs on MapReduce; ideal for batch processing
§ Impala: native MPP query engine; ideal for interactive SQL

Diagram: Hive (SQL syntax + MapReduce compute framework, for batch processing) and Impala (SQL syntax + its own compute framework, for interactive SQL) share Storage (HDFS, HBase; TEXT, RCFILE, PARQUET, AVRO, etc. records), Integration, Resource Management, and Metadata.

Impala Query Execution
Diagram: a SQL App connects via ODBC to one of three impalad nodes; each impalad runs a Query Planner, Query Coordinator, and Query Executor alongside an HDFS DataNode (DN) and HBase, and the cluster also relies on the Hive Metastore, the HDFS NameNode (NN), and the Statestore.
1) Request arrives via ODBC/JDBC/HUE/Shell
2) Planner turns request into collections of plan fragments
3) Coordinator initiates execution on impalad(s) local to data
4) Intermediate results are streamed between impalad(s)
5) Query results are streamed back to client

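The steps above can be mimicked in Python for a simple GROUP BY query: each simulated impalad scans only its local data and pre-aggregates, and the coordinator merges the partial results as they finish. Node names and data are hypothetical, threads stand in for separate impalad processes, and merging everything at the coordinator simplifies real plans, where partial aggregates may also be exchanged between impalads (step 4) before results reach the client.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical data: the url column of the blocks stored on each impalad's local DataNode.
LOCAL_BLOCKS = {
    "impalad-1": ["/index.html", "/about", "/index.html"],
    "impalad-2": ["/demo", "/index.html"],
    "impalad-3": ["/about", "/about", "/demo"],
}


def scan_fragment(node):
    """Plan fragment on one executor: scan local data, pre-aggregate COUNT(*) per url."""
    return Counter(LOCAL_BLOCKS[node])


def coordinate(nodes):
    """Coordinator: launch a fragment on each node, merge partial counts as they finish."""
    final = Counter()
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        futures = [pool.submit(scan_fragment, node) for node in nodes]
        for fut in as_completed(futures):
            final.update(fut.result())  # merge step: add partial counts per url
    return dict(sorted(final.items()))


if __name__ == "__main__":
    # SELECT url, COUNT(*) FROM logs GROUP BY url
    print(coordinate(list(LOCAL_BLOCKS)))
    # → {'/about': 3, '/demo': 2, '/index.html': 3}
```

Pre-aggregating next to the data means only small partial results cross the network, rather than every scanned row.
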
  
Parquet File Format
Open source, columnar Hadoop file format developed by Cloudera & Twitter
Limits I/O to only the data that is needed
Supports storing each column in a separate file
Saves space: columnar layout compresses better
Enables better scans: load only the columns that are needed
Supports index pages for fast lookup
Extensible value encodings

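A conceptual Python sketch of why a columnar layout helps: pivoting rows into per-column lists lets a query read only the column it needs, and runs of repeated values within a column compress well even with a naive run-length encoding. This is not Parquet's on-disk format or its actual encodings, just the idea behind them; the sample rows are made up.

```python
from itertools import groupby

# Hypothetical sample rows.
rows = [
    {"day": "2014-08-13", "country": "US", "bytes": 512},
    {"day": "2014-08-13", "country": "US", "bytes": 2048},
    {"day": "2014-08-13", "country": "CA", "bytes": 128},
    {"day": "2014-08-14", "country": "CA", "bytes": 1024},
]


def to_columns(rows):
    """Pivot row records into one list of values per column."""
    return {col: [r[col] for r in rows] for col in rows[0]}


def rle(values):
    """Naive run-length encoding: [(value, run_length), ...]."""
    return [(v, len(list(g))) for v, g in groupby(values)]


def scan_column(columns, name):
    """A query touching one column reads just that column, not whole rows."""
    return columns[name]


if __name__ == "__main__":
    cols = to_columns(rows)
    print(sum(scan_column(cols, "bytes")))  # SELECT SUM(bytes) reads one column
    print(rle(cols["day"]))                 # 4 values stored as 2 runs
```
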
Impala Performance Results
•  Impala’s Milestone in Jan 2014:
    •  Speed comparable to commercial MPP DBMSs
    •  Natively on Hadoop
•  Three Result Sets:
    •  Impala vs Hive 0.12 (Impala 6-70x faster)
    •  Impala vs “DBMS-Y” (Impala on average 2x faster)
    •  Impala scalability (Impala achieves linear scale)
•  Background
    •  20 pre-selected, diverse TPC-DS queries (modified to remove unsupported language)
    •  Sufficient data scale for realistic comparison (3 TB, 15 TB, and 30 TB)
    •  Realistic nodes (e.g. 8-core CPU, 96GB RAM, 12x2TB disks)
    •  Methodical testing (multiple runs, reviewed fairness for competition, etc.)
•  Details: http://blog.cloudera.com/blog/2014/01/impala-performance-dbms-class-speed/

Enough slides… DEMO TIME!

So What is Cloudera Impala?

What’s Next?
•  Download Hadoop!
    •  CDH available at www.cloudera.com
•  Try it online: Cloudera Live
•  Cloudera provides pre-loaded VMs
    •  http://tiny.cloudera.com/quickstartvm
•  Ride Impala!
    •  http://impala.io/

SAN DIEGO BIG DATA
Special thanks:

Questions?
Preferably related to the talk… or not.

Thank You!
Maxime Dumas
mdumas@cloudera.com
We’re hiring.

call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️Delhi Call girls
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension AidPhilip Schwarz
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsAndolasoft Inc
 
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesAI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesVictorSzoltysek
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnAmarnathKambale
 
How to Choose the Right Laravel Development Partner in New York City_compress...
How to Choose the Right Laravel Development Partner in New York City_compress...How to Choose the Right Laravel Development Partner in New York City_compress...
How to Choose the Right Laravel Development Partner in New York City_compress...software pro Development
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Modelsaagamshah0812
 
8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech studentsHimanshiGarg82
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsAlberto González Trastoy
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplatePresentation.STUDIO
 
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...kalichargn70th171
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionSolGuruz
 
Define the academic and professional writing..pdf
Define the academic and professional writing..pdfDefine the academic and professional writing..pdf
Define the academic and professional writing..pdfPearlKirahMaeRagusta1
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...panagenda
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfkalichargn70th171
 

Último (20)

Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.js
 
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesAI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learn
 
How to Choose the Right Laravel Development Partner in New York City_compress...
How to Choose the Right Laravel Development Partner in New York City_compress...How to Choose the Right Laravel Development Partner in New York City_compress...
How to Choose the Right Laravel Development Partner in New York City_compress...
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with Precision
 
Define the academic and professional writing..pdf
Define the academic and professional writing..pdfDefine the academic and professional writing..pdf
Define the academic and professional writing..pdf
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 

Cloudera Impala - San Diego Big Data Meetup August 13th 2014

  • 1. Cloudera Impala
      SD Big Data Monthly Meetup #2
      August 13th 2014
      Maxime Dumas, Systems Engineer
  • 2. Thirty Seconds About Max
      • Systems Engineer
          • aka Sales Engineer
      • SoCal, AZ, NV
      • former coder of PHP
      • teaches meditation + yoga
      • from Montreal, Canada
  • 3. What Does Cloudera Do?
      • product
          • distribution of Hadoop components, Apache licensed
          • enterprise tooling
      • support
      • training
      • services (aka consulting)
      • community
  • 4. What This Talk Isn’t About
      • deploying
          • Puppet, Chef, Ansible, homegrown scripts, intern labor
      • sizing & tuning
          • depends heavily on data and workload
      • coding
          • unless you count XML or CSV or SQL
      • algorithms
  • 6. What is Cloudera Impala?
  • 7. cloud·e·ra im·pal·a /kloudˈi(ə)rə imˈpalə/
      noun
      a modern, open source, MPP SQL query engine for Apache Hadoop.
      “Cloudera Impala provides fast, ad hoc SQL query capability for Apache Hadoop, complementing traditional MapReduce batch processing.”
  • 8. The Apache Hadoop Ecosystem
      Quick and dirty, for context.
  • 9. Why “Ecosystem?”
      • In the beginning, just Hadoop
          • HDFS
          • MapReduce
      • Today, dozens of interrelated components
          • I/O
          • Processing
          • Specialty Applications
          • Configuration
          • Workflow
  • 10. HDFS
      • Distributed, highly fault-tolerant filesystem
      • Optimized for large streaming access to data
      • Based on Google File System
          • http://research.google.com/archive/gfs.html
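To make "distributed, highly fault-tolerant" concrete: HDFS splits each file into large fixed-size blocks and keeps several copies of every block on different machines, so losing one machine loses no data. The sketch below assumes a 128 MB block size and a replication factor of 3 (common HDFS defaults) and uses simple round-robin placement. Real HDFS placement is rack-aware and decided by the NameNode, so treat the `place_blocks` and `survives_failure` helpers as hypothetical illustrations, not the actual algorithm.

```python
from dataclasses import dataclass

BLOCK_SIZE = 128 * 1024 * 1024  # assumed block size in bytes (a common HDFS default)
REPLICATION = 3                  # assumed replication factor (a common HDFS default)


@dataclass
class Block:
    index: int
    offset: int          # byte offset of the block within the file
    length: int          # bytes in this block (the last block may be shorter)
    replicas: list[str]  # datanodes holding a copy of this block


def place_blocks(file_size: int, datanodes: list[str],
                 block_size: int = BLOCK_SIZE,
                 replication: int = REPLICATION) -> list[Block]:
    """Split a file into fixed-size blocks and give each block `replication`
    distinct datanodes. Round-robin placement, not HDFS's rack-aware policy."""
    if replication > len(datanodes):
        raise ValueError("need at least as many datanodes as replicas")
    blocks: list[Block] = []
    offset = 0
    while offset < file_size:
        index = len(blocks)
        length = min(block_size, file_size - offset)
        # Consecutive nodes from a rotating start position are always distinct.
        replicas = [datanodes[(index + r) % len(datanodes)] for r in range(replication)]
        blocks.append(Block(index, offset, length, replicas))
        offset += length
    return blocks


def survives_failure(blocks: list[Block], failed_node: str) -> bool:
    """True if every block still has a replica on some node other than failed_node."""
    return all(any(node != failed_node for node in b.replicas) for b in blocks)
```

For example, a file of 1 GiB plus one byte becomes nine blocks (eight full blocks and a one-byte tail), and losing any single datanode still leaves two copies of every block.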
  • 11. Lots of Commodity Machines
      [Image: Yahoo! Hadoop cluster, OSCON ’07]
  • 12. MapReduce (MR)
      • Programming paradigm
      • Batch oriented, not real-time
      • Works well with distributed computing
      • Lots of Java, but other languages supported
      • Based on Google’s paper
          • http://research.google.com/archive/mapreduce.html
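The paradigm is easiest to see in the classic word-count job. This single-process Python sketch imitates the three phases: map emits key/value pairs, the shuffle groups values by key, and reduce folds each group into a result. It is not Hadoop's Java API, and the function names are illustrative. In a real cluster, each phase runs in parallel across many machines.

```python
from collections import defaultdict
from collections.abc import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    # Emit (word, 1) for every word in one input line.
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Group all values by key, as the framework does between map and reduce.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Fold the grouped values for one key into a single result.
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

Because map only looks at one line and reduce only at one key's values, the framework is free to spread both phases over as many machines as the data needs. This independence is why the model "works well with distributed computing".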
  • 13. Apache Hive
      • Abstraction of Hadoop’s Java API
      • HiveQL “compiles” down to MR
          • a “SQL-like” language
      • Eases analysis using MapReduce
  • 14. Apache Hive Metastore
      • Maps HDFS files to DB-like resources
          • Databases
          • Tables
          • Column/field names, data types
          • Roles/users
          • InputFormat/OutputFormat
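The metastore's job is bookkeeping. It records which HDFS directory backs a table, what the columns and their types are, and how to read the files. Below is a minimal in-memory sketch of that idea, assuming a comma-delimited text table. The `TableMetadata` and `Metastore` classes, their fields, and the example path are hypothetical illustrations, not the metastore's actual schema or API. The schema is applied only when a row is read (schema-on-read).

```python
from dataclasses import dataclass, field

# Hypothetical converters for a few Hive-style column types.
CONVERTERS = {"string": str, "int": int, "double": float}


@dataclass
class TableMetadata:
    database: str
    name: str
    location: str                   # HDFS directory holding the table's files
    columns: list[tuple[str, str]]  # (column name, type name)
    delimiter: str = ","            # stands in for the InputFormat/SerDe


@dataclass
class Metastore:
    tables: dict[str, TableMetadata] = field(default_factory=dict)

    def register(self, table: TableMetadata) -> None:
        self.tables[f"{table.database}.{table.name}"] = table

    def lookup(self, qualified_name: str) -> TableMetadata:
        return self.tables[qualified_name]


def read_row(table: TableMetadata, raw_line: str) -> dict[str, object]:
    """Apply the table's schema to one raw text line (schema-on-read).
    This sketch raises on malformed lines; real engines typically yield NULLs instead."""
    fields = raw_line.rstrip("\n").split(table.delimiter)
    if len(fields) != len(table.columns):
        raise ValueError(f"expected {len(table.columns)} fields, got {len(fields)}")
    return {name: CONVERTERS[type_name](value)
            for (name, type_name), value in zip(table.columns, fields)}
```

Usage: register `TableMetadata("sales", "orders", "/user/hive/warehouse/sales.db/orders", [("order_id", "int"), ("customer", "string"), ("amount", "double")])`, look it up as `"sales.orders"`, and `read_row` turns the raw line `"42,acme,19.99"` into typed values. Hive and Impala both consult the same metastore, so a table defined once is visible to both.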
  • 15. Sqoop
      • SQL to Hadoop
      • Tool to import data from any JDBC-supported database into Hadoop, and export it back
      • Transfer data between Hadoop and external databases or EDW
      • High performance connectors for some RDBMS
          • Oracle, Teradata, Netezza
      • Developed at Cloudera
      ©2011 Cloudera, Inc. All Rights Reserved.
  • 17. Cloudera Impala
      Familiar interface, but more powerful.
  • 18. Cloudera Impala
      Interactive SQL for Hadoop
          • Responses in seconds
          • Nearly ANSI-92 standard SQL with Hive SQL
      Native MPP Query Engine
          • Purpose-built for low-latency queries
          • Separate runtime from MapReduce
          • Designed as part of the Hadoop ecosystem
      Open Source
          • Apache-licensed
  • 19. Benefits of Impala
      More & Faster Value from “Big Data”
          • Interactive BI/analytics experience via SQL
          • No delays from data migration
      Flexibility
          • Query across existing data
          • Select best-fit file formats (Parquet, Avro, etc.)
          • Run multiple frameworks on the same data at the same time
      Cost Efficiency
          • Reduce movement, duplicate storage & compute
          • 1% to 10% of the cost of an analytic DBMS
      Full Fidelity Analysis
          • No loss from aggregations or fixed schemas
  • 20. Impala Use Cases
      Cost-effective, ad hoc query environment that offloads the data warehouse for:
          • Interactive BI/analytics on more data
          • Asking new questions: exploration, ML
          • Data processing with tight SLAs
          • Queryable archive with full fidelity
  • 21. Our Design Strategy: An Integrated Part of the Hadoop System
      • One pool of (open) data
      • One metadata model
      • One security framework
      • One set of system resources
      [Diagram: engines (Batch Processing: MapReduce, Hive & Pig; Interactive SQL: Cloudera Impala; Interactive Search: Cloudera Search; Machine Learning: Mahout, ClouderaML, Oryx; Math & Statistics: SAS, R; In-Memory Processing & Streaming: Spark) on shared Resource Management, Metadata, Security, and Storage Integration layers over HDFS and HBase storage (Text, RCFile, Parquet, Avro, etc.).]
  • 22. Impala Key Features
      Fast
          • In-memory data transfers
          • Partitioned joins
          • Fully distributed aggregations
      Flexible
          • Query data in HDFS & HBase
          • Supports multiple file formats & compression algorithms
          • Java & Native UDFs, UDAFs
      Secure
          • Integrated with Hadoop security
          • Kerberos authentication
          • Authorization (Sentry)
      Easy to Implement
          • Leverages Hive’s ODBC/JDBC connectors, metastore & SQL syntax
          • Open source
      Easy to Use
          • Interact with data via SQL
          • Certified with leading BI tools
      Simple to Manage
          • Deploy, configure & monitor with Cloudera Manager
          • Integrated with Hadoop resource management
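"Partitioned joins" means both sides of a join are redistributed by a hash of the join key, so matching rows land on the same node and each node joins only its own slice. The single-process sketch below simulates N nodes with lists. The function names and the use of Python's built-in `hash` for partitioning are assumptions for illustration. This is not Impala's C++ execution engine, which streams rows between impalad processes.

```python
from collections import defaultdict
from typing import Any

Row = dict[str, Any]


def hash_partition(rows: list[Row], key: str, num_nodes: int) -> list[list[Row]]:
    # Send each row to the node chosen by hashing its join key.
    parts: list[list[Row]] = [[] for _ in range(num_nodes)]
    for row in rows:
        parts[hash(row[key]) % num_nodes].append(row)
    return parts


def local_hash_join(build: list[Row], probe: list[Row], key: str) -> list[Row]:
    # Build a hash table on one side (ideally the smaller), then probe it with the other.
    table: dict[Any, list[Row]] = defaultdict(list)
    for row in build:
        table[row[key]].append(row)
    return [{**b, **p} for p in probe for b in table.get(p[key], [])]


def partitioned_join(build: list[Row], probe: list[Row], key: str,
                     num_nodes: int = 3) -> list[Row]:
    build_parts = hash_partition(build, key, num_nodes)
    probe_parts = hash_partition(probe, key, num_nodes)
    result: list[Row] = []
    for node in range(num_nodes):  # each iteration stands in for one node working in parallel
        result.extend(local_hash_join(build_parts[node], probe_parts[node], key))
    return result
```

Because equal keys always hash to the same node, the union of the per-node joins equals a single-machine join. However, no node ever needs to hold the whole of either table.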
  • 23. What’s Coming?*
      In the Near Term:
          • SQL 2003-Compliant Analytic Window Functions
          • Additional Authentication Mechanisms
          • User Defined Table Functions
          • Intra-node Parallelized Aggregations & Joins
          • Nested Data
          • Enhanced YARN-Integrated Resource Manager
          • Dynamic Partition Pruning
      *On the roadmap… no guarantees
  • 24. Impala Plays Well with Others
      BI Partners: Building on the Enterprise Standard
      POWERED BY IMPALA
  • 25. Not All SQL On Hadoop Is Created Equal
      • Batch MapReduce: make MapReduce faster. Slow, still batch.
      • Remote Query: pull data from HDFS over the network to the DW compute layer. Slow, expensive.
      • Siloed DBMS: load data into a proprietary database file. Rigid, siloed data, slow ETL.
      • Impala: native MPP query engine that’s integrated into Hadoop. Fast, flexible, cost-effective.
  • 26. More Detail On Alternative Approaches
      • Batch MapReduce: batch-oriented; high latency
      • Remote Query: network bottleneck; 2x the hardware; duplicate metadata, security, SQL, etc.
      • Siloed DBMS: RDBMS rigidity; query subset of data; duplicate storage, metadata, security, SQL, etc.
      [Diagram: remote query and siloed DBMS keep separate, proprietary DBMS storage, compute, metadata, and security beside Hadoop; the Hadoop approach shares standard storage (HDFS, HBase), integration, resource management, metadata, and security across engines (MapReduce, Hive, Pig, Impala, etc.) for batch processing, interactive SQL, and machine learning.]
  • 27. Other Sexy New Big Data MPP Tools
      • Presto: Purpose-Built MPP Engine; Similar Architecture to Impala; Few Performance Comparisons, but Impala Anecdotally 5x-10x Faster
      • Shark: Hive-Compatible Data Warehouse for Spark; Great Performance until Required to Go to Disk, at Which Point Impala Is Better; With HDFS Caching, Impala Will Perform on Par from a Memory Perspective
      • Drill: Open Source Version of Dremel; Another MPP Engine; Multiple Data Formats and Sources
      • Phoenix (Sort Of): SQL Skin over HBase (and Only HBase); Subset of SQL Standard
  • 28. What About an EDW/RDBMS?
      “Right Tool for the Right Job”
      EDW/RDBMS Great For:
          • OLTP’s complex transactions
          • Highly planned and optimized known workloads
          • Operational reports and repeated known queries
      Impala Great For:
          • Exploratory analytics with previously-unknown queries
          • Queries on big and growing data sets
      EDW/RDBMS Can’t:
          • Dump in raw data then later define schema and query what you want
          • Evolve schemas without an expensive schema upgrade planning process
          • Simply scale just by adding industry-standard servers
          • Store at < $1k/TB instead of $10-150k/TB
  • 30. The Impala Advantage
      Where does the Performance Come From?
          • No MapReduce; No JVM; All Native
          • In-Memory Data Transfers
          • Saturate Disks on Reads
          • Optimized File Format (e.g. Parquet)
          • In-Memory HDFS Caching
          • Cost-Based Join Order Optimization: Frees User from Having to Guess the Correct Join Order
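The point of cost-based join ordering is that the planner, not the user, picks the order from table statistics. As an illustration only, here is a simple greedy heuristic over estimated row counts. The cardinality estimate |L|·|R|·selectivity (with independent predicates multiplied) is the textbook formula. The start-from-the-largest-table rule, the function names, and the example statistics are assumptions. This is not Impala's actual planner logic or cost model.

```python
import math


def estimate_join_rows(left_rows: float, right_rows: float, selectivity: float) -> float:
    # Textbook cardinality estimate: |L join R| ~= |L| * |R| * selectivity.
    return left_rows * right_rows * selectivity


def greedy_join_order(row_counts: dict[str, float],
                      selectivity: dict[frozenset, float]) -> list[str]:
    """Hypothetical greedy heuristic: start from the largest table, then keep
    adding the table whose join yields the smallest estimated intermediate
    result. Tables with no join predicate to the tables already chosen
    (cross joins) are only picked once nothing else is left."""
    remaining = set(row_counts)
    first = max(sorted(remaining), key=lambda t: row_counts[t])
    order = [first]
    remaining.remove(first)
    current_rows = row_counts[first]
    while remaining:
        candidates = []
        for t in sorted(remaining):  # sorted for deterministic tie-breaking
            sels = [selectivity[frozenset((t, j))] for j in order
                    if frozenset((t, j)) in selectivity]
            sel = math.prod(sels) if sels else 1.0  # 1.0 means a cross join
            est = estimate_join_rows(current_rows, row_counts[t], sel)
            candidates.append((not sels, est, t))  # predicate-connected tables sort first
        _, current_rows, chosen = min(candidates)
        order.append(chosen)
        remaining.remove(chosen)
    return order
```

With a 1,000,000-row table `a`, a 1,000-row table `b` (selectivity 0.001 with `a`), a 10-row table `c` (selectivity 0.01 with `a`), and an unconnected 5-row table `d`, the heuristic joins `c` before `b`. This produces an intermediate result of about 100,000 rows instead of about 1,000,000, and it leaves the cross join with `d` for last. This is the kind of choice the slide says users no longer have to guess.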
  • 31. Impala and Hive
      Shares Everything Client-Facing
          • Metadata (table definitions)
          • ODBC/JDBC drivers
          • SQL syntax (Hive SQL)
          • Flexible file formats
          • Machine pool
          • Hue GUI
      But Built for Different Purposes
          • Hive: runs on MapReduce and is ideal for batch processing
          • Impala: native MPP query engine, ideal for interactive SQL
      [Diagram: Hive (SQL syntax + MapReduce compute framework) for batch processing and Impala (SQL syntax + its own compute framework) for interactive SQL, both on shared storage (HDFS, HBase; Text, RCFile, Parquet, Avro, etc.), integration, resource management, and metadata.]
  • 32. Impala Query Execution
      [Diagram: three nodes, each running Query Planner, Query Coordinator, and Query Executor co-located with an HDFS DataNode and HBase; a SQL app connects over ODBC; shared services are the Hive Metastore, HDFS NameNode, and Statestore.]
      1) Request arrives via ODBC/JDBC/Hue/Shell
  • 33. Impala Query Execution (same diagram)
      2) Planner turns request into collections of plan fragments
      3) Coordinator initiates execution on impalad(s) local to data
  • 34. Impala Query Execution (same diagram)
      4) Intermediate results are streamed between impalad(s)
      5) Query results are streamed back to client
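Steps 3 to 5 can be imitated for a simple `SELECT key, COUNT(*), SUM(x) ... GROUP BY key` query. Each executor aggregates only the rows stored on its node and streams its partial results. The coordinator then merges the partials and streams the final rows to the client. This is a single-process sketch in which Python generators stand in for network streams. Two-phase (partial, then merge) aggregation is a standard MPP technique, but the names here are hypothetical. Merging everything at the coordinator is a simplification, since an MPP engine can also spread the merge step across nodes.

```python
from collections import defaultdict
from collections.abc import Iterable, Iterator

Partial = tuple[str, int, float]  # (group key, row count, sum of value)


def executor(local_rows: Iterable[tuple[str, float]]) -> Iterator[Partial]:
    """Phase 1 on one node: pre-aggregate only the rows stored locally,
    then stream one partial result per group instead of every raw row."""
    counts: dict[str, int] = defaultdict(int)
    sums: dict[str, float] = defaultdict(float)
    for key, value in local_rows:
        counts[key] += 1
        sums[key] += value
    for key in counts:
        yield key, counts[key], sums[key]


def coordinator(streams: Iterable[Iterator[Partial]]) -> Iterator[Partial]:
    """Phase 2: merge the partials from every executor. A group's final row can
    only be emitted after all partials have arrived, then rows stream to the client."""
    counts: dict[str, int] = defaultdict(int)
    sums: dict[str, float] = defaultdict(float)
    for stream in streams:
        for key, count, total in stream:
            counts[key] += count
            sums[key] += total
    for key in sorted(counts):
        yield key, counts[key], sums[key]
```

Usage: with three nodes holding `[("ca", 10.0), ("az", 5.0)]`, `[("ca", 2.5)]`, and `[("nv", 1.0), ("az", 4.0)]`, `list(coordinator(executor(rows) for rows in node_data))` merges the per-node partials into one row per state. Only one small partial per group crosses the network from each node, which is what keeps distributed aggregation cheap.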
  • 35. Parquet File Format
      Open source, columnar Hadoop file format developed by Cloudera & Twitter
          • Limits the IO to only the data that is needed
          • Supports storing each column in a separate file
          • Saves space: columnar layout compresses better
          • Enables better scans: load only the columns that are needed
          • Supports index pages for fast lookup
          • Extensible value encodings
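Here is a toy comparison of row and column layouts, assuming a small in-memory table. In the columnar version, a query that needs one column touches only that column's values. Because similar values are stored next to each other, simple encodings such as run-length encoding become effective. This illustrates the idea behind the slide's claims. It is not Parquet's actual on-disk format, which adds row groups, pages, file metadata, and several encodings. The helper names are hypothetical.

```python
from itertools import groupby
from typing import Any


def to_columns(rows: list[dict[str, Any]]) -> dict[str, list[Any]]:
    # Pivot a row-oriented table into one list of values per column.
    if not rows:
        return {}
    columns: dict[str, list[Any]] = {name: [] for name in rows[0]}
    for row in rows:
        for name, value in row.items():
            columns[name].append(value)
    return columns


def scan_cost(rows: list[dict[str, Any]], columns: dict[str, list[Any]],
              needed: list[str]) -> tuple[int, int]:
    # Values touched by each layout: a row store reads every field of every row,
    # while a column store reads only the requested columns.
    row_layout = sum(len(row) for row in rows)
    column_layout = sum(len(columns[name]) for name in needed)
    return row_layout, column_layout


def run_length_encode(values: list[Any]) -> list[tuple[Any, int]]:
    # (value, run length) pairs; compact when equal values sit next to each other.
    return [(value, len(list(group))) for value, group in groupby(values)]
```

For ten rows of `id`, `state`, and `amount`, a query over `amount` touches 10 values in the columnar layout versus 30 in the row layout. A `state` column holding six `"CA"` followed by four `"AZ"` run-length encodes to just two pairs.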
  • 37. Impala Performance Results
      • Impala’s Milestone in Jan 2014:
          • Comparable commercial MPP DBMS speed
          • Natively on Hadoop
      • Three Result Sets:
          • Impala vs Hive 0.12 (Impala 6-70x faster)
          • Impala vs “DBMS-Y” (Impala 2x faster on average)
          • Impala scalability (Impala achieves linear scale)
      • Background
          • 20 pre-selected, diverse TPC-DS queries (modified to remove unsupported language)
          • Sufficient data scale for realistic comparison (3 TB, 15 TB, and 30 TB)
          • Realistic nodes (e.g. 8-core CPU, 96GB RAM, 12x2TB disks)
          • Methodical testing (multiple runs, reviewed fairness for competition, etc.)
      • Details: http://blog.cloudera.com/blog/2014/01/impala-performance-dbms-class-speed/
  • 38. Enough slides… DEMO TIME!
  • 39. So What is Cloudera Impala?
  • 40. What’s Next?
      • Download Hadoop!
          • CDH available at www.cloudera.com
          • Try it online: Cloudera Live
          • Cloudera provides pre-loaded VMs: http://tiny.cloudera.com/quickstartvm
      • Ride Impala!
          • http://impala.io/
  • 41. Special thanks: San Diego Big Data
  • 42. Questions?
      Preferably related to the talk… or not.
  • 43. Thank You!
      Maxime Dumas
      mdumas@cloudera.com
      We’re hiring.