1
HBase Applications - Selected Use-Cases around a Common Theme
Atlanta HUG - May 2014
Lars George, Cloudera
EMEA Chief Architect
2
About Me
• EMEA Chief Architect @ Cloudera
• Consulting on Hadoop projects (everywhere)
• Apache Committer
  • HBase and Whirr
• O'Reilly Author
  • HBase - The Definitive Guide
  • Now in Japanese! (日本語版も出ました!)
• Contact
  • lars@cloudera.com
  • @larsgeorge
3
The Content...
• HBase - Strengths and weaknesses
• Common use-cases and patterns
• Focus on a specific type of application
• Summary
4 CONFIDENTIAL - RESTRICTED
HBase
Strengths and Weaknesses
5
IOPS vs Throughput Mythbusters
It is all physics in the end: you cannot solve an I/O problem without reducing I/O in general. Parallelize access and read/write sequentially.
6
HBase: Strengths & Weaknesses
Strengths:
• Random access to small(ish) key-value pairs
• Rows and columns stored sorted lexicographically
• Adds table and region concepts to group related KVs
• Stores and reads data sequentially
• Parallelizes across all clients
• Non-blocking I/O throughout
7
HBase: Strengths & Weaknesses
Weaknesses:
• Not optimized (yet) for 100% of the possible throughput of the underlying storage layer
  • And HDFS is not fully optimized either
• Single-writer issue with WALs
• Single-server hot-spotting with non-distributed keys
8
Patterns
• There are common patterns in many common use-cases, like programming patterns.
• We need to extract these common patterns and make them repeatable.
• Similar to the "Gang of Four" (Gamma, Helm, Johnson, Vlissides), or the "Three Amigos" (Booch, Jacobson, Rumbaugh)
9
Common Patterns
10
HBase Dilemma
Although HBase can host many applications, they may require completely opposite features:
Events vs. Entities
Time Series vs. Message Store
11
This talk (at this event)
• Message Store
  • Information exchange between entities
  • Sending/receiving information is an event
• Time-Series
  • Sequence of data points measured at successive points in time, spaced at uniform intervals
  • Measuring a data point is an event
12
Using HBase Strengths
13
HBase "Indexes" (cont.)
• Use primary keys, aka the row keys, as sorted index
  • One sort direction only
  • Use a "secondary index" to get reverse sorting
    • Lookup table or same table
• Use secondary keys, aka the column qualifiers, as sorted index within the main record
  • Use prefixes within a column family, or separate column families
  
14
Common Use-Cases
15
Use-Case I: Messages
16
HBase Message Store
Use-Case:
• Store incoming messages in HBase, such as Email, SMS, MMS, IM
• Constant updates of existing entities
  • e.g. Email read, flagged, starred, moved, deleted
• Reading of top-N entries, sorted by time
  • Newest 20 messages, last 20 conversations
• Examples:
  • Facebook Messages
17
Problem Description
• Records are of varying size
  • Large ones hinder smaller ones
• Massive index issue
  • User can sort and filter by everything
  • At the same time, reading top-N should be fast
  • But what to do for automated accounts? 80/20 rule?
    • Only doable with heuristics
  • Only create minimal indexes
    • Create additional ones when the user asks for them
• Cross-mailbox issues with Conversations
  • Similar to the timeline in Facebook
• Overall requirements for I/O
18
Interlude I: Compaction Details
Write Amplification in HBase
19
Compactions in HBase
• Must happen to keep data in check
  • Combine small flush files into larger ones
  • Remove old data (during major compactions)
• Two types: minor and major compactions
  • Minor are triggered with API mutation calls
  • Major are time-scheduled (or auto-promoted)
  • Both can be triggered manually if needed
• Add extra background I/O that grows over time
  • Write amplification!
• Have to be tuned for heavy-write systems
20-37
Writes: Flushes and Compactions
[Animated chart sequence: store file sizes (MB, up to 1000) over time, older to newer. Memstore flushes create small files (HF1, HF2, HF3, ...); once three files accumulate, a compaction (major, auto-promoted) merges them into a larger file (CF1); the cycle repeats (HF4-HF6 into CF2, and so on). For selection, older files are eliminated in favor of newer ones until the candidates are within the ratio (120%). Fast forward: the region ends up with a few large compacted files plus the latest flushes.]
Settings shown along the way:
hbase.hregion.memstore.flush.size = 128MB
hbase.hstore.compaction.min = 3
hbase.hstore.compactionThreshold = 3 (0.90)
hbase.hstore.compaction.max = 10
hbase.hstore.compaction.ratio = 1.2
hbase.hstore.compaction.min.size = flush size
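The selection rule in the charts ("eliminate older to newer files, until in ratio") can be sketched roughly like this. It is an illustrative simplification of HBase's ratio-based compaction policy, not the actual implementation:

```python
def select_for_compaction(sizes_oldest_first, ratio=1.2, min_files=3, max_files=10):
    """Walk from oldest to newest, dropping any file larger than
    `ratio` times the combined size of all files newer than it;
    the remaining run of files is compacted together.
    Mirrors hbase.hstore.compaction.{ratio,min,max} in spirit."""
    files = list(sizes_oldest_first)
    start = 0
    while start < len(files):
        newer_sum = sum(files[start + 1:])
        if files[start] <= ratio * newer_sum:
            break
        start += 1  # this older file is too big relative to the newer ones
    selected = files[start:start + max_files]
    return selected if len(selected) >= min_files else []

# A 700 MB file dwarfs the newer flushes, so it is skipped; the three
# small files stay within the 120% ratio and are compacted together.
assert select_for_compaction([700, 50, 40, 30]) == [50, 40, 30]
```

This is why a single huge store file stops being rewritten over and over - exactly the write-amplification control the ratio is for.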
38
Additional Notes #1
There are a few more settings for compactions:
• hbase.hstore.compaction.max = 10
  Limit on the maximum number of files per compaction
• hbase.hstore.compaction.max.size = Long.MAX_VALUE
  Exclude files larger than that setting (0.92+)
• hbase.hregion.majorcompaction = 1d
  Scheduled major compactions
39
Additional Notes #2
• hbase.hstore.compaction.kv.max = 10
  Limits internal scanner caching during the read of files to be compacted
• hbase.hstore.blockingStoreFiles = 7
  Enforces an upper limit of files before compactions must catch up - blocks user operations!
• hbase.hstore.blockingWaitTime = 90s
  Upper limit on blocking user operations
40
Write Fragmentation
Yo, where's the data at?
41-48
Writes: Flushes and Compactions
[Animated chart sequence: the same flush/compaction timeline, now tracking two specific rows - one is never changed ("Unique Row Inserts"), the other changes frequently ("Existing Row Mutations"). After several flushes and compactions, the frequently mutated row's cells are spread across many store files, while the unchanged row stays in a single file.]
49
Source: http://www.ngdata.com/visualizing-hbase-flushes-and-compactions/
50
Compaction Summary
• Compaction tuning is important
  • Do not be too aggressive, or write amplification is noticeable under load
• Use timestamps/time-ranges in Get/Scan to limit files

Ratio  Effect
1.0    Dampened; causes more store files, needs to be combined with effective Bloom filter usage (non-random)
1.2    Default value, moderate setting
1.4    More aggressive; keeps the number of files low, causes more auto-promoted major compactions to occur
51
Interlude II: Bloom Filters
Call me maybe, baby?
52
Background on Bloom Filters
53
Background on Bloom Filters
• Bit arrays of m bits, and k hash functions
• HBase uses hash folding
• Returns "no" or "maybe" only
• Error rate tunable, usually about 1%
• At a 1% error rate, the optimal k needs about 9.6 bits per key
(Diagram example: m=18, k=3)
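The numbers on this slide follow from the standard Bloom filter formulas:

```python
import math

def bloom_parameters(error_rate):
    """Standard Bloom filter sizing: bits per key m/n = -ln(p) / (ln 2)^2,
    and the optimal number of hash functions k = (m/n) * ln 2."""
    bits_per_key = -math.log(error_rate) / (math.log(2) ** 2)
    k = bits_per_key * math.log(2)
    return bits_per_key, k

bits, k = bloom_parameters(0.01)
assert round(bits, 1) == 9.6   # matches the ~9.6 bits per key on the slide
assert round(k) == 7           # about 7 hash functions at a 1% error rate
```

So halving the error rate costs only about 1.4 extra bits per key - the filters grow slowly but, as shown later, the totals still matter once multiplied across store files.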
54
Seeking with Bloom Filters
55
Read Time Series Entry
• Event record is written once and never deleted or updated
  • Keeps the entire record in a specific location in the storage files
• Use a time range to indicate what is needed
  • {Get|Scan}.setTimeRange()
  • Helps the system skip unnecessary (older) files
• Bloom Filter helps for given row key(s) and column qualifiers
  • Can skip files not containing the requested details
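The file skipping described above can be sketched as follows; the store-file names and timestamp ranges are made up for illustration:

```python
def files_to_read(store_files, t_start, t_end):
    """Sketch of store-file pruning by time range: each HFile records the
    min/max timestamps of its cells, so a Get/Scan with setTimeRange()
    can skip any file whose interval does not overlap the query range."""
    return [name for name, (lo, hi) in store_files
            if lo <= t_end and hi >= t_start]

files = [("HF1", (0, 100)), ("HF2", (100, 200)), ("HF3", (200, 300))]
# A query for [210, 250] only touches the newest file.
assert files_to_read(files, 210, 250) == ["HF3"]
```

For write-once time-series data this works beautifully; the next slides show why it degrades for frequently updated entities.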
  
56
Writes: Flushes and Compactions
[Chart: same flush/compaction timeline with the two tracked rows; a single block read (64K) suffices, since the Bloom filter and/or time range eliminates all other store files.]
57
Read Updateable Entity
• Data is updated regularly, aging out at intervals
• Reading the entity needs to read all details to reconstitute the current state
  • Deletes mask out attributes
  • Updates override (or complement) attributes
• Bloom filters will have a hard time saying "no", since most files might contain entity attributes
• A time filter on scans or gets also has few options to skip files, since older attributes might still be important
58
Writes: Flushes and Compactions
[Chart: same timeline; the Bloom filter returns "yes" for all but two files, so 7+ block loads (64KB) are needed.]
59
Bloom Filter Options
There are three choices:
• NONE
  Duh! Use this when the Bloom Filter is not useful based on the use-case (default setting)
• ROW
  Index only the row key; needs an entry per row key in the Bloom Filter
• ROWCOL
  Index row and column key; requires an entry in the filter for every column cell (KeyValue)
60
How to decide?
61
Bloom Filter Summary
• They help a lot - but not always
  • Highly depends on write patterns
• Keep an eye on size, since they are cached
  • HFile v2 helps here, as it only loads the root index info

"Bloom filters can get as large as 100 MB per HFile, which adds up to 2 GB when aggregated over 20 regions. Block indexes can grow as large as 6 GB in aggregate size over the same set of regions."
Source: http://hbase.apache.org/book/hfilev2.html
62
Interlude III: Write-ahead Log
The lonesome writer tale.
63
Write-ahead Log - Data Flow
64
Write-ahead Log - Overview
• One file per Region Server
  • All regions have a reference to this file
  • Actually a wrapper around the physical file
• The file is, in the end, a Hadoop SequenceFile
  • Stored in HDFS so it can be recovered after a server failure
• There is a synchronization barrier that impacts all parallel writers, aka clients
• Overall performance is BAD, maybe 10MB/s
65
Write-ahead Log - Workarounds
• Enable log compression
  hbase.regionserver.wal.enablecompression
• Disable the WAL for secondary records
  • Restore indexes or derived records from the main one
  • But be careful using a coprocessor hook, as it cannot access the currently replaying region
• Work on upstream JIRAs
  • Multiple logs per server
  • Fix the single-writer issue in HDFS
66
Back to the main theme...
Yes, message stores.
67
Schema
• Every row is an inbox
• Indexes as CFs or separate tables
• Random updates and inserts cause storage file churn
• Facebook used more than 4 or 5 schema iterations
  • Not really representative: pure blob storage
  • Evolved over time to be more HBase-like
• Another customer iterated over various schemas at about the same time
  • Difficult to keep indexes up to date
68
Facebook Messages
An interesting use-case…
69
Facebook Messages - Statistics
Source: HBaseCon 2012 - Anshuman Singh
70	
  
71	
  
72	
  
Schema 1
73
Notes on Facebook Schema 1
This is basically the same as the NameNode, i.e. the application only writes edits, and those are merged with a snapshot of the data.

The application does not use HBase as an operational store; all data is cached in memory.

It occasionally writes large chunks, and reads only a few times to merge or recover.
74
Notes on Facebook Schema 1
Three column families:
• Snapshot, Actions, Keywords
Settings changes:
• DFS Block Size: 256MB
  • Since large KVs are written
  • Efficiency of the HFile block index is a concern
• Compaction ratio: 1.4
  • Be more aggressive to clean up files
• Split Size: 2TB
  • Manage splitting manually
• Major Compactions: 3 days
75	
  
Schema 2
76
Notes on Facebook Schema 2
• Eight column families
• Snapshots per thread (user to user)
Settings changes:
• Block Cache Size: 55%
  • Cache more data on the HBase side
• Blocking Store Files: 25
  • Allow more files to be around
• Compaction Min Size: 4MB
  • Reduce the number of unconditionally selected files
• Major Compactions: 14 days
  
Schema 3
78
Notes on Facebook Schema 3
• Eleven column families
• Twenty regions per server
• One hundred servers per cluster
Settings changes:
• Block Cache Size: 60%
  • Cache more data on the HBase side
• Region Slop: 5% (from 20%)
  • Keep strict boundaries on regions per server
  
80
Note the imbalance! Recall that flushes are interconnected and cause compaction storms.
81
FB Messages Summary
• Triggered many changes in HBase:
  • Changed compaction selection algorithm
  • Upper bounds on file sizes
  • Pools for small and large compactions
  • Online schema changes
  • Finer-grained metrics
  • Lazy seeking in files
  • Point-seek optimizations
  • …
82
FB Messages Summary
• Went from "Snapshot" to a more proper schema
  • Needed to wait for the schema to settle
  • Could sustain warped load for a while
  • Eventually uses HBase more as a KV store
• Tweaked settings depending on schema
  • Tuned compactions from aggressive to relaxed
  • Changed block sizes to fit KV sizes
• Strict limit on I/O
  • 100 servers
  • 20 regions per server
  • 50 million users per cluster
83
Use-Case II: Time Series Database
84
Events make big data big
• The majority of use-cases deal with event-based data
  • Especially on the HDFS and MapReduce level
  • Machine Scale vs. Human Scale
• An event has attributes
  • Type
  • Identifier
  • Actor
  • Other attributes
85
Events contd.
• Accessing event data
  • Give me everything about event e_id1
  • Give me everything in [t1,t2]
  • Give me everything for event type e_t1 in [t1,t2]
  • Give me everything for actor a1 in [t1,t2]
  • Give me everything for event type e_t1 by actor a1 in [t1,t2]
• Aggregate based on some parameters (like above) and report
• Find events that match some other given criteria
86
HBase and Time Series
• Access patterns suited for HBase
  • Random access to event data or aggregate data
  • Serving… not real-time computing (that's Impala)
• Schema design is the tricky thing
  • OpenTSDB does this well (but limited)
• Key principles:
  • Collocate data you want to read together
  • Spread out as much as possible at write time
  • The above two conflict in a lot of cases, so you decide on the trade-off
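One common way to navigate that trade-off is a salted row key. The sketch below is illustrative only - the bucket count and key layout are assumptions, not OpenTSDB's actual format:

```python
import zlib

BUCKETS = 8  # assumed bucket count; a real deployment tunes this

def salted_key(metric: str, ts: int) -> bytes:
    """A one-byte bucket prefix (stable hash of the metric) spreads writes
    over BUCKETS regions, while keeping each metric's data sorted by time
    within its bucket. Reading one metric then needs BUCKETS parallel scans -
    collocation is traded for write distribution."""
    bucket = zlib.crc32(metric.encode()) % BUCKETS  # stable across runs
    return bytes([bucket]) + metric.encode() + ts.to_bytes(8, "big")

k1 = salted_key("cpu.load", 1000)
k2 = salted_key("cpu.load", 2000)
assert k1[0] == k2[0]   # same metric -> same bucket (collocated for reads)
assert k1 < k2          # within the bucket, time order is preserved
```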
  
87
Time Series design patterns
• Ingest
  • Flume or direct writing via app
• HDFS
  • Batch queries in Hive
  • Faster queries in Impala
  • No user-time serving
• HBase
  • Serve individual events (OpenTSDB)
  • Serve pre-computed aggregates (OpenTSDB, FB Insights)
• Solr
  • To make individual events searchable
88
Time Series design patterns
• Land data in HDFS and HBase
• Aggregate in HDFS and write to HBase
  • HBase can do some aggregates too (counters)
• Keep servable data in HBase, then discard (via TTL)
• Keep all data in HDFS for future use
89
The story with only HBase
• Landing destination
• Aggregates via counters
• Serving end users
• Event -> Flume/App -> HBase
  • Raw entry in HBase for exact value
  • Multiple counter increments for aggregates
• OSS implementation - OpenTSDB
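The "multiple counter increments for aggregates" idea can be sketched like so; the rollup levels and key shapes are hypothetical, with a plain dict standing in for HBase's atomic Increment operation:

```python
from collections import Counter

def increments_for(event_ts, metric):
    """Each raw event bumps one counter per pre-computed aggregate level
    (hourly and daily here), the way OpenTSDB-style rollups or FB
    Insights serve aggregates without scanning raw events."""
    hour = event_ts - event_ts % 3600
    day = event_ts - event_ts % 86400
    return [(metric, "hourly", hour), (metric, "daily", day)]

counters = Counter()
for ts in (100, 200, 4000):          # three raw events
    for key in increments_for(ts, "clicks"):
        counters[key] += 1           # stand-in for an HBase counter increment

assert counters[("clicks", "daily", 0)] == 3
assert counters[("clicks", "hourly", 0)] == 2
assert counters[("clicks", "hourly", 3600)] == 1
```

The raw entry is still stored alongside, so exact values can be recomputed if a counter is ever lost.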
  
90
Overall Summary
91
Applications in HBase
Requires working with schema peculiarities and implementation idiosyncrasies.

Important is to compute the write rate and un-optimize the schema to fit the given hardware. If hardware is no issue, then the optimum is achievable.

Trifecta of good performance: compactions, Bloom filters, and key design.
(but also look out for Memstore and BlockCache settings)
  
92
Questions?

Ysance conference - cloud computing - aws - 3 mai 2010
 
Hadoop unit
Hadoop unitHadoop unit
Hadoop unit
 
Social Networks and the Richness of Data
Social Networks and the Richness of DataSocial Networks and the Richness of Data
Social Networks and the Richness of Data
 
Hadoop is dead - long live Hadoop | BiDaTA 2013 Genoa
Hadoop is dead - long live Hadoop | BiDaTA 2013 GenoaHadoop is dead - long live Hadoop | BiDaTA 2013 Genoa
Hadoop is dead - long live Hadoop | BiDaTA 2013 Genoa
 
From Batch to Realtime with Hadoop - Berlin Buzzwords - June 2012
From Batch to Realtime with Hadoop - Berlin Buzzwords - June 2012From Batch to Realtime with Hadoop - Berlin Buzzwords - June 2012
From Batch to Realtime with Hadoop - Berlin Buzzwords - June 2012
 
Introduction sur les problématiques d'une architecture distribuée
Introduction sur les problématiques d'une architecture distribuéeIntroduction sur les problématiques d'une architecture distribuée
Introduction sur les problématiques d'une architecture distribuée
 
Présentation Club STORM
Présentation Club STORMPrésentation Club STORM
Présentation Club STORM
 
Phoenix - A High Performance Open Source SQL Layer over HBase
Phoenix - A High Performance Open Source SQL Layer over HBasePhoenix - A High Performance Open Source SQL Layer over HBase
Phoenix - A High Performance Open Source SQL Layer over HBase
 
Tech day hadoop, Spark
Tech day hadoop, SparkTech day hadoop, Spark
Tech day hadoop, Spark
 
HBase and Impala Notes - Munich HUG - 20131017
HBase and Impala Notes - Munich HUG - 20131017HBase and Impala Notes - Munich HUG - 20131017
HBase and Impala Notes - Munich HUG - 20131017
 
HBase Advanced Schema Design - Berlin Buzzwords - June 2012
HBase Advanced Schema Design - Berlin Buzzwords - June 2012HBase Advanced Schema Design - Berlin Buzzwords - June 2012
HBase Advanced Schema Design - Berlin Buzzwords - June 2012
 
Sept 17 2013 - THUG - HBase a Technical Introduction
Sept 17 2013 - THUG - HBase a Technical IntroductionSept 17 2013 - THUG - HBase a Technical Introduction
Sept 17 2013 - THUG - HBase a Technical Introduction
 
Soutenance ysance
Soutenance ysanceSoutenance ysance
Soutenance ysance
 
ACID Transactions in Apache Phoenix with Apache Tephra™ (incubating), by Poor...
ACID Transactions in Apache Phoenix with Apache Tephra™ (incubating), by Poor...ACID Transactions in Apache Phoenix with Apache Tephra™ (incubating), by Poor...
ACID Transactions in Apache Phoenix with Apache Tephra™ (incubating), by Poor...
 
Real-time Analytics with HBase (short version)
Real-time Analytics with HBase (short version)Real-time Analytics with HBase (short version)
Real-time Analytics with HBase (short version)
 
Data Pipelines in Hadoop - SAP Meetup in Tel Aviv
Data Pipelines in Hadoop - SAP Meetup in Tel Aviv Data Pipelines in Hadoop - SAP Meetup in Tel Aviv
Data Pipelines in Hadoop - SAP Meetup in Tel Aviv
 

Semelhante a HBase Applications - Atlanta HUG - May 2014

HBase at Bloomberg: High Availability Needs for the Financial Industry
HBase at Bloomberg: High Availability Needs for the Financial IndustryHBase at Bloomberg: High Availability Needs for the Financial Industry
HBase at Bloomberg: High Availability Needs for the Financial IndustryHBaseCon
 
Large-scale Web Apps @ Pinterest
Large-scale Web Apps @ PinterestLarge-scale Web Apps @ Pinterest
Large-scale Web Apps @ PinterestHBaseCon
 
Facebook keynote-nicolas-qcon
Facebook keynote-nicolas-qconFacebook keynote-nicolas-qcon
Facebook keynote-nicolas-qconYiwei Ma
 
支撑Facebook消息处理的h base存储系统
支撑Facebook消息处理的h base存储系统支撑Facebook消息处理的h base存储系统
支撑Facebook消息处理的h base存储系统yongboy
 
Facebook Messages & HBase
Facebook Messages & HBaseFacebook Messages & HBase
Facebook Messages & HBase强 王
 
[Hi c2011]building mission critical messaging system(guoqiang jerry)
[Hi c2011]building mission critical messaging system(guoqiang jerry)[Hi c2011]building mission critical messaging system(guoqiang jerry)
[Hi c2011]building mission critical messaging system(guoqiang jerry)baggioss
 
Keynote: The Future of Apache HBase
Keynote: The Future of Apache HBaseKeynote: The Future of Apache HBase
Keynote: The Future of Apache HBaseHBaseCon
 
Thug feb 23 2015 Chen Zhang
Thug feb 23 2015 Chen ZhangThug feb 23 2015 Chen Zhang
Thug feb 23 2015 Chen ZhangChen Zhang
 
Meet hbase 2.0
Meet hbase 2.0Meet hbase 2.0
Meet hbase 2.0enissoz
 
Meet HBase 2.0
Meet HBase 2.0Meet HBase 2.0
Meet HBase 2.0enissoz
 
HBaseCon 2013: Streaming Data into Apache HBase using Apache Flume: Experienc...
HBaseCon 2013: Streaming Data into Apache HBase using Apache Flume: Experienc...HBaseCon 2013: Streaming Data into Apache HBase using Apache Flume: Experienc...
HBaseCon 2013: Streaming Data into Apache HBase using Apache Flume: Experienc...Cloudera, Inc.
 
HBase for Architects
HBase for ArchitectsHBase for Architects
HBase for ArchitectsNick Dimiduk
 

Semelhante a HBase Applications - Atlanta HUG - May 2014 (20)

HBase at Bloomberg: High Availability Needs for the Financial Industry
HBase at Bloomberg: High Availability Needs for the Financial IndustryHBase at Bloomberg: High Availability Needs for the Financial Industry
HBase at Bloomberg: High Availability Needs for the Financial Industry
 
Large-scale Web Apps @ Pinterest
Large-scale Web Apps @ PinterestLarge-scale Web Apps @ Pinterest
Large-scale Web Apps @ Pinterest
 
Facebook keynote-nicolas-qcon
Facebook keynote-nicolas-qconFacebook keynote-nicolas-qcon
Facebook keynote-nicolas-qcon
 
支撑Facebook消息处理的h base存储系统
支撑Facebook消息处理的h base存储系统支撑Facebook消息处理的h base存储系统
支撑Facebook消息处理的h base存储系统
 
Facebook Messages & HBase
Facebook Messages & HBaseFacebook Messages & HBase
Facebook Messages & HBase
 
[Hi c2011]building mission critical messaging system(guoqiang jerry)
[Hi c2011]building mission critical messaging system(guoqiang jerry)[Hi c2011]building mission critical messaging system(guoqiang jerry)
[Hi c2011]building mission critical messaging system(guoqiang jerry)
 
The Future of Hbase
The Future of HbaseThe Future of Hbase
The Future of Hbase
 
Apache HBase™
Apache HBase™Apache HBase™
Apache HBase™
 
Keynote: The Future of Apache HBase
Keynote: The Future of Apache HBaseKeynote: The Future of Apache HBase
Keynote: The Future of Apache HBase
 
Hbase 20141003
Hbase 20141003Hbase 20141003
Hbase 20141003
 
Hbase: an introduction
Hbase: an introductionHbase: an introduction
Hbase: an introduction
 
Thug feb 23 2015 Chen Zhang
Thug feb 23 2015 Chen ZhangThug feb 23 2015 Chen Zhang
Thug feb 23 2015 Chen Zhang
 
Hadoop - Apache Hbase
Hadoop - Apache HbaseHadoop - Apache Hbase
Hadoop - Apache Hbase
 
Hbase mhug 2015
Hbase mhug 2015Hbase mhug 2015
Hbase mhug 2015
 
Meet Apache HBase - 2.0
Meet Apache HBase - 2.0Meet Apache HBase - 2.0
Meet Apache HBase - 2.0
 
Meet hbase 2.0
Meet hbase 2.0Meet hbase 2.0
Meet hbase 2.0
 
Meet HBase 2.0
Meet HBase 2.0Meet HBase 2.0
Meet HBase 2.0
 
Apache hadoop hbase
Apache hadoop hbaseApache hadoop hbase
Apache hadoop hbase
 
HBaseCon 2013: Streaming Data into Apache HBase using Apache Flume: Experienc...
HBaseCon 2013: Streaming Data into Apache HBase using Apache Flume: Experienc...HBaseCon 2013: Streaming Data into Apache HBase using Apache Flume: Experienc...
HBaseCon 2013: Streaming Data into Apache HBase using Apache Flume: Experienc...
 
HBase for Architects
HBase for ArchitectsHBase for Architects
HBase for Architects
 

Último

Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...itnewsafrica
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Kaya Weers
 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024TopCSSGallery
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkPixlogix Infotech
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observabilityitnewsafrica
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructureitnewsafrica
 

Último (20)

Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)
 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App Framework
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
 

HBase Applications - Atlanta HUG - May 2014

  • 1. 1 HBase Applications Selected Use-Cases around a Common Theme Atlanta HUG – May 2014 Lars George, Cloudera EMEA Chief Architect
  • 2. 2 About Me • EMEA Chief Architect @ Cloudera • Consulting on Hadoop projects (everywhere) • Apache Committer • HBase and Whirr • O’Reilly Author • HBase – The Definitive Guide • Now in Japanese! • Contact • lars@cloudera.com • @larsgeorge 日本語版も出ました!
  • 3. 3 The Content... • HBase - Strengths and weaknesses • Common use-cases and patterns • Focus on a specific type of application • Summary
  • 4. 4 CONFIDENTIAL - RESTRICTED HBase Strengths and Weaknesses
  • 5. 5 IOPS vs Throughput Mythbusters It is all physics in the end; you cannot solve an I/O problem without reducing I/O in general. Parallelize access and read/write sequentially.
  • 6. 6 HBase: Strengths & Weaknesses Strengths: • Random access to small(ish) key-value pairs • Rows and columns stored sorted lexicographically • Adds table and region concepts to group related KVs • Stores and reads data sequentially • Parallelizes across all clients • Non-blocking I/O throughout
  • 7. 7 HBase: Strengths & Weaknesses Weaknesses: • Not optimized (yet) for 100% of the possible throughput of the underlying storage layer • And HDFS is not fully optimized either • Single-writer issue with WALs • Single-server hot-spotting with non-distributed keys
  • 8. 8 Patterns • There are common patterns in many common use-cases, like programming patterns. • We need to extract these common patterns and make them repeatable. • Similar to the “Gang of Four” (Gamma, Helm, Johnson, Vlissides), or the “Three Amigos” (Booch, Jacobson, Rumbaugh)
  • 9. 9 CONFIDENTIAL - RESTRICTED Common Patterns
  • 10. 10 HBase Dilemma Although HBase can host many applications, they may require completely opposite features Events Entities Time Series Message Store
  • 11. 11 This talk (at this event) • Message Store • Information exchange between entities • Sending/receiving information is an event • Time-Series • Sequence of data points measured at successive points in time, spaced at uniform intervals • Measuring a data point is an event
  • 12. 12 Using HBase Strengths
  • 13. 13 HBase “Indexes” (cont.) • Use primary keys, aka the row keys, as a sorted index • One sort direction only • Use a “secondary index” to get reverse sorting • Lookup table or same table • Use secondary keys, aka the column qualifiers, as a sorted index within the main record • Use prefixes within a column family or separate column families
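Slide 13's "one sort direction only" point can be sketched in a few lines: row keys sort lexicographically, so a newest-first view needs an inverted key, a common HBase idiom. The user/timestamp key layout below is a hypothetical illustration, not a schema from the talk.

```python
# Row keys in HBase sort lexicographically (bytewise), giving exactly one
# sort direction. To read newest-first, store an inverted timestamp.

MAX_LONG = 2**63 - 1  # mirrors Java's Long.MAX_VALUE

def forward_key(user: str, ts: int) -> bytes:
    # Oldest-first: a big-endian encoded timestamp sorts ascending.
    return user.encode() + b"|" + ts.to_bytes(8, "big")

def reverse_key(user: str, ts: int) -> bytes:
    # Newest-first: invert the timestamp so larger ts sorts smaller.
    return user.encode() + b"|" + (MAX_LONG - ts).to_bytes(8, "big")

events = [("alice", 100), ("alice", 300), ("alice", 200)]

oldest_first = sorted(forward_key(u, t) for u, t in events)
newest_first = sorted(reverse_key(u, t) for u, t in events)
```

The same trick underlies the "secondary index in a lookup table or same table" idea: the index table (or key prefix) simply stores the re-ordered key.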
  • 14. 14 CONFIDENTIAL - RESTRICTED Common Use-Cases
  • 15. 15 Use-Case I: Messages
  • 16. 16 HBase Message Store Use-Case: • Store incoming messages in HBase, such as Emails, SMS, MMS, IM • Constant updates of existing entities • e.g. Email read, flagged, starred, moved, deleted • Reading of top-N entries, sorted by time • Newest 20 messages, last 20 conversations • Examples: • Facebook Messages
  • 17. 17 Problem Description • Records are of varying size • Large ones hinder smaller ones • Massive index issue • User can sort and filter by everything • At the same time, reading top-N should be fast • But what to do for automated accounts? 80/20 rule? • Only doable with heuristics • Only create minimal indexes • Create additional ones when the user asks for them • Cross-mailbox issues with Conversations • Similar to the timeline in Facebook • Overall requirements for I/O
  • 18. 18   Interlude I: Compaction Details Write Amplification in HBase
  • 19. 19 Compactions in HBase • Must happen to keep data in check • Combine small flush files into larger ones • Remove old data (during major compactions) • Two types: Minor and Major Compactions • Minor are triggered with API mutation calls • Major are time-scheduled (or auto-promoted) • Both can be triggered manually if needed • Add extra background I/O that grows over time • Write amplification! • Have to be tuned for heavy-write systems
  • 20. 20 Writes: Flushes and Compactions [chart: store file sizes (MB) over time; one flush file HF1] hbase.hregion.memstore.flush.size = 128MB
  • 21. 21 Writes: Flushes and Compactions [chart: flush files HF1, HF2 accumulate]
  • 22. 22 Writes: Flushes and Compactions [chart: flush files HF1–HF3] hbase.hstore.compaction.min = 3 hbase.hstore.compactionThreshold = 3 (0.90) hbase.hstore.compaction.max = 10
  • 23. 23 Writes: Flushes and Compactions [chart: files merged into CF1] 1. Compaction (Major, auto-promoted)
  • 24. 24 Writes: Flushes and Compactions [chart: CF1 plus new flush file HF4]
  • 25. 25 Writes: Flushes and Compactions [chart: CF1, HF4, HF5]
  • 26. 26 Writes: Flushes and Compactions [chart: CF1, HF4–HF6]
  • 27. 27 Writes: Flushes and Compactions [chart: CF1, HF4–HF6] hbase.hstore.compaction.ratio = 1.2 hbase.hstore.compaction.min.size = flush size
  • 28. 28 Writes: Flushes and Compactions [chart: CF1 fails the 120% ratio check and is excluded] hbase.hstore.compaction.ratio = 1.2
  • 29. 29 Writes: Flushes and Compactions [chart: files merged into CF2] 2. Compaction (Major, auto-promoted)
  • 30. 30 Writes: Flushes and Compactions [chart: CF2 plus new flush file HF7]
  • 31. 31 Writes: Flushes and Compactions [chart: CF2, HF7, HF8]
  • 32. 32 Writes: Flushes and Compactions [chart: CF2, HF7–HF9]
  • 33. 33 Writes: Flushes and Compactions [chart: CF2, HF7–HF10]
  • 34. 34 Writes: Flushes and Compactions [chart: CF2, HF7–HF10] hbase.hstore.compaction.ratio = 1.2 (120%) Eliminate older to newer files, until in ratio
  • 35. 35 Writes: Flushes and Compactions [chart: files merged into CF3] 3. Compaction
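The ratio check walked through on slides 27–34 ("eliminate older to newer files, until in ratio") can be modeled in a few lines. This is a simplified sketch of ratio-based file selection, not HBase's actual compaction policy code.

```python
def select_for_compaction(file_sizes_mb, ratio=1.2):
    """Simplified ratio-based selection: walk files oldest to newest and
    drop a file when it is larger than `ratio` times the sum of all newer
    files. Large, already-compacted files are thereby left alone."""
    selected = list(file_sizes_mb)  # oldest first
    while selected:
        newer_sum = sum(selected[1:])
        if selected[0] <= ratio * newer_sum:
            break  # oldest remaining file is "in ratio": keep the rest
        selected.pop(0)  # too large relative to newer files: exclude it
    return selected

# A 900 MB compacted file is skipped; only the small flush files are merged.
picked = select_for_compaction([900, 50, 40, 30])
```

This is why a higher ratio (e.g. 1.4) pulls larger, older files back into minor compactions, keeping the file count low at the cost of more rewriting.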
  • 37. 37 Writes: Flushes and Compactions [chart: store files over time]
  • 38. 38 Additional Notes #1 There are a few more settings for compactions: • hbase.hstore.compaction.max = 10 Limit on the maximum number of files per compaction • hbase.hstore.compaction.max.size = Long.MAX_VALUE Exclude files larger than this setting (0.92+) • hbase.hregion.majorcompaction = 1d Scheduled major compactions
  • 39. 39 Additional Notes #2 • hbase.hstore.compaction.kv.max = 10 Limits internal scanner caching during reads of files to be compacted • hbase.hstore.blockingStoreFiles = 7 Enforces an upper limit on the number of files before compactions catch up - blocks user operations! • hbase.hstore.blockingWaitTime = 90s Upper limit on blocking user operations
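Collected in one place, the compaction tunables shown on these slides would appear in hbase-site.xml roughly as below; the values are the ones from the slides, not recommendations for any particular workload.

```xml
<!-- Sketch only: compaction-related settings mentioned on the slides -->
<property>
  <name>hbase.hregion.memstore.flush.size</name>
  <value>134217728</value> <!-- 128 MB -->
</property>
<property>
  <name>hbase.hstore.compaction.min</name>
  <value>3</value>
</property>
<property>
  <name>hbase.hstore.compaction.max</name>
  <value>10</value>
</property>
<property>
  <name>hbase.hstore.compaction.ratio</name>
  <value>1.2</value>
</property>
<property>
  <name>hbase.hstore.blockingStoreFiles</name>
  <value>7</value>
</property>
```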
  • 40. 40   Write Fragmentation Yo, where’s the data at?
  • 41. 41 Writes: Flushes and Compactions [chart: store files over time, tracking Existing Row Mutations vs. Unique Row Inserts] We are looking at two specific rows: one is never changed, the other frequently
  • 42. 42 Writes: Flushes and Compactions [chart: Existing Row Mutations vs. Unique Row Inserts]
  • 43. 43 Writes: Flushes and Compactions [chart: Existing Row Mutations vs. Unique Row Inserts]
  • 44. 44 Writes: Flushes and Compactions [chart: Existing Row Mutations vs. Unique Row Inserts] 1. Compaction (Major, auto-promoted)
  • 45. 45 Writes: Flushes and Compactions [chart: Existing Row Mutations vs. Unique Row Inserts]
  • 46. 46 Writes: Flushes and Compactions [chart: Existing Row Mutations vs. Unique Row Inserts]
  • 47. 47   Skip forward again...
  • 48. 48 Writes: Flushes and Compactions [chart: Existing Row Mutations vs. Unique Row Inserts]
  • 50. 50 Compaction Summary • Compaction tuning is important • Do not be too aggressive or write amplification is noticeable under load • Use timestamps/time-ranges in Get/Scan to limit files
Ratio | Effect
1.0 | Dampened, causes more store files, needs to be combined with effective Bloom filter usage (non-random)
1.2 | Default value, moderate setting
1.4 | More aggressive, keeps the number of files low, causes more auto-promoted major compactions to occur
  • 51. 51   Interlude II: Bloom Filter Call me maybe, baby?
  • 52. 52   Background  on  Bloom  Filters  
  • 53. 53 Background on Bloom Filters • Bit arrays of m bits, and k hash functions • HBase uses hash folding • Returns “No” or “Maybe” only • Error rate tunable, usually about 1% • At a 1% error rate and optimal k: ~9.6 bits per key [figure: m=18, k=3]
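The "~9.6 bits per key at a 1% error rate" figure follows from the standard Bloom filter sizing formulas; a quick check:

```python
import math

# Standard Bloom filter sizing: for a target false-positive rate p,
#   bits per key: m/n = -ln(p) / (ln 2)^2
#   optimal hash function count: k = (m/n) * ln 2

def bloom_params(false_positive_rate: float):
    bits_per_key = -math.log(false_positive_rate) / (math.log(2) ** 2)
    k = bits_per_key * math.log(2)
    return bits_per_key, k

bits, k = bloom_params(0.01)
# bits ≈ 9.59, k ≈ 6.64 (so about 7 hash functions in practice)
```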
  • 54. 54   Seeking  with  Bloom  Filters  
  • 55. 55 Read Time Series Entry • Event record is written once and never deleted or updated • Keeps the entire record in a specific location in the storage files • Use a time range to indicate what is needed • {Get|Scan}.setTimeRange() • Helps the system skip unnecessary (older) files • Bloom Filter helps for given row key(s) and column qualifiers • Can skip files not containing the requested details
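Why setTimeRange() helps can be sketched with the minimum/maximum cell timestamps that each store file records: a file whose range does not overlap the query's range never needs to be read. The file names and timestamps below are made up for illustration.

```python
# Sketch: time-range based store file elimination.
# Each HFile knows the min/max timestamp of the cells it contains;
# a Get/Scan with a time range only opens the overlapping files.

def overlaps(file_range, query_range):
    f_min, f_max = file_range
    q_min, q_max = query_range
    return f_min <= q_max and q_min <= f_max

# Hypothetical store files with their (min_ts, max_ts) ranges.
store_files = {"CF1": (0, 400), "HF4": (400, 500), "HF5": (500, 600)}

def files_to_read(query_range):
    return sorted(n for n, r in store_files.items() if overlaps(r, query_range))
```

For write-once time series this works well because old events stay in old files; for frequently updated entities (next slide) nearly every file overlaps, and the trick loses its power.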
  • 56. 56 Writes: Flushes and Compactions [chart: Existing Row Mutations vs. Unique Row Inserts] Single Block Read (64K); Bloom filter and/or time range eliminates all other store files
  • 57. 57 Read Updateable Entity • Data is updated regularly, aging out at intervals • Reading the entity needs to read all details to reconstitute the current state • Deletes mask out attributes • Updates override (or complement) attributes • Bloom filters will have a hard time saying “no” since most files might contain entity attributes • A time filter on scans or gets also has few options to skip files since older attributes might still be important
  • 58. 58 Writes: Flushes and Compactions [chart: Bloom Filter answers per store file: yes, yes, yes, yes, yes, no, yes, yes, no] Bloom Filter returns “yes” for all but two files: 7+ block loads (64KB) needed
  • 59. 59 Bloom Filter Options There are three choices: • NONE Duh! Use this when the Bloom Filter is not useful based on the use-case (default setting) • ROW Index only the row key; needs an entry per row key in the Bloom Filter • ROWCOL Index row and column key; requires an entry in the filter for every column cell (KeyValue)
  • 60. 60   How  to  decide?  
  • 61. 61 Bloom Filter Summary • They help a lot - but not always • Highly depends on write patterns • Keep an eye on size, since they are cached • HFile v2 helps here as it only loads root index info “Bloom filters can get as large as 100 MB per HFile, which adds up to 2 GB when aggregated over 20 regions. Block indexes can grow as large as 6 GB in aggregate size over the same set of regions.” Source: http://hbase.apache.org/book/hfilev2.html
  • 62. 62   Interlude III: Write-ahead Log The lonesome writer tale.
  • 63. 63   Write-­‐ahead  Log  -­‐  Data  Flow    
  • 64. 64   Write-­‐ahead  Log  -­‐  Overview   •  One  file  per  Region  Server   •  All  regions  have  a  reference  to  this  file   •  Actually  a  wrapper  around  the  physical  file   •  The  file  is  in  the  end  a  Hadoop  SequenceFile     •  Stored  in  HDFS  so  it  can  be  recovered  ater  a  server   failure   •  There  is  a  synchroniza+on  barrier  that  impacts  all   parallel  writers,  aka  clients   •  Overall  performance  is  BAD,  maybe  10MB/s  
  • 65. 65 Write-ahead Log - Workarounds • Enable log compression hbase.regionserver.wal.enablecompression • Disable WAL for secondary records • Restore indexes or derived records from the main one • But be careful using the coprocessor hook, as it cannot access the currently replaying region • Work on upstream JIRAs • Multiple logs per server • Fix single writer issue in HDFS
  • 66. 66   Back to the main theme... Yes, message stores.
  • 67. 67 Schema • Every line is an inbox • Indexes as CFs or separate tables • Random updates and inserts cause storage file churn • Facebook used more than 4 or 5 schema iterations • Not really representative: pure blob storage • Evolved over time to be more HBase like • Another customer iterated about the same time over various schemas • Difficult to keep indexes up to date
  • 68. 68   Facebook Messages An interesting use-case…
  • 69. 69 Facebook Messages - Statistics Source: HBaseCon 2012 - Anshuman Singh
  • 70. 70  
  • 71. 71  
  • 73. 73 Notes on Facebook Schema 1 This is basically the same as the NameNode, i.e. the application only writes edits and those are merged with a snapshot of the data. The application does not use HBase as an operational store, but all data is cached in memory. It occasionally writes large chunks, and reads only a few times to merge or recover.
  • 74. 74 Notes on Facebook Schema 1 Three column families: • Snapshot, Actions, Keywords Settings changes: • DFS Block Size: 256MB • Since large KVs are written • Efficiency of HFile block index a concern • Compaction ratio: 1.4 • Be more aggressive to clean up files • Split Size: 2TB • Manage splitting manually • Major Compactions: 3 days
  • 76. 76 Notes on Facebook Schema 2 • Eight column families • Snapshots per thread (user to user) Settings changes: • Block Cache Size: 55% • Cache more data on HBase side • Blocking Store Files: 25 • Allow more files to be around • Compaction Min Size: 4MB • Reduce number of unconditionally selected files • Major Compactions: 14 days
  • 78. 78 Notes on Facebook Schema 3 • Eleven column families • Twenty regions per server • One hundred servers per cluster Settings changes: • Block Cache Size: 60% • Cache more data on HBase side • Region Slop: 5% (from 20%) • Keep strict boundaries on regions per server
  • 79. 79  
  • 80. 80 Note the imbalance! Recall that flushes are interconnected and cause compaction storms.
  • 81. 81 FB Messages Summary • Triggered many changes in HBase: • Change compaction selection algorithm • Upper bounds on file sizes • Pools for small and large compactions • Online schema changes • Finer grained metrics • Lazy seeking in files • Point-seek optimizations • …
  • 82. 82 FB Messages Summary • Went from "Snapshot" to a more proper schema • Needed to wait for the schema to settle • Could sustain warped load for a while • Eventually uses HBase more as a KV store • Tweaked settings depending on schema • Tuned compactions from aggressive to relaxed • Changed block sizes to fit KV sizes • Strict limit on I/O • 100 servers • 20 regions per server • 50 million users per cluster
  • 83. 83 Use-Case II: Time Series Database
  • 84. 84 Events make big data big • The majority of use cases deal with event based data • Especially on the HDFS and MapReduce level • Machine Scale vs. Human Scale • An event has attributes • Type • Identifier • Actor • Other attributes
  • 85. 85   Events  contd.   •  Accessing  event  data   •  Give  me  everything  about  event  e_id1   •  Give  me  everything  in  [t1,t2]   •  Give  me  everything  for  event  type  e_t1  in  [t1,t2]   •  Give  me  everything  for  actor  a1  in  [t1,t2]   •  Give  me  everything  for  event  type  e_t1  by  actor  a1  in   [t1,t2]   •  Aggregate  based  on  some  parameters  (like  above)  and   report   •  Find  events  that  match  some  other  given  criteria  
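The "everything for type e_t1 in [t1, t2]" style of query above maps directly onto HBase's sorted key space. Here is a toy sketch of that mapping: rows kept sorted by a composite key so a time-range query becomes one contiguous range scan. The key layout (`<event_type>|<zero-padded timestamp>|<event_id>`) is an assumption for illustration, not a prescribed schema.

```python
import bisect

# Composite row key: type, then zero-padded time, then id, so rows for
# one event type sort together in time order.
def make_key(event_type, ts, event_id):
    return f"{event_type}|{ts:010d}|{event_id}"

# Stand-in for an HBase table: keys kept in sorted order.
rows = sorted([
    make_key("click", 1000, "e1"),
    make_key("click", 2000, "e2"),
    make_key("click", 3000, "e3"),
    make_key("view",  1500, "e4"),
])

def scan(start_key, stop_key):
    """Range scan over the sorted keys, [start_key, stop_key)."""
    lo = bisect.bisect_left(rows, start_key)
    hi = bisect.bisect_left(rows, stop_key)
    return rows[lo:hi]

# "Give me everything for event type click in [1000, 2500)":
result = scan(make_key("click", 1000, ""), make_key("click", 2500, ""))
print(result)  # the e1 and e2 rows only
```

The same trick answers the actor-scoped queries if the actor is folded into the key prefix instead; each query shape you need to serve dictates one possible key order, which is why schema design is the hard part.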
  • 86. 86 HBase and Time Series • Access patterns suited for HBase • Random access to event data or aggregate data • Serving… Not real time computing (that's Impala) • Schema design is the tricky thing • OpenTSDB does this well (but limited) • Key principle: • Collocate data you want to read together • Spread out as much as possible at write time • The above two conflict in a lot of cases, so you decide on the trade-off
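The collocate-vs-spread trade-off can be made concrete with a salted-key sketch: prefixing a monotonically increasing time key with a small salt spreads writes across key ranges (regions), at the price of fan-out on read. The bucket count and salt function are assumptions for illustration; this is not OpenTSDB's actual key layout.

```python
NUM_BUCKETS = 4

def salted_key(ts):
    salt = ts % NUM_BUCKETS          # stand-in for hash(key) % N
    return f"{salt}|{ts:010d}"

# Sequential timestamps now land in NUM_BUCKETS distinct key ranges,
# avoiding a single hot region at write time:
keys = [salted_key(ts) for ts in range(100, 110)]
buckets = {k.split("|")[0] for k in keys}
print(sorted(buckets))               # all 4 salt prefixes in use

# The price: a time-range read must fan out one scan per bucket and
# merge the partial results back into time order.
merged = sorted(keys, key=lambda k: k.split("|")[1])
print(merged[0], merged[-1])
```

Unsalted keys give the cheapest reads (one scan) but hot-spot a single server on write; salting inverts that. Which side to favor is exactly the trade-off the slide describes.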
  • 87. 87 Time Series design patterns • Ingest • Flume or direct writing via app • HDFS • Batch queries in Hive • Faster queries in Impala • No real-time user serving • HBase • Serve individual events (OpenTSDB) • Serve pre-computed aggregates (OpenTSDB, FB Insights) • Solr • To make individual events searchable
  • 88. 88 Time Series design patterns • Land data in HDFS and HBase • Aggregate in HDFS and write to HBase • HBase can do some aggregates too (counters) • Keep servable data in HBase, then discard (via TTL) • Keep all data in HDFS for future use
  • 89. 89 The story with only HBase • Landing destination • Aggregates via counters • Serving end users • Event -> Flume/App -> HBase • Raw entry in HBase for exact value • Multiple counter increments for aggregates • OSS implementation - OpenTSDB
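The "raw entry plus multiple counter increments" flow above can be sketched as a toy model: each event lands once as a raw row and also bumps one counter per time granularity, in the spirit of OpenTSDB/FB Insights rollups. The row-key layouts are assumptions; in real HBase the increments would be atomic counter updates (e.g. via the increment API), not a Python `Counter`.

```python
from collections import Counter

raw_rows = {}          # stand-in for the raw-event table
counters = Counter()   # stand-in for the counter table

def ingest(event_id, event_type, ts):
    # Raw entry keeps the exact value for point lookups.
    raw_rows[f"{event_type}|{ts}|{event_id}"] = {"ts": ts}
    # One counter increment per aggregation granularity.
    hour, day = ts - ts % 3600, ts - ts % 86400
    counters[f"{event_type}|hour|{hour}"] += 1
    counters[f"{event_type}|day|{day}"] += 1

for i, ts in enumerate([3600, 3700, 7200]):
    ingest(f"e{i}", "click", ts)

print(counters["click|hour|3600"])  # 2 events in that hour
print(counters["click|day|0"])      # 3 events that day
```

Serving an hourly or daily chart is then a single narrow read of pre-aggregated counters, while the raw rows remain available for exact drill-down.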
  • 91. 91 Applications in HBase Requires working with schema peculiarities and implementation idiosyncrasies. Important is to compute the write rate and un-optimize the schema to fit the given hardware. If hardware is no issue then the optimum is achievable. Trifecta of good performance: Compactions, Bloom Filters, and key design. (but also look out for Memstore and Blockcache settings)