Secondary Indexing in Phoenix
Jesse Yates
HBase Committer
Software Engineer
LA HBase User Group – September 4, 2013

Phoenix Secondary Indexing - LA HUG Sept 9th, 2013

In-depth look at secondary indexing for Phoenix


  1. Secondary Indexing in Phoenix. Jesse Yates, HBase Committer, Software Engineer. LA HBase User Group – September 4, 2013
  2. Agenda • About • Other Indexing Frameworks • Immutable Indexes • Mutable Indexes in Phoenix • Mutable Indexing Internals • Roadmap LA HUG – Sept 2013 https://www.madison.k12.wi.us/calendars
  3. About me • Developer at Salesforce – System of Record, Phoenix • Open Source – Phoenix – HBase – Accumulo
  4. Phoenix • Open Source – https://github.com/forcedotcom/phoenix • “SQL-skin” on HBase – Everyone knows SQL! • JDBC Driver – Plug-and-play • Faster than HBase – in some cases
  5. Why Index? • HBase is sorted on only one “axis” (the row key) • Great for search via a single pattern Example!
  6. Example name: type: subtype: date: major: minor: quantity:
  7. Secondary Indexes • Sort on an ‘orthogonal’ axis • Avoid a full-table scan • Expected database feature • Hard in HBase because of ACID considerations
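The idea of sorting on an 'orthogonal' axis can be sketched with two sorted maps. This is only a toy model, not Phoenix or HBase code: the class name, the `name|rowKey` key layout, and the assumption that names never contain `|` are all illustrative choices made here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of a table sorted by row key plus a secondary index sorted by a column value. */
final class OrthogonalIndexSketch {
    // Primary "table": row key -> name. Sorted by row key only, like an HBase table.
    private final TreeMap<String, String> table = new TreeMap<>();
    // Index "table": "name|rowKey" -> rowKey. Sorted by name first, then by row key.
    // Hypothetical layout; assumes names never contain '|'.
    private final TreeMap<String, String> index = new TreeMap<>();

    void put(String rowKey, String name) {
        String previous = table.put(rowKey, name);
        if (previous != null) {
            index.remove(previous + "|" + rowKey); // drop the stale index entry
        }
        index.put(name + "|" + rowKey, rowKey);
    }

    /** Without an index: inspect every row of the table. */
    List<String> findByNameFullScan(String name) {
        List<String> rows = new ArrayList<>();
        for (Map.Entry<String, String> e : table.entrySet()) {
            if (e.getValue().equals(name)) {
                rows.add(e.getKey());
            }
        }
        return rows;
    }

    /** With an index: range scan over the keys that start with "name|". */
    List<String> findByNameIndexed(String name) {
        // '}' is the character immediately after '|', so it bounds the prefix range.
        return new ArrayList<>(index.subMap(name + "|", name + "}").values());
    }

    public static void main(String[] args) {
        OrthogonalIndexSketch t = new OrthogonalIndexSketch();
        t.put("row1", "Jesse");
        t.put("row2", "Lars");
        t.put("row3", "Jesse");
        System.out.println(t.findByNameFullScan("Jesse"));
        System.out.println(t.findByNameIndexed("Jesse"));
    }
}
```

Both lookups return the same rows; the indexed one touches only the matching key range instead of every row, at the cost of maintaining a second sorted structure on each write.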
  8. Agenda • About • Other Indexing Frameworks • Immutable Indexes • Mutable Indexes in Phoenix • Mutable Indexing Internals • Roadmap
  9. http://www.wired.com/wiredenterprise/2011/10/microsoft-and-hadoop/
  10. Other (Major) Indexing Frameworks • HBase SEP – Side-Effects Processor – Replication-based – https://github.com/NGDATA/hbase-sep • Huawei – Server-local indexes – Buddy regions – https://github.com/Huawei-Hadoop/hindex
  11. Agenda • About • Other Indexing Frameworks • Immutable Indexes • Mutable Indexes in Phoenix • Mutable Indexing Internals • Roadmap
  12. Immutable Indexes • Immutable Rows • Much easier to implement • Client-managed • Bulk-loadable
  13. Bulk Loading phoenix-hbase.blogspot.com
  14. Index Bulk Loading • Identity Mapper • Custom Phoenix Reducer • HFile Output Format
  15. Index Bulk Loading
      PreparedStatement statement = conn.prepareStatement(dmlStatement);
      statement.execute();
      String upsertStmt = "upsert into core.entity_history(organization_id,key_prefix,entity_history_id, created_by, created_date)\n"
          + "values(?,?,?,?,?)";
      statement = conn.prepareStatement(upsertStmt);
      … // set values
      // Read back the KeyValues the upsert produced, without committing them
      Iterator<Pair<byte[],List<KeyValue>>> dataIterator = PhoenixRuntime.getUncommittedDataIterator(conn);
  16. Agenda • About • Other Indexing Frameworks • Immutable Indexes • Mutable Indexes in Phoenix • Mutable Indexing Internals • Roadmap
  17. The “fun” stuff…
  18. 1.5 years
  19. Mutable Indexes • Global Index • Change row state – Common use-case – “expected” implementation • Covered Columns
  20. Usage • Just SQL! • Baby name popularity • Mock demo
  21. Usage
      • Selects the most popular name for a given year
        SELECT name,occurrences FROM baby_names WHERE year=2012 LIMIT 1;
      • Selects the total occurrences of a given name across all years
        SELECT /*+ NO_INDEX */ name,sum(occurrences) FROM baby_names WHERE name='Jesse' GROUP BY name;
      • Selects the total occurrences of a given name across all years, allowing an index to be used
        SELECT name,sum(occurrences) FROM baby_names WHERE name='Jesse' GROUP BY NAME;
  22. Usage
      • Update rows due to census inaccuracy – Will only work if the mutable indexing is working
        UPSERT INTO baby_names SELECT year,occurrences+3000,sex,name FROM baby_names WHERE name='Jesse';
      • Selects the now updated data (from the index table)
        SELECT name,sum(occurrences) FROM baby_names WHERE name='Jesse' GROUP BY NAME;
      • Index table still used in scans
        EXPLAIN SELECT name,sum(occurrences) FROM baby_names WHERE name='Jesse' GROUP BY NAME;
  23. Agenda • About • Other Indexing Frameworks • Immutable Indexes • Mutable Indexes in Phoenix • Mutable Indexing Internals • Roadmap
  24. Internals • Index Management – Build index updates – Ensures index is ‘cleaned up’ • Recovery Mechanism – Ensures index updates are “ACID”
  25. “There is no magic” - Every programming hipster (chipster)
  26. Mutable Indexing: Standard Write Path (diagram labels: Client, HRegion, RegionCoprocessorHost, WAL, RegionCoprocessorHost, MemStore)
  27. Mutable Indexing: Standard Write Path (diagram labels: Client, HRegion, RegionCoprocessorHost, WAL, RegionCoprocessorHost, MemStore)
  28. Mutable Indexing (diagram labels: RegionCoprocessorHost, WAL, RegionCoprocessorHost, Indexer, Builder, WAL Updater, “Durable!”, Indexer, three Index Tables, Codec)
  29. Index Management • Lives within a RegionCoprocessorObserver • Access to the local HRegion • Specifies the mutations to apply to the index tables
      public interface IndexBuilder {
        public void setup(RegionCoprocessorEnvironment env);
        public Map<Mutation, String> getIndexUpdate(Put put);
        public Map<Mutation, String> getIndexUpdate(Delete delete);
      }
  30. Why not write my own? • Managing Cleanup – Efficient point-in-time correctness – Performance tricks • Abstract access to HRegion – Minimal network hops • Sorting correctness – Phoenix typing ensures correct index sorting
  31. Example: Managing Cleanup • Updates can arrive out of order – Client-managed timestamps
      ROW   FAMILY  QUALIFIER  TS  VALUE
      Row1  Fam     Qual       10  val1
      Row1  Fam2    Qual2      12  val2
      Row1  Fam     Qual       13  val3
  32. Example: Managing Cleanup (Index Table)
      ROW             FAMILY  QUALIFIER            TS
      Val1|Row1       Index   Fam:Qual             10
      Val1|Val2|Row1  Index   Fam:Qual Fam2:Qual2  12
      Val3|Val2|Row1  Index   Fam:Qual Fam2:Qual2  13
  33. Example: Managing Cleanup
      ROW   FAMILY  QUALIFIER  TS  VALUE
      Row1  Fam     Qual       10  val1
      Row1  Fam2    Qual2      12  val2
      Row1  Fam     Qual       13  val3
      Row1  Fam     Qual       11  val4
  34. Example: Managing Cleanup
      ROW   FAMILY  QUALIFIER  TS  VALUE
      Row1  Fam     Qual       10  val1
      Row1  Fam     Qual       11  val4
      Row1  Fam2    Qual2      12  val2
      Row1  Fam     Qual       13  val3
  35. Example: Managing Cleanup
      ROW             FAMILY  QUALIFIER            TS
      Val1|Row1       Index   Fam:Qual             10
      Val4|Row1       Index   Fam:Qual             11
      Val4|Val2|Row1  Index   Fam:Qual Fam2:Qual2  12
      Val1|Val2|Row1  Index   Fam:Qual Fam2:Qual2  12
      Val3|Val2|Row1  Index   Fam:Qual Fam2:Qual2  13
  36. Example: Managing Cleanup
      ROW             FAMILY  QUALIFIER            TS
      Val1|Row1       Index   Fam:Qual             10
      Val4|Row1       Index   Fam:Qual             11
      Val4|Val2|Row1  Index   Fam:Qual Fam2:Qual2  12
      Val1|Val2|Row1  Index   Fam:Qual Fam2:Qual2  12
      Val3|Val2|Row1  Index   Fam:Qual Fam2:Qual2  13
  37. Managing Cleanup • History “roll up” • Out-of-order Updates • Point-in-time correctness • Multiple Timestamps per Mutation • Delete vs. DeleteColumn vs. DeleteFamily Surprisingly hard!
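The out-of-order example from the cleanup slides can be replayed with a small simulation. This is a sketch under stated assumptions, not the Phoenix implementation: `CleanupSketch`, `Cell`, `indexHistory`, and `staleEntries` are hypothetical names, and the index row key is simply assumed to be the indexed values joined with `|` followed by the primary row key, as in the slide tables.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

final class CleanupSketch {
    /** One versioned cell of the primary row (hypothetical type for this sketch). */
    static final class Cell {
        final String column;
        final long ts;
        final String value;

        Cell(String column, long ts, String value) {
            this.column = column;
            this.ts = ts;
            this.value = value;
        }
    }

    /** Index row key that should be visible at each timestamp, regardless of arrival order. */
    static SortedMap<Long, String> indexHistory(String rowKey, List<String> indexedColumns, List<Cell> cells) {
        List<Cell> byTimestamp = new ArrayList<>(cells);
        byTimestamp.sort(Comparator.comparingLong((Cell c) -> c.ts));
        Map<String, String> state = new HashMap<>();      // row state rolled forward in time
        SortedMap<Long, String> history = new TreeMap<>();
        for (Cell c : byTimestamp) {
            state.put(c.column, c.value);
            StringBuilder key = new StringBuilder();
            for (String column : indexedColumns) {
                String v = state.get(column);
                if (v != null) {
                    key.append(v).append('|');
                }
            }
            history.put(c.ts, key.append(rowKey).toString());
        }
        return history;
    }

    /** Entries written before a late update that no longer match the recomputed history. */
    static SortedMap<Long, String> staleEntries(SortedMap<Long, String> before, SortedMap<Long, String> after) {
        SortedMap<Long, String> stale = new TreeMap<>();
        for (Map.Entry<Long, String> e : before.entrySet()) {
            if (!e.getValue().equals(after.get(e.getKey()))) {
                stale.put(e.getKey(), e.getValue());
            }
        }
        return stale;
    }
}
```

Feeding in the three original cells and then the late `val4` cell at timestamp 11 yields new index entries at timestamps 11 and 12, and flags the old `val1|val2|Row1` entry at timestamp 12 as the one that has to be cleaned up.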
  38. Phoenix Index Builder • Much simpler than full index management • Hides cleanup considerations • Abstracted access to local state
      public interface IndexCodec {
        public void initialize(RegionCoprocessorEnvironment env);
        public Iterable<IndexUpdate> getIndexDeletes(TableState state);
        public Iterable<IndexUpdate> getIndexUpserts(TableState state);
      }
  39. Phoenix Index Codec
  40. Dude, where’s my data? Ensuring Correctness
  41. HBase ACID • Does NOT give you: – Cross-row consistency – Cross-table consistency • Does give you: – Durable data on success – Visibility on success without partial rows
  42. Key Observation “Secondary indexing is inherently an easier problem than full transactions… secondary index updates are idempotent.” - Lars Hofhansl
  43. Idempotent Index Updates • Doesn’t need full transactions • Replay as many times as needed • Can tolerate a little lag – As long as we get the order right
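Why replay is safe can be illustrated with a minimal sketch, assuming each index update carries an explicit timestamp so that writing the same (row, timestamp) cell again simply overwrites it with identical content. The class and field names below are hypothetical and the "table" is just an in-memory map.

```java
import java.util.List;
import java.util.TreeMap;

final class IdempotentReplaySketch {
    /** A single index mutation with an explicit timestamp (hypothetical type). */
    static final class IndexUpdate {
        final String indexRow;
        final long ts;
        final boolean delete;

        IndexUpdate(String indexRow, long ts, boolean delete) {
            this.indexRow = indexRow;
            this.ts = ts;
            this.delete = delete;
        }
    }

    // Index "table": "indexRow@ts" -> "PUT" or "DELETE".
    // Keying on (row, timestamp) is what makes a repeated write a no-op.
    private final TreeMap<String, String> cells = new TreeMap<>();

    /** Applies a batch; calling this again with the same batch leaves the state unchanged. */
    void apply(List<IndexUpdate> batch) {
        for (IndexUpdate u : batch) {
            cells.put(u.indexRow + "@" + u.ts, u.delete ? "DELETE" : "PUT");
        }
    }

    /** Copy of the current state, for comparing before and after a replay. */
    TreeMap<String, String> snapshot() {
        return new TreeMap<>(cells);
    }
}
```

Applying a batch once or several times produces the same snapshot, which is the property that lets recovery replay index updates without a full transaction protocol.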
  44. Failure Recovery • Custom WALEditCodec – Encodes index updates – Supports compressed WAL • Custom WAL Reader – Replay index updates from WAL
      <property>
        <name>hbase.regionserver.wal.codec</name>
        <value>o.a.h.hbase.regionserver.wal.IndexedWALEditCodec</value>
      </property>
      <property>
        <name>hbase.regionserver.hlog.reader.impl</name>
        <value>o.a.h.hbase.regionserver.wal.IndexedHLogReader</value>
      </property>
  45. Failure Situations • Any time before WAL, client replay • Any time after WAL, HBase replay • All-or-nothing
  46. Failure #1: Before WAL (diagram labels: Client, HRegion, RegionCoprocessorHost, WAL, RegionCoprocessorHost, MemStore)
  47. Failure #1: Before WAL (diagram labels: Client, HRegion, RegionCoprocessorHost, WAL, RegionCoprocessorHost, MemStore) No problem! No data is stored in the WAL, client just retries entire update.
  48. Failure #2: After WAL (diagram labels: Client, HRegion, RegionCoprocessorHost, WAL, RegionCoprocessorHost, MemStore)
  49. Failure #2: After WAL (diagram labels: Client, HRegion, RegionCoprocessorHost, WAL, RegionCoprocessorHost, MemStore) WAL replayed via usual replay mechanisms
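The two failure cases can be mimicked with a very small in-memory model. It is only a sketch of the all-or-nothing idea, assuming the primary edit and its index update are logged together in one WAL entry; `WalRecoverySketch`, `FailAt`, and the `value|row` index key are hypothetical and do not reflect the actual HBase or Phoenix classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

final class WalRecoverySketch {
    enum FailAt { NONE, BEFORE_WAL, AFTER_WAL }

    /** One logged edit: the primary write plus the index row derived from it. */
    static final class WalEntry {
        final String row;
        final String value;
        final String indexRow;

        WalEntry(String row, String value, String indexRow) {
            this.row = row;
            this.value = value;
            this.indexRow = indexRow;
        }
    }

    final List<WalEntry> wal = new ArrayList<>();
    final TreeMap<String, String> memStore = new TreeMap<>();
    final TreeMap<String, String> indexTable = new TreeMap<>();

    /** Returns true if the write was acknowledged to the client. */
    boolean write(String row, String value, FailAt failAt) {
        if (failAt == FailAt.BEFORE_WAL) {
            return false; // nothing is durable yet, so the client retries the whole update
        }
        WalEntry entry = new WalEntry(row, value, value + "|" + row);
        wal.add(entry); // primary edit and index update become durable together
        if (failAt == FailAt.AFTER_WAL) {
            return false; // simulated crash before the tables were updated
        }
        applyEntry(entry);
        return true;
    }

    /** Recovery: re-apply every logged entry; safe because the updates are idempotent. */
    void replayWal() {
        for (WalEntry e : wal) {
            applyEntry(e);
        }
    }

    private void applyEntry(WalEntry e) {
        indexTable.put(e.indexRow, e.row);
        memStore.put(e.row, e.value);
    }
}
```

A failure before the WAL leaves all three structures empty, while a failure after the WAL is repaired by `replayWal()`, which restores both the primary row and its index entry.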
  50. Agenda • About • Other Indexing Frameworks • Immutable Indexes • Mutable Indexes • Roadmap
  51. Roadmap • Next release of Phoenix • Performance testing • Increased adoption • Adding to HBase (?)
  52. Open Source! • Main: https://github.com/forcedotcom/phoenix • Indexing: https://github.com/forcedotcom/phoenix/tree/mutable-si
  53. (obligatory hiring slide) We’re Hiring!
  54. Questions? Comments? jyates@salesforce.com @jesse_yates
