H base vs hive srp vs analytics 2-14-2012
1. HBase vs. Hive
Philip Wickline
Chief Technology Officer
Hadapt
2. Goals
Brief introduction to the differences between
transactional/operational and analytical systems
Understand when to use Hive and when to use HBase and why
5. Differences of Purpose : “Transaction Processing”
Operational systems
• Optimized for small, short, random-access reads and writes
• E.g. record that an employee invested $100 in an S&P 500 index
fund in his 401(k) *or* record that a user posted something on
another user's “wall”
Traditional DB examples
• Oracle
• MySQL
NoSQL Examples
• HBase
• MongoDB
• Cassandra
6. Differences of Purpose: Analytics
Analytics
• Optimized for read-only computations about large amounts of
data
• E.g. compute the average amount invested in bond funds and
stock funds for all employees at all employers over the last 5
years
DB Examples
• Netezza
• Vertica
NoSQL Examples
• Hive
• Pig
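The contrast between the two access patterns can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the in-memory list `investments` and the functions `record_investment` and `average_by_fund_type` are hypothetical names, not part of HBase, Hive, or any system named on the slides.

```python
from statistics import mean

# Hypothetical in-memory stand-in for a table of 401(k) investments.
# Each record is (employee_id, fund_type, amount_usd).
investments = []

def record_investment(employee_id, fund_type, amount_usd):
    """Operational access: one small write that touches a single record."""
    investments.append((employee_id, fund_type, amount_usd))

def average_by_fund_type():
    """Analytical access: a read-only pass over every record."""
    by_type = {}
    for _employee_id, fund_type, amount in investments:
        by_type.setdefault(fund_type, []).append(amount)
    return {fund_type: mean(amounts) for fund_type, amounts in by_type.items()}

record_investment("emp-1", "stock", 100)
record_investment("emp-2", "stock", 300)
record_investment("emp-2", "bond", 50)
print(average_by_fund_type())
```

The write touches one record regardless of table size, while the average has to read the whole table, which is why the two workloads favor different storage designs.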
7. HBase Data Model : Conceptual
From the BigTable paper:
“a sparse, distributed, persistent multi-dimensional sorted map”
(row : bytestring, column family : bytestring, column : bytestring,
time : int64) -> byte string
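A minimal sketch of that map in Python may make the definition concrete. The class `SortedCellMap` and its methods are hypothetical names chosen for illustration, not the HBase client API, and timestamps are ordered ascending here purely for simplicity.

```python
from bisect import bisect_left, insort

class SortedCellMap:
    """Toy (row, family, column, timestamp) -> bytes map, kept in key order."""

    def __init__(self):
        self._keys = []    # sorted list of (row, family, column, timestamp)
        self._values = {}  # key tuple -> value bytes

    def put(self, row, family, column, timestamp, value):
        key = (row, family, column, timestamp)
        if key not in self._values:
            insort(self._keys, key)  # keep keys sorted on insert
        self._values[key] = value

    def get(self, row, family, column, timestamp):
        # Sparse: a cell that was never written simply has no entry.
        return self._values.get((row, family, column, timestamp))

    def scan_row(self, row):
        """Yield every cell of one row; sorted keys make them adjacent."""
        i = bisect_left(self._keys, (row,))
        while i < len(self._keys) and self._keys[i][0] == row:
            key = self._keys[i]
            yield key, self._values[key]
            i += 1

cells = SortedCellMap()
cells.put(b"key_1", b"cf_a", b"c_i", 15, b"foo")
cells.put(b"key_1", b"cf_a", b"c_ii", 15, b"bar")
cells.put(b"key_2", b"cf_a", b"c_ii", 4, b"baz")
print(cells.get(b"key_1", b"cf_a", b"c_i", 15))
print([value for _key, value in cells.scan_row(b"key_1")])
```

The sketch covers "sparse", "multi-dimensional", and "sorted"; the "distributed" and "persistent" parts of the definition are deliberately left out.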
9. Hive Data Model : Conceptual
Traditional Relational Tables
CUSTKEY  NAME     ADDRESS         NATIONKEY  PHONE         ACCTBAL     COMMENT
451234   NEWCORP  196 Broadway …  1          111-555-1212  $1,231,285  NULL
887765   ACME     1 Main st. …    2          222-555-1212  $46,945     “Top customer”
10. HBase Data Model : Physical
Every cell stored with row, family, column and timestamp
Allows fast lookup with low copy overhead
BUT
Space inefficient (optional compression available) and inefficient
to scan
“key_1” “cf_a” “c_i” 15 “foo”
“key_1” “cf_a” “c_ii” 15 “bar”
“key_2” “cf_a” “c_ii” 4 “baz”
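To see why this layout is space inefficient, the three cells above can be measured roughly in Python. This is a back-of-the-envelope sketch: it assumes UTF-8 strings and an 8-byte timestamp, ignores any additional per-cell bookkeeping and compression, and the helper `key_and_value_bytes` is a hypothetical name.

```python
# Cells from the slide: (row, family, column, timestamp, value).
cells = [
    ("key_1", "cf_a", "c_i", 15, "foo"),
    ("key_1", "cf_a", "c_ii", 15, "bar"),
    ("key_2", "cf_a", "c_ii", 4, "baz"),
]

TIMESTAMP_BYTES = 8  # assumption: timestamp stored as an int64

def key_and_value_bytes(cell):
    """Return (key bytes, value bytes) for one cell under the assumptions above."""
    row, family, column, _timestamp, value = cell
    key_size = (len(row.encode()) + len(family.encode())
                + len(column.encode()) + TIMESTAMP_BYTES)
    return key_size, len(value.encode())

total_key = sum(key_and_value_bytes(cell)[0] for cell in cells)
total_value = sum(key_and_value_bytes(cell)[1] for cell in cells)
print(f"key bytes: {total_key}, value bytes: {total_value}")
```

With values this short, the repeated row, family, and column identifiers take up far more space than the data itself.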
11. Hive Data Model : Physical
Depends on the underlying storage files
Can use flat text files, RCFiles, or even HBase for storage
Standard Row Storage
C_1 C_2 C_3 C_4
11 12 13 14
21 22 23 24
31 32 33 34
41 42 43 44
51 52 53 54
12. Hive Data Model : RCFile
Break into row groups, and then store as columns
Row Group 1
C_1 11 21 31
C_2 12 22 32
C_3 13 23 33
C_4 14 24 34
Row Group 2
C_1 41 51
C_2 42 52
C_3 43 53
C_4 44 54
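The row-group transformation above can be sketched in Python. This illustrates the layout idea only, not the actual RCFile on-disk format; the function `to_row_groups` and the group size of 3 are assumptions chosen to reproduce the slide's example.

```python
def to_row_groups(rows, group_size):
    """Split rows into groups of `group_size`, then store each group by column."""
    groups = []
    for start in range(0, len(rows), group_size):
        group = rows[start:start + group_size]
        # zip(*group) transposes the group's rows into columns.
        groups.append([list(column) for column in zip(*group)])
    return groups

# The row-oriented table from the "Standard Row Storage" slide.
rows = [
    [11, 12, 13, 14],
    [21, 22, 23, 24],
    [31, 32, 33, 34],
    [41, 42, 43, 44],
    [51, 52, 53, 54],
]

groups = to_row_groups(rows, group_size=3)
for number, group in enumerate(groups, start=1):
    print(f"Row Group {number}: {group}")

# Reading only column C_2 touches one list per row group.
c_2 = [value for group in groups for value in group[1]]
print(c_2)
```

A query that needs only C_2 reads one column list per row group instead of every full row, which is the benefit for analytical scans.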
13. Informal Performance Comparison
                        Hive                             HBase
Insert speed            Batch                            Fast!
Update speed            NA                               Fast!
Lookup speed            MR lower bound (10s of seconds)  Fast!
Data warehouse queries  15x faster on one test           Uh oh