Introduction to Apache HBase,
MapR Tables, and Security
Agenda
 HBase Overview
 HBase APIs
 MapR Tables
 Example
 Securing tables
What's HBase??
 A NoSQL database
– Synonym for ‘non-traditional’ database
 A distributed columnar data store
– Storage layout implies performance characteristics
 The “Hadoop” database
 A semi-structured database
– No rigid requirements to define columns or even data types in advance
– It’s all bytes to HBase
 A persistent sorted Map of Maps
– Programmer's view
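A rough sketch of that programmer's view in Java (conceptual only, not how HBase is implemented; byte[] keys would need a comparator in a real Map):

// rowKey -> (column family -> (column qualifier -> (version -> value)))
SortedMap<byte[],                        // row key, kept sorted
  SortedMap<byte[],                      // column family
    SortedMap<byte[],                    // column qualifier
      SortedMap<Long, byte[]>>>> table;  // version -> cell value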
Column Oriented
 Row is indexed by a key
– Data stored sorted by key
 Data is stored by columns grouped into column families
– Each family is a file of column values laid out in sorted order by row key
– Contrast this to a traditional row-oriented database, where rows are stored together with fixed space allocated for each row
[Diagram: a table keyed by customer id (row keys axxx…gxxx); column family CF1 holds customer address data and CF2 holds customer order data, each with sparsely populated columns colA–colC.]
HBase Data Model – Row Keys
 Row keys identify the rows in an HBase table.
[Table: sample rows with keys axxx…sxxx stored in sorted order and grouped into key ranges R1 (axxx–gxxx), R2 (hxxx–jxxx), and R3 (kxxx–sxxx); each row populates only some of columns colA–colC in CF1 and colA–colD in CF2.]
Rows are Stored in Sorted Order
 Sorting of row keys is based upon binary values
– Sort is lexicographic at the byte level
– Comparison is "left to right"
 Example:
– Sort order for Strings 1, 2, 3, …, 99, 100:
 1, 10, 100, 11, 12, …, 2, 20, 21, …, 9, 91, 92, …, 98, 99
– Sort order for Strings 001, 002, 003, …, 099, 100:
 001, 002, 003, …, 099, 100
– What if the row keys were numbers converted to fixed-size binary?
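A small sketch of the difference (Bytes is the real org.apache.hadoop.hbase.util.Bytes helper; the numeric-ordering claim for binary keys assumes non-negative values):

// Strings sort lexicographically, byte by byte:
String[] keys = { "1", "2", "10", "100" };
java.util.Arrays.sort(keys);              // -> [ "1", "10", "100", "2" ]

// Fixed-size binary keys sort numerically for non-negative longs,
// because Bytes.toBytes(long) yields 8 big-endian bytes:
Bytes.compareTo(Bytes.toBytes(2L), Bytes.toBytes(10L));  // < 0: 2 sorts before 10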
Tables are split into Regions = contiguous keys
Source: Diagram from Lars George's HBase: The Definitive Guide.
 Tables are partitioned into key ranges (regions)
 Region= contiguous keys, served by nodes (RegionServers)
 Regions are spread across cluster: S1, S2…
[Diagram: Region 1 serves the key range axxx–gxxx and Region 2 serves Lxxx–zxxx; each region holds the CF1 and CF2 column data for its key range, and a region server hosts one or more regions (e.g., the server for Regions 2 and 3).]
HBase Data Model – Cells
 Value for each cell is specified by complete coordinates:
– RowKey → Column Family → Column → Version : Value
– Key:CF:Col:Version:Value
RowKey | CF:Qualifier | Version     | Value
smithj | Data:street  | 12734567800 | Main street
(The CF:Qualifier plus version together form the column key.)
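From the client side, the same coordinates are visible on every cell of a Result (a sketch against the 0.94 KeyValue API; the table variable is assumed):

Result result = table.get(new Get(Bytes.toBytes("smithj")));
for (KeyValue kv : result.raw()) {         // one KeyValue per cell
  byte[] family = kv.getFamily();          // e.g. "Data"
  byte[] qualifier = kv.getQualifier();    // e.g. "street"
  long version = kv.getTimestamp();
  byte[] value = kv.getValue();            // e.g. "Main street"
}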
Sparsely-Populated Data
 Missing values: Cells remain empty and consume no storage
[Table: the same sample rows, grouped by region (Region 1: axxx–gxxx, Region 2: hxxx–jxxx, Region 3: kxxx–sxxx); cells with no value are simply absent and take up no storage.]
HBase Data Model Summary
 Efficient/Flexible
– Storage allocated for columns only as needed on a given row
• Great for sparse data
• Great for data of widely varying size
– Adding columns can be done at any time without impact
– Compression and versioning are built in and take advantage of column-family storage, which keeps like data together
 Highly Scalable
– Data is sharded amongst regions based upon key
• Regions are distributed in cluster
– Grouping by key = related data stored together
 Finding data
– Key implies region and server, column family implies file
– Efficiently get to any data by key
Agenda
 HBase Overview
 HBase APIs
 MapR Tables
 Example
 Securing tables
Basic Table Operations
 Create Table, define Column Families before data is imported
– But not the row keys or the number/names of columns
 Basic data access operations (CRUD):
– put: inserts data into rows (both add and update)
– get: accesses data from one row
– scan: accesses data from a range of rows
– delete: deletes a row, or a range of rows or columns
CRUD Operations Follow A Pattern (mostly)
 Most common pattern
– Instantiate object for an operation: Put put = new Put(key)
– Add or Set attributes to specify what you need: put.add(…)
– Execute the operation against the table: myTable.put(put)
// Insert value1 into rowKey in columnFamily:columnName1
Put put = new Put(rowKey);
put.add(columnFamily, columnName1, value1);
myTable.put(put);
// Retrieve values from rowKey in columnFamily:columnName1
Get get = new Get(rowKey);
get.addColumn(columnFamily, columnName1);
Result result = myTable.get(get);
Put Example
byte[] invTable = Bytes.toBytes("/path/Inventory");
byte[] stockCF = Bytes.toBytes("stock");
byte[] quantityCol = Bytes.toBytes("quantity");
long amt = 24L;
HTableInterface table = new HTable(hbaseConfig, invTable);
Put put = new Put(Bytes.toBytes("pens"));
put.add(stockCF, quantityCol, Bytes.toBytes(amt));
table.put(put);
[Result: in the Inventory table, row 'pens' now has column stock:quantity = 24.]
Put Operation – Add method
 Once a Put instance is created you call an add method on it
 Typically you add a value for a specific column in a column family
– ("column name" and "qualifier" mean the same thing)
 Optionally you can set a timestamp for a cell
Put add(byte[] family, byte[] qualifier, byte[] value)
Put add(byte[] family, byte[] qualifier, long ts, byte[] value)
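For example, the inventory write from the earlier Put example could pin the cell to an explicit version (the timestamp shown is illustrative):

// Same write as before, with a caller-supplied version timestamp.
put.add(stockCF, quantityCol, 1273456780L, Bytes.toBytes(amt));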
Put Operation – Single Put Example: adding multiple column values to a row
byte[] tableName = Bytes.toBytes("/path/Shopping");
byte[] itemsCF = Bytes.toBytes("items");
byte[] penCol = Bytes.toBytes("pens");
byte[] noteCol = Bytes.toBytes("notes");
byte[] eraserCol = Bytes.toBytes("erasers");
HTableInterface table = new HTable(hbaseConfig, tableName);
Put put = new Put(Bytes.toBytes("mike"));
put.add(itemsCF, penCol, Bytes.toBytes(5L));
put.add(itemsCF, noteCol, Bytes.toBytes(5L));
put.add(itemsCF, eraserCol, Bytes.toBytes(2L));
table.put(put);
Bytes class
http://hbase.apache.org/0.94/apidocs/org/apache/hadoop/hbase/util/Bytes.html
 org.apache.hadoop.hbase.util.Bytes
 Provides methods to convert Java types to and from byte[] arrays
 Support for
 String, boolean, short, int, long, double, and float
 Example:
byte[] bytesTablePath = Bytes.toBytes("/path/Shopping");
String myTable = Bytes.toString(bytesTablePath);
byte[] amountBytes = Bytes.toBytes(1000L);
long amount = Bytes.toLong(amountBytes);
Get Operation – Single Get Example
byte[] tableName = Bytes.toBytes("/path/Shopping");
byte[] itemsCF = Bytes.toBytes("items");
byte[] penCol = Bytes.toBytes("pens");
HTableInterface table = new HTable(hbaseConfig, tableName);
Get get = new Get(Bytes.toBytes("mike"));
get.addColumn(itemsCF, penCol);
Result result = table.get(get);
byte[] val = result.getValue(itemsCF, penCol);
System.out.println("Value: " + Bytes.toLong(val));
Get Operation – Add And Set methods
 Using just a get object will return everything for a row.
 To narrow down results call add
– addFamily: get all columns for a specific family
– addColumn: get a specific column
 To further narrow down results, specify more details via one or more set calls, then call add
– setTimeRange: retrieve columns within a specific range of version timestamps
– setTimestamp: retrieve columns with a specific timestamp
– setMaxVersions: set the number of versions of each column to be returned
– setFilter: add a filter
get.addColumn(columnFamilyName, columnName1);
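Combining a few of these, a minimal sketch (both setters can throw IOException; the row, family, and column values are illustrative):

// Fetch up to 3 versions of items:pens written within a time range.
Get get = new Get(Bytes.toBytes("mike"));
get.addColumn(itemsCF, penCol);
get.setTimeRange(0L, System.currentTimeMillis());  // [min, max)
get.setMaxVersions(3);
Result result = table.get(get);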
Result – Retrieve A Value From A Result
public static final byte[] ITEMS_CF = Bytes.toBytes("items");
public static final byte[] PENS_COL = Bytes.toBytes("pens");
Get g = new Get(Bytes.toBytes("Adam"));
g.addColumn(ITEMS_CF, PENS_COL);
Result result = table.get(g);
byte[] b = result.getValue(ITEMS_CF, PENS_COL);
long valueInColumn = Bytes.toLong(b);
http://hbase.apache.org/0.94/apidocs/org/apache/hadoop/hbase/client/Result.html
Row Key | Items:pens | Items:notepads | Items:erasers
Adam    | 18         | 7              | 10
Other APIs
 Not covering append, delete, and scan
 Not covering administrative APIs
Agenda
 HBase Overview
 HBase APIs
 MapR Tables
 Example
 Securing tables
Tables and Files in a Unified Storage Layer
[Diagram: three deployment stacks. Apache HBase on Hadoop: HBase JVM → HDFS JVM → ext3 filesystem → disks. Apache HBase on MapR Filesystem: HBase JVM (HDFS API) → MapR-FS → disks. M7 tables integrated into the filesystem: MapR-FS → disks, serving both the HBase API and the HDFS API directly.]
 MapR Filesystem is an integrated system
– Tables and files live in a unified filesystem, based on MapR's enterprise-grade storage layer
Portability
 MapR tables use the HBase data model and API
 Apache HBase applications work as-is on MapR tables
– No need to recompile
– No vendor lock-in
MapR M7 Table Storage
 Table regions live inside a MapR container
– Served by MapR fileserver service running on nodes
– HBase RegionServer and HBase Master services are not required
[Diagram: table regions grouped inside MapR containers distributed across cluster nodes; client nodes access them directly via the fileserver service.]
MapR Tables vs. HBase

Apache HBase:
• Compaction delays
• Manual administration
• Poor reliability
• Lengthy disaster recovery

MapR M7:
• No compaction delays
• Easy administration
• Strong consistency
• Rapid recovery
• 2x Cassandra performance
• 3x HBase performance

[Chart: MapR M7 vs. CDH benchmark under a mixed 50/50 read-write load.]
Agenda
 HBase Overview
 HBase APIs
 MapR Tables
 Example
 Securing tables
Example: Employee Database
 Column Family: Base
– lastName
– firstName
– address
– SSN
 Column Family: salary
– ‘dynamic’ columns
– year:salary
 Row key
– lastName:firstName? Not unique
– Unique id? Can’t search easily
– lastName:firstName:id? Can't search by id (see the scan sketch below)
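Why the composite key still supports name searches, as a sketch: keys sharing a prefix are contiguous, so a range Scan covers them. The stop key "smith;" works because ';' is the next ASCII character after ':'.

// Composite key: lastName + ":" + firstName + ":" + id
// Find every employee whose last name is "smith".
Scan scan = new Scan(Bytes.toBytes("smith:"), Bytes.toBytes("smith;"));
ResultScanner scanner = table.getScanner(scan);
for (Result r : scanner) {
  // one Result per matching employee row
}
scanner.close();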
Source: “employee class”
public class Employee {
  String key;
  String lastName, firstName, address;
  String ssn;
  Map<Integer, Integer> salary;
  …
}
Source: "schema"
byte[] BASE_CF = Bytes.toBytes("base");
byte[] SALARY_CF = Bytes.toBytes("salary");
byte[] FIRST_COL = Bytes.toBytes("firstName");
byte[] LAST_COL = Bytes.toBytes("lastName");
byte[] ADDRESS_COL = Bytes.toBytes("address");
byte[] SSN_COL = Bytes.toBytes("ssn");
String tableName = userdirectory + "/" + shortName;
byte[] TABLE_NAME = Bytes.toBytes(tableName);
Source: “get table”
HTablePool pool = new HTablePool();
table = pool.getTable(TABLE_NAME);
return table;
Source: “get row”
 Whole row
Get g = new Get(Bytes.toBytes(key));
Result result = getTable().get(g);
 Just base column family
Get g = new Get(Bytes.toBytes(key));
g.addFamily(BASE_CF);
Result result = getTable().get(g);
Source: “parse row”
Employee e = new Employee();
e.setKey(Bytes.toString(r.getRow()));
e.setLastName(getString(r, BASE_CF, LAST_COL));
e.setFirstName(getString(r,BASE_CF, FIRST_COL));
e.setAddress(getString(r,BASE_CF, ADDRESS_COL));
e.setSsn(getString(r,BASE_CF, SSN_COL));
String getString(Result r, byte[] cf, byte[] col) {
  byte[] b = r.getValue(cf, col);
  if (b != null)
    return Bytes.toString(b);
  else
    return "";
}
Source: “parse row”
// Get salary information.
Map<byte[], byte[]> m = r.getFamilyMap(SALARY_CF);
Iterator<Map.Entry<byte[], byte[]>> i = m.entrySet().iterator();
while (i.hasNext()) {
  Map.Entry<byte[], byte[]> entry = i.next();
  Integer year = Integer.parseInt(Bytes.toString(entry.getKey()));
  Integer amt = Integer.parseInt(Bytes.toString(entry.getValue()));
  e.getSalary().put(year, amt);
}
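The inverse of this parse step, as a sketch: each salary year is written back as a dynamic column in the salary family, stored as text to match the shell demo that follows.

Put put = new Put(Bytes.toBytes(e.getKey()));
for (Map.Entry<Integer, Integer> s : e.getSalary().entrySet()) {
  put.add(SALARY_CF,
      Bytes.toBytes(s.getKey().toString()),     // column name = year
      Bytes.toBytes(s.getValue().toString()));  // value = amount as text
}
table.put(put);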
Demo
 Create a table using MCS
 Create a table and column families using maprcli
$ maprcli table create -path /user/keys/employees
$ maprcli table cf create -path /user/keys/employees -cfname base
$ maprcli table cf create -path /user/keys/employees -cfname salary
Demo
 Populate with sample data using hbase shell
hbase> put '/user/keys/employees', 'k1', 'base:lastName', 'William'
> put '/user/keys/employees', 'k1', 'base:firstName', 'John'
> put '/user/keys/employees', 'k1', 'base:address', '123 street, springfield, VA'
> put '/user/keys/employees', 'k1', 'base:ssn', '999-99-9999'
> put '/user/keys/employees', 'k1', 'salary:2010', '90000'
> put '/user/keys/employees', 'k1', 'salary:2011', '91000'
> put '/user/keys/employees', 'k1', 'salary:2012', '92000'
> put '/user/keys/employees', 'k1', 'salary:2013', '93000'
…
Demo
 Fetch record using java program
$ ./run employees get k1
Use command get against table /user/keys/employees
Employee record:
Employee [key=k1, lastName=William, firstName=John, address=123 first street, springfield, VA, ssn=999-99-9999, salary={2010=90000, 2011=91000, 2012=92000, 2013=93000}]
Demo – run script
#!/bin/bash
export LD_LIBRARY_PATH=/opt/mapr/hadoop/hadoop-0.20.2/lib/native/Linux-amd64-64
java -cp `hbase classpath`:/home/kbotzum/development/exercises/target/exercises.jar person.botzum.hbase.Demo $*
What Didn't I Consider?
 Row Key
 Secondary ways of searching
– Other tables as indexes?
 Long term data evolution
– Avro?
– Protobufs?
 Security
– SSN is sensitive
– Salary looks kind of sensitive
Agenda
 HBase Overview
 HBase APIs
 MapR Tables
 Example
 Securing tables
MapR Tables Security
 Access Control Expressions (ACEs)
– Boolean logic to control access at table, column family, and column level
ACE Highlights
 Creator of table has all rights by default
– Others have none
 Can grant admin rights without granting read/write rights
 Defaults for column families set at table level
 Access to data depends on column family and column access
controls
 Boolean logic
MapR Tables Security
 Leverages MapR security when enabled
– Wire level authentication
– Wire level encryption
– Trivial to configure
• Most reasonable settings by default
• No Kerberos required!
– Portable
• No MapR specific APIs
Demo
 Enable cluster security
 Yes, that’s it!
– Now all Web UI and CLI access requires authentication
– Traffic is now authenticated using encrypted credentials
– Most traffic is encrypted and bulk data transfer traffic can be encrypted
# configure.sh -C hostname -Z hostname -secure -genkeys
Demo
 Fetch record using java program when not authenticated
$ ./run employees get k1
Use command get against table /user/keys/employees
14/03/14 18:42:39 ERROR fs.MapRFileSystem: Exception while trying to get currentUser
java.io.IOException: failure to login: Unable to obtain MapR credentials
Demo
 Fetch record using java program
$ maprlogin password
[Password for user 'keys' at cluster 'my.cluster.com': ]
MapR credentials of user 'keys' for cluster 'my.cluster.com' are written to '/tmp/maprticket_1000'
$ ./run employees get k1
Use command get against table /user/keys/employees
Employee record:
Employee [key=k1, lastName=William, firstName=John, address=123 first street, springfield, VA, ssn=999-99-9999, salary={2010=90000, 2011=91000, 2012=92000, 2013=93000}]
Demo
 Fetch record using java program as someone not authorized to access the table
$ maprlogin password
[Password for user 'fred' at cluster 'my.cluster.com': ]
MapR credentials of user 'fred' for cluster 'my.cluster.com' are written to '/tmp/maprticket_2001'
$ ./run /user/keys/employees get k1
Use command get against table /user/keys/employees
2014-03-14 18:49:20,2787 ERROR JniCommon fs/client/fileclient/cc/jni_common.cc:7318 Thread: 139674989631232 Error in DBGetRPC for table /user/keys/employees, error: Permission denied(13)
Exception in thread "main" java.io.IOException: Error: Permission denied(13)
Demo
 Set ACEs to allow read to base information but not salary
 Fetch whole record using java program
$ ./run /user/keys/employees get k1
Use command get against table /user/keys/employees
2014-03-14 18:53:15,0806 ERROR JniCommon fs/client/fileclient/cc/jni_common.cc:7318 Thread: 139715048077056 Error in DBGetRPC for table /user/keys/employees, error: Permission denied(13)
Exception in thread "main" java.io.IOException: Error: Permission denied(13)
Demo
 Set ACEs to allow read to base information but not salary
 Fetch just base record using java program
$ ./run employees getbase k1
Use command get against table /user/keys/employees
Employee record:
Employee [key=k1, lastName=William, firstName=John, address=123 first street, springfield, VA, ssn=999-99-9999, salary={}]
What Else Didn't I Consider?
References
 http://www.mapr.com/blog/getting-started-mapr-security-0
 http://www.mapr.com/
 http://hadoop.apache.org/
 http://hbase.apache.org/
 http://tech.flurry.com/2012/06/12/137492485/
 http://en.wikipedia.org/wiki/Lexicographical_order
 HBase in Action, Nick Dimiduk, Amandeep Khurana
 HBase: The Definitive Guide, Lars George
 Note: this presentation includes materials from the MapR HBase training classes
Questions?
HBase Architecture
What is HBase? (Cluster View)
 ZooKeeper (ZK)
 HMaster (HM)
 Region Servers (RS)
For MapR, there is less delineation between Control and Data Nodes.
[Diagram: master servers run a ZooKeeper ensemble, the NameNode, and HMaster (with a standby); slave servers each run a RegionServer alongside a DataNode.]
What is a Region?
 The basic partitioning/sharding unit of HBase.
 Each region is assigned a range of keys it is responsible for.
 Region servers serve data for reads and writes
[Diagram: the client consults ZooKeeper and the HMaster to locate regions, then reads and writes directly against RegionServers; each RegionServer hosts multiple regions, each holding one key range of the table.]

Mais conteúdo relacionado

Mais procurados

MapR M7: Providing an enterprise quality Apache HBase API
MapR M7: Providing an enterprise quality Apache HBase APIMapR M7: Providing an enterprise quality Apache HBase API
MapR M7: Providing an enterprise quality Apache HBase APImcsrivas
 
Hadoop World 2011: Advanced HBase Schema Design
Hadoop World 2011: Advanced HBase Schema DesignHadoop World 2011: Advanced HBase Schema Design
Hadoop World 2011: Advanced HBase Schema DesignCloudera, Inc.
 
Fast, Scalable, Streaming Applications with Spark Streaming, the Kafka API an...
Fast, Scalable, Streaming Applications with Spark Streaming, the Kafka API an...Fast, Scalable, Streaming Applications with Spark Streaming, the Kafka API an...
Fast, Scalable, Streaming Applications with Spark Streaming, the Kafka API an...Carol McDonald
 
Analyzing Real-World Data with Apache Drill
Analyzing Real-World Data with Apache DrillAnalyzing Real-World Data with Apache Drill
Analyzing Real-World Data with Apache Drilltshiran
 
Hadoop hbase mapreduce
Hadoop hbase mapreduceHadoop hbase mapreduce
Hadoop hbase mapreduceFARUK BERKSÖZ
 
Working with Delimited Data in Apache Drill 1.6.0
Working with Delimited Data in Apache Drill 1.6.0Working with Delimited Data in Apache Drill 1.6.0
Working with Delimited Data in Apache Drill 1.6.0Vince Gonzalez
 
Apache HBase 1.0 Release
Apache HBase 1.0 ReleaseApache HBase 1.0 Release
Apache HBase 1.0 ReleaseNick Dimiduk
 
Hw09 Practical HBase Getting The Most From Your H Base Install
Hw09   Practical HBase  Getting The Most From Your H Base InstallHw09   Practical HBase  Getting The Most From Your H Base Install
Hw09 Practical HBase Getting The Most From Your H Base InstallCloudera, Inc.
 
Apache HBase for Architects
Apache HBase for ArchitectsApache HBase for Architects
Apache HBase for ArchitectsNick Dimiduk
 
HBase Data Modeling and Access Patterns with Kite SDK
HBase Data Modeling and Access Patterns with Kite SDKHBase Data Modeling and Access Patterns with Kite SDK
HBase Data Modeling and Access Patterns with Kite SDKHBaseCon
 
Meethadoop
MeethadoopMeethadoop
MeethadoopIIIT-H
 
Spark SQL versus Apache Drill: Different Tools with Different Rules
Spark SQL versus Apache Drill: Different Tools with Different RulesSpark SQL versus Apache Drill: Different Tools with Different Rules
Spark SQL versus Apache Drill: Different Tools with Different RulesDataWorks Summit/Hadoop Summit
 
Drill into Drill – How Providing Flexibility and Performance is Possible
Drill into Drill – How Providing Flexibility and Performance is PossibleDrill into Drill – How Providing Flexibility and Performance is Possible
Drill into Drill – How Providing Flexibility and Performance is PossibleMapR Technologies
 
Free Code Friday: Drill 101 - Basics of Apache Drill
Free Code Friday: Drill 101 - Basics of Apache DrillFree Code Friday: Drill 101 - Basics of Apache Drill
Free Code Friday: Drill 101 - Basics of Apache DrillMapR Technologies
 
The Future of Hadoop: MapR VP of Product Management, Tomer Shiran
The Future of Hadoop: MapR VP of Product Management, Tomer ShiranThe Future of Hadoop: MapR VP of Product Management, Tomer Shiran
The Future of Hadoop: MapR VP of Product Management, Tomer ShiranMapR Technologies
 
Five major tips to maximize performance on a 200+ SQL HBase/Phoenix cluster
Five major tips to maximize performance on a 200+ SQL HBase/Phoenix clusterFive major tips to maximize performance on a 200+ SQL HBase/Phoenix cluster
Five major tips to maximize performance on a 200+ SQL HBase/Phoenix clustermas4share
 

Mais procurados (20)

MapR M7: Providing an enterprise quality Apache HBase API
MapR M7: Providing an enterprise quality Apache HBase APIMapR M7: Providing an enterprise quality Apache HBase API
MapR M7: Providing an enterprise quality Apache HBase API
 
M7 and Apache Drill, Micheal Hausenblas
M7 and Apache Drill, Micheal HausenblasM7 and Apache Drill, Micheal Hausenblas
M7 and Apache Drill, Micheal Hausenblas
 
Hadoop World 2011: Advanced HBase Schema Design
Hadoop World 2011: Advanced HBase Schema DesignHadoop World 2011: Advanced HBase Schema Design
Hadoop World 2011: Advanced HBase Schema Design
 
Introduction to Apache Drill
Introduction to Apache DrillIntroduction to Apache Drill
Introduction to Apache Drill
 
Fast, Scalable, Streaming Applications with Spark Streaming, the Kafka API an...
Fast, Scalable, Streaming Applications with Spark Streaming, the Kafka API an...Fast, Scalable, Streaming Applications with Spark Streaming, the Kafka API an...
Fast, Scalable, Streaming Applications with Spark Streaming, the Kafka API an...
 
Analyzing Real-World Data with Apache Drill
Analyzing Real-World Data with Apache DrillAnalyzing Real-World Data with Apache Drill
Analyzing Real-World Data with Apache Drill
 
Hadoop hbase mapreduce
Hadoop hbase mapreduceHadoop hbase mapreduce
Hadoop hbase mapreduce
 
Working with Delimited Data in Apache Drill 1.6.0
Working with Delimited Data in Apache Drill 1.6.0Working with Delimited Data in Apache Drill 1.6.0
Working with Delimited Data in Apache Drill 1.6.0
 
Apache HBase 1.0 Release
Apache HBase 1.0 ReleaseApache HBase 1.0 Release
Apache HBase 1.0 Release
 
Hadoop-Introduction
Hadoop-IntroductionHadoop-Introduction
Hadoop-Introduction
 
Hw09 Practical HBase Getting The Most From Your H Base Install
Hw09   Practical HBase  Getting The Most From Your H Base InstallHw09   Practical HBase  Getting The Most From Your H Base Install
Hw09 Practical HBase Getting The Most From Your H Base Install
 
Apache HBase for Architects
Apache HBase for ArchitectsApache HBase for Architects
Apache HBase for Architects
 
HBase Data Modeling and Access Patterns with Kite SDK
HBase Data Modeling and Access Patterns with Kite SDKHBase Data Modeling and Access Patterns with Kite SDK
HBase Data Modeling and Access Patterns with Kite SDK
 
Meethadoop
MeethadoopMeethadoop
Meethadoop
 
Spark SQL versus Apache Drill: Different Tools with Different Rules
Spark SQL versus Apache Drill: Different Tools with Different RulesSpark SQL versus Apache Drill: Different Tools with Different Rules
Spark SQL versus Apache Drill: Different Tools with Different Rules
 
Drill into Drill – How Providing Flexibility and Performance is Possible
Drill into Drill – How Providing Flexibility and Performance is PossibleDrill into Drill – How Providing Flexibility and Performance is Possible
Drill into Drill – How Providing Flexibility and Performance is Possible
 
Free Code Friday: Drill 101 - Basics of Apache Drill
Free Code Friday: Drill 101 - Basics of Apache DrillFree Code Friday: Drill 101 - Basics of Apache Drill
Free Code Friday: Drill 101 - Basics of Apache Drill
 
The Future of Hadoop: MapR VP of Product Management, Tomer Shiran
The Future of Hadoop: MapR VP of Product Management, Tomer ShiranThe Future of Hadoop: MapR VP of Product Management, Tomer Shiran
The Future of Hadoop: MapR VP of Product Management, Tomer Shiran
 
Five major tips to maximize performance on a 200+ SQL HBase/Phoenix cluster
Five major tips to maximize performance on a 200+ SQL HBase/Phoenix clusterFive major tips to maximize performance on a 200+ SQL HBase/Phoenix cluster
Five major tips to maximize performance on a 200+ SQL HBase/Phoenix cluster
 
01 hbase
01 hbase01 hbase
01 hbase
 

Destaque

MapR-DB – The First In-Hadoop Document Database
MapR-DB – The First In-Hadoop Document DatabaseMapR-DB – The First In-Hadoop Document Database
MapR-DB – The First In-Hadoop Document DatabaseMapR Technologies
 
Inside MapR's M7
Inside MapR's M7Inside MapR's M7
Inside MapR's M7Ted Dunning
 
Big Data Hadoop Briefing Hosted by Cisco, WWT and MapR: MapR Overview Present...
Big Data Hadoop Briefing Hosted by Cisco, WWT and MapR: MapR Overview Present...Big Data Hadoop Briefing Hosted by Cisco, WWT and MapR: MapR Overview Present...
Big Data Hadoop Briefing Hosted by Cisco, WWT and MapR: MapR Overview Present...ervogler
 
NoSQL Application Development with JSON and MapR-DB
NoSQL Application Development with JSON and MapR-DBNoSQL Application Development with JSON and MapR-DB
NoSQL Application Development with JSON and MapR-DBMapR Technologies
 
Architectural Overview of MapR's Apache Hadoop Distribution
Architectural Overview of MapR's Apache Hadoop DistributionArchitectural Overview of MapR's Apache Hadoop Distribution
Architectural Overview of MapR's Apache Hadoop Distributionmcsrivas
 
MapR Streams and MapR Converged Data Platform
MapR Streams and MapR Converged Data PlatformMapR Streams and MapR Converged Data Platform
MapR Streams and MapR Converged Data PlatformMapR Technologies
 
Apache Drill でたしなむ セルフサービスデータ探索 - 2014/11/06 Cloudera World Tokyo 2014 LTセッション
Apache Drill でたしなむ セルフサービスデータ探索 - 2014/11/06 Cloudera World Tokyo 2014 LTセッションApache Drill でたしなむ セルフサービスデータ探索 - 2014/11/06 Cloudera World Tokyo 2014 LTセッション
Apache Drill でたしなむ セルフサービスデータ探索 - 2014/11/06 Cloudera World Tokyo 2014 LTセッションMapR Technologies Japan
 
Zeta Architecture: The Next Generation Big Data Architecture
Zeta Architecture: The Next Generation Big Data ArchitectureZeta Architecture: The Next Generation Big Data Architecture
Zeta Architecture: The Next Generation Big Data ArchitectureMapR Technologies
 
구글을 지탱하는 기술 요약 - Google 검색
구글을 지탱하는 기술 요약 - Google 검색구글을 지탱하는 기술 요약 - Google 검색
구글을 지탱하는 기술 요약 - Google 검색혜웅 박
 
Agile project management with green hopper 6 blueprints
Agile project management with green hopper 6 blueprintsAgile project management with green hopper 6 blueprints
Agile project management with green hopper 6 blueprintsJaibeer Malik
 
Hadoop Summit 2012 | Improving HBase Availability and Repair
Hadoop Summit 2012 | Improving HBase Availability and RepairHadoop Summit 2012 | Improving HBase Availability and Repair
Hadoop Summit 2012 | Improving HBase Availability and RepairCloudera, Inc.
 
Hadoop World 2011: How Hadoop Revolutionized Business Intelligence and Advanc...
Hadoop World 2011: How Hadoop Revolutionized Business Intelligence and Advanc...Hadoop World 2011: How Hadoop Revolutionized Business Intelligence and Advanc...
Hadoop World 2011: How Hadoop Revolutionized Business Intelligence and Advanc...Cloudera, Inc.
 
Cloudera/Stanford EE203 (Entrepreneurial Engineer)
Cloudera/Stanford EE203 (Entrepreneurial Engineer)Cloudera/Stanford EE203 (Entrepreneurial Engineer)
Cloudera/Stanford EE203 (Entrepreneurial Engineer)Amr Awadallah
 
Sentiment Analysis Using Solr
Sentiment Analysis Using SolrSentiment Analysis Using Solr
Sentiment Analysis Using SolrPradeep Pujari
 
구글을 지탱하는 기술 요약 - Bigtable
구글을 지탱하는 기술 요약 - Bigtable구글을 지탱하는 기술 요약 - Bigtable
구글을 지탱하는 기술 요약 - Bigtable혜웅 박
 

Destaque (20)

MapR-DB – The First In-Hadoop Document Database
MapR-DB – The First In-Hadoop Document DatabaseMapR-DB – The First In-Hadoop Document Database
MapR-DB – The First In-Hadoop Document Database
 
Inside MapR's M7
Inside MapR's M7Inside MapR's M7
Inside MapR's M7
 
MapR & Skytree:
MapR & Skytree: MapR & Skytree:
MapR & Skytree:
 
Big Data Hadoop Briefing Hosted by Cisco, WWT and MapR: MapR Overview Present...
Big Data Hadoop Briefing Hosted by Cisco, WWT and MapR: MapR Overview Present...Big Data Hadoop Briefing Hosted by Cisco, WWT and MapR: MapR Overview Present...
Big Data Hadoop Briefing Hosted by Cisco, WWT and MapR: MapR Overview Present...
 
NoSQL Application Development with JSON and MapR-DB
NoSQL Application Development with JSON and MapR-DBNoSQL Application Development with JSON and MapR-DB
NoSQL Application Development with JSON and MapR-DB
 
Architectural Overview of MapR's Apache Hadoop Distribution
Architectural Overview of MapR's Apache Hadoop DistributionArchitectural Overview of MapR's Apache Hadoop Distribution
Architectural Overview of MapR's Apache Hadoop Distribution
 
MapR Streams and MapR Converged Data Platform
MapR Streams and MapR Converged Data PlatformMapR Streams and MapR Converged Data Platform
MapR Streams and MapR Converged Data Platform
 
Apache Drill でたしなむ セルフサービスデータ探索 - 2014/11/06 Cloudera World Tokyo 2014 LTセッション
Apache Drill でたしなむ セルフサービスデータ探索 - 2014/11/06 Cloudera World Tokyo 2014 LTセッションApache Drill でたしなむ セルフサービスデータ探索 - 2014/11/06 Cloudera World Tokyo 2014 LTセッション
Apache Drill でたしなむ セルフサービスデータ探索 - 2014/11/06 Cloudera World Tokyo 2014 LTセッション
 
Zeta Architecture: The Next Generation Big Data Architecture
Zeta Architecture: The Next Generation Big Data ArchitectureZeta Architecture: The Next Generation Big Data Architecture
Zeta Architecture: The Next Generation Big Data Architecture
 
Apache Spark & Hadoop
Apache Spark & HadoopApache Spark & Hadoop
Apache Spark & Hadoop
 
Hadoop
HadoopHadoop
Hadoop
 
구글을 지탱하는 기술 요약 - Google 검색
구글을 지탱하는 기술 요약 - Google 검색구글을 지탱하는 기술 요약 - Google 검색
구글을 지탱하는 기술 요약 - Google 검색
 
Agile project management with green hopper 6 blueprints
Agile project management with green hopper 6 blueprintsAgile project management with green hopper 6 blueprints
Agile project management with green hopper 6 blueprints
 
Hadoop Summit 2012 | Improving HBase Availability and Repair
Hadoop Summit 2012 | Improving HBase Availability and RepairHadoop Summit 2012 | Improving HBase Availability and Repair
Hadoop Summit 2012 | Improving HBase Availability and Repair
 
Spark vs Hadoop
Spark vs HadoopSpark vs Hadoop
Spark vs Hadoop
 
Hadoop World 2011: How Hadoop Revolutionized Business Intelligence and Advanc...
Hadoop World 2011: How Hadoop Revolutionized Business Intelligence and Advanc...Hadoop World 2011: How Hadoop Revolutionized Business Intelligence and Advanc...
Hadoop World 2011: How Hadoop Revolutionized Business Intelligence and Advanc...
 
Cloudera/Stanford EE203 (Entrepreneurial Engineer)
Cloudera/Stanford EE203 (Entrepreneurial Engineer)Cloudera/Stanford EE203 (Entrepreneurial Engineer)
Cloudera/Stanford EE203 (Entrepreneurial Engineer)
 
Sentiment Analysis Using Solr
Sentiment Analysis Using SolrSentiment Analysis Using Solr
Sentiment Analysis Using Solr
 
구글을 지탱하는 기술 요약 - Bigtable
구글을 지탱하는 기술 요약 - Bigtable구글을 지탱하는 기술 요약 - Bigtable
구글을 지탱하는 기술 요약 - Bigtable
 
Inside MapR's M7
Inside MapR's M7Inside MapR's M7
Inside MapR's M7
 

Semelhante a Introduction to Apache HBase, MapR Tables and Security

Apache Cassandra, part 1 – principles, data model
Apache Cassandra, part 1 – principles, data modelApache Cassandra, part 1 – principles, data model
Apache Cassandra, part 1 – principles, data modelAndrey Lomakin
 
Advance Hive, NoSQL Database (HBase) - Module 7
Advance Hive, NoSQL Database (HBase) - Module 7Advance Hive, NoSQL Database (HBase) - Module 7
Advance Hive, NoSQL Database (HBase) - Module 7Rohit Agrawal
 
Performing Data Science with HBase
Performing Data Science with HBasePerforming Data Science with HBase
Performing Data Science with HBaseWibiData
 
Chicago Data Summit: Apache HBase: An Introduction
Chicago Data Summit: Apache HBase: An IntroductionChicago Data Summit: Apache HBase: An Introduction
Chicago Data Summit: Apache HBase: An IntroductionCloudera, Inc.
 
Introduction To HBase
Introduction To HBaseIntroduction To HBase
Introduction To HBaseAnil Gupta
 
Apache Drill talk ApacheCon 2018
Apache Drill talk ApacheCon 2018Apache Drill talk ApacheCon 2018
Apache Drill talk ApacheCon 2018Aman Sinha
 
Hive User Meeting March 2010 - Hive Team
Hive User Meeting March 2010 - Hive TeamHive User Meeting March 2010 - Hive Team
Hive User Meeting March 2010 - Hive TeamZheng Shao
 
HBase.pptx
HBase.pptxHBase.pptx
HBase.pptxSadhik7
 
Getting Started with HBase
Getting Started with HBaseGetting Started with HBase
Getting Started with HBaseCarol McDonald
 
Introduction to HBase | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to HBase | Big Data Hadoop Spark Tutorial | CloudxLabIntroduction to HBase | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to HBase | Big Data Hadoop Spark Tutorial | CloudxLabCloudxLab
 
SE2016 Java Valerii Moisieienko "Apache HBase Workshop"
SE2016 Java Valerii Moisieienko "Apache HBase Workshop"SE2016 Java Valerii Moisieienko "Apache HBase Workshop"
SE2016 Java Valerii Moisieienko "Apache HBase Workshop"Inhacking
 
Hadoop and Hive Development at Facebook
Hadoop and Hive Development at  FacebookHadoop and Hive Development at  Facebook
Hadoop and Hive Development at FacebookS S
 

Semelhante a Introduction to Apache HBase, MapR Tables and Security (20)

NoSQL & HBase overview
NoSQL & HBase overviewNoSQL & HBase overview
NoSQL & HBase overview
 
Apache Cassandra, part 1 – principles, data model
Apache Cassandra, part 1 – principles, data modelApache Cassandra, part 1 – principles, data model
Apache Cassandra, part 1 – principles, data model
 
Advance Hive, NoSQL Database (HBase) - Module 7
Advance Hive, NoSQL Database (HBase) - Module 7Advance Hive, NoSQL Database (HBase) - Module 7
Advance Hive, NoSQL Database (HBase) - Module 7
 
Hbase
HbaseHbase
Hbase
 
Performing Data Science with HBase
Performing Data Science with HBasePerforming Data Science with HBase
Performing Data Science with HBase
 
Chicago Data Summit: Apache HBase: An Introduction
Chicago Data Summit: Apache HBase: An IntroductionChicago Data Summit: Apache HBase: An Introduction
Chicago Data Summit: Apache HBase: An Introduction
 
HBase.pptx
HBase.pptxHBase.pptx
HBase.pptx
 
Introduction To HBase
Introduction To HBaseIntroduction To HBase
Introduction To HBase
 
Apache Drill talk ApacheCon 2018
Apache Drill talk ApacheCon 2018Apache Drill talk ApacheCon 2018
Apache Drill talk ApacheCon 2018
 
Hspark index conf
Hspark index confHspark index conf
Hspark index conf
 
Hive User Meeting March 2010 - Hive Team
Hive User Meeting March 2010 - Hive TeamHive User Meeting March 2010 - Hive Team
Hive User Meeting March 2010 - Hive Team
 
Hbase
HbaseHbase
Hbase
 
HBase.pptx
HBase.pptxHBase.pptx
HBase.pptx
 
Apache HBase Workshop
Apache HBase WorkshopApache HBase Workshop
Apache HBase Workshop
 
Getting Started with HBase
Getting Started with HBaseGetting Started with HBase
Getting Started with HBase
 
Introduction to HBase | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to HBase | Big Data Hadoop Spark Tutorial | CloudxLabIntroduction to HBase | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to HBase | Big Data Hadoop Spark Tutorial | CloudxLab
 
Apache HBase
Apache HBase  Apache HBase
Apache HBase
 
SE2016 Java Valerii Moisieienko "Apache HBase Workshop"
SE2016 Java Valerii Moisieienko "Apache HBase Workshop"SE2016 Java Valerii Moisieienko "Apache HBase Workshop"
SE2016 Java Valerii Moisieienko "Apache HBase Workshop"
 
Valerii Moisieienko Apache hbase workshop
Valerii Moisieienko	Apache hbase workshopValerii Moisieienko	Apache hbase workshop
Valerii Moisieienko Apache hbase workshop
 
Hadoop and Hive Development at Facebook
Hadoop and Hive Development at  FacebookHadoop and Hive Development at  Facebook
Hadoop and Hive Development at Facebook
 

Mais de MapR Technologies

Converging your data landscape
Converging your data landscapeConverging your data landscape
Converging your data landscapeMapR Technologies
 
ML Workshop 2: Machine Learning Model Comparison & Evaluation
ML Workshop 2: Machine Learning Model Comparison & EvaluationML Workshop 2: Machine Learning Model Comparison & Evaluation
ML Workshop 2: Machine Learning Model Comparison & EvaluationMapR Technologies
 
Self-Service Data Science for Leveraging ML & AI on All of Your Data
Self-Service Data Science for Leveraging ML & AI on All of Your DataSelf-Service Data Science for Leveraging ML & AI on All of Your Data
Self-Service Data Science for Leveraging ML & AI on All of Your DataMapR Technologies
 
Enabling Real-Time Business with Change Data Capture
Enabling Real-Time Business with Change Data CaptureEnabling Real-Time Business with Change Data Capture
Enabling Real-Time Business with Change Data CaptureMapR Technologies
 
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...MapR Technologies
 
ML Workshop 1: A New Architecture for Machine Learning Logistics
ML Workshop 1: A New Architecture for Machine Learning LogisticsML Workshop 1: A New Architecture for Machine Learning Logistics
ML Workshop 1: A New Architecture for Machine Learning LogisticsMapR Technologies
 
Machine Learning Success: The Key to Easier Model Management
Machine Learning Success: The Key to Easier Model ManagementMachine Learning Success: The Key to Easier Model Management
Machine Learning Success: The Key to Easier Model ManagementMapR Technologies
 
Data Warehouse Modernization: Accelerating Time-To-Action
Data Warehouse Modernization: Accelerating Time-To-Action Data Warehouse Modernization: Accelerating Time-To-Action
Data Warehouse Modernization: Accelerating Time-To-Action MapR Technologies
 
Live Tutorial – Streaming Real-Time Events Using Apache APIs
Live Tutorial – Streaming Real-Time Events Using Apache APIsLive Tutorial – Streaming Real-Time Events Using Apache APIs
Live Tutorial – Streaming Real-Time Events Using Apache APIsMapR Technologies
 
Bringing Structure, Scalability, and Services to Cloud-Scale Storage
Bringing Structure, Scalability, and Services to Cloud-Scale StorageBringing Structure, Scalability, and Services to Cloud-Scale Storage
Bringing Structure, Scalability, and Services to Cloud-Scale StorageMapR Technologies
 
Live Machine Learning Tutorial: Churn Prediction
Live Machine Learning Tutorial: Churn PredictionLive Machine Learning Tutorial: Churn Prediction
Live Machine Learning Tutorial: Churn PredictionMapR Technologies
 
An Introduction to the MapR Converged Data Platform
An Introduction to the MapR Converged Data PlatformAn Introduction to the MapR Converged Data Platform
An Introduction to the MapR Converged Data PlatformMapR Technologies
 
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...MapR Technologies
 
Best Practices for Data Convergence in Healthcare
Best Practices for Data Convergence in HealthcareBest Practices for Data Convergence in Healthcare
Best Practices for Data Convergence in HealthcareMapR Technologies
 
Geo-Distributed Big Data and Analytics
Geo-Distributed Big Data and AnalyticsGeo-Distributed Big Data and Analytics
Geo-Distributed Big Data and AnalyticsMapR Technologies
 
MapR Product Update - Spring 2017
MapR Product Update - Spring 2017MapR Product Update - Spring 2017
MapR Product Update - Spring 2017MapR Technologies
 
3 Benefits of Multi-Temperature Data Management for Data Analytics
3 Benefits of Multi-Temperature Data Management for Data Analytics3 Benefits of Multi-Temperature Data Management for Data Analytics
3 Benefits of Multi-Temperature Data Management for Data AnalyticsMapR Technologies
 
Cisco & MapR bring 3 Superpowers to SAP HANA Deployments
Cisco & MapR bring 3 Superpowers to SAP HANA DeploymentsCisco & MapR bring 3 Superpowers to SAP HANA Deployments
Cisco & MapR bring 3 Superpowers to SAP HANA DeploymentsMapR Technologies
 
MapR and Cisco Make IT Better
MapR and Cisco Make IT BetterMapR and Cisco Make IT Better
MapR and Cisco Make IT BetterMapR Technologies
 
Evolving from RDBMS to NoSQL + SQL
Evolving from RDBMS to NoSQL + SQLEvolving from RDBMS to NoSQL + SQL
Evolving from RDBMS to NoSQL + SQLMapR Technologies
 

Mais de MapR Technologies (20)

Converging your data landscape
Converging your data landscapeConverging your data landscape
Converging your data landscape
 
ML Workshop 2: Machine Learning Model Comparison & Evaluation
ML Workshop 2: Machine Learning Model Comparison & EvaluationML Workshop 2: Machine Learning Model Comparison & Evaluation
ML Workshop 2: Machine Learning Model Comparison & Evaluation
 
Self-Service Data Science for Leveraging ML & AI on All of Your Data
Self-Service Data Science for Leveraging ML & AI on All of Your DataSelf-Service Data Science for Leveraging ML & AI on All of Your Data
Self-Service Data Science for Leveraging ML & AI on All of Your Data
 
Enabling Real-Time Business with Change Data Capture
Enabling Real-Time Business with Change Data CaptureEnabling Real-Time Business with Change Data Capture
Enabling Real-Time Business with Change Data Capture
 
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
 
ML Workshop 1: A New Architecture for Machine Learning Logistics
ML Workshop 1: A New Architecture for Machine Learning LogisticsML Workshop 1: A New Architecture for Machine Learning Logistics
ML Workshop 1: A New Architecture for Machine Learning Logistics
 
Machine Learning Success: The Key to Easier Model Management
Machine Learning Success: The Key to Easier Model ManagementMachine Learning Success: The Key to Easier Model Management
Machine Learning Success: The Key to Easier Model Management
 
Data Warehouse Modernization: Accelerating Time-To-Action
Data Warehouse Modernization: Accelerating Time-To-Action Data Warehouse Modernization: Accelerating Time-To-Action
Data Warehouse Modernization: Accelerating Time-To-Action
 
Live Tutorial – Streaming Real-Time Events Using Apache APIs
Live Tutorial – Streaming Real-Time Events Using Apache APIsLive Tutorial – Streaming Real-Time Events Using Apache APIs
Live Tutorial – Streaming Real-Time Events Using Apache APIs
 
Bringing Structure, Scalability, and Services to Cloud-Scale Storage
Bringing Structure, Scalability, and Services to Cloud-Scale StorageBringing Structure, Scalability, and Services to Cloud-Scale Storage
Bringing Structure, Scalability, and Services to Cloud-Scale Storage
 
Live Machine Learning Tutorial: Churn Prediction
Live Machine Learning Tutorial: Churn PredictionLive Machine Learning Tutorial: Churn Prediction
Live Machine Learning Tutorial: Churn Prediction
 
An Introduction to the MapR Converged Data Platform
An Introduction to the MapR Converged Data PlatformAn Introduction to the MapR Converged Data Platform
An Introduction to the MapR Converged Data Platform
 
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
 
Best Practices for Data Convergence in Healthcare
Best Practices for Data Convergence in HealthcareBest Practices for Data Convergence in Healthcare
Best Practices for Data Convergence in Healthcare
 
Geo-Distributed Big Data and Analytics
Geo-Distributed Big Data and AnalyticsGeo-Distributed Big Data and Analytics
Geo-Distributed Big Data and Analytics
 
MapR Product Update - Spring 2017
MapR Product Update - Spring 2017MapR Product Update - Spring 2017
MapR Product Update - Spring 2017
 
3 Benefits of Multi-Temperature Data Management for Data Analytics
3 Benefits of Multi-Temperature Data Management for Data Analytics3 Benefits of Multi-Temperature Data Management for Data Analytics
3 Benefits of Multi-Temperature Data Management for Data Analytics
 
Cisco & MapR bring 3 Superpowers to SAP HANA Deployments
Cisco & MapR bring 3 Superpowers to SAP HANA DeploymentsCisco & MapR bring 3 Superpowers to SAP HANA Deployments
Cisco & MapR bring 3 Superpowers to SAP HANA Deployments
 
MapR and Cisco Make IT Better
MapR and Cisco Make IT BetterMapR and Cisco Make IT Better
MapR and Cisco Make IT Better
 
Evolving from RDBMS to NoSQL + SQL
Evolving from RDBMS to NoSQL + SQLEvolving from RDBMS to NoSQL + SQL
Evolving from RDBMS to NoSQL + SQL
 

Último

Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...panagenda
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 

Último (20)

Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 

Introduction to Apache HBase, MapR Tables and Security

  • 12. Basic Table Operations
 Create Table, define Column Families before data is imported
– But not the row keys or the number/names of columns
 Basic data access operations (CRUD):
put    Inserts data into rows (both add and update)
get    Accesses data from one row
scan   Accesses data from a range of rows
delete Deletes a row, or a range of rows or columns
  • 13. CRUD Operations Follow A Pattern (mostly)
 Most common pattern
– Instantiate an object for the operation: Put put = new Put(key)
– Add or set attributes to specify what you need: put.add(…)
– Execute the operation against the table: myTable.put(put)

// Insert value1 into rowKey in columnFamily:columnName1
Put put = new Put(rowKey);
put.add(columnFamily, columnName1, value1);
myTable.put(put);

// Retrieve values from rowKey in columnFamily:columnName1
Get get = new Get(rowKey);
get.addColumn(columnFamily, columnName1);
Result result = myTable.get(get);
  • 14. Put Example

byte[] invTable = Bytes.toBytes("/path/Inventory");
byte[] stockCF = Bytes.toBytes("stock");
byte[] quantityCol = Bytes.toBytes("quantity");
long amt = 24L;
HTableInterface table = new HTable(hbaseConfig, invTable);
Put put = new Put(Bytes.toBytes("pens"));
put.add(stockCF, quantityCol, Bytes.toBytes(amt));
table.put(put);

// Resulting state of the Inventory table: row "pens", column stock:quantity = 24
  • 15. Put Operation – Add method
 Once a Put instance is created you call an add method on it
 Typically you add a value for a specific column in a column family
– ("column name" and "qualifier" mean the same thing)
 Optionally you can set a timestamp for a cell

Put add(byte[] family, byte[] qualifier, byte[] value)
Put add(byte[] family, byte[] qualifier, long ts, byte[] value)
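For illustration (not in the original deck), a minimal sketch of the timestamp-taking overload, reusing the Inventory-table names from the previous slide; the backdated timestamp and new quantity are arbitrary:

long ts = System.currentTimeMillis() - 3600000L;       // backdate this version by one hour
Put put = new Put(Bytes.toBytes("pens"));
put.add(stockCF, quantityCol, ts, Bytes.toBytes(25L)); // add(family, qualifier, ts, value)
table.put(put);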
  • 16. Put Operation – Single Put
Example adding multiple column values to a row

byte[] tableName = Bytes.toBytes("/path/Shopping");
byte[] itemsCF = Bytes.toBytes("items");
byte[] penCol = Bytes.toBytes("pens");
byte[] noteCol = Bytes.toBytes("notes");
byte[] eraserCol = Bytes.toBytes("erasers");
HTableInterface table = new HTable(hbaseConfig, tableName);
Put put = new Put(Bytes.toBytes("mike")); // Put takes a byte[] row key, not a String
put.add(itemsCF, penCol, Bytes.toBytes(5L));
put.add(itemsCF, noteCol, Bytes.toBytes(5L));
put.add(itemsCF, eraserCol, Bytes.toBytes(2L));
table.put(put);
  • 17. Bytes class
http://hbase.apache.org/0.94/apidocs/org/apache/hadoop/hbase/util/Bytes.html
 org.apache.hadoop.hbase.util.Bytes
 Provides methods to convert Java types to and from byte[] arrays
 Supports String, boolean, short, int, long, double, and float
 Example:

byte[] bytesTablePath = Bytes.toBytes("/path/Shopping");
String myTable = Bytes.toString(bytesTablePath);
byte[] amountBytes = Bytes.toBytes(1000L);
long amount = Bytes.toLong(amountBytes); // toLong takes the byte[], not the long
  • 18. Get Operation – Single Get Example

byte[] tableName = Bytes.toBytes("/path/Shopping");
byte[] itemsCF = Bytes.toBytes("items");
byte[] penCol = Bytes.toBytes("pens");
HTableInterface table = new HTable(hbaseConfig, tableName);
Get get = new Get(Bytes.toBytes("mike")); // Get also takes a byte[] row key
get.addColumn(itemsCF, penCol);
Result result = table.get(get);
byte[] val = result.getValue(itemsCF, penCol);
System.out.println("Value: " + Bytes.toLong(val));
  • 19. Get Operation – Add And Set methods
 Using just a Get object will return everything for a row
 To narrow down results, call add
– addFamily: get all columns for a specific family
– addColumn: get a specific column
 To further narrow down results, specify more details via one or more set calls, then call add
– setTimeRange: retrieve columns within a specific range of version timestamps
– setTimestamp: retrieve columns with a specific timestamp
– setMaxVersions: set the number of versions of each column to be returned
– setFilter: add a filter

get.addColumn(columnFamilyName, columnName1);
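A minimal sketch (my addition) combining the narrowing calls above, reusing the Shopping-table names from the earlier slides; setTimeRange and setMaxVersions declare IOException, left unhandled here as in the deck's other snippets:

Get get = new Get(Bytes.toBytes("mike"));
get.addFamily(itemsCF);                           // everything in the items family
get.setTimeRange(0L, System.currentTimeMillis()); // only versions written before now
get.setMaxVersions(3);                            // at most three versions per column
Result result = table.get(get);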
  • 20. Result – Retrieve A Value From A Result
http://hbase.apache.org/0.94/apidocs/org/apache/hadoop/hbase/client/Result.html

public static final byte[] ITEMS_CF = Bytes.toBytes("items");
public static final byte[] PENS_COL = Bytes.toBytes("pens");

Get g = new Get(Bytes.toBytes("Adam"));
g.addColumn(ITEMS_CF, PENS_COL);
Result result = table.get(g);
byte[] b = result.getValue(ITEMS_CF, PENS_COL);
long valueInColumn = Bytes.toLong(b);

// Sample row: Adam -> items:pens = 18, items:notepads = 7, items:erasers = 10
  • 21. Other APIs
 Not covering append, delete, and scan
 Not covering administrative APIs
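The deck skips these, but for orientation here is a minimal sketch of scan and delete against the same 0.94-era client API; the row-key range is arbitrary and the table/column names are carried over from the Shopping example:

// Scan a contiguous range of row keys: ["a", "n")
Scan scan = new Scan(Bytes.toBytes("a"), Bytes.toBytes("n"));
scan.addFamily(itemsCF);
ResultScanner scanner = table.getScanner(scan);
for (Result r : scanner) {
    System.out.println(Bytes.toString(r.getRow()));
}
scanner.close(); // always release the scanner

// Delete the newest version of one cell
Delete del = new Delete(Bytes.toBytes("mike"));
del.deleteColumn(itemsCF, penCol);
table.delete(del);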
  • 22. Agenda  HBase Overview  HBase APIs  MapR Tables  Example  Securing tables
  • 23. Tables and Files in a Unified Storage Layer
[Diagram: Apache HBase on Hadoop stacks the HBase JVM on the HDFS JVM on an ext3 filesystem on disks; Apache HBase on MapR runs the HBase JVM directly on MapR-FS; M7 integrates tables into MapR-FS itself, exposing both the HBase API and the HDFS API.]
MapR Filesystem is an integrated system – tables and files in a unified filesystem, based on MapR's enterprise-grade storage layer.
  • 24. Portability
 MapR tables use the HBase data model and API
 Apache HBase applications work as-is on MapR tables
– No need to recompile
– No vendor lock-in
  • 25. MapR M7 Table Storage
 Table regions live inside a MapR container
– Served by the MapR fileserver service running on nodes
– HBase RegionServer and HBase Master services are not required
[Diagram: client nodes talk directly to containers, each container holding table regions.]
  • 26. MapR Tables vs. HBase
 Apache HBase: compaction delays, manual administration, poor reliability, lengthy disaster recovery
 MapR tables: no compaction delays, easy administration, strong consistency, rapid recovery, 2x Cassandra performance, 3x HBase performance
  • 27. MapR M7 vs. CDH – Mixed Load (50-50)
  • 28. Agenda  HBase Overview  HBase APIs  MapR Tables  Example  Securing tables
  • 29. Example: Employee Database
 Column Family: Base
– lastName
– firstName
– address
– SSN
 Column Family: salary
– 'dynamic' columns
– year:salary
 Row key (see the sketch below)
– lastName:firstName? Not unique
– Unique id? Can't search easily
– lastName:firstName:id? Can't search by id
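A minimal sketch (my illustration, not the deck's) of the lastName:firstName:id option; the ":" separator, the zero-padded id, and the open table from the earlier examples are assumptions:

// Composite keys sort lexicographically, so all rows for "smith" are adjacent
// and can be read with a prefix scan; finding a row by id alone would still
// require a secondary index.
String lastName = "smith", firstName = "john", id = "00042"; // zero-pad so ids sort correctly
byte[] rowKey = Bytes.toBytes(lastName + ":" + firstName + ":" + id);
Put put = new Put(rowKey);
put.add(Bytes.toBytes("base"), Bytes.toBytes("firstName"), Bytes.toBytes(firstName));
table.put(put);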
  • 30. Source: "employee class"

public class Employee {
  String key;
  String lastName, firstName, address;
  String ssn;
  Map<Integer, Integer> salary;
  …
}
  • 31. Source: "schema"

byte[] BASE_CF = Bytes.toBytes("base");
byte[] SALARY_CF = Bytes.toBytes("salary");
byte[] FIRST_COL = Bytes.toBytes("firstName");
byte[] LAST_COL = Bytes.toBytes("lastName");
byte[] ADDRESS_COL = Bytes.toBytes("address");
byte[] SSN_COL = Bytes.toBytes("ssn");
String tableName = userdirectory + "/" + shortName;
byte[] TABLE_NAME = Bytes.toBytes(tableName);
  • 32. Source: "get table"

HTablePool pool = new HTablePool();
table = pool.getTable(TABLE_NAME);
return table;
  • 33. Source: "get row"
 Whole row
Get g = new Get(Bytes.toBytes(key));
Result result = getTable().get(g);
 Just the base column family
Get g = new Get(Bytes.toBytes(key));
g.addFamily(BASE_CF);
Result result = getTable().get(g);
  • 34. Source: "parse row"

Employee e = new Employee();
e.setKey(Bytes.toString(r.getRow()));
e.setLastName(getString(r, BASE_CF, LAST_COL));
e.setFirstName(getString(r, BASE_CF, FIRST_COL));
e.setAddress(getString(r, BASE_CF, ADDRESS_COL));
e.setSsn(getString(r, BASE_CF, SSN_COL));

String getString(Result r, byte[] cf, byte[] col) {
  byte[] b = r.getValue(cf, col);
  if (b != null) return Bytes.toString(b);
  else return "";
}
  • 35. Source: "parse row" (continued)

// get salary information
Map<byte[], byte[]> m = r.getFamilyMap(SALARY_CF);
Iterator<Map.Entry<byte[], byte[]>> i = m.entrySet().iterator();
while (i.hasNext()) {
  Map.Entry<byte[], byte[]> entry = i.next();
  Integer year = Integer.parseInt(Bytes.toString(entry.getKey()));
  Integer amt = Integer.parseInt(Bytes.toString(entry.getValue()));
  e.getSalary().put(year, amt);
}
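The deck shows only the read side of the dynamic salary columns. A minimal write-side sketch (my addition, reusing SALARY_CF, key, and getTable() from the surrounding source); the demo stores amounts as strings, which this mirrors:

// Add one dynamic column to the salary family: the qualifier is the year itself
Put put = new Put(Bytes.toBytes(key));
put.add(SALARY_CF, Bytes.toBytes("2014"), Bytes.toBytes("95000"));
getTable().put(put);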
  • 36. Demo
 Create a table using MCS
 Create a table and column families using maprcli

$ maprcli table create -path /user/keys/employees
$ maprcli table cf create -path /user/keys/employees -cfname base
$ maprcli table cf create -path /user/keys/employees -cfname salary
  • 37. Demo
 Populate with sample data using hbase shell

hbase> put '/user/keys/employees', 'k1', 'base:lastName', 'William'
> put '/user/keys/employees', 'k1', 'base:firstName', 'John'
> put '/user/keys/employees', 'k1', 'base:address', '123 street, springfield, VA'
> put '/user/keys/employees', 'k1', 'base:ssn', '999-99-9999'
> put '/user/keys/employees', 'k1', 'salary:2010', '90000'
> put '/user/keys/employees', 'k1', 'salary:2011', '91000'
> put '/user/keys/employees', 'k1', 'salary:2012', '92000'
> put '/user/keys/employees', 'k1', 'salary:2013', '93000'
…
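A quick verification step (my addition; both are standard hbase shell commands against the same table):

hbase> get '/user/keys/employees', 'k1'
hbase> scan '/user/keys/employees'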
  • 38. Demo
 Fetch record using java program

$ ./run employees get k1
Use command get against table /user/keys/employees
Employee record: Employee [key=k1, lastName=William, firstName=John, address=123 first street, springfield, VA, ssn=999-99-9999, salary={2010=90000, 2011=91000, 2012=92000, 2013=93000}]
  • 39. Demo – run script

#!/bin/bash
export LD_LIBRARY_PATH=/opt/mapr/hadoop/hadoop-0.20.2/lib/native/Linux-amd64-64
java -cp `hbase classpath`:/home/kbotzum/development/exercises/target/exercises.jar person.botzum.hbase.Demo $*
  • 40. What Didn't I Consider?
  • 41. What Didn't I Consider?
 Row Key
 Secondary ways of searching
– Other tables as indexes?
 Long term data evolution
– Avro?
– Protobufs?
 Security
– SSN is sensitive
– Salary looks kind of sensitive
  • 42. Agenda  HBase Overview  HBase APIs  MapR Tables  Example  Securing tables
  • 43. MapR Tables Security
 Access Control Expressions (ACEs)
– Boolean logic to control access at table, column family, and column level
  • 44. ACE Highlights
 Creator of a table has all rights by default
– Others have none
 Can grant admin rights without granting read/write rights
 Defaults for column families are set at the table level
 Access to data depends on column family and column access controls
 Boolean logic (a hypothetical CLI sketch follows)
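The demo that follows sets these ACEs through the MCS UI (per the editor's notes). As a hypothetical command-line equivalent, something like the maprcli calls below could grant base-family reads broadly while restricting salary reads; the -readperm flag and the "u:<user>" expression syntax are recalled from MapR documentation and should be verified against your release:

$ maprcli table cf edit -path /user/keys/employees -cfname base -readperm "u:keys | u:fred"
$ maprcli table cf edit -path /user/keys/employees -cfname salary -readperm "u:keys"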
  • 45. MapR Tables Security
 Leverages MapR security when enabled
– Wire level authentication
– Wire level encryption
– Trivial to configure
• Most reasonable settings by default
• No Kerberos required!
– Portable
• No MapR specific APIs
  • 46. Demo
 Enable cluster security
 Yes, that's it!
– Now all Web UI and CLI access requires authentication
– Traffic is now authenticated using encrypted credentials
– Most traffic is encrypted, and bulk data transfer traffic can be encrypted

# configure.sh -C hostname -Z hostname -secure -genkeys
  • 47. Demo
 Fetch record using java program when not authenticated

$ ./run employees get k1
Use command get against table /user/keys/employees
14/03/14 18:42:39 ERROR fs.MapRFileSystem: Exception while trying to get currentUser
java.io.IOException: failure to login: Unable to obtain MapR credentials
  • 48. Demo
 Fetch record using java program after authenticating

$ maprlogin password
[Password for user 'keys' at cluster 'my.cluster.com': ]
MapR credentials of user 'keys' for cluster 'my.cluster.com' are written to '/tmp/maprticket_1000'
$ ./run employees get k1
Use command get against table /user/keys/employees
Employee record: Employee [key=k1, lastName=William, firstName=John, address=123 first street, springfield, VA, ssn=999-99-9999, salary={2010=90000, 2011=91000, 2012=92000, 2013=93000}]
  • 49. Demo
 Fetch record using java program as someone not authorized to the table

$ maprlogin password
[Password for user 'fred' at cluster 'my.cluster.com': ]
MapR credentials of user 'fred' for cluster 'my.cluster.com' are written to '/tmp/maprticket_2001'
$ ./run /user/keys/employees get k1
Use command get against table /user/keys/employees
2014-03-14 18:49:20,2787 ERROR JniCommon fs/client/fileclient/cc/jni_common.cc:7318 Thread: 139674989631232 Error in DBGetRPC for table /user/keys/employees, error: Permission denied(13)
Exception in thread "main" java.io.IOException: Error: Permission denied(13)
  • 50. Demo
 Set ACEs to allow read of base information but not salary
 Fetch whole record using java program

$ ./run /user/keys/employees get k1
Use command get against table /user/keys/employees
2014-03-14 18:53:15,0806 ERROR JniCommon fs/client/fileclient/cc/jni_common.cc:7318 Thread: 139715048077056 Error in DBGetRPC for table /user/keys/employees, error: Permission denied(13)
Exception in thread "main" java.io.IOException: Error: Permission denied(13)
  • 51. Demo
 Set ACEs to allow read of base information but not salary
 Fetch just the base record using java program

$ ./run employees getbase k1
Use command get against table /user/keys/employees
Employee record: Employee [key=k1, lastName=William, firstName=John, address=123 first street, springfield, VA, ssn=999-99-9999, salary={}]
  • 52. What Else Didn't I Consider?
  • 53. References
 http://www.mapr.com/blog/getting-started-mapr-security-0
 http://www.mapr.com/
 http://hadoop.apache.org/
 http://hbase.apache.org/
 http://tech.flurry.com/2012/06/12/137492485/
 http://en.wikipedia.org/wiki/Lexicographical_order
 HBase in Action, Nick Dimiduk and Amandeep Khurana
 HBase: The Definitive Guide, Lars George
 Note: this presentation includes materials from the MapR HBase training classes
  • 54. Questions?
  • 55. HBase Architecture
  • 56. What is HBase? (Cluster View)
 ZooKeeper (ZK)
 HMaster (HM)
 Region Servers (RS)
[Diagram: master servers run ZooKeeper, the NameNode, and HMaster; slave servers each pair a Region Server with a Data Node.]
For MapR, there is less delineation between control and data nodes.
  • 57. What is a Region?
 The basic partitioning/sharding unit of HBase
 Each region is assigned a range of keys it is responsible for
 Region servers serve data for reads and writes
[Diagram: the client locates regions via ZooKeeper and the HMaster, then reads and writes region data directly on the Region Servers.]

Editor's Notes

  1. Let’s take a quick look at the relational database model versus non-relational database models. Most of us are familiar with Relational Database Management Systems (RDBMS). We’ll briefly compare the relational model to the column family oriented model in the context of big data. This will help us fully understand the structure of MapR Tables and their underlying concepts.
  2. In the relational model data is normalized: it is split into tables when stored, and then joined back together when queried. We will see that HBase has a different model. Relational databases brought us many benefits: they take care of persistence; they manage concurrency for transactions; SQL has become a de facto standard; relational databases provide lots of tools; they have become very important for integration of applications and for reporting; and many business rules map well to a tabular structure and relationships. Relational databases provide an efficient and robust structure for storing data, a standard model of persistence, and a standard language of data manipulation (SQL). They handle concurrency by controlling all access to data through transactions; this transactional mechanism has worked well to contain the complexity of concurrency. Shared databases have worked well for integration of applications. Relational databases have succeeded because they provide these benefits in a standard way.
  3. • Row-oriented: each row is indexed by a key that you can use for lookup (for example, the customer with the ID 1234). • Column-family oriented: each column family groups like data (customer address, order) within rows. You can think of a row as the join of all values in all column families. Grouping the data by key is central to running on a cluster and to sharding. The key acts as the atomic unit for updates.
  4. Data stored in the "big table" is located by its rowkey. This is like a primary key in a relational database. Records in HBase are stored in sorted order according to rowkey. This is a fundamental tenet of HBase and is also a critical semantic used in HBase schema design.
  5. Tables are divided into sequences of rows, by key range, called regions. These regions are then assigned to the data nodes in the cluster, called RegionServers. This scales read and write capacity by spreading the load across the cluster.
  6. If a cell is empty then it does not consume disk space. Sparseness provides schema flexibility: you can add columns later, with no need to transform the entire schema.
  7. Once you have created a table you define column families. Columns may be defined on the fly. You can define them ahead of time, but that is not common practice. That's it; you don't define rows ahead of time. Table operations are fairly simple: put inserts data into rows (both add and update), get accesses data from one row, and scan accesses data from a range of rows.
  8. As we go through the details of the HBase API, you will see that there is a pattern that is followed most of the time for CRUD operations. First you instantiate an object for the operation you're about to execute: put, get, scan or delete. Then you add details to that object and specify what you need from it. You do this by calling an add method and sometimes a set method. Once your object is specified with these attributes you are ready to execute the operation against a table. To do that you invoke the operation with the object you've prepared. For example, for a put operation you call table.put() and pass the put object you created as the parameter. Let's look at the Put operation now.
  9. Here is an example of a single put operation. Let's look at what all this means.
  10. Now that you have an instance of a put object for a specified row key, you should provide some details, specifically what value you need to insert or update. In general you add a value for a column that belongs to a column family. That's the most common case. Just like in the constructor for the Put object itself, you don't have to provide a timestamp, but there is a method that lets you control that if you need to, by providing a timestamp argument.
  11. This is the same thing as what we saw earlier, except that now we add several values to the same put object. Each call to add() specifies exactly one column, or, in combination with an optional timestamp, one single cell. This is just to show you that even though this is a single put operation you typically call add more than once. We saw that one of the add methods takes a KeyValue parameter, so let's look at the KeyValue class.
  12. Everything in HBase is stored as bytes. The Bytes class is a utility class that provides methods to convert Java types to and from byte[] arrays. The native Java types supported are String, boolean, short, int, long, double, and float. The HBase Bytes class is similar to the Java ByteBuffer class, but the HBase class performs all of its operations without instantiating new classes (and thus avoids garbage collection). Note to instructor: optionally, show the javadoc to point out what conforms and what doesn't conform to this pattern. There are other methods that are worth looking at, and we will do that in a later session after we've gone through CRUD operations.
  13. Here is an example of a single get operation. You can see it is following the pattern we mentioned earlier. The only notable difference is that we call addColumn instead of just an add. Let’s look at all this in detail now.
  14. You call add to specify what you want returned; this is similar to what we saw for Put, except that here you specify the family or column you are interested in. If you want to be more precise, then you should call one of the set methods. You can control what timestamp or time range you are interested in, and how many versions of the data you want to see. You can even add a filter, and we will talk about filters later as they deserve more than just passing attention.
  15. In this get operation we have narrowed things down to a specific column. Once we got the result back we invoke one of the convenience methods from Result, here getValue, to retrieve the value in the Result instance. To see more about the Result class go to http://hbase.apache.org/0.94/apidocs/org/apache/hadoop/hbase/client/Result.html. We've added and retrieved data, so now, to complete the CRUD cycle, we need to look at deleting data.
  16. (*) MapR takes things one step further by integrating table storage into MapR-FS, eliminating all JVM layers and interacting directly with disks for both file and table storage. The result is an enterprise-grade datastore for tables with all the reliability of a MapR cluster and no additional administrative burden: fewer layers and a unified namespace. Again, (*) MapR preserves the standard Hadoop and HBase APIs, so all ecosystem components continue to operate without modification. Fewer layers; single hop to data; no compactions, low I/O amplification; seamless splits, automatic merges; instant recovery. With the MapR M5 Edition of the Hadoop stack, the company basically pushed HDFS down into a distributed NFS file system, supporting all of the APIs of HDFS. With the MapR M7 Edition, the file system can handle not only small chunks of data but also small pieces of HBase tables. This eliminates some layers of Java virtualization, and, the way that MapR has implemented its code, all of the HBase APIs are supported, so HBase applications don't know they are using MapR's file system.
  17. In MapR, tables are part of the file system, so it's a single hop: the client goes straight to MapR-FS, which handles read/write operations against the file system directly. MapR Filesystem is an integrated system: tables and files in a unified filesystem, based on MapR's enterprise-grade storage layer. MapR tables use the HBase data model and API. Key differences between MapR tables and Apache HBase: tables are part of the MapR file system; there are no RegionServer and HBase Master daemons; write-ahead logs (WALs) are much smaller; there is no manual compaction and no major compaction delays; and region splits are seamless and require no manual intervention.
  18. MapR Filesystem provides strong consistency for table data and a high level of availability in a distributed environment, while also solving the common problems with other popular NoSQL options, such as compaction delays and manual administration.
  19. Can also use the hbase shell: create '/user/keys/e3', 'base', 'salary'
  20. hbase shell
  21. Use MCS to set ACEs
  22. Use MCS to set ACEs
  23. ACE on SSN column. Filtering out responses (coming soon in a fix).
  24. Let's review the HBase data model as a quick refresher of terms and concepts.