London Hadoop User Group
About Amazon Web Services
Deep experience in building and operating global web-scale systems
How did Amazon get into cloud computing?
Utility computing
On demand | Pay as you go | Uniform | Available
Compute, Storage, Security, Scaling, Database, Networking, Monitoring, Messaging, Workflow, DNS, Load Balancing, Backup, CDN
Cloud computing benefits
•  No Up-Front Capital Expense
•  Pay Only for What You Use
•  Self-Service Infrastructure
•  Easily Scale Up and Down
•  Improve Agility & Time-to-Market
•  Low Cost
•  Deploy
[Chart: Traditional IT capacity versus Elastic capacity, plotted as Capacity over Time against "Your IT needs". Four demand patterns are shown: On and Off, Fast Growth, Variable peaks, Predictable peaks. Labels on the traditional-capacity panels: WASTE, CUSTOMER DISSATISFACTION.]
[Chart: Number of EC2 Instances, 4/12/2008 to 4/20/2008]
40 servers to 5000 in 3 days
Chart annotations: Steady state of ~40 instances; Launch of Facebook modification; “Techcrunched”; EC2 scaled to peak of 5000 instances
AWS platform layers: Compute, Storage, Database, Networking, App Services, Deployment & Administration, AWS Global Infrastructure
Global Infrastructure
Regions: US-EAST (Virginia), US-WEST (N. California), US-WEST (Oregon), GOV CLOUD, SOUTH AMERICA (Sao Paulo), EU-WEST (Ireland), ASIA PAC (Singapore), ASIA PAC (Tokyo), ASIA PAC (Sydney)
Map legend: Region, Availability Zone
Customer Needs
•  Store Any Amount of Data
    –  Without Capacity Planning
•  Perform Complex Analysis on Any Data
    –  Scale on Demand
•  Store Data Securely
•  Decrease Time to Market
    –  Build Environments Quickly
•  Reduce Costs
    –  Reduce Capital Expenditure
•  Enable Global Reach
Ingestion | Integration
Elastic Block Store
High performance block storage device
1GB to 1TB in size
Mount as drives to instances with snapshot/cloning functionalities
Availability: 99.99%
Durability: 99.999999999%
Is a Web Store
Not a file system
No Single Points of Failure
Eventually consistent
Paradigm: Object store
Performance: Very Fast
Redundancy: Across Availability Zones
Security: Public Key / Private Key
Pricing: $0.095/GB/month
Typical use case: Write once, read many
Limits: 100 Buckets, Unlimited Storage, 5TB Objects
Simple Storage Service
Highly scalable object storage for the internet
1 byte to 5TB in size
99.999999999% durability
Objects in S3
Peak Requests: 830,000+ per second
[Chart: Total Number of Objects Stored in Amazon S3, Q4 2006 to Q4 2012. Values shown: 14 Billion, 40 Billion, 102 Billion, 262 Billion, 762 Billion, 1.3 Trillion.]
Glacier
Long term object archive
Extremely low cost per gigabyte
99.999999999% durability
Durability: 99.999999999%
Designed for Archival
Not a file system
Vaults & Archives
3-5 Hour Retrieval Time
Paradigm: Archive Store
Performance: Configurable - Low
Redundancy: Across Availability Zones
Security: Public Key / Private Key
Pricing: $0.011/GB/month
Typical use case: Write once, read infrequently (< 10% / Month)
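To make the two price points concrete, here is a minimal sketch that compares monthly storage cost at the S3 and Glacier per-gigabyte prices quoted on the slides. The 1,000 GB volume and the function name are illustrative assumptions, and request, retrieval, and transfer charges are ignored.

```python
# Per-GB monthly prices as quoted on the slides (2013 figures).
S3_PRICE_PER_GB_MONTH = 0.095
GLACIER_PRICE_PER_GB_MONTH = 0.011


def monthly_storage_cost(gigabytes, price_per_gb_month):
    """Storage-only cost for one month; ignores request and retrieval fees."""
    return round(gigabytes * price_per_gb_month, 2)


# Assumed example volume, not taken from the slides.
volume_gb = 1000

s3_cost = monthly_storage_cost(volume_gb, S3_PRICE_PER_GB_MONTH)
glacier_cost = monthly_storage_cost(volume_gb, GLACIER_PRICE_PER_GB_MONTH)
price_ratio = S3_PRICE_PER_GB_MONTH / GLACIER_PRICE_PER_GB_MONTH

print(f"S3:      ${s3_cost:.2f} per month")
print(f"Glacier: ${glacier_cost:.2f} per month")
print(f"S3 is roughly {price_ratio:.1f}x the Glacier price per GB")
```

The ratio only reflects storage at rest; the 3-5 hour retrieval time listed above is the trade-off for the lower price.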
Storage Lifecycle Integration
Structured Data Management
Database
Relational Database Service (RDS): Managed Oracle, MySQL & SQL Server
DynamoDB: Managed NoSQL Database
Amazon Redshift: Massively Parallel Petabyte Scale Data Warehouse
Database: Relational Database Service
Database-as-a-Service
No need to install or manage database instances
Scalable and fault tolerant configurations
Integration with Data Pipeline
Database: DynamoDB
Provisioned throughput NoSQL database
Fast, predictable, configurable performance
Fully distributed, fault tolerant HA architecture
Integration with EMR & Hive
Database: Redshift
Managed Massively Parallel Petabyte Scale Data Warehouse
Streaming Backup/Restore to S3
Extensive Security
2 TB -> 1.6 PB
Unstructured Data … Parallel ETL
Application Services: Elastic MapReduce
Managed, elastic Hadoop cluster
Integrates with S3 & DynamoDB
Leverage Hive & Pig analytics scripts
Support for Spot Instances
Integrated HBase NoSQL Database
Launching Clusters
•  AWS Web Console
•  Command Line
    elastic-mapreduce --create --key-pair micro --region eu-west-1 \
      --name IanMM-Test1 --num-instances 5 --instance-type m2.4xlarge \
      --alive --log-uri s3n://meyersi-ire/EMR/log
Launching Clusters
•  Enabling Tools
    elastic-mapreduce --create --key-pair micro --region eu-west-1 \
      --name IanMM-Test1 --num-instances 5 --instance-type m2.4xlarge \
      --alive \
      --pig-interactive --pig-versions latest \
      --hive-interactive --hive-versions latest \
      --hbase \
      --log-uri s3n://meyersi-ire/EMR/log
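When clusters are launched from scripts rather than typed by hand, the same command line can be assembled programmatically. The sketch below only builds the argument list, using the flags that appear on the slides; the function name and its parameters are hypothetical, and it does not run the CLI or check that the flags are accepted by any particular version of it.

```python
import shlex


def build_create_command(name, region, key_pair, num_instances,
                         instance_type, log_uri,
                         pig=False, hive=False, hbase=False):
    """Assemble an elastic-mapreduce --create invocation as an argv list.

    Hypothetical helper: only flags shown on the slides are emitted.
    """
    argv = [
        "elastic-mapreduce", "--create",
        "--key-pair", key_pair,
        "--region", region,
        "--name", name,
        "--num-instances", str(num_instances),
        "--instance-type", instance_type,
        "--alive",  # keep the cluster running after the steps finish
    ]
    if pig:
        argv += ["--pig-interactive", "--pig-versions", "latest"]
    if hive:
        argv += ["--hive-interactive", "--hive-versions", "latest"]
    if hbase:
        argv.append("--hbase")
    argv += ["--log-uri", log_uri]
    return argv


# Values copied from the slide example.
command = build_create_command(
    name="IanMM-Test1",
    region="eu-west-1",
    key_pair="micro",
    num_instances=5,
    instance_type="m2.4xlarge",
    log_uri="s3n://meyersi-ire/EMR/log",
    pig=True, hive=True, hbase=True,
)
print(shlex.join(command))
```

Keeping the command as a list rather than a single string avoids shell-quoting problems if it is later passed to `subprocess.run`.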
Launching Clusters
•  Hadoop Configuration Bootstrap Action
    elastic-mapreduce --create \
      --bootstrap-action s3://elasticmapreduce/bootstrap-actions/configure-hadoop \
      --args "-s,dfs.block.size=1048576" \
      --key-pair micro --region eu-west-1 --name IanMM-Test-3 \
      --instance-group core --instance-count 2 --instance-type m2.4xlarge \
      --instance-group task --instance-count 2 --instance-type m2.4xlarge \
      --alive --pig-interactive --hive-interactive \
      --log-uri s3n://meyersi-ire/EMR/log
Amazon Data Pipeline
Input Datanode: This could be an S3 bucket, RDS table, EMR Hive table, etc.
Activity: This is a data aggregation, manipulation, or copy that runs on a user-configured schedule.
Output Datanode: This supports all the same data sources as the input datanode, but they don’t have to be the same type.
Orchestration with Data Pipeline
Input: RDS Table
    Table: User-Demographics
    SQL Precondition: “Select last_update from table“ > #{YY-MM-DD}
Input: DynamoDB Table
    Table: User-Event-Data-#{year-month}
Activity: EMR Transform
    Hive Query: user-metrics.hql
    Frequency: Daily
Output: S3 file
    Path: s3://trend-data/#{year-month-day}.csv
Success Notification: metrics@example.com
Failure Notification: emr-admin@example.com
Delay Notification: emr-admin@example.com
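The example pipeline can be pictured as plain data: two input datanodes, one activity, one output datanode, with date placeholders filled in for each scheduled run. The sketch below is a hypothetical model for illustration only. The class names and the `expand` helper are assumptions, and this is not the AWS Data Pipeline definition format or its expression syntax.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class DataNode:
    kind: str       # e.g. "RDS Table", "DynamoDB Table", "S3 file"
    location: str   # table name or path, may contain a date placeholder


@dataclass
class Activity:
    name: str
    script: str
    frequency: str
    inputs: List[DataNode]
    output: DataNode


def expand(template, run_date):
    """Fill the slide's date placeholders for one scheduled run (illustrative)."""
    return (template
            .replace("#{year-month-day}", run_date.isoformat())
            .replace("#{year-month}", run_date.strftime("%Y-%m")))


# Values copied from the slide example.
pipeline = Activity(
    name="EMR Transform",
    script="user-metrics.hql",
    frequency="Daily",
    inputs=[
        DataNode("RDS Table", "User-Demographics"),
        DataNode("DynamoDB Table", "User-Event-Data-#{year-month}"),
    ],
    output=DataNode("S3 file", "s3://trend-data/#{year-month-day}.csv"),
)

# The run date is an arbitrary example.
run_date = date(2013, 7, 1)
resolved_inputs = [expand(node.location, run_date) for node in pipeline.inputs]
resolved_output = expand(pipeline.output.location, run_date)
print(resolved_inputs, "->", resolved_output)
```

Note that the input and output nodes are different kinds of store, which matches the point above that they do not have to be the same type.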
Analytics Pipeline
Services: S3, RDS, DynamoDB, Redshift, EMR, Data Pipeline
Stages: …collect & store, …orchestrate, …process & analyse
Benefits only possible in the Cloud
Benefit | Cloud | “Private Cloud” / On Premises
Pay as you Go | ✔ | X
Lower Overall Costs | ✔ | X
Stop Guessing Capacity | ✔ | X
Agility / Speed / Innovation | ✔ | X
Avoid Undifferentiated Heavy Lifting | ✔ | X
Go Global in Minutes | ✔ | X
Agility & Global Reach at the Core of EMR
Ease of Operation
Layers shown: Compute Infrastructure, Networking, Local Disk, Operating System Config, Hadoop Configuration, HDFS, Hive, Pig, HBase, User Defined Software Installation
•  Multiple Hadoop Distributions - Open Source & MapR
•  Clusters Launched with 1 Command
•  Up in 5 Minutes
•  Hard Partitioned per Customer on CPU, Memory and Disk
•  Dynamic Cluster Resizing
•  In any of 8 Regions around the Globe
Lower Overall Costs
Cheaper | Spot Market Management
Lower TCO
June 2013 Study by Accenture Technology Labs
Not Sponsored or Funded by Amazon
“Accenture assessed the price-performance ratio between bare-metal Hadoop clusters and Hadoop-as-a-Service on Amazon Web Services…[and] revealed that Hadoop-as-a-Service offers better price-performance ratio…”
http://www.accenture.com/us-en/Pages/insight-hadoop-deployment-comparison.aspx
Spot 101 - What are Spot Instances
•  Spot allows customers to bid on unused EC2 capacity
•  Spot price based on supply/demand of instance types in an Availability Zone
•  Customers are fulfilled when their bid price is higher than the Spot price
•  Instances will be interrupted when the Spot price exceeds the bid price
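The two rules above (fulfilled while the bid is higher than the Spot price, interrupted when the Spot price exceeds the bid) can be sketched as a small simulation. The price series is invented for illustration, the function name is hypothetical, and the real Spot market has details this ignores, such as capacity limits and billing rules.

```python
def simulate_spot(bid_price, spot_prices):
    """Return, for each observed Spot price, whether the instance is running.

    Simplified model of the slide's rules:
      - bid higher than the Spot price  -> request fulfilled, instance runs
      - Spot price exceeds the bid      -> instance interrupted
      - prices exactly equal            -> previous state is kept (assumption)
    """
    running = False
    states = []
    for spot_price in spot_prices:
        if bid_price > spot_price:
            running = True
        elif spot_price > bid_price:
            running = False
        states.append(running)
    return states


# Bid of 0.40 mirrors the --bid-price .4 example in this deck;
# the hourly price series is made up for illustration.
bid = 0.40
prices = [0.25, 0.35, 0.45, 0.30]
states = simulate_spot(bid, prices)

for price, is_running in zip(prices, states):
    print(f"spot={price:.2f} bid={bid:.2f} running={is_running}")
```

Because an interruption can arrive at any price change, this is why the deck suggests mixing Spot with On-Demand instances for work that must finish.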
Reducing Hadoop Costs with Spot
Mix Spot and On-Demand instances to reduce cost and accelerate computation while protecting against interruption
    elastic-mapreduce --add-instance-group TASK --instance-count 100 --bid-price .4
Scenario #1: Cost without Spot
    Job Flow Duration: 14 Hours
    4 instances * 14 hrs * $0.50 = $28
Scenario #2: Cost with Spot
    Job Flow Duration: 7 Hours
    4 instances * 7 hrs * $0.50 = $14 +
    5 instances * 7 hrs * $0.25 = $8.75
    Total = $22.75
Time Savings: 50%
Cost Savings: ~20%
Other EMR + Spot Use Cases
§ Run entire cluster on Spot for biggest cost savings
§ Reduce the cost of application testing
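The arithmetic behind the two scenarios can be checked in a few lines. This is a minimal sketch using only the instance counts, durations, and hourly rates from the slide; the helper name is an assumption.

```python
def instance_cost(instances, hours, hourly_rate):
    """Cost of running a group of identical instances for a number of hours."""
    return instances * hours * hourly_rate


# Scenario #1: 4 On-Demand instances for 14 hours at $0.50 per hour.
cost_without_spot = instance_cost(4, 14, 0.50)

# Scenario #2: the same 4 On-Demand instances plus 5 Spot instances at
# $0.25 per hour; the extra capacity halves the run time to 7 hours.
cost_with_spot = instance_cost(4, 7, 0.50) + instance_cost(5, 7, 0.25)

time_saving = (14 - 7) / 14
cost_saving = (cost_without_spot - cost_with_spot) / cost_without_spot

print(f"Without Spot: ${cost_without_spot:.2f}")
print(f"With Spot:    ${cost_with_spot:.2f}")
print(f"Time saving:  {time_saving:.0%}")
print(f"Cost saving:  {cost_saving:.2%}")
```

The exact cost saving works out to 18.75%, which the slide rounds to roughly 20%.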
Stop Guessing Capacity

Dynamic Clusters
Extend on-premise environments…
with Amazon VPC…
Populate as demand dictates…
Connect over dedicated links…
And turn it off when you are done
EMR is Hadoop…

…cheaper, easier, and more agile
What’s New?
•  MapR M7 Introduction
•  Optimised for HBase Clusters
•  Failure Recovery
•  Point in Time Recovery Snapshotting
•  Low Latency Hadoop Optimisations
•  HBase Mirroring
•  NFS + HDFS
•  MapR M5 Price Drop
•  Support for Pig 0.11.1
•  RANK, CUBE & ROLLUP capability
•  Groovy UDFs
•  Support for Guava Functions
•  Performance Improvements
•  Spark/Shark Bootstrap Action
•  In Memory Hadoop
•  Spark Scripting (similar to Pig)
•  Shark Shell with Hive Interoperability

Amazon Elastic Map Reduce - Ian Meyers

Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon RedshiftAmazon Web Services
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon RedshiftAmazon Web Services
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon RedshiftAmazon Web Services
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon RedshiftAmazon Web Services
 
Data warehousing in the era of Big Data: Deep Dive into Amazon Redshift
Data warehousing in the era of Big Data: Deep Dive into Amazon RedshiftData warehousing in the era of Big Data: Deep Dive into Amazon Redshift
Data warehousing in the era of Big Data: Deep Dive into Amazon RedshiftAmazon Web Services
 
2017 AWS DB Day | AWS 데이터베이스 개요 - 나의 업무에 적합한 데이터베이스는?
2017 AWS DB Day |  AWS 데이터베이스 개요 - 나의 업무에 적합한 데이터베이스는?2017 AWS DB Day |  AWS 데이터베이스 개요 - 나의 업무에 적합한 데이터베이스는?
2017 AWS DB Day | AWS 데이터베이스 개요 - 나의 업무에 적합한 데이터베이스는?Amazon Web Services Korea
 
Lunch and Learn - Store and Move your Data To & From the AWS Cloud, Markku Le...
Lunch and Learn - Store and Move your Data To & From the AWS Cloud, Markku Le...Lunch and Learn - Store and Move your Data To & From the AWS Cloud, Markku Le...
Lunch and Learn - Store and Move your Data To & From the AWS Cloud, Markku Le...Amazon Web Services
 
[よくわかるAmazon Redshift in 大阪]Amazon Redshift最新情報と導入事例のご紹介
[よくわかるAmazon Redshift in 大阪]Amazon Redshift最新情報と導入事例のご紹介[よくわかるAmazon Redshift in 大阪]Amazon Redshift最新情報と導入事例のご紹介
[よくわかるAmazon Redshift in 大阪]Amazon Redshift最新情報と導入事例のご紹介Amazon Web Services Japan
 
Building Analytic Apps for SaaS: “Analytics as a Service”
Building Analytic Apps for SaaS: “Analytics as a Service”Building Analytic Apps for SaaS: “Analytics as a Service”
Building Analytic Apps for SaaS: “Analytics as a Service”Amazon Web Services
 
AWS Cloud Kata | Manila - Getting to Scale on AWS
AWS Cloud Kata | Manila - Getting to Scale on AWSAWS Cloud Kata | Manila - Getting to Scale on AWS
AWS Cloud Kata | Manila - Getting to Scale on AWSAmazon Web Services
 
AWS June Webinar Series - Getting Started: Amazon Redshift
AWS June Webinar Series - Getting Started: Amazon RedshiftAWS June Webinar Series - Getting Started: Amazon Redshift
AWS June Webinar Series - Getting Started: Amazon RedshiftAmazon Web Services
 
透過 Amazon Redshift 打造數據分析服務及 Amazon Redshift 新功能案例介紹
透過 Amazon Redshift 打造數據分析服務及 Amazon Redshift 新功能案例介紹透過 Amazon Redshift 打造數據分析服務及 Amazon Redshift 新功能案例介紹
透過 Amazon Redshift 打造數據分析服務及 Amazon Redshift 新功能案例介紹Amazon Web Services
 
AWS Webcast - Managing Big Data in the AWS Cloud_20140924
AWS Webcast - Managing Big Data in the AWS Cloud_20140924AWS Webcast - Managing Big Data in the AWS Cloud_20140924
AWS Webcast - Managing Big Data in the AWS Cloud_20140924Amazon Web Services
 
Your First 10 million Users on the AWS Cloud
Your First 10 million Users on the AWS CloudYour First 10 million Users on the AWS Cloud
Your First 10 million Users on the AWS CloudAmazon Web Services
 
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...Precisely
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon RedshiftAmazon Web Services
 
(DAT202) Managed Database Options on AWS
(DAT202) Managed Database Options on AWS(DAT202) Managed Database Options on AWS
(DAT202) Managed Database Options on AWSAmazon Web Services
 
Rethinking the database for the cloud (iJAWS)
Rethinking the database for the cloud (iJAWS)Rethinking the database for the cloud (iJAWS)
Rethinking the database for the cloud (iJAWS)Rasmus Ekman
 

Semelhante a Amazon Elastic Map Reduce - Ian Meyers (20)

Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
Analytics on AWS - IP Expo 2013
Analytics on AWS - IP Expo 2013Analytics on AWS - IP Expo 2013
Analytics on AWS - IP Expo 2013
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
Data warehousing in the era of Big Data: Deep Dive into Amazon Redshift
Data warehousing in the era of Big Data: Deep Dive into Amazon RedshiftData warehousing in the era of Big Data: Deep Dive into Amazon Redshift
Data warehousing in the era of Big Data: Deep Dive into Amazon Redshift
 
2017 AWS DB Day | AWS 데이터베이스 개요 - 나의 업무에 적합한 데이터베이스는?
2017 AWS DB Day |  AWS 데이터베이스 개요 - 나의 업무에 적합한 데이터베이스는?2017 AWS DB Day |  AWS 데이터베이스 개요 - 나의 업무에 적합한 데이터베이스는?
2017 AWS DB Day | AWS 데이터베이스 개요 - 나의 업무에 적합한 데이터베이스는?
 
Lunch and Learn - Store and Move your Data To & From the AWS Cloud, Markku Le...
Lunch and Learn - Store and Move your Data To & From the AWS Cloud, Markku Le...Lunch and Learn - Store and Move your Data To & From the AWS Cloud, Markku Le...
Lunch and Learn - Store and Move your Data To & From the AWS Cloud, Markku Le...
 
[よくわかるAmazon Redshift in 大阪]Amazon Redshift最新情報と導入事例のご紹介
[よくわかるAmazon Redshift in 大阪]Amazon Redshift最新情報と導入事例のご紹介[よくわかるAmazon Redshift in 大阪]Amazon Redshift最新情報と導入事例のご紹介
[よくわかるAmazon Redshift in 大阪]Amazon Redshift最新情報と導入事例のご紹介
 
Building Analytic Apps for SaaS: “Analytics as a Service”
Building Analytic Apps for SaaS: “Analytics as a Service”Building Analytic Apps for SaaS: “Analytics as a Service”
Building Analytic Apps for SaaS: “Analytics as a Service”
 
AWS Cloud Kata | Manila - Getting to Scale on AWS
AWS Cloud Kata | Manila - Getting to Scale on AWSAWS Cloud Kata | Manila - Getting to Scale on AWS
AWS Cloud Kata | Manila - Getting to Scale on AWS
 
AWS June Webinar Series - Getting Started: Amazon Redshift
AWS June Webinar Series - Getting Started: Amazon RedshiftAWS June Webinar Series - Getting Started: Amazon Redshift
AWS June Webinar Series - Getting Started: Amazon Redshift
 
透過 Amazon Redshift 打造數據分析服務及 Amazon Redshift 新功能案例介紹
透過 Amazon Redshift 打造數據分析服務及 Amazon Redshift 新功能案例介紹透過 Amazon Redshift 打造數據分析服務及 Amazon Redshift 新功能案例介紹
透過 Amazon Redshift 打造數據分析服務及 Amazon Redshift 新功能案例介紹
 
AWS Webcast - Managing Big Data in the AWS Cloud_20140924
AWS Webcast - Managing Big Data in the AWS Cloud_20140924AWS Webcast - Managing Big Data in the AWS Cloud_20140924
AWS Webcast - Managing Big Data in the AWS Cloud_20140924
 
Your First 10 million Users on the AWS Cloud
Your First 10 million Users on the AWS CloudYour First 10 million Users on the AWS Cloud
Your First 10 million Users on the AWS Cloud
 
AWS Analytics
AWS AnalyticsAWS Analytics
AWS Analytics
 
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
(DAT202) Managed Database Options on AWS
(DAT202) Managed Database Options on AWS(DAT202) Managed Database Options on AWS
(DAT202) Managed Database Options on AWS
 
Rethinking the database for the cloud (iJAWS)
Rethinking the database for the cloud (iJAWS)Rethinking the database for the cloud (iJAWS)
Rethinking the database for the cloud (iJAWS)
 

Mais de huguk

Data Wrangling on Hadoop - Olivier De Garrigues, Trifacta
Data Wrangling on Hadoop - Olivier De Garrigues, TrifactaData Wrangling on Hadoop - Olivier De Garrigues, Trifacta
Data Wrangling on Hadoop - Olivier De Garrigues, Trifactahuguk
 
ether.camp - Hackathon & ether.camp intro
ether.camp - Hackathon & ether.camp introether.camp - Hackathon & ether.camp intro
ether.camp - Hackathon & ether.camp introhuguk
 
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and HadoopGoogle Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoophuguk
 
Using Big Data techniques to query and store OpenStreetMap data. Stephen Knox...
Using Big Data techniques to query and store OpenStreetMap data. Stephen Knox...Using Big Data techniques to query and store OpenStreetMap data. Stephen Knox...
Using Big Data techniques to query and store OpenStreetMap data. Stephen Knox...huguk
 
Extracting maximum value from data while protecting consumer privacy. Jason ...
Extracting maximum value from data while protecting consumer privacy.  Jason ...Extracting maximum value from data while protecting consumer privacy.  Jason ...
Extracting maximum value from data while protecting consumer privacy. Jason ...huguk
 
Intelligence Augmented vs Artificial Intelligence. Alex Flamant, IBM Watson
Intelligence Augmented vs Artificial Intelligence. Alex Flamant, IBM WatsonIntelligence Augmented vs Artificial Intelligence. Alex Flamant, IBM Watson
Intelligence Augmented vs Artificial Intelligence. Alex Flamant, IBM Watsonhuguk
 
Streaming Dataflow with Apache Flink
Streaming Dataflow with Apache Flink Streaming Dataflow with Apache Flink
Streaming Dataflow with Apache Flink huguk
 
Amazon Elastic Map Reduce - Ian Meyers

  • 2. About Amazon Web Services: deep experience in building and operating global web scale systems. How did Amazon get into cloud computing?
  • 3. Utility computing: On demand, Pay as you go, Uniform, Available
  • 4. Utility computing: On demand, Pay as you go, Uniform, Available
  • 6. Utility computing: On demand, Pay as you go, Uniform, Available. Services: Compute, Storage, Security, Scaling, Database, Networking, Monitoring, Messaging, Workflow, DNS, Load Balancing, Backup, CDN
  • 7. Cloud computing benefits: No Up-Front Capital Expense; Pay Only for What You Use; Self-Service Infrastructure; Easily Scale Up and Down; Improve Agility & Time-to-Market; Low Cost; Deploy
  • 8. Chart (Capacity over Time): Traditional IT capacity and Elastic capacity compared with Your IT needs
  • 9. Elastic capacity. Demand patterns: On and Off, Fast Growth, Variable peaks, Predictable peaks
  • 10. Elastic capacity. Demand patterns: On and Off, Fast Growth, Variable peaks, Predictable peaks. Chart annotations: WASTE, CUSTOMER DISSATISFACTION
  • 11. Elastic capacity. Demand patterns: On and Off, Fast Growth, Variable peaks, Predictable peaks
  • 12. 40 servers to 5000 in 3 days. Chart: Number of EC2 Instances, 4/12/2008 to 4/20/2008. Annotations: Steady state of ~40 instances; Launch of Facebook modification; “Techcrunched”; EC2 scaled to peak of 5000 instances
  • 13. Global Infrastructure. Service map: Compute, Storage, AWS Global Infrastructure, Database, App Services, Deployment & Administration, Networking
  • 14. Global Infrastructure. Regions: US-WEST (N. California), US-WEST (Oregon), US-EAST (Virginia), GOV CLOUD, SOUTH AMERICA (Sao Paulo), EU-WEST (Ireland), ASIA PAC (Tokyo), ASIA PAC (Singapore), ASIA PAC (Sydney)
  • 16. Customer Needs
       • Store Any Amount of Data
         – Without Capacity Planning
       • Perform Complex Analysis on Any Data
         – Scale on Demand
       • Store Data Securely
       • Decrease Time to Market
         – Build Environments Quickly
       • Reduce Costs
         – Reduce Capital Expenditure
       • Enable Global Reach
  • 18. Simple Storage Service: highly scalable object storage for the internet; 1 byte to 5TB in size; 99.999999999% durability
       Is a Web Store; Not a file system; No Single Points of Failure; Eventually consistent
       Availability: 99.99%
       Durability: 99.999999999%
       Paradigm: Object store
       Performance: Very Fast
       Redundancy: Across Availability Zones
       Security: Public Key / Private Key
       Pricing: $0.095/GB/month
       Typical use case: Write once, read many
       Limits: 100 Buckets, Unlimited Storage, 5TB Objects
       Elastic Block Store: high performance block storage device; 1GB to 1TB in size; mount as drives to instances with snapshot/cloning functionalities
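To make the durability figure on slide 18 concrete, the following is a minimal sketch of the arithmetic behind "eleven nines". It assumes the 99.999999999% figure is read as a per-object, per-year survival probability; the bucket size of ten million objects is a hypothetical value chosen only for illustration.

```python
from fractions import Fraction

# Design durability quoted on the slide: 99.999999999% ("eleven nines").
# Assumption: interpreted as the yearly survival probability of one object.
DURABILITY = Fraction(99_999_999_999, 100_000_000_000)


def expected_annual_losses(object_count: int) -> Fraction:
    """Expected number of objects lost per year at the stated durability."""
    return object_count * (1 - DURABILITY)


def years_per_expected_loss(object_count: int) -> Fraction:
    """Average number of years between single-object losses."""
    return 1 / expected_annual_losses(object_count)


if __name__ == "__main__":
    objects = 10_000_000  # hypothetical bucket size, for illustration only
    print(float(expected_annual_losses(objects)))
    print(int(years_per_expected_loss(objects)))
```

Exact fractions are used instead of floats because subtracting a number this close to 1 in floating point loses most of the significant digits.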
  • 19. Objects in S3. Chart: Total Number of Objects Stored in Amazon S3, Q4 2006 to Q4 2012. Data labels: 14 Billion, 40 Billion, 102 Billion, 262 Billion, 762 Billion, 1.3 Trillion. Peak Requests: 830,000+ per second
  • 20. Glacier: long term object archive; extremely low cost per gigabyte; 99.999999999% durability
       Designed for Archival; Not a file system; Vaults & Archives; 3-5 Hour Retrieval Time
       Durability: 99.999999999%
       Paradigm: Archive Store
       Performance: Configurable - Low
       Redundancy: Across Availability Zones
       Security: Public Key / Private Key
       Pricing: $0.011/GB/month
       Typical use case: Write once, read infrequently, < 10% / Month
       Elastic Block Store: high performance block storage device; 1GB to 1TB in size; mount as drives to instances with snapshot/cloning functionalities
  • 21. Storage Lifecycle Integration
       Simple Storage Service: highly scalable object storage; 1 byte to 5TB in size; 99.999999999% durability
       Glacier: long term object archive; extremely low cost per gigabyte; 99.999999999% durability
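The price gap that motivates moving data from S3 to Glacier can be sketched with the two per-gigabyte figures quoted on slides 18 and 20. This is only an illustration under stated assumptions: it uses the slide prices as flat rates, ignores retrieval and request charges and any pricing tiers, and the 1000 GB volume is a hypothetical example.

```python
from decimal import Decimal

# Per-GB monthly prices as quoted on the slides (flat-rate assumption).
PRICE_PER_GB_MONTH = {
    "s3": Decimal("0.095"),
    "glacier": Decimal("0.011"),
}


def monthly_cost(gigabytes: int, tier: str) -> Decimal:
    """Storage-only monthly cost for a data volume in the given tier."""
    return Decimal(gigabytes) * PRICE_PER_GB_MONTH[tier]


def monthly_saving(gigabytes: int) -> Decimal:
    """Monthly storage saving from holding the data in Glacier instead of S3."""
    return monthly_cost(gigabytes, "s3") - monthly_cost(gigabytes, "glacier")


if __name__ == "__main__":
    volume_gb = 1000  # hypothetical archive size
    print(monthly_cost(volume_gb, "s3"))
    print(monthly_cost(volume_gb, "glacier"))
    print(monthly_saving(volume_gb))
```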
  • 23. Database. Relational Database Service (RDS): Managed Oracle, MySQL & SQL Server. Dynamo DB: Managed NOSQL Database. Amazon Redshift: Massively Parallel Petabyte Scale Data Warehouse
  • 24. Database: Relational Database Service. Database-as-a-Service; No need to install or manage database instances; Scalable and fault tolerant configurations; Integration with Data Pipeline
  • 25. Database: DynamoDB. Provisioned throughput NoSQL database; Fast, predictable, configurable performance; Fully distributed, fault tolerant HA architecture; Integration with EMR & Hive
  • 26. Database: Redshift. Managed Massively Parallel Petabyte Scale Data Warehouse; Streaming Backup/Restore to S3; Extensive Security; 2 TB -> 1.6 PB
  • 27. Unstructured Data … Parallel ETL
  • 28. Application Services: Elastic MapReduce. Managed, elastic Hadoop cluster; Integrates with S3 & DynamoDB; Leverage Hive & Pig analytics scripts; Support for Spot Instances; Integrated HBase NOSQL Database
  • 29. Launching Clusters
       • AWS Web Console
       • Command Line
         elastic-mapreduce --create --key-pair micro --region eu-west-1 --name IanMM-Test1 --num-instances 5 --instance-type m2.4xlarge --alive --log-uri s3n://meyersi-ire/EMR/log
  • 30. Launching Clusters
       • Enabling Tools
         elastic-mapreduce --create --key-pair micro --region eu-west-1 --name IanMM-Test1 --num-instances 5 --instance-type m2.4xlarge --alive --pig-interactive --pig-versions latest --hive-interactive --hive-versions latest --hbase --log-uri s3n://meyersi-ire/EMR/log
  • 31. Launching Clusters
       • Hadoop Configuration Bootstrap Action
         elastic-mapreduce --create --bootstrap-action s3://elasticmapreduce/bootstrap-actions/configure-hadoop --args "-s,dfs.block.size=1048576" --key-pair micro --region eu-west-1 --name IanMM-Test-3 --instance-group core --instance-count 2 --instance-type m2.4xlarge --instance-group task --instance-count 2 --instance-type m2.4xlarge --alive --pig-interactive --hive-interactive --log-uri s3n://meyersi-ire/EMR/log
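When cluster launches like the ones on slides 29 to 31 are scripted, assembling the command as an argument list avoids the quoting and dash problems that creep into hand-edited command lines. The sketch below is an assumption-laden illustration, not the tool's own API: `build_create_command` is a hypothetical helper, and it only emits options that appear on slide 29.

```python
import shlex
from typing import List


def build_create_command(
    name: str,
    key_pair: str,
    region: str,
    num_instances: int,
    instance_type: str,
    log_uri: str,
    keep_alive: bool = True,
) -> List[str]:
    """Assemble the slide-29 style launch command as an argument list.

    Hypothetical helper: it only arranges options shown on the slide and
    does not validate them against the CLI.
    """
    args = [
        "elastic-mapreduce", "--create",
        "--key-pair", key_pair,
        "--region", region,
        "--name", name,
        "--num-instances", str(num_instances),
        "--instance-type", instance_type,
    ]
    if keep_alive:
        # Keep the cluster running after its steps finish.
        args.append("--alive")
    args += ["--log-uri", log_uri]
    return args


if __name__ == "__main__":
    command = build_create_command(
        name="IanMM-Test1",
        key_pair="micro",
        region="eu-west-1",
        num_instances=5,
        instance_type="m2.4xlarge",
        log_uri="s3n://meyersi-ire/EMR/log",
    )
    # shlex.join quotes any argument that needs it before printing.
    print(shlex.join(command))
```

The resulting list can be passed directly to `subprocess.run` without going through a shell, which sidesteps shell quoting altogether.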
  • 32. Amazon Data Pipeline
       Input Datanode: This could be an S3 bucket, RDS table, EMR Hive table, etc.
       Activity: This is a data aggregation, manipulation, or copy that runs on a user-configured schedule.
       Output Datanode: This supports all the same data sources as the input datanode, but they don’t have to be the same type.
  • 33. Orchestration with Data Pipeline
       Input: RDS Table
         Table: User-Demographics
         SQL Precondition: “Select last_update from table” > #{YY-MM-DD}
       Input: DynamoDB Table
         Table: User-Event-Data-#{year-month}
       Activity: EMR Transform
         Hive Query: user-metrics.hql
         Frequency: Daily
       Output: S3 file
         Path: s3://trend-data/#{year-month-day}.csv
       Success Notification: metrics@example.com
       Failure Notification: emr-admin@example.com
       Delay Notification: emr-admin@example.com
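The date placeholders in the slide 33 pipeline definition can be illustrated with a small expansion routine. This is a sketch of the idea only: the `#{year-month-day}` and `#{year-month}` tokens are taken as the slide's shorthand, `expand_placeholders` is a hypothetical helper, and nothing here reproduces Data Pipeline's actual expression language.

```python
from datetime import date


def expand_placeholders(template: str, run_date: date) -> str:
    """Substitute the slide's date placeholders for one scheduled run.

    Assumption: only the two tokens shown on the slide are supported.
    """
    replacements = {
        "#{year-month-day}": run_date.strftime("%Y-%m-%d"),
        "#{year-month}": run_date.strftime("%Y-%m"),
    }
    for placeholder, value in replacements.items():
        template = template.replace(placeholder, value)
    return template


if __name__ == "__main__":
    run = date(2013, 7, 4)  # hypothetical schedule date
    print(expand_placeholders("s3://trend-data/#{year-month-day}.csv", run))
    print(expand_placeholders("User-Event-Data-#{year-month}", run))
```

With a daily frequency, each run would write to a distinct output path and read from the DynamoDB table for the current month.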
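  • 34. Analytics Pipeline. Stages: …collect & store, …orchestrate, …process & analyse. Services shown: S3, RDS, Dynamo DB, Redshift, EMR, Data Pipeline
  • 35. Benefits only possible in the Cloud
       Pay as you Go: Cloud ✔, “Private Cloud” / On Premises X
       Lower Overall Costs: Cloud ✔, “Private Cloud” / On Premises X
       Stop Guessing Capacity: Cloud ✔, “Private Cloud” / On Premises X
       Agility / Speed / Innovation: Cloud ✔, “Private Cloud” / On Premises X
       Avoid Undifferentiated Heavy Lifting: Cloud ✔, “Private Cloud” / On Premises X
       Go Global in Minutes: Cloud ✔, “Private Cloud” / On Premises X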
  • 36. Agility & Global Reach at the Core of EMR
  • 37. Ease of Operation: Compute Infrastructure, Hadoop Configuration, Local Disk, Operating System Config, HDFS, Networking, Hive, Pig, HBase, User Defined Software Installation
  • 38. Ease of Operation: Compute Infrastructure, Hadoop Configuration, Local Disk, Operating System Config, HDFS, Networking, Hive, Pig, HBase, User Defined Software Installation
       Multiple Hadoop Distributions - Open Source & MapR
       Clusters Launched with 1 Command
       Up in 5 Minutes
       Hard Partitioned per Customer on CPU, Memory and Disk
       Dynamic Cluster Resizing
       In any of 8 Regions around the Globe
  • 39. Lower Overall Costs Cheaper | Spot Market Management
  • 40. Lower TCO. June 2013 Study by Accenture Technology Labs. Not Sponsored or Funded by Amazon. “Accenture assessed the price-performance ratio between bare-metal Hadoop clusters and Hadoop-as-a-Service on Amazon Web Services…[and] revealed that Hadoop-as-a-Service offers better price-performance ratio…” http://www.accenture.com/us-en/Pages/insight-hadoop-deployment-comparison.aspx
  • 41. Spot 101 - What are Spot Instances
       • Spot allows customers to bid on unused EC2 capacity
       • Spot price based on supply/demand of instance types in an Availability Zone
       • Customers are fulfilled when their bid price is higher than the Spot Price
       • Instances will be interrupted when the Spot price exceeds the bid price
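The fulfilment and interruption rules on slide 41 can be sketched as a simple simulation over a series of Spot prices. The price series below is hypothetical, and one behaviour is an explicit assumption: the slide does not say what happens when the Spot price equals the bid, so this sketch leaves the instance in its previous state in that case.

```python
from typing import List


def simulate_spot(bid: float, spot_prices: List[float]) -> List[bool]:
    """Return, per price point, whether the Spot request is running.

    Rules from the slide: fulfilled when bid > Spot price, interrupted
    when Spot price > bid. Assumption: on a tie the state is unchanged.
    """
    running = False
    timeline = []
    for price in spot_prices:
        if bid > price:
            running = True       # bid is higher than the Spot price
        elif price > bid:
            running = False      # Spot price exceeds the bid
        timeline.append(running)
    return timeline


if __name__ == "__main__":
    hourly_prices = [0.25, 0.4, 0.5, 0.3]  # hypothetical Spot prices
    print(simulate_spot(0.4, hourly_prices))
```

This is why Spot capacity suits task nodes whose work can be rerun: an interruption costs time, not data held by the core nodes.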
  • 42. elastic-mapreduce --add-instance-group TASK --instance-count 100 --bid-price .4
  • 43. Reducing Hadoop Costs with Spot
       Mix Spot and On-Demand instances to reduce cost and accelerate computation while protecting against interruption
       Scenario #1, cost without Spot. Job Flow Duration: 14 Hours. 4 instances * 14 hrs * $0.50 = $28
       Scenario #2, cost with Spot. Job Flow Duration: 7 Hours. 4 instances * 7 hrs * $0.50 = $14, plus 5 instances * 7 hrs * $0.25 = $8.75. Total = $22.75
       Time Savings: 50%. Cost Savings: ~20%
       Other EMR + Spot Use Cases
       • Run entire cluster on Spot for biggest cost savings
       • Reduce the cost of application testing
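The two scenarios on slide 43 can be checked with a few lines of arithmetic. The sketch below uses only the instance counts, durations, and hourly prices from the slide; the function names are illustrative. It shows that the exact saving is 18.75%, which the slide rounds to about 20%.

```python
from decimal import Decimal


def group_cost(instances: int, hours: int, hourly_price: str) -> Decimal:
    """Cost of one instance group: instances * hours * hourly price."""
    return instances * hours * Decimal(hourly_price)


def scenario_costs():
    """Return (cost without Spot, cost with Spot) using the slide's figures."""
    without_spot = group_cost(4, 14, "0.50")                          # $28.00
    with_spot = group_cost(4, 7, "0.50") + group_cost(5, 7, "0.25")   # $14.00 + $8.75
    return without_spot, with_spot


def savings():
    """Return (fractional cost saving, fractional time saving)."""
    without_spot, with_spot = scenario_costs()
    cost_saving = (without_spot - with_spot) / without_spot
    time_saving = 1 - Decimal(7) / Decimal(14)
    return cost_saving, time_saving


if __name__ == "__main__":
    print(scenario_costs())
    print(savings())
```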
  • 47. Populate as demand dictates…
  • 49. And turn it off when you are done
  • 50. EMR is Hadoop… …cheaper, easier, and more agile
  • 51. What’s New?
       • MapR M7 Introduction
       • Optimised for HBase Clusters
       • Failure Recovery
       • Point in Time Recovery Snapshotting
       • Low Latency Hadoop Optimisations
       • HBase Mirroring
       • NFS + HDFS
       • MapR M5 Price Drop
       • Support for Pig 0.11.1
       • RANK, CUBE & ROLLUP capability
       • Groovy UDFs
       • Support for Guava Functions
       • Performance Improvements
       • Spark/Shark Bootstrap Action
       • In Memory Hadoop
       • Spark Scripting (similar to Pig)
       • Shark Shell with Hive Interoperability