SlideShare uma empresa Scribd logo
1 de 40
Baixar para ler offline
ianmas@amazon.com
@IanMmmm
LARGE SCALE DATA
ANALYSIS WITH AWS



Ian Massingham – Technical Evangelist
THE MORE DATA YOU COLLECT
THE MORE VALUE YOU CAN
DERIVE FROM IT!
THE COST OF DATA
GENERATION IS FALLING!
We are constantly producing more data
From all types of industries
GENERATE ➔ STORE ➔ ANALYZE ➔ SHARE!
GENERATE ➔ STORE ➔ ANALYZE ➔ SHARE!
Lower cost,
higher throughput
GENERATE ➔ STORE ➔ ANALYZE ➔ SHARE!
Lower cost,
higher throughput
Highly
constrained
+ ELASTIC AND HIGHLY SCALABLE
+ NO UPFRONT CAPITAL EXPENSE
+ ONLY PAY FOR WHAT YOU USE
+ AVAILABLE ON-DEMAND
= REMOVE CONSTRAINTS
GENERATE ➔ STORE ➔ ANALYZE ➔ SHARE!
GENERATE ➔ STORE ➔ ANALYZE ➔ SHARE!
AWS Import / Export
AWS Direct Connect
Inbound data transfer is free
Multipart upload to S3
Physical media
AWS Direct Connect
GENERATE ➔ STORE ➔ ANALYZE ➔ SHARE!
Amazon S3,
Amazon Glacier,
Amazon DynamoDB,
Amazon RDS,
Amazon Redshift,
AWS Storage Gateway,
Data on Amazon EC2
GENERATE ➔ STORE ➔ ANALYZE ➔ SHARE!
Amazon EC2
Amazon Elastic
MapReduce
AMAZON ELASTIC
MAPREDUCE

HADOOP AS A SERVICE!
•  SPLITS DATA INTO PIECES
•  LETS PROCESSING OCCUR
•  GATHERS THE RESULTS!
HDFS
EMRKinesis
S3 DynamoDB
Data management
Pig
Analytics languages/engines
RDS
Redshift AWS Data Pipeline
EMR + IMPALA DEMO
STARTING AN EMR CLUSTER
WITH HADOOP ECOSYSTEM
TOOLS PRE-INSTALLED
COPY & LOAD OUR DATASET
$	
  scp	
  –i	
  EMRKeyPair.pem	
  ~/aws/hadoop/LHRarrivals*.csv	
  hadoop@ec2-­‐54-­‐76-­‐242-­‐238.eu-­‐
west-­‐1.compute.amazonaws.com:	
  
	
  
$	
  ssh	
  –i	
  EMRKeyPair.pem	
  hadoop@ec2-­‐54-­‐76-­‐242-­‐238.eu-­‐west-­‐1.compute.amazonaws.com	
  
	
  
$	
  hadoop	
  fs	
  -­‐mkdir	
  /data/	
  
$	
  hadoop	
  fs	
  -­‐put	
  <uploaded_files>	
  /data/	
  
$	
  hadoop	
  fs	
  -­‐ls	
  -­‐h	
  -­‐R	
  /data/	
  
	
  
or at scale, Distributed Copy using S3DistCp to parallel load from S3
	
  
$	
  .	
  /home/hadoop/impala/conf/impala.conf	
  
$	
  hadoop	
  jar	
  /home/hadoop/lib/emr-­‐s3distcp-­‐1.0.jar	
  -­‐Dmapreduce.job.reduces=30	
  -­‐-­‐
src	
  s3://s3bucketname/	
  -­‐-­‐dest	
  hdfs://$HADOOP_NAMENODE_HOST:$HADOOP_NAMENODE_PORT/
data/	
  -­‐-­‐outputCodec	
  'none'	
  
	
  
** Run on a cluster master node
CREATE EXTERNAL TABLE
$	
  #check	
  the	
  size	
  of	
  our	
  data	
  set	
  
$	
  wc	
  –l	
  LHRarrivals*.csv	
  	
  
	
  
	
  850	
  LHRarrivals2.csv	
  
	
  1526	
  LHRarrivals.csv	
  
	
  	
   	
  2376	
  total	
  
	
  
$	
  impala-­‐shell	
  
	
  
Welcome	
  to	
  the	
  Impala	
  shell.	
  
	
  
>	
  create	
  EXTERNAL	
  TABLE	
  flights	
  (	
  input	
  STRING,	
  id	
  BIGINT,	
  widget	
  STRING,	
  source	
  
STRING,	
  resultnum	
  BIGINT,	
  pageurl	
  STRING,	
  scheduled	
  STRING,	
  flightnumber	
  STRING,	
  
airport	
  STRING,	
  status	
  STRING,	
  terminal	
  STRING	
  )	
  ROW	
  FORMAT	
  DELIMITED	
  FIELDS	
  
TERMINATED	
  BY	
  ','	
  LOCATION	
  '/data/';	
  
>	
  select	
  count	
  (*)	
  from	
  flights;	
  
	
  
Should	
  return	
  count(*)	
  2376	
  reflecting	
  the	
  size	
  of	
  the	
  data	
  set	
  
DEMO OF ODBC ACCESS
Doing this part on Amazon WorkSpaces using the Simba Cloudera
Impala ODBC Driver.!
Set up an SSH tunnel to the master node to allow us to connect to port
25010 from the WorkSpaces desktop to the Impala ODBC port!
A previously configured system DSN allows us to work with the data from
our EMR/Impala cluster directly within Microsoft Excel!
GENERATE ➔ STORE ➔ ANALYZE ➔ SHARE!
Amazon S3,
Amazon DynamoDB,
Amazon RDS,
Amazon Redshift,
Data on Amazon EC2
GENERATE ➔ STORE ➔ ANALYZE ➔ SHARE!
GENERATE ➔ STORE ➔ ANALYZE ➔ SHARE!
BATCH
PROCESSING
GENERATE ➔ ➔ SHARE!
STREAM
PROCESSING
AMAZON KINESIS

REAL-TIME DATA STREAM PROCESSING!
Real-time response to content
in semi-structured data streams



Relatively simple computations
on data (aggregates, filters,
sliding window, etc.)
Hourly server logs: how your
systems went wrong an hour ago
Weekly / Monthly Bill: What you
spent this past billing cycle
Daily customer report from your
website: tells you what deal or ad
to try next time
Daily fraud reports: tells you if there
was fraud yesterday
Daily business reports: tells me
how customers used AWS services
yesterday
Real-time metrics: what just went
wrong now
Real-time spending alerts/caps:
guaranteeing you can’t overspend
Real-time analysis: what to offer
the current customer now
Real-time detection: blocks
fraudulent use now
Fast ETL into Amazon Redshift:
how are customers using services
now
GENERATE ➔ STORE ➔ ANALYZE ➔ SHARE!
GENERATE ➔ STORE ➔ ANALYZE ➔ SHARE!
Amazon S3,
Amazon DynamoDB,
Amazon RDS,
Amazon Redshift,
Data on Amazon EC2
Amazon EC2
Amazon Elastic
MapReduce
Amazon S3,
Amazon Glacier,
Amazon DynamoDB,
Amazon RDS,
Amazon Redshift,
AWS Storage Gateway,
Data on Amazon EC2
AWS Import / Export
AWS Direct Connect
GENERATE ➔ ➔ SHARE!
STREAM
PROCESSING
GENERATE ➔ ➔ SHARE!
STREAM
PROCESSING
Amazon S3,
Amazon DynamoDB,
Amazon RDS,
Amazon Redshift,
Data on Amazon EC2
Amazon Kinesis
Stream Processing on
Amazon EC2
WANT TO KNOW MORE?
aws.amazon.com/solutions/case-studies/big-data/!
ianmas@amazon.com
@IanMmmm
LARGE SCALE DATA
ANALYSIS WITH AWS



Ian Massingham – Technical Evangelist

Mais conteúdo relacionado

Mais procurados

Modern Data Architectures for Business Insights at Scale
Modern Data Architectures for Business Insights at Scale Modern Data Architectures for Business Insights at Scale
Modern Data Architectures for Business Insights at Scale Amazon Web Services
 
Cost Optimisation with AWS
Cost Optimisation with AWSCost Optimisation with AWS
Cost Optimisation with AWSIan Massingham
 
Your First Data Lake on AWS_Simon Elisha
Your First Data Lake on AWS_Simon ElishaYour First Data Lake on AWS_Simon Elisha
Your First Data Lake on AWS_Simon ElishaHelen Rogers
 
AWSome Day London January 2016 Intro
AWSome Day London January 2016 IntroAWSome Day London January 2016 Intro
AWSome Day London January 2016 IntroIan Massingham
 
AWS March 2016 Webinar Series - Building Big Data Solutions with Amazon EMR a...
AWS March 2016 Webinar Series - Building Big Data Solutions with Amazon EMR a...AWS March 2016 Webinar Series - Building Big Data Solutions with Amazon EMR a...
AWS March 2016 Webinar Series - Building Big Data Solutions with Amazon EMR a...Amazon Web Services
 
AWS re:Invent 2016: How to Build a Big Data Analytics Data Lake (LFS303)
AWS re:Invent 2016: How to Build a Big Data Analytics Data Lake (LFS303)AWS re:Invent 2016: How to Build a Big Data Analytics Data Lake (LFS303)
AWS re:Invent 2016: How to Build a Big Data Analytics Data Lake (LFS303)Amazon Web Services
 
Soluzioni di Database completamente gestite: NoSQL, relazionali e Data Warehouse
Soluzioni di Database completamente gestite: NoSQL, relazionali e Data WarehouseSoluzioni di Database completamente gestite: NoSQL, relazionali e Data Warehouse
Soluzioni di Database completamente gestite: NoSQL, relazionali e Data WarehouseAmazon Web Services
 
BDA308 Serverless Analytics with Amazon Athena and Amazon QuickSight, featuri...
BDA308 Serverless Analytics with Amazon Athena and Amazon QuickSight, featuri...BDA308 Serverless Analytics with Amazon Athena and Amazon QuickSight, featuri...
BDA308 Serverless Analytics with Amazon Athena and Amazon QuickSight, featuri...Amazon Web Services
 
What's New & What's Next from AWS?
What's New & What's Next from AWS?What's New & What's Next from AWS?
What's New & What's Next from AWS?Ian Massingham
 
Workshop: Building Your First Big Data Application on AWS
Workshop: Building Your First Big Data Application on AWSWorkshop: Building Your First Big Data Application on AWS
Workshop: Building Your First Big Data Application on AWSAmazon Web Services
 
NEW LAUNCH! Introducing AWS Batch: Easy and efficient batch computing
 	  NEW LAUNCH! Introducing AWS Batch: Easy and efficient batch computing 	  NEW LAUNCH! Introducing AWS Batch: Easy and efficient batch computing
NEW LAUNCH! Introducing AWS Batch: Easy and efficient batch computingAmazon Web Services
 
Visualizing Big Data Insights with Amazon QuickSight
Visualizing Big Data Insights with Amazon QuickSightVisualizing Big Data Insights with Amazon QuickSight
Visualizing Big Data Insights with Amazon QuickSightAmazon Web Services
 
Getting Started with Amazon QuickSight
Getting Started with Amazon QuickSightGetting Started with Amazon QuickSight
Getting Started with Amazon QuickSightAmazon Web Services
 
Analisi dei dati con AWS: una panoramica degli strumenti disponibili
Analisi dei dati con AWS: una panoramica degli strumenti disponibiliAnalisi dei dati con AWS: una panoramica degli strumenti disponibili
Analisi dei dati con AWS: una panoramica degli strumenti disponibiliAmazon Web Services
 
Structured, Unstructured and Streaming Big Data on the AWS
Structured, Unstructured and Streaming Big Data on the AWSStructured, Unstructured and Streaming Big Data on the AWS
Structured, Unstructured and Streaming Big Data on the AWSAmazon Web Services
 
AWS re:Invent 2016: How Fulfillment by Amazon (FBA) and Scopely Improved Resu...
AWS re:Invent 2016: How Fulfillment by Amazon (FBA) and Scopely Improved Resu...AWS re:Invent 2016: How Fulfillment by Amazon (FBA) and Scopely Improved Resu...
AWS re:Invent 2016: How Fulfillment by Amazon (FBA) and Scopely Improved Resu...Amazon Web Services
 
Intro Presentation at AWS AWSome Day Glasgow September 2015
Intro Presentation at AWS AWSome Day Glasgow September 2015Intro Presentation at AWS AWSome Day Glasgow September 2015
Intro Presentation at AWS AWSome Day Glasgow September 2015Ian Massingham
 
利用 Amazon QuickSight 視覺化分析服務剖析資料
利用 Amazon QuickSight 視覺化分析服務剖析資料利用 Amazon QuickSight 視覺化分析服務剖析資料
利用 Amazon QuickSight 視覺化分析服務剖析資料Amazon Web Services
 

Mais procurados (20)

AWS Cloud Watch
AWS Cloud WatchAWS Cloud Watch
AWS Cloud Watch
 
Modern Data Architectures for Business Insights at Scale
Modern Data Architectures for Business Insights at Scale Modern Data Architectures for Business Insights at Scale
Modern Data Architectures for Business Insights at Scale
 
Cost Optimisation with AWS
Cost Optimisation with AWSCost Optimisation with AWS
Cost Optimisation with AWS
 
Your First Data Lake on AWS_Simon Elisha
Your First Data Lake on AWS_Simon ElishaYour First Data Lake on AWS_Simon Elisha
Your First Data Lake on AWS_Simon Elisha
 
AWSome Day London January 2016 Intro
AWSome Day London January 2016 IntroAWSome Day London January 2016 Intro
AWSome Day London January 2016 Intro
 
AWS March 2016 Webinar Series - Building Big Data Solutions with Amazon EMR a...
AWS March 2016 Webinar Series - Building Big Data Solutions with Amazon EMR a...AWS March 2016 Webinar Series - Building Big Data Solutions with Amazon EMR a...
AWS March 2016 Webinar Series - Building Big Data Solutions with Amazon EMR a...
 
AWS re:Invent 2016: How to Build a Big Data Analytics Data Lake (LFS303)
AWS re:Invent 2016: How to Build a Big Data Analytics Data Lake (LFS303)AWS re:Invent 2016: How to Build a Big Data Analytics Data Lake (LFS303)
AWS re:Invent 2016: How to Build a Big Data Analytics Data Lake (LFS303)
 
Soluzioni di Database completamente gestite: NoSQL, relazionali e Data Warehouse
Soluzioni di Database completamente gestite: NoSQL, relazionali e Data WarehouseSoluzioni di Database completamente gestite: NoSQL, relazionali e Data Warehouse
Soluzioni di Database completamente gestite: NoSQL, relazionali e Data Warehouse
 
BDA308 Serverless Analytics with Amazon Athena and Amazon QuickSight, featuri...
BDA308 Serverless Analytics with Amazon Athena and Amazon QuickSight, featuri...BDA308 Serverless Analytics with Amazon Athena and Amazon QuickSight, featuri...
BDA308 Serverless Analytics with Amazon Athena and Amazon QuickSight, featuri...
 
What's New & What's Next from AWS?
What's New & What's Next from AWS?What's New & What's Next from AWS?
What's New & What's Next from AWS?
 
Workshop: Building Your First Big Data Application on AWS
Workshop: Building Your First Big Data Application on AWSWorkshop: Building Your First Big Data Application on AWS
Workshop: Building Your First Big Data Application on AWS
 
NEW LAUNCH! Introducing AWS Batch: Easy and efficient batch computing
 	  NEW LAUNCH! Introducing AWS Batch: Easy and efficient batch computing 	  NEW LAUNCH! Introducing AWS Batch: Easy and efficient batch computing
NEW LAUNCH! Introducing AWS Batch: Easy and efficient batch computing
 
Visualizing Big Data Insights with Amazon QuickSight
Visualizing Big Data Insights with Amazon QuickSightVisualizing Big Data Insights with Amazon QuickSight
Visualizing Big Data Insights with Amazon QuickSight
 
Cost Optimization at Scale
Cost Optimization at ScaleCost Optimization at Scale
Cost Optimization at Scale
 
Getting Started with Amazon QuickSight
Getting Started with Amazon QuickSightGetting Started with Amazon QuickSight
Getting Started with Amazon QuickSight
 
Analisi dei dati con AWS: una panoramica degli strumenti disponibili
Analisi dei dati con AWS: una panoramica degli strumenti disponibiliAnalisi dei dati con AWS: una panoramica degli strumenti disponibili
Analisi dei dati con AWS: una panoramica degli strumenti disponibili
 
Structured, Unstructured and Streaming Big Data on the AWS
Structured, Unstructured and Streaming Big Data on the AWSStructured, Unstructured and Streaming Big Data on the AWS
Structured, Unstructured and Streaming Big Data on the AWS
 
AWS re:Invent 2016: How Fulfillment by Amazon (FBA) and Scopely Improved Resu...
AWS re:Invent 2016: How Fulfillment by Amazon (FBA) and Scopely Improved Resu...AWS re:Invent 2016: How Fulfillment by Amazon (FBA) and Scopely Improved Resu...
AWS re:Invent 2016: How Fulfillment by Amazon (FBA) and Scopely Improved Resu...
 
Intro Presentation at AWS AWSome Day Glasgow September 2015
Intro Presentation at AWS AWSome Day Glasgow September 2015Intro Presentation at AWS AWSome Day Glasgow September 2015
Intro Presentation at AWS AWSome Day Glasgow September 2015
 
利用 Amazon QuickSight 視覺化分析服務剖析資料
利用 Amazon QuickSight 視覺化分析服務剖析資料利用 Amazon QuickSight 視覺化分析服務剖析資料
利用 Amazon QuickSight 視覺化分析服務剖析資料
 

Semelhante a 2014 Import.io Data Summit - Including Hadoop/Impala Getting Started Demo

Cloud World Forum: Large Scale Data Analysis on AWS
Cloud World Forum: Large Scale Data Analysis on AWSCloud World Forum: Large Scale Data Analysis on AWS
Cloud World Forum: Large Scale Data Analysis on AWSIan Massingham
 
Data Analytics on AWS
Data Analytics on AWSData Analytics on AWS
Data Analytics on AWSDanilo Poccia
 
Large Scale Data Analysis with AWS
Large Scale Data Analysis with AWSLarge Scale Data Analysis with AWS
Large Scale Data Analysis with AWSAmazon Web Services
 
Journey Through the AWS Cloud - Big Data Analysis
Journey Through the AWS Cloud - Big Data AnalysisJourney Through the AWS Cloud - Big Data Analysis
Journey Through the AWS Cloud - Big Data AnalysisAmazon Web Services
 
AWS Summit Tel Aviv - Startup Track - Data Analytics & Big Data
AWS Summit Tel Aviv - Startup Track - Data Analytics & Big DataAWS Summit Tel Aviv - Startup Track - Data Analytics & Big Data
AWS Summit Tel Aviv - Startup Track - Data Analytics & Big DataAmazon Web Services
 
Loading Data into Redshift: Data Analytics Week SF
Loading Data into Redshift: Data Analytics Week SFLoading Data into Redshift: Data Analytics Week SF
Loading Data into Redshift: Data Analytics Week SFAmazon Web Services
 
Loading Data into Redshift with Lab
Loading Data into Redshift with LabLoading Data into Redshift with Lab
Loading Data into Redshift with LabAmazon Web Services
 
Loading Data into Redshift: Data Analytics Week at the SF Loft
Loading Data into Redshift: Data Analytics Week at the SF LoftLoading Data into Redshift: Data Analytics Week at the SF Loft
Loading Data into Redshift: Data Analytics Week at the SF LoftAmazon Web Services
 
Build Data Lakes and Analytics on AWS
Build Data Lakes and Analytics on AWS Build Data Lakes and Analytics on AWS
Build Data Lakes and Analytics on AWS Amazon Web Services
 
Loading Data into Amazon Redshift
Loading Data into Amazon RedshiftLoading Data into Amazon Redshift
Loading Data into Amazon RedshiftAmazon Web Services
 
Big Data: Mejores prácticas en AWS
Big Data: Mejores prácticas en AWSBig Data: Mejores prácticas en AWS
Big Data: Mejores prácticas en AWSAmazon Web Services
 
Amazon EMR Masterclass
Amazon EMR MasterclassAmazon EMR Masterclass
Amazon EMR MasterclassIan Massingham
 
AWS Cloud Kata 2014 | Jakarta - 2-3 Big Data
 AWS Cloud Kata 2014 | Jakarta - 2-3 Big Data AWS Cloud Kata 2014 | Jakarta - 2-3 Big Data
AWS Cloud Kata 2014 | Jakarta - 2-3 Big DataAmazon Web Services
 
Build your own CDN with Varnish - Confoo 2022
Build your own CDN with Varnish - Confoo 2022Build your own CDN with Varnish - Confoo 2022
Build your own CDN with Varnish - Confoo 2022Thijs Feryn
 
B3 - Business intelligence apps on aws
B3 - Business intelligence apps on awsB3 - Business intelligence apps on aws
B3 - Business intelligence apps on awsAmazon Web Services
 

Semelhante a 2014 Import.io Data Summit - Including Hadoop/Impala Getting Started Demo (20)

Cloud World Forum: Large Scale Data Analysis on AWS
Cloud World Forum: Large Scale Data Analysis on AWSCloud World Forum: Large Scale Data Analysis on AWS
Cloud World Forum: Large Scale Data Analysis on AWS
 
Workshop part2 – Big Data
Workshop part2 – Big DataWorkshop part2 – Big Data
Workshop part2 – Big Data
 
Data Analytics on AWS
Data Analytics on AWSData Analytics on AWS
Data Analytics on AWS
 
Large Scale Data Analysis with AWS
Large Scale Data Analysis with AWSLarge Scale Data Analysis with AWS
Large Scale Data Analysis with AWS
 
Journey Through the AWS Cloud - Big Data Analysis
Journey Through the AWS Cloud - Big Data AnalysisJourney Through the AWS Cloud - Big Data Analysis
Journey Through the AWS Cloud - Big Data Analysis
 
AWS Summit Tel Aviv - Startup Track - Data Analytics & Big Data
AWS Summit Tel Aviv - Startup Track - Data Analytics & Big DataAWS Summit Tel Aviv - Startup Track - Data Analytics & Big Data
AWS Summit Tel Aviv - Startup Track - Data Analytics & Big Data
 
Loading Data into Redshift
Loading Data into RedshiftLoading Data into Redshift
Loading Data into Redshift
 
Loading Data into Redshift: Data Analytics Week SF
Loading Data into Redshift: Data Analytics Week SFLoading Data into Redshift: Data Analytics Week SF
Loading Data into Redshift: Data Analytics Week SF
 
Loading Data into Redshift
Loading Data into RedshiftLoading Data into Redshift
Loading Data into Redshift
 
Loading Data into Redshift with Lab
Loading Data into Redshift with LabLoading Data into Redshift with Lab
Loading Data into Redshift with Lab
 
Loading Data into Redshift
Loading Data into RedshiftLoading Data into Redshift
Loading Data into Redshift
 
Loading Data into Redshift: Data Analytics Week at the SF Loft
Loading Data into Redshift: Data Analytics Week at the SF LoftLoading Data into Redshift: Data Analytics Week at the SF Loft
Loading Data into Redshift: Data Analytics Week at the SF Loft
 
Build Data Lakes and Analytics on AWS
Build Data Lakes and Analytics on AWS Build Data Lakes and Analytics on AWS
Build Data Lakes and Analytics on AWS
 
Loading Data into Amazon Redshift
Loading Data into Amazon RedshiftLoading Data into Amazon Redshift
Loading Data into Amazon Redshift
 
Big Data: Mejores prácticas en AWS
Big Data: Mejores prácticas en AWSBig Data: Mejores prácticas en AWS
Big Data: Mejores prácticas en AWS
 
Amazon EMR Masterclass
Amazon EMR MasterclassAmazon EMR Masterclass
Amazon EMR Masterclass
 
Amazon EMR Masterclass
Amazon EMR MasterclassAmazon EMR Masterclass
Amazon EMR Masterclass
 
AWS Cloud Kata 2014 | Jakarta - 2-3 Big Data
 AWS Cloud Kata 2014 | Jakarta - 2-3 Big Data AWS Cloud Kata 2014 | Jakarta - 2-3 Big Data
AWS Cloud Kata 2014 | Jakarta - 2-3 Big Data
 
Build your own CDN with Varnish - Confoo 2022
Build your own CDN with Varnish - Confoo 2022Build your own CDN with Varnish - Confoo 2022
Build your own CDN with Varnish - Confoo 2022
 
B3 - Business intelligence apps on aws
B3 - Business intelligence apps on awsB3 - Business intelligence apps on aws
B3 - Business intelligence apps on aws
 

Mais de Ian Massingham

Some thoughts on measuring the impact of developer relations
Some thoughts on measuring the impact of developer relationsSome thoughts on measuring the impact of developer relations
Some thoughts on measuring the impact of developer relationsIan Massingham
 
Leeds IoT Meetup - Nov 2017
Leeds IoT Meetup - Nov 2017Leeds IoT Meetup - Nov 2017
Leeds IoT Meetup - Nov 2017Ian Massingham
 
DevTalks Romania - Getting Started with AWS Lambda & the Serverless Cloud
DevTalks Romania - Getting Started with AWS Lambda & the Serverless CloudDevTalks Romania - Getting Started with AWS Lambda & the Serverless Cloud
DevTalks Romania - Getting Started with AWS Lambda & the Serverless CloudIan Massingham
 
Getting started with AWS Lambda and the Serverless Cloud
Getting started with AWS Lambda and the Serverless CloudGetting started with AWS Lambda and the Serverless Cloud
Getting started with AWS Lambda and the Serverless CloudIan Massingham
 
AWS AWSome Day - Getting Started Best Practices
AWS AWSome Day - Getting Started Best PracticesAWS AWSome Day - Getting Started Best Practices
AWS AWSome Day - Getting Started Best PracticesIan Massingham
 
AWS IoT Workshop Keynote
AWS IoT Workshop KeynoteAWS IoT Workshop Keynote
AWS IoT Workshop KeynoteIan Massingham
 
Security Best Practices: AWS AWSome Day Management Track
Security Best Practices: AWS AWSome Day Management TrackSecurity Best Practices: AWS AWSome Day Management Track
Security Best Practices: AWS AWSome Day Management TrackIan Massingham
 
AWS re:Invent 2016 Day 2 Keynote re:Cap
AWS re:Invent 2016 Day 2 Keynote re:CapAWS re:Invent 2016 Day 2 Keynote re:Cap
AWS re:Invent 2016 Day 2 Keynote re:CapIan Massingham
 
AWS re:Invent 2016 Day 1 Keynote re:Cap
AWS re:Invent 2016 Day 1 Keynote re:CapAWS re:Invent 2016 Day 1 Keynote re:Cap
AWS re:Invent 2016 Day 1 Keynote re:CapIan Massingham
 
Getting Started with AWS Lambda & Serverless Cloud
Getting Started with AWS Lambda & Serverless CloudGetting Started with AWS Lambda & Serverless Cloud
Getting Started with AWS Lambda & Serverless CloudIan Massingham
 
Building Better IoT Applications without Servers
Building Better IoT Applications without ServersBuilding Better IoT Applications without Servers
Building Better IoT Applications without ServersIan Massingham
 
AWS AWSome Day Roadshow
AWS AWSome Day RoadshowAWS AWSome Day Roadshow
AWS AWSome Day RoadshowIan Massingham
 
AWS AWSome Day Roadshow Intro
AWS AWSome Day Roadshow IntroAWS AWSome Day Roadshow Intro
AWS AWSome Day Roadshow IntroIan Massingham
 
Hashiconf AWS Lambda Breakout
Hashiconf AWS Lambda BreakoutHashiconf AWS Lambda Breakout
Hashiconf AWS Lambda BreakoutIan Massingham
 
Getting started with AWS IoT on Raspberry Pi
Getting started with AWS IoT on Raspberry PiGetting started with AWS IoT on Raspberry Pi
Getting started with AWS IoT on Raspberry PiIan Massingham
 
AWSome Day Dublin Intro & Closing Slides
AWSome Day Dublin Intro & Closing Slides AWSome Day Dublin Intro & Closing Slides
AWSome Day Dublin Intro & Closing Slides Ian Massingham
 
GOTO Stockholm - AWS Lambda - Logic in the cloud without a back-end
GOTO Stockholm - AWS Lambda - Logic in the cloud without a back-endGOTO Stockholm - AWS Lambda - Logic in the cloud without a back-end
GOTO Stockholm - AWS Lambda - Logic in the cloud without a back-endIan Massingham
 
What's New at AWS Update for AWS User Groups
What's New at AWS Update for AWS User Groups What's New at AWS Update for AWS User Groups
What's New at AWS Update for AWS User Groups Ian Massingham
 
Advanced Security Masterclass - Tel Aviv Loft
Advanced Security Masterclass - Tel Aviv LoftAdvanced Security Masterclass - Tel Aviv Loft
Advanced Security Masterclass - Tel Aviv LoftIan Massingham
 
Security Best Practices
Security Best PracticesSecurity Best Practices
Security Best PracticesIan Massingham
 

Mais de Ian Massingham (20)

Some thoughts on measuring the impact of developer relations
Some thoughts on measuring the impact of developer relationsSome thoughts on measuring the impact of developer relations
Some thoughts on measuring the impact of developer relations
 
Leeds IoT Meetup - Nov 2017
Leeds IoT Meetup - Nov 2017Leeds IoT Meetup - Nov 2017
Leeds IoT Meetup - Nov 2017
 
DevTalks Romania - Getting Started with AWS Lambda & the Serverless Cloud
DevTalks Romania - Getting Started with AWS Lambda & the Serverless CloudDevTalks Romania - Getting Started with AWS Lambda & the Serverless Cloud
DevTalks Romania - Getting Started with AWS Lambda & the Serverless Cloud
 
Getting started with AWS Lambda and the Serverless Cloud
Getting started with AWS Lambda and the Serverless CloudGetting started with AWS Lambda and the Serverless Cloud
Getting started with AWS Lambda and the Serverless Cloud
 
AWS AWSome Day - Getting Started Best Practices
AWS AWSome Day - Getting Started Best PracticesAWS AWSome Day - Getting Started Best Practices
AWS AWSome Day - Getting Started Best Practices
 
AWS IoT Workshop Keynote
AWS IoT Workshop KeynoteAWS IoT Workshop Keynote
AWS IoT Workshop Keynote
 
Security Best Practices: AWS AWSome Day Management Track
Security Best Practices: AWS AWSome Day Management TrackSecurity Best Practices: AWS AWSome Day Management Track
Security Best Practices: AWS AWSome Day Management Track
 
AWS re:Invent 2016 Day 2 Keynote re:Cap
AWS re:Invent 2016 Day 2 Keynote re:CapAWS re:Invent 2016 Day 2 Keynote re:Cap
AWS re:Invent 2016 Day 2 Keynote re:Cap
 
AWS re:Invent 2016 Day 1 Keynote re:Cap
AWS re:Invent 2016 Day 1 Keynote re:CapAWS re:Invent 2016 Day 1 Keynote re:Cap
AWS re:Invent 2016 Day 1 Keynote re:Cap
 
Getting Started with AWS Lambda & Serverless Cloud
Getting Started with AWS Lambda & Serverless CloudGetting Started with AWS Lambda & Serverless Cloud
Getting Started with AWS Lambda & Serverless Cloud
 
Building Better IoT Applications without Servers
Building Better IoT Applications without ServersBuilding Better IoT Applications without Servers
Building Better IoT Applications without Servers
 
AWS AWSome Day Roadshow
AWS AWSome Day RoadshowAWS AWSome Day Roadshow
AWS AWSome Day Roadshow
 
AWS AWSome Day Roadshow Intro
AWS AWSome Day Roadshow IntroAWS AWSome Day Roadshow Intro
AWS AWSome Day Roadshow Intro
 
Hashiconf AWS Lambda Breakout
Hashiconf AWS Lambda BreakoutHashiconf AWS Lambda Breakout
Hashiconf AWS Lambda Breakout
 
Getting started with AWS IoT on Raspberry Pi
Getting started with AWS IoT on Raspberry PiGetting started with AWS IoT on Raspberry Pi
Getting started with AWS IoT on Raspberry Pi
 
AWSome Day Dublin Intro & Closing Slides
AWSome Day Dublin Intro & Closing Slides AWSome Day Dublin Intro & Closing Slides
AWSome Day Dublin Intro & Closing Slides
 
GOTO Stockholm - AWS Lambda - Logic in the cloud without a back-end
GOTO Stockholm - AWS Lambda - Logic in the cloud without a back-endGOTO Stockholm - AWS Lambda - Logic in the cloud without a back-end
GOTO Stockholm - AWS Lambda - Logic in the cloud without a back-end
 
What's New at AWS Update for AWS User Groups
What's New at AWS Update for AWS User Groups What's New at AWS Update for AWS User Groups
What's New at AWS Update for AWS User Groups
 
Advanced Security Masterclass - Tel Aviv Loft
Advanced Security Masterclass - Tel Aviv LoftAdvanced Security Masterclass - Tel Aviv Loft
Advanced Security Masterclass - Tel Aviv Loft
 
Security Best Practices
Security Best PracticesSecurity Best Practices
Security Best Practices
 

Último

A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 

Último (20)

A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 

2014 Import.io Data Summit - Including Hadoop/Impala Getting Started Demo

  • 1. ianmas@amazon.com @IanMmmm LARGE SCALE DATA ANALYSIS WITH AWS
 
 Ian Massingham – Technical Evangelist
  • 2. THE MORE DATA YOU COLLECT THE MORE VALUE YOU CAN DERIVE FROM IT!
  • 3.
  • 4.
  • 5. THE COST OF DATA GENERATION IS FALLING!
  • 6. We are constantly producing more data
  • 7. From all types of industries
  • 8.
  • 9.
  • 10. GENERATE ➔ STORE ➔ ANALYZE ➔ SHARE!
  • 11. GENERATE ➔ STORE ➔ ANALYZE ➔ SHARE! Lower cost, higher throughput
  • 12. GENERATE ➔ STORE ➔ ANALYZE ➔ SHARE! Lower cost, higher throughput Highly constrained
  • 13. + ELASTIC AND HIGHLY SCALABLE + NO UPFRONT CAPITAL EXPENSE + ONLY PAY FOR WHAT YOU USE + AVAILABLE ON-DEMAND = REMOVE CONSTRAINTS
  • 14. GENERATE ➔ STORE ➔ ANALYZE ➔ SHARE!
  • 15. GENERATE ➔ STORE ➔ ANALYZE ➔ SHARE! AWS Import / Export AWS Direct Connect
  • 16. Inbound data transfer is free Multipart upload to S3 Physical media AWS Direct Connect
  • 17. GENERATE ➔ STORE ➔ ANALYZE ➔ SHARE! Amazon S3, Amazon Glacier, Amazon DynamoDB, Amazon RDS, Amazon Redshift, AWS Storage Gateway, Data on Amazon EC2
  • 18. GENERATE ➔ STORE ➔ ANALYZE ➔ SHARE! Amazon EC2 Amazon Elastic MapReduce
  • 19.
  • 21. •  SPLITS DATA INTO PIECES •  LETS PROCESSING OCCUR •  GATHERS THE RESULTS!
  • 22. HDFS EMRKinesis S3 DynamoDB Data management Pig Analytics languages/engines RDS Redshift AWS Data Pipeline
  • 23. EMR + IMPALA DEMO
  • 24. STARTING AN EMR CLUSTER WITH HADOOP ECOSYSTEM TOOLS PRE-INSTALLED
  • 25. COPY & LOAD OUR DATASET $  scp  –i  EMRKeyPair.pem  ~/aws/hadoop/LHRarrivals*.csv  hadoop@ec2-­‐54-­‐76-­‐242-­‐238.eu-­‐ west-­‐1.compute.amazonaws.com:     $  ssh  –i  EMRKeyPair.pem  hadoop@ec2-­‐54-­‐76-­‐242-­‐238.eu-­‐west-­‐1.compute.amazonaws.com     $  hadoop  fs  -­‐mkdir  /data/   $  hadoop  fs  -­‐put  <uploaded_files>  /data/   $  hadoop  fs  -­‐ls  -­‐h  -­‐R  /data/     or at scale, Distributed Copy using S3DistCp to parallel load from S3   $  .  /home/hadoop/impala/conf/impala.conf   $  hadoop  jar  /home/hadoop/lib/emr-­‐s3distcp-­‐1.0.jar  -­‐Dmapreduce.job.reduces=30  -­‐-­‐ src  s3://s3bucketname/  -­‐-­‐dest  hdfs://$HADOOP_NAMENODE_HOST:$HADOOP_NAMENODE_PORT/ data/  -­‐-­‐outputCodec  'none'     ** Run on a cluster master node
  • 26. CREATE EXTERNAL TABLE $  #check  the  size  of  our  data  set   $  wc  –l  LHRarrivals*.csv        850  LHRarrivals2.csv    1526  LHRarrivals.csv        2376  total     $  impala-­‐shell     Welcome  to  the  Impala  shell.     >  create  EXTERNAL  TABLE  flights  (  input  STRING,  id  BIGINT,  widget  STRING,  source   STRING,  resultnum  BIGINT,  pageurl  STRING,  scheduled  STRING,  flightnumber  STRING,   airport  STRING,  status  STRING,  terminal  STRING  )  ROW  FORMAT  DELIMITED  FIELDS   TERMINATED  BY  ','  LOCATION  '/data/';   >  select  count  (*)  from  flights;     Should  return  count(*)  2376  reflecting  the  size  of  the  data  set  
  • 27. DEMO OF ODBC ACCESS Doing this part on Amazon WorkSpaces using the Simba Cloudera Impala ODBC Driver.! Set up an SSH tunnel to the master node to allow us to connect to port 25010 from the WorkSpaces desktop to the Impala ODBC port! A previously configured system DSN allows us to work with the data from our EMR/Impala cluster directly within Microsoft Excel!
  • 28. GENERATE ➔ STORE ➔ ANALYZE ➔ SHARE! Amazon S3, Amazon DynamoDB, Amazon RDS, Amazon Redshift, Data on Amazon EC2
  • 29. GENERATE ➔ STORE ➔ ANALYZE ➔ SHARE!
  • 30. GENERATE ➔ STORE ➔ ANALYZE ➔ SHARE! BATCH PROCESSING
  • 31. GENERATE ➔ ➔ SHARE! STREAM PROCESSING
  • 32. AMAZON KINESIS
 REAL-TIME DATA STREAM PROCESSING!
  • 33. Real-time response to content in semi-structured data streams
 
 Relatively simple computations on data (aggregates, filters, sliding window, etc.)
  • 34. Hourly server logs: how your systems went wrong an hour ago Weekly / Monthly Bill: What you spent this past billing cycle Daily customer report from your website: tells you what deal or ad to try next time Daily fraud reports: tells you if there was fraud yesterday Daily business reports: tells me how customers used AWS services yesterday Real-time metrics: what just went wrong now Real-time spending alerts/caps: guaranteeing you can’t overspend Real-time analysis: what to offer the current customer now Real-time detection: blocks fraudulent use now Fast ETL into Amazon Redshift: how are customers using services now
  • 35. GENERATE ➔ STORE ➔ ANALYZE ➔ SHARE!
  • 36. GENERATE ➔ STORE ➔ ANALYZE ➔ SHARE! Amazon S3, Amazon DynamoDB, Amazon RDS, Amazon Redshift, Data on Amazon EC2 Amazon EC2 Amazon Elastic MapReduce Amazon S3, Amazon Glacier, Amazon DynamoDB, Amazon RDS, Amazon Redshift, AWS Storage Gateway, Data on Amazon EC2 AWS Import / Export AWS Direct Connect
  • 37. GENERATE ➔ ➔ SHARE! STREAM PROCESSING
  • 38. GENERATE ➔ ➔ SHARE! STREAM PROCESSING Amazon S3, Amazon DynamoDB, Amazon RDS, Amazon Redshift, Data on Amazon EC2 Amazon Kinesis Stream Processing on Amazon EC2
  • 39. WANT TO KNOW MORE? aws.amazon.com/solutions/case-studies/big-data/!
  • 40. ianmas@amazon.com @IanMmmm LARGE SCALE DATA ANALYSIS WITH AWS
 
 Ian Massingham – Technical Evangelist