AWS Cloud Kata 2014 | Jakarta - 2-3 Big Data


  1. 1. Jakarta Big Data on AWS Markku Lepistö Principal Technology Evangelist @markkulepisto
  2. 2. Does this Data make me look big?
  3. 3. Generation • Collection & storage • Analytics & computation • Collaboration & sharing
  4. 4. Generation • Collection & storage • Analytics & computation • Collaboration & sharing
  5. 5. Getting your Data into AWS: Amazon S3, Corporate Data Center • Console Upload • FTP • AWS Import Export • S3 API • Direct Connect • Storage Gateway • 3rd Party Commercial Apps • Tsunami UDP (1)
  6. 6. Services: Storage: Amazon S3 | Deployment & Administration, App Services, Compute, Storage, Database, Networking, AWS Global Infrastructure | Amazon Simple Storage Service (S3) • Unlimited storage of objects of any type • 99.999999999% durability, replicated across multiple facilities • Cost effective storage, US$0.03 per GB-month • Granular access control and permissions over objects • Encryption at rest using AES 256-bit server-side encryption • Encryption in transit using HTTPS protocol • High performance throughput supporting parallelized upload or download • Import or export data via physical device handling service • Data remains in geographic location chosen
  7. 7. Write directly to a data source: Your application, Amazon S3, DynamoDB, Any other data store, Amazon S3, Amazon EC2 (2)
  8. 8. Services: Database: Amazon DynamoDB • Zero Admin NoSQL Service • Unlimited Storage • Provisioned Throughput • Consistent <10ms response • Durable on SSD | Compute, Storage, Database, Networking, AWS Global Infrastructure
  9. 9. Scale or Autoscale DynamoDB Table Throughput
  10. 10. Queue, pre-process and then write to data source: Amazon Simple Queue Service (SQS), Amazon S3, DynamoDB, Any other data store (3)
  11. 11. Aggregate and write to data source: Flume running on EC2, Amazon S3, HDFS, Any other data store (4)
  12. 12. Choose depending upon design: Amazon SQS, Amazon S3, DynamoDB, Any SQL or NoSQL Store, Log Aggregation tools
  13. 13. S3 as a single source of truth S3 Courtesy http://techblog.netflix.com/2013/01/hadoop-platform-as-service-in-cloud.html
  14. 14. Generation • Collection & storage • Analytics & computation • Collaboration & sharing
  15. 15. Hadoop based Analysis: Amazon SQS, Amazon S3, DynamoDB, Any SQL or NoSQL Store, Log Aggregation tools, Amazon EMR
  16. 16. What is Amazon Elastic MapReduce (EMR)? EMR is Hadoop in the Cloud!
  17. 17. How does EMR work? EMR Cluster, S3 • Put the data into S3 • Choose: Hadoop distribution, # of nodes, types of nodes, custom configs, Hive/Pig/etc. • Launch the cluster using the EMR console, CLI, SDK, or APIs • Get the output from S3 • You can also store everything in HDFS
  18. 18. What can you run on EMR… S3 EMR Cluster
  19. 19. SQL based processing: Amazon SQS, Amazon S3, DynamoDB, Any SQL or NoSQL Store, Log Aggregation tools, Amazon EMR (pre-processing framework), Amazon Redshift (petabyte-scale columnar data warehouse)
  20. 20. Services: Database: Amazon Redshift • Easily and rapidly analyze petabytes of data • Fully managed data warehouse service • Automated deployment and administration • 1/10th the cost of traditional data warehouses • $1000 / Terabyte / year • Compatible with popular BI tools | Deployment & Administration, App Services, Compute, Storage, Database, Networking, AWS Global Infrastructure
  21. 21. Your choice of BI Tools on the cloud: Amazon SQS, Amazon S3, DynamoDB, Any SQL or NoSQL Store, Log Aggregation tools, Amazon EMR (pre-processing framework), Amazon Redshift
  22. 22. Generation • Collection & storage • Analytics & computation • Collaboration & sharing
  23. 23. Collaboration and Sharing insights: Amazon SQS, Amazon S3, DynamoDB, Any SQL or NoSQL Store, Log Aggregation tools, Amazon EMR, Amazon Redshift
  24. 24. Sharing results and visualizations at scale: Amazon SQS, Amazon S3, DynamoDB, Any SQL or NoSQL Store, Log Aggregation tools, Amazon EMR, Amazon Redshift, Visualization tools, Web App Server
  25. 25. Rinse and Repeat every day or hour
  26. 26. Rinse and Repeat: Amazon SQS, Amazon S3, DynamoDB, Any SQL or NoSQL Store, Log Aggregation tools, Amazon EMR, Amazon Redshift, Business Intelligence Tools, Visualization tools, GIS tools, Business Intelligence Tools, GIS tools on Hadoop, Amazon Data Pipeline
  27. 27. The complete architecture: Amazon SQS, Amazon S3, DynamoDB, Any SQL or NoSQL Store, Log Aggregation tools, Amazon EMR, Amazon Redshift, Business Intelligence Tools, Visualization tools, GIS tools, Business Intelligence Tools, GIS tools on Hadoop, Amazon Data Pipeline
  28. 28. No, it isn't!
  29. 29. What about Real-Time?
  30. 30. Faster data is better data (original in Finnish: "nopeampi data on parempi data")
  31. 31. What ___ right now? • is the exception rate • is the ad click-through • topics are trending • inventory remains • queries are slow • are the high scores
  32. 32. HAPPENING NOW! real-time == stream analytics
  33. 33. Ingest data streams • Store durably • Distribute • Scale out • Process as packets flow in
  34. 34. Realtime Analytics in the Cloud: Amazon Kinesis, Streaming Data Service
  35. 35. Kinesis architecture • Millions of sources producing 100s of terabytes per hour • Front End: Authentication, Authorization • Durable, highly consistent storage replicates data across three data centers (availability zones: AZ, AZ, AZ) • Ordered stream of events supports multiple readers • Real-time dashboards and alarms • Machine learning algorithms or sliding window analytics • Aggregate analysis in Hadoop or a data warehouse • Aggregate and archive to S3 • Inexpensive: $0.028 per million puts • Amazon Web Services
  36. 36. In-game activity | Kinesis: Real-time data stream of in-game activity | Amazon Kinesis | Clash of Clans
  37. 37. Kinesis-enabled apps on EC2 | In-game activity | Kinesis: Real-time data stream of in-game activity | Multiple Kinesis applications: Dashboards, analytics and storage | Clash of Clans | Real-time clickstream processing app | Amazon Kinesis
  38. 38. Kinesis: Real-time data stream of in-game activity | Multiple Kinesis applications: Dashboards, analytics and storage | S3 and Glacier: Data storage and long term archival | S3 | Aggregate statistics | In-game activity | EC2: In-game engagement trends dashboard | Clash of Clans | Kinesis-enabled apps on EC2 | Real-time clickstream processing app | Amazon Kinesis
  39. 39. Business-intelligence user | Kinesis: Real-time data stream of in-game activity | Multiple Kinesis applications: Dashboards, analytics and storage | EC2: In-game engagement trends dashboard | In-game activity | S3 | Aggregate statistics | S3 and Glacier: Data storage and long term archival | Data Warehouse: BI reporting and interactive queries | Clash of Clans | Kinesis-enabled apps on EC2 | EC2 | Data Warehouse | Real-time clickstream processing app | Amazon Kinesis
  40. 40. Glacier | Clickstream archive | EC2 | Data Warehouse | Kinesis: Real-time data stream of in-game activity | Multiple Kinesis applications: Dashboards, analytics and storage | EC2: In-game engagement trends dashboard | S3 and Glacier: Data storage and long term archival | Data Warehouse: BI reporting and interactive queries | Real-time clickstream processing app | In-game activity | S3 | Clash of Clans | Aggregate statistics | Business-intelligence user | Kinesis-enabled apps on EC2 | Amazon Kinesis
  41. 41. Demo: Sliding Window Analytics → Live Dashboard | S3 Storage → Redshift Data Warehouse | Website | Clickstream logs | Kinesis
  42. 42. Taipei, June 25
  43. 43. Bonus Internet of Things
  44. 44. Smart Devices Powered by the Cloud
  45. 45. Smart Devices Powered by the Cloud
  46. 46. Smart Devices Powered by the Cloud
  47. 47. Smart Devices Powered by the Cloud
  48. 48. Smart Devices Powered by the Cloud
  49. 49. Smart Devices Powered by the Cloud | Arduino Uno: CPU 20 MHz 8-bit, Memory 2 KB, Storage 32 KB | Raspberry Pi: CPU 700 MHz 32-bit, Memory 512 MB, Storage SD card
  50. 50. Smart Devices Powered by the Cloud
  51. 51. Camera • Microphone • Thermometer • Distance • GPS • Gyroscope • Accelerometer • Actuator • Relay • Motor • Manipulator • Pressure • Switch • Wheel • Propeller • Rotor
  52. 52. Challenges
  53. 53. Challenges: Thousands – Millions of Devices / Producers
  54. 54. Challenges: Thousands – Millions of Devices / Producers • Thousands – Millions of Users / Consumers
  55. 55. Distributed: Thousands – Millions of Devices / Producers • Thousands – Millions of Users / Consumers
  56. 56. At scale: Thousands – Millions of Devices / Producers • Thousands – Millions of Users / Consumers
  57. 57. Smart Devices Powered by the Cloud
  58. 58. Smart Devices Powered by the Cloud • Unlimited Storage – Memory • Unlimited Compute – Logic
  59. 59. Camera • Microphone • Thermometer • Distance • GPS • Gyroscope • Accelerometer • Actuator • Relay • Motor • Manipulator • Pressure • Switch • Wheel • Propeller • Rotor
  60. 60. Smart Devices Powered by the Cloud
  61. 61. Dropcam is the biggest inbound video service on the Web • More data uploaded per minute than YouTube • Petabytes of data processed every month • Billions of motion events detected
  62. 62. 73
  63. 63. The AWS Big Data Portfolio COLLECT | STORE | ANALYZE | SHARE Direct Connect S3 Import Export S3 EC2 DynamoDB Redshift Glacier EMR Data Pipeline Kinesis CloudFront
  64. 64. AWS C-SDK Experimental for Arduino-style IoT Devices
  65. 65. C-SDK – Native AWS Libraries, Direct Access to AWS Services from Devices
  66. 66. Demo
  67. 67. Arduino Yún
  68. 68. Raspberry Pi
  69. 69. Spark Core
  70. 70. Accelerometer | MQTT | Mosquitto MQTT Broker | MQTT-Kinesis Bridge | AWS SDK | Amazon Kinesis Real-time Streaming Data Service | AWS APIs | AWS Elastic Beanstalk Dashboard
  71. 71. Amazon SNS Alert Notification (5g) | Mobile Push | Spark API Sound Alarm (6g) | AWS Elastic Beanstalk Dashboard
  72. 72. Demo
  73. 73. AWS Cloud Kata for Start-Ups and Developers, Taipei, June 25
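
Slides 35 and 41 mention sliding window analytics over a clickstream arriving through Kinesis. The deck does not show the demo's code, so what follows is only a minimal sketch of the general idea in plain Python, not the presenter's implementation. The class name `SlidingWindowCounter`, the sample `clicks` records and the 60-second window are all assumptions made for illustration. No Kinesis API is called, and events are assumed to arrive in timestamp order.

```python
from collections import Counter, deque


class SlidingWindowCounter:
    """Counts events per key over the most recent `window_seconds` seconds."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.events = deque()    # (timestamp, key) pairs in arrival order
        self.counts = Counter()  # key -> number of events still in the window

    def add(self, timestamp, key):
        """Record one event, then drop anything that fell out of the window."""
        self.events.append((timestamp, key))
        self.counts[key] += 1
        self._expire(timestamp)

    def _expire(self, now):
        cutoff = now - self.window_seconds
        # Oldest events sit at the left end because input is time-ordered.
        while self.events and self.events[0][0] <= cutoff:
            _, old_key = self.events.popleft()
            self.counts[old_key] -= 1
            if self.counts[old_key] == 0:
                del self.counts[old_key]

    def top(self, n):
        """Return the n most frequent keys currently inside the window."""
        return self.counts.most_common(n)


# Hypothetical clickstream records: (seconds since start, page path).
clicks = [(0, "/home"), (5, "/pricing"), (12, "/home"), (61, "/docs"), (65, "/home")]

window = SlidingWindowCounter(window_seconds=60)
for ts, page in clicks:
    window.add(ts, page)

print(window.top(2))
```

In a real deployment the loop over `clicks` would be replaced by whatever reads records from the stream, and `top` would feed the live dashboard. Keeping only a deque and a counter means memory grows with the number of events inside the window, not with the total stream length.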
