
Big Data and Analytics Innovation Summit


Published in: Technology


  1. Cloud
  2. The big data pipeline; How customers are using the pipeline; The big data ecosystem on the cloud
  3. Generation; Collect; Store; Collaboration & sharing; Analysis and Computation
  4. Generation; Collect; Store; Collaboration & sharing; Analysis and Computation; lower cost, increased throughput
  5. Generation; Collect; Store; Collaboration & sharing; Analysis and Computation; lower cost, increased throughput; constraint
  6. Generated data; Available for analysis; Data volume. Sources: Gartner: User Survey Analysis: Key Trends Shaping the Future of Data Center Infrastructure Through 2011; IDC: Worldwide Business Analytics Software 2012–2016 Forecast and 2011 Vendor Shares
  7. Very high barrier to turning data into information…
  8. Very high barrier to turning data into information. Infrastructure capacity; Technical Skills; Questions to ask; Cheap experimentation
  9. Amazon Web Services Cloud
  10. Elastic and highly scalable + No upfront capital expense + Only pay for what you use + Available on-demand = Remove constraints
  11. Remove constraints = More experimentation; More experimentation = More innovation; More innovation = Competitive edge
  12. Amazon Web Services: Removes constraints; Focus on your data; Leave undifferentiated heavy lifting to us
  13. big data
  14. Bankinter uses HPC on AWS for Monte Carlo Simulation. “Bankinter uses AWS as an integral part of our credit-risk simulation application; We need to perform at least 5,000,000 simulations to get realistic results”. Credit Data. Average simulation time went from 23 hours to 20 minutes
  15. Challenge: Learn about customer based on what they do, rather than what they say (i.e., data exhaust); virtually unlimited data. Solution: Always-on cluster continually processes new financial data and stores results in S3. Collaborative filtering used to provide recommendations and ad-hoc queries performed using Hive.
  16. For illustrative purposes only.
  17. S&P Capital IQ; Microsoft SQL Server; Amazon S3: Companies You May Be Interested In; Amazon S3: Clicks, Key Developments, Company Profiles; Amazon Elastic MapReduce: Compute User Selectivity, Compute Key Developments, Join & Score
  18. Challenge: Volatile weather is deadly to crops like grapes and tomatoes. Solution: Built a predictive model based on freely available data: 60 years of crop data, 14 TB of soil data, and one million government Doppler radar points. 50 Hadoop clusters process new data as it comes into S3 each day, continuously updating the model. 150B Soil Observations; 3M Daily Weather Measurements; 850K Precision Rainfall Grids Tracked
  19. Simulations Each Month. Per Simulation: 10K Unique Scenarios Generated; 5 Trillion Datapoints; 5-6k Node Hadoop Cluster
  20. AWS Import/Export; Corporate data center; Amazon Elastic MapReduce; Amazon Simple Storage Service (S3); BI Users; Clickstream data from 500+ websites and VoD platform
  21. More than 25 Million Streaming Members; 50 Billion Events Per Day; 30 Million plays every day; 2 billion hours of video in 3 months; 4 million ratings per day; 3 million searches; Device location, time, day, week etc.; Social data
  22. 10 TB of streaming data per day
  23. Data consumed in multiple ways: S3; EMR; Prod Cluster (EMR); Recommendation Engine; Ad-hoc Analysis; Personalization
  24. Amazon DynamoDB
  25. “Who buys video games?”
  26. Per day: 3.5 billion records; 13 TB of click stream logs; 71 million unique cookies
  27. Results: 500% return on ad spend; 17,000% reduction in procurement time
  28. “Who is using our service?”
  29. Identified early mobile usage; Invested heavily in mobile development; Finding signal in the noise of logs; 9,432,061 unique mobile devices used the Yelp mobile app.
  30. Every day is crucial and costly
  31. Challenge: To run a virtual screen with a higher accuracy algorithm & 21 million compounds
  32. Using Cycle Computing and Amazon Web Services. Metric: Count. Compute Hours of Work: 109,927 hours; Compute Days of Work: 4,580 days; Compute Years of Work: 12.55 years; Ligand Count: ~21 million ligands
  33. 3 Hours for $4,828.85/hr
  34. Relational Database Service: Fully managed database (MySQL, Oracle, MSSQL); DynamoDB: NoSQL, Schemaless, Provisioned throughput database; S3: Object datastore up to 5 TB per object, 99.999999999% durability
  35. Amazon Elastic MapReduce: Map-Reduce engine; Hadoop-as-a-service; Massively parallel; Cost effective AWS wrapper
  36. Amazon Redshift: data warehouse service; petabyte-scale; fast and fully managed
  37. RDBMS; Redshift; OLTP; ERP; Reporting and BI
  38. Source: http://nerds.airbnb.com/redshift-performance-cost. Columns: Table Size, Query type, Hive, Redshift. 3 billion rows, simple range query: Hive 1680 seconds (28 min), Redshift 360 seconds (6 min). 1 million rows, 2 complex joins: Hive 182 seconds, Redshift 8 seconds. $13.60/hour on Redshift versus $57/hour on Hive
  39. Generation; Collect; Store; Collaboration & sharing; Analysis and Computation
  40. Thank you! aws.amazon.com/big-data. May 14th, Kowloonbay International Trade & Exhibition Centre (KITEC), Hong Kong. One day Free training; Walk through of services. http://aws.amazon.com/apac/awsday/hk/
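
As a rough sanity check on the figures quoted in slides 32, 33, and 38, the arithmetic can be reproduced in a few lines of Python. This is only a sketch under stated assumptions: the function names are illustrative, the 24-hour day and 365-day year are assumed conversions, and reading slide 33 as an hourly rate sustained for three hours is one interpretation of the slide text, not something the deck states outright.

```python
# Sanity-check sketch for figures quoted in the slides.
# All helper names are hypothetical; they do not come from the deck.

def compute_time(hours, hours_per_day=24, days_per_year=365):
    """Convert compute hours to (days, years), assuming 24 h days and 365 d years."""
    days = hours / hours_per_day
    years = days / days_per_year
    return days, years


def ratio(baseline, improved):
    """How many times larger the baseline value is than the improved one."""
    return baseline / improved


# Slide 32: 109,927 compute hours of work.
days, years = compute_time(109_927)

# Slide 33: assumed to mean 3 hours of wall-clock time at $4,828.85 per hour.
total_cost = 3 * 4828.85

# Slide 38: Hive versus Redshift query times (seconds) and hourly cost (USD).
range_query_speedup = ratio(1680, 360)
join_query_speedup = ratio(182, 8)
hourly_cost_ratio = ratio(57.0, 13.60)

print(f"Compute days:  {days:,.0f}")
print(f"Compute years: {years:.2f}")
print(f"Assumed total cost for 3 hours: ${total_cost:,.2f}")
print(f"Range query speedup:  {range_query_speedup:.2f}x")
print(f"Join query speedup:   {join_query_speedup:.2f}x")
print(f"Hourly cost ratio (Hive / Redshift): {hourly_cost_ratio:.2f}x")
```

Under these assumptions the conversions agree with the days and years shown on slide 32, and the Redshift comparison works out to roughly a 4.7x and 22.75x speedup at about a quarter of the hourly cost.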