Data Infrastructure on Hadoop
Venkatesh S
Architect, Hadoop Data
Outline
• Big Picture
• Data Infrastructure
–Now
–Next Wave
• Questions
BIG Data is here.
Managing BIG Data
Ads Optimization
Content Optimization
Search Index
Machine Learning (e.g. Spam filters)
RSS Feeds
Site thumbnails
Who is using this Data?
Next Wave!
Hadoop Analytics Warehouse
Utilization
Storage Efficiency
Questions?

Data Infrastructure on Hadoop: Big Picture, Now & Next Wave

Editor's Notes

  1. This session delves into the data infrastructure that drives productivity gains by letting users focus on utilizing the data on Hadoop and not on how to get it. We'll also look into the next generation of data infrastructure at Yahoo!
  2. BIG Data is here. Data is increasing faster than compute. Turning data into insights isn't trivial. Bring personal meaning to the Web. Turn data into a competitive advantage for Yahoo! Make it relevant. All of that requires an unprecedented ability to process big data at scale, at speeds never before thought possible, and with a powerful layer of security. I'm talking about 120 terabytes of data and 100 billion events every single day, on top of 70 petabytes of data. Hadoop is at the epicenter of big data and cloud computing.
  3. Hadoop clusters are partitioned based on the class of users. Managing data is a challenge. We built a Data Management Solution optimized for space, time and bandwidth.
     • Productivity gains: users focus on utilizing the data and not on how to get it
     • Scale: 100+ TB of data and ~2000+ feeds processed daily
     • Support standard data loading sources: various data sources, sinks and varying interfaces
     • Support replication across Hadoop clusters
     • Data loading SLA
     • Reliable data loading
     • Guarantee data quality
     • Data retention management
     • Anonymization
     • Compliance archival
  4. Y! Research and Sciences; Advertising & Audience; Targeting; Ad Optimizations; Reporting; Content Agility
  5. Key Drivers:
     • Once users focus on utilizing the data and not on how to get it, how do we optimize for space and time?
     • How do we learn about Hadoop itself so we can build better systems?
     • What KPIs can we generate that could help us drive the next wave?
  6. What: This attempts to provide insight into how the Hadoop infrastructure is actually enabling the business.
     How: Gather all the data needed to analyze Hadoop performance into one central Hive warehouse.
     Next:
     • Provide insights into Hadoop cluster usage so that we can tune better and prioritize features.
     • Generate KPIs for the clusters, such as availability, utilization and capacity planning.
     • Generate canned reports per Hadoop cluster.
     • Provide key information for managing the data better, such as archival decisions or management of replicas.
     • Find the "Query of Death": jobs that cause the clusters to go down.
  7. Utilization: The central warehouse of Hadoop logs could drive metering and reporting efforts.
     • Infrastructure utilization
     • Metering
     • Reporting
     How do we drive utilization of resources?
  8. Key Drivers:
     • On average, each byte of information is replicated by a factor of 6 (on average 2 clusters, each with replication factor 3). Some feeds may be copied to as many as 8 clusters.
     • Typically, 80% of the accesses to a file occur in its first 20 weeks.
     • Data-local maps average around 20%.
     • Millions of metadata files take up valuable NameNode namespace and memory.
     Opportunities:
     • Reduce the replication factor after data becomes cold.
     • Use erasure coding for cold data.
     • Archive aging data into Hadoop Archives as the access frequency drops.
     • Reduce the footprint of metadata stored in files on HDFS (Howl?).
  9. We are hiring!!!
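
The storage-efficiency figures in note 8 can be made concrete with a little arithmetic. The sketch below is only illustrative: the replication factor of 6 (2 clusters at factor 3) comes from the notes, but the 100 TB logical size, the 50% cold fraction, the cold replication factor of 2, and the Reed-Solomon (10, 4) erasure-coding layout are assumptions chosen for the example, not numbers from the talk. The function names are hypothetical as well.

```python
def replication_overhead(clusters: int, replication_factor: int) -> float:
    """Raw bytes stored per logical byte under plain replication."""
    return float(clusters * replication_factor)


def erasure_coding_overhead(data_blocks: int, parity_blocks: int) -> float:
    """Raw bytes stored per logical byte under a (data, parity) erasure code."""
    return (data_blocks + parity_blocks) / data_blocks


def raw_footprint_tb(logical_tb: float, cold_fraction: float,
                     hot_overhead: float, cold_overhead: float) -> float:
    """Total raw storage when hot and cold data use different overheads."""
    hot_tb = logical_tb * (1.0 - cold_fraction) * hot_overhead
    cold_tb = logical_tb * cold_fraction * cold_overhead
    return hot_tb + cold_tb


if __name__ == "__main__":
    logical_tb = 100.0      # assumed logical data size
    cold_fraction = 0.5     # assumed share of data that has gone cold
    clusters = 2            # from the notes: data lives in 2 clusters on average

    hot = replication_overhead(clusters, 3)            # 6x, as stated in the notes
    cold_rep2 = replication_overhead(clusters, 2)      # assumed lower factor for cold data
    cold_ec = clusters * erasure_coding_overhead(10, 4)  # assumed RS(10, 4) per cluster

    scenarios = [
        ("baseline, everything at 6x", hot),
        ("cold data at replication 2", cold_rep2),
        ("cold data erasure coded", cold_ec),
    ]
    for label, cold_overhead in scenarios:
        total = raw_footprint_tb(logical_tb, cold_fraction, hot, cold_overhead)
        print(f"{label}: {total:.0f} TB raw")
```

Under these assumed inputs, the comparison shows why both opportunities matter: lowering the overhead on cold data shrinks the raw footprint without touching the hot data that most jobs read.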