Implementation of a Big Data Architecture for Real-Time Analytics with DataStax Enterprise Graph, Analytics and Search - Joseph Arriola
My topic presented at DataStax Accelerate 2019 was "Implementation of a Big Data Architecture for Real-Time Analytics with DataStax Enterprise Graph, Analytics and Search". The goal was to show some of the most widely used open source technologies on the market and how to integrate them with an enterprise tool to do real-time analytics.
Primary and Clustering Keys should be one of the very first things you learn about when modeling Cassandra data. Most people coming from a relational background automatically think, "Yeah, I know what a Primary Key is", and gloss right over it. Because of this, there always seems to be a lot of confusion around the topic of Primary Keys in Cassandra. This presentation will demystify that confusion. I will cover what the different types of Keys are, how they can be used, what their purpose is, and how they affect your queries.
For this presentation, I will be using CrossFit gym locations as my subject matter. I will explain the differences between Primary Keys, Compound Keys, Clustering Keys, & Composite Keys. I will also show how the data behind each type differs as stored on disk. Lastly, I will show what queries each type of key will support.
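As a hedged illustration of the key types discussed above, here is a minimal sketch using the DataStax Python driver; the gyms table, its columns, and the host/keyspace names are hypothetical stand-ins for the CrossFit example, not the schema from the talk.

```python
# Minimal sketch with the DataStax Python driver (pip install cassandra-driver).
# The "gyms" table and "demo" keyspace are hypothetical stand-ins.
from cassandra.cluster import Cluster

session = Cluster(["127.0.0.1"]).connect("demo")  # keyspace assumed to exist

# Composite partition key (country, state_province) plus clustering columns
# (city, gym_name): rows sharing a partition key are stored together on disk,
# sorted by the clustering columns.
session.execute("""
    CREATE TABLE IF NOT EXISTS gyms (
        country        text,
        state_province text,
        city           text,
        gym_name       text,
        opening_date   timestamp,
        PRIMARY KEY ((country, state_province), city, gym_name)
    )
""")

# Supported query: restrict the full partition key, then clustering columns in order.
rows = session.execute(
    "SELECT gym_name FROM gyms "
    "WHERE country = %s AND state_province = %s AND city = %s",
    ("US", "TX", "Austin"))
```

Queries that skip the partition key (e.g. filtering on city alone) are exactly the kind that this key layout will not support, which is the point the presentation illustrates.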
Cassandra Summit: Data Modeling a Scheduling App - Adam Hutson
The task of a data modeler is to create order out of chaos without excessively distorting the truth. The finished product should be a data model that describes the structure, manipulation, and integrity aspects of the data to be stored. To properly create a data model, the modeler will transform said chaos through three distinct stages. The first is a Conceptual Data Model, then a Logical Data Model, and lastly, a Physical Data Model. This presentation will cover all three stages. It will also walk through the creation of a full-blown data model for a service scheduling application to be built with a Cassandra backend. This presentation is based on a three-part blog series I wrote for DataScale.
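As a hedged sketch of what a physical-model artifact might look like for such a scheduling app (the table below is illustrative, not the model from the talk): in Cassandra the physical model is designed query-first, so the table is keyed by the access pattern "all appointments for a provider on a given day".

```python
# Illustrative physical model for a scheduling app; names are invented.
from cassandra.cluster import Cluster

session = Cluster(["127.0.0.1"]).connect("scheduling")  # keyspace assumed
session.execute("""
    CREATE TABLE IF NOT EXISTS appointments_by_provider_day (
        provider_id uuid,
        day         date,
        start_time  timestamp,
        customer_id uuid,
        service     text,
        PRIMARY KEY ((provider_id, day), start_time)
    ) WITH CLUSTERING ORDER BY (start_time ASC)
""")
```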
Build Your Own Data Beast: Greenplum + Dell - skahler
Looking for a software and hardware stack that can get you results with big data now? Greenplum Database plus Dell hardware will help you construct a data science monster.
This presentation provides an overview of DataStax Enterprise, which is a smart data platform built on Apache Cassandra that manages both real-time and analytic data in the same database cluster.
Decoupling Compute and Storage for Data Workloads - Alluxio, Inc.
This was presented by Carlos Quieroz, Head of Data Platform at Development Bank of Singapore, at the Data Transformation in Financial Services meetup in Singapore jointly hosted by Accenture, Talend, BigDataSG Hadoop, and Alluxio.
Get Results, Build Your Own Big Data Beast: Greenplum + Dell - skahler
Build your own big data cluster using a software package that enables your current DBAs and Analysts to step up their data science game. Dell hardware and Pivotal Greenplum Database are a perfect match.
Polyglot Database - LinuxCon North America 2016 - Dave Stokes
Many relational databases are adding NoSQL features to their products. So what happens when you can get direct access to the data as a key/value pair, or you can store an entire document in a column of a relational table, and more?
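A minimal sketch of "an entire document in a column of a relational table", using Python's built-in sqlite3 for illustration (it requires SQLite built with the JSON1 extension, which is the common default); the table and fields are invented, not from the talk, which focuses on MySQL.

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, doc TEXT)")
db.execute(
    "INSERT INTO products (doc) VALUES (?)",
    (json.dumps({"name": "widget", "tags": ["red", "small"], "price": 9.99}),),
)

# Query inside the stored document with SQL: a relational table with
# NoSQL-style document access.
row = db.execute(
    "SELECT json_extract(doc, '$.name'), json_extract(doc, '$.price') FROM products"
).fetchone()
print(row)  # ('widget', 9.99)
```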
Talk I gave at Shoreditch Village Hall for a recent IBM BlueMix meetup. Some high-level Cloudant intro material, but then posing a more interesting question: whether the future of the Enterprise Cloud will actually be defined by the service/people element of Managed Services versus the software/hosting bit. Hat tip to Simon Metson (@drsm79) for the initial idea behind this and for coining the idea of Expertise as a Service.
With Azure Data Lake Store, analyze all of your data in one place with no artificial constraints. Data Lake Store can store trillions of files.
Azure Data Lake Analytics: Easily develop and run massively parallel data transformation and processing programs in U-SQL, R, Python, and .NET over petabytes of data. With no infrastructure to manage, you can process data on demand, scale instantly, and only pay per job.
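A hedged sketch of browsing a Data Lake Store account from Python using the azure-datalake-store package; the tenant, app credentials, store name, and file path below are placeholders, not values from the source.

```python
# pip install azure-datalake-store; all identifiers here are placeholders.
from azure.datalake.store import core, lib

token = lib.auth(tenant_id="<tenant>", client_id="<app-id>",
                 client_secret="<app-secret>")
adls = core.AzureDLFileSystem(token, store_name="<account-name>")

print(adls.ls("/"))  # list top-level folders in the store
with adls.open("/data/events/2019/01/01/log.json", "rb") as f:
    head = f.read(1024)  # read the first KB of a file
```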
Aleksejs Nemirovskis - Manage Your Data Using Oracle BDA - Andrejs Vorobjovs
Manage Your Data Using Oracle Big Data Appliance - Tips & Tricks
Ingest, process, and manage the data using Oracle Big Data Appliance (an end-to-end big data solution from Oracle):
- Oracle BDA architecture and components overview - Oracle platform, Cloudera CDH, Cloudera Manager, and specific Oracle components;
- Advantages and additional value of an Oracle BDA;
- Challenges faced inside the whole stack (BDA, Cloudera);
- Challenges that came from the original Hadoop ecosystem;
- Customer case (anonymized): how to utilize the power of an Oracle BDA, including the external Informatica Big Data Management tool.
Xanadu is the most advanced big data platform technology developed by eMediaTrack in collaboration with researchers at Oxford University. Xanadu enables NoSQL write-only (archiving) key-value-time database functionality with ACID properties. The ACID properties are essential for mission-critical systems (e.g., financial services, healthcare services, national security intelligence). The ACID properties are also essential for many predictive analyses. A key-value-time store (time series of all key-value pairs) facilitates easy extraction and analysis based on multiple time series. Xanadu provides a highly scalable, available, fault-tolerant, de-duplicated, immortal, and high-performance big data store. Xanadu reduces the actual storage footprint by recognising duplicate data and maintaining only as many copies as it needs to assure its availability. Xanadu exploits Distributed File System (DFS) technology using commodity servers and storage devices. Xanadu enables a database with built-in massively parallel big data processing exploiting the DFS (Hadoop MapReduce being just one example). Xanadu enables real-time big data analytics. Xanadu enables JavaScript-based, complex, integrated, parallel queries. This last feature lowers the entry barrier for developers using the most widespread web programming language to transform themselves into powerful, high-valued 'data scientists.'
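A toy sketch of the key-value-time idea described above: every write is appended with its timestamp (nothing is overwritten), and reads can ask for the value of a key "as of" any point in time. This is purely illustrative and is not Xanadu's implementation.

```python
import bisect
from collections import defaultdict

class KeyValueTimeStore:
    """Toy key-value-time store: a full time series per key, never overwritten."""
    def __init__(self):
        self._ts = defaultdict(list)    # key -> sorted timestamps
        self._vals = defaultdict(list)  # key -> values aligned with _ts

    def put(self, key, value, ts):
        i = bisect.bisect_right(self._ts[key], ts)
        self._ts[key].insert(i, ts)     # archive the write; never overwrite
        self._vals[key].insert(i, value)

    def get_as_of(self, key, ts):
        """Return the value the key had at time ts (None if no write yet)."""
        i = bisect.bisect_right(self._ts[key], ts)
        return self._vals[key][i - 1] if i else None

    def series(self, key):
        """The whole time series for a key, for multi-series analysis."""
        return list(zip(self._ts[key], self._vals[key]))

store = KeyValueTimeStore()
store.put("sensor:42", 19.5, ts=100)
store.put("sensor:42", 21.0, ts=200)
print(store.get_as_of("sensor:42", 150))  # 19.5 -- the value as of t=150
```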
How Open Source Will Change How You Think about Storage - LGI Tech Summit - Scott Ryan
As software eats the world, open source always follows. Storage is being fundamentally disrupted by open source. This presentation covers software-defined storage and open source trends and how they affect traditional storage.
Cassandra is well known for its best-in-class multi-DC replication. However, some advanced use cases require replicating data between clusters. Some of these use cases involve constrained deployment scenarios, such as limited bandwidth or frequent network disruptions, which can put significant strain on multi-DC setups. Other use cases include advanced data flows with hub-and-spoke configurations.
In this talk we will introduce you to DSE's new Advanced Replication capability that allows for these data flows in such constrained environments. We will provide some motivating use cases, an architecture overview, and some customer and testing results.
About the Speakers
Brian Hess Senior Product Manager, Analytics, DataStax
Brian has been in the analytics space for over 15 years ranging from government to data mining applied research to analytics in enterprise data warehousing and NoSQL engines, in roles ranging from Cryptologic Mathematician to Director of Advanced Analytics to Senior Product Manager. In all these roles he has pushed data analytics and processing to massive scales in order to solve problems that were previously unsolvable.
Cliff Gilmore Solution Architect, DataStax
It's time to memorize everything about Object Storage, from A to Z, with this exciting chart. Then get ready to take the fun quiz. Here is the link: https://itblog.sandisk.com/object-storage-quiz/
Spark + Flashblade: Spark Summit East talk by Brian Gold - Spark Summit
Modern infrastructure and applications generate extraordinary volumes of log and telemetry data. At Pure Storage, we know this first hand: we have over 5PB of log data from production customers running our all-flash storage systems, from our engineering testbeds, and from test stations at manufacturing partners. Every part of our company — from engineering to sales — now depends on the insights we gather from this data. Given the diversity of our end users, it’s no surprise that our analysis tools comprise a broad mix of reporting queries, stream-processing operations, ad-hoc analyses, and deeper machine-learning algorithms. In this session, we will cover lessons learned from scaling our data warehouse and how we are leveraging Apache Spark’s capabilities as a central hub to meet our analytics demands.
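A hedged sketch of the "Spark as a central hub" pattern described above: one batch job reading JSON logs and serving both a reporting-style aggregation and input for downstream analysis. The paths and field names (level, array_id) are placeholders, not Pure Storage's actual schema.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("log-analytics").getOrCreate()

logs = spark.read.json("/data/logs/*.json")  # placeholder path

# Reporting-style query: error counts per storage array (hypothetical fields).
errors = (logs.filter(F.col("level") == "ERROR")
              .groupBy("array_id")
              .count()
              .orderBy(F.desc("count")))
errors.show()

# The same DataFrame can feed ad-hoc analysis, stream jobs, or ML pipelines.
logs.write.mode("overwrite").parquet("/warehouse/logs_curated")
```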
Big data requires a service that can orchestrate and operationalize processes to refine the enormous stores of raw data into actionable business insights. Azure Data Factory is a managed cloud service built for these complex hybrid extract-transform-load (ETL), extract-load-transform (ELT), and data integration projects.
Meetup at AI NextCon 2019: In-Stream Data Processing, Data Orchestration & More - Alluxio, Inc.
Alluxio - Data Orchestration for Analytics and AI in the Cloud
Oct 8, 2019
Speakers:
Haoyuan Li & Bin Fan, Alluxio
Visit https://www.alluxio.io/events/ for more Alluxio events.
Be ready for big data challenges.
The material is based on a talk by Leonid Sokolov, Big Data Architect at GreenM.
Full article https://medium.com/greenm/scalable-data-pipeline-f5d3c8f7a6d9
In-Memory Logical Data Warehouse for accelerating Machine Learning Pipelines ... - Gianmario Spacagna
Abstract:
Legacy enterprise architectures still rely on relational data warehouses and require moving and syncing with the so-called "Data Lake", where raw data is stored and periodically ingested into a distributed file system such as HDFS.
Moreover, there are a number of use cases where you might want to avoid storing data on the development cluster disks, such as for regulatory reasons or to reduce latency, in which case Alluxio (previously known as Tachyon) can make this data available in-memory and shared among multiple applications.
We propose an Agile workflow by combining Spark, Scala, DataFrame (and the recent DataSet API), JDBC, Parquet, Kryo and Alluxio to create a scalable, in-memory, reactive stack to explore data directly from source and develop high quality machine learning pipelines that can then be deployed straight into production.
In this talk we will:
* Present how to load raw data from an RDBMS and use Spark to make it available as a DataSet
* Explain the iterative exploratory process and advantages of adopting functional programming
* Critically analyze the issues faced with the existing methodology
* Show how to deploy Alluxio and how it greatly improved the existing workflow by providing the desired in-memory solution and by decreasing the loading time from hours to seconds
* Discuss some future improvements to the overall architecture
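A minimal sketch of the proposed stack (the talk's stack is Scala-based; this is an equivalent PySpark outline). The JDBC connection details, table name, and Alluxio master address are placeholders, and the JDBC driver is assumed to be on Spark's classpath.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("in-memory-ldw").getOrCreate()

# 1. Load raw data straight from the RDBMS over JDBC.
orders = (spark.read.format("jdbc")
          .option("url", "jdbc:postgresql://dbhost:5432/warehouse")
          .option("dbtable", "public.orders")
          .option("user", "etl").option("password", "secret")
          .load())

# 2. Persist it once as Parquet on Alluxio so later runs read from memory
#    (seconds) instead of re-pulling from the source (hours).
orders.write.mode("overwrite").parquet("alluxio://alluxio-master:19998/ldw/orders")

# 3. Explore interactively from the in-memory copy.
cached = spark.read.parquet("alluxio://alluxio-master:19998/ldw/orders")
cached.createOrReplaceTempView("orders")
spark.sql("SELECT COUNT(*) FROM orders").show()
```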
Bio:
Gianmario is a Senior Data Scientist at Pirelli Tyre, processing telemetry data for smart manufacturing and connected vehicles applications.
His main expertise is on building production-oriented machine learning systems.
Co-author of the Professional Manifesto for Data Science (datasciencemanifesto.com), founder of the Data Science Milan meetup group, and currently writing the "Python Deep Learning" book (to be published soon).
He loves evangelising his passion for best practices and effective methodologies amongst the community.
Prior to Pirelli, he worked in Financial Services (Barclays), Cyber Security (Cisco) and Predictive Marketing (AgilOne).
Storage Systems for Highly Scalable Systems Presentation - andyman3000
Presentation from http://www.hfadeel.com/Blog/?p=151 about what kind of storage systems players like Facebook or Google use for their extreme scalability requirements.
Aerospike Meetup July 2019 | Big Data Demystified - Omid Vahdaty
Building a low-latency (sub-millisecond), high-throughput database that can handle big data AND linearly scale is not easy - but we did it anyway...
In this session we will get to know Aerospike, an enterprise distributed primary key database solution.
- We will do an introduction to Aerospike - basic terms, how it works, and why it is widely used in mission-critical system deployments.
- We will understand the 'magic' behind Aerospike's ability to handle small, medium, and even petabyte-scale data while still guaranteeing predictable sub-millisecond latency.
- We will learn how Aerospike DevOps is different from other solutions on the market, and see how easy it is to run it in cloud environments as well as on premises.
We will also run a demo - showing a live example of the performance and self-healing technologies the database has to offer.
Advanced Deployment by Jonathan Weiss presented at Scotland on Rails 2009 in Edinburgh. Deployment and Scaling best practices. See more at http://scotlandonrails.com/schedule/28-march/advanced-deployment/
This presentation breaks down the Aerospike Key Value Data Access. It covers the topics of Structured vs Unstructured Data, Database Hierarchy & Definitions as well as Data Patterns.
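A hedged sketch of the namespace > set > record > bin hierarchy using the official Aerospike Python client (pip install aerospike); the host, namespace, set, and bin names below are placeholders, not values from the deck.

```python
import aerospike

client = aerospike.client({"hosts": [("127.0.0.1", 3000)]}).connect()

# A key is (namespace, set, user key); bins are the named values of the record.
key = ("test", "users", "user:1001")
client.put(key, {"name": "Ada", "visits": 1})

(key, meta, bins) = client.get(key)  # primary-key read
print(bins)                           # {'name': 'Ada', 'visits': 1}

client.close()
```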
Scale confidently. From laptop to lots of nodes to multi-cluster, multi-use case deployments, Elastic experts are sharing best practices to master and pitfalls to avoid when it comes to scaling Elasticsearch.
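As one hedged example of the kind of scaling decision such guidance covers: primary shard count is fixed at index creation, so it has to be planned up front. The sketch below uses the official Python client; the index name and settings values are illustrative, not Elastic's recommendations.

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

es.indices.create(
    index="logs-2024",
    settings={
        "number_of_shards": 3,    # fixed at creation; plan for data growth
        "number_of_replicas": 1,  # can be changed later for read scaling
    },
)
```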
The Art of the Pitch: WordPress Relationships and Sales - Laura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
Transcript: Selling digital books in 2024: Insights from industry leaders - T... - BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Elevating Tactical DDD Patterns Through Object Calisthenics - Dorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
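A toy sketch of two Object Calisthenics rules that map well onto tactical DDD patterns - "wrap all primitives" (a value object) and "first-class collections" - written in Python; the domain objects are invented for illustration, not from the talk.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Money:                      # value object: primitive wrapped, immutable
    amount_cents: int
    currency: str

    def add(self, other: "Money") -> "Money":
        if self.currency != other.currency:
            raise ValueError("cannot add different currencies")
        return Money(self.amount_cents + other.amount_cents, self.currency)

class OrderLines:                 # first-class collection: no bare list leaks out
    def __init__(self) -> None:
        self._lines: list[Money] = []

    def add(self, line: Money) -> None:
        self._lines.append(line)

    def total(self) -> Money:
        total = Money(0, "EUR")   # illustrative default currency
        for line in self._lines:
            total = total.add(line)
        return total
```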
GraphRAG is All You Need? LLM & Knowledge Graph - Guy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
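A hedged, minimal sketch of the retrieval idea these papers explore: instead of retrieving flat text chunks, pull an entity's neighborhood from a knowledge graph and hand it to the LLM as structured context. It uses networkx for illustration; a real system would use a graph database such as FalkorDB, and the facts below are invented.

```python
import networkx as nx

kg = nx.DiGraph()
kg.add_edge("Marie Curie", "radioactivity", relation="researched")
kg.add_edge("Marie Curie", "Nobel Prize in Physics", relation="won")
kg.add_edge("Pierre Curie", "Marie Curie", relation="collaborated_with")

def graph_context(entity: str) -> str:
    """Serialize the 1-hop neighborhood of an entity as LLM prompt context."""
    facts = [f"{u} --{d['relation']}--> {v}"
             for u, v, d in kg.edges(data=True) if entity in (u, v)]
    return "\n".join(facts)

prompt = f"Answer using only these facts:\n{graph_context('Marie Curie')}\n\nQ: ..."
print(prompt)
```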
Securing Your Kubernetes Cluster: A Step-by-Step Guide to Success! - KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf - 91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024 - Tobias Schneck
As AI technology pushes into IT, I was wondering, as an “infrastructure container Kubernetes guy”, how this fancy AI technology gets managed from an infrastructure operations view. Is it possible to apply our lovely cloud-native principles as well? What benefits could both technologies bring to each other?
Let me take these questions and provide you a short journey through existing deployment models and use cases for AI software. Using practical examples, we discuss what cloud/on-premise strategy we may need to apply it to our own infrastructure and get it to work from an enterprise perspective. I want to give an overview of infrastructure requirements and technologies, and of what could be beneficial for or limiting your AI use cases in an enterprise environment. An interactive demo will give you some insights into the approaches I have already got working for real.
Accelerate Your Kubernetes Clusters with Varnish Caching - Thijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
2. About this presentation
• Purpose: Explain how “Lean” organization principles (in this case, a “6S” event) apply to IT department systems.
• Format of Presentation: Designed to explain the principles in a way that can be directly used in any company to kick off a “6S” event.
3. The problem.
• Clutter hiding important data
• Unclear ownership
• Obsolete data
• Hardware costs $ (server backup equipment)
• Unclear rules about retention
• Duplicated files
4. Current State (example*)
• Number of files: 97,822
• Number of folders: 6,758
• Top-level folders: 238
• Disk space used: 136 GB
*Example from a single server’s network share.
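As a hedged illustration of how such current-state numbers can be gathered for any network share before kicking off a "6S" event, here is a short Python sketch; the share path is a placeholder, not the server from the deck.

```python
import os

share = r"\\fileserver\department-share"  # placeholder path

files = folders = total_bytes = 0
top_level = len(next(os.walk(share))[1])  # folders directly under the root
for root, dirnames, filenames in os.walk(share):
    folders += len(dirnames)
    files += len(filenames)
    for name in filenames:
        try:
            total_bytes += os.path.getsize(os.path.join(root, name))
        except OSError:
            pass  # skip unreadable files

print(f"Number of files:   {files:,}")
print(f"Number of folders: {folders:,}")
print(f"Top-level folders: {top_level}")
print(f"Disk space used:   {total_bytes / 1024**3:.0f} GB")
```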