Presentation on how to design infrastructure services for meaningful ops intelligence, and how to integrate ops intelligence as feedback for software development
Spit, Gather, Churn - Mining Infrastructure Data for Ops Intelligence
1. Spit, Gather, Churn
Mining Infrastructure Data for Ops Intelligence
Ranjib Dey
Twitter: @RanjibDey
IRC/GitHub: @ranjibd
2. About Me
• Senior software engineer in the CD practice
group @ThoughtWorks India
• Previously a system administrator
@ThoughtWorks India
• Worked on life science related algorithms
@Persistent Systems before that.
• Master's in Bioinformatics (thesis on
HPC, Machine Learning)
• Life Science graduate
3. Agenda
• What is Ops intelligence?
• Why is it needed? Implications of Ops
Intelligence.
• Why is it important now?
• Designing intelligent infrastructure services
• What does the future look like?
• Q&A
4. What is Ops Intelligence?
• Suitable for fast, meaningful ops feedback to
business
• Abstracts infrastructure details
• Tech-Stack neutral
• Allows forecasting
• Pre-emptive in nature
7. Why is it important now?
• Market volatility has increased
• It's not development but deployment,
release and maintenance that introduce
delay.
• Cloud is here
• Infrastructure tooling has matured
• The Continuous Delivery and DevOps movements are
under way
8. Designing intelligent infrastructure
services
• End user driven services
• Adhere to core Unix philosophies
• Remember the ‘|’, don’t create dead ends
• Feedback-driven, iterative improvement
• Think of horizontal scalability
• Infrastructure as code
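One way to read "remember the ‘|’" is that every tool should read a stream and write a stream, so that another tool can always consume its output. The following is a minimal sketch of that idea, not something taken from the talk: the JSON-lines event format, the "severity" field and the function names are all assumptions made for illustration.

```python
import io
import json
from typing import Iterable, Iterator, TextIO


def filter_events(lines: Iterable[str], min_severity: int) -> Iterator[str]:
    """Keep events at or above min_severity; emit one JSON object per line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        # "severity" is a hypothetical field name used only in this sketch.
        if event.get("severity", 0) >= min_severity:
            yield json.dumps(event, sort_keys=True)


def main(stdin: TextIO, stdout: TextIO, min_severity: int = 3) -> None:
    """Stream in, stream out: the output stays machine-readable for the next stage."""
    for out in filter_events(stdin, min_severity):
        stdout.write(out + "\n")


# In-memory stand-ins for stdin/stdout so the sketch runs anywhere.
source = io.StringIO(
    '{"host": "web1", "severity": 4}\n'
    '{"host": "web2", "severity": 1}\n'
)
sink = io.StringIO()
main(source, sink)
output = sink.getvalue()
```

Calling main(sys.stdin, sys.stdout) instead would let the same filter sit in the middle of a shell pipeline, with its output feeding whatever tool comes next.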
10. Metrics
• A unit test for each method, and a monitoring
service for each infrastructure service
• A single monitoring service can have multiple
metrics
• Metrics can have relationships
• These features should be configurable
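The points above (one monitoring service per infrastructure service, several metrics per monitoring service, relationships between metrics, everything configurable) could be modeled roughly as follows. This is only a sketch under assumptions: the class names, the threshold semantics and the example values are invented for illustration and do not come from any particular monitoring tool.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Metric:
    """One named measurement with a configurable threshold."""
    name: str
    read: Callable[[], float]      # hypothetical probe returning the current value
    threshold: float               # configurable: healthy while value <= threshold
    depends_on: List[str] = field(default_factory=list)  # related metric names


@dataclass
class MonitoringService:
    """Groups the metrics that describe one infrastructure service."""
    service: str
    metrics: Dict[str, Metric] = field(default_factory=dict)

    def add(self, metric: Metric) -> None:
        self.metrics[metric.name] = metric

    def check(self) -> Dict[str, bool]:
        """Return {metric name: True if the metric is within its threshold}."""
        return {name: m.read() <= m.threshold for name, m in self.metrics.items()}

    def failing_dependencies(self, name: str) -> List[str]:
        """Follow relationships: unhealthy metrics that `name` depends on."""
        status = self.check()
        return [d for d in self.metrics[name].depends_on if not status.get(d, True)]


# Example values are made up; real probes would query the running service.
web = MonitoringService("web")
web.add(Metric("db_latency_ms", read=lambda: 450.0, threshold=200.0))
web.add(Metric("response_time_ms", read=lambda: 900.0, threshold=500.0,
               depends_on=["db_latency_ms"]))
status = web.check()
causes = web.failing_dependencies("response_time_ms")
```

Here a slow response time can be traced to the database latency it depends on, which is the kind of relationship the slide refers to.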
12. Logging
• Decouple logging framework from the core
services
• Have configurable logging levels
• Enforce appropriate logging and levels
• Enforce logging patterns
• Logs and logging patterns can be modeled as
metrics too.
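A rough sketch of these logging points using Python's standard logging module: the level and format come from configuration rather than from service code, and a small handler counts records per level so that log volume can itself be read as a metric. The logger name, the format string and the handler class are assumptions chosen for this example.

```python
import logging
from collections import Counter


class LevelCountingHandler(logging.Handler):
    """Counts records per level so log volume can be treated as a metric."""

    def __init__(self) -> None:
        super().__init__()
        self.counts: Counter = Counter()

    def emit(self, record: logging.LogRecord) -> None:
        self.counts[record.levelname] += 1


def configure_logging(level_name: str, handler: logging.Handler) -> logging.Logger:
    """Wire up logging from configuration; service code only logs messages."""
    logger = logging.getLogger("infra.demo")   # hypothetical logger name
    logger.handlers.clear()
    logger.setLevel(getattr(logging, level_name.upper()))
    # One enforced pattern for every record, regardless of which service logs.
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s"))
    logger.addHandler(handler)
    logger.propagate = False
    return logger


counter = LevelCountingHandler()
log = configure_logging("warning", counter)
log.info("cache warmed")          # dropped: below the configured level
log.warning("disk usage high")
log.error("backup failed")
counts = dict(counter.counts)
```

Swapping the handler for a file or network handler changes where logs go without touching the service code, which is the decoupling the slide asks for.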
14. Gathering Ops Information
• Information aggregation
• Consider how you will use it
• Metrics and Logs
• Centralized logging
15. Gathering Ops Information
• Two main patterns:
– Time series data
– OLAP Cubes
• Storage engine considerations
– Flat files
– RRDs
– NoSQLs and other distributed storage systems
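As an illustration of the time series pattern on top of flat files, the sketch below averages raw samples into fixed-width time buckets, similar in spirit to the consolidation an RRD performs. The record format ("<unix_ts> <metric> <value>"), the function names and the sample data are assumptions made for this example.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def parse_line(line: str) -> Tuple[int, str, float]:
    """Parse one flat-file record: '<unix_ts> <metric> <value>' (assumed format)."""
    ts, name, value = line.split()
    return int(ts), name, float(value)


def consolidate(lines: Iterable[str], step: int) -> Dict[str, List[Tuple[int, float]]]:
    """Average samples into buckets of `step` seconds, per metric."""
    buckets: Dict[Tuple[str, int], List[float]] = defaultdict(list)
    for line in lines:
        ts, name, value = parse_line(line)
        buckets[(name, ts - ts % step)].append(value)   # bucket start time
    series: Dict[str, List[Tuple[int, float]]] = defaultdict(list)
    for (name, start), values in sorted(buckets.items()):
        series[name].append((start, mean(values)))
    return dict(series)


# Made-up samples; in practice these lines would be read from a log file.
raw = [
    "1000 cpu 0.50",
    "1030 cpu 0.25",
    "1060 cpu 0.75",
    "1010 mem 0.25",
]
series = consolidate(raw, step=60)
```

A larger step keeps less detail but needs less storage, which is one of the trade-offs behind the choice of storage engine.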
16. Churning Ops Information
• Visualizations
– Charting
– Trending
– Customized Visualizations
• Dashboards
– Customized views for stakeholders
– Information Radiators
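Trending usually starts with smoothing a series and estimating its direction before anything is drawn. The sketch below shows a trailing moving average and a least-squares slope; the function names and the disk usage figures are invented for illustration, and a real dashboard would feed the results to a charting tool.

```python
from typing import List, Sequence


def moving_average(values: Sequence[float], window: int) -> List[float]:
    """Trailing moving average; the first window-1 points produce no output."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [sum(values[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(values))]


def slope(values: Sequence[float]) -> float:
    """Least-squares slope per sample; positive means the metric is rising."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points to estimate a slope")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den


# Made-up daily disk usage in GB.
disk_used_gb = [10.0, 12.0, 11.0, 13.0, 15.0, 14.0]
smoothed = moving_average(disk_used_gb, window=3)
trend = slope(smoothed)
```

With these numbers the smoothed series grows by about 1 GB per sample, the sort of figure a stakeholder view could turn into a rough capacity forecast.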
23. What does the future look like?
• IaaS
• Ops is not the bottleneck
• Context aware infrastructure
• Test driven infrastructure
• SSH is not a must
• “The machines are alive” – Jon Crosby
… and they are emerging