What are the symptoms of a poorly managed data center facility? How close are you to an operating failure or catastrophic downtime event? Learn how to spot the warning signs and start improving your facility management program immediately to minimize the risk of downtime, reduce costs, and upgrade your operations.
7. • Overtime hours exceeding 10%
• Voice mail boxes full
• Emails not responded to
• Email inbox size limit exceeded
• Meetings missed or routinely cancelled
• No time for training
• Shortage of qualified staff
• Personnel performing work outside their competency
• Everything is an emergency
• Personnel turnover
What Else Is Going On?
8. • Break fix budget exceeded
• Maintenance budget exceeded
• Energy cost estimate exceeded or unknown
• Last minute deployment requirements
• No organization chart
• No responsibilities matrix
• No records of maintenance activities
• No written policies & procedures
• No preventive maintenance schedule
• Back of the server looks like a spaghetti pot exploded
The Issues Add Up
9. • Cabling is not labeled or, worse, incorrectly labeled
• Equipment is not uniquely labeled
• Loads are consistently out of balance
• Capacities are not managed or tracked
• Deferred maintenance exceeds 10%
• Housekeeping: if it looks like a mess, it is a mess
Maybe you don’t have a crisis, but how do you know how well
your data center operation compares to the rest of the industry?
The Issues Add Up
10. Are you confident in your Facilities team’s capability to
manage a technologically advanced and highly efficient design
to your 24 x 7 uptime requirements?
• Can you easily replace any member of that team?
• Are you protected against poor operations practices
migrating from older sites to higher criticality data centers?
• Do you have sites that operate in isolation, ignoring global
corporate standards?
• Do you even have corporate global standards?
• If you outsource any aspect of your data center operations,
how do you avoid losing responsibility and accountability?
• Do you manage an outsourcing contract... or direct an
expert team?
Ask the Tough Questions
11. • Initial review
• Gap analysis against industry best practices
§ Staffing and Organization
§ Maintenance
§ Training
§ Planning, Coordination & Management
§ Operating Conditions
• Roadmap to operational excellence
• Plan changes
• Implement changes
• Monitor & refine
• Annual review
Path to Data Center Operations Success
12. Key Elements of Facilities Management
Staffing and Organization
• Staffing
• Qualifications
• Organization
Maintenance
• Preventive Maintenance (PM)
Program
• Housekeeping Policies
• Maintenance Management
System (MMS)
• Vendor Support
• Deferred Maint. Program
• Predictive Maintenance
• Life-Cycle Planning
• Failure Analysis Program
13. Key Elements of Facilities Management
Training
• Data Center Staff
• Vendors
Planning, Coordination,
and Management
• Site Policies
• Financial Management
• Reference Library
• Computer Room Mgmt.
Operating Conditions
• Load Management
• Operating Set Points
• Alternating Use of
Infrastructure Equipment
14. Over the years, the Uptime Institute has observed that
management issues pose the largest risk to the uptime of
physical infrastructure:
• Inadequate staffing
• Ineffective or nonexistent maintenance and training programs
• Lack of processes and procedures
• As a result, the majority of outages are caused by
‘human error’
No standard existed to help Owners/Operators determine:
• Common language/vocabulary of data center operations
• Focus of data center management
• Resource allocation
• Resource requirements
Genesis of Industry Best Practices
15. Data Center Owners / Operators / End Users
• Increased availability and cost savings
• Multi-site consistency
• Benchmark for continuous monitoring and refinement
Colocation / Managed Services Sites
• All of the above plus…
• Customer assurance of consistency
• Competitive differentiator (attain & retain certification)
Industry Benchmark
• No need to rely on opinions and anecdotes
Value of Industry Best Practices
16. Uptime Institute has been conducting Operational
Sustainability Reviews for approximately 3 years—
based upon decades of site operations knowledge and
experience:
• Operational Sustainability Certifications: Tier + Gold, Silver, or Bronze
• Management & Operations (M&O) Stamps of Approval
See http://uptimeinstitute.com/publications for
Tier Standard: Operational Sustainability
Best Practices Reviews
17. Staffing
• Inadequate staffing
• Excessive overtime (over 10%)
• No escalation process
Qualification
• No list of required qualifications
• No experience with data center specific equipment
Organization
• Roles and Responsibilities not documented
• Data center organization not integrated
Staffing and Organization Significant Findings
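The overtime warning sign can be checked with simple arithmetic. The sketch below is illustrative only: the `Timesheet` record, the sample hours, and the choice to measure overtime as a fraction of regular hours are assumptions, since the slides give only the 10% figure.

```python
from dataclasses import dataclass

# Threshold taken from the "over 10%" warning sign in the slides.
OVERTIME_LIMIT = 0.10


@dataclass
class Timesheet:
    """Hypothetical per-person record for one pay period."""
    name: str
    regular_hours: float
    overtime_hours: float


def overtime_ratio(sheets):
    """Return team overtime as a fraction of regular hours (assumed basis)."""
    regular = sum(s.regular_hours for s in sheets)
    overtime = sum(s.overtime_hours for s in sheets)
    if regular == 0:
        raise ValueError("no regular hours recorded")
    return overtime / regular


def overtime_exceeded(sheets, limit=OVERTIME_LIMIT):
    """True when team overtime is above the limit."""
    return overtime_ratio(sheets) > limit


# Made-up sample data for illustration.
team = [
    Timesheet("tech_a", regular_hours=160, overtime_hours=24),
    Timesheet("tech_b", regular_hours=160, overtime_hours=16),
]
print(f"overtime: {overtime_ratio(team):.1%}, exceeded: {overtime_exceeded(team)}")
```

Tracking the ratio per pay period, rather than per person, keeps the focus on whether the team as a whole is understaffed.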
18. Preventive Maintenance (PM)
• No list of required PM activities
• PM activities not fully scripted
• No quality control process
Housekeeping
• Combustibles in the data center
• No documented housekeeping policy
Maintenance Management System (MMS)
• No list of equipment
• Missing critical data: warranty info, maintenance history, performance
data, etc.
Maintenance Significant Findings
19. Vendor Support
• Contracts missing response times, call-in process, detailed SOW, or
technician qualifications
Deferred Maintenance
• Unable to produce a deferred maintenance report from the MMS
Predictive Maintenance
• No predictive maintenance program
• Not comparing current results with previous results
Maintenance Significant Findings
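A deferred maintenance report can be as simple as a filter over work orders. The sketch below assumes a hypothetical `WorkOrder` record and made-up dates; a real MMS will have its own schema and export format. It lists overdue open work and compares the deferred share against the 10% warning level mentioned earlier in the deck.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Warning level from the "Deferred maintenance exceeds 10%" slide.
DEFERRED_LIMIT = 0.10


@dataclass
class WorkOrder:
    """Hypothetical MMS work order; field names are assumptions."""
    equipment_id: str
    due: date
    completed: Optional[date] = None


def deferred_orders(orders, today):
    """Work orders that are past due and still open."""
    return [o for o in orders if o.completed is None and o.due < today]


def deferred_ratio(orders, today):
    """Deferred orders as a fraction of all orders due on or before today."""
    due = [o for o in orders if o.due <= today]
    if not due:
        return 0.0
    return len(deferred_orders(orders, today)) / len(due)


# Made-up sample data for illustration.
orders = [
    WorkOrder("UPS-1", date(2024, 1, 10), completed=date(2024, 1, 9)),
    WorkOrder("CRAC-2", date(2024, 1, 15)),
    WorkOrder("GEN-1", date(2024, 1, 20), completed=date(2024, 1, 20)),
    WorkOrder("PDU-3", date(2024, 1, 25), completed=date(2024, 1, 24)),
    WorkOrder("ATS-1", date(2024, 3, 1)),  # not yet due
]
today = date(2024, 2, 1)
ratio = deferred_ratio(orders, today)
print([o.equipment_id for o in deferred_orders(orders, today)], ratio > DEFERRED_LIMIT)
```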
20. Life-Cycle Planning
• No life-cycle plan
• Not using MMS data to develop plan
Failure Analysis
• No record of outages or near misses
Maintenance Significant Findings
21. Data Center Staff
• Undocumented On-the-Job Training (OJT) programs
• No formal qualification program
• No list of training required by position
• No formal training program with lesson plans, etc.
Vendors
• No briefing for escorted vendors
Training Significant Findings
22. Load Management
• Alarm settings not documented
• Alarms not set on PDUs to ensure maximum loads are not exceeded
Operating Set Points
• Cooling set points are not documented or part of the
Change Management Process
• Changing of set points is not controlled
Operating Conditions Significant Findings
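The PDU alarm finding amounts to comparing each measured load with a documented threshold. The sketch below is a minimal illustration: the PDU names, the readings, and the 80% alarm fraction are assumptions, and the actual alarm level should come from the site's own policy and equipment ratings.

```python
# Assumed alarm level as a fraction of rated capacity; set per site policy.
ALARM_FRACTION = 0.80


def pdu_alarms(readings, rated_amps, alarm_fraction=ALARM_FRACTION):
    """Return the sorted names of PDUs at or above their alarm threshold.

    readings:   measured current per PDU, in amps
    rated_amps: rated capacity per PDU, in amps
    """
    alarms = []
    for name, amps in readings.items():
        if name not in rated_amps:
            raise KeyError(f"no rated capacity documented for {name}")
        if amps >= alarm_fraction * rated_amps[name]:
            alarms.append(name)
    return sorted(alarms)


# Made-up sample data for illustration.
readings = {"PDU-A": 26.0, "PDU-B": 12.0}
rated = {"PDU-A": 30.0, "PDU-B": 30.0}
print(pdu_alarms(readings, rated))
```

Raising an error for a PDU with no documented rating is deliberate here: an undocumented alarm setting is itself one of the findings listed above.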
23. Site Policies
• Missing Site Policies
• Especially Site Configuration Policy
Reference Library
• No process for keeping documents up-to-date
Capacity Management
• No process for forecasting future space, power, and cooling
requirements
• No active tracking of cooling capacity
• Ineffective management of Cold Aisles/Hot Aisles
• Electrical power monitoring (balancing phases)
Planning, Coordination, and Management
Significant Findings
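Phase balancing can be tracked with a single number per panel or PDU. The sketch below uses one common definition of imbalance, the largest deviation from the mean phase current divided by the mean; the slides do not specify a formula, so this choice and the sample currents are assumptions.

```python
def phase_imbalance(phase_amps):
    """Return imbalance as max deviation from the mean, divided by the mean.

    phase_amps: the three measured phase currents, in amps.
    """
    if len(phase_amps) != 3:
        raise ValueError("expected three phase currents")
    mean = sum(phase_amps) / 3
    if mean == 0:
        return 0.0  # no load, so nothing to balance
    return max(abs(a - mean) for a in phase_amps) / mean


# Made-up sample readings for illustration.
balanced = (50.0, 50.0, 50.0)
skewed = (40.0, 50.0, 60.0)
print(f"{phase_imbalance(balanced):.0%} {phase_imbalance(skewed):.0%}")
```

Logging this value over time is one way to show whether loads are "consistently out of balance" rather than relying on a single spot reading.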
24. Facilities
• Operate and maintain the critical facility infrastructure
• Support the installation of IT equipment (space, power, & cooling)
IT Management
• Operate and maintain IT hardware, software, applications, and
network connectivity
• Manage the installation/de-installation of IT equipment
Security
• Access Control
• Physical Security
Typical Data Center Disciplines
25. Functionally Separate Organization
• Corporate Real Estate (Facilities)
• IT
• Security
Communication between organizations was
typically poor
• Data center activities conducted without coordination
• Poor future space, power, and cooling planning
No individual responsible for all aspects of operating
a data center
Past Organizational Structures
26. Factors driving changes to organizational structure
• Rapid changes in technology and speed at which capacity must be
brought online
• Increased costs associated with IT and Facilities
• Business objectives of continuous computing availability
Legacy organizations could not accommodate quickly
evolving business requirements
• Slow to respond
• Not integrated
Evolving Organizational Structure
27. The value of industry best practices is in the process of
continuous improvement
• Discovery leads to learning
• Learning leads to change
• Change leads to improvement
• Regular reviews lead to discovery
• Crises can be avoided
Summary