Roll back doesn’t exist. It’s not real. It’s a fantasy, a dream, a delusion. Any vendor who tells you they have a roll back capability is lying to you. And lying to you in a downright dangerous way that will come back to haunt you at 4am in a war room when someone says:
“We can’t fix this. Let’s roll back the deployment.”
This talk is designed to explain and demonstrate to Operations staff:
Why roll back is a fantasy, explained with a dash of Werner Heisenberg
Why it is dangerous and how you can recognize when you’re about to get trapped
How you can avoid falling into that trap of considering it an appropriate compensating control.
It’ll also explain what you can actually do operationally instead of “rolling back”. This will cover alternative compensating controls that can help you get running again and resolve your outage whilst still allowing you to find the root cause.
20. On system rollback and totalised fields
An algebraic approach to system change
Mark Burgess and Alva Couch
20th June 2011
http://cfengine.com/markburgess/papers/totalfield.pdf
This is very much opinion based on experience. Everyone’s shop is different: everyone has different constraints and requirements. A trading house differs from a Twitter analytics company, which differs from a hospital, which differs from a .gov/Fed shop.
Distance the discussion from “the sometimes emotional standpoints that bind system administrators to the notion of rollback: desperately wanting does not make it possible”.
But every shop has technical heritage and technical debt, established institutional memory and remembered pain.
Approach with an open mind and don’t make assumptions. Welcome new ideas and evaluate old constructs.
You don’t have to agree, and you can think I am a clueless idiot, as long as you do so based on clear, established data and not “we’re a different special snowflake”, because you’re fucking not.
Changed my views a little since writing the abstract.
Trad/Modern: arbitrary labels.
Database rollback, transactional rollback. In single-threaded and parallel software applications, many authors have developed a ‘journaling’ approach to reversibility and rollback (see foregoing references on checkpointing). A stack of state-history can be kept to arbitrary accuracy (and at proportional cost), provided there is sufficient memory to document changes.
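To make the journaling idea concrete, here is a minimal sketch of a key-value store that keeps a stack of previous values. The class and method names (`JournaledStore`, `checkpoint`, `rollback`) are hypothetical and chosen for illustration only. Note that the journal grows with every change: the "proportional cost" from the quote above.

```python
from typing import Any

_MISSING = object()  # sentinel: the key did not exist before the change


class JournaledStore:
    """Hypothetical key-value store that journals every change so it can be undone.

    Each write pushes the previous value onto a history stack, so memory
    use grows in proportion to the number of changes recorded.
    """

    def __init__(self) -> None:
        self.data: dict[str, Any] = {}
        self.journal: list[tuple[str, Any]] = []

    def set(self, key: str, value: Any) -> None:
        # Record what was there before, then apply the change.
        self.journal.append((key, self.data.get(key, _MISSING)))
        self.data[key] = value

    def checkpoint(self) -> int:
        # A checkpoint is simply a position in the journal.
        return len(self.journal)

    def rollback(self, checkpoint: int) -> None:
        # Undo changes in reverse order until we are back at the checkpoint.
        while len(self.journal) > checkpoint:
            key, previous = self.journal.pop()
            if previous is _MISSING:
                del self.data[key]
            else:
                self.data[key] = previous
```

For example, setting `version` to `"1.0"`, taking a checkpoint, then setting `version` to `"1.1"` and `feature` to `"on"` and rolling back leaves `{"version": "1.0"}`. This only works because every change went through `set`; anything that bypasses the journal is invisible to it.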
Service rollback. Many interconnecting components. Interconnectedness between application(s) and infrastructure changes. Release management, checkpointing, snapshots and version control. In more general ‘open’ (or incompletely specified) systems the cost of maintaining history increases without bound as system complexity increases.
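A small sketch of why "open" systems break the journaling model: a hypothetical `Service` can snapshot and restore its own state, but a message it has already sent to another system (modelled here as a plain list standing in for, say, a billing feed) is outside its boundary and stays sent. The names are illustrative assumptions, not any real framework.

```python
import copy
from typing import Any


class Service:
    """Hypothetical service whose local state can be snapshotted and
    restored, but whose messages to other systems cannot be recalled."""

    def __init__(self, downstream: list[str]) -> None:
        self.state: dict[str, Any] = {}
        self._snapshot: dict[str, Any] = {}
        # Owned by another team/system: we can append to it but never un-send.
        self.downstream = downstream

    def snapshot(self) -> None:
        self._snapshot = copy.deepcopy(self.state)

    def update(self, key: str, value: Any) -> None:
        self.state[key] = value
        # Side effect that crosses the system boundary.
        self.downstream.append(f"{key}={value}")

    def rollback(self) -> None:
        # Restores local state only; nothing here can reach downstream.
        self.state = copy.deepcopy(self._snapshot)
```

After `snapshot()`, `update("price", 10)` and `rollback()`, the service's state is empty again, yet the downstream feed still contains `"price=10"`. Every extra interconnection adds another place like this that your "rollback" cannot reach.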
Rollback isn’t a myth – for certain definitions in certain circumstances it MAY be possible to do something that resembles a rollback.
Roll-back recovery requires that the operations between the checkpoint and the detected erroneous state can be made idempotent.
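A minimal illustration of why idempotence matters: during recovery you often cannot tell whether an operation between the checkpoint and the failure already ran, so it may be applied again. The function names here are hypothetical examples.

```python
from typing import Callable


def increment_balance(balance: int) -> int:
    # Non-idempotent: every replay adds another 100.
    return balance + 100


def set_balance(balance: int) -> int:
    # Idempotent: ignores the current value, so replays give the same result.
    return 100


def replay(op: Callable[[int], int], start: int, times: int) -> int:
    """Apply the same operation repeatedly, as a recovery process might
    if it cannot tell whether the operation already ran before the crash."""
    state = start
    for _ in range(times):
        state = op(state)
    return state
```

Replaying `increment_balance` twice from 0 gives 200 instead of 100, while `set_balance` gives 100 however many times it runs. Only the second kind of operation is safe to re-apply blindly during recovery.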
Apply enough money and set enough constraints and you can have something like rollback. Duplicate infrastructure / scale.
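Duplicate infrastructure is what buys "something like rollback": two copies of the environment and a switch between them (the pattern often called blue/green deployment). This sketch uses hypothetical `Environment` and `Router` classes and assumes, as is common, that the data store is shared rather than duplicated.

```python
from dataclasses import dataclass, field


@dataclass
class Environment:
    name: str
    version: str


@dataclass
class Router:
    """Hypothetical traffic switch between two duplicate environments."""

    blue: Environment
    green: Environment
    live: str = "blue"
    # Shared between both environments: this is NOT duplicated.
    shared_db: dict = field(default_factory=dict)

    def active(self) -> Environment:
        return self.blue if self.live == "blue" else self.green

    def switch(self) -> None:
        # Flip traffic to the other environment.
        self.live = "green" if self.live == "blue" else "blue"
```

If green runs v2 and writes `schema_version = 2` into the shared store while live, switching back puts v1 in front of users but leaves the data at version 2. The switch is only as reversible as the parts you paid to duplicate.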
A cascading rollback occurs in database systems when a transaction (T1) causes a failure and a rollback must be performed. Other transactions dependent on T1's actions must also be rolled back due to T1's failure, thus causing a cascading effect. That is, one transaction's failure causes many to fail. Because a cascading rollback is undesirable, practical database recovery techniques are designed to guarantee cascadeless rollback.
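The cascade can be sketched as a walk over a "reads from" dependency graph: if T2 read data written by T1 before T1 committed, then aborting T1 forces T2 to abort, and so on transitively. The function name `cascade` and the dictionary format are assumptions for illustration, not how any particular database engine stores this.

```python
from collections import deque


def cascade(failed: str, reads_from: dict[str, set[str]]) -> set[str]:
    """Return every transaction that must be rolled back when `failed` aborts.

    `reads_from[t]` is the set of transactions whose uncommitted writes t read.
    """
    # Invert the relation: for each transaction, who depends on it?
    dependents: dict[str, set[str]] = {}
    for txn, sources in reads_from.items():
        for src in sources:
            dependents.setdefault(src, set()).add(txn)

    # Breadth-first walk from the failed transaction.
    aborted = {failed}
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for dep in dependents.get(current, set()):
            if dep not in aborted:
                aborted.add(dep)
                queue.append(dep)
    return aborted
```

With T2 reading from T1 and T3 reading from T2, `cascade("T1", ...)` returns `{"T1", "T2", "T3"}`: one failure, three rollbacks. An independent T4 is untouched.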
You must have sufficient memory/storage/resources to maintain enough history to roll back to a specified point.
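Real history is always bounded by something: disk, retention policy, budget. This hypothetical `BoundedHistory` keeps only the last `capacity` states (a stand-in assumption for finite storage) and shows what happens when the point you want to return to has already aged out.

```python
from collections import deque


class BoundedHistory:
    """Hypothetical history that retains at most `capacity` past states."""

    def __init__(self, capacity: int) -> None:
        # Oldest entries are discarded automatically once the deque is full.
        self.states = deque(maxlen=capacity)
        self.version = 0

    def record(self, state: str) -> int:
        self.version += 1
        self.states.append((self.version, state))
        return self.version

    def can_roll_back_to(self, version: int) -> bool:
        return any(v == version for v, _ in self.states)

    def state_at(self, version: int) -> str:
        for v, state in self.states:
            if v == version:
                return state
        raise LookupError(f"version {version} is no longer in history")
```

Recording four states with a capacity of three means version 1 is gone: `can_roll_back_to(1)` is `False`, while versions 2 to 4 are still reachable.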
Story about University and “You are here” signs. Promised Heisenberg: the uncertainty principle sets a lower bound on the precision with which certain pairs of properties of a particle (position and momentum, roughly location and speed) can be known at the same time. The more precisely you measure one, the less precisely you can know the other. Observer effect: the act of observing a system changes it, which makes it harder to measure.
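For reference, the standard textbook form of the uncertainty principle for position and momentum, where σ_x and σ_p are the standard deviations (uncertainties) of position x and momentum p, m is mass, v is velocity and ħ is the reduced Planck constant. The informal "location / speed" pair corresponds to x and p.

```latex
\sigma_x \, \sigma_p \ge \frac{\hbar}{2}, \qquad p = m v
```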
A deterministic system is one in which no randomness is involved in the development of future states of the system. Lessons learnt about complex systems and systems thinking.
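A tiny sketch of why real systems rarely behave deterministically enough for replay: the same two changes applied to the same starting state give different results depending on arrival order, which in production is often decided by concurrency and network timing. The operations are hypothetical examples.

```python
from itertools import permutations
from typing import Callable

Op = Callable[[int], int]


def double(x: int) -> int:
    return x * 2


def add_ten(x: int) -> int:
    return x + 10


def apply_in_order(ops: tuple[Op, ...], start: int) -> int:
    """Apply operations one after another, in the given order."""
    state = start
    for op in ops:
        state = op(state)
    return state


# Same start state, same two changes, every possible arrival order.
outcomes = {apply_in_order(order, 5) for order in permutations([double, add_ten])}
```

`outcomes` is `{20, 30}`: doubling first gives 20, adding first gives 30. Unless ordering is controlled, "replaying the same changes" is not guaranteed to reproduce the same state.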
I have a Liberal Arts degree and got someone sciency and smart to explain the hard bits to me.
Risk – false sense of security
Unless you are committed to testing 'rollback' on a regular basis, maybe even on every deploy, you inevitably end up in a situation where, at the worst possible moment, you are going to be depending on a process that is rarely done.
We back up but we never restore.
We have a UPS/generator but we’ve never tested it.
We’ve got a DRP (disaster recovery plan) but it’s too difficult/dangerous to execute it.
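The cure for "we back up but we never restore" is to rehearse the restore and check the result. This is a deliberately simplified sketch: `restore_drill` is a hypothetical helper, and plain file copies stand in for whatever backup tooling you actually use. The point is the shape of the drill (back up, restore somewhere else, verify), run routinely rather than during an outage.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def restore_drill(source: Path) -> bool:
    """Back up a file, restore it to a separate location, and verify that
    the restored copy is byte-for-byte identical to the original."""
    with tempfile.TemporaryDirectory() as workdir:
        backup = Path(workdir) / "backup" / source.name
        restored = Path(workdir) / "restored" / source.name
        backup.parent.mkdir()
        restored.parent.mkdir()
        shutil.copy2(source, backup)    # stand-in for "take the backup"
        shutil.copy2(backup, restored)  # stand-in for "restore it"
        return sha256(restored) == sha256(source)
```

A drill that returns `False`, or that nobody knows how to run, tells you something far more useful at 2pm on a Tuesday than at 4am in a war room.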
No matter how much you believe things can be tracked, there is always something that either can’t be tracked, can’t be predicted or is simply unknown. Refer back to deterministic systems.
K.I.S.S. Rollback changes are usually made after a production change fails, when the team is at a low point: often tired, often frustrated, often angry.
Return the system to a known good state, removing any erroneous transactions from the systems.
Return the system to a known good state, removing/correcting any erroneous transactions from the systems, AND return the system to working order as fast as possible.
Are these different? Contradictory?
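One way to see the tension between the two goals: restoring a snapshot is fast but discards good work along with bad, while correcting in place keeps the good work but is slower and assumes you can identify the bad transactions. The `Txn` type and both functions are hypothetical illustrations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Txn:
    txn_id: int
    amount: int
    erroneous: bool = False


def restore_snapshot(log: list[Txn], snapshot_at: int) -> list[Txn]:
    """Fast: throw away everything after the snapshot, good and bad alike."""
    return [t for t in log if t.txn_id <= snapshot_at]


def correct_in_place(log: list[Txn]) -> list[Txn]:
    """Slower: keep good work and remove only transactions flagged as bad.
    Assumes you can actually tell which transactions are erroneous."""
    return [t for t in log if not t.erroneous]
```

With transactions 1 to 4 where only 3 is bad and the snapshot was taken after 2, `restore_snapshot` keeps 1 and 2 and silently loses the perfectly good transaction 4; `correct_in_place` keeps 1, 2 and 4.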
Dev / Ops / QA
http://www.slideshare.net/mmalone/architecture-at-simplegeo-staying-agile-at-scale
If your system is hard to deploy, or you can’t upgrade without organisational risk, then that’s an architectural problem, NOT an operational one.
Continuous deployment is at one end of the spectrum; at the other end is simply making more, smaller changes rather than big-bang changes. If it hurts, do it more often until it stops hurting.
Accept failure, learn from it and move forward, not backwards: anything you roll back now, you are going to have to deploy again sometime in the future.
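"Moving forward" can be made literal with an append-only release history: a revert is recorded as a new release whose artifact happens to match an older one, rather than as an erasure of the past. `ReleaseLog` is a hypothetical sketch, not any particular deployment tool.

```python
class ReleaseLog:
    """Hypothetical append-only release history. Reverting appends a new
    release that reuses an older artifact; nothing is ever erased."""

    def __init__(self) -> None:
        self.releases: list[tuple[int, str]] = []

    def deploy(self, artifact: str) -> int:
        number = len(self.releases) + 1
        self.releases.append((number, artifact))
        return number

    def revert_to(self, number: int) -> int:
        # Look up the old artifact and deploy it again as a NEW release.
        _, artifact = self.releases[number - 1]
        return self.deploy(artifact)

    def current(self) -> str:
        return self.releases[-1][1]
```

Deploying `app-1.0`, then `app-1.1`, then reverting to release 1 produces release 3 running `app-1.0`. The artifact matches the old one, but the data, schema and traffic it meets are those of release 3, and `app-1.1` is still waiting to be deployed again.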
Having rollback is not an excuse not to SUFFICIENTLY test
Under Siege 2. Don’t assume the past dictates the future. Less NIH (Not Invented Here) and religion, more science and data.
Poet John Lydgate, ably stolen by Abraham Lincoln: “You can please some of the people all of the time, you can please all of the people some of the time, but you can’t please all of the people all of the time.”
Or worth the effort.
Don’t lie to yourself.
From “the sometimes emotional standpoints that bind system administrators to the notion of rollback: desperately wanting does not make it possible”.
Thank you.