Cloud Foundry Summit 2015: Using Service Brokers to Manage Data Lifecycle


  1. Using Service Brokers to Manage Data Lifecycle. Josh Kruck | @krujos | jkruck@pivotal.io | github.com/krujos
  2. What are some operational problems with data?
  3. [Diagram: Business Critical Data Lifecycle, first 12 hours. RTO (recovery time objective) 00:05, RPO (recovery point objective) 01:00. Copies: Primary, Replica, DR, Backup, Snapshots.]
  4. [Diagram: Business Critical Data Lifecycle, first 24 hours. RTO 00:05, RPO 01:00. Copies: Primary, Replica, DR, Backup, Snapshots.]
  5. 525,600 minutes (one year)
  6. 5476 copies
  7. (No text on this slide.)
  8. Copies aren’t really the problem! (Capex is easy: just buy more stuff.)
  9. The real problem is that 5476 copies are…
  10. managed by 3 systems [“storage”, “backup”, “rdbms”]
  11. and 5 teams [“storage”, “backup”, “offsite provider”, “app owner”, “dba”].
  12. Opex is the problem. (You shouldn't buy more people.)
  13. What’s the read/write load on the copies?
  14. Zero: 5475 copies doing nothing for your business.
  15. Why all this talk about backups and stuff?
  16. Good code needs good tests. Good tests need good data. Good data needs… a copy. So let’s get one! (A play in 3 acts.)
  17. “I don’t think we have any copies of that.”
  18. “I’m not allowed to have prod logs, much less the DB.”
  19. We can do it, this one time: file a ticket.
  20. Solved! But did we create another problem?
  21. Once you find a copy, it needs a curator. Sizing: don’t use all 10 TB of prod to test. But your sample must represent the entirety of the dataset, and representative curation is futile with most datasets (unknown unknowns). Sizing means you restrict your tests to what you left in. Sizing also hides performance issues (a missing index won’t show up on a small table). So maybe it’s not worth it…
  22. Once you find a copy, it needs a curator. Sanitize it! You can’t have SSNs and credit card numbers in test.
  23. Once you find a copy, it needs a curator. Delete! Old data smells funny.
  24. Once you find a copy, it needs a curator. Refresh! GOTO 10
  25. Curation is expensive: hard/complex, manual, infrequent, error-prone, handoffs, deletion, ownership.
  26. A manual process that starts with a ticket is the wrong solution.
  27. The sum of the mess is worth more than its parts. There are 5475 secondary copies with no load; can we leverage them for testing? Fix: let CF manage your data.
  28. How?
  29. First, do no harm: most copies do nothing, but when the sky is falling you need them.
  30. Pattern: cf create-service → Copy Data → Sanitize Data → cf push <app> → Test → cf delete <app> -r -f → cf delete-service
  31. How do you fill in that hand-wavy part in the middle?
  32. Buy a CDM (copy data management) product: putting the E in Enterprise. Actifio, Delphix, ViPR. Great if they support your workloads and you can consume the form factors they deliver!
  33. BYO (build your own): building is harder than buying. Based on technology that allows layered writes: layered file systems (Docker, Docker, Docker?); clones, linked clones, VM snaps; writeable snapshots (FlexClone, XtremIO, LVM snaps).
  34. Demo: cf create-service → snap prod VM → spin up VM → allocate IP → sanitize data in PG → cf push demo → test → dispose AMI and Postgres
  35. https://github.com/krujos/data-lifecycle-service-broker Please help!
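Slide 22’s “Sanitize it!” step can be sketched as pattern-based masking. This is a minimal sketch assuming SSNs appear as `NNN-NN-NNNN` and card numbers as 13 to 16 digits with optional single spaces or dashes; `sanitize_value`, `sanitize_row`, and the masking format are hypothetical, not taken from the talk or its repository. Pattern matching alone misses values stored in other formats, so real sanitization usually also needs per-column rules from someone who knows the schema.

```python
import re

# Assumed formats (hypothetical): US SSNs as NNN-NN-NNNN, and card numbers as
# 13 to 16 digits, optionally separated by single spaces or dashes.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,15}\d\b")


def mask_card(match: re.Match) -> str:
    """Replace every digit except the last four with 'X', keeping separators."""
    text = match.group(0)
    total = sum(ch.isdigit() for ch in text)
    seen = 0
    out = []
    for ch in text:
        if ch.isdigit():
            seen += 1
            out.append(ch if seen > total - 4 else "X")
        else:
            out.append(ch)
    return "".join(out)


def sanitize_value(value: str) -> str:
    """Mask SSN-like and card-like substrings in one text value."""
    value = SSN_RE.sub("XXX-XX-XXXX", value)  # SSNs first, so their digits are gone
    return CARD_RE.sub(mask_card, value)


def sanitize_row(row: dict) -> dict:
    """Return a copy of a row with every string column sanitized."""
    return {k: sanitize_value(v) if isinstance(v, str) else v for k, v in row.items()}


row = {"id": 7, "ssn": "123-45-6789", "note": "paid with 4111 1111 1111 1111"}
print(sanitize_row(row))
# → {'id': 7, 'ssn': 'XXX-XX-XXXX', 'note': 'paid with XXXX XXXX XXXX 1111'}
```

Keeping the last four card digits is a common convention that lets testers tell records apart; a stricter policy would mask everything.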
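Slide 33’s building block is technology that allows layered writes: a clone shares the unchanged base and stores only its own changes. As a conceptual analogy only (FlexClone, LVM snapshots, and layered file systems work at the block or file layer, not on Python dictionaries), `collections.ChainMap` shows the copy-on-write idea: each test clone gets a fresh writable layer over a shared, read-only view of the base. The `make_clone` helper and the sample rows are hypothetical.

```python
from collections import ChainMap
from types import MappingProxyType

# Hypothetical "production copy". Clones only ever see it through a read-only proxy.
base_rows = {1: "alice", 2: "bob", 3: "carol"}
base = MappingProxyType(base_rows)


def make_clone(base_view):
    """Return a clone: a new, empty writable layer in front of the shared base.

    Lookups fall through to the base when the layer lacks a key; assignments
    always land in the layer. ChainMap only deletes from its first layer, so
    deleting a row that exists only in the base raises KeyError; layered file
    systems typically record a deletion marker (a "whiteout") instead.
    """
    return ChainMap({}, base_view)


clone_a = make_clone(base)
clone_b = make_clone(base)

clone_a[2] = "bob (sanitized)"  # change is private to clone_a
clone_b[4] = "dave"             # new row exists only in clone_b

print(clone_a[2], "|", clone_b[2])                 # bob (sanitized) | bob
print(len(clone_a.maps[0]), len(clone_b.maps[0]))  # 1 1: only changes are stored
print(base_rows)                                   # base is unchanged
```

Throwing a clone away is just dropping its first layer, which is what makes the create-and-delete cycle of slide 30 cheap when the storage underneath supports it.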
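Slides 30 and 34 map onto the requests a Cloud Foundry service broker receives under the v2 broker API: `cf create-service` leads to a provision request (`PUT /v2/service_instances/:instance_id`), binding an app leads to a bind request (`PUT /v2/service_instances/:instance_id/service_bindings/:binding_id`) whose response carries credentials, and unbinding and `cf delete-service` lead to the matching `DELETE` requests. What follows is a minimal in-memory model of that lifecycle, not the code in the krujos/data-lifecycle-service-broker repository: the `Backend` methods (snapshot, spin up, allocate IP, sanitize, dispose) are hypothetical stubs for the demo’s infrastructure steps, and the credential fields are illustrative.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Backend:
    """Hypothetical infrastructure stubs; a real broker would call IaaS and DB APIs here."""
    log: list = field(default_factory=list)

    def snapshot_prod(self) -> str:
        self.log.append("snapshot prod VM")
        return "snap-" + uuid.uuid4().hex[:8]

    def spin_up_vm(self, snapshot_id: str) -> str:
        self.log.append(f"spin up VM from {snapshot_id}")
        return "vm-" + uuid.uuid4().hex[:8]

    def allocate_ip(self, vm_id: str) -> str:
        self.log.append(f"allocate IP for {vm_id}")
        return "10.0.0.10"  # placeholder address

    def sanitize(self, vm_id: str) -> None:
        self.log.append(f"sanitize Postgres data on {vm_id}")

    def dispose(self, vm_id: str, snapshot_id: str) -> None:
        self.log.append(f"dispose {vm_id} and {snapshot_id}")


class DataLifecycleBroker:
    """In-memory model of the provision / bind / unbind / deprovision lifecycle."""

    def __init__(self, backend: Backend):
        self.backend = backend
        self.instances = {}  # instance_id -> resources backing that copy
        self.bindings = {}   # binding_id -> instance_id

    def provision(self, instance_id: str) -> None:
        # Triggered by `cf create-service`: make the copy, then sanitize it.
        snap = self.backend.snapshot_prod()
        vm = self.backend.spin_up_vm(snap)
        ip = self.backend.allocate_ip(vm)
        self.backend.sanitize(vm)
        self.instances[instance_id] = {"snapshot": snap, "vm": vm, "ip": ip}

    def bind(self, instance_id: str, binding_id: str) -> dict:
        # Triggered by binding an app: return connection credentials.
        inst = self.instances[instance_id]
        self.bindings[binding_id] = instance_id
        return {"credentials": {"host": inst["ip"], "port": 5432}}  # Postgres default port

    def unbind(self, binding_id: str) -> None:
        del self.bindings[binding_id]

    def deprovision(self, instance_id: str) -> None:
        # Triggered by `cf delete-service`: throw the copy away.
        # This model refuses while any app is still bound to the instance.
        if instance_id in self.bindings.values():
            raise RuntimeError(f"{instance_id} still has bindings")
        inst = self.instances.pop(instance_id)
        self.backend.dispose(inst["vm"], inst["snapshot"])


if __name__ == "__main__":
    broker = DataLifecycleBroker(Backend())
    broker.provision("test-data-1")
    print(broker.bind("test-data-1", "binding-1"))
    # ... the app under test would run against those credentials here ...
    broker.unbind("binding-1")
    broker.deprovision("test-data-1")
    print("\n".join(broker.backend.log))
```

On the platform side, the same lifecycle is driven by commands such as `cf create-service SERVICE PLAN SERVICE_INSTANCE`, `cf bind-service APP_NAME SERVICE_INSTANCE`, `cf delete APP_NAME -r -f`, and `cf delete-service SERVICE_INSTANCE -f`; the service and plan names are whatever the broker advertises in its catalog.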

Editor's Notes

  • First act: how do I get the copies?
  • Much sleuthing and many failed attempts to generate legit test data later…
  • Act II
  • Act III
    I have a customer who hasn’t refreshed test data in three years.
  • Representing the entirety of the dataset means things like previous schemas: rows with missing additive fields, FKs, etc. Is selecting those records going to cause issues? What about formats assumed in the data itself (but surely no one stores encoded information in their database)?
    Does everyone know the data well enough to know what representative is? (No.)
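The point about previous schemas can be illustrated with a toy dataset. In this hypothetical example, the oldest rows predate an additive `currency` column and so lack it; a size-reduced test copy built from the most recent rows (one common way to shrink a copy, assumed here for illustration) contains none of them. A random 1% sample would fare little better: it would be expected to contain only about 0.25 of the 25 legacy rows.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Order:
    order_id: int
    amount: float
    currency: Optional[str]  # hypothetical additive column; None in legacy rows


def build_prod(n_rows: int, n_legacy: int) -> List[Order]:
    """Toy production table whose oldest n_legacy rows predate the currency column."""
    return [Order(i, 10.0 + i, None if i < n_legacy else "USD") for i in range(n_rows)]


def size_down(rows: List[Order], keep: int) -> List[Order]:
    """Shrink the copy by keeping only the most recent `keep` rows."""
    return rows[-keep:]


def legacy_count(rows: List[Order]) -> int:
    """Count rows that still have the old schema (no currency value)."""
    return sum(1 for r in rows if r.currency is None)


prod = build_prod(n_rows=10_000, n_legacy=25)
sample = size_down(prod, keep=100)

print("legacy rows in prod:", legacy_count(prod))
print("legacy rows in sample:", legacy_count(sample))
```

Code that handles `currency is None` is simply never exercised against this copy, which is the “you restrict your tests to what you left in” problem from slide 21.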
