SlideShare uma empresa Scribd logo
1 de 41
Baixar para ler offline
Drop ACID and think about data

                                    Bob Ippolito
                                  Mochi Media, Inc.
                            PyCon 2009 - Chicago (actually Rosemont)


                                    March 28, 2009




Bob Ippolito (PyCon 2009)           Drop ACID and think about data     March 28, 2009   1 / 35
Author: Bob Ippolito
        Date:   March 2009
        Venue: PyCon 2009




Bob Ippolito (PyCon 2009)   Drop ACID and think about data   March 28, 2009   2 / 35
Bob’s Perspective


       Startup with lots of data:

       Cofounded Mochi Media in 2005
       MochiBot analytics platform (for Flash)
       MochiAds ad serving platform (for Flash games)
       Other cool services for game developers




 Bob Ippolito (PyCon 2009)   Drop ACID and think about data   March 28, 2009   2 / 35
Mochi Ad Sales
       Hard Sell:




 Bob Ippolito (PyCon 2009)   Drop ACID and think about data   March 28, 2009   3 / 35
What’s ACID?


       A promise ring your DBMS wears:
       Atomicity: all or nothing
       Consistency: no explosions
       Isolation: no fights
       Durability: no lying




 Bob Ippolito (PyCon 2009)   Drop ACID and think about data   March 28, 2009   4 / 35
ACID Trips

       Scalability and reliability:

       Downtime is unacceptable
       Reliable is >= 2 nodes
       Scalable is ... more
       Networks make it hard
       Networks make it hard
       Networks make it hard



 Bob Ippolito (PyCon 2009)   Drop ACID and think about data   March 28, 2009   5 / 35
What can I have?


       CAP theorem says pick two:

       Consistency
       Availability
       Partition tolerance




 Bob Ippolito (PyCon 2009)   Drop ACID and think about data   March 28, 2009   6 / 35
Turn up the BASE


       Write smarter applications:

       Basically Available
       Soft state
       Eventually consistent




 Bob Ippolito (PyCon 2009)   Drop ACID and think about data   March 28, 2009   7 / 35
BASE jumping

       Everyone else is doing it:

       Google
       Amazon
       eBay
       Yahoo!
       Facebook
       ...



 Bob Ippolito (PyCon 2009)   Drop ACID and think about data   March 28, 2009   8 / 35
BigTable
           Google:

       Paxos (Chubby)
       Single-master
       Distributed tablets via GFS
       Row/Column db hybrid
       Compression (BMDiff, Zippy)
       Versioned (Row, Column, Timestamp)
       Bloom filters


 Bob Ippolito (PyCon 2009)   Drop ACID and think about data   March 28, 2009   9 / 35
BigTable Pros


                Pros:

       Compression = Awesome
       Clients are probably simple
       Integrates with map/reduce




 Bob Ippolito (PyCon 2009)   Drop ACID and think about data   March 28, 2009   10 / 35
BigTable Cons


               Cons:

       Proprietary to Google
       Single-master




 Bob Ippolito (PyCon 2009)   Drop ACID and think about data   March 28, 2009   11 / 35
BigTable Diagram
       Single-master:




 Bob Ippolito (PyCon 2009)   Drop ACID and think about data   March 28, 2009   12 / 35
Dynamo


         Amazon:

       Key/Value store
       Consistent hashing
       Vector clocks
       Read repair




 Bob Ippolito (PyCon 2009)   Drop ACID and think about data   March 28, 2009   13 / 35
Dynamo Pros


                Pros:

      No master
      Highly available for write
      Knobs to make it fast to read
      “Simple” (lots of half-baked clones!)




 Bob Ippolito (PyCon 2009)   Drop ACID and think about data   March 28, 2009   14 / 35
Dynamo Cons

               Cons:

       Proprietary to Amazon
       Clients need to be smart
       No compression
       Not suitable for column-like workloads
       Just a Key/Value store




 Bob Ippolito (PyCon 2009)   Drop ACID and think about data   March 28, 2009   15 / 35
Dynamo Diagram
       Smart client:




 Bob Ippolito (PyCon 2009)   Drop ACID and think about data   March 28, 2009   16 / 35
Cassandra


       Facebook -> Apache:

       Open source!
       No master like Dynamo
       Storage model more like BigTable




 Bob Ippolito (PyCon 2009)   Drop ACID and think about data   March 28, 2009   17 / 35
Cassandra Pros


                Pros:

       OPEN SOURCE
       Incrementally scalable
       Minimal administration




 Bob Ippolito (PyCon 2009)   Drop ACID and think about data   March 28, 2009   18 / 35
Cassandra Cons


               Cons:

       Not polished
       No compression yet




 Bob Ippolito (PyCon 2009)   Drop ACID and think about data   March 28, 2009   19 / 35
Cassandra Diagram
       Soul Calibur:




 Bob Ippolito (PyCon 2009)   Drop ACID and think about data   March 28, 2009   20 / 35
Distributed Musings


       New Hotness:

       Distributed databases are the new web framework
       ... except none of them are awesome yet
       I don’t think we need another half-baked Dynamo
       clone




 Bob Ippolito (PyCon 2009)   Drop ACID and think about data   March 28, 2009   21 / 35
Key-Value Stores


       Simple and Fast:

       Similar to a Python dict
       Keys usually bytes, probably limited
       Values usually bytes, often have fewer limits
       Extremely fast, simple




 Bob Ippolito (PyCon 2009)   Drop ACID and think about data   March 28, 2009   22 / 35
Memcached

       Key/Value store as cache:

      No persistence
      RAM only
      Throws data away (on purpose)
      Lightning fast
      “Everyone” uses it




 Bob Ippolito (PyCon 2009)   Drop ACID and think about data   March 28, 2009   23 / 35
Caching Immutable Data



       If only data never changed:

       Immutable is easy, do that




 Bob Ippolito (PyCon 2009)   Drop ACID and think about data   March 28, 2009   24 / 35
Caching Mutable Data

       Invalidation sucks:

       Mutable is hard
       Failed transactions?
       Concurrent writers?
       Dependent cache keys?
       You will get it wrong and it will be hard to debug




 Bob Ippolito (PyCon 2009)   Drop ACID and think about data   March 28, 2009   25 / 35
Tokyo Cabinet/Tyrant


       Not your mom’s BerkeleyDB:

       Disk persistent
       Very performant
       Actively developed
       Similar replication strategy to MySQL




 Bob Ippolito (PyCon 2009)   Drop ACID and think about data   March 28, 2009   26 / 35
Redis

       Still very new:

       Not just a Key/Value store
       Matching on key spaces
       Values can be bytes, lists or sets
       Requires full store in RAM
       Might be a nice cache server?




 Bob Ippolito (PyCon 2009)   Drop ACID and think about data   March 28, 2009   27 / 35
Document Databases


       Schema-free:

       Very easy to use
       Document Versioning
       Great for storing documents




 Bob Ippolito (PyCon 2009)   Drop ACID and think about data   March 28, 2009   28 / 35
CouchDB

       Document DB Poster Child:

       Apache project
       Asynchronous replication
       JSON based
       Views materialized on demand (not indexes)
       Neat admin UI




 Bob Ippolito (PyCon 2009)   Drop ACID and think about data   March 28, 2009   29 / 35
MongoDB

       C++’s revenge:

       Fast
       JSON and BSON (binary JSON-ish)
       Asynchronous replication with auto-sharding “soon”
       Index support
       Nested documents
       Advanced queries



 Bob Ippolito (PyCon 2009)   Drop ACID and think about data   March 28, 2009   30 / 35
Column Databases


       Data Warehousing:

       Sequential reads are awesome
       Columns compress better than rows
       Doesn’t waste I/O on uninteresting columns




 Bob Ippolito (PyCon 2009)   Drop ACID and think about data   March 28, 2009   31 / 35
MonetDB


       Research project:

       Tried really hard to get it to work
       Crashes a lot and corrupts your data
       Do not waste your time




 Bob Ippolito (PyCon 2009)   Drop ACID and think about data   March 28, 2009   32 / 35
LucidDB


       Sounds interesting:

       Java/C++ open source data warehouse
       No clustering
       No experience yet




 Bob Ippolito (PyCon 2009)   Drop ACID and think about data   March 28, 2009   33 / 35
Vertica


       We paid for it:

       Commercial (based on C-Store)
       Clustered
       Would still prefer open source




 Bob Ippolito (PyCon 2009)   Drop ACID and think about data   March 28, 2009   34 / 35
Bitmap Indexes


       Sequential Scans can be fast:

       1-N bits per row of data
       Can apply logical operations across indexes
       Can be compressed (BBC, WAH)
       FastBit is a good implementation




 Bob Ippolito (PyCon 2009)   Drop ACID and think about data   March 28, 2009   35 / 35
Bitmap Index Uses


       Big Queries:

       PostgreSQL 8.1+ in-memory for some queries
       Almost a requirement for column stores
       FastBit is a great implementation (WAH)




 Bob Ippolito (PyCon 2009)   Drop ACID and think about data   March 28, 2009   36 / 35
Bloom Filters are Neat


       But our Princess is in another castle:

       Probabilistic data structure
       False positives at a known error
       Constant space
       I won’t bore you with the math




 Bob Ippolito (PyCon 2009)   Drop ACID and think about data   March 28, 2009   37 / 35
Bloom Filter Diagram
       Actually Relevant:




 Bob Ippolito (PyCon 2009)   Drop ACID and think about data   March 28, 2009   38 / 35
Bloom Filter Uses

       Find stuff, maybe:

       Approximate counting of a large set (e.g. unique IPs
       from logs)
       Knowing that data is definitely NOT stored
       somewhere, e.g. remote cache
       Several variants (Counting Bloom Filter, Scalable
       Bloom Filter, ...)



 Bob Ippolito (PyCon 2009)   Drop ACID and think about data   March 28, 2009   39 / 35
Questions?


       Open Space:

       Open Space TODAY @ 5pm, Lambert. See
       Jonathan Ellis




 Bob Ippolito (PyCon 2009)   Drop ACID and think about data   March 28, 2009   40 / 35

Mais conteúdo relacionado

Destaque

program_draft3.pdf
program_draft3.pdfprogram_draft3.pdf
program_draft3.pdfHiroshi Ono
 
Brasil SãO Paulo
Brasil SãO PauloBrasil SãO Paulo
Brasil SãO Paulohome
 
Presentació BiblioClic-Premi Impuls
Presentació BiblioClic-Premi ImpulsPresentació BiblioClic-Premi Impuls
Presentació BiblioClic-Premi ImpulsAnna Badia
 
懇丁相簿
懇丁相簿懇丁相簿
懇丁相簿judychenQ
 
Especímenes Tipográficos, diploma en Tipografía Digital 2012 ( Muestra)
Especímenes Tipográficos, diploma en Tipografía Digital 2012 ( Muestra)Especímenes Tipográficos, diploma en Tipografía Digital 2012 ( Muestra)
Especímenes Tipográficos, diploma en Tipografía Digital 2012 ( Muestra)PublicacionesFAU
 
cosita de mariana jajajajaja
cosita de mariana jajajajajacosita de mariana jajajajaja
cosita de mariana jajajajajaluigi921219
 
Madera jose miguel
Madera jose miguelMadera jose miguel
Madera jose miguelLeyre_prof
 

Destaque (14)

program_draft3.pdf
program_draft3.pdfprogram_draft3.pdf
program_draft3.pdf
 
Brasil SãO Paulo
Brasil SãO PauloBrasil SãO Paulo
Brasil SãO Paulo
 
Presentació BiblioClic-Premi Impuls
Presentació BiblioClic-Premi ImpulsPresentació BiblioClic-Premi Impuls
Presentació BiblioClic-Premi Impuls
 
Petic Software
Petic SoftwarePetic Software
Petic Software
 
懇丁相簿
懇丁相簿懇丁相簿
懇丁相簿
 
Lisboa pop web
Lisboa pop webLisboa pop web
Lisboa pop web
 
Especímenes Tipográficos, diploma en Tipografía Digital 2012 ( Muestra)
Especímenes Tipográficos, diploma en Tipografía Digital 2012 ( Muestra)Especímenes Tipográficos, diploma en Tipografía Digital 2012 ( Muestra)
Especímenes Tipográficos, diploma en Tipografía Digital 2012 ( Muestra)
 
Apres Hsbc V3
Apres Hsbc V3Apres Hsbc V3
Apres Hsbc V3
 
cosita de mariana jajajajaja
cosita de mariana jajajajajacosita de mariana jajajajaja
cosita de mariana jajajajaja
 
Ojosycara
OjosycaraOjosycara
Ojosycara
 
Madera jose miguel
Madera jose miguelMadera jose miguel
Madera jose miguel
 
Misticismo o satanismo
Misticismo o satanismoMisticismo o satanismo
Misticismo o satanismo
 
Higienedelalechepdf 130627115808-phpapp01
Higienedelalechepdf 130627115808-phpapp01Higienedelalechepdf 130627115808-phpapp01
Higienedelalechepdf 130627115808-phpapp01
 
Prehistoria Carmen Cara 5ºA
Prehistoria Carmen Cara 5ºAPrehistoria Carmen Cara 5ºA
Prehistoria Carmen Cara 5ºA
 

Semelhante a Drop ACID and think about data (Bob Ippolito PyCon 2009

MATI 2009 Conference: CAT Tools
MATI 2009 Conference: CAT ToolsMATI 2009 Conference: CAT Tools
MATI 2009 Conference: CAT ToolsDierk Seeburg
 
MATI 2009 Conference: QA Tips and Tricks
MATI 2009 Conference: QA Tips and TricksMATI 2009 Conference: QA Tips and Tricks
MATI 2009 Conference: QA Tips and TricksDierk Seeburg
 
Istio on IBM K8Sにチャレンジしてみた
Istio on IBM K8SにチャレンジしてみたIstio on IBM K8Sにチャレンジしてみた
Istio on IBM K8SにチャレンジしてみたShoichiro Sakaigawa
 
[Bucharest] Reversing the Apple Sandbox
[Bucharest] Reversing the Apple Sandbox[Bucharest] Reversing the Apple Sandbox
[Bucharest] Reversing the Apple SandboxOWASP EEE
 
Data viz as interface #ignitelondon7
Data viz as interface #ignitelondon7Data viz as interface #ignitelondon7
Data viz as interface #ignitelondon7Makoto Inoue
 
Fast track foundations getting serious about sequential io
Fast track foundations getting serious about sequential ioFast track foundations getting serious about sequential io
Fast track foundations getting serious about sequential iojrowlandjones
 
PostgreSQL - backup and recovery with large databases
PostgreSQL - backup and recovery with large databasesPostgreSQL - backup and recovery with large databases
PostgreSQL - backup and recovery with large databasesFederico Campoli
 
Migrating PostgreSQL to the Cloud
Migrating PostgreSQL to the CloudMigrating PostgreSQL to the Cloud
Migrating PostgreSQL to the CloudMike Fowler
 
TubeTagger – YouTube-based Concept Detection
TubeTagger – YouTube-based Concept DetectionTubeTagger – YouTube-based Concept Detection
TubeTagger – YouTube-based Concept DetectionDamian Borth
 
Raspberry Pi for startups - Estonia
Raspberry Pi for startups - EstoniaRaspberry Pi for startups - Estonia
Raspberry Pi for startups - Estoniabennuttall
 

Semelhante a Drop ACID and think about data (Bob Ippolito PyCon 2009 (10)

MATI 2009 Conference: CAT Tools
MATI 2009 Conference: CAT ToolsMATI 2009 Conference: CAT Tools
MATI 2009 Conference: CAT Tools
 
MATI 2009 Conference: QA Tips and Tricks
MATI 2009 Conference: QA Tips and TricksMATI 2009 Conference: QA Tips and Tricks
MATI 2009 Conference: QA Tips and Tricks
 
Istio on IBM K8Sにチャレンジしてみた
Istio on IBM K8SにチャレンジしてみたIstio on IBM K8Sにチャレンジしてみた
Istio on IBM K8Sにチャレンジしてみた
 
[Bucharest] Reversing the Apple Sandbox
[Bucharest] Reversing the Apple Sandbox[Bucharest] Reversing the Apple Sandbox
[Bucharest] Reversing the Apple Sandbox
 
Data viz as interface #ignitelondon7
Data viz as interface #ignitelondon7Data viz as interface #ignitelondon7
Data viz as interface #ignitelondon7
 
Fast track foundations getting serious about sequential io
Fast track foundations getting serious about sequential ioFast track foundations getting serious about sequential io
Fast track foundations getting serious about sequential io
 
PostgreSQL - backup and recovery with large databases
PostgreSQL - backup and recovery with large databasesPostgreSQL - backup and recovery with large databases
PostgreSQL - backup and recovery with large databases
 
Migrating PostgreSQL to the Cloud
Migrating PostgreSQL to the CloudMigrating PostgreSQL to the Cloud
Migrating PostgreSQL to the Cloud
 
TubeTagger – YouTube-based Concept Detection
TubeTagger – YouTube-based Concept DetectionTubeTagger – YouTube-based Concept Detection
TubeTagger – YouTube-based Concept Detection
 
Raspberry Pi for startups - Estonia
Raspberry Pi for startups - EstoniaRaspberry Pi for startups - Estonia
Raspberry Pi for startups - Estonia
 

Mais de Hiroshi Ono

Voltdb - wikipedia
Voltdb - wikipediaVoltdb - wikipedia
Voltdb - wikipediaHiroshi Ono
 
Gamecenter概説
Gamecenter概説Gamecenter概説
Gamecenter概説Hiroshi Ono
 
EventDrivenArchitecture
EventDrivenArchitectureEventDrivenArchitecture
EventDrivenArchitectureHiroshi Ono
 
nodalities_issue7.pdf
nodalities_issue7.pdfnodalities_issue7.pdf
nodalities_issue7.pdfHiroshi Ono
 
genpaxospublic-090703114743-phpapp01.pdf
genpaxospublic-090703114743-phpapp01.pdfgenpaxospublic-090703114743-phpapp01.pdf
genpaxospublic-090703114743-phpapp01.pdfHiroshi Ono
 
kademlia-1227143905867010-8.pdf
kademlia-1227143905867010-8.pdfkademlia-1227143905867010-8.pdf
kademlia-1227143905867010-8.pdfHiroshi Ono
 
pragmaticrealworldscalajfokus2009-1233251076441384-2.pdf
pragmaticrealworldscalajfokus2009-1233251076441384-2.pdfpragmaticrealworldscalajfokus2009-1233251076441384-2.pdf
pragmaticrealworldscalajfokus2009-1233251076441384-2.pdfHiroshi Ono
 
downey08semaphores.pdf
downey08semaphores.pdfdowney08semaphores.pdf
downey08semaphores.pdfHiroshi Ono
 
BOF1-Scala02.pdf
BOF1-Scala02.pdfBOF1-Scala02.pdf
BOF1-Scala02.pdfHiroshi Ono
 
TwitterOct2008.pdf
TwitterOct2008.pdfTwitterOct2008.pdf
TwitterOct2008.pdfHiroshi Ono
 
stateyouredoingitwrongjavaone2009-090617031310-phpapp02.pdf
stateyouredoingitwrongjavaone2009-090617031310-phpapp02.pdfstateyouredoingitwrongjavaone2009-090617031310-phpapp02.pdf
stateyouredoingitwrongjavaone2009-090617031310-phpapp02.pdfHiroshi Ono
 
SACSIS2009_TCP.pdf
SACSIS2009_TCP.pdfSACSIS2009_TCP.pdf
SACSIS2009_TCP.pdfHiroshi Ono
 
stateyouredoingitwrongjavaone2009-090617031310-phpapp02.pdf
stateyouredoingitwrongjavaone2009-090617031310-phpapp02.pdfstateyouredoingitwrongjavaone2009-090617031310-phpapp02.pdf
stateyouredoingitwrongjavaone2009-090617031310-phpapp02.pdfHiroshi Ono
 
program_draft3.pdf
program_draft3.pdfprogram_draft3.pdf
program_draft3.pdfHiroshi Ono
 
nodalities_issue7.pdf
nodalities_issue7.pdfnodalities_issue7.pdf
nodalities_issue7.pdfHiroshi Ono
 
genpaxospublic-090703114743-phpapp01.pdf
genpaxospublic-090703114743-phpapp01.pdfgenpaxospublic-090703114743-phpapp01.pdf
genpaxospublic-090703114743-phpapp01.pdfHiroshi Ono
 
kademlia-1227143905867010-8.pdf
kademlia-1227143905867010-8.pdfkademlia-1227143905867010-8.pdf
kademlia-1227143905867010-8.pdfHiroshi Ono
 
pragmaticrealworldscalajfokus2009-1233251076441384-2.pdf
pragmaticrealworldscalajfokus2009-1233251076441384-2.pdfpragmaticrealworldscalajfokus2009-1233251076441384-2.pdf
pragmaticrealworldscalajfokus2009-1233251076441384-2.pdfHiroshi Ono
 
downey08semaphores.pdf
downey08semaphores.pdfdowney08semaphores.pdf
downey08semaphores.pdfHiroshi Ono
 

Mais de Hiroshi Ono (20)

Voltdb - wikipedia
Voltdb - wikipediaVoltdb - wikipedia
Voltdb - wikipedia
 
Gamecenter概説
Gamecenter概説Gamecenter概説
Gamecenter概説
 
EventDrivenArchitecture
EventDrivenArchitectureEventDrivenArchitecture
EventDrivenArchitecture
 
nodalities_issue7.pdf
nodalities_issue7.pdfnodalities_issue7.pdf
nodalities_issue7.pdf
 
genpaxospublic-090703114743-phpapp01.pdf
genpaxospublic-090703114743-phpapp01.pdfgenpaxospublic-090703114743-phpapp01.pdf
genpaxospublic-090703114743-phpapp01.pdf
 
kademlia-1227143905867010-8.pdf
kademlia-1227143905867010-8.pdfkademlia-1227143905867010-8.pdf
kademlia-1227143905867010-8.pdf
 
pragmaticrealworldscalajfokus2009-1233251076441384-2.pdf
pragmaticrealworldscalajfokus2009-1233251076441384-2.pdfpragmaticrealworldscalajfokus2009-1233251076441384-2.pdf
pragmaticrealworldscalajfokus2009-1233251076441384-2.pdf
 
downey08semaphores.pdf
downey08semaphores.pdfdowney08semaphores.pdf
downey08semaphores.pdf
 
BOF1-Scala02.pdf
BOF1-Scala02.pdfBOF1-Scala02.pdf
BOF1-Scala02.pdf
 
TwitterOct2008.pdf
TwitterOct2008.pdfTwitterOct2008.pdf
TwitterOct2008.pdf
 
camel-scala.pdf
camel-scala.pdfcamel-scala.pdf
camel-scala.pdf
 
stateyouredoingitwrongjavaone2009-090617031310-phpapp02.pdf
stateyouredoingitwrongjavaone2009-090617031310-phpapp02.pdfstateyouredoingitwrongjavaone2009-090617031310-phpapp02.pdf
stateyouredoingitwrongjavaone2009-090617031310-phpapp02.pdf
 
SACSIS2009_TCP.pdf
SACSIS2009_TCP.pdfSACSIS2009_TCP.pdf
SACSIS2009_TCP.pdf
 
stateyouredoingitwrongjavaone2009-090617031310-phpapp02.pdf
stateyouredoingitwrongjavaone2009-090617031310-phpapp02.pdfstateyouredoingitwrongjavaone2009-090617031310-phpapp02.pdf
stateyouredoingitwrongjavaone2009-090617031310-phpapp02.pdf
 
program_draft3.pdf
program_draft3.pdfprogram_draft3.pdf
program_draft3.pdf
 
nodalities_issue7.pdf
nodalities_issue7.pdfnodalities_issue7.pdf
nodalities_issue7.pdf
 
genpaxospublic-090703114743-phpapp01.pdf
genpaxospublic-090703114743-phpapp01.pdfgenpaxospublic-090703114743-phpapp01.pdf
genpaxospublic-090703114743-phpapp01.pdf
 
kademlia-1227143905867010-8.pdf
kademlia-1227143905867010-8.pdfkademlia-1227143905867010-8.pdf
kademlia-1227143905867010-8.pdf
 
pragmaticrealworldscalajfokus2009-1233251076441384-2.pdf
pragmaticrealworldscalajfokus2009-1233251076441384-2.pdfpragmaticrealworldscalajfokus2009-1233251076441384-2.pdf
pragmaticrealworldscalajfokus2009-1233251076441384-2.pdf
 
downey08semaphores.pdf
downey08semaphores.pdfdowney08semaphores.pdf
downey08semaphores.pdf
 

Último

Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 

Último (20)

Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 

Drop ACID and think about data (Bob Ippolito PyCon 2009

  • 1. Drop ACID and think about data Bob Ippolito Mochi Media, Inc. PyCon 2009 - Chicago (actually Rosemont) March 28, 2009 Bob Ippolito (PyCon 2009) Drop ACID and think about data March 28, 2009 1 / 35
  • 2. Author: Bob Ippolito Date: March 2009 Venue: PyCon 2009 Bob Ippolito (PyCon 2009) Drop ACID and think about data March 28, 2009 2 / 35
  • 3. Bob’s Perspective Startup with lots of data: Cofounded Mochi Media in 2005 MochiBot analytics platform (for Flash) MochiAds ad serving platform (for Flash games) Other cool services for game developers Bob Ippolito (PyCon 2009) Drop ACID and think about data March 28, 2009 2 / 35
  • 4. Mochi Ad Sales Hard Sell: Bob Ippolito (PyCon 2009) Drop ACID and think about data March 28, 2009 3 / 35
  • 5. What’s ACID? A promise ring your DBMS wears: Atomicity: all or nothing Consistency: no explosions Isolation: no fights Durability: no lying Bob Ippolito (PyCon 2009) Drop ACID and think about data March 28, 2009 4 / 35
  • 6. ACID Trips Scalability and reliability: Downtime is unacceptable Reliable is >= 2 nodes Scalable is ... more Networks make it hard Networks make it hard Networks make it hard Bob Ippolito (PyCon 2009) Drop ACID and think about data March 28, 2009 5 / 35
  • 7. What can I have? CAP theorem says pick two: Consistency Availability Partition tolerance Bob Ippolito (PyCon 2009) Drop ACID and think about data March 28, 2009 6 / 35
  • 8. Turn up the BASE Write smarter applications: Basically Available Soft state Eventually consistent Bob Ippolito (PyCon 2009) Drop ACID and think about data March 28, 2009 7 / 35
  • 9. BASE jumping Everyone else is doing it: Google Amazon eBay Yahoo! Facebook ... Bob Ippolito (PyCon 2009) Drop ACID and think about data March 28, 2009 8 / 35
  • 10. BigTable Google: Paxos (Chubby) Single-master Distributed tablets via GFS Row/Column db hybrid Compression (BMDiff, Zippy) Versioned (Row, Column, Timestamp) Bloom filters Bob Ippolito (PyCon 2009) Drop ACID and think about data March 28, 2009 9 / 35
  • 11. BigTable Pros Pros: Compression = Awesome Clients are probably simple Integrates with map/reduce Bob Ippolito (PyCon 2009) Drop ACID and think about data March 28, 2009 10 / 35
  • 12. BigTable Cons Cons: Proprietary to Google Single-master Bob Ippolito (PyCon 2009) Drop ACID and think about data March 28, 2009 11 / 35
  • 13. BigTable Diagram Single-master: Bob Ippolito (PyCon 2009) Drop ACID and think about data March 28, 2009 12 / 35
  • 14. Dynamo Amazon: Key/Value store Consistent hashing Vector clocks Read repair Bob Ippolito (PyCon 2009) Drop ACID and think about data March 28, 2009 13 / 35
  • 15. Dynamo Pros Pros: No master Highly available for write Knobs to make it fast to read “Simple” (lots of half-baked clones!) Bob Ippolito (PyCon 2009) Drop ACID and think about data March 28, 2009 14 / 35
  • 16. Dynamo Cons Cons: Proprietary to Amazon Clients need to be smart No compression Not suitable for column-like workloads Just a Key/Value store Bob Ippolito (PyCon 2009) Drop ACID and think about data March 28, 2009 15 / 35
  • 17. Dynamo Diagram Smart client: Bob Ippolito (PyCon 2009) Drop ACID and think about data March 28, 2009 16 / 35
  • 18. Cassandra Facebook -> Apache: Open source! No master like Dynamo Storage model more like BigTable Bob Ippolito (PyCon 2009) Drop ACID and think about data March 28, 2009 17 / 35
  • 19. Cassandra Pros Pros: OPEN SOURCE Incrementally scalable Minimal administration Bob Ippolito (PyCon 2009) Drop ACID and think about data March 28, 2009 18 / 35
  • 20. Cassandra Cons Cons: Not polished No compression yet Bob Ippolito (PyCon 2009) Drop ACID and think about data March 28, 2009 19 / 35
  • 21. Cassandra Diagram Soul Calibur: Bob Ippolito (PyCon 2009) Drop ACID and think about data March 28, 2009 20 / 35
  • 22. Distributed Musings New Hotness: Distributed databases are the new web framework ... except none of them are awesome yet I don’t think we need another half-baked Dynamo clone Bob Ippolito (PyCon 2009) Drop ACID and think about data March 28, 2009 21 / 35
  • 23. Key-Value Stores Simple and Fast: Similar to a Python dict Keys usually bytes, probably limited Values usually bytes, often have fewer limits Extremely fast, simple Bob Ippolito (PyCon 2009) Drop ACID and think about data March 28, 2009 22 / 35
  • 24. Memcached Key/Value store as cache: No persistence RAM only Throws data away (on purpose) Lightning fast “Everyone” uses it Bob Ippolito (PyCon 2009) Drop ACID and think about data March 28, 2009 23 / 35
  • 25. Caching Immutable Data If only data never changed: Immutable is easy, do that Bob Ippolito (PyCon 2009) Drop ACID and think about data March 28, 2009 24 / 35
  • 26. Caching Mutable Data Invalidation sucks: Mutable is hard Failed transactions? Concurrent writers? Dependent cache keys? You will get it wrong and it will be hard to debug Bob Ippolito (PyCon 2009) Drop ACID and think about data March 28, 2009 25 / 35
  • 27. Tokyo Cabinet/Tyrant Not your mom’s BerkeleyDB: Disk persistent Very performant Actively developed Similar replication strategy to MySQL Bob Ippolito (PyCon 2009) Drop ACID and think about data March 28, 2009 26 / 35
  • 28. Redis Still very new: Not just a Key/Value store Matching on key spaces Values can be bytes, lists or sets Requires full store in RAM Might be a nice cache server? Bob Ippolito (PyCon 2009) Drop ACID and think about data March 28, 2009 27 / 35
  • 29. Document Databases Schema-free: Very easy to use Document Versioning Great for storing documents Bob Ippolito (PyCon 2009) Drop ACID and think about data March 28, 2009 28 / 35
  • 30. CouchDB Document DB Poster Child: Apache project Asynchronous replication JSON based Views materialized on demand (not indexes) Neat admin UI Bob Ippolito (PyCon 2009) Drop ACID and think about data March 28, 2009 29 / 35
  • 31. MongoDB C++’s revenge: Fast JSON and BSON (binary JSON-ish) Asynchronous replication with auto-sharding “soon” Index support Nested documents Advanced queries Bob Ippolito (PyCon 2009) Drop ACID and think about data March 28, 2009 30 / 35
  • 32. Column Databases Data Warehousing: Sequential reads are awesome Columns compress better than rows Doesn’t waste I/O on uninteresting columns Bob Ippolito (PyCon 2009) Drop ACID and think about data March 28, 2009 31 / 35
  • 33. MonetDB Research project: Tried really hard to get it to work Crashes a lot and corrupts your data Do not waste your time Bob Ippolito (PyCon 2009) Drop ACID and think about data March 28, 2009 32 / 35
  • 34. LucidDB Sounds interesting: Java/C++ open source data warehouse No clustering No experience yet Bob Ippolito (PyCon 2009) Drop ACID and think about data March 28, 2009 33 / 35
  • 35. Vertica We paid for it: Commercial (based on C-Store) Clustered Would still prefer open source Bob Ippolito (PyCon 2009) Drop ACID and think about data March 28, 2009 34 / 35
  • 36. Bitmap Indexes Sequential Scans can be fast: 1-N bits per row of data Can apply logical operations across indexes Can be compressed (BBC, WAH) FastBit is a good implementation Bob Ippolito (PyCon 2009) Drop ACID and think about data March 28, 2009 35 / 35
  • 37. Bitmap Index Uses Big Queries: PostgreSQL 8.1+ in-memory for some queries Almost a requirement for column stores FastBit is a great implementation (WAH) Bob Ippolito (PyCon 2009) Drop ACID and think about data March 28, 2009 36 / 35
  • 38. Bloom Filters are Neat But our Princess is in another castle: Probabilistic data structure False positives at a known error Constant space I won’t bore you with the math Bob Ippolito (PyCon 2009) Drop ACID and think about data March 28, 2009 37 / 35
  • 39. Bloom Filter Diagram Actually Relevant: Bob Ippolito (PyCon 2009) Drop ACID and think about data March 28, 2009 38 / 35
  • 40. Bloom Filter Uses Find stuff, maybe: Approximate counting of a large set (e.g. unique IPs from logs) Knowing that data is definitely NOT stored somewhere, e.g. remote cache Several variants (Counting Bloom Filter, Scalable Bloom Filter, ...) Bob Ippolito (PyCon 2009) Drop ACID and think about data March 28, 2009 39 / 35
  • 41. Questions? Open Space: Open Space TODAY @ 5pm, Lambert. See Jonathan Ellis Bob Ippolito (PyCon 2009) Drop ACID and think about data March 28, 2009 40 / 35