Data, dev-ops, and cloud services


    Building a distributed data-platform


               Charles Care

            Engineering Team
              Kasabi / Talis
Talk overview
โ—   About me...
โ—   What Kasabi is,
    โ—   what we are trying to do
    โ—   how we are working to achieve that
    โ—   a quick walk-though
โ—   Discussion of the Kasabi platform team
    โ—   Our technology / architecture
    โ—   Our engineering culture
    โ—   Lessons learnt
Views are mine...

…and not necessarily those of
my (current/past) employers
About me...
โ—   2001-2004 โ€“ BSc Computer Science (Warwick)
โ—   2004-2008 โ€“ PhD Computer Science (Warwick)
โ—   2007-2011 โ€“ BT Plc
    โ—   Technical risk analyst โ€“ BT Global MPLS Network
    โ—   Software Engineer โ€“ Infrastructure for Financial Markets
    โ—   Senior Software Engineer โ€“ Central software standards
        and tools
โ—   2011-Present โ€“ Talis/Kasabi
    โ—   Software Engineer โ€“ Semantic web platform
About Kasabi
โ—   Data market place
โ—   Bringing together data...
    โ—   owners
    โ—   consumers
โ—   Lowering the barrier for data-driven apps to
    enter the market
โ—   Enabling new opportunities for aggregating and
    mixing data
Data licensing today
(Diagram: Data Owners and Data Consumers linked by bespoke, expensive contracts)
Kasabi as a data platform
(Diagram: Kasabi as a platform connecting Data Owners, Data enthusiasts, Data engineers, Application Developers, API developers, and Third-party services)
About Kasabi
โ—   Publish datasets using standard APIs
โ—   Access data using standard APIs
    โ—   Query a dataset using SPARQL
    โ—   Search a dataset using a simple full-text search
โ—   Define, contribute, and share your own APIs
Data marketplace
http://www.kasabi.com/
A dataset
Access data using standard APIs
Contribute custom APIs
Example – contributed APIs
Current organisation
โ—   Product development
โ—   Data engineering
โ—   Customer operations
โ—   Platform development
Platform architecture
Data Platform
(Diagram: a load-balancing and routing layer in front of Update, Search, and Query services, which operate on the Datasets)
●   Need to store and update datasets
●   Access data via various services
●   Must scale with load and increasing data
●   Must be tolerant to failure
●   Extensible
    ●   Should be easy to add new services over time
To distribute...

...or not to distribute
Distributed Platform
(Diagram: a routing layer above a dynamic gossip network of Update, Search, and SPARQL service nodes, with room for a new service, supported by a Sequence Service, a Storage Service, and Monitoring Services)
Distributed Platform – updates
(Distributed Platform diagram, annotated:)
    - Updates are sequenced
    - Data stored in distributed storage
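The slides say updates are sequenced but not how. A minimal sketch, assuming a single Sequence Service that stamps each update with a monotonically increasing number: because gossip can deliver updates late, out of order, or more than once, each replica buffers early arrivals in a min-heap and applies updates strictly in sequence order, ignoring duplicates. `SequenceService`, `ReplicaNode`, and `Update` are hypothetical names.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count


@dataclass(order=True)
class Update:
    seq: int                             # global order assigned by the sequencer
    payload: str = field(compare=False)  # e.g. a serialised RDF change set


class SequenceService:
    """Stamps every update with the next number in a single global sequence."""

    def __init__(self) -> None:
        self._next = count(1)

    def sequence(self, payload: str) -> Update:
        return Update(next(self._next), payload)


class ReplicaNode:
    """Applies updates strictly in sequence order, however they arrive."""

    def __init__(self) -> None:
        self.applied_seq = 0               # highest sequence number applied so far
        self.state: list[str] = []         # stand-in for the node's real data store
        self._pending: list[Update] = []   # min-heap of updates that arrived early

    def receive(self, update: Update) -> None:
        if update.seq <= self.applied_seq:
            return  # duplicate delivery, common with gossip: already applied
        heapq.heappush(self._pending, update)
        # Apply every buffered update that is now next in line.
        while self._pending and self._pending[0].seq <= self.applied_seq + 1:
            nxt = heapq.heappop(self._pending)
            if nxt.seq == self.applied_seq + 1:  # skip duplicates still in the heap
                self.state.append(nxt.payload)
                self.applied_seq = nxt.seq


sequencer = SequenceService()
u1, u2, u3 = (sequencer.sequence(p) for p in ["insert A", "insert B", "delete A"])
node = ReplicaNode()
for u in [u3, u1, u1, u2]:  # out of order, with a duplicate
    node.receive(u)
print(node.state)  # ['insert A', 'insert B', 'delete A']
```

Since every replica applies the same updates in the same order, replicas that have reached the same sequence number hold the same data, which is what makes eventual consistency manageable.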
Distributed Platform – updates
(Distributed Platform diagram, annotated:)
    - Updates are gossiped around the network
    - Here a SPARQL node realises that it should apply the update
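To get a feel for why gossiping an update around the network reaches every node quickly, here is a small simulation of generic push gossip: every node that already has the update forwards it to `fanout` randomly chosen peers each round. This is a textbook model, not Kasabi's actual protocol, and it leaves out the step where a node decides whether an update is relevant to it (as the SPARQL node does here).

```python
import random


def gossip_rounds(num_nodes: int, fanout: int, seed: int = 0) -> int:
    """Simulate push gossip and return the rounds needed to reach every node."""
    if not 1 <= fanout <= num_nodes:
        raise ValueError("fanout must be between 1 and num_nodes")
    rng = random.Random(seed)  # seeded so runs are reproducible
    informed = {0}             # node 0 receives the update first
    rounds = 0
    while len(informed) < num_nodes:
        rounds += 1
        newly_informed = set()
        for node in informed:
            # Each informed node forwards the update to `fanout` random peers.
            peers = rng.sample(range(num_nodes), fanout)
            newly_informed.update(p for p in peers if p != node)
        informed |= newly_informed
    return rounds


for n in (10, 100, 1000, 10000):
    print(n, "nodes:", gossip_rounds(n, fanout=3), "rounds")
```

With push gossip the number of rounds grows roughly logarithmically with the number of nodes, so adding service nodes barely slows down how quickly an update spreads.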
Distributed Platform – query
(Distributed Platform diagram, annotated:)
    - SPARQL queries will now reflect the update that was submitted
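Because nodes apply updates asynchronously, "queries will now reflect the update" only holds once the query node has caught up. A common way to offer read-your-writes on top of eventual consistency, assumed here rather than taken from the slides, is for the update service to hand the client the update's sequence number and for the query node to wait until it has applied at least that number. `QueryNode`, `on_applied`, and `min_seq` are hypothetical.

```python
import threading


class QueryNode:
    """Query service that can wait until it has applied a given update."""

    def __init__(self) -> None:
        self._applied_seq = 0
        self._cond = threading.Condition()

    def on_applied(self, seq: int) -> None:
        # Called by the node's update-applying code after each update commits.
        with self._cond:
            self._applied_seq = max(self._applied_seq, seq)
            self._cond.notify_all()

    def query(self, min_seq: int, timeout: float = 5.0) -> str:
        """Answer only once updates up to `min_seq` are visible on this node."""
        with self._cond:
            caught_up = self._cond.wait_for(
                lambda: self._applied_seq >= min_seq, timeout)
            if not caught_up:
                raise TimeoutError(
                    f"node is at seq {self._applied_seq}, client needs {min_seq}")
            # A real node would run the SPARQL query against its store here.
            return f"results as of seq {self._applied_seq}"


node = QueryNode()
# Simulate the update with sequence number 1 being applied shortly afterwards.
threading.Timer(0.1, node.on_applied, args=(1,)).start()
print(node.query(min_seq=1))  # waits briefly, then: results as of seq 1
```

Clients that don't care about their own recent writes can pass `min_seq=0` and get an answer immediately, so the extra latency is only paid where it matters.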
Monolithic vs distributed
โ—   Monolithic
    โ—   Easy to synchronise events and data
    โ—   Consistent views and queries
    โ—   Less inter-process communication / less network overhead
    โ—   Easier to optimise for high throughput
    โ—   Single code-base
    โ—   Fewer processes to monitor
โ—   Distributed
    โ—   Service-oriented - separate concerns run in isolated processes (and can be scaled
        independently)
    โ—   Development is component-based
        โ€“   Changes are more focussed / helps avoids scope-creep
    โ—   Deployment can be localised to avoid downtime
    โ—   Failure is more likely โ€“ so you need to plan for it
    โ—   Easier to integrate out-of-the box software โ€“ e.g. using standard Apache Solr
Distributed data platform
โ—   Separate services for each API
โ—   Communication via Gossip messages
โ—   Have to manage eventual consistency
โ—   Highly scalable
โ—   Easy to add new services
โ—   Use standard protocols and open-source components
    โ—   HTTP libraries / REST / ZeroMQ / Apache Thrift
    โ—   RDF and SPARQL using Apache Jena
    โ—   Search using Apache Solr
    โ—   Avoid modification and forks
โ—   Deploy into Amazon EC2 (also using: S3, EMR, and ELB)
Benefits of using cloud services
Consider a start-up in 2002
โ—   Have an idea...
โ—   Get funding (development, op-ex,
    cap-ex)
โ—   Aquire servers
    โ—   Set-up your servers
        โ€“   mail, web, source code repo, build
            systems
        โ€“   development, staging, live
    โ—   Some 'cloud' services
        โ€“   โ€ฆ, SourceForge, shared servers, etc
โ—   Build, and go, to market
    โ—   Probably embedding open-source
        components
โ—   Delivery based on full-stack,
    monolithic, architectures
Consider a start-up in 2012
โ—   Have an idea...
โ—   Get funding (development capital, op-ex)
    โ—   you will probably not get cap-ex
โ—   Use cloud services... rent rather than buy
    โ—   SaaS โ€“ Software as a Service
        โ€“   Why would you run your own (chat/email etc)
        โ€“   Host your code in GitHub/BitBucket etc
    โ—   PaaS โ€“ Platform as a Service
        โ€“   Do you need to control the full stack?
        โ€“   Could you leverage platforms like: Heroku, Joyant,
            AppEngine etc
        โ€“   Amazon RDS
    โ—   IaaS โ€“ Infrastructure as a Service
        โ€“   Cloud services to provide 'bare metal'
โ—   Build and go to market quickly
โ—   scale elastically over time
But what about the enterprise?
โ—   Benefits of cloud services are
    already transforming the enterprise
    โ—   Private clouds
    โ—   Virtual appliances
    โ—   Cloud bursting
    โ—   Independent scaling
    โ—   Separation of concerns
    โ—   SOA architecture
โ—   And in future...
    โ—   Appetite for IaaS is growing
    โ—   PaaS and SaaS will follow.
    โ—   Perimeter security will be replaced by
        localised security boundaries
So how do we build this stuff...?
How it all happens
โ—   Constantly iterating through...
    โ—   Requirements
    โ—   Development (Test-driven)
    โ—   Testing/Review
    โ—   Deployment
    โ—   Operation
โ—   We're an Agile, dev-ops team...
        so all the above is a shared responsibility
Being a dev-ops team...
โ—   Removing barriers between development and operations
โ—   Shared responsibilities rather than distrust
โ—   Everyone has root access
โ—   Developers are responsible for operating systems they build
โ—   Everyone is free to make changes
        ...and responsible to manage the roll-out of those changes
โ—   Ops/Deployment/Monitoring are automated
โ—   Everyone should have full-stack awareness
โ—   Read more...
    โ—   http://dev2ops.org/blog/2010/2/22/what-is-devops.html
    โ—   http://www.jedi.be/blog/
    โ—   http://en.wikipedia.org/wiki/Devops
    โ—   http://www.slideshare.net/jallspaw/ 10-deploys-per-day-dev-and-ops-cooperation-at-flickr
Life-cycle of a change
Requirements and Planning
โ—   Identification of requirement
โ—   Planning
    โ—   Break down big changes into smaller tasks
        โ€“   Can the change be deployed in small steps?
        โ€“   Can the change be dark-deployed?
    โ—   Understand the wider impact
    โ—   Find middle ground between generic and specific
โ—   Team is self-organising
    โ—   People pull work from the prioritised, planned stories
Branch-based development
●   One branch per change, squash before merge
Writing the code
โ—   Work on a branch
     โ—    don't know if/when you'll merge
โ—   Test-driven
     โ—    Unit tests first
     โ—    Do acceptance tests need to change?
     โ—    What technology? Which tool-sets?
โ—   Smoke testing
     โ—    How do you know it works?
     โ—    What's different in production?
     โ—    What are the risks of failure?
โ—   Feature flags?
Tests run: 110, Failures: 0, Errors: 0, Skipped: 2

[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESSFUL
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 39 seconds
[INFO] Finished at: Sat Feb 18 15:20:36 GMT 2012
[INFO] Final Memory: 33M/240M
[INFO] ------------------------------------------------------------------------
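Feature flags are one way to dark-deploy a change (ship the code but keep it switched off) and then widen exposure gradually. A minimal sketch, assuming a table of rollout percentages: hashing the flag name together with a user id gives each user a stable bucket from 0 to 99, so the same user sees the same behaviour on every request. The flag names and the `is_enabled` helper are hypothetical.

```python
import hashlib

# Hypothetical flag table; in practice it would live in central configuration.
ROLLOUT_PERCENT = {
    "new-search-ranking": 0,    # dark-deployed: code is live, enabled for nobody
    "gzip-responses": 25,       # enabled for roughly a quarter of users
    "sparql-timeout-v2": 100,   # fully rolled out
}


def is_enabled(flag: str, user_id: str) -> bool:
    """Deterministically place the user in a bucket 0-99 for this flag."""
    percent = ROLLOUT_PERCENT.get(flag, 0)  # unknown flags are off
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < percent


print(is_enabled("new-search-ranking", "alice"))  # False: dark-deployed
print(is_enabled("sparql-timeout-v2", "alice"))   # True: fully rolled out
```

Including the flag name in the hash means a user who lands in one experiment is not automatically first in line for every other one.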
Writing the code
โ—   Avoid unnecessary scope-creep
    โ—   โ€œI'll just fix this...โ€
    โ—   โ€œIt would be much cleaner if I re-factored this...โ€
    โ—   โ€œIt would be neat if I also added this...โ€
    โ—   โ€ฆhowever, these observations can be written as new stories
    โ—   โ€ฆand sometimes it's good to fix things before they cause pain
    โ—   โ€ฆif extra changes are really necessary, can they be implemented separately?
    โ—   โ€ฆteam should be empowered to fix technical debt
    โ—   ...managing scope-creep is a shared responsibility
โ—   Be prepared to abandon a change if it's taking too long, maybe it needs
    more planning?
โ—   Should you be pairing?
โ—   Should you demo your work?
Code review
โ—   Code review possible with tools for distributed
    teams (e.g. Gerrit or ReviewBoard)
โ—   If you're not following a strict pairing policy,
    code-review is vital
โ—   Useful to make others aware of changes
โ—   Gerrit
    โ—   Build agent automatically builds your change and
        runs tests โ€“ verify +/- 1
    โ—   Invite others to review your code, they can give it a
        score between -2 and +2.
    โ—   Can only deploy code once at least one person has
        given a +2
    โ—   Work-flow is customisable
โ—   Self-organising... anyone can review

              $> git commit
              $> git review
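The review gate described above can be modelled as a small predicate. This sketches the logic only and is not Gerrit's API: it assumes the usual `Verified` and `Code-Review` labels, that a `Verified` -1 or a `Code-Review` -2 blocks submission (a common Gerrit configuration), and that a single +2 is enough, as the slide says. `Vote` and `can_submit` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Vote:
    reviewer: str
    label: str   # "Verified" (-1..+1) or "Code-Review" (-2..+2)
    value: int


def can_submit(votes: list[Vote]) -> bool:
    """Model of the review gate described on the slide (not Gerrit's own API)."""
    verified = {v.value for v in votes if v.label == "Verified"}
    review = {v.value for v in votes if v.label == "Code-Review"}
    build_ok = 1 in verified and -1 not in verified   # build agent's verdict
    approved = 2 in review and -2 not in review       # one +2, and no veto
    return build_ok and approved


votes = [Vote("build-agent", "Verified", 1), Vote("alice", "Code-Review", 2)]
print(can_submit(votes))                                     # True
print(can_submit(votes + [Vote("bob", "Code-Review", -2)]))  # False: vetoed
```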
Code review (2)
Code review (3)
Merge / Deployment
โ—   Merge & Deployment
    โ—   One-click deployment
    โ—   Developer should press the button
    โ—   Code is merged into the
        master/release branch
    โ—   Build server automatically checks
        out the code and builds, tags, and
        uploads the release to an artefact
        repository
    โ—   Package is automatically
        deployed on all servers
        โ€“   Extra orchestration for external-facing
            services to avoid โ€œthundering-herdโ€
            problems
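The slides mention extra orchestration for external-facing services to avoid thundering-herd problems, but don't describe it. One common approach, assumed here, is a staggered rollout: restart servers in fixed-size batches spaced `interval_s` apart, with random jitter so restarted nodes don't all come back, reconnect, and warm their caches at the same instant. `rollout_plan` and its parameters are hypothetical.

```python
import random


def rollout_plan(servers: list[str], batch_size: int, interval_s: float,
                 jitter_s: float, seed: int = 0) -> list[tuple[float, str]]:
    """Return (start offset in seconds, server) pairs for a staggered deploy.

    Batches start `interval_s` apart so only part of the fleet is restarting
    at any moment; random jitter within a batch spreads out reconnections
    and cache warm-up instead of having them all hit at the same instant.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    rng = random.Random(seed)
    plan = []
    for batch_no, start in enumerate(range(0, len(servers), batch_size)):
        for server in servers[start:start + batch_size]:
            plan.append((batch_no * interval_s + rng.uniform(0, jitter_s), server))
    return sorted(plan)


for offset, server in rollout_plan([f"web{i}" for i in range(5)], 2, 60, 5):
    print(f"{offset:6.1f}s  deploy to {server}")
```

Keeping `jitter_s` well below `interval_s` preserves the batch boundaries while still smearing each batch's restarts over a few seconds.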
Managing infrastructure
โ—   Puppet or Chef
โ—   Build packages (e.g. DEB or RPM)
โ—   Centralise configuration management
โ—   Utilising cloud compute infrastructure
    โ—   Amazon EC2
    โ—   Amazon S3
    โ—   Elastic load balancers
    โ—   Elastic Map-Reduce
โ—   Application monitoring
    โ—   Metrics
    โ—   Log analysis
    โ—   Internal monitoring
    โ—   External checks
Lessons learnt

(again, my views!)
Technical lessons learnt
โ—   Use distributed SOA-based services to reduce tight-
    coupling
โ—   Monitor everything...
โ—   Leverage cloud offerings
    โ—   wrap them with well-defined interfaces to avoid lock-in
โ—   Design systems to scale
โ—   Use open and unmodified components where possible
    โ—   Standard components fronting external APIs
    โ—   E.g. Jena, Solr, Haproxy, Apache
Practices that have helped us
โ—   Dev-ops culture
โ—   Pragmatic approach to agile development
    โ—   Task allocation should be 'pull', rather than 'push'
    โ—   Teams should be self-organising
    โ—   Pairing when working on new problems
โ—   Test-Driven-Development (TDD)
โ—   Continuous integration
โ—   Peer-review of code
โ—   Continuous deployment
…so, in summary...
Conclusion
โ—   Isolate your design into components
โ—   Empower your team to release small changes
    frequently
โ—   Leverage hosted/cloud offerings
Thanks for listening!
Credits
โ—   Thanks for the invite to speak
โ—   Thanks to Kasabi / Talis Systems Ltd

โ—   Sign up at http://www.kasabi.com




    Graphics from http://www.iconarchive.com/,
    http://www.oxygen-icons.org and http://www.icons-land.com
Questions?

A Domino Admins Adventures (Engage 2024)Gabriella Davis
ย 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
ย 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
ย 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
ย 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
ย 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
ย 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
ย 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
ย 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
ย 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
ย 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
ย 

Recently uploaded (20)

Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
ย 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
ย 
FULL ENJOY ๐Ÿ” 8264348440 ๐Ÿ” Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY ๐Ÿ” 8264348440 ๐Ÿ” Call Girls in Diplomatic Enclave | DelhiFULL ENJOY ๐Ÿ” 8264348440 ๐Ÿ” Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY ๐Ÿ” 8264348440 ๐Ÿ” Call Girls in Diplomatic Enclave | Delhi
ย 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
ย 
SIEMENS: RAPUNZEL โ€“ A Tale About Knowledge Graph
SIEMENS: RAPUNZEL โ€“ A Tale About Knowledge GraphSIEMENS: RAPUNZEL โ€“ A Tale About Knowledge Graph
SIEMENS: RAPUNZEL โ€“ A Tale About Knowledge Graph
ย 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
ย 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
ย 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
ย 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
ย 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
ย 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
ย 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
ย 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
ย 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
ย 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
ย 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
ย 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
ย 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
ย 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
ย 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
ย 

Building a distributed data-platform - A perspective on current trends in computing

  • 1. Data, dev-ops, and cloud services: Building a distributed data-platform. Charles Care, Engineering Team, Kasabi / Talis
  • 2. Talk overview
      ● About me...
      ● What Kasabi is:
          ● what we are trying to do
          ● how we are working to achieve that
          ● a quick walk-through
      ● Discussion of the Kasabi platform team:
          ● Our technology / architecture
          ● Our engineering culture
          ● Lessons learnt
  • 3. Views are mine... …and not necessarily those of my (current/past) employers
  • 5. About me...
      ● 2001-2004 – BSc Computer Science (Warwick)
      ● 2004-2008 – PhD Computer Science (Warwick)
      ● 2007-2011 – BT Plc
          ● Technical risk analyst – BT Global MPLS Network
          ● Software Engineer – Infrastructure for Financial Markets
          ● Senior Software Engineer – Central software standards and tools
      ● 2011-Present – Talis/Kasabi
          ● Software Engineer – Semantic web platform
  • 7. About Kasabi
      ● Data marketplace
      ● Bringing together data...
          ● owners
          ● consumers
      ● Lowering the barrier for data-driven apps to enter the market
      ● Enabling new opportunities for aggregating and mixing data
  • 8. Data licensing today (diagram: Data Owners and Data Consumers linked by bespoke, expensive contracts)
  • 9. Kasabi as a data platform (diagram: Kasabi connecting Data Owners, Data engineers, Data enthusiasts, Application Developers, API developers, and Third-party services)
  • 10. About Kasabi
      ● Publish datasets using standard APIs
      ● Access data using standard APIs
          ● Query a dataset using SPARQL
          ● Search a dataset using a simple full-text search
      ● Define, contribute, and share your own APIs
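Querying a dataset with SPARQL over HTTP follows the W3C SPARQL 1.1 Protocol, in which a query can be sent as the `query` parameter of a GET request. The sketch below only builds such a request and does not send it. The endpoint URL is a hypothetical placeholder, since the slides do not give Kasabi's actual API paths. The `Accept` header asks for results in the standard SPARQL JSON format.

```python
from urllib.parse import urlencode, urlparse, parse_qs
from urllib.request import Request

# Hypothetical endpoint URL: the talk does not give Kasabi's real API paths.
ENDPOINT = "http://example.org/dataset/my-dataset/sparql"

def build_sparql_request(endpoint: str, query: str) -> Request:
    """Build (but do not send) a SPARQL 1.1 Protocol query-via-GET request."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for results in the standard SPARQL 1.1 JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})

if __name__ == "__main__":
    req = build_sparql_request(ENDPOINT, "SELECT ?s WHERE { ?s ?p ?o } LIMIT 10")
    print(req.get_method(), req.full_url)
    # To actually run the query: urllib.request.urlopen(req)
```

Because this uses only standard HTTP, any HTTP client works. The protocol also allows POST for queries too long to fit in a URL.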
  • 13. Access data using standard APIs
  • 16. Current organisation
      ● Product development
      ● Data engineering
      ● Customer operations
      ● Platform development
  • 19. Data Platform (diagram layers: Load balancing and routing; Update services, Search services, Query services; Datasets)
      ● Need to store and update datasets
      ● Access data via various services
      ● Must scale with load and increasing data
      ● Must be tolerant to failure
      ● Extensible
          ● Should be easy to add new services over time
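One way to picture the "load balancing and routing" layer is to map each API path prefix to a pool of service instances and rotate through that pool. This is a minimal sketch that assumes hypothetical path prefixes, host names, and a simple round-robin policy. The talk does not say how Kasabi's routing was implemented; slide 45 only lists HAProxy among the standard components used.

```python
from itertools import cycle

class Router:
    """Route a request path to a service instance, round-robin within each pool."""

    def __init__(self, pools: dict[str, list[str]]):
        # Check longer prefixes first so a more specific route wins.
        self._prefixes = sorted(pools, key=len, reverse=True)
        self._pools = {prefix: cycle(hosts) for prefix, hosts in pools.items()}

    def route(self, path: str) -> str:
        for prefix in self._prefixes:
            if path.startswith(prefix):
                return next(self._pools[prefix])
        raise LookupError(f"no service registered for {path!r}")

# Hypothetical service pools: prefixes and host names are illustrative only.
router = Router({
    "/update": ["update-1:8080"],
    "/sparql": ["sparql-1:8080", "sparql-2:8080"],
    "/search": ["search-1:8983", "search-2:8983"],
})
```

Adding a new service type only means registering another prefix and pool. That matches the "easy to add new services" requirement, although a real router would also need health checks to drop failed instances.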
  • 20. To distribute... ...or not to distribute
  • 21. Distributed Platform (diagram: a routing layer above a dynamic gossip network of Update, SPARQL, and Search service instances, with a placeholder for a possible new service, supported by a Sequence Service, a Storage Service, and Monitoring Services)
  • 22. Distributed Platform – updates (diagram as on slide 21)
      ● Updates are sequenced
      ● Data stored in distributed storage
  • 23. Distributed Platform – updates (diagram as on slide 21)
      ● Updates are gossiped around the network
      ● Here a SPARQL node realises that it should apply the update
  • 24. Distributed Platform – query (diagram as on slide 21)
      ● SPARQL queries will now reflect the update that was submitted
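The update path on slides 22 to 24 can be simulated in a few lines. A sequence service stamps each update with an increasing number, nodes forward updates to their peers, and each node applies updates strictly in sequence order, buffering any that arrive early. This is a minimal in-process sketch under stated assumptions. The class names are hypothetical. The "gossip" is a naive push to every known peer rather than a randomised fan-out. There is no real network, storage, or failure handling.

```python
import itertools

class SequenceService:
    """Stamps each update with a strictly increasing sequence number."""

    def __init__(self) -> None:
        self._counter = itertools.count(1)

    def stamp(self, payload: str) -> tuple[int, str]:
        return next(self._counter), payload

class Node:
    """A hypothetical service node (SPARQL, search, ...) that applies updates in order."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.peers: list["Node"] = []
        self.applied_seq = 0                # highest sequence number applied so far
        self.data: list[str] = []           # stands in for a triple store or search index
        self._pending: dict[int, str] = {}  # received but not yet applicable

    def receive(self, update: tuple[int, str]) -> None:
        seq, payload = update
        if seq <= self.applied_seq or seq in self._pending:
            return  # already seen: do not apply or forward it again
        self._pending[seq] = payload
        # Apply every buffered update that is now contiguous with what was applied.
        while self.applied_seq + 1 in self._pending:
            self.applied_seq += 1
            self.data.append(self._pending.pop(self.applied_seq))
        # Naive push to every peer; real gossip would pick a few random peers.
        for peer in self.peers:
            peer.receive(update)

if __name__ == "__main__":
    seq = SequenceService()
    sparql, search, update = Node("sparql"), Node("search"), Node("update")
    update.peers = [sparql, search]
    sparql.peers = [search]
    search.peers = [sparql]
    first, second = seq.stamp("insert <s1> <p> <o>"), seq.stamp("insert <s2> <p> <o>")
    update.receive(second)  # arrives early: buffered on every node, not applied
    update.receive(first)   # now both apply, in order, on every node
    print(sparql.applied_seq, sparql.data)
```

Sequencing is what lets each node decide on its own when it may apply an update. A query against a node reflects an update once that node's `applied_seq` has reached the update's number.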
  • 25. Monolithic vs distributed
      ● Monolithic
          ● Easy to synchronise events and data
          ● Consistent views and queries
          ● Less inter-process communication / less network overhead
          ● Easier to optimise for high throughput
          ● Single code-base
          ● Fewer processes to monitor
      ● Distributed
          ● Service-oriented: separate concerns run in isolated processes (and can be scaled independently)
          ● Development is component-based – changes are more focused, which helps avoid scope creep
          ● Deployment can be localised to avoid downtime
          ● Failure is more likely – so you need to plan for it
          ● Easier to integrate out-of-the-box software – e.g. using standard Apache Solr
  • 26. Distributed data platform
      ● Separate services for each API
      ● Communication via Gossip messages
      ● Have to manage eventual consistency
      ● Highly scalable
      ● Easy to add new services
      ● Use standard protocols and open-source components
          ● HTTP libraries / REST / ZeroMQ / Apache Thrift
          ● RDF and SPARQL using Apache Jena
          ● Search using Apache Solr
          ● Avoid modification and forks
      ● Deploy into Amazon EC2 (also using: S3, EMR, and ELB)
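"Having to manage eventual consistency" typically means a client cannot assume that the replica it queries has already seen its latest write. One common technique, not necessarily the one Kasabi used, is session (read-your-writes) consistency. The client remembers the sequence number returned for its last update and reads only from replicas whose applied sequence number has caught up. The following is a minimal sketch with hypothetical replica names, using a plain dict to stand in for replica state.

```python
from typing import Optional

def pick_replica(applied: dict[str, int], min_seq: int) -> Optional[str]:
    """Return a replica that has applied at least update number min_seq.

    `applied` maps a (hypothetical) replica name to the highest update sequence
    number it has applied. Returns None if no replica has caught up yet, in
    which case the caller can wait and retry, or fall back to a slower path.
    """
    caught_up = sorted(name for name, seq in applied.items() if seq >= min_seq)
    # Any caught-up replica gives read-your-writes; sorting just makes the choice deterministic.
    return caught_up[0] if caught_up else None
```

A real implementation would also spread load across the caught-up replicas rather than always picking the same one.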
  • 27. Benefits of using cloud services
  • 28. Consider a start-up in 2002
      ● Have an idea...
      ● Get funding (development, op-ex, cap-ex)
      ● Acquire servers
      ● Set up your servers
          – mail, web, source code repo, build systems
          – development, staging, live
      ● Some 'cloud' services
          – …, SourceForge, shared servers, etc.
      ● Build and go to market
          ● Probably embedding open-source components
          ● Delivery based on full-stack, monolithic architectures
  • 29. Consider a start-up in 2012
      ● Have an idea...
      ● Get funding (development capital, op-ex)
          ● you will probably not get cap-ex
      ● Use cloud services... rent rather than buy
          ● SaaS – Software as a Service
              – Why would you run your own (chat/email etc.)?
              – Host your code on GitHub/Bitbucket etc.
          ● PaaS – Platform as a Service
              – Do you need to control the full stack?
              – Could you leverage platforms like Heroku, Joyent, App Engine etc.?
              – Amazon RDS
          ● IaaS – Infrastructure as a Service
              – Cloud services to provide 'bare metal'
      ● Build and go to market quickly
          ● scale elastically over time
  • 30. But what about the enterprise?
      ● Benefits of cloud services are already transforming the enterprise
          ● Private clouds
          ● Virtual appliances
          ● Cloud bursting
          ● Independent scaling
          ● Separation of concerns
          ● Service-oriented architecture (SOA)
      ● And in future...
          ● Appetite for IaaS is growing
          ● PaaS and SaaS will follow
          ● Perimeter security will be replaced by localised security boundaries
  • 31. So how do we build this stuff...?
  • 32. How it all happens
      ● Constantly iterating through...
          ● Requirements
          ● Development (test-driven)
          ● Testing/Review
          ● Deployment
          ● Operation
      ● We're an Agile, dev-ops team... so all of the above is a shared responsibility
  • 33. Being a dev-ops team...
      ● Removing barriers between development and operations
          ● Shared responsibilities rather than distrust
      ● Everyone has root access
      ● Developers are responsible for operating the systems they build
      ● Everyone is free to make changes... and responsible for managing the roll-out of those changes
      ● Ops/Deployment/Monitoring are automated
      ● Everyone should have full-stack awareness
      ● Read more...
          ● http://dev2ops.org/blog/2010/2/22/what-is-devops.html
          ● http://www.jedi.be/blog/
          ● http://en.wikipedia.org/wiki/Devops
          ● http://www.slideshare.net/jallspaw/10-deploys-per-day-dev-and-ops-cooperation-at-flickr
  • 34. Life-cycle of a change
  • 35. Requirements and Planning
      ● Identification of requirement
      ● Planning
          ● Break down big changes into smaller tasks
              – Can the change be deployed in small steps?
              – Can the change be dark-deployed?
          ● Understand the wider impact
          ● Find middle ground between generic and specific
      ● Team is self-organising
          ● People pull work from the prioritised, planned stories
  • 36. Branch-based development
      ● One branch per change, squash before merge
  • 37. Writing the code
      ● Work on a branch
          ● don't know if/when you'll merge
      ● Test-driven
          ● Unit tests first
          ● Do acceptance tests need to change?
          ● What technology? Which tool-sets?
      ● Smoke testing
          ● How do you know it works?
          ● What's different in production?
          ● What are the risks of failure?
          ● Feature flags?
      (screenshot of build output:)
          Tests run: 110, Failures: 0, Errors: 0, Skipped: 2
          [INFO] ------------------------------------------------------------------------
          [INFO] BUILD SUCCESSFUL
          [INFO] ------------------------------------------------------------------------
          [INFO] Total time: 39 seconds
          [INFO] Finished at: Sat Feb 18 15:20:36 GMT 2012
          [INFO] Final Memory: 33M/240M
          [INFO] ------------------------------------------------------------------------
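Slide 37 asks whether a change needs feature flags, and slide 35 mentions dark deployment. A common pattern is to ship new code switched off and then enable it for a deterministic share of users by hashing a stable identifier. This is a minimal sketch. The flag names, rollout percentages, and in-memory `FLAGS` table are hypothetical; a real system would load flags from its configuration management.

```python
import hashlib

# Hypothetical flag configuration: flag name -> percentage of users enabled.
FLAGS = {"new-search-ranking": 0, "faster-sparql-planner": 25, "api-v2": 100}

def bucket(flag: str, user_id: str) -> int:
    """Map (flag, user) to a stable bucket in 0..99 using a hash."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 100

def is_enabled(flag: str, user_id: str) -> bool:
    """True if the flag is on for this user; unknown flags default to off."""
    return bucket(flag, user_id) < FLAGS.get(flag, 0)
```

A flag at 0% is effectively dark-deployed, because the code is live but unused. Raising the percentage gradually widens exposure. Including the flag name in the hash prevents the same users from always being first in line for every new feature.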
  • 38. Writing the code
      ● Avoid unnecessary scope creep
          ● “I'll just fix this...”
          ● “It would be much cleaner if I re-factored this...”
          ● “It would be neat if I also added this...”
          ● …however, these observations can be written as new stories
          ● …and sometimes it's good to fix things before they cause pain
          ● …if extra changes are really necessary, can they be implemented separately?
          ● …the team should be empowered to fix technical debt
          ● …managing scope creep is a shared responsibility
      ● Be prepared to abandon a change if it's taking too long; maybe it needs more planning?
      ● Should you be pairing?
      ● Should you demo your work?
  • 39. Code review
      ● Code review is possible with tools for distributed teams (e.g. Gerrit or Review Board)
          ● If you're not following a strict pairing policy, code review is vital
          ● Useful to make others aware of changes
      ● Gerrit
          ● Build agent automatically builds your change and runs tests – Verified +1/-1
          ● Invite others to review your code; they can give it a score between -2 and +2
          ● Can only deploy code once at least one person has given a +2
          ● Work-flow is customisable
          ● Self-organising... anyone can review
      (terminal:)
          $> git commit
          $> git review
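The Gerrit gate described above can be expressed as a small predicate. A build agent votes Verified, reviewers score from -2 to +2, and a change needs at least one +2. This sketch encodes the policy as the slide describes it, plus the common convention that a -2 or a failed build blocks the change. It is not Gerrit's own code; in Gerrit itself this behaviour comes from the project's label configuration, and the `Change` class here is hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Change:
    """Hypothetical record of the votes on one change."""
    verified: list[int] = field(default_factory=list)     # build agent votes: -1, 0, +1
    code_review: list[int] = field(default_factory=list)  # reviewer votes: -2 .. +2

def can_submit(change: Change) -> bool:
    """Submittable if the build passed, someone gave +2, and nobody vetoed."""
    build_ok = 1 in change.verified and -1 not in change.verified
    approved = 2 in change.code_review
    vetoed = -2 in change.code_review
    return build_ok and approved and not vetoed
```

Two +1 votes do not add up to a +2 here. That matches "at least one person has given a +2": approval is a single reviewer's explicit decision, not a tally.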
  • 42. Merge / Deployment
      ● Merge & deployment
          ● One-click deployment
          ● Developer should press the button
          ● Code is merged into the master/release branch
          ● Build server automatically checks out the code, then builds, tags, and uploads the release to an artefact repository
          ● Package is automatically deployed on all servers
              – Extra orchestration for external-facing services to avoid “thundering-herd” problems
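The "extra orchestration" for external-facing services can be as simple as deploying in small batches with a randomised pause between them. That way, not every instance restarts (and reconnects, or refills its caches) at the same moment. This minimal sketch of a staggered, jittered rollout only computes the schedule. The batch size, delays, and host names are hypothetical, and the random jitter is seeded so the plan is reproducible.

```python
import random

def rollout_plan(hosts: list[str], batch_size: int, base_delay_s: float,
                 jitter_s: float, seed: int = 0) -> list[tuple[float, list[str]]]:
    """Split hosts into batches and give each batch a start offset in seconds.

    Consecutive batches are separated by base_delay_s plus a random extra delay
    of up to jitter_s, so instances do not all restart at the same moment.
    """
    rng = random.Random(seed)  # seeded so the same inputs give the same plan
    plan: list[tuple[float, list[str]]] = []
    start = 0.0
    for i in range(0, len(hosts), batch_size):
        plan.append((start, hosts[i:i + batch_size]))
        start += base_delay_s + rng.uniform(0, jitter_s)
    return plan
```

A deployment tool would then wait until each start offset, deploy that batch, and check that the instances are healthy before moving on to the next one.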
  • 43. Managing infrastructure
      ● Puppet or Chef
      ● Build packages (e.g. DEB or RPM)
      ● Centralise configuration management
      ● Utilising cloud compute infrastructure
          ● Amazon EC2
          ● Amazon S3
          ● Elastic Load Balancing
          ● Elastic MapReduce
      ● Application monitoring
          ● Metrics
          ● Log analysis
          ● Internal monitoring
          ● External checks
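For the "Metrics" bullet, one typical application metric is request latency summarised as percentiles rather than only as an average, since a few slow queries can hide behind a healthy mean. This is a minimal sketch using the nearest-rank percentile definition. A production system would more usually ship raw measurements to a metrics service than compute summaries in-process, and the function names here are hypothetical.

```python
import math

def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of samples <= it."""
    if not samples or not 0 < p <= 100:
        raise ValueError("need at least one sample and 0 < p <= 100")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]

def summarise(latencies_ms: list[float]) -> dict[str, float]:
    """Count, mean, median, and 95th percentile of request latencies."""
    return {
        "count": len(latencies_ms),
        "mean": sum(latencies_ms) / len(latencies_ms),
        "p50": percentile(latencies_ms, 50),
        "p95": percentile(latencies_ms, 95),
    }
```

With nine fast requests and one very slow one, the median stays low while the 95th percentile exposes the outlier. That outlier is exactly what an alert on the mean alone can miss.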
  • 45. Technical lessons learnt
      ● Use distributed SOA-based services to reduce tight coupling
      ● Monitor everything...
      ● Leverage cloud offerings
          ● wrap them with well-defined interfaces to avoid lock-in
      ● Design systems to scale
      ● Use open and unmodified components where possible
          ● Standard components fronting external APIs
          ● E.g. Jena, Solr, HAProxy, Apache
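"Wrap them with well-defined interfaces to avoid lock-in" can look like the following sketch. Application code depends on a small storage interface, and a cloud-specific class (for example one backed by Amazon S3) is just one implementation of it. The interface name, its two methods, and the helper function are hypothetical. An S3-backed class is omitted here; the in-memory version doubles as a fake for tests.

```python
from abc import ABC, abstractmethod

class BlobStore(ABC):
    """Hypothetical storage interface that the rest of the platform codes against."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...

class InMemoryBlobStore(BlobStore):
    """Local implementation for tests and development."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def get(self, key: str) -> bytes:
        try:
            return self._blobs[key]
        except KeyError:
            raise KeyError(f"no blob stored under {key!r}") from None

def archive_dataset(store: BlobStore, dataset: str, ntriples: str) -> str:
    """Hypothetical application code: knows about BlobStore, not about any vendor."""
    key = f"datasets/{dataset}.nt"
    store.put(key, ntriples.encode("utf-8"))
    return key
```

Switching providers then means writing one new `BlobStore` subclass rather than touching every caller.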
  • 46. Practices that have helped us
      ● Dev-ops culture
      ● Pragmatic approach to agile development
          ● Task allocation should be 'pull', rather than 'push'
          ● Teams should be self-organising
          ● Pairing when working on new problems
          ● Test-driven development (TDD)
          ● Continuous integration
          ● Peer review of code
          ● Continuous deployment
  • 48. Conclusion
      ● Isolate your design into components
      ● Empower your team to release small changes frequently
      ● Leverage hosted/cloud offerings
  • 50. Credits
      ● Thanks for the invite to speak
      ● Thanks to Kasabi / Talis Systems Ltd
      ● Sign up at http://www.kasabi.com
      Graphics from http://www.iconarchive.com/, http://www.oxygen-icons.org and http://www.icons-land.com