WHAT’S IN STORE

A DISCLAIMER

MICHAEL PEACOCK
SMITH ELECTRIC VEHICLES
NO. NOT MILK FLOATS (ANYMORE)
ALL-ELECTRIC COMMERCIAL VEHICLES.




Photo courtesy of kenjonbro: http://www.flickr.com/photos/kenjonbro/4037649210/in/set-72157623026469013
ALL-ELECTRIC COMMERCIAL VEHICLES
ELECTRIC VEHICLES

DATA CHALLENGES FROM ELECTRIC VEHICLES

ENTER TELEMETRY

HUGE DATA VOLUMES
CURRENT STATS

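As a rough cross-check of the volume quoted in the speaker notes (around one and a half billion inserts per day), the sketch below converts a daily total into an average per-second rate. The function name is hypothetical and the daily figure is the approximate one from the notes; real traffic fluctuates around this average.

```php
<?php
// Hypothetical helper: average inserts per second implied by a daily total.
function averageInsertsPerSecond($insertsPerDay)
{
    $secondsPerDay = 24 * 60 * 60; // 86,400 seconds
    return $insertsPerDay / $secondsPerDay;
}

// Approximate daily volume quoted in the speaker notes.
$perDay = 1500000000;
printf("%.0f inserts/second on average\n", averageInsertsPerSecond($perDay));
// prints "17361 inserts/second on average"
```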
INITIAL MANDATE
INITIAL SOLUTION

NEW MANDATE
DATA STARTS TO INCREASE





http://www.flickr.com/photos/robotapocalypse/245508884/
http://www.flickr.com/photos/holyoutlaw/5920882576
SCALING
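The speaker notes describe the initial application-level sharding as one database per vehicle, which only works if the application knows which machine holds each database. A minimal sketch of such a lookup might look like the following. The class name, the host names and the vehicle_<id> naming scheme are all hypothetical and are not taken from the project.

```php
<?php
// Hypothetical lookup: which server holds a given vehicle's database.
class VehicleShardMap
{
    private $hostsByVehicle;

    public function __construct(array $hostsByVehicle)
    {
        $this->hostsByVehicle = $hostsByVehicle;
    }

    // Returns the host and database name for one vehicle (naming is assumed).
    public function locate($vehicleId)
    {
        if (!isset($this->hostsByVehicle[$vehicleId])) {
            throw new InvalidArgumentException("Unknown vehicle: $vehicleId");
        }
        return array(
            'host'     => $this->hostsByVehicle[$vehicleId],
            'database' => 'vehicle_' . (int) $vehicleId,
        );
    }
}

// Adding a vehicle, or a whole new server, only means adding a map entry.
$map = new VehicleShardMap(array(
    101 => 'db1.example.internal',
    102 => 'db2.example.internal',
));
$target = $map->locate(102);
echo $target['host'], ' / ', $target['database'], "\n";
```

This scales with the number of vehicles, but, as the notes point out, it does nothing for the volume of data held per vehicle.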
PROBLEM #0: INSERTS

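The speaker notes for this problem describe replacing row-by-row inserts with large batches loaded through LOAD DATA INFILE. A sketch of that idea, under stated assumptions, is shown below: it writes a batch to a tab-separated file (MySQL's default LOAD DATA layout) and builds the statement that would load it. The table name, column names and sample rows are hypothetical, and the statement is only printed here rather than sent to a server.

```php
<?php
// Write one batch of rows to a tab-separated file, matching MySQL's default
// LOAD DATA format (fields split by tabs, lines ended by "\n").
// Assumes the values contain no tabs, newlines or backslashes.
function writeBatchFile(array $rows, $path)
{
    $handle = fopen($path, 'w');
    foreach ($rows as $row) {
        fwrite($handle, implode("\t", $row) . "\n");
    }
    fclose($handle);
    return count($rows);
}

// Build the statement that loads the whole file in one go.
// Table and column names are hypothetical.
function buildLoadStatement($path, $table, array $columns)
{
    return sprintf(
        "LOAD DATA INFILE '%s' INTO TABLE %s (%s)",
        addslashes($path),
        $table,
        implode(', ', $columns)
    );
}

// A batch covering only a second or two of readings (sample values).
$rows = array(
    array(101, 7, '98.5', '2012-02-29 10:15:01'),
    array(101, 8, '312.0', '2012-02-29 10:15:01'),
    array(102, 7, '64.0', '2012-02-29 10:15:02'),
);
$path = tempnam(sys_get_temp_dir(), 'batch');
writeBatchFile($rows, $path);
echo buildLoadStatement($path, 'data', array('vehicle_id', 'descriptor_id', 'value', 'recorded_at')), "\n";
```

Without the LOCAL keyword, MySQL reads the file on the database server host, so the batch file has to be written somewhere the server can see it.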
INITIAL ARCHITECTURE
PROBLEM #1: AVAILABILITY
PROBLEM #2: CAPACITY




www.flickr.com/photos/eveofdiscovery/3149008295
CAPACITY
OPTION 1: CLOUD INFRASTRUCTURE

PROBLEM WITH CLOUD INFRASTRUCTURE

SOLUTION #1: MQ




 www.flickr.com/photos/gadl/89650415/inphotostream
CLOUD BASED MESSAGE QUEUE
“THE CLOUD” ISN’T PERFECT




www.flickr.com/photos/brapps/403257780
PROBLEM #3: STORAGE SYSTEM STARTS TO CRACK




http://www.flickr.com/photos/mknott/2855987266
SOLUTION: GENIUS DBA & VENDOR CONSULTANCY
Sam Lambert – DBA Extraordinaire
LIVE, REAL-TIME INFORMATION
LIVE DATA: PROBLEMS

LIVE, REAL TIME INFORMATION: PROBLEM

LIVE DATA: GLOBAL MAP
REAL TIME INFORMATION: CONCURRENT

RACE CONDITIONS



LOTS OF DATA: RACE CONDITIONS

RACE CONDITIONS: PHP & SESSIONS
LIVE: STABLE BUT SLOW
CACHE THE DATA

MEMCACHE
// instantiate
$mc = new Memcache();
// set the memcache server and port
$mc->connect( MEMCACHE_SERVER, MEMCACHE_PORT );
// get data based on a key
$value = $mc->get('key');
MEMCACHE FAILOVER
CACHING WITHIN LEGACY CODE
LAZY LOADING REGISTRY
public function getObject( $key )
{
    if( array_key_exists( $key, $this->objects ) )
    {
        // already instantiated: return the stored object
        return $this->objects[ $key ];
    }
    elseif( array_key_exists( $key, $this->objectSetup ) )
    {
        // known but not yet loaded: include the class files, instantiate, store and return
        $setup = $this->objectSetup[ $key ];
        $path = FRAMEWORK_PATH . 'registry/aspects/' . $setup['folder'] . '/';
        if( ! is_null( $setup['abstract'] ) )
        {
            require_once( $path . $setup['abstract'] . '.abstract.php' );
        }
        require_once( $path . $setup['file'] . '.class.php' );
        $class = $setup['class'];
        $o = new $class( $this );
        $this->storeObject( $o, $key );
        return $o;
    }
    elseif( $key == 'memcache' )
    {
        // requesting memcache for the first time, instantiate, connect, store and return
        $mc = new Memcache();
        $mc->connect( MEMCACHE_SERVER, MEMCACHE_PORT );
        $this->storeObject( $mc, 'memcache' );
        return $mc;
    }
}
LLR: NOT FIT FOR PURPOSE
REAL TIME INFORMATION: # OF REQUESTS

RACE CONDITIONS: USE A TEMPLATE ENGINE

RACE CONDITIONS: USE A SINGLE ENTRY POINT
A LITTLE BREATHING SPACE


THINGS WERE STILL SLOW

GENERATING PERFORMANCE REPORTS


NOSQL




REGULAR PROCESSING

QUERY SPEED BECOMES MORE IMPORTANT

SHARDING

WHICH BUCKET?
public function getTableNameFromDate( $date ) {
    // we can't query for future data, so if the date is in the future
    // reset it to today
    $date = ( $date > date( 'Y-m-d' ) ) ? date( 'Y-m-d' ) : $date;
    // get the time in seconds since the epoch
    $stt = strtotime( $date );
    // is the query date since we implemented sharding?
    if( $date >= $this->switchOver ) {
        // calculate the year this week is from: 'o' is the ISO-8601 year that
        // pairs with the 'W' week number, so early January dates that fall in
        // week 52 or 53 map to the previous year
        $year = date( 'o', $stt );
        // add the year and the week number to the table, and return
        return 'data_' . $year . '_' . date( 'W', $stt );
    } else {
        // return the legacy table
        return 'data';
    }
}
QUERY MAGIC

private function parseQuery( $sql, $date=null )
{
    $date = ( is_null( $date ) ) ? date('Y-m-d') : $date;
    return sprintf( $sql, $this->shardHelper->getTableNameFromDate( $date ) );
}
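To make the placeholder mechanism concrete, the sketch below shows how a query template containing %s might be resolved to a weekly shard table. It is a compact, standalone stand-in for the two methods above rather than the project's actual classes: the ShardHelper class name, the switch-over date and the sample query are assumptions, and the year is taken from PHP's ISO-8601 year format ('o').

```php
<?php
date_default_timezone_set('UTC');

// Compact stand-in for the shard helper; the switch-over date is made up.
class ShardHelper
{
    private $switchOver = '2011-06-01';

    public function getTableNameFromDate($date)
    {
        // future dates fall back to today
        $date = ($date > date('Y-m-d')) ? date('Y-m-d') : $date;
        if ($date < $this->switchOver) {
            return 'data'; // legacy, pre-sharding table
        }
        $stt = strtotime($date);
        // 'o' is the ISO-8601 year that matches the 'W' week number
        return 'data_' . date('o', $stt) . '_' . date('W', $stt);
    }
}

// The SQL template carries a %s where the table name belongs.
function parseQuery(ShardHelper $helper, $sql, $date = null)
{
    $date = is_null($date) ? date('Y-m-d') : $date;
    return sprintf($sql, $helper->getTableNameFromDate($date));
}

$sql = "SELECT value FROM %s WHERE vehicle_id = 101";
echo parseQuery(new ShardHelper(), $sql, '2012-02-29'), "\n";
// SELECT value FROM data_2012_09 WHERE vehicle_id = 101
echo parseQuery(new ShardHelper(), $sql, '2010-01-15'), "\n";
// SELECT value FROM data WHERE vehicle_id = 101
```

Because the template goes through sprintf, any literal percent sign in the SQL (for example in a LIKE pattern) has to be written as %%.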
BACKUPS & ARCHIVES

SHARDING: AN EXCUSE




DATA TYPES

INDEX OPTIMISATION

QUERIES: OPTIMISATION TIPS

CONCURRENT DATABASE CONNECTIONS




REPORTS: DB CONNECTION MANAGEMENT

CONCURRENT DATA PROCESSING


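// $concurrent: number of instances running in parallel
// $instance: this instance's number, from 0 to $concurrent - 1
$vehiclesToProcess = array();
$counter = 1;
foreach( $vehicles as $vehicle )
{
    // each vehicle is picked up by exactly one instance
    if( ( $counter % $concurrent ) == $instance )
    {
        $vehiclesToProcess[] = $vehicle;
    }
    $counter++;
}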
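The loop above splits the vehicle list between worker instances by taking a running counter modulo the number of instances. A self-contained sketch of the same idea, wrapped in a hypothetical function with made-up vehicle identifiers, shows how three instances would divide seven vehicles:

```php
<?php
// Pick the vehicles that one worker instance should process.
// $instance must be in the range 0 .. $concurrent - 1.
function vehiclesForInstance(array $vehicles, $concurrent, $instance)
{
    $vehiclesToProcess = array();
    $counter = 1;
    foreach ($vehicles as $vehicle) {
        if (($counter % $concurrent) == $instance) {
            $vehiclesToProcess[] = $vehicle;
        }
        $counter++;
    }
    return $vehiclesToProcess;
}

$vehicles = array('V1', 'V2', 'V3', 'V4', 'V5', 'V6', 'V7');
for ($instance = 0; $instance < 3; $instance++) {
    echo $instance, ': ', implode(', ', vehiclesForInstance($vehicles, 3, $instance)), "\n";
}
// 0: V3, V6
// 1: V1, V4, V7
// 2: V2, V5
```

Every vehicle lands in exactly one instance's list, so the instances can run side by side without coordinating, provided they all see the vehicle list in the same order.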
SYSTEM ARCHITECTURE
APPLICATION QUALITY

DEPLOYMENT

ADDITIONAL POINTS
DON’T LOG “EVERYTHING”: EXTRAPOLATE AND ASSUME
EXTRAPOLATE AND ASSUME: “INTERLATION”

INTERLACE
       * Add an array to the interlation
       public function addArray( $name, $array )

       * Get the time that we first receive data in one of our arrays
       public function getFirst( $field )

       * Get the time that we last received data in any of our arrays
       public function getLast( $field )

       * Generate the interlaced array
       public function generate( $keyField, $valueField )

       * Break the interlaced array down into separate days
       public function dayBreak( $interlationArray )

       * Generate an interlaced array and fill for all timestamps within the range of _first_ to _last_
       public function generateAndFill( $keyField, $valueField )

       * Populate the new combined array with key fields using the common field
       public function populateKeysFromField( $field, $valueField=null )


http://www.michaelpeacock.co.uk/interlation-library
ABSTRACTION
TECHNICAL TIPS

WHAT DO WE HAVE NOW?
CONCLUSIONS
Q&A

Dealing with Continuous Data Processing, ConFoo 2012

Editor's Notes

  1. For the past 10 months I’ve been involved in a very challenging, big-data-focused application. This presentation is a case study of that application, covering some of the challenges we met, and the solutions we found, in dealing with huge volumes of data: processing them and displaying them within the web application. The challenges ranged from staying available to process the data, to storing data at volume, querying the database quickly and keeping the web application responsive. Our team inherited a large legacy application, which left us with stability problems, hacks and architecture decisions that we didn’t necessarily agree with.
  2. For those of you who don’t know me: I’m Michael Peacock, the web systems developer for Smith Electric Vehicles on their telemetry project. I’m an experienced lead developer; I spent 5 years running a small web design and development firm, writing bespoke CMS, e-commerce and CRM solutions. I’ve written a few PHP-related books, and I volunteer with the PHP North-East user group in the UK. You can find me on Twitter, or contact me if you have any questions later.
  3. Smith is the world’s largest manufacturer of all-electric commercial vehicles. It was founded over 90 years ago to build electric delivery vehicles, both battery-based and cable-based. In 2009 the company opened its doors in the US, and at the start of last year the US operation bought out the European company, which brings us to where we are today.
  4. Normally when I tell people I work with electric vehicles, they think of hybrids like the Prius, or of passenger electric vehicles such as the Nissan Leaf. When I tell them it’s COMMERCIAL electric vehicles, they think of milk floats or airport passenger buggies. What we actually make are large-scale, fully electric delivery and transport vehicles.
  5. These vehicles range from depot-based delivery vehicles and home delivery vehicles to utility and military applications, right through to the American school bus.
  6. These are 16,500 to 26,000 pound delivery vehicles, capable of carrying a payload of up to 16,000 pounds, with a top speed of 80 km/h.
  7. Electric vehicles bring us a huge data challenge. As I’m sure you can appreciate, they are a new, continually evolving technology: people are looking for evidence of viability; governments want to do research; and for passenger vehicles, charging infrastructure needs to be planned and developed. Customers who buy these vehicles want to monitor them, whether for performance metrics or to check that their drivers have been properly trained to drive an electric vehicle, as they are very different to drive.
  8. The solution to these problems was to develop a vehicle telematics service to collect data on the vehicles. Our vehicles, like many other types of vehicle, have a number of internal networks, called CAN buses, which broadcast data continuously around the vehicle in such a way that a central host isn’t required. The problem is that they broadcast data internally hundreds of times per second. We took and developed a CAN bus monitor to sample this once per second. This information included the drive train: how hard and fast are the pedals being pressed, how fast is the motor spinning, how hot is the motor? What is the overall state of the battery: current, voltage, temperature and state of charge? What about the individual modules in the battery, how are they performing? We also needed to know where the vehicle was, what the outside temperature was, and so on. We also need to report any error codes the vehicle raises. Our telemetry unit samples all of this information once per second, packages it up and broadcasts it over the GPRS network. Another problem was that the project was initially designed by a group of contractors, each handling a different aspect of the application, developing what was initially a proof-of-concept system. As the project grew, requirements changed, data volumes increased exponentially, data became more important and stability became a big problem. Right about this time the company hired a full-time internal telemetry team. The team consists of two hands-on technical staff: myself as the web application developer, and my colleague as the systems administrator and DBA.
  9. We monitor 2,500 data points from the vehicle’s CAN bus; because of the per-second sampling, we have these per second, per vehicle...
  10. Currently, we have around 500 vehicles with telemetry installed giving us data. Telemetry is now going to be a standard component of the vehicle, meaning we are going to have to deal with the data of every vehicle that rolls off the production line. Our MySQL solution had to deal with around one and a half billion MySQL inserts every single day; we have a constant minimum of 4,000 inserts every second. It has recently been publicly announced that new production facilities are being built and new partners are going to be making and selling our vehicles, which means we will have even more data to worry about.
  11. When the project was initially conceived, it had a really simple mandate: we only needed a small number of data points collected for each vehicle, which we would export and provide to grant authorities. A challenge in itself, but not too big a worry.
  12. The teams of contractors tasked with developing the initial proof-of-concept solution created a basic web application with a simple database. A single table held all of the data. A single table held all the data descriptors. These two were joined to match the data (e.g. 100%) with the descriptor (e.g. Battery State of Charge). A key field in the table linked the data back to the vehicle. The vehicles communicated directly with our servers.
  13. Once the proof of concept was delivered and the benefits of the system were seen, a new mandate was given. Why should we just give a small sample away? Why don’t we keep all of the data to monitor the vehicles, monitor the technology, deal with service and warranty issues (after all, we don’t have a dealer on every corner that our customers can turn to), and look at vehicle performance data?
  14. When this mandate came into play, there were still only a small number of vehicles with telemetry installed. The initial team working on the project took a sledgehammer to the database. Where there was initially only one database, they created one for every vehicle. This made it easier to add new hardware, so long as the solution knew which machine a specific database was on, and made the data quicker to query.
  15. Initial application-level sharding meant we could scale out with new databases and new database servers as the number of vehicles went up. It doesn’t help us if the amount of data we collect, or the data retention period, increases; it also still leaves us with slow-running queries.
  16. Problem 0 was the number of inserts into the database. Inserting rows one at a time with individual INSERT statements is very slow. We need data to be taken by the server and inserted into the database as quickly as possible so that live, real-time information can be displayed. The solution, an easy win, was to insert the data in large batches. These batches are large in terms of the number of inserts, but small in terms of the time they relate to: only a second or two. LOAD DATA INFILE is something MySQL can do quickly, so we were able to push data into the solution quickly. However, that’s not enough on its own; we don’t just have inserts to deal with. We need to work with the data too, and deal with a host of other problems.
  17. The initial application architecture involved vehicles communicating directly with our solution.
  18. The problem with this is availability. What happens if our solution goes offline? We need all of that data because we deliver it to grant authorities. We need it because we don’t want to miss a vehicle fault code. We need it because we want to accurately calculate the vehicle’s energy usage and driving range.
  19. The other problem is capacity; we need to be able to cope with data volumes which fluctuate. We only get data when the vehicle is on or charging: when the vehicle is on we get data every second, and when it charges we get data every minute. We can’t easily plan for new vehicles coming off the production line, as some vehicles take a while before they go into active service. Customers want to brand the vehicles and train their drivers before they put them into service.
  20. A related problem is the capacity of our servers. With a large amount of data coming in, and a large number of collection devices giving us this data, we could find ourselves vulnerable to a Distributed Denial of Service attack that we ourselves authorised. This would lead to us being unable to process some or all data, some data being lost, and potentially, downtime. As more and more vehicles are used more and more regularly, our servers will run the risk of catching fire!
  21. One option when faced with problems like this is, of course, standard cloud-based infrastructure. With the likes of Amazon’s EC2, more machines could be powered on when demand was high, and different availability zones could help in the event of machine downtime or network problems.
  22. However, cloud-based solutions have problems of their own. The first is that virtualised hardware isn’t well suited to large volumes of MySQL inserts and large amounts of I/O. That has changed somewhat recently with cloud-based relational databases, but wasn’t the case at the time. We also had an existing hardware investment as a result of our proof-of-concept application. Security and legal issues prevented us from storing unencrypted data permanently off-site.
  23. The solution was a message queue.
  24. We integrated a cloud-based message queue into the service, based on the open AMQP standard. Data transfer was encrypted over SSL, and the data was stored in the queue as an encrypted message. As the queue was a cloud-based service, it could grow or shrink as our demand changed. This not only allowed us to work around capacity problems but also added higher availability, with the likes of availability zones offered by many cloud-based providers.
  25. But of course, the cloud isn’t perfect. There have been a number of famous cloud-service outages at a number of cloud hosting providers, which have affected multiple availability zones at the same time. So we need to take an additional precaution: a small buffer on the vehicle itself.
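The notes do not describe how the on-vehicle buffer works, so the following is only one plausible shape for it: a bounded, oldest-first buffer that holds messages while the queue is unreachable. The capacity, the discard-oldest policy and the `send` callback are all assumptions.

```python
from collections import deque


class VehicleBuffer:
    """Bounded store for messages that could not yet be delivered (hypothetical)."""

    def __init__(self, max_messages=3600):
        # A deque with maxlen silently discards the oldest entry when full.
        self._pending = deque(maxlen=max_messages)

    def record(self, message):
        self._pending.append(message)

    def flush(self, send):
        """Send buffered messages oldest-first; stop at the first failure.

        `send` is any callable returning True on success. Returns the number sent.
        """
        sent = 0
        while self._pending:
            if not send(self._pending[0]):
                break
            self._pending.popleft()
            sent += 1
        return sent

    def __len__(self):
        return len(self._pending)
```

A message is only removed once `send` reports success, so a failed delivery leaves it in place for the next attempt.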
  26. If our service teams get a call about a vehicle that’s off the road, they need to be able to look and see how the vehicle is operating and performing, in real time. We need to provide our users with a large range of metrics in real time, which continually update to reflect the current state of the vehicle.
  27. Showing data in real time causes a number of headaches: processing the huge number of inserts; data and legacy application architecture; race conditions; accessing the data quickly; a global view.
  28. Imagine viewing a customer’s fleet of 30 vehicles on a map: 60 queries refreshing every 30 seconds. The second issue was made even more problematic thanks to a management request: a global map.
  29. The initial team made use of a Flash-based charts library. This was both good and bad. The good: requests for data were asynchronous; the main page loaded quickly. The bad: each chart was a separate query; each chart was a separate request; session authentication introduced race conditions.
  30. Sessions in PHP close at the end of the execution cycle. Combined with unpredictable query times and a large number of concurrent requests per screen, this leads to session locking: a user’s session is completely locked out, as PHP hasn’t closed the session.
  31. session_write_close(): added after each write to the $_SESSION array. Closes the current session. (Requires a call to session_start() immediately before any further reads or writes.)
  32. We now had a live screen which was stable (it didn’t lock) but was still slow.
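The effect of closing the session early can be simulated outside PHP. The Python sketch below is not PHP's implementation; it models a session as a lock plus a dictionary, with acquiring the lock standing in for session_start() and releasing it standing in for session_write_close(). All names are hypothetical.

```python
import threading


class Session:
    """Stand-in for a PHP session: an exclusive lock plus stored values."""

    def __init__(self):
        self.lock = threading.Lock()
        self.data = {}


def handle_request(session, slow_query, release_early):
    """Simulate one chart request and return the result of slow_query()."""
    session.lock.acquire()                # stands in for session_start()
    session.data["last_seen"] = "chart"   # stands in for a write to $_SESSION
    if release_early:
        session.lock.release()            # stands in for session_write_close()
    try:
        return slow_query()               # the slow, unpredictable part
    finally:
        if not release_early:
            session.lock.release()        # implicit close at the end of the script
```

When `release_early` is true the lock is already free while the slow query runs, so other requests for the same session do not have to queue behind it.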
  33. We cached the most up-to-date data points for each vehicle in an in-memory key-value store: Memcache! This allows us to quickly get access to live data (which continually changes) without hitting the database.
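As a minimal sketch of that caching pattern, the class below keeps the latest value per vehicle and descriptor and only falls back to the database on a miss. A plain dictionary stands in for Memcache, and the key format and `load_from_db` callback are assumptions.

```python
class LiveDataCache:
    """Latest-value cache; a dict stands in for the Memcache server."""

    def __init__(self, store=None):
        self._store = {} if store is None else store

    @staticmethod
    def _key(vehicle_id, descriptor):
        # Hypothetical key layout: one entry per vehicle per descriptor.
        return f"live:{vehicle_id}:{descriptor}"

    def update(self, vehicle_id, descriptor, value):
        """Called as new telemetry arrives, so reads never need the database."""
        self._store[self._key(vehicle_id, descriptor)] = value

    def latest(self, vehicle_id, descriptor, load_from_db):
        """Return the cached value, querying the database only on a miss."""
        key = self._key(vehicle_id, descriptor)
        if key in self._store:
            return self._store[key]
        value = load_from_db(vehicle_id, descriptor)
        self._store[key] = value
        return value
```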
  34. Although the Lazy Loading Registry works for us, the inclusion of the memcache connection is stretching its flexibility. A better approach: a Dependency Injection Container. We could instead prepare configuration data and pass it within a DIC to the relevant features.
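A dependency injection container can be very small. The sketch below is an illustration of the idea rather than the project's code: services are registered as factories, built on first use from shared configuration, and handed to the features that need them. `Container`, `LiveScreen` and the configuration keys are all hypothetical.

```python
class Container:
    """Minimal dependency injection container."""

    def __init__(self, config):
        self.config = dict(config)
        self._factories = {}
        self._instances = {}

    def register(self, name, factory):
        """Register a factory that receives the container and returns a service."""
        self._factories[name] = factory

    def get(self, name):
        """Build the service on first request, then reuse the same instance."""
        if name not in self._instances:
            self._instances[name] = self._factories[name](self)
        return self._instances[name]


class LiveScreen:
    """Example feature that receives its cache instead of looking it up itself."""

    def __init__(self, cache):
        self.cache = cache
```

A feature such as `LiveScreen` no longer needs to know how the cache connection is created, which is the flexibility the registry was running out of.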
  35. Currently, each piece of “live data” is loaded into a Flash graph or widget, which updates every 30 seconds using an AJAX request. The move from MySQL to Memcache reduces database load, but the large number of requests still adds strain to the web server. We are making more use of text- and image-based representations, all of which can be updated from a single AJAX request.
  36. V1 of the system mixed PHP and HTML. You can’t re-initialise your session once output has been sent. All new code uses a template engine, so session interaction has no bearing on output: by the time the template is processed and output, all database and session work has long been completed.
  37. Race conditions are further exacerbated by the PHP timeout values. Certain exports, actions and processes take longer than 30 seconds, so the default execution time limit had been set longer. Initially the project lacked a single entry point, and execution flow was muddled. A single entry point makes it easier to enforce a lower timeout, which is overridden by intensive controllers or models.
  38. We now had an application which was: stable (sessions were not locking); quick in parts (fleet overview map, vehicle live screen); gaining the confidence of users. Unfortunately... the speed was only an appearance.
  39. Generating performance data; backing up and archiving data; exporting data for grant authorities (our initial mandate!); viewing historical data; viewing analytical data.
  40. In order to look at how a vehicle performed on a given day, we needed to analyse a lot of data points. We needed to take several types of data for a single day and perform calculations on that data to give us detail on efficiency, distance, speeds and driver style. Although this took a little while to load, because we had to deal with lots of data, users were initially willing to wait; they understood that it involved lots of processing. However! Soon they wanted to look at more than a day at a time, and more than a vehicle at a time, and they were asking questions such as: How far did this customer’s vehicles travel last week? How do the efficiencies of vehicles in NY compare to those in OH? How far have all of our vehicles ever travelled? These were questions which we couldn’t answer.
  41. We introduced regular, automated data processing. Performance calculations were done overnight, for every vehicle, and saved as a summary record, one per vehicle per day. These records are really, really, really easy and quick to pull out, aggregate, compare and analyse.
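The overnight roll-up can be sketched as a reduction from raw readings to one summary per vehicle per day. The fields used here (distance and energy) and the tuple layout of a reading are assumptions chosen for illustration; the real calculations covered more metrics.

```python
from collections import defaultdict


def summarise(readings):
    """Collapse raw readings into one summary record per (vehicle, day).

    Each reading is a hypothetical tuple: (vehicle_id, day, distance_km, energy_kwh).
    """
    totals = defaultdict(lambda: {"distance_km": 0.0, "energy_kwh": 0.0})
    for vehicle_id, day, distance_km, energy_kwh in readings:
        record = totals[(vehicle_id, day)]
        record["distance_km"] += distance_km
        record["energy_kwh"] += energy_kwh
    return {key: dict(value) for key, value in totals.items()}


def fleet_distance(summaries, vehicle_ids, days):
    """Answer a fleet-level question from the summaries alone."""
    return sum(
        record["distance_km"]
        for (vehicle_id, day), record in summaries.items()
        if vehicle_id in vehicle_ids and day in days
    )
```

A question like "how far did this customer's vehicles travel last week" then touches a handful of summary rows instead of every per-second data point.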
  42. Pulling data out was still very slow, especially when analysing data over the course of a day. We decided to shard the database again, this time based on the date: week number. Data stored before we implemented sharding wasn’t stored in a sharded table.
  43. Sharding makes backing up and archiving easier. Simply export a sharded-week table and save it; there are no long queries which have to find data based on date. If it’s an archive (not just a backup), DROP the table afterwards; there is no need to run a slow delete query based on a wide-ranging WHERE clause.
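Week-based sharding comes down to deriving a table name from a date. The naming scheme below and the use of ISO week numbers are assumptions; the notes only say that the shard key was the week number.

```python
from datetime import date, timedelta


def shard_table(day, base="datavalues"):
    """Return the hypothetical weekly table that holds data for `day`."""
    iso_year, iso_week, _ = day.isocalendar()
    return f"{base}_{iso_year}_w{iso_week:02d}"


def tables_for_range(start, end, base="datavalues"):
    """List, in order, the weekly tables a query over [start, end] must touch."""
    tables = []
    day = start
    while day <= end:
        name = shard_table(day, base)
        if name not in tables:
            tables.append(name)
        day += timedelta(days=1)
    return tables
```

Using the ISO year alongside the ISO week keeps days around New Year in the same table as the rest of their week.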
  44. With SQL-based database systems, each data type available for a field uses a set amount of storage space. A good example is integers. MySQL offers a range of different integer types, each able to store a different range of values; the greater the range, the more storage space the field uses, regardless of whether the value actually stored would fit in the range of the next type down. If you know the data in a particular field is always within a specific range, use the data type with the smallest size which supports the range you require. When you need to store data at scale, an over-eager data type can cost you dearly. Similarly, make sure the data type is optimised for the work you are doing on it. When it comes to ints, floats, doubles and decimals, some are better suited than others to arithmetic work because of the part of the CPU they use.
  45. A few other issues and caveats we faced, which don’t really sit nicely in the timeline I’ve just given.
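To make the integer advice concrete, the helper below picks the smallest MySQL integer type for a known value range, using MySQL's documented storage sizes of 1, 2, 3, 4 and 8 bytes. The function name and return shape are illustrative only.

```python
# MySQL integer types and their storage size in bytes.
MYSQL_INT_TYPES = [
    ("TINYINT", 1),
    ("SMALLINT", 2),
    ("MEDIUMINT", 3),
    ("INT", 4),
    ("BIGINT", 8),
]


def smallest_int_type(lowest, highest):
    """Return (type name, bytes per value) for the smallest type covering the range."""
    unsigned = lowest >= 0
    for name, size in MYSQL_INT_TYPES:
        bits = size * 8
        if unsigned:
            min_value, max_value = 0, 2 ** bits - 1
        else:
            min_value, max_value = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
        if min_value <= lowest and highest <= max_value:
            return (f"{name} UNSIGNED" if unsigned else name), size
    raise ValueError("range does not fit in any MySQL integer type")
```

A percentage such as state of charge fits in one byte, so storing it as a four-byte INT would spend three extra bytes on every row.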
  46. Our telemetry unit broadcasts each data point once per second. Data doesn’t change every second, e.g. battery state of charge may take several minutes to lose a percentage point, and fault flags only change to 1 when there is a fault. So we make an assumption: we compare the data to the last known value, and if it’s the same we don’t insert it; instead we assume it was the same. Unfortunately, this requires us to put additional checks and balances in place.
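The store-on-change assumption can be sketched in two parts: a filter that drops values equal to the last known one, and a lookup that reads a value back by carrying the last stored one forward. Class and function names are hypothetical, and the extra checks and balances mentioned above are not modelled.

```python
from bisect import bisect_right


class ChangeFilter:
    """Remember the last value per (vehicle, descriptor) and flag only changes."""

    def __init__(self):
        self._last = {}

    def should_insert(self, vehicle_id, descriptor, value):
        key = (vehicle_id, descriptor)
        if key in self._last and self._last[key] == value:
            return False  # unchanged: skip the insert
        self._last[key] = value
        return True


def value_at(stored, when):
    """Return the value in force at `when`, or None if nothing was stored yet.

    `stored` is a list of (timestamp, value) pairs sorted by timestamp.
    """
    timestamps = [timestamp for timestamp, _ in stored]
    index = bisect_right(timestamps, when)
    return stored[index - 1][1] if index else None
```

One thing the sketch cannot tell apart is a value that stayed the same from a vehicle that stopped reporting, which is the sort of gap additional checks would need to cover.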