In-Database
Predictive Analytics
        John A. De Goes
   @jdegoes, john@precog.com
Agenda




  •   Introduction
  •   Abusing SQL
  •   Painful by Design
  •   Database Extensions
  •   MADlib
  •   Other Approaches
  •   Summary
Introduction




    In-Database Predictive Analytics

    In-database predictive analytics refers
    to the process of performing
    advanced predictive analytics directly
    inside the database.
Introduction


      Traditional Predictive Analytics

        R

                            database

       SAS
Introduction




        R




                                  database
       SAS




               Data Bottleneck:
                Painful, Slow
Introduction




               What’s the answer?
Introduction
        Move the Code, not the Data!




                   Advanced
                   Analytics




                “MapReduce”
Abusing SQL




         Let’s Do K-Means in SQL!
Abusing SQL
       General Approach in RDBMS



                  SQL

       Driver              Database
                Feedback
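
The driver/database split can be sketched as follows. This is a minimal illustration, not the deck's implementation: it uses Python's built-in sqlite3 module in place of a production RDBMS, the table names (YV, C, YD, YNN) follow the slides, and the stopping rule, the `run_kmeans` helper, and the `MIN(j)` tie-break are assumptions made for the example. The driver sends one batch of SQL per iteration and reads back a single number as feedback.

```python
import sqlite3

# Hypothetical schema mirroring the slides: YV(i, l, val) holds the points,
# C(l, j, val) the centroids, YD(i, j, dist) distances, YNN(i, j) assignments.
SCHEMA = """
CREATE TABLE YV (i INTEGER, l INTEGER, val REAL);
CREATE TABLE C (l INTEGER, j INTEGER, val REAL);
CREATE TABLE YD (i INTEGER, j INTEGER, dist REAL);
CREATE TABLE YNN (i INTEGER, j INTEGER);
"""

# One k-means iteration, executed entirely inside the database.
# SQLite has no ** operator, so the square is written as a product.
ITERATION = """
DELETE FROM YD;
INSERT INTO YD
  SELECT YV.i, C.j, SUM((YV.val - C.val) * (YV.val - C.val))
  FROM YV, C WHERE YV.l = C.l GROUP BY YV.i, C.j;
DELETE FROM YNN;
INSERT INTO YNN
  SELECT YD.i, MIN(YD.j)  -- assumption: lowest j wins when distances tie
  FROM YD, (SELECT i, MIN(dist) AS mindist FROM YD GROUP BY i) YMIND
  WHERE YD.i = YMIND.i AND YD.dist = YMIND.mindist
  GROUP BY YD.i;
DELETE FROM C;
INSERT INTO C
  SELECT YV.l, YNN.j, AVG(YV.val)
  FROM YV, YNN WHERE YV.i = YNN.i GROUP BY YV.l, YNN.j;
"""

# Feedback: one scalar, the mean squared distance to the nearest centroid.
FEEDBACK = "SELECT AVG(d) FROM (SELECT MIN(dist) AS d FROM YD GROUP BY i)"


def run_kmeans(conn, max_iterations=20, tolerance=1e-9):
    """Driver loop: send SQL, read back one number, decide whether to stop."""
    previous = None
    for iteration in range(1, max_iterations + 1):
        conn.executescript(ITERATION)
        (quality,) = conn.execute(FEEDBACK).fetchone()
        if previous is not None and abs(previous - quality) <= tolerance:
            break
        previous = quality
    return iteration, quality


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
points = {1: (0.0, 0.0), 2: (0.0, 1.0), 3: (10.0, 10.0), 4: (10.0, 11.0)}
conn.executemany(
    "INSERT INTO YV VALUES (?, ?, ?)",
    [(i, l, v) for i, p in points.items() for l, v in enumerate(p, start=1)],
)
# Seed the two centroids with points 1 and 3.
conn.executemany(
    "INSERT INTO C VALUES (?, ?, ?)",
    [(l, j, v) for j, i in enumerate((1, 3), start=1)
     for l, v in enumerate(points[i], start=1)],
)
iterations, quality = run_kmeans(conn)
centroids = conn.execute("SELECT j, l, val FROM C ORDER BY j, l").fetchall()
```

Only the scalar `quality` crosses the database boundary on each pass; the points never leave the tables. A cluster that loses all its points simply disappears from C in this sketch, which a real driver would need to handle.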
Abusing SQL
                       Our Initial Model



                                         model
          d                      k                    n                  iteration              avg_q
          number of dimensions   number of clusters   number of points   number of iterations   variance
Abusing SQL
              Our Initial Data Set

                          Y
         Y1        Y2            Y3   ...




                        n rows
Abusing SQL
             Projection & Numbering

                   Y                                     YH
      Y1      Y2       Y3    ...                i     Y1       ...    Yd
       1                                       1
       2                                       2
       3                                       3
       4                                       4
       ...                                     ...
       ...                                     ...
       n                                       n

       INSERT INTO YH
       SELECT sum(1) over(rows unbounded preceding) AS i, Y1, Y2, ..., Yd
       FROM Y;
Abusing SQL
                           Flattening

               YH                                       YV
         i    Y1     ...     Yd                i          l       val
        1                                      1         1
        2                                      1         2
        3                                      ...       ...
        4                                      1         d
        ...                                    2         1
        ...                                    ...       ...
        n                                      n         d
                                                     n x d rows

                   INSERT INTO YV SELECT i,1,Y1 FROM YH;
                   ...
                   INSERT INTO YV SELECT i,d,Yd FROM YH;
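
Because the flattening needs one INSERT per dimension, the driver typically generates the statements. The sketch below assumes Python's sqlite3 module as a stand-in database and a hypothetical helper named `flatten_statements`; only the table and column names (YH, YV, Y1..Yd) come from the slide.

```python
import sqlite3


def flatten_statements(d):
    """Return the d INSERT statements that unpivot YH(i, Y1..Yd) into YV(i, l, val)."""
    return [f"INSERT INTO YV SELECT i, {l}, Y{l} FROM YH" for l in range(1, d + 1)]


d = 3
conn = sqlite3.connect(":memory:")
columns = ", ".join(f"Y{l} REAL" for l in range(1, d + 1))
conn.execute(f"CREATE TABLE YH (i INTEGER, {columns})")
conn.execute("CREATE TABLE YV (i INTEGER, l INTEGER, val REAL)")
# Two sample points, so YV should end up with n x d = 6 rows.
conn.executemany(
    "INSERT INTO YH VALUES (?, ?, ?, ?)",
    [(1, 1.0, 2.0, 3.0), (2, 4.0, 5.0, 6.0)],
)
for statement in flatten_statements(d):
    conn.execute(statement)
rows = conn.execute("SELECT i, l, val FROM YV ORDER BY i, l").fetchall()
```

Each statement scans YH once, so the flattening costs d passes over the data in this form.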
Abusing SQL
       Initializing k Cluster Centers

               YH                                          CH
         i    Y1    ...      Yd                  j      Y1      ...   Yd
        1                                        1
        2                                        2
        3                                        3
        4                                        4
        ...                                      ...
        ...                                      ...
        n                                        k

                    INSERT INTO CH
                    SELECT 1, Y1, ..., Yd FROM YH SAMPLE 1;
                    ...
                    INSERT INTO CH
                    SELECT k, Y1, ..., Yd FROM YH SAMPLE 1;
Abusing SQL
                          Flattening

               CH                                       C
        j     Y1    ...      Yd                 l        j        val
        1                                       1        1
        2                                       1        2
        3                                      ...       ...
        4                                       1        k
        ...                                     2        1
        ...                                    ...       ...
        k                                       d        k
                                                     d x k rows
                    INSERT INTO C
                    SELECT 1, 1, Y1 FROM CH WHERE j = 1;
                    ...
                    INSERT INTO C
                    SELECT d, k, Yd FROM CH WHERE j = k;
Abusing SQL
     Computing Distances to Clusters

            YD
    i         j        dist
    1         1
    1         2
    ...       ...
    1         k
    2         1
    ...       ...
    n         k
          n x k rows

    INSERT INTO YD
    SELECT i, j, sum((YV.val - C.val)**2)
    FROM YV, C WHERE YV.l = C.l
    GROUP BY i, j;
Abusing SQL
          Computing Nearest Neighbors

     YNN
                 nearest clusters
     i       j
    1
    2
    3
    4
    5
    ...
    n
    n rows

    INSERT INTO YNN
    SELECT YD.i, YD.j
    FROM YD,
      (SELECT i, min(dist) AS mindist FROM YD
       GROUP BY i) YMIND
    WHERE YD.i = YMIND.i
      and YD.dist = YMIND.mindist;
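
The nearest-cluster join can be tried on a toy YD table. The sketch below uses Python's sqlite3 module as a stand-in database and made-up distances. It also shows one thing worth checking in practice: a point that is exactly equidistant from two centroids matches the minimum twice, so the join returns more than one row for it. The `MIN(j)` variant is an assumption added for illustration, not something taken from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YD (i INTEGER, j INTEGER, dist REAL)")
conn.executemany("INSERT INTO YD VALUES (?, ?, ?)", [
    (1, 1, 0.5), (1, 2, 4.0),   # point 1 is clearly closest to cluster 1
    (2, 1, 2.0), (2, 2, 2.0),   # point 2 is equidistant from both clusters
])

# Join against the per-point minimum: keeps every (i, j) at minimum distance.
min_join_query = """
SELECT YD.i, YD.j
FROM YD, (SELECT i, MIN(dist) AS mindist FROM YD GROUP BY i) YMIND
WHERE YD.i = YMIND.i AND YD.dist = YMIND.mindist
ORDER BY YD.i, YD.j
"""

# Variant (an assumption for this sketch): keep the lowest j when distances tie.
tie_break_query = """
SELECT YD.i, MIN(YD.j)
FROM YD, (SELECT i, MIN(dist) AS mindist FROM YD GROUP BY i) YMIND
WHERE YD.i = YMIND.i AND YD.dist = YMIND.mindist
GROUP BY YD.i
ORDER BY YD.i
"""

with_ties = conn.execute(min_join_query).fetchall()
one_per_point = conn.execute(tie_break_query).fetchall()
```

Here `with_ties` holds three rows for two points, while `one_per_point` holds exactly one row per point. Exact ties are rare with real-valued data, but duplicated assignments would bias the centroid averages downstream.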
Abusing SQL
         Count Points Per Cluster



    INSERT INTO W SELECT j, count(*)
    FROM YNN GROUP BY j;
    UPDATE W SET w = w/model.n;
Abusing SQL
         Compute New Centroids



    INSERT INTO C
    SELECT l, j, avg(YV.val) FROM YV, YNN
    WHERE YV.i = YNN.i GROUP BY l, j;
Abusing SQL
              Compute Variances

    INSERT INTO R
    SELECT C.l, C.j, avg((YV.val - C.val)**2)
    FROM C, YV, YNN
    WHERE YV.i = YNN.i
      and YV.l = C.l and YNN.j = C.j
    GROUP BY C.l, C.j;
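
A small worked run of the variance step, assuming Python's sqlite3 module as the database and four made-up 2-D points already assigned to two clusters. SQLite lacks the `**` operator, so the square is written as a product; otherwise the query follows the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE YV (i INTEGER, l INTEGER, val REAL);
CREATE TABLE YNN (i INTEGER, j INTEGER);
CREATE TABLE C (l INTEGER, j INTEGER, val REAL);
CREATE TABLE R (l INTEGER, j INTEGER, val REAL);
""")
# Four 2-D points in flattened form (i, l, val), two per cluster.
conn.executemany("INSERT INTO YV VALUES (?, ?, ?)", [
    (1, 1, 0.0), (1, 2, 0.0), (2, 1, 0.0), (2, 2, 1.0),
    (3, 1, 10.0), (3, 2, 10.0), (4, 1, 10.0), (4, 2, 11.0),
])
# Assignments of points to clusters, and the matching centroids.
conn.executemany("INSERT INTO YNN VALUES (?, ?)", [(1, 1), (2, 1), (3, 2), (4, 2)])
conn.executemany("INSERT INTO C VALUES (?, ?, ?)", [
    (1, 1, 0.0), (2, 1, 0.5), (1, 2, 10.0), (2, 2, 10.5),
])
# Per-dimension, per-cluster variance around the centroid.
conn.execute("""
INSERT INTO R
SELECT C.l, C.j, AVG((YV.val - C.val) * (YV.val - C.val))
FROM C, YV, YNN
WHERE YV.i = YNN.i AND YV.l = C.l AND YNN.j = C.j
GROUP BY C.l, C.j
""")
variances = conn.execute("SELECT l, j, val FROM R ORDER BY j, l").fetchall()
```

Both clusters have zero spread in dimension 1 and a variance of 0.25 in dimension 2, which is what R contains after the insert.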
Abusing SQL
               Update Model

    INSERT INTO R
    SELECT C.l, C.j, avg((YV.val - C.val)**2)
    FROM C, YV, YNN
    WHERE YV.i = YNN.i
      and YV.l = C.l and YNN.j = C.j
    GROUP BY C.l, C.j;
Abusing SQL




          Let’s not do that again!
Painful by Design




      Why are predictive analytics so
         hard to express in SQL?
Painful by Design
                    #1: No Arrays




   Sets             Tuples          Arrays
     rows             columns
Painful by Design
         #2: Relational Algebra Sucks

        Projection            Selection               Rename                 Natural Join
                                                                                 R ⋈ S

          Semijoin           Antijoin                 Division                 Theta Join
           R ⋉ S              R ▷ S                   R ÷ S

        Left outer join   Right outer join      Full outer join              Aggregation
            R ⟕ S             R ⟖ S                 R ⟗ S        G1, G2, ..., Gm g f1(A1'), f2(A2'), ..., fk(Ak') (r)




      Iteration                           Recursion                      Multiple Dimensions
Database Extensions




      There’s GOT to be a better way!
Database Extensions




                      C Extension
Database Extensions




              UDF                    UDA
      User-Defined Function   User-Defined Aggregate




            Map                  Reduce
            map(a)                init(a)
           op2(a,b)             accum(a, b)
                                merge(a, b)
                                 final(a)
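
The four UDA callbacks can be sketched for a simple mean. This is an illustration under stated assumptions, not MADlib's or any database's C API: `MeanState` and `SqliteMean` are hypothetical names, the callbacks are written as methods instead of the slide's free functions, and Python's sqlite3 module (whose aggregate protocol is `step`/`finalize`) stands in for the database.

```python
import sqlite3


class MeanState:
    """Running (sum, count) state for a mean aggregate."""

    def __init__(self):                 # init: start from an empty state
        self.total, self.count = 0.0, 0

    def accum(self, value):             # accum: fold one row into the state
        self.total += value
        self.count += 1
        return self

    def merge(self, other):             # merge: combine two partial states
        self.total += other.total
        self.count += other.count
        return self

    def final(self):                    # final: turn the state into a result
        return self.total / self.count if self.count else None


class SqliteMean:
    """Adapter to SQLite's step/finalize protocol; a single node never merges."""

    def __init__(self):
        self.state = MeanState()

    def step(self, value):
        self.state.accum(value)

    def finalize(self):
        return self.state.final()


conn = sqlite3.connect(":memory:")
conn.create_aggregate("my_mean", 1, SqliteMean)
conn.execute("CREATE TABLE t (x REAL)")
conn.executemany("INSERT INTO t VALUES (?)", [(1.0,), (2.0,), (3.0,), (6.0,)])
(in_db,) = conn.execute("SELECT my_mean(x) FROM t").fetchone()

# Simulate two segments of a parallel database: aggregate locally, then merge.
left = MeanState().accum(1.0).accum(2.0)
right = MeanState().accum(3.0).accum(6.0)
merged = left.merge(right).final()
```

The merge step is what lets a parallel database compute partial states on each segment and combine them, which is why the state carries a sum and a count instead of a running mean.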
MADlib




   MADlib is an open-source library for
   scalable in-database analytics.
   It is implemented using database
   extensions written in C, and is available
   for PostgreSQL and Greenplum.
MADlib
          1. Download the binary


  Mac OS X
  http://www.madlib.net/files/madlib-0.6-Darwin.dmg


  Linux
  http://www.madlib.net/files/madlib-0.6-Linux.rpm
MADlib
             2. Start the Installation



  Mac OS X
  Double-click on installer


  Linux
  yum install $MADLIB_PACKAGE --nogpgcheck
MADlib
              3. Verify Locatability


  Greenplum
  source /path/to/greenplum/greenplum_path.sh


  PostgreSQL
  Make sure psql is in PATH
MADlib
               4. Register MADlib


  Greenplum
  /usr/local/madlib/bin/madpack -p greenplum -c $USER@$HOST/$DATABASE install


  PostgreSQL
  /usr/local/madlib/bin/madpack -p postgres -c $USER@$HOST/$DATABASE install
MADlib
               5. Test Installation


  Greenplum
  /usr/local/madlib/bin/madpack -p greenplum -c $USER@$HOST/$DATABASE install-check


  PostgreSQL
  /usr/local/madlib/bin/madpack -p postgres -c $USER@$HOST/$DATABASE install-check
MADlib
           Clustering in MADlib



  SELECT * FROM kmeans_random(
     'rel_source', 'expr_point', k,
     [ 'fn_dist', 'agg_centroid',
       max_num_iterations, min_frac_reassigned ]
  );
MADlib




         Ahhhhhh......
MADlib
         Our Way or the Highway




                Composability
Other Approaches




              RDBMS Isn’t the
            Only Game in Town!
Other Approaches
                    1. Embrace Coding


  • Hadoop Ecosystem
   • Mahout, Cascading/Scalding, Crunch/Scrunch, Pangool, Cascalog, and,
     of course, MapReduce
  • BDAS Ecosystem
   • Spark
Other Approaches
                        2. Reject RDBMS



  • Datalog + variants
   • In theory, ideal for many kinds of predictive analytics
   • Suffers from a lack of distributed, feature-complete implementations
Other Approaches
                         2. Reject RDBMS


  • Rasdaman / RASQL
    • Arrays but not analytics


  Community Editions
  http://www.rasdaman.org
Other Approaches
                           2. Reject RDBMS


  • MonetDB / SciQL
    • Array extension of SQL
    • Poor analytics


  Community Editions
  http://www.monetdb.org
Other Approaches
                        2. Reject RDBMS


  • SciDB / AFL (AQL)
    • Excellent analytics
    • Limited composability


  Community Editions
  http://www.scidb.org/forum/viewtopic.php?f=16&t=364/
Other Approaches
                         2. Reject RDBMS


  • Precog / Quirrel (simple “R for big data”)
    • Multidimensional, arrays + functions
    • Still immature


  Community Editions
  http://www.precog.com/editions/precog-for-mongodb (MongoDB)
  http://www.precog.com/editions/precog-for-postgresql (PostgreSQL)
Summary


  • Increase performance, reduce friction by doing more inside
    the database

  • Not a panacea
   • Hard to do in SQL
   • Hard to do in C (but you may not have to: MADlib)
   • Pre-canned & brittle in most databases


  • Ultimately what’s needed is tech designed for advanced
    analytics
Q&A
     John A. De Goes
@jdegoes, john@precog.com
References




  • Programming the K-means Clustering Algorithm in SQL
    (Teradata, NCR)

Mais conteúdo relacionado

Mais procurados

Mth 4108-1 b (ans)
Mth 4108-1 b (ans)Mth 4108-1 b (ans)
Mth 4108-1 b (ans)outdoorjohn
 
Scatter diagrams and correlation and simple linear regresssion
Scatter diagrams and correlation and simple linear regresssionScatter diagrams and correlation and simple linear regresssion
Scatter diagrams and correlation and simple linear regresssionAnkit Katiyar
 
Calculo y geometria analitica (larson hostetler-edwards) 8th ed - solutions m...
Calculo y geometria analitica (larson hostetler-edwards) 8th ed - solutions m...Calculo y geometria analitica (larson hostetler-edwards) 8th ed - solutions m...
Calculo y geometria analitica (larson hostetler-edwards) 8th ed - solutions m...ELMIR IVAN OZUNA LOPEZ
 
solucionario de purcell 3
solucionario de purcell 3solucionario de purcell 3
solucionario de purcell 3José Encalada
 
A Case Study of Expressively Constrainable Level Design Automation Tools for ...
A Case Study of Expressively Constrainable Level Design Automation Tools for ...A Case Study of Expressively Constrainable Level Design Automation Tools for ...
A Case Study of Expressively Constrainable Level Design Automation Tools for ...rndmcnlly
 
X2 t08 03 inequalities & graphs (2012)
X2 t08 03 inequalities & graphs (2012)X2 t08 03 inequalities & graphs (2012)
X2 t08 03 inequalities & graphs (2012)Nigel Simmons
 
X2 T08 01 inequalities and graphs (2010)
X2 T08 01 inequalities and graphs (2010)X2 T08 01 inequalities and graphs (2010)
X2 T08 01 inequalities and graphs (2010)Nigel Simmons
 
Ee107 sp 06_mock_test1_q_s_ok_3p_
Ee107 sp 06_mock_test1_q_s_ok_3p_Ee107 sp 06_mock_test1_q_s_ok_3p_
Ee107 sp 06_mock_test1_q_s_ok_3p_Sporsho
 
2010 mathematics hsc solutions
2010 mathematics hsc solutions2010 mathematics hsc solutions
2010 mathematics hsc solutionsjharnwell
 
Ann chapter-3-single layerperceptron20021031
Ann chapter-3-single layerperceptron20021031Ann chapter-3-single layerperceptron20021031
Ann chapter-3-single layerperceptron20021031frdos
 
Modul penggunaan kalkulator sainstifik sebagai ABM dalam Matematik
Modul penggunaan kalkulator sainstifik sebagai ABM dalam MatematikModul penggunaan kalkulator sainstifik sebagai ABM dalam Matematik
Modul penggunaan kalkulator sainstifik sebagai ABM dalam MatematikNorsyazana Kamarudin
 
Hmm Tutorial
Hmm TutorialHmm Tutorial
Hmm Tutorialjefftang
 

Mais procurados (17)

Mth 4108-1 b (ans)
Mth 4108-1 b (ans)Mth 4108-1 b (ans)
Mth 4108-1 b (ans)
 
Lesson 1: Functions
Lesson 1: FunctionsLesson 1: Functions
Lesson 1: Functions
 
C1 january 2012_mark_scheme
C1 january 2012_mark_schemeC1 january 2012_mark_scheme
C1 january 2012_mark_scheme
 
Scatter diagrams and correlation and simple linear regresssion
Scatter diagrams and correlation and simple linear regresssionScatter diagrams and correlation and simple linear regresssion
Scatter diagrams and correlation and simple linear regresssion
 
Calculo y geometria analitica (larson hostetler-edwards) 8th ed - solutions m...
Calculo y geometria analitica (larson hostetler-edwards) 8th ed - solutions m...Calculo y geometria analitica (larson hostetler-edwards) 8th ed - solutions m...
Calculo y geometria analitica (larson hostetler-edwards) 8th ed - solutions m...
 
solucionario de purcell 3
solucionario de purcell 3solucionario de purcell 3
solucionario de purcell 3
 
A Case Study of Expressively Constrainable Level Design Automation Tools for ...
A Case Study of Expressively Constrainable Level Design Automation Tools for ...A Case Study of Expressively Constrainable Level Design Automation Tools for ...
A Case Study of Expressively Constrainable Level Design Automation Tools for ...
 
09 trial jpwp_s2
09 trial jpwp_s209 trial jpwp_s2
09 trial jpwp_s2
 
Day 12 slope stations
Day 12 slope stationsDay 12 slope stations
Day 12 slope stations
 
X2 t08 03 inequalities & graphs (2012)
X2 t08 03 inequalities & graphs (2012)X2 t08 03 inequalities & graphs (2012)
X2 t08 03 inequalities & graphs (2012)
 
X2 T08 01 inequalities and graphs (2010)
X2 T08 01 inequalities and graphs (2010)X2 T08 01 inequalities and graphs (2010)
X2 T08 01 inequalities and graphs (2010)
 
01 analysis-of-algorithms
01 analysis-of-algorithms01 analysis-of-algorithms
01 analysis-of-algorithms
 
Ee107 sp 06_mock_test1_q_s_ok_3p_
Ee107 sp 06_mock_test1_q_s_ok_3p_Ee107 sp 06_mock_test1_q_s_ok_3p_
Ee107 sp 06_mock_test1_q_s_ok_3p_
 
2010 mathematics hsc solutions
2010 mathematics hsc solutions2010 mathematics hsc solutions
2010 mathematics hsc solutions
 
Ann chapter-3-single layerperceptron20021031
Ann chapter-3-single layerperceptron20021031Ann chapter-3-single layerperceptron20021031
Ann chapter-3-single layerperceptron20021031
 
Modul penggunaan kalkulator sainstifik sebagai ABM dalam Matematik
Modul penggunaan kalkulator sainstifik sebagai ABM dalam MatematikModul penggunaan kalkulator sainstifik sebagai ABM dalam Matematik
Modul penggunaan kalkulator sainstifik sebagai ABM dalam Matematik
 
Hmm Tutorial
Hmm TutorialHmm Tutorial
Hmm Tutorial
 

Destaque

Post-Free: Life After Free Monads
Post-Free: Life After Free MonadsPost-Free: Life After Free Monads
Post-Free: Life After Free MonadsJohn De Goes
 
Analytics Maturity Model
Analytics Maturity ModelAnalytics Maturity Model
Analytics Maturity ModelJohn De Goes
 
アウトプットし続ける技術〜毎日書くためのマインドセットとスキルセット
アウトプットし続ける技術〜毎日書くためのマインドセットとスキルセットアウトプットし続ける技術〜毎日書くためのマインドセットとスキルセット
アウトプットし続ける技術〜毎日書くためのマインドセットとスキルセットMasanori Saito
 
Predictive analytics-nirmal.potx
Predictive analytics-nirmal.potxPredictive analytics-nirmal.potx
Predictive analytics-nirmal.potxWSO2
 
Quirrel & R for Dummies
Quirrel & R for DummiesQuirrel & R for Dummies
Quirrel & R for DummiesJohn De Goes
 
201406 IASA: Analytics Maturity - Unlocking The Business Impact
201406 IASA: Analytics Maturity - Unlocking The Business Impact201406 IASA: Analytics Maturity - Unlocking The Business Impact
201406 IASA: Analytics Maturity - Unlocking The Business ImpactSteven Callahan
 
Make Better Decisions With Your Data 20080916
Make Better Decisions With Your Data 20080916Make Better Decisions With Your Data 20080916
Make Better Decisions With Your Data 20080916Dan English
 
Rise of the scientific database
Rise of the scientific databaseRise of the scientific database
Rise of the scientific databaseJohn De Goes
 
Competing on analytics
Competing on analyticsCompeting on analytics
Competing on analyticsGreg Seltzer
 
20161209 JAWS-UG AI支部 #2 LT : Moving story of AWS/ML beginner engineer
20161209 JAWS-UG AI支部 #2 LT : Moving story of AWS/ML beginner engineer20161209 JAWS-UG AI支部 #2 LT : Moving story of AWS/ML beginner engineer
20161209 JAWS-UG AI支部 #2 LT : Moving story of AWS/ML beginner engineerAtsushi Neki
 
BIG DATA ANALYTICS MEANS “IN-DATABASE” ANALYTICS
BIG DATA ANALYTICS MEANS “IN-DATABASE” ANALYTICSBIG DATA ANALYTICS MEANS “IN-DATABASE” ANALYTICS
BIG DATA ANALYTICS MEANS “IN-DATABASE” ANALYTICSTIBCO Spotfire
 
The CDO Agenda: Competing with Data - Strategy and Organization
The CDO Agenda: Competing with Data - Strategy and OrganizationThe CDO Agenda: Competing with Data - Strategy and Organization
The CDO Agenda: Competing with Data - Strategy and OrganizationDATAVERSITY
 
AIA SOX Conference May 2009 - CCM & Data Analytics
AIA SOX Conference May 2009 - CCM & Data AnalyticsAIA SOX Conference May 2009 - CCM & Data Analytics
AIA SOX Conference May 2009 - CCM & Data Analyticsprosenzw69
 
Data Science as a Commodity: Use MADlib, R, & other OSS Tools for Data Scienc...
Data Science as a Commodity: Use MADlib, R, & other OSS Tools for Data Scienc...Data Science as a Commodity: Use MADlib, R, & other OSS Tools for Data Scienc...
Data Science as a Commodity: Use MADlib, R, & other OSS Tools for Data Scienc...Sarah Aerni
 
Analytics Environment
Analytics EnvironmentAnalytics Environment
Analytics EnvironmentYuu Kimy
 
Using MongoDB As a Tick Database
Using MongoDB As a Tick DatabaseUsing MongoDB As a Tick Database
Using MongoDB As a Tick DatabaseMongoDB
 
About alteryx
About alteryxAbout alteryx
About alteryxYuu Kimy
 
Io tビジネスモデルに関する考察20161119
Io tビジネスモデルに関する考察20161119Io tビジネスモデルに関する考察20161119
Io tビジネスモデルに関する考察20161119Keiichiro Nabeno
 
BI Maturity Model ppt
BI Maturity Model pptBI Maturity Model ppt
BI Maturity Model pptYiwei Chen
 

Destaque (20)

Post-Free: Life After Free Monads
Post-Free: Life After Free MonadsPost-Free: Life After Free Monads
Post-Free: Life After Free Monads
 
Analytics Maturity Model
Analytics Maturity ModelAnalytics Maturity Model
Analytics Maturity Model
 
アウトプットし続ける技術〜毎日書くためのマインドセットとスキルセット
アウトプットし続ける技術〜毎日書くためのマインドセットとスキルセットアウトプットし続ける技術〜毎日書くためのマインドセットとスキルセット
アウトプットし続ける技術〜毎日書くためのマインドセットとスキルセット
 
Predictive analytics-nirmal.potx
Predictive analytics-nirmal.potxPredictive analytics-nirmal.potx
Predictive analytics-nirmal.potx
 
Quirrel & R for Dummies
Quirrel & R for DummiesQuirrel & R for Dummies
Quirrel & R for Dummies
 
201406 IASA: Analytics Maturity - Unlocking The Business Impact
201406 IASA: Analytics Maturity - Unlocking The Business Impact201406 IASA: Analytics Maturity - Unlocking The Business Impact
201406 IASA: Analytics Maturity - Unlocking The Business Impact
 
Make Better Decisions With Your Data 20080916
Make Better Decisions With Your Data 20080916Make Better Decisions With Your Data 20080916
Make Better Decisions With Your Data 20080916
 
Rise of the scientific database
Rise of the scientific databaseRise of the scientific database
Rise of the scientific database
 
Competing on analytics
Competing on analyticsCompeting on analytics
Competing on analytics
 
20161209 JAWS-UG AI支部 #2 LT : Moving story of AWS/ML beginner engineer
20161209 JAWS-UG AI支部 #2 LT : Moving story of AWS/ML beginner engineer20161209 JAWS-UG AI支部 #2 LT : Moving story of AWS/ML beginner engineer
20161209 JAWS-UG AI支部 #2 LT : Moving story of AWS/ML beginner engineer
 
BIG DATA ANALYTICS MEANS “IN-DATABASE” ANALYTICS
BIG DATA ANALYTICS MEANS “IN-DATABASE” ANALYTICSBIG DATA ANALYTICS MEANS “IN-DATABASE” ANALYTICS
BIG DATA ANALYTICS MEANS “IN-DATABASE” ANALYTICS
 
Competing on analytics
Competing on analyticsCompeting on analytics
Competing on analytics
 
The CDO Agenda: Competing with Data - Strategy and Organization
The CDO Agenda: Competing with Data - Strategy and OrganizationThe CDO Agenda: Competing with Data - Strategy and Organization
The CDO Agenda: Competing with Data - Strategy and Organization
 
AIA SOX Conference May 2009 - CCM & Data Analytics
AIA SOX Conference May 2009 - CCM & Data AnalyticsAIA SOX Conference May 2009 - CCM & Data Analytics
AIA SOX Conference May 2009 - CCM & Data Analytics
 
Data Science as a Commodity: Use MADlib, R, & other OSS Tools for Data Scienc...
Data Science as a Commodity: Use MADlib, R, & other OSS Tools for Data Scienc...Data Science as a Commodity: Use MADlib, R, & other OSS Tools for Data Scienc...
Data Science as a Commodity: Use MADlib, R, & other OSS Tools for Data Scienc...
 
Analytics Environment
Analytics EnvironmentAnalytics Environment
Analytics Environment
 
Using MongoDB As a Tick Database
Using MongoDB As a Tick DatabaseUsing MongoDB As a Tick Database
Using MongoDB As a Tick Database
 
About alteryx
About alteryxAbout alteryx
About alteryx
 
Io tビジネスモデルに関する考察20161119
Io tビジネスモデルに関する考察20161119Io tビジネスモデルに関する考察20161119
Io tビジネスモデルに関する考察20161119
 
BI Maturity Model ppt
BI Maturity Model pptBI Maturity Model ppt
BI Maturity Model ppt
 

Semelhante a In-Database Predictive Analytics

RNN sharing at Trend Micro
RNN sharing at Trend MicroRNN sharing at Trend Micro
RNN sharing at Trend MicroChun Hao Wang
 
Bouguet's MatLab Camera Calibration Toolbox
Bouguet's MatLab Camera Calibration ToolboxBouguet's MatLab Camera Calibration Toolbox
Bouguet's MatLab Camera Calibration ToolboxYuji Oyamada
 
International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)ijceronline
 
Ode powerpoint presentation1
Ode powerpoint presentation1Ode powerpoint presentation1
Ode powerpoint presentation1Pokkarn Narkhede
 
Passive network-redesign-ntua
Passive network-redesign-ntuaPassive network-redesign-ntua
Passive network-redesign-ntuaIEEE NTUA SB
 
Regularized Estimation of Spatial Patterns
Regularized Estimation of Spatial PatternsRegularized Estimation of Spatial Patterns
Regularized Estimation of Spatial PatternsWen-Ting Wang
 
Special Techniques (Teknik Khusus)
Special Techniques (Teknik Khusus)Special Techniques (Teknik Khusus)
Special Techniques (Teknik Khusus)Septiko Aji
 
Algorithm chapter 8
Algorithm chapter 8Algorithm chapter 8
Algorithm chapter 8chidabdu
 
Geometric transformation cg
Geometric transformation cgGeometric transformation cg
Geometric transformation cgharinipriya1994
 
Kekre’s hybrid wavelet transform technique with dct, walsh, hartley and kekre’s
Kekre’s hybrid wavelet transform technique with dct, walsh, hartley and kekre’sKekre’s hybrid wavelet transform technique with dct, walsh, hartley and kekre’s
Kekre’s hybrid wavelet transform technique with dct, walsh, hartley and kekre’sIAEME Publication
 
Numerical Linear Algebra for Data and Link Analysis.
Numerical Linear Algebra for Data and Link Analysis.Numerical Linear Algebra for Data and Link Analysis.
Numerical Linear Algebra for Data and Link Analysis.Leonid Zhukov
 
23 industrial engineering
23 industrial engineering23 industrial engineering
23 industrial engineeringmloeb825
 
X2 T08 03 inequalities & graphs (2011)
X2 T08 03 inequalities & graphs (2011)X2 T08 03 inequalities & graphs (2011)
X2 T08 03 inequalities & graphs (2011)Nigel Simmons
 
Dijkstra's Algorithm
Dijkstra's AlgorithmDijkstra's Algorithm
Dijkstra's Algorithmguest862df4e
 

Semelhante a In-Database Predictive Analytics (20)

Im2013vit
Im2013vitIm2013vit
Im2013vit
 
RNN sharing at Trend Micro
RNN sharing at Trend MicroRNN sharing at Trend Micro
RNN sharing at Trend Micro
 
Bouguet's MatLab Camera Calibration Toolbox
Bouguet's MatLab Camera Calibration ToolboxBouguet's MatLab Camera Calibration Toolbox
Bouguet's MatLab Camera Calibration Toolbox
 
International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)
 
Ode powerpoint presentation1
Ode powerpoint presentation1Ode powerpoint presentation1
Ode powerpoint presentation1
 
Passive network-redesign-ntua
Passive network-redesign-ntuaPassive network-redesign-ntua
Passive network-redesign-ntua
 
Regularized Estimation of Spatial Patterns
Regularized Estimation of Spatial PatternsRegularized Estimation of Spatial Patterns
Regularized Estimation of Spatial Patterns
 
Special Techniques (Teknik Khusus)
Special Techniques (Teknik Khusus)Special Techniques (Teknik Khusus)
Special Techniques (Teknik Khusus)
 
Models
ModelsModels
Models
 
Algorithm chapter 8
Algorithm chapter 8Algorithm chapter 8
Algorithm chapter 8
 
Geometric transformation cg
Geometric transformation cgGeometric transformation cg
Geometric transformation cg
 
Kekre’s hybrid wavelet transform technique with dct, walsh, hartley and kekre’s
Kekre’s hybrid wavelet transform technique with dct, walsh, hartley and kekre’sKekre’s hybrid wavelet transform technique with dct, walsh, hartley and kekre’s
Kekre’s hybrid wavelet transform technique with dct, walsh, hartley and kekre’s
 
Numerical Linear Algebra for Data and Link Analysis.
Numerical Linear Algebra for Data and Link Analysis.Numerical Linear Algebra for Data and Link Analysis.
Numerical Linear Algebra for Data and Link Analysis.
 
23 industrial engineering
23 industrial engineering23 industrial engineering
23 industrial engineering
 
Pole Placement in Digital Control
Pole Placement in Digital ControlPole Placement in Digital Control
Pole Placement in Digital Control
 
X2 T08 03 inequalities & graphs (2011)
X2 T08 03 inequalities & graphs (2011)X2 T08 03 inequalities & graphs (2011)
X2 T08 03 inequalities & graphs (2011)
 
Neural network and mlp
Neural network and mlpNeural network and mlp
Neural network and mlp
 
Dijkstra's Algorithm
Dijkstra's AlgorithmDijkstra's Algorithm
Dijkstra's Algorithm
 
Dijkstra
DijkstraDijkstra
Dijkstra
 
Dijkstra
DijkstraDijkstra
Dijkstra
 

Mais de John De Goes

Refactoring Functional Type Classes
Refactoring Functional Type ClassesRefactoring Functional Type Classes
Refactoring Functional Type ClassesJohn De Goes
 
One Monad to Rule Them All
One Monad to Rule Them AllOne Monad to Rule Them All
One Monad to Rule Them AllJohn De Goes
 
Error Management: Future vs ZIO
Error Management: Future vs ZIOError Management: Future vs ZIO
Error Management: Future vs ZIOJohn De Goes
 
Atomically { Delete Your Actors }
Atomically { Delete Your Actors }Atomically { Delete Your Actors }
Atomically { Delete Your Actors }John De Goes
 
The Death of Final Tagless
The Death of Final TaglessThe Death of Final Tagless
The Death of Final TaglessJohn De Goes
 
Scalaz Stream: Rebirth
Scalaz Stream: RebirthScalaz Stream: Rebirth
Scalaz Stream: RebirthJohn De Goes
 
Scalaz Stream: Rebirth
Scalaz Stream: RebirthScalaz Stream: Rebirth
Scalaz Stream: RebirthJohn De Goes
 
ZIO Schedule: Conquering Flakiness & Recurrence with Pure Functional Programming
ZIO Schedule: Conquering Flakiness & Recurrence with Pure Functional ProgrammingZIO Schedule: Conquering Flakiness & Recurrence with Pure Functional Programming
ZIO Schedule: Conquering Flakiness & Recurrence with Pure Functional ProgrammingJohn De Goes
 
Blazing Fast, Pure Effects without Monads — LambdaConf 2018
Blazing Fast, Pure Effects without Monads — LambdaConf 2018Blazing Fast, Pure Effects without Monads — LambdaConf 2018
Blazing Fast, Pure Effects without Monads — LambdaConf 2018John De Goes
 
Scalaz 8: A Whole New Game
Scalaz 8: A Whole New GameScalaz 8: A Whole New Game
Scalaz 8: A Whole New GameJohn De Goes
 
Scalaz 8 vs Akka Actors
Scalaz 8 vs Akka ActorsScalaz 8 vs Akka Actors
Scalaz 8 vs Akka ActorsJohn De Goes
 
Orthogonal Functional Architecture
Orthogonal Functional ArchitectureOrthogonal Functional Architecture
Orthogonal Functional ArchitectureJohn De Goes
 
The Design of the Scalaz 8 Effect System
The Design of the Scalaz 8 Effect SystemThe Design of the Scalaz 8 Effect System
The Design of the Scalaz 8 Effect SystemJohn De Goes
 
Quark: A Purely-Functional Scala DSL for Data Processing & Analytics
Quark: A Purely-Functional Scala DSL for Data Processing & AnalyticsQuark: A Purely-Functional Scala DSL for Data Processing & Analytics
Quark: A Purely-Functional Scala DSL for Data Processing & AnalyticsJohn De Goes
 
Streams for (Co)Free!
Streams for (Co)Free!Streams for (Co)Free!
Streams for (Co)Free!John De Goes
 
The Easy-Peasy-Lemon-Squeezy, Statically-Typed, Purely Functional Programming...
The Easy-Peasy-Lemon-Squeezy, Statically-Typed, Purely Functional Programming...The Easy-Peasy-Lemon-Squeezy, Statically-Typed, Purely Functional Programming...
The Easy-Peasy-Lemon-Squeezy, Statically-Typed, Purely Functional Programming...John De Goes
 
Halogen: Past, Present, and Future
Halogen: Past, Present, and FutureHalogen: Past, Present, and Future
Halogen: Past, Present, and FutureJohn De Goes
 
All Aboard The Scala-to-PureScript Express!
All Aboard The Scala-to-PureScript Express!All Aboard The Scala-to-PureScript Express!
All Aboard The Scala-to-PureScript Express!John De Goes
 

In-Database Predictive Analytics

  • 1. In-Database Predictive Analytics John A. De Goes @jdegoes, john@precog.com
  • 2. Agenda • Introduction • Abusing SQL • Painful by Design • Database Extensions • MADlib • Other Approaches • Summary
  • 3. Introduction In-Database Predictive Analytics In-database predictive analytics refers to the process of performing advanced predictive analytics directly inside the database.
  • 4. Introduction Traditional Predictive Analytics R database SAS
  • 5. Introduction R database SAS Data Bottleneck: Painful, Slow
  • 6. Introduction What’s the answer?
  • 7. Introduction Move the Code, not the Data! Advanced Analytics “MapReduce”
  • 8. Abusing SQL Let’s Do K-Means in SQL!
  • 9. Abusing SQL General Approach in RDBMS SQL Driver Database Feedback
  • 10. Abusing SQL Our Initial Model: table model with columns d (number of dimensions), k (number of clusters), n (number of points), iteration (number of iterations), avg_q (variance)
  • 11. Abusing SQL Our Initial Data Set Y Y1 Y2 Y3 Y3 n rows
  • 12. Abusing SQL Projection & Numbering: table Y (columns Y1, Y2, Y3, ...; n rows) is copied into YH (columns i, Y1, ..., Yd), where i numbers the rows 1..n. INSERT INTO YH SELECT sum(1) over(rows unbounded preceding) AS i, Y1, Y2, ..., Yd FROM Y;
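
The numbering trick on this slide (a running sum of 1 used as a row number) can be tried locally. The sketch below is only an illustration: it uses Python's built-in sqlite3 module instead of the database from the talk, assumes an SQLite build with window-function support (3.25 or later), and the two-column toy data is made up.

```python
import sqlite3

# Hypothetical toy data set Y with d = 2 dimensions and n = 4 points.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Y (Y1 REAL, Y2 REAL)")
conn.executemany(
    "INSERT INTO Y VALUES (?, ?)",
    [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5)],
)

# YH is Y plus a row number i, produced by a cumulative sum of 1.
conn.execute("CREATE TABLE YH (i INTEGER, Y1 REAL, Y2 REAL)")
conn.execute(
    """
    INSERT INTO YH
    SELECT sum(1) OVER (ROWS UNBOUNDED PRECEDING) AS i, Y1, Y2
    FROM Y
    """
)

yh_rows = conn.execute("SELECT i, Y1, Y2 FROM YH ORDER BY i").fetchall()
for row in yh_rows:
    print(row)
```

Without an ORDER BY in the window, which point receives which number is up to the engine; the algorithm only needs the numbers to be unique.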
  • 13. Abusing SQL Flattening: YH (columns i, Y1, ..., Yd) is flattened into YV (columns i, l, val), one row per point i and dimension l, giving n x d rows. INSERT INTO YV SELECT i,1,Y1 FROM YH; ... INSERT INTO YV SELECT i,d,Yd FROM YH;
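
Because SQL cannot loop over columns, the d INSERT statements have to be generated by the driver. A minimal sketch of that generation step, again using sqlite3 and made-up data with d = 2 (both are assumptions, not part of the talk):

```python
import sqlite3

d = 2  # number of dimensions in the toy data
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YH (i INTEGER, Y1 REAL, Y2 REAL)")
conn.executemany(
    "INSERT INTO YH VALUES (?, ?, ?)",
    [(1, 1.0, 1.0), (2, 1.5, 2.0), (3, 8.0, 8.0), (4, 9.0, 9.5)],
)

# One INSERT per dimension: column Yl becomes rows (i, l, val).
conn.execute("CREATE TABLE YV (i INTEGER, l INTEGER, val REAL)")
for l in range(1, d + 1):
    conn.execute(f"INSERT INTO YV SELECT i, {l}, Y{l} FROM YH")

yv_rows = conn.execute("SELECT i, l, val FROM YV ORDER BY i, l").fetchall()
print(len(yv_rows), "rows")  # n x d rows
```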
  • 14. Abusing SQL Initializing k Cluster Centers: CH (columns j, Y1, ..., Yd) receives k rows, each sampled from YH (columns i, Y1, ..., Yd). INSERT INTO CH SELECT 1,Y1, ..., Yd FROM YH SAMPLE 1; ... INSERT INTO CH SELECT k,Y1, ..., Yd FROM YH SAMPLE 1;
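
SAMPLE is not available in every SQL dialect. As a sketch only, the same initialization can be approximated in SQLite with ORDER BY random() LIMIT 1; that substitution, the toy data, and k = 2 are assumptions made for this example.

```python
import sqlite3

k = 2  # number of clusters in the toy example
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YH (i INTEGER, Y1 REAL, Y2 REAL)")
conn.executemany(
    "INSERT INTO YH VALUES (?, ?, ?)",
    [(1, 1.0, 1.0), (2, 1.5, 2.0), (3, 8.0, 8.0), (4, 9.0, 9.5)],
)

# Pick one random point per cluster index j (stand-in for SAMPLE 1).
conn.execute("CREATE TABLE CH (j INTEGER, Y1 REAL, Y2 REAL)")
for j in range(1, k + 1):
    conn.execute(
        "INSERT INTO CH SELECT ?, Y1, Y2 FROM YH ORDER BY random() LIMIT 1",
        (j,),
    )

ch_rows = conn.execute("SELECT j, Y1, Y2 FROM CH ORDER BY j").fetchall()
points = set(conn.execute("SELECT Y1, Y2 FROM YH").fetchall())
print(ch_rows)
```

Each draw is independent, so two clusters can start from the same point; a production version would need to guard against that.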
  • 15. Abusing SQL Flattening: CH (columns j, Y1, ..., Yd) is flattened into C (columns l, j, val), one row per dimension l and cluster j, giving d x k rows. INSERT INTO C SELECT 1, 1, Y1 FROM CH WHERE j = 1; ... INSERT INTO C SELECT d, k, Yd FROM CH WHERE j = k;
  • 16. Abusing SQL Computing Distances to Clusters: YD (columns i, j, dist) holds one row per point i and cluster j, giving n x k rows. INSERT INTO YD SELECT i, j, sum((YV.val - C.val)**2) FROM YV, C WHERE YV.l = C.l GROUP BY i, j;
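
The join on the dimension index l is what turns the flattened tables into squared Euclidean distances. A small runnable sketch, assuming SQLite (which has no ** operator, so the square is written as a product) and made-up points and centers:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YV (i INTEGER, l INTEGER, val REAL)")
conn.execute("CREATE TABLE C (l INTEGER, j INTEGER, val REAL)")
conn.execute("CREATE TABLE YD (i INTEGER, j INTEGER, dist REAL)")

# Four 2-D points in flattened form (i, l, val).
conn.executemany(
    "INSERT INTO YV VALUES (?, ?, ?)",
    [(1, 1, 1.0), (1, 2, 1.0), (2, 1, 1.5), (2, 2, 2.0),
     (3, 1, 8.0), (3, 2, 8.0), (4, 1, 9.0), (4, 2, 9.5)],
)
# Two centers in flattened form (l, j, val): (1, 1) and (9, 9).
conn.executemany(
    "INSERT INTO C VALUES (?, ?, ?)",
    [(1, 1, 1.0), (2, 1, 1.0), (1, 2, 9.0), (2, 2, 9.0)],
)

# Squared distance from every point i to every center j.
conn.execute(
    """
    INSERT INTO YD
    SELECT YV.i, C.j, sum((YV.val - C.val) * (YV.val - C.val))
    FROM YV, C
    WHERE YV.l = C.l
    GROUP BY YV.i, C.j
    """
)

distances = {
    (i, j): dist for i, j, dist in conn.execute("SELECT i, j, dist FROM YD")
}
print(distances)
```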
  • 17. Abusing SQL Computing Nearest Neighbors: YNN (columns i, j) holds the nearest cluster j for each point i, giving n rows. INSERT INTO YNN SELECT YD.i, YD.j FROM YD, (SELECT i, min(dist) AS mindist FROM YD GROUP BY i) YMIND WHERE YD.i = YMIND.i and YD.dist = YMIND.mindist;
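
The nearest-cluster step is an "argmin by self-join": compute each point's minimum distance, then join back to find which cluster achieved it. A sketch with sqlite3 and made-up distances (both assumptions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YD (i INTEGER, j INTEGER, dist REAL)")
conn.execute("CREATE TABLE YNN (i INTEGER, j INTEGER)")
conn.executemany(
    "INSERT INTO YD VALUES (?, ?, ?)",
    [(1, 1, 0.0), (1, 2, 128.0), (2, 1, 1.25), (2, 2, 105.25),
     (3, 1, 98.0), (3, 2, 2.0), (4, 1, 136.25), (4, 2, 0.25)],
)

# Keep, for each point i, the cluster j whose distance equals the minimum.
conn.execute(
    """
    INSERT INTO YNN
    SELECT YD.i, YD.j
    FROM YD,
         (SELECT i, min(dist) AS mindist FROM YD GROUP BY i) YMIND
    WHERE YD.i = YMIND.i AND YD.dist = YMIND.mindist
    """
)

ynn_rows = conn.execute("SELECT i, j FROM YNN ORDER BY i").fetchall()
print(ynn_rows)
```

If a point is exactly equidistant from two centers, this query returns two rows for it, so a real implementation needs a tie-break.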
  • 18. Abusing SQL Count Points Per Cluster INSERT INTO W SELECT j, count(*) FROM YNN GROUP BY j; UPDATE W SET w = w/model.n;
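
A runnable sketch of the weight computation, assuming sqlite3 and a made-up assignment table. SQLite does not accept the model.n reference in the UPDATE as written, so the sketch reads n through a scalar subquery; that rewrite is an assumption of this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE model (d INTEGER, k INTEGER, n INTEGER, "
    "iteration INTEGER, avg_q REAL)"
)
conn.execute("INSERT INTO model VALUES (2, 2, 4, 0, NULL)")
conn.execute("CREATE TABLE YNN (i INTEGER, j INTEGER)")
conn.executemany("INSERT INTO YNN VALUES (?, ?)",
                 [(1, 1), (2, 1), (3, 1), (4, 2)])

# Count points per cluster, then normalize the counts by n.
conn.execute("CREATE TABLE W (j INTEGER, w REAL)")
conn.execute("INSERT INTO W SELECT j, count(*) FROM YNN GROUP BY j")
# The * 1.0 guards against integer division if w were stored as an integer.
conn.execute("UPDATE W SET w = w * 1.0 / (SELECT n FROM model)")

weights = dict(conn.execute("SELECT j, w FROM W"))
print(weights)
```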
  • 19. Abusing SQL Compute New Centroids INSERT INTO C SELECT l, j, avg(YV.val) FROM YV, YNN WHERE YV.i = YNN.i GROUP BY l, j;
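
The centroid update is a plain GROUP BY average once the data is flattened. A sketch with sqlite3 and made-up points and assignments (both assumptions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YV (i INTEGER, l INTEGER, val REAL)")
conn.execute("CREATE TABLE YNN (i INTEGER, j INTEGER)")
conn.execute("CREATE TABLE C (l INTEGER, j INTEGER, val REAL)")
conn.executemany(
    "INSERT INTO YV VALUES (?, ?, ?)",
    [(1, 1, 1.0), (1, 2, 1.0), (2, 1, 1.5), (2, 2, 2.0),
     (3, 1, 8.0), (3, 2, 8.0), (4, 1, 9.0), (4, 2, 9.5)],
)
conn.executemany("INSERT INTO YNN VALUES (?, ?)",
                 [(1, 1), (2, 1), (3, 2), (4, 2)])

# New center of cluster j in dimension l = mean of its members' values.
conn.execute(
    """
    INSERT INTO C
    SELECT l, j, avg(YV.val)
    FROM YV, YNN
    WHERE YV.i = YNN.i
    GROUP BY l, j
    """
)

centroids = {(l, j): val for l, j, val in conn.execute("SELECT l, j, val FROM C")}
print(centroids)
```

A cluster with no assigned points produces no rows here, so it silently disappears from C.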
  • 20. Abusing SQL Compute Variances INSERT INTO R SELECT C.l, C.j, avg((YV.val- C.val)**2) FROM C, YV, YNN WHERE YV.i = YNN.i and YV.l = C.l and YNN.j = C.j GROUP BY C.l, C.j;
  • 21. Abusing SQL Update Model INSERT INTO R SELECT C.l, C.j, avg((YV.val- C.val)**2) FROM C, YV, YNN WHERE YV.i = YNN.i and YV.l = C.l and YNN.j = C.j GROUP BY C.l, C.j;
  • 22. Abusing SQL Let’s not do that again!
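
Putting the preceding steps together, the driver/database feedback loop from the "General Approach" slide might look like the sketch below. It is an illustration under several assumptions: sqlite3 stands in for the database, the data is a made-up 1-D set, ties are broken with min(j), and the stopping rule (assignments unchanged between iterations) is chosen for this example rather than taken from the talk.

```python
import sqlite3

# One k-means iteration expressed purely in SQL over the flattened tables.
STEP_SQL = """
    DELETE FROM YD;
    INSERT INTO YD
      SELECT YV.i, C.j, sum((YV.val - C.val) * (YV.val - C.val))
      FROM YV, C WHERE YV.l = C.l GROUP BY YV.i, C.j;
    DELETE FROM YNN;
    INSERT INTO YNN
      SELECT YD.i, min(YD.j)
      FROM YD, (SELECT i, min(dist) AS mindist FROM YD GROUP BY i) YMIND
      WHERE YD.i = YMIND.i AND YD.dist = YMIND.mindist
      GROUP BY YD.i;
    DELETE FROM C;
    INSERT INTO C
      SELECT l, j, avg(YV.val) FROM YV, YNN
      WHERE YV.i = YNN.i GROUP BY l, j;
"""


def run_driver(conn, max_iterations=20):
    """Send SQL, read the assignments back, repeat until they stop changing."""
    previous = None
    for iteration in range(1, max_iterations + 1):
        conn.executescript(STEP_SQL)
        current = conn.execute("SELECT i, j FROM YNN ORDER BY i").fetchall()
        if current == previous:
            break
        previous = current
    return iteration, current


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE YV (i INTEGER, l INTEGER, val REAL);
    CREATE TABLE C (l INTEGER, j INTEGER, val REAL);
    CREATE TABLE YD (i INTEGER, j INTEGER, dist REAL);
    CREATE TABLE YNN (i INTEGER, j INTEGER);
    INSERT INTO YV VALUES (1, 1, 1.0), (2, 1, 2.0), (3, 1, 9.0), (4, 1, 10.0);
    INSERT INTO C VALUES (1, 1, 1.0), (1, 2, 2.0);
""")

iterations, assignments = run_driver(conn)
centers = conn.execute("SELECT j, val FROM C ORDER BY j").fetchall()
print(iterations, assignments, centers)
```

Every iteration costs the driver several round trips and rewrites three tables, which is a large part of why the approach feels painful.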
  • 23. Painful by Design Why are predictive analytics so hard to express in SQL?
  • 24. Painful by Design #1: No Arrays Sets Tuples Arrays rows columns
  • 25. Painful by Design #2: Relational Algebra Sucks. What it offers: Projection, Selection, Rename, Natural Join (R ⋈ S), Semijoin (R ⋉ S), Antijoin (R ▷ S), Division (R ÷ S), Theta Join, Left outer join (R ⟕ S), Right outer join (R ⟖ S), Full outer join (R ⟗ S), Aggregation (G1, G2, ..., Gm g f1(A1'), f2(A2'), ..., fk(Ak') (r)). What it lacks: Iteration, Recursion, Multiple Dimensions
  • 26. Database Extensions There’s GOT to be a better way!
  • 27. Database Extensions C Extension
  • 28. Database Extensions UDF (User-Defined Function), the "Map" side: map(a), op2(a,b). UDA (User-Defined Aggregate), the "Reduce" side: init(a), accum(a, b), merge(a, b), final(a)
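
The UDF/UDA split can be tried without writing C. The sketch below registers a scalar function and an aggregate through Python's sqlite3 module; this is an assumption made for illustration, not the C extension mechanism the talk refers to, and the names squared_diff and variance_pop are invented for the example. SQLite runs the aggregate in one process, so there is no counterpart to merge(a, b); a distributed engine would need one to combine partial states.

```python
import sqlite3


def squared_diff(a, b):
    """Scalar UDF: evaluated once per row, like a map step."""
    return (a - b) * (a - b)


class VariancePop:
    """UDA computing population variance via init / accumulate / finalize."""

    def __init__(self):  # init: empty running state
        self.n = 0
        self.total = 0.0
        self.total_sq = 0.0

    def step(self, value):  # accum: fold one row into the state
        if value is None:
            return
        self.n += 1
        self.total += value
        self.total_sq += value * value

    def finalize(self):  # final: turn the state into the result
        if self.n == 0:
            return None
        mean = self.total / self.n
        return self.total_sq / self.n - mean * mean


conn = sqlite3.connect(":memory:")
conn.create_function("squared_diff", 2, squared_diff)
conn.create_aggregate("variance_pop", 1, VariancePop)

conn.execute("CREATE TABLE t (x REAL)")
conn.executemany("INSERT INTO t VALUES (?)", [(1.0,), (2.0,), (3.0,), (4.0,)])

udf_result = conn.execute("SELECT squared_diff(5.0, 2.0)").fetchone()[0]
uda_result = conn.execute("SELECT variance_pop(x) FROM t").fetchone()[0]
print(udf_result, uda_result)
```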
  • 29. MADlib MADlib is an open-source library for scalable in-database analytics. It is implemented using database extensions written in C, and is available for PostgreSQL and Greenplum.
  • 30. MADlib 1. Download the binary. Mac OS X: http://www.madlib.net/files/madlib-0.6-Darwin.dmg Linux: http://www.madlib.net/files/madlib-0.6-Linux.rpm
  • 31. MADlib 2. Start the Installation. Mac OS X: double-click on the installer. Linux: yum install $MADLIB_PACKAGE --nogpgcheck
  • 32. MADlib 3. Verify Locatability. Greenplum: source /path/to/greenplum/greenplum_path.sh PostgreSQL: make sure psql is in PATH
  • 33. MADlib 4. Register MADlib. Greenplum: /usr/local/madlib/bin/madpack -p greenplum -c $USER@$HOST/$DATABASE install PostgreSQL: /usr/local/madlib/bin/madpack -p postgres -c $USER@$HOST/$DATABASE install
  • 34. MADlib 5. Test Installation. Greenplum: /usr/local/madlib/bin/madpack -p greenplum -c $USER@$HOST/$DATABASE install-check PostgreSQL: /usr/local/madlib/bin/madpack -p postgres -c $USER@$HOST/$DATABASE install-check
  • 35. MADlib Clustering in MADlib SELECT * FROM kmeans_random( 'rel_source', 'expr_point', k, [ 'fn_dist', 'agg_centroid', max_num_iterations, min_frac_reassigned ] );
  • 36. MADlib Ahhhhhh......
  • 37. MADlib Our Way or the Highway Composability
  • 38. Other Approaches RDBMS Isn’t the Only Game in Town!
  • 39. Other Approaches 1. Embrace Coding • Hadoop Ecosystem • Mahout, Cascading/Scalding, Crunch/Scrunch, Pangool, Cascalog, and, of course, MapReduce • BDAS Ecosystem • Spark
  • 40. Other Approaches 2. Reject RDBMS • Datalog + variants • In theory, ideal for many kinds of predictive analytics • Suffers from a lack of distributed, feature-complete implementations
  • 41. Other Approaches 2. Reject RDBMS • Rasdaman / RASQL • Arrays but not analytics Community Editions http://www.rasdaman.org
  • 42. Other Approaches 2. Reject RDBMS • MonetDB / SciQL • Array extension of SQL • Poor analytics Community Editions http://www.monetdb.org
  • 43. Other Approaches 2. Reject RDBMS • SciDB / AFL (AQL) • Excellent analytics • Limited composability Community Editions http://www.scidb.org/forum/viewtopic.php?f=16&t=364/
  • 44. Other Approaches 2. Reject RDBMS • Precog / Quirrel (simple “R for big data”) • Multidimensional, arrays + functions • Still immature Community Editions http://www.precog.com/editions/precog-for-mongodb (MongoDB) http://www.precog.com/editions/precog-for-postgresql (PostgreSQL)
  • 45. Summary • Increase performance, reduce friction by doing more inside the database • Not a panacea • Hard to do in SQL • Hard to do in C (but you may not have to: MADlib) • Pre-canned & brittle in most databases • Ultimately what’s needed is tech designed for advanced analytics
  • 46. Q&A John A. De Goes @jdegoes, john@precog.com
  • 47. References • Programming the K-means Clustering Algorithm in SQL (Teradata, NCR)