SlideShare a Scribd company logo
1 of 18
bagusco@gmail.com
bagusco@ipb.ac.id
KDD Cup 2010: Overview
• The Challenge
   – How generally or narrowly do students learn? How quickly or
     slowly? Will the rate of improvement vary between students?
     What does it mean for one problem to be similar to another?

   – Is it possible to infer the knowledge requirements of problems
     directly from student performance data, without human analysis
     of the tasks?

   – This year's challenge asks you to predict student performance
     on mathematical problems from logs of student interaction with
     Intelligent Tutoring Systems.
KDD Cup 2010: Results
• Winners of KDD Cup 2010: All Teams
   – First Place: National Taiwan University
     Feature engineering and classifier ensembling for KDD CUP
     2010

   – First Runner Up: Zhang and Su
     Gradient Boosting Machines with Singular Value Decomposition

   – Second Runner Up: BigChaos @ KDD
     Collaborative Filtering Applied to Educational Data Mining
Outline
•   What is Ensemble Learning?
•   Why Ensemble?
•   How good is Ensemble?
•   What next?
Predictive Modeling
• Widely-used in many applications:
  – Business
     • Churn modeling, Scoring
  – Science
     • Chemometrics
  – Bio-Science
     • Efficacy modeling, Classification
  – Academics
     • Admission selection, student performance
Predictive Modeling
                          New
                         Data Set


Training      Model      Predictive   Prediction
  Set      Development     Rules
Classical Approach: Model Selection




   Which one is the best?
New Approach?: Ensemble




  Combine all models!!!
What is Ensemble?
• Single Expert   vs   Team of Experts
What is Ensemble?
                        Data Set



   Training Set #1   Training Set #2   ……   Training Set #k
                                       .

     Learning           Learning              Learning
                                       ……
     Model #1           Model #2              Model #k
                                       .

                        Combiner



                        Ensemble
                        Prediction
Types of Ensemble
• Hybrid Ensemble
  – Combining several different learning algorithms into
    one prediction
  – e.g: combining the result of regression, tree, neural
    nets, and support vector machine

• Non-Hybrid Ensemble
  – Combining several learning models from the same
    algorithm into one prediction
Well-Known Ensembles
• Bagging
  – Generate learning models for the bootstrap samples
  – Aggregate the predictions via averaging or majority-vote
• Boosting (AdaBoost)
  – Generate sequential learning models with higher weight to
    ‘difficult’ cases
  – Combine the predictions by concerning the weight
• Random Forest
  – Similar to bagging except the existence of random feature
    selection for each learning model generation
How Good is Ensemble?
Error Rate
 0.7
                           tree
 0.6
                           bagging
 0.5                       adaboost
 0.4
 0.3
 0.2
 0.1
  0
       1   2   3   4   5   6   7   8   9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33


  Source: Dietterich (1999)
How Good is Ensemble?
AUC
 0.9

 0.8                                                   CART
                                                       C45
 0.7
                                                       Bagging
 0.6                                                   Random Forest
                                                       Rotation Forest
 0.5
                                                       Rotation Boost
 0.4
           DIY          Bank   Telecom1   Mail-order
Source: Bock & Poel (2011)
What Next
• Ensemble Predictive Models

• Class-Imbalance Models
  – Gradient
    Boosting, EasyEnsemble, BalanceCascade, SMOTE
    Boost

• Robust Predictive Models
  – Noise Ensemble
Ensemble in SAS/EM
THANK YOU
Bagus Sartono
Educational Background        Professional Experience
• Bachelor of Science in      • Lecturer – Dept of Stats
  Stats – IPB (2000)            IPB
• Master of Science in        • Experienced Trainer in
  Stats – IPB (2004)            Analytics (Bank
• PhD in Applied                Indonesia, Bank
  Economics – University of     Mandiri, Ganesha Cipta
  Antwerp (2012)                Informatika, CIFOR, LIPI,
                                LPEM-UI, etc)

More Related Content

Viewers also liked

Digital media for marketing meeting 2011
Digital media for marketing meeting 2011Digital media for marketing meeting 2011
Digital media for marketing meeting 2011cmviasat
 
WCAN 2013 Spring ライトニングトーク『学びのコツを掴んで変化が激しい時代を楽しもう』
WCAN 2013 Spring ライトニングトーク『学びのコツを掴んで変化が激しい時代を楽しもう』WCAN 2013 Spring ライトニングトーク『学びのコツを掴んで変化が激しい時代を楽しもう』
WCAN 2013 Spring ライトニングトーク『学びのコツを掴んで変化が激しい時代を楽しもう』takuo yamada
 
Jonge Democraten - Individueel pensioen zonder sociale partners.
Jonge Democraten - Individueel pensioen zonder sociale partners.Jonge Democraten - Individueel pensioen zonder sociale partners.
Jonge Democraten - Individueel pensioen zonder sociale partners.BeFrank
 
INTERIOR-iD Portfolio
INTERIOR-iD PortfolioINTERIOR-iD Portfolio
INTERIOR-iD PortfolioRadaschitz
 
Increasing talent mobility: (Open) Badges @ Selor
Increasing talent mobility: (Open) Badges @ SelorIncreasing talent mobility: (Open) Badges @ Selor
Increasing talent mobility: (Open) Badges @ SelorVincent Van Malderen
 
The Future of Content Marketing
The Future of Content MarketingThe Future of Content Marketing
The Future of Content MarketingLucia Novara
 
Programming SharePoint 2010 with Visual Studio 2010
Programming SharePoint 2010 with Visual Studio 2010Programming SharePoint 2010 with Visual Studio 2010
Programming SharePoint 2010 with Visual Studio 2010Quang Nguyễn Bá
 
39808 sum orientation2011_sav_graduate_ppt
39808 sum orientation2011_sav_graduate_ppt39808 sum orientation2011_sav_graduate_ppt
39808 sum orientation2011_sav_graduate_pptTreyReckling
 
pension jugement
pension jugementpension jugement
pension jugementraph98
 
2 phil lit, pre colonial period
2 phil lit, pre colonial period2 phil lit, pre colonial period
2 phil lit, pre colonial periodMarien Be
 
自転車通勤のススメ
自転車通勤のススメ自転車通勤のススメ
自転車通勤のススメtakuo yamada
 
Unlearning unlimited
Unlearning unlimitedUnlearning unlimited
Unlearning unlimitedPravin Sabnis
 
Building a $100k and flexible design career
Building a $100k and flexible design careerBuilding a $100k and flexible design career
Building a $100k and flexible design careeradambcarney
 

Viewers also liked (19)

Chapter 7
Chapter 7Chapter 7
Chapter 7
 
Digital media for marketing meeting 2011
Digital media for marketing meeting 2011Digital media for marketing meeting 2011
Digital media for marketing meeting 2011
 
WCAN 2013 Spring ライトニングトーク『学びのコツを掴んで変化が激しい時代を楽しもう』
WCAN 2013 Spring ライトニングトーク『学びのコツを掴んで変化が激しい時代を楽しもう』WCAN 2013 Spring ライトニングトーク『学びのコツを掴んで変化が激しい時代を楽しもう』
WCAN 2013 Spring ライトニングトーク『学びのコツを掴んで変化が激しい時代を楽しもう』
 
Jonge Democraten - Individueel pensioen zonder sociale partners.
Jonge Democraten - Individueel pensioen zonder sociale partners.Jonge Democraten - Individueel pensioen zonder sociale partners.
Jonge Democraten - Individueel pensioen zonder sociale partners.
 
INTERIOR-iD Portfolio
INTERIOR-iD PortfolioINTERIOR-iD Portfolio
INTERIOR-iD Portfolio
 
California
California California
California
 
Increasing talent mobility: (Open) Badges @ Selor
Increasing talent mobility: (Open) Badges @ SelorIncreasing talent mobility: (Open) Badges @ Selor
Increasing talent mobility: (Open) Badges @ Selor
 
The Future of Content Marketing
The Future of Content MarketingThe Future of Content Marketing
The Future of Content Marketing
 
The compass
The compassThe compass
The compass
 
A REFORMA E A CONTRARREFORMA
A REFORMA E A CONTRARREFORMAA REFORMA E A CONTRARREFORMA
A REFORMA E A CONTRARREFORMA
 
Programming SharePoint 2010 with Visual Studio 2010
Programming SharePoint 2010 with Visual Studio 2010Programming SharePoint 2010 with Visual Studio 2010
Programming SharePoint 2010 with Visual Studio 2010
 
39808 sum orientation2011_sav_graduate_ppt
39808 sum orientation2011_sav_graduate_ppt39808 sum orientation2011_sav_graduate_ppt
39808 sum orientation2011_sav_graduate_ppt
 
pension jugement
pension jugementpension jugement
pension jugement
 
Rbp ph
Rbp phRbp ph
Rbp ph
 
2 phil lit, pre colonial period
2 phil lit, pre colonial period2 phil lit, pre colonial period
2 phil lit, pre colonial period
 
自転車通勤のススメ
自転車通勤のススメ自転車通勤のススメ
自転車通勤のススメ
 
C
CC
C
 
Unlearning unlimited
Unlearning unlimitedUnlearning unlimited
Unlearning unlimited
 
Building a $100k and flexible design career
Building a $100k and flexible design careerBuilding a $100k and flexible design career
Building a $100k and flexible design career
 

Similar to Improving the Model’s Predictive Power with Ensemble Approaches

Predict oscars (4:17)
Predict oscars (4:17)Predict oscars (4:17)
Predict oscars (4:17)Thinkful
 
Machine learning Introduction
Machine learning IntroductionMachine learning Introduction
Machine learning IntroductionDong Guo
 
How Machine Learning Helps Organizations to Work More Efficiently?
How Machine Learning Helps Organizations to Work More Efficiently?How Machine Learning Helps Organizations to Work More Efficiently?
How Machine Learning Helps Organizations to Work More Efficiently?Tuan Yang
 
Hadoop Summit 2010 Machine Learning Using Hadoop
Hadoop Summit 2010 Machine Learning Using HadoopHadoop Summit 2010 Machine Learning Using Hadoop
Hadoop Summit 2010 Machine Learning Using HadoopYahoo Developer Network
 
Machine Learning for Everyone
Machine Learning for EveryoneMachine Learning for Everyone
Machine Learning for EveryoneAly Abdelkareem
 
Predict the Oscars with Data Science
Predict the Oscars with Data SciencePredict the Oscars with Data Science
Predict the Oscars with Data ScienceCarlos Edo
 
in5490-classification (1).pptx
in5490-classification (1).pptxin5490-classification (1).pptx
in5490-classification (1).pptxMonicaTimber
 
EssentialsOfMachineLearning.pdf
EssentialsOfMachineLearning.pdfEssentialsOfMachineLearning.pdf
EssentialsOfMachineLearning.pdfAnkita Tiwari
 
Predict oscars (5:11)
Predict oscars (5:11)Predict oscars (5:11)
Predict oscars (5:11)Thinkful
 
Machine Learning: Learning with data
Machine Learning: Learning with dataMachine Learning: Learning with data
Machine Learning: Learning with dataONE Talks
 
One talk Machine Learning
One talk Machine LearningOne talk Machine Learning
One talk Machine LearningONE Talks
 
To bag, or to boost? A question of balance
To bag, or to boost? A question of balanceTo bag, or to boost? A question of balance
To bag, or to boost? A question of balanceAlex Henderson
 
[ESWC2017 - PhD Symposium] Enhancing white-box machine learning processes by ...
[ESWC2017 - PhD Symposium] Enhancing white-box machine learning processes by ...[ESWC2017 - PhD Symposium] Enhancing white-box machine learning processes by ...
[ESWC2017 - PhD Symposium] Enhancing white-box machine learning processes by ...Gilles Vandewiele
 
20211229120253D6323_PERT 06_ Ensemble Learning.pptx
20211229120253D6323_PERT 06_ Ensemble Learning.pptx20211229120253D6323_PERT 06_ Ensemble Learning.pptx
20211229120253D6323_PERT 06_ Ensemble Learning.pptxRaflyRizky2
 
CATALST intro stats course presentation at JMM 2013 (Elizabeth Fry, Laura Zie...
CATALST intro stats course presentation at JMM 2013 (Elizabeth Fry, Laura Zie...CATALST intro stats course presentation at JMM 2013 (Elizabeth Fry, Laura Zie...
CATALST intro stats course presentation at JMM 2013 (Elizabeth Fry, Laura Zie...statisfactions
 
Predict the Oscars with Data Science
Predict the Oscars with Data SciencePredict the Oscars with Data Science
Predict the Oscars with Data ScienceThinkful
 
Learning to Balance: Bayesian Meta-Learning for Imbalanced and Out-of-distrib...
Learning to Balance: Bayesian Meta-Learning for Imbalanced and Out-of-distrib...Learning to Balance: Bayesian Meta-Learning for Imbalanced and Out-of-distrib...
Learning to Balance: Bayesian Meta-Learning for Imbalanced and Out-of-distrib...MLAI2
 
李俊良/Feature Engineering in Machine Learning
李俊良/Feature Engineering in Machine Learning李俊良/Feature Engineering in Machine Learning
李俊良/Feature Engineering in Machine Learning台灣資料科學年會
 

Similar to Improving the Model’s Predictive Power with Ensemble Approaches (20)

Predict oscars (4:17)
Predict oscars (4:17)Predict oscars (4:17)
Predict oscars (4:17)
 
Machine learning Introduction
Machine learning IntroductionMachine learning Introduction
Machine learning Introduction
 
How Machine Learning Helps Organizations to Work More Efficiently?
How Machine Learning Helps Organizations to Work More Efficiently?How Machine Learning Helps Organizations to Work More Efficiently?
How Machine Learning Helps Organizations to Work More Efficiently?
 
Hadoop Summit 2010 Machine Learning Using Hadoop
Hadoop Summit 2010 Machine Learning Using HadoopHadoop Summit 2010 Machine Learning Using Hadoop
Hadoop Summit 2010 Machine Learning Using Hadoop
 
Machine Learning for Everyone
Machine Learning for EveryoneMachine Learning for Everyone
Machine Learning for Everyone
 
Predict the Oscars with Data Science
Predict the Oscars with Data SciencePredict the Oscars with Data Science
Predict the Oscars with Data Science
 
in5490-classification (1).pptx
in5490-classification (1).pptxin5490-classification (1).pptx
in5490-classification (1).pptx
 
EssentialsOfMachineLearning.pdf
EssentialsOfMachineLearning.pdfEssentialsOfMachineLearning.pdf
EssentialsOfMachineLearning.pdf
 
Predict oscars (5:11)
Predict oscars (5:11)Predict oscars (5:11)
Predict oscars (5:11)
 
Machine Learning: Learning with data
Machine Learning: Learning with dataMachine Learning: Learning with data
Machine Learning: Learning with data
 
One talk Machine Learning
One talk Machine LearningOne talk Machine Learning
One talk Machine Learning
 
To bag, or to boost? A question of balance
To bag, or to boost? A question of balanceTo bag, or to boost? A question of balance
To bag, or to boost? A question of balance
 
[ESWC2017 - PhD Symposium] Enhancing white-box machine learning processes by ...
[ESWC2017 - PhD Symposium] Enhancing white-box machine learning processes by ...[ESWC2017 - PhD Symposium] Enhancing white-box machine learning processes by ...
[ESWC2017 - PhD Symposium] Enhancing white-box machine learning processes by ...
 
20211229120253D6323_PERT 06_ Ensemble Learning.pptx
20211229120253D6323_PERT 06_ Ensemble Learning.pptx20211229120253D6323_PERT 06_ Ensemble Learning.pptx
20211229120253D6323_PERT 06_ Ensemble Learning.pptx
 
Machine learning
Machine learning Machine learning
Machine learning
 
CATALST intro stats course presentation at JMM 2013 (Elizabeth Fry, Laura Zie...
CATALST intro stats course presentation at JMM 2013 (Elizabeth Fry, Laura Zie...CATALST intro stats course presentation at JMM 2013 (Elizabeth Fry, Laura Zie...
CATALST intro stats course presentation at JMM 2013 (Elizabeth Fry, Laura Zie...
 
What is Machine Learning
What is Machine LearningWhat is Machine Learning
What is Machine Learning
 
Predict the Oscars with Data Science
Predict the Oscars with Data SciencePredict the Oscars with Data Science
Predict the Oscars with Data Science
 
Learning to Balance: Bayesian Meta-Learning for Imbalanced and Out-of-distrib...
Learning to Balance: Bayesian Meta-Learning for Imbalanced and Out-of-distrib...Learning to Balance: Bayesian Meta-Learning for Imbalanced and Out-of-distrib...
Learning to Balance: Bayesian Meta-Learning for Imbalanced and Out-of-distrib...
 
李俊良/Feature Engineering in Machine Learning
李俊良/Feature Engineering in Machine Learning李俊良/Feature Engineering in Machine Learning
李俊良/Feature Engineering in Machine Learning
 

More from SAS Asia Pacific

Produce Analytical Talent to Meet the Industry Needs
Produce Analytical Talent to Meet the Industry NeedsProduce Analytical Talent to Meet the Industry Needs
Produce Analytical Talent to Meet the Industry NeedsSAS Asia Pacific
 
Better decisions through analytics in healthcare industry. Our journey so far
Better decisions through analytics in healthcare industry.  Our journey so farBetter decisions through analytics in healthcare industry.  Our journey so far
Better decisions through analytics in healthcare industry. Our journey so farSAS Asia Pacific
 
How can Analytics Drive Customer Values?
How can Analytics Drive Customer Values?How can Analytics Drive Customer Values?
How can Analytics Drive Customer Values?SAS Asia Pacific
 
Developing an Analytical Mindset – Becoming an Analytical Competitor
Developing an Analytical Mindset – Becoming an Analytical CompetitorDeveloping an Analytical Mindset – Becoming an Analytical Competitor
Developing an Analytical Mindset – Becoming an Analytical CompetitorSAS Asia Pacific
 
Gaining New Insights into Usage Log Data
Gaining New Insights into Usage Log Data Gaining New Insights into Usage Log Data
Gaining New Insights into Usage Log Data SAS Asia Pacific
 
Predictive Analytics: Advanced techniques in data mining
Predictive Analytics: Advanced techniques in data miningPredictive Analytics: Advanced techniques in data mining
Predictive Analytics: Advanced techniques in data miningSAS Asia Pacific
 
A Journey through the Spatial Data Mining and Geographic Knowledge Discover J...
A Journey through the Spatial Data Mining and Geographic Knowledge Discover J...A Journey through the Spatial Data Mining and Geographic Knowledge Discover J...
A Journey through the Spatial Data Mining and Geographic Knowledge Discover J...SAS Asia Pacific
 
A journey through the spatial data mining and geographic knowledge discovery ...
A journey through the spatial data mining and geographic knowledge discovery ...A journey through the spatial data mining and geographic knowledge discovery ...
A journey through the spatial data mining and geographic knowledge discovery ...SAS Asia Pacific
 

More from SAS Asia Pacific (9)

Produce Analytical Talent to Meet the Industry Needs
Produce Analytical Talent to Meet the Industry NeedsProduce Analytical Talent to Meet the Industry Needs
Produce Analytical Talent to Meet the Industry Needs
 
Better decisions through analytics in healthcare industry. Our journey so far
Better decisions through analytics in healthcare industry.  Our journey so farBetter decisions through analytics in healthcare industry.  Our journey so far
Better decisions through analytics in healthcare industry. Our journey so far
 
How can Analytics Drive Customer Values?
How can Analytics Drive Customer Values?How can Analytics Drive Customer Values?
How can Analytics Drive Customer Values?
 
Developing an Analytical Mindset – Becoming an Analytical Competitor
Developing an Analytical Mindset – Becoming an Analytical CompetitorDeveloping an Analytical Mindset – Becoming an Analytical Competitor
Developing an Analytical Mindset – Becoming an Analytical Competitor
 
Gaining New Insights into Usage Log Data
Gaining New Insights into Usage Log Data Gaining New Insights into Usage Log Data
Gaining New Insights into Usage Log Data
 
Predictive Analytics: Advanced techniques in data mining
Predictive Analytics: Advanced techniques in data miningPredictive Analytics: Advanced techniques in data mining
Predictive Analytics: Advanced techniques in data mining
 
Technology Update
Technology UpdateTechnology Update
Technology Update
 
A Journey through the Spatial Data Mining and Geographic Knowledge Discover J...
A Journey through the Spatial Data Mining and Geographic Knowledge Discover J...A Journey through the Spatial Data Mining and Geographic Knowledge Discover J...
A Journey through the Spatial Data Mining and Geographic Knowledge Discover J...
 
A journey through the spatial data mining and geographic knowledge discovery ...
A journey through the spatial data mining and geographic knowledge discovery ...A journey through the spatial data mining and geographic knowledge discovery ...
A journey through the spatial data mining and geographic knowledge discovery ...
 

Recently uploaded

The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 

Recently uploaded (20)

The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 

Improving the Model’s Predictive Power with Ensemble Approaches

  • 2. KDD Cup 2010: Overview • The Challenge – How generally or narrowly do students learn? How quickly or slowly? Will the rate of improvement vary between students? What does it mean for one problem to be similar to another? – Is it possible to infer the knowledge requirements of problems directly from student performance data, without human analysis of the tasks? – This year's challenge asks you to predict student performance on mathematical problems from logs of student interaction with Intelligent Tutoring Systems.
  • 3. KDD Cup 2010: Results • Winners of KDD Cup 2010: All Teams – First Place: National Taiwan University Feature engineering and classifier ensembling for KDD CUP 2010 – First Runner Up: Zhang and Su Gradient Boosting Machines with Singular Value Decomposition – Second Runner Up: BigChaos @ KDD Collaborative Filtering Applied to Educational Data Mining
  • 4. Outline • What is Ensemble Learning? • Why Ensemble? • How good is Ensemble? • What next?
  • 5. Predictive Modeling • Widely-used in many applications: – Business • Churn modeling, Scoring – Science • Chemometrics – Bio-Science • Efficacy modeling, Classification – Academics • Admission selection, student performance
  • 6. Predictive Modeling New Data Set Training Model Predictive Prediction Set Development Rules
  • 7. Classical Approach: Model Selection Which one is the best?
  • 8. New Approach?: Ensemble Combine all models!!!
  • 9. What is Ensemble? • Single Expert vs Team of Experts
  • 10. What is Ensemble? Data Set Training Set #1 Training Set #2 …… Training Set #k . Learning Learning Learning …… Model #1 Model #2 Model #k . Combiner Ensemble Prediction
  • 11. Types of Ensemble • Hybrid Ensemble – Combining several different learning algorithms into one prediction – e.g: combining the result of regression, tree, neural nets, and support vector machine • Non-Hybrid Ensemble – Combining several learning models from the same algorithm into one prediction
  • 12. Well-Known Ensembles • Bagging – Generate learning models for the bootstrap samples – Aggregate the predictions via averaging or majority-vote • Boosting (AdaBoost) – Generate sequential learning models with higher weight to ‘difficult’ cases – Combine the predictions by concerning the weight • Random Forest – Similar to bagging except the existence of random feature selection for each learning model generation
  • 13. How Good is Ensemble? Error Rate 0.7 tree 0.6 bagging 0.5 adaboost 0.4 0.3 0.2 0.1 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 Source: Dietterich (1999)
  • 14. How Good is Ensemble? AUC 0.9 0.8 CART C45 0.7 Bagging 0.6 Random Forest Rotation Forest 0.5 Rotation Boost 0.4 DIY Bank Telecom1 Mail-order Source: Bock & Poel (2011)
  • 15. What Next • Ensemble Predictive Models • Class-Imbalance Models – Gradient Boosting, EasyEnsemble, BalanceCascade, SMOTE Boost • Robust Predictive Models – Noise Ensemble
  • 18. Bagus Sartono Educational Background Professional Experience • Bachelor of Science in • Lecturer – Dept of Stats Stats – IPB (2000) IPB • Master of Science in • Experienced Trainer in Stats – IPB (2004) Analytics (Bank • PhD in Applied Indonesia, Bank Economics – University of Mandiri, Ganesha Cipta Antwerp (2012) Informatika, CIFOR, LIPI, LPEM-UI, etc)