SlideShare a Scribd company logo
1 of 45
Machine Learning in the Environmental Discipline:
      Statistical Paradigm Shifts, Combining Data Mining
with Geographic Information Systems (GIS), and Educating the
        Next Generation of Environmental Researchers

                                   Falk Huettmann
                                   -EWHALE lab-
                                   419 Irving 1
                                   Inst. of Arctic Biology
                                   Biology & Wildlife Dept
                                   Fairbanks, Alaska 99775 USA
                                   fhuettmann@alaska.edu
Applying and Teaching Machine Learning at Public Institutions

                Whatā€™s at stake ?

-a better pre-screening of data (data mining)
-a better inference (better model fits)
-a better hypothesis (more relevant)
-a faster and more accurate analysis
-a better prediction (pro-active decisions)
    => progress in science & society
             => a new culture
This presentation

-Environmental Sciences (Sustainability)

-Paradigm Shift in the Sciences

-Geographic Information Systems (GIS)

-Teaching Machine Learning at Public
 Institutions
The earth and its atmosphere ...we only have one ...
The atmosphere
Rio Convention 1992 and its goals
The Earth at nightā€¦




     ā€¦people are everywhere these days,
     mostly at the coast, affecting Wildlife,
     Habitats and Ecological Services
An Earth Sampler: The Three Poles
1.

                             3.




                 2.
Three poles: Ice, glaciers and snow

 => A cool (or a hot?) place to be...




1.                      2.              3.
No data ā€¦and ā€œnothing is knownā€ ā€¦?
Climbing and Understanding Data Mountains with Data Mining:
           A free copy of GBIF dataā€¦ www.gbif.org,
      seamap.env.duke.edu/ , mountainbiodiversity.org




   Megascience
Examples of people using and publishing on
Machine Learning in the Biodiversity, Wildlife
and Sustainability Sciencesā€¦
Classify, pre-screen, explain and predict data




                      A heap of data,
                   e.g. online and merged,
                   free of research design


Response ~ Predictor1 + Predictor2 + Predictor3 + Predictor4 +ā€¦.
             (e.g. clustering & multiple regression)
Multiple Regressions: Are they any good ?!
Response ~ Predictor1 + Predictor2 + Predictor3 + Predictor4 +ā€¦.

The issues are for instance:
                -Many predictors, e.g. 30

                -Correlations

                -Interactions (2-way, 3-way and higher)

                -Coefficients + Offset

                -(Multi-) hypothesis testing

                -Model Selection

                -Parsimony
    = > A single formula vs a software code as a result ?!
What statistical philosophies
    and schools exist ?

          -Frequency Statistics
 R. A. Fisher
                                     Occam
                                  (Parsimony)
          -Bayesian Statistics
Bayes




          -Machine Learning

 L. Breiman
What statistical philosophies
    and schools exist ?
  e.g. Frequency Statistics
  -Normal distribution
  -Independence
  -Linear functions
  -Variance explained
  -Parsimony
What statistical philosophies
    and schools exist ?
 e.g. Bayesian Statistics
 -Priors
 -Fit a function
 -Variance explained
 -(alternative) confidence intervals
 -Linear functions (usually)
What statistical philosophies
    and schools exist ?
 e.g. Machine Learning
 -Inherently non-linear
 -No prior assumptions (non-parametric)
 -Fast
 -Computationally intense
What linear models and frequency statistics cannot achieve
             Linear                      Non-Linear
           (~unrealistic)             (driven by data)




       ā€˜Meanā€™                      ā€˜Meanā€™ ?
       SD                          SD ?
                                   Machine Learning, e.g.
        => potentially low r2
                                   CART, TreeNet &
                                   RandomForest
                                   (there are many
                                    other algorithms !)
     Virtually not Pro-Active       Pro-active Potential
                                      ~Pre-cautionary
What machine learning is



                      Machine Learning
                        algorithm




A numerical pattern                     The mimickā€™ed pattern
              Virtually NOT parsimonious (no AIC, no R2 etc etc)
What is a non-linear algorithm ā€¦?


Through the origin vs.
                  Not through the origin


Single formula vs.
                  Many formulas
What is a non-linear algorithm ā€¦?

Through the origin vs.
                   Not through the origin
What is a non-linear algorithm ā€¦?

Single formula vs.
                     Many formulas
What is a non-linear algorithm ā€¦?

Through the origin vs.
                   Not through the origin
Single formula vs.
                  Many formulas


A binary software algorithm that captures
the mathematical relationship vs.
         A single formula with (a) coefficient
Linear regression as an artefact,
aka statistical & institutional terror ?!




    Linear regression: a+ bx vs.
                  Machine Learning?!
Predictions
Predictions as the prime goal, and linked with Adaptive Management
(done in Random Forest)




                                                                   Best data
                                                                   used and
                                                                   available ?




Pelagic King Penguin prediction using RF (french telemetry data from SCAR-MARBIN)
Outliers and variance ā€¦
reality




                 predicted

    Root Mean Square Error (RMSE)
Outliers and variance ā€¦

                   Predicted
          Yes   Yes            No
                80%            20%
Reality




                20%            80%
          No




Confusion Matrix (=> ROC curve)
Moving into Data Mining
                   (non-parametric, rel. high data error)




   A heap of data,
e.g. online and merged,
free of research design

                                  Your classic Frequentist Statistician
                                  might get very scared and horrified
                                  with such dataā€¦
Moving into Data Mining



                                                                         Describe Patterns,




                                        Screening
                                                                              Outliers

                                                                      Data Quantification
                                                                         (~prediction)
                                                                           Informed
   A heap of data,                                                        Traditional
e.g. online and merged                                                      Analysis
    Craig, E., and F. Huettmann. (2008). Using ā€œblackboxā€ algorithms such as           TreeNet and Random
           Forests for data-mining and for finding meaningful patterns, relationships and outliers in complex
           ecological data: an overview, an example using golden eagle satellite data and an outlook for a
           promising future. Chapter IV in Intelligent Data Analysis: Developing New Methodologies through
           Pattern Discovery and Recovery (Hsiao-fan Wang, Ed.). IGI Global, Hershey, PA, USA.
Data
                                         Mining




                                                                         Describe Patterns,




                                        Screening
                                                                              Outliers

                                                                      Data Quantification
                                                                         (~prediction)
                                                                           Informed
   A heap of data,                                                       (Traditional)
e.g. online and merged                                                      Analysis
    Craig, E., and F. Huettmann. (2008). Using ā€œblackboxā€ algorithms such as           TreeNet and Random
           Forests for data-mining and for finding meaningful patterns, relationships and outliers in complex
           ecological data: an overview, an example using golden eagle satellite data and an outlook for a
           promising future. Chapter IV in Intelligent Data Analysis: Developing New Methodologies through
           Pattern Discovery and Recovery (Hsiao-fan Wang, Ed.). IGI Global, Hershey, PA, USA.

    Magness, D.R., F. Huettmann, and J.M. Morton. (2008). Using Random Forests to provide predicted
        species distribution maps as a metric for ecological inventory & monitoring programs. Pages 209-
        229 in T.G. Smolinski, M.G. Milanova & A-E. Hassanien (eds.). Applications of Computational
        Intelligence in Biology: Current Trends and Open Problems. Studies in Computational Intelligence,
        Vol. 122, Springer-Verlag Berlin Heidelberg.
Data
                                                                          ā‡’     SOFTWARE
                                         Mining
                                                                           This becomes a standard,
                                                                           e.g. BEST PROFESSIONAL
                                                                               PRACTICE

                                                                         Describe Patterns,




                                        Screening
                                                                              Outliers

                                                                      Data Quantification
                                                                         (~prediction)
                                                                           Informed
   A heap of data,                                                       (Traditional)
e.g. online and merged                                                      Analysis
    Craig, E., and F. Huettmann. (2008). Using ā€œblackboxā€ algorithms such as           TreeNet and Random
           Forests for data-mining and for finding meaningful patterns, relationships and outliers in complex
           ecological data: an overview, an example using golden eagle satellite data and an outlook for a
           promising future. Chapter IV in Intelligent Data Analysis: Developing New Methodologies through
           Pattern Discovery and Recovery (Hsiao-fan Wang, Ed.). IGI Global, Hershey, PA, USA.

    Magness, D.R., F. Huettmann, and J.M. Morton. (2008). Using Random Forests to provide predicted
        species distribution maps as a metric for ecological inventory & monitoring programs. Pages 209-
        229 in T.G. Smolinski, M.G. Milanova & A-E. Hassanien (eds.). Applications of Computational
        Intelligence in Biology: Current Trends and Open Problems. Studies in Computational Intelligence,
        Vol. 122, Springer-Verlag Berlin Heidelberg.
(Adaptive/Science-based)
      Wildlife Management:
         How it worksā€¦
-Identify a problem

-Do the science

-Provide a management suggestion

-Implement science-based management

-Assess and fine-tune

 =>Pro-active (before trouble occurs)
   => Predictive in Space and Time
Paradigm Shifts
-Data Mining as a means to inform an analysis

-Predictions as a goal in itself

-Linear regressions and algorithms as a poor performer
 for modern problems of relevance to mankind

-Metrics like r2, p-values and AIC as being suboptimal

-Statistics based on (very advanced) software

-Decision-support systems, Pro-Active Management
 and Operations Research
RandomForest as a scientific discipline ?!
-a big and new research field per se (e.g. machine learning,
                                           ensemble models)

-comes naturally with computing power and the internet

-changes science (and fieldwork, e.g. design-based)

-changes education (e.g. universities & stats depts)

-changes pro-active/adaptive management, multi-species

-changes society

-affects land- and seascapes
Adding Spatial Complexity:
         What is a GIS
(Geographic Information System) ?
   ā€œComputerized Mappingā€

   Consists of Hardware & Software

   (Monitor, PC and Database)

   ArcGIS by ESRI as a major software suite

   =>Can be linked with Machine Learning,
     e.g. for predictions, classifications and
          data mining
What is a GIS
(Geographic Information System) ?
RandomForest & GIS:
            Mapping Supervised & Unsupervised Classification

Supervised Classification: -Multiple Regression (classification or continuous)

                             -Multiple Response
                              e.g. YAIMPUTE
                                                              RandomForest

Unsupervised Classification: 1. Proximity Matrix via Bagging/Voting (RF)

                                 2. Similarity Matrix

                                 3. e.g. Regular Clustering (mclust, PAM)

                                3. Visualize Result
11 Cliome Clusters (RF)
Climate Cluster Data, Canada & Alaska




                         Credit: M. Lindgren et al.
Teaching Machine Learning
            Whatā€™s the issue ?

-the current teaching concept re. math & stats

-wider introduction of machine learning

-parsimony vs. holistic analysis and approach

-better predictions
Teaching math, statistics and
     machine learningā€¦


Usually, machine learning starts where
maximum-likelihood theory endsā€¦

=>maximum-likelihood starts at collegeā€¦

 But it does not have to be that way!
Teaching math, statistics and
          machine learningā€¦
Studentā€™s age       Topic learned

16                  Basic Math

5                 Frequency Statistics
                     (Maximum Likelihood)

23 - 30              Machine Learning
Teaching math, statistics and
   machine learningā€¦the better way
Studentā€™s age   Topic learned
  15             Basic Math
  16             Frequency Statistics
  18             (Maximum Likelihood)
  19             Machine Learning

                 Why & How done ?

                 via Greybox   Blackbox
Teaching math, statistics and
         machine learningā€¦
Parsimony (complexity   vs. variance explained)

    Model name   AIC etc
    Model 1       1000
    Model 2        900
    Model 3        600
    Model 4        500
    Model 5       1200      Breaking down a complex
    Model 6       1100      real-world problem into a single
                            linear termā€¦and for a
                            mechanistic understandingā€¦??
Teaching math, statistics and
         machine learningā€¦
Predictions (knowing things ahead of time)

         For spatial predictions
         For temporal predictions
         For a pro-active management
Toā€¦Salford Systems,
Dan Steinberg and his team,
our students, L. Strecker,
S. Linke, and many many
project and research collaborators
world-wide over the years!!



                           Questions ?

More Related Content

What's hot

A tour of the top 10 algorithms for machine learning newbies
A tour of the top 10 algorithms for machine learning newbiesA tour of the top 10 algorithms for machine learning newbies
A tour of the top 10 algorithms for machine learning newbies
Vimal Gupta
Ā 

What's hot (8)

An introduction to neural networks
An introduction to neural networksAn introduction to neural networks
An introduction to neural networks
Ā 
From Knowledge Bases to Knowledge Infrastructures for Intelligent Systems
From Knowledge Bases to Knowledge Infrastructures for Intelligent SystemsFrom Knowledge Bases to Knowledge Infrastructures for Intelligent Systems
From Knowledge Bases to Knowledge Infrastructures for Intelligent Systems
Ā 
Statistics For Data Science | Statistics Using R Programming Language | Hypot...
Statistics For Data Science | Statistics Using R Programming Language | Hypot...Statistics For Data Science | Statistics Using R Programming Language | Hypot...
Statistics For Data Science | Statistics Using R Programming Language | Hypot...
Ā 
A tour of the top 10 algorithms for machine learning newbies
A tour of the top 10 algorithms for machine learning newbiesA tour of the top 10 algorithms for machine learning newbies
A tour of the top 10 algorithms for machine learning newbies
Ā 
Introduction to Mahout and Machine Learning
Introduction to Mahout and Machine LearningIntroduction to Mahout and Machine Learning
Introduction to Mahout and Machine Learning
Ā 
Data analytcis-first-steps
Data analytcis-first-stepsData analytcis-first-steps
Data analytcis-first-steps
Ā 
An introduction to neural network
An introduction to neural networkAn introduction to neural network
An introduction to neural network
Ā 
Data analytics beyond data processing and how it affects Industry 4.0
Data analytics beyond data processing and how it affects Industry 4.0Data analytics beyond data processing and how it affects Industry 4.0
Data analytics beyond data processing and how it affects Industry 4.0
Ā 

Similar to Paradigm shifts in wildlife and biodiversity management through machine learning

Chapter 1. Introduction
Chapter 1. IntroductionChapter 1. Introduction
Chapter 1. Introduction
butest
Ā 
Global Modeling of Biodiversity and Climate Change
Global Modeling of Biodiversity and Climate ChangeGlobal Modeling of Biodiversity and Climate Change
Global Modeling of Biodiversity and Climate Change
Salford Systems
Ā 
Presentation on Machine Learning and Data Mining
Presentation on Machine Learning and Data MiningPresentation on Machine Learning and Data Mining
Presentation on Machine Learning and Data Mining
butest
Ā 

Similar to Paradigm shifts in wildlife and biodiversity management through machine learning (20)

Data mining
Data mining Data mining
Data mining
Ā 
Techniques Machine Learning
Techniques Machine LearningTechniques Machine Learning
Techniques Machine Learning
Ā 
Chapter 1. Introduction
Chapter 1. IntroductionChapter 1. Introduction
Chapter 1. Introduction
Ā 
Knowledge discovery thru data mining
Knowledge discovery thru data miningKnowledge discovery thru data mining
Knowledge discovery thru data mining
Ā 
Global Modeling of Biodiversity and Climate Change
Global Modeling of Biodiversity and Climate ChangeGlobal Modeling of Biodiversity and Climate Change
Global Modeling of Biodiversity and Climate Change
Ā 
Computer Vision, Computation, and Geometry
Computer Vision, Computation, and GeometryComputer Vision, Computation, and Geometry
Computer Vision, Computation, and Geometry
Ā 
Python for data science
Python for data sciencePython for data science
Python for data science
Ā 
High Dimensional Biological Data Analysis and Visualization
High Dimensional Biological Data Analysis and VisualizationHigh Dimensional Biological Data Analysis and Visualization
High Dimensional Biological Data Analysis and Visualization
Ā 
Presentation on Machine Learning and Data Mining
Presentation on Machine Learning and Data MiningPresentation on Machine Learning and Data Mining
Presentation on Machine Learning and Data Mining
Ā 
La rĆ©solution de problĆØmes Ć  l'aide de graphes
La rĆ©solution de problĆØmes Ć  l'aide de graphesLa rĆ©solution de problĆØmes Ć  l'aide de graphes
La rĆ©solution de problĆØmes Ć  l'aide de graphes
Ā 
A Primer on Entity Resolution
A Primer on Entity ResolutionA Primer on Entity Resolution
A Primer on Entity Resolution
Ā 
Big Data with Rough Set Using Map- Reduce
Big Data with Rough Set Using Map- ReduceBig Data with Rough Set Using Map- Reduce
Big Data with Rough Set Using Map- Reduce
Ā 
machine learning in the age of big data: new approaches and business applicat...
machine learning in the age of big data: new approaches and business applicat...machine learning in the age of big data: new approaches and business applicat...
machine learning in the age of big data: new approaches and business applicat...
Ā 
Data Science and Analytics Brown Bag
Data Science and Analytics Brown BagData Science and Analytics Brown Bag
Data Science and Analytics Brown Bag
Ā 
Data Tactics Data Science Brown Bag (April 2014)
Data Tactics Data Science Brown Bag (April 2014)Data Tactics Data Science Brown Bag (April 2014)
Data Tactics Data Science Brown Bag (April 2014)
Ā 
Big Data [sorry] & Data Science: What Does a Data Scientist Do?
Big Data [sorry] & Data Science: What Does a Data Scientist Do?Big Data [sorry] & Data Science: What Does a Data Scientist Do?
Big Data [sorry] & Data Science: What Does a Data Scientist Do?
Ā 
Machine Learning and Real-World Applications
Machine Learning and Real-World ApplicationsMachine Learning and Real-World Applications
Machine Learning and Real-World Applications
Ā 
Polikar10missing
Polikar10missingPolikar10missing
Polikar10missing
Ā 
Machine learning pour les donnĆ©es massives algorithmes randomisĀ“es, en ligne ...
Machine learning pour les donnĆ©es massives algorithmes randomisĀ“es, en ligne ...Machine learning pour les donnĆ©es massives algorithmes randomisĀ“es, en ligne ...
Machine learning pour les donnĆ©es massives algorithmes randomisĀ“es, en ligne ...
Ā 
UNIT1-2.pptx
UNIT1-2.pptxUNIT1-2.pptx
UNIT1-2.pptx
Ā 

More from Salford Systems

Evolution of regression ols to gps to mars
Evolution of regression   ols to gps to marsEvolution of regression   ols to gps to mars
Evolution of regression ols to gps to mars
Salford Systems
Ā 
Comparison of statistical methods commonly used in predictive modeling
Comparison of statistical methods commonly used in predictive modelingComparison of statistical methods commonly used in predictive modeling
Comparison of statistical methods commonly used in predictive modeling
Salford Systems
Ā 
Molecular data mining tool advances in hiv
Molecular data mining tool  advances in hivMolecular data mining tool  advances in hiv
Molecular data mining tool advances in hiv
Salford Systems
Ā 
Hybrid cart logit model 1998
Hybrid cart logit model 1998Hybrid cart logit model 1998
Hybrid cart logit model 1998
Salford Systems
Ā 

More from Salford Systems (20)

Datascience101presentation4
Datascience101presentation4Datascience101presentation4
Datascience101presentation4
Ā 
Improve Your Regression with CART and RandomForests
Improve Your Regression with CART and RandomForestsImprove Your Regression with CART and RandomForests
Improve Your Regression with CART and RandomForests
Ā 
Improved Predictions in Structure Based Drug Design Using Cart and Bayesian M...
Improved Predictions in Structure Based Drug Design Using Cart and Bayesian M...Improved Predictions in Structure Based Drug Design Using Cart and Bayesian M...
Improved Predictions in Structure Based Drug Design Using Cart and Bayesian M...
Ā 
Churn Modeling-For-Mobile-Telecommunications
Churn Modeling-For-Mobile-Telecommunications Churn Modeling-For-Mobile-Telecommunications
Churn Modeling-For-Mobile-Telecommunications
Ā 
The Do's and Don'ts of Data Mining
The Do's and Don'ts of Data MiningThe Do's and Don'ts of Data Mining
The Do's and Don'ts of Data Mining
Ā 
Introduction to Random Forests by Dr. Adele Cutler
Introduction to Random Forests by Dr. Adele CutlerIntroduction to Random Forests by Dr. Adele Cutler
Introduction to Random Forests by Dr. Adele Cutler
Ā 
9 Data Mining Challenges From Data Scientists Like You
9 Data Mining Challenges From Data Scientists Like You9 Data Mining Challenges From Data Scientists Like You
9 Data Mining Challenges From Data Scientists Like You
Ā 
Statistically Significant Quotes To Remember
Statistically Significant Quotes To RememberStatistically Significant Quotes To Remember
Statistically Significant Quotes To Remember
Ā 
Using CART For Beginners with A Teclo Example Dataset
Using CART For Beginners with A Teclo Example DatasetUsing CART For Beginners with A Teclo Example Dataset
Using CART For Beginners with A Teclo Example Dataset
Ā 
CART Classification and Regression Trees Experienced User Guide
CART Classification and Regression Trees Experienced User GuideCART Classification and Regression Trees Experienced User Guide
CART Classification and Regression Trees Experienced User Guide
Ā 
Evolution of regression ols to gps to mars
Evolution of regression   ols to gps to marsEvolution of regression   ols to gps to mars
Evolution of regression ols to gps to mars
Ā 
Data Mining for Higher Education
Data Mining for Higher EducationData Mining for Higher Education
Data Mining for Higher Education
Ā 
Comparison of statistical methods commonly used in predictive modeling
Comparison of statistical methods commonly used in predictive modelingComparison of statistical methods commonly used in predictive modeling
Comparison of statistical methods commonly used in predictive modeling
Ā 
Molecular data mining tool advances in hiv
Molecular data mining tool  advances in hivMolecular data mining tool  advances in hiv
Molecular data mining tool advances in hiv
Ā 
TreeNet Tree Ensembles & CART Decision Trees: A Winning Combination
TreeNet Tree Ensembles & CART Decision Trees:  A Winning CombinationTreeNet Tree Ensembles & CART Decision Trees:  A Winning Combination
TreeNet Tree Ensembles & CART Decision Trees: A Winning Combination
Ā 
SPM v7.0 Feature Matrix
SPM v7.0 Feature MatrixSPM v7.0 Feature Matrix
SPM v7.0 Feature Matrix
Ā 
SPM User's Guide: Introducing MARS
SPM User's Guide: Introducing MARSSPM User's Guide: Introducing MARS
SPM User's Guide: Introducing MARS
Ā 
Hybrid cart logit model 1998
Hybrid cart logit model 1998Hybrid cart logit model 1998
Hybrid cart logit model 1998
Ā 
Session Logs Tutorial for SPM
Session Logs Tutorial for SPMSession Logs Tutorial for SPM
Session Logs Tutorial for SPM
Ā 
Some of the new features in SPM 7
Some of the new features in SPM 7Some of the new features in SPM 7
Some of the new features in SPM 7
Ā 

Recently uploaded

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(ā˜Žļø+971_581248768%)**%*]'#abortion pills for sale in dubai@
Ā 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
Ā 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
Ā 

Recently uploaded (20)

Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
Ā 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
Ā 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Ā 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
Ā 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
Ā 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
Ā 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
Ā 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
Ā 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
Ā 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
Ā 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
Ā 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Ā 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Ā 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
Ā 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
Ā 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
Ā 
Finology Group ā€“ Insurtech Innovation Award 2024
Finology Group ā€“ Insurtech Innovation Award 2024Finology Group ā€“ Insurtech Innovation Award 2024
Finology Group ā€“ Insurtech Innovation Award 2024
Ā 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
Ā 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
Ā 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Ā 

Paradigm shifts in wildlife and biodiversity management through machine learning

  • 1. Machine Learning in the Environmental Discipline: Statistical Paradigm Shifts, Combining Data Mining with Geographic Information Systems (GIS), and Educating the Next Generation of Environmental Researchers Falk Huettmann -EWHALE lab- 419 Irving 1 Inst. of Arctic Biology Biology & Wildlife Dept Fairbanks, Alaska 99775 USA fhuettmann@alaska.edu
  • 2. Applying and Teaching Machine Learning at Public Institutions Whatā€™s at stake ? -a better pre-screening of data (data mining) -a better inference (better model fits) -a better hypothesis (more relevant) -a faster and more accurate analysis -a better prediction (pro-active decisions) => progress in science & society => a new culture
  • 3. This presentation -Environmental Sciences (Sustainability) -Paradigm Shift in the Sciences -Geographic Information Systems (GIS) -Teaching Machine Learning at Public Institutions
  • 4. The earth and its atmosphere ...we only have one ...
  • 6. Rio Convention 1992 and its goals
  • 7. The Earth at nightā€¦ ā€¦people are everywhere these days, mostly at the coast, affecting Wildlife, Habitats and Ecological Services
  • 8. An Earth Sampler: The Three Poles 1. 3. 2.
  • 9. Three poles: Ice, glaciers and snow => A cool (or a hot?) place to be... 1. 2. 3.
  • 10. No data ā€¦and ā€œnothing is knownā€ ā€¦? Climbing and Understanding Data Mountains with Data Mining: A free copy of GBIF dataā€¦ www.gbif.org, seamap.env.duke.edu/ , mountainbiodiversity.org Megascience
  • 11. Examples of people using and publishing on Machine Learning in the Biodiversity, Wildlife and Sustainability Sciencesā€¦
  • 12. Classify, pre-screen, explain and predict data A heap of data, e.g. online and merged, free of research design Response ~ Predictor1 + Predictor2 + Predictor3 + Predictor4 +ā€¦. (e.g. clustering & multiple regression)
  • 13. Multiple Regressions: Are they any good ?! Response ~ Predictor1 + Predictor2 + Predictor3 + Predictor4 +ā€¦. The issues are for instance: -Many predictors, e.g. 30 -Correlations -Interactions (2-way, 3-way and higher) -Coefficients + Offset -(Multi-) hypothesis testing -Model Selection -Parsimony = > A single formula vs a software code as a result ?!
  • 14. What statistical philosophies and schools exist ? -Frequency Statistics R. A. Fisher Occam (Parsimony) -Bayesian Statistics Bayes -Machine Learning L. Breiman
  • 15. What statistical philosophies and schools exist ? e.g. Frequency Statistics -Normal distribution -Independence -Linear functions -Variance explained -Parsimony
  • 16. What statistical philosophies and schools exist ? e.g. Bayesian Statistics -Priors -Fit a function -Variance explained -(alternative) confidence intervals -Linear functions (usually)
  • 17. What statistical philosophies and schools exist ? e.g. Machine Learning -Inherently non-linear -No prior assumptions (non-parametric) -Fast -Computationally intense
  • 18. What linear models and frequency statistics cannot achieve Linear Non-Linear (~unrealistic) (driven by data) ā€˜Meanā€™ ā€˜Meanā€™ ? SD SD ? Machine Learning, e.g. => potentially low r2 CART, TreeNet & RandomForest (there are many other algorithms !) Virtually not Pro-Active Pro-active Potential ~Pre-cautionary
  • 19. What machine learning is Machine Learning algorithm A numerical pattern The mimickā€™ed pattern Virtually NOT parsimonious (no AIC, no R2 etc etc)
  • 20. What is a non-linear algorithm ā€¦? Through the origin vs. Not through the origin Single formula vs. Many formulas
  • 21. What is a non-linear algorithm ā€¦? Through the origin vs. Not through the origin
  • 22. What is a non-linear algorithm ā€¦? Single formula vs. Many formulas
  • 23. What is a non-linear algorithm ā€¦? Through the origin vs. Not through the origin Single formula vs. Many formulas A binary software algorithm that captures the mathematical relationship vs. A single formula with (a) coefficient
  • 24. Linear regression as an artefact, aka statistical & institutional terror ?! Linear regression: a+ bx vs. Machine Learning?!
  • 25. Predictions Predictions as the prime goal, and linked with Adaptive Management (done in Random Forest) Best data used and available ? Pelagic King Penguin prediction using RF (french telemetry data from SCAR-MARBIN)
  • 26. Outliers and variance ā€¦ reality predicted Root Mean Square Error (RMSE)
  • 27. Outliers and variance ā€¦ Predicted Yes Yes No 80% 20% Reality 20% 80% No Confusion Matrix (=> ROC curve)
  • 28. Moving into Data Mining (non-parametric, rel. high data error) A heap of data, e.g. online and merged, free of research design Your classic Frequentist Statistician might get very scared and horrified with such dataā€¦
  • 29. Moving into Data Mining Describe Patterns, Screening Outliers Data Quantification (~prediction) Informed A heap of data, Traditional e.g. online and merged Analysis Craig, E., and F. Huettmann. (2008). Using ā€œblackboxā€ algorithms such as TreeNet and Random Forests for data-mining and for finding meaningful patterns, relationships and outliers in complex ecological data: an overview, an example using golden eagle satellite data and an outlook for a promising future. Chapter IV in Intelligent Data Analysis: Developing New Methodologies through Pattern Discovery and Recovery (Hsiao-fan Wang, Ed.). IGI Global, Hershey, PA, USA.
  • 30. Data Mining Describe Patterns, Screening Outliers Data Quantification (~prediction) Informed A heap of data, (Traditional) e.g. online and merged Analysis Craig, E., and F. Huettmann. (2008). Using ā€œblackboxā€ algorithms such as TreeNet and Random Forests for data-mining and for finding meaningful patterns, relationships and outliers in complex ecological data: an overview, an example using golden eagle satellite data and an outlook for a promising future. Chapter IV in Intelligent Data Analysis: Developing New Methodologies through Pattern Discovery and Recovery (Hsiao-fan Wang, Ed.). IGI Global, Hershey, PA, USA. Magness, D.R., F. Huettmann, and J.M. Morton. (2008). Using Random Forests to provide predicted species distribution maps as a metric for ecological inventory & monitoring programs. Pages 209- 229 in T.G. Smolinski, M.G. Milanova & A-E. Hassanien (eds.). Applications of Computational Intelligence in Biology: Current Trends and Open Problems. Studies in Computational Intelligence, Vol. 122, Springer-Verlag Berlin Heidelberg.
  • 31. Data ā‡’ SOFTWARE Mining This becomes a standard, e.g. BEST PROFESSIONAL PRACTICE Describe Patterns, Screening Outliers Data Quantification (~prediction) Informed A heap of data, (Traditional) e.g. online and merged Analysis Craig, E., and F. Huettmann. (2008). Using ā€œblackboxā€ algorithms such as TreeNet and Random Forests for data-mining and for finding meaningful patterns, relationships and outliers in complex ecological data: an overview, an example using golden eagle satellite data and an outlook for a promising future. Chapter IV in Intelligent Data Analysis: Developing New Methodologies through Pattern Discovery and Recovery (Hsiao-fan Wang, Ed.). IGI Global, Hershey, PA, USA. Magness, D.R., F. Huettmann, and J.M. Morton. (2008). Using Random Forests to provide predicted species distribution maps as a metric for ecological inventory & monitoring programs. Pages 209- 229 in T.G. Smolinski, M.G. Milanova & A-E. Hassanien (eds.). Applications of Computational Intelligence in Biology: Current Trends and Open Problems. Studies in Computational Intelligence, Vol. 122, Springer-Verlag Berlin Heidelberg.
  • 32. (Adaptive/Science-based) Wildlife Management: How it worksā€¦ -Identify a problem -Do the science -Provide a management suggestion -Implement science-based management -Assess and fine-tune =>Pro-active (before trouble occurs) => Predictive in Space and Time
  • 33. Paradigm Shifts -Data Mining as a means to inform an analysis -Predictions as a goal in itself -Linear regressions and algorithms as a poor performer for modern problems of relevance to mankind -Metrics like r2, p-values and AIC as being suboptimal -Statistics based on (very advanced) software -Decision-support systems, Pro-Active Management and Operations Research
  • 34. RandomForest as a scientific discipline ?! -a big and new research field per se (e.g. machine learning, ensemble models) -comes naturally with computing power and the internet -changes science (and fieldwork, e.g. design-based) -changes education (e.g. universities & stats depts) -changes pro-active/adaptive management, multi-species -changes society -affects land- and seascapes
  • 35. Adding Spatial Complexity: What is a GIS (Geographic Information System) ? ā€œComputerized Mappingā€ Consists of Hardware & Software (Monitor, PC and Database) ArcGIS by ESRI as a major software suite =>Can be linked with Machine Learning, e.g. for predictions, classifications and data mining
  • 36. What is a GIS (Geographic Information System) ?
  • 37. RandomForest & GIS: Mapping Supervised & Unsupervised Classification Supervised Classification: -Multiple Regression (classification or continuous) -Multiple Response e.g. YAIMPUTE RandomForest Unsupervised Classification: 1. Proximity Matrix via Bagging/Voting (RF) 2. Similarity Matrix 3. e.g. Regular Clustering (mclust, PAM) 3. Visualize Result
  • 38. 11 Cliome Clusters (RF) Climate Cluster Data, Canada & Alaska Credit: M. Lindgren et al.
  • 39. Teaching Machine Learning Whatā€™s the issue ? -the current teaching concept re. math & stats -wider introduction of machine learning -parsimony vs. holistic analysis and approach -better predictions
  • 40. Teaching math, statistics and machine learningā€¦ Usually, machine learning starts where maximum-likelihood theory endsā€¦ =>maximum-likelihood starts at collegeā€¦ But it does not have to be that way!
  • 41. Teaching math, statistics and machine learningā€¦ Studentā€™s age Topic learned 16 Basic Math 5 Frequency Statistics (Maximum Likelihood) 23 - 30 Machine Learning
  • 42. Teaching math, statistics and machine learningā€¦the better way Studentā€™s age Topic learned 15 Basic Math 16 Frequency Statistics 18 (Maximum Likelihood) 19 Machine Learning Why & How done ? via Greybox Blackbox
  • 43. Teaching math, statistics and machine learningā€¦ Parsimony (complexity vs. variance explained) Model name AIC etc Model 1 1000 Model 2 900 Model 3 600 Model 4 500 Model 5 1200 Breaking down a complex Model 6 1100 real-world problem into a single linear termā€¦and for a mechanistic understandingā€¦??
  • 44. Teaching math, statistics and machine learningā€¦ Predictions (knowing things ahead of time) For spatial predictions For temporal predictions For a pro-active management
  • 45. Toā€¦Salford Systems, Dan Steinberg and his team, our students, L. Strecker, S. Linke, and many many project and research collaborators world-wide over the years!! Questions ?