Data Mining, Truth, Justice, the American Way,
      and the Flying Spaghetti Monster




               tim@menzies.us Ph.D.
             LCSEE, WVU, 20 Sept 2007
Expose, and hose


• "Part of education is to expose people to different schools of thought."
   - President George Bush, August 1, 2005
• "Part of science is to expose people to the critical and continual (re)evaluation of ideas."
   - Some guy called Timm, September 20, 2007




                                                             2
"Look up in the sky! It's a bird! It's a plane! It's Superman!"

"Yes, it's Superman, strange visitor from another planet who came to Earth with powers and abilities far beyond those of mortal men."

"Superman, who can change the course of mighty rivers, bend steel in his bare hands; and who, disguised as Clark Kent, mild-mannered reporter for a great metropolitan newspaper, fights a never ending battle for truth, justice, and the American way."

   How to find truth?
   How to ensure justice?
   Why a never-ending battle?
   How to make lottsa $$ ?
                                                                  3
So, tonight
     Notions of certainty
 
         Standards for debate
     

     Surprises
 
         Nothing is “truth”
     
              but many more things are false
          

         And some things are useful
     

     Implications for humility
 
         And for justice
     




                                               4
God gave me a brain.
    I take it (s)he wants me to use it.
    Mark of the rational

         while not dead; do
     
              Review and revise assumptions;
          

         Done
     



    Entertain a wide range of ideas

         But don’t necessarily accept them
     



    Demand evidence

         that lets you repeat / refute / improve
     
         prior conclusions

    But what of faith?

         That, is another talk
     

         There is room for the
     
         divine in my universe
         But in my test tubes?
     
              Not too much
          

                                                  5
Data miners: agents that automate the
    creation and review of new ideas
Mountains of data:

@relation weather.symbolic
@attribute outlook {sunny, overcast, rainy}
@attribute temperature {hot, mild, cool}
@attribute humidity {high, normal}
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}
@data
sunny,hot,high,FALSE,no
sunny,hot,high,TRUE,no
overcast,hot,high,FALSE,yes
rainy,mild,high,FALSE,yes
rainy,cool,normal,FALSE,yes
rainy,cool,normal,TRUE,no
overcast,cool,normal,TRUE,yes
sunny,mild,high,FALSE,no
sunny,cool,normal,FALSE,yes
rainy,mild,normal,FALSE,yes
sunny,mild,normal,TRUE,yes
overcast,mild,high,TRUE,yes
overcast,hot,normal,FALSE,yes
rainy,mild,high,TRUE,no

Tablespoons of knowledge:

outlook = sunny
|   humidity = high: no
|   humidity = normal: yes
outlook = overcast: yes
outlook = rainy
|   windy = TRUE: no
|   windy = FALSE: yes
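
The slide does not say which learner produced the tree. As a minimal sketch, assuming a plain ID3-style learner that splits on the attribute with the highest information gain, the following Python reproduces the same tree from the 14 weather rows. The names `entropy`, `info_gain`, and `id3` are illustrative choices, not taken from the talk.

```python
import math
from collections import Counter

ATTRS = ["outlook", "temperature", "humidity", "windy"]
ROWS = [line.split(",") for line in """
sunny,hot,high,FALSE,no
sunny,hot,high,TRUE,no
overcast,hot,high,FALSE,yes
rainy,mild,high,FALSE,yes
rainy,cool,normal,FALSE,yes
rainy,cool,normal,TRUE,no
overcast,cool,normal,TRUE,yes
sunny,mild,high,FALSE,no
sunny,cool,normal,FALSE,yes
rainy,mild,normal,FALSE,yes
sunny,mild,normal,TRUE,yes
overcast,mild,high,TRUE,yes
overcast,hot,normal,FALSE,yes
rainy,mild,high,TRUE,no
""".split()]

def entropy(rows):
    """Entropy (in bits) of the class label, stored in the last column."""
    counts = Counter(r[-1] for r in rows)
    return -sum(c / len(rows) * math.log2(c / len(rows)) for c in counts.values())

def split(rows, col):
    """Group rows by their value in column `col`."""
    groups = {}
    for r in rows:
        groups.setdefault(r[col], []).append(r)
    return groups

def info_gain(rows, col):
    """Entropy before the split minus the weighted entropy after it."""
    after = sum(len(g) / len(rows) * entropy(g) for g in split(rows, col).values())
    return entropy(rows) - after

def id3(rows, cols):
    """Return a class label (leaf) or a pair (attribute name, {value: subtree})."""
    classes = Counter(r[-1] for r in rows)
    if len(classes) == 1 or not cols:
        return classes.most_common(1)[0][0]
    best = max(cols, key=lambda c: info_gain(rows, c))
    rest = [c for c in cols if c != best]
    return (ATTRS[best], {v: id3(g, rest) for v, g in split(rows, best).items()})

TREE = id3(ROWS, [0, 1, 2, 3])
```

Swapping the split criterion (the "evaluation bias") or adding pruning can change the tree that comes out of the same data, which is the point the following slides make.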


                                                                                6
Data doubling every 20 months
    Internet, Radio Frequency Identification (RFID) tracking, on-line

    shopping (patterns of sales tracked at Amazon)

    So now we can automatically learn answers to many questions; e.g.

         What eggs to select for IVF?
     
         What will software cost to develop?
     
         What diseases does a patient have?
     
         Which loan applications to fund?
     
         What houses will have the best resale value?
     
         Which parts of the program need more inspection?
     
         What products are best to sell to what markets?
     
         What cows to keep and which to send to the abattoir ?
     
         How to teach a satellite to distinguish between cloud shadows and oil
     
         spills?
         How much electricity will be needed in two hours
     
              i.e. what coal-powered generators to fire up?
          
                                                                                 7
More fundamentally, what can we say
 about the world, with any certainty?
    Same data, different data miners

        different conclusions
    

    Every miner biased by

        Evaluation bias
    
        Language
    
             What is the “shape” of the
         
             models we can learn?
             Decision trees, equations, etc
         

        Search
    
             Pruning the possible infinite
         
             space of candidate models
             What not to explore
         

        Over-fitting avoidance
    
             How to stop the learner fixating on noise
         

             E.g. pruning back decision trees
         


                                                         8
Any learning scheme
    has many biases
•   Bias lets us ignore “stuff”.
•   Without it, we don’t know
    what is important or dull, we
    can’t summarize, generalize.
•   Without bias, we can’t
    learn from the past
•   Bias blinds us but
    lets us see the future
•   But changing biases changes what
    we best believe
•   No wonder truth is a
    never-ending battle




                                    9
Generalizing from
                  the past, works
    Sometimes, very clearly

        Heavy smokers have
    
        2000% to 3000%
        higher chance of lung
        cancer
    Learned theories

    perform very well on
    new data
    But ...

        the “best” learned theory
    
        can be a moveable feast.



                                     10
So, a relativistic soup?

    No certainty?


    No way to plan effective actions?


    No way to rule out absurd notions?





                                         11
I don’t want to offend
                  any one, but…
    … I think that once …
        there were no cell phones or iPods, or clothes, or countries, or language,
        or human society, or 4-valved hearts, or homeostasis, or organs, or brains,
        or planets, or stars, or matter
    Where the net energy in-flow is positive…
        the universe selects for self-perpetuating systems,
        an exponentially decreasing number of which are of
        exponentially increasing complexity

    Should I even say this in a public place?
        "Part of education is to expose people to different schools of thought."
            President George Bush, August 1, 2005
    Shouldn't I have to give credence to all theories?
        Evolution,
        Intelligent design
        Pirates cause global warming?
                                                                                     12
The Church of the Flying
         Spaghetti Monster (FSM)




    Founded in 2005


         OSU physics graduate Bobby Henderson
     

    A protest against the decision by the Kansas State Board of Education


         That required the teaching of intelligent design as an alternative to biological evolution.
     

    Henderson wrote to the board


         professing belief in a supernatural Creator called the Flying Spaghetti Monster
     

         Demanded that his "Pastafarian" theory of creation be taught in science classrooms.
     
                                                                                                      13
FSM is not about religion
    It is a mistake to view FSM as anti-religion

        Rather, FSM is anti-anti-scientific rigor
    

    No one in their right mind would ever

    believe this nonsense
        And that’s the point
    

    Truth is a never-ending battle

        We must have standards to assess scientific
    
        theories, to reject absurdities
        Or any nonsense can be released on this world
    
             E.g. “Global warming is caused by pirates.”
         



                                                           14
Wikipedia on FSM
    FSM: an invisible, undetectable Flying Spaghetti Monster

    Evidence for evolution planted by FSM to test Pastafarians' faith

    FSM changes the results of measurements, like radiocarbon
    dating, via His Noodly Appendage.

    Heaven contains beer volcanoes and a stripper factory.

    Hell is similar, but with stale beer and diseased strippers.

    Pirates are "absolute divine beings" and the original Pastafarians.

    Their image as "thieves and outcasts" is misinformation spread
    by Christian theologians in the Middle Ages and Hare Krishnas.

    Pirates are "peace-loving explorers and spreaders of good will"
    who distributed candy to small children.

    Global warming, earthquakes, hurricanes, and other natural
    disasters are a direct effect of the shrinking numbers of pirates
    since the 1800s.

                                                                                 15
FSM “proof” of the
             divinity of pirates
                                            A case study on how
                                            not to present data

                                            X-axis deliberately
                                            misleading.




Crazy? Yes!
  • But would you recognize such craziness if you saw it again?

                                                                  16
What is the “best” weight-loss diet?




                                       17
How lucky for those in power
that people don't think.

- Adolf Hitler




       i.e. people trying to
       sell you their diet book
What is the “best”
programming language?




                        19
To our peril, we trust
                 old ideas too much
    Columbia ice strike:

         Size: 1200 in³,
     

         Speed: 477 mph
     
         (relative to vehicle)

    Certified as “safe” by the

    CRATER micro-meteorite
    model
         A typical experiment in
     
         CRATER’s test database
               Size: 3 in³ piece of debris
           

               Speed: under 150 mph.
           




                                             20
Value of estrogen
    (NYT magazine, Sept 16, 2007)

    1990s:
         American Heart Association recommends hormone replacement
         therapy for older women to ward off heart disease and osteoporosis.
    2001:
         15 million Americans filling H.R.T. prescriptions annually
    2002:
         estrogen therapy exposed as a hazard, not a benefit, for health

    Failure of scientific method
         Benefits of estrogen reported from large
         observational studies, not randomized trials
    Repeated epidemiological finding:
         randomized trials rarely support conclusions
         from observational studies.
    So forget what you've read about
         Anti-oxidants like vitamins E & C & β-carotene preventing heart disease
         Fiber prevents colon cancer



                                                                                                    21
So, why is FSM silly?
    And please, rest assured,

        it is very very silly stuff indeed.
    


    Theories need an entrance exam



    Many possible theories

        one for each bias
    


    Demand that a theory has passed at least
    some operational test before we
    condone it, act on it.
        If no reason to accept the new, don’t
    


    Trust the most what has been

    challenged the most
        Karl Popper
    
                                                22
No things are “right”, but some
          things are “useful”
    Sure, one data set supports many theories.

        But there are many many more theories that are
    
        unsupported.
    No model is right, but some things are useful

             (perform well on test data)
         

             George Box
         

    And many many many more ideas are useless

             Can’t make predictions
         

             Not defined enough to support (possible) refutation
         




                                                                   23
Wolfgang Pauli
    The "conscience of physics",

         the critic to whom his colleagues were accountable.
     

    Scathing in his dismissal of poor theories

         often labeling them ganz falsch, utterly false.
     

    But “ganz falsch” was not his most severe

    criticism,
         He hated theories so unclearly presented as to be
     
              untestable
          

              unevaluatable,
          

         Worse than wrong because they could not be
     
         proven wrong.
         Not properly belonging within the realm of science,
     
              even though posing as such.
          

         Famously, he wrote of one such unclear paper:
              "This paper isn't right. It is not even wrong."
          


                                                               24
Believe those who seek the truth;
     doubt those who find it
           -Andre Gide.
Don’t test once on just
            the training data
    Study more than the

    average
    performance

    Also look at the

    variance

    E.g. here, no significant improvement
    on new data after X=8
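
To make "look at the variance" concrete, here is a small sketch in Python. It assumes a k-fold split and a deliberately trivial majority-class learner; both are stand-ins chosen for illustration, and the 20 toy rows are invented for the example rather than taken from the talk's experiments.

```python
import statistics
from collections import Counter

def k_fold_scores(rows, k, train, accuracy):
    """Hold out every k-th row in turn; train on the rest, score on the held-out part."""
    scores = []
    for fold in range(k):
        held_out = rows[fold::k]
        training = [r for i, r in enumerate(rows) if i % k != fold]
        scores.append(accuracy(train(training), held_out))
    return scores

def train_majority(rows):
    """Toy learner (an assumption of this sketch): always predict the commonest label."""
    return Counter(label for _, label in rows).most_common(1)[0][0]

def accuracy(model, rows):
    """Fraction of held-out rows whose label matches the model's single prediction."""
    return sum(label == model for _, label in rows) / len(rows)

# Invented toy data: 20 rows, five of them labelled "no".
ROWS = [(i, "no" if i in (0, 1, 2, 5, 10) else "yes") for i in range(20)]

SCORES = k_fold_scores(ROWS, 5, train_majority, accuracy)
MEAN = statistics.mean(SCORES)      # average performance
SPREAD = statistics.stdev(SCORES)   # how much that performance wobbles across folds
```

On this toy data the mean accuracy looks respectable, yet one fold scores far below the others; reporting only the average would hide that.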
                                    26
If something works, poke it till it breaks
   i) Sort attributes on “infogain”
   ii) Learn using first N attributes




   [Plots for four data sets: labor, soybean, diabetes, anneal]

                                         A few variables
                                        are (often) enough     27
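
Steps (i) and (ii) can be sketched as follows, using the weather data from the earlier slide. This is a minimal illustration in Python, not the code behind the plots; `rank_by_infogain` and `keep_first` are hypothetical helper names.

```python
import math
from collections import Counter

ATTRS = ["outlook", "temperature", "humidity", "windy"]
ROWS = [line.split(",") for line in """
sunny,hot,high,FALSE,no
sunny,hot,high,TRUE,no
overcast,hot,high,FALSE,yes
rainy,mild,high,FALSE,yes
rainy,cool,normal,FALSE,yes
rainy,cool,normal,TRUE,no
overcast,cool,normal,TRUE,yes
sunny,mild,high,FALSE,no
sunny,cool,normal,FALSE,yes
rainy,mild,normal,FALSE,yes
sunny,mild,normal,TRUE,yes
overcast,mild,high,TRUE,yes
overcast,hot,normal,FALSE,yes
rainy,mild,high,TRUE,no
""".split()]

def entropy(labels):
    """Entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def info_gain(rows, col):
    """Reduction in class entropy after splitting the rows on column `col`."""
    groups = {}
    for r in rows:
        groups.setdefault(r[col], []).append(r[-1])
    after = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([r[-1] for r in rows]) - after

def rank_by_infogain(rows):
    """Step (i): column indexes sorted from most to least informative."""
    return sorted(range(len(ATTRS)), key=lambda c: info_gain(rows, c), reverse=True)

def keep_first(rows, ranked, n):
    """Step (ii), data side: keep only the top-n columns plus the class label."""
    return [[r[c] for c in ranked[:n]] + [r[-1]] for r in rows]

RANKED = rank_by_infogain(ROWS)
RANKED_NAMES = [ATTRS[c] for c in RANKED]
TOP2 = keep_first(ROWS, RANKED, 2)
```

A learner would then be trained on `keep_first(ROWS, RANKED, n)` for n = 1, 2, 3, … and its accuracy tracked as n grows, to see how few attributes are enough.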
Living with Uncertainty
    Check how training set size affects theory





                                              28
Living with Uncertainty
    Launch learners with anomaly

    detection and repair tools




                                   29
Living with uncertainty: count, alert, fix
                 Count: stuff seen in past
                 Alert: if new counts different
                 Fix: find delta new to old
                        Very, very fast

                 An incremental discretizer + a Bayes classifier
                 where all inputs are mono-classified
                 Track average max likelihood for data
                 processing in "eras" of X instances
                 Contrast set learning
                 Linear time inference, tiny memory footprint
                 




                     And, it works [Orrego, 2004]
                 
                          F15 simulator data [courtesy B. Cukic]
                      
                          Five flights: a,b,c,d,e
                      
                          each with different off-nominal condition
                      
                          imposed at “time” 15
                          Off-nominal condition not present in prior data
                      
                          In all cases,
                      
                          massive change detected
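
The "count" and "alert" steps might look roughly like the Python sketch below. It is not the [Orrego, 2004] implementation: it assumes already-discretized values, treats columns as independent, uses add-one smoothing, and flags an era whose average log-likelihood falls more than a hypothetical `drop` threshold below earlier eras. The "fix" step (contrast set learning) is left out.

```python
import math
from collections import Counter

def log_likelihood(row, counts, seen):
    """Smoothed log-likelihood of one row under independent per-column counts."""
    total = 0.0
    for column, value in zip(counts, row):
        # +1 on top, and one extra bucket below, so unseen values get a small probability.
        total += math.log((column[value] + 1) / (seen + len(column) + 1))
    return total

def era_scores(rows, era_size):
    """Score each era against counts from all earlier eras, then add it to the counts."""
    counts, seen, scores = None, 0, {}
    for start in range(0, len(rows), era_size):
        era = rows[start:start + era_size]
        if seen:  # the first era has nothing to be compared with
            scores[start // era_size] = (
                sum(log_likelihood(r, counts, seen) for r in era) / len(era))
        for r in era:  # Count: remember what was seen
            if counts is None:
                counts = [Counter() for _ in r]
            for column, value in zip(counts, r):
                column[value] += 1
            seen += 1
    return scores

def alerts(scores, drop=2.0):
    """Alert: era numbers whose score is more than `drop` below the mean of normal eras."""
    flagged, normal = [], []
    for era_no in sorted(scores):
        if normal and scores[era_no] < sum(normal) / len(normal) - drop:
            flagged.append(era_no)
        else:
            normal.append(scores[era_no])
    return flagged

# Invented stream: three eras of familiar values, then one era of never-seen values.
STREAM = [("a", "x")] * 30 + [("z", "q")] * 10
FLAGGED = alerts(era_scores(STREAM, era_size=10))
```

Each row is touched once and only per-column counters are kept, which is in the spirit of the "linear time inference, tiny memory footprint" claim on the slide.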


                                                                        30
Living with uncertainty
Life is a balance between
                    Policy #1: exploration
                         Tolerate the sub-optimal, a little
                         Doing crazy things to learn new things
                    Policy #2: exploitation
                         Fix your theories and base your work on those fixed ideas.
                     


                                    Kuhn:
                                    • most “science” is puzzle solving…
                                    • … within existing paradigms.
                                    • Sometimes the paradigm breaks down…
                                    • …prompting revolutionary research




               Human young:
               • Do crazy things (take long trips)
               • Less craziness as we grow older
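
One common way to express the exploration/exploitation balance in code is an epsilon-greedy chooser: mostly act on the best-known option, but with a small probability `epsilon` try something else. The talk does not name this technique; the Python below is only an illustrative sketch, and `payoffs` (the hidden success rate of each option) is an invented parameter.

```python
import random

def epsilon_greedy(payoffs, pulls, epsilon, rng):
    """Return how often each option was chosen over `pulls` rounds."""
    counts = [0] * len(payoffs)
    totals = [0.0] * len(payoffs)
    for _ in range(pulls):
        untried = [i for i, c in enumerate(counts) if c == 0]
        if untried:
            arm = untried[0]                      # try everything at least once
        elif rng.random() < epsilon:
            arm = rng.randrange(len(payoffs))     # Policy #1: explore
        else:                                     # Policy #2: exploit the best average so far
            arm = max(range(len(payoffs)), key=lambda i: totals[i] / counts[i])
        reward = 1.0 if rng.random() < payoffs[arm] else 0.0
        counts[arm] += 1
        totals[arm] += reward
    return counts

# With epsilon = 0 the chooser never explores after its first look at each option.
PURE_EXPLOIT = epsilon_greedy([0.0, 1.0], pulls=10, epsilon=0.0, rng=random.Random(1))
# With a little exploration it occasionally revisits options it currently believes are worse.
SOME_EXPLORE = epsilon_greedy([0.0, 1.0], pulls=100, epsilon=0.1, rng=random.Random(1))
```

Setting `epsilon` high early and lowering it over time would mirror the "less craziness as we grow older" remark.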
                                                                               31
Tolerance of “exploration”
    Critical to the

    American way
         America: history of
     
         tolerance and acceptance

    1945:
         400 German rocket
         scientists chose to
         surrender to the Yankees,
         not the Russians
         They chose their post-war
         life based on their
         perceptions of American
         ideology
         Hence,
     



                                      32
Tolerance = hi-tech = $$$
    R. Florida: The Economic Geography of Talent,
         Annals of the Association of American
         Geographers 92(4), 2002, pp. 743-755

    Best predictor for hi-tech industry
         R² 0.42 to “coolness”
         R² 0.49 to cultural amenities
         R² 0.50 to median house value
         R² 0.77 to “diversity” index
     




                                             33
Data Mining, Truth, Justice, the
  American Way & Flying Spaghetti Monsters
   "Superman, fights a never ending battle
   for truth, justice, and the American way."

   To make $$, institutionalize exploration and tolerance
   No "truth", all is biased.
   Old conclusions must be constantly re-assessed
   A healthy hi-tech needs tolerance to support exploration
   and that the FSM is silly, but would consider revising
   that view if new evidence emerges
                                                                           34
Expose, and hose


• "Part of education is to expose people to different schools of thought."
   - President George Bush, August 1, 2005
• "Part of science is to expose people to the critical and continual (re)evaluation of ideas."
   - Some guy called Timm, September 20, 2007




                                                             35

 
Overview of Inkel Unlisted Shares Price.
Overview of Inkel Unlisted Shares Price.Overview of Inkel Unlisted Shares Price.
Overview of Inkel Unlisted Shares Price.Precize Formely Leadoff
 
BPPG response - Options for Defined Benefit schemes - 19Apr24.pdf
BPPG response - Options for Defined Benefit schemes - 19Apr24.pdfBPPG response - Options for Defined Benefit schemes - 19Apr24.pdf
BPPG response - Options for Defined Benefit schemes - 19Apr24.pdfHenry Tapper
 
Vp Girls near me Delhi Call Now or WhatsApp
Vp Girls near me Delhi Call Now or WhatsAppVp Girls near me Delhi Call Now or WhatsApp
Vp Girls near me Delhi Call Now or WhatsAppmiss dipika
 
Stock Market Brief Deck FOR 4/17 video.pdf
Stock Market Brief Deck FOR 4/17 video.pdfStock Market Brief Deck FOR 4/17 video.pdf
Stock Market Brief Deck FOR 4/17 video.pdfMichael Silva
 
原版1:1复刻温哥华岛大学毕业证Vancouver毕业证留信学历认证
原版1:1复刻温哥华岛大学毕业证Vancouver毕业证留信学历认证原版1:1复刻温哥华岛大学毕业证Vancouver毕业证留信学历认证
原版1:1复刻温哥华岛大学毕业证Vancouver毕业证留信学历认证rjrjkk
 
(办理原版一样)QUT毕业证昆士兰科技大学毕业证学位证留信学历认证成绩单补办
(办理原版一样)QUT毕业证昆士兰科技大学毕业证学位证留信学历认证成绩单补办(办理原版一样)QUT毕业证昆士兰科技大学毕业证学位证留信学历认证成绩单补办
(办理原版一样)QUT毕业证昆士兰科技大学毕业证学位证留信学历认证成绩单补办fqiuho152
 
SBP-Market-Operations and market managment
SBP-Market-Operations and market managmentSBP-Market-Operations and market managment
SBP-Market-Operations and market managmentfactical
 
Economics, Commerce and Trade Management: An International Journal (ECTIJ)
Economics, Commerce and Trade Management: An International Journal (ECTIJ)Economics, Commerce and Trade Management: An International Journal (ECTIJ)
Economics, Commerce and Trade Management: An International Journal (ECTIJ)ECTIJ
 
fca-bsps-decision-letter-redacted (1).pdf
fca-bsps-decision-letter-redacted (1).pdffca-bsps-decision-letter-redacted (1).pdf
fca-bsps-decision-letter-redacted (1).pdfHenry Tapper
 
letter-from-the-chair-to-the-fca-relating-to-british-steel-pensions-scheme-15...
letter-from-the-chair-to-the-fca-relating-to-british-steel-pensions-scheme-15...letter-from-the-chair-to-the-fca-relating-to-british-steel-pensions-scheme-15...
letter-from-the-chair-to-the-fca-relating-to-british-steel-pensions-scheme-15...Henry Tapper
 
government_intervention_in_business_ownership[1].pdf
government_intervention_in_business_ownership[1].pdfgovernment_intervention_in_business_ownership[1].pdf
government_intervention_in_business_ownership[1].pdfshaunmashale756
 
magnetic-pensions-a-new-blueprint-for-the-dc-landscape.pdf
magnetic-pensions-a-new-blueprint-for-the-dc-landscape.pdfmagnetic-pensions-a-new-blueprint-for-the-dc-landscape.pdf
magnetic-pensions-a-new-blueprint-for-the-dc-landscape.pdfHenry Tapper
 
Tenets of Physiocracy History of Economic
Tenets of Physiocracy History of EconomicTenets of Physiocracy History of Economic
Tenets of Physiocracy History of Economiccinemoviesu
 
Governor Olli Rehn: Dialling back monetary restraint
Governor Olli Rehn: Dialling back monetary restraintGovernor Olli Rehn: Dialling back monetary restraint
Governor Olli Rehn: Dialling back monetary restraintSuomen Pankki
 
Unveiling Business Expansion Trends in 2024
Unveiling Business Expansion Trends in 2024Unveiling Business Expansion Trends in 2024
Unveiling Business Expansion Trends in 2024Champak Jhagmag
 
Amil Baba In Pakistan amil baba in Lahore amil baba in Islamabad amil baba in...
Amil Baba In Pakistan amil baba in Lahore amil baba in Islamabad amil baba in...Amil Baba In Pakistan amil baba in Lahore amil baba in Islamabad amil baba in...
Amil Baba In Pakistan amil baba in Lahore amil baba in Islamabad amil baba in...amilabibi1
 
212MTAMount Durham University Bachelor's Diploma in Technology
212MTAMount Durham University Bachelor's Diploma in Technology212MTAMount Durham University Bachelor's Diploma in Technology
212MTAMount Durham University Bachelor's Diploma in Technologyz xss
 

Último (20)

Economic Risk Factor Update: April 2024 [SlideShare]
Economic Risk Factor Update: April 2024 [SlideShare]Economic Risk Factor Update: April 2024 [SlideShare]
Economic Risk Factor Update: April 2024 [SlideShare]
 
NO1 Certified Black Magic Specialist Expert In Bahawalpur, Sargodha, Sialkot,...
NO1 Certified Black Magic Specialist Expert In Bahawalpur, Sargodha, Sialkot,...NO1 Certified Black Magic Specialist Expert In Bahawalpur, Sargodha, Sialkot,...
NO1 Certified Black Magic Specialist Expert In Bahawalpur, Sargodha, Sialkot,...
 
Overview of Inkel Unlisted Shares Price.
Overview of Inkel Unlisted Shares Price.Overview of Inkel Unlisted Shares Price.
Overview of Inkel Unlisted Shares Price.
 
BPPG response - Options for Defined Benefit schemes - 19Apr24.pdf
BPPG response - Options for Defined Benefit schemes - 19Apr24.pdfBPPG response - Options for Defined Benefit schemes - 19Apr24.pdf
BPPG response - Options for Defined Benefit schemes - 19Apr24.pdf
 
Vp Girls near me Delhi Call Now or WhatsApp
Vp Girls near me Delhi Call Now or WhatsAppVp Girls near me Delhi Call Now or WhatsApp
Vp Girls near me Delhi Call Now or WhatsApp
 
Stock Market Brief Deck FOR 4/17 video.pdf
Stock Market Brief Deck FOR 4/17 video.pdfStock Market Brief Deck FOR 4/17 video.pdf
Stock Market Brief Deck FOR 4/17 video.pdf
 
原版1:1复刻温哥华岛大学毕业证Vancouver毕业证留信学历认证
原版1:1复刻温哥华岛大学毕业证Vancouver毕业证留信学历认证原版1:1复刻温哥华岛大学毕业证Vancouver毕业证留信学历认证
原版1:1复刻温哥华岛大学毕业证Vancouver毕业证留信学历认证
 
(办理原版一样)QUT毕业证昆士兰科技大学毕业证学位证留信学历认证成绩单补办
(办理原版一样)QUT毕业证昆士兰科技大学毕业证学位证留信学历认证成绩单补办(办理原版一样)QUT毕业证昆士兰科技大学毕业证学位证留信学历认证成绩单补办
(办理原版一样)QUT毕业证昆士兰科技大学毕业证学位证留信学历认证成绩单补办
 
SBP-Market-Operations and market managment
SBP-Market-Operations and market managmentSBP-Market-Operations and market managment
SBP-Market-Operations and market managment
 
Economics, Commerce and Trade Management: An International Journal (ECTIJ)
Economics, Commerce and Trade Management: An International Journal (ECTIJ)Economics, Commerce and Trade Management: An International Journal (ECTIJ)
Economics, Commerce and Trade Management: An International Journal (ECTIJ)
 
fca-bsps-decision-letter-redacted (1).pdf
fca-bsps-decision-letter-redacted (1).pdffca-bsps-decision-letter-redacted (1).pdf
fca-bsps-decision-letter-redacted (1).pdf
 
letter-from-the-chair-to-the-fca-relating-to-british-steel-pensions-scheme-15...
letter-from-the-chair-to-the-fca-relating-to-british-steel-pensions-scheme-15...letter-from-the-chair-to-the-fca-relating-to-british-steel-pensions-scheme-15...
letter-from-the-chair-to-the-fca-relating-to-british-steel-pensions-scheme-15...
 
government_intervention_in_business_ownership[1].pdf
government_intervention_in_business_ownership[1].pdfgovernment_intervention_in_business_ownership[1].pdf
government_intervention_in_business_ownership[1].pdf
 
magnetic-pensions-a-new-blueprint-for-the-dc-landscape.pdf
magnetic-pensions-a-new-blueprint-for-the-dc-landscape.pdfmagnetic-pensions-a-new-blueprint-for-the-dc-landscape.pdf
magnetic-pensions-a-new-blueprint-for-the-dc-landscape.pdf
 
Tenets of Physiocracy History of Economic
Tenets of Physiocracy History of EconomicTenets of Physiocracy History of Economic
Tenets of Physiocracy History of Economic
 
Governor Olli Rehn: Dialling back monetary restraint
Governor Olli Rehn: Dialling back monetary restraintGovernor Olli Rehn: Dialling back monetary restraint
Governor Olli Rehn: Dialling back monetary restraint
 
Unveiling Business Expansion Trends in 2024
Unveiling Business Expansion Trends in 2024Unveiling Business Expansion Trends in 2024
Unveiling Business Expansion Trends in 2024
 
Amil Baba In Pakistan amil baba in Lahore amil baba in Islamabad amil baba in...
Amil Baba In Pakistan amil baba in Lahore amil baba in Islamabad amil baba in...Amil Baba In Pakistan amil baba in Lahore amil baba in Islamabad amil baba in...
Amil Baba In Pakistan amil baba in Lahore amil baba in Islamabad amil baba in...
 
212MTAMount Durham University Bachelor's Diploma in Technology
212MTAMount Durham University Bachelor's Diploma in Technology212MTAMount Durham University Bachelor's Diploma in Technology
212MTAMount Durham University Bachelor's Diploma in Technology
 
🔝+919953056974 🔝young Delhi Escort service Pusa Road
🔝+919953056974 🔝young Delhi Escort service Pusa Road🔝+919953056974 🔝young Delhi Escort service Pusa Road
🔝+919953056974 🔝young Delhi Escort service Pusa Road
 

Data mining, truth, justice, the American Way, and the Giant Spaghetti Monster

  • 1. Data Mining, Truth, Justice, the American Way, and the Flying Spaghetti Monster tim@menzies.us Ph.D. LCSEE, WVU, 20 Sept 2007
  • 2. Expose, and hose. “Part of education is to expose people to different schools of thought.” - President George Bush, August 1, 2005. “Part of science is to expose people to the critical and continual (re)evaluation of ideas.” - Some guy called Timm, September 20, 2007
  • 3. "Look up in the sky! It's a bird! It's a plane! It's Superman!" "Yes, it's Superman, strange visitor from another planet who came to Earth with powers and abilities far beyond those of mortal men." "Superman, who can change the course of mighty rivers, bend steel in his bare hands; and who, disguised as Clark Kent, mild-mannered reporter for a great metropolitan newspaper, fights a never ending battle for truth, justice, and the American way." How to find truth? How to ensure justice? How to make lottsa $$? Why a never-ending battle?
  • 4. So, tonight. Notions of certainty: standards for debate. Surprises: nothing is “truth”, but many more things are false; and some things are useful. Implications for humility, and for justice.
  • 5. God gave me a brain. I take it (s)he wants me to use it. Mark of the rational: while not dead, do: review and revise assumptions; done. Entertain a wide range of ideas, but don’t necessarily accept them. Demand evidence that lets you repeat/refute/improve prior conclusions. But what of faith? That is another talk. There is room for the divine in my universe. But in my test tubes? Not too much.
  • 6. Data miners: agents that automate the creation and review of new ideas. Mountains of data: @relation weather.symbolic @attribute outlook {sunny, overcast, rainy} @attribute temperature {hot, mild, cool} @attribute humidity {high, normal} @attribute windy {TRUE, FALSE} @attribute play {yes, no} @data sunny,hot,high,FALSE,no sunny,hot,high,TRUE,no overcast,hot,high,FALSE,yes rainy,mild,high,FALSE,yes rainy,cool,normal,FALSE,yes rainy,cool,normal,TRUE,no overcast,cool,normal,TRUE,yes sunny,mild,high,FALSE,no sunny,cool,normal,FALSE,yes rainy,mild,normal,FALSE,yes sunny,mild,normal,TRUE,yes overcast,mild,high,TRUE,yes overcast,hot,normal,FALSE,yes rainy,mild,high,TRUE,no. Tablespoons of knowledge: outlook = sunny | humidity = high: no | humidity = normal: yes; outlook = overcast: yes; outlook = rainy | windy = TRUE: no | windy = FALSE: yes
  • 7. Data doubling every 20 months: Internet, Radio Frequency Identification (RFID) tracking, on-line shopping (patterns of sales tracked at Amazon). So now we can automatically learn answers to many questions; e.g. What eggs to select for IVF? What will software cost to develop? What diseases does a patient have? Which loan applications to fund? What houses will have the best resale value? Which parts of the program need more inspection? What products are best to sell to what markets? What cows to keep and which to send to the abattoir? How to teach a satellite to distinguish between cloud shadows and oil spills? How much electricity will be needed in two hours, i.e. what coal-powered generators to fire up?
  • 8. More fundamentally, what can we say about the world, with any certainty? Same data, different data miners: different conclusions. Every miner is biased by: Evaluation bias. Language: what is the “shape” of the models we can learn? Decision trees, equations, etc. Search: pruning the possibly infinite space of candidate models; what not to explore. Over-fitting avoidance: how to stop the learner fixating on noise, e.g. pruning back decision trees.
  • 9. Any learning scheme has many biases • Bias lets us ignore “stuff”. • Without it, we don’t know what is important or dull, we can’t summarize, generalize. • Without bias, we can’t learn from the past • Bias blinds us but lets us see the future • But changing biases changes what we best believe • No wonder truth is a never-ending battle
  • 10. Generalizing from the past works. Sometimes, very clearly: heavy smokers have a 2000% to 3000% higher chance of lung cancer. Learned theories perform very well on new data. But ... the “best” learned theory can be a moveable feast.
  • 11. So, a relativistic soup? No certainty? No way to plan effective actions? No way to rule out absurd notions?
  • 12. I don’t want to offend any one, but… … I think that once there were no cell phones or iPods, or clothes, or countries, or language, or human society, or 4-valved hearts, or homeostasis, or organs, or brains, or planets, or stars, or matter. Where the net energy in-flow is positive… the universe selects for self-perpetuating systems, an exponentially decreasing number of which are of exponentially increasing complexity. Should I even say this in a public place? "Part of education is to expose people to different schools of thought.” President George Bush, August 1, 2005. Shouldn’t I have to give credence to all theories? Evolution, Intelligent design, Pirates cause global warming?
  • 13. The Church of the Flying Spaghetti Monster (FSM). Founded in 2005 by OSU physics graduate Bobby Henderson. A protest against the decision by the Kansas State Board of Education to require the teaching of intelligent design as an alternative to biological evolution. Henderson wrote to the board professing belief in a supernatural Creator called the Flying Spaghetti Monster. He demanded that his "Pastafarian" theory of creation be taught in science classrooms.
  • 14. FSM is not about religion. It is a mistake to view FSM as anti-religion. Rather, FSM is anti-anti-scientific rigor. No one in their right mind would ever believe this nonsense. And that’s the point. Truth is a never-ending battle. We must have standards to assess scientific theories, to reject absurdities. Or any nonsense can be released on this world, e.g. “Global warming is caused by pirates.”
  • 15. Wikipedia on FSM. FSM: an invisible, undetectable Flying Spaghetti Monster. Evidence for evolution planted by FSM to test Pastafarians' faith. FSM changes the results of measurements, like radiocarbon dating, via His Noodly Appendage. Heaven contains beer volcanoes and a stripper factory. Hell is similar, but with stale beer and diseased strippers. Pirates are "absolute divine beings" and the original Pastafarians. Their image as "thieves and outcasts" is misinformation spread by Christian theologians in the Middle Ages and Hare Krishnas. Pirates are "peace-loving explorers and spreaders of good will" who distributed candy to small children. Global warming, earthquakes, hurricanes, and other natural disasters are a direct effect of the shrinking numbers of pirates since the 1800s.
  • 16. FSM “proof” of the divinity of pirates. A case study on how not to present data: X-axis deliberately misleading. Crazy? Yes! • But would you recognize such craziness if you saw it again?
  • 17. What is the “best” weight-loss diet?
  • 18. “How lucky for those in power that people don't think.” - Adolf Hitler. (“Those in power”, i.e. people trying to sell you their diet book.)
  • 19. What is the “best” programming language?
  • 20. To our peril, we trust old ideas too much. Columbia ice strike: size 1200 in³, speed 477 mph (relative to vehicle). Certified as “safe” by the CRATER micro-meteorite model. A typical experiment in CRATER’s test database: a 3 in³ piece of debris, speed under 150 mph.
  • 21. Value of estrogen (NYT magazine, Sept 16, 2007). 1990s: American Heart Association recommends hormone replacement therapy for older women to ward off heart disease and osteoporosis. 2001: 15 million Americans filling H.R.T. prescriptions annually. 2002: estrogen therapy exposed as a hazard, not a benefit, for health. Failure of scientific method: benefits of estrogen reported from large observational studies, not randomized trials. Repeated epidemiological finding: randomized trials rarely support conclusions from observational studies. So forget what you’ve read about: anti-oxidants like vitamins E & C & beta carotene preventing heart disease; fiber prevents colon cancer.
  • 22. So, why is FSM silly? And please, rest assured, it is very very silly stuff indeed. Theories need an entrance exam. Many possible theories, one for each bias. Demand that a theory has passed at least some operational test before we condone it, act on it. If no reason to accept the new, don’t. Trust the most what has been challenged the most. Karl Popper
  • 23. No things are “right”, but some things are “useful”. Sure, one data set supports many theories. But there are many many more theories that are unsupported. No model is right, but some things are useful (perform well on test data). George Box. And many many many more ideas are useless: can’t make predictions; not defined enough to support (possible) refutation.
  • 24. Wolfgang Pauli. The "conscience of physics", the critic to whom his colleagues were accountable. Scathing in his dismissal of poor theories, often labeling them ganz falsch, utterly false. But “ganz falsch” was not his most severe criticism. He hated theories so unclearly presented as to be untestable, unevaluatable. Worse than wrong because they could not be proven wrong. Not properly belonging within the realm of science, even though posing as such. Famously, he wrote of one such unclear paper: ”This paper is right. It is not even wrong."
  • 25. Believe those who seek the truth; doubt those who find it -Andre Gide.
  • 26. Don’t test once on just the training data. Study more than the average performance. Also look at the variance. E.g. here, no significant on new data after X=8
  • 27. If something works, poke it till it breaks. i) Sort attributes on “infogain”. ii) Learn using first N attributes. (Plots for data sets: labor, soybean, diabetes, anneal.) A few variables are (often) enough.
  • 28. Living with Uncertainty. Check how training rate size affects theory
  • 29. Living with Uncertainty. Launch learners with anomaly detection and repair tools
  • 30. Living with uncertainty: count, alert, fix. An incremental discretizer + a Bayes classifier where all inputs are all mono-classified. Track average max likelihood for data processing in “era”s of X instances. Count: stuff seen in past. Alert: if new counts different. Contrast set learning. Fix: find delta new to old. Linear time inference, very, very fast. Tiny memory footprint. And, it works [Orrego, 2004]: F15 simulator data [courtesy B. Cukic]. Five flights: a,b,c,d,e, each with a different off-nominal condition imposed at “time” 15. Off-nominal condition not present in prior data. In all cases, massive change detected.
  • 31. Living with uncertainty. Life is a balance between: Policy #1: exploration. Tolerate the sub-optimal, a little. Doing crazy things to learn new things. Policy #2: exploitation. Fix your theories and base your work on those fixed ideas. Popper: • most “science” is puzzle solving… • … within existing paradigms. • Sometimes the paradigm breaks down…. • …prompting revolutionary research Human young: • Do crazy things (take long trips) • Less craziness as we grow older
  • 32. Tolerance of “exploration”: critical to the American way. America: history of tolerance and acceptance. 1945: 400 German rocket scientists choose to surrender to the Yankees, not the Russians. They choose their post-war life based on their perceptions of American ideology. Hence,
  • 33. Tolerance = hi-tech = $$$. R. Florida: The Economic Geography of Talent, Annals of the Association of American Geographers 92(4), 2002, pp. 743-755. Best predictor for hi-tech industry: R² 0.42 to “coolness”; R² 0.49 to cultural amenities; R² 0.50 to median house value; R² 0.77 to “diversity” index.
  • 34. Data Mining, Truth, Justice, the American Way & Flying Spaghetti Monsters. “Superman, fights a never ending battle for truth, justice, and the American way." To make $$, institutionalize exploration and tolerance. Old conclusions must be constantly re-assessed. No “truth”, all is biased. A healthy hi-tech needs tolerance to support exploration. And that the FSM is silly, but would consider revising that view if new evidence emerges.
  • 35. Expose, and hose. “Part of education is to expose people to different schools of thought.” - President George Bush, August 1, 2005. “Part of science is to expose people to the critical and continual (re)evaluation of ideas.” - Some guy called Timm, September 20, 2007
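
Slide 27 suggests sorting attributes on "infogain" and learning from only the first N. As a rough illustration (not the code behind the talk), the Python sketch below computes information gain for each attribute of the 14-row weather data on slide 6. The names ROWS, NAMES, entropy, info_gain, gains and ranking are made up for this sketch, and entropy-based information gain is assumed to be what the slide means by "infogain".

```python
import math
from collections import Counter

# The 14 weather examples from slide 6.
# Columns: outlook, temperature, humidity, windy, play (the class).
NAMES = ["outlook", "temperature", "humidity", "windy"]
ROWS = [
    ("sunny", "hot", "high", "FALSE", "no"),
    ("sunny", "hot", "high", "TRUE", "no"),
    ("overcast", "hot", "high", "FALSE", "yes"),
    ("rainy", "mild", "high", "FALSE", "yes"),
    ("rainy", "cool", "normal", "FALSE", "yes"),
    ("rainy", "cool", "normal", "TRUE", "no"),
    ("overcast", "cool", "normal", "TRUE", "yes"),
    ("sunny", "mild", "high", "FALSE", "no"),
    ("sunny", "cool", "normal", "FALSE", "yes"),
    ("rainy", "mild", "normal", "FALSE", "yes"),
    ("sunny", "mild", "normal", "TRUE", "yes"),
    ("overcast", "mild", "high", "TRUE", "yes"),
    ("overcast", "hot", "normal", "FALSE", "yes"),
    ("rainy", "mild", "high", "TRUE", "no"),
]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_gain(rows, col):
    """Drop in class entropy after splitting the rows on column `col`."""
    groups = {}
    for row in rows:
        groups.setdefault(row[col], []).append(row[-1])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[-1] for row in rows]) - remainder


# Step i) of slide 27: sort attributes on infogain, best first.
gains = {name: info_gain(ROWS, i) for i, name in enumerate(NAMES)}
ranking = sorted(gains, key=gains.get, reverse=True)

for name in ranking:
    print(f"{name:12s} {gains[name]:.3f}")
```

On this data, outlook ranks first, which agrees with the tree on slide 6 splitting on outlook at its root. Step ii) would then train a learner on only the first N names in ranking and watch how performance changes as N shrinks.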