Big Data 
& Research Methods 
PRESENTED BY 
Grant Stanley, CEO 
Tadd Wood, Chief Data Scientist 
Contemporary Analysis 
1209 Harney Street, Suite 200 
Omaha, NE 68102
INTRO 
The process of research is as 
important as the results. 
• Correct research methods improve results, 
• And allow others to collaborate and improve 
your work. 
Contemporary Analysis canworksmart.com
INTRO
We’ll explore the dangers of:
• Spurious Correlation
• Sampling Errors
• Model Selection
• Heteroscedasticity
• Overfitting
• Lack of Background
• Solutions instead of Theories
• Lack of the Scientific Method
• Correlation vs. Causation
INTRO 
Big Data can’t just be about 
collecting, processing & storing 
more data. 
It has to be put to use. We need to 
conduct research, build models, 
and develop reports. 
THE DANGER OF FALSE POSITIVES 
The car has little impact without 
the highway or interstate. 
If we take Big Data beyond 
engineering, we are building 
the equivalent of the highway 
or interstate for the computer & 
Internet. 
SPURIOUS RELATIONSHIPS 
Spurious relationships are when 
two or more events or variables 
have no direct causal connection, 
yet it may be wrongly inferred that 
they do, due to either coincidence 
or the presence of a certain third, 
unseen factor. 
SPURIOUS RELATIONSHIPS
Big Data Errors: Spurious Correlations
[Chart: number of correlations and number of spurious correlations (y-axis, ticks at 20,000, 80,000, and 140,000) against number of variables (x-axis, 500 to 2000).]
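The growth of spurious correlations with the number of variables can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the analysis behind the chart: every variable is independent random noise, and the sample size (10 observations) and the cutoff for a "strong" correlation (0.6) are hypothetical values chosen for illustration.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def count_strong_pairs(variables, threshold):
    """Count variable pairs whose absolute correlation exceeds the threshold."""
    count = 0
    for i in range(len(variables)):
        for j in range(i + 1, len(variables)):
            if abs(pearson(variables[i], variables[j])) > threshold:
                count += 1
    return count


rng = random.Random(0)  # fixed seed so the run is repeatable
N_OBS = 10              # hypothetical: observations per variable
THRESHOLD = 0.6         # hypothetical cutoff for a "strong" correlation

# Every variable is independent noise, so any strong correlation is spurious.
noise = [[rng.gauss(0, 1) for _ in range(N_OBS)] for _ in range(40)]

spurious_by_size = {}
for k in (10, 20, 40):
    pairs = k * (k - 1) // 2  # number of distinct variable pairs
    spurious_by_size[k] = count_strong_pairs(noise[:k], THRESHOLD)
    print(f"{k} variables, {pairs} pairs, {spurious_by_size[k]} spurious")
```

The number of pairs grows roughly with the square of the number of variables, so the count of chance correlations grows with it even though nothing in the data is related.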
SPURIOUS RELATIONSHIPS
Maine’s divorce rate with US margarine consumption

Year    Divorce rate in Maine                    Consumption of margarine (US)
        Divorces per 1000 people (US Census)     Per capita in pounds (USDA)
2000    5                                        8.2
2001    4.7                                      7
2002    4.6                                      6.5
2003    4.4                                      5.3
2004    4.3                                      5.2
2005    4.1                                      4
2006    4.2                                      4.6
2007    4.2                                      4.5
2008    4.2                                      4.2
2009    4.1                                      3.7

Correlation: 0.992558

[Chart: divorce rate in Maine (left axis, divorces per 1000 people) and per capita consumption of margarine in the US (right axis, pounds), 2000 to 2009.]
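The correlation on this slide can be checked directly from the two series. A minimal sketch, assuming the usual Pearson formula; the variable and function names are illustrative. Because the slide values are rounded, the result should agree with the reported figure only to about three decimal places.

```python
import math

# Values as printed on the slide, 2000 through 2009.
divorces_per_1000 = [5, 4.7, 4.6, 4.4, 4.3, 4.1, 4.2, 4.2, 4.2, 4.1]
margarine_lbs = [8.2, 7, 6.5, 5.3, 5.2, 4, 4.6, 4.5, 4.2, 3.7]


def pearson_r(x, y):
    """Pearson correlation: covariance divided by the product of spreads."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


r = pearson_r(divorces_per_1000, margarine_lbs)
print(f"r = {r:.4f}")
```

A value this close to 1 says only that the two series moved together over ten years. It says nothing about whether one drives the other.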
SAMPLING 
There are two reasons for 
sampling a population: 
• The cost of collecting and processing data 
is too high or impossible. 
• To ensure that the results are representative 
of the population. 
SAMPLING 
Sampling still matters in Big Data. 
Data is not information. It is simply 
a representation of information. 
You have to think about what the 
data you are using represents. 
SAMPLING
Is smartphone data representative of the population?
[Figure: two charts comparing iPhone and Android users. "Gender by Platform" shows splits of 57% male / 43% female and 73% male / 27% female. "Age by Platform" shows the share of users aged 17 or younger, 18 - 24, 25 - 34, 35 - 44, and 45+.]
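A small worked example shows why the mix of a sample matters. This is a sketch with made-up numbers: the 73% / 27% gender split is taken from the slide, while the per-group averages and the 50 / 50 population split are hypothetical assumptions.

```python
def weighted_mean(group_means, group_shares):
    """Average of group means, weighted by each group's share."""
    total_share = sum(group_shares.values())
    return sum(group_means[g] * group_shares[g] for g in group_means) / total_share


# Hypothetical average of some metric per group (not from the slides).
group_means = {"male": 40.0, "female": 60.0}

sample_shares = {"male": 0.73, "female": 0.27}      # one platform's split on the slide
population_shares = {"male": 0.50, "female": 0.50}  # assumed population split

sample_estimate = weighted_mean(group_means, sample_shares)
population_value = weighted_mean(group_means, population_shares)
bias = sample_estimate - population_value

print(f"sample estimate:  {sample_estimate:.1f}")
print(f"population value: {population_value:.1f}")
print(f"bias:             {bias:+.1f}")
```

Collecting more data from the same platform would not shrink this gap, because the extra records carry the same skew.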
MODEL SELECTION 
OLS is not a catch-all.
You have to know your data. 
Is it continuous, discrete, binary, 
ordinal, or categorical? Is your 
data symmetric or asymmetric? Are 
there outliers? 
HETEROSCEDASTICITY 
Heteroscedasticity refers to 
the circumstance in which the 
variability of a variable is unequal 
across the range of values of a 
second variable that predicts it. 
HETEROSCEDASTICITY
Predicting equipment pricing based on machine hours
[Figure: market price (y-axis) against hours on machine (x-axis), with the fitted line ŷ = a + bx and markers T1, T2, and T3.]
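One rough way to see heteroscedasticity is to fit the line ŷ = a + bx and compare the size of the residuals at low and high values of x. The sketch below uses simulated data, not the equipment pricing data from the slide: the intercept, slope, and the rule that noise grows in proportion to x are all hypothetical assumptions.

```python
import random

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical data: the spread of y grows in proportion to x.
xs = [0.1 * i for i in range(1, 201)]
ys = [100.0 - 2.0 * x + rng.gauss(0, 0.5 * x) for x in xs]


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x, in closed form."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    b = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
    a = mean_y - b * mean_x
    return a, b


def mean_square(values):
    """Average squared value, used here as a measure of residual spread."""
    return sum(v * v for v in values) / len(values)


a, b = fit_line(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

# Compare residual spread in the lower and upper halves of x.
half = len(xs) // 2
low_spread = mean_square(residuals[:half])
high_spread = mean_square(residuals[half:])
ratio = high_spread / low_spread

print(f"fitted line: y = {a:.2f} + {b:.2f}x")
print(f"residual spread ratio (high x / low x): {ratio:.1f}")
```

A ratio well above 1 means the residuals fan out as x increases, so a single standard error for the whole range would understate the uncertainty at high x.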
[Figure: six panels. Top row: Unbiased & Homoscedastic, Biased & Homoscedastic, Biased & Homoscedastic. Bottom row: Unbiased & Heteroscedastic, Biased & Heteroscedastic, Biased & Heteroscedastic.]
OVERFITTING 
Overfitting occurs when a 
statistical model captures 
more than just the underlying 
relationships. 
The model is fitted to as much 
data as possible including random 
errors, outliers, and noise. 
OVERFITTING 
An overfitted model nearly 
perfectly matches the training 
set, but does not perform well 
with new data. While an overfitted 
model looks great, it will have poor 
predictive performance. 
OVERFITTING 
The mark of a good model isn’t 
how well it performs on the data 
used to build the model, but on 
fresh data outside of the training 
data set. 
OVERFITTING
Overfitting Example: Training Classification Table

                               General Election (Predicted)
General Election (Observed)    Did not vote    Voted     Percentage Correct
Did not vote                   132423          3         99.99773%
Voted                          0               411099    100%
Overall Correct Percentage                               100%
OVERFITTING
Overfitting Example: Prediction Classification Table

                               General Election (Predicted)
General Election (Observed)    Did not vote    Voted     Percentage Correct
Did not vote                   35726           4068      90%
Voted                          45924           77199     63%
Overall Correct Percentage                               69%
OVERFITTING
Overfitting Example: Variables
Variable  B (Coefficient)  Standard Error  Wald  Significance  95% C.I. for EXP(B) Lower  95% C.I. for EXP(B) Upper
NumberOfPastRaces 63.840 106.208 .361 .548 .000 1.35E+118 
Primary_03072000_Voter -66.218 106.264 .388 .533 .000 4.95E+61 
General_1107200_Voter -61.971 106.219 .340 .560 .000 3.16E+63 
Special_05082001_Voter -58.129 111.165 .273 .601 .000 2.39E+69 
General_11062001_Voter -60.658 106.181 .326 .568 .000 1.09E+64 
Primary_05072002_Voter -57.806 99.816 .335 .563 .000 7.23E+59 
General_11052002_Voter -63.208 106.206 .354 .552 .000 8.94E+62 
Special_05062003_Voter -66.393 106.249 .390 .532 .000 4.03E+61 
General_11042003_Voter -64.056 106.209 .364 .546 .000 3.85E+62 
Primary_03022004_Voter -63.836 106.204 .361 .548 .000 4.76E+62 
Special_02052005_Voter -58.510 111.784 .274 .601 .000 5.50E+69 
General_11082005_Voter -65.617 106.238 .381 .537 .000 8.56E+61 
Special_02072006_Voter -56.952 305.188 .035 .852 .000 1.10E+235 
Primary_05022006_Voter -64.696 106.220 .371 .542 .000 2.08E+62 
General_11072006_Voter -64.074 106.210 .364 .546 .000 3.79E+62 
Primary_05082007_Voter -65.976 106.233 .386 .535 .000 5.93E+61 
Primary_09112007_Voter -57.949 15652.399 .000 .997 .000 — 
General_11062007_Voter -67.465 106.231 .403 .525 .000 1.33E+61 
General_12112007_Voter -75.855 106.274 .509 .475 .000 3.29E+57 
Primary_03042008_Voter -62.602 106.214 .347 .556 .000 1.67E+63 
General_11042008_Voter -64.100 106.220 .364 .546 .000 3.77E+62 
Primary_05052009_Voter -57.094 98.053 .339 .560 .000 4.56E+58 
Primary_09152009_Voter -54.792 7118.311 .000 .994 .000 — 
General_11032009_Voter -55.176 98.071 .317 .574 .000 3.28E+59 
Primary_05042010_Voter -65.564 106.234 .381 .537 .000 8.97E+61 
Primary_07132010_Voter -56.331 45432.804 .000 .999 .000 — 
Primary_09072010_Voter -57.607 3684.807 .000 .998 .000 — 
General_11022010_Voter -63.431 106.214 .357 .550 .000 7.28E+62 
Primary_05032011_Voter -57.848 136.939 .178 .673 .000 2.75E+91 
General_11082011_Voter -54.865 98.255 .312 .577 .000 6.42E+59 
Primary_03062012_Voter -55.419 95.847 .334 .563 .000 3.29E+57 
Primary_05072013_Voter -58.652 110.873 .280 .597 .000 8.00E+68 
General_11052013_Voter -62.617 106.196 .348 .555 .000 1.58E+63 
Constant -115.093 212.413 .294 .588
OVERFITTING
Simple Model Example: Variables
Variable  B (Coefficient)  Standard Error  Wald  Significance  95% C.I. for EXP(B) Lower  95% C.I. for EXP(B) Upper
Age_life_bin_1 .344 .019 312.341 .000 1.358 1.466 
Age_life_bin_2 .282 .017 266.954 .000 1.282 1.372 
Age_life_bin_3 .180 .017 109.330 .000 1.158 1.239 
Age_life_bin_4 .133 .018 53.146 .000 1.102 1.184 
Age_life_bin_5 .055 .019 8.719 .003 1.019 1.096 
Age_life_bin_7 -.342 .029 139.262 .000 .671 .752 
Age_life_bin_8 -1.949 .029 4636.533 .000 .135 .151 
Party_affiliation_D .523 .037 202.630 .000 1.570 1.814 
Party_affiliation_R .692 .027 656.239 .000 1.895 2.106 
NumberOfPastRaces .480 .002 63659.304 .000 1.611 1.623 
Constant -1.332 .017 6041.871 .000
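The Lower and Upper columns in the table above can be approximately reproduced from each coefficient and its standard error. This is a sketch that assumes the interval is the usual Wald interval, exp(B ± 1.96 × SE); the slides do not state how it was computed, and the function name is illustrative. Because B and the standard error are rounded on the slide, the results match the table only to about two decimal places.

```python
import math


def odds_ratio_ci(b, se, z=1.96):
    """Odds ratio exp(B) with an assumed Wald 95% interval exp(B +/- z*SE)."""
    return math.exp(b), math.exp(b - z * se), math.exp(b + z * se)


# Two rows copied from the table: (B, standard error).
rows = {
    "Age_life_bin_1": (0.344, 0.019),
    "NumberOfPastRaces": (0.480, 0.002),
}

intervals = {}
for name, (b, se) in rows.items():
    intervals[name] = odds_ratio_ci(b, se)
    odds, lower, upper = intervals[name]
    print(f"{name}: EXP(B) = {odds:.3f}, 95% C.I. = ({lower:.3f}, {upper:.3f})")
```

Narrow intervals like these are what a stable model looks like. The overfitted model's intervals, which run from .000 to numbers with dozens of digits, show coefficients that the data cannot pin down.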
OVERFITTING
Simple Model Example: Training Classification Table

                               General Election (Predicted)
General Election (Observed)    Did not vote    Voted     Percentage Correct
Did not vote                   95397           37029     72%
Voted                          43439           367660    89%
Overall Correct Percentage                               85%
OVERFITTING
Simple Model Example: Prediction Classification Table

                               General Election (Predicted)
General Election (Observed)    Did not vote    Voted     Percentage Correct
Did not vote                   72167           9483      88%
Voted                          15131           66136     81%
Overall Correct Percentage                               85%
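The gap between training and prediction accuracy is the signal to look for. The sketch below recomputes overall accuracy from the four classification tables shown in this section; the counts are copied from the slides, and the dictionary layout and function name are illustrative choices.

```python
def accuracy(table):
    """Overall fraction correct; table maps (observed, predicted) to a count."""
    correct = sum(n for (observed, predicted), n in table.items()
                  if observed == predicted)
    return correct / sum(table.values())


NO, YES = "did not vote", "voted"

# Counts copied from the classification tables on the slides.
tables = {
    "overfitted": {
        "training":   {(NO, NO): 132423, (NO, YES): 3,
                       (YES, NO): 0,     (YES, YES): 411099},
        "prediction": {(NO, NO): 35726,  (NO, YES): 4068,
                       (YES, NO): 45924, (YES, YES): 77199},
    },
    "simple": {
        "training":   {(NO, NO): 95397,  (NO, YES): 37029,
                       (YES, NO): 43439, (YES, YES): 367660},
        "prediction": {(NO, NO): 72167,  (NO, YES): 9483,
                       (YES, NO): 15131, (YES, YES): 66136},
    },
}

gaps = {}
for model, split in tables.items():
    train_acc = accuracy(split["training"])
    test_acc = accuracy(split["prediction"])
    gaps[model] = train_acc - test_acc
    print(f"{model}: training {train_acc:.1%}, prediction {test_acc:.1%}, "
          f"gap {gaps[model]:.1%}")
```

The overfitted model loses about 30 points of accuracy on fresh data, while the simple model loses well under one point.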
OVERFITTING
Big Data Errors: Spurious Correlations
[Chart: number of correlations and number of spurious correlations (y-axis, ticks at 20,000, 80,000, and 140,000) against number of variables (x-axis, 500 to 2000).]
OVERFITTING
Overstuffing Example: Variables
Variable  B (Coefficient)  Standard Error  Wald  Significance  95% C.I. for EXP(B) Lower  95% C.I. for EXP(B) Upper
Age_life_bin_1 .331 .020 286.120 .000 1.339 1.446 
Age_life_bin_2 .281 .017 263.325 .000 1.281 1.371 
Age_life_bin_3 .184 .017 113.157 .000 1.162 1.243 
Age_life_bin_4 .134 .018 53.857 .000 1.103 1.185 
Age_life_bin_5 .058 .019 9.629 .002 1.022 1.099 
Age_life_bin_7 -.348 .029 143.259 .000 .667 .748 
Age_life_bin_8 -1.959 .029 4687.305 .000 .133 .149 
Party_affiliation_D .513 .037 194.040 .000 1.554 1.796 
Party_affiliation_R .684 .027 637.417 .000 1.879 2.089 
NumberOfPastRaces .478 .002 62834.614 .000 1.608 1.620 
Residential_Zip_3 -.364 .127 8.181 .004 .541 .892 
Residential_Zip_7 .360 .063 32.902 .000 1.268 1.622 
Residential_Zip_8 .428 .218 3.834 .050 1.000 2.354 
Residential_Zip_16 -.125 .023 28.277 .000 .843 .924 
Residential_Zip_17 .127 .058 4.797 .029 1.013 1.272 
Residential_Zip_18 -.356 .044 64.141 .000 .642 .764 
Residential_Zip_19 -.283 .026 117.878 .000 .716 .793 
Residential_Zip_21 .115 .037 9.801 .002 1.044 1.206 
Residential_Zip_22 .113 .026 19.024 .000 1.064 1.178 
Residential_Zip_25 -.182 .024 59.045 .000 .796 .873 
Residential_Zip_26 .074 .032 5.248 .022 1.011 1.148 
Residential_Zip_27 -.132 .033 16.081 .000 .821 .935 
Residential_Zip_28 -.077 .023 11.484 .001 .885 .968 
Residential_Zip_29 -.160 .038 17.765 .000 .791 .918 
Residential_Zip_30 -.191 .044 18.638 .000 .758 .901 
Residential_Zip_33 -.059 .030 3.945 .047 .889 .999 
Residential_Zip_35 .104 .026 15.662 .000 1.054 1.168 
Residential_Zip_41 .140 .018 57.675 .000 1.109 1.193 
Residential_Zip_42 .156 .039 16.010 .000 1.083 1.262 
Residential_Zip_45 .138 .024 32.782 .000 1.095 1.204 
Residential_Zip_46 -.065 .018 12.838 .000 .904 .971 
Residential_Zip_48 .261 .022 136.998 .000 1.243 1.357 
Residential_Zip_50 .164 .025 41.633 .000 1.121 1.239 
Residential_Zip_51 .157 .031 26.169 .000 1.102 1.243 
Residential_Zip_53 .114 .033 11.628 .001 1.050 1.197 
Residential_Zip_54 .104 .029 13.215 .000 1.049 1.174 
Residential_Zip_56 .116 .032 13.238 .000 1.055 1.196 
Residential_Zip_59 .094 .032 8.647 .003 1.032 1.170 
Local_School_District_6 -.375 .055 47.296 .000 .618 .765 
Local_School_District_7 .078 .016 23.389 .000 1.047 1.115 
Local_School_District_9 -.501 .057 77.534 .000 .542 .677 
Local_School_District_10 -.255 .033 61.473 .000 .727 .826 
Constant -1.332 .018 5513.792 .000
OVERFITTING
Overstuffing Example: Training Classification Table

                               General Election (Predicted)
General Election (Observed)    Did not vote    Voted     Percentage Correct
Did not vote                   93029           39397     70%
Voted                          36228           374871    91%
Overall Correct Percentage                               86%
LACK OF BACKGROUND 
The farther we are from the work, 
the more likely we are to be tricked 
by the data. 
We owe it to the end user to 
get out of the library, and try to 
understand what we are modeling. 
SOLUTIONS INSTEAD OF THEORIES 
There is an element of data 
science that should be frustrating, 
confusing, & despair inducing. 
It should make us stand back in 
awe of the complexity of the world, 
and not the simplicity to which we
can reduce it.
SOLUTIONS INSTEAD OF THEORIES 
“The great thing about economics, 
is that we admit that we know 
nothing about anything” 
- Thomas Piketty, author of “Capital in the Twenty-First Century”
SOLUTIONS INSTEAD OF THEORIES 
As we learn more, we realize 
there’s more to learn. 
The hallmark of genius is the sharp 
awareness of what is and what is 
not possible. We become aware of 
complexity, ambiguity and nuance. 
CORRELATION & CAUSATION 
The anthem of the Big Data 
age is “correlation does not 
imply causation.” 
CORRELATION & CAUSATION 
The problem is that this statement 
is tautological. It is always correct, 
and can never be wrong. 
CORRELATION & CAUSATION 
Don’t let people use it as a kill
switch for discussion.
• True causation is pretty rare. There are few
things where, if I do this, that will happen.
• Research should create discussions, not shut
them down. Models can’t explain everything.
There is always an “X” variable that captures
the unknown.
FAILING TO AUDIT 
Primary reasons that we fail to 
have our work peer-reviewed: 
• Lack of funding to “repeat” work. 
• We hide behind the complexity of our work. 
FAILING TO AUDIT 
Other tools:
• R Markdown: for creating webpages and
documents in R
• IPython notebooks: for creating websites and
documents interactively in Python
• Galaxy Project: for creating reproducible
workflows. (A good fit for people with less
scripting experience.)
TRAINING
We offer training on:
• Data Visualization
• Managerial Statistics
• Predictive Modeling
You will be introduced to:
• R
• SPSS
• Tableau
• MySQL
• Git
TRAINING
Training sessions last 3 days.
We will work through projects,
practice different approaches,
and learn which approach is best
for different scenarios.
QUESTIONS? 
Grant Stanley, CEO 
Contemporary Analysis 
1209 Harney Street, Suite 200 
Omaha, NE 68102 
grant@canworksmart.com 
(402) 679-8398 
Questions & Learn more.

 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightDelhi Call girls
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...amitlee9823
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...amitlee9823
 
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Pooja Nehwal
 
hybrid Seed Production In Chilli & Capsicum.pptx
hybrid Seed Production In Chilli & Capsicum.pptxhybrid Seed Production In Chilli & Capsicum.pptx
hybrid Seed Production In Chilli & Capsicum.pptx9to5mart
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangaloreamitlee9823
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Detecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning ApproachDetecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning ApproachBoston Institute of Analytics
 

Último (20)

Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
 
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men 🔝Thrissur🔝 Escor...
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men  🔝Thrissur🔝   Escor...➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men  🔝Thrissur🔝   Escor...
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men 🔝Thrissur🔝 Escor...
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
 
hybrid Seed Production In Chilli & Capsicum.pptx
hybrid Seed Production In Chilli & Capsicum.pptxhybrid Seed Production In Chilli & Capsicum.pptx
hybrid Seed Production In Chilli & Capsicum.pptx
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
 
Detecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning ApproachDetecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning Approach
 

Big Data Research Methods – Contemporary Analysis

  • 1. Big Data & Research Methods. Presented by Grant Stanley, CEO, and Tadd Wood, Chief Data Scientist, Contemporary Analysis, 1209 Harney Street, Suite 200, Omaha, NE 68102
  • 2. INTRO: The process of research is as important as the results. Correct research methods improve results and allow others to collaborate on and improve your work. Contemporary Analysis canworksmart.com
  • 3. INTRO: We’ll explore the dangers of:
      • Spurious Correlation
      • Sampling Errors
      • Model Selection
      • Heteroscedasticity
      • Overfitting
      • Lack of Background
      • Solutions instead of Theories
      • Lack of the Scientific Method
      • Correlation vs. Causation
  • 4. INTRO: Big Data can’t just be about collecting, processing & storing more data. It has to be put to use. We need to conduct research, build models, and develop reports.
  • 5. THE DANGER OF FALSE POSITIVES: The car has little impact without the highway or interstate. If we take Big Data beyond engineering, we are building the equivalent of the highway or interstate for the computer & Internet.
  • 6. SPURIOUS RELATIONSHIPS: A spurious relationship exists when two or more events or variables have no direct causal connection, yet a connection is wrongly inferred because of coincidence or the presence of a third, unseen factor.
  • 7. SPURIOUS RELATIONSHIPS: Big Data Errors: Spurious Correlations. [Chart: correlations and spurious correlations (y-axis, 20,000 to 140,000) against number of variables (x-axis, 500 to 2,000).]
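The growth of spurious correlations with the number of variables can be illustrated with a small simulation. This is a minimal sketch, not the analysis behind the slide: the variables are independent random noise, and the sample size (10 observations), the threshold (|r| > 0.8), and the function names are assumptions chosen for illustration.

```python
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def count_strong_pairs(n_vars, n_obs=10, threshold=0.8, seed=0):
    """Count variable pairs whose |r| exceeds the threshold.

    Every variable is independent Gaussian noise (an assumption made for
    illustration), so any strong correlation found here is spurious.
    """
    rng = random.Random(seed)
    data = [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_vars)]
    pairs = strong = 0
    for i in range(n_vars):
        for j in range(i + 1, n_vars):
            pairs += 1
            if abs(pearson(data[i], data[j])) > threshold:
                strong += 1
    return pairs, strong

for n_vars in (20, 40, 80):
    pairs, strong = count_strong_pairs(n_vars)
    print(f"{n_vars} variables: {pairs} pairs, {strong} with |r| > 0.8")
```

The number of pairs is n(n-1)/2, so it grows roughly with the square of the number of variables, and the count of purely coincidental strong correlations grows with it.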
  • 8. SPURIOUS RELATIONSHIPS: Maine’s divorce rate with US margarine consumption. Correlation: 0.992558
      Divorce rate in Maine: divorces per 1000 people (US Census). Consumption of margarine (US): per capita in pounds (USDA).
      Year   Divorce rate in Maine   Consumption of margarine (US)
      2000   5                       8.2
      2001   4.7                     7
      2002   4.6                     6.5
      2003   4.4                     5.3
      2004   4.3                     5.2
      2005   4.1                     4
      2006   4.2                     4.6
      2007   4.2                     4.5
      2008   4.2                     4.2
      2009   4.1                     3.7
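The correlation quoted on this slide can be reproduced from the ten yearly values it lists. A minimal sketch in plain Python; the variable and function names are illustrative, not from the presentation.

```python
# Yearly values for 2000-2009, copied from the slide.
divorce_rate = [5, 4.7, 4.6, 4.4, 4.3, 4.1, 4.2, 4.2, 4.2, 4.1]   # per 1000 people, Maine
margarine = [8.2, 7, 6.5, 5.3, 5.2, 4, 4.6, 4.5, 4.2, 3.7]        # pounds per capita, US

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

r = pearson(divorce_rate, margarine)
print(f"r = {r:.6f}")
```

A coefficient this close to 1 says only that the two series moved together over these ten years; it carries no information about why.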
  • 9. SAMPLING: There are two reasons for sampling a population:
      • Collecting and processing data on the whole population is too costly or impossible.
      • To ensure that the results are representative of the population.
  • 10. SAMPLING: Sampling still matters in Big Data. Data is not information. It is simply a representation of information. You have to think about what the data you are using represents.
  • 11. SAMPLING: Is smartphone data representative of the population?
      Gender by Platform
      Platform   Male   Female
      iPhone     57%    43%
      Android    73%    27%
      Age by Platform (two values per age group, one per platform; the transcript does not preserve which value belongs to which platform)
      17 or younger   7%, 13%
      18 - 24         12%, 17%
      25 - 34         21%, 30%
      35 - 44         21%, 21%
      45+             32%, 25%
  • 12. MODEL SELECTION: OLS is not a catch-all. You have to know your data. Is it continuous, discrete, binary, ordinal, or categorical? Is your data symmetric or asymmetric? Are there outliers?
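The questions on this slide can be turned into a quick pre-modeling check. The sketch below is one possible way to do it, not the presenters' procedure: it reports the number of distinct values (a hint at binary or discrete data), the sample skewness (symmetry), and outliers under Tukey's 1.5 × IQR rule. The function name and the example data are hypothetical.

```python
import statistics

def describe(values):
    """Summarize a numeric variable before choosing a model."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    # Population skewness: about 0 for symmetric data, > 0 for a long right tail.
    skewness = sum((v - mean) ** 3 for v in values) / (n * sd ** 3)
    # Tukey fences: points beyond 1.5 * IQR from the quartiles are flagged.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in values if v < low or v > high]
    return {"distinct": len(set(values)), "skewness": skewness, "outliers": outliers}

# Hypothetical data: mostly small values with one extreme observation.
summary = describe([1, 2, 2, 3, 3, 3, 4, 4, 5, 40])
print(summary)
```

A strongly skewed variable or a handful of extreme points is a signal to consider a transformation or a different model family before fitting OLS.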
  • 13. MODEL SELECTION
  • 14. HETEROSCEDASTICITY: Heteroscedasticity refers to the circumstance in which the variability of a variable is unequal across the range of values of a second variable that predicts it.
  • 15. HETEROSCEDASTICITY: Predicting equipment pricing based on machine hours. [Chart: market price (y-axis) against hours on machine (x-axis), with points T1, T2, and T3 and the fitted line $\hat{Y} = a + bx$.]
  • 16. [Figure with six panels, in order: Unbiased & Homoscedastic; Biased & Homoscedastic; Biased & Homoscedastic; Unbiased & Heteroscedastic; Biased & Heteroscedastic; Biased & Heteroscedastic.]
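A rough way to see heteroscedasticity in the equipment-pricing setting is to fit the line Ŷ = a + bx and compare the residual spread at low and high machine hours. The sketch below uses synthetic data: the price level, the slope, and the noise that grows with hours are all assumptions invented for illustration, and the variance ratio is an informal check in the spirit of a Goldfeld-Quandt comparison, not a formal test.

```python
import random

def ols(x, y):
    """Closed-form simple linear regression: returns intercept a and slope b."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b

def mean_square(values):
    return sum(v * v for v in values) / len(values)

def residual_variance_ratio(x, y):
    """Residual spread in the upper half of x divided by that in the lower half."""
    a, b = ols(x, y)
    residuals = [yi - (a + b * xi) for xi, yi in sorted(zip(x, y))]
    half = len(residuals) // 2
    return mean_square(residuals[half:]) / mean_square(residuals[:half])

rng = random.Random(0)
hours = list(range(1, 201))
# Hypothetical prices: noise standard deviation grows with machine hours.
price_hetero = [100_000 - 300 * h + rng.gauss(0, 50 * h) for h in hours]
# Control: the same trend with constant noise.
price_homo = [100_000 - 300 * h + rng.gauss(0, 5_000) for h in hours]

print("heteroscedastic ratio:", round(residual_variance_ratio(hours, price_hetero), 2))
print("homoscedastic ratio:  ", round(residual_variance_ratio(hours, price_homo), 2))
```

A ratio well above 1 suggests the error variance rises with machine hours, in which case the usual OLS standard errors are unreliable even if the fitted line itself is unbiased.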
  • 17. OVERFITTING: Overfitting occurs when a statistical model captures more than just the underlying relationships. The model is fitted to as much data as possible including random errors, outliers, and noise.
  • 18. OVERFITTING: An overfitted model nearly perfectly matches the training set, but does not perform well with new data. While an overfitted model looks great, it will have poor predictive performance.
  • 19. OVERFITTING: The mark of a good model isn’t how well it performs on the data used to build the model, but on fresh data outside of the training data set.
  • 20. OVERFITTING: Overfitting Example: Training Classification Table
      General Election (Observed)   Predicted: Did not vote   Predicted: Voted   Percentage Correct
      Did not vote                                   132423                  3            99.99773%
      Voted                                               0             411099                 100%
      Overall Correct Percentage                                                               100%
  • 21. OVERFITTING: Overfitting Example: Prediction Classification Table
      General Election (Observed)   Predicted: Did not vote   Predicted: Voted   Percentage Correct
      Did not vote                                    35726               4068                  90%
      Voted                                           45924              77199                  63%
      Overall Correct Percentage                                                                69%
  • 22. OVERFITTING: Overfitting Example: Variables (Lower and Upper are the 95% C.I. for EXP(B))
      Variable                          B       S.E.       Wald   Sig.   Lower       Upper
      NumberOfPastRaces            63.840    106.208       .361   .548    .000   1.35E+118
      Primary_03072000_Voter      -66.218    106.264       .388   .533    .000    4.95E+61
      General_1107200_Voter       -61.971    106.219       .340   .560    .000    3.16E+63
      Special_05082001_Voter      -58.129    111.165       .273   .601    .000    2.39E+69
      General_11062001_Voter      -60.658    106.181       .326   .568    .000    1.09E+64
      Primary_05072002_Voter      -57.806     99.816       .335   .563    .000    7.23E+59
      General_11052002_Voter      -63.208    106.206       .354   .552    .000    8.94E+62
      Special_05062003_Voter      -66.393    106.249       .390   .532    .000    4.03E+61
      General_11042003_Voter      -64.056    106.209       .364   .546    .000    3.85E+62
      Primary_03022004_Voter      -63.836    106.204       .361   .548    .000    4.76E+62
      Special_02052005_Voter      -58.510    111.784       .274   .601    .000    5.50E+69
      General_11082005_Voter      -65.617    106.238       .381   .537    .000    8.56E+61
      Special_02072006_Voter      -56.952    305.188       .035   .852    .000   1.10E+235
      Primary_05022006_Voter      -64.696    106.220       .371   .542    .000    2.08E+62
      General_11072006_Voter      -64.074    106.210       .364   .546    .000    3.79E+62
      Primary_05082007_Voter      -65.976    106.233       .386   .535    .000    5.93E+61
      Primary_09112007_Voter      -57.949  15652.399       .000   .997    .000           —
      General_11062007_Voter      -67.465    106.231       .403   .525    .000    1.33E+61
      General_12112007_Voter      -75.855    106.274       .509   .475    .000    3.29E+57
      Primary_03042008_Voter      -62.602    106.214       .347   .556    .000    1.67E+63
      General_11042008_Voter      -64.100    106.220       .364   .546    .000    3.77E+62
      Primary_05052009_Voter      -57.094     98.053       .339   .560    .000    4.56E+58
      Primary_09152009_Voter      -54.792   7118.311       .000   .994    .000           —
      General_11032009_Voter      -55.176     98.071       .317   .574    .000    3.28E+59
      Primary_05042010_Voter      -65.564    106.234       .381   .537    .000    8.97E+61
      Primary_07132010_Voter      -56.331  45432.804       .000   .999    .000           —
      Primary_09072010_Voter      -57.607   3684.807       .000   .998    .000           —
      General_11022010_Voter      -63.431    106.214       .357   .550    .000    7.28E+62
      Primary_05032011_Voter      -57.848    136.939       .178   .673    .000    2.75E+91
      General_11082011_Voter      -54.865     98.255       .312   .577    .000    6.42E+59
      Primary_03062012_Voter      -55.419     95.847       .334   .563    .000    3.29E+57
      Primary_05072013_Voter      -58.652    110.873       .280   .597    .000    8.00E+68
      General_11052013_Voter      -62.617    106.196       .348   .555    .000    1.58E+63
      Constant                   -115.093    212.413       .294   .588
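The columns of the overfitted model's table are tied together by standard logistic-regression formulas, which makes its warning signs easy to check by hand. A minimal sketch, assuming the usual definitions: Wald = (B / S.E.)², a p-value from the chi-square distribution with one degree of freedom, and a 95% interval for EXP(B) of exp(B ± 1.96 × S.E.). The function name is illustrative; the input values are the NumberOfPastRaces row from the table.

```python
import math

def wald_summary(b, se, z=1.96):
    """Wald statistic, p-value, and 95% C.I. for EXP(B) from a coefficient and its S.E."""
    wald = (b / se) ** 2
    # Upper tail of a chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(wald / 2))
    lower = math.exp(b - z * se)
    upper = math.exp(b + z * se)
    return wald, p_value, lower, upper

# NumberOfPastRaces in the overfitted model: B = 63.840, S.E. = 106.208.
wald, p_value, lower, upper = wald_summary(63.840, 106.208)
print(f"Wald = {wald:.3f}, Sig. = {p_value:.3f}")
print(f"95% C.I. for EXP(B): {lower:.3g} to {upper:.3g}")
```

A standard error larger than the coefficient itself gives a Wald statistic below 1, a non-significant p-value, and an odds-ratio interval that runs from effectively zero to an astronomically large number: the model fits the training data, but none of its coefficients is actually pinned down.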
  • 23. OVERFITTING: Simple Model Example: Variables (Lower and Upper are the 95% C.I. for EXP(B))
      Variable                          B    S.E.       Wald   Sig.   Lower   Upper
      Age_life_bin_1                 .344    .019    312.341   .000   1.358   1.466
      Age_life_bin_2                 .282    .017    266.954   .000   1.282   1.372
      Age_life_bin_3                 .180    .017    109.330   .000   1.158   1.239
      Age_life_bin_4                 .133    .018     53.146   .000   1.102   1.184
      Age_life_bin_5                 .055    .019      8.719   .003   1.019   1.096
      Age_life_bin_7                -.342    .029    139.262   .000    .671    .752
      Age_life_bin_8               -1.949    .029   4636.533   .000    .135    .151
      Party_affiliation_D            .523    .037    202.630   .000   1.570   1.814
      Party_affiliation_R            .692    .027    656.239   .000   1.895   2.106
      NumberOfPastRaces              .480    .002  63659.304   .000   1.611   1.623
      Constant                     -1.332    .017   6041.871   .000
  • 24. OVERFITTING: Simple Model Example: Training Classification Table
      General Election (Observed)   Predicted: Did not vote   Predicted: Voted   Percentage Correct
      Did not vote                                    95397              37029                  72%
      Voted                                           43439             367660                  89%
      Overall Correct Percentage                                                                85%
  • 25. OVERFITTING: Simple Model Example: Prediction Classification Table
      General Election (Observed)   Predicted: Did not vote   Predicted: Voted   Percentage Correct
      Did not vote                                    72167               9483                  88%
      Voted                                           15131              66136                  81%
      Overall Correct Percentage                                                                85%
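The percentages in the four classification tables follow directly from their cell counts, and recomputing them puts the training and prediction results side by side. A minimal sketch; the counts are copied from slides 20, 21, 24, and 25, while the function and variable names are illustrative.

```python
def percent_correct(table):
    """Row-wise and overall percentage correct for a 2x2 classification table.

    table = [[did_not_vote_correct, did_not_vote_wrong],
             [voted_wrong,          voted_correct]]
    """
    (a, b), (c, d) = table
    return {
        "did_not_vote": 100 * a / (a + b),
        "voted": 100 * d / (c + d),
        "overall": 100 * (a + d) / (a + b + c + d),
    }

tables = {
    "overfitted, training": [[132423, 3], [0, 411099]],
    "overfitted, prediction": [[35726, 4068], [45924, 77199]],
    "simple, training": [[95397, 37029], [43439, 367660]],
    "simple, prediction": [[72167, 9483], [15131, 66136]],
}

results = {name: percent_correct(t) for name, t in tables.items()}
for name, r in results.items():
    print(f"{name:24s} overall correct: {r['overall']:.1f}%")
```

The overfitted model drops from essentially 100% on its training data to about 69% on new data, while the simple model holds at about 85% in both, which is the pattern slide 19 describes.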
  • 26. OVERFITTING: Big Data Errors: Spurious Correlations. [Chart: correlations and spurious correlations (y-axis, 20,000 to 140,000) against number of variables (x-axis, 500 to 2,000).]
  • 27. OVERFITTING: Overstuffing Example: Variables (Lower and Upper are the 95% C.I. for EXP(B))
      Variable                          B    S.E.       Wald   Sig.   Lower   Upper
      Age_life_bin_1                 .331    .020    286.120   .000   1.339   1.446
      Age_life_bin_2                 .281    .017    263.325   .000   1.281   1.371
      Age_life_bin_3                 .184    .017    113.157   .000   1.162   1.243
      Age_life_bin_4                 .134    .018     53.857   .000   1.103   1.185
      Age_life_bin_5                 .058    .019      9.629   .002   1.022   1.099
      Age_life_bin_7                -.348    .029    143.259   .000    .667    .748
      Age_life_bin_8               -1.959    .029   4687.305   .000    .133    .149
      Party_affiliation_D            .513    .037    194.040   .000   1.554   1.796
      Party_affiliation_R            .684    .027    637.417   .000   1.879   2.089
      NumberOfPastRaces              .478    .002  62834.614   .000   1.608   1.620
      Residential_Zip_3             -.364    .127      8.181   .004    .541    .892
      Residential_Zip_7              .360    .063     32.902   .000   1.268   1.622
      Residential_Zip_8              .428    .218      3.834   .050   1.000   2.354
      Residential_Zip_16            -.125    .023     28.277   .000    .843    .924
      Residential_Zip_17             .127    .058      4.797   .029   1.013   1.272
      Residential_Zip_18            -.356    .044     64.141   .000    .642    .764
      Residential_Zip_19            -.283    .026    117.878   .000    .716    .793
      Residential_Zip_21             .115    .037      9.801   .002   1.044   1.206
      Residential_Zip_22             .113    .026     19.024   .000   1.064   1.178
      Residential_Zip_25            -.182    .024     59.045   .000    .796    .873
      Residential_Zip_26             .074    .032      5.248   .022   1.011   1.148
      Residential_Zip_27            -.132    .033     16.081   .000    .821    .935
      Residential_Zip_28            -.077    .023     11.484   .001    .885    .968
      Residential_Zip_29            -.160    .038     17.765   .000    .791    .918
      Residential_Zip_30            -.191    .044     18.638   .000    .758    .901
      Residential_Zip_33            -.059    .030      3.945   .047    .889    .999
      Residential_Zip_35             .104    .026     15.662   .000   1.054   1.168
      Residential_Zip_41             .140    .018     57.675   .000   1.109   1.193
      Residential_Zip_42             .156    .039     16.010   .000   1.083   1.262
      Residential_Zip_45             .138    .024     32.782   .000   1.095   1.204
      Residential_Zip_46            -.065    .018     12.838   .000    .904    .971
      Residential_Zip_48             .261    .022    136.998   .000   1.243   1.357
      Residential_Zip_50             .164    .025     41.633   .000   1.121   1.239
      Residential_Zip_51             .157    .031     26.169   .000   1.102   1.243
      Residential_Zip_53             .114    .033     11.628   .001   1.050   1.197
      Residential_Zip_54             .104    .029     13.215   .000   1.049   1.174
      Residential_Zip_56             .116    .032     13.238   .000   1.055   1.196
      Residential_Zip_59             .094    .032      8.647   .003   1.032   1.170
      Local_School_District_6       -.375    .055     47.296   .000    .618    .765
      Local_School_District_7        .078    .016     23.389   .000   1.047   1.115
      Local_School_District_9       -.501    .057     77.534   .000    .542    .677
      Local_School_District_10      -.255    .033     61.473   .000    .727    .826
      Constant                     -1.332    .018   5513.792   .000
  • 28. OVERFITTING: Overstuffing Example: Training Classification Table
      General Election (Observed)   Predicted: Did not vote   Predicted: Voted   Percentage Correct
      Did not vote                                    93029              39397                  70%
      Voted                                           36228             374871                  91%
      Overall Correct Percentage                                                                86%
  • 29. LACK OF BACKGROUND: The farther we are from the work, the more likely we are to be tricked by the data. We owe it to the end user to get out of the library, and try to understand what we are modeling.
  • 30. SOLUTIONS INSTEAD OF THEORIES: There is an element of data science that should be frustrating, confusing, & despair-inducing. It should make us stand back in awe of the complexity of the world, and not the simplicity to which we can reduce it.
  • 31. SOLUTIONS INSTEAD OF THEORIES: “The great thing about economics is that we admit that we know nothing about anything.” Thomas Piketty, author of “Capital in the Twenty-First Century”
  • 32. SOLUTIONS INSTEAD OF THEORIES: As we learn more, we realize there’s more to learn. The hallmark of genius is the sharp awareness of what is and what is not possible. We become aware of complexity, ambiguity and nuance.
  • 33. CORRELATION & CAUSATION: The anthem of the Big Data age is “correlation does not imply causation.”
  • 34. CORRELATION & CAUSATION: The problem is that this statement is tautological. It is always correct, and can never be wrong.
  • 35. CORRELATION & CAUSATION: Don’t let people use it as a kill switch to discussion.
      • True causation is pretty rare. There are few things where, if I do this, this will happen.
      • Research should create discussions, not shut them down.
    Models can’t explain everything. There is always an “X” variable that captures the unknown.
  • 36. SOLUTIONS INSTEAD OF THEORIES
  • 37. FAILING TO AUDIT: Primary reasons that we fail to have our work peer-reviewed:
      • Lack of funding to “repeat” work.
      • We hide behind the complexity of our work.
  • 38. FAILING TO AUDIT
  • 39. FAILING TO AUDIT: Other tools:
      • R Markdown: for creating webpages and documents in R
      • IPython notebooks: for creating websites and documents interactively in Python
      • Galaxy Project: for creating reproducible workflows. (Favorable for people with less scripting experience.)
  • 40. TRAINING: We offer training on:
      • Data Visualization
      • Managerial Statistics
      • Predictive Modeling
    You will be introduced to:
      • R
      • SPSS
      • Tableau
      • MySQL
      • Git
  • 41. TRAINING: Training sessions last 3 days. We will work through projects, practice different approaches, and learn which approach is best for different scenarios.
  • 42. QUESTIONS? Grant Stanley, CEO, Contemporary Analysis, 1209 Harney Street, Suite 200, Omaha, NE 68102. grant@canworksmart.com, (402) 679-8398. Questions & Learn more.