Data fluency for the 21st century


  1. Data Fluency for the 21st Century
     Martin Frigaard & Peter Spangler
     Access and comment on these slides: http://bit.ly/data-fluency-slides
     Icons by https://www.freepik.com/
  2. Objectives
     ● Why are you here?
     ● Operational definitions
     ● Basic skills
     ● Data analysis toolkit
     ● Communicating with data
     ● Questions
  3. Why are you here?
     Data skills are in high demand! 'Data scientist' has been called the sexiest job for over 5 years. Fortunately, many of the problems businesses and organizations face do not require someone with a PhD in machine learning or a fancy software solution. Many of these problems can be solved by people with domain knowledge, data analysis skills, curiosity, and the ability to communicate.
  4. Why are you here?
     Government agencies, nonprofits, and non-governmental organizations are also recognizing the need for data analysis skills.
     - Data analysis has become an essential tool for all policy makers, agencies, and community action organizations to demonstrate the evidence for their ideas.
     - Data for Democracy: "We work together to make the world a better place. At the heart of our collective efforts is how data and technology can be used for good. We work to help shape a better future and make positive changes in communities around the globe." https://www.datafordemocracy.org/about-us
     - The Civic Analytics Network: "The network will collaborate on shared projects that advance the use of data visualization and predictive analytics in solving important urban problems related to economic opportunity, poverty reduction, and addressing the root causes of social problems of equity and opportunity." https://datasmart.ash.harvard.edu/news/article/about-the-civic-analytics-network-826
     - Our World in Data: "We cannot know what is happening in the world from the daily news alone. The news media focuses on single events, too often missing the long-lasting, forceful changes that reshape the world we live in." https://ourworldindata.org/about
  5. Why should more people be here?
     Today, everyone needs to understand how data and statistics are shaping the world we live in. Data are used to represent nearly every aspect of life...
     - "Redistricting has a huge effect on U.S. politics but is greatly misunderstood. This project uncovers what’s really broken, what's not and whether gerrymandering can (or should) be killed. Depending on the desired outcome, each of the different maps could represent the “right” way to draw congressional district boundaries." - FiveThirtyEight's gerrymandering project
  6. Operational definitions
     What is data science vs. machine learning?
     Data science: "...integrates a set of problem definitions, algorithms, and processes that can be used to analyze data so as to extract actionable insight...deals with both structured and unstructured (big) data and encompasses principles from a range of fields, including machine learning, statistics, data ethics and regulation, and high-performance computing."
     Machine learning: "The field of computer science research that focuses on developing and evaluating algorithms that can extract useful patterns from data sets."
     - Both of these definitions involve a ton of school, training, and experience to understand. However, as you can see, data science includes fields like statistics and machine learning.
     - These are both far above what is required to work with data.
     - More on this here: https://arxiv.org/abs/1903.07639
  7. Operational definitions
     The good news! Both of the following are USUALLY NOT NECESSARY:
     Data science: "...integrates a set of problem definitions, algorithms, and processes that can be used to analyze data so as to extract actionable insight...deals with both structured and unstructured (big) data and encompasses principles from a range of fields, including machine learning, statistics, data ethics and regulation, and high-performance computing."
     Machine learning: "The field of computer science research that focuses on developing and evaluating algorithms that can extract useful patterns from data sets."
     - These are both far above what is required to work with data, create visualizations, and gain useful insights!
     - Listen to this podcast: https://soundcloud.com/dataframed/1-data-science-past-present-and-future
  8. Operational definitions
     Our concern is data fluency
     Information literacy: "...the ability to know when there is a need for information, to be able to identify, locate, evaluate, and effectively use that information for the issue or problem at hand."
     Data literacy: "...the ability to read, understand, create and communicate data as information."
     Statistical literacy: "...the ability to understand and reason with statistics and data."
     These are great--but why are they separated? Why would you have one without the other?
  9. Data Fluency
     Data fluency combines 1) the situational (problem) assessment skills from information literacy, 2) the storage, retrieval, manipulation, and management abilities from data literacy, and 3) the problem solving, reasoning, and critical thinking from statistical literacy.
  10. Operational definitions
      Skills that 'move across'
      [Data] Transliteracy: "Transliteracy captures the idea of our capacity to interact with information in whatever form it takes...[it] concerns the ability to apply and transfer a range of skills and contextual insights to a variety of settings. Rather than focusing on any one skill set or technology, transliteracy is about fluidity of movement across a range of contexts." - Transliteracy: The Art and Craft of ‘Moving Across’
  11. Basic Skills
      What's required for analytic literacy?
      1. Domain expertise: you need to know your stuff
      2. Understanding data structures: know what gets measured, how it's stored, and what it looks like
      3. Programming: interact with data programmatically so you can express your intentions clearly (and document your work)
      4. Exploratory Data Analysis: be able to summarize and communicate the characteristics and patterns of a data set, using tables, graphs, and visualizations
      An analyst needs characteristics like curiosity, tenacity, and stick-with-it-ness.
  12. Domain expertise
      Providing the context and purpose
      An analytic approach to solving problems typically starts with some version of the following questions:
      1. What happened?
      2. Why did it happen?
      3. What will happen if it continues?
      4. What can we do about it (or what will happen to y if we do x)?
      The people closest to a problem will often have the necessary information to solve it, so training them to think analytically is a better long-term solution than hiring an expensive 'data scientist' who doesn't know your business.
  13. Data structures: What kind of information is being collected?
      What are data?
      - Tweets
      - Sales
      - Addresses
      How can we access them?
      - API
      - Relational databases
      - Google Sheets
      Where are they stored?
      - Tables (SQL, Google Sheets, etc.)
      - Web structures (JSON)
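To make the difference between a web structure (JSON) and a table concrete, here is a minimal Python sketch. It is only an illustration, not part of the original slides: the records, the field names, and the helper `json_to_table` are all invented for the example. It shows how nested key/value records, as an API might return them, can be flattened into the rows and columns of a table.

```python
import json

# Hypothetical API response: a JSON array of sales records (invented data).
raw = """
[
  {"id": 1, "city": "Oakland", "amount": 12.5},
  {"id": 2, "city": "Chico", "amount": 7.0},
  {"id": 3, "city": "Oakland"}
]
"""


def json_to_table(text):
    """Flatten a JSON array of objects into column names plus data rows."""
    records = json.loads(text)
    # Collect column names in first-seen order so every row has the same shape.
    columns = []
    for record in records:
        for key in record:
            if key not in columns:
                columns.append(key)
    # A missing field becomes None, the way a table would hold an empty cell.
    rows = [[record.get(col) for col in columns] for record in records]
    return columns, rows


columns, rows = json_to_table(raw)
print(columns)
for row in rows:
    print(row)
```

Note that the third record has no `amount`: JSON lets records differ in shape, while a table forces every row to have the same columns, so the gap has to be made explicit.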
  14. Programming
      Code is a necessary means of communication
      "Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to human beings what we want a computer to do." - Donald Knuth, "Literate Programming" (1984)
      Should everyone learn to code?
      - Knowing how to program "will vastly increase your potential in becoming a valuable asset at any organization"
      - "Having coding know-how equips you to better understand how the pieces of the puzzle fit together in a business"
      - "Coding doesn’t restrict you to a career in tech: it enhances the career, skills, or interests you already have."
      https://www.forbes.com/sites/laurencebradford/2016/06/20/why-every-millennial-should-learn-some-code/#5ebd0b1870f2
  15. Exploratory Data Analysis
      The goal of the analysis is exploration (not models and algorithms)
      - In order to know if you'll be able to use your data to predict anything, you'll need to understand its characteristics.
      - We do this through summaries, graphics, and visualizations.
      - "It is important to understand what you CAN DO before you learn to measure how WELL you seem to have DONE it" - John Tukey
      https://simplystatistics.org/2019/04/17/tukey-design-thinking-and-better-questions/
      - ...goal of data analysis is to explore the data. In other words, data analysis is exploratory data analysis...maybe this shouldn’t be so surprising given that Tukey wrote the book on exploratory data analysis.
      - In this paper, at least, he essentially dismisses other goals as overly optimistic or not really meaningful.
      - For the most part I agree with that sentiment, in the sense that looking for “the answer” in a single set of data is going to result in disappointment. At best, you will accumulate evidence that will point you in a new and promising direction. Then you can iterate, perhaps by collecting new data, or by asking different questions.
      - At worst, you will conclude that you’ve “figured it out” and then be shocked when someone else, looking at another dataset, concludes something completely different. In light of this, discussions about p-values and statistical significance are very much beside the point.
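As a rough illustration of what "summaries, graphics, and visualizations" can mean in practice, the Python sketch below summarizes a tiny data set before any modeling. Everything in it is an assumption made for the example (the `sales` records, the `summarize` helper, and the text bar chart); the slides themselves work in RStudio and do not prescribe this code.

```python
import statistics
from collections import Counter

# Hypothetical data set: five days of sales (invented values).
sales = [
    {"day": "Mon", "region": "north", "amount": 120.0},
    {"day": "Tue", "region": "south", "amount": 80.0},
    {"day": "Wed", "region": "north", "amount": 100.0},
    {"day": "Thu", "region": "south", "amount": 60.0},
    {"day": "Fri", "region": "north", "amount": 140.0},
]


def summarize(values):
    """Return a small numeric summary: count, min, median, mean, max."""
    return {
        "n": len(values),
        "min": min(values),
        "median": statistics.median(values),
        "mean": statistics.mean(values),
        "max": max(values),
    }


# Numeric column: describe its center and spread.
amounts = [row["amount"] for row in sales]
summary = summarize(amounts)
print(summary)

# Categorical column: count how often each value appears.
region_counts = Counter(row["region"] for row in sales)
print(region_counts)

# A crude text bar chart: one '#' per 20 units of sales.
for row in sales:
    bar = "#" * int(row["amount"] // 20)
    print(f'{row["day"]:<4}{bar}')
```

The point is the order of operations: look at counts, ranges, and shapes first, and only then decide whether the data can support a prediction.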
  16. The Data Analysis Toolkit
      The necessary steps for an analytic data project are:
      1. Problem statement or question
      2. Data collection and wrangling
      3. Data visualization and modeling
      4. Data communication
      Staying inside the RStudio IDE minimizes the number of additional tools you'll have to work with.
      The RStudio IDE is a complementary cognitive artifact.
      "...Expert users of the abacus are not users of the physical abacus--they use a mental model in their brain. And expert users of slide rules can cast the ruler aside having internalized its mechanics. Cartographers memorize maps, and Edwin Hutchins has shown us how expert navigators form near symbiotic relationships with their analog instruments. So our upper Paleolithic lineage has always possessed artificial intelligence to the extent our ancestors have been aided in this way. In modern life, mobile devices and their apps--to-do apps, calendar apps, journaling apps, astronomy apps, game apps, social apps, and on near infinitum--just recapitulate the three essential elements of the astrolabe: memory, search, and calculation. Compare these complementary cognitive artifacts to competitive cognitive artifacts like the mechanical calculator, the global positioning systems in our cars and phones, and machine learning systems powering our App ecosystem. In each of these examples our effective intelligence is amplified, but not in the way of complementary artifacts. In the case of competitive artifacts, when we are deprived of their use, we are no better than when we started. They are not coaches and teachers--they are serfs. We have created an artificial serf economy where incremental and competitive artificial intelligence both amplifies our productivity and threatens to diminish organic and complementary artificial intelligence, and
  17. (quote continued) the ethics of this sort of mechanical labor are only now engaging the attention of practitioners and policy makers."
      http://nautil.us/blog/will-ai-harm-us-better-to-ask-how-well-reckon-with-our-hybrid-nature
  18. Case Study: Collecting Google data
      Follow this link: https://rstudio.cloud/project/322459
  19. Questions?
  20. Additional Resources
      1. Robin Donatello
      2. Storybench
      3. RStudio
      4. Tidyverse
  21. This is all stuff I've learned from other people!
      1. Hadley Wickham
      2. Hilary Mason
      3. Greg Wilson
      4. David Krakauer
      5. David Robinson
      6. Jenny Bryan
      7. Charlotte Wickham
      8. Bradley Boehmke
      9. Benjamin S. Baumer
      10. Mara Averick
      11. Andrew Gelman
      12. Lucy D'Agostino McGowan
      I didn't come up with any of this stuff on my own--I learned it from these great folks (and many others!)
  22. Come find us!
      mfrigaard@paradigmdata.io
      pspangler@paradigmdata.io
      https://www.paradigmdata.io/
      http://www.storybench.org/
