2. Personal Introduction
§ PhD and Habilitation at the Technical University of Kaiserslautern, Germany
§ Lecturer 2008 – 2012 with a focus on the implementation of complex algorithms
(microelectronics)
§ 2013 – 2017 with Blue Yonder (www.blue-yonder.com), first as a senior data
scientist, later as director of data science consulting
§ Since 2014 Privatdozent at TUKL with a focus on teaching data science
practice
3. The lecture addresses students who are interested in big data,
programming skills, and business models. All three topics are covered; examples
are presented using predictive models in Python.
The internet of things describes the technological change in which modern
information technology is penetrating all industrial processes. Here, every device,
machine, and sensor is connected to gather information.
The age of data gathering started 10 years ago and is often referred to as
big data. Today, big data is any data that is expensive to manage and
hard to extract value from.
Predictive analytics is the art of extracting value from big data, with the aim of
increasing industrial revenues.
Lecture Context
01.05.17 Frank Kienle p. 3
4. In this lecture we focus on predictive modeling (machine learning) in Python and
on how to solve the related business problem. Programming skills are mandatory for a
data scientist; thus, programming exercises have to be done by the students.
Predictive models forecast the future from historic data sets. This requires machine
learning. In this lecture we will use the scikit-learn Python library to
demonstrate pitfalls and best practices when solving a problem. Note
that full coverage of these topics is not possible. Thus, only basic concepts are
sketched, using the Python programming language.
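As a rough illustration of "forecasting the future from historic data" with scikit-learn, the sketch below fits a linear model on the older part of a data set and checks it on the most recent part. The demand series, its trend, and the 80/20 split are made-up assumptions for illustration only; they are not taken from the lecture material.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

# Hypothetical "historic" data: 100 days of demand with a linear trend plus noise.
rng = np.random.default_rng(seed=0)
days = np.arange(100).reshape(-1, 1)  # feature matrix, shape (100, 1)
demand = 50.0 + 2.0 * days.ravel() + rng.normal(0.0, 5.0, size=100)

# Split by time, not at random: fit on the past, evaluate on the most recent days.
X_train, X_test = days[:80], days[80:]
y_train, y_test = demand[:80], demand[80:]

model = LinearRegression()
model.fit(X_train, y_train)

# Forecast the held-out "future" and measure the error on it.
y_pred = model.predict(X_test)
mae = mean_absolute_error(y_test, y_pred)
print(f"mean absolute error on the last 20 days: {mae:.2f}")
```

Splitting by time rather than at random is one way to avoid a common pitfall: a random split lets the model see data from the period it is supposed to forecast.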
One of the chief pitfalls of data analysis is attempting to solve the wrong problem.
Thus, the lecture focuses heavily on the business side and on how to ask the
right data questions. People responsible for solving data science problems in
industry need to solve a business problem. This job profile is often called
data scientist.
‘Data Scientist: The Sexiest Job of the 21st Century’, HBR article @
https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century/
Lecture Context
5. Students’ prerequisites:
Nowadays every topic can be found on the internet. Teaching facts and
testing them is not the purpose of the lecture. The idea is to widen the scope of data
science by working on ‘real or artificial’ use cases.
• slides and online resources will be provided (see the material uploaded at
http://de.slideshare.net/frankkienle )
• important topics will be presented in compressed form within the lecture; the
uploaded material, however, already provides all mandatory information
• discussions always relate to a use case; real data sets are used to demonstrate
problems and pitfalls (hacking skills in Python have to be developed)
• active participation and an open discussion philosophy
• doing the Python programming exercises is a prerequisite for the exam
Lecture Overview: applied inverted classroom concept
with a strong focus on teaching concepts
6. Teaching Facts
Teachers help students learn facts—that is, verifiable pieces of specific information.
Facts take a variety of forms, including definitions, names, dates, and formulae.
Sample question used when teaching facts: “What is this?”
Teaching Skills
Teachers also want students to learn skills. Skills are best considered a type of
learning that gets better with practice. Practicing programming will likely make
you more efficient (and perhaps more effective as well). Methods for teaching skills usually
involve practice in which the teacher gives quick feedback on the student's
performance. Sample feedback used when teaching skills: “That time was better.
Can you tell what you did differently?”
Teaching Facts, Skills, Concepts*
*https://people.ucsc.edu/~ktellez/facts-skills-con.html
7. Teaching Concepts
Teachers are generally most concerned with conceptual learning because it helps
learners to understand why.
Concepts are distinguished from facts in that they are a much broader, deeper type
of knowledge. Learning a concept should help the learner generalize from the
teaching context to other, different contexts.
Concepts are also different from facts and skills because they involve relationships
or processes. Teaching for concepts can take many forms.
One common method for conceptual development is the use of examples and non-
examples, with a focus on attributes/criteria for inclusion. Teachers also engage in
hypothetical questioning and systems analysis instruction for teaching concepts.
Teaching Facts, Skills, Concepts*
*https://people.ucsc.edu/~ktellez/facts-skills-con.html
8. • What is a data scientist?
• Skill sets and different profiles of data scientists
• Introduction to Big Data
• Machine Learning (part 1 to 3)
• Introduction to Databases
• Programming/Hacking day: the goal is to enable a quick start for beginners and to give
hints to more advanced programmers
• Use case preparation (programming work, mandatory homework)
• Business Models/Business Frameworks
• DevOps and professional environments
• Data Science: best practices
Basic Building Blocks (many personal perspectives)
10. Building data science teams
Data science teams need people with the skills and curiosity to ask the big
questions.
@http://radar.oreilly.com/2011/09/building-data-science-teams.html
The Field Guide to Data Science (2015 version, recommended reading)
@https://www.boozallen.com/content/dam/boozallen_site/sig/pdf/publications/2015-field-guide-to-data-science-160211215115.pdf
Data Science Work/Overview
11. Why Software Is Eating The World
Marc Andreessen, August 20, 2011
@http://www.wsj.com/articles/… (recommended reading)
Big data: The next frontier for innovation, competition, and productivity
McKinsey 2011, full report
@http://www.mckinsey.com/insights/business_technology/…
The age of analytics: Competing in a data-driven world
McKinsey 2016, full report
@http://www.mckinsey.com/business-functions/mckinsey-analytics/our-insights/the-age-of-analytics-competing-in-a-data-driven-world
Data Driven Business
14. Introduction to Data Science
@https://www.coursera.org/course/datasci
(recommended viewing)
Full Topic: Relational Databases, Relational Algebra,
Full Topic: MapReduce,
NoSQL Introduction and Eventual Consistency
Machine Learning (Stanford)
@https://www.coursera.org/course/ml
(recommended viewing)
Topic I – IV, VII, X
Data Science/Machine Learning (online courses)
15. Handbook of Statistical Analysis and Data Mining Applications,
R. Nisbet, J. Elder, G. Miner, ISBN: 978-0-123747655
recommended reading (Chapter 20 - Top 10 Data Mining Mistakes)
Data Analytics/ Data Science Books (high level books, easy reading)
Data Science for Business: What you need to know about data mining and data-
analytic thinking
Foster Provost, Tom Fawcett, ISBN: 978-1449361327
16. Amazon Web Services (AWS) Tutorials
http://docs.aws.amazon.com/gettingstarted/latest/awsgsg-intro/gsg-aws-tutorials.html
Using Vagrant and Ansible (recommended to try)
http://docs.ansible.com/guide_vagrant.html
Platforms/Deployment