1. Emerging Trends in Data Science
Joanne S. Luciano, PhD.
Distinguished Professor of Data Science
University of the Virgin Islands
https://www.linkedin.com/in/joanneluciano
Twitter: joanneluciano
HackFest 2020
Wednesday October 28, 2020 12:00 Noon
2. Data Science
“Take nothing on its looks; take everything
on evidence. There's no better rule.”
― Charles Dickens, Great Expectations (1860)
3. Overview
● What is Data Science? Mini Lecture
● Emerging Trends in Data Science
● AI and ML in Data Management and Reporting
● Mainstreaming of Natural Language Processing
● Deep Learning
● Data Privacy By Design / Cybersecurity
● FAIR Principles
● Does the VI need Data Science? Why?
● Data Science Program at UVI: Minor, Courses, Certificate
5. Data Science Workflow
After this mini lecture, you will be able to:
Define the data science workflow.
Name each of the five processes in the data
science workflow.
Describe each of the five processes in the data
science workflow and why each is necessary.
6. Data Science
Concerned with the collection, preparation, analysis, visualization,
management, and preservation of large collections of information.
Many skills are needed.
In theory, theory and practice are
the same. In practice, they are not.
-- Albert Einstein
7. Data Science is Multidisciplinary
http://www.kdnuggets.com/2016/10/battle-data-science-venn-diagrams.html
http://drewconway.com/the-lab
10. Data Science – Get the data
Data are facts and statistics collected together for reference or analysis.
What are some ways
to get data?
Typically, as a data scientist, you would not collect the data yourself; "get" here means getting it from someone else.
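As a hedged illustration of getting data that someone else collected, the sketch below parses a small CSV export with Python's standard library. The column names and values are invented for this example; with a real file, the text would come from open(path, newline="") instead of an inline string.

```python
import csv
import io

# Hypothetical CSV export received from a data provider (values are made up).
RAW = """station,date,rainfall_mm
STT,2020-10-01,12.5
STX,2020-10-01,3.0
STJ,2020-10-01,
"""

def load_rows(text):
    """Parse CSV text into a list of dicts, converting rainfall to float or None."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        value = record["rainfall_mm"].strip()
        record["rainfall_mm"] = float(value) if value else None  # keep gaps explicit
        rows.append(record)
    return rows

rows = load_rows(RAW)
missing = [r["station"] for r in rows if r["rainfall_mm"] is None]
print(len(rows), "rows;", "missing rainfall for:", missing)
```

Recording missing values as None, rather than dropping the row or guessing a number, leaves the gap visible for the exploration step.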
11. Exploratory Data Analysis
Objectives
• Suggest hypotheses about the causes of
observed phenomena
• Assess assumptions on which statistical
inference will be based
• Support the selection of appropriate statistical
tools and techniques
• Provide a basis for further data collection
through surveys or experiments[1]
[1] Behrens, Principles and Procedures of Exploratory Data Analysis, American Psychological Association, 1997
Data Science: Explore the Data
12. Data Science: Explore
https://en.wikipedia.org/wiki/Exploratory_data_analysis
Tools to use for data exploration
Graphical:
• Box plot
• Histogram
• Multi-vari chart
• Run chart
• Pareto chart
• Scatter plot
• Stem-and-leaf plot
• Parallel coordinates
Quantitative:
• Median polish
• Trimean
• Ordination
• Odds ratio
• Multidimensional scaling
• Targeted projection pursuit
• Principal component analysis
• Multilinear PCA
• Projection methods such as grand
tour, guided tour and manual tour
• Interactive versions of these plots
This is the tool I use to leverage the data.
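To make two of the listed tools concrete, here is a minimal sketch of the numbers behind a box plot (the five-number summary), the trimean, and the common 1.5 x IQR outlier rule. The sample values are hypothetical, and the quartile convention (the "inclusive" method of statistics.quantiles) is an assumption; other conventions give slightly different quartiles.

```python
import statistics

# Hypothetical sample with one unusually large value (30).
data = [2, 4, 4, 5, 6, 7, 8, 9, 12, 30]

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), the numbers a box plot draws."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)

def trimean(values):
    """Tukey's trimean: (Q1 + 2 * median + Q3) / 4."""
    _, q1, q2, q3, _ = five_number_summary(values)
    return (q1 + 2 * q2 + q3) / 4

def outliers(values):
    """Values beyond 1.5 * IQR from the quartiles (the usual box-plot fences)."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]

print("five-number summary:", five_number_summary(data))
print("mean:", statistics.mean(data), "trimean:", trimean(data))
print("outliers:", outliers(data))
```

On this sample the mean is pulled upward by the single large value while the trimean stays near the middle of the data, which is why robust summaries are useful during exploration.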
13. Data Science Workflow
Data Model or Statistical Model?
Here we mean statistical model, which
is a type of mathematical model.
A Data Model organizes elements of data and how
they relate to one another, e.g. a data element
representing a car comprises a number of other
elements which in turn represent the color, size and
owner of the car.
A statistical model embodies a set of assumptions
concerning the generation of the observed data, and
similar data from a larger population. A model
represents, often in considerably idealized form, the
data-generating process.
Model the data
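The distinction between the two kinds of model can be sketched in code. In this illustrative example the data model is a pair of record types for the car described above, and the statistical model is a straight line fitted by ordinary least squares. The field values, the age and cost observations, and the choice of a linear model are all assumptions made for the example.

```python
from dataclasses import dataclass

# Data model: how the elements of a record relate to one another.
@dataclass
class Owner:
    name: str

@dataclass
class Car:
    color: str
    size: str
    owner: Owner

# Statistical model: an assumption about how observations are generated,
# here y = intercept + slope * x + noise, fitted by ordinary least squares.
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

car = Car(color="red", size="compact", owner=Owner(name="A. Example"))

# Hypothetical observations: car age in years vs. repair cost in hundreds of dollars.
ages = [1, 2, 3, 4, 5]
costs = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept = fit_line(ages, costs)
print(car)
print(f"cost ~ {intercept:.2f} + {slope:.2f} * age")
```

The data model says nothing about how costs arise; the fitted line is an idealized description of the data-generating process and can be used to reason about cars not yet observed.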
14. Data Science Workflow
Communicate and
Visualize the results.
http://www.encorebusiness.com/blog/tableau-tips-tricks-tableau-story-telling/
15. Data Scientists need to be a combination of:
https://aibusiness.com/document.asp?doc_id=761108
The skill level will vary:
• the individual’s interest,
skills, role
• the organization type
and size
16. Overview
● What is Data Science? Mini Lecture
● Emerging Trends in Data Science
● AI and ML in Data Management and Reporting
● Mainstreaming of Natural Language Processing
● Deep Learning
● Data Privacy By Design / Cybersecurity
● FAIR Principles
● Does the VI need Data Science? Why?
● Data Science Program at UVI: Minor, Courses, Certificate
✅
17. AI and ML in Data Management and Reporting
● Traditional techniques are insufficient to handle Big Data
● Machine Learning techniques are needed to tame and organize Big Data
● Data Scientists will be in demand by businesses of all kinds
According to Gartner, “Within the next year, the number of data and analytics
experts in business units will grow at three times the rate of experts in IT
departments”
https://tdan.com/ai-machine-learning-can-solve-data-management-problems
https://www.forbes.com/sites/forbestechcouncil/2020/04/15/machine-learning-and-the-breakthrough-in-enterprise-data-management
18. Machine learning
5 vectors of progress
http://blog.udacity.com/2014/11/data-science-job-skills.html
https://www2.deloitte.com/us/en/insights/focus/signals-for-strategists/machine-learning-technology-five-vectors-of-progress.html#
Automating data science.
Reducing need for training data.
Accelerating training.
Explaining results.
Deploying locally.
19. Mainstreaming of Natural Language Processing
● Siri/Alexa/Google Assistant
● Disease Prediction
● Sentiment Analysis
https://towardsdatascience.com/your-guide-to-natural-language-processing-nlp-48ea2511f6e1
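Sentiment analysis can be illustrated with a very small lexicon-based classifier. This is a teaching sketch only: the word lists are hypothetical and tiny, and production systems typically rely on much larger lexicons or trained models.

```python
import re

# Tiny hypothetical lexicon; real systems use far larger ones or trained models.
POSITIVE = {"good", "great", "love", "wonderful", "interesting"}
NEGATIVE = {"bad", "boring", "hate", "terrible", "confusing"}

def sentiment(text):
    """Classify text as 'positive', 'negative', or 'neutral' by counting lexicon hits."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

for review in ["I love this class, it is wonderful",
               "The lecture was boring and confusing",
               "Class meets on Tuesday"]:
    print(sentiment(review), "<-", review)
```

Counting words ignores negation and sarcasm ("not good" would score as positive), which is one reason modern NLP moved toward learned models.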
20. Deep Learning
● Data volumes and computational power have exploded in recent years
● Artificial Neural Networks = Brain-like, self-teaching learning models
● Some current applications
○ Chess- and Go-playing algorithms that are better than any human player
○ Automatic, real-time text translation and image captions
○ Writing entirely new text, from plays to Wikipedia articles
https://machinelearningmastery.com/what-is-deep-learning/
https://www.forbes.com/sites/bernardmarr/2018/08/20/10-amazing-examples-of-how-deep-learning-ai-is-used-in-practice
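To give a feel for what "self-teaching" means, the sketch below trains a single artificial neuron (a perceptron) to reproduce the logical AND function. This is not deep learning itself, only the simplest building block of a neural network; the training data, learning rate, and number of passes are choices made for the illustration.

```python
# Training data for logical AND: inputs and the target output.
SAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

def predict(weights, bias, inputs):
    """Fire (1) when the weighted sum of the inputs exceeds zero."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0

def train(samples, epochs=20, rate=1):
    """Perceptron rule: nudge weights toward the target after each mistake."""
    weights, bias = [0, 0], 0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)
            weights = [w + rate * error * x for w, x in zip(weights, inputs)]
            bias += rate * error
    return weights, bias

weights, bias = train(SAMPLES)
predictions = [predict(weights, bias, inputs) for inputs, _ in SAMPLES]
print("learned weights:", weights, "bias:", bias)
print("predictions:", predictions)
```

Nobody tells the neuron the rule for AND; it adjusts its weights only from its own mistakes. Deep networks stack many such units in layers and use gradient-based training instead of this simple update.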
21. Why is Data Science Crucial to the VI?
● Lack of data capture and analytic capability costs the VI major funding
opportunities every year
● Hazard Mitigation and Resilience – cannot plan without data
● What are the health and economic tradeoffs of different COVID policies?
● Can we detect early signs of new contagious diseases arriving in the VI?
● What do the data tell us about our education system?
● What other opportunities are there for the VI to balance our portfolio beyond rum and tourism?
Can we do a better job of marketing the VI?
○ Tourism, Diversity, Marine Sciences, Climate Change
Data Science Skills and Data Literacy are critical for a healthy and
prosperous VI.
22. Data Privacy By Design / Cybersecurity
UVI offers a four-year degree program in computer
science with a concentration in Cyber Security and
Digital Forensics.
https://community.nasscom.in/communities/policy-advocacy/gdpr/privacy-by-design-indian-landscape.html
Data protection through technology design:
Data protection is easier to adhere to when it is already integrated in the
technology, i.e. from the moment the technology is created.
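One way to picture data protection through technology design is an ingestion step that never stores direct identifiers in the first place. The sketch below is an assumption-laden example, not a description of any particular system: the field names, the secret key, and the choice of keyed hashing (HMAC-SHA256) for pseudonymization are all illustrative.

```python
import hashlib
import hmac

# Hypothetical secret key; in practice it would be stored outside the dataset.
SECRET_KEY = b"replace-with-a-real-secret"

def pseudonymize(identifier):
    """Replace a direct identifier with a keyed hash (HMAC-SHA256)."""
    digest = hmac.new(SECRET_KEY, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]  # shortened token for readability

def ingest(record):
    """Store only what the analysis needs; the raw name never reaches storage."""
    return {"person": pseudonymize(record["name"]), "age_group": record["age_group"]}

stored = ingest({"name": "Jane Example", "age_group": "18-24", "phone": "555-0100"})
print(stored)
```

Because the same name always maps to the same token, records can still be linked for analysis, yet the stored data contains neither the name nor the phone number.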
https://cybersecurity.uvi.edu/cybersecurity-programs/default.aspx
Contact Dr. Marc Boumedine for UVI Cybersecurity program:
Telephone: 340-693-1255
Email: mboumed@uvi.edu
25. Data Science Minor at UVI
Develop strong quantitative analysis skills with a focus on
your field of study.
Improve your marketability in your career of choice
Acquire an additional set of marketable skills in one of the
fastest growing fields
26. SP 2021
UVI’s New
Minor in Data Science
Start with Data Science I
Data Science I Offered in SPRING 2021 --
CSC 230 CRN AAS 15420; STT 15414
SCI 230 CRN AAS 15456; STT 15457
3 Credits
Class Meets: Tuesday Thursday 11:00 - 12:15 on zoom
https://bit.ly/UVI-FA20-DSorientation
27. Why take Data Science Courses?
Data science applies to all disciplines and uses multiple
disciplines to gain insights from data.
Data Science students are trained to make evidence-based,
ethical, data-driven decisions.
This training is in high demand across the country and
represents an urgent unmet need in the VI.
28. Data Science I: Course Description
Data Science I provides students with an introduction to the concepts and basic
skills needed to understand the role of data in today’s world. The course explores
the emergence of the field using the data science workflow as the unifying
framework to illustrate the importance of each stage of the workflow, how it
contributes to the final report, and how that new information is used. Topics
include applications of data science; data ethics; data preparation; data
stewardship; analysis; evaluation; communicating results; and best practices. The
trade-offs among tools, algorithms, and visualizations are discussed using both
effective and ineffective examples. This is a hands-on course; students work with
datasets in peer-to-peer and near-peer groups. 3 credits.
PRE-REQUISITES: MAT 140 or MAT 143
29. Data Science I - Course Overview
This course provides an entry to the field of data science and its applications.
Data science is a field that stems from multiple disciplines. Students will learn
basic competencies from each discipline together with fundamental data science
principles, and how together they are applied in practice. The course offers both a
foundation and a framework for thinking about data and its impact on society. The
course is appropriate for students from any major.
COURSE SESSIONS:
Two 75-minute active learning sessions per week.
On zoom.
30. Data Science I - Objectives
Upon completion of the course, students will be able to:
● Provide and discuss several applications of data science in the real world.
● Describe the data science workflow and the contribution of each stage.
● Distinguish between ethical and unethical use of data.
● Understand the importance of restricted access to sensitive data and how it relates to
privacy.
● Identify and validate ethically appropriate data needed to test a hypothesis.
● Collect and properly format data.
● Understand and work with data standards.
● Use tools to create data visualizations.
● Apply the DS workflow processes successfully to solve a real-life problem on relevant data
sets in a complete project.
32. Grade breakdown
Homework and Projects: 60%
Participation: 15%
Midterm: 10%
Final Project: 15%
Participation includes attending class, asking questions
in class, contributing to the overall ecology of the
classroom and projects, adding extra features to
homework, and identifying readings and events.
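The breakdown above is a weighted average, which can be checked with a few lines of code. The weights come from this slide; the student scores are hypothetical, and the assumption that each component is scored on a 0-100 scale is ours.

```python
# Weights from the grade breakdown, in percent.
WEIGHTS = {"homework_projects": 60, "participation": 15, "midterm": 10, "final_project": 15}

def final_grade(scores):
    """Weighted average of component scores, each on a 0-100 scale."""
    assert sum(WEIGHTS.values()) == 100
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100

# Hypothetical scores for one student.
example = {"homework_projects": 90, "participation": 100, "midterm": 80, "final_project": 85}
print(final_grade(example))
```

With these scores the calculation is (60 x 90 + 15 x 100 + 10 x 80 + 15 x 85) / 100 = 89.75.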
33. Data Science Minor - Core Courses
Data Science Minor
The minor in Data Science affords students the opportunity to extend their quantitative abilities as a route to a deeper understanding of
their chosen field and to greater marketability after graduation. Students must complete the following courses, totaling 18-20 credits,
with a passing grade in each course.
37. Course Sequence Examples
Earliest year for Data Science Required Courses | College of Science and Mathematics | School of Business | College of Liberal Arts and Social Sciences
Freshman Year | Programming, Mathematics | Mathematics | Gen Ed
Sophomore Year | Statistics, Data Science 1 | Programming | Gen Ed
Junior Year | Data Science 2 | Statistics | Programming, Statistics, Mathematics
Senior Year | - | Data Science 1, Data Science 2 | Data Science 1, Data Science 2
Example sequences for the Data Science and Analytics (DSA) core courses
for the College of Science and Mathematics, School of Business, and
the College of Liberal Arts and Social Sciences. The greater the STEM
background, the earlier the student can take the required DSA courses.
Students from different disciplines are exposed to data science at different stages to ensure
a viable pathway consistent with their studies to earn the DSA minor.
Students: ASK YOUR ADVISOR where data science can fit into your course sequence. Start with Data Science I.
Advisors: this is a great opportunity for your students.
38. What UVI students are saying
“Data Science is a superpower! It’s a superpower that, with focus and intention, you can train and develop. The Data Science 1
course is the perfect starting place for anyone interested in developing the superpower of finding actionable insights within data.”
Christopher Murphy
Data Science (FA19), DS I Teaching Assistant (FA 20), UVI Data Science Club President
“I have had a wonderful experience in this class. Don’t let the programming scare you, the information you will learn is very
relevant & applicable in real life. You won’t be disappointed, I had no clue what data science was before this, now I can tell
you how important it is in everyday life.”
Dante Molloy
Data Science I student, FALL 20
“My time in this class has been extremely enlightening. Through this class I’ve learned the boundless applications of data
science and the many fields to which I can apply for a trade. My knowledge of programming has increased greatly as well.
Through this class I’ve been made more aware about the world around me and the information I create and release to the
world and for that I will always be grateful.”
Aaron Krigger, Jr.
Data Science I student, FALL 20
“The class is interesting and if you want to know more about coding and gathering data, this may be a class for you.”
Derrick Thomas, Data Science I student, FALL 2020
39. Data Science I Team
Chris Murphy
Teaching Assistant
(340) 626-5199
christopher.murphy@students.uvi.edu
Office Hours: By Appointment
Joanne S. Luciano, Ph.D.
Distinguished Professor of Data Science
Office: (340) 693-1253
Mobile: (518) 313-9742
Email: joanne.luciano@uvi.edu
Office Hours: By Appointment
40. Data Science Club
Club Mission:
Bring together students with an interest
and enthusiasm for data science and
analytics. Assist students in learning
and applying the tools of data science
to solve real world problems.
Club Activities:
● Group projects
● Paper reviews
● Personal project discussion,
presentations, and assistance
● Outreach
● Guest speakers
Interested? Please add your information to this form:
https://bit.ly/UVI-Data-Science-Club
41. Thank You!
David Hall, SJD
President
University of the Virgin Islands
Camille McKayle, Ph.D.
Provost
Vice President of Academic Affairs
All the Deans, the Department Chairs, faculty and staff across UVI.
YOU have all helped get the program developed and delivered.
THANK YOU!
Dr. Marc Boumedine, Dr. Tom Lombardi, Dr. Robert Stolz, Dr. David Morris
Ms. Marlene Parrott-Gokool