After the computing industry got started, a new problem quickly emerged: how do you operate these machines, and how do you program them? The development of operating systems was relatively slow compared to the advances in hardware. The first systems were primitive but slowly got better as demand for computing power increased. The ideas behind the Graphical User Interface, or GUI ("gooey"), go back to Doug Engelbart's Demo of the Century. However, this did not have much impact on the computer industry. One company, though, Xerox, a photocopier company, explored these ideas at its Palo Alto Research Center (PARC). Steve Jobs of Apple and Bill Gates of Microsoft took notice, and Apple introduced first the Apple Lisa and then the Macintosh.
In this lecture we look for lessons for the development of algorithms or software, and see how our business theories apply.
In the second part we look at where software is going, namely Artificial Intelligence. Recent developments in AI are causing an AI boom, and new AI applications are appearing all the time. We look at machine learning and deep learning to get an understanding of the current trends.
2. Big Data
With the computer revolution, digital data becomes possible
Over the years, data has grown exponentially
“Big Data” has become a platform by itself with new possibilities
3. Global Data is Growing Fast
Data in Digital Universe vs. Data Storage Cost, 2010-2015
Source: Mary Meeker, KPCB
5. Data is a New Growth Platform
The Network
Large investments in fibre optic & last-mile cable created connectivity
that facilitated the early Internet growth
The Software
Optimising the network with software became far more capital
efficient than additional capital expenditure buildouts, ultimately
resulting in the creation of pervasive networks (Siloed DCs -> AWS)
and pervasive software (Siebel -> Salesforce)
The Infrastructure
Emergence of pervasive software created the need to optimise the
performance of the network and store extraordinary amounts of data
at extremely low prices
The Data
Next Big Wave: Leveraging this unlimited connectivity and storage to
collect / aggregate / correlate / interpret all of this data to improve
people’s lives and enable enterprises to operate more efficiently
10. Big Data Examples
Macy's Inc. and real-time pricing
The retailer adjusts pricing in near-real time for 73 million
items, based on demand and inventory.
Source: Ten big data case studies in a nutshell
11. Big Data Examples
Tipp24 AG, a platform for placing bets
The company uses software to analyse billions of
transactions and hundreds of customer attributes, and to
develop predictive models that target customers and
personalise marketing messages on the fly.
Source: Ten big data case studies in a nutshell
12. Big Data Examples
Wal-Mart Stores Inc. and search
The mega-retailer's latest search engine for Walmart.com
includes semantic data. The platform, which was designed
in-house, relies on text analysis, machine learning and even
synonym mining to produce relevant search results.
Wal-Mart says adding semantic search has increased the share of
online shoppers completing a purchase by 10% to 15%.
Source: Ten big data case studies in a nutshell
13. Big Data Examples
PredPol Inc. and repurposing
The Los Angeles and Santa Cruz police departments, a
team of educators and a company called PredPol have
taken an algorithm used to predict earthquakes, tweaked it
and started feeding it crime data.
The software can predict where crimes are likely to occur
down to 500 square feet. In LA, there's been a 33%
reduction in burglaries and 21% reduction in violent crimes
in areas where the software is being used.
Source: Ten big data case studies in a nutshell
14. Big Data Examples
American Express and business intelligence
AmEx started looking for indicators that could really
predict loyalty and developed sophisticated predictive
models to analyse historical transactions and 115 variables
to forecast potential churn
The company believes it can now identify 24% of Australian
accounts that will close within the next four months
Source: Ten big data case studies in a nutshell
15. Big Data Examples
A Bank and IBM
A large US bank uses IBM machine learning technologies
to analyse credit card transactions.
Using machine learning and stream computing to detect financial fraud
19. What is Big Data?
Big data is high-volume, high-velocity and/or high-variety
information assets that demand cost-effective, innovative
forms of information processing that enable enhanced
insight, decision making, and process automation.
Gartner
20. What is Big Data?
Big data refers to a process that is used when traditional
data mining and handling techniques cannot uncover the
insights and meaning of the underlying data. Data that is
unstructured or time sensitive or simply very large cannot
be processed by relational database engines. This type of
data requires a different processing approach called big
data, which uses massive parallelism on readily-available
hardware.
Techopedia
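The "massive parallelism" in the definition above can be illustrated with a split, process, merge pattern. The following is only a minimal sketch: the word-count task, the sample lines and the function names are assumptions made for illustration, and Python threads stand in for the many separate machines a real big data system would use.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def count_words(chunk):
    """Map step: count the words in one chunk of lines."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def merge_counts(left, right):
    """Reduce step: combine two partial counts into one."""
    return left + right


def parallel_word_count(lines, workers=4):
    """Split the lines into chunks, count each chunk separately, then merge."""
    chunk_size = max(1, len(lines) // workers)
    chunks = [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(count_words, chunks))
    return reduce(merge_counts, partial_counts, Counter())


# Hypothetical sample data, not taken from the slides.
sample_lines = [
    "big data is big",
    "data has value",
    "value is in the data",
]
totals = parallel_word_count(sample_lines, workers=2)
print(totals["data"], totals["big"])
```

Each chunk is counted independently, so the same idea scales from two threads to many machines; only the final merge needs to see all partial results.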
21. “Big data is the oil of the 21st century and
analytics is the combustion engine.”
—Peter Sondergaard, Gartner Research
What is Big Data?
22. How do you measure numbers at large scale?
What is Big Data?
34. Byte: one grain of rice
Kilobyte: a handful of rice
Megabyte: a big pot of rice
Gigabyte: a truck full of rice
Terabyte: a container ship full of rice
Petabyte: covers Manhattan
Exabyte: covers the west coast of the US
Zettabyte: fills the Pacific Ocean
Yottabyte: an Earth-sized rice ball
David Wellman: What is Big Data?
Big Data
Internet
Computers
Early computers
What is Big Data?
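The scale in the rice analogy can also be expressed numerically. This is a small sketch that assumes the decimal (SI) convention, where each unit is 1000 times the previous one; the function names are made up for this example, and some systems use 1024 as the factor instead.

```python
UNITS = ["byte", "kilobyte", "megabyte", "gigabyte", "terabyte",
         "petabyte", "exabyte", "zettabyte", "yottabyte"]


def bytes_in(unit):
    """Return the number of bytes in one decimal (SI) unit."""
    return 1000 ** UNITS.index(unit)


def largest_unit(n_bytes):
    """Express a byte count in the largest unit that keeps the value >= 1."""
    index = 0
    value = float(n_bytes)
    while value >= 1000 and index < len(UNITS) - 1:
        value /= 1000
        index += 1
    return value, UNITS[index]


# Print the whole ladder, from one grain of rice to the Earth-sized rice ball.
for position, unit in enumerate(UNITS):
    print(f"1 {unit:<9} = 10^{3 * position} bytes")
```

Every step up the ladder multiplies the amount by a thousand, which is why the analogy jumps so quickly from a pot of rice to an ocean of it.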
35. Big Data is not about the size of the
data, it’s about the value within the
data
This value can be used for marketing,
business optimisation, getting
insights, improving health, security
etc.
What is Big Data?
37. Why Big Data Analytics?
Understand the data the company has
Process data to see patterns, correlations and
information that can be used to make better
decisions
Obtain insights that are otherwise not known
38. Data Analytics
TRADITIONAL APPROACH: Structured and Repeatable Analyses
Business users: Determine what questions to ask
IT: Structures the data to answer the question
BIG DATA APPROACH: Iterative and Exploratory Analyses
IT: Delivers a platform to enable creative discovery
Business users: Explore what questions could be asked
39. Tools for Data Analytics
NoSQL databases: MongoDB, Cassandra, HBase, Hypertable
Storage: S3, Hadoop Distributed File System
Servers: EC2, Google App Engine, Heroku
MapReduce: Hadoop, Hive, Pig, Cascading, S4, MapR
Processing: R, Yahoo! Pipes, Solr/Lucene, BigSheets
40. Two Types of Data Analysis Problems
Supervised Learning: Learn from data but we have labels
for all the data we’ve seen so far
Example: Determining Spam Emails
Unsupervised Learning: Learn from data but we don’t have any
labels
Example: Grouping Emails, AlphaZero
Learning is about discovering hidden patterns in data
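To make the supervised case concrete, here is a minimal sketch of a spam classifier that learns from labelled emails. The technique (a naive Bayes word-count model), the four training emails and all function names are assumptions chosen for illustration; the slides do not say which method is used to determine spam.

```python
import math
from collections import Counter


def train(labelled_emails):
    """Count word frequencies per label from (text, label) pairs."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    label_counts = Counter()
    for text, label in labelled_emails:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, label_counts


def classify(text, word_counts, label_counts):
    """Return the label with the higher naive Bayes log-probability."""
    vocabulary = set(word_counts["spam"]) | set(word_counts["ham"])
    total_emails = sum(label_counts.values())
    scores = {}
    for label in ("spam", "ham"):
        total_words = sum(word_counts[label].values())
        # Start from the share of training emails carrying this label.
        score = math.log(label_counts[label] / total_emails)
        for word in text.lower().split():
            # Add-one smoothing so unseen words do not zero out the score.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical labelled data; both labels must appear at least once.
training_data = [
    ("win a free prize now", "spam"),
    ("free money claim your prize", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday with the team", "ham"),
]
model = train(training_data)
print(classify("claim your free prize", *model))
print(classify("agenda for the team meeting", *model))
```

The labels in `training_data` are what make this supervised: without them the program could only group similar emails together, which is the unsupervised setting.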
41. Clustering
One of the oldest problems in unsupervised data analysis
In clustering the goal is to group data according to similarity
Algorithms such as K-means are used for clustering
42. For each artefact found, its distance north (N) and east (E)
of the marker is recorded
That is a data set
Before the dig, a historian has said that three families
lived at the location
Clustering
43. Similar: close in physical distance
You assign each data point to one and only one group
The groups are called clusters
Clustering
44. Clustering is the unsupervised learning problem
where you take your data and assign each data point to
exactly one group, or cluster
Uses unlabelled data
Clustering
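The dig example can be sketched with K-means, using three clusters for the three families. Everything specific here is an assumption for illustration: the artefact coordinates are made up, and the starting centres are chosen with a deterministic farthest-first rule (rather than the more common random start) so that the result is reproducible.

```python
import math


def farthest_first_centres(points, k):
    """Pick k starting centres: the first point, then repeatedly the point
    farthest from every centre chosen so far."""
    centres = [points[0]]
    while len(centres) < k:
        centres.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centres))
        )
    return centres


def assign(points, centres):
    """Give each point the index of its nearest centre (exactly one cluster)."""
    return [
        min(range(len(centres)), key=lambda c: math.dist(p, centres[c]))
        for p in points
    ]


def k_means(points, k, max_iterations=100):
    """Alternate between assigning points and moving centres until stable."""
    centres = farthest_first_centres(points, k)
    labels = assign(points, centres)
    for _ in range(max_iterations):
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centre if a cluster ends up empty
                centres[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
        new_labels = assign(points, centres)
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centres


# Hypothetical artefact locations: (metres north, metres east) of the marker.
artefacts = [
    (1.0, 1.2), (1.5, 0.8), (0.8, 1.9),
    (8.0, 8.5), (8.6, 7.9), (7.7, 9.1),
    (1.2, 9.0), (0.7, 8.4), (1.6, 9.5),
]
labels, centres = k_means(artefacts, k=3)
for point, label in zip(artefacts, labels):
    print(point, "-> cluster", label)
```

No labels are supplied anywhere: the only inputs are the locations and the historian's hint that there should be three groups.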
45. We may have collected data but we don’t know what to
do with it
We might want to explore the data without a particular
end goal in mind
Perhaps the data will suggest interesting avenues for
further analysis
In this case, we say that we're performing exploratory
data analysis
Clustering
46. Exploratory data analysis
We don’t know what we are looking for
Data point = colour of pixel and location of pixel
Dissimilarity is the distance in colour
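As a rough sketch of this representation, each pixel can be stored with its location and colour, and dissimilarity computed from the colour alone. The RGB encoding, the Euclidean distance and the sample pixels are assumptions; the slide only says that dissimilarity is the distance in colour.

```python
import math

# A data point: the pixel's location (row, column) and its colour (red, green, blue).
pixels = [
    {"location": (0, 0), "colour": (250, 10, 10)},
    {"location": (0, 1), "colour": (240, 20, 15)},
    {"location": (5, 7), "colour": (10, 10, 245)},
]


def colour_dissimilarity(a, b):
    """Euclidean distance between two pixels' colours; location is ignored."""
    return math.dist(a["colour"], b["colour"])


# The two reddish pixels are far more alike than a red and a blue one.
print(colour_dissimilarity(pixels[0], pixels[1]))
print(colour_dissimilarity(pixels[0], pixels[2]))
```

A clustering algorithm given this dissimilarity would group pixels of similar colour together, wherever they sit in the image.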
47. In some cases labelling is too expensive
For example, news changes every day and there is too much of it
Exploratory data analysis
51. Data Analysis as a Platform
THEN
Complex tools operated by data analysts
Chaos of data silos across the company
NOW
Real-time data analytics platforms like Looker
52. Customer Data as a Platform
THEN
Difficult to customise, lack of automated customer insights
NOW
Real-time intelligence that automatically tracks and analyses
interactions with customers
53. Mapping Data as a Platform
THEN
Difficult and expensive to collect data
Limited in-app digital map usage
NOW
Mapping platforms like Mapbox
54. Cloud Data Monitoring as a Platform
THEN
Expensive and clunky point solutions
Lengthy implementation cycles
Only used by system administrators
NOW
Cloud monitoring platforms like Datadog