2.
What is Big Data?
“The dynamically linked super set of multiple significant
scale discrete data sets.”
-Oscar Wilde
Characteristics include
• Large volumes, typically adding terabytes of data daily
• Aggregation of many historically discrete data sets
• Dynamic links between the data sets
Consequently
• Any analysis is a point-in-time position
3.
Why care?
• Better intelligence, which can be leveraged in business, healthcare, etc. to target efforts.
• The cost of DNA analysis has fallen by around five orders of magnitude since the process became possible, making personalised medicine a reality in the near future.
• If you are investing in Big Data projects, the risk of data loss doesn't necessarily change, but the volume of a loss is potentially colossal, with impacts that aren't understood for an extended period.
• Customers hold concerns about companies playing the role of an Orwellian Big Brother.
4.
There’s no Best Practice…yet
Breaches
• Snowden showed that Government organisations with a specific focus on security struggle to control Big Data and the associated risks.
• The Panama Papers showed that legal firms with an inherently high level of confidentiality in their practices struggle too.
Compliance issues
• It is harder to define the purpose of data exploration.
• Big Data breaches tend to be… bigger.
• Regulators will expect technology to be used equally to exploit and to control Big Data.
5.
Key Controls for Big Data
1. Track all access that collects, views, or manipulates sensitive data, and ensure the data is encrypted at each point.
2. Do not store encryption keys for sensitive data in the same location as the data.
3. Log all access and processing of data, and subject the logs to both human and automated monitoring.
4. Use automated scanning to continuously monitor systems for vulnerabilities and malware.
5. Monitor network egress for anomalies in traffic.
6. Create a number of "false flag" (canary) records, and configure alerts and blocks to identify and prevent data breaches.
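As a minimal sketch of the "false flag" control, assuming a relational customer store (the table, the canary values and the alert hook are all invented for illustration), canary records might be seeded and then checked on every query result:

```python
# Sketch: seed a customer table with canary ("false flag") records that no
# legitimate process should ever read, then alert when one leaks out.
import sqlite3

CANARY_EMAILS = {"canary.user.0193@example.com", "canary.user.7741@example.com"}

def seed_canaries(conn):
    # Insert decoy rows alongside real customer data.
    conn.executemany(
        "INSERT INTO customers (name, email) VALUES (?, ?)",
        [("Canary Record", email) for email in sorted(CANARY_EMAILS)],
    )
    conn.commit()

def check_result_for_canaries(rows, alert):
    """Call on every query result; fire the alert if a canary was accessed."""
    hits = [row for row in rows if row[1] in CANARY_EMAILS]
    if hits:
        alert(f"Canary record(s) accessed: {len(hits)} hit(s)")
    return hits

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, email TEXT)")
conn.execute("INSERT INTO customers VALUES ('Alice', 'alice@example.com')")
seed_canaries(conn)

alerts = []
rows = conn.execute("SELECT name, email FROM customers").fetchall()
check_result_for_canaries(rows, alerts.append)  # a bulk dump trips the alarm
```

In a real deployment the check would sit in the data-access layer or a query proxy, and the alert would feed the monitoring described in control 3.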
6.
How to use Big Data Analytics?
Prescriptive Analytics: How can we influence the future?
Predictive Analytics: How can we plan for the future?
Diagnostic Analytics: Why did this happen?
Descriptive Analytics: Do we know what happened?
(Diagram axes: Analytics Maturity, ranging from Historical Analytics to Proactive Analytics)
7.
Police use of Predictive Analytics
Fresno, California, hosts just one of the police departments in the US already using a software program called "Beware" to generate "threat scores" for an individual, address or area.
As reported by the Washington Post in
January, the software works by
processing “billions of data points,
including arrest reports, property records,
commercial databases, deep web
searches and the [person’s] social media
postings”.
Photo: Nick Otto/For The Washington Post
Quote: https://www.theguardian.com/technology/2016/feb/04/us-police-data-analytics-smart-cities-crime-likelihood-fresno-chicago-heat-list
8.
How to do it well
Staff appropriately
• Specialist skills are in demand:
• Big Data
• Data Management
• Have a plan to recruit and retain them!
Data Quality
• Big Data leaders show maturity in data quality
9.
Final Point
Big Data is a prerequisite of the desire for better analytics, the desire to understand better. By itself, it is just a large data set waiting to be breached.
10.
Points of contact
Ben Fountain
Senior Consultant
M: +44 (0) 7545 503 311
E: ben.fountain@nccgroup.trust
NCC Group Blogs
https://www.nccgroup.trust/uk/about-us/newsroom-and-events/blogs/
TED Talks on Big Data
https://www.ted.com/search?q=big+data
12.
NCC Locations
Europe
Manchester - Head Office
Amsterdam
Basingstoke
Cambridge
Copenhagen
Cheltenham
Delft
Edinburgh
Glasgow
Leatherhead
Leeds
London
Luxembourg
Madrid
Malmö
Milton Keynes
Munich
Vilnius
Zurich
Australia
Sydney
North America
Atlanta
Austin
Chicago
Kitchener
New York
San Francisco
Seattle
Sunnyvale
Editor's Notes
So, I’m starting by defining how I think of Big Data.
Experimenting by falsely attributing this definition.
This definition as of today has zero hits on google.
I will experiment with a search over time to see how, or if, Google manages to find and attribute the quote to Oscar or to me.
With Big Data you are sifting a larger data set, looking for more specific information than has previously been possible. Sometimes patterns emerge that weren't previously identified at a macro scale; that happens more often in scientific efforts, while business is typically looking to better exploit an existing market rather than break new ground.
So what are you looking to analyse? What are the data sets and how have they been compiled? What is their provenance? What about the data quality? Where Big Data projects have provided meaningful benefits, a trend shows that these companies have three aspects in place:
Strong staff who are interested in asking the right questions, not obsessed with 'big data' as a buzzword.
Big Data doesn't change the Garbage In, Garbage Out principle; mature data quality processes are a must.
A responsible approach, with several aspects:
Big data can expose details that are not palatable to the general public, or sometimes to the company; you need to recognise that the analysis may challenge the hypothesis.
RBAC (role-based access control) is critical; exposing these data sets can result in significant harm to your organisation and to everyone referred to, either directly or indirectly.
Compliance becomes critical as soon as you have data sets which correlate to identify individuals instead of groups.
Whilst personalised healthcare, advertising that predicts what we want just in time for us to purchase it, and automatic identification of criminals are the goals, far too often we have found that new technologies tend to be exploited for less laudable ends.
Big Data under the GDPR will be associated with big fines.
Gunter Ollman of NCC Domain Services proposed that these controls give an overlapping set that works together across network, vulnerability, behaviour and (to a degree) stupidity to jointly reduce the likelihood and impact of a breach.
Track all access and processing of the data, encrypt sensitive data as soon as possible, ideally at the source.
Don’t leave the keys in the same place as the data.
Log everything and monitor it. Leverage anomaly detection systems to improve the signal-to-noise ratio until humans can realistically review the volume of data.
Use automated scanning to constantly monitor systems for vulnerabilities and malware.
Monitor network egress for anomalies in traffic.
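The egress-monitoring control above could be sketched as a simple rolling-baseline check. This is a toy illustration, not how any particular product works: the interval data, window size and threshold are invented, and a real deployment would use proper network telemetry.

```python
# Sketch: flag intervals whose outbound byte count is far above the
# rolling baseline of the preceding intervals.
import statistics

def egress_anomalies(byte_counts, window=10, threshold=3.0):
    """Return indices whose volume exceeds `threshold` standard
    deviations above the mean of the preceding `window` intervals."""
    flagged = []
    for i in range(window, len(byte_counts)):
        baseline = byte_counts[i - window:i]
        mean = statistics.mean(baseline)
        stdev = statistics.pstdev(baseline) or 1.0  # avoid divide-by-zero
        if (byte_counts[i] - mean) / stdev > threshold:
            flagged.append(i)
    return flagged

# Steady ~1 MB per interval, then a sudden 50 MB exfiltration-sized spike.
traffic = [1_000_000 + (i % 7) * 10_000 for i in range(20)] + [50_000_000]
print(egress_anomalies(traffic))  # prints [20]
```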
Create a number of "false flag" records. These will automatically alert your security team if they are accessed. Configure alerts and blocks to identify and prevent data breaches.
We can split companies' use of big data into what happened and what will happen, and further segment that to provide a maturity model.
Descriptive analytics is where most big data activity in the IT sector remains at the moment: log collation and some analysis.
In some instances we have a breach and move to diagnostic analytics as we look to analyse the detail, but this takes effort, and because many organisations still do not report breaches, the patterns are not always clear enough to derive a confident conclusion. This is a reactive position.
Predictive analytics is where some of the more advanced and security-focussed organisations are moving. Threat-modelling efforts sit here.
Prescriptive analytics: crystal-ball gazing is now moving into pre-crime, and this is already happening for several police forces in the US. https://www.theguardian.com/technology/2016/feb/04/us-police-data-analytics-smart-cities-crime-likelihood-fresno-chicago-heat-list
When a call comes in, officers respond while Beware checks the address, gets the names of the residents, and checks these against public data sources to threat-model them red/amber/green (RAG).
How this is done is a trade secret, but it could identify a PTSD sufferer who has tweeted about having bad experiences. Your tweets could influence whether the officer approaches the door, and if you are flagged red, say because your account has recently been hacked, then the outcome may be violent.
http://www.aclunc.org/docs/201512-social_media_monitoring_softare_pra_response.pdf
Traditional IT staff are often the wrong fit for big data: they focus on the Technology and not the Information.
Specialist skills are required, and only a few organisations truly work at Big Data exabyte scales, so those skills are in high demand.
Analysis improves when the importance of data quality is embedded in all your systems, ensuring that data sets are filtered as they progress through downstream systems before they reach the Big Data aggregation point.
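As a minimal illustration of that filtering idea, a data-quality gate might reject records before they reach the aggregation point. The field names and rules below are invented for the sketch, not from the deck:

```python
# Sketch: a data-quality gate applied upstream of the aggregation point.
def is_clean(record):
    """Reject records that would poison downstream analysis."""
    return (
        bool(record.get("customer_id"))        # no orphan records
        and "@" in record.get("email", "")     # minimally plausible email
        and record.get("amount", 0) >= 0       # no negative amounts
    )

raw = [
    {"customer_id": "c1", "email": "a@example.com", "amount": 12.5},
    {"customer_id": "",   "email": "b@example.com", "amount": 3.0},  # orphan
    {"customer_id": "c3", "email": "not-an-email",  "amount": 7.0},  # bad email
]
clean = [r for r in raw if is_clean(r)]
print(len(clean))  # prints 1
```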
I’ve configured a Google alert to track this quote and I’m looking forward to seeing who it gets attributed to.