Data Obfuscation in Splunk Enterprise: Anonymization, Pseudonymization and Encryption Techniques

Copyright © 2015 Splunk Inc.

Agenda
The Drivers
Data-in-Flight
Data-at-Rest
Data Obfuscation within Splunk Enterprise
– Anonymization
– Pseudonymization
– Summing Up
Demonstration

The Drivers
Risk minimization strategy

The Drivers
Collect and Process Data
Stakeholder* | Workers Council | Data Privacy Officer | GDPR | Privacy Shield | PCI | …
Requirements* | Anonymization | Pseudonymization | Pseudonymization | Encryption | RAW event archival for 1 year, 3 months online
*Examples only | Your legal department will assist you.

The Drivers
Collect and Process Data
You need a flexible platform that fits your needs, even if they change!

Spoilt for Choice
What
– Confidentiality / Integrity / Authenticity
Where
– At Source / In Flight / At Rest / Presentation Layer
How
– Anonymization / Pseudonymization
Usability, Maintainability, Cost, …

Data-in-Flight
Ways to secure your connections to Splunk Enterprise
Encryption and/or authentication using your own certificates for:
– Communications between the browser and Splunk Web
– Communication from Splunk forwarders to indexers
– Other types of communication, such as communications between Splunk
instances over the management port
Type of exchange | Client function | Server function | Encryption | Certificate Authentication | Common Name checking | Type of data exchanged
Browser to Splunk Web | Browser | Splunk Web | NOT enabled by default | dictated by client (browser) | dictated by client (browser) | search term results
Inter-Splunk communication | Splunk Web | splunkd | enabled by default | NOT enabled by default | NOT enabled by default | search term results
Forwarding | splunkd as a forwarder | splunkd as an indexer | NOT enabled by default | NOT enabled by default | NOT enabled by default | data to be indexed
Deployment server to indexers | splunkd as a forwarder | splunkd as an indexer | NOT enabled by default | NOT enabled by default | NOT enabled by default | Not recommended. Use Pass4SymmKey instead.
http://docs.splunk.com/Documentation/Splunk/latest/Security/AboutsecuringyourSplunkconfigurationwithSSL

Data-at-Rest Integrity
Ways to ensure the integrity of your machine data stored in Splunk
Compute SHA256 hash for every slice in hot bucket
When bucket rolls from hot to warm, create SHA256 hash of the file
containing the hashes of the individual slices
Can verify integrity from the CLI
Enable for an entire index
http://docs.splunk.com/Documentation/Splunk/latest/Security/Dataintegritycontrol
http://blogs.splunk.com/2015/10/28/data-integrity-is-back-baby/
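
The two-level hashing described on this slide (one SHA256 hash per slice, then one hash over the file that lists the slice hashes) can be illustrated with a minimal Python sketch. This is not Splunk's implementation: the function names, the newline-joined manifest format and the in-memory slices are assumptions made only for illustration.

```python
import hashlib


def slice_hashes(slices):
    """Return the SHA-256 hex digest of each slice (each slice is bytes)."""
    return [hashlib.sha256(s).hexdigest() for s in slices]


def bucket_hash(hashes):
    """Hash a newline-joined list of slice hashes.

    The joined text stands in for the file containing the slice hashes;
    the real on-disk format is not shown in the slides (assumption).
    """
    manifest = "\n".join(hashes).encode("utf-8")
    return hashlib.sha256(manifest).hexdigest()


def verify(slices, expected_hashes, expected_bucket_hash):
    """Recompute both levels and compare them with the stored values."""
    actual = slice_hashes(slices)
    return actual == expected_hashes and bucket_hash(actual) == expected_bucket_hash
```

Changing a single byte in any slice changes that slice's hash, which in turn changes the bucket-level hash, so verification fails at both levels.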

Data-at-Rest Encryption
Entire data set
Encryption of all data Splunk writes to
disk (index, raw data, metadata)
Pros:
– Easy to implement with OS or device means
/ covers all data / transparent to Splunk
Cons:
– All indexes on a given file system /
performance overhead / limited security
against rogue users

Data-at-Rest Encryption
Transparent Encryption-at-Rest with Vormetric
https://www.vormetric.com/sites/default/files/wp-splunk-vormetric.pdf

Data Obfuscation
within Splunk

What is Anonymization?
Anonymization of data means processing it with the aim of irreversibly
preventing the identification of the individual to whom it relates.
2016-12-24 09:00 host1 mm28522 login successful
2016-12-24 09:00 host1 ****** login successful

What is Pseudonymization?
Pseudonymization of data means replacing any identifying
characteristics of data with a pseudonym, or, in other words, a value
which does not allow the data subject to be directly identified.
2016-12-24 09:00 host1 mm28522 login successful
2016-12-24 09:00 host1 0fc43cd589ec74ddb677501adf6c295b login successful

Anonymization
At Rest / At Indexing Time / Modify Raw Events
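SEDCMD or TRANSFORMS
props.conf
[source::.../accounts.log]
SEDCMD-accounts = s/ssn=\d{5}(\d{4})/ssn=xxxxx\1/g
[source::.../another.log]
TRANSFORMS-anon = ssn-anon
transforms.conf
[ssn-anon]
# With DEST_KEY = _raw, FORMAT replaces the whole event,
# so the regex also captures the text before and after the SSN
REGEX = (?m)^(.*ssn=)\d{5}(\d{4}.*)$
FORMAT = $1xxxxx$2
DEST_KEY = _raw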
https://docs.splunk.com/Documentation/Splunk/latest/Data/Anonymizedata
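
To check what the SEDCMD substitution does before deploying it, the same sed-style replacement can be tried outside Splunk. The sketch below uses Python's `re` module with the pattern from the slide; the function name `mask_ssn` and the sample event are made up for illustration, and Splunk's own regex engine may differ in edge cases.

```python
import re

# Same pattern as the SEDCMD: drop the first five digits, keep the last four.
SSN_PATTERN = re.compile(r"ssn=\d{5}(\d{4})")


def mask_ssn(event: str) -> str:
    """Replace the first five SSN digits with 'x' in every match."""
    return SSN_PATTERN.sub(r"ssn=xxxxx\1", event)


print(mask_ssn("2016-12-24 09:00 host1 ssn=123456789 login successful"))
```

Only the matched part is rewritten and the rest of the event is left untouched, which is the behavior expected from the SEDCMD variant.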

Anonymization
Presentation Layer / At Search Time
Locked down User
– Pre-defined App with dashboard access only
– No search app, no raw search, no raw event drill down
| eval username = "******"
https://docs.splunk.com/Documentation/Splunk/6.5.1/Data/Anonymizedata

Pseudonymization
Presentation Layer / At Search Time
Locked down User
– Pre-defined App with dashboard access only
– No search app, no raw search, no raw event drill down
| eval username = sha256(username)
or use your own custom search command
https://docs.splunk.com/Documentation/Splunk/6.5.1/Data/Anonymizedata
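
The idea behind the search-time hash can be sketched in Python. `pseudonym` mirrors the `sha256(username)` eval under the assumption that the value is UTF-8 text and the result is a hex string. `keyed_pseudonym` is a hypothetical variant that is not in the slides: it shows what a custom search command might do to make guessing short usernames harder by mixing in a secret key.

```python
import hashlib
import hmac


def pseudonym(username: str) -> str:
    """Deterministic pseudonym: SHA-256 hex digest of the username."""
    return hashlib.sha256(username.encode("utf-8")).hexdigest()


def keyed_pseudonym(username: str, key: bytes) -> str:
    """Hypothetical keyed variant (HMAC-SHA256); requires managing a secret key."""
    return hmac.new(key, username.encode("utf-8"), hashlib.sha256).hexdigest()
```

Both functions return the same pseudonym for the same input, so events of one user can still be correlated without showing the cleartext name.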

Pseudonymization
At Source / Application
Data pseudonymization before Splunk picks it up
Pros:
– Handled as early as possible in the process
– Data source owner responsible
– Data-Privacy challenge solved for data stored on
source as well
Cons:
– Individual solution per data source/type/method
required

Pseudonymization
Event Duplication Into Different Indexes
User authorization managed via role based
access control for indexes
Pros:
– Easy to implement and maintain, easy usability,
low complexity
Cons:
– Storage costs (can be limited with tsidx
retention but slower search)
– License costs
idx_cleartext
idx_pseudonym

Pseudonymization
Using Summary Index
Scheduled summary search transforms the
data and stores it in a new summary index
Pros:
– Summary index does not count against license
– Everything GUI managed
– Allows grouped aggregation (anonymization, too)
Cons:
– Regular search utilizing resources
– Breaks out-of-the-box CIM (source=search name,
sourcetype=stash, original sourcetype moved to
orig_sourcetype)
idx_cleartext
idx_summary

Pseudonymization
Modular Input
Data is piped through a custom method in a
modular input, decentralized at each forwarder
Pros:
– High flexibility on encryption, hashing etc. methods
and requirements
– Processing can be done decentralized at each
forwarder to distribute processing load
Cons:
– Scripting required for modular inputs

Summing Up
Many possible ways – each has pros and cons
Anonymization
– Data aggregation may be needed as an additional layer, because a specific access
to a specific file from a specific host can still allow identification of an individual
Pseudonymization
– Requires a proper concept so that the pros and cons are known and accepted
in advance, and the impact and additional complexity are understood for
production and operations
We are transparent about the possibilities and offer multiple ways and
levels of data obfuscation.
Choose the best and most efficient
combination for you!

Modular Input
Documentation
http://docs.splunk.com/Documentation/Splunk/latest/AdvancedDev/ModInputsIntro

Modular Input
Search on Splunkbase
https://splunkbase.splunk.com/apps/#/search/Modular%20Input/

Protocol Data Inputs
Different input protocols
Custom data handler allows
pre-processing of data
– Polyglot: many programming
languages can be used, e.g. Java,
JavaScript, Python, …
Different output protocols
Data Handler
https://splunkbase.splunk.com/app/1901/

Demo Scenarios
Encryption (Modular Input)
– Log file with sensitive data
– Read log file data: File Monitor input (UF)
– Protocol Data Inputs: Data Handler encrypts field values
– Data sent and stored
Decryption (Custom Search Command)
– Events in Splunk with encrypted field values
– User is authorized to use custom search command
– Custom search command decrypts fields
Anonymization (SEDCMD)
– Log file with sensitive data
– Read log file data: File Monitor input (UF)
– Pipeline: apply SEDCMD and replace data
– Data stored

Log File With Sensitive Data – cleartext.log
Field | Description | Action we want to take
first | First name | Encrypt with AES
name | Last name | Encrypt with AES
dob | Date of birth | Encrypt with AES
uid | Employee ID | Anonymize

UF File Monitor – Forward Data

Receiving side – Protocol Data Inputs

Protocol Data Inputs Configuration – Protocols

Protocol Data Inputs Configuration – Data Handler
Parameters for custom data handler:
• regex: identifies the fields to encrypt
• AES_Key_File: key to use for encryption
PDI custom data handler (here: Java)
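
The handler on this slide is written in Java and is not reproduced in the transcript. As a rough illustration of the regex-driven approach only, the Python sketch below rewrites the values of selected fields and leaves everything else alone. The `key=value` event layout, the regex, and all function names are assumptions. The cipher is injected as a callable because AES is not part of the Python standard library, and the `placeholder_cipher` shown here is base64, which is NOT encryption and only stands in for a real AES routine.

```python
import base64
import re
from typing import Callable

# Hypothetical regex: group 1 is the field name plus '=', group 2 is the value.
# Field names are taken from the cleartext.log table (first, name, dob).
FIELD_REGEX = r"\b((?:first|name|dob)=)(\S+)"


def transform_fields(event: str, field_regex: str, encrypt: Callable[[str], str]) -> str:
    """Apply `encrypt` to the value of every field matched by `field_regex`."""
    pattern = re.compile(field_regex)

    def replace(match):
        return match.group(1) + encrypt(match.group(2))

    return pattern.sub(replace, event)


def placeholder_cipher(value: str) -> str:
    """Stand-in only: base64 is reversible by anyone and is not encryption."""
    return base64.b64encode(value.encode("utf-8")).decode("ascii")


event = "first=Ada name=Lovelace dob=1815-12-10 uid=42"
print(transform_fields(event, FIELD_REGEX, placeholder_cipher))
```

Keeping the cipher separate from the field matching means the regex parameter decides what is protected, while the key file and algorithm decide how.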

Decrypt Data – Custom Search Command

SEDCMD for Anonymization of uid Field (props.conf)

Splunk User Groups EMEA
https://usergroups.splunk.com/
