Stopping Ethan Hunt from
Taking Your Data!
Feni Chawla
Mukund Sarma
Who Are We?
Mukund Sarma
Senior Director of Product Security, Chime
● Security Engineer turned
manager
● Chime ← Credit Karma ←
Synopsys
● There are no Security problems
- They are all engineering and
culture problems!
Feni Chawla
Senior Security Engineer, Chime
● Data engineer turned security
engineer
● Chime ← Rally Health ←
Microsoft ← Teradata
● Passionate about keeping data
safe and user information
private
<¡Spoiler Alert!> How Ethan Hunt Steals Data from the CIA
Gets past the following controls to get to the database:
I. Retinal scan
II. Double key card
III. Thermal and pressure sensors
IV. Laser rays
🗣 Feni
<¡Spoiler Alert!> How Ethan Hunt Steals Data from the CIA
When he gets to the database:
🗣 Feni
1996 Movie → 2024 Reality
Snowflake + RDS + S3 + Databricks
+ Hex + Kafka + Airflow
+ Jupyter + Looker + Sagemaker
+ Hugging Face + AWS Bedrock
….
🗣 Feni
Agenda
● Defining Data Security
● Unraveling the Roles and Responsibilities within the Data Team
● Why working with Data teams is different for Security teams
● Practical challenges of running a Data Security program
● How we approached building a pragmatic Data Security program
● What “STRIDE” looks like for Data Security
● Closing thoughts
● Questions
🗣 Feni
Things That Could Be Their Own Talks
(What’s not in scope for this talk)
● Privacy engineering is not in scope
● AI and the implications of AI in
engineering and Security
● “Data Perimeters” or concepts of that in
the world of Public Cloud
🗣 Feni
Definitions (or rather a very brief outline)
● Data warehouse - a very large database that stores integrated data from multiple sources
● Snowflake - a popular SaaS data warehouse
● ETL - the process of extracting data from various sources, transforming it into a format suitable for analysis, and loading it somewhere, typically into a data warehouse
● Looker - a reporting & visualization tool that can connect to any database or data warehouse
● Data Lake - a large centralized repository that stores vast amounts of data, typically in its native format
🗣 Feni
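The ETL definition above can be sketched in a few lines of Python. This is a minimal, standard-library-only illustration, not a production pipeline: the source rows, column names, and the in-memory SQLite database standing in for a warehouse such as Snowflake are all hypothetical.

```python
import sqlite3

def extract():
    """Extract: hypothetical rows pulled from a source system."""
    return [
        {"order_id": "A-1", "amount": "19.99", "country": " us "},
        {"order_id": "A-2", "amount": "5.00", "country": "CA"},
    ]

def transform(rows):
    """Transform: normalize types and formats so the data is ready for analysis."""
    return [
        (r["order_id"], float(r["amount"]), r["country"].strip().upper())
        for r in rows
    ]

def load(conn, rows):
    """Load: write the transformed rows into a warehouse table (SQLite stands in here)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id TEXT PRIMARY KEY, amount REAL, country TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(conn, transform(extract()))
print(conn.execute(
    "SELECT country, SUM(amount) FROM orders GROUP BY country ORDER BY country"
).fetchall())
```

The transform step is also where scrubbing of sensitive fields usually happens, which is why ETL jobs come up repeatedly in the threat examples later in this deck.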
Defining Data Security
Data security is the practice of protecting data from unauthorized access, corruption, or theft throughout its lifecycle.
🗣 Mukund
But why Data Security?
🗣 Mukund
What Makes Data Security Challenging
Data is often intangible
● It can mutate and be derived
● It can easily flow across boundaries
● No intrinsic constraints on data handling
🗣 Feni
Support: Hi, how can I help you?
Customer: I need help canceling my last order
Support: Can you provide your order number?
Customer: Sure, my SSN is 123-45-6789
What Makes Data Security Challenging
Scope is wide and growing
● Data is everywhere
● Ownership spans multiple teams
● Data teams often have multiple functions & goals
🗣 Feni
What Makes Data Security Challenging
There aren’t many precedents to learn from
● Traditional Security teams don’t understand the data domain
● It is more often seen as a Compliance function
● Not enough “security” focused tutorials / documentation
🗣 Mukund
Data Security / Infra Security / App Security
Unraveling the Roles and Responsibilities within the Data Team
🗣 Feni
Unraveling Roles and Responsibilities within the Data Team
Functional Roles
● Engineers: Processing, Data Platform, Ops & SRE
● Scientists: Modeling, ML & AI
● Analysts: Business Reporting
Responsibilities
● Goal: Gather insights from datasets
● Datasets must be:
○ Acquired
○ Transformed
○ Maintained
● Insights must be:
○ Reliable
○ Timely
○ Easy to consume
○ Granular
🗣 Feni
Using data to improve marketing results
● Build a real-time view into the performance of digital ads by demographic
● Improve performance among 21-25-year-olds in metropolitan areas
● Provide executive reporting on Q1 results
🗣 Feni
Engineers: Build a real-time view into the performance of digital ads by demographic
● Inputs: Campaign data, proprietary data
● Scrub, normalize, ingest
● Data lifecycle management
● Indexing & cataloging
● APIs & schedulers
🗣 Feni
Scientists: Improve performance among 21-25-year-olds in metropolitan areas
● Source: Data lake
● Testing & sampling
● Modeling
● Analysis
🗣 Feni
Analysts: Provide executive reporting on Q1 results
● Sources: Data lake, curated data
🗣 Feni
Bringing It All Together from a Tooling Perspective
🗣 Feni
Working with data engineers is different for security teams
🗣 Mukund
Working with Data Engineering Teams is Different
from Infra and Software Engineering Teams
🗣 Mukund
Raise your hands if your company has:
- A dedicated security onboarding program
- A security training program that covers OWASP Top 10 or similar for
your engineers
- Built specific guardrails or tools for your developers
- A DevEx or DevRel team
Continue to keep your hands raised if:
- any of those were built with your data teams/data engineers in mind
Software & Infra Teams vs. Data Teams
● Ownership
○ Software & Infra Teams: Own the software, products, and infra systems they build and maintain
○ Data Teams: Typically do not own the data itself, just process it or manage the underlying platform
● Culture
○ Software & Infra Teams: Generally start with high specificity about what they want to build or design
○ Data Teams: Generally start with low specificity and need to explore the data to solidify requirements
● Testing
○ Software & Infra Teams: Rarely need access to prod data, except for specific troubleshooting purposes
○ Data Teams: Almost always need access to prod data for modeling and validation
🗣 Mukund
Tooling - Common Security Tools Don’t Apply
Product Engineering vs. Data Teams
● Tools the team uses
○ Product Engineering: Most tools in use are mature and established, especially in production (e.g., Argo for K8s deployment, Rails)
○ Data Teams: No easy way to regulate tools operating on datasets
● Security Design Reviews
○ Product Engineering: Clear, established practice coupled with the availability of frameworks like STRIDE
○ Data Teams: No established processes & frameworks
● Vulnerability Detection
○ Product Engineering: Code scanners, SAST/DAST, pentests, bug bounty, etc.
○ Data Teams: No way to detect problems in tools and datasets
● Asset Inventory
○ Product Engineering: Generally easy to itemize services, compute, infra, etc.
○ Data Teams: No easy way to itemize datasets and models
🗣 Mukund
We ought to come to terms with the fact that most companies and security teams have not treated their data scientists and data engineers as first-class citizens. This must change!
Practical challenges of running a data security program
IAM, Data Inventory (or lack thereof), SDLC, Culture
🗣 Feni
Practical Challenges of Running a Data Security Program
#1: IAM
● Most databases and platforms use their own
RBAC
● Securing service accounts is complex
● Limiting user permissions hinders productivity
[Diagram: Role 1, Role 2, Role 3, Role 4, Service Role]
🗣 Feni
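Part of why service accounts are hard to secure is that roles can be granted to other roles, so a service role's real reach is the transitive closure of its grants. The sketch below assumes a warehouse with role inheritance (Snowflake works this way: a role inherits the privileges of any role granted to it) and uses a hypothetical, hard-coded grant graph rather than any vendor API.

```python
from collections import deque

# Hypothetical role graph: each role maps to the roles granted to it.
ROLE_GRANTS = {
    "SERVICE_ROLE": {"ROLE_1", "ROLE_2"},
    "ROLE_1": {"ROLE_3"},
    "ROLE_2": set(),
    "ROLE_3": {"ROLE_4"},
    "ROLE_4": set(),
}

def effective_roles(role, grants):
    """Return every role reachable from `role` through grants, including itself."""
    seen = {role}
    queue = deque([role])
    while queue:
        current = queue.popleft()
        for child in grants.get(current, set()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen

def reaches_sensitive(role, grants, sensitive):
    """Return the sensitive roles that `role` can act as, directly or indirectly."""
    return sorted(effective_roles(role, grants) & sensitive)

print(reaches_sensitive("SERVICE_ROLE", ROLE_GRANTS, {"ROLE_4"}))
```

Here SERVICE_ROLE is never granted ROLE_4 directly, yet it can reach it through ROLE_1 and ROLE_3, which is exactly the kind of indirect access that is easy to miss when reviewing grants one at a time.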
Practical Challenges of Running a Data Security Program
#2: Data Inventory & Data Discovery
● Finding sensitive data is not easy
● Hard to get classification right
🗣 Feni
In case you were
wondering, Waldo is
here!
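One common starting point for data discovery is to sample column values and label a column when enough of them match a known pattern. The Python sketch below is a simplified illustration with hypothetical patterns, column names, and an arbitrary 80% threshold; real classifiers need validation logic, context, and tuning.

```python
import re

# Hypothetical detectors; real ones need validation beyond a format match.
PATTERNS = {
    "SSN": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
    "EMAIL": re.compile(r"^[^@\s]+@[^@\s]+\.[A-Za-z]{2,}$"),
}

def classify_column(values, threshold=0.8):
    """Label a column if at least `threshold` of its non-empty sampled values match a pattern."""
    sample = [str(v).strip() for v in values if v not in (None, "")]
    if not sample:
        return []
    labels = []
    for label, pattern in PATTERNS.items():
        hits = sum(1 for v in sample if pattern.match(v))
        if hits / len(sample) >= threshold:
            labels.append(label)
    return labels

table = {
    "order_ref": ["A-1", "A-2", "123-45-6789"],
    "tax_id": ["123-45-6789", "987-65-4321", None],
    "contact": ["a@example.com", "b@example.org"],
}
for column, values in table.items():
    print(column, classify_column(values))
```

With these inputs, `tax_id` is labeled SSN and `contact` is labeled EMAIL, but the single SSN hiding in `order_ref` falls below the threshold and goes unlabeled, a small-scale version of why classification is hard to get right.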
Practical Challenges of Running a Data Security Program
#3: Redefining your SDLC to include the Data team
● Existing frameworks and processes don’t work well
○ Lack of an OWASP Top 10 equivalent
○ What do design reviews look like for a researcher?
● Traditional Security tooling is not built for identifying Data Security issues
● Lack of security training programs tailored to data teams
○ Pragmatic data governance guidelines
○ Stop using catch-all words like “PII”; complement them with the right tools
🗣 Mukund
Practical Challenges of Running a Data Security Program
#4: Culture
● Data teams’ culture of exploration runs counter to security teams’ culture of enforcing least privilege
● Data teams aren’t used to Security being involved in their development lifecycle
● Security has focused heavily on shift-left and guardrails that help our application and infrastructure engineers do things right. What about data?
● Is your team grounded in pragmatism?
🗣 Mukund
What should one do about data security?
🗣 Feni
What Should One Do? The technical stuff…
Approaches, ordered by increasing complexity:
● Inventory
○ Data Discovery: Understand where all sensitive data is
○ Role Discovery: Understand who has what access privileges
● Access Control
○ Isolate Individual Services: Ensure all ETL jobs use unique credentials
○ JIT Access Approvals: Extend JIT access controls to all data users
○ Least Privileged Access: Require users to assume the least-privileged role
● Segmentation
○ Data Segregation: Restrict data to specific locations based on risk
○ Client Segmentation: Limit specific clients to specific data based on risk
🗣 Feni
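Just-in-time access replaces standing privileges with grants that are approved by someone else and expire on their own. The sketch below models that idea in plain Python; the `Grant` type, the four-hour default TTL, the role name, and the no-self-approval rule are illustrative assumptions, not a specific product's behavior. In a real deployment the grant would be enforced by the warehouse or an access broker, not by application code.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass(frozen=True)
class Grant:
    """A hypothetical JIT grant: who, which role, approved by whom, and until when."""
    user: str
    role: str
    approver: str
    expires_at: datetime

def request_access(user, role, approver, ttl_hours=4, now=None):
    """Create a time-boxed grant; self-approval is rejected."""
    if approver == user:
        raise ValueError("requester cannot approve their own access")
    if now is None:
        now = datetime.now(timezone.utc)
    return Grant(user, role, approver, now + timedelta(hours=ttl_hours))

def is_active(grant, now=None):
    """A grant is honored only until it expires; there is no standing access."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now < grant.expires_at

start = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
g = request_access("analyst_a", "PII_READER", "data_owner_b", ttl_hours=4, now=start)
print(is_active(g, now=start + timedelta(hours=1)))
print(is_active(g, now=start + timedelta(hours=5)))
```

Passing `now` explicitly keeps the expiry logic testable; production code would rely on the clock of whatever system actually enforces the grant.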
What Should One Do? The process stuff…
Approaches, ordered by increasing complexity:
● Data Environment
○ Data Minimization: Only collect data that is absolutely needed
○ Terraform Modules with Secure Defaults: Shift left; emulate what worked in AppSec and Cloud Security
● Tooling
○ IAM Helper Tooling: Provide tools that enable least-privilege role selection
● Documentation
○ Tooling Documentation: Ask vendors for best practices for using their tools securely
○ Tutorials: Ask vendors to provide tutorials on security configurations
○ Runbooks: Provide practical operational guidance through runbooks
🗣 Feni
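IAM helper tooling can make least privilege the easy path by suggesting the narrowest existing role for a task. A minimal sketch, assuming a hypothetical role-to-table mapping and using the number of readable tables as a crude proxy for how much privilege a role carries:

```python
# Hypothetical mapping of roles to the tables each can read.
ROLE_TABLES = {
    "ANALYTICS_ADMIN": {"orders", "customers", "customers_pii", "payments"},
    "MARKETING_READ": {"orders", "campaigns"},
    "ORDERS_READ": {"orders"},
    "CUSTOMER_READ": {"orders", "customers"},
}

def least_privileged_role(needed, role_tables):
    """Pick the role that covers every needed table while granting the fewest tables overall."""
    candidates = [
        (len(tables), role)
        for role, tables in role_tables.items()
        if needed <= tables
    ]
    if not candidates:
        return None  # no single role fits; route the user to an access request
    return min(candidates)[1]  # ties broken alphabetically by role name

print(least_privileged_role({"orders"}, ROLE_TABLES))
print(least_privileged_role({"orders", "customers"}, ROLE_TABLES))
```

Returning `None` when no single role fits is a deliberate choice: the tool should send the user to an access request rather than fall back to a broad admin role.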
What Should One Do? The process stuff…
Approaches, ordered by increasing complexity:
● Model Builder Environment
○ Data Minimization: Only collect the data absolutely needed for model building
○ Terraform Modules with Secure Defaults: Shift left; emulate what worked in AppSec and Cloud Security
● Tooling
○ IAM Helper Tooling: Provide tools that enable least-privilege role selection
● Documentation
○ Tooling Documentation: Ask vendors for best practices for using their tools securely
○ Tutorials: Ask vendors to provide tutorials on security configurations
○ Runbooks: Provide practical operational guidance through runbooks
🗣 Feni
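For model builders, data minimization can start with an explicit allowlist of approved features, applied before data reaches the modeling environment. A minimal sketch with hypothetical feature and field names:

```python
# Hypothetical allowlist of features approved for one specific model.
APPROVED_FEATURES = {"account_age_days", "txn_count_30d", "avg_txn_amount"}

def minimize(records, allowed):
    """Keep only approved fields; everything else (names, SSNs, new columns) is dropped."""
    return [{k: v for k, v in r.items() if k in allowed} for r in records]

raw = [
    {
        "account_age_days": 120,
        "txn_count_30d": 14,
        "avg_txn_amount": 42.5,
        "full_name": "Jane Doe",
        "ssn": "123-45-6789",
    },
]
training_rows = minimize(raw, APPROVED_FEATURES)
print(training_rows)
```

An allowlist is used rather than a denylist so that new, unreviewed columns are dropped by default instead of quietly flowing into training data.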
What Should One Do? The people stuff…
Approaches, ordered by increasing complexity:
● Invest in building bridges
● Collaboration > Security
● Transparency - One can’t fix the issues they can’t see
● Help do the job - builds empathy and confidence
● Understand that we as an industry have neglected Data teams all along
● Plan in the open
● Teach them to fish - don’t serve them fish
🗣 Mukund
Collaborative Threat Modeling with STRIDE
Threat modeling is not just for applications and infra
🗣 Mukund
But why run collaborative threat models at all in the first place?
Data teams haven’t had to work with the Security teams on an ongoing basis
We’re all still figuring out how to work together - making this a shared activity opens up opportunities for collaboration
🗣 Mukund
Collaborative Threat Modeling with STRIDE
Threats: Spoofing, Tampering, Repudiation, Information Disclosure, Denial of Service, Elevation of Privilege
● Goal:
○ Apply a STRIDE-like model to data, in addition to apps, services, and infrastructure
○ Strategically focus on threat patterns that are not covered by application or infra security, e.g.:
■ SQL injection is already accounted for in AppSec, so it does not need to be repeated here
🗣 Feni
Collaborative Threat Modeling with STRIDE: Spoofing
● Applied to data security:
○ Using service accounts to access data directly
● Examples:
○ A Looker user runs a query on Snowflake using the Looker role
○ A Looker admin logs into Snowflake using the Looker service account credentials
🗣 Feni
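One way to catch the Looker-admin scenario is to flag service-account logins that arrive through a client a human would normally use. The event fields, account names, and client labels below are hypothetical; real login history schemas vary by vendor.

```python
# Hypothetical inventory of service accounts and interactive client types.
SERVICE_ACCOUNTS = {"looker_svc", "airflow_svc"}
INTERACTIVE_CLIENTS = {"web_ui", "sql_cli", "notebook"}

# Hypothetical login events; field names are illustrative only.
events = [
    {"user": "looker_svc", "client": "looker_connector", "ip": "10.0.0.5"},
    {"user": "looker_svc", "client": "web_ui", "ip": "10.0.8.23"},
    {"user": "analyst_a", "client": "web_ui", "ip": "10.0.8.23"},
]

def suspicious_service_logins(events):
    """Flag service-account logins that came through a client a human would use."""
    return [
        e for e in events
        if e["user"] in SERVICE_ACCOUNTS and e["client"] in INTERACTIVE_CLIENTS
    ]

for e in suspicious_service_logins(events):
    print(f"{e['user']} logged in via {e['client']} from {e['ip']}")
```

The check depends on an up-to-date inventory of service accounts, which ties it back to the role discovery work described earlier.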
Collaborative Threat Modeling with STRIDE: Tampering
● Applied to data security:
○ DB admin deleting or modifying data for personal gain
○ Application bug resulting in data being corrupted or deleted
○ Creating false data to hijack an ML model
○ IT admin deleting or overwriting audit logs
● Examples:
○ A DBA at a bank modifies balance information
○ A complex ETL job deletes certain records while loading them to the destination
○ A DBA creates false training data for an ML model, causing fraudulent transactions to go undetected
🗣 Feni
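For the ETL example, a reconciliation step that compares source and destination by primary key can surface both dropped and silently modified records. A minimal sketch with hypothetical rows, fingerprinting each row with SHA-256 from Python's `hashlib`:

```python
import hashlib

def row_fingerprint(row):
    """Stable hash of a row's fields and values, so modifications are detectable too."""
    joined = "|".join(f"{k}={row[k]}" for k in sorted(row))
    return hashlib.sha256(joined.encode()).hexdigest()

def reconcile(source, destination, key="id"):
    """Compare source and destination by primary key; report dropped and altered rows."""
    src = {r[key]: row_fingerprint(r) for r in source}
    dst = {r[key]: row_fingerprint(r) for r in destination}
    missing = sorted(src.keys() - dst.keys())
    altered = sorted(k for k in src.keys() & dst.keys() if src[k] != dst[k])
    return {"missing": missing, "altered": altered}

source = [{"id": 1, "balance": 100}, {"id": 2, "balance": 250}, {"id": 3, "balance": 75}]
destination = [{"id": 1, "balance": 100}, {"id": 2, "balance": 9999}]
print(reconcile(source, destination))
```

Reconciliation only detects tampering that happens in transit; it will not catch a DBA who edits both source and destination, which is where independent audit logs matter.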
Collaborative Threat Modeling with STRIDE: Repudiation
● Applied to data security:
○ All activity is managed and monitored using ROLES within databases, instead of users
● Examples:
○ A user could successively assume different shared roles, performing operations piecemeal with each role, making non-repudiation very challenging
🗣 Feni
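Non-repudiation depends on tying every action to an individual. The sketch below scans a hypothetical query history for two patterns that undermine that: events that carry a role but no user, and shared roles that several people assume. Field names and values are illustrative only.

```python
from collections import defaultdict

# Hypothetical query-history rows; real schemas vary by vendor.
history = [
    {"user": "alice", "role": "SHARED_ETL", "query": "DELETE FROM payments WHERE id = 7"},
    {"user": "bob", "role": "SHARED_ETL", "query": "SELECT * FROM payments"},
    {"user": None, "role": "REPORTING", "query": "UPDATE balances SET amount = 0"},
]

def attribution_gaps(history):
    """Return roles used by more than one person, and queries with no user attached."""
    users_by_role = defaultdict(set)
    unattributed = []
    for h in history:
        if h["user"] is None:
            unattributed.append(h["query"])
        else:
            users_by_role[h["role"]].add(h["user"])
    shared = {role: sorted(users) for role, users in users_by_role.items() if len(users) > 1}
    return shared, unattributed

shared, unattributed = attribution_gaps(history)
print(shared)
print(unattributed)
```

Shared roles are not a problem in themselves as long as the user is always recorded; the unattributed queries are the ones nobody can be held accountable for.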
Collaborative Threat Modeling with STRIDE: Information Disclosure
● Applied to data security:
○ Sensitive data flows from production to an analytics dashboard
○ Sensitive production data is used in dev for testing or troubleshooting
○ Sensitive data is copied to an accessible location using a service account
● Examples:
○ An ETL pipeline scrubs PII from the SSN column in PG, but not from JSON fields, so SSNs end up in the dashboard
○ Developers create a copy of sensitive data in development for testing
○ A developer runs an ETL job to copy restricted sensitive data into an S3 bucket they have access to
🗣 Feni
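The SSN example fails because the scrubber only looks at the dedicated column. A minimal sketch of a recursive scrubber that also parses JSON stored as text, assuming a simple `ddd-dd-dddd` pattern is an acceptable stand-in for a real SSN detector:

```python
import json
import re

SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
MASK = "***-**-****"

def scrub(value):
    """Recursively redact SSN-like strings inside dicts, lists, and (JSON) strings."""
    if isinstance(value, dict):
        return {k: scrub(v) for k, v in value.items()}
    if isinstance(value, list):
        return [scrub(v) for v in value]
    if isinstance(value, str):
        # JSON stored as text must be parsed, otherwise nested fields are skipped
        try:
            parsed = json.loads(value)
        except ValueError:
            return SSN_RE.sub(MASK, value)
        if isinstance(parsed, (dict, list)):
            return json.dumps(scrub(parsed))
        return SSN_RE.sub(MASK, value)
    return value

row = {
    "ssn": "123-45-6789",
    "payload": '{"note": "customer said SSN is 987-65-4321", "tags": ["vip"]}',
}
print(scrub(row))
```

The pattern check is deliberately naive; the structural point is that scrubbing has to follow the data into nested and semi-structured fields, not just named columns.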
Collaborative Threat Modeling with STRIDE: Denial of Service
● Applied to data security:
○ A processing job blows up due to complex computation on a large dataset
● Examples:
○ A single Spark job overwhelms AWS resources, resulting in cascading failures across other jobs
🗣 Feni
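Containing the blast radius of a single job usually comes down to admission control: cap what any one job may take from a shared pool. The `Cluster` class and the 25% per-job cap below are hypothetical; in practice this role is played by scheduler queues or per-job limits such as Spark's `spark.dynamicAllocation.maxExecutors`.

```python
from dataclasses import dataclass, field

@dataclass
class Cluster:
    """Hypothetical shared compute pool with a per-job cap to limit blast radius."""
    total_cores: int
    max_job_fraction: float = 0.25
    allocated: dict = field(default_factory=dict)

    def free_cores(self):
        return self.total_cores - sum(self.allocated.values())

    def admit(self, job_id, cores):
        """Admit a job only if it fits both the per-job cap and the remaining capacity."""
        cap = int(self.total_cores * self.max_job_fraction)
        if cores > cap:
            return False, f"{job_id}: requests {cores} cores, per-job cap is {cap}"
        if cores > self.free_cores():
            return False, f"{job_id}: only {self.free_cores()} cores free"
        self.allocated[job_id] = cores
        return True, f"{job_id}: admitted with {cores} cores"

cluster = Cluster(total_cores=64)
for job, cores in [("daily_etl", 16), ("huge_spark_join", 48), ("ml_features", 12)]:
    print(cluster.admit(job, cores))
```

The oversized join is rejected up front instead of starving `daily_etl` and `ml_features`, which is the cascading failure described in the example.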
Collaborative Threat Modeling with STRIDE: Elevation of Privilege
● Applied to data security:
○ Admin creating backdoors on databases
○ Using service accounts to access data
● Examples:
○ A DBA or readwrite user GRANTs increased privileges to an alternate role
○ A Looker admin logs into Snowflake using the Looker service account credentials and can read data they do not have direct access to
🗣 Feni
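The GRANT example can be monitored by scanning query history for privilege grants issued outside an approved set of admin roles. A minimal sketch, assuming log entries shaped like rows from a warehouse query history (for example Snowflake's `ACCOUNT_USAGE.QUERY_HISTORY` view) and a hypothetical allowlist containing only `SECURITYADMIN`:

```python
import re

# Hypothetical policy: only these roles are expected to manage grants.
GRANT_ADMINS = {"SECURITYADMIN"}
GRANT_RE = re.compile(
    r"^\s*GRANT\s+(?P<what>.+?)\s+TO\s+ROLE\s+(?P<grantee>\w+)", re.IGNORECASE
)

def unexpected_grants(query_log):
    """Flag GRANT ... TO ROLE statements issued by roles outside the admin allowlist."""
    findings = []
    for entry in query_log:
        m = GRANT_RE.match(entry["query"])
        if m and entry["role"].upper() not in GRANT_ADMINS:
            findings.append((entry["user"], entry["role"], m.group("what"), m.group("grantee")))
    return findings

log = [
    {"user": "dba_1", "role": "SYSADMIN", "query": "GRANT SELECT ON TABLE customers_pii TO ROLE temp_role"},
    {"user": "sec_1", "role": "SECURITYADMIN", "query": "GRANT ROLE analyst TO ROLE reporting"},
    {"user": "etl_svc", "role": "LOADER", "query": "SELECT 1"},
]
print(unexpected_grants(log))
```

A regex is a coarse filter: it only covers `GRANT ... TO ROLE` statements and would miss, for example, grants to users or grants issued indirectly through stored procedures.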
Putting it all together
● STRIDE is a valuable framework for security and data teams to work together to conceptualize and subsequently secure against threats
● There are other frameworks one could use too, e.g.:
○ DREAD - developed by the company which is graciously hosting us today
○ PASTA - Italian food will not taste the same now that you have heard about this
○ VAST - why did the pond break up with the vast ocean? Because it was too shallow
🗣 Feni
Parting Thoughts
● Data tools are neither infra tools nor app tools; they need to be approached differently
● Concepts from infra, cloud, and app security help, but need to be properly thought through when applied to data
● Security is important to everyone, but often not at
the cost of productivity
🗣 Feni
Parting Thoughts
● Start small - You will have to do a lot of heavy
lifting yourselves initially
● Share everything you do with the data team, and include them in it
● Build developer focused tools that will help
them do what they need to do securely
● Be pragmatic
● Kudos go a long way!
● Ask for help - Builds empathy
● Share what worked/didn’t work with the
industry - we’re all in this together!
🗣 Mukund
Feedback / Questions?
feni.chawla@chime.com
mukund.sarma@chime.com / X- @spashtata
We’re hiring! - www.chime.com/careers
Appendix

 
Data Driven Maintenance | UReason Webinar
Data Driven Maintenance | UReason WebinarData Driven Maintenance | UReason Webinar
Data Driven Maintenance | UReason Webinar
 
Transformers design and coooling methods
Transformers design and coooling methodsTransformers design and coooling methods
Transformers design and coooling methods
 
Embedded machine learning-based road conditions and driving behavior monitoring
Embedded machine learning-based road conditions and driving behavior monitoringEmbedded machine learning-based road conditions and driving behavior monitoring
Embedded machine learning-based road conditions and driving behavior monitoring
 
Hematology Analyzer Machine - Complete Blood Count
Hematology Analyzer Machine - Complete Blood CountHematology Analyzer Machine - Complete Blood Count
Hematology Analyzer Machine - Complete Blood Count
 
Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...
Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...
Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...
 
Computational Engineering IITH Presentation
Computational Engineering IITH PresentationComputational Engineering IITH Presentation
Computational Engineering IITH Presentation
 
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf
 
Welding Metallurgy Ferrous Materials.pdf
Welding Metallurgy Ferrous Materials.pdfWelding Metallurgy Ferrous Materials.pdf
Welding Metallurgy Ferrous Materials.pdf
 
Generative AI leverages algorithms to create various forms of content
Generative AI leverages algorithms to create various forms of contentGenerative AI leverages algorithms to create various forms of content
Generative AI leverages algorithms to create various forms of content
 

BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data

  • 1. Stopping Ethan Hunt from Taking Your Data! Feni Chawla Mukund Sarma
  • 2. Who are we Mukund Sarma Senior Director of Product Security, Chime ● Security Engineer turned manager ● Chime ← Credit Karma ← Synopsys ● There are no Security problems - They are all engineering and culture problems! Feni Chawla Senior Security Engineer, Chime ● Data engineer turned security engineer ● Chime ← Rally Health ← Microsoft ← Teradata ● Passionate about keeping data safe and user information private
  • 3. <¡Spoiler Alert!> How Ethan Hunt Steals Data from CIA Gets past the following controls to get to the database: I. Retinal scan II. Double key card III. Thermal and pressure sensors IV. Laser rays 🗣 Feni
  • 4-7. <¡Spoiler Alert!> How Ethan Hunt Steals Data from CIA When he gets to the database: 🗣 Feni
  • 8. 1996 Movie → 2024 Reality 🗣 Feni
  • 9. 1996 Movie → 2024 Reality Snowflake + RDS + S3 + Databricks + Hex + Kafka + Airflow + Jupyter + Looker + Sagemaker 🗣 Feni
  • 10. 1996 Movie → 2024 Reality Snowflake + RDS + S3 + Databricks + Hex + Kafka + Airflow + Jupyter + Looker + Sagemaker + Hugging Face + AWS Bedrock …. 🗣 Feni
  • 11. Agenda ● Defining Data Security ● Unraveling the Roles and Responsibilities within the Data Team ● Why working with Data teams is different for Security teams ● Practical challenges of running a Data Security program ● How we approached building a pragmatic Data Security program ● What “STRIDE” looks like for Data Security ● Closing thoughts ● Questions 🗣 Feni
  • 12. Things That Could Be Their Own Talks (What’s not in scope for this talk) ● Privacy engineering ● AI and the implications of AI in engineering and Security ● “Data Perimeters” and similar concepts in the public cloud 🗣 Feni
  • 13. Definitions (or rather a very brief outline) ● Data warehouse - a very large database that stores integrated data from multiple sources ● Snowflake - a popular SaaS data warehouse ● ETL - the process of extracting data from various sources, transforming it into a format suitable for analysis, and loading it somewhere, typically into a data warehouse ● Looker - a reporting & visualization tool that connects to databases and data warehouses ● Data Lake - a large, centralized repository that stores vast amounts of data, typically in its native format 🗣 Feni
  • 14. Defining Data Security Data security is the practice of protecting Data from unauthorized access, corruption, or theft throughout its lifecycle. 🗣 Mukund
  • 15. But why Data Security? 🗣 Mukund
  • 16. What Makes Data Security Challenging: Data is often intangible ● It can mutate and be derived ● It can easily flow across boundaries ● No intrinsic constraints on data handling (Illustration, a support chat: “Hi, how can I help you?” / “I need help canceling my last order” / “Can you provide your order number?” / “Sure, my SSN is 123-45-6789”) 🗣 Feni
  • 17. What Makes Data Security Challenging Scope is wide, and growing ● Data is everywhere ● Ownership spans multiple teams ● Data team often has multiple functions & goals 🗣 Feni
  • 18. What Makes Data Security Challenging: There aren’t many precedents one can learn from ● Traditional Security teams don’t understand the data domain ● More often it's seen as a Compliance function ● Not enough “security” focused tutorials / documentation (Diagram: Data Security alongside Infra Security and App Security) 🗣 Mukund
  • 19. Unraveling the roles and responsibilities within data team 🗣 Feni
  • 20. Unraveling Roles and Responsibilities within Data Team
    ● Functional roles:
      ○ Engineers: Processing, Data Platform, Ops & SRE
      ○ Scientists: Modeling, ML & AI
      ○ Analysts: Business Reporting
    ● Responsibilities:
      ○ Goal: Gather insights from datasets
      ○ Datasets must be: Acquired, Transformed, Maintained
      ○ Insights must be: Reliable, Timely, Easy to consume, Granular
    🗣 Feni
  • 21. Unraveling Roles and Responsibilities within Data Team: using data to improve marketing results
    ● Engineers: Build a real-time view into performance of digital ads by demographic
    ● Scientists: Improve performance among 21-25-year-olds in metropolitan areas
    ● Analysts: Provide executive reporting on Q1 results
    🗣 Feni
  • 22. Unraveling Roles and Responsibilities within Data Team: Engineers
    ● Task: Build a real-time view into performance of digital ads by demographic
    ● Inputs: Campaign Data, Proprietary Data
    ● Scrub, Normalize, Ingest
    ● Platform: Data Lifecycle Management, Indexing & Cataloging, APIs & Schedulers
    🗣 Feni
  • 23. Unraveling Roles and Responsibilities within Data Team: Scientists
    ● Task: Improve performance among 21-25-year-olds in metropolitan areas
    ● Working from the Data Lake: Testing & Sampling, Modeling, Analysis
    🗣 Feni
  • 24. Unraveling Roles and Responsibilities within Data Team: Analysts
    ● Task: Provide executive reporting for Q1 results
    ● Source: Curated Data from the Data Lake
    🗣 Feni
  • 25. Bringing It All Together From a Tooling Perspective
    ● Functional roles: Engineers (Processing, Data Platform, Ops & SRE), Scientists (Modeling, ML & AI), Analysts (Business Reporting)
    🗣 Feni
  • 26. Working with data engineers is different for security teams 🗣 Mukund
  • 27. Working with Data Engineering Teams is Different from Infra and Software Engineering Teams 🗣 Mukund Raise your hands if your company has: - A dedicated security onboarding program - A security training program that covers the OWASP Top 10 or similar for your engineers - Built specific guardrails or tools for your developers - A DevEx or DevRel team
  • 28. Working with Data Engineering Teams is Different from Infra and Software Engineering Teams 🗣 Mukund Continue to keep your hands raised if: - any of those were built keeping your data teams/data engineers in mind
  • 29. Working with Data Engineering Teams is Different from Infra and Software Engineering Teams
    ● Ownership
      ○ Software & Infra Teams: Own the software, products and infra systems they build and maintain
      ○ Data Teams: Typically do not own the data itself, just process it or manage the underlying platform
    ● Culture
      ○ Software & Infra Teams: Generally start with high specificity about what they want to build or design
      ○ Data Teams: Generally start with low specificity and need to explore the data to solidify requirements
    ● Testing
      ○ Software & Infra Teams: Rarely need access to prod data, except for specific troubleshooting purposes
      ○ Data Teams: Almost always need access to prod data for modeling and validation
    🗣 Mukund
  • 30. Tooling - Common Security Tools Don’t Apply
    ● Tools the team uses
      ○ Product Engineering: Most tools in use are mature and established, especially in production (e.g., Argo for K8s deployment, Rails)
      ○ Data Teams: No easy way to regulate tools operating on datasets
    ● Security Design Reviews
      ○ Product Engineering: Clear, established practice coupled with the availability of frameworks like STRIDE
      ○ Data Teams: No established processes & frameworks
    ● Vulnerability Detection
      ○ Product Engineering: Code scanners, SAST/DAST, pentests, bug bounty, etc.
      ○ Data Teams: No way to detect problems in tools and datasets
    ● Asset Inventory
      ○ Product Engineering: Generally easy to itemize services, compute, infra, etc.
      ○ Data Teams: No easy way to itemize datasets and models
    🗣 Mukund
  • 31. Working with Data Engineering Teams is Different from Infra and Software Engineering Teams 🗣 Mukund We ought to come to terms with the fact that most companies and security teams have not treated their data scientists and data engineers as first-class citizens. This must change!
  • 32. Practical challenges of running a data security program IAM, Data Inventory (or lack thereof), SDLC, Culture 🗣 Feni
  • 33. Practical Challenges of Running a Data Security Program #1: IAM ● Most databases and platforms use their own RBAC ● Securing service accounts is complex ● Limiting user permissions hinders productivity (Diagram: Roles 1-4 and a Service Role) 🗣 Feni
  • 34. Practical Challenges of Running a Data Security Program #2: Data Inventory & Data Discovery ● Finding sensitive data is not easy ● Hard to get classification right 🗣 Feni
  • 35. Practical Challenges of Running a Data Security Program #2: Data Inventory & Data Discovery ● Finding sensitive data is not easy ● Hard to get classification right 🗣 Feni In case you were wondering, Waldo is here!
  • 36. Practical Challenges of Running a Data Security Program #3: Redefining your SDLC to include the Data team ● Existing frameworks and processes don’t work well ○ Lack of an OWASP Top 10 equivalent ○ What do design reviews look like for a researcher? ● Traditional Security tooling is not built to identify Data Security issues ● Lack of security training programs catered to data teams ○ Pragmatic data governance guidelines ○ Stop using catch-all terms like “PII”; complement guidelines with the right tools 🗣 Mukund
  • 37. Practical Challenges of Running a Data Security Program #4: Culture ● The data team’s culture of exploration runs counter to the security team’s culture of enforcing least privilege ● Data teams aren’t used to Security being involved in their development lifecycle ● Security has focused heavily on shift-left and guardrails that help application and infrastructure engineers do things right. What about data? ● Is your team grounded in pragmatism? 🗣 Mukund
  • 38. What should one do to solve data security? 🗣 Feni
  • 39-42. What Should One Do? The technical stuff… (slide arrow: increased complexity)
    ● Inventory
      ○ Data Discovery: Understand where all sensitive data is
      ○ Role Discovery: Understand who has what access privileges
    ● Access Control
      ○ Isolate Individual Services: Ensure all ETL jobs use unique credentials
      ○ JIT Access Approvals: Extend JIT access controls to all data users
      ○ Least Privileged Access: Require users to assume the least-privileged role
    ● Segmentation
      ○ Data Segregation: Restrict data to specific locations based on risk
      ○ Client Segmentation: Limit specific clients to specific data based on risk
    🗣 Feni
  • 43-44. What Should One Do? The process stuff… (slide arrow: increased complexity)
    ● Data Environment
      ○ Data Minimization: Only collect data that is absolutely needed
      ○ Terraform Modules with Secure Defaults: Shift left; emulate what worked in AppSec and Cloud Security
    ● Tooling
      ○ IAM Helper Tooling: Provide tools that enable least-privilege role selection
    ● Documentation
      ○ Tooling Documentation: Ask vendors for best practices for using their tools securely
      ○ Tutorials: Ask vendors to provide tutorials on security configurations
      ○ Runbooks: Provide practical operational guidance through runbooks
    🗣 Feni
  • 45-46. What Should One Do? The process stuff… (slide arrow: increased complexity)
    ● Model Builder Environment
      ○ Data Minimization: Only collect the data absolutely needed for model building
      ○ Terraform Modules with Secure Defaults: Shift left; emulate what worked in AppSec and Cloud Security
    ● Tooling
      ○ IAM Helper Tooling: Provide tools that enable least-privilege role selection
    ● Documentation
      ○ Tooling Documentation: Ask vendors for best practices for using their tools securely
      ○ Tutorials: Ask vendors to provide tutorials on security configurations
      ○ Runbooks: Provide practical operational guidance through runbooks
    🗣 Feni
  • 47. What Should One Do? The people stuff… (slide arrow: increased complexity)
    ● Invest in building bridges
    ● Collaboration > Security
    ● Transparency - one can’t fix the issues they can’t see
    ● Help do the job - it builds empathy and confidence
    ● Understand that we as an industry have neglected Data teams all along
    ● Plan in the open
    ● Teach them to fish - don’t serve them fish
    🗣 Mukund
  • 48. Collaborative Threat Modeling with STRIDE Threat modeling is not just for applications and infra 🗣 Mukund
  • 49. But why run collaborative threat models at all in the first place?
    ● Data teams haven’t had to work with the Security teams on an ongoing basis
    ● We’re all trying to figure out how we work - having this as an activity opens up opportunities for collaboration
    🗣 Mukund
  • 50. Collaborative Threat Modeling with STRIDE
    ● Threats: Spoofing, Tampering, Repudiation, Information Disclosure, Denial of Service, Elevation of Privilege
    ● Goal:
      ○ Apply a STRIDE-like model to data, in addition to apps, services and infrastructure
      ○ Strategically focus on threat patterns that are not covered in application or infra security; e.g., SQL injection is already accounted for in AppSec
    🗣 Feni
  • 51. Collaborative Threat Modeling with STRIDE: Spoofing
    ● Applied to data security:
      ○ Using service accounts to access data directly
    ● Examples:
      ○ Looker user runs a query on Snowflake using the Looker role
      ○ Looker admin logs into Snowflake using Looker service account credentials
    🗣 Feni
  • 52. Collaborative Threat Modeling with STRIDE: Tampering
    ● Applied to data security:
      ○ DB admin deleting or modifying data for personal gain
      ○ Application bug resulting in data being corrupted or deleted
      ○ Creating false data to hijack an ML model
      ○ IT admin deleting or overwriting audit logs
    ● Examples:
      ○ DBA at a bank modifies balance information
      ○ Complex ETL job deletes certain records while loading them to the destination
      ○ DBA creates false data for training an ML model, causing fraudulent transactions to go undetected
    🗣 Feni
  • 53. Collaborative Threat Modeling with STRIDE: Repudiation
    ● Applied to data security:
      ○ All activity is managed and monitored using ROLES within databases, instead of users
    ● Examples:
      ○ A user could successively assume different shared roles, performing operations piecemeal with each role, making it very hard to attribute those actions to that user
    🗣 Feni
  • 54. Collaborative Threat Modeling with STRIDE: Information Disclosure
    ● Applied to data security:
      ○ Sensitive data flows from production to an analytics dashboard
      ○ Sensitive production data used in dev for testing or troubleshooting
      ○ Sensitive data copied to an accessible location using a service account
    ● Examples:
      ○ ETL pipeline scrubs PII from the SSN column in PG, but not from JSON fields, so SSNs end up in a dashboard
      ○ Developers creating a copy of sensitive data in development for testing
      ○ Developer running an ETL job to copy restricted sensitive data into an S3 bucket that they have access to
    🗣 Feni
  • 55. Collaborative Threat Modeling with STRIDE: Denial of Service
    ● Applied to data security:
      ○ Processing job blows up due to complex computation on a large dataset
    ● Examples:
      ○ A single Spark job overwhelms AWS resources, resulting in cascading failures across other jobs
    🗣 Feni
  • 56. Collaborative Threat Modeling with STRIDE: Elevation of Privilege
    ● Applied to data security:
      ○ Admin creating backdoors on databases
      ○ Using service accounts to access data
    ● Examples:
      ○ DBA or read-write user GRANTing increased privileges to an alternate role
      ○ Looker admin logs into Snowflake using Looker service account credentials, and can read data that they do not have access to directly
    🗣 Feni
  • 57. Putting it all together… ● STRIDE is a valuable framework for security and data teams to work together to conceptualize, and subsequently secure against, threats ● There are other frameworks one could use too, e.g.: ○ DREAD - developed by the company which is graciously hosting us today ○ PASTA - Italian food will not taste the same now that you have heard about this ○ VAST - why did the pond break up with the vast ocean? Because it was too shallow 🗣 Feni
  • 58. Parting Thoughts ● Data tools are neither infra tools nor app tools - they need to be approached differently ● Concepts from infra, cloud, and app security help, but need to be thought through properly when applied to data ● Security is important to everyone, but often not at the cost of productivity 🗣 Feni
  • 59. Parting Thoughts ● Start small - you will have to do a lot of heavy lifting yourselves initially ● Share everything you do with the data team, and include them ● Build developer-focused tools that help them do what they need to do securely ● Be pragmatic ● Kudos go a long way! ● Ask for help - it builds empathy ● Share what worked/didn’t work with the industry - we’re all in this together! 🗣 Mukund
  • 60. Feedback / Questions? feni.chawla@chime.com, mukund.sarma@chime.com / X: @spashtata. We’re hiring! www.chime.com/careers
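
Slides 33 and 39-42 describe role discovery: understanding who has what access when every platform ships its own RBAC. On platforms where roles can be granted to other roles (Snowflake's role hierarchy works this way), a user's effective privileges are the union over every role reachable from the roles they hold. The sketch below computes that closure, assuming grants have already been exported into plain Python dicts; every role, user, and privilege name is hypothetical.

```python
from collections import deque

# Hypothetical role-to-role grants: a role inherits every privilege
# of the roles granted to it.
ROLE_GRANTS: dict[str, set[str]] = {
    "analyst": {"reader"},
    "data_engineer": {"reader", "loader"},
    "reader": set(),
    "loader": set(),
}

# Hypothetical privileges granted directly to each role.
ROLE_PRIVILEGES: dict[str, set[str]] = {
    "analyst": {"SELECT:analytics.kpis"},
    "data_engineer": set(),
    "reader": {"SELECT:analytics.ads"},
    "loader": {"INSERT:analytics.ads", "SELECT:raw.campaigns"},
}

# Hypothetical user-to-role assignments.
USER_ROLES: dict[str, set[str]] = {
    "alice": {"analyst"},
    "bob": {"data_engineer"},
}


def reachable_roles(start: set[str]) -> set[str]:
    """Breadth-first walk of the role graph; `seen` also guards against cycles."""
    seen = set(start)
    queue = deque(start)
    while queue:
        role = queue.popleft()
        for inherited in ROLE_GRANTS.get(role, set()):
            if inherited not in seen:
                seen.add(inherited)
                queue.append(inherited)
    return seen


def effective_privileges(user: str) -> set[str]:
    """Union of the privileges held by every role the user can reach."""
    privileges: set[str] = set()
    for role in reachable_roles(USER_ROLES.get(user, set())):
        privileges |= ROLE_PRIVILEGES.get(role, set())
    return privileges


print(sorted(effective_privileges("alice")))
# ['SELECT:analytics.ads', 'SELECT:analytics.kpis']
```

Running this for every user gives a "who can read what" report that can be diffed over time, which is often more useful than the raw grant list.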
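
Slides 34-35 note that finding sensitive data is hard and classification is easy to get wrong. A common first pass is to sample each column and flag it when most non-empty values match a known pattern, regardless of what the column is named. A minimal sketch, assuming samples are already in memory; the two regexes, the 0.8 threshold, and the sample table are illustrative assumptions, and pattern matching alone will miss free text and produce false positives.

```python
import re

# Illustrative detectors only; real classifiers need much more context.
PATTERNS = {
    "ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[A-Za-z]{2,}$"),
}


def classify_column(samples: list[str], threshold: float = 0.8) -> set[str]:
    """Labels whose pattern matches at least `threshold` of the non-empty samples."""
    values = [v.strip() for v in samples if v and v.strip()]
    if not values:
        return set()
    labels = set()
    for label, pattern in PATTERNS.items():
        hits = sum(1 for v in values if pattern.match(v))
        if hits / len(values) >= threshold:
            labels.add(label)
    return labels


def classify_table(columns: dict[str, list[str]]) -> dict[str, set[str]]:
    """Classify each sampled column; columns with no label are left out."""
    findings = {}
    for name, samples in columns.items():
        labels = classify_column(samples)
        if labels:
            findings[name] = labels
    return findings


# Hypothetical sample: the SSNs hide under an innocuous column name.
sample = {
    "customer_ref": ["123-45-6789", "987-65-4321", "555-12-3456"],
    "contact": ["a@example.com", "b@example.org", ""],
    "plan": ["basic", "premium", "basic"],
}
print(classify_table(sample))
# {'customer_ref': {'ssn'}, 'contact': {'email'}}
```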
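
Slides 39-42 recommend extending JIT access approvals to all data users. The core mechanic is that an approved grant carries an expiry and stops counting once that time passes. A minimal sketch with a hypothetical in-memory store; it does not talk to any real data platform, so actually revoking expired grants in the database is left to the caller. `now` is passed in explicitly so the logic is deterministic and easy to test.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Grant:
    user: str
    role: str
    approved_by: str
    expires_at: datetime


class JitAccessStore:
    """Hypothetical in-memory store of time-bound role grants."""

    def __init__(self) -> None:
        self._grants: list[Grant] = []

    def approve(self, user: str, role: str, approver: str,
                duration: timedelta, now: datetime) -> Grant:
        # Require a second person to approve, as in most JIT workflows.
        if approver == user:
            raise ValueError("requesters cannot approve their own access")
        grant = Grant(user, role, approver, now + duration)
        self._grants.append(grant)
        return grant

    def has_access(self, user: str, role: str, now: datetime) -> bool:
        """True only while an unexpired grant exists for this user and role."""
        return any(g.user == user and g.role == role and now < g.expires_at
                   for g in self._grants)

    def expired(self, now: datetime) -> list[Grant]:
        """Grants past their expiry: candidates for revocation in the platform."""
        return [g for g in self._grants if now >= g.expires_at]


t0 = datetime(2024, 6, 1, 9, 0, tzinfo=timezone.utc)
store = JitAccessStore()
store.approve("alice", "pii_reader", "bob", timedelta(hours=2), now=t0)
print(store.has_access("alice", "pii_reader", t0 + timedelta(hours=1)))  # True
print(store.has_access("alice", "pii_reader", t0 + timedelta(hours=3)))  # False
```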
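
Slides 43-46 call for IAM helper tooling that enables least-privilege role selection. One way to frame that helper: given the privileges a task needs, return the available role that covers them with the fewest extra privileges. A minimal sketch; the role catalog is hypothetical, and "fewest extras, then alphabetical" is just one reasonable tie-breaking rule.

```python
from typing import Optional

# Hypothetical catalog of effective privileges per role (after inheritance).
ROLE_CATALOG: dict[str, set[str]] = {
    "ads_reader": {"SELECT:analytics.ads"},
    "marketing_analyst": {"SELECT:analytics.ads", "SELECT:analytics.kpis"},
    "sysadmin": {
        "SELECT:analytics.ads",
        "SELECT:analytics.kpis",
        "SELECT:raw.customers",
        "GRANT:*",
    },
}


def pick_least_privileged_role(
    needed: set[str], catalog: dict[str, set[str]]
) -> Optional[str]:
    """Return the role covering `needed` with the fewest extra privileges, or None."""
    candidates = [
        (len(privileges - needed), role)  # (unneeded privileges, role name)
        for role, privileges in catalog.items()
        if needed <= privileges  # role must cover everything requested
    ]
    if not candidates:
        return None
    # min() on tuples prefers fewer extras, then breaks ties alphabetically.
    return min(candidates)[1]


print(pick_least_privileged_role({"SELECT:analytics.kpis"}, ROLE_CATALOG))
# marketing_analyst
```

Returning `None` rather than falling back to a broad role makes the gap visible, which is usually the cue to create a narrower role instead of reaching for an admin one.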
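
Slide 51's spoofing example is a Looker admin logging into Snowflake with the Looker service account's credentials. If your login history records which client opened each session, one simple detection is to flag service-account sessions that come from any client other than the one that account is meant to serve. This works best when each service has its own credentials, as slides 39-42 recommend. A minimal sketch; the `Login` record fields, client names, and expected-client mapping are assumptions, not any platform's actual log schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Login:
    user: str
    client: str      # application that opened the session (assumed field)
    source_ip: str


# Hypothetical mapping: each service account should only be used by one client.
EXPECTED_CLIENT = {
    "LOOKER_SVC": "looker",
    "AIRFLOW_SVC": "airflow",
}


def suspicious_service_logins(logins: list[Login]) -> list[Login]:
    """Service-account logins that came from an unexpected client."""
    return [
        event
        for event in logins
        if event.user in EXPECTED_CLIENT
        and event.client != EXPECTED_CLIENT[event.user]
    ]


events = [
    Login("LOOKER_SVC", "looker", "10.0.0.5"),
    Login("LOOKER_SVC", "web_console", "192.168.1.20"),  # likely a person reusing the account
    Login("alice", "web_console", "192.168.1.21"),
]
for event in suspicious_service_logins(events):
    print(event)
```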
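
Slide 52 includes a complex ETL job that deletes certain records while loading them to the destination, and a DBA modifying balance information. For a pass-through load, a cheap integrity check is to reconcile keys and per-row digests between source and destination after each run. A minimal sketch, assuming rows are dicts with a hypothetical `id` key; jobs that transform data would need to compare against the expected transformed rows instead.

```python
import hashlib
import json
from typing import Iterable


def row_digest(row: dict) -> str:
    """SHA-256 of a row's contents; sort_keys makes column order irrelevant."""
    encoded = json.dumps(row, sort_keys=True, default=str).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def reconcile(source: Iterable[dict], destination: Iterable[dict],
              key: str = "id") -> dict[str, set]:
    """Report keys dropped, added, or changed between source and destination."""
    src = {row[key]: row_digest(row) for row in source}
    dst = {row[key]: row_digest(row) for row in destination}
    return {
        "missing": set(src) - set(dst),
        "unexpected": set(dst) - set(src),
        "changed": {k for k in set(src) & set(dst) if src[k] != dst[k]},
    }


source_rows = [
    {"id": 1, "balance": 100},
    {"id": 2, "balance": 250},
    {"id": 3, "balance": 75},
]
loaded_rows = [
    {"id": 1, "balance": 100},
    {"id": 3, "balance": 7500},  # value altered after extraction
]
print(reconcile(source_rows, loaded_rows))
# {'missing': {2}, 'unexpected': set(), 'changed': {3}}
```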
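
Slide 54 describes an ETL pipeline that scrubs SSNs from the SSN column in PG but not from JSON fields, so SSNs still reach a dashboard. The fix is to scrub every string value, including those nested inside JSON documents. A minimal sketch, assuming the database driver returns JSON columns as Python dicts and lists, and that SSNs appear in the common `NNN-NN-NNNN` form; a single regex is an illustrative simplification that will miss unformatted values.

```python
import re
from typing import Any

SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
MASK = "***-**-****"


def scrub(value: Any) -> Any:
    """Recursively mask SSN-like substrings in strings, lists, and dicts."""
    if isinstance(value, str):
        return SSN_PATTERN.sub(MASK, value)
    if isinstance(value, list):
        return [scrub(item) for item in value]
    if isinstance(value, dict):
        return {key: scrub(item) for key, item in value.items()}
    return value  # numbers, booleans and None pass through unchanged


# Hypothetical row as a driver might return it, with a parsed JSON column.
row = {
    "customer_id": 42,
    "ssn": "123-45-6789",
    "metadata": {"notes": ["Customer said: my SSN is 987-65-4321"], "channel": "chat"},
}
print(scrub(row))
```

Scrubbing the whole row, rather than a list of known-sensitive columns, is the design choice that closes the gap described on the slide.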
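
Slide 56 lists a DBA or read-write user GRANTing increased privileges to an alternate role. Given a query history export, GRANT statements from anyone outside an approved set of grantors can be flagged for review. A minimal sketch; the `QueryRecord` shape and the allow-list are hypothetical, and the regex only recognizes the Snowflake-style `GRANT ... TO ROLE <name>` form, so a production detector should use a real SQL parser.

```python
import re
from dataclasses import dataclass

# Matches statements like "GRANT SELECT ON TABLE t TO ROLE r" (case-insensitive).
GRANT_TO_ROLE = re.compile(r"^\s*GRANT\b.*?\bTO\s+ROLE\s+(\w+)",
                           re.IGNORECASE | re.DOTALL)

# Hypothetical allow-list of identities expected to manage grants.
APPROVED_GRANTORS = {"SECURITY_ADMIN_AUTOMATION"}


@dataclass(frozen=True)
class QueryRecord:
    user: str
    role: str
    query_text: str


def unapproved_grants(history: list[QueryRecord]) -> list[tuple[str, str, str]]:
    """(user, active role, grantee role) for GRANTs issued by unapproved users."""
    findings = []
    for record in history:
        match = GRANT_TO_ROLE.match(record.query_text)
        if match and record.user not in APPROVED_GRANTORS:
            findings.append((record.user, record.role, match.group(1)))
    return findings


history = [
    QueryRecord("alice", "ANALYST", "select * from analytics.kpis"),
    QueryRecord("dave", "READWRITE", "GRANT SELECT ON TABLE raw.customers TO ROLE side_role"),
    QueryRecord("SECURITY_ADMIN_AUTOMATION", "SECURITYADMIN", "grant role reader to role analyst"),
]
print(unapproved_grants(history))  # [('dave', 'READWRITE', 'side_role')]
```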

Speaker Notes

  1. When Mission: Impossible came out in 1996, I watched the whole movie as a kid with my jaw on the floor. One of the most memorable scenes is the one where Tom Cruise, playing Ethan Hunt, steals data from the CIA. He circumvents lots of security checks like retina scans, thermal sensors and laser rays,
  2. and then finally gets in front of the computer and enters stolen credentials,
  3. gets access to the data, which is very neatly organized for him.
  4. He opens the file to look at the data in plain text
  5. And finally copies it to a floppy disk and runs away with it
  6. Now, almost 30 years later, the imaginary risk of Ethan Hunt has been replaced by hacker farms across the world
  7. and instead of an IBM PC, the data for most organizations is sprawled across many different cloud services, like Snowflake, Databricks and S3, accessible through sophisticated tools like Hex, Jupyter, Looker, etc.
  8. And this list of services is growing rapidly with the advent of AI. All of this context brings us to the topic at hand: how do you stop Ethan Hunt, who has compromised all your other security controls and is now at the doorstep of your data store, from stealing your most sensitive data?
  9-10. Thinking of it from first principles, the CIA triad as applied to data:
    ● Confidentiality: prevent unauthorized access and protect the sensitivity of data
    ● Integrity: prevent tampering, corruption and loss of data
    ● Availability: ensure data is available to all authorized users as and when they need it
    This must be applied to the entire data lifecycle, and must balance security with productivity and agility.
  11. Unlike apps, infra and services, data can be mutated. You can derive more data from what you already have on hand, or duplicate it across boundaries. Sensitive data also has a knack for showing up in unexpected places. For example, if you transcribe and store customer conversations, all sorts of nasty sensitive things can show up. Identifying this sensitive data and securing it can be a real challenge. Also, data itself provides no intrinsic constraints on how it can be handled. Apps have APIs and infra has protocols, but data has nothing: you can do whatever you want with it, e.g. query it, dump it to your laptop, transfer it to portable disks, duplicate it in the cloud, bulk-read it, trickle-read it, etc.
  12. Every company has data sprawling everywhere, from emails to laptops, fileservers, engineering services, data products, etc. The ownership model is often fuzzy at best. For example, when party A and party B share data with a third party C, and C then joins and aggregates that data, whose data is it? Also, the data team itself is a constellation of multiple types of sub-teams whose roles span infra, services, apps, tools, SRE, etc.
  13. While the exact organizational structure will obviously differ, most data teams can be functionally categorized into three groups: data engineers, data scientists and analysts. Together, these teams are responsible for gathering insights from datasets. This involves acquiring, processing and managing the data, and delivering on SLAs spanning reliability, time-to-insight, etc. The data engineers are typically responsible for maintaining the data platform, processing all the data in it, and keeping it available. Data scientists analyze the data by building models and interpreting results to create insights or better models. Finally, analysts interface with the various business teams to help deliver on their reporting requirements. Let’s explain all this through concrete examples.
  14. Imagine a company that wants to market itself better to 21-25-year-olds. A typical way to accomplish this would be: 1. Get a real-time view into how their ads are performing across various demographics and see what’s working and what’s not; this is what data engineers would help with. 2. Once they have a proper baseline, focus on improving performance within the target age group, which is what data scientists would help with. 3. Ultimately, all these results need to be reported to the executive team and board; that’s what the analysts will help with. Let’s dive a bit deeper into what all this looks like.
  15. Let’s see how the data engineers build real-time dashboards into ad performance by demographic. They ingest campaign data from the various advertising platforms (Facebook, YouTube, Twitter, etc.) and combine it with data from internal systems such as CRM, lead databases, production applications, etc. All these sources store data in different formats and have their own errors and missing fields, so data engineers need to scrub that data, normalize it across formats and ingest it into a central platform. This platform is responsible for things like cataloging, data retention and purging, and APIs and schedulers for all the various services. They then build dashboards on top of it using existing tools like Looker, Superset, etc. As all this happens, all sorts of outages, errors, infra issues, etc. will crop up, and the data ops team has to keep everything running despite those issues.
  16. The next step is to figure out how to improve this performance. Data scientists do that by determining the factors that influence performance, running Python jobs on Jupyter: for example, what factors are common across users who skip or dislike marketing videos. They then map those factors to dataset features and run experiments to optimize them for future campaigns, creating, training and analyzing models using tools like SageMaker.
  17. Finally, analysts analyze all the results that the rest of the team stored in the data lake and build an executive dashboard with KPIs, e.g. revenue generated or predictions for the next 90 days. Analysts typically use spreadsheets, presentation and visualization tools to do this.