Data security is rapidly gaining importance as the volume of data companies collect, analyze, and monetize grows exponentially. New data processing tools and platforms are emerging at an increasing rate, as are the ways in which organizations consume data. In this presentation, Mukund Sarma and Feni Chawla discuss the unique technical and cultural challenges of running a data security program and share practical solutions that have worked well at their company.
These slides were presented at the BSides Seattle 2024 conference.
Speaker: Venkatesh Umaashankar
LinkedIn: https://www.linkedin.com/in/venkateshumaashankar/
What will be discussed?
What is Data Science?
Types of data scientists
What makes a Data Science Team? Who are its members?
Why does a DS team need a Full Stack Developer?
Who should lead the DS team?
Building a Data Science team in a Startup vs. an Enterprise
Case studies on:
Evolution of Airbnb’s DS Team
How Facebook onboards and trains its DS team
Apple’s Acqui-hiring Strategy to build a DS team
Spotify’s ‘Center of Excellence’ Model
Who should attend?
Managers
Technical Leaders who want to get started with Data Science
User Management - the Next-Gen of Authentication (meetup, 27 January 2022) by Lior Mazor
Authentication is evolving. Customers are expecting much more from the user management experience in applications they are using today. Join us virtually for our upcoming "User Management - the next-gen of Authentication" meetup to learn about the secrets of building user management the right way, the secure way.
The 3 Key Barriers Keeping Companies from Deploying Data Products (Dataiku)
Getting from raw data to deployed data-driven solutions requires technology, data, and people. All of these exist. So why aren’t we seeing more truly data-driven companies? What’s missing, and why? During Strata Hadoop World Singapore 2015, Pauline Brown, Director of Marketing at Dataiku, explains how a lack of collaboration is keeping companies from building and deploying data products effectively. Learn more about Dataiku and Data Science Studio: www.dataiku.com
Transform Banking with Big Data and Automated Machine Learning, 9.12.17 (Cloudera, Inc.)
Banks are rich in valuable data and can build and maintain a competitive advantage by identifying and executing on high-value machine learning projects that leverage that data. This webinar will describe use cases fit for big data and machine learning in the banking sector (commercial, consumer, regulatory, and markets) and the impact they can have on your organization.
3 things to learn:
* How to create a next generation data platform and why it is important
* How to monetize big data using predictive modeling and machine learning
* What is needed for automated machine learning as a sustainable, cost-effective, and efficient solution
apidays LIVE Paris 2021 - Data privacy in the era of cloud native app by Guil... (apidays)
This document discusses data privacy in cloud-native applications. It defines key concepts like data privacy, data security, and privacy engineering. It outlines the 7 principles of GDPR including lawfulness, fairness, transparency, and others. It notes that data privacy laws are expanding globally and becoming crucial for business as more developers build features using microservices and third parties. Engineering teams are increasingly complex and privacy teams cannot keep up with mapping data flows, documenting privacy controls, and identifying risks. The document introduces Bearer, a tool that can catalog engineering components, map personal data flows, and automatically trigger risk assessments to help privacy engineers monitor privacy risks continuously without code changes.
The document discusses cyber security and information systems. It covers topics like the types of information systems, components of an information system, development of information systems, introduction to information security and the CIA triad, and the need for information security. The presenter, Mrs. Nidhi Rastogi, discusses these topics in detail over several slides.
Unlocking AI Potential: Leveraging PIA Processes for Comprehensive Impact Ass... (TrustArc)
Artificial Intelligence (AI) has emerged as a transformative force in various industries, from healthcare to finance and beyond. While AI offers incredible opportunities, it also raises ethical, legal, and social challenges that must be addressed. To navigate this complex landscape in the world of privacy, it is crucial to conduct comprehensive Privacy Impact Assessments (PIAs).
Conducting PIAs in this dynamic and evolving world of AI has brought new challenges to the privacy world. With AI increasingly being integrated into different areas of our lives, understanding the intersection between AI and PIAs is essential for any organization to ensure they are privacy forward.
Take advantage of this opportunity to gain a comprehensive understanding of AI impact assessments and their role in shaping the future of AI. In this insightful webinar, our experts will explore the power of Privacy Impact Assessments (PIAs) in ensuring responsible AI development and deployment.
In this webinar, some key topics that will be covered include:
- Introduction to AI PIAs
- PIAs demystified (why they are essential in the context of AI)
- Explore the evolving legal and regulatory landscape governing AI and privacy, including GDPR, CCPA, and other international standards
- Best practices for conducting effective PIAs in AI projects
- Future outlooks for AI and PIAs
Data science involves extracting meaningful insights from raw data through scientific methods and algorithms. It is an interdisciplinary field that focuses on analyzing large datasets using skills from computer science, mathematics, and statistics. Python is a commonly used programming language for data science due to its powerful libraries for tasks like data analysis, machine learning, and visualization. Key Python libraries include NumPy, Pandas, Matplotlib, Scikit-learn, and SciPy. The document then discusses tools, applications, and basic concepts in data science and Python.
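As a minimal sketch of the kind of workflow these libraries enable (the data and figures below are hypothetical, and NumPy alone is used for brevity; Pandas, Scikit-learn, and Matplotlib extend the same pattern to richer analysis, modeling, and visualization):

```python
import numpy as np

# Hypothetical daily ad-spend vs. revenue figures, for illustration only.
spend = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
revenue = np.array([25.0, 44.0, 66.0, 85.0, 105.0])

# Descriptive statistics: the kind of summary Pandas exposes via describe().
print("mean revenue:", revenue.mean())

# A least-squares linear fit (slope, intercept) is the simplest "modeling" step;
# Scikit-learn's LinearRegression generalizes this to many features.
slope, intercept = np.polyfit(spend, revenue, 1)
predicted = slope * 60.0 + intercept  # extrapolate to a new spend level
print("predicted revenue at spend=60:", round(predicted, 1))
```

The same three moves, summarize, fit, predict, underlie most of the data science process the document goes on to describe.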
Presented at SplunkLive! Paris 2018: Get More From Your Machine Data With Splunk AI
- Why AI & Machine Learning?
- What is Machine Learning?
- Splunk's Machine Learning Tour
- Use Cases & Customer Stories
Protecting endpoints from targeted attacks (AppSense)
This document discusses strategies for protecting endpoints from targeted attacks. It begins with an overview of the increasing threats facing organizations from malware and cyber attacks. It then outlines five principles for an effective endpoint security strategy: 1) get organizational endpoints in order through vulnerability management and application control, 2) focus on protecting data rather than infrastructure on unmanaged devices, 3) utilize thin clients and cloud-based solutions, 4) implement a zero-trust approach to authentication, and 5) maintain visibility into endpoint activity. The document recommends implementing application control, patching vulnerabilities, deploying recommended security practices, improving authentication, and integrating network and endpoint security controls. It emphasizes continuing to shift focus to securing unmanaged devices by decoupling protection from infrastructure.
Top learnings from evaluating and implementing a DLP Solution (Priyanka Aash)
This document provides an executive summary of Escorts IT's Data Loss Prevention (DLP) project review. It begins with background on Escorts, a 65-year-old Indian engineering company with four divisions and a combined turnover of Rs. 5000 crores. It then outlines three key data security challenges Escorts faces around data location, movement, and policy enforcement. The DLP project aims to address these by securing data across devices and networks. Implementation involved evaluating vendors, piloting solutions, establishing governance, training users on classification, and integrating with existing systems. Key learnings emphasized treating DLP as a business rather than IT project and properly managing change.
SplunkLive! Paris 2018: Legacy SIEM to Splunk (Splunk)
Presented at SplunkLive! Paris 2018: Legacy SIEM to Splunk, How to Conquer Migration and Not Die Trying:
- Why?
- SIEM Replacement
- Use Cases
- Data Sources & Data Onboarding
- Architecture
- Third Party Integrations
- You Got This
Data Science involves extracting insights from vast amounts of data using scientific methods and algorithms. It includes concepts like Statistics, Visualization, Machine Learning, and Deep Learning. The Data Science process goes through steps like Discovery, Preparation, Modeling, and Communication. Important roles include Data Scientist, Engineer, Analyst, and Statistician. Tools include R, SQL, Python, and SAS. Applications are in search, recommendations, recognition, gaming, and pricing. The main challenge is the variety of information and data required.
This document discusses fundamentals of IoT data analytics. It defines IoT analytics and explains challenges including dealing with large amounts of data, security issues, and misbehaving devices. It categorizes IoT data as either structured or unstructured, and as data in motion or at rest. Structured data fits a predefined model while unstructured data lacks structure. Data in motion passes through networks while data at rest is stored. Both predictive and prescriptive analytics provide more value but are more complex than descriptive or diagnostic analysis. Class activities involve capturing IoT data examples and presenting categorization and challenges.
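The two-axis categorization described above (structured vs. unstructured, in motion vs. at rest) can be sketched as a small classifier over record metadata; the field names and sample records below are assumptions for illustration, not taken from the source:

```python
# Categorize IoT records along the two axes described above:
# structure (structured vs. unstructured) and state (in motion vs. at rest).
def categorize(record):
    structure = "structured" if record.get("schema") else "unstructured"
    state = "in motion" if record.get("in_transit") else "at rest"
    return structure, state

# Hypothetical examples of each kind.
sensor_reading = {"schema": ["device_id", "temp_c", "ts"],
                  "in_transit": True}    # telemetry crossing the network
archived_video = {"schema": None,
                  "in_transit": False}   # raw footage sitting in storage

print(categorize(sensor_reading))
print(categorize(archived_video))
```

A reading with a predefined schema that is crossing the network lands in "structured, in motion"; archived raw footage lands in "unstructured, at rest", matching the document's distinction between data passing through networks and data in storage.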
This document discusses protecting clients' data and brand reputation by tackling data security issues. It identifies top worries like social media, ineffective patching, and email. It asks key questions about data management and notes that data loss prevention alone cannot fully protect data both on and off premises from human and physical factors. It emphasizes that the largest challenge is fixing the human element through extensive user training and fostering a security-focused culture. It also stresses the need for a holistic approach combining technical controls and user awareness training to securely protect data at all stages.
TechWise with Eric Kavanagh, Dr. Robin Bloor and Dr. Kirk Borne
Live Webcast on July 23, 2014
Watch the archive: https://bloorgroup.webex.com/bloorgroup/lsr.php?RCID=59d50a520542ee7ed00a0c38e8319b54
Analytical applications are everywhere these days, and for good reason. Organizations large and small are using analytics to better understand every aspect of their business: customers, processes, behaviors, even competitors. There are several critical success factors for using analytics effectively: 1) know which kind of apps make sense for your company; 2) figure out which data sets you can use, both internal and external; 3) determine optimal roles and responsibilities for your team; 4) identify where you need help, either by hiring new employees or using consultants; 5) manage your program effectively over time.
Register for this episode of TechWise to learn from two of the most experienced analysts in the business: Dr. Robin Bloor, Chief Analyst of The Bloor Group, and Dr. Kirk Borne, Data Scientist, George Mason University. Each will provide their perspective on how companies can address each of the key success factors in building, refining and using analytics to improve their business. There will then be an extensive Q&A session in which attendees can ask detailed questions of our experts and get answers in real time. Registrants will also receive a consolidated deck of slides, not just from the main presenters, but also from a variety of software vendors who provide targeted solutions.
Visit InsideAnalysis.com for more information.
Data engineering is the practice of designing and building systems for collecting, storing, and analyzing data at scale. It allows organizations to collect massive amounts of data and ensure the data is highly usable by data scientists and analysts. As data volumes continue to grow exponentially, data engineers are needed to process and channel data to enable fields like machine learning and deep learning.
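A toy version of such a system, collect, transform, then store in a queryable form, can be sketched with the standard library alone; the event fields, table name, and units below are illustrative assumptions, not a real schema:

```python
import sqlite3

# Toy extract-transform-load pipeline: the shape of what data engineers build,
# scaled down to a Python list and an in-memory SQLite database.
raw_events = [
    {"user": "alice", "ms": 1200},
    {"user": "bob", "ms": 3400},
    {"user": "alice", "ms": 900},
]

# Transform: normalize units so downstream consumers see consistent data.
cleaned = [(e["user"], e["ms"] / 1000.0) for e in raw_events]

# Load: store in a queryable form for data scientists and analysts.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user TEXT, seconds REAL)")
conn.executemany("INSERT INTO events VALUES (?, ?)", cleaned)

# Analysts can now aggregate without ever touching the raw feed.
totals = conn.execute(
    "SELECT user, SUM(seconds) FROM events GROUP BY user ORDER BY user"
).fetchall()
print(totals)
```

The separation matters: once the pipeline owns collection and normalization, every downstream model or dashboard queries one consistent store instead of re-parsing the raw feed.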
Regulatory compliance mandates have historically focused on IT & endpoint security as the primary means to protect data. However, as our digital economy has increasingly become software dependent, standards bodies have dutifully added requirements as they relate to development and deployment practices. Enterprise applications and cloud-based services constantly store and transmit data; yet, they are often difficult to understand and assess for compliance.
This webcast will present a practical approach towards mapping application security practices to common compliance frameworks. It will discuss how to define and enact a secure, repeatable software development lifecycle (SDLC) and highlight activities that can be leveraged across multiple compliance controls. Topics include:
* Consolidating security and compliance controls
* Creating application security standards for development and operations teams
* Identifying and remediating gaps between current practices and industry-accepted "best practices"
SplunkLive! Zurich 2018: Get More From Your Machine Data with Splunk & AI (Splunk)
This presentation discusses how Splunk and machine learning can help organizations get more value from their machine data. It describes how machine learning can improve decision making, uncover hidden trends, alert on deviations, and forecast incidents. The presentation provides an overview of Splunk's machine learning capabilities, including search, packaged solutions, and the machine learning toolkit. It also showcases several customer use cases that have benefited from Splunk's machine learning offerings, such as network incident detection, security/fraud prevention, and optimizing operations.
The document provides legal disclaimers and information about sustainable cybersecurity practices. It discusses starting cybersecurity at the administration level by making it cultural rather than technical, based on needs rather than vendor features, and iterative and continuous. It also discusses establishing a data protection steering committee and reducing reliance on individuals by ensuring responsibilities are understood and policies and processes are documented. The document provides recommendations on cybersecurity frameworks, controls, and best practices.
SplunkLive! Munich 2018: Get More From Your Machine Data with Splunk & AI (Splunk)
Presented at SplunkLive! Munich 2018:
- Why AI & Machine Learning?
- What is Machine Learning?
- Splunk's Machine Learning Tour
- Use Cases & Customer Stories
Agile Data Science is a lean methodology adopted from Agile Software Development. At its core, it centers on people, interactions, and building minimum viable products that ship fast and often to solicit customer feedback. In this presentation, I describe how this work was done in the past, with examples. Get started today with our help by visiting http://www.alpinenow.com
Data science involves extracting insights from vast amounts of data using scientific methods and algorithms. It includes concepts like statistics, visualization, machine learning, and deep learning. The data science process includes steps like data discovery, preparation, modeling, and operationalizing results. Important roles include data scientist, engineer, analyst, and statistician. Tools include R, SQL, Python, and SAS. Applications are in internet search, recommendations, image recognition, gaming, and price comparison. The main challenge is obtaining a high variety of information and data for accurate analysis.
Data Analytics Today - Data, Tech, and Regulation (Hendri Karisma)
This document discusses analytics, data, technology, and regulation. It begins with an introduction to Hendri Karisma and his role in data and analytics. It then defines data analytics and describes the main types: descriptive, diagnostic, predictive, and prescriptive analytics. The document outlines different data roles including data scientist, data analyst, data engineer, and AI/ML engineer. It emphasizes that building data and AI solutions requires expertise not just in science but also engineering and an understanding of relevant regulations to ensure systems are secure, trusted and reliable.
Data - Science and Engineering, slides from the Bandungpy Sharing Session (Hendri Karisma)
This document discusses data science and engineering roles. It defines data scientist and data engineer roles. Data scientists analyze large amounts of data to answer questions and drive organizational strategy, while data engineers build systems to collect, manage and transform raw data for analysis. The document also discusses the role of AI engineers, who develop complex algorithms and infrastructure for AI systems. It provides examples of responsibilities for each role and the data science experiment process.
The document summarizes key points from a presentation on privacy for tech startups. It discusses why privacy is important for startups to consider, providing practical information security controls startups can implement, and new privacy principles from the GDPR that startups should be aware of. Some highlights include:
- Privacy should be a priority from the start and can help startups win trust among users and investors.
- Practical security controls include encrypting data, patching systems, training employees, and monitoring for vulnerabilities.
- The GDPR introduces new principles like data protection by design, security of processing, breach notification requirements, data protection impact assessments, and data protection officers.
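Of the practical controls listed above, one can be made concrete with the standard library alone: never store raw passwords, store a salted, slow hash instead. The sketch below uses PBKDF2 from Python's `hashlib`; the iteration count and function names are illustrative assumptions, not a recommendation from the deck:

```python
import hashlib
import hmac
import os

# Iteration count is illustrative; higher is slower for attackers and for you.
ITERATIONS = 600_000

def hash_password(password: str) -> tuple[bytes, bytes]:
    salt = os.urandom(16)  # a unique salt per user defeats rainbow tables
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest

def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)  # constant-time comparison

salt, digest = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, digest))  # True
print(verify_password("wrong guess", salt, digest))                   # False
```

For a startup, controls like this cost little to adopt early and directly support the GDPR "security of processing" principle mentioned above.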
- Privacy should be a priority from the start and can help startups win trust among users and investors.
- Practical security controls include encrypting data, patching systems, training employees, and monitoring for vulnerabilities.
- The GDPR introduces new principles like data protection by design, security of processing, breach notification requirements, data protection impact assessments, and data protection officers.
2. Who are we
Mukund Sarma
Senior Director of Product Security, Chime
● Security Engineer turned
manager
● Chime ← Credit Karma ←
Synopsys
● There are no Security problems
- They are all engineering and
culture problems!
Feni Chawla
Senior Security Engineer, Chime
● Data engineer turned security
engineer
● Chime ← Rally Health ←
Microsoft ← Teradata
● Passionate about keeping data
safe and user information
private
3. <¡Spoiler Alert!> How Ethan Hunt Steals Data from CIA
Gets past the following controls to get to the database:
I. Retinal scan
II. Double key card
III. Thermal and pressure sensors
IV. Laser rays
🗣 Feni
4. <¡Spoiler Alert!> How Ethan Hunt Steals Data from CIA
When he gets to the database:
🗣 Feni
11. Agenda
● Defining Data Security
● Unraveling the Roles and Responsibilities within Data team
● Why working with Data teams is different for Security teams
● Practical challenges of running a Data Security program
● How we approached building a pragmatic Data Security program
● How does “STRIDE” look for Data Security
● Closing thoughts
● Questions
🗣 Feni
12. Things That Could Be Their Own Talks
(What’s not in scope for this talk)
● Privacy engineering
● AI and the implications of AI in engineering and security
● “Data Perimeters” and related concepts in the world of public cloud
🗣 Feni
13. Definitions (or rather a very brief outline)
● Data warehouse - a very large database that stores integrated data from multiple sources
● Snowflake - a popular, SaaS data warehouse
● ETL - a process of extracting data from various sources, transforming it into a format suitable for
analysis, and loading it somewhere, typically to a data warehouse
● Looker - a reporting & visualization tool that can connect to any database or data warehouse
● Data Lake - a large centralized repository that stores vast amounts of data, typically in their native format
🗣 Feni
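The ETL definition above can be illustrated with a minimal Python sketch (the source records, field names, and the in-memory "warehouse" are all invented for illustration, not any real pipeline):

```python
# Minimal ETL sketch: extract raw records, transform them into a
# consistent shape, and load them into a toy "warehouse" table.

def extract():
    # Extract: pull raw rows from two imaginary sources that use
    # inconsistent field names and value types.
    return [
        {"user": "alice", "amount_usd": "10.50"},
        {"username": "bob", "amount": 7.25},
    ]

def transform(rows):
    # Transform: normalize field names and types for analysis.
    out = []
    for row in rows:
        out.append({
            "user": row.get("user") or row.get("username"),
            "amount": float(row.get("amount_usd") or row.get("amount")),
        })
    return out

def load(rows, warehouse):
    # Load: append the cleaned rows to a warehouse "table".
    warehouse.setdefault("payments", []).extend(rows)

warehouse = {}
load(transform(extract()), warehouse)
```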
14. Defining Data Security
Data security is the practice of protecting data from unauthorized access, corruption, or theft throughout its lifecycle.
🗣 Mukund
16. What Makes Data Security Challenging
Data is often intangible
● It can mutate and be derived
● It can easily flow across boundaries
● No intrinsic constraints on data handling
Example support chat:
Agent: Hi, how can I help you?
Customer: I need help canceling my last order
Agent: Can you provide your order number?
Customer: Sure, my SSN is 123-45-6789
🗣 Feni
17. What Makes Data Security Challenging
Scope is wide, and growing
● Data is everywhere
● Ownership spans multiple teams
● Data team often has multiple functions & goals
🗣 Feni
18. What Makes Data Security Challenging
There aren’t many precedents one can learn from
● Traditional Security teams don’t understand the data domain
● More often it’s seen as a Compliance function
● Not enough “security”-focused tutorials / documentation
(Venn diagram: Data Security overlapping with Infra Security and App Security)
🗣 Mukund
20. Unraveling Roles and Responsibilities within Data Team
Functional Roles:
● Engineers - Processing, Data Platform, Ops & SRE
● Scientists - Modeling, ML & AI
● Analysts - Business Reporting
Responsibilities:
● Goal: Gather insights from datasets
● Datasets must be: Acquired, Transformed, Maintained
● Insights must be: Reliable, Timely, Easy to consume, Granular
🗣 Feni
21. Unraveling Roles and Responsibilities within Data Team
Using data to improve marketing results:
● Engineers - Build real-time view into performance of digital ads by demographic
● Scientists - Improve performance amongst the 21-25 year olds in metropolitans
● Analysts - Provide executive reporting on Q1 results
🗣 Feni
22. Unraveling Roles and Responsibilities within Data Team
Engineers - Build real-time view into performance of digital ads by demographics:
● Scrub, normalize, and ingest campaign data and proprietary data
● Data Lifecycle Management
● Indexing & Cataloging
● APIs & Schedulers
🗣 Feni
23. Unraveling Roles and Responsibilities within Data Team
Scientists - Improve performance amongst the 21-25 year olds in metropolitans:
● Testing & Sampling
● Modeling
● Analysis
● Data Lake
🗣 Feni
24. Unraveling Roles and Responsibilities within Data Team
Analysts - Provide executive reporting for Q1 results:
● Data Lake
● Curated Data
🗣 Feni
25. Bringing It All Together From a Tooling Perspective
Functional Roles: Engineers (Processing, Data Platform, Ops & SRE), Scientists (Modeling, ML & AI), Analysts (Business Reporting)
🗣 Feni
26. Working with data engineers is different for security teams
🗣 Mukund
27. Working with Data Engineering Teams is Different from Infra and Software Engineering Teams
🗣 Mukund
Raise your hands if your company has:
- A dedicated security onboarding program
- A security training program that covers OWASP Top 10 or similar for your engineers
- Built specific guardrails or tools for your developers
- A DevEx or DevRel team
28. Working with Data Engineering Teams is Different
from Infra and Software Engineering Teams
🗣 Mukund
Continue to keep your hands raised if:
- any of those were built keeping your data teams/data engineers in
mind
29. Working with Data Engineering Teams is Different from Infra and Software Engineering Teams
● Ownership: Software & infra teams own the software, products and infra systems they build and maintain; data teams typically do not own the data itself, just process it or manage the underlying platform
● Culture: Software & infra teams generally start with high specificity about what they want to build or design; data teams generally start with low specificity and need to explore the data to solidify requirements
● Testing: Software & infra teams rarely need access to prod data, except for specific troubleshooting purposes; data teams almost always need access to prod data for modeling and validation
🗣 Mukund
30. Tooling - Common Security Tools Don’t Apply
● Tools the team uses: Product engineering - most tools in use are mature and established, especially in production (e.g. Argo for K8s deployment, Rails, etc.); Data teams - no easy way to regulate tools operating on datasets
● Security Design Reviews: Product engineering - clear, established practice coupled with availability of frameworks like STRIDE; Data teams - no established processes & frameworks
● Vulnerability Detection: Product engineering - code scanners, SAST/DAST, pentests, bug bounty, etc.; Data teams - no way to detect problems in tools and datasets
● Asset Inventory: Product engineering - generally easy to itemize services, compute, infra, etc.; Data teams - no easy way to itemize datasets and models
🗣 Mukund
31. Working with Data Engineering Teams is Different from Infra and Software Engineering Teams
🗣 Mukund
We ought to come to terms with the fact that most companies and security teams have not treated their data scientists and data engineers as first-class citizens. This must change!
32. Practical challenges of running a data security program
IAM, Data Inventory (or lack thereof), SDLC, Culture
🗣 Feni
33. Practical Challenges of Running a Data Security Program
#1: IAM
● Most databases and platforms use their own RBAC
● Securing service accounts is complex
● Limiting user permissions hinders productivity
(Diagram: individual roles - Role 1, Role 2, Role 3, Role 4 - alongside a shared Service Role)
🗣 Feni
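The shared-role problem above can be sketched in a few lines of Python (the role names and grants here are hypothetical, not any real platform's RBAC):

```python
# Toy RBAC sketch: access is checked against a role, not a person.
# All role and table names below are invented for illustration.
ROLE_GRANTS = {
    "analyst": {"sales.summary"},                   # least-privilege human role
    "etl_service": {"sales.raw", "sales.summary"},  # shared service role
}

def can_read(role, table):
    # The check only knows the role, not which human or job assumed it -
    # anyone holding the "etl_service" credentials gets its full access,
    # which is why securing service accounts is complex.
    return table in ROLE_GRANTS.get(role, set())
```

This is the tension the slide describes: tightening the human role hinders productivity, while the broad service role is an attractive target.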
34. Practical Challenges of Running a Data Security Program
#2: Data Inventory & Data Discovery
● Finding sensitive data is not easy
● Hard to get classification right
🗣 Feni
35. Practical Challenges of Running a Data Security Program
#2: Data Inventory & Data Discovery
● Finding sensitive data is not easy
● Hard to get classification right
🗣 Feni
In case you were wondering, Waldo is here!
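As a toy illustration of what data discovery attempts, a naive scanner might flag columns whose values look like US SSNs; real classification is far harder than this hypothetical sketch suggests:

```python
import re

# Naive sensitive-data scanner: flags columns containing values that
# look like US SSNs (NNN-NN-NNNN). Real discovery must handle many
# formats, free text, and context, which is why it is "not easy".
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def find_ssn_columns(rows):
    # Return the set of column names whose string values match the pattern.
    flagged = set()
    for row in rows:
        for col, val in row.items():
            if isinstance(val, str) and SSN_RE.search(val):
                flagged.add(col)
    return flagged
```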
36. Practical Challenges of Running a Data Security Program
#3: Redefining your SDLC to include the Data team
● Existing frameworks and processes don’t work well
○ Lack of an OWASP Top 10 equivalent
○ What do design reviews look like for a researcher?
● Traditional security tooling is not built for identifying data security issues
● Lack of security training programs catered to data teams
○ Pragmatic data governance guidelines
○ Stop using catch-all words like “PII” - complement with the right tools
🗣 Mukund
37. Practical Challenges of Running a Data Security Program
#4: Culture
● Data team’s culture of exploration runs counter to security team’s culture of
enforcing least privilege
● Data teams aren’t used to Security being involved in their development lifecycle
● Security has focused a lot on Shift left and guardrails for our application and
infrastructure engineers to do things right. What about data?
● Is your team grounded in pragmatism?
🗣 Mukund
39. What Should One Do? The technical stuff…
(Approaches listed in order of increasing complexity)
Inventory:
● Data Discovery - Understand where all sensitive data is
● Role Discovery - Understand who has what access privileges
Access Control:
● Isolate Individual Services - Ensure all ETL jobs use unique credentials
● JIT Access Approvals - Extend JIT access controls to all data users
● Least Privileged Access - Require users to assume least privilege role
Segmentation:
● Data Segregation - Restrict data to specific locations based on risk
● Client Segmentation - Limit specific clients to specific data based on risk
🗣 Feni
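The JIT access approval row can be sketched as a time-boxed grant table (a hypothetical toy, not a real access-management API):

```python
import time

# Sketch of just-in-time (JIT) access: grants are time-boxed and expire
# rather than standing forever. Users, roles, and TTLs are illustrative.
GRANTS = {}  # (user, role) -> expiry timestamp

def grant(user, role, ttl_seconds, now=None):
    # Record an approved, time-limited grant for this user/role pair.
    now = time.time() if now is None else now
    GRANTS[(user, role)] = now + ttl_seconds

def has_access(user, role, now=None):
    # Access exists only while an unexpired grant is on record.
    now = time.time() if now is None else now
    expiry = GRANTS.get((user, role))
    return expiry is not None and now < expiry
```

The point of the sketch is the shape of the control: access defaults to "no", and every "yes" carries an expiry.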
43. What Should One Do? The process stuff…
(Approaches listed in order of increasing complexity)
Data Environment:
● Data Minimization - Only collect data that is absolutely needed
● Terraform Modules with Secure Defaults - Shift-left, emulate what worked in AppSec and Cloud Security
Tooling:
● IAM Helper Tooling - Provide tools that enable least privilege role selection
Documentation:
● Tooling Documentation - Ask vendors for best practices for using their tools securely
● Tutorials - Ask vendors to provide tutorials on security configurations
● Runbooks - Provide practical operational guidance through runbooks
🗣 Feni
45. What Should One Do? The process stuff…
(Approaches listed in order of increasing complexity)
Model Builder Environment:
● Data Minimization - Only collect the data absolutely needed for model building
● Terraform Modules with Secure Defaults - Shift-left, emulate what worked in AppSec and Cloud Security
Tooling:
● IAM Helper Tooling - Provide tools that enable least privilege role selection
Documentation:
● Tooling Documentation - Ask vendors for best practices for using their tools securely
● Tutorials - Ask vendors to provide tutorials on security configurations
● Runbooks - Provide practical operational guidance through runbooks
🗣 Feni
47. What Should One Do? The people stuff…
Approach:
● Invest in building bridges
● Collaboration > Security
● Transparency - one can’t fix the issues they can’t see
● Help do the job - builds empathy and confidence
● Understand that we as an industry have neglected data teams all along
● Plan in the open
● Teach them to fish - don’t serve them fish
🗣 Mukund
49. But why run collaborative threat models at all in the first place?
Data teams haven’t had to work with the Security teams on an ongoing basis
We’re all trying to figure out how we work - Having this as an activity opens up
opportunities for collaboration
🗣 Mukund
50. Collaborative Threat Modeling with STRIDE
Threats: Spoofing, Tampering, Repudiation, Information Disclosure, Denial of Service, Elevation of Privileges
● Goal:
○ Apply STRIDE-like model to data, in addition to apps, services and infrastructure
○ Strategically focus on threat patterns that are not covered in application or infra security (e.g., SQL injection is already accounted for in appsec)
🗣 Feni
51. Collaborative Threat Modeling with STRIDE
Threat: Spoofing
Applied to Data Security:
● Using service accounts to access data directly
Examples:
● Looker user runs query on Snowflake using Looker role
● Looker admin logs into Snowflake using Looker service account credentials
🗣 Feni
52. Collaborative Threat Modeling with STRIDE
Threat: Tampering
Applied to Data Security:
● DB admin deleting or modifying data for personal gain
● Application bug resulting in data being corrupted or deleted
● Creating false data for hijacking ML model
Examples:
● IT admin deleting or overwriting audit logs
● DBA at a bank modifies balance information
● Complex ETL job deletes certain records while loading them to destination
● DBA creates false data for training ML model that causes fraudulent transactions to be undetected
🗣 Feni
53. Collaborative Threat Modeling with STRIDE
Threat: Repudiation
Applied to Data Security:
● All activity managed and monitored using ROLES within databases, instead of users
Examples:
● User could successively assume different shared roles, performing operations piecemeal with each role, making repudiation very challenging
🗣 Feni
54. Collaborative Threat Modeling with STRIDE
Threat: Information Disclosure
Applied to Data Security:
● Sensitive data flows from production to analytics dashboard
● Sensitive production data used in dev for testing or troubleshooting
● Sensitive data copied to an accessible location using service account
Examples:
● ETL pipeline scrubs PII from SSN column in PG, but not from JSON fields, which results in SSN being present in dashboard
● Developers creating copy of sensitive data in development for testing
● Developer running ETL job to copy restricted sensitive data into an S3 bucket that they have access to
🗣 Feni
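The first example (PII scrubbed from the SSN column but left inside JSON fields) is a common gap; one mitigation is a scrubber that walks nested structures recursively instead of targeting one known column. A minimal sketch, with made-up record contents:

```python
import re

# Redact SSN-shaped strings anywhere in a nested JSON-like structure,
# not just in a known top-level column. Field names are illustrative.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def scrub(value):
    if isinstance(value, str):
        return SSN_RE.sub("[REDACTED]", value)
    if isinstance(value, dict):
        return {k: scrub(v) for k, v in value.items()}
    if isinstance(value, list):
        return [scrub(v) for v in value]
    return value  # numbers, booleans, None pass through unchanged
```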
55. Collaborative Threat Modeling with STRIDE
Threat: Denial of Service
Applied to Data Security:
● Processing job blows up due to complex computation on large dataset
Examples:
● A single Spark job overwhelms AWS resources, resulting in cascading failures across other jobs
🗣 Feni
56. Collaborative Threat Modeling with STRIDE
Threat: Elevation of Privileges
Applied to Data Security:
● Admin creating backdoors on databases
● Using service accounts to access data
Examples:
● DBA or readwrite user GRANTing increased privileges to an alternate role
● Looker admin logs into Snowflake using Looker service account credentials, and can read data that they do not have access to directly
🗣 Feni
57. Putting It All Together…
● STRIDE is a valuable framework for security and data teams to work together to conceptualize and subsequently secure against threats
● There are other frameworks one could use, e.g.:
○ DREAD - developed by the company that is graciously hosting us today
○ PASTA - Italian food will not taste the same now that you have heard about this
○ VAST - why did the pond break up with the vast ocean? Because it was too shallow
🗣 Feni
58. Parting Thoughts
● Data tools are neither infra tools nor app tools - they need to be approached differently
● Concepts from infra, cloud, and app security help, but need to be properly thought through when applied to data
● Security is important to everyone, but often not at the cost of productivity
🗣 Feni
59. Parting Thoughts
● Start small - You will have to do a lot of heavy
lifting yourselves initially
● Share/include the data team in everything you do
● Build developer-focused tools that will help them do what they need to do securely
● Be pragmatic
● Kudos go a long way!
● Ask for help - Builds empathy
● Share what worked/didn’t work with the
industry - we’re all in this together!
🗣 Mukund
When Mission Impossible 1 came out in 1996, I saw the whole movie as a kid with my jaw stuck to the floor. And one of the most memorable scenes was how Tom Cruise, playing the role of Ethan Hunt, steals data from the CIA.
He is able to circumvent lots of security checks like retina scans, thermal sensors and laser rays,
then finally gets in front of the computer and enters stolen credentials,
gets access to the data, which is very neatly organized for him,
opens the file to look at the data in plain text,
and finally copies it to a floppy disk and runs away with it.
Now, almost 30 years later, the imaginary risk of Ethan Hunt has been replaced by hacker farms across the world
and instead of an IBM PC, the data for most organizations is sprawled across many different cloud services - like Snowflake, Databricks, S3 - accessible through sophisticated tools like Hex, Jupyter, Looker, etc.
And this list of services is increasing rapidly with the advent of AI.
All of this context brings us to the topic at hand - how do you stop Ethan Hunt, who has compromised all other security controls and is now at the doorstep of your data store, from stealing your most sensitive data?
Thinking of it from first principles
CIA Triad as applied to Data:
Confidentiality – prevent unauthorized access; keep sensitive data confidential
Integrity – prevent tampering, corruption and loss of data
Availability – ensure data is available to all authorized users as and when they need it
Must be applied to the entire data lifecycle
Must balance security with productivity and agility
Unlike apps, infra and services, it can be mutated. You can derive more data from what you already have on hand, or duplicate it across boundaries
Sensitive data also has a knack for showing up in unexpected places. For example if you transcribe and store customer conversations, all sorts of nasty sensitive things can show up. Identifying this sensitive data and securing it can be a real challenge
Also, data itself provides no intrinsic constraints on how it can be handled. Apps have APIs, infra has protocols, but data has nothing - you can do whatever you want with it, e.g. query it, dump it to your laptop, transfer it to portable disks, duplicate it in the cloud, bulk-read, trickle-read, etc
Every company has data sprawling everywhere from Emails to laptops, fileservers, engineering services, data products, etc
The ownership model is often fuzzy at best. For example - when party A and party B share data with a third party C, and C then joins and aggregates that data - whose data is it?
Also, data team itself is a constellation of multiple types of sub-teams whose roles span infra, services, apps, tools, SRE, etc
While the exact organizational structure will obviously differ, most data teams can be functionally categorized into three groups: data engineers, data scientists and analysts
Together these teams are responsible for gathering insights from datasets. This involves acquiring, processing and managing the data, and delivering on SLAs spanning reliability, time-to-insights, etc.
The data engineers are typically the ones responsible for maintaining the data platform, processing all the data in it, and ensuring its availability
Data scientists are the ones who analyze the data, building models and interpreting results to generate insights or better models
Finally, analysts are the folks who interface with the various business teams to help deliver on their reporting requirements
Let’s explain all this through a concrete example
Imagine there is a company that wants to better market itself to 21-25 year-olds.
A typical way to accomplish this would be:
1. Get a real-time view into how their ads are performing across various demographics and see what’s working and what’s not - this is what data engineers would help them with
2. Once they have a proper baseline, focus on improving performance within the target age group - which is what data scientists would help with
3. Ultimately, all these results need to be reported to the executive team and board - that’s what the analysts will help with
Let’s dive a bit deeper into what all this looks like
Let’s see how the data engineers get their real-time dashboards into ads performance by demographics
They will ingest campaign data from the various advertising platforms - Facebook, YouTube, Twitter, etc.
They will additionally combine data from internal systems - CRM, lead databases, production applications, etc
All these places store data in different formats and have their own errors and missing fields. Data engineers need to scrub that data, normalize across formats and ingest it into a central platform
This platform will be responsible for things like cataloging, data retention and purging, APIs and schedulers for all the various services, etc
They will then build dashboards on top of it using existing tools like Looker, Superset, etc.
As all this happens, all sorts of outages, errors, and infra issues will crop up, and the data ops team will have to keep everything running despite them
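The scrub-and-normalize step above can be sketched as follows. The source names, field names, and unit conversions here are assumptions for illustration; real ad-platform APIs each have their own schemas, and production pipelines would do this inside an ingestion framework rather than ad hoc functions.

```python
# Hypothetical sketch: mapping campaign records from two ad platforms,
# which use different field names and formats, into one common schema.
def normalize(record, source):
    if source == "facebook":
        return {"campaign": record["campaign_name"],
                "impressions": int(record["impr"]),
                "spend_usd": float(record["spend"])}
    if source == "youtube":
        # Assume this platform reports cost in micros (millionths of a dollar)
        return {"campaign": record["title"],
                "impressions": int(record["views"]),
                "spend_usd": float(record["cost_micros"]) / 1_000_000}
    raise ValueError(f"unknown source: {source}")

rows = [
    normalize({"campaign_name": "spring", "impr": "1200", "spend": "45.5"}, "facebook"),
    normalize({"title": "spring", "views": "800", "cost_micros": "12000000"}, "youtube"),
]
print(rows)  # both rows now share the same keys and units
```

Once every source lands in this one schema, the central platform can catalog it and the dashboards can query it uniformly.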
The next step is to figure out how to improve this performance
Data scientists do that by determining the factors that influence performance, running Python jobs in Jupyter. For example, what are the common factors across users skipping marketing videos or disliking them?
Then they map those factors to dataset features and run experiments to optimize them for future campaigns. They achieve this by creating models, training them, and analyzing them using tools like SageMaker.
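A first pass at the factor analysis described above might look like this. The events and the "age band" factor are hypothetical; the point is only the shape of the exploration, computing a per-group skip rate before deciding whether to promote the factor to a model feature.

```python
from collections import defaultdict

# Hypothetical event log: did a viewer in a given age band skip the ad?
events = [
    {"age_band": "21-25", "skipped": True},
    {"age_band": "21-25", "skipped": False},
    {"age_band": "26-30", "skipped": True},
    {"age_band": "21-25", "skipped": True},
]

totals, skips = defaultdict(int), defaultdict(int)
for e in events:
    totals[e["age_band"]] += 1
    skips[e["age_band"]] += e["skipped"]

# Skip rate per age band: a candidate feature for the model
skip_rate = {band: skips[band] / totals[band] for band in totals}
print(skip_rate)
```

Factors whose skip rates differ meaningfully across groups are the ones worth encoding as features and testing in controlled experiments.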
Finally, analysts analyze all the results that the rest of the team stored in the data lake and build executive dashboards with KPIs, e.g. revenue generated and predictions for the next 90 days. Analysts typically use spreadsheets, presentation tools, and visualization tools to do this.