Even after deploying traditional security measures such as authentication and authorization to protect sensitive data, data owners and security teams still struggle to manage data risks and to gain visibility into them. The challenge multiplies when data moves and is shared across data silos such as on-premises Hadoop and public cloud infrastructure like AWS, Azure and Google Cloud. To control these risks, enterprises need a comprehensive, data-centric approach that makes it easy to identify risks, manage security and compliance policies, and apply behavior analytics to distinguish good behavior from bad. This talk explains a three-step process for implementing data-centric controls in a hybrid environment: discovering where sensitive data is stored, tracking where data moves, and identifying and controlling potential misuse of the data in near real time.
Beyond Kerberos and Ranger - Tips to discover, track and manage risks in hybrid data environments
Balaji Ganesan
Don Bosco Durai
DataWorks Summit – San Jose 2017
8. Current Challenges – Beyond Kerberos and Ranger
1. Is any sensitive data being ingested?
2. What are users doing after they are granted access?
3. Visibility into malicious use and compliance adherence
9. Challenges – Where is the sensitive data?
Sensitive data can be hidden within other data
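As a minimal sketch of why hidden sensitive data matters, the Python snippet below scans free text for US SSN-formatted strings and for card-like digit runs that pass the Luhn checksum. The patterns, the function names (find_sensitive, luhn_ok) and the sample text are illustrative assumptions, not part of any product; real discovery tools combine many more rules with context.

```python
import re

# Illustrative patterns (assumptions): US SSN in NNN-NN-NNNN form, and
# runs of 13-16 digits optionally separated by single spaces or hyphens.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,15}\d\b")


def luhn_ok(number: str) -> bool:
    """Return True if a string of digits passes the Luhn checksum."""
    total = 0
    # Walk the digits from the right and double every second one.
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_sensitive(text: str) -> dict:
    """Find SSN-like strings and Luhn-valid card numbers embedded in free text."""
    ssns = SSN_RE.findall(text)
    cards = []
    for match in CARD_RE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_ok(digits):  # filters out order numbers and other long digit runs
            cards.append(digits)
    return {"ssn": ssns, "credit_card": cards}


note = "Customer note: card 4111 1111 1111 1111, SSN 123-45-6789, ref 1234567890123"
print(find_sensitive(note))
# {'ssn': ['123-45-6789'], 'credit_card': ['4111111111111111']}
```

The 13-digit reference number has a card-like length but fails the Luhn check, so it is not reported; the card number and SSN are found even though they sit inside an ordinary note field.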
10. Challenges – What are users doing with data?
DistCp between an HDP environment and S3 in the cloud – 137 log entries for a single transaction
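One way to make such audit trails readable is to collapse related entries into a single logical transaction. The sketch below groups a user's events into one transaction until activity pauses for longer than a gap. The AuditEvent fields, the 60-second gap, the user "alice" and the sample paths are assumptions for illustration; real HDFS, Ranger and S3 audit records carry different fields, and correlating on a job or session identifier, where one is available, may be more precise than time gaps.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AuditEvent:
    # Simplified, hypothetical audit record; real audit formats differ.
    user: str
    ts: float       # epoch seconds
    op: str         # e.g. "open" on HDFS, "PutObject" on S3
    resource: str


@dataclass
class Transaction:
    user: str
    start: float
    end: float
    events: List[AuditEvent] = field(default_factory=list)

    def summary(self) -> str:
        ops = sorted({e.op for e in self.events})
        return f"{self.user}: {len(self.events)} events, ops={ops}, {self.end - self.start:.0f}s"


def sessionize(events: List[AuditEvent], gap_seconds: float = 60.0) -> List[Transaction]:
    """Group each user's events into one transaction until a pause longer than gap_seconds."""
    transactions: List[Transaction] = []
    current = {}  # user -> that user's currently open Transaction
    for e in sorted(events, key=lambda ev: ev.ts):
        t = current.get(e.user)
        if t is None or e.ts - t.end > gap_seconds:
            t = Transaction(e.user, e.ts, e.ts)
            current[e.user] = t
            transactions.append(t)
        t.events.append(e)
        t.end = e.ts
    return transactions


# 137 made-up raw entries from one copy job (alternating HDFS reads and S3 writes),
# followed by an unrelated read more than an hour later.
events = [
    AuditEvent("alice", float(i), "open" if i % 2 == 0 else "PutObject", f"/data/part-{i}")
    for i in range(137)
]
events.append(AuditEvent("alice", 5000.0, "open", "/data/other"))
for t in sessionize(events):
    print(t.summary())
```

Here the 137 entries collapse into one transaction and the later read becomes a second one. The gap threshold is a tuning trade-off: too short splits one copy job into many transactions, too long merges unrelated activity.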
12. How can you address these challenges?
1. Discover – Invest in a sensitive data discovery solution
2. Control – Build classification-based controls and policies
3. Monitor – Invest in behavioral monitoring solutions (trust your users, but verify)
13. Discovery
What is it?
• Automatically discover and classify sensitive content in structured or unstructured data
• Use machine learning and NLP to understand context
• Work with external metadata systems
How does it benefit?
• Understand the risks of data stored in the data lake
• Make control decisions based on the content of the data
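To illustrate content-based classification, the sketch below samples values from each column and attaches a tag when most sampled values match a pattern. The tag names (PII.SSN and so on), the regular expressions, the 80% threshold and the sample table are assumptions; the machine learning and NLP context described above would go well beyond these simple rules.

```python
import re

# Hypothetical tag names and patterns; a real discovery tool would add
# checksums, dictionaries, ML/NLP context and column-name hints.
RULES = {
    "PII.SSN": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
    "PCI.CREDIT_CARD": re.compile(r"^\d{13,16}$"),
    "PII.EMAIL": re.compile(r"^[^@\s]+@[^@\s]+\.[A-Za-z]{2,}$"),
}


def classify_column(values, threshold=0.8):
    """Return every tag whose pattern matches at least `threshold` of the non-empty sampled values."""
    sample = [v.strip() for v in values if v and v.strip()]
    if not sample:
        return []
    tags = []
    for tag, pattern in RULES.items():
        hits = sum(1 for v in sample if pattern.match(v))
        if hits / len(sample) >= threshold:
            tags.append(tag)
    return tags


def classify_table(columns, threshold=0.8):
    """columns: mapping of column name -> list of sampled string values."""
    return {name: classify_column(vals, threshold) for name, vals in columns.items()}


sample_table = {
    "customer": ["Ann", "Bob", "Cy"],
    "tax_id": ["123-45-6789", "987-65-4321", "n/a", "555-12-3456", "111-22-3333"],
    "notes": ["call re 123-45-6789", "", "ok"],
}
print(classify_table(sample_table))
```

The tax_id column is tagged as an SSN column from its content alone, even though its name does not say so. The notes column is not tagged, because the patterns are anchored to whole values; catching SSNs embedded in free text needs a scan inside each value, which is one reason discovery tools also look at context.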
14. Control
What is it?
• Store metadata in Atlas and leverage the Atlas-Ranger integration
• Configure policies in Ranger to grant access to data based on its content; configure IAM similarly
• Anonymize or encrypt data based on its classification
How does it benefit?
• Control access based on classification, focusing on sensitive data
• Dynamically control anonymization or encryption based on the content of the data
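The Atlas-Ranger integration lets Ranger evaluate policies against Atlas classifications (tags). The Python sketch below only illustrates the kind of decision logic such classification-based policies encode; it does not use Ranger's API. The tag names, groups, the deny-by-default rule for unlisted groups, and the "strictest tag wins" combination are all assumptions chosen for the example.

```python
from enum import IntEnum


class Decision(IntEnum):
    # Ordered from least to most restrictive so max() picks the strictest outcome.
    ALLOW = 0
    MASK = 1
    DENY = 2


# Hypothetical classification-based policies: tag -> {group: decision}.
POLICIES = {
    "PII.SSN": {"privileged": Decision.ALLOW, "contractors": Decision.MASK},
    "PCI.CREDIT_CARD": {"privileged": Decision.ALLOW, "contractors": Decision.MASK},
}
# Groups with no entry for a tag that carries a policy are denied (an assumption of this sketch).
DEFAULT_FOR_UNLISTED = Decision.DENY


def decide(user_groups, column_tags):
    """Decide access to one column from the user's groups and the column's tags."""
    result = Decision.ALLOW  # untagged columns are readable in this sketch
    for tag in column_tags:
        rules = POLICIES.get(tag)
        if rules is None:
            continue  # no policy attached to this tag
        # Within a tag, the most permissive grant among the user's groups applies...
        grants = [rules[g] for g in user_groups if g in rules]
        per_tag = min(grants) if grants else DEFAULT_FOR_UNLISTED
        # ...but across tags the strictest outcome wins.
        result = max(result, per_tag)
    return result


print(decide({"contractors"}, ["PII.SSN"]).name)
print(decide({"analysts"}, ["PCI.CREDIT_CARD"]).name)
```

Because policies are keyed by classification rather than by table or path, a newly discovered SSN column picks up the same treatment as soon as it is tagged, which is the main benefit of classification-based controls.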
15. Monitor
What is it?
• Once access is granted, monitor how users are using sensitive data
• Track whether sensitive data is being downloaded or moved out of restricted zones
• Detect and prevent data misuse and compliance violations
How does it benefit?
• Enable trust in the environment and broaden use cases
• "Trust the users, but verify"
• Provide real-time visibility to compliance and security teams
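The sketch below shows two simple monitoring rules of the kind described above: flag any copy that takes data out of a restricted zone, and flag a user whose daily count of sensitive-record reads jumps well above their own baseline. The restricted path prefix, the bucket name, the k = 3 threshold and the baseline numbers are assumptions; production behavior analytics would use richer features than a mean and standard deviation.

```python
import statistics

# Hypothetical restricted zone; in practice this could be derived from classification tags.
RESTRICTED_PREFIXES = ("/data/restricted/",)


def in_restricted_zone(path: str) -> bool:
    return path.startswith(RESTRICTED_PREFIXES)


def leaves_restricted_zone(src: str, dst: str) -> bool:
    """True when a copy or move takes data from the restricted zone to anywhere outside it."""
    return in_restricted_zone(src) and not in_restricted_zone(dst)


def is_anomalous(history, today, k=3.0, min_std=1.0):
    """Flag today's count if it exceeds the user's mean + k * standard deviation."""
    if len(history) < 2:
        return False  # not enough history to form a baseline
    mean = statistics.mean(history)
    std = max(statistics.stdev(history), min_std)  # avoid a zero-width band
    return today > mean + k * std


def alerts(user, moves, history, today_count):
    """Return human-readable alerts for one user's activity over one day."""
    out = []
    for src, dst in moves:
        if leaves_restricted_zone(src, dst):
            out.append(f"{user}: sensitive data moved {src} -> {dst}")
    if is_anomalous(history, today_count):
        out.append(
            f"{user}: {today_count} sensitive reads vs baseline {statistics.mean(history):.0f}"
        )
    return out


baseline = [100, 120, 110, 90, 105]  # made-up daily counts of sensitive-record reads
for line in alerts(
    "jennifer",
    [("/data/restricted/cards.csv", "s3a://example-bucket/export/cards.csv")],
    baseline,
    5000,
):
    print(line)
```

Comparing each user against their own history, rather than a single global limit, is what lets a "trust, but verify" approach grant broad access while still surfacing unusual behavior quickly.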
17. Enterprise-Wide Policies
Credit Card Info (Name, SSN, Credit Card #)
1. Michael is a privileged user with access to SSNs and credit card numbers.
2. Jennifer can see only the last 4 digits of the SSN and masked credit card numbers.

Michael (Privileged User)
                 Name   SSN             Credit Card
Raw HDFS Files   Yes    Yes             Yes
Hive Columns     Yes    Yes             Yes

Jennifer (Contractor)
                 Name   SSN             Credit Card
Raw HDFS Files   Yes    No              No
Hive Columns     Yes    Last 4 digits   Masked
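The Michael and Jennifer policies can be sketched as a lookup from (role, store) to a per-column rule. In this hypothetical Python version, "Masked" is taken to mean that every digit is replaced with X, and a "No" in the raw-file row means the whole read is refused, since a raw HDFS file cannot be partially redacted. The role names, store names and masking functions are hand-written stand-ins, not Ranger's masking implementations.

```python
from typing import Callable, Dict, Optional, Tuple


def keep(value: str) -> str:
    return value


def show_last4(value: str) -> str:
    """Replace every digit except the last four with 'X', keeping separators."""
    total = sum(ch.isdigit() for ch in value)
    seen = 0
    out = []
    for ch in value:
        if ch.isdigit():
            seen += 1
            out.append(ch if seen > total - 4 else "X")
        else:
            out.append(ch)
    return "".join(out)


def mask_all(value: str) -> str:
    """Replace every digit with 'X', keeping separators."""
    return "".join("X" if ch.isdigit() else ch for ch in value)


Rule = Optional[Callable[[str], str]]  # None means "no access"

# (role, store) -> column -> rule, mirroring the Michael and Jennifer tables.
POLICY: Dict[Tuple[str, str], Dict[str, Rule]] = {
    ("privileged", "hdfs"): {"name": keep, "ssn": keep, "credit_card": keep},
    ("privileged", "hive"): {"name": keep, "ssn": keep, "credit_card": keep},
    ("contractor", "hdfs"): {"name": keep, "ssn": None, "credit_card": None},
    ("contractor", "hive"): {"name": keep, "ssn": show_last4, "credit_card": mask_all},
}


def view(role: str, store: str, row: Dict[str, str]) -> Dict[str, str]:
    """Return the row as `role` would see it in `store`, or raise if any column is off-limits."""
    rules = POLICY[(role, store)]
    result = {}
    for column, value in row.items():
        rule = rules.get(column)
        if rule is None:
            raise PermissionError(f"{role} may not read {column} from {store}")
        result[column] = rule(value)
    return result


row = {"name": "Ann Lee", "ssn": "123-45-6789", "credit_card": "4111-1111-1111-1111"}
print(view("contractor", "hive", row))
# {'name': 'Ann Lee', 'ssn': 'XXX-XX-6789', 'credit_card': 'XXXX-XXXX-XXXX-XXXX'}
```

Defining the policy once per role and applying it to every store is what makes it enterprise-wide: the same contractor sees masked values in Hive and is refused the raw files, instead of each system keeping its own rules.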
18. Summary
• Risks are increasing with the growth of data, applications and users
• Build basic security controls using the security features in HDP and other environments
• Build measures to address the risks of data movement and of potential data misuse and leakage
• Gain visibility into security and compliance violations, and build templates for specific compliance needs
The risks come from data breaches and privacy penalties. Hackers are after social security numbers, credit card numbers and other personal data. The cost of data breaches is growing rapidly, with estimates reaching $2.1 trillion by 2019. At the same time, privacy regulations are on the rise, with the new GDPR imposing penalties of up to 4% of annual revenue.