SlideShare uma empresa Scribd logo
1 de 32
2
Apache Eagle
Monitor Hadoop in Real Time
Yong Zhang | Senior Architect |
yonzhang2012@gmail.com
Arun Manoharan | Senior Product Manager | @lycos_86
Big Data @ eBay
800M
Listings *
159M
Global Active Buyers *
*Q3 2015 data
7
Hadoop Clusters*
800M
HDFS operations
(single cluster)*
120 PB
Data*
Hadoop @ eBay
HADOOP SECURITY
Authorization &
Access Control
Perimeter
Security
Data
Classification
Activity
Monitoring
Security
MDR
• Perimeter Security
• Authorization &
Access Control
• Discovery
• Activity Monitoring
Security for Hadoop
Who is accessing the data?
What data are they accessing?
Is someone trying to access data that they don’t have access to?
Are there any anomalous access patterns?
Is there a security threat?
How to monitor and get notified during or prior to an anomalous event occurring?
Motivation
Apache Eagle
Apache Eagle: Monitor Hadoop in Real Time
Apache Eagle is an Open Source Monitoring Platform for Hadoop
eco-system, which started with monitoring data activities in
Hadoop. It can instantly identify access to sensitive data, recognize
attacks/malicious activities and blocks access in real time.
In conjunction with components such as Ranger, Sentry,
Knox, DgSecure and Splunk etc., Eagle provides
comprehensive solution to secure sensitive data stored in
Hadoop.
Apache Eagle Composition
Apache Eagle
Integrations Alert Engine
HDFS
AUDIT
HIVE
QUERY
HBASE
AUDIT
CASSANDRA
AUDIT
MapR
AUDIT
2 HADOOP
Performance
Metric
Namenode
JMX
Metrics
Datanode
JMX
Metrics
System
Metrics
3 M/R Job
Performance
Metric
History Job
Metrics
Running
Job Metrics
4 Spark Job
Performance
Metric
Spark Job
Metrics
Queue
Metrics
1 Data Activity
Monitoring
RM
JMX
Metrics
1 Policy Store
2 Metadata API
3 Scalability
4 Extensibility
[Domains] [Applications]
More Integrations
•Cassandra
•MapR
•Mongo DB
•Job
•Queue
Extensibility
 Ranger
• As remediation engine
• As generic data source
 DgSecure
• Source of truth for data classification
 Splunk
• Syslog format output
• EAGLE alert output is the 1st abstraction of analytics and Splunk is
the 2nd abstraction
Eagle Architecture
Highlights
1. Turn-key integration: after installation, user defines rules
2. Comprehensive rules on high volume of data: Eagle solves some
unique problem in Hadoop
3. Hot deploy rule: Eagle does not provide a lot of charts, instead it
allows user to write ad-hoc rule and hot deploy it.
4. Metadata driven: kept in mind, here metadata includes policy, event
schema and UI component etc.
5. Extensibility: Keep in mind that Eagle can’t succeed alone, Eagle has to
be integrated with other system for example data classification, policy
enforcement etc.
6. Monolithic storm topology: application pre-processing are running
together with alert engine.
Example 1: Integration with HDFS AUDIT log
• Ingestion
 KafkaLog4jAppender+Kafk
a
 Logstash+Kafka
• Partition
 By user
• Pre-processing
 Sensitivity join
 Command re-assembler
Namenode
Kafka
Partition_1
Kafka
Partition_2
Kafka
Partition_N
Storm
Kafka
Spout
User1 User1
Alert
Executor_1
Alert
Executor_2
Alert
Executor_K
User2 User2
User1
User2
Data Classification - HDFS
•Browse HDFS file system
•Batch import sensitivity metadata through Eagle API
•Manually mark sensitivity in Eagle UI
 One user command generates multiple HDFS audit events
 Eagle does reverse engineering to figure out original user command
 Example
COPYFROMLOCAL_PATTERN = “every a = eventStream[cmd==‘getfileinfo’] ” +
“-> b = eventStream[cmd==‘getfileinfo’ and user==a.user and src==str:concat(a.src,‘._COPYING_’)] ” +
“-> c = eventStream[cmd==‘create’ and user==a.user and src==b.src] ” +
“-> d = eventStream[cmd==‘getfileinfo’ and user==a.user and src==b.src] ” +
“-> e = eventStream[cmd==‘delete’ and user==a.user and src==a.src] ” +
“-> f = eventStream[cmd==‘rename’ and user==a.user and src==b.src and dst==a.src]”
2015-11-20 00:06:47,090 INFO FSNamesystem.audit: allowed=true ugi=root (auth:SIMPLE) ip=/10.0.2.15 cmd=getfileinfo src=/tmp/private dst=null perm=null proto=rpc
2015-11-20 00:06:47,185 INFO FSNamesystem.audit: allowed=true ugi=root (auth:SIMPLE) ip=/10.0.2.15 cmd=getfileinfo src=/tmp/private._COPYING_ dst=null perm=null
proto=rpc
2015-11-20 00:06:47,254 INFO FSNamesystem.audit: allowed=true ugi=root (auth:SIMPLE) ip=/10.0.2.15 cmd=create src=/tmp/private._COPYING_ dst=null
perm=root:hdfs:rw-r--r-- proto=rpc
2015-11-20 00:06:47,289 INFO FSNamesystem.audit: allowed=true ugi=root (auth:SIMPLE) ip=/10.0.2.15 cmd=getfileinfo src=/tmp/private._COPYING_ dst=null perm=null
proto=rpc
2015-11-20 00:06:47,609 INFO FSNamesystem.audit: allowed=true ugi=root (auth:SIMPLE) ip=/10.0.2.15 cmd=delete src=/tmp/private dst=null perm=null proto=rpc
2015-11-20 00:06:47,624 INFO FSNamesystem.audit: allowed=true ugi=root (auth:SIMPLE) ip=/10.0.2.15 cmd=rename src=/tmp/private._COPYING_ dst=/tmp/private
perm=root:hdfs:rw-r--r-- proto=rpc
User Command Re-assembly
• Policy evaluation is stateful (one user’s data has to go to one physical bolt)
• Partition by user all the way (hash)
• User is not balanced at all
• Greedy algorithm https://en.wikipedia.org/wiki/Partition_problem#The_greedy_algorithm
Data Skew Problem
 Policy weight is not even
• Regex policy is CPU intensive
• Window based policy is Memory intensive
Computation Skew Problem
Example 2: Integration with Hive
• Ingestion
 Yarn API
• Partition
 user
• Pre-
processing
 Sensitivity join
 Hive SQL parser
Data Classification - Hive
•Browse Hive databases/tables/columns
•Batch import sensitivity metadata through Eagle API
•Manually mark sensitivity in Eagle UI
Eagle Alert Engine Overview
1 Runs CEP engine on Apache
Storm
• Use CEP engine as library (Siddhi CEP)
• Evaluate policy on streamed data
• Rule is hot deployable
2 Inject policy dynamically
• API
• Intuitive UI
3 Scalability
• Computation
# of policies (policy placement)
• Storage
# of events (event partition)
4 Extensibility for policy
enforcement
• Post-alert processing with plugin
Run CEP Engine on Storm
Storm Bolt
CEP
Worker
CEP
Worker
CEP
Worker
… …
Policy
Check
Thread
Policy
Store
Metadata API
event1
event1
event1
event1
policy1,2,3,4,5,6policy1,2,3
policy1
policy2
policy3
Storm Bolt
event1
policy4,5,6
event schema
Primitives – event, policy, alert
Raw Event
2015-10-11 01:00:00,014 INFO FSNamesystem.audit: allowed=true
ugi=user_tom@sandbox.hortonworks.com (auth:KERBEROS) ip=/10.0.0.1 cmd=getfileinfo
src=/tmp/private dst=null perm=null
Alert Event
Timestamp, cmd, src, dst, ugi, sensitivityType, securityZone
Policy
viewPrivate: from hdfsAuditLogEventStream[(cmd=='getfileinfo') and (src=’/tmp/private’)]
Alert
2015-10-11 01:00:09[UTC] hdfsAuditLog viewPrivate user_tom/10.0.0.1 The Policy "viewPrivate" has
been detected with the below information: timestamp="1445993770932" allowed="true"
cmd="getfileinfo" host="/10.0.0.1" sensitivityType="PRIVATE" securityZone="NA" src="/tmp/private"
dst="NA" user=“user_tom”
Event Schema
• Modeling event
1 Single event evaluation
• threshold check with various conditions
Policy Capabilities
2 Event window based evaluation
• various window semantics (time/length sliding/batch window)
• comprehensive aggregation support
3 Correlation for multiple event
streams
• SQL-like join
4 Pattern Match and
Sequence
• a happens followed by b
Powered by Siddhi 3.0.5, but Eagle provides dynamic capabilities
and intuitive API/UI
1 Namenode master/slave lag
from every a =
hadoopJmxMetricEventStream[metric=="hadoop.namenode.journaltransaction.lastappliedorwrittent
xid"] -> b = hadoopJmxMetricEventStream[metric==a.metric and b.host != a.host and
(max(convert(a.value, "long")) + 100) <= max(convert(value, "long"))] within 5 min select a.host as
hostA, a.value as transactIdA, b.host as hostB, b.value as transactIdB insert into tmp;
Some policy examples
3 Namenode HA state change
from every a = hadoopJmxMetricEventStream[metric=="hadoop.namenode.hastate.active.count"] ->
b = hadoopJmxMetricEventStream[metric==a.metric and b.host == a.host and (convert(a.value,
"long") != convert(value, "long"))] within 10 min select a.host, a.value as oldHaState, b.value as
newHaState, b.timestamp as timestamp, b.metric as metric, b.component as component, b.site as
site insert into tmp;
2 Namenode last checkpoint time
• from hadoopJmxMetricEventStream[metric == "hadoop.namenode.dfs.lastcheckpointtime" and
(convert(value, "long") + 18000000) < timestamp] select metric, host, value, timestamp,
component, site insert into tmp;
Define policy in UI and API
curl -u ${EAGLE_SERVICE_USER}:${EAGLE_SERVICE_PASSWD} -X POST -H 'Content-
Type:application/json' 
"http://${EAGLE_SERVICE_HOST}:${EAGLE_SERVICE_PORT}/eagle-
service/rest/entities?serviceName=AlertDefinitionService" 
-d '
[
{
"prefix": "alertdef",
"tags": {
"site": "sandbox",
"application": "hadoopJmxMetricDataSource",
"policyId": "capacityUsedPolicy",
"alertExecutorId": "hadoopJmxMetricAlertExecutor",
"policyType": "siddhiCEPEngine"
},
"description": "jmx metric ",
"policyDef": "{"expression":"from hadoopJmxMetricEventStream[metric ==
"hadoop.namenode.fsnamesystemstate.capacityused" and convert(value,
"long") > 0] select metric, host, value, timestamp, component, site insert into
tmp; ","type":"siddhiCEPEngine"}",
"enabled": true,
"dedupeDef": "{"alertDedupIntervalMin":10,"emailDedupIntervalMin":10}",
"notificationDef":
"[{"sender":"eagle@apache.org","recipients":"eagle@apache.org","subject
":"missing block
found.","flavor":"email","id":"email_1","tplFileName":""}]"
}
]
'
1 Create policy using API 2 Create policy using UI
Scalability
•Scale with # of events
•Scale with # of policies
Statistics
• # of events evaluated per
second
• audit for policy change
Eagle Service
As of 0.3.0, Eagle stores metadata and statistics into HBASE, and
support Druid as metric store.
Metadata
• Policy
• Event schema
• Site/Application/UI Features
HBASE
• Store metrics
• Store M/R job/task data
• Rowkey design for time-series data
• HBase Coprocessor
Raw data
• Druid for metric
• HBASE for M/R job/task
etc.
• ES for log (future)
1 Data to be stored 2 Storage 3 API/UI
Druid
• Consume data from Kafka
HBASE
• filter, groupby, sort, top
Druid
• Druid query API
• Dashboard in Eagle
Alert Engine Limitations in Eagle 0.3
1 High cost for integrating
• Coding for onboarding new data source
• Monolithic topology for pre-processing and alert
3 Policy capability restricted by event
partition
• Can’t do ad-hoc group-by policy expression
For example from groupby user to groupby cmd
2 Not multi-tenant
• Alert engine is embedded into application
• Many separate Storm topologies
4 Correlation is not declarative
• Coding for correlating existing data sources
If traffic is partitioned by user, policy only
supports expression of user based group-by
One storm topology even for one trivial data
source
Even if it is a simple data source, you have
to write storm topology and then deploy
Can’t declare correlations for multiple
metrics
5 Stateful policy evaluation
• fail over when bolt is down
How to replay one week history data when
node is down
Eagle Next Releases
• Improve User experience
 Remote start storm topology
 Metadata stored in RDBMS
Eagle 0.4 Eagle 0.5
• Alert Engine as Platform
 No monolithic topology
 Declarative data source onboard
 Easy correlation
 Support policies with any field group-by
 Elastic capacity management
USER PROFILE ALGORITHMS…
Eigen Value Decomposition
• Compute mean and variance
• Compute Eigen Vectors and determine Principal Components
• Normal data points lie near first few principal components
• Abnormal data points lie further from first few principal components and
closer to later components
USER PROFILE ARCHITECTURE
dev@eagle.incubator.apache.org
http://eagle.incubator.apache.org
https://github.com/apache/incubator-eagleGithub
Dev Mail
List
@TheApacheEagleTwitter
Q & A

Mais conteúdo relacionado

Mais procurados

End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...DataWorks Summit/Hadoop Summit
 
Securing data in hybrid environments using Apache Ranger
Securing data in hybrid environments using Apache RangerSecuring data in hybrid environments using Apache Ranger
Securing data in hybrid environments using Apache RangerDataWorks Summit
 
Securing Data in Hadoop at Uber
Securing Data in Hadoop at UberSecuring Data in Hadoop at Uber
Securing Data in Hadoop at UberDataWorks Summit
 
What's new in apache hive
What's new in apache hive What's new in apache hive
What's new in apache hive DataWorks Summit
 
Bridging the gap of Relational to Hadoop using Sqoop @ Expedia
Bridging the gap of Relational to Hadoop using Sqoop @ ExpediaBridging the gap of Relational to Hadoop using Sqoop @ Expedia
Bridging the gap of Relational to Hadoop using Sqoop @ ExpediaDataWorks Summit/Hadoop Summit
 
Startup Case Study: Leveraging the Broad Hadoop Ecosystem to Develop World-Fi...
Startup Case Study: Leveraging the Broad Hadoop Ecosystem to Develop World-Fi...Startup Case Study: Leveraging the Broad Hadoop Ecosystem to Develop World-Fi...
Startup Case Study: Leveraging the Broad Hadoop Ecosystem to Develop World-Fi...DataWorks Summit
 
Storage Requirements and Options for Running Spark on Kubernetes
Storage Requirements and Options for Running Spark on KubernetesStorage Requirements and Options for Running Spark on Kubernetes
Storage Requirements and Options for Running Spark on KubernetesDataWorks Summit
 
Hadoop in the Cloud - The what, why and how from the experts
Hadoop in the Cloud - The what, why and how from the expertsHadoop in the Cloud - The what, why and how from the experts
Hadoop in the Cloud - The what, why and how from the expertsDataWorks Summit/Hadoop Summit
 
HBaseCon 2013: Apache Drill - A Community-driven Initiative to Deliver ANSI S...
HBaseCon 2013: Apache Drill - A Community-driven Initiative to Deliver ANSI S...HBaseCon 2013: Apache Drill - A Community-driven Initiative to Deliver ANSI S...
HBaseCon 2013: Apache Drill - A Community-driven Initiative to Deliver ANSI S...Cloudera, Inc.
 
Treat your enterprise data lake indigestion: Enterprise ready security and go...
Treat your enterprise data lake indigestion: Enterprise ready security and go...Treat your enterprise data lake indigestion: Enterprise ready security and go...
Treat your enterprise data lake indigestion: Enterprise ready security and go...DataWorks Summit
 
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage SchemesScaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage SchemesDataWorks Summit
 
The Next Generation of Data Processing and Open Source
The Next Generation of Data Processing and Open SourceThe Next Generation of Data Processing and Open Source
The Next Generation of Data Processing and Open SourceDataWorks Summit/Hadoop Summit
 
Leveraging docker for hadoop build automation and big data stack provisioning
Leveraging docker for hadoop build automation and big data stack provisioningLeveraging docker for hadoop build automation and big data stack provisioning
Leveraging docker for hadoop build automation and big data stack provisioningEvans Ye
 
Integrating Apache Phoenix with Distributed Query Engines
Integrating Apache Phoenix with Distributed Query EnginesIntegrating Apache Phoenix with Distributed Query Engines
Integrating Apache Phoenix with Distributed Query EnginesDataWorks Summit
 
HAWQ Meets Hive - Querying Unmanaged Data
HAWQ Meets Hive - Querying Unmanaged DataHAWQ Meets Hive - Querying Unmanaged Data
HAWQ Meets Hive - Querying Unmanaged DataDataWorks Summit
 
HBaseCon 2015: Apache Kylin - Extreme OLAP Engine for Hadoop
HBaseCon 2015: Apache Kylin - Extreme OLAP  Engine for HadoopHBaseCon 2015: Apache Kylin - Extreme OLAP  Engine for Hadoop
HBaseCon 2015: Apache Kylin - Extreme OLAP Engine for HadoopHBaseCon
 

Mais procurados (20)

End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
 
Spark Technology Center IBM
Spark Technology Center IBMSpark Technology Center IBM
Spark Technology Center IBM
 
Cloudbreak - Technical Deep Dive
Cloudbreak - Technical Deep DiveCloudbreak - Technical Deep Dive
Cloudbreak - Technical Deep Dive
 
Securing data in hybrid environments using Apache Ranger
Securing data in hybrid environments using Apache RangerSecuring data in hybrid environments using Apache Ranger
Securing data in hybrid environments using Apache Ranger
 
Securing Data in Hadoop at Uber
Securing Data in Hadoop at UberSecuring Data in Hadoop at Uber
Securing Data in Hadoop at Uber
 
What's new in apache hive
What's new in apache hive What's new in apache hive
What's new in apache hive
 
Bridging the gap of Relational to Hadoop using Sqoop @ Expedia
Bridging the gap of Relational to Hadoop using Sqoop @ ExpediaBridging the gap of Relational to Hadoop using Sqoop @ Expedia
Bridging the gap of Relational to Hadoop using Sqoop @ Expedia
 
Startup Case Study: Leveraging the Broad Hadoop Ecosystem to Develop World-Fi...
Startup Case Study: Leveraging the Broad Hadoop Ecosystem to Develop World-Fi...Startup Case Study: Leveraging the Broad Hadoop Ecosystem to Develop World-Fi...
Startup Case Study: Leveraging the Broad Hadoop Ecosystem to Develop World-Fi...
 
Big Data Tools in AWS
Big Data Tools in AWSBig Data Tools in AWS
Big Data Tools in AWS
 
Spark Uber Development Kit
Spark Uber Development KitSpark Uber Development Kit
Spark Uber Development Kit
 
Storage Requirements and Options for Running Spark on Kubernetes
Storage Requirements and Options for Running Spark on KubernetesStorage Requirements and Options for Running Spark on Kubernetes
Storage Requirements and Options for Running Spark on Kubernetes
 
Hadoop in the Cloud - The what, why and how from the experts
Hadoop in the Cloud - The what, why and how from the expertsHadoop in the Cloud - The what, why and how from the experts
Hadoop in the Cloud - The what, why and how from the experts
 
HBaseCon 2013: Apache Drill - A Community-driven Initiative to Deliver ANSI S...
HBaseCon 2013: Apache Drill - A Community-driven Initiative to Deliver ANSI S...HBaseCon 2013: Apache Drill - A Community-driven Initiative to Deliver ANSI S...
HBaseCon 2013: Apache Drill - A Community-driven Initiative to Deliver ANSI S...
 
Treat your enterprise data lake indigestion: Enterprise ready security and go...
Treat your enterprise data lake indigestion: Enterprise ready security and go...Treat your enterprise data lake indigestion: Enterprise ready security and go...
Treat your enterprise data lake indigestion: Enterprise ready security and go...
 
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage SchemesScaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
 
The Next Generation of Data Processing and Open Source
The Next Generation of Data Processing and Open SourceThe Next Generation of Data Processing and Open Source
The Next Generation of Data Processing and Open Source
 
Leveraging docker for hadoop build automation and big data stack provisioning
Leveraging docker for hadoop build automation and big data stack provisioningLeveraging docker for hadoop build automation and big data stack provisioning
Leveraging docker for hadoop build automation and big data stack provisioning
 
Integrating Apache Phoenix with Distributed Query Engines
Integrating Apache Phoenix with Distributed Query EnginesIntegrating Apache Phoenix with Distributed Query Engines
Integrating Apache Phoenix with Distributed Query Engines
 
HAWQ Meets Hive - Querying Unmanaged Data
HAWQ Meets Hive - Querying Unmanaged DataHAWQ Meets Hive - Querying Unmanaged Data
HAWQ Meets Hive - Querying Unmanaged Data
 
HBaseCon 2015: Apache Kylin - Extreme OLAP Engine for Hadoop
HBaseCon 2015: Apache Kylin - Extreme OLAP  Engine for HadoopHBaseCon 2015: Apache Kylin - Extreme OLAP  Engine for Hadoop
HBaseCon 2015: Apache Kylin - Extreme OLAP Engine for Hadoop
 

Destaque

Semantic Wiki: Social Semantic Web In Action:
Semantic Wiki: Social Semantic Web In Action: Semantic Wiki: Social Semantic Web In Action:
Semantic Wiki: Social Semantic Web In Action: Jesse Wang
 
Ebay: DB Capacity planning at eBay
Ebay: DB Capacity planning at eBayEbay: DB Capacity planning at eBay
Ebay: DB Capacity planning at eBayDataStax Academy
 
eBay Cloud CMS based on NOSQL
eBay Cloud CMS based on NOSQLeBay Cloud CMS based on NOSQL
eBay Cloud CMS based on NOSQLXu Jiang
 
No sql e as vantagens na utilização do mongodb
No sql e as vantagens na utilização do mongodbNo sql e as vantagens na utilização do mongodb
No sql e as vantagens na utilização do mongodbfabio perrella
 
MongoDB San Francisco 2013: Storing eBay's Media Metadata on MongoDB present...
MongoDB San Francisco 2013: Storing eBay's Media Metadata on MongoDB  present...MongoDB San Francisco 2013: Storing eBay's Media Metadata on MongoDB  present...
MongoDB San Francisco 2013: Storing eBay's Media Metadata on MongoDB present...MongoDB
 
An Elastic Metadata Store for eBay’s Media Platform
An Elastic Metadata Store for eBay’s Media PlatformAn Elastic Metadata Store for eBay’s Media Platform
An Elastic Metadata Store for eBay’s Media PlatformMongoDB
 
Social Data and Log Analysis Using MongoDB
Social Data and Log Analysis Using MongoDBSocial Data and Log Analysis Using MongoDB
Social Data and Log Analysis Using MongoDBTakahiro Inoue
 
Scaling with MongoDB
Scaling with MongoDBScaling with MongoDB
Scaling with MongoDBMongoDB
 
NOSQL uma breve introdução
NOSQL uma breve introduçãoNOSQL uma breve introdução
NOSQL uma breve introduçãoWise Systems
 
MongoATL: How Sourceforge is Using MongoDB
MongoATL: How Sourceforge is Using MongoDBMongoATL: How Sourceforge is Using MongoDB
MongoATL: How Sourceforge is Using MongoDBRick Copeland
 
NoSQL at Twitter (NoSQL EU 2010)
NoSQL at Twitter (NoSQL EU 2010)NoSQL at Twitter (NoSQL EU 2010)
NoSQL at Twitter (NoSQL EU 2010)Kevin Weil
 
Building LinkedIn's Learning Platform with MongoDB
Building LinkedIn's Learning Platform with MongoDBBuilding LinkedIn's Learning Platform with MongoDB
Building LinkedIn's Learning Platform with MongoDBMongoDB
 
MongoDB at eBay
MongoDB at eBayMongoDB at eBay
MongoDB at eBayMongoDB
 

Destaque (14)

Semantic Wiki: Social Semantic Web In Action:
Semantic Wiki: Social Semantic Web In Action: Semantic Wiki: Social Semantic Web In Action:
Semantic Wiki: Social Semantic Web In Action:
 
Ebay: DB Capacity planning at eBay
Ebay: DB Capacity planning at eBayEbay: DB Capacity planning at eBay
Ebay: DB Capacity planning at eBay
 
eBay Cloud CMS based on NOSQL
eBay Cloud CMS based on NOSQLeBay Cloud CMS based on NOSQL
eBay Cloud CMS based on NOSQL
 
No sql e as vantagens na utilização do mongodb
No sql e as vantagens na utilização do mongodbNo sql e as vantagens na utilização do mongodb
No sql e as vantagens na utilização do mongodb
 
MongoDB San Francisco 2013: Storing eBay's Media Metadata on MongoDB present...
MongoDB San Francisco 2013: Storing eBay's Media Metadata on MongoDB  present...MongoDB San Francisco 2013: Storing eBay's Media Metadata on MongoDB  present...
MongoDB San Francisco 2013: Storing eBay's Media Metadata on MongoDB present...
 
An Elastic Metadata Store for eBay’s Media Platform
An Elastic Metadata Store for eBay’s Media PlatformAn Elastic Metadata Store for eBay’s Media Platform
An Elastic Metadata Store for eBay’s Media Platform
 
Social Data and Log Analysis Using MongoDB
Social Data and Log Analysis Using MongoDBSocial Data and Log Analysis Using MongoDB
Social Data and Log Analysis Using MongoDB
 
Scaling with MongoDB
Scaling with MongoDBScaling with MongoDB
Scaling with MongoDB
 
NOSQL uma breve introdução
NOSQL uma breve introduçãoNOSQL uma breve introdução
NOSQL uma breve introdução
 
Artigo Nosql
Artigo NosqlArtigo Nosql
Artigo Nosql
 
MongoATL: How Sourceforge is Using MongoDB
MongoATL: How Sourceforge is Using MongoDBMongoATL: How Sourceforge is Using MongoDB
MongoATL: How Sourceforge is Using MongoDB
 
NoSQL at Twitter (NoSQL EU 2010)
NoSQL at Twitter (NoSQL EU 2010)NoSQL at Twitter (NoSQL EU 2010)
NoSQL at Twitter (NoSQL EU 2010)
 
Building LinkedIn's Learning Platform with MongoDB
Building LinkedIn's Learning Platform with MongoDBBuilding LinkedIn's Learning Platform with MongoDB
Building LinkedIn's Learning Platform with MongoDB
 
MongoDB at eBay
MongoDB at eBayMongoDB at eBay
MongoDB at eBay
 

Semelhante a ebay

Apache Eagle at Hadoop Summit 2016 San Jose
Apache Eagle at Hadoop Summit 2016 San JoseApache Eagle at Hadoop Summit 2016 San Jose
Apache Eagle at Hadoop Summit 2016 San JoseHao Chen
 
Apache Eagle in Action
Apache Eagle in ActionApache Eagle in Action
Apache Eagle in ActionHao Chen
 
Eagle from eBay at China Hadoop Summit 2015
Eagle from eBay at China Hadoop Summit 2015Eagle from eBay at China Hadoop Summit 2015
Eagle from eBay at China Hadoop Summit 2015Hao Chen
 
Hadoop Eagle - Real Time Monitoring Framework for eBay Hadoop
Hadoop Eagle - Real Time Monitoring Framework for eBay HadoopHadoop Eagle - Real Time Monitoring Framework for eBay Hadoop
Hadoop Eagle - Real Time Monitoring Framework for eBay HadoopDataWorks Summit
 
Hadoop Security Features That make your risk officer happy
Hadoop Security Features That make your risk officer happyHadoop Security Features That make your risk officer happy
Hadoop Security Features That make your risk officer happyDataWorks Summit
 
Hadoop Security Features that make your risk officer happy
Hadoop Security Features that make your risk officer happyHadoop Security Features that make your risk officer happy
Hadoop Security Features that make your risk officer happyAnurag Shrivastava
 
Achieve big data analytic platform with lambda architecture on cloud
Achieve big data analytic platform with lambda architecture on cloudAchieve big data analytic platform with lambda architecture on cloud
Achieve big data analytic platform with lambda architecture on cloudScott Miao
 
Big data on Azure for Architects
Big data on Azure for ArchitectsBig data on Azure for Architects
Big data on Azure for ArchitectsTomasz Kopacz
 
BigData Security - A Point of View
BigData Security - A Point of ViewBigData Security - A Point of View
BigData Security - A Point of ViewKaran Alang
 
Apache Eagle: Architecture Evolvement and New Features
Apache Eagle: Architecture Evolvement and New FeaturesApache Eagle: Architecture Evolvement and New Features
Apache Eagle: Architecture Evolvement and New FeaturesHao Chen
 
Trusted Analytics as a Service (BDT209) | AWS re:Invent 2013
Trusted Analytics as a Service (BDT209) | AWS re:Invent 2013Trusted Analytics as a Service (BDT209) | AWS re:Invent 2013
Trusted Analytics as a Service (BDT209) | AWS re:Invent 2013Amazon Web Services
 
Saving the elephant—now, not later
Saving the elephant—now, not laterSaving the elephant—now, not later
Saving the elephant—now, not laterDataWorks Summit
 
Secure Hadoop as a Service - Session Sponsored by Intel
Secure Hadoop as a Service - Session Sponsored by IntelSecure Hadoop as a Service - Session Sponsored by Intel
Secure Hadoop as a Service - Session Sponsored by IntelAmazon Web Services
 
Infocyte - Digital Forensics and Incident Response (DFIR) Training Session
Infocyte - Digital Forensics and Incident Response (DFIR) Training SessionInfocyte - Digital Forensics and Incident Response (DFIR) Training Session
Infocyte - Digital Forensics and Incident Response (DFIR) Training SessionInfocyte
 
Automate That! Scripting Atlassian applications in Python
Automate That! Scripting Atlassian applications in PythonAutomate That! Scripting Atlassian applications in Python
Automate That! Scripting Atlassian applications in PythonAtlassian
 
Automate that
Automate thatAutomate that
Automate thatAtlassian
 
Apache Eagle: 来自eBay的分布式实时Hadoop数据安全引擎
Apache Eagle: 来自eBay的分布式实时Hadoop数据安全引擎Apache Eagle: 来自eBay的分布式实时Hadoop数据安全引擎
Apache Eagle: 来自eBay的分布式实时Hadoop数据安全引擎Qingwen zhao
 

Semelhante a ebay (20)

Apache Eagle: Secure Hadoop in Real Time
Apache Eagle: Secure Hadoop in Real TimeApache Eagle: Secure Hadoop in Real Time
Apache Eagle: Secure Hadoop in Real Time
 
Apache Eagle at Hadoop Summit 2016 San Jose
Apache Eagle at Hadoop Summit 2016 San JoseApache Eagle at Hadoop Summit 2016 San Jose
Apache Eagle at Hadoop Summit 2016 San Jose
 
Apache Eagle in Action
Apache Eagle in ActionApache Eagle in Action
Apache Eagle in Action
 
Eagle from eBay at China Hadoop Summit 2015
Eagle from eBay at China Hadoop Summit 2015Eagle from eBay at China Hadoop Summit 2015
Eagle from eBay at China Hadoop Summit 2015
 
Apache Eagle - Monitor Hadoop in Real Time
Apache Eagle - Monitor Hadoop in Real TimeApache Eagle - Monitor Hadoop in Real Time
Apache Eagle - Monitor Hadoop in Real Time
 
Hadoop Eagle - Real Time Monitoring Framework for eBay Hadoop
Hadoop Eagle - Real Time Monitoring Framework for eBay HadoopHadoop Eagle - Real Time Monitoring Framework for eBay Hadoop
Hadoop Eagle - Real Time Monitoring Framework for eBay Hadoop
 
Hadoop Security Features That make your risk officer happy
Hadoop Security Features That make your risk officer happyHadoop Security Features That make your risk officer happy
Hadoop Security Features That make your risk officer happy
 
Hadoop Security Features that make your risk officer happy
Hadoop Security Features that make your risk officer happyHadoop Security Features that make your risk officer happy
Hadoop Security Features that make your risk officer happy
 
Achieve big data analytic platform with lambda architecture on cloud
Achieve big data analytic platform with lambda architecture on cloudAchieve big data analytic platform with lambda architecture on cloud
Achieve big data analytic platform with lambda architecture on cloud
 
Big data on Azure for Architects
Big data on Azure for ArchitectsBig data on Azure for Architects
Big data on Azure for Architects
 
BigData Security - A Point of View
BigData Security - A Point of ViewBigData Security - A Point of View
BigData Security - A Point of View
 
Apache Eagle: Architecture Evolvement and New Features
Apache Eagle: Architecture Evolvement and New FeaturesApache Eagle: Architecture Evolvement and New Features
Apache Eagle: Architecture Evolvement and New Features
 
Trusted Analytics as a Service (BDT209) | AWS re:Invent 2013
Trusted Analytics as a Service (BDT209) | AWS re:Invent 2013Trusted Analytics as a Service (BDT209) | AWS re:Invent 2013
Trusted Analytics as a Service (BDT209) | AWS re:Invent 2013
 
Saving the elephant—now, not later
Saving the elephant—now, not laterSaving the elephant—now, not later
Saving the elephant—now, not later
 
Secure Hadoop as a Service - Session Sponsored by Intel
Secure Hadoop as a Service - Session Sponsored by IntelSecure Hadoop as a Service - Session Sponsored by Intel
Secure Hadoop as a Service - Session Sponsored by Intel
 
Infocyte - Digital Forensics and Incident Response (DFIR) Training Session
Infocyte - Digital Forensics and Incident Response (DFIR) Training SessionInfocyte - Digital Forensics and Incident Response (DFIR) Training Session
Infocyte - Digital Forensics and Incident Response (DFIR) Training Session
 
Monitoring in 2017 - TIAD Camp Docker
Monitoring in 2017 - TIAD Camp DockerMonitoring in 2017 - TIAD Camp Docker
Monitoring in 2017 - TIAD Camp Docker
 
Automate That! Scripting Atlassian applications in Python
Automate That! Scripting Atlassian applications in PythonAutomate That! Scripting Atlassian applications in Python
Automate That! Scripting Atlassian applications in Python
 
Automate that
Automate thatAutomate that
Automate that
 
Apache Eagle: 来自eBay的分布式实时Hadoop数据安全引擎
Apache Eagle: 来自eBay的分布式实时Hadoop数据安全引擎Apache Eagle: 来自eBay的分布式实时Hadoop数据安全引擎
Apache Eagle: 来自eBay的分布式实时Hadoop数据安全引擎
 

Mais de DataWorks Summit/Hadoop Summit

Unleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerUnleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerDataWorks Summit/Hadoop Summit
 
Enabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformEnabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformDataWorks Summit/Hadoop Summit
 
Double Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDouble Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDataWorks Summit/Hadoop Summit
 
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...DataWorks Summit/Hadoop Summit
 
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...DataWorks Summit/Hadoop Summit
 
Mool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLMool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLDataWorks Summit/Hadoop Summit
 
The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)DataWorks Summit/Hadoop Summit
 
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...DataWorks Summit/Hadoop Summit
 

Mais de DataWorks Summit/Hadoop Summit (20)

Running Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in ProductionRunning Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in Production
 
State of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache ZeppelinState of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache Zeppelin
 
Unleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerUnleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache Ranger
 
Enabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformEnabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science Platform
 
Revolutionize Text Mining with Spark and Zeppelin
Revolutionize Text Mining with Spark and ZeppelinRevolutionize Text Mining with Spark and Zeppelin
Revolutionize Text Mining with Spark and Zeppelin
 
Double Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDouble Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSense
 
Hadoop Crash Course
Hadoop Crash CourseHadoop Crash Course
Hadoop Crash Course
 
Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Apache Spark Crash Course
Apache Spark Crash CourseApache Spark Crash Course
Apache Spark Crash Course
 
Dataflow with Apache NiFi
Dataflow with Apache NiFiDataflow with Apache NiFi
Dataflow with Apache NiFi
 
Schema Registry - Set you Data Free
Schema Registry - Set you Data FreeSchema Registry - Set you Data Free
Schema Registry - Set you Data Free
 
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
 
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
 
Mool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLMool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and ML
 
How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient
 
HBase in Practice
HBase in Practice HBase in Practice
HBase in Practice
 
The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)
 
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS HadoopBreaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
 
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
 
Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop
 

Último

Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 

Último (20)

Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 

ebay

  • 1.
  • 2. 2 Apache Eagle Monitor Hadoop in Real Time Yong Zhang | Senior Architect | yonzhang2012@gmail.com Arun Manoharan | Senior Product Manager | @lycos_86
  • 3. Big Data @ eBay 800M Listings * 159M Global Active Buyers * *Q3 2015 data 7 Hadoop Clusters* 800M HDFS operations (single cluster)* 120 PB Data* Hadoop @ eBay
  • 4. HADOOP SECURITY Authorization & Access Control Perimeter Security Data Classification Activity Monitoring Security MDR • Perimeter Security • Authorization & Access Control • Discovery • Activity Monitoring Security for Hadoop
  • 5. Who is accessing the data? What data are they accessing? Is someone trying to access data that they don’t have access to? Are there any anomalous access patterns? Is there a security threat? How to monitor and get notified during or prior to an anomalous event occurring? Motivation
  • 6. Apache Eagle Apache Eagle: Monitor Hadoop in Real Time Apache Eagle is an Open Source Monitoring Platform for Hadoop eco-system, which started with monitoring data activities in Hadoop. It can instantly identify access to sensitive data, recognize attacks/malicious activities and blocks access in real time. In conjunction with components such as Ranger, Sentry, Knox, DgSecure and Splunk etc., Eagle provides comprehensive solution to secure sensitive data stored in Hadoop.
  • 7. Apache Eagle Composition Apache Eagle Integrations Alert Engine HDFS AUDIT HIVE QUERY HBASE AUDIT CASSANDRA AUDIT MapR AUDIT 2 HADOOP Performance Metric Namenode JMX Metrics Datanode JMX Metrics System Metrics 3 M/R Job Performance Metric History Job Metrics Running Job Metrics 4 Spark Job Performance Metric Spark Job Metrics Queue Metrics 1 Data Activity Monitoring RM JMX Metrics 1 Policy Store 2 Metadata API 3 Scalability 4 Extensibility [Domains] [Applications]
  • 9. Extensibility  Ranger • As remediation engine • As generic data source  DgSecure • Source of truth for data classification  Splunk • Syslog format output • EAGLE alert output is the 1st abstraction of analytics and Splunk is the 2nd abstraction
  • 11. Highlights 1. Turn-key integration: after installation, user defines rules 2. Comprehensive rules on high volume of data: Eagle solves some unique problem in Hadoop 3. Hot deploy rule: Eagle does not provide a lot of charts, instead it allows user to write ad-hoc rule and hot deploy it. 4. Metadata driven: kept in mind, here metadata includes policy, event schema and UI component etc. 5. Extensibility: Keep in mind that Eagle can’t succeed alone, Eagle has to be integrated with other system for example data classification, policy enforcement etc. 6. Monolithic storm topology: application pre-processing are running together with alert engine.
  • 12. Example 1: Integration with HDFS AUDIT log • Ingestion  KafkaLog4jAppender+Kafk a  Logstash+Kafka • Partition  By user • Pre-processing  Sensitivity join  Command re-assembler Namenode Kafka Partition_1 Kafka Partition_2 Kafka Partition_N Storm Kafka Spout User1 User1 Alert Executor_1 Alert Executor_2 Alert Executor_K User2 User2 User1 User2
  • 13. Data Classification - HDFS •Browse HDFS file system •Batch import sensitivity metadata through Eagle API •Manually mark sensitivity in Eagle UI
  • 14.  One user command generates multiple HDFS audit events  Eagle does reverse engineering to figure out original user command  Example COPYFROMLOCAL_PATTERN = “every a = eventStream[cmd==‘getfileinfo’] ” + “-> b = eventStream[cmd==‘getfileinfo’ and user==a.user and src==str:concat(a.src,‘._COPYING_’)] ” + “-> c = eventStream[cmd==‘create’ and user==a.user and src==b.src] ” + “-> d = eventStream[cmd==‘getfileinfo’ and user==a.user and src==b.src] ” + “-> e = eventStream[cmd==‘delete’ and user==a.user and src==a.src] ” + “-> f = eventStream[cmd==‘rename’ and user==a.user and src==b.src and dst==a.src]” 2015-11-20 00:06:47,090 INFO FSNamesystem.audit: allowed=true ugi=root (auth:SIMPLE) ip=/10.0.2.15 cmd=getfileinfo src=/tmp/private dst=null perm=null proto=rpc 2015-11-20 00:06:47,185 INFO FSNamesystem.audit: allowed=true ugi=root (auth:SIMPLE) ip=/10.0.2.15 cmd=getfileinfo src=/tmp/private._COPYING_ dst=null perm=null proto=rpc 2015-11-20 00:06:47,254 INFO FSNamesystem.audit: allowed=true ugi=root (auth:SIMPLE) ip=/10.0.2.15 cmd=create src=/tmp/private._COPYING_ dst=null perm=root:hdfs:rw-r--r-- proto=rpc 2015-11-20 00:06:47,289 INFO FSNamesystem.audit: allowed=true ugi=root (auth:SIMPLE) ip=/10.0.2.15 cmd=getfileinfo src=/tmp/private._COPYING_ dst=null perm=null proto=rpc 2015-11-20 00:06:47,609 INFO FSNamesystem.audit: allowed=true ugi=root (auth:SIMPLE) ip=/10.0.2.15 cmd=delete src=/tmp/private dst=null perm=null proto=rpc 2015-11-20 00:06:47,624 INFO FSNamesystem.audit: allowed=true ugi=root (auth:SIMPLE) ip=/10.0.2.15 cmd=rename src=/tmp/private._COPYING_ dst=/tmp/private perm=root:hdfs:rw-r--r-- proto=rpc User Command Re-assembly
  • 15. • Policy evaluation is stateful (one user’s data has to go to one physical bolt) • Partition by user all the way (hash) • User is not balanced at all • Greedy algorithm https://en.wikipedia.org/wiki/Partition_problem#The_greedy_algorithm Data Skew Problem
  • 16.  Policy weight is not even • Regex policy is CPU intensive • Window based policy is Memory intensive Computation Skew Problem
  • 17. Example 2: Integration with Hive • Ingestion  Yarn API • Partition  user • Pre- processing  Sensitivity join  Hive SQL parser
  • 18. Data Classification - Hive •Browse Hive databases/tables/columns •Batch import sensitivity metadata through Eagle API •Manually mark sensitivity in Eagle UI
  • 19. Eagle Alert Engine Overview 1 Runs CEP engine on Apache Storm • Use CEP engine as library (Siddhi CEP) • Evaluate policy on streamed data • Rule is hot deployable 2 Inject policy dynamically • API • Intuitive UI 3 Scalability • Computation # of policies (policy placement) • Storage # of events (event partition) 4 Extensibility for policy enforcement • Post-alert processing with plugin
  • 20. Run CEP Engine on Storm Storm Bolt CEP Worker CEP Worker CEP Worker … … Policy Check Thread Policy Store Metadata API event1 event1 event1 event1 policy1,2,3,4,5,6policy1,2,3 policy1 policy2 policy3 Storm Bolt event1 policy4,5,6 event schema
  • 21. Primitives – event, policy, alert Raw Event 2015-10-11 01:00:00,014 INFO FSNamesystem.audit: allowed=true ugi=user_tom@sandbox.hortonworks.com (auth:KERBEROS) ip=/10.0.0.1 cmd=getfileinfo src=/tmp/private dst=null perm=null Alert Event Timestamp, cmd, src, dst, ugi, sensitivityType, securityZone Policy viewPrivate: from hdfsAuditLogEventStream[(cmd=='getfileinfo') and (src=’/tmp/private’)] Alert 2015-10-11 01:00:09[UTC] hdfsAuditLog viewPrivate user_tom/10.0.0.1 The Policy "viewPrivate" has been detected with the below information: timestamp="1445993770932" allowed="true" cmd="getfileinfo" host="/10.0.0.1" sensitivityType="PRIVATE" securityZone="NA" src="/tmp/private" dst="NA" user=“user_tom”
  • 23. 1 Single event evaluation • threshold check with various conditions Policy Capabilities 2 Event window based evaluation • various window semantics (time/length sliding/batch window) • comprehensive aggregation support 3 Correlation for multiple event streams • SQL-like join 4 Pattern Match and Sequence • a happens followed by b Powered by Siddhi 3.0.5, but Eagle provides dynamic capabilities and intuitive API/UI
  • 24. 1 Namenode master/slave lag from every a = hadoopJmxMetricEventStream[metric=="hadoop.namenode.journaltransaction.lastappliedorwrittent xid"] -> b = hadoopJmxMetricEventStream[metric==a.metric and b.host != a.host and (max(convert(a.value, "long")) + 100) <= max(convert(value, "long"))] within 5 min select a.host as hostA, a.value as transactIdA, b.host as hostB, b.value as transactIdB insert into tmp; Some policy examples 3 Namenode HA state change from every a = hadoopJmxMetricEventStream[metric=="hadoop.namenode.hastate.active.count"] -> b = hadoopJmxMetricEventStream[metric==a.metric and b.host == a.host and (convert(a.value, "long") != convert(value, "long"))] within 10 min select a.host, a.value as oldHaState, b.value as newHaState, b.timestamp as timestamp, b.metric as metric, b.component as component, b.site as site insert into tmp; 2 Namenode last checkpoint time • from hadoopJmxMetricEventStream[metric == "hadoop.namenode.dfs.lastcheckpointtime" and (convert(value, "long") + 18000000) < timestamp] select metric, host, value, timestamp, component, site insert into tmp;
  • 25. Define policy in UI and API curl -u ${EAGLE_SERVICE_USER}:${EAGLE_SERVICE_PASSWD} -X POST -H 'Content- Type:application/json' "http://${EAGLE_SERVICE_HOST}:${EAGLE_SERVICE_PORT}/eagle- service/rest/entities?serviceName=AlertDefinitionService" -d ' [ { "prefix": "alertdef", "tags": { "site": "sandbox", "application": "hadoopJmxMetricDataSource", "policyId": "capacityUsedPolicy", "alertExecutorId": "hadoopJmxMetricAlertExecutor", "policyType": "siddhiCEPEngine" }, "description": "jmx metric ", "policyDef": "{"expression":"from hadoopJmxMetricEventStream[metric == "hadoop.namenode.fsnamesystemstate.capacityused" and convert(value, "long") > 0] select metric, host, value, timestamp, component, site insert into tmp; ","type":"siddhiCEPEngine"}", "enabled": true, "dedupeDef": "{"alertDedupIntervalMin":10,"emailDedupIntervalMin":10}", "notificationDef": "[{"sender":"eagle@apache.org","recipients":"eagle@apache.org","subject ":"missing block found.","flavor":"email","id":"email_1","tplFileName":""}]" } ] ' 1 Create policy using API 2 Create policy using UI
  • 26. Scalability •Scale with # of events •Scale with # of policies
  • 27. Statistics • # of events evaluated per second • audit for policy change Eagle Service As of 0.3.0, Eagle stores metadata and statistics into HBASE, and support Druid as metric store. Metadata • Policy • Event schema • Site/Application/UI Features HBASE • Store metrics • Store M/R job/task data • Rowkey design for time-series data • HBase Coprocessor Raw data • Druid for metric • HBASE for M/R job/task etc. • ES for log (future) 1 Data to be stored 2 Storage 3 API/UI Druid • Consume data from Kafka HBASE • filter, groupby, sort, top Druid • Druid query API • Dashboard in Eagle
  • 28. Alert Engine Limitations in Eagle 0.3 1 High cost for integrating • Coding for onboarding new data source • Monolithic topology for pre-processing and alert 3 Policy capability restricted by event partition • Can’t do ad-hoc group-by policy expression For example from groupby user to groupby cmd 2 Not multi-tenant • Alert engine is embedded into application • Many separate Storm topologies 4 Correlation is not declarative • Coding for correlating existing data sources If traffic is partitioned by user, policy only supports expression of user based group-by One storm topology even for one trivial data source Even if it is a simple data source, you have to write storm topology and then deploy Can’t declare correlations for multiple metrics 5 Stateful policy evaluation • fail over when bolt is down How to replay one week history data when node is down
  • 29. Eagle Next Releases • Improve User experience  Remote start storm topology  Metadata stored in RDBMS Eagle 0.4 Eagle 0.5 • Alert Engine as Platform  No monolithic topology  Declarative data source onboard  Easy correlation  Support policies with any field group-by  Elastic capacity management
  • 30. USER PROFILE ALGORITHMS… Eigen Value Decomposition • Compute mean and variance • Compute Eigen Vectors and determine Principal Components • Normal data points lie near first few principal components • Abnormal data points lie further from first few principal components and closer to later components

Notas do Editor

  1. Eagle was started by end of 2013 for monitoring job performance, bad node in large Hadoop clusters. That was very successful. From early this year, we started to use EAGLE to monitor data activities in Hadoop ecosystem and then open source it under apache because we believe data activity monitoring is a common problem for all the companies which have lots of users and data and we want to collaborate with community to solve the problem.
  2. 2FA+firewall lockdown MDR?
  3. The first case which Apache Eagle developed is security monitoring for Hadoop ecosystem but we will expand Eagle to monitor more than security
  4. Apache Eagle includes applications and alert engine. Today application connects to alert engine with JAVA API, in future, Alert Engine is a separate component, application can send data into Alert Engine
  5. Setup – one single Eagle instance can manage multiple sites
  6. Hive is very popular method by which user accesses Hadoop resources. What Eagle does for Hive security monitoring is to collect Hive query log from map/reduce job and parse the query and to get who accesses what database/table/column
  7. Setup – one single Eagle instance can manage multiple sites
  8. Anatomy of Eagle alert engine
  9. There is some policies day-over-day, week-over-week comparison not supported by CEP
  10. There is some policies day-over-day, week-over-week comparison not supported by CEP
  11. Setup – one single Eagle instance can manage multiple sites