Accumulo @ Bloomberg
Accumulo Summit 2015

Skand Gupta

Bloomberg LP
Bloomberg
• Bloomberg technology helps drive the world’s financial markets
– We build our own software, digital platforms, mobile applications and
state-of-the-art hardware
– We run one of the world’s largest private networks with over 20,000 routers across
our network
– We have the largest server-side JavaScript deployment in the world – 22 million
lines of JavaScript code
– We developed “cloud computing” and deployed “software as a service” well ahead
of the general marketplace
– Our technology has brought transparency to the global financial markets
• Bloomberg technologists
– More than 3,000 software developers and designers located around the world
(London, NYC, SF “tech hubs”)
– BloombergLabs.com (@BloombergLabs) is our platform for dialogue between our
experts and the broader tech community
• Our clients
– Over 320,000 subscribers
– Primarily financial professionals including investment bankers, CFOs, investor
relations, hedge fund managers, foreign exchange, etc.
Source: Wall Street Journal, CFTC, New York Times, Marketplace.org
Importance of Compliance
Source: Wall Street Journal, CFTC, New York Times
Hiding in Plain Sight
Source: Commodity Futures Trading Commission
Compliance Platform and Processing Pipeline
[Diagram] Data sources: Chat, Reference Data, Trade Data, Customer Data, Product Data, Market Data, Counterparty, Email, Social Media, Voice
[Diagram] Other labels: Human- and Machine-generated Data; Surveillance Pipeline; Communication Data; Transactional Data; User Data; Case Management; Compliance Platform; Compliance Storage; Compliance Officers (Search, Review, Analyze)
Elastic data-processing and analytics stack
[Diagram] Components: HDFS, Spark, Kafka, Storm; Mesos (Cluster Resource Manager); Open REST API (Play); WORM; Pre-fabricated Hardware; Applications
Need for a robust, scalable, high-performance, geo-distributed data storage and retrieval system
❑ More than 3 petabytes of archived data
❑ 80+ billion indexed objects
❑ Real-time scanning of 35 million objects per day
[Chart] Communication Data Growth: 100s of gigabytes/year
[Chart] Cumulative Data Growth: over 3 petabytes today
[Chart] Storage Cost, storing 1 GB of data (axis $0.00 to $3.00). Bar labels: List Price, Replication, DR, Isolation. Values shown: $2.31, $1.15, $0.58, $0.19
[Chart] Year axis: 2000, 2002, 2004, 2006, 2008, 2010, 2012
Need for Low-Level Security Primitives
Document Level Security
[Figure: sample document, shown on the slide as lorem ipsum placeholder text]
Company Level Security
[Figure labels: Data Store, Data Pipe, Application]
User Level Security
[Figure label: Data Store]
Security Solutions
• Post-process the queries
– Too slow
– Nasty bugs
• Generate a unique document for each view
– Exponential growth in the number of documents
• Use application-specific features
– Solr dynamic fields, Mangled Fields
• Accumulo Visibility
– Fast, Clean, Generic
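
To make the "Accumulo Visibility" option concrete, here is a minimal Python sketch of how a cell-level visibility expression can be checked against a reader's authorizations. It is a simplified model and not the Accumulo API: the function names `tokenize` and `is_visible` and the labels `companyA`, `compliance`, and `userX` are hypothetical, and only `&`, `|`, and parentheses over plain alphanumeric labels are handled.

```python
import re

# One token is either a label (letters, digits, underscore) or an operator.
TOKEN = re.compile(r"\s*([A-Za-z0-9_]+|[&|()])")


def tokenize(expr):
    """Split a visibility expression into labels and operators."""
    expr = expr.strip()
    tokens, pos = [], 0
    while pos < len(expr):
        match = TOKEN.match(expr, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def is_visible(expr, auths):
    """Return True if the set of authorizations satisfies the expression."""
    tokens = tokenize(expr)
    if not tokens:
        return True  # an empty expression places no restriction on readers
    pos = 0

    def parse_term():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            result = parse_expr()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return result
        if tok in ("&", "|", ")"):
            raise ValueError(f"unexpected token {tok!r}")
        return tok in auths  # a bare label is satisfied if the reader holds it

    def parse_expr():
        nonlocal pos
        result = parse_term()
        op = None
        while pos < len(tokens) and tokens[pos] in ("&", "|"):
            if op is None:
                op = tokens[pos]
            elif tokens[pos] != op:
                # Assumption of this sketch: mixed operators need parentheses.
                raise ValueError("mixing & and | requires parentheses")
            pos += 1
            rhs = parse_term()
            result = (result and rhs) if op == "&" else (result or rhs)
        return result

    result = parse_expr()
    if pos != len(tokens):
        raise ValueError("trailing tokens in expression")
    return result


print(is_visible("companyA&(compliance|userX)", {"companyA", "userX"}))
```

Because the check is attached to each stored cell rather than to each query, the same data can serve document-, company-, and user-level views without post-processing results or duplicating documents.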
Data Model
Row ID Value
CompanyA_userX_20150426 <bytes>
CompanyA_userX_20150426 <bytes>
CompanyA_userX_20150427 <bytes>
CompanyA_userX_20150428 <bytes>
CompanyA_userY_20150427 <bytes>
CompanyB_userX_20150428 <bytes>
CompanyB_userX_20150428 <bytes>
CompanyB_userX_20150428 <bytes>
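
The row IDs in the table appear to follow a `company_user_yyyymmdd` pattern. The Python sketch below, which assumes that pattern and uses a hypothetical helper named `make_row_id`, shows why such a key is convenient in a store that keeps rows sorted: plain string ordering clusters all of a company's rows together, then a user's rows, then orders them by date.

```python
from datetime import date


def make_row_id(company, user, day):
    """Build a row ID of the form company_user_yyyymmdd (assumed layout)."""
    return f"{company}_{user}_{day:%Y%m%d}"


# Row IDs created in arbitrary order, as messages might arrive.
row_ids = [
    make_row_id("CompanyB", "userX", date(2015, 4, 28)),
    make_row_id("CompanyA", "userY", date(2015, 4, 27)),
    make_row_id("CompanyA", "userX", date(2015, 4, 28)),
    make_row_id("CompanyA", "userX", date(2015, 4, 26)),
]

# Lexicographic sorting groups by company, then user, then date,
# because the date is zero-padded and written most-significant part first.
for row_id in sorted(row_ids):
    print(row_id)
```

The zero-padded `yyyymmdd` suffix is what makes string order agree with chronological order; a format such as `d/m/yyyy` would not sort correctly.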
Find all Communications for a Set of Users for a Date Range
Row ID Value
CompanyA_userX_20150426 <bytes>
CompanyA_userX_20150426 <bytes>
CompanyA_userX_20150427 <bytes>
CompanyA_userX_20150428 <bytes>
CompanyA_userY_20150427 <bytes>
CompanyB_userX_20150428 <bytes>
CompanyB_userX_20150428 <bytes>
CompanyB_userX_20150428 <bytes>
Batch Scanner
Application
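
A query for several users over a date range maps naturally onto one row range per user. The following Python sketch models that idea on an in-memory sorted list; it is not the Accumulo BatchScanner API. The names `user_date_ranges` and `batch_scan` and the sample values are hypothetical, and a real batch scanner may return results in no particular order.

```python
from bisect import bisect_left, bisect_right

# Sorted (row_id, value) pairs mirroring the table on the slide.
TABLE = [
    ("CompanyA_userX_20150426", b"m1"),
    ("CompanyA_userX_20150426", b"m2"),
    ("CompanyA_userX_20150427", b"m3"),
    ("CompanyA_userX_20150428", b"m4"),
    ("CompanyA_userY_20150427", b"m5"),
    ("CompanyB_userX_20150428", b"m6"),
    ("CompanyB_userX_20150428", b"m7"),
    ("CompanyB_userX_20150428", b"m8"),
]


def user_date_ranges(company, users, start, end):
    """One inclusive (start_row, end_row) range per user; dates are yyyymmdd strings."""
    return [(f"{company}_{u}_{start}", f"{company}_{u}_{end}") for u in users]


def batch_scan(sorted_rows, ranges):
    """Return every row whose ID falls inside any of the given ranges."""
    keys = [row_id for row_id, _ in sorted_rows]
    hits = []
    for low, high in ranges:
        # Binary search finds the slice for each range without a full scan.
        hits.extend(sorted_rows[bisect_left(keys, low):bisect_right(keys, high)])
    return hits


ranges = user_date_ranges("CompanyA", ["userX", "userY"], "20150427", "20150428")
for row_id, value in batch_scan(TABLE, ranges):
    print(row_id, value)
```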
Find all Records with “Libor”
Filter
Row ID Value
CompanyA_userX_20150426 <bytes>
CompanyA_userX_20150426 <bytes>
CompanyA_userX_20150427 <bytes>
CompanyA_userX_20150428 <bytes>
CompanyA_userY_20150427 <bytes>
CompanyB_userX_20150428 <bytes>
CompanyB_userX_20150428 <bytes>
CompanyB_userX_20150428 <bytes>
Batch Scanner
Application
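
The "Filter" box on this slide sits between the table and the scanner, which suggests the term match runs next to the data rather than in the application. A minimal Python model of such a filter is sketched below; the function names and sample message bodies are hypothetical, and the case-insensitive substring match is an assumption, since the slide does not say how matching is done.

```python
def contains_term(value, term):
    """Case-insensitive substring test on a raw byte value."""
    return term.lower().encode("utf-8") in value.lower()


def filtered_scan(rows, term):
    """Yield only the (row_id, value) pairs whose value mentions the term."""
    for row_id, value in rows:
        if contains_term(value, term):
            yield row_id, value


# Hypothetical message bodies keyed by the slide's row ID layout.
ROWS = [
    ("CompanyA_userX_20150426", b"Lunch at noon?"),
    ("CompanyA_userX_20150427", b"Can you move the LIBOR fixing?"),
    ("CompanyA_userY_20150427", b"Quarterly report attached"),
    ("CompanyB_userX_20150428", b"libor submission looks off"),
]

for row_id, value in filtered_scan(ROWS, "Libor"):
    print(row_id, value)
```

Filtering before results leave the storage layer means only matching records cross the network to the application.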
Count Number of Objects that Match a Filter
Counting Iterator
Filter
Row ID Value
CompanyA_userX_20150426 <bytes>
CompanyA_userX_20150426 <bytes>
CompanyA_userX_20150427 <bytes>
CompanyA_userX_20150428 <bytes>
CompanyA_userY_20150427 <bytes>
CompanyB_userX_20150428 <bytes>
CompanyB_userX_20150428 <bytes>
CompanyB_userX_20150428 <bytes>
Batch Scanner
Application
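
Stacking a counting step on top of the filter means each storage server can return a single number instead of every matching record. The Python sketch below models that split between per-server partial counts and a client-side sum. The names `count_matches` and `total_count`, and the division of the data into two "tablets", are illustrative assumptions rather than Accumulo code.

```python
def count_matches(tablet_rows, predicate):
    """Server-side step: count matching values in one tablet, return one number."""
    return sum(1 for _, value in tablet_rows if predicate(value))


def total_count(tablets, predicate):
    """Client-side step: add up the partial counts from every tablet."""
    return sum(count_matches(rows, predicate) for rows in tablets)


def mentions_libor(value):
    """Hypothetical filter predicate: case-insensitive match on 'libor'."""
    return b"libor" in value.lower()


# Two hypothetical tablets, each holding a contiguous slice of the sorted table.
TABLETS = [
    [
        ("CompanyA_userX_20150426", b"Lunch at noon?"),
        ("CompanyA_userX_20150427", b"Can you move the LIBOR fixing?"),
    ],
    [
        ("CompanyA_userY_20150427", b"Quarterly report attached"),
        ("CompanyB_userX_20150428", b"libor submission looks off"),
        ("CompanyB_userX_20150428", b"Re: libor submission looks off"),
    ],
]

print(total_count(TABLETS, mentions_libor))
```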
Scaling Out
Application
Row ID Value
CompanyA_userX_20150426 <bytes>
CompanyA_userX_20150426 <bytes>
CompanyA_userX_20150427 <bytes>
CompanyA_userX_20150428 <bytes>
CompanyA_userY_20150427 <bytes>
CompanyB_userX_20150428 <bytes>
CompanyB_userX_20150428 <bytes>
CompanyB_userX_20150428 <bytes>
[Diagram: Counting Iterator, Filter, and Batch Scanner repeated three times, alongside Spark Processing]
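
The repeated scanner stacks on this slide suggest that the key space is split into partitions that are scanned concurrently. The Python sketch below illustrates that pattern with a thread pool standing in for Spark. Partitioning by company prefix, the worker count, and all function names are assumptions made for illustration; the slide does not specify how the work is divided.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import groupby


def partition_by_company(sorted_rows):
    """Split sorted rows into one contiguous partition per company prefix."""
    def company(row):
        return row[0].split("_", 1)[0]
    return [list(group) for _, group in groupby(sorted_rows, key=company)]


def count_term(rows, term):
    """Count the rows in one partition whose value mentions the term."""
    needle = term.lower().encode("utf-8")
    return sum(1 for _, value in rows if needle in value.lower())


def parallel_count(sorted_rows, term, workers=3):
    """Scan every partition concurrently and add up the partial counts."""
    partitions = partition_by_company(sorted_rows)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(count_term, partitions, [term] * len(partitions))
        return sum(partials)


# Hypothetical sorted table using the slide's row ID layout.
ROWS = [
    ("CompanyA_userX_20150427", b"Can you move the LIBOR fixing?"),
    ("CompanyA_userY_20150427", b"Quarterly report attached"),
    ("CompanyB_userX_20150428", b"libor submission looks off"),
    ("CompanyC_userZ_20150428", b"See you at the summit"),
]

print(parallel_count(ROWS, "libor"))
```

Each partition returns only a count, so adding workers increases scan throughput without increasing the amount of data shipped back to the caller.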
Low Latency Writes using Accumulo ‘File System’
RowID Family Qualifier Value
attach.pdf chunk “00001” <bytes>
attach.pdf chunk “00002” <bytes>
… … … …
attach.pdf metadata file_size <file size>
attach.pdf metadata chunk_size <chunk size>
attach.pdf metadata sha256 <checksum>
[Chart: write times (ms), HDFS vs. Accumulo File System; axis 0 to 20]
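
The table above stores a file as numbered chunk cells plus a few metadata cells under a single row ID. A minimal Python sketch of that layout follows. It models cells as `(row_id, family, qualifier, value)` tuples rather than using Accumulo mutations, and the function names, the byte encoding of the metadata values, and the tiny chunk size in the example are all assumptions.

```python
import hashlib


def file_to_rows(name, data, chunk_size):
    """Split file bytes into chunk cells and append metadata cells."""
    rows = []
    for offset in range(0, len(data), chunk_size):
        # Zero-padded, 1-based qualifier so string order equals chunk order.
        qualifier = f"{offset // chunk_size + 1:05d}"
        rows.append((name, "chunk", qualifier, data[offset:offset + chunk_size]))
    rows.append((name, "metadata", "file_size", str(len(data)).encode()))
    rows.append((name, "metadata", "chunk_size", str(chunk_size).encode()))
    rows.append((name, "metadata", "sha256", hashlib.sha256(data).hexdigest().encode()))
    return rows


def rows_to_file(rows):
    """Reassemble the file from its chunk cells and verify the checksum."""
    chunks = sorted((q, v) for _, family, q, v in rows if family == "chunk")
    data = b"".join(value for _, value in chunks)
    metadata = {q: v for _, family, q, v in rows if family == "metadata"}
    if hashlib.sha256(data).hexdigest().encode() != metadata["sha256"]:
        raise ValueError("checksum mismatch")
    return data


cells = file_to_rows("attach.pdf", b"abcdefghij", chunk_size=4)
for cell in cells:
    print(cell)
print(rows_to_file(cells))
```

Storing the checksum alongside the chunks lets a reader detect a missing or corrupted chunk at reassembly time.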
Conclusion
• Understand the data
• Free your data… but enforce
access control
• Need sensible systems that help
achieve these goals
Thank You!
http://careers.bloomberg.com
sgupta178@bloomberg.net
We Are Hiring!
