Scalable and Resilient Security Ratings Platform with ScyllaDB


SecurityScorecard is a global leader in cybersecurity ratings and the only service with over 12 million companies continuously rated. ScyllaDB is now an integral part of our data processing. We require a database that offers low query latency, real-time data ingestion, fault tolerance, and high scalability.

In this presentation, we will share how ScyllaDB powers our platform and why it is a great fit. We will highlight our business and technical use cases and the challenges we faced before migrating to ScyllaDB. Next, we will describe how we migrated three data sources and decoupled the frontend and backend services by introducing a middle layer for improved scalability and maintainability. Finally, we will conclude by sharing some of our learnings, performance benchmarks, and future plans.
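
The middle layer described above can be pictured as a small routing function that sends each request to either the low-latency store or the analytical engine. The following is a minimal sketch, not the actual service: the names `Backend`, `DataRequest`, `choose_backend`, and the 14-day `HOT_WINDOW` are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from enum import Enum


class Backend(Enum):
    SCYLLADB = "scylladb"  # low-latency, high-volume lookups
    PRESTO = "presto"      # historical or latency-tolerant queries


@dataclass(frozen=True)
class DataRequest:
    start: date
    end: date
    is_report: bool = False  # reporting / custom analysis workloads


# Assumed retention window for the low-latency store (illustrative value).
HOT_WINDOW = timedelta(days=14)


def choose_backend(req: DataRequest, today: date) -> Backend:
    """Pick a backend for one request (illustrative rule, not the real service)."""
    if req.start > req.end:
        raise ValueError("start must not be after end")
    if req.is_report:
        # Reports tolerate latency, so they go to the OLAP engine.
        return Backend.PRESTO
    if req.start < today - HOT_WINDOW:
        # The range reaches back past the hot window: treat it as historical.
        return Backend.PRESTO
    return Backend.SCYLLADB
```

Keeping this decision behind one REST service means frontend services never need to know which store holds the data, which is the decoupling the abstract refers to.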


  1. Scalable and Resilient Security Ratings Platform with ScyllaDB. Nguyen Cao, Staff Data Engineer
  2. Nguyen Cao ■ Staff Software Engineer at SecurityScorecard ■ Key member of the data migration to ScyllaDB project ■ 8 years of experience building large-scale distributed systems ■ MSc in Computing Science, specialized in Big Data
  3. Presentation Agenda ■ Introduction ■ Challenges & Improvements ■ Results ■ Lessons
  4. Introduction
  5. SecurityScorecard Mission: To make the world a safer place by transforming the way organizations understand, mitigate, and communicate cybersecurity to their Boards, employees, and vendors.
  6. SecurityScorecard Security Rating: A Security Rating is an objective, data-driven and quantifiable measure of an organization’s overall cybersecurity and cyber risk exposure. Ratings grade vendors and organizations on a scale of A through F. SecurityScorecard provides quality insights, giving you the confidence to make fast and informed decisions about cybersecurity investments. Companies with an F rating are 7.7x more likely to suffer a data breach versus those with an A rating. Entities with a Better Security Rating are More Resilient. SecurityScorecard Provides: ■ Continuous Visibility into Statewide Risk ■ Greater Visibility into Cyber Investments ■ Decreased Risk of Breaches Hurting the State and Taxpayers
  7. SecurityScorecard Data Pipeline ■ Signal Collection: IPv4 scan; Malware Sinkholes; DNS data; External data feeds ■ Attribution Engine: RIR, DNS, SSL data; Domain discovery; Subdomains; IP-domain pairing ■ Cyber Analytics: Investigate emerging threats; CVEs; Machine Learning ■ Scoring Engine: Digital Footprint; Size normalization; Factor scores; Total score ■ Global network of sensors deployed across 50 countries to spot zero-day threats ■ 4.1B IP addresses scanned every week ■ 100B+ vulnerabilities published weekly at trust.securityscorecard.com ■ 12M+ organizations continually scored every day ■ Risk Factors: Application Security; Hacker Chatter; Cubic Score; Social Engineering; Patching Cadence; DNS Health; Network Security; Endpoint Security; IP Reputation; Information Leak ■ The detected security issues are measured by the assigned factor with severity-based weights, update cadence and age out window to determine the calculation of a score
  8. Technical Challenges
  9. Scoring Architecture (Pre 2022): OVERVIEW ■ Diagram components: ssc-platform-api; ssc-svc-measurements; Redis; HDFS Cluster; Presto Cluster; Aurora; AWS EMR; Scoring Workflow ■ Diagram connections: SQL query; SQL query; Redis query ■ 12M scorecards ■ 4B measurement stats for domains/IPs ■ 16TB historical measurement details for 1 year
  10. Scoring Architecture (Pre 2022): HIGH LATENCY ■ Diagram components: ssc-platform-api; ssc-svc-measurements; Redis; HDFS Cluster; Presto Cluster; Aurora; AWS EMR; Scoring Workflow ■ Diagram connections: SQL query; SQL query; Redis query ■ SELECT * FROM measurement_details WHERE scorecard IN (...) AND date >= … and date <= … ■ INSERT INTO measurement_stats VALUES (...)
  11. Scoring Architecture (Pre 2022): VERTICAL SCALABILITY ■ Diagram components: ssc-platform-api; ssc-svc-measurements; Redis; HDFS Cluster; Presto Cluster; Aurora; AWS EMR; Scoring Workflow; ssc-airflow-ops ■ Diagram connections: SQL query; SQL query; Redis query ■ largest possible ElastiCache instance ■ NodeJS/Typescript ■ Python
  12. Scoring Architecture (Pre 2022): DATA IMMUTABILITY ■ Diagram components: ssc-platform-api; ssc-svc-measurements; Redis; HDFS Cluster; Presto Cluster; Aurora; AWS EMR; Scoring Workflow ■ Diagram connections: SQL query; SQL query; Redis query ■ INSERT INTO measurement_details(...) VALUES (...) ■ UPDATE measurement_details(...) SET (...)
  13. Scoring Architecture (Pre 2022): MAINTAINABILITY ■ Diagram components: ssc-platform-api; ssc-svc-measurements; ssc-svc-users; ssc-svc-reports; ….; Redis; HDFS Cluster; Presto Cluster; Aurora; AWS EMR; Scoring Workflow ■ Diagram connections: SQL query; SQL query; Redis query
  14. Technical Improvements: ScyllaDB Migration
  15. Scoring Architecture (Current): OVERVIEW ■ Diagram components: ssc-platform-api; ssc-svc-measurements; ssc-scoring-api; Scoring Workflow; S3; Presto Cluster; AWS EMR ■ Diagram connections: CQL query; REST API; SQL query ■ 12M scorecards ■ 4B measurement stats for domains/IPs ■ all historical measurement details ■ historical measurement details for 2 weeks
  16. Scoring Architecture (Current): LOW LATENCY ■ Diagram components: ssc-platform-api; ssc-svc-measurements; ssc-scoring-api; Scoring Workflow; S3; Presto Cluster; AWS EMR ■ Diagram connections: CQL query; REST API; SQL query ■ scorecard_detail ( uuid_company_id_key UUID, total_score DOUBLE, breach_impact DOUBLE, …, effective_date DATE, PRIMARY KEY ((uuid_company_id_key),effective_date) ) WITH default_time_to_live = 32400000; ■ schemas are designed based on access pattern ■ highly parallel processing tasks ■ SELECT * FROM scorecard_detail WHERE uuid_company_id_key IN (...) AND date >= … and date <= … ■ read throughput is stable even under high write workload
  17. Scoring Architecture (Current): HORIZONTAL SCALABILITY ■ Diagram components: ssc-platform-api; ssc-svc-measurements; ssc-scoring-api; Scoring Workflow; S3; Presto Cluster; AWS EMR ■ Diagram connections: CQL query; REST API; SQL query ■ Capacity labels: 6 ECS instances; 12 GB; 12 nodes; 720 GB; 20 TB storage; infinite object storage
  18. Scoring Architecture (Current): DATA ACCESS ABSTRACTION ■ Diagram components: ssc-platform-api; ssc-svc-measurements; ssc-svc-users; ssc-svc-reports; ….; ssc-scoring-api; Scoring Workflow; S3; Presto Cluster; AWS EMR ■ Diagram connections: CQL query; REST API; SQL query ■ access data in ScyllaDB for low latency requests with high volume ■ redirect all historical or high latency requests such as reporting to Presto S3 ■ REST interface access for all FE services
  19. Results: Migrating to ScyllaDB brought us a lot of benefits from different perspectives: ■ 90% latency reduction for most service endpoints ■ 80% fewer production incidents related to Presto/Aurora performance ■ $1M infrastructure cost saving per year ■ 30% faster data pipeline processing ■ Much better customer experience
  20. Lessons ■ Route infrequent, complex, and latency-tolerant data access to OLAP engines like Presto and Athena (generating reports, custom analysis, etc.) ■ Build a scalable, highly parallel aggregation component to overcome current limits of CQL (in-memory JOIN capability, SELECT-IN queries, etc.) ■ Design ScyllaDB schemas based on data access patterns to address latency issues.
  21. Thank You ■ Stay in Touch ■ Nguyen Cao ■ ncao@securityscorecard.io ■ @ducnguyen_cao ■ https://github.com/nguyencaoduc ■ https://www.linkedin.com/in/nguyenduccao/
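
The lesson on slide 20 about a highly parallel aggregation component can be illustrated with a short sketch: instead of sending one large SELECT ... IN (...) query, issue one single-partition query per key concurrently and combine the rows in memory. This is only an illustration under stated assumptions, not the speaker's implementation. The names `fetch_many`, `fake_fetch_one`, and `average_score` are hypothetical, the in-memory `_FAKE_TABLE` stands in for a real ScyllaDB query, and the sample scores are made up. The column names `effective_date` and `total_score` follow the schema on slide 16.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import date
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]
# A fetcher returns the rows of one partition key within [start, end].
Fetcher = Callable[[str, date, date], List[Row]]


def fetch_many(keys: Iterable[str], start: date, end: date,
               fetch_one: Fetcher, max_workers: int = 8) -> Dict[str, List[Row]]:
    """Run one query per partition key concurrently and group rows by key."""
    unique_keys = list(dict.fromkeys(keys))  # de-duplicate, keep first-seen order
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(lambda k: fetch_one(k, start, end), unique_keys)
        return dict(zip(unique_keys, results))


def average_score(rows_by_key: Dict[str, List[Row]]) -> Dict[str, float]:
    """In-memory aggregation step: mean total_score per key, skipping empty keys."""
    return {key: sum(r["total_score"] for r in rows) / len(rows)
            for key, rows in rows_by_key.items() if rows}


# Stand-in for a real single-partition CQL query; the data is made up.
_FAKE_TABLE: Dict[str, List[Row]] = {
    "company-a": [{"effective_date": date(2023, 1, 1), "total_score": 91.0},
                  {"effective_date": date(2023, 1, 2), "total_score": 92.0}],
    "company-b": [{"effective_date": date(2023, 1, 1), "total_score": 78.0}],
}


def fake_fetch_one(key: str, start: date, end: date) -> List[Row]:
    return [r for r in _FAKE_TABLE.get(key, [])
            if start <= r["effective_date"] <= end]


rows = fetch_many(["company-a", "company-b", "company-a"],
                  date(2023, 1, 1), date(2023, 1, 2), fake_fetch_one)
print(average_score(rows))
```

Because each sub-query touches a single partition key, the work spreads across nodes rather than loading one coordinator with the whole IN list, which matches the slide's advice to design schemas around the access pattern.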
