
MongoDB World 2019: High Performance Auditing of Changes Based on MongoDB Change Streams

Take advantage of the elasticity of the cloud by creating resources that can heal themselves. Learn to create Compute Engine resources in GCP using Terraform that will install and configure a MongoDB replica set for you.

  1. FICO® Alert & Case Manager Auditing with MongoDB Change Streams
     Carlos Saraiva, Sr. Principal Architect, FICO
     Sönke Sothmann, Principal Engineer, FICO
     © 2019 Fair Isaac Corporation. This presentation is provided for the recipient only and cannot be reproduced or shared without Fair Isaac Corporation’s express consent.
  2. FICO Overview
     Profile: The leader in advanced analytics and decision management. Founded: 1956
     • Understanding and predicting human behavior
     • Reducing the time from insight to action
     • $1B revenue (2018)
     Products and Services: 190+ patents in AI and machine learning methods
     • Scoring systems for credit underwriting and risk management
     • AI systems for security and fraud detection
     • Advanced solutions for AML detection and compliance
     • Tools for analytics authoring and decision management
     Clients and Markets: 10,000+ clients in 90+ countries
     • Industry focus: Finance, insurance, retail, government, healthcare, logistics, and manufacturing
     Worldwide: 20+ offices worldwide, HQ in San Jose, California
     • 3,400 employees
     • Regional Hubs: San Rafael and San Diego (CA), New York, London, Birmingham (UK), Toronto, Johannesburg, Milan, Moscow, Bensheim, Munich, Madrid, Istanbul, Sao Paulo, Bangalore, Beijing, Singapore
  3. Alert & Case Manager (ACM): Fraud, Compliance
     • Fraud and Compliance use a dual-pronged analytical approach:
       • Expert-driven, rules-based analytics that assess risk based on human judgment and internal audit expertise.
       • Outlier detection algorithms that deliver an objective, prioritized assessment of risk based on an activity.
     • Strong integrated case management to allow staff to quickly review the transactions with the highest risk level.
     • Risk transactions are sent to ACM to be analyzed and dispositioned by auditors
       • Alert Management, aka Transactional Case Management – single transactions
       • Investigative Case Management – a complete portfolio around one or more transactions with some commonality
       • Suspicious Entity Case Management – an aggregation of alert cases by a suspicious entity (person, account, etc.)
  4. Architecture
  5. ACM is a case manager for fraud and compliance alerts
  6. Auditing Business Requirements
     • Recreate cases to a point in time
     • Track actions on a case
       • User and automated actions
     • Administration audit trail
       • Track changes to groups, roles, ACLs, rules, approval process definitions, queue definitions, templates, etc.
     • Not covered in this presentation
       • Security audit trail: login, logout
       • Data access audit trail: PII / PCI data access
  7. Non-functional Requirements
     • High performance for writing changes to the database
       • Reading and interpreting the audit trail has more relaxed performance requirements
     • Fail-proof
       • Capture all changes to the database, even those done by external systems, if any
  8. Before change streams were available, we tried other approaches
     • Manual tracking – registering case activities (non-generic approach)
       • error prone (missing tracks, inconsistent)
       • couldn’t recreate snapshots
       • couldn’t capture changes done outside of the application
     • Javers (https://javers.org/)
       • bad performance, too granular, too many I/Os
       • doesn’t play well with MongoDB
       • couldn’t capture changes done outside the application
       • causes issues when changes are not made through the application, which makes it hard to, e.g., execute scripts against MongoDB directly
  9. So we looked to MongoDB Change Streams to solve the problem
  10. Change Streams in a Nutshell
     • Oplog – a capped collection that keeps track of all changes for replication purposes
     • Change Streams is an API that allows clients to listen to changes
     • High performance
     • Resumable
  11. Change Events
     {
       "_id" : { "_data" : <BinData|hex string> },
       "operationType" : "<operation>",
       "fullDocument" : { <document> },
       "ns" : { "db" : "<database>", "coll" : "<collection>" },
       "to" : { "db" : "<database>", "coll" : "<collection>" },
       "documentKey" : { "_id" : <value> },
       "updateDescription" : {
         "updatedFields" : { <document> },
         "removedFields" : [ "<field>", ... ]
       },
       "clusterTime" : <Timestamp>,
       …
     }
  12. Solution Architecture
  13. Snapshot Recreation
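The slide's diagram did not survive extraction, so the following is only one plausible way to recreate a snapshot, not the presenters' implementation. It is a minimal Python sketch that replays persisted change events for one document in `clusterTime` order up to a chosen point in time. The function names, the use of plain comparable values for `clusterTime`, and the assumption that `updatedFields` is stored as a mapping of dotted paths to values are all illustrative assumptions. A field name that itself contains a dot would be ambiguous here.

```python
import copy
from typing import Any, Dict, Iterable, List, Optional

Document = Dict[str, Any]


def _parent_of(doc: Any, parts: List[str]) -> Any:
    """Walk to the container that holds the last path segment."""
    target = doc
    for part in parts[:-1]:
        if isinstance(target, list):
            target = target[int(part)]
        else:
            target = target.setdefault(part, {})
    return target


def apply_change(snapshot: Optional[Document], event: Document) -> Optional[Document]:
    """Apply one persisted change event and return the new snapshot."""
    op = event["operationType"]
    if op in ("insert", "replace"):
        return copy.deepcopy(event["fullDocument"])
    if op == "delete":
        return None
    if op != "update":
        return snapshot  # other event types do not alter a single document
    doc = copy.deepcopy(snapshot) if snapshot is not None else {}
    description = event["updateDescription"]
    for path, value in description.get("updatedFields", {}).items():
        parts = path.split(".")
        parent = _parent_of(doc, parts)
        if isinstance(parent, list):
            index = int(parts[-1])
            if index == len(parent):  # an update that appends to an array
                parent.append(copy.deepcopy(value))
            else:
                parent[index] = copy.deepcopy(value)
        else:
            parent[parts[-1]] = copy.deepcopy(value)
    for path in description.get("removedFields", []):
        parts = path.split(".")
        parent = _parent_of(doc, parts)
        if isinstance(parent, dict):
            parent.pop(parts[-1], None)
    return doc


def recreate_snapshot(events: Iterable[Document], as_of: Any) -> Optional[Document]:
    """Replay events in clusterTime order, stopping after `as_of`."""
    snapshot = None
    for event in sorted(events, key=lambda e: e["clusterTime"]):
        if event["clusterTime"] > as_of:
            break
        snapshot = apply_change(snapshot, event)
    return snapshot
```

Because every update is replayed on top of the previous state, the audit trail has to start with the insert (or a stored base snapshot) of the document for this approach to work.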
  14. Demo
  15. Boost performance using fixed-time interval transfer (the Boxcar Pattern)
     • Write changes in batches
     • Collect changes up to a certain amount of time and up to a certain number of changes
       • E.g. collect up to 1000 changes for max 1 second
       • Whichever limit is reached first leads to flushing the batch
     • Batch insert using insertMany()
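The batching rule above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the presenters' code: the class name `BoxcarBatcher`, the `tick()` method, and the injectable `clock` are hypothetical, and the `flush` callback stands in for whatever performs the batch insert.

```python
import time
from typing import Any, Callable, Dict, List, Optional

Change = Dict[str, Any]


class BoxcarBatcher:
    """Buffers change documents and flushes them in batches.

    A batch is flushed when it holds `max_size` items or when its oldest
    item has waited `max_age_seconds`, whichever limit is reached first.
    """

    def __init__(
        self,
        flush: Callable[[List[Change]], Any],
        max_size: int = 1000,
        max_age_seconds: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._flush = flush
        self._max_size = max_size
        self._max_age = max_age_seconds
        self._clock = clock
        self._buffer: List[Change] = []
        self._first_added_at: Optional[float] = None

    def add(self, change: Change) -> None:
        """Buffer one change; flush immediately if the size limit is hit."""
        if not self._buffer:
            self._first_added_at = self._clock()
        self._buffer.append(change)
        if len(self._buffer) >= self._max_size:
            self.flush()

    def tick(self) -> None:
        """Call periodically to enforce the time limit on a partial batch."""
        if self._buffer and self._clock() - self._first_added_at >= self._max_age:
            self.flush()

    def flush(self) -> None:
        """Hand the buffered changes to the flush callback, if any."""
        if not self._buffer:
            return
        batch, self._buffer = self._buffer, []
        self._first_added_at = None
        self._flush(batch)
```

With PyMongo, the callback could be the audit collection's `insert_many` method, for example `BoxcarBatcher(flush=audit_collection.insert_many)`, where `audit_collection` is an assumed name. The listener loop would call `add()` for each event and `tick()` between events.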
  16. High Availability
     • If an audit node goes down, other nodes should continue to write the changes to the DB
     • Ensure changes are written with as little delay as possible
     • Ensure that we don’t fall off the oplog because of audit node downtime
  17. Change Streams Expectations & Reality
  18. Expectations
     • Access to unchanged fields
       • E.g. user name/IP might not change between updates
       • Full document can be provided
     • Be able to persist the change event documents as they are
     • Change Streams are fast
     • Get exact changes
       • We don’t need to calculate the changes on our own!
  19. Reality
     • Full document lookup feature returns a document that may differ from the document at the time of the update operation
       • Not suitable for auditing
     • Change event documents cannot be persisted directly
       • Need to be transformed
       • Changes include dot notation in object keys, which is not allowed when you persist a document
     • Although highly performant, there is degradation as change or document sizes increase
     • Arrays and embedded documents are reported as full arrays / full embedded documents
       • You don’t know what has changed, unless you calculate a diff
       • We can leverage this behavior for change meta data and logical change tracking
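The slides do not show how the change events were transformed before being persisted. One possible transformation, sketched here in Python as an assumption, moves each dotted key out of the key position and stores it as a value instead. The function name `to_persistable` and the `path`/`value` field names are hypothetical.

```python
from typing import Any, Dict

ChangeEvent = Dict[str, Any]


def to_persistable(change_event: ChangeEvent) -> ChangeEvent:
    """Return a copy of a change event whose keys contain no dots.

    `updatedFields` is rewritten from {"a.b": 1} to
    [{"path": "a.b", "value": 1}], so dotted paths become values.
    The input event is left unmodified.
    """
    doc = dict(change_event)
    description = change_event.get("updateDescription")
    if description is not None:
        updated = description.get("updatedFields", {})
        doc["updateDescription"] = {
            "updatedFields": [
                {"path": path, "value": value} for path, value in updated.items()
            ],
            "removedFields": list(description.get("removedFields", [])),
        }
    return doc
```

Keeping the path as a plain string value preserves the exact field reference from the event, at the cost of needing a small conversion step when the audit trail is read back.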
  20. Demo Reality
  21. We adapted our solution to work with this limitation and meet the original requirements
  22. Summary
  23. The solution meets our requirements
     • Change Streams is an adequate way of recording an audit trail of changes
     • Audit Reader/Writer are generic and can be used with any app that uses MongoDB
     • Foolproof: as long as the audit component listens to the collections, no change will go undetected
       • no code required to capture changes
       • allows for easy recreation of snapshots
       • captures changes made outside the application
         • by direct manipulation through scripts
         • made by other applications
     • high performance
     • highly available, fault-tolerant, resumable
  24. Scalability
  25. Scalability
     • Vertical Scalability
       • Writer Threads
       • Form buckets / multiple listeners
     • Horizontal Scalability
       • Load distribution using middleware
       • Form buckets / multiple listeners
  26. Vertical Scalability – Writer Threads
     • Multiple writer threads
     • Allows faster writes
  27. Vertical Scalability – Multiple Listeners
     • Listener per watched collection
     • Listener per bucket (kind of sharding)
       • subscribe to changes using filter conditions (e.g. ranges of customer IDs)
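A bucket filter of this kind can be expressed as a `$match` stage in the change stream's aggregation pipeline. The Python sketch below builds one pipeline per bucket. It assumes, purely for illustration, that the bucketing key is the document's `_id` (for example a numeric customer ID), because `documentKey` is present on every event type, whereas `fullDocument` is not present on plain update events. The function name `bucket_pipelines` is hypothetical.

```python
from typing import Any, Dict, List, Sequence

Pipeline = List[Dict[str, Any]]


def bucket_pipelines(boundaries: Sequence[Any]) -> List[Pipeline]:
    """Build one change stream pipeline per half-open range of keys.

    `boundaries` must be sorted; [0, 1000, 2000] yields two buckets,
    [0, 1000) and [1000, 2000).
    """
    pipelines: List[Pipeline] = []
    for lower, upper in zip(boundaries, boundaries[1:]):
        pipelines.append(
            [{"$match": {"documentKey._id": {"$gte": lower, "$lt": upper}}}]
        )
    return pipelines
```

Each pipeline would then be handed to its own listener, for example via PyMongo's `collection.watch(pipeline)`. Half-open ranges ensure that every key belongs to exactly one bucket.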
  28. Horizontal Scalability
     • Should not be required in most cases, due to MongoDB’s legendary performance
     • Options
       • Use middleware to distribute the load among different nodes
       • Subscribe to changes using filter conditions
         • Form buckets (e.g. customer ID ranges)
         • Different nodes could register for changes of only a subset of the documents
         • All nodes that are processing a bucket should be run in an HA setup
  29. Scalability – Resumability
     • When vertical or horizontal scalability is used, it’s no longer possible to resume the change stream just by looking at the latest change document in MongoDB to retrieve the latest resume token
     • When multiple change documents are processed in parallel and the cluster crashes while processing, older change documents might not yet have been persisted while newer ones already have, creating a gap in the audit trail.
     • Options
       • Allow starting from a certain timestamp or resume token
       • Gap detection
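The slides name the options but not an implementation. One way to avoid resuming past a gap, sketched below in Python as an assumption, is to track the highest batch up to which every earlier batch has been persisted and to resume only from that batch's last resume token. The class `ResumePointTracker` and its method names are hypothetical, and the sketch is single-threaded; concurrent writers would need a lock around both methods.

```python
from typing import Any, Dict, Optional, Set


class ResumePointTracker:
    """Tracks the last resume token that is safe to resume from.

    Batches are numbered in dispatch order. A token becomes safe only
    once its batch and all earlier batches have been persisted.
    """

    def __init__(self) -> None:
        self._next_seq = 0
        self._tokens: Dict[int, Any] = {}  # seq -> last token of that batch
        self._done: Set[int] = set()       # persisted, but behind a gap
        self._committed = -1               # all batches <= this are persisted
        self._safe_token: Optional[Any] = None

    def dispatched(self, last_resume_token: Any) -> int:
        """Register a batch handed to a writer; returns its sequence number."""
        seq = self._next_seq
        self._next_seq += 1
        self._tokens[seq] = last_resume_token
        return seq

    def persisted(self, seq: int) -> None:
        """Mark a batch as written and advance the safe point if possible."""
        self._done.add(seq)
        while self._committed + 1 in self._done:
            self._committed += 1
            self._done.remove(self._committed)
            self._safe_token = self._tokens.pop(self._committed)

    @property
    def safe_resume_token(self) -> Optional[Any]:
        return self._safe_token
```

Resuming from the safe token can replay events that a faster writer already persisted, so the audit writes would need to be idempotent, for example by rejecting or ignoring duplicates of an already stored event.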
  30. Questions & Answers
  31. Thank you / Vielen Dank / Obrigado
  32. Appendices
  33. Reality – How Changes to Arrays and Embedded Documents Are Reported (1)
     {
       "_id": ObjectId("5c83c795174eabef4a5d98ca"),
       "firstname": "Max",
       "lastname": "Mustermann",
       "age": 36.0,
       "addresses": [
         {"type": "home", "street": "Musterweg 1", "zip": "68123", "city": "Mannheim", "country": "Germany"},
         {"type": "work", "street": "Beispielgasse 2", "zip": "64625", "city": "Bensheim", "country": "Germany"}
       ],
       "employment": {
         "emplNo": "123456",
         "supervisor": { "emplNo": "1", "firstname": "Big", "lastname": "Boss" }
       }
     }
  34. Reality – How Changes to Arrays and Embedded Documents Are Reported (2)
     db.getCollection('test').updateOne(
       {_id: ObjectId("5c83c795174eabef4a5d98ca")},
       {$set: {
         "_id": ObjectId("5c83c795174eabef4a5d98ca"),
         "firstname": "Max CHANGED",
         "lastname": "Mustermann",
         "age": 36.0,
         "addresses": [
           {"type": "home", "street": "Musterweg 1 CHANGED", "zip": "68123", "city": "Mannheim", "country": "Germany"},
           {"type": "work", "street": "Beispielgasse 2", "zip": "64625", "city": "Bensheim", "country": "Germany"}
         ],
         "employment": {
           "emplNo": "123456",
           "supervisor": { "emplNo": "1", "firstname": "Big CHANGED", "lastname": "Boss" }
         }
       }}
     )
  35. Reality – How Changes to Arrays and Embedded Documents Are Reported (3)
     {
       "_id": { "_data": ... },
       "operationType": "update",
       "clusterTime": Timestamp(1556449376, 1),
       "ns": { "db": "test", "coll": "test" },
       "documentKey": { "_id": ObjectId("5c83c795174eabef4a5d98ca") },
       "updateDescription": {
         "updatedFields": { ... },
         "removedFields": [ ]
       }
     }
  36. Reality – How Changes to Arrays and Embedded Documents Are Reported (4)
     {
       ...
       "updateDescription": {
         "updatedFields": {
           "addresses": [
             { "type": "home", "street": "Musterweg 1 CHANGED", "zip": "68123", "city": "Mannheim", "country": "Germany" },
             { "type": "work", "street": "Beispielgasse 2", "zip": "64625", "city": "Bensheim", "country": "Germany" }
           ],
           "employment": {
             "emplNo": "123456",
             "supervisor": { "emplNo": "1", "firstname": "Big CHANGED", "lastname": "Boss" }
           },
           "firstname": "Max CHANGED"
         },
         "removedFields": [ ]
       }
     }
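Because the whole `addresses` array and the whole `employment` document are reported even though only one nested value changed in each, finding the exact change requires a diff against the previous state. The Python sketch below is one simple way to compute such a diff and is not taken from the presentation. The names `diff` and `MISSING` are hypothetical, and lists are compared position by position, which is an assumption that suits in-place edits but not insertions in the middle of an array.

```python
from typing import Any, Dict, Tuple

MISSING = object()  # marks a field that is absent on one side


def diff(old: Any, new: Any, path: str = "") -> Dict[str, Tuple[Any, Any]]:
    """Return {dotted_path: (old_value, new_value)} for every leaf that differs."""
    if isinstance(old, dict) and isinstance(new, dict):
        children = [
            (key, old.get(key, MISSING), new.get(key, MISSING))
            for key in sorted(old.keys() | new.keys())
        ]
    elif isinstance(old, list) and isinstance(new, list):
        children = [
            (
                str(i),
                old[i] if i < len(old) else MISSING,
                new[i] if i < len(new) else MISSING,
            )
            for i in range(max(len(old), len(new)))
        ]
    else:
        # Scalars, mismatched types, or a value missing on one side.
        return {} if old == new else {path: (old, new)}

    changes: Dict[str, Tuple[Any, Any]] = {}
    for key, old_child, new_child in children:
        child_path = f"{path}.{key}" if path else key
        changes.update(diff(old_child, new_child, child_path))
    return changes
```

Applied to the previous values of `addresses`, `employment`, and `firstname` and the values reported in `updatedFields`, this would single out `addresses.0.street`, `employment.supervisor.firstname`, and `firstname`. The previous values have to come from somewhere else, such as an earlier snapshot, since the change event itself does not carry them.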
  37. Resumability
     • Change streams are resumable by specifying a resume token when subscribing for changes
       • MongoDB will replay changes that have occurred since the change indicated by the resume token
     • Resume tokens are part of the change documents
       • As we persist change documents to an audit collection, the latest change token can be fetched from there
     • Downtimes of all audit nodes do not lead to missing change events
       • as long as the oplog still holds the changes
       • configure your oplog size!
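Resuming from the audit collection could look roughly like the Python sketch below. It relies on two PyMongo calls, `find_one(sort=...)` and `watch(pipeline, resume_after=...)`, but everything else is an assumption made for illustration: the function name `open_resumed_stream`, the idea that each audit document keeps the change event's `_id` (the resume token) as its own `_id`, and the use of `clusterTime` to find the latest entry.

```python
from typing import Any, Dict, List, Optional


def open_resumed_stream(
    source_collection: Any,
    audit_collection: Any,
    pipeline: Optional[List[Dict[str, Any]]] = None,
) -> Any:
    """Open a change stream that continues after the last audited change.

    Assumes audit documents carry the change event's `_id` (the resume
    token) and its `clusterTime`. With an empty audit collection the
    stream simply starts from the current point in time.
    """
    latest = audit_collection.find_one(sort=[("clusterTime", -1)])
    resume_token = latest["_id"] if latest is not None else None
    return source_collection.watch(pipeline or [], resume_after=resume_token)
```

This only works while the oplog still contains the change identified by the token, and it assumes a single writer; with parallel writers the latest persisted document is not necessarily a safe resume point.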
  38. How to find out if a change document has already been persisted?
     • Writing change documents for the original changes happens asynchronously
     • Based on how you implement or configure it, changes should be written some milliseconds or seconds after the original change
     • If you still need to know if the change document for a particular change has already been persisted
       • Introduce a change ID
       • E.g. generate a change UUID in your app and set it in the document along with your other changes
       • The change document will also have this change ID
  39. Change Meta Data
     • Persist change meta data in the original document along with your changes
       • createdDate/updatedDate
       • User name and/or ID, plus IP address of user initiating the update
       • Change UUID
  40. Logical Change Tracking
     • Problem 1: multiple changes combined in a single write operation (e.g. user changes priority, rules update other fields), leading to a single change document
     • Problem 2: changes in the document might not always clearly indicate what the change is (you only see the manifestation of the change)
     • Persist a logical change description in the original document along with the actual changes
       • Logical actions, e.g. "confirm alert"
       • User or rule or system performing the update
       • Parameters of the action, e.g. if the action is "change priority", the parameter might be the target priority, e.g. "P1"
     • As this list of logical changes is itself a change of the document, it will be captured in the change events along with the actual changes and gets persisted as an entry in the audit trail
     • Next change to the document would then replace latestLogicalChanges with new entries
     • List of changes doesn’t grow in the document itself, as only the latest change is persisted
  41. Change Meta Data & Logical Change Tracking - Example
     {
       "latestChange": {
         "changeUuid": "…",
         "timestamp": …,
         "triggeringEntity": {
           "type": "USER", // or "SYSTEM"
           "username": "bob",
           "ipaddress": "172.123.321.111"
         },
         "logicalChanges": [
           { "type": "UserAction", "action": "AlertDecisioning", "params": { "alertId": "123", "decision": "confirmed" } },
           { "type": "RuleAction", "action": "SetAlertPriority", "params": { "priority": "P1" } }
         ]
       }
       …
     }
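On the application side, attaching this metadata could be done by a small helper that wraps every update. The Python sketch below follows the field names of the example above, while the helper name `with_change_metadata` and its parameters are assumptions. Since `$set` replaces the whole `latestChange` field on each write, only the latest change stays in the document.

```python
import uuid
from datetime import datetime, timezone
from typing import Any, Dict, List, Tuple


def with_change_metadata(
    field_updates: Dict[str, Any],
    username: str,
    ip_address: str,
    logical_changes: List[Dict[str, Any]],
    entity_type: str = "USER",
) -> Tuple[str, Dict[str, Any]]:
    """Build a `$set` update that carries change metadata with the changes.

    Returns the generated change UUID together with the update document,
    so the caller can later look up the matching audit entry.
    """
    change_uuid = str(uuid.uuid4())
    update = {
        "$set": {
            **field_updates,
            "latestChange": {
                "changeUuid": change_uuid,
                "timestamp": datetime.now(timezone.utc),
                "triggeringEntity": {
                    "type": entity_type,
                    "username": username,
                    "ipaddress": ip_address,
                },
                "logicalChanges": logical_changes,
            },
        }
    }
    return change_uuid, update
```

The update document could then be passed to PyMongo's `update_one`, for example `alerts.update_one({"_id": alert_id}, update)`, where `alerts` and `alert_id` are assumed names. The returned UUID is what the application would search for in the audit collection to check whether the change has been persisted.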
  42. Things to note
     • Requires
       • MongoDB 3.6 or later
       • Replica set
       • "majority" read concern to be enabled
     • Eventually consistent
       • Document changes do not lead to immediate audit trail entries, as change writes are asynchronous. There will be a delay until the audit entries are written.
     • Throughput needs to be measured
     • Use an oplog that is large enough to buffer the changes of several hours, e.g. 24 hours
     • Works best with small documents / changes on top-level properties
  43. Core Competencies
     Advanced Analytics
     • Predictive and Descriptive Analytics
     • Supervised and Unsupervised Techniques
     • Machine Learning and Artificial Intelligence
     • Unstructured Data Analytics
     • Advanced Optimization
     Decision Management
     • Advanced Rules Management Software
     • Integrated Analytics and Operational Platform
     • Operationalized Analytics (Context-based Approaches)
     • Rapid Application Development
     • Standards-based (i.e. Decision Modeling Notation)
     Applied Risk and Fraud Management
     • Deep Expertise in Credit Risk and Fraud Detection
     • Trusted Custodians of Massive Data Consortia
     • Cyber Risk Quantification and Analytics-based Threat Detection
     • Portfolio-level and Systemic Risk Assessment
     • AML, KYC, and Compliance Management
