Grab is one of the most frequently used mobile platforms in Southeast Asia, providing the everyday services that matter most to consumers. Its users commute, eat, arrange shopping deliveries, and pay with one e-wallet. Grab relies on the combination of Apache Kafka and Scylla for a critical use case: instantly detecting fraudulent transactions that might occur across more than six million on-demand rides per day in eight countries across Southeast Asia. Doing this successfully requires many things to happen in near-real time.
Join our webinar for this fascinating real-time big data use case, and learn the steps Grab took to optimize their fraud detection systems using the Scylla NoSQL database along with Apache Kafka.
2. About ScyllaDB
+ The Real-Time Big Data Database
+ Drop-in replacement for Cassandra
+ 10X the performance & consistent, low latency
+ Open source and enterprise editions
+ New: Scylla Cloud, DBaaS
+ Founded by the creators of KVM hypervisor
+ HQs: Palo Alto, CA; Herzelia, Israel
3. Presenter
+ Engineering Lead at Grab Technologies
+ Streaming Platform
+ Data Platform Infrastructure
+ Worked at Uber Inc prior to Grab
+ Prior to that worked for Isilon/EMC (now DELL)
+ Masters in Computer Science from University of Virginia (UVa)
+ Contact: aravindvelamur@gmail.com
4. Agenda
Background
Overview of TechStack at Grab
Streaming Ecosystem at Grab
Why ScyllaDB
ScyllaDB use at Grab (Fraud Detection)
Conclusion
Q&A
6. Introduction to Grab
+ One of the most frequently used mobile platforms in Southeast Asia
+ Multiple services (multi-million users per day and growing!)
+ Transport
+ Food
+ Payment
+ Shopping
+ Package Delivery
+ ...
20. Why ScyllaDB
+ Required a state store (metadata store) with the following characteristics:
+ Ability to handle very high throughput
+ Ability to handle bursts (doh!)
+ Ability to scale out (i.e., handle hockey-stick-like growth)
+ Very low latencies (both writes and reads) - Near Real-Time
+ Low operational overhead
+ Cost efficient
31. ScyllaDB Usage - Fraud Detection
+ Why? E.g., fraud detection
...
The big influx of capital into the industry has led to fraudsters, sometimes individuals, sometimes organised in gangs, trying to game incentive and sign-up schemes. As a result, a stolen ride-hailing driver profile today is worth up to US$30 on the black market, even more than stolen credit card information.
…
33. ScyllaDB Usage - Fraud Detection
+ Use Kafka streams to do real-time fraud detection
+ Simple Use Case:
+ Some scammers use fake GPS to say they are online at multiple locations
34. ScyllaDB Usage - Fraud Detection
+ Simple Use Case:
[Diagram labels: Grab Service, GPS, driver_location topic, Fraud Detection Service, Algorithm]
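The slides do not show the detection algorithm itself. As a minimal sketch of how a consumer of the driver_location topic might catch a driver who claims to be online at multiple locations, the Python below flags any ping whose implied speed since the previous ping is physically implausible. The `Ping` fields, the `detect_impossible_moves` function, and the 200 km/h threshold are all assumptions made for illustration, not Grab's implementation, and the input is a plain list rather than a real Kafka consumer.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Ping:
    """One message from the driver_location topic (field names are assumed)."""
    driver_id: str
    ts: float   # seconds since epoch
    lat: float
    lon: float


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


MAX_SPEED_KMH = 200.0  # assumed threshold, not taken from the talk


def detect_impossible_moves(pings, max_speed_kmh=MAX_SPEED_KMH):
    """Return pings whose implied speed since the driver's previous ping is too high.

    Assumes pings arrive in timestamp order per driver, as they would if the
    topic were partitioned by driver id.
    """
    last_seen = {}   # driver_id -> most recent Ping
    flagged = []
    for ping in pings:
        prev = last_seen.get(ping.driver_id)
        if prev is not None:
            # Treat anything under one second as one second to avoid dividing by zero.
            hours = max(ping.ts - prev.ts, 1.0) / 3600.0
            km = haversine_km(prev.lat, prev.lon, ping.lat, ping.lon)
            if km / hours > max_speed_kmh:
                flagged.append(ping)
        last_seen[ping.driver_id] = ping
    return flagged
```

Here `last_seen` is an in-process dictionary; in a deployment with many consumer instances, that per-driver state is the kind of data the state store described earlier would hold.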
35. ScyllaDB Usage - Fraud Detection
+ Complex Use Case:
+ Fraudsters evolve :)
+ Consuming one topic is not enough
+ Example: Use fake GPS tools and modded phones to simulate driving behaviour and completed rides to game the system.
37. ScyllaDB Usage - Fraud Detection
+ In summary, using ScyllaDB:
+ Joined multiple Kafka streams together in real time!!
+ Like joining SQL tables together!
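To make the "like joining SQL tables" idea concrete, here is a hedged Python sketch of a stream join through a shared state store. Each topic writes its payload into its own column of a row keyed by a common id, and a joined record is emitted once every required column is present. The `StateStore` class is an in-memory stand-in for a ScyllaDB table, and the topic names `booking` and `location` are hypothetical; the talk does not specify the schema or the topics involved.

```python
from collections import defaultdict


class StateStore:
    """In-memory stand-in for a key-value state store such as a ScyllaDB table."""

    def __init__(self):
        self._rows = defaultdict(dict)

    def upsert(self, key, column, value):
        """Write one column of the row for `key` and return a copy of the whole row."""
        self._rows[key][column] = value
        return dict(self._rows[key])


REQUIRED_TOPICS = ("booking", "location")  # hypothetical topic names


def join_streams(events, store, required=REQUIRED_TOPICS):
    """Join events from several topics on a shared key.

    `events` is an iterable of (topic, key, payload) tuples in arrival order.
    A joined row is emitted the first time all required topics have
    contributed a value for that key.
    """
    joined = []
    emitted = set()
    for topic, key, payload in events:
        row = store.upsert(key, topic, payload)
        if key not in emitted and all(t in row for t in required):
            emitted.add(key)
            joined.append((key, row))
    return joined
```

Because each consumer only upserts its own column, the topics can be consumed independently and in any order; the store, not the consumers, holds the partial state of the join.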
40. ScyllaDB Usage - Stream Statistics
+ Teams like to find out:
+ Counts of … (e.g., count of rides in the past month, etc.)
+ Raw stats for a particular key
+ Business users - require stats on city/country data
41. ScyllaDB Usage - Stream Statistics
+ Requires a fast store for raw stats
+ Ability to build Time series on top of it.
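One common way to serve both raw counts and time series from the same store is to keep one counter per key per time bucket. The Python sketch below illustrates that layout with daily buckets; the `BucketedCounter` class, the daily granularity, and the key format are assumptions for illustration, and the in-memory `Counter` stands in for whatever table the real system uses.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


class BucketedCounter:
    """Per-key event counts stored in daily UTC buckets (in-memory sketch)."""

    def __init__(self):
        self._counts = Counter()   # (key, "YYYY-MM-DD") -> count

    @staticmethod
    def _day(ts):
        """Map a Unix timestamp to its UTC calendar date."""
        return datetime.fromtimestamp(ts, tz=timezone.utc).date()

    def record(self, key, ts, n=1):
        """Add n events for `key` to the bucket containing timestamp `ts`."""
        self._counts[(key, self._day(ts).isoformat())] += n

    def series(self, key, start_ts, end_ts):
        """Return [(day, count), ...] for every day from start_ts to end_ts inclusive."""
        day, last = self._day(start_ts), self._day(end_ts)
        points = []
        while day <= last:
            points.append((day.isoformat(), self._counts[(key, day.isoformat())]))
            day += timedelta(days=1)
        return points

    def count_between(self, key, start_ts, end_ts):
        """Total events for `key` over the whole days spanned by the range."""
        return sum(count for _, count in self.series(key, start_ts, end_ts))
```

A query such as "count of rides in the past month" then becomes a sum over about thirty buckets for one key, and the same buckets read in order form the time series.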
43. ScyllaDB at Grab So Far...
Overall
+ Great experience
+ Cost effective
+ Very responsive team
+ As good as advertised :)
+ Growing within the company…
44. ScyllaDB at Grab So Far...
+ Some hiccups
+ nodetool repair - write and read timeouts every time the run finishes (even with -pr)
+ Wish nodes could join the cluster faster
+ Right now it takes a really long time: ~2.5 hours to add a node (approx. 1TB of data on each node across all keyspaces)
+ We understand why, but can we design something asynchronous?
+ Better error logging