4. P A Y M E N T S S Y S T E M
Credit Card
____________________________
Information
Account Number
Brand Mark
Expiration Date
BIN
Chip
Hologram
Signature
Security Code
Jiang-Ming Yang @ 2015.04
5. Card Number
____________________________
Information
6. Magnetic stripe: Track 1
%B1234567890123456^CHRISTNER/JOEL ^1504101100001100000000447000000?
CCN: 1234567890123456
Exp: 04/15
CVV: 447
Cardholder: CHRISTNER/JOEL
7. Magnetic stripe: Track 2
;1234567890123456=150410110000447?
CCN: 1234567890123456
Exp: 04/15
PIN: 000
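The two swipe strings above follow the standard magnetic-stripe layouts: Track 1 format B is `%B<PAN>^<NAME>^<YYMM><service code><discretionary>?` and Track 2 is `;<PAN>=<YYMM><service code><discretionary>?`. A minimal parsing sketch follows; the `TrackData` type and function names are hypothetical, and the discretionary field is left unparsed because its internal layout (where the security code sits) is issuer-defined.

```python
from dataclasses import dataclass


@dataclass
class TrackData:
    pan: str              # card number (CCN)
    expiry: str           # formatted as MM/YY
    service_code: str     # three digits following the expiry
    discretionary: str    # issuer-defined; not interpreted here
    cardholder: str = ""  # only present on Track 1


def _expiry(rest: str) -> str:
    # The stripe stores YYMM; display it as MM/YY.
    return f"{rest[2:4]}/{rest[0:2]}"


def parse_track1(raw: str) -> TrackData:
    """Parse Track 1 format B: %B<PAN>^<NAME>^<YYMM><svc><discretionary>?"""
    if not (raw.startswith("%B") and raw.endswith("?")):
        raise ValueError("not a Track 1 format B string")
    pan, name, rest = raw[2:-1].split("^")
    return TrackData(pan, _expiry(rest), rest[4:7], rest[7:], name.strip())


def parse_track2(raw: str) -> TrackData:
    """Parse Track 2: ;<PAN>=<YYMM><svc><discretionary>?"""
    if not (raw.startswith(";") and raw.endswith("?")):
        raise ValueError("not a Track 2 string")
    pan, rest = raw[1:-1].split("=")
    return TrackData(pan, _expiry(rest), rest[4:7], rest[7:])
```

Applied to the two example strings, both parsers return the same card number and the expiry 04/15; only Track 1 carries the cardholder name.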
8. Credit Card
____________________________
Transaction flow
12. Active/Active
What
• Resilient to datacenter-level failure
• Resilient to Internet routing problems
• Transparent to the merchant
• No human intervention
Why
• Every second of uptime matters to our merchants. The goal is five nines (99.999% availability).
• Datacenter-level maintenance becomes much easier and safer.
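To make the availability goal concrete, the downtime budget it implies can be computed directly. This is a small worked calculation, not a figure from the talk; the function name is hypothetical and a 365-day year is assumed.

```python
YEAR_SECONDS = 365 * 24 * 3600  # assumed 365-day year


def downtime_budget_seconds(availability: float, period_seconds: float) -> float:
    """Maximum downtime allowed in a period at the given availability."""
    return (1.0 - availability) * period_seconds


five_nines = downtime_budget_seconds(0.99999, YEAR_SECONDS)
print(f"{five_nines:.2f} seconds per year ({five_nines / 60:.2f} minutes)")
```

At 99.999% the budget is about 315 seconds, or roughly 5.3 minutes, per year, which leaves little room for a manual datacenter failover.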
13. Challenges
Inconsistent state between datacenters
Datacenters can’t tell if a transaction has already been processed elsewhere.
Limited idempotence
Payment networks can’t reliably guarantee idempotence on retries.
Real-time latency requirements
We can’t just wait until our datacenters get in sync.
14. Concepts
• Client idempotence key
• Server transaction
• Transaction progression
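The slides name these three concepts without defining them, so the following is only one plausible reading, sketched under stated assumptions: the client attaches an idempotence key to each request, the server maps that key to a single server transaction, and the transaction state only ever moves forward. All class names, state names, and the in-memory store are hypothetical.

```python
import uuid
from dataclasses import dataclass, field

# Hypothetical ordered states; a transaction may only move forward.
STATES = ["CREATED", "AUTHORIZED", "CAPTURED"]


@dataclass
class ServerTransaction:
    amount_cents: int
    transaction_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    state: str = "CREATED"

    def progress_to(self, new_state: str) -> None:
        # Ignore stale or repeated updates so progression is monotonic.
        if STATES.index(new_state) > STATES.index(self.state):
            self.state = new_state


class PaymentServer:
    def __init__(self) -> None:
        self._by_key = {}       # idempotence key -> ServerTransaction
        self.network_calls = 0  # counts calls to the simulated payment network

    def charge(self, idempotence_key: str, amount_cents: int) -> ServerTransaction:
        existing = self._by_key.get(idempotence_key)
        if existing is not None:
            return existing      # retry: reuse the original transaction
        txn = ServerTransaction(amount_cents)
        self._by_key[idempotence_key] = txn
        self.network_calls += 1  # stand-in for the real authorization call
        txn.progress_to("AUTHORIZED")
        return txn
```

A client that times out and retries with the same key gets the original transaction back, and the payment network is called only once.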
17. Card Processing
____________________________
Multi-DC resolution
18. Multi-Tender
____________________________
Multi-DC resolution
Scenario
When a merchant sells items to a customer, the customer has the option to pay with multiple tenders.
APIs
1. CreateBill
2. AddTender
3. CompleteBill / CancelBill
Challenges
1. Each tender request must be processed as soon as it arrives, so different tenders for the same bill may be processed at different datacenters.
2. When we receive the CompleteBill request, we may need to wait for tender information from a remote datacenter.
19. Multi-Tender
____________________________
Multi-DC resolution
20. Multi-Tender
____________________________
Multi-DC resolution
State Machine
Tender state machine
Bill state machine
Correctness
1. A formal proof
2. Simulate all the possible operational combinations and verify the results
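The second correctness approach, simulating every operational combination, can be sketched as an exhaustive check over delivery orders. The tender states and the rank-based merge rule below are hypothetical, since the slides do not show the actual state machines; the point is only that a brute-force check can confirm that every ordering of the same updates converges to one final state.

```python
from itertools import permutations

# Hypothetical tender states; a higher rank always wins a merge.
RANK = {"PENDING": 0, "AUTHORIZED": 1, "CAPTURED": 2}


def merge(current: str, incoming: str) -> str:
    """Combine local state with an update replicated from another datacenter."""
    return incoming if RANK[incoming] > RANK[current] else current


def last_writer_wins(current: str, incoming: str) -> str:
    """A naive rule, included to show the check catching a bug."""
    return incoming


def final_state(events, merge_fn) -> str:
    state = "PENDING"
    for event in events:
        state = merge_fn(state, event)
    return state


def converges(events, merge_fn) -> bool:
    """True if every delivery order of `events` ends in the same state."""
    outcomes = {final_state(order, merge_fn) for order in permutations(events)}
    return len(outcomes) == 1


print(converges(["AUTHORIZED", "CAPTURED"], merge))             # True
print(converges(["AUTHORIZED", "CAPTURED"], last_writer_wins))  # False
```

The rank-based merge is order-independent, while the naive rule ends in a different state depending on which update arrives last.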
21. Caveats
Eventually consistent
Asynchronous, eventually consistent systems are harder to reason about.
Complex
Active/active systems are harder to design, implement, and test.
Data Loss
If the original datacenter goes down and never comes back, we may not be able to perform the capture because the original auth is lost.
Downstream effects
Not all downstream effects are reversible.
22. Is this the ideal solution?
24. Database Sharding
____________________________
• Shard Key
  • User Id / Merchant Id / Transaction Id / …
• Shard function
  • Hash / logical->physical mapping / dynamic / …
[Diagram: backend database (MySQL) partitioned into Shard 1, Shard 2, ..., Shard X; each shard has a primary that replicates to a secondary.]
26. Shard Key
27. Select shard key per component
• Pros:
  • Flexible for each component
• Cons:
  • Data migration is hard, e.g. archiving or migrating all data for a given user.
  • An issue on a single DB instance may impact the whole system.
[Diagram: the Subscription/membership, Billing, and PI components each have their own set of shards (Shard 1, Shard 2, ..., Shard X).]
28. Use a master key for all components and organize the system by “scale-out unit”
• Pros:
  • Isolate the impact of a single shard
  • Minimize cross-shard accesses
  • Optimize for deployment roll-out
  • Dependency control
  • Capacity planning
• Cons:
  • Load balancing
[Diagram: Shard 1, Shard 2, ..., Shard X as scale-out units, each with its own primary replicating to a secondary.]
30. Shard Function
• Hash
• Logical Shard / Physical Shard mapping
• Dynamic Sharding
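The first two options can be combined: hash the shard key to a fixed logical shard, then map logical shards to physical databases through a table that can change over time. The sketch below assumes 4096 logical shards and two made-up physical shard names; none of these values come from the talk.

```python
import hashlib

NUM_LOGICAL_SHARDS = 4096  # assumed; must stay fixed once data is written

# Assumed mapping: contiguous logical ranges -> physical shard names.
PHYSICAL_RANGES = [
    (range(0, 2048), "mysql-shard-1"),
    (range(2048, 4096), "mysql-shard-2"),
]


def logical_shard(shard_key: str) -> int:
    """Stable hash of the shard key (Python's built-in hash() is salted per process)."""
    digest = hashlib.sha256(shard_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_LOGICAL_SHARDS


def physical_shard(shard_key: str, ranges=PHYSICAL_RANGES) -> str:
    """Resolve a key to a physical shard through the logical->physical table."""
    logical = logical_shard(shard_key)
    for logical_range, name in ranges:
        if logical in logical_range:
            return name
    raise LookupError(f"logical shard {logical} is unmapped")
```

Because keys hash to logical shards rather than directly to databases, moving a logical range to a new physical shard only changes the mapping table, not the hash function.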
31. ID Generation
• ID range per shard
• Encode logical partition inside ID
• Benefits:
  • Adjust the shard function without impacting existing IDs
  • Route new traffic to new partitions
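One way to encode the logical partition inside the ID is to reserve the high bits for the partition number and the low bits for a per-partition sequence. The bit widths and names below are assumptions chosen so the result fits in a signed 64-bit integer; the talk does not specify a layout.

```python
import itertools

SHARD_BITS = 12     # assumed: up to 4096 logical partitions
SEQUENCE_BITS = 51  # assumed: 12 + 51 = 63 bits, fits a signed 64-bit column


class IdGenerator:
    """Issues IDs for one logical partition (single-process sketch)."""

    def __init__(self, logical_shard: int) -> None:
        if not 0 <= logical_shard < (1 << SHARD_BITS):
            raise ValueError("logical shard out of range")
        self._shard = logical_shard
        self._counter = itertools.count(1)

    def next_id(self) -> int:
        # High bits carry the partition, low bits carry the sequence number.
        return (self._shard << SEQUENCE_BITS) | next(self._counter)


def shard_of(entity_id: int) -> int:
    """Recover the logical partition from an ID without any lookup."""
    return entity_id >> SEQUENCE_BITS
```

Since every existing ID already names its partition, new IDs can be issued from new partitions without affecting how old IDs are routed.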
33. Re-Sharding
• Scenarios for Re-Sharding
• Active shard vs Archive shard
• Load balance
• Scale out
• Route new traffic to new partitions
• Split existing partitions
• Dynamic sharding
34. Split Existing Shards
Split one shard into two shards
Read/write from new shards
Data clean up
35. Dynamic Sharding
Re-sharding by lookup:
• Challenge: the scalability and availability of lookup layer
• Database replication
• lookup and fallback in case of replication latency
• Consistent hashing
• Using a key/value store
• Data migration
• Lease (if migration cannot be done in a single transaction)
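Of the options listed for the lookup layer, consistent hashing is the one that needs no stored table. The ring below is a generic textbook sketch, not the implementation described in the talk; the node names and the number of virtual nodes are assumptions.

```python
import bisect
import hashlib


def _point(value: str) -> int:
    """Place a string on the ring using a stable 64-bit hash."""
    return int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes, vnodes: int = 64) -> None:
        # Each node appears at `vnodes` points to even out the distribution.
        self._ring = sorted(
            (_point(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, key: str) -> str:
        # First ring point clockwise from the key's hash, wrapping at the end.
        index = bisect.bisect_right(self._points, _point(key)) % len(self._ring)
        return self._ring[index][1]
```

When a node is added, the only keys that change owner are the ones that move to the new node, which keeps the amount of data to migrate bounded.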
36. Dynamic Sharding
Re-sharding by buckets:
• Group multiple records into a small bucket and migrate them together.
• For example:
• If we group 1k records into a bucket, the bucket-to-shard map for 10 million records needs:
10,000,000 / 1,000 * (8 bytes (shard key) + 4 bytes (shard ID)) = 120,000 bytes ≈ 117.2 KB
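The bucket arithmetic can be checked with a few lines. This is only a worked calculation with a hypothetical function name; the 8-byte key and 4-byte shard ID sizes come from the example above.

```python
def lookup_table_bytes(records: int, records_per_bucket: int,
                       key_bytes: int = 8, shard_id_bytes: int = 4) -> int:
    """Size of a bucket -> shard lookup table, one entry per bucket."""
    buckets = -(-records // records_per_bucket)  # ceiling division
    return buckets * (key_bytes + shard_id_bytes)


bucketed = lookup_table_bytes(10_000_000, 1_000)
per_record = lookup_table_bytes(10_000_000, 1)
print(bucketed, bucketed / 1024)         # 120000 bytes, about 117.2 KiB
print(per_record, per_record / 1024**2)  # 120000000 bytes, about 114.4 MiB
```

Expression as written (10,000,000 / 1,000 buckets at 12 bytes each) evaluates to 120,000 bytes, about 117.2 KiB, so bucketing by 1k shrinks a per-record table of roughly 114 MiB by a factor of 1,000.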
37. Tips for logical shards
• To route new traffic to new shards: reserve a large range of unused logical shards.
• For shard splitting: assign a large range of logical shards to each physical shard.
• To support re-sharding by buckets: keep the number of records per logical shard small enough.
39. Data Migration
• The typical four steps for data migration (we can roll forward or roll back at each step):
  1. Source (read/write) | Target
  2. Source (read/write) | Target (write)
     <---migration--->
  3. Source (write) | Target (read/write)
  4. Source | Target (read/write)
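The four steps can be modeled as a phase switch that decides where reads and writes go. The sketch below uses in-memory dictionaries in place of the real source and target stores, and every name in it is hypothetical; it is meant to show why each step can be rolled back, not to describe the production tooling.

```python
from enum import Enum


class Phase(Enum):
    SOURCE_ONLY = 1     # Source (read/write) | Target
    DUAL_WRITE = 2      # Source (read/write) | Target (write)
    TARGET_PRIMARY = 3  # Source (write)      | Target (read/write)
    TARGET_ONLY = 4     # Source              | Target (read/write)


class MigratingStore:
    def __init__(self) -> None:
        self.source, self.target = {}, {}
        self.phase = Phase.SOURCE_ONLY

    def write(self, key, value) -> None:
        # Phases 1-3 keep the source current, so rolling back loses nothing.
        if self.phase is not Phase.TARGET_ONLY:
            self.source[key] = value
        if self.phase is not Phase.SOURCE_ONLY:
            self.target[key] = value

    def read(self, key):
        reads_from_source = self.phase in (Phase.SOURCE_ONLY, Phase.DUAL_WRITE)
        return (self.source if reads_from_source else self.target).get(key)

    def backfill(self) -> None:
        """Copy rows written before dual writes began (the migration arrow)."""
        for key, value in self.source.items():
            self.target.setdefault(key, value)  # never clobber newer dual writes
```

Moving between phases is a configuration change, and until the last phase both stores receive every write, which is what makes roll-forward and roll-back safe at each step.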
40. Tests for Data Migration
• Sanity check of Get-API results
• Compare the Write-API behaviors
• Check the batch jobs
• Dry-run with the real production data
• Tip: be careful with SQL foreign keys
41. No-SQL
42. NoSQL
• Motivation
  • Schema change
  • Storage-level sharding
• Options
  • Using MySQL as a key/value store
  • Riak/Cassandra
  • Others
• Limitations:
  • Transactions
  • Dealing with conflicts
  • Consistency for secondary indexes
44. Ideal Solution
• What do we need?
  • Scalability and availability
  • Consistency across datacenters
  • Secondary indexes
  • Transactions
• What can we compromise on?
  • Slightly higher latency
45. NewSQL
• Google Spanner
• https://research.google.com/archive/spanner.html
• FoundationDB
• https://foundationdb.com/
• CockroachDB
• https://github.com/cockroachdb/cockroach
46. Security
47. Security Flow
____________________________
• Data collection
• Encrypt the data right after collecting it
• Use an iframe for most web integrations
• Audit data access permissions
• Data persistence
• Key rolling
• Token rolling
48. What we didn’t cover?
49. What we didn’t cover
• Risk
• Reconciliation
• Finance reporting
• Inventory management
• Fulfillment
• Virtual Currency
• Cross currency transaction
• More business logic:
• Subscription / Recurring billing
• Bundle offer
• Reservation
50. Q?