1. Presto at FINRA – Supporting market surveillance at scale
John Hitchingham
FINRA Engineering
John.Hitchingham@finra.org
2. Market Regulation surveillance workflow
BDs, Exchanges, Reference Data Providers
100B+ events | 25+ PB of Data | 3+ Yrs Prod | Major Exchange Clients
Market Manipulation, Insider Trading, Fraud, Abuse
3. Data volume
Incoming records
• 6000+ business objects
• 7+ million data partitions
• 160+ million data objects
• 25+ data publishers
• 5+ PB of data
14. Managed Data Lake (MDL) – Data Lake “in a box”
Just released as open source
Data lake implementation on AWS
Featuring Presto as query endpoint
https://finraos.github.io/herd-mdl/
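To make "Presto as query endpoint" concrete, here is a minimal sketch of a client talking to a Presto coordinator over its HTTP statement protocol, using only the Python standard library. The coordinator URL, user name, catalog and schema are assumptions for illustration; the slides do not say how an MDL deployment exposes or secures its endpoint, and a real deployment would likely need TLS and authentication on top of this.

```python
import json
import urllib.request


def build_statement_request(base_url, sql, user, catalog="hive", schema="default"):
    """Build the initial POST /v1/statement request for a Presto coordinator."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/statement",
        data=sql.encode("utf-8"),
        headers={
            "X-Presto-User": user,        # assumed user name, no authentication
            "X-Presto-Catalog": catalog,  # assumed default catalog
            "X-Presto-Schema": schema,    # assumed default schema
        },
        method="POST",
    )


def run_query(base_url, sql, user, catalog="hive", schema="default"):
    """Submit a query and follow nextUri links until every page is read."""
    request = build_statement_request(base_url, sql, user, catalog, schema)
    rows = []
    while request is not None:
        with urllib.request.urlopen(request) as response:
            page = json.load(response)
        if "error" in page:
            raise RuntimeError(page["error"].get("message", "query failed"))
        rows.extend(page.get("data", []))  # not every page carries rows
        next_uri = page.get("nextUri")     # absent once the query is finished
        request = (
            urllib.request.Request(next_uri, headers={"X-Presto-User": user})
            if next_uri
            else None
        )
    return rows
```

Usage would look like `run_query("http://localhost:8080", "SELECT 1", "analyst")`, where the host and port are placeholders for wherever the coordinator is reachable.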
17. Query tool use at FINRA
Tool | Status | Used for
Hive | Deprecated | ETL/ELT
Spark | General use | ETL/ELT (replace Hive), Data Science, Machine Learning
Presto | General use | Data Engineering, Data Profiling, BI, Reporting
HBase | Limited use | Custom Apps requiring rapid “indexed” lookups
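As an illustration of the data profiling use listed for Presto, the sketch below generates a single-pass profiling query: total rows, non-null count and approximate distinct count per column. The helper name `profiling_sql` and the table and column names are hypothetical; `count` and `approx_distinct` are standard Presto aggregate functions. Identifiers are not escaped, so this is for illustration only.

```python
def profiling_sql(table, columns):
    """Return a Presto query that profiles the given columns in one table scan."""
    parts = ["count(*) AS row_count"]
    for col in columns:
        # count(col) skips NULLs, so row_count minus this gives the NULL count
        parts.append(f'count("{col}") AS "{col}_non_null"')
        # approx_distinct trades exactness for speed on very large tables
        parts.append(f'approx_distinct("{col}") AS "{col}_distinct"')
    return "SELECT " + ", ".join(parts) + f" FROM {table}"


# Hypothetical table and columns, for illustration only
print(profiling_sql("hive.demo.trades", ["symbol", "price"]))
```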
18. Future exploration with Presto
o Cost-based optimizer (CBO)
o AuthN/AuthZ
• Hive metastore – column, row – Ranger?
• Federated database access (Postgres) – model to control authorization unique to principal
• Federated AuthN (SAML, OAuth)
o Athena?
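To illustrate what federated database access means here, the sketch below shows a single Presto query joining a Hive table with a Postgres table, plus a deliberately naive helper that lists which catalogs a query touches, since per-catalog access is what an authorization model would have to decide for each principal. The catalog, schema, table and column names are all hypothetical, and the regex helper is an assumption for illustration, not a real SQL parser or a Presto feature.

```python
import re

# Hypothetical federated query: "hive" and "postgresql" are assumed catalog names
FEDERATED_QUERY = """
SELECT o.symbol, r.issuer_name, count(*) AS order_count
FROM hive.market.orders AS o
JOIN postgresql.public.security_reference AS r
  ON o.symbol = r.symbol
GROUP BY o.symbol, r.issuer_name
"""


def catalogs_used(sql):
    """Naively list catalogs named in fully qualified FROM/JOIN references."""
    pattern = r"\b(?:FROM|JOIN)\s+(\w+)\.\w+\.\w+"
    return sorted(set(re.findall(pattern, sql, flags=re.IGNORECASE)))


print(catalogs_used(FEDERATED_QUERY))
```

An authorization check could compare this list against the catalogs a given principal is allowed to query before the statement is submitted.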