
Improving Mobile Payments With Real time Spark


A talk about a real-world Spark Streaming implementation for improving the mobile payments experience. Presented at the Target data meetup in Bangalore by Madhukara Phatak on 22/08/2015.


  1. 1. Improving Mobile Payments with Real time Spark
  2. 2. ● Madhukara Phatak ● Big data consultant and trainer at datamantra.io ● Consults on Hadoop, Spark and Scala ● www.madhukaraphatak.com
  3. 3. Agenda ● Mobile as a driver for big data ● Our customer's solution ● Existing data solution ● Improved solution ● Technical details ● Future enhancements ● Q & A
  4. 4. Mobile as a big data driver ● Mobile has changed the way we interact with the world ● Most buying and selling happens on mobile today ○ Myntra went fully mobile ○ Flipkart and Amazon say 50% of their purchases happen on mobile ○ Quikr and OLX are mobile-based selling platforms ○ Ola etc.
  5. 5. Challenges in Mobile ● Customers expect the service to be available 24/7 ● Tiny screens make typical software flows very challenging ● Flaky mobile network connectivity makes it tougher ● Users are constantly on the move, which leads to dropped interactions ● No room for downtime ● Everything has to be done in real time
  6. 6. Mobile payments ● Almost every app mentioned earlier needs some kind of payment ● Getting payments right on mobile is very hard ● Globally, 21% of online shoppers abandon their basket due to payment failures or delays ● Some companies are building SDKs to help app developers ● Our customer is one of them
  7. 7. Why are mobile payments hard?
  8. 8. Too many inputs
  9. 9. Terrible interface by Banks
  10. 10. OTP vs Password
  11. 11. Our customer solution
  12. 12. Our customer solution ● A mobile SDK for applications that simplifies payments ● The SDK provides a better user interface, such as big buttons to generate an OTP or trigger other flows ● The SDK also helps fill in the different kinds of forms presented by different banks through a consistent UI ● Better user experience across applications ● Applications send anonymous payment details to our customer's servers
  13. 13. Some numbers ● 40+ customers ● Over 1 million transactions per month as of March ● Around 55% success rate (5% above average) ● Supports major banks, payment gateways and wallet providers ● Will soon be available beyond the mobile payment space
  14. 14. Why does data matter? ● As the number of transactions increases, things will go wrong ● There are many different combinations that can go wrong ● Example ○ Airtel OTP failing with State Bank netbanking ○ Customers stuck on the password page ○ Not able to read OTP from some specific ● Understanding customer pain and reacting to it is paramount ● Every bit of help results in a payment
  15. 15. Initial BI solution (diagram labels: Events, Hourly Push, JSON Data, S3FS, Session Wise Aggregations)
  16. 16. Initial BI solution ● The phone SDK pushes events such as transaction initiation and payment completion to logging servers ● Logging servers roll the log every hour and push it to S3 ● A single-node Spark machine aggregates data by session and pushes it to MySQL ● Google BigQuery is used for ad hoc querying
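
The session-wise aggregation step described on this slide can be illustrated with a small plain-Python sketch. This is not the team's code: the event fields (`session_id`, `event`, `ts`) and the summary columns are assumptions made up for illustration, and the real job ran on Spark rather than in a single Python process.

```python
import json
from collections import defaultdict

# Hypothetical event shape; the field names are assumptions for illustration.
RAW_LOG_LINES = [
    '{"session_id": "s1", "event": "transaction_initiated", "ts": 100}',
    '{"session_id": "s1", "event": "otp_requested", "ts": 130}',
    '{"session_id": "s1", "event": "payment_complete", "ts": 190}',
    '{"session_id": "s2", "event": "transaction_initiated", "ts": 150}',
]


def aggregate_by_session(lines):
    """Group JSON events by session and compute one summary row per session."""
    events_by_session = defaultdict(list)
    for line in lines:
        event = json.loads(line)
        events_by_session[event["session_id"]].append(event)

    summaries = {}
    for session_id, events in events_by_session.items():
        timestamps = [e["ts"] for e in events]
        summaries[session_id] = {
            "event_count": len(events),
            "duration_secs": max(timestamps) - min(timestamps),
            "completed": any(e["event"] == "payment_complete" for e in events),
        }
    return summaries


session_summaries = aggregate_by_session(RAW_LOG_LINES)
for session_id, summary in sorted(session_summaries.items()):
    print(session_id, summary)
```

In a Spark job the same grouping would typically be expressed as a keyed aggregation over the parsed events, with the summary rows written to the downstream store.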
  17. 17. Challenges with BI solution ● Batch processing only ● Geared more towards report generation ● Very minimal use of Spark APIs, as the team was not well aware of their potential ● No integration with the mobile SDK for a feedback loop
  18. 18. Requirements for consulting ● Bring the same reporting calculations to real time ● Understand user behaviour and track each user's flow over a session ● Close the loop by providing automatic alerts based on the metric calculations ● Support some new, specific business cases such as loyalty management ● Improve the team's expertise in Spark
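
As a rough illustration of the "automatic alerts based on the metric calculations" requirement, the sketch below computes a success rate per bank and network combination and flags the combinations that fall below a threshold. The data, the thresholds and the function name `find_alerts` are all hypothetical; the talk does not describe the actual alerting rules.

```python
from collections import Counter

# Hypothetical per-session outcomes: (bank, network, succeeded).
SESSION_OUTCOMES = [
    ("bank_a", "network_x", True),
    ("bank_a", "network_x", False),
    ("bank_a", "network_x", False),
    ("bank_a", "network_x", False),
    ("bank_b", "network_x", True),
    ("bank_b", "network_x", True),
    ("bank_b", "network_x", False),
]


def find_alerts(outcomes, min_success_rate=0.5, min_sessions=3):
    """Return (combination, success_rate) pairs that fall below the threshold."""
    totals = Counter()
    successes = Counter()
    for bank, network, succeeded in outcomes:
        key = (bank, network)
        totals[key] += 1
        if succeeded:
            successes[key] += 1

    alerts = []
    for key in sorted(totals):
        rate = successes[key] / totals[key]
        # Ignore combinations with too few sessions to be meaningful.
        if totals[key] >= min_sessions and rate < min_success_rate:
            alerts.append((key, rate))
    return alerts


alerts = find_alerts(SESSION_OUTCOMES)
for (bank, network), rate in alerts:
    print("ALERT: %s over %s has success rate %.2f" % (bank, network, rate))
```

The minimum session count is there so that a single failed payment on a rarely used combination does not trigger an alert.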
  19. 19. Choosing Spark streaming ● The company was already invested in Spark, so Spark Streaming was a no-brainer ● Porting Spark batch code to streaming was mostly straightforward, as both share the same API ● The company used Python as its Spark API language, which streaming also supports ● So we did not consider Storm and went ahead with Spark Streaming
  20. 20. First version (diagram labels: Events, Five Minute Push, JSON Data, FileStream, Session Wise Aggregations)
  21. 21. First version ● We used the fileStream API of Spark Streaming, which allowed us to poll an S3 bucket every few minutes ● A new rolling appender was added to the log servers to push logs to S3 every 5 minutes ● The exact same batch code was used for the calculations, which made the transition very easy ● All downstream applications remained the same
  22. 22. Second version (diagram labels: Events, JSON Data, Session Wise Aggregations, Hourly Push, Realtime)
  23. 23. Amazon Kinesis ● A Kafka-like distributed message queue by Amazon ● Serves as a managed, Kafka-like source on AWS ● Highly scalable with low latency ● Persistence with fault tolerance across multiple availability zones ● Great integration with Spark
  24. 24. Second version ● Amazon Kinesis was added as the real-time stream source ● Logging servers push logs to Kinesis as they arrive ● The streaming application pulls data from Kinesis every few minutes ● Support for multiple partitions was added for parallel streams
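
For background on how records spread across partitions: Kinesis maps each record to a shard by taking the MD5 hash of its partition key and matching the resulting 128-bit integer against the shards' hash key ranges. The sketch below mimics that mapping under two assumptions that are not from the talk: the shards split the hash space evenly, and the session id is used as the partition key (which would keep all events of one session on the same shard).

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 digests are 128 bits wide


def shard_index(partition_key, shard_count):
    """Map a partition key to a shard, assuming evenly split hash key ranges."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_value = int.from_bytes(digest, "big")
    return hash_value * shard_count // HASH_SPACE


SHARD_COUNT = 4  # assumption: the talk does not state the number of shards
session_ids = ["session-%d" % i for i in range(8)]  # hypothetical session ids
assignments = {sid: shard_index(sid, SHARD_COUNT) for sid in session_ids}

for sid in session_ids:
    print(sid, "-> shard", assignments[sid])
```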
  25. 25. Challenges with Python ● The Spark Streaming API for Python was introduced in 1.2, whereas Spark Streaming for Scala/Java has been available since 0.8 ● No AWS Kinesis connector was available as of March ● The team had to write its own ● No support for Python in Spark Job Server
  26. 26. Challenges from batch to streaming ● Sessions typically last 1-10 minutes. In batch this is easy, since most sessions finish within one hour of data, but it is challenging for real-time data ● Designing state for sessions ● Designing checkpointing and deciding on the interval ● Weird checkpointing issues with S3 due to eventual consistency
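
One possible way to design session state, sketched here in plain Python, is an update function of the shape that Spark Streaming's `updateStateByKey` expects: it receives the new values for a key in the current batch plus the previous state, and returning `None` drops the key. The state fields, the event tuples and the idle-batch expiry rule are assumptions for illustration, not the design used in the talk.

```python
MAX_IDLE_BATCHES = 2  # assumption: expire a session after this many empty batches


def update_session(new_events, state):
    """Merge one batch of (timestamp, event_name) pairs into the session state."""
    if not new_events:
        if state is None:
            return None
        if state["idle_batches"] + 1 > MAX_IDLE_BATCHES:
            return None  # returning None removes the session from the state
        return dict(state, idle_batches=state["idle_batches"] + 1)

    timestamps = [ts for ts, _ in new_events]
    if state is None:
        state = {
            "first_ts": min(timestamps),
            "last_ts": max(timestamps),
            "event_count": 0,
            "completed": False,
            "idle_batches": 0,
        }
    return {
        "first_ts": min(state["first_ts"], min(timestamps)),
        "last_ts": max(state["last_ts"], max(timestamps)),
        "event_count": state["event_count"] + len(new_events),
        "completed": state["completed"]
        or any(name == "payment_complete" for _, name in new_events),
        "idle_batches": 0,
    }


# Simulate one session whose events arrive across two streaming batches.
batch_1 = [(100, "transaction_initiated"), (130, "otp_requested")]
batch_2 = [(190, "payment_complete")]

state_after_two = update_session(batch_2, update_session(batch_1, None))
print(state_after_two)

# Three empty batches in a row expire the session.
final_state = state_after_two
for _ in range(3):
    final_state = update_session([], final_state)
print(final_state)
```

With Spark Streaming this kind of function would be passed to `updateStateByKey` on a stream keyed by session id, which also requires a checkpoint directory to be configured.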
  27. 27. Improvements to batch code ● Most of the code was written in the RDD paradigm, as it was the only one known to the team ● The team was trained on Spark SQL and Spark Streaming ● The majority of the code was ported to a Spark SQL based solution to improve readability and maintainability ● Recently moved to DataFrame based code
  28. 28. Third version (diagram labels: Events, JSON Data, Session Wise Aggregations, Hourly Push, Realtime)
  29. 29. Choosing Mesos ● Mesos is a great cluster manager for Spark-only workloads ● Has a specific coarse-grained mode that is well suited to real-time systems ● Minimal overhead compared to YARN ● Easy to set up on EC2
  30. 30. Fourth version (diagram labels: Events, JSON Data, Session Wise Aggregations, Hourly Push, Realtime)
  31. 31. Grafana ● Added Grafana for visualization and dashboards ● Grafana = Graphite + InfluxDB ● Moved away from MySQL to the time series database InfluxDB ● Scales much better compared to MySQL ● Data scientists or product managers can monitor customers using these dashboards ● Integrates with the mobile SDK
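
To make the move to a time series database more concrete, the sketch below formats one aggregated metric as an InfluxDB line protocol point (`measurement,tags fields timestamp`). It assumes an InfluxDB version that accepts the line protocol; the measurement name, tags and fields are hypothetical, and the escaping is simplified (backslashes are not handled).

```python
def _escape_key(value):
    """Escape commas, equals signs and spaces in tag keys/values and field keys."""
    return str(value).replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ")


def _escape_measurement(value):
    """Escape commas and spaces in the measurement name."""
    return str(value).replace(",", "\\,").replace(" ", "\\ ")


def _format_field(value):
    # bool must be checked before int because bool is a subclass of int.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return "%di" % value  # integer fields carry an "i" suffix
    if isinstance(value, float):
        return repr(value)
    return '"%s"' % str(value).replace('"', '\\"')


def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Build one line protocol point with tags and fields sorted by key."""
    tag_part = "".join(
        ",%s=%s" % (_escape_key(k), _escape_key(v)) for k, v in sorted(tags.items())
    )
    field_part = ",".join(
        "%s=%s" % (_escape_key(k), _format_field(v)) for k, v in sorted(fields.items())
    )
    return "%s%s %s %d" % (
        _escape_measurement(measurement), tag_part, field_part, timestamp_ns
    )


point = to_line_protocol(
    "payment_sessions",  # hypothetical measurement name
    {"bank": "bank_a", "network": "network_x"},
    {"success_rate": 0.25, "sessions": 4},
    1440201600000000000,  # nanoseconds since the epoch
)
print(point)
```

Keeping the dimensions (bank, network) as tags and the numbers as fields lets a dashboard group and filter by combination.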