O slideshow foi denunciado.
Utilizamos seu perfil e dados de atividades no LinkedIn para personalizar e exibir anúncios mais relevantes. Altere suas preferências de anúncios quando desejar.

Financial Modeling with Apache Spark: Calculating Value at Risk

1.330 visualizações

Publicada em

Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/1LAW6pi.

Sandy Ryza walks the audience through a basic VaR calculation with Spark. The calculation employs the widely used Monte Carlo method, which is useful for modeling portfolios with non-normal distributions of returns. The talk aims to give a feel for what it is like to approach financial modeling with modern big data tools. Filmed at qconnewyork.com.

Sandy Ryza is a data scientist at Cloudera focusing on Apache Spark and its ecosystem. He is an active contributor to the Spark project and coauthor of O’Reilly Media’s forthcoming Advanced Analytics on Spark, as well as an Apache Hadoop committer and PMC member.

Publicada em: Tecnologia
  • DOWNLOAD FULL BOOKS, INTO AVAILABLE Format, ......................................................................................................................... ......................................................................................................................... 1.DOWNLOAD FULL. PDF EBOOK here { https://tinyurl.com/y6a5rkg5 } ......................................................................................................................... 1.DOWNLOAD FULL. EPUB Ebook here { https://tinyurl.com/y6a5rkg5 } ......................................................................................................................... 1.DOWNLOAD FULL. doc Ebook here { https://tinyurl.com/y6a5rkg5 } ......................................................................................................................... 1.DOWNLOAD FULL. PDF EBOOK here { https://tinyurl.com/y6a5rkg5 } ......................................................................................................................... 1.DOWNLOAD FULL. EPUB Ebook here { https://tinyurl.com/y6a5rkg5 } ......................................................................................................................... 1.DOWNLOAD FULL. doc Ebook here { https://tinyurl.com/y6a5rkg5 } ......................................................................................................................... ......................................................................................................................... ......................................................................................................................... .............. Browse by Genre Available eBooks ......................................................................................................................... Art, Biography, Business, Chick Lit, Children's, Christian, Classics, Comics, Contemporary, Cookbooks, Crime, Ebooks, Fantasy, Fiction, Graphic Novels, Historical Fiction, History, Horror, Humor And Comedy, Manga, Memoir, Music, Mystery, Non Fiction, Paranormal, Philosophy, Poetry, Psychology, Religion, Romance, Science, Science Fiction, Self Help, Suspense, Spirituality, Sports, Thriller, Travel, Young Adult,
       Responder 
    Tem certeza que deseja  Sim  Não
    Insira sua mensagem aqui
  • Seja a primeira pessoa a gostar disto

Financial Modeling with Apache Spark: Calculating Value at Risk

  1. 1. 1© Cloudera, Inc. All rights reserved. Estimating Financial Risk with Spark Sandy Ryza | Senior Data Scientist
  2. 2. InfoQ.com: News & Community Site • 750,000 unique visitors/month • Published in 4 languages (English, Chinese, Japanese and Brazilian Portuguese) • Post content from our QCon conferences • News 15-20 / week • Articles 3-4 / week • Presentations (videos) 12-15 / week • Interviews 2-3 / week • Books 1 / month Watch the video with slide synchronization on InfoQ.com! http://www.infoq.com/presentations /spark-financial-modeling
  3. 3. Presented at QCon New York www.qconnewyork.com Purpose of QCon - to empower software development by facilitating the spread of knowledge and innovation Strategy - practitioner-driven conference designed for YOU: influencers of change and innovation in your teams - speakers and topics driving the evolution and innovation - connecting and catalyzing the influencers and innovators Highlights - attended by more than 12,000 delegates since 2007 - held in 9 cities worldwide
  4. 4. 2© Cloudera, Inc. All rights reserved.
  5. 5. 3© Cloudera, Inc. All rights reserved.
  6. 6. 4© Cloudera, Inc. All rights reserved. In reasonable circumstances, what’s the most you can expect to lose?
  7. 7. 5© Cloudera, Inc. All rights reserved.
  8. 8. 6© Cloudera, Inc. All rights reserved. def valueAtRisk( portfolio, timePeriod, pValue ): Double = { ... }
  9. 9. 7© Cloudera, Inc. All rights reserved. def valueAtRisk( portfolio, 2 weeks, 0.05 ) = $1,000,000
  10. 10. 8© Cloudera, Inc. All rights reserved. Probability density Portfolio return ($) over the time period
  11. 11. 9© Cloudera, Inc. All rights reserved. VaR estimation approaches • Variance-covariance • Historical • Monte Carlo
  12. 12. 10© Cloudera, Inc. All rights reserved.
  13. 13. 11© Cloudera, Inc. All rights reserved.
  14. 14. 12© Cloudera, Inc. All rights reserved. Market Risk Factors • Indexes (S&P 500, NASDAQ) • Prices of commodities • Currency exchange rates • Treasury bonds
  15. 15. 13© Cloudera, Inc. All rights reserved. Predicting Instrument Returns from Factor Returns • Train a linear model on the factors for each instrument
  16. 16. 14© Cloudera, Inc. All rights reserved. Fancier • Add features that are non-linear transformations of the market risk factors • Decision trees • For options, use Black-Scholes
  17. 17. 15© Cloudera, Inc. All rights reserved. import org.apache.commons.math3.stat.regression.OLSMultipleLinearRegression // Load the instruments and factors val factorReturns: Array[Array[Double]] = ... val instrumentReturns: RDD[Array[Double]] = ... // Fit a model to each instrument val models: Array[Array[Double]] = instrumentReturns.map { instrument => val regression = new OLSMultipleLinearRegression() regression.newSampleData(instrument, factorReturns) regression.estimateRegressionParameters() }.collect()
  18. 18. 16© Cloudera, Inc. All rights reserved. How to sample factor returns? • Need to be able to generate sample vectors where each component is a factor return. • Factors returns are usually correlated.
  19. 19. 17© Cloudera, Inc. All rights reserved. Distribution of US treasury bond two-week returns
  20. 20. 18© Cloudera, Inc. All rights reserved. Distribution of crude oil two-week returns
  21. 21. 19© Cloudera, Inc. All rights reserved. The Multivariate Normal Distribution • Probability distribution over vectors of length N • Given all the variables but one, that variable is distributed according to a univariate normal distribution • Models correlations between variables
  22. 22. 20© Cloudera, Inc. All rights reserved.
  23. 23. 21© Cloudera, Inc. All rights reserved. import org.apache.commons.math3.stat.correlation.Covariance // Compute means val factorMeans: Array[Double] = transpose(factorReturns) .map(factor => factor.sum / factor.size) // Compute covariances val factorCovs: Array[Array[Double]] = new Covariance(factorReturns) .getCovarianceMatrix().getData()
  24. 24. 22© Cloudera, Inc. All rights reserved. Fancier • Multivariate normal often a poor choice compared to more sophisticated options • Fatter tails: Multivariate T Distribution • Filtered historical simulation • ARMA • GARCH
  25. 25. 23© Cloudera, Inc. All rights reserved. Running the simulations • Create an RDD of seeds • Use each seed to generate a set of simulations • Aggregate results
  26. 26. 24© Cloudera, Inc. All rights reserved. // Broadcast the factor return -> instrument return models val bModels = sc.broadcast(models) // Generate a seed for each task val seeds = (baseSeed until baseSeed + parallelism) val seedRdd = sc.parallelize(seeds, parallelism) // Create an RDD of trials val trialReturns: RDD[Double] = seedRdd.flatMap { seed => trialReturns(seed, trialsPerTask, bModels.value, factorMeans, factorCovs) }
  27. 27. 25© Cloudera, Inc. All rights reserved. def trialReturn(factorDist: MultivariateNormalDistribution, models: Seq[Array[Double]]): Double = { val trialFactorReturns = factorDist.sample() var totalReturn = 0.0 for (model <- models) { // Add the returns from the instrument to the total trial return for (i <- until trialFactorsReturns.length) { totalReturn += trialFactorReturns(i) * model(i) } } totalReturn }
  28. 28. 26© Cloudera, Inc. All rights reserved. Executor Executor Time Model parameters Model parameters
  29. 29. 27© Cloudera, Inc. All rights reserved. Executor Task Task Trial Trial TrialTrial Task Task Trial Trial Trial Trial Executor Task Task Trial Trial TrialTrial Task Task Trial Trial Trial Trial Time Model parameters Model parameters
  30. 30. 28© Cloudera, Inc. All rights reserved. // Cache for reuse trialReturns.cache() val numTrialReturns = trialReturns.count().toInt // Compute value at risk val valueAtRisk = trials.takeOrdered(numTrialReturns / 20).last // Compute expected shortfall val expectedShortfall = trials.takeOrdered(numTrialReturns / 20).sum / (numTrialReturns / 20)
  31. 31. 29© Cloudera, Inc. All rights reserved.
  32. 32. 30© Cloudera, Inc. All rights reserved. VaR
  33. 33. 31© Cloudera, Inc. All rights reserved. So why Spark?
  34. 34. 32© Cloudera, Inc. All rights reserved. Ease of use • Parallel computing for 5-year olds • Scala, Python, and R REPLs
  35. 35. 33© Cloudera, Inc. All rights reserved. • Cleaning data • Fitting models • Running simulations • Storing results • Analyzing results Single platform for
  36. 36. 34© Cloudera, Inc. All rights reserved. But it’s CPU-bound and we’re using Java? • Computational bottlenecks are normally in matrix operations, which can be BLAS- ified • Can call out to GPUs just like in C++ • Memory access patterns aren’t high-GC inducing
  37. 37. 35© Cloudera, Inc. All rights reserved. Want to do this yourself?
  38. 38. 36© Cloudera, Inc. All rights reserved. spark-timeseries • https://github.com/cloudera/spark-timeseries • Everything here + some fancier stuff • Patches welcome!
  39. 39. 37© Cloudera, Inc. All rights reserved.
  40. 40. Watch the video with slide synchronization on InfoQ.com! http://www.infoq.com/presentations/ spark-financial-modeling

×