
20170210 sapporotechbar7


Slides on PyData and Spark from my talk at Insight Technology's Sapporo TechBar on 2017/2/10.

Published in: Software

  1. PyData & Apache Spark 2017 / 2 / 10 Sapporo TechBar #7 @
  2. ▸ Facebook : Ryuji Tamagawa ▸ Twitter : tamagawa_ryuji ▸ FB techbar ▸ FB ▸ Twitter
  3. 5
  4. Python PyData Apache Spark Jupyter Notebook 2017 and the future pandas
  5. PyData
  6. 1 / 5 : PyData
  7. 1 / 5 : PyData PyData.org
  8. 1 / 5 : PyData PyData Anaconda Python Blaze 'NumPy and pandas interface to Big Data' dask Bokeh Canopy Python IPython matplotlib PyData nose numba JIT NumPy PyData SciPy PyData Statsmodels SymPy pandas NumPy SciPy scikit-image scikit-learn PyData
  9. pandas
  10. 2 / 5 : pandas pandas ▸ NumPy SciPy ▸ DataFrame
  11. 2 / 5 : pandas pandas Wes McKinney
  12. 2 / 5 : pandas DataFrame
  13. 2 / 5 : pandas
  14. 2 / 5 : pandas ▸ Python ▸ PyData pandas
  15. Jupyter Notebook
  16. 3 / 5 : Jupyter Notebook IPython Notebook ▸ Jupyter Notebook ▸ Julia Python R ▸ JupyterCon
  17. 3 / 5 : Jupyter Notebook
  18. 3 / 5 : Jupyter Notebook
  19. 3 / 5 : Jupyter Notebook pandas / matplotlib
  20. 3 / 5 : Jupyter Notebook Interactive Widget
  21. 3 / 5 : Jupyter Notebook ▸ Learning Jupyter
  22. Apache Spark
  23. 4 / 5 : Apache Spark Hadoop ▸ MapReduce Spark ▸ 2010 Hadoop = MapReduce + HDFS ▸ Hadoop OS HDFS Hive etc. HBase MapReduce YARN Impala etc. in-memory SQL engine Spark (Spark Streaming, MLlib, GraphX, Spark SQL) Hadoop HDFS S3 YARN Mesos
  24. 4 / 5 : Apache Spark Apache Spark PyData pandas Apache Spark pandas JVM Python × dask I/O Scala Java Python R JVM Python
  25. 4 / 5 : Apache Spark Spark ▸ 1 PC Hadoop / MapReduce
  26. 4 / 5 : Apache Spark DataFrame
  27. 4 / 5 : Apache Spark ▸ SSD ▸ Spark Parquet ▸ Performance comparison of different file formats and storage engines in the Hadoop ecosystem ▸ Parquet Python
  28. 4 / 5 : Apache Spark Apache Spark ▸ Parquet
  29. Machine Learning
  30. Machine Learning ▸ scikit-learn ▸ Spark MLlib / ML ▸ TensorFlow ▸ Python
  31. 2017 and the future
  32. 5 / 5 : 2017 and the future PyData ▸ Spark - pandas ▸ pandas → Spark …
  33. 5 / 5 : 2017 and the future Wes blog ▸ pandas Apache Arrow ▸ Blog ▸ PyData Blog Wes OK ▸ 2017 : pandas, Arrow, Feather, Parquet, Spark, Ibis http://qiita.com/tamagawa-ryuji/items/deb3f63ed4c7c8065e81
  34. 5 / 5 : 2017 and the future High speed Apache Parquet for Python ▸ Parquet ▸ Spark ▸ Python ▸ fastparquet ▸ pyarrow
  35. 5 / 5 : 2017 and the future : Apache Arrow ▸ Apache Arrow ▸ PyData / OSS
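
Slide 23 describes Hadoop as MapReduce plus HDFS, the batch model that preceded Spark. As a rough illustration only, the following pure-Python sketch mimics the map, shuffle, and reduce phases on a word count. It is not Hadoop or Spark code; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample input are hypothetical and do not come from the slides.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: collapse each key's list of values into a single sum."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines):
    """Run the three phases in sequence on an iterable of text lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return reduce_phase(shuffle_phase(mapped))


# Hypothetical sample input, chosen only for illustration.
counts = word_count(["Spark pandas Spark", "pandas Parquet"])
```

In a real cluster the map and reduce steps run on many machines and the shuffle moves data between them; here everything runs in one process, so only the data flow is shown.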
