data science toolkit 101: set up Python, Spark, & Jupyter
1. IBM Cloud Data Services
Raj Singh, PhD
Developer Advocate: Geo | Open Data
rrsingh@us.ibm.com
http://ibm.biz/rajrsingh
twitter: @rajrsingh
What is Spark?
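• An in-memory alternative to Hadoop MapReduce
• Hadoop MapReduce is massively scalable but slow, because it writes intermediate results to disk
• “Up to 100x faster” than MapReduce (about 10x faster when data no longer fits in memory)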
• What is Hadoop?
  • HDFS: fault-tolerant storage on horizontally scalable commodity hardware
  • MapReduce: a programming style for distributed processing
• Spark presents data as an object (the RDD, or Resilient Distributed Dataset) independent of the underlying storage
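To make the MapReduce bullet concrete, here is a single-process sketch in plain Python of the classic word-count job, split into its map, shuffle, and reduce phases. It only illustrates the programming style; it does not use Hadoop's or Spark's APIs, and the function names are hypothetical.

```python
from collections import defaultdict


def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Shuffle: group emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine each key's values into a single result (here, a sum)."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines):
    """Run the three phases in order on a list of text lines."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    docs = ["spark is fast", "hadoop is scalable", "spark runs on hadoop"]
    print(sorted(word_count(docs).items()))
    # [('fast', 1), ('hadoop', 2), ('is', 2), ('on', 1), ('runs', 1), ('scalable', 1), ('spark', 2)]
```

In PySpark the same job is typically written as a chain of RDD methods (`flatMap`, `map`, `reduceByKey`), with Spark distributing the shuffle across the cluster.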
Python installation with Miniconda
1. Download the Python 2.7 version (Miniconda2) via https://www.continuum.io/downloads
2. Install Miniconda2 into /Users/<username>/miniconda2
3. bash$ conda install pandas jupyter matplotlib
4. bash$ which python
   should print /Users/<username>/miniconda2/bin/python
Further reading: https://dzone.com/refcardz/apache-spark
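A quick way to confirm that step 3 worked is to run a short script with the Miniconda interpreter. This is a minimal sketch (the script and its function names are hypothetical, not part of Miniconda): it reports which interpreter is running and whether pandas, matplotlib, and the Jupyter Notebook server can be imported. It checks the `notebook` module because that is what the `jupyter` conda package installs for the Notebook server, and it sticks to syntax that runs under both Python 2.7 and Python 3.

```python
import importlib
import sys

# Packages from step 3. The `jupyter` conda package provides the Notebook
# server as the importable `notebook` module, so that is what gets checked.
REQUIRED = ["pandas", "matplotlib", "notebook"]


def check_modules(names):
    """Return {module_name: bool} telling whether each module can be imported."""
    status = {}
    for name in names:
        try:
            importlib.import_module(name)
            status[name] = True
        except ImportError:
            status[name] = False
    return status


if __name__ == "__main__":
    # The interpreter actually running this script; it should sit under miniconda2.
    print("Interpreter: {0}".format(sys.executable))
    print("Version: {0}".format(sys.version.split()[0]))
    for name, ok in sorted(check_modules(REQUIRED).items()):
        print("{0:12s} {1}".format(name, "OK" if ok else "MISSING"))
```

Saved as, say, `check_env.py` (a hypothetical name), it runs with `python check_env.py`; any MISSING line means that package needs another `conda install`.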
Spark installation
• http://spark.apache.org/downloads.html
• Spark release: 1.6.2
• package type: Pre-built for Hadoop 2.6
• bash$ mkdir ~/dev
• bash$ cd ~/dev
• bash$ tar xzf ~/Downloads/spark-1.6.2-bin-hadoop2.6.tgz
• bash$ ln -s spark-1.6.2-bin-hadoop2.6 spark
• bash$ mkdir ~/dev/notebooks
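Jupyter only offers a pySpark kernel once a kernel.json file describing it exists. The sketch below builds one along the lines of a common recipe from the Spark 1.6 era: start a normal IPython kernel from the Miniconda Python, put Spark's `python` folder and its bundled py4j zip on `PYTHONPATH`, and let `PYTHONSTARTUP` run Spark's `shell.py`, which creates `sc`. Treat every path, the `local[*]` master, and the helper names as assumptions to adapt; this is not an official Spark or Jupyter installer.

```python
import glob
import json
import os

# Assumed locations matching the steps above (the `spark` symlink in ~/dev).
HOME = os.path.expanduser("~")
SPARK_HOME = os.path.join(HOME, "dev", "spark")
PYTHON = os.path.join(HOME, "miniconda2", "bin", "python")


def build_kernel_spec(spark_home, python, display_name):
    """Return a kernel spec dict that starts an IPython kernel with PySpark importable."""
    py_dir = os.path.join(spark_home, "python")
    # Spark bundles py4j as a zip; pick up whichever version this release ships.
    py4j_zips = sorted(glob.glob(os.path.join(py_dir, "lib", "py4j-*-src.zip")))
    return {
        "display_name": display_name,
        "language": "python",
        "argv": [python, "-m", "ipykernel", "-f", "{connection_file}"],
        "env": {
            "SPARK_HOME": spark_home,
            "PYTHONPATH": os.pathsep.join([py_dir] + py4j_zips),
            # The IPython kernel runs PYTHONSTARTUP at launch by default;
            # Spark's shell.py then creates the `sc` SparkContext.
            "PYTHONSTARTUP": os.path.join(py_dir, "pyspark", "shell.py"),
            "PYSPARK_SUBMIT_ARGS": "--master local[*] pyspark-shell",
        },
    }


def write_kernel_spec(kernels_dir, name, spec):
    """Write spec to <kernels_dir>/<name>/kernel.json and return the file path."""
    target = os.path.join(kernels_dir, name)
    if not os.path.isdir(target):
        os.makedirs(target)
    path = os.path.join(target, "kernel.json")
    with open(path, "w") as f:
        json.dump(spec, f, indent=2, sort_keys=True)
    return path


if __name__ == "__main__":
    # Print the spec only; call write_kernel_spec() with a kernels folder to install it.
    spec = build_kernel_spec(SPARK_HOME, PYTHON, "pySpark (Spark 1.6.2) Python 2")
    print(json.dumps(spec, indent=2, sort_keys=True))
```

Jupyter looks for user kernels in a `kernels` folder under the directory printed by `jupyter --data-dir`, so passing that folder to `write_kernel_spec` with a name such as `pyspark` (hypothetical) should make the kernel appear under New.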
PySpark test
• bash$ cd ~/dev
• bash$ jupyter notebook
• In the upper right of the Jupyter screen, click New and choose pySpark (Spark 1.6.2) Python 2 (or whatever display name you specified in your kernel.json file)
• In the notebook's first cell, enter sc.version and click the >| button to run it (or press Ctrl + Enter); it should report 1.6.2.
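Jupyter fills the New menu from the `display_name` field of each installed kernel.json, so if the pySpark entry is missing or named differently, that file is the place to look. `jupyter kernelspec list` prints the folder of every kernel Jupyter can see. The hypothetical helper below is a small sketch that reads the kernel.json files inside one `kernels` folder (the parent of those listed folders) and reports the names the menu should show.

```python
import glob
import json
import os
import sys


def kernel_display_names(kernels_dir):
    """Map each kernel folder under kernels_dir to the display_name in its kernel.json."""
    names = {}
    for spec_path in glob.glob(os.path.join(kernels_dir, "*", "kernel.json")):
        with open(spec_path) as f:
            spec = json.load(f)
        kernel_name = os.path.basename(os.path.dirname(spec_path))
        # Fall back to the folder name if the spec has no display_name.
        names[kernel_name] = spec.get("display_name", kernel_name)
    return names


if __name__ == "__main__":
    # Usage: python list_kernels.py DIR [DIR ...], where each DIR is a `kernels` folder.
    for kernels_dir in sys.argv[1:]:
        for name, display in sorted(kernel_display_names(kernels_dir).items()):
            print("{0}: {1}".format(name, display))
```

Folders without a kernel.json are skipped, which mirrors the fact that Jupyter only lists kernels that have one.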
Examples
• Pixiedust
• https://github.com/ibm-cds-labs/pixiedust
• Demographic analyses
• http://ibm-cds-labs.github.io/open-data/samples/
• or https://github.com/ibm-cds-labs/open-data/tree/master/samples
Raj Singh
Developer Advocate: Geo | Open Data
rrsingh@us.ibm.com
http://ibm.biz/rajrsingh
Twitter: @rajrsingh
LinkedIn: rajrsingh
Thanks