This document discusses using Python to access and process web data. It covers making HTTP requests with the Requests library, parsing content with Beautiful Soup and the json module, building a simple Flask web service, storing data with SQLite, and processing and visualizing data with pandas and seaborn. Example code is provided for making GET and POST requests and extracting data from HTML and JSON responses.
4. USING PYTHON TO ACCESS WEB DATA
▸ Web Requests
▸ Web Parser
▸ Web Services
5. USING PYTHON TO ACCESS WEB DATA
▸ Web Requests
Requests Library
import requests
requests.get('http://www.facebook.com').text
pip install requests #install library
6. USING PYTHON TO ACCESS WEB DATA
▸ Web Requests
Make a Request
#GET Request
import requests
r = requests.get('http://www.facebook.com')
if r.status_code == 200:
    print("Success")
Success
7. USING PYTHON TO ACCESS WEB DATA
▸ Web Requests
Make a Request
#POST Request
import requests
r = requests.post('http://httpbin.org/post', data = {'key':'value'})
if r.status_code == 200:
    print("Success")
Success
8. USING PYTHON TO ACCESS WEB DATA
▸ Web Requests
Make a Request
#Other Types of Request
import requests
r = requests.put('http://httpbin.org/put', data = {'key':'value'})
r = requests.delete('http://httpbin.org/delete')
r = requests.head('http://httpbin.org/get')
r = requests.options('http://httpbin.org/get')
9. USING PYTHON TO ACCESS WEB DATA
▸ Web Requests
Passing Parameters In URLs
#GET Request with parameter
import requests
r = requests.get('https://www.google.co.th/?hl=th')
if r.status_code == 200:
    print("Success")
Success
10. USING PYTHON TO ACCESS WEB DATA
▸ Web Requests
Passing Parameters In URLs
#GET Request with parameter
import requests
r = requests.get('https://www.google.co.th', params={"hl": "en"})
if r.status_code == 200:
    print("Success")
Success
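Requests URL-encodes the params mapping for you; a quick way to inspect the resulting URL without sending anything is to build a prepared request (a minimal sketch using the Requests API):

```python
import requests

# Build (but don't send) a GET request to see the encoded URL
req = requests.Request("GET", "https://www.google.co.th", params={"hl": "en"}).prepare()
print(req.url)  # the hl parameter is appended as a query string
```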
11. USING PYTHON TO ACCESS WEB DATA
▸ Web Requests
Passing Parameters In URLs
#POST Request with parameter
import requests
r = requests.post("https://m.facebook.com",data={"key":"value"})
if r.status_code == 200:
    print("Success")
Success
12. USING PYTHON TO ACCESS WEB DATA
▸ Web Requests
Response Content
#Text Response
import requests
#Text Response
import requests
data = {"email": "…..", "pass": "……"}
r = requests.post("https://m.facebook.com", data=data)
if r.status_code == 200:
    print(r.text)
'<?xml version="1.0" encoding="utf-8"?>\n<!DOCTYPE html PUBLIC "-//WAPFORUM//DTD XHTML
Mobile 1.0//EN" "http://www.wapforum.org/DTD/xhtml-mobile10.dtd"><html xmlns="http://
www.w3.org/1999/xhtml"><head><title>Facebook</title><meta name="referrer"
content="default" id="meta_referrer" /><style type="text/css">/*<!………………..
13. USING PYTHON TO ACCESS WEB DATA
▸ Web Requests
Response Content
#Response encoding
import requests
r = requests.get('https://www.google.co.th/')
r.encoding = 'tis-620'
if r.status_code == 200:
    print(r.text)
'<!doctype html><html itemscope="" itemtype="http://schema.org/WebPage"
lang="th"><head><meta content="text/html; charset=UTF-8" http-equiv="Content-Type"><meta
content="/logos/doodles/2016/king-bhumibol-adulyadej-1927-2016-5148101410029568.2-
hp.png" itemprop="image"><meta content="ปวงข้าพระพุทธเจ้า ขอน้อมเกล้าน้อมกระหม่อมรำลึกใน...
14. USING PYTHON TO ACCESS WEB DATA
▸ Web Requests
Response Content
#Binary Response
import requests
r = requests.get('https://www.google.co.th/logos/doodles/2016/king-bhumibol-adulyadej-1927-2016-5148101410029568.2-hp.png')
if r.status_code == 200:
    open("img.png", "wb").write(r.content)
15. USING PYTHON TO ACCESS WEB DATA
▸ Web Requests
Response Status Codes
#200 Response (OK)
import requests
r = requests.get('https://api.github.com/events')
if r.status_code == requests.codes.ok:
    data = r.json()
    print(data[0]['actor'])
{'url': 'https://api.github.com/users/ShaolinSarg', 'display_login': 'ShaolinSarg', 'avatar_url': 'https://
avatars.githubusercontent.com/u/6948796?', 'id': 6948796, 'login': 'ShaolinSarg', 'gravatar_id': ''}
16. USING PYTHON TO ACCESS WEB DATA
▸ Web Requests
Response Status Codes
#200 Response (OK)
import requests
r = requests.get('https://api.github.com/events')
print(r.status_code)
200
17. USING PYTHON TO ACCESS WEB DATA
▸ Web Requests
Response Status Codes
#404
import requests
r = requests.get('https://api.github.com/events/404')
print(r.status_code)
404
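requests also exposes named constants for the status codes, so comparisons don't need magic numbers; a small offline check:

```python
import requests

# Named status-code constants from requests.codes
print(requests.codes.ok)         # 200
print(requests.codes.not_found)  # 404
```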
18. USING PYTHON TO ACCESS WEB DATA
▸ Web Requests
Response Headers
#Response headers
import requests
r = requests.get('http://www.sanook.com')
print(r.headers)
print(r.headers['Date'])
{'Content-Type': 'text/html; charset=UTF-8', 'Date': 'Tue, 08 Nov 2016 14:38:41 GMT', 'Cache-
Control': 'private, max-age=0', 'Age': '16', 'Content-Encoding': 'gzip', 'Content-Length': '38089',
'Connection': 'keep-alive', 'Vary': 'Accept-Encoding', 'Accept-Ranges': 'bytes'}
Tue, 08 Nov 2016 14:38:41 GMT
19. USING PYTHON TO ACCESS WEB DATA
▸ Web Requests
Timeouts
#Timeout
import requests
r = requests.get('http://www.sanook.com', timeout=0.001)
ReadTimeout: HTTPConnectionPool(host='www.sanook.com', port=80): Read timed out. (read
timeout=0.001)
20. USING PYTHON TO ACCESS WEB DATA
▸ Web Requests
Authentication
#Basic Authentication
import requests
r = requests.get('https://api.github.com/user', auth=('user', 'pass'))
print(r.status_code)
200
21. USING PYTHON TO ACCESS WEB DATA
▸ Web Requests
read more : http://docs.python-requests.org/en/master/
22. USING PYTHON TO ACCESS WEB DATA
▸ Web Requests
Quiz#1 : Tag Monitoring
1. Get webpage : http://pantip.com/tags
2. Save to file every 5 minutes (time.sleep(300))
3. Use the current date and time as the filename
(How to get the current date and time in Python? Search for it on Google.)
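One possible sketch for Quiz#1. The `timestamped_filename` helper is a hypothetical name (not from the slides) built with the standard datetime module; the download loop is wrapped in a function so calling it is an explicit choice:

```python
import datetime

def timestamped_filename(ext="html"):
    # Use the current date and time as a filename, e.g. 2016-11-08_14-38-41.html
    now = datetime.datetime.now()
    return now.strftime("%Y-%m-%d_%H-%M-%S") + "." + ext

def monitor(url="http://pantip.com/tags", every=300):
    # Download the page every `every` seconds; never returns (call from a script)
    import time
    import requests
    while True:
        r = requests.get(url)
        if r.status_code == 200:
            with open(timestamped_filename(), "w", encoding="utf-8") as f:
                f.write(r.text)
        time.sleep(every)  # every 5 minutes by default
```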
23. USING PYTHON TO ACCESS WEB DATA
▸ Web Parser
HTML Parser : beautifulsoup
from bs4 import BeautifulSoup
soup = BeautifulSoup(open("file.html"), "html.parser") #parse from file
soup = BeautifulSoup("<html>data</html>", "html.parser") #parse from text
pip install beautifulsoup4 #install library
24. USING PYTHON TO ACCESS WEB DATA
▸ Web Parser
Parse a document
from bs4 import BeautifulSoup
soup = BeautifulSoup("<html>data</html>", "html.parser")
print(soup)
<html>data</html>
25. USING PYTHON TO ACCESS WEB DATA
▸ Web Parser
Parse a document
#Navigating using tag names
from bs4 import BeautifulSoup
html_doc = """<html><head><title>The Dormouse's story</title></head><body><p class="title"><b>The Dormouse's story</b></p></body>"""
soup = BeautifulSoup(html_doc,"html.parser")
soup.head
soup.title
soup.body.p
26. USING PYTHON TO ACCESS WEB DATA
▸ Web Parser
<head><title>The Dormouse's story</title></head>
<title>The Dormouse's story</title>
<p class="title"><b>The Dormouse's story</b></p>
27. USING PYTHON TO ACCESS WEB DATA
▸ Web Parser
Parse a document
#Access string
from bs4 import BeautifulSoup
html_doc = """<h1>hello</h1>"""
soup = BeautifulSoup(html_doc,"html.parser")
print(soup.h1.string)
hello
28. USING PYTHON TO ACCESS WEB DATA
▸ Web Parser
Parse a document
#Access attribute
from bs4 import BeautifulSoup
html_doc = '<a href="http://example.com/elsie">Elsie</a>'
soup = BeautifulSoup(html_doc, "html.parser")
print(soup.a['href'])
http://example.com/elsie
29. USING PYTHON TO ACCESS WEB DATA
▸ Web Parser
Parse a document
#Get all text in the page
from bs4 import BeautifulSoup
html_doc = """<html><head><title>The Dormouse's story</title></head><body><p class="title"><b>The Dormouse's story</b></p></body>"""
soup = BeautifulSoup(html_doc, "html.parser")
print(soup.get_text())
The Dormouse's storyThe Dormouse's story
30. USING PYTHON TO ACCESS WEB DATA
▸ Web Parser
Parse a document
# find_all()
from bs4 import BeautifulSoup
html_doc = """<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;"""
soup = BeautifulSoup(html_doc, "html.parser")
for a in soup.find_all('a'):
    print(a)
31. USING PYTHON TO ACCESS WEB DATA
▸ Web Parser
Parse a document
<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>
32. USING PYTHON TO ACCESS WEB DATA
▸ Web Parser
Parse a document
#find_all()
soup.find_all(id='link2')
soup.find_all(href=re.compile("elsie"))
soup.find_all(id=True)
soup.find_all(attrs={"data-foo": "value"})
soup.find_all("a", class_="sister")
soup.find_all("a", recursive=False)
soup.p.find_all("a", recursive=False)
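Putting find_all() together with attribute access — a short offline example on the same "sisters" snippet:

```python
from bs4 import BeautifulSoup

html_doc = ('<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>'
            '<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>')
soup = BeautifulSoup(html_doc, "html.parser")

# Collect every href on <a> tags with class "sister"
links = [a['href'] for a in soup.find_all("a", class_="sister")]
print(links)  # ['http://example.com/elsie', 'http://example.com/lacie']
```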
33. USING PYTHON TO ACCESS WEB DATA
▸ Web Parser
Parse a document
re.compile(…..)
<a href="http://192.x.x.x" class="c1">hello</a>
<a href="https://192.x.x.x" class="c1">hello</a>
<a href="https://www.com" class="c1">hello</a>
find_all(href=re.compile('(https|http)://[0-9.]'))
https://docs.python.org/2/howto/regex.html
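Beautiful Soup matches a compiled regex against each attribute value with search(); a runnable sketch of the pattern above (192.0.2.1 is a placeholder documentation IP standing in for the 192.x.x.x on the slide):

```python
import re
from bs4 import BeautifulSoup

html_doc = ('<a href="http://192.0.2.1" class="c1">hello</a>'
            '<a href="https://192.0.2.1" class="c1">hello</a>'
            '<a href="https://www.com" class="c1">hello</a>')
soup = BeautifulSoup(html_doc, "html.parser")

# Keep only hrefs where :// is followed by a digit or dot (IP-style hosts)
ip_links = soup.find_all(href=re.compile(r"(https|http)://[0-9.]"))
print(len(ip_links))  # 2
```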
34. USING PYTHON TO ACCESS WEB DATA
▸ Web Parser
Parse a document
read more : https://www.crummy.com/software/BeautifulSoup/
bs4/doc/
35. USING PYTHON TO ACCESS WEB DATA
▸ Web Parser
Quiz#2 : Tag Extraction
1. Get webpage : http://pantip.com/tags
2. Extract tag name, tag link, and number of topics in the
first 10 pages
3. Save to file in this format:
tag name, tag link, number of topics, current datetime
4. Run every 5 minutes
36. USING PYTHON TO ACCESS WEB DATA
▸ Web Parser
JSON Parser : json
import json
json_doc = json.loads('{"key": "value"}')
built-in function
37. USING PYTHON TO ACCESS WEB DATA
▸ Web Parser
JSON Parser : json
#JSON string
json_doc = """{"employees":[
{"firstName":"John", "lastName":"Doe"},
{"firstName":"Anna", "lastName":"Smith"},
{"firstName":"Peter", "lastName":"Jones"}
]} “””
38. USING PYTHON TO ACCESS WEB DATA
▸ Web Parser
JSON Parser : json
#Parse string to object
import json
json_obj = json.loads(json_doc)
print(json_obj)
{'employees': [{'firstName': 'John', 'lastName': 'Doe'}, {'firstName': 'Anna', 'lastName': 'Smith'},
{'firstName': 'Peter', 'lastName': 'Jones'}]}
39. USING PYTHON TO ACCESS WEB DATA
▸ Web Parser
JSON Parser : json
#Access json object
import json
json_obj = json.loads(json_doc)
print(json_obj['employees'][0]['firstName'])
print(json_obj['employees'][0]['lastName'])
John
Doe
40. USING PYTHON TO ACCESS WEB DATA
▸ Web Parser
JSON Parser : json
#Create json doc
import json
json_obj = {"firstName" : "name", "lastName" : "last"} #Dictionary
print(json.dumps(json_obj,indent=1))
{
"firstName": "name",
"lastName": “last"
}
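loads() and dumps() are inverses, so a document can be parsed, modified as ordinary dicts and lists, and serialized back:

```python
import json

doc = '{"employees": [{"firstName": "John", "lastName": "Doe"}]}'
obj = json.loads(doc)            # str -> dict
obj["employees"].append({"firstName": "Anna", "lastName": "Smith"})
out = json.dumps(obj, indent=1)  # dict -> str
print(out)
```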
41. USING PYTHON TO ACCESS WEB DATA
▸ Web Parser
Quiz#3 : Post Monitoring
1. Register as Facebook Developer on
developers.facebook.com
2. Get information on the last 10 hours of posts on the page
https://www.facebook.com/MorningNewsTV3
3. Save to file in this format:
post id, post datetime, number of likes, current datetime
42. USING PYTHON TO ACCESS WEB DATA
▸ Web Parser
Quiz#3 : Post Monitoring
URL
https://graph.facebook.com/v2.8/<PageID>?
fields=posts.limit(100)%7Blikes.limit(1).summary(true)
%2Ccreated_time%7D&access_token=
51. USING PYTHON TO ACCESS WEB DATA
▸ Web Parser
JSON
{"employees":[
{"firstName":"John", "lastName":"Doe"},
{"firstName":"Anna", "lastName":"Smith"},
{"firstName":"Peter", "lastName":"Jones"}
]}
The whole document is a dict; "employees" maps to a list of dicts, each made of key/value pairs.
read more : http://www.json.org/
52. USING PYTHON TO ACCESS WEB DATA
▸ Web Service
Create Simple Web Service
from flask_api import FlaskAPI
app = FlaskAPI(__name__)
@app.route('/example/')
def example():
    return {'hello': 'world'}
app.run(debug=False, port=5555)
pip install Flask-API #install library
53. USING PYTHON TO ACCESS WEB DATA
▸ Web Service
Create Simple Web Service
#receive input
from flask_api import FlaskAPI
app = FlaskAPI(__name__)
@app.route('/hello/<name>/<lastName>')
def example(name, lastName):
    return {'hello': name}
app.run(debug=False, port=5555)
55. USING PYTHON TO ACCESS WEB DATA
▸ Web Service
Quiz#4 : Top Tag Service
1. Build a getTopTagInfo web service.
2. Input : number of top topics
3. Output: tag name and number of topics, in JSON
format.
58. USING DATABASES WITH PYTHON
Zero configuration
– SQLite does not need to be installed, as there is no setup procedure to use it.
Serverless
– SQLite is not implemented as a separate server process. With SQLite, the process that wants to access the
database reads and writes directly from the database files on disk as there is no intermediary server process.
Stable Cross-Platform Database File
– The SQLite file format is cross-platform. A database file written on one machine can be copied to and used
on a different machine with a different architecture.
Single Database File
– An SQLite database is a single ordinary disk file that can be located anywhere in the directory hierarchy.
Compact
– When optimized for size, the whole SQLite library with everything enabled is less than 400KB in size.
59. USING DATABASES WITH PYTHON
SQLite
import sqlite3
conn = sqlite3.connect('my.db')
built-in library : sqlite3
60. USING DATABASES WITH PYTHON
SQLite
1. Connect to db
2. Get cursor
3. Execute command
4. Commit (insert / update/delete) / Fetch result (select)
5. Close database
Workflow
61. USING DATABASES WITH PYTHON
SQLite
import sqlite3
conn = sqlite3.connect('example.db') # connect db
c = conn.cursor() # get cursor
# execute1
c.execute('''CREATE TABLE stocks
(date text, trans text, symbol text, qty real, price real)''')
# execute2
c.execute("INSERT INTO stocks VALUES ('2006-01-05','BUY','RHAT',100,35.14)")
conn.commit() # commit
conn.close() # close
Workflow Example
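The five workflow steps above, run end-to-end against an in-memory database so the sketch is self-contained:

```python
import sqlite3

conn = sqlite3.connect(":memory:")           # 1. connect to db
c = conn.cursor()                            # 2. get cursor
c.execute('''CREATE TABLE stocks
             (date text, trans text, symbol text, qty real, price real)''')
c.execute("INSERT INTO stocks VALUES ('2006-01-05','BUY','RHAT',100,35.14)")
conn.commit()                                # 4. commit the insert
c.execute("SELECT symbol, qty FROM stocks")  # 3. execute a select
row = c.fetchone()                           # 4. fetch the result
print(row)  # ('RHAT', 100.0)
conn.close()                                 # 5. close database
```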
63. USING DATABASES WITH PYTHON
Database Storage
import sqlite3
conn = sqlite3.connect('example.db') #store on disk
conn = sqlite3.connect(':memory:') #store in memory
64. USING DATABASES WITH PYTHON
Execute
#execute
import sqlite3
conn = sqlite3.connect('example.db')
c = conn.cursor()
t = ('RHAT',)
c.execute('SELECT * FROM stocks WHERE symbol=?', t)
65. USING DATABASES WITH PYTHON
Execute
#executemany
import sqlite3
conn = sqlite3.connect('example.db')
c = conn.cursor()
purchases = [('2006-03-28', 'BUY', 'IBM', 1000, 45.00),
('2006-04-05', 'BUY', 'MSFT', 1000, 72.00),
('2006-04-06', 'SELL', 'IBM', 500, 53.00),]
c.executemany('INSERT INTO stocks VALUES (?,?,?,?,?)', purchases)
66. USING DATABASES WITH PYTHON
fetch
#fetchone
import sqlite3
conn = sqlite3.connect('example.db')
c = conn.cursor()
c.execute('SELECT * FROM stocks')
c.fetchone()
('2006-01-05', 'BUY', 'RHAT', 100.0, 35.14)
67. USING DATABASES WITH PYTHON
fetch
#fetchall
import sqlite3
conn = sqlite3.connect('example.db')
c = conn.cursor()
c.execute('SELECT * FROM stocks')
for d in c.fetchall():
    print(d)
('2006-01-05', 'BUY', 'RHAT', 100.0, 35.14)
('2006-03-28', 'BUY', 'IBM', 1000.0, 45.0)
('2006-04-05', 'BUY', 'MSFT', 1000.0, 72.0)
68. USING DATABASES WITH PYTHON
Context manager
import sqlite3
con = sqlite3.connect(":memory:")
con.execute("create table person (id integer primary key, firstname varchar unique)")
#con.commit() is called automatically afterwards
with con:
    con.execute("insert into person(firstname) values (?)", ("Joe",))
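The context manager also rolls the transaction back when the block raises, which the UNIQUE constraint above makes easy to demonstrate:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table person (id integer primary key, firstname varchar unique)")
with con:  # committed on success
    con.execute("insert into person(firstname) values (?)", ("Joe",))
try:
    with con:  # rolled back on failure
        con.execute("insert into person(firstname) values (?)", ("Joe",))
except sqlite3.IntegrityError:
    pass  # duplicate "Joe" violates UNIQUE; the second insert was not committed
count = con.execute("select count(*) from person").fetchone()[0]
print(count)  # 1
```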
69. USING DATABASES WITH PYTHON
Read more :
https://docs.python.org/2/library/sqlite3.html
https://www.tutorialspoint.com/python/python_database_access.htm
70. USING DATABASES WITH PYTHON
Quiz#5 : Post DB
1. Register as Facebook Developer on
developers.facebook.com
2. Get information on the last 10 hours of posts on the page
https://www.facebook.com/MorningNewsTV3
(post id, post datetime, number of likes, current datetime)
3. Design and create a table to store the posts
72. PROCESSING AND VISUALIZING DATA WITH PYTHON
▸ Processing : pandas
pip install pandas
high-performance, easy-to-use data structures and
data analysis tools
73. PROCESSING AND VISUALIZING DATA WITH PYTHON
Pandas : Series
#create series with Array-like
import pandas as pd
from numpy.random import rand
s = pd.Series(rand(5), index=['a', 'b', 'c', 'd', 'e'])
print(s)
a 0.690232
b 0.738294
c 0.153817
d 0.619822
e 0.434700
dtype: float64
74. PROCESSING AND VISUALIZING DATA WITH PYTHON
Pandas : Series
#create series with dictionary
import pandas as pd
d = {'a' : 0., 'b' : 1., 'c' : 2.}
s = pd.Series(d) #with dictionary
print(s)
a 0.0
b 1.0
c 2.0
dtype: float64
75. PROCESSING AND VISUALIZING DATA WITH PYTHON
Pandas : Series
#create series with scalar
import pandas as pd
s = pd.Series(5., index=['a', 'b', 'a', 'd', 'a']) #index can duplicate
print(s['a'])
a 5.0
a 5.0
a 5.0
dtype: float64
76. PROCESSING AND VISUALIZING DATA WITH PYTHON
Pandas : Series
#access series data
import pandas as pd
s = pd.Series(5., index=['a', 'b', 'a', 'd', 'a']) #index can duplicate
print(s[0])
print(s[:3])
5.0
a 5.0
b 5.0
a 5.0
dtype: float64
77. PROCESSING AND VISUALIZING DATA WITH PYTHON
Pandas : Series
#series operations
import pandas as pd
from numpy.random import rand
import numpy as np
s = pd.Series(rand(10))
s = s + 2
s = s * s
s = np.exp(s)
print(s)
0 187.735606
1 691.660752
2 60.129741
3 595.438606
4 769.479456
5 397.052123
6 4691.926483
7 1427.593520
8 180.001824
9 410.994395
dtype: float64
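The same element-wise arithmetic with fixed values instead of random ones, so the result is reproducible:

```python
import numpy as np
import pandas as pd

s = pd.Series([0.0, 1.0, 2.0])
s = s + 2        # element-wise add   -> [2, 3, 4]
s = s * s        # element-wise square -> [4, 9, 16]
print(list(s))   # [4.0, 9.0, 16.0]
t = np.exp(s)    # exp applied element-wise, as on the slide
```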
78. PROCESSING AND VISUALIZING DATA WITH PYTHON
Pandas : Series
#series filtering
import pandas as pd
from numpy.random import rand
import numpy as np
s = pd.Series(rand(10))
s = s[s > 0.1]
print(s)
1 0.708700
2 0.910090
3 0.380613
6 0.692324
7 0.508440
8 0.763977
9 0.470675
dtype: float64
79. PROCESSING AND VISUALIZING DATA WITH PYTHON
Pandas : Series
#series incomplete data
import pandas as pd
from numpy.random import rand
import numpy as np
s1 = pd.Series(rand(10))
s2 = pd.Series(rand(8))
s = s1 + s2
print(s)
0 0.813747
1 1.373839
2 1.569716
3 1.624887
4 1.515665
5 0.526779
6 1.544327
7 0.740962
8 NaN
9 NaN
dtype: float64
81. PROCESSING AND VISUALIZING DATA WITH PYTHON
Pandas : DataFrame
2-dimensional labeled data
structure with columns
of potentially different types
82. PROCESSING AND VISUALIZING DATA WITH PYTHON
Pandas : DataFrame
#create dataframe with dict
d = {'one' : pd.Series([1., 2., 3.], index=['a', 'b', 'c']),
'two' : pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
print(df)
one two
a 1.0 1.0
b 2.0 2.0
c 3.0 3.0
d NaN 4.0
83. PROCESSING AND VISUALIZING DATA WITH PYTHON
Pandas : DataFrame
#create dataframe with dict list
d = {'one' : [1., 2., 3., 4.], 'two' : [4., 3., 2., 1.]}
df = pd.DataFrame(d)
print(df)
one two
0 1 4
1 2 3
2 3 2
3 4 1
85. PROCESSING AND VISUALIZING DATA WITH PYTHON
Pandas : DataFrame
#access dataframe row
d = {'one' : [1., 2., 3., 4.], 'two' : [4., 3., 2., 1.]}
df = pd.DataFrame(d)
print(df.iloc[:3])
one two
0 1 4
1 2 3
2 3 2
86. PROCESSING AND VISUALIZING DATA WITH PYTHON
Pandas : DataFrame
#add new column
d = {'one' : [1., 2., 3., 4.], 'two' : [4., 3., 2., 1.]}
df = pd.DataFrame(d)
df['three'] = [1,2,3,2]
print(df)
one two three
0 1 4 1
1 2 3 2
2 3 2 3
3 4 1 2
87. PROCESSING AND VISUALIZING DATA WITH PYTHON
Pandas : DataFrame
#show data : head() and tail()
d = {'one' : [1., 2., 3., 4.], 'two' : [4., 3., 2., 1.]}
df = pd.DataFrame(d)
df['three'] = [1,2,3,2]
print(df.head())
print(df.tail())
one two three
0 1 4 1
1 2 3 2
2 3 2 3
3 4 1 2
88. PROCESSING AND VISUALIZING DATA WITH PYTHON
Pandas : DataFrame
#dataframe summary
d = {'one' : [1., 2., 3., 4.], 'two' : [4., 3., 2., 1.]}
df = pd.DataFrame(d)
print(df.describe())
89. PROCESSING AND VISUALIZING DATA WITH PYTHON
Pandas : DataFrame
#dataframe function
d = {'one' : [1., 2., 3., 4.], 'two' : [4., 3., 2., 1.]}
df = pd.DataFrame(d)
print(df.mean())
one 2.5
two 2.5
dtype: float64
90. PROCESSING AND VISUALIZING DATA WITH PYTHON
Pandas : DataFrame
#dataframe function
d = {'one' : [1., 2., 3., 4.], 'two' : [4., 3., 2., 1.]}
df = pd.DataFrame(d)
print(df.corr()) #calculate correlation
one two
one 1 -1
two -1 1
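Since column two decreases exactly as column one increases, their Pearson correlation is −1; a quick check:

```python
import pandas as pd

d = {'one': [1., 2., 3., 4.], 'two': [4., 3., 2., 1.]}
df = pd.DataFrame(d)
corr = df.corr()
print(corr.loc['one', 'two'])  # perfectly anti-correlated columns
```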
91. PROCESSING AND VISUALIZING DATA WITH PYTHON
Pandas : DataFrame
#dataframe filtering
d = {'one' : [1., 2., 3., 4.], 'two' : [4., 3., 2., 1.]}
df = pd.DataFrame(d)
print(df[(df['one'] > 1) & (df['one'] < 3)])
one two
1 2 3
92. PROCESSING AND VISUALIZING DATA WITH PYTHON
Pandas : DataFrame
#dataframe filtering with isin
d = {'one' : [1., 2., 3., 4.], 'two' : [4., 3., 2., 1.]}
df = pd.DataFrame(d)
print(df[df['one'].isin([2,4])])
one two
1 2 3
3 4 1
93. PROCESSING AND VISUALIZING DATA WITH PYTHON
Pandas : DataFrame
#dataframe with row data
d = [ [1., 2., 3., 4.], [4., 3., 2., 1.]]
df = pd.DataFrame(d)
df.columns = ["one","two","three","four"]
print(df)
one two three four
0 1 2 3 4
1 4 3 2 1
94. PROCESSING AND VISUALIZING DATA WITH PYTHON
Pandas : DataFrame
#dataframe sort values
d = [ [2., 1., 3., 4.], [1., 3., 2., 4.]]
df = pd.DataFrame(d)
df.columns = ["one","two","three","four"]
df = df.sort_values(["one","two"], ascending=[True, False])
print(df)
one two three four
1 1 3 2 4
0 2 1 3 4
95. PROCESSING AND VISUALIZING DATA WITH PYTHON
Pandas : DataFrame
#dataframe from csv file
df = pd.read_csv('file.csv')
print(df)
one two three
0 1 2 3
1 1 2 3
2 1 2 3
file.csv
one,two,three
1,2,3
1,2,3
1,2,3
96. PROCESSING AND VISUALIZING DATA WITH PYTHON
Pandas : DataFrame
#dataframe from csv file, without header
df = pd.read_csv('file.csv', header=None)
print(df)
0 1 2
0 1 2 3
1 1 2 3
2 1 2 3
file.csv
1,2,3
1,2,3
1,2,3
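read_csv() accepts any file-like object, so both examples can be reproduced without writing file.csv to disk:

```python
import io
import pandas as pd

csv_text = "one,two,three\n1,2,3\n1,2,3\n1,2,3\n"
df = pd.read_csv(io.StringIO(csv_text))                        # with header row
df2 = pd.read_csv(io.StringIO("1,2,3\n1,2,3\n"), header=None)  # without header
print(list(df.columns))   # ['one', 'two', 'three']
print(list(df2.columns))  # [0, 1, 2]
```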
98. PROCESSING AND VISUALIZING DATA WITH PYTHON
Pandas : DataFrame
#dataframe from html, need to install lxml first (pip install lxml)
df = pd.read_html('https://simple.wikipedia.org/wiki/List_of_U.S._states')
print(df[0])
Abbreviation State Name Capital Became a State
1 AL Alabama Montgomery December 14, 1819
2 AK Alaska Juneau January 3, 1959
3 AZ Arizona Phoenix February 14, 1912
99. PROCESSING AND VISUALIZING DATA WITH PYTHON
Quiz#6 : Data Exploration
1. Goto https://archive.ics.uci.edu/ml/datasets/Adult
to read the data description
2. Parse the data into pandas using read_csv() and set
the column names
3. Explore the data to answer the following questions:
- find the number of persons in each education level
- find the correlation and covariance between the
continuous fields
- find the average age of the United-States population
where income >50K
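A sketch of the exploration on a tiny synthetic stand-in (the column names follow the Adult dataset description; the rows here are made up, so only the technique carries over to the real data):

```python
import pandas as pd

# Synthetic rows standing in for the Adult dataset
df = pd.DataFrame({
    "education": ["Bachelors", "HS-grad", "Bachelors", "Masters"],
    "age": [39, 50, 38, 53],
    "native-country": ["United-States"] * 4,
    "income": [">50K", "<=50K", ">50K", ">50K"],
})
counts = df["education"].value_counts()  # persons per education level
mask = (df["income"] == ">50K") & (df["native-country"] == "United-States")
avg_age = df.loc[mask, "age"].mean()     # avg age, US rows with income >50K
print(counts["Bachelors"], avg_age)
```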
101. PROCESSING AND VISUALIZING DATA WITH PYTHON
▸ Visualizing : seaborn
pip install seaborn
visualization library based on matplotlib
102. PROCESSING AND VISUALIZING DATA WITH PYTHON
▸ Visualizing : seaborn
seaborn : set inline plot for jupyter
%matplotlib inline
import numpy as np
import seaborn as sns
# Generate some sequential data
x = np.array(list("ABCDEFGHI"))
y1 = np.arange(1, 10)
sns.barplot(x=x, y=y1)
104. PROCESSING AND VISUALIZING DATA WITH PYTHON
▸ Visualizing : seaborn
seaborn : set layout
%matplotlib inline
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
f,ax = plt.subplots(1,1,figsize=(10, 10))
sns.barplot(x=[1,2,3,4,5], y=[3,2,3,4,2], ax=ax)
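barplot() returns the matplotlib Axes it drew on, which is the same object the ax= keyword targets; with a non-interactive backend the example runs headless (outside Jupyter, so no %matplotlib inline):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; no window needed
import seaborn as sns

ax = sns.barplot(x=[1, 2, 3, 4, 5], y=[3, 2, 3, 4, 2])
print(len(ax.patches))  # one rectangle patch per bar
```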
106. PROCESSING AND VISUALIZING DATA WITH PYTHON
▸ Visualizing : seaborn
seaborn : set layout
%matplotlib inline
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
f,ax = plt.subplots(2,2,figsize=(10, 10))
sns.barplot(x=[1,2,3,4,5],y=[3,2,3,4,2],ax=ax[0,0])
sns.distplot([3,2,3,4,2],ax=ax[0,1])
112. PROCESSING AND VISUALIZING DATA WITH PYTHON
▸ Visualizing : seaborn
seaborn : plot types
http://seaborn.pydata.org/examples/index.html
113. PROCESSING AND VISUALIZING DATA WITH PYTHON
Quiz#7 : Adult Plot
1. Goto https://archive.ics.uci.edu/ml/datasets/Adult
to read the data description
2. Parse the data into pandas using read_csv() and set
the column names
3. Plot five charts.