Saturday	Morning		
Keynote	
Wes	McKinney		
@wesmckinn	
PyCon	APAC	2016	(Seoul)
Me	
DataPad	
Apache	
Arrow	
Feather	
ibis
In	process:	
Python for Data Analysis: 2nd Edition
Coming	2017	
	
(in English)
Q:	What	brings	
you	here?
Our	shared	
values
Pride in software
craftsmanship
My	story	
•  Accidental software developer
•  2007: My first job (financial research analyst)
•  I started writing Python libraries to do my own work better
•  Soon I was helping my colleagues work better, too
Tools
Tools
Empathy	
the	feeling	that	you	understand	
and	share	another	person's	
experiences and emotions: the
ability	to	share	someone	else's	
feelings	
Source: Merriam-Webster's Learner's Dictionary
Open	source	is	
wonderful…
Open	source	is	
wonderful…but	it	can	
also be frustrating
Sustainable	open	source	
•  How	to	keep	contributors	from	drowning	/	
burning	out?	
•  How	to	fund	the	work?	
•  How	to	protect	and	serve	the	community?
The	Grind
“The	grind	is	an	endless	
stream	of	bug	reports,	
requests,	demands,	
questions, and
occasional inquisitions.”
	 DHH,	Creator	of	Ruby	on	Rails
pandas,	the	open	source	project	
•  Parts	of	code	date	back	to	April	2008	
•  Over	600	unique	contributors	on	GitHub	
	
•  Active project maintainers range from 4-7 people
•  >	6900	Closed	Issues	
•  >	5100	Pull	Requests
pandas	at	end	of	2012
April	7,	2014
"Some	might	argue	that	
[Heartbleed]	is	the	worst	
vulnerability	found	(at	least	in	
terms of its potential impact)
since	commercial	traffic	began	to	
flow	on	the	Internet."	
Joseph	Steinberg,	Forbes	cybersecurity	columnist
“There should be at least…[6] full time
OpenSSL team members, not just one, able
to concentrate … without having to hustle
commercial work. If you’re a … in a position
to do something about it, give it some
thought. Please. I’m getting old and weary
and I’d like to retire someday.”
	
Steve	Marquess,	OpenSSL	team
By	Nadia	Eghbal,	supported	by	
the Ford Foundation
For	more	on	this
“The	Cathedral		
and	the	Bazaar”
Python’s normalization in industry
•  Python	has	become	a	leading	language	
instead	of	something	“experimental”	or	
“risky”	
•  Many	businesses	founded	on	the	growth	of	
the	Python	user	base	
•  See	Paul	Graham’s	2004	essay	“The	Python	
Paradox”	—	how	things	have	changed!
Governance	
“the processes of interaction and
decision-making among the actors
involved in a collective problem…”
M. Hufty (via Wikipedia)
Openness	and	
Transparency
Consensus
Some	example	governance	documents	
•  NumPy	(see	the	docs)	
•  IPython	/	Jupyter	governance	
– github.com/jupyter/governance	
•  pandas	
– github.com/pydata/pandas-governance	
– Modeled after Jupyter governance
http://numfocus.org
http://apache.org
conda-forge	
•  	Community-curated	conda	package	channel	
(hosted	on	anaconda.org)	
•  	Reproducible	build	infrastructure	(Docker	+	
Circle	CI	+	Travis	CI	+	Appveyor)	
•  	Automated	GitHub	helper	tools	
conda config --add channels conda-forge
What	is	next	for	pandas?	
•  pandas	1.0	
– A	stable,	maintenance-only	release	
•  Beginning	“pandas	2.0”	
– Planning	significant	refactoring	on	the	internals	of	
Series,	DataFrame
Why	pandas	2.0?	
•  Some	changes	difficult/impossible	to	do	in	an	
incremental	way	
•  pandas’s relationship with the ecosystem has
evolved	over	the	last	5	years	
	
•  Make	pandas	
– Faster	and	use	less	memory	
– Fix long-standing limitations / inconsistencies
– Easier	interoperability	/	extensibility
Apache	
Arrow	
http://arrow.apache.org
High	Performance	Sharing	&	Interchange	
Today
•  Each system has its own internal memory format
•  70-80% CPU wasted on serialization and deserialization
•  Similar functionality implemented in multiple projects
With Arrow
•  All systems utilize the same memory format
•  No overhead for cross-system communication
•  Projects can share functionality (e.g., Parquet-to-Arrow reader)
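The contrast between the two columns can be illustrated with a small sketch. It does not use Arrow. As an analogy only, it assumes Python's standard-library buffer protocol (array and memoryview) stands in for a shared memory format, and pickle stands in for per-system serialization. The variable names are made up for the illustration.

```python
import pickle
from array import array

# A "producer" holds a column of 64-bit floats in one contiguous buffer.
column = array("d", [1.5, 2.5, 3.5, 4.5])

# Today: the consumer has its own format, so the data is serialized and
# deserialized, which costs CPU time and yields an independent copy.
copied = pickle.loads(pickle.dumps(column))

# With a shared format: the consumer reads the producer's buffer directly.
shared = memoryview(column)

# The producer updates a value in place.
column[0] = 100.0

print(copied[0])  # the copy still holds the old value
print(shared[0])  # the view sees the new value, because no copy was made
```

The view costs nothing to create regardless of the column's size, while the pickle round trip grows with the data. That is the kind of overhead the "With Arrow" column aims to remove between systems.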
Feather	File	Format	for	Python	and	R	
• Problem: fast, language-agnostic binary data frame file format
• By	Wes	McKinney	
(Python)	and	Hadley	
Wickham	(R)	
• Read	speeds	close	to	
disk	IO	performance	
• Leverages	Apache	Arrow
Thank	you	
	
@wesmckinn	
http://wesmckinney.com
	
pandas	sprint	on	Monday!
PyCon APAC 2016 Keynote