Making AI Efficient
Dr Janet Bastiman
@yssybyl
STORYSTREAM.AI
Neurons
86 Billion Neurons, 20 Watts
Multiple pathways
Visual cortex estimated
at 13 Billion neurons
Visual system ~ 3 Watts
(2.3 x 10^-11 W per neuron)
@yssybyl
Computers…
@yssybyl
Device      | GPU            | CUDA cores | Power (W) | W per CUDA core | Transistors  | W per transistor
Dell XPS    | 1050 Ti        | 768        | 130       | 17 x 10^-2      | 3.3 x 10^9   | 3.9 x 10^-8
Dell Server | 1080 Ti        | 3584       | 1100      | 31 x 10^-2      | 1.2 x 10^10  | 9.2 x 10^-8
DGX-1       | 8x Tesla P100  | 28672      | 6400      | 22 x 10^-2      | 1.2 x 10^11  | 7.7 x 10^-8
HGX-2       | 16x Tesla V100 | 81920      | 12800     | 16 x 10^-2      | 3.3 x 10^11  | 3.9 x 10^-8
Neurons are 1000x more energy efficient than transistors
and a billion times more efficient than a single CUDA core
If we are to get the most out of machines
we need to recognise the cost of what we use
How did we get here?
• Abstractions
• Higher level languages
• Growing resources
• Laziness
But what if you can’t use the latest and greatest…?
@yssybyl
What is efficient?
“Soon algorithms will be measured by the amount of intelligence they provide
per Watt” – Taco Cohen, Qualcomm
Minimal memory requirements
Speed may be more important than accuracy in some cases
Every flop is sacred
@yssybyl
Start with the basics
Learn how to code well in whatever language you choose
Understand the boundaries and the frameworks
Optimise the code flow
Discrete mathematics – know your computational linear algebra
@yssybyl
Stop cutting and pasting from
Stack Overflow without
understanding
STORYSTREAM.AI Dr Janet Bastiman @yssybyl
Optimise…
@yssybyl
Calling a library that just does the looping does not count as optimisation
Simple Python Performance tricks
Pythonic code is more readable and usually faster by design
- Know the basic data structures – dicts have O(1) lookup
- Reduce memory footprint and avoid + operator on strings
- Use built in functions
- Calculations outside of loops
- Keep code small
@yssybyl
import re

# slow: the regex is recompiled on every pass through the loop
for i in big_it:
    m = re.search(r'\d{2}-\d{2}-\d{4}', i)
    if m:
        ...

# faster: compile once, outside the loop
date_regex = re.compile(r'\d{2}-\d{2}-\d{4}')
for i in big_it:
    m = date_regex.search(i)
    if m:
        ...
# loop version
newlist = []
for word in oldlist:
    newlist.append(word.upper())

# built-in map runs the loop in C (wrap in list() on Python 3)
newlist = list(map(str.upper, oldlist))
# slow: repeated + creates new string objects
msg = 'hello ' + my_var + ' world'
# faster
msg = 'hello %s world' % my_var
# or better:
msg = 'hello {} world'.format(my_var)

# slow: each += copies the whole string so far
msg = 'line1\n'
msg += 'line2\n'
msg += 'line3\n'

# faster: build a list and join once
lines = ['line1', 'line2', 'line3']
msg = '\n'.join(lines)
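The data-structure point above has no snippet, so here is a small sketch: membership tests on a list scan every element, while a set or dict uses a hash table.

import timeit

items = list(range(1_000_000))
as_list = items
as_set = set(items)

print(timeit.timeit(lambda: 999_999 in as_list, number=100))   # O(n) scan on every test
print(timeit.timeit(lambda: 999_999 in as_set, number=100))    # O(1) hash lookup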
Hardware
We can’t all afford Nvidia’s latest and greatest
Most of us are restricted by in-house hardware or a budget for cloud services
Horse sized duck vs duck sized horses?
Performance on benchmarks is not necessarily indicative of your own networks.
Value?
@yssybyl
Top down
Tight code will always run faster
Maximise CPU and GPU utilisation – split your program
• Parallelise I/O operations
• Parallelise data transformations
Understand the real requirements of your GPU with allow_growth
Use multiple GPUs if necessary
Put your code on the most efficient part of the system
@yssybyl
# TensorFlow 1.x: claim GPU memory as it is needed rather than grabbing it all up front
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config, ...)
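A sketch of the parallelisation point above using the TF 1.x tf.data API (filenames and parse_fn are placeholders for your own record files and decoding function): the CPU decodes and transforms data in parallel while the GPU trains.

dataset = tf.data.TFRecordDataset(filenames)             # filenames: your own record files
dataset = dataset.map(parse_fn, num_parallel_calls=4)    # parallelise the data transformations
dataset = dataset.batch(32)
dataset = dataset.prefetch(1)                            # keep the next batch ready while the GPU works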
Remove unnecessary imports
Don’t import the world – use what you need
numpy adds 320MB
Do you really need numpy, pandas, sklearn and tf?
Putting imports within rarely called functions may be beneficial
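For example, a hedged sketch of a lazy import (export_report is an illustrative name): the heavy dependency is only paid for if the rare code path actually runs.

def export_report(rows):
    import pandas as pd    # loaded only when this rarely used function is called
    return pd.DataFrame(rows).to_csv(index=False)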
Use the best tool for the job
@yssybyl
Limit data in
This is what we do!
Large images mean large networks
Don’t learn the noise
Focussed data will mean better results
with fewer examples
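As a small illustrative sketch with Pillow (the file name and sizes are made up): crop to the region of interest and downscale before the image ever reaches the network.

from PIL import Image

img = Image.open('frame.jpg')
roi = img.crop((100, 50, 400, 350))    # (left, upper, right, lower) region of interest
small = roi.resize((224, 224))         # match the network input instead of feeding full resolution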
@yssybyl
Optimise Implementation
Not every problem needs tensorflow
PCA, naïve Bayes, SVM, Fourier transforms…
Pre-optimised networks – let someone else do the hard work…
Understand your problem, understand your data, pick the best tool for the job
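For instance, a minimal scikit-learn sketch (X_train, y_train, X_test, y_test and the component count are placeholders): a PCA + SVM pipeline is often enough for small, feature-based problems.

from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

model = make_pipeline(PCA(n_components=50), SVC(kernel='rbf'))
model.fit(X_train, y_train)              # your own training features and labels
print(model.score(X_test, y_test))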
@yssybyl
Bayesian Optimisation
@yssybyl
https://arxiv.org/abs/1502.03492
https://arxiv.org/abs/1712.02902
https://github.com/SheffieldML/GPyOpt
Established methodology, typically
using a Gaussian process
Some scaling problems
Speed up training
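A minimal sketch of tuning a single hyperparameter with the GPyOpt library linked above; train_and_evaluate is a placeholder for your own training run, and parameter names may differ slightly between versions.

import GPyOpt

def objective(x):
    # GPyOpt passes a 2D array of candidate points; train with that learning rate
    return train_and_evaluate(learning_rate=x[0][0])   # hypothetical helper returning validation loss

domain = [{'name': 'learning_rate', 'type': 'continuous', 'domain': (1e-5, 1e-1)}]
opt = GPyOpt.methods.BayesianOptimization(f=objective, domain=domain)
opt.run_optimization(max_iter=20)    # 20 training runs guided by the Gaussian process
print(opt.x_opt, opt.fx_opt)         # best learning rate found and its loss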
Pruning
@yssybyl
https://arxiv.org/abs/1707.06168
https://github.com/yihui-he/channel-pruning
Aim to reduce channels in the feature map while minimising error
LASSO regression
Initial problem is NP-hard
Yihui He et al. added simplifying constraints that may not be relevant to your problem
5x speed increase
0.3% error increase
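This is not the LASSO channel selection from the paper above, just a much simpler magnitude-based sketch of the underlying idea: score each output channel by the size of its filter weights, drop the weakest, then fine-tune.

import numpy as np

def channels_to_prune(conv_weights, keep_ratio=0.5):
    # conv_weights shape assumed (out_channels, in_channels, kh, kw)
    norms = np.abs(conv_weights).sum(axis=(1, 2, 3))   # L1 norm per output channel
    n_drop = len(norms) - int(len(norms) * keep_ratio)
    return np.argsort(norms)[:n_drop]                  # indices of the weakest channels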
Quantisation
Trained models have a lot of floats.
Can reduce precision to 8-bits
Floats to store maximum and minimum values
Quantised values are spread linearly between the stored minimum and maximum, so a fixed 8-bit range can represent weights of arbitrary magnitude
There is an impact on accuracy, so use with care
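A minimal sketch of that idea in plain numpy (illustrative only; real toolkits handle this for you): keep the float min and max, spread 256 steps linearly between them, and map back when needed.

import numpy as np

def quantise(weights):
    w_min, w_max = float(weights.min()), float(weights.max())
    scale = (w_max - w_min) / 255.0 or 1.0                   # guard against constant weights
    q = np.round((weights - w_min) / scale).astype(np.uint8)
    return q, w_min, scale

def dequantise(q, w_min, scale):
    return q.astype(np.float32) * scale + w_min              # approximate original floats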
@yssybyl
Priority
1. Be mindful of resources
2. Code well, use correct data structures
3. Use the right libraries for the right tasks
4. Use structures other people have already optimised
5. Minimise inputs
6. Optimise parameters
7. Prune your network
8. Quantise your variables
@yssybyl
STORYSTREAM.AI
STORYSTREAM.AI
Dr Janet Bastiman @yssybyl
Thank You
@yssybyl
janjanjan.uk
https://uk.linkedin.com/in/janetbastiman
Speaker Notes
  1. The human brain is incredibly efficient – we have evolved to use our resources effectively and, at some point, pumping more energy into our brains gave us that advantage, but it isn’t that much in the scheme of things. The brain is 2% of your body weight but 20% of the body’s energy. For an average 2400 kcal daily consumption this is about 23 J/s. The brain has about 86 billion neurons, of which 26 billion are in the cerebral cortex and about half of those are for the visual system, so our impressive visual system runs on about 3 Watts. Now, before we get into comparisons with artificial neural networks of any flavour, it’s important to note that we have immensely specialised neurons. These aren’t just multi-input transistors. There are also multiple redundant pathways that get the signals through. Also, although neurons signal with an action potential that is all or nothing, the connections between neurons are analogue with the diffusion of neurotransmitters. At low values of signal to noise, analogue switches perform in a far more energy-efficient manner than digital devices, and there’s a great review in the Proceedings of the IEEE, “Power Consumption During Neuronal Computation”, that studies this beautifully. Even if we consider the power consumption of the brain as a whole rather than just the visual system, it’s still pretty efficient. My Dell XPS laptop that I run some of my tensorflow models on takes 130 W and is nowhere near as capable as my brain.
  2. So we are making a few unfair comparisons here – assuming full utilisation of all resources in the server, and I’m also not restricting to just the card… which may be a little unfair in comparison. But a neuron is a billion times more efficient than a single CUDA core, and even if we go to the transistor level, which is possibly a fairer comparison, neurons are still 1000x more energy efficient. I’ve not added TPUs to this, mainly because they’re a bit trickier to compare, but based on the size of the TPU being less than half of the Intel Haswell CPU we can estimate that it’s ~332 mm squared and has about 2.5 billion transistors; with a 1600 W power supply this would give a W per transistor of 4 x 10^-7, which is still less energy efficient but gives faster performance by 1-2 orders of magnitude. Even going full-body comparison, we are still several orders of magnitude more efficient. So we are starting with an imperfect artificial system. Furthermore, we are not using these to capacity because we keep running inefficient applications on them.
  3. Nobody does machine code any more; we’ve moved away from low-level languages into interpreted high-level languages with bulky libraries, and now into frameworks. With each level of abstraction comes an overhead of processing, of time and of power. We carry things we don’t need. Every level we move away from the hardware instruction set adds inefficiency in runtime, but we gain so much more in ease and speed of development. So we’ve abstracted away from the hardware, adding layer upon layer of inefficiency, but we’ve got more powerful CPUs, RAM and GPUs so we’ve not noticed. Because we’re getting a benefit from the hardware increases we don’t notice the bloat, and this is across the board, not just deep learning. Unless you’ve done some embedded device programming, you have probably never considered the requirements of the software you write. We have got lazy. Then what do you do when you can’t afford the biggest and best hardware, but still want to compete? What if you need to deploy to a resource-limited device? If you don’t know how to make your models efficient you’ll be stuck.
  4. I’ve been talking about energy requirements, but let’s take a broader look at efficiency. I was at a conference in September and there was a great presentation on optimisation of networks by Taco Cohen of Qualcomm, with a technique that I’ll go through shortly; it was great to see that other people are thinking along the same lines as I am. Energy efficiency is great, but what about other resources? You may not have the on-board RAM for very large models – where do you make the compromise? Similarly, speed of classification may be more important than the resources needed to create it. We see all the headlines about how fast things are going, but the resources you need and the accuracy you get is appalling. Start with the premise that every flop is sacred – work out what is expendable. If you minimise wastage at every level then your AI will run faster with lower requirements. So hopefully you’re all on board with being a little less lazy – whether that’s to save you money, get real-time predictions or come up with the next crazy AI wearable…
  5. I can’t overemphasise this first one. If you’ve picked up Python through self-teaching or just by following a few examples, then you probably don’t know what you don’t know. Do coding competitions; look at and understand other people’s code. If you’re coding in Python then don’t underestimate this – so much is out of your control that coding well is difficult. Every time you import a third-party library you are giving up control to someone else. Someone else who has made decisions about how to code things. Someone else who can change how functions work without your permission or knowledge. Don’t assume that these developers are holding themselves to the highest standards. These projects have all the same flaws as the rest of us. Just because Google developed Tensorflow, don’t assume that it’s the best code you’ll ever see. Similarly, TF will be updated for Google’s needs, meaning the underlying algorithms will be tuned and coded to optimise their solutions. This is true of all libraries. Use them knowing that they are inefficient and include lots of code you don’t need, but will speed up your development. Take some time to work out what’s going on in your code and optimise the flow. Read The Pragmatic Programmer. Diving into code without thought is probably the worst way to get good code, unless you’re happy to rewrite… One of my better coding habits is to pseudocode everything first so I can rearrange it before I code it up properly. Some of the bigger projects I’ve worked on don’t get really good until they’re on their third full rewrite. A minor bad decision that’s been built on can become almost impossible to remove. Another generic point to consider is the mathematics of your algorithms. If you do not understand the fundamental mathematics behind what you are doing then you’ll be inefficient at best or, at worst, wrong.
  6. You’d’ve thought this was obvious but I cringe every time I see people cutting and pasting code. Understand what you are including. Use this amazing resource to solve your problems by all means, but unless your problem is exactly the same you could be adding all sorts of inefficiencies. I’ve seen people break their code completely having “changed nothing” only to find they’d pasted something in and trashed a very important local variable. Similarly minimum working examples posted on SO can in themselves be locally efficient but generally inefficient. This is particularly true if you’re trying to optimise algorithms
  7. Let’s do a few maths examples, starting simply with the Towers of Hanoi problem. As the number of discs increases, the minimum number of moves also increases on a three-pin game. If you implement the first version of the algorithm it will naturally lend itself to what looks like a fairly tight recursion all the way down to M_1 = 1. This works fine with small numbers, but you’ll quickly encounter problems when n starts to get large. A naïve solution would be to increase the Python recursion limit and just accept the run time… and yeah, it works, so they move on. If you understand the algorithm then you can simplify it to a single calculation (the minimum number of moves for n discs is 2^n - 1) that runs quickly and efficiently even for very large numbers. The second example is a bit more involved, but I’d like you to think about how you’d implement this in Python (or whatever language). You’re going to get a loop, and it’s going to be annoying as n and therefore k get large. Using basic geometric series theory we can simplify this: first we split it into two series, then solve the sums, inner and then outer, to get a nice non-loopy solution. The key point is: understand the maths of what you are doing. Do you really need lots of loopy or recursive functions? A single call to a library that just implements a loop is not an optimisation unless these steps are built in.
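A quick sketch of the two Hanoi approaches described in this note: the recurrence M_n = 2*M_(n-1) + 1 versus the closed form 2^n - 1.

def moves_recursive(n):
    # follows the recurrence all the way down to M_1 = 1; hits the recursion limit for large n
    return 1 if n == 1 else 2 * moves_recursive(n - 1) + 1

def moves_closed_form(n):
    return 2 ** n - 1    # same answer as a single calculation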
  8. There are some great resources for understanding what data structures to use, but dicts use hash tables, so once you’ve set them up they’re super fast compared to searching a list. Simple things like concatenating strings should be done using join, and inserting variables with the string functions. Built-in functions like map are pre-optimised in C and are much faster than anything else you can use in Python. Similarly, it might be obvious to do some transforms outside of loops, but definitions should also be done outside of the loop. Regexes might not feel like an overhead, but in the first example you’ll be redefining the regex with every iteration of the loop before you use it. In the second, you’re calling a local variable that has a predefined regex and is much faster.
  9. How you design and build your systems will be constrained by the hardware you have available both for training and deploy, the speed requirement of your network and the accuracy requirements. I’m assuming that you all know how to get accuracy so we’ll continue to focus on speed and size. At the risk of teaching you to suck eggs, if you have mechanical drives, make sure they’re defragged, but don’t defrag SSDs or the only thing you’ll speed up is their demise. Nobody has an infinite budget so consider how you will architect your services. If you think back to the energy requirements, the W per core and W per transistor were all pretty similar. So you can look at a combination of cost and on board memory . Rather than a behemoth of a system, you may find that many smaller systems will do the trick. If you keep your code tight then you will require fewer resources. Faster cards with lower RAM may be better value. Don’t buy large farms of physical servers unless you really need them, but make sure your team is not constrained - they’ll need as much as possible to do their research and having local devices that do the job is pretty much essential imho
  10. Small efficient code is always going to outperform larger codebases GPUs are great at all the things we need – large matrix operations for example, but don’t have the versatility we need for generic operations. This is why Google have developed the TPU - specialist chips have better performance. To be fair that’s exactly what’s happening with different neurons too. So we don’t want to have the GPU waiting to do the tasks it can do, we want it to be fully optimised. Similarly, we want to make as much use of our CPU as possible. By default TF maps all available GPU memory to the process to reduce memory fragmentation – set allow_growth and understand how much memory you actually need for final running of your network https://medium.com/@lisulimowicz/tensorflow-cpus-and-gpus-configuration-9c223436d4ef – nice example of playing with TF memory https://www.tensorflow.org/performance/performance_models – HPC tensorflow
  11. Even if you don’t call the libraries they will extend the demands on your system as they are loaded in ready to be used. If you don’t use something cut it. If you only need a subset of the features only import that subset. Numpy adds 320 MB even if it’s not called If you have some rare cases then you may want to put imports within the functions. This will make the function take longer but, may be beneficial depending on use. Python caches so it won’t reimport every time but it will still need to check that the library has been imported. However, profile first – usually you get far faster speed ups in other areas That said, pure python is not great at numerical analysis. https://realpython.com/numpy-tensorflow-performance/ Renato Candido did a great blog post for a simple linear regression problem – the timings are interesting for like for like code. Key here is to use the best tool for the job. Image copyright – Spaceballs, meme version https://funnyjunk.com/funny_pictures/4263235/Runnin+low+on+tp/ - covered by fair use
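A rough sketch of that point (not Candido’s benchmark): the same prediction written as a pure-Python loop and as a single vectorised numpy call.

import numpy as np

def predict_loop(weights, features):
    return sum(w * x for w, x in zip(weights, features))   # interpreted element by element

def predict_numpy(weights, features):
    return np.dot(weights, features)                        # one optimised C call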
  12. We are actively discarding large amounts of our sensory inputs as we focus on what is critical in the moment. Our eyes see only about 10% of what is in front of us and our eyeballs are constantly moving – we make the rest up – this is why we’re fooled by optical illusions. This is not a video or gif it is a completely static jpeg. Your eyes are predicting movement because of the blur as your eyeballs move around. So we have far less real data coming into our brains than we think we do and our brains are making the rest of it up. Let’s apply the same techniques to our artificial brains. Your network will grow to accommodate the complexity of the data you are pushing through it. If you want speed and efficiency then simplify your problem and analyse the simpler problem. This requires deep knowledge of your problem space so you do not oversimplify and miss the nuances in your data. Images – reduce resolution, crop out the region of interest Time series – look at what encapsulates your patterns As was raised during questions, there is also dimensionality reduction as a limiter – for me this was beyond the scope of the talk as there are too many basic things that are not being done in AI. If you want to go further then look into dimensionality reduction, progressive networks and some of the trade-offs for size and accuracy. Image - www.ritsumei.ac.jp/~akitaoka/
  13. Tuning hyperparameters is one of the dark arts of ML. Unless you’ve got one of those brains that can see in hyperparameter space then you’ll pretty much pick something based on your gut feel. There are techniques you can use and Bayesian optimisation is one – you’ll get more accurate results faster. There are libraries you can bring in – GPyOpt has been around a while and is maintained. Amazon have been building on this for large scale deployment.
  14. So you’ve made your network, and it works, but where’s the redundancy? Could you have created a better architecture? How would you know? Well, back in 2015 a group from Harvard led by Ryan Adams started publishing papers on optimising networks using Bayesian pruning. Their spin-out was bought out to stop this technology becoming easily available, but there are papers coming out now with similar ideas. Pruning channels is difficult because channels you remove from one layer will alter the input to the next layer, but you can get significant speed-ups with only a minor increase in error. Nice paper from Yihui He et al. and GitHub repo – they used the least absolute shrinkage and selection operator (LASSO) on a VGG-16 model and applied both single-channel and whole-model pruning techniques. There are other optimisation techniques… tensor factorisation, principal component iteration to determine which sub-tensors are important, e.g. Accelerating Convolutional Neural Networks for Mobile Applications (tensor_optimisations.pdf)
  15. By far the biggest efficiency gains you will get will be from understanding your problem and coding well. These are the things I have to teach when I build teams. Following that, there are thousands of labs desperately trying to optimise the speed of training and speed of inference of a whole host of benchmarked networks and datasets – let them do the research and implement their techniques – you don’t need a fully fledged in house research team, just people who are capable of following the research. The most important thing is to treat your resources with care – be aware of when and why you are being wasteful