Lynn Cherny, Assoc Prof Data Science, emlyon business school
& Students!
@arnicas
PyData Paris 2017
Why am I here?
• Starting up a program in data science/analytics at a business school: emlyon business school
• My courses in the first year: Python bootcamp, Data analysis with Pandas, Text analysis/NLP, Business Analytics (Excel pivot tables, SQL, Tableau).
• Next year: an intro AI course, some web & db stuff, plus the above.
–faculty in the marketing department when I introduced myself
“What do our students really need to know?”
–me, who likes NLP problems
“Hey, let’s find out by looking at job ads in France.”
Also, This Project Course
• “Business Data Science Projects” combines students from
  • École Centrale de Lyon (engineering school, so presumably coders) +
  • emlyon business students (presumably non-coders)
  for product design/research/plan.
In practice, coding skills were not distributed across the teams as expected, but my project had strong skills on both sides (we had already taught a few Python courses by then).
The student team
• Mathilde TRÉARDE (superb project manager)
• Thomas PUCCI (amazing React.js front-end dev)
• Yann VAGINAY (great Python data scientist)
• Imen FEHRI
• Mohamed Amine MEJRI
• Roxane MARCILHACY (great Python data scientist)
• Julien RAULT
• Eric DUPRAZ
• Sophie REISER (great market research/analyst)
• Nicolas LOUVIGNE (top notch visual designer/branding)
• Grégoire CANER-CHABRAN
• Sarah DAIEN
Data Sources
Indeed API: targeted searches, text collection
apec.fr: targeted searches (and siphoning from the API)
“JT” (CSV data dump from an edu provider)
Data collection began in earnest in February 2017; I beefed it up in April/May.
Demo
Filter: A PDF resume uploaded… maybe a bit imperfect now:
Biz students:
95 student interviews of job searchers
Excellent creative work
UI mockup suggestions from biz team
Architecture
Lynn said we should use these (Mongo, ES, Flask) and set up a (poorly managed and insecure) Mongo / Elastic / EC2 crawler host herself on AWS.
The dev team did their own GitHub/React & Node.js/Heroku plan.
Some discoveries in the code after it was over:
• The databases didn’t record the date the items were added (the date of the scrape).
• Scraping was based on rather random sets of words, not consistent across site sources.
• No automation of the indexing in Elastic: it was a manual job run from a Jupyter notebook (they knew this was an issue too).
• The scraper code was never put on GitHub.
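The first and third gaps (no scrape date, manual indexing) can be closed with a few lines at write time. A minimal sketch, assuming each scraped ad is a dict with an `id` field; `make_record`, `to_bulk_actions` and the index name are hypothetical, not the project's code. The actions use the `_index`/`_id`/`_source` dict shape that `elasticsearch.helpers.bulk` accepts.

```python
from datetime import datetime, timezone
from typing import Iterable, Iterator, Optional


def make_record(ad: dict, source: str, search_term: str,
                scraped_at: Optional[datetime] = None) -> dict:
    """Copy a scraped ad and stamp it with its source, query and scrape time (UTC)."""
    record = dict(ad)
    record["source"] = source
    record["search_term"] = search_term
    # The ad's own publication date stays in its own field; this is when we fetched it.
    record["scraped_at"] = (scraped_at or datetime.now(timezone.utc)).isoformat()
    return record


def to_bulk_actions(records: Iterable[dict], index: str) -> Iterator[dict]:
    """Yield actions in the dict shape accepted by elasticsearch.helpers.bulk."""
    for rec in records:
        yield {
            "_index": index,
            # Stable id: re-running the job overwrites documents instead of duplicating them.
            "_id": f"{rec['source']}:{rec['id']}",
            # Drop MongoDB's `_id`, which Elasticsearch does not accept inside a document body.
            "_source": {k: v for k, v in rec.items() if k != "_id"},
        }
```

In a scheduled job, the generator would be passed to `helpers.bulk(es, to_bulk_actions(records, "jobs"))` instead of indexing by hand from a notebook; reusing one list of search terms for every source would also address the inconsistent-queries problem.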
My security issues
• Tried and failed to secure Mongo with my own SSH key generation; ended up using tunneling from the scraping machine (that works fine).
• Elastic is wide open and had been written to by a virus (Amazon just sent me a warning), creating extra indices.
• We had a lot of issues with university firewalls and the cloud. We all had to tether to our phones to access the DBs from school.
• AWS security stuff is really confusing. (One student team didn’t succeed in using AWS at all: no one helped them.)
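The tunneling workaround can be scripted so nobody has to remember the flags. A sketch, assuming MongoDB listens only on 127.0.0.1 on the crawler host and you have SSH access to it; the user and host names are placeholders and the helper names are hypothetical.

```python
import subprocess
from typing import List


def tunnel_command(user: str, host: str, local_port: int = 27017,
                   remote_port: int = 27017) -> List[str]:
    """Build an `ssh -N -L` command forwarding a local port to a service on the remote host."""
    # -N: run no remote command, only forward the port.
    # "localhost" in the -L spec is resolved on the remote machine, so the
    # database there never needs to listen on a public interface.
    return ["ssh", "-N", "-L", f"{local_port}:localhost:{remote_port}", f"{user}@{host}"]


def open_tunnel(user: str, host: str, **ports: int) -> subprocess.Popen:
    """Start the tunnel in the background; call .terminate() on the result to close it."""
    return subprocess.Popen(tunnel_command(user, host, **ports))
```

With the tunnel open, `pymongo.MongoClient("mongodb://localhost:27017")` reaches the remote database. The same pattern on port 9200 would let Elasticsearch stay bound to localhost (its `network.host` setting) instead of being open to the internet.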
The data in more detail…
Total Data Now by Source
• “JT,” an academic partner (given to us as a dump in January, now “out of date”): 78K
• Apec: 25K
• Indeed: 10K
Apec: cadres (managerial and professional jobs)
My student: “they would never hire someone like me”
Indeed: international feed (API) with links; we need to scrape the text.
More English:
Data in the db: the search terms requested by API (!?)
(Charts: apec.fr, Indeed)
Dates in the db (remember, not the date scraped…)
Indeed’s date-of-publication counts
Apec: student work ended in March/April; I added new terms and increased scraping into May/June.
JT-provided data dates
No, this spike is real: they are different ads, all dated the same day.
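Checking whether a one-day spike is real or an artifact of duplicates is a quick count. A stdlib sketch, assuming each ad is a dict with an ISO `date` string (the publication date, since the scrape date wasn't stored) and a `text` field; both field names and the helper names are assumptions.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def ads_per_day(ads: Iterable[dict]) -> Counter:
    """Count ads by their publication date (the ad's own date, not when it was scraped)."""
    return Counter(ad["date"] for ad in ads)


def spike_report(ads: List[dict], day: str) -> Tuple[int, int]:
    """Return (ads dated `day`, distinct ad texts dated `day`)."""
    same_day = [ad for ad in ads if ad["date"] == day]
    # Normalize whitespace and case so trivially re-posted copies collapse together.
    distinct = {" ".join(ad["text"].lower().split()) for ad in same_day}
    return len(same_day), len(distinct)
```

`ads_per_day(ads).most_common(1)` finds the spike; if the two numbers from `spike_report` are close, the ads on that day are genuinely different rather than duplicates.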
Job type labels on JT data
The largest categories are Marketing, Bizdev, and Communication (Dev/IT is not small, though).
“JT”: more “stages” (internships)
Revisit the word2vec part
Or create your own list and see the related skills in the “neighborhood”:
scikit-learn is not in the skills list? But it is found in a job ad!
What is that graph?
A few “closely related skills” (by word2vec distance) in a simple t-SNE layout, computed and passed over the API.
Awesome idea… but a caveat: the “skills” were pre-filtered from the word2vec model of the job ads using the list of LinkedIn skills.
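The skills “neighborhood” is a nearest-neighbour query by cosine similarity, and the LinkedIn pre-filter is a membership test on the candidates, which is exactly how a term like scikit-learn can vanish. A minimal pure-Python sketch, assuming you already have a `{word: vector}` mapping (with gensim 4, one way is `{w: model.wv[w] for w in model.wv.index_to_key}`); the function names are hypothetical, and gensim's own `most_similar` performs the unfiltered query.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def related_skills(vectors: Dict[str, Vector], query: str, skills: Set[str],
                   topn: int = 10) -> List[Tuple[str, float]]:
    """Nearest neighbours of `query`, keeping only words that appear in `skills`."""
    q = vectors[query]
    # Any word missing from the skills list is dropped here, however close it is.
    scored = [(w, cosine(q, v)) for w, v in vectors.items()
              if w != query and w in skills]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]
```

Running the same query with `skills=set(vectors)` shows what the pre-filter removed.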
A few related links
Radim’s tutorial on using word2vec in gensim:
https://rare-technologies.com/word2vec-tutorial/
My 8 million links on w2v papers/code etc.:
https://pinboard.in/search/u:arnicas?query=word2vec
Interactive demo of a w2v t-SNE layout of Yelp text reviews:
https://bl.ocks.org/arnicas/dd2ef348ad8854e40ef2
Useful warnings/info about making t-SNE layouts (we need a grid search option):
http://distill.pub/2016/misread-tsne/
LinkedIn Skillz list: English, Mysterious, Garbage?
LI skills only, from the w2v model in March
Zoom in…
Word2Vec updated (a week ago) ?!
Python also didn’t make the “top 50 words per search term,” which is sad.
My shitty t-SNE layout that took 40 minutes on my laptop
TensorBoard Projector view: convert your gensim model to TensorFlow TSV files and upload them to
http://projector.tensorflow.org/
(English)
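The projector loads two tab-separated files: one row of numbers per point, and a parallel file of labels in the same order. A sketch that writes both from a `{word: vector}` mapping (for example built from a gensim model's `wv`); gensim also ships a `gensim.scripts.word2vec2tensor` script for this conversion, and `write_projector_tsv` here is a hypothetical stand-in that just shows the format.

```python
import csv
import io
from typing import Dict, Sequence, TextIO


def write_projector_tsv(vectors: Dict[str, Sequence[float]],
                        vectors_out: TextIO, metadata_out: TextIO) -> int:
    """Write vectors and labels in the two-file TSV layout; return the number of rows."""
    writer = csv.writer(vectors_out, delimiter="\t", lineterminator="\n")
    count = 0
    for word, vec in vectors.items():
        # One row of tab-separated floats per word.
        writer.writerow([f"{x:.6g}" for x in vec])
        # Single-column metadata: no header row, one label per line, same order as the vectors.
        metadata_out.write(word.replace("\t", " ").replace("\n", " ") + "\n")
        count += 1
    return count


if __name__ == "__main__":
    vec_buf, meta_buf = io.StringIO(), io.StringIO()
    n = write_projector_tsv({"python": [0.1, 0.2], "sql": [0.5, -1.0]}, vec_buf, meta_buf)
    print(n)
    print(vec_buf.getvalue(), end="")
    print(meta_buf.getvalue(), end="")
```

For real use, pass files opened with `open("vectors.tsv", "w", encoding="utf-8")` and `open("metadata.tsv", "w", encoding="utf-8")`, then load both in the projector.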
Tableau app: vis in Tableau, more UI options
Most frequent data-related words, sized by frequency, in searches by source.
Note: few JT ad words (pink).
Sales, logistics, supply chain: lots of JT.
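The counts behind a vis like this are a per-source word tally over a fixed vocabulary. A sketch, assuming ads arrive as `(source, text)` pairs and that the data-related vocabulary is a hand-picked set (hypothetical here); the resulting table could be exported to CSV for Tableau.

```python
import re
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

# Tokens may contain letters, digits and a few symbols used in tech names (c++, c#, scikit-learn).
TOKEN = re.compile(r"[\w+#.-]+")


def term_counts_by_source(ads: Iterable[Tuple[str, str]], vocab: Set[str]) -> Dict[str, Counter]:
    """Count vocabulary words in ad texts, grouped by source."""
    counts: Dict[str, Counter] = {}
    for source, text in ads:
        # Strip sentence punctuation that the token pattern lets through ("sql." -> "sql").
        tokens = (t.strip(".-") for t in TOKEN.findall(text.lower()))
        counts.setdefault(source, Counter()).update(t for t in tokens if t in vocab)
    return counts
```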
Let’s look at job ads again…
“Skills” in business job ads are often soft skills or “previous experience doing” something.
Market research with students:
The algorithm to determine “skill” “matches” is interesting but worrying. It has to be really “good.”
–one of my students (who did better after I taught tips on searching for skills on other job sites) :)
“I feel like we’re all looking at the same vague job ads and competing with each other.”
Search by courses taken?
Some of these descriptions are really short and vague; what’s a good criterion for a match?
Sure, with 2 words, we get some matches…
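One way to make “a good criterion for a match” explicit is to require a minimum number of shared content words before scoring at all, so a two-word overlap no longer counts. A sketch using bag-of-words cosine similarity; the stopword list and the `min_shared` threshold are assumptions to tune, not values from the project.

```python
import math
import re
from collections import Counter
from typing import Set

# Hypothetical mixed English/French stopword list; extend as needed.
STOPWORDS: Set[str] = {
    "the", "and", "of", "to", "a", "in", "for", "with", "on",
    "de", "la", "le", "et", "des", "les", "en", "du", "un", "une",
}


def bag_of_words(text: str) -> Counter:
    """Lowercased word counts with stopwords removed."""
    return Counter(w for w in re.findall(r"\w+", text.lower()) if w not in STOPWORDS)


def match_score(course: str, ad: str, min_shared: int = 3) -> float:
    """Cosine similarity of word counts, or 0.0 if fewer than `min_shared` distinct words overlap."""
    a, b = bag_of_words(course), bag_of_words(ad)
    shared = set(a) & set(b)
    if len(shared) < min_shared:
        return 0.0  # too little evidence: a couple of common words is not a match
    dot = sum(a[w] * b[w] for w in shared)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0
```

With this rule, a two-word description such as "SQL Excel" scores 0.0 against a marketing-analyst ad at the default threshold, but gets a positive score with `min_shared=2`.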
Teaching vs. Jobs, a Gap.
“Decision-making” course (translated from French):
“Entrepreneurs are constantly called upon to solve problems with little time and few resources to step back, in a highly uncertain environment. Drawing on research findings in management and cognitive psychology, this course aims to provide a few simple inputs to develop and support the participants’ decision-making ability.”
Job ad: “You can make decisions”?
So, Extension Ideas
• For student job search improvement:
  • Return to the skill extraction problem; use some training data. (Do some qualitative analysis.)
  • CV matching problem: revisit. Use different skills extraction (n-grams).
  • Compare the descriptions of ALL courses taken (and liked) vs. jobs out there; is this better?
• Curriculum development:
  • Evaluate course descriptions by how well they match jobs.
  • Find “gaps” in teaching: what’s not being taught? (E.g., SQL.)
  • Could course descriptions (and content) be better? Make this easier for students?
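Both the n-gram skills extraction and the “gaps” idea can be sketched against a skills list: match 1- to 3-word n-grams, then subtract what course descriptions mention from what job ads ask for. The skills list and function names are hypothetical; in practice the list itself (LinkedIn or otherwise) is the weak point.

```python
import re
from typing import Iterable, Iterator, List, Set


def ngrams(tokens: List[str], n_max: int = 3) -> Iterator[str]:
    """All 1..n_max-word sequences, joined with single spaces."""
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def extract_skills(text: str, skills: Set[str], n_max: int = 3) -> Set[str]:
    """Skills from the list found in `text`, including multi-word ones like 'machine learning'."""
    tokens = re.findall(r"[\w+#-]+", text.lower())
    return {g for g in ngrams(tokens, n_max) if g in skills}


def teaching_gaps(job_ads: Iterable[str], courses: Iterable[str], skills: Set[str]) -> Set[str]:
    """Skills requested in job ads that appear in no course description."""
    wanted: Set[str] = set()
    for ad in job_ads:
        wanted |= extract_skills(ad, skills)
    taught: Set[str] = set()
    for course in courses:
        taught |= extract_skills(course, skills)
    return wanted - taught
```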
My plan now
• Generally, starting up a Data Science Institute at emlyon. Money → DS and data vis visitors/conferences/talks.
• Looking for help with teaching/workshops/tutorials (Paris, Lyon, St. Etienne, Shanghai, Casablanca, India)
• Contact me at cherny@em-lyon.com or @arnicas
Reminder: The student team
• Mathilde TRÉARDE (superb project manager)
• Thomas PUCCI (amazing React.js front-end dev, multiply employed)
• Yann VAGINAY (great Python data scientist, now doing NLP in a “stage” (internship) in Germany)
• Imen FEHRI
• Mohamed Amine MEJRI
• Roxane MARCILHACY (great Python data scientist, now also a web dev; looking for a stage (internship) in Paris)
• Julien RAULT
• Eric DUPRAZ
• Sophie REISER (great market research/analyst, not a dev, but looking)
• Nicolas LOUVIGNE (top notch visual designer/branding)
• Grégoire CANER-CHABRAN
• Sarah DAIEN

"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 

Último (20)

Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 

Lynn Cherny Data Science Program emlyon business school

  • 1. Lynn Cherny, Assoc Prof Data Science, emlyon business school & Students! @arnicas PyData Paris 2017
  • 2. Why am I here? • Starting up a program in data science/analytics at a business school: emlyon business school • My courses first year: Python bootcamp, Data analysis with Pandas, Text analysis/NLP, Business Analytics (Excel pivot tables, SQL, Tableau). • Next year: an intro AI course, some web & db stuff, plus above.
  • 3. –faculty in the marketing department when I introduced myself “What do our students really need to know?”
  • 4. –faculty in the marketing department when I introduced myself “What do our students really need to know?” –me, who likes NLP problems “Hey, let’s find out by looking at job ads in France.”
  • 5. Also, This Project Course • “Business Data Science Projects”: combined students from • École Lyon Centrale (engineering school, so presumably coders) + • emlyon business students (presumably non-coders) for product design/research/plan. In practice, coding skills in the teams were not distributed as expected, but my project had strong skills on both sides (we had already taught a few Python courses by then).
  • 6. The student team • Mathilde TRÉARDE (superb project manager) • Thomas PUCCI (amazing reactjs front-end dev) • Yann VAGINAY (great python data scientist) • Imen FEHRI • Mohamed Amine MEJRI • Roxane MARCILHACY (great python data scientist) • Julien RAULT • Eric DUPRAZ • Sophie REISER (great market research/analyst) • Nicolas LOUVIGNE (top notch visual designer/branding) • Grégoire CANER-CHABRAN • Sarah DAIEN
  • 7. Data Sources • Indeed API: targeted searches, text collection • apec.fr: targeted searches (and siphoning from the API) • “JT” (CSV data dump from an edu provider). Data collection began in earnest in February 2017; I beefed it up in April/May.
  • 14. Filter: A PDF resume uploaded… maybe a bit imperfect now:
  • 15. Biz students: 95 student interviews of job searchers
  • 17. UI mockup suggestions from biz team
  • 18. Architecture: Lynn said we should use these (MongoDB, Elasticsearch, Flask) and set up the (poorly managed and insecure) MongoDB / Elasticsearch / EC2 crawler host herself on AWS. The dev team did their own GitHub/React & Node.js/Heroku plan.
  • 19. Some discoveries in the code after it was over. • The databases didn’t record the date items were added to them (the date of scrape). • Scraping was based on rather random sets of words, not consistent across site sources. • No automation of the indexing in Elastic: it was a manual job from a Jupyter notebook (they knew this was an issue too). • The scraper code was never put on GitHub.
  • 20. My security issues • Tried and failed to secure Mongo with my own SSH key generation; ended up using tunneling from the scraping machine (that works fine). • Elastic is wide open and had been written to by a virus (Amazon just sent me a warning), creating extra tables (indices). • We had a lot of issues with university firewalls and the cloud. We all had to tether to our phones to access the DBs from school. • AWS security stuff is really confusing. (One student team didn’t succeed in using AWS at all; no one helped them.)
  • 21. the data in more detail…
  • 22. Total Data Now by Source • “JT,” an academic partner (given to us as a dump in January, now “out of date”): 78K • Apec: 25K • Indeed: 10K
  • 23. Apec: jobs for cadres (managers/executives). My student: “they would never hire someone like me”
  • 24. Indeed: international feed (API) with links; the ad text has to be scraped.
  • 26. Data in the db: the search terms requested by the API (!?) • apec.fr • Indeed
  • 27. Dates in the db (remember, these are not the dates scraped…) • Indeed’s date-of-publication counts • Apec: student work ended in March/April; I added new terms and increased scraping into May/June.
  • 29. JT-provided data dates. No, this spike is real: they are different ads, all dated the same day.
  • 30. Job type labels on the JT data. The largest categories are Marketing, Bizdev, and Communication (Dev/IT is not small, though).
  • 31. “JT”: more “stages” (internships)
  • 33. Or create your own list and see the related skills in the “neighborhood”: scikit-learn is not in the skills list, yet it is found in a job ad!
  • 34. What is that graph? A few “closely related skills” (by word2vec distance) in a simple t-SNE layout, computed and passed over the API. Awesome idea… but a caveat: the “skills” were pre-filtered from the word2vec model of the job ads using the list of LinkedIn skills.
  • 35. A few related links • Radim’s tutorial on using word2vec in gensim: https://rare-technologies.com/word2vec-tutorial/ • My 8 million links on w2v papers/code etc.: https://pinboard.in/search/u:arnicas?query=word2vec • Interactive demo of a w2v t-SNE layout of Yelp text reviews: https://bl.ocks.org/arnicas/dd2ef348ad8854e40ef2 • Useful warnings/info about making t-SNE layouts (we need a grid search option): http://distill.pub/2016/misread-tsne/
  • 36. LinkedIn Skillz list: English, Mysterious, Garbage?
  • 37. LI skills only from the w2v model in March
  • 39. Word2Vec updated (a week ago)?! Python also didn’t make the “top 50 words per search term,” which is sad.
  • 41. My shitty tsne layout that took 40 minutes on my laptop
  • 42. TensorBoard Projector view: convert your gensim model to TensorFlow TSV files and upload them at http://projector.tensorflow.org/ (english)
  • 45. Most frequent data-related words, sized by frequency per search source. Note: few words from JT ads (pink).
  • 47. Let’s look at job ads again…
  • 50. In business job ads, “skills” are often soft skills or “previous experience doing” something.
  • 51. Market research with students: Algorithm to determine “skill” “matches” is interesting but worrying. It has to be really “good.”
  • 52. –one of my students (who did better after tips on searching for skills I’d taught on other job sites) :) “I feel like we’re all looking at the same vague job ads and competing with each other.”
  • 53. Search by courses taken? Some of these descriptions are really short and vague; what’s a good criterion for a match?
  • 54. Sure, with 2 words, we get some matches…
  • 55. Teaching vs. Jobs, a Gap. “Decision-making” course description: “Entrepreneurs are constantly called upon to solve problems with little time or few resources to step back, in a highly uncertain environment. Drawing on research in management and cognitive psychology, this course aims to provide a few simple tools to develop and support participants’ decision-making ability.” Job ad: “You can make decisions”?
  • 56. So, Extension Ideas • For student job search improvement: • Return to skill extraction problem; use some training data. (Do some qualitative analysis.) • CV matching problem: revisit. Use different skills extraction (n-grams) • Compare description of ALL courses taken (and liked) vs. jobs out there; is this better? • Curriculum development: • Evaluate course descriptions by how well they match jobs • Find “gaps” in teaching — what’s not being taught? (E.g., SQL.) • Could course descriptions (and content) be better? Make this easier for students?
  • 57. My plan now • Generally, starting up a Data Science Institute at EM-Lyon. Money → DS and data vis visitors/confs/talks. • Looking for help with teaching/workshops/tutorials (Paris, Lyon, St. Etienne, Shanghai, Casablanca, India) • Contact me at cherny@em-lyon.com or @arnicas
  • 58. Reminder: The student team • Mathilde TRÉARDE (superb project manager) • Thomas PUCCI (amazing reactjs front-end dev, multiply employed) • Yann VAGINAY (great python data scientist, now doing NLP in a German internship) • Imen FEHRI • Mohamed Amine MEJRI • Roxane MARCILHACY (great python data scientist, now also a web dev; looking for an internship in Paris) • Julien RAULT • Eric DUPRAZ • Sophie REISER (great market research/analyst, not a dev, but looking) • Nicolas LOUVIGNE (top notch visual designer/branding) • Grégoire CANER-CHABRAN • Sarah DAIEN
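Slide 19 lists two scraping gaps: the databases had no scrape date, and the search words varied by source. The Python sketch below shows one possible fix, under stated assumptions. It assumes each scraped ad arrives as a dict. It stamps every record with its source, the search term used, and a UTC scrape time, and it draws search terms from one shared list. The field names (`source`, `search_term`, `scraped_at`) and the `SEARCH_TERMS` list are hypothetical, not the project's actual schema.

```python
from datetime import datetime, timezone

# Hypothetical shared list, so every source is queried with the same terms.
SEARCH_TERMS = ("data scientist", "data analyst", "business analyst")


def make_record(source, search_term, ad, now=None):
    """Return a copy of a scraped ad with provenance fields added.

    Field names are illustrative; the point is that each stored item
    carries its own scrape date, separate from the ad's publication date.
    """
    if search_term not in SEARCH_TERMS:
        raise ValueError(f"unknown search term: {search_term!r}")
    scraped_at = now if now is not None else datetime.now(timezone.utc)
    record = dict(ad)  # copy so the caller's dict is left untouched
    record["source"] = source
    record["search_term"] = search_term
    record["scraped_at"] = scraped_at.isoformat()
    return record


rec = make_record(
    "indeed",
    "data analyst",
    {"title": "Data Analyst H/F", "published": "2017-04-28"},
    now=datetime(2017, 5, 2, 9, 30, tzinfo=timezone.utc),
)
print(rec["scraped_at"])  # 2017-05-02T09:30:00+00:00
```

Passing `now` explicitly only keeps the example deterministic; a crawler would omit it. Rejecting unknown search terms is one way to keep queries consistent across sources.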
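Slide 24 notes that the Indeed API returns links, so the ad text has to be scraped separately. The sketch below covers only the text-extraction half and uses nothing beyond the standard library's `html.parser`. It keeps visible text and skips `<script>` and `<style>` content. Fetching the page (for example with `urllib.request.urlopen`) is left out, and the sample HTML is made up. A real ad page would probably need rules to isolate the ad body, and those are not known here.

```python
from html.parser import HTMLParser


class AdTextExtractor(HTMLParser):
    """Collect visible text chunks, ignoring <script> and <style> content."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &amp; becomes &
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def extract_text(html):
    """Return the visible text of an HTML page as one space-joined string."""
    parser = AdTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


page = (
    "<html><head><style>p{color:red}</style></head><body>"
    "<h1>Data Analyst</h1><p>SQL, Python &amp; Tableau</p>"
    "<script>var x=1;</script></body></html>"
)
print(extract_text(page))  # Data Analyst SQL, Python & Tableau
```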
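Slide 29 reports a spike in the JT publication dates that turned out to be real: different ads, all dated the same day. The sketch below runs a check of that kind. It assumes each ad is a dict with `date` and `text` keys (hypothetical names). It counts ads and distinct normalized texts per day, then flags days far above the median daily count. If the two counts match on a spike day, duplicates are not the cause. The ads and the threshold factor of 3 are invented for illustration.

```python
from collections import defaultdict
from statistics import median


def daily_counts(ads):
    """Map each publication date to (number of ads, number of distinct texts)."""
    by_day = defaultdict(list)
    for ad in ads:
        # Light normalization so case/whitespace variants count as duplicates.
        by_day[ad["date"]].append(" ".join(ad["text"].lower().split()))
    return {day: (len(texts), len(set(texts))) for day, texts in sorted(by_day.items())}


def spike_days(counts, factor=3.0):
    """Dates whose ad count exceeds `factor` times the median daily count."""
    typical = median(total for total, _ in counts.values())
    return [day for day, (total, _) in counts.items() if total > factor * typical]


ads = [
    {"date": "2017-01-09", "text": "Chef de projet marketing"},
    {"date": "2017-01-10", "text": "Data analyst"},
    {"date": "2017-01-11", "text": "Business developer"},
    {"date": "2017-01-12", "text": "Chargé de communication"},
] + [{"date": "2017-01-13", "text": f"Stage marketing digital {i}"} for i in range(5)]

counts = daily_counts(ads)
for day in spike_days(counts):
    total, distinct = counts[day]
    print(day, total, distinct)  # 2017-01-13 5 5
```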
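Slide 34 describes the skills graph: a few “closely related skills” by word2vec distance, with candidates restricted to the LinkedIn skills list. With a trained gensim model, `model.wv.most_similar(...)` would handle the similarity ranking. This sketch reimplements cosine ranking in plain Python so the whitelist step is visible. It filters at query time, and whether that matches the project's pre-filtering of the model is an assumption. The 3-dimensional vectors and the skills set are invented. They are chosen so that scikit-learn sits close to python but is missing from the whitelist, echoing slide 33.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def related_skills(word, vectors, skills, topn=5):
    """Rank whitelisted skills by cosine similarity to `word`."""
    target = vectors[word]
    scored = [
        (other, cosine(target, vec))
        for other, vec in vectors.items()
        if other != word and other in skills
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


# Invented 3-d vectors standing in for a trained word2vec model.
vectors = {
    "python": [0.9, 0.1, 0.0],
    "pandas": [0.8, 0.2, 0.1],
    "scikit-learn": [0.85, 0.05, 0.2],
    "sql": [0.6, 0.5, 0.0],
    "negotiation": [0.0, 0.2, 0.9],
}
skills = {"python", "pandas", "sql", "negotiation"}  # no "scikit-learn"

print([w for w, _ in related_skills("python", vectors, skills, topn=2)])  # ['pandas', 'sql']
```

In this toy data, scikit-learn is about as similar to python as pandas is, yet it never appears in the results. That is the caveat from slide 34 in miniature.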
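Slide 42 suggests converting the gensim model to TSV files for the TensorBoard Projector. The Projector reads two files:

- a tensor file with one tab-separated vector per row;
- a metadata file with one label per row, in the same order (a single-column metadata file has no header row).

The standard-library sketch below builds both from a word-to-vector mapping. In gensim 4.x the words and vectors could come from `model.wv.index_to_key` and `model.wv[word]`; that step is not exercised here, and the sample vectors are made up.

```python
def to_projector_tsv(vectors):
    """Return (tensor_tsv, metadata_tsv) text for the TensorBoard Projector.

    Row i of the tensor text holds the vector for the label on row i of
    the metadata text; vector values are tab-separated.
    """
    tensor_lines, meta_lines = [], []
    for word, vec in vectors.items():
        tensor_lines.append("\t".join(str(x) for x in vec))
        meta_lines.append(word)
    return "\n".join(tensor_lines) + "\n", "\n".join(meta_lines) + "\n"


vectors = {"python": [0.9, 0.1, 0.0], "sql": [0.6, 0.5, 0.0]}
tensor_tsv, metadata_tsv = to_projector_tsv(vectors)
```

Save the two strings as, say, `vectors.tsv` and `metadata.tsv` (hypothetical names). These are the files to upload through the Projector's load option.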
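Slides 53 and 54 ask what a good criterion would be for matching short course descriptions to job ads, noting that two words already yield some matches. The sketch below shows one simple candidate; it is an assumption, not the project's method:

1. Tokenize both texts.
2. Drop a few stopwords.
3. Require a minimum number of shared words.
4. Score by Jaccard overlap.

The stopword list, the threshold, and the sample texts are hypothetical. The second call shows the gap from slide 55: related phrasings such as “decision-making” and “make decisions” share no exact words.

```python
import re

# Tiny hypothetical stopword list; a real one would be much longer.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "with", "you", "can"}


def tokens(text):
    """Lowercased words of two or more letters (accents kept), minus stopwords."""
    return {t for t in re.findall(r"[^\W\d_]{2,}", text.lower()) if t not in STOPWORDS}


def match(course, ad, min_shared=2):
    """Return (shared words, Jaccard score), or None below `min_shared` shared words."""
    c, a = tokens(course), tokens(ad)
    shared = c & a
    if len(shared) < min_shared:
        return None
    return sorted(shared), len(shared) / len(c | a)


print(match("Data analysis with Python and SQL", "Data analyst: SQL, Python, Excel"))
# (['data', 'python', 'sql'], 0.5)
print(match("Decision-making under uncertainty", "You can make decisions quickly"))
# None
```

Stemming or word vectors would narrow the second kind of miss, at the cost of more false matches.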
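Slide 56 proposes revisiting skill extraction with n-grams. The sketch below implements one common heuristic, not the project's code:

- Collect word 1- to 3-grams per ad.
- Drop n-grams that begin or end with a stopword.
- Keep those that occur in at least `min_ads` ads.

This way, multi-word skills such as “gestion de projet” survive as units. The stopword list, the thresholds, and the four sample ads are invented. The result is a candidate list for the qualitative review the slide also calls for, not a set of validated skills.

```python
import re
from collections import Counter

# Hypothetical minimal French/English stopword list.
STOPWORDS = {"de", "du", "des", "la", "le", "les", "et", "en", "avec", "the", "and", "of", "with"}


def tokenize(text):
    """Lowercased words, keeping hyphenated terms such as 'scikit-learn' whole."""
    return re.findall(r"[^\W_]+(?:-[^\W_]+)*", text.lower())


def ngrams(tokens, n):
    """All contiguous n-word sequences, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def candidate_skills(ads, max_n=3, min_ads=2):
    """Count in how many ads each 1..max_n-gram appears; skip n-grams that
    start or end with a stopword and keep those seen in >= min_ads ads."""
    doc_freq = Counter()
    for text in ads:
        toks = tokenize(text)
        grams = set()
        for n in range(1, max_n + 1):
            for gram in ngrams(toks, n):
                words = gram.split()
                if words[0] not in STOPWORDS and words[-1] not in STOPWORDS:
                    grams.add(gram)
        doc_freq.update(grams)  # each ad counts at most once per n-gram
    return {g: c for g, c in doc_freq.items() if c >= min_ads}


ads = [
    "Maîtrise de SQL et gestion de projet",
    "Gestion de projet, SQL, Tableau",
    "Machine learning avec scikit-learn",
    "Expérience en machine learning et gestion de projet",
]
skills = candidate_skills(ads)
for gram, count in sorted(skills.items(), key=lambda kv: (-kv[1], kv[0])):
    print(count, gram)
```

Counting each n-gram once per ad (document frequency) keeps one long ad from dominating. The unigrams “gestion” and “projet” still survive alongside the trigram, which is where training data or manual review would come in.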