IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC)

Dunedin, New Zealand

11 - 14 August 2020
On Understanding
Data Scientists
Paula Pereira 

University of Minho

Portugal 

a77672@alunos.uminho.pt 

Jácome Cunha

University of Minho & HASLab/INESC Tec

Portugal 

jacome@di.uminho.pt 

João Paulo Fernandes 

CISUC, University of Coimbra 

Portugal 

jpf@dei.uc.pt
Forbes
“… each flight generating more than 30
times the amount of data the previous
generation of wide-bodied jets
produced.”
“By 2026, annual data generation should
reach 98 billion gigabytes, or 98 million
terabytes, according to a 2016 estimate
by Oliver Wyman.”
https://www.forbes.com/sites/oliverwyman/2017/06/16/the-data-science-revolution-transforming-aviation/#67f14be67f6c
Data Scientist: The
Sexiest Job of the
21st Century
https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century
We need to know more about
data science and data scientists!
Participants
• We conducted semi-structured interviews with 8 people (1 excluded: P3)

• 3 Female, 5 Male

• 1 Business Intelligence Manager

1 Big Data Architect

2 Data Analysts

4 Data Scientists

• Domains from music management to web development
INTERVIEWEES INFORMATION.
ID  Sex  Age  Job Title                      Education Level                                     Domain
P1  F    30   Data Analyst                   Master, Marketing                                   Music Management
P2  M    36   Business Intelligence Manager  Master, Data Analysis and Decision Support Systems  Retail
P3  M    37   Big Data Architect             Bachelor, Math and Computer Science                 Software Development
P4  M    34   Data Scientist                 PhD, Electrical and Computer Engineering            Telecom.
P5  M    42   Data Scientist                 PhD, Data Mining                                    Virtual Call Center
P6  F    26   Data Analyst                   Master, Mathematics and Computation                 Web Development
P7  F    32   Data Scientist                 Master, Mathematics Engineering                     Virtual Call Center
P8  M    32   Data Scientist                 PhD, Evolutionary Biology                           Telecom.
And the
results are in…
Academic Background
• MSc in Marketing (Bachelor in Hotel Management)

• Bachelor in Economics

• PhD in Electrical and Computer Engineering 

• MSc in Mathematics Engineering 

• MSc in Mathematics and Computation 

• PhD in Evolutionary Biology

• PhD in Data Mining
Implications
• Some feel the need to learn more

“I had been working in auditing information systems for two years, and at that time I decided that data
was ‘the thing’ and I went to get a master’s degree in Data Analysis and Decision Support Systems.” – P2
• Background also defines the kinds of tasks performed

• Only those with training in CS or engineering do tasks related to the
creation of machine learning and deep learning models 

• The remaining participants focus on more direct analysis, based on
statistical measures such as the mean, standard deviation, and distributions
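As an illustration of the kind of direct analysis mentioned above, here is a minimal Python sketch that computes a mean, a sample standard deviation, and a frequency distribution. The data values and the `summarize` helper are hypothetical and chosen only for illustration; they do not come from the study or from any participant's code.

```python
import statistics
from collections import Counter

# Hypothetical sample: daily order counts (illustrative values, not study data).
daily_orders = [12, 15, 11, 15, 18, 15, 12]


def summarize(values):
    """Return the mean, sample standard deviation, and frequency distribution."""
    return {
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),  # sample (n - 1) standard deviation
        "distribution": dict(sorted(Counter(values).items())),
    }


summary = summarize(daily_orders)
print(summary)
```

The standard library is enough for summaries of this size; a team working with larger tables would probably reach for a data-frame library instead.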
Data Sources and Quality
• Data sources

  • Data generated internally by various teams

  • Use of public data sources is also frequent

• The need for data integration is significant

• Data ranges from customer data to operational data

• Only one case (P2) reported using some kind of data quality metrics
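The slides do not say which data quality metrics P2 used. As a hedged sketch of what such metrics can look like, the Python below computes two common ones, completeness and uniqueness, over a small set of records. The records, field names, and both helper functions are assumptions made for illustration.

```python
# Hypothetical customer records; None marks a missing value.
records = [
    {"id": 1, "email": "a@example.com", "country": "PT"},
    {"id": 2, "email": None, "country": "PT"},
    {"id": 3, "email": "c@example.com", "country": None},
    {"id": 3, "email": "c@example.com", "country": None},  # duplicate id
]


def completeness(rows, field):
    """Fraction of rows where `field` is present and not None."""
    if not rows:
        return 0.0
    filled = sum(1 for row in rows if row.get(field) is not None)
    return filled / len(rows)


def uniqueness(rows, key):
    """Ratio of distinct `key` values to the number of rows."""
    if not rows:
        return 0.0
    return len({row.get(key) for row in rows}) / len(rows)


report = {
    "email_completeness": completeness(records, "email"),
    "country_completeness": completeness(records, "country"),
    "id_uniqueness": uniqueness(records, "id"),
}
print(report)
```

Metrics like these are most useful after integrating several sources, since merging tends to introduce both missing fields and duplicated keys.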
More results
Stay tuned ;)
Tools and Programming Languages
• R and Python (not new)

• Choice made according to personal preferences and the type of tasks

• In some cases (P5, P7), the choice is made as a team so that all members use the same technologies

• Most participants do not use data analysis tools

• These tools end up limiting their analysis

• This does not happen when they write their own code

• However, P1 and P2 do a large part of their analysis using only MS Excel (very fast
results)
Difficulties
• Lack of training in the field of data science 

• Access to quality information that is relevant to the problems at hand

“I believe that access to quality information and information relevant to our problems is the greatest
challenge.” – P4
• Lack of teammates 

“In my case, being alone is a big limitation, ...and initially, it is very difficult to have the required business
expertise to understand what are its needs.” – P6
Yes, There Are More Difficulties
• It is often very difficult to translate business problems into data science problems

• Development of stable and scalable code 

“On a personal level, I think my biggest challenge is to write a stable and scalable code because my
training is not very oriented for software engineering.” – P8
• Professionals are being hired for data science positions that should be filled
by other types of professionals

“Companies look at the market and, because there is a demand for data scientists, they also want to hire
one. However, looking at the job’s requirements, their needs would be easily mitigated by other types of
professionals.” – P7
Almost…
• The definition of the data science job is very unclear/eclectic

• It is important to clarify what the different areas of data science are

• This would help the professionals who wish to work in this field to position
themselves correctly

“In my opinion, there are two main areas: technological and application data science. The data scientist of the future
must know how to put himself in the right area of data science to avoid regretting what (s)he is doing.” – P2
• All participants agree that it is a great advantage to have people with different
backgrounds in data science teams because they bring different perspectives on the data
So…
Opportunities for the Research Community
• We still need to learn more about data scientists

• Only then can we help them

• They are also (data science) end users
• Just as we have helped (and still help) software engineers and end-user developers, we need
to help these new end users

• Tailored languages, tools, methodologies, …

• For learning, data cleaning, analysis, visualization, integration, etc.
