Confidentiality Protection in Crowdsourcing
Simran Saxena
linkedin/in/simransaxena @simran_s21 fb.com/simran.s21
Dr. Ponnurangam Kumaraguru (Chair)
Dr. Alpana Dubey (Co-chair)
Thesis Committee
◆ Dr. Arun Balaji Buduru, IIIT Delhi
◆ Dr. Niharika Sachdeva, InfoEdge
◆ Dr. Alpana Dubey, Accenture Technology Labs
◆ Dr. Ponnurangam Kumaraguru, IIIT Delhi
Demo
◆ Proof of Concept
Outline
◆ Research Motivation
◆ Research Aim
◆ Crowdsourcing at the level of organizations
◆ Survey to understand confidentiality
◆ Unboxing a typical crowdsourcing task
◆ Understanding conversations between workers and task posters
◆ Protecting confidentiality loss in crowdsourcing
◆ Conclusion
Shift Towards a Gig-Economy: Benefits
◆ Cost-effective
◆ Faster delivery
◆ Access to a diverse talent pool
Ref: http://isroset.org/pub_paper/IJSRCSE/2-IJSRCSE-0315.pdf
Benefits of Crowdsourcing
◆ Cheaper and faster 60% of the time
◆ Reduces cost by 30-80%
◆ Reduced time-to-market because of parallelism
Ref: 1, 2
Happiness and Satisfaction Levels
74%
95%
72%
Ref: https://fieldnation.com/wp-content/uploads/2017/03/The_2016_Field_Nation_Freelancer_Study_R1V1__1_-2.pdf
Crowdsourcing as the Next Big Revolution
◆ By 2020, freelancers will form 43% of the workforce in the USA
Crowdsourcing in Organizations
◆ Many popular companies make use of crowdsourcing
Crowdsourcing and Software Development
◆ Coding
◆ Designing, Prototyping
◆ Testing
◆ Data Analysis
◆ Mobile Development
◆ Web Development
Crowdsourcing Platforms
And many more..
Stakeholders in Crowdsourcing
Task Poster -> tp
Worker -> w
Company/Organization -> c
Components of a Crowdsourcing Task
◆ ~ 20 attributes per task
○ Task ID
○ Task Title
○ Task Description
○ Task Category
○ Task Type
○ Task Workload
○ ….
Resources shared commonly
◆ Textual Content: Title, Description, Documents, Read-me files
◆ Database
◆ UI Screens
◆ Images
◆ Code
Sample Software Development Task [1]
Why is it a problem?
Exposed data of the Customers:
● Name
● Address
● Email ID
● Phone number
● Details of availed services
○ Date
○ Time
○ Cost
Impacts of Exploited Vulnerability
◆ Economic Exploitation
◆ Information Theft
◆ Intrusion on personal privacy
◆ Social engineering
◆ System penetration/attack
◆ Information Bribery
◆ Sale of personal information
◆ System sabotage
◆ Unauthorized system access
◆ Loss of confidentiality, integrity, and availability (CIA)
Research Aim:
Given Crowdsourced Software Development:
◆ Identify
○ stages involved in confidentiality loss
○ critical attributes of a task
○ type of critical data
◆ Develop techniques to analyze tasks
INPUT: Task Title, Task Description
OUTPUT: A list of sensitive terms
The Crowdsourcing Cycle
Stages from which information leaks could occur
◆ PII details are sensitive
◆ Sharing a database publicly may compromise confidentiality
◆ Sensitive information in code
◆ Sensitive components in the comments
Survey to validate some assumptions
◆ Wireframes and UX examples may give out information about the company posting the job, such as its logo
◆ The market for the end result of a task can be inferred from the task
◆ The domain of a task can be identified from the type of details it asks for
Survey to understand Confidentiality
Suppose you are a project manager in a multinational
organization and you need to get some job done by an external
freelancer. These jobs require sharing some details like sample
code, documents, UX wireframes, etc. and some information
might be sensitive in these resources. You need to answer the
following questions considering the above scenario.
Likert-scale-based Questions
[Figure: Average Sensitivity Levels of Sharing Various Details. Axes: Type of Data, Sensitivity Levels]
Some observations
◆ 6.4% of the respondents believed that sharing a database publicly is not a sensitive activity
◆ 2.2% of the respondents think that sharing API keys is okay
Situational Questions
Sample Task Question
This is a sample task posted by a Task Poster. According to you, what sensitive data does it contain?
I am using a online platform called tomakeanapp.com, there is a module which you can upload a picture but I HAVE TO DO IT one
by one.
I need a imacro or anyscript that could upload a batch of photo from a folder. No other requirement. That's it. It is dead simple and
first come first serve. Please apply
If you want to know what i am saying.
Go to testmyapp.com
Username: trialappdemo
Password: password134
>Click my project> Click "Edit", then you will see "TraiApp" in YOUR ACTIVITY
Then click "Edit" and you will see the upload screen I talked above
Sample Code Question
Following is a sample code snippet, point out the parts in the code that you think are
sensitive:
A wireframe of an App has been attached, what all can you infer
from it?
What can you infer about the type of industry that this app is for?
Other Confidential Entities
1. Master Database of a company
2. Personal Information:
   a. Name
   b. Email ID
   c. SSN / Unique Identification Number
   d. Address
   e. Date of Birth
   f. Employee ID
   g. Health related information
3. Context for the given task
4. Company related details:
   a. Details of the clients of a company
   b. Details of a company
   c. Details of a company’s future projects
   d. Any sort of financial data
   e. Proprietary Data
7. Credentials of any sort: Usernames and Passwords
8. Bank Account Number / Credit or Debit Card Details
9. API Keys
10. Server Details
11. Google Play keys / App Secrets / OAuth Tokens
12. Named Entities
13. Variable names / Class Names
14. Test Inputs
15. Sensitive Datasets where the users can be uniquely identified
16. Images and Diagrams with confidential information
Dataset of 65,000 tasks
◆ 4 unique Task Categories:
○ Data Science & Analytics
○ IT & Networking
○ Web, Mobile & Software Dev
○ Writing
◆ 22 Unique Task Subcategories
○ Game Development
○ Data Visualization
○ Network and System Administration
○ Machine Learning
○ ...
Analysis of attributes of a task
◆ ~ 20 fields per task
○ Task ID
○ Task Title
○ Task Description
○ Task Category
○ Task Sub-Category
○ Date of creation of Task
○ Task Type
○ Task Workload
○ Task Duration
○ Task Budget
○ Preferred Location of the worker
○ Preferred range of feedback score
○ Level of English Skills required
○ Payment range
○ Range of working hours per day
○ Requirement of Cover Letter
○ Requirement of Portfolio
○ Candidates Registered for the task
○ Skills required for the task
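The attributes listed above could be held in a single record per task. The following is only a minimal sketch: the class name, field names, and field types are assumptions made for illustration, since the slides list the attributes but not their formats.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class Task:
    """One crowdsourcing task. Field names and types are illustrative assumptions."""
    task_id: str
    title: str
    description: str
    category: str = ""
    sub_category: str = ""
    created_on: str = ""              # date of creation, kept as plain text here
    task_type: str = ""
    workload: str = ""
    duration: str = ""
    budget: Optional[float] = None
    preferred_location: str = ""
    feedback_score_range: str = ""
    english_level: str = ""
    payment_range: str = ""
    hours_per_day_range: str = ""
    cover_letter_required: bool = False
    portfolio_required: bool = False
    candidates_registered: int = 0
    skills: List[str] = field(default_factory=list)

    def text(self) -> str:
        """Title and description joined: the free-text part of a task."""
        return f"{self.title}\n{self.description}"


# Hypothetical example task, not taken from the dataset.
task = Task(task_id="t-001", title="Batch photo upload script",
            description="Need a script to upload photos from a folder.")
print(len(fields(Task)), "fields;", task.text().splitlines()[0])
```

Keeping the title and description behind one `text()` helper is a design choice of this sketch; it simply groups the two free-text attributes in one place.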
Analysis of Tasks
◆ Crucial attributes:
○ Task Title
○ Task Description
◆ Observations:
○ 23% of tasks contained links
○ 2.2% of tasks had credentials
○ 2.46% of tasks had GitHub links
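Shares like the ones observed (tasks with links, credentials, GitHub links) can be estimated with simple pattern matching over the task text. This is a minimal sketch under stated assumptions: the regular expressions and the four sample tasks are hypothetical and are not the rules or data used in the thesis.

```python
import re

# Assumed patterns; real task text would likely need broader rules.
URL_RE = re.compile(r"https?://\S+|\bwww\.\S+", re.IGNORECASE)
GITHUB_RE = re.compile(r"\bgithub\.com/\S+", re.IGNORECASE)
CREDENTIAL_RE = re.compile(
    r"\b(?:username|user name|login|password|passwd|pwd)\s*[:=]\s*\S+",
    re.IGNORECASE,
)


def share(tasks, pattern):
    """Percentage of task texts in which the pattern occurs at least once."""
    if not tasks:
        return 0.0
    hits = sum(1 for text in tasks if pattern.search(text))
    return 100.0 * hits / len(tasks)


# Hypothetical task texts for illustration only.
sample_tasks = [
    "Fix the login page, repo at https://github.com/example/app",
    "Go to the demo site. Username: demo_user Password: demo_pass",
    "Write a blog post about data visualization",
    "See www.example.com for the current design",
]

for name, pattern in [("links", URL_RE),
                      ("GitHub links", GITHUB_RE),
                      ("credentials", CREDENTIAL_RE)]:
    print(f"{name}: {share(sample_tasks, pattern):.1f}%")
```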
Understanding Conversations between Workers and Task Posters
◆ ~214 lines per task (average)
◆ ~30 files exchanged [min: 0, max: 90]
◆ 42% of tasks involved Skype calls
◆ 14% of tasks involved sharing data over cloud
Analysis of a Task for Sensitivity
Pipeline
List of sensitive terms
Algorithm
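The algorithm slide carries no text in this transcript, so the following is not the thesis algorithm. It is a minimal rule-based sketch of the stated contract only (task title and description in, list of sensitive terms out). The rule set, the function name `find_sensitive_terms`, and the patterns are assumptions; any NLP step such as named-entity recognition is left out. The input reuses three lines from the sample task shown earlier.

```python
import re
from typing import List

# Each rule: (label, compiled pattern, capture group that holds the term).
# The patterns are illustrative assumptions, not an exhaustive rule set.
RULES = [
    ("credential",
     re.compile(r"\b(?:username|password)\s*:\s*(\S+)", re.IGNORECASE), 1),
    ("email",
     re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"), 0),
    ("url",
     re.compile(r"\b(?:https?://)?(?:[\w-]+\.)+(?:com|org|net|io)\b(?:/\S*)?",
                re.IGNORECASE), 0),
]


def find_sensitive_terms(title: str, description: str) -> List[str]:
    """Return the terms matched by any rule, in rule order, without duplicates."""
    text = f"{title}\n{description}"
    found: List[str] = []
    for _label, pattern, group in RULES:
        for match in pattern.finditer(text):
            term = match.group(group)
            if term not in found:
                found.append(term)
    return found


description = (
    "Go to testmyapp.com\n"
    "Username: trialappdemo\n"
    "Password: password134\n"
)
print(find_sensitive_terms("Batch upload script", description))
```

Rules are kept as data rather than code so that a curated list of patterns can grow without touching the matching loop.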
Testing
Induced sensitivity in the 65,000 tasks dataset
◆ Curated a list of sensitive terms:
○ Names of companies
○ Email Addresses from the Enron Dataset
○ Dummy API Keys, Passwords, etc
○ List of URLs
○ List of countries and cities
○ Randomly generated usernames
○ ...
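One way to induce sensitivity is to insert curated terms into task text and record what was inserted, which yields ground truth for evaluation. The slides do not describe the insertion strategy, so everything below is an assumption: the function name, the random word position, the fixed seed, and the dummy terms.

```python
import random
from typing import List, Tuple


def induce_sensitivity(text: str, terms: List[str], k: int,
                       rng: random.Random) -> Tuple[str, List[str]]:
    """Insert k randomly chosen terms at random word positions.

    Returns the modified text and the list of inserted terms (the labels).
    """
    words = text.split()
    chosen = rng.sample(terms, k)
    for term in chosen:
        position = rng.randint(0, len(words))  # may also append at the end
        words.insert(position, term)
    return " ".join(words), chosen


# Dummy terms standing in for the curated list.
curated_terms = [
    "alice@example.com",
    "API_KEY_DUMMY_123",
    "https://example.org/repo",
    "Springfield",
]

rng = random.Random(0)  # fixed seed so the run is repeatable
seeded_text, labels = induce_sensitivity(
    "Need a developer to fix the checkout page", curated_terms, 2, rng)
print(seeded_text)
print(labels)
```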
Performance Metrics
Precision Recall F1 Score
0.68 0.82 0.74
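For reference, the three metrics relate as follows: precision is the fraction of flagged terms that are truly sensitive, recall is the fraction of truly sensitive terms that were flagged, and F1 is their harmonic mean. The sketch below uses hypothetical term sets; only the final check uses the reported precision and recall, which are consistent with the reported F1 score.

```python
def precision_recall_f1(predicted: set, actual: set):
    """Set-based precision, recall, and F1 for detected vs. true sensitive terms."""
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    f1 = harmonic_mean(precision, recall)
    return precision, recall, f1


def harmonic_mean(precision: float, recall: float) -> float:
    """F1 score: 2PR / (P + R), defined as 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical detector output and ground truth.
predicted = {"a", "b", "c", "d"}
actual = {"a", "b", "e"}
p, r, f = precision_recall_f1(predicted, actual)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")

# Consistency check on the reported figures.
print(round(harmonic_mean(0.68, 0.82), 2))
```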
Real World Application
◆ Analyze
○ Tasks
○ Content before putting on cloud
○ Content before sharing
Conclusion
◆ Narrow down on the crucial stages in the crowdsourcing cycle
◆ Enumerate critical attributes in a task
◆ Highlight type of confidential data
◆ NLP- and rule-based algorithm to detect confidentiality loss
Challenges, Limitations and Future Work
◆ Dataset availability
◆ Lack of labeled data
◆ Expand the algorithm
○ Cater to images, databases, etc
◆ Incorporate Machine Learning techniques for classification
◆ Sanitization
◆ More fine grained analysis
Acknowledgement
◆ Committee Members
◆ Abhinav and Sakshi, Accenture Labs Bangalore
◆ Indira, Gurpriya, Arpit, Shubham, Sonu, Divyansh
◆ Members of Precog family
◆ Family and friends
References
◆ http://www.timbroder.com/2015/01/my-2375-amazon-ec2-mistake.html
◆ https://www.topcoder.com/about-topcoder/customer-stories/
◆ https://www.merriam-webster.com/dictionary/crowdsourcing
◆ https://www.figure-eight.com/
◆ https://www.flaticon.com/authors/freepik
Thanks!
simran13104@iiitd.ac.in
@simran_s21

Mais conteúdo relacionado

Semelhante a Simran confidentiality protection in crowdsourcing

AI Talks Live - ML.NET and NLP (with ONNX)
AI Talks Live - ML.NET and NLP (with ONNX)AI Talks Live - ML.NET and NLP (with ONNX)
AI Talks Live - ML.NET and NLP (with ONNX)Mauro Bennici
 
Agile London at Ticketmaster
Agile London at TicketmasterAgile London at Ticketmaster
Agile London at TicketmasterBilly Jenkins
 
Introduction to Machine Learning with Python ( PDFDrive.com ).pdf
Introduction to Machine Learning with Python ( PDFDrive.com ).pdfIntroduction to Machine Learning with Python ( PDFDrive.com ).pdf
Introduction to Machine Learning with Python ( PDFDrive.com ).pdfbisan3
 
Using Product Box to Build the Complete Developer
Using Product Box to Build the Complete DeveloperUsing Product Box to Build the Complete Developer
Using Product Box to Build the Complete DeveloperLuke Hohmann
 
Organisering av digitale prosjekt: Hva har IT-bransjen lært om store prosjekter?
Organisering av digitale prosjekt: Hva har IT-bransjen lært om store prosjekter?Organisering av digitale prosjekt: Hva har IT-bransjen lært om store prosjekter?
Organisering av digitale prosjekt: Hva har IT-bransjen lært om store prosjekter?Torgeir Dingsøyr
 
OSFair2017 Training | OpenMinTeD platform training
OSFair2017 Training | OpenMinTeD platform trainingOSFair2017 Training | OpenMinTeD platform training
OSFair2017 Training | OpenMinTeD platform trainingOpen Science Fair
 
Software craftsmanship and you a strong foundation in your team
Software craftsmanship and you a strong foundation in your teamSoftware craftsmanship and you a strong foundation in your team
Software craftsmanship and you a strong foundation in your teamDattatray Kale
 
The Role of Data Wrangling in Driving Hadoop Adoption
The Role of Data Wrangling in Driving Hadoop AdoptionThe Role of Data Wrangling in Driving Hadoop Adoption
The Role of Data Wrangling in Driving Hadoop AdoptionInside Analysis
 
Women in Automation - Intro to Studio Session 1
Women in Automation - Intro to Studio Session 1Women in Automation - Intro to Studio Session 1
Women in Automation - Intro to Studio Session 1Cristina Vidu
 
Unlocking AI Potential: Leveraging PIA Processes for Comprehensive Impact Ass...
Unlocking AI Potential: Leveraging PIA Processes for Comprehensive Impact Ass...Unlocking AI Potential: Leveraging PIA Processes for Comprehensive Impact Ass...
Unlocking AI Potential: Leveraging PIA Processes for Comprehensive Impact Ass...TrustArc
 
😊 Good Closing Paragraph. What Are The Best Ways To Start A Conclusion .pdf
😊 Good Closing Paragraph. What Are The Best Ways To Start A Conclusion .pdf😊 Good Closing Paragraph. What Are The Best Ways To Start A Conclusion .pdf
😊 Good Closing Paragraph. What Are The Best Ways To Start A Conclusion .pdfAngela Baxter
 
😊 Good Closing Paragraph. What Are The Best Ways To Start A Conclusion .pdf
😊 Good Closing Paragraph. What Are The Best Ways To Start A Conclusion .pdf😊 Good Closing Paragraph. What Are The Best Ways To Start A Conclusion .pdf
😊 Good Closing Paragraph. What Are The Best Ways To Start A Conclusion .pdfBrooke Lord
 
Autodiscovery or The long tail of open data
Autodiscovery or The long tail of open dataAutodiscovery or The long tail of open data
Autodiscovery or The long tail of open dataConnected Data World
 
Securing The Reality of Multiple Cloud Apps: Pandora's Story
Securing The Reality of Multiple Cloud Apps: Pandora's StorySecuring The Reality of Multiple Cloud Apps: Pandora's Story
Securing The Reality of Multiple Cloud Apps: Pandora's StoryCloudLock
 
DSDT Meetup March 2019
DSDT Meetup March 2019DSDT Meetup March 2019
DSDT Meetup March 2019DSDT_MTL
 
Business Analyst Series 2023 - Week 1 Session 1
Business Analyst Series 2023 -  Week 1 Session 1Business Analyst Series 2023 -  Week 1 Session 1
Business Analyst Series 2023 - Week 1 Session 1DianaGray10
 
Adversary Emulation and Red Team Exercises - EDUCAUSE
Adversary Emulation and Red Team Exercises - EDUCAUSEAdversary Emulation and Red Team Exercises - EDUCAUSE
Adversary Emulation and Red Team Exercises - EDUCAUSEJorge Orchilles
 

Semelhante a Simran confidentiality protection in crowdsourcing (20)

AI Talks Live - ML.NET and NLP (with ONNX)
AI Talks Live - ML.NET and NLP (with ONNX)AI Talks Live - ML.NET and NLP (with ONNX)
AI Talks Live - ML.NET and NLP (with ONNX)
 
Agile London at Ticketmaster
Agile London at TicketmasterAgile London at Ticketmaster
Agile London at Ticketmaster
 
Introduction to Machine Learning with Python ( PDFDrive.com ).pdf
Introduction to Machine Learning with Python ( PDFDrive.com ).pdfIntroduction to Machine Learning with Python ( PDFDrive.com ).pdf
Introduction to Machine Learning with Python ( PDFDrive.com ).pdf
 
Using Product Box to Build the Complete Developer
Using Product Box to Build the Complete DeveloperUsing Product Box to Build the Complete Developer
Using Product Box to Build the Complete Developer
 
Organisering av digitale prosjekt: Hva har IT-bransjen lært om store prosjekter?
Organisering av digitale prosjekt: Hva har IT-bransjen lært om store prosjekter?Organisering av digitale prosjekt: Hva har IT-bransjen lært om store prosjekter?
Organisering av digitale prosjekt: Hva har IT-bransjen lært om store prosjekter?
 
OSFair2017 Training | OpenMinTeD platform training
OSFair2017 Training | OpenMinTeD platform trainingOSFair2017 Training | OpenMinTeD platform training
OSFair2017 Training | OpenMinTeD platform training
 
Software craftsmanship and you a strong foundation in your team
Software craftsmanship and you a strong foundation in your teamSoftware craftsmanship and you a strong foundation in your team
Software craftsmanship and you a strong foundation in your team
 
The Role of Data Wrangling in Driving Hadoop Adoption
The Role of Data Wrangling in Driving Hadoop AdoptionThe Role of Data Wrangling in Driving Hadoop Adoption
The Role of Data Wrangling in Driving Hadoop Adoption
 
Women in Automation - Intro to Studio Session 1
Women in Automation - Intro to Studio Session 1Women in Automation - Intro to Studio Session 1
Women in Automation - Intro to Studio Session 1
 
Unlocking AI Potential: Leveraging PIA Processes for Comprehensive Impact Ass...
Unlocking AI Potential: Leveraging PIA Processes for Comprehensive Impact Ass...Unlocking AI Potential: Leveraging PIA Processes for Comprehensive Impact Ass...
Unlocking AI Potential: Leveraging PIA Processes for Comprehensive Impact Ass...
 
CYBRScore Course Catalog
CYBRScore Course CatalogCYBRScore Course Catalog
CYBRScore Course Catalog
 
😊 Good Closing Paragraph. What Are The Best Ways To Start A Conclusion .pdf
😊 Good Closing Paragraph. What Are The Best Ways To Start A Conclusion .pdf😊 Good Closing Paragraph. What Are The Best Ways To Start A Conclusion .pdf
😊 Good Closing Paragraph. What Are The Best Ways To Start A Conclusion .pdf
 
😊 Good Closing Paragraph. What Are The Best Ways To Start A Conclusion .pdf
😊 Good Closing Paragraph. What Are The Best Ways To Start A Conclusion .pdf😊 Good Closing Paragraph. What Are The Best Ways To Start A Conclusion .pdf
😊 Good Closing Paragraph. What Are The Best Ways To Start A Conclusion .pdf
 
Autodiscovery or The long tail of open data
Autodiscovery or The long tail of open dataAutodiscovery or The long tail of open data
Autodiscovery or The long tail of open data
 
Beyond User Research
Beyond User ResearchBeyond User Research
Beyond User Research
 
Securing The Reality of Multiple Cloud Apps: Pandora's Story
Securing The Reality of Multiple Cloud Apps: Pandora's StorySecuring The Reality of Multiple Cloud Apps: Pandora's Story
Securing The Reality of Multiple Cloud Apps: Pandora's Story
 
automatic extraction of job information from job vacancies
automatic extraction of job information from job vacanciesautomatic extraction of job information from job vacancies
automatic extraction of job information from job vacancies
 
DSDT Meetup March 2019
DSDT Meetup March 2019DSDT Meetup March 2019
DSDT Meetup March 2019
 
Business Analyst Series 2023 - Week 1 Session 1
Business Analyst Series 2023 -  Week 1 Session 1Business Analyst Series 2023 -  Week 1 Session 1
Business Analyst Series 2023 - Week 1 Session 1
 
Adversary Emulation and Red Team Exercises - EDUCAUSE
Adversary Emulation and Red Team Exercises - EDUCAUSEAdversary Emulation and Red Team Exercises - EDUCAUSE
Adversary Emulation and Red Team Exercises - EDUCAUSE
 

Mais de IIIT Hyderabad

Responsible & Safe AI Systems at ACM India ROCS at IIT Bombay
Responsible & Safe AI Systems at ACM India ROCS at IIT BombayResponsible & Safe AI Systems at ACM India ROCS at IIT Bombay
Responsible & Safe AI Systems at ACM India ROCS at IIT BombayIIIT Hyderabad
 
International Collaboration: Experiences, Challenges, Success stories
International Collaboration: Experiences, Challenges, Success storiesInternational Collaboration: Experiences, Challenges, Success stories
International Collaboration: Experiences, Challenges, Success storiesIIIT Hyderabad
 
Responsible & Safe AI: #LegalBias #Inconsistency #BiasinLLMs #MultiModalBias
Responsible & Safe AI: #LegalBias #Inconsistency #BiasinLLMs #MultiModalBiasResponsible & Safe AI: #LegalBias #Inconsistency #BiasinLLMs #MultiModalBias
Responsible & Safe AI: #LegalBias #Inconsistency #BiasinLLMs #MultiModalBiasIIIT Hyderabad
 
Identify, Inspect and Intervene Multimodal Fake News
Identify, Inspect and Intervene Multimodal Fake NewsIdentify, Inspect and Intervene Multimodal Fake News
Identify, Inspect and Intervene Multimodal Fake NewsIIIT Hyderabad
 
Data Science for Social Good: #MentalHealth #CodeMix #LegalNLP #AISafety
Data Science for Social Good: #MentalHealth #CodeMix #LegalNLP #AISafetyData Science for Social Good: #MentalHealth #CodeMix #LegalNLP #AISafety
Data Science for Social Good: #MentalHealth #CodeMix #LegalNLP #AISafetyIIIT Hyderabad
 
It is our choices, Harry, that show what we truly are, far more than our abil...
It is our choices, Harry, that show what we truly are, far more than our abil...It is our choices, Harry, that show what we truly are, far more than our abil...
It is our choices, Harry, that show what we truly are, far more than our abil...IIIT Hyderabad
 
Beyond the Surface: A Computational Exploration of Linguistic Ambiguity
Beyond the Surface: A Computational Exploration of Linguistic AmbiguityBeyond the Surface: A Computational Exploration of Linguistic Ambiguity
Beyond the Surface: A Computational Exploration of Linguistic AmbiguityIIIT Hyderabad
 
Data Science for Social Good: #LegalNLP #AlgorithmicBias...
Data Science for Social Good:                      #LegalNLP #AlgorithmicBias...Data Science for Social Good:                      #LegalNLP #AlgorithmicBias...
Data Science for Social Good: #LegalNLP #AlgorithmicBias...IIIT Hyderabad
 
How to Write a (Good) Research Paper
How to Write a (Good) Research Paper How to Write a (Good) Research Paper
How to Write a (Good) Research Paper IIIT Hyderabad
 
Data Science for Social Good: #LegalNLP #AlgorithmicBias
Data Science for Social Good: #LegalNLP #AlgorithmicBiasData Science for Social Good: #LegalNLP #AlgorithmicBias
Data Science for Social Good: #LegalNLP #AlgorithmicBiasIIIT Hyderabad
 
Social Computing Research in India
Social Computing Research in IndiaSocial Computing Research in India
Social Computing Research in IndiaIIIT Hyderabad
 
Social Computing Research in India
Social Computing Research in IndiaSocial Computing Research in India
Social Computing Research in IndiaIIIT Hyderabad
 
Modeling Online User Interactions and their Offline effects on Socio-Technica...
Modeling Online User Interactions and their Offline effects on Socio-Technica...Modeling Online User Interactions and their Offline effects on Socio-Technica...
Modeling Online User Interactions and their Offline effects on Socio-Technica...IIIT Hyderabad
 
Privacy. Winter School on “Topics in Digital Trust”. IIT Bombay
Privacy. Winter School on “Topics in Digital Trust”. IIT BombayPrivacy. Winter School on “Topics in Digital Trust”. IIT Bombay
Privacy. Winter School on “Topics in Digital Trust”. IIT BombayIIIT Hyderabad
 
It is our choices, Harry, that show what we truly are, far more than our abil...
It is our choices, Harry, that show what we truly are, far more than our abil...It is our choices, Harry, that show what we truly are, far more than our abil...
It is our choices, Harry, that show what we truly are, far more than our abil...IIIT Hyderabad
 
It is our choices, Harry, that show what we truly are, far more than our abil...
It is our choices, Harry, that show what we truly are, far more than our abil...It is our choices, Harry, that show what we truly are, far more than our abil...
It is our choices, Harry, that show what we truly are, far more than our abil...IIIT Hyderabad
 
Leveraging Social Media for Financial Advice
Leveraging Social Media for Financial AdviceLeveraging Social Media for Financial Advice
Leveraging Social Media for Financial AdviceIIIT Hyderabad
 
Development of Stress Induction and Detection System to Study its Effect on B...
Development of Stress Induction and Detection System to Study its Effect on B...Development of Stress Induction and Detection System to Study its Effect on B...
Development of Stress Induction and Detection System to Study its Effect on B...IIIT Hyderabad
 
A Framework for Automatic Question Answering in Indian Languages
A Framework for Automatic Question Answering in Indian LanguagesA Framework for Automatic Question Answering in Indian Languages
A Framework for Automatic Question Answering in Indian LanguagesIIIT Hyderabad
 

Mais de IIIT Hyderabad (20)

Responsible & Safe AI Systems at ACM India ROCS at IIT Bombay
Responsible & Safe AI Systems at ACM India ROCS at IIT BombayResponsible & Safe AI Systems at ACM India ROCS at IIT Bombay
Responsible & Safe AI Systems at ACM India ROCS at IIT Bombay
 
International Collaboration: Experiences, Challenges, Success stories
International Collaboration: Experiences, Challenges, Success storiesInternational Collaboration: Experiences, Challenges, Success stories
International Collaboration: Experiences, Challenges, Success stories
 
Responsible & Safe AI: #LegalBias #Inconsistency #BiasinLLMs #MultiModalBias
Responsible & Safe AI: #LegalBias #Inconsistency #BiasinLLMs #MultiModalBiasResponsible & Safe AI: #LegalBias #Inconsistency #BiasinLLMs #MultiModalBias
Responsible & Safe AI: #LegalBias #Inconsistency #BiasinLLMs #MultiModalBias
 
Identify, Inspect and Intervene Multimodal Fake News
Identify, Inspect and Intervene Multimodal Fake NewsIdentify, Inspect and Intervene Multimodal Fake News
Identify, Inspect and Intervene Multimodal Fake News
 
#ChatGPT #ResponsibleAI
#ChatGPT #ResponsibleAI#ChatGPT #ResponsibleAI
#ChatGPT #ResponsibleAI
 
Data Science for Social Good: #MentalHealth #CodeMix #LegalNLP #AISafety
Data Science for Social Good: #MentalHealth #CodeMix #LegalNLP #AISafetyData Science for Social Good: #MentalHealth #CodeMix #LegalNLP #AISafety
Data Science for Social Good: #MentalHealth #CodeMix #LegalNLP #AISafety
 
It is our choices, Harry, that show what we truly are, far more than our abil...
It is our choices, Harry, that show what we truly are, far more than our abil...It is our choices, Harry, that show what we truly are, far more than our abil...
It is our choices, Harry, that show what we truly are, far more than our abil...
 
Beyond the Surface: A Computational Exploration of Linguistic Ambiguity
Beyond the Surface: A Computational Exploration of Linguistic AmbiguityBeyond the Surface: A Computational Exploration of Linguistic Ambiguity
Beyond the Surface: A Computational Exploration of Linguistic Ambiguity
 
Data Science for Social Good: #LegalNLP #AlgorithmicBias...
Data Science for Social Good:                      #LegalNLP #AlgorithmicBias...Data Science for Social Good:                      #LegalNLP #AlgorithmicBias...
Data Science for Social Good: #LegalNLP #AlgorithmicBias...
 
How to Write a (Good) Research Paper
How to Write a (Good) Research Paper How to Write a (Good) Research Paper
How to Write a (Good) Research Paper
 
Data Science for Social Good: #LegalNLP #AlgorithmicBias
Data Science for Social Good: #LegalNLP #AlgorithmicBiasData Science for Social Good: #LegalNLP #AlgorithmicBias
Data Science for Social Good: #LegalNLP #AlgorithmicBias
 
Social Computing Research in India
Social Computing Research in IndiaSocial Computing Research in India
Social Computing Research in India
 
Social Computing Research in India
Social Computing Research in IndiaSocial Computing Research in India
Social Computing Research in India
 
Modeling Online User Interactions and their Offline effects on Socio-Technica...
Modeling Online User Interactions and their Offline effects on Socio-Technica...Modeling Online User Interactions and their Offline effects on Socio-Technica...
Modeling Online User Interactions and their Offline effects on Socio-Technica...
 
Privacy. Winter School on “Topics in Digital Trust”. IIT Bombay
Privacy. Winter School on “Topics in Digital Trust”. IIT BombayPrivacy. Winter School on “Topics in Digital Trust”. IIT Bombay
Privacy. Winter School on “Topics in Digital Trust”. IIT Bombay
 
It is our choices, Harry, that show what we truly are, far more than our abil...
It is our choices, Harry, that show what we truly are, far more than our abil...It is our choices, Harry, that show what we truly are, far more than our abil...
It is our choices, Harry, that show what we truly are, far more than our abil...
 
It is our choices, Harry, that show what we truly are, far more than our abil...
It is our choices, Harry, that show what we truly are, far more than our abil...It is our choices, Harry, that show what we truly are, far more than our abil...
It is our choices, Harry, that show what we truly are, far more than our abil...
 
Leveraging Social Media for Financial Advice
Leveraging Social Media for Financial AdviceLeveraging Social Media for Financial Advice
Leveraging Social Media for Financial Advice
 
Development of Stress Induction and Detection System to Study its Effect on B...
Development of Stress Induction and Detection System to Study its Effect on B...Development of Stress Induction and Detection System to Study its Effect on B...
Development of Stress Induction and Detection System to Study its Effect on B...
 
A Framework for Automatic Question Answering in Indian Languages
A Framework for Automatic Question Answering in Indian LanguagesA Framework for Automatic Question Answering in Indian Languages
A Framework for Automatic Question Answering in Indian Languages
 

Último

Levelling - Rise and fall - Height of instrument method
Levelling - Rise and fall - Height of instrument methodLevelling - Rise and fall - Height of instrument method
Levelling - Rise and fall - Height of instrument methodManicka Mamallan Andavar
 
KCD Costa Rica 2024 - Nephio para parvulitos
KCD Costa Rica 2024 - Nephio para parvulitosKCD Costa Rica 2024 - Nephio para parvulitos
KCD Costa Rica 2024 - Nephio para parvulitosVictor Morales
 
ROBOETHICS-CCS345 ETHICS AND ARTIFICIAL INTELLIGENCE.ppt
ROBOETHICS-CCS345 ETHICS AND ARTIFICIAL INTELLIGENCE.pptROBOETHICS-CCS345 ETHICS AND ARTIFICIAL INTELLIGENCE.ppt
ROBOETHICS-CCS345 ETHICS AND ARTIFICIAL INTELLIGENCE.pptJohnWilliam111370
 
multiple access in wireless communication
multiple access in wireless communicationmultiple access in wireless communication
multiple access in wireless communicationpanditadesh123
 
Python Programming for basic beginners.pptx
Python Programming for basic beginners.pptxPython Programming for basic beginners.pptx
Python Programming for basic beginners.pptxmohitesoham12
 
Javier_Fernandez_CARS_workshop_presentation.pptx
Simran confidentiality protection in crowdsourcing

  • 1. Confidentiality Protection in Crowdsourcing Simran Saxena linkedin/in/simransaxena @simran_s21 fb.com/simran.s21 Dr. Ponnurangam Kumaraguru (Chair) Dr. Alpana Dubey (Co-chair)
  • 2. Thesis Committee ◆ Dr. Arun Balaji Buduru, IIIT Delhi ◆ Dr. Niharika Sachdeva, InfoEdge ◆ Dr. Alpana Dubey, Accenture Technology Labs ◆ Dr. Ponnurangam Kumaraguru, IIIT Delhi
  • 3. Demo ◆ Proof of Concept
  • 5. Outline ◆ Research Motivation ◆ Research Aim ◆ Crowdsourcing at the level of organizations ◆ Survey to understand confidentiality ◆ Unboxing a typical crowdsourcing task ◆ Understanding conversations between workers and task posters ◆ Protecting against confidentiality loss in crowdsourcing ◆ Conclusion
  • 6. Shift Towards a Gig-Economy: Benefits ◆ Cost effective ◆ Faster delivery ◆ Access to a diverse talent pool Ref: http://isroset.org/pub_paper/IJSRCSE/2-IJSRCSE-0315.pdf
  • 7. Benefits of Crowdsourcing ◆ Cheaper and faster 60% of the time ◆ Reduces cost by 30-80% ◆ Reduced time-to-market because of parallelism Ref: 1, 2
  • 8. Happiness and Satisfaction Levels: 74%, 95%, 72% Ref: https://fieldnation.com/wp-content/uploads/2017/03/The_2016_Field_Nation_Freelancer_Study_R1V1__1_-2.pdf
  • 9. Crowdsourcing as the Next Big Revolution: By 2020, freelancers will form 43% of the workforce in the USA
  • 10. Crowdsourcing in Organizations: Many popular companies make use of crowdsourcing
  • 11. Crowdsourcing and Software Development ◆ Coding ◆ Designing, Prototyping ◆ Testing ◆ Data Analysis ◆ Mobile Development ◆ Web Development
  • 13. Stakeholders in Crowdsourcing ◆ Task Poster -> tp ◆ Worker -> w ◆ Company/Organization -> c
  • 14. Components of a Crowdsourcing Task ◆ ~20 attributes per task ○ Task ID ○ Task Title ○ Task Description ○ Task Category ○ Task Type ○ Task Workload ○ …
  • 15. Resources shared commonly ◆ Textual content: title, description, documents, read-me files ◆ Database ◆ UI screens ◆ Images ◆ Code
  • 17. Why is it a problem? [Figure: panels (A) and (B)]
  • 18. [Figure: panels (A) and (B)] Exposed data of the customers: ● Name ● Address ● Email ID ● Phone number ● Details of availed services ○ Date ○ Time ○ Cost
  • 21. Impacts of Exploited Vulnerability ◆ Economic exploitation ◆ Information theft ◆ Intrusion on personal privacy ◆ Social engineering ◆ System penetration/attack ◆ Information bribery ◆ Sale of personal information ◆ System sabotage ◆ Unauthorized system access ◆ Loss of confidentiality, integrity, and availability (CIA)
  • 22. Research Aim: Given crowdsourced software development: ◆ Identify ○ stages involved in confidentiality loss ○ critical attributes of a task ○ type of critical data ◆ Develop techniques to analyze tasks ◆ INPUT: Task Title, Task Description ◆ OUTPUT: A list of sensitive terms
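The input/output contract on this slide (task title and description in, a list of sensitive terms out) can be sketched as a small rule-based function. This is only an illustration under stated assumptions: the function name `find_sensitive_terms` and the three regular expressions are hypothetical and are not taken from the thesis, which does not list its rules in these slides.

```python
import re
from typing import List

# Hypothetical patterns chosen for illustration; the slides do not
# specify the actual rules used in the thesis.
PATTERNS = {
    # Full URLs, or bare domains such as "example.com" that are not part of an email.
    "url": re.compile(
        r"https?://\S+|(?<![\w@.-])[\w-]+(?:\.[\w-]+)*\.(?:com|org|net|io)\b", re.I
    ),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    # "Username: x", "password = y", "api key: z"; the value is captured.
    "credential": re.compile(
        r"\b(?:username|password|passwd|api[ _-]?key)\s*[:=]\s*(\S+)", re.I
    ),
}


def find_sensitive_terms(title: str, description: str) -> List[str]:
    """Return candidate sensitive terms found in a task's title and description."""
    text = f"{title}\n{description}"
    found: List[str] = []
    for pattern in PATTERNS.values():
        for match in pattern.finditer(text):
            # Use the captured value when the pattern has a group, else the whole match.
            term = match.group(1) if pattern.groups else match.group(0)
            if term not in found:
                found.append(term)
    return found


if __name__ == "__main__":
    print(find_sensitive_terms(
        "Batch photo upload script",
        "Go to testmyapp.com Username: trialappdemo Password: password134",
    ))
```

Plain regular expressions like these only catch explicitly labelled values; named entities such as company or person names would need an NLP component on top.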
  • 32. The Crowdsourcing Cycle: Stages at which information leaks could occur
  • 33. Survey to validate some assumptions ◆ PII details are sensitive ◆ Sharing a database publicly may compromise confidentiality ◆ Sensitive information in code ◆ Sensitive components in the comments
  • 34. Survey to validate some assumptions ◆ Wireframes and UX examples may give away information about the company posting the job, such as its logo ◆ The market for the end result of a task can be inferred from the task ◆ The domain of a task can be identified from the type of details it asks for
  • 35. Survey to understand Confidentiality: Suppose you are a project manager in a multinational organization and you need to get some job done by an external freelancer. These jobs require sharing some details like sample code, documents, UX wireframes, etc., and some information in these resources might be sensitive. You need to answer the following questions considering the above scenario.
  • 36. Likert-scale-based Questions [Figure: Average Sensitivity Levels of Sharing Various Details; axes: Type of Data, Sensitivity Levels]
  • 37. Some observations ◆ 6.4% of the respondents believed that sharing a database publicly is not a sensitive activity ◆ 2.2% of the respondents thought that sharing API keys is okay
  • 39. Sample Task Question: This is a sample task posted by a task poster. In your view, what sensitive data does it contain? "I am using a online platform called tomakeanapp.com, there is a module which you can upload a picture but I HAVE TO DO IT one by one. I need a imacro or anyscript that could upload a batch of photo from a folder. No other requirement. That's it. It is dead simple and first come first serve. Please apply If you want to know what i am saying. Go to testmyapp.com Username: trialappdemo Password: password134 >Click my project> Click "Edit", then you will see "TraiApp" in YOUR ACTIVITY Then click "Edit" and you will see the upload screen I talked above"
  • 41. Sample Code Question: Following is a sample code snippet; point out the parts of the code that you think are sensitive.
  • 43. A wireframe of an app has been attached. What can you infer from it?
  • 45. What can you infer about the type of industry that this app is for?
  • 47. Other Confidential Entities: 1. Master database of a company 2. Personal information: a. Name b. Email ID c. SSN / Unique Identification Number d. Address e. Date of birth f. Employee ID g. Health-related information 3. Context for the given task 4. Company-related details: a. Details of the clients of a company b. Details of a company c. Details of a company’s future projects d. Any sort of financial data e. Proprietary data
  • 48. Other Confidential Entities: 7. Credentials of any sort: usernames and passwords 8. Bank account number / credit or debit card details 9. API keys 10. Server details 11. Google Play keys / app secrets / OAuth tokens 12. Named entities 13. Variable names / class names 14. Test inputs 15. Sensitive datasets where the users can be uniquely identified 16. Images and diagrams with confidential information
  • 49. Dataset of 65,000 tasks ◆ 4 unique task categories: ○ Data Science & Analytics ○ IT & Networking ○ Web, Mobile & Software Dev ○ Writing ◆ 22 unique task subcategories ○ Game Development ○ Data Visualization ○ Network and System Administration ○ Machine Learning ○ ...
  • 50. Analysis of attributes of a task ◆ ~20 fields per task ○ Task ID ○ Task Title ○ Task Description ○ Task Category ○ Task Sub-Category ○ Date of creation of Task ○ Task Type ○ Task Workload ○ Task Duration ○ Task Budget ○ Preferred Location of the worker ○ Preferred range of feedback score ○ Level of English Skills required ○ Payment range ○ Range of working hours per day ○ Requirement of Cover Letter ○ Requirement of Portfolio ○ Candidates Registered for the task ○ Skills required for the task
  • 51. Analysis of Tasks ◆ Crucial attributes: ○ Task Title ○ Task Description ◆ Observations: ○ 23% of tasks contained links ○ 2.2% of tasks had credentials ○ 2.46% of tasks had GitHub links
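Dataset-level observations of this kind (the share of tasks containing links, GitHub links, or credentials) could be computed along the following lines. This is a minimal sketch, not the procedure used in the thesis: the regular expressions, the `share_of_tasks` helper, and the four toy tasks are all assumptions made for illustration.

```python
import re

# Hypothetical detection rules; the slides do not say how links and
# credentials were identified in the 65,000-task dataset.
LINK_RE = re.compile(r"https?://\S+", re.I)
GITHUB_RE = re.compile(r"https?://(?:www\.)?github\.com/\S+", re.I)
CRED_RE = re.compile(r"\b(?:username|password)\s*[:=]\s*\S+", re.I)


def share_of_tasks(tasks, pattern):
    """Return the percentage of tasks whose title or description matches pattern."""
    if not tasks:
        return 0.0
    hits = sum(
        1 for t in tasks if pattern.search(t["title"] + "\n" + t["description"])
    )
    return 100.0 * hits / len(tasks)


# Toy stand-in for the real dataset, using only the two crucial attributes.
tasks = [
    {"title": "Fix login bug", "description": "Repo: https://github.com/example/app"},
    {"title": "Scrape site", "description": "See https://example.com, Username: demo Password: demo123"},
    {"title": "Write blog post", "description": "500 words on cloud security."},
    {"title": "Build chart", "description": "Plot monthly sales data."},
]

if __name__ == "__main__":
    for name, pattern in [("links", LINK_RE), ("GitHub links", GITHUB_RE), ("credentials", CRED_RE)]:
        print(f"{name}: {share_of_tasks(tasks, pattern):.1f}% of tasks")
```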
  • 53. Understanding Conversations between Workers and Task Posters ◆ ~214 lines per task (average) ◆ ~30 files exchanged [min: 0, max: 90] ◆ 42% of tasks involved Skype calls ◆ 14% involved sharing data over cloud
  • 54. Analysis of a Task for Sensitivity
  • 58. Testing induced sensitivity in the 65,000-task dataset ◆ Curated a list of sensitive terms: ○ Names of companies ○ Email addresses from the Enron dataset ○ Dummy API keys, passwords, etc. ○ List of URLs ○ List of countries and cities ○ Randomly generated usernames ○ ...
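One way to read "induced sensitivity" is that the curated terms are inserted into otherwise ordinary task text and a detector is then scored on how many of them it recovers. The slides do not spell out the procedure, so the sketch below is an assumption: `inject_terms` and `precision_recall` are hypothetical helpers, and the predicted terms in the usage example are made up.

```python
import random


def inject_terms(description, terms, rng):
    """Insert each term at a random word boundary and return the new text."""
    words = description.split()
    for term in terms:
        # randint is inclusive on both ends, so a term may also be appended.
        words.insert(rng.randint(0, len(words)), term)
    return " ".join(words)


def precision_recall(predicted, injected):
    """Score a detector's predicted terms against the terms that were injected."""
    predicted, injected = set(predicted), set(injected)
    true_positives = len(predicted & injected)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(injected) if injected else 0.0
    return precision, recall


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the injection is repeatable
    injected = ["key-123", "alice@example.com"]  # dummy values, illustration only
    text = inject_terms("Need a script to upload photos", injected, rng)
    print(text)
    # Suppose a detector flagged one injected term and one ordinary word.
    print(precision_recall(["key-123", "upload"], injected))
```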
  • 59. Performance Metrics: Precision 0.68, Recall 0.82, F1 Score 0.74
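As a consistency check, the reported F1 score follows from the reported precision $P$ and recall $R$ under the standard harmonic-mean definition:

```latex
\[
F_1 = \frac{2PR}{P + R}
    = \frac{2 \times 0.68 \times 0.82}{0.68 + 0.82}
    = \frac{1.1152}{1.50}
    \approx 0.74
\]
```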
  • 60. Real-World Application ◆ Analyze ○ Tasks ○ Content before putting on cloud ○ Content before sharing
  • 61. Conclusion ◆ Narrow down the crucial stages in the crowdsourcing cycle ◆ Enumerate critical attributes in a task ◆ Highlight types of confidential data ◆ NLP and rule-based algorithm to detect confidentiality loss
  • 62. Challenges, Limitations and Future Work ◆ Dataset availability ◆ Lack of labeled data ◆ Expand the algorithm ○ Cater to images, databases, etc. ◆ Incorporate machine learning techniques for classification ◆ Sanitization ◆ More fine-grained analysis
  • 63. Acknowledgement ◆ Committee Members ◆ Abhinav and Sakshi, Accenture Labs Bangalore ◆ Indira, Gurpriya, Arpit, Shubham, Sonu, Divyansh ◆ Members of Precog family ◆ Family and friends
  • 64. References ◆ http://www.timbroder.com/2015/01/my-2375-amazon-ec2-mistake.html ◆ https://www.topcoder.com/about-topcoder/customer-stories/ ◆ https://www.merriam-webster.com/dictionary/crowdsourcing ◆ https://www.figure-eight.com/ ◆ https://www.flaticon.com/authors/freepik