Search & Data Mining 
SKILLS SEMINAR 
Master of European History, University of Luxembourg, 11 December 2014 
Gerben Zaagsma 
Lichtenberg-Kolleg,
Overview
1. Introduction: search & data mining
2. Tools
3. Practical exercises
1. Introduction: search & data mining
Code yourself… …or use existing tools
Why historians should be interested:
Old → New
Analogue resources → Digital resources
Small data → Big data
Close reading → Distant reading
CHANGE / SCALE / TECHNOLOGY
the Big Data revolution?
Big data and claims about a paradigm change in the humanities
culturomics and Google Ngrams
Data-driven history
Patterns and structures: a new essentialism?
Based upon changes of scale & method: the humanities are supposedly becoming more ‘scientific’ > results can be checked and replicated, but can they? And what about interpretation?
Politics: funding & valorisation
“One of the problems confronting data enthusiasts in the humanities is that we feel a need to convince our more old-fashioned colleagues about what can be done. But our role as advocates of data shouldn't mean that we lose our critical sense as scholars. [...] there is a risk that we look more carefully at the technical components of the datasets than the historical context of the information that they represent.”
Andrew Prescott, ‘The Deceptions of Data’, Digital Riffs (13 January 2013).
Frédéric Clavert, ‘Lecture des sources historiennes à l’ère 
numérique’ (14 November 2012) 
Integrate approaches & methods / hybridity
1. SEARCH
Google/ Bing/ Yahoo
there is much more ...
searching the Internet in general:
Google
there is much more than Google
filter bubble? have a look at: http://dontbubble.us
http://www.langreiter.com/exec/yahoo-vs-google.html
http://yometa.com
filter bubble?
http://www.thefilterbubble.com
Web search round-up 
differences between search engines 
filter bubble 
deep web versus visible web
Searching digital libraries & archives…
composition of resources, selection…
example of Compactmemory: a great resource on German-Jewish history
The collection comprises the 110 most important Jewish newspapers and periodicals of the German-speaking world from the years 1806-1938. The periodicals represent the entire religious, political, social, literary or scholarly range of the Jewish community.
but be aware of selection: the focus is on elites and organisations that highlight German Jewry’s process of emancipation:
• a classical view in the historiography of German Jewry?
• reinforcement of existing master narratives?
mind the context…
Processing and searching data on your own computer…
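One common way to search your own files is a keyword-in-context (KWIC) listing, which shows every hit for a term together with the words around it. The Python below is a minimal sketch under stated assumptions: the sources are plain-text `.txt` files in one folder, a "word" is whatever the regular expression `\w+` matches, and the function names `tokenize`, `kwic` and `search_folder` are hypothetical, not part of any tool mentioned in this seminar.

```python
import re
from pathlib import Path


def tokenize(text):
    """Lower-case the text and split it into word tokens (\\w+ runs)."""
    return re.findall(r"\w+", text.lower())


def kwic(tokens, keyword, window=3):
    """Return (left context, keyword, right context) for every hit."""
    keyword = keyword.lower()
    hits = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, token, right))
    return hits


def search_folder(folder, keyword, window=3):
    """Run kwic() over every .txt file in a folder (assumed layout)."""
    results = {}
    for path in sorted(Path(folder).glob("*.txt")):
        tokens = tokenize(path.read_text(encoding="utf-8"))
        results[path.name] = kwic(tokens, keyword, window)
    return results


if __name__ == "__main__":
    sample = ("The newspaper reported on emancipation. "
              "Emancipation was debated in every newspaper.")
    for left, key, right in kwic(tokenize(sample), "emancipation", window=2):
        print(f"{left:>20} [{key}] {right}")
```

Lower-casing makes the search case-insensitive, but it also merges forms a historian may want to keep apart, and OCR errors in the source files will still hide hits.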
1. DATA MINING
data? 
data = computer-processable information
Example of structured data
Many digital libraries/archives: 
un-/semi-structured data
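The difference between structured and unstructured data matters in practice. In the minimal sketch below, a structured table (an invented CSV with hypothetical titles) can be queried column by column, whereas the same facts written as running text must be recovered with a hand-made pattern that only fits one phrasing. All records, column names and the regular expression are illustrative assumptions.

```python
import csv
import io
import re

# Structured data (invented records): every value sits in a named column.
STRUCTURED = """title,place,first_year
Weekly A,Mainz,1850
Monthly B,Berlin,1860
"""

# Unstructured data: the same invented facts buried in running text.
UNSTRUCTURED = (
    "Weekly A appeared in Mainz from 1850. "
    "Monthly B was published in Berlin from 1860."
)

# A hand-written pattern for one phrasing: 'TITLE appeared in PLACE from YEAR'.
PATTERN = re.compile(
    r"(?P<title>[A-Z][\w ]+?) appeared in (?P<place>[A-Z]\w+) from (?P<year>\d{4})"
)


def structured_query(csv_text, place):
    """Select titles by the 'place' column; no guessing is involved."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return [row["title"] for row in rows if row["place"] == place]


def unstructured_query(text, place):
    """Recover titles from prose; other phrasings are silently missed."""
    return [m.group("title") for m in PATTERN.finditer(text)
            if m.group("place") == place]


if __name__ == "__main__":
    print(structured_query(STRUCTURED, "Berlin"))
    print(unstructured_query(UNSTRUCTURED, "Berlin"))
```

The second call returns an empty list: the Berlin sentence uses "was published in" instead of "appeared in", so the pattern misses it without any warning.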
Digital editions: bridging the gap with XML
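Markup bridges the gap because names, places and dates, once tagged, can be extracted reliably instead of guessed. The sketch below uses Python's standard `xml.etree.ElementTree` on a fragment with TEI-style element names (`persName`, `placeName`, `date` with a `when` attribute). The fragment, including the person "Jakob Meyer", is invented, and it omits the TEI namespace; for namespaced files the hypothetical `ns` parameter takes ElementTree's `{uri}` prefix.

```python
import xml.etree.ElementTree as ET

# Invented, simplified TEI-style fragment; real TEI files declare the
# namespace http://www.tei-c.org/ns/1.0 on the root element.
EDITION = """<text>
  <body>
    <p>On <date when="1848-03-18">18 March 1848</date>
       <persName>Jakob Meyer</persName> wrote a letter from
       <placeName>Hamburg</placeName>.</p>
  </body>
</text>"""


def extract_entities(xml_text, ns=""):
    """Collect tagged persons, places and normalised dates.

    ns: namespace in ElementTree's '{uri}' form, e.g.
    '{http://www.tei-c.org/ns/1.0}' for namespaced TEI; empty here.
    """
    root = ET.fromstring(xml_text)
    return {
        "persons": [el.text.strip() for el in root.iter(ns + "persName")],
        "places": [el.text.strip() for el in root.iter(ns + "placeName")],
        # The 'when' attribute holds a machine-readable date.
        "dates": [el.get("when") for el in root.iter(ns + "date")],
    }


if __name__ == "__main__":
    print(extract_entities(EDITION))
```

The displayed date ("18 March 1848") and the normalised `when` value can differ in form; the markup keeps both, which is what makes the edition searchable by computer and readable by people.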
http://eculture.cs.vu.nl/europeana/session/search 
• Google/ Bing/ Yahoo
• there is much more ...
• results differ between search engines
• and there is a filter bubble
• --> in short: knowing what you are looking for and having a search strategy are crucial
Semantic web and linking data
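Linked data describes the world as subject-predicate-object statements (triples) whose parts are identifiers that can be followed from one dataset to another. The toy sketch below keeps triples in a Python set and answers a two-step question (which newspapers were published in a place located in a given country?) by chaining pattern matches, roughly what a SPARQL query does against an RDF store. The `example.org` identifiers, the predicates and the helper names are hypothetical; real projects would use published vocabularies and an RDF library or triple store.

```python
# Hypothetical identifiers; real linked data reuses published URIs.
EX = "http://example.org/"

TRIPLES = {
    (EX + "newspaper/1", EX + "publishedIn", EX + "place/Berlin"),
    (EX + "newspaper/2", EX + "publishedIn", EX + "place/Vienna"),
    (EX + "place/Berlin", EX + "locatedIn", EX + "country/Germany"),
    (EX + "place/Vienna", EX + "locatedIn", EX + "country/Austria"),
}


def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        (ts, tp, to) for (ts, tp, to) in triples
        if (s is None or ts == s)
        and (p is None or tp == p)
        and (o is None or to == o)
    ]


def published_in_country(triples, country):
    """Follow two links: newspaper -> place -> country."""
    places = {s for (s, _, _) in match(triples, p=EX + "locatedIn", o=country)}
    return sorted(
        s for place in places
        for (s, _, _) in match(triples, p=EX + "publishedIn", o=place)
    )


if __name__ == "__main__":
    print(published_in_country(TRIPLES, EX + "country/Germany"))
```

Because the place identifiers are shared, a second dataset that says more about `place/Berlin` could be joined in without changing the newspaper records.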
Some definitions of data mining:
At its simplest, data mining is the process of extracting 
new knowledge (usually in terms of previously unknown 
patterns) from sets of data already in existence. 
Jonathan Hagood
Data mining (the analysis step of the "Knowledge Discovery in 
Databases" process, or KDD), an interdisciplinary subfield of 
computer science, is the computational process of discovering 
patterns in large data sets involving methods at the intersection 
of artificial intelligence, machine learning, statistics, and 
database systems. 
The overall goal of the data mining process is to extract 
information from a data set and transform it into an 
understandable structure for further use. 
Wikipedia
Examples of projects and techniques
an n-gram is a contiguous sequence of n 
items from a given sequence of text or speech
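The definition translates directly into code: split a text into word tokens, then take every run of n consecutive tokens. The minimal sketch below does that and adds a helper that, in a much simplified way, mimics the kind of measure the Google Ngram Viewer plots for culturomics (occurrences of a phrase in a given year divided by all n-grams of the same length in that year). The tokenisation rule, the function names and the `{year: text}` input format are assumptions for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case text and split it into word tokens (simplifying assumption)."""
    return re.findall(r"\w+", text.lower())


def ngrams(tokens, n):
    """All contiguous sequences of n tokens, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_ngrams(text, n):
    """Frequency table of the n-grams in a text."""
    return Counter(ngrams(tokenize(text), n))


def relative_frequency(texts_by_year, phrase):
    """Per year: hits for phrase divided by all n-grams of the same length."""
    target = tuple(tokenize(phrase))
    n = len(target)
    result = {}
    for year, text in sorted(texts_by_year.items()):
        grams = ngrams(tokenize(text), n)
        result[year] = grams.count(target) / len(grams) if grams else 0.0
    return result


if __name__ == "__main__":
    print(count_ngrams("to be or not to be", 2).most_common(2))
```

Normalising by the total number of n-grams per year matters because the size of a corpus usually varies from year to year; raw counts would mostly show how much text survives.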
Topic Modeling Martha Ballard’s Diary
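Topic modelling is commonly done with latent Dirichlet allocation (LDA), for instance through tools such as MALLET. To show the mechanics, the sketch below is a deliberately tiny collapsed Gibbs sampler for LDA in plain Python: each word token is repeatedly reassigned to a topic with probability proportional to how common that topic is in its document and how common the word is in that topic. It is not a reconstruction of the Martha Ballard project; the number of topics, `alpha`, `beta`, the iteration count and the sample corpus are illustrative assumptions.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, num_topics=2, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA on a list of token lists.

    Returns (topic_word, doc_topic): per-topic word probabilities and
    per-document topic proportions.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    V = len(vocab)
    n_dk = [[0] * num_topics for _ in docs]              # topic counts per document
    n_kw = [defaultdict(int) for _ in range(num_topics)]  # word counts per topic
    n_k = [0] * num_topics                                # token counts per topic
    z = []                                                # topic of every token
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)  # random initial assignment
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Take the current token out of the counts.
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Conditional probability of each topic, up to a constant.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta) / (n_k[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1
    topic_word = [
        {w: (n_kw[k][w] + beta) / (n_k[k] + V * beta) for w in vocab}
        for k in range(num_topics)
    ]
    doc_topic = [
        [(n_dk[d][k] + alpha) / (len(doc) + num_topics * alpha)
         for k in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    return topic_word, doc_topic


def top_words(topic_word, k, n=3):
    """The n most probable words of topic k."""
    return sorted(topic_word[k], key=topic_word[k].get, reverse=True)[:n]


if __name__ == "__main__":
    corpus = [  # invented diary-like snippets
        "delivered birth midwife birth".split(),
        "snow cold weather snow".split(),
        "midwife birth delivered".split(),
        "weather cold snow".split(),
    ]
    tw, dt = lda_gibbs(corpus)
    for k in range(len(tw)):
        print(k, top_words(tw, k))
```

The "topics" are only lists of co-occurring words; naming them and deciding whether they mean anything remains the historian's interpretive work.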
data? 
data & data mining ≠ neutral
“What is too often forgotten, though, is that our 
digital helpers are full of ‘theory’ and ‘judgement’ 
already. As with any methodology, they rely on sets 
of assumptions, models, and strategies. Theory is 
already at work on the most basic level when it 
comes to defining units of analysis, algorithms, and 
visualisation procedures.” 
Bernhard Rieder and Theo Röhle, ‘Digital Methods: Five 
Challenges’ in: David M Berry ed., Understanding Digital 
Humanities (Houndmills: Palgrave Macmillan, 2012) 67-85, 
70.
2. TOOLS
3. Practical exercises
Overview of exercises 
http://goo.gl/72fCn7
Tools & workflows 
Voyant Tools 
Voyant Tools Documentation 
Programming Historian 
DIRT: Digital Research Tools 
Turkel, William J., Kevin Kee, and Spencer Roberts, ‘A Method for Navigating the Infinite Archive’ in: Toni Weller ed., History in the Digital Age (London; New York: Routledge, 2013).
William J. Turkel: How To
Further reading 
Special issue on Digital History, BMGN - Low Countries Historical Review, 128/4 (2013).
Haber, Peter, Digital Past: Geschichtswissenschaft im digitalen Zeitalter (München: Oldenbourg Verlag, 2011).
Boonstra, Onno, Leen Breure, and Peter Doorn, Past, Present and Future of Historical Information Science (Amsterdam: NIWI-KNAW, 2004).
Ciravegna, Fabio, Mark Greengrass, Tim Hitchcock, Sam Chapman, Jamie McLaughlin, and Ravish Bhagdev, ‘Finding Needles in Haystacks: Data-Mining in Distributed Historical Datasets’ in: Mark Greengrass and Lorna M Hughes eds., The Virtual Representation of the Past (Ashgate, 2008).
Cohen, D, F Gibbs, T Hitchcock, G Rockwell, J Sander, R Shoemaker, S Sinclair, S Takats, W J Turkel, and C Briquet, ‘Data Mining with Criminal Intent’, final white paper (2011).
Hagood, Jonathan, ‘A Brief Introduction to Data Mining Projects in the Humanities’, Bulletin of the American Society for Information Science and Technology 38/4 (2012).
Hitchcock, Tim, ‘Big Data for Dead People: Digital Readings and the Conundrums of Positivism’ (9 December 2013).
Leonard, Peter, ‘Mining Large Datasets for the Humanities’, IFLA WLIC 2014.
Dr. Gerben Zaagsma 
http://gerbenzaagsma.org 
de.linkedin.com/in/gerbenzaagsma/ 
https://twitter.com/gerbenzaagsma 
https://uni-goettingen.academia.edu/GerbenZaagsma 
https://www.researchgate.net/profile/Gerben_Zaagsma 
https://www.slideshare.net/gerbenzaagsma
Image credits 
The Field Museum Library, Hall 37 Geology overview. URL: https://www.flickr.com/photos/field_museum_library/3333920156/in/set-72157614881700424.
The U.S. National Archives, Photograph of Card Catalog in Central Search Room, 1942. URL: http://www.flickr.com/photos/usnationalarchives/3873932255/.
Witch computer 1951: Wolverhampton and Staffordshire College of Technology in 1961, The National Computing Museum and Computer Conservation Society/UKAEA/Wolverhampton Express and Star, via: http://www.wired.com/2009/09/britan-oldest-computer/.
Code: https://www.flickr.com/photos/lord_james/4696338852/.
Tools: Flickr Commons
The droids we're googling for: https://www.flickr.com/photos/st3f4n/3951143570/.
Jaws (Steven Spielberg) original movie poster: https://en.wikipedia.org/wiki/File:JAWS_Movie_poster.jpg
Structured/unstructured data: http://www.emc.com/collateral/demos/microsites/emc-digital-universe-2011/index.htm
Macbook Data Mining: http://www.flickr.com/photos/17208993@N00/442531562/.
Topic Modeling Martha Ballard’s Diary: http://www.cameronblevins.org/posts/topic-modeling-martha-ballards-diary/.
Boolean operators: http://uksourcers.co.uk/2012/capital-letters-the-key-to-boolean-success/
Miami University students in laboratory classroom 1908: https://www.flickr.com/photos/muohio_digital_collections/3199691495/

20111031 - Online Jewish content in a broader context
 

Último

Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfsanyamsingh5019
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajanpragatimahajan3
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
 
Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfchloefrazer622
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhikauryashika82
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...fonyou31
 
fourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingfourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingTeacherCyreneCayanan
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfJayanti Pande
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfciinovamais
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpinRaunakKeshri1
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Disha Kariya
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Celine George
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Sapana Sha
 
9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room servicediscovermytutordmt
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsTechSoup
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introductionMaksud Ahmed
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingTechSoup
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfchloefrazer622
 

Último (20)

Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajan
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdf
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
 
fourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingfourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writing
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpin
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room service
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdf
 

SEMINAR ON SEARCH & DATA MINING SKILLS

  • 1. Search & Data Mining SKILLS SEMINAR Master of European History, University of Luxembourg, 11 December 2014 Gerben Zaagsma Lichtenberg-Kolleg,
  • 3. Overview: 1. Introduction: search & data mining; 2. T...; 3. Practical exercises
  • 4. Code yourself… …or use existing tools
  • 6. Why historians should be interested: CHANGE, SCALE, TECHNOLOGY. Old → new: analogue resources → digital resources; small data → big data; close reading → distant reading.
  • 14. The Big Data revolution?
    Big data and claims about a paradigm change in the humanities
    Data-driven history
    Patterns and structures: a new essentialism?
    Based upon changes of scale and method, the humanities are supposedly becoming more ‘scientific’: results can be checked and replicated, but can they? Interpretation.
    Politics: funding and valorisation
  • 15. “One of the problems confronting data enthusiasts in the humanities is that we feel a need to convince our more old-fashioned colleagues about what can be done. But our role as advocates of data shouldn't mean that we lose our critical sense as scholars. [...] there is a risk that we look more carefully at the technical components of the datasets than the historical context of the information that they represent.” Andrew Prescott, ‘The Deceptions of Data’, Digital Riffs (13 January 2013).
  • 16. Frédéric Clavert, ‘Lecture des sources historiennes à l’ère numérique’ [Reading historians’ sources in the digital age] (14 November 2012). Integrating approaches and methods: hybridity.
  • 18. Google / Bing / Yahoo: there is much more...
  • 20. Searching the Internet in general: Google. There is much more than Google. Filter bubble? Have a look at: http://dontbubble.us and http://www.langreiter.com/exec/yahoo-vs-google.html
  • 21. Searching the Internet in general: Google. There is much more than Google. Filter bubble? Have a look at: http://dontbubble.us and http://yometa.com
  • 25. Web search round-up: differences between search engines; the filter bubble; the deep web versus the visible web.
  • 28. Example: Compactmemory, a great resource on German-Jewish history
  • 29. The collection comprises the 110 most important Jewish newspapers and periodicals of the German-speaking area from the years 1806-1938. The periodicals represent the entire religious, political, social, literary and scholarly range of the Jewish community. But be aware of the selection: it focuses on elites and organisations that highlight German Jewry’s process of emancipation. Does this reflect a classical vision in the historiography of German Jewry? Does it reinforce existing master narratives?
  • 34. Processing and searching data on your own computer…
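One common way to search a folder of digitised sources on your own computer is a keyword-in-context (KWIC) listing. The sketch below is a minimal Python version. It assumes the sources are UTF-8 `.txt` files in a local folder. The function names `kwic` and `search_folder` are illustrative and are not taken from any particular tool.

```python
import re
from pathlib import Path


def kwic(text, keyword, width=30):
    """Return keyword-in-context lines: up to `width` characters either side of each hit."""
    # Whole-word, case-insensitive match; re.escape guards against special characters.
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()].replace("\n", " ")
        right = text[match.end():match.end() + width].replace("\n", " ")
        # Right-align the left context so the hits line up in one column.
        lines.append(f"{left:>{width}} [{match.group(0)}] {right}")
    return lines


def search_folder(folder, keyword, width=30):
    """Search every .txt file under `folder` (recursively); map file name -> KWIC lines."""
    results = {}
    for path in sorted(Path(folder).rglob("*.txt")):
        hits = kwic(path.read_text(encoding="utf-8"), keyword, width)
        if hits:
            results[path.name] = hits
    return results


if __name__ == "__main__":
    sample = "The emancipation debate continued. Emancipation was granted later."
    for line in kwic(sample, "emancipation", width=15):
        print(line)
```

Consider a hypothetical folder `my_sources`. Calling `search_folder("my_sources", "emancipation")` on it returns only the files with at least one hit. Each hit comes with enough context to judge whether it is relevant before you read the full source.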
  • 40. data? data = computer-processable information
  • 43. Many digital libraries/archives: un-/semi-structured data
  • 44. Digital editions: bridging the gap with XML
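In an XML edition, the markup records which strings are names, places or dates, so a program can query them directly instead of guessing from plain text. The sketch below is a minimal Python version using the standard library's `xml.etree.ElementTree`. The fragment is invented. Its tag names (`persName`, `placeName`, `date` with a `when` attribute) are only borrowed from TEI-style conventions, so it is not a valid TEI document.

```python
import xml.etree.ElementTree as ET

# Invented sample fragment; tag names loosely follow TEI-style conventions.
EDITION = """<text>
  <p>On <date when="1900-01-01">1 January 1900</date>
     <persName>A. Example</persName> wrote from
     <placeName>Berlin</placeName>.</p>
  <p><persName>B. Example</persName> replied from
     <placeName>Hamburg</placeName>.</p>
</text>"""


def extract_entities(xml_string, tag):
    """Return the text content of every element with the given tag, in document order."""
    root = ET.fromstring(xml_string)
    return ["".join(el.itertext()).strip() for el in root.iter(tag)]


def extract_dates(xml_string):
    """Return the machine-readable 'when' attribute of every <date> element."""
    root = ET.fromstring(xml_string)
    return [el.get("when") for el in root.iter("date")]


if __name__ == "__main__":
    print(extract_entities(EDITION, "persName"))
    print(extract_entities(EDITION, "placeName"))
    print(extract_dates(EDITION))
```

The same queries would be unreliable on un- or semi-structured text. There, "Berlin" might be a place, part of a name or part of a title, and "1 January 1900" would first have to be parsed into a date.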
  • 47. Semantic web and linking data (http://eculture.cs.vu.nl/europeana/session/search). Google / Bing / Yahoo; there is much more...; results differ per search engine; and there is a filter bubble. In short: knowing what you are looking for, and having a search strategy, is crucial.
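The core idea of linking data can be sketched without any semantic-web software. Statements are stored as subject-predicate-object triples and queried by matching patterns, which is roughly what a SPARQL query does against an RDF store. The sketch below is a minimal plain-Python version. The `ex:` identifiers and all the triples are invented placeholders, not Europeana data, and `match` and `authors_of_letters_from` are hypothetical helper names.

```python
# Invented triples with placeholder "ex:" identifiers (not real linked-data URIs).
TRIPLES = [
    ("ex:letter1", "ex:author", "ex:personA"),
    ("ex:letter1", "ex:sentFrom", "ex:Berlin"),
    ("ex:letter2", "ex:author", "ex:personB"),
    ("ex:letter2", "ex:sentFrom", "ex:Berlin"),
    ("ex:personA", "ex:memberOf", "ex:organisationX"),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def authors_of_letters_from(triples, place):
    """Follow two links: letters sent from `place`, then the authors of those letters."""
    letters = [s for s, _, _ in match(triples, p="ex:sentFrom", o=place)]
    return [o for letter in letters
            for _, _, o in match(triples, s=letter, p="ex:author")]


if __name__ == "__main__":
    print(authors_of_letters_from(TRIPLES, "ex:Berlin"))
```

The value of linking comes from following shared identifiers across statements. Here `ex:personA` connects a letter to an organisation even though no single triple says so.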
  • 50. Some definitions of data mining:
  • 51. At its simplest, data mining is the process of extracting new knowledge (usually in terms of previously unknown patterns) from sets of data already in existence. Jonathan Hagood
  • 52. Data mining (the analysis step of the "Knowledge Discovery in Databases" process, or KDD), an interdisciplinary subfield of computer science, is the computational process of discovering patterns in large data sets involving methods at the intersection of artificial intelligence, machine learning, statistics, and database systems. The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use. Wikipedia
  • 53. Examples of projects and techniques
  • 55. An n-gram is a contiguous sequence of n items from a given sequence of text or speech.
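The definition translates directly into code: slide a window of n tokens along the text and collect each window. The sketch below is a minimal Python version. The tokenizer (lower-casing and keeping runs of word characters) is an assumption. Real n-gram corpora make their own, different tokenization choices.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and keep runs of word characters as tokens."""
    return re.findall(r"\w+", text.lower())


def ngrams(tokens, n):
    """Return every contiguous sequence of n tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


if __name__ == "__main__":
    tokens = tokenize("The Jewish press and the Jewish community")
    print(ngrams(tokens, 2))
    # Counting n-grams is the first step towards frequency patterns in a corpus.
    print(Counter(ngrams(tokens, 2)).most_common(1))
```

With n = 1 this yields single words. A text shorter than n yields no n-grams at all.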
  • 58. Topic Modeling Martha Ballard’s Diary
  • 59. data? data & data mining ≠ neutral
  • 60. “What is too often forgotten, though, is that our digital helpers are full of ‘theory’ and ‘judgement’ already. As with any methodology, they rely on sets of assumptions, models, and strategies. Theory is already at work on the most basic level when it comes to defining units of analysis, algorithms, and visualisation procedures.” Bernhard Rieder and Theo Röhle, ‘Digital Methods: Five Challenges’ in: David M Berry ed., Understanding Digital Humanities (Houndmills: Palgrave Macmillan, 2012) 67-85, 70.
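Rieder and Röhle's point that theory is already at work "when it comes to defining units of analysis" can be made concrete. Two plausible tokenizers, applied to the same invented sentence, count the word "jewish" very differently. The sketch below is a minimal Python version. Both tokenizers are illustrative choices, not those of any particular tool.

```python
import re
from collections import Counter

# Invented sample sentence.
TEXT = "German-Jewish newspapers; German Jewish readers. Jewish, jewish."


def whitespace_tokens(text):
    """Split on whitespace only: punctuation and capitalisation stay attached."""
    return text.split()


def word_tokens(text):
    """Lower-case and keep runs of word characters: hyphenated compounds are split."""
    return re.findall(r"\w+", text.lower())


if __name__ == "__main__":
    for tokenizer in (whitespace_tokens, word_tokens):
        counts = Counter(tokenizer(TEXT))
        print(tokenizer.__name__, counts["jewish"], len(counts))
```

Neither result is "the" word count. Each follows from a decision about what counts as a word: whether case matters, and whether "German-Jewish" is one unit or two.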
  • 63. Overview of exercises http://goo.gl/72fCn7
  • 64. Tools & workflows:
    Voyant Tools
    Voyant Tools Documentation
    Programming Historian
    DIRT: Digital Research Tools
    Turkel, William J., Kevin Kee, and Spencer Roberts, ‘A Method for Navigating the Infinite Archive’ in: Toni Weller ed., History in the Digital Age (London; New York: Routledge, 2013).
    William J. Turkel: How To
  • 65. Further reading:
    Special issue on Digital History, BMGN - Low Countries Historical Review 128/4 (2013).
    Haber, Peter, Digital Past: Geschichtswissenschaft im digitalen Zeitalter (München: Oldenbourg Verlag, 2011).
    Boonstra, Onno, Leen Breure, and Peter Doorn, Past, Present and Future of Historical Information Science (Amsterdam: NIWI-KNAW, 2004).
    Ciravegna, Fabio, Mark Greengrass, Tim Hitchcock, Sam Chapman, Jamie McLaughlin, and Ravish Bhagdev, ‘Finding Needles in Haystacks: Data-Mining in Distributed Historical Datasets’ in: Mark Greengrass and Lorna M. Hughes eds., The Virtual Representation of the Past (Ashgate, 2008).
    Cohen, D., F. Gibbs, T. Hitchcock, G. Rockwell, J. Sander, R. Shoemaker, S. Sinclair, S. Takats, W. J. Turkel, and C. Briquet, ‘Data Mining with Criminal Intent’, final white paper (2011).
    Hagood, Jonathan, ‘A Brief Introduction to Data Mining Projects in the Humanities’, Bulletin of the American Society for Information Science and Technology 38/4 (2012).
    Hitchcock, Tim, ‘Big Data for Dead People: Digital Readings and the Conundrums of Positivism’ (9 December 2013).
    Leonard, Peter, ‘Mining Large Datasets for the Humanities’, IFLA WLIC 2014.
  • 66. Dr. Gerben Zaagsma http://gerbenzaagsma.org de.linkedin.com/in/gerbenzaagsma/ https://twitter.com/gerbenzaagsma https://uni-goettingen.academia.edu/GerbenZaagsma https://www.researchgate.net/profile/Gerben_Zaagsma https://www.slideshare.net/gerbenzaagsma
  • 67. Image credits:
    The Field Museum Library, Hall 37 Geology overview. URL: https://www.flickr.com/photos/field_museum_library/3333920156/in/set-72157614881700424
    The U.S. National Archives, Photograph of Card Catalog in Central Search Room, 1942. URL: http://www.flickr.com/photos/usnationalarchives/3873932255/
    Witch computer 1951: Wolverhampton and Staffordshire College of Technology in 1961, The National Computing Museum and Computer Conservation Society/UKAEA/Wolverhampton Express and Star, via: http://www.wired.com/2009/09/britan-oldest-computer/
    Code: https://www.flickr.com/photos/lord_james/4696338852/
    Tools: Flickr Commons
    The droids we're googling for: https://www.flickr.com/photos/st3f4n/3951143570/
    Jaws (Steven Spielberg) original movie poster: https://en.wikipedia.org/wiki/File:JAWS_Movie_poster.jpg
    Structured/unstructured data: http://www.emc.com/collateral/demos/microsites/emc-digital-universe-2011/index.htm
    Macbook Data Mining: http://www.flickr.com/photos/17208993@N00/442531562/
    Topic Modeling Martha Ballard’s Diary: http://www.cameronblevins.org/posts/topic-modeling-martha-ballards-diary/
    Boolean operators: http://uksourcers.co.uk/2012/capital-letters-the-key-to-boolean-success/
    Miami University students in laboratory classroom 1908: https://www.flickr.com/photos/muohio_digital_collections/3199691495/