Title: Computational Social Science – what is it and what can(’t) it do?
What is your talk about?
In Computational Social Science (CSS) we use computer science algorithms to analyse qualitative data at scale. In this talk I define CSS, describe what the opportunities and barriers are in using such methods, and give examples from published research, for example on analysing thousands of Ofsted documents.
What are the key messages of your talk?
The use of CSS methods makes it possible to analyse some data sources at scale that previously would have been unrealistic to analyse ‘by hand’.
What are the implications for practice or research from your talk?
CSS allows researchers from both more qualitative and more quantitative traditions to analyse unstructured data sources at scale.
Short Biography
Dr Christian Bokhove is an Associate Professor in Mathematics. In his research, he combines conventional qualitative and quantitative methods with novel computational methods.
1. COMPUTATIONAL SOCIAL SCIENCE – WHAT IS IT AND WHAT CAN(’T) IT DO?
Seminar, 2 February 2021
Dr Christian Bokhove
Southampton Education School
University of Southampton
2. Who am I?
• Dr Christian Bokhove
• From 1998 to 2012: teacher of mathematics & computer science and head of ICT at a secondary school in the Netherlands
• PhD Utrecht University
‘Use of ICT for acquiring, practicing and assessing algebraic expertise’
• Associate Professor at the University of Southampton
• Mathematics education
• Technology use
• Large-scale assessment (PISA/TIMSS)
• Research methods
• Have always liked combining education research with computational techniques
3. This seminar
• The features of Computational Social Science (CSS).
• We look at examples of such research from three different
types of CSS:
1. Automated social information extraction;
2. Social networks and social complexity;
3. Social simulation modelling.
Note that each study contains much more detail than I will cover; I give the references so that you can follow up.
• The challenges of CSS
• Conclusion
5. Bokhove, C. (In press). Computational research methods and data science. In R. Coe, M. Waring, L. V. Hedges, & L. Day Ashley (Eds.), Research Methods and Methodologies in Education. Sage Publishing.
6. Computational research methods
An approach that relies on forms of automated analysis of information, using computers, to answer education research questions.
The methods can include one or more of the following:
• Analysis depends on algorithms, including the use of
• Artificial intelligence (AI) - computers make complex, human-like judgements
• Machine Learning (ML) - computers learn to copy human behaviour
• Data sets are usually large scale ('Big Data'); sometimes millions of sources are collected and analysed.
• Information already exists, rather than being collected specifically for research.
• 'Scraping' from websites (news, reports, blogs, etc.)
• Extraction from databases and archives created for other purposes (e.g. journal contents, interactions with a learning platform)
• Social networks (e.g. social media)
• Simulating new data
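To make the 'scraping' idea concrete, here is a minimal sketch using only Python's standard-library `html.parser`: it collects the text inside `<p>` elements of a page and normalises whitespace. The HTML string and the `ParagraphExtractor`/`extract_paragraphs` names are hypothetical; a real scraper would first download pages (for example with `urllib.request`), respect the site's terms of use, and cope with much messier markup.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collect the text found inside <p> elements of an HTML document."""

    def __init__(self):
        super().__init__()
        self.in_paragraph = False
        self.paragraphs = []
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_paragraph = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "p" and self.in_paragraph:
            # Join the collected fragments and collapse runs of whitespace.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.paragraphs.append(text)
            self.in_paragraph = False

    def handle_data(self, data):
        # Text outside <p> (headings, scripts, etc.) is ignored.
        if self.in_paragraph:
            self._buffer.append(data)


def extract_paragraphs(html: str) -> list[str]:
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return parser.paragraphs


# Hypothetical page standing in for a downloaded report.
page = """
<html><body>
  <h1>Inspection report</h1>
  <p>Pupils behave <b>well</b> in lessons.</p>
  <p>Leaders   have improved the curriculum.</p>
  <script>var x = 1;</script>
</body></html>
"""
print(extract_paragraphs(page))
# ['Pupils behave well in lessons.', 'Leaders have improved the curriculum.']
```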
7. Characteristics
CSS is an interdisciplinary field concerned with the investigation of the social universe on many scales (Cioffi-Revilla, 2017, p. 2):
• different (social) groupings with a great variety of organisational, temporal, and spatial dimensions.
‘Computational’ refers to computer-based instruments, but also to concepts and theories, such as algorithms that can extract information and computer simulation models.
So, in that sense, ‘interdisciplinary’ refers to multiple aspects such as concepts, principles, theories, and research methods. The range of these tools also keeps expanding with ever-improving technology.
The multidisciplinary character is exemplified by multiple fields, such as the social sciences, applied computer science and data science, coming together. Note, though, that these disciplinary boundaries are not always convenient and remain fuzzy.
8. Information processing paradigm
• Society’s social systems and processes generate information, and understanding this information plays a fundamental role in explaining and understanding social complexity.
• Two important differences from ‘traditional social science’:
• Assumes that information processing is key to understanding society; human and social processing of information is fundamental.
• Embraces ‘computing’ as a fundamental approach for understanding and modelling such social complexity (not ‘replacing’ other methods but rather complementing other historical, statistical, or mathematical methods).
9. Technology…
• …has allowed us to get a more fine-grained picture of interactions over time, giving us further insight into both the structure and content of such relationships.
(limitations apply, of course)
• …is not restricted to interactional data. It can also include textual and audio data; advances in Natural Language Processing and linguistics allow us to include data that were typically very hard to study at scale.
10. Adapting Cioffi-Revilla (2017), we can distinguish different types of computational social science, each with associated computational research methods.
• Automated social information extraction;
• Social networks and social complexity;
• Social simulation modelling.
Cioffi-Revilla, C. (2017). Introduction to computational social science (2nd edition).
London, UK: Springer.
12. For example, Bokhove (2015) scraped thousands of Ofsted reports from the inspection website to answer the question of whether the topics and sentiments (so-called ‘sentiment analysis’) in the reports had changed over time.
Bokhove, C. (2015). Text mining school inspection reports in England with R. University of
Southampton.
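The core of lexicon-based sentiment analysis can be sketched in a few lines of Python: look each word up in a sentiment lexicon, sum the scores per document, then average documents per year to look for change over time. The tiny lexicon, the example sentences and the years are invented for illustration; they are not the lexicon or data used by Bokhove (2015), and real analyses rely on established lexicons and far more careful text cleaning.

```python
import re
from collections import defaultdict

# Hypothetical mini-lexicon: word -> sentiment score (positive > 0, negative < 0).
LEXICON = {"good": 1, "outstanding": 2, "improve": 1, "weak": -1, "inadequate": -2}


def sentiment_score(text: str) -> int:
    """Sum the lexicon scores of the words in a document; unknown words score 0."""
    words = re.findall(r"[a-z]+", text.lower())
    return sum(LEXICON.get(word, 0) for word in words)


def mean_sentiment_by_year(docs: list[tuple[int, str]]) -> dict[int, float]:
    """Average document sentiment per year, to look for change over time."""
    scores = defaultdict(list)
    for year, text in docs:
        scores[year].append(sentiment_score(text))
    return {year: sum(s) / len(s) for year, s in sorted(scores.items())}


# Invented example documents: (year, text).
docs = [
    (2010, "Teaching is good and leadership is outstanding."),
    (2010, "Attendance is weak."),
    (2015, "Provision is inadequate; leaders must improve assessment."),
]
print(mean_sentiment_by_year(docs))  # {2010: 1.0, 2015: -1.0}
```

Note how "must improve" scores positively here: word-level lexicons ignore context, which is one reason the language challenges later in the talk (such as double negations) matter.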
16. Bokhove, C., & Sims, S. (2020). Demonstrating the potential of text mining for analyzing school inspection
reports: a sentiment analysis of 17,000 Ofsted documents. International Journal of Research and Method in
Education. https://doi.org/10.1080/1743727X.2020.1819228
17. CRISP-DM data mining process
• The first phase, Organizational Understanding, involves gaining an understanding of the institution and the data it produces: what is available, what does it say, and how could it be used?
• The second, Data Understanding, involves investigating the precise format of the data.
• In phase three, Data Preparation, the data are transformed into a format that can be read by the software that will perform the analyses.
• Finally, in phase four, Modelling, the analytical procedure is applied to the data.
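As a rough illustration of phase three, Data Preparation, the Python sketch below turns raw text into a document-term matrix (one row per document, one column per word, word counts in the cells), a common input format for text-mining software. The stop-word list and the documents are hypothetical; real pipelines typically add steps such as stemming or stripping standard boilerplate phrases.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list.
STOP_WORDS = {"the", "is", "and", "a", "of", "in", "are"}


def tokenize(text: str) -> list[str]:
    """Lower-case the text, split it into words and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def document_term_matrix(docs: list[str]) -> tuple[list[str], list[list[int]]]:
    """Return the sorted vocabulary and a count matrix (rows = documents)."""
    counts = [Counter(tokenize(doc)) for doc in docs]
    vocabulary = sorted(set().union(*counts))
    matrix = [[c[word] for word in vocabulary] for c in counts]
    return vocabulary, matrix


docs = [
    "The quality of teaching is good.",
    "Teaching and behaviour are good in the school.",
]
vocab, dtm = document_term_matrix(docs)
print(vocab)  # ['behaviour', 'good', 'quality', 'school', 'teaching']
print(dtm)    # [[0, 1, 1, 0, 1], [1, 1, 0, 1, 1]]
```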
19. Boxplot showing the distribution of sentiment scores by inspection grade. N=3,155.
20. Average sentiment score for the corpus of inspection documents by Chief Inspector. N=17,212.
21. Decomposing the proportional contribution to average sentiment scores among the twelve most influential words. N=17,212.
22. BHR: Behaviour;
COM: Community (meeting, social, networking, conferences);
CUR: Topics to do with the curriculum (including FE and HE);
DOE: Department for Education matters, policies etc.;
LIT: Reading, writing and matters to do with literacy;
SKC: Skills, knowledge, cognition;
STA: Staff issues, training;
SUB: Subjects (may be specific, such as MFL (modern foreign languages), or inferred);
TPR: Teaching practice.
Hewitt et al. (2020) scraped educational blogs to see whether blog content changed after policy changes. Such developments have been made possible by advances in computational linguistics. In recent decades, several so-called ‘topic-modelling techniques’ have been developed, such as latent semantic analysis (Dumais, 2004) and latent Dirichlet allocation (Blei, Ng, & Jordan, 2003).
Hewitt, S., Tiropanis, T., & Bokhove, C. (2020). The reception of education reforms through the
Blogosphere. In WebSci '20: 12th ACM Conference on Web Science (pp. 194-201). ACM.
https://doi.org/10.1145/3394231.3397909
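To give a feel for how latent Dirichlet allocation (LDA) works, here is a compact, pure-Python sketch that fits it with collapsed Gibbs sampling, one common estimation approach (Blei, Ng and Jordan originally used variational inference). It is not the tooling used by Hewitt et al. (2020); the corpus, the number of topics and the hyperparameters `alpha`, `beta` and `n_iter` are arbitrary assumptions, and real studies use established, well-tested implementations on far larger corpora.

```python
import random


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA to tokenised documents with collapsed Gibbs sampling.

    Returns (topic_word, doc_topic): per-topic word probabilities and
    per-document topic proportions.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    ndk = [[0] * n_topics for _ in docs]            # topic counts per document
    nkw = [[0] * n_words for _ in range(n_topics)]  # word counts per topic
    nk = [0] * n_topics                             # token counts per topic

    # Start from a random topic assignment for every token.
    z = []
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(n_topics)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][word_id[w]] += 1
            nk[k] += 1

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid, k = word_id[w], z[d][i]
                # Take the token out of the counts ...
                ndk[d][k] -= 1
                nkw[k][wid] -= 1
                nk[k] -= 1
                # ... resample its topic from the full conditional distribution ...
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][wid] + beta) / (nk[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                # ... and put it back with its new topic.
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][wid] += 1
                nk[k] += 1

    topic_word = [
        {w: (nkw[k][word_id[w]] + beta) / (nk[k] + n_words * beta) for w in vocab}
        for k in range(n_topics)
    ]
    doc_topic = [
        [(ndk[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in range(n_topics)]
        for d, doc in enumerate(docs)
    ]
    return topic_word, doc_topic


# Invented mini-corpus of tokenised blog posts.
docs = [
    "exam test grade exam assessment".split(),
    "grade test exam marking".split(),
    "reform policy minister policy".split(),
    "minister reform government policy".split(),
]
topic_word, doc_topic = lda_gibbs(docs, n_topics=2)
for k, dist in enumerate(topic_word):
    top_words = sorted(dist, key=dist.get, reverse=True)[:3]
    print(f"topic {k}: {top_words}")
```

With so few documents the topics found depend on the random seed; the point is only to show the mechanics of assigning words to topics.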
23. • Munoz-Najar Galvez et al. (2020) used text analysis to study the paradigm wars in graduate research in the field of education.
• They analysed research trends in 137,024 dissertation abstracts from 1980 to 2010 and related these to students’ academic employment outcomes.
• They used structural topic models (with the stm package in R) to detect overarching themes in large collections of text: to find research areas, methodologies, and theories in the field and to show how these topics change over time.
https://www.structuraltopicmodel.com/
Munoz-Najar Galvez, S., Heiberger, R., & McFarland, D. (2020). Paradigm wars revisited: A cartography
of graduate research in the field of education (1980–2010). American Educational Research Journal,
57(2), 612-652.
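Showing how topics change over time often involves aggregating each document's estimated topic proportions by year. The Python sketch below performs only that descriptive aggregation on invented document-topic proportions. It is not the stm package's method, which instead models topic prevalence directly as a function of document covariates such as year.

```python
from collections import defaultdict


def mean_topic_share_by_year(years, doc_topic):
    """Average the topic proportions of all documents from the same year.

    years: one year per document; doc_topic: one list of topic proportions per document.
    """
    grouped = defaultdict(list)
    for year, proportions in zip(years, doc_topic):
        grouped[year].append(proportions)
    result = {}
    for year in sorted(grouped):
        rows = grouped[year]
        # zip(*rows) walks over topics; average each topic across the year's documents.
        result[year] = [sum(col) / len(rows) for col in zip(*rows)]
    return result


# Invented proportions for three topics in four dissertation abstracts.
years = [1980, 1980, 2010, 2010]
doc_topic = [[0.7, 0.2, 0.1], [0.5, 0.4, 0.1], [0.1, 0.3, 0.6], [0.2, 0.2, 0.6]]
print(mean_topic_share_by_year(years, doc_topic))
```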
25. • Inglis and Foster (2018) used topic modelling, with the MALLET package, to study evidence of the ‘social turn’ in five decades of mathematics education research.
• They tried to answer the question ‘How has the field of mathematics education research in two top-tier journals changed since 1968?’
• They describe topic modelling as “somewhat analogous to a quantitative version of grounded theory”.
• In fact, Professor Inglis gave a seminar on this in Southampton in 2019.
Inglis, M., & Foster, C. (2018). Five decades of mathematics education research.
Journal for Research in Mathematics Education, 49(4), 462-500.
27. Computational methods can also be used to transform audio into text. In a methods context, for example, this has been used to reduce the time spent transcribing audio recordings (Bokhove & Downey, 2018).
Bokhove, C., & Downey, C. (2018). Automated generation of ‘good enough’ transcripts as a first step
to transcription of audio-recorded data. Methodological Innovations, 11(2), 2059799118790743.
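Whether an automated transcript is ‘good enough’ can be checked against a human transcript of the same audio. A common measure, offered here as an assumption rather than the procedure used by Bokhove and Downey (2018), is the word error rate (WER): the word-level edit distance (substitutions, deletions and insertions) divided by the number of words in the reference transcript. The two example sentences are invented.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between two transcripts, divided by the reference length."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # dist[i][j] = minimum edits to turn the first i reference words into the first j hypothesis words
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,                 # deletion
                dist[i][j - 1] + 1,                 # insertion
                dist[i - 1][j - 1] + substitution,  # match or substitution
            )
    return dist[len(ref)][len(hyp)] / len(ref)


human = "so what is the area of this triangle"
automatic = "so what is area of the striangle"
print(word_error_rate(human, automatic))  # 0.375 (3 edits / 8 reference words)
```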
29. • Cioffi-Revilla (2017) describes how Social Network Analysis (SNA) and social complexity are two major areas within data science and computational research methods.
• The increasing popularity of SNA builds on the prominence of networks in the study of social complexity. Social media has certainly been a large catalyst in this increase. However, networks are prevalent in many other areas as well, and have been for many decades.
30. Fictional network of eleven pre-service teachers communicating over a given timespan.
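Two basic SNA measures for a network like this are degree centrality (how many ties each person has, relative to the maximum possible) and density (the share of all possible ties that are present). The Python sketch below computes both for a hypothetical set of undirected ties among eleven teachers labelled A to K; these ties are invented and do not reproduce the network in the figure.

```python
from collections import defaultdict

# Hypothetical undirected communication ties between pre-service teachers A..K.
edges = [
    ("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E"), ("E", "F"),
    ("F", "G"), ("G", "H"), ("H", "I"), ("I", "J"), ("A", "K"),
]


def degree_centrality(edges):
    """Each node's number of distinct ties divided by the maximum possible (n - 1).

    Nodes are taken from the edge list, so isolated people would be missed.
    """
    neighbours = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    n = len(neighbours)
    return {node: len(adj) / (n - 1) for node, adj in sorted(neighbours.items())}


def density(edges):
    """Share of all possible undirected ties that are present."""
    nodes = {node for edge in edges for node in edge}
    n = len(nodes)
    unique_ties = {frozenset(edge) for edge in edges}
    return len(unique_ties) / (n * (n - 1) / 2)


centrality = degree_centrality(edges)
print(max(centrality, key=centrality.get))  # A (the person with the most ties)
print(round(density(edges), 3))             # 0.2 (11 of 55 possible ties)
```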
31. • This extends to visualisations of networks as well.
• Time-stamped data also incentivised new approaches to
modelling network dynamics.
• Brouwer et al. (2020) collected communication, advice-seeking and friendship network data at four time points during a teacher training programme, and then used modelling in the R package RSiena to see how relations in the networks were formed or dissolved.
Brouwer, J., Downey, C., & Bokhove, C. (2020). The development of communication networks
of pre-service teachers on a school-led and university-led programme of initial teacher
education in England. International Journal of Educational Research, 100, 1-13. [101542].
https://doi.org/10.1016/j.ijer.2020.101542
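Before estimating models of network dynamics such as those in RSiena, it is common to describe how much a network changes between waves: how many ties were maintained, formed or dissolved, and the Jaccard index of the two tie sets (shared ties divided by all ties present in either wave). The Python sketch below performs only this descriptive step on hypothetical directed ties; it is not the RSiena estimation itself.

```python
def tie_changes(wave1, wave2):
    """Compare two waves of directed ties, each a set of (sender, receiver) pairs."""
    maintained = wave1 & wave2
    formed = wave2 - wave1
    dissolved = wave1 - wave2
    union = wave1 | wave2
    jaccard = len(maintained) / len(union) if union else 1.0
    return {
        "maintained": len(maintained),
        "formed": len(formed),
        "dissolved": len(dissolved),
        "jaccard": jaccard,
    }


# Hypothetical advice-seeking ties at two time points.
wave1 = {("A", "B"), ("B", "C"), ("C", "A"), ("D", "A")}
wave2 = {("A", "B"), ("B", "C"), ("D", "B"), ("E", "A")}
print(tie_changes(wave1, wave2))
# {'maintained': 2, 'formed': 2, 'dissolved': 2, 'jaccard': 0.3333333333333333}
```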
32. To illustrate this diversity, Bokhove (2018) used SNA techniques to model classroom interaction within mathematics lessons in a secondary school. Social networks can be said to be part of the study of social complexity, which can also be highly interdisciplinary and include key concepts from complexity science.
Bokhove, C. (2018). Exploring classroom interaction with dynamic social network
analysis. International Journal of Research & Method in Education, 41(1), 17-37.
The second lesson (R4) is a Year 7 maths lesson on the area of a triangle.
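Dynamic SNA of classroom interaction starts from coded interaction events. One simple way to make the dynamics visible, sketched below under stated assumptions rather than following Bokhove (2018) exactly, is to cut a lesson into fixed time windows and build a weighted, directed network of who addresses whom in each window. The coding scheme (T for teacher, S1 to S3 for students, ALL for the whole class), the events and the 120-second window are hypothetical.

```python
from collections import Counter, defaultdict


def windowed_networks(events, window_seconds):
    """Group coded interaction events into consecutive time windows.

    events: (time_in_seconds, speaker, addressee) tuples.
    Returns {window_index: Counter mapping (speaker, addressee) to tie weight}.
    """
    networks = defaultdict(Counter)
    for time, speaker, addressee in events:
        networks[int(time // window_seconds)][(speaker, addressee)] += 1
    return dict(sorted(networks.items()))


# Hypothetical coded events from one lesson.
events = [
    (15, "T", "ALL"), (40, "S1", "T"), (55, "T", "S1"),
    (130, "T", "S2"), (150, "S2", "T"), (170, "S3", "S2"),
]
for window, ties in windowed_networks(events, window_seconds=120).items():
    print(window, dict(ties))
```

Each window's network can then be visualised or summarised (for example by centrality) to see how the interaction pattern shifts during the lesson.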
34. Social simulation modelling
• Another computational development has been the use of simulations. These can be based on existing data or even be modelled from ‘first principles’, for example a mathematical model.
• It is advisable to design and build simulation models around a set of research questions, which may concern basic science, policy analysis, or sometimes both.
• Cioffi-Revilla (2017) describes how another characteristic shared by social simulation models is that they are developed through a series of developmental stages. As mentioned before, the large volumes of data have accelerated methodological developments in capturing the dynamics of communication and organisational systems.
35. Process mining
• Rizvi et al. (2020) analyse and model data created in a Massive Open Online Course (MOOC).
• They set out to answer the question of how sequences of learning activities differ between three groups of learners, by analysing data from 2,086 learners across 68 learning activities.
• Process mining can analyse the data users create in the course, for example event logs, and provide insight into what learners are doing.
• Simulations can also be completely theoretical.
Rizvi, S., Rienties, B., Rogaten, J., & Kizilcec, R. F. (2020). Investigating variation in learning
processes in a FutureLearn MOOC. Journal of Computing in Higher Education, 32(1), 162-181.
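A central building block in many process-mining tools is the directly-follows graph, which counts how often one activity is immediately followed by another within each learner's sequence. The Python sketch below computes it from a hypothetical MOOC event log. It is not the specific procedure or software used by Rizvi et al. (2020); dedicated tools add filtering, process-discovery algorithms and visualisation on top.

```python
from collections import Counter, defaultdict


def directly_follows(event_log):
    """Count how often activity a is directly followed by activity b within a case.

    event_log: (case_id, timestamp, activity) tuples, in any order. Events are sorted
    per case by timestamp; ties are broken alphabetically by activity name.
    """
    traces = defaultdict(list)
    for case_id, timestamp, activity in sorted(event_log):
        traces[case_id].append(activity)
    counts = Counter()
    for activities in traces.values():
        # Pair each activity with the one that comes directly after it.
        counts.update(zip(activities, activities[1:]))
    return counts


# Hypothetical MOOC event log: (learner, minute, activity).
log = [
    ("L1", 1, "video"), ("L1", 5, "article"), ("L1", 9, "quiz"),
    ("L2", 2, "video"), ("L2", 4, "quiz"),
    ("L3", 3, "article"), ("L3", 7, "video"), ("L3", 8, "quiz"),
]
for (a, b), n in directly_follows(log).most_common():
    print(f"{a} -> {b}: {n}")
```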
36. • Lewandowsky et al. (2019) used a software package called NetLogo to model how information from experts and the public on climate change might interact and develop over time. NetLogo is a multi-agent programmable modelling environment. See https://ccl.northwestern.edu/netlogo/
• An advantage of such simulations is that you can introduce changes in the models and see whether these lead to different outcomes.
• A model’s outcomes will always depend on its underpinning assumptions.
Lewandowsky, S., Pilditch, T. D., Madsen, J. K., Oreskes, N., & Risbey, J. S. (2019). Influence
and seepage: An evidence-resistant minority can affect public opinion and scientific belief
formation. Cognition, 188, 124-139.
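The idea of changing one aspect of a model and checking whether the outcome changes can be illustrated with a toy agent-based model in Python. This is a simple voter model with a few ‘stubborn’ agents who never update, loosely inspired by the idea of an evidence-resistant minority. It is not the NetLogo model of Lewandowsky et al. (2019), and every parameter (number of agents, number of steps, copying rule, seed) is an arbitrary assumption.

```python
import random


def simulate(n_agents=100, n_stubborn=0, steps=2000, seed=1):
    """Toy voter model with stubborn agents.

    Opinions are 1 (accepts the evidence) or 0 (rejects it). At each step a randomly
    chosen non-stubborn agent copies the opinion of a randomly chosen agent (possibly
    itself). The first n_stubborn agents hold 0 and never update.
    Returns the final share of agents holding opinion 1.
    """
    rng = random.Random(seed)
    opinions = [0] * n_stubborn + [1] * (n_agents - n_stubborn)
    for _ in range(steps):
        agent = rng.randrange(n_stubborn, n_agents)  # only non-stubborn agents update
        other = rng.randrange(n_agents)
        opinions[agent] = opinions[other]
    return sum(opinions) / n_agents


# Change one assumption (the size of the stubborn minority) and compare outcomes.
for n_stubborn in (0, 2, 5, 10):
    print(n_stubborn, simulate(n_stubborn=n_stubborn))
```

Because the stubborn agents never move while everyone else copies at random, the share holding opinion 1 tends to drift downwards once a stubborn minority exists. The outcome is built into the assumptions, which is precisely why a model’s outcomes always depend on its underpinning assumptions.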
37. • Simulating what happens if a tie in a social network (e.g. a classroom) is
broken or formed.
• Simulating answers to Likert scales in international large-scale assessment (ILSA) studies.
• Simulating ‘cognitive load’ in a cognitive architecture like ACT-R (e.g. see work by Wirzberger).
• Networks of charities with (formerly) Companies House data.
39. Challenges(-a-plenty)
• Multidisciplinary
• Theory and practice, paradigm clash?
• Classical debates between theory-driven and data-driven research.
• Phenomenon-driven research: iterations between theory and data, with data informing theory and theory informing data.
• Qualitative and quantitative approaches as complementary
• Interpretation of findings
• Cause-and-effect
• Reproducibility of the algorithms
• Challenges of language (e.g. double negations)
• Can be computationally intensive (hard on a laptop!)
• Coding skills and software knowledge
• but many resources are available; I use R a lot.
• The structure of sources changes
• Ofsted website, Companies House website, PDF documents
• Ethical aspects of secondary data
40. Conclusion
We have seen that advances in technology give us new opportunities to process and analyse all kinds of data at scale. Computational research methods can be used to automatically extract social information, to study social networks and social complexity more easily, and to simulate social contexts. The methods often cut across paradigmatic boundaries, using theory and data iteratively, and allow research questions to be front and centre. However, to fully utilise these affordances, it is important to have interdisciplinary skills and an understanding of both the tools and the social context.
41. Thank you - Questions
• C.Bokhove@soton.ac.uk
• Southampton Education School
• Twitter: @cbokhove
• Website: www.bokhove.net