Where does EU money go? Availability and quality of Open Data on the recipients of EU Structural Funds
1. 1
2nd International EIBURS-TAIPS conference on: “Innovation in the public sector and the development of e-services”
Marco Biagetti, Luigi Reggi
EIBURS-TAIPS team and Italian Ministry of Economic Development *
luigi.reggi@tesoro.it
University of Urbino
April 18th, 2013
* The views expressed here are those of the authors and, in particular, do not necessarily reflect those of the Ministry of Economic Development
2. 2
Outline
• Open Government Data and the development of public eServices
• Open Data on EU Regional Policy
• Relevant literature and research objectives
• Methodology and results
• Data collection
• Nonlinear PCA & cluster analysis: identifying Open Data strategies
• mlogit and logit models: the determinants of strategic choices
• Conclusions
3. 3
Open Govn’t Data and public eServices provision
Increased openness of government datasets is emerging as a desirable feature across
Europe (Davies, 2010). Open data is seen as having significant economic potential,
generating user-driven innovation (Von Hippel, 2005) based on the availability of
previously restricted information and the creation of new firms. This can lead to the
creation of new public eServices that are both effective (user-centred) and efficient
(harnessing capacity and knowledge outside government).
In particular, Open Government Data (OGD):
(a) fosters transparency and accountability of policy choices;
(b) enables the creation of new public eServices by government, civil society and
individual citizens;
(c) increases collaboration across government bodies and with citizens and
enterprises;
(d) enables substantial improvements in the quality of policy making, in terms, e.g., of
quality of spending and public value delivered;
(e) may contribute to the creation of social capital by enhancing information
flows to and from the citizen (e.g. participation in public debates, crowdsourcing of
relevant information).
4. 4
Open Government Data Definition: The 8 Principles
1. Data Must Be Complete
All public data are made available. Data are electronically stored information or
recordings, including but not limited to documents, databases, transcripts, and audio/
visual recordings. Public data are data that are not subject to valid privacy, security or
privilege limitations, as governed by other statutes
2. Data Must Be Primary
Data are published as collected at the source, with the finest possible level of
granularity, not in aggregate or modified forms
3. Data Must Be Timely
Data are made available as quickly as necessary to preserve the value of the data.
4. Data Must Be Accessible
Data are available to the widest range of users for the widest range of purposes.
5. Data Must Be Machine Processable
Data are reasonably structured to allow automated processing.
6. Access Must Be Non-Discriminatory
Data are available to anyone, with no requirement of registration.
7. Data Formats Must Be Non-Proprietary
Data are available in a format over which no entity has exclusive control.
8. Data Must Be License-free
Data are not subject to any copyright, patent, trademark or trade secret regulation.
Reasonable privacy, security and privilege restrictions may be allowed as governed by
other statutes.
5. 5
EU Open Data policy
E-government action plan 2011-2015
• Improvement of Transparency
• Access to information on government laws and
regulations, policies and finance
• Re-use of Public Sector Information
The Digital Agenda for Europe
“Turning government data into
gold”
Re-use of Public Sector Information Directive (2003)
A common legislative framework regulating how public sector bodies should make their
information available for re-use in order to remove barriers such as discriminatory practices,
monopoly markets and a lack of transparency.
In December 2011, the Commission presented an Open Data Package:
1. A Communication on Open Data
2. A proposal for a revision of the Directive, which aims at opening up the market for
services based on public-sector information, by
• including new bodies in the scope of application of the Directive such as libraries
(including university libraries), museums and archives;
• limiting the fees that public authorities can charge to marginal costs,
as a rule;
• introducing independent oversight over re-use rules in the Member States;
• making machine-readable formats for information held by public authorities the
norm.
3. New Commission rules on re-use of the documents it holds
6. 6
Relevant literature on open data policy
Current emerging practice focuses on the publication of open government data in
machine-readable format, possibly through open standards, so that the data can be
easily re-used by citizens, enterprises and civil society.
How to measure this effort?
Open data and the “invisible hand” (Brito, 2007; Robinson et al., 2009)
A first stream of literature focuses on the “invisible hand” of private sector or civil
society organizations, which are able to reuse PSI and to mash up this information with
other sources to create new innovative services. Government should only publish data in
open, machine-readable formats.
Public Value & Data divide (Dawes and Helbig, 2010; Gurstein, 2011; Harrison et al., 2011)
Other scholars think that government should consider different users' needs (public
value) and also provide easy-to-access data in processed form (data divide).
7. 7
Relevant literature on open data policy
Theoretical framework
Source: Dawes (2010)
Stewardship
1. Metadata provision
2. Data management
3. Data standards and formats
4. Information quality and classification
Usefulness
1. Easy-to-use basic features
2. Searching and display
3. Use social media to enhance
description and use
EXAMPLES OF STEW & USEF VARIABLES:
Most voted proposals from “Evolving Data.gov with You” online dialogue
(as of April 21, 2010)
Two complementary principles that need to be balanced
8. 8
Research objectives
• To explore the information-based strategies that European public agencies
are pursuing when publishing their data on the web
• To analyze the evolution of such strategies from 2010 to 2012
9. 9
Open Government Data and EU Regional Policy
EU Cohesion Policy represents an ideal opportunity for measuring the levels of transparency,
trustworthiness and interactivity of available open government data
• Beneficiaries of public funding are widely recognized as the open data #1 priority (Osimo,
2008)
• Cohesion Policy is the second-largest item of the EU budget: 347 billion euros for the
2007-13 period. The purpose of cohesion policy is to reduce disparities between the levels
of development of the EU's various regions.
• Transparency of EU Structural Funds has been questioned
• On the one hand, all Member States and EU regions are involved and share common rules
and regulations, which makes data perfectly comparable.
• On the other hand, the regulations focus only on a minimum set of requirements for
publishing data on the web, which leaves room for an improvement in terms of detail,
quality, access and visualization.
“the managing authority shall be responsible for organising the
publication, electronically or otherwise, of 1. the names of the
beneficiaries, 2. the names of the operations and 3. the amount
of public funding allocated to the operations”
Structural Funds Regulation 2007-13
Art. 7, Reg. (EC) No 1828/2006 of 8 December 2006
10. 10
Open Government Data and EU Regional Policy
The new regulations for the 2014-2020 programming period – currently under negotiation
– are stressing the need for more transparency and openness.
Art. 105 General Regulation (EC proposal)
Machine-readable format: CSV, XML
single national website or portal
Now mandatory data fields include
• Beneficiary name (only legal entities; no natural persons shall be named);
• Operation name; Operation summary;
• Operation start date & Operation end date (expected date for physical completion or full
implementation of the operation);
• Total eligible expenditure allocated to the operation;
• EU co-financing rate (as per priority axis);
• Operation postcode;
• Name of category of intervention for the operation;
• Date of last update of the list of operations.
• The headings of the data fields and the names of the operations shall be also provided
in at least one other official language of the European Union.
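As a rough illustration of what a machine-readable list of operations could look like, the sketch below writes the data fields listed above to CSV. The column headings and the sample record are hypothetical; the proposed regulation names the fields but this example does not reproduce any official schema.

```python
import csv
import io

# Hypothetical column headings, one per mandatory data field listed above.
FIELDS = [
    "beneficiary_name", "operation_name", "operation_summary",
    "operation_start_date", "operation_end_date",
    "total_eligible_expenditure", "eu_cofinancing_rate",
    "operation_postcode", "category_of_intervention", "last_update",
]

def operations_to_csv(operations):
    """Serialize a list of operation dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(operations)
    return buffer.getvalue()

# Made-up record, for illustration only.
example = [{
    "beneficiary_name": "Example Municipality",
    "operation_name": "Broadband rollout",
    "operation_summary": "Fibre connection for public schools",
    "operation_start_date": "2014-03-01",
    "operation_end_date": "2016-02-28",
    "total_eligible_expenditure": "1500000.00",
    "eu_cofinancing_rate": "0.50",
    "operation_postcode": "61029",
    "category_of_intervention": "ICT infrastructure",
    "last_update": "2014-06-30",
}]

csv_text = operations_to_csv(example)
print(csv_text)
```

A flat file of this kind can be opened in a spreadsheet or parsed by a script, which is what distinguishes it from a PDF listing.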
11. 11
Empirical method
Web-based analysis of the lists of beneficiaries of 434 EU27 Operational Programmes
co-funded by Structural Funds
Empirical analysis:
1. Aggregating the 33 initial variables
2. Nonlinear Principal Component Analysis: reducing 33 variables to 2 main
dimensions
3. Identifying and analysing the evolution of open data strategies from 2010 to 2012
4. Exploring the determinants of the different strategies
12. 12
Data collection
An ad-hoc web-based survey was carried out on the universe of all EU OPs co-
funded by the European Regional Development Fund (ERDF) and the European Social
Fund (ESF), aiming to ascertain the presence or absence of 33 specific quality features
• All EU Countries and Regions included
• 434 Operational Programmes reviewed
[European Commission - DG Regional Policy database]
• Starting point: EC DG Regional Policy and DG Employment dedicated portals
• Three waves: Oct 2010, Oct 2011, Oct 2012
The methodology stems from the following studies and guidelines:
• Technopolis Group: Study on the quality of websites containing lists of beneficiaries of EU Structural Funds (2010)
• UK Central Office of Information: Underlying data publication: guidance for public sector communicators, website managers and policy teams (2010)
• Open Government Working Group: 8 Principles of Open Government Data (2007)
• Open Knowledge Foundation, The Open Data Manual http://opendatamanual.org
• W3C: Improving Access to Government through Better Use of the Web (2009)
• Preliminary survey on prevailing characteristics (August-Sept 2010)
13. 13
From 33 basic dichotomous variables to 8 indices
For each of the categories composing Stewardship and Usefulness, in
terms of access to and dissemination of data on Structural Funds’
beneficiaries, the results attained by EU Operational Programmes are
itemised below through a simple index (expressed as a percentage): the
number of characteristics already active divided by the total number of
characteristics that could theoretically be activated.
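The index described above can be sketched in a few lines. The function name and the sample feature list are hypothetical; the only thing taken from the slide is the definition, i.e. active characteristics over the characteristics that could be active, expressed as a percentage.

```python
def openness_index(features):
    """Share (in %) of active characteristics among those that could be active.

    `features` is a sequence of dichotomous values (True/False or 1/0),
    one per underlying characteristic of an aggregated variable.
    """
    if not features:
        raise ValueError("at least one characteristic is required")
    active = sum(1 for f in features if f)
    return 100.0 * active / len(features)

# Hypothetical OP: four financial-data characteristics, three of them present.
fin_features = [True, True, False, True]
fin_index = openness_index(fin_features)
print(fin_index)
```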
14. 14
From 33 basic dichotomous variables to 8 indices
STEWARDSHIP VARIABLES
Aggregated variables: underlying variables
• Content (CONT): Final Beneficiary; Project; Axis; Specific/Operat. Objectives; Intervention Line; Project description; Award and payment dates; Project start/end dates; Status (active/completed)
• Financial Data (FIN): Financial value allocated to the project; Payments; EU co-financing; National co-financing (or other)
• Format = PDF (PDF): PDF
• Format = HTML (HTML): HTML
• Format = XLS or CSV (XLSCSV): XLS or CSV
• Information Quality (QUAL): Last update date; Update frequency; Data description; Fields description in another language; Number of clicks from home page < 3; robots.txt does not prevent search engine search
15. 15
From 33 basic dichotomous variables to 8 indices
USEFULNESS VARIABLES
Aggregated variables: underlying variables
• DB consultation through masks (RIC): Search by Fund type; Search by Project; Search by OP; Search by Axis/Object./Action; Search by Beneficiary; Search by Resources; Search by Territory/Area; Search by Project status
• Advanced Functions (GEO): Georeferencing through maps; Visualisation through graphs and other elaborations; Data with sub-regional detail
16. 16
Descriptive stats
All variables have increased during the short period of time considered except (of course) pdf
17. 17
Dimension reduction: Nonlinear PCA
The eight constructed variables are categorical and metric but in no way
continuous.
We want to reduce the number of dimensions to a few “summarizing artificial
ones” while still preserving the basic (bi)linearity of a traditional multivariate technique
such as Principal Component Analysis.
Bilinearity means that the data matrix is approximated by inner products of scores and loadings.
WE ALSO WANT TO ALLOW FOR POSSIBLE NON LINEAR TRANSFORMATIONS
OF THE VARIABLES => We use NON LINEAR PCA (NLPCA)
Indeed, NLPCA should be used whenever there are rank orders made up by numerical
values but the possibility of non linear transformations that better fit the bilinear
model cannot be discarded. In other cases NLPCA can be performed together with
Multiple Correspondence Analysis (De Leeuw, 2005).
18. 18
Dimension reduction: Nonlinear PCA
In other words, we want to minimize the loss not only over scores and loadings,
to assess the fit of, say, p dimensions as is done in PCA, but also
over the admissible transformations of the columns of X (our data matrix).
Least squares loss function of PCA to be minimized, where a = component scores, b =
component loadings
Least squares loss function of NLPCA to be minimized, where a, b are the same as above
Admissible transformations of variable j. NLPCA of this kind was
proposed for monotone transformations by Lingoes &
Guttman (1968) and Kruskal & Shepard (1974). Young et al. (1978)
and Gifi (1990) extended NLPCA to wider classes of admissible
transformations beyond monotone ones.
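The formulas behind the captions on this slide are not present in the transcript. A standard way of writing the two loss functions, following the usual formulation in the NLPCA literature (De Leeuw, 2005), is sketched below. The index ranges and the symbols K_j and S are assumptions introduced here: n is the number of observations (OPs), m the number of variables, p the number of dimensions, K_j the cone of admissible transformations of variable j, and S the unit sphere used to normalize the transformed variables.

```latex
% PCA: minimize over component scores a_{is} and loadings b_{js}
\sigma(A, B) = \sum_{i=1}^{n} \sum_{j=1}^{m}
  \Bigl( x_{ij} - \sum_{s=1}^{p} a_{is} b_{js} \Bigr)^{2}

% NLPCA: same loss, but also minimized over the transformed columns x_j of X
\sigma(A, B, X) = \sum_{i=1}^{n} \sum_{j=1}^{m}
  \Bigl( x_{ij} - \sum_{s=1}^{p} a_{is} b_{js} \Bigr)^{2},
  \qquad x_{j} \in K_{j} \cap S
```

The only difference between the two is that in NLPCA the data columns are themselves unknowns, restricted to the admissible transformations of each variable.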
19. 19
Identifying EU regional open data strategies
The following figures help us analyze graphically the first two underlying
dimensions of the 8 indices (variables) considered together.
We plot the coordinates of the variables’ loadings (black arrows), which are very
important for analyzing the relations between variables, and the coordinates of
each observation (small blue circles), that is, each Operational Programme (OP)
considered.
The points shown are fewer than 434 because OPs that share a common portal have the same
coordinates.
We are looking for meaningful clusters of variables (loadings) that are consistent
with the current literature on open data strategies.
20. 20
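Identifying EU regional open data strategies
[Figure: NLPCA plots for 2010, 2011 and 2012. Variance accounted for by dimensions 1 and 2: 35% and 23% (2010), 38% and 21% (2011), 47% and 13% (2012).]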
21. 21
Identifying EU regional open data strategies
2010 & 2011
The first dimension (accounted var = 35 to 38%) helps differentiate a “regulation-centred”
approach from a proactive strategy
The second dimension (accounted var = 23 to 21%) is useful to distinguish between the
stewardship and the usefulness approach
3 different strategies
1. where DIM1 > 0 & DIM2 > 0
STEWARDSHIP STRATEGY (STEW): it implies the release of high-quality data in machine-
readable format
2. where DIM1 > 0 & DIM2 < 0
USEFULNESS STRATEGY (USEF): focused on data visualization and interactive search in
order to include non-technically oriented citizens in open data re-use and understanding
3. where DIM1 < 0
REGULATION-CENTRED STRATEGY (PDF): this strategy is about NOT being open. Little
detail, little quality, PDF format prevailing
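The quadrant rule above can be written as a small function. This is only a sketch: the function name and labels are hypothetical, and the treatment of scores exactly equal to zero is an assumption, since the slide does not say how such ties are assigned.

```python
def classify_strategy(dim1, dim2):
    """Assign an OP to a 2010/2011 strategy from its NLPCA object scores.

    Assumption: a score of exactly 0 on DIM1 is treated as proactive and a
    score of exactly 0 on DIM2 as usefulness; the slide leaves ties undefined.
    """
    if dim1 < 0:
        return "PDF"   # regulation-centred strategy
    if dim2 > 0:
        return "STEW"  # stewardship strategy
    return "USEF"      # usefulness strategy

# Hypothetical scores for three OPs.
for scores in [(0.8, 0.5), (0.8, -0.5), (-0.3, 1.2)]:
    print(scores, classify_strategy(*scores))
```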
22. 22
Identifying EU regional open data strategies
2012
The first dimension (accounted var increases to 47%) helps differentiate a “regulation-
centred” approach from a proactive strategy
The second dimension accounts for much less % of total variance (13%, while the third
and fourth dimensions account for 12 and 11% respectively) and is hardly interpretable.
Some variables previously belonging to alternative proactive strategies now are highly
correlated.
For example, in 2010 a machine-readable format was associated with highly detailed
financial data on project implementation or with proper metadata and projects’ description,
while the presence of a map or of advanced search capabilities was likely where data were
presented directly in an HTML page. Now the two formats are highly correlated.
So we take into account only the first dimension to interpret the results.
We can identify only two alternative strategies, based on the 1st DIM:
1. where DIM1 > 0
MIXED PROACTIVE STRATEGY
2. where DIM1 < 0
REGULATION-CENTRED STRATEGY (PDF)
23. 23
Strategies identified: descriptive tabs by year
No. of OPs by strategy adopted
Strategy                    2010 n (%)    2011 n (%)    2012 n (%)
Regulation-centred [PDF]    255 (59)      235 (54)      233 (54)
Usefulness                  106 (24)      120 (28)      n/a
Stewardship                 73 (17)       79 (18)       n/a
Mixed proactive             n/a           n/a           201 (46)
Total                       434 (100)     434 (100)     434 (100)
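As a quick consistency check, the percentages in this table can be recomputed from the counts. The dictionary layout and short labels below are just one convenient, hypothetical way to hold the figures; the counts themselves are those reported on the slide.

```python
# Counts of OPs by strategy and year, as reported on the slide.
counts = {
    2010: {"PDF": 255, "USEF": 106, "STEW": 73},
    2011: {"PDF": 235, "USEF": 120, "STEW": 79},
    2012: {"PDF": 233, "MIXED": 201},
}

def shares(year_counts):
    """Return each strategy's share of OPs as a rounded percentage."""
    total = sum(year_counts.values())
    return {name: round(100 * n / total) for name, n in year_counts.items()}

for year, year_counts in counts.items():
    print(year, sum(year_counts.values()), shares(year_counts))
```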
24. 24
How do they evolve over time? Transition matrices
The majority of PDF-centered OPs are confirming their strategy. PDFs and “closed data”
are die-hard features of EU OPs!
However, from 2010 to 2012, the number of OPs adopting the “regulation-centered” strategy
(PDF) decreases slightly. From 2010 to 2011, most of the OPs that abandoned it switched to the
Usefulness strategy (17.5% of OPs adopting the Usefulness strategy in 2011 had chosen
the PDF strategy back in 2010).
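A transition matrix of this kind can be built by pairing each OP's strategy in two consecutive waves. The sketch below computes, for each destination strategy, the share of its OPs coming from each origin strategy, which is the reading used for the 17.5% figure above. The function name and the toy labels are hypothetical and are not the survey data.

```python
from collections import Counter

def transition_shares(before, after):
    """Map (origin, destination) to the share of the destination's OPs
    that had adopted the origin strategy in the previous wave."""
    pairs = Counter(zip(before, after))
    destination_totals = Counter(after)
    return {
        (origin, dest): n / destination_totals[dest]
        for (origin, dest), n in pairs.items()
    }

# Toy example with five OPs observed in two waves (made-up labels).
wave_1 = ["PDF", "PDF", "USEF", "USEF", "STEW"]
wave_2 = ["PDF", "USEF", "USEF", "USEF", "STEW"]
matrix = transition_shares(wave_1, wave_2)
for (origin, dest), share in sorted(matrix.items()):
    print(f"{origin} -> {dest}: {share:.2f}")
```

Dividing by the origin totals instead would give the complementary reading, i.e. the share of each origin strategy's OPs that moved to each destination.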
28. 28
Explaining strategies: the independent variables
What are the determinants of the strategic choices made by EU public
authorities?
We employ the following variables as regressors:
1) centralization = presence of a centralized national website or portal, i.e. one site for
all OPs active in the Country (it changes over the 3 years: no=0 goes from 234 [2010] to 225
[2012]; conversely, yes=1 goes from 225 to 234)
2) fund = European Regional Development Fund (ERDF) or European Social Fund (ESF) (317 ERDF
and 117 ESF)
3) financial endowment = total financial resources allocated to the OP (the only
continuous independent variable)
4) objective = 1 for Convergence objective, 2 for Competitiveness and Employment
objective, 3 for Cooperation objective, U for OPs that belong to both Convergence and
Competitiveness objectives (161 OPs for 1, 173 for 2, 71 for 3, 29 for U)
5) naz_reg = territorial scope of the OP (71 cb=cross-border, 12 m=multiregional, 92
n=national, 258 r=regional)
6) new_entries = YES if new Member State, NO if EU15 (71 missing = cross-border OPs,
which have no nationality; 268 from old member states; 95 from new member states)
29. 29
Explaining strategies: the technique
Cluster analysis showed that 3 strategies are present in 2010 and 2011. In
2012 the story is quite different: there are only 2 strategies.
Furthermore, the variables used hardly change over the years. That is
why the use of nonlinear panel data techniques is not very informative in
our case.
WE PREFER TO USE MULTINOMIAL LOGIT (ML) FOR THE FIRST
TWO YEARS AND LOGIT (L) FOR THE LAST TO CHECK HOW
INDEPENDENT VARIABLES SHAPE THE PROBABILITY OF
CHOOSING A STRATEGY.
ML => 3 STRATEGIES L => 2 STRATEGIES
Two specifications are proposed: Model A with all of the OPs; Model B
with Convergence and Competitiveness OPs but without Cross-
border OPs. Model B allows us to add the variable “new entry”,
which cannot be attributed to Cross-border OPs.
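To make the link between the two models concrete, the sketch below shows how predicted probabilities follow from the linear predictors in a multinomial logit with PDF as the base category, and how the same formula reduces to a binary logit when there is only one alternative. The function name and the numeric values are hypothetical; they are not the estimated coefficients.

```python
import math

def mlogit_probabilities(linear_predictors, base="PDF"):
    """Predicted probabilities for a (multinomial) logit.

    `linear_predictors` maps each non-base outcome to its linear predictor
    x'beta; the base category has a linear predictor of 0 by construction.
    """
    scores = {base: 0.0, **linear_predictors}
    largest = max(scores.values())  # subtract the max for numerical stability
    exp_scores = {k: math.exp(v - largest) for k, v in scores.items()}
    total = sum(exp_scores.values())
    return {k: v / total for k, v in exp_scores.items()}

# Hypothetical linear predictors for one OP (not estimates from the study).
three_way = mlogit_probabilities({"USEF": 0.4, "STEW": -0.6})
two_way = mlogit_probabilities({"PROACTIVE": 0.8})
print(three_way)
print(two_way)
```

With a single alternative the expression is the familiar logistic function, which is why the 2012 analysis, with only two strategies, uses a plain logit.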
30. 30
Explaining strategies: empirical results 2010
Base category = PDF
[Table of estimates; outcome columns: Usefulness, Stewardship]
Base categories: Centralization==0, fund=ERDF, objective=1, naz_reg
(model A)=cb | naz_reg (model B)=n, new_entries (model B)=0
31. 31
Explaining strategies: basic results - 2010
Centralization positively affects both proactive strategies in both
specifications. So does being a new member state in model B.
ESF has a negative effect on proactive strategies in model A.
Financial endowment has a positive effect on proactive strategies, except for
stewardship in model B. So do objective 2 programs, except for stewardship
in model A.
Multiregional programs have a positive effect on proactive strategies only in model B.
Regional programs negatively affect the shift from PDF to usefulness in
model A and positively affect the shift from PDF to stewardship in model B (so do
national programs, as far as model A is concerned).
32. 32
Explaining strategies: results (from pdf to other) 2010
These categories are important, as the LR test shows, confirming the Pseudo R2
result obtained when the variable new entry is taken out.
This means that model B is better specified, even though
we lose CB OPs there.
33. 33
Explaining strategies: some predicted probs 2010
In model B, if an OP were centralized there would be a 42% probability of
adopting a PDF strategy, a 44% probability of adopting a usefulness strategy and a
14% probability of adopting the stewardship strategy. But if the OP also belonged to a new
member state, the probability of the PDF strategy would fall to 5%, that of usefulness would
go down to 15% and that of stewardship would increase to 80%!!
In model A, if an OP were centralized there would be a 32% probability of
adopting a PDF strategy, a 41% probability of adopting a usefulness strategy
and a 27% probability of adopting the stewardship strategy.
34. 34
Explaining strategies: results (from pdf to others) 2011
Base category = PDF
[Table of estimates; outcome columns: Usefulness, Stewardship]
Base categories: Centralization==0, fund=ERDF, objective=1, naz_reg
(model A)=cb | naz_reg (model B)=n, new_entries (model B)=0
35. 35
Explaining
strategies:
basic
results
-‐
2011
The specification of the model loses momentum in 2011 (Pseudo R
2 decreases for both specifications).
Even centralization – though strongly and positively correlated to
the probability of adopting proactive strategies – is a bit less so for
what concerns the shift from PDF to stewardship in model B. New
membership keeps on counting a lot.
National, regional or multiregional programs keep on being not
very informative in model A in the shift to stewardship, while national
and regional ones affect negatively the path from PDF to
usefulness.
Oppositely, in model B multiregional OPs are positively correlated
to the shifts towards proactive strategies. Again model B should be
preferred even though an analysis of CB OPs cannot be performed
(CB are by definition lacking of the variable membership).
36. 36
Explaining strategies: results (from pdf to other) 2011
Not much changes in 2011, apart from a decrease in the strong significance
of the objective 2, multiregional and regional programs.
Again, model B has fewer observations but shows better specification
performance.
37. 37
Explaining strategies: some predicted probs 2011
In model B, if an OP were centralized, the probabilities would not change
much with respect to 2010. But if centralization were carried out by a new member state,
the probability of adopting a passive (PDF) strategy would be 6%, that of usefulness
19% and that of stewardship 75%.
In model A, if an OP were centralized there would be a 31% probability of
adopting a PDF strategy, a 43% probability of adopting a usefulness strategy
and a 26% probability of adopting the stewardship strategy (they hardly change).
38. 38
Explaining strategies: results (from pdf to proactive) 2012
Base category = PDF (recall that this is a binary logit)
[Table of estimates; outcome column: Proactive strategy]
Base categories: Centralization==0, fund=ERDF, objective=1, naz_reg
(model A)=cb | naz_reg (model B)=n, new_entries (model B)=0
39. 39
Explaining strategies: basic results - 2012
Centralization and new membership are confirmed as the most
important determinants of the mixed strategy as well.
ESF affects the proactive strategy negatively, more so in model A than in
model B, while financial endowment affects it positively, more so in the former
than in the latter.
Objective 2 programs have a positive effect in the better specified model B, while
objective U programs have a negative effect on proactive strategies in model A.
Multiregional programs have a positive effect in model B, while regional programs have a
negative effect on proactive strategies in model A.
40. 40
Explaining strategies: some predicted probs 2012
In model A, a centralized OP would have a 69% probability of adopting a
proactive mixed strategy. In model B this probability would be 62%, but it would
increase to 92% (!!!) if the OP belonged to a new member state!
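The figures on this slide are predicted probabilities from the logit, which are not the same thing as odds. A minimal sketch of the conversion between the two, using the three percentages quoted above read as probabilities (the function names are hypothetical):

```python
def probability_to_odds(p):
    """Odds in favour of an event that has probability p (0 <= p < 1)."""
    if not 0 <= p < 1:
        raise ValueError("p must be in [0, 1)")
    return p / (1 - p)

def odds_to_probability(odds):
    """Probability of an event with the given non-negative odds."""
    if odds < 0:
        raise ValueError("odds must be non-negative")
    return odds / (1 + odds)

# Probabilities quoted on the slide for a centralized OP in 2012.
for label, p in [("model A", 0.69), ("model B", 0.62), ("model B, new member", 0.92)]:
    print(f"{label}: probability {p:.2f}, odds {probability_to_odds(p):.2f}")
```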
TO SUM UP: NEW MEMBERSHIP AND CENTRALIZATION ARE
FOUND TO BE THE MOST IMPORTANT DETERMINANTS FOR THE
ADOPTION OF PROACTIVE STRATEGIES
41. 41
Conclusions
1. There is still a long way to go to ensure that data on EU Regional Policy are truly
transparent and re-usable for the creation of new public eServices.
A nonlinear multivariate analysis of 8 indices on the openness and transparency of 434
Operational Programmes in Europe shows that a strategy that we called “Regulation-
centered” (PDF) is prevailing (54% of total OPs adopted it in October 2012). This
strategy implies little information detail, difficult accessibility, non-machine readable
formats. Available information is limited to basic information on projects, funding and
beneficiaries
2. In 2010 and 2011 we can also identify 2 different proactive strategies:
a. a first strategy focuses on the characteristics of data quality and reusability
(content, financial data, downloadable XLS format, ease of search, update and
description), which then appear strongly inter-connected. This strategy is
therefore consistent with the Stewardship principle developed in the literature by
Dawes (2010).
b. a second strategy focuses on the characteristics that enable users to more
effectively access data published in administrations’ websites. The variables
characterising this cluster are: presence of a search mask, data geo-referencing,
and use of "pop-up" or other HTML views to display data detail on projects and
beneficiaries. This strategy is consistent with the Usefulness principle
42. 42
Conclusions
3. From October 2010 to October 2012 the strategies have evolved, leaving room for
more speculation about what kind of supply of policy data we can expect for the future.
More precisely, data suggests that the two proactive strategies have become one.
In fact, it is impossible to clearly distinguish a strategy based on re-usable formats and
detailed information from a strategy focused on letting users browse through data and
diagrams.
For example, some national or regional portals now let the users both download
the data in bulk and surf through the data right on the website. Obviously, this is
good news for researchers, data journalists and ordinary citizens. Data providers seem
to be more aware that the usefulness and stewardship principles are complementary.
4. The characteristic of the OPs that most influences the choice of a pro-active
strategy is the presence of a centralized, national portal containing all data from the
OPs managed within the Country. This is consistent with the provisions of the
proposed new 2014-2020 General Regulation of Structural Funds.
New EU Member States tend to be more open and transparent in managing EU
funds. This choice could be explained by the greater influence that the EU Commission
can exert on local Managing Authorities.
43. Conclusions: the path to a balanced approach
[Diagram: from closed data (regulation centered) to open, hi-quality, useful and accessible data, along two paths]
• Stewardship (re-user centered): data quality approach, FOCUSED ON raw data, advanced user, mash-up apps
• Usefulness (user centered): data visualization approach, FOCUSED ON processed data, non technically-oriented citizens