2016 SDMX Experts meeting, Using SDMX to enable data-sharing for analytical and research purposes: The Bundesbank’s House of Micro Data/ Using SDMX for exchanging complex and large micro data sets, Deutsche Bundesbank
1. Using SDMX to enable data-sharing for analytical and research
purposes: The Bundesbank’s House of Micro Data / Using
SDMX for exchanging complex and large micro data sets
Marta Salvador Villà / Jürgen Krelaus
Statistical Information Management Dept. – Deutsche Bundesbank
Expert Meeting on SDMX, Aguascalientes (Mexico), 19 October 2016
19 October 2016
Statistics Department, Deutsche Bundesbank
Page 1
2. Contents
❙ Two (at first sight) independent topics of this contribution
1. Bundesbank’s House of Microdata
2. Large and complex micro data sets (for data reception)
❙ Interconnected by the statistical business process
and the increasing role of SDMX
3. The statistical business process –
Main steps of the GSBPM (simplified)
Planning and design → Data collection → Processing → Analysis → Dissemination
For details, see e.g. http://www1.unece.org/stat/platform/display/GSBPM
4. The statistical business process –
Implementation at Bundesbank (“classical” approach)
[Diagram. Areas: Primary statistical systems; Central statistics infrastructure. Labels: File reception; Data quality management; Fixed reporting / ad-hoc analyses; Final data exchange / publication (SDMX); Clean copies for results; Various systems / technologies / formats; Data model based on SDMX]
5. The statistical business process –
Implementation at Bundesbank (micro-data approach)
[Diagram. Areas: Primary statistical systems; House of Microdata. Labels: File reception; Data quality management; Analysis / internal research; Dissemination / external research (SDMX); Clean copies for microdata; Various systems / technologies / formats; Data model based on SDMX]
6. The role of SDMX in
Bundesbank’s House of Microdata
7. House of Microdata – IMIDIAS initiative
Microdata Hub in the Bundesbank
(External) researchers and analysts
❙ Focus on understanding complex interactions
between actors/sectors and market segments as
well as the interaction between the financial sector
and real economy
❙ Have restricted access through the RDSC
(Research Data Service Centre), e.g. subject to
anonymisation
Internal macroprudential analysts
❙ are becoming increasingly important as a new core
business of the Bank
❙ make high demands in terms of horizontal and
vertical consistency, granularity and comparability
❙ Have full access via the HoM (House of Microdata)
8. House of Microdata –
Value Added for Analysts and Researchers
[Diagram. Actors: Steering Committee; House of Microdata (HoM); Research Data Service Centre (RDSC); Data experts in business areas; Analysts and Researchers. Services: Data services; Analysis services; Direct access; Information. Data: Research and analysis files; (Selected) HoM data; Selected process data; Clean copies. Note: No direct access for external researchers (external researchers go via the RDSC).]
9. House of Microdata –
SDMX and the central statistics infrastructure
❙ The Bundesbank has been using SDMX as the basis (input and output format) for
its central statistics infrastructure since 2003, so a comprehensive statistical
toolset for operational and analytical tasks already exists
❙ By using the SDMX metamodel, the central statistics infrastructure is open
and can also be used for micro data and data pools outside the Statistics
Department (given the scalability of the system)
❙ It can largely be used independently of the Statistics Department and without
technical (software implementation) support
❙ The standardised interfaces mean that these users can employ their own
evaluation instruments rather than the statistical toolset
❙ In addition, a growing number of software products can be used with SDMX
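The slide does not say which standardised interfaces are meant. As one possible illustration, assuming the data were exposed through an SDMX RESTful web service, a user's own tool could build a data query as sketched below. The base URL, the dataflow name and the dimension values are hypothetical placeholders, not Bundesbank endpoints.

```python
from urllib.parse import urlencode


def build_data_query(base_url, flow_ref, key_parts, **params):
    """Build an SDMX REST style data query URL.

    key_parts holds one list of codes per dimension, in key order.
    Several codes are joined with '+', an empty list acts as a wildcard.
    """
    key = ".".join("+".join(codes) for codes in key_parts)
    url = f"{base_url.rstrip('/')}/data/{flow_ref}/{key}"
    query = urlencode({name: value for name, value in params.items()
                       if value is not None})
    return f"{url}?{query}" if query else url


# Hypothetical service, dataflow and codes, for illustration only.
url = build_data_query(
    "https://sdmx.example.org/rest",
    "EXAMPLE_FLOW",
    [["M"], ["DE", "FR"], []],
    startPeriod="2016-01",
    endPeriod="2016-09",
)
print(url)
```

Because the query is an ordinary URL, any evaluation instrument that can issue HTTP requests could consume the result, which is the point the slide makes about not depending on the statistical toolset.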
10. House of Microdata –
step-wise data integration: “Content projects”
How to choose the relevant data?
1. Inventory list of existing micro data sets at Bundesbank (usually in primary statistical systems)
2. Ranking based on: relevance, sustainability, legal requirements, costs
3. Selection of top 12 as first contents
4. Initiating a content project for each content
5. Main tasks in each content project
• Classify the micro-data by creating the relevant DSDs
• Implement output interfaces/processes to the central statistical infrastructure (IT project)
• Extract the data on a regular basis
6. (If needed) implement special-purpose analysis tools based on the SDMX model to fulfil special needs
on the respective primary statistical data
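The classification task in step 5 can be pictured with a minimal sketch. It assumes a much simplified view of a DSD (data structure definition) as a set of dimensions, one measure and optional attributes; the class, its `validate` helper and all concept names are hypothetical and are not taken from the Bundesbank's actual DSDs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataStructureDefinition:
    """Simplified, hypothetical stand-in for an SDMX DSD."""
    dsd_id: str
    dimensions: tuple   # concepts that identify a record
    measure: str        # the observed value
    attributes: tuple = ()  # optional descriptive concepts

    def validate(self, record):
        """Return a list of problems found in one micro-data record."""
        required = set(self.dimensions) | {self.measure}
        optional = set(self.attributes)
        present = set(record)
        missing = sorted(required - present)
        unknown = sorted(present - required - optional)
        return ([f"missing: {name}" for name in missing]
                + [f"unknown: {name}" for name in unknown])


# Illustrative DSD for a loan data set; all names are assumptions.
dsd = DataStructureDefinition(
    dsd_id="EXAMPLE_LOANS",
    dimensions=("REPORTING_AGENT", "CONTRACT_ID", "REFERENCE_DATE"),
    measure="OUTSTANDING_AMOUNT",
    attributes=("CURRENCY",),
)

good = {"REPORTING_AGENT": "BANK001", "CONTRACT_ID": "C-1",
        "REFERENCE_DATE": "2016-09-30", "OUTSTANDING_AMOUNT": "250000.00",
        "CURRENCY": "EUR"}
bad = {"REPORTING_AGENT": "BANK001", "OUTSTANDING_AMOUNT": "1",
       "COLOUR": "red"}

print(dsd.validate(good))
print(dsd.validate(bad))
```

Once a data set is described this way, the regular extraction in step 5 could check each record against the structure before it is loaded into the central infrastructure.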
11. The role of SDMX for exchanging complex and large micro data
sets
12. The statistical business process –
Looking at the data reception step…
[Diagram. Areas: Primary statistical systems; House of Microdata. Labels: File reception; Data quality management; Analysis / internal research; Dissemination / external research]
So far: reporting with generic formats or in other standards …
Let’s try SDMX at the first opportunity!
13. Promising candidate for SDMX reporting –
AnaCredit
❙ AnaCredit: ESCB project to establish a database for granular loan-by-loan credit
data.
❙ Approach: primary/secondary reporting, ECB receives data via NCBs
❙ Current state: start of the implementation phase at the Bundesbank
14. AnaCredit reporting tables –
Rather complex relational model
15. Mapping of AnaCredit data to SDMX –
ECB approach for secondary reporting
❙ Each reporting table is mapped to an SDMX structure (“cube”). Conceptually,
keys and values are determined by the referential integrity of the tables
❙ SDMX 2.1 allows a flat format in which keys and values are not distinguished;
only “attributes” exist
❙ This structure is suitable for large data sets; the overhead compared to a CSV
structure seems negligible (a factor of 2 to 3, depending on the length of the
attribute names)
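As a rough illustration of the flat layout, the sketch below writes each record as a single element whose XML attributes carry every field, and compares the size with a CSV rendering of the same records. It is not the ECB's actual message layout: the element names are simplified, namespaces and the message header are omitted, and the field names and values are invented for the example.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical loan records; field names are illustrative only and are
# not the actual AnaCredit attribute names.
records = [
    {"OBSERVED_AGENT": "BANK001", "CONTRACT_ID": "C-1",
     "INSTRUMENT_ID": "I-1", "REFERENCE_DATE": "2016-09-30",
     "OUTSTANDING_AMOUNT": "250000.00"},
    {"OBSERVED_AGENT": "BANK001", "CONTRACT_ID": "C-2",
     "INSTRUMENT_ID": "I-7", "REFERENCE_DATE": "2016-09-30",
     "OUTSTANDING_AMOUNT": "1200000.00"},
]


def to_flat_xml(rows):
    """Write one <Obs> element per row; all fields become XML attributes."""
    dataset = ET.Element("DataSet")
    for row in rows:
        ET.SubElement(dataset, "Obs", attrib=row)
    return ET.tostring(dataset, encoding="unicode")


def to_csv(rows):
    """Write the same rows as CSV with a single header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


flat_xml = to_flat_xml(records)
csv_text = to_csv(records)
ratio = len(flat_xml) / len(csv_text)
print(flat_xml)
print(f"XML/CSV size ratio: {ratio:.2f}")
```

The attribute names are repeated in every record, which is why the size ratio grows with the length of those names, as the slide notes.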
16. Mapping of AnaCredit data to SDMX –
Bundesbank’s approach for primary reporting
❙ In contrast to other EU members, the Bundesbank has to develop a new
primary statistical system for AnaCredit
❙ Therefore: a green-field approach for the reporting format is possible
❙ Project decision in line with the SDMX strategy of the Statistics Department:
❙ Reuse the ECB SDMX 2.1 flat format for primary reporting as far as
possible!
17. The statistical business process
for Bundesbank’s AnaCredit project…
[Diagram. Areas: Primary statistical system; House of Microdata. Labels: File reception; Data quality management; Analysis / internal research; Dissemination / external research]
… will be largely based on SDMX at almost all interfaces
18. Conclusion
❙ The SDMX-based House of Microdata provides bank-wide data
integration and harmonisation by extending the central statistics
infrastructure to micro data and supplying (standard) tools for
analyses
❙ SDMX as reporting standard for large micro data sets: first business
case at the Bundesbank will be AnaCredit
❙ Vision 1: Full penetration of SDMX in the statistical business process
❙ Vision 2: Extending the AnaCredit approach via the ESCB
SDD/ERF/BIRD initiatives to other primary statistics
19. Thank you very much!
Questions?