10 Steps for High-Quality Datasets

10 Steps for High-Quality Datasets: a "Divide et impera" approach to #data.

I want to share my experience in producing high-quality datasets for data analysts.

How can #standardization improve the #dataquality of #datasets for #dataanalysis?

10 STEPS FOR HIGH-QUALITY DATASETS by Pier Giuseppe De Meo
Knowledge Share Series 1: DATASETS. A "Divide et impera" approach to producing high-quality Datasets for data analysts.

#1 Keep your Datasets separate.
#2 Prepare a toolbox of reusable transformation processes (procedures, functions, scripts, etc.).
#3 Group the transformations logically into categories (e.g. missing values, decodes, normalization).
#4 For each category identified, select a subset of data in a Dataset to which this type of transformation should be applied; repeat the process on each of your Datasets separately.
#5 For each Dataset, if needed, enrich its data with derived information (e.g. calculated fields, extraction of sub-information).
#6 Define the minimum level of detail shared across all Datasets (e.g. a single transaction per day, groups of transactions per month).
#7 For each Dataset, group the data at that same level of granularity.
#8 Join all the formatted Datasets into a single Master Dataset, based on the granularity defined in step #6.
#9 In the resulting Master Dataset, check whether there is a subset of data to which any of the toolbox transformations should be applied.
#10 In the resulting Master Dataset, if needed, enrich the data with extra information (e.g. metrics from different Datasets combined into a KPI, decryption based on a combination of fields).
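
Steps #1 to #5 can be illustrated with a minimal sketch in plain Python, assuming each Dataset is a list of dictionaries. The function names (`fill_missing`, `decode`, `normalize_text`, `apply_to_subset`, `add_month`), the `TOOLBOX` category keys and the sample `sales` records are hypothetical illustrations, not the author's actual toolbox.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
Transform = Callable[[Record], Record]

# --- Toolbox (step #2): reusable transformation factories, grouped by category (step #3).
# Hypothetical examples only; a real toolbox would hold the team's own procedures.

def fill_missing(field: str, default: object) -> Transform:
    """Missing values: replace None or an empty string with a default."""
    def transform(record: Record) -> Record:
        out = dict(record)  # copy, so the source Dataset stays untouched
        if out.get(field) in (None, ""):
            out[field] = default
        return out
    return transform


def decode(field: str, mapping: Dict[object, object]) -> Transform:
    """Decodes: map codes to labels; unknown codes are kept as they are."""
    def transform(record: Record) -> Record:
        out = dict(record)
        out[field] = mapping.get(out.get(field), out.get(field))
        return out
    return transform


def normalize_text(field: str) -> Transform:
    """Normalization: trim whitespace and upper-case string values."""
    def transform(record: Record) -> Record:
        out = dict(record)
        value = out.get(field)
        if isinstance(value, str):
            out[field] = value.strip().upper()
        return out
    return transform


TOOLBOX: Dict[str, Callable[..., Transform]] = {
    "missing_values": fill_missing,
    "decodes": decode,
    "normalization": normalize_text,
}


def apply_to_subset(records: List[Record], transforms: List[Transform],
                    selector: Callable[[Record], bool] = lambda r: True) -> List[Record]:
    """Step #4: apply the transforms, in order, only to records chosen by `selector`."""
    result = []
    for record in records:
        if selector(record):
            for transform in transforms:
                record = transform(record)
        result.append(record)
    return result


def add_month(record: Record) -> Record:
    """Step #5 enrichment: derive a 'month' field (YYYY-MM) from an ISO 'date'."""
    out = dict(record)
    out["month"] = str(out["date"])[:7]
    return out


# Step #1: each Dataset is kept and cleaned on its own.
sales = [
    {"date": "2023-01-05", "store": " milan ", "status": "C", "amount": 120.0},
    {"date": "2023-01-20", "store": "", "status": "R", "amount": 80.0},
]

clean_sales = apply_to_subset(sales, [
    TOOLBOX["missing_values"]("store", "UNKNOWN"),
    TOOLBOX["normalization"]("store"),
    TOOLBOX["decodes"]("status", {"C": "COMPLETED", "R": "REFUNDED"}),
])
clean_sales = [add_month(r) for r in clean_sales]
print(clean_sales)
```

Because every transformation returns a new record, the same toolbox entries can be reused on another Dataset, or later on the Master Dataset, without side effects. Passing a `selector` such as `lambda r: r["amount"] > 100` restricts a transformation to a subset of the records, in the spirit of step #4.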

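Steps #6 to #10 can be sketched the same way, again in standard-library Python and as an illustration rather than the author's procedure. The assumptions are: the shared grain of step #6 is one row per month and store, the two hypothetical Datasets `sales` and `visits` are already clean, the join is a full outer join, missing values after the join become 0.0, and the KPI ("sales per visit") is an invented example.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Record = Dict[str, object]
Key = Tuple[str, str]

# Step #6 (assumed grain): one row per (month, store).
GRAIN = ("month", "store")


def group_to_grain(records: List[Record], measure: str) -> Dict[Key, float]:
    """Step #7: sum `measure` per (month, store); the month is taken from an ISO 'date'."""
    totals: Dict[Key, float] = defaultdict(float)
    for r in records:
        key = (str(r["date"])[:7], str(r["store"]))
        totals[key] += float(r[measure])
    return dict(totals)


def join_to_master(named: Dict[str, Dict[Key, float]]) -> List[Record]:
    """Step #8: full outer join of the grouped Datasets on the shared grain.
    A key absent from one Dataset gets None in that column (handled in step #9)."""
    keys = sorted(set().union(*(g.keys() for g in named.values())))
    master = []
    for key in keys:
        row: Record = dict(zip(GRAIN, key))
        for name, grouped in named.items():
            row[name] = grouped.get(key)
        master.append(row)
    return master


def fill_missing_in_master(master: List[Record], fields: List[str],
                           default: float = 0.0) -> List[Record]:
    """Step #9: re-apply a missing-values transformation, now on the Master Dataset."""
    return [{**row, **{f: default for f in fields if row.get(f) is None}} for row in master]


def add_kpi(master: List[Record]) -> List[Record]:
    """Step #10: combine metrics from different Datasets into a KPI."""
    out = []
    for row in master:
        visits = row["visits"]
        out.append({**row, "sales_per_visit": row["sales"] / visits if visits else None})
    return out


sales = [
    {"date": "2023-01-05", "store": "MILAN", "amount": 120.0},
    {"date": "2023-01-20", "store": "MILAN", "amount": 80.0},
    {"date": "2023-02-03", "store": "ROME", "amount": 50.0},
]
visits = [
    {"date": "2023-01-07", "store": "MILAN", "count": 40},
    {"date": "2023-02-10", "store": "ROME", "count": 10},
    {"date": "2023-02-11", "store": "MILAN", "count": 5},
]

master = join_to_master({
    "sales": group_to_grain(sales, "amount"),
    "visits": group_to_grain(visits, "count"),
})
master = add_kpi(fill_missing_in_master(master, ["sales", "visits"]))
for row in master:
    print(row)
```

Grouping every Dataset to the same grain before the join is what keeps the Master Dataset free of duplicated rows: each (month, store) key appears at most once per Dataset, so the join cannot multiply records.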