As data volumes grow and customers store more data on AWS, they often have valuable data that is not easily discoverable or available for analytics. The AWS Glue Data Catalog provides a central view of your data lake, making data readily available for analytics. We introduce key features of the AWS Glue Data Catalog and its use cases. Learn how crawlers can automatically discover your data, extract relevant metadata, and add it as table definitions to the AWS Glue Data Catalog. We will also explore the integration between the AWS Glue Data Catalog and Amazon Athena, Amazon EMR, and Amazon Redshift Spectrum.
4. Why is this a new problem?
Web and mobile data
Logs
Social media data
Streaming data
IoT data
Spreadsheets
Structured data
Unstructured and Semi-structured data
Dark data
5. Data Volume
[Chart: The Data Gap. Generated data versus data available for analysis, 1990 to 2020]
Dark data challenge
6. Multiple consumers and requirements
Data duplication
Data Scientists
Analysts
Business Users
Applications
Agile
Real time
Flexible
Scale