What is Amazon Athena?
Athena is an interactive query service that uses ANSI-standard SQL to
work with “big data” stored in Amazon Simple Storage Service (S3).
Typical use cases supported by Amazon Athena are data science,
machine learning, visualizations, ETL, and reporting.
Because Athena is serverless, there is no infrastructure to manage, and you can
tap into scalable storage on S3. You also pay only for the queries you run,
which benefits someone like a data analyst who wants to minimize Amazon Athena
costs.
Amazon Athena is a serverless, interactive analytics service built on open-source
frameworks, supporting open-table and file formats. Athena provides a simplified,
flexible way to analyze petabytes of data where it lives. Analyze data or build
applications from an Amazon Simple Storage Service (S3) data lake and 25+ data
sources, including on-premises data sources or other cloud systems, using SQL.
Amazon Athena is a serverless interactive analytics service offered by
Amazon that can be readily used to gain insights from data residing in S3.
Under the hood, Athena uses a distributed SQL engine called Presto
to run SQL queries, and it uses the open-source technology Apache Hive
to create and alter tables and partitions. It can query structured,
semi-structured and unstructured data.
Amazon Athena is a serverless data query tool, which makes it scalable
and cost-effective at the same time. Customers are charged on a
pay-per-query basis: the cost of each query depends on the amount of
data it scans, not on the number of queries executed in a given
period. The standard charge for scanning 1 TB of data from S3 is 5 USD.
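The pricing rule above can be turned into a quick estimate. The sketch below is only illustrative: the function name is hypothetical, the 5 USD per TB rate is the figure quoted above, and treating 1 TB as 10^12 bytes is an assumption. Actual billing may apply rounding or minimums that are not modelled here.

```python
# Hypothetical helper: estimates the cost of one Athena query from bytes scanned.
PRICE_PER_TB_USD = 5.0   # rate quoted in the text above
BYTES_PER_TB = 10 ** 12  # assumption: decimal terabyte


def estimate_query_cost(bytes_scanned: int) -> float:
    """Return the estimated cost in USD for scanning `bytes_scanned` bytes."""
    if bytes_scanned < 0:
        raise ValueError("bytes_scanned must be non-negative")
    return bytes_scanned / BYTES_PER_TB * PRICE_PER_TB_USD


if __name__ == "__main__":
    # Compare a full scan of 2 TB with a query that only touches 50 GB.
    for label, scanned in [("full scan", 2 * 10 ** 12), ("narrow scan", 50 * 10 ** 9)]:
        print(f"{label}: {estimate_query_cost(scanned):.2f} USD")
```

Because cost scales with bytes scanned, anything that reduces the scanned volume reduces the bill proportionally.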
Working with Athena
Athena can quickly analyze data in Amazon S3 using standard SQL. The data does
not even need to be loaded into Athena.
All we need to do is point to the data in Amazon S3, define the schema
and start querying using standard SQL. With Amazon Athena, we can
process any kind of data, whether structured, semi-structured or unstructured; for example, it
can handle data in CSV files as well as arrays and objects.
Amazon Athena provides a simple UI. Getting started with Athena is
straightforward: all we need to do is create a database, choose a table name and specify the
location of the data on Amazon S3.
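As a rough illustration of "point to the data, define the schema", the sketch below builds a Hive-style CREATE EXTERNAL TABLE statement for comma-delimited files. The helper name, database, table, columns and bucket are all hypothetical, and the statement is only generated as text here, not submitted to Athena.

```python
from typing import Sequence, Tuple


def build_create_table_ddl(
    database: str,
    table: str,
    columns: Sequence[Tuple[str, str]],
    s3_location: str,
) -> str:
    """Return a CREATE EXTERNAL TABLE statement for comma-delimited text files."""
    if not columns:
        raise ValueError("at least one column is required")
    if not s3_location.startswith("s3://"):
        raise ValueError("s3_location must be an s3:// URI")
    # One "`name` type" line per column, separated by commas.
    column_lines = ",\n".join(f"  `{name}` {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS `{database}`.`{table}` (\n"
        f"{column_lines}\n"
        ")\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        f"LOCATION '{s3_location}'"
    )


if __name__ == "__main__":
    # Hypothetical web-log table stored under an example bucket.
    print(
        build_create_table_ddl(
            "weblogs",
            "access_logs",
            [("request_time", "string"), ("status", "int")],
            "s3://example-bucket/logs/",
        )
    )
```

The table is only metadata: the files stay in S3, and the LOCATION clause tells the engine where to read them.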
How AWS Athena works
Amazon Athena works directly with data in S3. It uses a
distributed SQL engine to run queries and Apache Hive
to create and alter tables and partitions. Some important
points for working with Athena include:
1. You must have an AWS account.
2. You should enable your account to export cost and usage data into an
S3 bucket.
3. You can prepare buckets for Athena to connect to.
4. AWS creates manifest files from metadata each time it writes
to the bucket. It also creates a folder named Athena within the AWS billing data
bucket that contains only the data.
5. To simplify the setup, the US-West-2 region can be
used.
6. The final step is downloading the credentials for the new user,
because these credentials map to the database credentials.
Athena Benefits
Amazon Athena makes it easy to run interactive queries against large volumes of data stored directly
in Amazon S3, without having to manage infrastructure or move the
data. Athena is well suited to cases such as querying web logs to troubleshoot
issues on a site.
•Based on SQL: You can use Athena to run SQL queries against any table that is
configured in the AWS Glue Data Catalog or against data sources that you connect to using the
Athena Query Federation SDK. For users who already know SQL, there is no learning curve to
get started.
•Open architecture (no vendor lock-in): Athena enables open access to data rather than lock-in
to a specific tool or technology. This manifests itself in various ways:
•Ubiquitous Access: Because your data is stored in an S3 bucket and the schema is defined in
the Glue Data Catalog, you can switch between query engines that can read from these
sources without redefining the schema or creating a separate copy of the data.
•Separated storage and compute resources: Athena completely separates compute
from storage. Data is stored in your Amazon S3 account, while Amazon Web
Services provides Athena compute as a resource shared among all Athena users.
•Open file formats: Unlike many high-performance databases, Athena does not use a
proprietary file format but supports standard open formats such as Apache Parquet,
ORC, CSV, and JSON.
•Low cost: Athena’s pricing model is based on terabytes of data scanned. You can
keep costs down by scanning only the data you need to answer a specific query (this can be
done using data partitioning, see below).
•Access to all your data: Most organizations process only 30 to 35 percent of their data into a
traditional data warehouse due to the high operational and infrastructure costs of constantly
resizing database clusters. Because Athena queries data where it sits in S3, the rest of the
data remains available for analysis.
Speed and Performance
Because Amazon Athena is serverless, it is quick and easy to run
queries on Amazon S3 without setting up or managing servers or clusters.
Initialization time is another difference: in Athena, we can query
the data on Amazon S3 straight away, whereas in Redshift we have to wait for the cluster to
become active before we can query the data.
•Optimization is limited to queries: You can optimize your queries, not your data.
Your data is already stored in Amazon S3, and transforming it to suit
Athena may affect other users who rely on the same data for other purposes.
•Multi-tenancy means pooled resources: All Athena users receive a similar SLA for queries at
any time. In other words, the entire global user base is “competing” for the same resources,
and although AWS provides more as needed, this could mean that query performance
fluctuates depending on other people’s usage.
•No indexing: Indexes are built into traditional databases but do not exist in Athena. This
makes joining large tables a demanding operation that increases the load on Athena and
negatively impacts performance. For example, running a query by key requires scanning all
the data and searching for the desired key in the result. One workaround is to use Upsolver
lookup tables.
•Partitioning: Efficient queries in Athena require partitioning of the data. Keeping the
number of partitions at a level that meets your performance needs is essential: every 500
partitions scanned adds roughly 1 second to your query.
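To make the partitioning point concrete, the sketch below lays out data under Hive-style `key=value` prefixes (a common convention for partitioned data on S3) and applies the 500-partitions-per-second rule of thumb from the bullet above. The function names and the bucket are hypothetical, and the overhead figure is only as accurate as that rule of thumb.

```python
from datetime import date

PARTITIONS_PER_EXTRA_SECOND = 500  # rule of thumb quoted in the text above


def partition_prefix(base: str, day: date) -> str:
    """Return a Hive-style key=value S3 prefix for one daily partition."""
    return (
        f"{base.rstrip('/')}/year={day.year:04d}"
        f"/month={day.month:02d}/day={day.day:02d}/"
    )


def partition_overhead_seconds(partitions_scanned: int) -> float:
    """Estimate the extra query time caused by the number of partitions scanned."""
    if partitions_scanned < 0:
        raise ValueError("partitions_scanned must be non-negative")
    return partitions_scanned / PARTITIONS_PER_EXTRA_SECOND


if __name__ == "__main__":
    print(partition_prefix("s3://example-bucket/logs/", date(2024, 1, 15)))
    # Three years of daily partitions scanned without a date filter.
    print(partition_overhead_seconds(3 * 365))
```

A query that filters on the partition columns only needs to read the matching prefixes, which reduces both the data scanned and the partition overhead.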
Which data types does Amazon Athena support?
Athena can process numerous structured and unstructured data types, including
standard data formats like CSV (comma-separated values), JSON (JavaScript Object
Notation), ORC (Optimized Row Columnar), Apache Parquet and Apache Avro. Athena
also supports data compressed with Snappy, Zlib, LZO (Lempel-Ziv-Oberhumer) and Gzip
(GNU Zip).
Other examples of supported data types include:
•BOOLEAN
•TINYINT
•SMALLINT
•VARCHAR
•CHAR
•BIGINT
The Athena API also defines its own data types, such as:
•Column
•WorkGroupConfigurationUpdates
•UnprocessedNamedQueryId
Features of Athena
•Serverless
Athena is serverless, so the end user does not have to worry about configuration,
infrastructure, scaling, or failure. Athena takes care of all of it.
•Pay Per Query
Athena charges you only for the queries you run, based on the amount of data
scanned per query. You can save a lot by compressing the data and storing it in a
suitable format.
•Secure
Using AWS Identity and Access Management (IAM) policies, Amazon Athena offers control
over access to the data set. Because the data is stored in S3 buckets, IAM policies can help
manage which users have access.
•Available
Amazon Athena is highly available, and users can execute queries around the clock.
•Machine Learning
Developers can use Amazon SageMaker to create and deploy machine
learning models and invoke them from Amazon Athena.
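For the "Secure" feature, an IAM policy that lets a user run queries and read the source data might look roughly like the sketch below. The helper name and bucket are hypothetical, and the policy is deliberately incomplete: a real setup would typically also need access to the query results location and the data catalog.

```python
import json


def build_athena_read_policy(bucket: str) -> dict:
    """Return a minimal, illustrative IAM policy document as a Python dict."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Allow starting queries and fetching their status and results.
                "Sid": "RunAthenaQueries",
                "Effect": "Allow",
                "Action": [
                    "athena:StartQueryExecution",
                    "athena:GetQueryExecution",
                    "athena:GetQueryResults",
                ],
                "Resource": "*",
            },
            {
                # Allow read-only access to the bucket that holds the source data.
                "Sid": "ReadSourceData",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": [
                    f"arn:aws:s3:::{bucket}",
                    f"arn:aws:s3:::{bucket}/*",
                ],
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_athena_read_policy("example-bucket"), indent=2))
```

Scoping the S3 statement to a single bucket is what limits which data a given user can query.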
What are the limitations of Amazon Athena?
•Optimization is limited to queries. For example, data already stored in S3 cannot be
optimized.
•No indexing options. Indexing options commonly appear in traditional databases.
Without indexing, the operation load on Athena increases, potentially affecting
performance.
•Efficient queries require partitioning. In order to enable efficient queries, data must first
be partitioned. Partitions must then be managed for what best fits performance needs.
•Stored procedures and Presto federated connectors are not supported. Amazon Athena
Federated Query is needed to connect data sources. Parameterized queries were
originally unsupported as well but are now available through prepared statements.
•When querying a table with thousands of partitions, Athena can time out.
•Source files that start with an underscore or a dot are treated as hidden.
•A single row or its columns cannot exceed 32 megabytes.
•Athena does not support querying data in S3 Glacier and S3 Glacier Deep Archive
storage classes.
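Two of the limitations above are easy to check before pointing Athena at a bucket. The sketch below is a hypothetical pre-flight check: it assumes the "hidden" rule applies to the file name (the last path component of the object key) and uses the S3 storage class identifiers `GLACIER` and `DEEP_ARCHIVE` for the two unsupported classes.

```python
# Storage classes named in the limitation above, using S3's identifiers.
UNSUPPORTED_STORAGE_CLASSES = {"GLACIER", "DEEP_ARCHIVE"}


def is_hidden_file(s3_key: str) -> bool:
    """True if the object's file name starts with an underscore or a dot."""
    file_name = s3_key.rsplit("/", 1)[-1]
    return file_name.startswith(("_", "."))


def is_visible_to_athena(s3_key: str, storage_class: str = "STANDARD") -> bool:
    """Rough check for whether an object would be read by a query."""
    if is_hidden_file(s3_key):
        return False
    return storage_class.upper() not in UNSUPPORTED_STORAGE_CLASSES


if __name__ == "__main__":
    for key, storage_class in [
        ("logs/part-0000.csv", "STANDARD"),
        ("logs/_SUCCESS", "STANDARD"),
        ("logs/archive-2019.csv", "GLACIER"),
    ]:
        print(key, storage_class, is_visible_to_athena(key, storage_class))
```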
Summary
Athena is an interactive query service offered by Amazon. It makes it
easy to analyze data directly in Amazon S3 (Simple Storage Service) using
standard SQL. For example, in the AWS Management Console, Athena can be pointed at
data stored in Amazon S3 with a few clicks. SQL can then be used to run
ad-hoc queries that return results within seconds.
•It does not store data. Instead, storage is managed entirely on Amazon S3. The Athena
query service is fully managed, so resources are automatically allocated by AWS as needed
to execute a query.
•Because your data is stored in an S3 bucket and the schema is defined in the Glue Data
Catalog, you can switch between query engines that can read from these sources without
redefining the schema or creating a separate copy of the data.
•As a serverless service, Amazon Athena makes data queries
easy to set up, easy to use and fast to run. Its pay-per-use model keeps
analytics affordable. Moreover, since Athena works with
Amazon S3, with its scalability, reliability, and durability, it is
well suited to running analytics workloads.
 
Xen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfXen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdf
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered Sustainability
 
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
 
Folding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesFolding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a series
 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
 
Buds n Tech IT Solutions: Top-Notch Web Services in Noida
Buds n Tech IT Solutions: Top-Notch Web Services in NoidaBuds n Tech IT Solutions: Top-Notch Web Services in Noida
Buds n Tech IT Solutions: Top-Notch Web Services in Noida
 
Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...
 
Cyber security and its impact on E commerce
Cyber security and its impact on E commerceCyber security and its impact on E commerce
Cyber security and its impact on E commerce
 
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with Azure
 

What is Amazon Athena

  • 1.
  • 2. What is Amazon Athena? Athena is an interactive query service that uses ANSI-standard SQL to work with “big data” stored in Amazon Simple Storage Service (S3). Typical use cases are data science, machine learning, visualization, ETL, and reporting. Because Athena is serverless, there is no infrastructure to manage, and you can tap into scalable storage on S3. You also pay only for the queries you run, which benefits someone like a data analyst who wants to minimize Athena costs. Amazon Athena is a serverless, interactive analytics service built on open-source frameworks, supporting open table and file formats. It provides a simplified, flexible way to analyze petabytes of data where it lives. You can analyze data or build applications from an Amazon S3 data lake and 25+ data sources, including on-premises data sources or other cloud systems, using SQL or
  • 3. AWS Athena is a serverless interactive analytics service offered by Amazon that can be readily used to gain insights from data residing in S3. Under the hood, Athena uses a distributed SQL engine called Presto to run SQL queries. For table definitions it relies on the popular open-source technology Apache Hive, which lets it describe structured, semi-structured, and unstructured data.
  • 4. Amazon Athena is a serverless data query tool, which makes it scalable and cost-effective at the same time. Customers are charged on a pay-per-query basis, where the charge for each query depends on the amount of data it scans rather than on the number of queries run in a given period. The standard charge is 5 USD per TB of data scanned from S3.
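The pricing rule on this slide can be turned into a quick estimate. The sketch below is illustrative only: the 5 USD per TB rate comes from the slide, while the function name and the choice of a binary terabyte (1024^4 bytes) are assumptions made for the example, not AWS definitions.

```python
# Rate quoted on the slide: 5 USD per TB of data scanned.
PRICE_PER_TB_USD = 5.0
# Assumption for this sketch: one terabyte is 1024**4 bytes.
BYTES_PER_TB = 1024 ** 4


def estimate_query_cost(bytes_scanned: int, price_per_tb: float = PRICE_PER_TB_USD) -> float:
    """Return the approximate charge in USD for a query that scans bytes_scanned bytes."""
    if bytes_scanned < 0:
        raise ValueError("bytes_scanned must be non-negative")
    return bytes_scanned / BYTES_PER_TB * price_per_tb


if __name__ == "__main__":
    # The same question answered with a full scan and with a much smaller scan.
    full_scan = estimate_query_cost(2 * BYTES_PER_TB)
    small_scan = estimate_query_cost(BYTES_PER_TB // 10)
    print(f"2 TB scanned:    ${full_scan:.2f}")
    print(f"~0.1 TB scanned: ${small_scan:.2f}")
```

Because the charge follows bytes scanned, anything that reduces the scanned volume (partitioning, compression, columnar formats) lowers the cost of the same query.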
  • 5.
  • 6. Working with Athena Athena can quickly analyze data in Amazon S3 using standard SQL, and the data does not need to be loaded into Athena first. All we need to do is point to the data in Amazon S3, define the schema, and start querying with standard SQL. With Amazon Athena we can process any kind of data, whether structured, semi-structured, or unstructured; for example, it can handle CSV files as well as arrays and objects. Amazon Athena provides a simple UI, so getting started is straightforward: create a database, choose a table name, and specify the location of the data on Amazon S3.
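The three steps on this slide (point to the data, define the schema, query) can be sketched as a small helper that builds the table definition. This is a minimal sketch, not the only way to define a table: the helper name, database, table, columns, and bucket are hypothetical, and it assumes comma-delimited text files stored under a folder-like S3 prefix.

```python
from typing import Dict


def build_create_table(database: str, table: str, columns: Dict[str, str], s3_location: str) -> str:
    """Build a CREATE EXTERNAL TABLE statement for comma-delimited files in S3."""
    if not columns:
        raise ValueError("at least one column is required")
    if not s3_location.startswith("s3://"):
        raise ValueError("s3_location must be an s3:// URI")
    # Assumption: the data lives under a folder-like prefix, so end it with '/'.
    location = s3_location if s3_location.endswith("/") else s3_location + "/"
    column_lines = ",\n".join(f"  `{name}` {dtype}" for name, dtype in columns.items())
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS `{database}`.`{table}` (\n"
        f"{column_lines}\n"
        ")\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        f"LOCATION '{location}'"
    )


if __name__ == "__main__":
    # Hypothetical database, table, columns, and bucket, used only for illustration.
    ddl = build_create_table(
        "weblogs_db",
        "access_logs",
        {"ip": "string", "path": "string", "status": "int"},
        "s3://example-bucket/logs",
    )
    print(ddl)
    # Once the table exists, the data can be queried in place with standard SQL.
    print("SELECT status, COUNT(*) AS hits FROM weblogs_db.access_logs GROUP BY status;")
```

The statement only records where the files are and how to read them; no data is copied or loaded, which matches the slide's point that Athena queries data where it already sits.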
  • 7. How AWS Athena works Amazon Athena works directly against the data in S3. It uses a distributed SQL engine to run queries, and it uses Apache Hive to create and alter tables and partitions. Some of the prerequisites for working with Athena include: 1. You must have an AWS account. 2. You should enable your account to export cost and usage data to an S3 bucket. 3. You can prepare buckets for Athena to connect to. 4. AWS creates manifest files from metadata each time it writes to the bucket. It also creates a folder named Athena within the AWS billing data bucket that contains only the data. 5. To simplify the setup, the us-west-2 region can be used. 6. The final step is downloading the credentials for the new user, because these credentials map directly to the database credentials.
  • 8. Athena Benefits Amazon Athena makes it easy to run interactive queries against large volumes of data by uploading the data directly to Amazon S3, without having to manage infrastructure or handle the data. Athena is well suited to cases such as running queries against weblogs to troubleshoot issues on a site. •Based on SQL: You can use Athena to run SQL queries against any table configured in the Glue Data Catalog, or against data sources that you connect to using the Athena Query Federation SDK. For users who already know SQL, there is no learning curve to get started. •Open architecture (no vendor lock-in): Athena enables open access to data rather than lock-in to a specific tool or technology. This shows up in several ways: •Ubiquitous access: Because your data is stored in an S3 bucket and the schema is defined in the Glue Data Catalog, you can switch between query engines that can read from these sources without redefining the schema or creating a separate copy of the data.
  • 9. Athena Benefits •Separated storage and computing resources: Athena completely separates compute from storage. Data is stored in your Amazon S3 account, while Amazon Web Services provides Athena compute as a resource shared among all Athena users. •Open file formats: Unlike many high-performance databases, Athena does not use a proprietary file format but supports standard open formats such as Apache Parquet, ORC, CSV, and JSON. •Low cost: Athena’s pricing model is based on terabytes of data scanned. You can keep costs down by scanning only the data you need to answer a specific query (this can be done using data partitioning, see below). •Access to all your data: Most organizations load only 30 to 35 percent of their data into a traditional data warehouse because of the high operational and infrastructure costs of constantly resizing database clusters, whereas Athena can query data directly in S3.
  • 10. Speed and Performance Because Amazon Athena is serverless, it is quick and easy to run queries on Amazon S3 without setting up or managing servers or clusters. Initialization time is another difference: in Athena we can query data on Amazon S3 right away, whereas in Redshift we must wait for the cluster to become active before we can query the data.
  • 11.
  • 12. Speed and Performance •Optimization is limited to queries: You can optimize your queries, not your data. Your data is already stored in Amazon S3, and transforming it to suit Athena may affect other users who rely on the same data for other purposes. •Multi-tenancy means pooled resources: All Athena users receive a similar SLA for queries at any time. In other words, the entire global user base is “competing” for the same resources, and although AWS provides more as needed, query performance can fluctuate depending on other people’s usage. •No indexing: Indexes are built into traditional databases but do not exist in Athena. This makes joining large tables a demanding operation that increases the load on Athena and hurts performance. For example, running a query by key requires scanning all the data and searching for the desired key in the results. One workaround is Upsolver lookup tables. •Partitioning: Efficient queries in Athena require the data to be partitioned, and it is essential to keep the number of partitions at a level that meets your performance needs. Every 500 partitions scanned adds about 1 second to your query.
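The partitioning rule of thumb on this slide (about 1 second per 500 partitions scanned) lends itself to a back-of-the-envelope estimate. The sketch below is illustrative: the function names, the bucket, and the year/month/day layout are assumptions, and the overhead figure is the slide's approximation rather than a measured value.

```python
# Rule of thumb quoted on the slide: ~1 second of overhead per 500 partitions scanned.
PARTITIONS_PER_SECOND = 500


def partition_overhead_seconds(partitions_scanned: int) -> float:
    """Estimate the extra query time caused by the number of partitions scanned."""
    if partitions_scanned < 0:
        raise ValueError("partitions_scanned must be non-negative")
    return partitions_scanned / PARTITIONS_PER_SECOND


def partition_prefix(base: str, year: int, month: int, day: int) -> str:
    """Build a Hive-style key=value prefix for one daily partition (assumed layout)."""
    return f"{base.rstrip('/')}/year={year:04d}/month={month:02d}/day={day:02d}/"


if __name__ == "__main__":
    # Hypothetical bucket: three years of daily partitions versus a single day.
    print(partition_prefix("s3://example-bucket/logs/", 2024, 3, 7))
    print(f"All 1095 daily partitions: {partition_overhead_seconds(1095):.2f} s overhead")
    print(f"One daily partition:       {partition_overhead_seconds(1):.3f} s overhead")
```

A query that filters on the partition columns only has to touch the matching prefixes, which reduces both this overhead and the amount of data scanned.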
  • 13.
  • 14. Which data types does Amazon Athena support? Athena can process numerous structured and unstructured data formats, including standard formats like CSV (comma-separated values), JSON (JavaScript Object Notation), ORC (Optimized Row Columnar), Apache Parquet and Apache Avro. Athena also supports data compressed with Snappy, Zlib, LZO (Lempel-Ziv-Oberhumer) and Gzip (GNU Zip). Examples of supported column data types include: •BOOLEAN •TINYINT •SMALLINT •VARCHAR •CHAR •BIGINT The Athena API additionally defines object types such as: •Column •WorkGroupConfigurationUpdates •UnprocessedNamedQueryId
  • 15. Features of Athena •Serverless: Athena is serverless, so the end user does not have to worry about configuration, infrastructure, scaling, or failure. Athena takes care of all of it. •Pay per query: Athena charges only for the queries you run, based on the amount of data scanned per query. You can save a lot by compressing the data and storing it in a suitable format. •Secure: Using AWS Identity and Access Management (IAM) policies, Amazon Athena offers complete control over the data set. Because the data is stored in S3 buckets, IAM policies can be used to manage user access. •Available: Amazon Athena is highly available, and users can run queries around the clock. •Machine learning: Developers can call machine learning models built and deployed with Amazon SageMaker from Amazon Athena queries.
  • 16. What are the limitations of Amazon Athena? •Optimization is limited to queries. For example, data already stored in S3 cannot be optimized. •No indexing options. Indexing options commonly appear in traditional databases. Without indexing, the operation load on Athena increases, potentially affecting performance. •Efficient queries require partitioning. To enable efficient queries, data must first be partitioned, and the partitions must then be managed to fit performance needs. •Stored procedures, parameterized queries and Presto federated connectors are not supported. Amazon Athena Federated Query is needed to connect data sources. •When querying a table with thousands of partitions, Athena can time out. •Source files whose names start with an underscore or a dot are treated as hidden. •A single row or column cannot exceed 32 megabytes. •Athena does not support querying data in the S3 Glacier and S3 Glacier Deep Archive storage classes.
  • 17. Summary Athena is an interactive query service offered by Amazon. It makes it easy to analyze data directly in Amazon S3 (Simple Storage Service) using standard SQL. For example, in the AWS Management Console, Athena can be pointed at data stored in Amazon S3 with a few clicks. SQL can then be used to run ad hoc queries, returning results to the user in seconds. •It does not store data. Instead, storage is managed entirely on Amazon S3. The Athena query service is fully managed, so AWS automatically allocates the resources needed to execute a query. •Because your data is stored in an S3 bucket and the schema is defined in the Glue Data Catalog, you can switch between query engines that can read from these sources without redefining the schema or creating a separate copy of the data. •As a serverless service, Amazon Athena makes data queries easy to set up, easy to use, and fast to run. Its pay-per-use model makes running analytics affordable. Moreover, since Athena works with Amazon S3 and inherits its scalability, reliability, and durability, it is well suited to running analytics workloads.
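Two of the limitations above (hidden files and Glacier storage classes) can be checked before a query is run. The sketch below is a rough pre-flight filter under stated assumptions: the `S3Object` type and both function names are hypothetical, the hidden-file rule is applied to the last path component, and the storage class strings are the values S3 listings are assumed to report.

```python
from typing import Iterable, List, NamedTuple


class S3Object(NamedTuple):
    """Minimal stand-in for an entry in an S3 listing (hypothetical type)."""
    key: str
    storage_class: str = "STANDARD"


# Assumed storage class identifiers for S3 Glacier and S3 Glacier Deep Archive.
UNQUERYABLE_CLASSES = {"GLACIER", "DEEP_ARCHIVE"}


def is_hidden(key: str) -> bool:
    """Return True when the file name starts with an underscore or a dot."""
    file_name = key.rsplit("/", 1)[-1]
    return file_name.startswith(("_", "."))


def queryable_objects(objects: Iterable[S3Object]) -> List[S3Object]:
    """Keep only objects that are neither hidden nor in an archive storage class."""
    return [
        obj for obj in objects
        if not is_hidden(obj.key) and obj.storage_class not in UNQUERYABLE_CLASSES
    ]


if __name__ == "__main__":
    listing = [
        S3Object("logs/2024/part-0001.csv"),
        S3Object("logs/2024/_SUCCESS"),
        S3Object("logs/2024/.tmp-upload"),
        S3Object("logs/2019/part-0001.csv", "GLACIER"),
    ]
    for obj in queryable_objects(listing):
        print(obj.key)
```

A check like this helps explain why a query returns fewer rows than expected: files skipped as hidden or stored in an archive class are silently absent from the results.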
  • 18. THANK YOU Like the video and subscribe to the channel