Amazon Athena is a serverless, interactive analytics service built on open-source frameworks, supporting open-table and file formats.
Athena provides a simplified, flexible way to analyze petabytes of data in an S3 data lake and in 30 other data sources, including on-premises data sources and other cloud systems, using SQL or Python without loading the data.
Athena is built on the open-source Trino and Presto engines and the Apache Spark framework, with no provisioning or configuration effort required.
Athena is highly available and runs queries using compute resources across multiple facilities, automatically routing queries appropriately if a particular facility is unreachable.
Athena can process unstructured, semi-structured, and structured datasets.
Athena integrates with QuickSight for visualizing the data or creating dashboards.
Athena supports various standard data formats, including CSV, TSV, JSON, ORC, Avro, and Parquet.
Athena supports compressed data in Snappy, Zlib, LZO, and GZIP formats. You can improve performance and reduce costs by compressing, partitioning, and using columnar formats.
Athena can handle complex analysis, including large joins, window functions, and arrays.
Athena uses a managed AWS Glue Data Catalog to store information and schemas about the databases and tables that you create for the data stored in S3.
Athena uses schema-on-read technology, which means that the table definitions are applied to the data in S3 when queries are run. No data loading or transformation is required. Table definitions and schemas can be deleted without affecting the underlying data stored in S3.
Athena supports fine-grained access control with AWS Lake Formation, which allows centrally managing permissions and access control for Data Catalog resources in the S3 data lake.
Supply: Amazon
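The schema-on-read behavior described above can be sketched locally in a few lines. This is a simplified model, not Athena itself: the byte string stands in for an S3 object, and the `catalog` dict stands in for the Glue Data Catalog. All names and data are hypothetical. The point is that the table definition is pure metadata applied at query time, and deleting it leaves the data untouched.

```python
import csv
import io

# Hypothetical bytes standing in for a CSV object stored in S3.
RAW_OBJECT = b"2024-01-01,us-east-1,42\n2024-01-02,eu-west-1,17\n"

# A "table definition" is only metadata: column names plus a type converter
# for each column. Creating it does not load or transform any data.
catalog = {
    "requests": [("dt", str), ("region", str), ("hits", int)],
}


def query(table: str, raw: bytes) -> list:
    """Apply the table's schema to the raw bytes at read (query) time."""
    schema = catalog[table]
    reader = csv.reader(io.StringIO(raw.decode("utf-8")))
    return [
        {name: cast(value) for (name, cast), value in zip(schema, row)}
        for row in reader
    ]


rows = query("requests", RAW_OBJECT)
total_hits = sum(r["hits"] for r in rows)
print(total_hits)  # 59

# Dropping the table definition deletes metadata only; the raw data is untouched.
del catalog["requests"]
print(RAW_OBJECT == b"2024-01-01,us-east-1,42\n2024-01-02,eu-west-1,17\n")  # True
```

The same raw bytes could be given a second, different table definition without copying them, which is what makes schema-on-read cheap to iterate on.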
Athena Workgroups
Athena workgroups can be used to separate users, teams, applications, or workloads, to set limits on the amount of data each query or the entire workgroup can process, and to track costs.
Resource-level identity-based policies can be used to control access to a specific workgroup.
Workgroups help view query-related metrics in CloudWatch, control costs by configuring limits on the amount of data scanned, create thresholds, and trigger actions, such as SNS notifications, when these thresholds are breached.
Workgroups integrate with IAM, CloudWatch, Simple Notification Service (SNS), and AWS Cost and Usage Reports as follows:
IAM identity-based policies with resource-level permissions control who can run queries in a workgroup.
Athena publishes the workgroup query metrics to CloudWatch if you enable query metrics.
SNS topics can be created to issue alarms to specified workgroup users when data usage controls for queries in a workgroup exceed the established thresholds.
A workgroup tag can be configured as a cost allocation tag in the Billing and Cost Management console, and the costs associated with running queries in that workgroup then appear in the Cost and Usage Reports with that cost allocation tag.
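The workgroup controls listed above can be sketched as plain Python data, assuming the request shape of boto3's `athena.create_work_group()` (`Name`, `Configuration`, `Tags`). Nothing here calls AWS. The region, account ID, workgroup name, results bucket, and tag are hypothetical placeholders. The sketch sets a per-query scan limit, enables CloudWatch query metrics, adds a tag that could be activated as a cost allocation tag, and builds an IAM identity-based policy scoped to a single workgroup ARN. Workgroup-wide data usage alarms (the SNS thresholds) are configured separately and are not shown.

```python
import json

# Hypothetical values: replace with your own region, account, and names.
REGION = "us-east-1"
ACCOUNT_ID = "111122223333"
WORKGROUP = "analytics-team"

# The Athena API documents a 10 MB minimum for the per-query cutoff.
MIN_CUTOFF_BYTES = 10_000_000


def workgroup_request(cutoff_bytes: int) -> dict:
    """Build keyword arguments for boto3's athena.create_work_group()."""
    if cutoff_bytes < MIN_CUTOFF_BYTES:
        raise ValueError("per-query cutoff must be at least 10 MB")
    return {
        "Name": WORKGROUP,
        "Description": "Queries for the analytics team",
        "Configuration": {
            "ResultConfiguration": {
                # Hypothetical results bucket.
                "OutputLocation": f"s3://example-athena-results/{WORKGROUP}/"
            },
            "EnforceWorkGroupConfiguration": True,
            # Publish this workgroup's query metrics to CloudWatch.
            "PublishCloudWatchMetricsEnabled": True,
            # Cancel any single query that would scan more than this many bytes.
            "BytesScannedCutoffPerQuery": cutoff_bytes,
        },
        # Can be activated as a cost allocation tag in Billing and Cost Management.
        "Tags": [{"Key": "team", "Value": "analytics"}],
    }


def workgroup_policy() -> dict:
    """IAM identity-based policy with resource-level permissions on one workgroup."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "athena:StartQueryExecution",
                    "athena:GetQueryExecution",
                    "athena:GetQueryResults",
                    "athena:StopQueryExecution",
                ],
                "Resource": f"arn:aws:athena:{REGION}:{ACCOUNT_ID}:workgroup/{WORKGROUP}",
            }
        ],
    }


request = workgroup_request(1_000_000_000)  # 1 GB per query
print(json.dumps(workgroup_policy(), indent=2))
# With credentials configured, the request could be sent with:
#   boto3.client("athena").create_work_group(**request)
```

A real policy for running queries would also need permissions on the underlying S3 data, the query results location, and the Glue Data Catalog; they are omitted to keep the workgroup scoping visible.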
Athena Best Practices
Partition the information
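Partitioning keeps related data together based on column values such as date, country, and region, so queries that filter on those columns scan less data.
Athena supports Hive-style partitioning, where each partition is an S3 prefix of the form key=value (for example, dt=2024-01-01/).
Pick partition keys that support the queries, i.e., columns that queries commonly filter on.
Partition projection is an Athena feature that, instead of storing partition information in the Glue Data Catalog, computes partition values and locations from rules defined in the table properties in AWS Glue.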
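Both partitioning approaches can be sketched in Python. The bucket, table prefix, and column names are hypothetical. `hive_partition_prefix` builds Hive-style `key=value` S3 prefixes and shows how a filter on partition columns narrows the prefixes a query needs to read. `projection_properties` returns table properties for a date-projected `dt` column, assuming a table partitioned by `dt` only. The property names follow the Athena partition projection documentation, but treat the values as an illustration to adapt.

```python
from datetime import date

BUCKET = "example-datalake"  # hypothetical bucket name
TABLE_PREFIX = "sales"       # hypothetical table location prefix


def hive_partition_prefix(**keys: str) -> str:
    """Hive-style layout: one key=value folder per partition column, in order."""
    parts = "/".join(f"{k}={v}" for k, v in keys.items())
    return f"s3://{BUCKET}/{TABLE_PREFIX}/{parts}/"


def projection_properties(start: date) -> dict:
    """Table properties for a table partitioned only by a daily 'dt' column.
    Athena computes the partitions from these rules instead of looking each
    one up in the Glue Data Catalog."""
    return {
        "projection.enabled": "true",
        "projection.dt.type": "date",
        "projection.dt.format": "yyyy-MM-dd",
        "projection.dt.range": f"{start.isoformat()},NOW",
        "projection.dt.interval": "1",
        "projection.dt.interval.unit": "DAYS",
        # ${dt} is substituted by Athena with each projected value.
        "storage.location.template": f"s3://{BUCKET}/{TABLE_PREFIX}/dt=${{dt}}/",
    }


# Partition pruning: a query filtering on dt and country only touches matching prefixes.
all_prefixes = [
    hive_partition_prefix(dt=d, country=c)
    for d in ("2024-01-01", "2024-01-02")
    for c in ("us", "de")
]
scanned = [p for p in all_prefixes if "/dt=2024-01-02/" in p and "/country=de/" in p]
print(scanned)
```

Only one of the four prefixes has to be read for a query such as `WHERE dt = '2024-01-02' AND country = 'de'`, which is why partition keys should match the columns queries filter on.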
Compression
Compressing the data can speed up queries significantly, as long as the files are either of an optimal size or splittable.
Smaller data sizes reduce the amount of data scanned from S3, resulting in lower query costs and reduced network traffic.
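The size reduction can be illustrated with Python's standard `gzip` module on some hypothetical, repetitive CSV log lines. The exact ratio depends entirely on the data, so the sketch prints it rather than assuming a number. Note that GZIP text files are generally not splittable, which is why the file-size caveat above matters: one large GZIP file has to be decompressed by a single reader.

```python
import gzip

# Hypothetical access-log lines in CSV form.
lines = [
    f"2024-01-01T00:{i % 60:02d},us-east-1,GET,/item/{i % 100},200\n"
    for i in range(10_000)
]
raw = "".join(lines).encode("utf-8")
compressed = gzip.compress(raw)

# Fewer bytes stored in S3 means fewer bytes for Athena to read and bill for.
print(f"{len(raw)} bytes -> {len(compressed)} bytes "
      f"({len(compressed) / len(raw):.1%} of original)")

# Round-trip check: compression is lossless.
assert gzip.decompress(compressed) == raw
```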
Optimize file sizes
Queries run more efficiently when data scanning can be parallelized and when blocks of data can be read sequentially.
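One common way to act on this is to compact many small files into fewer larger ones, because each S3 object adds per-file overhead (listing, opening, and reading its metadata). The sketch below only plans the compaction: it greedily groups consecutive file sizes into batches no larger than a target. The 128 MB target and the `plan_compaction` helper are assumptions for illustration, not an Athena setting; an actual rewrite would be done by a separate job.

```python
MB = 1024 * 1024
TARGET_BYTES = 128 * MB  # assumed target file size; tune for your data


def plan_compaction(file_sizes: list, target: int = TARGET_BYTES) -> list:
    """Greedily group consecutive files into batches no larger than target.
    Each batch would be rewritten as one larger object. A single file that
    already exceeds the target ends up in a batch of its own."""
    batches = []
    current = []
    current_size = 0
    for size in file_sizes:
        # Start a new batch if adding this file would overshoot the target.
        if current and current_size + size > target:
            batches.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        batches.append(current)
    return batches


# 1,000 files of 1 MB each collapse into 8 batches: 7 of 128 MB and 1 of 104 MB.
small_files = [1 * MB] * 1000
plan = plan_compaction(small_files)
print(len(plan), [sum(b) // MB for b in plan])
```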
Columnar file codecs
Columnar storage formats like ORC and Parquet are optimized for fast data retrieval because they support compression and are splittable.
A splittable file can be read in parallel by the Athena execution engine, whereas an unsplittable file can't be read in parallel.
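The benefit of a columnar layout for a query that needs only some columns can be shown with a toy model. The table and helpers below are hypothetical, and the model deliberately ignores what real ORC and Parquet files add on top (row groups or stripes, per-column encoding and compression, min/max statistics). It compares how many bytes must be read to compute `sum(hits)` when records are stored row by row versus column by column.

```python
# Hypothetical 3-column table: (dt, region, hits).
rows = [(f"2024-01-{d:02d}", "us-east-1", d * 10) for d in range(1, 31)]


def row_layout(records):
    """Row-oriented: every field of a record is stored together."""
    return [",".join(map(str, r)).encode() for r in records]


def columnar_layout(records):
    """Column-oriented: each column's values are stored contiguously."""
    return [",".join(str(r[i]) for r in records).encode() for i in range(3)]


# Query: SELECT sum(hits) needs only the third column.
row_scan = sum(len(rec) for rec in row_layout(rows))  # whole records must be read
hits_chunk = columnar_layout(rows)[2]                 # only the hits column is read
col_scan = len(hits_chunk)

total = sum(int(v) for v in hits_chunk.split(b","))
print(row_scan, col_scan, total)
```

Because each column is stored contiguously, a columnar file can also be divided into independent chunks that separate workers read in parallel, which is the splittability property described above.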
Optimize queries
AWS Certification Exam Practice Questions
Questions are collected from the Internet, and the answers are marked as per my knowledge and understanding (which might differ from yours).
AWS services are updated all the time, and both the answers and questions might soon be outdated, so research accordingly.
AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed, the question might not be updated.
Open to further feedback, discussion, and correction.
A SysOps administrator is storing access logs in Amazon S3 and wants to use standard SQL to query the data and generate a report without having to manage infrastructure. Which AWS service will allow the SysOps administrator to accomplish this task?
Amazon Inspector
Amazon CloudWatch
Amazon Athena
Amazon RDS
A Solutions Architect must design a storage solution for incoming billing reports in CSV format. The data does not need to be scanned frequently and is discarded after 30 days. Which service will be MOST cost-effective in meeting these requirements?
Import the logs into an RDS MySQL instance
Use AWS Data Pipeline to import the logs into a DynamoDB table
Write the files to an S3 bucket and use Amazon Athena to query the data