Sunday, April 17, 2022

AWS Redshift


Amazon Redshift is a data warehouse product developed by Amazon and is part of Amazon's cloud platform, Amazon Web Services. Redshift is a relational database management system designed specifically for online analytical processing (OLAP). It is built on PostgreSQL and ParAccel's massively parallel processing (MPP) technology, and it uses a distributed architecture, columnar storage, and column compression to execute exploratory queries. Because it is based on PostgreSQL, Redshift allows clients to connect and execute DDL and DML SQL statements using JDBC or ODBC.

For a more detailed introduction to AWS Redshift, see the linked document below.

RedShift.pdf

 What is Amazon Redshift?

Amazon Redshift is a fully managed, scalable cloud data warehouse that accelerates your time to insights with fast, easy, and secure analytics at scale. Thousands of customers rely on Amazon Redshift to analyze data from terabytes to petabytes and run complex analytical queries. You can get real-time insights and predictive analytics on all your data across your operational databases, data lake, data warehouse, and third-party datasets. Amazon Redshift delivers all this at a price performance that’s up to 3x better than other cloud data warehouses out of the box, helping you keep your costs predictable.

Amazon Redshift Serverless makes it easy for you to run petabyte-scale analytics in seconds to get rapid insights without having to configure and manage your data warehouse clusters. Amazon Redshift Serverless automatically provisions and scales the data warehouse capacity to deliver high performance for demanding and unpredictable workloads, and you pay only for the resources you use.

What are the top reasons customers choose Amazon Redshift?

Thousands of customers choose Amazon Redshift to accelerate their time to insights because it’s easy to use, it delivers performance at any scale, and it lets you analyze all your data. Amazon Redshift is a fully managed service and offers both provisioned and serverless options, making it easy for you to run and scale analytics without having to manage your data warehouse. You can choose the provisioned option for predictable workloads, or choose Amazon Redshift Serverless to automatically provision and scale data warehouse capacity for demanding and unpredictable workloads. It delivers up to 3x better price performance than other cloud data warehouses out of the box, helping you keep your costs predictable. Amazon Redshift lets you run real-time and predictive analytics on all your data across your operational databases, data lake, data warehouse, and thousands of third-party datasets. Amazon Redshift protects your data at rest and in transit, helps you meet internal and external compliance requirements, and is compliant with SOC1, SOC2, SOC3, and PCI DSS Level 1. All Redshift security and compliance features are included at no additional cost.

How does Amazon Redshift simplify data warehouse management?

Amazon Redshift is fully managed by AWS so you no longer need to worry about data warehouse management tasks such as hardware provisioning, software patching, setup, configuration, monitoring nodes and drives to recover from failures, or backups. AWS manages the work needed to set up, operate, and scale a data warehouse on your behalf, freeing you to focus on building your applications. Amazon Redshift also has automatic tuning capabilities, and surfaces recommendations for managing your warehouse in Redshift Advisor. For Redshift Spectrum, Amazon Redshift manages all the computing infrastructure, load balancing, planning, scheduling, and execution of your queries on data stored in Amazon S3. The serverless option automatically provisions and scales the data warehouse capacity to deliver high performance for demanding and unpredictable workloads, and you pay only for the resources you use.

 How does the performance of Amazon Redshift compare to that of other data warehouses?

TPC-DS benchmark results show that Amazon Redshift provides the best price performance out of the box, even for a comparatively small 3 TB dataset. Amazon Redshift delivers up to 3x better price performance than other cloud data warehouses. This means that you can benefit from Amazon Redshift’s leading price performance from the start without manual tuning. For details, see the AWS Big Data Blog post "Get up to 3x better price performance with Amazon Redshift than with other cloud data warehouses".

Amazon Redshift uses a variety of innovations to achieve up to 10x better performance than traditional databases for data warehousing and analytics workloads, including efficient read-optimized columnar compressed data storage with massively parallel processing (MPP) compute clusters that scale linearly to hundreds of nodes. Instead of storing data as a series of rows, Amazon Redshift organizes the data by column. When loading data into an empty table, Amazon Redshift automatically samples your data and selects the most appropriate compression scheme.
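To make the column-oriented layout more concrete, here is a minimal Python sketch. It is only an illustration of the general idea, not Redshift's actual storage engine: the sample rows, the function names, and the choice of run-length encoding as the compression scheme are all assumptions made for this example (Redshift picks among several encodings on its own).

```python
from itertools import groupby

# Hypothetical sample data, stored row by row as a transactional database would.
rows = [
    {"order_id": 1, "region": "EU", "amount": 20},
    {"order_id": 2, "region": "EU", "amount": 35},
    {"order_id": 3, "region": "US", "amount": 20},
    {"order_id": 4, "region": "US", "amount": 50},
]


def to_columns(rows):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


columns = to_columns(rows)

# An aggregate over one column only needs to read that column's values.
total_amount = sum(columns["amount"])

# Values within a column are similar, so they tend to compress well.
region_rle = run_length_encode(columns["region"])
```

An analytical query such as a sum over `amount` touches a single list here instead of every field of every row, which is the main reason a columnar layout suits data warehousing workloads.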

Redshift Spectrum lets you run queries against exabytes of data in Amazon S3. There is no loading or extract, transform, and load (ETL) required. Even if you don’t store any of your data in Amazon Redshift, you can still use Redshift Spectrum to query datasets as large as an exabyte in Amazon S3.

Materialized views provide significantly faster query performance for repeated and predictable analytical workloads such as dashboards, queries from business intelligence (BI) tools, and ETL data processing. Using materialized views, you can store the precomputed results of queries and efficiently maintain them by incrementally processing the latest changes made to the source tables. Subsequent queries referencing the materialized views use the precomputed results to run much faster, and automatic refresh and query rewrite capabilities simplify and automate the use of materialized views.
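The idea of keeping precomputed results and refreshing them incrementally can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how Redshift implements materialized views: the class name `SalesByRegionView` is hypothetical, the base table is assumed to be append-only, and the view stands in for something like a SUM(amount) grouped by region.

```python
from collections import defaultdict


class SalesByRegionView:
    """Toy materialized view holding a precomputed sum of amount per region."""

    def __init__(self):
        self.totals = defaultdict(int)
        self.applied = 0  # number of base-table rows already folded in

    def refresh(self, base_table):
        """Fold in only the rows appended since the last refresh."""
        new_rows = base_table[self.applied:]
        for region, amount in new_rows:
            self.totals[region] += amount
        self.applied = len(base_table)
        return len(new_rows)


# Hypothetical base table of (region, amount) rows.
base_table = [("EU", 20), ("US", 50)]
view = SalesByRegionView()

first_refresh = view.refresh(base_table)   # processes both existing rows

base_table.append(("EU", 5))
second_refresh = view.refresh(base_table)  # processes only the new row
```

A dashboard query would read `view.totals` directly instead of rescanning the base table, and each refresh costs work proportional to the number of new rows rather than the size of the table.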

The compute and storage capacity of on-premises data warehouses are limited by the constraints of the on-premises hardware. Amazon Redshift gives you the ability to scale compute and storage independently as needed to meet changing workloads. With Redshift Managed Storage (RMS), you now have the ability to scale your storage to petabytes using Amazon S3 storage.

Automatic Table Optimization (ATO) is a self-tuning capability that helps you achieve the performance benefits of creating optimal sort and distribution keys without manual effort. ATO observes how queries interact with tables and uses machine learning (ML) to select the best sort and distribution keys to optimize performance for the cluster’s workload. ATO optimizations have been shown to increase cluster performance by 24% and 34% on the 3 TB and 30 TB TPC-DS benchmarks, respectively, versus a cluster without ATO. Additional features such as Automatic Vacuum Delete, Automatic Table Sort, and Automatic Analyze eliminate the need for manual maintenance and tuning of Redshift clusters to get the best performance for new clusters and production workloads.

Workload management (WLM) allows you to route queries to a set of defined queues to manage the concurrency and resource utilization of the cluster. Amazon Redshift offers both automatic and manual WLM configurations. With manual WLM, you’re responsible for defining the amount of memory allocated to each queue and the maximum number of queries that can run concurrently in each queue; each running query gets a fraction of its queue’s memory. Manual WLM configurations don’t adapt to changes in your workload and require an intimate knowledge of your queries’ resource utilization to get right. Amazon Redshift Auto WLM doesn’t require you to define the memory utilization or concurrency for queues. Instead, it adjusts the concurrency dynamically to optimize for throughput. Auto WLM also provides tools to manage your workload. Query priorities let you give workloads preferential treatment based on business priority, including more resources during busy times for consistent query performance. Query monitoring rules help manage unexpected situations, such as detecting and stopping runaway or expensive queries before they consume system resources. Auto WLM with adaptive concurrency improves performance in three key areas: proper allocation of memory, elimination of static partitioning of memory between queues, and improved throughput.
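The arithmetic behind a manual WLM configuration can be illustrated with a short Python sketch. The queue names, the memory figures, and the assumption that a queue's memory is split evenly across its concurrent queries are all hypothetical, chosen only to show why static partitioning takes careful sizing.

```python
def memory_per_query_mb(total_memory_mb, queue_memory_percent, concurrency):
    """Memory each query gets when a queue's share is split evenly."""
    queue_memory_mb = total_memory_mb * queue_memory_percent / 100
    return queue_memory_mb / concurrency


# Hypothetical manual configuration: percent of memory and max concurrent queries.
queues = {
    "etl": {"memory_percent": 40, "concurrency": 5},
    "dashboards": {"memory_percent": 60, "concurrency": 15},
}

TOTAL_MEMORY_MB = 10_000  # assumed memory available to WLM, for illustration only

per_query = {
    name: memory_per_query_mb(TOTAL_MEMORY_MB, q["memory_percent"], q["concurrency"])
    for name, q in queues.items()
}
```

In this example each ETL query gets 800 MB and each dashboard query gets 400 MB, whether or not the other queue is busy. That fixed split is the static partitioning that Auto WLM is described as removing.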

Amazon Redshift Advisor develops customized recommendations to increase performance and optimize costs by analyzing your workload and usage metrics for your cluster. Sign in to the Amazon Redshift console to view Advisor recommendations.

How do I get started with Amazon Redshift?

With just a few clicks in the AWS Management Console, you can start querying data. You can take advantage of pre-loaded sample data sets, including benchmark datasets TPC-H, TPC-DS, and other sample queries to kick start analytics immediately. You can create databases, schemas, tables and load data from Amazon S3, Amazon Redshift data shares, or restore from an existing Amazon Redshift provisioned cluster snapshot. You can also directly query data in open formats, such as Parquet or ORC in Amazon S3 data lake, or query data in operational databases, such as Amazon Aurora, Amazon RDS PostgreSQL and MySQL.
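As a rough illustration of loading data from Amazon S3, the Python sketch below assembles a Redshift COPY statement as a string. The table name, bucket path, and IAM role ARN are placeholders invented for this example, and the helper `build_copy_statement` is hypothetical; it does no escaping, so it should not be fed untrusted input.

```python
def build_copy_statement(table, s3_uri, iam_role_arn, file_format="PARQUET"):
    """Return a COPY statement that loads files under s3_uri into table."""
    return (
        f"COPY {table}\n"
        f"FROM '{s3_uri}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"FORMAT AS {file_format};"
    )


# Placeholder values; substitute your own table, bucket, and role.
statement = build_copy_statement(
    table="sales",
    s3_uri="s3://example-bucket/sales/",
    iam_role_arn="arn:aws:iam::123456789012:role/ExampleRedshiftRole",
)
```

The resulting statement can be run from any SQL client connected to the warehouse, for example the query editor in the console.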

To get started with Amazon Redshift Serverless, choose “Try Amazon Redshift Serverless” and start querying data. Amazon Redshift Serverless automatically scales to meet any increase in workloads.

 What is Advanced Query Accelerator (AQUA) for Amazon Redshift?

Advanced Query Accelerator (AQUA) is a new distributed and hardware-accelerated cache that enables Amazon Redshift to run up to 10x faster than other enterprise cloud data warehouses by automatically boosting certain types of queries. AQUA is available with the RA3.16xlarge, RA3.4xlarge, or RA3.xlplus nodes at no additional charge and with no code changes.

 How do I enable/disable AQUA for my Redshift data warehouse?

For Redshift clusters running on RA3 nodes, you can enable or disable AQUA at the cluster level using the Redshift console, AWS Command Line Interface (CLI), or API. For Redshift clusters running on DC, DS, or older-generation nodes, you must first upgrade to RA3 nodes before you can enable or disable AQUA.

What type of queries are accelerated by AQUA?

AQUA accelerates analytics queries by running data-intensive tasks such as scans, filtering, and aggregation closer to the storage layer. You’ll see the most noticeable performance improvement on queries that require large scans, especially those with LIKE and SIMILAR TO predicates. Over time, the types of queries that are accelerated by AQUA will increase.

How do I know which queries on my Redshift cluster are accelerated by AQUA?

You can query the system tables to see the queries accelerated by AQUA.

What is Amazon Redshift managed storage?

Amazon Redshift managed storage is available with serverless and RA3 node types and lets you scale and pay for compute and storage independently so you can size your cluster based only on your compute needs. It automatically uses high-performance SSD-based local storage as tier-1 cache and takes advantage of optimizations such as data block temperature, data block age, and workload patterns to deliver high performance while scaling storage automatically to Amazon S3 when needed without requiring any action.
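The tiering idea can be sketched as a small cache in front of a larger remote store. The Python below is purely illustrative: the class `TieredBlockStore`, the use of an access count as "temperature", and the eviction rule are assumptions made for this example, since the policy Redshift actually uses is not described here.

```python
class TieredBlockStore:
    """Toy two-tier store: a small local cache in front of a remote store."""

    def __init__(self, cache_capacity):
        self.cache_capacity = cache_capacity
        self.local_cache = {}   # block_id -> data, stands in for local SSD
        self.remote = {}        # block_id -> data, stands in for Amazon S3
        self.temperature = {}   # block_id -> number of reads so far
        self.last_access = {}   # block_id -> logical time of last read
        self.clock = 0

    def write(self, block_id, data):
        """Store the durable copy remotely and try to keep it cached."""
        self.remote[block_id] = data
        self._admit(block_id, data)

    def read(self, block_id):
        """Return (data, tier), where tier is 'cache' or 'remote'."""
        self.clock += 1
        self.temperature[block_id] = self.temperature.get(block_id, 0) + 1
        self.last_access[block_id] = self.clock
        if block_id in self.local_cache:
            return self.local_cache[block_id], "cache"
        data = self.remote[block_id]
        self._admit(block_id, data)
        return data, "remote"

    def _admit(self, block_id, data):
        """Cache a block, evicting the coldest (then oldest) block if full."""
        is_new = block_id not in self.local_cache
        if is_new and len(self.local_cache) >= self.cache_capacity:
            coldest = min(
                self.local_cache,
                key=lambda b: (self.temperature.get(b, 0), self.last_access.get(b, 0)),
            )
            del self.local_cache[coldest]
        self.local_cache[block_id] = data


store = TieredBlockStore(cache_capacity=2)
store.write("a", "A")
store.write("b", "B")
store.read("a")
store.read("a")        # block "a" is now hotter than block "b"
store.write("c", "C")  # cache is full, so the coldest block is evicted
```

After the last write, block "b" lives only in the remote tier, while the frequently read block "a" stays in the cache. The total amount of data can grow well beyond the cache size without any action from the caller.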

