Amazon Redshift – Overview
- OLAP database (Data warehousing solution) based on PostgreSQL
- OLAP = Online Analytical Processing
- Can query petabytes of structured and semi-structured data across your data warehouse and your data lake using standard SQL
- Claimed up to 10x better performance than other data warehouses
- Columnar storage of data (instead of row based)
- Massively Parallel Processing (MPP) for query execution, highly available
- Has a SQL interface for performing the queries
- BI tools such as Amazon QuickSight or Tableau integrate with it
- Data is loaded from S3, Kinesis Firehose, DynamoDB, DMS, ...
- A cluster can have from 1 to 128 compute nodes; storage per node depends on the node type (e.g. 160 GB on dc2.large)
- Can provision multiple nodes, but a cluster is single-AZ by default (Multi-AZ deployments are only available for RA3 node types)
- Leader node: query planning and results aggregation
- Compute nodes: perform the queries and send results to the leader node
- Backup & Restore, Security VPC / IAM / KMS, Monitoring.
- Redshift Enhanced VPC Routing: COPY / UNLOAD traffic goes through your VPC instead of the public internet
- Redshift is provisioned, so it’s worth it when you have sustained usage (use Athena instead if the queries are sporadic)
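To make the "columnar storage (instead of row based)" point above concrete, here is a minimal Python sketch. It is only a toy model of the two layouts, not how Redshift stores data internally; the sample rows and the helper names (to_columns, values_read_row_store, values_read_column_store) are made up for illustration.

```python
from typing import Any, Dict, List

# Row-based layout: all fields of a record are stored together (typical OLTP).
rows: List[Dict[str, Any]] = [
    {"order_id": 1, "region": "EU", "amount": 120.0},
    {"order_id": 2, "region": "US", "amount": 80.0},
    {"order_id": 3, "region": "EU", "amount": 50.0},
]


def to_columns(records: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Pivot row records into one list per column (columnar layout)."""
    columns: Dict[str, List[Any]] = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)


def values_read_row_store(records: List[Dict[str, Any]]) -> int:
    """A row store has to read whole records, even to aggregate one field."""
    return sum(len(record) for record in records)


def values_read_column_store(cols: Dict[str, List[Any]], name: str) -> int:
    """A column store only reads the column the query needs."""
    return len(cols[name])


# Analytical query: SELECT SUM(amount) FROM orders
total = sum(columns["amount"])
print(total)                                        # total of the amount column
print(values_read_row_store(rows))                  # values touched, row layout
print(values_read_column_store(columns, "amount"))  # values touched, columnar layout
```

With wide tables and queries that aggregate a few columns, the gap between the two counts grows, which is the usual argument for columnar storage in OLAP workloads.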
Redshift Architecture
- Massively parallel columnar database, runs within a VPC
- Single leader node and multiple compute nodes
- You can connect to Redshift using any application that supports JDBC or ODBC drivers for PostgreSQL
- Clients query the leader node through a SQL endpoint
  • The leader node distributes the job across the compute nodes
  • Each compute node is partitioned into slices, and each slice processes a portion of the job
- The leader node then aggregates the results and returns them to the client
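The flow above (leader distributes, compute nodes work slice by slice, leader aggregates) can be sketched as a small Python simulation. This is a toy model under stated assumptions: the class names LeaderNode and ComputeNode, the round-robin split, and the sequential loop are illustrative choices, whereas a real cluster runs the slices in parallel and distributes data according to the table's distribution style.

```python
from typing import List


def split_evenly(items: List[int], parts: int) -> List[List[int]]:
    """Round-robin the items into `parts` chunks."""
    return [items[i::parts] for i in range(parts)]


class ComputeNode:
    """Hypothetical compute node that is divided into slices."""

    def __init__(self, slices: int) -> None:
        self.slices = slices

    def partial_sums(self, values: List[int]) -> List[int]:
        # Each slice processes its own portion of this node's share.
        return [sum(chunk) for chunk in split_evenly(values, self.slices)]


class LeaderNode:
    """Hypothetical leader: plans the work and aggregates the results."""

    def __init__(self, nodes: List[ComputeNode]) -> None:
        self.nodes = nodes

    def run_sum(self, values: List[int]) -> int:
        # 1. Distribute the job across the compute nodes.
        shares = split_evenly(values, len(self.nodes))
        # 2. Collect one partial result per slice (sequential here for clarity).
        partials: List[int] = []
        for node, share in zip(self.nodes, shares):
            partials.extend(node.partial_sums(share))
        # 3. Aggregate the partial results for the client.
        return sum(partials)


cluster = LeaderNode([ComputeNode(slices=2) for _ in range(3)])
total = cluster.run_sum(list(range(1, 101)))
print(total)  # 5050
```

The client only ever talks to the leader; the number of nodes and slices changes how the work is split, not the result.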
Redshift node types
- Dense compute nodes (DC2)
  • For compute-intensive DW workloads, with local SSD storage
- Dense storage nodes (DS2)
  • For large DWs, uses hard disk drives (HDDs)
- RA3 nodes with managed storage
  • For large DWs, uses large local SSDs
  • Recommended over DS2
  • Automatically offloads data to S3 when a node's data grows beyond its local storage
  • Compute and managed storage are billed independently