Monday, April 17, 2023

Amazon Redshift - Loading Data into Redshift

 

Loading Data into Redshift



Typically, data from OLTP systems is loaded into Redshift for analytics and BI purposes.

  • Data from OLTP systems can be exported to S3 and then loaded from S3 into Redshift.
  • Amazon Kinesis Data Firehose loads streaming data the same way: it delivers records to an intermediate S3 bucket and then issues a COPY command to load them into Redshift.
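The S3 staging step can be sketched in Python. This is a minimal, hypothetical helper (`rows_to_pipe_delimited` is not part of any AWS library). It turns rows exported from an OLTP database into pipe-delimited text, which is the layout the COPY example in this post reads with `delimiter '|'`. It assumes values never contain the delimiter or line breaks, and it does no escaping.

```python
import io


def rows_to_pipe_delimited(rows):
    """Serialize rows (sequences of values) as pipe-delimited text, one row per line.

    Hypothetical helper for illustration. None becomes an empty field; how COPY
    treats empty fields depends on options such as EMPTYASNULL or NULL AS.
    """
    buf = io.StringIO()
    for row in rows:
        fields = []
        for value in row:
            text = "" if value is None else str(value)
            # This sketch does no escaping, so reject values that would break the layout.
            if "|" in text or "\n" in text or "\r" in text:
                raise ValueError(f"value {text!r} contains the delimiter or a line break")
            fields.append(text)
        buf.write("|".join(fields) + "\n")
    return buf.getvalue()


# Example: two rows exported from a hypothetical OLTP "users" table.
export = rows_to_pipe_delimited([(1, "alice", None), (2, "bob", "NY")])
print(export, end="")
```

Uploading `export` to S3 (for example with boto3's `put_object`) completes the staging step. A real export would also need an escaping strategy that matches the COPY options in use.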


COPY command

  • Loads data from files stored in S3 into Redshift
  • The loaded data is stored in the Redshift cluster itself (persistent storage, which adds to cost)
  • Data from DynamoDB tables and Amazon EMR can also be loaded with the COPY command


Loading data from S3 with COPY command



copy users
from 's3://my-bucket/tickit/allusers_pipe.txt'
credentials 'aws_iam_role=arn:aws:iam::0123456789:role/MyRedshiftRole'
delimiter '|'
region 'us-west-2';

  • Create an IAM role that has read access to the S3 bucket.
  • Create your Redshift cluster.
  • Attach the IAM role to the cluster.
  • The cluster can then temporarily assume the IAM role on your behalf.
  • Load data from S3 using the COPY command.
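The role from the first step needs two policy documents. The first is a trust policy that allows the Redshift service (`redshift.amazonaws.com`) to assume the role. The second is a permissions policy that lets the role read the source objects. The Python sketch below builds both as dictionaries. The bucket and prefix names are hypothetical, and the exact S3 actions a given load needs should be checked against the Redshift documentation.

```python
import json


def redshift_trust_policy():
    """Trust policy allowing the Amazon Redshift service to assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "redshift.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def s3_read_policy(bucket, prefix):
    """Permissions policy granting list/read access to objects under one prefix (sketch)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # Loading from a key prefix requires listing the matching objects.
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": f"{prefix}*"}},
            },
            {   # Read the objects themselves.
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
            },
        ],
    }


trust = redshift_trust_policy()
permissions = s3_read_policy("example-bucket", "tickit/")
print(json.dumps(trust, indent=2))
print(json.dumps(permissions, indent=2))
```

The trust document could then be supplied when creating the role, for example as `AssumeRolePolicyDocument=json.dumps(trust)` in boto3's `iam.create_role`.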


More ways to load data into Redshift


 


  • Use AWS Glue, a fully managed ETL (extract, transform, and load) service
  • Use ETL tools from AWS Partner Network (APN) partners
  • Use AWS Data Pipeline
  • For migration from on-premises systems, use:
      1. AWS Import/Export service (AWS Snowball)
      2. AWS Direct Connect (a private connection between your data center and AWS)
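The choice between the two migration options usually comes down to data volume and available bandwidth. One rough check is to estimate how long the transfer would take over the network, sketched in Python below. The `utilization` factor (the fraction of link bandwidth actually achieved) is an assumption to tune for your network. If the estimate runs to weeks, shipping the data on a Snowball device is often the more practical option.

```python
SECONDS_PER_DAY = 86_400


def transfer_days(data_terabytes, link_gbps, utilization=0.8):
    """Estimate days needed to move data over a network link.

    Decimal units: 1 TB = 10**12 bytes, 1 Gbps = 10**9 bits per second.
    `utilization` is an assumed fraction of the link's bandwidth actually achieved.
    """
    if link_gbps <= 0:
        raise ValueError("link_gbps must be positive")
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    bits = data_terabytes * 10**12 * 8
    seconds = bits / (link_gbps * 10**9 * utilization)
    return seconds / SECONDS_PER_DAY


# Hypothetical scenario: 100 TB over a 1 Gbps link at 80% utilization.
print(f"{transfer_days(100, 1):.1f} days")  # prints 11.6 days
```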





