Databases on AWS
- RDS
( MySQL, PostgreSQL, SQL Server, MariaDB, Aurora)
- DynamoDB
- Non relational DB (important for developer exam)
- Redshift
- Data warehousing system -
- ElastiCache - Cloud in-memory DB (important for developer / architect exam)
RDBMS Types
- MS-SQL
Server
- Oracle
- MySQL
- PostgreSQL
- Aurora
- MariaDB
NoSQL DBs
Document Oriented
- CouchDB
- MongoDB
- Dynamo
DB
Collection = Table, Document = Row, Keys-Value Pairs = Fields
Data Warehousing
OLTP (pulls out specific / narrow record set) vs OLAP – (pulls in large
number of records). It used different architecture and infrastructure layer.
Differ in terms of queries run on top of data. OLAP is more about aggregation.
ElastiCache
In memory cache in cloud.
- Memcached
- Redis
Exam – Improve database performance – e.g. top 10 deals of the day.
Database Migration
Service
Migrate production database to AWS. AWS manages all complexities of
migration process. Source database remains fully operational. Both homogenous
(Oracle to Oracle) as well as heterogeneous migrations are supported (Oracle to
Aurora or Microsoft SQL). Can also be used for continuous data replication with
high availability
AWS Schema migration tool makes heterogeneous database - migrations - easy by automatically converting the source
database schema and a majority of the custom code, including views, stored
procedures, and functions, to a format compatible with the target database. Any
code that cannot be automatically converted is clearly marked so that it can be
manually converted.
RDS – Back Ups, Multi AZs & Read Replicas
OLTP systems.
Backups
- Automated
Backups – full daily snapshot & will also store transaction logs.
- Enabled
by default. Stored in S3. Free backup storage in S3 upto the RDS Instance
size.
- You
can define backup window. Choose wisely.
- Backups
are deleted when the RDS Instance is deleted.
Snapshots
- Done
manually. They are stored even after you delete the instance.
- You
can copy snapshots across regions.
- You
can publish the snapshot to make it publically available.
- Restoring
Backups/ Snapshots – The restored version will be a new RDS instance with
new end point.
- You
can check the instance size to restore.
- You
cannot restore to existing instance
Encryption
- Encryption
at rest is supported for MySQL, SQL Server, Oracle and PostgreSQL &
MariaDB.
- Managed
by AWS KMS.
- Cannot
encrypt an already present instance. To encrypt, create new instance with
encryption enabled and then migrate your data to it.
Multi-AZ Deployment
- A
standby copy is created in another AZ. AWS handles replication and
auto-failover
- AWS
can automatically failover RDS instance to another instance.
- In
case of failover, No need to change connection string.
- This
can be used for DR purpose only. This option has to be selected at
instance creation time. This option is not useful for improving
performance / scaling.
Read Replica
Databases.
- Read-replica
– async data transfer to another RDS instance. You can actually read from
these instances, unlike Multi-AZ deployments. You can also have read
replicas of read-replicas up to 5 copies. (Watch out as async causes
latency)-
- Read-replicas
can be used for Dev/Test environments, run certain workloads only against
them and not against direct production deployment – Intensive workloads.
- MySQL
, MariaDB, PostgreSQL only for read-replicas , no Oracle & SQL Server
- You
cannot have read-replicas that have multi-AZ. However, you can create read
replicas of Multi AZ source databases.
- Read
replicas can be of a different size than source DB.
- Each
read-replica will have its own DNS end point
- Automatic
backups must be turned on in order to deploy a read replica
- Read
Replicas can be promoted to be their own databases. This breaks
replication. E.g. Dev/Test can be connected to the replica by first
promoting it as DB itself.
- Read
Replicas can be done in a second region for MySQL and MariaDB – no
PostgreSQL.
- Application
re-architecture is required to make use of Read replicas
- Read
replicas are not used for DR. they are used for performance scaling only.
DynamoDB
- Fast
and flexible NoSQL database
- Consistent,
single digit millisecond latency.
- Fully
managed DB – supports both document based & Key-value data models.
- Great
fit for mobile, IoT, web, gaming etc. applications.
- Stored
on SSDs
- Stored
on 3 geographically distinct DCs (not AZs). Built in redundancy
- Consistency
- Eventual
consistent reads - Consistency reached up to 1 second (default)
- Strongly
Consistent reads - Consistency reached after writes to all copies are
completed. <1 second
Select type based on application needs
- Pricing
– Write Capacity Units and Read Capacity Units ($/hr.). Also Storage cost
per month. You provision capacity in units/second. It can scale on the
fly. Provisioned capacity.
- Dynamo
DB – Expensive for Writes. Cheap for Reads. Important point v/s RDS.
- You
can dynamically add columns – without the need to update other rows with
the column data. As this is no RDBMS.
- Reserved
capacity is available for DynamoDB as well.
RDS v/s DynamoDB
- Use
DynamoDB for Push button scaling. With RDS – to scale horizontally a new
instance has to be created.
- DynamoDB
is cheap for writes and expensive for reads.
- Observe
workload characteristics and decide
- Use
RDS if data requires joins and relationships.
- In
RDBMS database structure cannot be dynamically altered. With DynamoDB you
can.
Redshift
Petabyte scale DW solution in cloud. Used for OLAP – sum of various
columns and joining the data.
Configurations
- Single
Node – 160 GB. Used by Small and Medium Size businesses.
- Multi-Node
– Leader Node (handles all incoming connections & receives queries)
& compute Node (store data and perform queries and computations – up
to 128 Compute Nodes)
Performance
- Redshift
is 10 times faster than usual OLAP systems.
- It
uses Columnar Data Store. Columnar data is stored sequentially on storage
system. Hence low I/O required – improving performance.
- Advanced
Compression (easier to do it via Columns instead of via Rows – which have
different data types). Columns have similar type of data. Doesn’t use
indexes and views – hence less storage required.
- Based
on data, appropriate data compression scheme is used.
- Allows
for massive parallel processing
Pricing
- Based
on Compute Node hours (compute node only – no leader node).
- Backup
and Data Transfer (only within VPC)
Security
- Transit
encrypted via SSL,
- At
rest using AES-256 encryption
- Can
use your own HSM or default AWSK Key management.
Availability
Not Multi-AZs. Can restore snapshots
Exam Tips – Database warehousing service, cheap, faster. Best seller AWS
Service. Speed achieved due to columnar storage. And Data stored sequentially
on disk – hence faster.
ElastiCache
- Easy
to deploy, operate and scale an in-memory cache in the cloud.
- Improve
performance by avoiding repeated calls to DB.
- Improve
latency and throughput for read-heavy applications.
- Can
be used for compute intensive data
Memcached
- All
Memcached tooling can be easily ported over.
Redis
- Supports
Master / Slave replication and multi-AZ deployment to get redundancy.
Exam Tips
- ElastiCache
is used if DB is primarily read-heavy and not frequently changing
- Use
Redshift – if application is slow due to constant OLAP transactions on top
of OLTP focused DB.
Aurora
- Bespoke
Database Engine.
- It
is MySQL compatible.
- However
you can’t download and install on your workstation.
Performance
5 times better performance than MySQL. At a fraction of cost as compared
to Oracle.
Scaling
- Outset
10 Gb Storage, auto increment of storage
- No
Push button scaling – unlike DynamoDB
Fault Tolerance
- Maintains
2 copies of your data in at least 3 availability zones. This is for the
Data only not for the instances that runs the Database.
- 2
copies lost – no impact on write availability.
- 3
copies lost – no impact on read availability.
- Storage
is self-healing.
Replicas
- MySQL
Read Replica can be created from the Aurora source DB.(up to 5 of them)
- Aurora
Replicas – up to 15 of them. If leader crashes, the replica with the
highest tiers becomes the leader. While creating replicas, remember to
assign different tier levels.
- Cluster
Endpoint vs Individual Endpoint
No Free Tier usage available. Also available only in select regions.
Takes slightly longer to provision
Exam Tips
- Why
you can’t connect to DB Server from DMZ. Check the security group – if it
is removed or added
- Have
separate groups for EC2 Instance and RDS Instance.
- Multi-AZ
for Disaster Recovery only. Not for performance improvement. For
performance improvement use, multiple read-replicas
- Dynamo
DB v/s RDS
If you want push button scaling, without any downtown, you will always
want to use DynamoDB.
With RDS scaling is not so easy, you have to use a bigger instance or
add read replicas (manual process).
- If
you are using Amazon RDS Provisioned IOPS storage with a MySQL or Oracle
database engine, what is the maximum size RDS volume you can have by
default? – 6TB
- What
data transfer charge is incurred when replicating data from your primary
RDS instance to your secondary RDS instance? - There is no charge associated
with this action.
- When
you have deployed an RDS database into multiple availability zones, can
you use the secondary database as an independent read node? – No
- RDS
automatically creates RDS Security Group w/ TCP port # 3306 enabled.
- In
VPC Security Group, the answer would be YES because you will have manually
specify access to port & protocol.
- AWS Cloud #Databases provides a broad selection of purpose-built databases for any enterprise.
- AWS databases support all database management tasks, such as server provisioning, patching, configuration, and backups.
- AWS provides various databases to supports relational and non-relational databased requirements to the clients.
Types of Database Services:
- Relational Database
- Key–Value Database
- In Memory Database
Relational Database
In relational database data is stored in a tabular format. It uses SQL to run queries to perform operations such as insertion, updating, deletion, and more. AWS provides following relational database services.
- Amazon RDS
Amazon Relational Database Service is a collection of managed services that makes it simple to set up, operate, and scale databases in the cloud.
It provides six database engines to choose and that includes Amazon Aurora, PostgreSQL, MySQL, MariaDB, Oracle Database, and SQL Server.
- Amazon Redshift
Amazon Redshift uses SQL to analyze structured and semi-structured data across data warehouses, operational databases, and data lakes, using AWS-designed hardware and machine learning to deliver the best price performance at any scale.
- Amazon Aurora
Amazon Aurora provides built-in security, continuous backups, serverless compute, up to 15 read replicas, automated multi-Region replication, and integrations with other AWS services.
Key Value Database
key-value database is a type of non-relational database that uses a simple key-value method to store data. A key-value database stores data as a collection of key-value pairs in which a key serves as a unique identifier.
- Amazon DynamoDB
DynamoDB offers built-in security, continuous backups, automated multi-Region replication, in-memory caching, and data import and export tools.
In Memory Database
In-memory databases are purpose-built databases that rely primarily on memory for data storage, in contrast to databases that store data on disk or SSDs.
- Amazon ElastiCache
Amazon ElastiCache is a fully managed, in-memory caching service supporting flexible, real-time use cases.
It accelerates application and database performance, or as a primary data store for like session stores, gaming, streaming, and analytics.
No comments:
Post a Comment