A Region is a cluster of data centers – a Region contains Availability Zones, and each Availability Zone contains one or more data centers
Most AWS services are region-scoped.
Not all services are available in all regions – https://aws.amazon.com/about-aws/global-infrastructure/regional-product-services/
How to choose AWS region:
- Compliance with data governance and legal requirements: data never leaves a region without your explicit permission
- Proximity to customers: reduced latency
- Available services within a Region: new services and new features aren’t available in every Region
- Pricing: pricing varies region to region and is transparent in the service pricing page
AWS Well-Architected Framework:
- Security
- Cost Optimization
- Reliability
- Performance efficiency
- Operational Excellence
- Sustainability
https://aws.amazon.com/architecture/well-architected
Security Responsibility: The Shared Responsibility model is used in AWS.
- It states that AWS is responsible for security OF the cloud, while the customer is responsible for security IN the cloud. For example, the customer must secure the operating system running in an EC2 instance, as AWS does not control or manage what runs inside an instance.
- Amazon tracks stored data and usage in order to bill services rendered
- Amazon may access customer data when required to do so by law
EC2 (Elastic Compute Cloud):
- Infrastructure as a service
- An EC2 instance supports a maximum of 28 attachments, shared between network interfaces and EBS volumes. For example, with one network interface attached you can attach up to 27 EBS volumes
EC2 Categories:
- General Purpose: Balance between CPU, memory, networking etc.
- Compute Optimized: Powerful CPUs, High Performance Computing, Machine Learning, batch processing etc.
- Memory Optimized: Database workloads etc.
- Accelerated Computing: Hardware GPU, FPGAs etc.
- Storage Optimized: Sequential and random IO (ElasticSearch), great for workloads requiring high, sequential read/write access to large data sets on local storage.
Bootstrap script: (configure at first launch) EC2 user data
- Bootstrapping means launching commands when a machine starts
- EC2 User data script runs with the root user. This is used to automate boot tasks
- Use EC2 user data to customize the dynamic installation parts at boot time
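A minimal user data sketch (assumes an Amazon Linux 2 instance where yum and httpd are available); it runs once as root at first boot:
#!/bin/bash
yum install -y httpd
systemctl enable --now httpd
echo "Hello from $(hostname -f)" > /var/www/html/index.html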
Key Pair: allows us to connect to instance securely
- Mac/linux/windows 10: .pem (for use with OpenSSH)
- Windows 8 or 7: .ppk (for use with PuTTY)
Go to directory where pem file resides
#chmod 0400 EC2Tutorial.pem
#ssh -i EC2Tutorial.pem ec2-user@<public IP>
#aws configure –> never run this and enter the values below inside EC2 Instance Connect (anyone with access to the instance could retrieve the keys – use an IAM role instead); on our own machine it prompts for:
- Access key ID: give key ID
- AWS secret access key: give secret
- Default region name: closest one
- Default output format: none
#aws iam list-users -> to verify
- When EC2 stops and starts again, public IP will change while private IP remains same
EC2 Instance Connect: browser-based SSH from the console (the Amazon Linux AMI comes with the AWS CLI pre-installed); port 22 must still be open in the Security Group
select the instance and click connect
- Give ec2-user
- Connect
EC2 instance role:
- Select the instance
- Actions -> Security -> Modify IAM role
- Attach the IAM role
- An instance can have only one role assigned at a time
IAM role: it's like a user, but not for a physical person – it is intended for a service
- For ex, EC2 instance wants to perform some action on AWS. Create an IAM role with permissions and assign to EC2
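The same can be done from the CLI – a rough sketch, with the role, profile, and instance ID as placeholders:
#aws iam create-instance-profile --instance-profile-name MyEC2Profile
#aws iam add-role-to-instance-profile --instance-profile-name MyEC2Profile --role-name MyEC2Role
#aws ec2 associate-iam-instance-profile --instance-id i-0123456789abcdef0 --iam-instance-profile Name=MyEC2Profile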
EC2 purchasing options:
- On-Demand Instances – short workload, predictable pricing, pay by second
- Reserved (1 & 3 years)
- Reserved Instances – long workloads
- Convertible Reserved Instances – long workloads with flexible instances
- Savings Plans (1 & 3 years) – commitment to an amount of usage, long workload
- Spot Instances – short workloads, cheap, can lose instances (less reliable)
- Dedicated Hosts – book an entire physical server, control instance placement
- Dedicated Instances – no other customers will share your hardware
- Capacity Reservations – reserve capacity in a specific AZ for any duration
Dedicated Host vs Instances: By default, EC2 instances run on a shared-tenancy basis.
- Dedicated Instances are Amazon EC2 instances that run in a virtual private cloud (VPC) on hardware that’s dedicated to a single customer. Dedicated Instances that belong to different AWS accounts are physically isolated at the hardware level. However, Dedicated Instances may share hardware with other instances from the same AWS account that is not Dedicated Instances.
- A Dedicated Host is also a physical server that’s dedicated to your use. With a Dedicated Host, you have visibility and control over how instances are placed on the server.
- A Dedicated Host provides a solution to specific licensing scenarios that require the license be locked to hardware, the number of CPU sockets, etc. Licenses based on the number of users or license files can work on any tenancy model. Using more than one application is possible with any tenancy model as well.
EC2 Spot Instances: Instances that we can lose at any point of time. 90% discount compared to on-demand
- If the current Spot price goes above our max price, we can choose to stop or terminate the instance within a 2-minute grace period
- Not suitable for critical jobs or databases
- A Spot Instance request is either one-time or persistent.
- If the Spot request is persistent, the request is opened again after the Spot Instance is interrupted; if we stop a persistent Spot Instance ourselves, the request only reopens after we start the instance again
- To cancel a persistent Spot request and terminate its Spot Instances, we must cancel the Spot request first and then terminate the Spot Instances.
Spot fleets: set of spot instances + (optional) on-demand instances
- Spot Fleets allow us to automatically request Spot Instances with the lowest price
Spot Blocks: Spot Instances with a defined duration (also known as Spot blocks) are designed not to be interrupted and will run continuously for the duration we select.
Elastic IP: for fixed public IP address. (5 per account)
Will be charged if Elastic IP is allocated and not assigned to any instance
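CLI sketch for allocating an Elastic IP and associating it with an instance (the instance and allocation IDs are placeholders):
#aws ec2 allocate-address --domain vpc
#aws ec2 associate-address --instance-id i-0123456789abcdef0 --allocation-id eipalloc-0abc12345def67890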
EC2 placement groups:
- Cluster: clusters instances into a low-latency group (10 Gbps bandwidth) in a single Availability Zone (same rack, same AZ)
- Spread: places your EC2 instances on different physical hardware across different AZs (max 7 instances per AZ per placement group)
- Partition: up to 7 partitions per AZ, spans across AZs in same region
- The instances in a partition do not share racks with the instances in the other partitions
Scales to 100s of EC2 instances per group (used for Hadoop, Cassandra, Kafka)
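CLI sketch for creating a cluster placement group and launching an instance into it (group name, AMI ID, and instance type are placeholders):
#aws ec2 create-placement-group --group-name hpc-cluster --strategy cluster
#aws ec2 run-instances --image-id ami-0abc12345def67890 --instance-type c5.large --placement GroupName=hpc-cluster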
ENI – Elastic network interface:
- Logical component in a VPC that represents a virtual network card
- It's like an Ethernet interface with a fixed/Elastic private IP used to connect to EC2; a public IP gets assigned only to the primary interface (the one created with the EC2 instance)
- We can add more ENI’s as required
- We can create ENI independently and attach them on the fly (move them) on EC2 instances for failover
- Bound to a specific availability zone (AZ)
AMI – Amazon machine Image:
- Customization of EC2 instance
- Region specific, can be copied to another regions, can be shared across accounts
- We can’t launch an EC2 instance using an AMI in another AWS Region, but you can copy the AMI to the target AWS Region and then use it to create your EC2 instances.
- When the new AMI is copied from region A into region B, it automatically creates a snapshot in region B because AMIs are based on the underlying snapshots.
- A Golden AMI is an AMI that can be standardized through configuration, consistent security patching, and hardening. It also contains agents we approve for logging, security, performance monitoring, etc. For the given use-case, we can have the static installation components already setup via the golden AMI.
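CLI sketch for baking an AMI from an instance and copying it to another Region (IDs and Regions are placeholders):
#aws ec2 create-image --instance-id i-0123456789abcdef0 --name golden-ami-v1
#aws ec2 copy-image --source-image-id ami-0abc12345def67890 --source-region us-east-1 --region eu-west-1 --name golden-ami-v1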
Storage cloud native options:
- Block – EBS, Instance Store
- File – EFS, FSx
- Object – S3, Glacier
EC2 Instance storage:
- Network-attached (EBS & EFS)
- Hardware (EC2 instance store)
Elastic File System (EFS) can be mounted by many instances across multiple AZs, while an Elastic Block Store (EBS) volume cannot (it is bound to a single AZ)
EBS volume: network drive, bound to AZ, snapshot required to restore in another AZ, persist data after termination
- Volume Types:
- io1 / io2 (SSD): Highest-performance SSD volume for mission-critical low-latency or high-throughput workloads
- sc1 (HDD): Lowest cost HDD volume designed for less frequently accessed workloads, e.g., backup data
- st1 (HDD): Low cost HDD volume designed for frequently accessed, throughput intensive workloads
- gp2 / gp3 (SSD): General purpose SSD volume that balances price and performance for a wide variety of workloads
- Only gp2/gp3 and io1/io2 can be used as boot volumes
- gp2: IO increases if the disk size increases
- io1: can increase IO independently
- For EBS General Purpose SSD (gp2) volumes, the charges are $0.10 per GB-month of provisioned storage.
- EBS optimized instances provide dedicated capacity for Amazon EBS I/O. EBS optimized instances are designed for use with all EBS volume types
Instance store: high performance hardware disk, data lost once stopped
- When we stop, hibernate, or terminate an instance, every block of storage in the instance store is reset
- We can specify the instance store volumes for the instance only when we launch an instance. We can’t attach instance store volumes to an instance after it is launched
EFS file system: Managed network file system (NFS) – multi AZ, only Linux
- Storage classes:
- Standard
- Infrequent Access (EFS-IA) – life cycle policy can be created
- The file system itself lives outside our VPC; mount targets (ENIs in our subnets) are used to connect to EFS from EC2
- The EFS Standard Storage pricing is $0.30 per GB per month.
- We do not use IAM to control access to files and directories by user and group, but we can use IAM to control who can administer the file system configuration.
- We can control access to files and directories with POSIX-compliant user and group-level permissions. POSIX permissions allows us to restrict access from hosts by user and group.
- EFS Security Groups act as a firewall, and the rules we add define the traffic flow
- The default limit on EFS file systems per account is 10. This can be increased by opening a support ticket with AWS
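Mounting an EFS file system from EC2 – a sketch assuming the amazon-efs-utils package is available and a placeholder file system ID fs-12345678:
#sudo yum install -y amazon-efs-utils
#sudo mkdir -p /mnt/efs
#sudo mount -t efs fs-12345678:/ /mnt/efs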
EBS Multi-attach: io1/io2
- Attach the same EBS volume to multiple EC2 instances (max 16) in the same AZ
- Multi-Attach is supported exclusively on Provisioned IOPS SSD volumes
EBS Snapshots:
- Make a backup (snapshot) of your EBS volume at a point in time
- Not necessary to detach volume to do snapshot, but recommended
- Can copy snapshots across AZ or Region
- Snapshots are incremental backups such that they backup only the blocks that have changed in the EBS volume since the last snapshot was taken. This provides a fast backup option.
- Fast Snapshot Restore (FSR): Force full initialization of snapshot to have no latency on the first use ($$$). Recommended for use-case where we need consistent high I/O performance from start
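CLI sketch for snapshotting a volume and copying the snapshot to another Region (volume/snapshot IDs and Regions are placeholders):
#aws ec2 create-snapshot --volume-id vol-0123456789abcdef0 --description "before upgrade"
#aws ec2 copy-snapshot --source-snapshot-id snap-0abc12345def67890 --source-region us-east-1 --region eu-west-1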
EFS Performance modes:
- General Purpose mode
- Max I/O mode: It is used to scale to higher levels of aggregate throughput and operations per second. Highly parallelized applications and workloads, such as big data analysis, media processing, and genomic analysis, can benefit from this mode
EFS Throughput modes:
- Provisioned Throughput mode: We can instantly provision the throughput of our file system (in MiB/s) independent of the amount of data stored.
- Bursting Throughput mode: Throughput on Amazon EFS scales as the size of our file system in the standard storage class grows
RAID configuration options: A RAID array uses multiple EBS volumes to improve performance or redundancy
- RAID 0 when I/O performance is more important than fault tolerance
- RAID 1 when fault tolerance is more important than I/O performance
General Purpose SSD:
- Cost effective storage, low-latency
- System boot volumes, Virtual desktops, Development and test environments
- 1 GiB – 16 TiB
- gp3:
- Baseline of 3,000 IOPS and throughput of 125 MiB/s
- Can increase IOPS up to 16,000 and throughput up to 1000 MiB/s independently
- gp2:
- Small gp2 volumes can burst IOPS to 3,000
- Size of the volume and IOPS are linked, max IOPS is 16,000
- 3 IOPS per GB, means at 5,334 GB we are at the max IOPS
Provisioned IOPS (PIOPS) SSD:
- Critical business applications with sustained IOPS performance
- Or applications that need more than 16,000 IOPS
- Great for databases workloads (sensitive to storage perf and consistency)
- io1/io2 (4 GiB – 16 TiB):
- Max PIOPS: 64,000 for Nitro EC2 instances & 32,000 for other
- Can increase PIOPS independently from storage size
- io2 have more durability and more IOPS per GiB (at the same price as io1)
- io2 Block Express (4 GiB – 64 TiB):
- Sub-millisecond latency
- Max PIOPS: 256,000 with an IOPS:GiB ratio of 1,000:1
- Supports EBS Multi-attach
EC2 Hibernate: To enable EC2 Hibernate, the EC2 Instance Root Volume type must be an EBS volume and must be encrypted to ensure the protection of sensitive content.
- Instance RAM Size – must be less than 150 GB.
- Available for On-Demand, Reserved and Spot Instances
- An instance can NOT be hibernated more than 60 days
Elastic Fabric Adapter (EFA):
- Improved ENA (Elastic Network Adapter) for HPC, only works for Linux
- Great for inter-node communications, tightly coupled workloads
- Leverages Message Passing Interface (MPI) standard
- Bypasses the underlying Linux OS to provide low-latency, reliable transport
Docker: a software development platform to deploy apps; apps are packaged in containers that can run on any OS. Many containers run on one server, i.e., resources are shared with the host
Docker repository: Docker images are stored here
Docker Hub (https://hub.docker.com)
- Public repository
- Find base images for many technologies or OS (e.g., Ubuntu, MySQL, …)
Amazon ECR (Amazon Elastic Container Registry):
- Store container Images
- Private repository
- Public repository (Amazon ECR Public Gallery https://gallery.ecr.aws)
- Access is controlled through IAM
Amazon Elastic Container Service (Amazon ECS): Amazon’s own container platform
- Each EC2 instance must run the ECS Agent to register in the ECS cluster
- EC2 Launch Type: we must provision & maintain the infrastructure (the EC2 Instances)
- Launching Docker containers on AWS means launching ECS Tasks on ECS Clusters
- AWS takes care of starting / stopping containers
- Tasks can be auto scaled with AWS Application Auto Scaling, based on Target Tracking, Step Scaling, or Scheduled Scaling
AWS Fargate: Amazon’s own Serverless container platform, Works with ECS and with EKS
- We do not provision the infrastructure (no EC2 instances to manage)
- It’s all Serverless
- AWS just runs ECS Tasks for us based on the CPU/RAM we need. It will auto scale as the Tasks increase
Amazon ECS – IAM Roles:
- EC2 Instance Profile (EC2 Launch Type only):
- Used by the ECS agent
- Makes API calls to ECS service
- Send container logs to CloudWatch Logs
- Pull Docker image from ECR
- Reference sensitive data in Secrets Manager or SSM Parameter Store
- ECS Task Role:
- Allows each task to have a specific role
- Use different roles for the different ECS Services we run
- Task Role is defined in the task definition
- Can specify IAM role through taskRoleArn so that ECS task can assume role to perform operations on S3 and other services
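As a sketch, the task role can be supplied when registering the task definition (the family, role ARN, and container details are placeholders):
#aws ecs register-task-definition --family my-app --task-role-arn arn:aws:iam::123456789012:role/MyTaskRole --container-definitions '[{"name":"app","image":"nginx","memory":256}]'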
Amazon ECS – Load Balancer Integrations:
- Application Load Balancer supported and works for most use cases
- Dynamic port mapping with an Application Load Balancer makes it easier to run multiple tasks on the same Amazon ECS service on an Amazon ECS cluster.
- Network Load Balancer recommended only for high throughput / high performance use cases, or to pair it with AWS Private Link
Amazon ECS – Data Volumes:
- Mount EFS file systems onto ECS tasks
- Works for both EC2 and Fargate launch types
- Tasks running in any AZ will share the same data in the EFS file system
- Fargate + EFS = Serverless
Amazon Elastic Kubernetes Service (Amazon EKS): Amazon’s managed Kubernetes (open source)
- It is a way to launch managed Kubernetes clusters on AWS
- Kubernetes is an open-source system for automatic deployment, scaling and management of containerized (usually Docker) application
- It’s an alternative to ECS, but different API
- EKS supports EC2 if you want to deploy worker nodes or Fargate to deploy serverless containers
- Kubernetes is cloud-agnostic (can be used in any cloud – Azure, GCP…)
- For multiple regions, deploy one EKS cluster per region
- Collect logs and metrics using CloudWatch Container Insights
Use case: if your company is already using Kubernetes on-premises or in another cloud, and wants to migrate to AWS using Kubernetes
AWS App Runner: Fully managed service that makes it easy to deploy web applications and APIs at scale
- Start with your source code or container image
- Configure Settings such as RAM, vCPU, Auto Scaling, Health Check
- Create and Deploy
- Access using the URL
AWS Elastic Beanstalk: Region scoped (platform as a service)
- Elastic Beanstalk is a developer centric view of deploying an application on AWS
- Managed service:
- Automatically handles capacity provisioning, load balancing, scaling, application health monitoring, instance configuration, …
- Just the application code is the responsibility of the developer
- We still have full control over the configuration
- Beanstalk is free but you pay for the underlying instances
- Java, .NET, Node.js, PHP, Ruby, Python, and Go are supported.
- Here, we just select the platform and upload our application code; Beanstalk leverages CloudFormation behind the scenes to deploy the infrastructure needed for our application
- Once application is launched, we will get the DNS name as well
Web Server Environment Tier: Traditional architecture with ELB, ASG with EC2 instances
Worker Environment Tier: an SQS queue is placed instead of an ELB; workers (EC2 instances in an ASG) pull messages from the SQS queue
- Scale based on the number of SQS messages
- Can push messages to SQS queue from another Web Server Tier
Vertical scaling: increasing the size of an instance. Ex, RDS, ElastiCache.. Scale UP
Horizontal scaling: increasing the number of instances
High Availability: goes hand in hand with horizontal scaling, mainly to survive a data center loss
Load Balancers: forward traffic to multiple servers
- Single point of access (DNS)
- SSL termination
- Stickiness with cookies – works with ALB
Elastic Load balancer: AWS Load Balancer (AWS terminology)
- It is integrated with many AWS offerings / services
- EC2, EC2 Auto Scaling Groups, Amazon ECS
- AWS Certificate Manager (ACM), CloudWatch
- Route 53, AWS WAF, AWS Global Accelerator
Health checks are crucial for ELB. These are at the target group level
Target group: It is like a pool where EC2 instances can be added and this target group will be attached to ELB
Rebalancing: When rebalancing, Amazon EC2 Auto Scaling launches new instances before terminating the old ones, so that rebalancing does not compromise the performance or availability of your application.
- However, the scaling activity of Auto Scaling works in a different sequence compared to the rebalancing activity.
- Auto Scaling creates a new scaling activity for terminating the unhealthy instance and then terminates it.
- Later, another scaling activity launches a new instance to replace the terminated instance.
- Auto Scaling automatically handles spot interruptions, deregisters the instance from the load balancer, and initiates steps to launch a replacement instance.
Application Load Balancer: HTTP/1.1, HTTP/2, HTTPS, WebSocket ==> Layer 7
- Supports redirects (http to https)
- Routing to different target groups based on URL path, URL hostname, query strings, and headers
- ALB are a great fit for micro services & container-based application (example: Docker & Amazon ECS)
- Has a port mapping feature to redirect to a dynamic port in ECS
- No UDP
- Cross-zone load balancing is enabled by default, can be disabled at the target group level
Target groups:
- EC2 instances (can be managed by an Auto Scaling Group) – HTTP
- ECS tasks (managed by ECS itself) – HTTP
- Lambda functions – HTTP request is translated into a JSON event
- IP Addresses – must be private IPs
Network Load Balancer: TCP, TLS (secure TCP), UDP ==> Layer 4
- Lower latency than ALB
- NLB has one static IP per AZ, and supports assigning Elastic IP
- Health Checks support the TCP, HTTP and HTTPS Protocols
- Cross-zone load balancing is disabled by default
Target groups:
- EC2 instances
- IP Addresses – must be private IPs
- Application Load Balancer
Note: Only the Network Load Balancer provides both a static DNS name and static IP addresses, while the Application Load Balancer provides a static DNS name but does NOT provide a static IP.
Gateway Load Balancer: IP Protocol ==> Layer 3
- Deploy, scale, and manage a fleet of 3rd party network virtual appliances in AWS.
- Ex, Firewalls, IDS, IPS, deep packet inspection systems etc.
- Combines the following functions: Load Balancer – distributes traffic to your virtual appliances
- Transparent Network Gateway – single entry/exit for all traffic
- Uses the GENEVE protocol on port 6081
- Cross-zone load balancing is disabled by default
Target groups:
- EC2 instances
- IP Addresses – must be private IPs
Cross-zone load balancing: each load balancer node distributes traffic evenly across all registered instances in all AZs
Ex, one AZ has 2 instances, another AZ has 8 instances:
- With cross-zone load balancing, traffic is balanced across instances (10% to each instance)
- Without cross-zone load balancing, traffic is balanced across AZs (50% to each AZ, so instances in the smaller AZ each receive more traffic)
- Enabled by default for ALB (can be disabled at the target group level)
- Disabled by default for NLB and GWLB
Load balancers – SSL Certificates:
- The load balancer uses an X.509 certificate (SSL/TLS server certificate)
- We can manage certificates using ACM (AWS Certificate Manager)
- Our own certificate also can be uploaded
Server Name Indication (SNI): solves the problem of loading multiple SSL certificates onto one web server (to serve multiple websites)
- The client indicates the hostname of the target server in the initial SSL handshake, and the server picks the matching certificate
- Only works for ALB & NLB (newer generation), CloudFront
Connection Draining / Deregistration Delay: stops sending new requests to an EC2 instance that is de-registering (or unhealthy), while giving in-flight requests time to complete
Sticky Sessions: We can use the sticky session feature (also known as session affinity) to enable the load balancer to bind a user’s session to a specific instance. This ensures that all requests from the user during the session are sent to the same instance. Sticky sessions cannot be used to complete in-flight requests made to instances that are de-registering or unhealthy.
Idle Timeout:
- For each request that a client makes through an Elastic Load Balancer, the load balancer maintains two connections
- The front-end connection is between the client and the load balancer
- The back-end connection is between the load balancer and a registered EC2 instance
- The load balancer has a configured “idle timeout” period that applies to its connections
- If no data has been sent or received by the time that the “idle timeout” period elapses, the load balancer closes the connection
- “Idle timeout” cannot be used to complete in-flight requests made to instances that are de-registering or unhealthy.
Auto Scaling Group (ASG): scale out (add), scale in (remove)
- It's free, you only pay for the underlying instances
- Possible to scale based on CloudWatch Alarms which in turn based on a metric like CPU, RequestCountPerTarget, Any custom metric etc.
Target tracking scaling: e.g., keep the average CPU of the instances around 40% (CLI sketch below)
Simple/step scaling: if CloudWatch alarm metric cpu>70%, add 2. if cpu<30, remove 1
Scheduled scaling: automate based on work timings
Predictive scaling: continuously forecast load and schedule scaling ahead, means analyze last 15 days and understand when the traffic is more and less. Accordingly scale as per the analysis timings
Scaling cooldowns: After a scaling activity happens, you are in the cooldown period (default 300 seconds)
- During the cooldown period, the ASG will not launch or terminate additional instances (to allow for metrics to stabilize)
- Advice: Use a ready-to-use AMI to reduce configuration time, so instances serve requests faster, and reduce the cooldown period
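CLI sketch of a target tracking policy that keeps average CPU around 40% (the ASG and policy names are placeholders):
#aws autoscaling put-scaling-policy --auto-scaling-group-name my-asg --policy-name cpu40 --policy-type TargetTrackingScaling --target-tracking-configuration '{"PredefinedMetricSpecification":{"PredefinedMetricType":"ASGAverageCPUUtilization"},"TargetValue":40.0}'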
The services below (Lambda, DynamoDB, API Gateway, Step Functions, etc.) are serverless
AWS Lambda: Virtual Functions, for short executions (15 mins max), Run on-demand, scaling is automated
- Supports many programming languages
- Easy monitoring through AWS CloudWatch
- Easy to get more resources per functions (up to 10GB of RAM)
- Increasing RAM will also improve CPU and network
- By default, Lambda function is launched outside of our own VPC (in an AWS owned VPC), so it cannot access resources in VPC, but public resources like S3, DynamoDB will work
- So we need to Launch Lambda in our VPC by specifying VPC ID, subnets and SGs. So Lambda will create an ENI in our subnets to access internal resources
- Inbound network connections are blocked by AWS Lambda. However, outbound calls are allowed from Lambda.
- If it is launched inside VPC, to access internet, we would need a NAT device added to VPC and a route established from the private subnet to NAT device.
- By default, Lambda allows 1000 concurrent executions across all functions within a region. We can increase this limit by contacting support.
- Lambda also incurs a cold-start delay when a function is invoked after a long idle period. To prevent cold-start delay, we can optionally provision lambda capacity (called as Lambda provisioned concurrency) to keep them ready all the time at an extra cost.
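CLI sketch for setting provisioned concurrency on a function alias (function name, alias, and the value 50 are placeholders):
#aws lambda put-provisioned-concurrency-config --function-name my-func --qualifier prod --provisioned-concurrent-executions 50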
X-Ray: By enabling X-ray tracing, we can identify the time taken for specific API calls and functions in the lambda code. This information can help in addressing the performance issue
Lambda Integrations: Important list
- API Gateway
- Kinesis
- DynamoDB
- S3
- CloudFront
- EventBridge
- CloudWatch logs
- SNS
- SQS
- Cognito
Customization at the Edge: Many modern applications execute some form of the logic at the edge
- Edge Function:
- A code that you write and attach to CloudFront distributions
- Runs close to your users to minimize latency
- CloudFront provides two types: CloudFront Functions & Lambda@Edge
- We don’t have to manage any servers, deployed globally
CloudFront Functions:
- Lightweight functions written in JavaScript
- For high-scale, latency-sensitive CDN customizations
- Sub-ms startup times, millions of requests/second
- Used to change Viewer requests and responses:
- Viewer Request: after CloudFront receives a request from a viewer
- Viewer Response: before CloudFront forwards the response to the viewer
- Native feature of CloudFront (manage code entirely within CloudFront)
Use cases:
- Cache key normalization: Transform request attributes (headers, cookies, query strings, URL) to create an optimal Cache Key
- Header manipulation: Insert/modify/delete HTTP headers in the request or response
- URL rewrites or redirects
- Request authentication & authorization: Create and validate user-generated tokens (e.g., JWT) to allow/deny requests
Lambda@Edge:
- Lambda functions written in NodeJS or Python
- Scales to 1000s of requests/second
- Used to change CloudFront requests and responses:
- Viewer Request – after CloudFront receives a request from a viewer
- Origin Request – before CloudFront forwards the request to the origin
- Origin Response – after CloudFront receives the response from the origin
- Viewer Response – before CloudFront forwards the response to the viewer
- Author your functions in one AWS Region (us-east-1), then CloudFront replicates to its locations
Use cases:
- Longer execution time (several ms)
- Adjustable CPU or memory
- Your code depends on 3rd-party libraries (e.g., AWS SDK to access other AWS services)
- Network access to use external services for processing
- File system access or access to the body of HTTP requests
Lambda with RDS Proxy: If Lambda functions directly access the database, they may open too many connections under high load
RDS Proxy:
- Improve scalability by pooling and sharing DB connections
- Improve availability by reducing failover time by up to 66% and preserving connections
- Improve security by enforcing IAM authentication and storing credentials in Secrets Manager
- The Lambda function must be deployed in our VPC, because RDS Proxy is never publicly accessible
Invoking Lambda from RDS & Aurora:
- Invoke Lambda functions from within your DB instance
- Allows to process data events from within a database
- Supported for RDS for PostgreSQL and Aurora MySQL
- Must allow outbound traffic to your Lambda function from within your DB instance (Public, NAT GW, VPC Endpoints)
- DB instance must have the required permissions to invoke the Lambda function (Lambda Resource-based Policy & IAM Policy)
Amazon DynamoDB: non-relational DB, Key value pairs
- Fully managed, highly available with replication across multiple AZs
- NoSQL database – not a relational database – with transaction support
- Integrated with IAM for security, authorization and administration
- Standard & Infrequent Access (IA) Table Class
- It is made of Tables. Each table has a primary key (must be decided at creation time) and can have many items (rows); each item has attributes (can be added over time, can be null)
- By default, all DynamoDB tables are encrypted under an AWS owned customer master key (CMK), which does not write to CloudTrail logs
DynamoDB capacity Modes:
Provisioned Mode: (default)
- We specify the number of reads/writes per second
- We need to plan capacity beforehand
- Pay for provisioned Read Capacity Units (RCU) & Write Capacity Units (WCU)
- Possibility to add auto-scaling mode for RCU & WCU
On-Demand Mode:
- Read/writes automatically scale up/down with your workloads
- No capacity planning needed
- Pay for what you use, more expensive ($$$)
- Great for unpredictable workloads, steep sudden spikes
- A unit of Write Capacity (WCU) enables you to perform one write per second. Item size for write operations is measured in units of 1 KB, so writing a 4 KB item counts as four write units.
- 1 KB = 1 write per second…. Examples:
- 1 item = 8 KB in size = 8 write capacity units
- 10 items = each 4 KB in size = 40 write capacity units
- 10 items = each 8 KB in size = 80 write capacity units
- A unit of Read Capacity (RCU) enables you to perform one strongly consistent read per second, or two eventually consistent reads per second, with each read transferring up to 4 KB, so reading a 6 KB item counts as two read operations.
- Eventual consistency applies only to READs and gives double the throughput of strongly consistent reads
- 4 KB = 1 read per second (strongly consistent – SC), or 2 reads per second (eventually consistent – EC)…. Examples:
- 1 item = 8 KB in size = 2 RCU in SC or 1 RCU in EC
- 10 items = each 4 KB in size = 10 RCU in SC or 5 RCU in EC
- 10 items = each 8 KB in size = 20 RCU in SC or 10 RCU in EC
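CLI sketch for creating a table in Provisioned Mode (table/key names and the 10 RCU / 40 WCU values are placeholders):
#aws dynamodb create-table --table-name Orders --attribute-definitions AttributeName=OrderId,AttributeType=S --key-schema AttributeName=OrderId,KeyType=HASH --provisioned-throughput ReadCapacityUnits=10,WriteCapacityUnits=40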
DynamoDB Accelerator (DAX):
- Fully-managed, highly available, seamless in-memory cache for DynamoDB
- Help solve read congestion by caching
- Microseconds latency for cached data
- Doesn’t require application logic modification (compatible with existing DynamoDB APIs)
- 5 minutes TTL for cache (default)
DynamoDB Streams: Ordered stream of item-level modifications (create/update/delete) in a table
- Invoke AWS Lambda on changes to your DynamoDB table
DynamoDB Global Tables:
- Make a DynamoDB table accessible with low latency in multiple-regions
- Active-Active replication
- Applications can READ and WRITE to the table in any region
- Must enable DynamoDB Streams as a pre-requisite
DynamoDB – TTL:
- Automatically delete items after an expiry timestamp
- Use cases: reduce stored data by keeping only current items, adhere to regulatory obligations, web session handling etc.
DynamoDB Backups: Recovery process creates a new table
- Continuous backups using point-in-time recovery (PITR) – optionally enabled for the last 35 days; export to S3 in DynamoDB JSON or ION format is possible, and the export can be queried with Athena
- On-demand backups – long term until we delete, configured and managed through AWS Backup
API Gateway: API – Application programming Interface for microservices architecture
- Basically API is an intermediary between two applications that allows them to communicate with each other
- API Gateway takes all the requests from clients and routes them to the specific backend service
Amazon API Gateway: It is an AWS Service for creating, publishing, monitoring Rest APIs, HTTP and WebSocket APIs
- It's serverless, so no infrastructure to manage
- By default, API Gateway throttles requests at 10,000 requests per second; we can contact AWS support to increase this service limit
- Handle Security (authentication and authorization)
- Cache API responses
- Can Invoke Lambda Function, Expose HTTP endpoints in the backend, Expose any AWS service
- API Gateway creates RESTful APIs that enable stateless client-server communication
- API Gateway also creates WebSocket APIs that adhere to the WebSocket protocol, which enables stateful, full-duplex communication between client and server
Amazon API Gateway – Endpoints:
Edge-Optimized (default): For global clients
- Requests are routed through the CloudFront Edge locations (improves latency)
- The API Gateway still lives in only one region
Regional:
- For clients within the same region
- Could manually combine with CloudFront (more control over the caching strategies and the distribution)
Private:
- Can only be accessed from VPC using an interface VPC endpoint (ENI)
- Use a resource policy to define access
API Gateway – Security:
User Authentication through
- IAM Roles (useful for internal applications)
- Cognito (identity for external users – example mobile users)
- Custom Authorizer (your own logic)
Custom Domain Name HTTPS security through integration with ACM
- If using Edge-Optimized endpoint, then the certificate must be in us-east-1
- If using Regional endpoint, the certificate must be in the API Gateway region
- Must setup CNAME or A-alias record in Route 53
ACM – API Gateway: Create a Custom Domain Name in API Gateway
Edge-Optimized (default): For global clients
- Requests are routed through the CloudFront Edge locations (improves latency)
- The API Gateway still lives in only one region
- The TLS Certificate must be in the same region as CloudFront, in us-east-1
- Then setup CNAME or (better) A-Alias record in Route 53
Regional:
- For clients within the same region
- The TLS Certificate must be imported on API Gateway, in the same region as the API Stage
- Then setup CNAME or (better) A-Alias record in Route 53
API Gateway – Caching:
- We can add caching to API calls by provisioning an Amazon API Gateway cache and specifying its size in gigabytes
- Caching allows to cache the endpoint response
- Caching can reduce number of calls to the backend and improve latency of requests to the API
AWS Step Functions: Build serverless visual workflow to orchestrate your Lambda functions
Ex, start –> all steps with yes/no statements –> stop
Synchronous replication products write data to primary storage and the replica simultaneously. As such, the primary copy and the replica should always remain synchronized.
Asynchronous replication products write data to the primary storage first and then copy the data to the replica.
RDS: Entire database and the OS to be managed by AWS
- Can't SSH into the instances, fully managed by AWS
- Storage backed by EBS (gp2 or io1)
- Scales automatically (maximum storage threshold can be set)
- Useful for application with unpredictable workloads
- Supports all RDS database engines (MariaDB, MySQL, PostgreSQL, SQL Server, Oracle)
- Audit Logs can be enabled and sent to CloudWatch Logs for longer retention
- If your workload is unpredictable, we can enable storage autoscaling for an Amazon RDS DB instance. With storage autoscaling enabled, when Amazon RDS detects that it is running out of free database space it automatically scales up the storage
RDS Read Replicas: up to 5 replicas (within AZ, cross AZ or cross region)
- Asynchronous replication (within region: free transfer, cross region: $$$)
- Replicas can be promoted to their own DB
- Applications must update the connection string to leverage read replicas
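CLI sketch for creating a read replica and later promoting it to its own DB (identifiers are placeholders):
#aws rds create-db-instance-read-replica --db-instance-identifier mydb-replica --source-db-instance-identifier mydb
#aws rds promote-read-replica --db-instance-identifier mydb-replica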
RDS Multi AZ (disaster recovery): automatic failover to standby (so only one DNS name)
- Synchronous replication
- Converting RDS from Single-AZ to Multi-AZ is a zero-downtime operation (no need to stop the DB): internally, a snapshot of the source is taken, a standby database is restored from it in another AZ, and synchronization is established between the two
- Any database engine level upgrade for an RDS DB instance with Multi-AZ deployment triggers both the primary and standby DB instances to be upgraded at the same time. This causes downtime until the upgrade is complete
- RDS applies OS updates by performing maintenance on the standby, then promoting the standby to primary, and finally performing maintenance on the old primary, which becomes the new standby
- When primary goes down, the CNAME record will be updated to point to the standby DB
- Amazon RDS supports Cross-Region Automated Backups. Manual snapshots and Read Replicas are also supported across multiple Regions.
RDS Custom: full admin access to the underlying OS and the database
- Managed Oracle and Microsoft SQL Server Database with OS and database customization
- De-activate Automation Mode to perform the customization, better to take a DB snapshot before
- Access to underlying database and OS, so that we can install patches, access EC2 via SSH or session manager, configure settings, enable native features
RDS Backup:
- It supports two methods of backup: automated backups and manual DB snapshots
- Automated backups consist of a full daily backup of the database plus transaction logs, enabling point-in-time restore
- Manual DB snapshots are triggered by the user (see below)
RDS DB snapshots: Amazon RDS creates a storage volume snapshot of DB instance, backing up the entire database
- We can use these snapshots to restore when required. For example, if we only need a DB in a test environment once a month: run the database, take a snapshot when testing is complete, delete the database, and restore from the snapshot next time. This is cost effective
- When DB Instance is stopped, we are charged for provisioned storage (including provisioned IOPS)
- Snapshots are available until customer explicitly deletes them
- Restored DBs will always be a new RDS instance with a new DNS endpoint and you can restore up to the last 5 minutes, and new default SG will be applied
- Older version when restored from the snapshot will be automatically upgraded to currently supported database engine version
RDS Encryption:
- We can enable encryption for RDS DB instance only at the time of creation, but not after
- However, we can add encryption to an unencrypted DB instance by creating a snapshot of the DB instance and then creating an encrypted copy of that snapshot
- We can then restore a DB instance from the encrypted snapshot to get an encrypted copy of the original DB instance
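CLI sketch of the snapshot-copy approach for encrypting an existing DB (identifiers and the KMS key alias are placeholders):
#aws rds create-db-snapshot --db-instance-identifier mydb --db-snapshot-identifier mydb-snap
#aws rds copy-db-snapshot --source-db-snapshot-identifier mydb-snap --target-db-snapshot-identifier mydb-snap-enc --kms-key-id alias/aws/rds
#aws rds restore-db-instance-from-db-snapshot --db-instance-identifier mydb-encrypted --db-snapshot-identifier mydb-snap-enc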
RDS Proxy:
- Fully managed database proxy for RDS
- Allows apps to pool and share DB connections established with the database
- Improving database efficiency by reducing the stress on database resources (e.g., CPU, RAM) and minimize open connections (and timeouts)
- Serverless, autoscaling, highly available (multi-AZ)
- Reduces RDS & Aurora failover time by up to 66%
- Supports RDS (MySQL, PostgreSQL, MariaDB) and Aurora (MySQL, PostgreSQL)
- Enforce IAM Authentication for DB, and securely store credentials in AWS Secrets Manager
- RDS Proxy is never publicly accessible (must be accessed from VPC)
Aurora: Postgres and MySQL are supported
- 6 copies of data across 3 AZs for High Availability. A shared storage volume (with replication + self-healing + auto expanding) spans the AZs; this is where the master writes data
- One copy will act as a master and others act as a read replicas (max 15 replicas with auto scaling feature)
- If the master goes down, any read replica can become the master; failover takes less than 30 seconds
- Storage automatically grows in increments of 10GB, up to 128 TB ==> auto expanding
- Support for cross region replication
- Application will have writer endpoint (DNS) which points to master and Reader endpoint (DNS) which points to read replicas (connection load balancing)
- We can also create custom endpoint and can point to specific read replicas
- Audit Logs can be enabled and sent to CloudWatch Logs for longer retention
Note: Aurora read replica will become master as part of failover. RDS read replica is different
- For Amazon Aurora, each Read Replica is associated with a priority tier (0-15). In the event of a failover, Amazon Aurora will promote the Read Replica that has the highest priority (the lowest numbered tier). If two or more Aurora Replicas share the same priority, then AWS promotes the replica that is largest in size. If two or more Aurora Replicas share the same priority and size, then Amazon Aurora promotes an arbitrary replica in the same promotion tier.
- Ex: Tier-15 (32TB), Tier-1 (16TB), Tier-10 (16TB) → the Tier-1 replica is promoted (lowest tier number wins, regardless of size)
Aurora Serverless: Automated database instantiation and autoscaling based on actual usage
- Client will access proxy fleet that is managed by Aurora and in the backend many Aurora instances will be created based on the workload in a serverless fashion.
Aurora multi-master: For immediate Failover. Every node does Read & Write, instead of only one master node
Aurora Global Database:
- 1 primary region, 5 secondary regions, 16 read replicas per secondary region
- Promoting another region (for disaster recovery) has an RTO of < 1 minute
- Cross region replication takes less than 1 second
RDS and Aurora restore:
- Restoring a RDS / Aurora backup or a snapshot creates a new database
- RDS: store on-prem DB backup in S3 and restore into new RDS instance running MySQL
- Aurora: backup on-prem DB using Percona XtraBackup and store in S3 and restore into new aurora cluster running MySQL
Aurora DB Cloning: Useful to create a “staging” database from a “production” database without impacting the production database
- Creating a clone is faster and more space-efficient than physically copying the data using other techniques such as restoring the snapshot
ElastiCache: key/value pairs, managed Redis or Memcached ==> DB Cache
- Recommended for storing sessions data
- It is a noSQL Database
- Caches are in-memory databases with really high performance, low latency
- Using ElastiCache involves heavy application code changes
- ElastiCache can return responses to queries with sub-millisecond latency
Redis: redundant architecture with replication (highly available)
Memcached: multi-threaded architecture, Multi-node for partitioning of data (sharding)
Sharding: taking a single relational database and spreading it across multiple DB servers; this can be done by splitting a table horizontally or vertically
- Below are a few examples where ElastiCache is used:
- Session store:
- User logs into any of the application
- The application writes the session data into ElastiCache
- The user hits another instance of our application
- The instance retrieves the data and the user is already logged in
- DB Cache:
- Applications queries ElastiCache, if not available, get from RDS and store in ElastiCache.
- Helps relieve load in RDS
- Cache must have an invalidation strategy to make sure only the most current data is used in there
- Redis authentication tokens enable Redis to require a token (password) before allowing clients to execute commands, thereby improving data security.
Elasticache patterns:
Lazy Loading: all the read data is cached, data can become stale in cache
Write Through: Adds or update data in the cache when written to a DB (no stale data)
Session Store: store temporary session data in a cache (using TTL features)
RDS Databases ports:
- PostgreSQL: 5432
- MySQL: 3306
- Oracle RDS: 1521
- MSSQL Server: 1433
- MariaDB: 3306 (same as MySQL)
- Aurora: 5432 (if PostgreSQL compatible) or 3306 (if MySQL compatible)
Identity & Access Management (IAM) is a global service
Should follow principle of least privilege access
Groups contain only users, not other groups ==> 5000 IAM users per AWS account, 1 user can be part of 10 groups max.
- To overcome this, we can make use of IAM roles and federated users
User can be alone or single group or multiple groups
IAM Credentials Report: a report that lists all your account’s users and the status of their various credentials
IAM Access Advisor: Access advisor shows the service permissions granted to a user and when those services were last accessed.
Access keys: access AWS using CLI or SDK, associated with IAM user
How to create access keys: Login with IAM user who has admin access
- Open user
- Go to security credentials
- Create access key under Access Keys
- Download csv file or show and copy Key ID & Secret to notepad
#aws configure –> never run this and enter the values below inside EC2 Instance Connect (anyone with access to the instance could retrieve the keys – use an IAM role instead); on our own machine it prompts for:
- Access key ID: give key ID
- AWS secret access key: give secret
- Default region name: closest one
- Default output format: none
#aws iam list-users -> to verify
AWS CloudShell: a browser-based terminal to issue commands against AWS. It's only available in a few regions (its icon is beside the search bar)
Permissions:
- Users or Groups can be assigned JSON documents called policies
- These policies define permissions of the users
- Apply the least privilege principle, means don’t give more permissions than user needs
IAM Policies (also called identity policies): a set of security statements presented to AWS (grants/denies access to any identity that uses the policy)
- By default, no access (Implicit Deny) is applied to any new user/group
- Can be assigned to groups – which in turn assigns to users in group
- Can also be attached directly to users; policies embedded directly in a single user are called inline policies
- Explicit Deny policy will have preference
Explicit Deny > Explicit Allow > Implicit Deny
IAM Evaluation Logic:
- Decision starts with assumption that the request will be denied
- Then, all the attached policies are evaluated
- The evaluation first looks for an explicit deny in those policies; if one is found, the decision is deny
- If no explicit deny is found, it looks for an explicit allow; if one is found, the decision is allow
- If no allow is found, the decision is deny (implicit deny)
Inline Policies: Unique to each identity, assigned directly to users, editing is hard as we have to change for each user requirement
Managed Policies: Attach to anyone we want, this is applied to set of users/all users/groups
IAM policy structure: Statements consists of
- Sid: an identifier for the statement (optional)
- Effect: whether the statement allows or denies access (Allow, Deny)
- Principal: account/user/role to which this policy is applied (used in resource-based policies)
- Action: list of actions this policy allows or denies
- Resource: list of resources to which the actions apply
Ex: arn:aws:s3:::test/* (object-level permission) => actual contents of the bucket
Ex: arn:aws:s3:::test (bucket-level permission) => the bucket itself
- Condition: conditions for when this policy is in effect (optional)
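A minimal identity policy matching the structure above (policy and bucket names are placeholders; Principal is omitted because identity policies are attached to the identity itself):
#aws iam create-policy --policy-name TestBucketRead --policy-document '{"Version":"2012-10-17","Statement":[{"Sid":"AllowRead","Effect":"Allow","Action":["s3:GetObject"],"Resource":"arn:aws:s3:::test/*"}]}'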
AWS CLI: protected by access keys
- Access keys are generated through the AWS console
- Access keys are secret, don’t share
- CLI is a tool that enables us to interact with AWS services using commands in cmd shell
- Direct access to the public APIs of AWS services
- We can develop scripts to manage resources
- It's open-source: https://github.com/aws/aws-cli
IAM role: it’s like a user but not physical, it is for a service
- Roles can be used as temporary identities; the temporary credentials granted when assuming a role expire (default 1 hour, configurable up to 12 hours)
- Assign permissions to AWS services to perform actions
- Can be attached to various services like EC2, Lambda and others
- IAM Policies can be attached to IAM Role that determines the permissions
- Using IAM roles, it is possible to access cross-account resources
- An instance can have only one role assigned at a time
For ex, EC2 instance wants to perform some action on AWS. Create an IAM role with permissions and assign to EC2
Trust Policy: Who can assume the role (which identities can assume that role)
- If the role gets assumed by something which is allowed to assume the role, then AWS generates temporary credentials (STS) and made available to identity which assumes the role
- Once STS expires, identity needs to reassume the role
Permissions Policy: Normal IAM policies…what actions can be performed
Billing Data access to IAM users:
- Login with root
- Click on My account
- Scroll down to IAM user and Role access to Billing Information
- Edit and Enable “Activate IAM access”
AWS Organizations:
- Allows to manage multiple AWS accounts. Management account and member accounts
- Consolidated billing across all accounts – single payment method
- Enable CloudTrail on all accounts, send logs to central S3 account
- Send CloudWatch Logs to central logging account
- Migrate accounts using the AWS Organizations console. To do this we must have root or IAM access to both the member and management accounts.
Service Control Policies (SCP):
- IAM Policies applied to OUs or Accounts to restrict Users and Roles
- They do not apply to the management account (full admin power)
- Must have an explicit allow (does not allow anything by default – like IAM)
Ex: an OU contains accounts. If the OU's SCP denies Redshift and the account's SCP allows everything, users in the account are still denied Redshift access
- Create a policy and attach to OU or account
- SCPs affect all users and roles in attached accounts, including the root user
- SCPs do not affect service-linked roles
Resource Based Policies: The policies which are related to a resource for cross account access. Ex, Bucket policy for an AWS organization
- aws:PrincipalOrgID can be used in any resource policy to restrict or allow access to accounts that are members of an AWS Organization
- With Resource-based policy, we can grant access to users, roles and other accounts. Group is not considered identity, and we cannot grant access to a group in a resource-based policy.
IAM Role vs Resource Based Policies:
- When you assume a role (user, application or service), you give up your original permissions and take the permissions assigned to the role
- When using a resource-based policy, the principal doesn’t have to give up his permissions
- SNS, SQS, Lambda, CloudWatch logs, API Gateway etc. are for resource based policies
- Kinesis data streams, Systems Manager Run Command, ECS Task is using IAM Role
IAM Permission Boundaries:
- It is supported for users and roles (not groups)
- Advanced feature to use a managed policy to set the maximum permissions an IAM entity can get.
Note: Effective permissions are the intersection of all three (Organizations SCP, IAM Permission Boundaries, Identity-Based Policy)
IAM Identity Center: earlier known as AWS single sign-on
- One login (single sign-on) for all your
- AWS accounts in AWS Organizations
- Business cloud applications (e.g., Salesforce, Box, Microsoft 365, …)
- SAML2.0-enabled applications
- EC2 Windows Instances
- IdP can be IAM Identity center, AD, Okta etc.
- Permission sets (IAM Policies…read only or write access) can be assigned to users, group, OUs
AWS Directory Services:
- AWS Managed Microsoft AD
- Create your own AD in AWS, manage users locally, supports MFA
- Establish “trust” connections (two way trust) with your on-premises AD
- To run directory-aware workloads in the AWS Cloud such as SQL Server-based applications
- AD Connector
- Directory Gateway (proxy) to redirect to on-premises AD, supports MFA
- Users are managed on the on-premises AD
- Simple AD
- AD-compatible managed directory on AWS
- Cannot be joined with on-premises AD
AWS Control Tower:
- Easy way to set up and govern a secure and compliant multi-account AWS environment based on best practices
- AWS Control Tower uses AWS Organizations to create accounts
Guardrails: Provides ongoing governance for your Control Tower environment (AWS Accounts)
- Preventive Guardrail: using SCPs (e.g., Restrict Regions across all your accounts)
- Detective Guardrail: using AWS Config (e.g., identify untagged resources) => trigger SNS and invoke a Lambda function to remediate/add tags
Amazon Cognito: Give outside users an identity to interact with our web and mobile application
Cognito User Pools: acts as an IdP, provide sign-in functionality
- Create a serverless database of users for your web and mobile apps
- Integrates with API Gateway and ALB
Cognito Identity Pools:
- Provide temporary AWS credentials to users so they can access some AWS resources directly or through API Gateway
- Users source can be Cognito User Pools, 3rd party logins, etc.
- Integrate with Cognito User Pools as an identity provider
- IAM policies applied to the credentials are defined in Cognito
- Default IAM roles for authenticated and guest users
AWS STS – Security: Security Token Service
- Temporary, limited privileged credentials for IAM users or federated users
- By default, it is a global service, and all STS requests go to a single endpoint at https://sts.amazonaws.com
- We can optionally send your AWS STS requests to endpoints in any region (can reduce latency)
- The AWS STS API action returns temporary security credentials that consist of:
- An access key, which consists of an access key ID and a secret access key.
- A session token.
- Expiration or duration of validity.
- Users (or an application that the user runs) can use these credentials to access your resources.
- With STS you can request a session token using one of the following APIs:
- AssumeRole – can only be used by IAM users (can be used for MFA).
- AssumeRoleWithSAML – can be used by any user who passes a SAML authentication response that indicates authentication from a known (trusted) identity provider.
- AssumeRoleWithWebIdentity – can be used by any user who passes a web identity token that indicates authentication from a known (trusted) identity provider.
- GetSessionToken – can be used by an IAM user or AWS account root user (can be used for MFA).
- GetFederationToken – can be used by an IAM user or AWS account root user.
- AWS recommends using Cognito for identity federation with Internet identity providers.
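CLI sketch of assuming a role via STS (the role ARN and session name are placeholders); the response contains the temporary access key ID, secret access key, session token, and expiration:
#aws sts assume-role --role-arn arn:aws:iam::123456789012:role/MyRole --role-session-name demo-session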
VPC: It's a virtual network in the cloud. The largest CIDR block allowed is /16 and the smallest is /28 (up to 5 CIDR blocks per VPC)
- The default VPC CIDR is 172.31.0.0/16 (from the 172.16.0.0/12 private range)
- VPCs can contain up to 200 subnets
- A region can contain up to 5 VPCs by default
Default VPC: All AWS accounts will have default VPC, any EC2 instance launched will get public IP, private IP and internet access
Custom VPC: In a non-default VPC instances will be assigned a private but not a public DNS hostname. Public DNS hostname will be assigned depending on the DNS attributes we specify for the VPC and if the instance has a public IPv4 address
Internet Gateway: To connect to internet
- Once created, we need to edit route tables for internet access
- An Internet Gateway serves two purposes
- To provide a target in VPC route tables for internet-routable traffic
- To perform network address translation (NAT) for instances that have been assigned public IPv4 addresses
Egress-only Internet Gateway: similar to a NAT Gateway, but for IPv6; allows outbound connections only. An EC2 instance with IPv6 can also sit in a private subnet
AWS Subnets: public or private where EC2 instances reside
- AWS reserves 5 IP addresses (first 4 & last 1) in each subnet
- These 5 IP addresses are not available for use and can’t be assigned to an EC2 instance
- Example: if CIDR block 10.0.0.0/24, then reserved IP addresses are:
- 10.0.0.0 – Network Address
- 10.0.0.1 – reserved by AWS for the VPC router
- 10.0.0.2 – reserved by AWS for mapping to Amazon-provided DNS
- 10.0.0.3 – reserved by AWS for future use
- 10.0.0.255 – Network Broadcast
Route tables: attach to subnet to route towards next hop
- This will have an immutable (can’t delete) local route entry for the VPC CIDR
Bastion Host: will be in a public subnet. We can use this host to SSH into private instances
NAT Gateway: only in 1 AZ – Allows private subnets indirect access to the internet
- No Security Groups to manage
- Resides in public subnet
- Must create multiple NAT Gateways in multiple AZs for fault-tolerance
NAT Instance: Legacy solution, resides in public subnet, Disable source/destination check
- Internet traffic bandwidth depends on EC2 instance type
- Security Groups to manage
- It supports port forwarding
- It can be used as a Bastion Host
ENI – Elastic network interface:
- Logical component in a VPC that represents a virtual network card
- It’s like an Ethernet interface with a fixed/Elastic private IP used to connect to EC2; a public IP is assigned only to the primary interface (the one created with the EC2 instance)
- We can add more ENIs as required
- We can create ENI independently and attach them on the fly (move them) on EC2 instances for failover
- Bound to a specific availability zone (AZ)
- There is one network interface per load balancer subnet
Security groups: instance level – only allow rules, stateful
- For new SG, Inbound traffic is blocked by default, outbound traffic is allowed by default
- We can allow SG as a source in another SG. For Ex, if EC2 instances are load balanced, then EC2 SG will have source as Load Balancer SG
- We can assign multiple (up to five) security groups to your EC2 instances
- Within the default VPC there is a default security group, which has inbound allow rules (allowing traffic from within the group) and all outbound traffic is allowed
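A minimal sketch of referencing one SG as the source of another (for the ALB -> EC2 example above; both group IDs are hypothetical):
#aws ec2 authorize-security-group-ingress --group-id sg-0ec2app --protocol tcp --port 80 --source-group sg-0albfront
This allows inbound traffic on port 80 to the EC2 SG only from resources that use the ALB SG.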
NACL: subnet level, stateless (return traffic should be allowed)
- Great way to block specific IPs at subnet level
- A VPC automatically comes with a default network ACL which allows all inbound/outbound traffic
- A custom NACL denies all traffic both inbound and outbound by default.
VPC Peering: To connect two VPC’s, no overlapping CIDRs
- Route tables must be updated to connect to another VPC
- It is not transitive; VPC peering must be created between each pair of VPCs
- Max 125 peering connections
VPC Endpoint: AWS Private Network, traffic routed through internal network
- Gateway: S3 and DynamoDB, no SG, its free
- Interface: most AWS services, Provisions an ENI (private IP address) as an entry point (must attach a Security Group)
AWS PrivateLink:
- Most secure and scalable way to expose a service to 1000s of VPC (own or other accounts)
- Does not require VPC peering, IGW, NAT, route tables
- Requires NLB (service/provider VPC) and ENI (customer/consumer VPC)
- If the NLB is in multiple AZ, and ENI in multiple AZ, then the solution is fault tolerant
VPC Interface Endpoint (PrivateLink):
- Only to expose one service from one VPC (provider) to other VPC (consumer)
- Host your own service (either in a VPC, e.g. the provider’s VPC, or on-prem) behind a Network Load Balancer (NLB), expose the NLB as an endpoint service, and that endpoint service can be accessed via the interface endpoint
- The endpoint service NLB can be exposed to multiple consumer VPCs
- NLB to on-prem connectivity is via VPN or Direct Connect (DX)
- So basically we can create interface endpoint either for NLB (custom endpoint service) or any AWS Services like SQS, Kinesis, API Gateway, EC2, ELB, KMS etc.
- VPC Interface Endpoint creates ENI into our subnet, into our VPC (ex, consumer VPC) and through which we can access the endpoint services through AWS PrivateLink
- Generally, we can be either service provider or consumer based on the requirement
- Basically, “Endpoint Service” will be created on provider VPC and “Endpoint” will be created on consumer VPC in VPC section
VPC Flow logs: captures information about IP traffic going into our interfaces.
- Can go to S3 and CloudWatch
- Query VPC flow logs using Athena on S3 or CloudWatch Logs Insights
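Assuming flow logs are delivered to S3 and a table (here called vpc_flow_logs, a hypothetical name) has been defined in Athena, a query for rejected traffic could be started like this (the results bucket is also a placeholder; column names depend on your flow log format and table definition):
#aws athena start-query-execution --query-string "SELECT srcaddr, dstaddr, action, COUNT(*) AS hits FROM vpc_flow_logs WHERE action = 'REJECT' GROUP BY srcaddr, dstaddr, action ORDER BY hits DESC LIMIT 20" --result-configuration OutputLocation=s3://my-athena-results/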
AWS VPN CloudHub:
- Low-cost hub-and-spoke model for primary or secondary network connectivity between different locations (VPN only)
- connect multiple VPN connections on the same VGW, setup dynamic routing and configure route tables
Virtual Interfaces:
- Public VIF: Enables connectivity to all AWS public IP addresses (S3, DynamoDB, Elastic IP etc.)
- Private VIF: Enables the connectivity to resources within VPC using private IPs
- Transit VIF: Enables access to Transit Gateways associated with Direct connect gateways
Direct Connect (DX): private connection from remote network to VPC
- Virtual private gateway is needed on AWS
- Public Virtual Interface: Connects to S3 from AWS direct connect endpoint
- Private Virtual Interface: connects to Virtual Private Gateway (VPG)
path 1 (dedicated): customer <-> AWS direct connect endpoint
path 2 (Hosted): customer <-> partner router <-> AWS direct connect endpoint
Direct Connect Gateway: To setup a Direct Connect to one or more VPC in many different regions (same account)
Customer network <-> AWS Direct connection location (Dedicated or Hosted) <-> DCG <-> VPG (multiple)
<-> is private virtual interface
Site-to-Site VPN:
- Virtual Private Gateway (VPG) – connect to VPN/direct connect termination point
- Customer Gateway
Important step: enable Route Propagation for the Virtual Private Gateway in the route table that is associated with your subnets
Transit Gateway: Transitive peering between thousands of VPC and on-premises
- Works with Direct Connect Gateway, VPN Connections
VPC Traffic Mirroring: Allows us to capture and inspect traffic in VPC
- Capture the traffic from source (ENI) to destination (ENI or NLB)
- Capture all packets or capture the packets of your interest (optionally, truncate packets)
- Source and Target can be in the same VPC or different VPCs (VPC Peering)
VPC Reachability Analyzer: It is a configuration analysis tool that enables us to perform connectivity testing between a source resource and a destination resource in virtual private clouds (VPCs).
- When the destination is reachable, Reachability Analyzer produces hop-by-hop details of the virtual network path between the source and the destination.
- When the destination is not reachable, Reachability Analyzer identifies the blocking component. For Ex, paths can be blocked by configuration issues in a security group, network ACL, route table, or load balancer.
AWS Network Firewall: protects VPC
- It is a stateful, managed, network firewall and intrusion detection and prevention service for VPC
- Rules can be centrally managed cross-account by AWS Firewall Manager to apply to many VPCs
- From Layer3 to Layer7 protection
- Send logs of rule matches to Amazon S3, CloudWatch Logs, Kinesis Data Firehose
AWS Firewall Manager: Allows us to centrally configure and manage firewall rules across your accounts and applications in AWS Organizations. As new applications are created, Firewall Manager makes it easier to bring new applications and resources into compliance by enforcing a common set of security rules.
Services supported by AWS FM:
- AWS WAF
- AWS Network Firewall
- AWS Shield
- Amazon Route53 resolver DNS Firewall
- VPC Security Groups
- Third party firewalls
AWS Resource Access Manager: helps us securely share our resources across AWS accounts, within our organization or organizational units (OUs) in AWS Organizations, and with IAM roles and IAM users for supported resource types. We can use AWS RAM to share resources with other AWS accounts. This eliminates the need to provision and manage resources in every account. When we share a resource with another account, that account is granted access to the resource and any policies and permissions in that account apply to the shared resource.
VPC sharing: (part of Resource Access Manager) allows multiple AWS accounts to create their application resources such as EC2 instances, RDS databases, Redshift clusters, and Lambda functions, into shared and centrally-managed Amazon Virtual Private Clouds (VPCs).
To set this up, the account that owns the VPC (owner) shares one or more subnets with other accounts (participants) that belong to the same organization from AWS Organizations. After a subnet is shared, the participants can view, create, modify, and delete their application resources in the subnets shared with them. Participants cannot view, modify, or delete resources that belong to other participants or the VPC owner.
Route 53: the only AWS service which provides a 100% availability SLA
DNS: domain name to IP address
Local DNS –> Root DNS –> TLD DNS –> SLD DNS server
Domain Registrar: Amazon Route 53, GoDaddy, …
DNS Records: A, AAAA, CNAME, NS, …
Zone File: contains DNS records
Name Server: resolves DNS queries (Authoritative or Non-Authoritative)
Top Level Domain (TLD): .com, .us, .in, .gov, .org, …
Second Level Domain (SLD): amazon.com, google.com,
Hosted Zones: A container for records that define how to route traffic to a domain and its subdomains
Private hosted zone: It is a container for records for a domain that we host in one or more Amazon virtual private clouds (VPCs).
- We create a hosted zone for a domain (such as example.com), and then we create records to tell Amazon Route 53 how we want traffic to be routed for that domain within and among VPCs.
- For each VPC that we want to associate with the Route 53 hosted zone, change the following VPC settings to true:
- enableDnsHostnames
- enableDnsSupport
- DNS hostnames and DNS resolution are required settings for private hosted zones. DNS queries for private hosted zones can be resolved by the Amazon-provided VPC DNS server only. As a result, these options must be enabled for your private hosted zone to work.
- If we use custom DNS domain names defined in a private hosted zone in Amazon Route 53, we must set both the enableDnsHostnames and enableDnsSupport attributes to true.
CNAME: maps the hostname to another hostname. Only for non-root domain
Aliases: Points the hostname to a AWS resource like CloudFront, load balancer etc.
A TXT Record: DNS record can be used to store human-readable information about a server, network, and other accounting data with a host
MX Record: It is used for specifying mail servers in your domain
Pointer (PTR) record: resolves an IP address to a fully-qualified domain name (FQDN) as an opposite to what A record does. PTR records are also called Reverse DNS records. ‘PTR record’ cannot be used to map one domain name to another.
Routing policies:
- Simple – route traffic to a single resource; if multiple IPs are returned, the client chooses one randomly; no health checks
- Weighted – Control the % of the requests that go to each specific resource, health checks are possible (see the CLI sketch after this list)
- Assign a weight of 0 to a record to stop sending traffic to a resource
- If all records have weight of 0, then all records will be returned equally
- Failover – health check is mandatory
- Latency based – least latency, health checks are possible
- Geolocation – Specify location by continent or country, Health checks are possible
- Multi-value answer – up to 8 healthy records, health check is possible
- Geo-proximity (using the Route53 traffic flow feature)
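A hedged sketch of creating one record of a weighted pair with the CLI (the hosted zone ID, domain, and IP are placeholders; a second record with a different SetIdentifier and weight would complete the pair):
#aws route53 change-resource-record-sets --hosted-zone-id Z0ABCDEFEXAMPLE --change-batch file://weighted.json
where weighted.json contains:
{
  "Changes": [{
    "Action": "UPSERT",
    "ResourceRecordSet": {
      "Name": "app.example.com",
      "Type": "A",
      "SetIdentifier": "blue",
      "Weight": 70,
      "TTL": 60,
      "ResourceRecords": [{ "Value": "203.0.113.10" }]
    }
  }]
}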
Route 53 Health Checks: only for public resources, automated DNS failover
- Route 53 health checkers are outside the VPC
- They can’t access private endpoints (private VPC or on-premises resource)
- We can create a CloudWatch Metric and associate a CloudWatch Alarm, then create a Health Check that checks the alarm itself
3rd party Domain registrar with AWS DNS Service:
- Create a Hosted Zone in Route 53
- Update NS Records on 3rd party website to use Route 53 Name Servers
CloudFront: Content Delivery Network (CDN)
- Improves read performance and content is cached at the edge
- Great for static content that must be available everywhere
- The cost of data out per edge location varies
- CloudFront can route to multiple origins based on the content type
- Use field level encryption in CloudFront to protect sensitive data for specific content
- CloudFront edge locations are read/write
- we can use an Origin Access Control (OAC) to restrict access to content in Amazon S3 but not on EC2 or ELB
CloudFront Origins:
S3 Bucket: Edge to S3 will be private AWS network
- Enhanced security with CloudFront Origin Access Control (OAC) + bucket policy
- CloudFront can be used as an ingress (to upload files to S3)
Custom Origin (HTTP):
- Application Load Balancer – allow public IPs of edge location in Security Group
- EC2 instance (public) – allow public IPs of edge location in Security Group
- S3 website (must first enable the bucket as a static S3 website)
- Any HTTP backend you want
- We can set up CloudFront with origin failover for scenarios that require high availability.
- We create an origin group with two origins: a primary and a secondary.
- If the primary origin is unavailable or returns specific HTTP response status codes that indicate a failure, CloudFront automatically switches to the secondary origin.
CloudFront Geo Restriction: Allow or Block based on the country
CloudFront Price Classes:
- Price Class All: all regions – best performance
- Price Class 200: most regions, but excludes the most expensive regions
- Price Class 100: only the least expensive regions
CloudFront – Cache Invalidations:
- In case, we update the backend origin CloudFront doesn’t know about it and will only get the refreshed content after the TTL has expired
- However, we can force an entire or partial cache refresh (thus bypassing the TTL) by performing a CloudFront Invalidation
- You can invalidate all files (*) or a special path (/images/*)
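For example, an invalidation for the paths above might be issued like this (the distribution ID is a placeholder):
#aws cloudfront create-invalidation --distribution-id E1ABCDEXAMPLE --paths "/images/*" "/index.html"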
Note: Use the HTTP Cache-Control header max-age to specify the duration for which the content can be used. For more precise control, we can use the Expires header to specify an expiration timestamp. Invalidation is an expensive operation and should be used only occasionally. One scenario for using invalidation is removing inappropriate content that was distributed accidentally. Fast-changing information can be cached and reused from edge locations. Renaming the page every 15 minutes is an unnecessarily complicated solution.
Customization at the Edge: Many modern applications execute some form of the logic at the edge
- Edge Function:
- A code that you write and attach to CloudFront distributions
- Runs close to your users to minimize latency
- CloudFront provides two types: CloudFront Functions & Lambda@Edge
- We don’t have to manage any servers, deployed globally
CloudFront Functions:
- Lightweight functions written in JavaScript
- For high-scale, latency-sensitive CDN customizations
- Sub-ms startup times, millions of requests/second
- Used to change Viewer requests and responses:
- Viewer Request: after CloudFront receives a request from a viewer
- Viewer Response: before CloudFront forwards the response to the viewer
- Native feature of CloudFront (manage code entirely within CloudFront)
Use cases:
- Cache key normalization: Transform request attributes (headers, cookies, query strings, URL) to create an optimal Cache Key
- Header manipulation: Insert/modify/delete HTTP headers in the request or response
- URL rewrites or redirects
- Request authentication & authorization: Create and validate user-generated tokens (e.g., JWT) to allow/deny requests
Lambda@Edge:
- Lambda functions written in NodeJS or Python
- Scales to 1000s of requests/second
- Used to change CloudFront requests and responses:
- Viewer Request – after CloudFront receives a request from a viewer
- Origin Request – before CloudFront forwards the request to the origin
- Origin Response – after CloudFront receives the response from the origin
- Viewer Response – before CloudFront forwards the response to the viewer
- Author your functions in one AWS Region (us-east-1), then CloudFront replicates to its locations
Use cases:
- Longer execution time (several ms)
- Adjustable CPU or memory
- Your code depends on 3rd-party libraries (e.g., AWS SDK to access other AWS services)
- Network access to use external services for processing
- File system access or access to the body of HTTP requests
CloudFront signed URLs:
- Many companies that distribute content over the internet want to restrict access to documents, business data, media streams, or content that is intended for selected users, for example, users who have paid a fee.
- To securely serve this private content by using CloudFront, we can do the following:
- Require that users access private content by using special CloudFront signed URLs or signed cookies.
- A signed URL includes additional information, for example, expiration date and time, that gives more control over access to the content.
CloudFront signed cookies: Allow us to control who can access our content when we don’t want to change our current URLs or when we want to provide access to multiple restricted files, for example, all of the files in the subscribers area of a website.
Unicast IP: one server holds one IP address
Anycast IP: all servers hold the same IP address and the client is routed to the nearest one
AWS Global Accelerator:
- In general, to access an application hosted in a different country, traffic has to go via multiple hops to reach the servers
- This Accelerator will make use of edge locations to route the traffic to end server via AWS private network
- It will create two global Anycast IP addresses to access the application, and a DNS name (URL) as well. Endpoints can be EC2, Elastic IP, ALB, NLB (public or private)
- We can also migrate up to two /24 IPv4 address ranges and choose which /32 IP addresses to use when we create the accelerator
- When we connect Global Accelerator DNS/URL, Anycast IP send traffic directly to Edge Locations, the Edge locations send the traffic to your application
- Global accelerator does health check as well of our applications
- Security group should have source as two Anycast IPs only.
- No caching here at the edge, all the traffic will go to our application, which means edge acts like a proxy
AWS Global Accelerator vs CloudFront:
- They both use the AWS global network and its edge locations around the world
- Both services integrate with AWS Shield for DDoS protection.
- CloudFront
- Improves performance for both cacheable content (such as images and videos)
- Dynamic content (such as API acceleration and dynamic site delivery)
- Content is served at the edge
- Global Accelerator
- Improves performance for a wide range of applications over TCP or UDP
- Proxying packets at the edge to applications running in one or more AWS Regions.
- Good fit for non-HTTP use cases, such as gaming (UDP), IoT (MQTT), Voice over IP
- Good for HTTP use cases that require static IP addresses
- Good for HTTP use cases that required deterministic, fast regional failover
AWS Snow: offline devices, copy data to devices, send to Amazon, import to S3 and then to Glacier if required (Direct import to Glacier will not work)
Snowcone: Data Migration and Edge Computing
- It’s a small device
- 8TB of HDD or 14TB of SSD storage
- Must provide your own battery / cables
- Can be sent back to AWS offline, or connect it to internet and use AWS DataSync to send data
- AWS DataSync Agent preinstalled in this device
Snowball Edge: Data Migration and Edge Computing, it can do local processing
- Snowball Edge Storage Optimized – 80 TB of HDD capacity for block volume and S3 compatible object storage, 40 vCPUs, 80 GiB of RAM for Edge Computing
- Snowball Edge Compute Optimized – 42 TB of HDD capacity for block volume and S3 compatible object storage, 52 vCPUs, 208 GiB of RAM for Edge Computing
- Both AWS Snowball Edge Storage Optimized and AWS Snowball Edge Compute Optimized offer the storage clustering feature.
Snowmobile: Data Migration via Truck
Snow Family: Usage Process:
- Request Snowball devices from the AWS console for delivery
- Install the snowball client / AWS OpsHub on your servers
- Connect the snowball to your servers and copy files using the client
- Ship back the device when you’re done (goes to the right AWS facility)
- Data will be loaded into an S3 bucket
- Snowball is completely wiped
Edge Computing: process data while you are in transit (even without internet) with snowcone or snowball devices. Can run EC2 Instances & AWS Lambda functions
AWS OpsHub: A software to install on laptop to manage snow family device
- Unlocking and configuring single or clustered devices
- Transferring files
- Launching and managing instances running on Snow Family Devices
- Monitor device metrics (storage capacity, active instances on your device)
- Launch compatible AWS services on your devices (ex: Amazon EC2 instances, AWS DataSync, Network File System (NFS))
Database Migration service:
- Continuous data replication using change data capture (CDC)
- EC2 instance (running DMS) should be created to perform the replication tasks
- It enables us to seamlessly migrate data from supported sources to relational databases, data warehouses, streaming platforms, and other data stores in AWS cloud.
AWS Schema Conversion tool: Convert Database’s Schema from one engine to another
- Use DMS + SCT for continuous replication from one engine to another engine
- Server with SCT installed should be there which will convert the schema
AWS Application Discovery Service:
- Plan migration projects by gathering information about on-premises data centers
- Server utilization data and dependency mapping are important for migrations
Agentless Discovery (AWS Agentless Discovery Connector): VM inventory, configuration, and performance history such as CPU, memory, and disk usage
Agent-based Discovery (AWS Application Discovery Agent): System configuration, system performance, running processes, and details of the network connections between systems
- Resulting data can be viewed within AWS Migration Hub
AWS Migration Hub: provides a single location to track the progress of application migrations across multiple AWS and partner solutions.
AWS Application Migration Service: (replacing AWS Server Migration Service)
- Lift-and-shift (rehost) solution which simplify migrating applications to AWS
- Converts your physical, virtual, and cloud-based servers to run natively on AWS
- Supports wide range of platforms, Operating Systems, and databases
- Minimal downtime, reduced costs
We will install AWS Replication Agent in on-prem and do continuous replication of disks to AWS and cutover at some point of time.
AWS Transfer Family: A fully-managed service for file transfers into and out of Amazon S3 or Amazon EFS using FTP-based protocols => Multi-AZ
- Supported Protocols
- AWS Transfer for FTP (File Transfer Protocol (FTP))
- AWS Transfer for FTPS (File Transfer Protocol over SSL (FTPS))
- AWS Transfer for SFTP (Secure File Transfer Protocol (SFTP))
- Integrate with existing authentication systems (Microsoft Active Directory, LDAP, Okta, Amazon Cognito, custom)
AWS DataSync: Move large amounts of data AWS to AWS, on-prem to AWS, or other cloud to AWS.
- Continuous sync-up
- When sending data from on-prem or another cloud, a DataSync agent should be installed in the on-prem environment
- Can synchronize to S3, EFS, FSx
Storage cloud native options:
- Block – EBS, Instance Store
- File – EFS, FSx
- Object – S3, Glacier
S3 is Simple Storage Service
S3 and Glacier are object storage, not block storage
Bucket (directories) will have objects (files)
Buckets will have globally unique names
The key is composed of prefix + object name
s3://my-bucket/my_folder1/another_folder/my_file.txt
S3 can host static websites and have them accessible on internet
All objects in S3 are ‘Private’ by default
A byte-range request is a perfect way to get the beginning of a file while ensuring we remain efficient during our scan of our S3 bucket.
The default limit is 100 S3 buckets per account. However, we can request an upgrade to create more S3 buckets.
Upload to S3: Max object size is 5TB. If uploading more than 5GB, multi-part upload must be used (recommended for files over 100MB). We can also combine with S3 Transfer Acceleration, which increases transfer speed by transferring the file to an AWS edge location which will forward the data to the S3 bucket in the target region
- There are no S3 data transfer charges when data is transferred in from the internet. Also with S3TA, we pay only for transfers that are accelerated.
- aws s3 cp command automatically does multi-part upload for files over 100MB
- aws s3 sync command uses the CopyObject APIs to copy objects between S3 buckets.
#aws s3 sync s3://EXAMPLE-BUCKET-SOURCE s3://EXAMPLE-BUCKET-TARGET
https://s3-accelerate-speedtest.s3-accelerate.amazonaws.com/en/accelerate-speed-comparsion.html
- By default, all files transferred to an S3 bucket are transferred using a single PUT operation.
- The application can achieve at least 3,500 PUT/COPY/POST/DELETE or 5,500 GET/HEAD requests per second per prefix in a bucket.
- Amazon S3 now provides strong read-after-write consistency, so overwrite PUTs and deletes are no longer eventually consistent
S3 Bucket URL – Static Website: When we configure bucket as a static website, the website is available at the AWS Region-specific website endpoint of the bucket.
- Depending on Region, Amazon S3 website endpoints follow one of these two formats.
s3-website dot (.) Region - http://bucket-name.s3-website.Region.amazonaws.com
s3-website dash (-) Region - http://bucket-name.s3-website-Region.amazonaws.com
S3 Object versioning: can be enabled at bucket level to protect against unintended deletes
- By default it is disabled
- Once enabled, it can only be suspended and can never be disabled
- When working with S3 Versioning, we can optionally add another layer of security by configuring a bucket to enable MFA. When we do this, the bucket owner must include two forms of authentication in any request to delete a version or change the versioning state of the bucket
Amazon S3 – Security:
User-Based:
- IAM Policies – which API calls should be allowed for a specific user from IAM
Resource-Based:
- Bucket Policies – bucket wide rules from the S3 console – allows cross account
- Object Access Control List (ACL) – finer grain (can be disabled)
- Bucket Access Control List (ACL) – less common (can be disabled)
Note: an IAM principal can access an S3 object if
The user IAM permissions ALLOW it OR the resource policy ALLOWS it AND there’s no explicit DENY
Policies example:
- Public access – use bucket policy ==> by default, it is blocked
- IAM User access – IAM policy
- EC2 access – IAM Role
- IAM User access from other AWS account – use bucket policy
IAM policy structure: Statements consists of
- Sid:an identifier for the statement (optional)
- Effect:whether the statement allows or denies access (Allow, Deny)
- Principal:account/user/role to which this policy applied to
- Action:list of actions this policy allows or denies
- Resource:list of resources to which the actions applied to
- Ex: arn:aws:s3:::test/* (object level permission)
- Ex: arn:aws:s3:::test (bucket level permission)
- Condition:conditions for when this policy is in effect (optional)
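A minimal example of this structure as an S3 bucket policy, reusing the hypothetical “test” bucket above and a made-up account ID, granting another account read access to objects:
{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "AllowCrossAccountRead",
    "Effect": "Allow",
    "Principal": { "AWS": "arn:aws:iam::123456789012:root" },
    "Action": ["s3:GetObject"],
    "Resource": "arn:aws:s3:::test/*"
  }]
}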
S3 Replication:
- Only new objects will be replicated. To replicate existing objects, use S3 Batch replication
- Must enable versioning in source and destination buckets
- Cross Region Replication and same region replication works
- Buckets can be in different AWS accounts
- Copying is asynchronous
- Must give proper IAM permissions to S3
- Objects encrypted with SSE-C (customer provided key) are never replicated
- Unencrypted objects and objects encrypted with SSE-S3 are replicated by default
- For objects encrypted with SSE-KMS, we need to enable the option
- Deletes are not replicated
S3 Storage Classes: Can move between classes manually or using S3 Lifecycle configurations
- Amazon S3 Standard – General Purpose, the pricing is $0.023 per GB per month
- Amazon S3 Standard-Infrequent Access (IA) – rapid access
- Amazon S3 One Zone-Infrequent Access – rapid access, but only one AZ
- Amazon S3 Glacier Instant Retrieval – millisecond retrieval, data required once a quarter, min storage 90 days
- Amazon S3 Glacier Flexible Retrieval – Expedited (1 to 5 minutes), Standard (3 to 5 hours), Bulk (5 to 12 hours) – free, min storage 90 days
- Amazon S3 Glacier Deep Archive – for long term, min storage 180 days, Standard (12 hours), Bulk (48 hours)
- Amazon S3 Intelligent Tiering – Small monthly monitoring and auto-tiering fee, no retrieval fee
S3 – Lifecycle rules:
- Transition Actions – configure objects to transition to another storage class
- Expiration actions – configure objects to expire (delete) after some time
- Versioning must be enabled for the lifecycle policy to occur
- The minimum storage duration is 30 days before we can transition objects from S3 Standard to S3 standard-IA or S3 One Zone-IA
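A sketch of a lifecycle configuration (the bucket name and prefix are hypothetical) that transitions objects and finally expires them:
#aws s3api put-bucket-lifecycle-configuration --bucket my-bucket --lifecycle-configuration file://lifecycle.json
where lifecycle.json contains:
{
  "Rules": [{
    "ID": "archive-logs",
    "Filter": { "Prefix": "logs/" },
    "Status": "Enabled",
    "Transitions": [
      { "Days": 30, "StorageClass": "STANDARD_IA" },
      { "Days": 90, "StorageClass": "GLACIER" }
    ],
    "Expiration": { "Days": 365 }
  }]
}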
S3 – Requester pays:
- In general, bucket owners pay for all Amazon S3 storage and data transfer costs associated with their bucket
- With Requester Pays buckets, the requester instead of the bucket owner pays the cost of the request and the data download from the bucket
- Helpful when you want to share large datasets with other accounts
- The requester must be authenticated in AWS (cannot be anonymous)
S3 – Event Notifications:
- Any event (creation, removed, replication etc.) can be sent to SQS or SNS or Lambda or All
- Can create as many “S3 events” as desired
- We can also send all events to Amazon EventBridge which in turn sends to 18 AWS services
S3 Select & Glacier select:
- Retrieve less data using SQL by performing server-side filtering
- Can filter by rows & columns (simple SQL statements)
- Less network transfer, less CPU cost client-side
Ex, I have 10 shops and all the data from the 10 shops comes in a zipped CSV every day. Generally we download, extract and filter the data for a particular shop. With S3 Select, we can execute a simple SQL statement to get data only for that particular shop
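A hedged sketch of that shop query with the CLI (the bucket, key, and the column position used in the WHERE clause are assumptions about the CSV layout):
#aws s3api select-object-content --bucket my-bucket --key shops/2024-01-01.csv.gz --expression "SELECT * FROM s3object s WHERE s._1 = 'shop7'" --expression-type SQL --input-serialization '{"CSV": {}, "CompressionType": "GZIP"}' --output-serialization '{"CSV": {}}' shop7.csv
Only the matching rows are returned, so the client never downloads the full zipped file.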
S3 Batch Operations: Perform bulk operations on existing S3 objects with a single request
S3 – Object Encryption: Force encryption using bucket policies and refuse any API call to PUT an S3 object without encryption headers (SSE-KMS or SSE-C)
Server-side encryption:
- Amazon S3-Managed Keys (SSE-S3) – enabled by default
- Must set header “x-amz-server-side-encryption”: “AES256”
- AWS KMS-Managed Keys (SSE-KMS)
- Must set header “x-amz-server-side-encryption”: “aws:kms”
- Customer-Provided Keys (SSE-C)
- HTTPS must be used
- Encryption key must be provided in HTTP headers, for every HTTP request made
Client-side Encryption:
- AWS KMS-Managed Customer Master Key (CSE-KMS)
- Client-side master Key (CSE-C)
S3 Access logs:
- Any request made to S3, from any account, authorized or denied, will be logged into another S3 bucket (important: do not use the same bucket for logging, it creates a loop)
- That data can be analyzed using data analysis tools
- The target logging bucket must be in the same AWS region
S3 – Pre-signed URLs:
- All objects in S3 are private by default
- Generate pre-signed URLs using the S3 Console, AWS CLI or SDK
- URL Expiration
- S3 Console – 1 min up to 720 mins (12 hours)
- AWS CLI – configure expiration with the --expires-in parameter in seconds (default 3600 secs, max. 604800 secs ~ 168 hours)
- Users given a pre-signed URL inherit the permissions of the user that generated the URL for GET / PUT
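For example, generating a pre-signed GET URL from the CLI (bucket and key are hypothetical):
#aws s3 presign s3://my-bucket/my_file.txt --expires-in 3600
Anyone holding the returned URL can download the object until the 3600-second expiration, with the permissions of the user who generated it.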
S3 Access Point:
Each Access Point gets its own DNS and policy to limit who can access it
- A specific IAM user / group
- One policy per Access Point => Easier to manage than complex bucket policies
Ex, for a group, a policy of R/W access to a specific /HR prefix gives access to the HR folder in the bucket
S3 Object Lambda Access Point: Use AWS Lambda Functions to change the object before it is retrieved by the caller application
- Only one S3 bucket is needed, on top of which we create S3 Access Point and S3 Object Lambda Access Points.
Use Cases:
- Redacting personally identifiable information for analytics or nonproduction environments.
- Converting across data formats, such as converting XML to JSON.
- Resizing and watermarking images on the fly using caller-specific details, such as the user who requested the object.
S3 – CORS: Cross-Origin Resource Sharing
Ex, if we access a website (one.com, the origin) and that website wants to get images from another website (two.com, the cross-origin), the web browser has security built in and will first make a pre-flight request (an OPTIONS request with Host: two.com and Origin: https://www.one.com) to the cross-origin.
If two.com doesn’t allow resource sharing, the request is blocked; otherwise it returns a pre-flight response with CORS headers such as Access-Control-Allow-Origin: one.com and Access-Control-Allow-Methods: GET, PUT, DELETE.
The web browser then makes the GET request to two.com to retrieve the images
- If a client makes a cross-origin request on our S3 bucket, we need to enable the correct CORS headers
- You can allow for a specific origin or for * (all origins)
CORS is a web browser security that allows us to enable images or assets or files being retrieved from one S3 bucket in case the request is originating from another origin.
S3 Object Lock: (versioning must be enabled)
- Adopt a WORM (Write Once Read Many) model
- Block an object version deletion for a specified amount of time
- Retention mode – Compliance:
- Object versions can’t be overwritten or deleted by any user, including the root user
- Objects retention modes can’t be changed, and retention periods can’t be shortened
- Retention mode – Governance:
- Most users can’t overwrite or delete an object version or alter its lock settings
- Some users have special permissions to change the retention or delete the object
- Retention Period: protect the object for a fixed period, it can be extended
- Legal Hold:
- protect the object indefinitely, independent from retention period
- can be freely placed and removed using the s3:PutObjectLegalHold IAM permission
Storage cloud native options:
- Block – EBS, Instance Store
- File – EFS, FSx
- Object – S3, Glacier
Amazon FSx: Provides highly cost-effective, fully managed, shared cloud file storage for Windows and Linux applications
FSx for Windows File Server: for windows, can be mounted on Linux EC2 instances
- Can be accessed from on-premises infrastructure (VPN or Direct Connect)
- Can be configured Multi-AZ (High Availability)
- Data is backed up daily to S3
- Connect Linux instances to the file system by installing the cifs-utils package. The Linux instances can then mount an SMB/CIFS file system.
FSx for Lustre: for Linux, Machine Learning and High Performance Computing (HPC)
- Can be accessed from on-premises infrastructure (VPN or Direct Connect)
- Scratch File system: for temporary, High burst (processing)
- Persistent File system: for Long term, data replication within AZ
- It provides the ability to both process the ‘hot data’ in a parallel and distributed fashion as well as easily store the ‘cold data’ on Amazon S3.
FSx for NetApp ONTAP: File System compatible with NFS, SMB, iSCSI protocol
FSx for OpenZFS: File System compatible with NFS (v3, v4, v4.1, v4.2)
Storage Gateways: Bridge between on-premises data and cloud data. These (virtual) gateways are installed on-prem to connect to AWS
- S3 File Gateway
- FSx File Gateway
- Volume Gateway (Cache and Stored)
- Tape Gateway
S3 File Gateway:
- Configured S3 buckets are accessible using the NFS and SMB protocol
- Most recently used data is cached in the file gateway
- Supports S3 Standard, S3 Standard IA, S3 One Zone IA, S3 Intelligent Tiering
- Transition to S3 Glacier using a Lifecycle Policy
- Bucket access using IAM roles for each File Gateway
- SMB Protocol has integration with Active Directory (AD) for user authentication
FSx File Gateway:
- Native access to Amazon FSx for Windows File Server
- Local cache for frequently accessed data
- Windows native compatibility (SMB, NTFS, Active Directory…)
- Useful for group file shares and home directories
Volume Gateway:
- Block storage using iSCSI protocol backed by S3
- Backed by EBS snapshots which can help restore on-premises volumes!
- Cached volumes: low latency access to most recent data
- Stored volumes: entire dataset is on premise, scheduled backups to S3
- In the cached mode, your primary data is written to S3, while retaining your frequently accessed data locally in a cache for low-latency access.
- In the stored mode, your primary data is stored locally and your entire dataset is available for low-latency access while asynchronously backed up to AWS.
Tape Gateway:
- Some companies have backup processes using physical tapes (!)
- With Tape Gateway, companies use the same processes but, in the cloud
- Virtual Tape Library (VTL) backed by Amazon S3 and Glacier
- Back up data using existing tape-based processes (and iSCSI interface)
- Works with leading backup software vendors
Storage Gateway – Hardware Appliance: Works with all the types of gateways, if there is no virtualization in on-prem
- Can be ordered on Amazon.com
- Can be set up as any type of gateway
Elastic Fabric Adapter (EFA):
- Improved ENA (Elastic Network Adapter) for HPC, only works for Linux
- Great for inter-node communications, tightly coupled workloads
- Leverages Message Passing Interface (MPI) standard
- Bypasses the underlying Linux OS to provide low-latency, reliable transport
These are decoupling solutions
SQS: Queue model
SNS: Pub/Sub model (Publish/Subscribe)
Kinesis: Realtime streaming model
SQS is a fully managed service, and multiple copies of every message are stored redundantly in multiple availability zones in a region. SNS, Kinesis, EventBridge all store data across multiple availability zones in a region
SQS: producers send the messages to an SQS queue and consumers (EC2, servers, Lambda etc.) poll for the messages (up to 10 at a time) from the queue
- Unlimited throughput, unlimited number of messages in queue
- Default retention of messages: 4 days, maximum of 14 days
- Low latency (<10 ms on publish and receive)
- Limitation of 256KB per message sent
- Producers send to SQS using the SDK (SendMessage API); consumers process the message (for example, insert it into RDS) and delete it using the DeleteMessage API
SQS with Auto Scaling group: if SQS length is becoming big, we can use CloudWatch Metric – Queue Length “ApproximateNumberOfMessages” to set certain limit and raise CloudWatch Alarm which is attached to Auto Scaling group which in turn scales EC2 instances
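A rough sketch of the alarm side of that setup (the queue name, threshold, and the action ARN placeholder are all assumptions; the commonly used metric is ApproximateNumberOfMessagesVisible):
#aws cloudwatch put-metric-alarm --alarm-name orders-queue-backlog --namespace AWS/SQS --metric-name ApproximateNumberOfMessagesVisible --dimensions Name=QueueName,Value=orders-queue --statistic Average --period 300 --evaluation-periods 1 --threshold 1000 --comparison-operator GreaterThanThreshold --alarm-actions <scaling-policy-or-sns-topic-arn>
When the backlog stays above the threshold, the alarm action (for example, an Auto Scaling policy) adds consumer instances.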
SQS Message Visibility Timeout:
- After a message is polled by a consumer, it becomes invisible to other consumers
- Visibility timeout is used for handling failures. The default timeout is 30 seconds, and the maximum is 12 hours.
- That means the message has 30 seconds to be processed with the default value
- After the message visibility timeout is over, the message is “visible” in SQS
- If a message is not processed within the visibility timeout, it may be processed twice
- A consumer could call the ChangeMessageVisibility API to get more time
- If visibility timeout is high (hours), and consumer crashes, re-processing will take time
- If visibility timeout is too low (seconds), we may get duplicates
SQS – Long Polling:
- When a consumer requests messages from the queue, it can optionally “wait” for messages to arrive if there are none in the queue
- Long polling decreases the number of API calls made to SQS while increasing the efficiency and reducing the latency of your application
- Using long polling can reduce the cost of using SQS because you can reduce the number of empty receives.
- The wait time can be between 1 sec to 20 sec (20 sec preferable)
- Long Polling is preferable to Short Polling
- Long polling can be enabled at the queue level or at the API level using “WaitTimeSeconds”
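A minimal long-polling receive call (the queue URL is a placeholder):
#aws sqs receive-message --queue-url https://sqs.us-east-1.amazonaws.com/123456789012/orders-queue --wait-time-seconds 20 --max-number-of-messages 10
The call waits up to 20 seconds for messages instead of returning an empty response immediately.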
SQS – FIFO Queue:
- First In First Out (ordering of messages in the queue)
- Limited throughput: 300 msg/s without batching, 3000 msg/s with batching
- Exactly-once send capability (by removing duplicates)
- Messages are processed in order by the consumer
- Content deduplication is supported only in the FIFO queue.
SQS Delay queues: postpone the delivery of new messages to a queue for several seconds, for example, when the consumer application needs additional time to process messages. If we create a delay queue, any messages sent to the queue remain invisible to consumers for the duration of the delay period.
- The default (minimum) delay for a queue is 0 seconds. The maximum is 15 minutes.
SQS as a buffer:
- For Ex, if we want to write all customer transactions into a database, some may get lost when the database is overloaded by a sudden/high number of transactions.
- So we can use an SQS queue as a buffer: the application enqueues transactions to the SQS queue, and another application group polls for transactions and writes them to the database
- Helps during database downtime or upgrades
SQS Dead Letter Queue: The main task of a dead-letter queue is handling message failure.
- It is a standard or FIFO queue that has been specified as a dead-letter queue
- Messages are moved to the dead-letter queue when the ReceiveCount for a message exceeds the maxReceiveCount for a queue.
- Dead-letter queues should not be used with standard queues when your application will keep retrying transmission.
- Dead-letter queues will break the order of messages in FIFO queues.
Amazon SQS temporary queues: Temporary queues help us save development time and deployment costs when using common message patterns such as request-response. We can use the Temporary Queue Client to create high-throughput, cost-effective, application-managed temporary queues.
SNS: To send message to many receivers (subscribers)
- Many AWS services can send data directly to SNS for notifications
- Subscribers can be SQS, Lambda, Kinesis Data Firehose, Emails, sms, HTTPs Endpoints
SNS+SQS – FANOUT: This method is to send event to multiple SQS queues by pushing only once to SNS, which means producer will send message to SNS and SNS will publish to multiple queues and also other AWS services as required
Note: If we want to publish messages from SNS to S3, send them to Kinesis Data Firehose (KDF), which in turn will send them to S3
SNS – FIFO Topic:
- First In First Out (ordering of messages in the queue)
- Can only have SQS FIFO queues as subscribers
- Fanout is possible, multiple SQS FIFO queues will be subscribers
- Limited throughput: 300 msg/s without batching, 3000 msg/s with
- By default, FIFO queues support up to 300 messages per second (300 send, receive, or delete operations per second). When you batch 10 messages per operation (maximum), FIFO queues can support up to 3,000 messages per second.
- The 3000 transactions represent 300 API calls, each with a batch of 10 messages
SNS – Message Filtering: While publishing message to multiple SQS queues, we can filter the data for each queue. For Ex, placed orders to Queue1, cancelled orders to Queue2
- If no filter, all messages will go to Queues
Kinesis: To collect, process and analyze streaming data in real time
- Kinesis Data Streams: capture, process, and store data streams
- Kinesis Data Firehose: load data streams into AWS data stores
- Kinesis Data Analytics: analyze data streams with SQL or Apache Flink
- Kinesis Video Streams: capture, process, and store video streams
Note: Kinesis does not support resource-based policies.
Kinesis Data Streams: capture, process, and store data streams
- A stream is made of n shards; scales automatically in on-demand mode
- Retention between 1 day to 365 days
- Real-time
- Ability to reprocess (replay) data
- Once data is inserted in Kinesis, it can’t be deleted (immutability)
- Data that shares the same partition goes to the same shard (ordering)
- Control access / authorization using IAM policies
- VPC Endpoints available for Kinesis to access within VPC
- It is used for buffering messages and multiple consumers can read the messages anytime
Producers: AWS SDK, Kinesis Producer Library (KPL), Kinesis Agent
Consumers:
- Write your own: Kinesis Client Library (KCL), AWS SDK
- Managed: AWS Lambda, Kinesis Data Firehose, Kinesis Data Analytics,
Note: Send data with partition key. The same key will go to same Shard
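A hedged example of producing a record with the CLI (stream name, partition key, and payload are hypothetical; the --cli-binary-format flag applies to AWS CLI v2):
#aws kinesis put-record --stream-name orders-stream --partition-key customer-42 --data '{"item":"book","qty":1}' --cli-binary-format raw-in-base64-out
All records sent with partition key customer-42 land in the same shard, preserving their order.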
Kinesis Data Streams – Capacity Modes:
Provisioned mode:
- You choose the number of shards provisioned, scale manually or using API
- Each shard gets 1MB/s in (or 1000 records per second)
- Each shard gets 2MB/s out (classic or enhanced fan-out consumer)
- We pay per shard provisioned per hour
On-demand mode:
- No need to provision or manage the capacity
- Default capacity provisioned (4 MB/s in or 4000 records per second)
- Scales automatically based on observed throughput peak during the last 30 days
- Pay per stream per hour & data in/out per GB
Kinesis Data Firehose: load data streams into AWS data stores
- It is serverless
- Destinations are S3, RedShift, OpenSearch, 3rd party partners (splunk, mongoDB etc), custom HTTP endpoint
- Near real-time (buffer time min. 60 sec)
- Automatic scaling
- No data storage
- Doesn’t support replay capability
Amazon MQ: a managed message broker supporting open protocols, mostly used in on-prem environments. This is used in AWS when we migrate to the cloud and we don’t want to re-engineer the application to use SQS and SNS
- Amazon MQ doesn’t scale as much as SQS / SNS
- Amazon MQ runs on servers, can run in Multi-AZ with failover
- Amazon MQ has both queue feature (SQS) and topic features (SNS)
- Protocols – STOMP, MQTT, AMQP
- With AmazonMQ, we can easily migrate applications built on ActiveMQ and RabbitMQ to the cloud.
Amazon Athena:
- Serverless query service to analyze data stored in Amazon S3
- Uses standard SQL language to query the files
- Commonly used with Amazon Quicksight for reporting/dashboards
Amazon Redshift: ideal for running big queries against large datasets
- Redshift is based on PostgreSQL, but it’s used for OLAP – online analytical processing (analytics and data warehousing)
- 10x better performance than other data warehouses, scale to PBs of data
- Columnar storage of data (instead of row based) & parallel query engine
- Pay as you go based on the instances provisioned or can use Reserved Instances for cost savings
Redshift – Snapshot:
- Redshift has “Multi-AZ” mode for some clusters
- Snapshots are point-in-time backups of a cluster, stored internally in S3
- Snapshots are incremental (only what has changed is saved)
- We can restore a snapshot into a new cluster
- You can configure Amazon Redshift to automatically copy snapshots (automated or manual) of a cluster to another AWS Region
Automated: every 8 hours, every 5 GB, or on a schedule. Set retention between 1 to 35 days
Manual: snapshot is retained until you delete it
Redshift Spectrum:
- It can efficiently query and retrieve structured and semi-structured data from files in Amazon S3 without having to load the data into Amazon Redshift tables.
- Must have a Redshift cluster available to start the query
- The query is then submitted to thousands of Redshift Spectrum nodes
- Redshift Spectrum queries employ massive parallelism to execute very fast against large datasets.
Amazon OpenSearch Service: formerly Amazon Elasticsearch Service
- In DynamoDB, queries only exist by primary key or indexes…
- With OpenSearch, you can search any field, even partially matches
- OpenSearch requires a cluster of instances (not serverless)
- Comes with OpenSearch Dashboards (visualization)
Amazon EMR: Elastic MapReduce
- EMR helps creating Hadoop clusters (Big Data) to analyze and process vast amount of data
- The clusters can be made of hundreds of EC2 instances
- EMR takes care of all the provisioning and configuration
- Auto-scaling and integrated with Spot instances
- It is a web service that enables businesses, researchers, data analysts, and developers to easily and cost-effectively process vast amounts of data.
- EMR utilizes a hosted Hadoop framework running on Amazon EC2 and Amazon S3.
Use cases: data processing, machine learning, web indexing, big data…
AWS Glue:
- Managed extract, transform, and load (ETL) service
- Useful to prepare and transform data for analytics
- Fully serverless service
- This is used to convert data into Parquet format (Columnar data format)
- It tracks data that has already been processed during a previous run of an ETL job by persisting state information from the job run
- This persisted state information is called a job bookmark. Job bookmarks help AWS Glue maintain state information and prevent the reprocessing of old data
Glue Data Catalog: which is to catalog data sets.
- So the Glue Data Catalog will run Glue Data Crawlers, which connect to various data sources such as Amazon S3, Amazon RDS, Amazon DynamoDB or a JDBC-compatible database on-prem
- Glue Data Crawler is going to crawl these databases and is going to write all the metadata of tables, columns, data types etc. into the Glue Data Catalog
- And so it will have all the databases, the tables, and the metadata, and that will be leveraged by the Glue jobs to perform ETL
Amazon QuickSight: Serverless machine learning-powered business intelligence service to create interactive dashboards
- In-memory computation using SPICE engine if data is imported into QuickSight
Use cases:
- Business analytics
- Building visualizations
- Perform ad-hoc analysis
- Get business insights using data
QuickSight – Dashboard & Analytics:
- Define Users (standard version) and Groups (enterprise version)
- By default, dashboards in QuickSight aren’t shared with anyone and are only accessible to the owner
- These users & groups only exist within QuickSight, not IAM
- we can share the analysis or the dashboard with Users or Groups
- To share a dashboard, we must first publish it
- After we publish a dashboard, we can share it with other users or groups in our QuickSight account
- Users who see the dashboard can also see the underlying data
Amazon Kinesis: To collect, process and analyze streaming data in real time
- Kinesis Data Streams: capture, process, and store data streams
- Kinesis Data Firehose: load data streams into AWS data stores
- Kinesis Data Analytics: analyze data streams with SQL or Apache Flink
- Kinesis Video Streams: capture, process, and store video streams
Kinesis Data Analytics: analyze data streams with SQL or Apache Flink
SQL Application:
- Real-time analytics on Kinesis Data Streams & Firehose using SQL
- Add reference data from Amazon S3 to enrich streaming data
- Fully managed, no servers to provision
- Automatic scaling
- Pay for actual consumption rate
Output:
- Kinesis Data Streams: create streams out of the real-time analytics queries
- Kinesis Data Firehose: send analytics query results to destinations
Apache Flink:
- Use Flink (Java, Scala or SQL) to process and analyze streaming data
- Run any Apache Flink application on a managed cluster on AWS
- provisioning compute resources, parallel computation, automatic scaling
- application backups (implemented as checkpoints and snapshots)
- Use any Apache Flink programming features
- Flink does not read from Firehose (use Kinesis Analytics for SQL instead)
AWS Lake Formation:
- Data lake = central place to have all your data for analytics purposes
- Multiple teams can access data and perform analytics (based on access control)
- Fully managed service that makes it easy to setup a data lake in days
- Discover, cleanse, transform, and ingest data into your Data Lake
- Combine structured and unstructured data in the data lake
- Out-of-the-box source blueprints: S3, RDS, Relational & NoSQL DB…
- Fine-grained Access Control for your applications (row and column-level)
- Built on top of AWS Glue
Amazon Neptune: It is a fast, reliable, fully managed graph database service that makes it easy to build and run applications that work with highly connected datasets.
- It is highly available, with read replicas, point-in-time recovery, continuous backup to Amazon S3, and replication across Availability Zones.
- It is secure with support for HTTPS encrypted client connections and encryption at rest.
- It is fully managed, so we no longer need to worry about database management tasks such as hardware provisioning, software patching, setup, configuration, or backups.
- It can quickly and easily process large sets of user-profiles and interactions to build social networking applications.
CloudWatch: Provides metrics (CPU Utilization etc.) for every service in AWS to monitor everything that’s happening in our account
- We can also create custom metrics as per requirement instead of using AWS defined metrics
CloudWatch Metric streams:
- Continually stream CloudWatch metrics to a destination of our choice, with near-real-time delivery and low latency
- The destination is Amazon Kinesis Data Firehose (and then any destination)
- Option to filter metrics to only stream a subset of them
- Metric filters can be used to trigger CloudWatch alarms
Basic: Many AWS services offer basic monitoring by publishing a default set of metrics to CloudWatch with no charge to customers. By default, when we start using AWS services, basic monitoring is automatically enabled.
Detailed: Detailed monitoring is offered by only some services. It also incurs charges. To use it for an AWS service, you must choose to activate it.
- Amazon EC2 detailed monitoring provides more frequent metrics, published at one-minute intervals, instead of the five-minute intervals used in Amazon EC2 basic monitoring
CloudWatch Logs:
- We can create a Log group with any name (Ex, ALB) to send the log stream of any application (Ex, ALB)
- Can be sent to
- Amazon S3 (only for exporting CloudWatch logs) –> use KDF to send to S3 for near real time
- Kinesis Data Streams
- Kinesis Data Firehose (KDF)
- AWS Lambda
- OpenSearch (near real time) – allow ingesting, searching and visualization of data
- Can define log expiration policies (never expire, 30 days, etc..) as we are paying for storage on CloudWatch Logs
- CloudWatch Logs Insights can be used to query logs and add queries to CloudWatch Dashboards
For EC2:
- By default, no logs from your EC2 machine will go to CloudWatch
- We need to run a CloudWatch agent on EC2 to push the log files we want
- Make sure IAM permissions are correct
- The CloudWatch log agent can be setup on-premises too
CloudWatch Logs Agent:
- Old version of the agent
- Can only send to CloudWatch Logs
CloudWatch Unified Agent:
- Collect additional system-level metrics such as RAM, processes, swap space etc…
- Collect logs to send to CloudWatch Logs
- Centralized configuration using SSM Parameter Store
CloudWatch Log Sources:
- Sources are SDK, CloudWatch Logs Agent, CloudWatch Unified Agent
- Elastic Beanstalk: collection of logs from application
- ECS: collection from containers
- AWS Lambda: collection from function logs
- VPC Flow Logs: VPC specific logs
- API Gateway: will send all the requests made to the API gateway into CloudWatch logs,
- CloudTrail based on filter
- Route53: Log DNS queries
CloudWatch Subscriptions: Filter that we apply on top of your CloudWatch logs, and then we can send it to a destination.
- We can create 2 subscription filters per log group
CloudWatch Alarms: Alarms are used to trigger notifications for any metric
Alarm States:
- OK
- INSUFFICIENT_DATA
- ALARM
Alarm Targets:
- Stop, Terminate, Reboot, or Recover an EC2 Instance
- Trigger Auto Scaling Action
- Send notification to SNS
CloudWatch Alarms – Composite Alarms: While CloudWatch Alarms are on a single Metric, Composite Alarms are monitoring the states of multiple other alarms using AND/OR conditions
Amazon EventBridge: Event Bus
- Schedule Events – To run jobs (scheduled/cron scripts) on a Lambda function
- Event Pattern – e.g., a root user sign-in event sent to an SNS Topic with Email Notification
- Trigger Lambda functions
- Send SQS/SNS messages
- EventBridge will create a JSON document as per event and send to different kind of destinations
- AWS services will send to Default Event Bus, Partners will send to Partner Event Bus, Custom apps can send to custom Event Bus
- Event buses can be accessed by other AWS accounts using Resource-based Policies
- It is the only event-based service that integrates directly with third-party SaaS partners
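Ex, a scheduled rule that invokes a Lambda function once a day (sketch; the rule name and function ARN are placeholders, and the function must also grant events.amazonaws.com permission to invoke it):
#aws events put-rule --name nightly-job --schedule-expression "rate(1 day)"
#aws events put-targets --rule nightly-job --targets "Id"="1","Arn"="arn:aws:lambda:us-east-1:123456789012:function:nightly-task"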
CloudTrail: Enabled by default. Used for auditing; all API calls are logged
- Get a history of events / API calls made within your AWS Account by:
- Console
- SDK
- CLI
- AWS Services
- Can put logs from CloudTrail into CloudWatch Logs or S3
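Ex, querying the 90-day event history from the CLI for a specific API call (sketch; the event name is just an example):
#aws cloudtrail lookup-events --lookup-attributes AttributeKey=EventName,AttributeValue=DeleteTable --max-results 10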
CloudTrail Events:
Management Events – Operations that are performed on resources in our AWS account
- Can separate Read Events (that don’t modify resources) from Write Events (that may modify resources)
Data Events –
- By default, data events are not logged (because they are high-volume operations)
- Amazon S3 object-level activity (ex: GetObject, DeleteObject, PutObject): can separate Read and Write Events
- AWS Lambda function execution activity (the Invoke API)
CloudTrail Insights Events – To detect unusual activity in our account
- CloudTrail Insights analyzes normal management events to create a baseline
- And then continuously analyzes write events to detect unusual patterns
- Events are stored for 90 days in CloudTrail
- To keep events beyond this period, log them to S3 and use Athena
Ex, a DeleteTable API call on DynamoDB is logged in CloudTrail. API calls also end up as events in Amazon EventBridge, so we can create an SNS alert out of them
AWS Config: Evaluates resource configurations against rules as compliant/non-compliant
- Can use AWS managed Config rules or create custom rules, which are defined as AWS Lambda functions
- Rules can be evaluated and triggered for each config change or based on time interval
- It is a per region service
- We can receive alerts (SNS Notifications) for any changes
- Possibility of storing the configuration data into S3 (analyzed by Athena)
- AWS Config Rules does not prevent actions from happening (no deny)
Config Rules Ex:
- Evaluate if each EBS disk is of type gp2
- Evaluate if each EC2 instance is t2.micro
- Detect resources that are not properly tagged
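Ex, creating the managed rule that checks whether each EC2 instance is t2.micro and then checking compliance (sketch; the rule name is a placeholder; DESIRED_INSTANCE_TYPE is assumed here to be the matching AWS-managed rule identifier):
#aws configservice put-config-rule --config-rule '{"ConfigRuleName":"ec2-is-t2micro","Source":{"Owner":"AWS","SourceIdentifier":"DESIRED_INSTANCE_TYPE"},"InputParameters":"{\"instanceType\":\"t2.micro\"}"}'
#aws configservice describe-compliance-by-config-rule --config-rule-names ec2-is-t2micro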
Config Rules – Remediations:
- Automate remediation of non-compliant resources using SSM Automation Documents
- Use AWS-Managed Automation Documents or create custom Automation Documents that invoke a Lambda function to perform whatever remediation we need
- We can set Remediation Retries if the resource is still non-compliant after auto remediation
Config Rules – Notifications:
- Use EventBridge to trigger notifications when AWS resources are noncompliant
AWS Resource Access Manager: helps us securely share our resources across AWS accounts, within our organization or organizational units (OUs) in AWS Organizations, and with IAM roles and IAM users for supported resource types. We can use AWS RAM to share resources with other AWS accounts. This eliminates the need to provision and manage resources in every account. When we share a resource with another account, that account is granted access to the resource and any policies and permissions in that account apply to the shared resource.
AWS Trusted Advisor: Analyzes AWS accounts and provides recommendations for
- Cost Optimization
- Performance
- Security
- Fault Tolerance
- Service Limits
- Core Checks and recommendations – all customers
- Can enable weekly email notification from the console
Full Trusted Advisor – Available for Business & Enterprise support plans
- Ability to set CloudWatch alarms when reaching limits
- Programmatic Access using AWS Support API
Rekognition: face detection, labeling, celebrity recognition
Transcribe: audio to text (ex: subtitles)
Polly: text to audio
Translate: translations
Lex: build conversational bots – chatbots (the same technology that powers Alexa)
Connect: cloud based virtual contact center
Comprehend: natural language processing; uses machine learning to find insights and relationships in text
Amazon Comprehend Medical: detects and returns useful information in unstructured clinical text.
SageMaker: machine learning for every developer and data scientist
Forecast: build highly accurate forecasts
Kendra: ML-powered search engine
Personalize: real-time personalized recommendations
Textract: detect text and data in documents
Fraud Detector: It can detect online fraud with machine learning. For example, this service can flag suspicious online payments, detect new-account fraud, incorporate additional verification steps, detect account takeovers, and so forth
AWS costs depend mostly on the below:
- Compute
- Storage
- Data transfer
AWS Pricing Calculator: It can be used to estimate costs for the major AWS services including S3 and EC2
AWS Total Cost of Ownership (TCO) Calculator: It asks us to input the components needed to host a solution internally and then provides a cost comparison between hosting locally and in AWS
- The AWS Simple Monthly Calculator calculates the cost of a specified solution in AWS alone
- Trusted Advisor provides suggestions for enhanced security, performance, and cost reduction
- AWS Budgets allows configuring budgets and alerts for when budgets are or may be exceeded
AWS Budgets:
- Used to track cost, usage, or coverage and utilization for Reserved Instances and Savings Plans, across multiple dimensions, such as service, or Cost Categories
- Alerting through event-driven alert notifications for when actual or forecasted cost or usage exceeds budget limit, or when Reserved Instances and Savings Plans coverage or utilization drops below threshold
- Create annual, quarterly, monthly, or even daily budgets depending on business needs.
Cost Explorer: An easy-to-use interface that lets us visualize, understand, and manage AWS costs and usage over time
- Create custom reports that analyze cost and usage data.
- Analyze your data at a high level: total costs and usage across all accounts
- Monthly, hourly, resource level granularity
- By enabling hourly granularity, we can view hourly costs up to the past 14 days (good for in-depth analysis)
- Choose an optimal Savings Plan (to lower prices on your bill)
- Forecast usage up to 12 months based on previous usage
- It shows 6 months of history by default; this can be adjusted. The history can be used to see where costs are accruing and to potentially predict future costs.
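Ex, pulling one month of cost grouped by service via the Cost Explorer API (sketch; the date range is a placeholder):
#aws ce get-cost-and-usage --time-period Start=2024-05-01,End=2024-06-01 --granularity MONTHLY --metrics "UnblendedCost" --group-by Type=DIMENSION,Key=SERVICE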
Cost and Usage Reports:
- Detailed cost and usage data is delivered to an S3 bucket and can be analyzed with Athena, Redshift, or QuickSight
- The interface provides an option to create an S3 bucket, appropriately configured, for report storage
AWS Compute Optimizer: It helps us identify the optimal AWS resource configurations, such as Amazon Elastic Compute Cloud (EC2) instance types, Amazon Elastic Block Store (EBS) volume configurations, and AWS Lambda function memory sizes, using machine learning to analyze historical utilization metrics.
- It provides a set of APIs and a console experience to help us reduce costs and increase workload performance by recommending the optimal AWS resources for AWS workloads.
Server-side encryption:
- Amazon S3-Managed Keys (SSE-S3)
- AWS KMS-Managed Keys (SSE-KMS)
- Customer-Provided Keys (SSE-C)
Client-side Encryption:
- AWS KMS-Managed Customer Master Key (CSE-KMS)
- Client-side master Key (CSE-C)
Symmetric Encryption: Single encryption key that is used to Encrypt and Decrypt
Asymmetric Encryption: Public (Encrypt) and Private Key (Decrypt) pair
AWS KMS (Key Management Service): used to manage encryption keys
- AWS manages encryption keys for us
- Fully integrated with IAM for authorization, define who can access or administer
- Able to audit KMS Key usage using CloudTrail
- KMS Key Encryption also available through API calls (SDK, CLI)
- Supports both Symmetric and Asymmetric (public key is downloadable)
Note: Asymmetric encryption is required for outside users who can’t call the KMS API
KMS Keys:
- AWS Managed Key: free (aws/service-name, example: aws/s3, aws/rds or aws/ebs), automatic rotation every 1 year
- Customer Managed Keys (CMK) created in KMS: $1/month, automatic rotation every 1 year (must be enabled)
- Customer Managed Keys imported (must be 256-bit symmetric key): $1/month, manual rotation
- SSE-KMS provides with an audit trail that shows when CMK was used and by whom
- Plus pay per API call to KMS ($0.03 / 10,000 calls)
Note: Preserve the old keys in KMS. They are required for decrypting data that was encrypted with old keys. If we delete the keys, the existing encrypted data will be inaccessible
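Ex, encrypting and decrypting a small file with a symmetric KMS key from the CLI (sketch; alias/my-key and the file names are placeholders; the CLI returns base64, hence the decode steps):
#aws kms encrypt --key-id alias/my-key --plaintext fileb://secret.txt --output text --query CiphertextBlob | base64 --decode > secret.enc
#aws kms decrypt --ciphertext-blob fileb://secret.enc --output text --query Plaintext | base64 --decode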
KMS Multi-Region Keys: Identical KMS keys across multiple regions
- Encrypt in one region and decrypt in another
- It is not global, it is primary + replicas
- Each Multi-region key is managed independently
AWS CloudHSM: It is an AWS single-tenant security appliance for storing your keys. If we choose this option, the key material is generated by the CloudHSM hardware
- CloudHSM cannot rotate the key material automatically, and we need to rotate them manually
IAM Identity Center: earlier known as AWS single sign-on
- One login (single sign-on) for all your
- AWS accounts in AWS Organizations
- Business cloud applications (e.g., Salesforce, Box, Microsoft 365, …)
- SAML2.0-enabled applications
- EC2 Windows Instances
- IdP can be IAM Identity center, AD, Okta etc.
- Permission sets (IAM Policies…read only or write access) can be assigned to users, group, OUs
AWS Directory Services:
- AWS Managed Microsoft AD
- Create your own AD in AWS, manage users locally, supports MFA
- Establish “trust” connections (two way trust) with your on-premises AD
- AD Connector
- Directory Gateway (proxy) to redirect to on-premises AD, supports MFA
- Users are managed on the on-premises AD
- Simple AD
- AD-compatible managed directory on AWS
- Cannot be joined with on-premises AD
SSM Parameter Store: Provides secure data storage. We can store data such as passwords, database strings, AMI IDs, and license codes as parameter values
- To read contents, we need two policies:
- Read access on Parameter store
- Decrypt access on KMS key that encrypted the parameter
- Free for basic username/password
- Hierarchical structure
- No rotation
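Ex, storing and reading an encrypted parameter from the CLI (sketch; the parameter name and value are placeholders; SecureString values are encrypted with a KMS key):
#aws ssm put-parameter --name /my-app/db-password --value 'S3cr3t!' --type SecureString
#aws ssm get-parameter --name /my-app/db-password --with-decryption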
AWS Secrets Manager: for storing secrets, not free
- Capability to force rotation of secrets every X days
- Automate generation of secrets on rotation (uses Lambda)
- Integration with Amazon RDS (MySQL, PostgreSQL, Aurora)
- Secrets are encrypted using KMS
- Replicate secret across multiple regions and ability to promote a read replica secret to standalone secret
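Ex, creating and retrieving a secret from the CLI (sketch; the secret name and JSON values are placeholders):
#aws secretsmanager create-secret --name prod/db-credentials --secret-string '{"username":"admin","password":"S3cr3t!"}'
#aws secretsmanager get-secret-value --secret-id prod/db-credentials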
AWS Certificate Manager (ACM): Provision, manage, deploy TLS Certificates
- ACM-generated certificates are auto-renewed. We can also import public certificates into ACM, but we have to renew them manually
- ACM sends daily expiration events (starting 45 days prior) to EventBridge
- AWS Config has a managed rule named acm-certificate-expiration-check to check for expiring certificates
- Support for public (free of charge) and private certificates
- Integrations with
- Elastic Load Balancers (CLB, ALB, NLB) – ALB can redirect http to https
- CloudFront Distributions
- APIs on API Gateway
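Ex, requesting a public certificate with DNS validation from the CLI (sketch; the domain names are placeholders; for use with CloudFront the request must be made in us-east-1):
#aws acm request-certificate --domain-name example.com --subject-alternative-names www.example.com --validation-method DNS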
ACM – API Gateway: Create a Custom Domain Name in API Gateway
Edge-Optimized (default): For global clients
- Requests are routed through the CloudFront Edge locations (improves latency)
- The API Gateway still lives in only one region
- The TLS Certificate must be in the same region as CloudFront, in us-east-1
- Then setup CNAME or (better) A-Alias record in Route 53
Regional:
- For clients within the same region
- The TLS Certificate must be imported on API Gateway, in the same region as the API Stage
- Then setup CNAME or (better) A-Alias record in Route 53
AWS Firewall Manager: Allows us to centrally configure and manage firewall rules across your accounts and applications in AWS Organizations. As new applications are created, Firewall Manager makes it easier to bring new applications and resources into compliance by enforcing a common set of security rules.
Services supported by AWS FM:
- AWS WAF
- AWS Network Firewall
- AWS Shield
- Amazon Route53 resolver DNS Firewall
- VPC Security Groups
- Third party firewalls
AWS WAF: Protects web application against Layer7 attacks
- Deploy on
- Application Load Balancer
- API Gateway
- CloudFront
- AppSync GraphQL API
- Cognito User Pool
- Define Web ACL (Web Access Control List) rules; Web ACLs are regional, except for CloudFront (global)
AWS Shield: To protect from DDoS Attack
Standard: Free for every customer
- Provides protection from attacks such as SYN/UDP Floods, Reflection attacks and other layer 3/layer 4 attacks
Advanced: $3,000 per month, 24/7 access to the DDoS Response Team
- Protect against more sophisticated attack on Amazon EC2, Elastic Load Balancing (ELB), Amazon CloudFront, AWS Global Accelerator, and Route 53
- Shield Advanced automatic application layer DDoS mitigation automatically creates, evaluates and deploys AWS WAF rules to mitigate layer 7 attacks
AWS Network Firewall: protects VPC
- It is a stateful, managed, network firewall and intrusion detection and prevention service for VPC
- Rules can be centrally managed cross-account by AWS Firewall Manager to apply to many VPCs
- From Layer3 to Layer7 protection
- Send logs of rule matches to Amazon S3, CloudWatch Logs, Kinesis Data Firehose
Amazon GuardDuty: Intelligent threat discovery to protect the account, one click to enable (30-day trial), no software required
- Input Data Includes:
- CloudTrail Events Logs – unusual API calls, unauthorized deployments
- CloudTrail Management Events – create VPC subnet, create trail etc.
- CloudTrail S3 Data Events – get object, list objects, delete object etc
- VPC Flow Logs – unusual internal traffic, unusual IP address
- DNS Logs – compromised EC2 instances sending encoded data within DNS queries
- Kubernetes Audit Logs – suspicious activities & potential EKS cluster compromises
- Can setup EventBridge rules to notify the findings and send to Lambda or SNS
- Disabling the service in the general settings deletes all remaining data, including findings and configurations, before relinquishing the service permissions and resetting the service
- A CloudWatch Events rule can be used to set up automatic email notifications for Medium to High Severity findings to the email address of our choice
Amazon Inspector: It is for automated security assessments, identifying security vulnerabilities
- It is used for
- EC2: Leveraging AWS Systems Manager (SSM) agent
- Container Images: when they are pushed to Amazon ECR
- Lambda Functions: Function code and package dependencies
- Reporting & integration with AWS Security Hub
- Send findings to Amazon EventBridge
Amazon Macie: It is a fully managed data security and data privacy service that uses machine learning and pattern matching to discover and protect your sensitive data in AWS.
- Macie helps identify and alert you to sensitive data, such as personally identifiable information (PII)
- It will analyze the data in S3 and notify through Amazon EventBridge
Recovery Point Objective (RPO): How much data loss is acceptable (Ex, 1 day or 1 hour). It depends on the last backup taken before the disaster happens
Recovery Time Objective (RTO): How long it takes to recover from the disaster (downtime)
DR strategies:
- Backup and Restore – High RPO
- Pilot Light – A small version of app always running in the cloud
- Warm Standby – Full system is up and running, but at min size, scale when disaster
- Hot Site / Multi Site Approach – Full production scale is running on AWS and on-prem
AWS Backup:
- Centrally manage and automate backups across AWS services
- No need to create custom scripts and manual processes
- Supports cross-region and cross-account backups
- OnDemand and scheduled backups (backup to S3)
- Supports PITR (point in time recovery) for supported services
- It can also take care of retention policies. For ex, take a backup every day and store each backup for the next 35 days.
Supported services:
- Amazon EC2 / Amazon EBS
- Amazon S3
- Amazon RDS (all DBs engines) / Amazon Aurora / Amazon DynamoDB
- Amazon DocumentDB / Amazon Neptune
- Amazon EFS / Amazon FSx (Lustre & Windows File Server)
- AWS Storage Gateway (Volume Gateway)
CI/CD: Continuous integration and continuous delivery pipelines on AWS
- A pipeline helps you automate steps in your software delivery process, such as initiating automatic builds and then deploying to Amazon EC2 instances.
- We will use AWS CodePipeline, a service that builds, tests, and deploys our code every time there is a code change, based on the release process models we define.
AWS CloudFormation: Infrastructure as a code
- CloudFormation is a declarative way of outlining the AWS Infrastructure, for any resources (most of them are supported).
- For example, within a CloudFormation template, we say:
- I want a security group
- I want two EC2 instances using this security group
- I want an S3 bucket
- I want a load balancer (ELB) in front of these machines
- Then CloudFormation creates those for us, in the right order, with the exact configuration that we specify
- We can estimate the costs of resources using the CloudFormation template
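Ex, creating a stack from a template with the CLI (sketch; the stack name and template file are placeholders; --capabilities is only needed when the template creates IAM resources):
#aws cloudformation create-stack --stack-name my-stack --template-body file://template.yaml --capabilities CAPABILITY_IAM
#aws cloudformation describe-stacks --stack-name my-stack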
AWS CloudFormation StackSet: Extends the functionality of stacks by enabling us to create, update, or delete stacks across multiple accounts and regions with a single operation.
- A stack set lets us create stacks in AWS accounts across regions by using a single AWS CloudFormation template.
- Using an administrator account of an “AWS Organization”, we can define and manage an AWS CloudFormation template, and use the template as the basis for provisioning stacks into selected target accounts of an “AWS Organization” across specified regions.
AWS Serverless Application Model (AWS SAM): It is an extension of AWS CloudFormation that is used to package, test, and deploy serverless applications.
Amazon SES:
- Fully managed service to send emails securely, globally and at scale
- Allows inbound/outbound emails
- Send emails using any application using AWS Console, APIs, or SMTP
Note: Prefer to use group email address (Distribution list – DL) for root account, so email and alerts that are sent by AWS can reach multiple people
Amazon Pinpoint:
- Scalable 2-way (outbound/inbound) marketing communications service
- Supports email, SMS, push, voice, and in-app messaging
- Scales to billions of messages per day
Use cases: run campaigns by sending marketing, bulk, transactional SMS messages
AWS Systems Manager (SSM):
Session Manager: Allows us to start a secure shell on EC2 and on-premises servers
- No SSH access, bastion hosts, or SSH keys needed
- No port 22 needed (better security)
- Supports Linux, macOS, and Windows
- Send session log data to S3 or CloudWatch Logs
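Ex, opening a shell on an instance through Session Manager (sketch; the instance ID is a placeholder; assumes the SSM Agent is running, the instance has an SSM-enabled instance profile, and the Session Manager plugin is installed alongside the CLI):
#aws ssm start-session --target i-0123456789abcdef0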
Run Command: Allows us to automate common administrative tasks and perform one-time configuration changes at scale
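Ex, running a one-off shell command on an instance with Run Command (sketch; the instance ID and command are placeholders; AWS-RunShellScript is the AWS-provided document for Linux):
#aws ssm send-command --document-name "AWS-RunShellScript" --instance-ids i-0123456789abcdef0 --parameters 'commands=["yum update -y"]'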
Parameter Store: Provides secure data storage. We can store data such as passwords, database strings, AMI IDs, and license codes as parameter values
Patch Manager: Allows installation of patches/updates for OS packages. Not for custom 3rd party application patches (use Run Command)
AWS OpsWorks: AWS OpsWorks is a configuration management service that provides managed instances of Chef and Puppet. Chef and Puppet are automation platforms that allow us to use code to automate the configuration of servers. OpsWorks lets us use Chef and Puppet to automate how servers are configured, deployed, and managed across Amazon EC2 instances or on-premises compute environments.
OpsWorks has three offerings:
- AWS OpsWorks for Chef Automate
- AWS OpsWorks for Puppet Enterprise
- AWS OpsWorks Stacks
Amazon WorkSpaces: Virtual desktops
AWS AppSync: It creates serverless GraphQL and Pub/Sub APIs that simplify application development through a single endpoint to securely query, update, or publish data.
Amazon Elastic Transcoder: It is used to convert media files stored in S3 into media files in the formats required by consumer playback devices (phones etc..)
AWS Batch: Runs batch jobs, i.e., jobs with a start and an end (not continuous)
- Efficiently run 100,000s of computing batch jobs on AWS
- Batch will dynamically launch EC2 instances or Spot Instances
- Batch jobs are defined as Docker images and run on ECS
- AWS Batch provisions the right amount of compute / memory
Batch vs Lambda:
- Time limit: Batch has no time limit / Lambda has a time limit
- Runtimes: Batch runs any runtime as long as it's packaged as a Docker image / Lambda has limited runtimes
- Disk space: Batch relies on EBS / instance store for disk space / Lambda has limited temporary disk space
- Compute: Batch relies on EC2 (can be managed by AWS) / Lambda is serverless
Amazon AppFlow: Fully managed integration service that enables us to securely transfer data between Software-as-a-Service (SaaS) applications and AWS
AWS IoT Core: It is a managed cloud service that lets connected devices easily and securely interact with cloud applications and other devices. It can support billions of devices and trillions of messages, and can process and route those messages to AWS endpoints and to other devices reliably and securely.
Amazon Keyspaces: (for Apache Cassandra), It is a scalable, highly available, and managed Apache Cassandra–compatible database service.
AWS Private 5G: It is a managed service that makes it easy to deploy, operate, and scale your own private cellular network, with all required hardware and software provided by AWS.
AWS License Manager: Makes it easier to manage software licenses from vendors such as Microsoft, SAP, Oracle, and IBM across AWS and on-premises environments. AWS License Manager lets administrators create customized licensing rules that mirror the terms of their licensing agreements.
AWS Wavelength: If the application receives a lot of traffic from mobile devices, we can reduce latency and improve performance by hosting the containers in AWS Wavelength. This is an edge infrastructure managed by AWS, and the EC2 instances are embedded within 5G communication providers. Wavelength is a logical extension of a Region and is managed through the AWS region
AWS Proton: It is an Infrastructure as Code (IaC) deployment workflow tool.
- With Proton, we can provision environments and then configure services running in those environments
- Environments and services are based on environment templates and service templates that we choose in AWS Proton versioned template library