Amazon Aurora:
A New Dawn in the World of RDBMS
Kim Schmidt -> AWS Consultant
DataLeader.io -> AWS Partner & AWS Vendor
Frank La Vigne
frank@dataleader.io
Introduction
 President & CEO of DataLeader.io http://dataleader.io/
 8 Industry Certifications (including Microsoft), Currently Studying for the Amazon Web Services Solutions Architect Associate Exam
 Won the National Windows 7 Incubation Week 9 Months Prior to Release, NAPW “Woman of the Year”, O’Reilly Media Author & Trainer
 Email: kim@dataleader.io
 Twitter: @DataLeader
 Blog: https://awskimschmidt.com/ & https://kimschmidtsbrain.com/
 LinkedIn: https://www.linkedin.com/in/dataleader/
Why Amazon Aurora?
• MySQL/PostgreSQL – ANSI-SQL (connection sketch below)
• Speed & Availability of High-End Commercial Databases with the Simplicity & Cost-Effectiveness of Open-Source Databases
• Highly Secure at Multiple Levels
• At Least as Durable & Fault-Tolerant as Enterprise-Class Database Engines at 1/10 the Cost & No License Needed
• Fully Managed
• Built ON AWS FOR the Cloud “From Scratch”
• Integrates with Other AWS Services
• Infinitely Scalable
• Decoupled Storage, Logging, & Caching from the DB Engine – SOA!
• Asynchronous Scheme for Durable State
• Drastically Reduced Network I/O & Packets Per Second on the Network
• 6 Copies of Data Across Locations
• Re-Engineered Thread Pooling
• Over 0.5M SELECTs/sec & 100K writes/sec
• 6 Million INSERTs/min & 30M SELECTs/min
• Further Scales with Up To 15 Read Replicas
• Automatically Grows Storage Up To 64 TB
• THE LOG IS THE DATABASE! Almost Instant Crash Recovery with No Data Loss
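Because Aurora speaks the MySQL wire protocol, existing drivers and tools connect unchanged. A minimal sketch, assuming a MySQL-compatible cluster and the PyMySQL driver; the endpoint, credentials, and database name are placeholders:

```python
# Minimal sketch, assuming a MySQL-compatible Aurora cluster already exists.
# The endpoint, credentials, and database name below are placeholders.
import pymysql

conn = pymysql.connect(
    host="mycluster.cluster-abc123xyz.us-west-2.rds.amazonaws.com",  # cluster (writer) endpoint
    user="admin",
    password="example-password",
    database="appdb",
    port=3306,
)
try:
    with conn.cursor() as cur:
        cur.execute("SELECT VERSION(), NOW()")  # ordinary MySQL SQL works unchanged
        print(cur.fetchone())
finally:
    conn.close()
```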
What? No Maintenance?
MANAGED BY AWS: HIGH AVAILABILITY, SCALABILITY, OS INSTALLS, OS PATCHING, POWER, HVAC, NETWORK, RACKING, STACKING, MAINTENANCE, DB SW INSTALLS, DB SW PATCHES, DB BACKUPS
MANAGED BY YOU: APP OPTIMIZATION
Fully Managed Means This is YOU!
Amazon Aurora
Storage Cluster Volume Diagram
What Why How When Where, Say What???
3 Significant Architectural Advantages:
 Storage: an independent, fault-tolerant, self-healing SERVICE across data centers
 Network IOPS reduced by writing only redo log records to storage
 Backup & redo recovery are continuous & asynchronous, with compute & memory spread across a large distributed fleet of Aurora instances
Amazon Aurora Architecture 101
Diagram labels: Database Engine; Storage, Logging (& Caching); Continual Backups; Amazon RDS; Storage; Control Plane Services Used; Customer VPC; RDS VPC; Storage VPC
Scaling Up
One Amazon Aurora Instance Can Scale from:
1 vCPU & 2GB memory (new small) -> 32 vCPUs & 244GB memory
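The same scale-up can be scripted instead of clicked through the console. A hedged boto3 sketch with placeholder identifiers:

```python
# Hypothetical boto3 sketch: scale an Aurora instance up by changing its
# instance class. The instance identifier and target class are placeholders.
import boto3

rds = boto3.client("rds", region_name="us-west-2")

rds.modify_db_instance(
    DBInstanceIdentifier="aurora-demo-instance-1",
    DBInstanceClass="db.r3.8xlarge",  # largest class at the time: 32 vCPUs / 244 GB
    ApplyImmediately=True,            # otherwise the change waits for the maintenance window
)
```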
Scaling Out
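Scaling out means adding Aurora Read Replicas (up to 15) to the cluster. A hypothetical boto3 sketch with placeholder identifiers; a new instance created with the cluster's identifier joins it as a replica:

```python
# Hypothetical boto3 sketch: add an Aurora Replica by creating a new instance
# inside an existing cluster. All identifiers are placeholders.
import boto3

rds = boto3.client("rds", region_name="us-west-2")

rds.create_db_instance(
    DBInstanceIdentifier="aurora-demo-replica-1",
    DBClusterIdentifier="aurora-demo-cluster",  # joining the cluster makes it a replica
    DBInstanceClass="db.r3.large",
    Engine="aurora",
)
```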
MySQL with Standby I/O
Diagram legend: REDO LOG, BINLOG, DATA, DOUBLE-WRITE, FRM FILES
I/O Traffic in Amazon Aurora Database
Diagram legend: REDO LOG, BINLOG, DATA, DOUBLE-WRITE, FRM FILES
I/O Traffic in Amazon Aurora Storage Node
Throughput, Availability, & Durability
Storage Node Availability / Durability:
 Quorum system for read / write that’s latency-tolerant & doesn’t stall writes
 Peer-to-peer gossip replication to fill in the holes
 Continuous scrubbing of data blocks
 Amazon Aurora’s backup capability enables point-in-time recovery of your database instance, to any second during your established retention period, up to 35 days (see the restore sketch after this list)
 Backups are automatic, incremental, & continuous, have no impact on database performance, & have 99.999999999% durability
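The point-in-time restore described above can also be driven through the API. A hedged boto3 sketch with placeholder cluster names and timestamp:

```python
# Hedged boto3 sketch: restore an Aurora cluster to a chosen second within the
# backup retention window. Cluster names and the timestamp are placeholders.
import boto3
from datetime import datetime, timezone

rds = boto3.client("rds", region_name="us-west-2")

rds.restore_db_cluster_to_point_in_time(
    DBClusterIdentifier="aurora-demo-restored",
    SourceDBClusterIdentifier="aurora-demo-cluster",
    RestoreToTime=datetime(2017, 6, 9, 18, 30, 0, tzinfo=timezone.utc),
)
# The restore recreates the cluster volume only; add DB instances to the new
# cluster (create_db_instance) before applications can connect.
```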
Amazon Aurora’s
Asynchronous Group Commits
Amazon Aurora’s Adaptive Thread Pool
Always-Warm Cache
THE LOG IS THE DATABASE!
DEMO: LAUNCHING AN AMAZON AURORA CLUSTER
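The demo walks through the console, but the same launch can be expressed in code. A rough boto3 equivalent; all identifiers, credentials, and network settings are assumptions for illustration:

```python
# Rough boto3 equivalent of the console demo, under assumed placeholder
# identifiers, credentials, security group, and subnet group.
import boto3

rds = boto3.client("rds", region_name="us-west-2")

# The cluster owns the shared, multi-AZ storage volume.
rds.create_db_cluster(
    DBClusterIdentifier="aurora-demo-cluster",
    Engine="aurora",                              # MySQL-compatible edition
    MasterUsername="admin",
    MasterUserPassword="example-password",
    VpcSecurityGroupIds=["sg-0123456789abcdef0"],
    DBSubnetGroupName="aurora-demo-subnets",      # subnets in at least 2 AZs
    BackupRetentionPeriod=35,
)

# The writer is a separate instance attached to that cluster.
rds.create_db_instance(
    DBInstanceIdentifier="aurora-demo-instance-1",
    DBClusterIdentifier="aurora-demo-cluster",
    DBInstanceClass="db.r3.large",
    Engine="aurora",
)
```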
AWS Database Migration Service
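Assuming source and target endpoints and a replication instance have already been defined, kicking off the migration task looks roughly like this boto3 sketch; the ARNs and table-mapping rule are placeholders:

```python
# Hypothetical boto3 sketch: start a full-load + CDC migration task, assuming
# the source/target endpoints and replication instance already exist.
# Every ARN below is a placeholder.
import json
import boto3

dms = boto3.client("dms", region_name="us-west-2")

table_mappings = {
    "rules": [{
        "rule-type": "selection",
        "rule-id": "1",
        "rule-name": "include-all",
        "object-locator": {"schema-name": "%", "table-name": "%"},
        "rule-action": "include",
    }]
}

dms.create_replication_task(
    ReplicationTaskIdentifier="sqlserver-to-aurora",
    SourceEndpointArn="arn:aws:dms:us-west-2:123456789012:endpoint:SOURCE",
    TargetEndpointArn="arn:aws:dms:us-west-2:123456789012:endpoint:TARGET",
    ReplicationInstanceArn="arn:aws:dms:us-west-2:123456789012:rep:INSTANCE",
    MigrationType="full-load-and-cdc",  # source stays live while changes replicate
    TableMappings=json.dumps(table_mappings),
)
```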
AWS Schema Conversion Tool for
Heterogeneous Data Migration
AWS Schema Conversion Tool: DATA MIGRATION ASSESSMENT REPORT BEING GENERATED
Migrating Heterogeneous
Databases to Amazon Aurora
1. SOURCE DB SCHEMA
2. ACTION ITEMS: CAN’T CONVERT AUTOMATICALLY
3. STATUS OF CURRENT TARGET DB SCHEMA
4. CHOSEN SOURCE SCHEMA ELEMENT DETAILS
5. CHOSEN SCHEMA ELEMENT TARGET SCHEMA DETAILS
Migrating Heterogeneous
Databases to Amazon Aurora
http://bit.ly/heteroMigration
Summary
At the end of this session, you should have learned:
 Amazon Aurora has an SOA where the database engine is decoupled from storage, logging, & cache
 By writing only redo log records to storage, network IOPS are drastically reduced
 Backup is asynchronous, continual, & incremental, happening in the background
 Recovery is near-instantaneous thanks to the warm cache & the absence of checkpointing; it occurs in the background & without data loss
Please Support Our Sponsors
SQL Saturday is made possible with the generous support of these sponsors.
You can support them by opting-in and visiting them in the sponsor area.
EMAIL ME!: kim@dataleader.io
TWEET ME!: @DataLeader
CONNECT WITH ME!:
https://www.linkedin.com/in/dataleader/


Editor's Notes

  • #2 CLICK Laptop Sticker
  • #4 * Question attendees KEEP CLICKING THROUGH FOR ANIMATIONS Since it was released in July 2015, surpassing Redshift, which pioneered cloud-based data warehousing. Not a SQL Server expert & I’m also not an expert on Amazon Aurora. Some of you might know more about Aurora than I do. But I’m here to tell you a story that happened last fall….CLICK Story incl YouTube & Aurora Team & Abdul Sait In a nutshell, then what can in depth Amazon Aurora is: CLICK CLICK CLICK UNTIL LOG IS DB Achieves consensus on durable state across many storage nodes with an asynchronous scheme, avoiding chatty recovery protocols
  • #5 Managed Services Provision only what you need on-demand, pay only for what’s used
  • #6 CLICK
  • #7 A Cluster = 1+ instances & cluster volume that manages data for the instances CLICK Here, 1 cluster so it’s both r/w, pointed to by the blue arrow CLICK Cluster vol = all-SSD virtual db storage vol spanning 3 AZs in whatever region. Recommend to place in diff regions for autom failover Has 2 storage nodes in ea AZ, totaling 6 nodes for high avail, even if you only have 1 instance CLICK Each node has individual segments that have their own redo logs CLICK In this diagram, you have a Primary Cluster & 2 Read Replicas distributed across 3 AZs for durability & availability, & scalability The blue arrow here points to the cluster volume backing up to S3 Notice the 2-way arrows between the storage nodes in each AZ. This is peer-to-peer gossip that we’ll be talking about later
  • #8 3 significant architectural advantages: CLICK Storage is an independent, fault-tolerant, & self-healing SERVICE across many data centers, protecting the db from perf variance & transient or permanent failures at either the nw or storage tier CLICK By only writing redo log records to storage, nw IOPS lower dramatically. Once the new bottleneck was addressed, the Aurora Team was able to aggressively optimize numerous points of contention & obtain significant throughput improvements Moved the complex & critical functionality of backup & redo recovery from 1-time, expensive, multi-phase operations in the database engine to continuous, asynch ops amortized (how much time & memory) across a large distributed fleet of instances = near-instant crash recovery w/o checkpointing & with inexpensive backups that don’t interfere with the foreground process LET ME EXPLAIN
  • #9 Decoupled SOA Architecture Moved Logging & Storage to its own service (like EBS) Storage deployed on cluster of EC2 SSD VMs Caching outside the db process to remain warm in case of db restart CLICK Automatic, continual, incremental backups to Amazon S3 (11 9’s) for no extra charge (*Dropbox uses S3!) CLICK Amazon RDS = agent that monitors cluster health & determines if it needs to fail over or if an instance needs to be replaced CLICK Amazon DynamoDB: NOSQL - persistent storage of cluster & volume configuration, volume metadata & a detailed description of data backed up to S3 CLICK For orchestrating long-running operations (ie restore), Amazon Simple Workflow Service (SWF) is used. CLICK Amazon Route 53 is used in helping to maintain pro-active, automated, & early detection of real & potential problems, before end users are impacted (re-routes to Replicas or creates new instance) CLICK For Security, communication is isolated between the database, apps & storage with VPCs
  • #10 Scaling Up buy a bigger database host Scaling Out = Sharding, additional administration costs The minimum storage capacity for an Amazon Aurora cluster is 10 GB. Based on your usage, your storage will automatically grow in 10 GB increments up to 64 TB with no impact to database performance, and no need to provision this storage increase in advance If you want to scale up immediately, you can scale compute resources allocated to your instance in the AWS Console: the associated memory and CPU resources are modified by changing your instance class. You can scale from an instance with 2 vCPUs with 15 GB memory to an instance with 32 vCPUs and 244 GB memory. Scales up to millions of transactions per minute If you need more than that, you can add up to 15 Read Replicas CLICK You can MODIFY a running instance to Scale Up Can click checkbox to “Apply Immediately” or if you don’t it will happen during your next chosen maintenance window (avail impact for a few min)
  • #11 Scaling out by creating up to 15 AURORA Read Replicas spread across 3 Availability Zones to further scale read capacity, reduce latency, increase availability & durability. You can do this live from the console
  • #12 Here you see a representation of what a MySQL database running on Amazon RDS would look like with Standby. #1: A write to the Primary Instance writes data against EBS, then by #2 mirrored to another EBS for EBS dur & avail, at #3, the same write operation is issued to the standby where again with #4&5 the EBS volumes get data writes DRBD=DISTRIBUTED REPLICATED BLOCK DEVICE – LINUX DIST REPLICATED STORAGE Looking at Observations, if nothing else, I/O at #’s 1, 3 & 5 are sequential & synchronous Note a Performance Test was done: they got 780 thousand write transactions in 30 minutes, and about 7.4 million read/write IO transactions per million transactions. This was only tested on the Primary Instance because the Standby database is part of the Amazon RDS Service and not really relevant to what’s happening on the Primary database instance. (END)
  • #13 Let’s now look at IO traffic on an Amazon Aurora DATABASE One difference is that the only thing the Primary Instance is doing to the storage layer that spans the multiple AZs is sending redo log files to that storage node. It collects them up, aka “boxcarring”, combines them together & sends them in regular chunks to the storage node. The redo log files are small, & batching them up before sending really helps reduce network IO being sent across the wire. This allows the Aurora Cluster to be a lot more TOLERANT of what’s going on from a network standpoint because it’s sending small amounts of data to start with, thus any little hiccups that might occur are less impactful to what’s happening from the Aurora cluster standpoint. Once the log files get to the storage layer, they do 6x more writes than you’d see in MySQL because we’re writing it to all 6 nodes – but it ends up being 9x less network traffic compared to what you’d see in MySQL PRIMARY INSTANCE ONLY Same Benchmarking test: (able to write 28 MILLION transactions, which is) 35x more transactions than what was achieved with MySQL. That works out to about 950K IO operations per 1M transactions – this has 6x the amplification because of the 6 storage nodes. If you broke that down it’s about 158K read/write operations per storage node. That’s about 7.7x less than what it took from a MySQL standpoint.
  • #14 Let’s now look at IO traffic on an Amazon Aurora Storage Node CLICK The Primary Instance is going to send log records to the storage node The Storage Node, BY #1, is going to put that into an in-memory queue CLICK Then BY #2, it’s going to pull off that queue and persist that data & write it to disk = At that point the data is durable on the storage node & it would acknowledge back – notice the acronym “ACK” – to the Primary Instance, & at that point, all interactions with that Primary Instance are DONE! That is the critical path from an Aurora standpoint. Everything else is done asynchronously & can happen independent of communication with the Primary Instance. CLICK BY #3, Once the storage node has its log files, it’s going to start organizing those log files & the records in those log files because things can show up out of sequence or not show up at all sometimes. It’ll find out “do I have everything”, “am I missing any writes”, “is anything out of sequence”? CLICK At this point, by #4, is where all 6 storage nodes begin the peer-to-peer gossip network that helps all 6 nodes talk to each other & resolve conflicts where maybe 1 of the nodes is missing data. The nodes sort all that out & exchange data so that all the nodes have the same amount of data & that all missing data or conflicting data is resolved. CLICK BY #5, Once that’s done they coalesce the log records into new data block versions CLICK BY #6, Then periodically, asynchronously & very frequently those storage nodes will backup the log & block data to Amazon S3, implying this is our new storage used for our database. CLICK BY #7, Storage nodes will also go through garbage collection, looking for old log files & data blocks that have been replaced & get rid of those CLICK BY #8, Finally it will do data scrubbing. It re-reads data blocks independent of requests from the database instance & verifies checksums against those blocks to ensure they’re still good data blocks that haven’t been corrupted through normal disk usage or that they’re dirty now. If they do find a bad data block, they leverage the peer-to-peer network again & they’ll heal themselves across that network so everything is fine again. Some things to note: the input queue is 46x less than in MySQL IO (unamplified, per node). They get the foreground latency path done in the first 2 steps out of the way making everything else asynchronous. The storage tier is multi-tenant, so there are going to be patterns in high usage & low usage during the day on that storage tier, so the team worked to take advantage of the low points to get a lot of these asynchronous jobs done so there’s no negative impact on the customers but still get all of the work done in a decent amount of time.
  • #15 CLICK Amazon Aurora implements a technique called “Quorum” that enforces consistent operation in a distributed system. It’s used as a replica control method and a way to ensure transaction atomicity in the presence of network partitioning without stalling writes. CLICK Peer-to-Peer “gossip replication” fills in holes. The gossip protocol discovers and shares location & state information about the other nodes in the cluster, and is persisted locally by each node. This has been likened to gossip around a water cooler where by the end of the day everyone has heard the latest “gossip” at least once, if not many times CLICK Amazon Aurora has continuous scrubbing of data blocks. Data scrubbing is an error correction technique that uses a background task to periodically inspect main memory or storage for errors, then correct detected errors using redundant data in the form of checksums. This reduces the likelihood that single correctable errors will accumulate CLICK Amazon Aurora’s backup capability enables point-in-time recovery of your database instance, to any second during your established retention period, up to the last 5 minutes. Your automatic retention period can be anywhere up to 35 days, as you can see in this screenshot. Automated backups are stored in Amazon S3 with no impact on database performance
  • #16 Another architectural change done in Amazon Aurora is the way commits happen Traditionally, the way that commits work is somebody does a write, & those writes are collected in a buffer. Once enough time has passed, the buffer will get flushed & written to disk. The problem with this is whoever was the first writer will get a latency penalty. If not enough writes happen, they have to wait for the timeout flush to happen. In Aurora, as soon as the first write happens, IO operations start. Every write gets its own IO, it’s not waiting for anything else. These writes are collected in a buffer & a background job collects them at some point & they’re sent off to the storage node. They’re considered durable when 4 out of the 6 storage nodes acknowledge YES, I’ve got the data & I’ve committed it to disk. Then they look at the last log record number – or log sequence number or LSN – CLICK in this case it’s #47, and the system asks “who below this number needs an acknowledgement?” & they’re going to acknowledge back to all the numbers below that that the write was successful & at that point they consider the database durable up to this last record number and then it advances from that point. There’s a lot more complexity about how Aurora is able to master asynchronous processing other than simply LSN lookup
  • #17 Another architectural change done in Amazon Aurora is they re-engineered the thread pooling. Today with MySQL, every connection gets a thread. And as more connections happen and the database is more heavily used, this can be a problem, leading to performance challenges. TIME (Some of this is solved with MySQL Enterprise Edition where thread groups are used. If it sees a long running connection, it’ll add another thread to accommodate that, but it’s kind of a work-around where it requires careful stall threshold tuning to add threads in the right places so there’s not too many threads which will burden the database or not add enough threads which will delay getting things processed.) So on Amazon Aurora, from a threading standpoint, everything connects via epoll() CLICK Behind epoll() is a Latch-Free Task Queue that has a bunch of threads that aren’t doing any work & are available for work. Because these are independent, threads are able to scale up or down dynamically depending on the amount of pending transactions or connections coming in to epoll(). What’s really cool about this is it’s “aware” of when a transaction is awaiting a commit. While awaiting that commit, the thread can be repurposed & let it go & do other work & have another thread hang around to wait for the acknowledgement of the commits when they happen. This helps threading get the most of what it’s already allocated. This means Amazon Aurora can gracefully handle 5 thousand + concurrent client sessions on the largest Amazon Aurora instance, r3.8xl.
  • #18 You can also cache the writes in memory & when the buffer fills up you flush it out asynchronously. When you cache the writes in memory, you end up with great consistent write performance because you’re writing to memory & not to storage. This also means backups are continuous and incremental. Each new log segment is copied to backup storage as it’s completed. Last bullet point: Instant Crash Recovery + Survivable Cache = Quick & Easy Recovery from DB Failures There’s also what’s referred to as “Multi-Version Concurrency Control” which has to do with the fact that data is only appended, not updated, What that means is that when a client requests a read, the read is going to look up the bit of data via the pointers in the index. This means it’s going to get the most current data at the time the read was requested. Someone could change the data while the client is reading it, but that’s ok because to that client it got the most up-to-date version. If the client then wants to write a value back out, all you need to do is on the write, look to see if the pointer has changed from the value when it was read & if answer is yes, you have a concurrency problem. In Amazon Aurora, you can handle that optimistically as opposed to pessimistically where you’d have to lock that data down during that transaction. Reads get a copy of the data in the exact state it existed when the transaction started with optimistic concurrency. This doesn’t mean you’re going to overwrite what someone else has written, it just means you can choose to do this or you can choose to re-read it & re-write it. This solves a lot of scaling issues with relational databases’ heavy use of locks, which can be burdensome. (END)
  • #19 WHAT THIS LOG-BASED STORAGE REALLY MEANS IS THAT THE DB FILE ITSELF IS THE WRITE-AHEAD LOG. It’s your replay log as you’re writing out, appending these blocks to storage, which is what write-ahead does That means your IO is reduced because you don’t have to do 2 writes now, which is write to the write-ahead log then secondly do what you said you were going to do in the write-ahead log which is how databases work today, and also you get almost instantaneous recovery from failures because the database file is the write ahead file so if the database fails you just restart the database & start reading from where you left off, update the pointers & you’re up by your replay logs as you’re reading from the database.
  • #20 FOR BEN: ON THE NEXT SLIDE/VIDEO #20 CLICK IT, IT’S A VIDEO-ONLY DEMO WITH ANNOTATIONS WHERE I’D STOP TO EXPLAIN. EVERYONE LOVED THIS FORMAT, BECAUSE I CUT OUT ALL THE CLICKS & STUFF THAT TAKES UP TIME
  • #21 CLICK SLIDE, IT’S A VIDEO-ONLY DEMO WITH ANNOTATIONS WHERE I’D STOP TO EXPLAIN. EVERYONE LOVED THIS FORMAT, BECAUSE I CUT OUT ALL THE CLICKS & STUFF THAT TAKES UP TIME Launch Options Regions (& AZs) Aurora DB Engine DB Instance Class is how you scale up – watch “Details” Multiple AZ Deployment VPC: Must Deploy in VPC w at Least 1 Subnet in at Least 2 AZs. RDS Auto-Provisions a New Instance in an AZ that has a VPC Subnet upon a Failover. ALSO Provides options to load balance across AZs if one of the Aza becomes temp unavail VPC Security Group: Rules for Inbound Access, by default no access IAM or DB Authorization: IAM more granular & safer Encryption: KMS Failover Priority Backup Retention Period Enhanced Monitoring: = Metrics from the DB are free via CloudWatch. Enhanced=agent on the instance, useful to see how diff processes or threads on DB uses the CPU, etc) Maintenance Window Launch & Stop: Review Window RDS Dashboard, Instances Tab: New Instance Creating Primary Cluster with a Read Replica Instance Actions: Modify: Scale Up Instance Actions: Create Read Replica: Scale Out Alarms & Recent Events Tab Configuration Details Tab -> Subnets & Security Group DB Cluster Details & Cluster Endpoint SQL Client: Paste Endpoint into “Host or IP Address” Testing Connection Create Table Insert Data into Table -> Better to Use Load from S3 Select * ??? – Where’s the data coming from? * Another query to see how the cast from Better Call Saul rates me on my presentation Back in RDS Dashboard CloudWatch Metrics
  • #22 AWS Database Migration Service helps you migrate databases to AWS easily and securely. The source database remains fully operational during the migration, minimizing downtime to applications that rely on the database. The AWS Database Migration Service can migrate your data to and from most widely used commercial and open-source databases. CLICK With just a few clicks, the migration to Amazon Aurora starts CLICK While your original db stays live CLICK You can even replicate back to your original db
  • #23 For heterogeneous migrations, the AWS Schema Conversion Tool automatically converts the source database schema and a majority of the custom code to a format compatible with the target database. The custom code that the tool converts includes views, stored procedures, and functions. Any code that the tool cannot convert automatically is clearly marked so that you can convert it yourself. CLICK The SCT creates an assessment report upon completion the assessment report view opens, showing the Summary tab. The Summary tab displays the summary information from the database migration assessment report. It shows items that were converted automatically, and items that were not converted automatically. AWS SCT ANALYZES YOUR APP, EXTRACTS THE SQL CODE & CREATES A LOCAL VERSION OF THE CONVERTED SQL FOR YOU TO REVIEW & EDIT. THE TOOL DOESN’T CHANGE THE CODE IN YOUR APP UNTIL YOU’RE READY
  • #24 The assessment report view also includes an Action Items tab. This tab contains a list of items that can't be converted automatically to the database engine of your target Amazon RDS DB instance. If you select an action item from the list, AWS SCT highlights the item from your schema that the action item applies to. The report also contains recommendations for how to manually convert the schema item. CLICK Sample report: Turquoise highlight = “Of the total 179 database storage objects, in the source database, we were able to identify 169 (94%) database storage objects that can be converted automatically or with minimal changes to MySQL” The second line states “10 (6%) database storage objects required 58 medium & 10 significant user actions to complete the conversion” Simple – Actions that can be completed in less than 1 hour. Medium – Actions that are more complex and can be completed in 1 to 4 hours. Significant – Actions that are very complex and take more than 4 hours to complete.
  • #25 YOU can migrate your db for as little as $3/TB AWS DMS supports, as a source, on-premises and Amazon EC2 instance databases for Microsoft SQL Server versions 2005, 2008, 2008R2, 2012, and 2014. The Enterprise, Standard, Workgroup, and Developer editions are supported. The Web and Express editions are not supported.
  • #26 Add benefits, re-evaluate