Amazon Aurora DSQL is a new serverless relational SQL database specifically engineered for transactional (OLTP) workloads like microservices, websites, and mobile apps. It provides scalability for a vast range of applications, from those handling millions of transactions per second down to those needing only tens of requests per day. The goal is to offer a database that seamlessly scales with your business, eliminating the need for disruptive data migrations.
https://www.youtube.com/watch?v=huGmR_mi5dQ
Key Architectural Principles
The architecture is built around several key principles:
- Disaggregation: Aurora DSQL disaggregates the traditional monolithic database architecture into independent, horizontally scalable layers for each core function. This includes a transaction and session router, a compute layer using Firecracker microVMs to securely run the Postgres engine, an adjudicator for isolation, a journal for atomicity and durability, and a storage engine for efficient querying. Each layer scales independently based on the workload, optimizing resource utilization.
- Minimizing Coordination: Scalability in distributed systems hinges on minimizing coordination between components. Aurora DSQL achieves this by avoiding coordination before commit time, allowing reads and writes to occur locally without cross-region communication. This design results in significant performance gains, especially for multi-region deployments.
- Strong Consistency: The database provides strong snapshot isolation, equivalent to Postgres' repeatable read level, with the added benefit of strong consistency. This simplifies application development: developers can rely on every read being accurate and consistent, even in distributed, multi-region environments.
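To make the "no coordination before commit" idea concrete, here is a minimal sketch of a commit-time conflict check under snapshot isolation. It is an illustration only, not the DSQL implementation: the names `Transaction` and `ToyAdjudicator`, the integer logical clock, and the write-write conflict rule are all assumptions chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Transaction:
    start_ts: int                                # snapshot timestamp taken at begin
    writes: dict = field(default_factory=dict)   # buffered locally until commit


class ToyAdjudicator:
    """Hypothetical commit-time conflict checker (not the DSQL implementation)."""

    def __init__(self):
        self.clock = 0             # assumed logical clock, stands in for real time
        self.last_commit_ts = {}   # key -> timestamp of the latest committed write

    def begin(self) -> Transaction:
        # No coordination here: the transaction only records its snapshot time.
        return Transaction(start_ts=self.clock)

    def commit(self, txn: Transaction) -> bool:
        # Reject the commit if any written key was committed by another
        # transaction after this transaction's snapshot was taken.
        for key in txn.writes:
            if self.last_commit_ts.get(key, -1) > txn.start_ts:
                return False
        self.clock += 1
        for key in txn.writes:
            self.last_commit_ts[key] = self.clock
        return True
```

In this sketch, two transactions that start from the same snapshot and write the same key cannot both commit: the first succeeds and the second is rejected, while transactions touching different keys never interact. All the coordination cost is paid once, at commit time.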
Optimizing for Performance and Scalability
Several key design choices contribute to the performance and scalability of Aurora DSQL:
- The Log is the Database: Aurora DSQL applies the "log is the database" concept, using an internal, highly scalable, distributed log service called Journal. This service, which already powers services like S3, DynamoDB, and Kinesis, ensures atomic and durable writes.
- Efficient Storage Engine: The storage engine is optimized for querying data from the journal. It employs techniques like pushdown compute to minimize round trips between the SQL engine and storage, which addresses two challenges of distributed systems: increasing data volumes and latency limitations.
- Multi-Version Concurrency Control: Reads are handled using multi-version concurrency control (MVCC) based on atomic clocks from the AWS Time Sync service. This approach provides consistent snapshots across the database without inter-node communication or locking on the read path, significantly improving read scalability.
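The MVCC read path described above can be sketched as a versioned key-value store in which a read at a given timestamp returns the newest version committed at or before that timestamp. This is a toy model under stated assumptions: `VersionedStore` is a hypothetical name, and plain integers stand in for the clock-derived timestamps.

```python
import bisect


class VersionedStore:
    """Toy multi-version store (hypothetical, for illustration only)."""

    def __init__(self):
        self._ts = {}      # key -> sorted list of commit timestamps
        self._values = {}  # key -> values aligned with the timestamps

    def write(self, key, commit_ts, value):
        # Keep every version; insert so timestamps stay sorted.
        ts_list = self._ts.setdefault(key, [])
        values = self._values.setdefault(key, [])
        i = bisect.bisect_right(ts_list, commit_ts)
        ts_list.insert(i, commit_ts)
        values.insert(i, value)

    def read(self, key, snapshot_ts):
        # Newest version with commit_ts <= snapshot_ts, or None if none exists.
        ts_list = self._ts.get(key, [])
        i = bisect.bisect_right(ts_list, snapshot_ts)
        return self._values[key][i - 1] if i else None
```

Because a reader only needs its own snapshot timestamp to pick the right version, it never takes a lock and never has to ask another node which version is current. A reader at timestamp 15 keeps seeing the version committed at 10 even after a newer version is committed at 20.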
Ensuring Correctness and Reliability
The development team prioritized several key aspects to ensure the correctness and reliability of Aurora DSQL:
- Rust as the Foundation: All new code is written in Rust, chosen for its performance and memory safety, which mitigate the stability and security issues common in languages without memory-safety guarantees.
- Deterministic Simulation Testing: Deterministic simulation makes it possible to test failure scenarios (network failures, clock issues, component unreliability) in a controlled, repeatable way, even though they are difficult to reproduce in real-world environments.
- Formal Methods and Runtime Monitoring: Formal methods like TLA+ are employed to mathematically verify the correctness of protocols. Runtime monitoring bridges the gap between implementation and specification by continuously checking application logs against formal specifications.
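The core idea behind deterministic simulation testing can be illustrated with a small sketch: every source of nondeterminism (here, injected network drops) is driven by a single seeded random number generator, so any failing run can be replayed exactly from its seed. The function `simulate`, its parameters, and the trace format are hypothetical and are not taken from the DSQL test suite.

```python
import random


def simulate(seed: int, messages: int = 20, drop_rate: float = 0.3) -> list:
    """Replayable toy simulation of message delivery over a lossy network."""
    rng = random.Random(seed)  # all randomness flows from this one seed
    trace = []
    for msg_id in range(messages):
        attempts = 1
        # Each failed draw models an injected network drop followed by a retry.
        while rng.random() < drop_rate:
            trace.append(("drop", msg_id, attempts))
            attempts += 1
        trace.append(("deliver", msg_id, attempts))
    return trace
```

Running `simulate(42)` twice produces the identical event trace, which is what makes a rare failure debuggable: the seed that exposed the bug is enough to reproduce it. A real simulator would also control time, scheduling, and disk behavior through the same seeded source.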
Multi-Region Capabilities and Failover
DSQL is designed for active-active multi-region deployments, enabling low-latency access for global users and ensuring resilience in case of regional failures. Key features include: