S3 was the first service that AWS released into production. Back in 2006, the product deserved its moniker “Simple Storage Service”.
The basics were there: you could create a bucket, put objects in it, and control access to it with access control lists (ACLs). Unfortunately, the security defaults were such that people creating unsecured, publicly accessible buckets became a major problem, and if you wanted to encrypt your objects at rest you had to do that yourself.
Fourteen years later, S3 is far from Simple — in fact, it’s now one of the most feature-laden (and complex) products in the AWS suite. While its purpose remains simple — it’s an object store — how it achieves that aim is now very sophisticated.
In 2020, S3 boasts:
- Fine-grained access control with secure defaults
- Private network access via VPC endpoints and Access Points
- Data lifecycle management into a range of useful storage classes
- Versioning, static web hosting, and much more
And, as of the 2020 re:Invent announcements, there are even more sophistications to add to that list.
If it gets any more sophisticated it may start wearing a monocle!
S3 multiple-bucket replication
Previously you had two options for replicating a bucket using S3 — Same-Region Replication (SRR) and Cross-Region Replication (CRR). In both cases you could only replicate your bucket to a single destination bucket.
So, if you needed multiple copies of your data in different S3 buckets (e.g. your application is multi-region, or you want multi-region failover), you had to build your own S3 replication service using bucket events and Lambda.
S3 Replication now gives you the ability to replicate data from one source bucket to multiple destination buckets by setting up multiple replication rules. The destinations can be in the same AWS Region (using SRR), in different AWS Regions (using CRR), or a combination of both.
As with single-destination replication, you must have versioning enabled on both the source and destination buckets to set up replication. Also note that it’s still the case that replicated objects can’t be replicated again, so this is still a one-way object sync…which brings me to the next new feature.
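To make this concrete, here’s a minimal sketch of what a multi-destination replication configuration might look like with boto3. The bucket names, role ARN, and rule IDs are hypothetical placeholders, and the IAM role is assumed to already have replication permissions on every bucket involved.

```python
import boto3

s3 = boto3.client("s3")

# One replication configuration, two rules: an SRR destination and a CRR
# destination. Versioning must already be enabled on all buckets involved.
s3.put_bucket_replication(
    Bucket="my-source-bucket",  # hypothetical source bucket
    ReplicationConfiguration={
        "Role": "arn:aws:iam::123456789012:role/s3-replication-role",
        "Rules": [
            {
                "ID": "replicate-same-region",
                "Priority": 1,
                "Status": "Enabled",
                "Filter": {"Prefix": ""},
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {"Bucket": "arn:aws:s3:::my-backup-bucket-same-region"},
            },
            {
                "ID": "replicate-cross-region",
                "Priority": 2,
                "Status": "Enabled",
                "Filter": {"Prefix": ""},
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {"Bucket": "arn:aws:s3:::my-failover-bucket-us-east-1"},
            },
        ],
    },
)
```

Each rule needs its own ID and priority, and the role has to be able to read from the source and write replicas into every destination.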
Bidirectional Metadata Sync
While this announcement got a lot of people excited, thinking it was making two-way object sync available, its scope is actually limited to syncing metadata, i.e. the data that S3 maintains about your objects.
What this means is that metadata changes made on a replicated object, including access control lists (ACLs), tags, and Object Lock settings, can be copied back to the original object.
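If it’s configured the same way as one-way replication, enabling this would look something like adding a replication rule on the replica bucket that points back at the source, with replica modifications switched on. Here’s a sketch under that assumption; the bucket names and role ARN are placeholders.

```python
import boto3

s3 = boto3.client("s3")

# Rule configured on the *replica* bucket, pointing back at the source bucket,
# with ReplicaModifications enabled so metadata changes (tags, ACLs, Object
# Lock settings) made on replicas are copied back to the originals.
s3.put_bucket_replication(
    Bucket="my-replica-bucket",  # hypothetical replica bucket
    ReplicationConfiguration={
        "Role": "arn:aws:iam::123456789012:role/s3-replication-role",
        "Rules": [
            {
                "ID": "sync-metadata-back-to-source",
                "Priority": 1,
                "Status": "Enabled",
                "Filter": {"Prefix": ""},
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "SourceSelectionCriteria": {
                    "ReplicaModifications": {"Status": "Enabled"}
                },
                "Destination": {"Bucket": "arn:aws:s3:::my-source-bucket"},
            }
        ],
    },
)
```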
Usually AWS will publish a blog post illustrating a use case for a new feature, but they haven’t done that for this one yet, and I’m not sure what the use case will be. My guess is it’s for multi-region apps that want to be able to change tags or Object Lock settings on objects in any region.
Perhaps if you know a use-case you could let me know in the comments below.
Strong read-after-write consistency
Under the hood, S3 consists of many redundant nodes that store multiple copies of your data, which is why it offers 11 9s of durability. A downside of this design, however, is that it has traditionally only offered eventual consistency for read-after-update operations. This meant that a read immediately after updating an object in S3 might return an older version of that object, served from a node to which the change had not yet propagated.
This is no longer the case: both initial object writes and subsequent updates now provide strongly consistent reads.
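In practice, this removes a class of defensive retry code. A small sketch of what you can now rely on, with hypothetical bucket and key names:

```python
import boto3

s3 = boto3.client("s3")

# Overwrite an existing object...
s3.put_object(Bucket="my-bucket", Key="config.json", Body=b'{"version": 2}')

# ...and read it straight back. With strong read-after-write consistency the
# GET returns the bytes just written, never a stale copy from a lagging node.
body = s3.get_object(Bucket="my-bucket", Key="config.json")["Body"].read()
assert body == b'{"version": 2}'
```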
This consistency will help S3 support data-oriented services like data lakes, along with the plethora of new data movement use-cases that AWS is supporting, all of which rely on S3 as a core enabling service.
Bucket Keys
Amazon S3 Bucket Keys are a new option for encrypting your objects at rest in S3. Previously you had the option of using the default server-side encryption key that AWS auto-generates and manages for you (SSE-S3), or you could use a customer master key (CMK) managed through KMS (SSE-KMS).
While KMS, at USD 3c per 10,000 requests, is pretty affordable for most use cases, invoking it for millions of transactions on millions of objects every month can get expensive. This is because a unique data key is generated via KMS for each object, and KMS must be called again to decrypt each object when it’s read.
With S3 Bucket Keys, instead of requesting a KMS-generated key for each object, a bucket-level key is used. S3 uses this bucket key to create data keys for objects in the bucket, meaning it doesn’t need to invoke KMS when encrypting or decrypting each object.
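Bucket Keys are switched on as part of a bucket’s default encryption settings. A minimal sketch with boto3, assuming a hypothetical bucket name and KMS key ARN:

```python
import boto3

s3 = boto3.client("s3")

# Turn on default SSE-KMS encryption for the bucket and enable the bucket key,
# so S3 derives per-object data keys from a bucket-level key instead of
# calling KMS for every object.
s3.put_bucket_encryption(
    Bucket="my-bucket",
    ServerSideEncryptionConfiguration={
        "Rules": [
            {
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms",
                    "KMSMasterKeyID": "arn:aws:kms:ap-southeast-2:123456789012:key/example-key-id",
                },
                "BucketKeyEnabled": True,
            }
        ]
    },
)
```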
It won’t stop there
I’m sure AWS will keep innovating with S3 as it has done consistently over the last 14 years. What will they do next? What’s your favourite S3 feature? What features do you think it still needs?