Date: 2020-12-18

The 2020 portion of AWS re:Invent came to a close this week, ending with an emphasis on preparing for the unexpected and navigating the current realities of the world.

Amazon CTO and vice president Werner Vogels gave the closing keynote in what has become a sort of annual counterbalance to AWS CEO Andy Jassy's talk. Jassy delivers a more polished, rapid-fire sales pitch on the value of choosing AWS, while Vogels takes a more systematic and deliberate approach focused on best practices for those already on the platform.

There was also some news in week three, especially around IoT. And while it was supposed to be the last week of the conference, AWS announced plans to add another round of virtual sessions that will run Jan. 12 to Jan. 14. However, don't expect this addendum of content to include any major service announcements, keynotes or leadership sessions.

In this third and final re:Invent recap -- you can find the first two recaps here and here -- we'll go over Vogels' keynote and the news, and share some insights from two other company leaders on the latest from AWS.

Vogels keynote

While Vogels did make several service announcements, his keynote was largely a call to action. He advised customers to be more efficient with their cloud resources in order to reduce their carbon footprint. He also urged developers to be mindful of their users and the uncertainties and hardships they might be facing. He used examples of children doing schoolwork in parking lots because they lack decent internet connections at home. Applications that are essential services should be designed so they still work on low-bandwidth, high-latency connections, he said.

"We as developers have a responsibility to our customers to build the best applications we can for them in ways that take the current reality very seriously," Vogels said.

He also spoke extensively about the need to design robust, dependable applications. IT teams must be able to deal with changes, both foreseeable and unforeseen, Vogels said.

Vogels reiterated past points about the importance of logs and metrics, and he used that as a springboard to discuss the growing focus on observability and the need to spot the signals of a failure before it happens. He also discussed how AWS uses fault injections to find unreliable unknowns through a process known as fuzzing, and he highlighted a service coming in 2021, AWS Fault Injection Simulator (FIS), that will run controlled chaos engineering experiments on users' applications.

Experience with infrequent but critical events could also improve application performance, because IT teams will learn about blind spots that are missed in monitoring and alarms, Vogels said.

"Mean time to resolution isn't just about your architecture and automation, but also about the operational muscles you've built and exercise over time," he said.
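To make the idea of a controlled fault injection experiment concrete, here is a minimal Python sketch. It is not the AWS Fault Injection Simulator API, which the article does not describe. Every name in it (`with_fault_injection`, `call_with_retries`, `fetch_profile`, `run_experiment`) is hypothetical and chosen only for illustration. The sketch wraps a stand-in dependency so that a configurable fraction of calls fail, then counts how often a simple retry policy still succeeds.

```python
import random


class InjectedFault(Exception):
    """Raised by the injector in place of a real dependency failure."""


def with_fault_injection(func, failure_rate, rng):
    """Wrap func so that roughly failure_rate of calls raise InjectedFault."""
    def wrapper(*args, **kwargs):
        # rng.random() is in [0.0, 1.0), so a rate of 1.0 always fails
        # and a rate of 0.0 never does.
        if rng.random() < failure_rate:
            raise InjectedFault(f"injected failure in {func.__name__}")
        return func(*args, **kwargs)
    return wrapper


def call_with_retries(func, attempts):
    """Call func up to `attempts` times; return (result, failures_seen)."""
    failures = 0
    for _ in range(attempts):
        try:
            return func(), failures
        except InjectedFault:
            failures += 1
    return None, failures


def fetch_profile():
    # Hypothetical dependency; stands in for a network request.
    return {"user": "demo"}


def run_experiment(failure_rate, calls, attempts, seed=0):
    """Run `calls` requests against the flaky dependency and tally outcomes."""
    rng = random.Random(seed)  # seeded so an experiment can be repeated
    flaky = with_fault_injection(fetch_profile, failure_rate, rng)
    succeeded = 0
    injected = 0
    for _ in range(calls):
        result, failures = call_with_retries(flaky, attempts)
        injected += failures
        if result is not None:
            succeeded += 1
    return {"calls": calls, "succeeded": succeeded, "injected": injected}


if __name__ == "__main__":
    print(run_experiment(failure_rate=0.3, calls=100, attempts=3))
```

Seeding the random generator keeps each run repeatable, which is one way to keep an experiment "controlled" in the sense Vogels described. A real experiment would also limit which hosts or requests are affected and define a condition for stopping early.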

Additional AWS insights

Over the course of re:Invent, SearchCloudComputing interviewed several AWS leaders to talk about the news from the show. The following are insights on a spectrum of topics from Dave Brown, vice president, Amazon EC2, and Deepak Singh, vice president, compute services.

EKS Anywhere

Amazon Elastic Container Service (ECS) Anywhere and Amazon Elastic Kubernetes Service (EKS) Anywhere extend AWS' managed container services to on-premises environments, edge locations and, in theory, other public clouds. Despite the bundled announcement, the two services target different audiences.

ECS Anywhere provides the same experience regardless of location, so it's mostly about adding capacity at locations beyond AWS' data centers. EKS Anywhere, on the other hand, is for IT teams that prefer the Kubernetes control plane. There's no shortage of ways to run Kubernetes on premises, so the goal here isn't to compete in that space, Singh said. Instead, the service is for IT teams that like the EKS operational model and want a way to set up their on-premises clusters to make it easier to migrate to AWS over time.

Amazon ECS and Fargate

AWS has evolved its roadmap for Amazon ECS to focus more on simplicity and the developer experience, Singh said. The changing focus is in response to customer usage patterns. And while Kubernetes gets much of the spotlight in the container world, Amazon ECS is apparently still going strong, especially with Fargate, which runs serverless containers on the platform.

"ECS has well over 100,000 customers and about half of every new container customer we have on AWS in 2020 starts on Fargate -- the vast majority of them on ECS," Singh said.

Chip manufacturing

AWS added more Graviton2 instances at re:Invent. These instances rely on AWS' custom Arm-based processors. It also announced plans to add a custom machine learning processor called AWS Trainium, on top of its existing Inferentia machine learning chip.
When asked if he could see a time when it would be feasible for AWS to do most of its chips in-house, Brown said, "It's difficult to say," but that ultimately it comes down to price/performance. Brown reiterated that Intel and AMD remain critical processors for many customers -- most instances still run on Intel -- and that AWS will need those two chip manufacturers to continue to innovate. However, he also acknowledged the advancements in Arm chips and how they have affected AWS' internal efforts on Graviton2-based instances.

"I don't know what the mix would look like longer-term, but 40% price/performance on Graviton2 is pretty enticing for our customers."

Mac instances

The launch of Amazon EC2 Mac instances for macOS was another announcement that generated a lot of buzz, but what might have flown under the radar is the engineering effort required to make it work. These instances rely on Mac minis installed inside AWS data centers.

The Mac mini fits perfectly in a 1U server sled, Brown said. From there, the Nitro card can emulate peripheral devices, making the Mac mini think a hard drive is plugged in via the Thunderbolt connection. But that didn't solve everything.

"The one thing we couldn't do was push the power button," Brown said.

To handle that, engineers added a solenoid with a small motor; an API call triggers it to press the button and turn on the Mac mini. Also, the terms and conditions for Big Sur apparently had to be changed to accommodate the service, with new language added around leasing, Brown said.

Kinesis outage

Amazon Kinesis suffered a major outage in the Northern Virginia Region in the week leading up to re:Invent. And because of dependencies, it also impacted Amazon Cognito, CloudWatch and, to a lesser extent, AWS Lambda. You can read the full summary of the incident, but Brown was forthright when asked about it.

"That's a tough learning [experience] and unfortunately we had to learn it the hard way, and now we have to make sure we don't have that problem anywhere else," he said.

Kinesis failed on a limit within the system, so AWS has audited the rest of its services to make sure they won't fail on a thread limit or any other kernel limit. Engineers will also take a closer look at service dependencies to ensure they degrade properly when systems fail.

The incident also gives some insight into the scale and complexity of AWS operations, since the system hit the 32,000-thread limit on a single machine -- something Brown said was unheard of before this event.

"We run at a scale that is just abnormal, and we do very well at doing that normally," Brown said. "One of the things I said is I've never seen a single service use that many threads on a single host before, and that was a learning experience for us."