My biggest failure was the start of the greatest journey

Marco Suma
Jan 15, 2022

I remember the day of my biggest failure perfectly. I remember it because it’s the day when everything started. Let me tell you a story.

It was 2018 and I was working at Amazon. I was approaching my second year in the company and was already in my second team within the same AWS org, the CloudWatch team in Dublin. Around that time I had the idea of implementing what is now called AWS Usage Metrics, a service officially released in 2019. I led that project until my last day at AWS. After that, other brilliant engineers kept working on it.

At that time, I thought it was a quick win for the company and a great win for our customers, because it would help them avoid being throttled when using AWS services. In other words, I wanted to give them monitoring metrics that track their resource usage. With those metrics, they could set alarms to be notified when usage approaches a threshold, and then ask for a limit increase before being throttled.
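To make the idea concrete, here is a minimal sketch in Python of a usage alarm. It is not how AWS Usage Metrics or CloudWatch alarms are implemented: the `UsageAlarm` class, its fields and the state names are all hypothetical, chosen only to illustrate "warn me before I hit the limit".

```python
from dataclasses import dataclass


@dataclass
class UsageAlarm:
    """Hypothetical alarm on a usage metric (illustrative only, not an AWS API)."""

    limit: float            # the quota, e.g. maximum API calls per second
    threshold_ratio: float  # fraction of the limit at which to warn, e.g. 0.8

    def state(self, usage: float) -> str:
        # At or above the limit the customer would already be throttled.
        if usage >= self.limit:
            return "THROTTLED"
        # Above the warning threshold: time to ask for a limit increase.
        if usage >= self.limit * self.threshold_ratio:
            return "ALARM"
        return "OK"


if __name__ == "__main__":
    alarm = UsageAlarm(limit=100, threshold_ratio=0.8)
    for usage in (40, 85, 120):
        print(usage, alarm.state(usage))
```

The value for the customer is in the middle state: the alarm fires while there is still headroom, so the limit increase can be requested before any request is rejected.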

In fairness, that is a more mature version of the idea than the one I had. When I proposed it to my manager, it was more like “we could offer a service that shows customers their rate of usage of our APIs”.

My manager liked the idea anyway (btw, I only had great managers at Amazon, and he was one of them), so I started socialising it further. One of my colleagues, who is now a great friend, was more senior than me and had been in the team for many years. He seemed excited about it too, partly because he had had the same thing in mind for a long time but had never managed to prioritise it.

Everything looked great.

I was excited that I could deliver something I had proposed myself, with the support of most people in the team. This is the ideal scenario for an engineer looking for a promotion: you create scope for yourself and the team, you deliver it, and you create value for your customers. So we planned the project, estimated the capacity and added it to the roadmap for the following quarter.

When the new quarter came, this project was not the highest priority, so it took a while before I started making progress. When the time came, my manager and the senior engineer in the team decided it was time to present a design document describing what we wanted to do and how. It was my first big, important design doc written from scratch, and I had to present it alone to my director and several principal engineers. I was nervous and had no real experience of what to write in that doc. What information should I focus on? How should I write it? On top of that, I had to sell the idea as well.

Of course, the fact that it was my first time was quite evident. It was a disaster, to the point that my director called my manager after the meeting to ask why there was so little progress and so many unanswered questions so many weeks into the quarter.

That moment was a turning point in my career. It almost feels to me as if I had to go through that sort of failure to become a senior engineer. Weirdly, I felt embarrassed but at the same time happy that it happened. But what exactly was the failure? And why was it so important?

Let’s start with a fact: one of the biggest lessons I learned in software engineering is that you must know the limits of your system. You must know the tipping point at which it does not scale anymore. If you know that, you’ve done an awesome job. This was a frequent exercise for my team at Amazon: we ran so-called Game Day events, where the intention was to test the production environment by simulating outages or reduced capacity.

In other words, you progressively remove capacity until you start seeing failures. As soon as that happens, you record that point and restore everything. That exercise tells you things like how many units of capacity you need for a given number of requests. Similarly, you can generate more and more concurrent incoming requests (a stress test) and check how far you get without failures. These are all valid techniques that you need to know and practise.
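The two exercises can be sketched as a small simulation in Python. This is a toy model, not the tooling we used: `HOST_CAPACITY_RPS` and `fleet_is_healthy` are assumptions standing in for real monitoring, and a real Game Day acts on production hosts rather than on numbers.

```python
HOST_CAPACITY_RPS = 250  # assumed: requests per second a single host can serve


def fleet_is_healthy(hosts: int, load_rps: int) -> bool:
    """Stand-in for real monitoring: healthy while the load fits within capacity."""
    return load_rps <= hosts * HOST_CAPACITY_RPS


def min_hosts_for_load(total_hosts: int, load_rps: int) -> int:
    """Game Day style: remove one host at a time until failures would appear."""
    hosts = total_hosts
    while hosts > 1 and fleet_is_healthy(hosts - 1, load_rps):
        hosts -= 1  # removing one more host is still safe
    return hosts  # the smallest fleet that showed no failures


def max_load_for_hosts(hosts: int, start_rps: int, step_rps: int) -> int:
    """Stress test style: raise the load step by step until failures would appear."""
    load = start_rps  # assumed to be a load the fleet already handles
    while fleet_is_healthy(hosts, load + step_rps):
        load += step_rps
    return load  # the highest load tested without failures


if __name__ == "__main__":
    print(min_hosts_for_load(total_hosts=20, load_rps=2000))
    print(max_load_for_hosts(hosts=8, start_rps=1000, step_rps=100))
```

Both functions stop at the last point that was still healthy, which is the number you record before restoring everything.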

But the point of this note is actually different. Based on the same principle, in life you must know your limits in order to succeed: you must fail in order to make progress. That is why I was happy that this episode happened to me: I knew that from that day on I would get better and I would learn.

Be a master at writing design docs

For a software engineer, writing design docs should not be a hobby or an “I’ll do it in my free time” thing. For what it’s worth, when you are up for promotion, things like your ability to bring people to an agreement and your clarity when writing and expressing concepts count for a lot.

With a well-thought-out design doc, implementing something becomes ridiculously easy. It’s something you can end up doing with peace of mind, or maybe delegate to your intern or a junior engineer.

Here are some suggestions for when you have to write a design doc.

Know your tenets

The first time a principal engineer told me “I don’t even know why we need this, what are our tenets?” I was shocked. I didn’t even know what tenets were, and I had underestimated the importance of giving context and making sure everyone is on the same page. At a larger scale, tenets are usually backed by a team mission. In practice, they are a list of principles that explain why you are proposing to solve a specific problem in a specific way. Some examples of tenets are:

  • Time-to-market: you’ve done user research and realised that you’re way behind your competitors on a specific functionality. Whatever the functionality is, you want a solution that is mindful of the time needed to implement it.
  • Cost: your team is running short of engineering capacity, or your company cannot afford as many servers as you need. You should prefer the solution with the lower cost.
  • Availability: a system that is always available requires mechanisms like redundancy or replication, and these have a cost. If availability is not the most important thing, you should call that out. If it is, you should call that out too.
  • Flexibility: let’s say you’re building a new feature that consists of an insights dashboard. It can contain many types of insights, and new ones may be added in the future. You may need to list among your principles the need to design the system in a flexible way that allows future customisation without rewriting the code.

Every problem has more than one solution

Every problem has more than one solution, but which one is the best? That is where the tenets I’ve just explained come in: they are the criteria you judge each option against. One essential part of mastering a design doc is to present all the most important potential solutions you’ve found and, most importantly, to spell out the pros and cons of each. You should also say which one you recommend. This approach works very well! It’s an incredible booster for a constructive conversation around the design doc. Once, I presented a doc with three solutions and recommended one of them. During the discussion with the team, one of my colleagues raised a valid idea that she came up with while reading one of the non-recommended solutions. If I hadn’t written it down, she would probably never have thought of it.
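One way to connect tenets and candidate solutions is a simple weighted scoring table. The sketch below is only an illustration in Python: the weights, the 1 to 5 scores and the solution names "A", "B" and "C" are all made up, and a table like this supports the written pros and cons rather than replacing them.

```python
def rank_solutions(tenet_weights, solution_scores):
    """Return (solution, weighted score) pairs, best first."""
    totals = {
        name: sum(tenet_weights[tenet] * score for tenet, score in scores.items())
        for name, scores in solution_scores.items()
    }
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical weights: a higher weight means the tenet matters more for this project.
tenet_weights = {"time_to_market": 3, "cost": 2, "availability": 1, "flexibility": 1}

# Hypothetical scores from 1 (poor) to 5 (excellent) for each candidate solution.
solution_scores = {
    "A": {"time_to_market": 5, "cost": 3, "availability": 2, "flexibility": 2},
    "B": {"time_to_market": 2, "cost": 4, "availability": 5, "flexibility": 4},
    "C": {"time_to_market": 3, "cost": 2, "availability": 4, "flexibility": 5},
}

if __name__ == "__main__":
    for name, total in rank_solutions(tenet_weights, solution_scores):
        print(name, total)
```

With these made-up numbers, "A" comes first because time-to-market carries the most weight. Change the weights and the recommendation can change, which is exactly why the tenets need to be agreed before the solutions are compared.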

Distinguish high-level details from detailed design

I’ve seen so many meetings with managers or directors where the engineers driving them talked about the programming language they had chosen, or class names, or frameworks. If you do that, you’re missing the context. Please know your audience when you’re presenting something, and adapt your language to their needs. If you’re talking to a director, you probably want to focus on business value and the high-level picture. If you’re talking with the engineering team, they’ll probably want to know the detailed design you’ve defined.

Think Big!

When I first presented this project, I had honestly thought through about 10% of what could actually be done. The problem is that I had not asked myself the right questions. For example, I hadn’t thought about how many services we could integrate with, and I had not done exhaustive market research (which, once we did it, led us to find the perfect team in AWS to collaborate with).

The secret of delivering the right value for customers is to think big. I like the quote often attributed to Henry Ford: “If I had asked people what they wanted, they would have said faster horses.” Thinking big is the key to success. If you are planning to do A, leave yourself some headroom and plan for B and C as well. Of course, you don’t have to think big just for the sake of it. Almost everything you do should make sense and bring value to your customers. And please, know who your customers are.

Thinking big takes time, which is why most of the time we tend to skip that part. It takes effort; it’s like writing a novel. But you can only gain from it: you’ll create more opportunities for yourself, your team and maybe, who knows, your org.

There are multiple ways to think big. My favourite one is to socialise your ideas. Every time I talked informally with my colleagues, I realised something I was missing. I genuinely think I leaned on those chats as much as I could to make the project a success. But that’s not all. If you want to think big, you must have uncomfortable conversations with people whose background is completely different from yours. I can prove that this works with a reductio ad absurdum: if you only ever talked with a copy of yourself, you wouldn’t learn anything new. Different points of view are crucial.

I hope you enjoyed this small note. I write as a hobby and I am happy to share my experiences. If you’re interested, have a look at my other notes.
