What I Learned at Bestow
I had never been at a startup in a phase of "hyper growth" prior to Bestow, so there are naturally a lot of technical and team-oriented lessons that I learned. Thanks to a great engineering manager, I also learned some things about myself and managed to grow personally, but this post will primarily focus on useful lessons that I intend to take forward into future jobs.
Note that these are personal lessons that I am taking forward. You may disagree with them, or you may find that they are more strongly stated than is reasonable. If you feel strongly enough, feel free to email me.
Part One: Organizational Lessons
We grew a lot at Bestow, and made plenty of mistakes. This is an attempt to capture some of the things we should or should not have done at the broad, cross-team level, or in relation to the organization at large.
Don't Split Languages
If your core product is in one language and is still under development, but you'd like to move towards another language, either commit to rewriting the core product or commit to sticking with the original language. Avoid the "strangler pattern" if at all possible. What is likely to happen is that the team responsible for developing and improving the primary application will continue adding features and complexity to that application, and it will never actually get strangled. This will lead to a split in your engineering teams between the people working on the "old" language and the people working in the "new" language. Cross-team communication, knowledge-sharing, and collaboration will suffer. Teams are more likely to develop an "us vs. them" mentality from either side. Engineers are harder to allocate where needed because they lack the appropriate language skills. Hiring becomes more complicated because you either get someone who can only be useful on one side of the project or you have to look for people who know both languages.
If you do want to switch languages while the core project is still under active development, commit to rewriting it then and there. If you cannot commit at that time, wait until the product has reached a point of stability before starting the process of switching languages. If you are in an organization where "stability" is a pipe dream because of constant feature pressure, it is probably best to just stick with the original language and invest in making that better.
This is not a mark against the strangler pattern in general. For a legacy system, it works quite well.
Think Long and Hard Before Introducing Microservices
Microservices for microservices' sake are a mistake. Every microservice creates an infrastructural and logistical burden, and poorly drawn microservice boundaries can make encapsulation of business logic within a given service impossible. This is especially a problem when said microservice boundary winds up being split across a language or team boundary, as well.
Microservices can make sense when different domains of business logic are truly encapsulated, but make damned sure they are before splitting things out. Like module dependency graphs, microservice dependency graphs work best when they are directed and acyclic. Default to fewer, larger services, and split things out as the boundaries become obvious.
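As a rough illustration of the "directed and acyclic" point, here is a minimal sketch that checks a service dependency map for cycles using the standard library's graphlib module (Python 3.9+). The service names and the deploy_order helper are hypothetical, made up for this example, and are not anything we ran at Bestow.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical services; each one maps to the set of services it calls.
SERVICES = {
    "gateway": {"quotes", "policies"},
    "quotes": {"pricing"},
    "policies": {"pricing"},
    "pricing": set(),
}


def deploy_order(graph):
    """Return a dependencies-first ordering, or raise ValueError on a cycle."""
    try:
        # static_order() yields each node only after all of its dependencies.
        return list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        # The second element of exc.args is the list of nodes in the cycle.
        raise ValueError(f"cyclic service dependency: {exc.args[1]}") from exc


if __name__ == "__main__":
    print(deploy_order(SERVICES))
```

A check like this could run in CI against whatever file declares service dependencies, so that a new cyclic call path gets flagged before it ships.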
Define "Done" for Microservices
It's easy to get a microservice up to the point where it returns the right outputs for the right inputs, and it's easy to call it done at that point and move on, but resist the temptation. For a microservice to be considered done, it should:
- Have a sufficient automated test suite
- Be load tested and have its resource requirements for a given load documented
- Expose metrics for monitoring and observability
- Generate all of the appropriate events and hook in to the organization's data pipelines
If this isn't all done while the microservice is under active development, context will be lost and you will waste time doing it later. Even worse, you may put the microservice into production in a critical path without any load testing or planning for scale. You may also find that deploying your microservice winds up decreasing organizational visibility if you aren't pushing the appropriate data into the appropriate pipelines.
If you feel as though this is too much work just to get a microservice out there, consider not using microservices!
Be Vigilant about Culture
Firing people can be tough, but like it or not, hiring and firing are the most powerful levers available at the executive level in order to effect organizational change. It is critical to recognize early when a hire has been made, particularly in a leadership position, who does not align with the culture of the organization. When this happens, if the situation cannot be rectified quickly (within a month), compassionately split with the person. Keeping a leader who is toxic or who is actively sabotaging the culture does immeasurable damage over time.
Work for Technical Leaders
Leadership being technical is, in my experience, an important factor in their building a good engineering team. Bestow is the only company I have worked for with non-technical founders. When I joined, there was an excellent VP of Engineering, who built a great engineering culture, but once he left, a slow and steady downturn began. Be careful joining companies that do not have technical people at the founder/executive level.
No matter how good and well-aligned your team/group/department is, if it is out of sync with the philosophy and values being demonstrated at the top of the organization, it will eventually become more like the organization, rather than the other way around. This may not be true at very large companies where there is a huge separation between top-level leadership and line-level employees, but it is definitely true at smaller companies.
Actively Invest in DevX
Developer experience is critical to the smooth operation of an engineering team. Beyond a certain size, it makes a lot of sense to have a full-time team devoted to DevX. At smaller sizes, ensure that engineers have dedicated time to focus on making the environment easier to work with, improving test suites, simplifying CI pipelines, and so on.
Part Two: Technical Lessons
I'm always learning new things whenever I build new things, so here's a selection of lessons I learned while building things at Bestow.
Keep the Strangeness Budget in Mind
One of the concepts that went into the design of Rust is the idea of a "strangeness budget", a theoretical limit on the amount of "weirdness" (read: difference from what they're used to) people are willing to tolerate when learning something new. This is an important idea, and I need to do a better job of keeping it in mind across all aspects of the stack (tooling, divergence from language idioms, etc.). Where possible choose boring technology, but also choose boring design patterns. I didn't do a great job of this at Bestow, and it made it more difficult to onboard people into certain areas of the codebase.
Strict TypeScript is Pretty Good
Strict TypeScript does a pretty good job of actually being typesafe and pleasant to work with, giving some semblance of the feeling of safety and ease of refactoring you get in a statically typed language. It's not perfect, and there is some weirdness in defaulting to structural rather than nominal types, but it's a sight better than nothing. Libraries like fp-ts can make it feel like a functional language, but be wary of the strangeness budget.
I Dislike Go, but It's a Fine Choice for Many Projects
Go is a little like Python, in that there's always someone ready to tell you that you're not writing idiomatic Go. Unlike Python, some of these idioms are irrevocable, baked-in components of the language (e.g. capitalization of private vs. public symbols), so you can't really opt out of them. Go is simply not my kind of language philosophically: I'd rather have a nice abstraction for deleting an item from a list than copy and paste error-prone list-deletion logic all over my codebase, manually checking for errors in every single function is an insane exercise in self-flagellation, and the documentation is terrible (no, auto-generated API docs are not a substitute for proper documentation). That being said, Go is performant, it's got a halfway decent type system, and goroutines are cool. I think generics are likely to help a fair bit with my complaints, and maybe one day Go will get sum types. In the meantime, it's a totally fine choice for writing an application unlikely to need much in the way of abstraction, e.g. bog-standard webservers doing the standard dance of marshalling and unmarshalling data.
Part Three: Python Lessons
Bestow was my fourth job working extensively in Python. The application I was working on is the largest Python application I've maintained, and the team working on it was by far the largest team working on one Python codebase that I've managed. This was a great opportunity to learn a few lessons about Python at scale, which are enumerated below.
Don't Use Python
At this point I'm convinced that, for most typical startups, Python is a poor fit for three reasons: scale, scale, and scale. Its performance makes scaling your traffic difficult, its lack of type safety and cavalier attitude to backwards compatibility make scaling your codebase difficult, and its idioms make scaling your team difficult.
Python's performance suffers at any reasonable scale, and so you will inevitably find yourself either halting feature work to focus on improving performance or throwing lots of money at your cloud provider so you can host an infinite fleet of Python containers. Of course, you will eventually hit performance bottlenecks in any language, but this happens quite early with Python, and needs to be revisited constantly. Typical synchronous Python web frameworks are also, in my experience, really difficult to optimize in a containerized environment. YMMV, but trying to tune containers, threads, and processes for uWSGI or gunicorn or whatever has been a constant frustration. Of course, you can write performant Python. It's just difficult, and your time would be better spent working on features.
In addition to the problems with traffic scaling, Python is a hard language to use for large, complex systems. Type safety is approximately achievable with mypy, but trying to write typesafe Python code leads to writing relatively unidiomatic code, since lots of things wind up needing to be represented in typed classes rather than dicts (named tuples, typed dicts, and the new dataclass stuff help with this somewhat). We also made an attempt to increase the actual safety by using some runtime type checking via metaclasses that used the mypy type annotations. This worked well, but the performance hit was substantial. Meanwhile, using Python without type checking is almost a non-starter for projects of any significant size. It's impossible to be disciplined enough, to code defensively enough, or to test well enough to avoid runtime type errors.
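To make the "typed classes rather than dicts" point concrete, here is a minimal sketch of a dataclass that checks its own annotations at construction time. Note that this swaps in a plain `__post_init__` hook instead of the metaclass approach we actually used, and the Applicant class and its fields are hypothetical. Every construction pays for the loop over fields, which hints at where the performance cost of runtime checking comes from.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Applicant:
    """Hypothetical record that might otherwise be passed around as a dict."""

    name: str
    age: int

    def __post_init__(self):
        # Compare each field's value with its annotation when the object is
        # built. Only plain classes (str, int, ...) are handled here; generic
        # annotations such as list[int] would need considerably more work.
        for f in fields(self):
            value = getattr(self, f.name)
            if not isinstance(value, f.type):
                raise TypeError(
                    f"{f.name} should be {f.type.__name__}, "
                    f"got {type(value).__name__}"
                )


if __name__ == "__main__":
    print(Applicant(name="Ada", age=36))
```

With a dict, passing age="36" would go unnoticed until something tried to do arithmetic on it. Here mypy can flag the call statically, and the runtime check catches whatever slips through from untyped callers.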
Beyond typing, another contributor to the difficulty of running a big Python codebase at scale is the constant stream of breaking changes in every new "minor" Python version. Upgrading from 3.5 to 3.6 to 3.7 and so on is almost always a few days of work in a codebase of any substantial size, and the Python core team is fairly aggressive about "end-of-life-ing" old versions. Barring security fixes, this would not be a huge problem, but third-party libraries tend to require the latest and greatest, so you're constantly having to run on the upgrade treadmill if you want to use the Python ecosystem (which remains one of the best things about Python). Contrast this with Rust, where I have never had code fail to compile when upgrading the compiler version, and even upgrading major editions from 2018 to 2021 required no changes.
Finally, while it's easy to hire Python engineers, it is quite difficult to hire Python engineers who can immediately and effectively contribute to a large project. Even though conventions that help with scale like mypy are becoming common in the Python community, most people are still writing standard Python in smaller applications or scripts. As such, bringing people onboard if you're using type annotations means paying the extra cost of teaching people how to write typed code. You may also find yourself needing to teach people whatever architectural patterns you're using to keep the project from degrading into a giant ball of mud. Of course, some amount of this will be necessary in any language, but with Python you may also find yourself fighting a weird No True Scotsman battle where whatever way you or your company writes Python is not idiomatic enough and therefore needs to be changed in order to comply with the One True Python as handed down by Guido himself or whatever.
This doesn't even get into the ridiculous mess that is the Python packaging ecosystem and the difficulty of ensuring reproducible builds.
Of course, there are places where Python still makes a lot of sense, the main one being data science, but for most other domains, I can think of a better fit. Generally, one of TypeScript, Go, or Rust should be your pick if you're outside of the .NET/Java ecosystem. For correctness and performance, bias towards Rust or Go, in that order. For quick hiring and onboarding, bias towards TS or Go, in that order.
If you Do Use Python, Go Async
There's really no reason not to use async for Python webservices. Async services have more predictable performance, particularly in a containerized environment. Django supports async, and there are a substantial number of new async frameworks. FastAPI with pydantic looks particularly nice, and is probably what I would use if I were starting a new project today on a small, experienced team.
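For anyone who hasn't used async Python, here is a minimal, framework-free sketch of the underlying idea using only the standard library's asyncio. The handle and serve_all functions are hypothetical stand-ins for request handlers, and the sleep stands in for a database or HTTP call. While one handler is waiting on I/O, the same process can make progress on the others instead of blocking a whole worker.

```python
import asyncio


async def handle(name, events):
    """Hypothetical request handler that waits on I/O (simulated by sleep)."""
    events.append(f"start {name}")
    await asyncio.sleep(0.01)  # stands in for a database or HTTP call
    events.append(f"end {name}")
    return name


async def serve_all(names):
    """Run one handler per name concurrently on a single event loop."""
    events = []
    # gather() schedules every handler up front and returns their results
    # in the same order as the inputs.
    results = await asyncio.gather(*(handle(n, events) for n in names))
    return results, events


if __name__ == "__main__":
    results, events = asyncio.run(serve_all(["a", "b", "c"]))
    print(results)
    print(events)
```

All three handlers start before any of them finishes, so the total wait is roughly one sleep rather than three. That is the property that makes resource sizing for a container less of a guessing game than tuning worker and thread counts.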
Don't Use Flask
This is cheating a little bit, because I already learned this lesson at previous jobs. Flask is a "micro-framework," providing request routing and some basic request/response handling and not much else. To make a real webservice, you'll need to venture out into the vast world of Flask plugins, which are third-party packages with varying degrees of maintenance and quality. Some have strong opinions about how your application should look, and sometimes their opinions diverge. You'll often be in a position of realizing too late that some third-party plugin is no longer maintained or doesn't work as well as you thought it would, but its assumptions have managed to burrow their way throughout your codebase, so replacing it becomes quite a chore. You also will need to make a lot of decisions early on about which plugins to use. This can all be okay with an experienced team where everyone also has experience working with Flask and knows in advance how and why to hide plugins behind interfaces whenever possible, but it gets out of hand quickly with an inexperienced team. Be particularly careful about deciding to take over an existing Flask project from a team of unknown quality. Almost always, the better option is to use a fully-featured framework like Django. You may not agree with every design decision, but at least you know that all of the various pieces are officially supported and will be regularly updated if needed.
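Here is a minimal sketch of what "hiding a plugin behind an interface" can look like. Everything in it is hypothetical: the Cache protocol, the InMemoryCache adapter, and the greeting function are invented for illustration, and no real Flask plugin API is shown. The idea is that application code depends only on the small protocol, so swapping out an abandoned plugin means rewriting one adapter class rather than chasing its assumptions through the codebase.

```python
from typing import Optional, Protocol


class Cache(Protocol):
    """The only cache surface that application code is allowed to see."""

    def get(self, key: str) -> Optional[str]: ...

    def set(self, key: str, value: str) -> None: ...


class InMemoryCache:
    """Stand-in adapter; a real one would wrap the third-party plugin."""

    def __init__(self) -> None:
        self._data = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def set(self, key: str, value: str) -> None:
        self._data[key] = value


def greeting(cache: Cache, user: str) -> str:
    """Application code: knows about Cache, not about any plugin."""
    cached = cache.get(user)
    if cached is None:
        cached = f"Hello, {user}!"
        cache.set(user, cached)
    return cached


if __name__ == "__main__":
    print(greeting(InMemoryCache(), "ada"))
```

Because Protocol is structural, the adapter does not need to inherit from Cache. Any class with matching get and set methods satisfies mypy, which also makes it easy to drop in a fake for tests.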