WA 2020 #3: Conway's Law of Thread Pools

The mobile app I'm helping to build at work has a stunning number of threads, and the system health teams have been complaining about it. Those teams monitor the apps built in the organization and alert their developers as needed.

One day, an engineer from a neighboring team filed a bug that said "the app has too many threads". I was furious - their team is responsible for a large part of it, and you just throw the shit to us and ridicule us? What a rude guy. I was also furious because their team has been contributing to the app's instability for a long time, and debugging it has wasted a lot of my time. But this kind of them-vs-us mindset never solves any issue, so I just left the bug open and moved on (which doesn't solve any issue either, literally, but at least I didn't make it worse.)

People casually create thread pools. In many cases that's not a problem. Each thread isn't that expensive after all. However, when a lot of teams developing compute-intensive features come together and put their efforts into a single app, each increment gradually harms performance.

How expensive are threads? When people talk about thread overhead, the first thing they care about is memory. A thread does allocate some memory, but the actual number is not clear. On Android, in addition to the Linux thread, an ART thread is attached to each thread.

In practice though, thread creation latency is more of a problem than the memory the threads consume. Thread creation is serialized, blocking other threads. Even without that, a thread does a lot of work at the beginning of its lifetime, and its CPU consumption is visible in startup trace data.

In short: threads can be heavy in memory, but thread creation latency is the more evident harm.
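If you want a rough feel for the creation cost, a crude timing loop like the one below is enough to start with. This is only a sketch: the class name and the thread count are made up, it runs on a desktop JVM rather than ART, and a real measurement should come from a startup trace on a device.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not from the app: times how long it takes to start
// and join a batch of threads that do no work at all.
public class ThreadStartCost {
    /** Starts {@code count} no-op threads, joins them, and returns elapsed nanoseconds. */
    static long timeThreadStarts(int count) {
        List<Thread> threads = new ArrayList<>(count);
        long begin = System.nanoTime();
        for (int i = 0; i < count; i++) {
            Thread t = new Thread(() -> { /* no work: we only pay for creation */ });
            t.start();
            threads.add(t);
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return System.nanoTime() - begin;
    }

    public static void main(String[] args) {
        int count = 64; // made-up number, e.g. 8 cores x 8 pools
        long nanos = timeThreadStarts(count);
        System.out.printf("%d threads: %.2f ms total%n", count, nanos / 1e6);
    }
}
```

The number it prints depends heavily on the device and runtime, so treat it as a sanity check, not a benchmark.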

Why do teams create their own thread pools instead of sharing one? Because it's easier. Creating a new one is a one-liner, but passing around a thread (or a Java Executor) needs some plumbing, especially when it crosses a team boundary. Also, it is often safer to have a new one if you need specific semantics, like serialized execution on a single thread. Java's Executor interface has no way to express such characteristics.
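To make the "one-liner vs. plumbing" point concrete, here is a minimal sketch. `ThumbnailFeature` is a hypothetical class invented for illustration; only the `java.util.concurrent` calls are real.

```java
import java.util.concurrent.Executor;
import java.util.concurrent.Executors;

// Hypothetical feature class, not from any real codebase.
class ThumbnailFeature {
    private final Executor executor;

    // The easy way: one line, and the app gains one more pool that can grow
    // to as many threads as the device has CPU cores.
    ThumbnailFeature() {
        this(Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors()));
    }

    // The shareable way: whoever constructs the feature has to carry an
    // Executor all the way here, possibly across a team boundary.
    ThumbnailFeature(Executor executor) {
        this.executor = executor;
    }

    void render(Runnable work) {
        executor.execute(work);
    }
}
```

The second constructor is barely more code, but every caller up the chain now needs an Executor to hand over, and that is where the plumbing cost lives.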

If you think a bit more, you realize that you can implement the serialized semantics on top of an existing thread pool. Guava's SequentialExecutor does exactly that, for example. People just don't care. More evidence that people don't care is that everyone creates the thread pool with the number of threads set to the device's CPU count. What a f*** - multiply that by the number of thread pools. How does that possibly make sense? (I know it does on a few occasions, but that doesn't save me from the complaints from the other side.)
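For the record, serialized execution over a shared pool fits in a few lines of plain JDK code. The sketch below follows the SerialExecutor pattern shown in the `java.util.concurrent.Executor` Javadoc; it is not Guava's implementation, and `SerialDemo` with its pool size of 4 is made up for the demonstration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Runs tasks one at a time, in submission order, on threads borrowed from a shared Executor. */
final class SerialExecutor implements Executor {
    private final Queue<Runnable> tasks = new ArrayDeque<>();
    private final Executor shared;
    private Runnable active; // guarded by "this"

    SerialExecutor(Executor shared) {
        this.shared = shared;
    }

    @Override
    public synchronized void execute(Runnable task) {
        // Wrap the task so that finishing it (even by throwing) schedules the next one.
        tasks.add(() -> {
            try {
                task.run();
            } finally {
                scheduleNext();
            }
        });
        if (active == null) {
            scheduleNext();
        }
    }

    private synchronized void scheduleNext() {
        active = tasks.poll();
        if (active != null) {
            shared.execute(active);
        }
    }
}

final class SerialDemo {
    static List<Integer> runDemo() {
        ExecutorService shared = Executors.newFixedThreadPool(4); // stands in for an app-wide pool
        Executor serial = new SerialExecutor(shared);
        List<Integer> seen = new ArrayList<>(); // plain ArrayList: fine only because tasks never overlap
        CountDownLatch done = new CountDownLatch(100);
        for (int i = 0; i < 100; i++) {
            final int n = i;
            serial.execute(() -> {
                seen.add(n);
                done.countDown();
            });
        }
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            shared.shutdown();
        }
        return seen;
    }
}
```

No thread is dedicated to the serial queue: each task borrows whichever pool thread is free, and the order is still preserved.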

Java or Android could have had some API that returns an Executor with the desired properties but is backed by a system-wide (or process-wide) thread pool. That didn't happen. Whom to curse?

Using native (C++) code in the app makes the situation way worse: there is no obvious way to share threads (or thread pools) between C++ and Java. As a result, the same people create separate thread pools in C++ in addition to the ones in Java.

The number of threads is now the number of CPU cores multiplied by the number of participating teams, plus the number of async-but-serialized executions, multiplied by the number of programming languages. This is crazy. And that's why the number of threads in our app is crazy large.
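To put a number on it, here is the arithmetic as a tiny sketch. I'm reading the formula as (cores x teams + serialized executors) x languages, and every input below is made up for illustration; none of them are measurements from our app.

```java
// Back-of-the-envelope estimate; the class and its inputs are hypothetical.
final class ThreadEstimate {
    /** (cores * teams + serialExecutors) * languages, under the reading stated above. */
    static int estimate(int cores, int teams, int serialExecutors, int languages) {
        return (cores * teams + serialExecutors) * languages;
    }

    public static void main(String[] args) {
        // Made-up inputs: 8 cores, 6 teams, 10 single-thread executors, Java + C++.
        System.out.println(estimate(8, 6, 10, 2)); // prints 116
    }
}
```

Over a hundred threads on an eight-core device, before anyone has done anything unusual.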

Solving this problem is technically trivial: you can build a cross-language thread pool library that has thread sharing in mind. But it's more of an organizational problem: you have to convince other teams to use that library. You have to demand extra work and complexity from other teams. You have to convince them that your design is right, avoiding endless bikeshedding, while you have deadlines, as do the others.

I don't have the energy to go through all of this, but probably I should start building something small and start using it in our own codebase, then go to the rude neighbor, and then pitch it to the research org, etc, etc... Oh god it's depressing. Stop thinking too much and just start small. Later.

Another side of me appreciates where I am - it's much better to have at-least-partially-technical problems stacking up in front of me than to have to look around to find even niche problems to solve, or to have not-at-all-technical problems in front of me that I can't reach. For me, a hired professional, the problems I have are my bread and butter.