Solid Queue first impressions: Nice!

Solid Queue was released yesterday, a new relational-database backend for Active Job.

I’m the author of GoodJob and I’ve been following along and am very interested and excited about Solid Queue. These are some things I noticed when first going through it.

tl;dr; It’s nice! I learned some things. It makes a few different choices than GoodJob and I’m very curious how they turn out (in a good way!).

I admit, I didn’t run Solid Queue in production. I poked through the code, got the development environment set up (spent the majority of my time trying to get mysql2 to compile, which is no surprise for me; Trilogy is my day job), ran the test suite and tried TDDing a new feature for it: perform_all_later support. These are just my notes, something to refer back to.

Lots of database tables: Solid Queue has many database tables. There is a “hot” table (my terminology) that all the workers queue, dequeue, and lock job records in, and then several other tables where job records are staged (e.g. they’re scheduled in the future, so don’t insert them into the hot table yet) or archived after they complete/error. This seems smart because that hot table and its indexes stay compact and single purpose, which is good for performance. Compare that to GoodJob in which the jobs table has like 8 indexes to cover both queue/dequeue and Dashboard listings and everything else, which does slow down inserts and updates. I’ve had the impression with GoodJob that orchestrating across multiple tables would be more difficult (everything is tradeoffs!), so I’m very curious to see an alternative implementation in Solid Queue.

Note: I wasn’t able to implement perform_all_later in my 1 hour timebox because it was more complicated than a single insert_all: the job records have to be written to multiple tables.
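To make the multi-table point concrete, here is a minimal in-memory sketch. Everything in it is hypothetical: the SketchBackend class, the Job struct, and the three arrays standing in for tables are illustration only and are not Solid Queue's real schema or API. It only shows why a bulk enqueue becomes several coordinated writes once ready and scheduled jobs live in different places.

```ruby
# Hypothetical in-memory "tables"; not Solid Queue's real schema or API.
Job = Struct.new(:id, :class_name, :scheduled_at, keyword_init: true)

class SketchBackend
  attr_reader :jobs, :ready, :scheduled

  def initialize
    @jobs = []      # every job record
    @ready = []     # ids of jobs that workers may pick up now (the "hot" table)
    @scheduled = [] # ids of jobs staged for the future
  end

  # Bulk enqueue: one write for the job records, then one write per
  # destination table, depending on whether each job is due yet.
  def enqueue_all(job_attributes, now: Time.now)
    records = job_attributes.map.with_index(@jobs.size + 1) do |attrs, id|
      Job.new(id: id, **attrs)
    end
    @jobs.concat(records)

    due, future = records.partition do |job|
      job.scheduled_at.nil? || job.scheduled_at <= now
    end
    @ready.concat(due.map(&:id))
    @scheduled.concat(future.map(&:id))

    records.size
  end
end

now = Time.at(1_000)
backend = SketchBackend.new
enqueued_count = backend.enqueue_all(
  [
    { class_name: "WelcomeEmailJob" },
    { class_name: "ReportJob", scheduled_at: now + 60 }
  ],
  now: now
)
```

In a real database each of those writes would also need to happen inside one transaction so a job is never recorded without landing in one of the destination tables.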

Aside: One of the very first comments I got when I launched GoodJob 3 years ago was like “your design assumptions are less than ideal” and then they never replied to any of my follow-ups. That sucked! This is not that. Nothing in Solid Queue is particularly concerning, just different (sometimes better!). Kudos to Rosa Gutiérrez and the Solid Queue developers; you’re doing great work! 💖

Again, lots of database tables: GoodJob is easy mode just targeting Postgres, because there are Advisory Locks and lots of Postgres-only niceties. I do not envy Solid Queue being multi-database, because it has to implement a bunch of stuff with a coarser toolbox. For example, there is a semaphores table, which is used for the Concurrency Controls feature (🎉). I think the “SOLID” libraries (also Solid Cache) are interesting because they have to implement behavior in a relational database that comes for free in in-memory databases (example: TTL/record expiration/purging).

Puma Plugin: TIL. Looks nicer and more explicit than GoodJob trying to transparently detect that it’s in the webserver in order to run asynchronously.

Multiprocess. A nice surprise to me, Solid Queue has a multiprocess supervisor. It does seem like even the Puma plugin forks off another process though; that could have implications for memory constrained environments (e.g. Heroku dynos). I’m nearly 4 years into GoodJob and haven’t tackled multiprocess yet, so exciting to see this in Solid Queue’s first release.

Queue priority: Nice! I have opinions about how people set up their application’s queues, along the lines of: many people do it wrong, imo. Solid Queue looks like it provides a lot of good flexibility to let people easily migrate and configure their queues initially (though wrongly, by dependency, imo), but then reorient them more performantly (by latency, again imo). A single thread-pool/worker can pull from multiple queues.

Care. I notice lots of little things that are nice in Solid Queue. The code is clean. The indexes are named for their purpose/usage rather than just like index_table_column_column. The Puma Plugin is nice. There are things in GoodJob that I dream about what a clean-room, lessons-learned reimplementation would look like, but it’s never top of my priorities, and some things are never going back in the stable (table names are basically forever). Reading the Solid Queue code was a vicarious-nice! experience.

Differences. Do they even matter? I dunno:

  • No Dashboard yet. Waiting on Mission Control. GoodJob definitely got more twisty as I learned all of the things of “you want a button to do what now with those jobs? …oh, I guess that makes sense. hmm.”
  • No LISTEN/NOTIFY (yet?). Seems possible, but would be Postgres only so maybe not. That means latency will never be less than the polling frequency, though an example shows 0.1 seconds which seems good to me.
  • No cron-like functionality. It took me a minute to come around to the necessity of this, maybe Solid Queue will too. 🤦 I missed this on first read through: “Unique jobs and recurring, cron-like tasks are coming very soon.” 🙌

Final thoughts: Path dependency is hard, so I don’t imagine lots of people should swap out their job backend just because there is something new (please, don’t let me ever read a “3 job backends in 4 years” blog post). New projects and applications will be more likely making these choices (and they shouldn’t be valueless choices, hence my excitement for Solid Queue becoming first party to Rails) and I’m really excited to see how Solid Queue grows up with them, and alongside other options like GoodJob and Sidekiq and Delayed et al.


Recently

  • One of my coworkers said this week “You’ve been an engineering director and in leadership before, right? I appreciate your perspective; any advice and resources you’d recommend?” So that set my mind racing. I dunno. On one hand, it’s like, well, first, you grind out 10 years of 1-year of experience 10 times, but do it 50 times a year. On the other, keep a delta file and I also think about Secrets of Consulting quite a lot (content warning: I haven’t re-read it in a long time; I tried reading the same author’s The Psychology of Computer Programming more recently and couldn’t do it).
  • Work otherwise is in the final marathon of promo packets and performance reviews and quarterly planning and a reorg and oh, the next version of Ruby is released in 3 weeks and it’s go time. Then we do it all again. I love my team so much.
  • I finished reading The Final Architecture series. I didn’t enjoy it as much as Children of Time (“the octopus books”). After reading all the Gateway books and 3-body problem books, I’m a bit over the idea that there’s a malicious (or at least self-interested) group of people who are unhappy with the current value of the Planck constant and are doing something about it. I was into the subplot of alien criminality.
  • I finished Talos Principle 2, but I screwed up the golden gates thing, so I have to beat it again just to get the special special ending. I’ve been playing it in parallel with a friend and appreciate our back and forth:

    Me: I’m personally more worried about environmental catastrophe than AI, but i guess they’re intertwined. Material conditions that are unfit for life. Like some of the talos robots seem to touch on my philosophical question which is like: how do we maximize individual agency+satisfaction while also avoiding collective/systemic fucking-around-and-finding-out.

    Friend: we can see from the game the answer lies somewhere on the spectrum between having 1000 robits around a crumbling power source vs having a magic 3D printing pyramid for use to conquer the stars

    I also started playing Talos 1 and it’s much less chill than the 2nd game. I may not finish it.

  • For GoodJob, all of the things I want have been labeled “Help Wanted”. I do want to get the row-locking foundations in place myself, though I think the safe upgrade path for it might take a little while to straighten out. I think I have finally mastered advisory locks, so, of course, that means change it all up.
  • I ran bundle update on Day of the Shirt, which means I also upgraded to Shakapacker, which means that I have, once again, spent an entire weekend fumbling with Webpack configuration to get window.$ working. I also got a nice email from the owner of a t-shirt website that validated my thesis that no one visits websites anymore, let alone to buy t-shirts: the website owner got a (different) full-time job.

The Rails Executor: increasingly everywhere

The Rails Executor rules everything around your code.

If you write multithreaded-Rails code—like me, author of GoodJob—you’re probably familiar with the Rails Executor which is described in the Rails Multithreading Guide.

If you’re new to the Rails Executor: it sets up and tears down a lot of Rails’ framework magic. Code wrapped with a Rails Executor or its sibling, the Reloader, picks up a lot of powerful behavior:

  • Constant autoloading and reloading
  • Database connection/connection-pool management and query retries
  • Query Cache
  • Query Logging
  • CurrentAttributes
  • Error reporting

You usually won’t think about it. The Rails framework already wraps every Controller Action and Active Job with an Executor. Recently, as of Rails v7.1, it’s showing up everywhere within the Rails codebase:

The effect of these small changes could be surprising:

  • I came to write this blog post because I saw a Rails Discussion asking how “Rails 7.1 uses query cache for runner scripts” and aha, I knew the answer: the Executor.
  • I recently fixed a bunch of flaky GoodJob unit tests by wrapping each RSpec example in a Rails Executor. This is a problem specific to GoodJob, which uses connection-based Advisory Locks, but I discovered that if an Executor context was passed through (for example, executing an Active Job inline), the current database connection would be returned to the pool, sometimes breaking the Advisory Locks when a different connection was checked back out to continue the test. This was only a fluke of the tests, but was a longtime annoyance. I’ve previously had to work around a similar reset of CurrentAttributes that occurs too.
  • At my day job, GitHub, we’ve also been double-checking that all of our Rails-invoking scripts and daemons are wrapped with Rails Executors. Doing so has fixed flukey constant lookups, reduced our database connection error rate and increased successful query retries, and necessitated updating a bunch of tests that counted queries that now hit the query cache.

The Rails Executor is great! Your code is probably already wrapped by the Rails framework, but anytime you start writing scripts or daemons that require_relative "./config/environment.rb" you should double-check, and definitely if you’re using Thread.new, Concurrent::Future or anything that runs in a background thread.

I used the following code in GoodJob to check that database connection checkouts occur inside a Rails Executor; maybe you could adopt something similar too:

# config/initializers/debug_executors.rb

ActiveSupport.on_load :active_record do
  ActiveRecord::ConnectionAdapters::AbstractAdapter.set_callback :checkout, :before, (lambda do |conn|
    unless ActiveSupport::Executor.active?
      $stdout.puts "WARNING: Connection pool checkout occurred outside of a Rails Executor"
    end
  end)
end

One last thing about Executors: you want to make sure that you’re wrapping individual units of work, so the execution context has a chance to reset itself (check in database connections, unload and reload code, etc.):

# scripts/do_all_the_things.rb
# ...

# bad
Rails.application.executor.wrap do
  loop { MyModel.do_something }
end

# good
loop do
  Rails.application.executor.wrap { MyModel.do_something }
end
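To see why the placement of the wrap matters without booting Rails, here is a minimal sketch using a stand-in. SketchExecutor is hypothetical and only counts how often a wrapped block finishes; it is not how Rails implements its Executor, it just mimics the "run the block, then reset" contract.

```ruby
# Hypothetical stand-in for Rails.application.executor, for illustration only.
class SketchExecutor
  attr_reader :resets

  def initialize
    @resets = 0
  end

  # Runs the block, then "resets" the execution context.
  def wrap
    yield
  ensure
    # Stands in for checking in connections, reloading code, and so on.
    @resets += 1
  end
end

# Wrapping the whole loop: the context only resets once, at the very end.
whole_loop = SketchExecutor.new
whole_loop.wrap do
  3.times { :do_something }
end

# Wrapping each unit of work: the context resets after every iteration.
per_unit = SketchExecutor.new
3.times do
  per_unit.wrap { :do_something }
end

whole_loop_resets = whole_loop.resets # => 1
per_unit_resets = per_unit.resets     # => 3
```

With a real, never-ending loop the first form would never reach its reset at all, which is the problem the "bad" example above has.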

Update: I offered a Rails PR to make the script runner’s Executor conditional because the introduction of an Executor around bin/rails runner script.rb could introduce problems if the script is long-running/looping/daemon-like; developers would still need to use an Executor, but to wrap individual units of work in their long-running script.


Reflections on GoodJob for Solid Queue

Rails World presents Solid Queue and Mission Control

GoodJob, via its introductory blog post, was highlighted last week at Rails World. A new Active Job queue backend, Solid Queue, was announced, and I’m excited to see where it goes!

I attended Rails World in Amsterdam this past week. During the conference, a new Active Job backend was announced: Solid Queue (video), which has the potential to become the first first-party backend in Rails. Solid Queue, like my GoodJob, is backed by a relational database. I’m very excited about this! I had a chance to talk to Rosa Gutiérrez, who is leading the effort at 37signals, and I’m hopeful that I’ll be able to contribute to Solid Queue and who knows, maybe it could even become a successor to GoodJob.

With that thought in mind, I reflected on some of the design and motivations that became GoodJob, and that I believe are important regardless of the Active Job backend under development. These are not intended to be design documents but more a list of things that I have learned or come across during my 3 years working on GoodJob. It would be nice to keep these in mind when designing a potential successor to GoodJob. And I hope they can be the seed to further conversations, rather than a fully realized proposal or argument. Let’s go:

  • SIGKILL Safety. Recovering from a SIGKILL (or someone unplugging the power cord) is always number one in my mind when thinking of GoodJob. That informed my desire to use Advisory Locks (which are automatically released on disconnect), and my future thinking about heartbeats if GoodJob switched over to using FOR UPDATE SKIP LOCKED instead of Advisory Locks. I do not think jobs should be limited to a specific timeout (as Delayed Job’s design uses) as that also creates significant retry latency when resumed, and jobs definitely shouldn’t be wrapped with a transaction either.
  • (Human) Exception and Retry Workflows. Everybody has a different workflow for how they deal with errors, and I believe that a backend needs to track, report (e.g. send to Sentry or Bugsnag) and expose the various reasons an error appears: retried, retry stopped, explicitly discarded, SIGKILLed/interrupted, unhandled error, etc. I still am dialing this in on GoodJob because there is wide variability of how people and teams manage their error workflows. I’m always learning something new. For example, there are very different answers on “when using retry_on SpecialError, attempts: 3 should the 4th error be reported to the exception tracker? What about an explicit discard_on? Should a discard_on error be reviewed and reenqueued or not?” If a job is SIGKILLed/interrupted, should it be automatically restarted or held for manual review? Everyone seems to do it differently! I haven’t cracked the code on what is “ideal” or reasonable to say “nope, don’t do it that way.” Active Job’s error handling isn’t clear cut either, so maybe we can make that better and come around to a more opinionated (but still inclusive) design. Maybe!
  • Process Harnesses. I think it’s interesting that Rails might ship with a 1st party queue backend before it ships with a 1st party webserver: there is a lot of operational overlap. Signal handling, timeouts, daemonization, liveness and healthcheck probes, monitoring and scaling instrumentation. There’s quite a lot of ground to cover, and a lot of different systems and tooling: Kubernetes, systemd, rc.d, Heroku, Judoscale, to name just a few of the various operational targets that I’ve spent considerable time supporting.
  • Repeating Jobs / Clock Process. It took me a while to come around to this in GoodJob, but I believe that performing work repetitively on a schedule (“cron-like”) is very much in the same problem-domain as background jobs. There’s lots of different ways to design it that I don’t feel strongly about, for example GoodJob minimizes autoloading by keeping schedules separate from job classes, but I do think it is necessary to plan for scheduled jobs in a well-architected Rails application.
  • Unique Jobs, Throttles, Fuses and other Concurrency Controls. Similarly to Repeating Jobs, demand is high for everything I’d bucket under “concurrency controls”, which I’ll say covers both enqueue and dequeue complexity. And these features are tough because they sit in counterbalance to overall performance: do you want to run jobs faster or smarter? And these are the features that I think are legit because there are other features below under Queue Design that I think are bunk. There’s a lot of discernment to do!
  • Queue design and multi-queue execution pools. I do think queue design is a place where lots of people do it wrong. I believe queues should be organized by maximum total latency SLO (latency_15s, latency_15m, latency_8h) and not by their purpose or dependencies (mailers, billing, api). Nate Berkopec believes similarly. And I think that implies that execution pools (e.g. thread pools) should be able to work from multiple queues and have independent concurrency configuration (e.g. number of threads), both to ease the transition from the latter to the former, but also because it allows sharing resources as optimally as possible (having 3 separate pools that pull from "latency_15s", "latency_15m, latency_15s", and "latency_8h,*" in GoodJob’s syntax). I personally think concepts like priority or ordered-queues lead to bad queue design, so I wouldn’t sweat that. Any ordering regime more complex than first-in-first-out (FIFO) prioritizes capacity (or lack thereof) over latency. This might sound strange coming from me who champions running workloads in the webserver on tiny dynos, but it’s different in my mind: I don’t think it’s possible to meet a latency target through prioritization when there is a fundamental lack of capacity.
  • Labels. Per the previous point, though I have yet to implement this in GoodJob (soon!), I think that giving developers the option to label their jobs might break their bad habit of using queue names as functional labels, instead of what I believe queues should be appropriately used for: latency and quality-of-service thresholds. I mention it here just in case that informs Solid Queue’s design.
  • Observability. GoodJob maintains a lot of bookkeeping, keeping job and granular execution data around after execution so it can be inspected. People seem to like that, and it’s necessary to keep them around for calculating accurate latency metrics, though it all is a trade-off against performance. It makes for a fun Web Dashboard too.
  • Performance Envelope. I dunno, I mention this just because I think people spend an inordinate amount of time comparing queue backend performance and asking “do the jobs go brrrrr?” GoodJob targets the small and medium end of projects (though some big ones use it too) and prioritizes operational simplicity over performance. That works for me (and a lot of others!) but also isn’t really reflective of the scale of companies leading Rails development. There’s a tension here.
  • Making better mistakes tomorrow. I’m really proud of having a reputation for being helpful and responsive and curious in the GoodJob issue queue and discussions and various support Slacks (like Rails Link). I think there is a lot to the queue backend domain that won’t be learned by upfront analysis, and that can’t be easily bucketed into either “the library is doing it wrong” or “the developer is doing it wrong”. There’s a lot of variation! (not to mention across JRuby, etc. and various database versions). I’m able to do things with GoodJob that I think are unlikely on a 1st party Rails queue backend (like cutting a new release after every patch and fix), and I’m able to stay oriented to the people and the problem they’re trying to solve over the technological solution itself. I hope all that can be preserved as these things move upstream.
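As a small illustration of the queue design point above, here is one way to model the three pools from that example. This is a hypothetical sketch: the POOLS hash, the pool names, and pools_for are made up for illustration and are not GoodJob's (or Solid Queue's) configuration parser. The only assumption carried over from the text is that each pool lists the queues it pulls from and that "*" means any queue.

```ruby
# Hypothetical model: each pool lists the queue names it works; "*" means any queue.
POOLS = {
  fast: %w[latency_15s],
  medium: %w[latency_15m latency_15s],
  slow: %w[latency_8h *]
}.freeze

# Returns the names of the pools that would pick up a job on the given queue.
def pools_for(queue_name, pools = POOLS)
  pools
    .select { |_name, queues| queues.include?("*") || queues.include?(queue_name) }
    .keys
end

urgent_pools = pools_for("latency_15s") # => [:fast, :medium, :slow]
patient_pools = pools_for("latency_8h") # => [:slow]
```

The tightest latency queue is served by every pool, so spare capacity anywhere helps it, while the 8 hour queue only ever occupies the slow pool.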

That’s it! I’m probably forgetting stuff, so I’ll reserve the right to keep adding to this list. I’d love to keep talking about this and hope that Solid Queue will be fantastic!

Oh, and Solid Queue isn’t released yet, so if this seems compelling, use GoodJob in the meantime.


Writing Object Shape friendly code in Ruby

Update: Jean Boussier wrote a deeper explanation of how Ruby Object Shapes are implemented (and more up-to-date for Ruby 3.3, unreleased as of October 23, 2023) and when and how to optimize for them.

My rule of thumb is that one or two memoized variables in a class are fine, but more than that likely deserve a quick refactor.

My original post is below…

Ruby 3.2 includes a performance optimization called Object Shapes that changes how the Ruby VM stores, looks up, and caches instance variables (the variables that look like @ivar). YJIT also takes advantage of Object Shapes, and the upcoming Ruby 3.3 has further improvements to the performance of Object Shapes.

This is a brief blog post about how to write your own Ruby application code that is optimized for Object Shapes. If instead you’d like to learn more about how Object Shapes is implemented in Ruby, watch Aaron Patterson’s RubyConf 2022 video or read this explanation from Ayush Poddar.

Big thank you to my colleagues John Hawthorn and Matthew Draper for feedback on the coding strategies described here. And John Bachir, Nate Matykiewicz, Josh Nichols, and Jean Boussier whose conversation in Rails Performance Slack inspired it.

The general rule: define your instance variables in the same order every time

To take advantage of Object Shape optimizations in your own Ruby Code, the goal is to minimize the number of different shapes of objects that are created and minimize the number of object shape transitions that occur while your application is running:

  • Ensure that instances of the same class share the same object shape
  • Ensure that objects do not frequently or unnecessarily transition or change their shape
  • Help objects that could share the same object shape (e.g. substitutable child classes) to do so, with reasonable effort and without compromising readability and maintainability.

This succinct explanation is from Ayush Poddar, and explains the conditions that allow objects to share a shape:

New objects with the same [instance variable] transitions will end up with the same shape. This is independent of the class of the object. This also includes the child classes since they, too, can re-use the shape transitions of the parent class. But, two objects can share the same shape only if the order in which their instance variables are set is the same.

That’s it, that’s what you have to do: if you want to ensure that two objects share the same shape, make sure they define their instance variables in the same order. Let’s start with a counterexample:

# Bad: Object Shape unfriendly
class GroceryStore
  def fruit
    @fruit = "apple"
  end

  def vegetable
    @vegetable = "broccoli"
  end
end

# The "Application"
alpha_store = GroceryStore.new
alpha_store.fruit # defines @fruit first
alpha_store.vegetable # defines @vegetable second

beta_store = GroceryStore.new
beta_store.vegetable # defines @vegetable first
beta_store.fruit # defines @fruit second

In this example, alpha_store and beta_store do not share the same object shape because the order in which their instance variables are defined depends on the order the application calls their methods. This code is not Object Shape friendly.
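Shapes themselves aren't directly inspectable on a standard Ruby build, but the order in which instance variables were defined is. As a rough proxy (an assumption: this relies on CRuby returning instance_variables in definition order), the sketch below repeats the example above and shows that the two stores end up with different orderings, which is exactly the condition that prevents them from sharing a shape.

```ruby
class GroceryStore
  def fruit
    @fruit = "apple"
  end

  def vegetable
    @vegetable = "broccoli"
  end
end

alpha_store = GroceryStore.new
alpha_store.fruit
alpha_store.vegetable

beta_store = GroceryStore.new
beta_store.vegetable
beta_store.fruit

# Same class, same instance variables, different definition order.
alpha_order = alpha_store.instance_variables # => [:@fruit, :@vegetable]
beta_order = beta_store.instance_variables   # => [:@vegetable, :@fruit]
```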

Pattern: Define your instance variables in initialize

The simplest way to ensure instance variables are defined in the same order every time is to define the instance variables in #initialize:

# Good: Object Shape friendly
class GroceryStore
  def initialize
    @fruit = "apple"
    @vegetable = nil # declare but assign later
  end

  def fruit
    @fruit
  end

  def vegetable
    @vegetable ||= "broccoli"
  end
end

It’s also ok to define instance variables implicitly with attr_* methods in the class body, which has the same outcome of always defining the instance variables in the same order. Update: Ufuk Kayserilioglu informed me that attr_* do not define the instance variable until they are first called, meaning that these methods or their associated instance variables should also be declared with a value in #initialize.

Now I realize this is a very simplistic example, but that’s really all there is to it. If it makes you feel better, at GitHub where I work, we have classes with upwards of 200 instance variables. In hot code, where we have profiled, we go to a negligible effort of making sure those instance variables are defined in the same order; it’s really not that bad!
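On the attr_* point above, a quick way to check this for yourself is instance_variable_defined?. The sketch below uses a hypothetical LazyStore class with a bare attr_accessor and an EagerStore class that also assigns in #initialize; both class names are made up for illustration.

```ruby
class LazyStore
  attr_accessor :fruit
end

class EagerStore
  attr_accessor :fruit

  def initialize
    @fruit = nil # defined up front, in a predictable order
  end
end

lazy = LazyStore.new
defined_at_creation = lazy.instance_variable_defined?(:@fruit) # => false

lazy.fruit # reading returns nil but does not define @fruit
defined_after_read = lazy.instance_variable_defined?(:@fruit) # => false

lazy.fruit = "apple" # the writer is what finally defines it
defined_after_write = lazy.instance_variable_defined?(:@fruit) # => true

eager_defined = EagerStore.new.instance_variable_defined?(:@fruit) # => true
```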

Pattern: Null memoization

Using instance variables to memoize values in your code may present a challenge when nil is a valid memoized value. This is a common pattern in Ruby that is not Object Shape friendly:

# Bad: Object Shape unfriendly
class GroceryStore
  def fruit
    return @fruit if defined?(@fruit)
    @fruit = an_expensive_operation
  end
end

Rewrite this by creating a unique NULL constant and check for its presence instead:

# Good: Object Shape friendly
class GroceryStore
  NULL = Object.new
  NULL.freeze # not strictly necessary, but makes it Ractor-safe

  def initialize
    @fruit = NULL
  end

  def fruit
    return @fruit unless @fruit == NULL
    @fruit = an_expensive_operation 
  end
end
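Here is a runnable variation of that pattern, as a sketch. The call counter and the stubbed an_expensive_operation (which deliberately returns nil) are additions for illustration, and NULL.equal? is used for the check so the comparison is by object identity regardless of how the memoized value defines ==.

```ruby
class GroceryStore
  NULL = Object.new.freeze

  attr_reader :calls

  def initialize
    @fruit = NULL
    @calls = 0
  end

  def fruit
    return @fruit unless NULL.equal?(@fruit)
    @fruit = an_expensive_operation
  end

  private

  # Stub for illustration: counts invocations and returns nil on purpose.
  def an_expensive_operation
    @calls += 1
    nil
  end
end

store = GroceryStore.new
first_result = store.fruit
second_result = store.fruit

call_count = store.calls # => 1, the nil result was memoized
ivar_order = store.instance_variables # => [:@fruit, :@calls]
```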

Alternatively, if you’re doing a lot of meta or variable programming and you need an arbitrary number of memoized values, use a hash and key check instead:

# Good: Object Shape friendly
class GroceryStore
  def initialize
    @produce = {}
  end

  def produce(type)
    return @produce[type] if @produce.key?(type)
    @produce[type] = an_expensive_operation(type) 
  end
end

That’s it

Creating Object Shape friendly code is not very complicated!

Please reach out if there’s other patterns I’m missing: [email protected] / twitter.com/@bensheldon / ruby.social/@bensheldon


In defense of consensus

There’s a style of reactionary meme that takes a photo of like, empty store-shelves or a trash-strewn street, and applies the image macro “This is what Communism looks like”. But upon closer inspection (and social media lampooning), it’s a photo of America, capitalist America, very much not under communism. We’ll come back to this.

Let’s talk about “consensus”. Not a week goes by in my Bay Area tech worklife where I don’t read or hear someone dragging consensus. Consensus is pilloried: weak, indecisive, lowest-common denominator, unclear, drawn out… consensus is bad, they say.

Working in tech for a decade, I have to admit this struck me as strange the first time I heard a coworker complain about that bogeyman “consensus”. I’ve been a facilitator of consensus-based practice for 13 years. These practices, taught to me through the Institute for Cultural Affairs’ ToP (“Technology of Participation”) series, served me well when I was doing nonprofit and community work, serving on boards and facilitating offsites. And consensus-based practices have served me well in tech and business too: using its methods to do discovery, lead meetings, get feedback, and drive decision-making and action. I do strategic planning consultation too.

The consensus-based practices I’ve learned take a group through a process: beginning with a prompt or need, then collecting facts and inputs, understanding people’s reactions to them, their interpretations and implications, and ultimately describing a series of actions and commitments to take. This can be a simple conversation, or a multi-day event that builds fractally on itself: a preceding session’s final actions could be deciding on what will be the following session’s initial inputs. When I’m working with leaders to design the process, we’ll discuss what responsibilities we want to delegate to the group, and what decisions will be retained among leadership. Leadership remains accountable, in the sense that there is a legible decision-making process, which is a strong benefit of deliberative practice. That’s “consensus”.

Alternatively, the Bay Area tech process, not “consensus”, oh no, seems to follow these recipes:

  • Plan and socialize
  • Disagree and commit

I was introduced to “plan and socialize” in my second tech job, being mentored by the Director of Engineering. To “socialize” is more than informing people, it’s having conversations and helping them understand how a plan or proposal will affect their work, and getting feedback that might lead to adjustments or compensatory actions. It’s also somewhat vague: asking people to leave comments in a google doc, attend an office hours, or a loosely moderated feedback session. Decisions, once made, are also socialized: explained, defended, adjusted, or white-knuckled through.

Depending on their power level, leaders may then ask people to “disagree and commit” meaning that the (negative) feedback has been heard but those underlings must commit to carrying the plan out regardless. Suck it up, professionally, so to speak. Sometimes this is used as performance feedback: “I’m aware you’ve been sharing your dislike of the plan with coworkers. That lack of trust is undermining the business. I need you to disagree and commit”… and keep your thoughts to yourself.

Under the spotlight, these approaches look less like bold and steely decision-making, and more like mumbly plan shifting backed by blusterful threats. Like the “this is what communism looks like”-meme, the scary-othered threat is not “consensus” but simply the current reality: confused, inadequate, probationary, triangulating, embarrassing, shameful.

There’s a joke in civic tech: government tech projects may say they can’t do incremental development, but that’s exactly what happens after their big-bang waterfall launch crashes-and-burns and they end up having to fix it one piece at a time. Clay Shirky captures it in “How Willful Ignorance Doomed HealthCare.gov”:

It is hard for policy people to imagine that HealthCare.gov could have had a phased rollout, even while it is having one. At launch, on Oct. 1, only a tiny fraction of potential users could actually try the service. They generated errors. Those errors were handed to a team whose job was to improve the site, already public but only partially working. The resulting improvements are incremental and put in place over a period of months. That is a phased rollout, just one conducted in the worst possible way.

Bay Area tech has the same relationship to decisions and consensus: by “socializing” plans and decisions, leaders are trying to craft a deliberative process for information sharing, feedback gathering, and alignment building. They’re simply doing it after they’ve already written and decided on an insufficient course of action and are grasping for a fix. Ultimately they are reaching for consensus, just consensus conducted in the worst possible way.

Please think of this the next time you hear (or say) something bad about consensus. Consensus is pretty great, and even better when used from the start.

The Institute for Cultural Affairs has lots of trainings on consensus-based facilitation. The Center for Strategic Facilitation is the Bay Area’s local trainer and service provider, but there are trainers and service providers all over the globe.

There is a system known as “Formal Consensus” which gained some notability during the 1999 “Battle of Seattle” WTO protests as a means of empowering small groups, particularly indigenous representatives, by providing a limited and fixed number of “blocks” during deliberations to stop actions proposed by far larger groups. Also how my buddy organized FreeGeek Chicago. I have never heard anyone in Bay Area tech reference any of this in regards to what they mean by consensus.


Appropriately using Ruby’s Thread.handle_interrupt

Working on GoodJob, I spend a lot of time thinking about multithreaded behavior in Ruby. One piece of Ruby functionality that I don’t see written about very often is Thread.handle_interrupt. Big thanks to John Bachir and Matthew Draper for talking through its usage with me.

Some background about interrupts: In Ruby, exceptions can be raised anywhere and at any time in a thread by other threads (including the main thread; that’s how Timeout works). Even rescue and ensure blocks can be interrupted. Everywhere. Most of the time this isn’t something you need to think about (unless you’re using Timeout or rack-timeout or writing explicitly multithreaded code). But if you are, it’s important to think and code defensively.

Starting with an example:

thread = Thread.new do 
  open_work
  do_work
ensure
  close_work
end

# wait some time
thread.kill # or thread.raise

In this example, it’s possible that the exception raised by thread.kill will interrupt the middle of the ensure block. That’s bad! And can leave that work in an inconsistent state.

Ruby’s Thread.handle_interrupt is the defensive tool to use. It allows for modifying when those interrupting exceptions are raised:

# :immediate is the default and will interrupt immediately
Thread.handle_interrupt(Exception: :immediate) { close_work }

# :on_blocking interrupts only when the GVL is released 
# e.g. IO outside Ruby
Thread.handle_interrupt(Exception: :on_blocking) { close_work }

# :never will never interrupt during that block
Thread.handle_interrupt(Exception: :never) { close_work }

Thread.handle_interrupt will modify behavior for the duration of the block, and it will then raise the interrupt after the block exits. It can be nested too:

Thread.handle_interrupt(Exception: :on_blocking) do
  ruby_stuff
  
  Thread.handle_interrupt(Exception: :never) do
   really_important_work
  end
  
  file_io # <= this can be interrupted
end  

FYI BE AWARE: Remember, the interrupt behavior is only affected within the handle_interrupt block. The following code has a problem:

thread = Thread.new do 
  open_work
  do_work
ensure
  Thread.handle_interrupt(Exception: :never) do
    close_work
  end
end

Can you spot it? It’s right here:

ensure
  # <- Interrupts can happen right here
  Thread.handle_interrupt(Exception: :never) do

There’s a “seam” right there between the ensure and the Thread.handle_interrupt where interrupts can happen! Sure, it’s probably rare that an interrupt would hit right then and there, but if you went to the trouble to guard against it, it’s likely very bad if it did happen. And it happens! “Why puma workers constantly hung, and how we fixed by discovering the bug of Ruby v2.5.8 and v2.6.6”

HOW TO USE IT APPROPRIATELY: This is the pattern you likely want:

thread = Thread.new do 
  Thread.handle_interrupt(Exception: :never) do
    Thread.handle_interrupt(Exception: :immediate) do
      open_work
      do_work
    end
  ensure
    close_work
  end
end

That’s right: have the ensure block nested within the outer Thread.handle_interrupt(Exception: :never) so that interrupts cannot happen in the ensure, and then use a second Thread.handle_interrupt(Exception: :immediate) to allow the interrupts to take place in the code before the ensure block.

There’s another pattern you might also be able to use with :on_blocking:

thread = Thread.new do 
  Thread.handle_interrupt(Exception: :on_blocking) do
    open_work
    do_work
  ensure
    Thread.handle_interrupt(Exception: :never) do
      close_work
    end
  end
end

Doesn’t that have the problematic seam? Nope, because under :on_blocking there isn’t an operation taking place right there that would release the GVL (e.g. no IO).

But it does get tricky if, for example, the do_work is some Ruby calculation that is unbounded (I dunno, maybe a problematic regex or someone accidentally decided to calculate prime numbers or something). Then the Ruby code will not be interrupted at all and your thread will hang. That’s bad too. So you’d then need to do something like this:

thread = Thread.new do 
  Thread.handle_interrupt(Exception: :on_blocking) do
    open_work
    Thread.handle_interrupt(Exception: :immediate) do
      do_work
    end
  ensure
    Thread.handle_interrupt(Exception: :never) do
      close_work
    end
  end
end

See, it’s ok to nest Thread.handle_interrupt and likely necessary to achieve the safety and defensiveness you’re expecting.


Fake the algorithm til you make it

Almost exactly a decade ago I worked at OkCupid Labs where small teams (~2 engineers, a designer, and a fractional PM) would build zero-to-one applications. It was great fun and I worked mainly on Ravel! though bopped around quite a bit too.

With small teams and quick timelines, I learned a lot about where to invest time in early social apps (onboarding and core loop) and where not to (matchmaking algorithms). The following is lightly adapted from a bunch of comments I wrote on an r/webdev post a few years ago, asking for “Surprisingly simple web apps?”. My response was described as “one of the more interesting things I’ve read on reddit in 5 years”:

If you’re looking for inspiration, what is successful today is likely more complex than it was when it was originally launched. Twitter, Tinder, Facebook all likely launched with simple CRUD and associations, and only later did they get fancy algorithms. Also, Nextdoor, Grindr, Yelp [this was 2013].

I used to work on social and dating apps and it is all “fake it till you make it”. The “secret sauce” is to bucket by distance intervals, then order by random using the user id as a seed so it’s deterministic, but still just a random sort. Smoke and mirrors and marketing bluster.

You see this “Secret Sauce” marketing a lot. An app will imply that they have some secret, complex algorithm that no other competitor has. The software equivalent of “you can get a hamburger from anywhere, but ours has our secret sauce that makes it taste best”. But that can be bluster and market positioning rather than actual complexity. In truth, it’s secretly mayo, ketchup and relish. Or as I’ve encountered building apps, deterministic random.

Imagine you have a dating/social app and you want to have a match-making algorithm. You tell your users that you have the only astrologist datascience team building complex machine-learning models that can map every astronomical body in the known universe to individual personality traits and precisely calculate true love to the 9th decimal.

In truth, you:

  • For the current user, bucket other users by distance: a bucket of users that are less than 5km away; less than 25km; less than 100km; and everyone else. Early social app stuff is hard because you have a small userbase but you need to appear to be really popular, so you may need to adjust those numbers; also a reason to launch in a focused market.
  • Within each distance bucket, simply sort the users by random, seeded by the user id of the current user (Postgres setseed). That way the other people will always appear in the same order to the current user.
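
Here’s roughly what those two steps look like as a plain Ruby sketch. This is an in-memory stand-in, not the Postgres setseed version: the User struct, the bucket limits constant and the matches_for method are all names I made up for illustration.

```ruby
# Hypothetical in-memory stand-in for a users table with a precomputed distance.
User = Struct.new(:id, :distance_km)

# Upper bounds of each distance bucket; the last bucket is "everyone else".
BUCKET_LIMITS_KM = [5, 25, 100, Float::INFINITY].freeze

def matches_for(current_user_id, candidates)
  # Seeding with the current user's id makes the "random" order repeatable.
  rng = Random.new(current_user_id)

  candidates
    .group_by { |user| BUCKET_LIMITS_KM.index { |limit| user.distance_km < limit } }
    .sort_by { |bucket_index, _users| bucket_index }
    # Sort by id first so the shuffle doesn't depend on the input order.
    .flat_map { |_bucket_index, users| users.sort_by(&:id).shuffle(random: rng) }
end

candidates = [
  User.new(1, 2.0), User.new(2, 30.0), User.new(3, 4.5), User.new(4, 500.0),
  User.new(5, 12.0), User.new(6, 1.0), User.new(7, 80.0)
]

p matches_for(42, candidates).map(&:id)
```

Calling matches_for(42, candidates) again gives the same list in the same order, nearest bucket first, which is all the “algorithm” needs to feel intentional.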

It works on people’s confirmation bias: if you confidently tell someone that they are a match, they are likely to generate their own evidence to support that impression. You don’t even have to do the location bucketing either, but likely you want to give people something that is actionable for your core loop.

And remember, this is really about priorities in the early life of a product. It’s not difficult to do something complex, but it takes time and engaged users to dial it in; so that’s why you don’t launch with a real algorithm.

This is all really easy to do with just a relational database, no in-memory descent models or whatever. Here’s a simple recommendation strategy for t-shirts (from my Day of the Shirt), in SQL for Ruby on Rails:

For a given user, find all of the t-shirts they have favorited, then find all of the users that have also favorited those t-shirts and assign each a strength based on how many t-shirts they have favorited in common with the initial user, and then find all of the t-shirts those users have favorited, multiply through the counts and strengths, sum and order them. There’s your recommended t-shirts:

class Shirts < ApplicationRecord
  # ...
  scope :order_by_recommended, lambda { |user|
    joins(<<~SQL.squish).order('strength DESC NULLS LAST')
      LEFT JOIN (
        WITH recommended_users AS (
          SELECT user_id, count(*) AS strength
          FROM favorite_shirts_users
          WHERE
            shirt_id IN (
              SELECT shirt_id
              FROM favorite_shirts_users
              WHERE #{sanitize_sql_for_conditions(['user_id = ?', user.id])}
            )
          GROUP BY user_id
        )
        SELECT shirt_id, SUM(strength) AS strength
        FROM favorite_shirts_users
        LEFT JOIN recommended_users ON recommended_users.user_id = favorite_shirts_users.user_id
        GROUP BY shirt_id
      ) AS recommended_shirts ON recommended_shirts.shirt_id = shirts.id
    SQL
  }
end

That’s a relatively lightweight strategy, that you can run in real-time and if there is enough engagement can appear effective. And if you don’t have enough engagement, again, enrich it with some deterministically random results.
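
To make the counting concrete, here’s the same arithmetic as a plain Ruby sketch over a tiny made-up favorites list. The FAVORITES data and the recommended_shirts method are hypothetical and only there to show the numbers; one small difference from the SQL is that shirts with no signal score 0 here rather than NULL.

```ruby
# Hypothetical [user_id, shirt_id] pairs, like rows of favorite_shirts_users.
FAVORITES = [
  [1, :a], [1, :b],
  [2, :a], [2, :b], [2, :c],
  [3, :a], [3, :d],
  [4, :e]
].freeze

def recommended_shirts(user_id, favorites)
  my_shirts = favorites.select { |uid, _shirt| uid == user_id }.map { |_uid, shirt| shirt }

  # Strength per user: how many of my favorited shirts they have favorited too.
  # (This includes me, exactly as the SQL above does.)
  user_strength = favorites
    .select { |_uid, shirt| my_shirts.include?(shirt) }
    .map { |uid, _shirt| uid }
    .tally

  # Score per shirt: sum of the strengths of everyone who favorited it.
  scores = Hash.new(0)
  favorites.each { |uid, shirt| scores[shirt] += user_strength.fetch(uid, 0) }

  # Highest score first; ties broken by shirt id to keep the order stable.
  scores.sort_by { |shirt, score| [-score, shirt.to_s] }
end

p recommended_shirts(1, FAVORITES)
# => [[:a, 5], [:b, 4], [:c, 2], [:d, 1], [:e, 0]]
```

User 2 shares two favorites with user 1 and user 3 shares one, so shirt :c (favorited by user 2) outranks shirt :d (favorited by user 3).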

It’s basic but you can also add in other kinds of engagement and weigh them differently or whatever. It’s all good. Then you have massive success and hire a real datascience team.


How to isolate I18n configuration in a Rails Engine

GoodJob is a multithreaded, Postgres-based, Active Job backend for Ruby on Rails.

GoodJob includes an administrative web dashboard that is packaged as a mountable Rails Engine. The dashboard is currently translated into 9 different languages: English, German, Spanish, French, Japanese, Dutch, Russian, Turkish, and Ukrainian (I’d love your help improving these and translating additional languages too). Demo here: https://goodjob-demo.herokuapp.com/

I have learned quite a lot during the GoodJob development process about internationalizing a Rails Engine. I’ve previously worked on rather large and complicated localized government welfare applications, so I’m familiar with localization in the context of a Rails app. But internationalizing a Rails Engine was new, getting it right was harder than I expected, and I had trouble finding documentation and code examples in other libraries.

Overall, internationalizing a Rails Engine was nearly identical to the process of internationalizing a Rails Application as covered in the Rails Guides: using the I18n library and extracting strings from ERB views into keyed configuration files (e.g. config/locales/en.yml) and replacing them with <%= t(".the.key") %> . Simple.

The difficult part was separating and isolating GoodJob’s I18n configuration from the parent applications.

Why is it necessary to isolate I18n?

As a mountable Rails Engine, GoodJob’s dashboard sits within a parent Rails application. GoodJob should step lightly.

The I18n library provides a number of configuration options:

  • I18n.locale
  • I18n.default_locale
  • I18n.available_locales
  • I18n.enforce_available_locales , which will raise an exception if the locale is switched to one not contained within the set of available locales.

It’s possible that GoodJob’s administrative web dashboard would have different values for these than the parent Rails Application. Imagine: An English and Ukrainian speaking development and operations team administering a French and German language only website. How to do it?

Isolating configuration values

I18n configuration needs to be thread-local, so that a multithreaded webserver like Puma can serve a web request to the GoodJob Dashboard in Ukrainian (per the previous scenario) while also serving a web request for the parent Rails application in French (or raise an exception if someone tries to access it in Italian).

Unfortunately, I18n.locale is the only configuration value that delegates to a thread-local variable. All other configuration values are implemented as global @@ class variables on I18n.config. This makes sense when thinking of a monolithic application, but not when a Rails application is made up of multiple Engines or components that serve different purposes and audiences (the frontend visitor and the backend administrator). I struggled a lot figuring out a workaround for this, until I discovered that I18n.config is also thread-local.

Swap out the entire I18n.config value with your Engine’s own I18n::Config-compatible object:

# app/controllers/good_job/application_controller.rb
module GoodJob
  class ApplicationController < ActionController::Base
    around_action :use_good_job_locale

    def use_good_job_locale(&action)
      @original_i18n_config = I18n.config
      I18n.config = ::GoodJob::I18nConfig.new
      I18n.with_locale(current_locale, &action)
    ensure
      I18n.config = @original_i18n_config
      @original_i18n_config = nil
    end
  end
end

# lib/good_job/i18n_config.rb
module GoodJob
  class I18nConfig < ::I18n::Config
    BACKEND = I18n::Backend::Simple.new
    AVAILABLE_LOCALES = GoodJob::Engine.root.join("config/locales").glob("*.yml").map { |path| File.basename(path, ".yml").to_sym }.uniq
    AVAILABLE_LOCALES_SET = AVAILABLE_LOCALES.inject(Set.new) { |set, locale| set << locale.to_s << locale.to_sym }

    def backend
      BACKEND
    end

    def available_locales
      AVAILABLE_LOCALES
    end

    def available_locales_set
      AVAILABLE_LOCALES_SET
    end

    def default_locale
      GoodJob.configuration.dashboard_default_locale
    end
  end
end

Here’s the PR with the details that also shows the various complications I had introduced prior to finding this better approach: https://github.com/bensheldon/good_job/pull/1001
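
If the thread-local swap feels abstract, here’s a toy sketch of the idea in plain Ruby. To be clear, TinyI18n is a hypothetical stand-in I wrote for this example and not the real I18n module; it only mimics the one property that matters here, namely that the config object is looked up per thread.

```ruby
# Hypothetical stand-in for I18n: the config object is stored per thread.
module TinyI18n
  Config = Struct.new(:default_locale, :available_locales)
  APP_CONFIG = Config.new(:fr, %i[fr de]).freeze

  def self.config
    Thread.current.thread_variable_get(:tiny_i18n_config) || APP_CONFIG
  end

  def self.config=(value)
    Thread.current.thread_variable_set(:tiny_i18n_config, value)
  end
end

ENGINE_CONFIG = TinyI18n::Config.new(:en, %i[en uk]).freeze

# Same shape as the around_action above: swap, yield, always restore.
def with_engine_config
  original = TinyI18n.config
  TinyI18n.config = ENGINE_CONFIG
  yield
ensure
  TinyI18n.config = original
end

inside_engine = Queue.new
app_checked = Queue.new
seen = {}

engine_thread = Thread.new do
  with_engine_config do
    seen[:engine] = TinyI18n.config.default_locale
    inside_engine << true
    app_checked.pop # stay inside the swap until the other thread has looked
  end
  seen[:engine_after] = TinyI18n.config.default_locale
end

app_thread = Thread.new do
  inside_engine.pop
  seen[:app] = TinyI18n.config.default_locale
  app_checked << true
end

[engine_thread, app_thread].each(&:join)
p seen
```

While the “engine” thread sees :en, the “app” thread running at the same moment still sees :fr, and the engine thread is back to :fr once its block finishes.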

Isolating Rails Formatters

The main reason I implemented GoodJob’s Web Dashboard as a Rails Engine is because I want to take advantage of all of Rails’ developer niceties, like time and duration formatters. These are also necessary to isolate, so that GoodJob’s translations don’t leak into the parent application.

First, time helper translations should be namespaced in the yaml translation files:

# config/locales/en.yml
good_job: 
  # ...
  datetime:
    distance_in_words:
    # ...
  format: 
    # ...
  # ...

Then, for each helper, here’s how to scope them down:

  • number_to_human(number, units: "good_job.number")
  • time_ago_in_words(timestamp, scope: "good_job.datetime.distance_in_words")

By the way, there is a great repository of translations for Rails helpers here: https://github.com/svenfuchs/rails-i18n/tree/64e3b0e59994cc65fbc47046f9a12cf95737f9eb/rails/locale

Closing thoughts

Whenever I work on internationalization in Rails, I have to give a shoutout for the i18n-tasks library, which has been invaluable in operationalizing translation workflows: surfacing missing translations, normalizing and linting yaml files, making it easy to export the whole thing to a spreadsheet for review and correction, or using machine translation to quickly turn around a change (I have complicated feelings on that!).

Internationalizing GoodJob has been a fun creative adventure. I hope that by writing this, other Rails Engine developers will prioritize internationalization a little higher and have an easier time walking in these footsteps. And maybe we’ll make the ergonomics of it a little easier upstream too.


I read "Recoding America" by Jen Pahlka

| Review | ★★★★★

The last time I communicated with Jen Pahlka was in the early days of the pandemic: May 2020. No longer locked down but still heavily masked, I visited the San Francisco Ferry Building Farmers Market and bought some fava beans. In the years before the pandemic, Jen had brought her own harvest of fava beans into the Code for America offices and shared them with any staff who wanted them. That had been my very first experience with fresh fava beans: shelling and boiling and shelling them a second time. Now, with fresh fava beans in hand again, I thought of Jen amidst the pandemic turmoil. I sent her a short email hoping she was well. Jen never replied.

Jen’s book, Recoding America, is a good book. It is a laundry list of scenes and vignettes of the greatest hits of government technology, albeit in the language of elites discussing elite things in a bloodless elite way. Paired with a more hands-on manual like Cyd Harrell’s A Civic Technologist’s Practice Guide, it covers the ground of the past decade.

That decade is an interesting one. Jen and I shared roughly the same tenure at Code for America: 2011 - 2020 for her, 2012 - 2022 for me; both of us with a gap in the middle. Of course, she was the founder and CEO, whereas I was a fellow, and then an engineer, then a manager, then a director. So we saw a lot of the same things, though from different vantage points on a different journey. In addition to my formal duties, which had a finger in every program at Code for America (if not an arm and leg), there were two activities that were initially happenstance but ended up turning into personal programs of mine:

First, I reached out to new hires, especially new people managers, and offered to have a casual 1:1 and welcome them to the organization. We would chitchat and the message I would work to impart was this: the dissonance between Code for America’s competent external brand and its lived internal chaos could chew people up and was ripe for gaslighting. Instead, those new hires should remember they deserved to be here, they should trust their experience and competence, and truly everyone is winging it (some nicer and more self-aware about it than others).

I’d tell them a story about a prior illuminating executive AMA with Jen where she had shared a philosophy: instead of drawing a bullseye on the wall and trying to hit it down the center with your programs and activities, you can simply throw stuff at the wall and draw the bullseye afterwards around what sticks.

Second, my desk overlooked the executive conference room, which was a glass fishbowl with a couch. It was not infrequent for people to leave that room in a mess of hot tears. When they did and I saw, I’d send them a Slack message and gently offer to buy them a coffee at the Blue Bottle around the corner from the office. There was no ulterior motive; working through it with my leadership coach, a protege of financier Carl Icahn: I simply operationalized giving a shit about people.

Compared to writing a book or leading an organization or serving as a government executive for a year, as Intel CEO Andy Grove might observe: my activities were not high leverage. They also fall outside the bullseye drawn around the stories of the book.

We’re all heroes of our own story. The book briefly touches on an employee who describes himself as “the new guy” because he’d only been a claims processor for 17 years, far less than his more senior colleagues, and still not enough to be capable of processing the department’s most complex claims. His offhand comment is a foil for a deep dive into classifications and processing bottlenecks and mythical man months, but not much else about him as a person. Or what I imagine is the multitudes contained in that brief remark. Is it self-protection, a humblebrag, an invitation for further dialogue about those 17 years, or the years ahead? I’ll never know because the focus shifts back to the administrator and her institutional processes.

In Zen and the Art of Motorcycle Maintenance, a professor advises a student who is hopelessly writer’s-blocked; the student wants to write about the history of the United States, but is gently redirected to first focus on one brick in one building on the main street of their college town. Writer’s block broken, the student cannot stop writing, successfully building towards their initial wide vision.

The first brick of this book is possibly the few people “who had all once been part of the team that keeps Google up and running, then had come to DC to help get healthcare.gov back on track.” Easily overlooked in a footnote, they now run a private consultancy for government. The book frames their work as a series of high stakes technology and leadership interventions but not their personal stories, motivations, finances (business and personal), deal flow, engagement philosophy and practices, their loves, losses, missed opportunities, sacrifices, shames, human complexities, ironies, paradoxes. There is no crying, no dying, no problematic influencers, no sketchy investors nor strange bedfellows, no grudgeful quitting, no harassment, no union busting. Nor ziplines, chickens, joyful tears, pirate flags, PDF-form frostings from the sexy-cake shop down the street, nor fava beans for that matter.

In reviewing why too much law and policy ultimately ends up in the courts, Recoding America references a criticism by Ezra Klein: “liberals are too often missing or too timid to claim: a vision of what the law is for.” Recoding America similarly fails here to share a compelling vision of just what civic technology is for. It describes the work of pushing the rock, but without a destination in mind. Every over, under, and through of the bear hunt, without describing the bear. A road built by walking… a division of men gathering wood for a ship… you get the idea.

In all, if you care about civic technology and want to know the major story points: read the book. I am hoping it moves the Overton Window on the stories people will tell about civic tech. Please, someone write our movement’s Sex and Broadcasting, the humanistic tome of my prior career in community media and community technology. (Sociologist Karina Alexis Rider’s “Volunteering the Valley: Designing technology for the common good in the San Francisco Bay Area” suffices in the meantime.)

Lastly and memorably, Jen recounts serving on a task force addressing pandemic unemployment insurance. Writing with a vague yet startling honesty that haunts my own recollections:

The state should not have needed a task force to tell the EDD what it already knew, and it shouldn’t have needed us to secure permission to act on it. These things are never said out loud—neither the permission we had nor [administrator] Paula’s lack of it. But when we were gone, so was that permission. And soon after, for reasons that were not clear to me, a new backlog began to accrue.

This passage brings into focus the qualities that characterize my own experience with civic tech: power, permission, access, the parasocial qualities of professional relationships, and the fleeting closures of our ongoing experiments to live together in liberal democracy. I hope Jen is doing well, and though I didn’t write this explicitly in my last email, I’d love to hear from her.