
Using Betas to Deploy New Features Safely

In the software business, changing a running system is dangerous. As the saying goes, “If it ain’t broke, don’t fix it.” Unfortunately, even if code were flawless, progress marches ever forward and new features are always being added. One of the worst feelings for a developer is learning that something you just shipped has caused a problem, or even worse, an incident.

For companies like Shopify that practice continuous deployment, our code changes many times every day. We have to de-risk new features so we can ship them smoothly and confidently without impacting the million-plus merchants using our platform.

Beta flags are just one approach to feature development that provides us with a variety of unique benefits:

  • Reduce the blast radius of changes, should they not go as planned, by rolling out the Beta to a percentage of the Subject population.
  • Choose exactly when a change becomes active in production. Without beta flags, changes take effect at the time of deployment.
  • Instantly roll back the feature.
  • Ship new code paths that are inactive by default, allowing developers to test those code paths in production.

Anatomy of a Beta Flag

The term “beta” is a bit overloaded in the software business, where it is even used to refer to established products. Let’s define some primitives for clarity.



Subject: The entity you want to control a feature against. For multi-tenant SaaS applications, this is usually the model corresponding to your tenant. At Shopify, it is typically our concept of a store. When designing, consider a polymorphic approach so that betas can be applied to more than one kind of Subject.

BetaIdentifier: This is often a simple string that names the feature you are developing. If you use a string rather than an auto-incrementing integer, be wary of case sensitivity and of accidentally reusing the same string in the future. Metadata can be associated with the identifier and is worth considering for internal documentation and tooling: for example, a high-level description of the feature, a list of chat channels to notify about the feature’s rollout, a list of owners, descriptions of the behavior when the feature is enabled or disabled, etc.
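As a rough illustration of that advice, the Ruby sketch below normalizes an identifier’s case when it is created and carries the kind of metadata described above. The `BetaIdentifier` class and its field names are hypothetical and are not Shopify’s implementation.

```ruby
# Hypothetical value object: a normalized beta name plus documentation metadata.
class BetaIdentifier
  attr_reader :name, :description, :owners, :notify_channels

  def initialize(name:, description: "", owners: [], notify_channels: [])
    # Normalize case and whitespace so "My-Feature" and "my-feature " can't
    # silently become two different betas.
    @name = name.to_s.strip.downcase
    @description = description
    @owners = owners
    @notify_channels = notify_channels
  end
end

identifier = BetaIdentifier.new(
  name: "  My-Cool-New-Feature ",
  description: "Hypothetical example feature",
  owners: ["example-team"],
  notify_channels: ["#example-team-betas"]
)

puts identifier.name # prints "my-cool-new-feature"
```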

BetaFlag: At the lowest level, this is a small piece of data associated with a Subject. This can be implemented as a “Subject has_many BetaFlag” relationship. Inside the BetaFlag, we have a BetaIdentifier and typically some created_at/updated_at timestamps.

For a given BetaFlag, we can:

  • Check whether an instance of Subject has this BetaIdentifier. If so, the feature is turned on for that Subject.
  • Grab a list of all Subjects with this BetaIdentifier.

These checks give us explicit, per-Subject feature toggling.
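Here is a minimal, in-memory Ruby sketch of the BetaFlag primitive and the two queries above. In a real application this would more likely be a database table, perhaps indexed on the Subject and beta name. The `BetaFlagStore` class and its method names are hypothetical.

```ruby
# Hypothetical in-memory stand-in for a "Subject has_many BetaFlag" table.
BetaFlag = Struct.new(:subject_id, :beta_name, :created_at)

class BetaFlagStore
  def initialize
    @flags = []
  end

  # Explicitly turn a beta on for one Subject (idempotent).
  def apply(subject_id, beta_name)
    return if flagged?(subject_id, beta_name)

    @flags << BetaFlag.new(subject_id, beta_name, Time.now)
  end

  # Explicitly remove the beta from one Subject.
  def remove(subject_id, beta_name)
    @flags.reject! { |flag| flag.subject_id == subject_id && flag.beta_name == beta_name }
  end

  # "Does this Subject have this BetaIdentifier?"
  def flagged?(subject_id, beta_name)
    @flags.any? { |flag| flag.subject_id == subject_id && flag.beta_name == beta_name }
  end

  # "Which Subjects have this BetaIdentifier?"
  def subjects_with(beta_name)
    @flags.select { |flag| flag.beta_name == beta_name }.map(&:subject_id)
  end
end

flags = BetaFlagStore.new
flags.apply(1, "my-cool-new-feature")
flags.apply(2, "my-cool-new-feature")
flags.flagged?(1, "my-cool-new-feature")   # => true
flags.subjects_with("my-cool-new-feature") # => [1, 2]
```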

BetaRollout: This is data that lives independently of any particular Subject. Inside BetaRollout, we have:

  • beta_name: a BetaIdentifier, so we know which feature we’re dealing with.
  • percentage_rollout: an integer (0..100) reflecting the percentage of Subjects you wish to enable the BetaFlag for.
  • A method to calculate whether a given Subject should be considered “rolled out.”

For a given BetaRollout, we can ask, “Does an instance of Subject have this BetaIdentifier?” If so, the feature is turned on for the Subject.

If the BetaRollout record does not exist, we will assume that the feature is turned off.

Here’s an example of a performant way to implement the BetaRollout#enabled? method:


require "digest"

# Deterministically decide whether a Subject falls inside a percentage rollout.
def enabled?(rollout_percentage, beta_identifier, subject_identifier)
  return true if rollout_percentage >= 100
  return false if rollout_percentage <= 0

  percentage_hash(beta_identifier, subject_identifier) < rollout_percentage
end

# Map each (beta, Subject) pair to a stable bucket in 0..99.
def percentage_hash(beta_identifier, subject_identifier)
  Digest::MD5.hexdigest("#{beta_identifier}#{subject_identifier}").to_i(16) % 100
end

By calculating a digest of the two identifiers and converting it to an integer modulo 100, we give every (BetaIdentifier, Subject) pair a stable bucket from 0 to 99. Because the BetaIdentifier is part of the digest input, each beta rollout selects a different subset of Subjects as its percentage grows. Why does this matter? It prevents the potential adverse effects of rollouts from consistently landing on the same subset of Subjects.

This implementation also has a useful invariant: as the percentage rises (for example, from 0% to 11%, then from 11% to 20%), the Subjects that already saw the feature continue to see it. A Subject’s digest modulo 100 stays fixed even as rollout_percentage changes, so any bucket below the old percentage is also below the new one. This is vital for a good user experience, because it would be confusing to see a feature disappear and reappear (seemingly at random) as the rollout percentage rises.

The net effect is that each BetaRollout has its own consistent, growing set of Subjects experiencing the feature on the way from 0% to 100% (or a consistently shrinking set on the way from X% back to 0%).
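The invariant is easy to check empirically. The self-contained sketch below reuses the same MD5 bucketing as `percentage_hash`. It computes which of 1,000 hypothetical Subject IDs are enabled at 11% and at 20%, and confirms that raising the percentage only adds Subjects. It also compares the 11% cohorts of two different betas. The helper names `rollout_enabled?` and `enabled_subjects` are our own.

```ruby
require "digest"

# Same bucketing as percentage_hash above: a stable value in 0..99 per pair.
def percentage_hash(beta_identifier, subject_identifier)
  Digest::MD5.hexdigest("#{beta_identifier}#{subject_identifier}").to_i(16) % 100
end

def rollout_enabled?(rollout_percentage, beta_identifier, subject_identifier)
  return true if rollout_percentage >= 100
  return false if rollout_percentage <= 0

  percentage_hash(beta_identifier, subject_identifier) < rollout_percentage
end

# Which of the given Subject IDs see the beta at this percentage?
def enabled_subjects(rollout_percentage, beta_identifier, subject_ids)
  subject_ids.select { |id| rollout_enabled?(rollout_percentage, beta_identifier, id) }
end

subject_ids = (1..1_000).to_a
at_11 = enabled_subjects(11, "my-cool-new-feature", subject_ids)
at_20 = enabled_subjects(20, "my-cool-new-feature", subject_ids)
other_beta_at_11 = enabled_subjects(11, "another-feature", subject_ids)

# Invariant: raising the percentage only ever adds Subjects.
puts((at_11 - at_20).empty?) # prints "true"

# A different BetaIdentifier buckets the same Subjects differently.
puts at_11 == other_beta_at_11
```

The first check holds by construction, because any bucket below 11 is also below 20. The second line will almost always print `false` because the BetaIdentifier shifts every Subject’s bucket, although nothing strictly guarantees that two cohorts differ.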

Bundling These Concepts

Once we define a BetaIdentifier, we can do the following things:

  • Apply the new feature to a particular Subject by creating and associating a new record of BetaFlag to the Subject.
  • Roll out the feature to X% of Subjects by creating a BetaRollout and setting it to a particular % value.

Both concepts are tied together by the BetaIdentifier. The question of whether a feature is enabled for a given Subject then looks something like this:

def beta_enabled?(beta_identifier)
  betas_for_subject.include?(beta_identifier) || BetaRollouts.enabled?(subject_identifier, beta_identifier)
end

In English, this equates to “Does the Subject have the flag explicitly (via BetaFlag) or implicitly (via BetaRollout)?”
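Putting both primitives together, here is a small in-memory Ruby sketch of that check. `BetaRegistry`, `apply_flag`, and `set_rollout` are hypothetical names. The rollout branch reuses the MD5 bucketing shown earlier and treats a missing BetaRollout as 0%.

```ruby
require "digest"
require "set"

# Hypothetical in-memory registry holding both primitives.
class BetaRegistry
  def initialize
    @flags = Hash.new { |hash, subject_id| hash[subject_id] = Set.new } # BetaFlags per Subject
    @rollouts = {} # beta_name => percentage (0..100); a missing key means no BetaRollout
  end

  def apply_flag(subject_id, beta_name)
    @flags[subject_id] << beta_name
  end

  def set_rollout(beta_name, percentage)
    @rollouts[beta_name] = percentage.clamp(0, 100)
  end

  # Enabled explicitly (BetaFlag) or implicitly (BetaRollout)?
  def beta_enabled?(subject_id, beta_name)
    explicit = @flags.key?(subject_id) && @flags[subject_id].include?(beta_name)
    explicit || rolled_out?(subject_id, beta_name)
  end

  private

  def rolled_out?(subject_id, beta_name)
    percentage = @rollouts.fetch(beta_name, 0) # no BetaRollout record: feature off
    return true if percentage >= 100
    return false if percentage <= 0

    Digest::MD5.hexdigest("#{beta_name}#{subject_id}").to_i(16) % 100 < percentage
  end
end

registry = BetaRegistry.new
registry.apply_flag(42, "my-cool-new-feature")
registry.beta_enabled?(42, "my-cool-new-feature") # => true (explicit flag)
registry.beta_enabled?(7, "my-cool-new-feature")  # => false (no flag, no rollout)

registry.set_rollout("my-cool-new-feature", 100)
registry.beta_enabled?(7, "my-cool-new-feature")  # => true (implicit via rollout)

registry.set_rollout("my-cool-new-feature", 0)
registry.beta_enabled?(7, "my-cool-new-feature")  # => false
registry.beta_enabled?(42, "my-cool-new-feature") # => true
```

The last two lines preview the rollback problem discussed next: turning the rollout down to 0% does nothing to Subjects that have the flag applied directly.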

What happens when we want to revert the beta flag? Perhaps the feature is buggy. There are two cases to consider:

  • The flag was rolled out using the % mechanism only.
  • The flag was rolled out by manually applying it to specific instances of the Subject.

If the flag was rolled out with the % mechanism only, the rollback process is extremely straightforward: change the percentage_rollout value of its BetaRollout to 0. Be aware that any Subject with the flag applied directly will still see the feature.

If the flag was rolled out to perhaps thousands of Subjects by direct application, we find ourselves in a far more challenging situation. We have an ongoing incident and tens of thousands of records in the database that we can’t immediately remove. In the best case, we have to write a careful script to modify the database for those thousands of Subjects. In the worst case, it simply isn’t possible!

It’s tempting to refer to the Beta primitives directly throughout our code, since they seem to offer all the flexibility we want. But we’ve just discovered a situation where we can’t easily roll back. How do we proceed?

Taking Things a Step Further

Rather than having all of our code refer to Beta directly, we should write an abstraction one layer higher. In our hypothetical situation, we can imagine something called a “FeatureSet” that is defined like this:

class FeatureSet
  # ...
  def my_cool_new_feature_enabled?
    Beta.enabled?('my-cool-new-feature') &&
      !Beta.enabled?('my-cool-new-feature-opt-out')
  end
end

With another layer above the primitives like this, we’ve just introduced flexibility for developers to:

  • Apply a feature directly to a specific Subject (or thousands of them).
    For example: Add a feature to a production Subject for testing, or for a particularly unique rollout that cannot be %-based.
  • Apply the feature to a random %-based sample of all other Subjects.
    For example: A typical rollout slowly enables a feature for an ever-increasing population of Subjects.
  • Apply an escape hatch for specific Subjects that encounter problems with the feature by applying the my-cool-new-feature-opt-out flag directly to the Subject.
    For example: Some bugs might not be severe enough to roll back the feature for all Subjects, but we want to allow a specific Subject to disable it.
  • Apply a kill switch for all Subjects by rolling out the my-cool-new-feature-opt-out flag to 100%.
    For example: A Beta has been applied directly to thousands of Subjects and can’t easily be removed, but we need to halt the feature immediately.
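The following self-contained Ruby sketch shows how the opt-out beta provides both an escape hatch and a kill switch. `SimpleBeta` is a hypothetical stand-in that models only direct flags and 100% rollouts, with no percentage bucketing. To keep the example runnable, this `FeatureSet` takes the Beta store and Subject ID as constructor arguments; that design is an assumption.

```ruby
require "set"

# Hypothetical stand-in for the Beta primitives. It models direct flags and
# 100% rollouts only; percentage bucketing is omitted to keep the sketch short.
class SimpleBeta
  def initialize
    @flags = Set.new         # [subject_id, beta_name] pairs
    @full_rollouts = Set.new # beta names rolled out to 100%
  end

  def apply(subject_id, beta_name)
    @flags << [subject_id, beta_name]
  end

  def roll_out_fully(beta_name)
    @full_rollouts << beta_name
  end

  def enabled?(subject_id, beta_name)
    @flags.include?([subject_id, beta_name]) || @full_rollouts.include?(beta_name)
  end
end

# The higher-level abstraction: callers only ever ask the FeatureSet.
class FeatureSet
  def initialize(beta, subject_id)
    @beta = beta
    @subject_id = subject_id
  end

  def my_cool_new_feature_enabled?
    @beta.enabled?(@subject_id, "my-cool-new-feature") &&
      !@beta.enabled?(@subject_id, "my-cool-new-feature-opt-out")
  end
end

beta = SimpleBeta.new
beta.apply(1, "my-cool-new-feature")
beta.apply(2, "my-cool-new-feature")

# Escape hatch: Subject 2 opts out; Subject 1 is unaffected.
beta.apply(2, "my-cool-new-feature-opt-out")
FeatureSet.new(beta, 1).my_cool_new_feature_enabled? # => true
FeatureSet.new(beta, 2).my_cool_new_feature_enabled? # => false

# Kill switch: rolling the opt-out to 100% overrides every direct flag.
beta.roll_out_fully("my-cool-new-feature-opt-out")
FeatureSet.new(beta, 1).my_cool_new_feature_enabled? # => false
```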

If we imagine a runaway train of an incident caused by rolling out your feature, we would hopefully be able to stop it immediately by applying the kill switch. Power, safety, speed.

Beta features often begin with a few simple qualifications. Inevitably, they tend to evolve into something more like:

def my_cool_new_feature_enabled?
  subject.domiciled_in_usa? &&
    subject.using_advanced_plan? &&
    Beta.enabled?('my-cool-new-feature') &&
    !Beta.enabled?('my-cool-new-feature-opt-out')
end

Using a higher-level abstraction lets a feature’s eligibility change quickly as business requirements change. In our experience, things that appear to be simple yes/no Beta flags usually evolve to include other conditions. This is all to say that you should avoid calling the lower-level primitives directly. It’s easier to update one method than heaps of scattered Beta.enabled?(…) invocations. This generalizes well to exposure within an API: if the feature evolves while N clients talk directly to the primitives, we can’t necessarily update all of them simultaneously. Exposing the higher-level abstraction instead avoids that problem.
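To sketch the API point, the hypothetical Ruby code below resolves the eligibility and beta checks inside one `FeatureSet` method and exposes only the resulting decision in a JSON payload. The `Subject` struct, its attributes, and the `features` payload shape are assumptions for illustration.

```ruby
require "json"

# Hypothetical Subject exposing the eligibility attributes used above.
Subject = Struct.new(:id, :country, :plan, :betas, keyword_init: true) do
  def domiciled_in_usa?
    country == "US"
  end

  def using_advanced_plan?
    plan == "advanced"
  end

  # Stand-in for the Beta primitives: just a list of enabled beta names.
  def beta_enabled?(name)
    betas.include?(name)
  end
end

class FeatureSet
  def initialize(subject)
    @subject = subject
  end

  # Every eligibility rule lives here, in one place.
  def my_cool_new_feature_enabled?
    @subject.domiciled_in_usa? &&
      @subject.using_advanced_plan? &&
      @subject.beta_enabled?("my-cool-new-feature") &&
      !@subject.beta_enabled?("my-cool-new-feature-opt-out")
  end

  # API payload: resolved decisions only, no beta names or rules.
  def as_json
    { "features" => { "my_cool_new_feature" => my_cool_new_feature_enabled? } }
  end
end

store = Subject.new(id: 1, country: "US", plan: "advanced", betas: ["my-cool-new-feature"])
puts JSON.generate(FeatureSet.new(store).as_json)
# prints {"features":{"my_cool_new_feature":true}}
```

If eligibility later changes (say, another plan becomes eligible), only `my_cool_new_feature_enabled?` needs to change. Clients that read `features.my_cool_new_feature` are unaffected.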

Some Things to Remember

This is only one approach to developing new features safely and efficiently, but it’s one we’ve found to be highly effective at Shopify.

The formulations described here are illustrative, and your implementation will almost certainly differ.

Data Structure Differences and Reconciliation

If code path A generates data that is incompatible with code path B, you will most likely have problems when you roll back unless you’ve thought through the rollback experience upfront. Always consider what happens when you switch paths.

This beta pattern optimizes for the ability to change code paths in production quickly, but the data those code paths leave behind is often forgotten.

Not all Features Can be Developed Sanely in this Manner

Similar to the previous point, some features change your underlying data model. Even if we’ve planned for migrating between both code paths, you might encounter a feature where switching back leaves things in a nonsensical state. Some changes simply don’t make sense to roll back. Some features are better developed iteratively rather than switched over all at once.

Not all Features Can be Rolled Back Without a User Disruption

Imagine a new feature that users have come to depend on: it may be impossible to roll it back without leaving those users confused. For example, suppose you introduced a new business concept that wasn’t fully fleshed out. You want to prevent future users from using or seeing it, but you still need to accommodate the current set of users who have already used it and come to rely on it.

In that scenario, you can allow a subset of users to keep using the feature while preventing access for all users who haven’t used it yet. You could accomplish that by manually applying the BetaFlag to that set of Subjects while turning the BetaRollout down to 0%.
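Here is a minimal Ruby sketch of that grandfathering step. It assumes a hypothetical list of Subjects known to have used the feature. The sketch pins each of them with an explicit BetaFlag first and then drops the rollout to 0%. Doing it in that order means existing users never lose the feature partway through.

```ruby
require "digest"
require "set"

BETA_NAME = "my-cool-new-feature"

# Same bucketing scheme as the percentage rollout: stable value in 0..99.
def bucket_for(subject_id)
  Digest::MD5.hexdigest("#{BETA_NAME}#{subject_id}").to_i(16) % 100
end

explicit_flags = Set.new # Subjects with a BetaFlag applied directly
rollout_percentage = 50  # current BetaRollout value

# Explicit flag OR inside the percentage rollout. The lambda reads the current
# values of explicit_flags and rollout_percentage each time it is called.
feature_enabled = ->(subject_id) do
  explicit_flags.include?(subject_id) || bucket_for(subject_id) < rollout_percentage
end

subject_ids = (1..200).to_a
# Hypothetical usage data: Subjects that have actually used the feature so far.
used_feature = subject_ids.select { |id| feature_enabled.call(id) }.first(10)

# Step 1: pin existing users with explicit BetaFlags.
used_feature.each { |id| explicit_flags << id }
# Step 2: stop exposing the feature to anyone new.
rollout_percentage = 0

puts(used_feature.all? { |id| feature_enabled.call(id) })                  # prints "true"
puts((subject_ids - used_feature).none? { |id| feature_enabled.call(id) }) # prints "true"
```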

Avoid Reusing BetaIdentifier Names

Suppose you are using a magic string as a handle. If you reuse a previously used handle, you may still have BetaFlag and BetaRollout records for it in your database. If those features stayed enabled without proper deprecation (or deletion from the database), then as soon as you use the same handle again, the flag will immediately be active for several Subjects, which is probably not what you expect. In practice, though, this is extremely rare.
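One way to guard against this is to keep a record of every handle ever registered, including retired ones, and refuse to hand out a retired name again. The Ruby sketch below is our own illustration of that idea. `BetaNameRegistry` is hypothetical, and it applies case normalization so that a reused name can’t sneak back in with different capitalization.

```ruby
require "set"

# Hypothetical registry that remembers every beta handle ever used, so a retired
# handle can't be reused while stale BetaFlag/BetaRollout rows may still exist.
class BetaNameRegistry
  class ReusedNameError < StandardError; end

  def initialize
    @active = Set.new
    @retired = Set.new
  end

  # Returns the normalized name, or raises if it was ever used before.
  def register(name)
    key = normalize(name)
    if @active.include?(key) || @retired.include?(key)
      raise ReusedNameError, "#{key} has been used before"
    end

    @active << key
    key
  end

  def retire(name)
    key = normalize(name)
    @active.delete(key)
    @retired << key
  end

  private

  def normalize(name)
    name.to_s.strip.downcase
  end
end

registry = BetaNameRegistry.new
registry.register("my-cool-new-feature")
registry.retire("my-cool-new-feature")

begin
  registry.register("My-Cool-New-Feature")
rescue BetaNameRegistry::ReusedNameError => e
  puts e.message # prints "my-cool-new-feature has been used before"
end
```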

Treat Beta Rollouts as if They Were Deploys

Although you’ve already shipped the underlying code for the beta, it typically lies dormant until the flags are applied as described above. One of the most impactful things we’ve done at Shopify is to treat %-based rollouts as being just as important as deploys.

When someone changes a %-based rollout at Shopify, our #operations chat channel is notified of the change and of why it was made. If exceptions start appearing or our metrics begin to degrade, we now have another data point to consider. Previously, we were operating in the dark with respect to beta rollouts. Instantly changing a program’s running code paths sounds a lot like a “deploy.”

When someone applies a beta flag directly to a Subject at Shopify, the teams that developed the feature are notified with a Slack message, thanks to the metadata we associated with the BetaIdentifier described above. This helps prevent errant beta applications and keeps the teams informed about “opt-out” beta applications. Armed with the knowledge of exactly what is changing, the teams developing the feature can better steward it.
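The sketch below shows the shape of both notifications in Ruby. `BetaAdmin`, the `notifier` interface (anything that responds to `post(channel, text)`), and the channel names are hypothetical. A real implementation would wrap whatever chat integration you use and pull owner channels from the BetaIdentifier metadata.

```ruby
# Hypothetical audit hooks around the two kinds of beta changes.
class BetaAdmin
  OPERATIONS_CHANNEL = "#operations"

  # notifier: any object responding to #post(channel, text).
  # owner_channels: beta_name => [channel, ...], e.g. from BetaIdentifier metadata.
  def initialize(notifier:, owner_channels:)
    @notifier = notifier
    @owner_channels = owner_channels
    @rollouts = Hash.new(0) # beta_name => percentage; missing means 0%
    @flags = Hash.new { |hash, beta_name| hash[beta_name] = [] }
  end

  # Treat a percentage change like a deploy: announce who, what, and why.
  def set_rollout(beta_name, percentage, actor:, reason:)
    previous = @rollouts[beta_name]
    @rollouts[beta_name] = percentage
    @notifier.post(
      OPERATIONS_CHANNEL,
      "#{actor} changed #{beta_name} rollout #{previous}% -> #{percentage}%: #{reason}"
    )
  end

  # Tell the owning teams whenever a flag is applied directly to a Subject.
  def apply_flag(beta_name, subject_id, actor:)
    @flags[beta_name] << subject_id
    @owner_channels.fetch(beta_name, []).each do |channel|
      @notifier.post(channel, "#{actor} applied #{beta_name} to subject #{subject_id}")
    end
  end
end

# Test double that records messages instead of sending them.
class RecordingNotifier
  attr_reader :messages

  def initialize
    @messages = []
  end

  def post(channel, text)
    @messages << [channel, text]
  end
end

notifier = RecordingNotifier.new
admin = BetaAdmin.new(
  notifier: notifier,
  owner_channels: { "my-cool-new-feature-opt-out" => ["#checkout-team"] }
)
admin.set_rollout("my-cool-new-feature", 10, actor: "alice", reason: "start rollout")
admin.apply_flag("my-cool-new-feature-opt-out", 42, actor: "bob")
notifier.messages.each { |channel, text| puts "#{channel}: #{text}" }
# Prints:
# #operations: alice changed my-cool-new-feature rollout 0% -> 10%: start rollout
# #checkout-team: bob applied my-cool-new-feature-opt-out to subject 42
```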

Understand the Testing Nuances

Once you’ve built your new feature, you’ve added unit tests for the new code paths and probably even some integration tests. These tests all look roughly similar: first, enable the feature, then exercise the new code path. Everything works. Or does it?

Depending on how deeply the new feature touches the system, unit tests may be enough. But you have to consider that, apart from the new tests you’ve added, your entire test suite is exercising the old code path. If the feature is set to 100% (and sits there for months), then almost all of your tests are testing a code path that no longer runs in production. We’ve seen this surface when removing beta flags after rollouts: suddenly, tens of thousands of tests fail because the existing tests never exercised the new code path. Usually this is a minor inconvenience, and some additional effort must be spent auditing and adjusting the tests. In our experience, it has generally not been a problem for small features.

Finally, each feature flag adds yet another permutation to the set of possible code paths our code may run. For some especially hairy features, we’ve chosen to run whole test files (for example, specific controller tests, not the entire suite) twice, once with the feature enabled and once with it disabled, for an extra degree of confidence.
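Here is a framework-agnostic Ruby sketch of the “run it twice” idea. A helper forces a beta off and then on around the same test body. `BetaOverrides`, `each_beta_state`, and `render_banner` are hypothetical. In Minitest or RSpec you would more likely generate one test class or shared example per state.

```ruby
# Hypothetical per-test override store consulted by beta checks.
module BetaOverrides
  @overrides = {}

  class << self
    # Force a beta to `value` for the duration of the block, then restore.
    def with(beta_name, value)
      had_previous = @overrides.key?(beta_name)
      previous = @overrides[beta_name]
      @overrides[beta_name] = value
      yield
    ensure
      if had_previous
        @overrides[beta_name] = previous
      else
        @overrides.delete(beta_name)
      end
    end

    def enabled?(beta_name)
      @overrides.fetch(beta_name, false)
    end
  end
end

# Hypothetical code under test with an old and a new path.
def render_banner
  BetaOverrides.enabled?("my-cool-new-feature") ? "new banner" : "old banner"
end

# Run the same test body once per beta state.
def each_beta_state(beta_name)
  [false, true].each do |state|
    BetaOverrides.with(beta_name, state) { yield state }
  end
end

results = []
each_beta_state("my-cool-new-feature") do |enabled|
  results << [enabled, render_banner]
end
p results # prints [[false, "old banner"], [true, "new banner"]]
```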

As a pragmatic tip for the developer working on the feature, it can help to hard-code the feature to “true” and see which tests fail on branch CI. Those failures may point to missed considerations and edge cases in the feature’s implementation.
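If editing the method by hand on a branch is inconvenient, a hypothetical environment switch such as `FORCE_BETAS_ON` can achieve the same thing on CI. The variable name and the `BetaCheck` module are our own illustration. The `env:` parameter exists only to make the sketch easy to exercise.

```ruby
# Hypothetical beta check with a CI-only override. Set FORCE_BETAS_ON=1 on a
# throwaway branch to see which tests silently assumed the old code path.
module BetaCheck
  def self.enabled?(beta_name, flagged:, env: ENV)
    return true if env["FORCE_BETAS_ON"] == "1"

    flagged.include?(beta_name)
  end
end

BetaCheck.enabled?("my-cool-new-feature", flagged: [], env: {})                          # => false
BetaCheck.enabled?("my-cool-new-feature", flagged: [], env: { "FORCE_BETAS_ON" => "1" }) # => true
```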

Clean up Your Work

It’s common to see code rolled out to 100% while the beta flags still exist weeks or even years later. If the old code path is still valid, it’s sometimes good practice to keep the beta flags around for a couple of months in case something comes up. Over time, however, teams may move on from the project, and the pre-beta code path becomes dead code. Ultimately, this is technical debt that has to be cleaned up.
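Finding cleanup candidates can be partly automated. The Ruby sketch below flags betas that have sat at 100% for longer than a threshold. The `Rollout` struct, the use of `updated_at` as a proxy for “reached 100%”, and the 60-day default (standing in for “a couple of months”) are assumptions. The dates are illustrative.

```ruby
# Hypothetical rollout records: beta name, current percentage, last change time.
Rollout = Struct.new(:beta_name, :percentage, :updated_at, keyword_init: true)

SECONDS_PER_DAY = 24 * 60 * 60

# Betas that have sat at 100% for longer than max_age_days are cleanup
# candidates: the old code path behind them is probably dead.
def stale_betas(rollouts, now:, max_age_days: 60)
  cutoff = now - max_age_days * SECONDS_PER_DAY
  rollouts
    .select { |rollout| rollout.percentage >= 100 && rollout.updated_at < cutoff }
    .map(&:beta_name)
end

now = Time.utc(2024, 6, 1)
rollouts = [
  Rollout.new(beta_name: "old-checkout", percentage: 100, updated_at: Time.utc(2023, 1, 15)),
  Rollout.new(beta_name: "new-search", percentage: 100, updated_at: Time.utc(2024, 5, 20)),
  Rollout.new(beta_name: "partial-reports", percentage: 30, updated_at: Time.utc(2022, 3, 1))
]
p stale_betas(rollouts, now: now) # prints ["old-checkout"]
```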

Conclusion

These primitives and patterns have enabled Shopify to ship a wide range of work: extensive new features, small and large refactors, and toggled performance improvements, to name a few. Moreover, armed with these primitives, we have the confidence to ship boldly, knowing that we have mechanisms to control our code even after it has been deployed. The power this level of control gives you cannot be overstated.