Optimizely Intelligence Cloud: How To Use Stats Engine To A/B Test Smarter, And Faster

Optimizely Stats Engine and A/B Testing Strategies

If you’re looking to run an experimentation program to help your business test & learn, chances are you’re using Optimizely Intelligence Cloud – or you’ve at least looked at it. Optimizely is one of the most powerful tools in the game, but like any powerful tool, it’s easy to misuse if you don’t understand how it works. 

What makes Optimizely so powerful? At the core of its feature set lies the most informed and intuitive statistics engine in a third-party tool, allowing you to focus more on getting important tests live – without needing to worry that you’re misinterpreting your results. 

Much like a traditional blind study in medicine, A/B testing randomly shows different treatments of your site to different users, then compares each treatment’s efficacy. 

Statistics then help us make inferences about how effective that treatment may be over the long term. 
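To make the random-assignment idea concrete, here’s a minimal sketch of deterministic traffic splitting – our own illustration of the general technique, not Optimizely’s actual implementation. Hashing a stable visitor ID splits traffic pseudo-randomly across the population while guaranteeing each visitor sees the same treatment on every return visit:

```javascript
// Illustrative bucketing sketch (not Optimizely's actual algorithm).
// Hashing a stable visitor ID splits traffic pseudo-randomly across
// variations while keeping each visitor's assignment consistent.
function assignVariation(visitorId, variations) {
  var hash = 0;
  for (var i = 0; i < visitorId.length; i++) {
    hash = (hash * 31 + visitorId.charCodeAt(i)) >>> 0; // simple 32-bit hash
  }
  return variations[hash % variations.length];
}

var variations = ['control', 'treatment'];
assignVariation('visitor-42', variations); // same visitor, same answer every time
```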

Most A/B testing tools rely on one of two schools of statistical inference: Frequentist or Bayesian. Each has pros and cons – Frequentist statistics require the sample size to be fixed before an experiment starts, for example, while Bayesian statistics focus on making good directional decisions rather than specifying any single figure for impact. Optimizely’s superpower is that it’s the only tool on the market today to take a best-of-both-worlds approach.

The end result? Optimizely enables users to run experiments faster, more reliably, and more intuitively.

In order to take full advantage of that, though, it’s important to understand what’s happening behind the scenes. Here are 5 insights and strategies that will get you using Optimizely’s capabilities like a pro.

Strategy #1: Understand That Not All Metrics Are Created Equal

In most testing tools, a commonly overlooked issue is that the more metrics you add and track as part of your test, the more likely you are to see false positives due to random chance (in statistics, this is called the “multiple testing problem”). To keep its results reliable, Optimizely uses a series of controls and corrections to keep the odds of that happening as low as possible. 
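To see why this matters, here’s a quick back-of-the-envelope sketch – our own illustration of the uncorrected problem, not Stats Engine’s correction method. With no correction, the chance of at least one false positive compounds quickly as metrics are added:

```javascript
// With no correction, each metric tested at significance level alpha has
// an alpha chance of a false positive; across n independent metrics the
// chance of at least one false positive compounds.
function familywiseErrorRate(n, alpha) {
  return 1 - Math.pow(1 - alpha, n);
}

familywiseErrorRate(1, 0.05);  // ~0.05 - one metric, a 5% risk
familywiseErrorRate(10, 0.05); // ~0.40 - ten metrics, roughly a 40% risk
```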

Those controls and corrections have two implications when you go to set up tests in Optimizely. First, the metric you designate as your Primary Metric will reach statistical significance fastest, all other things constant. Second, the more metrics you add to an experiment, the longer your later metrics will take to reach statistical significance.

When planning an experiment, make sure you know which metric will be your True North in your decision-making process, and make that your Primary Metric. Then keep the rest of your metrics list lean by removing anything superfluous or tangential.

Strategy #2: Build Your Own Custom Attributes

Optimizely is great at giving you several interesting and helpful ways to segment your experiment results. For example, you can examine whether certain treatments perform better on desktop vs. mobile, or observe differences across traffic sources. As your experimentation program matures, though, you’ll quickly wish for new segments – these may be specific to your use case, like segments for one-time vs. subscription purchases, or as general as “new vs. returning visitors” (which, frankly, we still can’t figure out why isn’t provided out of the box).

The good news is that via Optimizely’s Project Javascript field, engineers familiar with Optimizely can build any number of interesting custom attributes that visitors can be assigned to and segmented by. At Cro Metrics, we’ve built a number of stock modules (like “new vs. returning visitors”) that we install for all of our clients via their Project Javascript. Leveraging this ability is a key differentiator between mature teams who have the right technical resources to help them execute, and teams who struggle to realize the full potential of experimentation.
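As a sketch of the idea, here’s roughly what a “new vs. returning visitor” module might look like. The storage key and attribute name are our own hypothetical choices, and the custom attribute would need to be created in the Optimizely UI first:

```javascript
// Classify a visitor as new or returning using a storage object
// (window.localStorage in the browser). The key name is hypothetical.
function classifyVisitor(storage) {
  var KEY = 'has_visited_before';
  var visitorType = storage.getItem(KEY) ? 'returning' : 'new';
  storage.setItem(KEY, '1');
  return visitorType;
}

// In Project Javascript, the result could then be pushed to Optimizely
// as a custom attribute to segment results by, e.g.:
//   window.optimizely = window.optimizely || [];
//   window.optimizely.push({
//     type: 'user',
//     attributes: { visitor_type: classifyVisitor(window.localStorage) }
//   });
```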

Strategy #3: Explore Optimizely’s Stats Accelerator

One often-overhyped testing tool feature is the ability to use “multi-armed bandits”, a type of machine learning algorithm that dynamically changes where your traffic is allocated over the course of an experiment, to send as many visitors to the “winning” variation as possible. The issue with multi-armed bandits is that their results aren’t reliable indicators of long-term performance, so the use cases for these types of experiments are limited to time-sensitive scenarios like sales promotions.
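For intuition, the classic bandit approach can be sketched as an “epsilon-greedy” rule – a generic illustration, not the algorithm Optimizely uses: occasionally explore a random variation, otherwise send traffic to the current leader.

```javascript
// Epsilon-greedy bandit sketch: explore with probability epsilon,
// otherwise exploit the variation with the best observed conversion rate.
function conversionRate(arm) {
  return arm.visitors ? arm.conversions / arm.visitors : 0;
}

function chooseArm(arms, epsilon, rand) {
  rand = rand || Math.random;
  if (rand() < epsilon) {
    return Math.floor(rand() * arms.length); // explore a random variation
  }
  var best = 0; // exploit the current leader
  for (var i = 1; i < arms.length; i++) {
    if (conversionRate(arms[i]) > conversionRate(arms[best])) best = i;
  }
  return best;
}
```

The catch described above is visible even in this toy version: the leader’s sample keeps growing while losing variations are starved of the traffic needed to estimate their long-term performance reliably.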

Optimizely, though, has a different type of bandit algorithm available to users on higher plans – Stats Accelerator (now known as the “Accelerate Learnings” option inside Bandits). In this setup, instead of trying to dynamically allocate traffic to the highest-performing variation, Optimizely dynamically allocates traffic to the variations most likely to reach statistical significance quickest. This way, you can learn faster, and retain the replicability of traditional A/B test results.

Strategy #4: Add Emojis to Your Metric Names

At first glance, this idea probably sounds out of place, even inane. However, a key aspect of making sure you’re reading the right experiment results starts at making sure that your audience can understand the question. 

Sometimes, despite our best efforts, metric names become confusing (wait – does that metric fire when the order is accepted, or when the user hits the thank-you page?), or an experiment has so many metrics that scrolling up and down the results page leads to total cognitive overload.

Adding emojis to your metric names (targets, green checkmarks, even the big money bag could work) can result in pages that are far more scannable. 

Trust us – reading out results will feel much easier.

Strategy #5: Re-consider Your Statistical Significance Level

Results are deemed conclusive in the context of an Optimizely experiment when they’ve reached statistical significance. Statistical significance is a tough mathematical concept, but essentially it indicates how confident you can be that your observations reflect a real difference between two populations, and not just random chance. 

Optimizely’s reported statistical significance levels are “always valid” thanks to a mathematical concept called sequential testing – this actually makes them far more reliable than those of other testing tools, which are prone to all sorts of “peeking” issues if you read them too soon.

It’s worth considering what level of statistical significance you deem important to your testing program. While 95% is the convention in the scientific community, we’re testing website changes, not vaccines. Another common choice in the experimental world: 90%.  But are you willing to accept a little more uncertainty in order to run experiments faster and test more ideas? Could you be using 85% or even 80% statistical significance? Being intentional about your risk-reward balance can pay exponential dividends over time, so think this through carefully.
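A rough way to reason about the trade-off – our own back-of-the-envelope sketch, ignoring Optimizely’s sequential corrections: if none of your tested ideas actually moved the metric, your significance level alone determines how many false “winners” you’d still declare.

```javascript
// Expected false winners across a batch of A/B tests where the null
// hypothesis is true for every test (i.e. no idea actually worked).
function expectedFalseWinners(numTests, sigLevel) {
  return Math.round(numTests * (1 - sigLevel));
}

expectedFalseWinners(100, 0.95); // 5 false winners per 100 null tests
expectedFalseWinners(100, 0.80); // 20 - four times the false-winner risk
```

The flip side, of course, is that the lower threshold lets each experiment conclude sooner, so the right choice depends on how much a false winner costs your business.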

Read More About Optimizely Intelligence Cloud

These five quick principles and insights will be incredibly helpful to keep in mind while using Optimizely. As with any tool, it boils down to making sure you’ve got a good understanding of everything happening behind the scenes, so you can use the tool efficiently and effectively. With that understanding, you can get the reliable results you’re looking for, when you need them. 
