
Payment Gateway A/B Testing Guide


Let’s quickly wrap up what we have done so far to be able to run a successful A/B test on your payment gateway setup:

  • We have chosen the metrics (KPIs) we want to optimize our payment gateway setup for, such as Acceptance Rate or Conversion Rate
  • We have done the research and investigated what might be hurting our payment gateway performance
  • Based on the research insights, we have come up with a hypothesis on how to improve our payment gateway performance metrics

Now it is time to launch the A/B test and see whether our hypothesis holds. But before we jump into the testing itself, we need to run some numbers and do a bit of math and statistics.

A/B testing is a scientific method of validating optimization ideas, so it requires some scientific rigour. In practice, anyone can set up and launch a test, but not everyone should. Whether a test makes sense is determined by the number of users going through your payment gateway daily and the number of users actually paying through your payment setup.

But why do you need these numbers in the first place? Your test results need to be representative of the whole population of your users. You are running your A/B test only on a portion of your user base, so if there are not enough users and conversions in your test sample, the whole test is just not representative. In the A/B testing world, we call it “statistical significance.” To achieve statistical significance at the end of your test, you need to provide enough users and conversions to test in the first place. 
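To make “statistical significance” more concrete, here is a minimal sketch (in Python with SciPy, using made-up visitor and payment counts) of the kind of two-proportion z-test that sits behind most frequentist A/B test calculators:

```python
from math import sqrt
from scipy.stats import norm

# Hypothetical numbers: visitors and successful payments per variant
visitors_a, payments_a = 4_000, 320   # current payment gateway setup
visitors_b, payments_b = 4_000, 380   # modified setup being tested

p_a = payments_a / visitors_a
p_b = payments_b / visitors_b

# Pooled conversion rate and standard error for a two-proportion z-test
p_pool = (payments_a + payments_b) / (visitors_a + visitors_b)
se = sqrt(p_pool * (1 - p_pool) * (1 / visitors_a + 1 / visitors_b))

z = (p_b - p_a) / se
p_value = 2 * norm.sf(abs(z))          # two-sided p-value

print(f"conversion A = {p_a:.2%}, conversion B = {p_b:.2%}")
print(f"z = {z:.2f}, p-value = {p_value:.3f}")
print("significant at 95%" if p_value < 0.05 else "not significant at 95%")
```

A p-value below 0.05 corresponds to the 95% confidence level discussed later in this guide.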

We strongly recommend that you read more about statistical significance in A/B testing to grasp the idea fully. 

There are a few quick and straightforward ways to verify whether you have what it takes to launch an A/B test. First, have a look at this A/B testing cheat sheet to get a general idea of how many conversions/payments you need per day to run a test and how long it may take.

[Image: A/B testing cheat sheet] (Image Source)

As you can see, if your payment gateway is seeing only 5 payments per day (or fewer), you will not be able to run a valid A/B test. If your daily number of successful payments is 6, you can run your test, but it will still take you about 35 days to see the results. 

If you still think you have what it takes to run the test, it is time to crunch the numbers. We strongly recommend using the CXL pre-test analysis calculator.

The numbers you need to provide include:

  • Weekly traffic (sessions or users) – i.e., the number of people visiting your payment gateway URL or app screen each week. You can easily take this number from your web analytics tool. We recommend using the number of “users” instead of sessions, page views, or visits. After all, you are running a test for real people, not for some abstract number of page views.
  • Weekly conversions – i.e., the number of successful payments your payment gateway setup is seeing right now. You can get this metric from your web analytics tool if you have set up conversion tracking, or you can pick this number from the administration panel of your payment gateway. If your payment provider does not offer such metrics, you have yet another reason to switch to another payment gateway.
  • Number of variations – i.e., the number of payment gateways or setups you would like to test. We strongly recommend that you always stick to two variants and never test more than two at once. Testing more variants requires much more traffic and conversions, and drastically increases the probability of statistical error.

Below you will see additional numbers that are auto-generated based on the figures you provide above:

  • Baseline conversion rate – i.e., the metric that tells you how efficient your payment gateway is at converting users into customers. This is most likely the metric you will be optimizing with your A/B test. You can easily calculate it yourself by dividing “weekly conversions” by “weekly traffic.”
  • Confidence level – i.e., the number that influences how many users and conversions you need to run a valid test. A 95% confidence level tells you that at the end of the test, you want a 95% probability that your results are not accidental. In other words, there is a 95% probability that if you replicate the test or implement your findings, you will see similar (or better) results. It also means there is a 5% probability that your test results are not valid, and that implementing them will actually decrease conversion. Here you can read more about why you should use a 95% statistical confidence level for your tests and what it means exactly.
  • Statistical power – the probability of getting statistically significant test results at level alpha (α) if there is an effect of a certain magnitude. In other words, it is the ability to see a difference between test variations when a difference actually exists. This is a highly technical metric, and you do not need an in-depth understanding of it to run successful A/B tests. Nevertheless, we encourage you to have a look at this article explaining the role of statistical power in A/B testing.
  • Number of weeks running the test – i.e., how long you will have to run your test to achieve 95% statistical confidence at the given “minimum detectable effect” level and the given number of users taking part in the test. This is only the minimum duration of your test; you should never end a test earlier than this unless the test is broken.
  • Minimum detectable effect – i.e., how much change in conversion rate your test needs to generate to achieve 95% statistical confidence with a given number of users taking part in the test. The more users take part in the test, the less uplift your test needs to generate, and vice versa.
  • Visitors per variant – i.e., how your test length and minimum detectable effect vary depending on the number of users taking part in the test. The more users in the test, the faster you will see results, and the smaller the change in conversion rate needed to reach significance.

This is basically all you need to know to validate the numbers before you run an A/B test: exactly how long the test needs to run, how many users should take part in it, and, most importantly, whether you have enough users and conversions to run a valid test.
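If you want to sanity-check what a pre-test calculator tells you, here is a minimal sketch of the math behind it (in Python with SciPy; the traffic and conversion figures are hypothetical). It estimates the visitors needed per variant for a given baseline conversion rate, minimum detectable effect, 95% confidence, and 80% power, and converts that into a test duration in weeks:

```python
from math import ceil
from scipy.stats import norm

# Hypothetical inputs - replace with your own analytics numbers
weekly_traffic = 10_000       # users reaching the payment gateway per week
weekly_conversions = 800      # successful payments per week
mde = 0.10                    # minimum detectable effect: 10% relative uplift
alpha = 0.05                  # 95% confidence level
power = 0.80                  # 80% statistical power
variants = 2                  # control + one variation

p1 = weekly_conversions / weekly_traffic        # baseline conversion rate
p2 = p1 * (1 + mde)                             # conversion rate we want to detect

z_alpha = norm.ppf(1 - alpha / 2)
z_beta = norm.ppf(power)

# Standard sample-size formula for comparing two proportions
n_per_variant = ((z_alpha + z_beta) ** 2 *
                 (p1 * (1 - p1) + p2 * (1 - p2)) / (p2 - p1) ** 2)

weeks = ceil(variants * n_per_variant / weekly_traffic)

print(f"baseline conversion rate: {p1:.2%}")
print(f"visitors needed per variant: {ceil(n_per_variant):,}")
print(f"estimated test duration: {weeks} week(s)")
```

With these example figures, detecting a smaller uplift or adding more variants quickly pushes the required traffic, and therefore the test duration, up.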

Of course, there are plenty of other A/B test calculators you can use to double-check your test numbers. For example, check out this A/B test duration calculator by VWO.

What if it turns out that you do not have the numbers, or you just feel overwhelmed by the complexity of all the stats behind A/B testing? Well, we have a workaround for A/B testing noobs.

If you have a small website or your business is only gaining traction, you will most likely have to face the brutal truth: you do not have enough users or conversions (or both) to run an A/B test that reaches 95% statistical significance. Alternatively, you would have to run your test for a couple of months to get valid results, which does not make much business sense either.

Everything we have covered so far about checking if you can run a valid A/B test and the 95% statistical significance falls into the “frequentist statistics” category. This is the most popular statistics model taught in schools and used in most A/B testing tools on the market.

But there is also a less popular statistics model that can be applied to A/B testing, known as Bayesian statistics. This model is much easier for non-technical and business people to grasp. However, it comes with some pitfalls.

Pros of using the Bayesian model:

  • Easier to understand for business people, especially when it comes to presenting test results
  • Requires less data to run and validate the test
  • You decide when to end the test and how much error risk you can bear
  • It takes into account not only the number of conversions but also the revenue generated by each test variant

Cons of using the Bayesian model:

  • Less accurate than the frequentist model, carrying a higher risk of error
  • An inexperienced optimizer may be tempted to stop the test too early and call a winner

Let’s have a look at Bayesian A/B testing in action. The best way to do this is to head over to this great Bayesian A/B test calculator.

The only numbers you have to provide to evaluate the results of your test are:

  • Number of users that will see version A of your payment gateway
  • Number of users that will see version B of your payment gateway 
  • Number of conversions / successful payments for version A
  • Number of conversions / successful payments for version B

That’s it. Everything else in this calculator is optional, but we are going to cover it anyway. First, let’s see what you get after hitting the Make calculation button. The first thing that you will see will be the horizontal chart with two bars – green and red.

This chart is by far the simplest presentation of A/B test results you can get for your tests. 

  • The green bar is always shown for the winning test variation and has a % metric attached to it. This % indicates that variation’s chance of winning / outperforming the other variation.
  • The red bar is always shown for the losing test variation and also has a % metric attached to it, indicating its (lower) chance of outperforming the winner.

Below the chart, you will see another number that should be self-explanatory. The most important thing, though, is the two % numbers on the chart. In the above example, there is a 91% chance that variation A will have a higher conversion rate in real life than variation B. By real life, we mean implementing the winning variation after the test and seeing the business impact.

This is as simple as it gets. You have a 91% chance of being right with your test and around a 9% risk of being wrong. And it is your call to either implement the test results or consider the test unsuccessful, based on these simple probability metrics.

So if you do not have enough data to reach the 95% probability, you will have to make a decision based on less favourable odds. For example, you may have to decide based on a 70% probability that the test results are valid and a 30% risk that they are misleading. It all comes down to a simple business decision based on calculated risk and probability. Simple as that.
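For reference, here is a minimal sketch of what such a Bayesian calculator does under the hood (in Python with NumPy, assuming uniform Beta(1, 1) priors; the user and conversion counts are hypothetical). Each variant’s conversion rate gets a Beta posterior, and the chance of one variant beating the other is estimated by sampling:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical test data
users_a, conversions_a = 2_000, 150   # version A of the payment gateway
users_b, conversions_b = 2_000, 178   # version B of the payment gateway

# Beta(1, 1) prior updated with observed payments -> Beta posterior per variant
samples = 200_000
post_a = rng.beta(1 + conversions_a, 1 + users_a - conversions_a, samples)
post_b = rng.beta(1 + conversions_b, 1 + users_b - conversions_b, samples)

prob_b_wins = (post_b > post_a).mean()

print(f"P(B has a higher conversion rate than A) = {prob_b_wins:.1%}")
print(f"P(A wins) = {1 - prob_b_wins:.1%}")
```

The resulting percentage corresponds to the “chance of winning” number shown next to the green and red bars.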

But there is one more cool use case for the Bayesian calculator. It can actually build a business use case for you, taking into account not only the raw number of conversions but also their actual monetary value and the impact on your revenue.

In the optional section, you can provide the following metrics:

  • How long your test is going to run, e.g., 14 days
  • What is the average order/transaction/cart or subscription value?
  • Minimum revenue yield in 6 months, meaning how much revenue uplift you would like to see 6 months after implementing your test results

If you provide these 3 numbers, the calculator will build a business use case for you that will look something like this:

The most important thing you should pay attention to is the section under the chart. Apart from the previous probability of success and error, you will see “the effect on revenue.” On top of pure probability, you can now use an actual revenue value to drive your decisions. Now it is not about being right or wrong, but it is about losing or earning money. 

Back to the first chart and table, you should also see something like this:

So if your plan is to make €45,000 of additional revenue as a result of your test, or this is the quota you need to break even, you will also get a probability of hitting this number 6 months after the test. 
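To give an idea of how such a revenue projection can be built, here is a rough sketch based on our own simplifying assumptions (not necessarily the calculator’s exact method): it reuses the Beta posteriors from the previous snippet and multiplies the conversion-rate uplift by expected traffic and average order value over 6 months.

```python
import numpy as np

rng = np.random.default_rng(42)

# Same hypothetical test data as in the previous snippet
users_a, conversions_a = 2_000, 150
users_b, conversions_b = 2_000, 178
post_a = rng.beta(1 + conversions_a, 1 + users_a - conversions_a, 200_000)
post_b = rng.beta(1 + conversions_b, 1 + users_b - conversions_b, 200_000)

# Hypothetical business inputs
average_order_value = 90.0      # EUR per successful payment
monthly_traffic = 10_000        # users reaching the gateway per month
months = 6
revenue_target = 45_000.0       # EUR of extra revenue wanted within 6 months

# Extra revenue if B is rolled out: conversion-rate uplift * traffic * order value
extra_revenue = (post_b - post_a) * monthly_traffic * months * average_order_value

print(f"expected 6-month revenue effect: EUR {extra_revenue.mean():,.0f}")
print(f"probability of at least EUR {revenue_target:,.0f} uplift: "
      f"{(extra_revenue >= revenue_target).mean():.1%}")
```

This turns the test outcome into the same kind of question the calculator answers: not just “which variant wins,” but “how likely am I to hit my revenue goal if I roll it out.”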

We encourage you to learn more about the differences between the Bayesian and frequentist statistical models and their impact on A/B testing from these great resources:

And a must-read for everyone putting their hands on A/B testing – Most Common Testing Statistics Mistakes and How to Avoid Them. 

Source: https://securionpay.com/blog/payment-gateway-ab-testing/
