September 29, 2014
Every marketing team dreams of dropping a campaign that drives extraordinary traffic and engagement, particularly during the holiday season. However, without proper preparation, that dream can quickly turn into a nightmare. Unplanned load spikes can, at best, slow down your site’s performance and degrade the customer experience. At worst, they can bring your whole site down. In that worst-case scenario you lose not only the sales from that stellar campaign but also your customers’ trust, which affects their future purchase habits and your brand’s reputation.
I often get asked to estimate the number of concurrent users an environment can handle. My response is always “it depends”. The truth is there is no simple answer to this question. The true capacity of an environment is a function of two critical elements: your infrastructure and your application code. Either element, if improperly sized or tuned, can result in lost capacity and poor performance. The demarcation between infrastructure and application is blurry. Infrastructure can often compensate for bad code, and a well-written application can make up for slow infrastructure. You can only understand the true capacity of an environment once you have run a stress test that assesses it with both elements in place.
Often the business or marketing teams have no idea what the actual capacity of their environment is. By capacity we mean the point at which the number of users hitting the site will cause performance degradation. Depending on your infrastructure and code, this number could be anywhere between 100 and millions of users. So how do you determine your true capacity? The easy answer is you test it.
Load testing, or stress testing, is an important element in your application performance toolbox. Conducting load tests regularly and after major releases helps ensure the day-to-day health of your environment and helps you prepare for your peak season. If done properly, it also tells you the true capacity of your environment.
There are two types of load tests that you can perform:
The first is called a Benchmark Load Test. In this type of test, you incrementally increase the load on a site until you reach the projected number of users you expect will visit your site. This number should exceed the projected number of sessions your environment was estimated to handle. Results will confirm whether your site can handle your projections, but not much more.
The second type of load test is called a Peak Threshold Load Test. In this type of test you keep increasing the number of users to your site until you see performance degradation. This is the number you communicate to your business team when they ask you how many concurrent users the site can handle.
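As a rough illustration of the difference between the two tests, here is a minimal Python sketch of the ramp-up logic. It is not a real load generator: `measure_response_time`, `fake_site`, the step size and the 2-second limit are all hypothetical stand-ins for whatever your load testing tool and your own definition of "degraded" provide.

```python
from typing import Callable, Optional


def find_peak_threshold(
    measure_response_time: Callable[[int], float],
    start_users: int,
    step: int,
    max_users: int,
    max_acceptable_seconds: float,
) -> Optional[int]:
    """Ramp up users step by step and return the highest user count
    that stayed within the acceptable response time.

    Returns None if the very first step was already degraded.
    """
    last_good = None
    users = start_users
    while users <= max_users:
        # Stop ramping as soon as performance degrades.
        if measure_response_time(users) > max_acceptable_seconds:
            break
        last_good = users
        users += step
    return last_good


def fake_site(users: int) -> float:
    # Hypothetical model: response time is flat up to 800 users,
    # then climbs by 0.01 seconds for every additional user.
    return 0.5 if users <= 800 else 0.5 + (users - 800) * 0.01


# Peak threshold style: keep ramping well past the projection.
threshold = find_peak_threshold(
    fake_site, start_users=100, step=100, max_users=2000,
    max_acceptable_seconds=2.0,
)

# Benchmark style: stop at the projected number of users (here 500).
benchmark = find_peak_threshold(
    fake_site, start_users=100, step=100, max_users=500,
    max_acceptable_seconds=2.0,
)

print(threshold, benchmark)
```

With this made-up model the peak threshold run reports 900 users, while the benchmark run simply confirms that the projected 500 users were handled. That mirrors the point above: a benchmark result tells you the projection holds, but not how much headroom is left.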
The type of test you run could be determined by your specific business, but in general we would recommend that B2C retailers run a Peak Threshold Load Test leading up to the holiday season. Ideally, you would run one load test early during your planning season to understand current capacity. Based on those results your teams can work on improving performance and increasing capacity leading up to a second test. That second test will give you your magic number, the best indicator of what your site can handle during the holiday season.
Where should you be running this test? If you don’t have a test environment which is a full mirror of your production environment (and few organizations do) then you should run the test in production. So how do you do the test without impacting your paying guests? Pick the point during the week when you have the fewest users. Consider telling your users what you are doing by posting a message in a banner or in a pop-up when they come to the site. It is also vital to communicate your test to your stakeholders, including the business team, your infrastructure provider and any upstream integration partners that your site connects to. For example, you want to make sure that your marketing team is not running a campaign or sale during your test. Participation from all vendors in a load test is important to its success. Tell them when the test is being done, what is being tested and how. This allows them to monitor their systems for impact and potential problems while the test is being executed.
Load testing can be an expensive service. Pricing is based on the number of users sent to the site, the number of scenarios used and the length of the test. Although it is still a large investment, the recent introduction of cloud testing services has made the service much more accessible for mid-sized merchants.
Testing the right scenarios is key to making sure you get the most for your dollar. Look at the most frequent activities performed by your users and make sure those actions make it into your plan. You don’t have to test all scenarios, but make sure you are testing the most frequently performed actions in order to get a good picture of your site performance. If your Marketing team is planning interactive campaigns as part of their holiday planning, make sure you test those scenarios as well.
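One way to pick scenarios is to count how often each action appears in your traffic and keep the most frequent ones until they cover most of your activity. The sketch below assumes you can export a list of action names from your analytics or logs; the action names and the 80% coverage target are illustrative assumptions, not a recommendation from any particular tool.

```python
from collections import Counter
from typing import Iterable, List


def pick_scenarios(actions: Iterable[str], coverage: float = 0.8) -> List[str]:
    """Return the most frequent actions, in order, until together they
    account for at least `coverage` of all recorded actions."""
    counts = Counter(actions)
    total = sum(counts.values())
    chosen: List[str] = []
    covered = 0
    for action, count in counts.most_common():
        # Stop once the chosen actions already cover enough traffic.
        if covered / total >= coverage:
            break
        chosen.append(action)
        covered += count
    return chosen


# Hypothetical sample of 100 recorded user actions.
sample_log = (
    ["browse"] * 50
    + ["search"] * 25
    + ["add_to_cart"] * 15
    + ["checkout"] * 7
    + ["wishlist"] * 3
)

scenarios = pick_scenarios(sample_log)
print(scenarios)
```

For this sample the result is `['browse', 'search', 'add_to_cart']`. Note that a purely frequency-based cut leaves out checkout here, so you would still add business-critical flows and any planned campaign scenarios by hand.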
Load testing vendors should provide you with a comprehensive report on the results of the load test. It should include successful transactions, performance metrics like page views and load times, the errors detected and the percentage of transactions in which errors were experienced, as well as infrastructure performance.
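If your vendor also gives you the raw transaction data, you can sanity-check the headline numbers yourself. The following is a small sketch under the assumption that each transaction record carries a scenario name, a load time and a success flag; the `Transaction` type and its field names are hypothetical and will differ from whatever format your vendor actually exports.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class Transaction:
    scenario: str        # hypothetical field: which scenario was run
    load_seconds: float  # hypothetical field: measured load time
    ok: bool             # hypothetical field: True if no error occurred


def summarize(transactions: List[Transaction]) -> Dict[str, float]:
    """Compute successful transactions, error percentage and average load time."""
    total = len(transactions)
    errors = sum(1 for t in transactions if not t.ok)
    return {
        "total": total,
        "successful": total - errors,
        "error_percent": 100.0 * errors / total if total else 0.0,
        "avg_load_seconds": mean(t.load_seconds for t in transactions) if total else 0.0,
    }


# Hypothetical results from four test transactions.
results = [
    Transaction("browse", 1.0, True),
    Transaction("search", 2.0, True),
    Transaction("checkout", 3.0, False),
    Transaction("browse", 2.0, True),
]

summary = summarize(results)
print(summary)
```

For these four sample records the summary shows 3 successful transactions, an error percentage of 25.0 and an average load time of 2.0 seconds.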
As part of the test, we recommend the following tools to monitor your environment:
Google Analytics – here you can monitor the number of concurrent users on the site. In addition, Google Analytics will often tell you there is a problem with the site before your monitoring systems pick it up. Sudden drops in pageviews per second and visitors usually indicate there is a problem with the site.
Hardware Monitoring – key areas to look at on your hardware during a test are CPU, disk and memory utilization.
Application Monitoring – by leveraging performance tools you can keep a close eye on critical KPIs like Calls Per Minute, Long Running Calls, Top 10 Processes, Garbage Collection Frequency and Page Load Time.
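The "sudden drop" signal described for analytics data can be approximated with a simple comparison against a recent average. This is only a sketch: it assumes you have already collected pageviews-per-second samples into a list by some means, and the 5-sample window and 50% drop ratio are arbitrary illustrative values rather than settings from Google Analytics or any monitoring product.

```python
from typing import List, Sequence


def find_sudden_drops(
    samples: Sequence[float],
    window: int = 5,
    drop_ratio: float = 0.5,
) -> List[int]:
    """Return the indexes of samples that fall below `drop_ratio` times
    the average of the preceding `window` samples."""
    drops: List[int] = []
    for i in range(window, len(samples)):
        baseline = sum(samples[i - window:i]) / window
        if baseline > 0 and samples[i] < baseline * drop_ratio:
            drops.append(i)
    return drops


# Hypothetical pageviews-per-second readings taken during a test.
pageviews = [100, 102, 98, 101, 99, 100, 40, 35, 97]

drop_indexes = find_sudden_drops(pageviews)
print(drop_indexes)
```

With the sample readings this flags indexes 6 and 7, the two readings where traffic fell to well under half of the recent average before recovering.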
Once your test is complete you will have that elusive magic number: your best prediction of the number of concurrent users your site can handle.
If you’re early in your planning process you still have time to review your environment and make improvements to your capacity to increase that number. If you’re looking for advice on how to do this, check out our Ecommerce Holiday Guide. It includes lots of best practices and tips for preparing your infrastructure, application and marketing for the busy holiday shopping season. If you’re looking for an automated solution, consider the Tenzing Site Optimizer, our automated front end optimization and CDN service.
About Elizabeth Scott and Tenzing
Elizabeth Scott is the Director of Technical Services at Tenzing, a leading managed services provider for Ecommerce merchants. Inspired by her experience working with retailers, Elizabeth created Tenzing’s Cyber Week and Holiday Season Preparedness Programs to ensure Tenzing clients are well prepared for the holiday season; much of that work drove the content of Tenzing’s holiday guide. Founded in 1998, Tenzing delivers more than scalable infrastructure, fast networks and great managed services. Tenzing combines scalable AWS infrastructure, deep Magento platform expertise, advanced managed services, and extensive industry partnerships to help Magento merchants increase revenues and deliver remarkable customer experiences. For more information, visit: www.tenzing.com.