14 Factors Slowing Your Website & How to Handle Them

bizpic-tablet
Techno Functional Consultant at HighRadius

 

“250 milliseconds, either slower or faster, is close to the magic number for competitive advantage on the Web” – Harry Shum, Executive Vice President of Technology and Research, Microsoft.

According to a report published by the site optimization website Strangeloop (now Radware), 57% of online customers will abandon a website after waiting 3 seconds for a website to load out of which 80% will not return ever again and half of this fraction will tell others about their bad experience. Add to this the fact that Internet users are reported to have faulty perceptions of time spent ‘waiting’ (15% slower than actual load time), we can imagine how crucial role Web Performance (Speed , Stability and Availability) plays in determining the success or failure of an enterprise in this exceedingly online world as it directly impacts user retention , online feedback, number of downloads (mobile apps) , conversions, and thus, revenue.

The quantitative rewards of having a fairly fast and robust online presence can be imagined by the fact that President Obama’s fundraising site raised an additional $34 million for his campaign in 2011, after increasing page speed by 60% and that Intuit saw a 14% increase in conversions after cutting down its load time to half (Ref.). On the contrary, any adverse behavior brings with it the cost of lost opportunity, potential loss of customer loyalty and a severe damage to the brand reputation.

An ideal way to mitigate such risks will be to estimate the impact of web performance and downtime on the business revenue using production load statistics juxtaposed with business revenue model and use the findings to optimize the capacity planning process and appropriate funding for performance tuning/testing activities accordingly.

Large enterprises mostly have dedicated teams to avert any ‘slowness’ in production , armed with state of art 24 X 7 real time monitoring and diagnostic tools equipped with dashboards and alerts/triggers , self-healing mechanisms, graceful fail-overs (like Oracle RAC implementation) and multiple levels of redundancy and data backups to take over the primary configuration in case things go south.

The precursor to such monitoring and arguably a much more vital step is to test this chunk of code and all the pieces of production hardware where this is going to be deployed for predefined performance goals w.r.t throughput and latency through a battery of tests (load, stress, scalability, soak) usually performed by a dedicated Performance Testing and Engineering Team (or the development team itself) over a period of time before release. Since prevention is always better than cure, almost 90% of  performance issues in production can be prevented if the hardware, configurations and code are thoroughly tested together by performance experts and the results carefully analyzed (avoiding wrongful extrapolations) for any anomalies and deviation from the standard expected behavior agreed upon by all the stakeholders.

Easier said than done, there are a plethora of challenges that the Performance Engineering team usually faces in this process , the most important ones being the absence of a 100% mirror environment of production (due to cost constraints), lack of enough unique test data for applications under heavy load (in tune of millions of hits per hour), stubbing or virtualization of test data in absence of enough unique test data (which fails to simulate exact production like realistic loads) and lastly the common practice of selectively targeting only business critical and high volume business flows for load and stress testing due to time constraints while neglecting other business flows which become potential threats as they can trigger a slowdown and an eventual crash over a time period by a simple memory leak or unexpected load behavior.

How the end user perceives the application/website performance will depend on a host of factors like the end device being used for access, network speed in that geographic location, code behavior and hardware capacity at different levels (back end, middle ware and front end), the overall user load at that point of time and a multitude of configuration settings (like thread pool configuration, message queue lengths etc) at the application, database and web servers.

A Performance Engineer in true sense has the responsibility of tuning all these components and configuration settings within them so that working together they deliver peak performance and do so at the minimum cost, almost with the precision of conducting a 1000 pieces symphony orchestra. Though, this might sound overwhelming at first and admittedly takes years to master, the Performance Experts know by experience that most of these performance issues can be usually attributed to a few broad areas and  improperly configured parameters which might have gone unnoticed or were overlooked by the developer and tester during normal development cycle.

The remaining part of this article is a non-exhaustive list of such probable problem areas and configurations to inspect when facing a performance degradation. In a holistic way agnostic to any specific technology, database or middle ware/web server (but mostly revolving around Java Enterprise implementations), the list is based on my personal experiences in similar roles and also talks about detecting and resolving the related performance issues using appropriate tools and methodology, assuming basic understanding of performance engineering related terms and concepts by the reader. This can also be used in parts as a high level checklist by a Performance Engineer to refer to before finally stamping the code and hardware configuration fit to go live.

  1.  Memory Leaks: Memory leaks are one of the most common reasons causing slow response times over time in a Java Code, arising out of improper JVM configurations and malicious objects. If left unchecked, such leaks can cause a complete shutdown (OutofMemory error). There are multiple ways to check for memory leaks as per tool and time availability like interpreting the output of  -XX:+PrintGCDetails on the console directly or monitoring GC behavior on Introscope (saw-tooth pattern). For digging deeper, taking heap dumps (jmap) and analyzing them on tools like IBM PMAT or Eclipse MAT plugin can give an in-depth information on the classes and objects causing the leak. Memory leaks can be resolved by handling the discovered classes/objects at code-level by the developer and tuning the MemArgs Settings for the JVM.  JVM Tuning is an extensive topic in itself requiring a through understanding of memory management in Java and beyond the scope of this article but on a high level involves using the most suited GC algortihms for minor and major GC  along with setting optimum values of Xmx, Xms, Survivor Ratio, NewSize and MaxTenuringThreshold parameters as per the expected allocation rate. It is a common practice to test different JVM configuration settings for the same load and select the one giving best performance at the end. One can refer to this wonderful guide for in-depth information on JVM tuning for best performance. Tools: JMX Clients (JConsole, JVisualVM), JStat (Command line tool), GCViewer, HP JMeter, Introscope LeakHunter, Profilers (hprof, AProf).
  2. Thread Blocks & Deadlocks: Abnormally high CPU usage, requests getting timed out and abysmally slow processing can be caused by thread blocks (one thread occupying the lock and preventing others from obtaining it) , deadlocks (thread A needs thread B’s lock to continue its task while thread B needs thread A’s lock to continue at the same time) or waiting threads at the CPU level. Best way to identify and fix a thread issue is to take thread dumps (also known as Java Dumps) at frequent intervals (jstack utility) and analyzing them using tools like TDA (Thread Dump Analyzer) to check the number of blocked threads, waiting threads or a potential deadlock pattern (such pattern finding can be done manually too but tools make life easier). getThreadInfo returns information difficult to acquire by thread dumps like the amount of time that the threads waited or were blocked along with a list of threads that have been inactive for abnormally long time-periods. Tools: Thread Dump Analyzer, Introscope (can take thread dumps directly), Samurai (open source), JConsole.
  3.  Gridlocks: Too much of synchronization to avoid thread deadlocks might inadvertently ‘single-thread’ the application. It can lead to very slow response times coupled with low CPU utilization as each of the threads reach the synchronized code and enter a waiting state. This can be confirmed by checking thread dumps for a large number of threads in wait state (Waiting or Timed Waiting) across multiple dumps taken consecutively. Gridlocks can be avoided by trying to use immutable resources and using synchronization sparingly, at the code level.
  4. Thread Pool Size:  In web applications thread pool size determines the number of concurrent requests that can be handled at any given time. A properly sized thread pool allows as many requests to run as the hardware and application can support without straining the hardware. A larger than optimum thread pool leads to unnecessary overhead on the CPU which can further slowdown the response while a smaller than optimum pool size will lead to lot of threads waiting for execution, thus again slowing down transactions. An ideal way for determining the pool size is to estimate the average number of users in the system using Little’s Law (average serving time multiplied by arrival rate) and size the thread pool to match this number (keeping buffer for occasional spikes) while also ensuring that this size is ably supported by the hardware (i.e. number of CPUs at disposal). If there is a sudden increase in user load (more than the expected spikes) or even a gradual growth over time (for example with increasing customer base), the thread pool size needs to be realigned accordingly for the response to be equally fast as earlier.
  5. Database Connection Pool Configuration: Database connections are relatively expensive to create. Hence they are created beforehand and used whenever access to database is needed. This also limits the amount of load coming from a particular application to the database as too much of load can crash the database and impact other applications (in case of a shared database). This pool size has to be optimized according to the load expected and hardware capacity of the database server. As is the case with thread pools, a small DB Connection pool will force the business transactions to wait for a connection to be available. This can be confirmed by monitoring the queue time and length , both of which will be increasing rapidly. Also, majority of the business transactions will wait on aDatasource.GetConnection () call while Database will show low resource utilization.  On the contrary a very large (larger than optimum) pool size will allow too much load to flow to the database and can slowdown business transactions across all application servers accessing this database. This is characterized by high SQL query processing time and high CPU utilization on the database, observed from DB logs or AWR Reports. The application will wait on DB query executions (PreparedStatement.Execute()). The golden value for pool size is just below the saturation point in general.
  6. Poor/Corrupt Table Indexes:  This is a very common cause for SQL queries taking a long time to process. Once we have the query causing the slowness identified (by say AWR report or db logs) we need to list out all the tables being called in that query and check for indexes on all those tables. More often than not, if the query has shown a sudden degradation in performance and the query in itself has not been modified , it can be fixed simply by rectifying/rebuilding the faulty indexes.
  7.  Badly Designed Stored Procedures/SQL Queries: Many times the existing Stored Procedures are modified (making calls to new tables, new join statements etc) to support new functionalities. If there is a noticeable degradation in query processing time when compared to a previous codebase, these modified procedures need to be checked as first suspects. It is very possible that some performance impacting bit has crept in the query (increased number of sortings or full table scans) which is causing such behavior. There are multiple query optimization tools available in the market like DBOptimizer by Idera, SQLSentry PlanExplorer, queryProfiler by DBForgeStudio etc but the best way to get around such issues is to get hold of a DB expert (if you are not one) and let him/her tune the query !
  8. Non-effective bundling / Excessive http requests: While testing front-end systems, if there is observed a degradation in page load times at the browser, one of the easiest ways to do root cause analysis is to observe the Network tab in Developer view on Chrome browser while keeping the page under inspection open. This not only lists all the objects (like stylesheets,  images etc) being downloaded with their size and total count but also the time taken by the browser to download each of them. This list can be sorted for maximum time consuming objects or object size and then compared with the older code base before degradation (if available) for the number of components being downloaded and their respective load times to pinpoint the cause. An increased number of components getting downloaded (without any change in functionality) will require more number of http(s) requests adding to the page load times and indicating non-effective bundling/packaging of objects which need to be fixed at the code level. For a long running test, we can use the Web Page Diagnostics tool offered by HP Loadrunner to get similar metrics on component level response times for a webpage.
  9. Faulty Methods in Specific Transactions:  If there are specific transactions in test or production which appear to be problematic w.r.t. response times, we can trace their execution paths and component response times using Introscope Transaction Tracer. Transaction tracer not only filters poorly performing transactions based on given filters but also helps to identify the cause (components) causing such behavior within those transactions. Introscope also provides the provision of Dynamic Instrumentation on the fly (attaching byte code to return more metrics from a particular method selected for deeper investigation). The identified components need to be isolated and fixed (mostly at code level) in order to get things back to normal.
  10. Poor load distribution / Improper F5 Configuration: Load balancers are used for increasing the concurrency and reliability of large systems (by redundancy) but if  they do not have enough resources or are not configured properly (for example incorrect weights assigned in Weighted Round Robin (WRR) or Weighted Least Connection (WLC) algorithms), they can actually decrease the performance of application by putting too much traffic on a single server in the cluster or acting as single point of failures.  Load balancing issue leads to unequal distribution of load across servers and the same can be easily confirmed by monitoring CPU  usage, memory utilization and logging activity on all the servers sharing the load w.r.t. each other (the instances not taking load will be in ideal state). This overload of traffic on the remaining servers if left unchecked can cause severe performance degradation and might eventually lead to a crash because of not enough hardware available to handle the excess load. Apart from the load balancer configuration, such a behavior can also be caused by all the server instances not coming up properly to take up the load after a restart or new code deployment/ configuration change.
  11. Too Many Context Switches:  A high context-switching rate  (switching of CPU from one thread to another) indicates an excess of threads competing for the processors on the system. Context switching is a computationally intensive task and an unusually high switching rate will adversely impact the performance of multiprocessor computers. Excessive switching can be caused by excessive page faults caused by insufficient memory or a processor bottleneck w.r.t load . This can be monitored using sar -w and proc/[pid]/status on the server or by enablingRSTATD on the server being tested and observing the metric ‘Context switch rate’ in LoadRunner. Context switching rate can be checked by reducing the number of active threads on the system by the use of thread pooling or disabling hyper-threading if enabled.  The ultimate solution is to bring in a more powerful processor  or simply adding an additional one to the existing.
  12. JDBC Connection Leaks:  Connection leaks happen when we open a connection to the database from our application and forget to close it, or for some reasons it doesn’t get closed by the code. Connection leaks saturate the connection pool for new connections to be established causing major timeout and slowdown issues. Such leaks can be investigated by analyzing thread dumps for total number of created, active and idle threads or monitoring JDBC pool utilization using Introscope. Once confirmed, there are multiple ways of resolving this issue based on the implementation. For example using Weblogic Profile Connection Leak mechanism to pinpoint the root cause,  using Spring JDBC templates or using connection pool implementations which offer the option of forcefully closing connections (or notify about connection leakage) based on predefined conditions (the TomCat connection pool has logAbandoned and suspectTimeout properties to configure pool to log about possible leak suspects).
  13. Saturated Message Queues (MQs): A much larger than expected load can saturate the message queues and cause timeouts at the application level. If there is a message queue implementation in place and a slowdown is observed beyond a particular load during testing, it is a good idea to check the queues for saturation and flush them (in test) if saturated. Queue lengths can be monitored using Sitescope or by logging into the MQ servers. In case of  repeated saturation, the queues need to be optimally reconfigured as very large queue lengths can also cause a slowdown by hogging too much of memory. Relevant attributes to configure is Max queue depth in IBM Websphere MQ.
  14. Saturated Disk Space by excessive logging: As trivial as it might sound, it is quite a possibility that the disk space gets saturated earlier than expected (on account of some recurring error making the logs grow 100 times faster than usual or an inadvertent change in logging mode to error/debug from info) causing the response times to slow down before starving and the server thus refusing to process any further requests. With hundreds of metrics to monitor and inspect, this is also one area which tends to be easily overlooked at times. Disk  space utilization can be checked using du/df commands or monitored through Sitescope in real time. Once detected, we need to figure out the cause (look for errors and logging mode), fix it, clean up the disk space (or take backup) and then start all over again.

In addition to the problem areas listed above, it is a good idea to keep an eye on the Stall Count metric on Introscope where consistently high values imply slow backend, periodically high values imply load related bottlenecks in the system and progressively increasing count implies resource leaks. For more advanced diagnostics, custom Probe Build Directories (PBDs) can be written and deployed and the preferred metrics thus configured can be monitored via Introscope graphically. 

References:

  1. http://javaeesupportpatterns.blogspot.in/2011/02/jdbc-pool-leak-using-weblogic-90.html
  2. http://businesswireindia.com/news/news-details/wily-technology-introduces-new-tools-introscope-transaction-tracer-int/2491
  3. https://plumbr.eu/blog/io/acquiring-jdbc-connections-what-can-possibly-go-wrong
  4. https://docops.ca.com/ca-apm/9-6/en/using/ca-apm-metrics/the-five-basic-metrics#TheFiveBasicMetrics-StallCount
Advertisements

2100 Solutions launches NEW Cloud Based, Ruby on Rails website!

Here is a URL for a Ruby on Rails website I just launched using the Godaddy Cloud Server today: http://192.169.168.247:8080/

I have tried AWS, Open Shift, Dreamhost, etc. Godaddy was the easiest.

The IP:port still needs to be updated so that the Domain Name for 2100Solutions.com appears.

One last little piece of magic left to go….but I couldn’t wait to show it…

Ruby, Ruby On Rails, SQLite3, RVM, rbenv, bundler, all source code, launched from the command prompt @ Godaddy VPS and it all works.

Key Needs for most Data Management Project

Thoughts and Key Strategies for Test Data Management:

Data Acquisition

  • Data Profiling and Distribution
    • Data that includes confidential or sensitive information can still be shared, however. There are a number of steps researchers can take to protect subjects’ privacy2:
      • withholding part of the data
      • statistically altering the data in ways that will not compromise secondary analyses
      • requiring engineers who seek data to commit to protect privacy and confidentiality
      • providing data access in a controlled site, sometimes referred to as a data enclave.
  • How is it provided
    • The data have its own device name and data name. The data is stored in a separate file with binary format. Each file has “n” number of columns for data, which generally represent various data features.
    • from stored historical data
    • Active systems / projects
      • from dynamic / fresh data
    • Web site/platforms
    • Mobile / OTA
  • Automated, Tool or Manual Data Preparation
    • Automated Accelerators
    • Tools such as Informatica, etc.
    • Manual teams – most used – and can be up to 40-50% of the project spend
    • defining test data
      • to be used in a test case,
      • copying or creating test data,
      • securing or masking sensitive data,
      • and so on—are taking up more than half a developers’ and QA team’s time during the testing phase.
  • Data Integrity
    • Data must be archived in a controlled, secure environment in a way that safeguards the primary data, observations, or recordings.
    • The archive must be accessible by engineers analyzing the data, and available to collaborators or others who have rights of access
    • Primary data should be stored securely for sufficient time following publication, analysis, or termination of the project.
    • The number of years that data should be retained varies from field to field and may depend on the nature of the data and the research.
    • Keeping data partitioned and limited to view by assigned groups
  • Data Maintenance and Refresh
    • When determining the appropriate storage format for  data, consider
      • What will you do with your data once you acquire it?
      • Will you write and read data with the same application?
      • How much data will you acquire?
      • At what rate will you acquire data?
      • Will you need to exchange data with another program?
      • Will you need to search your data files?
  • Data Governance
    • Rules and Policies for testing phases and teams
    • How is it used

Without Test Data Management

Define Test Data Requirements

  • Tester Efficiency
    • Cost Per Test Cycle = $2.7M
    • 100 Days, 300 Testers, $90/hour
    • 20% application defects due to bad test data
  • Application quality
    • Cost of a defect found later in the release cycle = XX

With TDM

Define Test Data Requirements

  • Improve Tester Efficiency by 30%                                         Improve Tester Efficiency by > 30%
    • Cost Per Test Cycle = $1.6M                                           Reduce Testing cycles by > 25%
    • 60 Days, 300 Testers, $90/hour                                       Improve application quality by 90% Tester Efficiency
    • 90% reduction in test data-related defects
    • Cost avoidance

ref: Informatica “Why Do You Need Test Data Management?” 
ref: Forrester Report:  “http://tealium.com/assets/pdf/Forrester_Boost_Digital_Intelligence.pdf

Vendor Management Service Transformation: Entry 1 – Re-Factoring, Businss Architecture

metodo_pratiche_agile_chart_manifesto_itaEntry 1    3.22.2015

I recently was invited to join a project for a Vendor Management Service (VMS) in Mid-March 2015. The project is to provide in Phase 1 a re-factoring of our Client’s code by replacing the hardcoded middleware with services, and adding new client facing features, along with a new UI. All needing new documentation, of which there is now verylittle.

Our client provides a turnkey service for managing IT vendors who need to outsource their HR, Recruiting, Accounting, and Financial Services for this aspect of their business.

My role is to document the present legacy Business Processes, the new Processes, the new Services and the newly re-factored APIs, processes and added features by providing the Requirements, Use Cases, Workflows and Processes.

The leadership on this project is not only setting the pace, but shining a bright light into the future vision for this client, and for the VMS industry. It is a privilege to work with them.

 

Presently, I am awash in the project ramp-up and assimilation of the many layers, features and infrastructure required to successfully launch a program as complex as this.

We have two teams: one is onsite with FTE EEs of the customer, and a fly-in contingent of our leadership. The other is an offsite team in Atlanta, that is providing an AGILE based component for delivery of the new code which provides the new Service APIs and integration; as well as Leadership, Business Architecure, Process Articulation & Documentation. The client will observe the present SDLC based approach for now.

We have defined the primary users and their roles, the features – both new and old – associated with their roles. The functionality of these features some of which, for now, will remain as legacy, while others are new. There are around 400 of these. Some are Epic, requiring some of the features to support the workflows.

For the new and replacement pieces (in AGILE) we have defined the primary “Day in the Life” from the “need to the ass in the seat” E2E process to establish a critical Happy Path. Variations and UCM will be modeled based upon this primary structure.

The software and coding will be the same, albeit updated. The specific usage of the system will vary based upon the needs and systems of client-users of this system.

The SDLC pieces for things like the DATA, and QA will be driven from the client sites.

I will be updating this log at various points along the way…. so STAY TUNED!!

Bill Fulbright

Sneaky Gollum

Security in 2015 – What measures will be implemented to improve it?

by ,  Sr. Quality Strategy and Delivery Advisor

Is Security in your environment covered?

  • Do you have a full rundown/analysis of the gaps you may have in your system?
  • Have you created a checklist of all:

    • touchpoints
    • protocols
    • profiles
    • methods of transmission
    • firewalls
    • frequency of sweeps
    • frequency of security monitoring status reports?
    • Do you have a network ops monitoring application?

imageThis is not the year to be avoiding the security risks afoot … not only from your own employees, random local hacker, but serious international hacking as pro-active attacks on your system. 2014 demonstrated an increase in security leaks – or might i say exposure of weak security by upstream hackers with malicious intent. Expect more of it. Breaches have been happening every year for some time. We are no longer surprised by them. It is another overload of input that we as consumers can do little to prevent.

Prevention of security leaks are up to those responsible for maintaining our private accounts and data. That they have allowed weaknesses that are gaps, and hackable, is irresponsible, unacceptable, and once leaked causes much damage financially, and personally.

What is a Threat Agent?

The term Threat Agent is used to indicate an individual or group that can manifest a threat. It is fundamental to identify who would want to exploit the assets of a company, and how they might use them against the company.  You can read more about it here:
https://www.owasp.org/index.php/Category:Threat_Agent

Here are some Highlights from Open Web Application Security Project “Attacks” references:
https://www.owasp.org/index.php/Category:Attack

Looking forward to seeing deeper security measures, and fewer assailable gaps by our financial institutions and retailers.

All comments invited.

Internet of Things – The new User Interface – Do we need new test tools?

Internet of Things – The new User Interface – Do we need new test tools?

Director Test Strategy and Consulting

JwristPADust wondering. Internet of Things will be massive. Wearable devices for Health, Medicine, Communication, Entertainment, Functional Workplace Applications, etc. There are as many applications under development and those we haven’t seen, that will challenge the test methodology we use to test our present systems and environments.

Imagine the test required for a brain wave synchronizer, being driven by an application and data residing in the cloud, that will capture the experiential responses as well as govern them for the user. The uses in this case are vast. Relaxation, Accelerated Learning, Medical monitoring of Brain Wave activity, treatment of ADHD, transmission of said data to and from subscribers, etc. I can imagine the Test Strategy Document, Test Plan, the Lab work, the logistics and Planning. Test resources with the skills to run the full gamut of tests? This was a product I developed back in the 80’s, but I was the testing guinea pig!!

intel-wearable-featWe will need to step it up, to keep up with the variety and depth of new applications. Creative thinking, innovative approaches to capturing the device dynamics, and reporting those as metrics… I think it is a very exciting time, and we will see this explosion happen over the next 15 years. It is inevitable.

You might want to consider: What does this mean to you? How will you remain relevant? Does this mean your present skills are already obsolete, or that you will have to learn something new (I certainly hope so!)

Let me know how and why you think this will impact your testing career!

Bill

BPM Testing – Get the Best Bang for your Buck!

QA2100-BPMTesting-16MP

Interested in getting the best bang for your buck with BPM Product Design, Development, Strategy, Testing, and Implementation?

Need a lift?  We can help!

Give us the opportunity to provide you with our assessments.  We have USA resources, and fully experienced offshore capacities for development, testing and delivery.

We have lived it for over 8 years and provided some of the finest products in the Insurance and Banking Industries.

Ccropped-facebookcover.jpgontact Bill Fulbright
Company: QA 2100 Test Strategy and Consulting
Website: http://qa2100.com
Email: bill@qa2100.com

8 Key Factors for Cloud Delivery: Eight CIO Recommendations

To thrive in today’s swift-changing and unforgiving marketplace, companies need accessible, agile and adaptable IT.   Flexible service delivery is the answer.   Here’s how to employ it.   In the post-PC era, IT decision makers have a choice to make: Stay with the platform that got them here? Adopt a private or public cloud? Perhaps IT as-a-service or a mix of all the above?

CIO_STATS_mapping_the_enablers

Whatever you decide, a move away from rigid IT infrastructures is a move in the right direction. According to a recent McKinsey & Company survey, more than half of surveyed officers cited the switch to flexible service delivery as a top priority

The reason is simple: Flexible delivery is more adaptable and costs less. It’s a smarter way to distribute IT to users.

We recently confirmed this with a large U.S.-based telecommunications client that needed its technology to scale to tens of millions of subscribers on demand and then let those same subscribers pick and choose online the services most important to them.

Originally, the company considered a traditional infrastructure and then briefly a public cloud. But after completing a holistic analysis, we determined an internal private cloud would be the best option for three reasons: 1) It met the client’s needs (i.e. time-to-market, minimal downtime, continuity and rapid scalability requirements); and 2) It was less expensive over time when compared with the public cloud; and 3) It left open the option for later incorporating a public cloud if desired.

The beta test verified that the model enabled rapid elasticity, continuity and increased uptime. Mission accomplished.

You can achieve the same results. But only after you consider the following eight recommendations we’ve used to transition clients to flexible service delivery:

1.  Determine your biggest pain point.
No one’s going to say “no” to faster time to market, enhanced computing flexibility, improved performance, tighter security and better service at a lower cost. While all of those areas can certainly be addressed through flexible service delivery, the model you choose will largely depend on your top pain point. In other words, you’ll need to honestly answer the following: What keeps you up at night?

2.  Decide which functions to shift.
Next, you’ll need to designate which functions and processes to switch to flexible service delivery. This entails a “core vs. context” analysis, in which you distinguish business activities that provide you with a competitive advantage from those that should be offloaded to external providers. Remember, what was previously considered a core activity is now often viewed a contextual one, including (but not limited to) network management, tech support, performance measurement and financial planning.

3.  Start with a low-risk pilot program.
For established companies, it’s usually best to dip your toes into flexible delivery before diving in headfirst. You can achieve this by developing a pilot program for non-production processes, back-office functions or anything else that has lower performance requirements and less impact on end-users, such as testing and development. In some cases, however, it might make sense to start with customer-facing applications if there’s a pressing need to market new products.

4.  Identify “chatty” applications.
To get the most for your money, you’ll need to determine the consumption levels of your CPU, memory, network and disk storage. In doing so, you’ll be able to identify “chatty” applications that sometimes incur surprising surcharges between the cloud provider and the business. The more you know, the more you save.

5.  Mind those non-IT bottlenecks.
Once IT has been upgraded to the equivalent of a 12-lane highway with the help of flexible delivery, other parts of the organization cannot remain in horse-and-buggy mode. Well, they can. But they’ll become a bottleneck to your business. The challenge is to optimize the entire enterprise so that other areas don’t hold up the software lifecycle. To do this, you’ll need to educate and update your company culture.

6.  Compare service requirements.
Buyer beware: Most public cloud providers offer standard service level agreements that cannot be customized according to client needs. In some cases, general service levels may be sufficient for a development environment. But they often fall short of the demands of a production environment. To find a service level best suited to you, you’ll need to know the difference.

7.  Check security qualifications.
Security is a top-of-mind consideration, particularly for applications and systems that handle personal information. To ensure your data is protected, always certify a provider’s security qualifications. And know that cloud providers typically conduct security audits at a more intensive level than companies hosting internal private clouds.

8.  Demand transparency with a daily dashboard.
To manage variable costs, you’ll want to monitor your capacity and all of its operational parameters — straight down to the lowest server — with a daily dashboard. With this level of transparency, you can make day-to-day decisions about the level of CPU, memory and storage required and use metrics and trends to make decisions about future capacity financial modeling.

Admittedly, the changes required to move to a flexible service delivery can seem overwhelming. But by following the above advice, you’ll put yourself in a better position to find your own success.

For more information, read our white paper on Flexible Service Delivery (pdf), get inspired by our enabler series on The Future of Work or visit Cognizant Business Consulting.

Uncover the key factors to consider before designing a flexible service delivery model. Read on to find out more:

http://www.cognizant.com/latest-thinking/perspectives/Pages/finding-flexible-delivery-success-eight-cio-recommendations.aspx#.VE1ilqK-f8R.twitter