Our financial services customer uses a third-party application which provides regulatory compliance reporting.
Key users ran a report each day that took 25-30 minutes to complete. After migration to a VDI desktop environment, the report took up to an hour, which put time-critical compliance processes at risk of missing their target completion times.
As part of the move to VDI, the reporting application was packaged to run on a XenApp server. The user accessed the published application via an ESXi-based VDI solution. The published application used a single active application server and a database server.
We saw the following protocols used by the system:
- SMB2 (Server Message Block Version 2)
- TDS (Tabular Data Stream, used for database communications)
- ICA (Citrix access to VDI and application server)
Users were located in the UK and Ireland. The servers were hosted in two UK data centres.
Network traffic was captured initially at the application server. The report took 57 minutes to complete.
The capture showed that the client accessed the application server via SMB2. Although the client sent about 180,000 requests, the application server took only about 4 minutes in total to service them, so it was not causing any significant delay.
The next step was to examine what the client was doing. This can be tricky in a virtualised environment, because the application client might normally run on any available XenApp server. We pinned the application to a single XenApp server and captured the network traffic on that server so that we could see all client-side network traffic. The client turned out to be even chattier than we had first thought. This capture contained much more traffic: the same 180,000 requests to the application server, but also about 590,000 requests sent directly to the database server.
Analysis of this capture showed that, measured at the XenApp server, the time taken to receive responses to the 180,000 application server requests had increased from about 4 minutes to over 10 minutes. The extra time is network delay, and spreading it across the requests indicated that the network round-trip time from the XenApp server to the application server (and the database server) was about 2.3 ms.
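The round-trip estimate can be reproduced with simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, and because the article says only "over 10 minutes", exactly 10 minutes is used to obtain a lower bound, and the 2.3 ms figure is then used to work back to the client-side time it implies.

```python
def estimate_rtt_ms(client_side_s: float, server_side_s: float, n_requests: int) -> float:
    """Average extra delay per request, in ms, between two capture points."""
    return (client_side_s - server_side_s) / n_requests * 1000.0


def added_network_time_s(n_requests: int, rtt_ms: float) -> float:
    """Total time, in seconds, spent on round trips for n_requests."""
    return n_requests * rtt_ms / 1000.0


N_APP_REQUESTS = 180_000
SERVER_SIDE_S = 4 * 60  # about 4 minutes measured at the application server

# Lower bound: assume the client-side figure was exactly 10 minutes.
rtt_lower_bound_ms = estimate_rtt_ms(10 * 60, SERVER_SIDE_S, N_APP_REQUESTS)

# Working backwards from the article's 2.3 ms round-trip time.
implied_client_side_s = SERVER_SIDE_S + added_network_time_s(N_APP_REQUESTS, 2.3)

print(f"RTT lower bound: {rtt_lower_bound_ms:.1f} ms")
print(f"Implied client-side time: {implied_client_side_s / 60:.1f} min")
```

Under these assumptions the lower bound is 2.0 ms, and a 2.3 ms round trip implies a client-side time of just under 11 minutes, which is consistent with "over 10 minutes".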
The total network time for all 770,000 requests was about 29 minutes. Although this was very significant, there were many simultaneous TCP sessions to both the application and database servers. If many requests were in flight concurrently, their waits would overlap and the effect of those 29 minutes on the overall elapsed time would be diluted. So we could not yet be sure that the 2.3 ms network round-trip time was the cause of the problem without some further analysis.
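To illustrate why concurrency mattered at this point, the following sketch computes the two extremes for the elapsed time added by round trips: fully serial requests, and requests spread evenly over a number of fully concurrent sessions. The session count is a made-up value for illustration, as the article does not state how many TCP sessions were open, and the function name is hypothetical.

```python
def network_wait_bounds(n_requests: int, rtt_ms: float, sessions: int) -> tuple:
    """Return (serial_s, overlapped_s): elapsed seconds added by round trips."""
    total_s = n_requests * rtt_ms / 1000.0
    # Upper bound: each request is sent only after the previous one completes.
    serial_s = total_s
    # Lower bound: requests are spread evenly over fully concurrent sessions.
    overlapped_s = total_s / sessions
    return serial_s, overlapped_s


APP_REQUESTS = 180_000
DB_REQUESTS = 590_000
RTT_MS = 2.3
HYPOTHETICAL_SESSIONS = 10  # assumption: the real session count is not given

serial_s, overlapped_s = network_wait_bounds(
    APP_REQUESTS + DB_REQUESTS, RTT_MS, HYPOTHETICAL_SESSIONS
)
print(f"Fully serial:     {serial_s / 60:.1f} min")
print(f"Fully overlapped: {overlapped_s / 60:.1f} min")
```

With these inputs the serial case adds roughly 29.5 minutes, while the fully overlapped case adds only about 3 minutes. Which end of that range applied could only be determined by measuring the actual overlap in the capture.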
To see how much concurrency there was, we had to find how much time the XenApp server spent not waiting for a response from either server. We wrote a program to calculate this time from the network capture. It came to just over 13 minutes (806 seconds).
- Total time to run the report: 3434 secs
- Gap time (client not waiting): 806 secs
- Time client waiting (total time minus gaps): 2628 secs
- Total measured response time: 2638 secs
The total measured response time exceeds the time the client spent waiting by only 10 seconds (2638 − 2628), so only about 10 seconds of requests overlapped. This is small enough to be discounted: the requests were effectively serial, and the delay due to network round trips is the full 29 minutes calculated earlier. Moving the client running on XenApp into the same data centre as the other two servers therefore had the potential to cut the run time by about half an hour.
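The program used for the gap analysis is not shown here, but the idea can be sketched as an interval-merging calculation. This minimal version assumes that each request has already been reduced to a (request sent, response received) timestamp pair extracted from the capture. The names and the toy data are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Interval = Tuple[float, float]  # (request_sent, response_received) in seconds


def merge_intervals(intervals: Iterable[Interval]) -> List[Interval]:
    """Merge overlapping waiting periods into disjoint ones."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous waiting period: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def analyse(intervals: Iterable[Interval], run_start: float, run_end: float) -> Dict[str, float]:
    """Split the run into waiting time and gap time, and measure overlap."""
    intervals = list(intervals)
    total = run_end - run_start
    waiting = sum(end - start for start, end in merge_intervals(intervals))
    summed = sum(end - start for start, end in intervals)
    return {
        "total": total,
        "gap": total - waiting,            # client not waiting on any server
        "waiting": waiting,                # client waiting on at least one server
        "summed_response": summed,         # every response time added up
        "overlap": summed - waiting,       # excess caused by concurrent requests
    }


# Toy data: two overlapping requests followed by a separate one.
toy = [(0.0, 2.0), (1.0, 3.0), (5.0, 6.0)]
result = analyse(toy, run_start=0.0, run_end=10.0)
print(result)

# The same relationships applied to the figures reported in the article.
reported_waiting = 3434 - 806           # 2628 secs
reported_overlap = 2638 - reported_waiting  # 10 secs
```

In the toy data the first two requests overlap by one second, so the summed response time (5 seconds) exceeds the merged waiting time (4 seconds) by that amount. Applied to the reported figures, the same subtraction gives the 10 seconds of overlap.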
The solution was very simple. Our customer provisioned extra XenApp servers in the same data centre as the two other servers and ensured that the application was run from those. This reduced the time taken to run the report back to the expected 25-30 minutes.