Our client runs a fund transaction management platform, built on WinTel technologies, that allows users to receive, acknowledge and process fund transactions. The service is made available to fund managers in the UK and offshore, and is accessed via a web browser. Users noticed a sudden increase in the time taken to acknowledge receipt of fund transactions, along with intermittent failures to process transactions at the end of the day.
The technologies involved were:
- A 64-bit Java SE 6 application
- JBoss application server on Windows
- Microsoft SQL Server database
- MQ connectivity to SWIFT network
- FIX connections to Euroclear EMX
The Windows Server components were virtualised on VMware ESXi and distributed between two data centres in the US.
We processed network packet data collected at the network interfaces of the application and database servers to visualise application and database response times.
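As an illustration of the idea, response times can be derived from capture data by pairing each request with the next response on the same connection. The sketch below is a minimal example and not the tooling used in this investigation; the `Packet` record, its `flow` and `direction` fields, and the `response_times` function are hypothetical names chosen for the example, and it assumes the capture has already been decoded into such records.

```python
from collections import namedtuple

# Hypothetical, pre-decoded capture record (an assumption for this sketch):
# ts is the capture timestamp in seconds, flow identifies the connection,
# direction is either "request" or "response".
Packet = namedtuple("Packet", ["ts", "flow", "direction"])


def response_times(packets):
    """Return a list of (flow, request_ts, latency_seconds) tuples."""
    pending = {}   # flow -> timestamp of the earliest unanswered request
    results = []
    for pkt in sorted(packets, key=lambda p: p.ts):
        if pkt.direction == "request":
            # Keep the first outstanding request; later request packets
            # on the same flow are treated as part of the same request.
            pending.setdefault(pkt.flow, pkt.ts)
        elif pkt.direction == "response" and pkt.flow in pending:
            start = pending.pop(pkt.flow)
            results.append((pkt.flow, start, pkt.ts - start))
    return results


if __name__ == "__main__":
    capture = [
        Packet(0.0, "a", "request"),
        Packet(0.25, "a", "response"),
        Packet(1.0, "b", "request"),
        Packet(3.5, "b", "response"),
    ]
    for flow, start, latency in response_times(capture):
        print(flow, start, latency)
```

Running the same calculation on captures taken at both the application server and the database server interfaces gives two sets of timings for the same transactions, which is what makes a side-by-side comparison possible.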
The above graph shows that the transaction service times observed at the database server are as expected and acceptable. The response times measured at the application server user interface (UI) tell a different story.
Although the majority of response times measured at the UI are less than one second, a significant number are higher. We then moved on to compare this performance with that of the database.
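A distribution like this, mostly fast with a slow tail, is easy to miss when looking only at averages. The following sketch shows one way to quantify the tail, assuming a list of response times in seconds is already available. The function name `tail_summary`, the one-second default threshold (taken from the figure quoted above) and the choice of nearest-rank percentiles are assumptions made for the example.

```python
def tail_summary(latencies, threshold=1.0):
    """Summarise a list of response times (seconds), focusing on the slow tail."""
    if not latencies:
        raise ValueError("latencies must not be empty")
    ordered = sorted(latencies)
    n = len(ordered)

    def percentile(p):
        # Nearest-rank percentile for an integer p in 1..100,
        # using integer arithmetic to avoid floating-point rounding.
        rank = max(1, (p * n + 99) // 100)
        return ordered[rank - 1]

    slow = sum(1 for value in ordered if value > threshold)
    return {
        "count": n,
        "slow_share": slow / n,   # fraction of responses above the threshold
        "p50": percentile(50),
        "p95": percentile(95),
        "max": ordered[-1],
    }


if __name__ == "__main__":
    # Illustrative values only, not measurements from the investigation.
    sample = [0.2] * 8 + [4.0, 6.0]
    print(tail_summary(sample))
```

In the illustrative sample the median stays low while the 95th percentile and maximum are several seconds, which is the pattern described above.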
This finding led the investigation to focus on end-to-end analysis of several slow transactions. We found that the application server experienced slow responses from the database due to packet drops in the network path between the application and database servers. An overload condition in the core network infrastructure caused the excessive packet drops.
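Packet drops of this kind typically show up in a capture as TCP retransmissions: the same data segment appears more than once on a connection. The sketch below estimates a per-flow retransmission rate under simplifying assumptions. The input format of `(flow, seq, payload_length)` tuples and the function name `retransmission_rate` are hypothetical, and a repeated sequence number is treated as a retransmission without handling sequence wrap-around or keep-alives.

```python
from collections import Counter


def retransmission_rate(segments):
    """Return {flow: retransmitted_data_segments / total_data_segments}.

    segments is an iterable of (flow, seq, payload_length) tuples,
    a hypothetical pre-decoded form of captured TCP segments.
    """
    seen = set()          # (flow, seq) pairs already observed
    totals = Counter()    # data segments per flow
    retrans = Counter()   # repeated data segments per flow
    for flow, seq, length in segments:
        if length == 0:
            continue      # pure ACKs carry no data and are ignored
        totals[flow] += 1
        key = (flow, seq)
        if key in seen:
            retrans[flow] += 1
        else:
            seen.add(key)
    return {flow: retrans[flow] / totals[flow] for flow in totals}


if __name__ == "__main__":
    # Illustrative values only, not data from the investigation.
    capture = [
        ("app->db", 1000, 500),
        ("app->db", 1500, 500),
        ("app->db", 1500, 500),   # same sequence number seen again
        ("app->db", 2000, 0),     # pure ACK, skipped
        ("db->app", 1, 100),
    ]
    print(retransmission_rate(capture))
```

A noticeably higher rate on flows between the application and database servers than on other flows would point towards loss on that network path.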
As the platform was virtualised, it was easy to move both the application and database servers to other VMware ESXi servers serviced by a different part of the core switched network that was not overloaded. This workaround resolved the immediate problem and gave the network team time to re-engineer the overloaded portion of the network.