Testing Mediation Server Clustering

The following is an HA Notification Transport test plan for mediation servers, with expected and actual results.

Test Environment

The tests listed below assume the following deployment configuration:

• One Application Server

• A mediation cluster consisting of two distributed mediation servers configured as an HA pair, where one of the mediation servers will initially be the Active server and the other server will be the Standby server.

Testing Procedure

The procedure for each of these tests includes sending 30,000 events to both the active and standby mediation servers. Testing sends the events at a continuous rate of 100 per second for 300 seconds (5 minutes).

The test starts once the first event has been sent and concludes once all 30,000 events have been processed by the mediation cluster.
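As a quick check of the arithmetic above, the following minimal Python sketch computes the send schedule implied by the stated rate and duration. The name send_offsets and the constants are illustrative only and are not part of the product. The sketch computes the offsets at which events are due and does not send anything.

```python
EVENT_RATE_PER_SEC = 100   # rate stated in the testing procedure
DURATION_SEC = 300         # duration stated in the testing procedure (5 minutes)


def send_offsets(rate_per_sec, duration_sec):
    """Yield the offset, in seconds from test start, at which each event is due."""
    total = rate_per_sec * duration_sec
    for i in range(total):
        yield i / rate_per_sec


offsets = list(send_offsets(EVENT_RATE_PER_SEC, DURATION_SEC))
total_events = len(offsets)

print(total_events)   # total number of events in one test run
print(offsets[0])     # offset of the first event, which starts the test
```

A real test driver would send each event to both mediation servers at its offset; how events are sent is outside the scope of this sketch.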

At the conclusion of each test, examine the logs. The log content provides the test results. Augment the standard log content to include the following statistics:

• Mediation Role: Active or Standby

• Number of events delivered to the application server

• Number of events discarded

• Number of events currently spooled

• The last receive timestamp

These statistics provide a clear indication of the work that each mediation server is performing.
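One way to hold these statistics while comparing the two servers after a test is a small record type. The sketch below is a hypothetical Python helper, not the product's log format: the class name MediationStats, its field names, and the accounted method are all assumptions made for illustration. The idea that delivered, discarded, and spooled counts should together account for every event sent is also an assumption of this sketch, and the sample values are made up rather than measured.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class MediationStats:
    """Hypothetical container for the statistics listed above."""
    role: str                          # "Active" or "Standby"
    delivered: int                     # events delivered to the application server
    discarded: int                     # events discarded
    spooled: int                       # events currently spooled
    last_receive: Optional[datetime]   # the last receive timestamp

    def accounted(self) -> int:
        """Total events this server has delivered, discarded, or still holds."""
        return self.delivered + self.discarded + self.spooled


# Illustrative values only, not results from an actual test run.
standby = MediationStats(
    role="Standby",
    delivered=0,
    discarded=28_500,
    spooled=1_500,
    last_receive=None,
)
print(standby.accounted())
```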

Because of potential system clock differences between the mediation servers, network latency, and processing loads on the mediation servers, the same event delivered to both mediation servers is likely to have slightly different received times. As a result, the standby mediation server may hold onto a batch of events even though the active server has already delivered that batch. In various test scenarios, therefore, the standby server may not have discarded all 30,000 events when the test completes. Delivery of additional events (beyond the 30,000) would cause the standby to discard the remaining events.

This also leads to differences in batching between the mediation servers. For example, the active server may process 47 events in its first batch while the standby server batches 53 events. In addition, the standby server does all of its processing based on the last received timestamp posted by the active server, which may have been posted up to 15 seconds ago, so the standby server always runs behind the active server. When the expected results state something like “Before process termination the standby server discards all events,” that statement is therefore relative, because up to 15 seconds of events may not have been discarded. That is acceptable for these tests.
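To put a number on that lag: at the test rate of 100 events per second, 15 seconds of events is up to 1,500 events still held by the standby server. The following Python sketch illustrates the idea under simplifying assumptions. The function discard_delivered is hypothetical, timestamps are plain integer milliseconds on a single shared clock, and the rule "discard everything at or before the active server's posted timestamp" is an assumed simplification rather than the product's actual algorithm.

```python
def discard_delivered(spooled_ms, active_last_received_ms):
    """Split spooled receive times into kept events and a discarded count.

    An event is discarded when its receive time is at or before the
    last received timestamp posted by the active server (assumed rule).
    """
    kept = [t for t in spooled_ms if t > active_last_received_ms]
    return kept, len(spooled_ms) - len(kept)


# 30,000 events received 10 ms apart, that is, 100 per second for 300 seconds.
spooled = [i * 10 for i in range(30_000)]

# Assume the active server's most recent posted timestamp is 15 seconds
# older than the final event received by the standby server.
last_posted = spooled[-1] - 15_000

kept, discarded = discard_delivered(spooled, last_posted)
print(len(kept), discarded)
```

Under these assumptions the standby server still holds the final 15 seconds of events when the test ends, which matches the tolerance described above.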

Time Difference Vs Active Value in Log Output

Redcell calculates the time differential for heartbeats between mediation servers by subtracting the time at which the active server created the heartbeat message from the time at which the standby server received it. Each of these values is in the clock time of its respective server.

Notification transport uses this value to ensure that, regardless of the difference in system time between the servers, the standby server will not drop these notifications before the active server forwards them to the application server, and to ensure that the standby server will not retain excess notifications already processed by the active server.
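The subtraction described above can be sketched in a few lines of Python. The function names, the use of integer milliseconds, and the sample clock values are all assumptions for illustration. In particular, to_standby_clock shows only one plausible way the differential could be applied, by translating an active-server timestamp into the standby server's clock before comparing it with local receive times. The source does not specify how notification transport applies the value internally.

```python
def time_differential_ms(created_on_active_ms, received_on_standby_ms):
    """Heartbeat receive time (standby clock) minus creation time (active clock)."""
    return received_on_standby_ms - created_on_active_ms


def to_standby_clock(active_timestamp_ms, differential_ms):
    """Translate an active-server timestamp into standby clock time (assumed usage)."""
    return active_timestamp_ms + differential_ms


# Illustrative values: the standby clock runs 4,000 ms ahead of the active
# clock, and the heartbeat takes 250 ms to arrive.
created = 1_000_000     # active server clock
received = 1_004_250    # standby server clock

differential = time_differential_ms(created, received)
adjusted = to_standby_clock(999_000, differential)
print(differential, adjusted)
```

Note that the differential combines clock offset and transit time, so a value that looks large does not by itself indicate a network problem.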

Tests

The following describes the test details and gives examples of the expected and actual results of testing.