Mediation Clustering

You can configure mediation servers to fail over SNMP/CLI requests, but failover for data collection is not supported for performance monitoring. For redundancy, you can run identical data collection profiles on multiple mediation servers; however, this creates much more network traffic.

When you cluster mediation servers as a High Availability (HA) pair, Redcell typically designates one as the active server for HA services and the other as a standby server. The standby server does not forward data unless the active server becomes unavailable. When you initially start the mediation servers, the server that starts first becomes Redcell’s active server. When you shut down the active server, or it stops responding, Redcell promotes the standby server to active. A server whose process shuts down is no longer considered active. However, if the network connection to the active server is lost while its server process continues to run, it is still considered active, and once the network connection is restored, both servers are temporarily considered active.

While both are active, both servers forward data to the application server, and the application server eliminates duplicates. When Redcell detects that the connection to the disconnected mediation server has been restored and that both servers are now active, the server with the lowest IP address remains active, and the other server is demoted to standby. Detection takes approximately two minutes, and the demotion/promotion takes a few additional seconds. The standby should hold any data that was not processed during switchover, and that backlog should be processed when it becomes active.

See Independent and Clustered Mediation Agents for the alternative when mediation agents have to communicate with clustered application servers.


Best practice for changing properties in the following steps is to override the mentioned properties by pasting the overriding values into owareapps/installprops/lib/installed.properties, rather than altering the mentioned properties files themselves.
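For example, rather than editing the multicast address (used in step 4 below) in its original properties file, you could append the override to owareapps/installprops/lib/installed.properties:

com.dorado.mediation.listener.multicast.intercomm.address=228.0.0.200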

Follow these steps to configure Mediation server clusters:

1. (For MySQL installations only) If the installation has not already done so, override the database host name originally specified in oware/lib/owdatabase.properties.
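A minimal sketch of such an override, assuming owdatabase.properties specifies the host inside a JDBC-style connection property (the property name and database name shown here are illustrative, not confirmed from this guide); paste the line with the correct host into owareapps/installprops/lib/installed.properties:

com.dorado.jdbc.database_name.mysql=//<database host>:3306/owbusdb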

2. Set oware.config.server in owareapps/installprops/medserver/lib/installed.properties to the primary mediation host’s name (for example: aaron).
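Using the example host name, the entry would be:

oware.config.server=aaron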

3. Enable the following property only if the MBeans (Trap / Data Collection) need to be set up in Forwarding/Standby mode, so that mediation servers work as peers and have listener failover capability:

com.dorado.mediation.listener.use.high.availability=true


The application cluster member with the lowest numbered IP address tends to be the most active cluster member, so best practice is to make the member with the highest IP address the config server. You can check whether a particular member is active or standby by consulting http://<memberIP>:8080/jmx-console and looking at the appropriate MBean. For mediation clusters, jobs are typically distributed among all members of the cluster, so no traditional active/standby distinction applies; all members are active, and any member can handle any job for that cluster.

An HA “cluster” consists of only two mediation servers. You can have more than two, but that just enables more than one standby server (something we do not test). If you set the HA property (com.dorado.mediation.listener.use.high.availability) to false, all clustered mediation servers are active at the same time, and any number of mediation servers can be in the same cluster.
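For example, to run all clustered mediation servers as simultaneously active peers rather than as an HA pair:

com.dorado.mediation.listener.use.high.availability=false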

4. Mediation server peers use a multicast address to communicate with each other. If you set up another pair of mediation server peers, change the multicast address for that pair.

com.dorado.mediation.listener.multicast.intercomm.address=228.0.0.200
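For example, a second pair of peers on the same network could use a different address (the address shown here is illustrative):

com.dorado.mediation.listener.multicast.intercomm.address=228.0.0.201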

For an alternative to multicasting, see Disabling Multicast.

5. Uncomment the following SonicMQ JMS properties in /oware/lib/owjms.properties:


#jms.provider=SONICMQ

#jms.qf=QueueConnectionFactory

#jms.tf=TopicConnectionFactory

#com.dorado.eventchannel.VendorInitFactoryClass.sonicmq=com.dorado.core.jms.OWSonicMQInitFactory

#com.dorado.jms_vendor.port.sonicmq=2506

#com.dorado.eventchannel.protocol.sonicmq=tcp://
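After uncommenting, the properties read:

jms.provider=SONICMQ

jms.qf=QueueConnectionFactory

jms.tf=TopicConnectionFactory

com.dorado.eventchannel.VendorInitFactoryClass.sonicmq=com.dorado.core.jms.OWSonicMQInitFactory

com.dorado.jms_vendor.port.sonicmq=2506

com.dorado.eventchannel.protocol.sonicmq=tcp://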


After changing the above properties, you must re-source the Oware environment.
Also: You can safely ignore error messages in the application server shell about not finding a JMS provider.
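A sketch of re-sourcing, assuming a UNIX installation where the Oware environment script is at its usual location (verify the path for your installation; on Windows, opening a new oware shell accomplishes the same thing):

. /etc/.dsienv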

6. An example of the mediation cluster’s start command:

startmedagent -c <primary mediation server> -a <primary application server> -p <cluster name> -m <multicast address>
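For example, with the values used throughout this section (aaron as the primary mediation server, laika as the application server, qacluster as the cluster name, and 228.0.0.200 as the multicast address):

startmedagent -c aaron -a laika -p qacluster -m 228.0.0.200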

For an alternative startup, see Starting Clusters Durably. Remember, start the primary first. Wait until it has completely started before starting the secondary.


When setting up a failover mediation cluster in owareapps/installprops/medserver/lib/installed.properties, do not set up an application server cluster for the mediation servers. The config server for these mediation clusters is the same machine on which the individual mediation server runs.