Saturday, July 5, 2008

EJBCA HA best practices

There are many ways to design a HA system, and many considerations to take into account. After dealing with this issue for a couple of years, here is our team's experience of what works and what doesn't.

There are two important components in a HA EJBCA setup:
  • Database
  • EJBCA application server
The database is by far the trickiest to set up in HA-mode. The database holds everything that is really important in an EJBCA setup.
In case of failure, everything can be re-created from the EJBCA distribution except the database contents.
A full HA setup would look like:
  • Load balancers in front of the EJBCA app servers
  • EJBCA app servers using a single HA database on a single IP address
  • Load balancers in front of the database cluster
  • A HA database cluster
This is of course expensive, and it is suitable for organizations with dedicated database, app server and load balancer groups that have the resources and knowledge to handle this kind of system.

Most shops, however, simply don't want, don't need, or can't handle that kind of complexity.

Another alternative, which does not provide full HA but does provide very good data safety with short fail-over times, is:
  • Two combined EJBCA/database servers with three IP addresses: one real address for each server and one "virtual" address that can be moved.
  • Node 1 has the virtual IP by default.
  • Database master on node 1 that replicates, in real time, to node 2.
  • EJBCA running on both nodes using the "virtual" IP as the database IP.
  • If node 1 fails, a script must be run manually that moves the virtual IP to node 2 and restarts the app server on node 2. Node 2 is now the master, and a single point of failure until node 1 is brought up again.
  • When node 1 is brought up again, the system is either restored to its original state with node 1 as master (requires restoring the database on node 1 and resetting replication), or node 2 stays master and replicates to node 1 (requires starting replication in that direction).
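The fail-over and recovery steps in the list above can be sketched as a small state model. This is only an illustration of the sequence, not an actual fail-over script: the class, function and node names are hypothetical, the returned strings are operator notes rather than real commands, and moving the virtual IP back in the "restore original state" case is inferred from node 1 holding it by default.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Cluster:
    """Toy model of the two-node setup (names are illustrative only)."""
    vip_holder: str = "node1"          # node that currently owns the virtual IP
    db_master: str = "node1"           # node whose database is the master
    replica: Optional[str] = "node2"   # replication target, None if replication is broken
    up: Set[str] = field(default_factory=lambda: {"node1", "node2"})

    def single_point_of_failure(self) -> bool:
        # Without a working replica, losing the master loses the service.
        return self.replica is None


def node_fails(c: Cluster, node: str) -> None:
    """Mark a node as down; replication stops whichever side failed."""
    c.up.discard(node)
    if node in (c.db_master, c.replica):
        c.replica = None


def manual_failover(c: Cluster) -> List[str]:
    """Operator-run step: move the virtual IP and restart the app server."""
    if c.db_master in c.up:
        raise RuntimeError("master is still up; nothing to fail over")
    if not c.up:
        raise RuntimeError("no surviving node")
    survivor = next(iter(c.up))
    c.vip_holder = survivor
    c.db_master = survivor
    return [f"move virtual IP to {survivor}",
            f"restart app server on {survivor}"]


def rejoin(c: Cluster, node: str, restore_original: bool) -> List[str]:
    """Bring a repaired node back, in one of the two ways described above."""
    c.up.add(node)
    if restore_original:
        # Repaired node becomes master again; the other node goes back to replica.
        old_master = c.db_master
        c.db_master = node
        c.vip_holder = node  # assumption: the virtual IP returns to its default node
        c.replica = old_master
        return [f"restore database on {node}",
                f"reset replication {node} -> {old_master}",
                f"move virtual IP to {node}"]
    # Current master stays master; repaired node becomes the replica.
    c.replica = node
    return [f"start replication {c.db_master} -> {node}"]
```

For example, after `node_fails(c, "node1")` and `manual_failover(c)` on a fresh `Cluster()`, node 2 holds the virtual IP and `single_point_of_failure()` stays true until `rejoin` is called for node 1.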
Other alternatives that you might start to look at include adding software load balancers and automatic fail-over scripts to the combined servers.
In our experience this is not a good idea!
In most cases this setup will cause more problems than it solves, and your issues will originate from the load balancing software or fail-over scripts not working rather than from the database or EJBCA not working.
Unless you are sure of what you are doing and have done this kind of setup several times before, stay away from it.
