Tuesday, January 26, 2021

Performance! How to use EJBCA as a Massive PKI!

Are  your CRLs are scaling out of proportion, clients are complaining about timeouts and your VA is on its knees? Are your certificates counted not by thousands but by the millions? Never to fear, EJBCA is designed to handle some of the world's biggest PKI, so read on to find out how!

What can you do about your Architecture?

A large part of scaling up is adapting the architecture of your PKI to meet the requirements. Before we move on, here are two basic design choices that are the key to every large scale PKI.

Clustering and Load Balancing

The first step to being able to handle more issuances and traffic is by clustering: splitting the load between several instances of EJBCA working in concert. This has the added bonus of adding a layer of reliability to your PKI, a cluster can always survive one or more of its nodes failing. EJBCA has been designed to allow for hot-upgrading, meaning that your PKI is still active and running while the nodes in your cluster are running different versions of EJBCA, with zero downtime as a result. 


Likewise, clustering can be performed on VAs or RAs to easen to load on your PKI, depending on where your PKI is having performance issues:

  • If you're experiencing long response times or timeouts in your VA infrastructure, then either the VA's HSM or the database are overloaded by queries - this can simply be solved by adding more VAs, but also by clustering the VA instances. 
  • If you're issuing/revoking certificates in large volumes, clustering the CAs will allow more nodes to do the work of revoking and publishing. Each revocation sent out to the VA's is just a single write per cluster.
Essential if your have multiple VAs/VA clusters is to place them behind a Load Balancer in order to balance the load on each VA. 

Database Sharding

For databases of extreme volumes, it may be desirable to shard the database over several database instances in order to save space.  By setting the following value to true in database.properties:

database.useSeparateCertificateTable = true

the certificate body will be stored in the table Base64CertificateData instead of CertificateData


Base64CertificateData can then be sharded and placed on a different database volume. 

CRL Partitioning

If your population of unexpired certificates is large and you rely on CRLs, you might start finding that CRL generation times are beginning to spin out of control, and that CRL sizes becomes unmanageable. EJBCA supports CRL partitioning in accordance with RFC 5280, allowing certificates to be assigned to a specific CRL shard. 

CRL partitioning means that instead of a single CRL, the CRL is split into several shards. As the shards grow themselves in size, EJBCA allows you to suspend shards, automatically creating new ones. 

Service Pinning

In a clustered EJBCA instance, service execution happens at semi-random, the service being run by the first node to activate within the granted service interval. If some services - for example generating CRLs - are taking an inordinate amount of time, you may be experiencing latency in the cluster node executing the service, leading to intermittent delays being experienced while the service is running. The easiest solution is to pin the service to a single node and remove that node from the load balancer's roster, meaning that all service executions will happen on that node only, while enrollment, issuance and revocation operations are processed on the remaining node. 





Ephemeral Certificates

EJBCA can be configured to function as an Ephemeral Certificate CA. In this mode EJBCA simply functions as a high speed certificate factory, issuing certificates but not storing any trace of them in the local database.

While the mode still allows for revocation of certificates, it does not allow for certificates to be searched for in the database or for any constraints based on existing certificates to be enforced.

Precompiled OCSP Responses

Each OCSP reply requires an individual signature by the crypto token on the VA. While generated responses are cached by the EJBCA VA,  validity times of OCSP replies are commonly short (< one day) and caches are not shared between nodes in a cluster, thus responses still need to be generated anew frequently. The traditional solution to this has been OCSP Stapling, caching the first reply encountered in the http proxy. While this may solve the problem to some extent, it moves the burden of administration of caching the replies over to you.

Instead, EJBCA offers Precompiled OCSP Responses. Colloquially known as Canned OCSP, this functionality allows a VA to generate the full set of expected OCSP responses on a regular schedule within a set timeframe, when there are expected lulls in traffic. For any PKI, this will dramatically decrease the latency of the VA infrastructure.