Running Elasticsearch in Production: The Ultimate Checklist

In Elasticsearch by Aurimas Mikalauskas

Running Elasticsearch in development and in production is a night-and-day difference. In development, you just run “elasticsearch” and you’re done. In production, you will almost never run just one instance, and there is a lot more you need to do and check when running a production Elasticsearch cluster.

Here’s a quick checklist to go over every now and again if you don’t want to get into trouble with your cluster:

Software versions:
Elasticsearch is up to date (latest version lives here),
Java (or OpenJDK) is up to date (latest version lives here).
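A quick way to see which version a node is running is the JSON returned by a GET on its root URL, where the version.number field holds the release. The Python sketch below assumes a node at http://localhost:9200 and a minimum version that you pick yourself (the "1.7.0" in the usage line is only a placeholder). It compares plain dotted version numbers and ignores pre-release suffixes.

```python
import json
import urllib.request


def parse_version(text):
    """Turn a dotted version like '1.7.3' into a comparable tuple (1, 7, 3).

    A pre-release suffix such as '-beta1' is dropped, and short versions are
    padded with zeros so that '2.0' compares equal to '2.0.0'.
    """
    core = text.split("-", 1)[0]
    parts = tuple(int(p) for p in core.split("."))
    return parts + (0,) * (3 - len(parts))


def is_at_least(running, minimum):
    """True if the running version is the same as or newer than `minimum`."""
    return parse_version(running) >= parse_version(minimum)


def running_version(base_url="http://localhost:9200", timeout=5):
    """Read version.number from the node's root endpoint (the URL is an assumption)."""
    with urllib.request.urlopen(base_url + "/", timeout=timeout) as resp:
        return json.load(resp)["version"]["number"]
```

For example, is_at_least(running_version(), "1.7.0") returns False on a node that is still on a 1.4.x release. Comparing tuples of integers rather than strings keeps "1.10.0" correctly newer than "1.9.2".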

Elasticsearch Configuration:
Cluster name is NOT “elasticsearch” (the default),
Nodes have meaningful names,
Data path is set explicitly,
Elasticsearch is set to start at server startup,
Unicast discovery is used, not multicast,
gateway.recover_after_nodes is set to the actual number of nodes,
discovery.zen.minimum_master_nodes is set to n/2+1 (rounded down), where n is the number of master-eligible nodes,
Backups are in place (howto).
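Most of the configuration items above translate into a handful of elasticsearch.yml settings. The Python sketch below renders such a fragment as text. The cluster name, node name, data path, and host addresses are hypothetical placeholders, and the setting names (including discovery.zen.ping.multicast.enabled) assume an Elasticsearch 1.x-era release like the one this checklist was written for. The quorum uses integer division, so a three-node cluster in which every node is master-eligible gets minimum_master_nodes: 2.

```python
def minimum_master_nodes(master_eligible):
    """Quorum for discovery.zen.minimum_master_nodes: n/2 + 1, rounded down."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


def render_config(cluster_name, node_name, data_path, unicast_hosts,
                  node_count, master_eligible):
    """Render an elasticsearch.yml fragment for the configuration checklist.

    Setting names follow the Elasticsearch 1.x era; every value passed in is
    the caller's own (hypothetical) choice.
    """
    if cluster_name == "elasticsearch":
        raise ValueError("pick a cluster name other than the default 'elasticsearch'")
    hosts = ", ".join('"%s"' % host for host in unicast_hosts)
    lines = [
        "cluster.name: %s" % cluster_name,
        "node.name: %s" % node_name,
        "path.data: %s" % data_path,
        # Turn off multicast and list the other nodes explicitly (unicast).
        "discovery.zen.ping.multicast.enabled: false",
        "discovery.zen.ping.unicast.hosts: [%s]" % hosts,
        # Wait for the full cluster before starting recovery.
        "gateway.recover_after_nodes: %d" % node_count,
        "discovery.zen.minimum_master_nodes: %d" % minimum_master_nodes(master_eligible),
    ]
    return "\n".join(lines) + "\n"
```

Calling render_config("logs-prod", "es-node-1", "/var/lib/elasticsearch", ["10.0.0.1", "10.0.0.2", "10.0.0.3"], 3, 3) produces a seven-line fragment ending in discovery.zen.minimum_master_nodes: 2. With that quorum, a single node cut off from the other two cannot elect itself master.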

JVM Heap:
ES_HEAP_SIZE is set to ~50% of available memory,
ES_HEAP_SIZE is lower than 32GB (so the JVM can keep using compressed object pointers),
Garbage collection is fast and frequent, with no long stop-the-world pauses.
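The two heap-size rules combine into one formula: take half of physical RAM, but never go past the point where the JVM stops using compressed object pointers. The Python sketch below uses a 30 GiB cap as an assumed, conservative stand-in for that point; you can check what your own JVM does by running java -Xmx30g -XX:+PrintFlagsFinal -version and looking at the UseCompressedOops line. The other half of RAM is left to the operating system’s file system cache, which Lucene relies on heavily.

```python
import os

MIB = 1024 ** 2
GIB = 1024 ** 3

# Assumed conservative cap. Compressed object pointers switch off somewhere just
# under 32 GB and the exact point depends on the JVM, so verify on your own JVM.
HEAP_CAP_BYTES = 30 * GIB


def total_ram_bytes():
    """Physical memory via POSIX sysconf (works on Linux)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def recommended_heap_bytes(ram_bytes):
    """Half of RAM for the heap, never above the compressed-oops cap."""
    return min(ram_bytes // 2, HEAP_CAP_BYTES)


def es_heap_size_value(ram_bytes):
    """Format the heap as an ES_HEAP_SIZE value such as '16g' or '512m'."""
    heap = recommended_heap_bytes(ram_bytes)
    if heap >= GIB:
        return "%dg" % (heap // GIB)
    return "%dm" % (heap // MIB)
```

On a 32 GiB machine es_heap_size_value(total_ram_bytes()) gives "16g"; anything with 60 GiB or more is capped at "30g".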

OS Configuration:
Elasticsearch can open enough file descriptors,
Server isn’t swapping and is set up to never swap,
ES is configured with bootstrap.mlockall: true,
Elasticsearch ports (9200 for HTTP and 9300 for transport, by default) are NOT world-accessible.
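On Linux, the first three OS items can be read straight from /proc. The Python sketch below takes the PID of a running Elasticsearch process and returns a list of problems. The 65536 file-descriptor threshold is an assumed target rather than a hard rule, and treating a non-zero VmLck in /proc/<pid>/status as evidence that mlockall succeeded is a heuristic.

```python
def max_open_files(limits_text):
    """Soft 'Max open files' value from /proc/<pid>/limits text.

    Returns float('inf') when the limit is 'unlimited'.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            soft = line.split()[3]  # fields: Max, open, files, soft, hard, units
            return float("inf") if soft == "unlimited" else int(soft)
    raise ValueError("no 'Max open files' line found")


def parse_proc_fields(text):
    """Parse 'Key: <number> ...' lines (as in /proc/meminfo) into {key: number}.

    Lines whose first value is not an integer (e.g. 'Name: java') are skipped.
    """
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def swap_used_kb(meminfo):
    """Swap currently in use, in kB."""
    return meminfo.get("SwapTotal", 0) - meminfo.get("SwapFree", 0)


def check_node(pid, min_fds=65536):
    """Read Linux /proc files for an Elasticsearch PID and report problems."""
    problems = []
    with open("/proc/%d/limits" % pid) as f:
        if max_open_files(f.read()) < min_fds:
            problems.append("open-files limit below %d" % min_fds)
    with open("/proc/meminfo") as f:
        meminfo = parse_proc_fields(f.read())
    if meminfo.get("SwapTotal", 0) > 0:
        problems.append("swap is enabled (%d kB in use)" % swap_used_kb(meminfo))
    with open("/proc/%d/status" % pid) as f:
        status = parse_proc_fields(f.read())
    if status.get("VmLck", 0) == 0:
        problems.append("no memory is locked; is bootstrap.mlockall working?")
    return problems
```

check_node(pid) returns an empty list when all three checks pass. Port exposure is better tested from a machine outside your network than from the node itself.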

Hardware Configuration:
Got as much memory as I possibly could,
Got enough CPU cores to keep CPU usage under control,
Got SSD disks if I could,
I’m using RAID 0 (if RAID at all),
I’ve configured the noop I/O scheduler,
I’ve got a low-latency network,
I’m not running a split data center setup.
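The active I/O scheduler for a block device is exposed in /sys/block/<device>/queue/scheduler, where the kernel lists the available schedulers and puts the active one in square brackets (for example noop deadline [cfq]). The Python sketch below reads it; the device name sda is only a placeholder. On newer multi-queue (blk-mq) kernels the equivalent of noop is called none, so both names are accepted.

```python
def active_scheduler(scheduler_text):
    """Return the bracketed (active) entry, e.g. 'cfq' from 'noop deadline [cfq]'."""
    for token in scheduler_text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def read_active_scheduler(device="sda"):
    """Read the scheduler file for a block device ('sda' is a placeholder)."""
    with open("/sys/block/%s/queue/scheduler" % device) as f:
        return active_scheduler(f.read())


def uses_noop(device="sda"):
    """True if the device uses noop, or 'none' on multi-queue (blk-mq) kernels."""
    return read_active_scheduler(device) in ("noop", "none")
```

To switch a device at runtime, root can write the scheduler name into the same file (echo noop > /sys/block/sda/queue/scheduler); making the change survive a reboot is distribution-specific.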

To finish off, here’s a list of Elasticsearch URLs for monitoring the cluster. You are monitoring your cluster, aren’t you?

  • /_cluster/health – the status field (green, yellow, or red) can be used as the ultimate indicator of cluster health,
  • /_status – all information about all the indices (can be used with a single index too),
  • /_nodes – all information about the nodes,
  • /_stats – all of the statistics. Some day I’ll write about what all of these mean and which ones are the most interesting.
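The status field of /_cluster/health is the easiest value to wire into an existing monitoring system. The Python sketch below fetches it and maps green, yellow, and red to the 0/1/2 (OK/WARNING/CRITICAL) exit codes that Nagios-style checks use. The node URL is an assumption; the fields it reports (cluster_name, status, number_of_nodes, unassigned_shards) are standard parts of the health response.

```python
import json
import urllib.request

# Nagios-style exit codes: 0 = OK, 1 = WARNING, 2 = CRITICAL.
EXIT_CODES = {"green": 0, "yellow": 1, "red": 2}


def fetch_health(base_url="http://localhost:9200", timeout=5):
    """GET /_cluster/health and decode the JSON body (the URL is an assumption)."""
    with urllib.request.urlopen(base_url + "/_cluster/health", timeout=timeout) as resp:
        return json.load(resp)


def exit_code(health):
    """Map the status color to an exit code; an unknown status counts as CRITICAL."""
    return EXIT_CODES.get(health.get("status"), 2)


def summarize(health):
    """One-line summary suitable for a monitoring check's output."""
    return "%s: %s, %d nodes, %d unassigned shards" % (
        health["cluster_name"],
        health["status"],
        health["number_of_nodes"],
        health["unassigned_shards"],
    )


def main():
    """Print a summary and return the exit code for the check."""
    try:
        health = fetch_health()
    except OSError as err:  # connection refused, timeout, HTTP error
        print("cannot reach cluster: %s" % err)
        return 2
    print(summarize(health))
    return exit_code(health)
```

A monitoring agent or cron job can call sys.exit(main()) so that an unreachable cluster is reported as CRITICAL rather than silently skipped.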
