Tuesday, 5 April 2022

Reconfiguring Elasticsearch Clusters: A Guide to Cluster Configuration

Elasticsearch Cluster:


An Elasticsearch cluster is a group of nodes that share the same cluster.name attribute. As nodes join or leave the cluster, it automatically reorganizes itself to distribute the data evenly across the available nodes.

If you are running a single instance of Elasticsearch, you have a cluster of one node.



Types of Nodes in Elasticsearch:


  1. Master Nodes: Control the cluster. A production cluster requires a minimum of 3 master-eligible nodes, with one acting as the elected master at any given time.
  2. Data Nodes: Hold indexed data and perform data-related operations. Differentiated hot and warm data nodes can also be used.
  3. Ingest Nodes: Use ingest pipelines to transform and enrich data before indexing.


You define a node’s roles by setting node.roles in elasticsearch.yml. If you set node.roles, the node is assigned only the roles you specify. For example:


To create a dedicated master-eligible node, set:

node.roles: [ master ]

To create a dedicated data node, set:

node.roles: [ data ]


To create a dedicated ingest node, set:

node.roles: [ ingest ]



Minimum of Master Nodes:


Master nodes are the most critical nodes in your cluster. In order to calculate how many master nodes you need in your production cluster, here is a simple formula:


N / 2 + 1


Here N is the total number of “master-eligible” nodes in your cluster, and N / 2 is rounded down to the nearest integer. There is one special case: if your usage is light and only requires a single node, then the quorum is 1. For any other use, you need at least 3 master-eligible nodes to avoid a split-brain situation, in which two parts of the cluster each elect their own master. This is a terrible situation to be in; it can result in an unhealthy cluster with many issues.


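In Elasticsearch versions before 7.0, the quorum is configured with the discovery.zen.minimum_master_nodes setting, which should always be set to a quorum (majority) of your master-eligible nodes. From 7.0 onward, the cluster maintains its voting configuration automatically, but the same majority rule applies. A quorum is (number of master-eligible nodes / 2) + 1, with the division rounded down. Here are some examples: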


If you have ten regular nodes (ones that can both hold data and become master), the quorum is 6.
If you have three dedicated master nodes and a hundred data nodes, the quorum is 2.
If you have two regular nodes, you are in a conundrum. A quorum would be 2, but this means the loss of one node will make your cluster inoperable. A setting of 1 will allow your cluster to function but doesn’t protect against split-brain. It is best to have a minimum of three nodes.
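
The examples above can be checked with a few lines of Python. This is only a minimal sketch of the N / 2 + 1 arithmetic; the function names quorum and tolerated_failures are made up for illustration and are not part of any Elasticsearch API.

```python
def quorum(master_eligible: int) -> int:
    """Majority of master-eligible nodes: floor(N / 2) + 1."""
    if master_eligible < 1:
        raise ValueError("a cluster needs at least one master-eligible node")
    return master_eligible // 2 + 1


def tolerated_failures(master_eligible: int) -> int:
    """Master-eligible nodes that can be lost while a quorum still remains."""
    return master_eligible - quorum(master_eligible)


if __name__ == "__main__":
    # Node counts taken from the examples in the text, plus the single-node case.
    for n in (1, 2, 3, 10):
        print(f"N={n}: quorum={quorum(n)}, tolerated failures={tolerated_failures(n)}")
```

With two master-eligible nodes the quorum is 2 and no failure can be tolerated, which is the conundrum described above; three nodes is the smallest count that survives the loss of one.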


Sharding Impact on Performance:


  • What is a shard? 

Each shard is a separate Lucene index, made up of small segment files stored on disk. As you write documents, new segments are created. Once a certain number of segments is reached, they are merged into larger ones.


  • How many shards to use?

If you have a write-heavy indexing case with just one node, the optimal number of indices and shards is 1. For search-heavy cases, however, a good starting point is to set the number of shards to the number of CPUs available. That way a search can run on several shards in parallel, resulting in better search performance.
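
As a rough illustration of this rule of thumb, the sketch below picks a shard count and builds the settings body you would send when creating an index. The helper names suggested_shards and index_settings are hypothetical; number_of_shards and number_of_replicas are the standard index settings. The CPU count is passed in explicitly because it should describe the Elasticsearch node, not the machine running the script.

```python
import json


def suggested_shards(search_heavy: bool, cpus: int) -> int:
    """Rule of thumb from the text: 1 shard for a write-heavy single node,
    otherwise one shard per available CPU."""
    if cpus < 1:
        raise ValueError("cpus must be at least 1")
    return cpus if search_heavy else 1


def index_settings(shards: int, replicas: int = 1) -> dict:
    """Build the request body for creating an index with these settings."""
    return {
        "settings": {
            "index": {
                "number_of_shards": shards,
                "number_of_replicas": replicas,
            }
        }
    }


if __name__ == "__main__":
    # Assumed example: a search-heavy index on a node with 8 CPUs.
    body = index_settings(suggested_shards(search_heavy=True, cpus=8))
    print(json.dumps(body, indent=2))
```

Treat the result as a starting point to benchmark rather than a fixed answer, since the number of primary shards cannot be changed on an existing index without reindexing or using the split/shrink APIs.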




Conclusion:


A resilient cluster requires redundancy for every required cluster component. This means a resilient cluster must have:

  • At least three master-eligible nodes
  • At least two nodes of each role
  • At least two copies of each shard (one primary and one or more replicas, unless the index is a searchable snapshot index)

A resilient cluster needs three master-eligible nodes so that if one of them fails then the remaining two still form a majority and can hold a successful election.

Similarly, the redundancy of nodes of each role means that if a node for a particular role fails, another node can take on its responsibilities.

Finally, a resilient cluster should have at least two copies of each shard. If one copy fails then there should be another good copy to take over. Elasticsearch automatically rebuilds any failed shard copies on the remaining nodes in order to restore the cluster to full health after a failure.
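
The three requirements above can be turned into a simple checklist. The following is a minimal sketch, assuming you describe your planned cluster as a mapping from node name to its list of node.roles values; the function resilience_problems and the example node names are hypothetical and do not query a real cluster.

```python
from collections import Counter
from typing import Dict, List


def resilience_problems(nodes: Dict[str, List[str]], copies_per_shard: int) -> List[str]:
    """Return the resilience rules from the list above that the plan violates."""
    problems = []
    # Count how many nodes carry each role (a role listed twice on a node counts once).
    role_counts = Counter(role for roles in nodes.values() for role in set(roles))

    if role_counts["master"] < 3:
        problems.append("fewer than three master-eligible nodes")
    for role, count in sorted(role_counts.items()):
        if role != "master" and count < 2:
            problems.append(f"only one node with role '{role}'")
    if copies_per_shard < 2:
        problems.append("fewer than two copies of each shard")
    return problems


if __name__ == "__main__":
    # Assumed example layout: three dedicated masters, two data nodes, one ingest role.
    plan = {
        "m1": ["master"],
        "m2": ["master"],
        "m3": ["master"],
        "d1": ["data", "ingest"],
        "d2": ["data"],
    }
    # One primary plus one replica gives two copies of each shard.
    print(resilience_problems(plan, copies_per_shard=2))
```

In this example the only finding is the single ingest node; giving d2 the ingest role as well would clear the list.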

Failures temporarily reduce the total capacity of your cluster. In addition, after a failure, the cluster must perform additional background activities to restore itself to health. You should make sure that your cluster has the capacity to handle your workload even if some nodes fail. 
