In the early days of my playing with ElasticSearch, I remember struggling with some of the basic terminology and concepts. Naturally, many of us try to equate ElasticSearch with what we know of RDBMS. To that end, I thought I'd post this simple mapping as a reference for others:

Node = DB Instance

One Database Instance

A node is simply one ElasticSearch instance (one Java process).

Consider this a running instance of MySQL. Just as you can have more than one MySQL instance running per machine on different ports, you can have more than one ElasticSearch node running per machine on different ports.

Cluster = Database Cluster

1..N Nodes with the same Cluster Name.
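To make the "same Cluster Name" rule concrete, here is a minimal sketch that groups nodes into clusters purely by the name they are configured with. The node names, cluster names, and the `group_into_clusters` helper are all made up for illustration; real nodes also have to discover each other over the network, which this toy ignores.

```python
from collections import defaultdict

def group_into_clusters(nodes):
    """Group (node_name, cluster_name) pairs by cluster name."""
    clusters = defaultdict(list)
    for node_name, cluster_name in nodes:
        clusters[cluster_name].append(node_name)
    return dict(clusters)

# Hypothetical nodes: two share a cluster name, one does not.
nodes = [("node-1", "logs"), ("node-2", "logs"), ("node-3", "search")]
clusters = group_into_clusters(nodes)

for name, members in clusters.items():
    print(name, members)
```

Nodes with the same cluster name end up in one group; a node with a different name forms its own cluster.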

Index = Database Schema

Similar to a Database, or Schema. Consider it a set of tables with some logical grouping. In ElasticSearch terms, an index is a collection of documents, grouped by document type, where a document type is similar to a DB table.

Mapping Type = Database Table

ElasticSearch uses document definitions that act as tables. If you PUT (“Index”) a document in ElasticSearch, you will notice that it automatically tries to determine the property types. This is like inserting a JSON blob into MySQL and having MySQL determine the number of columns and the column types (int, string, datetime, etc.) as it creates the DB table for you, on the fly.
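As a rough illustration of that "guess the column types from the JSON" idea, here is a toy sketch in Python. It is not ElasticSearch's actual detection logic; the `guess_type` and `infer_mapping` helpers and the type labels they return are assumptions made for this example only.

```python
from datetime import datetime

def guess_type(value):
    """Guess a column-like type label for a single JSON value."""
    # bool must be checked before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        try:
            # Treat ISO-8601-looking strings as dates.
            datetime.fromisoformat(value)
            return "date"
        except ValueError:
            return "string"
    if isinstance(value, dict):
        return "object"
    return "unknown"

def infer_mapping(document):
    """Build a field -> type mapping from one document."""
    return {field: guess_type(value) for field, value in document.items()}

# A hypothetical document being indexed for the first time.
doc = {"title": "Hello", "views": 42, "rating": 4.5,
       "published": "2013-05-01", "draft": False}
mapping = infer_mapping(doc)
print(mapping)
```

The first document you index effectively decides the "table definition", which is why it pays to look at the mapping that was generated rather than assume it.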

Note: I’ve heard this referred to as “Type”, “Document Type”, and “Mapping Type”.

Shard = Uhhh…

I don’t think this one has a DB equivalent, but it’s likely the most important concept listed here. A shard is the smallest unit of work in your cluster: one running Lucene instance. Shards are distributed across all of the nodes in your cluster, and they are what make ElasticSearch elastic, so to speak, giving your data and your ES processes redundancy.
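Here is a small sketch of the two ideas in that paragraph: each document lands on exactly one shard, and the shards are spread across the nodes. It assumes a simple "hash the document ID, modulo the number of shards" routing rule, with `zlib.crc32` standing in for whatever hash ElasticSearch really uses, and a naive round-robin placement of shards on nodes. The function names and node names are hypothetical.

```python
import zlib

def shard_for(doc_id, num_shards):
    """Pick a shard for a document: hash(id) modulo the shard count."""
    # crc32 is only a stand-in hash for this sketch.
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards

def allocate(num_shards, node_names):
    """Spread shards across nodes round-robin (a naive placement policy)."""
    return {shard: node_names[shard % len(node_names)]
            for shard in range(num_shards)}

num_shards = 5
allocation = allocate(num_shards, ["node-1", "node-2"])

# Route a few hypothetical documents to their shard, then to a node.
for doc_id in ["doc-1", "doc-2", "doc-3"]:
    shard = shard_for(doc_id, num_shards)
    print(doc_id, "-> shard", shard, "on", allocation[shard])
```

Because the shard is derived from the document ID and the shard count, the same document always routes to the same shard, and adding nodes only changes where shards live, not which shard a document belongs to.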
