ElasticSearch recently announced some of the numbers behind their ever-increasing rate of adoption. You can see the post by their CEO here. Hitting the 5 million download mark is an impressive milestone, and at 500,000 downloads per month, adoption seems to be growing at an increasing rate. The buzz surrounding ES seems to have some legs under it after all.
That blog post inspired me to take a look at some of the ElasticHQ numbers and see if we can dig a bit deeper into ElasticSearch usage patterns. We are in the unique position of being able to gather and analyze generic usage and environmental data. ElasticHQ is less than a year old, but is widely used by Fortune 100 companies and smaller companies alike. The user base is widely distributed across developers and system engineers / sysops. I mention the two previous points because they effectively skew the data… when analyzing usage patterns, one has to take the user (actor) into account. Unfortunately, ElasticHQ can’t read job titles or intent, so I had to make do with raw data and assume a margin of error across ~10,000 unique clusters.
Now… enough typing. More numbers and pretty charts…
% Clusters vs. ElasticSearch version
Distribution: # Nodes per Cluster
# Documents per Cluster
- Quartile 1: 20,718 Documents
- Median: 1,134,029 Documents
- Quartile 3: 30,047,243 Documents
- Maximum: 4,294,967,295 Documents
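If you want to run the same kind of summary over your own numbers, here is a minimal sketch in Python. To be clear, this is not how ElasticHQ produced the figures above; the function name and the sample counts are made up for illustration, and I'm assuming the "inclusive" quartile method, which may differ slightly from whatever a charting tool uses.

```python
from statistics import quantiles


def summarize_doc_counts(doc_counts):
    """Return Q1, median, Q3 and maximum for a list of per-cluster document counts."""
    if len(doc_counts) < 2:
        raise ValueError("need at least two clusters to compute quartiles")
    # n=4 splits the data into quartiles and returns the three cut points.
    # method="inclusive" treats the list as the whole population (an assumption).
    q1, median, q3 = quantiles(doc_counts, n=4, method="inclusive")
    return {"q1": q1, "median": median, "q3": q3, "max": max(doc_counts)}


# Hypothetical sample counts, NOT ElasticHQ data.
sample = [100, 200, 300, 400, 500]
summary = summarize_doc_counts(sample)
print(summary)
```

With document counts this skewed, the quartiles say a lot more than an average would, since a handful of huge clusters would drag a mean far above the typical deployment.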
# Indices per Cluster
ElasticSearch distribution across the globe…
- United States: 31.95%
- France: 7.34%
- Germany: 6.29%
- United Kingdom: 5.92%
- India: 3.98%
- Brazil: 2.67%
- Russia: 2.61%
- Netherlands: 2.48%
- China: 2.34%
- Canada: 2.19%
There is a lot more data to share in this respect, but I have only so many free hours in a week. Still, it’s interesting to see what is detailed here so far, since in summary it hints at ElasticSearch use and deployment patterns:
- v0.90.5 is the most common version used. (Admittedly, I didn’t take version adoption over time into account.)
- Most clusters have a small # of nodes (hardware pricey? Are we tracking a large # of dev boxes?)
- Over half of the deployments here have what I consider to be medium to large document stores
- A small (un-complex?) number of indices per cluster.
If this can be trusted as a gauge for ElasticSearch usage in the wild, it will be interesting to see how it changes over time, and more importantly… where it leads ElasticSearch (the company), as it may give a hint as to the makeup of the user base. ElasticHQ sees daily use by large companies (Disney, eBay, Goldman Sachs, Siemens, etc.), yet usage is heavily skewed toward SMBs and startups. I can only assume the data gathered here and the companies using HQ every day are an accurate depiction.