Pleased to announce the release of ElasticHQ v1.0.0.
This release added:
  1. Support for ElasticSearch v1.0.0RC1 and unbroke the breaking changes. ;-)
  2. Support for monitoring multiple file systems
  3. Support for G1 GC
  4. Ability to select which nodes are displayed on the Diagnostics Screen
Every HQ release is backwards compatible, so there’s no extra work needed on your part. As always, you can get it here: http://www.elastichq.org/gettingstarted.html
Hats off to the ElasticSearch team, as this release ties up a lot of loose ends and adds some great bits like the _cat API, federated search, and the ability to back up and restore a running instance.

 

[Image: ElasticSearch global users]

ElasticSearch recently announced some of the numbers behind its ever-increasing rate of adoption. You can see the post by their CEO here. Hitting the 5 million download mark is an impressive milestone, and with 500,000 downloads per month, adoption seems to be increasing (at an increasing rate). The buzz surrounding ES seems to have some legs under it after all. ;-)

That blog post inspired me to take a look at some of the ElasticHQ numbers and see if we can dig a bit deeper into ElasticSearch usage patterns. We are in the unique position of being able to gather and analyze generic usage and environmental data. ElasticHQ is less than a year old, but is widely used by Fortune 100 companies and smaller companies alike. The user base is widely distributed across developers and system engineers / sysops. I mention these two points because they effectively skew the data: when analyzing user patterns, one has to take the user (actor) into account. Unfortunately, ElasticHQ can’t read job titles or intent, so I had to make do with raw data and assume a margin of error across ~10,000 unique clusters.

Now… enough typing. More numbers and pretty charts…

% Clusters vs. ElasticSearch version

[Chart: ElasticSearch version distribution]
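
For anyone wanting to build the same kind of breakdown from their own inventory, here is a minimal sketch of how a "% of clusters per version" table could be computed. The function name `version_share` and the sample version strings are made up for illustration; this is not how ElasticHQ gathers or stores its data.

```python
from collections import Counter


def version_share(versions):
    """Map each version string to its share of clusters, in percent.

    `versions` holds one entry per cluster (hypothetical input format).
    """
    counts = Counter(versions)
    total = sum(counts.values())
    if total == 0:
        return {}
    # most_common() orders the result from most to least used version.
    return {v: round(100 * c / total, 2) for v, c in counts.most_common()}


# Illustrative sample only; these are not the ElasticHQ numbers.
sample = ["0.90.5", "0.90.5", "0.90.3", "1.0.0.RC1"]
print(version_share(sample))
# {'0.90.5': 50.0, '0.90.3': 25.0, '1.0.0.RC1': 25.0}
```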

Distribution: # Nodes per Cluster

[Chart: nodes per cluster]

# Documents per Cluster

[Chart: documents per cluster]

Quartile Distribution:

  1. Quartile 1: 20,718 Documents
  2. Median: 1,134,029 Documents
  3. Quartile 3: 30,047,243 Documents
  4. Maximum: 4,294,967,295 Documents
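
As a rough illustration of what a quartile summary like the one above involves, the sketch below computes Q1, the median, Q3 and the maximum from a list of per-cluster document counts using Python's standard `statistics` module. The helper name `doc_count_summary`, the sample counts, and the choice of the "inclusive" interpolation method are all assumptions; the exact method behind the figures above is not stated.

```python
import statistics


def doc_count_summary(doc_counts):
    """Return (Q1, median, Q3, maximum) for per-cluster document counts."""
    # n=4 yields the three quartile cut points; "inclusive" treats the
    # data as the whole population rather than a sample of it.
    q1, median, q3 = statistics.quantiles(doc_counts, n=4, method="inclusive")
    return q1, median, q3, max(doc_counts)


# Made-up counts for illustration only; not the ElasticHQ data set.
sample_counts = [100, 200, 300, 400, 500]
print(doc_count_summary(sample_counts))
# (200.0, 300.0, 400.0, 500)
```

Quartiles are a sensible choice here because document counts are heavily skewed: a handful of very large clusters would drag a plain average far above what a typical cluster looks like.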

# Indices per Cluster

[Chart: indices per cluster]

User Locations

ElasticSearch distribution across the globe:

  1. United States: 31.95%
  2. France: 7.34%
  3. Germany: 6.29%
  4. United Kingdom: 5.92%
  5. India: 3.98%
  6. Brazil: 2.67%
  7. Russia: 2.61%
  8. Netherlands: 2.48%
  9. China: 2.34%
  10. Canada: 2.19%

There is a lot more data to share in this respect, but I have only so many free hours in a week. ;-) It’s interesting to see what is detailed here so far, as in summary it hints at ElasticSearch use and deployment patterns:

  • v0.90.5 is the most common version in use. (Admittedly, I didn’t take version adoption over time into account.)
  • Most clusters have a small # of nodes. (Hardware pricey? Are we tracking a large # of dev boxes?)
  • Over half of the deployments here have what I consider to be medium to large document stores.
  • Most clusters have a small (un-complex?) number of indices.

If this can be trusted as a gauge for ElasticSearch usage in the wild, it will be interesting to see how it changes over time, and more importantly… where it leads ElasticSearch (the company), as it may give a hint as to the user-base make-up. ElasticHQ sees daily use by large companies (Disney, eBay, Goldman Sachs, Siemens, etc…), yet usage is heavily skewed toward SMBs and startups. I can only assume the data gathered here and the companies using HQ every day are an accurate depiction.