Tech dossier: Elasticsearch / the ELK stack

Because tech is moving so fast, I’ve been keeping dossiers in Evernote on open source products I need to learn more about, and I’ve decided to put them on my blog. My last one was about Kubernetes. This one is about Elasticsearch and the rest of the ELK stack.

A short description – in English

Elasticsearch is a search engine. You can use it to quickly find keywords in a large collection of documents. But its creators must have realized at some point that the same technology can search much more than text: you can use it to search almost any kind of data.

You use Logstash to load data into Elasticsearch. Or, as Elastic themselves put it: it’s a server-side data processing pipeline. Kibana is used for visualization. Together, Elasticsearch, Logstash and Kibana are called the ELK stack. A common use of the ELK stack is collecting and searching the log files from throughout one’s architecture. The ELK stack is one of the faster data technologies, but it takes a while to get used to the JSON-style queries against the REST API.
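To give an idea of what those JSON-style queries look like, here is a minimal sketch in Python (standard library only). It builds, but does not send, a search request that combines a full-text match with a numeric range filter, which is the "search more than text" idea from above. It assumes a local node on the default port 9200; the index name logs and the fields message and response_time_ms are hypothetical. Elasticsearch 6 rejects request bodies without a Content-Type header, so the sketch sets it explicitly.

```python
import json
import urllib.request

# Assumption: a local Elasticsearch node on the default port.
ES_URL = "http://localhost:9200"


def build_search_request(index: str, text: str, min_ms: float) -> urllib.request.Request:
    """Build (but do not send) a POST /<index>/_search request."""
    query = {
        "query": {
            "bool": {
                # Full-text part: scored match on a text field (hypothetical name).
                "must": [{"match": {"message": text}}],
                # Non-text part: numeric range filter (hypothetical field), not scored.
                "filter": [{"range": {"response_time_ms": {"gte": min_ms}}}],
            }
        },
        "size": 10,
    }
    return urllib.request.Request(
        url=f"{ES_URL}/{index}/_search",
        data=json.dumps(query).encode("utf-8"),
        # Elasticsearch 6+ requires a Content-Type header on requests with a body.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_search_request("logs", "connection refused", 500)
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # Against a live cluster you would send it like this:
    # with urllib.request.urlopen(req) as resp:
    #     hits = json.load(resp)["hits"]["hits"]
    #     print(len(hits), "documents returned")
```

Uncommenting the urlopen lines sends the query to a running cluster; the matching documents come back under hits.hits in the response.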

 

Learning Elasticsearch

If you want to learn Elasticsearch through training from Elastic themselves, get ready to pay old-fashioned high training prices. (They do have some videos, but you have to register to view them.)

Fortunately, there are cheaper options nowadays, like the Elasticsearch 6 and the ELK stack In Depth and Hands On! course by Sundog Education on Udemy.com. It’s a great course to get you started, and it currently sets you back 15 euros.

If you only want to get a taste of what Elasticsearch, Logstash and Kibana can do, check out my video:

Building your own environment

The Install Elasticsearch 6 on a Virtual Ubuntu Machine video by Sundog Education is actually how I found their Udemy course. In it, Frank Kane shows you exactly how to set up your ELK environment. The few things it lacks are covered in a second video in the course.

There are also ready-made Elasticsearch virtual machine images available. When I was still learning Elasticsearch, I didn’t understand how to work with them. But if you want to try, here is the link: https://bitnami.com/stack/elasticsearch/virtual-machine

Loading CSV files with Logstash

There were some specific things I had to figure out to get my project with Gaia data in Elasticsearch going, such as how to get Logstash to load CSV data. Here are a couple of sources I’ve read:

https://qbox.io/blog/import-csv-elasticsearch-logstash-sincedb

https://www.elastic.co/guide/en/logstash/current/plugins-filters-csv.html

https://github.com/aarreedd/CSV-to-ElasticSearch

https://www.elastic.co/blog/little-logstash-lessons-part-using-grok-mutate-type-data
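These sources use Logstash (or, in the GitHub case, a script). To show roughly what the csv filter plus a type conversion boil down to, here is a Python sketch. It is an illustration, not Logstash itself. It parses CSV rows into JSON documents, casts the numeric columns (CSV values always arrive as strings, which is the problem the grok/mutate article addresses), and packs everything into a body for Elasticsearch’s _bulk endpoint. The index name gaia and the column names are assumptions loosely modeled on Gaia data. On Elasticsearch 6 the action line may also need a _type (for example _doc); later versions drop mapping types.

```python
import csv
import io
import json

# Hypothetical names: adjust to your own index and CSV header.
INDEX = "gaia"
FLOAT_FIELDS = {"ra", "dec", "phot_g_mean_mag"}


def csv_to_bulk(csv_text: str, index: str = INDEX) -> str:
    """Convert CSV text (with a header row) into an Elasticsearch _bulk body."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        doc = {}
        for key, value in row.items():
            # CSV values are always strings; cast the numeric columns,
            # similar in spirit to Logstash's mutate/convert. Empty cells become null.
            if key in FLOAT_FIELDS:
                doc[key] = float(value) if value else None
            else:
                doc[key] = value
        # _bulk format: an action line, then the document source, one JSON object per line.
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    # The _bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = "source_id,ra,dec,phot_g_mean_mag\n1,10.5,-3.25,12.0\n2,11.0,4.0,\n"
    print(csv_to_bulk(sample), end="")
```

The resulting string would be POSTed to http://localhost:9200/_bulk with the Content-Type header set to application/x-ndjson.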

Getting a scatterplot in Kibana

https://www.timroes.de/2018/02/05/kibana-vega-scatterplot/

Vega lets you specify exactly how your graph in Kibana should look, so naturally I had to read the docs: https://vega.github.io/vega/docs/
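As an example of such a spec, here is a minimal Vega-Lite scatterplot. I wrote it as a Python dict only to keep one language in this dossier; the printed JSON is what you would paste into Kibana’s Vega editor. The url object with index and body is Kibana’s extension for querying Elasticsearch instead of loading a file, and format.property picks the raw hits out of the search response. The index gaia and the fields ra and dec are assumptions. The Vega-Lite version Kibana supports differs between releases, so check the Kibana Vega docs for yours.

```python
import json

# Hypothetical index and field names (Gaia-style right ascension / declination).
spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v2.json",
    "data": {
        # Kibana extension: an Elasticsearch query instead of a plain URL.
        "url": {
            "index": "gaia",
            "body": {"size": 1000, "_source": ["ra", "dec"]},
        },
        # Take the raw documents out of the search response.
        "format": {"property": "hits.hits"},
    },
    "mark": "point",
    "encoding": {
        # Each hit keeps its fields under _source, hence the dotted paths.
        "x": {"field": "_source.ra", "type": "quantitative"},
        "y": {"field": "_source.dec", "type": "quantitative"},
    },
}

if __name__ == "__main__":
    # Paste this output into the Kibana Vega visualization editor.
    print(json.dumps(spec, indent=2))
```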

 

On my reading list:

On how Bol.com uses Elasticsearch as a primary data store: https://vlkan.com/blog/post/2018/11/14/elasticsearch-primary-data-store/

Maybe something for a NiFi tech dossier: “Apache NiFi: From syslog to Elasticsearch”, http://blog.davidvassallo.me/2018/09/19/apache-nifi-from-syslog-to-elasticsearch/

About Marcel-Jan Krijgsman

In 2017 I made the leap to Big Data after 20 years of experience with Oracle databases. I followed courses on Hadoop, Big Data Analytics, Machine Learning and Python, MongoDB and Elasticsearch.
This entry was posted in Tech dossier.
