Category Archives: Learning Big Data
My Github repo got 50 stars
I never imagined myself as the maintainer of a data engineering-related open source project. Yet when I was working on our data engineering course, I needed some kind of data lake software. At first I used the Cloudera … Continue reading
Posted in Apache Products for Outsiders, Data engineering, Learning Big Data
Tagged Docker, docker-compose, Github, Hadoop, stars
Five years of data engineering
Five years ago I made the switch from Oracle database administration to data engineering. It has been quite a ride. I made a video about this to celebrate.
Posted in Active Learning, Data engineering, Learning Big Data
Tagged data engineering
I built a working Hadoop-Spark-Hive cluster on Docker. Here is how.
TL;DR: I made a Docker Compose setup that runs Hadoop, Spark and Hive in a multi-container environment. You can find the necessary files for it here: https://github.com/Marcel-Jan/docker-hadoop-spark [Update 2021-11-09: Since Docker Desktop turned “Expose daemon on tcp://localhost:2375 without TLS” off by … Continue reading
Posted in Howto, Learning Big Data, Spark
Tagged Apache Spark, Big Data Europe, DIKW, Docker, docker-compose, Hadoop, Hive
R Studio: Doing data science on my health data – Part 1
Up to seven years ago my doctor would nag me every six months that I should lose some weight. Nagging didn’t work on me that much. What did work, however, was competition. I wanted to become faster in a bike … Continue reading
Posted in Learning Big Data, Weird experiments
Tagged Big Data, ggplot2, health data, R Studio, Udemy
Check your /tmp on HDFS
If you have sensitive data on your Hadoop cluster, you might want to check /tmp on HDFS once in a while to see what ends up there. /tmp is used by several components. Hive for example stores its “scratch data” there. … Continue reading
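As a rough illustration of such a check (not taken from the post itself), the sketch below parses the output of `hdfs dfs -ls -R /tmp` and lists the entries that anyone on the cluster can read. The helper names `parse_ls_line`, `world_readable` and `list_tmp` are made up for this example, and it assumes the `hdfs` command is on the PATH and prints the usual eight-column listing.

```python
import subprocess
from typing import List, NamedTuple, Optional


class HdfsEntry(NamedTuple):
    permissions: str
    owner: str
    group: str
    size: int
    path: str


def parse_ls_line(line: str) -> Optional[HdfsEntry]:
    """Parse one line of `hdfs dfs -ls` output; return None for other lines."""
    # Assumed columns: permissions, replication, owner, group, size, date, time, path
    parts = line.split(None, 7)
    if len(parts) != 8 or parts[0][0] not in "-d":
        return None  # e.g. the "Found N items" header or an empty line
    perms, _repl, owner, group, size, _date, _time, path = parts
    return HdfsEntry(perms, owner, group, int(size), path)


def world_readable(entries: List[HdfsEntry]) -> List[HdfsEntry]:
    """Keep entries whose 'other' read bit is set (index 7 of e.g. -rw-r--r--)."""
    return [e for e in entries if e.permissions[7] == "r"]


def list_tmp(path: str = "/tmp") -> List[HdfsEntry]:
    """Run a recursive HDFS listing and parse it (needs a working `hdfs` CLI)."""
    result = subprocess.run(
        ["hdfs", "dfs", "-ls", "-R", path],
        capture_output=True, text=True, check=True,
    )
    parsed = (parse_ls_line(line) for line in result.stdout.splitlines())
    return [e for e in parsed if e is not None]


if __name__ == "__main__":
    for entry in world_readable(list_tmp()):
        print(entry.permissions, entry.owner, entry.path)
```

Run on an edge node, this prints one line per world-readable file or directory under /tmp. Whether any of them actually hold sensitive data still needs a manual look.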
Making a Hertzsprung-Russell diagram from Gaia DR2 data with Elasticsearch
Elasticsearch was one of the open source products on my list to try out, ever since I got rejected for a couple of assignments as a consultant last year. Apparently it’s a popular product. But why do you need a … Continue reading
Posted in Learning Big Data, NoSQL
Tagged 2001 A Space Odyssey, ElasticSearch, Frank Kane, Gaia, Hertzsprung-Russell diagram, Kibana, Logstash, Udemy, Vega
Building HDP 2.6 on AWS, Part 3: the worker nodes
This is part 3 in a series on how to build a Hortonworks Data Platform 2.6 cluster on AWS. By now we have an edge node to run Ambari Server, three master nodes for Hadoop name nodes and such. Now … Continue reading
Posted in Howto, Learning Big Data
Tagged Amazon Web Services, AWS, cloning nodes, Hadoop, HDP, Hortonworks Data Platform, Ubuntu Server, worker nodes