Category Archives: Apache Products for Outsiders
My Github repo got 50 stars
I never imagined myself as a maintainer of a data-engineering-related open source thing. Yet. But when I was working on our data engineering course, I needed some kind of data lake software. At first I used the Cloudera … Continue reading
Posted in Apache Products for Outsiders, Data engineering
Tagged Docker, docker-compose, Github, Hadoop, stars
Gaining insights on my workout data with Apache Superset
For a few years I’ve been gathering data on my workouts. In Excel. It’s not exactly state-of-the-art data architecture, but it was fine for a while. But data alone doesn’t do much. I wanted some questions answered. … Continue reading
Posted in Apache Products for Outsiders, Howto
Tagged Apache Superset, DATETIME, Docker, docker-compose, health data, PostgreSQL
Hadoop in a Hurry – Security
Hadoop security involves so many products and features. What do all of them do? This video gives a high-level overview.
Hadoop High Availability In A Hurry – Part 2: YARN
If you don’t know a lot about YARN and why it’s called a data operating system, you’re in luck. I found it necessary to explain how YARN works before I could explain the solutions for high availability. At first YARN … Continue reading
Posted in Apache Products for Outsiders
Tagged Application Master, Container, Hadoop, Node Manager, Resource Manager, YARN, ZooKeeper
1 Comment
Hadoop High Availability In A Hurry – Part 1: HDFS
I’ve spent a couple of hours studying how Hadoop high availability works, for the HDPCA exam. And now I’ve condensed that knowledge into a video on HDFS HA in just under 9 minutes. Enjoy!
Posted in Apache Products for Outsiders
Tagged DataNode, edits file, Fencing, fsimage, Hadoop, HDFS, High availability, JournalNode, NameNode, Split brain, ZKFC, ZooKeeper
1 Comment
Quick start of the NiFi crash course
As I said in my last blog post, I have followed the Apache NiFi crash course that Hortonworks provides. Now the tutorial describes several different scenarios and options and you have to read through them to find the one you want. … Continue reading
Posted in Apache Products for Outsiders
Tagged Apache NiFi, Google Places API, HDP Sandbox, Hortonworks, tutorial
2 Comments
My first experiences with Apache NiFi
There are a lot of data-related Apache products out there and it’s hard to keep up with all of them. There are several products to stream or flow data (what’s the difference?), like Kafka, Storm, Flink and NiFi. Yes, all … Continue reading