My GitHub repo got 50 stars

I never imagined myself as the maintainer of a data-engineering-related open source project. Yet here I am. It started when I was working on our data engineering course and needed some kind of data lake software. At first I used the Cloudera sandbox, but some of my colleagues tried it and complained that it took way too long to start and used way too many of their laptops' resources. It was a safe bet that our students would run into the same problem.

Long story short: I found that Big Data Europe already had a simple Dockerized Hadoop. They actually did all the hard work. But I wanted it to have Hive and Spark too. So I started playing with docker-compose yml files (and learned a lot from that, BTW). After some initial frustrations it finally worked.

Posted in Apache Products for Outsiders, Data engineering

Five years of data engineering

Five years ago I made the switch from Oracle database administration to data engineering. It has been quite a ride. I made a video about this to celebrate.

Posted in Active Learning, Data engineering

First experiments with the OAK-D Lite

Last week my OAK-D Lite from Luxonis arrived. You've probably never heard of it. Basically it is a camera that can do all kinds of AI tasks on the device itself. I got mine via Kickstarter. And when I say camera, I actually mean it has multiple cameras. That's how it can see depth, for example.

It can do much more. Load an algorithm, point it at the street next to your house and it starts detecting cars, cyclists and pedestrians. Load the human posture algorithm and it starts showing your posture. Or gestures, sign language, face recognition or COVID-19 mask detection.


Posted in Active Learning, Weird experiments

What a year 2021 has been

So at the end of 2021 I found myself in the waiting room of an emergency dentist. An infection above my front teeth had become unbearable. Fortunately, antibiotics are making my life much better now. I won't let that event colour my view of 2021.

For me 2021 was a great year, despite the pandemic, lockdowns and those damned curfews. Luckily 2021 eventually also had vaccinations.

And what a difference a year makes. Last year I was frantically working on the last modules of the Certified Data Engineering Professional course. This year I have time for whimsical things like watching a movie (on TV) or playing a computer game.

 

Work

I've learned so much by writing and teaching modules for this course. You want to learn stuff really, really fast? Teaching it is probably the best way to go. Creating a course you can repeat is also a great investment in the future. Of course I still have to prepare for every day I teach, but the biggest investment is happily behind me.

Me (right) and my colleague Jeroen Odijk during the first Certified Data Engineering Professional course. That one was held online. The second one fortunately was on site.

I also managed to get an assignment at a Dutch financial company as a data engineer, working with Hadoop, NiFi and other tools. It's the first assignment that matches the kind of data engineering I envisioned back when I started switching from Oracle database expert to data engineer. The latest development is that I'm getting more and more into the practical aspects of data management.

I also got my Azure Fundamentals certification, but that is just the start of my Azure certifications.

 

GitHub

I never saw myself as a contributor to open source software. I wanted to be one, but I thought most of it was written by people who write Java or Python fluently. But then I created a docker-compose setup for a Hadoop environment for our data engineering course, and I decided to put it on GitHub. To my surprise this repo now has 38 stars. Turns out there are other people who would like to run Hadoop with Spark and Hive on their own laptop.

So now I'm suddenly a maintainer of a repo. Halfway through the year Docker decided to harden their software (which is good). And whoops: suddenly some of the containers couldn't communicate properly with each other. It took me a while to find out what had happened. What changed was that Docker Desktop no longer exposes the daemon without TLS. When you do expose it, everything works again. I need to find a better solution for that, BTW.

 

Vacations

In August I went cycling in France, in the Vercors and Drôme, and that was very nice. This area of France has some beautiful roads hewn out of rock, like the Col de la Machine. The climbs are usually less steep than the ones I've encountered in the Alps in other years. Except for the Col du Noyer, BTW.

Then in October I decided to go to Paris for a week. I had been to Paris before, about 30 years ago, on a sort of 24-hour race through the city to see all the sights, as you do when you're a student. I remember running through the Louvre with my brother just to see the Mona Lisa and then running out again to see the next thing (which didn't matter, because the ticket was free).

This time I took my time to see all the sights. I visited all the famous museums (the Musée d'Orsay, the Louvre and the Centre Pompidou) and really took time to soak it all in. I went through the Louvre from bottom to top. Amazing. I didn't know the Venus de Milo was also there. The Nike of Samothrace was also stunning. Another museum I visited was the Musée des Arts et Métiers, which has a particle collider from the 1930s and a steampunk diving suit from the 1880s.

A 19th-century diving suit. Also seen in the BioShock games.

The modern art in the Centre Pompidou is just fantastic. I'm not exactly a modern art connoisseur, so I didn't expect to enjoy this collection so much. I learned to love artists I had never heard of, like Frantisek Kupka and Suzanne Valadon. (I was surprised there are hardly any books on Kupka, except in French.)

A painting by Frantisek Kupka. Sorry for the fisheye lens photo. I was tired of switching lenses.

I had the most fun in years. The food was great. And I got to see the Arc de Triomphe, Wrapped by Christo (only on display for 3 weeks). It was a stunning sight.

Arc de Triomphe, Wrapped

If I have any resolutions for 2022, it’s to do another city trip like this. Preferably by train. Maybe Barcelona.

 

Cycling

I set myself a goal to ride 10,000 kms this year. I knew I could do it: in 2020 I reached 8,754 kms. By mid-October I had already passed 10,000 kms. And on the very last day of the year I reached my stretch goal of 12,000 km.

View from my bike somewhere north of Nieuwkoop.

BTW I have no plans for even larger goals in 2022. There’s no need to overdo it. Cycling takes time. Riding 12,000 kms took 436 hours. I need to leave time to do other things.

But I do keep looking out for new rides. One of my favorite discoveries this year was the Kromme Mijdrecht, a river south of Amsterdam. You can follow it on a road next to the river in both directions.

I also found that the ride along the coast between Noordwijk and Katwijk is wonderful, with a view of the North Sea for several kms. I need to ride about 110 kms in total to experience it, but in summer, when the days are long, this is quite achievable. A ride I did with my brother in Noord-Brabant was also a good one.

This year I rode the Gran Fondo Rosa and the Vael Ouwe. Both were beautiful but long rides of about 160 kms each (although the Gran Fondo Rosa was very, very wet), which was a new record for me for kms in one day. The Gran Fondo Rosa actually ended up being 170 kms because I couldn't find the car park.

(Oh, and I almost forgot: in February it froze long enough for me to go ice skating, for the first time in many years. I even skated round one of the lakes near Reeuwijk. Great experience.)

 

Astronomy and space

This is the hobby that gets less and less of my time. Observing has been suffering for several years already. It's especially hard to combine with the amount of cycling I did: after several hours of training your natural tendency is to go to bed early, not to stay up late to watch the skies.

But I was happy to observe the noctilucent clouds in June, and the passes of the ISS and the new Nauka module (and its 3rd rocket stage).

I used to do a lot more writing about the solar system (in Dutch), but the work on the course and the hours of cycling pushed that aside. I did enjoy the landing of the Perseverance rover and the recent launch of the James Webb Space Telescope. Not to mention the belly flops of several Starship test flights. Those were amazing. We'll have to wait a couple of months for the first images from JWST, but they're going to be game-changing.

 

Writing

I haven't done as much writing as I would like to, but my blogpost about critical thinking is one that was kind of influential in my personal life. It is about how a homeopath advised me to stop taking one of my prescribed medicines, with bad results. I had never written about something so personal, but in the end it worked out well.

 

Stage acting

This is the only thing that never got off the ground after the earlier lockdowns. I got together with my stage acting group only once, in the pub. We made plans and then the new COVID-19 peak happened.

 

Cooking

I never order meals at home. I just keep cooking myself. And the freezer is usually well stocked with extra portions for other days. One of my favorite new recipes was bibimbap, a Korean dish.

Okonomiyaki with “cycling” beer

Recently I invited two friends I hadn't seen in a while for a meal at my house. But COVID-19 infections peaked again, so we got another lockdown, and my friends understandably preferred to be safe rather than sorry. So I changed the plan. They live nearby, so I cooked Thai curry (from a recipe I've grown to like) and delivered it to their home with rice, beer and krupuk (a sort of Indonesian deep-fried cracker you get with a lot of Indonesian and Chinese meals in the Netherlands). We each ate at our own home and talked via Zoom. Despite the limitations of Zoom calls, it turned out to be a very fun evening. I hope to do this more often.

 

2022

I don't do New Year's resolutions. I have some ideas. They might pan out or not. I kept my Strava year goal at 10,000 km of cycling. That is more than enough to stay healthy. Azure certifications will be a thing, and I'm thinking of new short courses for DIKW. We'll see.

Best wishes for 2022. Let's make it a good one.

Oh, and of course you will be able to buy the NFT of this blogpost as soon as I know how to do that.

Posted in Active Learning, Data engineering

Weekendlinks 2021-47: bike crashes, Tetris and Airflow foundations

It’s weekend and it’s raining. Time to play some computer games.

 

Bikrash

In meatspace I try to avoid crashing my bike. But in this free game crashes are rampant. You can actually win by causing them, by kicking other racers. Also watch out for the enormous, road-spanning potholes.

https://hisashimaru.itch.io/bikrash

 

AI does Tetris. Fast!

This is fun to watch. The StackRabbit algorithm plays Tetris so fast that at a certain point it reaches levels for which the game has no programmed colour combinations anymore. And eventually the game breaks.

Found here: https://kottke.org/21/11/watch-an-ai-break-tetris

 

What I’m currently reading

Professionally I'm reading Data Pipelines with Apache Airflow by Bas Harenslak and Julian Rutger de Ruiter. After a somewhat slow intro, it really helps you along with building your first pipelines in Airflow.

Recreationally I'm rereading the Foundation trilogy by Isaac Asimov, because I'm also watching the series on Apple TV+. I think it's been at least 25 years since I first read it, and I didn't remember any of it, except that it was about Hari Seldon and there was a Second Foundation.

 

Posted in Weekendlinks

Weekendlinks 2021-46: brrrr, space juggling and salt crystals

I have to say, I'm a fan of long summer nights when I can ride my bike for hours after work in short sleeves. This is not that time. The sun sets at 16:45 now, so if I do decide to ride my bike after work, I need good bike lights to see where I'm going. But hey, at least it's not as cold as in Yakutia. (Please, summer, come back quickly.)

 

You think it’s cold where you live?

Wait till you hear about Yakutia, in Northern Siberia. Living in an area where temperatures drop to -71°C has its challenges. But on the other hand: you don't need to buy a freezer.

(Via https://kottke.org/21/11/how-people-live-in-the-coldest-place-on-earth)

 

The Space Juggler

I never miss an episode of the Weekly Space Hangout. Hosted by Fraser Cain of Universe Today, it brings you the latest news in space and astronomy. It's informative and funny.

But when I heard they had a guest called the Space Juggler, I could not keep one of my eyebrows from rising. It turns out Dr. Adam Dipert does incredibly interesting and diverse research. On the one hand he tries to find out why there was more matter than antimatter in the Universe after the Big Bang. On the other hand he experiments with juggling on zero-G flights. You will find that research equally mind-blowing.

 

Grow your own sodium chloride crystals

Wait, don't I already have salt crystals in my cupboard? Not big transparent crystals like these, my friend. This looks like an awesome experiment you can do at home. Maybe with the kids?

How to Grow Sodium Chloride Crystals at Home

 

Posted in Weekendlinks

Gaining insights on my workout data with Apache Superset

For a few years I've been gathering data on my workouts. In Excel. It's not exactly state-of-the-art data architecture, but it was fine for a while. But data alone doesn't do much. I wanted some questions answered.

Lately I’ve been hearing a lot about Apache Superset. (Well, I’ve been hearing lately about lots of products actually. It’s hard to choose one product to spend a lot of time on.) Apache Superset is open source data visualization software. I decided to give it a try for this particular problem.

The video

Apart from the installation I demo most things in this video:

 

Starting on your laptop

If there’s anything I’ve learned recently working with Docker Desktop, it’s that it is often very easy to get a working environment of most open source data products in one or more Docker containers. Usually they have an image or docker-compose on their site somewhere to get you started. Same for Superset.

All you need to get started is Git and Docker Desktop (available for Windows, Mac and Linux). I use Git for Windows. To build your Dockerized Superset environment, just follow the instructions on the Superset documentation site:

https://superset.apache.org/docs/installation/installing-superset-using-docker-compose

This will start 6 Docker containers. One of them (named superset_db) runs a PostgreSQL 10 database, which contains the sample data, but also can be used to upload your CSV data. Another, superset_init, will only be used for installation and won’t run anymore after that. That is fine.

After the installation is done, go to http://localhost:8088/login/ or open it from Docker Desktop.

You can log in here with username admin, password admin. Now the Superset welcome screen will show. You can have a look at the example charts and dashboards.

 

Uploading my own dataset

Superset allows you to upload a CSV file, which it stores in the example (PostgreSQL) database. But first you need to edit the settings of the example database to allow uploads. Go to Data, Databases and click Edit.

Go to Advanced, Security and check Allow data upload. All of this is explained in the video above.

 

How to get a proper datetime column

In the video I also explain all the settings for uploading the CSV file. But there was a problem: PostgreSQL didn't recognise the date and time format, despite the “Infer Datetime Format” setting I had enabled.

An example of my date and time data was this: 2021-10-21 17:52:00

Because PostgreSQL didn't recognise this as a datetime, Superset couldn't use its more advanced time-related features. For example, when I used a Time-series Bar Chart v2 or Time-series Area Chart v2 and chose a Time Grain of week or year, it came up with an “Unexpected error”:

Error: function date_trunc(unknown, text) does not exist LINE 1: SELECT DATE_TRUNC('year', "Datum") AS __timestamp, ^ HINT: No function matches the given name and argument types. You might need to add explicit type casts.
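To see why the column type matters: DATE_TRUNC rounds a real timestamp down to the start of a week, month or year, but to PostgreSQL a TEXT column is just a string, so there is no matching function. The Python sketch below is a rough stand-in for what DATE_TRUNC does once the value has been parsed, using the example value from my CSV. The helper name date_trunc is my own, it only covers a few time grains, and it assumes PostgreSQL's convention that weeks start on Monday. It illustrates the idea; it is not how PostgreSQL or Superset implement it.

```python
from datetime import datetime, timedelta


def date_trunc(field: str, ts: datetime) -> datetime:
    """Hypothetical, simplified stand-in for PostgreSQL's date_trunc."""
    midnight = {"hour": 0, "minute": 0, "second": 0, "microsecond": 0}
    if field == "year":
        return ts.replace(month=1, day=1, **midnight)
    if field == "month":
        return ts.replace(day=1, **midnight)
    if field == "week":
        # Weeks start on Monday; datetime.weekday() is 0 for Monday.
        return (ts - timedelta(days=ts.weekday())).replace(**midnight)
    if field == "day":
        return ts.replace(**midnight)
    raise ValueError(f"unsupported time grain: {field}")


# As uploaded, the value is only text; it has to be parsed into a timestamp first.
raw = "2021-10-21 17:52:00"
ts = datetime.strptime(raw, "%Y-%m-%d %H:%M:%S")

print(date_trunc("week", ts))  # 2021-10-18 00:00:00 (the Monday of that week)
print(date_trunc("year", ts))  # 2021-01-01 00:00:00
```

Trying the same on the raw string (date_trunc("week", raw)) fails with an AttributeError, which is roughly the Python version of PostgreSQL's “function date_trunc(unknown, text) does not exist”.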

To see the problem, go to your dataset, open More dataset options and choose Edit dataset.

In my case the Datum column was of type “TEXT”, not “DATETIME”. Marking the column as temporal in Superset doesn't help: you will still run into this Unexpected error.

This is more of a PostgreSQL problem. To solve it, let's go into the superset_db container and change the data type.

You need to do this on the command line. In Windows I used PowerShell for this.

Log in to the superset_db container with this command:

docker exec -it superset_db bash

Connect to the PostgreSQL database:

psql -h 127.0.0.1 -p 5432 -U superset

Check if you can access your table from here. In my case the table was called Workouts2. Because the name is case sensitive, I had to use double quotes around the name.

select * from "Workouts2";

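This is how you change the data type of your date column, in my case called “Datum”. First I add a temporary column called temp_date with the TIMESTAMP data type (PostgreSQL's datetime type) and fill it with the converted values from the “Datum” column. Then I change the type of “Datum” using those values and drop the temporary column: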

ALTER TABLE "Workouts2" ADD COLUMN temp_date TIMESTAMP without time zone NULL;
UPDATE "Workouts2" SET temp_date = "Datum"::TIMESTAMP;
ALTER TABLE "Workouts2" ALTER COLUMN "Datum" TYPE TIMESTAMP without time zone USING temp_date;
ALTER TABLE "Workouts2" DROP COLUMN temp_date;

After this small operation, you will have the same old “Datum” column, but now with the TIMESTAMP data type. And Superset will notice this too.

Now you can really start using the fun time related features in Superset, which I demoed in the video.
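If you'd rather do the conversion in one go, PostgreSQL's ALTER COLUMN ... TYPE ... USING can also cast the text column directly, without a temporary column. The Python sketch below only builds that statement (with the double quotes the case-sensitive names need) plus a docker exec command list you could hand to subprocess.run to run it non-interactively. The helper names are hypothetical, the command reuses the container name and psql options from above, and it is not executed in the sketch, so treat it as an assumption about your setup. If any row holds a value that isn't a valid timestamp, PostgreSQL refuses the cast and the table stays unchanged.

```python
def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier so mixed-case names like Workouts2 keep their case."""
    # Embedded double quotes are escaped by doubling them.
    return '"' + name.replace('"', '""') + '"'


def text_to_timestamp_sql(table: str, column: str) -> str:
    """Hypothetical helper: one ALTER TABLE that casts a text column to a timestamp in place."""
    col = quote_ident(column)
    return (
        f"ALTER TABLE {quote_ident(table)} "
        f"ALTER COLUMN {col} TYPE timestamp without time zone "
        f"USING {col}::timestamp;"
    )


sql = text_to_timestamp_sql("Workouts2", "Datum")
print(sql)

# Non-interactive variant of the docker exec + psql steps above.
# Only built here, not run; pass it to subprocess.run(cmd, check=True) to try it.
# A list avoids shell quoting problems with the double quotes in the SQL.
cmd = ["docker", "exec", "superset_db",
       "psql", "-h", "127.0.0.1", "-p", "5432", "-U", "superset", "-c", sql]
```

The printed statement is ALTER TABLE "Workouts2" ALTER COLUMN "Datum" TYPE timestamp without time zone USING "Datum"::timestamp; which you can also paste straight into the psql session.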

 

My Superset experience

Once I had my datetime column set up, things really took off. Superset became a lot of fun to use, and I was able to gain insights quickly. Superset is indeed very powerful. If possible, I will certainly use it in the future.

 

Future work

My Superset dashboard looks awesome, but for every update to my Excel sheet, I need to upload it to Superset again. That is not something I'm looking forward to doing.

In fact, I don't like entering my health data in Excel at all. Here is what I would like my data architecture to look like:

  • I want an app on my iPad or iPhone to enter my weight data. (Now I use Evernote and copy that to Excel.) The app uploads this data directly into a central database in my home, possibly on a Raspberry Pi.
  • I have a pipeline that retrieves new workout data from Strava and Polar and enters this data into my central database.
  • On this central database I run Superset, where my dashboard is always up to date.
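To give an idea of the central database part, here is a minimal sketch. It uses Python's built-in sqlite3 as a stand-in for whatever database ends up on the Raspberry Pi, a hypothetical workouts table, and made-up records instead of real Strava or Polar data (there are no API calls here). The point is that the pipeline can run as often as it likes: the primary key on source and workout id makes a second load skip workouts that are already stored. Timestamps are stored in the same format as my CSV, so a proper timestamp column is easy to get later.

```python
import sqlite3
from datetime import datetime

# Hypothetical schema: sqlite3 stands in for the database on the Raspberry Pi.
SCHEMA = """
CREATE TABLE IF NOT EXISTS workouts (
    source      TEXT NOT NULL,  -- e.g. 'strava' or 'polar'
    external_id TEXT NOT NULL,  -- the workout id used by that source
    started_at  TEXT NOT NULL,  -- 'YYYY-MM-DD HH:MM:SS', like the CSV
    distance_km REAL,
    PRIMARY KEY (source, external_id)
);
"""


def load_workouts(conn: sqlite3.Connection, workouts: list) -> None:
    """Insert workouts, silently skipping any (source, id) pair already stored."""
    rows = [
        (w["source"], w["id"], w["started_at"].isoformat(sep=" "), w["distance_km"])
        for w in workouts
    ]
    with conn:  # commits on success, rolls back on error
        conn.executemany("INSERT OR IGNORE INTO workouts VALUES (?, ?, ?, ?)", rows)


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Made-up records standing in for whatever a Strava or Polar fetch would return.
batch = [
    {"source": "strava", "id": "1", "started_at": datetime(2021, 10, 21, 17, 52), "distance_km": 42.0},
    {"source": "polar", "id": "a", "started_at": datetime(2021, 10, 23, 9, 30), "distance_km": 60.5},
]

load_workouts(conn, batch)
load_workouts(conn, batch)  # running the pipeline twice doesn't duplicate anything
print(conn.execute("SELECT COUNT(*) FROM workouts").fetchone()[0])  # 2
```

Making the load idempotent like this means a scheduled job can simply re-fetch the last few days every time, without keeping track of what it already stored.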

This will require quite a lot of work and time. I have no idea how to create an app that works on my iPad or iPhone. I hope it can be done in Python, but even creating an app in Python is new ground for me.

If I can create this, I will share it on this blog. But don’t hold your breath just now.

 

 

 

Posted in Apache Products for Outsiders, Howto

Weekendlinks 2021-41: Rickrolling, carbon capture, Shatner’s flight

Last weekend I was too busy either cycling or celebrating that I rode 10,000 kms this year. Nevertheless, here are this weekend’s links.

 

Rickrolling your high school by hacking the IPTV system

This student managed to gain access to his high school’s IPTV system. And he carefully prepared a rickrolling prank.

Rick Astley in action at Elk Grove High School. (Photo: Tom Tran)

https://whitehoodhacker.net/posts/2021-10-04-the-big-rick

 

This carbon capture method might actually work

Let's face it: if we're dependent on all political leaders to do their part to reduce greenhouse gases… well, just looking at my own country's (the Netherlands) leadership, I don't expect them to even work very hard on it.

Credit: Tang et al.

So let’s capture all that carbon dioxide. That could work, right? Sure, but up to now you needed a lot of energy to pull it off. How do we generate that energy? Please don’t say “fossil fuels”.

But this new method, which uses liquid gallium, can convert carbon dioxide into solid carbonaceous products and oxygen, and it doesn't seem to need massive amounts of energy. Solid products are a plus when you want to store the carbon. And getting your oxygen back is positive too.

http://www.sci-news.com/othersciences/chemistry/liquid-gallium-carbon-dioxide-conversion-10164.html

 

Shatner’s flight to space

Last Wednesday William Shatner flew on the 18th flight of Blue Origin's New Shepard rocket. And when he came back, he was literally moved to tears after seeing the Earth from 100 km altitude.

It's moving to see him trying to describe his experience. And to think Jeff Bezos almost ruined it, waving his champagne bottle around. If we must do space tourism, please send more people like William Shatner. And not just because he was Captain James T. Kirk in Star Trek.

 

 

Posted in Weekendlinks

Weekendlinks 2021-38

 

Question I’m pondering

I was listening to Lex Fridman's podcast where he interviews Daniel Kahneman. You might have heard of Kahneman: he wrote the influential book “Thinking, Fast and Slow”. It is about the two modes of thinking our brain uses: System 1 (fast, instinctive and emotional) and System 2 (slower, more deliberative, and more logical).

At one point in the interview Fridman and Kahneman discuss happiness. Kahneman says he gave up on happiness research, and he describes this (hopefully hypothetical) scenario:

Suppose you go on a vacation. But at the end of the vacation you’ll get an amnesic drug and you won’t remember anything. And all your pictures will be destroyed. Would you choose the same vacation?

That is kind of interesting. I look back fondly on my recent vacation in France: the beautiful routes I rode, the wonderful meals I had, the cols I climbed. I'm still busy making video compilations of each day. And I proudly look back at the rides I've done on Strava. Suppose all that was deleted? How would I change my holiday? That is such an interesting question.

I think I would do another cycling holiday, just because I feel great afterwards. Part of the fun though is reviewing the videos and photos I shot. Would I do an intensive cycling holiday in a less beautiful area, just because I would not remember anything about it anyway?

 

An asteroid hit Jupiter

This is something almost only amateur astronomers find: asteroid impacts on Jupiter. Like Jose Luis Pereira from Brazil, who made this find while imaging the largest planet in our solar system.

(You'd think spacecraft would pick that up first, but then you overestimate how many active spacecraft there are around our planets. Currently there's only NASA's Juno mission in orbit around Jupiter. Juno is in a very elongated orbit so that it stays out of Jupiter's harmful radiation belt most of the time. It was that, or build a spacecraft with expensive, fully radiation-hardened electronics. Also, Juno does have a camera, but that was more or less added to the spacecraft for public outreach. From the point in its orbit where it currently is, it can't view Jupiter any better than amateur astronomers can. It will take until October 16th for Juno to get close again, and by that time there will probably be no trace left of the impact.)

 

The Vinland map is (partly) fake

I remember hearing about this map when I was young: supposedly drawn by Vikings, it would show North America. Well, it was analyzed by scientists at Yale University. The map is old, but X-ray spectroscopy shows that the ink used to depict America contains titanium. And that type of ink only came into use in the 1920s.

https://news.yale.edu/2021/09/01/analysis-unlocks-secret-vinland-map-its-fake

 

Wonderful picture from ISS

This is Eastern Europe, shot at night from the ISS. In front you see the Soyuz spaceship (used to bring cosmonauts to the ISS and back home) and the new Nauka module.

The Soyuz and Nauka above eastern Europe

Posted in Weekendlinks

Weekendlinks 2021-33

I'm back from a wonderful cycling holiday in the Vercors and Drôme regions of France. And this is what it looked like:

But enough of that. Let’s have some weekend links.

 

One little RNA change: Boom! 50% more potato for you

Scientists found that changing a single methyl group in the RNA of potato plants makes them yield 50% bigger potatoes. And it's not just a more watery potato: there were no changes to starch and protein. It doesn't stop at potatoes either. In rice plants it produces more rice (not bigger grains). The reason seems to be that the roots of the plants grow deeper and photosynthesis is considerably more effective.

So the world rejoices, right? Bigger potatoes and more rice for everyone. No more world hunger! Probably more tests need to be done. And the stigma of genetically modified organisms (GMO) will likely prevent much change from happening here in Europe. But hopefully in other countries it will indeed lead to less hunger some day.

https://blogs.sciencemag.org/pipeline/archives/2021/07/28/one-lost-methyl-group-huge-amounts-of-food-production

 

GPT-3 writes an attorney case

GPT-3 is a language model that can generate very impressive texts. But it has its limits. You can use it to create the text of an exciting court case with plot twists and everything. But that doesn't mean it understands how courts work.

(Found on the Links for July post on the Astral Codex Ten blog)

 

A different cooking program

I love Nat's What I Reckon. It's not your average cooking program, and yet the food he serves is genuinely good.

Posted in Weekendlinks