Handling far future dates in pandas

Recently I got a request to add specific data quality metadata to the CSV datasets that my client delivers to customers. It was very simple: just counts, min, max and, in the case of integers, sums per attribute. Not a difficult task. After a short talk with the architects we decided to build this with Python and pandas. Because my efforts were required in another project at that time, my fellow DIKW entrepreneur Wyas built most of it. It worked out well: it ran through a couple of GBs in minutes.
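To give an idea of what such per-attribute metadata could look like, here is a minimal sketch in pandas. It is not the code we actually built: the function name `profile_csv`, the sample data and the output layout are all made up for illustration.

```python
import io

import pandas as pd

# Hypothetical sample standing in for one delivered CSV dataset.
SAMPLE_CSV = """customer_id,order_count,city
1,3,Utrecht
2,5,Amsterdam
3,2,
"""


def profile_csv(source):
    """Return {attribute: {count, min, max, sum}} for a CSV source.

    Assumption for this sketch: 'sum' is only filled in for columns
    that pandas reads as an integer dtype, and is None otherwise.
    """
    df = pd.read_csv(source)
    metadata = {}
    for name in df.columns:
        col = df[name]
        non_null = col.dropna()
        is_int = pd.api.types.is_integer_dtype(col)
        metadata[name] = {
            # count() ignores missing values
            "count": int(col.count()),
            "min": non_null.min() if not non_null.empty else None,
            "max": non_null.max() if not non_null.empty else None,
            "sum": int(col.sum()) if is_int else None,
        }
    return metadata


if __name__ == "__main__":
    for attribute, stats in profile_csv(io.StringIO(SAMPLE_CSV)).items():
        print(attribute, stats)
```

One thing to keep in mind with this approach: an integer column that contains missing values is read by pandas as a float column by default, so in this sketch it would not get a sum.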

(more…)

My Github repo got 50 stars

I never imagined myself as a maintainer of a data engineering related open source thing, yet here I am. When I was working on our data engineering course, I needed some kind of data lake software. At first I used the Cloudera sandbox, but some of my colleagues tried it and complained that it took way too much time to start and used way too many of their laptop's resources. It was a good bet that our students would run into that problem too.

Long story short: I found that Big Data Europe already had a simple Dockerized Hadoop. They actually did all the hard work. But I wanted it to have Hive and Spark too. I started playing with docker-compose YAML files and learned a lot from that, BTW. And after some initial frustrations it finally worked. (more…)

First experiments with the OAK-D Lite

Last week my OAK-D Lite from Luxonis arrived. I can imagine you’ve never heard of it. Basically it is a camera that can do all kinds of AI tasks on the device itself. I got mine via Kickstarter. And when I say camera, I actually mean it has multiple cameras. That’s how it can see depth, for example.

It can do much more. Load an algorithm, point it at the street next to your house and it starts detecting cars, cyclists and pedestrians. Load the human posture algorithm and it starts showing your posture. Or gestures, sign language, face recognition or COVID-19 mask detection.

(more…)