Masterclass Machine Learning in Cycling

Last Tuesday Paul van Herpt and I traveled to Lille for a special Machine Learning and Cycling Masterclass. Transfer Solutions is data partner of the Soudal Quick-Step Pro Cycling Team, and these are exactly the applications where we can make a difference. That is why Paul and I followed this special course from the IDLab (UGent – UAntwerpen – imec).

The author (left) and Paul van Herpt at the Masterclass Machine Learning in Lille.

Machine learning is already widely used in sports. In soccer, for example, a huge amount of statistics is at hand: who has the ball for how long, who usually passes to whom, who makes the most runs, who is the most dangerous? That kind of data is already easy to track. And in tennis, it is easy to track the ball, calculate its speed, etc.

Visiting PyGrunn 2025

Conferences are a great way to learn about diverse topics in your field. That’s why I like to go to events like PyCons and, last Friday, PyGrunn. PyGrunn is a Python event in Groningen, the Netherlands. I submitted two talks for the event myself. One of them was selected.

Here is a recap of the talks I attended and the stuff I learned, so maybe you get inspired to attend Python conferences and even speak at these events.

Keeping your Python in check – Mark Boer

Python was originally developed to make coding more accessible. Where other programming languages made you declare the data type of each variable, Python deduces it automatically. Good for beginning coders, maybe not so good for advanced data solutions.

Mark Boer has experience in strong typing in his data science solutions. He shared how you can ensure typing in different ways: in data classes, using Pydantic and named tuples. The talk assumed that the attendees already had experience with typing. I had not, so it was a lot to take in. But if I can review the video in a few weeks, I hope to catch on.
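To give an idea of what two of those approaches look like, here is a minimal sketch using only the standard library. It is my own illustration, not code from Mark's talk, and the names Point and Ride are made up. Note that data classes and named tuples only declare types for tools like type checkers; they do not check values at runtime, which is why the sketch adds one manual check. Pydantic is the option that validates at runtime.

```python
from dataclasses import dataclass
from typing import NamedTuple


class Point(NamedTuple):
    """A typed named tuple: fields have names and declared types."""
    lat: float
    lon: float


@dataclass(frozen=True)
class Ride:
    """A typed data class (hypothetical example, not from the talk)."""
    name: str
    distance_km: float
    start: Point

    def __post_init__(self) -> None:
        # Type hints are not enforced at runtime, so any rule you
        # really need has to be checked explicitly.
        if self.distance_km < 0:
            raise ValueError("distance_km must be non-negative")


ride = Ride(name="Lille loop", distance_km=42.5, start=Point(50.63, 3.06))
print(ride.start.lat, ride.distance_km)
```

A static type checker can then flag a call such as Ride(name=1, ...) before the code ever runs.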

My experiences with agentic AI

Originally I wanted to write a blogpost about what data engineers are going to do when AI writes their code. But before I can write that, I need to share my experiences so far. That will give you an idea of where these tools work and where they fall short.

This is not meant as a treatise on AI coding assistants and agentic AI tools. But here are some of the tools I’ve tried:

  • I’ve worked with VSCode and Copilot now for at least a year.
  • I regularly use ChatGPT and Phind.com for advice on programming tasks.
  • I’ve used VSCode with Cline / Roo Code extensions and LLM models.
  • And I’ve used Claude Code (which is not free, but there seems to be a trial amount of free tokens). Claude Code works from the command line.

The agentic AI solutions are interesting. They are quite capable of creating whole Python projects based on your requests. But that doesn’t mean these projects will work right out of the box. Usually it takes some tweaking, restarting and checking of results.

Diagram of heights of Olympic athletes. There's a big gap between short and tall athletes.

Profiling data with ydata in PySpark

When you have a dataset to explore, there are several ways to do that in PySpark. You can do a describe or a summary. But if you want something a little more advanced, and a better view of what is actually in there, you might want to try data profiling.

Older documentation might point you to Pandas profiling, but this functionality is now part of the Python package ydata-profiling (which is imported as ydata_profiling).
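As a rough sketch of both routes: the first function below uses the built-in describe and summary methods of a PySpark DataFrame, the second one hands the DataFrame to ProfileReport from ydata_profiling and writes an HTML report. The function names, the title and the output path are my own placeholders, and I assume pyspark and ydata-profiling are installed and that df is an existing Spark DataFrame.

```python
def quick_look(df):
    """Print the built-in PySpark statistics for a Spark DataFrame."""
    df.describe().show()  # count, mean, stddev, min, max
    df.summary().show()   # the same, plus approximate quartiles


def profile_to_html(df, title="Data profile", out_path="profile.html"):
    """Write a ydata-profiling report for df to an HTML file."""
    # Imported here so this module still loads when the third-party
    # package ydata-profiling is not installed.
    from ydata_profiling import ProfileReport

    report = ProfileReport(df, title=title)
    report.to_file(out_path)
    return out_path
```

Calling profile_to_html(df) on a large dataset can take a while, so it may be worth trying it on a sample first.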

I’ve been following this blog on starting with ydata-profiling:

https://www.databricks.com/blog/2023/04/03/pandas-profiling-now-supports-apache-spark.html

Getting ydata-profiling to work is not exactly a walk in the park. You’d think you can just feed it your messy dataset and it will show you what the data is like. But I encountered some problems:

  • I got errors about missing Python packages in some situations.
  • ydata doesn’t seem to like dataframes with only string columns.
An e-ink display showing an amount of 837 euros with a field of tulips as background.

Showing a gift total on a Raspberry Pi with an e-ink display – how hard could it be?

TL;DR:

These Python and Raspberry Pi projects: they are fun, aren’t they? And often they look deceptively simple. But you don’t see all the projects that failed, and usually not where the finished ones struggled. This project got stuck (and almost failed) at:

  • Not being able to scrape dynamic website content.
  • When I found out how to do that, I couldn’t run my working Python code on the Raspberry Pi.
  • That turned out to be because the scraping packages use a Chromium browser that is not available for the ARM processor of the Raspberry Pi.
  • And to top it all off, the Python package for the Inky Impression e-ink display had some kind of problem running numpy.

A Strava dashboard on a Raspberry Pi (Part 3): The Strava API

This is part 3 of a series of blogposts on how I created a Strava dashboard on an Inky Impression e-ink display with a Raspberry Pi.

OAuth2

This was the part that I expected to be the hard part: getting my data from Strava. Or, to be more precise: getting the connection right so the Strava API would allow me to get that data. It requires authentication via the OAuth2 protocol. I tried a similar thing a few years back with a Google API and I just didn’t get it. But now I do.

Strava API documentation

It requires a whole “dance” between your computer code and the Strava API where you exchange all kinds of tokens back and forth. Strava’s Getting Started with the Strava API document explains it quite well. And this blogpost by Graziano Fuccio helped me a lot with the Python code: http://www.grace-dev.com/python-apis/strava-api/.

Frustratingly, I still couldn’t get it to work. The reason, I found out, is that the URL for the authentication has changed: https://www.strava.com/oauth/token became https://www.strava.com/api/v3/oauth/token. I found the correct URL elsewhere in the Strava API documentation. I’ve told Strava that their Getting Started documentation is outdated. They asked me to create a ticket, which I did, but I don’t think they have changed their document yet. Graziano Fuccio did update his blogpost, though.
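To make the “dance” a bit more concrete, here is a minimal sketch of its first two steps, using only the Python standard library. It is not the code from my dashboard: the function names and the example values are placeholders, and I assume the usual OAuth2 authorization-code parameters (client_id, client_secret, code, grant_type) plus the activity:read scope. The sketch only builds the requests; nothing is sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

AUTHORIZE_URL = "https://www.strava.com/oauth/authorize"
TOKEN_URL = "https://www.strava.com/api/v3/oauth/token"  # the newer URL


def build_authorize_url(client_id, redirect_uri, scope="activity:read"):
    """Step 1: the URL you open in a browser to approve the app."""
    query = urlencode({
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "response_type": "code",
        "scope": scope,
    })
    return f"{AUTHORIZE_URL}?{query}"


def build_token_request(client_id, client_secret, code):
    """Step 2: a POST request that trades the code for tokens."""
    body = urlencode({
        "client_id": client_id,
        "client_secret": client_secret,
        "code": code,
        "grant_type": "authorization_code",
    }).encode("ascii")
    return Request(TOKEN_URL, data=body, method="POST")


# Placeholder values; use the ones from your own Strava API application.
print(build_authorize_url("12345", "http://localhost/exchange"))
request = build_token_request("12345", "my-secret", "code-from-redirect")
print(request.get_method(), request.full_url)
```

Sending the second request with urllib.request.urlopen should return a JSON document containing, among other things, an access token and a refresh token.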
