The blog is back

Well, that was scary. Just before I went on holiday I switched providers for my marcel-jan.eu domain. And while I had built in some time before going on vacation, there were problems with the transfer code not working. Apparently a .eu domain is handled differently from a regular .nl domain.

In the end I managed to get my marcel-jan.eu mail working just the evening before leaving. But I saw no way to migrate the blog while packing my bags. So the blog was down for more than 2 weeks. Did anybody miss it?

After getting back home I had to piece the WordPress blog back together from a .zip backup and a backup of the filesystem. I had never done such a thing before. And the original WordPress blog on my old provider’s site was already gone, so there was no way to do a better export.

Importing did not go as planned

I started by installing WordPress at my new provider’s site. Then I went to PHPMyAdmin, which is the tool to work with the database behind WordPress. I imported the .zip (with a .sql file in it). And.. no blogposts. A further look with PHPMyAdmin in the database showed that there were several xxx_posts tables. The one the WordPress site was looking in was wplx_posts. My imported tables were called wp_posts and 4a2vK12BOL_posts. wp_posts contained old stuff. The 4a2vK12BOL_posts table turned out to have all my posts.

Time to play dirty with SQL

So how do I point WordPress to the right data? It’s good to have some SQL skills. What if.. hear me out.. I read the .sql file I got from the export and pick out the SQL that imports the 4a2vK12BOL_posts table? Then search and replace “4a2vK12BOL_posts” with “wplx_posts” in a text editor? And then import that? It’s dirty, I grant you that.

But it turns out, it works. As long as you don’t create any new posts beforehand that use the same ID as the ones you try to import. A quick removal of the Hello World post made sure of that.
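I did the search and replace by hand in a text editor, but the same trick could be scripted. Here is a minimal sketch in Python, assuming the relevant INSERT statements have already been copied out of the dump. The function name and the sample statement are made up for illustration.

```python
import re

OLD_PREFIX = "4a2vK12BOL_"  # table prefix found in the exported dump
NEW_PREFIX = "wplx_"        # table prefix the new WordPress install uses


def retarget_table(sql_text, table):
    """Rename OLD_PREFIX + table to NEW_PREFIX + table everywhere in sql_text.

    Caveat: this is a plain text replace, so it would also change the
    old table name if it happened to appear inside a post's content.
    """
    pattern = re.compile(r"\b" + re.escape(OLD_PREFIX + table) + r"\b")
    return pattern.sub(NEW_PREFIX + table, sql_text)


# Hypothetical one-row statement, only to show the effect
sample = "INSERT INTO `4a2vK12BOL_posts` (`ID`, `post_title`) VALUES (12, 'Hello');"
print(retarget_table(sample, "posts"))
```

The word boundaries in the pattern make sure only the exact table name is renamed, so other tables with the same prefix are left alone.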

And it worked. I got my posts back. Okay, that’s something. I don’t have to type all my writings from 2017 to now again.

I did something similar for the comments. Make sure you do that before the first comment spam arrives, because a new comment will take an ID in the comments table that clashes with the ones you try to import.

Now I need some images

I was not really surprised that restoring table contents did nothing for my images. Pretty sure that had to come from the filesystem. Luckily I had made a backup of all that. But where to get the image files and where to put them?

Well, looking over the sql for the posts table, I found references to image files like this one: https://marcel-jan.eu/datablog/wp-content/uploads/2017/11/Heart-Reanimation-65992.gif. So somewhere there should be a path with something like wp-content/uploads in the name and a lot of gifs and jpgs in it. I found that, uploaded the directories to the new site and now I had my images back.
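I found those references by reading through the SQL, but a small script could list them too. A rough sketch, assuming the posts SQL is available as text and that images live under wp-content/uploads with common extensions. The second file name in the sample is invented for the example.

```python
import re

# Matches relative upload paths such as wp-content/uploads/2017/11/name.gif
UPLOAD_RE = re.compile(
    r"wp-content/uploads/[\w\-./]+?\.(?:gif|jpe?g|png)\b", re.IGNORECASE
)


def find_upload_paths(sql_text):
    """Return the unique upload paths referenced in the SQL text, sorted."""
    return sorted(set(UPLOAD_RE.findall(sql_text)))


# Sample text: the first URL is from my posts table, the second is made up
sample = (
    '<img src="https://marcel-jan.eu/datablog/wp-content/uploads/2017/11/Heart-Reanimation-65992.gif">'
    '<img src="https://marcel-jan.eu/datablog/wp-content/uploads/2018/03/example-photo.jpg">'
    '<img src="https://marcel-jan.eu/datablog/wp-content/uploads/2017/11/Heart-Reanimation-65992.gif">'
)
for path in find_upload_paths(sample):
    print(path)
```

The resulting list tells you which year/month directories to look for in the filesystem backup.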

That one time I used TablePress

My article about Lion’s Mane is one of the most popular blogposts for some reason. Lots of people want cognitive enhancement, apparently. (I wish my post about becoming a skeptic was just as popular. Oh well.) In that post was my one use of a TablePress table. How to get that back?

It turns out the data can be found in the options table. But I had some doubts whether importing it would mess other things up and whether TablePress would find it. So I dug in the Internet Archive to find the contents of the table, and used Excel to create a csv file of that table. Imported that in TablePress and hey presto: we got ourselves our table back.

Tags and categories

One thing I noticed was that my categories and tags were gone. The categories were a big mess after 5 years of blogging, so actually that wasn’t a big loss. More like a good moment to rethink them. As for tags: it would be nice to retrieve them somehow.

Fortunately there is documentation on the data model of WordPress’ database. Like this site: https://wp-staging.com/docs/the-wordpress-database-structure/

From this I learned which tables I needed to import to get my tags back. It turns out it’s wplx_term_taxonomy and wplx_term_relationships. In wplx_term_taxonomy there were already 3 IDs taken. ID 2 and 3 were now a wp_theme, whereas in my old table they were categories.

I decided to remove ID 1, 2 and 3 from my insert statement and import that. If I’m missing 2 categories, that won’t hurt me a lot.
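I removed those rows by hand, but for a bigger table it could be scripted. A minimal sketch, assuming the export puts each row of the INSERT statement on its own line and that the ID is the first column. The sample rows are invented; only the column names are the ones WordPress uses for the term_taxonomy table.

```python
import re


def drop_rows_by_id(insert_sql, ids_to_skip):
    """Remove rows whose first column is in ids_to_skip from a multi-row INSERT.

    Assumes one "(...)" row per line after the INSERT ... VALUES header line.
    """
    lines = insert_sql.strip().splitlines()
    header, rows = lines[0], lines[1:]
    kept = []
    for row in rows:
        match = re.match(r"\((\d+),", row.strip())
        if match and int(match.group(1)) in ids_to_skip:
            continue  # this ID is already taken in the new table
        kept.append(row.strip().rstrip(",;"))
    # Re-join with commas and close the statement with a single semicolon
    return header + "\n" + ",\n".join(kept) + ";"


# Invented sample data, for illustration only
sample_insert = """
INSERT INTO `wplx_term_taxonomy` (`term_taxonomy_id`, `term_id`, `taxonomy`, `description`, `parent`, `count`) VALUES
(1, 1, 'category', '', 0, 5),
(2, 2, 'category', '', 0, 3),
(3, 3, 'category', '', 0, 1),
(4, 4, 'post_tag', '', 0, 2),
(5, 5, 'post_tag', '', 0, 7);
"""
print(drop_rows_by_id(sample_insert, {1, 2, 3}))
```

Stripping and re-adding the trailing commas matters here: if the last row is among the removed ones, the statement still ends with a semicolon instead of a dangling comma.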

Anything else?

From the wp-staging article I learned I probably won’t be needing much more from the import. Maybe I will be missing some stuff from the options table, because there are all kinds of things that plugins put there. But I’m not going to open that can of worms.

I certainly learned a lot on WordPress and its database.. forcefully. Glad the blog is back on the road at my new provider.

Coverart by DALL-E 2

Posted in Howto

I started vlogging about data mesh (and other things)

Last June I made a short video while walking in the park next to the DIKW Intelligence office. And I posted it on LinkedIn. To my surprise it did very well. So I thought: why not make more of these short videos on data topics? And why not make them somewhere in nature?

I’m on my bike almost every day this time of year. Surely I could make a short stop and do a little talk? I started to make them in Dutch and then also in English.

Posted in Active Learning, Data engineering

Adding the track of my bike ride on a Folium map

Having markers of videos and photos taken during my bike ride is cool and all, but how about having a track of the bike ride itself? All my bike rides are registered on Strava, the cycling and running app. Strava has an API for developers, but it requires connecting via OAuth 2.0 and knowledge of the API. I decided to go an easier route: because I’m a Strava Premium member, I can download the GPX track of any ride, including my own.

These .gpx track files are of the same XML structure as we saw embedded in video files in my last blogpost. I can just open the file and use almost the same Python code to read the locations.
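To give an idea of what reading such a file looks like, here is a minimal sketch using only the standard library. It is not the code from my repo. It assumes the file uses the GPX 1.1 namespace, and the two track points in the sample are made up.

```python
import xml.etree.ElementTree as ET

# Assumption: the file declares the GPX 1.1 namespace
GPX_NS = {"gpx": "http://www.topografix.com/GPX/1/1"}


def read_track_points(gpx_text):
    """Return a list of (latitude, longitude) tuples for every track point."""
    root = ET.fromstring(gpx_text)
    return [
        (float(point.get("lat")), float(point.get("lon")))
        for point in root.iterfind(".//gpx:trkpt", GPX_NS)
    ]


# Tiny made-up track, for illustration only
sample_gpx = """
<gpx xmlns="http://www.topografix.com/GPX/1/1" version="1.1" creator="example">
  <trk><trkseg>
    <trkpt lat="52.0907" lon="5.1214"><ele>5.0</ele></trkpt>
    <trkpt lat="52.0912" lon="5.1230"><ele>5.2</ele></trkpt>
  </trkseg></trk>
</gpx>
""".strip()

print(read_track_points(sample_gpx))
```

A list of (latitude, longitude) pairs like this is the shape that Folium's PolyLine takes as its locations.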


Posted in Howto, Python

Photo locations, marker icons and displaying photos on my map

When I was finished last week with creating my video location map in Python, I thought “shame I can’t plot photo locations”. That’s because my Fuji X-T30 camera doesn’t store GPS info. When I bought the camera I assumed every modern camera had GPS tagging, so I didn’t even check that feature. Too bad. But I also took some photos during my vacations with my humble iPhone 8. And it does have GPS tags. So let’s plot some photo locations.


Posted in Data engineering, Howto, Python

Making my video location map even better with Folium

Yesterday I shared how I plotted locations of videos shot with my Sony FDR-X3000 camera on a map. I was already pretty happy. Then I got a tip from Twitter user Bob Haffner (@bobhaffner): why not use Folium to create my maps?

Huh? I already got a working map now, didn’t I? Well, creating a map with matplotlib is a bit of a hassle. You’ve got to download a base map from Openstreetmap.org. And if the area is too big, like the map of all my rides in the Vercors and Drôme (France) last year, you might not get it.

 

Folium

A quick look at blogs about Folium tells me you don’t need to download a map. It even does zoomable maps. Okay. Wasn’t exactly looking for that. But sounds great.

And I do like the markers you can create. Even with popup texts. Yes please!

 

Changing the code

Of course, the new version of this code is on Github in my repo: https://github.com/Marcel-Jan/media_gpsplot.

First of all we need to import Folium:

import folium

So I already have my dataframe with geo data. Don’t need to change that. But I’m going to change all the plotting stuff.

For Matplotlib we needed to define a bounding box. For Folium we only need the center point of the map. And you don’t need to load map images or anything here. So that is very nice.

# Find center of folium map
latitude_mean = geodf['latitude'].mean()
longitude_mean = geodf['longitude'].mean()

Now I define a map:

my_map = folium.Map(location=[latitude_mean, longitude_mean], zoom_start=12)

And for each point in my Pandas dataframe I want a marker. Folium markers allow you to add popup text, which you can make nice with HTML tags. I decided I wanted to have my filename and creation date in here.

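# Add one marker per row; <br> puts the creation date on its own line in the popup
for index, georow in geodf.iterrows():
    popup_html = f"filename: {georow['xmlfilename']}<br>creationdate: {georow['creationdate']}"
    folium.Marker([georow['latitude'], georow['longitude']], popup=popup_html).add_to(my_map)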

 

Showing the map

But why didn’t PyCharm show my map? Well, it turns out that Folium is more notebook (Jupyter) oriented. Not to worry. You can save the html:

my_map.save('videogps_folium.html')

And when you open this file, there you have it: a wonderful, zoomable map with markers for all my video locations.
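If you run this from a plain script and don’t want to hunt for the file, you could let Python open it in your default browser. This is a small sketch with the standard library, not part of my repo. The function names are made up for this example.

```python
import webbrowser
from pathlib import Path


def map_uri(html_path):
    """Turn a (relative) path to the saved map into a file:// URI."""
    return Path(html_path).resolve().as_uri()


def open_map(html_path):
    """Open the saved map in the default browser; returns True on success."""
    return webbrowser.open(map_uri(html_path))


# Usage, after my_map.save('videogps_folium.html'):
# open_map('videogps_folium.html')
```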

And it turns out that, now that the map is zoomable, this is actually very useful. Remember that in my last blogpost I said I would probably create my maps per day? Not anymore. I will just zoom in to that particular ride now.

 

As mentioned, you can find my new version of this code in my Github repo:

https://github.com/Marcel-Jan/media_gpsplot

You can also find an example output file:

https://github.com/Marcel-Jan/media_gpsplot/blob/main/videogps_folium.html

 

Other blogposts I wrote about geo data in Python:

Adding the track of my bike ride in Folium (Antpaths and Polylines)

Digging into video files for geolocations (Exif data in video files, running OS commands from Python, processing XML)

Photo location markers and displaying photos on a map (Accessing exif data in JPGs with Python, projecting photos on a Folium map).

 

Posted in Data engineering, Howto, Python

Plotting video locations from my Sony camera in Python

Two years ago I bought a Sony FDR-X3000 actioncam to record video on my bike rides. And I’m really happy about it. It’s just great reliving my rides in 4K, going downhill for kilometers from some col I climbed. I also make compilation videos for fellow cyclists. Like these:


Posted in Data engineering, Howto, Python

Actually, you do security to stop these guys

Today I am studying for the Microsoft Certified: Azure Data Engineer Associate exam. And currently I’m going through some dry stuff on database security. Today I also read this article on Krebs on Security about the LAPSUS$ hackers who stole lots of source code. Private chat messages of this hacker collective got out in the open recently, and it is telling how they look at their targets.

Posted in Active Learning

Handling far future dates in pandas

Recently I got the request to add specific data quality metadata to csv datasets that my client delivers to customers. It was very simple. Just counts, min, max and, in the case of integers, sums per attribute. Not a difficult task. After a short talk with architects we decided to build this with Python and pandas. Because my efforts were required in another project at that time, my fellow DIKW entrepreneur Wyas built most of it. It worked out well. It ran through a couple of GBs in minutes.


Posted in Python

My Github repo got 50 stars

I never imagined myself as a maintainer of a data engineering related open source thing. Yet. But when I was working on our data engineering course, I needed some kind of data lake software. At first I used the Cloudera sandbox, but some of my colleagues tried it and they complained it took way too much time to start and way too many resources of their laptop. It would be a good bet that our students would get that problem too.

Long story short: I found that Big Data Europe already had a simple Dockerized Hadoop. They actually did all the hard work. But I wanted it to have Hive and Spark too. I went playing with docker-compose yml files and learned a lot from that, BTW. And after some initial frustrations it finally worked.

Posted in Apache Products for Outsiders, Data engineering