Expedition Data – Page 3 – My journey to learn all things data engineering (and that's a lot)

Using Stable Diffusion to create images for a presentation

Have you heard about text-to-image models like DALL-E 2, Stable Diffusion and MidJourney? These are AI algorithms that take text (the “prompt”) describing the picture you want as input and create that picture as output, based on billions of images.

An example could be “an astronaut on a bicycle on the moon by Van Gogh”. And this would be one of the results:

{"prompt": {"software": "imaginairy", "prompts": [[1, "an astronaut on a bicycle on the moon in the style of Van Gogh"]], "prompt_strength": 7.5, "init_image": "None", "init_image_strength": 0.6, "seed": 938321671, "steps": 40, "height": 512, "width": 512, "upscale": false, "fix_faces": false, "sampler_type": "plms"}}

I got access to DALL-E 2 in July this year. DALL-E 2 is a closed-source algorithm made by OpenAI. You can sign up to request access to DALL-E 2. Once you get access, you can use it for free for a limited number of runs. After that you have to pay to use it more.


By Marcel-Jan Krijgsman, October 12, 2022

Howto

The blog is back

Well, that was scary. Just before I went on holiday I switched providers for my marcel-jan.eu domain. And while I had some time built in before going on vacation, there were problems with the transfer code not working, because apparently the .eu domain is different from the regular .nl domain. In the end I managed to get my marcel-jan.eu mail working just the evening before leaving. But I saw no way to migrate the blog…

By Marcel-Jan Krijgsman, August 20, 2022

Active Learning

I started vlogging about data mesh (and other things)

Last June I made a short video while walking in the park next to the DIKW Intelligence office, and I posted it on LinkedIn. To my surprise it did very well. So I thought: why not make more of these short videos on data topics? And why not make them somewhere in nature?

I’m on my bike almost every day this time of year. Surely I could make a short stop and do a little talk? I started to make them in Dutch and then also in English.

By Marcel-Jan Krijgsman, 3 years ago

Howto

Adding the track of my bike ride on a Folium map

Having markers of videos and photos taken during my bike ride is cool and all, but how about having a track of the bike ride itself? All my bike rides are registered on Strava, the cycling and running app. Strava has an API for developers, but it requires connecting via OAuth 2.0 and knowledge of the API. I decided to go an easier route: because I’m a Strava Premium member, I can download the GPX track of any ride, including my own.

These .gpx track files have the same XML structure that we saw embedded in video files in my last blog post. I can just open the file and use almost the same Python code to read the locations.
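To give an idea of what reading such a track could look like, here is a minimal sketch. It is not the code from the full post. It assumes the export follows the GPX 1.1 schema, where each track point is a trkpt element with lat and lon attributes; the function name read_track_points and the sample data are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: the file uses the standard GPX 1.1 namespace.
GPX_NS = {"gpx": "http://www.topografix.com/GPX/1/1"}


def read_track_points(gpx_text):
    """Return a list of (lat, lon) tuples for every track point in the GPX text."""
    root = ET.fromstring(gpx_text)
    points = []
    for trkpt in root.findall(".//gpx:trkpt", GPX_NS):
        # lat and lon are stored as attributes on each <trkpt> element
        points.append((float(trkpt.get("lat")), float(trkpt.get("lon"))))
    return points


# Made-up sample data with two track points, for illustration only.
SAMPLE_GPX = """
<gpx version="1.1" creator="example" xmlns="http://www.topografix.com/GPX/1/1">
  <trk>
    <name>Example ride</name>
    <trkseg>
      <trkpt lat="52.0907" lon="5.1214"><ele>5.0</ele></trkpt>
      <trkpt lat="52.0912" lon="5.1230"><ele>5.4</ele></trkpt>
    </trkseg>
  </trk>
</gpx>
""".strip()

points = read_track_points(SAMPLE_GPX)
print(points)
```

With Folium installed, a list of (lat, lon) tuples like this can be passed to folium.PolyLine(points).add_to(m) to draw the ride as a line on a map object m.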


By Marcel-Jan Krijgsman, May 23, 2022

Data engineering

Digging into video files for geolocations

So far I’ve found geolocations in the XML metadata that my actioncam stores on disk as separate .XML files, and I’ve found them in JPG files. When I showed the cool maps I made to my father, he asked if I could create maps from his holiday videos, so that he can show cool maps in his video compilations.

Where do locations get stored in video files?

My father has a Sony PJ650VE video camera that makes videos in AVCHD format. Even the camera itself can show you a map of a video location. So I knew it should store geolocations somewhere. But looking on disk I saw no handy metadata files for me to read. So where did the locations go?

I learned that video formats like MP4, Quicktime (.mov) and AVCHD have EXIF metadata stored in them, just like JPG files. Luckily I had all the videos my father had made of our trip to the east coast of the USA in 2013. So I had lots of examples of AVCHD files to work with.


By Marcel-Jan Krijgsman, May 21, 2022

Data engineering

Photo locations, marker icons and displaying photos on my map

When I finished creating my video location map in Python last week, I thought “shame I can’t plot photo locations”. That’s because my Fuji X-T30 camera doesn’t store GPS info. When I bought the camera I assumed every modern camera had GPS tagging, so I didn’t even check that feature. Too bad. But I also took some photos during my vacations with my humble iPhone 8. And it does have GPS tags. So let’s plot some photo locations.
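One detail worth knowing before plotting: EXIF GPS tags store latitude and longitude as degrees, minutes and seconds plus a hemisphere reference (N, S, E or W), while map libraries like Folium expect decimal degrees. Below is a small sketch of that conversion. The helper name dms_to_decimal is hypothetical and this is not the code from the full post; it also assumes the values have already been read from the photo as plain numbers.

```python
def dms_to_decimal(degrees, minutes, seconds, ref):
    """Convert degrees/minutes/seconds plus a hemisphere reference to decimal degrees."""
    decimal = degrees + minutes / 60 + seconds / 3600
    # Southern latitudes and western longitudes are negative in decimal notation.
    if ref in ("S", "W"):
        return -decimal
    return decimal


# Made-up coordinates, for illustration only.
latitude = dms_to_decimal(52, 30, 0, "N")
longitude = dms_to_decimal(4, 15, 0, "W")
print(latitude, longitude)
```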


By Marcel-Jan Krijgsman, May 11, 2022

Data engineering

Making my video location map even better with Folium

Yesterday I shared how I plotted locations of videos shot with my Sony FDR-X3000 camera on a map. I was already pretty happy. Then I got a tip from Twitter user Bob Haffner (@bobhaffner): why not use Folium to create my maps? Huh? I already had a working map, didn’t I? Well, creating a map with matplotlib is a bit of a hassle. You’ve got to download a base map from Openstreetmap.org. And if…

By Marcel-Jan Krijgsman, May 4, 2022

Data engineering

Plotting video locations from my Sony camera in Python

Two years ago I bought a Sony FDR-X3000 actioncam to record video on my bike rides. And I’m really happy about it. It’s just great reliving my rides in 4K, going downhill for kilometers from some col I climbed. I also make compilation videos for fellow cyclists. Like these:


By Marcel-Jan Krijgsman, May 3, 2022

Active Learning

Actually, you do security to stop these guys

Today I am studying for the Microsoft Certified: Azure Data Engineer Associate exam. And currently I’m going through some dry stuff on database security. Today I also read this article on Krebs on Security about the LAPSUS$ hackers who stole lots of source code. Private chat messages of this hacker collective got out in the open recently and it is telling how they look at their targets.

By Marcel-Jan Krijgsman, 4 years ago

Python

Handling far future dates in pandas

Recently I got the request to add specific data quality metadata to the CSV datasets that my client delivers to customers. It was very simple: just counts, min, max and, in the case of integers, sums per attribute. Not a difficult task. After a short talk with architects we decided to build this with Python and pandas. Because my efforts were required in another project at that time, my fellow DIKW entrepreneur Wyas built most of it. It worked out well. It ran through a couple of GBs in minutes.
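To make the idea of that metadata concrete, here is a rough sketch of counts, min, max and integer sums per attribute. Note that it uses only Python’s standard csv module to stay self-contained, so it is not the pandas implementation we actually built. The function name profile_csv and the sample data are made up for illustration.

```python
import csv
import io


def profile_csv(csv_text):
    """Return count, min and max per column, plus sum for all-integer columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name, value in row.items():
            # Skip empty or missing cells and anything outside the header.
            if name in columns and value:
                columns[name].append(value)

    profile = {}
    for name, values in columns.items():
        try:
            typed = [int(v) for v in values]
            is_int = True
        except ValueError:
            # Not an integer column: keep the values as text.
            typed = values
            is_int = False
        stats = {
            "count": len(typed),
            "min": min(typed) if typed else None,
            "max": max(typed) if typed else None,
        }
        if is_int:
            stats["sum"] = sum(typed)
        profile[name] = stats
    return profile


# Made-up sample data; the dates are compared as ISO-8601 text in this sketch.
SAMPLE = "customer,orders,valid_until\nalice,3,2022-03-24\nbob,5,9999-12-31\n"

profile = profile_csv(SAMPLE)
for column, stats in profile.items():
    print(column, stats)
```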


By Marcel-Jan Krijgsman, March 24, 2022