{"id":1968,"date":"2026-01-07T16:25:30","date_gmt":"2026-01-07T16:25:30","guid":{"rendered":"https:\/\/marcel-jan.eu\/datablog\/?p=1968"},"modified":"2026-03-30T09:57:05","modified_gmt":"2026-03-30T09:57:05","slug":"data-engineering-in-the-european-cloud-part-2-scaleway","status":"publish","type":"post","link":"https:\/\/marcel-jan.eu\/datablog\/2026\/01\/07\/data-engineering-in-the-european-cloud-part-2-scaleway\/","title":{"rendered":"Data engineering in the European cloud &#8211; Part 2: Scaleway"},"content":{"rendered":"\n<p>This is Part 2 in a series where I try to create a data engineering environment in the European cloud. In <a href=\"https:\/\/marcel-jan.eu\/datablog\/2026\/01\/05\/data-engineering-in-the-european-cloud-part-1-the-plan\/\">Part 1<\/a> I described my plan for creating a data lakehouse in the European cloud. Now it&#8217;s time to get our hands dirty. We&#8217;re going to do this in the <a href=\"https:\/\/www.scaleway.com\/en\/\">Scaleway cloud<\/a>.<\/p>\n\n\n\n<p><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">The architecture<\/h2>\n\n\n\n<p>To get this data lakehouse running we will create a <a href=\"https:\/\/kubernetes.io\">Kubernetes<\/a> cluster and <a href=\"https:\/\/en.wikipedia.org\/wiki\/Object_storage\">object storage<\/a> to store our data. In Kubernetes we can run the containerised applications that make up our data lakehouse. I&#8217;ve consulted ChatGPT for this architecture.
It had a better and more modern solution than I originally had in mind.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"683\" src=\"https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2026\/01\/ChatGPT-Image-Euro-data-lakehouse-1024x683.png\" alt=\"\" class=\"wp-image-2012\" srcset=\"https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2026\/01\/ChatGPT-Image-Euro-data-lakehouse-1024x683.png 1024w, https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2026\/01\/ChatGPT-Image-Euro-data-lakehouse-300x200.png 300w, https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2026\/01\/ChatGPT-Image-Euro-data-lakehouse-768x512.png 768w, https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2026\/01\/ChatGPT-Image-Euro-data-lakehouse-360x240.png 360w, https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2026\/01\/ChatGPT-Image-Euro-data-lakehouse.png 1536w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>We&#8217;re going to use the <a href=\"https:\/\/iceberg.apache.org\">Apache Iceberg<\/a> open table format. This will allow us to create database-like tables based on <a href=\"https:\/\/parquet.apache.org\/docs\/file-format\/\">Parquet<\/a>-formatted files. <a href=\"https:\/\/projectnessie.org\">Nessie<\/a> will be the Iceberg data catalog (Hive Metastore was another option). It allows our data tools to find the Iceberg tables and their underlying Parquet files.<\/p>\n\n\n\n<p><a href=\"https:\/\/trino.io\">Trino<\/a> will be the query engine. That is the fastest way to get our first queries going.<\/p>\n\n\n\n<!--more-->\n\n\n\n<h2 class=\"wp-block-heading\">Starting at Scaleway<\/h2>\n\n\n\n<p>First of all, we need to create an account at Scaleway. You get 52 days of free usage (as I recall), so you have time to try things out. One important thing when creating your account: you will be asked to verify your account.
If you don&#8217;t, you get a limited quota, and that is not enough to create the Kubernetes cluster. So in practice you have to. During the verification you will be asked to hold an ID in front of the camera and take a photo of your face.<\/p>\n\n\n\n<p>After creating your account you will be asked to define an organisation. Note: I named my organisation after my employer, but this project really is a private one of my own.<\/p>\n\n\n\n<p>Next I created a project, and misspelled &#8220;sovereign&#8221; in its name. You can&#8217;t rename a project afterwards, so I have to live with that for now.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"329\" src=\"https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2026\/01\/CleanShot-2026-01-05-at-21.33.59-1024x329.png\" alt=\"\" class=\"wp-image-1969\" srcset=\"https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2026\/01\/CleanShot-2026-01-05-at-21.33.59-1024x329.png 1024w, https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2026\/01\/CleanShot-2026-01-05-at-21.33.59-300x96.png 300w, https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2026\/01\/CleanShot-2026-01-05-at-21.33.59-768x247.png 768w, https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2026\/01\/CleanShot-2026-01-05-at-21.33.59.png 1185w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Creating the Kubernetes cluster<\/h2>\n\n\n\n<p>After logging in at Scaleway you enter the Scaleway Console. The left-hand menu lists the products you can use. Go to Containers, and then Kubernetes. Choose the region where you want to create the Kubernetes cluster. I&#8217;m starting completely greenfield here, but if you have other parts of your architecture in a particular region, you probably want this new cluster in the same region.<\/p>\n\n\n\n<p>I&#8217;m going for the cheapest options here.
So I&#8217;m going with the free control plane.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"610\" src=\"https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2026\/01\/CleanShot-2026-01-05-at-21.47.59-1024x610.png\" alt=\"\" class=\"wp-image-1970\" srcset=\"https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2026\/01\/CleanShot-2026-01-05-at-21.47.59-1024x610.png 1024w, https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2026\/01\/CleanShot-2026-01-05-at-21.47.59-300x179.png 300w, https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2026\/01\/CleanShot-2026-01-05-at-21.47.59-768x457.png 768w, https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2026\/01\/CleanShot-2026-01-05-at-21.47.59.png 1342w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>You are also going to need a Private Network, which you can create during this process at no extra cost.<\/p>\n\n\n\n<p>Next you need to configure a node pool, and this is where the costs enter the picture. Notice that you can choose the Development category, which contains the cheapest node type: DEV1-M. It is \u20ac57.53 per month. But an important point: unlike <a href=\"https:\/\/azure.microsoft.com\/en-us\/products\/kubernetes-service\">Azure Kubernetes Service<\/a>, you can&#8217;t stop the cluster to cut costs. Once it runs, it runs.
(As far as I remember, I was able to stop AKS when I wasn&#8217;t using it.)<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"488\" src=\"https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2026\/01\/CleanShot-2026-01-05-at-21.55.24-1024x488.png\" alt=\"\" class=\"wp-image-1971\" srcset=\"https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2026\/01\/CleanShot-2026-01-05-at-21.55.24-1024x488.png 1024w, https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2026\/01\/CleanShot-2026-01-05-at-21.55.24-300x143.png 300w, https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2026\/01\/CleanShot-2026-01-05-at-21.55.24-768x366.png 768w, https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2026\/01\/CleanShot-2026-01-05-at-21.55.24-1536x732.png 1536w, https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2026\/01\/CleanShot-2026-01-05-at-21.55.24.png 2005w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>Once the cluster is running, you can go to the cluster page and open the Kubernetes Dashboard to see the state of the cluster, or generate a kubeconfig.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"199\" src=\"https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2026\/01\/CleanShot-2026-01-05-at-22.04.09-1024x199.png\" alt=\"\" class=\"wp-image-1972\" srcset=\"https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2026\/01\/CleanShot-2026-01-05-at-22.04.09-1024x199.png 1024w, https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2026\/01\/CleanShot-2026-01-05-at-22.04.09-300x58.png 300w, https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2026\/01\/CleanShot-2026-01-05-at-22.04.09-768x150.png 768w, https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2026\/01\/CleanShot-2026-01-05-at-22.04.09.png 1217w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>I&#8217;m using kubectl and helm on the command line to install things, and for that the kubeconfig is very handy.<\/p>\n\n\n\n<p><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Object storage<\/h2>\n\n\n\n<p>You can find object storage in the Scaleway Console in the left-hand menu under Storage, then Object Storage. I have created a bucket named &#8220;lakehouse&#8221; in the same region as the Kubernetes cluster. You can choose between Private and Public visibility, but <strong>don&#8217;t choose Public<\/strong>. It is not secure, and Private works fine for this project.<\/p>\n\n\n\n<p>You can also choose a use case for the bucket; &#8220;Big data&#8221; seemed like a good choice.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"931\" height=\"851\" src=\"https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2026\/01\/CleanShot-2026-01-03-at-22.39.08.png\" alt=\"\" class=\"wp-image-1973\" srcset=\"https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2026\/01\/CleanShot-2026-01-03-at-22.39.08.png 931w, https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2026\/01\/CleanShot-2026-01-03-at-22.39.08-300x274.png 300w, https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2026\/01\/CleanShot-2026-01-03-at-22.39.08-768x702.png 768w\" sizes=\"auto, (max-width: 931px) 100vw, 931px\" \/><\/figure>\n\n\n\n<p>And good news: object storage comes with a free 90-day trial for up to 750 GB.
That is more than enough for now.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"824\" height=\"276\" src=\"https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2026\/01\/CleanShot-2026-01-03-at-22.40.20.png\" alt=\"\" class=\"wp-image-1974\" srcset=\"https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2026\/01\/CleanShot-2026-01-03-at-22.40.20.png 824w, https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2026\/01\/CleanShot-2026-01-03-at-22.40.20-300x100.png 300w, https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2026\/01\/CleanShot-2026-01-03-at-22.40.20-768x257.png 768w\" sizes=\"auto, (max-width: 824px) 100vw, 824px\" \/><\/figure>\n\n\n\n<p>It also has an outgoing transfer setting, for which I chose 10 GB.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"818\" height=\"664\" src=\"https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2026\/01\/CleanShot-2026-01-03-at-22.41.05.png\" alt=\"\" class=\"wp-image-1976\" srcset=\"https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2026\/01\/CleanShot-2026-01-03-at-22.41.05.png 818w, https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2026\/01\/CleanShot-2026-01-03-at-22.41.05-300x244.png 300w, https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2026\/01\/CleanShot-2026-01-03-at-22.41.05-768x623.png 768w\" sizes=\"auto, (max-width: 818px) 100vw, 818px\" \/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Access to the object storage<\/h2>\n\n\n\n<p>Let me save you from making the same mistakes I did here, so you don&#8217;t have to wonder why you can&#8217;t seem to reach the object storage.<\/p>\n\n\n\n<p>In the Scaleway Console, under Security &amp; Identity (in the left-hand menu), you&#8217;ll find IAM, or Identity and Access Management. Go to Applications and create an application.
I have named it Nessie.<\/p>\n\n\n\n<p>Under this Application we can add an API key, which we&#8217;ll need when installing Nessie. And yes, this is the API key that will be used to access Object Storage.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"614\" height=\"893\" src=\"https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2026\/01\/CleanShot-2026-01-06-at-12.06.55.png\" alt=\"\" class=\"wp-image-1980\" srcset=\"https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2026\/01\/CleanShot-2026-01-06-at-12.06.55.png 614w, https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2026\/01\/CleanShot-2026-01-06-at-12.06.55-206x300.png 206w\" sizes=\"auto, (max-width: 614px) 100vw, 614px\" \/><\/figure>\n\n\n\n<p>Copy the Access Key ID and the Secret Key. The secret key will only be shown once.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"614\" height=\"586\" src=\"https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2026\/01\/CleanShot-2026-01-06-at-12.09.12.png\" alt=\"\" class=\"wp-image-1981\" srcset=\"https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2026\/01\/CleanShot-2026-01-06-at-12.09.12.png 614w, https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2026\/01\/CleanShot-2026-01-06-at-12.09.12-300x286.png 300w\" sizes=\"auto, (max-width: 614px) 100vw, 614px\" \/><\/figure>\n\n\n\n<p>At this point your API key still has no access at all. To grant it access, we need to add a Policy to the Application.
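<\/p>\n\n\n\n<p>You can check this for yourself, and later confirm that the policy works, by listing the bucket with any S3-compatible client. Nessie and Trino also talk to the object storage through its S3 API. Below is a small sketch using the AWS CLI, which is not part of my setup. It assumes the following: the AWS CLI is installed, the bucket is called lakehouse and lives in nl-ams (Amsterdam, the same region as my cluster), and Scaleway&#8217;s S3 endpoint for that region is https:\/\/s3.nl-ams.scw.cloud. It also assumes the Nessie application&#8217;s keys are exported as AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY. The function name check_bucket_access is made up for this example.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># Hypothetical helper (not part of my setup): list a Scaleway bucket\n# through its S3-compatible API with the AWS CLI.\n# Export the API keys of the Nessie application first:\n#   export AWS_ACCESS_KEY_ID=YOUR_ACCESS_KEY_ID\n#   export AWS_SECRET_ACCESS_KEY=YOUR_SECRET_KEY\ncheck_bucket_access() {\n  local bucket=\"${1:-lakehouse}\"   # bucket name used in this post\n  local region=\"${2:-nl-ams}\"      # assumed: Amsterdam, like the cluster\n  aws s3 ls \"s3:\/\/${bucket}\" \\\n    --endpoint-url \"https:\/\/s3.${region}.scw.cloud\" \\\n    --region \"${region}\"\n}<\/code><\/pre>\n\n\n\n<p>If you run check_bucket_access lakehouse nl-ams now, the request should be refused with an access denied error. Once the ObjectStorageFullAccess permission set described below is attached, the same call should succeed and return an empty listing, since there is nothing in the bucket yet. Passing the endpoint explicitly keeps the sketch independent of any AWS CLI profile you may already have.<\/p>\n\n\n\n<p>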
<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"624\" src=\"https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2026\/01\/CleanShot-2026-01-06-at-12.13.17-1024x624.png\" alt=\"\" class=\"wp-image-1982\" srcset=\"https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2026\/01\/CleanShot-2026-01-06-at-12.13.17-1024x624.png 1024w, https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2026\/01\/CleanShot-2026-01-06-at-12.13.17-300x183.png 300w, https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2026\/01\/CleanShot-2026-01-06-at-12.13.17-768x468.png 768w, https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2026\/01\/CleanShot-2026-01-06-at-12.13.17.png 1351w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>Next we need to create rules. The scope of these will be our Project.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"463\" src=\"https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2026\/01\/CleanShot-2026-01-06-at-12.14.44-1024x463.png\" alt=\"\" class=\"wp-image-1983\" srcset=\"https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2026\/01\/CleanShot-2026-01-06-at-12.14.44-1024x463.png 1024w, https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2026\/01\/CleanShot-2026-01-06-at-12.14.44-300x136.png 300w, https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2026\/01\/CleanShot-2026-01-06-at-12.14.44-768x347.png 768w, https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2026\/01\/CleanShot-2026-01-06-at-12.14.44.png 1331w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>Next we choose the permission sets. Under Storage, choose ObjectStorageFullAccess. 
I made the mistake of picking ObjectStorageBucketPolicyFullAccess instead, and that does not work.<\/p>\n\n\n\n<p>With the policy created, your application should look something like this:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"661\" src=\"https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2026\/01\/CleanShot-2026-01-06-at-12.33.22-1024x661.png\" alt=\"\" class=\"wp-image-1984\" srcset=\"https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2026\/01\/CleanShot-2026-01-06-at-12.33.22-1024x661.png 1024w, https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2026\/01\/CleanShot-2026-01-06-at-12.33.22-300x194.png 300w, https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2026\/01\/CleanShot-2026-01-06-at-12.33.22-768x496.png 768w, https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2026\/01\/CleanShot-2026-01-06-at-12.33.22.png 1238w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p><strong>[Update 2026-03-30]<\/strong> You now also need to give the application access to Kubernetes, so add KubernetesFullAccess as an extra permission set in its policy.<\/p>\n\n\n\n<p><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Checking the Kubernetes cluster<\/h2>\n\n\n\n<p>The next steps are done on the command line, so you have to install <a href=\"https:\/\/kubernetes.io\/docs\/tasks\/tools\/#kubectl\">kubectl<\/a> and <a href=\"https:\/\/helm.sh\/docs\/intro\/install\/\">helm<\/a> on your machine. I did this on a MacBook; according to the kubectl and helm pages, Windows versions are available as well.<\/p>\n\n\n\n<p>To connect to your Kubernetes cluster with kubectl, download the kubeconfig file from your Kubernetes cluster overview to a directory of your choosing.<\/p>\n\n\n\n<p>On Mac\/Linux, point the environment variable KUBECONFIG to that file.
Now kubectl and helm will find their way to the Kubernetes cluster.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>export KUBECONFIG=kubeconfig-k8s-ams-amazing-merkle.yaml<\/code><\/pre>\n\n\n\n<p>Let&#8217;s see if this has worked:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>kubectl cluster-info<\/code><\/pre>\n\n\n\n<p>This will give output like this:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>Kubernetes control plane is running at https:\/\/d5752251-e566-4dea-af21-f718b1f9466c.api.k8s.nl-ams.scw.cloud:6443\nCoreDNS is running at https:\/\/d5752251-e566-4dea-af21-f718b1f9466c.api.k8s.nl-ams.scw.cloud:6443\/api\/v1\/namespaces\/kube-system\/services\/coredns:dns\/proxy\n\nTo further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.<\/code><\/pre>\n\n\n\n<p>That vague message tells us that Kubernetes is up and running.<\/p>\n\n\n\n<p><strong>[Update 2026-03-30]<\/strong> If you get the message &#8220;Unhandled Error&#8221; err=&#8220;couldn&#8217;t get current server API group list: the server has asked for the client to provide credentials&#8221;, you have to add a permission set with KubernetesFullAccess to your application.<\/p>\n\n\n\n<p>Also create a namespace for this project:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>kubectl create namespace lakehouse<\/code><\/pre>\n\n\n\n<p><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Installing Nessie<\/h2>\n\n\n\n<p>Time to install Nessie. We&#8217;re going to use Helm for this. Helm is the package manager for Kubernetes: it installs all the components an application needs in one go.<\/p>\n\n\n\n<p>First we add the Nessie repository to our Helm repos:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>helm repo add nessie https:\/\/charts.projectnessie.org\nhelm repo update<\/code><\/pre>\n\n\n\n<p>Now we need to tell Nessie where to find our object storage.
For this I created a <a href=\"https:\/\/github.com\/Marcel-Jan\/de_europeancloud\/blob\/main\/Scaleway\/nessie-values.yaml\">nessie-values.yaml<\/a> file. You can <a href=\"https:\/\/github.com\/Marcel-Jan\/de_europeancloud\/blob\/main\/Scaleway\/nessie-values.yaml\">find my example<\/a> in my European cloud GitHub repo.<\/p>\n\n\n\n<p>Note: you need to change the following parameters:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>NESSIE_STORAGE_S3_ENDPOINT: you can find this in the Scaleway Console in the overview of your bucket.<\/li>\n\n\n\n<li>NESSIE_STORAGE_S3_ACCESS_KEY and NESSIE_STORAGE_S3_SECRET_KEY: the API access key and secret key that you saved earlier.<\/li>\n<\/ul>\n\n\n\n<p>With that all set, you can install Nessie:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>helm upgrade --install nessie nessie\/nessie -n lakehouse -f nessie-values.yaml<\/code><\/pre>\n\n\n\n<p>(For a first installation, helm install nessie nessie\/nessie -n lakehouse -f nessie-values.yaml works too. I used upgrade --install because I went through a couple of iterations: it installs the release if it doesn&#8217;t exist yet and upgrades it otherwise.)<\/p>\n\n\n\n<p>A good indication of whether the installation was successful can be found in the server&#8217;s logs.
For this first get the name of the Kubernetes pod that Nessie is running in:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>kubectl get all -n lakehouse\n\nNAME                                     READY   STATUS    RESTARTS   AGE\npod\/nessie-774fcdb9bf-t6rwv              0\/1     Running   0          9s<\/code><\/pre>\n\n\n\n<p>Now that we have the name of the Kubernetes pod (pod\/nessie-774fcdb9bf-t6rwv) we can ask for the logs:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>kubectl logs pod\/nessie-774fcdb9bf-t6rwv -n lakehouse<\/code><\/pre>\n\n\n\n<p>At the end of the logging (provided you haven&#8217;t waited too long after installing) there should be this message:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>INFO exec -a \"java\" java -XX:MaxRAMPercentage=80.0 -XX:+UseParallelGC -XX:MinHeapFreeRatio=10 -XX:MaxHeapFreeRatio=20 -XX:GCTimeRatio=4 -XX:AdaptiveSizePolicyWeight=90 -XX:+ExitOnOutOfMemoryError -cp \".\" -jar \/deployments\/quarkus-run.jar\nINFO running in \/deployments\n _   _               _         ____\n| \\ | |             (_)       \/ __ \\\n|  \\| | ___  ___ ___ _  ___  \/ \/__\\\/ ___ _ ____   _____ _ __\n| . ` |\/ _ \\\/ __\/ __| |\/ _ \\ \\___. \\\/ _ \\ '__\\ \\ \/ \/ _ \\ '__|\n| |\\  |  __\/\\__ \\__ \\ |  __\/ \/\\__\/ \/  __\/ |   \\ V \/  __\/ |\n\\_| \\_\/\\___||___\/___\/_|\\___| \\____\/ \\___|_|    \\_\/ \\___|_|\n\n                               https:&#47;&#47;projectnessie.org\/\n\n                                    Powered by Quarkus 3.30.2<\/code><\/pre>\n\n\n\n<p><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Installing Trino<\/h2>\n\n\n\n<p>Trino is a <em>distributed<\/em> query engine. You can run it on many servers. That means it has a familiar architecture: one with a coordinator that keeps track of all the workers that are doing query work.<\/p>\n\n\n\n<p>Installing and configuring Trino took me the most time, because I followed the advice of ChatGPT. 
And while AI can come up with pretty decent code, too often I&#8217;ve seen ChatGPT come up with non-existent or misspelled parameter names. This was one of those occasions. It actually got the Nessie parameters wrong too, but I was quickly able to find a correct example of nessie-values.yaml. So be warned when ChatGPT gives you parameters!<\/p>\n\n\n\n<p>ChatGPT&#8217;s version of trino-values.yaml (the file Helm uses for the installation) was a mess, so I had to write a trino-values.yaml file myself. I found some hints for this in the <a href=\"https:\/\/trino.io\/docs\/current\/installation\/kubernetes.html\">Trino documentation<\/a>, but not enough to get it working.<\/p>\n\n\n\n<p>The problem I ran into was that the Trino workers weren&#8217;t able to find the Trino coordinator. I found this out when I tried to list the catalogs in Trino:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>trino&gt; SHOW CATALOGS;\nError starting query at http:\/\/localhost:8080\/v1\/statement returned an invalid response: JsonResponse{statusCode=404, headers={content-length=&#91;39], content-type=&#91;text\/plain;charset=iso-8859-1], date=&#91;Sat, 03 Jan 2026 23:18:34 GMT]}, hasValue=false} &#91;Error: Error 404 Not Found: HTTP 404 Not Found]<\/code><\/pre>\n\n\n\n<p>I checked the logs of the Trino workers and found a lot of messages like these:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>2026-01-03T23:13:17.589Z\tWARN\thttp-client-memory-manager-scheduler-1\tio.trino.memory.RemoteNodeMemory\tError fetching memory info from http:\/\/100.64.2.85:8080\/v1\/memory: Failed communicating with server: http:\/\/100.64.2.85:8080\/v1\/memory\n2026-01-03T23:13:21.173Z\tWARN\thttp-client-node-manager-scheduler-1\tio.trino.node.RemoteNodeState\tError fetching node state from http:\/\/100.64.2.85:8080\/v1\/info: Failed communicating with server: http:\/\/100.64.2.85:8080\/v1\/info<\/code><\/pre>\n\n\n\n<p>At one point I asked Claude Code to assist me.
And it turns out Claude Code is great at this. It&#8217;s like working with an experienced data engineer who checks all the failure points and logs, and then comes up with good advice on how to proceed.<\/p>\n\n\n\n<p>Claude Code came up with a version of trino-values.yaml that I would never have produced. And now you can use it too, because I have <a href=\"https:\/\/github.com\/Marcel-Jan\/de_europeancloud\/blob\/main\/Scaleway\/trino-values.yaml\">made it available on GitHub<\/a>. Here, too, you need to change a couple of parameters:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>s3.endpoint: put the endpoint of the bucket here.<\/li>\n\n\n\n<li>s3.aws-access-key and s3.aws-secret-key: use the API keys you saved earlier here.<\/li>\n\n\n\n<li>iceberg.nessie-catalog.default-warehouse-dir: if your bucket has a different name, change it here.<\/li>\n<\/ul>\n\n\n\n<p>Now we can install Trino with Helm:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>helm repo add trino https:\/\/trinodb.github.io\/charts\nhelm repo update\n\nhelm install trino trino\/trino -f trino-values.yaml -n lakehouse<\/code><\/pre>\n\n\n\n<p>The logs don&#8217;t give you a lot of proof that the system is actually working:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>2026-01-04T22:06:37.394Z\tINFO\tmain\tio.trino.connector.StaticCatalogManager\t-- Added catalog iceberg using connector iceberg --\n2026-01-04T22:06:37.395Z\tINFO\tmain\tio.trino.connector.StaticCatalogManager\t-- Loading catalog tpch --\n2026-01-04T22:06:37.862Z\tINFO\tmain\torg.hibernate.validator.internal.util.Version\tHV000001: Hibernate Validator 9.0.1.Final\n2026-01-04T22:06:38.482Z\tINFO\tmain\tBootstrap\tPROPERTY                             DEFAULT     RUNTIME     DESCRIPTION\n2026-01-04T22:06:38.482Z\tINFO\tmain\tBootstrap\ttpch.column-naming                   SIMPLIFIED  SIMPLIFIED\n2026-01-04T22:06:38.482Z\tINFO\tmain\tBootstrap\ttpch.double-type-mapping             DOUBLE      DOUBLE\n2026-01-04T22:06:38.482Z\tINFO\tmain\tBootstrap\ttpch.max-rows-per-page               1000000     1000000\n2026-01-04T22:06:38.482Z\tINFO\tmain\tBootstrap\ttpch.partitioning-enabled            true        true\n2026-01-04T22:06:38.482Z\tINFO\tmain\tBootstrap\ttpch.predicate-pushdown-enabled      true        true\n2026-01-04T22:06:38.482Z\tINFO\tmain\tBootstrap\ttpch.splits-per-node                 1           4\n2026-01-04T22:06:38.482Z\tINFO\tmain\tBootstrap\ttpch.table-scan-redirection-catalog  ----        ----\n2026-01-04T22:06:38.482Z\tINFO\tmain\tBootstrap\ttpch.table-scan-redirection-schema   ----        ----\n2026-01-04T22:06:39.562Z\tINFO\tmain\tio.trino.connector.StaticCatalogManager\t-- Added catalog tpch using connector tpch --\n2026-01-04T22:06:39.568Z\tINFO\tmain\tio.trino.security.AccessControlManager\tUsing system access control: default\n2026-01-04T22:06:39.643Z\tINFO\tmain\tio.trino.server.Server\tServer startup completed in 53.34s\n2026-01-04T22:06:39.644Z\tINFO\tmain\tio.trino.server.Server\t======== SERVER STARTED ========<\/code><\/pre>\n\n\n\n<p><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Proof of the pudding<\/h2>\n\n\n\n<p>So let&#8217;s try creating an Iceberg table from Trino next.<\/p>\n\n\n\n<p>First, forward the Trino service port from Kubernetes to your local machine:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>kubectl port-forward -n lakehouse svc\/trino 8080:8080<\/code><\/pre>\n\n\n\n<p>For my first connection I downloaded the Trino command line client from <a href=\"https:\/\/trino.io\/download\">the Trino download page<\/a>. You can connect to Trino with this command:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>trino --server http:\/\/localhost:8080<\/code><\/pre>\n\n\n\n<p>Now we can ask for the available catalogs.
You should see the following result:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>trino&gt; SHOW CATALOGS;\n Catalog\n---------\n iceberg\n system\n tpcds\n tpch\n(4 rows)<\/code><\/pre>\n\n\n\n<p>This means that Trino is working and that there is a catalog named iceberg. Nice! It also means the Trino worker can now reach the Trino coordinator.<\/p>\n\n\n\n<p>But we also need to test whether we can use Iceberg. For this we are going to create a table. First we create a schema:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>CREATE SCHEMA IF NOT EXISTS iceberg.test_schema;<\/code><\/pre>\n\n\n\n<p>Trino will respond with the message &#8220;CREATE SCHEMA&#8221;, which suggests it was successful. But the real test is creating a table:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>CREATE TABLE iceberg.test_schema.test_table (\n    id INTEGER,\n    name VARCHAR,\n    created_at TIMESTAMP\n) WITH (\n    format = 'PARQUET'\n);<\/code><\/pre>\n\n\n\n<p>If the connection with the object storage is successful, the message should be:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>CREATE TABLE<\/code><\/pre>\n\n\n\n<p>But if the connection with the object storage is not working, you get something like:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>Query 20260105_103349_00006_7kewg failed: Failed checking new table's location: s3:\/\/lakehouse\/test_schema\/test_table-d1400bf0888b41daa07240bcfe18d6e3<\/code><\/pre>\n\n\n\n<p>Let&#8217;s try to insert a row into the table:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>INSERT INTO iceberg.test_schema.test_table VALUES\n      (1, 'test', CURRENT_TIMESTAMP);<\/code><\/pre>\n\n\n\n<p>Trino should respond with:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>INSERT: 1 row<\/code><\/pre>\n\n\n\n<p>I call that success.<\/p>\n\n\n\n<p>For good measure, let&#8217;s select the data:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>trino&gt; select * from iceberg.test_schema.test_table;\n id | name |         created_at\n----+------+----------------------------\n  1 | test | 2026-01-05 11:38:18.962000\n(1 row)<\/code><\/pre>\n\n\n\n<p>Great! We have a working data lakehouse with Iceberg.<\/p>\n\n\n\n<p><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">DBeaver connection<\/h2>\n\n\n\n<p>There&#8217;s a lot more that I would like to do with this environment, but let&#8217;s close for now with a DBeaver connection. <a href=\"https:\/\/dbeaver.io\">DBeaver<\/a> is a popular database tool that I use all the time, so it would be nice to reach my new data lakehouse from it. DBeaver has drivers for many types of databases, so it comes as no big surprise that it has a Trino driver as well, though you do need to download it.<\/p>\n\n\n\n<p>This is how you can configure your Trino connection (this works as long as your port forwarding is still active):<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"952\" height=\"399\" src=\"https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2026\/01\/CleanShot-2026-01-06-at-17.17.43.png\" alt=\"\" class=\"wp-image-1988\" srcset=\"https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2026\/01\/CleanShot-2026-01-06-at-17.17.43.png 952w, https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2026\/01\/CleanShot-2026-01-06-at-17.17.43-300x126.png 300w, https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2026\/01\/CleanShot-2026-01-06-at-17.17.43-768x322.png 768w\" sizes=\"auto, (max-width: 952px) 100vw, 952px\" \/><\/figure>\n\n\n\n<p>As you can see, the username is admin, and no password is needed.
And now you can browse through the Iceberg schemas.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"395\" height=\"273\" src=\"https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2026\/01\/CleanShot-2026-01-06-at-17.19.03.png\" alt=\"\" class=\"wp-image-1989\" srcset=\"https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2026\/01\/CleanShot-2026-01-06-at-17.19.03.png 395w, https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2026\/01\/CleanShot-2026-01-06-at-17.19.03-300x207.png 300w\" sizes=\"auto, (max-width: 395px) 100vw, 395px\" \/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Conclusions<\/h2>\n\n\n\n<p>Note: these are opinions of my own.<\/p>\n\n\n\n<p>I think this experiment has proven that it is possible to create a data lakehouse in the European cloud, or at least at Scaleway. And we don&#8217;t have to stop there: Spark, Kafka and other relevant products in the data ecosystem are within reach. The pricing is not that different from pricing at US cloud providers.<\/p>\n\n\n\n<p>Creating the Kubernetes cluster and the object storage was quite easy. Scaleway&#8217;s IAM is not too hard to grasp once you know how access is supposed to work there.<\/p>\n\n\n\n<p>The installation of Nessie was not overly complex. But without the assistance of Claude Code, I wonder whether I would have had a working Trino setup within a reasonable time (remember: this is a private project). Now that I do have a working trino-values.yaml, setting up a data lakehouse is not so hard anymore. I can repeat this at other cloud providers.<\/p>\n\n\n\n<p>I&#8217;m pretty happy with the user experience at Scaleway. It might not be completely on the same level as that of the big US cloud providers, but it comes pretty close. The user interface is very similar and it was easy to find things.
<\/p>\n\n\n\n<p>If I were an IT manager and my company had to decide on a cloud provider, I would consider Scaleway as an option. If I were at Scaleway, I would expand the data portfolio with a data lakehouse. It is not that difficult to build. Also, making it possible to stop the Kubernetes cluster to save costs would be a quick win.<\/p>\n\n\n\n<p><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Questions?<\/h2>\n\n\n\n<p>I will keep the data lakehouse setup running for a couple more days. If you would like me to try something out, let me know in the comments.<\/p>\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>This is Part 2 in a series where I try to create a data engineering environment in the European cloud. In Part 1 I described my plan for creating a data lakehouse in the European cloud. Now it&#8217;s time to get our hands dirty. We&#8217;re going to do this in [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":1966,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[377,191],"tags":[404,403,406],"class_list":["post-1968","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-cloud","category-data-engineering","tag-data-lakehouse","tag-european-cloud","tag-scaleway"],"_links":{"self":[{"href":"https:\/\/marcel-jan.eu\/datablog\/wp-json\/wp\/v2\/posts\/1968","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/marcel-jan.eu\/datablog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/marcel-jan.eu\/datablog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/marcel-jan.eu\/datablog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/marcel-jan.eu\/datablog\/wp-json\/wp\/v2\/comments?post=1968"}],"version-history":[{"count":40,"href":"https:\/\/marcel-jan.eu\/datablog\/wp-json\/wp\/v2\/posts\/1968\/revisions"}],"predecessor-version":[{"id":2075,"href":"https:\/\/marcel-jan.eu\/datablog\/wp-json\/wp\/v2\/posts\/1968\/revisions\/2075"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/marcel-jan.eu\/datablog\/wp-json\/wp\/v2\/media\/1966"}],"wp:attachment":[{"href":"https:\/\/marcel-jan.eu\/datablog\/wp-json\/wp\/v2\/media?parent=1968"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/marcel-jan.eu\/datablog\/wp-json\/wp\/v2\/categories?post=1968"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/marcel-jan.eu\/datablog\/wp-json\/wp\/v2\/tags?post=1968"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}