Monthly Archives: April 2025

Profiling data with ydata in PySpark

When you have a dataset to explore, there are several ways to do that in PySpark. You can run a describe or a summary. But if you want something a little more advanced, and if you want to get a … Continue reading

Posted in Data management, Python, Spark

My experiences with Azure Purview

At my last customer I worked extensively with Ataccama, a data management product. It has a data catalog to store metadata on datasets, and it can do data quality checks. In Azure, Microsoft has a data management product too. … Continue reading

Posted in Azure, Data engineering, Data management

Things I learned about Azure Data Fabric

Currently I’m helping colleagues read open data in Azure Data Fabric. Here are some of my experiences with it. I don’t want to give an extensive description of what Data Fabric is. In short, if you have an organisational … Continue reading

Posted in Azure, Cloud, Things I Learned