Monthly Archives: April 2025
Profiling data with ydata in PySpark
When you have a dataset to explore, there are several ways to do that in PySpark. You can run a describe or a summary. But if you want something a little more advanced, and if you want to get a … Continue reading
Posted in Data management, Python, Spark
Tagged data profiling, data quality, PySpark, Python
My experiences with Azure Purview
At my last customer I worked extensively with Ataccama, a data management product. It has a data catalog to store metadata on datasets, and it can do data quality checks. In Azure, Microsoft has a data management product too. … Continue reading
Posted in Azure, Data engineering, Data management
Tagged azure, data catalog, data management, data quality, purview
Things I learned about Azure Data Fabric
Currently I’m helping colleagues read open data in Azure Data Fabric. Here are some of my experiences with it. I don’t want to give an extensive description of what Data Fabric is. In short, if you have an organisational … Continue reading
Posted in Azure, Cloud, Things I Learned
Tagged azure, Data Fabric, Data Factory, Lakehouse, Notebook, Python, Things I Learned