Tag Archives: Spark
Book review: Spark in Action, 2nd edition
There are lots of books on Spark, but not many that are aimed at the data engineer. Data engineers use Spark to ingest and transform data, which is different from what data scientists use it for. On the Roaring Elephant … Continue reading
Posted in Data engineering, Spark
Tagged Apache Spark, Jean-Georges Perrin, Roaring Elephant podcast, Spark
Dataworks Summit Berlin 2018, day two
Back for round two of keynotes, good technical sessions and discussing them with fellow data specialists in between. Keynotes: First up was Frank Säuberlich from Teradata, who had an interesting example of machine learning for fraud detection at Danske Bank. … Continue reading
Posted in Conferences, Events
Tagged Apache Atlas, Apache Metron, Apache Ranger, Data Steward Studio, Dataworks Summit, Docker, GDPR, Personal data, Roaring Elephant podcast, Spark, Synerscope, TPC-H