Tag Archives: Roaring Elephant podcast
Book review: Spark in Action, 2nd edition
There are lots of books on Spark, but not many that are aimed at the data engineer. Data engineers use Spark to ingest and transform data, which is different from what data scientists use it for. On the Roaring Elephant … Continue reading
Posted in Data engineering, Spark
Tagged Apache Spark, Jean-Georges Perrin, Roaring Elephant podcast, Spark
2 Comments
Dataworks Summit Berlin 2018, day two
Back for round two of keynotes, good technical sessions and discussing them with fellow data specialists in between. Keynotes: First up was Frank Säuberlich from Teradata, who had an interesting example of machine learning for fraud detection at Danske Bank. … Continue reading
Posted in Conferences, Events
Tagged Apache Atlas, Apache Metron, Apache Ranger, Data Steward Studio, Dataworks Summit, Docker, GDPR, Personal data, Roaring Elephant podcast, Spark, Synerscope, TPC-H
How to learn Big Data
“How do you get into Big Data?” is a question that people have asked me a couple of times now. So let me give that answer in a blog post as well. I’ve used eight sources of Big Data-related knowledge and … Continue reading