Book review: Spark in Action, 2nd edition

There are lots of books on Spark, but not many that are aimed at the data engineer. Data engineers use Spark to ingest and transform data, which is different from what data scientists use it for.

On the Roaring Elephant podcast I heard an interview with Jean-Georges Perrin, author of Spark in Action, 2nd Edition, and it was clear that this would be a Spark book centered firmly on data engineering. So I decided to buy the ebook (also because, as a Patreon supporter of the Roaring Elephant podcast, I have a discount code at Manning Publications).

Spark in Action, 2nd Edition, is not yet finished. It’s a so-called MEAP (Manning Early Access Program) title, which means the author is still writing parts of it. But he has already written chapters 1 to 15 and many appendices, so he seems pretty far along. I’ve read all the regular chapters and I can honestly say that I did a little proofreading.


How to learn Big Data

“How did you get into Big Data?” is a question that people have asked me a couple of times now. So let me give that answer in a blog post as well.

I’ve used eight sources of Big Data related knowledge and skills:

  • Massive Open Online Courses (MOOCs)
  • Books
  • Meetups and summits
  • Podcasts
  • Videos
  • Online documentation
  • Hands-on experience
  • Learning sites/“universities” of vendors
