I feel great when I study

When I started studying Hadoop, Python and machine learning in 2016, I found something out that I didn’t expect. I feel better when I study. When I finished another problem, exam or course, and I stepped outside the house to do some shopping or to go to work, I felt great.

And this effect is pretty consistent. Currently I’m in week 3 of MongoDB for DBAs at MongoDB University and in lecture 35 of Elasticsearch 6 and Elastic Stack on Udemy. And I just feel like I can take on the world.

So how come? I think it’s a feeling of control. I decide on the study program. It’s not something I had to write up in a personal development program. No one nagged me about it. I just thought “I need to know what Elasticsearch is” two weeks ago, found a course and there I went.

It’s also a feeling of worthwhile productivity. That I spend my time on the planet well. And knowing that you are building a foundation of knowledge you can do lots of cool stuff with also works for me. I can’t wait to surprise people at work: “Actually, I do know MongoDB. And I’ve learned a thing or two about securing it.”

I don’t know if studying has this effect on everyone. I’m almost sure it doesn’t. Several people asked me “you don’t have children, do you?” True. But I also rarely watch TV. I don’t have Netflix. Because, while watching TV and series is fun, it doesn’t make me feel better. To be honest, social media and games are still on my list, but I know they are not there to make me feel better.

And in this fast-changing field of work, I think I can keep on learning things for a long time to come. It’s actually not a bad quirk to have. (Also, more videos to come.)

Posted in Learning Big Data

Playing with asteroids data in MongoDB

If there is one thing I learned when becoming a data engineer, it’s that having just Hadoop expertise is probably not enough. For starters: what it means to be a data engineer is not exactly sharply defined. Some say data engineers are (Java) developers. Some place data engineers more on the operations side. And at some organisations data engineers work with any combination of these products: Hadoop, Elasticsearch, MongoDB, Cassandra, relational databases and even less hip products.

So I thought it would be a good idea to broaden my horizons. One product that is used quite often is MongoDB. MongoDB is a NoSQL database. And if you don’t know exactly what that means, I think you will get the idea after viewing this video I made.


Posted in NoSQL

Hadoop in a Hurry – Security

When talking about Hadoop security there are so many products and features. What do all of them do? This video gives a high-level overview.

Posted in Apache Products for Outsiders, DBA2Hadoop

I tried Lion’s Mane as a cognitive enhancer. Here are my experiences with it.


I tried Lion’s Mane from Four Sigmatic, which is branded as a cognitive enhancer. I’ve used it while studying Deep Neural Networks, amongst other things. I’ve done alternate weeks with and without Lion’s Mane and in my experience the effect is indiscernible.


Why cognitive enhancer?

I often listen to Tim Ferriss’ podcast (The Tim Ferriss Show). In it he often advertises the wares of a company called Four Sigmatic. Apparently some of their mushroom coffees enhance cognitive abilities. That is of interest to me, because I’ve been studying a data science course on Coursera.org which had quite a lot of math, and later I got a new assignment as a consultant to dive rather deep into the (Hadoop/Big Data related) Apache Atlas and Ranger products.

I’m 47 years old and math is certainly not part of my daily life. In fact I haven’t seen math that much since my bachelor study twenty years ago (besides Coursera courses). I’m also learning a lot of new open source products as a data engineer. I can use all the cognitive abilities I can get.

Posted in Weird experiments

Recovering your HDP 2.6.1 Sandbox on VirtualBox after a restart

If you’ve worked with one of the later versions of the Hortonworks Data Platform 2.x sandbox in VirtualBox and shut it down rather vigorously, you might have noticed that you won’t get past the startup screen when you try to start it up the next time.

I had this happen a couple of times and that’s why I decided to pause my sandbox every time and save it before shutting down my laptop. But yesterday Windows 10 decided to step in. After a day of studying it was high time for me to have dinner, during which I kept the laptop on. Little did I know that Windows 10 decided at that very moment to update and restart. And to do this, it needed to shut down every application. Including VirtualBox. When I came back I found out to my horror that my carefully prepared HDP sandbox had been shut down in the roughest of ways. Thanks, Microsoft!

Posted in Apache Products for Outsiders, Howto, Learning Big Data

Tutorial: Let’s throw some asteroids in Apache Hive

This is a tutorial on how to import fixed-length data into Apache Hive (in Hortonworks Data Platform 2.6.1). The idea is that people who are not Hive or Hadoop savvy can follow along, so let me know if I succeeded. (Make sure you don’t look like comment spam though. I’m getting a lot of that lately, even though it never passes my approval.)


Currently I’m studying for the Hortonworks Data Platform Certified Developer: Spark using Python exam (or HDPCD: Spark using Python). One part of the exam objectives is using SQL in Spark. Along the way you also work with Hive, the data warehouse software in Hadoop.

I was following the free Udemy HDPCD Spark using Python preparation course by ITVersity. The course is good BTW, especially for the price :). But after playing along with the Core Spark videos, the course again used the same boring revenue data for the Spark SQL part. And I thought: “I know SQL pretty well. Why not use data that is a bit more interesting?” And so I downloaded the Minor Planet Center’s asteroid data. This contains all the known asteroids until at least yesterday. At this moment, that is about 745,000 lines of data.
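The tutorial itself does the import in Hive. As a rough idea of what "fixed-length" data means, here is a minimal Python sketch that slices each line at fixed character positions. The column layout and the sample lines are made up for illustration; they are assumptions and not the real Minor Planet Center format.

```python
# Hypothetical fixed-width layout: (field name, start, end) as slice offsets.
# These positions are invented for this sketch; they are NOT the real
# Minor Planet Center format.
LAYOUT = [
    ("designation", 0, 7),
    ("magnitude", 8, 13),
    ("name", 14, 30),
]


def parse_fixed_width(line, layout=LAYOUT):
    """Slice one fixed-width line into a dict of stripped string fields."""
    return {name: line[start:end].strip() for name, start, end in layout}


# Made-up sample lines that follow the hypothetical layout above.
SAMPLE_LINES = [
    "00001    3.34 Ceres",
    "00002    4.13 Pallas",
]

records = [parse_fixed_width(line) for line in SAMPLE_LINES]
for record in records:
    print(record)
```

Every field comes out as a string here. Converting the values to numbers would be a separate step, in the same way that you would cast columns after loading the raw text.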

Posted in Apache Products for Outsiders, Learning Big Data

Fun with Data: Python and space rocks!

Last week I had a little fun playing with Python, the pandas and matplotlib libraries and a JSON file with asteroid data. Here is what I did.
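The post itself uses pandas and matplotlib. As a minimal sketch of just the first step (loading JSON records and counting them by one field), here is a version that only needs the Python standard library. The records and the field names `name` and `orbit_type` are assumptions made up for this example, not the schema of the actual file.

```python
import json
from collections import Counter

# Made-up JSON standing in for the asteroid file; the field names
# ("name", "orbit_type") are assumptions, not the real file's schema.
RAW = """
[
  {"name": "Ceres", "orbit_type": "MBA"},
  {"name": "Pallas", "orbit_type": "MBA"},
  {"name": "Eros", "orbit_type": "Amor"}
]
"""

# Parse the JSON text into a list of dicts, one per asteroid.
asteroids = json.loads(RAW)

# Count how many asteroids fall into each (hypothetical) orbit type.
counts = Counter(a["orbit_type"] for a in asteroids)
for orbit_type, count in counts.most_common():
    print(orbit_type, count)
```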

Posted in Howto, Python

Hadoop High Availability In A Hurry – Part 2: YARN

If you don’t know a lot about YARN and why it’s called a data operating system, you’re in luck. I found it necessary to explain how YARN works before I could explain the solutions for high availability.

At first YARN High Availability seemed like a different beast from HDFS High Availability. But when I read more about the topic I found out the solutions are actually very similar. Enjoy!

Posted in Apache Products for Outsiders, Learning Big Data

Hadoop High Availability In A Hurry – Part 1: HDFS

For the HDPCA exam, I’ve spent a couple of hours studying how Hadoop high availability works. And now I’ve condensed that knowledge into a video on HDFS HA of just under 9 minutes. Enjoy!

Posted in Apache Products for Outsiders, Learning Big Data

Certifying as HDP Certified Administrator

Let’s talk about certification. The thing by which you try to show potential employers and customers that you actually know what you are doing at work. My only experience up to last Tuesday with IT product-related certifications was with Oracle’s Certified Professional program. I’ve been an OCP for the database from 8i to 11g, plus I’m an 11g Database Performance Tuning Certified Expert. But all these exams were mainly multiple choice, and to really test your knowledge the exams often contained some obscure stuff that you would rarely use. I’ll never forget the question about v$waitstat in one of these exams… well, I digress.

OCP wasn’t exactly embraced by all Oracle DBAs either. A lot of experienced DBAs saw it more as a way for inexperienced DBAs to show they... knew how to learn lots of facts about Oracle databases. Companies with lots of inexperienced DBAs loved it, hoping that this would entice customers to bring in their otherwise green “medior” DBAs.


Posted in Learning Big Data