{"id":871,"date":"2019-09-26T17:00:39","date_gmt":"2019-09-26T17:00:39","guid":{"rendered":"http:\/\/marcel-jan.eu\/datablog\/?p=871"},"modified":"2019-09-26T19:57:03","modified_gmt":"2019-09-26T19:57:03","slug":"tech-dossier-pandas","status":"publish","type":"post","link":"https:\/\/marcel-jan.eu\/datablog\/2019\/09\/26\/tech-dossier-pandas\/","title":{"rendered":"Tech dossier: pandas"},"content":{"rendered":"<p>I&#8217;m keeping tech dossiers in Evernote on open source products I want to keep track of.\u00a0 And I decided to put them on my blog. My previous ones were on <a href=\"https:\/\/marcel-jan.eu\/datablog\/2018\/12\/01\/tech-dossier-kubernetes\/\">Kubernetes<\/a> and <a href=\"https:\/\/marcel-jan.eu\/datablog\/2018\/05\/13\/making-a-hertzsprung-russell-diagram-from-gaia-dr2-data-with-elasticsearch\/\">Elasticsearch<\/a>. This one is on the Python data management library pandas.<\/p>\n<p>&nbsp;<\/p>\n<h2>A short description &#8211; in English<\/h2>\n<p>Pandas is a Python library. If you already have Python 3 (version 2 support was recently dropped), it&#8217;s a matter of running &#8220;pip install pandas&#8221; and there you are. Pandas lets you analyze and manipulate your data. But then again, aren&#8217;t there many other products for that? How do I explain the power of pandas?<\/p>\n<p>Let me put it like this: it is like using Excel, but on much larger datasets, and as if Excel had a command line interface. Imagine being able to say to Excel on a command line: &#8220;load my CSV file&#8221;, &#8220;use this row as names for my columns&#8221;, &#8220;just show me columns date and sales&#8221;, &#8220;all right, now pivot that&#8221;. I just love it.<\/p>\n<p>&nbsp;<\/p>\n<h2>Learning pandas<\/h2>\n<p>For this I&#8217;ve used <a href=\"https:\/\/pythonprogramming.net\/introduction-python3-pandas-data-analysis\/\">pythonprogramming.net<\/a>. It&#8217;s free and it gave me an excellent start with data analysis in Python. 
The YouTube videos for pandas also seem to have been updated recently.<\/p>\n<p>Need to learn Python first? I started learning Python with the Coursera course &#8220;<a href=\"https:\/\/www.coursera.org\/learn\/interactive-python-1\">An Introduction to Interactive Programming in Python (Part 1)<\/a>&#8221; from Rice University. It&#8217;s a great course. But if you want a free course, you can&#8217;t go wrong with the <a href=\"https:\/\/pythonprogramming.net\/\">pythonprogramming.net<\/a> videos.<\/p>\n<p>You can also watch a couple of my videos on my first encounters with pandas.<\/p>\n<p><iframe loading=\"lazy\" title=\"Fun with Data - Python, pandas and asteroids\" width=\"750\" height=\"422\" src=\"https:\/\/www.youtube.com\/embed\/iXjJNc8zGsM?feature=oembed\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share\" referrerpolicy=\"strict-origin-when-cross-origin\" allowfullscreen><\/iframe><\/p>\n<p>And recently I wrote a blog post on how I used pandas at work to <a href=\"https:\/\/marcel-jan.eu\/datablog\/2019\/03\/08\/showing-a-complex-excel-sheet-whos-boss-with-python-and-pandas\/\">flatten the data from a complex Excel sheet<\/a>, so I could load it into Hadoop. I&#8217;ve used all kinds of lesser-known features to achieve that result.<\/p>\n<p>&nbsp;<\/p>\n<h2>Building your own environment<\/h2>\n<p>Want to play with pandas? That&#8217;s quite easy. 
You need to install <a href=\"https:\/\/www.python.org\/downloads\/\">Python 3<\/a> on your own computer and run &#8220;pip install pandas&#8221; (from the command line).<\/p>\n<p>&nbsp;<\/p>\n<h2>Getting pandas to do specific stuff<\/h2>\n<p><a href=\"https:\/\/pythonhow.com\/accessing-dataframe-columns-rows-and-cells\/\">Selecting columns or rows with pandas<\/a> (because I keep forgetting after a while)<\/p>\n<p><a href=\"https:\/\/www.dataschool.io\/pandas-dot-notation-vs-brackets\/\">This article discusses two ways of selecting data with pandas<\/a>, but it&#8217;s also handy as a reminder of how to select rows and columns. You can&#8217;t go wrong now.<\/p>\n<p><a href=\"https:\/\/kanoki.org\/2019\/09\/09\/how-to-shift-a-column-in-pandas\/\">How to shift a column in pandas<\/a><\/p>\n<p><a href=\"http:\/\/www.zaxrosenberg.com\/pandas-multiindex-tutorial\/\">How do multi-indexes in pandas work?<\/a> Also in this video:<\/p>\n<p><iframe loading=\"lazy\" title=\"Pandas MultiIndex Tutorial and Best Practices\" width=\"750\" height=\"422\" src=\"https:\/\/www.youtube.com\/embed\/kP-0ET0V5Tc?feature=oembed\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share\" referrerpolicy=\"strict-origin-when-cross-origin\" allowfullscreen><\/iframe><\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<h2>Other interesting stuff<\/h2>\n<p><a href=\"https:\/\/realpython.com\/python-pandas-tricks\/\">Pandas tricks and features you might not know<\/a><\/p>\n<p><a href=\"https:\/\/kanoki.org\/2019\/09\/16\/dataframe-visualization-with-pandas-plot\/\">Data visualization with pandas plot<\/a> (How cool: you can add .plot() to your DataFrame)<\/p>\n<p>&nbsp;<\/p>\n<h2>pandas and performance<\/h2>\n<p><a href=\"https:\/\/towardsdatascience.com\/python-pandas-at-extreme-performance-912912b1047c\">pandas at extreme performance<\/a><\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I&#8217;m keeping tech dossiers in 
Evernote on open source products I want to keep track of.\u00a0 And I decided to put them on my blog. My previous ones were on Kubernetes and Elasticsearch. This one is on the Python data management library pandas. &nbsp; A short description &#8211; in English [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[191,75,154],"tags":[208,178,77,209,76,210],"class_list":["post-871","post","type-post","status-publish","format-standard","hentry","category-data-engineering","category-python","category-tech-dossier","tag-data-manipulation","tag-multiindex","tag-pandas","tag-programming","tag-python","tag-tech-dossier"],"_links":{"self":[{"href":"https:\/\/marcel-jan.eu\/datablog\/wp-json\/wp\/v2\/posts\/871","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/marcel-jan.eu\/datablog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/marcel-jan.eu\/datablog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/marcel-jan.eu\/datablog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/marcel-jan.eu\/datablog\/wp-json\/wp\/v2\/comments?post=871"}],"version-history":[{"count":2,"href":"https:\/\/marcel-jan.eu\/datablog\/wp-json\/wp\/v2\/posts\/871\/revisions"}],"predecessor-version":[{"id":874,"href":"https:\/\/marcel-jan.eu\/datablog\/wp-json\/wp\/v2\/posts\/871\/revisions\/874"}],"wp:attachment":[{"href":"https:\/\/marcel-jan.eu\/datablog\/wp-json\/wp\/v2\/media?parent=871"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/marcel-jan.eu\/datablog\/wp-json\/wp\/v2\/categories?post=871"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/marcel-jan.eu\/datablog\/wp-json\/wp\/v2\/tags?post=871"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}