{"id":1950,"date":"2026-01-02T11:36:14","date_gmt":"2026-01-02T11:36:14","guid":{"rendered":"https:\/\/marcel-jan.eu\/datablog\/?p=1950"},"modified":"2026-01-02T11:41:01","modified_gmt":"2026-01-02T11:41:01","slug":"what-i-learned-from-using-ocr-to-get-data-from-my-weighing-scale","status":"publish","type":"post","link":"https:\/\/marcel-jan.eu\/datablog\/2026\/01\/02\/what-i-learned-from-using-ocr-to-get-data-from-my-weighing-scale\/","title":{"rendered":"What I learned from using OCR to get data from my weighing scale"},"content":{"rendered":"\n<p>A bit more than a year ago I wrote about <a href=\"https:\/\/marcel-jan.eu\/datablog\/2024\/11\/05\/using-ocr-to-get-data-from-my-robi-scale\/\">the Robi S11 personal weighing scale and that it would not share its data with me<\/a>, except as a jpeg file (from the Fitdays app).<\/p>\n\n\n\n<p>Recently I got my Python code to the point where it serves all my OCR needs for this solution. I&#8217;m really happy that I got this far. I&#8217;ve uploaded the latest version to <a href=\"https:\/\/github.com\/Marcel-Jan\/extract_fitdays_data\">my GitHub repo<\/a>.<\/p>\n\n\n\n<p>Here are a couple of things I learned when building the current version:<\/p>\n\n\n\n<p><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Don&#8217;t ask an AI coding assistant to refactor the whole code base<\/h2>\n\n\n\n<p>I just had my first experiences with Claude Code, Anthropic&#8217;s AI coding assistant. I had fun trying new things with it. At one point I asked Claude Code to refactor the code base for the Fitdays OCR solution. And yes, I made a backup beforehand. But still.<\/p>\n\n\n\n<p>Claude Code went about the job quite energetically. A lot was happening. It cleaned up code, removed print commands and replaced them with a few logging statements. And it made the whole thing object-oriented.<\/p>\n\n\n\n<p>Afterwards I tried running my new Python code. It worked! But I had a hard time finding out what the code was doing now. 
So don&#8217;t do that.<\/p>\n\n\n\n<p>Next time I want to use AI coding assistants to advise me on what the next refactoring step should be, and then do it myself. After all, I do these projects to learn to program better.<\/p>\n\n\n\n<p><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Learn about the pytesseract psm settings<\/h2>\n\n\n\n<p>Reading all the text in the jpegs was a bit of a hit-and-miss affair in the original version of the code. So I made a loop that tries different settings, so that at least one combination would usually work. Things like rescaling and different colour conversions. And there was this psm parameter in the config of the pytesseract.image_to_string function. But at first I could not find what it was for.<\/p>\n\n\n\n<p>Recently I found this article, and now it makes much more sense.<\/p>\n\n\n\n<figure class=\"wp-block-embed is-type-wp-embed is-provider-pyimagesearch wp-block-embed-pyimagesearch\"><div class=\"wp-block-embed__wrapper\">\n<blockquote class=\"wp-embedded-content\" data-secret=\"jp8delSx5C\"><a href=\"https:\/\/pyimagesearch.com\/2021\/11\/15\/tesseract-page-segmentation-modes-psms-explained-how-to-improve-your-ocr-accuracy\/\">Tesseract Page Segmentation Modes (PSMs) Explained: How to Improve Your OCR Accuracy<\/a><\/blockquote><iframe loading=\"lazy\" class=\"wp-embedded-content\" sandbox=\"allow-scripts\" security=\"restricted\" style=\"position: absolute; visibility: hidden;\" title=\"&#8220;Tesseract Page Segmentation Modes (PSMs) Explained: How to Improve Your OCR Accuracy&#8221; &#8212; PyImageSearch\" src=\"https:\/\/pyimagesearch.com\/2021\/11\/15\/tesseract-page-segmentation-modes-psms-explained-how-to-improve-your-ocr-accuracy\/embed\/#?secret=l18JRGkgJW#?secret=jp8delSx5C\" data-secret=\"jp8delSx5C\" width=\"600\" height=\"338\" frameborder=\"0\" marginwidth=\"0\" marginheight=\"0\" scrolling=\"no\"><\/iframe>\n<\/div><\/figure>\n\n\n\n<p>The psm parameter changes the way pytesseract looks at the text. 
It can assume that it&#8217;s a page from a book, or a page of text that has been vertically aligned, or a single block of text. And psm=6 means &#8220;assume it is a single uniform block of text&#8221;, which works quite well for receipts or, in this case, the jpeg shared by the Fitdays app.<\/p>\n\n\n\n<p><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">If you still want to use the data in Excel, have a query handy<\/h2>\n\n\n\n<p>Yes, now that I have the data in my sqlite database, and I&#8217;m pretty happy with the data quality (although pytesseract will sometimes read 7.4 as 7.A), I could use the database as .. base.. for the graphs and stuff. But to be honest, I still use Excel.<\/p>\n\n\n\n<p>Yes, I could use Power BI like a pro, but for Power BI I practically need a Windows machine. And I&#8217;m running a MacBook. I could use Python code to create the graphs. But I&#8217;m still not quite happy there.<\/p>\n\n\n\n<p>So I have a query handy to read all the data from the database and copy the data from there into Excel. Not really a pro solution. Fine for me for now.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"966\" height=\"120\" src=\"https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2026\/01\/CleanShot-2026-01-02-at-12.31.55.png\" alt=\"\" class=\"wp-image-1951\" srcset=\"https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2026\/01\/CleanShot-2026-01-02-at-12.31.55.png 966w, https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2026\/01\/CleanShot-2026-01-02-at-12.31.55-300x37.png 300w, https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2026\/01\/CleanShot-2026-01-02-at-12.31.55-768x95.png 768w\" sizes=\"auto, (max-width: 966px) 100vw, 966px\" \/><\/figure>\n\n\n\n<p>The main thing is: the data entry time has been eliminated. I just AirDrop the latest weight data from the Fitdays app to my MacBook, run the Python code and into the sqlite database it goes. 
I&#8217;m pretty happy about that.<\/p>\n\n\n\n<p><\/p>\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>A bit more than a year ago I wrote about the Robi S11 personal weighing scale and that it would not share its data with me, except as a jpeg file (from the Fitdays app). Recently I got my Python code to the point where it serves all my OCR [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":1952,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[75,383],"tags":[],"class_list":["post-1950","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-python","category-things-i-learned"],"_links":{"self":[{"href":"https:\/\/marcel-jan.eu\/datablog\/wp-json\/wp\/v2\/posts\/1950","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/marcel-jan.eu\/datablog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/marcel-jan.eu\/datablog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/marcel-jan.eu\/datablog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/marcel-jan.eu\/datablog\/wp-json\/wp\/v2\/comments?post=1950"}],"version-history":[{"count":2,"href":"https:\/\/marcel-jan.eu\/datablog\/wp-json\/wp\/v2\/posts\/1950\/revisions"}],"predecessor-version":[{"id":1954,"href":"https:\/\/marcel-jan.eu\/datablog\/wp-json\/wp\/v2\/posts\/1950\/revisions\/1954"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/marcel-jan.eu\/datablog\/wp-json\/wp\/v2\/media\/1952"}],"wp:attachment":[{"href":"https:\/\/marcel-jan.eu\/datablog\/wp-json\/wp\/v2\/media?parent=1950"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/marcel-jan.eu\/datablog\/wp-json\/wp\/v2\/categories?post=1950"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/marcel-jan.eu\/datablog\/wp-json\/wp\/v2\/tags?post=1950"}],"curies":[{"name":"wp","href":"https:\/\/api.
w.org\/{rel}","templated":true}]}}