What I learned from using OCR to get data from my weighing scale

A bit more than a year ago I wrote about the Robi S11 personal weighing scale and that it would not share its data with me, except as jpeg file (from the Fitdays app).

Recently I got my Python code up to a point that it services all my OCR needs for this solution. I’m really happy that I got this far. I’ve uploaded the latest version to my Github repo.

Here are a couple of things I learned when building the current version:

Don’t ask an AI coding assistant to refactor the whole code base

I just had my first experiences with Claude Code, Antropic’s AI coding assistant. I had fun trying new things with it. At one point I asked Claude Code to refactor the code base for the FitDays OCR solution. And yes, I made a backup beforehand. But still.

Claude Code went on the job quite energetically. A lot was happening. It cleaned up code, removed print commands and replaced them with a few logging statements. And it made the whole thing object oriented.

Afterwards I tried running my new Python code. It worked! But I had a hard time finding out what the code was doing now. So don’t do that.

Next time I want to use AI coding assistants to advice me what the next step in refactoring should be and rather do it myself. After all, I do these projects to learn better programming.

Learn about the pytesseract psm settings

Reading all the text in the jpegs was a bit of a hit and miss affair in the original version of the code. So I made a loop to try different settings so one version would usually work. Things like rescaling and changing different colour conversions. And there was this psm parameter in the config of the pytesseract.image_to_string function. But at first I could not find what it was for.

Recently I found this article, and now it makes much more sense.

The psm parameter changes the way pytesseract looks at the text. It can assume that it’s a page from a book, or a page of text that has been vertically aligned, or it’s a single block of text. And psm=6 means “assume it is a single uniform block of text”, which works quite well for receipts or.. in this case: the jpeg shared by the Fitdays app.

If you still want to use the data in Excel, have a query handy

Yes, now that I have the data in my sqlite database, and that I’m pretty happy with the data quality (although pytesseract will sometimes read 7.4 as 7.A), I could use the database as .. base.. for the graphs and stuff. But to be honest, I still use Excel.

Yes I could use PowerBI like a pro, but for PowerBI I practically need a Windows machine. And I’m running a MacBook. I could use Python code to create the graphs. But I’m still not quite happy there.

So I have a query handy to read all the data from the database and copy the data from there into Excel. Not really a pro solution. Fine for me for now.

The main thing is: the data entry time has been eliminated. I just Airdrop the latest weight data from the Fitdays app to my MacBook, run the Python code and into the sqlite database it goes. I’m pretty happy about that.

This entry was posted in Python, Things I Learned. Bookmark the permalink.

Leave a Reply

Your email address will not be published. Required fields are marked *