{"id":1754,"date":"2024-11-05T09:44:00","date_gmt":"2024-11-05T09:44:00","guid":{"rendered":"https:\/\/marcel-jan.eu\/datablog\/?p=1754"},"modified":"2024-11-05T09:45:21","modified_gmt":"2024-11-05T09:45:21","slug":"using-ocr-to-get-data-from-my-robi-scale","status":"publish","type":"post","link":"https:\/\/marcel-jan.eu\/datablog\/2024\/11\/05\/using-ocr-to-get-data-from-my-robi-scale\/","title":{"rendered":"Using OCR to get data from my Robi scale"},"content":{"rendered":"\n<h2 class=\"wp-block-heading\">How it started<\/h2>\n\n\n\n<p>For several years I kept track of my weight and fat with a <a href=\"https:\/\/www.coolblue.nl\/product\/409686\/soehnle-body-balance-comfort-select.html\">Soehnle Body Balance<\/a>, which I bought in 2018. That worked quite well until I started seeing more and more weird deviations. Take a look at the red line (fat percentage) in the graph below:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"974\" height=\"635\" src=\"https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2024\/11\/CleanShot-2024-11-04-at-22.10.52.png\" alt=\"\" class=\"wp-image-1755\" srcset=\"https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2024\/11\/CleanShot-2024-11-04-at-22.10.52.png 974w, https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2024\/11\/CleanShot-2024-11-04-at-22.10.52-300x196.png 300w, https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2024\/11\/CleanShot-2024-11-04-at-22.10.52-768x501.png 768w\" sizes=\"auto, (max-width: 974px) 100vw, 974px\" \/><\/figure>\n\n\n\n<p>I&#8217;ve been training harder in the last 2 years, but according to the fat measurements I gained more fat, not less. Also, the day after a long bike ride, the fat percentage would peak instead of dropping. In the last few months I would regularly get fat percentage measurements of 30+%. And it was not like I was eating burgers, fries and ice cream every day. 
It didn&#8217;t look like the fat measurements were very accurate anymore.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">My new scale<\/h2>\n\n\n\n<p>I decided it was time for a new personal scale. After some deliberation I picked the Robi S11. It is a &#8220;Smart body composition scale&#8221; according to the brochure. It has a handheld device that measures your body fat (and a whole lot of other things) more accurately. It is similar to how my doctor measures my fat percentage during my half-yearly checkup. And it was moderately priced.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"550\" height=\"554\" src=\"https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2024\/11\/image.png\" alt=\"\" class=\"wp-image-1756\" srcset=\"https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2024\/11\/image.png 550w, https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2024\/11\/image-298x300.png 298w, https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2024\/11\/image-150x150.png 150w\" sizes=\"auto, (max-width: 550px) 100vw, 550px\" \/><\/figure>\n\n\n\n<p>Now this is one of those scales that has a Bluetooth connection. I&#8217;ve always had a healthy mistrust of sharing my health data with apps like these. Especially when the parent company is one Guangdong Icomon Technology. Who knows where your data goes and how securely it is stored?<\/p>\n\n\n\n<p>I decided to give their Fitdays app a try anyway. I filled in a limited number of personal details (not all of them entirely accurate). And of course I didn&#8217;t give the app any more access to iPhone data than absolutely necessary. For what it&#8217;s worth.<\/p>\n\n\n\n<p>The device takes an impressive number of measurements. It measures not just weight, fat, water and muscle tissue. It can do so per arm and leg. 
And somehow it can also measure bone mass and protein mass in your body. I&#8217;m not sure how accurate or scientific all this is, though.<\/p>\n\n\n\n<p>The app shows all these results. And then came the little matter of me wanting to copy all that data. Luckily the app has a &#8220;share&#8221; option. I was able to AirDrop that data to my MacBook. So I was excited&#8230; until I got said data. Because it was in the form of a jpeg file.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"611\" height=\"822\" src=\"https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2024\/11\/CleanShot-2024-11-04-at-22.33.45.png\" alt=\"\" class=\"wp-image-1757\" srcset=\"https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2024\/11\/CleanShot-2024-11-04-at-22.33.45.png 611w, https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2024\/11\/CleanShot-2024-11-04-at-22.33.45-223x300.png 223w\" sizes=\"auto, (max-width: 611px) 100vw, 611px\" \/><figcaption class=\"wp-element-caption\">Example of the data in jpeg form (only top part because the image is very long).<\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Your data, in jpeg form<\/h2>\n\n\n\n<p>You can&#8217;t copy the values. You can&#8217;t get the data in any other form. Good luck!<\/p>\n\n\n\n<p>Good luck? Well, we&#8217;ll see about that. I decided to summon the power of Python! Surely there must be some way to OCR the heck out of this jpg? And, as almost always, there is a Python solution. 
Quite quickly I learned there is a Python package called pytesseract that can do OCR. It is a wrapper around the Tesseract OCR engine, which has to be installed separately.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Using pytesseract for OCR<\/h2>\n\n\n\n<p>For a first attempt the code is fairly simple:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>import pytesseract\nfrom PIL import Image\n\nim = Image.open(\"IMG_69EC2B66C329-1.jpeg\") # the ROBI image with data\n# Run Tesseract OCR on the image and get the recognized text as one string\ntext = pytesseract.image_to_string(im)\n\nprint(text)<\/code><\/pre>\n\n\n\n<p>And sure enough, when you run it, you get this result:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>83.2 kg 18.7 %\n\nGewicht Lichaamsvet\n\nIndicator Waarde Standaard\n\nGewicht 83.2kg Standaard\nBMI 23.0 Standaard\nLichaamsvet 18.7% Standaard\n\nVetmassa 15.6kg Standaard\n\nVetvrij\n\nlichaamsgewicht 878k\n\nSpiermassa 63.1kg Standaard\nSpiersnelheid 75.8% Standaard\nSkeletspier 46.5% Standaard\nBotmassa 4.5kg Standaard\nEiwitmassa 13.5kg Standaard\nEiwit 16.2% Standaard\nWatergewicht 49.6kg Standaard\nLichaamswater 59.6% Standaard\nOnderhuids vet 13.4% Standaard\nVisceraal vet 5.0 Standaard\nBMR 1830kcal\n\nLichaamsleeftijd 52 Uitstekend\n\nWHR 0.90 Standaard<\/code><\/pre>\n\n\n\n<p>Now all I have to do is select the lines with the data that I want, write them to a cleaned-up output, and I have my data in consumable form.<\/p>\n\n\n\n<p>I got a lot of data out of this. But not all. 
For example, for this multiline name it got the text right, but the value was wrong:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"585\" height=\"101\" src=\"https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2024\/11\/CleanShot-2024-11-04-at-22.53.52.png\" alt=\"\" class=\"wp-image-1758\" srcset=\"https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2024\/11\/CleanShot-2024-11-04-at-22.53.52.png 585w, https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2024\/11\/CleanShot-2024-11-04-at-22.53.52-300x52.png 300w\" sizes=\"auto, (max-width: 585px) 100vw, 585px\" \/><\/figure>\n\n\n\n<p>As you can see in the result here:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>Vetvrij\n\nlichaamsgewicht 878k<\/code><\/pre>\n\n\n\n<p>It was probably confused by the value sitting in the middle of the multiline name.<\/p>\n\n\n\n<p>It also failed to get the text from this part with the human figure:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"601\" height=\"574\" src=\"https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2024\/11\/CleanShot-2024-11-04-at-22.55.52.png\" alt=\"\" class=\"wp-image-1759\" srcset=\"https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2024\/11\/CleanShot-2024-11-04-at-22.55.52.png 601w, https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2024\/11\/CleanShot-2024-11-04-at-22.55.52-300x287.png 300w\" sizes=\"auto, (max-width: 601px) 100vw, 601px\" \/><\/figure>\n\n\n\n<p>It missed the numbers here (except the &#8220;Standaardbereik&#8221;, the standard range):<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>Segmentale vetanalyse\n\nStandaardbereik: 80%-160%\n\nStandaard\n\nStandaard \\\\ Standaard\nl R l<\/code><\/pre>\n\n\n\n<p>Maybe that&#8217;s something to look into in a later phase.<\/p>\n\n\n\n<p>In any case, I was pretty happy with how easy it was to get the first results. I got enough out of it to start with. 
Hiding my data in a jpeg is no match for some rudimentary Python skills.<\/p>\n\n\n\n<p>I&#8217;ve put my Python code in a GitHub repository: <a href=\"https:\/\/github.com\/Marcel-Jan\/extract_fitdays_data\">https:\/\/github.com\/Marcel-Jan\/extract_fitdays_data<\/a><\/p>\n\n\n\n<p><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Further research<\/h2>\n\n\n\n<p>I&#8217;ve been thinking about how to improve the quality of the results from pytesseract. One approach is to crop parts of the image, so it can &#8220;focus&#8221; on these.<\/p>\n\n\n\n<p>I also read that other forms of image preprocessing can help, as described in this post:<\/p>\n\n\n\n<p><a href=\"https:\/\/towardsdatascience.com\/getting-started-with-tesseract-part-i-2a6a6b1cf75e\">https:\/\/towardsdatascience.com\/getting-started-with-tesseract-part-i-2a6a6b1cf75e<\/a><\/p>\n\n\n\n<p>I also want to store my data in a SQLite database in the future. For now it&#8217;s still an Excel sheet. But I could do more in SQL. Maybe make a data warehouse of my own personal data.<\/p>\n\n\n\n<p>To be continued.<\/p>\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>How it started For several years I kept track of my weight and fat with a Soehnle Body Balance, which I bought in 2018. That worked quite well until I started seeing more and more weird deviations. 
Take a look at the red line (fat percentage) in the graph below: [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":1759,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[75],"tags":[152,370,371,76],"class_list":["post-1754","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-python","tag-health-data","tag-ocr","tag-pytesseract","tag-python"],"_links":{"self":[{"href":"https:\/\/marcel-jan.eu\/datablog\/wp-json\/wp\/v2\/posts\/1754","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/marcel-jan.eu\/datablog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/marcel-jan.eu\/datablog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/marcel-jan.eu\/datablog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/marcel-jan.eu\/datablog\/wp-json\/wp\/v2\/comments?post=1754"}],"version-history":[{"count":5,"href":"https:\/\/marcel-jan.eu\/datablog\/wp-json\/wp\/v2\/posts\/1754\/revisions"}],"predecessor-version":[{"id":1765,"href":"https:\/\/marcel-jan.eu\/datablog\/wp-json\/wp\/v2\/posts\/1754\/revisions\/1765"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/marcel-jan.eu\/datablog\/wp-json\/wp\/v2\/media\/1759"}],"wp:attachment":[{"href":"https:\/\/marcel-jan.eu\/datablog\/wp-json\/wp\/v2\/media?parent=1754"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/marcel-jan.eu\/datablog\/wp-json\/wp\/v2\/categories?post=1754"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/marcel-jan.eu\/datablog\/wp-json\/wp\/v2\/tags?post=1754"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}