Visual recognition of text fields in photos/camera view?

I was wondering if it was within SwiftUI and UIKit's capabilities to recognize things like multiple text fields in photos of displays or credit parts, without too much pain and suffering going into the implementation. Also stuff involving the relative positions of those fields if you can tell it what to expect.

For example being able to create an app that takes a picture of a blood pressure display that you're able to tell your app to separately recognize the systolic and diastolic pressures. Or if you're reading a bank check and it knows where to expect the name, signature, routing number etc.

Apologies if these are silly questions! I've never actually programmed using visual technology before, so I don't know much about Vision and Apple ML's capability, or how involved these things are. (They feel daunting to me, as someone new to this @_@ Especially since so many variable affect how the positioning looks to the camera, and having to deal with angles, etc.)

For a concrete example, would I be able to tell an app to automatically read the blood sugar on this display, and then read the time below as separate data fields given their relative positions? Also figuring out how to pay attention to those things and not the other visual elements.

https://preview.redd.it/vxwm11rcvzle1.jpg?width=1000&format=pjpg&auto=webp&s=c120402368fc49c17f97ccfe7b5f29c406fa376f