I didn't see any mention of how long it takes to recover a mesh from an image. I imagine it's a significant amount of time, not including training.
What I wonder is, if this technology were fast enough, could it be used to caption sign language?
The full publication: https://arxiv.org/abs/1902.09305
This looks amazing and really useful for real-time sign language translation!
Maybe the Soli radar is just unnecessary if you have a low-light camera, or simply illuminate the scene with IR.
This is one of those problems where getting the results in software alone is very impressive, but the problem becomes much simpler with just a modicum of extra hardware.
LeapMotion[1] devices accomplish this with nothing more than a pair of cameras in a matchbox, in real time. And this kind of hardware is already becoming standard on cell phones and laptops.
Still, the killer obstacle for the applications I was trying to make was not precision, but latency.
Amazing work on HAMR - I wish they had the latency numbers in the article too!
In other news, there's a Google MediaPipe hand tracking example.[1][2] It's still documented as iOS/Android only, but there's now a hand_tracking directory under the Linux desktop examples![3] Results have been mixed.[4]
[1] https://ai.googleblog.com/2019/08/on-device-real-time-hand-t... [2] https://github.com/google/mediapipe/blob/master/mediapipe/do... [3] https://github.com/google/mediapipe/tree/master/mediapipe/ex... [4] https://www.youtube.com/watch?v=ZwgjgT9hu6A (Aug 31)
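For anyone wanting to try that Linux desktop example: a build-and-run sketch, assuming the Bazel targets and graph paths from MediaPipe's docs around that time (they may have moved since).

```shell
# Sketch only: target and graph paths are assumptions from MediaPipe's
# docs circa 2019 and may have changed. Requires Bazel and OpenCV.
git clone https://github.com/google/mediapipe.git
cd mediapipe

# Build the CPU-only desktop hand-tracking demo.
bazel build -c opt --define MEDIAPIPE_DISABLE_GPU=1 \
    mediapipe/examples/desktop/hand_tracking:hand_tracking_cpu

# Run against the default webcam with the live desktop graph.
bazel-bin/mediapipe/examples/desktop/hand_tracking/hand_tracking_cpu \
    --calculator_graph_config_file=mediapipe/graphs/hand_tracking/hand_tracking_desktop_live.pbtxt
```

On a laptop without a usable GPU driver the CPU build is the safer bet, which is why the GPU is disabled above.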