QR codes for humans

Pretty much everyone has a smartphone with a camera now, and QR codes seem to be here to stay, but in my opinion there is a problem with human-facing use.

The problem I am talking about isn’t malicious use. The problem is that barcodes were designed for computers, but today they are routinely put in front of human viewers.

That is still a bit abstract, so let me give you an example instead.

Example of QR use that I am talking about. ©Hinnerk Rümenapf CC BY-SA 4.0

While this may “work,” it is pretty clunky. Putting logos in the middle of the code and relying on error correction really grinds my gears too.

Let me restate the problem, then: when people see a QR code, it is just distracting visual noise to them. So I propose that we hide the QR code from humans.

So, what if you saw something like this instead?

The leftmost image may look somewhat similar to a frame QR code, but there is an important difference that I will explain later, unless I forget. First I am going to discuss “mode 1,” the left image, as I believe it is the logical starting point, but keep the other images in mind as the eventual target. The center image gives a visual indicator that there is data available, while the rightmost image gives no inherent indication to prompt the viewer to engage; it is an example of what you might use in a more “AR” type scenario.

So, how do we hide the data from the human? Generally speaking, color vision isn’t particularly useful for a lot of computer-oriented tasks. While there are obvious advantages, the cost of color vision is pretty high: computer vision systems built on color sensors have to deal with issues like Bayer filters and reduced sensitivity. As a result, it is fairly common to use near-IR cameras instead. As smartphones continue to sprout more cameras for assorted tasks, the inclusion of near-IR cameras for 3D mapping, low-light situations, and so on seems inevitable. My suggestion is to create images with four color channels: the human content would live in the Red, Green, and Blue channels, while the machine content would live in the Infrared channel.
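To make the four-channel idea concrete, here is a minimal sketch in NumPy. The RGBI array layout, sizes, and dot pattern are all toy assumptions of mine, not a real file format or symbology:

```python
import numpy as np

H, W = 64, 64

# Human-facing content lives in R, G, B (here just a blank white placeholder).
rgb = np.full((H, W, 3), 255, dtype=np.uint8)

# Machine-facing content lives in a fourth near-IR plane.
ir = np.zeros((H, W), dtype=np.uint8)
ir[8:56:4, 8:56:4] = 255            # toy "code" pattern, not a real encoding

# Pack everything into one 4-channel RGBI image.
rgbi = np.dstack([rgb, ir])

# A display renders only the first three channels...
human_view = rgbi[..., :3]
# ...while a near-IR sensor sees only the fourth.
machine_view = rgbi[..., 3]
```

The point is simply that the two audiences read disjoint channels of the same image: the human view is untouched by whatever the data plane contains.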

The leftmost image here shows the Infrared channel and the rightmost one shows RGB. The middle image is an animation demonstrating how the IR channel (left) and RGB channel (right) combine.

The idea here is that the frame stays the same to allow alignment, because in a multi-camera setup the near-IR sensor would be expected to be offset slightly. By placing checksums in the frame, you could match the I image with the corresponding RGB image. This approach allows the IR data and the RGB image to be spatially independent, as the frame ensures proper correlation. That brings us to mode 2.
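Before moving on, the frame-checksum pairing could be sketched like this. The use of CRC-32 is purely my illustrative assumption; a real design might truncate it, or use something else entirely, to fit the frame’s capacity:

```python
import zlib
import numpy as np

def rgb_checksum(rgb: np.ndarray) -> int:
    # CRC-32 of the raw RGB bytes. The IR frame would carry this value so a
    # scanner can confirm the decoded data layer belongs to this picture.
    return zlib.crc32(rgb.tobytes())

art = np.zeros((8, 8, 3), dtype=np.uint8)   # stand-in RGB artwork
stored = rgb_checksum(art)                  # what the IR frame would carry

# Same image -> checksums match, so the I/RGB pair is accepted.
assert rgb_checksum(art) == stored

# A different RGB layer fails the check (CRC-32 detects any single-bit change).
tampered = art.copy()
tampered[0, 0, 0] = 1
assert rgb_checksum(tampered) != stored
```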

mode 2

Mode 2 is basically the same as mode 1, except that the data layer is offset spatially instead of by wavelength. Obviously, this mode defeats the whole point of the setup, but it is useful for visualization and testing.

composite

Here we can easily see how the components combine to make the 4 channel image.
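Mode 2 is easy to sketch as well: the data patch just sits beside the artwork in the same visible channels instead of hiding in a fourth one. Grayscale stand-ins and sizes are again my own toy assumptions:

```python
import numpy as np

H, W = 32, 32

art = np.full((H, W), 200, dtype=np.uint8)   # stand-in for the visible artwork
data = np.zeros((H, W), dtype=np.uint8)
data[::4, ::4] = 255                          # toy data pattern

# Spatial offset instead of spectral: artwork and data side by side,
# all in the visible channels, for visualization and testing.
composite = np.hstack([art, data])
```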

Now that we have laid out the basics, let’s go back to our actual target objective.

mode 3

Here we see the QP, Q+, or Q Plus logo. This lets the viewer know that there is additional data associated with the image, but unlike mode 1 it avoids the visual clutter of the frame. Only the logo overlaps between the I and RGB channels. This setup requires the device to account for any channel misalignment, but that should be trivial, and the logo serves as an alignment reference. Below we show a corresponding I channel for the above.

Mode 3 IR
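Because the logo appears in both the RGB and I channels, the decoder can estimate the sensor offset by correlating the two. Here is a brute-force sketch of that idea; a real decoder would presumably use something faster, and the square “logo” and search window are stand-ins I chose for illustration:

```python
import numpy as np

def estimate_offset(ref, moved, max_shift=4):
    # Exhaustively try small (dy, dx) shifts and keep the one with the
    # highest correlation between the reference logo and the shifted channel.
    best, best_score = (0, 0), -np.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            shifted = np.roll(np.roll(moved, -dy, axis=0), -dx, axis=1)
            score = np.sum(ref * shifted)
            if score > best_score:
                best_score, best = score, (dy, dx)
    return best

logo = np.zeros((32, 32))
logo[12:20, 12:20] = 1.0                            # stand-in "Q+" logo
ir = np.roll(np.roll(logo, 2, axis=0), 1, axis=1)   # IR channel offset by (2, 1)

assert estimate_offset(logo, ir) == (2, 1)
```

Once the offset is known, the device shifts the I channel by that amount and the data zones line up with the RGB image.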

Let me briefly explain the layout. In the image below, blue shows the overlapping logo, black shows the data frame, and green shows the padding, which lets the data be associated with the full spatial extent of the image instead of just the frame area.

Mode 3 NIR zones

And last we can talk about mode 4. Although mode 4 doesn’t have any inherent visual indication to prompt the viewer, it would be compatible with an AR-style setup where everything is scanned. To be fair, though, if that is the case you wouldn’t really need Q+ at all; you would just need the data in the barcode to tell the device what to do. So I don’t know how useful it would be, but it seems like a logical extension.

mode 4 I channel

As for the actual data in the code, I have some thoughts, but honestly the encoding isn’t really the point of this, and there is no reason you couldn’t use an existing encoding, so I am not going to say much. I will say that an encoding might want to take into account that we are talking about something more futuristic, so things like alignment and timing marks might not be as important. And we might be talking about use with “AI” assistants that could take contextual cues and deal with “grey” data.

I am going to wrap this up with a few images from my thoughts on data arrays and encoding. They are pretty sloppy to say the least, and balancing error correction, data density, limited invalid code points, and robustness in real-world situations is a bit outside of my abilities. Still, maybe some of my thoughts could be helpful to someone.

