The Questioning Camera

The camera is a wonderful tool. It is an artificial eye, but one capable of far more than ours, sensing much more than what it merely 'sees'. Those who control a camera have an extra sensory organ, a highly capable one, almost like a superpower. A CCTV camera is used to surveil and spy, while a regular digital camera is used to document moments in one's life. The same tool can serve a wide range of purposes, some benign and others malicious. A camera can 'see' much more than we can, technically and metaphorically. Depending on the sensor, cameras can capture parts of the electromagnetic spectrum that our eyes cannot, such as infrared. Some also incorporate small amounts of intelligence, which can be used to detect intruders, calculate distance, or recognise certain entities. Even a small amount of intelligence can make a camera a very powerful tool or weapon.

A neutral camera with no intelligence already has a lot of potential. Beyond augmenting our senses to see what we can't with our own eyes, it can also help augment our intelligence. Cameras can be trained to perceive things differently from how we see the world. They can help us think differently and open our minds. A challenge I face regularly is my passive acceptance of the world around me. I don't often question the things I see, and accept them at face value. This has perhaps happened over time: I've stopped questioning the entities and systems in my environment. I wish I could regain the curiosity I once had, the need not only to question everything I see, but to fully understand how everything around me works.

What better tool than a camera to help me out with this?

Questioning Camera



I created the Questioning Camera to augment how I view the world around me. It's a helpful tool which reminds me to ask questions and think a little more. With the help of a simple machine learning algorithm, it recognises the entities it sees and frames questions accordingly for me, the user. It uses the COCO-SSD Object Detection model, adapted to ml5.js, to detect objects. The COCO-SSD model classifies objects into 80 different classes. Although the list is not exhaustive, the model detects objects accurately and quickly. Once it identifies an object it recognises, it draws a rectangle around it to call it out, and overlays a few questions for the viewer to think about. The intent is to help the viewer think critically about what they see, and to redefine the way in which we accept our environment.

Some images taken with the questioning camera.

Design Process

Initial ideation sketches

From the first batch of ideas, I shortlisted three. Two of them were quite similar, both about small amounts of local intelligence, which I was particularly interested in. The idea of a small, local intelligence fascinates me because it is diametrically opposed to the popular idea of an all-knowing 'AI'. We are very far away from a general-purpose intelligence, so I feel a lot of the hype and attention around AI is unjustified. However, we can create entities with small amounts of local intelligence, which is what most objects marketed as 'AI-powered' really are.

To create a camera which could not only understand what it sees but also formulate relevant questions, I had to give it some intelligence. I created a simple camera sketch, which could read the video feed from the webcam. I then integrated the ml5.js Object Detector model into the sketch. I picked the COCO-SSD version of the model since it has a defined set of objects it can identify. This would let me assign specific questions to every kind of object.
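A first version of such a sketch might look roughly like the following. This is a minimal sketch rather than the project's actual code: it assumes p5.js and a 0.x release of ml5.js are loaded via script tags, and the canvas size and the `formatLabel` helper are hypothetical choices made for illustration.

```javascript
// Assumes p5.js and ml5.js (0.x API) are already loaded on the page.
let video;
let detector;
let detections = [];

// Hypothetical helper: builds a caption such as "cup 87%" for one detection.
function formatLabel(detection) {
  return `${detection.label} ${Math.round(detection.confidence * 100)}%`;
}

function setup() {
  createCanvas(640, 480);
  video = createCapture(VIDEO);
  video.size(640, 480);
  video.hide(); // the feed is drawn onto the canvas instead
  // Start detecting once the COCO-SSD model has loaded.
  detector = ml5.objectDetector('cocossd', detectFrame);
}

function detectFrame() {
  detector.detect(video, (err, results) => {
    if (err) {
      console.error(err);
      return;
    }
    detections = results;
    detectFrame(); // continuous version: run again as soon as a pass finishes
  });
}

function draw() {
  image(video, 0, 0);
  for (const d of detections) {
    noFill();
    stroke(0, 255, 0);
    strokeWeight(2);
    rect(d.x, d.y, d.width, d.height); // call out the detected object
    noStroke();
    fill(0, 255, 0);
    text(formatLabel(d), d.x + 4, d.y + 14);
  }
}
```

Each result is assumed to carry `label`, `confidence`, `x`, `y`, `width` and `height` fields, which is what the 0.x object detector returns.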

The sketch worked as planned and could identify objects. However, since the machine learning model was inferring on every frame, it ran very slowly (~1-3 fps), which made the viewfinder choppy and unpleasant to use. To make the experience smoother, I changed the sketch so that the model only starts inferring once the user has taken a photo. The inference therefore runs once per photo, which makes the camera feel much faster. I added three buttons: a shutter button, a clear button, and a download button. The shutter button takes a picture and starts the machine learning model. The clear button switches the camera back to viewfinder mode, ready to take another picture. The download button lets the user download the image if they wish to.
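The shutter-then-infer flow can be sketched as a small state holder. This is an illustration under assumptions, not the original code: `createCamera`, the mode names, and the injected `detect` function (standing in for a call such as the ml5 detector's `detect`) are all hypothetical.

```javascript
// Hypothetical state holder: inference runs once per photo, not once per frame.
// `detect(frame, callback)` is injected; the callback receives (err, results).
function createCamera(detect) {
  const state = { mode: 'viewfinder', detections: [] };
  return {
    state,
    // Shutter: freeze the picture and run the model exactly once.
    shutter(frame) {
      if (state.mode !== 'viewfinder') return; // ignore presses on a taken photo
      state.mode = 'captured';
      detect(frame, (err, results) => {
        if (!err) state.detections = results;
      });
    },
    // Clear: drop the results and go back to the live viewfinder.
    clear() {
      state.mode = 'viewfinder';
      state.detections = [];
    },
  };
}
```

In a p5.js sketch, the buttons could be wired up with `createButton('Shutter').mousePressed(...)`, and the download button could call `saveCanvas()`. The draw loop would then show the live feed in `'viewfinder'` mode and the frozen photo with its overlays in `'captured'` mode.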

I recreated the object class list from the COCO-SSD documentation so that I could assign custom questions to each kind of object. To simplify the process of picking the correct questions, I used an object and a nifty bit of word-matching logic instead of a long and complex chain of if/else statements.
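One way such a lookup could work is sketched below. The keys, the example questions, and the `questionsFor` helper are made up for illustration, and the matching rule (any word of the detected label matching a key) is an assumption about what "word matching" means here.

```javascript
// Hypothetical question bank, keyed by a single word from a COCO-SSD label.
const QUESTIONS = {
  person: ['Who is this person?', 'What are they thinking about?'],
  phone: ['Who made this phone?', 'Where does its data go?'],
  table: ['What is this table made of?', 'Who built it?'],
};

// Fallback for labels that have no entry of their own.
const DEFAULT_QUESTIONS = ['What is this?', 'How does it work?'];

// Split the label into words and pick the first key that matches one of them,
// so that "cell phone" finds "phone" and "dining table" finds "table".
function questionsFor(label) {
  const words = label.toLowerCase().split(/\s+/);
  const key = Object.keys(QUESTIONS).find((k) => words.includes(k));
  return key ? QUESTIONS[key] : DEFAULT_QUESTIONS;
}
```

Adding questions for a new kind of object then only means adding one entry to the object, with no extra branching logic.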

I also tried to visually differentiate the detected object from the background using PGraphics masking and filter(INVERT), but it didn't work out as planned. Masking a PGraphic with another PGraphic is a little more complex than masking an image.
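A simpler alternative, which is not what the project ended up using, would be to skip masking altogether and invert the pixels inside the detected bounding box directly. The sketch below assumes a flat RGBA pixel array like the one p5.js exposes through `loadPixels()`, with a pixel density of 1; `invertRegion` is a hypothetical helper.

```javascript
// Inverts the RGB channels of every pixel inside `box`, leaving alpha untouched.
// `pixels` is a flat RGBA array (4 values per pixel, row by row).
// `box` has x, y, width and height in image coordinates.
function invertRegion(pixels, imgWidth, imgHeight, box) {
  // Clamp the box to the image so out-of-frame detections are safe.
  const x0 = Math.max(0, Math.floor(box.x));
  const y0 = Math.max(0, Math.floor(box.y));
  const x1 = Math.min(imgWidth, Math.ceil(box.x + box.width));
  const y1 = Math.min(imgHeight, Math.ceil(box.y + box.height));

  for (let y = y0; y < y1; y++) {
    for (let x = x0; x < x1; x++) {
      const i = 4 * (y * imgWidth + x);
      pixels[i] = 255 - pixels[i]; // red
      pixels[i + 1] = 255 - pixels[i + 1]; // green
      pixels[i + 2] = 255 - pixels[i + 2]; // blue
    }
  }
  return pixels;
}
```

With a captured p5.js image `img`, this would be called between `img.loadPixels()` and `img.updatePixels()`, passing `img.pixels`, `img.width`, `img.height` and one detection.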

To make it easier to shoot with when out and about, I created a mobile version of the sketch, with a slightly different interface. This required slightly different code to access the rear camera of the smartphone, but it works just as well as it does on a desktop.
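Requesting the rear camera usually comes down to passing media constraints to the capture call. The following is a minimal sketch assuming p5.js's `createCapture()` is given a constraints object; the `cameraConstraints` helper is hypothetical, and the original mobile sketch may do this differently.

```javascript
// Builds getUserMedia-style constraints for the capture.
// 'environment' is the standard facingMode value for a rear-facing camera.
function cameraConstraints(useRearCamera) {
  return {
    audio: false,
    // `ideal` lets the browser fall back to any camera if there is no rear one.
    video: useRearCamera ? { facingMode: { ideal: 'environment' } } : true,
  };
}
```

In the sketch's `setup()`, this would be used as `createCapture(cameraConstraints(true))` on mobile, in place of `createCapture(VIDEO)`.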

Reflection

In the process of creating an experimental camera, I've ended up creating a lens through which I can look at the world. Giving it a small amount of local intelligence augments my own intelligence and gives me additional capabilities. Visual perception is difficult, and computationally heavy! Even with a low-resolution camera like a webcam, it can be quite challenging to read and analyse the incoming data. The project also helped me realise the challenges of inference through pixels. A digital camera only understands pixels, and can't see or perceive anything on its own. We have to add multiple layers of intelligence to give it the capability to actually see and comprehend.