Today we’ll have an unusual but interesting comparison of two image capture “devices” – a smartphone sensor and the human eye.
If you think the modern 108-megapixel sensor of the Galaxy S22 Ultra or Redmi Note 11 Pro is inferior to the eye in every way, you are very much mistaken. This article will give you a better understanding of modern mobile technology and a different perspective on yourself.
It might seem that even professional cameras have not come close to the human eye’s capabilities so far. And smartphone cameras even more so!
Just think about it: how many megapixels does a camera need to produce a huge picture that fills the entire field of view? And the quality has to be so high that we can’t see a single pixel. After all, our eyes give us a sharp picture without pixels.
This means that the resolution of our eye “image sensor” (retina) has to be extremely high. So, let’s dive in, starting with the resolution!
How many megapixels are our eyes?
If you ask Google this question, the answer will be a concrete number: 576 megapixels. Other resources might give a different answer, about 120 megapixels. And if Steve Jobs had been asked, he would probably have said ~350 megapixels.
Even though all these answers are different, they at least “prove” that no modern image sensor is yet able to get close to our eye’s capabilities!
But why, actually, are the answers different? Well, it’s because these calculations have nothing to do with our eyesight. To be sure, let’s take a closer look at each number.
576 megapixels eye
Imagine a giant screen covering your entire field of view, which means you can’t see anything but that screen. So, for you to be unable to distinguish individual dots (pixels), the resolution of such a display must be at least 576 megapixels.
Is that a lot? Judge for yourself: 4K TVs have just over 8 million pixels (8 megapixels), and modern 8K TVs have around 33 million. You’ll agree that 576 megapixels sounds very convincing in this context.
The same goes for the 350 megapixels. That calculation just assumes not the best visual acuity but something closer to average or “normal” acuity (the better the acuity, the higher the resolution you need, and vice versa).
But what do these numbers have to do with the eye? If the eye were actually taking 576-MP pictures and showing the result to our consciousness, we could talk about such high resolution. In reality, however, none of this happens.
The eye does not take “pictures” as cameras do, so we can rule figures like 576 or 350 out. They do not answer the question and have nothing to do with our vision.
120 megapixels eye
This is a more interesting and realistic number, which, however, also has nothing to do with the correct answer.
The “eye sensor” (retina) consists of individual tiny light-sensitive elements, just like a smartphone camera sensor. In a camera we call them pixels; on the retina they are rods and cones (there is also a third kind of “pixel”, but it doesn’t take part in forming the picture).
There are 110 to 120 million rods on the retina and 6 to 7 million cones. The total number of photosensitive elements is therefore 116 to 127 million, which averages out to roughly 120 megapixels.
Okay, let’s stay on this number for a moment, especially since it’s very close to today’s 108-MP mobile sensors. Now, let’s compare the 120-MP eye “sensor” to the 108-MP mobile sensor.
108-megapixel smartphone camera vs 120-megapixel eye. Which sensor is better?
Any mobile sensor with super-high resolution (48 megapixels and more) is designed roughly the same way. It’s a rectangular plate on which the “pixels” (light-sensitive silicon) are placed in small groups.
This is because pixels can’t sense color, so an additional filter has to be placed over each of them. The filter is just a piece of glass tinted one of the three primary colors. When light from the lens passes through such a filter, only a fraction of it reaches the pixel:
We break all the incoming light down into its components: red, green, and blue, getting a mosaic of three colors. When we need to restore the original color in the picture, we put these components back together into one color. Or, professionally speaking, we demosaic.
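As a toy illustration of the idea, here is the mosaic-and-restore round trip in miniature. This is a hedged NumPy sketch, not a real ISP pipeline: real demosaicing interpolates a full-resolution value for every pixel, while this crude version simply collapses each filter group.

```python
import numpy as np

def bayer_mosaic(rgb):
    """Simulate an RGGB color-filter array: each pixel keeps only
    the one color channel its filter lets through."""
    h, w, _ = rgb.shape
    mosaic = np.zeros((h, w))
    mosaic[0::2, 0::2] = rgb[0::2, 0::2, 0]  # red filters
    mosaic[0::2, 1::2] = rgb[0::2, 1::2, 1]  # green filters
    mosaic[1::2, 0::2] = rgb[1::2, 0::2, 1]  # green filters
    mosaic[1::2, 1::2] = rgb[1::2, 1::2, 2]  # blue filters
    return mosaic

def naive_demosaic(mosaic):
    """Crudest possible demosaic: collapse each 2x2 RGGB block into
    one RGB pixel, averaging the two green samples."""
    r = mosaic[0::2, 0::2]
    g = (mosaic[0::2, 1::2] + mosaic[1::2, 0::2]) / 2
    b = mosaic[1::2, 1::2]
    return np.stack([r, g, b], axis=-1)

# A flat mid-gray scene survives the round trip unchanged:
scene = np.full((4, 4, 3), 0.5)
restored = naive_demosaic(bayer_mosaic(scene))
print(np.allclose(restored, 0.5))  # True
```

Note that the restored image has half the width and height of the original, which is exactly the “color resolution” trade-off discussed below.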
If we talk about high-resolution sensors, the glass (color filter) is placed not over each individual pixel but rather over a group of pixels at once. For instance, on Samsung ISOCELL HMX (108 MP) or Sony IMX686 (64 MP) sensors, each color filter covers 4 pixels at once:
Some users see a trick in pixel grouping (or pixel binning). After all, if you count by color filter, a 108-MP image sensor doesn’t really have 108 megapixels. Grouping by 4 pixels gives 27 megapixels (108 divided by 4), and grouping by 9 pixels leaves a “color resolution” of as little as 12 megapixels (108 divided by 9).
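The arithmetic behind that complaint is trivial; here it is as a quick sketch, with the numbers taken straight from the paragraph above:

```python
def color_resolution_mp(total_mp, pixels_per_filter):
    """Megapixels counted per color filter when several pixels
    share one filter."""
    return total_mp / pixels_per_filter

print(color_resolution_mp(108, 4))  # 27.0 with 2x2 grouping
print(color_resolution_mp(108, 9))  # 12.0 with 3x3 grouping
```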
Of course, everything is more complicated in reality, as there are many algorithms and options to get more colors by combining pixels in different ways. But this trick is nothing compared to how the eye “image sensor” is designed!
The real “eye sensor” size and resolution
The retina (“eye image sensor”) is not rectangular like a smartphone camera sensor but is made in the form of a “hemisphere” stretched out on the back inner wall of the eyeball:
The retina is shown in gray in the illustration above. Since it covers about 72% of the eyeball’s inner surface, it’s a gigantic “sensor” compared to a mobile one. Even the largest mobile sensor has an area at least 10 times smaller than the retina.
If the trick with a smartphone was all about combining pixels, then it’s much more complicated with the eye.
To start with, only one type of “pixel” is responsible for color: the cones. There are no more than 7 million cones on the retina, so even theoretically, our eye can produce a “color picture” at a resolution of only 7 megapixels, which isn’t even 4K!
Can you imagine a huge photo taking up the entire field of view that consists of only 7 megapixels? Obviously, with such a low resolution, the size of the sensor doesn’t matter anymore. The pictures will be of terrible quality anyway.
So how come the image we see is so crisp?
This is because most of the cones (color-sensitive “pixels”) are concentrated in a tiny spot in the center of the retina. There are no rods (“pixels” perceiving only brightness) here. In fact, the “eye sensor”, which captures the sharpest color image, looks like this:
Against this tiny piece of the retina, the mobile sensor looks like a much more serious, higher-quality tool, doesn’t it?
And this is the only place where the image on the retina is as sharp as possible; it covers roughly one square inch at arm’s length. The rest of the picture is very blurry, and the further from that central spot, the worse the quality gets.
But you can’t verify this statement! If you look a little to the left (to check the image quality there), the max sharpness will already be at this new point, and the previous area to the right will be blurred.
In fact, your eyes scan the surrounding space with the small “sensor” in the central part of the retina. And this is a great way to save the body’s computing resources.
But that’s not the whole story!
“Pixel binning” on the eye sensor
As mentioned above, pixels on mobile sensors are combined into groups of four or nine. This technology is called pixel binning, and its main purpose is to improve image quality by reducing noise or increasing the signal-to-noise ratio.
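The noise benefit is easy to see in a simulation. The sketch below is purely illustrative (it models photon shot noise as a Poisson process, not any vendor’s actual pipeline) and compares the noise of a single pixel to the noise after averaging a group of four:

```python
import numpy as np

rng = np.random.default_rng(0)

# Photon shot noise: each pixel's count fluctuates around the
# true signal of 100 photons following a Poisson distribution.
signal = 100
pixels = rng.poisson(signal, size=(100_000, 4))  # 100k groups of 4 pixels

single_noise = pixels[:, 0].std()         # noise of one lone pixel
binned_noise = pixels.mean(axis=1).std()  # noise after 4-pixel binning

print(single_noise / binned_noise)  # close to 2: binning 4 pixels roughly halves the noise
```

Averaging four samples divides random noise by about √4 = 2, which is the whole point of the trade: fewer effective pixels, but a cleaner signal.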
The exact same technique is used in our eye’s “image sensor”. But instead of 4 or 9 pixels, tens, hundreds, even thousands of rods and cones are combined into one nerve cell! On average, about 100 “pixels” of the retina are combined into each group.
And unlike the smartphone, here we are dealing with real physical merging of the signal. We can’t read the signal from a single rod once it’s combined into a group of 1,000 rods and cones; only the total signal of the entire group is read. There are physically only about a million “wires” (optic nerve fibers) coming out of the eye and going to the brain.
On a smartphone, each pixel is connected to the readout circuitry by its own wire, and each of the 108 million pixels is read individually, even if they are grouped under the same color filter. The signals are combined after they have been read. Thus:
The actual “resolution” of the eye (or, more precisely, vision) is close to 1.3 megapixels! And this is the level of a 15-year-old cellphone.
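The arithmetic behind that figure, with all the numbers taken from the paragraphs above as rough averages:

```python
rods = 120e6        # brightness-only "pixels"
cones = 7e6         # color "pixels"
group_size = 100    # retinal "pixels" merged per nerve cell, on average

nerve_fibers = (rods + cones) / group_size
print(nerve_fibers / 1e6)  # 1.27: roughly 1.3 "megapixels" leave the eye
```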
And almost all of this detail goes to the small center of the frame, because only there the cones are left ungrouped, keeping the picture as crisp as possible.
A hole in the sensor!
What else could be done to ruin the sensor of the eye? How about adding blind spots?
There is a spot on the retina of each eye where all the “wires” (axons) from the “pixels” exit and travel to the brain in a single “cable” (the optic nerve):
There are no light-sensitive elements at this spot. Therefore, a “blind spot” sits right in front of each of our eyes all the time.
If you are reading this article from a computer monitor (large screen), close your right eye and look with your left eye from a distance of ~8-12″ at the plus sign shown on the right. At that point, the black circle on the left will simply disappear, as it will fall right into the blind spot:
If it doesn’t work, move closer to or further from the screen until the circle disappears. Also, don’t move your gaze anywhere; otherwise, your eye will do its trick again and project this area onto the central pit of the retina (the fovea) so you can see the image.
These blind spots are always present, but when we look with both eyes, the right eye fills in the missing information for the left eye’s blind spot and vice versa. When we look with only one eye, the brain masks the blind spot as best it can; for instance, it takes the color or pattern around the blind spot and paints over it.
But that’s not all!
Don’t forget that our “image sensor” (the retina) has to be powered somehow, which means it has to have “wires” or blood vessels. These vessels do exist, and they cast a shadow on the image we see.
However, we don’t see these shadows, because our brain got used to them long ago and decided they shouldn’t be shown to our consciousness. The brain just erases the shadows!
I believe you are ready now to see an example of the picture our 1.3-megapixel “eye sensor” produces. If you were expecting to see quality at a 15-year-old cellphone level, it’s even worse:
Obviously, this is just a computer-generated example, but it conveys the main point quite well.
Here we see a small crisp area in the center, a black blind spot on the right, and shadows cast by the vessels. The overall quality of the 1.3-MP picture is extremely poor. There is almost no color at the edges, because there are few cones (color-sensitive “pixels”) and many rods (brightness-only “pixels”) there.
By the way, there is no nose in this picture, even though it is constantly present in the “frame” and obstructs the view. We just don’t see it consciously, since the brain “wipes it out”.
Funny enough, smartphones adopted BSI (backside illumination) technology a long time ago: all the pixels’ wiring is placed behind the light-sensitive elements, so nothing obstructs the light:
But the eye evolved long before BSI. The light-sensitive elements of the eye sit at the very bottom, behind several layers of “wires” (nerves) and other, mostly transparent, cells:
And before we figure out why, despite all this, we see so well, let’s compare the performance of the sensors in low light.
Smartphone image sensor vs retina in low light
When it gets too dark, every photon counts! A photon is the smallest indivisible portion of light; half or a quarter of a photon can’t hit the image sensor or the retina.
When the photon is absorbed by the pixel, a piece of silicon releases one electron. The more photons absorbed, the more free electrons will appear. And the more electrons inside the pixel, the brighter the corresponding dot will be in the final picture.
Here it’s important to use all the photons as efficiently as possible. In a perfect world, every photon that hits a pixel frees an electron. This is not always the case, though.
Imagine how terrible a sensor would be if it only absorbed every tenth photon! There is already too little light, and we waste 90% of the photons.
The efficiency of a modern smartphone image sensor comes close to 100%! That is, almost all photons “create” free electrons. And that’s an excellent rate.
Now let’s look at our eyes. It takes many more photons to activate at least one cone (a “color pixel”) than it takes to activate one rod (a “pixel” that only takes into account brightness). Therefore, there is not enough light in the darkness to activate cones, and our eyes “take pictures” only with black and white rods.
While the smartphone sensor absorbs photons with pieces of silicon, the rods do it with special molecules called rhodopsins. One rhodopsin molecule can absorb 1 photon of light.
This is what such a rod looks like:
Note the “stack” of disks. Each disk contains 10,000 rhodopsin molecules, which means each disk can absorb 10,000 photons. Now look at the numbers:
- There are 120 million rods on the retina
- There are 1,000 disks inside each rod
- Each disk contains 10,000 rhodopsin molecules
In total, the “sensor” of the eye can absorb about 1.2 quadrillion photons (a quadrillion is a million billion). A 108-MP smartphone sensor with the most advanced pixels can absorb about 600 billion photons, roughly 2,000 times fewer. Thus, the dynamic range of the eye “sensor” is much larger than a mobile one’s.
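A quick check of that multiplication, using the figures from the list above (the 600-billion figure for the phone sensor is the article’s own estimate):

```python
rods = 120e6
disks_per_rod = 1_000
rhodopsins_per_disk = 10_000

eye_photons = rods * disks_per_rod * rhodopsins_per_disk
phone_photons = 600e9  # ~600 billion for a 108-MP sensor

print(eye_photons)                  # 1.2e+15, i.e. 1.2 quadrillion photons
print(eye_photons / phone_photons)  # 2000.0: about 2,000 times more
```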
This is a huge advantage during daylight, but what about low light?
It takes just one photon to activate one rod. But one activated rod won’t send any signal to the brain, and we won’t see a picture; at least 10 rods need to fire for that. And that brings us back to the question of the efficiency of the eye’s “sensor”.
While the quantum efficiency of a smartphone sensor is close to 100%, it’s less than 20% for the eye. That is, out of 100 photons hitting the retina, the rods will absorb at best 20. The rest are “utilized” by a special pigment layer that prevents the chaotic bouncing of photons inside the eye, so no reflections, glare, or other problems arise.
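In other words, at the photon-counting level (using the article’s rounded figures of ~100% vs ~20% quantum efficiency):

```python
photons_hitting = 1000

phone_qe = 1.0  # the article's rounded figure: nearly every photon converted
eye_qe = 0.2    # at best ~20% of photons absorbed by the rods

print(int(photons_hitting * phone_qe))  # 1000 photoelectrons in the sensor
print(int(photons_hitting * eye_qe))    # 200 photons absorbed by the rods
```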
That’s why our pupil appears black. Photons just don’t come back from the eye. Otherwise, we would see blood vessels inside the eye.
In fact, this is what sometimes happens when we use a camera flash in low light. The pupil doesn’t have time to react to the powerful burst of light and stop down the lens “aperture”. Too many photons enter the eye, and some of them are reflected back.
“Computational photography” is the key to success!
You may have already guessed that the whole secret of high-quality vision is the most powerful “image processor”. Our brain gets a poor picture, especially when you compare it to what the smartphone produces.
However, our eyes don’t “capture” the world frame by frame like a smartphone camera does. They continuously make small movements (saccades), scanning the scene with their measly 1.3 megapixels.
The brain combines the two flat images from both eyes and builds a 3D picture. It removes the vessels’ shadows, “erases” the nose, paints over the blind spots, makes guesses, and turns them all into a “real” picture.
To realize the scale of this artistic creativity hidden from your consciousness, just look at the moon or the sun. Have you noticed how big they are above the horizon and how small at the zenith?
Have you ever told someone to come and admire the big, beautiful moon before it rose higher and became small?
What is this mysterious physical phenomenon? Could it be all about orbits? Or is it the atmosphere that somehow refracts the light and makes celestial bodies look larger?
In fact, neither the sun nor the moon changes size in any way, whether at the zenith or above the horizon. It’s just your brain having fun: it takes a picture of the small moon above the horizon, enlarges it to a spectacular size, and shows the result to your consciousness.
You admire your brain’s talents, call your acquaintances, and tell them to look at this beauty. But objectively, there is no beauty. Your acquaintances will look at a tiny moon, and their brains will “photoshop” the picture the same way, making the moon seem bigger and more impressive. And both of you will enjoy the nonexistent landscape!
The measly 1.3 megapixels that actually feed into the brain are only a fraction of the picture we see. The rest is, so to speak, computational photography. And this is exactly the way the smartphone cameras will continue to evolve.
The only difference is that the smartphone has to make the whole picture crisp, not just the central part as the brain does. In this respect, the smartphone’s image sensor produces an overall much better picture than the retina, and technology has long been ahead of biology.
It will be interesting to watch people’s reactions when all smartphones do the same trick with the moon that our brains do (and not just the moon).
Aesthetes will express their frustration that smartphones no longer capture reality: “Why do I need Photoshop!? I want to see a natural picture! Where are the good old days when the camera was all about physics, not algorithms?!”
And these same people won’t even realize that this “reality” is a figment of their imagination, a picture heavily processed by the brain’s own “Photoshop”.
Alex Salo, Tech Longreads founder
If you enjoyed this post, please share it! Also, consider supporting Tech Longreads on Patreon to get bonus materials!