*Pixel binning* is a process that combines several small pixels into one large pixel to improve image quality.

This technology isn’t new, but binning became extremely popular in mobile photography just a couple of years ago – at the very moment when affordable 48MP camera phones became available on the market. Manufacturers became so obsessed with this technology that they went from 12 to 108 megapixels in 2019 alone!

Today, almost all flagship smartphones use binning, combining four or even nine pixels into one *superpixel*. But why? Does that make any sense, or is it just a gimmick?

The main problem with high resolution is that image sensor size does not grow in proportion to resolution, which means each individual pixel has to shrink.

In 2019, we started with 0.8 µm pixels, which was already close to the red light wavelength. And just recently, Samsung introduced the first 200-megapixel sensor with 0.64 µm pixels! That is, the pixel size has become smaller than the wavelength of red light (0.62-0.74 µm).

Therefore, no mobile lens can physically focus a red dot of light small enough to “fit” into such a pixel^{1}. Even in theory, the new Samsung sensor cannot resolve fine red details.

So, do we need such resolution at all? That’s an excellent question for another conversation, but today we need to make sense of combining four or nine 0.8 µm pixels into one 1.6-2.4 µm *superpixel*.

## Introduction. Or what are we even talking about?

As many of you know, the image sensor of any digital camera, be it a DSLR or a smartphone, is not capable of distinguishing colors. The pixel (*a piece of silicon*) can only capture the amount of light (its brightness) regardless of its color.

To get a color image, we have to divide all pixels into three groups: some will ignore all light except red shades, others will perceive only green tones, and the rest will be “sensitive” just to blue light.

To do this, we place a colored filter (a piece of colored glass) on top of each pixel, which only allows its own color to pass through. As a result, a piece of silicon under the red filter will only register the number of “red” photons from the total flux, and so on:

That’s how pixels on the image sensor are arranged, except for the modern ones – those that use pixel binning. These sensors are designed a little differently.

If a manufacturer claims 48 megapixels, then such a sensor actually uses 48 million physical pixels (*pieces of light-sensitive silicon*). However, all of them are gathered into groups of 4 pixels, which are covered by a single color filter:

But why bother with all that complexity? Why not just place 48 million tiny pixels, each with its own color filter?

The reason is that such a 48-megapixel sensor can “turn” into a 12-megapixel one when needed, so that the 4 pixels under one color filter work as one larger pixel of the same color.

And when not needed, each pixel can work independently, producing a photo with more details:

It turns out that pixels on modern image sensors can effectively change their size from 0.8 µm to 1.6 or even 2.4 µm, depending on the binning type.

But all that is just a nice theory, and what is the reality? Can binning really improve image quality, and does a group of small pixels behave like one big pixel?

It’s hard to answer these questions if you don’t understand the advantage of large pixels over small ones. We must clearly understand how pixel size itself affects image quality. And only after that can we figure out if binning increases the physical size of a pixel.

### Why is a big pixel better than a small one? Or let’s talk about randomness

Let’s imagine that we have a perfect professional camera. By *perfect*, I mean that all the pixels in that camera capture every incident photon of light without the slightest distortion. Then all photons are counted and digitized as accurately as possible.

Would such a camera always produce perfect images? No, it wouldn’t. The image quality, in this case, depends entirely on the amount of light the camera captures.

And the problem is not that the pictures will look just a bit darker in low light conditions. Despite the impeccable quality of our perfect camera, it will produce “dirty” photos with a lot of noise when there is not enough light.

Here is an example of a scene taken with the “perfect camera” under different light conditions. On the left, you see an image with an average of 10 photons per pixel, and on the right, we have 1,000 photons per pixel:

Where did the “dirt” in the photo on the left come from? Why did the perfect camera produce so much noise in low light instead of making a darker copy of the picture on the right?

The problem lies in the nature of light itself. When it gets too dark (e.g., we have shortened the exposure time, or there is simply not enough light around), the photons start to “play dice”, throwing out a random result each time.

No matter how many attempts we make to take a picture in low light conditions, we will always get different results.

For instance, if we take 5 pictures in a row of a purely white wall, the same pixel will record a different number of photons each time: 10, 12, 8, 8, 11.

As a result, the same point of the same object will be different in each shot. And the less light there is, the greater the difference. Moreover, each point on an equally white wall will also have a different brightness because the number of photons that fall on neighboring pixels will differ.

That’s exactly what we have seen in the photo above. Where the sky pixels were supposed to be of the same color, the brightness of each particular pixel in the “dirty” picture varied greatly.

### What is noise, and where does it come from?

There is no mystery to it. The falling of photons on the image sensor or each particular pixel is as random an event as the falling of raindrops on a paving tile.

A completely different number of drops will fall on each tile when it drizzles. And the rarer the drops fall, the greater the difference in the total number of drops on the tiles.

So it is with light. There may be 10, 12, or 8 photons per pixel coming from a white wall. But to make the wall look as purely white in the picture as it does in real life, nearly the same number of photons must fall on each pixel.

If a particular pixel got 8 photons, that is an error of -2 photons, because we would expect to see 10 photons ideally. If the second pixel got 12 photons, the error is +2 photons, etc.

This natural deviation of the number of photons from the expected number is called **noise** (more accurately, *photon noise* or *shot noise*). The stronger this deviation, the more the brightness of the related point on the image will differ.

But how do we define the noise level in this case? Well, just follow a simple instruction:

- Take 3 pictures in a row and see how many photons hit a particular pixel in each picture.
- Calculate the mean number of photons that fell on that pixel across the shots.
- Calculate the difference between each value and the average number of photons.
- Sum up these differences and divide by 3 since we have done the experiment 3 times.

To make things clear, let’s take a simple example.

#### An example

Suppose 8 photons hit the pixel the first time, the next time it was 10 photons, and the third time, 12. Let’s calculate the mean value:

(8+10+12)/3 = 10

Now we have to calculate the difference between the number of photons hitting the pixel each time and the average value. If 8 photons hit the pixel the first time, and the average value is 10, then we get:

8-10 = -2

Thus, the number of photons hitting the pixel the first time differs from the average value by -2; the second time, there is no difference (because 10 is the average number of photons); and the last time, by +2 (because 12 − 10 equals 2).

All that remains is to add all these deviations and divide by 3. That’s how we get an *average deviation* or noise:

(-2+0+2)/3=0

A zero?! It appears there was no deviation on average, which obviously contradicts our observation.

Indeed, we made a mistake because the number of photons deviated both positively and negatively. Having added these deviations, we canceled them out, i.e., we came to zero.

So, how do we solve this problem? Right! We need to eliminate the negative numbers, which can be done by squaring them. Let’s try:

(-2)^{2}+0^{2}+2^{2} = 4+0+4 = 8

Now divide 8 by 3, and we get ~2.7. That’s it! The average noise level is 2.7 photons, meaning that the number of photons (and therefore the brightness of each pixel on the image) differs by 2.7 on average. Right? Again, no!

The number we get is an essential measure in statistics known as the **variance**. It’s a measure of how far a set of numbers is spread out from their average value.

And that’s pretty much what we need, with one important exception – don’t forget that we’ve squared all the values. The variance shows the average *squared* deviation of the values from the mean.

To calculate actual deviation from the mean (called the **standard deviation** in statistics), you need to get rid of the square and take the square root of the variance:

Standard deviation = √2.7 = 1.64

There we have it! This is the **noise level**, meaning how much the number of photons in a particular pixel differs from the expected number of photons on average.
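The whole calculation from the example above fits in a few lines of code (a minimal sketch using the same photon counts 8, 10, 12):

```python
# Photon counts recorded by the same pixel across three shots (from the example)
counts = [8, 10, 12]

# Mean number of photons: (8 + 10 + 12) / 3 = 10
mean = sum(counts) / len(counts)

# Variance: average of the squared deviations from the mean
# ((-2)^2 + 0^2 + 2^2) / 3 = 8/3 ≈ 2.7
variance = sum((c - mean) ** 2 for c in counts) / len(counts)

# Noise level (standard deviation): square root of the variance ≈ 1.6
noise = variance ** 0.5

print(mean, variance, noise)
```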

### Signal-to-noise ratio as the most important quality measure

Now, suppose 10 photons hit a particular pixel on average. How do we know the noise level for that pixel given that number of photons? Do we have to count, add, square, and then take the square root of that?

Fortunately, we don’t have to do any of that. This is because rain, coin flips, photon flux, or the number of calls to a call center are random events described by **Poisson’s law** or the **Poisson distribution**.

If events are independent of each other and occur at a constant average rate, they are distributed according to Poisson’s law^{2}.

And if so, then according to the Poisson distribution, *the variance equals the mean value*.

In the previous example, 10 photons hit the pixel on average. Since the incident photons are distributed according to Poisson’s law in real life, we don’t need to estimate the variance by calculating all deviations from the mean, adding up their squares, etc. The number 10 is already both the *mean* and the *variance* by definition.

If you want to find the noise level (*standard deviation*), you just take the square root of the variance – the square root of 10 in our case. Therefore, if 100 photons hit the pixel on average, the noise level will be √100 or 10 photons; if 1,000 photons are incident on average, the noise level will be √1000 or ~31 photons.
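This “variance equals the mean” property is easy to check numerically. The sketch below simulates many exposures of a single pixel, drawing Poisson samples with Knuth’s algorithm (the simulation itself is illustrative, not real sensor data):

```python
import math
import random

def poisson_sample(lam):
    """Draw one Poisson-distributed sample (Knuth's algorithm)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= random.random()
        if p <= threshold:
            return k
        k += 1

random.seed(42)
mean_photons = 10  # average photons per pixel, as in the example

# Simulate 100,000 exposures of the same pixel
shots = [poisson_sample(mean_photons) for _ in range(100_000)]

mean = sum(shots) / len(shots)
variance = sum((s - mean) ** 2 for s in shots) / len(shots)

# Both estimates land very close to 10
print(round(mean, 2), round(variance, 2))
```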

This basic rule describes the amount of photon noise in pixels. And this leads us to the most important measure of image quality – the signal-to-noise ratio (SNR).

If 10 photons are incident per pixel on average, the noise will be √10 or ~3 photons. So the signal-to-noise ratio or SNR for that pixel would be 10/3 = 3.33. In other words, the signal level (10 photons) differs from useless information or noise (3 photons) by a factor of ~3, which is really bad.

It is easy to notice such noise by eye because pixel brightness, which should be identical (as in the case of a white wall), will differ by a third (100% divided by SNR).

If there are 1000 photons, then the noise is 31 photons (√1000), and SNR is 1000/31 = 32. It means that the useful signal is 32 times stronger than the useless signal (noise)!

If there is a lot of light and a vast number of photons incident on each pixel, it is impossible to notice any noise at all. In the case of 50,000 photons, the noise is just 224 photons (√50000), and the SNR is 223, so the brightness of each pixel will differ by no more than ~0.4% (such a slight difference will probably be lost when the signal is digitized).
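Since the noise of an ideal pixel is √N, the ratio collapses to SNR = N/√N = √N. A quick sketch with the photon counts used in the text:

```python
import math

def shot_noise_snr(photons):
    """SNR of an ideal pixel limited only by photon (shot) noise."""
    return photons / math.sqrt(photons)  # mathematically equal to sqrt(photons)

for n in (10, 1000, 50000):
    noise = math.sqrt(n)
    print(f"{n} photons: noise ~{noise:.0f}, SNR ~{shot_noise_snr(n):.0f}")
```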

When a few raindrops fall on the sidewalk, part of a tile will be drier, and another part will have more water. But when it’s pouring, all the tiles will be equally flooded, and it is impossible to see any difference.

### Why is a big pixel better than a small one in terms of SNR?

The more photons are incident on a pixel, the higher the signal-to-noise ratio. Suppose 100 photons fall on a given area: in the case of 4 small pixels, each will get 25 photons, but if it were one big pixel, it would get all 100 photons:

We can easily calculate the SNR for each of these pixels:

- **The SNR of the small pixel.** In the case of 25 photons, the noise is √25 or 5 photons, and the SNR is 25/5 = 5.
- **The SNR of the large pixel.** In the case of 100 photons, the noise is √100 or 10 photons, and the SNR is 100/10 = 10.

So if the area of the large pixel is 4 times larger than the small one, then its SNR will be 2 times higher.

Therefore, in low light, we can make the picture cleaner by increasing the size of the pixels on the sensor so that more photons are incident on each of them.

### Does pixel binning work on smartphones?

We can now answer the main question: does *software* pixel binning really increase the signal-to-noise ratio?

Why *software* binning? Because the 4 pixels on the sensor do not physically merge into one big pixel. They are still the same 4 small pixels (pieces of silicon sensitive to light); the smartphone reads a small signal from each of them, and then software combines the four readings, increasing the signal by a factor of 4.

Logic dictates that this should work. It makes no difference how many pixels are placed on an area of, say, 2.4×2.4 µm. If, for instance, 540 photons are incident on that area, they will land there anyway, whether there is one large pixel (2.4 µm) or 9 small pixels (0.8 µm).

You might think that 9 small pixels don’t really occupy the entire area of one large pixel. There are barriers between pixels and other electronics placed inside pixels (transistors, capacitors, etc.).

Therefore, it may seem that small pixels in a group are less efficient than one large pixel:

However, this is not the case in real life. To avoid such a problem, we cover all pixels with micro-lenses that direct the light onto the light-sensitive element (photodiode) located inside each pixel so that nothing is lost along the way:

Thus, when a smartphone camera uses binning, it makes almost no difference whether there is one large pixel on the sensor or nine small pixels.

In the former case, the camera reads out 540 photons and gets noise of 23 photons (√540). In the latter case, the camera reads out 60 photons on average from each of the 9 small pixels, each with a noise of ~8 photons (√60).

When we add up the signals (60×9 = 540 photons), the noise variances also add, giving a combined noise of √(60×9) ≈ 23 photons – the same signal and the same noise as in the case of one big pixel.

That means that the SNR of the big pixel is equal to the SNR of a group of several small pixels.
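Under the ideal-sensor assumption this equality is easy to verify: the signals add, and for Poisson noise the variances add too, so nine 60-photon pixels end up with exactly the same SNR as one 540-photon pixel. A minimal sketch:

```python
import math

photons_per_small = 60  # average photons per small pixel (from the example)
n_pixels = 9

# One big pixel collecting everything on the same area
signal_big = photons_per_small * n_pixels     # 540 photons
noise_big = math.sqrt(signal_big)             # ~23 photons

# Nine small pixels combined in software: signals add up,
# and the Poisson noise variances add up as well
signal_binned = n_pixels * photons_per_small  # 540 photons
noise_binned = math.sqrt(n_pixels * photons_per_small)

snr_big = signal_big / noise_big
snr_binned = signal_binned / noise_binned
print(round(snr_big, 2), round(snr_binned, 2))  # identical for an ideal sensor
```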

So does binning really work? Yes, except for one thing.

### We have no perfect camera!

Previously we talked only about a theoretical camera that does not distort the signal. The number of photons that hit the sensor equals the number of digitized photons. But in real life, the signal gets distorted many times before the photons become a point on the photo.

First of all, camera electronics have nothing to do with photons. We neither collect photons on a sensor nor count or digitize them.

When a photon strikes a piece of silicon, it can be absorbed, knocking an electron out of a silicon atom. Or it may not. This is the first thing to consider when calculating the SNR, because in a “perfect camera” a photon always knocks out an electron.

Then we collect electrons that were released due to the photoelectric effect^{3}. But occasionally, some electrons inside our silicon are released without photons involved. In other words, no photon came in, but the electron still appeared. This is due to the sensor heating. An electron can be excited by thermal energy and break away from the atom independently.

So how can we tell the difference between an electron that appeared because of an incident photon and an electron that “popped out” on its own? Obviously, we can’t. So we have to consider thermal noise (a random number of electrons from the so-called *dark current*^{4}, i.e., heat) when calculating the SNR.

Next, we read the voltage created by the electrons and finally digitize it. But can you imagine what hardware capable of reading the voltage down to the charge of a single electron would cost?

Therefore, the reading process itself distorts the signal, and this is additional noise that has to be taken into account when calculating the pixel’s signal-to-noise ratio.

If we want to see a more realistic signal quality increase from pixel binning, we can no longer use this simple formula:

SNR = photons / √photons

When 100 photons hit an ideal sensor, we calculate the SNR by dividing 100 (signal) by √100 (noise) and get 10. But in the non-ideal world, additional noise must be taken into account (don’t be scared by this formula – it can be simplified):

SNR = (photons * quantum_efficiency) / √(photons + dark_current * exposure_time + readout_noise^{2})

Since we are talking about a mobile camera, we can drop dark_current * exposure_time from this equation, since thermal noise is mainly a challenge in astrophotography. We usually don’t use a shutter speed longer than a second on a smartphone, so few electrons are thermally excited.

Quantum efficiency is the ratio of converted electrons to incident photons (what fraction of photons knocked an electron out of an atom). If the pixel’s quantum efficiency is 80%, we need to multiply the number of incident photons by 0.8. But we can drop this variable too, because quantum efficiency does not depend on pixel size.

After all the simplifications, our formula will look like this:

SNR = photons / √(photons + readout_noise^{2})

This is where we see the main limitation of binning. The SNR of one large pixel is not equal to the SNR of several small pixels, because every time we read the signal from a small pixel, we introduce read-out noise.

Therefore, the signal-to-noise ratio of a single large pixel will differ from the SNR of four small pixels by that very same read-out noise since we read the signal from the small pixels 4 times before “combining” the pixels.

To make it perfectly clear, let’s look at a simple example. Suppose that 200 photons fall on a 1.6×1.6 µm area and the pixel read-out noise is 2 electrons. That is, if 200 photons knocked out 200 electrons, the electronics might count 198 or 202 electrons.

Now let’s compare the signal-to-noise ratio in different scenarios:

#### Big pixel SNR

The image sensor consists of relatively large pixels of 1.6 µm. Since the entire 1.6×1.6 µm area is occupied by a single pixel, all 200 photons are incident on it. All we have to do is read this information by introducing read-out noise just once (2 electrons). Let’s calculate the SNR of this pixel:

SNR = 200/√(200+2^{2}) = 200/√204 = 200/14.28 = **14**

#### Small pixel SNR

The image sensor consists of tiny pixels of 0.8 µm. Since there are 4 small pixels in an area of 1.6×1.6 µm, only 50 photons (200/4) will be incident on each pixel. Let’s find out its signal-to-noise ratio:

SNR = 50 / √(50+2^{2}) = 50/√54 = 50/7.35 = **6.8**

#### SNR of the “enlarged” pixel after binning

If our image sensor supports binning, we need to add up all photons from four small pixels. But this will require reading the information 4 times, thus, introducing read-out noise 4 times. Therefore the signal-to-noise ratio of the “virtual” superpixel will be as follows:

SNR = 4*50/√(4*50+4*2^{2}) = 200/√216 = 200/14.7 = **13.6**

This example shows that the signal quality of a large pixel is about 2 times higher than that of a small one. However, when small pixels are binned, even accounting for the read-out noise, the signal quality of the “combined” superpixel is almost the same as the SNR of a physically large pixel.
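All three scenarios follow from the simplified formula, extended with the number of read-outs: SNR = photons / √(photons + reads × readout_noise²). A sketch reproducing the numbers above (the 2-electron read-out noise is the assumption from the example):

```python
import math

def snr(photons, readout_noise=2.0, reads=1):
    """SNR with shot noise plus read-out noise added once per read."""
    return photons / math.sqrt(photons + reads * readout_noise ** 2)

big = snr(200)              # one 1.6 um pixel, single read   -> ~14.0
small = snr(50)             # one 0.8 um pixel, single read   -> ~6.8
binned = snr(200, reads=4)  # four 0.8 um pixels combined     -> ~13.6

print(round(big, 1), round(small, 1), round(binned, 1))
```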

This, of course, assumes that the read-out noise equals two electrons. Since smartphone manufacturers do not disclose this information, no one knows what it really is.

Large DSLR pixels’ read-out noise can range from 1.5 to 5 electrons, sometimes even higher. However, it is worth keeping in mind that the smaller the pixel, the lower the read-out noise tends to be.

### Let’s summarize

Unfortunately, this article does not answer many questions. In particular, we considered pixel binning only in low light, when there are very few photons. But what happens in good lighting conditions?

Is it better to have many small pixels or fewer big ones? Does binning help increase the dynamic range of the sensor in good light?

We’ll talk about all that another time, but I hope this article provided a comprehensive answer to the main question.

Pixel binning is not a gimmick but a real tool that reduces noise and improves image quality overall while letting you shoot at a higher resolution when needed.

Once you understand how binning works, you can use a simple rule of thumb. If the sensor supports 2×2 binning, it improves the signal-to-noise ratio of the binned pixel by a factor of 2; for 3×3 binning, it increases the SNR by a factor of 3, etc.

Needless to say, the resolution of the images decreases proportionally with binning. Increasing the SNR by a factor of 2 reduces the resolution by a factor of 4 (2^{2}); increasing it by a factor of 3 reduces the resolution by a factor of 9 (3^{2}), etc.
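The rule of thumb and its cost can be written down in one line (ideal-sensor assumption, ignoring read-out noise):

```python
def binning_tradeoff(n):
    """For n x n binning: SNR gain is n, resolution drops by n^2 (ideal sensor)."""
    return n, n * n

for n in (2, 3):
    snr_gain, res_drop = binning_tradeoff(n)
    print(f"{n}x{n} binning: SNR improves {snr_gain}x, resolution drops {res_drop}x")
```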

**Alex Salo**, Tech Longreads founder

**If you enjoyed this post, please share it! Also, consider supporting Tech Longreads on Patreon and get bonus materials!**