Image Histograms with Python and Pillow
This is the third in my ongoing series about the Pillow image manipulation library. You might like to read the first two before this one.
An Introduction to the Python Pillow Image Library
The Python Pillow Image Library part 2
Pillow’s Image class has a method called histogram. When I first saw it I naively assumed that it generated three nice little graphics looking something like these, which are from GIMP (GNU Image Manipulation Program).
I was wrong! What the method actually does is return the frequencies of the colour values 0 to 255 for each of the three colour channels (red, green and blue), or a single set of frequencies for greyscale images. To put it another way, it gives us the raw data for the histograms.
Another Pillow module is ImageDraw which provides a set of methods for drawing in an image. So although Pillow does not actually create histograms it gives us all the data and drawing functionality we need to create them. So let’s do so . . .
The Plan
As you no doubt know, each pixel in an RGB image consists of a red, a green and a blue value between 0 and 255. Plotting the frequencies of each value for each of the three channels gives us an idea of both the predominance of each colour and its overall brightness throughout the image. A practical use for such histograms is to judge how much, if at all, the colour balance of an image needs to be adjusted, which is why image editing software typically provides them.
The raw data provided by Pillow’s histogram method is a single list of integers, and in a 24-bit colour image there are 768 values, the first 256 representing red values from 0 to 255, and the next two blocks of 256 values representing green and blue respectively. In an 8-bit black and white image there are just 256 values.
This is a sample from a colour image. It shows that there are 152 pixels in the image with a red value of 0, 176 with a red value of 1, and so on.
[152, 176, 439, 1024, 2131, 2887, 3031, 2918, 2855...
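To make the layout of that flat list concrete, here is a minimal sketch using a tiny made-up image rather than my photo. The 2x2 image and its single colour (10, 20, 30) are purely illustrative assumptions; the only Pillow calls used are Image.new and histogram.

```python
from PIL import Image

# A tiny 2x2 RGB image filled with one colour (hypothetical test data).
image = Image.new("RGB", (2, 2), (10, 20, 30))

# One flat list of 768 counts for an RGB image.
frequencies = image.histogram()

# The first 256 entries are red, the next 256 green, the last 256 blue.
red = frequencies[0:256]
green = frequencies[256:512]
blue = frequencies[512:768]

print(len(frequencies))               # 768
print(red[10], green[20], blue[30])   # 4 4 4
```

All four pixels share the same colour, so each channel has a single non-zero entry of 4 at the index matching its value.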
In this project I will write a module with a function called create_histograms. This will take a Pillow image and return a dictionary of Pillow images of histograms, one for greyscale images and three for colour images. The calling code can then either save these or display them in a GUI.
Of course my code needs to create histograms of the same shape as the GIMP ones shown above, but instead of using grey for the histogram itself and a graduated colour bar at the bottom, I will draw the individual vertical lines in the colours they correspond to.
Another difference is that the GIMP histograms are drawn independently and all hit the top of their plot areas, ignoring the relative frequencies of the other colour channels. I find this misleading, so mine are drawn with the same frequency scale (y-axis) for all three. I hope you agree that this is an improvement.
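The difference between the two approaches can be sketched with a couple of very short, made-up frequency lists. The numbers below are hypothetical and only there to show the effect of dividing by a per-channel maximum versus a single shared maximum.

```python
# Hypothetical, heavily shortened frequency lists for two channels.
red = [10, 40, 20]
green = [5, 100, 50]

# Independent scaling: each channel is divided by its own maximum,
# so every channel has a column reaching the top of the plot.
red_independent = [f / max(red) for f in red]

# Shared scaling: every channel is divided by the same overall maximum,
# so only the channel containing that maximum reaches the top.
shared_max = max(red + green)
red_shared = [f / shared_max for f in red]
green_shared = [f / shared_max for f in green]

print(red_independent)  # [0.25, 1.0, 0.5]
print(red_shared)       # [0.1, 0.4, 0.2]
```

With shared scaling the red columns stay visibly shorter than the green ones, which reflects the fact that no red value is as common as the most common green value.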
The Project
This project consists of the following files.
colors_histogram.py
colors_histogram_demo.py
They are in the same GitHub repository as those for Parts 1 and 2. I am using the photo which is shown below but you’ll probably want to run the code with your own.
The Code
This is the colors_histogram module.
from PIL import Image, ImageDraw


def create_histograms(image):
    """
    Takes a Pillow image.
    Returns a dictionary of Pillow images of colour
    histograms.
    For colour (mode "RGB") images there are 3 with
    keys "red", "green" and "blue".
    For greyscale (mode "L") images there is one with key "greyscale".
    Raises ValueError if mode is not "RGB" or "L"
    """
    if image.mode == "RGB":
        normalized_frequencies = _create_normalized_frequencies_rgb(image)
        return {"red": _create_histogram((0,), normalized_frequencies["red"]),
                "green": _create_histogram((1,), normalized_frequencies["green"]),
                "blue": _create_histogram((2,), normalized_frequencies["blue"])}
    elif image.mode == "L":
        normalized_frequencies = _create_normalized_frequencies_greyscale(image)
        return {"greyscale": _create_histogram((0, 1, 2), normalized_frequencies)}
    else:
        raise ValueError("Image must have mode of RGB or L")


def _create_normalized_frequencies_rgb(image):
    """
    Create a dictionary of 3 sets of frequencies
    as fractions of the highest frequency.
    Keys are "red", "green" and "blue".
    Frequencies are lists.
    """
    # get the flat frequencies data using the Pillow histogram method
    frequencies = image.histogram()
    # find the highest frequency across all 3 channels
    max_freq = max(frequencies)
    # Create the dictionary using list comprehensions.
    normalized_frequencies = {"red": [f / max_freq for f in frequencies[0:256]],
                              "green": [f / max_freq for f in frequencies[256:512]],
                              "blue": [f / max_freq for f in frequencies[512:768]]}
    return normalized_frequencies


def _create_normalized_frequencies_greyscale(image):
    """
    Create a list of frequencies as
    fractions of the highest frequency.
    """
    # get the flat frequencies data using the Pillow histogram method
    frequencies = image.histogram()
    # get the highest frequency
    max_freq = max(frequencies)
    # calculate frequencies using list comprehension
    normalized_frequencies = [f / max_freq for f in frequencies]
    return normalized_frequencies


def _create_histogram(channels, frequencies):
    """
    Create a histogram from frequency
    data as a Pillow image.
    """
    width = 256
    height = 158
    column_width = 1
    im = Image.new("RGB", (width, height), (255, 255, 255))
    draw = ImageDraw.Draw(im)
    col = [0, 0, 0]
    for v in range(0, 256):
        # set the value of the particular RGB channel
        # to values between 0 and 255
        for channel in channels:
            col[channel] = v
        # draw the individual histogram column
        draw.line(xy=[(v, height), (v, height - (height * frequencies[v]))],
                  fill=tuple(col),
                  width=column_width)
    return im
create_histograms
This function mainly calls one of two functions to create normalized frequencies (which we’ll get to in a moment), then passes the results to _create_histogram to draw each histogram as the returned dictionary is built.
As you can see there are two separate tasks here, one for colour images and one for black and white. Other image types will raise an exception.
_create_normalized_frequencies_rgb
The actual frequencies aren’t much use for drawing histograms. What we need is the frequency as a fraction of the highest frequency.
In my image the highest frequency is 33593 (which incidentally is for a green value of 183). If you look at the histogram for the green channel this is represented by the peak which hits the top. For each of the frequencies the normalized frequency is the actual frequency divided by the highest, which gives a real value between 0 and 1.
The normalized values tell us how far up the histogram each column needs to go relative to the highest. Therefore all we need to do to calculate a column height in pixels is to multiply the histogram height by the normalized value.
For RGB images we need three sets of these frequencies in a dictionary. These are calculated by list comprehensions as the dictionary is being created. Note how the list provided by Pillow is sliced up into three chunks: [0:256], [256:512] and [512:768] for red, green and blue respectively.
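As a small worked example of the arithmetic, here is a sketch that turns a raw frequency into a column height. The helper name column_height is hypothetical and not part of the module; the height of 158 and the maximum of 33593 are the values used in this article.

```python
HEIGHT = 158  # histogram height in pixels, as hard coded in the module


def column_height(frequency, max_freq, height=HEIGHT):
    """Hypothetical helper: column height in pixels for one frequency."""
    # Normalize to a value between 0 and 1, then scale to the plot height.
    normalized = frequency / max_freq
    return height * normalized


max_freq = 33593  # highest frequency in my photo

print(column_height(33593, max_freq))  # 158.0
print(column_height(0, max_freq))      # 0.0
```

The most frequent value fills the full height and a value that never occurs gets no column at all; everything else falls somewhere in between.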
_create_normalized_frequencies_greyscale
This works on the same principle as the previous function but is simpler as we only need one set of values.
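The greyscale case can be tried out in the same way with a made-up image. This is only a sketch: the 3x1 image with every pixel set to 128 is an illustrative assumption.

```python
from PIL import Image

# A tiny greyscale (mode "L") image with every pixel set to 128
# (hypothetical test data).
image = Image.new("L", (3, 1), 128)

# A mode "L" image has a single band, so there are just 256 counts.
frequencies = image.histogram()
max_freq = max(frequencies)
normalized = [f / max_freq for f in frequencies]

print(len(frequencies))   # 256
print(normalized[128])    # 1.0
```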
_create_histogram
I have written a lot of data visualization code over the years in various languages and the biggest problem is allowing for different ranges on both the x and y axes and then scaling these up or down to the required image size.
Here I haven’t bothered with any of that but have just hard coded the image size as I know we are always going to be dealing with exactly 256 values. A rare luxury! The height is also hard coded to the width divided by the Golden Ratio 1.618, rounded down to the nearest integer. This is just my personal choice and there is no reason for the histograms to be any particular height.
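For anyone who wants to check where 158 comes from, the calculation can be sketched like this. In the module the value is simply hard coded; computing the Golden Ratio with math.sqrt here is my own illustration.

```python
import math

width = 256

# The Golden Ratio, approximately 1.618.
golden_ratio = (1 + math.sqrt(5)) / 2

# Width divided by the Golden Ratio, rounded down to an integer.
height = math.floor(width / golden_ratio)

print(height)  # 158
```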
Next we create a new Pillow image, passing the mode “RGB”, a tuple for the size, and another tuple of RGB values for the background colour (white). ImageDraw.Draw then gives us an object we can use to draw on the image.
The channels argument is a tuple specifying which channel or channels the histogram is being drawn for. It will be (0,) for red, (1,) for green and (2,) for blue, or (0,1,2) for greyscale where all three RGB values are the same.
We now iterate from 0 to 255. The col list is used to hold the RGB values for the current column in the histogram, and the relevant item(s) is set to the current value of v.
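The way the channels tuple picks the column colour can be sketched in isolation. The function name column_colour is hypothetical; in the module the same logic sits inside the loop and reuses a single col list.

```python
def column_colour(channels, v):
    """Hypothetical helper: the RGB colour of the column for value v."""
    col = [0, 0, 0]
    # Set only the channel(s) this histogram is being drawn for.
    for channel in channels:
        col[channel] = v
    return tuple(col)


print(column_colour((0,), 200))        # (200, 0, 0)
print(column_colour((1,), 200))        # (0, 200, 0)
print(column_colour((0, 1, 2), 200))   # (200, 200, 200)
```

So the red histogram runs from black to pure red, and the greyscale one runs from black through grey to white.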
Finally we draw a line from the bottom up to the required height, calculated as described in the section on the _create_normalized_frequencies_rgb function. We also pass the col list converted to a tuple, and the line width.
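Drawing a single column can be tried out on a small scratch image. This is only a sketch: the 4x10 image size and the column position are made up for illustration. Pixel rows are numbered from 0 at the top to height - 1 at the bottom, which is why the column height is subtracted from the image height.

```python
from PIL import Image, ImageDraw

height = 10

# A small white scratch image (hypothetical size, for illustration only).
im = Image.new("RGB", (4, height), (255, 255, 255))
draw = ImageDraw.Draw(im)

# Draw a red column at x = 1, from the bottom row up to row 5.
draw.line(xy=[(1, height - 1), (1, height - 5)],
          fill=(255, 0, 0),
          width=1)

print(im.getpixel((1, 7)))   # (255, 0, 0)
print(im.getpixel((1, 0)))   # (255, 255, 255)
```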
The module is now finished so let’s write a bit of code to try it out.
colors_histogram_demo.py
from PIL import Image
import colors_histogram


def main():
    print("--------------------")
    print("| codedrome.com    |")
    print("| Pillow Histogram |")
    print("--------------------")

    filename = "photo.jpg"

    try:
        image = Image.open(filename)
        histograms = colors_histogram.create_histograms(image)
        if image.mode == "RGB":
            histograms["red"].save("histogram_red.png", "PNG")
            histograms["green"].save("histogram_green.png", "PNG")
            histograms["blue"].save("histogram_blue.png", "PNG")
        elif image.mode == "L":
            histograms["greyscale"].save("histogram_greyscale.png", "PNG")
        image.close()
        print("histograms saved")
    except IOError as e:
        print(e)
    except ValueError as e:
        print(e)


if __name__ == "__main__":
    main()
First, edit filename if you are using your own photo.
After opening an image we pass it to colors_histogram.create_histograms, catching the returned dictionary in the histograms variable. Depending on whether the image is colour or black and white we then save the three or the one histogram before closing the image.
There are two possible errors we need to catch here. An IOError is raised if there is a problem opening the image or saving the histograms, and a ValueError is raised if the image’s mode isn’t “RGB” or “L”.
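One detail worth knowing is that in Python 3 IOError is kept as an alias of OSError, so the except IOError clause also catches FileNotFoundError when the file is missing. The short sketch below shows this with the built-in open function; the filename is a made-up one that I am assuming does not exist.

```python
# In Python 3, IOError is an alias of OSError.
print(IOError is OSError)                        # True
print(issubclass(FileNotFoundError, IOError))    # True

caught = None
try:
    # Hypothetical filename, assumed not to exist.
    open("no_such_file_for_this_demo.jpg", "rb")
except IOError as e:
    caught = type(e).__name__

print(caught)  # FileNotFoundError
```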
Now let’s run the program.
python3 colors_histogram_demo.py
Open the folder where your code is and you will find three new PNG files (or one for monochrome images).
I am pleased with the result and I think using actual colours is very effective. The histograms are deliberately minimalist as their primary use is within a GUI which would provide its own border, headings and other information, similar to the GIMP screenshots above.
I have a few more Pillow articles planned so if you liked the first three stay tuned.










