Basic image manipulation

Learn to read and edit pixel values, use image ROI and other basic operations. The four main points are as follows:

  • Accessing pixel values and modifying them

  • Accessing image properties

  • Setting the region of interest (ROI)

  • Splitting and merging images

Almost all of the operations in this section are primarily related to Numpy, not OpenCV. Writing well-optimized code with OpenCV requires a good knowledge of Numpy.

Accessing pixel values and modifying them

>>> import numpy as np
>>> import cv2 as cv
>>> img = cv.imread('messi5.jpg')

After loading a color image, you can access a pixel value by its row and column coordinates. For a BGR image, this returns an array of blue, green, and red values. For a grayscale image, only the corresponding intensity is returned:

>>> px = img[100,100]  # Access the pixel at row 100, column 100
>>> print( px )        # Prints the blue, green, and red values, in that order
[157 166 200]
# Pass channel index 0 to read only the blue value; the red channel is index 2
>>> blue = img[100,100,0]
>>> print( blue )
157
>>> red = img[100,100,2]
>>> print( red )
200

We can directly modify the pixel value of a coordinate:

>>> img[100,100] = [255,255,255]
>>> print( img[100,100] )
[255 255 255]

Numpy is a library optimized for fast array computation. Accessing and modifying pixel values one at a time is therefore very slow; the code above is for demonstration only and is not recommended practice.

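A better approach for reading or writing a single pixel value is to use the Numpy array methods item() and itemset(), which work with plain scalars. Note that itemset() was removed in NumPy 2.0; on current versions, assign by index instead (img[10,10,2] = 100).

# Read the red value at row 10, column 10
>>> img.item(10,10,2)
# Set the red value at row 10, column 10 (NumPy versions before 2.0)
>>> img.itemset((10,10,2),100)
>>> img.item(10,10,2)
100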
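
When many pixels need to change, slice assignment updates them in one operation instead of one call per pixel. The following is a minimal sketch that assumes a synthetic black image in place of messi5.jpg, so the values are illustrative only:

```python
import numpy as np

# Synthetic 342x548 BGR image (all black) standing in for a loaded photo.
img = np.zeros((342, 548, 3), dtype=np.uint8)

# Read one pixel: a length-3 array in B, G, R order.
px = img[100, 100]

# Set a single channel of a single pixel (the red channel, index 2).
img[10, 10, 2] = 100

# Set a whole 50x50 block to white with one slice assignment.
img[100:150, 200:250] = (255, 255, 255)

print(px, img[10, 10], img[120, 220])
```

Index assignment works on every NumPy version, which makes it a safe default for both single pixels and regions.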

Accessing image properties

Get the shape of the image:

>>> print( img.shape )
(342, 548, 3)

342 is the height (the number of pixel rows), 548 is the width (the number of pixel columns), and 3 is the number of channels, which shows that this is a color image rather than a grayscale one. For a grayscale image, the returned tuple contains only the height and width.

Get the total number of array elements (pixels times channels), here 342*548*3 = 562248:

>>> print( img.size )

Get the data type of the image:

>>> print( img.dtype )

The image was loaded with the uint8 data type: pixel values range from 0 to 255, so 8 bits per channel are sufficient.
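
The three properties can be checked together on arrays of the same dimensions. This sketch assumes synthetic zero-filled arrays in place of real images; the variable names are illustrative:

```python
import numpy as np

color = np.zeros((342, 548, 3), dtype=np.uint8)  # height, width, channels
gray = np.zeros((342, 548), dtype=np.uint8)      # height, width only

height, width = color.shape[:2]  # works for both color and grayscale arrays
is_color = color.ndim == 3       # grayscale arrays have only 2 dimensions

print(color.shape, color.size, color.dtype)
print(gray.shape, gray.size, gray.dtype)
```

Slicing the shape with [:2] avoids an unpacking error when code may receive either a color or a grayscale image.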

Setting the region of interest (ROI)

Sometimes you need to work with only certain regions of an image. For example, to detect eyes in an image, face detection is first performed on the whole image. Once a face is found, only the face region is searched for eyes instead of the whole image. This improves accuracy (because eyes are always on a face) and performance (because a smaller region is searched).

Use Numpy indexing to get the ROI. Here I select the ball and copy it to another region in the image:

>>> ball = img[280:340, 330:390]
>>> img[273:333, 100:160] = ball

[Figure: the image after the ball region has been copied to a second location]
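
An ROI obtained by slicing is a view into the image, not an independent array. The sketch below assumes a synthetic image with a colored patch standing in for the ball, and shows when a .copy() is needed:

```python
import numpy as np

# Synthetic image with a bright 60x60 "ball" patch.
img = np.zeros((342, 548, 3), dtype=np.uint8)
img[280:340, 330:390] = (0, 255, 255)

ball_view = img[280:340, 330:390]         # a view: shares memory with img
ball_copy = img[280:340, 330:390].copy()  # an independent snapshot

# Paste the patch elsewhere; source and destination do not overlap here.
img[273:333, 100:160] = ball_view

# Overwriting the original region changes the view but not the copy.
img[280:340, 330:390] = 0
print(ball_view.max(), ball_copy.max())
```

Take a copy whenever the ROI must survive later edits to the source image.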


Splitting and merging images

Sometimes you need to process the B, G, and R channels of an image separately, which means splitting the BGR image into individual channels. At other times you may need to join separate channels to create a BGR image. You can do both simply with:

>>> b,g,r = cv.split(img)
>>> img = cv.merge((b,g,r))

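However, cv.split is a costly operation because it copies each channel into a new array. When you need only one channel, Numpy indexing, which returns a view without copying, is more efficient: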

>>> b = img[:,:,0]

Modifications can also be made using indexes, for example if you want to set all red values to 0:

>>> img[:,:,2] = 0
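
The same split and merge can be sketched with Numpy alone. This example assumes a tiny random array in place of an image and uses np.dstack as a stand-in for cv.merge:

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(4, 5, 3), dtype=np.uint8)  # tiny random BGR image

# "Split" by indexing: each channel is a 2-D view, no pixel data is copied.
b, g, r = img[:, :, 0], img[:, :, 1], img[:, :, 2]

# "Merge" by stacking the channels along the last axis (this does copy).
merged = np.dstack((b, g, r))

# Writing through a channel view modifies the image itself.
r[:] = 0
print(np.shares_memory(b, img), merged.shape, img[:, :, 2].max())
```

Because the channel views share memory with the image, zeroing r here has the same effect as img[:,:,2] = 0.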

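Adding borders to images (padding)

To add a border around an image, like a photo frame, use cv.copyMakeBorder(). The function is also used for padding in convolution operations, zero padding, and so on. It takes the following parameters:

  • src: the input image

  • top, bottom, left, right: the border width, in pixels, on each side

  • borderType: a flag that selects the kind of border, one of cv.BORDER_CONSTANT, cv.BORDER_REFLECT, cv.BORDER_REFLECT_101 (also available as cv.BORDER_DEFAULT), cv.BORDER_REPLICATE, or cv.BORDER_WRAP

  • value: the border color, used only when borderType is cv.BORDER_CONSTANT

Here is sample code that demonstrates all these border types: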

import cv2 as cv
import numpy as np

BLUE = [255,0,0]  # Border color in BGR order
img1 = cv.imread('opencv-logo.png')
# Add a 10-pixel border on every side using each border type
replicate = cv.copyMakeBorder(img1,10,10,10,10,cv.BORDER_REPLICATE)
reflect = cv.copyMakeBorder(img1,10,10,10,10,cv.BORDER_REFLECT)
reflect101 = cv.copyMakeBorder(img1,10,10,10,10,cv.BORDER_REFLECT_101)
wrap = cv.copyMakeBorder(img1,10,10,10,10,cv.BORDER_WRAP)
constant = cv.copyMakeBorder(img1,10,10,10,10,cv.BORDER_CONSTANT,value=BLUE)

[Figure: the logo image with each of the five border types applied]
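
The difference between the border types is easier to see on a single row of numbers. The sketch below uses np.pad instead of cv.copyMakeBorder; the mapping between np.pad modes and OpenCV border types is the illustration here, and the row values are arbitrary:

```python
import numpy as np

row = np.array([1, 2, 3, 4, 5], dtype=np.uint8)

# np.pad mode -> the OpenCV border type it corresponds to
modes = {
    "edge": "BORDER_REPLICATE",
    "symmetric": "BORDER_REFLECT",
    "reflect": "BORDER_REFLECT_101",
    "wrap": "BORDER_WRAP",
    "constant": "BORDER_CONSTANT",
}

# Pad two elements on each side with every mode.
padded = {name: np.pad(row, 2, mode=mode) for mode, name in modes.items()}
for name, values in padded.items():
    print(f"{name:20s} {values}")
```

BORDER_REFLECT repeats the edge element in the mirror image, while BORDER_REFLECT_101 does not, which is the only difference between the two.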


Arithmetic operations on images

Perform arithmetic operations on images, such as addition, subtraction, and bitwise operations. The main functions used are cv.add() and cv.addWeighted(), among others.

Image Addition

You can add two images using the OpenCV function cv.add() or simply by the numpy operation res = img1 + img2. Both images should have the same depth and type, or the second image can be just a scalar value.

OpenCV addition and Numpy addition behave differently: OpenCV addition is a saturated operation (results are clipped at the maximum value), while Numpy addition is a modulo operation (results wrap around).

>>> x = np.uint8([250])
>>> y = np.uint8([10])
>>> print( cv.add(x,y) ) # 250+10 = 260 => 255
>>> print( x+y )          # 250+10 = 260 % 256 = 4
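
The saturated result can be reproduced without OpenCV, which makes the difference explicit. This is a sketch of the idea for uint8 inputs, not a description of how cv.add is implemented internally:

```python
import numpy as np

x = np.array([250, 100], dtype=np.uint8)
y = np.array([10, 20], dtype=np.uint8)

wrapped = x + y  # uint8 arithmetic wraps around: 260 % 256 = 4

# Saturating add: widen the type, add, clip to the uint8 range, narrow again.
saturated = np.clip(x.astype(np.int16) + y.astype(np.int16), 0, 255).astype(np.uint8)

print(wrapped, saturated)
```

Saturation is usually what you want for images, because wrap-around turns over-bright pixels into dark ones.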

Image Blending

This is also image addition, but the images are given different weights to produce a blended or transparent look. Images are added according to the following equation:

$$ g(x) = (1 - \alpha) f_0(x) + \alpha f_1(x) $$

By varying α from 0 to 1, you can make a smooth transition from one image to the other. Here we blend two images together: the first has a weight of 0.7 and the second a weight of 0.3. cv.addWeighted() applies the following equation to the images, where γ is 0 in this example:

$$ dst = \alpha \cdot img1 + \beta \cdot img2 + \gamma $$

img1 = cv.imread('ml.png')
img2 = cv.imread('opencv-logo.png')
dst = cv.addWeighted(img1,0.7,img2,0.3,0)

[Figure: the two images blended with weights 0.7 and 0.3]
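
The weighted sum can be worked through by hand on small arrays. The blend helper below is hypothetical (it is not part of OpenCV), and its rounding to the nearest integer is an assumption about how the result is converted back to uint8:

```python
import numpy as np

def blend(img1, alpha, img2, beta, gamma=0.0):
    """Hypothetical helper: alpha*img1 + beta*img2 + gamma, clipped to uint8."""
    acc = alpha * img1.astype(np.float64) + beta * img2.astype(np.float64) + gamma
    return np.clip(np.rint(acc), 0, 255).astype(np.uint8)

# Two flat gray images standing in for ml.png and opencv-logo.png.
img1 = np.full((2, 2, 3), 200, dtype=np.uint8)
img2 = np.full((2, 2, 3), 100, dtype=np.uint8)

dst = blend(img1, 0.7, img2, 0.3)  # 0.7*200 + 0.3*100 = 170
print(dst[0, 0])
```

As with cv.addWeighted, both inputs must have the same shape for the element-wise sum to be defined.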


Bit Operations

This includes the bitwise AND, OR, NOT, and XOR operations. They are useful when extracting any part of an image (as we will see in the next sections), defining and using non-rectangular ROIs, and so on. Below is an example of how to change a specific region of an image.

For example, suppose you want to put the OpenCV logo on top of an image. Adding the two images changes the colors, and blending them gives a transparent effect, but the logo should be opaque. For a rectangular area you could use an ROI, but the OpenCV logo is not a rectangle. Bitwise operations solve this, as follows:

# Read two images
img1 = cv.imread('messi5.jpg')
img2 = cv.imread('opencv-logo-white.png')
# I wanted to put the logo in the top left corner, so I created a ROI
rows,cols,channels = img2.shape
roi = img1[0:rows, 0:cols]

# Now create a logo mask and create its reverse mask
img2gray = cv.cvtColor(img2,cv.COLOR_BGR2GRAY)
ret, mask = cv.threshold(img2gray, 10, 255, cv.THRESH_BINARY)
mask_inv = cv.bitwise_not(mask)

# Now black out the logo area in the ROI
img1_bg = cv.bitwise_and(roi,roi,mask = mask_inv)

# Get only the logo area from the logo image
img2_fg = cv.bitwise_and(img2,img2,mask = mask)

# Placement of logo
dst = cv.add(img1_bg,img2_fg)
img1[0:rows, 0:cols ] = dst
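
The masking logic can be traced on a tiny example. This sketch swaps the OpenCV calls for a Numpy boolean mask and np.where, and assumes synthetic arrays in place of the two image files:

```python
import numpy as np

# Tiny stand-ins: a gray background ROI and a logo on a black canvas.
roi = np.full((4, 4, 3), 50, dtype=np.uint8)
logo = np.zeros((4, 4, 3), dtype=np.uint8)
logo[1:3, 1:3] = (0, 0, 255)  # a red square in BGR order

# Threshold the logo's brightness to find which pixels belong to it.
# (A plain channel mean is used here instead of cv.cvtColor's weighted gray.)
mask = logo.mean(axis=2) > 10

# Take logo pixels where the mask is set and background pixels elsewhere.
dst = np.where(mask[:, :, None], logo, roi)
print(mask.astype(np.uint8))
```

The mask plays the role of mask and mask_inv together: set pixels keep the logo, unset pixels keep the background.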




Performance testing and improvement

Getting the solution is important. But getting it in the fastest way possible is even more important.

In image processing, since you are processing a large number of operations per second, it is imperative that your code not only provides the right solution, but also provides it in the fastest possible way. Next, let's look at how to measure the performance of your code and some tips to improve it.

These functions will be used: cv.getTickCount, cv.getTickFrequency, etc.

In addition to OpenCV, Python provides a module time that helps to measure execution time. Another module, profile, helps to get detailed reports about the code, such as how much time each function in the code took, how many times the function was called, etc.

Measuring Performance with OpenCV

The cv.getTickCount function returns the number of clock cycles elapsed since a reference event (such as the moment the machine was turned on) up to the moment the function is called. If you call it before and after a piece of code runs, the difference is the number of clock cycles that code took.

The cv.getTickFrequency function returns the frequency of clock cycles, or the number of clock cycles per second. So, to find the execution time in seconds, you can do the following:

e1 = cv.getTickCount()
# Place your code here
e2 = cv.getTickCount()
# Elapsed seconds; named "elapsed" to avoid shadowing Python's time module
elapsed = (e2 - e1) / cv.getTickFrequency()

The following example applies median filtering repeatedly with odd kernel sizes from 5 to 47 (range(5,49,2) stops before 49):

img1 = cv.imread('messi5.jpg')
e1 = cv.getTickCount()
for i in range(5,49,2):
    img1 = cv.medianBlur(img1,i)
e2 = cv.getTickCount()
t = (e2 - e1)/cv.getTickFrequency()
print( t )
# 0.521107655 seconds

You can also measure elapsed time with Python's time module by recording a timestamp before and after the code and subtracting the two.
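
A minimal sketch of timing with the standard library follows. It assumes time.perf_counter() as the clock (the text above does not name a specific function) and a random array in place of an image file:

```python
import time
import numpy as np

# Random array standing in for a loaded image.
img = np.random.default_rng(0).integers(0, 256, size=(342, 548, 3), dtype=np.uint8)

start = time.perf_counter()
inverted = 255 - img  # the operation being timed
elapsed = time.perf_counter() - start

print(f"{elapsed:.6f} seconds")
```

A single measurement like this is noisy, so repeat it a few times before drawing conclusions.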

Default optimizations in OpenCV

Many OpenCV functions are optimized using SSE2, AVX, and similar instruction sets, and the library also contains unoptimized fallback code. If your system supports these instruction sets (almost all modern processors do), you should take advantage of them. Optimization is enabled by default when OpenCV is compiled, so OpenCV runs the optimized code when it is enabled and the unoptimized code otherwise. Use cv.useOptimized() to check whether it is enabled and cv.setUseOptimized() to enable or disable it. Let's look at a simple example.

# Check if optimization is enabled
In [5]: cv.useOptimized()
Out[5]: True
In [6]: %timeit res = cv.medianBlur(img,49)
10 loops, best of 3: 34.9 ms per loop
# Disable optimization
In [7]: cv.setUseOptimized(False)
In [8]: cv.useOptimized()
Out[8]: False
In [9]: %timeit res = cv.medianBlur(img,49)
10 loops, best of 3: 64.1 ms per loop

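Measuring time in IPython

To compare the performance of similar operations, use IPython's %timeit magic command, which runs a statement many times and reports the best timing. The session below compares x**2 with x*x for a Python integer, and z*z with np.square(z) for a one-element Numpy array: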

In [10]: x = 5
In [11]: %timeit y=x**2
10000000 loops, best of 3: 73 ns per loop
In [12]: %timeit y=x*x
10000000 loops, best of 3: 58.3 ns per loop
In [15]: z = np.uint8([5])
In [17]: %timeit y=z*z
1000000 loops, best of 3: 1.25 us per loop
In [19]: %timeit y=np.square(z)
1000000 loops, best of 3: 1.16 us per loop
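
Outside IPython, the standard timeit module gives comparable measurements. The best_of helper below is hypothetical, and the repeat and loop counts are arbitrary choices for a quick run:

```python
import timeit
import numpy as np

x = 5
z = np.uint8([5])
namespace = {"x": x, "z": z, "np": np}  # names visible to the timed statements

def best_of(stmt, repeat=3, number=20_000):
    """Hypothetical helper: best per-call time in seconds over `repeat` runs."""
    runs = timeit.repeat(stmt, globals=namespace, repeat=repeat, number=number)
    return min(runs) / number

timings = {stmt: best_of(stmt) for stmt in ("x**2", "x*x", "z*z", "np.square(z)")}
for stmt, seconds in timings.items():
    print(f"{stmt:14s} {seconds * 1e9:8.1f} ns per loop")
```

Absolute numbers depend on the machine and library versions, so compare the statements with each other and not with the session above.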

Performance optimization techniques

There are several techniques and coding methods to get the most out of Python and Numpy. Only relevant content is noted here, with links to important sources. The main thing to note here is to first try to implement the algorithm in a simple way. Once it starts working, analyze it, find bottlenecks and then optimize it.

  1. Avoid using loops in Python whenever possible, especially double/triple loops and the like. They are inherently slow.

  2. Vectorize the algorithm/code as much as possible, as Numpy and OpenCV are optimized for vector operations.

  3. Exploit cache locality by accessing array elements in the order they are stored in memory.

  4. Never copy arrays unless necessary. Try to use the view instead. Array copying is an expensive operation.

If your code is still slow after performing all these operations, or if large loops are unavoidable, use another library such as Cython to speed things up.
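
The first two tips can be illustrated by writing the same threshold operation twice. This is a sketch under stated assumptions: a small random array stands in for an image, and both function names are illustrative:

```python
import time
import numpy as np

img = np.random.default_rng(0).integers(0, 256, size=(120, 160), dtype=np.uint8)

def threshold_loop(image, level):
    """Pixel-by-pixel threshold with nested Python loops (slow)."""
    out = np.empty_like(image)
    for row in range(image.shape[0]):
        for col in range(image.shape[1]):
            out[row, col] = 255 if image[row, col] > level else 0
    return out

def threshold_vectorized(image, level):
    """The same threshold as one whole-array expression."""
    return np.where(image > level, 255, 0).astype(np.uint8)

t0 = time.perf_counter()
slow = threshold_loop(img, 127)
t1 = time.perf_counter()
fast = threshold_vectorized(img, 127)
t2 = time.perf_counter()

print(f"loop: {t1 - t0:.4f} s, vectorized: {t2 - t1:.4f} s")
```

Both versions produce identical output, so the vectorized form can replace the loop without changing results.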


Python Optimization Techniques:

Numpy Advanced Operations:

Timing and Analysis in IPython:
