Basic image manipulation
Learn to read and edit pixel values, work with image ROIs, and perform other basic operations. The four main topics are:
Accessing pixel values and modifying them
Accessing image properties
Setting the region of interest (ROI)
Splitting and merging images
Almost all of the operations in this section are related to Numpy rather than OpenCV. Writing well-optimized code with OpenCV requires a good knowledge of Numpy.
Accessing pixel values and modifying them
>>> import numpy as np
>>> import cv2 as cv
>>> img = cv.imread('messi5.jpg')
Next, you can access a pixel value by its row and column coordinates. For a BGR image, this returns an array of blue, green, and red values. For a grayscale image, only the corresponding intensity is returned:
>>> px = img[100,100] # Access the pixel at row 100, column 100
>>> print( px ) # Prints the blue, green, and red values, in that order
[157 166 200]
# To access only the blue channel, pass index 0; the red channel is index 2
>>> blue = img[100,100,0]
>>> print( blue )
157
>>> red = img[100,100,2]
>>> print( red )
200
We can modify the pixel value at a coordinate in the same way:
>>> img[100,100] = [255,255,255]
>>> print( img[100,100] )
[255 255 255]
Numpy is a library optimized for fast array computation, so accessing and modifying pixel values one at a time is very slow. The code above is for demonstration only and is not recommended practice.
For reading or writing a single element, the array methods item() and itemset() are a better choice. Note that itemset() was removed in Numpy 2.0; on current versions, assign with img[10,10,2] = 100 instead:
# Read the R value at coordinates (10, 10)
>>> img.item(10,10,2)
59
# Modify the R value at coordinates (10, 10) (itemset requires Numpy older than 2.0)
>>> img.itemset((10,10,2),100)
>>> img.item(10,10,2)
100
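As a small sketch of why per-pixel access is discouraged, the following compares a Python loop with a single slice assignment. The synthetic array img and its size are assumptions standing in for a loaded image; only Numpy is used.

```python
import numpy as np

# Synthetic 4x5 BGR image standing in for a loaded file (assumption).
img = np.zeros((4, 5, 3), dtype=np.uint8)

# Slow: visit every pixel from Python and set the red channel (index 2).
looped = img.copy()
for row in range(looped.shape[0]):
    for col in range(looped.shape[1]):
        looped[row, col, 2] = 100

# Fast: one slice assignment, executed inside Numpy.
vectorized = img.copy()
vectorized[:, :, 2] = 100

# Scalar read using plain indexing, which works on old and new Numpy versions.
red_value = int(vectorized[1, 1, 2])
print(red_value)
```

Both approaches produce the same array; the difference is only in where the loop runs.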
Accessing image properties
Get the shape of the image:
>>> print( img.shape )
(342, 548, 3)
342 is the height (the number of rows of pixels), 548 is the width (the number of columns of pixels), and 3 is the number of channels, which indicates a color image rather than a grayscale one. For a grayscale image, the returned tuple contains only the height and width.
Get the total number of array elements, which is height × width × channels, so 342*548*3 = 562248:
>>> print( img.size )
562248
Get the data type of the image:
>>> print( img.dtype )
uint8
Since the maximum value of pixels is 255, 8 bits are sufficient.
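The properties above can be checked without an image file. This sketch builds synthetic color and grayscale arrays with the same dimensions as the example (the arrays and the is_color helper variable are illustrative assumptions).

```python
import numpy as np

# Synthetic stand-ins with the same dimensions as the example image.
color = np.zeros((342, 548, 3), dtype=np.uint8)
gray = np.zeros((342, 548), dtype=np.uint8)

print(color.shape)  # (342, 548, 3)
print(gray.shape)   # (342, 548)
print(color.size)   # 562248
print(color.dtype)  # uint8

# Height and width are always the first two entries of shape.
height, width = color.shape[:2]

# A third dimension is present only for multi-channel images.
is_color = color.ndim == 3
```

Checking ndim (or the length of shape) is a common way to tell whether a loaded image is grayscale or color.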
Setting the Region of Interest (ROI)
Sometimes, we will have to use certain image regions. For example, for eye detection in an image, face detection is first performed on the whole image. When a face is obtained, we select only the face region and search for eyes in it, instead of searching the whole image. It improves accuracy (because the eyes are always in the face) and performance (because we search in a small region).
Use Numpy indexing to get the ROI. Here the ball is selected and copied to another region of the image:
>>> ball = img[280:340, 330:390]
>>> img[273:333, 100:160] = ball
The result is as follows:
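One detail of ROI slicing is worth a short sketch: a Numpy slice is a view onto the original array, not a copy, so writing to the ROI also changes the image. The example uses a small synthetic array (an assumption standing in for the image).

```python
import numpy as np

# Small synthetic single-channel "image" (assumption for illustration).
img = np.arange(36, dtype=np.uint8).reshape(6, 6)

roi_view = img[0:2, 0:2]          # a view: shares memory with img
roi_copy = img[0:2, 0:2].copy()   # an independent copy of the same region

roi_view[:] = 255                 # writing to the view also modifies img
shares = np.shares_memory(roi_view, img)

# Paste the saved copy into another region, as done with the ball above.
img[4:6, 4:6] = roi_copy
print(img)
```

Call .copy() on the slice whenever the ROI must stay unchanged while the source image is edited.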
Splitting and merging images
Sometimes you need to process the B, G, and R channels of an image separately. In this case, you need to split the BGR image into individual channels. In other cases, you may need to join separate channels to create a BGR image. You can do this simply with:
>>> b,g,r = cv.split(img)
>>> img = cv.merge((b,g,r))
However, cv.split is a relatively costly operation. When you only need one channel, Numpy indexing is more efficient:
>>> b = img[:,:,0]
Modifications can also be made using indexes. For example, to set all red values to 0:
>>> img[:,:,2] = 0
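The indexing approach can be sketched with Numpy alone. Here np.dstack is used as a stand-in for cv.merge (a swapped-in Numpy call, not the OpenCV function), and the random array is an assumption in place of a real image.

```python
import numpy as np

# Random 3x4 BGR image (assumption; stands in for a loaded file).
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(3, 4, 3)).astype(np.uint8)

# Channel "split" by indexing: each result is a view, so no data is copied.
b, g, r = img[:, :, 0], img[:, :, 1], img[:, :, 2]

# Channel "merge" by stacking the 2-D planes along a third axis.
merged = np.dstack((b, g, r))

# Zero the red channel on a copy so the original stays intact.
no_red = img.copy()
no_red[:, :, 2] = 0
```

Because b, g, and r are views, modifying them in place would also modify img.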
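Sometimes we want to add a border around an image, like a photo frame. For this we can use cv.copyMakeBorder(), which is also widely used for convolution operations, zero padding, and so on. As used in the sample below, the function takes the following parameters:
src: the input image
top, bottom, left, right: the border width in pixels in each direction
borderType: a flag selecting the kind of border, such as cv.BORDER_CONSTANT, cv.BORDER_REFLECT, cv.BORDER_REFLECT_101, cv.BORDER_REPLICATE, or cv.BORDER_WRAP
value: the border color, used when the border type is cv.BORDER_CONSTANT
Here is sample code that demonstrates these border types: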
import cv2 as cv
import numpy as np
from matplotlib import pyplot as plt

BLUE = [255,0,0]
img1 = cv.imread('opencv-logo.png')

replicate = cv.copyMakeBorder(img1,10,10,10,10,cv.BORDER_REPLICATE)
reflect = cv.copyMakeBorder(img1,10,10,10,10,cv.BORDER_REFLECT)
reflect101 = cv.copyMakeBorder(img1,10,10,10,10,cv.BORDER_REFLECT_101)
wrap = cv.copyMakeBorder(img1,10,10,10,10,cv.BORDER_WRAP)
constant = cv.copyMakeBorder(img1,10,10,10,10,cv.BORDER_CONSTANT,value=BLUE)

plt.subplot(231),plt.imshow(img1,'gray'),plt.title('ORIGINAL')
plt.subplot(232),plt.imshow(replicate,'gray'),plt.title('REPLICATE')
plt.subplot(233),plt.imshow(reflect,'gray'),plt.title('REFLECT')
plt.subplot(234),plt.imshow(reflect101,'gray'),plt.title('REFLECT_101')
plt.subplot(235),plt.imshow(wrap,'gray'),plt.title('WRAP')
plt.subplot(236),plt.imshow(constant,'gray'),plt.title('CONSTANT')
plt.show()
The effect is shown below:
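To see what each border type does to the pixel values, it can help to pad a tiny 1-D row. The sketch below uses np.pad rather than cv.copyMakeBorder; the pairing of np.pad modes with OpenCV border flags in the comments is an analogy offered as an assumption, not an OpenCV call.

```python
import numpy as np

row = np.array([1, 2, 3, 4], dtype=np.uint8)
pad = 2  # border width on each side

# Analogous to BORDER_REPLICATE: repeat the edge value.
replicate = np.pad(row, pad, mode="edge")        # [1 1 1 2 3 4 4 4]

# Analogous to BORDER_REFLECT: mirror, including the edge value.
reflect = np.pad(row, pad, mode="symmetric")     # [2 1 1 2 3 4 4 3]

# Analogous to BORDER_REFLECT_101: mirror, excluding the edge value.
reflect101 = np.pad(row, pad, mode="reflect")    # [3 2 1 2 3 4 3 2]

# Analogous to BORDER_WRAP: continue from the opposite side.
wrap = np.pad(row, pad, mode="wrap")             # [3 4 1 2 3 4 1 2]

# Analogous to BORDER_CONSTANT with value 0.
constant = np.pad(row, pad, mode="constant", constant_values=0)

for name, arr in [("replicate", replicate), ("reflect", reflect),
                  ("reflect101", reflect101), ("wrap", wrap),
                  ("constant", constant)]:
    print(name, arr)
```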
Arithmetic operations on images
Perform arithmetic operations on images, such as addition, subtraction, bitwise operations, etc. These functions are mainly used: cv.add(), cv.addWeighted(), etc.
You can add two images using the OpenCV function cv.add() or simply by the numpy operation res = img1 + img2. Both images should have the same depth and type, or the second image can be just a scalar value.
There is a difference between OpenCV addition, which is a saturated operation, and Numpy addition, which is a modulo operation.
>>> x = np.uint8([250])
>>> y = np.uint8([10])
>>> print( cv.add(x,y) ) # 250+10 = 260 => 255
[[255]]
>>> print( x+y ) # 250+10 = 260 % 256 = 4
[4]
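The two behaviors can be reproduced with Numpy alone. In this sketch the saturated result is emulated by widening to int16 and clipping; that emulation is an assumption for illustration and is not how cv.add is implemented.

```python
import numpy as np

x = np.array([250], dtype=np.uint8)
y = np.array([10], dtype=np.uint8)

# Numpy uint8 addition wraps around: 260 % 256 = 4.
wrapped = x + y

# Saturated addition, emulated: add in a wider type, then clip to 0..255.
wide_sum = x.astype(np.int16) + y.astype(np.int16)
saturated = np.clip(wide_sum, 0, 255).astype(np.uint8)

print(wrapped, saturated)
```

For images, saturation is usually the desired behavior, since wrap-around turns bright pixels dark.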
Image blending is also image addition, but the images are given different weights to produce a blended or transparent look. Images are added according to the following equation:
g(x) = (1 − α)·f0(x) + α·f1(x)
Varying α from 0 → 1 gives a smooth transition from one image to the other. Here we blend two images together. The first image has a weight of 0.7 and the second image has a weight of 0.3. cv.addWeighted() applies the following equation to the images, where γ is taken as 0:
dst = α·img1 + β·img2 + γ
img1 = cv.imread('ml.png')
img2 = cv.imread('opencv-logo.png')

dst = cv.addWeighted(img1,0.7,img2,0.3,0)

cv.imshow('dst',dst)
cv.waitKey(0)
cv.destroyAllWindows()
The effect is as follows:
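The weighted sum dst = α·img1 + β·img2 + γ can be sketched in plain Numpy. The helper blend below is hypothetical (it is not an OpenCV function), and the uniform arrays are assumptions standing in for the two images; both inputs must have the same shape.

```python
import numpy as np

def blend(img1, alpha, img2, beta, gamma=0.0):
    """Hypothetical Numpy version of a weighted sum with rounding and clipping."""
    mixed = alpha * img1.astype(np.float64) + beta * img2.astype(np.float64) + gamma
    return np.clip(np.rint(mixed), 0, 255).astype(np.uint8)

# Two uniform 2x2 BGR "images" (assumption for illustration).
img1 = np.full((2, 2, 3), 200, dtype=np.uint8)
img2 = np.full((2, 2, 3), 100, dtype=np.uint8)

dst = blend(img1, 0.7, img2, 0.3)
print(dst[0, 0])  # 0.7*200 + 0.3*100 = 170 per channel
```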
Bitwise operations include bitwise AND, OR, NOT, and XOR. They are useful when extracting any part of an image (as we will see in the next sections), defining and working with non-rectangular ROIs, and so on. Below is an example of how to change a specific region of an image.
For example, suppose we want to put the OpenCV logo on top of an image. If the two images are added, the colors change. If they are blended, the result looks transparent. But we want the logo to be opaque. If it were a rectangular region, we could use an ROI as before, but the OpenCV logo is not a rectangle. It can be done with bitwise operations, as follows:
# Read two images
img1 = cv.imread('messi5.jpg')
img2 = cv.imread('opencv-logo-white.png')

# The logo goes in the top-left corner, so create an ROI there
rows,cols,channels = img2.shape
roi = img1[0:rows, 0:cols]

# Now create a mask of the logo and also its inverse mask
img2gray = cv.cvtColor(img2,cv.COLOR_BGR2GRAY)
ret, mask = cv.threshold(img2gray, 10, 255, cv.THRESH_BINARY)
mask_inv = cv.bitwise_not(mask)

# Now black out the logo area in the ROI
img1_bg = cv.bitwise_and(roi,roi,mask = mask_inv)

# Take only the logo region from the logo image
img2_fg = cv.bitwise_and(img2,img2,mask = mask)

# Put the logo in the ROI and modify the main image
dst = cv.add(img1_bg,img2_fg)
img1[0:rows, 0:cols ] = dst

cv.imshow('res',img1)
cv.waitKey(0)
cv.destroyAllWindows()
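The same masking idea can be followed on tiny arrays using Numpy only. This is a rough analog under stated assumptions: the mask is built from the per-pixel channel maximum instead of cv.cvtColor and cv.threshold, and np.where replaces the bitwise_and calls.

```python
import numpy as np

# Uniform 4x4 background and a 2x2 "logo" on black (assumptions).
background = np.full((4, 4, 3), 50, dtype=np.uint8)
logo = np.zeros((2, 2, 3), dtype=np.uint8)
logo[0, 0] = (0, 0, 255)       # one red pixel (BGR order)
logo[1, 1] = (255, 255, 255)   # one white pixel

rows, cols = logo.shape[:2]
roi = background[0:rows, 0:cols]

# True where the logo has content (any channel brighter than 10).
mask = logo.max(axis=2) > 10
mask3 = mask[..., None]        # add a channel axis for broadcasting

bg_part = np.where(mask3, 0, roi)    # black out the logo area in the ROI
fg_part = np.where(mask3, logo, 0)   # keep only the logo pixels

# The two parts never overlap, so a plain sum cannot overflow.
background[0:rows, 0:cols] = bg_part + fg_part
print(background[0:rows, 0:cols])
```

Pixels where the logo is black keep the background value, so the pasted logo is opaque but not rectangular.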
Performance testing and improvement
Getting the solution is important. But getting it in the fastest way possible is even more important.
In image processing, since you are processing a large number of operations per second, it is imperative that your code not only provides the right solution, but also provides it in the fastest possible way. Next, let's look at how to measure the performance of your code and some tips to improve it.
These functions will be used: cv.getTickCount, cv.getTickFrequency, etc.
In addition to OpenCV, Python provides a module time that helps to measure execution time. Another module, profile, helps to get detailed reports about the code, such as how much time each function in the code took, how many times the function was called, etc.
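As a minimal sketch of the kind of report a profiler gives, the following uses cProfile and pstats from the standard library. The function square_sum is a hypothetical workload chosen only for illustration.

```python
import cProfile
import io
import pstats

def square_sum(n):
    """Hypothetical workload: sum of squares below n."""
    return sum(i * i for i in range(n))

profiler = cProfile.Profile()
profiler.enable()
square_sum(50_000)
profiler.disable()

# Collect the report in a string, sorted by cumulative time per function.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

The report lists the call count and the time spent for each function, which helps locate bottlenecks before optimizing.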
Measuring Performance with OpenCV
The cv.getTickCount function returns the number of clock cycles from the moment the machine is turned on to the moment this function is called. Thus, if you call it before and after the function is executed, you will get the number of clock cycles used to execute the function.
The cv.getTickFrequency function returns the frequency of clock cycles, that is, the number of clock cycles per second. So, to find the execution time in seconds, you can do the following:
e1 = cv.getTickCount()
# Place your code here
e2 = cv.getTickCount()
time = (e2 - e1)/ cv.getTickFrequency()
This is demonstrated by the following example, which applies median filtering with odd kernel sizes from 5 to 47 (range(5,49,2) stops before 49):
img1 = cv.imread('messi5.jpg')

e1 = cv.getTickCount()
for i in range(5,49,2):
    img1 = cv.medianBlur(img1,i)
e2 = cv.getTickCount()
t = (e2 - e1)/cv.getTickFrequency()
print( t )
# 0.521107655 seconds
You can do the same with the time module: call time.time() instead of cv.getTickCount() before and after the code and take the difference of the two values.
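A minimal sketch of timing with the standard library follows. It uses time.perf_counter, which is intended for measuring short durations, and a hypothetical busy_work function in place of image-processing code.

```python
import time

def busy_work(n):
    """Hypothetical workload standing in for image-processing code."""
    return sum(i * i for i in range(n))

start = time.perf_counter()
result = busy_work(100_000)
elapsed = time.perf_counter() - start

print(f"{elapsed:.6f} seconds")
```

The pattern is the same as with cv.getTickCount: take a reading before and after, then subtract.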
Default optimizations in OpenCV
Many OpenCV functions are optimized using SSE2, AVX, and similar instruction sets. OpenCV also contains the unoptimized code. Therefore, if our system supports these instruction sets, we should take advantage of them (almost all modern processors do). Optimization is enabled by default at compile time, so OpenCV runs the optimized code if it is enabled and the unoptimized code otherwise. You can use cv.useOptimized() to check whether it is enabled and cv.setUseOptimized() to enable or disable it. Let's look at a simple example.
# Check if optimization is enabled
In : cv.useOptimized()
Out: True
In : %timeit res = cv.medianBlur(img,49)
10 loops, best of 3: 34.9 ms per loop
# Disable optimization
In : cv.setUseOptimized(False)
In : cv.useOptimized()
Out: False
In : %timeit res = cv.medianBlur(img,49)
10 loops, best of 3: 64.1 ms per loop
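Measuring performance in IPython
IPython's %timeit magic command runs a statement many times and reports the best timing, which makes it convenient for comparing similar operations:
In : x = 5
In : %timeit y=x**2
10000000 loops, best of 3: 73 ns per loop
In : %timeit y=x*x
10000000 loops, best of 3: 58.3 ns per loop
In : z = np.uint8([5])
In : %timeit y=z*z
1000000 loops, best of 3: 1.25 us per loop
In : %timeit y=np.square(z)
1000000 loops, best of 3: 1.16 us per loop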
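Outside IPython, a similar comparison can be sketched with the standard library's timeit module. The statement strings and the repeat count are illustrative choices, and the measured times depend on the machine.

```python
import timeit

# Time each statement 100,000 times; setup runs once and is not timed.
t_pow = timeit.timeit("y = x ** 2", setup="x = 5", number=100_000)
t_mul = timeit.timeit("y = x * x", setup="x = 5", number=100_000)

print(f"x ** 2: {t_pow:.6f} s for 100000 runs")
print(f"x * x : {t_mul:.6f} s for 100000 runs")
```

timeit.timeit returns the total time in seconds for all runs, so divide by number to get a per-run figure.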
Performance Optimization Techniques
There are several techniques and coding methods to get the most out of Python and Numpy. Only relevant content is noted here, with links to important sources. The main thing to note here is to first try to implement the algorithm in a simple way. Once it starts working, analyze it, find bottlenecks and then optimize it.
Avoid using loops in Python whenever possible, especially double/triple loops and the like. They are inherently slow.
Vectorize the algorithm/code as much as possible, as Numpy and OpenCV are optimized for vector operations.
Take advantage of cache coherence (memory locality).
Never copy arrays unless necessary. Try to use the view instead. Array copying is an expensive operation.
If your code is still slow after applying all of these techniques, or if large loops are unavoidable, use an additional library such as Cython to speed things up.
References
Python Optimization Techniques: https://wiki.python.org/moin/PythonSpeed/PerformanceTips
Numpy Advanced Operations: https://scipy-lectures.github.io/advanced/advanced_numpy/index.html#advanced-numpy
Timing and Analysis in IPython: https://pynash.org/2013/03/06/timing-and-profiling/