A simple computer-vision data scanner for automatic grading or quick entry of handwritten form data

I have recently been teaching the undergraduate introductory lab course at the PKU College of Chemistry. The university requires students to be graded on ten criteria every week. On the paper score sheet I mark each criterion with plus and minus signs whose count expresses the student's deviation from the "average level"; fitting a normal distribution converts each deviation into a score, and a weighted sum then gives the final grade. I have 40 students in total, which means I need to transfer 400 data points from paper to the computer every week. That is very time-consuming and error-prone (it is easy to slip a row, for example).
My first thought for solving this annoying problem was to use one of the mature answer-sheet (OMR) scanning systems. After spending five minutes surveying that market I dropped the idea entirely: this traditional industry is far too closed. Even if I were willing to spend a few thousand on such a system, its use would be heavily restricted, and who knows how long the negotiations would take before I could even buy one.
My second thought was to search for an existing photo-based automatic grading system, but none of the apps or services I found after five minutes looked remotely reliable. Some even charge for it, which is unbelievable. Considering how laughably simple such a system is, I spent a dozen-odd minutes writing one myself, and I am posting it here for whoever comes after me.

System Design

For convenience I did not write a front-end interface; everything runs in a Jupyter notebook. Below I first describe the system design, which explains how it works, and then briefly walk through the intent of each step of the code so that you can use it directly.

Designing the Data Region

First, since I couldn't be bothered to dig out a scanner for the record sheets and planned to photograph them with my phone instead, I added some special alignment marks to the printed sheet to make automatic scanning more reliable. Borrowing from the design of QR Codes, the sheet carries a "Position Detection Pattern" and a "Timing Pattern" around its edges. The Position Detection Pattern (PDP) is the iconic square-within-a-ring mark seen in every QR Code, shown in Figure 1. The Timing Pattern is simply a stripe of alternating black and white modules in a 1:1 ratio that defines the smallest grid unit; look closely at a QR Code and you will find a Timing Pattern running between every pair of adjacent PDPs. With these marks we can design a record sheet like the one in Figure 2, where the black-and-white squares between adjacent PDPs are the Timing Pattern.
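
For reference, a PDP is just a 7x7-module nested square whose black/white widths scan as 1:1:3:1:1. Below is a minimal sketch of drawing one with NumPy and OpenCV; this is not the actual template generator I used, and the module size is arbitrary:

import cv2
import numpy as np

mod = 20  # pixels per module
pdp_img = np.zeros((7 * mod, 7 * mod), dtype=np.uint8)  # 7x7 black square
pdp_img[mod:-mod, mod:-mod] = 255                       # 5x5 white ring
pdp_img[2 * mod:-2 * mod, 2 * mod:-2 * mod] = 0         # 3x3 black center
cv2.imwrite('pdp.png', pdp_img)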

Figure 1. The Position Detection Pattern seen in a typical QR Code
Figure 2. The data record sheet with alignment marks added; part of the private information is masked

With the PDPs in place, we can locate the data region simply by searching the image for contours nested three levels deep, and then apply a Perspective Transformation to the photo. The Timing Pattern in turn lets us solve for the exact position of each cell and provides some resistance to image distortion.

Photo Processing

We preprocess the photo with OpenCV in a few simple steps:

PDP Detection

PDP detection starts with denoising, followed by Canny edge detection and contour extraction; we then count each contour's nesting depth, keep the contours whose depth is right, sort those candidates by area, and take the best ones as the PDPs.

Perspective Transformation

Since I did not use a scanner, and my shooting posture is poor, the score sheet in a photo is never perfectly face-on; it is usually taken from a slightly skewed angle. So the viewpoint must first be corrected, which is formally a Perspective Transformation. By design the four PDPs form a rectangle under orthographic projection, so from the four PDP positions found in the previous step, the four-point method recovers the fronto-parallel image.

Image Distortion Correction

Because a phone does not photograph an A4 sheet from infinitely far away, the image necessarily contains projective distortion, and badly so with a wide-angle lens, so we cannot assume that all cells in the photo have equal width and height. Instead, we use the Timing Pattern to determine each cell's true size and position. The algorithm is simple: binarize the image and find the discontinuities.

Content Recognition

Finally, we run object detection on the content of each cell in turn. A simple deep neural network easily recognizes how many plus and minus signs each cell contains. To handle other symbols, such as letters or digits, just add them to the training set.

Data Output

Once each cell's content is recognized, the final results can be computed with NumPy and exported as a CSV for typesetting and printing in other software. Pandas and plenty of other tools also work for the output step; I won't enumerate them all.
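
As a rough illustration of this step, going from a scoreboard of plus/minus counts to a CSV of weighted grades might look like the sketch below; the weights, the score scale, and the linear sigma shift standing in for the normal-distribution fit are all made-up values:

import numpy as np

counts = np.random.randint(-2, 3, size=(10, 40))  # 10 criteria x 40 students
weights = np.full(10, 0.1)    # assumed equal weights summing to 1
mean, sigma = 85.0, 3.0       # assumed per-criterion score scale

scores = mean + sigma * counts   # each +/- shifts the item score by one sigma
final = weights @ scores         # weighted sum -> one grade per student
np.savetxt('grades.csv', final[None, :], delimiter=',', fmt='%.1f')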

Implementation

First we import the libraries we need and define a convenience function for displaying OpenCV images inside the Jupyter notebook, so that intermediate results can be checked as we go and mistakes caught early.

import cv2
import matplotlib.pyplot as plt
import numpy as np
import sys
%matplotlib inline

def my_show_img(img, code=cv2.COLOR_BGR2RGB):
    cv_rgb = cv2.cvtColor(img, code)
    fig, ax = plt.subplots(figsize=(16, 10))
    ax.imshow(cv_rgb)
    fig.show()

PDP Detection

Read in the photo and compute its grayscale version:

img = cv2.imread("grade2.jpg")
print("Img Size: ", img.shape)
img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

To find the contours in the image correctly, we first have to remove the photo's noise. A Gaussian blur or a bilateral filter is the usual tool; adjust the parameters to your own phone camera's noise level.

# If strange contours are detected, try increasing the Gaussian
#   kernel size to filter more noise
# img_gb = cv2.GaussianBlur(img_gray, (5, 5), 0)
# img_gb = cv2.GaussianBlur(img_gray, (9, 9), 0)

# If the image is very noisy, apply the bilateral filter instead
#   so that the PDPs stand out more clearly
# Apply bilateral filter with d = 15,
# sigmaColor = sigmaSpace = 75.
img_gb = cv2.bilateralFilter(img_gray, 15, 75, 75)

Next, we find the edges with Canny and look for the contours that match the PDP signature:

edges = cv2.Canny(img_gb, 100, 200)

# OpenCV 4 returns (contours, hierarchy); on OpenCV 3 unpack three values
contours, hierarchy = cv2.findContours(edges, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)

# hierarchy[i] = [next, previous, first_child, parent]; walk down the
# first_child links to measure how deeply each contour is nested
hierarchy = hierarchy[0]
found = []
for i in range(len(contours)):
    k = i
    c = 0
    while hierarchy[k][2] != -1:
        k = hierarchy[k][2]
        c = c + 1
    # Canny turns every printed ring into two contours (inner and outer
    # edge), so the nested squares of a PDP give a depth of at least 5
    if c >= 5:
        found.append(i)

Sometimes a Chinese character or similar glyph also exceeds the nesting threshold and is mistaken for a PDP, so we simply take the four candidates with the largest area as the true PDPs (by our design, obviously no character can be larger than a PDP) and compute their center positions.

# I only want the largest 4 ones; the axis-aligned bounding-box area is
# accurate enough for ranking the candidates
def area_4_point_rectangle(rect):
    xs = [i[0] for i in rect]
    ys = [i[1] for i in rect]
    area = (max(xs) - min(xs)) * (max(ys) - min(ys))
    return area

areas = []
for i in found:
    rect = cv2.minAreaRect(contours[i])
    box = cv2.boxPoints(rect)
#     print(box)
    areas.append(area_4_point_rectangle(box))

found = [x for _,x in sorted(zip(areas, found))]

found = found[-4:]

# Find box centers

pdp = []
for i in found:
    rect = cv2.minAreaRect(contours[i])
    box = cv2.boxPoints(rect)
    centerx = (box[0][0] + box[2][0])/2
    centerx = int(centerx)
    centery = (box[0][1] + box[2][1])/2 
    centery = int(centery)
    center = (centerx, centery)
    pdp.append(center)

As an intermediate sanity check, we can draw the result to verify that the four PDPs were found correctly.

print(pdp)
draw_img = img.copy()
for i in pdp:
    draw_img = cv2.circle(draw_img, i ,10, (0,255,0),2)
my_show_img(draw_img)

As Figure 3 shows, we successfully located all four PDPs in the photo.

Figure 3. All four PDPs detected; the photo was denoised with a bilateral filter

Next, we apply the Perspective Transformation to the image. Define the four-point transform functions:

def order_points(pts):
    # initialize a list of coordinates that will be ordered
    # such that the first entry in the list is the top-left,
    # the second entry is the top-right, the third is the
    # bottom-right, and the fourth is the bottom-left
    rect = np.zeros((4, 2), dtype = "float32")
    # the top-left point will have the smallest sum, whereas
    # the bottom-right point will have the largest sum
    s = pts.sum(axis = 1)
    rect[0] = pts[np.argmin(s)]
    rect[2] = pts[np.argmax(s)]
    # now, compute the difference between the points, the
    # top-right point will have the smallest difference,
    # whereas the bottom-left will have the largest difference
    diff = np.diff(pts, axis = 1)
    rect[1] = pts[np.argmin(diff)]
    rect[3] = pts[np.argmax(diff)]
    # return the ordered coordinates
    return rect

def four_point_transform(image, pts):
    # obtain a consistent order of the points and unpack them
    # individually
    rect = order_points(pts)
    (tl, tr, br, bl) = rect
    # compute the width of the new image, which will be the
    # maximum distance between bottom-right and bottom-left
    # x-coordinates or the top-right and top-left x-coordinates
    widthA = np.sqrt(((br[0] - bl[0]) ** 2) + ((br[1] - bl[1]) ** 2))
    widthB = np.sqrt(((tr[0] - tl[0]) ** 2) + ((tr[1] - tl[1]) ** 2))
    maxWidth = max(int(widthA), int(widthB))
    # compute the height of the new image, which will be the
    # maximum distance between the top-right and bottom-right
    # y-coordinates or the top-left and bottom-left y-coordinates
    heightA = np.sqrt(((tr[0] - br[0]) ** 2) + ((tr[1] - br[1]) ** 2))
    heightB = np.sqrt(((tl[0] - bl[0]) ** 2) + ((tl[1] - bl[1]) ** 2))
    maxHeight = max(int(heightA), int(heightB))
    # now that we have the dimensions of the new image, construct
    # the set of destination points to obtain a "birds eye view",
    # (i.e. top-down view) of the image, again specifying points
    # in the top-left, top-right, bottom-right, and bottom-left
    # order
    dst = np.array([
        [0, 0],
        [maxWidth - 1, 0],
        [maxWidth - 1, maxHeight - 1],
        [0, maxHeight - 1]], dtype = "float32")
    # compute the perspective transform matrix and then apply it
    M = cv2.getPerspectiveTransform(rect, dst)
    warped = cv2.warpPerspective(image, M, (maxWidth, maxHeight))
    # return the warped image
    return warped

Then we transform the image, denoise it, and binarize it; this gives a first preview of whether the patterns all survived intact. The result is shown in Figure 4.

pts = np.array(pdp)
img_warped = four_point_transform(img, pts)
img_blur = cv2.bilateralFilter(img_warped, 15, 75, 75)
# Thresholding the BGR image channel-wise is fine here; we only ever
# read a single channel of img_bi later
th, img_bi = cv2.threshold(img_blur, 100, 255, cv2.THRESH_BINARY)
my_show_img(img_bi)

Figure 4. The data sheet after the perspective transformation

Next, all that remains is to locate each cell with the Timing Pattern.

# Detect the timing pattern

def find_edges(line):
    # Run-length scan along one row/column of the binary image:
    # record the width of every run of identical pixels...
    offset = []
    current_color = line[0]
    count = 1
    for i in line:
        if i == current_color:
            count += 1
        else:
            offset.append(count + 1)
            current_color = i
            count = 0
    offset.append(count + 1)

    # ...then accumulate the run widths into the absolute positions of
    # the black/white transitions
    pos = []
    pos_sum = 0
    for i in offset:
        pos_sum += i
        pos.append(pos_sum)

    return pos

# Pixels from the 1st row
line = [i[0] for i in img_bi[0]]
# Discard the first 3 and the last 4 transitions: they belong to the
# PDPs, not to our timing pattern; alternating transitions then give
# cell borders (xs/ys) and cell centers (xc/yc)
pos = find_edges(line)
xs1 = pos[3:-4:2]
xc1 = pos[4:-4:2]

# Pixels from the last row
line = [i[0] for i in img_bi[-1]]
pos = find_edges(line)
xs2 = pos[3:-4:2]
xc2 = pos[4:-4:2]

# Pixels from the 1st column
line = [i[0][0] for i in img_bi]
pos = find_edges(line)
ys1 = pos[3:-4:2]
yc1 = pos[4:-4:2]

# Pixels from the last column
line = [i[-1][0] for i in img_bi]
pos = find_edges(line)
ys2 = pos[3:-4:2]
yc2 = pos[4:-4:2]

# Preview rows and columns
draw_img = img_warped.copy()
shape = draw_img.shape
for i in range(len(xs1)):
    draw_img = cv2.line(draw_img, (xs1[i], 0), (xs2[i], shape[0]), (0, 255, 0), 2)

for i in range(len(ys1)):
    draw_img = cv2.line(draw_img, (0, ys1[i]), (shape[1], ys2[i]), (0, 255, 0), 2)
    
for i in range(len(xc1)):
    draw_img = cv2.line(draw_img, (xc1[i], 0), (xc2[i], shape[0]), (255, 0, 0), 2)

for i in range(len(yc1)):
    draw_img = cv2.line(draw_img, (0, yc1[i]), (shape[1], yc2[i]), (255, 0, 0), 2)

my_show_img(draw_img)

Figure 5 shows the cell positions and borders that we found.

Figure 5. Cell positions and borders located using the Timing Pattern

Finally, we extract each cell and detect its content with a deep neural network. Since I record the data with a red pen, I added an extra preprocessing step that filters out the red signal to improve recognition accuracy. If you write with a pen of a different color, just change the mask values accordingly; you can also skip this step entirely.

The code below also relies on pytorch-YOLOv4: download it from GitHub and put its tool folder in the same directory, or nothing will run. Train the weights briefly on your own handwriting. My training set was 100 images cut from a single sheet, each the size of one table cell (crop along the cell borders found in the previous step, save the crops as the dataset, then spend a few minutes annotating them), and training on a GPU took about as long as a lunch break.
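
The cropping step itself can be as simple as the following sketch, a hypothetical dataset builder that reuses img_warped and the xs1/ys1 grid positions from the notebook above and ignores residual distortion for simplicity:

import os

os.makedirs('dataset', exist_ok=True)
for i in range(len(xs1) - 1):
    for j in range(len(ys1) - 1):
        # Cut one cell out of the warped sheet and save it for annotation
        cell = img_warped[ys1[j]:ys1[j + 1], xs1[i]:xs1[i + 1]]
        cv2.imwrite('dataset/cell_%02d_%02d.png' % (i, j), cell)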

def two_point_line(p1, p2):
    # Line through two points, returned as (A, B, C) with A*x + B*y = C
    A = (p1[1] - p2[1])
    B = (p2[0] - p1[0])
    C = (p1[0]*p2[1] - p2[0]*p1[1])
    return A, B, -C

def line_intersection(L1, L2):
    # Solve the 2x2 linear system by Cramer's rule; returns False for
    # parallel lines
    D  = L1[0] * L2[1] - L1[1] * L2[0]
    Dx = L1[2] * L2[1] - L1[1] * L2[2]
    Dy = L1[0] * L2[2] - L1[2] * L2[0]
    if D != 0:
        x = Dx / D
        y = Dy / D
        return x, y
    else:
        return False
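
A quick sanity check of these two helpers: the line through (0, 0) and (2, 2) meets the line through (0, 2) and (2, 0) at (1, 1).

L1 = two_point_line((0, 0), (2, 2))
L2 = two_point_line((0, 2), (2, 0))
print(line_intersection(L1, L2))  # -> (1.0, 1.0)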

# The following code extracts "red" color parts from IMG

img_hsv = cv2.cvtColor(img_warped, cv2.COLOR_BGR2HSV)

# lower mask (0-10)
lower_red = np.array([0,50,50])
upper_red = np.array([10,255,255])
mask0 = cv2.inRange(img_hsv, lower_red, upper_red)

# upper mask (170-180)
lower_red = np.array([170,50,50])
upper_red = np.array([180,255,255])
mask1 = cv2.inRange(img_hsv, lower_red, upper_red)

# join masks
mask = mask0 + mask1

# Filtered HSV image
output_hsv = img_hsv.copy()
output_hsv[np.where(mask==0)] = 0

# Note: despite its name, output_hsv holds a BGR image from here on
output_hsv = cv2.cvtColor(output_hsv, cv2.COLOR_HSV2BGR)

my_show_img(output_hsv)

#save image
cv2.imwrite('cv2-red.png', output_hsv) 

'''
Use YOLOv4 to identify and count the + and - symbols in each cell image
'''

# numpy and cv2 were already imported above; here we only need the
# pytorch-YOLOv4 helpers (the tool folder from its GitHub repository)
from tool.utils import *
from tool.torch_utils import *
from tool.darknet2pytorch import Darknet

"""hyper parameters"""
use_cuda = True

namesfile = './model/autograde.names'
cfgfile = './model/yolov4-autograde.cfg'
weightfile = './model/yolov4-autograde_final.weights'

class_names = load_class_names(namesfile)

m = Darknet(cfgfile)
m.print_network()
m.load_weights(weightfile)
print('Loading weights from %s... Done!' % (weightfile))
if use_cuda:
    m.cuda()
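
The detection itself is wrapped in a small helper, count_symbol. Here is a minimal sketch of such a helper, assuming it wraps do_detect from pytorch-YOLOv4 the way that repo's demo.py does; the 0.4 confidence and 0.6 NMS thresholds are assumptions:

def count_symbol(crop_img):
    # The network expects an RGB image at its configured input size
    sized = cv2.resize(crop_img, (m.width, m.height))
    sized = cv2.cvtColor(sized, cv2.COLOR_BGR2RGB)
    # do_detect returns one list of boxes per input image; each box is
    # [x1, y1, x2, y2, obj_conf, cls_conf, cls_id]
    return do_detect(m, sized, 0.4, 0.6, use_cuda)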

scoreboard = np.zeros((len(xs1), len(ys1)))

for i in range(len(xs1)):
    for j in range(len(ys1)):
        # Intersect the i-th column line with the j-th row line to get
        # the cell's top-left corner...
        l1 = two_point_line((xs1[i], 0), (xs2[i], shape[0]))
        l2 = two_point_line((0, ys1[j]), (shape[1], ys2[j]))
        x0, y0 = (int(v) for v in line_intersection(l1, l2))
        # ...and the center lines to get the cell's center, from which
        # its width and height follow
        l1 = two_point_line((xc1[i], 0), (xc2[i], shape[0]))
        l2 = two_point_line((0, yc1[j]), (shape[1], yc2[j]))
        newcenter = line_intersection(l1, l2)
        w = int(2 * (newcenter[0] - x0))
        h = int(2 * (newcenter[1] - y0))

        crop_img = output_hsv[y0:y0 + h, x0:x0 + w]
        boxes = count_symbol(crop_img)
        for box in boxes[0]:
            cls_id = box[6]
            if cls_id == 0:    # class 0: plus sign
                scoreboard[i][j] += 1
            elif cls_id == 1:  # class 1: minus sign
                scoreboard[i][j] -= 1

print(scoreboard)

Run the whole notebook; the output is:

layer     filters    size              input                output
    0 conv     32  3 x 3 / 1   608 x 608 x   3   ->   608 x 608 x  32
    1 conv     64  3 x 3 / 2   608 x 608 x  32   ->   304 x 304 x  64
    2 conv     64  1 x 1 / 1   304 x 304 x  64   ->   304 x 304 x  64
    3 route  1
......
  156 conv    512  1 x 1 / 1    19 x  19 x1024   ->    19 x  19 x 512
  157 conv   1024  3 x 3 / 1    19 x  19 x 512   ->    19 x  19 x1024
  158 conv    512  1 x 1 / 1    19 x  19 x1024   ->    19 x  19 x 512
  159 conv   1024  3 x 3 / 1    19 x  19 x 512   ->    19 x  19 x1024
  160 conv     21  1 x 1 / 1    19 x  19 x1024   ->    19 x  19 x  21
  161 detection
Loading weights from ./model/yolov4-autograde_final.weights... Done!

0 0.5021945238113403 0.4358288049697876 0.2075052261352539 0.6526204347610474
0 0.7201794385910034 0.5339558720588684 0.17688947916030884 0.6279205083847046
0 0.5135188698768616 0.4898378252983093 0.20927563309669495 0.74901282787323
0 0.4662136435508728 0.5425143837928772 0.2105109691619873 0.8430360555648804
......
1 0.5288676619529724 0.4923632740974426 0.18896225094795227 0.14317214488983154
1 0.5295970439910889 0.5405644774436951 0.28285953402519226 0.16715660691261292
0 0.3737491965293884 0.5101150274276733 0.17290887236595154 0.40456604957580566
0 0.4845519959926605 0.5157043933868408 0.23527127504348755 0.6553317308425903

[[ 1.  2.  1.  2.  2.  2.  2.  2.  2.  2.  2.  1.  1.  1.  0.  1.  2.  1.  1.]
 [ 0.  2.  2.  1.  2.  0.  2.  0.  0.  1.  2.  1.  1.  0. -1.  2.  2.  1.  2.]
 [ 0.  0.  0.  0.  2.  0.  2.  0.  0.  2.  0.  2.  1.  2.  0.  0.  0.  0.  0.]
 [ 2.  1.  0.  0.  2.  0.  2.  0.  1.  2.  0.  0.  1.  1.  0.  0.  0.  1.  0.]
 [ 1.  0.  0.  0.  2.  0.  2.  1.  0.  2.  0.  2.  0.  2.  0. -1.  0.  0.  0.]
 [ 1.  0.  1.  0.  0.  0.  0.  0.  0.  0.  0.  1.  0.  0.  0.  1.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
 [ 1.  0.  1.  0.  0.  1.  0.  0.  0.  1.  0.  1.  0.  0.  0.  0.  1.  0.  0.]
 [ 2.  2.  2.  1.  1.  1.  2.  1.  1.  1.  1.  0.  1.  1.  0.  1.  2.  1.  2.]
 [ 1.  0.  2.  0.  0.  0.  1.  0.  0.  0. -1. -1.  0.  0.  0.  0.  1.  0.  1.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]]

With the scanned data in hand, we can generate the final grade report from a template, as shown in Figure 6. If you would rather not process the data in Python, you can also save it as a CSV and continue in other software. Note that the image's x-y axes are swapped relative to the usual table orientation, so transpose the matrix before writing it out.

save = np.transpose(scoreboard)
np.savetxt('scanned.csv', save, delimiter=',', fmt='%f')

Figure 6. The generated grade report, ready to print on a single A4 page. It contains the raw recorded data, the computed per-item scores, and various statistics, together with a histogram for checking the score distribution.
