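An Attempt at Recognizing Chinese Text in Natural Scenes

I want to build an exam-cheating system [facepalm]...

Thinking about cheating again

Lately I've wanted to put together a small program that chains: image recognition of a Chinese "exam paper" in a natural scene -> natural language processing -> question-bank lookup -> return the result. The current plan is to capture images with the Raspberry Pi camera and show the result on a flexible e-ink display.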

1 Connectionist Text Proposal Network

The CTPN algorithm is said to hold up well even on long lines of text, so let's give it a try first.

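1.1 Virtual environment

I first wanted to use pipenv, but it hung while locking packages during the Cython install. A quick search suggested that the network and dependency resolution were to blame, so I used venv, which ships with Python, to manage the virtual environment instead.

The main thing the environment needs is tensorflow, so create a virtual environment called tf and activate it: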

python3 -m venv tf
cd tf
source bin/activate
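A quick way to confirm that the environment really is active is to ask Python itself. This is only a sketch: in_virtualenv is a name I made up for illustration, and it relies on the standard venv behavior that sys.prefix differs from sys.base_prefix inside an environment.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the system installation.
    base = getattr(sys, "base_prefix", sys.prefix)
    return sys.prefix != base


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("interpreter prefix:", sys.prefix)
```

If this prints False after source bin/activate, the shell is probably still picking up the system python3.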

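The package dependencies are Cython, numpy and opencv. Before installing numpy, remember to run sudo apt install libatlas-base-dev, otherwise you get an error that libf77blas.so.3 cannot be found.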

pip install numpy
pip install tensorflow
proxychains4 git clone https://github.com/eragonruan/text-detection-ctpn.git

Running import tensorflow as a test prints a pile of FutureWarning messages.
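Those messages show up while the module is being imported, so if they get in the way they can be filtered for the duration of that one import. A minimal sketch, assuming it is acceptable to hide FutureWarning only while the module loads; import_quietly is a hypothetical helper, and json stands in for tensorflow so that the snippet runs anywhere:

```python
import importlib
import warnings


def import_quietly(module_name: str):
    """Import a module while hiding FutureWarning messages raised during import."""
    with warnings.catch_warnings():
        # Only FutureWarning is silenced; other warning categories still show.
        warnings.simplefilter("ignore", category=FutureWarning)
        return importlib.import_module(module_name)


if __name__ == "__main__":
    # "json" is a stand-in; on the Pi this would be "tensorflow".
    module = import_quietly("json")
    print("imported:", module.__name__)
```

The filter is restored when the with block ends, so warnings raised later by other code are still visible.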

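1.2 Installing OpenCV

Installing OpenCV is a hassle, so it gets its own section.

1.2.1 Preparation

  1. Enable the camera: sudo raspi-config -> Enable camera
  2. Expand the filesystem to fill the SD card, just in case: sudo raspi-config -> Advanced Options -> Expand filesystem
  3. Reboot: sudo reboot -h now, then check the free space with df -h
  4. Optional: remove programs you rarely use: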
    sudo apt-get purge wolfram-engine
    sudo apt-get purge libreoffice*
    sudo apt-get clean
    sudo apt autoremove
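The free-space check from step 3 can also be done from Python, which is handy later when scripts need to decide whether there is room for a multi-gigabyte download. This is a small sketch using only the standard library; free_gib is a made-up helper name.

```python
import shutil


def free_gib(path: str = "/") -> float:
    """Return the free space on the filesystem holding `path`, in GiB."""
    usage = shutil.disk_usage(path)
    return usage.free / 2**30


if __name__ == "__main__":
    print(f"free space on /: {free_gib('/'):.1f} GiB")
```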

1.2.2 Installing the OpenCV dependencies

Update the system:

time sudo apt update && time sudo apt upgrade && time sudo apt dist-upgrade

Install cmake and other development tools:

sudo apt install -y build-essential cmake unzip pkg-config

Install the image and video libraries:

sudo apt install -y libjpeg-dev libpng-dev libtiff-dev
sudo apt install -y libavcodec-dev libavformat-dev libswscale-dev libv4l-dev
sudo apt install -y libxvidcore-dev libx264-dev

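Optional: install the GTK GUI library, plus a library that cuts down on GTK warnings [facepalm] (the * in the name matches the ARM builds of the library):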

sudo apt install -y libgtk-3-dev
sudo apt install -y libcanberra-gtk*

Install the numerical optimization libraries for OpenCV:

sudo apt install -y libatlas-base-dev gfortran

Finally, install the Python header files:

sudo apt-get install python3-dev

1.2.3 Downloading OpenCV

Download opencv together with the extra modules in opencv_contrib; these extra modules and functions are likely to be needed often:

#proxychains4 wget -O opencv_4.1.1.zip https://github.com/opencv/opencv/archive/4.1.1.zip
#proxychains4 wget -O opencv_contrib_4.1.1.zip https://github.com/opencv/opencv_contrib/archive/4.1.1.zip
proxychains4 wget -O opencv_4.1.0.zip https://github.com/opencv/opencv/archive/4.1.0.zip
proxychains4 wget -O opencv_contrib_4.1.0.zip https://github.com/opencv/opencv_contrib/archive/4.1.0.zip
#proxychains4 wget -O opencv_4.0.1.zip https://github.com/opencv/opencv/archive/4.0.1.zip
#proxychains4 wget -O opencv_contrib_4.0.1.zip https://github.com/opencv/opencv_contrib/archive/4.0.1.zip
#proxychains4 wget -O opencv_4.0.0.zip https://github.com/opencv/opencv/archive/4.0.0.zip
#proxychains4 wget -O opencv_contrib_4.0.0.zip https://github.com/opencv/opencv_contrib/archive/4.0.0.zip

Unzip:

unzip opencv_4.1.0.zip
unzip opencv_contrib_4.1.0.zip

Note: I tried building 4.1.1, but at 49% the build always failed with:

[ 49%] Linking CXX shared library ../../lib/libopencv_imgproc.so
[ 49%] Built target opencv_imgproc
make: *** [Makefile:163: all] Error 2

Switching to 4.1.0 fixed it; the build takes several hours.

1.2.4 CMake and compilation

Use cmake to configure the build, then make to compile it. This step takes a very long time. First create a build directory:

cd opencv-4.1.0
mkdir build
cd build

Run cmake, where:

  • OPENCV_ENABLE_NONFREE=ON lets us use patented algorithms such as SIFT/SURF in OpenCV 4.
  • OPENCV_GENERATE_PKGCONFIG=ON generates opencv4.pc, which the darknet build needs later.
  • Note: make sure OPENCV_EXTRA_MODULES_PATH is correct, otherwise you get errors like sys/videoio.h: No such file or directory.
    cmake -D CMAKE_BUILD_TYPE=RELEASE \
    -D CMAKE_INSTALL_PREFIX=/usr/local \
    -D OPENCV_EXTRA_MODULES_PATH=~/py37/opcv/opencv_contrib-4.1.0/modules \
    -D ENABLE_NEON=ON \
    -D ENABLE_VFPV3=ON \
    -D BUILD_TESTS=OFF \
    -D OPENCV_ENABLE_NONFREE=ON \
    -D OPENCV_GENERATE_PKGCONFIG=ON \
    -D INSTALL_PYTHON_EXAMPLES=OFF \
    -D BUILD_EXAMPLES=OFF ..
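Since a wrong OPENCV_EXTRA_MODULES_PATH only shows up as a confusing compile error hours later, it may be worth checking the directory before running cmake. The following is a sketch under my own assumptions: contrib_modules_ok is a hypothetical helper, and "contains at least one sub-directory" is just a cheap heuristic for "looks like the opencv_contrib modules folder".

```python
import os


def contrib_modules_ok(path: str) -> bool:
    """Heuristic check that `path` looks like an opencv_contrib modules directory."""
    expanded = os.path.expanduser(path)  # cmake sees the shell-expanded path
    if not os.path.isdir(expanded):
        return False
    # A contrib checkout keeps one sub-directory per module under modules/.
    return any(
        os.path.isdir(os.path.join(expanded, name))
        for name in os.listdir(expanded)
    )


if __name__ == "__main__":
    candidate = "~/py37/opcv/opencv_contrib-4.1.0/modules"
    print(candidate, "->", contrib_modules_ok(candidate))
```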

Next, increase the Raspberry Pi's swap space so that all four cores can be used; otherwise the build may hang when memory runs out. Open /etc/dphys-swapfile:

sudo vim /etc/dphys-swapfile

Change CONF_SWAPSIZE to raise the swap size from 100 MB to 2048 MB:

# set size to absolute value, leaving empty (default) then uses computed value
# you most likely don't want this, unless you have an special disk situation
# CONF_SWAPSIZE=100
CONF_SWAPSIZE=2048
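Because the swap size has to be changed now and changed back after the build, a tiny parser makes it easy to see which value is currently active. This is a sketch, not part of dphys-swapfile itself: read_swapsize is a made-up helper, and it only understands plain CONF_SWAPSIZE=<number> lines like the ones shown above.

```python
import re
from typing import Optional


def read_swapsize(config_text: str) -> Optional[int]:
    """Return the active CONF_SWAPSIZE value in MB, or None if it is unset."""
    value = None
    for line in config_text.splitlines():
        # Commented-out lines such as "# CONF_SWAPSIZE=100" do not match.
        match = re.match(r"^\s*CONF_SWAPSIZE=(\d+)\s*$", line)
        if match:
            value = int(match.group(1))  # the last assignment wins
    return value


sample = """\
# CONF_SWAPSIZE=100
CONF_SWAPSIZE=2048
"""
print(read_swapsize(sample))  # prints 2048
```

On the Pi the text would come from open("/etc/dphys-swapfile").read().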

Restart the swap service:

sudo /etc/init.d/dphys-swapfile stop
sudo /etc/init.d/dphys-swapfile start

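Note: a larger swap can wear out the SD card, because flash storage only survives a limited number of write cycles. Here we enlarge it only for the duration of the build.

Compile. -j4 tells make to use four cores. This is the slowest step and took about 4-5 hours: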

make -j4

Install:

sudo make install
sudo ldconfig

Restore the swap size. Open /etc/dphys-swapfile:

sudo vim /etc/dphys-swapfile

Set CONF_SWAPSIZE=100 again and restart the swap service:

sudo /etc/init.d/dphys-swapfile stop
sudo /etc/init.d/dphys-swapfile start

1.2.5 Linking OpenCV 4 into the Python virtual environment

Create a symlink to OpenCV inside our tf virtual environment:

ln -s /usr/local/lib/python3.7/site-packages/cv2/python-3.7/cv2.cpython-37m-arm-linux-gnueabihf.so /home/pi/cn_dect/tf/lib/python3.7/site-packages/cv2.so

Both the source and the destination of the symlink must be exactly right. The source path appears in the output of the sudo make install command:

-- Installing: /usr/local/lib/python3.7/site-packages/cv2/__init__.py
-- Installing: /usr/local/lib/python3.7/site-packages/cv2/load_config_py2.py
-- Installing: /usr/local/lib/python3.7/site-packages/cv2/load_config_py3.py
-- Installing: /usr/local/lib/python3.7/site-packages/cv2/config.py
-- Installing: /usr/local/lib/python3.7/site-packages/cv2/python-3.7/cv2.cpython-37m-arm-linux-gnueabihf.so
-- Set runtime path of "/usr/local/lib/python3.7/site-packages/cv2/python-3.7/cv2.cpython-37m-arm-linux-gnueabihf.so" to "/usr/local/lib"
-- Installing: /usr/local/lib/python3.7/site-packages/cv2/config-3.7.py
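Rather than copying the long .so name out of the install log by hand, the binding can be located with a glob and linked from Python. This is a sketch under assumptions: find_cv2_binding and link_into_env are names I made up, and the search simply takes the first cv2*.so it finds below the given directory.

```python
import glob
import os
from typing import Optional


def find_cv2_binding(install_root: str) -> Optional[str]:
    """Return the first cv2*.so found below `install_root`, or None."""
    pattern = os.path.join(install_root, "**", "cv2*.so")
    matches = sorted(glob.glob(pattern, recursive=True))
    return matches[0] if matches else None


def link_into_env(source: str, site_packages: str) -> str:
    """Create site_packages/cv2.so as a symlink to `source`; return the link path."""
    if not os.path.isfile(source):
        raise FileNotFoundError(source)
    target = os.path.join(site_packages, "cv2.so")
    if os.path.lexists(target):
        os.remove(target)  # replace a stale or broken link
    os.symlink(source, target)
    return target
```

With the paths from this post, the calls would be find_cv2_binding("/usr/local/lib/python3.7/site-packages") followed by link_into_env(found, "/home/pi/cn_dect/tf/lib/python3.7/site-packages").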

1.2.6 Testing OpenCV

Test OpenCV inside the virtual environment:

Python 3.7.3 (default, Apr  3 2019, 05:39:12)
[GCC 8.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import cv2
>>> cv2.__version__
'4.1.0'
>>>

Success.

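1.3 Testing text-detection-ctpn

Annoyingly, the pretrained CTPN model is large, and TensorFlow throws an error as soon as it loads it, so I had to set swap back to 2048 [facepalm]. I wonder how long the SD card will survive this...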

python ./main/demo.py

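CTPN detection works but is rather slow: the first image took 35 seconds on the Raspberry Pi. Next I'll try chineseocr, an end-to-end project that bundles several components.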

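2 OCR

From a quick look at the chineseocr project, it seems to use yolo3/darknet for text detection and then a PyTorch-based CRNN to recognize the detected text.

2.1 Virtual environment

Reuse the same virtual environment: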

cd ~/cn_dect/tf
source ./bin/activate

Install the dependencies inside the virtual environment:

pip3 install scipy numpy easydict Cython h5py lmdb mahotas pandas requests bs4 matplotlib lxml pillow web.py keras opencv-contrib-python==4.1.0.25

2.2 Building and configuring darknet

Download the project and darknet:

proxychains4 git clone https://github.com/chineseocr/chineseocr.git
proxychains4 git clone https://github.com/pjreddie/darknet.git
mv darknet chineseocr/

Edit darknet/Makefile:

GPU=0
CUDNN=0
OPENCV=1
OPENMP=1

Running make directly complains that opencv.pc cannot be found in PKG_CONFIG_PATH, so help it along manually:

sudo cp /usr/local/lib/pkgconfig/opencv4.pc /usr/local/lib/pkgconfig/opencv.pc
export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig
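Before kicking off another long make, it can help to confirm that pkg-config now resolves the package. Below is a sketch that wraps pkg-config --modversion; pkg_config_version is a hypothetical helper, and it returns None both when pkg-config is not installed and when the package is not found.

```python
import os
import shutil
import subprocess
from typing import Optional


def pkg_config_version(package: str, search_path: Optional[str] = None) -> Optional[str]:
    """Return the version pkg-config reports for `package`, or None if not found."""
    if shutil.which("pkg-config") is None:
        return None  # pkg-config itself is not installed
    env = dict(os.environ)
    if search_path:
        env["PKG_CONFIG_PATH"] = search_path
    result = subprocess.run(
        ["pkg-config", "--modversion", package],
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        universal_newlines=True,
    )
    return result.stdout.strip() if result.returncode == 0 else None


if __name__ == "__main__":
    print("opencv:", pkg_config_version("opencv", "/usr/local/lib/pkgconfig"))
```

A result of None for opencv means darknet's Makefile will hit the same lookup failure again.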

The build then fails with ./src/image_opencv.cpp:12:1: error: ‘IplImage’ does not name a type. This is a compatibility problem caused by the newer OpenCV, and it takes a patch from a pull request to fix:

git fetch origin pull/1348/head:opencv4
git checkout opencv4
make -j4

Change line 48 of darknet/python/darknet.py to the path of the freshly built libdarknet.so:

lib = CDLL("/home/pi/cn_dect/chineseocr/darknet/libdarknet.so", RTLD_GLOBAL)
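A hard-coded path like this fails with a fairly opaque OSError if the library is not where expected. As a sketch, and not what darknet.py itself does, the load can be wrapped so that a missing file is reported clearly; load_shared_library is a made-up name.

```python
import ctypes
import os


def load_shared_library(path: str) -> ctypes.CDLL:
    """Load a shared library, failing early with a clear message if it is missing."""
    full_path = os.path.abspath(os.path.expanduser(path))
    if not os.path.isfile(full_path):
        raise FileNotFoundError(f"shared library not found: {full_path}")
    # RTLD_GLOBAL makes the library's symbols visible to libraries loaded later.
    return ctypes.CDLL(full_path, ctypes.RTLD_GLOBAL)
```

On the Pi the call would be load_shared_library("/home/pi/cn_dect/chineseocr/darknet/libdarknet.so").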

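2.3 Downloading the OCR models

They are hosted on Baidu Netdisk, so I used bnd2 to speed up the download.

Put all the downloaded files in the models folder. (If you use python -m http.server 8000 as a temporary relay, wget -r http://192.168.1.100:8000 downloads every directory and file recursively.)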

2.4 Installing PyTorch

Install the system dependencies:

sudo apt install libopenblas-dev libblas-dev m4 cmake cython python3-dev python3-yaml python3-setuptools libatomic-ops-dev

Download and install:

mkdir pytorch_install && cd pytorch_install
proxychains4 git clone --recursive https://github.com/pytorch/pytorch
cd pytorch

The protobuf library seems to have a bug that makes the build fail a little under halfway through with:

[ 43%] Linking CXX executable ../../../bin/protoc
/usr/bin/ld: ../../../lib/libprotobuf.a(arena.cc.o): in function `google::protobuf::internal::ArenaImpl::Init()':
arena.cc:(.text+0x24): undefined reference to `__atomic_fetch_add_8'

Update protobuf to fix it:

git submodule update --remote third_party/protobuf

Since I had already been through several attempts, I set export BUILD_TEST=0 to skip the tests and shorten the build. Set the environment and build:

export NO_CUDA=1
export NO_DISTRIBUTED=1
export NO_MKLDNN=1
export NO_NNPACK=1
export NO_QNNPACK=1
export BUILD_TEST=0

python3 setup.py build

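Now for the actual install. Make sure the environment variables exported above are still set when you run it (the build takes so long that after a reboot it is easy to forget to set them again):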

python3 setup.py install
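One way to avoid forgetting the exports is to keep the flags in a small script and pass them to both steps explicitly. The sketch below only builds and checks the environment; BUILD_FLAGS mirrors the exports above, while build_env and missing_flags are helper names I made up.

```python
import os
from typing import Dict, List, Mapping, Optional

# The same switches that were exported before "python3 setup.py build".
BUILD_FLAGS = {
    "NO_CUDA": "1",
    "NO_DISTRIBUTED": "1",
    "NO_MKLDNN": "1",
    "NO_NNPACK": "1",
    "NO_QNNPACK": "1",
    "BUILD_TEST": "0",
}


def build_env(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of `base` (default: os.environ) with the build flags applied."""
    env = dict(os.environ if base is None else base)
    env.update(BUILD_FLAGS)
    return env


def missing_flags(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """List the flags that are unset or differ in `env` (default: os.environ)."""
    env = os.environ if env is None else env
    return sorted(name for name, value in BUILD_FLAGS.items() if env.get(name) != value)


if __name__ == "__main__":
    print("flags missing from the current shell:", missing_flags())
```

The install step could then be launched with subprocess.run(["python3", "setup.py", "install"], env=build_env(), check=True), so that it no longer depends on what the current shell happens to have exported.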

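One small gotcha: if you start Python right after the install without changing the working directory, you get the exception No module named 'torch._C', because the build directory itself contains a torch directory. Change to another directory and start Python again.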
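The reason is that an interactive Python puts the current directory at the front of sys.path, so the local torch folder is picked up before the installed package. A small sketch for spotting this situation; shadowed_by_cwd is a hypothetical helper, not something PyTorch provides.

```python
import os
from typing import Optional


def shadowed_by_cwd(package: str, cwd: Optional[str] = None) -> bool:
    """Return True if `cwd` holds a directory or .py file named like `package`."""
    cwd = os.getcwd() if cwd is None else cwd
    as_dir = os.path.isdir(os.path.join(cwd, package))
    as_file = os.path.isfile(os.path.join(cwd, package + ".py"))
    return as_dir or as_file


if __name__ == "__main__":
    if shadowed_by_cwd("torch"):
        print("a local 'torch' would be imported first; cd somewhere else")
    else:
        print("no local 'torch' in", os.getcwd())
```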

2.5 Installing the CTC model

pip install wget
git clone --recursive https://github.com/parlance/ctcdecode.git
cd ctcdecode
pip install .

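Here --recursive also clones the third-party git repos that the project references. The install takes an hour or so, enough time to watch a few episodes of anime.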

2.6 Downloading the language model

It is just under 3 GB; I downloaded it through a proxy:

cd chineseocr/models/
proxychains4 wget https://deepspeech.bj.bcebos.com/zh_lm/zh_giga.no_cna_cmn.prune01244.klm

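2.7 Running

Edit config.py to suit your setup, then run python app.py 8080. The result on the Raspberry Pi 3B+: out of memory [facepalm]. With less than 1 GB of RAM in total, it cannot even load a single CTC model [facepalm].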
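A rough pre-flight check would have predicted this: compare the size of the model file with the RAM and swap that are actually free. The sketch below parses /proc/meminfo-style text; parse_meminfo and fits_in_memory are made-up helpers, the sample numbers are invented for illustration, and file size is only a lower bound on what loading a model really needs.

```python
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into a {field: KiB} dictionary."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def fits_in_memory(file_size_bytes: int, meminfo: Dict[str, int]) -> bool:
    """Rough check: is the file smaller than available RAM plus free swap?"""
    available_kib = meminfo.get("MemAvailable", 0) + meminfo.get("SwapFree", 0)
    return file_size_bytes < available_kib * 1024


# Invented sample values, roughly the scale of a 1 GB board with 100 MB swap.
sample = """\
MemTotal:         948304 kB
MemAvailable:     500000 kB
SwapFree:         102396 kB
"""
info = parse_meminfo(sample)
print(fits_in_memory(3 * 2**30, info))  # a ~3 GB language model: prints False
```

On the Pi, the text would come from open("/proc/meminfo").read() and the size from os.path.getsize on the model file.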

Update to the latest app branch:

git pull origin app

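This surfaced what looks like a bug... The biggest difference between this branch (git revision daabebc93a8b4436a3653f83668429ab99eefaea) and the earlier code is that it builds the yolo3 darknet with keras/tensorflow right at startup. The result is an error at line 285 of text/keras_yolo3.py, boxes = concatenate(boxes, axis=0):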

>>> boxes  = concatenate(boxes, axis=0)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/pi/cn_dect/tf/lib/python3.7/site-packages/tensorflow_core/python/keras/layers/merge.py", line 687, in concatenate
return Concatenate(axis=axis, **kwargs)(inputs)
File "/home/pi/cn_dect/tf/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/base_layer.py", line 630, in __call__
base_layer_utils.create_keras_history(inputs)
File "/home/pi/cn_dect/tf/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/base_layer_utils.py", line 199, in create_keras_history
_, created_layers = _create_keras_history_helper(tensors, set(), [])
File "/home/pi/cn_dect/tf/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/base_layer_utils.py", line 245, in _create_keras_history_helper
layer_inputs, processed_ops, created_layers)
File "/home/pi/cn_dect/tf/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/base_layer_utils.py", line 245, in _create_keras_history_helper
layer_inputs, processed_ops, created_layers)
File "/home/pi/cn_dect/tf/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/base_layer_utils.py", line 245, in _create_keras_history_helper
layer_inputs, processed_ops, created_layers)
[Previous line repeated 2 more times]
File "/home/pi/cn_dect/tf/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/base_layer_utils.py", line 243, in _create_keras_history_helper
constants[i] = backend.function([], op_input)([])
File "/home/pi/cn_dect/tf/lib/python3.7/site-packages/tensorflow_core/python/keras/backend.py", line 3349, in __call__
run_metadata=self.run_metadata)
File "/home/pi/cn_dect/tf/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1450, in __call__
run_metadata_ptr)
tensorflow.python.framework.errors_impl.InvalidArgumentError: You must feed a value for placeholder tensor 'Placeholder_367' with dtype float and shape [2]
[[{{node Placeholder_367}}]]

My current environment is:

Python 3.7.3 (default, Apr  3 2019, 05:39:12)
[GCC 8.2.0] on linux
# TensorFlow version
>>> tf.__version__
'1.13.1'
# keras version
>>> K.__version__
'2.2.5'

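Stepping through it in the REPL, the error about Placeholder_367 appears as soon as concatenate is called. That placeholder is the input_shape initialized on line 23 of text/keras_detect.py. I don't know why: the same code runs without any problem on my own computer. I don't understand it, and I can't be bothered to ask on GitHub [facepalm].

The environments differ in that the computer runs Python 3.6, TensorFlow 1.14.0 and Keras 2.2.4. I plan to flash a fresh SD card with a complete Python 3.6 environment and try again on the Raspberry Pi; after all, the current card still holds my direction-of-arrival localization program (which uses the newer coroutine features of Python 3.7). I didn't use the versions the chineseocr project asks for because installing software on the Raspberry Pi really isn't easy [facepalm].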
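When comparing two machines like this, it helps to dump the versions the same way on both. A minimal sketch, assuming each package exposes a __version__ attribute (tensorflow and keras do); report_versions is a made-up helper and reports None for anything that cannot be imported.

```python
import importlib
import platform
from typing import Dict, Iterable, Optional


def report_versions(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map 'python' and each module name to a version string (None if unavailable)."""
    report: Dict[str, Optional[str]] = {"python": platform.python_version()}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = None  # not installed in this environment
        else:
            report[name] = getattr(module, "__version__", None)
    return report


if __name__ == "__main__":
    for name, version in report_versions(["tensorflow", "keras", "numpy"]).items():
        print(f"{name}: {version}")
```

Running the same script on the Pi and on the desktop gives two short lists that can be compared line by line.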

Debugging code:

import os
import sys

# '~' is not expanded inside sys.path entries, so expand it explicitly
sys.path.append(os.path.expanduser('~/cn_dect/chineseocr'))
#sys.path.append(r'~/PycharmProjects/chineseocr')
from config import *
os.environ["CUDA_VISIBLE_DEVICES"] = ''
scale, maxScale = IMGSIZE[0], 2048
from text.keras_yolo3 import yolo_text, box_layer, K
import tensorflow as tf
import numpy as np

graph = tf.get_default_graph()  # works around web.py-related errors

anchors = [float(x) for x in keras_anchors.split(',')]
anchors = np.array(anchors).reshape(-1, 2)
num_anchors = len(anchors)
num_classes = len(class_names)
K.clear_session()
tf.reset_default_graph()
textModel = yolo_text(num_classes, anchors)
textModel.load_weights(kerasTextModel)
#textModel.load_weights(r'~/PycharmProjects/text.h5')
sess = K.get_session()
image_shape = K.placeholder(shape=(2, ))  # original image size: h, w
input_shape = K.placeholder(shape=(2, ))  # resized image size: h, w


y1, y2, y3 = [*textModel.output]
out = [y1, y2, y3]

num_layers = len(out)
anchor_mask = [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
boxes = []
scores = []
input_shape = K.cast(input_shape, tf.float32)
image_shape = K.cast(image_shape, tf.float32)
#from keras.utils import plot_model
#plot_model(textModel, to_file='model.png')

def yolo_head(feats, anchors, num_classes, input_shape, calc_loss=False):
    """Convert final layer features to bounding box parameters."""
    num_anchors = len(anchors)
    # Reshape to batch, height, width, num_anchors, box_params.
    anchors_tensor = K.reshape(K.constant(anchors), [1, 1, 1, num_anchors, 2])

    grid_shape = K.shape(feats)[1:3]  # height, width
    grid_y = tf.tile(K.reshape(K.arange(0, stop=grid_shape[0]), [-1, 1, 1, 1]),
                     [1, grid_shape[1], 1, 1])
    grid_x = tf.tile(K.reshape(K.arange(0, stop=grid_shape[1]), [1, -1, 1, 1]),
                     [grid_shape[0], 1, 1, 1])
    grid = K.concatenate([grid_x, grid_y])
    grid = K.cast(grid, K.dtype(feats))

    feats = K.reshape(
        feats, [-1, grid_shape[0], grid_shape[1], num_anchors, num_classes + 5])

    # Adjust predictions to each spatial grid point and anchor size.
    box_xy = (K.sigmoid(feats[..., :2]) + grid) / K.cast(grid_shape[::-1], K.dtype(feats))
    box_wh = K.exp(feats[..., 2:4]) * anchors_tensor / K.cast(input_shape[::-1], K.dtype(feats))
    box_confidence = K.sigmoid(feats[..., 4:5])
    box_class_probs = K.sigmoid(feats[..., 5:])

    if calc_loss:
        return grid, feats, box_xy, box_wh
    return box_xy, box_wh, box_confidence, box_class_probs


for lay in range(num_layers):
    box_xy, box_wh, box_confidence, box_class_probs = yolo_head(
        out[lay], anchors[anchor_mask[lay]], num_classes, input_shape)
    #box_xy = (box_xy - offset) * scale
    #box_wh = box_wh*scale

    box_score = box_confidence * box_class_probs
    box_score = K.reshape(box_score, [-1, num_classes])

    box_mins = box_xy - (box_wh / 2.)
    box_maxes = box_xy + (box_wh / 2.)
    box = K.concatenate([
        box_mins[..., 0:1],   # xmin
        box_mins[..., 1:2],   # ymin
        box_maxes[..., 0:1],  # xmax
        box_maxes[..., 1:2]   # ymax
    ], axis=-1)

    box = K.reshape(box, [-1, 4])

    boxes.append(box)

    scores.append(box_score)

concatenate = tf.keras.layers.concatenate
# On the Raspberry Pi this call raises the InvalidArgumentError shown above
boxes = concatenate(boxes, axis=0)

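2.8 Trying again with a different environment

The new buster release ships with Python 3.7, so I planned to create a Python 3.6 environment with pipenv:

  1. First, pipenv cannot create a 3.6 environment when the system only has 3.7, so I ran sudo apt install python3.6 by hand.
  2. That turned out not to be enough: running pipenv --python 3.6 again fails, roughly saying that python3.6 cannot find the distutils module.
  3. People online say the system Python may be broken and that sudo apt install python3-distutils fixes it. I tried it: 3.7 was indeed fixed, but 3.6 still failed.
  4. So I decided to switch between Python versions at the system level: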

    sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.6 1
    sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.7 2
    # choose the version interactively
    sudo update-alternatives --config python3
    # or set it directly
    sudo update-alternatives --set python3 /usr/bin/python3.6

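    I made 3.6 the system default Python, hoping that installing through apt or dpkg would then put the package under 3.6. That turned out to be naive: with the system Python switched to 3.6, apt install python3-distutils still did not help. After installation, dpkg -L python3-distutils shows the library already sitting in /usr/lib/python3.7/distutils, and there is no way to install it for Python 3.6. With no other option, I copied the files listed by dpkg over by force; they are pure Python code anyway, and the minor version shouldn't matter: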

    sudo cp -r /usr/lib/python3.7/distutils/* /usr/lib/python3.6/distutils/
  5. That finally did it. Running pipenv --python 3.6 now complains about environment variables; set them as the message suggests:

    export LC_ALL=C.UTF-8
    export LANG=C.UTF-8

    Running pipenv again finally installed the Python 3.6 environment.

Next come the two heavyweights, TensorFlow and PyTorch, and it turns out piwheels has no prebuilt numpy [facepalm].
