Recognizing Seven-Segment Digits with cnocr

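A Toutiao (今日头条) post, "两款开源的中文OCR工具" ("Two open-source Chinese OCR tools"), introduced two OCR tools. I tested one of them, **CNOCR**, and it looks usable as a tool for future work.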


01 Installing cnocr

cnocr can be installed with pip:

pip install cnocr

A specific version, here V1.1.0, can be installed with:

pip install cnocr==1.1.0
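
To confirm which version actually ended up installed, the standard library can be queried. This is a minimal sketch, assuming Python 3.8 or later (for `importlib.metadata`); the helper name `installed_version` is made up for illustration and is not part of cnocr:

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints the installed cnocr version, or None if cnocr is not installed.
    print(installed_version("cnocr"))
```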


02 Initial experiments

1. Text captured from the screen

▲ A passage of text captured from the screen

  • Recognition time: 1.98 s
  • Recognition result:

[['●', '更', '新', '了', '训', '练', '代', '码', ',', '使', '用', 'm', 'x', 'n', 'e', 't', '的', 'r', 'e', 'c', 'o', 'r', 'd', 'i', 'o', '首', '先', '把', '数', '据', '转', '换', '成', '二', '进', '制', '格', '式', ',', '提', '升', '后', '续', '的'], ['训', '练', '效', '率', '。', '训', '练', '时', '支', '持', '对', '图', '片', '做', '实', '时', '数', '据', '增', '强', '。', '也', '加', '入', '了', '更', '多', '可', '传', '入', '的', '参', '数', '。'], ['●', '允', '许', '训', '练', '集', '中', '的', '文', '字', '数', '量', '不', '同', ',', '目', '前', '是', '中', '文', '1', '0', '个', '字', ',', '英', '文', '2', '0', '个', '字', '母', '。'], ['。', '提', '供', '了', '更', '多', '的', '模', '型', '选', '择', ',', '允', '许', '大', '家', '按', '需', '训', '练', '多', '种', '不', '同', '大', '小', '的', '识', '别', '模', '型', '。'], ['●', ' ', '内', '置', '了', '各', '种', '训', '练', '好', '的', '模', '型', ',', '最', '小', '的', '模', '型', '只', '有', '之', '前', '模', '型', '的', '1', '/', '5', '大', '小', '。', '所', '有', '模', '型', '都', '可', '免', '费'], ['使', '用', '。']]

2. English text captured from the screen

▲ English text captured from the screen

  • Recognition time: 2.376136064529419 s
  • Recognition result:
    [['E', 'r', 'n', 'e', 's', 't', ' ', 'R', 'u', 't', 'h', 'e', 'r', 'f', 'o', 'r', 'd', ',', 'i', 'n', ' ', 'f', 'u', 'l', 'l', ' ', 'E', 'r', 'n', 'e', 's', 't', ' ', 'R', 'u', 't', 'h', 'e', 'r', 'f', 'o', 'r', 'd', ',', 'B', 'a', 'r', 'o', 'n', ' ', 'R', 'u', 't', 'h', 'e', 'r', 'f', 'o', 'r', 'd', ' ', 'o', 'f'], ['N', 'e', 'l', 's', 'o', 'n', ',', 'o', 'f', ' ', 'C', 'a', 'm', 'b', 'r', 'i', 'd', 'g', 'e', ',', '(', 'b', 'o', 'r', 'n', ' ', 'A', 'u', 'g', 's', 't', ' ', '3', '0', ',', '1', '8', '7', '1', ',', 'S', 'p', 'r', 'i', 'n', 'g', ' ', 'G', 'r', 'o', 'v', 'e', ',', ' ', 'N', 'e', 'w', ' ', 'Z', 'e', 'a', 'l', 'a', 'n', 'd', '-'], ['d', 'i', 'e', 'd', ' ', 'O', 'c', 't', 'o', 'b', 'e', 'r', ' ', '1', '9', ',', '1', '9', '3', '7', ',', 'C', 'a', 'm', 'b', 'r', 'i', 'd', 'g', 'e', ',', 'C', 'a', 'm', 'b', 'r', 'i', 'd', 'g', 'e', 's', 'h', 'i', 'r', 'e', ',', 'E', 'n', 'g', 'l', 'a', 'n', 'd', ')', ',', 'N', 'e', 'w', ' ', 'Z', 'e', 'a', 'l', 'a', 'n', 'd', '-'], ['b', 'o', 'r', 'n', ' ', 'B', 'r', 'i', 't', 'i', 's', 'h', ' ', 'p', 'h', 'y', 's', 'i', 'c', 'i', 's', 't', ' ', 'c', 'o', 'n', 's', 'i', 'd', 'e', 'r', 'e', 'd', ' ', 't', 'h', 'e', ' ', 'g', 'r', 'e', 'a', 't', 'e', 's', 't', ' ', 'e', 'x', 'p', 'e', 'r', 'i', 'm', 'e', 'n', 't', 'a', 'l', 'i', 's', 't', ' ', 's', 'i', 'n', 'c', 'e', ' ', 'M', 'i', 'c', 'h', 'a', 'e', 'l'], ['F', 'a', 'r', 'a', 'd', 'a', 'y', ' ', '(', '1', '7', '9', '1', '-', '1', '8', '6', '7', ')', '.', ' ', 'R', 'u', 't', 'h', 'e', 'r', 'f', 'o', 'r', 'd', ' ', 'w', 'a', 's', ' ', 't', 'h', 'e', ' ', 'c', 'e', 'n', 't', 'r', 'a', 'l', ' ', 'f', 'i', 'g', 'u', 'r', 'e', ' ', 'i', 'n', ' ', 't', 'h', 'e', ' ', 's', 't', 'u', 'd', 'y', ' ', 'o', 'f'], ['r', 'a', 'd', 'i', 'o', 'a', 'c', 't', 'i', 'v', 'i', 't', 'y', ',', 'a', 'n', 'd', ' ', 'w', 'i', 't', 'h', ' ', 'h', 'i', 's', ' ', 'c', 'o', 'n', 'c', 'e', 'p', 't', ' ', 'o', 'f', ' ', 't', 'h', 'e', ' ', 'n', 'u', 'c', 'l', 'e', 'a', 'r', ' ', 'a', 't', 'o', 'm', ' ', 'h', 'e', ' ', 'l', 'e', 'd', ' ', 't', 'h', 'e', ' ', 'e', 'x', 'p', 'l', 'o', 'r', 'a', 't', 'i', 'o', 'n', ' ', 'o', 'f'], ['n', 'u', 'c', 'l', 'e', 'a', 'r', ' ', 'p', 'h', 'y', 's', 'i', 'c', 's', '.', 'H', 'e', ' ', 'w', 'o', 'n', ' ', 't', 'h', 'e', ' ', 'N', 'o', 'b', 'e', 'l', ' ', 'P', 'r', 'i', 'z', 'e', ' ', 'f', 'o', 'r', ' ', 'C', 'h', 'e', 'm', 'i', 's', 't', 'r', 'y', ' ', 'i', 'n', ' ', '1', '9', '0', '8', ',', 'w', 'a', 's', ' ', 'p', 'r', 'e', 's', 'i', 'd', 'e', 'n', 't', ' ', 'o', 'f'], ['t', 'h', 'e', ' ', 'R', 'o', 'y', 'a', 'l', ' ', 'S', 'o', 'c', 'i', 'e', 't', 'y', ' ', '(', '1', '9', '2', '5', '-', '3', '0', ')', 'a', 'n', 'd', ' ', 't', 'h', 'e', ' ', 'B', 'r', 'i', 't', 'i', 's', 'h', ' ', 'A', 's', 's', 'o', 'c', 'i', 'a', 't', 'i', 'o', 'n', ' ', 'f', 'o', 'r', ' ', 't', 'h', 'e', ' ', 'A', 'd', 'v', 'a', 'n', 'c', 'e', 'm', 'e', 'n', 't', ' ', 'o', 'f'], ['S', 'c', 'i', 'e', 'n', 'c', 'e', ' ', '(', '1', '9', '2', '3', ')', ',', 'w', 'a', 's', ' ', 'c', 'o', 'n', 'f', 'e', 'r', 'r', 'e', 'd', ' ', 't', 'h', 'e', ' ', 'O', 'r', 'd', 'e', 'r', ' ', 'o', 'f', ' ', 'M', 'e', 'r', 'i', 't', ' ', 'i', 'n', ' ', '1', '9', '2', '5', ',', 'a', 'n', 'd', ' ', 'w', 'a', 's', ' ', 'r', 'a', 'i', 's', 'e', 'd', ' ', 't', 'o', ' ', 't', 'h', 'e'], ['p', 'e', 'e', 'r', 'a', 'g', 'e', ' ', 'a', 's', ' ', 'L', 'o', 'r', 'd', ' ', 'R', 'u', 't', 'h', 'e', 'r', 'f', 'o', 'r', 'd', ' ', 'o', 'f', ' ', 'N', 'e', 'l', 's', 'o', 'n', ' ', 'i', 'n', ' ', '1', '9', '3', '1', '.']]

Ernest Rutherford,in full Ernest Rutherford,Baron Rutherford of
Nelson,of Cambridge,(born Augst 30,1871,Spring Grove, New Zealand-
died October 19,1937,Cambridge,Cambridgeshire,England),New Zealand-
born British physicist considered the greatest experimentalist since Michael
Faraday (1791-1867). Rutherford was the central figure in the study of
radioactivity,and with his concept of the nuclear atom he led the exploration of
nuclear physics.He won the Nobel Prize for Chemistry in 1908,was president of
the Royal Society (1925-30)and the British Association for the Advancement of
Science (1923),was conferred the Order of Merit in 1925,and was raised to the
peerage as Lord Rutherford of Nelson in 1931.
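
The raw result holds one inner list of single characters per text line, and the readable text above is simply each inner list joined together. A minimal sketch of that step; `join_lines` and `sample` are illustrative names, and the list-of-lists shape is taken from the output shown in this post, which other cnocr versions may not share:

```python
def join_lines(result):
    """Join each inner list of characters into one string per text line."""
    return ["".join(chars) for chars in result]


# Shortened sample in the same shape as the recognition output in this post.
sample = [
    ["E", "r", "n", "e", "s", "t", " ", "R", "u", "t", "h", "e", "r", "f", "o", "r", "d"],
    ["N", "e", "l", "s", "o", "n"],
]

for line in join_lines(sample):
    print(line)
```

With the real result, `join_lines(res)` would yield one string per recognized line, which can then be printed or written to a file.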

3. Recognizing a seven-segment display

(1) Seven-segment digits

▲ The seven-segment characters under test

  • Recognition time: 0.922 s
  • Recognition result:
    [['目', '囱', '巳', '曰', '臼'], ['S', '日', '囱', '日', ' ', '臼']]
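
The output suggests the model maps each digit to a look-alike Chinese character such as 日 or 囱. A seven-segment digit is really a fixed pattern of lit segments, so for comparison the standard segment encoding can be decoded with a plain lookup table. This is a sketch of that encoding only: it is not something cnocr does and not the method used in this post. It assumes the lit segments (conventionally named a to g) have already been extracted from the image by some other step, and `decode_digit` is a hypothetical name:

```python
# Standard seven-segment layout:
#    a
#  f   b
#    g
#  e   c
#    d
SEGMENTS_TO_DIGIT = {
    frozenset("abcdef"): "0",
    frozenset("bc"): "1",
    frozenset("abdeg"): "2",
    frozenset("abcdg"): "3",
    frozenset("bcfg"): "4",
    frozenset("acdfg"): "5",
    frozenset("acdefg"): "6",
    frozenset("abc"): "7",
    frozenset("abcdefg"): "8",
    frozenset("abcdfg"): "9",
}


def decode_digit(lit_segments):
    """Map a collection of lit segment names to a digit, or '?' if the pattern is unknown."""
    return SEGMENTS_TO_DIGIT.get(frozenset(lit_segments), "?")


# Decode three example patterns; prints 123.
print("".join(decode_digit(s) for s in ["bc", "abdeg", "abcdg"]))
```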

(2) Content in a table

▲ Content of a character table

No result was output.

(3) Handwriting recognition

▲ Handwritten text

  • Recognition time: 0.865 s
  • Recognition result:
    手怎体文字
    AB数寇

4. Test program

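#!/usr/local/bin/python
# -*- coding: gbk -*-
#============================================================
# TEST1.PY                     -- by Dr. ZhuoQing 2020-05-26
#
# Note: Run cnocr on one captured image and print the result.
#============================================================
from headm import *                 # author's own helper module; tspgetdopfile and printf presumably come from it
from cnocr                  import CnOcr
imageid = 7                         # id of the image to recognize
file = tspgetdopfile(imageid)       # presumably resolves the id to an image file path
#img = mx.image.imread(file, 1)
ocr = CnOcr()                       # create the recognizer with its default model
res = ocr.ocr(file)                 # recognize the text in the image file
printf(res)
printf('\a')                        # '\a' is the bell character
#------------------------------------------------------------
#        END OF FILE : TEST1.PY
#============================================================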
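
The program above does not show how the recognition times quoted in this post were measured. One way to take such a measurement is sketched below, using only the standard library; `timed` is a hypothetical helper, not part of cnocr or `headm`:

```python
import time


def timed(func, *args, **kwargs):
    """Call func(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


if __name__ == "__main__":
    # Stand-in workload so the sketch runs without cnocr installed.
    result, elapsed = timed(sum, range(1000000))
    print(result, elapsed)
```

In the test program this would be used as `res, seconds = timed(ocr.ocr, file)`. The first call may also include one-time model loading, so timing a second call separately gives a fairer figure.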


03 Further test results

1. Characters

▲ A short string of characters on a black background

  • Recognition time: 0.774 s
  • Recognition result:
    –by Dr.ZhuoQing 2020-05-26

2. A whole passage of characters

▲ Black text

  • Recognition time: 0.883 s
  • Recognition result:
    两款开源的中文OCR工具,简直碉堡了

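04 Conclusion

CNOCR is indeed a model that recognizes English and Chinese text well. On seven-segment characters, however, it has clear limitations: in the test above it returned look-alike Chinese characters instead of digits.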
