Unofficial PyTorch implementation of the paper "Towards Accurate Scene Text Recognition with Semantic Reasoning Networks". The semantic reasoning network (SRN) combines a global semantic reasoning module, a parallel visual attention module, and a visual-semantic fusion decoder, and can be trained end-to-end.
This implementation does not yet reach the accuracy reported in the paper. Parts of the code are borrowed from deep-text-recognition-benchmark.
model
result
IIIT5k_3000 | SVT | IC03_860 | IC03_867 | IC13_857 | IC13_1015 | IC15_1811 | IC15_2077 | SVTP | CUTE80 |
---|---|---|---|---|---|---|---|---|---|
84.600 | 83.617 | 92.907 | 92.849 | 90.315 | 88.177 | 71.010 | 68.064 | 71.008 | 68.641 |
total_accuracy: 80.597
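The total above is not the plain mean of the ten columns (that would be about 81.12), so it is presumably pooled over all evaluation samples, which lets larger sets such as IIIT5k weigh more. A minimal sketch of that pooling, assuming per-dataset correct and total counts are available; the function name and the example counts are hypothetical and are not taken from this repository:

```python
def total_accuracy(results):
    """Pooled accuracy in percent.

    results: list of (num_correct, num_samples) pairs, one per benchmark.
    """
    total_correct = sum(correct for correct, _ in results)
    total_samples = sum(samples for _, samples in results)
    if total_samples == 0:
        raise ValueError("no evaluation samples")
    return 100.0 * total_correct / total_samples


# Hypothetical counts for illustration only (not this repository's numbers).
example = [(90, 100), (30, 50)]
print(round(total_accuracy(example), 3))
```

With these made-up counts the pooled value differs from the mean of the two per-dataset accuracies (90% and 60%), because the larger set contributes more samples.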
Features
- predicts all characters at once (in parallel) rather than one at a time
- DistributedDataParallel training
Requirements
PyTorch >= 1.1.0
Test
- download the evaluation data from deep-text-recognition-benchmark
- download the pretrained model from Baidu, Password: d2qn
- test on the evaluation data
python test.py --eval_data path-to-data --saved_model path-to-model
Train
- download the training data from deep-text-recognition-benchmark
- train from scratch
python train.py --train_data path-to-train-data --valid_data path-to-valid-data
References
- bert_ocr.pytorch
- deep-text-recognition-benchmark
- 2D Attentional Irregular Scene Text Recognizer
- Towards Accurate Scene Text Recognition with Semantic Reasoning Networks
Differences from the original paper
- uses a ResNet backbone that produces a 1D feature sequence, not the ResNet-FPN 2D feature map
- fuses visual and semantic features by addition, not a gated unit, in the visual-semantic fusion decoder
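To make the second difference concrete, here is a framework-free sketch that contrasts plain addition with a gated unit for a single character position. It is not code from this repository: the function names are hypothetical, the features are plain Python lists instead of tensors, and the gate form (a sigmoid over the concatenated features that mixes the two inputs) is an assumption about what the paper's gated unit looks like.

```python
import math


def add_fusion(visual, semantic):
    """Element-wise sum of the visual and semantic feature vectors."""
    return [v + s for v, s in zip(visual, semantic)]


def gated_fusion(visual, semantic, weight, bias):
    """Gated mix of the two vectors (assumed form, for illustration).

    weight: d rows, each of length 2*d, applied to [visual; semantic].
    bias:   length d.
    """
    concat = visual + semantic  # list concatenation: [visual; semantic]
    fused = []
    for i, (v, s) in enumerate(zip(visual, semantic)):
        logit = sum(w * x for w, x in zip(weight[i], concat)) + bias[i]
        z = 1.0 / (1.0 + math.exp(-logit))  # gate value in (0, 1)
        fused.append(z * v + (1.0 - z) * s)
    return fused


visual = [1.0, 2.0]
semantic = [3.0, -2.0]
zero_weight = [[0.0] * 4 for _ in range(2)]

print(add_fusion(visual, semantic))
# With all-zero parameters every gate is 0.5, i.e. an equal mix.
print(gated_fusion(visual, semantic, zero_weight, [0.0, 0.0]))
```

Addition has no learnable parameters and weights both inputs equally at every position, while a gate can learn to lean on the visual or the semantic feature per dimension.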
Other
Matching the accuracy reported in the paper has proven difficult. I hope more people will try and share their results.