When this option is used, the threshold is adapted to the cropped
image, i.e., after the "crop" command has been applied rather
than before it. This avoids adapting the threshold to the full
image, and thus potentially reduces the time needed for recognition.
Instead of adapting the threshold to the image before executing
commands, adapt the threshold just before it is needed.
This avoids threshold adaptation when -p, --process-only
is used with only the "grayscale" and/or "mirror" commands.
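The idea of adapting the threshold only when it is needed can be
sketched as follows. This is a minimal illustration, not ssocr's
actual code: the names threshold_state, adapt_to_image and
get_threshold are hypothetical, and the mean luminance is only a
stand-in for whatever adaptation method is really used.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical state; ssocr's real data structures may differ. */
typedef struct {
    double threshold; /* threshold in percent */
    bool adapted;     /* true once the threshold was adapted to the image */
} threshold_state;

/* Stand-in adaptation (assumption): mean luminance in percent of 255. */
static double adapt_to_image(const unsigned char *lum, size_t n,
                             double fallback)
{
    unsigned long long sum = 0;
    if (n == 0)
        return fallback;
    for (size_t i = 0; i < n; i++)
        sum += lum[i];
    return 100.0 * (double)sum / (255.0 * (double)n);
}

/* Called right before a command that actually uses the threshold.
 * Commands that never call it never pay for the adaptation. */
double get_threshold(threshold_state *st, const unsigned char *lum,
                     size_t n)
{
    if (!st->adapted) {
        st->threshold = adapt_to_image(lum, n, st->threshold);
        st->adapted = true;
    }
    return st->threshold;
}
```

With this structure, a command sequence that never asks for the
threshold skips the adaptation entirely, and the adaptation is done
at most once otherwise.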
The change also prepares the code for the introduction of a new
option to avoid adapting the threshold to the original image before
the "crop" command is applied.
Commit 67abe0fba8 sped up the determination of both the minimum
and maximum luminance (gray) values of the image. This speeds up
some debugging output, and also the gray_stretch command when it
is used together with option -g.
(I did not notice that initially.)
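The message does not say how the speed-up was achieved. One common
way to obtain both extremes cheaply is a single pass over the pixels,
sketched here with hypothetical names (lum_range, find_lum_range)
and assuming 8-bit luminance values:

```c
#include <stddef.h>

/* Hypothetical result type: smallest and largest luminance found. */
typedef struct {
    unsigned char min;
    unsigned char max;
} lum_range;

/* Find minimum and maximum in one pass instead of two.
 * For an empty image the result is min = 255, max = 0. */
lum_range find_lum_range(const unsigned char *lum, size_t n)
{
    lum_range r = {255, 0};
    for (size_t i = 0; i < n; i++) {
        if (lum[i] < r.min)
            r.min = lum[i];
        if (lum[i] > r.max)
            r.max = lum[i];
    }
    return r;
}
```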
When the widest digit found in the image is a one, it is likely
that a decimal separator is nearly as wide as this digit. Before
this commit, such a separator could not be recognized, because a
decimal separator had to be at most half as wide as the widest digit.
Thus add an additional pass over the digits for this special case.
This pass comes after the existing recognition passes for the
digit one, decimal separator, and minus sign. In the new pass,
the width of the digit is ignored.
This addresses GitHub issue #26.
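The extra pass can be illustrated with a small sketch. It is not
ssocr's implementation: the types kind and box, the function
find_separators, and the height criterion (at most one fifth of the
tallest box) are assumptions made for the example. Only the width
rule (at most half the widest digit) and the order of the passes
come from the description above.

```c
#include <stddef.h>

/* Hypothetical classification of a potential digit. */
typedef enum { UNKNOWN, DIGIT_ONE, DECIMAL, OTHER } kind;
typedef struct { int w, h; kind k; } box;

/* Marks decimal separators among the UNKNOWN boxes and returns how
 * many were found. Ones are assumed to be classified already. */
size_t find_separators(box *b, size_t n)
{
    int max_w = 0, max_h = 0;
    size_t widest = 0, found = 0;

    for (size_t i = 0; i < n; i++) {
        if (b[i].w > max_w) { max_w = b[i].w; widest = i; }
        if (b[i].h > max_h) max_h = b[i].h;
    }
    /* Regular pass: narrow (width rule) and short (assumed rule). */
    for (size_t i = 0; i < n; i++) {
        if (b[i].k == UNKNOWN && 2 * b[i].w <= max_w
            && 5 * b[i].h <= max_h) {
            b[i].k = DECIMAL;
            found++;
        }
    }
    /* Extra pass: if the widest box is a one, ignore the width. */
    if (n > 0 && b[widest].k == DIGIT_ONE) {
        for (size_t i = 0; i < n; i++) {
            if (b[i].k == UNKNOWN && 5 * b[i].h <= max_h) {
                b[i].k = DECIMAL;
                found++;
            }
        }
    }
    return found;
}
```

For a display showing "1.1" with ones that are 10 pixels wide and a
separator that is 8 pixels wide, the regular pass rejects the
separator (8 is more than half of 10), while the extra pass accepts it.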
The latest release date is extracted from the NEWS file,
i.e., it depends only on the sources, not the build date.
This is intended to help in creating reproducible builds
by avoiding timestamps. It is also closer to the date of
the contents of the man page than using just the latest
copyright year.
I do not have a perfect solution that works for both a git
clone and a downloaded tarball. This solution works well
for released tarballs of the ssocr sources.
To help create reproducible builds of ssocr, do not use
the build date of the man page as the date inside the
man page. Instead, use the latest copyright year of
ssocr.
Using the man page build date has always been problematic,
because it is misleading. But I do not have a general
automatic way to maintain the last change date of the
man page that works for both git clones and tarballs.
This seems like an improvement to me. It provides some
idea of how old the man page is, and this date depends
only on the ssocr source code, not the build date.
This should help with one of the two problems reported
in GitHub issue #22.
This ssocr release adds new features and thus addresses
GitHub issue #21:
* new option -N, --min-segment=SIZE
* new option -M, --min-char-dims=WxH
* a range of expected digits can be specified (previously, only
  a single number could be specified, or the number could be
  left unspecified)
When there is a bit of noise in the image, the segmentation
step might find lots of small potential digits that are not
really digits (or other characters) of the display. Given
sufficiently large display characters, it may be possible
to specify minimum character dimensions to remove spurious
potential characters (digits) based on their size.
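Filtering by minimum character dimensions can be sketched as below.
The names char_box and drop_small_boxes are hypothetical, and the
assumption that a box is kept when it is at least as large as the
minimum in both dimensions is mine, not taken from the ssocr sources.

```c
#include <stddef.h>

/* Hypothetical bounding box of a potential character. */
typedef struct { int w, h; } char_box;

/* Remove boxes smaller than min_w x min_h in place, keep the order
 * of the rest, and return the number of boxes kept. */
size_t drop_small_boxes(char_box *b, size_t n, int min_w, int min_h)
{
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        if (b[i].w >= min_w && b[i].h >= min_h)
            b[kept++] = b[i];
    }
    return kept;
}
```

With a minimum of 10x20, noise blobs of 3x4 or 2x70 pixels are
dropped, while characters of 40x80 pixels are kept.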
In 2017, I received a report of ssocr being used to interpret the
status display of a Chinese table tennis robot. The user had
implemented character recognition in a Perl script that interpreted
the ASCII art segments in ssocr's debug output.
The new ssocr character set "tt_robot" implements the exact same
segment interpretations as found in the Perl script mentioned above.
This even includes two erroneous '7' definitions that work around
recognition problems.