When this option is used, the threshold is adapted to the cropped
image, i.e., after the "crop" command instead of directly before it.
This avoids adjusting the threshold to the full image, and thus
potentially reduces the time needed for recognition.
When there is a bit of noise in the image, the segmentation
step might find lots of small potential digits that are not
really digits (or other characters) of the display. Given
sufficiently large display characters, it may be possible
to specify minimum character dimensions to remove spurious
potential characters (digits) based on their size.
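
The size-based filtering described above could be sketched roughly as
follows. This is only an illustration: the type box_t and the function
filter_small_boxes() are hypothetical names, not ssocr's actual data
structures or code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bounding box of one segmented candidate (not ssocr's type). */
typedef struct {
    int x1, y1, x2, y2; /* inclusive pixel coordinates */
} box_t;

/*
 * Keep only candidates that are at least min_w pixels wide and min_h
 * pixels high. The array is compacted in place, preserving order.
 * Returns the number of remaining candidates.
 */
size_t filter_small_boxes(box_t *boxes, size_t n, int min_w, int min_h)
{
    size_t kept = 0;

    assert(boxes != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        int w = boxes[i].x2 - boxes[i].x1 + 1;
        int h = boxes[i].y2 - boxes[i].y1 + 1;

        if (w >= min_w && h >= min_h)
            boxes[kept++] = boxes[i];
    }
    return kept;
}
```

Note that a narrow '1' or a decimal point is small by nature, so a
minimum width or height that is too large would discard real
characters as well.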
In 2017, I received a report of ssocr being used to interpret the
status display of some Chinese table tennis robot. The user had
implemented character recognition in a Perl script that interpreted
the ASCII art segments in ssocr's debug output.
The new ssocr character set "tt_robot" implements the exact same
segment interpretations as found in the Perl script mentioned above.
This even includes two erroneous '7' definitions that work around
recognition problems.
This adds an option to enable white space detection, and two
further options to control the operation of white space detection.
White space detection (--print-spaces) is intended for use cases
where digit (or character) grouping is important for correct
interpretation. One use case is the recognition of superimposed
dates in photographic images.
This commit also increases the version number to 2.21.0 and tweaks
some debug output.
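
One possible way to detect white space between characters is to compare
each horizontal gap with the smallest gap found. The sketch below
assumes exactly that rule; the function join_with_spaces() and its
"factor" parameter are hypothetical and not necessarily how ssocr
decides where to print a space.

```c
#include <assert.h>
#include <stddef.h>

/*
 * chars[i] is the i-th recognized character, left[i] and right[i] are
 * the x coordinates of its left and right border (sorted left to right).
 * A space is emitted before a character if the gap to its predecessor
 * is larger than factor times the smallest gap in the line.
 * Returns the length of the resulting string, or 0 if out is too small.
 */
size_t join_with_spaces(const char *chars, const int *left, const int *right,
                        size_t n, double factor, char *out, size_t out_size)
{
    int min_gap = 1;
    size_t len = 0;

    assert(out != NULL && out_size > 0);
    out[0] = '\0';

    /* find the smallest gap between neighbors, but use at least 1 pixel */
    for (size_t i = 0; i + 1 < n; i++) {
        int gap = left[i + 1] - right[i];

        if (gap < 1)
            gap = 1;
        if (i == 0 || gap < min_gap)
            min_gap = gap;
    }

    for (size_t i = 0; i < n; i++) {
        if (i > 0 && (left[i] - right[i - 1]) > factor * min_gap) {
            if (len + 1 >= out_size)
                return 0;
            out[len++] = ' ';
        }
        if (len + 1 >= out_size)
            return 0;
        out[len++] = chars[i];
    }
    out[len] = '\0';
    return len;
}
```

With gaps of 10, 30, 10, 30, 10 pixels and a factor of 2.0, the six
digits of a superimposed date such as "170315" would be grouped as
"17 03 15".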
This is the first step towards support of different character sets.
Character sets are intended to be used to, e.g., select between
'6' and 'b', but also to report an error if, e.g., a digit of a
decimal display is recognized as a hexadecimal digit.
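
The idea can be illustrated with a small sketch in which the same set
of segments is interpreted differently depending on the selected
character set. The bit values, the enum names, and the function
decode() are assumptions made for this example only; they are not
taken from ssocr's source.

```c
#include <assert.h>

/* Hypothetical bit mask for the seven segments of one digit. */
enum {
    SEG_TOP = 1, SEG_UPPER_LEFT = 2, SEG_UPPER_RIGHT = 4, SEG_MIDDLE = 8,
    SEG_LOWER_LEFT = 16, SEG_LOWER_RIGHT = 32, SEG_BOTTOM = 64
};

/* Hypothetical character sets. */
enum charset { CS_DECIMAL, CS_HEX };

/*
 * Map a set of segments to a character according to the character set.
 * Only a few patterns are handled to keep the example short.
 * Returns the character, or -1 if the pattern is not part of the set.
 */
int decode(unsigned seg, enum charset cs)
{
    const unsigned six = SEG_TOP | SEG_UPPER_LEFT | SEG_MIDDLE |
                         SEG_LOWER_LEFT | SEG_LOWER_RIGHT | SEG_BOTTOM;
    const unsigned six_no_top = six & ~(unsigned)SEG_TOP;
    const unsigned letter_a = SEG_TOP | SEG_UPPER_LEFT | SEG_UPPER_RIGHT |
                              SEG_MIDDLE | SEG_LOWER_LEFT | SEG_LOWER_RIGHT;

    assert(seg < 128); /* only seven segment bits are defined */

    if (seg == six)
        return '6';
    if (seg == six_no_top) /* ambiguous: '6' without top bar, or 'b' */
        return (cs == CS_HEX) ? 'b' : '6';
    if (seg == letter_a)   /* not a valid digit on a decimal display */
        return (cs == CS_HEX) ? 'a' : -1;
    return -1;
}
```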
The decimal point is not ignored: it is found during image segmentation
and recognized as a decimal point, but then it is omitted from the
output. Thus '--omit-decimal-point' is a more fitting name.
The changed output format can be used for further processing in an
external filter to e.g. recognize characters not known to ssocr, or use
some fuzzy matching rules to work around variations in image quality.
This option prints the recognized segments, i.e. the display as
seen by ssocr, to standard error. This can be used as a quick check
of what went wrong if the recognition does not work. Additionally,
it can be used to get the raw segment data to use a separate
program to interpret set segments as digits or characters.
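
A separate program could turn such a segment picture back into a bit
mask before looking up the character. The sketch below assumes a
simple three-row, three-column ASCII art layout using '_' and '|';
this layout, the bit values, and the function art_to_segments() are
made up for illustration and do not describe the exact format that
ssocr prints.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical bit mask for the seven segments of one digit. */
enum {
    SEG_TOP = 1, SEG_UPPER_LEFT = 2, SEG_UPPER_RIGHT = 4, SEG_MIDDLE = 8,
    SEG_LOWER_LEFT = 16, SEG_LOWER_RIGHT = 32, SEG_BOTTOM = 64
};

/*
 * Convert one digit drawn as three rows of three characters, e.g.
 *    " _ "
 *    "|_|"
 *    "|_|"
 * into a bit mask of the set segments.
 */
unsigned art_to_segments(const char *row0, const char *row1, const char *row2)
{
    unsigned mask = 0;

    assert(strlen(row0) >= 3 && strlen(row1) >= 3 && strlen(row2) >= 3);

    if (row0[1] == '_') mask |= SEG_TOP;
    if (row1[0] == '|') mask |= SEG_UPPER_LEFT;
    if (row1[1] == '_') mask |= SEG_MIDDLE;
    if (row1[2] == '|') mask |= SEG_UPPER_RIGHT;
    if (row2[0] == '|') mask |= SEG_LOWER_LEFT;
    if (row2[1] == '_') mask |= SEG_BOTTOM;
    if (row2[2] == '|') mask |= SEG_LOWER_RIGHT;
    return mask;
}
```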
The description of the -b, --background option wrongly stated that
this sets the foreground color to the given value, but it sets the
background color to the given value.
Bug reported by Robert Sund.
a minus ('-') sign.
The patch fixes a comment typo as well.
After applying this patch, ssocr will crash if it finds a "digit" with
zero height. This will be fixed in the next commit.
Instead of specifying the exact number of digits in the display, use
--digits -1 to have ssocr auto-detect the number of digits. When this
is used, ssocr cannot check if the correct number of digits has been
recognized.
Bumped version number to 2.13.0 to indicate a new feature.
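
The effect of the special value -1 on the digit count check can be
sketched as follows. The macro DIGITS_AUTO and the function
digit_count_ok() are hypothetical names chosen for this example.

```c
#include <assert.h>

/* Hypothetical constant for "auto-detect the number of digits". */
#define DIGITS_AUTO (-1)

/*
 * Return non-zero if the number of digits found is acceptable.
 * With auto-detection there is no expected value to compare against,
 * so every count is accepted.
 */
int digit_count_ok(int expected, int found)
{
    assert(found >= 0);
    return expected == DIGITS_AUTO || expected == found;
}
```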
- no functional changes
- this is another step towards refactoring the recognition algorithm
- this is another step towards factoring out the image access routines to
ultimately replace Imlib2 by something else (e.g. gd)