143 Commits

Author SHA1 Message Date
Erik Auerswald e914669079 add option -F, --adapt-after-crop
When this option is used, the threshold is adapted to the cropped
image, i.e., after the "crop" command, but not directly before.
This avoids adjusting the threshold to the full image,
and thus potentially reduces the time needed for recognition.
2025-03-23 19:03:48 +01:00
Erik Auerswald 94ce321060 lazily adapt threshold to image
Instead of adapting the threshold to the image before executing
commands, adapt the threshold just before it is needed.

This allows avoiding threshold adaptation when -p, --process-only
is used with only the "grayscale" and/or "mirror" commands.

This also prepares the code to allow introduction of a new option
to avoid adapting the threshold to the original image before the
"crop" command is applied.
2025-03-23 18:54:13 +01:00
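
The lazy adaptation described in commit 94ce321060 can be sketched as follows. This is only an illustration of the pattern, not the ssocr code: the struct, both function names, and the simple "midpoint of minimum and maximum luminance" adaptation are assumptions made for the example.

```c
#include <stddef.h>

/* All names here are hypothetical; they only illustrate lazy evaluation. */
struct lazy_threshold {
  double value;     /* threshold in percent */
  int adapted;      /* non-zero once value has been adapted */
  int adapt_calls;  /* how often the costly step actually ran */
};

/* Stand-in for the costly adaptation step (an assumption, not the ssocr
 * algorithm): midpoint of minimum and maximum luminance, in percent. */
double adapt_to_pixels(const unsigned char *lum, size_t n, double fallback)
{
  unsigned char lo = 255, hi = 0;
  size_t i;

  if (n == 0)
    return fallback;
  for (i = 0; i < n; i++) {
    if (lum[i] < lo) lo = lum[i];
    if (lum[i] > hi) hi = lum[i];
  }
  return (lo + hi) / 2.0 / 255.0 * 100.0;
}

/* Adapt the threshold only when a caller first asks for it. */
double threshold_when_needed(struct lazy_threshold *t,
                             const unsigned char *lum, size_t n)
{
  if (!t->adapted) {
    t->value = adapt_to_pixels(lum, n, t->value);
    t->adapted = 1;
    t->adapt_calls++;
  }
  return t->value;
}
```

A command that never asks for the threshold never triggers the adaptation, and repeated requests reuse the stored value.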
Erik Auerswald 7cae796188 ssocr.c: code maintenance
Fix two minor issues with crop command execution:

 * correct two variable names in a comment, and
 * remove a useless imlib_context_set_image() call.
2025-03-23 14:27:52 +01:00
Erik Auerswald 7f2a9f3f22 warn when options -a and -T are used together 2025-03-22 18:45:15 +01:00
Erik Auerswald 56d1b22118 simplify some command execution code paths
Commands with optional argument had two code paths leading
to the respective function application, one of those with
hard-coded argument "1".  Instead, ensure the variable for
the optional argument is always set, and have just one
function call, always using this variable, per command.
2025-03-20 21:12:00 +01:00
Erik Auerswald fa30f473e7 simplify adapt_threshold() and iterative_threshold()
Both functions are always called with the same arguments.
Only the get_threshold() function is also called with
different x, y, w, h arguments (when used during dynamic
thresholding).
2025-03-18 21:31:19 +01:00
Erik Auerswald 67abe0fba8 combine get_minval() and get_maxval()
This simplifies the code a bit, and slightly speeds up using
option "-P, --debug-output".
2025-03-18 20:42:38 +01:00
Erik Auerswald ac060de96b speed up reading from standard input
Instead of reading one byte at a time, read data in chunks
of BUFSIZ bytes.

For one example, this improves recognition time for image
data read via standard input from 281ms to 53ms.  YMMV.
2025-03-18 20:10:51 +01:00
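
The chunked reading described in commit ac060de96b can be sketched as follows. The helper name read_all() and its interface are assumptions for illustration; the actual ssocr code may be structured differently.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not taken from ssocr: read everything from "in"
 * into a heap buffer, BUFSIZ bytes at a time.  Returns NULL on error
 * or if the stream is empty; *len receives the number of bytes read. */
unsigned char *read_all(FILE *in, size_t *len)
{
  unsigned char chunk[BUFSIZ];
  unsigned char *data = NULL;
  size_t used = 0, n;

  *len = 0;
  while ((n = fread(chunk, 1, sizeof chunk, in)) > 0) {
    unsigned char *tmp;

    if (n > SIZE_MAX - used) {   /* total size would overflow */
      free(data);
      return NULL;
    }
    tmp = realloc(data, used + n);
    if (!tmp) {
      free(data);
      return NULL;
    }
    data = tmp;
    memcpy(data + used, chunk, n);
    used += n;
  }
  if (ferror(in)) {
    free(data);
    return NULL;
  }
  *len = used;
  return data;
}
```

Each fread() call transfers up to BUFSIZ bytes, so the number of library calls drops from one per byte to one per chunk.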
Erik Auerswald f012c14d93 refactoring and type consistency fixes
* The draw_pixel() function was called with an "image" parameter
  of type "Imlib_Image" instead of "Imlib_Image *".  This type
  error did not result in a compilation error, and thus stayed
  undetected in the code.

* Introduce a new function draw_color_pixel() similar to draw_pixel(),
  and use it instead of repeatedly open-coding this operation.
2025-02-02 20:32:48 +01:00
Erik Auerswald b44a4ad72a update copyright years 2025-02-01 19:33:29 +01:00
Erik Auerswald 88a627050a print warning when ignoring unknown luminance formula 2024-11-18 17:55:02 +01:00
Erik Auerswald 6e8c8361fd improve wording of unknown charset warning 2024-11-18 17:43:55 +01:00
Erik Auerswald acdf26c6bf print warning when ignoring unknown charset 2024-11-17 19:25:33 +01:00
Erik Auerswald 70692aa5c4 suppress debug output without -P, --debug-output 2024-11-17 19:10:34 +01:00
Erik Auerswald 9964c8ce85 fix copy & paste error in a comment 2024-11-17 19:09:00 +01:00
Erik Auerswald 894f3035fd fix a special case for decimal point recognition
When the widest digit found in the image is a one, it is likely
that a decimal separator is nearly as wide as this digit.  Thus
it cannot be recognized, because the decimal separator needs to
be at most half as wide as the widest digit (before this commit).

Thus add an additional pass over the digits for this special case.
This pass comes after the existing recognition passes for the
digit one, decimal separator, and minus sign.  In the new pass,
the width of the digit is ignored.

This addresses GitHub issue #26.
2024-06-22 20:04:08 +02:00
Erik Auerswald 6569ca9f6b update copyright years 2024-02-23 19:03:58 +01:00
Erik Auerswald 81c5472b69 add program name to warning messages 2023-09-10 14:49:12 +02:00
Erik Auerswald 6fc62729e9 improve error message for wrong number of digits
* Use different messages for single digit and digit ranges.
* Use singular when looking for one digit, plural otherwise.
2023-09-10 14:35:49 +02:00
Erik Auerswald 19f93dcb58 fix debug output regarding flag DEBUG_OUTPUT 2023-07-26 19:50:15 +02:00
Erik Auerswald 5ca3455873 add ssocr version to debug output 2023-07-26 19:47:01 +02:00
Erik Auerswald 0c3adc9002 avoid useless digit memory copy
There is no need to copy digit memory if all potential
digits are kept.
2023-05-13 20:05:23 +02:00
Erik Auerswald ad9027b6a0 guard against some integer overflows
When determining memory allocation sizes, integer overflow can
lead to memory safety errors.  Add some guards to prevent this.
2023-05-13 19:58:16 +02:00
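
A guard of the kind commit ad9027b6a0 describes might look like the sketch below. The function alloc_pixels() and its parameters are hypothetical; the point is only that each multiplication is checked before it is performed.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocate width * height * bpp zeroed bytes,
 * or return NULL if any dimension is zero or the product would
 * overflow size_t. */
void *alloc_pixels(size_t width, size_t height, size_t bpp)
{
  if (width == 0 || height == 0 || bpp == 0)
    return NULL;
  if (width > SIZE_MAX / height)        /* width * height overflows */
    return NULL;
  if (width * height > SIZE_MAX / bpp)  /* ... * bpp overflows */
    return NULL;
  return calloc(width * height, bpp);
}
```

Without such checks, a wrapped product yields a buffer that is smaller than the code using it expects.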
Erik Auerswald cf2391f400 do not accept 0 expected digits
This did not work before, and it was not intended to change
this.

I do not know of a use case where one is expecting no digits
at all, but is using ssocr to recognize the non-existing
digits.  If you do have such a use case, let me know.
2023-05-01 17:00:49 +02:00
Erik Auerswald 898f5ec712 allow to specify a range for the number of digits
This can be helpful when using ssocr with a display showing
a variable number of digits, e.g., a clock, a scale, or a
thermometer.
2023-05-01 16:19:12 +02:00
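
Parsing such a range could be sketched as below. The MIN-MAX notation, the function name parse_digit_range(), and the return convention are assumptions for this example and are not confirmed by the commit message; the sketch keeps -1 as the "any number" value mentioned in commit 8df40937c9 and rejects 0 as in commit cf2391f400.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parser: accepts "-1" (any number of digits), "N", or
 * "MIN-MAX" with 1 <= MIN <= MAX.  Returns 0 on success, -1 on error. */
int parse_digit_range(const char *s, int *min, int *max)
{
  char *end;
  long lo, hi;

  if (strcmp(s, "-1") == 0) {
    *min = *max = -1;
    return 0;
  }
  errno = 0;
  lo = strtol(s, &end, 10);
  if (end == s || errno != 0 || lo < 1 || lo > INT_MAX)
    return -1;
  hi = lo;
  if (*end == '-') {
    const char *p = end + 1;

    errno = 0;
    hi = strtol(p, &end, 10);
    if (end == p || errno != 0 || hi < lo || hi > INT_MAX)
      return -1;
  }
  if (*end != '\0')   /* trailing garbage */
    return -1;
  *min = (int)lo;
  *max = (int)hi;
  return 0;
}
```

A single number is treated as a range whose minimum and maximum are equal, so later code only needs one comparison path.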
Erik Auerswald 9e2d37ddbf add option -M, --min-char-dims=WxH
When there is a bit of noise in the image, the segmentation
step might find lots of small potential digits that are not
really digits (or other characters) of the display.  Given
sufficiently large display characters, it may be possible
to specify minimum character dimensions to remove spurious
potential characters (digits) based on their size.
2023-04-30 19:14:19 +02:00
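
Filtering potential characters by size, as commit 9e2d37ddbf describes, might be sketched like this. The struct layout, the function name, and the rule that both width and height must reach the minimum are assumptions for illustration.

```c
#include <stddef.h>

/* Hypothetical bounding box of a potential character, inclusive
 * pixel coordinates. */
struct box {
  int x1, y1, x2, y2;
};

/* Keep only boxes that are at least min_w wide and min_h high.
 * The array is compacted in place; the number of kept boxes is
 * returned. */
size_t drop_small_boxes(struct box *b, size_t n, int min_w, int min_h)
{
  size_t kept = 0;
  size_t i;

  for (i = 0; i < n; i++) {
    int w = b[i].x2 - b[i].x1 + 1;
    int h = b[i].y2 - b[i].y1 + 1;

    if (w >= min_w && h >= min_h)
      b[kept++] = b[i];
  }
  return kept;
}
```

With a minimum of 5x10 pixels, for example, a 2x2 noise speck would be dropped while a 20x40 digit would be kept.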
Erik Auerswald 8df40937c9 accept any number of digits during segmentation
If a specific number of digits is expected, i.e., without
-d-1, the number of found (potential) digits is compared
with the expected number, and ssocr errors out on mismatch.

This prepares for the introduction of an option to reject
some potential images, e.g., because they are too small to
contain a digit or character from the display.

It also prepares for allowing a range of digits as the
expected segmentation result, e.g., for a clock display
with a blinking colon (:), or for a thermometer display.
2023-04-30 16:29:47 +02:00
Erik Auerswald 06c2afc85e add option -N, --min-segment=SIZE
This option is similar to -n, --number-pixels=#, but also
applies the limit to ratio based detection (i.e., for
recognition of "one" and "minus").
2023-04-29 11:53:32 +02:00
Erik Auerswald 2d6b019842 consistently use "scanline" in comments 2023-04-24 17:04:34 +02:00
Erik Auerswald ea8f724846 update copyright years 2023-04-23 12:53:48 +02:00
Erik Auerswald 320487e300 tweak alignment of some debug output 2023-04-23 12:50:02 +02:00
Erik Auerswald b83610d7d2 update copyright year 2022-01-25 18:38:52 +01:00
Erik Auerswald 0793da7bcb replace strncat() with memcpy()
GCC 10 warns about the use of strncat(), but accepts the equivalent
use of memcpy() without warning.

¯\_(ツ)_/¯
2021-10-26 20:54:14 +02:00
Erik Auerswald 9b227e7d64 update reasons for string.h includes 2021-10-26 20:28:36 +02:00
Erik Auerswald 9b31b50154 guard against potential unsigned integer overflow
If the string length of the value of the environment variable TMP
is too big, adding the length of a slash and the temporary file
name might overflow, which would result in an insufficient memory
allocation for the absolute file name.

I do not know if that can actually happen in any existing operating
system and platform combination where ssocr can be used.
2021-10-26 20:18:11 +02:00
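
The guard described in commit 9b31b50154 could be sketched as below. The function name join_tmp_path() is hypothetical; the sketch checks the sum of the lengths before allocating, so the allocation size cannot wrap around.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: build "dir/name" in newly allocated memory.
 * Returns NULL if the combined length would overflow size_t or if
 * the allocation fails. */
char *join_tmp_path(const char *dir, const char *name)
{
  size_t dlen = strlen(dir);
  size_t nlen = strlen(name);
  char *path;

  /* needed: dlen + 1 (slash) + nlen + 1 (terminating '\0') */
  if (nlen > SIZE_MAX - 2 || dlen > SIZE_MAX - 2 - nlen)
    return NULL;
  path = malloc(dlen + nlen + 2);
  if (!path)
    return NULL;
  memcpy(path, dir, dlen);
  path[dlen] = '/';
  memcpy(path + dlen + 1, name, nlen + 1);  /* includes the '\0' */
  return path;
}
```

The comparison is written with subtraction on the constant side, because computing dlen + nlen + 2 first would already perform the addition that might overflow.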
Erik Auerswald 113d665135 add ability to detect and print white space
This adds an option to enable white space detection, and two
further options to control the operation of white space detection.

White space detection (--print-spaces) is intended for use cases
where digit (resp. character) grouping is important for correct
interpretation.  One use case is the recognition of superimposed
dates in photographic images.

This commit also increases the version number to 2.21.0 and tweaks
some debug output.
2021-04-25 14:05:51 +02:00
Erik Auerswald dc463a5529 options to control decimal separator recognition
Additionally, bump copyright dates and version number.
2021-04-19 19:27:52 +02:00
Erik Auerswald 17928b4b76 always use spaces for indentation, not tabs 2019-03-10 18:06:58 +01:00
Erik Auerswald 2031c2c08e refactor scanning for set segments
This introduces a function to scan part of the image for foreground
pixels.

This scanline() function may be of use for distinguishing between
the digit '1' and the symbol ":".

It may also help in segment detection reliability if the "len" parameter
is used to skip scanning image areas between segment positions.
2019-03-10 17:58:04 +01:00
Erik Auerswald d3fdf3b223 cosmetic changes in ssocr.c 2019-02-02 15:09:50 +01:00
Erik Auerswald 535aa89bdb keep line length <= 80 2019-02-02 14:30:41 +01:00
Erik Auerswald 9569289c63 bump copyright year to 2019 2019-02-02 13:08:13 +01:00
Erik Auerswald 07623ef831 make debug output imply verbose operation 2019-02-02 13:03:31 +01:00
Erik Auerswald 3bdd09428f ssocr.c: replace strcat with strncat 2018-12-29 11:40:02 +01:00
Erik Auerswald 592a4044e5 ssocr.c: simplify some string operations 2018-12-29 11:06:37 +01:00
Erik Auerswald 8c2f93d6c2 ssocr.c: replace last malloc() with calloc()
The code actually needs the allocated memory to have a '\0' byte
at the beginning.
2018-12-29 10:49:29 +01:00
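
The reason given in commit 8c2f93d6c2 can be illustrated with a small sketch. The function concat_parts() is hypothetical: because calloc() zero-fills the buffer, it already holds an empty string, so appending can start right away. With malloc(), the first append would search uninitialized memory for a '\0' byte.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: concatenate n strings into a new buffer. */
char *concat_parts(const char *const *parts, size_t n)
{
  size_t total = 1;  /* room for the terminating '\0' */
  size_t i;
  char *out;

  for (i = 0; i < n; i++) {
    size_t len = strlen(parts[i]);

    if (len > SIZE_MAX - total)  /* sum would overflow */
      return NULL;
    total += len;
  }
  out = calloc(total, 1);        /* out[0] == '\0': an empty string */
  if (!out)
    return NULL;
  for (i = 0; i < n; i++)
    strncat(out, parts[i], total - strlen(out) - 1);
  return out;
}
```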
Erik Auerswald d25ddb4674 final step of character set support
This implements selection of character set to recognize and documents
it in the man page.
2018-08-05 07:01:50 +02:00
Erik Auerswald e6f4e49ba9 second step to implement different character sets
- add character sets full, digits, decimal, hex
- full is used by default
- character set cannot be selected for now
2018-08-05 06:26:04 +02:00
Erik Auerswald d6a957e6d3 move character printing to a separate function
This is the first step towards support of different character sets.
Different character sets are intended to be used to e.g. select
between '6' and 'b', but also to receive an error if e.g. a decimal
display is recognized as a hexadecimal digit.
2018-08-05 05:24:38 +02:00
Erik Auerswald 1b90fbba6e prefix all error messages with 'ssocr: ' 2018-07-27 22:30:05 +02:00