bug#40216: 28.0.50; Misinformation in isearch char-fold

classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

bug#40216: 28.0.50; Misinformation in isearch char-fold

Juri Linkov-2
Tags: patch

When the size of the generated regexp in char-fold isearch mode reaches
a certain limit, it silently falls back to literal search without notifying
the user about this fact.  Thus uninformed users might miss some search hits.

Here is the patch that instead of returning a quoted string in
char-fold-to-regexp when it reaches some arbitrary limit,
instead of this it toggles the literal search mode explicitly,
tries to find the next occurrence in literal mode, and displays
the message about switching search mode for 2 seconds:


diff --git a/lisp/char-fold.el b/lisp/char-fold.el
index f8a303956e..34561a2efe 100644
--- a/lisp/char-fold.el
+++ b/lisp/char-fold.el
@@ -370,11 +377,7 @@ char-fold-to-regexp
       (setq i (1+ i)))
     (when (> spaces 0)
       (push (char-fold--make-space-string spaces) out))
-    (let ((regexp (apply #'concat (nreverse out))))
-      ;; Limited by `MAX_BUF_SIZE' in `regex-emacs.c'.
-      (if (> (length regexp) 5000)
-          (regexp-quote string)
-        regexp))))
+    (apply #'concat (nreverse out))))
 
 
 ;;; Commands provided for completeness.
diff --git a/lisp/isearch.el b/lisp/isearch.el
index ddf9190dc6..7625ec12b5 100644
--- a/lisp/isearch.el
+++ b/lisp/isearch.el
@@ -2011,15 +2011,16 @@ regexp
 (defvar isearch-message-properties minibuffer-prompt-properties
   "Text properties that are added to the isearch prompt.")
 
-(defun isearch--momentary-message (string)
-  "Print STRING at the end of the isearch prompt for 1 second."
+(defun isearch--momentary-message (string &optional seconds)
+  "Print STRING at the end of the isearch prompt for 1 second.
+The optional argument SECONDS overrides the number of seconds."
   (let ((message-log-max nil))
     (message "%s%s%s"
              (isearch-message-prefix nil isearch-nonincremental)
              isearch-message
              (apply #'propertize (format " [%s]" string)
                     isearch-message-properties)))
-  (sit-for 1))
+  (sit-for (or seconds 1)))
 
 (isearch-define-mode-toggle lax-whitespace " " nil
   "In ordinary search, toggles the value of the variable
@@ -3443,7 +3444,10 @@ isearch-search
     (string-match "\\`Regular expression too big" isearch-error))
        (cond
  (isearch-regexp-function
- (setq isearch-error "Too many words"))
+         (setq isearch-error nil)
+         (setq isearch-regexp-function nil)
+         (isearch-search-and-update)
+         (isearch--momentary-message "Too many words; switched to literal mode" 2))
  ((and isearch-lax-whitespace search-whitespace-regexp)
  (setq isearch-error "Too many spaces for whitespace matching"))))))
 
Reply | Threaded
Open this post in threaded view
|

bug#40216: 28.0.50; Misinformation in isearch char-fold

Juri Linkov-2
> Out of curiosity, what were you searching for that resulted in such a
> large regexp?

Sometimes I pull a few of lines (usually 1-3 lines, not more)
from the buffer into the search string to confirm that the same lines
exist in more places in the same buffer ignoring the differences defined
by folding rules.  But after pulling 2 lines into the search string,
the generated regexp becomes so long that the regexp search fails
with the error "Regular expression too big".  Currently it silently
switches to literal search without notification that it doesn't follow
the folding rules anymore.  With the patch it informs about switching
to literal search.



Reply | Threaded
Open this post in threaded view
|

bug#40216: 28.0.50; Misinformation in isearch char-fold

Eli Zaretskii
> From: Robert Pluim <[hidden email]>
> Date: Thu, 26 Mar 2020 10:28:55 +0100
> Cc: [hidden email]
>
> Do we need an option to char-fold-regexp that says 'only apply
> char-folding to non-ascii characters'?

But this feature is not intended only to find variants of non-ASCII
characters when one searches for a non-ASCII, it is also intended to
find variants when searching for ASCII characters.  For example,
searching for a is supposed to find ä and à and á.  Or am I missing
something?



Reply | Threaded
Open this post in threaded view
|

bug#40216: 28.0.50; Misinformation in isearch char-fold

Robert Pluim
>>>>> On Thu, 26 Mar 2020 16:25:14 +0200, Eli Zaretskii <[hidden email]> said:

    >> From: Robert Pluim <[hidden email]>
    >> Date: Thu, 26 Mar 2020 10:28:55 +0100
    >> Cc: [hidden email]
    >>
    >> Do we need an option to char-fold-regexp that says 'only apply
    >> char-folding to non-ascii characters'?

    Eli> But this feature is not intended only to find variants of non-ASCII
    Eli> characters when one searches for a non-ASCII, it is also intended to
    Eli> find variants when searching for ASCII characters.  For example,
    Eli> searching for a is supposed to find ä and à and á.  Or am I missing
    Eli> something?

Yes, thatʼs exactly right. But in the case where you have mainly
characters where you donʼt want case-folding, it might make sense to
restrict the folding to non-ascii as an optimisation. eg. Suppose my
name were Røbert, with people frequently misspelling it as Robert, I
might want isearch to just search for "R\\(?:ǿ\\|[øǿo]\\)bert"

Røbert



Reply | Threaded
Open this post in threaded view
|

bug#40216: 28.0.50; Misinformation in isearch char-fold

Juri Linkov-2
In reply to this post by Juri Linkov-2
> Ah, I hadn't considered that use case. Do we need an option to
> char-fold-regexp that says 'only apply char-folding to non-ascii
> characters'? That would reduce the size of the regexp considerably.

Currently there are 2 covered use cases:

1. the default is to fold ascii to non-ascii characters;

2. non-nil char-fold-symmetric additionally folds
   non-ascii to ascii characters.

It seems you are proposing a third use case:

3. symmetric-only that can be implemented with a new non-nil option
   char-fold-symmetric-only that will fold only non-ascii characters
   to ascii.

I have doubts how useful this will be.

The current default behavior is useful when the user types
ascii characters on the keyboard with ascii characters only.

The option char-fold-symmetric is useful to match pasted text
both ways ignoring all differences between ascii/non-ascii characters.

But for symmetric-only I can't imagine any useful use case.
For example, when you paste non-ascii characters into the search string,
and want to find corresponding ascii characters.  But why wouldn't you
want to find the other way around: pasting ascii characters
to find non-ascii counterparts?



Reply | Threaded
Open this post in threaded view
|

bug#40216: 28.0.50; Misinformation in isearch char-fold

Juri Linkov-2
In reply to this post by Robert Pluim
>     >> Do we need an option to char-fold-regexp that says 'only apply
>     >> char-folding to non-ascii characters'?
>
>     Eli> But this feature is not intended only to find variants of non-ASCII
>     Eli> characters when one searches for a non-ASCII, it is also intended to
>     Eli> find variants when searching for ASCII characters.  For example,
>     Eli> searching for a is supposed to find ä and à and á.  Or am I missing
>     Eli> something?
>
> Yes, thatʼs exactly right. But in the case where you have mainly
> characters where you donʼt want case-folding, it might make sense to
> restrict the folding to non-ascii as an optimisation. eg. Suppose my
> name were Røbert, with people frequently misspelling it as Robert, I
> might want isearch to just search for "R\\(?:ǿ\\|[øǿo]\\)bert"

I tried to find Røbert by typing Robert, but char-fold fails to find it.
A bug in char-fold?



Reply | Threaded
Open this post in threaded view
|

bug#40216: 28.0.50; Misinformation in isearch char-fold

Eli Zaretskii
> From: Juri Linkov <[hidden email]>
> Cc: Eli Zaretskii <[hidden email]>,  [hidden email]
> Date: Fri, 27 Mar 2020 01:04:12 +0200
>
> I tried to find Røbert by typing Robert, but char-fold fails to find it.
> A bug in char-fold?

I don't think it's a bug, because ø doesn't have a decomposition in
the Unicode character database:

   (get-char-code-property ?ø 'decomposition) => (248)

(i.e. the character "decomposes" into itself).  By contrast:

   (get-char-code-property ?á 'decomposition) => (97 769)

(i.e. á decomposes into a followed by U+0301 COMBINING ACUTE ACCENT).

So if one wants to support the kind of folding you expected, one would
have to customize char-fold-include to add those additional rules.



Reply | Threaded
Open this post in threaded view
|

bug#40216: 28.0.50; Misinformation in isearch char-fold

Juri Linkov-2
tags 40216 fixed
close 40216 28.0.50
quit

>     Eli> So if one wants to support the kind of folding you expected, one would
>     Eli> have to customize char-fold-include to add those additional rules.
>
> Yes, wrong example. I guess this wouldnʼt be useful after all (and

Thanks for pointing out a possibility to optimize char-fold,
I haven't thought about this before.  But it seems this optimization
limits the usability of char-fold since matching non-ascii characters
on ascii text is not needed as often as matching ascii on non-ascii text,
or both ways.  Even the current default of folding ascii to non-ascii
is so useless for me that I have to enable char-fold-symmetric.

> I see nothing wrong with Juri's proposed fix to the actual issue).

So now pushed to master.