bug#47534: Subject: 28.0.50; Regexp lower case pattern matches upper case


dalanicolai
When using the regexp builder and trying to match only lower case
letters (and spaces), Emacs also includes upper case matches. See the attached
image (the regexp builder syntax in the image is set to string, but this happens with other syntaxes too).
image.png

Also, I am unable to find any information in the manual about setting different syntaxes in the regexp builder (the option is not mentioned in the regexp-builder docstring either), so I would additionally like to report this as a documentation bug. I am reporting it here because I may simply not have searched well enough, but even so it is really hard to find.
Let me know if you would like me to report this separately.

In GNU Emacs 28.0.50 (build 1, x86_64-pc-linux-gnu, GTK+ Version 3.24.25, cairo version 1.16.0)
 of 2021-02-18 built on daniel-fedora
Repository revision: 185121da6978553d538d37d6d0e67dc52e13311f
Repository branch: feature/native-comp
Windowing system distributor 'The X.Org Foundation', version 11.0.12010000
System Description: Fedora 34 (Workstation Edition Prerelease)

Configured using:
 'configure --with-nativecomp'

Configured features:
CAIRO DBUS FREETYPE GIF GLIB GMP GNUTLS GPM GSETTINGS HARFBUZZ JPEG JSON
LCMS2 LIBOTF LIBSELINUX LIBSYSTEMD LIBXML2 M17N_FLT MODULES NATIVE_COMP
NOTIFY INOTIFY PDUMPER PNG RSVG SOUND THREADS TIFF TOOLKIT_SCROLL_BARS
X11 XDBE XIM XPM GTK3 ZLIB

Important settings:
  value of $LANG: en_US.UTF-8
  value of $XMODIFIERS: @im=none
  locale-coding-system: utf-8-unix

Major mode: Lisp Interaction

Minor modes in effect:
  tooltip-mode: t
  global-eldoc-mode: t
  eldoc-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  tool-bar-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  line-number-mode: t
  transient-mark-mode: t

Load-path shadows:
None found.

Features:
(mailalias mailclient browse-url url url-proxy url-privacy url-expand
url-methods url-history url-cookie url-domsuf url-util url-parse
url-vars mailcap help-mode pp shadow sort mail-extr emacsbug message rmc
puny dired dired-loaddefs rfc822 mml easymenu mml-sec epa derived epg
epg-config gnus-util rmail rmail-loaddefs auth-source cl-seq eieio
eieio-core eieio-loaddefs password-cache json map cl-macs
text-property-search seq byte-opt gv bytecomp byte-compile cconv
mm-decode mm-bodies mm-encode mail-parse rfc2231 mailabbrev gmm-utils
mailheader sendmail rfc2047 rfc2045 ietf-drums mm-util mail-prsvr
mail-utils time-date subr-x re-builder rx thingatpt cl-loaddefs cl-lib
iso-transl tooltip eldoc electric uniquify ediff-hook vc-hooks
lisp-float-type mwheel term/x-win x-win term/common-win x-dnd tool-bar
dnd fontset image regexp-opt fringe tabulated-list replace newcomment
text-mode elisp-mode lisp-mode prog-mode register page tab-bar menu-bar
rfn-eshadow isearch timer select scroll-bar mouse jit-lock font-lock
syntax facemenu font-core term/tty-colors frame minibuffer cl-generic
cham georgian utf-8-lang misc-lang vietnamese tibetan thai tai-viet lao
korean japanese eucjp-ms cp51932 hebrew greek romanian slovak czech
european ethiopic indian cyrillic chinese composite charscript charprop
case-table epa-hook jka-cmpr-hook help simple abbrev obarray
cl-preloaded nadvice button loaddefs faces cus-face pcase macroexp files
window text-properties overlay sha1 md5 base64 format env code-pages
mule custom widget hashtable-print-readable backquote threads dbusbind
inotify lcms2 dynamic-setting system-font-setting font-render-setting
cairo move-toolbar gtk x-toolkit x multi-tty make-network-process
nativecomp emacs)

Memory information:
((conses 16 94092 14675)
 (symbols 48 7774 2)
 (strings 32 25567 2178)
 (string-bytes 1 959120)
 (vectors 16 17517)
 (vector-slots 8 362281 18788)
 (floats 8 34 184)
 (intervals 56 230 0)
 (buffers 992 11))

dalanicolai
Ah, thanks. I thought I had used a case-sensitive replace-regexp before, but I was probably mistaken. Section "34.3 Regular Expressions" of the Emacs manual states that "[a-z]" or "[:lower:]" should match only lower case, but it does not mention that this will not work by default (as far as I can find, case-fold-search, reb-toggle-case and reb-change-syntax are not mentioned in that section at all). Indeed, I was referring to the docstring of re-builder. Anyway, I am happy to provide a patch.



On Thu, 1 Apr 2021 at 00:25, Basil L. Contovounesios <[hidden email]> wrote:
dalanicolai <[hidden email]> writes:

> When using the regexp builder and trying to match only lower case
> letters (and spaces), Emacs also includes upper case matches.

I think this is the effect of the user option case-fold-search, which
defaults to non-nil (this is the case across most search-related parts
of Emacs, including Isearch).  You can toggle it in re-builder
specifically with C-c C-c (reb-toggle-case), or across all buffers by
customising case-fold-search to be nil.
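
The effect can also be checked outside re-builder. The following is only a minimal sketch: the helper name my/lower-and-spaces-p is hypothetical (it is not part of Emacs or re-builder), and it simply binds case-fold-search around a string match against "[a-z ]+".

```elisp
;; Hypothetical helper for illustration; not part of Emacs or re-builder.
(defun my/lower-and-spaces-p (string fold)
  "Return non-nil if STRING matches \"[a-z ]+\" from start to end.
FOLD is the value bound to `case-fold-search' during the match."
  (let ((case-fold-search fold))
    (string-match-p "\\`[a-z ]+\\'" string)))

;; With case folding enabled (the default), upper-case letters match too.
(my/lower-and-spaces-p "Hello World" t)    ; non-nil
;; With case folding disabled, only lower-case letters and spaces match.
(my/lower-and-spaces-p "Hello World" nil)  ; nil
(my/lower-and-spaces-p "hello world" nil)  ; non-nil
```

Binding the variable with let keeps the change local to one call, which is usually preferable in Lisp code to changing the user option globally.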

> Also, I am unable to find in the manual any information about the
> option of setting different syntaxes in the regexp builder

I think most of the documentation for re-builder.el is in its commentary
at the start of the file; see M-x find-library RET re-builder RET.

I haven't used re-builder much, but from M-x customize-group RET
re-builder RET I see there is a user option reb-re-syntax for
controlling the default syntax.  C-h m in the re-builder buffer further
reveals that the command C-c C-i (reb-change-syntax) can modify
reb-re-syntax on the fly.
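
For a persistent default, the option could presumably be set in the init file. This is a sketch that assumes the string syntax is the one wanted; other values such as read and rx are also accepted by reb-re-syntax.

```elisp
(require 're-builder)

;; Make re-builder start in string syntax.
;; `read' and `rx' are other accepted values of this option.
(setq reb-re-syntax 'string)
```

C-c C-i (reb-change-syntax) still switches the syntax for the current session regardless of this setting.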

> (also the option is not mentioned in the regexp-builder docstring). So
> I would additionally like to report this as a documentation bug.

Are you referring to the docstring of the re-builder command, or
something else?  And is it a listing of key bindings you would like to
see, or something else?  Would you like to provide a patch with
suggestions for improvements?

Thanks,

--
Basil

dalanicolai
Here is a small patch for the re-builder docstring. I hope I did things correctly...

On Thu, 1 Apr 2021 at 10:12, dalanicolai <[hidden email]> wrote:
Ah, thanks. I thought I had used a case-sensitive replace-regexp before, but I was probably mistaken. Section "34.3 Regular Expressions" of the Emacs manual states that "[a-z]" or "[:lower:]" should match only lower case, but it does not mention that this will not work by default (as far as I can find, case-fold-search, reb-toggle-case and reb-change-syntax are not mentioned in that section at all). Indeed, I was referring to the docstring of re-builder. Anyway, I am happy to provide a patch.



Attachment: re-builder-improve-docstring (1K)

dalanicolai
And here is a small patch for the docs

On Thu, 1 Apr 2021 at 12:27, dalanicolai <[hidden email]> wrote:
Here is a small patch for the re-builder docstring. I hope I did things correctly...

Attachment: improve-regexp-doc-add-note-case-sensitive (1K)

dalanicolai
Of course, any feedback on the code (well... text) or my workflow is very welcome.

On Thu, 1 Apr 2021 at 12:27, dalanicolai <[hidden email]> wrote:
And here is a small patch for the docs
