modern regexes in emacs

modern regexes in emacs

Perry E. Metzger
I think, someday, it would be nice if users could select modern
regex syntax instead of the very very old-fashioned and awkward Emacs
regex syntax. The old syntax and functions that implement it need to
be kept around for legacy reasons, but one could easily set up a set
of parallel new functions that used modern PCRE style syntax, and
allow users to select those instead when doing things like
isearching on regexps etc.

Perry
--
Perry E. Metzger [hidden email]

Re: modern regexes in emacs

Radon Rosborough
> I think, someday, it would be nice if users could select modern
> regex syntax instead of the very very old-fashioned and awkward Emacs
> regex syntax

I agree. See https://github.com/benma/visual-regexp-steroids.el, which
implements this.

Re: modern regexes in emacs

Perry E. Metzger
On Sat, 16 Jun 2018 11:45:40 -0600 Radon Rosborough
<[hidden email]> wrote:
> > I think, someday, it would be nice if users could select modern
> > regex syntax instead of the very very old-fashioned and awkward
> > Emacs regex syntax  
>
> I agree. See https://github.com/benma/visual-regexp-steroids.el,
> which implements this.
>

That requires python, isn't integrated into emacs, etc.

--
Perry E. Metzger [hidden email]

Re: modern regexes in emacs

Daniel Colascione-5
> On Sat, 16 Jun 2018 11:45:40 -0600 Radon Rosborough
> <[hidden email]> wrote:
>> > I think, someday, it would be nice if users could select modern
>> > regex syntax instead of the very very old-fashioned and awkward
>> > Emacs regex syntax

Right now, I'd just settle for an easier-to-type equivalent to "\_<". This
sequence is the typing equivalent of a nasty cough. I've actually been
wondering how feasible it'd be to support an rx input mode.
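
For comparison, rx already spells these boundaries out as words. A minimal sketch using the `symbol-start' and `symbol-end' forms of the built-in rx library; the variable name and the symbol "foo" are only illustrations:

```elisp
(require 'rx)

;; rx expands at macro-expansion time into an ordinary Emacs regexp
;; string, so this is the same regexp as the hand-typed "\\_<foo\\_>".
(defconst my-foo-symbol-regexp
  (rx symbol-start "foo" symbol-end)
  "Match foo only where it stands as a whole symbol (illustration).")
```

An rx input mode would presumably let you type the form at the prompt instead of the backslash sequence.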


Re: modern regexes in emacs

Jimmy Yuen Ho Wong
In reply to this post by Perry E. Metzger
> That requires python, isn't integrated into emacs, etc.

It doesn't. You can select pcre2el as the Regexp syntax, which
implements a good subset of PCRE on top of Emacs Regexp. However, that
project hasn't seen any movement since late 2016. It'd be nice if Emacs
had PCRE baked in.



Re: modern regexes in emacs

Jay Kamat
In reply to this post by Perry E. Metzger
Perry E. Metzger writes:

> I think, someday, it would be nice if users could select modern
> regex syntax instead of the very very old-fashioned and awkward Emacs
> regex syntax. The old syntax and functions that implement it need to
> be kept around for legacy reasons, but one could easily set up a set
> of parallel new functions that used modern PCRE style syntax, and
> allow users to select those instead when doing things like
> isearching on regexps etc.

I just wanted to note that `rx' is in many cases much easier to write and
understand than even PCRE. I'd recommend learning and using `rx' if you are
annoyed about backslashes or readability.

-Jay

Re: modern regexes in emacs

Philippe Vaucher


On Sun, Jun 17, 2018 at 12:32 AM Jay Kamat <[hidden email]> wrote:
> Perry E. Metzger writes:
>
> > I think, someday, it would be nice if users could select modern
> > regex syntax instead of the very very old-fashioned and awkward Emacs
> > regex syntax. The old syntax and functions that implement it need to
> > be kept around for legacy reasons, but one could easily set up a set
> > of parallel new functions that used modern PCRE style syntax, and
> > allow users to select those instead when doing things like
> > isearching on regexps etc.
>
> I just wanted to note that `rx' is in many cases much easier to write and
> understand than even PCRE. I'd recommend learning and using `rx' if you are
> annoyed about backslashes or readability.

I only remember the PCRE syntax, which is why I use packages like `visual-regexp` and `pcre2el`. If Emacs supported it natively, that'd be a huge step forward for me.

Philippe 

Re: modern regexes in emacs

Elias Mårtenson
In reply to this post by Jay Kamat
On Sun, 17 Jun 2018, 06:32 Jay Kamat <[hidden email]> wrote:

> I just wanted to note that `rx' is in many cases much easier to write and
> understand than even PCRE. I'd recommend learning and using `rx' if you are
> annoyed about backslashes or readability.

While I'm sure that is true for a lot of people (and for those, the newly announced xr package helps here), others prefer to use the more compact regex syntax. 

However, I don't think anyone would argue that the Emacs regex syntax has any advantages compared to pcre. I certainly need to wade through the Emacs regex manual every time I want to do slightly more advanced regex matching, followed by lots of testing. 

When using regexes in regular editing (as opposed to elisp programming) it's even worse. 

I'm most definitely in favour of pcre. 
Regards, 
Elias 

Re: modern regexes in emacs

Mattias Engdegård-2
On 10 Feb 2019, at 10:39, Elias Mårtenson <[hidden email]> wrote:
>
> While I'm sure that is true for lot of people (and for those, the newly announced xr package helps here), others prefer to use the more compact regex syntax.
>
> However, I don't think anyone would argue that the Emacs regex syntax has any advantages compared to pcre. I certainly need to wade through the Emacs regex manual every time I want to do slightly more advanced regex matching, followed by lots of testing.
>
> When using regexes in regular editing (as opposed to elisp programming) it's even worse.
>
> I'm most definitely in favour of pcre.

Hello Elias,

Of course you should write "-?[0-9]+" when you need it! And for interactive use -- search-and-replace, say -- the conventional notations are not bad, since they are compact to write, you have the meaning all in your head anyway, and nobody is going to look at it later on.

Where rx shines is for the complex ones. I have written page-long regexps in Perl and Python, and despite the fact that both languages permit a "structured" regexp layout, they do not come close to rx when it counts: rx can be read, understood, maintained, evolved, and composed far better, and with fewer mistakes.

I agree that the POSIX notation is probably better than the old-style version in Emacs since the former tends to be a tad lighter in backslashes. Some languages -- OCaml, Python, etc. -- have some form of string literal that avoids the need to escape backslashes, but fundamentally, regexps are not strings but an algebraic notation with values and operators, and deserve some kind of higher language-level support. Larry Wall understood that.

So I suggest you give rx a go next time you need to write a complicated regexp in Elisp. If you still find it too verbose, you can use short keywords, like `+' or `1+' instead of `one-or-more'. You can even speak a hybrid dialect by injecting little regexp strings inside a big rx expression with the `(regexp ...)' syntax! Take a look at the big `gnu' matcher in compile.el (around line 281) to see what that looks like.

Careful here -- rx is addictive, and you may very well come to use it more and more.
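
To give a taste of the hybrid dialect, here is a small sketch (not taken from compile.el; the variable name and the "name = number" format are made up for illustration). It mixes short rx keywords with a hand-written fragment injected through `(regexp ...)':

```elisp
(require 'rx)

;; Hypothetical example: match lines such as "retry_count = -3".
(defconst my-assignment-regexp
  (rx line-start
      (group (+ (any "a-z" "_")))   ; group 1: the name, in rx notation
      (* space) "=" (* space)
      (group (regexp "-?[0-9]+"))   ; group 2: a conventional regexp string
      line-end)
  "Regexp for a simple integer assignment (illustration only).")

(defun my-assignment-value (line)
  "Return the integer assigned in LINE, or nil if LINE does not match."
  (when (string-match my-assignment-regexp line)
    (string-to-number (match-string 2 line))))
```

The compact fragment stays compact, while the overall structure and the group numbering remain readable.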


Re: modern regexes in emacs

Philippe Vaucher

> Of course you should write "-?[0-9]+" when you need it! And for interactive use -- search-and-replace, say -- the conventional notations are not bad, since they are compact to write, you have the meaning all in your head anyway, and nobody is going to look at it later on.

I think the purpose of this thread is to ask for Emacs to support PCRE regexps in commands like `query-replace-regexp`.

Would this even be possible? I can imagine a whole lot of packages breaking if the regexp syntax changed, and changing it just for the user input in interactive functions looks a bit sketchy.

Re: modern regexes in emacs

Clément Pit-Claudel
On 15/02/2019 08.42, Philippe Vaucher wrote:
> Would this even be possible? I can imagine a whole lot of packages breaking if the regexp syntax changed, and changing it just for the user input in interactive functions looks a bit sketchy.

We could just add a special tag at the beginning of a regexp to indicate that it's a pcre regexp; something like this maybe? (re-search-forward "\\(?pcre:\\)…[pcre regexp goes here]…").  This form is currently a syntax error, so there would be no ambiguity, and we could define a (pcre …) macro so that you could write (re-search-forward (pcre "…[pcre regexp goes here]…")) instead.  Alternatively, we could use an explicit tag, something like (re-search-forward (cons 'pcre "…[pcre regexp goes here]…")).

For interactive functions, I imagine you'd have a defcustom with a preferred regexp dialect.

Re: modern regexes in emacs

Eli Zaretskii
In reply to this post by Philippe Vaucher
> From: Philippe Vaucher <[hidden email]>
> Date: Fri, 15 Feb 2019 14:42:43 +0100
> Cc: emacs-devel <[hidden email]>, Jay Kamat <[hidden email]>,
> Elias Mårtenson <[hidden email]>,
> "Perry E. Metzger" <[hidden email]>
>
> I think the purpose of this thread is to ask for emacs to support PCRE regexpes in commands like
> `query-replace-regexp`.
>
> Would this even be possible? I can imagine a whole lot of packages breaking if the regexp syntax changed,
> and changing it just for the user input in interactive functions looks a bit sketchy.

It should be possible if we introduce new functions for PCRE, or if we
mark PCRE regexps in some special way, like put a special text
property on the string.

Re: modern regexes in emacs

Philippe Vaucher
In reply to this post by Clément Pit-Claudel
> > Would this even be possible? I can imagine a whole lot of packages breaking if the regexp syntax changed, and changing it just for the user input in interactive functions looks a bit sketchy.
>
> We could just add a special tag at the beginning of a regexp to indicate that it's a pcre regexp; something like this maybe? (re-search-forward "\\(?pcre:\\)…[pcre regexp goes here]…").  This form is currently a syntax error, so there would be no ambiguity, and we could define a (pcre …) macro so that you could write (re-search-forward (pcre "…[pcre regexp goes here]…")) instead.  Alternatively, we could use an explicit tag, something like (re-search-forward (cons 'pcre "…[pcre regexp goes here]…")).
>
> For interactive functions, I imagine you'd have a defcustom with a preferred regexp dialect.

I like where this is going. Between that and Eli's suggestion of a special text property, we have plenty of ways to implement it so that it plays nicely with the existing code.

So far 3 proposals:
  • Regexps are always strings, with "\\(?pcre:\\)" as part of the regexp
    • when the string is displayed you need to scan the beginning to see that it is a PCRE regexp
    • no separation between the regexp and its kind
  • Regexps are strings (Emacs regexps) or conses whose car is a symbol naming their kind
    • when the argument is displayed you see immediately whether it's an Emacs regexp or one using another engine
    • the regexp is clearly separated from its kind, probably facilitating conversions
    • seems more "open", in the sense we can easily imagine new types ('emacs, 'pcre, 'rx, 'sed, 'vim-verymagic, etc.)
  • Special text property on the string
    • Not immediately visible that it is a PCRE regexp
    • Harder to manipulate?
Given this I'm in favor of the 2nd option, but maybe I missed some points.
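
To make the 2nd option a bit more concrete, here is a rough sketch of what the dispatch could look like. Everything prefixed with `my-' is hypothetical and nothing like it exists in Emacs today; I only registered the kinds `emacs' and `rx' because both can be converted with built-in functions, and a `pcre' entry would need a real translator behind it.

```elisp
(require 'rx)

;; Hypothetical table mapping a regexp kind to a function that converts
;; the payload to a traditional Emacs regexp string.
(defvar my-regexp-converters
  (list (cons 'emacs #'identity)
        (cons 'rx (lambda (form) (rx-to-string form t))))
  "Alist of (KIND . CONVERTER) for tagged regexps (illustration only).")

(defun my-regexp-to-emacs (spec)
  "Return SPEC as an Emacs regexp string.
SPEC is either a string, taken as-is, or a cons (KIND . PAYLOAD)."
  (cond
   ((stringp spec) spec)
   ((consp spec)
    (let ((converter (cdr (assq (car spec) my-regexp-converters))))
      (unless converter
        (error "Unknown regexp kind: %S" (car spec)))
      (funcall converter (cdr spec))))
   (t (signal 'wrong-type-argument (list 'stringp spec)))))

(defun my-re-search-forward (spec &optional bound noerror count)
  "Like `re-search-forward', but SPEC may also be a tagged cons."
  (re-search-forward (my-regexp-to-emacs spec) bound noerror count))
```

With this, (my-re-search-forward "foo") behaves as before, and (my-re-search-forward '(rx . (seq (? "-") (+ digit)))) goes through the converter first.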

Philippe

Re: modern regexes in emacs

Clément Pit-Claudel
On 15/02/2019 10.03, Philippe Vaucher wrote:
> Given this I'm in favor of the 2nd option, but maybe I missed some points.

Thinking more about this, there is one non-trivial issue: concatenation.  It's common for code in Emacs to take a regexp, assume it's a string, and do something like (concat "\\(" some-regexp-var "\\|" some-other-regexp-var "\\)").

Solution 1 could be tweaked to wrap the whole regexp: "\\(?pcre:…[pcre regexp here]…\\)", and so could solution 3 (a text property spanning the whole length of the string), but solution 2 won't work well here.

Not to mention the fact that if the regexps are matched by different engines, we now have to make these work together :/
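
Here is a small sketch of the concatenation problem. The names `my-either', `my-cons-result', `my-prop-result' and the property `my-regexp-syntax' are made up for the illustration; only `concat', `propertize' and `get-text-property' are real.

```elisp
;; The usual idiom, which assumes both regexps are plain strings.
(defun my-either (re1 re2)
  "Return a regexp matching RE1 or RE2."
  (concat "\\(" re1 "\\|" re2 "\\)"))

;; Solution 2: a (KIND . REGEXP) cons is not a string, so the existing
;; code signals an error instead of building a regexp.
(defvar my-cons-result
  (condition-case err
      (my-either (cons 'pcre "a|b") "c")
    (error (car err)))
  "Error symbol signaled when a tagged cons reaches `concat'.")

;; Solution 3: `concat' keeps text properties, so the tag survives, but
;; it now covers only part of the combined string.
(defvar my-prop-result
  (let ((s (my-either (propertize "a|b" 'my-regexp-syntax 'pcre) "c")))
    (list (get-text-property 0 'my-regexp-syntax s)   ; the leading "\\("
          (get-text-property 2 'my-regexp-syntax s))) ; the "a" of "a|b"
  "Property values before and inside the tagged fragment.")
```

So the text property does survive, but the result is a single string in which only some characters are marked as pcre, which is exactly the mixed-engine situation.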

Clément.

Re: modern regexes in emacs

Perry E. Metzger
In reply to this post by Eli Zaretskii
On Fri, 15 Feb 2019 16:18:12 +0200 Eli Zaretskii <[hidden email]> wrote:

> > I think the purpose of this thread is to ask for emacs to support
> > PCRE regexpes in commands like `query-replace-regexp`.
> >
> > Would this even be possible? I can imagine a whole lot of
> > packages breaking if the regexp syntax changed, and changing it
> > just for the user input in interactive functions looks a bit
> > sketchy.  
>
> It should be possible if we introduce new functions for PCRE, or if
> we mark PCRE regexps in some special way, like put a special text
> property on the string.

I think the right thing is to introduce new functions for new-style
regexps that parallel the old ones, and to allow users to bind things
like the regexp flavors of isearch to the new-style versions if they
wish.

We can decide if we want the new-style versions to be the default
search bindings at some distant date.

One could also very slowly replace use of old-style regexp functions
with the new-style regexp functions in lisp code, but that could be
done over many many years if desired.

Perry
--
Perry E. Metzger [hidden email]

Re: modern regexes in emacs

Stefan Monnier
>> It should be possible if we introduce new functions for PCRE, or if
>> we mark PCRE regexps in some special way, like put a special text
>> property on the string.

A simpler option is for Elisp users to write (pcre "foo") where `pcre`
is a function that converts to Emacs's own format.

I think it would make a lot of sense for Emacs's search functions to
accept other kinds of search specifications than regular expressions
represented as strings, e.g. to also accept precompiled regexps (or
NFAs/DFAs), so `pcre` could also return one of those representations
if/when support for it is added.


        Stefan


Re: modern regexes in emacs

Mattias Engdegård-2
In reply to this post by Eli Zaretskii
On 15 Feb 2019, at 15:18, Eli Zaretskii <[hidden email]> wrote:
>
> It should be possible if we introduce new functions for PCRE, or if we
> mark PCRE regexps in some special way, like put a special text
> property on the string.

It would be easier if those who ask for PCRE would say exactly what they want:

(1) The syntax of PCRE -- | () {} instead of \| \(\) \{\} etc -- but restricted to the set of features of the Emacs regexp engine.
(2) The features of PCRE not present in Emacs regexps. Which ones, exactly? Lookbehind assertions? Atomic groups?
(3) PCRE for interactive use only.
(4) PCRE for general Elisp programming.

Locating and wrapping the places that ask for regexps interactively, such as `query-replace-regexp', would permit the interactive regexp syntax to become a simple user customisation -- traditional, PCRE, rx or whatnot. It would be a matter of writing a transformation function, and possibly some syntax highlighting, for each case.

I wouldn't be surprised if 99% of the requests are really about not having to escape |(){} as metacharacters in interactive use.
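
As a rough sketch of case (1) combined with (3), assuming that only the escaping of |(){} needs to change: the function, the option and the reader below are all hypothetical (the `my-' names are not part of Emacs). Bracket expressions are copied through untouched, and PCRE-only escapes such as \d are deliberately not translated.

```elisp
(defun my-pcre-syntax-to-emacs (pcre)
  "Return PCRE with the escaping of ( ) { } | flipped to Emacs style."
  (let ((i 0) (n (length pcre)) (out '()))
    (while (< i n)
      (let ((c (aref pcre i)))
        (cond
         ;; \( is a literal paren in PCRE, written as a bare ( in Emacs.
         ;; Any other escape sequence passes through unchanged.
         ((and (eq c ?\\) (< (1+ i) n))
          (let ((next (aref pcre (1+ i))))
            (push (if (memq next '(?\( ?\) ?{ ?} ?|))
                      (string next)
                    (string c next))
                  out)
            (setq i (+ i 2))))
         ;; Bracket expression: copy verbatim up to the first closing ].
         ;; A ] right after [ or [^ is taken as a literal member.
         ((eq c ?\[)
          (let ((j (1+ i)))
            (when (and (< j n) (eq (aref pcre j) ?^)) (setq j (1+ j)))
            (when (and (< j n) (eq (aref pcre j) ?\])) (setq j (1+ j)))
            (while (and (< j n) (not (eq (aref pcre j) ?\])))
              (setq j (1+ j)))
            (setq j (min n (1+ j)))
            (push (substring pcre i j) out)
            (setq i j)))
         ;; Bare metacharacter: Emacs wants it backslashed.
         ((memq c '(?\( ?\) ?{ ?} ?|))
          (push (string ?\\ c) out)
          (setq i (1+ i)))
         (t
          (push (string c) out)
          (setq i (1+ i))))))
    (apply #'concat (nreverse out))))

(defcustom my-interactive-regexp-syntax 'emacs
  "Syntax assumed for regexps typed at interactive prompts (hypothetical)."
  :type '(choice (const emacs) (const pcre))
  :group 'matching)

(defun my-read-regexp (prompt)
  "Read a regexp with PROMPT and return it in Emacs syntax."
  (let ((input (read-regexp prompt)))
    (if (eq my-interactive-regexp-syntax 'pcre)
        (my-pcre-syntax-to-emacs input)
      input)))
```

For example, (my-pcre-syntax-to-emacs "(foo|bar){2}") returns the same string as the Lisp literal "\\(foo\\|bar\\)\\{2\\}". Commands that prompt for regexps would then call something like `my-read-regexp' rather than `read-regexp' directly.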


Re: modern regexes in emacs

Perry E. Metzger
On Fri, 15 Feb 2019 17:24:18 +0100 Mattias Engdegård
<[hidden email]> wrote:

> 15 feb. 2019 kl. 15.18 skrev Eli Zaretskii <[hidden email]>:
> >
> > It should be possible if we introduce new functions for PCRE, or
> > if we mark PCRE regexps in some special way, like put a special
> > text property on the string.  
>
> It would be easier if those who ask for PCRE would say exactly what
> they want:
>
> (1) The syntax of PCRE -- | () {} instead of \| \(\) \{\} etc --
> but restricted to the set of features of the Emacs regexp engine.

Modern syntax is the main one.

> (2) The features of PCRE not present in Emacs regexps. Which ones,
> exactly? Lookbehind assertions? Atomic groups?

I'm not particularly interested in those.

> (3) PCRE for interactive use only.
> (4) PCRE for general Elisp programming.

The old style syntax is repulsive. I think we should make it possible
to slowly switch over to the syntax everyone using regexps has gotten
used to over the last 30 years or so. BREs in the style Emacs has
been using have been obsolete for longer than many Emacs users have
been alive.

> Locating and wrapping the places that ask for regexps
> interactively, such as `query-replace-regexp', would permit the
> interactive regexp syntax to become a simple user customisation --
> traditional, PCRE, rx or whatnot. It would be a matter of writing a
> transformation function, and possibly some syntax highlighting, for
> each case.
>
> I wouldn't be surprised if 99% of the requests are really about not
> having to escape |(){} as metacharacters in interactive use.

No, that's a lot of my complaint. I can't even remember what the
correct syntax is half the time.

Anyway, I recommend Eli's approach. We create a parallel set of
modernized syntax functions, and people can slowly adopt them.

Perry
--
Perry E. Metzger [hidden email]

Re: modern regexes in emacs

Alan Mackenzie
Hello, Perry.

On Fri, Feb 15, 2019 at 11:47:28 -0500, Perry E. Metzger wrote:
> On Fri, 15 Feb 2019 17:24:18 +0100 Mattias Engdegård
> <[hidden email]> wrote:
> > 15 feb. 2019 kl. 15.18 skrev Eli Zaretskii <[hidden email]>:

> > > It should be possible if we introduce new functions for PCRE, or
> > > if we mark PCRE regexps in some special way, like put a special
> > > text property on the string.  

> > It would be easier if those who ask for PCRE would say exactly what
> > they want:

> > (1) The syntax of PCRE -- | () {} instead of \| \(\) \{\} etc --
> > but restricted to the set of features of the Emacs regexp engine.

> Modern syntax is the main one.

Such use of "modern" always gets on my nerves.  "Modern" is not the same
as "good", and likely has a very weak correlation with it.  Why aren't we
all using "modern" editors, for example?

> > (2) The features of PCRE not present in Emacs regexps. Which ones,
> > exactly? Lookbehind assertions? Atomic groups?

> I'm not particularly interested in those.

That would be the sole reason for me for any switch.

> > (3) PCRE for interactive use only.
> > (4) PCRE for general Elisp programming.

> The old style syntax is repulsive.

I disagree.  But that's not important.  What's important is to have a
standard invariable regexp notation, otherwise confusion and unwanted
unforeseen nastinesses will occur.

> I think we should make it possible to slowly switch over to the syntax
> everyone using regexps has gotten used to over the last 30 years or so.
> BREs in the style Emacs has been using have been obsolete for longer
> than many Emacs users have been alive.

They're not obsolete: they're used in grep, sed, and in Emacs.

There are several different standards for writing regexps, all of
approximately the same age.  None is better than any other (aside from
extra facilities available in some versions).

This seems to me to be the same argument as that proposing that Emacs
should change its key bindings to match those of other programs, because
"everybody" knows those other bindings.

> > Locating and wrapping the places that ask for regexps
> > interactively, such as `query-replace-regexp', would permit the
> > interactive regexp syntax to become a simple user customisation --
> > traditional, PCRE, rx or whatnot. It would be a matter of writing a
> > transformation function, and possibly some syntax highlighting, for
> > each case.

Exactly.  And then we've got 10 to 20 years of confusion, with several
mutually incompatible regexp notations competing for attention in the
same Emacs.  I think this would be a thoroughly bad idea.

> > I wouldn't be surprised if 99% of the requests are really about not
> > having to escape |(){} as metacharacters in interactive use.

> No, that's a lot of my complaint. I can't even remember what the
> correct syntax is half the time.

I don't suffer that difficulty in Emacs (though I sometimes do in grep,
egrep, sed and AWK, all of which have slightly different regexps).  But I
would begin to suffer it if there started to be a mixture of incompatible
regexp notations in Emacs sources.  Let's keep things simple.

> Anyway, I recommend Eli's approach. We create a parallel set of
> modernized syntax functions, and people can slowly adopt them.

I suggest we retain our current regexp notation, together with compatible
tools, as the sole way of writing regexps in Emacs.  This notation is not
all that bad, and it is thoroughly documented and well tested.  It's the
approach which will cause the least confusion.  It works.

> Perry
> --
> Perry E. Metzger [hidden email]

--
Alan Mackenzie (Nuremberg, Germany).

RE: modern regexes in emacs

Drew Adams
> > Modern syntax is the main one.
>
> Such use of "modern" always gets on my nerves.  "Modern" is not the same
> as "good", and likely has a very weak correlation with it.

Not to mention that "modern" has been applied to the latest fashion, ephemeral or not, for at least 100 years.  Today's modernista is tomorrow morning's has-been, but s?he sometimes continues to tout the same old-fashioned modernisms.

There's absolutely nothing new about labeling something "modern" (or "old-fashioned", for that matter).  Nothing new about "modern".

> Why aren't we all using "modern" editors, for example?

Why indeed?

Headline: "Users of Anachronistic Editor Emacs Go 'Modern'!"

> > I think we should make it possible to slowly switch over to the syntax
> > everyone using regexps has gotten used to over the last 30 years or so.
> > BREs in the style Emacs has been using have been obsolete for longer
> > than many Emacs users have been alive.
>
> They're not obsolete: they're used in grep, sed, and in Emacs.
>
> There are several different standards for writing regexps, all of
> approximately the same age.  None is better than any other (aside from
> extra facilities available in some versions).

But surely some are "modern" and others are "obsolete", Alan. ;-)

(What's the equivalent of L'Academie Francaise for things technical?)

Emacs itself has been obsolete for longer than many Emacs users have been alive.  Emacs is dead.  Long live Emacs.

> This seems to me to be the same argument as that proposing that Emacs
> should change its key bindings to match those of other programs, because
> "everybody" knows those other bindings.

Emacs key bindings have been obsolete longer than many Emacs users have been alive.  Please remember this.
