Saving match data

Philipp Stephani
Hi,

the Elisp manual (section "The Match Data") says:

"Notice that all functions are allowed to overwrite the match data
unless they’re explicitly documented not to do so."

I think this statement is surprising and puts unnecessary burden on Elisp programmers. The usual expectation is that global state is *not* modified unless explicitly specified. Taken literally, Elisp programmers need to surround even calls to `car' with `save-match-data' because the documentation of `car' doesn't specify that it doesn't change the match data. How about changing the statement to

"Notice that no functions are allowed to overwrite the match data unless they're explicitly documented to do so."

and then clean up existing documentation and add `save-match-data' where appropriate.

Philipp
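
To make the concern concrete, here is a minimal sketch of how an innocent-looking call can clobber the match data between a match and its use. The helper `my-helper-that-searches' and both `my-first-word-*' functions are made up for illustration; only `string-match', `match-string' and `save-match-data' are real.

```emacs-lisp
;; Hypothetical helper: nothing in its name hints that it searches.
(defun my-helper-that-searches (s)
  "Return non-nil if S contains a run of the letter o."
  (string-match "o+" s))

(defun my-first-word-unsafe (s)
  "Try to return the first word of S; broken on purpose."
  (when (string-match "\\`\\(\\w+\\)" s)
    ;; This call replaces the global match data with its own.
    (my-helper-that-searches "foo")
    ;; Group 1 now refers to the helper's search, not to ours.
    (match-string 1 s)))

(defun my-first-word-safe (s)
  "Return the first word of S."
  (when (string-match "\\`\\(\\w+\\)" s)
    ;; Caller-side protection, as the manual's statement requires.
    (save-match-data
      (my-helper-that-searches "foo"))
    (match-string 1 s)))
```

Under the current contract the caller has to know, or assume, that the helper searches; under the proposed contract the helper itself would have to carry the `save-match-data'.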

Re: Saving match data

Eli Zaretskii
> From: Philipp Stephani <[hidden email]>
> Date: Wed, 28 Sep 2016 14:01:23 +0000
>
> How about changing the statement to
>
> "Notice that no functions are allowed to overwrite the match data unless they're explicitly documented to do
> so."
>
> and then clean up existing documentation and add `save-match-data' where appropriate.

Can we really make such a promise and keep it?  I sincerely doubt
that.

What do others think?


Re: Saving match data

Stefan Monnier
In reply to this post by Philipp Stephani
> I think this statement is surprising

Agreed.  That's why we have to write it explicitly in the doc ;-)

> and puts unnecessary burden on Elisp programmers.

Experience shows that it's the more efficient choice, tho: both in terms
of CPU efficiency and in terms of programmer efficiency.

So, yes, I think it's definitely necessary.

> Taken literally, Elisp programmers need to surround even calls to
> `car' with `save-match-data' because the documentation of `car'
> doesn't specify that it doesn't change the match data.

Indeed, there's also an expectation that "primitives" don't touch the
match-data.  It would be good to document it, tho it will take some work
to clarify what is meant by "primitive".

> "Notice that no functions are allowed to overwrite the match data unless
> they're explicitly documented to do so."

> and then clean up existing documentation and add `save-match-data' where
> appropriate.

That would imply adding save-match-data *everywhere*.  It's an enormous
amount of work, can't be automated, and comes with only two obvious results:
- our Elisp source code will be significantly larger.
- Emacs will be slower.


        Stefan



Re: Saving match data

Michael Heerdegen
Stefan Monnier <[hidden email]> writes:

> That would imply adding save-match-data *everywhere*.  It's an enormous
> amount of work, can't be automated, and comes with only two obvious
> results:
> - our Elisp source code will be significantly larger.
> - Emacs will be slower.

This sounds crazy.  Sorry about this ignorant question: Why do we use
this model of match data, a global state that is changed as a side
effect in thousands of circumstances?  If you really need the match
data, the common way is to prevent it from being changed.  This discussion
shows that this approach seems to have great downsides, and it doesn't
seem very "lispy" anyway.

Why don't we just let the programmer explicitly save the match data
(e.g. to a symbol) when he is interested in it (FWIW this is already
possible with `match-data' and `save-match-data').  That would be more
transparent and work around this kind of problem.


Michael.


Re: Saving match data

Marcin Borkowski-3
In reply to this post by Philipp Stephani

On 2016-09-28, at 16:01, Philipp Stephani <[hidden email]> wrote:

> Hi,
>
> the Elisp manual (section "The Match Data") says:
>
> "Notice that all functions are allowed to overwrite the match data
> unless they’re explicitly documented not to do so."
>
> I think this statement is surprising and puts unnecessary burden on Elisp

Yes, it is surprising.  Here's a story from three years ago about how
this hit me:
http://mbork.pl/2013-09-18_Selective_replacement_in_LaTeX_documents_(en)

OTOH, I agree with Stefan and Eli that changing that would be a huge
amount of work (and it would make Emacs slower).

OYAH, I think that it is safe to assume that only functions related to
searching actually mess with match data, and one could easily grep the
Emacs sources to make a list of functions which actually change match
data.  Then, we could extend these functions' docstrings (and mentions
in the manual) with a suitable mention.  IOW, I would consider doing
this

> and then clean up existing documentation [...]

but not this:

> [...] add `save-match-data' where appropriate.

Also, one might consider (as hinted in my post) adding `save-match-data'
to _interactive_ functions messing with match data.  This way, the user
would not be surprised as I was back then.  This _might_ be a reasonable
compromise, no?
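
A minimal sketch of what such a compromise could look like for a single command. The command `my-count-todo-lines' is invented for this example; it searches internally but hands the caller's match data back untouched.

```emacs-lisp
(defun my-count-todo-lines ()
  "Count lines containing TODO in the current buffer.
The caller's match data and point are preserved."
  (interactive)
  (save-match-data
    (save-excursion
      (goto-char (point-min))
      (let ((count 0))
        (while (re-search-forward "TODO" nil t)
          (setq count (1+ count))
          ;; Skip the rest of the line so each line counts once.
          (forward-line 1))
        (when (called-interactively-p 'interactive)
          (message "%d TODO line(s)" count))
        count))))
```

The cost is one `save-match-data' per command rather than one per function, which is the point of restricting the change to interactive entry points.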

WDYT?

> Philipp

Best,

--
Marcin Borkowski


Re: Saving match data

Stefan Monnier
In reply to this post by Michael Heerdegen
> This sounds crazy.  Sorry about this ignorant question: Why do we use
> this model of match data: a global state that is changed as a side
> effect in thousands of circumstances.

That's a design choice with which we've lived for ever.
I don't like it either.  I'd much rather have a regexp-matcher which
returns the match data as a return *value*.

I sometimes dream about extending pcase to support something like

    (pcase <e>
      ((re "^\\(?header:[^:]*\\):\\(?value:.*\\)") (cons header value))
      ...)

of course, it would also take multiple branches and merge them into
a single DFA, and in some versions it even brings world peace,


        Stefan
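
For what it's worth, later Emacs releases ship an `rx' pattern for `pcase' that gets part of the way there (no DFA merging, and no world peace). A minimal sketch, assuming an Emacs recent enough to provide that pattern; the function name `my-parse-header-line' is made up, and the pattern still runs `string-match' underneath, so the global match data is modified all the same.

```emacs-lisp
(require 'rx)

(defun my-parse-header-line (line)
  "Return (HEADER . VALUE) for a \"Header:value\" LINE, or nil."
  (pcase line
    ;; `let' inside the `rx' pattern binds a variable to a submatch.
    ((rx bos
         (let header (* (not (any ":"))))
         ":"
         (let value (* nonl)))
     (cons header value))))
```

The submatches arrive as ordinary lexical variables, so the branch body never has to call `match-string' itself.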



Re: Saving match data

Michael Heerdegen
Stefan Monnier <[hidden email]> writes:

> > This sounds crazy.  Sorry about this ignorant question: Why do we
> > use this model of match data: a global state that is changed as a
> > side effect in thousands of circumstances.
>
> That's a design choice with which we've lived for ever.

It's still not too late to build something better on top of it.


> I sometimes dream about extending pcase to support something like
>
>     (pcase <e>
>       ((re "^\\(?header:[^:]*\\):\\(?value:.*\\)") (cons header value))
>       ...)
>
> of course, it would also take multiple branches and merge them into
> a single DFA, and in some versions it even brings world peace,

Ambitious!


Michael.


Re: Saving match data

Eli Zaretskii
In reply to this post by Michael Heerdegen
> From: Michael Heerdegen <[hidden email]>
> Date: Wed, 28 Sep 2016 18:49:37 +0200
> Cc: [hidden email]
>
> Why don't we just let the programmer explicitly save the match data
> (e.g. to a symbol) when he is interested in it (FWIW this is already
> possible with `match-data' and `save-match-data').  That would be more
> transparent and work around this kind of problem.

AFAIU, that's exactly what we do now.


Re: Saving match data

Michael Heerdegen
Eli Zaretskii <[hidden email]> writes:

> > Why don't we just let the programmer explicitly save the match data
> > (e.g. to a symbol) when he is interested in it (FWIW this is already
> > possible with `match-data' and `save-match-data').  That would be more
> > transparent and work around this kind of problem.
>
> AFAIU, that's exactly what we do now.

I notice I wrote `save-match-data' where I meant `set-match-data',
sorry.  `save-match-data' is the thing I would like to avoid: instead of
an implicit state that is altered as side effect, I want to make uses of
match data explicit e.g. by using a variable.
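
As a rough illustration of what is already possible with `match-data' and `set-match-data', the result of a search can be kept in an ordinary variable and reinstalled later. Both function names below are invented for this sketch.

```emacs-lisp
;; Hypothetical helper; it clobbers the match data as a side effect.
(defun my-second-word (s)
  "Return the second word of S, or nil."
  (when (string-match "\\w+ +\\(\\w+\\)" s)
    (match-string 1 s)))

(defun my-first-and-second-word (s)
  "Return a list of the first and second word of S."
  (when (string-match "\\`\\(\\w+\\)" s)
    ;; Keep this search's result in a variable of our own ...
    (let ((saved (match-data))
          (second (my-second-word s)))
      ;; ... and reinstall it before reading from it.
      (set-match-data saved)
      (list (match-string 1 s) second))))
```

This makes the dependency visible in the code, although the accessors still read from the global state once it has been reinstalled.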


Michael.


Re: Saving match data

Eli Zaretskii
> From: Michael Heerdegen <[hidden email]>
> Cc: [hidden email],  [hidden email]
> Date: Wed, 28 Sep 2016 22:15:29 +0200
>
> `save-match-data' is the thing I would like to avoid: instead of
> an implicit state that is altered as side effect, I want to make uses of
> match data explicit e.g. by using a variable.

Why? what would that gain us?


Re: Saving match data

Michael Heerdegen
Eli Zaretskii <[hidden email]> writes:

> > `save-match-data' is the thing I would like to avoid: instead of an
> > implicit state that is altered as side effect, I want to make uses
> > of match data explicit e.g. by using a variable.
>
> Why? what would that gain us?

Not needing to protect the match data from being lost, and not having to
care about which functions may change the match data and which are
guaranteed not to change it.  That would be irrelevant.  You
would bind the match data to a variable directly when it is produced,
and refer to it later by that variable name.  That would be more
transparent, and spare us from either investing a large amount of work
or making Emacs slower.


Michael.


Re: Saving match data

Lars Ingebrigtsen
Michael Heerdegen <[hidden email]> writes:

> You would bind the match data to a variable directly when it is
> produced, and refer to it later by that variable name.

This is how it's done in most languages.

To be explicit in an Emacs context: if you wanted to double the number
in all instances of a<number>, it could look like:

(while (setq match (new-re-search-forward "a\\([0-9]+\\)" nil t))
  (insert (match match 1)))

It would be nice and clean and all, but the current global state thing
we have now is rarely any problem in practice.  You just have to learn
to do no non-trivial operations between the match and where you use the
match data.
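
With the functions that exist today, the doubling described above could be written roughly as follows. This is only a sketch; the function name `my-double-a-numbers' is made up. It follows the rule of thumb from the paragraph above: the match data is read immediately after the search, before anything non-trivial runs.

```emacs-lisp
(defun my-double-a-numbers ()
  "Double the number in every a<number> after point."
  (while (re-search-forward "a\\([0-9]+\\)" nil t)
    ;; Read group 1 right away; the conversion functions used here
    ;; do not search, so the match data is still ours afterwards.
    (let ((n (string-to-number (match-string 1))))
      ;; Replace only group 1, literally and with fixed case.
      (replace-match (number-to-string (* 2 n)) t t nil 1))))
```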

--
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no


Re: Saving match data

Eli Zaretskii
In reply to this post by Michael Heerdegen
> From: Michael Heerdegen <[hidden email]>
> Cc: [hidden email],  [hidden email]
> Date: Wed, 28 Sep 2016 22:42:36 +0200
>
> Eli Zaretskii <[hidden email]> writes:
>
> > > `save-match-data' is the thing I would like to avoid: instead of an
> > > implicit state that is altered as side effect, I want to make uses
> > > of match data explicit e.g. by using a variable.
> >
> > Why? what would that gain us?
>
> Not having the need to protect match data to be lost, and not having to
> care about which functions may change the match data, and which can be
> guaranteed to not change match data.  That would be irrelevant.  You
> would bind the match data to a variable directly when it is produced,
> and refer to it later by that variable name.  That would be more
> transparent, and spare us to invest either a large amount of work, or
> make Emacs slower.

Maybe I misunderstand the proposal, because it sounds very similar to
what we have.  Could you perhaps show an example using the current and
the proposed technique, so that the differences are clear?

Thanks.


Re: Saving match data

Uwe Brauer
In reply to this post by Marcin Borkowski-3

    > On 2016-09-28, at 16:01, Philipp Stephani <[hidden email]> wrote:


    > Yes, it is surprising.  Here's a story from three years ago about how
    > this hit me:
    > http://mbork.pl/2013-09-18_Selective_replacement_in_LaTeX_documents_(en)

There was also a problem with auto-capitalize-mode and AUCTeX; see
https://debbugs.gnu.org/cgi/bugreport.cgi?bug=23180
and its solution:


To keep math-mode constructions like $A_i$ from being expanded to
$A_I$ in AUCTeX, `auto-capitalize-predicate' was set to
(lambda () (not (texmathp))).  However, the `texmathp' function
doesn't save the match data, and it is run from `auto-capitalize',
which is installed in the `after-change-functions' hook.  Such
functions (info "(elisp)Change Hooks") must restore the match data,
otherwise unexpected behavior will appear, as in the case of the
following BUG.....
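
A minimal sketch of the rule for change hooks, not the actual AUCTeX or auto-capitalize fix. `my-after-change-check' is a made-up hook function whose inner search stands in for a call like `texmathp'.

```emacs-lisp
(defun my-after-change-check (_beg _end _len)
  "Hypothetical change hook that searches internally.
The search is wrapped so the match data of whatever code
triggered the change survives."
  (save-match-data
    ;; Stand-in for a predicate that searches, such as `texmathp'.
    (string-match "x" "xyz")))

;; Typical installation, buffer-locally:
;; (add-hook 'after-change-functions #'my-after-change-check nil t)
```

Without the `save-match-data', any code that edits the buffer between a search and its `match-string' call would see the hook's match data instead of its own.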



Re: Saving match data

Michael Heerdegen
In reply to this post by Eli Zaretskii
Eli Zaretskii <[hidden email]> writes:

> Maybe I misunderstand the proposal, because it sounds very similar to
> what we have.  Could you perhaps show an example using the current and
> the proposed technique, so that the differences are clear?

Well, what currently looks like (using some fantasy function names)

#+begin_src emacs-lisp
(progn
  (search-forward "test")
  (save-match-data
    (do-this)
    (maybe-change-match-data-here)
    (do-that))
  (use-the-match-data))
#+end_src

could become something like

#+begin_src emacs-lisp
(with-match-data data
     (search-forward "test")
   (do-this)
   (maybe-change-match-data-here)
   (do-that)
   (use-match-data data
      (use-the-match-data)))
#+end_src


We don't have multiple values in Elisp.  So, `with-match-data' could be
a macro that binds a specified variable in first position to the match
data present after (normally) evaluating the expression in the second
position.  `use-match-data' would be another macro that would change the
match data in its scope to the evaluation result of the sexp in the
first position.
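
One way the two macros described above might be written, as a sketch only. The names carry a `my-' prefix to make clear they are not existing Emacs macros, and the definitions are a guess at the intended semantics.

```emacs-lisp
(defmacro my-with-match-data (var form &rest body)
  "Evaluate FORM, bind VAR to the match data it leaves, then run BODY."
  (declare (indent 2))
  `(let ((,var (progn ,form (match-data))))
     ,@body))

(defmacro my-use-match-data (data &rest body)
  "Run BODY with the global match data temporarily set to DATA."
  (declare (indent 1))
  `(save-match-data
     (set-match-data ,data)
     ,@body))
```

A small usage example with strings instead of buffer searches:

```emacs-lisp
(my-with-match-data data
    (string-match "b\\(c\\)" "abcd")
  (string-match "x" "x")                 ; clobbers the global state
  (my-use-match-data data
    (match-beginning 1)))                ; reads from DATA again
```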


Honestly, I don't really want to propose changing things, either in
this direction or at all... as long as we don't invest too much time in
consolidating what we have now (like ensuring that tons of functions do
not change the match data).


Regards,

Michael.


Re: Saving match data

Eli Zaretskii
> From: Michael Heerdegen <[hidden email]>
> Cc: [hidden email],  [hidden email]
> Date: Sat, 08 Oct 2016 06:02:44 +0200
>
> Well, what currently looks like (using some fantasy function names)
>
> #+begin_src emacs-lisp
> (progn
>   (search-forward "test")
>   (save-match-data
>     (do-this)
>     (maybe-change-match-data-here)
>     (do-that))
>   (use-the-match-data))
> #+end_src
>
> could become something like
>
> #+begin_src emacs-lisp
> (with-match-data data
>      (search-forward "test")
>    (do-this)
>    (maybe-change-match-data-here)
>    (do-that)
>    (use-match-data data
>       (use-the-match-data)))
> #+end_src

This just replaces one macro with 2 different ones, and doesn't make
the code more elegant or readable, or bring any new benefits, does it?
Or am I missing something?

> Honestly, I don't really want to propose to change things, if either in
> this direction, or at all... as long as we don't invest too much time to
> consolidate what we have now (like ensuring tons of function to not
> change match data).

I don't see why we would need to do anything with tons of functions.
The stuff works, doesn't it?


Re: Saving match data

Stefan Monnier
In reply to this post by Michael Heerdegen
> #+begin_src emacs-lisp
> (with-match-data data
>      (search-forward "test")
>    (do-this)
>    (maybe-change-match-data-here)
>    (do-that)
>    (use-match-data data
>       (use-the-match-data)))
> #+end_src

It would still be based on state, with the same risks and issues.
The idea of using

    (let ((data (new-search-forward "test")))
      ...
      (use-match-result data)
      ...)

is that the user is forced to pass `data` hence to explicitly say with
which search result to do the work.
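
A rough sketch of that idea on top of the existing primitives. Both function names are invented here; they stand in for the hypothetical `new-search-forward' and `use-match-result' above and are not an existing API.

```emacs-lisp
(defun my-search-forward-result (regexp)
  "Search forward for REGEXP and return its match data as a value.
Return nil if nothing matches.  The global match data is left alone;
point still moves as with `re-search-forward'."
  (save-match-data
    (when (re-search-forward regexp nil t)
      (match-data))))

(defun my-match-result-string (result n)
  "Return the buffer text of group N recorded in RESULT, or nil."
  (let ((beg (nth (* 2 n) result))
        (end (nth (1+ (* 2 n)) result)))
    (when (and beg end)
      (buffer-substring-no-properties beg end))))
```

Because every accessor takes the result explicitly, a later search elsewhere cannot change what `my-match-result-string' returns.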


        Stefan



Re: Saving match data

Philipp Stephani
In reply to this post by Stefan Monnier
It's been a while, but I'm finally coming back to this.

Stefan Monnier <[hidden email]> wrote on Wed, 28 Sep 2016 at 18:12:

> > I think this statement is surprising
>
> Agreed.  That's why we have to write it explicitly in the doc ;-)

Could we also write it in the docstrings of the match-related functions (match-beginning etc.)? I guess people are more likely to read those than the manual.

> > and puts unnecessary burden on Elisp programmers.
>
> Experience shows that it's the more efficient choice, tho: both in terms
> of CPU efficiency and in terms of programmer efficiency.

I disagree, but it's probably too late to change the contract now.

> > Taken literally, Elisp programmers need to surround even calls to
> > `car' with `save-match-data' because the documentation of `car'
> > doesn't specify that it doesn't change the match data.
>
> Indeed, there's also an expectation that "primitives" don't touch the
> match-data.  It would be good to document it, tho it will take some work
> to clarify what is meant by "primitive".

At least all functions that are side-effect-free or pure (in the sense of byte-opt) are trivially in this category, so we could amend the help texts of these functions automatically.
(Looking at that list, I'm wondering why so few functions are marked as pure - e.g. even `eq' is apparently not pure?)

> > "Notice that no functions are allowed to overwrite the match data unless
> > they're explicitly documented to do so."
> >
> > and then clean up existing documentation and add `save-match-data' where
> > appropriate.
>
> That would imply adding save-match-data *everywhere*.  It's an enormous
> amount of work, can't be automated,

Unfortunately yes.

> and comes with only two obvious results:
> - our Elisp source code will be significantly larger.
> - Emacs will be slower.

It would also come with the obvious result of making the Emacs function contracts much clearer because they wouldn't modify global state any more.

Re: Saving match data

Stefan Monnier
>> > I think this statement is surprising
>> Agreed.  That's why we have to write it explicitly in the doc ;-)
> Could we also write it in the docstrings of the match-related functions
> (match-beginning etc.)? I guess people are more likely to read those than
> the manual.

That might work, yes (tho in my experience, Elisp coders don't read
docstrings nearly as much as I would have expected).

>> Indeed, there's also an expectation that "primitives" don't touch the
>> match-data.  It would be good to document it, tho it will take some work
>> to clarify what is meant by "primitive".
> At least all functions that are side-effect-free or pure (in the sense of
> byte-opt) are trivially in this category, so we could amend the help texts
> of these functions automatically.

Indeed.

> (Looking at that list, I'm wondering why so few functions are marked as
> pure - e.g. even `eq' is apparently not pure?)

I don't think there's a good reason for that.  Just lack of need so far.
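
For readers who want to look at those lists, the byte-compiler records the information as symbol properties, so it can be queried directly. A small sketch, assuming the properties are named `side-effect-free' and `pure' and are in place once byte-opt is loaded; the helper name is made up.

```emacs-lisp
(require 'byte-opt)

(defun my-declared-side-effect-free-p (function)
  "Return non-nil if FUNCTION is marked side-effect-free or pure.
This only reads the byte-compiler's annotations; it proves nothing
about functions that carry no annotation."
  (or (function-get function 'side-effect-free)
      (function-get function 'pure)))
```

For instance, `(my-declared-side-effect-free-p 'car)' should be non-nil, while a searching function such as `re-search-forward' carries no such mark.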

> It would also come with the obvious result of making the Emacs function
> contracts much clearer because they wouldn't modify global state any more.

Some still would (as long as the match-data is a global state).


        Stefan


Re: Saving match data

Philipp Stephani


Stefan Monnier <[hidden email]> wrote on Fri, 16 June 2017 at 21:53:

> >> Indeed, there's also an expectation that "primitives" don't touch the
> >> match-data.  It would be good to document it, tho it will take some work
> >> to clarify what is meant by "primitive".
> > At least all functions that are side-effect-free or pure (in the sense of
> > byte-opt) are trivially in this category, so we could amend the help texts
> > of these functions automatically.
>
> Indeed.


That's at least easy enough to do (patch attached). 

Attachment: 0001-Say-that-side-effect-free-functions-don-t-change-the-m.txt (1K)