Lisp primitives and their calling of the change hooks


Alan Mackenzie
Hello, Emacs.

I've just had an interesting few days investigating our primitives'
calling of before-change-functions and after-change-functions.

In an ideal world, each primitive would call each of those hooks exactly
once.  However, we're not in an ideal world, and there are primitives
which perform several (or many) distinct changes (e.g. transpose-regions
or subst-char-in-region) where a single pair of hook calls wouldn't make
sense.

TL;DR: There are several, but not many, primitives where
before-change-functions doesn't match after-change-functions.

Here is how I went about this:
1. Extract a list of external functions in insdel.c from the section in
lisp.h which declares them.  Form a regexp from this list, and convert it
into a grep regexp (by replacing '?' with '\?').

2. grep -l *.c with this regexp.  This gave these files:
    buffer.c
    callproc.c
    casefiddle.c
    cmds.c
    coding.c
    decompress.c
    editfns.c
    emacs.c
    fileio.c
    fns.c
    indent.c
    insdel.c
    print.c
    process.c
    search.c
    textprop.c
    xdisp.c
    xml.c

3. Using GNU cflow, a utility which creates call graphs for C files,
create a reverse call graph (i.e. an "is called by" graph) for the 18 C
files.

4. Analyse this graph with an elisp script to find all functions which,
directly or indirectly, call signal_before_change or signal_after_change.
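
Step 4 amounts to a reachability search over the reversed call graph.  A
minimal sketch follows.  It is not the script used for this survey: the
name `my-transitive-callers' is hypothetical, and the graph is assumed to
have been parsed already from the cflow output into an alist (that
parsing is not shown).

```elisp
(defun my-transitive-callers (called-by roots)
  "Return every function that directly or indirectly calls one of ROOTS.
CALLED-BY is an alist of (CALLEE . CALLERS), all names being strings.
ROOTS is a list of callee names to start from."
  (let ((seen '())
        (queue roots))
    (while queue
      (let ((fn (pop queue)))
        ;; Visit each caller once, so cycles in the graph terminate.
        (dolist (caller (cdr (assoc fn called-by)))
          (unless (member caller seen)
            (push caller seen)
            (push caller queue)))))
    seen))

;; Toy graph with made-up edges, purely for illustration.
(sort (my-transitive-callers
       '(("signal_after_change" "helper_a" "Ffoo")
         ("helper_a" "Fbar"))
       '("signal_after_change"))
      #'string<)
;; => ("Fbar" "Ffoo" "helper_a")
```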

5. Filter this list of functions to leave only the lisp primitives (i.e.
functions starting with "F"), and convert to Lisp names.  Edit this list
by hand to remove those primitives which don't change the buffer (most of
them were removed).  This left the following list:
    (add-face-text-property)
    (add-text-properties)
    (base64-decode-region)
    (base64-encode-region)
    (capitalize-region)

    (capitalize-word)
    (delete-and-extract-region)
    (delete-char)
    (delete-field)
    (delete-region)

    (downcase-region)
    (downcase-word)
    (erase-buffer)
    (indent-to)
    (insert)

    (insert-and-inherit)
    (insert-before-markers)
    (insert-buffer-substring)
    (insert-byte)
    (insert-char)

    (insert-file-contents)
    (move-to-column)
    (princ)
    (print)
    (put-text-property)

    (remove-list-of-text-properties)
    (remove-text-properties)
    (replace-buffer-contents)
    (replace-match)
    (self-insert-command)

    (set-buffer-multibyte)
    (set-text-properties)
    (subst-char-in-region)
    (terpri)
    (translate-region-internal)

    (transpose-regions)
    (upcase-initials-region)
    (upcase-region)
    (upcase-word)
    (write-char)

    (zlib-decompress-region)
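
The name conversion in step 5 follows the usual DEFUN naming convention:
drop the leading "F" and turn underscores into hyphens.  A small sketch,
with a hypothetical helper name.  It assumes the convention holds for
every primitive; the authoritative Lisp name is the string given as the
first argument of DEFUN in the C source.

```elisp
(defun my-c-name-to-lisp-name (c-name)
  "Convert a C name such as \"Fdelete_region\" to \"delete-region\".
Return nil if C-NAME does not start with \"F\"."
  (when (string-prefix-p "F" c-name)
    ;; Strip the leading F, then map each underscore to a hyphen.
    (replace-regexp-in-string "_" "-" (substring c-name 1))))

(my-c-name-to-lisp-name "Fdelete_region")   ; => "delete-region"
```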

6. Write and run a script which executes each of these primitives whilst
counting the number of times it invokes before-change-hooks and
after-change-hooks.  Output messages where these numbers aren't 1 and 1.
This gave the following:
    Primitive                  b-c-f calls    a-c-f calls
    base64-encode-region            2              2
    base64-decode-region            2              1
    insert-file-contents            2              2
    move-to-column-1                3              3       ; with the &optional
                                                           ; argument, which untabifies.
    replace-buffer-contents       123            123
    set-buffer-multibyte            0              0       ; May have done nothing
    translate-region-internal       1            146
    transpose-regions               2              1
    upcase-initials-region          1              0[*]
    upcase-region                   1              0[*]
    zlib-decompress-region          0              0
    erase-buffer                    0              0

[*] In upcase-... there weren't any lower case characters in the buffer
at the time.

This list is incomplete.  There were one or two other primitives which
triggered messages which I didn't write down, and now cannot reproduce.
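
A harness along the lines of step 6 might look like the following.  This
is a sketch, not the script that produced the table above:
`my-count-change-hooks' is a hypothetical name, and the counts a given
primitive produces depend on the Emacs version and on the buffer
contents at the time.

```elisp
(defun my-count-change-hooks (text thunk)
  "Insert TEXT into a temporary buffer, then call THUNK there.
Return a cons (B-C-F . A-C-F) giving how often each change hook ran
while THUNK was executing."
  (let ((before 0)
        (after 0))
    (with-temp-buffer
      ;; Set up the buffer contents before installing the counters,
      ;; so that only THUNK's changes are counted.
      (insert text)
      (add-hook 'before-change-functions
                (lambda (_beg _end) (setq before (1+ before)))
                nil t)
      (add-hook 'after-change-functions
                (lambda (_beg _end _old-len) (setq after (1+ after)))
                nil t)
      (funcall thunk))
    (cons before after)))

;; Example call; compare the result against (1 . 1).
(my-count-change-hooks
 "hello world"
 (lambda () (upcase-region (point-min) (point-max))))
```

Passing the primitive as a closure keeps the harness independent of each
primitive's argument list.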

7. Surprising is that insert-file-contents (which gave so much trouble in
summer 2016) returned balanced counts.  It seems to be mostly the special
purpose primitives, those not widely used in Lisp, which are most likely
to have strange b-c-f and a-c-f counts.  But the case-switching
primitives might be troublesome.  So might transpose-regions.

8. Possibly these things could be more accurately documented in
elisp.info.

--
Alan Mackenzie (Nuremberg, Germany).

Re: Lisp primitives and their calling of the change hooks

Stefan Monnier
> 6. Write and run a script which executes each of these primitives whilst
> counting the number of times it invokes before-change-hooks and
> after-change-hooks.

FWIW, we do not try to make those numbers match (and their begin/end
specs don't necessarily match either).

What we aim to do (i.e. what defines what I would consider as a bug) is
to make sure every a-c-f is preceded by a "covering" b-c-f.  IOW, b-c-f
may be followed by any number of a-c-f (including 0) as long as those
are within the text chunk covered by the b-c-f.
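
The "covering" rule can be checked mechanically.  The sketch below is an
illustration under stated assumptions, not existing Emacs code: all
`my-' names are hypothetical, and the end of the announced region is
shifted by the net size of each change so that several a-c-f calls after
a single b-c-f are compared in consistent buffer positions.

```elisp
(defvar-local my-last-bcf nil
  "Cons (BEG . END) announced by the most recent b-c-f call, or nil.")

(defvar-local my-uncovered-changes nil
  "List of (BEG END OLD-LEN) from a-c-f calls not covered by a b-c-f.")

(defun my-record-bcf (beg end)
  "Remember the region BEG..END announced by `before-change-functions'."
  (setq my-last-bcf (cons beg end)))

(defun my-check-acf (beg end old-len)
  "Record the change BEG END OLD-LEN if the last b-c-f did not cover it."
  (if (and my-last-bcf
           (<= (car my-last-bcf) beg)
           ;; The replaced text occupied BEG..BEG+OLD-LEN.
           (<= (+ beg old-len) (cdr my-last-bcf)))
      ;; Covered: move the announced end by the net size change.
      (setcdr my-last-bcf
              (+ (cdr my-last-bcf) (- (- end beg) old-len)))
    (push (list beg end old-len) my-uncovered-changes)))

;; Usage: install both functions buffer-locally, then inspect
;; `my-uncovered-changes' in that buffer after running a primitive.
;; (add-hook 'before-change-functions #'my-record-bcf nil t)
;; (add-hook 'after-change-functions #'my-check-acf nil t)
```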

Some of your results clearly indicate what I'd consider as bugs.
E.g. `erase-buffer` should call those hooks (unless the buffer was
already empty).  OTOH for upcase-region 1 call to b-c-f and 0 to a-c-f
is acceptable.  For most of the others, a deeper inspection would be
needed to figure out if there's an actual bug or if it's just
a normal occurrence.


        Stefan


Re: Lisp primitives and their calling of the change hooks

Alan Mackenzie
Hello, Stefan.

On Wed, Jan 03, 2018 at 16:51:29 -0500, Stefan Monnier wrote:
> > 6. Write and run a script which executes each of these primitives whilst
> > counting the number of times it invokes before-change-hooks and
> > after-change-hooks.

> FWIW, we do not try to make those numbers match (and their begin/end
> specs don't necessarily match either).

In practice, these numbers match for the vast majority of buffer
changing calls, and they match at 1-1.  They match for all the
"primitive" primitives, which are basically insert, delete, and possibly
change.

These numbers, in an ideal world, would match.  It is only because we
have "non-primitive" primitives (i.e. primitives which perform several
distinct buffer changes) that they don't.

> What we aim to do (i.e. what defines what I would consider as a bug) is
> to make sure every a-c-f is preceded by a "covering" b-c-f.  IOW, b-c-f
> may be followed by any number of a-c-f (including 0) as long as those
> are within the text chunk covered by the b-c-f.

Yes.  That applies, however, only to "compound primitives", not to the
"primitive primitives", insert and delete, which comprise nearly all the
calls in actual use, which are all 1-1.

It is an awkward state of affairs, where after-c-f's have somehow got to
"remember" that they may only be processing part of the change announced
by before-c-f.

It is also not true: insert-file-contents, in circumstances explored in
summer 2016, invokes only a-c-f, not b-c-f.

> Some of your results clearly indicate what I'd consider as bugs.
> E.g. `erase-buffer` should call those hooks (unless the buffer was
> already empty).

I've had a look at the source, and erase-buffer clearly calls the two
hooks.  I can't at the moment see what went wrong.

> OTOH for upcase-region 1 call to b-c-f and 0 to a-c-f is acceptable.

I don't really agree, but that won't change anything.  ;-(

> For most of the others, a deeper inspection would be needed to figure
> out if there's an actual bug or if it's just a normal occurrence.

We know there is a bug in insert-file-contents (See summer 2016).  I
would be surprised indeed if there weren't others, too.

A way to fix them if we were going to (which we're not), would be to
take all the b-c-f and a-c-f calls out of the "compound primitives" and
have the latter effect their actions through calling the "true
primitives".

However, we could improve the documentation of this situation in the
elisp manual.

>         Stefan

--
Alan Mackenzie (Nuremberg, Germany).

Re: Lisp primitives and their calling of the change hooks

Stefan Monnier
>> FWIW, we do not try to make those numbers match (and their begin/end
>> specs don't necessarily match either).
> In practice, these numbers match for the vast majority of buffer
> changing calls, and they match at 1-1.

Yes, as discussed numerous times in the past ;-)

> These numbers, in an ideal world, would match.

Not in my ideal world, no.

> It is also not true:

As mentioned, I'd consider that as a bug.

> insert-file-contents, in circumstances explored in
> summer 2016, invokes only a-c-f, not b-c-f.

And indeed, I considered it a bug and AFAIK this is now fixed.

>> For most of the others, a deeper inspection would be needed to figure
>> out if there's an actual bug or if it's just a normal occurrence.
> We know there is a bug in insert-file-contents (See summer 2016).
                ^^
                was

> I would be surprised indeed if there weren't others, too.

I would too.

> However, we could improve the documentation of this situation in the
> elisp manual.

We currently say:

      Do @emph{not} expect the before-change hooks and the after-change
    hooks be called in balanced pairs around each buffer change.  Also
    don't expect the before-change hooks to be called for every chunk of
    text Emacs is about to delete.  These hooks are provided on the
    assumption that Lisp programs will use either before- or the
    after-change hooks, but not both, and the boundaries of the region
    where the changes happen might include more than just the actual
    changed text, or even lump together several changes done piecemeal.

which is lax enough that any behavior could be argued to be acceptable.
IOW I think it's too lax.  We should probably try and fix it to reflect
the fact that every change should be covered by the last preceding b-c-f
and should be followed by a corresponding call to a-c-f (and this
before the next call to b-c-f).


        Stefan

Re: Lisp primitives and their calling of the change hooks

Alan Mackenzie
Hello, Stefan.

On Thu, Jan 04, 2018 at 13:16:23 -0500, Stefan Monnier wrote:

> As mentioned, I'd consider that [insert-file-contents not calling
> b-c-f] as a bug.

> > insert-file-contents, in circumstances explored in
> > summer 2016, invokes only a-c-f, not b-c-f.

> And indeed, I considered it a bug and AFAIK this is now fixed.

Hey!  It is indeed fixed.  Thanks!  I didn't know that.

[ .... ]

> > I would be surprised indeed if there weren't others, too.

> I would too.

> > However, we could improve the documentation of this situation in the
> > elisp manual.

> We currently say:

>       Do @emph{not} expect the before-change hooks and the after-change
>     hooks be called in balanced pairs around each buffer change.  Also
>     don't expect the before-change hooks to be called for every chunk of
>     text Emacs is about to delete.  These hooks are provided on the
>     assumption that Lisp programs will use either before- or the
>     after-change hooks, but not both, and the boundaries of the region
>     where the changes happen might include more than just the actual
>     changed text, or even lump together several changes done piecemeal.

> which is lax enough that any behavior could be argued to be acceptable.
> IOW I think it's too lax.  We should probably try and fix it to reflect
> the fact that every change should be covered by the last preceding b-c-f
> and should be followed by a corresponding call to a-c-f (and this
> before the next call to b-c-f).

Is that quite right?  The upcase-region call in my test had no a-c-f
call, almost certainly because there were no lower case letters in the
buffer at the time.  From your answers in this thread, I'm thinking that
every primitive-call which could change the buffer will have exactly one
b-c-f and zero or more a-c-f's.

How about something like this to replace that paragraph from the elisp
manual?

    The primitives which atomically insert or delete a contiguous chunk
    of text into or from a buffer will call `before-change-functions'
    and `after-change-functions' in balanced pairs, once for each
    change.  The arguments to these hooks will exactly delimit the
    change being made.  Calls to these primitives comprise the vast bulk
    of buffer changes.

    Other, more complex primitives aim to call `before-change-functions'
    once before making any changes, then to call
    `after-change-functions' zero, one, or several times, depending on
    how many individual changes the primitive makes.  The `BEG' and
    `END' arguments to `before-change-functions' will enclose a region
    in which the individual changes are made, but won't necessarily be
    the minimal such region.  The `BEG', `END', and `OLD-LEN' arguments
    to each successive call of `after-change-functions' will accurately
    delimit the current change.

>         Stefan

--
Alan Mackenzie (Nuremberg, Germany).

Re: Lisp primitives and their calling of the change hooks

Stefan Monnier
>> which is lax enough that any behavior could be argued to be acceptable.
>> IOW I think it's too lax.  We should probably try and fix it to reflect
>> the fact that every change should be covered by the last preceding b-c-f
>> and should be followed by a corresponding call to a-c-f (and this
>> before the next call to b-c-f).
> Is that quite right?

Probably not quite.

> The upcase-region call in my test had no a-c-f call, almost certainly
> because there were no lower case letters in the buffer at the time.

Indeed, there were no changes, so no need to call a-c-f.

> From your answers in this thread, I'm thinking that every
> primitive-call which could change the buffer will have exactly one
> b-c-f and zero or more a-c-f's.

Sounds about right, tho I expect some primitives might just call insert
and delete a few times, thus calling b-c-f several times.

> How about something like this to replace that paragraph from the elisp
> manual?
>
>     The primitives which atomically insert or delete a contiguous chunk
>     of text into or from a buffer will call `before-change-functions'
>     and `after-change-functions' in balanced pairs, once for each
>     change.  The arguments to these hooks will exactly delimit the
>     change being made.  Calls to these primitives comprise the vast bulk
>     of buffer changes.
>
>     Other, more complex primitives aim to call `before-change-functions'
>     once before making any changes, then to call
>     `after-change-functions' zero, one, or several times, depending on
>     how many individual changes the primitive makes.  The `BEG' and
>     `END' arguments to `before-change-functions' will enclose a region
>     in which the individual changes are made, but won't necessarily be
>     the minimal such region.  The `BEG', `END', and `OLD-LEN' arguments
>     to each successive call of `after-change-functions' will accurately
>     delimit the current change.

Looks good to me, thank you.

I think in the case of subst-char-in-region we only call a-c-f one time
(but with tighter bounds than those of the preceding b-c-f) rather than
once per character that's substituted, so maybe "The `BEG', `END', and
`OLD-LEN' arguments to each successive call of `after-change-functions'
will accurately delimit the current change" promises a bit more than we
deliver, although it depends on how we interpret "current change".

In any case, the above is much better than what we have now and I think
it gives a pretty good rendition of our intention.


        Stefan

Re: Lisp primitives and their calling of the change hooks

Eli Zaretskii
In reply to this post by Alan Mackenzie
> Date: Thu, 4 Jan 2018 21:11:54 +0000
> From: Alan Mackenzie <[hidden email]>
> Cc: [hidden email]
>
>     The primitives which atomically insert or delete a contiguous chunk
>     of text into or from a buffer will call `before-change-functions'
>     and `after-change-functions' in balanced pairs, once for each
>     change.  The arguments to these hooks will exactly delimit the
>     change being made.  Calls to these primitives comprise the vast bulk
>     of buffer changes.
>
>     Other, more complex primitives aim to call `before-change-functions'
>     once before making any changes, then to call
>     `after-change-functions' zero, one, or several times, depending on
>     how many individual changes the primitive makes.  The `BEG' and
>     `END' arguments to `before-change-functions' will enclose a region
>     in which the individual changes are made, but won't necessarily be
>     the minimal such region.  The `BEG', `END', and `OLD-LEN' arguments
>     to each successive call of `after-change-functions' will accurately
>     delimit the current change.

How will the reader know to distinguish between these two classes of
primitives?  Without such an ability, the extra accuracy in this text
is not useful.


Re: Lisp primitives and their calling of the change hooks

Alan Mackenzie
Hello, Eli.

On Fri, Jan 05, 2018 at 08:55:20 +0200, Eli Zaretskii wrote:
> > Date: Thu, 4 Jan 2018 21:11:54 +0000
> > From: Alan Mackenzie <[hidden email]>
> > Cc: [hidden email]

> >     The primitives which atomically insert or delete a contiguous chunk
> >     of text into or from a buffer will call `before-change-functions'
> >     and `after-change-functions' in balanced pairs, once for each
> >     change.  The arguments to these hooks will exactly delimit the
> >     change being made.  Calls to these primitives comprise the vast bulk
> >     of buffer changes.

> >     Other, more complex primitives aim to call `before-change-functions'
> >     once before making any changes, then to call
> >     `after-change-functions' zero, one, or several times, depending on
> >     how many individual changes the primitive makes.  The `BEG' and
> >     `END' arguments to `before-change-functions' will enclose a region
> >     in which the individual changes are made, but won't necessarily be
> >     the minimal such region.  The `BEG', `END', and `OLD-LEN' arguments
> >     to each successive call of `after-change-functions' will accurately
> >     delimit the current change.

> How will the reader know to distinguish between these two classes of
> primitives?  Without such an ability, the extra accuracy in this text
> is not useful.

A good point.  Wherein lies the difference, from a programmer's point of
view?  Briefly I think that we have a "complex primitive" when we don't
have an exact match between one b-c-f and one a-c-f.  How about adding
the following paragraph to what comes above:

    The "complex primitive" case can be distinguished from the "atomic
    primitive" case because either the call to `after-change-functions'
    is missing (i.e. there are two consecutive calls to
    `before-change-functions'), or in the first call to
    `after-change-functions', `OLD-LEN' is less than `END' - `BEG' in
    `before-change-functions'.
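
The two conditions in that paragraph translate fairly directly into a
pair of hook functions.  What follows is only a sketch of the proposed
criterion, with hypothetical `my-' names.  It detects the pattern after
the fact and says nothing about which primitive caused it.

```elisp
(defvar-local my-pending-bcf nil
  "Cons (BEG . END) from a b-c-f call not yet followed by a-c-f, or nil.")

(defvar-local my-saw-complex-change nil
  "Non-nil once a hook sequence departed from the simple 1-1 pattern.")

(defun my-classify-bcf (beg end)
  "Note a b-c-f call for BEG..END; flag two consecutive b-c-f calls."
  ;; A pending b-c-f here means its a-c-f call never arrived.
  (when my-pending-bcf
    (setq my-saw-complex-change t))
  (setq my-pending-bcf (cons beg end)))

(defun my-classify-acf (_beg _end old-len)
  "Flag a first a-c-f whose OLD-LEN is less than the announced size."
  (when (and my-pending-bcf
             (< old-len (- (cdr my-pending-bcf) (car my-pending-bcf))))
    (setq my-saw-complex-change t))
  (setq my-pending-bcf nil))
```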

The above leaves unsaid what happens when a "complex primitive" happens
to call b-c-f and a-c-f as though it were an "atomic primitive".  This
doesn't seem important enough to take up the space.

Personally, I think that when we come to rationalise and refactor
insdel.c and related files sometime in the medium future, we should
arrange to have b-c-f and a-c-f called only as "atomic" changes.  There
is no longer any need to optimise the calling of these hooks, and the
irregularities of these optimisations impose an overhead on the use of
these hooks.

--
Alan Mackenzie (Nuremberg, Germany).

Re: Lisp primitives and their calling of the change hooks

Eli Zaretskii
> Date: Fri, 5 Jan 2018 11:41:07 +0000
> Cc: [hidden email], [hidden email]
> From: Alan Mackenzie <[hidden email]>
>
>     The "complex primitive" case can be distinguished from the "atomic
>     primitive" case because either the call to `after-change-functions'
>     is missing (i.e. there are two consecutive calls to
>     `before-change-functions'), or in the first call to
>     `after-change-functions', `OLD-LEN' is less than `END' - `BEG' in
>     `before-change-functions'.
>
> The above leaves unsaid what happens when a "complex primitive" happens
> to call b-c-f and a-c-f as though it were an "atomic primitive".

It also provides no way to know, up front, whether a given primitive
I'm about to call, is one or the other.  IMO, we need some way of
doing that, if we want to document this distinction.

Re: Lisp primitives and their calling of the change hooks

Alan Mackenzie
Hello, Eli.

On Fri, Jan 05, 2018 at 15:00:21 +0200, Eli Zaretskii wrote:
> > Date: Fri, 5 Jan 2018 11:41:07 +0000
> > Cc: [hidden email], [hidden email]
> > From: Alan Mackenzie <[hidden email]>

> >     The "complex primitive" case can be distinguished from the "atomic
> >     primitive" case because either the call to `after-change-functions'
> >     is missing (i.e. there are two consecutive calls to
> >     `before-change-functions'), or in the first call to
> >     `after-change-functions', `OLD-LEN' is less than `END' - `BEG' in
> >     `before-change-functions'.

> > The above leaves unsaid what happens when a "complex primitive" happens
> > to call b-c-f and a-c-f as though it were an "atomic primitive".

> It also provides no way to know, up front, whether a given primitive
> I'm about to call, is one or the other.  IMO, we need some way of
> doing that, if we want to document this distinction.

Do we really need this level of detail?  My idea was to enable users of
b-c-f and a-c-f to predict what they're going to be being hit with.

There are two patterns of handling b/a-c-f, the "atomic" and the
"complex".  My above proposal documents enough for somebody using
b/a-c-f to be able to handle the "atomic" and "complex" uses.

Why does that hacker need to know exactly what each buffer-changing
primitive does, or which falls into which category?  Surely it is enough
that she handle the b/a-c-f calls appropriately.

What am I missing here?

--
Alan Mackenzie (Nuremberg, Germany).

Re: Lisp primitives and their calling of the change hooks

Eli Zaretskii
> Date: Fri, 5 Jan 2018 13:34:48 +0000
> Cc: [hidden email], [hidden email]
> From: Alan Mackenzie <[hidden email]>
>
> On Fri, Jan 05, 2018 at 15:00:21 +0200, Eli Zaretskii wrote:
> > > Date: Fri, 5 Jan 2018 11:41:07 +0000
> > > Cc: [hidden email], [hidden email]
> > > From: Alan Mackenzie <[hidden email]>
>
> > >     The "complex primitive" case can be distinguished from the "atomic
> > >     primitive" case because either the call to `after-change-functions'
> > >     is missing (i.e. there are two consecutive calls to
> > >     `before-change-functions'), or in the first call to
> > >     `after-change-functions', `OLD-LEN' is less than `END' - `BEG' in
> > >     `before-change-functions'.
>
> > > The above leaves unsaid what happens when a "complex primitive" happens
> > > to call b-c-f and a-c-f as though it were an "atomic primitive".
>
> > It also provides no way to know, up front, whether a given primitive
> > I'm about to call, is one or the other.  IMO, we need some way of
> > doing that, if we want to document this distinction.
>
> Do we really need this level of detail?  My idea was to enable users of
> b-c-f and a-c-f to predict what they're going to be being hit with.
>
> There are two patterns of handling b/a-c-f, the "atomic" and the
> "complex".  My above proposal documents enough for somebody using
> b/a-c-f to be able to handle the "atomic" and "complex" uses.
> [...]
> What am I missing here?

Maybe it's me that is missing something.  You first say above that you
want to "enable users of b-c-f and a-c-f to predict what they're going
to be being hit with", which is exactly my concern, but then provide a
recipe that AFAIU only works post-factum, i.e. the user can only know
whether they called an "atomic" or a "complex" primitive by analyzing
the calls to the 2 hooks as result of calling the primitive.  If
that's indeed what you are saying, IMO it's not a useful criterion,
because generally when I read documentation, I shouldn't be required
to write code in order to interpret the documentation.

> Why does that hacker need to know exactly what each buffer-changing
> primitive does, or which falls into which category?  Surely it is enough
> that she handle the b/a-c-f calls appropriately.

How can she handle these calls correctly unless she knows which of the
hooks will be called by a given primitive, and whether these calls
will be balanced?  And if she doesn't need to know that, then why do
we have to tell her these details about the 2 classes of primitives?

IOW, accurate information is only useful if one knows exactly how to
apply it to the practical case in hand.

Re: Lisp primitives and their calling of the change hooks

Alan Mackenzie
Hello, Eli.

On Fri, Jan 05, 2018 at 16:08:31 +0200, Eli Zaretskii wrote:
> > Date: Fri, 5 Jan 2018 13:34:48 +0000
> > Cc: [hidden email], [hidden email]
> > From: Alan Mackenzie <[hidden email]>

> > On Fri, Jan 05, 2018 at 15:00:21 +0200, Eli Zaretskii wrote:
> > > > Date: Fri, 5 Jan 2018 11:41:07 +0000
> > > > Cc: [hidden email], [hidden email]
> > > > From: Alan Mackenzie <[hidden email]>

> > > >     The "complex primitive" case can be distinguished from the "atomic
> > > >     primitive" case because either the call to `after-change-functions'
> > > >     is missing (i.e. there are two consecutive calls to
> > > >     `before-change-functions'), or in the first call to
> > > >     `after-change-functions', `OLD-LEN' is less than `END' - `BEG' in
> > > >     `before-change-functions'.

> > > > The above leaves unsaid what happens when a "complex primitive" happens
> > > > to call b-c-f and a-c-f as though it were an "atomic primitive".

> > > It also provides no way to know, up front, whether a given primitive
> > > I'm about to call, is one or the other.  IMO, we need some way of
> > > doing that, if we want to document this distinction.

> > Do we really need this level of detail?  My idea was to enable users of
> > b-c-f and a-c-f to predict what they're going to be being hit with.

> > There are two patterns of handling b/a-c-f, the "atomic" and the
> > "complex".  My above proposal documents enough for somebody using
> > b/a-c-f to be able to handle the "atomic" and "complex" uses.
> > [...]
> > What am I missing here?

> Maybe it's me that is missing something.  You first say above that you
> want to "enable users of b-c-f and a-c-f to predict what they're going
> to be being hit with", which is exactly my concern, but then provide a
> recipe that AFAIU only works post-factum, i.e. the user can only know
> whether they called an "atomic" or a "complex" primitive by analyzing
> the calls to the 2 hooks as result of calling the primitive.  If
> that's indeed what you are saying, IMO it's not a useful criterion,
> because generally when I read documentation, I shouldn't be required
> to write code in order to interpret the documentation.

I think I understand what you're getting at now: that Lisp hackers will
be using these "complex" primitives in their code, and hence need to
know the b/a-c-f calling details in detail for each such primitive.
I don't think people writing modes will be using the "complex" buffer
changing primitives explicitly, at least not very much.  There are no
such calls of these primitives in CC Mode (as far as I know).

But the Lisp code will need to handle any "complex" primitives the user
throws at it, e.g. upcase-region (C-x C-u).  For this purpose, it is
only necessary to know what sequences of b/a-c-f are foreseen, so as to
be able to handle them.

> > Why does that hacker need to know exactly what each buffer-changing
> > primitive does, or which falls into which category?  Surely it is enough
> > that she handle the b/a-c-f calls appropriately.

> How can she handle these calls correctly unless she knows which of the
> hooks will be called by a given primitive, and whether these calls
> will be balanced?  And if she doesn't need to know that, then why do
> we have to tell her these details about the 2 classes of primitives?

Perhaps my idea of describing the primitives' use of b/a-c-f in the two
categories "atomic" and "complex" would create more confusion than it
would alleviate.  The "atomic" primitives constitute the overwhelming
bulk of those actually called at run time, so it seemed sensible to
describe this common, simple case separately.  Maybe this isn't the
case.

> IOW, accurate information is only useful if one knows exactly how to
> apply it to the practical case in hand.

I thought the proposed text was adequate to instruct hackers how to
write b/a-c-f's to handle _any_ existing primitives.

--
Alan Mackenzie (Nuremberg, Germany).

Re: Lisp primitives and their calling of the change hooks

Stefan Monnier
In reply to this post by Eli Zaretskii
> How will the reader know to distinguish between these two classes of
> primitives?

He won't and shouldn't attempt to (the boundary between those two is an
internal implementation detail that is subject to change).

> Without such an ability, the extra accuracy in this text
> is not useful.

I find it useful in order to explain why naively observing the behavior
may give one the impression that all b-c-f and a-c-f calls are
"balanced".

Maybe the first paragraph should be reworded a bit so it doesn't sound
like a promise of behavior?  How 'bout:

    The vast bulk of buffer changes will call `before-change-functions'
    and `after-change-functions' in balanced pairs, once for each
    change where the arguments to these hooks will exactly delimit the
    change being made.  Yet, hook functions should not rely on this
    being always the case:

    Other, more complex primitives may call `before-change-functions'
    once before making changes and then call `after-change-functions'
    zero, one, or several times, depending on how many individual
    changes the primitive makes.  The `BEG' and `END' arguments to
    `before-change-functions' will enclose a region in which the
    individual changes are made, but won't necessarily be the minimal
    such region.  The `BEG', `END', and `OLD-LEN' arguments to each
    successive call of `after-change-functions' will more accurately
    delimit the current change.
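
One way for a hook function to avoid relying on balanced pairs is to
work from `after-change-functions' alone.  The sketch below, with
hypothetical `my-' names, accumulates a single "dirty" region from the
`BEG', `END', and `OLD-LEN' arguments of each call, and assumes nothing
about how many `before-change-functions' calls preceded it.

```elisp
(defvar-local my-dirty-region nil
  "Cons (BEG . END) spanning all text changed since the last reset, or nil.")

(defun my-note-change (beg end old-len)
  "Extend `my-dirty-region' to include the change BEG END OLD-LEN."
  (setq my-dirty-region
        (if (null my-dirty-region)
            (cons beg end)
          ;; DELTA is the net growth (or shrinkage) caused by this change.
          (let ((delta (- (- end beg) old-len)))
            (cons (min beg (car my-dirty-region))
                  ;; Text after the change moved by DELTA, so the old
                  ;; end moves with it; the new end is whichever of
                  ;; the two positions lies further on.
                  (max end (+ (cdr my-dirty-region) delta)))))))

;; Usage, buffer-locally:
;; (add-hook 'after-change-functions #'my-note-change nil t)
```

Keeping one conservative region trades precision for simplicity: the
region may cover more than the text actually changed, but never less.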


-- Stefan

Re: Lisp primitives and their calling of the change hooks

Alan Mackenzie
Hello, Stefan.

On Fri, Jan 05, 2018 at 11:50:56 -0500, Stefan Monnier wrote:
> > How will the reader know to distinguish between these two classes of
> > primitives?

> He won't and shouldn't attempt to (the boundary between those two is an
> internal implementation detail that is subject to change).

> > Without such an ability, the extra accuracy in this text
> > is not useful.

> I find it useful in order to explain why naively observing the behavior
> may give one the impression that all b-c-f and a-c-f calls are
> "balanced".

> Maybe the first paragraph should be reworded a bit so it doesn't sound
> like a promise of behavior?  How 'bout:

>     The vast bulk of buffer changes will call `before-change-functions'
>     and `after-change-functions' in balanced pairs, once for each
>     change where the arguments to these hooks will exactly delimit the
>     change being made.  Yet, hook functions should not rely on this
>     being always the case:

>     Other, more complex primitives may call `before-change-functions'
>     once before making changes and then call `after-change-functions'
>     zero, one, or several times, depending on how many individual
>     changes the primitive makes.  The `BEG' and `END' arguments to
>     `before-change-functions' will enclose a region in which the
>     individual changes are made, but won't necessarily be the minimal
>     such region.  The `BEG', `END', and `OLD-LEN' arguments to each
>     successive call of `after-change-functions' will more accurately
>     delimit the current change.

I like that, in general.  :-)  It gets rid of the awkward terms "atomic"
and "complex" which were more trouble than they were worth.

Just two tiny amendments:
(i) I think a comma is needed in the first paragraph after "in balanced
pairs, once for each change".

(ii) The "may" at the start of the second paragraph is not wanted.  It
suggests that b-c-f is optional.  Simply "Other, more complex primitives
call `b-c-f' once before ....".

> -- Stefan

--
Alan Mackenzie (Nuremberg, Germany).


Re: Lisp primitives and their calling of the change hooks

Stefan Monnier
> Just two tiny amendments:
> (i) I think a comma is needed in the first paragraph after "in balanced
> pairs, once for each change".

I'll trust you on that.  My punctuation fu is weak.

> (ii) The "may" at the start of the second paragraph is not wanted.  It
> suggests that b-c-f is optional.  Simply "Other, more complex primitives
> call `b-c-f' once before ....".

Sounds good.


        Stefan



Re: Lisp primitives and their calling of the change hooks

Eli Zaretskii
In reply to this post by Stefan Monnier
> From: Stefan Monnier <[hidden email]>
> Cc: Alan Mackenzie <[hidden email]>,  [hidden email]
> Date: Fri, 05 Jan 2018 11:50:56 -0500
>
> I find it useful in order to explain why naively observing the behavior
> may give one the impression that all b-c-f and a-c-f calls are
> "balanced".

We don't normally include such "preemptive" explanations in the
manual.  If the text doesn't say one should expect balanced calls, the
reader has no reason to expect balanced calls.  The current text even
makes a point of saying that explicitly.

>     The vast bulk of buffer changes will call `before-change-functions'
>     and `after-change-functions' in balanced pairs, once for each
>     change where the arguments to these hooks will exactly delimit the
>     change being made.  Yet, hook functions should not rely on this
>     being always the case:
>
>     Other, more complex primitives may call `before-change-functions'
>     once before making changes and then call `after-change-functions'
>     zero, one, or several times, depending on how many individual
>     changes the primitive makes.  The `BEG' and `END' arguments to
>     `before-change-functions' will enclose a region in which the
>     individual changes are made, but won't necessarily be the minimal
>     such region.  The `BEG', `END', and `OLD-LEN' arguments to each
>     successive call of `after-change-functions' will more accurately
>     delimit the current change.

This basically says "the calls are mostly balanced, but don't rely on
that, because sometimes they aren't".  The text about
after-change-functions being called zero or more times adds
non-trivial information, but what is its practical usefulness?  Same
with the text about BEG and END.

Maybe I don't understand what we are trying to accomplish with these
changes, and that's why I fail to see why the proposed changes are for
the better.


Re: Lisp primitives and their calling of the change hooks

Stefan Monnier
> We don't normally include such "preemptive" explanations in the
> manual.  If the text doesn't say one should expect balanced calls, the
> reader has no reason to expect balanced calls.

I think the name of the hooks suggests such a balance, and actual
experimentation can very easily lead the user to think that
they're balanced.  Alan may not be the "average Emacs coder", but
I think his experience is not completely unexpected.

> The current text even makes a point of saying that explicitly.

Indeed, and this discussion wants to replace this with something a bit
more specific.

> This basically says "the calls are mostly balanced, but don't rely on
> that, because sometimes they aren't".

It says a bit more because it describes the way in which they're
not balanced.

> The text about after-change-functions being called zero or more times
> adds non-trivial information, but what is its practical usefulness?

It says that subsequent calls to a-c-f aren't calls with a missing b-c-f
but that they "belong" to (I think of it as "they are covered by") the
last b-c-f.

> Same with the text about BEG and END.

Not sure how important it is, but I think it can help the coder have
a mental model of the kind of unbalancedness that can occur.

> Maybe I don't understand what we are trying to accomplish with these
> changes, and that's why I fail to see why the proposed changes are for
> the better.

The current text basically says "don't rely on them being balanced" but
doesn't say what the coder can rely on if he wants to share information
between a-c-f and b-c-f.

The new text tries to be sufficiently loose that if Emacs doesn't obey
it, it's actually a bug, yet sufficiently precise that an Elisp coder
can make use of it to reliably share information between a-c-f and
b-c-f.


        Stefan


Re: Lisp primitives and their calling of the change hooks

Eli Zaretskii
> From: Stefan Monnier <[hidden email]>
> Cc: [hidden email],  [hidden email]
> Date: Fri, 05 Jan 2018 17:28:15 -0500
>
> > Maybe I don't understand what we are trying to accomplish with these
> > changes, and that's why I fail to see why the proposed changes are for
> > the better.
>
> The current text basically says "don't rely on them being balanced" but
> doesn't say what the coder can rely on if he wants to share information
> between a-c-f and b-c-f.
>
> The new text tries to be sufficiently loose that if Emacs doesn't obey
> it, it's actually a bug, yet sufficiently precise that an Elisp coder
> can make use of it to reliably share information between a-c-f and
> b-c-f.

Can you describe a practical situation where an Elisp coder could use
the new text to some practical benefit, i.e. to change her
implementation to be better/more resilient (as opposed to just
enhancing her understanding of this stuff)?  I guess I don't see how
such practical benefits would be possible with the new text.


Re: Lisp primitives and their calling of the change hooks

Alan Mackenzie
In reply to this post by Stefan Monnier
Hello, Stefan.

On Thu, Jan 04, 2018 at 16:36:42 -0500, Stefan Monnier wrote:
> >> which is lax enough that any behavior could be argued to be acceptable.
> >> IOW I think it's too lax.  We should probably try and fix it to reflect
> >> the fact that every change should be covered by the last preceding b-c-f
> >> and should be followed by a corresponding call to a-c-f (and this
> >> before the next call to b-c-f).
> > Is that quite right?

> Probably not quite.

> > The upcase-region call in my test had no a-c-f call, almost certainly
> > because there were no lower case letters in the buffer at the time.

> Indeed, there were no changes, so no need to call a-c-f.

> > From your answers in this thread, I'm thinking that every
> > primitive-call which could change the buffer will have exactly one
> > b-c-f and zero or more a-c-f's.

> Sounds about right, tho I expect some primitives might just call insert
> and delete a few times, thus calling b-c-f several times.

> > How about something like this to replace that paragraph from the elisp
> > manual?

> >     The primitives which atomically insert or delete a contiguous chunk
> >     of text into or from a buffer will call `before-change-functions'
> >     and `after-change-functions' in balanced pairs, once for each
> >     change.  The arguments to these hooks will exactly delimit the
> >     change being made.  Calls to these primitives comprise the vast bulk
> >     of buffer changes.

> >     Other, more complex primitives aim to call `before-change-functions'
> >     once before making any changes, then to call
> >     `after-change-functions' zero, one, or several times, depending on
> >     how many individual changes the primitive makes.  The `BEG' and
> >     `END' arguments to `before-change-functions' will enclose a region
> >     in which the individual changes are made, but won't necessarily be
> >     the minimal such region.  The `BEG', `END', and `OLD-LEN' arguments
> >     to each successive call of `after-change-functions' will accurately
> >     delimit the current change.

> Looks good to me, thank you.

I've found a discrepancy.  Just one.  In (transpose-regions 1 10 11 20),
the hook calls are, in order, ((1 10) (11 20) (1 20 19)).  The two
consecutive b-c-f's happen when the two regions are of equal size and
non-contiguous.

The cause of this is not hard to find: in Ftranspose_regions, editfns.c
L5204, there are two calls to modify_text on consecutive lines.  This
seems to be some sort of optimisation.  It is not done elsewhere in
Ftranspose_regions.  I dare say this could be fixed easily.
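
For anyone wanting to repeat this kind of observation, a minimal sketch
of a hook tracer follows.  The names my-hook-log, my-log-before,
my-log-after and my-trace-change-hooks are hypothetical and exist only
for this illustration.  The recorded sequence depends on the Emacs
version in use, so no particular result is claimed here.

```elisp
;; Hypothetical helpers for illustration only; not part of Emacs.

(defvar my-hook-log nil
  "Recorded change-hook calls, most recent first.")

(defun my-log-before (beg end)
  "Record a `before-change-functions' call as (BEG END)."
  (push (list beg end) my-hook-log))

(defun my-log-after (beg end old-len)
  "Record an `after-change-functions' call as (BEG END OLD-LEN)."
  (push (list beg end old-len) my-hook-log))

(defun my-trace-change-hooks (thunk)
  "Run THUNK in a 30-character temp buffer; return hook calls in order."
  (with-temp-buffer
    ;; Fill the buffer before installing the hooks, so that only the
    ;; changes made by THUNK are recorded.
    (insert "abcdefghijklmnopqrstuvwxyzABCD")
    (setq my-hook-log nil)
    ;; The final t makes both hooks buffer-local.
    (add-hook 'before-change-functions #'my-log-before nil t)
    (add-hook 'after-change-functions #'my-log-after nil t)
    (funcall thunk)
    (reverse my-hook-log)))

;; Two-element entries are b-c-f calls, three-element entries a-c-f calls.
(my-trace-change-hooks
 (lambda () (transpose-regions 1 10 11 20)))
```

Two consecutive two-element entries in the returned list correspond to
the pair of b-c-f calls described above.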

> I think in the case of subst-char-in-region we only call a-c-f one time
> (but with tighter bounds than those of the preceding b-c-f) rather than
> once per character that's substituted, so maybe "The `BEG', `END', and
> `OLD-LEN' arguments to each successive call of `after-change-functions'
> will accurately delimit the current change" promises a bit more than we
> deliver, although it depends on how we interpret "current change".

> In any case, the above is much better than what we have now and I think
> it gives a pretty good rendition of our intention.

Perhaps for Emacs-27, if we want to fix transpose-regions.

>         Stefan

--
Alan Mackenzie (Nuremberg, Germany).


Re: Lisp primitives and their calling of the change hooks

Stefan Monnier
In reply to this post by Eli Zaretskii
> Can you describe a practical situation where an Elisp coder could use
> the new text to some practical benefit, i.e. to change her
> implementation to be better/more resilient (as opposed to just
> enhancing her understanding of this stuff)?  I guess I don't see how
> such practical benefits would be possible with the new text.

In CC-mode, Alan wants to store information about the state of the
buffer before a change and then use this info after the change (IIUC
this is mostly to try and avoid recomputing parsing info about the rest
of the buffer).  With the current description, all he can say is "in
practice my hack works 99% of the time, and the doc says that
I basically can't bring it to 100%".  With the new description, it
should be possible for him to bring it to 100% (as you know I think he'd
be better off using an approach like that of syntax-ppss, but that
doesn't mean we shouldn't try to make it possible to do it reliably his
way).
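
A minimal sketch of how a mode might rely on the proposed wording is
given below.  It is not CC-mode's actual code, and all my-* names are
hypothetical.  It assumes only what the new text promises: each a-c-f
call is covered by the last preceding b-c-f call, and several a-c-f
calls may follow one b-c-f.  A change outside the announced region
merely marks the cached data as stale.

```elisp
;; Hypothetical sketch; assumes the contract proposed in this thread.

(defvar-local my-pending-change nil
  "Cons (BEG . END) from the last `before-change-functions' call, or nil.")

(defvar-local my-stale nil
  "Non-nil once a change was seen that no b-c-f call had announced.")

(defun my-before-change (beg end)
  "Remember the region BEG..END that is about to change."
  ;; A real mode would also save its pre-change state here.
  (setq my-pending-change (cons beg end)))

(defun my-after-change (beg end old-len)
  "Check that the change BEG..END, formerly OLD-LEN long, was announced."
  (let ((region my-pending-change)
        (old-end (+ beg old-len)))
    (if (and region
             (<= (car region) beg)
             (<= old-end (cdr region)))
        ;; Covered.  Shift the end of the region by the size change,
        ;; since more a-c-f calls may follow the same b-c-f call.
        (setcdr region (+ (cdr region) (- end beg old-len)))
      ;; Not covered: the saved state cannot be trusted.
      (setq my-stale t))))

(defun my-track-changes-setup ()
  "Install both hooks buffer-locally."
  (add-hook 'before-change-functions #'my-before-change nil t)
  (add-hook 'after-change-functions #'my-after-change nil t))
```

When my-stale becomes non-nil the mode would discard its saved state and
recompute from scratch, so an unannounced change costs time but not
correctness.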


        Stefan
