bug#18: Fine-grained revert-buffer

classic Classic list List threaded Threaded
17 messages Options
Reply | Threaded
Open this post in threaded view
|

bug#18: Fine-grained revert-buffer

Mauro Aranda
Short story of how I got to this bug report:

For a while now, I've been wanting to contribute to Emacs by doing
something more than reporting bugs and try to provide trivial fixes for
the bugs I found.  So I looked into emacs-devel, to read about
recomendations for beginning to contribute.  Based on some of the mails
I found [1], it looks like working on some wishlist items or minor bugs
would be good.  So I said to myself: OK, let's see what is the oldest
one still open.  And here I am.


Now I'll make a summary of what I've understood from the bug report:

The wish is to have a command that would act like 'revert-buffer'
(e.g., modifying the buffer rightaway), but that tries to do a better job
preserving markers.  Also, it is desirable that undo info is preserved.
It is known that 'revert-buffer' preserves undo info nowadays, but what I
understand is that it would be good to keep undo info of parts of the
total reverted change, so the undo of the reverted action can be made by
steps, and not as a single undo operation.

I think that the alternatives proposed after the report [2] do not
fulfill the wish because they do not modify the buffer immediately.  I'm
not sure how they act in regards to preserving markers and the undo
info, though.

AFAIK, the functionality wanted is still not present, so I'd guess it's
still relevant to work in this matter.

So for a week or so now, I've been working in a command that does what I
think it is wanted.  The command is called 'revert-buffer-by-hunks'
(I've added 'rbbh' as a prefix for sending the file for you to
see/test), because it calls 'diff' and then with the output patches the
buffer.

At first, I wrote a command that used diff-mode under the hood, but I've
been having troubles with preserving markers.  So I took a step back,
and wrote a new function that patches the buffer with the contents of
the file visited on disk.  With that done, I think it is time for me to
show what I've written so far, in order to:

1) Know if the functionality is present (I don't think so, but I believe
this is the first thing to know in order to advance).
2) There's still interest in having this command.
3) Know if working on this subject would be appreciated, or should I
move on to other things.
4) Receieve feedback, suggestions, fixes on things I'm sure I'm missing.
5) Discuss some of the questions that have arisen while I've been
working on this.

I'll wait for answers regarding to 1-4.  With regards to 5, I would like
to read opinions about:
a) What variables would you think should be customizable?
b) After reverting by hunks, I think it would be desirable to navigate
through the hunks reverted and toggle their state (i.e., go back and
forth to the contents before the revert operation and the contens on
disk).  That is because I think it would be kinda annoying to want to
undo some of the reverted hunks, and to do that having to undo
sequentially from the Nth hunk reverted to the desired one.

I attach a first draft of my work.  I'm hoping to hear suggestions and
corrections to improve it.

Best regards,
Mauro.


rbbh-bug18.el (12K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

bug#18: Fine-grained revert-buffer

Eli Zaretskii
> From: Mauro Aranda <[hidden email]>
> Date: Fri, 26 Apr 2019 19:42:02 -0300
>
> For a while now, I've been wanting to contribute to Emacs by doing
> something more than reporting bugs and try to provide trivial fixes for
> the bugs I found.  So I looked into emacs-devel, to read about
> recomendations for beginning to contribute.  Based on some of the mails
> I found [1], it looks like working on some wishlist items or minor bugs
> would be good.  So I said to myself: OK, let's see what is the oldest
> one still open.  And here I am.

Thanks!

> The wish is to have a command that would act like 'revert-buffer'
> (e.g., modifying the buffer rightaway), but that tries to do a better job
> preserving markers.  Also, it is desirable that undo info is preserved.
> It is known that 'revert-buffer' preserves undo info nowadays, but what I
> understand is that it would be good to keep undo info of parts of the
> total reverted change, so the undo of the reverted action can be made by
> steps, and not as a single undo operation.
>
> I think that the alternatives proposed after the report [2] do not
> fulfill the wish because they do not modify the buffer immediately.  I'm
> not sure how they act in regards to preserving markers and the undo
> info, though.
>
> AFAIK, the functionality wanted is still not present, so I'd guess it's
> still relevant to work in this matter.

Sounds correct to me.

> So for a week or so now, I've been working in a command that does what I
> think it is wanted.  The command is called 'revert-buffer-by-hunks'
> (I've added 'rbbh' as a prefix for sending the file for you to
> see/test), because it calls 'diff' and then with the output patches the
> buffer.
>
> At first, I wrote a command that used diff-mode under the hood, but I've
> been having troubles with preserving markers.  So I took a step back,
> and wrote a new function that patches the buffer with the contents of
> the file visited on disk.  With that done, I think it is time for me to
> show what I've written so far, in order to:
>
> 1) Know if the functionality is present (I don't think so, but I believe
> this is the first thing to know in order to advance).
> 2) There's still interest in having this command.
> 3) Know if working on this subject would be appreciated, or should I
> move on to other things.
> 4) Receieve feedback, suggestions, fixes on things I'm sure I'm missing.

Please take a look at replace-buffer-contents, which is new with Emacs
26.  It might allow you to implement this functionality in a much
simpler way, as it already contains an internal implementation of a
Diff-like comparison algorithm, and doesn't require the Diff program
to be installed.

One caveat: replace-buffer-contents can be very slow when the buffer
is large and reverting it requires a large number of small changes.
It will fall back to a simpler algorithm for large numbers of changes,
and could give up entirely if making the changes takes too much time,
see its doc string.  Perhaps in those cases we should fall back to a
different code, like the one you wrote.

Did you time your code?  How long does it take to revert buffers of
different sizes with different amounts of changes?

> a) What variables would you think should be customizable?

The name of the Diff command should be customizable.  Or maybe just
use diff-command already provided by diff.el.  Same with Diff
switches.



Reply | Threaded
Open this post in threaded view
|

bug#18: Fine-grained revert-buffer

martin rudalics
In reply to this post by Mauro Aranda
 > 4) Receieve feedback, suggestions, fixes on things I'm sure I'm missing.

What's IMHO urgently needed are better heuristics for restoring
markers after reverting their buffer.  Currently, most markers end up
at the beginning or end of a hunk that as been restored and thus
become useless.

What we probably need is an extra step to scan the buffer for markers
and save their textual context before reverting and a step to restore
them according to their textual context after reverting.  But if your
method allows to easily determine which hunks remain unchanged, we
could avoid such textual search for markers in unchanged hunks and,
depending on the approach used for replacing text, simply restore
these markers from their offsets from the beginning of the hunk they
belong to.

Many thanks for working on this, martin



Reply | Threaded
Open this post in threaded view
|

bug#18: Fine-grained revert-buffer

Mauro Aranda
In reply to this post by Eli Zaretskii
Eli and Martin, thanks for your answers.

Eli Zaretskii <[hidden email]> writes:

> Please take a look at replace-buffer-contents, which is new with Emacs
> 26.  It might allow you to implement this functionality in a much
> simpler way, as it already contains an internal implementation of a
> Diff-like comparison algorithm, and doesn't require the Diff program
> to be installed.

I didn't know of replace-buffer-contents, so I took a look at it.  Nice
that its documentation even mentions the problem when a marker is
inside a hunk, because of the delete + insert thing (just like Martin
mentions).  IMO, it does most of the things required, but here is one
problem I notice with respect to the expected functionality of
revert-buffer-by-hunks (as I've understood it):

It only calls Fundo_boundary before starting the whole set of
modifications, and thus after replacing the contents (in my tests, the
whole buffer), a single C-/ brings all the changes back, much like
revert-buffer.  And since it binds inhibit-modification-hooks to t, I
think I can't bind locally after-change-functions to an expression that
calls undo-boundary, to do the trick.

Perhaps an optional call to Fundo_boundary in the while loop could be
enough, but I'm not sure how much it will impact on the speed of
replace-buffer-contents.  Or more, if it is justified to add that call.
Please, point out to me if I'm not seeing this right.

Other than that, it looks like a perfect candidate to use (at least to me)
to get the functionality wanted.

> One caveat: replace-buffer-contents can be very slow when the buffer
> is large and reverting it requires a large number of small changes.
> It will fall back to a simpler algorithm for large numbers of changes,
> and could give up entirely if making the changes takes too much time,
> see its doc string.  Perhaps in those cases we should fall back to a
> different code, like the one you wrote.

> Did you time your code?  How long does it take to revert buffers of
> different sizes with different amounts of changes?

I haven't timed it yet.  I didn't know if it would be considered good
enough, to time it.  For a week, I've been testing it manually with some
of the changes in the Emacs sources, and the experience has been
satisfactory.  Are there, by any chance, such tests for
replace-buffer-contents?  I could use them, for comparison purposes.
I will try in the following days to define some parameters (such as
buffer size), and time revert-buffer-by-hunks, to provide some numbers.

Provided it is fast enough, I think something like replacing by hunks a
region would be a good fallback to replace-buffer-contents.  I sure hope
so.

>> a) What variables would you think should be customizable?
>
> The name of the Diff command should be customizable.  Or maybe just
> use diff-command already provided by diff.el.  Same with Diff
> switches.

I agree.  If using diff.el, it makes total sense to use those
variables.  Of course, that means the patch-buffer function should
be modified to work on the different diff output formats (I think --context
and --unified should be enough).  For the record, I don't propose to use
diff-apply-hunk and other diff-mode.el functions, because when I used
that, I ended up with markers at (point-min), I don't know why.  But if
it is desired to reuse those functions instead of repeating code, I
think I will need time (and help, perhaps), to understand why that
happened.


martin rudalics <[hidden email]> writes:

> What we probably need is an extra step to scan the buffer for markers
> and save their textual context before reverting and a step to restore
> them according to their textual context after reverting.  But if your

When I bumped into the problem of the marker being sent to the
beginning of the hunk, I started looking for something to get the
markers of the buffer, but didn't found anything at the Lisp level.

> method allows to easily determine which hunks remain unchanged, we
> could avoid such textual search for markers in unchanged hunks and,
> depending on the approach used for replacing text, simply restore
> these markers from their offsets from the beginning of the hunk they
> belong to.

Yes, I believe that by getting the diff output and with the line-offset
handling in the patch-buffer function, it would be easy to determine the
unchanged regions.


Thanks again to both of you.

Best regards,
Mauro.

Reply | Threaded
Open this post in threaded view
|

bug#18: Fine-grained revert-buffer

Eli Zaretskii
> From: Mauro Aranda <[hidden email]>
> Date: Sat, 27 Apr 2019 12:10:45 -0300
> Cc: Eli Zaretskii <[hidden email]>, martin rudalics <[hidden email]>
>
> I didn't know of replace-buffer-contents, so I took a look at it.  Nice
> that its documentation even mentions the problem when a marker is
> inside a hunk, because of the delete + insert thing (just like Martin
> mentions).  IMO, it does most of the things required, but here is one
> problem I notice with respect to the expected functionality of
> revert-buffer-by-hunks (as I've understood it):
>
> It only calls Fundo_boundary before starting the whole set of
> modifications, and thus after replacing the contents (in my tests, the
> whole buffer), a single C-/ brings all the changes back, much like
> revert-buffer.

Is there a requirement to be able to undo the revert piecemeal? What
would be the use case for that?

> > Did you time your code?  How long does it take to revert buffers of
> > different sizes with different amounts of changes?
>
> I haven't timed it yet.  I didn't know if it would be considered good
> enough, to time it.  For a week, I've been testing it manually with some
> of the changes in the Emacs sources, and the experience has been
> satisfactory.  Are there, by any chance, such tests for
> replace-buffer-contents?

You can find one in bug#31888.

Also, this discussion:

  http://lists.gnu.org/archive/html/help-gnu-emacs/2019-02/msg00000.html

indicates that using JSON pretty-printer might produce a good test
case.



Reply | Threaded
Open this post in threaded view
|

bug#18: Fine-grained revert-buffer

Mauro Aranda
Eli Zaretskii <[hidden email]> writes:

> Is there a requirement to be able to undo the revert piecemeal? What
> would be the use case for that?

Your questions made me open my eyes, so thanks.  I originally thought
that it would be a good thing, for example if the user wants to revert some
of the changes but keep others.  But that use case is already covered by
using diff-buffer-with-file, for example, and applying (or not) each hunk
separately.  Even better, the undo process doesn't have to be from
the Nth hunk to the 1st, in order.  So please, ignore what I said about
that problem, using replace-buffer-contents seems to be a great choice
to address this wishlist item.

>> I haven't timed it yet.  I didn't know if it would be considered good
>> enough, to time it.  For a week, I've been testing it manually with some
>> of the changes in the Emacs sources, and the experience has been
>> satisfactory.  Are there, by any chance, such tests for
>> replace-buffer-contents?
>
> You can find one in bug#31888.
>
> Also, this discussion:
>
>   http://lists.gnu.org/archive/html/help-gnu-emacs/2019-02/msg00000.html
>
> indicates that using JSON pretty-printer might produce a good test
> case.

Thanks.

I'll keep working on the potential fallback for replace-buffer-contents,
and report back when I feel I've made some progress.

Best regards,
Mauro.
Reply | Threaded
Open this post in threaded view
|

bug#18: Fine-grained revert-buffer

Richard Stallman
In reply to this post by Mauro Aranda
[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > So for a week or so now, I've been working in a command that does what I
  > think it is wanted.  The command is called 'revert-buffer-by-hunks'
  > (I've added 'rbbh' as a prefix for sending the file for you to
  > see/test), because it calls 'diff' and then with the output patches the
  > buffer.

It sounds like an improvement in functionality,
but wouldn't it be a lot slower?

--
Dr Richard Stallman
President, Free Software Foundation (https://gnu.org, https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)





Reply | Threaded
Open this post in threaded view
|

bug#18: Fine-grained revert-buffer

martin rudalics
In reply to this post by Mauro Aranda
 > When I bumped into the problem of the marker being sent to the
 > beginning of the hunk, I started looking for something to get the
 > markers of the buffer, but didn't found anything at the Lisp level.

We could introduce a function 'marker-list' based on BUF_MARKERS.

martin



Reply | Threaded
Open this post in threaded view
|

bug#18: Fine-grained revert-buffer

Mauro Aranda
In reply to this post by Richard Stallman
Richard Stallman <[hidden email]> writes:

> It sounds like an improvement in functionality,
> but wouldn't it be a lot slower?

Hello Richard.

Compared to revert-buffer, yes, I would expect it to be slower (not sure
if a lot).  But maybe the trade-off between time and preserving some
buffer information is worth it, let's see.

I'm finalizing some benchmarking, which I'll post in another email.
I'll wait for opinions about the results.

Best regards,
Mauro.
Reply | Threaded
Open this post in threaded view
|

bug#18: Fine-grained revert-buffer

Mauro Aranda
I've now written a similar function to revert the buffer, but using
replace-buffer-contents as I was suggested.  The new attached file
contains such function, which I called revert-buffer-with-fine-grain, to
honor a previous post.

I kept it as similar as I could to the revert-buffer-by-hunks
function, and went ahead to time both functions (I figure there will be
enough time to add functionality to them).  I additionally timed
revert-buffer.

For the files in Bug#31888, the results were:
revert-buffer: 0.122238819
revert-buffer-with-fine-grain: 2.85150876
rbbh-revert-buffer-by-hunks: 2.044583634

all times in seconds, substracting GC time.

To gather more data, I compared emacs-25.3 and emacs-26.2 C source
files (from the src/ directory), checking them out from the git
repository.  I did the test as if I was going back from emacs-26.2 to
emacs-25.3.  I think these files provided a good range of changes in
size and number of hunks, obtained from `diff --normal'.

I attach the data as ascii file.  Let me know if I should send it in
another format.  The tests are with the latest master, optimized build,
run as `emacs -Q'.

For the majority of files, revert-buffer and
revert-buffer-with-fine-grain spent a similar amount of time when the
buffers or changes are not too big, so I believe that's promising for
implementing this functionality.  On the contrary, reverting by hunks
(by calling diff), takes 1 to 4 times more than the other functions, in
these cases.

When there are many changes, replace-buffer-contents starts taking too
much time (as the docs warns).  In those situations, reverting by hunks
does a much better job than using replace-buffer-contents, but I'm not
sure if it does a good enough job.

Some problematic files:
doprnt.c:
revert-buffer-with-fine-grain took 28.6 times more than revert-buffer
and rbbh-revert-buffer-by-hunks took 1.4 times more.

alloc.c:
revert-buffer-with-fine-grain took 18 times more than revert-buffer
and rbbh-revert-buffer-by-hunks took 6.5 times more.

lisp.h:
lisp.h took almost 9 minutes to revert, using
revert-buffer-with-fine-grain.  It took 0.86 seconds to revert with
rbbh-revert-buffer-by-hunks.

I'll wait for opinions about the data I've collected.  At least it was
super fun writing these tests.

Best regards,
Mauro.


bug18-benchmark-report.txt (22K) Download Attachment
rbbh-bug18.el (15K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

bug#18: Fine-grained revert-buffer

Basil L. Contovounesios
In reply to this post by martin rudalics
martin rudalics <[hidden email]> writes:

>> When I bumped into the problem of the marker being sent to the
>> beginning of the hunk, I started looking for something to get the
>> markers of the buffer, but didn't found anything at the Lisp level.
>
> We could introduce a function 'marker-list' based on BUF_MARKERS.

I've opened a new ticket for this: https://debbugs.gnu.org/35536

[The acknowledgement email lists you as being CCed, but the copy of my
report does not.  So, were you successfully CCed?  If not, any ideas why?]

Thanks,

--
Basil



Reply | Threaded
Open this post in threaded view
|

bug#18: Fine-grained revert-buffer

Glenn Morris-3
"Basil L. Contovounesios" wrote:

> I've opened a new ticket for this: https://debbugs.gnu.org/35536
>
> [The acknowledgement email lists you as being CCed, but the copy of my
> report does not.  So, were you successfully CCed?  If not, any ideas why?]

When you see this, it means the person is subscribed to bug-gnu-emacs
and has the Mailman option "no duplicates" enabled. Everything is
working fine.



Reply | Threaded
Open this post in threaded view
|

bug#18: Fine-grained revert-buffer

Basil L. Contovounesios
Glenn Morris <[hidden email]> writes:

> "Basil L. Contovounesios" wrote:
>
>> I've opened a new ticket for this: https://debbugs.gnu.org/35536
>>
>> [The acknowledgement email lists you as being CCed, but the copy of my
>> report does not.  So, were you successfully CCed?  If not, any ideas why?]
>
> When you see this, it means the person is subscribed to bug-gnu-emacs
> and has the Mailman option "no duplicates" enabled. Everything is
> working fine.

Good to hear, and sorry for the noise.

Thanks,

--
Basil



Reply | Threaded
Open this post in threaded view
|

bug#18: Fine-grained revert-buffer

Richard Stallman
In reply to this post by Basil L. Contovounesios
[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > >> When I bumped into the problem of the marker being sent to the
  > >> beginning of the hunk, I started looking for something to get the
  > >> markers of the buffer, but didn't found anything at the Lisp level.
  > >
  > > We could introduce a function 'marker-list' based on BUF_MARKERS.

There is a reason why you can't get a copy of that list: if anything
else pointed to a copy of it, no marker in that buffer could be
garbage collected.  We should look for some other way to solve your
problem.

--
Dr Richard Stallman
President, Free Software Foundation (https://gnu.org, https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)





Reply | Threaded
Open this post in threaded view
|

bug#18: Fine-grained revert-buffer

Basil L. Contovounesios
Richard Stallman <[hidden email]> writes:

>   > >> When I bumped into the problem of the marker being sent to the
>   > >> beginning of the hunk, I started looking for something to get the
>   > >> markers of the buffer, but didn't found anything at the Lisp level.
>   > >
>   > > We could introduce a function 'marker-list' based on BUF_MARKERS.
>
> There is a reason why you can't get a copy of that list: if anything
> else pointed to a copy of it, no marker in that buffer could be
> garbage collected.

Isn't that true of all marker copying/manipulation at the Lisp level?
I think the manual already documents these pitfalls.

> We should look for some other way to solve your problem.

I think that would indeed be preferable.

--
Basil



Reply | Threaded
Open this post in threaded view
|

bug#18: Fine-grained revert-buffer

Richard Stallman
[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > > There is a reason why you can't get a copy of that list: if anything
  > > else pointed to a copy of it, no marker in that buffer could be
  > > garbage collected.

  > Isn't that true of all marker copying/manipulation at the Lisp level?

There is always the possibility if blocking some markers from GC
by saving pointers to them and not clearing those out.
However, getting a list of ALL the markers would make it trivially
easy to block ALL the buffer's markers from GC.
--
Dr Richard Stallman
President, Free Software Foundation (https://gnu.org, https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)





Reply | Threaded
Open this post in threaded view
|

bug#18: Fine-grained revert-buffer

Mauro Aranda
In reply to this post by Mauro Aranda
I spent some time thinking on how to get this new revert command
integrated with the rest of revert-buffer functions, and I came up with
the attached patch.

Since the difference between revert-buffer and
revert-buffer-with-fine-grain lies at the insert-file-contents level, I
wrote an alternative to
revert-buffer-insert-file-contents--default-function, that uses
replace-buffer-contents, as Eli suggested.  I named it
revert-buffer-insert-file-contents-delicately.

Then, revert-buffer-with-fine-grain only needs to bind
revert-buffer-insert-file-contents-function to
revert-buffer-insert-file-contents-delicately to get the job done.

There are some more things that revert-buffer-with-fine-grain needs to
do:
1) Manipulate the current-prefix-arg, just like revert-buffer does.
2) Since replace-buffer-contents can fail to replace conservatively,
return that information could be useful.  But revert-buffer--default
always returns t after reverting, so I used a closure that then gets
called by revert-buffer-with-fine-grain.

To control how much time replace-buffer-contents spends trying to be
non-destructive, I introduced a new variable to pass as the MAX-SECS
argument.  2.0 secs looks like a good value, according to the previous
benchmark I posted, and the new one I post here.  I think it should be
an acceptable delay.

As I said above, I timed the command, and to compare I timed
revert-buffer too.  IMO, the results look good.  Some files fail and
take a lot of time, yes, but I think it's unlikely that a user will revert
so many changes.  And for those, at least the variable
revert-buffer-with-fine-grain-max-seconds helps to terminate the
execution earlier.

I also went ahead and wrote a test for revert-buffer-with-fine-grain,
because I figure it would be required, as well as the benchmarking.
As I was there, I wrote a similar test for revert-buffer.

Corrections and comments are welcome.  Better names too.

Best regards,
Mauro.

bug18-benchmark.txt (12K) Download Attachment
0001-Add-a-new-functionality-for-reverting-visiting-file-.patch (13K) Download Attachment