bug#28695: 26.0.60; Rendering lag spikes caused by double-buffering on Linux

Bryan Gilbert-2
Hi,

I've been using the Emacs 26 release branch and I've noticed a
significant amount of rendering lag spikes. When this occurs the UI
completely locks up in Emacs (and the desktop) for short periods of
time. Rather than explain, I have included two very short screencasts
below that display the behavior.

I've been able to reproduce the problem with 100% reliability using the
`counsel-rg` command from ivy/swiper (full details to reproduce below).
I've also used git bisect to narrow the lag spikes down to the exact
commit that introduced them:

https://github.com/emacs-mirror/emacs/commit/c29071587c64efb30792bd72248d3c791abd9337

I was able to verify that reverting to the previous commit before
double-buffering was added completely removed the problem. I've made two
short screencasts, one before the double-buffer commit and the other
after.

Before: http://drop.bryan.sh/YtUzfcSRp7.mp4
After: http://drop.bryan.sh/UyRpSc4NyQ.mp4

The behavior is unaffected by the glxgears program I have running in
both screencasts; I just used it as a way to show when the screen is
locking up. In the second screencast, when it looks like the screen is
completely locked up, I am just pressing 'e' and 'backspace'
alternately, once every second.

Two minor prerequisites for the steps below: you need to be running
Linux and have 'rg' installed.

Steps to Reproduce using emacs -Q:

1. Add the melpa archive

    (require 'package)
    (add-to-list 'package-archives
                '("melpa" . "http://melpa.org/packages/"))

2. Install Counsel:

    (package-refresh-contents)
    (package-install 'counsel)

3. Enable Ivy:

    (ivy-mode 1)

4. Change counsel-rg minimum query length from 3 characters to 1 character:

    (defun ivy-counsel-ag-function (string base-cmd extra-ag-args)
      (when (null extra-ag-args)
        (setq extra-ag-args ""))
      (if (< (length string) 1)  ;; #1
          (counsel-more-chars 1)
        (let ((default-directory counsel--git-dir)
              (regex (counsel-unquote-regex-parens
                      (setq ivy--old-re
                            (ivy--regex
                             (counsel-unquote-regex-parens string)))))) ;; #2
          (let* ((args-end (string-match " -- " extra-ag-args))
                 (file (if args-end
                           (substring-no-properties extra-ag-args (+ args-end 3))
                         ""))
                 (extra-ag-args (if args-end
                                    (substring-no-properties extra-ag-args 0 args-end)
                                  extra-ag-args))
                 (ag-cmd (format base-cmd
                                 (concat extra-ag-args
                                         " -- "
                                         (shell-quote-argument regex)
                                         file))))
            (if (file-remote-p default-directory)
                (split-string (shell-command-to-string ag-cmd) "\n" t)
              (counsel--async-command ag-cmd)
              nil)))))

    ;; The override must name the function defined above.
    (advice-add #'counsel-ag-function :override #'ivy-counsel-ag-function)

5. Run 'counsel-rg', begin typing, and notice large lag spikes

    (counsel-rg)


Step 4 is a bit messy; however, lowering the minimum query length from
3 characters to 1 makes the rendering lag spikes painfully obvious.

Thanks!


=========================================================================


In GNU Emacs 26.0.60 (build 1, x86_64-pc-linux-gnu, GTK+ Version 3.22.21)
 of 2017-09-28 built on borealis
Repository revision: 88a0dd71f10ffb63fba08c062e948551c3e876c2
Windowing system distributor 'The X.Org Foundation', version 11.0.11903000
System Description: Arch Linux

Configured using:
 'configure --prefix=/usr --sysconfdir=/etc --libexecdir=/usr/lib
 --localstatedir=/var --mandir=/usr/share/man --with-gameuser=:games
 --with-sound=alsa --with-xft --with-modules --with-x-toolkit=gtk3
 --without-gconf --without-gsettings --without-gpm --without-m17n-flt
 --with-xwidgets --without-compress-install 'CFLAGS=-march=x86-64
 -mtune=generic -O2 -pipe -fstack-protector-strong -fno-plt'
 CPPFLAGS=-D_FORTIFY_SOURCE=2
 LDFLAGS=-Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now'

Configured features:
XPM JPEG TIFF GIF PNG RSVG IMAGEMAGICK SOUND DBUS NOTIFY ACL GNUTLS
LIBXML2 FREETYPE LIBOTF XFT ZLIB TOOLKIT_SCROLL_BARS GTK3 X11 MODULES
XWIDGETS LIBSYSTEMD LCMS2

Important settings:
  value of $LC_COLLATE:
  value of $LC_CTYPE:
  value of $LC_MESSAGES:
  value of $LC_MONETARY:
  value of $LC_NUMERIC:
  value of $LC_TIME:
  value of $LANG: en_US.UTF-8
  locale-coding-system: utf-8



Eli Zaretskii
> From: Bryan Gilbert <[hidden email]>
> Date: Tue, 03 Oct 2017 23:10:16 -0400
>
> Step 4 is a bit messy; however, lowering the minimum query length from
> 3 characters to 1 makes the rendering lag spikes painfully obvious.

Thanks for the report.

While we are waiting for the double-buffering issues to be looked at,
you should be able to work around this problem by using the following
snippet from NEWS:

  If your system has [the XDBE] extension, but an
  Emacs built with double buffering misbehaves on some displays you use,
  you can disable the feature by adding

    '(inhibit-double-buffering . t)

  to default-frame-alist.  Or inject this parameter into the selected
  frame by evaluating this form:

    (modify-frame-parameters nil '((inhibit-double-buffering . t)))
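
For a quick A/B comparison, the second form from NEWS can also be passed
on the command line when starting a test instance. This is only a
minimal sketch: it assumes `emacs` is on PATH, and the names
`NO_DBE_FORM` and `emacs_no_dbe` are made up for illustration.

```shell
#!/bin/sh
# The form quoted from NEWS: disable double buffering on the selected frame.
NO_DBE_FORM="(modify-frame-parameters nil '((inhibit-double-buffering . t)))"

# Hypothetical helper: start a clean Emacs (-Q skips init files) and
# evaluate the form during startup. Extra arguments are passed through.
emacs_no_dbe() {
    emacs -Q --eval "$NO_DBE_FORM" "$@"
}
```

Running `emacs_no_dbe` next to a plain `emacs -Q` should make it easier
to tell whether a given stutter depends on double buffering.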



martin rudalics
In reply to this post by Bryan Gilbert-2
 > I was able to verify that reverting to the previous commit before
 > double-buffering was added completely removed the problem. I've made two
 > short screencasts, one before the double-buffer commit and the other
 > after.

Unless this problem (and that of Bug#28652) get fixed, we should either
disable double buffering by default or add some special warning so users
do not have to spend their time investigating this issue.

martin



Eli Zaretskii
> Date: Wed, 04 Oct 2017 11:04:49 +0200
> From: martin rudalics <[hidden email]>
>
> Unless this problem (and that of Bug#28652) get fixed, we should either
> disable double buffering by default or add some special warning so users
> do not have to spend their time investigating this issue.

Isn't it strange that only some people see these problems, while
others (the majority?) are very happy with double-buffering?  Could it
be that the problems are triggered by some specific system
configurations or Emacs build-time features?  If so, we could perhaps
turn it off only for those configurations, because people who are
happy with this are extremely happy, and will probably be disappointed
if we just turn it off globally.



Bryan Gilbert-2

> Eli Zaretskii <[hidden email]> writes:
>
> Isn't it strange that only some people see these problems, while
> others (the majority?) are very happy with double-buffering?  Could it
> be that the problems are triggered by some specific system
> configurations or Emacs build-time features?  If so, we could perhaps
> turn it off only for those configurations, because people who are
> happy with this are extremely happy, and will probably be disappointed
> if we just turn it off globally.

I'm currently using a Dell XPS 15 laptop with a 4K display and running
at a resolution of 3840x2160. I've noticed that the problem almost
entirely disappears when I resize the window to a smaller size (roughly
between 1/2 and 1/3 of the screen) instead of being fullscreen. It's
possible that this is only being noticed at higher resolutions.



Eli Zaretskii
> From: Bryan Gilbert <[hidden email]>
> Cc: martin rudalics <[hidden email]>, [hidden email], [hidden email]
> Date: Wed, 04 Oct 2017 07:21:55 -0400
>
> I'm currently using a Dell XPS 15 laptop with a 4K display and running
> at a resolution of 3840x2160. I've noticed that the problem almost
> entirely disappears when I resize the window to a smaller size (roughly
> between 1/2 and 1/3 of the screen) instead of being fullscreen. It's
> possible that this is only being noticed at higher resolutions.

Can you try lowering the screen resolution and using fullscreen Emacs
frames under that lower resolution?

Thanks.



Bryan Gilbert-2

> Eli Zaretskii <[hidden email]> writes:
>
> Can you try lowering the screen resolution and using fullscreen Emacs
> frames under that lower resolution?
>
> Thanks.

I just flipped the resolution down to 1080p and it became barely noticeable.
Here is a quick screen capture:

  http://drop.bryan.sh/UImFf4Rnzf.mp4

You can still slightly see it doing something weird with glxgears. However,
without that visual indication on the screen it feels almost on par with double
buffering being disabled.

I'll also mention that apart from this being reliably reproduced while using
counsel-rg, I've been hard pressed to notice it in other places. For a while,
I even thought that this may have been a bug with Ivy until I found the commit
that introduced the slowdown. This coupled with the possibility that it's only
visible at higher resolutions may be why many people aren't experiencing issues.



Eli Zaretskii
> From: Bryan Gilbert <[hidden email]>
> Cc: Bryan Gilbert <[hidden email]>, [hidden email], [hidden email]
> Date: Wed, 04 Oct 2017 08:02:44 -0400
>
> I'll also mention that apart from this being reliably reproduced while using
> counsel-rg, I've been hard pressed to notice it in other places. For a while,
> I even thought that this may have been a bug with Ivy until I found the commit
> that introduced the slowdown. This coupled with the possibility that it's only
> visible at higher resolutions may be why many people aren't experiencing issues.

Thanks.  Maybe we should indeed disable it by default for higher
resolutions.



Dmitry Gutov
In reply to this post by Bryan Gilbert-2
On 10/4/17 2:21 PM, Bryan Gilbert wrote:

>> Isn't it strange that only some people see these problems, while
>> others (the majority?) are very happy with double-buffering?  Could it
>> be that the problems are triggered by some specific system
>> configurations or Emacs build-time features?  If so, we could perhaps
>> turn it off only for those configurations, because people who are
>> happy with this are extremely happy, and will probably be disappointed
>> if we just turn it off globally.

Eli, perhaps it would be wise to ask the reporter if he'd like it to be
turned off by default, if this problem can't be fixed in time.

Personally, I can stand a few minor glitches, given that double
buffering saves me from flickering, which has been an annoyance for years.

> I'm currently using a Dell XPS 15 laptop with a 4K display and running
> at a resolution of 3840x2160.

Bryan, which revision/year is it? I also have an XPS 15 (with Skylake
CPU) with 4K resolution, and at worst, I see the minor stutters you show
in the 1080p video (after adding the advice, yes).

Ubuntu 17.04 with Unity 7 here.

I wasn't able to compare with the revision before double buffering,
though: after checking out c29071587c64efb30792bd72248d3c791abd9337^,
./autogen.sh says:

Checking whether you have the necessary tools...
(Read INSTALL.REPO for more details on building Emacs)
Checking for autoconf (need at least version 2.65)...
ok
Checking for automake (need at least version 1.11)...
ok
Your system has the required tools.
Running 'autoreconf -fi -I m4' ...
/usr/bin/m4:aclocal.m4:9: cannot open `m4/count-leading-zeros.m4': No
such file or directory
/usr/bin/m4:aclocal.m4:12: cannot open `m4/d-type.m4': No such file or
directory
/usr/bin/m4:aclocal.m4:20: cannot open `m4/explicit_bzero.m4': No such
file or directory
/usr/bin/m4:aclocal.m4:46: cannot open `m4/localtime-buffer.m4': No such
file or directory
/usr/bin/m4:aclocal.m4:52: cannot open `m4/minmax.m4': No such file or
directory
/usr/bin/m4:aclocal.m4:55: cannot open `m4/mode_t.m4': No such file or
directory
/usr/bin/m4:aclocal.m4:58: cannot open `m4/nstrftime.m4': No such file
or directory
/usr/bin/m4:aclocal.m4:60: cannot open `m4/open-cloexec.m4': No such
file or directory
/usr/bin/m4:aclocal.m4:61: cannot open `m4/open.m4': No such file or
directory
/usr/bin/m4:aclocal.m4:103: cannot open `m4/unlocked-io.m4': No such
file or directory
autom4te: /usr/bin/m4 failed with exit status: 1
/usr/bin/m4:aclocal.m4:9: cannot open `m4/count-leading-zeros.m4': No
such file or directory
/usr/bin/m4:aclocal.m4:12: cannot open `m4/d-type.m4': No such file or
directory
/usr/bin/m4:aclocal.m4:20: cannot open `m4/explicit_bzero.m4': No such
file or directory
/usr/bin/m4:aclocal.m4:46: cannot open `m4/localtime-buffer.m4': No such
file or directory
/usr/bin/m4:aclocal.m4:52: cannot open `m4/minmax.m4': No such file or
directory
/usr/bin/m4:aclocal.m4:55: cannot open `m4/mode_t.m4': No such file or
directory
/usr/bin/m4:aclocal.m4:58: cannot open `m4/nstrftime.m4': No such file
or directory
/usr/bin/m4:aclocal.m4:60: cannot open `m4/open-cloexec.m4': No such
file or directory
/usr/bin/m4:aclocal.m4:61: cannot open `m4/open.m4': No such file or
directory
/usr/bin/m4:aclocal.m4:103: cannot open `m4/unlocked-io.m4': No such
file or directory
autom4te: /usr/bin/m4 failed with exit status: 1
autoreconf: /usr/bin/autoconf failed with exit status: 1



Eli Zaretskii
> Cc: [hidden email]
> From: Dmitry Gutov <[hidden email]>
> Date: Thu, 5 Oct 2017 16:36:53 +0300
>
> Eli, perhaps it would be wise to ask the reporter if he'd like it to be
> turned off by default, if this problem can't be fixed in time.

I kinda did, by proposing the recipe to turn it off, and waiting for a
response ;-)

> Personally, I can stand a few minor glitches, given that double
> buffering saves me from flickering, which has been an annoyance for years.

I know that you are very happy with this feature, which is why it
puzzles me how come you (and others who like it) don't see these
problems.

> I wasn't able to compare with the revision before double buffering,
> though: after checking out c29071587c64efb30792bd72248d3c791abd9337^,
> ./autogen.sh says:

It should be easier to use a previous release of Emacs, if you have it
or can install it.

Btw, the hardships of building an old enough checkout are the reason
why I keep old binaries around, and insist on not doing a bootstrap
(which nukes them).  I find this a much easier way of "bisecting", or
at least of having a good start point for looking for a change that
causes some bug.



Noam Postavsky-2
In reply to this post by Dmitry Gutov
On Thu, Oct 5, 2017 at 9:36 AM, Dmitry Gutov <[hidden email]> wrote:

> I wasn't able to compare with the revision before double buffering, though:
> after checking out c29071587c64efb30792bd72248d3c791abd9337^, ./autogen.sh
> says:

> Running 'autoreconf -fi -I m4' ...
> /usr/bin/m4:aclocal.m4:9: cannot open `m4/count-leading-zeros.m4': No such
> file or directory

Try 'rm aclocal.m4' (this is mentioned in INSTALL.REPO as of last May)
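
Put together, building the revision just before the double-buffering
commit might look roughly like the sketch below. It assumes it is run
from the top of an Emacs git checkout and that a default `./configure`
is acceptable; the `run` and `build_pre_dbe` helpers and the `DRY_RUN`
variable are hypothetical names used only for illustration.

```shell
#!/bin/sh
# Commit that introduced double buffering, as identified in the report.
DBE_COMMIT=c29071587c64efb30792bd72248d3c791abd9337

# Hypothetical helper: print the command when DRY_RUN is set, else run it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Check out the parent of DBE_COMMIT, remove the stale aclocal.m4 that
# made autoreconf fail, then regenerate the build system and build.
build_pre_dbe() {
    run git checkout "${DBE_COMMIT}^" &&
    run rm -f aclocal.m4 &&
    run ./autogen.sh &&
    run ./configure &&
    run make
}
```

`DRY_RUN=1 build_pre_dbe` only prints the commands, which is a cheap way
to review them before touching the working tree.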



Bryan Gilbert-2
In reply to this post by Dmitry Gutov

> Bryan, which revision/year is it? I also have an XPS 15 (with Skylake
> CPU) with 4K resolution, and at worst, I see the minor stutters you show
> in the 1080p video (after adding the advice, yes).

I'm using the newest model: XPS 9560 (Kaby Lake).

> Ubuntu 17.04 with Unity 7 here.

I'm running Arch Linux w/ XMonad.

> Personally, I can stand a few minor glitches, given that double
> buffering saves me from flickering, which has been an annoyance for years.

I've never actually noticed flickering with Emacs in the ~2 years that I've
been using it, although I've seen it brought up often. One thing that is
unique about my setup compared to yours is that I have no compositor running,
and I'd assume that since you're running Unity that you most likely do. I'm
not sure if that would have any effect on this issue or the flickering though.
In either case I'll do a few tests with compton running to see if that makes
a difference.

I also have a co-worker with an identical computer running Gnome on X11, so
I'll see if I can get ahold of his computer and test this as well.
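
Since resolution and the presence of a compositor both came up as
possible factors, it may help if everyone testing reports the same few
facts. A rough sketch, assuming an X11 session with `xdpyinfo` and
`pgrep` installed; the helper names `have_cmd` and `display_report` are
hypothetical, and only compton is checked because it is the compositor
mentioned above.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the named program is on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Print the screen size, whether the X server advertises XDBE, and
# whether a compton process is running.
display_report() {
    if have_cmd xdpyinfo; then
        # The "dimensions:" line gives the screen size in pixels.
        xdpyinfo 2>/dev/null | grep 'dimensions:' || echo "dimensions: unknown"
        # XDBE appears in the extension list under the name DOUBLE-BUFFER.
        if xdpyinfo 2>/dev/null | grep -q 'DOUBLE-BUFFER'; then
            echo "XDBE: available"
        else
            echo "XDBE: not reported"
        fi
    else
        echo "xdpyinfo not found"
    fi
    if have_cmd pgrep && pgrep -x compton >/dev/null 2>&1; then
        echo "compositor: compton running"
    else
        echo "compositor: compton not detected"
    fi
}
```

Other compositors would need their own process names in place of
`compton`.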



Dmitry Gutov
In reply to this post by Noam Postavsky-2
On 10/5/17 5:02 PM, Noam Postavsky wrote:

> Try 'rm aclocal.m4' (this is mentioned in INSTALL.REPO as of last May)

Thanks. Now I can confirm that when using the revision before
double-buffering was added, glxgears indeed do not stutter at all
(unlike the current emacs-26).



Eli Zaretskii
> Cc: Bryan Gilbert <[hidden email]>, Eli Zaretskii <[hidden email]>,
>  [hidden email]
> From: Dmitry Gutov <[hidden email]>
> Date: Sat, 7 Oct 2017 10:53:38 +0300
>
> On 10/5/17 5:02 PM, Noam Postavsky wrote:
>
> > Try 'rm aclocal.m4' (this is mentioned in INSTALL.REPO as of last May)
>
> Thanks. Now I can confirm that when using the revision before
> double-buffering was added, glxgears indeed do not stutter at all
> (unlike the current emacs-26).

Can some of you use a tool like perf to see whether the time taken by
XdbeSwapBuffers (called from xterm.c:show_back_buffer) indeed grows
significantly when going from the 2K class of resolutions to the 4K
class?  And if it isn't XdbeSwapBuffers, then what takes most of the
time which causes that "stutter"?

More generally, for some display operation to be perceptible as
"stutter", IME we should see that operation taking times around 50 to
100 msec.  Can we establish what part of frame/window update takes the
lion's share of that time when XDBE is used?  With that information in
hand, we may find some way forward with this issue.

Thanks.



Dmitry Gutov
On 10/7/17 11:14 AM, Eli Zaretskii wrote:

> Can some of you use a tool like perf to see whether the time taken by
> XdbeSwapBuffers (called from xterm.c:show_back_buffer) indeed grows
> significantly when going from the 2K class of resolutions to the 4K
> class?  And if it isn't XdbeSwapBuffers, then what takes most of the
> time which causes that "stutter"?

With some pointers on how to use perf, I'd be happy to do that (or even
without, if you're willing to wait).

But I'm not seeing a qualitative difference between a 4K fullscreen
Emacs, a half-screen Emacs, or even Emacs with window resized further
down: there are still minor stutters here. So I'm probably not the best
person for this experiment.



Eli Zaretskii
> Cc: [hidden email], [hidden email], [hidden email]
> From: Dmitry Gutov <[hidden email]>
> Date: Mon, 9 Oct 2017 16:53:18 +0300
>
> On 10/7/17 11:14 AM, Eli Zaretskii wrote:
>
> > Can some of you use a tool like perf to see whether the time taken by
> > XdbeSwapBuffers (called from xterm.c:show_back_buffer) indeed grows
> > significantly when going from the 2K class of resolutions to the 4K
> > class?  And if it isn't XdbeSwapBuffers, then what takes most of the
> > time which causes that "stutter"?
>
> With some pointers on how to use perf, I'd be happy to do that

I think this page (which you probably already know about) is a good
starting point:

  http://www.brendangregg.com/perf.html

> (or even without, if you're willing to wait).

There's no rush, so please take your time.

> But I'm not seeing a qualitative difference between a 4K fullscreen
> Emacs, a half-screen Emacs, or even Emacs with window resized further
> down: there are still minor stutters here. So I'm probably not the best
> person for this experiment.

Well, you saw a difference between a 4K display and a 2K display,
didn't you?  All we need is to compare 2 situations and see where's
the extra time spent.

Maybe Bryan could also try the profile, as he reported a significant
difference.

Thanks.



Dmitry Gutov
On 10/9/17 5:03 PM, Eli Zaretskii wrote:

> I think this page (which you probably already know about) is a good
> starting point:
>
>    http://www.brendangregg.com/perf.html

I still haven't found the appropriate recipe there, but somebody else on
the internet suggested this, effectively:

sudo perf record -g src/emacs
# produces perf.data
sudo perf report -g -i perf.data
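
A variation that keeps one recording per scenario, so the two can be
compared later, is sketched below. It attaches to an Emacs that is
already running instead of starting one under perf. The helper names,
the `DRY_RUN` variable, the `perf-LABEL.data` file naming and the
30-second default are all assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the command when DRY_RUN is set, else run it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Record call graphs from a running Emacs for a fixed time.
#   $1: PID of the Emacs process
#   $2: scenario label, e.g. "4k" or "small" (arbitrary)
#   $3: seconds to record (defaults to 30)
record_scenario() {
    run sudo perf record -g -p "$1" -o "perf-$2.data" -- sleep "${3:-30}"
}

# Show the report lines that mention XdbeSwapBuffers for one scenario.
swap_share() {
    sudo perf report --stdio -i "perf-$1.data" | grep XdbeSwapBuffers
}
```

For example, `record_scenario 1234 4k` while typing in counsel-rg, then
`swap_share 4k`, where 1234 stands in for the real Emacs PID.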

> Well, you saw a difference between a 4K display and a 2K display,
> didn't you?  All we need is to compare 2 situations and see where's
> the extra time spent.

A certain difference, but not a stark one. Like, I couldn't produce a
full-on stuttering even with 4K (sometimes the gears continue spinning
fine; probably has something to do with thread or process scheduling).
With a small-window Emacs, the gears are spinning mostly fine.

Here are the window configurations:

1. Emacs fullscreen, 4K.
2. Emacs in a small window, much less than 2K.

Unfortunately, and if I'm reading the report right, XdbeSwapBuffers
takes only 0,03% of CPU time in the first case and 0,02% in the second
case. So, less than 1 percent in both cases.

Here's how it looks. I search for the function name in the report
program, and it shows something like this:

   Children  Self   Comma  Shared Object     Symbol
   0,03%     0,03%  emacs  libXext.so.6.4.0  [.] XdbeSwapBuffers
   0,00%     0,00%  emacs  emacs             [.] XdbeSwapBuffers@plt

I'm not quite sure if perf.data contains sensitive information, but I'd
be happy to send you the files produced by both scenarios for further
analysis. Questions welcome, too.



Eli Zaretskii
> Cc: [hidden email], [hidden email], [hidden email]
> From: Dmitry Gutov <[hidden email]>
> Date: Mon, 16 Oct 2017 01:55:54 +0300
>
>    Children  Self   Comma  Shared Object     Symbol
>    0,03%     0,03%  emacs  libXext.so.6.4.0  [.] XdbeSwapBuffers
>    0,00%     0,00%  emacs  emacs             [.] XdbeSwapBuffers@plt

Hmm... so do you see any difference between these 2 scenarios in other
parts of the profiles?

Thanks.
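
One way to look for such differences is `perf diff`, which compares two
recordings symbol by symbol. The sketch below assumes the two scenarios
were saved under the hypothetical names perf-small.data and perf-4k.data
(for example with `perf record -o`); the helper names and `DRY_RUN` are
likewise made up. Note that `perf record` samples time spent on the CPU
by default, so a small share for XdbeSwapBuffers does not by itself rule
out time spent blocked waiting on the X server.

```shell
#!/bin/sh
# Hypothetical helper: print the command when DRY_RUN is set, else run it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Compare two recordings; the first file is treated as the baseline.
#   $1: baseline profile, e.g. the small-window scenario
#   $2: profile to compare against it, e.g. the 4K fullscreen scenario
compare_profiles() {
    run sudo perf diff "$1" "$2"
}
```

Usage would be `compare_profiles perf-small.data perf-4k.data`; symbols
whose share changes the most between the two files are the ones worth a
closer look.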