bug#34235: 27.0.50; lisp profiler does not work

classic Classic list List threaded Threaded
19 messages Options
Reply | Threaded
Open this post in threaded view
|

bug#34235: 27.0.50; lisp profiler does not work

Alex Branham
The lisp profiler (i.e. M-x profiler-start, M-x profiler-report, etc)
seems to not work for me since the pdumper got merged (or sometime
around then). I'm not using --enable-profiling when I run ./configure.

emacs -q
M-x profiler-start (set cpu mode)
C-x C-f ~/.emacs.d/init.el
M-x profiler-report

results in the profiler report buffer only having the contents:

- ...                                                               1 100%
   Automatic GC                                                     1 100%

but surely there should be more in there. More generally, it doesn't
seem to ever report anything for me, no matter how long I leave it
running for. I've got no idea if the pdumper has anything to do with
this, it just seems to have been happening since about then.

Thanks,
Alex





In GNU Emacs 27.0.50 (build 1, x86_64-pc-linux-gnu, GTK+ Version 3.24.2)
 of 2019-01-25 built on localhost
Windowing system distributor 'The X.Org Foundation', version 11.0.11906000
System Description: NixOS 18.09.1985.749a3a0d00b (Jellyfish)

Configured using:
 'configure
 --prefix=/nix/store/dw0cpqxxrnm2khnmldp7mncn6db4779p-emacs-27.0.50
 --with-modules --with-x-toolkit=gtk3 --with-xft
 CFLAGS=-DMAC_OS_X_VERSION_MAX_ALLOWED=101200'

Configured features:
XPM JPEG TIFF GIF PNG RSVG SOUND DBUS GSETTINGS GLIB NOTIFY INOTIFY
LIBSELINUX GNUTLS LIBXML2 FREETYPE M17N_FLT LIBOTF XFT ZLIB
TOOLKIT_SCROLL_BARS GTK3 X11 XDBE XIM MODULES THREADS LIBSYSTEMD JSON
CANNOT_DUMP GMP

Important settings:
  value of $LANG: en_US.UTF-8
  locale-coding-system: utf-8-unix



Reply | Threaded
Open this post in threaded view
|

bug#34235: 27.0.50; lisp profiler does not work

Eli Zaretskii
> From: Alex Branham <[hidden email]>
> Date: Mon, 28 Jan 2019 12:34:03 -0600
>
> The lisp profiler (i.e. M-x profiler-start, M-x profiler-report, etc)
> seems to not work for me since the pdumper got merged (or sometime
> around then). I'm not using --enable-profiling when I run ./configure.
>
> emacs -q
> M-x profiler-start (set cpu mode)
> C-x C-f ~/.emacs.d/init.el
> M-x profiler-report
>
> results in the profiler report buffer only having the contents:
>
> - ...                                                               1 100%
>    Automatic GC                                                     1 100%
>
> but surely there should be more in there. More generally, it doesn't
> seem to ever report anything for me, no matter how long I leave it
> running for. I've got no idea if the pdumper has anything to do with
> this, it just seems to have been happening since about then.

I couldn't reproduce this with today's master branch.  I tried on
GNU/Linux and on 2 different MS-Windows machines, and profiling worked
for me on all of them.  I see much more than just "Automatic GC".

So I suggest that you look into possible local changes on your system
recently, and perhaps also try the emacs-26 branch (where there's no
pdumper).

For the record, after invoking profiler-start in CPU mode, I visited
xdisp.c and typed M->, then invoked profiler-report, and typed "C-u RET"
on every "+" sign in the leftmost column.  Did you do something similar?

Thanks.



Reply | Threaded
Open this post in threaded view
|

bug#34235: 27.0.50; lisp profiler does not work

Alan Mackenzie
Hello, Eli and Alex

On Tue, Jan 29, 2019 at 18:06:05 +0200, Eli Zaretskii wrote:
> > From: Alex Branham <[hidden email]>
> > Date: Mon, 28 Jan 2019 12:34:03 -0600

> > The lisp profiler (i.e. M-x profiler-start, M-x profiler-report, etc)
> > seems to not work for me since the pdumper got merged (or sometime
> > around then). I'm not using --enable-profiling when I run ./configure.

> > emacs -q
> > M-x profiler-start (set cpu mode)
> > C-x C-f ~/.emacs.d/init.el
> > M-x profiler-report

> > results in the profiler report buffer only having the contents:

> > - ...                                                               1 100%
> >    Automatic GC                                                     1 100%

> > but surely there should be more in there. More generally, it doesn't
> > seem to ever report anything for me, no matter how long I leave it
> > running for. I've got no idea if the pdumper has anything to do with
> > this, it just seems to have been happening since about then.

> I couldn't reproduce this with today's master branch.  I tried on
> GNU/Linux and on 2 different MS-Windows machines, and profiling worked
> for me on all of them.  I see much more than just "Automatic GC".

> So I suggest that you look into possible local changes on your system
> recently, and perhaps also try the emacs-26 branch (where there's no
> pdumper).

> For the record, after invoking profiler-start in CPU mode, I visited
> xdisp.c and typed M->, then invoked profiler-report, and typed "C-u RET"
> on every "+" sign in the leftmost column.  Did you do something similar?

For what it's worth, I see the bug in master in GNU/Linux, starting off
with emacs -Q, visiting xdisp.c, M-x profiler-start <CR> <CR>, M->, M-x
profiler-report.

I see in the report just the single entry for Automatic GC, like Alex.

I last updated my repository yesterday, just before midday UTC.

> Thanks.

--
Alan Mackenzie (Nuremberg, Germany).



Reply | Threaded
Open this post in threaded view
|

bug#34235: 27.0.50; lisp profiler does not work

Alex Branham

On Wed 30 Jan 2019 at 07:33, Alan Mackenzie <[hidden email]> wrote:

> For what it's worth, I see the bug in master in GNU/Linux, starting off
> with emacs -Q, visiting xdisp.c, M-x profiler-start <CR> <CR>, M->, M-x
> profiler-report.
>
> I see in the report just the single entry for Automatic GC, like Alex.
>
> I last updated my repository yesterday, just before midday UTC.

Thanks, Alan. Good to know it's not just me.

Since submitting this bug, I've discovered that it affects previous
versions of Emacs as well. Emacs from git master that I compile myself,
Emacs 26.1 from my distro (nixOS), and Emacs 25.3 from my distro. I
haven't had time yet to dig in much more than that, but figured I'd
report that as well since it rules out the pdumper.

Alex

signature.asc (497 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

bug#34235: 27.0.50; lisp profiler does not work

Eli Zaretskii
In reply to this post by Alan Mackenzie
> Date: Wed, 30 Jan 2019 13:33:11 +0000
> Cc: [hidden email]
> From: Alan Mackenzie <[hidden email]>
>
> For what it's worth, I see the bug in master in GNU/Linux, starting off
> with emacs -Q, visiting xdisp.c, M-x profiler-start <CR> <CR>, M->, M-x
> profiler-report.
>
> I see in the report just the single entry for Automatic GC, like Alex.

Thanks, I hope you (or someone else who sees this) will be able to
debug this.



Reply | Threaded
Open this post in threaded view
|

bug#34235: 27.0.50; lisp profiler does not work

Alan Mackenzie
In reply to this post by Alex Branham
Hello, Alex and Eli.

On Wed, Jan 30, 2019 at 09:40:02 -0600, Alex Branham wrote:

> On Wed 30 Jan 2019 at 07:33, Alan Mackenzie <[hidden email]> wrote:

> > For what it's worth, I see the bug in master in GNU/Linux, starting off
> > with emacs -Q, visiting xdisp.c, M-x profiler-start <CR> <CR>, M->, M-x
> > profiler-report.

> > I see in the report just the single entry for Automatic GC, like Alex.

> > I last updated my repository yesterday, just before midday UTC.

> Thanks, Alan. Good to know it's not just me.

> Since submitting this bug, I've discovered that it affects previous
> versions of Emacs as well. Emacs from git master that I compile myself,
> Emacs 26.1 from my distro (nixOS), and Emacs 25.3 from my distro. I
> haven't had time yet to dig in much more than that, but figured I'd
> report that as well since it rules out the pdumper.

It seems to be a Linux problem, at least at my end.

I see the bug with kernel versions 4.19.17 and 4.19.13.
I don't see the bug with versions 4.18.19 and 4.19.10.

It seems some incompatibility between Emacs and Linux was introduced in
one of the kernels 4.19.1[123].

I don't know how to debug this any further.

> Alex

--
Alan Mackenzie (Nuremberg, Germany).



Reply | Threaded
Open this post in threaded view
|

bug#34235: 27.0.50; lisp profiler does not work

Alex Branham

On Wed 30 Jan 2019 at 10:15, Alan Mackenzie <[hidden email]> wrote:

> It seems to be a Linux problem, at least at my end.
>
> I see the bug with kernel versions 4.19.17 and 4.19.13.
> I don't see the bug with versions 4.18.19 and 4.19.10.
>
> It seems some incompatibility between Emacs and Linux was introduced in
> one of the kernels 4.19.1[123].

I'm on kernel 4.14.95, so if it is a Linux issue, it's been backported.


signature.asc (497 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

bug#34235: 27.0.50; lisp profiler does not work

Eli Zaretskii
In reply to this post by Alan Mackenzie
> Date: Wed, 30 Jan 2019 16:15:21 +0000
> Cc: [hidden email]
> From: Alan Mackenzie <[hidden email]>
>
> It seems to be a Linux problem, at least at my end.
>
> I see the bug with kernel versions 4.19.17 and 4.19.13.
> I don't see the bug with versions 4.18.19 and 4.19.10.
>
> It seems some incompatibility between Emacs and Linux was introduced in
> one of the kernels 4.19.1[123].

Maybe some problems with the SIGPROF signal?  Does the signal handler
(handle_profiler_signal) get called at all when you are profiling?

If the signal handler does get called, maybe it's the
backtrace_top_function function that fails?



Reply | Threaded
Open this post in threaded view
|

bug#34235: 27.0.50; lisp profiler does not work

Eli Zaretskii
In reply to this post by Alex Branham
> From: Alex Branham <[hidden email]>
> Cc: Eli Zaretskii <[hidden email]>, [hidden email]
> Date: Wed, 30 Jan 2019 10:23:10 -0600
>
> > I see the bug with kernel versions 4.19.17 and 4.19.13.
> > I don't see the bug with versions 4.18.19 and 4.19.10.
> >
> > It seems some incompatibility between Emacs and Linux was introduced in
> > one of the kernels 4.19.1[123].
>
> I'm on kernel 4.14.95, so if it is a Linux issue, it's been backported.

Do you keep records of your system updates?  If so, can you look there
for relevant recent updates, like glibc or some kernel modules/drivers
or libraries?



Reply | Threaded
Open this post in threaded view
|

bug#34235: 27.0.50; lisp profiler does not work

Alex Branham

On Wed 30 Jan 2019 at 10:43, Eli Zaretskii <[hidden email]> wrote:

>> From: Alex Branham <[hidden email]>
>> Cc: Eli Zaretskii <[hidden email]>, [hidden email]
>> Date: Wed, 30 Jan 2019 10:23:10 -0600
>>
>> > I see the bug with kernel versions 4.19.17 and 4.19.13.
>> > I don't see the bug with versions 4.18.19 and 4.19.10.
>> >
>> > It seems some incompatibility between Emacs and Linux was introduced in
>> > one of the kernels 4.19.1[123].
>>
>> I'm on kernel 4.14.95, so if it is a Linux issue, it's been backported.
>
> Do you keep records of your system updates?  If so, can you look there
> for relevant recent updates, like glibc or some kernel modules/drivers
> or libraries?
nixOS keeps old systems laying around. I was able to go back to a system
that doesn't have this bug (back in December, nixos-version
18.09.1763.fbb7dbdb95d). The only differences in package versions
between this system and the next one (where the profiler doesn't work,
nixos-version 18.09.1829.0396345b794) is this:

- linux-4.14.89
+ linux-4.14.91
- python3-3.6.7
+ python3-3.6.8
- libjpeg-turbo-1.5.3
+ libjpeg-turbo-2.0.1
- cpupower-4.14.89
+ cpupower-4.14.91

If there's any other info I can share, let me know.

Thanks,
Alex

signature.asc (497 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

bug#34235: 0e334db6bb4b1fd1e2d72c1f3d8f004313cd9f94 (posix-timers: Fix division by zero bug). Problems with glibc.

Alan Mackenzie
In reply to this post by Alex Branham
Hello, Thomas, Hello Linux.

0e334db6bb4b1fd1e2d72c1f3d8f004313cd9f94
posix-timers: Fix division by zero bug
Committed: 2018-12-17 17:35:45 +0100

With this patch in place I am seeing problems with glibc's function
timer_create.  I am an Emacs maintainer, and saw these problems whilst
investigating Emacs bug #34235 "27.0.50; lisp profiler does not work".
Full details of this bug are at
https://debbugs.gnu.org/cgi/bugreport.cgi?bug=34235.

Emacs's profiler fails in kernel 4.19.13, but works in a version of
4.19.13 with the patch reversed, otherwise unchanged.  My current version
of glibc is 2.27-r6 (I think the "-r6" comes from Gentoo, my distro).

The Emacs profiler works by a signal handler being repeatedly triggered
by the SIGPROF signal every 1 millisecond.  In the bug scenario, this
signal gets triggered precisely once each time the Emacs profiler is
started, rather than continually.

The core of the code in Emacs which initialises the glibc timer is:

      int i;
      struct sigevent sigev;
      sigev.sigev_value.sival_ptr = &profiler_timer;
      sigev.sigev_signo = SIGPROF;
      sigev.sigev_notify = SIGEV_SIGNAL;

      for (i = 0; i < ARRAYELTS (system_clock); i++)
        if (timer_create (system_clock[i], &sigev, &profiler_timer) == 0)
          {
            profiler_timer_ok = 1;
            break;
          }
    }

  if (profiler_timer_ok)
    {
      struct itimerspec ispec;
      ispec.it_value = ispec.it_interval = interval;
      if (timer_settime (profiler_timer, 0, &ispec, 0) == 0)
        return TIMER_SETTIME_RUNNING;
    }

The variable `interval' has been checked as non-zero.  This code is in
.../emacs/src/profiler.c

It seems either that the patch has uncovered some invalid call between
Emacs and glibc, or between glibc and Linux, or that there is some
intrinsic problem with the patch.

I have very little familiarity with glibc and Linux source code, so I
would be greatful if you could help me investigate the bug scenario.
Naturally, I will help as I can in this process.

Thanks in advance!

--
Alan Mackenzie (Nuremberg, Germany).



Reply | Threaded
Open this post in threaded view
|

bug#34235: 0e334db6bb4b1fd1e2d72c1f3d8f004313cd9f94 (posix-timers: Fix division by zero bug). Problems with glibc.

Thomas Gleixner
Hello Alan,

On Fri, 1 Feb 2019, Alan Mackenzie wrote:
> 0e334db6bb4b1fd1e2d72c1f3d8f004313cd9f94
> posix-timers: Fix division by zero bug
> Committed: 2018-12-17 17:35:45 +0100
>
> With this patch in place I am seeing problems with glibc's function
> timer_create.  I am an Emacs maintainer, and saw these problems whilst
> investigating Emacs bug #34235 "27.0.50; lisp profiler does not work".

> Emacs's profiler fails in kernel 4.19.13, but works in a version of
> 4.19.13 with the patch reversed, otherwise unchanged.  My current version
> of glibc is 2.27-r6 (I think the "-r6" comes from Gentoo, my distro).

Please upgrade to 4.19.19. The issue should be fixed there with the
backported variant of

   93ad0fc088c5 ("posix-cpu-timers: Unbreak timer rearming")

Commit 21c0d1621b8d4b in 4.19.19

Thanks,

        tglx



Reply | Threaded
Open this post in threaded view
|

bug#34235: 0e334db6bb4b1fd1e2d72c1f3d8f004313cd9f94 (posix-timers: Fix, division by zero bug). Problems with glibc.

Paul Eggert
In reply to this post by Alex Branham
Thanks for helping to track down this bug. Since the problem occurs only
with a few Linux kernel versions and affects Emacs only when doing CPU
profiling, it doesn't seem worth spending time to try to patch Emacs to
work around the bug. So I installed the attached patch into emacs-26's
etc/PROBLEMS file to warn users about the problem, and am closing the
Emacs bug report.


0001-etc-PROBLEMS-Mention-profiler-report-bug-Bug-34235.patch (855 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

bug#34235: 0e334db6bb4b1fd1e2d72c1f3d8f004313cd9f94 (posix-timers: Fix, division by zero bug). Problems with glibc.

Thomas Gleixner
On Fri, 1 Feb 2019, Paul Eggert wrote:

> Thanks for helping to track down this bug. Since the problem occurs only with
> a few Linux kernel versions and affects Emacs only when doing CPU profiling,
> it doesn't seem worth spending time to try to patch Emacs to work around the
> bug. So I installed the attached patch into emacs-26's etc/PROBLEMS file to
> warn users about the problem, and am closing the Emacs bug report.

Can you please verify whether the issue is fixed with 4.19.19? The commit
in question broke posix CPU timers as a unintended side effect and a follow
up patch unbreaks them again.

Thanks,

        tglx



Reply | Threaded
Open this post in threaded view
|

bug#34235: 0e334db6bb4b1fd1e2d72c1f3d8f004313cd9f94 (posix-timers: Fix division by zero bug). Problems with glibc.

Alan Mackenzie
In reply to this post by Thomas Gleixner
Hello, Thomas.

Thanks for such a rapid reply!

On Fri, Feb 01, 2019 at 23:04:48 +0100, Thomas Gleixner wrote:
> Hello Alan,

> On Fri, 1 Feb 2019, Alan Mackenzie wrote:
> > 0e334db6bb4b1fd1e2d72c1f3d8f004313cd9f94
> > posix-timers: Fix division by zero bug
> > Committed: 2018-12-17 17:35:45 +0100

> > With this patch in place I am seeing problems with glibc's function
> > timer_create.  I am an Emacs maintainer, and saw these problems whilst
> > investigating Emacs bug #34235 "27.0.50; lisp profiler does not work".

> > Emacs's profiler fails in kernel 4.19.13, but works in a version of
> > 4.19.13 with the patch reversed, otherwise unchanged.  My current version
> > of glibc is 2.27-r6 (I think the "-r6" comes from Gentoo, my distro).

> Please upgrade to 4.19.19. The issue should be fixed there with the
> backported variant of

>    93ad0fc088c5 ("posix-cpu-timers: Unbreak timer rearming")

> Commit 21c0d1621b8d4b in 4.19.19

I've just built and installed Linux 4.19.19, and it does indeed solve
the Emacs profiler bug, #34235.  :-)

I see that the patch has been installed in 4.20.6, 4.19.19, and 4.14.97.
Are there any plans to install it into 4.9.x, the other live long term
support branch?  The reason I ask is to make an entry into Emacs's
PROBLEMS file, telling users and distributions which kernel versions to
upgrade to.

> Thanks,

> tglx

--
Alan Mackenzie (Nuremberg, Germany).



Reply | Threaded
Open this post in threaded view
|

bug#34235: 0e334db6bb4b1fd1e2d72c1f3d8f004313cd9f94 (posix-timers: Fix, division by zero bug). Problems with glibc.

Alex Branham
In reply to this post by Paul Eggert
On Fri 01 Feb 2019 at 20:07, Paul Eggert <[hidden email]> wrote:

> +A few versions of the Linux kernel have timer bugs that break CPU
> +profiling; see Bug#34235.  To fix the problem, upgrade to kernel
> +versions 4.19.19 or 4.20.6, or later.
> +
>  *** GNU/Linux: Process output is corrupted.
>
>  There is a bug in Linux kernel 2.6.10 PTYs that can cause emacs to

Hi all -

Thanks for all the work uncovering this one! I upgraded to
linux-4.14.97, which did solve the issue.

Thanks again!
Alex

signature.asc (497 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

bug#34235: 0e334db6bb4b1fd1e2d72c1f3d8f004313cd9f94 (posix-timers: Fix, division by zero bug). Problems with glibc.

Paul Eggert
In reply to this post by Thomas Gleixner
Thomas Gleixner wrote:
> Can you please verify whether the issue is fixed with 4.19.19?

It depends on what you mean by "verify". I looked at the Linux kernel source
code and checked that the "posix-cpu-timers: Unbreak timer rearming" patch is in
4.19.19 (but not 4.19.18) and in 4.20.6 (but not 4.20.5). I did not test Emacs's
CPU profiler on these kernels, as I don't have them installed. I expect to
upgrade soon to 4.20.6 (whenever Fedora 29 release does - 4.20.6 was submitted
for testing a couple of days ago) and plan to give it a try then.



Reply | Threaded
Open this post in threaded view
|

bug#34235: 0e334db6bb4b1fd1e2d72c1f3d8f004313cd9f94 (posix-timers: Fix division by zero bug). Problems with glibc.

Thomas Gleixner
In reply to this post by Alan Mackenzie
On Sat, 2 Feb 2019, Alan Mackenzie wrote:

> Hello, Thomas.
>
> Thanks for such a rapid reply!
>
> On Fri, Feb 01, 2019 at 23:04:48 +0100, Thomas Gleixner wrote:
> > Hello Alan,
>
> > On Fri, 1 Feb 2019, Alan Mackenzie wrote:
> > > 0e334db6bb4b1fd1e2d72c1f3d8f004313cd9f94
> > > posix-timers: Fix division by zero bug
> > > Committed: 2018-12-17 17:35:45 +0100
>
> > > With this patch in place I am seeing problems with glibc's function
> > > timer_create.  I am an Emacs maintainer, and saw these problems whilst
> > > investigating Emacs bug #34235 "27.0.50; lisp profiler does not work".
>
> > > Emacs's profiler fails in kernel 4.19.13, but works in a version of
> > > 4.19.13 with the patch reversed, otherwise unchanged.  My current version
> > > of glibc is 2.27-r6 (I think the "-r6" comes from Gentoo, my distro).
>
> > Please upgrade to 4.19.19. The issue should be fixed there with the
> > backported variant of
>
> >    93ad0fc088c5 ("posix-cpu-timers: Unbreak timer rearming")
>
> > Commit 21c0d1621b8d4b in 4.19.19
>
> I've just built and installed Linux 4.19.19, and it does indeed solve
> the Emacs profiler bug, #34235.  :-)
>
> I see that the patch has been installed in 4.20.6, 4.19.19, and 4.14.97.
> Are there any plans to install it into 4.9.x, the other live long term
> support branch?  The reason I ask is to make an entry into Emacs's
> PROBLEMS file, telling users and distributions which kernel versions to
> upgrade to.

4.9 doesn't have the offending commit AFAICT.

Thanks,

        tglx



Reply | Threaded
Open this post in threaded view
|

bug#34235: 0e334db6bb4b1fd1e2d72c1f3d8f004313cd9f94 (posix-timers: Fix division by zero bug). Problems with glibc.

Alan Mackenzie
Hello, Thomas.

On Mon, Feb 04, 2019 at 17:25:11 +0000, Thomas Gleixner wrote:
> On Sat, 2 Feb 2019, Alan Mackenzie wrote:

[ .... ]

> > I've just built and installed Linux 4.19.19, and it does indeed solve
> > the Emacs profiler bug, #34235.  :-)

> > I see that the patch has been installed in 4.20.6, 4.19.19, and 4.14.97.
> > Are there any plans to install it into 4.9.x, the other live long term
> > support branch?  The reason I ask is to make an entry into Emacs's
> > PROBLEMS file, telling users and distributions which kernel versions to
> > upgrade to.

> 4.9 doesn't have the offending commit AFAICT.

OK, thanks very much!  I've put these three version numbers into the
message in Emacs's PROBLEMS file.

I think we're finished, now.

> Thanks,

> tglx

--
Alan Mackenzie (Nuremberg, Germany).