(select-window nil) crash with gcc-8.2.0

Madhu-8
Hello, Compiling emacs with gcc-8.2.0 on amd64 with CFLAGS = -O2 -Os
causes emacs to crash when invoking M-: (select-window nil).  Clearly
gcc-8.2.0 is miscompiling code with these optimization settings (-O2
-Os) and I'm seeing crashes elsewhere where I am unable to isolate the
problem.  However the emacs crash is easily isolatable and could point
to the bug in either gcc, (or possibly in emacs if there is some wrong
assumption). Maybe someone with gcc-8.2.0 can verify the crash?

Re: (select-window nil) crash with gcc-8.2.0

Eli Zaretskii
On April 7, 2019 5:11:57 AM GMT+03:00, Madhu <[hidden email]> wrote:
> Hello, Compiling emacs with gcc-8.2.0 on amd64 with CFLAGS = -O2 -Os
> causes emacs to crash when invoking M-: (select-window nil).  Clearly
> gcc-8.2.0 is miscompiling code with these optimization settings (-O2
> -Os) and I'm seeing crashes elsewhere where I am unable to isolate the
> problem.  However the emacs crash is easily isolatable and could point
> to the bug in either gcc, (or possibly in emacs if there is some wrong
> assumption). Maybe someone with gcc-8.2.0 can verify the crash?

Please state the version of Emacs in which this happened, and preferably also show a backtrace from the crash that identifies the problematic variables on the C level.

Thanks.

Re: (select-window nil) crash with gcc-8.2.0

Madhu-8
* Eli Zaretskii <[hidden email]> :
Wrote on Sun, 07 Apr 2019 06:50:11 +0300:

> On April 7, 2019 5:11:57 AM GMT+03:00, Madhu <[hidden email]> wrote:
>> Hello, Compiling emacs with gcc-8.2.0 on amd64 with CFLAGS = -O2 -Os
>> causes emacs to crash when invoking M-: (select-window nil).  Clearly
>> gcc-8.2.0 is miscompiling code with these optimization settings (-O2
>> -Os) and I'm seeing crashes elsewhere where I am unable to isolate
>> the problem.  However the emacs crash is easily isolatable and could
>> point to the bug in either gcc, (or possibly in emacs if there is
>> some wrong assumption). Maybe someone with gcc-8.2.0 can verify the
>> crash?
>
> Please state the version of Emacs in which this happened, and
> preferably also show a backtrace from the crash that identifies the
> problematic variables on the C level.

[ First some pdmp notes:
I'd blown away the build directory and overwritten the installed version,
but I still had a copy of the binary dist.  However, I could not unpack the
binary dist to some other location and run emacs from there:
EMACSLOADPATH=/dev/shm/emacs-tmp/usr/share/emacs/27.0.50/lisp EMACSDATA=/dev/shm/emacs-tmp/usr/share/emacs/27.0.50/etc /dev/shm/emacs-tmp/usr/bin/emacs-27-vcs -Q
emacs: could not load dump file
"/usr/libexec/emacs/27.0.50/x86_64-pc-linux-gnu/emacs.pdmp": not built
for this Emacs executable.
I tried moving the installed pdmp to the bin directory where the
executable was unpacked, and then tried running it.

That was not enough either. emacs was still looking for
"/usr/libexec/emacs/27.0.50/x86_64-pc-linux-gnu/emacs.pdmp".

I tried removing the pdmp file from the standard location.

Now emacs started up, but apparently it didn't pick up the pdmp file from
bin/.  It loaded loadup.el instead.

Then I realised that the bin dist was stripped, and that a rebuild would
take only a few minutes.]

The following backtrace is from 051533c6 (03-apr-2019) on master, taken
after evaluating (select-window nil).

The problem is that if gcc is producing the wrong code then the
backtrace is unreliable.  This is not the backtrace one would expect
from calling (select-window nil).  But at least with this test case in
emacs I'm able to get *something* instead of 100s of empty nonsense
frames.

Attaching to process 6940
[New LWP 6941]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
0x00007f1bad3bd76b in __pselect (nfds=7, readfds=0x7ffd3c33e4b0,
    writefds=0x7ffd3c33e530, exceptfds=0x0, timeout=<optimized out>,
    sigmask=<optimized out>) at ../sysdeps/unix/sysv/linux/pselect.c:69
69 ../sysdeps/unix/sysv/linux/pselect.c: No such file or directory.
(gdb) c
Continuing.

Thread 1 "emacs" received signal SIGSEGV, Segmentation fault.
0x0000000000454ebf in select_window (window=0x0, norecord=0x0,
    inhibit_point_swap=<optimized out>) at lisp.h:1079
1079  return make_lisp_symbol (&lispsym[index]);
(gdb) back
#0  0x0000000000454ebf in select_window (window=0x0, norecord=0x0,
    inhibit_point_swap=<optimized out>) at lisp.h:1079
#1  0x0000000000501a40 in eval_sub (form=form@entry=0xb9eb63) at lisp.h:2119
#2  0x0000000000502c03 in Feval (form=0xb9eb63, lexical=0x0) at eval.c:2117
#3  0x0000000000501117 in funcall_subr (subr=subr@entry=0x90f5c0 <Seval>,
    numargs=numargs@entry=2, args=args@entry=0x7ffd3c33ed30) at eval.c:2907
#4  0x000000000050006d in Ffuncall (nargs=3, args=args@entry=0x7ffd3c33ed28)
    at lisp.h:2119
#5  0x0000000000526e3d in exec_byte_code (bytestr=<optimized out>,
    vector=<optimized out>, maxdepth=0x2a, args_template=<optimized out>,
    nargs=nargs@entry=4, args=<optimized out>, args@entry=0x9)
    at bytecode.c:633
#6  0x0000000000501d1d in funcall_lambda (fun=fun@entry=0x7f1bab8c2f95,
    nargs=nargs@entry=4, arg_vector=0x9, arg_vector@entry=0x7ffd3c33eff0)
    at lisp.h:1862
#7  0x00000000005000d0 in Ffuncall (nargs=nargs@entry=5,
    args=args@entry=0x7ffd3c33efe8) at eval.c:2844
#8  0x00000000004fdb3a in Ffuncall_interactively (nargs=5,
    args=0x7ffd3c33efe8) at callint.c:253
#9  0x0000000000501117 in funcall_subr (
    subr=subr@entry=0x90f100 <Sfuncall_interactively>,
    numargs=numargs@entry=5, args=args@entry=0x7ffd3c33efe8) at eval.c:2907
#10 0x000000000050006d in Ffuncall (nargs=nargs@entry=6, args=0x7ffd3c33efe0)
    at lisp.h:2119
#11 0x000000000050039b in Fapply (nargs=nargs@entry=3,
    args=args@entry=0x7ffd3c33f138) at eval.c:2450
#12 0x00000000004fdf63 in Fcall_interactively (function=0x7f1baaf3e480,
    record_flag=0x0, keys=0x7f1babcc742d) at lisp.h:1079
#13 0x000000000050112a in funcall_subr (
    subr=subr@entry=0x90f0c0 <Scall_interactively>, numargs=numargs@entry=3,
    args=args@entry=0x7ffd3c33f280) at eval.c:2910
#14 0x000000000050006d in Ffuncall (nargs=4, args=args@entry=0x7ffd3c33f278)
    at lisp.h:2119
#15 0x0000000000526e3d in exec_byte_code (bytestr=<optimized out>,
    vector=<optimized out>, maxdepth=0x36, args_template=<optimized out>,
    nargs=nargs@entry=1, args=<optimized out>, args@entry=0x5)
    at bytecode.c:633
#16 0x0000000000501d1d in funcall_lambda (fun=fun@entry=0x7f1bab85b30d,
    nargs=nargs@entry=1, arg_vector=0x5, arg_vector@entry=0x7ffd3c33f498)
    at lisp.h:1862
#17 0x00000000005000d0 in Ffuncall (nargs=nargs@entry=2,
    args=args@entry=0x7ffd3c33f490) at eval.c:2844
#18 0x00000000005001b6 in call1 (fn=fn@entry=0x3bd0, arg1=<optimized out>)
    at eval.c:2681
#19 0x00000000004b488a in command_loop_1 () at lisp.h:1079
#20 0x00000000004ff7a5 in internal_condition_case (
    bfun=bfun@entry=0x4b441d <command_loop_1>,
    handlers=handlers@entry=0x4f50, hfun=hfun@entry=0x4acd85 <cmd_error>)
    at eval.c:1352
#21 0x00000000004aa968 in command_loop_2 (ignore=ignore@entry=0x0)
    at lisp.h:1079
#22 0x00000000004ff723 in internal_catch (tag=tag@entry=0xc7b0,
    func=func@entry=0x4aa953 <command_loop_2>, arg=arg@entry=0x0)
    at eval.c:1115
#23 0x00000000004aa917 in command_loop () at lisp.h:1079
#24 0x00000000004acabf in recursive_edit_1 () at keyboard.c:714
#25 0x00000000004accfb in Frecursive_edit () at keyboard.c:786
#26 0x0000000000414c67 in main (argc=2, argv=0x7ffd3c33f878) at emacs.c:1958
(gdb)

Re: (select-window nil) crash with gcc-8.2.0

Andreas Schwab-2
On Apr 07 2019, Madhu <[hidden email]> wrote:

> the problem is that if gcc is producing the wrong code then the
> backtrace is unreliable.  This is not the backtrace one would expect
> from calling (select-window nil).

Step through select_window to see where it goes wrong.

Andreas.

--
Andreas Schwab, [hidden email]
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."

Re: (select-window nil) crash with gcc-8.2.0

Eli Zaretskii
> From: Andreas Schwab <[hidden email]>
> Date: Sun, 07 Apr 2019 09:13:44 +0200
> Cc: [hidden email]
>
> On Apr 07 2019, Madhu <[hidden email]> wrote:
>
> > the problem is that if gcc is producing the wrong code then the
> > backtrace is unreliable.  This is not the backtrace one would expect
> > from calling (select-window nil).
>
> Step through select_window to see where it goes wrong.

Right.  And I'm afraid the stepping will need to be done on machine
instruction level, i.e. "stepi", not "step".  The backtrace looks
completely bogus, it doesn't even show the file names correctly, let
alone line numbers.

The main issue here is why CHECK_LIVE_WINDOW doesn't do its job in
that case.  It does here, and signals an error because nil is not a
live window.

Re: (select-window nil) crash with gcc-8.2.0

Paul Eggert
In reply to this post by Madhu-8
I reproduced the bug with GCC 8.3.1 20190223 (Red Hat 8.3.1-2) on x86-64. It's
clearly a compiler bug with -O2 -Os. The machine code for select_window starts
this way:

select_window:
         pushq   %r13
         movl    %edx, %r13d
         pushq   %r12
         pushq   %rbp
         movq    %rsi, %rbp
         pushq   %rbx
         movq    %rdi, %rbx
         pushq   %rcx
         call    WINDOWP
         movq    75(%rbx), %r12
         xorl    %edi, %edi
         testb   %al, %al
         je      .L981

and that last movq dereferences the window pointer in %rbx before the result of
WINDOWP is checked to verify that the argument (originally in %rdi, now in %rbx)
is indeed a window.
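
As a rough illustration of the ordering the source asks for, here is a
minimal C sketch.  The names toy_tag, toy_window, toy_windowp and
toy_select_window are hypothetical stand-ins, not the Emacs definitions;
the only point is that the field read has to come after the type check
succeeds, whereas the machine code above performs the load first.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only; the real Emacs types
   (Lisp_Object, struct window) are considerably more involved.  */
enum toy_tag { TOY_OTHER, TOY_WINDOW };

struct toy_window
{
  enum toy_tag tag;   /* what kind of object this is */
  void *contents;     /* non-NULL only for a "live" window in this sketch */
};

/* Type predicate: true only for a non-null object tagged as a window.  */
static bool
toy_windowp (const struct toy_window *w)
{
  return w != NULL && w->tag == TOY_WINDOW;
}

/* Returns false (standing in for signaling a Lisp error) unless W is a
   live window.  The contents field is read only after toy_windowp has
   returned true, so a null or non-window argument is never dereferenced
   as a window.  */
bool
toy_select_window (const struct toy_window *w, void **contents_out)
{
  if (!toy_windowp (w))
    return false;
  if (w->contents == NULL)
    return false;
  *contents_out = w->contents;
  return true;
}
```

With this ordering, passing a null pointer (the analogue of nil here)
yields a clean failure rather than a fault.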

Could you file a GCC bug report for this? And in the meantime, I wouldn't use -Os.

Re: (select-window nil) crash with gcc-8.2.0

Eli Zaretskii
> From: Paul Eggert <[hidden email]>
> Date: Sun, 7 Apr 2019 11:33:22 -0700
> Cc: [hidden email]
>
> and that last movq dereferences the window pointer in %rbx before the result of
> WINDOWP is checked to verify that the argument (originally in %rdi, now in %rbx)
> is indeed a window.
>
> Could you file a GCC bug report for this? And in the meantime, I wouldn't use -Os.

Thanks.  So just -O2 produces correct code?  Because I think GCC 8 is
quite popular, and the default build uses -O2.  If -O2 is prone to
something similar, maybe we should make some changes in the code to
avoid that.


Re: (select-window nil) crash with gcc-8.2.0

Paul Eggert
Eli Zaretskii wrote:
> Thanks.  So just -O2 produces correct code?

Yes, it's -O2 -Os that's the problem. -Os is rarely used so I doubt whether we
need to work around the GCC bug.
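
For anyone unsure which mode a particular build ended up in: per the GCC
manual, when several -O options are given the last one is the one that
takes effect, so "-O2 -Os" should amount to -Os.  A small sketch for
checking this from inside a translation unit, using GCC's predefined
macros __OPTIMIZE__ and __OPTIMIZE_SIZE__ (the function name
optimization_mode is made up for this example and is not part of Emacs):

```c
/* Reports how this translation unit was optimized, based on GCC's
   predefined macros.  __OPTIMIZE_SIZE__ is defined when optimizing for
   size (e.g. -Os); __OPTIMIZE__ is defined whenever optimization is on.  */
const char *
optimization_mode (void)
{
#if defined __OPTIMIZE_SIZE__
  return "optimizing for size";
#elif defined __OPTIMIZE__
  return "optimizing, not for size";
#else
  return "not optimizing";
#endif
}
```

Compiling a file containing this with the same CFLAGS as the Emacs build
and printing the result is one way to confirm which setting was actually
in effect.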

Re: (select-window nil) crash with gcc-8.2.0

Richard Stallman
[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > Yes, it's -O2 -Os that's the problem. -Os is rarely used so I doubt whether we
  > need to work around the GCC bug.

Have we reported the GCC bug?  That's very important, since we want
GCC to work correctly.

--
Dr Richard Stallman
President, Free Software Foundation (https://gnu.org, https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)



Re: (select-window nil) crash with gcc-8.2.0

Paul Eggert
Richard Stallman wrote:

> Have we reported the GCC bug?  That's very important, since we want
> GCC to work correctly.

Although I asked Madhu to file one I didn't see anything filed, so I went ahead
and filed a GCC bug report here:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90020

Re: (select-window nil) crash with gcc-8.2.0

Richard Stallman
[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > and filed a GCC bug report here:

  > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90020

Thank you.

It is very very important to report the bugs we find in other
programs, both GNU packages and others.  We want our users to do that
for us, so we have to do it for other programs.

Hackers, please make this the first thing you think of when you think
you've found a bug in some other program.

--
Dr Richard Stallman
President, Free Software Foundation (https://gnu.org, https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)



Re: (select-window nil) crash with gcc-8.2.0

Paul Eggert
Richard Stallman wrote:
>    >https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90020
>
> Thank you.
>
> It is very very important to report the bugs we find in other
> programs, both GNU packages and others.

Yes, a bug was going to be reported one way or another, by Madhu or by me. This
test case seems to have uncovered two GCC bugs, one introduced in GCC 4.9.x in
2014, and one that seems to predate that. Richard Biener is testing an overall
patch.

I considered putting something into etc/PROBLEMS warning people not to compile
Emacs with gcc -O2 -Os, but apparently that sort of GCC usage is pretty rare so
it's not clear it's worth documenting it (plus, the advice is not specific to
Emacs).

Re: (select-window nil) crash with gcc-8.2.0

Paul Eggert
On 4/9/19 7:01 PM, Paul Eggert wrote:
>>    >https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90020
>>
Richard Biener installed a patch for this bug into GCC trunk, so I
expect a fix to appear in GCC 9. I don't know of any plans to backport
the fix to earlier GCC releases.