bug#47067: 28.0.50; [feature/native-comp] Crash while scrolling through dispnew.c


Eli Zaretskii
I was hit by a segfault while scrolling through a C source file, in
this case dispnew.c.  The sequence of commands was this:

 emacs -Q
 C-h f sit-for RET
 Click on the link to subr.el
 In subr.el go to where sit-for calls sleep-for and type C-h f RET
 Click on "C source code" to display dispnew.c
 Scroll down with C-n or C-v

The backtrace appears below, with some data I collected.  The argument
'args' to Flss is obviously bogus, but I don't understand how it came
into existence.  Maybe related to 0x30, which stands for the symbol t?
The first call-stack frame above that I can examine, frame #4, calls
c-beginning-of-statement-1 with 4 nil args and the last argument of t.
The levels below that are impenetrable for me: is there a way of
digging into this
F632d626567696e6e696e672d6f662d73746174656d656e742d31_c_beginning_of_statement_1_0
thing?
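
One way to read such a symbol, as a minimal sketch: the hex digits between the leading "F" and the first underscore appear to be the bytes of the Lisp function name, followed by a sanitized copy of the name and a numeric suffix.  That layout is inferred from the names in this backtrace, not taken from the native-comp documentation, and the helper name decode_native_name is made up for illustration.

```python
def decode_native_name(c_name: str) -> str:
    """Recover the Lisp function name from a native-comp C symbol.

    Assumed layout (inferred from the backtrace, not from documentation):
    'F' + hex bytes of the Lisp name + '_' + sanitized name + '_' + counter.
    """
    if not c_name.startswith("F"):
        raise ValueError("not a native-comp function symbol")
    # Everything between the leading 'F' and the first '_' is hex.
    hex_part = c_name[1:].split("_", 1)[0]
    return bytes.fromhex(hex_part).decode("utf-8")


name = ("F632d626567696e6e696e672d6f662d73746174656d656e742d31"
        "_c_beginning_of_statement_1_0")
print(decode_native_name(name))  # prints c-beginning-of-statement-1
```

This only recovers the Lisp name; it says nothing about the generated code itself.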

Any suggestions for how to debug this further or what data to collect
that will give you an idea for the root cause(s)?

P.S. Note the stopped backtrace: this is something I see for the last
couple of days on the native-comp branch, not sure if it's related.  I
will report that separately.

P.P.S. I tried to start another instance of Emacs from the branch, and
it immediately displayed this:

  Re-entering top level after C stack overflow

Which probably means something unhealthy happens when you start Emacs
while another instance is under a debugger with the same *.eln files
loaded.

Here's the backtrace and some related variables from the crash site:

  Thread 1 received signal SIGSEGV, Segmentation fault.
  0x01236788 in arithcompare_driver (nargs=2, args=0x28, comparison=ARITH_LESS)
      at data.c:2673
  2673        if (NILP (arithcompare (args[i - 1], args[i], comparison)))
  (gdb) bt
  #0  0x01236788 in arithcompare_driver (nargs=2, args=0x28,
      comparison=ARITH_LESS) at data.c:2673
  #1  0x01236860 in Flss (nargs=2, args=0x28) at data.c:2691
  #2  0x61a92285 in F632d626567696e6e696e672d6f662d73746174656d656e742d31_c_beginning_of_statement_1_0 ()
     from d:\usr\eli\.emacs.d\eln-cache\28.0.50-7d88f6c1\cc-engine-ccfcb170-1b345b21.eln
  #3  0x01261898 in funcall_lambda (fun=XIL(0xa00000000796aed8), nargs=5,
      arg_vector=0x827a78) at eval.c:3292
  #4  0x012601ed in Ffuncall (nargs=6, args=0x827a70) at eval.c:3013
  #5  0x61b00dbf in F632d6a7573742d61667465722d66756e632d6172676c6973742d70_c_just_after_func_arglist_p_0 ()
     from d:\usr\eli\.emacs.d\eln-cache\28.0.50-7d88f6c1\cc-engine-ccfcb170-1b345b21.eln
  #6  0x01261898 in funcall_lambda (fun=XIL(0xa000000007973cb8), nargs=0,
      arg_vector=0x827c50) at eval.c:3292
  #7  0x012601ed in Ffuncall (nargs=1, args=0x827c48) at eval.c:3013
  #8  0x61aee041 in F632d6261636b2d6f7665722d6d656d6265722d696e697469616c697a657273_c_back_over_member_initializers_0 ()
     from d:\usr\eli\.emacs.d\eln-cache\28.0.50-7d88f6c1\cc-engine-ccfcb170-1b345b21.eln
  #9  0x01261898 in funcall_lambda (fun=XIL(0xa0000000079739f8), nargs=1,
      arg_vector=0x827e28) at eval.c:3292
  #10 0x012601ed in Ffuncall (nargs=2, args=0x827e20) at eval.c:3013
  #11 0x0a525b36 in ?? ()
  #12 0x01261898 in funcall_lambda (fun=XIL(0xa0000000079b97c0), nargs=1,
      arg_vector=0x8280c0) at eval.c:3292
  #13 0x012601ed in Ffuncall (nargs=2, args=0x8280b8) at eval.c:3013
  #14 0x0686af93 in ?? ()
  #15 0x012de838 in helper_save_restriction () at comp.c:4575
  #16 0x0122e9aa in wrong_type_argument (predicate=XIL(0x892404890c245c89),
      value=XIL(0x8244c89e45d8be0)) at data.c:143
  Backtrace stopped: previous frame inner to this frame (corrupt stack?)

  Lisp Backtrace:
  "c-beginning-of-statement-1" (0x827a78)
  "c-just-after-func-arglist-p" (0x827c50)
  "c-back-over-member-initializers" (0x827e28)
  "c-font-lock-cut-off-declarators" (0x8280c0)
  "font-lock-fontify-keywords-region" (0x828418)
  "font-lock-default-fontify-region" (0x828728)
  "c-font-lock-fontify-region" (0x8288d8)
  "font-lock-fontify-region" (0x828ac8)
  0x78fb7e8 PVEC_COMPILED
  "jit-lock--run-functions" (0x829460)
  "jit-lock-fontify-now" (0x829720)
  "jit-lock-function" (0x829948)
  "redisplay_internal (C function)" (0x0)
  (gdb) fr 3
  #3  0x01261898 in funcall_lambda (fun=XIL(0xa00000000796aed8), nargs=5,
      arg_vector=0x827a78) at eval.c:3292
  3292          val = XSUBR (fun)->function.a0 ();
  (gdb) p nargs
  $1 = 5
  (gdb) p args[0]
  No symbol "args" in current context.
  (gdb) p arg_vector
  $2 = (Lisp_Object *) 0x827a78
  (gdb) p arg_vector [0]
  $3 = XIL(0)
  (gdb) p arg_vector [1]
  $4 = XIL(0)
  (gdb) p arg_vector[0]
  $5 = XIL(0)
  (gdb) p arg_vector[1]
  $6 = XIL(0)
  (gdb) p arg_vector[2]
  $7 = XIL(0)
  (gdb) p arg_vector[3]
  $8 = XIL(0)
  (gdb) p arg_vector[4]
  $9 = XIL(0x30)
  (gdb) xtype
  Lisp_Symbol
  (gdb) xsymbol
  $10 = (struct Lisp_Symbol *) 0x186a390 <lispsym+48>
  "t"
  (gdb) up
  #4  0x012601ed in Ffuncall (nargs=6, args=0x827a70) at eval.c:3013
  3013        val = funcall_lambda (fun, numargs, args + 1);
  (gdb) p args[0]
  $11 = XIL(0x60800a8)
  (gdb) xtype
  Lisp_Symbol
  (gdb) xsymbol
  $12 = (struct Lisp_Symbol *) 0x78ea408
  "c-beginning-of-statement-1"
  (gdb) p args[1]
  $13 = XIL(0)
  (gdb) p args[2]
  $14 = XIL(0)
  (gdb) p args[3]
  $15 = XIL(0)
  (gdb) p args[4]
  $16 = XIL(0)
  (gdb) p args[5]
  $17 = XIL(0x30)
  (gdb) down
  #3  0x01261898 in funcall_lambda (fun=XIL(0xa00000000796aed8), nargs=5,
      arg_vector=0x827a78) at eval.c:3292
  3292          val = XSUBR (fun)->function.a0 ();
  (gdb) p fun
  $18 = XIL(0xa00000000796aed8)
  (gdb) xtype
  Lisp_Vectorlike
  PVEC_SUBR
  (gdb) xsubr
  $19 = (struct Lisp_Subr *) 0x796aed8
  {
    header = {
      size = 1342205952
    },
    function = {
      a0 = 0x61a8d020 <F632d626567696e6e696e672d6f662d73746174656d656e742d31_c_beginning_of_statement_1_0>,
      a1 = 0x61a8d020 <F632d626567696e6e696e672d6f662d73746174656d656e742d31_c_beginning_of_statement_1_0>,
      a2 = 0x61a8d020 <F632d626567696e6e696e672d6f662d73746174656d656e742d31_c_beginning_of_statement_1_0>,
      a3 = 0x61a8d020 <F632d626567696e6e696e672d6f662d73746174656d656e742d31_c_beginning_of_statement_1_0>,
      a4 = 0x61a8d020 <F632d626567696e6e696e672d6f662d73746174656d656e742d31_c_beginning_of_statement_1_0>,
      a5 = 0x61a8d020 <F632d626567696e6e696e672d6f662d73746174656d656e742d31_c_beginning_of_statement_1_0>,
      a6 = 0x61a8d020 <F632d626567696e6e696e672d6f662d73746174656d656e742d31_c_beginning_of_statement_1_0>,
      a7 = 0x61a8d020 <F632d626567696e6e696e672d6f662d73746174656d656e742d31_c_beginning_of_statement_1_0>,
      a8 = 0x61a8d020 <F632d626567696e6e696e672d6f662d73746174656d656e742d31_c_beginning_of_statement_1_0>,
      aUNEVALLED = 0x61a8d020 <F632d626567696e6e696e672d6f662d73746174656d656e742d31_c_beginning_of_statement_1_0>,
      aMANY = 0x61a8d020 <F632d626567696e6e696e672d6f662d73746174656d656e742d31_c_beginning_of_statement_1_0>
    },
    min_args = 0,
    max_args = 5,
    symbol_name = 0x796eac0 "c-beginning-of-statement-1",
    {
      intspec = 0x0,
      native_intspec = XIL(0)
    },
    doc = 91,
    native_comp_u = {XIL(0xa0000000078884c0)},
    native_c_name = {
      0x796eaf8 "F632d626567696e6e696e672d6f662d73746174656d656e742d31_c_beginning_of_statement_1_0"},
    lambda_list = {XIL(0xc0000000079155b0)},
    type = {XIL(0)}
  }
  (gdb) p 0x28
  $20 = 40
  (gdb) xtype
  Lisp_Symbol
  (gdb) xsymbol
  $21 = (struct Lisp_Symbol *) 0x186a388 <lispsym+40>
  Cannot access memory at address 0x1a4
  (gdb)
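
The numbers gdb printed above can be cross-checked with a little arithmetic.  The sketch below assumes that a symbol word carries zero tag bits, so that the word is a byte offset into the lispsym table, and that the symbol size of 48 bytes listed in the memory information at the end of this report applies to the crashing build.  The helper as_symbol is hypothetical.  It only shows why xsymbol fails on 0x28; it does not explain where 0x28 came from.

```python
from typing import Tuple

# Values copied from the gdb session and the memory information in this report.
T_WORD = 0x30            # arg_vector[4], shown as XIL(0x30), i.e. the symbol t
T_ADDRESS = 0x186A390    # what xsymbol printed for it: <lispsym+48>
BOGUS_ARGS = 0x28        # the 'args' pointer that Flss received
SYMBOL_SIZE = 48         # from "(symbols 48 7804 1)"

# Assumption: a symbol's tag bits are zero, so the whole word is a byte
# offset from the start of the lispsym table.
LISPSYM_BASE = T_ADDRESS - T_WORD


def as_symbol(word: int) -> Tuple[int, bool]:
    """Return (address, starts_on_symbol_boundary) for a word read as a symbol."""
    return LISPSYM_BASE + word, word % SYMBOL_SIZE == 0


for word in (T_WORD, BOGUS_ARGS):
    address, aligned = as_symbol(word)
    print(hex(word), hex(address), aligned)
```

Under these assumptions 0x30 lands on a symbol boundary, while 0x28 lands at lispsym+40, inside another symbol's storage, which matches gdb being unable to print a name for it.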


In GNU Emacs 28.0.50 (build 1080, i686-pc-mingw32)
 of 2021-03-11 built on HOME-C4E4A596F7
Repository revision: 8497af6892fcf9b08a1c120e897c9f5c21ea64fa
Repository branch: master
Windowing system distributor 'Microsoft Corp.', version 5.1.2600
System Description: Microsoft Windows XP Service Pack 3 (v5.1.0.2600)

Configured using:
 'configure -C --prefix=/d/usr --with-wide-int --with-modules
 --enable-checking=yes,glyphs 'CFLAGS=-O0 -gdwarf-4 -g3''

Configured features:
ACL GIF GMP GNUTLS HARFBUZZ JPEG JSON LCMS2 LIBXML2 MODULES NOTIFY
W32NOTIFY PDUMPER PNG RSVG SOUND THREADS TIFF TOOLKIT_SCROLL_BARS XPM
ZLIB

Important settings:
  value of $LANG: ENU
  locale-coding-system: cp1255

Major mode: Lisp Interaction

Minor modes in effect:
  tooltip-mode: t
  global-eldoc-mode: t
  eldoc-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  tool-bar-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  line-number-mode: t
  transient-mark-mode: t

Load-path shadows:
None found.

Features:
(shadow sort mail-extr emacsbug message rmc puny dired dired-loaddefs
rfc822 mml mml-sec epa derived epg epg-config gnus-util rmail
rmail-loaddefs auth-source cl-seq eieio eieio-core cl-macs
eieio-loaddefs password-cache json map text-property-search time-date
subr-x seq byte-opt gv bytecomp byte-compile cconv mm-decode mm-bodies
mm-encode mail-parse rfc2231 mailabbrev gmm-utils mailheader cl-loaddefs
cl-lib sendmail rfc2047 rfc2045 ietf-drums mm-util mail-prsvr mail-utils
iso-transl tooltip eldoc electric uniquify ediff-hook vc-hooks
lisp-float-type mwheel dos-w32 ls-lisp disp-table term/w32-win w32-win
w32-vars term/common-win tool-bar dnd fontset image regexp-opt fringe
tabulated-list replace newcomment text-mode elisp-mode lisp-mode
prog-mode register page tab-bar menu-bar rfn-eshadow isearch easymenu
timer select scroll-bar mouse jit-lock font-lock syntax facemenu
font-core term/tty-colors frame minibuffer cl-generic cham georgian
utf-8-lang misc-lang vietnamese tibetan thai tai-viet lao korean
japanese eucjp-ms cp51932 hebrew greek romanian slovak czech european
ethiopic indian cyrillic chinese composite charscript charprop
case-table epa-hook jka-cmpr-hook help simple abbrev obarray
cl-preloaded nadvice button loaddefs faces cus-face macroexp files
window text-properties overlay sha1 md5 base64 format env code-pages
mule custom widget hashtable-print-readable backquote threads w32notify
w32 lcms2 multi-tty make-network-process emacs)

Memory information:
((conses 16 56717 12106)
 (symbols 48 7804 1)
 (strings 16 21565 2060)
 (string-bytes 1 626902)
 (vectors 16 13077)
 (vector-slots 8 172292 12096)
 (floats 8 23 61)
 (intervals 40 263 114)
 (buffers 888 10))




Eli Zaretskii
> Date: Thu, 11 Mar 2021 13:27:52 +0200
> From: Eli Zaretskii <[hidden email]>
> Cc: Andrea Corallo <[hidden email]>
>
> P.P.S. I tried to start another instance of Emacs from the branch, and
> it immediately displayed this:
>
>   Re-entering top level after C stack overflow
>
> Which probably means something unhealthy happens when you start Emacs
> while another instance is under a debugger with the same *.eln files
> loaded.

This part with stack overflow is not reproducible, unfortunately:
subsequent attempts to do the same start Emacs normally.  Too bad.




Eli Zaretskii
In reply to this post by Eli Zaretskii
> Date: Thu, 11 Mar 2021 13:27:52 +0200
> From: Eli Zaretskii <[hidden email]>
> Cc: Andrea Corallo <[hidden email]>
>
>   #15 0x012de838 in helper_save_restriction () at comp.c:4575
>   #16 0x0122e9aa in wrong_type_argument (predicate=XIL(0x892404890c245c89),
>       value=XIL(0x8244c89e45d8be0)) at data.c:143
>   Backtrace stopped: previous frame inner to this frame (corrupt stack?)
>

Btw, it's clear that the arguments to wrong_type_argument are
garbled.  Perhaps some code somewhere clobbers the C stack?




Emacs - Bugs mailing list
In reply to this post by Eli Zaretskii
Eli Zaretskii <[hidden email]> writes:

> I was hit by a segfault while scrolling through a C source file, in
> this case dispnew.c.  The sequence of commands was this:
>
>  emacs -Q
>  C-h f sit-for RET
>  Click on the link to subr.el
>  In subr.el go to where sit-for calls sleep-for and type C-h f RET
>  Click on "C source code" to display dispnew.c
>  Scroll down with C-n or C-v

I can't reproduce here :/

> The backtrace appears below, with some data I collected.  The argument
> 'args' to Flss is obviously bogus, but I don't understand how it came
> into existence.  Maybe related to 0x30, which stands for the symbol t?
> The first call-stack frame above that I can examine, frame #4, calls
> c-beginning-of-statement-1 with 4 nil args and the last argument of t.
> The levels below that are impenetrable for me: is there a way of
> digging into this
> F632d626567696e6e696e672d6f662d73746174656d656e742d31_c_beginning_of_statement_1_0
> thing?
>
> Any suggestions for how to debug this further or what data to collect
> that will give you an idea for the root cause(s)?

Assuming it is a miscompilation, it's going to be tricky to reduce it
without a reproducible test case.

But if it is a miscompilation it should be reproducible, so either it is
not a miscompilation or the initial conditions are different.

> P.S. Note the stopped backtrace: this is something I see for the last
> couple of days on the native-comp branch, not sure if it's related.  I
> will report that separately.
>
> P.P.S. I tried to start another instance of Emacs from the branch, and
> it immediately displayed this:
>
>   Re-entering top level after C stack overflow
>
> Which probably means something unhealthy happens when you start Emacs
> while another instance is under a debugger with the same *.eln files
> loaded.

I have often used more than one Emacs session from the same binary, so
at least on GNU/Linux this does not appear to be a problem.

Thanks

  Andrea




Eli Zaretskii
> From: Andrea Corallo <[hidden email]>
> Cc: [hidden email]
> Date: Fri, 12 Mar 2021 06:46:50 +0000
>
> Eli Zaretskii <[hidden email]> writes:
>
> > I was hit by a segfault while scrolling through a C source file, in
> > this case dispnew.c.  The sequence of commands was this:
> >
> >  emacs -Q
> >  C-h f sit-for RET
> >  Click on the link to subr.el
> >  In subr.el go to where sit-for calls sleep-for and type C-h f RET
> >  Click on "C source code" to display dispnew.c
> >  Scroll down with C-n or C-v
>
> I can't reproduce here :/

Did you try the 32-bit build --with-wide-int?  It could be specific to
that configuration.




Emacs - Bugs mailing list
Eli Zaretskii <[hidden email]> writes:

>> From: Andrea Corallo <[hidden email]>
>> Cc: [hidden email]
>> Date: Fri, 12 Mar 2021 06:46:50 +0000
>>
>> Eli Zaretskii <[hidden email]> writes:
>>
>> > I was hit by a segfault while scrolling through a C source file, in
>> > this case dispnew.c.  The sequence of commands was this:
>> >
>> >  emacs -Q
>> >  C-h f sit-for RET
>> >  Click on the link to subr.el
>> >  In subr.el go to where sit-for calls sleep-for and type C-h f RET
>> >  Click on "C source code" to display dispnew.c
>> >  Scroll down with C-n or C-v
>>
>> I can't reproduce here :/
>
> Did you try the 32-bit build --with-wide-int?  It could be specific to
> that configuration.

Good point, I tried on 32-bit before and now 32-bit --with-wide-int but
still could not reproduce.

  Andrea




Eli Zaretskii
> From: Andrea Corallo <[hidden email]>
> Cc: [hidden email]
> Date: Fri, 12 Mar 2021 12:04:34 +0000
>
> >> >  emacs -Q
> >> >  C-h f sit-for RET
> >> >  Click on the link to subr.el
> >> >  In subr.el go to where sit-for calls sleep-for and type C-h f RET
> >> >  Click on "C source code" to display dispnew.c
> >> >  Scroll down with C-n or C-v
> >>
> >> I can't reproduce here :/
> >
> > Did you try the 32-bit build --with-wide-int?  It could be specific to
> > that configuration.
>
> > Good point, I tried on 32-bit before and now 32-bit --with-wide-int but
> still could not reproduce.

Is there any data I can collect to help diagnose the issue?  Anything
at all?  Like maybe disassembly of this F632d626567696e6e696e672d6f662d73746174656d656e742d31_c_beginning_of_statement_1_0()
function or some part of it?

IOW, if the problem is miscompilation to native code, what facilities
do we have to report the details if the simple recipe doesn't
reproduce the problem?  We will have this kind of problem in the near
future, so having a good way of reporting the details might help
eliminate bugs faster.

Thanks.




Emacs - Bugs mailing list
Eli Zaretskii <[hidden email]> writes:

>> From: Andrea Corallo <[hidden email]>
>> Cc: [hidden email]
>> Date: Fri, 12 Mar 2021 12:04:34 +0000
>>
>> >> >  emacs -Q
>> >> >  C-h f sit-for RET
>> >> >  Click on the link to subr.el
>> >> >  In subr.el go to where sit-for calls sleep-for and type C-h f RET
>> >> >  Click on "C source code" to display dispnew.c
>> >> >  Scroll down with C-n or C-v
>> >>
>> >> I can't reproduce here :/
>> >
>> > Did you try the 32-bit build --with-wide-int?  It could be specific to
>> > that configuration.
>>
>> Good point, I tried on 32-bit before and now 32-bit --with-wide-int but
>> still could not reproduce.
>
> Is there any data I can collect to help diagnose the issue?  Anything
> at all?  Like maybe disassembly of this F632d626567696e6e696e672d6f662d73746174656d656e742d31_c_beginning_of_statement_1_0()
> function or some part of it?
>
> IOW, if the problem is miscompilation to native code, what facilities
> do we have to report the details if the simple recipe doesn't
> reproduce the problem?  We will have this kind of problem in the near
> future, so having a good way of reporting the details might help
> eliminate bugs faster.

Generally speaking, the first step is to identify the function that is
responsible for the bug; this is often at the top of the backtrace, but
not necessarily.  When it is not, I typically proceed by bisection.

Once the function is identified, I typically construct a
single-function reproducer.  For this I need the input parameters, and
I try to substitute all other values coming from the environment with
something I can control.  This step involves understanding which parts
of the environment are captured by the function (say: point, current
buffer content, etc.).

Then I reduce the function, searching for the minimal piece of code
that behaves differently when natively compiled.

At that point the "smart" part of the investigation typically starts.

Here the problem is that, the crash not being reproducible, we are stuck
at the first step; reproducibility is typically a prerequisite for this
kind of analysis.  But again, if it's a miscompilation it *must* be
reproducible, because the code is not morphing, so probably we are not
reproducing it precisely?

BTW cc-engine.el uses dynamic scope, which means we do not perform any
optimization in comp.el and do a bare 1:1 translation, so at this
stage I'd be rather skeptical that this is miscompiled.

Thanks

  Andrea
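
The bisection step mentioned in this message can be sketched as a plain binary search over candidate functions.  This is only an illustration under two loud assumptions: exactly one candidate is responsible, and the crash is deterministic.  The names bisect_culprit and crashes_with_native are hypothetical; in practice the predicate is a manual step (restart Emacs with only the given functions natively compiled, replay the recipe, and note whether it crashes).

```python
from typing import Callable, Sequence


def bisect_culprit(candidates: Sequence[str],
                   crashes_with_native: Callable[[Sequence[str]], bool]) -> str:
    """Find the one candidate whose natively compiled version triggers the crash.

    crashes_with_native(subset) must report whether the crash occurs when
    only `subset` is natively compiled and everything else is not.
    """
    if not crashes_with_native(candidates):
        raise ValueError("crash does not reproduce with all candidates native")
    lo, hi = 0, len(candidates)
    # Invariant: the culprit is somewhere in candidates[lo:hi].
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if crashes_with_native(candidates[lo:mid]):
            hi = mid   # culprit is in the first half
        else:
            lo = mid   # culprit is in the second half
    return candidates[lo]


# Usage with a fake predicate standing in for the manual test.
functions = ["c-beginning-of-statement-1",
             "c-just-after-func-arglist-p",
             "c-back-over-member-initializers",
             "c-font-lock-cut-off-declarators"]
pretend_culprit = "c-just-after-func-arglist-p"   # made up for the demo
print(bisect_culprit(functions, lambda subset: pretend_culprit in subset))
```

With N candidates this needs about log2(N) test runs after the initial check, which is why it is attractive when each run means restarting Emacs by hand.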




Eli Zaretskii
> From: Andrea Corallo <[hidden email]>
> Cc: [hidden email]
> Date: Fri, 12 Mar 2021 15:27:30 +0000
>
> Generally speaking, the first step is to identify the function that is
> responsible for the bug; this is often at the top of the backtrace, but
> not necessarily.  When it is not, I typically proceed by bisection.

In my case the top of the stack looks like this:

  #0  0x01236788 in arithcompare_driver (nargs=2, args=0x28,
      comparison=ARITH_LESS) at data.c:2673
  #1  0x01236860 in Flss (nargs=2, args=0x28) at data.c:2691
  #2  0x0a872285 in ?? ()
  #3  0x01261898 in funcall_lambda (fun=XIL(0xa00000000a0bf230), nargs=5,
      arg_vector=0x826a08) at eval.c:3292
  #4  0x012601ed in Ffuncall (nargs=6, args=0x826a00) at eval.c:3013
  #5  0x0a8e0dbf in ?? ()
  #6  0x012601ed in Ffuncall (nargs=1, args=0x826bd8) at eval.c:3013
  #7  0x0a8ce041 in ?? ()
  #8  0x01261898 in funcall_lambda (fun=XIL(0xa0000000069f2a50), nargs=1,
      arg_vector=0x826db8) at eval.c:3292
  #9  0x012601ed in Ffuncall (nargs=2, args=0x826db0) at eval.c:3013
  #10 0x70895b36 in F632d666f6e742d6c6f636b2d6375742d6f66662d6465636c617261746f7273_c_font_lock_cut_off_declarators_0 ()
     from d:\usr\eli\.emacs.d\eln-cache\28.0.50-7d88f6c1\cc-fonts-d7d8a7f5-b7c359cd.eln
  #11 0x01261898 in funcall_lambda (fun=XIL(0xa0000000079249a0), nargs=1,
      arg_vector=0x827050) at eval.c:3292
  #12 0x012601ed in Ffuncall (nargs=2, args=0x827048) at eval.c:3013

And the corresponding Lisp backtrace:

  "c-beginning-of-statement-1" (0x826a08)
  "c-just-after-func-arglist-p" (0x826be0)
  "c-back-over-member-initializers" (0x826db8)
  "c-font-lock-cut-off-declarators" (0x827050)
  "font-lock-fontify-keywords-region" (0x8273a8)
  "font-lock-default-fontify-region" (0x8276b8)

(Don't ask me why "<", i.e. Flss, doesn't appear in the Lisp
backtrace: something strange happens with backtraces here, as I will
describe in another message.  I think the "??" things in the backtrace
are related.)

How do I go about finding the function that's responsible for the
problem given the above?  The problem is 100% reproducible for me.

> Here the problem is that, the crash not being reproducible, we are stuck
> at the first step; reproducibility is typically a prerequisite for this
> kind of analysis.  But again, if it's a miscompilation it *must* be
> reproducible, because the code is not morphing, so probably we are not
> reproducing it precisely?

Here's another reproducer:

  emacs -Q
  C-x C-f src/dispnew.c
  C-s sleep-for

I usually get a SIGSEGV before I even type the whole of "sleep-for".

Do you have all of the cc-*.el files natively-compiled?  I do.




Emacs - Bugs mailing list
Eli Zaretskii <[hidden email]> writes:

>> From: Andrea Corallo <[hidden email]>
>> Cc: [hidden email]
>> Date: Fri, 12 Mar 2021 15:27:30 +0000
>>
>> Generally speaking, the first step is to identify the function that is
>> responsible for the bug; this is often at the top of the backtrace, but
>> not necessarily.  When it is not, I typically proceed by bisection.
>
> In my case the top of the stack looks like this:
>
>   #0  0x01236788 in arithcompare_driver (nargs=2, args=0x28,
>       comparison=ARITH_LESS) at data.c:2673
>   #1  0x01236860 in Flss (nargs=2, args=0x28) at data.c:2691
>   #2  0x0a872285 in ?? ()
>   #3  0x01261898 in funcall_lambda (fun=XIL(0xa00000000a0bf230), nargs=5,
>       arg_vector=0x826a08) at eval.c:3292
>   #4  0x012601ed in Ffuncall (nargs=6, args=0x826a00) at eval.c:3013
>   #5  0x0a8e0dbf in ?? ()
>   #6  0x012601ed in Ffuncall (nargs=1, args=0x826bd8) at eval.c:3013
>   #7  0x0a8ce041 in ?? ()
>   #8  0x01261898 in funcall_lambda (fun=XIL(0xa0000000069f2a50), nargs=1,
>       arg_vector=0x826db8) at eval.c:3292
>   #9  0x012601ed in Ffuncall (nargs=2, args=0x826db0) at eval.c:3013
>   #10 0x70895b36 in F632d666f6e742d6c6f636b2d6375742d6f66662d6465636c617261746f7273_c_font_lock_cut_off_declarators_0 ()
>      from d:\usr\eli\.emacs.d\eln-cache\28.0.50-7d88f6c1\cc-fonts-d7d8a7f5-b7c359cd.eln
>   #11 0x01261898 in funcall_lambda (fun=XIL(0xa0000000079249a0), nargs=1,
>       arg_vector=0x827050) at eval.c:3292
>   #12 0x012601ed in Ffuncall (nargs=2, args=0x827048) at eval.c:3013
>
> And the corresponding Lisp backtrace:
>
>   "c-beginning-of-statement-1" (0x826a08)
>   "c-just-after-func-arglist-p" (0x826be0)
>   "c-back-over-member-initializers" (0x826db8)
>   "c-font-lock-cut-off-declarators" (0x827050)
>   "font-lock-fontify-keywords-region" (0x8273a8)
>   "font-lock-default-fontify-region" (0x8276b8)
>
> (Don't ask me why "<", i.e. Flss, doesn't appear in the Lisp
> backtrace: something strange happens with backtraces here, as I will
> describe in another message.  I think the "??" things in the backtrace
> are related.)
>
> How do I go about finding the function that's responsible for the
> problem given the above?  The problem is 100% reproducible for me.

One easy option is to evaluate, say, `c-beginning-of-statement-1' (as
the first suspect) and see if it still crashes afterwards.  Similarly,
one can load entire files to exclude their content from the equation
entirely.

>> Here the problem is that, the crash not being reproducible, we are stuck
>> at the first step; reproducibility is typically a prerequisite for this
>> kind of analysis.  But again, if it's a miscompilation it *must* be
>> reproducible, because the code is not morphing, so probably we are not
>> reproducing it precisely?
>
> Here's another reproducer:
>
>   emacs -Q
>   C-x C-f src/dispnew.c
>   C-s sleep-for
>
> I usually get a SIGSEGV before I even type the whole of "sleep-for".

Can't reproduce this either :(  That's odd.

> Do you have all of the cc-*.el files natively-compiled?  I do.

Looks so.

Thanks

  Andrea




Eli Zaretskii
> From: Andrea Corallo <[hidden email]>
> Cc: [hidden email]
> Date: Fri, 12 Mar 2021 16:08:33 +0000
>
> > In my case the top of the stack looks like this:
> >
> >   #0  0x01236788 in arithcompare_driver (nargs=2, args=0x28,
> >       comparison=ARITH_LESS) at data.c:2673
> >   #1  0x01236860 in Flss (nargs=2, args=0x28) at data.c:2691
> >   #2  0x0a872285 in ?? ()
> >   #3  0x01261898 in funcall_lambda (fun=XIL(0xa00000000a0bf230), nargs=5,
> >       arg_vector=0x826a08) at eval.c:3292
> >   #4  0x012601ed in Ffuncall (nargs=6, args=0x826a00) at eval.c:3013
> >   #5  0x0a8e0dbf in ?? ()
> >   #6  0x012601ed in Ffuncall (nargs=1, args=0x826bd8) at eval.c:3013
> >   #7  0x0a8ce041 in ?? ()
> >   #8  0x01261898 in funcall_lambda (fun=XIL(0xa0000000069f2a50), nargs=1,
> >       arg_vector=0x826db8) at eval.c:3292

Btw, what are those "??" there instead of function names?  Do you see
the same on your system?




Pip Cet
In reply to this post by Eli Zaretskii
On Fri, Mar 12, 2021 at 12:52 PM Eli Zaretskii <[hidden email]> wrote:

> > From: Andrea Corallo <[hidden email]>
> > Cc: [hidden email]
> > Date: Fri, 12 Mar 2021 12:04:34 +0000
> >
> > >> >  emacs -Q
> > >> >  C-h f sit-for RET
> > >> >  Click on the link to subr.el
> > >> >  In subr.el go to where sit-for calls sleep-for and type C-h f RET
> > >> >  Click on "C source code" to display dispnew.c
> > >> >  Scroll down with C-n or C-v
> > >>
> > >> I can't reproduce here :/
> > >
> > > Did you try the 32-bit build --with-wide-int?  It could be specific to
> > > that configuration.
> >
> > Good point, I tried on 32-bit before and now 32-bit --with-wide-int but
> > still could not reproduce.
>
> Is there any data I can collect to help diagnose the issue?  Anything
> at all?  Like maybe disassembly of this F632d626567696e6e696e672d6f662d73746174656d656e742d31_c_beginning_of_statement_1_0()
> function or some part of it?

I think disassembling that function couldn't hurt, and it might help,
particularly the insns around the call site (but, of course, Andrea's
the expert). Passing 0x28 where the argument pointer should be is very
wrong; my suspicion is that the frame base pointer is NULL and there
are five arguments, leaving us with 0x28 pointing to what's allegedly
the base of the "proper" stack, but I don't think that's even how it's
supposed to work in the dynamic-scope case...

Pip
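
The arithmetic behind this suspicion can be checked against the addresses in the original report.  A minimal sketch, assuming that one Lisp_Object slot has the size implied by frames #3 and #4 there (eval.c:3013 passes args + 1 from Ffuncall to funcall_lambda); slot_address is a hypothetical helper, not an Emacs function.

```python
# Addresses copied from the backtrace in the original report.
FFUNCALL_ARGS = 0x827A70   # frame #4: Ffuncall (nargs=6, args=0x827a70)
ARG_VECTOR = 0x827A78      # frame #3: funcall_lambda (..., arg_vector=0x827a78)

# Since arg_vector is args + 1, the difference is the size of one slot.
LISP_OBJECT_SIZE = ARG_VECTOR - FFUNCALL_ARGS


def slot_address(base: int, index: int) -> int:
    """Address of slot `index` in an array of Lisp_Object starting at `base`."""
    return base + index * LISP_OBJECT_SIZE


print(LISP_OBJECT_SIZE)           # prints 8
print(hex(slot_address(0, 5)))    # prints 0x28
```

A null base plus five such slots gives exactly 0x28, which is consistent with the hypothesis but does not confirm it.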




Emacs - Bugs mailing list
In reply to this post by Emacs - Bugs mailing list
Eli Zaretskii <[hidden email]> writes:

>> From: Andrea Corallo <[hidden email]>
>> Cc: [hidden email]
>> Date: Fri, 12 Mar 2021 16:08:33 +0000
>>
>> > And the corresponding Lisp backtrace:
>> >
>> >   "c-beginning-of-statement-1" (0x826a08)
>> >   "c-just-after-func-arglist-p" (0x826be0)
>> >   "c-back-over-member-initializers" (0x826db8)
>> >   "c-font-lock-cut-off-declarators" (0x827050)
>> >   "font-lock-fontify-keywords-region" (0x8273a8)
>> >   "font-lock-default-fontify-region" (0x8276b8)
>> >
>> > (Don't ask me why "<", i.e. Flss, doesn't appear in the Lisp
>> > backtrace: something strange happens with backtraces here, as I will
>> > describe in another message.  I think the "??" things in the backtrace
>> > are related.)
>> >
>> > How do I go about finding the function that's responsible for the
>> > problem given the above?  The problem is 100% reproducible for me.
>>
>> One easy option is to evaluate, say, `c-beginning-of-statement-1' (as
>> the first suspect) and see if it still crashes afterwards.  Similarly,
>> one can load entire files to exclude their content from the equation
>> entirely.
>
> Just evaluating c-beginning-of-statement-1 doesn't help.  But if I
> load cc-engine.el, then the crash goes away.

Okay, then it is probably one of the other four c-* functions we see in
the backtrace.

> (Btw, if I load cc-engine.elc, it says it loads the .eln file
> instead?  is that intentional?)

Yes, .eln load is "transparent" and triggered automatically while
loading a .elc file when the corresponding .eln is found in the
`comp-eln-load-path'.

To force the .elc to be loaded one has to bind `load-no-native' to
non-nil.

  Andrea




Emacs - Bugs mailing list
In reply to this post by Eli Zaretskii
Eli Zaretskii <[hidden email]> writes:

>> From: Andrea Corallo <[hidden email]>
>> Cc: [hidden email]
>> Date: Fri, 12 Mar 2021 16:08:33 +0000
>>
>> > In my case the top of the stack looks like this:
>> >
>> >   #0  0x01236788 in arithcompare_driver (nargs=2, args=0x28,
>> >       comparison=ARITH_LESS) at data.c:2673
>> >   #1  0x01236860 in Flss (nargs=2, args=0x28) at data.c:2691
>> >   #2  0x0a872285 in ?? ()
>> >   #3  0x01261898 in funcall_lambda (fun=XIL(0xa00000000a0bf230), nargs=5,
>> >       arg_vector=0x826a08) at eval.c:3292
>> >   #4  0x012601ed in Ffuncall (nargs=6, args=0x826a00) at eval.c:3013
>> >   #5  0x0a8e0dbf in ?? ()
>> >   #6  0x012601ed in Ffuncall (nargs=1, args=0x826bd8) at eval.c:3013
>> >   #7  0x0a8ce041 in ?? ()
>> >   #8  0x01261898 in funcall_lambda (fun=XIL(0xa0000000069f2a50), nargs=1,
>> >       arg_vector=0x826db8) at eval.c:3292
>
> Btw, what are those "??" there instead of function names?  Do you see
> the same on your system?

Yes, I also think they stand in for function names.  You could verify
whether the address is mapped by an .eln file.

I do not see those in my backtraces, so it might be a bug in the Windows
toolchain?

  Andrea




Eli Zaretskii
In reply to this post by Emacs - Bugs mailing list
> From: Andrea Corallo <[hidden email]>
> Cc: [hidden email]
> Date: Fri, 12 Mar 2021 19:04:07 +0000
>
> > Just evaluating c-beginning-of-statement-1 doesn't help.  But if I
> > load cc-engine.el, then the crash goes away.
>
> Okay, then probably is one of the other four c-* functions we see in the
> backtrace.

Yes, but how to determine which one?

> > (Btw, if I load cc-engine.elc, it says it loads the .eln file
> > instead?  is that intentional?)
>
> Yes, .eln load is "transparent" and triggered automatically while
> loading a .elc file when the corresponding .eln is found in the
> `comp-eln-load-path'.
>
> To force the .elc to be loaded one has to bind `load-no-native' to
> non-nil.

I think if load-file is invoked interactively, and the user actually
types "foo.elc", we need to bind load-no-native non-nil
automatically.  Otherwise users would be surprised, as it goes against
the logic of what we do when the user types "foo.el".
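
A rough sketch of that idea, written as a separate command rather than
as a change to load-file itself.  The name `my-load-file-literally' is
made up for illustration, and the sketch assumes a build from this
branch, where `load-no-native' exists:

```elisp
(defun my-load-file-literally (file)
  "Load FILE; if FILE names a .elc, do not swap in its .eln.
Hypothetical command, for illustration only."
  (interactive "fLoad file: ")
  ;; Keep any existing non-nil binding; otherwise turn the .eln
  ;; substitution off only when the typed name ends in .elc.
  (let ((load-no-native (or load-no-native
                            (string-suffix-p ".elc" file))))
    ;; Same call shape as `load-file': errors are signaled, the
    ;; usual message is shown, and no suffixes are appended.
    (load (expand-file-name file) nil nil t)))
```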




Eli Zaretskii
In reply to this post by Emacs - Bugs mailing list
> From: Andrea Corallo <[hidden email]>
> Cc: [hidden email]
> Date: Fri, 12 Mar 2021 19:30:31 +0000
>
> > Btw, what are those "??" there instead of function names?  Do you see
> > the same on your system?
>
> Yes I think too they are in place of function names.  You could verify
> if the address is mapped by an .eln file.

Doesn't look like that.

> I do not see those in my back-traces so it might a bug of the Windows
> toolchain?

Please show the full backtrace (including the Lisp backtrace) you get
as a result of the following steps:

  gdb ./emacs -Q
  (gdb) break Fredraw_display
  (gdb) r -Q

  C-x C-f dispnew.c
  M-x redraw-display

  (gdb) break Fskip_chars_backward
  (gdb) c

  C-s sleep

  (gdb) bt

I'd like to compare your backtrace with what I get here.

Btw, is your build configured --enable-checking='yes,glyphs' ?  If
not, could you please reconfigure the 32-bit build with wide ints, and
see if you can reproduce the crashes then?  In any case, please show
the backtrace from that configuration, to avoid gratuitous
differences.

Thanks.




Eli Zaretskii
In reply to this post by Pip Cet
> From: Pip Cet <[hidden email]>
> Date: Fri, 12 Mar 2021 18:42:17 +0000
> Cc: Andrea Corallo <[hidden email]>, [hidden email]
>
> I think disassembling that function couldn't hurt, and it might help,
> particularly the insns around the call site (but, of course, Andrea's
> the expert).

They are large functions.  I will post the disassembly if someone
wants to look at it.




Emacs - Bugs mailing list
In reply to this post by Eli Zaretskii
Eli Zaretskii <[hidden email]> writes:

>> From: Andrea Corallo <[hidden email]>
>> Cc: [hidden email]
>> Date: Fri, 12 Mar 2021 19:04:07 +0000
>>
>> > Just evaluating c-beginning-of-statement-1 doesn't help.  But if I
>> > load cc-engine.el, then the crash goes away.
>>
>> Okay, then probably is one of the other four c-* functions we see in the
>> backtrace.
>
> Yes, but how to determine which one?

Given there are only four, one could evaluate them one by one; if there
were more, bisection would be the best strategy.  Yeah, that's not the
most fun...
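
Something along these lines might make the one-by-one approach less
tedious.  It is only a sketch: `my-eval-defun-from-source' is a made-up
helper name, and it assumes the .el sources are installed so that
`find-function-noselect' can locate each definition:

```elisp
(require 'find-func)

(defun my-eval-defun-from-source (symbol)
  "Re-evaluate SYMBOL's definition from its .el source.
Hypothetical helper: the interpreted definition replaces the
native-compiled one for the rest of the session."
  (let* ((loc (find-function-noselect symbol))
         (buffer (car loc))
         (pos (cdr loc)))
    (unless pos
      (error "Cannot find the source definition of %s" symbol))
    (with-current-buffer buffer
      (save-excursion
        ;; POS is the start of the definition form, which is where
        ;; `eval-defun' expects point to be.
        (goto-char pos)
        (eval-defun nil)))))

;; Replace one suspect at a time, then retry the crash recipe.
;; This is the function already tried above; the other names would
;; come from the backtrace.
(my-eval-defun-from-source 'c-beginning-of-statement-1)
```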

>
>> > (Btw, if I load cc-engine.elc, it says it loads the .eln file
>> > instead?  is that intentional?)
>>
>> Yes, .eln load is "transparent" and triggered automatically while
>> loading a .elc file when the corresponding .eln is found in the
>> `comp-eln-load-path'.
>>
>> To force the .elc to be loaded one has to bind `load-no-native' to
>> non-nil.
>
> I think if load-file is invoked interactively, and the user actually
> types "foo.elc", we need to bind load-no-native non-nil
> automatically.  Otherwise users would be surprised, as it goes against
> the logic of what we do when the user types "foo.el".

We certainly can do this if this is what we want.  It somewhat breaks
the idea of making the system as transparent as possible; I went this
way because that was my understanding of what we wanted, but I have no
strong feelings about it.

Thanks

  Andrea




Emacs - Bugs mailing list
In reply to this post by Eli Zaretskii
Eli Zaretskii <[hidden email]> writes:

>> From: Andrea Corallo <[hidden email]>
>> Cc: [hidden email]
>> Date: Fri, 12 Mar 2021 19:30:31 +0000
>>
>> > Btw, what are those "??" there instead of function names?  Do you see
>> > the same on your system?
>>
>> Yes I think too they are in place of function names.  You could verify
>> if the address is mapped by an .eln file.
>
> Doesn't look like that.
>
>> I do not see those in my back-traces so it might a bug of the Windows
>> toolchain?
>
> Please show the full backtrace (including the Lisp backtrace) you get
> as result of the following steps:
>
>   gdb ./emacs -Q
>   (gdb) break Fredraw_display
>   (gdb) r -Q
>
>   C-x C-f dispnew.c
>   M-x redraw-display
>
>   (gdb) break Fskip_chars_backward
>   (gdb) c
>
>   C-s sleep
>
>   (gdb) bt
>
> I'd like to compare your backtrace with what I get here.

Mmmh, my Emacs on the 32bit system I prepared when running interactively
under gdb is unusable because all keys except the basic letters are
mixed up.  I have never experienced this; is it common?  How can I solve
it?

Thanks

  Andrea




Eli Zaretskii
> From: Andrea Corallo <[hidden email]>
> Cc: [hidden email]
> Date: Fri, 12 Mar 2021 20:21:26 +0000
>
> > I'd like to compare your backtrace with what I get here.
>
> Mmmh, my Emacs on the 32bit system I prepared when running interactively
> under gdb is unusable because all keys except the basic letters are
> mixed-up.  I never experienced this, is this common?  How can I solve
> it?

Rebuild Emacs not under GDB.  I think the breakpoints and other stuff
you set up in GDB get dumped into the pdmp file, and thus render the
dumped Emacs not very usable.  I always rebuild Emacs when I let it
dump itself (which happens rarely, because I rarely have to debug
temacs).

If the above doesn't help, it could be that some of the *.eln files
are damaged for similar reasons, so maybe remove them and let Emacs
recompile them as well.

If none of the above helps, please describe how you "prepared Emacs
on the 32bit system when running interactively under GDB"; maybe I
don't understand what exactly you did there.

Meanwhile, could you please post the backtrace I asked for from the
64-bit build?  Maybe it will tell me enough to get some ideas.

Thanks.


