branch master segfault (2019-02-05)

Philippe Vaucher
Hello,


For a while now, the Docker images built from "master" have been segfaulting:

```
philippe@pv-desktop:~$ docker run -it --rm silex/emacs:master
Fatal error 11: Segmentation fault
Backtrace:
emacs(+0x132cae)[0x555966f19cae]
emacs(+0x11925a)[0x555966f0025a]
emacs(+0x13107e)[0x555966f1807e]
emacs(+0x131328)[0x555966f18328]
emacs(+0x1313ac)[0x555966f183ac]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x12890)[0x7f3010977890]
```

The binary partly works, though. For example:

```
philippe@pv-desktop:~$ docker run -it --rm silex/emacs:master emacs --version
GNU Emacs 27.0.50
Copyright (C) 2019 Free Software Foundation, Inc.
GNU Emacs comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GNU Emacs
under the terms of the GNU General Public License.
For more information about these matters, see the file named COPYING.
```

Attached you'll find the gdb trace and an strace log. I also tried building the image with the debug flags described in DEBUG, but the errors are similar.

I know the portable dumper branch was merged recently, and I suspect it has something to do with the segfault. If necessary, I can bisect to find the offending commit, but I was hoping someone would quickly know what is happening.

This happens even if I run the container in privileged mode.

Kind regards,
Philippe


Attachments: gdb.txt (6K), strace.txt (272K)

Re: branch master segfault (2019-02-05)

Eli Zaretskii
> From: Philippe Vaucher <[hidden email]>
> Date: Tue, 5 Feb 2019 15:48:20 +0100
>
> Since a while, the docker images from "master" segfault:
> (snip)

Thanks.  Could you please show the Lisp backtrace as well, using the
"xbacktrace" command defined in src/.gdbinit?

Also, does this happen when Emacs is invoked as "emacs -Q"?


Re: branch master segfault (2019-02-05)

Philippe Vaucher
>> Here attached you'll find the gdb trace and a strace log. I tried to build the image with debug flags using what's
>> in DEBUG, but the errors are similar.
>>
>> I know that recently the portable dumper branch was merged, I suspect it has to do with the segfault... If
>> necessary, I can bisect to find the offending commit, but I was hoping someone would quickly know what is
>> happening.
>>
>> This happens even if I run the container in privileged mode.
>
> Thanks.  Could you please show the Lisp backtrace as well, using the
> "xbacktrace" command defined on src/.gdbinit?

Attached is the whole gdb session.

> Also, does this happen when Emacs is invoked as "emacs -Q"?

Yes, it does.

Thanks,
Philippe 

Attachment: gdb.txt (7K)

Re: branch master segfault (2019-02-05)

Philippe Vaucher

coding.c:7777     (*(coding->encoder)) (coding);

Here `coding->encoder` is 0. I then printed `*coding` but I don't have enough understanding to figure out what's wrong with it. 

Okay, Emacs works if, at term.c:753, I use this:

```
coding = FRAME_TERMINAL_CODING (f);
```

instead of:

```
coding = (FRAME_TERMINAL_CODING (f)->common_flags & CODING_REQUIRE_ENCODING_MASK
          ? FRAME_TERMINAL_CODING (f) : &safe_terminal_coding);
```

I know Docker allocates a "pseudo TTY", so there is probably a misdetection happening here. The thing is, it works for all Docker images up to and including 26.2, so something changed in Emacs that triggers this only for Docker's TTY.

Getting close :-)
Philippe

P.S. I noticed that earlier I replied to Eli only, so I'm sending the gdb backtrace with the Lisp backtrace to the ML again.

Attachment: gdb2.txt (10K)

Re: branch master segfault (2019-02-05)

Philippe Vaucher
Hooray! My intuition was right (portable dumper issue):

```
root@d71b23f3596a:/opt/emacs-git# git bisect start master emacs-26.1.91
root@d71b23f3596a:/opt/emacs-git# git bisect run bash -c 'git reset --hard; make -j8 &>/dev/null || exit 125; src/emacs -f save-buffers-kill-emacs &>/dev/null || exit 1'
(...snip...)
d12e5d003d503025c1c9b0335d6518a6c3bdfae1 is the first bad commit
commit d12e5d003d503025c1c9b0335d6518a6c3bdfae1
Author: Daniel Colascione <[hidden email]>
Date:   Tue Jan 15 17:36:54 2019 -0500

    Add portable dumper
(...snip...)

root@d71b23f3596a:/opt/emacs-git# git bisect log
# bad: [19fbef549a94ccf733367d29438204e94a00e911] Fix Bug#34196
# good: [d8525ae41d07f9ea629d610de791064180423b6a] Bump Emacs version to 26.1.91
git bisect start 'master' 'emacs-26.1.91'
# good: [aaffae8458dcd774540e7e6b4219c8b5a9902075] Add debug facility for formatting in rr sessions
git bisect good aaffae8458dcd774540e7e6b4219c8b5a9902075
# good: [a0605d96187bc4103a982cededcd12e2628aba66] Fix MinGW compilation problem in timefns.c
git bisect good a0605d96187bc4103a982cededcd12e2628aba66
# good: [34b4da377ae02a0c505574f5ca5f146e92cfd046] Fix an eshell ls dired test for non-recent files
git bisect good 34b4da377ae02a0c505574f5ca5f146e92cfd046
# good: [34b4da377ae02a0c505574f5ca5f146e92cfd046] Fix an eshell ls dired test for non-recent files
git bisect good 34b4da377ae02a0c505574f5ca5f146e92cfd046
# bad: [655badc33e6ee9bfbc6c6c9084bf768f8102824d] ; Copyright fixes for pdumper files
git bisect bad 655badc33e6ee9bfbc6c6c9084bf768f8102824d
# good: [fb10834a602416f8422131d5ce9dabcc28e57be4] Avoid that unwind_format_mode_line messes up buffer points (Bug#32777)
git bisect good fb10834a602416f8422131d5ce9dabcc28e57be4
# good: [517b0aa46663b6173bb31a018aee18a82f2ca1d9] Merge from origin/emacs-26
git bisect good 517b0aa46663b6173bb31a018aee18a82f2ca1d9
# good: [c342b26371480316024e1e5d63cd8b3f035dda69] Fix drag and drop behaviour on NS (bug#30929)
git bisect good c342b26371480316024e1e5d63cd8b3f035dda69
# good: [cdb082322d4209c5104bc1a98b21bf3dd75e8f17] Fix icomplete's cycling when filename filtering kicks in
git bisect good cdb082322d4209c5104bc1a98b21bf3dd75e8f17
# good: [2a3bd6798e9670828f0402079fcc116d6d6b042d] Avoid using obsolete accept-process-output arg
git bisect good 2a3bd6798e9670828f0402079fcc116d6d6b042d
# bad: [6b9fa8804533a695094a930d634d2d6617e2b6c7] Make sure dump-mode is nil after dump
git bisect bad 6b9fa8804533a695094a930d634d2d6617e2b6c7
# bad: [02976d67369699660add46d548f0d1593885334b] Add NEWS for pdumper
git bisect bad 02976d67369699660add46d548f0d1593885334b
# bad: [d12e5d003d503025c1c9b0335d6518a6c3bdfae1] Add portable dumper
git bisect bad d12e5d003d503025c1c9b0335d6518a6c3bdfae1
# first bad commit: [d12e5d003d503025c1c9b0335d6518a6c3bdfae1] Add portable dumper
```

I quickly looked at the pdumper commit for clues but couldn't find any.

Kind regards,
Philippe


Re: branch master segfault (2019-02-05)

Eli Zaretskii
> From: Philippe Vaucher <[hidden email]>
> Date: Wed, 6 Feb 2019 11:33:19 +0100
>
> Hooray! My intuition was right (portable dumper issue):

Yes, it was obvious once you showed that safe_terminal_coding is
all zeroes.  Please try the latest master branch, I think I fixed
that.

Thanks.


Re: branch master segfault (2019-02-05)

Eli Zaretskii
In reply to this post by Philippe Vaucher
> From: Philippe Vaucher <[hidden email]>
> Date: Wed, 6 Feb 2019 10:30:06 +0100
>
> Okay, Emacs works if I in term.c:753 if I use this:
>
> ```
> coding = FRAME_TERMINAL_CODING (f);
> ```
>
> instead of:
>
> ```
> coding = (FRAME_TERMINAL_CODING (f)->common_flags & CODING_REQUIRE_ENCODING_MASK ?
> FRAME_TERMINAL_CODING (f) : &safe_terminal_coding);
> ```
>
> I know docker allocates a "pseudo TTY", so probably that there's a misdetection happening here.

No, it isn't misdetection.  The problem is that safe_terminal_coding
is not set up.


Re: branch master segfault (2019-02-05)

Philippe Vaucher
>> Okay, Emacs works if I in term.c:753 if I use this:
>>
>> ```
>> coding = FRAME_TERMINAL_CODING (f);
>> ```
>>
>> instead of:
>>
>> ```
>> coding = (FRAME_TERMINAL_CODING (f)->common_flags & CODING_REQUIRE_ENCODING_MASK ?
>> FRAME_TERMINAL_CODING (f) : &safe_terminal_coding);
>> ```
>>
>> I know docker allocates a "pseudo TTY", so probably that there's a misdetection happening here.
>
> No, it isn't misdetection.  The problem is that safe_terminal_coding
> is not set up.

Your patch works, thanks!

Just curious, does that mean that in "normal terminals" (not docker) it uses `FRAME_TERMINAL_CODING (f)` but in docker it uses `safe_terminal_coding`?

If yes, why does it also work in Docker when using `FRAME_TERMINAL_CODING (f)`?

Kind regards,
Philippe

Re: branch master segfault (2019-02-05)

Eli Zaretskii
> From: Philippe Vaucher <[hidden email]>
> Date: Wed, 6 Feb 2019 17:23:11 +0100
> Cc: Emacs developers <[hidden email]>
>
>  > I know docker allocates a "pseudo TTY", so probably that there's a misdetection happening here.
>
>  No, it isn't misdetection.  The problem is that safe_terminal_coding
>  is not set up.
>
> Your patch works, thanks!
>
> Just curious, does that mean that in "normal terminals" (not docker) it uses `FRAME_TERMINAL_CODING (f)`
> but in docker it uses `safe_terminal_coding`?

It looks that way, yes.  That would mean the Docker container doesn't
set up a locale environment that defines a usable codeset.  I know
nothing about Docker, so I have no idea how this could happen.

> If yes, why does it also work in Docker when using `FRAME_TERMINAL_CODING (f)`?

Because the terminal encoding was set up, whereas safe_terminal_coding
wasn't.  In Emacs 26, safe_terminal_coding kept its value initialized
in temacs and recorded by unexec, but with pdumper that doesn't
happen, so safe_terminal_coding started as all-zero, as any other
static variable.
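
The mechanism described above can be reproduced in miniature. What follows is only a sketch under stated assumptions, not the Emacs code: `struct toy_coding`, `toy_safe_coding`, `toy_setup_safe_coding` and `toy_encode` are hypothetical names invented for illustration. It shows that an object with static storage starts out all-zero, so its `encoder` pointer is null until some startup code assigns it. If that assignment only ever happened in a process whose memory image was saved and later restored, a fresh process would see the null pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a coding-system struct; NOT the Emacs type. */
struct toy_coding {
    unsigned common_flags;
    void (*encoder)(struct toy_coding *);
    int encoded_calls;
};

/* Static storage duration: guaranteed to start out zeroed, so
   `encoder` is a null pointer until something assigns it. */
static struct toy_coding toy_safe_coding;

/* Trivial encoder that only counts how often it was invoked. */
static void toy_raw_encoder(struct toy_coding *coding)
{
    coding->encoded_calls++;
}

/* Startup initialisation that has to run in every fresh process. */
void toy_setup_safe_coding(void)
{
    toy_safe_coding.encoder = toy_raw_encoder;
}

/* Accessor so callers can reach the static object. */
struct toy_coding *toy_get_safe_coding(void)
{
    return &toy_safe_coding;
}

/* Returns 0 on success, -1 if the coding was never set up.
   Calling a null `encoder` directly is undefined behaviour and
   typically ends in a segmentation fault. */
int toy_encode(struct toy_coding *coding)
{
    assert(coding != NULL);
    if (coding->encoder == NULL)
        return -1;
    coding->encoder(coding);
    return 0;
}
```

Calling `toy_encode (toy_get_safe_coding ())` before `toy_setup_safe_coding ()` returns -1 in this sketch; without the null check it would be the same kind of crash as the call through `coding->encoder` reported at coding.c:7777.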



Re: branch master segfault (2019-02-05)

Philippe Vaucher

>> Just curious, does that mean that in "normal terminals" (not docker) it uses `FRAME_TERMINAL_CODING (f)`
>> but in docker it uses `safe_terminal_coding`?
>
> Looks like that, yes.  It would mean that the docker doesn't define a
> locale environment which defines a usable codeset.  I know nothing
> about dockers, so I have no idea how could this happen.

Okay, for reference, here are the environment and the locale inside the Docker container:

```
silex@silex-laptop:~$ docker run -it --rm silex/emacs:master env
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
HOSTNAME=b1366c3c8820
TERM=xterm
EMACS_BRANCH=master
EMACS_VERSION=master
HOME=/root

silex@silex-laptop:~$ docker run -it --rm silex/emacs:master locale 
LANG=
LANGUAGE=
LC_CTYPE="POSIX"
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=
```

Out of curiosity I might investigate why this happens, but maybe there is some harmless bug in Emacs here that went unnoticed for a long time because `safe_terminal_coding` just works.

Is all the detection happening in `setup_coding_system`? I could modify the Emacs source so that it reports what is being detected, which might give us clues about what differs in the Docker container.

I suspect the same problem happens in LXC containers.

Regards,
Philippe

Re: branch master segfault (2019-02-05)

Eli Zaretskii
> From: Philippe Vaucher <[hidden email]>
> Date: Wed, 6 Feb 2019 20:00:45 +0100
> Cc: Emacs developers <[hidden email]>
>
> ```
> silex@silex-laptop:~$ docker run -it --rm silex/emacs:master locale
> LANG=
> LANGUAGE=
> LC_CTYPE="POSIX"
> LC_NUMERIC="POSIX"
> LC_TIME="POSIX"
> LC_COLLATE="POSIX"
> LC_MONETARY="POSIX"
> LC_MESSAGES="POSIX"
> LC_PAPER="POSIX"
> LC_NAME="POSIX"
> LC_ADDRESS="POSIX"
> LC_TELEPHONE="POSIX"
> LC_MEASUREMENT="POSIX"
> LC_IDENTIFICATION="POSIX"
> LC_ALL=
> ```

That's it: this is the "C" locale, without any codeset being declared
by any of these variables.

> Out of curiosity I might investigate why this happens, but maybe there lies some unharmful bug in emacs
> there that went unnoticed for long because `safe_terminal_coding` just works.

There's no bug, AFAICT.  When Emacs finds that the locale's codeset
doesn't do any encoding, it uses safe_terminal_coding.  The comments
near the code which was segfaulting say that much.
safe_terminal_coding is a coding-system that can handle any character
"safely".

> Is all the detection happening in `setup_coding_system`?

Which detection did you have in mind?  There's no detection inside
setup_coding_system, but to answer your question more fully, I'd like
to understand what exactly you are asking about.  If you are asking
where Emacs takes the terminal encoding from, then it is set up
according to the locale, see set-locale-environment.  If you want to
look into this, I'd start by figuring out why you have the POSIX
(a.k.a. "C") locale in the Docker container.
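
One way to check which codeset an environment declares is to ask the C library directly. The sketch below is not taken from Emacs, and the function names `env_codeset` and `c_locale_codeset` are made up for illustration. It relies only on the POSIX calls `setlocale` and `nl_langinfo (CODESET)`.

```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: codeset name of the locale selected by the
   environment (LC_ALL, LC_CTYPE, LANG), or NULL if the C library
   cannot honor that locale. */
const char *env_codeset(void)
{
    if (setlocale(LC_CTYPE, "") == NULL)
        return NULL;
    return nl_langinfo(CODESET);
}

/* Hypothetical helper: codeset name of the plain "C" (POSIX) locale,
   for comparison with env_codeset(). */
const char *c_locale_codeset(void)
{
    const char *name = setlocale(LC_CTYPE, "C");
    assert(name != NULL); /* the "C" locale always exists */
    (void) name;
    return nl_langinfo(CODESET);
}
```

With LANG and LC_ALL unset, as in the `locale` output quoted above, one would expect `env_codeset ()` to report the same name as `c_locale_codeset ()`. The exact spelling of that name depends on the C library.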


Re: branch master segfault (2019-02-05)

Philippe Vaucher
>> silex@silex-laptop:~$ docker run -it --rm silex/emacs:master locale
>> LANG=
>> LANGUAGE=
>> (snip)
>
> That's it: this is the "C" locale, without any codeset being declared
> by any of these variables.
>
>> Out of curiosity I might investigate why this happens, but maybe there lies some unharmful bug in emacs
>> there that went unnoticed for long because `safe_terminal_coding` just works.
>
> There's no bug, AFAICT.  When Emacs finds that the locale's codeset
> doesn't do any encoding, it uses safe_terminal_coding.  The comments
> near the code which was segfaulting say that much.
> safe_terminal_coding is a coding-system that can handle any character
> "safely".

Oh, okay.

>> Is all the detection happening in `setup_coding_system`?
>
> Which detection did you have in mind?  There's no detection inside
> setup_coding_system, but to answer your question more fully, I'd like
> to understand what exactly are you asking about.  If you are asking
> about where does Emacs take the terminal encoding, then this is set up
> according to the locale, see set-locale-environment.  If you want to
> look into this, I'd start by figuring out why you have the POSIX
> (a.k.a. "C") locale in the docker.

Well, I thought `safe_terminal_coding` was the "plan B" coding system, used when "detection goes wrong", but you just made it clear that this is normal behavior, so there's nothing to fix here.

Docker containers don't make any assumptions, so it makes sense that they use the "C" locale by default.

Okay, thank you for your help resolving this issue!

Kind regards,
Philippe