That old GTK bug


That old GTK bug

Per Starbäck
That old GTK bug about recovering from disconnects,
https://gitlab.gnome.org/GNOME/gtk/issues/221 , which has been a
problem for Emacs users for years and years, was closed four months
ago because it is "ridiculous". I don't know if that affects if and
how Emacs should try to mitigate this further, but thought it could be
interesting, and I don't think it has been mentioned here.


Re: That old GTK bug

Eli Zaretskii
On December 11, 2019 12:17:33 PM GMT+02:00, "Per Starbäck" <[hidden email]> wrote:
> That old GTK bug about recovering from disconnects,
> https://gitlab.gnome.org/GNOME/gtk/issues/221 , which has been a
> problem for Emacs users for years and years, was closed four months
> ago because it is "ridiculous". I don't know if that affects if and
> how Emacs should try to mitigate this further, but thought it could be
> interesting, and I don't think it has been mentioned here.

Thanks for letting us know.

I may be missing something, but I fail to see how such "closing" of a bug is significant, let alone useful for resolving the original issue.


Re: That old GTK bug

Noam Postavsky
In reply to this post by Per Starbäck
On Wed, 11 Dec 2019 at 05:17, Per Starbäck <[hidden email]> wrote:
>
> That old GTK bug about recovering from disconnects,
> https://gitlab.gnome.org/GNOME/gtk/issues/221 , which has been a
> problem for Emacs users for years and years, was closed four months
> ago because it is "ridiculous".

I think the important part of that response is not the "ridiculous",
but rather the request for "a smaller test case than the whole of
Emacs".


Re: That old GTK bug

Eli Zaretskii
On December 11, 2019 3:05:43 PM GMT+02:00, Noam Postavsky <[hidden email]> wrote:

> On Wed, 11 Dec 2019 at 05:17, Per Starbäck <[hidden email]> wrote:
> >
> > That old GTK bug about recovering from disconnects,
> > https://gitlab.gnome.org/GNOME/gtk/issues/221 , which has been a
> > problem for Emacs users for years and years, was closed four months
> > ago because it is "ridiculous".
>
> I think the important part of that response is not the "ridiculous",
> but rather the request for "a smaller test case than the whole of
> Emacs".

I agree.  If someone can submit such a test case, please do.


Re: That old GTK bug

Dmitry Gutov
In reply to this post by Eli Zaretskii
On 11.12.2019 13:50, Eli Zaretskii wrote:

> On December 11, 2019 12:17:33 PM GMT+02:00, "Per Starbäck" <[hidden email]> wrote:
>> That old GTK bug about recovering from disconnects,
>> https://gitlab.gnome.org/GNOME/gtk/issues/221 , which has been a
>> problem for Emacs users for years and years, was closed four months
>> ago because it is "ridiculous". I don't know if that affects if and
>> how Emacs should try to mitigate this further, but thought it could be
>> interesting, and I don't think it has been mentioned here.
>
> Thanks for letting us know.
>
> I may be missing something, but I fail to see how such "closing" of a bug is significant, let alone useful for resolving the original issue.

I think the last comment might also be implying that our problem could
now be caused by the abort() call it mentions that we keep around for
GTK 2 compatibility.

(It's really not my area, so please disregard at will.)


Re: That old GTK bug

jackkamm
> I think the last comment might also be implying that our problem could
> now be caused by the abort() call it mentions that we keep around for
> GTK 2 compatibility.

I'm way out of my depth here, but would really like to use GTK
emacsclient over SSH X forwarding, so decided to have a look at this.

To trigger the bug, I ran "emacs --fgdaemon" on latest master (9ee5af3150)
in a Debian 10 VM, then connected to it with
"ssh -XY debian10 emacsclient -nc". Then I killed the SSH connection.

It crashes on the following code block in xterm.c:

#ifdef USE_GTK
      /* A long-standing GTK bug prevents proper disconnect handling
         <https://gitlab.gnome.org/GNOME/gtk/issues/221>.  Once,
         the resulting Glib error message loop filled a user's disk.
         To avoid this, kill Emacs unconditionally on disconnect.  */
      shut_down_emacs (0, Qnil);
      fprintf (stderr, "%s\n\
When compiled with GTK, Emacs cannot recover from X disconnects.\n\
This is a GTK bug: https://gitlab.gnome.org/GNOME/gtk/issues/221\n\
For details, see etc/PROBLEMS.\n",
               error_msg);
      emacs_abort ();
#endif /* USE_GTK */

which is the abort call that the linked comment complains about when it
called this issue "ridiculous".

So I deleted this code block and reran my test. Emacs still crashes, but
in a different location. Here's the backtrace from gdb:

(gdb) bt
#0  0x00007ffff49135cb in raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00005555555965f8 in terminate_due_to_signal (sig=sig@entry=6, backtrace_limit=backtrace_limit@entry=40)
    at emacs.c:401
#2  0x0000555555596a29 in emacs_abort () at sysdep.c:2453
#3  0x000055555559a0c7 in wait_reading_process_output
    (time_limit=time_limit@entry=30, nsecs=nsecs@entry=0, read_kbd=read_kbd@entry=-1, do_display=do_display@entry=true, wait_for_cell=wait_for_cell@entry=0x0, wait_proc=wait_proc@entry=0x0, just_wait_proc=0) at process.c:5691
#4  0x00005555555a656e in sit_for
    (timeout=timeout@entry=0x7a, reading=reading@entry=true, display_option=display_option@entry=1) at lisp.h:1032
#5  0x000055555567efba in read_char
    (commandflag=1, map=0x5555581c9f03, prev_event=0x0, used_mouse_menu=0x7fffffffe03b, end_time=0x0) at lisp.h:1147
#6  0x000055555567f6ac in read_key_sequence
    (keybuf=0x7fffffffe150, prompt=0x0, dont_downcase_last=<optimized out>, can_return_switch_frame=true, fix_current_buffer=true, prevent_redisplay=<optimized out>) at keyboard.c:9536
#7  0x0000555555680cec in command_loop_1 () at lisp.h:1032
#8  0x00005555556e60b2 in internal_condition_case
    (bfun=bfun@entry=0x555555680b10 <command_loop_1>, handlers=handlers@entry=0x90, hfun=hfun@entry=0x555555677d70 <cmd_error>) at eval.c:1355
#9  0x0000555555672b94 in command_loop_2 (ignore=ignore@entry=0x0) at lisp.h:1032
#10 0x00005555556e6031 in internal_catch
    (tag=tag@entry=0xd110, func=func@entry=0x555555672b70 <command_loop_2>, arg=arg@entry=0x0) at eval.c:1116
#11 0x0000555555672b3b in command_loop () at lisp.h:1032
#12 0x0000555555677976 in recursive_edit_1 () at keyboard.c:714
#13 0x0000555555677c95 in Frecursive_edit () at keyboard.c:786
#14 0x000055555559c91f in main (argc=2, argv=<optimized out>) at emacs.c:2054
(gdb)

In particular, it crashes at this call to emacs_abort() in process.c:

      if (nfds < 0)
        {
          if (xerrno == EINTR)
            no_avail = 1;
          else if (xerrno == EBADF)
            emacs_abort ();
          else
            report_file_errno ("Failed select", Qnil, xerrno);
        }

I don't know what this means, but I gather that emacs is crashing due to
having a bad file descriptor. So it seems like there's still some real
underlying problem here, aside from Emacs' preemptive call to abort(),
and it doesn't have anything to do with GTK 2 (as I'm using GTK 3 here).
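
For reference, here is a minimal sketch of how select() comes to report
EBADF: one of the descriptors in the set has already been closed. It
assumes a POSIX system. The helper name is hypothetical and the closed
pipe end only stands in for a lost connection's socket; whether the
descriptor behind the Emacs crash is really the X connection is not
established by the backtrace above.

```c
#include <errno.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical helper, not Emacs code: open a pipe, close its read end,
   then hand the stale descriptor to select().  Returns the errno that
   select() reported, 0 if select() unexpectedly succeeded, or -1 if
   pipe() itself failed.  */
int
errno_from_select_on_closed_fd (void)
{
  int fds[2];
  if (pipe (fds) != 0)
    return -1;

  int stale = fds[0];
  close (fds[0]);               /* stands in for the dead connection */

  fd_set readfds;
  FD_ZERO (&readfds);
  FD_SET (stale, &readfds);     /* the set still names the closed fd */

  struct timeval timeout = { 0, 0 };    /* poll, do not block */
  errno = 0;
  int nfds = select (stale + 1, &readfds, NULL, NULL, &timeout);
  int xerrno = errno;

  close (fds[1]);
  return nfds < 0 ? xerrno : 0;
}
```

On Linux the helper should return EBADF, which is the branch that
reaches emacs_abort () in the process.c excerpt above.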


New Issue following up on Issue #221 Emacs disconnects

Madhu-8
In reply to this post by Eli Zaretskii
[BCC'd to open a new issue on gitlab.gnome.org. Re: "That old GTK bug"]

Attached is a reduced test case, which was requested in the now closed
issue #221. It was hacked up quickly and sloppily from simple.c in gtk+-demos.

gtk_main is replaced by a loop which calls g_main_context_iteration.
X errors are handled via XSetIOErrorHandler and XSetErrorHandler to
handle a closed display and continue with the next
g_main_context_iteration.

To run the test case:
$ gcc simple.c -Wno-deprecated -Wno-deprecated-declarations -g3 $(pkg-config gtk+-3.0 --cflags --libs ) -lX11
$ DISPLAY=:0 Xephyr :1
$ DISPLAY=:1 G_DEBUG=fatal-warnings ./a.out
Then kill Xephyr.

The stack trace which you get from killing Xephyr is attached.

[Also featured in the code, but not relevant to this bug report, is the
"closed" signal, which is expected when the GdkDisplay connection is
lost. This signal does not seem to fire.]



/* simple.c
 * Copyright (C) 1997  Red Hat, Inc
 * Author: Elliot Lee
 *
 * This library is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Library General Public
 * License as published by the Free Software Foundation; either
 * version 2 of the License, or (at your option) any later version.
 *
 * This library is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
 * Library General Public License for more details.
 *
 * You should have received a copy of the GNU Library General Public
 * License along with this library. If not, see <http://www.gnu.org/licenses/>.
 */
#include <gtk/gtk.h>
#include <gtk/gtkx.h>
#include <setjmp.h>

jmp_buf jmp_ret;


void
gdk_display_closed_callback(GdkDisplay *display, gboolean is_error, gpointer data)
{
  fprintf(stderr, "gdk_display_closed_callback %p is_error=%d data=%p\n",
          display, is_error, data);
}

void
register_closed_callback()
{
  Display *dpy = gdk_x11_display_get_xdisplay (gdk_display_get_default ());
  GdkDisplay *gdpy = gdk_x11_lookup_xdisplay (dpy);
  int closed_callback_tag = g_signal_connect
    (G_OBJECT (gdpy),
     "closed",
     G_CALLBACK (gdk_display_closed_callback),
     NULL);
  if (!closed_callback_tag) {
    fprintf(stderr, "failed to register a callback on gdk display ::closed\n");
  } else {
    fprintf(stderr, "registered \"closed\" callback on gdk display %p dpy %s\n", gdpy, DisplayString(dpy));
  }
}

void
hello (void)
{
  g_print ("hello world\n");
}

void
x_error_quitter (Display *display, XErrorEvent *event)
{
  if (event->error_code == BadName)
    return;
  char buf[256];
  XGetErrorText (display, event->error_code, buf, sizeof (buf));
  fprintf(stderr, "x_error_quitter on %s: error %s on protocol request %d\n",
          DisplayString(display), buf, event->request_code);
  longjmp(jmp_ret, 1);
}


static  int
x_io_error_quitter (Display *display)
{
  fprintf(stderr, "Connection lost to X Server%s\n", DisplayString (display));
  longjmp(jmp_ret, 2);
}

static int
x_error_handler (Display *display, XErrorEvent *event)
{
  fprintf(stderr, "x_error_handler %p %p\n", display, event);
  if ((event->error_code == BadMatch || event->error_code == BadWindow)
      /* && event->request_code == X_SetInputFocus */)
    {
      return 0;
    }
  x_error_quitter (display, event);
  return 0;
}

void my_gtk_main_quit(GtkWidget *window, gboolean *kill_switch)
{
  fprintf(stderr, "my_gtk_main_quit()\n");
  *kill_switch = True;
}

int
main (int argc, char *argv[])
{
  GtkWidget *window;

  gtk_init (&argc, &argv);
  register_closed_callback();
  XSetErrorHandler (x_error_handler);
  XSetIOErrorHandler (x_io_error_quitter);
  gboolean kill_switch = False;
  window = g_object_connect (g_object_new (gtk_window_get_type (),
                                           "type", GTK_WINDOW_TOPLEVEL,
                                           "title", "hello world",
                                           "resizable", FALSE,
                                           "border_width", 10,
                                           NULL),
                             "signal::destroy", my_gtk_main_quit, &kill_switch,
                             NULL);
  g_object_connect (g_object_new (gtk_button_get_type (),
                                  "GtkButton::label", "hello world",
                                  "GtkWidget::parent", window,
                                  "GtkWidget::visible", TRUE,
                                  NULL),
                    "signal::clicked", hello, NULL,
                    NULL);
  gtk_widget_show (window);
  GMainContext *default_context = g_main_context_default();
  guint64 n_iter = 0;

  int ret;
  if ((ret = setjmp(jmp_ret)) != 0) {
    fprintf(stderr, "handled longjmp from %squitter\n",
            (ret == 2) ? "io" : "");
  }

  while(1) {
    if (kill_switch) break;
    gboolean ret = g_main_context_iteration(default_context, True);
    n_iter++;
    if (n_iter % 10 == 0) fprintf(stderr, ".");
    if (n_iter % 1000 == 0) { fprintf(stderr, "\n"); n_iter = 0; }

  }

  fprintf(stderr, "\nquitting: %d\n", kill_switch);
  return 0;
}

/*
gcc simple.c -Wno-deprecated -Wno-deprecated-declarations -g3 $(pkg-config gtk+-3.0 --cflags --libs ) -lX11

$ DISPLAY=:0 Xephyr :1
$ DISPLAY=:1 G_DEBUG=fatal-warnings ./a.out
*/

registered "closed" callback on gdk display 0x42e0e0 dpy :1
[New Thread 0x7ffff194b700 (LWP 28948)]
[New Thread 0x7ffff0fba700 (LWP 28949)]
.Connection lost to X Server:1
handled longjmp from ioquitter

(a.out:28942): GLib-WARNING **: 04:52:00.994: g_main_context_prepare() called recursively from within a source's check() or prepare() member.

Thread 1 "a.out" received signal SIGTRAP, Trace/breakpoint trap.
0x00007ffff72909c5 in _g_log_abort (breakpoint=1)
    at ../glib-2.63.0/glib/gmessages.c:554
554    G_BREAKPOINT ();
(gdb) back
#0  0x00007ffff72909c5 in _g_log_abort (breakpoint=1)
    at ../glib-2.63.0/glib/gmessages.c:554
#1  0x00007ffff7291bb6 in g_logv (log_domain=0x7ffff72f93ee "GLib",
    log_level=G_LOG_LEVEL_WARNING, format=<optimized out>,
    args=args@entry=0x7fffffffdf98) at ../glib-2.63.0/glib/gmessages.c:1373
#2  0x00007ffff7291d72 in g_log (
    log_domain=log_domain@entry=0x7ffff72f93ee "GLib",
    log_level=log_level@entry=G_LOG_LEVEL_WARNING,
    format=format@entry=0x7ffff73004e8 "g_main_context_prepare() called recursively from within a source's check() or prepare() member.")
    at ../glib-2.63.0/glib/gmessages.c:1415
#3  0x00007ffff728a5fa in g_main_context_prepare (
    context=context@entry=0x44fb40, priority=priority@entry=0x7fffffffe108)
    at ../glib-2.63.0/glib/gmain.c:3434
#4  0x00007ffff728ada3 in g_main_context_iterate (
    context=context@entry=0x44fb40, block=block@entry=1,
    dispatch=dispatch@entry=1, self=<optimized out>)
    at ../glib-2.63.0/glib/gmain.c:3898
#5  0x00007ffff728af8f in g_main_context_iteration (context=0x44fb40,
    may_block=1) at ../glib-2.63.0/glib/gmain.c:3979
#6  0x00000000004011b6 in main (argc=1, argv=0x7fffffffe298) at simple.c:131