bug#36609: 27.0.50; Possible race-condition in threading implementation

classic Classic list List threaded Threaded
30 messages Options
12
Reply | Threaded
Open this post in threaded view
|

bug#36609: 27.0.50; Possible race-condition in threading implementation

Andreas Politz, INF|INF-I-2

I think there is a race-condition in the implementation of threads.  I
tried to find a minimal test-case, without success.  Thus, I've attached
a lengthy source-file.  Loading that file should trigger this bug and
may freeze your session.  

Indications:

1. The main-thread has the name of one of created threads (XEmacs in
   this case), instead of "emacs".

2. Emacs stops processing all keyboard/mouse input while looping in
   wait_reading_process_output.  Sending commands via emacsclient still
   works.

GDB output:

(gdb) info threads
  Id   Target Id                                        Frame
* 1    Thread 0x7ffff17f5d40 (LWP 26264) "XEmacs"       0x000055555576eac0 in XPNTR (a=XIL(0x7ffff1312533)) at alloc.c:535
  2    Thread 0x7ffff0ac4700 (LWP 26265) "gmain"        0x00007ffff50d1667 in poll () from /usr/lib/libc.so.6
  3    Thread 0x7fffebd1a700 (LWP 26266) "gdbus"        0x00007ffff50d1667 in poll () from /usr/lib/libc.so.6
  4    Thread 0x7fffeb519700 (LWP 26267) "dconf worker" 0x00007ffff50d1667 in poll () from /usr/lib/libc.so.6

(gdb) bt full
#0  0x00007ffff50d3f76 in pselect () at /usr/lib/libc.so.6
#1  0x0000555555832e48 in really_call_select (arg=0x7fffffffd150) at thread.c:586
        sa = 0x7fffffffd150
        self = 0x555555c154e0 <main_thread>
        oldset = {
          __val = {0, 93825011232085, 140737488343064, 31632, 31632, 51840, 140737488343136, 93824994672716, 4294967298, 140737488343184, 93824999850720, 0, 0, 140737488343136, 93824993869565, 4041340661}
        }
#2  0x0000555555774449 in flush_stack_call_func (func=0x555555832d81 <really_call_select>, arg=0x7fffffffd150) at alloc.c:4969
        end = 0x7fffffffd100
        self = 0x555555c154e0 <main_thread>
        sentry = {
          o = {
            __max_align_ll = 140737488343296,
            __max_align_ld = <invalid float value>
          }
        }
#3  0x0000555555832f40 in thread_select (func=0x7ffff50d3eb0 <pselect>, max_fds=6, rfds=0x7fffffffd260, wfds=0x7fffffffd2e0, efds=0x0, timeout=0x7fffffffd8b0, sigmask=0x0) at thread.c:616
        sa = {
          func = 0x7ffff50d3eb0 <pselect>,
          max_fds = 6,
          rfds = 0x7fffffffd260,
          wfds = 0x7fffffffd2e0,
          efds = 0x0,
          timeout = 0x7fffffffd8b0,
          sigmask = 0x0,
          result = 210347336
        }
#4  0x000055555585fea3 in xg_select (fds_lim=6, rfds=0x7fffffffd920, wfds=0x7fffffffd9a0, efds=0x0, timeout=0x7fffffffd8b0, sigmask=0x0) at xgselect.c:117
        all_rfds = {
          fds_bits = {32, 0 <repeats 15 times>}
        }
        all_wfds = {
          fds_bits = {0 <repeats 16 times>}
        }
        tmo = {
          tv_sec = 7,
          tv_nsec = 140737488343264
        }
        tmop = 0x7fffffffd8b0
        context = 0x555555dae320
        have_wfds = true
        gfds_buf = {{
            fd = 0,
            events = 0,
            revents = 0
          }, {
            fd = 32,
            events = 0,
            revents = 0
          }, {
            fd = 0,
            events = 0,
            revents = 0
          }, {
            fd = 7,
            events = 0,
            revents = 0
          }, {
            fd = 8,
            events = 0,
            revents = 0
          }, {
            fd = 15,
            events = 0,
            revents = 0
          }, {
            fd = -11072,
            events = 32767,
            revents = 0
          }, {
            fd = 1460278864,
            events = 21845,
            revents = 0
          }, {
            fd = 64,
            events = 0,
            revents = 0
          }, {
            fd = 0,
            events = 0,
            revents = 0
          }, {
            fd = 2,
            events = 48,
            revents = 0
          }, {
            fd = 0,
            events = 0,
            revents = 0
          }, {
            fd = 91,
            events = 124,
            revents = 0
          }, {
            fd = -397131776,
            events = 35414,
            revents = 11125
          }, {
            fd = 1460278864,
            events = 21845,
            revents = 0
          }, {
            fd = 1,
            events = 0,
            revents = 0
          }, {
            fd = 2,
            events = 0,
            revents = 0
          }, {
            fd = -11072,
            events = 32767,
            revents = 0
          }, {
            fd = 1439161536,
            events = 21845,
            revents = 0
          }, {
            fd = -11024,
            events = 32767,
            revents = 0
          }, {
            fd = -11088,
            events = 32767,
            revents = 0
          }, {
            fd = -182671191,
            events = 32767,
            revents = 0
          }, {
            fd = 1,
            events = 0,
            revents = 0
          }, {
            fd = -182670875,
            events = 32767,
            revents = 0
          }, {
            fd = 194871296,
            events = 59160,
            revents = 48198
          }, {
            fd = -1175563804,
            events = 19,
            revents = 0
          }, {
            fd = 24,
            events = 0,
            revents = 0
          }, {
            fd = 1460278864,
            events = 21845,
            revents = 0
          }, {
            fd = 2,
            events = 0,
            revents = 0
          }, {
            fd = 1439882448,
            events = 21845,
            revents = 0
          }, {
            fd = 2,
            events = 0,
            revents = 0
          }, {
            fd = 2,
            events = 0,
            revents = 0
          }, {
            fd = 1,
            events = 2,
            revents = 0
          }, {
            fd = 1460278864,
            events = 21845,
            revents = 0
          }, {
            fd = 0,
            events = 0,
            revents = 0
          }, {
            fd = -397131776,
            events = 35414,
            revents = 11125
          }, {
            fd = 24,
            events = 0,
            revents = 0
          }, {
            fd = 1439161536,
            events = 21845,
            revents = 0
          }, {
            fd = 2,
            events = 0,
            revents = 0
          }, {
            fd = 1,
            events = 0,
            revents = 0
          }, {
            fd = 1439161536,
            events = 21845,
            revents = 0
          }, {
            fd = -182606722,
            events = 32767,
            revents = 0
          }, {
            fd = -10960,
            events = 32767,
            revents = 0
          }, {
            fd = 1433487750,
            events = 21845,
            revents = 0
          }, {
            fd = -10976,
            events = 32767,
            revents = 0
          }, {
            fd = 1439161536,
            events = 21845,
            revents = 0
          }, {
            fd = -30,
            events = 0,
            revents = 0
          }, {
            fd = 1,
            events = 0,
            revents = 0
          }, {
            fd = 1,
            events = 64,
            revents = 0
          }, {
            fd = 1562877925,
            events = 0,
            revents = 0
          }, {
            fd = 31,
            events = 0,
            revents = 0
          }, {
            fd = 1562877925,
            events = 0,
            revents = 0
          }, {
            fd = -10960,
            events = 32767,
            revents = 0
          }, {
            fd = 1434542700,
            events = 21845,
            revents = 0
          }, {
            fd = -10912,
            events = 32767,
            revents = 0
          }, {
            fd = 1439161536,
            events = 21845,
            revents = 0
          }, {
            fd = 1562877925,
            events = 0,
            revents = 0
          }, {
            fd = -397131776,
            events = 35414,
            revents = 11125
          }, {
            fd = -10848,
            events = 32767,
            revents = 0
          }, {
            fd = 1434543288,
            events = 21845,
            revents = 0
          }, {
            fd = 1460274293,
            events = 21845,
            revents = 0
          }, {
            fd = 1385447426,
            events = 931,
            revents = 0
          }, {
            fd = 95390,
            events = 0,
            revents = 0
          }, {
            fd = 1433865373,
            events = 7256,
            revents = 10134
          }, {
            fd = 1562877925,
            events = 0,
            revents = 0
          }, {
            fd = 1439161536,
            events = 21845,
            revents = 0
          }, {
            fd = 664149,
            events = 0,
            revents = 0
          }, {
            fd = 80000,
            events = 0,
            revents = 0
          }, {
            fd = 1562877925,
            events = 0,
            revents = 0
          }, {
            fd = 664149080,
            events = 0,
            revents = 0
          }, {
            fd = 1562877925,
            events = 0,
            revents = 0
          }, {
            fd = 664149080,
            events = 0,
            revents = 0
          }, {
            fd = -10752,
            events = 32767,
            revents = 0
          }, {
            fd = 1434543494,
            events = 21845,
            revents = 0
          }, {
            fd = 0,
            events = 0,
            revents = 0
          }, {
            fd = -10544,
            events = 32767,
            revents = 0
          }, {
            fd = 499622378,
            events = 0,
            revents = 0
          }, {
            fd = 0,
            events = 0,
            revents = 0
          }, {
            fd = 0,
            events = 0,
            revents = 0
          }, {
            fd = 499622378,
            events = 0,
            revents = 0
          }, {
            fd = -10688,
            events = 32767,
            revents = 0
          }, {
            fd = 1434939197,
            events = 21845,
            revents = 0
          }, {
            fd = 1562877925,
            events = 0,
            revents = 0
          }, {
            fd = 164526702,
            events = 0,
            revents = 0
          }, {
            fd = 1562877925,
            events = 0,
            revents = 0
          }, {
            fd = 664149080,
            events = 0,
            revents = 0
          }, {
            fd = -10544,
            events = 32767,
            revents = 0
          }, {
            fd = 499622378,
            events = 41450,
            revents = 7623
          }, {
            fd = 0,
            events = 0,
            revents = 0
          }, {
            fd = 1562877925,
            events = 0,
            revents = 0
          }, {
            fd = 0,
            events = 0,
            revents = 0
          }, {
            fd = -1,
            events = 65535,
            revents = 65535
          }, {
            fd = -10432,
            events = 32767,
            revents = 0
          }, {
            fd = 1433359581,
            events = 21845,
            revents = 0
          }, {
            fd = 0,
            events = 0,
            revents = 0
          }, {
            fd = 0,
            events = 0,
            revents = 0
          }, {
            fd = 0,
            events = 0,
            revents = 0
          }, {
            fd = 1443725184,
            events = 85,
            revents = 0
          }, {
            fd = 1450650965,
            events = 21845,
            revents = 0
          }, {
            fd = 1450650965,
            events = 21845,
            revents = 0
          }, {
            fd = 0,
            events = 0,
            revents = 0
          }, {
            fd = 16384,
            events = 0,
            revents = 0
          }, {
            fd = 5,
            events = 0,
            revents = 0
          }, {
            fd = -10496,
            events = 32767,
            revents = 0
          }, {
            fd = 0,
            events = 0,
            revents = 0
          }, {
            fd = -1,
            events = 65535,
            revents = 65535
          }, {
            fd = 1562877925,
            events = 0,
            revents = 0
          }, {
            fd = 164526702,
            events = 0,
            revents = 0
          }, {
            fd = 719,
            events = 0,
            revents = 0
          }, {
            fd = 893997157,
            events = 0,
            revents = 0
          }, {
            fd = 1439269600,
            events = 21845,
            revents = 0
          }, {
            fd = 0,
            events = 0,
            revents = 0
          }, {
            fd = 0,
            events = 0,
            revents = 0
          }, {
            fd = -10496,
            events = 32767,
            revents = 0
          }, {
            fd = 1433288445,
            events = 21845,
            revents = 0
          }, {
            fd = 0,
            events = 0,
            revents = 0
          }, {
            fd = 164532384,
            events = 0,
            revents = 0
          }, {
            fd = 1562977925,
            events = 0,
            revents = 0
          }, {
            fd = 1562977925,
            events = 0,
            revents = 0
          }, {
            fd = 164532384,
            events = 0,
            revents = 0
          }, {
            fd = -10368,
            events = 32767,
            revents = 0
          }, {
            fd = 1434938909,
            events = 21845,
            revents = 0
          }, {
            fd = 100000,
            events = 0,
            revents = 0
          }, {
            fd = 0,
            events = 0,
            revents = 0
          }, {
            fd = 1562877925,
            events = 0,
            revents = 0
          }, {
            fd = 164532384,
            events = 0,
            revents = 0
          }, {
            fd = -1,
            events = 65535,
            revents = 65535
          }, {
            fd = 0,
            events = 0,
            revents = 0
          }}
        gfds = 0x7fffffffd360
        gfds_size = 128
        n_gfds = -1
        retval = 0
        our_fds = 0
        max_fds = 5
        context_acquired = false
        i = 0
        nfds = 135
        tmo_in_millisec = 1030
        must_free = 0
        need_to_dispatch = false
#5  0x0000555555802a78 in wait_reading_process_output (time_limit=0, nsecs=0, read_kbd=-1, do_display=true, wait_for_cell=XIL(0), wait_proc=0x0, just_wait_proc=0) at process.c:5423
        process_skipped = false
        channel = 6
        nfds = 1
        Available = {
          fds_bits = {32, 0 <repeats 15 times>}
        }
        Writeok = {
          fds_bits = {0 <repeats 16 times>}
        }
        check_write = true
        check_delay = 0
        no_avail = false
        xerrno = 11
        proc = XIL(0x7fffffffd9a0)
        timeout = {
          tv_sec = 0,
          tv_nsec = 499622378
        }
        end_time = {
          tv_sec = 93824993869565,
          tv_nsec = 0
        }
        timer_delay = {
          tv_sec = 0,
          tv_nsec = 499622378
        }
        got_output_end_time = {
          tv_sec = 1562977205,
          tv_nsec = 270561059
        }
        wait = FOREVER
        got_some_output = -1
        prev_wait_proc_nbytes_read = 0
        retry_for_async = false
        count = 4
        now = {
          tv_sec = 0,
          tv_nsec = -1
        }
#6  0x00005555556f4718 in kbd_buffer_get_event (kbp=0x7fffffffdc70, used_mouse_menu=0x7fffffffe235, end_time=0x0) at keyboard.c:3836
        do_display = true
        obj = make_fixnum(1073741816)
#7  0x00005555556f06c5 in read_event_from_main_queue (end_time=0x0, local_getcjmp=0x7fffffffe040, used_mouse_menu=0x7fffffffe235) at keyboard.c:2138
        c = XIL(0)
        save_jump = {{
            __jmpbuf = {0, 0, 0, 0, 0, 0, 0, 0},
            __mask_was_saved = 0,
            __saved_mask = {
              __val = {0 <repeats 16 times>}
            }
          }}
        kb = 0x7fffffffdcd0
        count = 3
#8  0x00005555556f099b in read_decoded_event_from_main_queue (end_time=0x0, local_getcjmp=0x7fffffffe040, prev_event=XIL(0), used_mouse_menu=0x7fffffffe235) at keyboard.c:2202
        nextevt = XIL(0)
        frame = 0x1dcd30d9
        terminal = 0x9
        events = {XIL(0), XIL(0x1dcd30d9), XIL(0xca80), XIL(0x2b758a56e8544000), XIL(0), XIL(0x555556772ff5), XIL(0x7fffffffde50), XIL(0x5555556f57d8), XIL(0x555555c982e0), XIL(0), XIL(0), XIL(0x7fffffffde50), XIL(0x5555556e3efd), XIL(0x1dcd30d9), XIL(0x7fffffffde90), XIL(0x5555556f3b29)}
        n = 0
#9  0x00005555556f2236 in read_char (commandflag=1, map=XIL(0x5555567756e3), prev_event=XIL(0), used_mouse_menu=0x7fffffffe235, end_time=0x0) at keyboard.c:2810
        c = XIL(0)
        jmpcount = 3
        local_getcjmp = {{
            __jmpbuf = {0, -245616553674816983, 93825011232757, 51840, 0, 0, -245616553305718231, -6214351577293929943},
            __mask_was_saved = 0,
            __saved_mask = {
              __val = {0, 0, 140737488347312, 93824993869565, 0, 140737488347424, 93824994673374, 93825011242707, 3, 0, 0, 140737488347424, 93824994446493, 93824999850720, 0, 0}
            }
          }}
        save_jump = {{
            __jmpbuf = {93824993869565, 0, 140737488347584, 93824994020698, 0, 51840, -1, 0},
            __mask_was_saved = 1450661587,
            __saved_mask = {
              __val = {93825011242723, 140737488347520, 93824993870282, 93825011242707, 93825011242723, 140737488347584, 93824994446493, 93824999885184, 34464, 34464, 140737488347584, 93824993869565, 40105367251, 93825004306304, 140737488347624, 93824999850720}
            }
          }}
        tem = XIL(0x2aaa9b29c970)
        save = XIL(0)
        previous_echo_area_message = XIL(0)
        also_record = XIL(0)
        reread = false
        recorded = false
        polling_stopped_here = true
        orig_kboard = 0x555555dfa570
#10 0x00005555556ff6c8 in read_key_sequence (keybuf=0x7fffffffe440, prompt=XIL(0), dont_downcase_last=false, can_return_switch_frame=true, fix_current_buffer=true, prevent_redisplay=false) at keyboard.c:9124
        interrupted_kboard = 0x555555dfa570
        interrupted_frame = 0x5555560d7f80
        key = XIL(0x5555557a7eff)
        used_mouse_menu = false
        echo_local_start = 0
        last_real_key_start = 0
        keys_local_start = 0
        new_binding = XIL(0x7fffffffe310)
        count = 3
        t = 0
        echo_start = 0
        keys_start = 0
        current_binding = XIL(0x5555567756e3)
        first_unbound = 31
        mock_input = 0
        used_mouse_menu_history = {false <repeats 30 times>}
        fkey = {
          parent = XIL(0x555555d6ce03),
          map = XIL(0x555555d6ce03),
          start = 0,
          end = 0
        }
        keytran = {
          parent = XIL(0x7ffff14a7dbb),
          map = XIL(0x7ffff14a7dbb),
          start = 0,
          end = 0
        }
        indec = {
          parent = XIL(0x555555d6cdf3),
          map = XIL(0x555555d6cdf3),
          start = 0,
          end = 0
        }
        shift_translated = false
        delayed_switch_frame = XIL(0)
        original_uppercase = XIL(0)
        original_uppercase_position = -1
        dummyflag = false
        starting_buffer = 0x7ffff0e1f6f0
        fake_prefixed_keys = XIL(0)
        first_event = XIL(0)
        second_event = XIL(0)
#11 0x00005555556ee5fb in command_loop_1 () at keyboard.c:1348
        cmd = XIL(0)
        keybuf = {XIL(0x7fffffffe490), XIL(0x5555557a804c), make_fixnum(11545945833472), XIL(0x7fffffffe4c0), XIL(0x555555c982e0), XIL(0), XIL(0), XIL(0x7fffffffe490), XIL(0x5555556e3efd), XIL(0xf0e1f6f5), XIL(0x7fffffffe500), make_fixnum(23456248668343), XIL(0x555555d18953), XIL(0x3), XIL(0x555555c982e0), XIL(0), XIL(0), XIL(0x7fffffffe4e0), XIL(0x5555556e3efd), XIL(0xf0e1f6f5), XIL(0x7fffffffe520), XIL(0x5555557a2e69), XIL(0x1556e3efd), XIL(0x5760), XIL(0x7fffffffe540), XIL(0x555555d53470), XIL(0), XIL(0), XIL(0x7fffffffe550), make_fixnum(23456248662876)}
        i = 32767
        prev_modiff = 0
        prev_buffer = 0x0
        already_adjusted = false
#12 0x00005555557a2a64 in internal_condition_case (bfun=0x5555556ee1b3 <command_loop_1>, handlers=XIL(0x5760), hfun=0x5555556ed968 <cmd_error>) at eval.c:1351
        val = XIL(0x5555556e3efd)
        c = 0x555555d53470
#13 0x00005555556ede9b in command_loop_2 (ignore=XIL(0)) at keyboard.c:1091
        val = XIL(0)
#14 0x00005555557a22d5 in internal_catch (tag=XIL(0xd110), func=0x5555556ede6e <command_loop_2>, arg=XIL(0)) at eval.c:1112
        val = XIL(0x45b00000000)
        c = 0x555555d53340
#15 0x00005555556ede39 in command_loop () at keyboard.c:1070
#16 0x00005555556ed537 in recursive_edit_1 () at keyboard.c:714
        count = 1
        val = make_fixnum(23456248667937)
#17 0x00005555556ed6bb in Frecursive_edit () at keyboard.c:786
        count = 0
        buffer = XIL(0)
#18 0x00005555556eb49d in main (argc=4, argv=0x7fffffffe8b8) at emacs.c:2086
        stack_bottom_variable = 0x20
        do_initial_setlocale = true
        no_loadup = false
        junk = 0x0
        dname_arg = 0x0
        ch_to_dir = 0x0
        original_pwd = 0x0
        dump_mode = 0x0
        skip_args = 0
        temacs = 0x0
        attempt_load_pdump = true
        rlim = {
          rlim_cur = 10022912,
          rlim_max = 18446744073709551615
        }
        sockfd = -1


thread-bug-2.el (6K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

bug#36609: 27.0.50; Possible race-condition in threading implementation

Eli Zaretskii
> From: Andreas Politz <[hidden email]>
> Date: Thu, 11 Jul 2019 22:51:10 +0200
>
> I think there is a race-condition in the implementation of threads.

Not sure what you mean by "race condition".  Since only one Lisp
thread can run at any given time, where could this race happen, and
between what threads?

> I tried to find a minimal test-case, without success.  Thus, I've
> attached a lengthy source-file.  Loading that file should trigger
> this bug and may freeze your session.

FWIW, it doesn't freeze my Emacs here, neither on GNU/Linux nor on
MS-Windows (with today's master).  Are you getting freezes with 100%
reproducibility?  If so, please show all the details of your build, as
collected by report-emacs-bug.

> Indications:
>
> 1. The main-thread has the name of one of created threads (XEmacs in
>    this case), instead of "emacs".

I don't think this is related to the problem.  I think we have a bug
in the implementation of the 'name' attribute of threads: we call
prctl(PR_SET_NAME), which AFAIU changes the name of the _calling_
thread, and the calling thread at that point is the main thread, see
sys_thread_create.  If I evaluate this:

  (make-thread 'ignore "XEmacs")

and then attach a debugger, I see that the name of the main thread has
changed to "XEmacs".

> 2. Emacs stops processing all keyboard/mouse input while looping in
>    wait_reading_process_output.  Sending commands via emacsclient still
>    works.
>
> GDB output:
>
> (gdb) info threads
>   Id   Target Id                                        Frame
> * 1    Thread 0x7ffff17f5d40 (LWP 26264) "XEmacs"       0x000055555576eac0 in XPNTR (a=XIL(0x7ffff1312533)) at alloc.c:535
>   2    Thread 0x7ffff0ac4700 (LWP 26265) "gmain"        0x00007ffff50d1667 in poll () from /usr/lib/libc.so.6
>   3    Thread 0x7fffebd1a700 (LWP 26266) "gdbus"        0x00007ffff50d1667 in poll () from /usr/lib/libc.so.6
>   4    Thread 0x7fffeb519700 (LWP 26267) "dconf worker" 0x00007ffff50d1667 in poll () from /usr/lib/libc.so.6
>
> (gdb) bt full

This "bt full" is not enough, because it shows the backtrace of one
thread only, the main thread.  Please show the backtrace of the other
3 threads by typing "thread apply all bt full" instead.

> #5  0x0000555555802a78 in wait_reading_process_output (time_limit=0, nsecs=0, read_kbd=-1, do_display=true, wait_for_cell=XIL(0), wait_proc=0x0, just_wait_proc=0) at process.c:5423
>         process_skipped = false
>         channel = 6
>         nfds = 1
>         Available = {
>           fds_bits = {32, 0 <repeats 15 times>}
>         }

Is the keyboard descriptor's bit in Available set or not?  What is the
contents of the fd_callback_info array at this point?

> ;; -*- lexical-binding: t -*-
>
> (require 'threads)
> (require 'eieio)
> (require 'cl-lib)
> (require 'ring)
>
> (defun debug (fmt &rest args)
>   (princ (apply #'format fmt args) #'external-debugging-output)
>   (terpri #'external-debugging-output))

Please describe what this program tries to accomplish, and how.  It's
not easy to reverse-engineer that from 200 lines of non-trivial code.
It's possible that the reason(s) for the freeze will be apparent from
describing your implementation ideas.

Thanks.



Reply | Threaded
Open this post in threaded view
|

bug#36609: 27.0.50; Possible race-condition in threading implementation

Eli Zaretskii
> Date: Fri, 12 Jul 2019 10:47:37 +0300
> From: Eli Zaretskii <[hidden email]>
> Cc: [hidden email]
>
> Please describe what this program tries to accomplish, and how.  It's
> not easy to reverse-engineer that from 200 lines of non-trivial code.
> It's possible that the reason(s) for the freeze will be apparent from
> describing your implementation ideas.

Oh, and one more question: does the value of the global variable
last_thread_error tell something interesting at that point?



Reply | Threaded
Open this post in threaded view
|

bug#36609: 27.0.50; Possible race-condition in threading implementation

Pip Cet
In reply to this post by Andreas Politz, INF|INF-I-2
On Thu, Jul 11, 2019 at 8:52 PM Andreas Politz
<[hidden email]> wrote:
> I think there is a race-condition in the implementation of threads.  I
> tried to find a minimal test-case, without success.  Thus, I've attached
> a lengthy source-file.  Loading that file should trigger this bug and
> may freeze your session.

It does here, so I can provide further debugging information if
needed. On first glance, it appears that xgselect returns abnormally
with g_main_context acquired in one thread, and then other threads
fail to acquire it and loop endlessly.

Can you confirm that this very dirty patch avoids (it's not a fix by
any means!) the problem?

Avoid-glib-issue.patch (1K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

bug#36609: 27.0.50; Possible race-condition in threading implementation

Eli Zaretskii
> From: Pip Cet <[hidden email]>
> Date: Fri, 12 Jul 2019 09:02:22 +0000
> Cc: [hidden email]
>
> On Thu, Jul 11, 2019 at 8:52 PM Andreas Politz
> <[hidden email]> wrote:
> > I think there is a race-condition in the implementation of threads.  I
> > tried to find a minimal test-case, without success.  Thus, I've attached
> > a lengthy source-file.  Loading that file should trigger this bug and
> > may freeze your session.
>
> It does here, so I can provide further debugging information if
> needed.

Thanks, can you provide the info I asked for?

> On first glance, it appears that xgselect returns abnormally with
> g_main_context acquired in one thread, and then other threads fail
> to acquire it and loop endlessly.

If you can describe what causes this to happen, I think we might be
half-way to a solution.

> +  ptrdiff_t count = SPECPDL_INDEX ();

I don't think we should do that at this low level.



Reply | Threaded
Open this post in threaded view
|

bug#36609: 27.0.50; Possible race-condition in threading implementation

Pip Cet
In reply to this post by Pip Cet
On Fri, Jul 12, 2019 at 9:02 AM Pip Cet <[hidden email]> wrote:

>
> On Thu, Jul 11, 2019 at 8:52 PM Andreas Politz
> <[hidden email]> wrote:
> > I think there is a race-condition in the implementation of threads.  I
> > tried to find a minimal test-case, without success.  Thus, I've attached
> > a lengthy source-file.  Loading that file should trigger this bug and
> > may freeze your session.
>
> It does here, so I can provide further debugging information if
> needed. On first glance, it appears that xgselect returns abnormally
> with g_main_context acquired in one thread, and then other threads
> fail to acquire it and loop endlessly.
Okay, I'm still not sure this is really the problem Andreas was
seeing, but this code fails to work with xg_select:

(let ((thread (make-thread (lambda ()
                 (sleep-for 3)))))
  (thread-yield)
  (thread-signal thread 'error nil))

Proposed patch attached.

0001-Protect-against-abnormal-exit-in-xg_select.patch (2K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

bug#36609: 27.0.50; Possible race-condition in threading implementation

Eli Zaretskii
> From: Pip Cet <[hidden email]>
> Date: Fri, 12 Jul 2019 12:44:35 +0000
> Cc: [hidden email]
>
> Okay, I'm still not sure this is really the problem Andreas was
> seeing, but this code fails to work with xg_select:
>
> (let ((thread (make-thread (lambda ()
>                  (sleep-for 3)))))
>   (thread-yield)
>   (thread-signal thread 'error nil))

What do you mean by "fails"?  What do you see when you eval this?

> Proposed patch attached.

As I said earlier, I don't think it's right to use stack unwinding at
this level.  (Did you try this in a -nw session?)

I'd appreciate a description of what you think happens here, and why.

Thanks.



Reply | Threaded
Open this post in threaded view
|

bug#36609: 27.0.50; Possible race-condition in threading implementation

Pip Cet
In reply to this post by Eli Zaretskii
On Fri, Jul 12, 2019 at 12:42 PM Eli Zaretskii <[hidden email]> wrote:

>
> > From: Pip Cet <[hidden email]>
> > Date: Fri, 12 Jul 2019 09:02:22 +0000
> > Cc: [hidden email]
> >
> > On Thu, Jul 11, 2019 at 8:52 PM Andreas Politz
> > <[hidden email]> wrote:
> > > I think there is a race-condition in the implementation of threads.  I
> > > tried to find a minimal test-case, without success.  Thus, I've attached
> > > a lengthy source-file.  Loading that file should trigger this bug and
> > > may freeze your session.
> >
> > It does here, so I can provide further debugging information if
> > needed.
>
> Thanks, can you provide the info I asked for?

Yes, albeit not right now.

> > On first glance, it appears that xgselect returns abnormally with
> > g_main_context acquired in one thread, and then other threads fail
> > to acquire it and loop endlessly.
>
> If you can describe what causes this to happen, I think we might be
> half-way to a solution.

Here's the backtrace of the abnormal exit I see with the patch attached:


(gdb) bt full
#0  0x00000000006bf987 in release_g_main_context (ptr=0xc1d070) at xgselect.c:36
        context = 0x7fffedf79710
#1  0x0000000000616f03 in do_one_unbind
    (this_binding=0x7fffedf79770, unwinding=true, bindflag=SET_INTERNAL_UNBIND)
    at eval.c:3446
#2  0x0000000000617245 in unbind_to (count=0, value=XIL(0)) at eval.c:3567
        this_binding = {
          kind = SPECPDL_UNWIND_PTR,
          unwind = {
            kind = SPECPDL_UNWIND_PTR,
            func = 0x6bf97b <release_g_main_context>,
            arg = XIL(0xc1d070),
            eval_depth = 0
          },
          unwind_array = {
            kind = SPECPDL_UNWIND_PTR,
            nelts = 7076219,
            array = 0xc1d070
          },
          unwind_ptr = {
            kind = SPECPDL_UNWIND_PTR,
            func = 0x6bf97b <release_g_main_context>,
            arg = 0xc1d070
          },
          unwind_int = {
            kind = SPECPDL_UNWIND_PTR,
            func = 0x6bf97b <release_g_main_context>,
            arg = 12701808
          },
          unwind_excursion = {
            kind = SPECPDL_UNWIND_PTR,
            marker = XIL(0x6bf97b),
            window = XIL(0xc1d070)
          },
          unwind_void = {
            kind = SPECPDL_UNWIND_PTR,
            func = 0x6bf97b <release_g_main_context>
          },
          let = {
            kind = SPECPDL_UNWIND_PTR,
            symbol = XIL(0x6bf97b),
            old_value = XIL(0xc1d070),
            where = XIL(0),
            saved_value = XIL(0xef26a0)
          },
          bt = {
            kind = SPECPDL_UNWIND_PTR,
            debug_on_exit = false,
            function = XIL(0x6bf97b),
            args = 0xc1d070,
            nargs = 0
          }
        }
        quitf = XIL(0)
#3  0x00000000006116df in unwind_to_catch
    (catch=0x7fffd8000c50, type=NONLOCAL_EXIT_SIGNAL, value=XIL(0x14d3653))
    at eval.c:1162
        last_time = false
#4  0x00000000006126d9 in signal_or_quit
    (error_symbol=XIL(0x90), data=XIL(0), keyboard_quit=false) at eval.c:1674
        unwind_data = XIL(0x14d3653)
        conditions = XIL(0x7ffff05d676b)
        string = XIL(0x5f5e77)
        real_error_symbol = XIL(0x90)
        clause = XIL(0x30)
        h = 0x7fffd8000c50
#5  0x00000000006122e9 in Fsignal (error_symbol=XIL(0x90), data=XIL(0))
    at eval.c:1564
#6  0x0000000000698901 in post_acquire_global_lock (self=0xe09db0) at
thread.c:115
        sym = XIL(0x90)
        data = XIL(0)
        prev_thread = 0xa745c0 <main_thread>
#7  0x000000000069892b in acquire_global_lock (self=0xe09db0) at thread.c:123
#8  0x0000000000699303 in really_call_select (arg=0x7fffedf79a70) at
thread.c:596
        sa = 0x7fffedf79a70
        self = 0xe09db0
        oldset = {
          __val = {0, 0, 7, 0, 80, 140736817269952, 2031, 2080,
18446744073709550952, 32, 343597383808, 4, 0, 472446402655,
511101108348, 0}
        }
#9  0x00000000005e5ee0 in flush_stack_call_func
    (func=0x699239 <really_call_select>, arg=0x7fffedf79a70) at alloc.c:4969
        end = 0x7fffedf79a30
        self = 0xe09db0
        sentry = {
          o = {
            __max_align_ll = 0,
            __max_align_ld = <invalid float value>
          }
        }
#10 0x0000000000699389 in thread_select
    (func=0x419320 <pselect@plt>, max_fds=9, rfds=0x7fffedf79fa0,
wfds=0x7fffedf79f20, efds=0x0, timeout=0x7fffedf7a260, sigmask=0x0) at
thread.c:616
        sa = {
          func = 0x419320 <pselect@plt>,
          max_fds = 9,
          rfds = 0x7fffedf79fa0,
          wfds = 0x7fffedf79f20,
          efds = 0x0,
          timeout = 0x7fffedf7a260,
          sigmask = 0x0,
          result = 1
        }
#11 0x00000000006bfef5 in xg_select
    (fds_lim=9, rfds=0x7fffedf7a300, wfds=0x7fffedf7a280, efds=0x0,
timeout=0x7fffedf7a260, sigmask=0x0) at xgselect.c:130
        all_rfds = {
          fds_bits = {8, 0 <repeats 15 times>}
        }
        all_wfds = {
          fds_bits = {0 <repeats 16 times>}
        }
        tmo = {
          tv_sec = 0,
          tv_nsec = 0
        }
        tmop = 0x7fffedf7a260
        context = 0xc1d070
        have_wfds = true
        gfds_buf = {{
            fd = 5,
            events = 1,
            revents = 0
          }, {
            fd = 6,
            events = 1,
            revents = 0
          }, {
            fd = 8,
            events = 1,
            revents = 0
          }, {
            fd = 0,
            events = 0,
            revents = 0
          } <repeats 125 times>}
        gfds = 0x7fffedf79b10
        gfds_size = 128
        n_gfds = 3
        retval = 0
        our_fds = 0
        max_fds = 8
        context_acquired = true
        i = 3
        nfds = 0
        tmo_in_millisec = -1
        must_free = 0
        need_to_dispatch = false
        count = 3
#12 0x000000000066b757 in wait_reading_process_output
    (time_limit=3, nsecs=0, read_kbd=0, do_display=false,
wait_for_cell=XIL(0), wait_proc=0x0, just_wait_proc=0) at
process.c:5423
        process_skipped = false
        channel = 0
        nfds = 0
        Available = {
          fds_bits = {8, 0 <repeats 15 times>}
        }
        Writeok = {
          fds_bits = {0 <repeats 16 times>}
        }
        check_write = true
        check_delay = 0
        no_avail = false
        xerrno = 0
        proc = XIL(0x7fffedf7a440)
        timeout = {
          tv_sec = 3,
          tv_nsec = 0
        }
        end_time = {
          tv_sec = 1562935633,
          tv_nsec = 911868453
        }
        timer_delay = {
          tv_sec = 0,
          tv_nsec = -1
        }
        got_output_end_time = {
          tv_sec = 0,
          tv_nsec = -1
        }
        wait = TIMEOUT
        got_some_output = -1
        prev_wait_proc_nbytes_read = 0
        retry_for_async = false
        count = 2
        now = {
          tv_sec = 0,
          tv_nsec = -1
        }
#13 0x0000000000429bf6 in Fsleep_for (seconds=make_fixnum(3),
milliseconds=XIL(0))
    at dispnew.c:5825
        t = {
          tv_sec = 3,
          tv_nsec = 0
        }
        tend = {
          tv_sec = 1562935633,
          tv_nsec = 911868112
        }
        duration = 3
#14 0x0000000000613e99 in eval_sub (form=XIL(0xf6df73)) at eval.c:2273
        i = 2
        maxargs = 2
        args_left = XIL(0)
        numargs = 1
        original_fun = XIL(0x7fffefa9fb98)
        original_args = XIL(0xf6df83)
        count = 1
        fun = XIL(0xa756a5)
        val = XIL(0)
        funcar = make_fixnum(35184372085343)
        argvals =
          {make_fixnum(3), XIL(0), XIL(0), XIL(0), XIL(0), XIL(0),
XIL(0), XIL(0)}
#15 0x0000000000610032 in Fprogn (body=XIL(0)) at eval.c:462
        form = XIL(0xf6df73)
        val = XIL(0)
#16 0x0000000000616102 in funcall_lambda
    (fun=XIL(0xf6da43), nargs=0, arg_vector=0xe09dd8) at eval.c:3065
        val = XIL(0xc0)
        syms_left = XIL(0)
        next = XIL(0x3400000013)
        lexenv = XIL(0)
        count = 1
        i = 0
        optional = false
        rest = false
#17 0x0000000000615542 in Ffuncall (nargs=1, args=0xe09dd0) at eval.c:2813
        fun = XIL(0xf6da43)
        original_fun = XIL(0xf6da43)
        funcar = XIL(0xc0)
        numargs = 0
        val = XIL(0xaf72e0)
        count = 0
#18 0x000000000069956f in invoke_thread_function () at thread.c:702
        count = 0
#19 0x0000000000611d61 in internal_condition_case
    (bfun=0x69953e <invoke_thread_function>, handlers=XIL(0x30),
hfun=0x699596 <record_thread_error>) at eval.c:1351
        val = make_fixnum(1405386)
        c = 0x7fffd8000c50
#20 0x0000000000699697 in run_thread (state=0xe09db0) at thread.c:741
        stack_pos = {
          __max_align_ll = 0,
          __max_align_ld = 0
        }
        self = 0xe09db0
        iter = 0x0
        c = 0x7fffd8000b20
#21 0x00007ffff4b38fa3 in start_thread (arg=<optimized out>)
    at pthread_create.c:486
        ret = <optimized out>
        pd = <optimized out>
        now = <optimized out>
        unwind_buf = {
          cancel_jmp_buf = {{
              jmp_buf = {140737185822464, -1249422724209328276,
140737488341374, 140737488341375, 140737185822464, 0,
1249453444682727276, 1249398985402204012},
              mask_was_saved = 0
            }},
          priv = {
            pad = {0x0, 0x0, 0x0, 0x0},
            data = {
              prev = 0x0,
              cleanup = 0x0,
              canceltype = 0
            }
          }
        }
        not_first_call = <optimized out>
#22 0x00007ffff49724cf in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Lisp Backtrace:
"sleep-for" (0xedf7a530)
0xf6da40 Lisp type 3

post_acquire_global_lock () can return abnormally (I didn't know
that), so really_call_select() can, too, so thread_select() can, too.

> > +  ptrdiff_t count = SPECPDL_INDEX ();
>
> I don't think we should do that at this low level.

You're right, it does stick out. I think we're safe because we're
calling Fsignal with the global lock held, but it's not a pretty or
well-documented situation.



Reply | Threaded
Open this post in threaded view
|

bug#36609: 27.0.50; Possible race-condition in threading implementation

Eli Zaretskii
> From: Pip Cet <[hidden email]>
> Date: Fri, 12 Jul 2019 12:57:51 +0000
> Cc: [hidden email], [hidden email]
>
> Lisp Backtrace:
> "sleep-for" (0xedf7a530)
> 0xf6da40 Lisp type 3
>
> post_acquire_global_lock () can return abnormally (I didn't know
> that), so really_call_select() can, too, so thread_select() can, too.

Do you know which code sets current_thread->error_symbol, and what is
that error symbol?

> > > +  ptrdiff_t count = SPECPDL_INDEX ();
> >
> > I don't think we should do that at this low level.
>
> You're right, it does stick out. I think we're safe because we're
> calling Fsignal with the global lock held, but it's not a pretty or
> well-documented situation.

Is this the main thread which does that?  if so, there should be no
problem with holding the global lock in this case, is there?

If this is not the main thread, then the error handlers should be set
so as to log the error in last_thread_error, and then terminate the
thread (via exiting the thread function, AFAIR).

If this is not what happens, I'd like to know what does happen here,
and why.

Thanks.



Reply | Threaded
Open this post in threaded view
|

bug#36609: 27.0.50; Possible race-condition in threading implementation

Pip Cet
In reply to this post by Eli Zaretskii
On Fri, Jul 12, 2019 at 12:55 PM Eli Zaretskii <[hidden email]> wrote:

> > From: Pip Cet <[hidden email]>
> > Date: Fri, 12 Jul 2019 12:44:35 +0000
> > Cc: [hidden email]
> >
> > Okay, I'm still not sure this is really the problem Andreas was
> > seeing, but this code fails to work with xg_select:
> >
> > (let ((thread (make-thread (lambda ()
> >                  (sleep-for 3)))))
> >   (thread-yield)
> >   (thread-signal thread 'error nil))
>
> What do you mean by "fails"?  What do you see when you eval this?

With `emacs -Q', a blinking cursor and an unresponsive emacs (no
response to key presses, not even triple C-g). Sometimes, at least.

Here's a full backtrace of a session where the hang did happen:

Thread 4 (Thread 0x7fffeeb8a700 (LWP 26117)):
#0  0x00007ffff4967819 in __GI___poll (fds=0xd93070, nfds=1, timeout=-1)
    at ../sysdeps/unix/sysv/linux/poll.c:29
        resultvar = 18446744073709551100
        sc_cancel_oldtype = 0
#1  0x00007ffff698e136 in  () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#2  0x00007ffff698e25c in g_main_context_iteration ()
    at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#3  0x00007fffeebd7ffd in  ()
    at /usr/lib/x86_64-linux-gnu/gio/modules/libdconfsettings.so
#4  0x00007ffff69b6415 in  () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#5  0x00007ffff4b38fa3 in start_thread (arg=<optimized out>)
    at pthread_create.c:486
        ret = <optimized out>
        pd = <optimized out>
        now = <optimized out>
        unwind_buf = {
          cancel_jmp_buf = {{
              jmp_buf = {140737198466816, -2332898995959237690,
140737488337566, 140737488337567, 140737198466816, 13933232,
2332932536188793798, 2332875456844002246},
              mask_was_saved = 0
            }},
          priv = {
            pad = {0x0, 0x0, 0x0, 0x0},
            data = {
              prev = 0x0,
              cleanup = 0x0,
              canceltype = 0
            }
          }
        }
        not_first_call = <optimized out>
#6  0x00007ffff49724cf in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 3 (Thread 0x7fffef59e700 (LWP 26116)):
#0  0x00007ffff4967819 in __GI___poll (fds=0xcb7a20, nfds=2, timeout=-1)
    at ../sysdeps/unix/sysv/linux/poll.c:29
        resultvar = 18446744073709551100
        sc_cancel_oldtype = 0
#1  0x00007ffff698e136 in  () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#2  0x00007ffff698e4c2 in g_main_loop_run ()
    at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#3  0x00007ffff6bba0d6 in  () at /lib/x86_64-linux-gnu/libgio-2.0.so.0
#4  0x00007ffff69b6415 in  () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#5  0x00007ffff4b38fa3 in start_thread (arg=<optimized out>)
    at pthread_create.c:486
        ret = <optimized out>
        pd = <optimized out>
        now = <optimized out>
        unwind_buf = {
          cancel_jmp_buf = {{
              jmp_buf = {140737209034496, -2332898995959237690,
140737488336926, 140737488336927, 140737209034496, 13323152,
2332932821804118982, 2332875456844002246},
              mask_was_saved = 0
            }},
          priv = {
            pad = {0x0, 0x0, 0x0, 0x0},
            data = {
              prev = 0x0,
              cleanup = 0x0,
              canceltype = 0
            }
          }
        }
        not_first_call = <optimized out>
#6  0x00007ffff49724cf in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 2 (Thread 0x7ffff017d700 (LWP 26115)):
#0  0x00007ffff4967819 in __GI___poll (fds=0xb400d0, nfds=1, timeout=-1)
    at ../sysdeps/unix/sysv/linux/poll.c:29
        resultvar = 18446744073709551100
        sc_cancel_oldtype = 0
#1  0x00007ffff698e136 in  () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#2  0x00007ffff698e25c in g_main_context_iteration ()
    at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#3  0x00007ffff698e2a1 in  () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#4  0x00007ffff69b6415 in  () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#5  0x00007ffff4b38fa3 in start_thread (arg=<optimized out>)
    at pthread_create.c:486
        ret = <optimized out>
        pd = <optimized out>
        now = <optimized out>
        unwind_buf = {
          cancel_jmp_buf = {{
              jmp_buf = {140737221482240, -2332898995959237690,
140737488347710, 140737488347711, 140737221482240, 0,
2332865253915489222, 2332875456844002246},
              mask_was_saved = 0
            }},
          priv = {
            pad = {0x0, 0x0, 0x0, 0x0},
            data = {
              prev = 0x0,
              cleanup = 0x0,
              canceltype = 0
            }
          }
        }
        not_first_call = <optimized out>
#6  0x00007ffff49724cf in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 1 (Thread 0x7ffff0e30e00 (LWP 26108)):
#0  0x00007ffff4b404c6 in pthread_sigmask
    (how=2, newmask=<optimized out>, oldmask=0x0)
    at ../sysdeps/unix/sysv/linux/pthread_sigmask.c:48
        local_newmask = {
          __val = {7, 17697270867008781056, 11498208, 5790296, 0,
140737488342800, 2, 0, 0, 0, 0, 0, 0, 0, 0, 6922490}
        }
        result = 0
#1  0x0000000000585a7d in restore_signal_mask (oldset=0x7fffffffcf10)
    at sysdep.c:876
#2  0x0000000000699292 in really_call_select (arg=0x7fffffffd060) at
thread.c:584
        sa = 0x7fffffffd060
        self = 0xa745c0 <main_thread>
        oldset = {
          __val = {0, 5710159, 8192, 126116988112, 140737488344608,
140737298839344, 7, 0, 11381952, 0, 8192, 1562936528, 140737488342720,
5968906924941129, 0, 16538645}
        }
#3  0x00000000005e5ee0 in flush_stack_call_func
    (func=0x699239 <really_call_select>, arg=0x7fffffffd060) at alloc.c:4969
        end = 0x7fffffffd020
        self = 0xa745c0 <main_thread>
        sentry = {
          o = {
            __max_align_ll = 0,
            __max_align_ld = <invalid float value>
          }
        }
#4  0x0000000000699389 in thread_select
    (func=0x419320 <pselect@plt>, max_fds=6, rfds=0x7fffffffd590,
wfds=0x7fffffffd510, efds=0x0, timeout=0x7fffffffd840, sigmask=0x0) at
thread.c:616
        sa = {
          func = 0x419320 <pselect@plt>,
          max_fds = 6,
          rfds = 0x7fffffffd590,
          wfds = 0x7fffffffd510,
          efds = 0x0,
          timeout = 0x7fffffffd840,
          sigmask = 0x0,
          result = -157757095
        }
#5  0x00000000006bfeac in xg_select
    (fds_lim=6, rfds=0x7fffffffd8e0, wfds=0x7fffffffd860, efds=0x0,
timeout=0x7fffffffd840, sigmask=0x0) at xgselect.c:117
        all_rfds = {
          fds_bits = {32, 0 <repeats 15 times>}
        }
        all_wfds = {
          fds_bits = {0 <repeats 16 times>}
        }
        tmo = {
          tv_sec = 5621546,
          tv_nsec = 15785445
        }
        tmop = 0x7fffffffd840
        context = 0xc1c200
        have_wfds = true
        gfds_buf = {{
            fd = 895,
            events = 0,
            revents = 0
          }, {
            fd = 0,
            events = 0,
            revents = 0
          }, {
            fd = 0,
            events = 0,
            revents = 0
          }, {
            fd = 8096,
            events = 65535,
            revents = 0
          }, {
            fd = 0,
            events = 0,
            revents = 0
          }, {
            fd = 0,
            events = 0,
            revents = 0
          }, {
            fd = 0,
            events = 0,
            revents = 0
          }, {
            fd = 0,
            events = 0,
            revents = 0
          }, {
            fd = 0,
            events = 0,
            revents = 0
          }, {
            fd = 0,
            events = 0,
            revents = 0
          }, {
            fd = 16,
            events = 0,
            revents = 0
          }, {
            fd = 0,
            events = 0,
            revents = 0
          } <repeats 11 times>, {
            fd = -2057698731,
            events = 39706,
            revents = 28662
          }, {
            fd = 3,
            events = 0,
            revents = 0
          }, {
            fd = -2057698731,
            events = 39706,
            revents = 28662
          }, {
            fd = -161788823,
            events = 32767,
            revents = 0
          }, {
            fd = -2057698731,
            events = 39642,
            revents = 28662
          }, {
            fd = -190624704,
            events = 32767,
            revents = 0
          }, {
            fd = 16,
            events = 0,
            revents = 0
          }, {
            fd = 0,
            events = 0,
            revents = 0
          }, {
            fd = -664,
            events = 65535,
            revents = 65535
          }, {
            fd = -504530176,
            events = 22038,
            revents = 62873
          }, {
            fd = 12553216,
            events = 0,
            revents = 0
          }, {
            fd = -1252392363,
            events = 44399,
            revents = 28662
          }, {
            fd = 32,
            events = 0,
            revents = 0
          }, {
            fd = 0,
            events = 0,
            revents = 0
          }, {
            fd = 0,
            events = 0,
            revents = 0
          }, {
            fd = 0,
            events = 0,
            revents = 0
          }, {
            fd = 12553216,
            events = 0,
            revents = 0
          }, {
            fd = -191458327,
            events = 32767,
            revents = 0
          }, {
            fd = -190624704,
            events = 32767,
            revents = 0
          }, {
            fd = 139264,
            events = 0,
            revents = 0
          }, {
            fd = 17472,
            events = 0,
            revents = 0
          }, {
            fd = -191893943,
            events = 32767,
            revents = 0
          }, {
            fd = 128,
            events = 0,
            revents = 0
          }, {
            fd = -191909562,
            events = 32767,
            revents = 0
          }, {
            fd = 48,
            events = 0,
            revents = 0
          }, {
            fd = 1,
            events = 0,
            revents = 0
          }, {
            fd = 7,
            events = 0,
            revents = 0
          }, {
            fd = 64,
            events = 0,
            revents = 0
          }, {
            fd = 4,
            events = 32767,
            revents = 0
          }, {
            fd = 11649056,
            events = 0,
            revents = 0
          }, {
            fd = 47,
            events = 0,
            revents = 0
          }, {
            fd = 96,
            events = 0,
            revents = 0
          }, {
            fd = -664,
            events = 65535,
            revents = 65535
          }, {
            fd = 1,
            events = 0,
            revents = 0
          }, {
            fd = 4,
            events = 49,
            revents = 0
          }, {
            fd = 0,
            events = 0,
            revents = 0
          }, {
            fd = 0,
            events = 0,
            revents = 0
          }, {
            fd = 91,
            events = 110,
            revents = 0
          }, {
            fd = 124,
            events = 119,
            revents = 0
          }, {
            fd = 0,
            events = 0,
            revents = 0
          }, {
            fd = 17456,
            events = 0,
            revents = 0
          }, {
            fd = -190624704,
            events = 32767,
            revents = 0
          }, {
            fd = 48,
            events = 0,
            revents = 0
          }, {
            fd = 2,
            events = 0,
            revents = 0
          }, {
            fd = -664,
            events = 65535,
            revents = 65535
          }, {
            fd = 0,
            events = 0,
            revents = 0
          }, {
            fd = 0,
            events = 0,
            revents = 0
          }, {
            fd = -191900310,
            events = 32767,
            revents = 0
          }, {
            fd = -664,
            events = 65535,
            revents = 65535
          }, {
            fd = 0,
            events = 0,
            revents = 0
          }, {
            fd = 48,
            events = 0,
            revents = 0
          }, {
            fd = 187264016,
            events = 0,
            revents = 0
          }, {
            fd = 11498208,
            events = 0,
            revents = 0
          }, {
            fd = 0,
            events = 0,
            revents = 0
          }, {
            fd = 0,
            events = 0,
            revents = 0
          }, {
            fd = -11392,
            events = 32767,
            revents = 0
          }, {
            fd = 5621546,
            events = 0,
            revents = 0
          }, {
            fd = 187264016,
            events = 0,
            revents = 0
          }, {
            fd = -11360,
            events = 32767,
            revents = 0
          }, {
            fd = 187280083,
            events = 0,
            revents = 0
          }, {
            fd = -11344,
            events = 32767,
            revents = 0
          }, {
            fd = 5622263,
            events = 0,
            revents = 0
          }, {
            fd = 0,
            events = 0,
            revents = 0
          }, {
            fd = 187280083,
            events = 0,
            revents = 0
          }, {
            fd = -11280,
            events = 32767,
            revents = 0
          }, {
            fd = 6170867,
            events = 0,
            revents = 0
          }, {
            fd = 0,
            events = 0,
            revents = 0
          }, {
            fd = 0,
            events = 0,
            revents = 0
          }, {
            fd = 5627325,
            events = 0,
            revents = 0
          }, {
            fd = 6,
            events = 0,
            revents = 0
          }, {
            fd = 1,
            events = 0,
            revents = 0
          }, {
            fd = 0,
            events = 1,
            revents = 0
          }, {
            fd = 11498208,
            events = 0,
            revents = 0
          }, {
            fd = 0,
            events = 0,
            revents = 0
          }, {
            fd = 0,
            events = 0,
            revents = 0
          }, {
            fd = -11232,
            events = 32767,
            revents = 0
          }, {
            fd = 5621546,
            events = 0,
            revents = 0
          }, {
            fd = -134398935,
            events = 0,
            revents = 0
          }, {
            fd = -11200,
            events = 32767,
            revents = 0
          }, {
            fd = 187280099,
            events = 0,
            revents = 0
          }, {
            fd = -11184,
            events = 32767,
            revents = 0
          }, {
            fd = 5622263,
            events = 0,
            revents = 0
          }, {
            fd = 0,
            events = 0,
            revents = 0
          }, {
            fd = 187280099,
            events = 0,
            revents = 0
          }, {
            fd = -11120,
            events = 32767,
            revents = 0
          }, {
            fd = 6170867,
            events = 0,
            revents = 0
          }, {
            fd = 0,
            events = 0,
            revents = 0
          }, {
            fd = 0,
            events = 0,
            revents = 0
          }, {
            fd = 5627325,
            events = 0,
            revents = 0
          }, {
            fd = 6,
            events = 0,
            revents = 0
          }, {
            fd = 1,
            events = 0,
            revents = 0
          }, {
            fd = 0,
            events = 1,
            revents = 0
          }, {
            fd = 11498208,
            events = 0,
            revents = 0
          }, {
            fd = 187280099,
            events = 0,
            revents = 0
          }, {
            fd = -11072,
            events = 32767,
            revents = 0
          }, {
            fd = 5622263,
            events = 0,
            revents = 0
          }, {
            fd = -11024,
            events = 32767,
            revents = 0
          }, {
            fd = -134398935,
            events = 32767,
            revents = 0
          }, {
            fd = -10944,
            events = 32767,
            revents = 0
          }, {
            fd = 15785445,
            events = 0,
            revents = 0
          }, {
            fd = -11016,
            events = 32767,
            revents = 0
          }, {
            fd = 5623097,
            events = 0,
            revents = 0
          }, {
            fd = 11498208,
            events = 0,
            revents = 0
          }, {
            fd = 0,
            events = 0,
            revents = 0
          }, {
            fd = 0,
            events = 0,
            revents = 0
          }, {
            fd = -10992,
            events = 32767,
            revents = 0
          }}
        gfds = 0x7fffffffd100
        gfds_size = 128
        n_gfds = -1
        retval = 0
        our_fds = 0
        max_fds = 5
        context_acquired = false
        i = 0
        nfds = 0
        tmo_in_millisec = 0
        must_free = 0
        need_to_dispatch = false
#6  0x000000000066b757 in wait_reading_process_output
    (time_limit=0, nsecs=0, read_kbd=-1, do_display=true,
wait_for_cell=XIL(0), wait_proc=0x0, just_wait_proc=0) at
process.c:5423
        process_skipped = false
        channel = 6
        nfds = 1
        Available = {
          fds_bits = {32, 0 <repeats 15 times>}
        }
        Writeok = {
          fds_bits = {0 <repeats 16 times>}
        }
        check_write = true
        check_delay = 0
        no_avail = false
        xerrno = 11
        proc = XIL(0xcf7c00)
        timeout = {
          tv_sec = 100000,
          tv_nsec = 0
        }
        end_time = {
          tv_sec = 0,
          tv_nsec = 140737488345168
        }
        timer_delay = {
          tv_sec = 0,
          tv_nsec = -1
        }
        got_output_end_time = {
          tv_sec = 0,
          tv_nsec = -1
        }
        wait = FOREVER
        got_some_output = -1
        prev_wait_proc_nbytes_read = 0
        retry_for_async = false
        count = 4
        now = {
          tv_sec = 0,
          tv_nsec = -1
        }
#7  0x000000000056c31e in kbd_buffer_get_event
    (kbp=0x7fffffffdbd8, used_mouse_menu=0x7fffffffe1bf, end_time=0x0)
    at keyboard.c:3836
        do_display = true
        obj = XIL(0x14bfd3fb)
#8  0x0000000000568706 in read_event_from_main_queue
    (end_time=0x0, local_getcjmp=0x7fffffffdf80, used_mouse_menu=0x7fffffffe1bf)
    at keyboard.c:2138
        c = XIL(0)
        save_jump = {{
            __jmpbuf = {0, 0, 0, 0, 0, 0, 0, 0},
            __mask_was_saved = 0,
            __saved_mask = {
              __val = {0 <repeats 16 times>}
            }
          }}
        kb = 0x55c9f7 <XSETCDR+28>
        count = 3
#9  0x0000000000568970 in read_decoded_event_from_main_queue
    (end_time=0x0, local_getcjmp=0x7fffffffdf80, prev_event=XIL(0),
used_mouse_menu=0x7fffffffe1bf) at keyboard.c:2202
        nextevt = XIL(0x7fffffffde30)
        frame = 0x0
        terminal = 0x0
        events =
          {XIL(0x7fffffffddb0), make_fixnum(1422534), XIL(0xaf72e0),
XIL(0), XIL(0), XIL(0x7fffffffddb0), make_fixnum(1405386),
XIL(0x135b623), XIL(0x7fffffffddf0), make_fixnum(1420782),
XIL(0xaf72e0), XIL(0x100000000), XIL(0), XIL(0x7fffffffddf0),
make_fixnum(1405386), XIL(0)}
        n = 0
#10 0x000000000056a002 in read_char
    (commandflag=1, map=XIL(0xfbaa33), prev_event=XIL(0),
used_mouse_menu=0x7fffffffe1bf, end_time=0x0) at keyboard.c:2810
        c = XIL(0)
        jmpcount = 3
        local_getcjmp = {{
            __jmpbuf = {0, 2332898996579464134, 16538645, 48, 0, 0,
2332898994838827974, -2332899704998988858},
            __mask_was_saved = 0,
            __saved_mask = {
              __val = {140737214405912, 0, 140737225904168, 11498208,
0, 0, 140737488347152, 5621546, 4032516088, 11498208, 0, 0,
140737488347200, 5621546, 12363987, 140737488347296}
            }
          }}
        save_jump = {{
            __jmpbuf = {0, 0, 140737224472304, 0, 0, 11529984, 5621546, 0},
            __mask_was_saved = -8336,
            __saved_mask = {
              __val = {6254137, 140737231486936, 12884893472, 0,
31776, 140737224472304, 140737231486936, 5623453, 51539607552,
140737224472309, 140737224472304, 5632971, 11529984, 11529984, 0,
11498208}
            }
          }}
        tem = XIL(0x7fffefa759f0)
        save = XIL(0)
        previous_echo_area_message = XIL(0)
        also_record = XIL(0)
        reread = false
        recorded = false
        polling_stopped_here = true
        orig_kboard = 0xc863c0
#11 0x0000000000576842 in read_key_sequence
    (keybuf=0x7fffffffe3a0, prompt=XIL(0), dont_downcase_last=false,
can_return_switch_frame=true, fix_current_buffer=true,
prevent_redisplay=false)
    at keyboard.c:9124
        interrupted_kboard = 0xc863c0
        interrupted_frame = 0xf976a0
        key = XIL(0xf97ac0)
        used_mouse_menu = false
        echo_local_start = 0
        last_real_key_start = 0
        keys_local_start = 0
        new_binding = XIL(0xf97ac0)
        count = 3
        t = 0
        echo_start = 0
        keys_start = 0
        current_binding = XIL(0xfbaa33)
        first_unbound = 31
        mock_input = 0
        used_mouse_menu_history = {false <repeats 30 times>}
        fkey = {
          parent = XIL(0xbe3cd3),
          map = XIL(0xbe3cd3),
          start = 0,
          end = 0
        }
        keytran = {
          parent = XIL(0x7ffff0ae0eeb),
          map = XIL(0x7ffff0ae0eeb),
          start = 0,
          end = 0
        }
        indec = {
          parent = XIL(0xbe3cc3),
          map = XIL(0xbe3cc3),
          start = 0,
          end = 0
        }
        shift_translated = false
        delayed_switch_frame = XIL(0)
        original_uppercase = XIL(0)
        original_uppercase_position = -1
        dummyflag = false
        starting_buffer = 0x7ffff04576f0
        fake_prefixed_keys = XIL(0)
        first_event = XIL(0)
        second_event = XIL(0)
#12 0x00000000005667ae in command_loop_1 () at keyboard.c:1348
        cmd = XIL(0x7fffffffe4f0)
        keybuf =
          {XIL(0x7fffffffe420), XIL(0x7ffff0b08008), XIL(0x100000007),
XIL(0), XIL(0), XIL(0x80a0), XIL(0x11f), XIL(0xaff380), XIL(0xaff380),
XIL(0), XIL(0x7fffffffe440), XIL(0x617021), make_fixnum(1073741824),
XIL(0x7fffffffe460), XIL(0xaf72e0), XIL(0), XIL(0),
XIL(0x7fffffffe440), make_fixnum(1405386), XIL(0xf04576f5),
XIL(0x7fffffffe4a0), XIL(0x61729f), XIL(0xaf72e0), XIL(0), XIL(0),
XIL(0x7fffffffe480), make_fixnum(1405386), XIL(0xf04576f5),
XIL(0x7fffffffe4c0), make_fixnum(1591385)}
        i = 0
        prev_modiff = 0
        prev_buffer = 0x0
        already_adjusted = false
#13 0x0000000000611d61 in internal_condition_case
    (bfun=0x566384 <command_loop_1>, handlers=XIL(0x90), hfun=0x565b7d
<cmd_error>) at eval.c:1351
        val = make_fixnum(1405386)
        c = 0xb99e70
#14 0x000000000056607a in command_loop_2 (ignore=XIL(0)) at keyboard.c:1091
        val = XIL(0)
#15 0x000000000061161a in internal_catch
    (tag=XIL(0xd140), func=0x566051 <command_loop_2>, arg=XIL(0)) at eval.c:1112
        val = XIL(0x7fffffffe5c0)
        c = 0xb94fd0
#16 0x000000000056601c in command_loop () at keyboard.c:1070
#17 0x0000000000565752 in recursive_edit_1 () at keyboard.c:714
        count = 1
        val = XIL(0x7fffffffe620)
#18 0x00000000005658d4 in Frecursive_edit () at keyboard.c:786
        count = 0
        buffer = XIL(0)
#19 0x0000000000563961 in main (argc=4, argv=0x7fffffffe878) at emacs.c:2086
        stack_bottom_variable = 0x5e00
        do_initial_setlocale = true
        no_loadup = false
        junk = 0x0
        dname_arg = 0x0
        ch_to_dir = 0x0
        original_pwd = 0x0
        dump_mode = 0x0
        skip_args = 0
        temacs = 0x0
        attempt_load_pdump = true
        rlim = {
          rlim_cur = 10022912,
          rlim_max = 18446744073709551615
        }
        sockfd = -1

i thr
  Id   Target Id                                        Frame
* 1    Thread 0x7ffff0e30e00 (LWP 26108) "emacs"        pthread_sigmask (how=0,
    newmask=<optimized out>, oldmask=0x7fffffffcf10)
    at ../sysdeps/unix/sysv/linux/pthread_sigmask.c:48
  2    Thread 0x7ffff017d700 (LWP 26115) "gmain"
0x00007ffff4967819 in __GI___poll (fds=0xb400d0, nfds=1, timeout=-1)
at ../sysdeps/unix/sysv/linux/poll.c:29
  3    Thread 0x7fffef59e700 (LWP 26116) "gdbus"
0x00007ffff4967819 in __GI___poll (fds=0xcb7a20, nfds=2, timeout=-1)
at ../sysdeps/unix/sysv/linux/poll.c:29
  4    Thread 0x7fffeeb8a700 (LWP 26117) "dconf worker"
0x00007ffff4967819 in __GI___poll (fds=0xd93070, nfds=1, timeout=-1)
at ../sysdeps/unix/sysv/linux/poll.c:29

> > Proposed patch attached.
>
> As I said earlier, I don't think it's right to use stack unwinding at
> this level.  (Did you try this in a -nw session?)

Sorry, I didn't see that email until after I'd sent mine. I did try
with an -nw session, now, but I really think thread_select should not
return abnormally.

> I'd appreciate a description of what you think happens here, and why.

When a thread is signalled (by thread-signal, which sets another
thread's error_symbol) while the signalled thread is inside a
select(), thread_select() will return non-locally for that thread, and
we fail to release an internal GLib lock through
g_main_context_release(). That's the first bug.

When xg_select() fails to acquire the internal GLib lock, it simply
does a select() on the remaining file descriptors:

  context_acquired = g_main_context_acquire (context);
  /* FIXME: If we couldn't acquire the context, we just silently proceed
     because this function handles more than just glib file descriptors.
     Note that, as implemented, this failure is completely silent: there is
     no feedback to the caller.  */

This seems like a second, albeit documented, bug to me. I think we're
risking not waking up from the actual select because another
(non-Emacs) thread happened to hold the main context at the time.



Reply | Threaded
Open this post in threaded view
|

bug#36609: 27.0.50; Possible race-condition in threading implementation

Eli Zaretskii
> From: Pip Cet <[hidden email]>
> Date: Fri, 12 Jul 2019 13:40:15 +0000
> Cc: [hidden email], [hidden email]
>
> When a thread is signalled (by thread-signal, which sets another
> thread's error_symbol) while the signalled thread is inside a
> select(), thread_select() will return non-locally for that thread, and
> we fail to release an internal GLib lock through
> g_main_context_release(). That's the first bug.

We should either release the global lock before the thread exits, or
defer the acting upon the signal until later.  We cannot disable the
signal handling altogether because it is entirely legitimate to signal
another thread, and when we do, that other thread will _always_ be
inside thread_select.

For the main thread, handling the signal in that situation shouldn't
be a problem, because it is not going to exit.  Right?

> When xg_select() fails to acquire the internal GLib lock, it simply
> does a select() on the remaining file descriptors:

Why does it fail to acquire that lock?

>   context_acquired = g_main_context_acquire (context);
>   /* FIXME: If we couldn't acquire the context, we just silently proceed
>      because this function handles more than just glib file descriptors.
>      Note that, as implemented, this failure is completely silent: there is
>      no feedback to the caller.  */
>
> This seems like a second, albeit documented, bug to me. I think we're
> risking not waking up from the actual select because another
> (non-Emacs) thread happened to hold the main context at the time.

So what is the proposal for that? spin waiting for the lock?



Reply | Threaded
Open this post in threaded view
|

bug#36609: 27.0.50; Possible race-condition in threading implementation

Pip Cet
In reply to this post by Eli Zaretskii
On Fri, Jul 12, 2019 at 1:31 PM Eli Zaretskii <[hidden email]> wrote:>

> > From: Pip Cet <[hidden email]>
> > Date: Fri, 12 Jul 2019 12:57:51 +0000
> > Cc: [hidden email], [hidden email]
> >
> > Lisp Backtrace:
> > "sleep-for" (0xedf7a530)
> > 0xf6da40 Lisp type 3
> >
> > post_acquire_global_lock () can return abnormally (I didn't know
> > that), so really_call_select() can, too, so thread_select() can, too.
>
> Do you know which code sets current_thread->error_symbol, and what is
> that error symbol?

thread-signal, which my example code calls directly. I passed 'error,
and that's what error_symbol is set to.

last_thread_error is (error) when the Emacs session freezes.

> > > > +  ptrdiff_t count = SPECPDL_INDEX ();
> > >
> > > I don't think we should do that at this low level.
> >
> > You're right, it does stick out. I think we're safe because we're
> > calling Fsignal with the global lock held, but it's not a pretty or
> > well-documented situation.
>
> Is this the main thread which does that?  if so, there should be no
> problem with holding the global lock in this case, is there?
>
> If this is not the main thread, then the error handlers should be set
> so as to log the error in last_thread_error, and then terminate the
> thread (via exiting the thread function, AFAIR).

Indeed, that's what happens; but the thread now fails to release the
GLib lock it had previously acquired, so other threads cannot acquire
it again, ever.

> If this is not what happens, I'd like to know what does happen here,
> and why.

Sure, we should understand what's going on here; even if the fix turns
out to be simple.



Reply | Threaded
Open this post in threaded view
|

bug#36609: 27.0.50; Possible race-condition in threading implementation

Pip Cet
In reply to this post by Eli Zaretskii
On Fri, Jul 12, 2019 at 1:51 PM Eli Zaretskii <[hidden email]> wrote:

> > From: Pip Cet <[hidden email]>
> > Date: Fri, 12 Jul 2019 13:40:15 +0000
> > Cc: [hidden email], [hidden email]
> >
> > When a thread is signalled (by thread-signal, which sets another
> > thread's error_symbol) while the signalled thread is inside a
> > select(), thread_select() will return non-locally for that thread, and
> > we fail to release an internal GLib lock through
> > g_main_context_release(). That's the first bug.
>
> We should either release the global lock before the thread exits, or
> defer the acting upon the signal until later.  We cannot disable the
> signal handling altogether because it is entirely legitimate to signal
> another thread, and when we do, that other thread will _always_ be
> inside thread_select.

Really? What about thread-yield?

> For the main thread, handling the signal in that situation shouldn't
> be a problem, because it is not going to exit.  Right?

I think the main thread can still fail to release the lock...

> > When xg_select() fails to acquire the internal GLib lock, it simply
> > does a select() on the remaining file descriptors:
>
> Why does it fail to acquire that lock?

Because another thread holds it, either an Emacs or a non-Emacs
thread. In both cases, I think we might miss events unless we return
with errno == EINTR.

> >   context_acquired = g_main_context_acquire (context)
> >   /* FIXME: If we couldn't acquire the context, we just silently proceed
> >      because this function handles more than just glib file descriptors.
> >      Note that, as implemented, this failure is completely silent: there is
> >      no feedback to the caller.  */
> >
> > This seems like a second, albeit documented, bug to me. I think we're
> > risking not waking up from the actual select because another
> > (non-Emacs) thread happened to hold the main context at the time.
>
> So what is the proposal for that? spin waiting for the lock?

I don't know, to be honest, but I'm afraid there's currently no better
way. There's a way to take the lock without spinning, but it's been
deprecated.



Reply | Threaded
Open this post in threaded view
|

bug#36609: 27.0.50; Possible race-condition in threading implementation

Eli Zaretskii
> From: Pip Cet <[hidden email]>
> Date: Fri, 12 Jul 2019 14:34:44 +0000
> Cc: [hidden email], [hidden email]
>
> > > When a thread is signalled (by thread-signal, which sets another
> > > thread's error_symbol) while the signalled thread is inside a
> > > select(), thread_select() will return non-locally for that thread, and
> > > we fail to release an internal GLib lock through
> > > g_main_context_release(). That's the first bug.
> >
> > We should either release the global lock before the thread exits, or
> > defer the acting upon the signal until later.  We cannot disable the
> > signal handling altogether because it is entirely legitimate to signal
> > another thread, and when we do, that other thread will _always_ be
> > inside thread_select.
>
> Really? What about thread-yield?

What about it?

You are asking whether, when thread-signal is executed, the thread
which we are signaling is necessarily parked inside thread_select?  If
so, I don't understand your surprise: only one thread can ever be
running, and that is by definition the thread which calls
thread-signal.  All the other threads cannot be running, which means
they are parked either in thread_select or in sys_mutex_lock called
from acquire_global_lock.  Right?

As for thread-yield, I'm not sure I understand how is it related to
the issue we are discussing.

> > For the main thread, handling the signal in that situation shouldn't
> > be a problem, because it is not going to exit.  Right?
>
> I think the main thread can still fail to release the lock...

Since the main thread doesn't exit, but just longjmp's to top-level,
and then continues to run, why does it need to release the lock?  Any
thread should hold the lock for as long as it runs.

> > > When xg_select() fails to acquire the internal GLib lock, it simply
> > > does a select() on the remaining file descriptors:
> >
> > Why does it fail to acquire that lock?
>
> Because another thread holds it, either an Emacs or a non-Emacs
> thread.

OK, and why is it a problem that we continue calling thread_select
regardless of the failure to acquire the Glib lock?  is the problem
this one:

> In both cases, I think we might miss events unless we return
> with errno == EINTR.

?  Or is there something else?  If the problem with missing events,
then which events are those, and what bad things will happen if we
miss them?



Reply | Threaded
Open this post in threaded view
|

bug#36609: 27.0.50; Possible race-condition in threading implementation

Eli Zaretskii
In reply to this post by Pip Cet
> From: Pip Cet <[hidden email]>
> Date: Fri, 12 Jul 2019 13:51:48 +0000
> Cc: [hidden email], [hidden email]
>
> > If this is not the main thread, then the error handlers should be set
> > so as to log the error in last_thread_error, and then terminate the
> > thread (via exiting the thread function, AFAIR).
>
> Indeed, that's what happens; but the thread now fails to release the
> GLib lock it had previously acquired, so other threads cannot acquire
> it again, ever.

Then we should make sure we release it when the thread function exits,
or before we call Fsignal from post_acquire_global_lock.  There's no
reason to use the unwind-protect machinery for that, I think.



Reply | Threaded
Open this post in threaded view
|

bug#36609: 27.0.50; Possible race-condition in threading implementation

Pip Cet
On Fri, Jul 12, 2019 at 3:06 PM Eli Zaretskii <[hidden email]> wrote:

> > From: Pip Cet <[hidden email]>
> > Date: Fri, 12 Jul 2019 13:51:48 +0000
> > Cc: [hidden email], [hidden email]
> >
> > > If this is not the main thread, then the error handlers should be set
> > > so as to log the error in last_thread_error, and then terminate the
> > > thread (via exiting the thread function, AFAIR).
> >
> > Indeed, that's what happens; but the thread now fails to release the
> > GLib lock it had previously acquired, so other threads cannot acquire
> > it again, ever.
>
> Then we should make sure we release it when the thread function exits,

That's too late, it's legitimate for the thread to have a signal
handler. We need to release the lock at precisely the time that
unbind_to would release it.

> or before we call Fsignal from post_acquire_global_lock.  There's no
> reason to use the unwind-protect machinery for that, I think.

If we call Fsignal, surely the unwind-protect machinery is already set
up and we can safely call it? So unbind_to would work again...

I guess I've changed my mind: the unwind-protect machinery does
precisely what we want, we should simply document that thread_select
can exit nonlocally and that the select functions need to be written
to cater to that.

Two places call xg_select, and they both run long after unbind_to works.

Doing this without the unwind-protect machinery is difficult: for
example, a comment in post_acquire_global_lock claims
unbind_for_thread_switch can exit nonlocally, though I don't think
that actually happens or we would have seen the bug report.

What's your proposal for doing this?



Reply | Threaded
Open this post in threaded view
|

bug#36609: 27.0.50; Possible race-condition in threading implementation

Eli Zaretskii
> From: Pip Cet <[hidden email]>
> Date: Fri, 12 Jul 2019 18:06:29 +0000
> Cc: [hidden email], [hidden email]
>
> > Then we should make sure we release it when the thread function exits,
>
> That's too late, it's legitimate for the thread to have a signal
> handler. We need to release the lock at precisely the time that
> unbind_to would release it.

I had also another proposal:

> > or before we call Fsignal from post_acquire_global_lock.  There's no
> > reason to use the unwind-protect machinery for that, I think.
>
> If we call Fsignal, surely the unwind-protect machinery is already set
> up and we can safely call it? So unbind_to would work again...

Sorry, I don't want to call unwind-protect there.  Call me paranoid,
if you want.

> I guess I've changed my mind: the unwind-protect machinery does
> precisely what we want, we should simply document that thread_select
> can exit nonlocally and that the select functions need to be written
> to cater to that.
>
> Two places call xg_select, and they both run long after unbind_to works.
>
> Doing this without the unwind-protect machinery is difficult: for
> example, a comment in post_acquire_global_lock claims
> unbind_for_thread_switch can exit nonlocally, though I don't think
> that actually happens or we would have seen the bug report.
>
> What's your proposal for doing this?

Like I said: release the lock before calling Fsignal.  We could do
that before calling unbind_for_thread_switch, if needed.



Reply | Threaded
Open this post in threaded view
|

bug#36609: 27.0.50; Possible race-condition in threading implementation

Eli Zaretskii
> Date: Fri, 12 Jul 2019 21:27:40 +0300
> From: Eli Zaretskii <[hidden email]>
> Cc: [hidden email], [hidden email]
>
> Sorry, I don't want to call unwind-protect there.  Call me paranoid,
> if you want.

Maybe I should explain the rationale behind that paranoia.  The main
reason is that you are proposing to do that inside code that can
switch threads.  Switching threads means switching to another stack
and also to another set of handlers.  So using the unwind-protect
machinery in this situation is IMO asking for trouble.

And then there's the TTY frame case, where C-g triggers SIGINT, and we
actually longjmp from wherever we were...



Reply | Threaded
Open this post in threaded view
|

bug#36609: 27.0.50; Possible race-condition in threading implementation

Pip Cet
On Fri, Jul 12, 2019 at 6:34 PM Eli Zaretskii <[hidden email]> wrote:
> > Sorry, I don't want to call unwind-protect there.  Call me paranoid,
> > if you want.
>
> Maybe I should explain the rationale behind that paranoia.  The main

I don't think it's paranoia, I just wasn't sure there was a good way
to avoid it.

I'm now convinced that there simply is no safe way to call select()
from two threads at once when using glib.

I think our options are hacking around it and hoping nothing breaks
(this is what the attached patch does; it releases the main context
glib lock from the wrong thread soon "after" the other thread called
select, but there's actually no way to ensure that "after" is
accurate), or rewriting things so we have a single thread that does
all the select()ing.

> reason is that you are proposing to do that inside code that can
> switch threads.  Switching threads means switching to another stack
> and also to another set of handlers.  So using the unwind-protect
> machinery in this situation is IMO asking for trouble.

Thanks for explaining.

glib-hack.diff (5K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

bug#36609: 27.0.50; Possible race-condition in threading implementation

Pip Cet
In reply to this post by Eli Zaretskii
> > > We should either release the global lock before the thread exits, or
> > > defer the acting upon the signal until later.  We cannot disable the
> > > signal handling altogether because it is entirely legitimate to signal
> > > another thread, and when we do, that other thread will _always_ be
> > > inside thread_select.
> >
> > Really? What about thread-yield?
>
> What about it?
>
> You are asking whether, when thread-signal is executed, the thread
> which we are signaling is necessarily parked inside thread_select?  If
> so, I don't understand your surprise: only one thread can ever be
> running, and that is by definition the thread which calls
> thread-signal.  All the other threads cannot be running, which means
> they are parked either in thread_select or in sys_mutex_lock called
> from acquire_global_lock.  Right?

No, they might also be in the sys_thread_yield syscall, having
released the global lock but not yet reacquired it:

  release_global_lock ();
  sys_thread_yield (); <<<<< here
  acquire_global_lock (self);

> As for thread-yield, I'm not sure I understand how is it related to
> the issue we are discussing.

I'm not sure how it's relevant to assert that "that other thread will
_always_ be inside thread_select". I have an idea where you might be
going with that, but that idea wouldn't work (to release the lock from
the signalling thread, not the signalled thread that holds it).

> If the problem with missing events,
> then which events are those, and what bad things will happen if we
> miss them?

All events that glib knows about but Emacs doesn't. For example, a
glib timeout is apparently used to achieve some kind of scroll effect
on GTK menus, which is why we call xg_select from xmenu.c.

I don't know which libraries use glib-based threads, but I think dbus does, too.

I believe, but am not certain, that this also includes X events when
using GTK. That would explain the frozen sessions.



12