bug#40665: 28.0.50; tls hang on local ssl

classic Classic list List threaded Threaded
32 messages Options
12
Reply | Threaded
Open this post in threaded view
|

bug#40665: 28.0.50; tls hang on local ssl

Derek Zhou-2

I have a vps that have ssl cert from let's encrypt. If I run emacs on
that machine and use eww to open the local url through proper DNS name,
it hangs. This only happen with gnutls 3.6+ I believe. w3m works.

recipe:
emacs -q
M-x eww
https://u15789687.ct.sendgrid.net/ls/click?upn=5u6PACFCQRlPqbnHSU4z2cQ21VoDRd3GIiP45VZkVss-3DZL1J_BeaubjMha5v6Ct1PCBdNXS7c-2B4F-2FiYml2FeMaPqPFw6mlPCBFCAdA-2BJhSopciFrXarTStvJH7F2cC8M4I5-2BSIuHrtbPlXdAWJ2PCGsvuTDU6kkpVZ5JAYurFBfbu8uG6PzR-2BqkNWc1EoW8kPIdIY9NymfSyaiKNcH-2F4DHCP9GFBvbUhtpcMZmq3z0U-2FWPsThxiaP-2BlzG2O8BI46NDehudn-2FKUS7g-2FjrNI0N1zBKjqzV6MI7ge8846jUqXY-2FUQ3jE

This only happens when running on mail.3qin.us itself; from across the
network it is ok. I believe it is timing related and worsen by very
short network latency and small files.


In GNU Emacs 28.0.50 (build 1, x86_64-pc-linux-gnu)
 of 2020-04-16 built on mail.3qin.us
Repository revision: d5a7df8c02f04102d50a5cd2290262f59f2b1415
Repository branch: master
System Description: Debian GNU/Linux 9 (stretch)

Recent messages:
For information about GNU Emacs and the GNU system, type C-h C-a.
Contacting host: mail.3qin.us:443
Quit
Making completion list... [2 times]

Configured using:
 'configure --prefix=/usr/local/stow/emacs-git'

Configured features:
SOUND NOTIFY INOTIFY GNUTLS LIBXML2 ZLIB MODULES THREADS PDUMPER GMP

Important settings:
  value of $LANG: en_US.UTF-8
  locale-coding-system: utf-8-unix

Major mode: eww

Minor modes in effect:
  tooltip-mode: t
  global-eldoc-mode: t
  electric-indent-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  buffer-read-only: t
  line-number-mode: t
  transient-mark-mode: t

Load-path shadows:
None found.

Features:
(shadow sort mail-extr emacsbug message dired dired-loaddefs rfc822 mml
mml-sec epa derived epg epg-config mm-decode mm-bodies mm-encode
mailabbrev gmm-utils mailheader sendmail gnutls network-stream url-http
mail-parse rfc2231 url-gw nsm rmc url-cache url-auth regexp-opt eww
easymenu mm-url gnus nnheader gnus-util rmail tool-bar rmail-loaddefs
rfc2047 rfc2045 ietf-drums time-date mail-utils wid-edit mm-util
mail-prsvr thingatpt url-queue url url-proxy url-privacy url-expand
url-methods url-history mailcap shr text-property-search url-cookie
url-domsuf url-util url-parse auth-source cl-seq eieio eieio-core
cl-macs eieio-loaddefs password-cache json subr-x map url-vars puny
image svg xml seq dom browse-url format-spec cl-loaddefs cl-lib
term/xterm xterm byte-opt gv bytecomp byte-compile cconv tooltip eldoc
electric uniquify ediff-hook vc-hooks lisp-float-type tabulated-list
replace newcomment text-mode elisp-mode lisp-mode prog-mode register
page tab-bar menu-bar rfn-eshadow isearch timer select mouse jit-lock
font-lock syntax facemenu font-core term/tty-colors frame minibuffer
cl-generic cham georgian utf-8-lang misc-lang vietnamese tibetan thai
tai-viet lao korean japanese eucjp-ms cp51932 hebrew greek romanian
slovak czech european ethiopic indian cyrillic chinese composite
charscript charprop case-table epa-hook jka-cmpr-hook help simple abbrev
obarray cl-preloaded nadvice loaddefs button faces cus-face macroexp
files text-properties overlay sha1 md5 base64 format env code-pages mule
custom widget hashtable-print-readable backquote threads inotify
multi-tty make-network-process emacs)

Memory information:
((conses 16 80381 6214)
 (symbols 48 9811 1)
 (strings 32 28375 1106)
 (string-bytes 1 862248)
 (vectors 16 12526)
 (vector-slots 8 146194 6312)
 (floats 8 58 333)
 (intervals 56 186 0)
 (buffers 992 13))



Reply | Threaded
Open this post in threaded view
|

bug#40665: 28.0.50; tls hang on local ssl

Robert Pluim
>>>>> On Thu, 16 Apr 2020 14:06:30 +0000 (UTC), Derek Zhou <[hidden email]> said:

    Derek> I have a vps that have ssl cert from let's encrypt. If I run emacs on
    Derek> that machine and use eww to open the local url through proper DNS name,
    Derek> it hangs. This only happen with gnutls 3.6+ I believe. w3m works.

w3m uses OpenSSL rather than GnuTLS, I think.

    Derek> recipe:
    Derek> emacs -q
    Derek> M-x eww
    Derek> https://u15789687.ct.sendgrid.net/ls/click?upn=5u6PACFCQRlPqbnHSU4z2cQ21VoDRd3GIiP45VZkVss-3DZL1J_BeaubjMha5v6Ct1PCBdNXS7c-2B4F-2FiYml2FeMaPqPFw6mlPCBFCAdA-2BJhSopciFrXarTStvJH7F2cC8M4I5-2BSIuHrtbPlXdAWJ2PCGsvuTDU6kkpVZ5JAYurFBfbu8uG6PzR-2BqkNWc1EoW8kPIdIY9NymfSyaiKNcH-2F4DHCP9GFBvbUhtpcMZmq3z0U-2FWPsThxiaP-2BlzG2O8BI46NDehudn-2FKUS7g-2FjrNI0N1zBKjqzV6MI7ge8846jUqXY-2FUQ3jE

    Derek> This only happens when running on mail.3qin.us itself; from across the
    Derek> network it is ok. I believe it is timing related and worsen by very
    Derek> short network latency and small files.

Which version exactly of GnuTLS are you running? Is it possible for
you to do a local install of a newer version and try that with emacs?

Otherwise, maybe turning off TLS1.3 will help:

    (setq gnutls-algorithm-priority "NORMAL:-VERS-TLS1.3")

Another thing to try is setting 'gnutls-log-level' to progressively
higher values, to see if it resolves the timing issues.

Robert



Reply | Threaded
Open this post in threaded view
|

bug#40665: 28.0.50; tls hang on local ssl

Derek Zhou-2

Derek Zhou writes:

> gnutls.c: [1] (Emacs) non-fatal error: Resource temporarily unavailable,
> try again.

I debug a bit; if I undefine GNUTLS_NONBLOCK everything works. However,
I still cannot make the non-blocking operation work without possibility
of hang or pre-mature stop on reading or writing. It is either gnutls
has some problem in non-blocking operation or the handling of it in emacs
code. Will try more tonight.

Derek



Reply | Threaded
Open this post in threaded view
|

bug#40665: 28.0.50; tls hang on local ssl

Derek Zhou-2

Derek Zhou writes:

> Derek Zhou writes:
>
>> gnutls.c: [1] (Emacs) non-fatal error: Resource temporarily unavailable,
>> try again.
>
> I debug a bit; if I undefine GNUTLS_NONBLOCK everything works. However,
> I still cannot make the non-blocking operation work without possibility
> of hang or pre-mature stop on reading or writing. It is either gnutls
> has some problem in non-blocking operation or the handling of it in emacs
> code. Will try more tonight.

I take this back. now I cannot make anything work any more. Very
strange.

Derek




Reply | Threaded
Open this post in threaded view
|

bug#40665: 28.0.50; tls hang on local ssl

Robert Pluim
In reply to this post by Robert Pluim
>>>>> On Thu, 16 Apr 2020 18:22:10 +0000 (UTC), Derek Zhou <[hidden email]> said:
    >> Which version exactly of GnuTLS are you running? Is it possible for
    >> you to do a local install of a newer version and try that with emacs?
    Derek> I am using Debian 10, standard gnutls version
    Derek> ii  libgnutls-openssl27:amd64          3.6.7-4+deb10u2                amd64        GNU TLS library - OpenSSL wrapper
    Derek> ii  libgnutls28-dev:amd64              3.6.7-4+deb10u2                amd64        GNU TLS library - development files
    Derek> I don't want to mess up system libraries; if there is a way to compile a
    Derek> gnutls locally and link emacs with it statically, I can try.

You should be able to install GnuTLS to a local directory and then
use PKG_CONFIG_PATH to point emacs' configure at it. At runtime you'd
need to use LD_LIBRARY_PATH

Robert



Reply | Threaded
Open this post in threaded view
|

bug#40665: 28.0.50; tls hang on local ssl

Derek Zhou-2
In reply to this post by Derek Zhou-2

Derek Zhou writes:

> When this thing happens, the tls handshakes are done properly. However,
> emacs did not write anything into gnutls before starting to read and
> obviously cannot get anything out at all. It is not really a hang, but
> write never happen and the display buffer stays empty.
>
> Derek

Took my nearly the whole day to debug, but this one-line patch fixed my
problem.
My server finishes tls handshake within the gnutls_boot itself, and if the
sentinel is not called right after, it will never be called so write
will not happen. Someone should review this carefully.


diff --git a/src/process.c b/src/process.c
index 91d426103d..6d497ef854 100644
--- a/src/process.c
+++ b/src/process.c
@@ -5937,8 +5937,7 @@ wait_reading_process_output (intmax_t time_limit, int nsecs, int read_kbd,
   /* If we have an incompletely set up TLS connection,
      then defer the sentinel signaling until
      later. */
-  if (NILP (p->gnutls_boot_parameters)
-      && !p->gnutls_p)
+  if (NILP (p->gnutls_boot_parameters))
 #endif
     {
       pset_status (p, Qrun);
Reply | Threaded
Open this post in threaded view
|

bug#40665: 28.0.50; tls hang on local ssl

Derek Zhou-2
In reply to this post by Robert Pluim

Robert Pluim writes:

>>>>>> On Thu, 16 Apr 2020 18:22:10 +0000 (UTC), Derek Zhou <[hidden email]> said:
>     >> Which version exactly of GnuTLS are you running? Is it possible for
>     >> you to do a local install of a newer version and try that with emacs?
>     Derek> I am using Debian 10, standard gnutls version
>     Derek> ii  libgnutls-openssl27:amd64          3.6.7-4+deb10u2                amd64        GNU TLS library - OpenSSL wrapper
>     Derek> ii  libgnutls28-dev:amd64              3.6.7-4+deb10u2                amd64        GNU TLS library - development files
>     Derek> I don't want to mess up system libraries; if there is a way to compile a
>     Derek> gnutls locally and link emacs with it statically, I can try.
>
> You should be able to install GnuTLS to a local directory and then
> use PKG_CONFIG_PATH to point emacs' configure at it. At runtime you'd
> need to use LD_LIBRARY_PATH
Tried some later version of gnutls, does not help.

Derek




Reply | Threaded
Open this post in threaded view
|

bug#40665: 28.0.50; tls hang on local ssl

Robert Pluim
In reply to this post by Derek Zhou-2
>>>>> On Sat, 18 Apr 2020 02:44:05 +0000 (UTC), Derek Zhou <[hidden email]> said:

    Derek> Derek Zhou writes:

    >> When this thing happens, the tls handshakes are done properly. However,
    >> emacs did not write anything into gnutls before starting to read and
    >> obviously cannot get anything out at all. It is not really a hang, but
    >> write never happen and the display buffer stays empty.
    >>
    >> Derek

    Derek> Took my nearly the whole day to debug, but this one-line patch fixed my
    Derek> problem.
    Derek> My server finishes tls handshake within the gnutls_boot itself, and if the
    Derek> sentinel is not called right after, it will never be called so write
    Derek> will not happen. Someone should review this carefully.

    Derek> diff --git a/src/process.c b/src/process.c
    Derek> index 91d426103d..6d497ef854 100644
    Derek> --- a/src/process.c
    Derek> +++ b/src/process.c
    Derek> @@ -5937,8 +5937,7 @@ wait_reading_process_output (intmax_t time_limit, int nsecs, int read_kbd,
    Derek>    /* If we have an incompletely set up TLS connection,
    Derek>       then defer the sentinel signaling until
    Derek>       later. */
    Derek> -  if (NILP (p->gnutls_boot_parameters)
    Derek> -      && !p->gnutls_p)
    Derek> +  if (NILP (p->gnutls_boot_parameters))
    Derek>  #endif
    Derek>      {
    Derek>        pset_status (p, Qrun);

Hereʼs what I think is happening:

The only way for p->gnutls_boot_parameters to become nil is here in
connect_network_socket:

      if (p->gnutls_initstage == GNUTLS_STAGE_READY)
        {
          p->gnutls_boot_parameters = Qnil;
          /* Run sentinels, etc. */
          finish_after_tls_connection (proc);
        }

and finish_after_tls_connection should call the sentinel, but
NON_BLOCKING_CONNECT_FD is still set, so it doesnʼt.

The next chance to call the sentinel would be from
wait_reading_process_output, but only if handshaking has been tried
and not completed, except it is complete already.

wait_reading_process_output then calls delete_write_fd, which clears
NON_BLOCKING_CONNECT_FD, and doesnʼt run the sentinel because
p->gnutls_boot_parameters is nil and p->gnutls_p is true

finish_after_tls_connection never gets another chance to run, since
the socket is connected and handshaking is complete.

After your change, you've fixed this case:

    if p->gnutls_boot_parameters is nil, that means the handshake
    completed already and the TLS connection is up, so
    calling the sentinel is ok.

In other cases where the handshake does not complete straight away in
Fgnutls_boot, it will complete here in wait_reading_process_output

                /* Continue TLS negotiation. */
                if (p->gnutls_initstage == GNUTLS_STAGE_HANDSHAKE_TRIED
                    && p->is_non_blocking_client)
                  {
                    gnutls_try_handshake (p);
                    p->gnutls_handshakes_tried++;

                    if (p->gnutls_initstage == GNUTLS_STAGE_READY)
                      {
                        gnutls_verify_boot (aproc, Qnil);
                        finish_after_tls_connection (aproc);
                      }

which always happens after delete_write_fd has been called, which
clears NON_BLOCKING_CONNECT_FD, so finish_after_tls_connection calls
the sentinel.

One change we could make is to set p->gnutls_boot_parameters to nil
here, so that in the sequence

    Fgnutls_boot, handshake does not complete
    handshake succeeds first time in wait_reading_process_output
    delete_write_fd then checks p->gnutls_boot_parameters

the sentinel ends up getting run, but Iʼve not seen the handshake ever
succeed straight away before the delete_write_fd, and if it ever has
in the wild we would have seen bug reports (and this is dragon-filled
code, so I donʼt want to make changes to it if I can help it :-))

In short: I think the change is ok. It passes the network-stream
tests, so Iʼll run with it for a while, and push it in a week or so.

Robert



Reply | Threaded
Open this post in threaded view
|

bug#40665: 28.0.50; tls hang on local ssl

Derek Zhou-2

Robert Pluim writes:

> One change we could make is to set p->gnutls_boot_parameters to nil
> here, so that in the sequence
>
>     Fgnutls_boot, handshake does not complete
>     handshake succeeds first time in wait_reading_process_output
>     delete_write_fd then checks p->gnutls_boot_parameters
>
> the sentinel ends up getting run, but Iʼve not seen the handshake ever
> succeed straight away before the delete_write_fd, and if it ever has
> in the wild we would have seen bug reports (and this is dragon-filled
> code, so I donʼt want to make changes to it if I can help it :-))
>
> In short: I think the change is ok. It passes the network-stream
> tests, so Iʼll run with it for a while, and push it in a week or so.

Will be glad to test anything on this issue. Just let me know
if you pushed any change.

Derek




Reply | Threaded
Open this post in threaded view
|

bug#40665: 28.0.50; tls hang on local ssl

Robert Pluim
>>>>> On Sun, 19 Apr 2020 15:34:01 +0000 (UTC), Derek Zhou <[hidden email]> said:

    Derek> Robert Pluim writes:

    >> One change we could make is to set p->gnutls_boot_parameters to nil
    >> here, so that in the sequence
    >>
    >> Fgnutls_boot, handshake does not complete
    >> handshake succeeds first time in wait_reading_process_output
    >> delete_write_fd then checks p->gnutls_boot_parameters
    >>
    >> the sentinel ends up getting run, but Iʼve not seen the handshake ever
    >> succeed straight away before the delete_write_fd, and if it ever has
    >> in the wild we would have seen bug reports (and this is dragon-filled
    >> code, so I donʼt want to make changes to it if I can help it :-))
    >>
    >> In short: I think the change is ok. It passes the network-stream
    >> tests, so Iʼll run with it for a while, and push it in a week or so.

    Derek> Will be glad to test anything on this issue. Just let me know
    Derek> if you pushed any change.

I assume youʼre running with the change locally. If you continue to do
so, and donʼt see any issues, then that raises my confidence in the
change being correct.

Iʼll update this bug when I push a change.

Thanks

Robert



Reply | Threaded
Open this post in threaded view
|

bug#40665: 28.0.50; tls hang on local ssl

Derek Zhou-2

Derek Zhou writes:

> Robert Pluim writes:
>
>> I assume youʼre running with the change locally. If you continue to do
>> so, and donʼt see any issues, then that raises my confidence in the
>> change being correct.
>>
>> Iʼll update this bug when I push a change.
> Hold on. This break a lot of other https sites. I'll debug more, if
> someone is more familiar with this case please also look into this.

False alarm. I was backporting the patch to 26.3, which does not have
the other change:

commit b34c6d2c9694ec300b92129dbf88fe012837dfe2
Author: Robert Pluim <[hidden email]>
Date:   Mon Jul 15 13:28:25 2019 +0200

    Don't delete GnuTLS boot parameters too early
   
    * src/process.c (connect_network_socket): Don't delete the GnuTLS
    boot parameters until after we've managed to connect at the IP
    level (bug#36660).


So obviously this won't work.
Right now I am running 26.3 + this + my patch, everything seems ok for
now.

Derek



Reply | Threaded
Open this post in threaded view
|

bug#40665: 28.0.50; tls hang on local ssl

Robert Pluim
>>>>> On Tue, 21 Apr 2020 01:41:21 +0000 (UTC), Derek Zhou <[hidden email]> said:

    Derek> Derek Zhou writes:

    >> Robert Pluim writes:
    >>
    >>> I assume youʼre running with the change locally. If you continue to do
    >>> so, and donʼt see any issues, then that raises my confidence in the
    >>> change being correct.
    >>>
    >>> Iʼll update this bug when I push a change.
    >> Hold on. This break a lot of other https sites. I'll debug more, if
    >> someone is more familiar with this case please also look into this.

    Derek> False alarm. I was backporting the patch to 26.3, which does not have
    Derek> the other change:

See my earlier note about dragons in the code :-)

    Derek> commit b34c6d2c9694ec300b92129dbf88fe012837dfe2
    Derek> Author: Robert Pluim <[hidden email]>
    Derek> Date:   Mon Jul 15 13:28:25 2019 +0200

    Derek>     Don't delete GnuTLS boot parameters too early
   
    Derek>     * src/process.c (connect_network_socket): Don't delete the GnuTLS
    Derek>     boot parameters until after we've managed to connect at the IP
    Derek>     level (bug#36660).


    Derek> So obviously this won't work.

Right, you'll end up calling the sentinel before handshaking has
completed.

    Derek> Right now I am running 26.3 + this + my patch, everything seems ok for
    Derek> now.

And itʼs fine so far for me here as well in emacs-27.

Robert



Reply | Threaded
Open this post in threaded view
|

bug#40665: 28.0.50; tls hang on local ssl

Derek Zhou-2

Robert Pluim writes:

>>>>>> On Tue, 21 Apr 2020 01:41:21 +0000 (UTC), Derek Zhou <[hidden email]> said:
>
>     Derek> Derek Zhou writes:
>
>     >> Robert Pluim writes:
>     >>
>     >>> I assume youʼre running with the change locally. If you continue to do
>     >>> so, and donʼt see any issues, then that raises my confidence in the
>     >>> change being correct.
>     >>>
>     >>> Iʼll update this bug when I push a change.
>     >> Hold on. This break a lot of other https sites. I'll debug more, if
>     >> someone is more familiar with this case please also look into this.
>
>     Derek> False alarm. I was backporting the patch to 26.3, which does not have
>     Derek> the other change:
>
> See my earlier note about dragons in the code :-)

I still see some https site stalled with no content displayed. However,
as soon as I type something on the keyboard, content would come in;
Strongly suggested that there are some problem in the emacs process
layer. Will dig deeper.

Derek



Reply | Threaded
Open this post in threaded view
|

bug#40665: 28.0.50; tls hang on local ssl

Robert Pluim
>>>>> On Tue, 21 Apr 2020 13:37:04 +0000 (UTC), Derek Zhou <[hidden email]> said:

    Derek> Robert Pluim writes:

    >>>>>>> On Tue, 21 Apr 2020 01:41:21 +0000 (UTC), Derek Zhou <[hidden email]> said:
    >>
    Derek> Derek Zhou writes:
    >>
    >> >> Robert Pluim writes:
    >> >>
    >> >>> I assume youʼre running with the change locally. If you continue to do
    >> >>> so, and donʼt see any issues, then that raises my confidence in the
    >> >>> change being correct.
    >> >>>
    >> >>> Iʼll update this bug when I push a change.
    >> >> Hold on. This break a lot of other https sites. I'll debug more, if
    >> >> someone is more familiar with this case please also look into this.
    >>
    Derek> False alarm. I was backporting the patch to 26.3, which does not have
    Derek> the other change:
    >>
    >> See my earlier note about dragons in the code :-)

    Derek> I still see some https site stalled with no content displayed. However,
    Derek> as soon as I type something on the keyboard, content would come in;
    Derek> Strongly suggested that there are some problem in the emacs process
    Derek> layer. Will dig deeper.

With emacs-26 or emacs-27? The GnuTLS code between the two is quite
different, mainly things like:

    commit e87e6a24c49542111e669b7d0f1a412024663f8e
    Author: Paul Eggert <[hidden email]>
    Date:   Tue Jan 15 23:51:45 2019 -0800

        Fix unlikely races with GnuTLS, datagrams

which despite the comment are perhaps not that unlikely.

Robert



Reply | Threaded
Open this post in threaded view
|

bug#40665: 28.0.50; tls hang on local ssl

Derek Zhou-2

Robert Pluim writes:

> With emacs-26 or emacs-27? The GnuTLS code between the two is quite
> different, mainly things like:
>
>     commit e87e6a24c49542111e669b7d0f1a412024663f8e
>     Author: Paul Eggert <[hidden email]>
>     Date:   Tue Jan 15 23:51:45 2019 -0800
>
>         Fix unlikely races with GnuTLS, datagrams
>
> which despite the comment are perhaps not that unlikely.

With emacs-26. maybe I should just use emacs-27 from now on.
emacs-27 don't have this indefinite stall, however, it still stalls for
a few seconds. The network is not that slow; it feels like something
wasn't happening in time and was only being triggered later by some
fallback mechanism.

Derek



Reply | Threaded
Open this post in threaded view
|

bug#40665: 28.0.50; tls hang on local ssl

Derek Zhou-2

Derek Zhou writes:
>
> With emacs-26. maybe I should just use emacs-27 from now on.
> emacs-27 don't have this indefinite stall, however, it still stalls for
> a few seconds. The network is not that slow; it feels like something
> wasn't happening in time and was only being triggered later by some
> fallback mechanism.
>
Robert,

One of the website that has strange stall is https://www.opensource.com
It loads in sub-second in w3m, curl or wget. However, I have
multi-second stall in emacs-27. Can you repo?

Derek




Reply | Threaded
Open this post in threaded view
|

bug#40665: 28.0.50; tls hang on local ssl

Derek Zhou-2
In reply to this post by Derek Zhou-2

Derek Zhou writes:

> Robert Pluim writes:
>
>> Thatʼs always possible. You'd have to stick some instrumentation in
>> things like connect_network_socket and wait_reading_process_output to
>> narrow it down.
>>
> I think what happened is gnutls's internal buffering exhausts the
> available data from the socket, so select blocks. There is an comment in
> the code said gnutls leave one byte in the socket for select, but I
> don't think it is doing this anymore. The following patch move the check
> before the select and skip the select if there is data to be read in
> gnutls. It fixed the occational stall in https.
Sorry, the previous patch was wrong. Cannot reuse the fd_set intended
for select. Corrected:


diff --git a/src/process.c b/src/process.c
index 91d426103d..49b034b1a7 100644
--- a/src/process.c
+++ b/src/process.c
@@ -5566,7 +5566,38 @@ wait_reading_process_output (intmax_t time_limit, int nsecs, int read_kbd,
     }
 #endif
 
-/* Non-macOS HAVE_GLIB builds call thread_select in xgselect.c.  */
+#ifdef HAVE_GNUTLS
+          /* GnuTLS buffers data internally. We need to check if some
+             data is available in the buffers manually before the select.
+     And if so, we need to skip the select which could block */
+  {
+    fd_set tls_available;
+    FD_ZERO (&tls_available);
+    nfds = 0;
+    for (channel = 0; channel < FD_SETSIZE; ++channel)
+      if (! NILP (chan_process[channel]))
+ {
+  struct Lisp_Process *p =
+    XPROCESS (chan_process[channel]);
+  if (p && p->gnutls_p && p->gnutls_state
+      && ((emacs_gnutls_record_check_pending
+   (p->gnutls_state))
+  > 0))
+    {
+      nfds++;
+      eassert (p->infd == channel);
+      FD_SET (p->infd, &tls_available);
+    }
+ }
+  /* don't select if we have something here */
+    if (nfds > 0) {
+      Available = tls_available;
+      goto SELECT_END;
+    }
+  }
+#endif
+
+  /* Non-macOS HAVE_GLIB builds call thread_select in xgselect.c.  */
 #if defined HAVE_GLIB && !defined HAVE_NS
   nfds = xg_select (max_desc + 1,
     &Available, (check_write ? &Writeok : 0),
@@ -5582,65 +5613,9 @@ wait_reading_process_output (intmax_t time_limit, int nsecs, int read_kbd,
  (check_write ? &Writeok : 0),
  NULL, &timeout, NULL);
 #endif /* !HAVE_GLIB */
-
-#ifdef HAVE_GNUTLS
-          /* GnuTLS buffers data internally.  In lowat mode it leaves
-             some data in the TCP buffers so that select works, but
-             with custom pull/push functions we need to check if some
-             data is available in the buffers manually.  */
-          if (nfds == 0)
-    {
-      fd_set tls_available;
-      int set = 0;
-
-      FD_ZERO (&tls_available);
-      if (! wait_proc)
- {
-  /* We're not waiting on a specific process, so loop
-     through all the channels and check for data.
-     This is a workaround needed for some versions of
-     the gnutls library -- 2.12.14 has been confirmed
-     to need it.  */
-  for (channel = 0; channel < FD_SETSIZE; ++channel)
-    if (! NILP (chan_process[channel]))
-      {
- struct Lisp_Process *p =
-  XPROCESS (chan_process[channel]);
- if (p && p->gnutls_p && p->gnutls_state
-    && ((emacs_gnutls_record_check_pending
- (p->gnutls_state))
- > 0))
-  {
-    nfds++;
-    eassert (p->infd == channel);
-    FD_SET (p->infd, &tls_available);
-    set++;
-  }
-      }
- }
-      else
- {
-  /* Check this specific channel.  */
-  if (wait_proc->gnutls_p /* Check for valid process.  */
-      && wait_proc->gnutls_state
-      /* Do we have pending data?  */
-      && ((emacs_gnutls_record_check_pending
-   (wait_proc->gnutls_state))
-  > 0))
-    {
-      nfds = 1;
-      eassert (0 <= wait_proc->infd);
-      /* Set to Available.  */
-      FD_SET (wait_proc->infd, &tls_available);
-      set++;
-    }
- }
-      if (set)
- Available = tls_available;
-    }
-#endif
  }
 
+    SELECT_END:
       xerrno = errno;
 
       /* Make C-g and alarm signals set flags again.  */
Reply | Threaded
Open this post in threaded view
|

bug#40665: 28.0.50; tls hang on local ssl

Robert Pluim
>>>>> On Tue, 21 Apr 2020 22:29:30 +0000 (UTC), Derek Zhou <[hidden email]> said:

    Derek> Version 3, add safty guard for tls read detection. Please use this one.
    Derek> You may want to review this carefully.

I haven't tested it yet, but what about wait_proc?

Robert



Reply | Threaded
Open this post in threaded view
|

bug#40665: 28.0.50; tls hang on local ssl

Derek Zhou-2

Robert Pluim writes:

>>>>>> On Tue, 21 Apr 2020 22:29:30 +0000 (UTC), Derek Zhou <[hidden email]> said:
>
>     Derek> Version 3, add safty guard for tls read detection. Please use this one.
>     Derek> You may want to review this carefully.
>
> I haven't tested it yet, but what about wait_proc?
>
I don't quite understand the wait_proc business. The idea of the patch
is to detect that out of all the fds that are going to be selected, how
many are gnutls managed and are ready from the gnutls buffer? If the
answer is positive, we skip the select and pretend the select return
those fds only. I think this is safe; because it is one of the possible
and legal return of the select, wait_proc or not.

Another way is to still do a zero timeout select, and merge the gnutls
ready set with the select ready set. It is more intrusive but probably
closer to the original intent of the code. I can write the path that way
if you want.

Derek



Reply | Threaded
Open this post in threaded view
|

bug#40665: 28.0.50; tls hang on local ssl

Robert Pluim
>>>>> On Wed, 22 Apr 2020 12:25:59 +0000 (UTC), Derek Zhou <[hidden email]> said:

    Derek> Robert Pluim writes:

    >>>>>>> On Tue, 21 Apr 2020 22:29:30 +0000 (UTC), Derek Zhou <[hidden email]> said:
    >>
    Derek> Version 3, add safty guard for tls read detection. Please use this one.
    Derek> You may want to review this carefully.
    >>
    >> I haven't tested it yet, but what about wait_proc?
    >>

Iʼve tested it lightly and it fixes <https://www.opensource.com>

    Derek> I don't quite understand the wait_proc business. The idea of the patch
    Derek> is to detect that out of all the fds that are going to be selected, how
    Derek> many are gnutls managed and are ready from the gnutls buffer? If the
    Derek> answer is positive, we skip the select and pretend the select return
    Derek> those fds only. I think this is safe; because it is one of the possible
    Derek> and legal return of the select, wait_proc or not.

The reason for checking wait_proc is to allow 'accept-process-output'
to specify that emacs should return only when there is data for that
specific process, with your patch it can return if there is any data
in the TLS buffers for any connection, but none for wait_proc. That
would make 'accept-process-output' return earlier than expected, or
even return for the case where the timeout is infinite.

A quick survey of the emacs sources shows almost every call to
'accept-process-output' passes in wait_proc, so I think that your
change as it stands is too risky. With a check for wait_proc it might
be OK.

    Derek> Another way is to still do a zero timeout select, and merge the gnutls
    Derek> ready set with the select ready set. It is more intrusive but probably
    Derek> closer to the original intent of the code. I can write the path that way
    Derek> if you want.

I donʼt think we always do a zero timeout select. This sounds even
riskier.

Robert



12