bug#31901: Incorrect make-network-process + nowait state handling for non-existing unix sockets in emacs-26.1

classic Classic list List threaded Threaded
19 messages Options
Reply | Threaded
Open this post in threaded view
|

bug#31901: Incorrect make-network-process + nowait state handling for non-existing unix sockets in emacs-26.1

Mike Kazantsev-2

Description:

Running (make-network-process ... :family 'local :nowait t) with
non-exising socket hangs forever in "open" state without ever calling
sentinel function.

100% reproducible on emacs-26.1 in current Archlinux OS.


==== How to reproduce:

Run following s-expressions from lisp-interaction-mode buffer in the
same sequence:

  (defun unix-socket-test-func (proc ev)
    (message "unix-socket-test: %s %s" proc ev))

  (defvar socket)
  (setq socket
    (make-network-process
      :name "unix-socket-test"
      :family 'local
      :service "/tmp/does-not-exist.sock"
      :nowait t
      :coding '(utf-8 . utf-8)
      :buffer (get-buffer-create "some-ipc-buffer")
      :noquery t
      :filter #'unix-socket-test-func
      :sentinel #'unix-socket-test-func))

  (process-live-p socket)
  (process-status socket)
  (delete-process socket)

Note - assuming that /tmp/does-not-exist.sock does not exist on the
filesystem.


==== Expected result:

One or more of:

- Message "unix-socket-test: failed" or some similar failure is
  reported by the sentinel function.

- Maybe "open" then "failed" states reported, in case socket() call is
  treated as 'open before connect() is issued.

- "socket" variable is set to nil, as it was with ":nowait t" before
  emacs-26, due to such simple and instant error.

- (process-live-p socket) reports nil when run few seconds after
  make-network-process call.

- (process-status socket) reports some kind of failed state, not 'open.

None of this happens.


==== Actual result:

- socket process is created and stuck forever in 'open state.

- strace on emacs process (monitoring syscalls) show this syscall
  sequence happening immediately when make-network-process call is made:

    [pid 31590] socket(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 21
    [pid 31590] connect(21, {sa_family=AF_UNIX, sun_path="/tmp/does-not-exist.sock"}, 110) = -1 ENOENT (No such file or directory)
    [pid 31590] close(21)                   = 0

- Function passed as :sentinel to make-network-process is never called
  with any state - neither 'open nor 'failed.

- (process-live-p socket) returns non-nil result for process associated
  with socket that is actually closed, and was closed immediately on
  connect, without any indication of that in elisp code.

- (process-status socket) returns 'open, even though it never sent
  'open status to sentinel function, nor is socket actually open at
  that point. It never succeeded at connecting to anything, in fact.

Same thing happens with socket that nothing is listening on, except
strace there gives different error on connect():

  [pid 31590] socket(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 21
  [pid 31590] connect(21, {sa_family=AF_UNIX, sun_path="/tmp/does-not-exist.sock"}, 110) = -1 ECONNREFUSED (Connection refused)
  [pid 31590] close(21)

This makes using unix sockets with emacs correctly and efficiently very
hard, if not impossible, as it there's no way to tell if it is stuck
connecting or failed except setting some kind of timeout on sentinel
response.

As far as I can tell, this is not documented behavior, definitely
unexpected, breaks all old code that used unix sockets, and probably
unintentional, i.e. a bug.


Apart from mentioned Archlinux setup, symptoms of the same issue were
also reported on EMMS (emacs multimedia system) mailing list with
emacs-26 recently on Ubuntu 18.10 pre-release version.

Have tried to find exising reports for such - rather basic, it seems -
problem, but haven't found any, so not entirely sure if maybe I'm doing
something wrong here.


Thanks for reading this far and any assistance.


--
Mike Kazantsev // fraggod.net



Reply | Threaded
Open this post in threaded view
|

bug#31901: Incorrect make-network-process + nowait state handling for non-existing unix sockets in emacs-26.1

Noam Postavsky
tags 31901 + confirmed
quit

Mike Kazantsev <[hidden email]> writes:

> As far as I can tell, this is not documented behavior, definitely
> unexpected, breaks all old code that used unix sockets, and probably
> unintentional, i.e. a bug.

It seems to be due to this is_non_blocking_client check in
connect_network_socket:

  if (s < 0)
    {
      /* If non-blocking got this far - and failed - assume non-blocking is
         not supported after all.  This is probably a wrong assumption, but
         the normal blocking calls to open-network-stream handles this error
         better.  */
      if (p->is_non_blocking_client)
        return;

      report_file_errno ((p->is_server
                          ? "make server process failed"
                          : "make client process failed"),
                         contact, xerrno);
    }

In Emacs 25, this check was directly in Fmake_network_process before the
process object creation code, so we would just return nil in that case.
Seems to have changed, I assume accidentally, in [1: e09c0972c3].

[1: e09c0972c3]: 2016-01-28 23:50:47 +0100
  Refactor make_network_process
  https://git.savannah.gnu.org/cgit/emacs.git/commit/?id=e09c0972c350e9411683b509414fc598cbf387d3



Reply | Threaded
Open this post in threaded view
|

bug#31901: Incorrect make-network-process + nowait state handling for non-existing unix sockets in emacs-26.1

Eli Zaretskii
Ping!

Lars, could you please look into this?

> From: Noam Postavsky <[hidden email]>
> Date: Sun, 01 Jul 2018 11:21:06 -0400
> Cc: Lars Ingebrigtsen <[hidden email]>, [hidden email]
>
> tags 31901 + confirmed
> quit
>
> Mike Kazantsev <[hidden email]> writes:
>
> > As far as I can tell, this is not documented behavior, definitely
> > unexpected, breaks all old code that used unix sockets, and probably
> > unintentional, i.e. a bug.
>
> It seems to be due to this is_non_blocking_client check in
> connect_network_socket:
>
>   if (s < 0)
>     {
>       /* If non-blocking got this far - and failed - assume non-blocking is
>          not supported after all.  This is probably a wrong assumption, but
>          the normal blocking calls to open-network-stream handles this error
>          better.  */
>       if (p->is_non_blocking_client)
>         return;
>
>       report_file_errno ((p->is_server
>                           ? "make server process failed"
>                           : "make client process failed"),
>                          contact, xerrno);
>     }
>
> In Emacs 25, this check was directly in Fmake_network_process before the
> process object creation code, so we would just return nil in that case.
> Seems to have changed, I assume accidentally, in [1: e09c0972c3].
>
> [1: e09c0972c3]: 2016-01-28 23:50:47 +0100
>   Refactor make_network_process
>   https://git.savannah.gnu.org/cgit/emacs.git/commit/?id=e09c0972c350e9411683b509414fc598cbf387d3
>
>
>
>



Reply | Threaded
Open this post in threaded view
|

bug#31901: Incorrect make-network-process + nowait state handling for non-existing unix sockets in emacs-26.1

Eli Zaretskii
Ping!  Ping!

> Date: Fri, 06 Jul 2018 11:59:35 +0300
> From: Eli Zaretskii <[hidden email]>
> Cc: [hidden email], [hidden email]
>
> Ping!
>
> Lars, could you please look into this?
>
> > From: Noam Postavsky <[hidden email]>
> > Date: Sun, 01 Jul 2018 11:21:06 -0400
> > Cc: Lars Ingebrigtsen <[hidden email]>, [hidden email]
> >
> > tags 31901 + confirmed
> > quit
> >
> > Mike Kazantsev <[hidden email]> writes:
> >
> > > As far as I can tell, this is not documented behavior, definitely
> > > unexpected, breaks all old code that used unix sockets, and probably
> > > unintentional, i.e. a bug.
> >
> > It seems to be due to this is_non_blocking_client check in
> > connect_network_socket:
> >
> >   if (s < 0)
> >     {
> >       /* If non-blocking got this far - and failed - assume non-blocking is
> >          not supported after all.  This is probably a wrong assumption, but
> >          the normal blocking calls to open-network-stream handles this error
> >          better.  */
> >       if (p->is_non_blocking_client)
> >         return;
> >
> >       report_file_errno ((p->is_server
> >                           ? "make server process failed"
> >                           : "make client process failed"),
> >                          contact, xerrno);
> >     }
> >
> > In Emacs 25, this check was directly in Fmake_network_process before the
> > process object creation code, so we would just return nil in that case.
> > Seems to have changed, I assume accidentally, in [1: e09c0972c3].
> >
> > [1: e09c0972c3]: 2016-01-28 23:50:47 +0100
> >   Refactor make_network_process
> >   https://git.savannah.gnu.org/cgit/emacs.git/commit/?id=e09c0972c350e9411683b509414fc598cbf387d3
> >
> >
> >
> >
>
>
>
>



Reply | Threaded
Open this post in threaded view
|

bug#31901: Incorrect make-network-process + nowait state handling for non-existing unix sockets in emacs-26.1

Eli Zaretskii
> Date: Sat, 14 Jul 2018 11:20:57 +0300
> From: Eli Zaretskii <[hidden email]>
> Cc: [hidden email], [hidden email], [hidden email]
>
> Ping!  Ping!

Still no response.

Noam, would you like to propose a fix for this?



Reply | Threaded
Open this post in threaded view
|

bug#31901: Incorrect make-network-process + nowait state handling for non-existing unix sockets in emacs-26.1

Lars Ingebrigtsen
Eli Zaretskii <[hidden email]> writes:

> Still no response.

Sorry for the delays; I should have time on Sunday to go through the
emacs mail (and also the netsec branch, hopefully).

--
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no



Reply | Threaded
Open this post in threaded view
|

bug#31901: Incorrect make-network-process + nowait state handling for non-existing unix sockets in emacs-26.1

Eli Zaretskii
> From: Lars Ingebrigtsen <[hidden email]>
> Cc: [hidden email],  [hidden email],  [hidden email]
> Date: Fri, 20 Jul 2018 12:08:33 +0200
>
> Eli Zaretskii <[hidden email]> writes:
>
> > Still no response.
>
> Sorry for the delays; I should have time on Sunday to go through the
> emacs mail (and also the netsec branch, hopefully).

Great, thanks in advance.



Reply | Threaded
Open this post in threaded view
|

bug#31901: Incorrect make-network-process + nowait state handling for non-existing unix sockets in emacs-26.1

Lars Ingebrigtsen
OK, I've now fixed the async error case (on master), I think:

  (process-live-p socket)
  nil
  (process-status socket)
  file-missing

I've fixed this on master, because I'm not 100% sure that my fix is
completely correct.  Should we perhaps also call a sentinel in the async
error case?

But we should, in any case, perhaps cherry-pick this fix into Emacs
26.2.

--
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




Reply | Threaded
Open this post in threaded view
|

bug#31901: Incorrect make-network-process + nowait state handling for non-existing unix sockets in emacs-26.1

Eli Zaretskii
> From: Lars Ingebrigtsen <[hidden email]>
> Cc: [hidden email],  [hidden email],  [hidden email]
> Date: Sun, 22 Jul 2018 13:40:51 +0200
>
> OK, I've now fixed the async error case (on master), I think:
>
>   (process-live-p socket)
>   nil
>   (process-status socket)
>   file-missing

Thanks!

> I've fixed this on master, because I'm not 100% sure that my fix is
> completely correct.  Should we perhaps also call a sentinel in the async
> error case?

But we didn't call the sentinel before, did we?

> But we should, in any case, perhaps cherry-pick this fix into Emacs
> 26.2.

It looks like the only visible change in behavior is that we now store
the error in the process object, is that right?  If so, I think we
want this on emacs-26, yes.



Reply | Threaded
Open this post in threaded view
|

bug#31901: Incorrect make-network-process + nowait state handling for non-existing unix sockets in emacs-26.1

Lars Ingebrigtsen
Eli Zaretskii <[hidden email]> writes:

>> I've fixed this on master, because I'm not 100% sure that my fix is
>> completely correct.  Should we perhaps also call a sentinel in the async
>> error case?
>
> But we didn't call the sentinel before, did we?

I don't think we did, but I'm not completely sure.  I tried following
the pre-Emacs 26 code flow, and I didn't see a call, but I may be
mistaken.

>> But we should, in any case, perhaps cherry-pick this fix into Emacs
>> 26.2.
>
> It looks like the only visible change in behavior is that we now store
> the error in the process object, is that right?

Yup.  That is, it should be unless I messed something up in the
refactoring of report_file_errno.  :-)

> If so, I think we want this on emacs-26, yes.

Yup.

--
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no



Reply | Threaded
Open this post in threaded view
|

bug#31901: Incorrect make-network-process + nowait state handling for non-existing unix sockets in emacs-26.1

Mike Kazantsev-2
In reply to this post by Lars Ingebrigtsen
On Sun, 22 Jul 2018 13:40:51 +0200
Lars Ingebrigtsen <[hidden email]> wrote:

> OK, I've now fixed the async error case (on master), I think:
>
>   (process-live-p socket)
>   nil
>   (process-status socket)
>   file-missing
>
> I've fixed this on master, because I'm not 100% sure that my fix is
> completely correct.  Should we perhaps also call a sentinel in the async
> error case?

Unless I misunderstood the question, for me as a user, it would be more
obvious to get this error in a sentinel call than not with emacs 26,
because that's how :nowait is documented to work now:

  The recommended way to deal with asynchronous sockets is to avoid
  interacting with them until they have changed status to "run".
  This is most easily done from a process sentinel.

https://github.com/emacs-mirror/emacs/blob/master/etc/NEWS.26

  If bool is non-nil for a stream connection, return without waiting
  for the connection to complete. When the connection succeeds or
  fails, Emacs will call the sentinel function, with a second argument
  matching "open" (if successful) or "failed".

https://www.gnu.org/software/emacs/manual/html_node/elisp/Network-Processes.html 

Given either of these guidelines, it'd be more surprising to not get
error state in the sentinel call with ":nowait t" in emacs-26.

(while ":nowait nil" signaling 'file-error is kinda expected)


--
Mike Kazantsev // fraggod.net



Reply | Threaded
Open this post in threaded view
|

bug#31901: Incorrect make-network-process + nowait state handling for non-existing unix sockets in emacs-26.1

Lars Ingebrigtsen
Mike Kazantsev <[hidden email]> writes:

> Given either of these guidelines, it'd be more surprising to not get
> error state in the sentinel call with ":nowait t" in emacs-26.

I agree, it would be better to get a sentinel callback.  But I don't
think that's how it's ever worked, so that's not a regression.  But it
could be a new feature for Emacs 27...

--
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no



Reply | Threaded
Open this post in threaded view
|

bug#31901: Incorrect make-network-process + nowait state handling for non-existing unix sockets in emacs-26.1

Mike Kazantsev-2
On Sun, 22 Jul 2018 17:23:45 +0200
Lars Ingebrigtsen <[hidden email]> wrote:

> Mike Kazantsev <[hidden email]> writes:
>
> > Given either of these guidelines, it'd be more surprising to not get
> > error state in the sentinel call with ":nowait t" in emacs-26.  
>
> I agree, it would be better to get a sentinel callback.  But I don't
> think that's how it's ever worked, so that's not a regression.  But it
> could be a new feature for Emacs 27...

Yeah, guess so.

Though I thought how it works was supposed to change in emacs-26
already (as indicated by NEWS/documentation), which is why reverting to
emacs-25 behavior looks like a bug in itself.

But of course happy with it working either way, thanks for fixing it!


--
Mike Kazantsev // fraggod.net



Reply | Threaded
Open this post in threaded view
|

bug#31901: Incorrect make-network-process + nowait state handling for non-existing unix sockets in emacs-26.1

Eli Zaretskii
In reply to this post by Lars Ingebrigtsen
> From: Lars Ingebrigtsen <[hidden email]>
> Cc: Eli Zaretskii <[hidden email]>,  [hidden email],  [hidden email]
> Date: Sun, 22 Jul 2018 17:23:45 +0200
>
> Mike Kazantsev <[hidden email]> writes:
>
> > Given either of these guidelines, it'd be more surprising to not get
> > error state in the sentinel call with ":nowait t" in emacs-26.
>
> I agree, it would be better to get a sentinel callback.  But I don't
> think that's how it's ever worked, so that's not a regression.  But it
> could be a new feature for Emacs 27...

Yes, I think we should just fix the bug on emacs-26, and introduce the
sentinel callback on master.

Thanks.



Reply | Threaded
Open this post in threaded view
|

bug#31901: Incorrect make-network-process + nowait state handling for non-existing unix sockets in emacs-26.1

Mike Kazantsev-2
On Sun, 22 Jul 2018 18:56:31 +0300
Eli Zaretskii <[hidden email]> wrote:

> > From: Lars Ingebrigtsen <[hidden email]>
> > Cc: Eli Zaretskii <[hidden email]>,  [hidden email],  [hidden email]
> > Date: Sun, 22 Jul 2018 17:23:45 +0200
> >
> > Mike Kazantsev <[hidden email]> writes:
> >  
> > > Given either of these guidelines, it'd be more surprising to not get
> > > error state in the sentinel call with ":nowait t" in emacs-26.  
> >
> > I agree, it would be better to get a sentinel callback.  But I don't
> > think that's how it's ever worked, so that's not a regression.  But it
> > could be a new feature for Emacs 27...  
>
> Yes, I think we should just fix the bug on emacs-26, and introduce the
> sentinel callback on master.

I just built emacs from git to test how it works, and have to report
that unfortunately current behavior is not consistent with what
emacs had either before this bug was introduced.

More details follow.


== How it worked in emacs-25:

  (make-network-process
    ...
    :family 'local
    :service "/tmp/does-not-exist.sock"
    :nowait t
    :sentinel #'unix-socket-test-func)

  => nil
  [sentinel never called]

  (process-live-p socket) => nil (because "socket" will be nil)
  (process-status socket) => error

I.e. nil was returned to indicate failure to create network process.


== How it works in emacs-26.1 (with the bug in question):

  (make-network-process
    ...
    :family 'local
    :service "/tmp/does-not-exist.sock"
    :nowait t
    :sentinel #'unix-socket-test-func)

  => #<process unix-socket-test>
  [sentinel never called]

  (process-live-p socket) => [non-nil value]
  (process-status socket) => 'open

This is the case described in the original bug report.


== How it works in emacs-26.1 with ":nowail nil":

  (make-network-process
    ...
    :family 'local
    :service "/tmp/does-not-exist.sock"
    :nowait nil
    :sentinel #'unix-socket-test-func)

  => [file-missing signaled]
  [sentinel never called]

  (process-live-p socket) => [no socket value]
  (process-status socket) => [no socket value]

Provided here to illustrate the difference with current ":nowait t"
case.


== How it works in current emacs-git [8f3bca3ad51]:

  (make-network-process
    ...
    :family 'local
    :service "/tmp/does-not-exist.sock"
    :nowait t
    :sentinel #'unix-socket-test-func)

  => #<process unix-socket-test>
  [sentinel never called]

  (process-live-p socket) => nil
  (process-status socket) => 'file-missing

Note that this is:

- Not consistent with what emacs did before this bug.

- Not consistent with what emacs-26 does with ":nowait nil".

- Does not seem to be consistent with what documentation describes
  (as indicated in this thread earlier, might be subjective).


So wanted to note that as it is, entirely new unexpected behavior was
introduced, which does not line up neither with how this worked
previously (and will certainly break the code there), nor how you'd
expect it to work after emacs-26 changes (either signal error same as
":nowait nil" does, or via sentinel).

As far as I can tell, this applies to both missing unix socket and when
it exists, but connection to it is refused (nothing listening there).


--
Mike Kazantsev // fraggod.net



Reply | Threaded
Open this post in threaded view
|

bug#31901: Incorrect make-network-process + nowait state handling for non-existing unix sockets in emacs-26.1

Mike Kazantsev-2
On Sun, 22 Jul 2018 21:42:07 +0500
Mike Kazantsev <[hidden email]> wrote:

> == How it works in current emacs-git [8f3bca3ad51]:
>
>   (make-network-process
>     ...
>     :family 'local
>     :service "/tmp/does-not-exist.sock"
>     :nowait t
>     :sentinel #'unix-socket-test-func)
>
>   => #<process unix-socket-test>  
>   [sentinel never called]
>
>   (process-live-p socket) => nil
>   (process-status socket) => 'file-missing
>
> Note that this is:
>
> - Not consistent with what emacs did before this bug.

After sending previous mail, realized that maybe it is consistent with
emacs 26.0 or similar version before the bug which I've never tested,
where change was made to behave exactly like that, but then this bug
changed it again.

If so, this change will be consistent with that, and I'm definitely
wrong about this behavior being only introduced here.

Which would explain my misunderstanding of how this fix works, and if
that's the case, please disregard that previous message, sorry.


--
Mike Kazantsev // fraggod.net



Reply | Threaded
Open this post in threaded view
|

bug#31901: Incorrect make-network-process + nowait state handling for non-existing unix sockets in emacs-26.1

Noam Postavsky
Mike Kazantsev <[hidden email]> writes:

> On Sun, 22 Jul 2018 21:42:07 +0500
> Mike Kazantsev <[hidden email]> wrote:
>
>> == How it works in current emacs-git [8f3bca3ad51]:
>>
>>   (make-network-process
>>     ...
>>     :family 'local
>>     :service "/tmp/does-not-exist.sock"
>>     :nowait t
>>     :sentinel #'unix-socket-test-func)
>>
>>   => #<process unix-socket-test>  
>>   [sentinel never called]
>>
>>   (process-live-p socket) => nil
>>   (process-status socket) => 'file-missing
>>
>> Note that this is:
>>
>> - Not consistent with what emacs did before this bug.
>
> After sending previous mail, realized that maybe it is consistent with
> emacs 26.0 or similar version before the bug which I've never tested,
> where change was made to behave exactly like that, but then this bug
> changed it again.

I believe the behaviour changed with the introduction of the bug (by the
way, Emacs releases are XX.y where y > 1, so there was never a 26.0
release).

So the situation is not ideal, but I do think that it's acceptable as a
compromise.  Shall we backport and close the bug?





Reply | Threaded
Open this post in threaded view
|

bug#31901: Incorrect make-network-process + nowait state handling for non-existing unix sockets in emacs-26.1

Eli Zaretskii
> From: Noam Postavsky <[hidden email]>
> Cc: Eli Zaretskii <[hidden email]>,  Lars Ingebrigtsen <[hidden email]>,  [hidden email]
> Date: Mon, 06 Aug 2018 16:36:28 -0400
>
> So the situation is not ideal, but I do think that it's acceptable as a
> compromise.  Shall we backport and close the bug?

Yes, I believe we already agreed on backporting.

Thanks.



Reply | Threaded
Open this post in threaded view
|

bug#31901: Incorrect make-network-process + nowait state handling for non-existing unix sockets in emacs-26.1

Noam Postavsky
tags 31901 fixed
close 31901 26.2
quit

Eli Zaretskii <[hidden email]> writes:

>> From: Noam Postavsky <[hidden email]>
>> Cc: Eli Zaretskii <[hidden email]>,  Lars Ingebrigtsen <[hidden email]>,  [hidden email]
>> Date: Mon, 06 Aug 2018 16:36:28 -0400
>>
>> So the situation is not ideal, but I do think that it's acceptable as a
>> compromise.  Shall we backport and close the bug?
>
> Yes, I believe we already agreed on backporting.

Done.

[1: 18588bce36]: 2018-08-08 19:30:50 -0400
  Make async :family 'local failures fail correctly again
  https://git.savannah.gnu.org/cgit/emacs.git/commit/?id=18588bce36617179cc7c8af74a6197c8e16819ea