bug#34763: 27.0.50; url-retrieve-synchronously misbehaves inside eldoc-documentation-function

classic Classic list List threaded Threaded
15 messages Options
Reply | Threaded
Open this post in threaded view
|

bug#34763: 27.0.50; url-retrieve-synchronously misbehaves inside eldoc-documentation-function

Dmitry Gutov
1. Evaluate the attached .el file.

2. Start typing, randomly and with certain pauses, trying to align the
interval to the "Contacting host" messages.

At first, Emacs is responsive, and the requests are duly aborted, like
'while-no-input' obliges them to. A little bit later (maybe 10-30
seconds later), I see errors like "Transfer interrupted", and Emacs
stutters when that happens. Later on, any HTTP request to the same host
freezes Emacs until I press C-g. The attached screenshot shows the
messages log and the list of processes at the end.

I guessing this became much easier to trigger with
12e922156c86a26fa4bb2cb9e7d2b3fd639e4707 when eldoc started using
when-no-input. But whatever underlying problem is there in url-http,
it's probably been there longer.

In GNU Emacs 27.0.50 (build 20, x86_64-pc-linux-gnu, GTK+ Version 3.22.30)
  of 2019-03-05 built on zappa
Repository revision: 9b93e3b0759d562989283eaecf32e075f984c18c
Repository branch: master
Windowing system distributor 'The X.Org Foundation', version 11.0.12001000
System Description: Ubuntu 18.04.2 LTS

Screenshot from 2019-03-05 16-36-55.png (587K) Download Attachment
url-retrieve-synchronously-bug.el (275 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

bug#34763: 27.0.50; url-retrieve-synchronously misbehaves inside eldoc-documentation-function

Dmitry Gutov
Can anybody reproduce this?

On 05.03.2019 23:34, Dmitry Gutov wrote:

> 1. Evaluate the attached .el file.
>
> 2. Start typing, randomly and with certain pauses, trying to align the
> interval to the "Contacting host" messages.
>
> At first, Emacs is responsive, and the requests are duly aborted, like
> 'while-no-input' obliges them to. A little bit later (maybe 10-30
> seconds later), I see errors like "Transfer interrupted", and Emacs
> stutters when that happens. Later on, any HTTP request to the same host
> freezes Emacs until I press C-g. The attached screenshot shows the
> messages log and the list of processes at the end.
>
> I guessing this became much easier to trigger with
> 12e922156c86a26fa4bb2cb9e7d2b3fd639e4707 when eldoc started using
> when-no-input. But whatever underlying problem is there in url-http,
> it's probably been there longer.
>
> In GNU Emacs 27.0.50 (build 20, x86_64-pc-linux-gnu, GTK+ Version 3.22.30)
>  ¬†of 2019-03-05 built on zappa
> Repository revision: 9b93e3b0759d562989283eaecf32e075f984c18c
> Repository branch: master
> Windowing system distributor 'The X.Org Foundation', version 11.0.12001000
> System Description: Ubuntu 18.04.2 LTS




Reply | Threaded
Open this post in threaded view
|

bug#34763: 27.0.50; url-retrieve-synchronously misbehaves inside eldoc-documentation-function

Eli Zaretskii
> From: Dmitry Gutov <[hidden email]>
> Date: Mon, 11 Mar 2019 02:18:54 +0200
>
> Can anybody reproduce this?

Does it help to replace this:

  (defsubst url-http-debug (&rest args)
    (if quit-flag

with this:

  (defsubst url-http-debug (&rest args)
    (if (eq quit-flag t)

in url-http.el?



Reply | Threaded
Open this post in threaded view
|

bug#34763: 27.0.50; url-retrieve-synchronously misbehaves inside eldoc-documentation-function

Dmitry Gutov
On 11.03.2019 16:30, Eli Zaretskii wrote:

> Does it help to replace this:
>
>    (defsubst url-http-debug (&rest args)
>      (if quit-flag
>
> with this:
>
>    (defsubst url-http-debug (&rest args)
>      (if (eq quit-flag t)
>
> in url-http.el?

No change, unfortunately. I double-checked.



Reply | Threaded
Open this post in threaded view
|

bug#34763: 27.0.50; url-retrieve-synchronously misbehaves inside eldoc-documentation-function

Dmitry Gutov
Some further investigation follows.

I've checked out the revision before the one where eldoc started to use
while-no-input (6d898918980be4cb29a182ecde7f8e1f95a08462) and repeated
the scenario.

Predictably, Emacs was considerably less responsive, lots of pauses
waiting for the replies from the remote server.

Also predictably, no "Transfer interrupted" messages until I press C-g.

But I still reach the state where Emacs seemingly just freezes with
"Contacting host: www.google.com:80" in the echo area, and even faster
than with the current Emacs master (within the first 10-20 seconds).

I can abort that with tapping C-g, see that "Transfer interrupted"
message (Emacs still frozen), and then tapping C-g again to see "eldoc
error: (error Interrupted!)", wait a second, and see "Quit".

The message log is only slightly different: it consistently says:

"error in process sentinel: url-http-end-of-document-sentinel: Transfer
interrupted!"

Whereas previously, I mostly seen other sentinels mentioned here.

After Emacs freezes this way once and I C-g out of it, similarly to the
original problem description, any subsequent call to silly-eldoc freezes
right away. Calling url-retrieve-synchronously from eval-expression,
meanwhile, works (although kind of slowly).

This reminds me strongly of a problem with my third-party package
reported by several people, which I never managed to cleanly reproduce:
https://github.com/dgutov/robe/issues/105

Also tried this in Emacs 25.2, the scenario reproduces almost exactly.



Reply | Threaded
Open this post in threaded view
|

bug#34763: 27.0.50; url-retrieve-synchronously misbehaves inside eldoc-documentation-function

Eli Zaretskii
In reply to this post by Dmitry Gutov
> Cc: [hidden email]
> From: Dmitry Gutov <[hidden email]>
> Date: Tue, 12 Mar 2019 11:08:06 +0200
>
> On 11.03.2019 16:30, Eli Zaretskii wrote:
>
> > Does it help to replace this:
> >
> >    (defsubst url-http-debug (&rest args)
> >      (if quit-flag
> >
> > with this:
> >
> >    (defsubst url-http-debug (&rest args)
> >      (if (eq quit-flag t)
> >
> > in url-http.el?
>
> No change, unfortunately. I double-checked.

Crystal ball says there _was_ in fact a change: the error message text
has changed slightly.  Which means there's one more place with a
similar problem.

The below should fix both; please see if it does, and also whether it
fixes your original real-life use case.

Thanks.

diff --git a/lisp/url/url-http.el b/lisp/url/url-http.el
index 76faac1..651a2cc 100644
--- a/lisp/url/url-http.el
+++ b/lisp/url/url-http.el
@@ -150,7 +150,7 @@ url-https-default-port
 ;; These routines will allow us to implement persistent HTTP
 ;; connections.
 (defsubst url-http-debug (&rest args)
-  (if quit-flag
+  (if (eq quit-flag t)
       (let ((proc (get-buffer-process (current-buffer))))
  ;; The user hit C-g, honor it!  Some things can get in an
  ;; incredibly tight loop (chunked encoding)
diff --git a/lisp/url/url-util.el b/lisp/url/url-util.el
index cb80ec6..72ff4f1 100644
--- a/lisp/url/url-util.el
+++ b/lisp/url/url-util.el
@@ -61,7 +61,7 @@ url-debug
 
 ;;;###autoload
 (defun url-debug (tag &rest args)
-  (if quit-flag
+  (if (eq quit-flag t)
       (error "Interrupted!"))
   (if (or (eq url-debug t)
   (numberp url-debug)



Reply | Threaded
Open this post in threaded view
|

bug#34763: 27.0.50; url-retrieve-synchronously misbehaves inside eldoc-documentation-function

Dmitry Gutov
On 12.03.2019 19:18, Eli Zaretskii wrote:

>> Cc: [hidden email]
>> From: Dmitry Gutov <[hidden email]>
>> Date: Tue, 12 Mar 2019 11:08:06 +0200
>>
>> On 11.03.2019 16:30, Eli Zaretskii wrote:
>>
>>> Does it help to replace this:
>>>
>>>     (defsubst url-http-debug (&rest args)
>>>       (if quit-flag
>>>
>>> with this:
>>>
>>>     (defsubst url-http-debug (&rest args)
>>>       (if (eq quit-flag t)
>>>
>>> in url-http.el?
>>
>> No change, unfortunately. I double-checked.
>
> Crystal ball says there _was_ in fact a change: the error message text
> has changed slightly.

I'm sorry, you are right. The text did change.

> The below should fix both; please see if it does, and also whether it
> fixes your original real-life use case.

The patch does away with the first part of the complaint: no "Transfer
interrupted" or "Interrupted" messages anymore. Thanks!

The HTTP requests do take longer and longer times to complete, though.
After enough typing, the 10th request never seems to complete. I can
monitor that with changing the function's definition to

(defun silly-eldoc ()
   (dotimes (i 10)
     (url-retrieve-synchronously "http://www.google.com")
     (message "try %i" i))
   "done")

and watching the Messages buffer.

The counter fairly quickly reaches 0, but rarely goes past 3, and almost
never past 5 (now that I've been calling that eldoc function for several
minutes, albeit with big pauses).

M-x list-processes shows 12 lines with very similar contents.

I don't know whether having 10 requests to the same URL counts as a
real-life case, though.



Reply | Threaded
Open this post in threaded view
|

bug#34763: 27.0.50; url-retrieve-synchronously misbehaves inside eldoc-documentation-function

Eli Zaretskii
> Cc: [hidden email]
> From: Dmitry Gutov <[hidden email]>
> Date: Tue, 12 Mar 2019 19:44:04 +0200
>
> > Crystal ball says there _was_ in fact a change: the error message text
> > has changed slightly.
>
> I'm sorry, you are right. The text did change.

It's similar enough, so I can understand the confusion and
disappointment ;-).

> > The below should fix both; please see if it does, and also whether it
> > fixes your original real-life use case.
>
> The patch does away with the first part of the complaint: no "Transfer
> interrupted" or "Interrupted" messages anymore. Thanks!
>
> The HTTP requests do take longer and longer times to complete, though.
> After enough typing, the 10th request never seems to complete. I can
> monitor that with changing the function's definition to
>
> (defun silly-eldoc ()
>    (dotimes (i 10)
>      (url-retrieve-synchronously "http://www.google.com")
>      (message "try %i" i))
>    "done")
>
> and watching the Messages buffer.
>
> The counter fairly quickly reaches 0, but rarely goes past 3, and almost
> never past 5 (now that I've been calling that eldoc function for several
> minutes, albeit with big pauses).
>
> M-x list-processes shows 12 lines with very similar contents.
>
> I don't know whether having 10 requests to the same URL counts as a
> real-life case, though.

Isn't the above expected, given the recipe?  You are initiating an
HTTP retrieval inside while-no-input, and then provide input which
aborts the retrieval.  What else can we expect from this code?

The point of my changes was to avoid signaling errors from the
retrieval routines when some other unrelated code, such as the "real"
eldoc function, is running inside while-no-input.  But if you run
url-retrieve-synchronously itself inside while-no-input, you should
expect it to be aborted by arriving input, right?  Or am I missing
something?



Reply | Threaded
Open this post in threaded view
|

bug#34763: 27.0.50; url-retrieve-synchronously misbehaves inside eldoc-documentation-function

Dmitry Gutov
On 12.03.2019 20:00, Eli Zaretskii wrote:
> Isn't the above expected, given the recipe?  You are initiating an
> HTTP retrieval inside while-no-input, and then provide input which
> aborts the retrieval.  What else can we expect from this code?

I'm looking at two distinct things at once:

- The requests to be aborted in time without extra stutters or alarming
messages.
- The requests to complete promptly enough if I don't abort them. E.g.
type 'a' and just wait for "done" to show up in the echo area.

The scenario I described triggers, in my testing, problems in both of
them. The first item seems fixed with your patch.

The second is more or less unchanged. It's fairly likely an orthogonal
problem, and this is just the way I've found to trigger it. I can file a
new report, if you like.

To reiterate, the way to reproduce the problem:

- Evaluate the file attached to the report and do some typing, as though
to reproduce the first problem.

- Continue typing, pause from time to time and wait for "done" to show
up in the echo area. After a little while, it will stop arriving,
because the requests will stall for some reason.



Reply | Threaded
Open this post in threaded view
|

bug#34763: 27.0.50; url-retrieve-synchronously misbehaves inside eldoc-documentation-function

Eli Zaretskii
> Cc: [hidden email]
> From: Dmitry Gutov <[hidden email]>
> Date: Tue, 12 Mar 2019 22:05:59 +0200
>
> - Evaluate the file attached to the report and do some typing, as though
> to reproduce the first problem.
>
> - Continue typing, pause from time to time and wait for "done" to show
> up in the echo area. After a little while, it will stop arriving,
> because the requests will stall for some reason.

They don't stall here, they just take more time to complete.

If you do the same from an interactive function, not from eldoc, does
the problem happen there as well?



Reply | Threaded
Open this post in threaded view
|

bug#34763: 27.0.50; url-retrieve-synchronously misbehaves inside eldoc-documentation-function

Dmitry Gutov
On 13.03.2019 18:49, Eli Zaretskii wrote:

>> - Continue typing, pause from time to time and wait for "done" to show
>> up in the echo area. After a little while, it will stop arriving,
>> because the requests will stall for some reason.
>
> They don't stall here, they just take more time to complete.

How much more time? In my current testing, it can go up 20x. And maybe more.

So normally all 10 requests to GOOG complete in 1.1s (meaning 0.11s per
request), but after some typing, it slows down and takes ~20s. That's
~1.8s per request.

At the same time, if I open a second instance of Emacs and try the
scenario there, the requests start out slow already (same as in the
already open instance).

But at the same time, opening www.google.com in Firefox is still fast.
As well as doing it from a console with CURL ('time curl
http://www.google.com' shows 0.3s).

Sounds like an external library problem, maybe?

> If you do the same from an interactive function, not from eldoc, does
> the problem happen there as well?

More or less. I managed to make it slower by ~20x as well.

Try this:

(defun silly-eldoc ()
   (let ((start (time-to-seconds)))
     (dotimes (i 10)
       (url-retrieve-synchronously "http://www.google.com")
       (message "try %i" i))
     (format "done in %f" (- (time-to-seconds) start))))

(defun silly-command ()
   (interactive)
   (message "%s" (silly-eldoc)))

(global-set-key (kbd "C-o") #'silly-command)

Call the command a few time with C-o, it should run the timer and finish
with calculation how much time it took to run the requests. Now:

1. Alternate between pressing C-o and C-g, with varying speeds.

The goal is to see

  error in process filter: Transfer interrupted!
  error in process filter: url-http-generic-filter: Transfer interrupted!

a few times.

2. Then see if C-o still runs to completion. If it does, and does not
look slow enough, repeat 1.

BUT! I have tried to reproduce this slowdown behavior with a local
server, and so far no luck. Ping to Google (or almost any other website)
is 100ms from my current location, so maybe it makes a difference.



Reply | Threaded
Open this post in threaded view
|

bug#34763: 27.0.50; url-retrieve-synchronously misbehaves inside eldoc-documentation-function

Dmitry Gutov
In reply to this post by Eli Zaretskii
And regardless of whether the slowdown is caused by external factors in
my examples, here are two aspects that worry me either way:

1. When I get to "Transfer interrupted", I need to press 'C-g' a second
time to actually abort right away.

2. When that happens, 'M-x list-processes' shows an extra new line with
status 'open'. Interrupt enough requests this way - and there are many
such lines there. That doesn't look good.



Reply | Threaded
Open this post in threaded view
|

bug#34763: 27.0.50; url-retrieve-synchronously misbehaves inside eldoc-documentation-function

Eli Zaretskii
In reply to this post by Dmitry Gutov
> Cc: [hidden email]
> From: Dmitry Gutov <[hidden email]>
> Date: Fri, 15 Mar 2019 16:24:44 +0200
>
> Sounds like an external library problem, maybe?

GnuTLS?

> > If you do the same from an interactive function, not from eldoc, does
> > the problem happen there as well?
>
> More or less. I managed to make it slower by ~20x as well.
>
> Try this:
>
> (defun silly-eldoc ()
>    (let ((start (time-to-seconds)))
>      (dotimes (i 10)
>        (url-retrieve-synchronously "http://www.google.com")
>        (message "try %i" i))
>      (format "done in %f" (- (time-to-seconds) start))))
>
> (defun silly-command ()
>    (interactive)
>    (message "%s" (silly-eldoc)))
>
> (global-set-key (kbd "C-o") #'silly-command)
>
> Call the command a few time with C-o, it should run the timer and finish
> with calculation how much time it took to run the requests. Now:
>
> 1. Alternate between pressing C-o and C-g, with varying speeds.
>
> The goal is to see
>
>   error in process filter: Transfer interrupted!
>   error in process filter: url-http-generic-filter: Transfer interrupted!
>
> a few times.
>
> 2. Then see if C-o still runs to completion. If it does, and does not
> look slow enough, repeat 1.

So it sounds like quitting in the middle of a URL retrieval leaves
some leftovers that get in the way of additional retrieval attempts.

> BUT! I have tried to reproduce this slowdown behavior with a local
> server, and so far no luck. Ping to Google (or almost any other website)
> is 100ms from my current location, so maybe it makes a difference.

Maybe with a local server you aren't fast enough to interrupt the
transfer before it completes?



Reply | Threaded
Open this post in threaded view
|

bug#34763: 27.0.50; url-retrieve-synchronously misbehaves inside eldoc-documentation-function

Eli Zaretskii
In reply to this post by Dmitry Gutov
> Cc: [hidden email]
> From: Dmitry Gutov <[hidden email]>
> Date: Fri, 15 Mar 2019 16:29:10 +0200
>
> And regardless of whether the slowdown is caused by external factors in
> my examples, here are two aspects that worry me either way:
>
> 1. When I get to "Transfer interrupted", I need to press 'C-g' a second
> time to actually abort right away.
>
> 2. When that happens, 'M-x list-processes' shows an extra new line with
> status 'open'. Interrupt enough requests this way - and there are many
> such lines there. That doesn't look good.

The code in the functions that react to quit-flag being non-nil should
be audited.  They don't look clean to me, FWIW, and we don't generally
do such things from Lisp, we normally quit on the C level.  Maybe this
whole sub-feature, whereby we show a message for interrupted
transfers, should be rethought.



Reply | Threaded
Open this post in threaded view
|

bug#34763: 27.0.50; url-retrieve-synchronously misbehaves inside eldoc-documentation-function

Dmitry Gutov
In reply to this post by Eli Zaretskii
On 15.03.2019 17:47, Eli Zaretskii wrote:
>> Cc: [hidden email]
>> From: Dmitry Gutov <[hidden email]>
>> Date: Fri, 15 Mar 2019 16:24:44 +0200
>>
>> Sounds like an external library problem, maybe?
>
> GnuTLS?

Except the requests only use HTTP, not HTTPS. So... IDK?

> So it sounds like quitting in the middle of a URL retrieval leaves
> some leftovers that get in the way of additional retrieval attempts.

Maybe so.

>> BUT! I have tried to reproduce this slowdown behavior with a local
>> server, and so far no luck. Ping to Google (or almost any other website)
>> is 100ms from my current location, so maybe it makes a difference.
>
> Maybe with a local server you aren't fast enough to interrupt the
> transfer before it completes?

I do manage to trigger the "Transfer interrupted" errors, though. So
maybe not. Or not exactly.