TRAMP problem with large repositories


Philippe Vaucher
Hello,

Sorry if this is not the right place to post, feel free to redirect me as needed.

While helping someone with a Projectile issue (https://github.com/bbatsov/projectile/issues/1480), I found that when `shell-command-to-string` executes `git ls-files -zco --exclude-standard` over TRAMP on a repository with 85K files, it takes forever to complete.

Here's a stacktrace:


We see that `tramp-wait-for-output` calls `tramp-wait-for-regexp` which calls `tramp-check-for-regexp`, and when looking at the source:

(defun tramp-wait-for-output (proc &optional timeout)
  "Wait for output from remote command."
  (unless (buffer-live-p (process-buffer proc))
    (delete-process proc)
    (tramp-error proc 'file-error "Process `%s' not available, try again" proc))
  (with-current-buffer (process-buffer proc)
    (let* (;; Initially, `tramp-end-of-output' is "#$ ".  There might
           ;; be leading escape sequences, which must be ignored.
           ;; Busyboxes built with the EDITING_ASK_TERMINAL config
           ;; option send also escape sequences, which must be
           ;; ignored.
           (regexp (format "[^#$\n]*%s\\(%s\\)?\r?$"
                           (regexp-quote tramp-end-of-output)
                           tramp-device-escape-sequence-regexp))
           ;; Sometimes, the commands do not return a newline but a
           ;; null byte before the shell prompt, for example "git
           ;; ls-files -c -z ...".
           (regexp1 (format "\\(^\\|\000\\)%s" regexp))
           (found (tramp-wait-for-regexp proc timeout regexp1)))
      ;; ... snip ...

My understanding is that it runs a loop that reads a bit of what the command outputs, then tries to parse end of lines (or '\0'), and repeats until the process dies or a match is found. Because the command returns a huge string (85K files), this read-regexp-repeat cycle takes all the CPU (compared to reading the whole chunk in one go and then checking for the regexp).
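To make the read-regexp-repeat idea concrete, here is a small self-contained sketch. It is not Tramp code and not the patch discussed in this thread; every `demo-` name is made up for illustration, and the regexp is a simplified form of `regexp1` from the snippet above (the escape-sequence part is dropped). It feeds chunks into a buffer and checks for the prompt after each chunk, either by rescanning the whole buffer or by looking only at the tail.

```elisp
;; Illustration only: all `demo-' names are hypothetical, not part of Tramp.

(defconst demo-marker "#$ "
  "End-of-output marker; the quoted comment gives \"#$ \" as the initial value.")

(defconst demo-regexp
  (format "\\(^\\|\000\\)[^#$\n]*%s\r?$" (regexp-quote demo-marker))
  "Simplified form of `regexp1' from the quoted source.")

(defun demo-rescan-all-p ()
  "Return non-nil if `demo-regexp' matches anywhere in the current buffer.
The search restarts at the beginning of the buffer on every call."
  (save-excursion
    (goto-char (point-min))
    (re-search-forward demo-regexp nil t)))

(defun demo-check-tail-p (&optional window)
  "Return non-nil if `demo-regexp' matches in the last WINDOW characters.
WINDOW defaults to 256.  This assumes the prompt, together with the
newline or null byte before it, fits into that window."
  (save-excursion
    (goto-char (max (point-min) (- (point-max) (or window 256))))
    (re-search-forward demo-regexp nil t)))

(defun demo-feed (chunks check-fn)
  "Insert CHUNKS one by one into a temp buffer, calling CHECK-FN after each.
Return the number of chunks inserted when CHECK-FN first succeeds, or
nil if it never does."
  (with-temp-buffer
    (let ((n 0) done)
      (while (and chunks (not done))
        (goto-char (point-max))
        (insert (pop chunks))
        (setq n (1+ n)
              done (funcall check-fn)))
      (and done n))))

;; Fake "git ls-files -z" output: null-separated names, then the prompt.
(defvar demo-chunks
  (let ((chunk (mapconcat (lambda (i) (format "dir/file-%d.c" i))
                          (number-sequence 1 20) "\000")))
    (append (make-list 10 (concat chunk "\000"))
            (list demo-marker))))
```

Both `(demo-feed demo-chunks #'demo-rescan-all-p)` and `(demo-feed demo-chunks #'demo-check-tail-p)` stop at the same chunk, but the first one walks over all previously received output on every check, so its total work grows much faster than the output size. The tail check trades that for an assumption about where the prompt can appear.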

My questions are the following:
  1. Did I understand the problem right? Is this something known?
  2. Is there something to be done about this? Or would it require too much refactoring / a faster implementation?

Kind regards,
Philippe

Re: TRAMP problem with large repositories

Michael Albinus
Philippe Vaucher <[hidden email]> writes:

> Hello,

Hi Philippe,

> While helping someone for a projectile issue
> (https://github.com/bbatsov/projectile/issues/1480), it seems that
> when `shell-command-to-string` tries to execute
> `git ls-files -zco --exclude-standard` over TRAMP on a repository
> that has 85K files it takes forever to complete.
>
> We see that `tramp-wait-for-output` calls `tramp-wait-for-regexp`
> which calls `tramp-check-for-regexp`, and when looking at the source:
>
> My understanding is that it runs a loop that reads a bit of what the
> command outputs, then tries to parse end of lines (or '\0'), and
> repeats until the process dies or a match is found. Because the
> command returns a huge string (85K files), this read-regexp-repeat
> cycle takes all the CPU (compared to reading the whole chunk in one
> go and then checking for the regexp).
>
> My questions are the following:
>
> 1 Did I understand the problem right? Is this something known?

Yes, your analysis is right. And no, I haven't seen related reports yet.

> 2 Is there something to be done about this? Or would it
>   require too much refactoring / a faster implementation?

I have appended a patch which should fix the problem. Could you, please,
(let) test?

Btw, the latest Tramp release is always available via GNU ELPA.

> Kind regards,
> Philippe

Best regards, Michael.


[Attachment: attachment0 (3K)]

Re: TRAMP problem with large repositories

Philippe Vaucher
> > 2 Is there something to be done about this? Or would it
> >   require too much refactoring / a faster implementation?
>
> I have appended a patch which should fix the problem. Could you, please,
> (let) test?

Great, I'll have them test and report (or test myself if needed).

> Btw, the latest Tramp release is always available via GNU ELPA.

Ah, good to know!

Thanks,
Philippe

Re: TRAMP problem with large repositories

Michael Albinus
Philippe Vaucher <[hidden email]> writes:

>     I have appended a patch which should fix the problem. Could you,
>     please, (let) test?
>
> Great, I'll make them test & report (or test myself if needed).

FTR, this morning I've made a test with a cloned Linux kernel git repo
(66473 files). I've accessed the directory via "/ssh::".
(shell-command-to-string "git ls-files -zco --exclude-standard")
returned in a few seconds, with a line of 2479486 bytes.
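
For anyone who wants to repeat such a measurement, a minimal sketch follows. The helper name `demo-time-shell-command` is made up for this example, and the remote path in the commented call is a placeholder, not the directory used in the test above. It relies only on `benchmark-elapse` from benchmark.el and on `shell-command-to-string` honoring `default-directory`.

```elisp
(require 'benchmark)

;; Hypothetical helper, not part of Tramp or Projectile.
(defun demo-time-shell-command (directory command)
  "Run COMMAND in DIRECTORY and return (SECONDS . OUTPUT-LENGTH).
DIRECTORY may be a local directory or a remote one in Tramp syntax.
OUTPUT-LENGTH is the length of the output string in characters."
  (let* ((default-directory (file-name-as-directory directory))
         output
         (elapsed (benchmark-elapse
                    (setq output (shell-command-to-string command)))))
    (cons elapsed (length output))))

;; Example call; replace the placeholder path with a real clone.
;; (demo-time-shell-command "/ssh::/path/to/repo/"
;;                          "git ls-files -zco --exclude-standard")
```

Running the same call before and after upgrading Tramp gives a rough with/without comparison; the first remote call also includes connection setup, so a second run is the fairer number.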

>     Btw, the latest Tramp release is always available via GNU ELPA.
>
> Ah, good to know!

I've committed the patch meanwhile. Tramp 2.4.3, scheduled for end of
the year, will contain it.

In case there's a problem, you still have some days to report :-)

> Thanks,
> Philippe

Best regards, Michael.


Re: TRAMP problem with large repositories

Philippe Vaucher
> >     I have appended a patch which should fix the problem. Could you,
> >     please, (let) test?
> >
> > Great, I'll make them test & report (or test myself if needed).
>
> FTR, this morning I've made a test with a cloned Linux kernel git repo
> (66473 files). I've accessed the directory via "/ssh::".
> (shell-command-to-string "git ls-files -zco --exclude-standard")
> returned in a few seconds, with a line of 2479486 bytes.

Is that with or without the patch? Could you reproduce the problem without the patch?

> I've committed the patch meanwhile. Tramp 2.4.3, scheduled for end of
> the year, will contain it.
>
> In case there's a problem, you still have some days to report :-)

Great, thanks!

Kind regards,
Philippe 

Re: TRAMP problem with large repositories

Michael Albinus
Philippe Vaucher <[hidden email]> writes:

>     FTR, this morning I've made a test with a cloned Linux kernel git
>     repo (66473 files). I've accessed the directory via "/ssh::".
>     (shell-command-to-string "git ls-files -zco --exclude-standard")
>     returned in a few seconds, with a line of 2479486 bytes.
>
> Is that with or without the patch? Could you reproduce the problem
> without the patch?

That was with the patch. I reran the test again, this time over a slow
connection (using a multi-hop to savannah and back). With the patch, the
command returned after <10 secs. W/o the patch, I had to cancel the
command after a while.

> Kind regards,
> Philippe

Best regards, Michael.


Re: TRAMP problem with large repositories

Philippe Vaucher


On Fri, Dec 13, 2019 at 7:31 PM Michael Albinus <[hidden email]> wrote:
> > Is that with or without the patch? Could you reproduce the problem
> > without the patch?
>
> That was with the patch. I reran the test again, this time over a slow
> connection (using a multi-hop to savannah and back). With the patch, the
> command returned after <10 secs. W/o the patch, I had to cancel the
> command after a while.

Ah great, nice that you were able to reproduce it.

Thank you very much, I'll report when I have news on the other side.

Kind regards,
Philippe 

Re: TRAMP problem with large repositories

Philippe Vaucher
> Thank you very much, I'll report when I have news on the other side.

"It worked! Projectile was able to fetch the files within about 10 seconds, and now the files appear to be cached." 

All good then! Thanks a lot.

Philippe