bug#34350: 27.0.50; ediff-revision broken with SVN backend + non ascii chars both in directory and in filename

classic Classic list List threaded Threaded
21 messages Options
12
Reply | Threaded
Open this post in threaded view
|

bug#34350: 27.0.50; ediff-revision broken with SVN backend + non ascii chars both in directory and in filename

Vincent Belaïche-2

Hello Emacs experts,

I am trying to do some 'M-x ediff-revision' on a file that is under
version control with svn locally to my MSWindows-10 machine, and that
does not work.

The file path is something like

c:/blah/blah/blah/ê/trunk/ê.tex


(well, it is not exactly this exact string, but I presume that makes no
difference for our concern…).


After some debugging, I realized that the whole thing executes with
default-directory set to "c:/blah/blah/blah/ê/trunk/" the following
two commands:

 (process-file "svn" nil t nil "--non-interactive" "status" "-v" "ê.tex")

and then:

 (process-file "svn" nil t nil "--non-interactive" "cat" "ê.tex")

The first command works quite fine, but the second one fails, and having
a look info the " *temp*" buffer, I saw that svn angrilly barks with
some error message like this one:


--8<----8<----8<----8<----8<-- begin -->8---->8---->8---->8---->8----
svn: warning: W200005: 'C:\blah\blah\blah\\trunk\ª.tex' is not under version control
svn: E200009: Could not cat all targets because some targets are not versioned
svn: E200009: Illegal target for the requested operation
--8<----8<----8<----8<----8<--  end  -->8---->8---->8---->8---->8----

looking more carefully at things, and having inspected the C code of
function call-process, I realized that the variable
coding-system-for-write has some importance, and tracing this I realized
that when the first svn command is called the variable
coding-system-for-write is nil, while when the second svn command is
called it is 'no-conversion, which makes it fail.


The problem is that in the vc-find-revision function you have somewhere
this setting :


 (let ( … (coding-system-for-write 'no-conversion) …) …)

here is a backtrace where I marked two lines X and Y.


--8<----8<----8<----8<----8<-- begin -->8---->8---->8---->8---->8----
Debugger entered--Lisp error: (error "Failed (status 1): svn --non-interactive cat  ê.tex")
  signal(error ("Failed (status 1): svn --non-interactive cat  ê.tex"))
  error("Failed (%s): %s" "status 1" "svn --non-interactive cat  ê.tex")
  (progn (if (eq 32 (aref (buffer-name (current-buffer)) 0)) nil (pop-to-buffer (current-buffer)) (goto-char (point-min)) (shrink-window-if-larger-than-buffer)) (error "Failed (%s): %s" (if (integerp status) (format "status %d" status) status) full-command))
Y (if (and (not (eq t okstatus)) (or (not (integerp status)) (and okstatus (< okstatus status)))) (progn (if (eq 32 (aref (buffer-name (current-buffer)) 0)) nil (pop-to-buffer (current-buffer)) (goto-char (point-min)) (shrink-window-if-larger-than-buffer)) (error "Failed (%s): %s" (if (integerp status) (format "status %d" status) status) full-command)))
  (if (eq okstatus 'async) (let ((proc (let ((process-connection-type nil)) (apply 'start-file-process command (current-buffer) command squeezed)))) (if vc-command-messages (progn (message "Running in background: %s" full-command))) (set-process-sentinel proc #'ignore) (set-process-filter proc 'vc-process-filter) (setq status proc) (if vc-command-messages (progn (vc-exec-after #'(lambda nil (let ... ...)))))) (if vc-command-messages (progn (message "Running in foreground: %s" full-command))) (let ((buffer-undo-list t)) (message "CSFW=%S" coding-system-for-write) (message "CS=%S" (apply 'find-operation-coding-system 'call-process "svn" squeezed)) (message "DIR=%s" default-directory) (message "SQUEEZED=%S" squeezed) (setq status (apply 'process-file command nil t nil squeezed))) (if (and (not (eq t okstatus)) (or (not (integerp status)) (and okstatus (< okstatus status)))) (progn (if (eq 32 (aref (buffer-name (current-buffer)) 0)) nil (pop-to-buffer (current-buffer)) (goto-char (point-min)) (shrink-window-if-larger-than-buffer)) (error "Failed (%s): %s" (if (integerp status) (format "status %d" status) status) full-command))) (if vc-command-messages (progn (message "Done (status=%d): %s" status full-command))))
  (let ((process-environment (cons "LC_MESSAGES=C" process-environment)) (w32-quote-process-args t)) (if (eq okstatus 'async) (let ((proc (let ((process-connection-type nil)) (apply 'start-file-process command (current-buffer) command squeezed)))) (if vc-command-messages (progn (message "Running in background: %s" full-command))) (set-process-sentinel proc #'ignore) (set-process-filter proc 'vc-process-filter) (setq status proc) (if vc-command-messages (progn (vc-exec-after #'(lambda nil ...))))) (if vc-command-messages (progn (message "Running in foreground: %s" full-command))) (let ((buffer-undo-list t)) (message "CSFW=%S" coding-system-for-write) (message "CS=%S" (apply 'find-operation-coding-system 'call-process "svn" squeezed)) (message "DIR=%s" default-directory) (message "SQUEEZED=%S" squeezed) (setq status (apply 'process-file command nil t nil squeezed))) (if (and (not (eq t okstatus)) (or (not (integerp status)) (and okstatus (< okstatus status)))) (progn (if (eq 32 (aref (buffer-name ...) 0)) nil (pop-to-buffer (current-buffer)) (goto-char (point-min)) (shrink-window-if-larger-than-buffer)) (error "Failed (%s): %s" (if (integerp status) (format "status %d" status) status) full-command))) (if vc-command-messages (progn (message "Done (status=%d): %s" status full-command)))))
  (let ((squeezed (remq nil flags)) (inhibit-read-only t) (status 0)) (if files (progn (setq squeezed (nconc squeezed files)))) (let ((process-environment (cons "LC_MESSAGES=C" process-environment)) (w32-quote-process-args t)) (if (eq okstatus 'async) (let ((proc (let (...) (apply ... command ... command squeezed)))) (if vc-command-messages (progn (message "Running in background: %s" full-command))) (set-process-sentinel proc #'ignore) (set-process-filter proc 'vc-process-filter) (setq status proc) (if vc-command-messages (progn (vc-exec-after #'...)))) (if vc-command-messages (progn (message "Running in foreground: %s" full-command))) (let ((buffer-undo-list t)) (message "CSFW=%S" coding-system-for-write) (message "CS=%S" (apply 'find-operation-coding-system 'call-process "svn" squeezed)) (message "DIR=%s" default-directory) (message "SQUEEZED=%S" squeezed) (setq status (apply 'process-file command nil t nil squeezed))) (if (and (not (eq t okstatus)) (or (not (integerp status)) (and okstatus (< okstatus status)))) (progn (if (eq 32 (aref ... 0)) nil (pop-to-buffer (current-buffer)) (goto-char (point-min)) (shrink-window-if-larger-than-buffer)) (error "Failed (%s): %s" (if (integerp status) (format "status %d" status) status) full-command))) (if vc-command-messages (progn (message "Done (status=%d): %s" status full-command))))) (vc-exec-after #'(lambda nil (run-hook-with-args 'vc-post-command-functions command file-or-list flags))) status)
  (save-current-buffer (if (or (eq buffer t) (and (stringp buffer) (string= (buffer-name) buffer)) (eq buffer (current-buffer))) nil (vc-setup-buffer buffer)) (let ((squeezed (remq nil flags)) (inhibit-read-only t) (status 0)) (if files (progn (setq squeezed (nconc squeezed files)))) (let ((process-environment (cons "LC_MESSAGES=C" process-environment)) (w32-quote-process-args t)) (if (eq okstatus 'async) (let ((proc (let ... ...))) (if vc-command-messages (progn (message "Running in background: %s" full-command))) (set-process-sentinel proc #'ignore) (set-process-filter proc 'vc-process-filter) (setq status proc) (if vc-command-messages (progn (vc-exec-after ...)))) (if vc-command-messages (progn (message "Running in foreground: %s" full-command))) (let ((buffer-undo-list t)) (message "CSFW=%S" coding-system-for-write) (message "CS=%S" (apply 'find-operation-coding-system 'call-process "svn" squeezed)) (message "DIR=%s" default-directory) (message "SQUEEZED=%S" squeezed) (setq status (apply 'process-file command nil t nil squeezed))) (if (and (not (eq t okstatus)) (or (not ...) (and okstatus ...))) (progn (if (eq 32 ...) nil (pop-to-buffer ...) (goto-char ...) (shrink-window-if-larger-than-buffer)) (error "Failed (%s): %s" (if ... ... status) full-command))) (if vc-command-messages (progn (message "Done (status=%d): %s" status full-command))))) (vc-exec-after #'(lambda nil (run-hook-with-args 'vc-post-command-functions command file-or-list flags))) status))
  (let* ((files (mapcar #'(lambda (f) (file-relative-name (expand-file-name f))) (if (listp file-or-list) file-or-list (list file-or-list)))) (message-truncate-lines t) (full-command (concat (if (string= (substring command -1) "\n") (substring command 0 -1) command) " " (vc-delistify flags) " " (vc-delistify files)))) (save-current-buffer (if (or (eq buffer t) (and (stringp buffer) (string= (buffer-name) buffer)) (eq buffer (current-buffer))) nil (vc-setup-buffer buffer)) (let ((squeezed (remq nil flags)) (inhibit-read-only t) (status 0)) (if files (progn (setq squeezed (nconc squeezed files)))) (let ((process-environment (cons "LC_MESSAGES=C" process-environment)) (w32-quote-process-args t)) (if (eq okstatus 'async) (let ((proc ...)) (if vc-command-messages (progn ...)) (set-process-sentinel proc #'ignore) (set-process-filter proc 'vc-process-filter) (setq status proc) (if vc-command-messages (progn ...))) (if vc-command-messages (progn (message "Running in foreground: %s" full-command))) (let ((buffer-undo-list t)) (message "CSFW=%S" coding-system-for-write) (message "CS=%S" (apply ... ... "svn" squeezed)) (message "DIR=%s" default-directory) (message "SQUEEZED=%S" squeezed) (setq status (apply ... command nil t nil squeezed))) (if (and (not ...) (or ... ...)) (progn (if ... nil ... ... ...) (error "Failed (%s): %s" ... full-command))) (if vc-command-messages (progn (message "Done (status=%d): %s" status full-command))))) (vc-exec-after #'(lambda nil (run-hook-with-args 'vc-post-command-functions command file-or-list flags))) status)))
  vc-do-command(#<buffer  *temp file*-256291> 0 "svn" "c:/blah/blah/blah/ê/trunk/ê.tex" "--non-interactive" "cat" nil)
  apply(vc-do-command #<buffer  *temp file*-256291> 0 "svn" "c:/blah/blah/blah/ê/trunk/ê.tex" ("--non-interactive" "cat" nil))
  vc-svn-command(#<buffer  *temp file*-256291> 0 "c:/blah/blah/blah/ê/trunk/ê.tex" "cat" nil)
  apply(vc-svn-command #<buffer  *temp file*-256291> 0 "c:/blah/blah/blah/ê/trunk/ê.tex" "cat" nil nil)
  (let (process-file-side-effects) (apply 'vc-svn-command buffer 0 file "cat" (and rev (not (string= rev "")) (concat "-r" rev)) (vc-switches 'SVN 'checkout)))
  vc-svn-find-revision("c:/blah/blah/blah/ê/trunk/ê.tex" nil #<buffer  *temp file*-256291>)
  apply(vc-svn-find-revision ("c:/blah/blah/blah/ê/trunk/ê.tex" nil #<buffer  *temp file*-256291>))
  vc-call-backend(SVN find-revision "c:/blah/blah/blah/ê/trunk/ê.tex" nil #<buffer  *temp file*-256291>)
  (if backend (vc-call-backend backend 'find-revision file revision outbuf) (vc-call-backend (vc-backend file) 'find-revision file revision outbuf))
  (save-current-buffer (set-buffer filebuf) (if backend (vc-call-backend backend 'find-revision file revision outbuf) (vc-call-backend (vc-backend file) 'find-revision file revision outbuf)))
  (let ((outbuf (current-buffer))) (save-current-buffer (set-buffer filebuf) (if backend (vc-call-backend backend 'find-revision file revision outbuf) (vc-call-backend (vc-backend file) 'find-revision file revision outbuf))))
  (save-current-buffer (set-buffer temp-buffer) (let ((outbuf (current-buffer))) (save-current-buffer (set-buffer filebuf) (if backend (vc-call-backend backend 'find-revision file revision outbuf) (vc-call-backend (vc-backend file) 'find-revision file revision outbuf)))))
  (prog1 (save-current-buffer (set-buffer temp-buffer) (let ((outbuf (current-buffer))) (save-current-buffer (set-buffer filebuf) (if backend (vc-call-backend backend 'find-revision file revision outbuf) (vc-call-backend (vc-backend file) 'find-revision file revision outbuf))))) (save-current-buffer (set-buffer temp-buffer) (write-region nil nil temp-file nil 0)))
  (unwind-protect (prog1 (save-current-buffer (set-buffer temp-buffer) (let ((outbuf (current-buffer))) (save-current-buffer (set-buffer filebuf) (if backend (vc-call-backend backend 'find-revision file revision outbuf) (vc-call-backend (vc-backend file) 'find-revision file revision outbuf))))) (save-current-buffer (set-buffer temp-buffer) (write-region nil nil temp-file nil 0))) (and (buffer-name temp-buffer) (kill-buffer temp-buffer)))
  (let ((temp-file filename) (temp-buffer (get-buffer-create (generate-new-buffer-name " *temp file*")))) (unwind-protect (prog1 (save-current-buffer (set-buffer temp-buffer) (let ((outbuf (current-buffer))) (save-current-buffer (set-buffer filebuf) (if backend (vc-call-backend backend ... file revision outbuf) (vc-call-backend ... ... file revision outbuf))))) (save-current-buffer (set-buffer temp-buffer) (write-region nil nil temp-file nil 0))) (and (buffer-name temp-buffer) (kill-buffer temp-buffer))))
X (let ((coding-system-for-read 'no-conversion) (coding-system-for-write 'no-conversion)) (let ((temp-file filename) (temp-buffer (get-buffer-create (generate-new-buffer-name " *temp file*")))) (unwind-protect (prog1 (save-current-buffer (set-buffer temp-buffer) (let ((outbuf ...)) (save-current-buffer (set-buffer filebuf) (if backend ... ...)))) (save-current-buffer (set-buffer temp-buffer) (write-region nil nil temp-file nil 0))) (and (buffer-name temp-buffer) (kill-buffer temp-buffer)))) (setq failed nil))
  (unwind-protect (let ((coding-system-for-read 'no-conversion) (coding-system-for-write 'no-conversion)) (let ((temp-file filename) (temp-buffer (get-buffer-create (generate-new-buffer-name " *temp file*")))) (unwind-protect (prog1 (save-current-buffer (set-buffer temp-buffer) (let (...) (save-current-buffer ... ...))) (save-current-buffer (set-buffer temp-buffer) (write-region nil nil temp-file nil 0))) (and (buffer-name temp-buffer) (kill-buffer temp-buffer)))) (setq failed nil)) (if (and failed (file-exists-p filename)) (progn (delete-file filename))))
  (let ((failed t)) (unwind-protect (let ((coding-system-for-read 'no-conversion) (coding-system-for-write 'no-conversion)) (let ((temp-file filename) (temp-buffer (get-buffer-create (generate-new-buffer-name " *temp file*")))) (unwind-protect (prog1 (save-current-buffer (set-buffer temp-buffer) (let ... ...)) (save-current-buffer (set-buffer temp-buffer) (write-region nil nil temp-file nil 0))) (and (buffer-name temp-buffer) (kill-buffer temp-buffer)))) (setq failed nil)) (if (and failed (file-exists-p filename)) (progn (delete-file filename)))))
  (save-current-buffer (set-buffer filebuf) (let ((failed t)) (unwind-protect (let ((coding-system-for-read 'no-conversion) (coding-system-for-write 'no-conversion)) (let ((temp-file filename) (temp-buffer (get-buffer-create ...))) (unwind-protect (prog1 (save-current-buffer ... ...) (save-current-buffer ... ...)) (and (buffer-name temp-buffer) (kill-buffer temp-buffer)))) (setq failed nil)) (if (and failed (file-exists-p filename)) (progn (delete-file filename))))) (vc-mode-line file))
  (if (file-exists-p automatic-backup) (rename-file automatic-backup filename nil) (message "Checking out %s..." filename) (save-current-buffer (set-buffer filebuf) (let ((failed t)) (unwind-protect (let ((coding-system-for-read 'no-conversion) (coding-system-for-write 'no-conversion)) (let ((temp-file filename) (temp-buffer ...)) (unwind-protect (prog1 ... ...) (and ... ...))) (setq failed nil)) (if (and failed (file-exists-p filename)) (progn (delete-file filename))))) (vc-mode-line file)) (message "Checking out %s...done" filename))
  (if (file-exists-p filename) nil (if (file-exists-p automatic-backup) (rename-file automatic-backup filename nil) (message "Checking out %s..." filename) (save-current-buffer (set-buffer filebuf) (let ((failed t)) (unwind-protect (let ((coding-system-for-read ...) (coding-system-for-write ...)) (let (... ...) (unwind-protect ... ...)) (setq failed nil)) (if (and failed (file-exists-p filename)) (progn (delete-file filename))))) (vc-mode-line file)) (message "Checking out %s...done" filename)))
  (let ((automatic-backup (vc-version-backup-file-name file revision)) (filebuf (or (get-file-buffer file) (current-buffer))) (filename (vc-version-backup-file-name file revision 'manual))) (if (file-exists-p filename) nil (if (file-exists-p automatic-backup) (rename-file automatic-backup filename nil) (message "Checking out %s..." filename) (save-current-buffer (set-buffer filebuf) (let ((failed t)) (unwind-protect (let (... ...) (let ... ...) (setq failed nil)) (if (and failed ...) (progn ...)))) (vc-mode-line file)) (message "Checking out %s...done" filename))) (let ((result-buf (find-file-noselect filename))) (save-current-buffer (set-buffer result-buf) (set (make-local-variable 'vc-parent-buffer) filebuf)) result-buf))
  vc-find-revision("c:/blah/blah/blah/ê/trunk/ê.tex" nil)

  … … … …
--8<----8<----8<----8<----8<--  end  -->8---->8---->8---->8---->8----

On line X coding-system-for-write is set to 'no-conversion.
line Y is shown here in function vc-do-command, where I also marked a
line as Z:

--8<----8<----8<----8<----8<-- begin -->8---->8---->8---->8---->8----
            ;; Run synchronously
            (when vc-command-messages
              (message "Running in foreground: %s" full-command))
            (let ((buffer-undo-list t))
Z      (setq status (apply 'process-file command nil t nil squeezed)))
Y    (when (and (not (eq t okstatus))
                       (or (not (integerp status))
                           (and okstatus (< okstatus status))))
              (unless (eq ?\s (aref (buffer-name (current-buffer)) 0))
                (pop-to-buffer (current-buffer))
                (goto-char (point-min))
                (shrink-window-if-larger-than-buffer))
--8<----8<----8<----8<----8<--  end  -->8---->8---->8---->8---->8----

Line Z is where the svn command are called, and there a function
process-file is called, this function calls in turn function
call-process. Function call-process encodes the arguments when
necessary, but the setting of coding-system-for-write to 'no-conversion
disturbs that, and OUCH, the command line passed to svn is not properly
encoded, so with some reason svn unhappilly barks.


OK, I know that some Askese people will probably say something like
« don't use non-ASCII characters in file-names » which is IMHO quite a
ASCII-centric way of thinking, so just in case, my answer is « Î wîll
ùse nôn-ÀSCÎÎ çhærâctèrs név
œrthel
èss, because Î àm mysèlf a nôn-ASCII
pèrsôn ! » ;-)

  Vincent.


In GNU Emacs 27.0.50 (build 1, x86_64-w64-mingw32)
 of 2018-08-10 built on AIGLEROYAL
Windowing system distributor 'Microsoft Corp.', version 10.0.17134
Recent messages:
Continuing through this frame
Checking out c:/Users/Vincent/Documents/administration/divers/20181212_je_veux_être_un_homme_euro/trunk/20181212_je_veux_être_un_homme_euro.tex.~118~...
Running in foreground: svn --non-interactive cat  20181212_je_veux_être_un_homme_euro.tex
CSFW=no-conversion
CS=(windows-1252 . windows-1252)
DIR=c:/Users/Vincent/Documents/administration/divers/20181212_je_veux_être_un_homme_euro/trunk/
SQUEEZED=("--non-interactive" "cat" "20181212_je_veux_être_un_homme_euro.tex")
Entering debugger...
Mark set
Making completion list...

Configured using:
 'configure --prefix=C:/Nos_Programmes/GNU/Emacs
 PKG_CONFIG_PATH=/mingw64/lib/pkgconfig:/mingw64/share/pkgconfig
 'CFLAGS= -Og -g3 -L
 C:/Programmes/installation/emacs-install/libXpm-3.5.8/src' 'CPPFLAGS=
 -DFOR_MSW=1 -I
 C:/Programmes/installation/emacs-install/libXpm-3.5.8/include -I
 C:/Programmes/installation/emacs-install/libXpm-3.5.8/src -L
 C:/Programmes/installation/emacs-install/libXpm-3.5.8/src'
 PKG_CONFIG=/usr/bin/pkg-config.exe'

Configured features:
XPM JPEG TIFF GIF PNG RSVG SOUND NOTIFY ACL GNUTLS LIBXML2 ZLIB
TOOLKIT_SCROLL_BARS THREADS LCMS2

Important settings:
  value of $LC_MESSAGES: C
  value of $LANG: FRA
  locale-coding-system: cp1252

Major mode: Emacs-Lisp

Minor modes in effect:
  diff-auto-refine-mode: t
  shell-dirtrack-mode: t
  TeX-PDF-mode: t
  recentf-mode: t
  tooltip-mode: t
  global-eldoc-mode: t
  eldoc-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  tool-bar-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  line-number-mode: t
  transient-mark-mode: t

Load-path shadows:
c:/Programmes/installation/cedet-install/cedet-git/lisp/speedbar/loaddefs hides c:/Nos_Programmes/GNU/Emacs/share/emacs/27.0.50/lisp/loaddefs
c:/Programmes/installation/cedet-install/cedet-git/lisp/speedbar/loaddefs hides c:/Programmes/installation/cedet-install/cedet-git/lisp/cedet/loaddefs

Features:
(shadow emacsbug vc-annotate vc-filewise edebug completion find-cmd grep
find-dired pp face-remap ediff-ptch eieio-opt vc-git diff-mode help-fns
radix-tree cl-print debug backtrace ediff-vers ediff-merg ediff-wind
ediff-diff ediff-mult ediff-help ediff-init ediff-util ediff vc-rcs
vc-dir bbdb-com vc reftex-cite reftex-parse texmathp org-rmail org-mhe
org-irc org-info org-gnus nnir gnus-sum gnus-group gnus-undo gnus-start
gnus-cloud nnimap nnmail mail-source utf7 netrc nnoo gnus-spec gnus-int
gnus-range gnus-win gnus nnheader org-docview doc-view jka-compr
image-mode org-bibtex bibtex org-bbdb org-w3m org-element org org-macro
org-footnote org-pcomplete org-list org-faces org-entities org-version
ob-emacs-lisp ob ob-tangle org-src ob-ref ob-lob ob-table ob-keys ob-exp
ob-comint ob-core ob-eval org-compat org-macs org-loaddefs find-func
cal-tex cal-menu calendar cal-loaddefs misearch multi-isearch
vc-dispatcher vc-svn preview prv-emacs reftex-dcr reftex reftex-loaddefs
reftex-vars tex-bar tex-buf toolbar-x noutline outline font-latex latex
latex-flymake flymake-proc flymake warnings tex-ispell tex-style
tex-mode shell pcomplete latexenc log-edit add-log pcvs pcvs-parse
pcvs-info pcvs-defs pcvs-util ewoc calc-undo calccomp calc-vec calc-aent
calc-alg calc-menu parse-time vc-cvs hl-line balance calc-forms
network-stream nsm mailalias smtpmail qp sort bbdb-message sendmail
mail-extr message rmc puny format-spec rfc822 mml mml-sec epa gnus-util
rmail rmail-loaddefs mm-decode mm-bodies mm-encode mail-parse rfc2231
rfc2047 rfc2045 mm-util ietf-drums mail-prsvr mailabbrev mail-utils
gmm-utils mailheader elec-pair edmacro kmacro jdee jdee-wiz jdee-test
jdee-archive memoize jdee-stacktrace jdee-refactor dired-aux dired-x
dired dired-loaddefs jdee-project-file jdee-maven dash jdee-keys
jdee-jdb jdee-java-grammar jdee-which-method jdee-font-lock jdee-issues
jdee-help jdee-gen tempo jdee-find jdee-deps jdee-cygwin jdee-custom
jdee-compile jdee-class jdee-bytecode jdee-bug easy-mmode jdee-dbs
jdee-run jdee-jdk-manager jdee-dbo jdee-widgets jdee-db jdee-open-source
semantic/senator semantic/db-find semantic/db-ref semantic/decorate
pulse jdee-import jdee-complete semantic/idle working fame jdee-parse
jdee-backend jdee-bsh jdee-util arc-mode archive-mode jdee-parse-expr
beanshell thingatpt semantic/sb speedbar sb-image dframe jdee-imenu
semantic/imenu imenu semantic/sort semantic/db-file semantic/adebug
eieio-datadebug data-debug cedet-files semantic/db eieio-base
semantic/java semantic/format ezimage semantic/tag-ls semantic/find
semantic/doc semantic/ctxt semantic/util-modes semantic/util semantic
semantic/tag semantic/lex semantic/fw mode-local jdee-avl-tree avl-tree
generator etags xref project efc eieio-compat jdee-annotations
jdee-abbrev jdee-classpath jdee-files jdee-activator jdee-log executable
cus-edit cus-start cus-load compile comint ansi-color ring cedet cc-mode
cc-fonts cc-guess cc-menus cc-cmds cc-styles cc-align cc-engine cc-vars
cc-defs browse-url flatland-black-theme calc-misc calc-arith calc-ext
calc calc-loaddefs calc-macs skeleton rx tex-mik tex crm advice
preview-latex auto-loads tex-site bbdb bbdb-site timezone bbdb-loaddefs
template w32utils cl recentf tree-widget wid-edit load-path-to-cedet-svn
time-date mule-util finder-inf package let-alist derived pcase cl-extra
help-mode easymenu url-handlers url-parse auth-source cl-seq eieio
eieio-core cl-macs eieio-loaddefs password-cache json map url-vars seq
byte-opt gv bytecomp byte-compile cconv epg epg-config subr-x
cl-loaddefs cl-lib tooltip eldoc electric uniquify ediff-hook vc-hooks
lisp-float-type mwheel dos-w32 ls-lisp disp-table term/w32-win w32-win
w32-vars term/common-win tool-bar dnd fontset image regexp-opt fringe
tabulated-list replace newcomment text-mode elisp-mode lisp-mode
prog-mode register page menu-bar rfn-eshadow isearch timer select
scroll-bar mouse jit-lock font-lock syntax facemenu font-core
term/tty-colors frame cl-generic cham georgian utf-8-lang misc-lang
vietnamese tibetan thai tai-viet lao korean japanese eucjp-ms cp51932
hebrew greek romanian slovak czech european ethiopic indian cyrillic
chinese composite charscript charprop case-table epa-hook jka-cmpr-hook
help simple abbrev obarray minibuffer cl-preloaded nadvice loaddefs
button faces cus-face macroexp files text-properties overlay sha1 md5
base64 format env code-pages mule custom widget hashtable-print-readable
backquote threads w32notify w32 lcms2 multi-tty make-network-process
emacs)

Memory information:
((conses 16 705712 103613)
 (symbols 56 59375 1)
 (miscs 48 1904 2148)
 (strings 32 161627 7850)
 (string-bytes 1 5385398)
 (vectors 16 71779)
 (vector-slots 8 2019402 40514)
 (floats 8 326 902)
 (intervals 56 18236 3220)
 (buffers 992 66))
Reply | Threaded
Open this post in threaded view
|

bug#34350: 27.0.50; ediff-revision broken with SVN backend + non ascii chars both in directory and in filename

Eli Zaretskii
> From: Vincent Belaïche <[hidden email]>
> Date: Wed, 06 Feb 2019 20:52:38 +0100
> Cc: Vincent Belaïche <[hidden email]>
>
> After some debugging, I realized that the whole thing executes with
> default-directory set to "c:/blah/blah/blah/ê/trunk/" the following
> two commands:
>
>  (process-file "svn" nil t nil "--non-interactive" "status" "-v" "ê.tex")
>
> and then:
>
>  (process-file "svn" nil t nil "--non-interactive" "cat" "ê.tex")
>
> The first command works quite fine, but the second one fails, and having
> a look info the " *temp*" buffer, I saw that svn angrilly barks with
> some error message like this one:
>
> --8<----8<----8<----8<----8<-- begin -->8---->8---->8---->8---->8----
> svn: warning: W200005: 'C:\blah\blah\blah\\trunk\ª.tex' is not under version control
> svn: E200009: Could not cat all targets because some targets are not versioned
> svn: E200009: Illegal target for the requested operation
> --8<----8<----8<----8<----8<--  end  -->8---->8---->8---->8---->8----
>
> looking more carefully at things, and having inspected the C code of
> function call-process, I realized that the variable
> coding-system-for-write has some importance, and tracing this I realized
> that when the first svn command is called the variable
> coding-system-for-write is nil, while when the second svn command is
> called it is 'no-conversion, which makes it fail.

Ouch!  There's a real mess in vc-find-revision and its backend
implementations wrt encoding and decoding.  I will address the general
problem separately, but for now please see if the patch below fixes
the immediate problem with SVN.

> OK, I know that some Askese people will probably say something like
> « don't use non-ASCII characters in file-names » which is IMHO quite a
> ASCII-centric way of thinking, so just in case, my answer is « Î wîll
> ùse nôn-ÀSCÎÎ çhærâctèrs névœrthelèss, because Î àm mysèlf a nôn-ASCII
> pèrsôn ! » ;-)

Is this part relevant?  You didn't really think we refuse to support
non-ASCII file names, did you?

diff --git a/lisp/vc/vc.el b/lisp/vc/vc.el
index 9925196..326284f 100644
--- a/lisp/vc/vc.el
+++ b/lisp/vc/vc.el
@@ -1966,10 +1966,13 @@ vc-find-revision
  (with-current-buffer filebuf
   (let ((failed t))
     (unwind-protect
- (let ((coding-system-for-read 'no-conversion)
-      (coding-system-for-write 'no-conversion))
+ (let ((coding-system-for-read 'no-conversion))
   (with-temp-file filename
     (let ((outbuf (current-buffer)))
+                      ;; We will read the backend's output with no
+                      ;; conversions, so we should also save the
+                      ;; temporary file with no encoding conversions.
+                      (setq buffer-file-coding-system 'no-conversion)
       ;; Change buffer to get local value of
       ;; vc-checkout-switches.
       (with-current-buffer filebuf



Reply | Threaded
Open this post in threaded view
|

bug#34350: 27.0.50; ediff-revision broken with SVN backend + non ascii chars both in directory and in filename

Eli Zaretskii
In reply to this post by Vincent Belaïche-2
Stefan, Dmitry, can I have your attention please?

I think there's a real mess regarding encoding and decoding stuff in
vc-find-revision and its backend implementations.  For some reason,
several backend implementations bind both coding-system-for-read and
coding-system-for-write to no-conversion when they invoke the VCS to
produce a file at a certain revision.  Here's an example from
vc-git.el:

  (defun vc-git-find-revision (file rev buffer)
    (let* (process-file-side-effects
           (coding-system-for-read 'binary)
           (coding-system-for-write 'binary)
           (fullname
            (let ((fn (vc-git--run-command-string
                       file "ls-files" "-z" "--full-name" "--")))
              ;; ls-files does not return anything when looking for a
              ;; revision of a file that has been renamed or removed.
              (if (string= fn "")
                  (file-relative-name file (vc-git-root default-directory))
                (substring fn 0 -1)))))
      (vc-git-command
       buffer 0
       nil
       "cat-file" "blob" (concat (if rev rev "HEAD") ":" fullname))))

Any idea why it insists on disabling code-conversions?

More generally, the find-revision method is documented as

  Fetch revision REV of file FILE and put it into BUFFER.

There's nothing here that hints that the resulting buffer will not
look like any visited file, i.e. after decoding.  Am I missing
something?

In addition to producing undecoded buffer, these bindings have the
adverse effect of forcing Emacs not to encode command-line arguments
we pass to the VCS, which is the immediate cause for the current bug.

It is my impression that one of the backends did these bindings for
some reason, and then others simply copy-pasted from it.

I'd like to fix this mess, if possible.

Thanks in advance for any insights.



Reply | Threaded
Open this post in threaded view
|

bug#34350: 27.0.50; ediff-revision broken with SVN backend + non ascii chars both in directory and in filename

Vincent Belaïche-2
In reply to this post by Eli Zaretskii

Le 07/02/2019 à 16:41, Eli Zaretskii a écrit :
>> From: Vincent Belaïche <[hidden email]>
>> Date: Wed, 06 Feb 2019 20:52:38 +0100
>> Cc: Vincent Belaïche <[hidden email]>
>>

[...]

>
>> OK, I know that some Askese people will probably say something like
>> « don't use non-ASCII characters in file-names » which is IMHO quite a
>> ASCII-centric way of thinking, so just in case, my answer is « Î wîll
>> ùse nôn-ÀSCÎÎ çhærâctèrs névœrthelèss, because Î àm mysèlf a nôn-ASCII
>> pèrsôn ! » ;-)
>
> Is this part relevant?  You didn't really think we refuse to support
> non-ASCII file names, did you?
>

[...]

No, I did not, that was just a joke, sorry for not being funny.

Concerning your patch, it made it work.

  Vincent.



Reply | Threaded
Open this post in threaded view
|

bug#34350: 27.0.50; ediff-revision broken with SVN backend + non ascii chars both in directory and in filename

Eli Zaretskii
> From: Vincent Belaïche <[hidden email]>
> Cc: Vincent Belaïche <[hidden email]>
> Date: Thu, 07 Feb 2019 23:27:28 +0100
>
> > Is this part relevant?  You didn't really think we refuse to support
> > non-ASCII file names, did you?
>
> [...]
>
> No, I did not, that was just a joke, sorry for not being funny.

Oh, that was a joke?  Sorry for not catching it.

> Concerning your patch, it made it work.

Thanks, pushed to the emacs-26 branch.

This only fixes some VC backends, there are others which bind
coding-system-for-write in their backend implementations, which looks
simply wrong to me.  I hope we will be able to fix those as well, but
perhaps not in Emacs 26.2, if the changes I have in mind will be
deemed unsafe.



Reply | Threaded
Open this post in threaded view
|

bug#34350: 27.0.50; ediff-revision broken with SVN backend + non ascii chars both in directory and in filename

Stefan Monnier
In reply to this post by Eli Zaretskii
> I think there's a real mess regarding encoding and decoding stuff in
> vc-find-revision and its backend implementations.

I can believe it.

I think intuitively, in terms of encoding for the file's contents the
backends should always return a byte-sequence (i.e. with no-conversion)
and the front-end should then decide how to decode it (e.g., obeying
the -*- coding -*- cookie and such).

But since that holds for all backends, ideally they should be called in
such a way that they automatically return bytes (rather than chars)
without any extra effort (e.g. without manually binding
coding-system-for-read/write).


        Stefan



Reply | Threaded
Open this post in threaded view
|

bug#34350: 27.0.50; ediff-revision broken with SVN backend + non ascii chars both in directory and in filename

Dmitry Gutov
In reply to this post by Eli Zaretskii
On 07.02.2019 18:50, Eli Zaretskii wrote:

> Any idea why it insists on disabling code-conversions?

This was well before my time, and the only people I remember messing
with these variables anywhere in VC ever since are you (for
Windows-related fixes) and Michael. So this might require some digging
with vc-annotate.

Also note that this affects two commands: one that returns the file
name, and another, its contents. I'm not sure what encoding 'git
cat-file blob' outputs in.

> In addition to producing undecoded buffer, these bindings have the
> adverse effect of forcing Emacs not to encode command-line arguments
> we pass to the VCS, which is the immediate cause for the current bug.

The adverse effect sounds like a separate bug.



Reply | Threaded
Open this post in threaded view
|

bug#34350: 27.0.50; ediff-revision broken with SVN backend + non ascii chars both in directory and in filename

Eli Zaretskii
In reply to this post by Stefan Monnier
> From: Stefan Monnier <[hidden email]>
> Cc: Dmitry Gutov <[hidden email]>,  [hidden email],  [hidden email]
> Date: Fri, 08 Feb 2019 11:08:36 -0500
>
> I think intuitively, in terms of encoding for the file's contents the
> backends should always return a byte-sequence (i.e. with no-conversion)
> and the front-end should then decide how to decode it (e.g., obeying
> the -*- coding -*- cookie and such).

Why do we need to leave this to the front end?  It's a waste of cycles
to do decoding manually in Lisp, and it also pushes this obscure art
to application levels, where many Lisp programmers won't know how to
DTRT.

IMO, it's the other way around: by default the back-end should not
insist on any encoding, except where the VCS requires it, and it
should always by default decode the stuff it returns.  The front end
should disable code conversions when needed, and the back-end should
honor that.  This will mostly do TRT, because in the vast majority of
cases users do want eventually to have the stuff decoded, the number
of cases where we want the contrary is minuscule.

VC is just a fancy way of visiting files, so it should normally do the
same, i.e. decode.



Reply | Threaded
Open this post in threaded view
|

bug#34350: 27.0.50; ediff-revision broken with SVN backend + non ascii chars both in directory and in filename

Eli Zaretskii
In reply to this post by Dmitry Gutov
> Cc: [hidden email], [hidden email]
> From: Dmitry Gutov <[hidden email]>
> Date: Fri, 8 Feb 2019 20:25:35 +0300
>
> On 07.02.2019 18:50, Eli Zaretskii wrote:
>
> > Any idea why it insists on disabling code-conversions?
>
> This was well before my time, and the only people I remember messing
> with these variables anywhere in VC ever since are you (for
> Windows-related fixes) and Michael. So this might require some digging
> with vc-annotate.

How is vc-annotate relevant to vc-git-find-revision and the issue at
hand?  The last thing vc-find-revision does is to visit the file
written by the back-end, and it visits it normally, with
find-file-noselect, which decodes the contents as usual.  The problems
I see are with forcing no-conversion in the intermediate operations in
the back-end, where it extracts the specified version to a buffer.

> Also note that this affects two commands: one that returns the file
> name, and another, its contents. I'm not sure what encoding 'git
> cat-file blob' outputs in.

We already have in vc-git a variable that holds that encoding (UTF-8
by default).

> > In addition to producing undecoded buffer, these bindings have the
> > adverse effect of forcing Emacs not to encode command-line arguments
> > we pass to the VCS, which is the immediate cause for the current bug.
>
> The adverse effect sounds like a separate bug.

It's not a bug, it's a documented feature.  coding-system-for-write
affects both I/O from and to a subprocess and the encoding of the
command-line arguments we pass to that subprocess.  That's why we
should be extra-careful when binding coding-system-for-write around
invocations of other programs.



Reply | Threaded
Open this post in threaded view
|

bug#34350: 27.0.50; ediff-revision broken with SVN backend + non ascii chars both in directory and in filename

Stefan Monnier
In reply to this post by Eli Zaretskii
>> I think intuitively, in terms of encoding for the file's contents the
>> backends should always return a byte-sequence (i.e. with no-conversion)
>> and the front-end should then decide how to decode it (e.g., obeying
>> the -*- coding -*- cookie and such).
> Why do we need to leave this to the front end?

So that it's not half-implemented differently in every backend.

> It's a waste of cycles to do decoding manually in Lisp,

It'd be better to decode "on the fly" rather than first insert the byte
stream in a buffer and then decode it.  No doubt.
But I can't see how to do that and handle -*-coding-*-,
auto-coding-regexp-alist, and friends.

> and it also pushes this obscure art to application levels,

The "front-end" is `vc-find-revision`.  All other code should be layered
on top of that one, so it's done at only one place.  If that doesn't
work (because vc-find-revision is not flexible enough) then we should
move this decoding code to a middle-layer between vc-find-revision and
(vc-call find-revision ...) that all users of `find-revision` can then use.

> insist on any encoding, except where the VCS requires it, and it

I don't know of any VCS that enforces any kind of encoding on the files
it manages.  AFAIK they pretty much all manage files made of lines where
the lines are made of bytes (with some extra accommodations for files
not made of lines).  Some try to handle line-end conversion to
some extent, but that doesn't really affect us anyway.


        Stefan



Reply | Threaded
Open this post in threaded view
|

bug#34350: 27.0.50; ediff-revision broken with SVN backend + non ascii chars both in directory and in filename

Eli Zaretskii
> From: Stefan Monnier <[hidden email]>
> Cc: [hidden email],  [hidden email],  [hidden email]
> Date: Fri, 08 Feb 2019 16:50:58 -0500
>
> >> I think intuitively, in terms of encoding for the file's contents the
> >> backends should always return a byte-sequence (i.e. with no-conversion)
> >> and the front-end should then decide how to decode it (e.g., obeying
> >> the -*- coding -*- cookie and such).
> > Why do we need to leave this to the front end?
>
> So that it's not half-implemented differently in every backend.

But every back-end has its own peculiar ways to do the job.  E.g., Git
needs to call "git ls-files" first, and we then need to read the file
names before calling "git cat".  Other VCSes don't necessarily need
that.  So different implementations cannot be worked around and no
higher layers can even know what the lower layers do to set up
encoding correctly for all of them, at least not in most cases.

> > It's a waste of cycles to do decoding manually in Lisp,
>
> It'd be better to decode "on the fly" rather than first insert the byte
> stream in a buffer and then decode it.  No doubt.
> But I can't see how to do that and handle -*-coding-*-,
> auto-coding-regexp-alist, and friends.

What do you mean by "how"?  Just do the normal I/O, and all of those
will be taken care of.  Like when we visit a file.  What am I missing?

> > and it also pushes this obscure art to application levels,
>
> The "front-end" is `vc-find-revision`.

So you are saying that no other VC function cal call the backend's
find-revision method?  But we already have 2 more functions that do
it, and I see no problems with doing that.

> All other code should be layered on top of that one, so it's done at
> only one place.  If that doesn't work (because vc-find-revision is
> not flexible enough) then we should move this decoding code to a
> middle-layer between vc-find-revision and (vc-call find-revision
> ...) that all users of `find-revision` can then use.

I don't think this is possible in general, because different callers
have different needs wrt to encoding/decoding.

> > insist on any encoding, except where the VCS requires it, and it
>
> I don't know of any VCS that enforces any kind of encoding on the files
> it manages.  AFAIK they pretty much all manage files made of lines where
> the lines are made of bytes (with some extra accommodations for files
> not made of lines).  Some try to handle line-end conversion to
> some extent, but that doesn't really affect us anyway.

You forget VCS operations that return stuff other than the complete
file's contents, like vc-log or vc-dff or calls that return file names
etc.



Reply | Threaded
Open this post in threaded view
|

bug#34350: 27.0.50; ediff-revision broken with SVN backend + non ascii chars both in directory and in filename

Stefan Monnier
>> >> I think intuitively, in terms of encoding for the file's contents the
>> >> backends should always return a byte-sequence (i.e. with no-conversion)
>> >> and the front-end should then decide how to decode it (e.g., obeying
>> >> the -*- coding -*- cookie and such).
>> > Why do we need to leave this to the front end?
>>
>> So that it's not half-implemented differently in every backend.
>
> But every back-end has its own peculiar ways to do the job.  E.g., Git
> needs to call "git ls-files" first, and we then need to read the file
> names before calling "git cat".  Other VCSes don't necessarily need
> that.  So different implementations cannot be worked around and no
> higher layers can even know what the lower layers do to set up
> encoding correctly for all of them, at least not in most cases.

Then the front/middle end shouldn't set anything up for the backend, but
the backend still needs to return just undecoded bytes.

>> > It's a waste of cycles to do decoding manually in Lisp,
>>
>> It'd be better to decode "on the fly" rather than first insert the byte
>> stream in a buffer and then decode it.  No doubt.
>> But I can't see how to do that and handle -*-coding-*-,
>> auto-coding-regexp-alist, and friends.
> What do you mean by "how"?  Just do the normal I/O, and all of those
> will be taken care of.  Like when we visit a file.  What am I missing?

The `find-revision` backend operation must put the revision's bytes into
the provided buffer, indeed similarly to when we visit a file.
But the difference is that in 99% of the cases, the bytes don't come
from a file but from a process's stdout, so the backed can't directly
use the "visit a file" trick.

> So you are saying that no other VC function cal call the backend's
> find-revision method?

That's the idea, yes.

> But we already have 2 more functions that do
> it, and I see no problems with doing that.

Then we need an additional middle layer.

>> All other code should be layered on top of that one, so it's done at
>> only one place.  If that doesn't work (because vc-find-revision is
>> not flexible enough) then we should move this decoding code to a
>> middle-layer between vc-find-revision and (vc-call find-revision
>> ...) that all users of `find-revision` can then use.
>
> I don't think this is possible in general, because different callers
> have different needs wrt to encoding/decoding.

Hmm... looking at the code I see 3 places where we call the
`find-revision` backend:

- vc-find-revision-save
- vc-find-revision-no-save
- vc-default-revert

The last one should indeed call the backend directly (as it currently
does) and should be changed not to bind coding-system-for-read/write and
instead to assume that the backend deals with bytes.

The other two are begging to be unified to reduce code redundancy and
are the ones that need the do the file-like decoding (and they indeed do
it).

> You forget VCS operations that return stuff other than the complete
> file's contents, like vc-log or vc-dff or calls that return file names
> etc.

Not really forgetting, no.  Instead I was talking specifically about
things like `find-revision` (i.e. about the content of files).
For filenames, commit messages and other metadata the situation is quite
different, indeed.


        Stefan



Reply | Threaded
Open this post in threaded view
|

bug#34350: 27.0.50; ediff-revision broken with SVN backend + non ascii chars both in directory and in filename

Dmitry Gutov
In reply to this post by Eli Zaretskii
On 09.02.2019 00:45, Eli Zaretskii wrote:
> How is vc-annotate relevant to vc-git-find-revision and the issue at
> hand?

It's an investigation tool.

> coding-system-for-write
> affects both I/O from and to a subprocess and the encoding of the
> command-line arguments we pass to that subprocess.

Aren't there any programs that expect one encoding for their arguments,
yet output contents in other encodings sometimes?



Reply | Threaded
Open this post in threaded view
|

bug#34350: 27.0.50; ediff-revision broken with SVN backend + non ascii chars both in directory and in filename

Eli Zaretskii
> Cc: [hidden email], [hidden email],
>  [hidden email]
> From: Dmitry Gutov <[hidden email]>
> Date: Sat, 9 Feb 2019 10:58:38 +0300
>
> > coding-system-for-write
> > affects both I/O from and to a subprocess and the encoding of the
> > command-line arguments we pass to that subprocess.
>
> Aren't there any programs that expect one encoding for their arguments,
> yet output contents in other encodings sometimes?

It happens, yes.  Which is why by default coding-system-for-write is
not bound.  When you bind it while invoking a subprocess, you need to
know what you are doing.  If there's a conflict between the two
encodings, there are a couple of ways to deal with that.

However, in the case in point, my reading of the code indicates that
there's no need to bind coding-system-for-write at all.  It seems we
do that due to some copy/paste habit more than due to a real need.
Let's see if I succeed in convincing you and Stefan.



Reply | Threaded
Open this post in threaded view
|

bug#34350: 27.0.50; ediff-revision broken with SVN backend + non ascii chars both in directory and in filename

Eli Zaretskii
In reply to this post by Stefan Monnier
> From: Stefan Monnier <[hidden email]>
> Cc: [hidden email], [hidden email]
> Date: Fri, 08 Feb 2019 18:03:49 -0500
>
> >> > It's a waste of cycles to do decoding manually in Lisp,
> >>
> >> It'd be better to decode "on the fly" rather than first insert the byte
> >> stream in a buffer and then decode it.  No doubt.
> >> But I can't see how to do that and handle -*-coding-*-,
> >> auto-coding-regexp-alist, and friends.
> > What do you mean by "how"?  Just do the normal I/O, and all of those
> > will be taken care of.  Like when we visit a file.  What am I missing?
>
> The `find-revision` backend operation must put the revision's bytes into
> the provided buffer, indeed similarly to when we visit a file.
> But the difference is that in 99% of the cases, the bytes don't come
> from a file but from a process's stdout, so the backed can't directly
> use the "visit a file" trick.

Isn't that sub-optimal design, then?  Those back-ends that can produce
a file for a given revision (there are at least 3 of them, AFAIK)
should be left to their devices; those which cannot and use some kind
of 'cat' command instead can be invoked with output redirected to a
file.  This should be lightning-fast with most (all?) VCSes.  Instead,
we invoke the VCS with output redirected to a pipe, slowly read that
output from the pipe into a buffer, then write the resulting buffer to
a file.  Why?

> - vc-find-revision-save
> - vc-find-revision-no-save
> - vc-default-revert
>
> The last one should indeed call the backend directly (as it currently
> does) and should be changed not to bind coding-system-for-read/write and
> instead to assume that the backend deals with bytes.
>
> The other two are begging to be unified to reduce code redundancy and
> are the ones that need the do the file-like decoding (and they indeed do
> it).

If vc-find-revision and vc-find-revision-no-save need to enforce
no-conversion, then why do most of the back-ends do that as well?  If
we decide that back-ends produce undecoded buffers, then vc.el
shouldn't be bothered with forcing coding-system-for-read/write at
all, right?  This duplication is a large part of the problem here.

In addition, while I could understand binding of
coding-system-for-read in the backend's find-revision (assuming we
want the resulting buffer remain undecoded), why should the back-end
also bind coding-system-for-write?  I see absolutely no reason for
that.  E.g., look at this example:

  (defun vc-hg-find-revision (file rev buffer)
    (let ((coding-system-for-read 'binary)
          (coding-system-for-write 'binary))
      (if rev
          (vc-hg-command buffer 0 file "cat" "-r" rev)
        (vc-hg-command buffer 0 file "cat"))))

Why on earth does this bind coding-system-for-write, when it doesn't
write anything at all, it only reads?

> > You forget VCS operations that return stuff other than the complete
> > file's contents, like vc-log or vc-dff or calls that return file names
> > etc.
>
> Not really forgetting, no.  Instead I was talking specifically about
> things like `find-revision` (i.e. about the content of files).
> For filenames, commit messages and other metadata the situation is quite
> different, indeed.

But vc-git.el, for example, uses both in its find-revision
implementation.  It therefore must use more complicated juggling with
binding the various coding-system variables.  Once again, this is an
argument in favor of leaving the encoding/decoding stuff to the
back-end.



Reply | Threaded
Open this post in threaded view
|

bug#34350: 27.0.50; ediff-revision broken with SVN backend + non ascii chars both in directory and in filename

Stefan Monnier
>> The `find-revision` backend operation must put the revision's bytes into
>> the provided buffer, indeed similarly to when we visit a file.
>> But the difference is that in 99% of the cases, the bytes don't come
>> from a file but from a process's stdout, so the backed can't directly
>> use the "visit a file" trick.
> Isn't that sub-optimal design, then?  Those back-ends that can produce
> a file for a given revision (there are at least 3 of them, AFAIK)
> should be left to their devices; those which cannot and use some kind
> of 'cat' command instead can be invoked with output redirected to a
> file.

FWIW, I find it cleaner to put the result in a buffer than in a file,
since with files we have to choose a file name, which implies things
like race conditions, accessrights/security issues, interaction with
Tramp, and whatnot.

So I think the API seems clean enough.  Doesn't insert-file-contents do
all the decoding dance done in find-file (such as
auto-coding-regexp-alist, ...)?

> Instead, we invoke the VCS with output redirected to a pipe, slowly
> read that output from the pipe into a buffer,

Why should reading from a pipe be slower than from a file?

> then write the resulting buffer to a file.  Why?

The front-end part may want to do that, but not necessarily.
We did that back in the CVS days because it took a long time to extract
a particular revision, so this was a way to cache the result.

With Git, I find that I basically never want to save the result to
a file.

>> - vc-find-revision-save
>> - vc-find-revision-no-save
>> - vc-default-revert
>>
>> The last one should indeed call the backend directly (as it currently
>> does) and should be changed not to bind coding-system-for-read/write and
>> instead to assume that the backend deals with bytes.
>>
>> The other two are begging to be unified to reduce code redundancy and
>> are the ones that need the do the file-like decoding (and they indeed do
>> it).
>
> If vc-find-revision and vc-find-revision-no-save need to enforce
> no-conversion, then why do most of the back-ends do that as well?

The front-end functions (vc-find-revision-* and vc-default-revert)
should not bind coding-system-for-read/write and should leave that to
the backend, since it seems the backends need to do this kind of
coding-system control more finely.

> If we decide that back-ends produce undecoded buffers, then vc.el
> shouldn't be bothered with forcing coding-system-for-read/write at
> all, right?

Right.

> In addition, while I could understand binding of
> coding-system-for-read in the backend's find-revision (assuming we
> want the resulting buffer remain undecoded), why should the back-end
> also bind coding-system-for-write?  I see absolutely no reason for
> that.  E.g., look at this example:

I don't see a reason either, other than the coder not being sure which
of read/write influences which of send/receive, maybe.

>   (defun vc-hg-find-revision (file rev buffer)
>     (let ((coding-system-for-read 'binary)
>  (coding-system-for-write 'binary))
>       (if rev
>  (vc-hg-command buffer 0 file "cat" "-r" rev)
> (vc-hg-command buffer 0 file "cat"))))
>
> Why on earth does this bind coding-system-for-write, when it doesn't
> write anything at all, it only reads?

Plain bug, AFAICT.

> But vc-git.el, for example, uses both in its find-revision
> implementation.  It therefore must use more complicated juggling with
> binding the various coding-system variables.  Once again, this is an
> argument in favor of leaving the encoding/decoding stuff to the
> back-end.

Agreed: it should be the backend's responsibility to control the
encoding/decoding setup in order for the result in the buffer to be
undecoded bytes (it'd be nice to be able to do it at only one spot
instead of doing it "by hand" in each and every backend, but IIUC this
is not an option).


        Stefan



Reply | Threaded
Open this post in threaded view
|

bug#34350: 27.0.50; ediff-revision broken with SVN backend + non ascii chars both in directory and in filename

Eli Zaretskii
> From: Stefan Monnier <[hidden email]>
> Cc: [hidden email], [hidden email]
> Date: Sat, 09 Feb 2019 08:12:45 -0500
>
> > Isn't that sub-optimal design, then?  Those back-ends that can produce
> > a file for a given revision (there are at least 3 of them, AFAIK)
> > should be left to their devices; those which cannot and use some kind
> > of 'cat' command instead can be invoked with output redirected to a
> > file.
>
> FWIW, I find it cleaner to put the result in a buffer than in a file,
> since with files we have to choose a file name, which implies things
> like race conditions, accessrights/security issues, interaction with
> Tramp, and whatnot.

But vc-find-revision eventually does put it in a file, so ...

> Doesn't insert-file-contents do all the decoding dance done in
> find-file (such as auto-coding-regexp-alist, ...)?

It does (and we call find-file-noselect anyway, not
insert-file-contents).

> > Instead, we invoke the VCS with output redirected to a pipe, slowly
> > read that output from the pipe into a buffer,
>
> Why should reading from a pipe be slower than from a file?

Because we read in small chunks, and need to enlarge the gap each
time.

> The front-end functions (vc-find-revision-* and vc-default-revert)
> should not bind coding-system-for-read/write and should leave that to
> the backend, since it seems the backends need to do this kind of
> coding-system control more finely.
>
> > If we decide that back-ends produce undecoded buffers, then vc.el
> > shouldn't be bothered with forcing coding-system-for-read/write at
> > all, right?
>
> Right.
>
> > In addition, while I could understand binding of
> > coding-system-for-read in the backend's find-revision (assuming we
> > want the resulting buffer remain undecoded), why should the back-end
> > also bind coding-system-for-write?  I see absolutely no reason for
> > that.  E.g., look at this example:
>
> I don't see a reason either, other than the coder not being sure which
> of read/write influences which of send/receive, maybe.
>
> >   (defun vc-hg-find-revision (file rev buffer)
> >     (let ((coding-system-for-read 'binary)
> >  (coding-system-for-write 'binary))
> >       (if rev
> >  (vc-hg-command buffer 0 file "cat" "-r" rev)
> > (vc-hg-command buffer 0 file "cat"))))
> >
> > Why on earth does this bind coding-system-for-write, when it doesn't
> > write anything at all, it only reads?
>
> Plain bug, AFAICT.
>
> > But vc-git.el, for example, uses both in its find-revision
> > implementation.  It therefore must use more complicated juggling with
> > binding the various coding-system variables.  Once again, this is an
> > argument in favor of leaving the encoding/decoding stuff to the
> > back-end.
>
> Agreed: it should be the backend's responsibility to control the
> encoding/decoding setup in order for the result in the buffer to be
> undecoded bytes (it'd be nice to be able to do it at only one spot
> instead of doing it "by hand" in each and every backend, but IIUC this
> is not an option).

OK, so we basically agree.  I will try to remove the cruft from some
of that, thanks for the feedback.



Reply | Threaded
Open this post in threaded view
|

bug#34350: 27.0.50; ediff-revision broken with SVN backend + non ascii chars both in directory and in filename

Stefan Monnier
>> Why should reading from a pipe be slower than from a file?
> Because we read in small chunks,

I understand that what we currently do, but is it really inevitable?
I know you can't transfer an arbitrarily large text with a single `write`
at one end of a pipe and a single `read` at the other (contrary to what
happens with files), but it should be possible to use chunks large
enough that the efficiency is comparable.

> and need to enlarge the gap each time.

Yes, this used to be a real problems in set-buffer-multibyte (with O(N²)
behavior), but nowadays it should be a relatively small constant factor.
Of course, if needed, we could provide some way for the Elisp code to
"preallocate" space to avoid resizing the buffer too many times.

[ Goes on dreaming about a buffer representation made of a tree of
  immutable text-chunks... ]

> OK, so we basically agree.  I will try to remove the cruft from some
> of that, thanks for the feedback.

Right.   Maybe we could improve the backend API, but that's a side
problem: the real pressing problem is just cruft and should not be too
hard to fix.


        Stefan



Reply | Threaded
Open this post in threaded view
|

bug#34350: 27.0.50; ediff-revision broken with SVN backend + non ascii chars both in directory and in filename

Vincent Belaïche-2
In reply to this post by Dmitry Gutov

Le 09/02/2019 à 08:58, Dmitry Gutov a écrit :

> On 09.02.2019 00:45, Eli Zaretskii wrote:
>> How is vc-annotate relevant to vc-git-find-revision and the issue at
>> hand?
>
> It's an investigation tool.
>
>> coding-system-for-write affects both I/O from and to a subprocess and
>> the encoding of the command-line arguments we pass to that
>> subprocess.
>
> Aren't there any programs that expect one encoding for their
> arguments, yet output contents in other encodings sometimes?

Yes there are. For instance if you edit a document with LaTeX and use
pdflatex + koi8 for the encoding, then all the compilation error
messages showing the error messages interleaved with pieces of text will
be a byte stream where the pieces of text will correspond to
characteters coded in koi8.

But they are also interleaved with filenames which are also a byte
stream which correspond to characters encoded with some encoding
depending on the filesystem, for instance that will be utf8 for an
MSW-10 machine as it seems that my pdflatex port (MiKTeX) has them in
this format — for filenames all in ASCII that will just make no
difference because both utf8 & koi8 are supersets of ASCII. Internally,
AFAIK MSW10 uses UTF16, I speculate that my pdflatex port to MSW uses
UTF16 wchar_t to access the files and does some utf8/utf16
encoding/decoding w.r.t to input/output with tex source code/tex log.

Of course users will want to have the log shown with character decoding
using koi8, so that the pieces of text in the log are understandable, so
that will mess-up the non-ASCII filenames, for instance if the filename
is « Макрон—Узурпатор.tex » , it will show out like
« п°п╟п╨я─п╬п╫Б─■пёп╥я┐я─п©п╟я┌п╬я─.tex » in the log, and AUCTeX won't
be able to jump at the error line when you do « M-g n » — maybe someday
I submit some contribution for AUCTeX to be configurable to decode the
filenames, or maybe by that time I simply use xelatex and everything is
in utf8 :-/


On the other hand the pdflatex command will accept arguments encoded in
whatever the OS requires — I don't know what my pdflatex port internally
does, probably it has some « int wmain(int ac, wchar_t const* av[]) »
prototype and does some utf16/utf8 encoding of the command line args in
order to get a byte stream.

So, still under MSW, if my pdflatex is launched from a powershell
window, filenames and any arguments are passed to the command as wide
chars with utf16 encoding and pdflatex.

So if the filename is №1.tex, and I type from a powershell prompt:

  pdflatex №1.tex

that will work fine — provided that the file exists — because, although
the character « № » does not exist in window1252, both powershell and my
pdflatex port use utf16.

Now, if I try the same command from a cmd prompt, funnilly that also
will work — however a DOS .bat file encoded in utf-8 with the same
command, even if you launch cmd with /U option…

But if now I try this command from Emacs (with « M-x compile », or
« M-& », or directly by eval of « (call-process "pdflatex" nil t nil
"№1.tex") » that will not work either. And that will not work either
from a *shell* buffer using cmdproxy.exe. This is probably because Emacs
uses the standard system call « system(…) » function — I speculate — and
the latter accepts windows1252 encoded command line, and not the
microsoft specific _wsystem one.

BTW, I don't know whether _wsystem is ported to MinGW, I would not be
surprised if it is not : for instance « int main(int ac, wchar_t
const*av[]) » prototypes aren't ported to MinGW, they do a linker error,
and one has to resort to GetCommandLineW & CommandLineToArgvW windows
API functions. An alternative is 1) to use the powershell quote syntax
for the command line + utf16 enocoding, 2) to base-64 encode the command
line, and 3) pass the encoded block to powershell with the
-EncodedCommand option through the usual system(…)  call.


So, to summarise :

1. encoding of filenames, encoding of command line, and encoding of file
   content are 3 different things. So, it is a bit surprising if
   coding-system-for-write affects all of them in the same way.

2. filename encoding may be different on the command line and on the
   input/output streams (e.g. pdflatex called from powershell has utf16
   on the command line, and utf8 in the files for filenames.

3. filename/argument encoding depends on the command (under MSW, some
   commands have « int main(int ac, char const*av[]) » under the hood
   and as such they expect arguments to be windows1252 encoded — in
   Western Europe — other have « int main(int ac, wchar_t const*av[]) »
   under the hood and as such they expect utf16 encoded arguments. So
   with the first type of command you can have « œ » but not « № » in
   the arguments, while with the second type you can have both of them.


  V.




Reply | Threaded
Open this post in threaded view
|

bug#34350: 27.0.50; ediff-revision broken with SVN backend + non ascii chars both in directory and in filename

Vincent Belaïche-2
In reply to this post by Stefan Monnier

Le 08/02/2019 à 22:50, Stefan Monnier a écrit :

>>> I think intuitively, in terms of encoding for the file's contents
>>> the backends should always return a byte-sequence (i.e. with
>>> no-conversion) and the front-end should then decide how to decode it
>>> (e.g., obeying the -*- coding -*- cookie and such).
>> Why do we need to leave this to the front end?
>
> So that it's not half-implemented differently in every backend.
>
>> It's a waste of cycles to do decoding manually in Lisp,
>
> It'd be better to decode "on the fly" rather than first insert the
> byte stream in a buffer and then decode it.  No doubt.  But I can't
> see how to do that and handle -*-coding-*-, auto-coding-regexp-alist,
> and friends.
>
>> and it also pushes this obscure art to application levels,
>
> The "front-end" is `vc-find-revision`.  All other code should be
> layered on top of that one, so it's done at only one place.  If that
> doesn't work (because vc-find-revision is not flexible enough) then we
> should move this decoding code to a middle-layer between
> vc-find-revision and (vc-call find-revision ...) that all users of
> `find-revision` can then use.
>
>> insist on any encoding, except where the VCS requires it, and it
>
> I don't know of any VCS that enforces any kind of encoding on the
> files it manages.  AFAIK they pretty much all manage files made of
> lines where the lines are made of bytes (with some extra
> accommodations for files not made of lines).  Some try to handle
> line-end conversion to some extent, but that doesn't really affect us
> anyway.
>
>
>         Stefan

Hello,

Just one simple question : why do we need some specific code in
vc-find-revision to be in charge of selecting the coding scheme and
doing the decoding.

My understanding is that, say for svn backend, what you want to do is
to get the same as

1) svn --non-interactive cat  my_non_ascii_named_file.tex  > temp_file

2) visit temp_file


Instead of doing that redirection into a temporary file, « svn
--non-interactive cat my_non_ascii_named_file.tex » is executed in a
process, and the output is directly inserted into a buffer. Hence the
troubles…

Why not use a temporary file ?

I understand that a temporary file has several drawbacks :

1) it is probably slower, at least for small files,

2) you have to do some clean-up

3) there is a potential security breach to leave tacks on a system file.


Having had a look at find-file lisp code, I noticed that under the hood,
the function insert-file-content is what does the real job of converting
the file bytes to a buffer. Now, looking at the C code of
Finsert_file_contents in fileio.c, I noticed that there is some
provision for not regular files, like pipes.

I was wondering whether there was some intention at some point of time
to use that function in order to recollect in a unified way the output
of a process into a buffer.

Wouldn't that help with our decoding issues ?

  V.






12