bug#27391: 25.2.50; utf-8 coding cookie is not applied on some specific markdown file

Vincent Belaïche-2




================================================================================

I was editing a file written in Markdown. Here is the file:

https://framagit.org/latex-pourquoi-comment/lpc-articles/blob/795ecb9d4f7b8870486fe6557f01d2fe450c4461/CONTRIBUTING.md

My default Emacs configuration reads files as latin-1, so I added a
`coding: utf-8' cookie to this file. It did not work: the file was
still read as latin-1 instead of utf-8.

I tested with one more cookie, `eval: (message "Hello")', and that one
worked. So the cookies are being read; the problem lies in how the
coding system is applied.

The only way for me to get the correct encoding is to place:

(modify-coding-system-alist 'file "\\.m\\(d\\|arkdown\\)\\'"
  'prefer-utf-8)

in my init file.

I tried with `emacs -q' and the problem is still there, which shows
that markdown-mode is not to blame. My first thought had been that
markdown-mode was the culprit; see the discussion here:
https://github.com/jrblevin/markdown-mode/issues/198

Jason Blevins, the author of markdown-mode, noted that the presence
of the [ character has some impact. See:

https://github.com/jrblevin/markdown-mode/issues/198#issuecomment-308524696

I did not double-check his analysis. To me this looks like a race
condition where the automatic encoding detection is applied after the
cookie and undoes it. Maybe a semaphore is missing, or something like
that.

   Vincent.

================================================================================


In GNU Emacs 25.2.50.1 (i686-pc-mingw32)
 of 2017-06-14 built on AIGLEROYAL
Repository revision: da62c1532e479bbac4ce242ee1d170df9c435591
Windowing system distributor 'Microsoft Corp.', version 10.0.14393
Configured using:
 'configure --prefix=c:/Nos_Programmes/GNU/Emacs --without-jpeg
 --without-tiff --without-gif --without-png 'CFLAGS= -Og -g3 -L
 C:/Programmes/installation/emacs-install/libXpm-3.5.8/src' 'CPPFLAGS=
 -DFOR_MSW=1 -I
 C:/Programmes/installation/emacs-install/libXpm-3.5.8/include -I
 C:/Programmes/installation/emacs-install/libXpm-3.5.8/src -L
 C:/Programmes/installation/emacs-install/libXpm-3.5.8/src''

Configured features:
XPM SOUND NOTIFY ACL TOOLKIT_SCROLL_BARS

Important settings:
  value of $LANG: FRA
  locale-coding-system: cp1252

Major mode: Dired by name

Minor modes in effect:
  diff-auto-refine-mode: t
  TeX-PDF-mode: t
  shell-dirtrack-mode: t
  recentf-mode: t
  tooltip-mode: t
  global-eldoc-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  tool-bar-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  buffer-read-only: t
  line-number-mode: t
  transient-mark-mode: t

Recent messages:
Mark set [2 times]
Mark saved where search started
Quit
scroll-up-command: End of buffer
Mark set
find-dired *Find* finished.
dired-get-file-for-visit: No file on this line [2 times]
Mark set
Quit
Making completion list...

Load-path shadows:
c:/Programmes/installation/cedet-install/cedet-git/lisp/speedbar/loaddefs hides c:/Nos_Programmes/GNU/Emacs/share/emacs/25.2.50/lisp/loaddefs
c:/Programmes/installation/cedet-install/cedet-git/lisp/speedbar/loaddefs hides c:/Programmes/installation/cedet-install/cedet-git/lisp/cedet/loaddefs

Features:
(shadow emacsbug find-dired calc-yank calc-mode calccomp calc-alg
calc-vec calc-aent calc-menu cal-move whitespace perl-mode log-edit
pcvs-util eieio-opt speedbar sb-image ezimage dframe vc-bzr vc-src
vc-sccs vc-svn vc-rcs vc-dir ewoc add-log org-element org-rmail org-mhe
org-irc org-info org-gnus org-docview doc-view subr-x jka-compr
image-mode org-bibtex bibtex org-bbdb org-w3m org org-macro org-footnote
org-pcomplete org-list org-faces org-entities org-version ob-emacs-lisp
ob ob-tangle ob-ref ob-lob ob-table ob-exp org-src ob-keys ob-comint
ob-core ob-eval org-compat org-macs org-loaddefs find-func cal-menu
calendar cal-loaddefs tex-info texinfo vc vc-dispatcher ediff-vers
thingatpt rect visual-basic-mode sh-script smie executable make-mode
misearch multi-isearch ediff-merg ediff-wind ediff-diff ediff-mult
ediff-help ediff-init ediff-util ediff vc-git diff-mode reftex-dcr
reftex reftex-vars preview prv-emacs noutline outline pcmpl-unix
latexenc tex-bar latex easy-mmode tex-style toolbar-x font-latex
plain-tex tex-buf tex advice tex-mode compile shell pcomplete comint
ansi-color ring bbdb-print info mailalias smtpmail sort ispell vc-cvs
hl-line balance eieio-compat calc-forms dired-aux mail-extr bbdb-message
sendmail gnus-async qp gnus-ml cursor-sensor nndraft nnmh nnfolder
bbdb-gnus bbdb-mua bbdb-com crm network-stream nsm auth-source eieio
eieio-core starttls gnus-agent gnus-srvr gnus-score score-mode nnvirtual
gnus-msg gnus-art mm-uu mml2015 mm-view mml-smime smime dig mailcap nntp
gnus-cache gnus-sum gnus-group gnus-undo gnus-start gnus-cloud nnimap
nnmail mail-source tls gnutls utf7 netrc nnoo parse-time gnus-spec
gnus-int gnus-range message dired-x dired format-spec rfc822 mml mml-sec
password-cache epg mm-decode mm-bodies mm-encode mail-parse rfc2231
rfc2047 rfc2045 ietf-drums mailabbrev gmm-utils mailheader gnus-win gnus
gnus-ems nnheader gnus-util mail-utils mm-util help-fns mail-prsvr
edmacro kmacro skeleton calc-misc calc-arith calc-ext calc calc-loaddefs
calc-macs tex-mik preview-latex tex-site auto-loads bbdb bbdb-site
timezone bbdb-loaddefs template w32utils cl-seq cl-macs cl recentf
tree-widget wid-edit load-path-to-cedet-svn finder-inf package
epg-config seq byte-opt gv bytecomp byte-compile cl-extra help-mode
easymenu cconv cl-loaddefs pcase cl-lib time-date mule-util tooltip
eldoc electric uniquify ediff-hook vc-hooks lisp-float-type mwheel
dos-w32 ls-lisp disp-table w32-win w32-vars term/common-win tool-bar dnd
fontset image regexp-opt fringe tabulated-list newcomment elisp-mode
lisp-mode prog-mode register page menu-bar rfn-eshadow timer select
scroll-bar mouse jit-lock font-lock syntax facemenu font-core frame
cl-generic cham georgian utf-8-lang misc-lang vietnamese tibetan thai
tai-viet lao korean japanese eucjp-ms cp51932 hebrew greek romanian
slovak czech european ethiopic indian cyrillic chinese charscript
case-table epa-hook jka-cmpr-hook help simple abbrev minibuffer
cl-preloaded nadvice loaddefs button faces cus-face macroexp files
text-properties overlay sha1 md5 base64 format env code-pages mule
custom widget hashtable-print-readable backquote w32notify w32 multi-tty
make-network-process emacs)

Memory information:
((conses 8 899957 158092)
 (symbols 32 53590 0)
 (miscs 32 2257 2796)
 (strings 16 133750 20600)
 (string-bytes 1 5975277)
 (vectors 8 55330)
 (vector-slots 4 1716681 54830)
 (floats 8 651 494)
 (intervals 28 72632 8079)
 (buffers 516 78))

---
This email has been checked for viruses by Avast antivirus software.
https://www.avast.com/antivirus





bug#27391: 25.2.50; utf-8 coding cookie is not applied on some specific markdown file

Eli Zaretskii
> From: [hidden email] (Vincent Belaïche)
> Date: Fri, 16 Jun 2017 12:00:06 +0200
> Cc: Vincent Belaïche <[hidden email]>
>
> I was editing some file written in Markdown. Here is the file :
>
> https://framagit.org/latex-pourquoi-comment/lpc-articles/blob/795ecb9d4f7b8870486fe6557f01d2fe450c4461/CONTRIBUTING.md
>
> My Emacs default configuration was to get files in latin-1. So I had
> added some `coding: utf-8' cookie in this file. But it did not work, the
> file was still read in latin-1 instead of utf8.

I cannot reproduce this, and I don't see any coding cookies in the
file I downloaded.

Please provide a minimal recipe that's required to reproduce the
problem.  In particular, since you tried in "emacs -q", I don't
understand what it means that your default configuration is
latin-1: in "emacs -q" your default configuration is determined by
your system locale.

Thanks.




bug#27391: 25.2.50; utf-8 coding cookie is not applied on some specific markdown file

Vincent Belaïche-2
In reply to this post by Vincent Belaïche-2
On 16/06/2017 at 14:59, Eli Zaretskii wrote:

>> From: [hidden email] (Vincent Belaïche)
>> Date: Fri, 16 Jun 2017 12:00:06 +0200
>> Cc: Vincent Belaïche <[hidden email]>
>>
>> I was editing some file written in Markdown. Here is the file :
>>
>> https://framagit.org/latex-pourquoi-comment/lpc-articles/blob/795ecb9d4f7b8870486fe6557f01d2fe450c4461/CONTRIBUTING.md
>>
>> My Emacs default configuration was to get files in latin-1. So I had
>> added some `coding: utf-8' cookie in this file. But it did not work, the
>> file was still read in latin-1 instead of utf8.
>
> I cannot reproduce this, and I don't see any coding cookies in the
> file I downloaded.
>
> Please provide a minimal recipe that's required to reproduce the
> problem.  In particular, since you tried in "emacs -q", I don't
> understand what does it mean that your default configuration is
> latin-1: in "emacs -q" your default configuration is determined by
> your system locale.
>
> Thanks.

Attached is the file causing the issue. The recipe is simply to visit
the file with emacs -q; you will see that the encoding is not applied.

For instance, I get the following doc section:

--8<----8<----8<----8<----8<-- begin -->8---->8---->8---->8---->8----
### doc
Placez dans *doc* et ses sous-rÃ©pertoires toute la documentation affÃ©rente au projet, sans oublier les notes et courriers Ã©lectroniques importants. Vous pouvez avoir des sous-rÃ©pertoires de doc contenant diffÃ©rents types de documents ou pour diffÃ©rentes phases du projet.
--8<----8<----8<----8<----8<--  end  -->8---->8---->8---->8---->8----

Instead of:

--8<----8<----8<----8<----8<-- begin -->8---->8---->8---->8---->8----
### doc
Placez dans *doc* et ses sous-répertoires toute la documentation afférente au projet, sans oublier les notes et courriers électroniques importants. Vous pouvez avoir des sous-répertoires de doc contenant différents types de documents ou pour différentes phases du projet.
--8<----8<----8<----8<----8<--  end  -->8---->8---->8---->8---->8----

  Vincent.




Guide de contribution
=====================

WorkFlow
--------
Ce projet utilise Git-flow au pied de la lettre:
* http://nvie.com/posts/a-successful-git-branching-model/

L'article de base qui donnera naissance au projet


* https://danielkummer.github.io/git-flow-cheatsheet/index.fr_FR.html

Aide mémoire français (et en d'autre traduction).


Contributions
-------------
Libre à vous de cloner le dépôt... Et de proposer des modifications.


Conventions de nomnage
======================

Arborescence de fichier
-----------------------

### doc
Placez dans *doc* et ses sous-répertoires toute la documentation afférente au projet, sans oublier les notes et courriers électroniques importants. Vous pouvez avoir des sous-répertoires de doc contenant différents types de documents ou pour différentes phases du projet.

Si vous avez besoin de documentation externe, envisager de la copier ici. Cela rendra service pour maintenir le projet si l'endroit où les données en questions étaient accessibles disparaît.


### src
Ce répertoire contient le code source du projet. Vous pouvez y faire des sous-répertoires pour différents types de code source, par exemple:

* src/inc
* src/img
* ...


### util
Répertoire contenant les utilitaires, outils et scripts spécifiques au projet.


### vendor
Si le projet utilise des bibliothèques fournies par une partie tierce ou des fichiers d'en-têtes que vous désirez archiver avec votre code, faites-le ici.


Gestionnaire de version
-----------------------
Le workflow git suit scrupuleusement git-flow.


### Branche **master**
Elle représente le dernier état installable en production du projet. Seul les administrateurs du dépôt peuvent travailler dans cette branche.


### Branche **devel**
La branche où est récolté le travail de tout le monde, des branches de développement privées. Seul la "Team" peut travailler dans cette branche.


### les branches **feature**
Chaque branche doit être Nommée de la manière suivante:

* PSEUDO-DESCRIPTION

où:

* **PSEUDO** est le pseudo de l'administrateur (le créateur) de la branche
* **DESCRIPTION** Une description en CamelCase (RaisonCreationBranche) de cette branche



[comment]: # ( Local Variables: )
[comment]: # ( coding: utf-8 )
[comment]: # ( eval: (message "Coucou") )
[comment]: # ( End: )

                               

bug#27391: 25.2.50; utf-8 coding cookie is not applied on some specific markdown file

Vincent Belaïche-2
On 16/06/2017 at 16:08, Vincent Belaïche wrote:

> On 16/06/2017 at 14:59, Eli Zaretskii wrote:
>>> From: [hidden email] (Vincent Belaïche)
>>> Date: Fri, 16 Jun 2017 12:00:06 +0200
>>> Cc: Vincent Belaïche <[hidden email]>
>>>
>>> I was editing some file written in Markdown. Here is the file :
>>>
>>> https://framagit.org/latex-pourquoi-comment/lpc-articles/blob/795ecb9d4f7b8870486fe6557f01d2fe450c4461/CONTRIBUTING.md
>>>
>>> My Emacs default configuration was to get files in latin-1. So I had
>>> added some `coding: utf-8' cookie in this file. But it did not work, the
>>> file was still read in latin-1 instead of utf8.
>> I cannot reproduce this, and I don't see any coding cookies in the
>> file I downloaded.
>>
>> Please provide a minimal recipe that's required to reproduce the
>> problem.  In particular, since you tried in "emacs -q", I don't
>> understand what does it mean that your default configuration is
>> latin-1: in "emacs -q" your default configuration is determined by
>> your system locale.
>>
>> Thanks.
> Attached is the file causing the issue. Recipe is just to visit the file
> with emacs -q, and you see that the encoding is not taken.
>
> For instance I get the following doc section :
>
> --8<----8<----8<----8<----8<-- begin -->8---->8---->8---->8---->8----
> ### doc
> Placez dans *doc* et ses sous-rÃ©pertoires toute la documentation affÃ©rente au projet, sans oublier les notes et courriers Ã©lectroniques importants. Vous pouvez avoir des sous-rÃ©pertoires de doc contenant diffÃ©rents types de documents ou pour diffÃ©rentes phases du projet.
> --8<----8<----8<----8<----8<--  end  -->8---->8---->8---->8---->8----
>
> Instead of:
>
> --8<----8<----8<----8<----8<-- begin -->8---->8---->8---->8---->8----
> ### doc
> Placez dans *doc* et ses sous-répertoires toute la documentation afférente au projet, sans oublier les notes et courriers électroniques importants. Vous pouvez avoir des sous-répertoires de doc contenant différents types de documents ou pour différentes phases du projet.
> --8<----8<----8<----8<----8<--  end  -->8---->8---->8---->8---->8----
>
>    Vincent.
>
>

Just to clarify: you need to click the "open raw" button to see the
cookie. I should have sent you this link:

https://framagit.org/latex-pourquoi-comment/lpc-articles/raw/795ecb9d4f7b8870486fe6557f01d2fe450c4461/CONTRIBUTING.md

instead of the equivalent "viewer" link, where the Markdown markup is
rendered as formatting.

You cannot see the cookies with the viewer link because they are inside
comments, which the viewer does not display.

   V.




bug#27391: 25.2.50; utf-8 coding cookie is not applied on some specific markdown file

Eli Zaretskii
In reply to this post by Vincent Belaïche-2
> From: [hidden email] (Vincent Belaïche)
> Cc: Vincent Belaïche <[hidden email]>
> Date: Fri, 16 Jun 2017 16:08:09 +0200
>
> Attached is the file causing the issue. Recipe is just to visit the file
> with emacs -q, and you see that the encoding is not taken.

Your fancy comment causes this: remove the leading '[' and the problem
goes away.  Looks like regex-quoting that somehow misfires.




bug#27391: 25.2.50; utf-8 coding cookie is not applied on some specific markdown file

Vincent Belaïche-2


On 16/06/2017 at 20:38, Eli Zaretskii wrote:
>> From: [hidden email] (Vincent Belaïche)
>> Cc: Vincent Belaïche <[hidden email]>
>> Date: Fri, 16 Jun 2017 16:08:09 +0200
>>
>> Attached is the file causing the issue. Recipe is just to visit the file
>> with emacs -q, and you see that the encoding is not taken.
> Your fancy comment causes this: remove the leading '[' and the problem
> goes away.  Looks like regex-quoting that somehow misfires.


I used this type of comment marks after reading this discussion:

https://stackoverflow.com/questions/4823468/comments-in-markdown

   V.


bug#27391: 25.2.50; utf-8 coding cookie is not applied on some specific markdown file

Vincent Belaïche-2
In reply to this post by Eli Zaretskii


On 16/06/2017 at 20:38, Eli Zaretskii wrote:
>> From: [hidden email] (Vincent Belaïche)
>> Cc: Vincent Belaïche <[hidden email]>
>> Date: Fri, 16 Jun 2017 16:08:09 +0200
>>
>> Attached is the file causing the issue. Recipe is just to visit the file
>> with emacs -q, and you see that the encoding is not taken.
> Your fancy comment causes this: remove the leading '[' and the problem
> goes away.  Looks like regex-quoting that somehow misfires.

After some investigation, it seems that the bug is in regexp-quote:

(regexp-quote "[comment]: # (")

outputs

"^\\[comment]: # ( "

instead of

"^\\[comment\\]: # ( "


   Vincent.




bug#27391: 25.2.50; utf-8 coding cookie is not applied on some specific markdown file

Andreas Schwab-2
On Jun 16 2017, Vincent Belaïche <[hidden email]> wrote:

> After some investigation, it seems that the bug is in regexp-quote:
>
> (regexp-quote "[comment]: # (")
>
> outputs
>
> "^\\[comment]: # ( "
>
> instead of
>
> "^\\[comment\\]: # ( "

But `]' is not special.

(string-match "^\\[comment]: # ( " "[comment]: # ( ") => 0

Andreas.

--
Andreas Schwab, [hidden email]
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."
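
The point about `]' can be checked directly. The following is a minimal sketch, not code from the thread; the variable name `my-quoted-prefix' is made up for illustration. It shows what `regexp-quote' returns for the bare comment prefix and that the result still matches a cookie line literally.

```elisp
;; Illustration only: `my-quoted-prefix' is a hypothetical name.
(defvar my-quoted-prefix (regexp-quote "[comment]: # (")
  "The Markdown comment prefix, quoted for use inside a regexp.")

;; Print the quoted form: only the opening bracket gets a backslash.
(message "%S" my-quoted-prefix)

;; Print the match position of the quoted prefix in a cookie line.
(message "%S" (string-match my-quoted-prefix
                            "[comment]: # ( coding: utf-8 )"))
```

The first call prints "\\[comment]: # (" and the second prints 0. Outside a bracket expression, `]' is an ordinary character in an Emacs regexp, and so is an unescaped `(', so neither needs quoting.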




bug#27391: 25.2.50; utf-8 coding cookie is not applied on some specific markdown file

Vincent Belaïche-2
In reply to this post by Vincent Belaïche-2


On 16/06/2017 at 21:15, Vincent Belaïche wrote:

>
>
> On 16/06/2017 at 20:38, Eli Zaretskii wrote:
>>> From: [hidden email] (Vincent Belaïche)
>>> Cc: Vincent Belaïche <[hidden email]>
>>> Date: Fri, 16 Jun 2017 16:08:09 +0200
>>>
>>> Attached is the file causing the issue. Recipe is just to visit the
>>> file
>>> with emacs -q, and you see that the encoding is not taken.
>> Your fancy comment causes this: remove the leading '[' and the problem
>> goes away.  Looks like regex-quoting that somehow misfires.
>
> After some investigation, it seems that the bug is in regexp-quote:
>
> (regexp-quote "[comment]: # (")
>
> outputs
>
> "^\\[comment]: # ( "
>
> instead of
>
> "^\\[comment\\]: # ( "
>
>
>   Vincent.
>
>
After some more investigation, I think the bug is in the function
insert-file-contents in fileio.c, which decides and sets the coding
system well before the other local variables are looked at.


bug#27391: 25.2.50; utf-8 coding cookie is not applied on some specific markdown file

Vincent Belaïche-2
In reply to this post by Vincent Belaïche-2


On 16/06/2017 at 21:37, Vincent Belaïche wrote:
>
>
> On 16/06/2017 at 21:15, Vincent Belaïche wrote:
>>

[...]

>>
>>
> After some more investigation, I think that the bug is in function
> insert-file-contents of fileio.c which is the one that decide and sets
> the coding system well before the other local variables are looked into.

After some more investigation: in the end, find-auto-coding in mule.el
is what is called to detect the coding. This function builds a regexp
that it names re-coding.

Here is a test function that defines the same regexp.


(defun doit ()
  (interactive)
  (let* ((prefix (regexp-quote "[comment]: # ("))
         (suffix (regexp-quote ")"))
         (re-coding
          (concat
           "[\r\n]" prefix
           ;; N.B. without the \n below, the regexp can
           ;; eat newlines.
           "[ \t]*coding[ \t]*:[ \t]*\\([^ \t\r\n]+\\)[ \t]*"
           suffix "[\r\n]")))
    (message (if (looking-at re-coding) "ok" "nak"))))

I tried it with point at the end of the line

[comment]: # ( Local Variables: )

and it answered "ok". Then I redefined it with re-search-forward instead
of looking-at:

(defun doit ()
  (interactive)
  (let* ((prefix (regexp-quote "[comment]: # ("))
         (suffix (regexp-quote ")"))
         (re-coding
          (concat
           "[\r\n]" prefix
           ;; N.B. without the \n below, the regexp can
           ;; eat newlines.
           "[ \t]*coding[ \t]*:[ \t]*\\([^ \t\r\n]+\\)[ \t]*"
           suffix "[\r\n]")))
    (message (if (re-search-forward re-coding nil t) "ok" "nak"))))

I placed point before the coding: line, and I also got "ok".

So I don't think the regexp as such is to blame. Something else seems
to be happening. It is too late now, I need to go to bed...

  Vincent.



bug#27391: 25.2.50; utf-8 coding cookie is not applied on some specific markdown file

Philipp Stephani


Vincent Belaïche <[hidden email]> wrote on Fri, 16 June 2017 at 23:28:

> [...]
> So I don't think that the regexp as such is to blame. Something else
> seems to happen. It is too late now, I need to go to bed...

I think it's actually the regexp that searches for "Local Variables". The following minimal example fails for me:

(with-temp-buffer
  (insert "

[comment]: # ( Local Variables: )
[comment]: # ( coding: utf-8 )
[comment]: # ( End: )

")
  (goto-char (point-min))
  (re-search-forward
   "[\r\n]\\([^[\r\n]*\\)[ \t]*Local Variables:[ \t]*\\([^\r\n]*\\)[\r\n]"))


bug#27391: 25.2.50; utf-8 coding cookie is not applied on some specific markdown file

Philipp Stephani


Philipp Stephani <[hidden email]> wrote on Fri, 16 June 2017 at 23:34:

[...]

I think it's actually the regexp that searches for "Local Variables". The following minimal example fails for me:

(with-temp-buffer
  (insert "

[comment]: # ( Local Variables: )
[comment]: # ( coding: utf-8 )
[comment]: # ( End: )

")
(goto-char (point-min))
(re-search-forward
 "[\r\n]\\([^[\r\n]*\\)[ \t]*Local Variables:[ \t]*\\([^\r\n]*\\)[\r\n]"))


Does anybody know why the second character range says [^[\r\n] instead of  [^\r\n]? This seems to explicitly exclude a leading [.
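
The effect of the extra `[' in that character range can be seen by comparing the two variants side by side. The following is a minimal sketch under stated assumptions: the constant names are hypothetical, the sample text is a shortened form of the file's trailer, and the second regexp is simply the first with the `[' removed from the range, which may or may not be the form a fix should take.

```elisp
;; Illustration only: all three names below are hypothetical.
(defconst my-sample-trailer
  (concat "\n"
          "[comment]: # ( Local Variables: )\n"
          "[comment]: # ( coding: utf-8 )\n"
          "[comment]: # ( End: )\n")
  "A shortened version of the trailer of CONTRIBUTING.md.")

(defconst my-re-excluding-bracket
  "[\r\n]\\([^[\r\n]*\\)[ \t]*Local Variables:[ \t]*\\([^\r\n]*\\)[\r\n]"
  "The regexp quoted in the thread; the prefix may not contain `['.")

(defconst my-re-allowing-bracket
  "[\r\n]\\([^\r\n]*\\)[ \t]*Local Variables:[ \t]*\\([^\r\n]*\\)[\r\n]"
  "The same regexp with `[' removed from the excluded characters.")

;; Markdown-style prefix against the regexp that excludes `['.
(message "%S" (string-match my-re-excluding-bracket my-sample-trailer))

;; Markdown-style prefix against the regexp that allows `['.
(message "%S" (string-match my-re-allowing-bracket my-sample-trailer))

;; A conventional Lisp-style prefix against the regexp that excludes `['.
(message "%S" (string-match my-re-excluding-bracket
                            "\n;; Local Variables:\n"))
```

The three calls print nil, 0 and 0. A prefix that begins with `[' can never be matched by the first regexp, because the prefix group is not allowed to consume that character, while a prefix such as `;; ' is unaffected.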

bug#27391: 25.2.50; utf-8 coding cookie is not applied on some specific markdown file

Philipp Stephani


Philipp Stephani <[hidden email]> wrote on Fri, 16 June 2017 at 23:39:

[...]

I think it's actually the regexp that searches for "Local Variables". The following minimal example fails for me:

(with-temp-buffer
  (insert "

[comment]: # ( Local Variables: )
[comment]: # ( coding: utf-8 )
[comment]: # ( End: )

")
(goto-char (point-min))
(re-search-forward
 "[\r\n]\\([^[\r\n]*\\)[ \t]*Local Variables:[ \t]*\\([^\r\n]*\\)[\r\n]"))


Does anybody know why the second character range says [^[\r\n] instead of  [^\r\n]? This seems to explicitly exclude a leading [.

If this is a typo, then here's a patch. 

Attachment: 0001-Allow-local-variables-section-to-begin-with-a-square-b.txt (4K)

bug#27391: 25.2.50; utf-8 coding cookie is not applied on some specific markdown file

Vincent Belaïche-2
In reply to this post by Vincent Belaïche-2


On 16/06/2017 at 21:37, Vincent Belaïche wrote:
>
>
> On 16/06/2017 at 21:15, Vincent Belaïche wrote:
>>

[...]

>>
>>
> After some more investigation, I think that the bug is in function
> insert-file-contents of fileio.c which is the one that decide and sets
> the coding system well before the other local variables are looked into.

I have located the bug.

After some more investigation: in the end, find-auto-coding in mule.el
is what is called to detect the coding.

This function evaluates this expression to find the local variables:

 (re-search-forward
               "[\r\n]\\([^[\r\n]*\\)[ \t]*Local Variables:[ \t]*\\([^\r\n]*\\)[\r\n]"
               tail-end t)

This expression evaluates to nil over file CONTRIBUTING.md
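
The same observation can be reproduced without the file by calling find-auto-coding on a temporary buffer. This is a minimal sketch, not code from the thread: the helper name `my-detect-cookie' is hypothetical, the file name passed in is only a placeholder, and the HTML-comment variant is an assumption added for contrast.

```elisp
;; Illustration only: `my-detect-cookie' is a hypothetical helper.
(defun my-detect-cookie (text)
  "Return what `find-auto-coding' reports for TEXT, or nil.
The file name given to `find-auto-coding' is only a placeholder."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (find-auto-coding "CONTRIBUTING.md" (buffer-size))))

;; Trailer whose prefix starts with `[', as in the reported file.
(message "%S" (my-detect-cookie
               (concat "Guide de contribution\n\n"
                       "[comment]: # ( Local Variables: )\n"
                       "[comment]: # ( coding: utf-8 )\n"
                       "[comment]: # ( End: )\n")))

;; Trailer whose prefix does not start with `[' (an HTML comment).
(message "%S" (my-detect-cookie
               (concat "Guide de contribution\n\n"
                       "<!-- Local Variables: -->\n"
                       "<!-- coding: utf-8 -->\n"
                       "<!-- End: -->\n")))
```

On an Emacs that still uses the regexp shown above, the first call should report nil while the second reports a cons whose car is utf-8. The result of the first call depends on the Emacs version in use, so no fixed output is given for it here.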

I can make a simple fix if you tell me which branch to do it on.

However, I think the root of the problem is that the local-variable
parsing is poorly factored between mule.el and files.el. A better, more
future-proof fix would be a single local-variable parser taking an
input constraint that says which kinds of settings are sought. The
output of the parser could then be used in both files.el and mule.el.

  Vincent.
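
To make the idea of a single parser with an input constraint concrete, here is a minimal sketch. It is not code from Emacs or from the thread: the function `my-parse-local-variables' and its WANTED argument are hypothetical, the 3000-character window is an assumption, and real code would also have to deal with page delimiters, safety checks and value reading.

```elisp
;; Illustration only: a hypothetical shared parser, not part of Emacs.
(defun my-parse-local-variables (&optional wanted)
  "Return an alist (NAME . VALUE-STRING) from the Local Variables block.
NAME is a symbol and VALUE-STRING is the unparsed text of the value.
If WANTED is non-nil, it is a list of symbols; only those are kept."
  (save-excursion
    (goto-char (point-max))
    (let ((case-fold-search t)
          ;; Assumption: only look at the last 3000 characters.
          (limit (max (point-min) (- (point-max) 3000)))
          result)
      (when (re-search-backward
             "^\\(.*?\\)[ \t]*Local Variables:[ \t]*\\(.*\\)$" limit t)
        ;; Whatever surrounds "Local Variables:" on its line is the
        ;; prefix and suffix expected on every following line.
        (let* ((prefix (regexp-quote (match-string 1)))
               (suffix (regexp-quote (match-string 2)))
               (entry (concat "^" prefix
                              "[ \t]*\\([^ \t\n:]+\\)[ \t]*:"
                              "[ \t]*\\(.*?\\)[ \t]*"
                              suffix "$")))
          (forward-line 1)
          ;; Collect entries until the "End:" line or a non-matching line.
          (while (and (not (eobp))
                      (looking-at entry)
                      (not (string-equal (downcase (match-string 1)) "end")))
            (let ((name (intern (match-string 1)))
                  (value (match-string 2)))
              (when (or (null wanted) (memq name wanted))
                (push (cons name value) result)))
            (forward-line 1))))
      (nreverse result))))
```

Under these assumptions, a caller that only cares about the coding system would evaluate (my-parse-local-variables '(coding)) in a buffer holding the file contents, while a caller that applies all settings would pass no argument and receive every entry in order.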



bug#27391: 25.2.50; utf-8 coding cookie is not applied on some specific markdown file

Vincent Belaïche-2


On 17/06/2017 at 00:09, Vincent Belaïche wrote:
> [...]

Oops... my lengthy email of 23:34 was sent unintentionally. A shorter
version, with only the conclusion and without the details of my
investigation, is above.

Anyway, Philipp's patch is what I had in mind as a quick fix, although I
don't think it is a good idea to leave code unfactored when it could be
shared. Factoring it makes it more maintainable.

  V.


Vincent Belaïche-2


On 17/06/2017 at 00:23, Vincent Belaïche wrote:
> [...]

Here are a few points I noticed when comparing the code of
find-auto-coding and hack-local-variables:

* In hack-local-variables the trailing local variables section is
  considered to be at most 3000 characters from eob, while in
  find-auto-coding it is considered to be at most 3072. The "correct"
  figure should be 3072, not 3000, for consistency with the "1024 * 3"
  code in function Finsert_file_contents of fileio.c:

                  if (nread == 1024)
                    {
                      int ntail;
                      if (lseek (fd, - (1024 * 3), SEEK_END) < 0)
                        report_file_error ("Setting file position",
                                           orig_filename);
                      ntail = emacs_read_quit (fd, read_buf + nread, 1024 * 3);
                      nread = ntail < 0 ? ntail : nread + ntail;
                    }

   Maybe the exact value should be defined as a named constant.

* find-auto-coding uses neither the regexp operator "^" (for bol) nor
  "$" (for eol); it uses "[\r\n]" instead. I suspect that this is
  because at this stage the coding system is not yet set, and therefore
  there is no such thing as bol or eol: the whole buffer is a single
  line. If so, I withdraw my previous statement that code factorization
  is desirable.


* In both cases what is sought is the *FIRST* occurrence, searching
  *FORWARD*, of the case-sensitive string "Local Variables:" in the
  trailing 3000--3072 characters of the buffer. I think that this is a
  problem, and that we should either search *BACKWARD* or, after finding
  the first occurrence, keep searching for subsequent occurrences and
  use the last one instead. I say this because with the emacs-template
  package the template file may have local variables in the template
  definition section that differ from those of the template itself. See
                (info "(template) DefSect")
  For instance, the end of the template file would look as follows:


--8<----8<----8<----8<----8<-- begin -->8---->8---->8---->8---->8----

... blah blah blah template content ...

// Local Variables:
// toto: "tata"
// End:

>>>TEMPLATE-DEFINITION-SECTION<<<

... blah blah blah Lisp Template rules ...

;; Local Variables:
;; foo: "bar"
;; End:
--8<----8<----8<----8<----8<--  end  -->8---->8---->8---->8---->8----

  Maybe excluding the [ character from the prefix string is not a typo
  but an intentional design to prevent false detection of the local
  variables section. I strongly recommend that, before making any fix,
  somebody dig into the file history to find out when and *WHY* this
  exclusion of [ was introduced. Sorry, but I do not volunteer for this
  kind of tedious, time-consuming work...
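
The points about a named constant for the tail size and about using the last occurrence can be sketched together as follows. This is only an illustration under stated assumptions: the names my-local-variables-tail-size and my-last-local-variables-position are hypothetical, and the regexp is the one from find-auto-coding without the extra [. It is not how Emacs itself implements the search.

```elisp
;; Hypothetical constant: a single place for the 1024 * 3 figure.
(defconst my-local-variables-tail-size (* 1024 3))

(defun my-last-local-variables-position ()
  "Return the position just after the LAST \"Local Variables:\" header line.
Only the trailing `my-local-variables-tail-size' characters of the
current buffer are searched.  Return nil if there is no such line."
  (save-excursion
    (goto-char (max (point-min)
                    (- (point-max) my-local-variables-tail-size)))
    (let ((found nil))
      ;; Keep searching forward and remember the latest match.
      (while (re-search-forward
              "[\r\n]\\([^\r\n]*\\)[ \t]*Local Variables:[ \t]*\\([^\r\n]*\\)[\r\n]"
              nil t)
        (setq found (point))
        ;; The final [\r\n] may also be the start of the next match.
        (backward-char))
      found)))

;; Usage on a buffer shaped like the template example above.
(with-temp-buffer
  (insert "// Local Variables:\n// toto: \"tata\"\n// End:\n\n"
          ">>>TEMPLATE-DEFINITION-SECTION<<<\n\n"
          ";; Local Variables:\n;; foo: \"bar\"\n;; End:\n")
  (goto-char (my-last-local-variables-position))
  (looking-at ";; foo"))
;; => t
```

Searching backward from the end of the buffer would be an alternative way to reach the same section.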

   Vincent.


Philipp Stephani
In reply to this post by Vincent Belaïche-2


Vincent Belaïche <[hidden email]> wrote on Sat, 17 June 2017 at 00:23:

> [...]
> Anyway, Philipp's patch is what I had in mind as a quick fix.

OK, I've pushed this commit as c3813b2aa8d2f5a625195fdbbfe6a01a602d7735.

> Although I
> don't think that this is a good solution not to factorize code when
> possible. Factorizing makes it more maintainable.

Agreed. Note that there's a third place in Emacs that parses a subset of file-local variables: lread.c, to detect the lexical-binding variable when loading ELisp files. Ideally that would be merged as well.

Philipp Stephani
In reply to this post by Vincent Belaïche-2


Vincent Belaïche <[hidden email]> wrote on Sat, 17 June 2017 at 07:45:

> * In find-auto-coding there is no such thing as regexp operator "^" (for
>   bol) or "$" (for eol) used, instead there is "[\r\n]". I suspect that
>   this is because at this stage the coding system is not yet set, and
>   therefore there is no such thing as bol or eol, the whole buffer is a
>   single line. If as such, I withdraw my previous statement that code
>   factorization is desirable.

Why? It's a small variant that should be distinguishable using a parameter to a shared function, such as:

enum file_local_flags {
  file_local_flag_default = 0x0,
  file_local_flag_use_bol_eol = 0x1,
  file_local_flag_search_trailer = 0x2,
};
Lisp_Object get_file_local_variable_value (Lisp_Object name, enum file_local_flags flags);
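
The same idea could also be sketched on the Lisp side, with an optional argument in place of the flag enum. This is a hedged illustration only: the function name my-local-variables-header-regexp and its argument are hypothetical, and the regexp body is the one from find-auto-coding without the extra [.

```elisp
;; Hypothetical helper, not part of Emacs.
(defun my-local-variables-header-regexp (&optional use-bol-eol)
  "Return a regexp matching the \"Local Variables:\" header line.
With USE-BOL-EOL non-nil, anchor the line with ^ and $.  Otherwise
require a literal CR or LF on each side, as find-auto-coding does."
  (concat (if use-bol-eol "^" "[\r\n]")
          "\\([^\r\n]*\\)[ \t]*Local Variables:[ \t]*\\([^\r\n]*\\)"
          (if use-bol-eol "$" "[\r\n]")))

(my-local-variables-header-regexp)    ; [\r\n]-delimited variant
(my-local-variables-header-regexp t)  ; ^...$ variant
```

Both callers would then share the middle part of the regexp and differ only in how the line boundaries are expressed.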
 


> * In both cases what is sought for is the *FIRST* occurrence searched
>   *FORWARD* of case sensitive string "Local Variables:" in the buffer
>   tailing 3000--3072 characters. I think that this is a problem and that
>   either we should search it *BACKWARD* or after finding the 1st
>   occurrence, possible subsequent occurrences should be searched for,
>   and the last occurrence should be considered instead.

Yes, that would be consistent with normal file-local variables.
 

>   Maybe preventing the [ character in the prefix string is not a typo
>   but was some intentional design to allow preventing false detection of
>   the local variable section. I strongly recommend that before doing any
>   fix, somebody dig in file history to find when and *WHY* this [
>   preventing has been introduced --- sorry, but I do not volunteer for
>   this tedious/time consuming kind of work...


With git-blame it's not really tedious. Commit 6b61353c0a0320ee15bb6488149735381fed62ec replaced ^\\(.*\\)[ \t]* with [\r\n]\\([^[\r\n]*\\)[ \t]*, so I think it's almost certain this is a typo (the previous regex didn't exclude the [ either). Anyway, if people want this to stay, they should have added a comment.

Vincent Belaïche-2
[...]
>
> With git-blame it's not really tedious. Commit
> 6b61353c0a0320ee15bb6488149735381fed62ec replaced ^\\(.*\\)[ \t]* with
> [\r\n]\\([^[\r\n]*\\)[ \t]*, so I think it's almost certain this is a
> typo (the previous regex didn't exclude the [ either). Anyway, if
> people want this to stay, they should have added a comment.

Thank you. I had a look at Wikipedia for the QWERTY keyboard layout (I
have a French keyboard and the layout is somewhat different for \ and ]).

Modern QWERTY layout is as follows:

1 2 3 4 5 6 7 8 9 0 - =
Q W E R T Y U I O P [ ] \
A S D F G H J K L ; '
Z X C V B N M , . /

So ] is just next to \.

So, yes, definitely this is a typo, the author had too big a finger when
hitting \.

Concerning factorization, couldn't one use [\n\r] in all cases rather
than a switch based on some input argument?
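
A small sketch of what differs between the two variants, assuming text whose lines are separated by bare CR characters: in Emacs regexps "^" matches at the start of the buffer or after a newline, so it does not see a line start after an unconverted CR, while "[\r\n]" does. The helper my-search-in and the variable my-cr-text are hypothetical names used only here.

```elisp
;; Hypothetical helper: search for REGEXP in a temp buffer holding TEXT.
(defun my-search-in (text regexp)
  "Return t if REGEXP matches somewhere in TEXT, nil otherwise."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (and (re-search-forward regexp nil t) t)))

;; Lines separated by bare CR, with no newline character at all.
(defvar my-cr-text "first line\r;; Local Variables:\r;; End:\r")

(my-search-in my-cr-text "^;; Local Variables:")       ; => nil
(my-search-in my-cr-text "[\r\n];; Local Variables:")  ; => t
```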

I was also wondering whether it would be possible to have a single regexp
for the whole Local Variables section. The following `doit' function is
an attempt at that. `M-x doit' will search forward for the whole Local
Variables section and display "ok" if found, "nak" otherwise.

(defun doit ()
  "Search forward for a whole Local Variables section.
Display \"ok\" if one is found, \"nak\" otherwise."
  (interactive)
  (let* ((eol "\\(\r\n?\\|\n\\)")       ; group 1
         (eol-again "\\1")
         (space-maybe "[ \t]*")
         ;; suffix may be the empty string
         (suffix  "\\([^ \r\n]+\\(?:[^\r\n]*[^ \r\n]\\)?\\|\\)")
         (prefix "\\([ \t]*[^ \r\n]+\\(?:[^\r\n]*[^ \r\n]\\)?\\)")
         (suffix-again "\\3")
         (symbol: "\\(?:\\(?:[^][()'\" \t\r\n]\\|\\\\[][()'\" \t]\\)+[ \t]*:\\)")
         ;; same pattern as `prefix', but in a non-capturing group
         (sexp (concat "\\(?:" (substring prefix 2))))
    (message
     (if (and (re-search-forward
               (concat eol
                       ;; header line: groups 2 (prefix) and 3 (suffix)
                       prefix space-maybe "Local Variables:" space-maybe suffix space-maybe eol-again
                       ;; variable lines: this `prefix' is group 4
                       "\\(?:" prefix space-maybe symbol: sexp space-maybe suffix-again space-maybe eol-again "\\)*"
                       ;; End line: groups 5 (prefix), 6 (suffix), 7 (trailing eol)
                       prefix space-maybe "End:" space-maybe suffix space-maybe "\\(" eol-again "\\)?")
               nil t)
              ;; When the trailing eol (group 7, not 3) is not there we
              ;; must be at EOB.
              (or (match-string 7) (eobp)))
         "ok" "nak"))))



   Vincent.




Philipp Stephani


Vincent Belaïche <[hidden email]> wrote on Mon, 19 June 2017 at 12:51:

> Concerning factorization, couldn't one use [\n\r] in all cases rather
> than a switch based on some input argument ?

It should be possible, but it slightly changes the behavior of file-local variables. I wouldn't expect anything to break though. 
 

> I was also wondering whether it is not possible to have a single regexp
> for the whole Local Variable section. The following `doit' function is a
> trial to do so.
> [...]



Looks good. Consider using `rx' for complex regexes; in my experience it increases readability a lot.
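
As a hedged illustration of that suggestion, here is the header-line regexp from find-auto-coding (without the extra [) written with `rx'. The constant name my-local-variables-header-rx is hypothetical, and this covers only the header line, not the whole section handled by `doit'.

```elisp
(require 'rx)

;; Hypothetical constant, for illustration only.
(defconst my-local-variables-header-rx
  (rx (any "\r\n")
      (group (* (not (any "\r\n"))))   ; prefix
      (* (any " \t"))
      "Local Variables:"
      (* (any " \t"))
      (group (* (not (any "\r\n"))))   ; suffix
      (any "\r\n")))

(string-match-p my-local-variables-header-rx
                "\n;; Local Variables:\n;; End:\n")
;; => 0
```

Each part of the pattern gets its own line and comment, which is harder to do with a string regexp full of doubled backslashes.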