bug#43351: 27.1; [PATCH] Change ASCII handling in mm-charset-to-coding-system to us-ascii

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

bug#43351: 27.1; [PATCH] Change ASCII handling in mm-charset-to-coding-system to us-ascii

Alex Bochannek-2
Hello!

This is a very small patch, but I am not confident that there aren't
other side effects, so please evaluate it carefully.

In the fix for bug#5458 (2011-06-30), a change was made to
mm-charset-to-coding-system to support "ansi.x3.4*" as an alias for
'ascii. As part of that patch 'us-ascii was also mapped to 'ascii. This
is problematic because decode-coding-string does not recognize 'ascii as
a coding system and throws an "Invalid coding system: ascii" exception.

As a result, when using gnus-article-browse-html-article (K H) to
display a text/html message that has charset=us-ascii (or presumably
also charset=ascii), the display will fail iff the header of the message
is not ASCII.

Tracing gnus-article-browse-html-parts the call chain in my test case
looks like this:

(setq hcharset (mm-find-mime-charset-region (point-min)(point-max)))
returns 'utf-8 because of the RFC 2047 encoded words in the
from-header. The HTML part has charset=us-ascii and therefore coding and
charset differ. (setq body (mm-charset-to-coding-system charset nil t))
then sets 'us-ascii to 'ascii (see above) and the attempt to transcode
the part into 'utf-8 fails at (encode-coding-string
(decode-coding-string content body) charset) That last piece of code
seems to have gone in on 2016-02-12 when removing XEmacs compat
functions from mm-util.el.

This patch no longer maps 'us-ascii and instead maps 'ascii to 'us-ascii
(The ANSI alias is untouched.) Alternatively, I could modify
gnus-article-browse-html-parts to special-case this, but I don't think
mm-charset-to-coding-system should output 'ascii if it is not a valid
coding system (anymore?) However, I don't know what else that could
possibly break, which is why I want to offer this patch with some
caution.

Please let me know if there is anything I can do to help with getting
this change accepted.

Thanks!

--
Alex. <[hidden email]>

diff --git a/lisp/gnus/mm-util.el b/lisp/gnus/mm-util.el
index 282465722d..3dc93e4ad4 100644
--- a/lisp/gnus/mm-util.el
+++ b/lisp/gnus/mm-util.el
@@ -137,9 +137,9 @@ mm-charset-to-coding-system
  (let ((cs (cdr (assq charset mm-charset-override-alist))))
    (and cs (mm-coding-system-p cs) cs))))
    ;; ascii
-   ((or (eq charset 'us-ascii)
+   ((or (eq charset 'ascii)
  (string-match "ansi.x3.4" (symbol-name charset)))
-    'ascii)
+    'us-ascii)
    ;; Check to see whether we can handle this charset.  (This depends
    ;; on there being some coding system matching each `mime-charset'
    ;; property defined, as there should be.)
Reply | Threaded
Open this post in threaded view
|

bug#43351: 27.1; [PATCH] Change ASCII handling in mm-charset-to-coding-system to us-ascii

Lars Ingebrigtsen
Alex Bochannek <[hidden email]> writes:

> This is a very small patch, but I am not confident that there aren't
> other side effects, so please evaluate it carefully.

The patch makes sense conceptually, but it can't be applied because
there's code (presumably) all over the place that depends on it
returning `ascii'.  For instance:

  (let ((cs (mm-charset-to-coding-system charset nil allow-override)))
    (cond ((eq cs 'ascii)
           (setq cs (or (mm-charset-to-coding-system mail-parse-charset)
                        'raw-text)))

> In the fix for bug#5458 (2011-06-30), a change was made to
> mm-charset-to-coding-system to support "ansi.x3.4*" as an alias for
> 'ascii. As part of that patch 'us-ascii was also mapped to 'ascii. This
> is problematic because decode-coding-string does not recognize 'ascii as
> a coding system and throws an "Invalid coding system: ascii" exception.

Indeed; ascii isn't a valid coding system...  but poking around here, I
think the function (despite its name) isn't really returning a coding
system.  I mean, it does in most cases, but not for ascii.  :-/

(The doc string of the function should be fixed.)

> As a result, when using gnus-article-browse-html-article (K H) to
> display a text/html message that has charset=us-ascii (or presumably
> also charset=ascii), the display will fail iff the header of the message
> is not ASCII.

I think this has to be changed in gnus-article-browse-html-article
instead, unfortunately.  

--
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no



Reply | Threaded
Open this post in threaded view
|

bug#43351: 27.1; [PATCH] Change ASCII handling in mm-charset-to-coding-system to us-ascii

Eli Zaretskii
> From: Lars Ingebrigtsen <[hidden email]>
> Date: Sat, 12 Sep 2020 16:13:43 +0200
> Cc: [hidden email]
>
> > In the fix for bug#5458 (2011-06-30), a change was made to
> > mm-charset-to-coding-system to support "ansi.x3.4*" as an alias for
> > 'ascii. As part of that patch 'us-ascii was also mapped to 'ascii. This
> > is problematic because decode-coding-string does not recognize 'ascii as
> > a coding system and throws an "Invalid coding system: ascii" exception.
>
> Indeed; ascii isn't a valid coding system...  but poking around here, I
> think the function (despite its name) isn't really returning a coding
> system.  I mean, it does in most cases, but not for ascii.  :-/

Maybe the simplest solution is to define a coding-system-alias named
'ascii'.



Reply | Threaded
Open this post in threaded view
|

bug#43351: 27.1; [PATCH] Change ASCII handling in mm-charset-to-coding-system to us-ascii

Lars Ingebrigtsen
Eli Zaretskii <[hidden email]> writes:

> Maybe the simplest solution is to define a coding-system-alias named
> 'ascii'.

Yeah, I wondered whether the coding system was called `us-ascii', and
the charset called `ascii', to avoid some sort of confusion somewhere?

I've now made `ascii' a coding system alias, and it doesn't seem to
break anything.  I haven't documented it in the manual, because I guess
we still prefer people to use `us-ascii' as the coding system...

--
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no



Reply | Threaded
Open this post in threaded view
|

bug#43351: 27.1; [PATCH] Change ASCII handling in mm-charset-to-coding-system to us-ascii

Lars Ingebrigtsen
In reply to this post by Eli Zaretskii
Alex Bochannek <[hidden email]> writes:

> I like this approach because it seems like it would do the least amount
> of potential harm. I am happy to just put a (define-coding-system-alias
> 'ascii 'us-ascii) in my .gnus, but should this possibly be done in
> mule-conf.el?

[...]

>  (define-coding-system-alias 'iso-safe 'us-ascii)
> +(define-coding-system-alias 'ascii 'us-ascii)

Yup; I just applied a very similar change to Emacs 28.  :-)

--
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no