bug#31062: 26.0.91; warning on UTF-8 encoding of unibyte text

Charles A. Roelli
(This test case assumes a locale-coding-system of utf-8-unix, and
LANG: en_GB.UTF-8 or anything similar.)

emacs -q
C-x b test RET
M-: (insert-byte 195 1) RET
M-: (insert-byte 188 1) RET > buffer text should look like \303\274
C-x C-s /tmp/foo RET > the path is irrelevant

There's this warning:

These default coding systems were tried to encode text
in the buffer ‘test’:
  (utf-8-unix (1 . 4194243) (2 . 4194236))
However, each of them encountered characters it couldn’t encode:
  utf-8-unix cannot encode these: \303 \274

Is the text "(1 . 4194243) (2 . 4194236)" useful here?  It looks like
it's there by accident.  If it is helpful, could someone please
explain what it means?



Eli Zaretskii
> Date: Wed, 04 Apr 2018 20:26:52 +0200
> From: [hidden email] (Charles A. Roelli)
>
> emacs -q
> C-x b test RET
> M-: (insert-byte 195 1) RET
> M-: (insert-byte 188 1) RET > buffer text should look like \303\274
> C-x C-s /tmp/foo RET > the path is irrelevant
>
> There's this warning:
>
> These default coding systems were tried to encode text
> in the buffer ‘test’:
>   (utf-8-unix (1 . 4194243) (2 . 4194236))
> However, each of them encountered characters it couldn’t encode:
>   utf-8-unix cannot encode these: \303 \274
>
> Is the text "(1 . 4194243) (2 . 4194236)" useful here?

It shows the positions and the codepoints of the offending characters,
and the coding-system that was tried.
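
For anyone wondering where numbers like 4194243 come from: as far as I
understand Emacs's internal representation, a raw byte in the range
128..255 is stored as a character in the `eight-bit' charset, at
codepoint #x3FFF00 plus the byte value.  A minimal sketch to check this
against the values in the warning (evaluate each form with M-: or in
*scratch*); it assumes nothing beyond the built-in conversion functions:

```elisp
;; Map the raw bytes from the test case to the codepoints Emacs uses
;; for them internally.
(unibyte-char-to-multibyte 195)      ; compare with 4194243 in the warning
(unibyte-char-to-multibyte 188)      ; compare with 4194236 in the warning

;; And back again: recover the byte from the codepoint.
(multibyte-char-to-unibyte 4194243)  ; => 195

;; The same byte in octal, as it is displayed in the buffer.
(format "\\%o" (multibyte-char-to-unibyte 4194243)) ; => "\\303"

;; The charset these codepoints belong to.
(char-charset 4194243)
```

So each (POSITION . CODEPOINT) pair in the warning names a buffer
position and the raw byte found there.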



Charles A. Roelli
> Date: Wed, 04 Apr 2018 22:20:55 +0300
> From: Eli Zaretskii <[hidden email]>
>
> > There's this warning:
> >
> > These default coding systems were tried to encode text
> > in the buffer ‘test’:
> >   (utf-8-unix (1 . 4194243) (2 . 4194236))
> > However, each of them encountered characters it couldn’t encode:
> >   utf-8-unix cannot encode these: \303 \274
> >
> > Is the text "(1 . 4194243) (2 . 4194236)" useful here?
>
> It shows the positions and the codepoints of the offending characters,
> and the coding-system that was tried.

Thank you for clarifying.  Could we write something like,

> These default coding systems were tried to encode text in the buffer
> 'test', but failed for the listed (POSITION . CODEPOINT) elements:

to make that clear to the user?



Eli Zaretskii
> Date: Thu, 05 Apr 2018 20:27:15 +0200
> From: [hidden email] (Charles A. Roelli)
> CC: [hidden email]
>
> > These default coding systems were tried to encode text in the buffer
> > 'test', but failed for the listed (POSITION . CODEPOINT) elements:
>
> to make that clear to the user?

Feel free to suggest a patch, but the list includes the coding-systems
tried, not just positions and codepoints.



Charles A. Roelli
> Date: Thu, 05 Apr 2018 21:47:11 +0300
> From: Eli Zaretskii <[hidden email]>
>
> > > These default coding systems were tried to encode text in the buffer
> > > 'test', but failed for the listed (POSITION . CODEPOINT) elements:
> >
> > to make that clear to the user?
>
> Feel free to suggest a patch, but the list includes the coding-systems
> tried, not just positions and codepoints.

That's true, but after looking at the code of
select-safe-coding-system-interactively, it seems that the "rejected"
list is also printed in the same run as "unsafe", and "rejected" is
indeed a list of coding systems.

            (insert
             "These default coding systems were tried to encode"
             (if (stringp from)
                 (concat " \"" (if (> (length from) 10)
                                   (concat (substring from 0 10) "...\"")
                                 (concat from "\"")))
               (format-message " text\nin the buffer `%s'" bufname))
             ":\n")
            (let ((pos (point))
                  (fill-prefix "  "))
              (dolist (x (append rejected unsafe)) ; <- "rejected" printed here
                (princ "  ") (princ x))
              (insert "\n")
              (fill-region-as-paragraph pos (point)))

Strangely, the "rejected" list is then printed again, if it's non-nil:

            (when rejected
              (insert "These safely encode the text in the buffer,
but are not recommended for encoding text in this context,
e.g., for sending an email message.\n ")
              (dolist (x rejected)
                (princ " ") (princ x))
              (insert "\n"))

One solution might be to print the "rejected" list only in this second
form, and in the first form to explain more clearly what the elements
of the "unsafe" list mean.
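
As a rough illustration of the kind of wording I have in mind, here is
a sketch of a helper that spells out one element of the "unsafe" list.
The function name `my/describe-unsafe-entry' and the exact message text
are made up for this example and are not part of Emacs; the only
assumption about the data is the (CODING (POS . CHAR)...) shape seen in
the warning above.

```elisp
;; Hypothetical helper, not part of Emacs: describe one element of
;; the "unsafe" list, which has the form (CODING (POS . CHAR)...).
(defun my/describe-unsafe-entry (entry)
  "Return a readable description of ENTRY, an element of the unsafe list."
  (format "%s cannot encode: %s"
          (car entry)
          (mapconcat (lambda (pos-char)
                       ;; Each pair is (BUFFER-POSITION . CODEPOINT).
                       (format "position %d (codepoint #x%X)"
                               (car pos-char) (cdr pos-char)))
                     (cdr entry)
                     ", ")))

(my/describe-unsafe-entry '(utf-8-unix (1 . 4194243) (2 . 4194236)))
;; => "utf-8-unix cannot encode: position 1 (codepoint #x3FFFC3), position 2 (codepoint #x3FFFBC)"
```

Something along these lines could replace the bare `princ' of each
"unsafe" element, leaving the "rejected" list to the second message.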