Changes to message-mode and encoding in Emacs26

Alex Bennée

Hi,

I've just recently updated to the Emacs 26 branch and I've run into an
odd problem with message encoding. Despite (I think) having utf-8 set
throughout, when sending email I get a warning complaining that utf-8
can't encode my name.

While composing everything works fine, the character is described as:

            character: é (displayed as é) (codepoint 233, #o351, #xe9)
    preferred charset: unicode-bmp (Unicode Basic Multilingual Plane (U+0000..U+FFFF))
code point in charset: 0xE9
               script: latin
               syntax: w which means: word
             category: .:Base, L:Left-to-right (strong), c:Chinese, j:Japanese, l:Latin, v:Viet
             to input: type "C-x 8 RET e9" or "C-x 8 RET LATIN SMALL LETTER E WITH ACUTE"
          buffer code: #xC3 #xA9
            file code: #xC3 #xA9 (encoded by coding system utf-8-unix)
              display: terminal code #xC3 #xA9

But when I send it ends up:

            character:  (displayed as ) (codepoint 4194243, #o17777703, #x3fffc3)
    preferred charset: eight-bit (Raw bytes 128-255)
code point in charset: 0xC3
               syntax: w which means: word
             category: L:Left-to-right (strong)
             to input: type "C-x 8 RET 3fffc3"
          buffer code: #xC3
            file code: not encodable by coding system utf-8-unix
              display: not encodable for terminal

And a prompt:

  These default coding systems were tried to encode text
  in the buffer ‘1506440244.6bae7e76af4f300b.zen:2,S’:
    (utf-8-unix (347 . 4194243) (348 . 4194217))
  However, each of them encountered characters it couldn’t encode:
    utf-8-unix cannot encode these:

  Click on a character (or switch to this window by ‘M-x other-window’
  and select the characters by RET) to jump to the place it appears,
  where ‘C-u C-x =’ will give information about it.

  Select one of the safe coding systems listed below,
  or cancel the writing with C-g and edit the buffer
     to remove or modify the problematic characters,
  or specify any other coding system (and risk losing
     the problematic characters).

    raw-text no-conversion

If I force raw-text it seems to look fine. Any idea what's going on?
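
The two code points in the prompt (4194243 and 4194217) are what the
UTF-8 bytes of "é" look like when they sit in a multibyte buffer as raw
bytes instead of as a decoded character. A minimal sketch that
reproduces the numbers; the helper name my/utf8-bytes-as-raw-chars is
hypothetical and used only for illustration:

```elisp
;; Hypothetical helper, for illustration only (not part of Emacs).
(defun my/utf8-bytes-as-raw-chars (string)
  "Encode STRING as UTF-8 and return its bytes as raw-byte character codes.
This mimics already-encoded bytes ending up in a multibyte buffer
without being decoded again."
  (append (string-to-multibyte (encode-coding-string string 'utf-8)) nil))

;; Each byte in the range #x80..#xFF maps to the character #x3FFF00 + byte.
(my/utf8-bytes-as-raw-chars (string #xE9))  ; => (4194243 4194217)
(unibyte-char-to-multibyte #xC3)            ; => 4194243, i.e. #x3fffc3
```

Characters in that range belong to the eight-bit charset, which utf-8
cannot encode, so the safe choices offered are raw-text and
no-conversion.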

There is a bug raised with mu4e (the mail client using message-mode):

  https://github.com/djcb/mu/issues/1081

--
Alex Bennée

Re: Changes to message-mode and encoding in Emacs26

Eli Zaretskii
> From: Alex Bennée <[hidden email]>
> Date: Tue, 26 Sep 2017 16:41:03 +0100
>
> If I force raw-text it seems to look fine. Any idea what's going on?
>
> There is a bug raised with mu4e (the mail client using message-mode):
>
>   https://github.com/djcb/mu/issues/1081

First, this kind of problem should be reported to the Emacs bug
tracker, not here.

And second, since this involves mu4e, any of its discussions should
IMO include the mu4e developers, because we have no similar issues
reported by people who use the bundled MUA packages (nor MH-E, AFAIK).
So it sounds like mu4e is at least somehow involved in this.

Re: Changes to message-mode and encoding in Emacs26

Alexis

Eli Zaretskii <[hidden email]> writes:

>> From: Alex Bennée <[hidden email]>
>> Date: Tue, 26 Sep 2017 16:41:03 +0100
>>
>> If I force raw-text it seems to look fine. Any idea what's going on?
>>
>> There is a bug raised with mu4e (the mail client using message-mode):
>>
>>   https://github.com/djcb/mu/issues/1081
>
> First, this kind of problem should be reported to the Emacs bug
> tracker, not here.
>
> And second, since this involves mu4e, any of its discussions should
> IMO include the mu4e developers, because we have no similar issues
> reported by people who use the bundled MUA packages (nor MH-E, AFAIK).
> So it sounds like mu4e is at least somehow involved in this.

i can't speak for other mu4e users, but the coding issue i've been
intermittently experiencing as a mu4e user is exactly as described in
this bug report regarding RMAIL:

     https://debbugs.gnu.org/cgi/bugreport.cgi?bug=28266#17

More specifically, it's this:

     "The encoding issue I mentioned seems to only apply to the saving
     of the mail buffer into the file specified in the FCC header."


Alexis.

Re: Changes to message-mode and encoding in Emacs26

Alex Bennée

Alexis <[hidden email]> writes:

> Eli Zaretskii <[hidden email]> writes:
>
>>> From: Alex Bennée <[hidden email]>
>>> Date: Tue, 26 Sep 2017 16:41:03 +0100
>>>
>>> If I force raw-text it seems to look fine. Any idea what's going
>>> on?
>>>
>>> There is a bug raised with mu4e (the mail client using
>>> message-mode):
>>>
>>>   https://github.com/djcb/mu/issues/1081
>>
>> First, this kind of problem should be reported to the Emacs bug
>> tracker, not here.

At the moment I'm not sure it's an Emacs bug. Behaviour has changed, but
I wanted to understand why before reporting it.

>>
>> And second, since this involves mu4e, any of its discussions should
>> IMO include the mu4e developers, because we have no similar issues
>> reported by people who use the bundled MUA packages (nor MH-E,
>> AFAIK).
>> So it sounds like mu4e is at least somehow involved in this.

Hmm OK. I didn't think it could be as the problem only occurs when
message-send is called. I'll have a dig through and see if there are any
hooks/variables involved in the call chain that might break it.

>
> i can't speak for other mu4e users, but the coding issue i've been
> intermittently experiencing as a mu4e user is exactly as described in
> this bug report regarding RMAIL:
>
>     https://debbugs.gnu.org/cgi/bugreport.cgi?bug=28266#17
>
> More specifically, it's this:
>
>     "The encoding issue I mentioned seems to only apply to the saving
>     of the mail buffer into the file specified in the FCC header."

Thanks for the link, I'll have a look.

--
Alex Bennée

Re: Changes to message-mode and encoding in Emacs26

Eli Zaretskii
In reply to this post by Alexis
> From: Alexis <[hidden email]>
> Cc: [hidden email]
> Date: Sat, 30 Sep 2017 00:12:22 +1000
>
> i can't speak for other mu4e users, but the coding issue i've been
> intermittently experiencing as a mu4e user is exactly as described in
> this bug report regarding RMAIL:
>
>      https://debbugs.gnu.org/cgi/bugreport.cgi?bug=28266#17
>
> More specifically, it's this:
>
>      "The encoding issue I mentioned seems to only apply to the saving
>      of the mail buffer into the file specified in the FCC header."

I'm confused: are you using mu4e or are you using Rmail?  These are
two different packages, and AFAIK they don't share any code.

Also, bug#28266 explicitly says that the issue happens when
sendmail.el is used as the MUA, which means it's not about message.el,
a completely different MUA.

So I think these are probably two different issues, although the
result is similar.  There are many ways to get raw bytes in Emacs when
you want non-ASCII characters, and not all of them are due to the same
bug...

Re: Changes to message-mode and encoding in Emacs26

Alexis

Eli Zaretskii <[hidden email]> writes:

> I'm confused: are you using mu4e or are you using Rmail?  These are
> two different packages, and AFAIK they don't share any code.
>
> Also, bug#28266 explicitly says that the issue happens when
> sendmail.el is used as the MUA, which means it's not about message.el,
> a completely different MUA.
>
> So I think these are probably two different issues, although the
> result is similar.  There are many ways to get raw bytes in Emacs when
> you want non-ASCII characters, and not all of them are due to the same
> bug...

i use mu4e, not Rmail; i was addressing your earlier remark:

> we have no similar issues reported by people who use the bundled MUA
> packages

i thought that the fact that both an mu4e user (me) and an Rmail user
(Charles) are experiencing the same issue ("The encoding issue I
mentioned seems to only apply to the saving of the mail buffer into the
file specified in the FCC header") /even though/ mu4e uses message.el
and Rmail doesn't, might indicate a common cause lying deeper in Emacs'
internals.

But if you think that this is just a coincidence of results, and that
there are actually two different issues in play, i'll certainly defer to
your knowledge and experience. Hopefully next time the issue appears for
me, i'll be able to narrow it down to a minimal working example.


Alexis.

Re: Changes to message-mode and encoding in Emacs26

Alex Bennée

Alexis <[hidden email]> writes:

> Eli Zaretskii <[hidden email]> writes:
>
>> I'm confused: are you using mu4e or are you using Rmail?  These are
>> two different packages, and AFAIK they don't share any code.
>>
>> Also, bug#28266 explicitly says that the issue happens when
>> sendmail.el is used as the MUA, which means it's not about message.el,
>> a completely different MUA.
>>
>> So I think these are probably two different issues, although the
>> result is similar.  There are many ways to get raw bytes in Emacs when
>> you want non-ASCII characters, and not all of them are due to the same
>> bug...
>
> i use mu4e, not Rmail; i was addressing your earlier remark:
>
>> we have no similar issues reported by people who use the bundled MUA
>> packages
>
> i thought that the fact that both an mu4e user (me) and an Rmail user
> (Charles) are experiencing the same issue ("The encoding issue I
> mentioned seems to only apply to the saving of the mail buffer into
> the file specified in the FCC header") /even though/ mu4e uses
> message.el and Rmail doesn't, might indicate a common cause lying
> deeper in Emacs' internals.
>
> But if you think that this is just a coincidence of results, and that
> there are actually two different issues in play, i'll certainly defer
> to your knowledge and experience. Hopefully next time the issue
> appears for me, i'll be able to narrow it down to a minimal working
> example.

I haven't narrowed it down yet, but it certainly happens during message-do-fcc.
It's hard to tell because the work takes place in a temporary buffer but
I'm currently looking at the code that does:

    (when file
      (with-temp-buffer
        (insert-buffer-substring buf)
        (message-clone-locals buf)
        (message-encode-message-body)

And wondering how that might have changed.

Any idea how to examine the current with-temp-buffer while stepping
through in edebug?

>
>
> Alexis.


--
Alex Bennée

Re: Changes to message-mode and encoding in Emacs26

Eli Zaretskii
> From: Alex Bennée <[hidden email]>
> Cc: Eli Zaretskii <[hidden email]>, [hidden email]
> Date: Mon, 02 Oct 2017 09:36:07 +0100
>
> I haven't narrowed it down yet, but it certainly happens during message-do-fcc.
> It's hard to tell because the work takes place in a temporary buffer but
> I'm currently looking at the code that does:
>
>     (when file
>       (with-temp-buffer
>         (insert-buffer-substring buf)
>         (message-clone-locals buf)
>         (message-encode-message-body)
>
> And wondering how that might have changed.

Is 'buf' a unibyte buffer or a multibyte buffer?

> Any idea how to examine the current with-temp-buffer while stepping
> through in edebug?

You can "C-x b" when Emacs is stopped in Edebug.

(Why are we still discussing this issue on this list?)

Re: Changes to message-mode and encoding in Emacs26

Alex Bennée

Eli Zaretskii <[hidden email]> writes:

>> From: Alex Bennée <[hidden email]>
>> Cc: Eli Zaretskii <[hidden email]>, [hidden email]
>> Date: Mon, 02 Oct 2017 09:36:07 +0100
>>
>> I haven't narrowed it down yet, but it certainly happens during message-do-fcc.
>> It's hard to tell because the work takes place in a temporary buffer but
>> I'm currently looking at the code that does:
>>
>>     (when file
>>       (with-temp-buffer
>>         (insert-buffer-substring buf)
>>         (message-clone-locals buf)
>>         (message-encode-message-body)
>>
>> And wondering how that might have changed.
>
> Is 'buf' a unibyte buffer or a multibyte buffer?

buf is the source buffer and yes it will be a multibyte buffer by virtue
of utf-8 encoding and special characters.
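
Whether a buffer is unibyte or multibyte is a flag on the buffer itself,
held in the buffer-local variable enable-multibyte-characters, so it can
be checked directly while stepping through the code. A small sketch; the
helper name my/buffer-multibyte-p is hypothetical:

```elisp
;; Hypothetical helper, for illustration only (not part of Emacs).
(defun my/buffer-multibyte-p (buffer)
  "Return non-nil if BUFFER is a multibyte buffer."
  (buffer-local-value 'enable-multibyte-characters buffer))

;; Example: evaluate in the compose buffer, or pass `buf' from Edebug.
(my/buffer-multibyte-p (current-buffer))
```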

>
>> Any idea how to examine the current with-temp-buffer while stepping
>> through in edebug?
>
> You can "C-x b" when Emacs is stopped in Edebug.

The with-temp-buffer doesn't show up on the list until later (where it
is a duplicate created for the purpose of saving the email)

>
> (Why are we still discussing this issue on this list?)

I was responding to Alexis but I can create a new emacs-devel message if
you want or just post directly to the bugs. I feel we are getting close.

--
Alex Bennée

Re: Changes to message-mode and encoding in Emacs26

Eli Zaretskii
> From: Alex Bennée <[hidden email]>
> Cc: [hidden email]
> Date: Mon, 02 Oct 2017 18:12:29 +0100
>
> >>     (when file
> >>       (with-temp-buffer
> >>         (insert-buffer-substring buf)
> >>         (message-clone-locals buf)
> >>         (message-encode-message-body)
> >>
> >> And wondering how that might have changed.
> >
> > Is 'buf' a unibyte buffer or a multibyte buffer?
>
> buf is the source buffer and yes it will be a multibyte buffer by virtue
> of utf-8 encoding and special characters.

If the text in 'buf' is already encoded, why does the code call
message-encode-message-body?  That would encode an already encoded
message, and could be the root cause of the problem.

> >> Any idea how to examine the current with-temp-buffer while stepping
> >> through in edebug?
> >
> > You can "C-x b" when Emacs is stopped in Edebug.
>
> The with-temp-buffer doesn't show up on the list until later (where it
> is a duplicate created for the purpose of saving the email)

What list is that?

A temporary buffer's name begins with a space, so you need to specify
its name explicitly when "C-x b" prompts for it, instead of relying on
completion.
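
As a sketch of how to find such a buffer from Lisp (for example with
"M-:" while stopped in Edebug), the following lists every live buffer
whose name starts with a space. The helper name my/hidden-buffer-names
is hypothetical; buffers created by with-temp-buffer get a name that
starts with " *temp*":

```elisp
;; Hypothetical helper, for illustration only (not part of Emacs).
(defun my/hidden-buffer-names ()
  "Return the names of live buffers whose names start with a space."
  (let (names)
    (dolist (buffer (buffer-list) (nreverse names))
      (when (string-prefix-p " " (buffer-name buffer))
        (push (buffer-name buffer) names)))))

;; Example: list the candidates, then switch to one by its exact name.
(my/hidden-buffer-names)
```

Any name returned can then be typed in full at the "C-x b" prompt,
including the leading space.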

Re: Changes to message-mode and encoding in Emacs26

Alex Bennée
In reply to this post by Eli Zaretskii

Eli Zaretskii <[hidden email]> writes:

>> From: Alex Bennée <[hidden email]>
>> Cc: Eli Zaretskii <[hidden email]>, [hidden email]
>> Date: Mon, 02 Oct 2017 09:36:07 +0100
>>
>> I haven't narrowed it down yet, but it certainly happens during message-do-fcc.
>> It's hard to tell because the work takes place in a temporary buffer but
>> I'm currently looking at the code that does:
>>
>>     (when file
>>       (with-temp-buffer
>>         (insert-buffer-substring buf)
>>         (message-clone-locals buf)
>>         (message-encode-message-body)
>>
>> And wondering how that might have changed.

By the way, it is the (message-clone-locals buf) call, introduced in:

  3a9e56d840b5551a90fe9068ee335cc37ed12ef2

that regresses this behaviour. If I comment that line out everything
proceeds as normal. I'm guessing something in that set of local
variables confuses message-encode-message-body?

While I was tracing through the code I noticed it is called twice, once
for the source buffer and again on the temp buffer used for fcc. Maybe
there is some state that gets confused by a "double-encode"?
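
One way to narrow down which local variable matters would be to list the
source buffer's buffer-local variables whose names look relevant and
compare them with the temp buffer's. A rough sketch; the helper name
my/locals-matching and the example regexp are assumptions chosen for
illustration, not the set of variables message-clone-locals actually
copies:

```elisp
;; Hypothetical helper, for illustration only (not part of Emacs).
(defun my/locals-matching (buffer regexp)
  "Return an alist of BUFFER's local variables whose names match REGEXP."
  (let (result)
    (dolist (entry (buffer-local-variables buffer) (nreverse result))
      ;; Void local variables appear as bare symbols; skip those.
      (when (and (consp entry)
                 (string-match-p regexp (symbol-name (car entry))))
        (push entry result)))))

;; Example: variables in the current buffer that mention coding or MIME.
(my/locals-matching (current-buffer) "coding\\|mime\\|charset\\|multibyte")
```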

>
> Is 'buf' a unibyte buffer or a multibyte buffer?

buf is the source buffer and yes it will be a multibyte buffer by virtue
of utf-8 encoding and special characters.

>
>> Any idea how to examine the current with-temp-buffer while stepping
>> through in edebug?
>
> You can "C-x b" when Emacs is stopped in Edebug.

The with-temp-buffer doesn't show up on the list until later (where it
is a duplicate created for the purpose of saving the email)

>
> (Why are we still discussing this issue on this list?)

I was responding to Alexis but I can create a new emacs-devel message if
you want or just post directly to the bugs. I feel we are getting close.

--
Alex Bennée