Emacs text bug

Emacs text bug

drain
Before I report this as a bug, I want to make sure it doesn't already have
a solution:

All of the "-" characters have been replaced with "\342\200\224" (which
has a different face and cannot be replaced with replace-string).

Re: Emacs text bug

Peter Dyballa

On 26.01.2013 at 21:23, drain wrote:

> All of the "-" characters have been replaced with "\342\200\224" (which
> has a different face and cannot be replaced with replace-string).

Because the encoding of the buffer has changed? I see similar things in one specific user's GNU Emacs. In *compilation* buffers the curly quotes are turned into their byte triplets, and in dired buffers the "ä" in März, the German name for March, is sometimes lost as well. But why and when does this happen? Without this knowledge it's kind of senseless to report…

--
Greetings

  Pete

The best way to accelerate a PC is 9.8 m/s²



Re: Emacs text bug

drain
Perhaps the encoding did change. I recall copy / pasting a bunch of text
from a book online into the buffer, and somewhere along the way I might
have blindly changed the setting.

Which encoding system supports the "—" character?

RE: Emacs text bug

Drew Adams
In reply to this post by Peter Dyballa
> But why and when does this happen? Without this knowledge
> it's kind of senseless to report.

I disagree with that claim.

While it is always better to base a bug report on more information, even just
reporting a problem can sometimes help.  At the very least it gives Emacs core
developers and other users a heads-up to look further wrt the problem and its
details (e.g. "why and when").

That's already happening, because the OP posted here, thanks to your reply and
his followup wrt encoding.

Staying in one's corner because one does not have all the info or understanding
is too often a brake on progress.

Not every user has the motivation or the means, including time, to dig deeper
and investigate a problem encountered, to determine the why & when.  Just
communicating that there seems to be a problem, even if one is not sure, is a
good start.

There is no way that Emacs developers can completely test every change they
make.  Users reporting questions and perceived problems are indispensable to
getting it right.

IMHO, it is better for users, especially new users or those who feel unsure, to
err on the side of reporting too much than too little.  It is definitely _not_
the case, IMO, that "it's kind of senseless to report" without knowledge of the
why & when.

The OP brought up the question here first, before reporting, in order to
ask whether he was missing something.  That's a good thing.  If the replies here
ultimately suggest that "it doesn't already have a solution", then I, for one,
encourage a bug report.



Re: Emacs text bug

Peter Dyballa
In reply to this post by drain

On 26.01.2013 at 23:43, drain wrote:

> Which encoding system supports the "—" character?

You showed before that three bytes were used for the EM DASH's encoding, so it was done in UTF-8. (This character can also be encoded in CP125[0-2], but then as 1 byte only; ISO 8859-1 has no EM DASH.)
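
The byte arithmetic can be checked with a minimal Python sketch. Python is used here only for illustration and is not part of the original exchange; the variable names are made up. It reads the three octal escapes from the original post as bytes and decodes them as UTF-8.

```python
# The three octal escapes shown in the buffer, taken as raw bytes.
raw = bytes([0o342, 0o200, 0o224])

# Decoded as UTF-8, the three bytes form one character: U+2014 EM DASH.
dash = raw.decode("utf-8")
print(raw.hex(), f"U+{ord(dash):04X}")

# In a Windows code page such as CP1252 the same character is a single byte.
cp1252_bytes = dash.encode("cp1252")
print(cp1252_bytes.hex())
```

So the "\342\200\224" display is simply the UTF-8 form of the dash shown byte by byte rather than as one character.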

--
Greetings

  Pete

Chicago, n.:
        Where the dead still vote … early and often!



Re: Emacs text bug

drain
That was a bit tricky. The local buffer setting was "raw text", and I had
to change it to UTF-8. But the strings of codes were not automatically
converted (which would have been nice); I had to copy / paste the text into
the buffer again.

Is there a way to reload these characters once the encoding is changed? I
might have a few buffers like this, and it would save me copy / pasting
texts again. Even a replace-string approach would work for me.

Re: Emacs text bug

Peter Dyballa
In reply to this post by Drew Adams

On 26.01.2013 at 23:48, Drew Adams wrote:

> While it is always better to base a bug report on more information, even just
> reporting a problem can sometimes help.  At the very least it gives Emacs core
> developers and other users a heads-up to look further wrt the problem and its
> details (e.g. "why and when").

As far as I can see, this happens rarely. Just a few days ago it happened again, and I was there very soon afterwards. C-h l did not show anything. While the compilation was still going on and the mode line showed UTF-8 encoding, I tried to fix the way the buffer contents were presented by invoking revert-buffer-with-coding-system, C-x RET r, but it did not change anything. All other buffers (that I visited) containing non-US-ASCII characters showed the same fault: the UTF-8 encoding bytes were displayed.

This could be a Mac OS X problem. Here I can see that 'find … -ls' inserts ASCII NULs, ^@, into the *shell* buffer at the transition from the file-size column to the next one, the date column. Or it happens between the date column and the file-name column – I am not completely sure about it. Extra characters or bytes like these could be inserted into the *compilation* buffer as well, and then the byte sequence gets out of order. But why does it hit all buffers and not only the faulty one with the extraneous bytes?

There seems to be one more indication: the hardware is PowerPC, 32-bit. The Mac OS X version is also close to ancient: Mac OS X 10.4 or 10.5 (Tiger or Leopard). On Intel hardware it has not occurred yet…

--
Greetings

  Pete

A blizzard is when it snows sideways.



Re: Emacs text bug

Peter Dyballa
In reply to this post by drain

On 27.01.2013 at 00:23, drain wrote:

> Is there a way to reload these characters once the encoding is changed?

Yes: revert-buffer-with-coding-system or C-x RET r <encoding> RET
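
Conceptually, reverting with a coding system leaves the bytes on disk alone and only decodes them differently. The Python sketch below illustrates that idea under stated assumptions: it is not how Emacs is implemented, the sample text is invented, and Latin-1 is used merely as a stand-in for a byte-by-byte view like the one a raw-text buffer gives.

```python
# Sample file contents: "M(a-umlaut)rz (em dash) March", stored as UTF-8 bytes.
on_disk = "M\u00e4rz \u2014 March".encode("utf-8")

# Byte-by-byte view: Latin-1 maps every byte to exactly one code point,
# so multi-byte characters fall apart into several unrelated symbols.
as_raw = on_disk.decode("latin-1")

# Same bytes, decoded as UTF-8: each multi-byte sequence becomes one character.
as_utf8 = on_disk.decode("utf-8")

print(len(on_disk), len(as_raw), len(as_utf8))
print(as_utf8)
```

Nothing is converted in either direction; only the interpretation changes, which is why no copy / pasting is needed.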

--
Greetings

  Pete

Work is the curse of the drinking class.
                                – Oscar Wilde



Re: Emacs text bug

drain
Still problems.

(1) revert-buffer-with-coding-system RET
(2) utf-8 RET
(3) "Revert buffer from file[...]" y RET
(4) [characters appear as they should now]
(5) [make change so I can save]
(6) save-buffer
(7) "Select coding system (default raw-text)" utf-8
(8) "wrote buffer [...]"
(9) kill-buffer RET foo.org RET
(10) find-file foo.org RET, sees it's back to raw-text, not utf-8, with
     characters mangled.

RE: Emacs text bug

Doug Lewan
> (9) kill-buffer RET foo.org RET
> (10) find-file foo.org RET, sees it's back to raw-text, not utf-8, with
>      characters mangled.

I think that's what you should expect. Once you kill the buffer, Emacs forgets all about the file that it had held.

Apparently Emacs can't figure out that the file is UTF-8. You'll need to provide a hint. `-*- coding: utf-8 -*-' in the first line is one way. You'll find more in the Emacs info manual, node `Coding Systems'.
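
To show what such a hint looks like to a program reading the file, here is a small Python sketch that pulls the coding name out of a first line. The pattern and the `declared_coding` helper are hypothetical and deliberately simplified; Emacs's own parsing of the `-*-` line accepts more forms than this.

```python
import re

# Simplified pattern for a "-*- ... coding: NAME ... -*-" line.
# Assumption: this is an illustration, not the rule Emacs actually applies.
COOKIE = re.compile(r"-\*-.*?\bcoding:\s*([-\w.]+).*?-\*-")


def declared_coding(first_line):
    """Return the coding system named in a cookie line, or None."""
    match = COOKIE.search(first_line)
    return match.group(1) if match else None


print(declared_coding("# -*- coding: utf-8 -*-"))
print(declared_coding("#+TITLE: notes"))
```

In an Org file the cookie can sit behind a comment marker, as in the first sample line, so it does not show up in exported output.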

I hope this helps.

,Douglas
Douglas Lewan
Shubert Ticketing
(201) 489-8600 ext 224

When I do good, I feel good. When I do bad, I feel bad and that's my religion. - Abraham Lincoln


RE: Emacs text bug

drain
Doug Lewan wrote:
> You'll need to provide a hint. `-*- coding: utf-8 -*-' in the first line is one way.

That appears to have worked. A bit ugly having that instruction at the top,
but better than manually reverting the buffer every single time.

Re: Emacs text bug

Eli Zaretskii
In reply to this post by drain
> Date: Thu, 31 Jan 2013 09:55:52 -0800 (PST)
> From: drain <[hidden email]>
>
> Still problems.
>
> (1) revert-buffer-with-coding-system RET
> (2) utf-8 RET
> (3) "Revert buffer from file[...]" y RET
> (4) [characters appear as they should now]
> (5) [make change so I can save]
> (6) save-buffer
> (7) "Select coding system (default raw-text)" utf-8
> (8) "wrote buffer [...]"
> (9) kill-buffer RET foo.org RET
> (10) find-file foo.org RET, sees it's back to raw-text, not utf-8, with
>      characters mangled.

Evidently, you have in that file bytes that are not valid UTF-8
sequences.  You need to fix them (the "Select coding system ..."
prompt tells you which characters cannot be encoded in UTF-8 -- those
are the ones you need to fix).
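
Outside Emacs, the offending bytes can also be located with a short script. The following Python sketch is only an illustration: the `invalid_utf8_offsets` helper and the sample data are made up, and it reports byte offsets rather than the characters Emacs lists in its prompt.

```python
def invalid_utf8_offsets(data):
    """Return the byte offsets at which UTF-8 decoding of `data` fails."""
    offsets = []
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode("utf-8")
            break  # the remainder decodes cleanly
        except UnicodeDecodeError as err:
            # err.start and err.end are relative to the slice just decoded.
            offsets.append(pos + err.start)
            pos += err.end  # resume scanning after the bad sequence
    return offsets


# Valid UTF-8 text with one stray 0xA3 byte (a Latin-1 pound sign) in it.
sample = b"a\xa3b" + "\u2014".encode("utf-8")
print(invalid_utf8_offsets(sample))
```

Each reported offset marks a spot where the file holds something other than well-formed UTF-8, for instance a character pasted in as a single Latin-1 byte.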


Re: Emacs text bug

Eli Zaretskii
In reply to this post by drain
> Date: Thu, 31 Jan 2013 10:45:31 -0800 (PST)
> From: drain <[hidden email]>
>
> Doug Lewan wrote
> > You'll need to provide a hint. `-*- coding: utf-8 -*-' in the first line
> > is one way.
>
> That appears to have worked. A bit ugly having that instruction at the top,
> but better than manually reverting the buffer every single time.

You shouldn't need that.  You need to clean up your file instead.


Re: Emacs text bug

drain
In reply to this post by Eli Zaretskii
Now I see. This problem must have started when I copied an early 19th
century letter into the buffer, and the characters did not transliterate
properly into modern English. Whatever those characters were, they turned
into circumflexed /a/ (â), the pound sign (£), and a (special) right double
quotation mark (”). utf-8 apparently cannot handle these.

But why would this prevent utf-8 from encoding the rest of the buffer? Why
not just leave those three characters mangled, and display the rest
properly? It reverted fine; it just would not stay in utf-8 unless I (1)
put the instruction at the top of the buffer or (2) deleted those special
characters. So the functionality appears to be there: Emacs just would not
accept it as a saved state (absent instruction at the top).

Somehow that buffer got stuck with a limited encoding system. I'm composing
this message right now in a "scratch.org" buffer which is using utf-8-unix
-- and apparently handles those three characters fine (consequently I'm
switching the problem file from utf-8 to utf-8-unix).

Anyway, glad to get that sorted.

Re: Emacs text bug

Eli Zaretskii
> Date: Thu, 31 Jan 2013 11:28:47 -0800 (PST)
> From: drain <[hidden email]>
>
> Now I see. This problem must have started when I copied an early 19th
> century letter into the buffer, and the characters did not transliterate
> properly into modern English. Whatever those characters were, they turned
> into circumflexed /a/ (â), the pound sign (£), and a (special) right double
> quotation mark (”). utf-8 apparently cannot handle these.

UTF-8 certainly _can_ handle them.  I suspect that these characters
got copied as raw bytes instead.

> But why would this prevent utf-8 from encoding the rest of the buffer? Why
> not just leave those three characters mangled, and display the rest
> properly? It reverted fine; it just would not stay in utf-8 unless I (1)
> put the instruction at the top of the buffer or (2) deleted those special
> characters. So the functionality appears to be there: Emacs just would not
> accept it as a saved state (absent instruction at the top).

Emacs auto-detects the encoding each time you visit a file, unless
either the file (by the 'coding:' cookie) or you (by using "C-x RET c")
tell it exactly how to decode the file.
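
A toy model may help explain why a few stray bytes affect the whole file. The Python sketch below is an assumption-laden simplification and not Emacs's actual detection algorithm, which weighs many coding systems by priority; `choose_decoding` and the "raw-text" fallback label are illustrative only. It shows two things: an explicit declaration wins, and otherwise a single undecodable byte is enough to rule out UTF-8 for the entire file.

```python
def choose_decoding(data, declared=None):
    """Pick a decoding for `data`; an explicit declaration always wins."""
    if declared is not None:
        return declared
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # One undecodable byte is enough to reject UTF-8 for the whole file.
        return "raw-text"


clean = "caf\u00e9 \u2014".encode("utf-8")
dirty = clean + b"\xa3"  # same text plus one stray Latin-1 byte

print(choose_decoding(clean))
print(choose_decoding(dirty))
print(choose_decoding(dirty, declared="utf-8"))
```

Under this model, cleaning up the stray bytes lets detection succeed without help, while the cookie merely overrides the detection result.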