bug#5235: 23.1; Unibyte keyboard input problem

scianagoryczy
Hi,
In Emacs 23.1, in unibyte mode (emacs --unibyte) and with the windows-1250
coding system, I can't type Polish characters with the right Alt key.  For
example, right Alt + 'a' gives ^E on the screen.  In Emacs 22.3 it works fine
(I see the Polish character 'ą'), but there is another problem there: the
buffer is displayed in iso-8859 even if I configure the Language Environment
to use windows-1250.  In 23.1 with such a Language Environment (configured to
use cp1250), Polish special characters read from a file are displayed
correctly (I see them), but I can't type them using the right Alt key (or even
with the polish-slash input method).

I checked it on GNU/Linux and also on MS Windows XP (pure NT-Emacs and
EmacsW32); the problem is the same.

Regards
Tomek

In GNU Emacs 23.1.1 (i686-pc-linux-gnu, GTK+ Version 2.12.9)
 of 2009-08-15 on scianagoryczy
Windowing system distributor `The X.Org Foundation', version 11.0.10400090
configured using `configure  '--with-x-toolkit=gtk''

Important settings:
  value of $LC_ALL: nil
  value of $LC_COLLATE: nil
  value of $LC_CTYPE: nil
  value of $LC_MESSAGES: nil
  value of $LC_MONETARY: nil
  value of $LC_NUMERIC: nil
  value of $LC_TIME: nil
  value of $LANG: pl_PL.UTF-8
  value of $XMODIFIERS: nil
  locale-coding-system: utf-8-unix
  default-enable-multibyte-characters: t

Major mode: Lisp Interaction

Minor modes in effect:
  show-paren-mode: t
  gud-tooltip-mode: t
  global-hl-line-mode: t
  global-auto-revert-mode: t
  display-time-mode: t
  auto-insert-mode: t
  yas/minor-mode: t
  tooltip-mode: t
  mouse-wheel-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  global-auto-composition-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  column-number-mode: t
  line-number-mode: t
  transient-mark-mode: t

Recent input:
<help-echo> <help-echo> <help-echo> <help-echo> <help-echo>
<help-echo> <help-echo> <help-echo> <help-echo> <help-echo>
<help-echo> <help-echo> <help-echo> <menu-bar> <help-menu>
<send-emacs-bug-report>

Recent messages:
Loading /home/tomek/emacs/color-theme-6.6.0/themes/color-theme-library.el
(source)...done
Loading autoinsert...done
Loading time...done
Loading autorevert...done
Loading hl-line...done
Loading gud...done
Loading paren...done
Loading which-func...done
For information about GNU Emacs and the GNU system, type C-h C-a.
call-interactively: Text is read-only






Re: bug#5235: 23.1; Unibyte keyboard input problem

Jason Rumney-4
Tomasz Zbrożek wrote:
> Hi,
> In Emacs 23.1, in unibyte mode (emacs --unibyte)
Does it work as expected if you remove the --unibyte?




Re: bug#5235: 23.1; Unibyte keyboard input problem

scianagoryczy
Thanks for the reply!
In multibyte mode (I mean without --unibyte) Emacs 23.1 works great for me :)
I'll try to explain why I need unibyte mode. I'm the maintainer of C/C++
source code which has comments encoded in cp1250 (Polish language) but strings
in the code encoded in cp852. So I have two different code pages in one source
file. This is old source code; it was developed on Windows (that's why the
comments are in cp1250) but is compiled to run on MS-DOS (that's why the
strings are encoded in cp852). Of course, in multibyte mode I am able to write
in these code pages (for example by reloading the file with C-x RET r), but
when I select cp1250 to save the buffer, Emacs often tells me that some
cp852-encoded characters cannot be saved in cp1250, and it wants me to choose
between raw-text, no-conversion and emacs-mule. In this situation I have to
enter "cp1250" and force Emacs to save the buffer in cp1250. I do not want to
type "cp1250" again and again when saving the buffer to a file. Additionally,
I'm not sure what exactly happens to the cp852-encoded characters when I force
saving in cp1250 (I noticed that both the cp1250 and the cp852 characters are
encoded OK).
That's why I decided to use unibyte mode. But as I described, I found a
problem with typing native Polish characters in unibyte mode in Emacs 23.1.
In fact, I want to change the mode while Emacs is running, I mean not with
--unibyte but by calling set-buffer-multibyte with nil when a cpp file is
loaded, but it seems this function does not work correctly, or I do not
understand something.

Here is how I configure Language Environment:
 '(current-language-environment "Polish")
 '(language-info-custom-alist (quote (("Polish" (charset cp1250)
   (coding-system cp1250) (coding-priority cp1250 cp852)
   (nonascii-translation . cp1250) (unibyte-display . cp1250)))))
 '(unibyte-display-via-language-environment t)

--
tomek

On Thursday 17 December 2009 17:47:29 Jason Rumney wrote:
> Tomasz Zbrożek wrote:
> > Hi,
> > In Emacs 23.1, in unibyte mode (emacs --unibyte)
>
> Does it work as expected if you remove the --unibyte?






bug#5235: 23.1; Unibyte keyboard input problem

Stefan Monnier
> In multibyte mode (I mean no --unibyte) Emacs 23.1 works great for me :)

--unibyte is deprecated, so rather than try and "fix" it, we want to fix
the problem that caused you to use --unibyte.

> I'll try to explain why I need unibyte mode. I'm maintainer of a C/C++
> source  code which has comments coded in cp1250 (polish language) but
> strings in code  are coded in cp852. So I have two different code
> pages in source code file.  This is old source code and it was
> developed in Windows (that's why comments  are in cp1250) but is
> compiled to work on MS-DOS (that's why strings are  coded in cp852).

So what happens if you read those files as binary (i.e. C-x RET
r binary RET)?
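
For anyone who prefers to try this from Lisp rather than with C-x RET r, here is a minimal sketch. The command name `my-find-file-as-binary` is hypothetical; `coding-system-for-read` and `find-file` are standard. It assumes the file is not already visited in another buffer, because an existing buffer would simply be reused without re-reading the file.

```elisp
(defun my-find-file-as-binary (file)
  "Visit FILE without decoding, so that every byte is kept as it is.
Hypothetical helper; only the name is invented for this sketch."
  (interactive "fFind file as binary: ")
  ;; `binary' does no character decoding and no end-of-line conversion.
  (let ((coding-system-for-read 'binary))
    (find-file file)))
```

Non-ASCII bytes will then most likely be shown as octal escapes such as \210, which is expected when nothing is decoded.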


        Stefan





bug#5235: 23.1; Unibyte keyboard input problem

Jason Rumney-4
Stefan Monnier wrote:

>> I'll try to explain why I need unibyte mode. I'm maintainer of a C/C++
>> source  code which has comments coded in cp1250 (polish language) but
>> strings in code  are coded in cp852. So I have two different code
>> pages in source code file.  This is old source code and it was
>> developed in Windows (that's why comments  are in cp1250) but is
>> compiled to work on MS-DOS (that's why strings are  coded in cp852).
>>    
>
> So what happens if you read those files as binary (i.e. C-x RET
> r binary RET)?
>  

At best, he'd end up silently screwing up his files even further, with
cp1250, cp852 and now utf-8 encoded characters in them.  More likely he
would still get prompted when saving, just as if he'd used cp1250 or
cp852 to read them.

The problem here is the files, not Emacs.  Basically the reason for
using unibyte is that it allows the user to bury their head in the sand
and pretend the problem does not exist.

I work on similar files in my day job, with Japanese comments in
ShiftJIS and Chinese comments in GB2312. An easy method of fixing such
files would be nice, but the best I can think of would be to provide a
recode-region function, which would still be too much manual work to be
worth it to me given that I can barely make sense of the Japanese
comments and can't make any sense of the Chinese ones. The original
poster might be more motivated to make use of such a function if it
existed though.
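
A rough sketch of what such a function could look like, assuming the region was decoded with one coding system and should have been decoded with another. The name `my-redecode-region` and its argument names are invented for illustration; if your Emacs already provides a `recode-region` command, that is probably the better choice.

```elisp
(defun my-redecode-region (start end wrong-coding right-coding)
  "Re-decode the text between START and END.
The text is assumed to have been decoded with WRONG-CODING.  It is
encoded back to bytes and then decoded again with RIGHT-CODING.
Hypothetical helper, written only as an illustration."
  (interactive "r\nzText was decoded as: \nzIt should be decoded as: ")
  (save-restriction
    ;; Narrowing keeps the bounds valid while the text changes length.
    (narrow-to-region start end)
    (encode-coding-region (point-min) (point-max) wrong-coding)
    (decode-coding-region (point-min) (point-max) right-coding)))
```

For the files discussed in this thread, one would select a string literal and call it with cp1250 as the wrong coding system and cp852 as the right one. The manual selection of each region is exactly the work described above as too tedious.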







bug#5235: 23.1; Unibyte keyboard input problem

Eli Zaretskii
> Date: Thu, 24 Dec 2009 23:21:41 +0800
> From: Jason Rumney <[hidden email]>
> Cc: Tomasz Zbrożek <[hidden email]>,
> [hidden email], [hidden email]
>
> The problem here is the files, not Emacs.

I'd say, more accurately: the problem is that Emacs does not support
such use-cases.  It would be nice if we did: having comments in one
encoding and strings in another is not such a corner case.

> I work on similar files in my day job, with Japanese comments in
> ShiftJIS and Chinese comments in GB2312. An easy method of fixing such
> files would be nice, but the best I can think of would be to provide a
> recode-region function

We would also need a way to encode different regions differently.
Perhaps adding special text properties to guide the encoding process
would be a way of doing that (we already have charset properties for
similar reasons).






bug#5235: 23.1; Unibyte keyboard input problem

scianagoryczy
In reply to this post by Jason Rumney-4
The multibyte mode and its prompts for the correct code page are not the
problem. I think that is definitely CORRECT behaviour, and it is not the issue
I wanted to report to you.
I think that the solution for the problem of two code pages in one file is
unibyte mode.

I opened this bug report to get an answer to the question: why, in unibyte
mode, when I try to type in cp1250, do I get codes like ^E instead of the
proper characters in the buffer? This behaviour is not correct, even compared
to the previous Emacs version (22.3). So my question is: how can this strange
keyboard input behaviour in unibyte mode be fixed?

--
tomek

On Thursday 24 December 2009 16:21:41 Jason Rumney wrote:

> Stefan Monnier wrote:
> >> I'll try to explain why I need unibyte mode. I'm maintainer of a C/C++
> >> source  code which has comments coded in cp1250 (polish language) but
> >> strings in code  are coded in cp852. So I have two different code
> >> pages in source code file.  This is old source code and it was
> >> developed in Windows (that's why comments  are in cp1250) but is
> >> compiled to work on MS-DOS (that's why strings are  coded in cp852).
> >
> > So what happens if you read those files as binary (i.e. C-x RET
> > r binary RET)?
>
> At best, he'd end up silently screwing up his files even further, with
> cp1250, cp852 and now utf-8 encoded characters in them.  More likely he
> would still get prompted when saving, just as if he'd used cp1250 or
> cp852 to read them.
>
> The problem here is the files, not Emacs.  Basically the reason for
> using unibyte is that it allows the user to bury their head in the sand
> and pretend the problem does not exist.
>
> I work on similar files in my day job, with Japanese comments in
> ShiftJIS and Chinese comments in GB2312. An easy method of fixing such
> files would be nice, but the best I can think of would be to provide a
> recode-region function, which would still be too much manual work to be
> worth it to me given that I can barely make sense of the Japanese
> comments and can't make any sense of the Chinese ones. The original
> poster might be more motivated to make use of such a function if it
> existed though.








bug#5235: 23.1; Unibyte keyboard input problem

Jason Rumney-4
Tomasz Zbrożek wrote:
> I started this bug-case to get the answer to the question: why in unibyte mode
> when I try to write in cp1250 I get codes like ^E instead of proper chars in
> buffer ?

Keyboard input on Windows is Unicode in 23.1.  In previous versions it
was in the system default codepage.
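
One possible way to see where ^E could come from is sketched below. It rests on an assumption that is not confirmed anywhere in this thread: that the key event carries the Unicode code point of the typed character and that only its low 8 bits end up in a unibyte buffer.

```elisp
;; Assumption for illustration only: the Unicode code point of `ą'
;; (U+0105) is cut down to its low byte on insertion.
(let* ((unicode-code #x105)
       (low-byte (logand unicode-code #xff))
       ;; The byte that cp1250 itself uses for the same character.
       (cp1250-byte (aref (encode-coding-string (string unicode-code) 'cp1250) 0)))
  (list low-byte
        (key-description (vector low-byte))
        cp1250-byte))
```

The low byte is 5, the control character that Emacs displays as ^E, whereas the cp1250 byte for the same letter is 185 (0xB9). If the assumption holds, the input is simply never converted to the code page.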



>  This behaviour is not correct even when comparing to previous Emacs
> version (22.3). So, my question is how to fix this strange keyboard input
> behaviour in unibyte mode ?
>  

What is "correct" is undefined in unibyte mode, since unibyte deals with
bytes, not characters.
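
To make the distinction concrete, here is a small sketch that reads one and the same byte through the two code pages mentioned in this thread. Only standard functions are used; the choice of byte 0xA5 is just an example.

```elisp
;; One byte, two different characters depending on the code page.
(let ((byte (unibyte-string #xa5)))
  (list (decode-coding-string byte 'cp1250)    ; Ą (U+0104) in cp1250
        (decode-coding-string byte 'cp852)))   ; ą (U+0105) in cp852
```

A unibyte buffer stores only the byte, so it cannot know which of the two characters was meant.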






bug#5235: 23.1; Unibyte keyboard input problem

scianagoryczy
On Friday 25 December 2009 12:23:42 Jason Rumney wrote:
> Tomasz Zbrożek wrote:
> > I started this bug-case to get the answer to the question: why in unibyte
> > mode when I try to write in cp1250 I get codes like ^E instead of proper
> > chars in buffer ?
>
> Keyboard input on Windows is Unicode in 23.1.  In previous versions it
> was in the system default codepage.
Is this why I get the '^E' code instead of 'ą' when I press right Alt + 'a' in
unibyte mode with the code page set to cp1250 (Emacs version 23.1)?
I checked it on Windows and GNU/Linux and it behaves the same.

Is there a way to change the Emacs configuration somehow to get proper
Polish characters when typing in unibyte mode?



--
tomek





bug#5235: 23.1; Unibyte keyboard input problem

Eli Zaretskii
In reply to this post by scianagoryczy
> From: Tomasz Zbrożek <[hidden email]>
> Date: Fri, 25 Dec 2009 12:03:29 +0100
> Cc: [hidden email], [hidden email]
>
> The multibyte mode and its prompts for correct codepage is not problem. I
> think it's definitely CORRECT behaviour and it's not the case I wanted to
> submit  to you.  
> I think that solution for the problem with two code pages in one file is
> unibyte mode.

I think Emacs developers are much more motivated to improve the
multibyte mode than to fix the unibyte mode.  I cannot speak for the
head maintainers, but that is certainly my opinion: the unibyte mode
should simply die, as a mode for interactive editing.

You received several suggestions for trying things in multibyte mode.
Perhaps you could try them and see if they allow you to edit your
programs without screwing up the cp852 characters.  If something is
still wrong, please describe the problems here: we are much more
likely to find a solution for multibyte mode editing than for unibyte.






bug#5235: 23.1; Unibyte keyboard input problem

scianagoryczy
In reply to this post by Jason Rumney-4

>I think Emacs developers are much more motivated to improve the
>multibyte mode than to fix the unibyte mode.  I cannot speak for the
>head maintainers, but that is certainly my opinion: the unibyte mode
>should simply die, as a mode for interactive editing.
ok, I will not use unibyte mode :)

>You received several suggestions for trying things in multibyte mode.
>Perhaps you could try them and see if they allow you to edit your
>programs without screwing up the cp852 characters.  If something is
>still wrong, please describe the problems here: we are much more
>likely to find a solution for multibyte mode editing than for unibyte.
So my only problem (in multibyte mode) is the annoying question about a safe
coding system when saving the buffer. I attach a new screenshot:
- in the top buffer you see my file, which originally has some cp1250
characters and also some cp852 characters,
- in the middle buffer you see that I have cp1250 set for saving this buffer,
- and below there is a buffer with the information that it is not possible to
encode the \210 character (originally cp852) in cp1250 (cp1250 is my code page
for saving, but of course after saving, this character should be cp852-encoded
in the file, and it will be when I force cp1250; this is OK).

I can't find any way to make Emacs stop prompting me to select a code page.
I understand that Emacs treats it as an error (in its opinion the \210
character is wrong), but I would like to declare somehow that cp1250 is safe.
"-*- coding: cp1250 -*-" or the modify-coding-system-alist function is not
a solution.

My only question is: how can I configure Emacs to skip this code page
selection in such a situation?

I would be thankful for help!
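
One thing that might be worth trying here is sketched below. It is an assumption, not an answer from the maintainers, and it has not been checked against 23.1: binding `coding-system-for-write` around the save should make Emacs use that coding system directly instead of asking for a safe one. The command name `my-save-buffer-as-cp1250` is hypothetical.

```elisp
(defun my-save-buffer-as-cp1250 ()
  "Save the current buffer encoded as cp1250 without asking.
Hypothetical helper.  Nothing checks whether cp1250 can really
encode every character, so this trades safety for silence."
  (interactive)
  ;; Raw bytes such as \210 are expected to be written out unchanged.
  (let ((coding-system-for-write 'cp1250))
    (save-buffer)))
```

It could be bound to a key in the C/C++ modes instead of the usual C-x C-s. The missing safety check is the price: a character that belongs to neither code page would be written incorrectly without any warning.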

[Attachment: emacs.png (214K)]

bug#5235: 23.1; Unibyte keyboard input problem

Eli Zaretskii
> From: Tomasz Zbrożek <[hidden email]>
> Date: Sat, 26 Dec 2009 13:45:53 +0100
> Cc: Stefan Monnier <[hidden email]>,
>  [hidden email],
>  Eli Zaretskii <[hidden email]>
>
> encode \210 char (originally cp852) to cp1250 (because cp1250 is my codepage
> to save, but of course after saving this char in the file should be cp852
> coded and it will be when I force cp1250 - this is ok)
>
> I can't find any way to force emacs not to prompt me with codepage selection,
> I understand emacs treats it like an error (in his opinion \210 char is
> wrong) but I would like to set somehow that cp1250 is safe,
> "-*- coding: cp1250 -*-" or modify-coding-system-alist function is not
> solution
>
> my only question is: how to configure emacs to omit this codepage selection in
> such situation?

Does it help to evaluate the expression below?

   (aset latin-extra-code-table ?\210 t)

Please do that _before_ visiting files which have the \210 character.
Then try to save such a file and see if this helps.

The above only handles the \210 character, so please don't try to use
any other characters whose code is between 128 and 160.  If this
works, it is trivial to cover the entire range, of course.
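
Covering the entire range could look like the sketch below. It assumes that the range meant is 128 to 159 inclusive, and it is only worth evaluating once the single-character test above has been shown to help.

```elisp
;; Mark every code from 128 to 159 as acceptable, not only ?\210 (= 136).
(let ((code 128))
  (while (< code 160)
    (aset latin-extra-code-table code t)
    (setq code (1+ code))))
```

As with the single-character version, this has to be evaluated before the files are visited.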

If the above does not help with cp1250, please try the same with
latin-2 instead (you will have to modify the `coding:' cookie for this
to work).






bug#5235: 23.1; Unibyte keyboard input problem

Stefan Monnier
In reply to this post by Jason Rumney-4
>>> I'll try to explain why I need unibyte mode. I'm maintainer of a C/C++
>>> source  code which has comments coded in cp1250 (polish language) but
>>> strings in code  are coded in cp852. So I have two different code
>>> pages in source code file.  This is old source code and it was
>>> developed in Windows (that's why comments  are in cp1250) but is
>>> compiled to work on MS-DOS (that's why strings are  coded in cp852).
>> So what happens if you read those files as binary (i.e. C-x RET
>> r binary RET)?
> At best, he'd end up silently screwing up his files even further, with
> cp1250, cp852 and now utf-8 encoded characters in them.  More likely he
> would still get prompted when saving, just as if he'd used cp1250 or cp852
> to read them.

That would be a bug: a file visited as `binary' (or as `raw-text')
should be placed in a unibyte buffer, so it should not screw anything up
more than was already the case to start with.

> The problem here is the files, not Emacs.  Basically the reason for using
> unibyte is that it allows the user to bury their head in the sand and
> pretend the problem does not exist.

Of course, but if you start with such files and can't (or don't want to)
recode the parts consistently, we can't do much better.

> I work on similar files in my day job, with Japanese comments in ShiftJIS
> and Chinese comments in GB2312. An easy method of fixing such files would be
> nice, but the best I can think of would be to provide a recode-region
> function, which would still be too much manual work to be worth it to me
> given that I can barely make sense of the Japanese comments and can't make
> any sense of the Chinese ones. The original poster might be more motivated
> to make use of such a function if it existed though.

I'm not sure what would be the best approach in general or in particular
cases, but we could certainly provide a command that recodes comments.
Or another one that looks for invalid byte sequences (i.e. decoded as
eight-bit-bytes) and tries to re-decode them with a secondary coding system.
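
A minimal sketch of the second idea, assuming a multibyte buffer in which undecodable bytes show up as characters of the `eight-bit` charset. The command name `my-redecode-raw-bytes` is hypothetical. Note the limitation: bytes that happen to be valid in the primary coding system are decoded (possibly wrongly) on the first pass and are not touched here.

```elisp
(defun my-redecode-raw-bytes (secondary-coding)
  "Decode every run of raw bytes in the buffer with SECONDARY-CODING.
Hypothetical helper, meant for multibyte buffers only."
  (interactive "zDecode stray bytes as: ")
  (save-excursion
    (goto-char (point-min))
    (while (not (eobp))
      (if (not (eq (char-charset (char-after)) 'eight-bit))
          (forward-char 1)
        ;; Collect the whole run of raw bytes starting at point.
        (let ((start (point)))
          (while (and (not (eobp))
                      (eq (char-charset (char-after)) 'eight-bit))
            (forward-char 1))
          (let ((bytes (string-to-unibyte
                        (buffer-substring-no-properties start (point)))))
            ;; Replace the run with its decoding; point ends up after it.
            (delete-region start (point))
            (insert (decode-coding-string bytes secondary-coding))))))))
```

For the files in this thread one would visit the file as cp1250 and then run the command with cp852, which would turn a stray \210 into the cp852 character for that byte.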


        Stefan





bug#5235: 23.1; Unibyte keyboard input problem

scianagoryczy
In reply to this post by scianagoryczy
Eli,
is anything happening with bug 5235 on the Emacs bug list?
I mean, will your patch (as I remember, it needs some improvement ;) be
included in the current Emacs version?

best regards!
--
tomek





bug#5235: 23.1; Unibyte keyboard input problem

Eli Zaretskii
> From: Tomasz Zbrożek <[hidden email]>
> Date: Fri, 26 Feb 2010 21:42:34 +0100
>
> Eli,
> is something going on with case 5235 on emacs bug list ?
> I mean, will your patch (as I remember it needs some improvement ;) be
> implemented to the emacs current version ?

I didn't yet have time to work on the improvement, sorry.  So I guess
it will not be in Emacs 23.2.






bug#5235: 23.1; Unibyte keyboard input problem

Lars Ingebrigtsen
In reply to this post by scianagoryczy
Tomasz Zbrożek <[hidden email]> writes:

> In Emacs 23.1, in unibyte mode (emacs --unibyte) and with windows-1250
> coding I can't write Polish chars with right Alt key.

The --unibyte switch has been removed, so I can't reproduce the bug in
question here, so I'm going to go ahead and guess that this is no longer
relevant, and I'm closing this bug report.  Although skimming this bug
report, I'm wondering whether this is still relevant if you're
explicitly (set-buffer-multibyte nil) and entering text, but...  I'm not
sure?  If it is, please respond to the debbugs address, and we'll reopen.

--
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no