bug#12693: 24.2.50; src/w32font.c should depend on ANSI code page

Kazuhiro Ito
When I run Emacs on Cygwin with the native Windows UI, I can't specify
a font by a non-ASCII font name.  For example, the code below succeeds
on the precompiled binary on Windows (Japanese edition) but raises an
error on Cygwin with the native Windows UI.

(set-default-font "MS ゴシック-14")

The reason is that the lfFaceName member of the LOGFONT structure is
expected to be encoded in the ANSI code page, but Emacs encodes and
decodes it with the coding system specified in the
locale-coding-system variable.  That variable is set to utf-8-unix on
Cygwin, which causes the above problem.
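
To make the mismatch concrete, here is a minimal, portable C sketch (not
Emacs code).  It assumes a system whose ANSI code page is CP932,
hard-codes the bytes of the face name in both encodings, and copies them
into a LOGFONT-style buffer.  FACE_SIZE, fill_face_name and
matches_installed_font are hypothetical names used only for
illustration; matches_installed_font merely stands in for the lookup an
ANSI font API would perform.

```c
#include <assert.h>
#include <string.h>

#define FACE_SIZE 32  /* same value as LF_FACESIZE in the Windows headers */

/* "MS ゴシック" encoded as UTF-8, i.e. what utf-8-unix produces.  */
static const char name_utf8[] =
  "MS \xE3\x82\xB4\xE3\x82\xB7\xE3\x83\x83\xE3\x82\xAF";

/* The same name encoded in CP932, the assumed ANSI code page.  */
static const char name_cp932[] = "MS \x83\x53\x83\x56\x83\x62\x83\x4E";

/* Copy an already-encoded name into a fixed-size face-name buffer,
   truncating and NUL-terminating as the font code does.  */
static void
fill_face_name (char face[FACE_SIZE], const char *encoded)
{
  assert (encoded != NULL);
  strncpy (face, encoded, FACE_SIZE);
  face[FACE_SIZE - 1] = '\0';
}

/* Hypothetical stand-in for the ANSI font lookup: the installed font
   is only found when the bytes are the CP932 form of its name.  */
static int
matches_installed_font (const char face[FACE_SIZE])
{
  return strcmp (face, name_cp932) == 0;
}
```

Filling the buffer from name_utf8 gives a 15-byte name that does not
match, while name_cp932 gives the 11-byte name that does.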

I think the patch below, or a similar modification, is needed.


=== modified file 'src/w32font.c'
--- src/w32font.c 2012-09-17 12:07:36 +0000
+++ src/w32font.c 2012-10-20 12:12:49 +0000
@@ -34,6 +34,15 @@
 #include "font.h"
 #include "w32font.h"
 
+/* From w32select.c */
+extern Lisp_Object QANSICP;
+
+#define ENCODE_ACP(str) \
+  (code_convert_string_norecord (str, QANSICP, 1))
+
+#define DECODE_ACP(str) \
+  (code_convert_string_norecord (str, QANSICP, 0))
+
 /* Cleartype available on Windows XP, cleartype_natural from XP SP1.
    The latter does not try to fit cleartype smoothed fonts into the
    same bounding box as the non-antialiased version of the font.
@@ -285,7 +294,7 @@
 Lisp_Object
 intern_font_name (char * string)
 {
-  Lisp_Object str = DECODE_SYSTEM (build_string (string));
+  Lisp_Object str = DECODE_ACP (build_string (string));
   int len = SCHARS (str);
   Lisp_Object obarray = check_obarray (Vobarray);
   Lisp_Object tem = oblookup (obarray, SDATA (str), len, len);
@@ -971,10 +980,10 @@
       }
     if (name)
       font->props[FONT_FULLNAME_INDEX]
-        = DECODE_SYSTEM (build_string (name));
+        = DECODE_ACP (build_string (name));
     else
       font->props[FONT_FULLNAME_INDEX]
-        = DECODE_SYSTEM (build_string (logfont.lfFaceName));
+        = DECODE_ACP (build_string (logfont.lfFaceName));
   }
 
   font->max_width = w32_font->metrics.tmMaxCharWidth;
@@ -2035,7 +2044,7 @@
       else if (SYMBOLP (tmp))
         {
           strncpy (logfont->lfFaceName,
-                   SDATA (ENCODE_SYSTEM (SYMBOL_NAME (tmp))), LF_FACESIZE);
+                   SDATA (ENCODE_ACP (SYMBOL_NAME (tmp))), LF_FACESIZE);
           logfont->lfFaceName[LF_FACESIZE-1] = '\0';
         }
     }
@@ -2131,7 +2140,7 @@
       if (NILP (family))
         continue;
       else if (SYMBOLP (family))
-        name = SDATA (ENCODE_SYSTEM (SYMBOL_NAME (family)));
+        name = SDATA (ENCODE_ACP (SYMBOL_NAME (family)));
       else
         continue;
 
@@ -2511,7 +2520,7 @@
       || logfont_to_fcname (&lf, cf.iPointSize, buf, 100) < 0)
     return Qnil;
 
-  return DECODE_SYSTEM (build_string (buf));
+  return DECODE_ACP (build_string (buf));
 }
 
 static const char *const w32font_booleans [] = {

=== modified file 'src/w32select.c'
--- src/w32select.c 2012-10-11 00:32:25 +0000
+++ src/w32select.c 2012-10-20 06:11:00 +0000
@@ -117,7 +117,8 @@
    based on current system parameters. */
 static LCID DEFAULT_LCID;
 static UINT ANSICP, OEMCP;
-static Lisp_Object QUNICODE, QANSICP, QOEMCP;
+static Lisp_Object QUNICODE, QOEMCP;
+Lisp_Object QANSICP;
 
 /* A hidden window just for the clipboard management. */
 static HWND clipboard_owner;


--
Kazuhiro Ito




bug#12693: 24.2.50; src/w32font.c should depend on ANSI code page

Jason Rumney
Kazuhiro Ito <[hidden email]> writes:

> When I run Emacs on Cygwin with the native Windows UI, I can't specify
> a font by a non-ASCII font name.  For example, the code below succeeds
> on the precompiled binary on Windows (Japanese edition) but raises an
> error on Cygwin with the native Windows UI.
>
> (set-default-font "MS ゴシック-14")
>
> The reason is that the lfFaceName member of the LOGFONT structure is
> expected to be encoded in the ANSI code page, but Emacs encodes and
> decodes it with the coding system specified in the
> locale-coding-system variable.  That variable is set to utf-8-unix on
> Cygwin, which causes the above problem.

This is a problem with the Cygwin build's initialisation of
locale-coding-system. It is supposed to be set to the coding system that
system calls will accept, which on Windows cannot be utf-8 (maybe on
recent versions it can be, but when I tried on Windows XP, it caused all
manner of problems).






bug#12693: 24.2.50; src/w32font.c should depend on ANSI code page

Kazuhiro Ito
> > When I run Emacs on Cygwin with the native Windows UI, I can't specify
> > a font by a non-ASCII font name.  For example, the code below succeeds
> > on the precompiled binary on Windows (Japanese edition) but raises an
> > error on Cygwin with the native Windows UI.
> >
> > (set-default-font "MS ゴシック-14")
> >
> > The reason is that the lfFaceName member of the LOGFONT structure is
> > expected to be encoded in the ANSI code page, but Emacs encodes and
> > decodes it with the coding system specified in the
> > locale-coding-system variable.  That variable is set to utf-8-unix on
> > Cygwin, which causes the above problem.
>
> This is a problem with the Cygwin build's initialisation of
> locale-coding-system. It is supposed to be set to the coding system that
> system calls will accept, which on Windows cannot be utf-8 (maybe on
> recent versions it can be, but when I tried on Windows XP, it caused all
> manner of problems).

On Cygwin, locale-coding-system's value depends on its environment.
For example,

$ env LANG=ja_JP.CP932 emacs --batch --eval '(princ locale-coding-system)'
-> japanese-cp932-unix

$ env LANG=ja_JP.UTF-8 emacs --batch --eval '(princ locale-coding-system)'
-> utf-8-unix


Also, some functions expect locale-coding-system to be set to the
locale's coding system, not the ANSI code page.
Please try the code below (Cygwin, locale is ja_JP.UTF-8).

(list
 locale-coding-system
 (let ((locale-coding-system 'utf-8))
   (format-time-string "%c"))
 (let ((locale-coding-system 'cp932))
   (format-time-string "%c")))

-> (utf-8-unix "2012年10月23日 21時30分39秒" #("2012蟷エ10譛\21023譌・ 21譎\20230蛻\20639遘\222" 4 5 (charset cp932-2-byte) 5 8 (charset katakana-sjis) 8 13 (charset cp932-2-byte) 13 17 (charset katakana-sjis) 17 26 (charset cp932-2-byte)))


At present, locale-coding-system has to be the ANSI code page for
(w32-select-font), and has to be the locale's coding system for
(format-time-string "%c").  The cause is that we use two kinds of
system calls, the Windows API and the Cygwin API (maybe three, if we
count the Windows Unicode API).

--
Kazuhiro Ito

bug#12693: 24.2.50; src/w32font.c should depend on ANSI code page

Eli Zaretskii
In reply to this post by Jason Rumney
> From: Jason Rumney <[hidden email]>
> Date: Tue, 23 Oct 2012 19:52:30 +0800
> Cc: [hidden email]
>
> [locale-coding-system] is supposed to be set to the coding system that
> system calls will accept, which on Windows cannot be utf-8 (maybe on
> recent versions it can be, but when I tried on Windows XP, it caused all
> manner of problems).

No, UTF-8 still cannot be used on Windows, AFAIK.




bug#12693: 24.2.50; src/w32font.c should depend on ANSI code page

Eli Zaretskii
In reply to this post by Kazuhiro Ito
> Date: Tue, 23 Oct 2012 22:05:46 +0900
> From: Kazuhiro Ito <[hidden email]>
> Cc: [hidden email]
>
> On Cygwin, locale-coding-system's value depends on its environment.
> For example,
>
> $ env LANG=ja_JP.CP932 emacs --batch --eval '(princ locale-coding-system)'
> -> japanese-cp932-unix
>
> $ env LANG=ja_JP.UTF-8 emacs --batch --eval '(princ locale-coding-system)'
> -> utf-8-unix

This is not necessarily relevant to Emacs, or at least doesn't provide
a definitive answer to the question of what encoding ENCODE_SYSTEM
should use in the cygw32 build, which is a kind of androgyne wrt
encoding and decoding issues.

There are several places where this issue might (or will) pop up:

  . decoding keyboard key events
  . encoding and decoding file names
  . encoding strings passed to various non-file APIs, like the one you
    mentioned

At least the first 2 items use different single-byte encodings in the
GUI and the console frames.

Someone(TM) should analyze all these and come up with recommendations
whether cygw32 should cater to the normal Cygwin locale, or maybe for
practical reasons it should do something else.

> Please try the code below (Cygwin, locale is ja_JP.UTF-8).
>
> (list
>  locale-coding-system
>  (let ((locale-coding-system 'utf-8))
>    (format-time-string "%c"))
>  (let ((locale-coding-system 'cp932))
>    (format-time-string "%c")))

This is but one example.  As you yourself found out, this encoding is
unsuitable for the font interface.

> At present, locale-coding-system has to be the ANSI code page for
> (w32-select-font)

So maybe we need w32-select-font to use UTF-16 in the cygw32 case, as
it does for menus.

> The cause is that we use two kinds of system calls, the Windows API
> and the Cygwin API (maybe three, if we count the Windows Unicode API).

See above: there's much more than just 3.




bug#12693: 24.2.50; src/w32font.c should depend on ANSI code page

Daniel Colascione
On 10/23/2012 9:22 AM, Eli Zaretskii wrote:

>> Date: Tue, 23 Oct 2012 22:05:46 +0900
>> From: Kazuhiro Ito <[hidden email]>
>> Cc: [hidden email]
>>
>> On Cygwin, locale-coding-system's value depends on its environment.
>> For example,
>>
>> $ env LANG=ja_JP.CP932 emacs --batch --eval '(princ locale-coding-system)'
>> -> japanese-cp932-unix
>>
>> $ env LANG=ja_JP.UTF-8 emacs --batch --eval '(princ locale-coding-system)'
>> -> utf-8-unix
>
> This is not necessarily relevant to Emacs, or at least doesn't provide
> a definitive answer to the question of what encoding ENCODE_SYSTEM
> should use in the cygw32 build, which is a kind of androgyne wrt
> encoding and decoding issues.
>
> There are several places where this issue might (or will) pop up:
>
>   . decoding keyboard key events
Already handled, I believe.

>   . encoding and decoding file names

We talk to Cygwin here, so there's no problem using locale-coding-system.

>   . encoding strings passed to various non-file APIs, like the one you
>     mentioned

I tried to ferret these out when I was doing the initial port, but it looks like
I missed the font code.

>
> At least the first 2 items use different single-byte encodings in the
> GUI and the console frames.
>
> Someone(TM) should analyze all these and come up with recommendations
> whether cygw32 should cater to the normal Cygwin locale, or maybe for
> practical reasons it should do something else.

The right code for Cygw32 is to always define NTGUI_UNICODE and unconditionally
use Unicode APIs when NTGUI_UNICODE is set. Maybe, someday, we can define
NTGUI_UNICODE for the NT port too.
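
As a rough illustration of what filling a wide-character face name
involves, here is a portable C sketch that converts a UTF-8 name into
UTF-16 code units, sized like the lfFaceName array of LOGFONTW.  This is
not the Emacs implementation: utf8_to_utf16 and FACE_UNITS are
hypothetical names, and on Windows itself one would normally call
MultiByteToWideChar rather than hand-roll the conversion.  The decoder
is deliberately simple and does not reject overlong forms or encoded
surrogates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FACE_UNITS 32  /* same value as LF_FACESIZE */

/* Convert NUL-terminated UTF-8 in SRC to NUL-terminated UTF-16 in DST,
   which has room for DST_UNITS code units.  Return the number of units
   written (excluding the terminator), or -1 if the input is malformed
   or does not fit.  */
static int
utf8_to_utf16 (const unsigned char *src, uint16_t *dst, size_t dst_units)
{
  size_t n = 0;

  assert (src != NULL && dst != NULL);
  if (dst_units == 0)
    return -1;

  while (*src)
    {
      unsigned char b = *src++;
      uint32_t cp;
      int extra;

      if (b < 0x80)                { cp = b;        extra = 0; }
      else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
      else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
      else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; }
      else
        return -1;                 /* stray continuation or invalid lead */

      while (extra-- > 0)
        {
          if ((*src & 0xC0) != 0x80)
            return -1;             /* truncated or malformed sequence */
          cp = (cp << 6) | (uint32_t) (*src++ & 0x3F);
        }

      if (cp > 0x10FFFF)
        return -1;

      if (cp >= 0x10000)
        {
          /* Outside the BMP: emit a surrogate pair.  */
          if (n + 2 >= dst_units)
            return -1;
          cp -= 0x10000;
          dst[n++] = (uint16_t) (0xD800 | (cp >> 10));
          dst[n++] = (uint16_t) (0xDC00 | (cp & 0x3FF));
        }
      else
        {
          if (n + 1 >= dst_units)
            return -1;
          dst[n++] = (uint16_t) cp;
        }
    }

  dst[n] = 0;
  return (int) n;
}
```

For the UTF-8 form of "MS ゴシック" this produces seven code units, the
katakana becoming U+30B4, U+30B7, U+30C3 and U+30AF, independent of
both the locale and the ANSI code page.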



bug#12693: 24.2.50; src/w32font.c should depend on ANSI code page

Eli Zaretskii
> Date: Thu, 25 Oct 2012 14:18:07 -0700
> From: Daniel Colascione <[hidden email]>
> CC: Kazuhiro Ito <[hidden email]>, [hidden email]
>
> The right code for Cygw32 is to always define NTGUI_UNICODE and unconditionally
> use Unicode APIs when NTGUI_UNICODE is set.

I figured that much.  So I suggest that the patch to fix this issue be
reworked in that direction.

> Maybe, someday, we can define NTGUI_UNICODE for the NT port too.

That's the plan, yes.  Although I think it will not be a compile-time
test, since there's a lot of work involved, and so some old code will
have to coexist with the new for some time.  Volunteers are welcome.




bug#12693: [cygwin] Setting fonts with non-ascii names throws error quit

Kazuhiro Ito
In reply to this post by Kazuhiro Ito
> > When I run Emacs on Cygwin with the native Windows UI, I can't specify
> > a font by a non-ASCII font name.  For example, the code below succeeds
> > on the precompiled binary on Windows (Japanese edition) but raises an
> > error on Cygwin with the native Windows UI.
> >
> > (set-default-font "MS ゴシック-14")
>
> This was seven years ago, and this function no longer exists, so
> obviously things have changed in this area.  Are you still seeing this
> bug in a recent version of Emacs?

Yes.

(set-frame-font "MS ゴシック-14") raises an error on the Cygw32 build
but not on the MinGW64 build.  The x-select-font function returns an
undecoded (still encoded) string on the Cygw32 build.  Let-binding
locale-coding-system to the correct codepage avoids the problem.

;; Chose "MS ゴシック-14"
(x-select-font)

-> "\202l\202r \203S\203V\203b\203N-14"

(let ((locale-coding-system 'cp932))
  (x-select-font))

-> #("MS ゴシック-14" 0 10 (charset cp932-2-byte))

(set-frame-font "MS ゴシック-14")

-> error

(let ((locale-coding-system 'cp932))
  (set-frame-font "MS ゴシック-14"))

-> Frame font is changed.
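
The escaped string returned by x-select-font above looks like the CP932
bytes of the chosen face name passed through without decoding.  The
small C sketch below, which assumes the dialog returned the name with
full-width ＭＳ in CP932, reproduces the same escapes by printing every
byte at or above 0x80 as a three-digit octal escape.  escape_unibyte
and face_cp932 are hypothetical names, not Emacs functions or data.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumed CP932 bytes of the selected name: full-width "MS", a space,
   four katakana, then "-14".  */
static const char face_cp932[] =
  "\x82\x6C\x82\x72 \x83\x53\x83\x56\x83\x62\x83\x4E-14";

/* Write BYTES to OUT, rendering each byte >= 0x80 as a backslash plus
   three octal digits and copying ASCII bytes unchanged.  Return the
   number of characters written; output stops early if OUT is full.  */
static size_t
escape_unibyte (const char *bytes, char *out, size_t out_size)
{
  size_t len = strlen (bytes);
  size_t used = 0;

  assert (out != NULL && out_size > 0);
  out[0] = '\0';

  for (size_t i = 0; i < len; i++)
    {
      unsigned char b = (unsigned char) bytes[i];
      int n;

      if (b >= 0x80)
        n = snprintf (out + used, out_size - used, "\\%03o", (unsigned int) b);
      else
        n = snprintf (out + used, out_size - used, "%c", (int) b);

      if (n < 0 || (size_t) n >= out_size - used)
        {
          out[used] = '\0';  /* drop the partially written item */
          break;
        }
      used += (size_t) n;
    }
  return used;
}
```

Applied to face_cp932 with a 64-byte buffer, this yields
\202l\202r \203S\203V\203b\203N-14, the same text as the undecoded
result shown above.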

--
Kazuhiro Ito




bug#12693: [cygwin] Setting fonts with non-ascii names throws error quit

Lars Ingebrigtsen
Kazuhiro Ito <[hidden email]> writes:

> (set-frame-font "MS ゴシック-14") raises an error on the Cygw32 build
> but not on the MinGW64 build.  The x-select-font function returns an
> undecoded (still encoded) string on the Cygw32 build.  Let-binding
> locale-coding-system to the correct codepage avoids the problem.
>
> ;; Chose "MS ゴシック-14"
> (x-select-font)
>
> -> "\202l\202r \203S\203V\203b\203N-14"

Hm...  I don't use Windows, so I can't test this, but perhaps the result
from `x-select-font' should use `detect-coding-string' or something on
the result (and then decode it) so that we get a correct string in Emacs?

> (let ((locale-coding-system 'cp932))
>   (x-select-font))
>
> -> #("MS ゴシック-14" 0 10 (charset cp932-2-byte))
>
> (set-frame-font "MS ゴシック-14")
>
> -> error
>
> (let ((locale-coding-system 'cp932))
>   (set-frame-font "MS ゴシック-14"))
>
> -> Frame font is changed.

And the same here, but the other way around -- encode the string before
calling set-frame-font?

Unfortunately, on Debian, it looks like none of the fonts available here
have non-ASCII names, so I can't really test whether this idea even
makes any sense.  Anybody?

--
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




bug#12693: [cygwin] Setting fonts with non-ascii names throws error quit

Kazuhiro Ito
> > (set-frame-font "MS ゴシック-14") raises an error on the Cygw32 build
> > but not on the MinGW64 build.  The x-select-font function returns an
> > undecoded (still encoded) string on the Cygw32 build.  Let-binding
> > locale-coding-system to the correct codepage avoids the problem.
> >
> > ;; Chose "MS ゴシック-14"
> > (x-select-font)
> >
> > -> "\202l\202r \203S\203V\203b\203N-14"
>
> Hm...  I don't use Windows, so I can't test this, but perhaps the result
> from `x-select-font' should use `detect-coding-string' or something on
> the result (and then decode it) so that we get a correct string in Emacs?

As discussed in the original thread, Emacs uses the ANSI version of
the Windows API to handle fonts.  Strings passed to the API must be
encoded in the ANSI codepage, and strings received from it must be
decoded from the ANSI codepage.  To do that, the ENCODE_SYSTEM and
DECODE_SYSTEM macros are used (see src/w32font.c), which means that
locale-coding-system is used around the Windows font API.  That works
well on MinGW64, because locale-coding-system is the same as the ANSI
codepage.  But on Cygw32, locale-coding-system is normally utf-8,
which is not the ANSI codepage.  This is the cause of the problem.

The patch in my original post makes Emacs use the ANSI codepage for
the Windows font API.  Further discussion suggested making Emacs on
Windows use the Unicode API where available, but there has been no
progress since then.

--
Kazuhiro Ito