bug#19910: 24.4; Japanese font names are decoded incorrectly in Cygwin's emacs-w32 in LANG=ja_JP.UTF-8


Fujii Hironori
(font-family-list) returns incorrectly decoded Japanese font names.
My locale-coding-system is utf-8-unix.

If I do (setq locale-coding-system 'cp932), it returns correct font names.
But, locale-coding-system is used in other places (e.g. M-x term and M-x man).
locale-coding-system must be utf-8 in my Emacs.
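To illustrate the symptom, here is a minimal, standalone sketch (not Emacs code). It assumes the system hands back font names as CP932 bytes, which is consistent with the cp932 workaround above. The font name "ＭＳ ゴシック" and the names looks_like_utf8, demo_name_cp932 and demo_name_utf8 are made up for illustration. The CP932 bytes are not well-formed UTF-8, so decoding them with a utf-8 locale-coding-system cannot give the right text.

```c
#include <stddef.h>

/* Return 1 if the LEN bytes at S have the lead/continuation byte
   structure of UTF-8, else 0.  Simplified on purpose: overlong forms
   and encoded surrogates are not rejected.  */
int
looks_like_utf8 (const unsigned char *s, size_t len)
{
  size_t i = 0;

  while (i < len)
    {
      unsigned char c = s[i];
      size_t need;              /* continuation bytes expected after C */

      if (c < 0x80)
        need = 0;
      else if ((c & 0xE0) == 0xC0)
        need = 1;
      else if ((c & 0xF0) == 0xE0)
        need = 2;
      else if ((c & 0xF8) == 0xF0)
        need = 3;
      else
        return 0;               /* stray continuation or invalid lead byte */

      if (need > len - i - 1)
        return 0;               /* sequence is cut off */

      for (size_t k = 1; k <= need; k++)
        if ((s[i + k] & 0xC0) != 0x80)
          return 0;             /* not a continuation byte */

      i += need + 1;
    }
  return 1;
}

/* Illustrative font name "ＭＳ ゴシック" encoded in CP932 (assumed example).
   The first byte, 0x82, is a continuation byte in UTF-8 terms, so it
   cannot start a character.  */
const unsigned char demo_name_cp932[] = {
  0x82, 0x6C, 0x82, 0x72, 0x20,
  0x83, 0x53, 0x83, 0x56, 0x83, 0x62, 0x83, 0x4E
};

/* The same name encoded in UTF-8.  */
const unsigned char demo_name_utf8[] = {
  0xEF, 0xBC, 0xAD, 0xEF, 0xBC, 0xB3, 0x20,
  0xE3, 0x82, 0xB4, 0xE3, 0x82, 0xB7, 0xE3, 0x83, 0x83, 0xE3, 0x82, 0xAF
};
```

Calling looks_like_utf8 on demo_name_cp932 returns 0, while the UTF-8 form returns 1.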



In GNU Emacs 24.4.1 (x86_64-unknown-cygwin)
 of 2015-02-13 on desktop-new
Windowing system distributor `Microsoft Corp.', version 6.1.7601
Configured using:
 `configure
 --srcdir=/home/kbrown/src/cygemacs/emacs-24.4-3.x86_64/src/emacs-24.4
 --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin
 --libexecdir=/usr/libexec --datadir=/usr/share --localstatedir=/var
 --sysconfdir=/etc --libdir=/usr/lib --datarootdir=/usr/share
 --docdir=/usr/share/doc/emacs --htmldir=/usr/share/doc/emacs/html -C
 --with-w32 'CFLAGS=-ggdb -O2 -pipe -Wimplicit-function-declaration
 -fdebug-prefix-map=/home/kbrown/src/cygemacs/emacs-24.4-3.x86_64/build=/usr/src/debug/emacs-24.4-3
 -fdebug-prefix-map=/home/kbrown/src/cygemacs/emacs-24.4-3.x86_64/src/emacs-24.4=/usr/src/debug/emacs-24.4-3'
 CPPFLAGS= LDFLAGS='

Important settings:
  value of $LANG: ja_JP.UTF-8
  locale-coding-system: utf-8-unix

Major mode: Fundamental

Minor modes in effect:
  tooltip-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  blink-cursor-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  buffer-read-only: t
  line-number-mode: t
  transient-mark-mode: t

Recent input:
<language-change> <help-echo> <help-echo> <help-echo>
<help-echo> <help-echo> <help-echo> <help-echo> <menu-bar>
<help-menu> <send-emacs-bug-report>

Recent messages:
For information about GNU Emacs and the GNU system, type C-h C-a.
No docstring slot for setup-japanese-environment-internal

Load-path shadows:
None found.

Features:
(shadow sort gnus-util mail-extr emacsbug message format-spec rfc822 mml
easymenu mml-sec mm-decode mm-bodies mm-encode mail-parse rfc2231
mailabbrev gmm-utils mailheader sendmail rfc2047 rfc2045 ietf-drums
mm-util help-fns mail-prsvr mail-utils time-date japan-util tooltip
electric uniquify ediff-hook vc-hooks lisp-float-type mwheel
w32-common-fns disp-table w32-win w32-vars tool-bar dnd fontset image
regexp-opt fringe tabulated-list newcomment lisp-mode prog-mode register
page menu-bar rfn-eshadow timer select scroll-bar mouse jit-lock
font-lock syntax facemenu font-core frame cham georgian utf-8-lang
misc-lang vietnamese tibetan thai tai-viet lao korean japanese hebrew
greek romanian slovak czech european ethiopic indian cyrillic chinese
case-table epa-hook jka-cmpr-hook help simple abbrev minibuffer nadvice
loaddefs button faces cus-face macroexp files text-properties overlay
sha1 md5 base64 format env code-pages mule custom widget
hashtable-print-readable backquote make-network-process dbusbind
gfilenotify w32 multi-tty emacs)

Memory information:
((conses 16 76005 6531)
 (symbols 48 17442 0)
 (miscs 40 60 88)
 (strings 32 10617 5193)
 (string-bytes 1 268020)
 (vectors 16 9545)
 (vector-slots 8 454592 39464)
 (floats 8 57 94)
 (intervals 56 193 0)
 (buffers 960 12))




Eli Zaretskii
> Date: Fri, 20 Feb 2015 19:39:55 +0900
> From: Fujii Hironori <[hidden email]>
>
> (font-family-list) returns incorrectly decoded Japanese font names.
> My locale-coding-system is utf-8-unix.
>
> If I do (setq locale-coding-system 'cp932), it returns correct font names.
> But, locale-coding-system is used in other places (e.g. M-x term and M-x man).
> locale-coding-system must be utf-8 in my Emacs.

The problem is in w32font.c: it should call the "wide" (a.k.a.
"Unicode") APIs, and then decode strings using utf-16le, like we do in
w32fns.c with encoding strings we pass to w32 GUI APIs.
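As a rough picture of what "decode strings using utf-16le" involves, here is a standalone sketch that converts UTF-16LE bytes, such as a wide API would return, into UTF-8. It is not the code in w32fns.c or w32font.c: Emacs would use its own coding-system machinery for this, and the name utf16le_to_utf8 is hypothetical.

```c
#include <stddef.h>

/* Decode IN_LEN bytes of UTF-16LE at IN into UTF-8 at OUT, which has
   room for OUT_CAP bytes.  Return the number of bytes written, or
   (size_t) -1 if the input is malformed or OUT is too small.  */
size_t
utf16le_to_utf8 (const unsigned char *in, size_t in_len,
                 unsigned char *out, size_t out_cap)
{
  size_t i = 0, o = 0;

  if (in_len % 2 != 0)
    return (size_t) -1;         /* UTF-16 comes in 2-byte units */

  while (i < in_len)
    {
      /* Little-endian: low byte first.  */
      unsigned long cp = in[i] | ((unsigned long) in[i + 1] << 8);
      size_t n;

      i += 2;
      if (cp >= 0xD800 && cp <= 0xDBFF)
        {
          /* High surrogate: a low surrogate must follow.  */
          unsigned long lo;

          if (i >= in_len)
            return (size_t) -1;
          lo = in[i] | ((unsigned long) in[i + 1] << 8);
          if (lo < 0xDC00 || lo > 0xDFFF)
            return (size_t) -1;
          i += 2;
          cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
        }
      else if (cp >= 0xDC00 && cp <= 0xDFFF)
        return (size_t) -1;     /* low surrogate without a high one */

      n = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
      if (n > out_cap - o)
        return (size_t) -1;

      if (n == 1)
        out[o++] = (unsigned char) cp;
      else if (n == 2)
        {
          out[o++] = (unsigned char) (0xC0 | (cp >> 6));
          out[o++] = (unsigned char) (0x80 | (cp & 0x3F));
        }
      else if (n == 3)
        {
          out[o++] = (unsigned char) (0xE0 | (cp >> 12));
          out[o++] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
          out[o++] = (unsigned char) (0x80 | (cp & 0x3F));
        }
      else
        {
          out[o++] = (unsigned char) (0xF0 | (cp >> 18));
          out[o++] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
          out[o++] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
          out[o++] = (unsigned char) (0x80 | (cp & 0x3F));
        }
    }
  return o;
}
```

The point of going through the wide APIs is that the result no longer depends on the ANSI code page or on locale-coding-system: the same UTF-16 input always decodes to the same text.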




Fujii Hironori
Tags: patch

On Fri, Feb 20, 2015 at 8:21 PM, Eli Zaretskii <[hidden email]> wrote:
> The problem is in w32font.c: it should call the "wide" (a.k.a.
> "Unicode") APIs, and then decode strings using utf-16le, like we do in
> w32fns.c with encoding strings we pass to w32 GUI APIs.

A Unicode API patch is attached. Could you review it?
Should I use GetProcAddress for Windows 9x?

Attachment: font.patch (32K)

Eli Zaretskii
> Date: Sat, 28 Feb 2015 00:22:00 +0900
> From: Fujii Hironori <[hidden email]>
> Cc: [hidden email]
>
> On Fri, Feb 20, 2015 at 8:21 PM, Eli Zaretskii <[hidden email]> wrote:
> > The problem is in w32font.c: it should call the "wide" (a.k.a.
> > "Unicode") APIs, and then decode strings using utf-16le, like we do in
> > w32fns.c with encoding strings we pass to w32 GUI APIs.
>
> A Unicode API patch is attached. Could you review it?
> Should I use GetProcAddress for Windows 9x?

Thanks.

However, this goes too far: there's no need to replace all the
functions with "wide" versions, only those functions that return font
name strings from the system.  For example, I don't think
CreateFontIndirect needs to be switched to Unicode, does it?  And CRT
functions like _wcslwr and swprintf that work on wchar_t arguments
aren't supported on Windows 9X, AFAIK, so we cannot call them.  (One
reason for using the minimum number of "wide" APIs is that we don't
have good ways of testing the development code on Windows 9X.)

And yes, for Windows 9X you will need to call these functions through
function pointers, after assigning them with GetProcAddress, as
w32font.c does elsewhere.

I would actually suggest having Cygwin-only branches of the code,
where you can freely call the "wide" APIs without bothering about
Windows 9X, since that's what the Cygwin-w32 build does elsewhere, and
since this is a Cygwin-specific problem due to the difference between
file-name encoding and the locale emulated by Cygwin.  There are a
bunch of macros like GUI_STR and GUI_ENCODE_FILE near the end of
w32term.h that can be used to minimize #ifdef's to the absolute
minimum.




Fujii Hironori
Thank you for reviewing my patch, Eli.

On Sat, Feb 28, 2015 at 1:03 AM, Eli Zaretskii <[hidden email]> wrote:
> However, this goes too far: there's no need to replace all the
> functions with "wide" versions, only those functions that return font
> name strings from the system.  For example, I don't think
> CreateFontIndirect needs to be switched to Unicode, does it?  And CRT
> functions like _wcslwr and swprintf that work on wchar_t arguments
> aren't supported on Windows 9X, AFAIK, so we cannot call them.  (One
> reason for using the minimum number of "wide" APIs is that we don't
> have good ways of testing the development code on Windows 9X.)

This is the code:

|  hfont = CreateFontIndirect (&logfont);
|  (...)
|    = DECODE_SYSTEM (build_string (logfont.lfFaceName));

logfont.lfFaceName is ANSI text, and decoding it with DECODE_SYSTEM is the problem.
So CreateFontIndirect should be the wide version.

> I would actually suggest having Cygwin-only branches of the code,
> where you can freely call the "wide" APIs without bothering about
> Windows 9X, since that's what the Cygwin-w32 build does elsewhere, and
> since this is a Cygwin-specific problem due to the difference between
> file-name encoding and the locale emulated by Cygwin.  There are a
> bunch of macros like GUI_STR and GUI_ENCODE_FILE near the end of
> w32term.h that can be used to minimize #ifdef's to the absolute
> minimum.

If this approach is used, structs such as LOGFONT and ENUMLOGFONTEX
should be renamed to GUI_FN(LOGFONT) and GUI_FN(ENUMLOGFONTEX).
This looks ugly.
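For readers who have not seen this style, here is a small portable sketch of how a GUI_FN-like macro can pick the ANSI or wide variant of a name at compile time. I am assuming the mechanism is token pasting of an A or W suffix. The actual definitions in w32term.h are not shown in this thread, and DEMO_FN, DEMO_USE_WIDE, DemoFontA/DemoFontW and the other demo_ names are hypothetical stand-ins.

```c
#include <stddef.h>
#include <wchar.h>

/* Stand-ins for an ANSI/wide struct pair such as LOGFONTA/LOGFONTW.  */
typedef struct { char    face[32]; } DemoFontA;
typedef struct { wchar_t face[32]; } DemoFontW;

/* Stand-ins for an ANSI/wide function pair.  */
size_t demo_unit_sizeA (void) { return sizeof (char); }
size_t demo_unit_sizeW (void) { return sizeof (wchar_t); }

/* Append the suffix for the variant this build should use.
   DEMO_USE_WIDE plays the role a Cygwin-only condition would play.  */
#ifdef DEMO_USE_WIDE
# define DEMO_FN(name) name ## W
#else
# define DEMO_FN(name) name ## A
#endif

/* One body serves both builds, with no #ifdef at the point of use.  */
size_t
demo_face_capacity (void)
{
  DEMO_FN (DemoFont) font;      /* expands to DemoFontA or DemoFontW */

  return sizeof font.face / DEMO_FN (demo_unit_size) ();
}
```

Either way demo_face_capacity returns 32, the number of characters the buffer holds. The cost is the one noted above: every affected type and call site has to be wrapped in the macro.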

The best way to solve this is to define _UNICODE.
A bug about defining _UNICODE has already been filed, but it was closed as wontfix.

#265 - Build error with _UNICODE on w32. - GNU bug report logs
https://debbugs.gnu.org/cgi/bugreport.cgi?bug=265

If Bug#265 is resolved, this bug (Bug#19910) will be resolved automatically.
And the _UNICODE macro can be used not only for Cygwin but also for NTEmacs.




Eli Zaretskii
> Date: Sat, 28 Feb 2015 21:14:00 +0900
> From: Fujii Hironori <[hidden email]>
> Cc: [hidden email]
>
> > I would actually suggest having Cygwin-only branches of the code,
> > where you can freely call the "wide" APIs without bothering about
> > Windows 9X, since that's what the Cygwin-w32 build does elsewhere, and
> > since this is a Cygwin-specific problem due to the difference between
> > file-name encoding and the locale emulated by Cygwin.  There are a
> > bunch of macros like GUI_STR and GUI_ENCODE_FILE near the end of
> > w32term.h that can be used to minimize #ifdef's to the absolute
> > minimum.
>
> If this approach is used, structs such as LOGFONT and ENUMLOGFONTEX
> should be renamed to GUI_FN(LOGFONT) and GUI_FN(ENUMLOGFONTEX).
> This looks ugly.

We use it in quite a few places in Emacs, so ugly or not, this is a
kind of de-facto standard for resolving these issues.  More
importantly, it doesn't run the risk of breaking Emacs on Windows 9X.

> The best way to solve this is to define _UNICODE.
> A bug about defining _UNICODE has already been filed, but it was closed as wontfix.
>
> #265 - Build error with _UNICODE on w32. - GNU bug report logs
> https://debbugs.gnu.org/cgi/bugreport.cgi?bug=265
>
> If Bug#265 is resolved, this bug (Bug#19910) will be resolved automatically.
> And the _UNICODE macro can be used not only for Cygwin but also for NTEmacs.

Most, if not all, of the issues which could motivate someone to use
_UNICODE were meanwhile fixed, so reviving that now makes very little
sense.  In particular, the native Windows build already uses the
Unicode APIs wherever feasible.  (The particular issue discussed in
this thread doesn't exist in the native build, AFAIU, because
DECODE_SYSTEM does its job there.)

Thanks.




Stefan Kangas
In reply to this post by Eli Zaretskii
Eli Zaretskii <[hidden email]> writes:

>> Date: Fri, 20 Feb 2015 19:39:55 +0900
>> From: Fujii Hironori <[hidden email]>
>>
>> (font-family-list) returns incorrectly decoded Japanese font names.
>> My locale-coding-system is utf-8-unix.
>>
>> If I do (setq locale-coding-system 'cp932), it returns correct font names.
>> But, locale-coding-system is used in other places (e.g. M-x term and M-x man).
>> locale-coding-system must be utf-8 in my Emacs.
>
> The problem is in w32font.c: it should call the "wide" (a.k.a.
> "Unicode") APIs, and then decode strings using utf-16le, like we do in
> w32fns.c with encoding strings we pass to w32 GUI APIs.

That was 5 years ago.  Is any of this still an issue on recent
versions of Emacs?

Best regards,
Stefan Kangas




Eli Zaretskii
> From: Stefan Kangas <[hidden email]>
> Cc: Fujii Hironori <[hidden email]>,  [hidden email]
> Date: Sun, 01 Dec 2019 09:21:56 +0100
>
> Eli Zaretskii <[hidden email]> writes:
>
> >> Date: Fri, 20 Feb 2015 19:39:55 +0900
> >> From: Fujii Hironori <[hidden email]>
> >>
> >> (font-family-list) returns incorrectly decoded Japanese font names.
> >> My locale-coding-system is utf-8-unix.
> >>
> >> If I do (setq locale-coding-system 'cp932), it returns correct font names.
> >> But, locale-coding-system is used in other places (e.g. M-x term and M-x man).
> >> locale-coding-system must be utf-8 in my Emacs.
> >
> > The problem is in w32font.c: it should call the "wide" (a.k.a.
> > "Unicode") APIs, and then decode strings using utf-16le, like we do in
> > w32fns.c with encoding strings we pass to w32 GUI APIs.
>
> That was 5 years ago.  Is any of this still an issue on recent
> versions of Emacs?

I don't think anything's changed in that department, so the problem
should still be there.

However, I have an idea of a much simpler fix for this, but I need a
volunteer who has this problem to test a patch I'd like to write to
fix this.  Anyone?