[ELPA] New Package: greek-polytonic.el

[ELPA] New Package: greek-polytonic.el

Johannes Choo
Hi all,

I'd like to contribute a new package, greek-polytonic.el, to ELPA, or wherever is more appropriate.

The latest version is maintained at https://github.com/jhanschoo/quail-greek-polytonic/tree/fsf .

Rationale: Polytonic Greek input is of interest primarily to classicists, to people who want to reproduce Ancient Greek quotations, and to people writing Katharevousa Greek. Several input methods for polytonic Greek already exist in the greek.el file distributed with Emacs, but this package improves on them in the following two ways:

1) Mappings based on the "standard" Windows and Mac layouts: The existing input methods' mappings are modeled on ad hoc polytonic input schemes devised by classicists very early in computing history, in conjunction with specialized software. The mappings I use are modeled on the Mac and Windows polytonic Greek keyboard layouts (which are in turn modeled on the monotonic Greek layout), and hence require less context switching for people accustomed to modern, popular Greek keyboards.

2) Input of combining character sequences: The existing input methods allow the input of bare letters and precomposed letter+diacritic characters, but not of Unicode letter+diacritic sequences that have no precomposed character. For example, the sequence <alpha>+<combining macron>+<combining acute accent> is not represented by any precomposed character, but appears frequently in critical editions of classics. greek-polytonic.el allows the input of the combining characters themselves, and substitutes such sequences with their Unicode-canonical precomposed equivalents where they exist; hence input from this method satisfies Unicode NFC (Normalization Form Canonical Composition), while still allowing input of sequences that have no corresponding precomposed character. Admittedly, font support and Emacs's display support for such decomposed sequences are still rudimentary, and a sequence may render oddly.
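
A minimal sketch of the NFC behaviour described in point 2, written with Python's standard unicodedata module rather than Emacs Lisp. This is an assumption made purely for illustration; it is not code from greek-polytonic.el, and the names ALPHA, MACRON, ACUTE and nfc are hypothetical.

```python
import unicodedata

ALPHA = "\u03b1"   # GREEK SMALL LETTER ALPHA
MACRON = "\u0304"  # COMBINING MACRON
ACUTE = "\u0301"   # COMBINING ACUTE ACCENT


def nfc(text: str) -> str:
    """Return the NFC (canonically composed) form of text."""
    return unicodedata.normalize("NFC", text)


# alpha + macron has a precomposed equivalent, so NFC yields one code point.
with_macron = nfc(ALPHA + MACRON)

# alpha + macron + acute has none, so NFC composes what it can and
# leaves the acute behind as a trailing combining character.
with_both = nfc(ALPHA + MACRON + ACUTE)

for label, text in [("alpha+macron", with_macron),
                    ("alpha+macron+acute", with_both)]:
    print(label, [f"U+{ord(ch):04X}" for ch in text])
```

Under these assumptions the first sequence collapses to the single code point U+1FB1, while the second becomes U+1FB1 followed by U+0301.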

While I consider this package functionally complete, there are several avenues for further work if there is interest:

* Allow "prefix" input of diacritics, à la most other input methods (currently only "postfix" diacritic input is supported).
* Allow input of Greek numeral modifiers, of archaic letters, and of Greek "symbols".
* Allow input of non-combining versions of diacritics.

Thanks for your attention.

Bests,
Johannes Choo
--
Bests,
Johannes Choo
B. Comp student at National University of Singapore
NUSHackers Coreteam

greek-polytonic.el (34K) Download Attachment

Re: [ELPA] New Package: greek-polytonic.el

Eli Zaretskii
> From: Johannes Choo <[hidden email]>
> Date: Sat, 14 Jul 2018 04:29:15 -0500
>
> I'd like to contribute a new package greek-polytonic.el to ELPA, or where ever it is more appropriate.

Why not add this to greek.el?

> 2) Input of combining character sequences possible—While the existing input methods allow for the input of
> bare letters and precomposed letter+diacritics, but not for Unicode letter+diacritic sequences that are not
> represented by precomposed characters. For example, the sequence <alpha>+<combining
> macron>+<combining acute accent> is not represented by any precomposed character, but appears
> frequently in critical editions of classics. greek-polytonic.el allows for the input of combining characters
> themselves, and substitutes such sequences with their Unicode-canonical precomposed equivalents if they
> exist; hence input from this method satisfies Unicode-NFC (Normalization Form Canonical Composition),
> while allowing input of sequences that have no corresponding precomposed character. Though it is to be
> admitted that font support and Emacs's display support for such decomposed sequences is still rudimentary
> and the sequence may visually appear funky.

Is this a good idea?  It seems to go against the intent of whoever is
typing the text: they do want the decomposed characters to appear in
the text.  Emacs will automatically (by default) compose them on
display (and if it doesn't, that's a bug that should be reported and
fixed), per Unicode requirements, and if the font supports the
precomposed glyph, you will actually see that glyph on display.
Replacing characters with their NFC equivalents should IMO be a
separate feature, not something an input method does.  Am I missing
something?

Thanks.

Re: [ELPA] New Package: greek-polytonic.el

Cesar Crusius
Eli Zaretskii <[hidden email]> writes:

>> From: Johannes Choo <[hidden email]>
>> Date: Sat, 14 Jul 2018 04:29:15 -0500
>>
>> I'd like to contribute a new package greek-polytonic.el to ELPA, or
>> where ever it is more appropriate.
>
> Why not add this to greek.el?
>
>> 2) Input of combining character sequences possible—While the
>> existing input methods allow for the input of
>> bare letters and precomposed letter+diacritics, but not for Unicode
>> letter+diacritic sequences that are not
>> represented by precomposed characters. For example, the sequence <alpha>+<combining
>> macron>+<combining acute accent> is not represented by any
>> precomposed character, but appears
>> frequently in critical editions of classics. greek-polytonic.el
>> allows for the input of combining characters
>> themselves, and substitutes such sequences with their
>> Unicode-canonical precomposed equivalents if they
>> exist; hence input from this method satisfies Unicode-NFC
>> (Normalization Form Canonical Composition),
>> while allowing input of sequences that have no corresponding
>> precomposed character. Though it is to be
>> admitted that font support and Emacs's display support for such
>> decomposed sequences is still rudimentary
>> and the sequence may visually appear funky.
>
> Is this a good idea?  It seems to go against the intent of whoever is
> typing the text: they do want the decomposed characters to appear in
> the text.  Emacs will automatically (by default) compose them on
> display (and if it doesn't, that's a bug that should be reported and
> fixed), per Unicode requirements, and if the font supports the
> precomposed glyph, you will actually see that glyph on display.
> Replacing characters with their NFC equivalents should IMO be a
> separate feature, not something an input method does.  Am I missing
> something?
I'm not sure what you mean by "want the decomposed characters to appear
in the text," but when I am writing polytonic Greek and type the
sequence above, all I want is to see an alpha+macron+acute in front of
me. I don't particularly care how it is represented internally. As long
as the input method produces a valid representation for what I want, it
should be fine.

By the way, thanks for trying to solve this problem -- it's been a
long-standing one. I solved some of that for myself via XCompose, but
that's not portable.

Font and application support may be the biggest hurdle, though. It is
for me. What are the consumers of the texts you are producing in Emacs,
TeX and friends?  I'd be interested in a properly working TeX setup for
polytonic Greek, but that's another thread (in another group, maybe?).

Cheers,

--
Cesar Crusius

Re: [ELPA] New Package: greek-polytonic.el

Eli Zaretskii
> From: Cesar Crusius <[hidden email]>
> Cc: Johannes Choo <[hidden email]>,  [hidden email]
> Date: Sat, 14 Jul 2018 10:11:01 -0700
>
> > Is this a good idea?  It seems to go against the intent of whoever is
> > typing the text: they do want the decomposed characters to appear in
> > the text.  Emacs will automatically (by default) compose them on
> > display (and if it doesn't, that's a bug that should be reported and
> > fixed), per Unicode requirements, and if the font supports the
> > precomposed glyph, you will actually see that glyph on display.
> > Replacing characters with their NFC equivalents should IMO be a
> > separate feature, not something an input method does.  Am I missing
> > something?
>
> I'm not sure what you mean by "want the decomposed characters to appear
> in the text," but when I am writing polytonic Greek and type the
> sequence above, all I want is to see an alpha+macron+acute in front of
> me.

On display or in the buffer?  If on display, then Emacs should already
do that, provided that the font you are using supports the composed
characters.  That's because by default we have the
auto-composition-mode turned on.

I was talking about what's in the buffer.  I think that if the user
types a sequence of characters, Emacs should generally put those
characters unaltered in the buffer.  If the user wants a precomposed
character, she could always type that character's codepoint using
"C-x 8 RET", no?

But maybe I don't know enough about the expectations of users who
would use the greek-polytonic input method; maybe in some use cases
such automatic composition in the buffer is expected?

Re: [ELPA] New Package: greek-polytonic.el

Richard Stallman
[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > I was talking about what's in the buffer.  I think that if the user
  > types a sequence of characters, Emacs should generally put those
  > characters unaltered in the buffer.

This is good for several reasons:

* It makes C-b and C-f do the natural thing.

* It makes search do the natural thing.
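
How much search depends on the stored representation can be illustrated outside Emacs. The sketch below uses Python's unicodedata module as a stand-in (an assumption for illustration; it does not model Emacs's own search or character folding, and normalized_find is a hypothetical helper): a precomposed character and its decomposed equivalent only match once both sides are brought to the same normalization form.

```python
import unicodedata

precomposed = "\u00e1"  # LATIN SMALL LETTER A WITH ACUTE
decomposed = "a\u0301"  # a + COMBINING ACUTE ACCENT

# Text stored with the decomposed sequence, exactly as typed.
text = "ol" + decomposed


def normalized_find(haystack: str, needle: str) -> int:
    """Search after normalizing both strings to NFC."""
    return (unicodedata.normalize("NFC", haystack)
            .find(unicodedata.normalize("NFC", needle)))


# A plain code-point search for the other form finds nothing.
plain_hit = text.find(precomposed)

# After normalizing both sides, the match is found.
normalized_hit = normalized_find(text, precomposed)

print(plain_hit, normalized_hit)
```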
--
Dr Richard Stallman
President, Free Software Foundation (https://gnu.org, https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)



Re: [ELPA] New Package: greek-polytonic.el

Cesar Crusius
In reply to this post by Eli Zaretskii
Eli Zaretskii <[hidden email]> writes:

>> From: Cesar Crusius <[hidden email]>
>> Cc: Johannes Choo <[hidden email]>,  [hidden email]
>> Date: Sat, 14 Jul 2018 10:11:01 -0700
>>
>> > Is this a good idea?  It seems to go against the intent of
>> > whoever is typing the text: they do want the decomposed
>> > characters to appear in the text.  Emacs will automatically
>> > (by default) compose them on display (and if it doesn't,
>> > that's a bug that should be reported and fixed), per Unicode
>> > requirements, and if the font supports the precomposed glyph,
>> > you will actually see that glyph on display.  Replacing
>> > characters with their NFC equivalents should IMO be a
>> > separate feature, not something an input method does.  Am I
>> > missing something?
>>  I'm not sure what you mean by "want the decomposed characters
>> to appear in the text," but when I am writing polytonic Greek
>> and type the sequence above, all I want is to see an
>> alpha+macron+acute in front of me.
>
> On display or in the buffer?  If on display, then Emacs should
> already do that, provided that the font you are using supports
> the composed characters.  That's because by default we have the
> auto-composition-mode turned on.
>
> I was talking about what's in the buffer.  I think that if the
> user types a sequence of characters, Emacs should generally put
> those characters unaltered in the buffer.  If the user wants a
> precomposed character, she could always type that character's
> codepoint using "C-x 8 RET", no?
>
> But maybe I don't know enough about the expectations of users
> who would use greek-polytonic input method, maybe in some use
> cases such automatic composition in the buffer is expected?
Maybe we're talking about different things...

Input methods do automatic composition all the time. That's what
they are expected to do. I do it every day when writing Portuguese
text. Consider "á": I just wrote it by switching input methods and
typing "<acute>-<a>". What ends up in the buffer and on the
display is one single character. If my buffer had instead
"<a>+<combining acute>" I would consider that a bug. Unicode
supports the combination, I want the combination there.

This means that the input method's semantics is to translate a
sequence of keys into the most natural underlying
representation. For "a acute," it is "á", not "a+combining acute",
and nobody blinks an eye.

For polytonic Greek, however, the problem is that Unicode does not
have pre-composed characters to represent all the
possibilities. Combining characters will be needed, but the input
method can -- and I argue /should/ -- combine what it
can. Example:

* Typing "a + macron" should give U+1FB1, "Greek small letter alpha
  with macron," /one/ character, just as "á" above. Similarly, I
  would consider "<a>+<combining macron>" a bug.
* Typing "a + macron + acute" should give the above plus a U+0301
  "combining acute", because it is the best it can do -- and it is
  what fonts like Skolar expect.
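
The two cases above can be sketched as a postfix composition step. This is a rough illustration in Python, not Emacs Lisp or Quail code; the key bindings in KEYMAP and the function type_keys are hypothetical and are not the bindings of any existing input method.

```python
import unicodedata

# Hypothetical key bindings, chosen only for this illustration; they are
# NOT the bindings defined by greek-polytonic.el or greek.el.
KEYMAP = {
    "a": "\u03b1",  # GREEK SMALL LETTER ALPHA
    "-": "\u0304",  # COMBINING MACRON
    "/": "\u0301",  # COMBINING ACUTE ACCENT
}


def type_keys(keys: str) -> str:
    """Simulate postfix input: translate each key, append it, and
    re-compose to NFC so precomposed forms are used when they exist."""
    buffer = ""
    for key in keys:
        buffer += KEYMAP.get(key, key)
        # Re-normalizing the whole string keeps the sketch short; a real
        # input method would only need to look at the last base letter.
        buffer = unicodedata.normalize("NFC", buffer)
    return buffer


one_char = type_keys("a-")    # alpha with macron
two_chars = type_keys("a-/")  # alpha with macron, then combining acute

print([f"U+{ord(ch):04X}" for ch in one_char])
print([f"U+{ord(ch):04X}" for ch in two_chars])
```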

"C-x 8 RET" is not a solution if you are typing in a language that
requires it once or more every word. (Again, that becomes the job
of the input method.)

By the way, I'm all for greek.el supporting polytonic Greek
natively and naturally. I don't remember what the problems were,
but I gave up on it quickly when trying polytonic because it
didn't work.

Cheers,

--
Cesar Crusius

Re: [ELPA] New Package: greek-polytonic.el

Eli Zaretskii
> From: Cesar Crusius <[hidden email]>
> Cc: Cesar Crusius <[hidden email]>,  [hidden email],  [hidden email]
> Date: Sat, 14 Jul 2018 18:37:23 -0700
>
> >>  I'm not sure what you mean by "want the decomposed characters
> >> to appear in the text," but when I am writing polytonic Greek
> >> and type the sequence above, all I want is to see an
> >> alpha+macron+acute in front of me.
> >
> > On display or in the buffer?  If on display, then Emacs should
> > already do that, provided that the font you are using supports
> > the composed characters.  That's because by default we have the
> > auto-composition-mode turned on.
> >
> > I was talking about what's in the buffer.  I think that if the
> > user types a sequence of characters, Emacs should generally put
> > those characters unaltered in the buffer.  If the user wants a
> > precomposed character, she could always type that character's
> > codepoint using "C-x 8 RET", no?
> >
> > But maybe I don't know enough about the expectations of users
> > who would use greek-polytonic input method, maybe in some use
> > cases such automatic composition in the buffer is expected?
>
> Maybe we're talking about different things...
>
> Input methods do automatic composition all the time. That's what
> they are expected to do. I do it every day when writing Portuguese
> text. Consider "á": I just wrote it by switching input methods and
> typing "<acute>-<a>". What ends up in the buffer and on the
> display is one single character.

True, and I was not talking about that.

> This means that the input method's semantics is to translate a
> sequence of keys into the most natural underlying
> representation. For "a acute," it is "á", not "a+combining acute",
> and nobody blinks an eye.

More accurately, input methods normally read ASCII characters and
produce non-ASCII characters, whether accented or not.  By contrast,
your original text:

> For example, the sequence <alpha>+<combining macron>+<combining
> acute accent> is not represented by any precomposed character, but
> appears frequently in critical editions of
> classics. greek-polytonic.el allows for the input of combining
> characters themselves, and substitutes such sequences with their
> Unicode-canonical precomposed equivalents if they exist;

led me to believe that your input method takes three non-ASCII
characters (alpha, combining macron, and combining acute accent) and
produces from them a single character, their NFC precomposed
equivalent.  This is not what an input method should do, IMO.

However, I see now that no such NFC composition is being done for
non-ASCII input (right?), so I guess I misunderstood; sorry about
that.

> For polytonic Greek, however, the problem is that Unicode does not
> have pre-composed characters to represent all the
> possibilities. Combining characters will be needed, but the input
> method can -- and I argue /should/ -- combine what they
> can. Example:
>
> * Typing "a + macron" should give U+1FB1, "Greek small letter
>   alpha with
>   macron," /one/ character, just as "á" above. Similarly, I would
>   consider "<a>+<combining macron>" a bug.
> * Typing "a + macron + acute" should give the above plus a U+0301
>   "combining
>   acute", because it is the best it can do -- and it is what fonts
>   like Skolar expect.

Emacs combines these automatically, but only on display; in the buffer
we still have several separate codepoints.  And I think this is
correct.

> By the way, I'm all for greek.el supporting polytonic Greek
> natively and naturally. I don't remember what the problems were,
> but I gave up on it quickly when trying polytonic because it
> didn't work.

I was talking about adding your input method to greek.el.

Re: [ELPA] New Package: greek-polytonic.el

Cesar Crusius
Eli Zaretskii <[hidden email]> writes:

>> From: Cesar Crusius <[hidden email]>
>> Cc: Cesar Crusius <[hidden email]>,  [hidden email],  [hidden email]
>> Date: Sat, 14 Jul 2018 18:37:23 -0700
>>
>> >> I'm not sure what you mean by "want the decomposed characters
>> >> to appear in the text," but when I am writing polytonic Greek
>> >> and type the sequence above, all I want is to see an
>> >> alpha+macron+acute in front of me.
>> >
>> > On display or in the buffer?  If on display, then Emacs should
>> > already do that, provided that the font you are using supports
>> > the composed characters.  That's because by default we have the
>> > auto-composition-mode turned on.
>> >
>> > I was talking about what's in the buffer.  I think that if the
>> > user types a sequence of characters, Emacs should generally put
>> > those characters unaltered in the buffer.  If the user wants a
>> > precomposed character, she could always type that character's
>> > codepoint using "C-x 8 RET", no?
>> >
>> > But maybe I don't know enough about the expectations of users
>> > who would use greek-polytonic input method, maybe in some use
>> > cases such automatic composition in the buffer is expected?
>>
>> Maybe we're talking about different things...   (snip)
>
> More accurately, input methods normally read ASCII characters
> and produce non-ASCII characters, whether accented or not.  By
> contrast, your original text:
>
>> For example, the sequence <alpha>+<combining macron>+<combining
>> acute accent> is not represented by any precomposed character,
>> but appears frequently in critical editions of
>> classics. greek-polytonic.el allows for the input of combining
>> characters themselves, and substitutes such sequences with
>> their Unicode-canonical precomposed equivalents if they exist;
That's not mine, but the OP's text :)

> led me to believe that your input method takes three non-ASCII
> characters, alpha combining macron and combining acute accent,
> and produce from them a single composed character which is their
> NFC precomposed character.  This is not what an input method
> should do, IMO.
>
> However, I see now that no such NFC composition is being done
> for non-ASCII input (right?), so I guess I misunderstood; sorry
> about that.

No need to be sorry about anything -- wonders of written
communication. I think we're on the same page now.

> (snip)
>
>> By the way, I'm all for greek.el supporting polytonic Greek
>> natively and naturally. I don't remember what the problems
>> were,  but I gave up on it quickly when trying polytonic
>> because it  didn't work.
>
> I was talking about adding your input method to greek.el.

Not /my/ input method, I'm just encouraging the OP to think about
making this an improvement to greek.el instead of a separate
package, as you suggested in your first e-mail :)

Cheers,

--
Cesar Crusius

Re: [ELPA] New Package: greek-polytonic.el

Johannes Choo
Sorry for the late reply. I found a couple of typos, and the package actually needs to be updated.

@Cesar:

I actually have a working polytonic Greek setup for LaTeX (more accurately, XeLaTeX and LuaLaTeX); send me a message and I'll send you a minimal working preamble :)

@Eli:

Yes, I think putting it in greek.el would be a better option, but I don't have contributor rights, and so far this has been the easier way for me to distribute it. I'd be happy to do that if someone can help me with it.

> It seems to go against the intent of whoever is
> typing the text: they do want the decomposed characters to appear in
> the text.  Emacs will automatically (by default) compose them on
> display (and if it doesn't, that's a bug that should be reported and
> fixed), per Unicode requirements, and if the font supports the
> precomposed glyph, you will actually see that glyph on display.
> Replacing characters with their NFC equivalents should IMO be a
> separate feature, not something an input method does.

In an ideal world... yeah. De facto, polytonic Greek online, and presumably in most digital systems, uses the precomposed forms, and /all/ polytonic fonts I'm aware of fail to handle the placement of diacritics on Greek letters gracefully.[1] Some fonts don't display the accents, some have them overlap, and probably none place the combining breathings (the comma-like single quotation marks) in the right place. While writing this I found that my favorite programming font in Emacs actually crashes my Linux system when these combining characters are used![2]

This is the most prevalent compromise I've found online. greek-polytonic.el gives the most graceful fallback, composing into precomposed forms whenever they are available.

[1]: I think this is in fact the case, de facto, for most Latin/Greek/Cyrillic documents. For example, I've observed that when moving point in Emacs, decomposed characters count as two characters, but in all the documents I've opened so far the accented characters count as one. Korean is similar in its poor de facto support for decomposed characters. The only natural languages I know of that are de facto written with decomposed characters are the Indic languages!
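
The "counts as two characters" observation in [1] can be reproduced outside Emacs with a small sketch. It uses Python's unicodedata module as a stand-in (an assumption for illustration; Emacs's cursor motion is not modelled), and count_base_characters is a hypothetical helper that only approximates user-perceived characters by skipping combining marks, not full Unicode grapheme-cluster segmentation.

```python
import unicodedata


def count_base_characters(text: str) -> int:
    """Count code points that are not combining marks (combining class 0)."""
    return sum(1 for ch in text if unicodedata.combining(ch) == 0)


# alpha + COMBINING MACRON + COMBINING ACUTE ACCENT, fully decomposed
decomposed = "\u03b1\u0304\u0301"

# The same text after NFC: alpha-with-macron followed by a combining acute
composed = unicodedata.normalize("NFC", decomposed)

for label, text in [("decomposed", decomposed), ("NFC", composed)]:
    # len() counts code points, which is what per-code-point motion steps over.
    print(label, "code points:", len(text),
          "base characters:", count_base_characters(text))
```

Both forms contain one base character, but the decomposed form is three code points and the NFC form is two.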

[2]: This is a bug in Emacs or the window system, but I don't even know where to begin diagnosing it.

Bests,
Johannes

On Tue, Jul 17, 2018 at 12:23 AM Cesar Crusius <[hidden email]> wrote:
> (snip)
--
Bests,
Johannes Choo
B. Comp student at National University of Singapore
NUSHackers Coreteam