Change of Lisp syntax for "fancy" quotes in Emacs 27?


Noam Postavsky-2
In Emacs 26 and earlier the following is valid lisp code:

(setq ’bar 42)
(setq foo ’bar)

In the current master branch, this will signal (invalid-read-syntax
"strange quote" "’"). To write the equivalent the ’ must be backslash
escaped:

(setq \’bar 42)
(setq foo \’bar)

(the backslash escaping also works in earlier Emacs versions).

The point of this change is to give a more straightforward error in
cases where a curved quote is accidentally written instead of a plain
straight one.

In Bug#30217, Drew Adams strongly objects to this change. I don't want
to "sneak" this in, so I'm asking here for people's thoughts on this.

References:
https://debbugs.gnu.org/cgi/bugreport.cgi?bug=30217
https://debbugs.gnu.org/cgi/bugreport.cgi?bug=2967

PS In case anyone has trouble reading the example code (e.g., due to
some email encoding failure), evaluating

   (insert "(setq \u2019bar 42)\n(setq foo \u2019bar)")

will write it into your current buffer.
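
A minimal sketch for checking how a given Emacs build reads these
forms.  The helper name `my-try-read' is hypothetical; `read-from-string'
and `condition-case' are standard.  Going by the description above, the
unescaped spelling should yield a symbol name in Emacs 26 and earlier
and the error data on current master, while the backslash-escaped
spelling should yield the symbol name in both.

```elisp
;; Hypothetical helper, not part of Emacs: read TEXT and report what
;; the reader did with it.
(defun my-try-read (text)
  "Return the name of the symbol read from TEXT, or the error data on failure."
  (condition-case err
      (symbol-name (car (read-from-string text)))
    (invalid-read-syntax err)))

;; Unescaped U+2019 at the start of a symbol: the result depends on
;; the Emacs version, as described in the message above.
(my-try-read "\u2019bar")

;; Backslash-escaped U+2019: reads as a symbol whose name starts with
;; U+2019.
(my-try-read "\\\u2019bar")
```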


Re: Change of Lisp syntax for "fancy" quotes in Emacs 27?

Paul Eggert
On 02/02/2018 02:24 PM, Noam Postavsky wrote:
> In Bug#30217, Drew Adams strongly objects to this change. I don't want
> to "sneak" this in, so I'm asking here for people's thoughts on this.

I see two main categories of users here, with different needs.
Less-expert users are likely to run into problems with quotes and other
characters (that's why we got bug reports), and appreciate diagnostics
pinpointing the problems; also, programmers concerned about security are
likely to want these confusing characters to be diagnosed, to prevent an
attacker from sending code that is easily read one way but actually
operates in a different way. On the other hand, programs that generate
Elisp code might prefer not having to special-case these characters. So
perhaps there should be a buffer-local variable that controls which
behavior is selected. The default behavior should be the one that caters
better to general users and is safer.

While we're on the topic, I suggest using the Unicode confusables list
<http://www.unicode.org/Public/security/10.0.0/confusables.txt> to come
up with a list of confusing alternatives for each character that has a
special meaning in Emacs Lisp. This should be better than our trying to
come up with our own, ad-hoc list. For example, U+A78C LATIN SMALL
LETTER SALTILLO (ꞌ) looks almost exactly like an apostrophe on my screen
and is in the confusables list, but is not a character that Emacs
currently checks for.
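
As a rough illustration of the kind of check described above, the
sketch below scans a string for characters from a small look-alike
list.  The names `my-apostrophe-lookalikes' and `my-find-lookalikes' are
hypothetical, and the five code points are a hand-picked sample, not
something generated from confusables.txt; a real implementation would
presumably derive its list from that file.

```elisp
;; Hypothetical sample list: a few characters that resemble U+0027
;; APOSTROPHE.  Not derived from confusables.txt.
(defvar my-apostrophe-lookalikes
  '(#x2018 #x2019 #x201B #x02BC #xA78C)
  "Hand-picked sample of characters that resemble an ASCII apostrophe.")

(defun my-find-lookalikes (text)
  "Return a list of (INDEX . CHAR) for each look-alike character in TEXT.
INDEX is zero-based."
  (let ((index 0)
        (hits '()))
    (dolist (ch (string-to-list text))
      (when (memq ch my-apostrophe-lookalikes)
        (push (cons index ch) hits))
      (setq index (1+ index)))
    (nreverse hits)))

;; The curved quote sits at index 10 of this string.
(my-find-lookalikes "(setq foo \u2019bar)")   ; => ((10 . 8217))
```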



RE: Change of Lisp syntax for "fancy" quotes in Emacs 27?

Drew Adams
> I see two main categories of users here, with different needs.
> Less-expert users are likely to run into problems with quotes
> and other characters (that's why we got bug reports), and
> appreciate diagnostics pinpointing the problems; also,
> programmers concerned about security are likely to want these
> confusing characters to be diagnosed, to prevent an attacker
> from sending code that is easily read one way but actually
> operates in a different way.
>
> On the other hand, programs that generate Elisp code might
> prefer not having to special-case these characters. So
> perhaps there should be a buffer-local variable that controls
> which behavior is selected. The default behavior should be
> the one that caters better to general users and is safer.

The distinction I think needs to be made is between:

1. Trying to _warn users_ (all users, less-expert or not)
   about possible misuse of particularly confusable chars.
   This just warns about possible pilot error.

2. _Changing Lisp_ reading and evaluating, to treat some
   (all?) confusable characters specially, changing their
   syntax and requiring them to be escaped in order to be
   treated normally (i.e., as they have been treated so far).

I object to #2, NOT to #1.

#1: By all means, we should try to help users.  We can
    issue byte-compilation warnings and some interactive
    warnings - provided we can helpfully and unambiguously
    distinguish the right situations.

#2 changes Lisp in non-negligible, non-helpful ways.
   See bug #30217 for more.

----

There are lots more characters to which the same
non-bug "fix" of changing Lisp might be applied (which
means that users will wonder why this confusable char
is treated specially, and not that one).

Such chars include pretty much anything that could be
confused with anything that is ever used as a delimiter
in Emacs Lisp: brackets (in the British sense) of all
sorts: parens, square, angle, curly.  There are really
quite a few such bracket-confusables.

Such chars also include pretty much anything that could
be confused with any other chars that are used specially
in Lisp: period, comma, quote, backquote, colon.  Again:
there are quite a few such confusables.

They even include chars that could be confused with the
directory separators used in Emacs Lisp.

Finally (?), they include chars that could be confused
with the ASCII-digit numerals 0123456789.  There are
lots of these confusables too.

(Even with just ASCII there are confusables.  Think of
what some use in passwords or leet: zero vs uppercase
letter O, digit 1 vs lowercase letter l, etc.  We've
just gotten used to carefully distinguishing such chars.
Now there are many more, and slighter, differences to
get used to.)

----

Beyond the question of which chars to treat specially,
there's the question of where - in which contexts -
to try to distinguish them.

Contexts include such places as sexps being evaluated,
doc strings, and comments.

They can also include fonts: a given character might
be confusable, or more confusable, in one font than
in another.  Even font size can make a difference
(with some fonts I find myself zooming in to see
whether a quote-thingy might really be a curly quote).

The questions of which chars and where (context) are
both relevant even if we only warn users (#1) and do
not change Lisp syntax (#2).

----

At the very least, I would hope that if we do anything
at all about this we would start by only warning.
I really hope we will not change Lisp syntax for this,
i.e., I hope we revert the change that has been made so
far for Emacs 27.

> While we're on the topic, I suggest using the Unicode
> confusables list ... to come up with a list of confusing
> alternatives for each character that has a special meaning
> in Emacs Lisp. This should be better than our trying to
> come up with our own, ad-hoc list.
>
> For example, U+A78C LATIN SMALL LETTER SALTILLO (ꞌ) looks
> almost exactly like an apostrophe on my screen and is in
> the confusables list, but is not a character that Emacs
> currently checks for.

Yup, and that's just one tiny tip of this terribly
tippy iceberg.


Re: Change of Lisp syntax for "fancy" quotes in Emacs 27?

Paul Eggert
On 02/02/2018 04:00 PM, Drew Adams wrote:

> The distinction I think needs to be made is between:
>
> 1. Trying to _warn users_ (all users, less-expert or not)
>     about possible misuse of particularly confusable chars.
>     This just warns about possible pilot error.
>
> 2. _Changing Lisp_ reading and evaluating, to treat some
>     (all?) confusable characters specially, changing their
>     syntax and requiring them to be escaped in order to be
>     treated normally (i.e., as they have been treated so far).
>
> I object to #2, NOT to #1.

I don't see a clear distinction between #1 and #2. For example, in an
adversarial environment, users who get warned about suspicious
characters in their incoming source files will most likely type "no"
when asked to run such code. In that case, if you want your audience to
include users who care even a smidgen about security, you'll need to
escape confusable characters in the business parts of your Emacs Lisp
code. Effectively that will be a change to Emacs Lisp, even if its
formal syntax does not change.



RE: Change of Lisp syntax for "fancy" quotes in Emacs 27?

Drew Adams
> > The distinction I think needs to be made is between:
> >
> > 1. Trying to _warn users_ (all users, less-expert or not)
> >     about possible misuse of particularly confusable chars.
> >     This just warns about possible pilot error.
> >
> > 2. _Changing Lisp_ reading and evaluating, to treat some
> >     (all?) confusable characters specially, changing their
> >     syntax and requiring them to be escaped in order to be
> >     treated normally (i.e., as they have been treated so far).
> >
> > I object to #2, NOT to #1.
>
> I don't see a clear distinction between #1 and #2.

That's too bad.  They are really quite different.

In the first case, you get a warning.  In the second case
your code breaks.

> For example, in an adversarial environment...

I don't think that's the reason for this change at all.
It was not mentioned in the bug thread, AFAIK.

The motivation was to prevent confusion on the part of
users, not to prevent or avoid malevolent behavior.
Please see the bug thread (#30217).

The idea was to improve convenience and reduce confusion
for someone who copy+pastes code from a web page (for
example), when (for example) that page renders a normal
quote as a curly quote.

You want to introduce a security aspect here.  I can't
speak much to that.  I'll simply ask whether other Lisps
(e.g. Common Lisp) worry about such a risk?  What does
Clojure do about confusables in Lisp symbols?  Does any
other Lisp change the Lisp syntax and behavior to require
special escaping of such chars in symbols (or elsewhere)?

Sure, even if no other Lisp worries about this or takes
the same approach as that proposed, that's not a proof
that Emacs Lisp shouldn't.  Still...

Given enough motivation, you can already, today, create
Lisp code (confusing, confusable, or otherwise) that is
evil, even without using any confusable Unicode chars.

When I was a kid we would play tricks on each other,
changing a character somewhere in a friend's large deck
of punched Hollerith cards - e.g., insert or remove a
decimal point.  You had to wait a full day to get back
the result of your program run, and the result would
only be a pretty cryptic error msg.  Argggh!

It was just good-natured fun - a game among friends.
And that was only with assembler and Fortran, and we
were just newbie kids.  Imagine what you can do today,
without bothering to rely on close Unicode confusables.

Sorry, but your "security" argument just doesn't pass
muster, for me.


Re: Change of Lisp syntax for "fancy" quotes in Emacs 27?

Eli Zaretskii
In reply to this post by Noam Postavsky-2
> From: Noam Postavsky <[hidden email]>
> Date: Fri, 2 Feb 2018 17:24:43 -0500
> Cc: Drew Adams <[hidden email]>
>
> In Emacs 26 and earlier the following is valid lisp code:
>
> (setq ’bar 42)
> (setq foo ’bar)
>
> In the current master branch, this will signal (invalid-read-syntax
> "strange quote" "’"). To write the equivalent the ’ must be backslash
> escaped:
>
> (setq \’bar 42)
> (setq foo \’bar)
>
> (the backslash escaping also works in earlier Emacs versions).
>
> The point of this change is to give a more straightforward error in
> cases where a curved quote is accidentally written instead of a plain
> straight one.

The bug reports which triggered the above changes are bug#2967 and
bug#23425.  So any proposal to remove those changes should also
suggest an alternative for handling those bug reports.


RE: Change of Lisp syntax for "fancy" quotes in Emacs 27?

Drew Adams
> The bug reports which triggered the above changes are bug#2967 and
> bug#23425.  So any proposal to remove those changes should also
> suggest an alternative for handling those bug reports.

For "handling those bug reports"?  Are we to add
more cans of worms to this question, obscuring it?

AFAICT, no alternatives to handling those bugs
are needed because of reverting the Lisp syntax
change made for bug #30217.  Can you point to
how/why reverting that change would necessitate
alternative fixes for those bugs?

Bug #2967 just asked for a warning, e.g. during
byte-compilation or loading.  There's no
objection here to warning.

Bug #2967 did not ask for (or get) a change in
Lisp syntax.  I see no negative impact on #2967
from reverting the Lisp-syntax "fix" to #30217.

Even #30217 did not ask for such a syntax change.
Warning is sufficient for fixing #30217 too.

Bug #23425, on the other hand, is a gigantic
stream-of-consciousness about anything and
everything to do with Paul's changes to Emacs
over the last few years wrt curly quotes.
It's not a single bug report thread - it's
all over the map.

In any case, #23425, like #2967 (and even
#30217), is not about what was done to "fix"
#30217 - changing Lisp syntax for fancy quotes.

How is it helpful to throw all of #23425 into
this Lisp syntax-change question, as if the
present issue puts into question everything
ever discussed about curly quotes?

Or do you have something specific in mind here
wrt #23425 - some part of it?  Something that
would actually be impacted negatively by
reverting the Lisp syntax changes for #30217?
If so, please identify it.

But if you mean only the ability to get confused
by copy+pasting Lisp code that has a fancy quote
mark somewhere in place of ordinary ASCII
apostrophe ('), e.g., (setq foo ’bar), then
that's just the same pilot-error gotcha as for
bug #30217.

There are many gotchas in Lisp.  You can see
repeated postings of some at various places
(e.g., help-gnu-emacs, emacs.stackexchange).
E.g., the error that a given Lisp function is
not defined (because its library was not loaded).

The pilot error described in bug #30217 is not
even a commonly reported one.  The "fix" made
in #30217 is an overreaction.

So one solution to #30217 is to do nothing - just
revert the misguided Lisp syntax change.  Users
will learn that gotcha the same way they learn
others.  Not every report of a gotcha needs to
lead to changes to Emacs.

If we do nothing there will continue to be some
such pilot errors, of course.  But we already
raise an error if the code leads to a problem.

And the original error message from bug #23425
is _more_ meaningful and helpful, not less,
than the new one after the "fix".

The original error msg of #23425:
  (wrong-number-of-arguments setq 31)

tells you pretty much that setq is missing an
argument or it has too many, which makes you
look at its arguments.  Not so obscure.  And
accurate.

The new error msg:
  (invalid-read-syntax "strange quote" "’")

is obscure.  Invalid read syntax when reading
what?  What's invalid about it?

Confusion - not understanding an accurate error
msg - is not the same thing as Lisp itself having
a bug because such a character is included in a
symbol name.

Another solution is to try to warn users about
the use of confusables.

That's actually many solutions, because it
requires handling different chars and different
gotcha contexts differently, and carefully.
But unlike a syntax change it's not an
all-or-nothing thing: we could add warnings here
and there, as something might be better than
nothing.

Either doing nothing or trying to warn about such
gotchas is right.  Changing Lisp syntax here is
not right.  Lisp doesn't have a bug here.

This is all about pilot error - the same kind of
thing that happens when someone mistypes `,' for
`.' for dotted-pair syntax, or types `.' in `a.b'
intending dotted-pair syntax but getting a symbol
instead, or quotes a sexp expecting the sexp to
be evaluated.

Yes, a user might scratch her head when seeing
the error message from such a mistake, but the
error message is right, not wrong, and eventually
the light turns on.

And this enlightenment is aided by the fact that
Lisp syntax is so simple.  The "fix" for bug
#30217 goes in the opposite direction.  It makes
Lisp syntax more complex and makes understanding
syntax mistakes more difficult.


Re: Change of Lisp syntax for "fancy" quotes in Emacs 27?

Eli Zaretskii
> Date: Sat, 3 Feb 2018 08:16:15 -0800 (PST)
> From: Drew Adams <[hidden email]>
> Cc: [hidden email]
>
> > The bug reports which triggered the above changes are bug#2967 and
> > bug#23425.  So any proposal to remove those changes should also
> > suggest an alternative for handling those bug reports.
>
> For "handling those bug reports"?  Are we to add
> more cans of worms to this question, obscuring it?
>
> AFAICT, no alternatives to handling those bugs
> are needed because of reverting the Lisp syntax
> change made for bug #30217.  Can you point to
> how/why reverting that change would necessitate
> alternative fixes for those bugs?

Those bug reports complained about obscure error messages that are
unhelpful when a Lisp programmer tries to figure out the root cause.
I'm saying that we should find an alternative way of making clear,
helpful error messages in those special cases where characters which
display similarly might make the error message confusing if it just
cites the symbol's name.

For example, suppose you have a Lisp program that produces the
following error message when compiled/executed:

  Symbol's value as variable is void: 'аbbrevs-changed

You then type "C-h v abbrevs-changed RET" and get the expected result,
meaning that the variable is known to Emacs.  How quickly will you be
able to spot the cause of the error message?

The change that got reverted from the emacs-26 branch was about a
similar case, but for a character that's much more important for Lisp
than 'a': it's about the character used to quote symbol names.  But
the essence is the same: due to how characters are displayed, some
characters can be confused for others.

We want to find a way of identifying such situations and telling the
Lisp programmer about them in clear and easily understandable ways.
One way, perhaps too radical a one, is to reject such "confusable"
characters outright.  We could decide that we don't want such a
radical solution, but that doesn't mean we should give up on the
attempt to find some other solution for the problem.  Neither does it
mean we should proclaim people who installed the change as enemies of
the society.

> Bug #23425, on the other hand, is a gigantic
> stream-of-consciousness about anything and
> everything [...]
> [...]
> How is it helpful to throw all of #23425 into
> this Lisp syntax-change question, as if the
> present issue puts into question everything
> ever discussed about curly quotes?

I could turn the tables and ask you how it is helpful to dump on us all
your random thoughts about this, instead of simply saying you didn't
understand the relevance and asking for more explanations.  Which I
just provided.

I hope now the issue is clear enough.

> And the original error message from bug #23425
> is _more_ meaningful and helpful, not less,
> than the new one after the "fix".
>
> The original error msg of #23425:
>   (wrong-number-of-arguments setq 31)
>
> tells you pretty much that setq is missing an
> argument or it has too many, which makes you
> look at its arguments.  Not so obscure.  And
> accurate.
>
> The new error msg:
>   (invalid-read-syntax "strange quote" "’")
>
> is obscure.  Invalid read syntax when reading
> what?  What's invalid about it?

I think you are so eager to make your point that you are willing to
claim that black is white and vice versa.  Any objective person would
agree that the new error message is more directly pointing to the root
cause, which is the syntax of specifying a quoted symbol name using a
"strange quote".  If we are good at writing and indexing our ELisp
manual, then I'd expect to find there an index entry for "strange
quote", which will land me where this issue is explained.  Case
closed.

Once again, I can agree that this measure might be too harsh, but I
would still like to see clear diagnostics of such typos, and like
Paul, I think we should take our inspiration from the Unicode
Standard's notion of "confusables".  Ideas and proposals for patches
along those lines are welcome.  Ignoring the problem, or trying to
convince us that it doesn't exist, is not.

> Either doing nothing or trying to warn about such
> gotchas is right.  Changing Lisp syntax here is
> not right.

Doing nothing would be ignoring the problem.  That changing Lisp
syntax is not right is your opinion: legitimate, but clearly not
shared by at least some.

> Lisp doesn't have a bug here.

That's a strawman, and you know it.  We are talking about diagnostics
for bugs in Lisp programs.


Re: Change of Lisp syntax for "fancy" quotes in Emacs 27?

Aaron Ecay
In reply to this post by Noam Postavsky-2
Hi Noam,

On 2 February 2018, Noam Postavsky wrote:
>
> In Emacs 26 and earlier the following is valid lisp code:
>
> (setq ’bar 42)
> (setq foo ’bar)

I was surprised to learn that this is the case, in light of what is
said in the Elisp reference about symbol names: “A symbol name can
contain any characters whatever. Most symbol names are written with
letters, digits, and the punctuation characters ‘-+=*/’. Such names
require no special punctuation; the characters of the name suffice as
long as the name does not look like a number. (If it does, write a ‘\’
at the beginning of the name to force interpretation as a symbol.) The
characters ‘_~!@$%^&:<>{}?’  are less often used but also require no
special punctuation. Any other characters may be included in a symbol's
name by escaping them with a backslash.”  (info "(elisp) Symbol Type")
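
The manual's rule can be observed from the printer's side with a
small sketch.  The helper name `my-printed-form' is hypothetical;
`intern' and `prin1-to-string' are standard.  `intern' accepts any
string as a symbol name, and the printer adds backslashes where the
reader would otherwise misparse the name.  How a name starting with a
curved quote is printed may differ between Emacs versions, so no
result is claimed for that case.

```elisp
;; Hypothetical helper: show how the printer writes a symbol whose
;; name is the string NAME.
(defun my-printed-form (name)
  "Return the printed representation of the symbol named NAME."
  (prin1-to-string (intern name)))

(my-printed-form "a b")        ; the space is printed with a backslash
(my-printed-form "1")          ; looks like a number, so a backslash is prefixed
(my-printed-form "\u2019bar")  ; result may depend on the Emacs version

;; The printed form reads back as the same symbol.
(eq (car (read-from-string (my-printed-form "a b")))
    (intern "a b"))
```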

Would it be worth considering making the reader enforce this
specification fully, as an alternative to your patch?  That would solve
this problem with curly quotes in symbol names (which also bit me at
one point), as well as the potential problems with other confusable
characters raised by Paul.

(It might still be desirable to add a special user-friendly error message
when the illegal characters are confusable with an ASCII single quote, as
an additional user-friendliness measure.)

Aaron

PS if this approach is not taken, the manual should at least be changed
to match the actual behavior of the reader.


Re: Change of Lisp syntax for "fancy" quotes in Emacs 27?

Michael Heerdegen
In reply to this post by Eli Zaretskii
Hello,

Helpfulness of error messages surely depends on the beholder, and on
expectations.  In my eyes,

> Symbol's value as variable is void: 'аbbrevs-changed

is quite clear: you think this        ^^^^^^^^^^^^^^^^ is a quoted
thing, but the error message calls it a symbol.  So there must be a
problem with that quote, it has obviously gotten read as part of the
symbol.  Sure, you still have to find out why.  OTOH

> >   (invalid-read-syntax "strange quote" "’")

also doesn't say what's wrong with that quote.  It even calls something
a quote where there is none.  The error message is confusing.  Repeating
the pseudo quote character in the error message doesn't make it look
less like a quote.

> I think you are so eager to make your point that you are willing to
> claim that black is white and vice versa.  Any objective person would
> agree that the new error message is more directly pointing to the root
> cause

Are you really sure that every Emacs user would expect that we modify
the Lisp reader to catch typos?

FWIW, we already modified the Lisp reader to catch another style issue
(to get rid of old-style backquotes) and made it signal an error.  It broke my
stuff (el-search) horribly - though I don't use old-style backquotes,
and for code that also doesn't use them.  Now I need to work around
`read' and define my own `read' function.  I also need to remember for a
long time that using `read' is forbidden in my library.  I even
implemented a minor mode to warn me just about that: it warns me that I
use `read' and it's forbidden.  Otherwise, I would get strange errors
when using my stuff, from time to time, whenever I added a `read' by
accident.  All other users of my package, too.  And believe me, _these_
error messages are then less understandable than

> Symbol's value as variable is void: 'аbbrevs-changed.

Misusing something as fundamental as the Lisp reader to catch such stuff
should be the very last resort.  The result can get much more confusing
in situations we aren't thinking about now.

> > Lisp doesn't have a bug here.
> That's a strawman, and you know it.  We are talking about diagnostics
> for bugs in Lisp programs.

I think it's a valid argument.  Drew just thinks it's the wrong fix.
He may also think that no fix would maybe suffice.  That's ok, and I
think he made some good points.

We should discuss alternative approaches to move forward.  People
often paste stuff into scratch or the M-: prompt that they copied from
elsewhere.  Maybe we could make M-: and C-x C-e check for this problem.
These could also check for other, similar frequent problems.  Any better
suggestions?
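
One way such a pre-evaluation check might look, as a sketch only: scan
the text about to be evaluated for curved single quotes and report
their positions, leaving the reader untouched.  The function name
`my-curved-quote-positions' is hypothetical, and this is not how M-: or
C-x C-e are implemented; a wrapper command could call it and warn
before evaluating.  Note that it also flags curved quotes inside
strings and comments, which a real check would need to skip.

```elisp
;; Hypothetical helper: find U+2018 and U+2019 in a buffer region.
(defun my-curved-quote-positions (beg end)
  "Return the buffer positions between BEG and END holding U+2018 or U+2019."
  (save-excursion
    (goto-char beg)
    (let ((hits '()))
      (while (re-search-forward "[\u2018\u2019]" end t)
        (push (match-beginning 0) hits))
      (nreverse hits))))

;; Example: check pasted code in a scratch buffer before evaluating it.
(with-temp-buffer
  (insert "(setq foo \u2019bar)")
  (my-curved-quote-positions (point-min) (point-max)))   ; => (11)
```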


Michael.


Re: Change of Lisp syntax for "fancy" quotes in Emacs 27?

Clément Pit-Claudel
On 2018-02-03 20:16, Michael Heerdegen wrote:
> Helpfulness of error messages surely depends on the beholder, and on
> expectations.  In my eyes,
>
>> Symbol's value as variable is void: 'аbbrevs-changed
> is quite clear: you think this        ^^^^^^^^^^^^^^^^ is a quoted
> thing, but the error message calls it a symbol.  So there must be a
> problem with that quote, it has obviously gotten read as part of the
> symbol.  Sure, you have still to find out why.

I think you're making Eli's point, actually :)

The problem isn't the quote: it's the CYRILLIC SMALL LETTER A instead of LATIN SMALL LETTER A.  IOW, (string= "аbbrevs-changed" "abbrevs-changed") is nil.

I think Eli was illustrating the confusion that can stem from Unicode confusables (and I must agree that the error message could be much better ^^)
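
A small sketch that makes the difference visible by listing the
non-ASCII characters in a name.  The helper name `my-non-ascii-chars' is
hypothetical; everything else is standard Emacs Lisp.

```elisp
;; Hypothetical helper: report where a string leaves the ASCII range.
(defun my-non-ascii-chars (text)
  "Return a list of (INDEX . CODE) for every non-ASCII character in TEXT.
INDEX is zero-based."
  (let ((index 0)
        (hits '()))
    (dolist (ch (string-to-list text))
      (when (> ch 127)
        (push (cons index ch) hits))
      (setq index (1+ index)))
    (nreverse hits)))

;; The two names differ only in their first character.
(string= "\u0430bbrevs-changed" "abbrevs-changed")   ; => nil
(my-non-ascii-chars "\u0430bbrevs-changed")          ; => ((0 . 1072)), i.e. U+0430
(my-non-ascii-chars "abbrevs-changed")               ; => nil
```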

Clément.


RE: Change of Lisp syntax for "fancy" quotes in Emacs 27?

Drew Adams
In reply to this post by Eli Zaretskii
> Those bug reports complained about obscure error messages that are
> unhelpful when a Lisp programmer tries to figure out the root cause.
> I'm saying that we should find an alternative way of making clear,
> helpful error messages in those special cases where characters which
> display similarly might make the error message confusing if it just
> cites the symbol's name.

OK.  Except I would say warnings, not error messages, at
least in most cases.

But even if we have an error message, that's not a call
to change the syntax of Lisp.  User errors happen.  We
should just want to help users avoid making such errors.

> For example, suppose you have a Lisp program that produces
> the following error message when compiled/executed:
>
>   Symbol's value as variable is void: 'аbbrevs-changed
>
> You then type "C-h v abbrevs-changed RET" and get the expected result,
> meaning that the variable is known to Emacs.  How quickly will you be
> able to spot the cause of the error message?

Some people will wonder for a while.  Others, perhaps
already bitten by this gotcha, will notice the quote
mark there right away.

One thing that would help, I think, and which should
be done in general, would be to put the offending
thingie between `...':

 Symbol's value as variable is void: `'аbbrevs-changed'

That makes it more obvious that the symbol name
includes that fancy quote char.

Still, all of this is pilot error, where "pilot" can
include the user who wrote the code but more likely
means a user who copy+pasted it.

> The change that got reverted from the emacs-26 branch was about a
> similar case, but for a character that's much more important for Lisp
> than 'a': it's about the character used to quote symbol names.  But
> the essence is the same: due to how characters are displayed, some
> characters can be confused for others.
>
> We want to find a way of identifying such situations and telling the
> Lisp programmer about them in clear and easily understandable ways.
> One way, perhaps too radical a one, is to reject such "confusable"
> characters outright.  We could decide that we don't want such a
> radical solution, but that doesn't mean we should give up on the
> attempt to find some other solution for the problem.  Neither does it
> mean we should proclaim people who installed the change as enemies of
> the society.

Agreed.  As I've said, I'm in favor of providing
friendly warnings/reminders that point out that
such a character is present.

I think that should be enough.

There are lots of potential confusables, and lots
of different use contexts.  But if we start with
just one or two such chars and one or two common
and clear contexts where a warning might help, that
would be good.  We can always add more such warnings
as cases come up (get reported or otherwise become
obvious).

It would be an overreaction, IMO, to jump to
changing the existing Lisp syntax to raise errors
when someone uses such a character in, say, a symbol
name.  We should not require such chars to be
escaped in a symbol name.  Such chars have no special
meaning for Lisp (unlike `.', `,', `'', ``', `(', `)',
`[', `]', `"', `<', `>', `#', `;', and perhaps some more).

> > Bug #23425, on the other hand, is a gigantic
> > stream-of-consciousness about anything and
> > everything [...]
> > [...]
> > How is it helpful to throw all of #23425 into
> > this Lisp syntax-change question, as if the
> > present issue puts into question everything
> > ever discussed about curly quotes?
>
> I could turn the tables and ask you how it is helpful
> to dump on us all your random thoughts about this,
> instead of simply saying you didn't understand the
> relevance and asking for more explanations.  Which I
> just provided.

Whoa!  I don't see a connection between the current
issue and the many things discussed in #23425.  And
I don't think I dumped any random thoughts on anyone.

> I hope now the issue is clear enough.

No idea what your point is there.

If there is some part of bug #23425 that you think
is relevant here, and you think it will be UNfixed by
reverting the Lisp-syntax change made for bug #30217,
please tell us what that part is.

I don't see anything in #23425 that needs the change
in Lisp syntax made for #30217.  And I don't see that
Lisp change being necessary to fix #30217 either.
It wasn't requested by the bug filer, AFAIK.  Same
for the other bugs you mentioned.  The filers just
asked for warnings, AFAICT.

> > And the original error message from bug #23425
> > is _more_ meaningful and helpful, not less,
> > than the new one after the "fix".
>
> I think you are so eager to make your point that you are willing to
> claim that black is white and vice versa.  Any objective person would
> agree that the new error message is more directly pointing to the root
> cause, which is the syntax of specifying a quoted symbol name using a
> "strange quote".  If we are good at writing and indexing our ELisp
> manual, then I'd expect to find there an index entry for "strange
> quote", which will land me where this issue is explained.  Case
> closed.

We can perhaps agree to disagree about that.
But of course if you say the case is closed then
it's closed.

> Once again, I can agree that this measure might be too harsh, but I
> would still like to see clear diagnostics of such typos, and like
> Paul, I think we should take our inspiration from the Unicode
> Standard's notion of "confusables".

I've agreed about that from the beginning.  It can
be helpful to warn users about possible confusion
when they use confusables.  And I agree that clear
diagnostics are needed - that was one of my points.

That's different from changing the syntax of Lisp.

> Ideas and proposals for patches along those lines
> are welcome.

Ditto.

> Ignoring the problem, or trying to convince us
> that it doesn't exist, is not.

I recognize the problems of confusable characters.
Not all such possible confusions are equally likely,
in practice.

Recognizing contexts where something might well be
a typo, and warrants a helpful reminder/warning, is
what's needed - case by case.

What's not needed, IMO (and probably the only place
where I differ from you on this, even if you don't
want to recognize it) is a change in Lisp syntax,
making it a read error not to escape such a character.

> > Either doing nothing or trying to warn about such
> > gotchas is right.  Changing Lisp syntax here is
> > not right.
>
> Doing nothing would be ignoring the problem.

Yes.  It's maybe not the best help for users, but
it would be one way to handle those few reports of
confusion.  We get a lot more questions due to
other confusions wrt Lisp than we do such questions
due to confusing one char for another.

I didn't, and don't, say that doing nothing is the
best approach.  I said it's one way to deal with
such reports.  Unlike changing Lisp syntax, it at
least doesn't introduce new problems.

> That changing Lisp syntax is not right is your
> opinion: legitimate, but clearly not shared by at
> least some.

That's why we're having this discussion.

I have yet to hear a reason why it is right to
change Lisp syntax for this - why a simple warning
is not sufficient and we need to also make Lisp
raise an error.

> > Lisp doesn't have a bug here.
>
> That's a strawman, and you know it.  We are talking
> about diagnostics for bugs in Lisp programs.

I have no objection to diagnostics.  Add warnings
for byte-compilation, loading, whatever.

Make sure the warnings are clear.  Say, for instance
that a curly quote was used in sexp `...'.  Don't
just say that invalid syntax was read (somewhere).
Clearly pointing out the confusable char in the
possibly confused sexp should go a long way to
making things clear.

My objection is to requiring that such chars be
escaped to keep Lisp from raising an error.  I don't
put `a’b' in the same class as, say, `a,b'.

`,' is special in Lisp, and (setq a,b 42) should
(and does) raise an error.  `’' is not special in
Lisp, and (setq a’b 42) should not raise an error (IMO).
Likewise, (setq ,b 42) (yes) and (setq ’b 42) (no).
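
A minimal sketch of the distinction drawn above, using only
`read-from-string' and `symbol-name'.  The helper name `my-read-first'
is made up for illustration.  The comma case shows the reader splitting
`a,b' into two objects; the curly-quote case deliberately uses the
backslash-escaped spelling, which reads the same way before and after
the syntax change under discussion.

```elisp
;; Hypothetical helper (not part of Emacs): read the first Lisp
;; object found in STRING.
(defun my-read-first (string)
  "Return the first Lisp object read from STRING."
  (car (read-from-string string)))

;; `,' is reader syntax: `a,b' is read as the symbol `a' followed by
;; the unquote form `(\, b)', so `setq' would receive three arguments.
(my-read-first "(setq a,b 42)")

;; A backslash-escaped curly quote (U+2019) is an ordinary symbol
;; constituent; the resulting symbol name contains it verbatim.
(symbol-name (my-read-first "a\\\u2019b"))
```

Evaluating the unescaped form (setq a’b 42) is left out on purpose,
since its behavior is exactly what differs between Emacs versions.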

If you want to argue for this syntax change, why
not address some of my arguments against it?  Where
will you draw the line, for instance?  There are
_lots_ of possible confusables.

I'd say start with only the few that have actually
been reported (is there only one reported?), trying
to come up with reasonable warnings in particular
contexts of use.  That would be a good start.

We might even have a user option that lists the
confusables to check/warn for, with whatever
default value people here think is best (it might
be only `’', to start with - or both left and
right curly quotes).
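
One way such a user option could look, as a rough sketch only.  The
names `my-confusable-chars' and `my-confusables-in-string' are
hypothetical; nothing like them is claimed to exist in Emacs, and the
default value (left and right curly single quotes) is just the
starting point suggested above.

```elisp
;; Hypothetical option: the characters to warn about.
(defcustom my-confusable-chars '(?\u2018 ?\u2019)
  "Characters that may have been typed or pasted by mistake.
The default lists LEFT and RIGHT SINGLE QUOTATION MARK."
  :type '(repeat character)
  :group 'lisp)

;; Hypothetical checker: report where those characters occur.
(defun my-confusables-in-string (string)
  "Return a list of (INDEX . CHAR) for confusables found in STRING.
A character counts as confusable if it is in `my-confusable-chars'."
  (let ((index 0)
        (hits '()))
    (dolist (char (string-to-list string))
      (when (memq char my-confusable-chars)
        (push (cons index char) hits))
      (setq index (1+ index)))
    (nreverse hits)))

;; Example: locate the curly quote in a pasted form.
(my-confusables-in-string "(setq foo \u2019bar)")
```

A warning facility built on this would still need to decide in which
contexts (byte compilation, loading, M-:) to run the check and how to
word the message; the sketch only covers detection.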

Are you thinking instead (since both you and Paul
mentioned the Unicode list of confusables) of
starting with _all_ characters in that list?

http://www.unicode.org/Public/security/8.0.0/confusables.txt

I won't argue about which chars should be warned
about, though I might be interested to see what
contexts we warn for and what the messages will be.

My objection is not about detecting this or that
use of this or that character and warning/reminding
users about it.

My objection is to making Lisp require escaping of
such characters.  That's all.  I think I've made
that as clear as I possibly can.

But you seem to want to paint my objection as
being against helping users know about accidental
use of confusables, e.g., `’' instead of `''.  Why?


RE: Change of Lisp syntax for "fancy" quotes in Emacs 27?

Drew Adams
In reply to this post by Aaron Ecay
> I was surprised to learn that this is the case, in light of what is
> said in the Elisp reference about symbol names:
>
> “A symbol name can
> contain any characters whatever. Most symbol names are written with
> letters, digits, and the punctuation characters ‘-+=*/’. Such names
> require no special punctuation; the characters of the name suffice as
> long as the name does not look like a number. (If it does, write a ‘\’
> at the beginning of the name to force interpretation as a symbol.) The
> characters ‘_~!@$%^&:<>{}?’  are less often used but also require no
> special punctuation. Any other characters may be included in a symbol's
> name by escaping them with a backslash.”  (info "(elisp) Symbol Type")

Thank you very much for that.  I guess I wasn't aware of
that text.  I thought that there were only a very few
chars that needed to be escaped in symbol names - `,',
`(', etc.: only chars that have special syntactic
meaning in Lisp.

I suppose that invalidates my objection, though I wonder
_why_ we would require escaping so many ordinary chars.

And like you I wonder whether that text is accurate.
I wonder whether that is the intended design (why?) or it
is just an inaccurate description of the real behavior.

Trying various chars from confusables.txt, it does not
seem like they require escaping (at least not yet).
That text appears to be wrong.

I'd prefer it if escaping was _not_ required for chars
other than those mentioned in that text, including
chars in confusables.txt.  I think it makes more sense
to require escaping only for characters that have
special Lisp significance, syntactically.

IOW, I prefer the actual behavior to the behavior
described in that text.  I don't think someone using
Hebrew or Arabic or Chinese or Korean letters in a
symbol name should need to escape each one (or any
of them).

But if the design described there has already been
decided on as best for Emacs, then I guess my
argument is moot.  In that case, the implementation
is currently waaaaaay out of whack wrt the design.

And if that's the design to be implemented then I
agree with you that implementing it as described
in that text would at least have an advantage of
consistency.

> Would it be worth considering making the reader enforce this
> specification fully, as an alternative to your patch?  That would solve
> this problem with curly quotes in symbol names (which also bit me at
> one point), as well as the potential problems with other confusable
> characters raised by Paul.
>
> (It might still be desirable to add a special user-friendly error
> message when the illegal characters are confusable with an ASCII
> single quote, as an additional user-friendliness measure.)
>
> If this approach is not taken, the manual should at least
> be changed to match the actual behavior of the reader.

That's the approach I'd prefer.  Let chars be used in
symbol names without escaping, except for those with
special Lisp syntax.

But add warnings in contexts where we think someone
might have inadvertently used a confusable in place
of a common character.


RE: Change of Lisp syntax for "fancy" quotes in Emacs 27?

Drew Adams
In reply to this post by Clément Pit-Claudel
> > Helpfulness of error messages surely depends on the beholder, and on
> > expectations.  In my eyes,
> >
> >> Symbol's value as variable is void: 'аbbrevs-changed
> > is quite clear: you think this        ^^^^^^^^^^^^^^^^ is a quoted
> > thing, but the error message calls it a symbol.  So there must be a
> > problem with that quote, it has obviously gotten read as part of the
> > symbol.  Sure, you have still to find out why.
>
> I think you're making Eli's point, actually :)
>
> The problem isn't the quote: it's the CYRILLIC SMALL LETTER A instead
> of LATIN SMALL LETTER A.  IOW, (string= "аbbrevs-changed"
> "abbrevs-changed") is nil.
>
> I think Eli was illustrating the confusion that can stem from Unicode
> confusables (and I must agree that the error message could be much
> better ^^)

I too misread Eli's example as being about using a
curly quote instead of an apostrophe.  You're right
that it's an ordinary apostrophe and the first `a'
is the letter you mention.

But then why would anyone ever see the quote mark
in such a message?  Was the message artificially
configured?

In any case, if that example, without the quote, say,
is trying to make Eli's point, then he must be arguing
for warning about using such confusables also - `а'
as a confusable for `a'.

That's a monumental undertaking (take a look at the
confusables.txt list).  And the messages (warning or
error) would need to be pretty darn clear about just
what char was used and where, in order not to sow
even more confusion.  It sure won't cut the mustard
to just say "Invalid read syntax"!


Re: Change of Lisp syntax for "fancy" quotes in Emacs 27?

Michael Heerdegen
In reply to this post by Clément Pit-Claudel
Clément Pit-Claudel <[hidden email]> writes:

> On 2018-02-03 20:16, Michael Heerdegen wrote:
> > Helpfulness of error messages surely depends on the beholder, and on
> > expectations.  In my eyes,
> >
> >> Symbol's value as variable is void: 'аbbrevs-changed
> > is quite clear: you think this        ^^^^^^^^^^^^^^^^ is a quoted
> > thing, but the error message calls it a symbol.  So there must be a
> > problem with that quote, it has obviously gotten read as part of the
> > symbol.  Sure, you have still to find out why.
>
> I think you're making Eli's point, actually :)
>
> The problem isn't the quote: it's the CYRILLIC SMALL LETTER A instead
> of LATIN SMALL LETTER A.  IOW, (string= "аbbrevs-changed"
> "abbrevs-changed") is nil.

Oh.  Why then is there a quote in this error message?

FWIW, I'm not against doing something that helps the user in such
situations.  But these are problems in the interaction between the user
and Emacs, so we should care about it on that (the interface) level.
And keep Lisp, the language, simple.


Michael.


Re: Change of Lisp syntax for "fancy" quotes in Emacs 27?

Noam Postavsky-2
In reply to this post by Drew Adams
On Sat, Feb 3, 2018 at 8:55 PM, Drew Adams <[hidden email]> wrote:

> My objection is to making Lisp require escaping of
> such characters.  That's all.  I think I've made
> that as clear as I possibly can.

I think your position is indeed quite clear by now. In fact, I think
the length and frequency of your posts are going to make it harder for
other people to participate, so could you dial it back a bit?
Please?


Re: Change of Lisp syntax for "fancy" quotes in Emacs 27?

Paul Eggert
In reply to this post by Aaron Ecay
Aaron Ecay wrote:
> I was surprised to learn that this is the case, in light of what is
> said in the Elisp reference about symbol names

Good point; thanks. In the spirit of "be strict about what you generate", the
Emacs printer should escape any character that is not in the list of characters
documented in the Elisp manual as being safe (i.e., as not requiring escaping).
This is elementary future-proofing, and is independent of whether we want Emacs
to warn about or disallow confusable chars in symbols.

Proposed patches against 'master' attached. The first merely simplifies the code
without changing its effect. The second fixes a bug in the manual, which
incorrectly states that '?' never needs escaping in symbol names. These two
patches are routine. (I assume the second one should be applied to emacs26
instead of to master.)

The third patch changes the Lisp printer to escape characters as suggested above.

The fourth patch changes the Lisp printer to escape '?' only at the start of a
symbol. This is nicer for programs using Scheme-style naming conventions in
Emacs Lisp, e.g., 'fooish?' rather than 'fooishp'. I discovered the need for
this patch when I wrote the second patch.

0001-Simplify-print_object-a-bit.patch (2K)
0002-Say-needs-escaping-at-start-of-symbol.patch (1K)
0003-prin1-etc.-now-escape-more-chars-in-symbols.patch (3K)
0004-Escape-only-at-start-of-symbol.patch (1K)
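
The property these printer patches aim to protect can be stated as a
round trip: a symbol printed with `prin1' should read back as the same
symbol.  The sketch below is not taken from the patches; the helper
name `my-symbol-round-trips-p' is invented for illustration, and it
only exercises `intern', `prin1-to-string' and `read'.

```elisp
;; Hypothetical helper (not part of the attached patches).
(defun my-symbol-round-trips-p (name)
  "Return non-nil if the symbol named NAME survives print and read.
The symbol is interned, printed with `prin1-to-string', and the
printed text is read back; the result must be `eq' to the original."
  (let ((symbol (intern name)))
    (eq symbol (read (prin1-to-string symbol)))))

;; `,' has special meaning to the reader, so the printer must escape
;; it for the round trip to succeed.
(my-symbol-round-trips-p "a,b")

;; A Scheme-style predicate name with a trailing `?'.
(my-symbol-round-trips-p "fooish?")
```

Whether the printed text contains a backslash before `?' depends on
the Emacs version and on which of the patches are applied, so the
sketch checks only the round trip and not the exact printed form.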

Re: Change of Lisp syntax for "fancy" quotes in Emacs 27?

Andreas Schwab-2
On Feb 03 2018, Paul Eggert <[hidden email]> wrote:

> Good point; thanks. In the spirit of "be strict about what you generate",
> the Emacs printer should escape any character that is not in the list of
> characters documented in the Elisp manual as being safe (i.e., as not
> requiring escaping).

No!

Andreas.

--
Andreas Schwab, [hidden email]
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."


Re: Change of Lisp syntax for "fancy" quotes in Emacs 27?

Alan Third
In reply to this post by Clément Pit-Claudel
On Sat, Feb 03, 2018 at 08:25:01PM -0500, Clément Pit-Claudel wrote:

> On 2018-02-03 20:16, Michael Heerdegen wrote:
> > Helpfulness of error messages surely depends on the beholder, and on
> > expectations.  In my eyes,
> >
> >> Symbol's value as variable is void: 'аbbrevs-changed
> > is quite clear: you think this        ^^^^^^^^^^^^^^^^ is a quoted
> > thing, but the error message calls it a symbol.  So there must be a
> > problem with that quote, it has obviously gotten read as part of the
> > symbol.  Sure, you have still to find out why.
>
> I think you're making Eli's point, actually :)
>
> The problem isn't the quote: it's the CYRILLIC SMALL LETTER A
> instead of LATIN SMALL LETTER A. IOW, (string= "аbbrevs-changed"
> "abbrevs-changed") is nil.
>
> I think Eli was illustrating the confusion that can stem from
> Unicode confusables (and I must agree that the error message could
> be much better ^^)

Something like:

Symbol's value as variable is void: 'аbbrevs-changed
Did you mean `abbrevs-changed'?
Symbol contains `а' (CYRILLIC SMALL LETTER A) at character 0, did you
mean `a' (LATIN SMALL LETTER A)?

The middle line would require Emacs to do a fuzzy search for similar
symbols, which may be too much. Something like that could be helpful
even in cases where the name has been mistyped (abbrev-changed instead
of abbrevs-changed, for example).
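
The last line of that proposed message could be produced roughly as
follows.  This is a sketch under assumptions: the function name
`my-describe-non-ascii' is hypothetical, treating every non-ASCII
character as suspicious is a simplification, and suggesting the ASCII
look-alike would additionally need a confusables table, which is not
attempted here.  Character names come from `get-char-code-property'.

```elisp
;; Hypothetical helper: describe each non-ASCII character in NAME.
(defun my-describe-non-ascii (name)
  "Return a list of strings describing non-ASCII characters in NAME.
Each string gives the character, its Unicode name, and its index."
  (let ((index 0)
        (lines '()))
    (dolist (char (string-to-list name))
      (when (> char 127)
        (push (format "Symbol contains `%c' (%s) at character %d"
                      char
                      (or (get-char-code-property char 'name)
                          "unnamed character")
                      index)
              lines))
      (setq index (1+ index)))
    (nreverse lines)))

;; The symbol name from the example, spelled with U+0430 up front.
(my-describe-non-ascii "\u0430bbrevs-changed")

;; The two spellings look alike but are different strings.
(string= "\u0430bbrevs-changed" "abbrevs-changed")
```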

--
Alan Third


Re: Change of Lisp syntax for "fancy" quotes in Emacs 27?

Alan Mackenzie
In reply to this post by Michael Heerdegen
Hello, Michael.

On Sun, Feb 04, 2018 at 02:16:52 +0100, Michael Heerdegen wrote:
> Hello,

> Helpfulness of error messages surely depends on the beholder, and on
> expectations.  In my eyes,

> > Symbol's value as variable is void: 'аbbrevs-changed

> is quite clear: you think this        ^^^^^^^^^^^^^^^^ is a quoted
> thing, but the error message calls it a symbol.  So there must be a
> problem with that quote, it has obviously gotten read as part of the
> symbol.  Sure, you have still to find out why.  OTOH

This has actually happened to me.  In the error message, I didn't see
the quote as part of the symbol, I subconsciously dismissed it as a
quoting convention in the error message.  So what my brain saw was

    Symbol's value as variable is void: abbrevs-changed

This puzzled me for a long time.

> > >   (invalid-read-syntax "strange quote" "’")

> also doesn't say what's wrong with that quote.  It even calls something
> a quote where there is none.

Perhaps "strange quasi quote" would be more emphatic and clearer.

> The error message is confusing.  Repeating the pseudo quote character
> in the error message doesn't make it look less like a quote.

Agreed, on both points.

> > I think you are so eager to make your point that you are willing to
> > claim that black is white and vice versa.  Any objective person would
> > agree that the new error message is more directly pointing to the root
> > cause

> Are you really sure that every Emacs user would expect that we modify
> the Lisp reader to catch typos?

We're not talking about typos here.  The curly quotes aren't present on
typical keyboard layouts (though I'm informed they are present on
Finnish keyboards), so nobody who isn't Finnish will type one of these
characters by accident.  We're talking about Emacs itself corrupting
ASCII quotes into curly quotes in a `message' call because of the
default setting of `text-quoting-style', and so on.

Because of this, the error message should concentrate on that quote, not
the strange symbol, which Emacs itself created.
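
The conversion referred to here can be reproduced with `format-message',
which rewrites grave accent and apostrophe in its format string
according to `text-quoting-style'.  A small sketch; the wrapper name
`my-quote-with-style' is made up for illustration, and TEXT is assumed
to contain no `%' sequences.

```elisp
;; Hypothetical wrapper: apply a given quoting style to TEXT.
(defun my-quote-with-style (style text)
  "Return TEXT as `format-message' renders it under quoting STYLE.
STYLE is a value for `text-quoting-style', such as `curve',
`straight' or `grave'.  TEXT is used as the format string."
  (let ((text-quoting-style style))
    (format-message text)))

;; With `curve', the ASCII quotes become U+2018 and U+2019.
(my-quote-with-style 'curve "`foo'")

;; With `grave', the text is left as written.
(my-quote-with-style 'grave "`foo'")
```

Text produced this way and then pasted back into Lisp code is one
route by which a curly quote can end up in front of a symbol.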

[ .... ]

> > Symbol's value as variable is void: 'аbbrevs-changed.

> Misusing something fundamental as the Lisp reader to catch such stuff
> should be the very last resort.  The result can get much more confusing
> in situations we now don't think about.

Maybe we're already at the last resort for this problem.  Or maybe not.
Maybe an error message for unknown symbols should check for them
beginning with a curly quote.

> > > Lisp doesn't have a bug here.
> > That's a strawman, and you know it.  We are talking about diagnostics
> > for bugs in Lisp programs.

> I think it's a legitimate argument.  Drew just thinks it's the wrong fix.
> He may also think that no fix would maybe suffice.  That's ok, and I
> think he made some good points.

> We should discuss alternative approaches to move forward.  People
> often paste stuff into scratch or the M-: prompt that they copied from
> elsewhere.  Maybe we could make M-: and C-x C-e check for this problem.
> These could also check for other, similar frequent problems.  Any better
> suggestions?

I think that's a good suggestion.

> Michael.

--
Alan Mackenzie (Nuremberg, Germany).
