bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped.

Alan Mackenzie
Hello, Emacs and Stefan.

In the following C comment:

1   /*
2     \*/
3   /**/

, with point at BOL 1, do M-: (forward-comment 1).  This leaves point
wrongly at EOL 2.  It should end up at EOL 3, since the apparent comment
ender on L2 is actually escaped.

The following patch fixes this.  Are there any objections to me
installing it?


diff --git a/src/syntax.c b/src/syntax.c
index e6af8a377b..066972e6d8 100644
--- a/src/syntax.c
+++ b/src/syntax.c
@@ -2354,6 +2354,13 @@ forw_comment (ptrdiff_t from, ptrdiff_t from_byte, ptrdiff_t stop,
  /* We have encountered a nested comment of the same style
    as the comment sequence which began this comment section.  */
  nesting++;
+      if (comment_end_can_be_escaped
+          && (code == Sescape || code == Scharquote))
+        {
+          inc_both (&from, &from_byte);
+          UPDATE_SYNTAX_TABLE_FORWARD (from);
+          if (from == stop) continue; /* Failure */
+        }
       inc_both (&from, &from_byte);
       UPDATE_SYNTAX_TABLE_FORWARD (from);
 

--
Alan Mackenzie (Nuremberg, Germany).
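
[Editorial sketch, not part of the original message.]  The effect of the
extra inc_both in the patch can be modelled outside Emacs.  The function
below is a toy scanner over a plain C string, not the real forw_comment;
its name, signature and the flag parameter are assumptions made purely for
illustration.  With the flag off it stops at the first star-slash, with
the flag on it steps over whatever follows a backslash.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy model, not Emacs code: return the index just past the first
   comment ender at or after FROM in TEXT, or -1 if there is none.
   When END_CAN_BE_ESCAPED is true, a backslash makes the following
   character literal, which mirrors the extra inc_both in the patch.  */
long
skip_block_comment (const char *text, size_t from, bool end_can_be_escaped)
{
  size_t len = strlen (text);
  size_t i = from;

  while (i < len)
    {
      if (end_can_be_escaped && text[i] == '\\')
        {
          i += 2;       /* skip the backslash and the escaped character */
          continue;
        }
      if (text[i] == '*' && i + 1 < len && text[i + 1] == '/')
        return (long) (i + 2);
      i++;
    }
  return -1;            /* ran off the end: unterminated comment */
}
```

For the three-line buffer from the report, written as the string
"/*\n  \\*/\n/**/" and scanned from index 2, the model stops at the end of
the second line with the flag off and at the end of the third line with
it on.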




Stefan Monnier
Hi Alan,

> Hello, Emacs and Stefan.
>
> In the following C comment:
>
> 1   /*
> 2     \*/
> 3   /**/
>
> , with point at BOL 1, do M-: (forward-comment 1).  This leaves point
> wrongly at EOL 2.

That seems to be correct w.r.t the highlighting I see, OTOH.
IOW the bug seems to affect both forward-comment and parse-partial-sexp, right?

> It should end up at EOL 3, since the apparent comment
> ender on L2 is actually escaped.
>
> The following patch fixes this.

Does it fix it for `parse-partial-sexp` as well?

> Are there any objections to me installing it?

None from me, no.


        Stefan





Alan Mackenzie
Hello, Stefan.

On Tue, Sep 22, 2020 at 10:09:43 -0400, Stefan Monnier wrote:
> Hi Alan,

> > Hello, Emacs and Stefan.

> > In the following C comment:

> > 1   /*
> > 2     \*/
> > 3   /**/

> > , with point at BOL 1, do M-: (forward-comment 1).  This leaves point
> > wrongly at EOL 2.

> That seems to be correct w.r.t the highlighting I see, OTOH.
> IOW the bug seems to affect both forward-comment and parse-partial-sexp, right?

Yes.

> > It should end up at EOL 3, since the apparent comment
> > ender on L2 is actually escaped.

> > The following patch fixes this.

> Does it fix it for `parse-partial-sexp` as well?

It does, yes.  The patch is in forw_comment, which is called by
Fforward_comment, scan_lists, and scan_sexps_forward.

> > Are there any objections to me installing it?

> None from me, no.

Thanks!

>         Stefan

--
Alan Mackenzie (Nuremberg, Germany).




Mattias Engdegård
In reply to this post by Alan Mackenzie
Sorry if I misunderstood, but since when do backslashes escape */ in C?





Alan Mackenzie
Hello, Mattias.

On Wed, Sep 23, 2020 at 11:01:59 +0200, Mattias Engdegård wrote:
> Sorry if I misunderstood, but since when do backslashes escape */ in C?

Since forever, but only in the CC Mode test suite.  :-(

I just tried it out with gcc, and it seems that \*/ does indeed end a
block comment.  But an escaped newline doesn't end a line comment,
instead continuing it to the next line.  So I got confused.  Thanks for
pointing out the mistake.
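
[Editorial sketch, not part of the original message.]  The block-comment
half of this observation is easy to reproduce.  The function name below is
made up for the illustration; the only thing relied on is the rule that a
block comment ends at the first star-slash, whatever precedes it.

```c
/* The backslash inside the comment below does not protect the closer,
   so the assignment that follows it is ordinary code.  If the backslash
   did escape the closer, this function would not even compile.  */
int
value_after_comment (void)
{
  int x = 1;
  /* the backslash here is just another comment character \*/ x = 2;
  return x;
}
```

Calling value_after_comment () yields 2, which matches what was seen with
gcc.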

It seems that as well as the existing variable
comment-end-can-be-escaped, we need a new one, say
line-comment-end-can-be-escaped, too.  In C and C++ modes, these would
be nil and t respectively.

--
Alan Mackenzie (Nuremberg, Germany).




Stefan Monnier
> It seems that as well as the existing variable
> comment-end-can-be-escaped, we need a new one, say
> line-comment-end-can-be-escaped, too.

syntax.c doesn't like to think of it as "line-comment" but rather as
comment style a, b, c, or nested and non-nested.

> In C and C++ modes, these would
> be nil and t respectively.

In sm-c-mode, I'd handle those corner cases in
`syntax-propertize-function` (tho I think I don't bother with this one
currently).

So, I guess in CC-mode, you could handle those by placing `syntax-table`
properties from ... wherever you place them ;-)


        Stefan





Alan Mackenzie
Hello, Stefan.

On Wed, Sep 23, 2020 at 14:44:54 -0400, Stefan Monnier wrote:
> > It seems that as well as the existing variable
> > comment-end-can-be-escaped, we need a new one, say
> > line-comment-end-can-be-escaped, too.

> syntax.c doesn't like to think of it as "line-comment" but rather as
> comment style a, b, c, or nested and non-nested.

Hmm.  It could be quite troublesome to decide on an interface for major
modes specifying "comment style b can have its ender escaped, but
comment styles a and c cannot".
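
[Editorial sketch, not part of the original message.]  One possible shape
for such an interface is a bit mask with one bit per comment style.
Nothing like this exists in syntax.c as far as this thread shows; every
name below is hypothetical.

```c
#include <stdbool.h>

/* Hypothetical: one bit per comment style known to the scanner.  */
enum comment_style_bit
{
  STYLE_A = 1 << 0,
  STYLE_B = 1 << 1,
  STYLE_C = 1 << 2
};

/* Return true if MASK says the ender of STYLE may be escaped.  */
bool
ender_can_be_escaped (unsigned mask, enum comment_style_bit style)
{
  return (mask & style) != 0;
}
```

Under this sketch, a mode wanting "style b escapable, a and c not" would
supply a mask equal to STYLE_B, and a mode wanting the current all-or-nothing
behaviour would supply either 0 or all three bits.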

> > In C and C++ modes, these would
> > be nil and t respectively.

> In sm-c-mode, I'd handle those corner cases in
> `syntax-propertize-function` (tho I think I don't bother with this one
> currently).

> So, I guess in CC-mode, you could handle those by placing `syntax-table`
> properties from ... wherever you place them ;-)

Thanks, that's an idea - either putting a neutral s-t prop on the \ of
\*/, or something on the \n of \\n in a line comment.  I think the first
of these is a better idea than the second.

But on the other hand, it feels like a workaround for the lack of a
full-featured comment-end-can-be-escaped.

>         Stefan

--
Alan Mackenzie (Nuremberg, Germany).




Stefan Monnier
> But on the other hand, it feels like a workaround for the lack of a

Yes, that's the definition of `syntax-propertize-function` ;-)


        Stefan





Stefan Monnier
In reply to this post by Stefan Monnier
> As already said, this is a(n ugly) workaround.  syntax.c should handle
> comments in all their generality.  With a bit of consideration, the
> method to do this is clear:

In my world, it's quite normal for a specific language's lexical rules
not to line up 100% with syntax tables (whether for strings, comments,
younameit).  I don't see anything very special here.

A `syntax-propertize` rule for "\*/" should be very easy to implement
and fairly cheap since the regexp is simple and will almost never match.

So, yeah, you can add yet-another-hack on top of the other syntax.c
hacks if you want, but there's a good chance it will only ever be used
by CC-mode.  It will take a lot more code changes in syntax.c than
a quick tweak to your Elisp code to search for "\*/".

I do think it would be good to handle this without `syntax-table`
text-property hacks, but I think that should come with an overhaul of
syntax.c based on a major-mode provided DFA (or something like that) so
it can accommodate all the various oddball cases without even the need
to introduce the notion of escaping comment markers.
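
[Editorial sketch, not part of the original message.]  To make the DFA
idea concrete, here is a minimal table-driven automaton for C comments
only.  It is an assumption about what "something like that" could look
like, not a design proposed in the thread; it ignores string and character
literals, and all names are invented.  Note that no "escape" concept
appears: the continued line comment is just one more state.

```c
enum state { CODE, SLASH, BLOCK, BLOCK_STAR, LINE, LINE_BACKSLASH, N_STATES };
enum char_class { C_SLASH, C_STAR, C_BACKSLASH, C_NEWLINE, C_OTHER, N_CLASSES };

/* Rows are states; columns are character classes in enum order:
   slash, star, backslash, newline, other.  */
static const enum state transition[N_STATES][N_CLASSES] = {
  [CODE]           = { SLASH, CODE,       CODE,           CODE,  CODE  },
  [SLASH]          = { LINE,  BLOCK,      CODE,           CODE,  CODE  },
  [BLOCK]          = { BLOCK, BLOCK_STAR, BLOCK,          BLOCK, BLOCK },
  [BLOCK_STAR]     = { CODE,  BLOCK_STAR, BLOCK,          BLOCK, BLOCK },
  [LINE]           = { LINE,  LINE,       LINE_BACKSLASH, CODE,  LINE  },
  [LINE_BACKSLASH] = { LINE,  LINE,       LINE_BACKSLASH, LINE,  LINE  },
};

/* Map a character to its column in the transition table.  */
static enum char_class
classify (char c)
{
  switch (c)
    {
    case '/':  return C_SLASH;
    case '*':  return C_STAR;
    case '\\': return C_BACKSLASH;
    case '\n': return C_NEWLINE;
    default:   return C_OTHER;
    }
}

/* Run the automaton over TEXT and return the state it ends in.  */
enum state
scan_comments (const char *text)
{
  enum state s = CODE;
  for (; *text != '\0'; text++)
    s = transition[s][classify (*text)];
  return s;
}
```

With this table a backslash before star-slash leaves the scanner back in
CODE after the closer, while a backslash before a newline keeps it in LINE,
which is the C behaviour discussed earlier in the thread.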


        Stefan





Alan Mackenzie
Hello, Stefan.

On Thu, Sep 24, 2020 at 12:56:42 -0400, Stefan Monnier wrote:
> > As already said, this is a(n ugly) workaround.  syntax.c should handle
> > comments in all their generality.  With a bit of consideration, the
> > method to do this is clear:

> In my world, it's quite normal for a specific language's lexical rules
> not to line up 100% with syntax tables (whether for strings, comments,
> younameit).  I don't see anything very special here.

Normally when there's a mismatch, it's because a character is
syntactically ambiguous.  There's nothing syntax.c can do about this.

In the current situation, this isn't the case: syntax.c is unable to
handle a comment scenario where there is no ambiguity.

> A `syntax-propertize` rule for "\*/" should be very easy to implement
> and fairly cheap since the regexp is simple and will almost never match.

Well, the rule would actually be for escaped newlines, but this would be
quite expensive (compared with a syntax.c solution) since every comment
near a change region would need scanning at each change.

> So, yeah, you can add yet-another-hack on top of the other syntax.c
> hacks if you want, but there's a good chance it will only ever be used
> by CC-mode.  It will take a lot more code changes in syntax.c than
> a quick tweak to your Elisp code to search for "\*/".

I've hacked up a working, but as yet unsatisfactory, change to syntax.c.
It is surely better, where possible, to fix bugs at their point of
causation rather than by workarounds elsewhere.  As you note, CC Mode
modes will be the only known users at the moment.

Just as an aside, the project where I was working ~four years ago banned
a proprietary editor after a mammoth search for a bug caused by an
unintentional escaped NL on a line comment.  The banned editor didn't
fontify the continuation line in comment face.  I was able to
demonstrate to the project manager that Emacs fontified that comment
correctly.

> I do think it would be good to handle this without `syntax-table`
> text-property hacks, but I think that should come with an overhaul of
> syntax.c based on a major-mode provided DFA (or something like that) so
> it can accommodate all the various oddball cases without even the need
> to introduce the notion of escaping comment markers.

That sounds almost more like a rewrite than an overhaul.  You mean, I
think, that the syntax of language expressions would be defined using
something a bit like (but more powerful than) regular expressions.  And
with that, the need for syntactic analysis in Lisp would be much
reduced.

We would need to make sure that this wouldn't run more slowly than the
current syntax.c/Lisp combination.

>         Stefan

--
Alan Mackenzie (Nuremberg, Germany).




Michael Welsh Duggan
In reply to this post by Alan Mackenzie
Alan Mackenzie <[hidden email]> writes:

> Hello, Mattias.
>
> On Wed, Sep 23, 2020 at 11:01:59 +0200, Mattias Engdegård wrote:
>> Sorry if I misunderstood, but since when do backslashes escape */ in C?
>
> Since forever, but only in the CC Mode test suite.  :-(
>
> I just tried it out with gcc, and it seems that \*/ does indeed end a
> block comment.  But an escaped newline doesn't end a line comment,
> instead continuing it to the next line.  So I got confused.  Thanks for
> pointing out the mistake.
>
> It seems that as well as the existing variable
> comment-end-can-be-escaped, we need a new one, say
> line-comment-end-can-be-escaped, too.  In C and C++ modes, these would
> be nil and t respectively.

But where does it say that backslashes escape */ in C++?  The C++ 14
standard (and it hasn't changed through C++ 20) says:

    2.7 Comments [lex.comment]
   
    The characters /* start a comment, which terminates with the
    characters */. These comments do not nest.  The characters // start
    a comment, which terminates immediately before the next new-line
    character. If there is a form-feed or a vertical-tab character in
    such a comment, only white-space characters shall appear between it
    and the new-line that terminates the comment; no diagnostic is
    required. [ Note: The comment characters //, /*, and */ have no
    special meaning within a // comment and are treated just like other
    characters. Similarly, the comment characters // and /* have no
    special meaning within a /* comment.  — end note ]

--
Michael Welsh Duggan
([hidden email])




Alan Mackenzie
Hello, Michael.

On Thu, Sep 24, 2020 at 14:52:16 -0400, Michael Welsh Duggan wrote:
> Alan Mackenzie <[hidden email]> writes:

> > On Wed, Sep 23, 2020 at 11:01:59 +0200, Mattias Engdegård wrote:
> >> Sorry if I misunderstood, but since when do backslashes escape */ in C?

> > Since forever, but only in the CC Mode test suite.  :-(

> > I just tried it out with gcc, and it seems that \*/ does indeed end a
> > block comment.  But an escaped newline doesn't end a line comment,
> > instead continuing it to the next line.  So I got confused.  Thanks for
> > pointing out the mistake.

> > It seems that as well as the existing variable
> > comment-end-can-be-escaped, we need a new one, say
> > line-comment-end-can-be-escaped, too.  In C and C++ modes, these would
> > be nil and t respectively.

> But where does it say that backslashes escape */ in C++?

Nowhere.  :-(

There has been a test in the CC Mode test suite for many years which
assumed this (but was disabled for existing (X)Emacs versions, waiting
for a new Emacs version to be "fixed").

> The C++ 14 standard (and it hasn't changed through C++ 20) says:

>     2.7 Comments [lex.comment]
>
>     The characters /* start a comment, which terminates with the
>     characters */. These comments do not nest.  The characters // start
>     a comment, which terminates immediately before the next new-line
>     character.

For all the difference it makes, Emacs assumes the comment ends _after_
the NL.

>     If there is a form-feed or a vertical-tab character in such a
>     comment, only white-space characters shall appear between it and
>     the new-line that terminates the comment; no diagnostic is
>     required.

I didn't know that.  Emacs/CC Mode doesn't code up this subtlety.  It
probably isn't worth bothering about.

>     [ Note: The comment characters //, /*, and */ have no special
>     meaning within a // comment and are treated just like other
>     characters. Similarly, the comment characters // and /* have no
>     special meaning within a /* comment.  — end note ]

Additionally, an escaped newline continues a comment onto the next line.
This happens, notionally, at a very early stage of compilation where a
backslash followed by NL anywhere gets replaced by a space.  I think that
even two backslashes followed by NL would get replaced by backslash,
space.

> --
> Michael Welsh Duggan
> ([hidden email])

--
Alan Mackenzie (Nuremberg, Germany).




Michael Welsh Duggan
Alan Mackenzie <[hidden email]> writes:

> Hello, Michael.
>
> On Thu, Sep 24, 2020 at 14:52:16 -0400, Michael Welsh Duggan wrote:
>> Alan Mackenzie <[hidden email]> writes:
>
>> > On Wed, Sep 23, 2020 at 11:01:59 +0200, Mattias Engdegård wrote:
>> >> Sorry if I misunderstood, but since when do backslashes escape */ in C?
>
>> > Since forever, but only in the CC Mode test suite.  :-(
>
>> > I just tried it out with gcc, and it seems that \*/ does indeed end a
>> > block comment.  But an escaped newline doesn't end a line comment,
>> > instead continuing it to the next line.  So I got confused.  Thanks for
>> > pointing out the mistake.
>
>> > It seems that as well as the existing variable
>> > comment-end-can-be-escaped, we need a new one, say
>> > line-comment-end-can-be-escaped, too.  In C and C++ modes, these would
>> > be nil and t respectively.
>
>> But where does it say that backslashes escape */ in C++?
>
> Nowhere.  :-(
>
> There has been a test in the CC Mode test suite for many years which
> assumed this (but was disabled for existing (X)Emacs versions, waiting
> for a new Emacs version to be "fixed").
>
>> The C++ 14 standard (and it hasn't changed through C++ 20) says:
>
>>     2.7 Comments [lex.comment]
>    
>>     The characters /* start a comment, which terminates with the
>>     characters */. These comments do not nest.  The characters // start
>>     a comment, which terminates immediately before the next new-line
>>     character.
>
> For all the difference it makes, Emacs assumes the comment ends _after_
> the NL.
>
>>     If there is a form-feed or a vertical-tab character in such a
>>     comment, only white-space characters shall appear between it and
>>     the new-line that terminates the comment; no diagnostic is
>>     required.
>
> I didn't know that.  Emacs/CC Mode doesn't code up this subtlety.  It
> probably isn't worth bothering about.
>
>>     [ Note: The comment characters //, /*, and */ have no special
>>     meaning within a // comment and are treated just like other
>>     characters. Similarly, the comment characters // and /* have no
>>     special meaning within a /* comment.  — end note ]
>
> Additionally, an escaped newline continues a comment onto the next line.
> This happens, notionally, at a very early stage of compilation where a
> backslash followed by NL anywhere get replaced by a space.  I think that
> even two backslashes followed by NL would get replaced by backslash,
> space.

Almost.  A backslash followed by a newline is elided completely, joining
the lines.  (Not replaced by a space.)  Otherwise, I concur.
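
[Editorial sketch, not part of the original message.]  The line-splicing
step described here can be written as a small filter.  This is a model of
the rule only, with a made-up function name; it is not taken from any
compiler.

```c
#include <stddef.h>

/* Copy SRC to DST, deleting every backslash-newline pair so that the
   two physical lines are joined.  Nothing is emitted in place of the
   pair.  DST must have room for the length of SRC plus one byte.
   Returns the number of characters written, not counting the NUL.  */
size_t
splice_lines (const char *src, char *dst)
{
  size_t n = 0;

  for (size_t i = 0; src[i] != '\0'; i++)
    {
      if (src[i] == '\\' && src[i + 1] == '\n')
        {
          i++;          /* skip the newline too */
          continue;
        }
      dst[n++] = src[i];
    }
  dst[n] = '\0';
  return n;
}
```

Applied to a line comment ending in a backslash, the following line becomes
part of the same logical line and hence of the comment.  Two backslashes
before the newline leave a single backslash, with no space added.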

--
Michael Welsh Duggan
([hidden email])