# bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped.

## bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped.

Hello, Emacs and Stefan.

In the following C comment:

    1   /*
    2     \*/
    3   /**/

, with point at BOL 1, do M-: (forward-comment 1).  This leaves point wrongly at EOL 2.  It should end up at EOL 3, since the apparent comment ender on L2 is actually escaped.

The following patch fixes this.  Are there any objections to me installing it?

    diff --git a/src/syntax.c b/src/syntax.c
    index e6af8a377b..066972e6d8 100644
    --- a/src/syntax.c
    +++ b/src/syntax.c
    @@ -2354,6 +2354,13 @@ forw_comment (ptrdiff_t from, ptrdiff_t from_byte, ptrdiff_t stop,
             /* We have encountered a nested comment of the same style
                as the comment sequence which began this comment section.  */
             nesting++;
    +      if (comment_end_can_be_escaped
    +          && (code == Sescape || code == Scharquote))
    +        {
    +          inc_both (&from, &from_byte);
    +          UPDATE_SYNTAX_TABLE_FORWARD (from);
    +          if (from == stop) continue; /* Failure */
    +        }
           inc_both (&from, &from_byte);
           UPDATE_SYNTAX_TABLE_FORWARD (from);

--
Alan Mackenzie (Nuremberg, Germany).
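As a rough illustration of what the patch is aiming for, here is a toy scanner in C. It is not the real `forw_comment`: the function name `toy_skip_block_comment`, its signature, and the plain `'\\'` test (standing in for the `Sescape`/`Scharquote` syntax classes) are all assumptions made for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model of a block-comment scanner; hypothetical, not Emacs code.
   `pos` must index the character just after the opening delimiter.
   Returns the index just past the closing delimiter, or the buffer
   length if the comment is unterminated.  */
size_t toy_skip_block_comment(const char *buf, size_t pos, int end_can_be_escaped)
{
    size_t len = strlen(buf);
    assert(pos <= len);
    while (pos < len) {
        if (end_can_be_escaped && buf[pos] == '\\') {
            /* Skip the backslash and the character it escapes.  */
            pos += (pos + 1 < len) ? 2 : 1;
            continue;
        }
        /* A star followed by a slash closes the comment.  */
        if (buf[pos] == '*' && pos + 1 < len && buf[pos + 1] == '/')
            return pos + 2;
        pos++;
    }
    return len;
}
```

For the three-line buffer from the report, written as the C string `"/*\n  \\*/\n/**/"` and scanned from index 2, this toy returns 8 (end of line 2) with the flag off and 13 (end of line 3) with the flag on.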

## bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped.

Hi Alan,

> Hello, Emacs and Stefan.
>
> In the following C comment:
>
> 1   /*
> 2     \*/
> 3   /**/
>
> , with point at BOL 1, do M-: (forward-comment 1).  This leaves point
> wrongly at EOL 2.

That seems to be correct w.r.t the highlighting I see, OTOH.
IOW the bug seems to affect both forward-comment and parse-partial-sexp, right?

> It should end up at EOL 3, since the apparent comment
> ender on L2 is actually escaped.
>
> The following patch fixes this.

Does it fix it for `parse-partial-sexp` as well?

> Are there any objections to me installing it?

None from me, no.

        Stefan

## bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped.

Hello, Stefan.

On Tue, Sep 22, 2020 at 10:09:43 -0400, Stefan Monnier wrote:
> Hi Alan,

> > Hello, Emacs and Stefan.

> > In the following C comment:

> > 1   /*
> > 2     \*/
> > 3   /**/

> > , with point at BOL 1, do M-: (forward-comment 1).  This leaves point
> > wrongly at EOL 2.

> That seems to be correct w.r.t the highlighting I see, OTOH.
> IOW the bug seems to affect both forward-comment and parse-partial-sexp, right?

Yes.

> > It should end up at EOL 3, since the apparent comment
> > ender on L2 is actually escaped.

> > The following patch fixes this.

> Does it fix it for `parse-partial-sexp` as well?

It does, yes.  The patch is in forw_comment, which is called by Fforward_comment, scan_lists, and scan_sexps_forward.

> > Are there any objections to me installing it?

> None from me, no.

Thanks!

>         Stefan

--
Alan Mackenzie (Nuremberg, Germany).

## bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped.

In reply to this post by Alan Mackenzie

Sorry if I misunderstood, but since when do backslashes escape */ in C?

## bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped.

Hello, Mattias.

On Wed, Sep 23, 2020 at 11:01:59 +0200, Mattias Engdegård wrote:
> Sorry if I misunderstood, but since when do backslashes escape */ in C?

Since forever, but only in the CC Mode test suite.  :-(

I just tried it out with gcc, and it seems that \*/ does indeed end a block comment.  But an escaped newline doesn't end a line comment, instead continuing it to the next line.  So I got confused.  Thanks for pointing out the mistake.

It seems that as well as the existing variable comment-end-can-be-escaped, we need a new one, say line-comment-end-can-be-escaped, too.  In C and C++ modes, these would be nil and t respectively.

--
Alan Mackenzie (Nuremberg, Germany).
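The two C rules described in this message can be sketched as a pair of small helpers. This is only an illustration of the behaviour reported for gcc, not code from Emacs or any compiler; the names `block_comment_end` and `line_comment_end` are made up for the example, and each takes an index `pos` somewhere inside the comment body.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the index just past the first
   star-slash pair at or after `pos`.  A preceding backslash makes
   no difference, matching what gcc was observed to do.  */
size_t block_comment_end(const char *buf, size_t pos)
{
    const char *ender;
    assert(pos <= strlen(buf));
    ender = strstr(buf + pos, "*/");
    return ender ? (size_t)(ender - buf) + 2 : strlen(buf);
}

/* Hypothetical helper: returns the index of the newline that ends a
   line comment.  A newline directly preceded by a backslash does not
   end the comment; the comment continues on the next line.  */
size_t line_comment_end(const char *buf, size_t pos)
{
    size_t len = strlen(buf);
    assert(pos <= len);
    for (; pos < len; pos++) {
        if (buf[pos] == '\n' && !(pos > 0 && buf[pos - 1] == '\\'))
            return pos;
    }
    return len;
}
```

So in the string `"/* \\*/ x */"` the block comment ends at index 6, right after the backslashed ender, while in `"// a \\\nb\nc"` the line comment runs through the first newline and ends at index 8.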

## bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped.

> It seems that as well as the existing variable
> comment-end-can-be-escaped, we need a new one, say
> line-comment-end-can-be-escaped, too.

syntax.c doesn't like to think of it as "line-comment" but rather as comment stay a, b, c, or nested and non-nested.

> In C and C++ modes, these would
> be nil and t respectively.

In sm-c-mode, I'd handle those corner cases in `syntax-propertize-function` (tho I think I don't bother with this one currently).

So, I guess in CC-mode, you could handle those by placing `syntax-table` properties from ... wherever you place them ;-)

        Stefan

## bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped.

Hello, Stefan.

On Wed, Sep 23, 2020 at 14:44:54 -0400, Stefan Monnier wrote:
> > It seems that as well as the existing variable
> > comment-end-can-be-escaped, we need a new one, say
> > line-comment-end-can-be-escaped, too.

> syntax.c doesn't like to think of it as "line-comment" but rather as
> comment stay [ ?? style ?? ] a, b, c, or nested and non-nested.

Hmm.  It could be quite troublesome to decide on an interface for major modes specifying "comment style b can have its ender escaped, but comment styles a and c cannot".

> > In C and C++ modes, these would
> > be nil and t respectively.

> In sm-c-mode, I'd handle those corner cases in
> `syntax-propertize-function` (tho I think I don't bother with this one
> currently).

> So, I guess in CC-mode, you could handle those by placing `syntax-table`
> properties from ... wherever you place them ;-)

Thanks, that's an idea - either putting a neutral s-t prop on the \ of \*/, or something on the \n of \\n in a line comment.  I think the first of these is a better idea than the second.

But on the other hand, it feels like a workaround for the lack of a full-featured comment-end-can-be-escaped.

>         Stefan

--
Alan Mackenzie (Nuremberg, Germany).

## bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped.

> But on the other hand, it feels like a workaround for the lack of a

Yes, that's the definition of `syntax-propertize-function` ;-)

        Stefan

## bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped.

In reply to this post by Stefan Monnier

> As already said, this is a(n ugly) workaround.  syntax.c should handle
> comments in all their generality.  With a bit of consideration, the
> method to do this is clear:

In my world, it's quite normal for a specific language's lexical rules not to line up 100% with syntax tables (whether for strings, comments, younameit).  I don't see anything very special here.

A `syntax-propertize` rule for "\*/" should be very easy to implement and fairly cheap since the regexp is simple and will almost never match.

So, yeah, you can add yet-another-hack on top of the other syntax.c hacks if you want, but there's a good chance it will only ever be used by CC-mode.  It will take a lot more code changes in syntax.c than a quick tweak to your Elisp code to search for "\*/".

I do think it would be good to handle this without `syntax-table` text-property hacks, but I think that should come with an overhaul of syntax.c based on a major-mode provided DFA (or something like that) so it can accommodate all the various oddball cases without even the need to introduce the notion of escaping comment markers.

        Stefan

## bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped.

Hello, Stefan.

On Thu, Sep 24, 2020 at 12:56:42 -0400, Stefan Monnier wrote:
> > As already said, this is a(n ugly) workaround.  syntax.c should handle
> > comments in all their generality.  With a bit of consideration, the
> > method to do this is clear:

> In my world, it's quite normal for a specific language's lexical rules
> not to line up 100% with syntax tables (whether for strings, comments,
> younameit).  I don't see anything very special here.

Normally when there's a mismatch, it's because a character is syntactically ambiguous.  There's nothing syntax.c can do about this.  In the current situation, this isn't the case: syntax.c is unable to handle a comment scenario where there is no ambiguity.

> A `syntax-propertize` rule for "\*/" should be very easy to implement
> and fairly cheap since the regexp is simple and will almost never match.

Well, the rule would actually be for escaped newlines, but this would be quite expensive (compared with a syntax.c solution) since every comment near a change region would need scanning at each change.

> So, yeah, you can add yet-another-hack on top of the other syntax.c
> hacks if you want, but there's a good chance it will only ever be used
> by CC-mode.  It will take a lot more code changes in syntax.c than
> a quick tweak to your Elisp code to search for "\*/".

I've hacked up a working, but as yet unsatisfactory, change to syntax.c.  It is surely better, where possible, to fix bugs at their point of causation rather than by workarounds elsewhere.  As you note, CC Mode modes will be the only known users at the moment.

Just as an aside, the project where I was working ~four years ago banned a proprietary editor after a mammoth search for a bug caused by an unintentional escaped NL on a line comment.  The banned editor didn't fontify the continuation line in comment face.  I was able to demonstrate to the project manager that Emacs fontified that comment correctly.

> I do think it would be good to handle this without `syntax-table`
> text-property hacks, but I think that should come with an overhaul of
> syntax.c based on a major-mode provided DFA (or something like that) so
> it can accommodate all the various oddball cases without even the need
> to introduce the notion of escaping comment markers.

That sounds almost more like a rewrite than an overhaul.  You mean, I think, that the syntax of language expressions would be defined using something a bit like (but more powerful than) regular expressions.  And with that, the need for syntactic analysis in Lisp would be much reduced.  We would need to make sure that this wouldn't run more slowly than the current syntax.c/Lisp combination.

>         Stefan

--
Alan Mackenzie (Nuremberg, Germany).

## bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped.

In reply to this post by Alan Mackenzie

Alan Mackenzie <[hidden email]> writes:

> Hello, Mattias.
>
> On Wed, Sep 23, 2020 at 11:01:59 +0200, Mattias Engdegård wrote:
>> Sorry if I misunderstood, but since when do backslashes escape */ in C?
>
> Since forever, but only in the CC Mode test suite.  :-(
>
> I just tried it out with gcc, and it seems that \*/ does indeed end a
> block comment.  But an escaped newline doesn't end a line comment,
> instead continuing it to the next line.  So I got confused.  Thanks for
> pointing out the mistake.
>
> It seems that as well as the existing variable
> comment-end-can-be-escaped, we need a new one, say
> line-comment-end-can-be-escaped, too.  In C and C++ modes, these would
> be nil and t respectively.

But where does it say that backslashes escape */ in C++?  The C++ 14 standard (and it hasn't changed through C++ 20) says:

    2.7 Comments [lex.comment]

    The characters /* start a comment, which terminates with the
    characters */. These comments do not nest.  The characters // start
    a comment, which terminates immediately before the next new-line
    character. If there is a form-feed or a vertical-tab character in
    such a comment, only white-space characters shall appear between it
    and the new-line that terminates the comment; no diagnostic is
    required. [ Note: The comment characters //, /*, and */ have no
    special meaning within a // comment and are treated just like other
    characters. Similarly, the comment characters // and /* have no
    special meaning within a /* comment.  — end note ]

--
Michael Welsh Duggan
([hidden email])

## bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped.

Hello, Michael.

On Thu, Sep 24, 2020 at 14:52:16 -0400, Michael Welsh Duggan wrote:
> Alan Mackenzie <[hidden email]> writes:

> > On Wed, Sep 23, 2020 at 11:01:59 +0200, Mattias Engdegård wrote:
> >> Sorry if I misunderstood, but since when do backslashes escape */ in C?

> > Since forever, but only in the CC Mode test suite.  :-(

> > I just tried it out with gcc, and it seems that \*/ does indeed end a
> > block comment.  But an escaped newline doesn't end a line comment,
> > instead continuing it to the next line.  So I got confused.  Thanks for
> > pointing out the mistake.

> > It seems that as well as the existing variable
> > comment-end-can-be-escaped, we need a new one, say
> > line-comment-end-can-be-escaped, too.  In C and C++ modes, these would
> > be nil and t respectively.

> But where does it say that backslashes escape */ in C++?

Nowhere.  :-(

There has been a test in the CC Mode test suite for many years which assumed this (but was disabled for existing (X)Emacs versions, waiting for a new Emacs version to be "fixed").

> The C++ 14 standard (and it hasn't changed through C++ 20) says:

>     2.7 Comments [lex.comment]

>     The characters /* start a comment, which terminates with the
>     characters */. These comments do not nest.  The characters // start
>     a comment, which terminates immediately before the next new-line
>     character.

For all the difference it makes, Emacs assumes the comment ends _after_ the NL.

>     If there is a form-feed or a vertical-tab character in such a
>     comment, only white-space characters shall appear between it and
>     the new-line that terminates the comment; no diagnostic is
>     required.

I didn't know that.  Emacs/CC Mode doesn't code up this subtlety.  It probably isn't worth bothering about.

>     [ Note: The comment characters //, /*, and */ have no special
>     meaning within a // comment and are treated just like other
>     characters. Similarly, the comment characters // and /* have no
>     special meaning within a /* comment.  — end note ]

Additionally, an escaped newline continues a comment onto the next line.  This happens, notionally, at a very early stage of compilation where a backslash followed by NL anywhere gets replaced by a space.  I think that even two backslashes followed by NL would get replaced by backslash, space.

> --
> Michael Welsh Duggan
> ([hidden email])

--
Alan Mackenzie (Nuremberg, Germany).

## bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped.

Alan Mackenzie <[hidden email]> writes:

> Hello, Michael.
>
> On Thu, Sep 24, 2020 at 14:52:16 -0400, Michael Welsh Duggan wrote:
>> Alan Mackenzie <[hidden email]> writes:
>
>> > On Wed, Sep 23, 2020 at 11:01:59 +0200, Mattias Engdegård wrote:
>> >> Sorry if I misunderstood, but since when do backslashes escape */ in C?
>
>> > Since forever, but only in the CC Mode test suite.  :-(
>
>> > I just tried it out with gcc, and it seems that \*/ does indeed end a
>> > block comment.  But an escaped newline doesn't end a line comment,
>> > instead continuing it to the next line.  So I got confused.  Thanks for
>> > pointing out the mistake.
>
>> > It seems that as well as the existing variable
>> > comment-end-can-be-escaped, we need a new one, say
>> > line-comment-end-can-be-escaped, too.  In C and C++ modes, these would
>> > be nil and t respectively.
>
>> But where does it say that backslashes escape */ in C++?
>
> Nowhere.  :-(
>
> There has been a test in the CC Mode test suite for many years which
> assumed this (but was disabled for existing (X)Emacs versions, waiting
> for a new Emacs version to be "fixed").
>
>> The C++ 14 standard (and it hasn't changed through C++ 20) says:
>
>>     2.7 Comments [lex.comment]
>
>>     The characters /* start a comment, which terminates with the
>>     characters */. These comments do not nest.  The characters // start
>>     a comment, which terminates immediately before the next new-line
>>     character.
>
> For all the difference it makes, Emacs assumes the comment ends _after_
> the NL.
>
>>     If there is a form-feed or a vertical-tab character in such a
>>     comment, only white-space characters shall appear between it and
>>     the new-line that terminates the comment; no diagnostic is
>>     required.
>
> I didn't know that.  Emacs/CC Mode doesn't code up this subtlety.  It
> probably isn't worth bothering about.
>
>>     [ Note: The comment characters //, /*, and */ have no special
>>     meaning within a // comment and are treated just like other
>>     characters. Similarly, the comment characters // and /* have no
>>     special meaning within a /* comment.  — end note ]
>
> Additionally, an escaped newline continues a comment onto the next line.
> This happens, notionally, at a very early stage of compilation where a
> backslash followed by NL anywhere gets replaced by a space.  I think that
> even two backslashes followed by NL would get replaced by backslash,
> space.

Almost.  A backslash followed by a newline is elided completely, joining the lines.  (Not replaced by a space.)  Otherwise, I concur.

--
Michael Welsh Duggan
([hidden email])
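The line-splicing step described here can be sketched in a few lines of C. This is a simplified model of that early translation phase only, assuming a NUL-terminated input and a caller-supplied output buffer; `splice_lines` is a name invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies src to dst, dropping every backslash
   that is immediately followed by a newline, together with that
   newline.  Nothing is inserted in their place.  dst must have room
   for strlen(src) + 1 bytes.  Returns the length of the result.  */
size_t splice_lines(const char *src, char *dst)
{
    size_t out = 0;
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; src[i] != '\0'; i++) {
        if (src[i] == '\\' && src[i + 1] == '\n') {
            i++;        /* also skip the newline */
            continue;
        }
        dst[out++] = src[i];
    }
    dst[out] = '\0';
    return out;
}
```

With this model, `"// a \\\nb\n"` becomes `"// a b\n"`, so `b` is still inside the line comment, and two backslashes followed by a newline leave a single backslash with the lines joined.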

## bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped.

>> So, yeah, you can add yet-another-hack on top of the other syntax.c
>> hacks if you want, but there's a good chance it will only ever be used
>> by CC-mode.  It will take a lot more code changes in syntax.c than
>> a quick tweak to your Elisp code to search for "\*/".
[...]
> OK, here's the patch.

I think the patch agrees with my assessment above (even though it's still missing an etc/NEWS entry, adjustment to the docstring of modify-syntax-entry and to the .texi manual).

I really can't understand why you resist so much the use of a `syntax-table` property on those rare \\\n sequences.

        Stefan

PS: Also, I just noticed that `gcc -Wall` warns about the use of such multiline comments, so it doesn't seem to be a very popular feature.

PPS: For reference, I just tried to add support for it in sm-c-mode and this is the resulting code:

    @@ -312,7 +315,15 @@ E.g. a #define nested within 2 #ifs will be turned into \"#  define\"."
                                    'syntax-table (string-to-syntax "|"))
                 (put-text-property (match-beginning 2) (match-end 2)
                                    'syntax-table (string-to-syntax "|")))
    -          (sm-c--cpp-syntax-propertize end)))))
    +          (sm-c--cpp-syntax-propertize end))))
    +    ("\\\\\\(\n\\)"
    +     (1 (let ((ppss (save-excursion (syntax-ppss (match-beginning 0)))))
    +          (when (and (nth 4 ppss)        ;Within a comment
    +                     (null (nth 7 ppss)) ;Within a // comment
    +                     (save-excursion     ;The \ is not itself escaped
    +                       (goto-char (match-beginning 0))
    +                       (zerop (mod (skip-chars-backward "\\\\") 2))))
    +            (string-to-syntax "."))))))
        (point) end))

     (defun sm-c-syntactic-face-function (ppss)
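The "not itself escaped" condition in the snippet above is a parity check on the run of backslashes before the one that was matched. A minimal C rendering of just that check might look like the following; `backslash_is_unescaped` is a hypothetical name, and the sketch mirrors the Elisp condition rather than the C standard (earlier in the thread it was noted that a compiler splices any backslash-newline pair, whatever precedes it).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: buf[pos] must be a backslash.  Returns 1 when
   it is preceded by an even number of further backslashes (so it is
   not itself escaped under this rule), and 0 otherwise.  */
int backslash_is_unescaped(const char *buf, size_t pos)
{
    size_t run = 0;
    assert(buf[pos] == '\\');
    /* Count the consecutive backslashes directly before pos.  */
    while (run < pos && buf[pos - 1 - run] == '\\')
        run++;
    return run % 2 == 0;
}
```

Under this rule the backslash in `a\` followed by a newline counts as an escape for the newline, whereas in `a\\` followed by a newline the second backslash is treated as escaped by the first.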

## bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped.

> Because syntax-table text properties are already used for so many
> different things in CC Mode (I think the count is five in C++ Mode).
> Adding another one would mean having to scan for this rare construct at
> every buffer change, and this would slow things down, possibly a lot.

The fact that you already have 5 other such uses implies that the slow down from this one cannot possibly be larger than 20% (since the scan for it is very simple, I doubt any of the other 5 is simpler).

Most major modes have such things and we live just fine with them.  This is a non-issue.

        Stefan

## bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped.

In reply to this post by Stefan Monnier

> Date: Sun, 22 Nov 2020 13:12:31 +0000
> From: Alan Mackenzie <[hidden email]>
> Cc: [hidden email],
>  Mattias Engdegård <[hidden email]>, [hidden email]
>
> +@samp{e} means that when @var{c}, a comment ender or first character
> +of a two character ender, is directly proceded by one or more escape
                                         ^^^^^^^^
"preceded", I guess?

## bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped.

In reply to this post by Stefan Monnier

Hello, Stefan.

On Sun, Nov 22, 2020 at 10:20:32 -0500, Stefan Monnier wrote:
> > Because syntax-table text properties are already used for so many
> > different things in CC Mode (I think the count is five in C++ Mode).
> > Adding another one would mean having to scan for this rare construct at
> > every buffer change, and this would slow things down, possibly a lot.

> The fact that you already have 5 other such uses implies that the slow
> down from this one cannot possibly be larger than 20% (since the scan
> for it is very simple, I doubt any of the other 5 is simpler).

The fact remains that an implementation at the C level is objectively better than one at the Lisp level.

> Most major modes have such things and we live just fine with them.
> This is a non-issue.

Really?  Are there any other programming language modes whose comments syntax.c cannot handle without syntax-table text properties?

>         Stefan

--
Alan Mackenzie (Nuremberg, Germany).

## bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped.

Hello, Dmitry.

On Sun, Nov 22, 2020 at 19:46:24 +0200, Dmitry Gutov wrote:
> On 22.11.2020 19:08, Alan Mackenzie wrote:

> > Really?  Are there any other programming language modes whose comments
> > syntax.c cannot handle without syntax-table text properties?

> Ruby is just one example.

Thanks.  I've just searched the web for that.  Ruby has block comment delimiters =begin and =end.  It would be possible to handle these in syntax.c, but somewhat clumsy and awkward.

Presumably ruby-mode handles these with syntax-table text properties applied to the = sign and the terminating d, which is a little clumsy, but not too bad, at the Lisp level.

--
Alan Mackenzie (Nuremberg, Germany).