bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped.

Alan Mackenzie
Hello, Emacs and Stefan.

In the following C comment:

1   /*
2     \*/
3   /**/

, with point at BOL 1, do M-: (forward-comment 1).  This leaves point
wrongly at EOL 2.  It should end up at EOL 3, since the apparent comment
ender on L2 is actually escaped.

The following patch fixes this.  Are there any objections to me
installing it?


diff --git a/src/syntax.c b/src/syntax.c
index e6af8a377b..066972e6d8 100644
--- a/src/syntax.c
+++ b/src/syntax.c
@@ -2354,6 +2354,13 @@ forw_comment (ptrdiff_t from, ptrdiff_t from_byte, ptrdiff_t stop,
  /* We have encountered a nested comment of the same style
    as the comment sequence which began this comment section.  */
  nesting++;
+      if (comment_end_can_be_escaped
+          && (code == Sescape || code == Scharquote))
+        {
+          inc_both (&from, &from_byte);
+          UPDATE_SYNTAX_TABLE_FORWARD (from);
+          if (from == stop) continue; /* Failure */
+        }
       inc_both (&from, &from_byte);
       UPDATE_SYNTAX_TABLE_FORWARD (from);
 

--
Alan Mackenzie (Nuremberg, Germany).



Stefan Monnier
Hi Alan,

> Hello, Emacs and Stefan.
>
> In the following C comment:
>
> 1   /*
> 2     \*/
> 3   /**/
>
> , with point at BOL 1, do M-: (forward-comment 1).  This leaves point
> wrongly at EOL 2.

That seems to be correct w.r.t the highlighting I see, OTOH.
IOW the bug seems to affect both forward-comment and parse-partial-sexp, right?

> It should end up at EOL 3, since the apparent comment
> ender on L2 is actually escaped.
>
> The following patch fixes this.

Does it fix it for `parse-partial-sexp` as well?

> Are there any objections to me installing it?

None from me, no.


        Stefan




Alan Mackenzie
Hello, Stefan.

On Tue, Sep 22, 2020 at 10:09:43 -0400, Stefan Monnier wrote:
> Hi Alan,

> > Hello, Emacs and Stefan.

> > In the following C comment:

> > 1   /*
> > 2     \*/
> > 3   /**/

> > , with point at BOL 1, do M-: (forward-comment 1).  This leaves point
> > wrongly at EOL 2.

> That seems to be correct w.r.t the highlighting I see, OTOH.
> IOW the bug seems to affect both forward-comment and parse-partial-sexp, right?

Yes.

> > It should end up at EOL 3, since the apparent comment
> > ender on L2 is actually escaped.

> > The following patch fixes this.

> Does it fix it for `parse-partial-sexp` as well?

It does, yes.  The patch is in forw_comment, which is called by
Fforward_comment, scan_lists, and scan_sexps_forward.

> > Are there any objections to me installing it?

> None from me, no.

Thanks!

>         Stefan

--
Alan Mackenzie (Nuremberg, Germany).



Mattias Engdegård
In reply to this post by Alan Mackenzie
Sorry if I misunderstood, but since when do backslashes escape */ in C?




Alan Mackenzie
Hello, Mattias.

On Wed, Sep 23, 2020 at 11:01:59 +0200, Mattias Engdegård wrote:
> Sorry if I misunderstood, but since when do backslashes escape */ in C?

Since forever, but only in the CC Mode test suite.  :-(

I just tried it out with gcc, and it seems that \*/ does indeed end a
block comment.  But an escaped newline doesn't end a line comment,
instead continuing it to the next line.  So I got confused.  Thanks for
pointing out the mistake.
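
The gcc behaviour described above can be reproduced with a minimal sketch like the one below. The function names are hypothetical and exist only for illustration. gcc may warn about the multi-line comment (-Wcomment), but the code compiles.

```c
/* Illustrative only: both function names are made up for this sketch.  */

/* A backslash before the block-comment ender does not escape it, so the
   comment closes on the same line and the assignment after it is live.  */
int block_comment_case (void)
{
  int y = 1;
  /* \*/ y = 2; /**/
  return y;
}

/* A backslash directly before the newline splices the next line onto the
   line comment, so the assignment below is commented out.  */
int line_comment_case (void)
{
  int x = 1;
  // this line comment ends with a backslash \
  x = 2;
  return x;
}
```

The first function returns the value assigned after the comment. The second returns the initial value, because the assignment is swallowed by the continued line comment.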

It seems that as well as the existing variable
comment-end-can-be-escaped, we need a new one, say
line-comment-end-can-be-escaped, too.  In C and C++ modes, these would
be nil and t respectively.

--
Alan Mackenzie (Nuremberg, Germany).



Stefan Monnier
> It seems that as well as the existing variable
> comment-end-can-be-escaped, we need a new one, say
> line-comment-end-can-be-escaped, too.

syntax.c doesn't like to think of it as "line-comment" but rather as
comment style a, b, c, or nested and non-nested.

> In C and C++ modes, these would
> be nil and t respectively.

In sm-c-mode, I'd handle those corner cases in
`syntax-propertize-function` (tho I think I don't bother with this one
currently).

So, I guess in CC-mode, you could handle those by placing `syntax-table`
properties from ... wherever you place them ;-)


        Stefan




Alan Mackenzie
Hello, Stefan.

On Wed, Sep 23, 2020 at 14:44:54 -0400, Stefan Monnier wrote:
> > It seems that as well as the existing variable
> > comment-end-can-be-escaped, we need a new one, say
> > line-comment-end-can-be-escaped, too.

> syntax.c doesn't like to think of it as "line-comment" but rather as
> comment style a, b, c, or nested and non-nested.

Hmm.  It could be quite troublesome to decide on an interface for major
modes specifying "comment style b can have its ender escaped, but
comment styles a and c cannot".

> > In C and C++ modes, these would
> > be nil and t respectively.

> In sm-c-mode, I'd handle those corner cases in
> `syntax-propertize-function` (tho I think I don't bother with this one
> currently).

> So, I guess in CC-mode, you could handle those by placing `syntax-table`
> properties from ... wherever you place them ;-)

Thanks, that's an idea - either putting a neutral s-t prop on the \ of
\*/, or something on the \n of \\n in a line comment.  I think the first
of these is a better idea than the second.

But on the other hand, it feels like a workaround for the lack of a
full-featured comment-end-can-be-escaped.

>         Stefan

--
Alan Mackenzie (Nuremberg, Germany).



Stefan Monnier
> But on the other hand, it feels like a workaround for the lack of a

Yes, that's the definition of `syntax-propertize-function` ;-)


        Stefan




Stefan Monnier
In reply to this post by Stefan Monnier
> As already said, this is a(n ugly) workaround.  syntax.c should handle
> comments in all their generality.  With a bit of consideration, the
> method to do this is clear:

In my world, it's quite normal for a specific language's lexical rules
not to line up 100% with syntax tables (whether for strings, comments,
younameit).  I don't see anything very special here.

A `syntax-propertize` rule for "\*/" should be very easy to implement
and fairly cheap since the regexp is simple and will almost never match.

So, yeah, you can add yet-another-hack on top of the other syntax.c
hacks if you want, but there's a good chance it will only ever be used
by CC-mode.  It will take a lot more code changes in syntax.c than
a quick tweak to your Elisp code to search for "\*/".

I do think it would be good to handle this without `syntax-table`
text-property hacks, but I think that should come with an overhaul of
syntax.c based on a major-mode provided DFA (or something like that) so
it can accommodate all the various oddball cases without even the need
to introduce the notion of escaping comment markers.
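
One possible reading of the DFA idea, as a rough sketch only: the lexical rules for C comments can be written as a small state machine in which the escaped newline is just another transition, with no separate notion of "escaping" a comment ender. Everything below (the state names, `count_comments`) is hypothetical and unrelated to the actual syntax.c code. String and character literals are deliberately ignored to keep it short.

```c
/* Hypothetical sketch: count the comments in SRC with a hand-written DFA.
   String and character literals are not handled.  */
enum state { CODE, SLASH, LINE, LINE_BSLASH, BLOCK, BLOCK_STAR };

int count_comments (const char *src)
{
  enum state st = CODE;
  int count = 0;

  for (; *src; src++)
    {
      char c = *src;
      switch (st)
        {
        case CODE:
          st = (c == '/') ? SLASH : CODE;
          break;
        case SLASH:
          if (c == '/')
            { st = LINE; count++; }
          else if (c == '*')
            { st = BLOCK; count++; }
          else
            st = CODE;
          break;
        case LINE:
          st = (c == '\\') ? LINE_BSLASH : (c == '\n') ? CODE : LINE;
          break;
        case LINE_BSLASH:
          /* A newline seen here is escaped, so the line comment continues.  */
          st = (c == '\\') ? LINE_BSLASH : LINE;
          break;
        case BLOCK:
          st = (c == '*') ? BLOCK_STAR : BLOCK;
          break;
        case BLOCK_STAR:
          /* A backslash earlier in the block comment has no effect here.  */
          st = (c == '/') ? CODE : (c == '*') ? BLOCK_STAR : BLOCK;
          break;
        }
    }
  return count;
}
```

With this machine, a second `//` on the continuation line of an escaped newline is counted as part of the first comment, while a backslash before a block-comment ender changes nothing.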


        Stefan




Alan Mackenzie
Hello, Stefan.

On Thu, Sep 24, 2020 at 12:56:42 -0400, Stefan Monnier wrote:
> > As already said, this is a(n ugly) workaround.  syntax.c should handle
> > comments in all their generality.  With a bit of consideration, the
> > method to do this is clear:

> In my world, it's quite normal for a specific language's lexical rules
> not to line up 100% with syntax tables (whether for strings, comments,
> younameit).  I don't see anything very special here.

Normally when there's a mismatch, it's because a character is
syntactically ambiguous.  There's nothing syntax.c can do about this.

In the current situation, this isn't the case: syntax.c is unable to
handle a comment scenario where there is no ambiguity.

> A `syntax-propertize` rule for "\*/" should be very easy to implement
> and fairly cheap since the regexp is simple and will almost never match.

Well, the rule would actually be for escaped newlines, but this would be
quite expensive (compared with a syntax.c solution) since every comment
near a change region would need scanning at each change.

> So, yeah, you can add yet-another-hack on top of the other syntax.c
> hacks if you want, but there's a good chance it will only ever be used
> by CC-mode.  It will take a lot more code changes in syntax.c than
> a quick tweak to your Elisp code to search for "\*/".

I've hacked up a working, but as yet unsatisfactory, change to syntax.c.
It is surely better, where possible, to fix bugs at their point of
causation rather than by workarounds elsewhere.  As you note, CC Mode
modes will be the only known users at the moment.

Just as an aside, the project where I was working ~four years ago banned
a proprietary editor after a mammoth search for a bug caused by an
unintentional escaped NL on a line comment.  The banned editor didn't
fontify the continuation line in comment face.  I was able to
demonstrate to the project manager that Emacs fontified that comment
correctly.

> I do think it would be good to handle this without `syntax-table`
> text-property hacks, but I think that should come with an overhaul of
> syntax.c based on a major-mode provided DFA (or something like that) so
> it can accommodate all the various oddball cases without even the need
> to introduce the notion of escaping comment markers.

That sounds almost more like a rewrite than an overhaul.  You mean, I
think, that the syntax of language expressions would be defined using
something a bit like (but more powerful than) regular expressions.  And
with that, the need for syntactic analysis in Lisp would be much
reduced.

We would need to make sure that this wouldn't run more slowly than the
current syntax.c/Lisp combination.

>         Stefan

--
Alan Mackenzie (Nuremberg, Germany).



Michael Welsh Duggan
In reply to this post by Alan Mackenzie
Alan Mackenzie <[hidden email]> writes:

> Hello, Mattias.
>
> On Wed, Sep 23, 2020 at 11:01:59 +0200, Mattias Engdegård wrote:
>> Sorry if I misunderstood, but since when do backslashes escape */ in C?
>
> Since forever, but only in the CC Mode test suite.  :-(
>
> I just tried it out with gcc, and it seems that \*/ does indeed end a
> block comment.  But an escaped newline doesn't end a line comment,
> instead continuing it to the next line.  So I got confused.  Thanks for
> pointing out the mistake.
>
> It seems that as well as the existing variable
> comment-end-can-be-escaped, we need a new one, say
> line-comment-end-can-be-escaped, too.  In C and C++ modes, these would
> be nil and t respectively.

But where does it say that backslashes escape */ in C++?  The C++ 14
standard (and it hasn't changed through C++ 20) says:

    2.7 Comments [lex.comment]
   
    The characters /* start a comment, which terminates with the
    characters */. These comments do not nest.  The characters // start
    a comment, which terminates immediately before the next new-line
    character. If there is a form-feed or a vertical-tab character in
    such a comment, only white-space characters shall appear between it
    and the new-line that terminates the comment; no diagnostic is
    required. [ Note: The comment characters //, /*, and */ have no
    special meaning within a // comment and are treated just like other
    characters. Similarly, the comment characters // and /* have no
    special meaning within a /* comment.  — end note ]

--
Michael Welsh Duggan
([hidden email])



Alan Mackenzie
Hello, Michael.

On Thu, Sep 24, 2020 at 14:52:16 -0400, Michael Welsh Duggan wrote:
> Alan Mackenzie <[hidden email]> writes:

> > On Wed, Sep 23, 2020 at 11:01:59 +0200, Mattias Engdegård wrote:
> >> Sorry if I misunderstood, but since when do backslashes escape */ in C?

> > Since forever, but only in the CC Mode test suite.  :-(

> > I just tried it out with gcc, and it seems that \*/ does indeed end a
> > block comment.  But an escaped newline doesn't end a line comment,
> > instead continuing it to the next line.  So I got confused.  Thanks for
> > pointing out the mistake.

> > It seems that as well as the existing variable
> > comment-end-can-be-escaped, we need a new one, say
> > line-comment-end-can-be-escaped, too.  In C and C++ modes, these would
> > be nil and t respectively.

> But where does it say that backslashes escape */ in C++?

Nowhere.  :-(

There has been a test in the CC Mode test suite for many years which
assumed this (but was disabled for existing (X)Emacs versions, waiting
for a new Emacs version to be "fixed").

> The C++ 14 standard (and it hasn't changed through C++ 20) says:

>     2.7 Comments [lex.comment]
   
>     The characters /* start a comment, which terminates with the
>     characters */. These comments do not nest.  The characters // start
>     a comment, which terminates immediately before the next new-line
>     character.

For all the difference it makes, Emacs assumes the comment ends _after_
the NL.

>     If there is a form-feed or a vertical-tab character in such a
>     comment, only white-space characters shall appear between it and
>     the new-line that terminates the comment; no diagnostic is
>     required.

I didn't know that.  Emacs/CC Mode doesn't code up this subtlety.  It
probably isn't worth bothering about.

>     [ Note: The comment characters //, /*, and */ have no special
>     meaning within a // comment and are treated just like other
>     characters. Similarly, the comment characters // and /* have no
>     special meaning within a /* comment.  — end note ]

Additionally, an escaped newline continues a comment onto the next line.
This happens, notionally, at a very early stage of compilation where a
backslash followed by NL anywhere gets replaced by a space.  I think that
even two backslashes followed by NL would get replaced by backslash,
space.

> --
> Michael Welsh Duggan
> ([hidden email])

--
Alan Mackenzie (Nuremberg, Germany).



Michael Welsh Duggan
Alan Mackenzie <[hidden email]> writes:

> Hello, Michael.
>
> On Thu, Sep 24, 2020 at 14:52:16 -0400, Michael Welsh Duggan wrote:
>> Alan Mackenzie <[hidden email]> writes:
>
>> > On Wed, Sep 23, 2020 at 11:01:59 +0200, Mattias Engdegård wrote:
>> >> Sorry if I misunderstood, but since when do backslashes escape */ in C?
>
>> > Since forever, but only in the CC Mode test suite.  :-(
>
>> > I just tried it out with gcc, and it seems that \*/ does indeed end a
>> > block comment.  But an escaped newline doesn't end a line comment,
>> > instead continuing it to the next line.  So I got confused.  Thanks for
>> > pointing out the mistake.
>
>> > It seems that as well as the existing variable
>> > comment-end-can-be-escaped, we need a new one, say
>> > line-comment-end-can-be-escaped, too.  In C and C++ modes, these would
>> > be nil and t respectively.
>
>> But where does it say that backslashes escape */ in C++?
>
> Nowhere.  :-(
>
> There has been a test in the CC Mode test suite for many years which
> assumed this (but was disabled for existing (X)Emacs versions, waiting
> for a new Emacs version to be "fixed").
>
>> The C++ 14 standard (and it hasn't changed through C++ 20) says:
>
>>     2.7 Comments [lex.comment]
>    
>>     The characters /* start a comment, which terminates with the
>>     characters */. These comments do not nest.  The characters // start
>>     a comment, which terminates immediately before the next new-line
>>     character.
>
> For all the difference it makes, Emacs assumes the comment ends _after_
> the NL.
>
>>     If there is a form-feed or a vertical-tab character in such a
>>     comment, only white-space characters shall appear between it and
>>     the new-line that terminates the comment; no diagnostic is
>>     required.
>
> I didn't know that.  Emacs/CC Mode doesn't code up this subtlety.  It
> probably isn't worth bothering about.
>
>>     [ Note: The comment characters //, /*, and */ have no special
>>     meaning within a // comment and are treated just like other
>>     characters. Similarly, the comment characters // and /* have no
>>     special meaning within a /* comment.  — end note ]
>
> Additionally, an escaped newline continues a comment onto the next line.
> This happens, notionally, at a very early stage of compilation where a
> backslash followed by NL anywhere gets replaced by a space.  I think that
> even two backslashes followed by NL would get replaced by backslash,
> space.

Almost.  A backslash followed by a newline is elided completely, joining
the lines.  (Not replaced by a space.)  Otherwise, I concur.
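
The difference between eliding and inserting a space is observable from C itself. The following is a small sketch with hypothetical function names: if the backslash-newline were replaced by a space, neither function would compile as intended.

```c
#include <string.h>

/* Hypothetical names, for illustration only.  */

/* The backslash-newline pair is deleted outright, so the two halves of the
   integer constant join into a single token.  */
int spliced_number (void)
{
  return 12\
34;
}

/* Two backslashes followed by a newline leave exactly one backslash, which
   here pairs with the first character of the next line to form the
   newline escape inside the string literal.  */
size_t spliced_string_length (void)
{
  const char *s = "a\\
n";
  return strlen (s);
}
```

The continuation lines start in column 0 on purpose: any leading whitespace would survive the splice and separate the two halves.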

--
Michael Welsh Duggan
([hidden email])



Alan Mackenzie
In reply to this post by Stefan Monnier
Hello, Stefan.

On Thu, Sep 24, 2020 at 12:56:42 -0400, Stefan Monnier wrote:
> > As already said, this is a(n ugly) workaround.  syntax.c should handle
> > comments in all their generality.  With a bit of consideration, the
> > method to do this is clear:

> In my world, it's quite normal for a specific language's lexical rules
> not to line up 100% with syntax tables (whether for strings, comments,
> younameit).  I don't see anything very special here.

> A `syntax-propertize` rule for "\*/" should be very easy to implement
> and fairly cheap since the regexp is simple and will almost never match.

> So, yeah, you can add yet-another-hack on top of the other syntax.c
> hacks if you want, but there's a good chance it will only ever be used
> by CC-mode.  It will take a lot more code changes in syntax.c than
> a quick tweak to your Elisp code to search for "\*/".

> I do think it would be good to handle this without `syntax-table`
> text-property hacks, but I think that should come with an overhaul of
> syntax.c based on a major-mode provided DFA (or something like that) so
> it can accommodate all the various oddball cases without even the need
> to introduce the notion of escaping comment markers.

OK, here's the patch.  As a matter of interest, it's been heavily tested
by the .../test/src/syntax-tests.el unit tests, further enhancements to
which are part of the patch.

Just as a reminder, the motivation is to be able to have syntax.c
correctly parse C/C++ line comments which look like:

    foo(); // comment \\
    second line of comment.

by introducing a new syntax flag "e" as a modifier on the syntax entry
for \n:

    (modify-syntax-entry ?\n "> be")
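
The scanning behaviour the "e" flag is meant to give line comments can be sketched in isolation as follows. This is a standalone toy, not the syntax.c code from the patch: `line_comment_end` and its parameters are hypothetical, and it works on a plain byte buffer rather than an Emacs buffer with a syntax table.

```c
#include <stddef.h>

/* Hypothetical helper, not part of syntax.c.  Return the index just past
   the newline that really terminates the line comment starting at START,
   or LEN if the buffer ends first.  A newline directly preceded by a
   backslash is treated as a continuation rather than a comment ender.  */
size_t line_comment_end (const char *buf, size_t len, size_t start)
{
  for (size_t i = start; i < len; i++)
    if (buf[i] == '\n' && !(i > start && buf[i - 1] == '\\'))
      return i + 1;
  return len;
}
```

The result points after the newline, matching the convention noted elsewhere in the thread that Emacs considers a line comment to end after the NL.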

>         Stefan



diff --git a/src/syntax.c b/src/syntax.c
index df07809aaa..c701729ba1 100644
--- a/src/syntax.c
+++ b/src/syntax.c
@@ -108,6 +108,11 @@ SYNTAX_FLAGS_COMMENT_NESTED (int flags)
 {
   return (flags >> 22) & 1;
 }
+static bool
+SYNTAX_FLAGS_COMMENT_ESCAPES (int flags)
+{
+  return (flags >> 24) & 1;
+}
 
 /* FLAGS should be the flags of the main char of the comment marker, e.g.
    the second for comstart and the first for comend.  */
@@ -673,6 +678,26 @@ prev_char_comend_first (ptrdiff_t pos, ptrdiff_t pos_byte)
   return val;
 }
 
+static bool
+comment_ender_quoted (ptrdiff_t from, ptrdiff_t from_byte, int syntax)
+{
+  int c;
+  int next_syntax;
+  if (comment_end_can_be_escaped && char_quoted (from, from_byte))
+    return true;
+  if (SYNTAX_FLAGS_COMMENT_ESCAPES (syntax))
+    {
+      dec_both (&from, &from_byte);
+      UPDATE_SYNTAX_TABLE_BACKWARD (from);
+      c = FETCH_CHAR_AS_MULTIBYTE (from_byte);
+      next_syntax = SYNTAX_WITH_FLAGS (c);
+      UPDATE_SYNTAX_TABLE_FORWARD (from + 1);
+      if (next_syntax == Sescape || next_syntax == Scharquote)
+        return true;
+    }
+  return false;
+}
+
 /* Check whether charpos FROM is at the end of a comment.
    FROM_BYTE is the bytepos corresponding to FROM.
    Do not move back before STOP.
@@ -755,6 +780,20 @@ back_comment (ptrdiff_t from, ptrdiff_t from_byte, ptrdiff_t stop,
  && SYNTAX_FLAGS_COMEND_SECOND (prev_syntax));
       comstart = (com2start || code == Scomment);
 
+      /* Check for any current delimiter being escaped.  */
+      if (from > stop
+          && (((com2end || code == Sendcomment)
+               && comment_ender_quoted (from, from_byte, syntax))
+              || (code == Scomment
+                  && comment_end_can_be_escaped
+                  && char_quoted (from, from_byte))))
+        {
+          dec_both (&from, &from_byte);
+          UPDATE_SYNTAX_TABLE_BACKWARD (from);
+          com2end = comstart = com2start = 0;
+          syntax = Smax;
+        }
+
       /* Nasty cases with overlapping 2-char comment markers:
  - snmp-mode: -- c -- foo -- c --
               --- c --
@@ -1191,6 +1230,10 @@ the value of a `syntax-table' text property.  */)
       case 'c':
  val |= 1 << 23;
  break;
+
+      case 'e':
+        val |= 1 << 24;
+        break;
       }
 
   if (val < ASIZE (Vsyntax_code_object) && NILP (match))
@@ -1279,7 +1322,8 @@ DEFUN ("internal-describe-syntax-value", Finternal_describe_syntax_value,
   (Lisp_Object syntax)
 {
   int code, syntax_code;
-  bool start1, start2, end1, end2, prefix, comstyleb, comstylec, comnested;
+  bool start1, start2, end1, end2, prefix, comstyleb, comstylec, comnested,
+    comescapes;
   char str[2];
   Lisp_Object first, match_lisp, value = syntax;
 
@@ -1320,6 +1364,7 @@ DEFUN ("internal-describe-syntax-value", Finternal_describe_syntax_value,
   comstyleb = SYNTAX_FLAGS_COMMENT_STYLEB (syntax_code);
   comstylec = SYNTAX_FLAGS_COMMENT_STYLEC (syntax_code);
   comnested = SYNTAX_FLAGS_COMMENT_NESTED (syntax_code);
+  comescapes = SYNTAX_FLAGS_COMMENT_ESCAPES (syntax_code);
 
   if (Smax <= code)
     {
@@ -1353,6 +1398,8 @@ DEFUN ("internal-describe-syntax-value", Finternal_describe_syntax_value,
     insert ("c", 1);
   if (comnested)
     insert ("n", 1);
+  if (comescapes)
+    insert ("e", 1);
 
   insert_string ("\twhich means: ");
 
@@ -1416,6 +1463,8 @@ DEFUN ("internal-describe-syntax-value", Finternal_describe_syntax_value,
     insert_string (" (comment style c)");
   if (comnested)
     insert_string (" (nestable)");
+  if (comescapes)
+    insert_string (" (can be escaped)");
 
   if (prefix)
     {
@@ -2336,7 +2385,7 @@ forw_comment (ptrdiff_t from, ptrdiff_t from_byte, ptrdiff_t stop,
   && SYNTAX_FLAGS_COMMENT_STYLE (syntax, 0) == style
   && (SYNTAX_FLAGS_COMMENT_NESTED (syntax) ?
       (nesting > 0 && --nesting == 0) : nesting < 0)
-          && !(comment_end_can_be_escaped && char_quoted (from, from_byte)))
+          && !comment_ender_quoted (from, from_byte, syntax))
  /* We have encountered a comment end of the same style
    as the comment sequence which began this comment
    section.  */
@@ -2354,12 +2403,12 @@ forw_comment (ptrdiff_t from, ptrdiff_t from_byte, ptrdiff_t stop,
  /* We have encountered a nested comment of the same style
    as the comment sequence which began this comment section.  */
  nesting++;
-      if (comment_end_can_be_escaped
-          && (code == Sescape || code == Scharquote))
+      if (SYNTAX_FLAGS_COMEND_FIRST (syntax)
+          && comment_ender_quoted (from, from_byte, syntax))
         {
           inc_both (&from, &from_byte);
           UPDATE_SYNTAX_TABLE_FORWARD (from);
-          if (from == stop) continue; /* Failure */
+          continue;
         }
       inc_both (&from, &from_byte);
       UPDATE_SYNTAX_TABLE_FORWARD (from);
@@ -2493,8 +2542,8 @@ between them, return t; otherwise return nil.  */)
       /* We're at the start of a comment.  */
       found = forw_comment (from, from_byte, stop, comnested, comstyle, 0,
     &out_charpos, &out_bytepos, &dummy, &dummy2);
-      from = out_charpos; from_byte = out_bytepos;
-      if (!found)
+      from = out_charpos; from_byte = out_bytepos;
+     if (!found)
  {
   SET_PT_BOTH (from, from_byte);
   return Qnil;
@@ -2526,21 +2575,27 @@ between them, return t; otherwise return nil.  */)
   if (code == Sendcomment)
     comstyle = SYNTAX_FLAGS_COMMENT_STYLE (syntax, 0);
   if (from > stop && SYNTAX_FLAGS_COMEND_SECOND (syntax)
-      && prev_char_comend_first (from, from_byte)
-      && !char_quoted (from - 1, dec_bytepos (from_byte)))
+      && prev_char_comend_first (from, from_byte))
     {
       int other_syntax;
-      /* We must record the comment style encountered so that
+              /* We must record the comment style encountered so that
  later, we can match only the proper comment begin
  sequence of the same style.  */
       dec_both (&from, &from_byte);
-      code = Sendcomment;
-      /* Calling char_quoted, above, set up global syntax position
- at the new value of FROM.  */
       c1 = FETCH_CHAR_AS_MULTIBYTE (from_byte);
       other_syntax = SYNTAX_WITH_FLAGS (c1);
-      comstyle = SYNTAX_FLAGS_COMMENT_STYLE (other_syntax, syntax);
-      comnested |= SYNTAX_FLAGS_COMMENT_NESTED (other_syntax);
+              if (!comment_ender_quoted (from, from_byte, other_syntax))
+                {
+                  code = Sendcomment;
+                  comstyle = SYNTAX_FLAGS_COMMENT_STYLE (other_syntax, syntax);
+                  comnested |= SYNTAX_FLAGS_COMMENT_NESTED (other_syntax);
+                  syntax = other_syntax;
+                }
+              else
+                {
+                  inc_both (&from, &from_byte);
+                  UPDATE_SYNTAX_TABLE_FORWARD (from);
+                }
     }
 
   if (code == Scomment_fence)
@@ -2579,7 +2634,8 @@ between them, return t; otherwise return nil.  */)
     }
   else if (code == Sendcomment)
     {
-              found = (!quoted || !comment_end_can_be_escaped)
+              found =
+                !comment_ender_quoted (from, from_byte, syntax)
                 && back_comment (from, from_byte, stop, comnested, comstyle,
                                  &out_charpos, &out_bytepos);
       if (!found)
@@ -2864,6 +2920,7 @@ scan_lists (EMACS_INT from0, EMACS_INT count, EMACS_INT depth, bool sexpflag)
       other_syntax = SYNTAX_WITH_FLAGS (c2);
       comstyle = SYNTAX_FLAGS_COMMENT_STYLE (other_syntax, syntax);
       comnested |= SYNTAX_FLAGS_COMMENT_NESTED (other_syntax);
+              syntax = other_syntax;
     }
 
   /* Quoting turns anything except a comment-ender
@@ -2946,7 +3003,10 @@ scan_lists (EMACS_INT from0, EMACS_INT count, EMACS_INT depth, bool sexpflag)
     case Sendcomment:
       if (!parse_sexp_ignore_comments)
  break;
-      found = back_comment (from, from_byte, stop, comnested, comstyle,
+      found =
+                (from == stop
+                 || !comment_ender_quoted (from, from_byte, syntax))
+                && back_comment (from, from_byte, stop, comnested, comstyle,
     &out_charpos, &out_bytepos);
       /* FIXME:  if !found, it really wasn't a comment-end.
  For single-char Sendcomment, we can't do much about it apart
diff --git a/test/src/syntax-resources/syntax-comments.txt b/test/src/syntax-resources/syntax-comments.txt
index a292d816b9..f3357ea244 100644
--- a/test/src/syntax-resources/syntax-comments.txt
+++ b/test/src/syntax-resources/syntax-comments.txt
@@ -34,7 +34,7 @@
 54{ //74 \
 }54
 55{/* */}55
-56{ /*76 \*/ }56
+56{ /*76 \*/80 }56
 57*/77
 58}58
 60{ /*78 \\*/79}60
@@ -87,6 +87,21 @@
 110
 111#| ; |#111
 
+/* Comments and purported comments containing string delimiters. */
+120/* "string" */120
+121/* "" */121
+122/* " */122
+130/*
+" " */130
+" "*/123
+124/* " ' */124
+126/*
+" ' */126
+127/* " " " " " */127
+128/* " ' "  ' " ' */128
+129/*   ' "  ' " ' */129
+" ' */125
+
 Local Variables:
 mode: fundamental
 eval: (set-syntax-table (make-syntax-table))
diff --git a/test/src/syntax-tests.el b/test/src/syntax-tests.el
index edee01ec58..399986c31d 100644
--- a/test/src/syntax-tests.el
+++ b/test/src/syntax-tests.el
@@ -307,6 +307,7 @@ syntax-pps-comments
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 (defun {-in ()
   (setq parse-sexp-ignore-comments t)
+  (setq comment-use-syntax-ppss nil)
   (setq comment-end-can-be-escaped nil)
   (modify-syntax-entry ?{ "<")
   (modify-syntax-entry ?} ">"))
@@ -336,6 +337,7 @@ {-out
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 (defun \;-in ()
   (setq parse-sexp-ignore-comments t)
+  (setq comment-use-syntax-ppss nil)
   (setq comment-end-can-be-escaped nil)
   (modify-syntax-entry ?\n ">")
   (modify-syntax-entry ?\; "<")
@@ -375,6 +377,7 @@ \;-out
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 (defun \#|-in ()
   (setq parse-sexp-ignore-comments t)
+  (setq comment-use-syntax-ppss nil)
   (modify-syntax-entry ?# ". 14")
   (modify-syntax-entry ?| ". 23n")
   (modify-syntax-entry ?\; "< b")
@@ -418,15 +421,18 @@ \#|-out
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 (defun /*-in ()
   (setq parse-sexp-ignore-comments t)
+  (setq comment-use-syntax-ppss nil)
   (setq comment-end-can-be-escaped t)
   (modify-syntax-entry ?/ ". 124b")
   (modify-syntax-entry ?* ". 23")
-  (modify-syntax-entry ?\n "> b"))
+  (modify-syntax-entry ?\n "> b")
+  (modify-syntax-entry ?\' "\""))
 (defun /*-out ()
   (setq comment-end-can-be-escaped nil)
   (modify-syntax-entry ?/ ".")
   (modify-syntax-entry ?* ".")
-  (modify-syntax-entry ?\n " "))
+  (modify-syntax-entry ?\n " ")
+  (modify-syntax-entry ?\' "."))
 (eval-and-compile
   (setq syntax-comments-section "c"))
 
@@ -489,4 +495,142 @@ /*-out
 (syntax-pps-comments /* 56 76 77 58)
 (syntax-pps-comments /* 60 78 79)
 
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+;; Emacs 28 "C" style comments - `comment-end-can-be-escaped' is nil, the
+;; "e" flag is used for line comments.
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+(defun //-in ()
+  (setq parse-sexp-ignore-comments t)
+  (setq comment-use-syntax-ppss nil)
+  (modify-syntax-entry ?/ ". 124be")
+  (modify-syntax-entry ?* ". 23")
+  (modify-syntax-entry ?\n "> be")
+  (modify-syntax-entry ?\' "\""))
+(defun //-out ()
+  (modify-syntax-entry ?/ ".")
+  (modify-syntax-entry ?* ".")
+  (modify-syntax-entry ?\n " ")
+  (modify-syntax-entry ?\' "."))
+(eval-and-compile
+  (setq syntax-comments-section "c++"))
+
+(syntax-comments // forward t 1)
+(syntax-comments // backward t 1)
+(syntax-comments // forward t 2)
+(syntax-comments // backward t 2)
+(syntax-comments // forward t 3)
+(syntax-comments // backward t 3)
+
+(syntax-comments // forward t 4)
+(syntax-comments // backward t 4)
+(syntax-comments // forward t 5 6)
+(syntax-comments // backward nil 5 0)
+(syntax-comments // forward nil 6 0)
+(syntax-comments // backward t 6 5)
+
+(syntax-comments // forward t 7)
+(syntax-comments // backward t 7)
+(syntax-comments // forward nil 8 0)
+(syntax-comments // backward nil 8 0)
+(syntax-comments // forward t 9)
+(syntax-comments // backward t 9)
+
+(syntax-comments // forward nil 10 0)
+(syntax-comments // backward nil 10 0)
+(syntax-comments // forward t 11)
+(syntax-comments // backward t 11)
+
+(syntax-comments // forward t 13)
+(syntax-comments // backward t 13)
+(syntax-comments // forward t 15)
+(syntax-comments // backward t 15)
+
+;; Emacs 28 "C" style comments inside brace lists.
+(syntax-br-comments // forward t 50)
+(syntax-br-comments // backward t 50)
+(syntax-br-comments // forward t 51)
+(syntax-br-comments // backward t 51)
+(syntax-br-comments // forward t 52)
+(syntax-br-comments // backward t 52)
+
+(syntax-br-comments // forward t 53)
+(syntax-br-comments // backward t 53)
+(syntax-br-comments // forward t 54 58)
+(syntax-br-comments // backward t 54)
+(syntax-br-comments // forward t 55)
+(syntax-br-comments // backward t 55)
+
+(syntax-br-comments // forward t 56 56)
+(syntax-br-comments // backward t 58 54)
+(syntax-br-comments // backward nil 59)
+(syntax-br-comments // forward t 60)
+(syntax-br-comments // backward t 60)
+
+;; Emacs 28 "C" style comments parsed by `parse-partial-sexp'.
+(syntax-pps-comments // 50 70 71)
+(syntax-pps-comments // 52 72 73)
+(syntax-pps-comments // 54 74 55 58)
+(syntax-pps-comments // 56 76 80)
+(syntax-pps-comments // 60 78 79)
+
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+;; Comments containing string delimiters.
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+(eval-and-compile
+  (setq syntax-comments-section "c-\""))
+
+(syntax-comments /* forward t 120)
+(syntax-comments /* backward t 120)
+(syntax-comments /* forward t 121)
+(syntax-comments /* backward t 121)
+(syntax-comments /* forward t 122)
+(syntax-comments /* backward t 122)
+
+(syntax-comments /* backward nil 123 0)
+(syntax-comments /* forward t 124)
+(syntax-comments /* backward t 124)
+(syntax-comments /* backward nil 125 0)
+(syntax-comments /* forward t 126)
+(syntax-comments /* backward t 126)
+
+(syntax-comments /* forward t 127)
+(syntax-comments /* backward t 127)
+(syntax-comments /* forward t 128)
+(syntax-comments /* backward t 128)
+(syntax-comments /* forward t 129)
+(syntax-comments /* backward t 129)
+
+(syntax-comments /* forward t 130)
+(syntax-comments /* backward t 130)
+
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+;; The same again, with Emacs 28 style C comments.
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+(eval-and-compile
+  (setq syntax-comments-section "c++-\""))
+
+(syntax-comments // forward t 120)
+(syntax-comments // backward t 120)
+(syntax-comments // forward t 121)
+(syntax-comments // backward t 121)
+(syntax-comments // forward t 122)
+(syntax-comments // backward t 122)
+
+(syntax-comments // backward nil 123 0)
+(syntax-comments // forward t 124)
+(syntax-comments // backward t 124)
+(syntax-comments // backward nil 125 0)
+(syntax-comments // forward t 126)
+(syntax-comments // backward t 126)
+
+(syntax-comments // forward t 127)
+(syntax-comments // backward t 127)
+(syntax-comments // forward t 128)
+(syntax-comments // backward t 128)
+(syntax-comments // forward t 129)
+(syntax-comments // backward t 129)
+
+(syntax-comments // forward t 130)
+(syntax-comments // backward t 130)
+
 ;;; syntax-tests.el ends here


--
Alan Mackenzie (Nuremberg, Germany).




bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped.

Stefan Monnier
>> So, yeah, you can add yet-another-hack on top of the other syntax.c
>> hacks if you want, but there's a good chance it will only ever be used
>> by CC-mode.  It will take a lot more code changes in syntax.c than
>> a quick tweak to your Elisp code to search for "\*/".
[...]
> OK, here's the patch.

I think the patch agrees with my assessment above (even though it's
still missing an etc/NEWS entry and adjustments to the docstring of
modify-syntax-entry and to the .texi manual).

I really can't understand why you are so resistant to using
a `syntax-table` property on those rare \\\n sequences.


        Stefan


PS: Also, I just noticed that `gcc -Wall` warns about the use of such
multiline comments, so it doesn't seem to be a very popular feature.

PPS: For reference, I just tried to add support for it in sm-c-mode
and this is the resulting code:


@@ -312,7 +315,15 @@ E.g. a #define nested within 2 #ifs will be turned into \"#  define\"."
                                'syntax-table (string-to-syntax "|"))
             (put-text-property (match-beginning 2) (match-end 2)
                                'syntax-table (string-to-syntax "|")))
-          (sm-c--cpp-syntax-propertize end)))))
+          (sm-c--cpp-syntax-propertize end))))
+    ("\\\\\\(\n\\)"
+     (1 (let ((ppss (save-excursion (syntax-ppss (match-beginning 0)))))
+          (when (and (nth 4 ppss)        ;Within a comment
+                     (null (nth 7 ppss)) ;Within a // comment
+                     (save-excursion     ;The \ is not itself escaped
+                       (goto-char (match-beginning 0))
+                       (zerop (mod (skip-chars-backward "\\\\") 2))))
+            (string-to-syntax "."))))))
    (point) end))
 
 (defun sm-c-syntactic-face-function (ppss)
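
For readers unfamiliar with the construct under discussion: in C, a
backslash immediately followed by a newline splices the two lines, so a
`//` comment ending in a backslash continues onto the next line.  The
following is only a minimal sketch of that rule in C.  The function name
line_comment_end is hypothetical and is not taken from syntax.c, CC Mode
or sm-c-mode.  It assumes a NUL-terminated buffer and applies the
compiler's splicing rule without checking whether the backslash is
itself escaped, unlike the Lisp rule above.

```c
#include <stddef.h>

/* Hypothetical helper for illustration only; not part of Emacs.
   SRC is a NUL-terminated buffer and START is the index of the first
   '/' of a "//" comment.  Return the index just past the newline that
   really ends the comment, or the index of the terminating NUL if the
   comment runs to the end of the buffer.  */
size_t
line_comment_end (const char *src, size_t start)
{
  size_t i = start;

  while (src[i] != '\0')
    {
      /* A newline ends the comment unless the character directly
         before it is a backslash (line splicing).  */
      if (src[i] == '\n' && !(i > start && src[i - 1] == '\\'))
        return i + 1;
      i++;
    }
  return i;
}
```

Given the text "// a \", a newline, "b", a newline and "c", the function
skips the first newline and returns the position just after the second
one, which is the behaviour the `\\\n` rule reproduces by giving the
escaped newline punctuation syntax.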





bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped.

Stefan Monnier
> Because syntax-table text properties are already used for so many
> different things in CC Mode (I think the count is five in C++ Mode).
> Adding another one would mean having to scan for this rare construct at
> every buffer change, and this would slow things down, possibly a lot.

The fact that you already have 5 other such uses implies that the
slowdown from this one cannot possibly be larger than 20% (since the scan
for it is very simple, I doubt any of the other 5 is simpler).

Most major modes have such things and we live just fine with them.
This is a non-issue.


        Stefan





bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped.

Eli Zaretskii
In reply to this post by Stefan Monnier
> Date: Sun, 22 Nov 2020 13:12:31 +0000
> From: Alan Mackenzie <[hidden email]>
> Cc: [hidden email],
>  Mattias Engdegård <[hidden email]>, [hidden email]
>
> +@samp{e} means that when @var{c}, a comment ender or first character
> +of a two character ender, is directly proceded by one or more escape
                                         ^^^^^^^^
"preceded", I guess?




bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped.

Alan Mackenzie
In reply to this post by Stefan Monnier
Hello, Stefan.

On Sun, Nov 22, 2020 at 10:20:32 -0500, Stefan Monnier wrote:
> > Because syntax-table text properties are already used for so many
> > different things in CC Mode (I think the count is five in C++ Mode).
> > Adding another one would mean having to scan for this rare construct at
> > every buffer change, and this would slow things down, possibly a lot.

> The fact that you already have 5 other such uses implies that the
> slowdown from this one cannot possibly be larger than 20% (since the scan
> for it is very simple, I doubt any of the other 5 is simpler).

The fact remains that an implementation at the C level is objectively
better than one at the Lisp level.

> Most major modes have such things and we live just fine with them.
> This is a non-issue.

Really?  Are there any other programming language modes whose comments
syntax.c cannot handle without syntax-table text properties?

>         Stefan

--
Alan Mackenzie (Nuremberg, Germany).




bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped.

Alan Mackenzie
Hello, Dmitry.

On Sun, Nov 22, 2020 at 19:46:24 +0200, Dmitry Gutov wrote:
> On 22.11.2020 19:08, Alan Mackenzie wrote:
> > Really?  Are there any other programming language modes whose comments
> > syntax.c cannot handle without syntax-table text properties?

> Ruby is just one example.

Thanks.

I've just searched the web for that.  Ruby has block comment delimiters
=begin and =end.

It would be possible to handle these in syntax.c, but somewhat clumsy
and awkward.

Presumably ruby-mode handles these with syntax-table text properties
applied to the = sign and the terminating d, which is a little clumsy,
but not too bad, at the Lisp level.

--
Alan Mackenzie (Nuremberg, Germany).




bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped.

Dmitry Gutov
On 22.11.2020 20:19, Alan Mackenzie wrote:

> Hello, Dmitry.
>
> On Sun, Nov 22, 2020 at 19:46:24 +0200, Dmitry Gutov wrote:
>> On 22.11.2020 19:08, Alan Mackenzie wrote:
>>> Really?  Are there any other programming language modes whose comments
>>> syntax.c cannot handle without syntax-table text properties?
>
>> Ruby is just one example.
>
> Thanks.
>
> I've just searched the web for that.  Ruby has block comment delimiters
> =begin and =end.
>
> It would be possible to handle these in syntax.c, but somewhat clumsy
> and awkward.

Just like the C comments syntax discussed here.

> Presumably ruby-mode handles these with syntax-table text properties
> applied to the = sign and the terminating d, which is a little clumsy,
> but not too bad, at the Lisp level.

This is just two more regexps to search for (and propertize). I don't
expect that the slowdown from them is in any way perceptible.

And the general point is that the Emacs syntax table structure doesn't
necessarily have to mirror the syntax of the C language.


