bug#47408: Etags support for Mercury [v0.3]

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

bug#47408: Etags support for Mercury [v0.3]

fabrice nicol
Hi,

I'm sending a new patch for this Mercury feature request.

The attached patch fixes a previously unnoticed regression that affects
Objective C parsing (Mercury and Objective C have same file extensions .m).

I had to resort to an added heuristics (implemented in
`test_objc_is_mercury') to disambiguate Mercury from Objective C source
files with extension .m.

The patch is cumulative and replaces the former one.

Best,

Fabrice Nicol


0001-Fixed-regressions-caused-by-Objc-Mercury-ambiguous-f.patch (20K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

bug#47408: Etags support for Mercury [v0.3]

Eli Zaretskii
> From: fabrice nicol <[hidden email]>
> Date: Sat, 27 Mar 2021 11:51:22 +0100
>
> I'm sending a new patch for this Mercury feature request.

Thanks, I have some comments below.

> >From 50f3f9a0d46d11d0ac096f79f0d5aa1bc17b7920 Mon Sep 17 00:00:00 2001
> From: Fabrice Nicol <[hidden email]>
> Date: Sat, 27 Mar 2021 10:16:44 +0100
> Subject: [PATCH] Fixed regressions caused by Objc/Mercury ambiguous file
>  extension .m.

Please accompany the changeset with a ChangeLog-style commit log
message, you can see the style we are using via "git log" and also
find some instructions in CONTRIBUTE.

>  .B \-V, \-\-version
>  Print the current version of the program (same as the version of the
>  emacs \fBetags\fP is shipped with).
> +.TP
> +.B \-, \-\-version
> +Print the current version of the program (same as the version of the
> +emacs \fBetags\fP is shipped with).

Copy/paste mistake? or why are you duplicating the --version
description?

> +.TP
> +.B \-M, \-\-no\-defines
> +For the Mercury programming language, tag both declarations and
> +definitions.  Declarations start a line with \fI:\-\fP optionally followed by a
> +quantifier over a variable (\fIsome [T]\fP or \fIall [T]\fP), then by
> +a builtin operator like \fIpred\fP or \fIfunc\fP.
> +Definitions are first rules of clauses, as in Prolog.
> +Implies \-\-language=mercury.
> +.TP
> +.B \-m, \-\-declarations
> +For the Mercury programming language, tag declarations as with \fB\-M\fP, but do not
> +tag definitions. Implies \-\-language=mercury.

This is not what Francesco Potortì suggested to do.  He suggested that
you use the existing options --no-defines and --declarations, but give
them Mercury-specific meanings when processing Mercury source files.
IOW, let's not introduce the new -m and -M shorthands for these options,
and let's describe the Mercury-specific meaning of the existing
options where they are currently described in etags.1.  OK?

> --- a/etc/NEWS
> +++ b/etc/NEWS
> @@ -93,6 +93,15 @@ useful on systems such as FreeBSD which ships only with "etc/termcap".
>  
>  * Changes in Emacs 28.1
>  
> +---
   ^^^
This should be "+++", since you submitted the changes for the
documentation as part of the changeset.

> +** Etags support for the Mercury programming language (https://mercurylang.org).
> +** New etags command line options '-M/-m' or --declarations/--no-defines'.
> +Tags all Mercury declarations.  For compatibility with Prolog etags support,
> +predicates and functions appearing first in clauses will be tagged if etags is
> +run with the option '-M' or '--declarations'.  If run with '-m' or
> +'--no-defines', declarations will be tagged but definitions will not.
> +Both options imply --language=mercury.

This should be amended for the changes in the options I described
above.

> +/* Define MERCURY_HEURISTICS_RATIO as it was necessary to disambiguate
> +   Mercury from Objective C, which have same file extensions .m */

This comment should explain how the value is used to disambiguate, so
that people could decide what alternative value to use.

> +static void test_objc_is_mercury(char *, language **);
                                  ^^
Our style is to leave one space between the function's name and the
opening parenthesis.  Please follow that here and elsewhere in your
patch.

> @@ -621,7 +629,6 @@ #define STDIN 0x1001 /* returned by getopt_long on --parse-stdin */
>  "In Java code, all the tags constructs of C and C++ code are\n\
>  tagged.  (Use --help --lang=c --lang=c++ --lang=java for full help.)";
>  
> -
>  static const char *Cobol_suffixes [] =
>    { "COB", "cob", NULL };
>  static char Cobol_help [] =

Why remove this empty line?

>  static const char *Objc_suffixes [] =
> -  { "lm", /* Objective lex file */
> -    "m", /* Objective C file */
> -     NULL };
> +  {"lm",
> +   "m",  /* By default, Objective C will be assumed. */
> +   NULL};

This loses the explanation that a .lm file is an ObjC lex file.

> @@ -773,7 +792,6 @@ #define STDIN 0x1001 /* returned by getopt_long on --parse-stdin */
>  'TEXTAGS' to a colon-separated list like, for example,\n\
>       TEXTAGS=\"mycommand:myothercommand\".";
>  
> -
>  static const char *Texinfo_suffixes [] =
>    { "texi", "texinfo", "txi", NULL };
>  static const char Texinfo_help [] =

Again, an empty line removed -- why?

> +  puts ("-m, --declarations\n\
> +        For the Mercury programming language, only tag declarations.\n\
> + Declarations start a line with :- \n\
> +        Implies --language=mercury.");
> +
> +  puts ("-M, --no-defines\n\
> +        For the Mercury programming language, include both declarations and\n\
> + definitions.  Declarations start a line with :- while definitions\n\
> + are first rules for a given item, as for Prolog.\n\
> +        Implies --language=mercury.");
> +

This should be merged with the existing description of the long
options.

>    /* When the optstring begins with a '-' getopt_long does not rearrange the
>       non-options arguments to be at the end, but leaves them alone. */
> -  optstring = concat ("-ac:Cf:Il:o:Qr:RSVhH",
> +  optstring = concat ("-ac:Cf:Il:Mmo:Qr:RSVhHW",
>        (CTAGS) ? "BxdtTuvw" : "Di:",
>        "");

As mentioned, let's not introduce -m and -M.

> +      case 'M':
> + with_mercury_definitions = true; FALLTHROUGH;
> +      case 'm':
> + {
> +  language lang =
> +    { "mercury", Mercury_help, Mercury_functions, Mercury_suffixes };
> +
> +  argbuffer[current_arg].lang = &lang;
> +  argbuffer[current_arg].arg_type = at_language;
> + }
> + break;

Shouldn't be needed anymore.

>   /* Etags options */
> -      case 'D': constantypedefs = false; break;
> +      case 'D': constantypedefs = false;                        break;

This whitespace change is for the worse: our conventions are to use
mixed spaces-with-tabs style for indentation in C source files, not
just spaces.

> +static void test_objc_is_mercury(char *this_file, language **lang)

Our style is to write function definitions like this:

static void
test_objc_is_mercury (char *this_file, language **lang)

IOW, break the line between the return type and the function's name.

> +  FILE* fp = fopen(this_file, "r");
> +  if (fp == NULL) return;

No error/warning if the file couldn't be open?

In any case, this leaks a FILE object: you open a file, but never
close it.

> +  uint64_t lines = 1;
> +  uint64_t mercury_decls = 0;

We don't use such types elsewhere in etags.c; why do you need them
here?  can you use intmax_t instead, as we do elsewhere?

> + case  '%': FALLTHROUGH;
> +        case  ' ': FALLTHROUGH;
> +        case '\t':
> +  start_of_line = false;

FALLTHROUGH isn't needed here, as there's no code under the first 2
'case' lines.

> +      /* Change the language from Objective C to Mercury */

Our style for comments is to end each comment with a period and 2
spaces, like this:

   /* Change the language from Objective C to Mercury.  */

Please follow this style, here and elsewhere in the changeset.

> +  uint8_t decl_type_length = pos - origpos;

Please use 'unsigned char' instead of uint8_t.

> +  if (( (s[len] == '.'  /* This is a statement dot, not a module dot. */
> + || (s[len] == '(' && (len += 1))
> +         || (s[len] == ':'  /* Stopping in case of a rule. */
> +     && s[len + 1] == '-'
> +     && (len += 2)))
> + && (lastlen != len || memcmp (s, last, len) != 0)
> + )
> +      /* Types are often declared on several lines so keeping just
> + the first line */
> +      || is_mercury_type
> +      )

Please avoid parentheses alone on their lines.

> diff --git a/lisp/speedbar.el b/lisp/speedbar.el
> index 12e57b1108..63f3cd6ca1 100644
> --- a/lisp/speedbar.el
> +++ b/lisp/speedbar.el
> @@ -3534,6 +3534,8 @@ speedbar-fetch-etags-parse-list
>       speedbar-parse-c-or-c++tag)
>      ("^\\.emacs$\\|.\\(el\\|l\\|lsp\\)\\'" .
>       "def[^i]+\\s-+\\(\\(\\w\\|[-_]\\)+\\)\\s-*\C-?")
> +      ("^\\.m$\\'" .
> +     "\\(^:-\\)?\\s-*\\(\\(pred\\|func\\|type\\|instance\\|typeclass\\)+\\s+\\([a-z]+[a-zA-Z0-9_]*\\)+\\)\\s-*(?^?")
>  ;    ("\\.\\([fF]\\|for\\|FOR\\|77\\|90\\)\\'" .
>  ;      speedbar-parse-fortran77-tag)
>      ("\\.tex\\'" . speedbar-parse-tex-string)

What about ObjC here? or are these keywords good for ObjC as well?

Last, but not least: if you can, please provide a test file for the
etags test suite, see test/manual/etags/.

Thanks again for working on this.



Reply | Threaded
Open this post in threaded view
|

bug#47408: Etags support for Mercury [v0.3]

fabrice nicol

Thanks for this review.

Changes will be implemented soon as indicated.

(1)  There is just one point that I would like to discuss before changing things around: the proposed -m/-M short option issue.


I left this couple of options in (following Francesco Potorti only for long options --declarations/--no-defines), for two reasons:

1. The ambiguity between Objective C and Mercury

Both languages having the same file extension .m, it was necessary to add in a heuristic test function, in the absence of explicit language identification input from command line.

Yet all heuristics may fail in rare cases. Tests show a fairly low failure rate on the Mercury compiler source code.  Less than 0.5 % of .m files are not identified as Mercury files by the test (this should have been documented somewhere).  File concerned by test failure are some Mercury test files and documentary test files with only (or almost only) comments and blank lines.

While this could be improved by tweaking the heuristic test, it would make it more complex, bug-prone and ultimately hard to maintain.

So -m/-M are useful to deal with these rare files, as they do not rely on the heuristic test function at all but on their own semantics, which explicitly identifies Mercury.   

The only alternative I see is to explicitly warn users about adding '-l mercury' to command line when using long options (in etags.1 and possibly other docs).

Whether this is less intrusive (or more) than -m/-M is not crystal-clear to me.  Both solutions look on a par wrt this criterion, but -m/-M may be more user-friendly.

If two short options are one too many, I propose redesigning the short option pair as just one -m option with a binary argument (like: '-m defines / -m all', or -m 0 / -m 1).


2. The social side of things

As indicated previously, I also consulted the Mercury review list, and the feedback was positive on -m/-M (see below):

Accommodating different people's different preferences is a good idea
if it can be done at acceptable cost.

Instead of -M, you should use --declarations

Instead of -m, you should use --no-defines
There is no need for "instead"; you can support both forms of both options.

So I opted for a compromise: renaming long options, following F. Potorti, and keeping -m/-M, following Z. Somogyi.


(2) Your following question:


diff --git a/lisp/speedbar.el b/lisp/speedbar.el
index 12e57b1108..63f3cd6ca1 100644
--- a/lisp/speedbar.el
+++ b/lisp/speedbar.el
@@ -3534,6 +3534,8 @@ speedbar-fetch-etags-parse-list
      speedbar-parse-c-or-c++tag)
     ("^\\.emacs$\\|.\\(el\\|l\\|lsp\\)\\'" .
      "def[^i]+\\s-+\\(\\(\\w\\|[-_]\\)+\\)\\s-*\C-?")
+      ("^\\.m$\\'" .
+     "\\(^:-\\)?\\s-*\\(\\(pred\\|func\\|type\\|instance\\|typeclass\\)+\\s+\\([a-z]+[a-zA-Z0-9_]*\\)+\\)\\s-*(?^?")
 ;    ("\\.\\([fF]\\|for\\|FOR\\|77\\|90\\)\\'" .
 ;      speedbar-parse-fortran77-tag)
     ("\\.tex\\'" . speedbar-parse-tex-string)
What about ObjC here? or are these keywords good for ObjC as well?

has the following reply: Objective C .m files are not parsed by speedbar.el in current repository code, so the added feature does not break anything.  Issues will only arise if/when Emacs maintainers for Objective C support decide on adding this file format to the speedbar parser.   It would be premature (and out-of-place) for me to settle this on my own.  Should this move happen, the heuristics used in etags.c (function test_objc_is_mercury) could then be ported to elisp code.


+.TP
+.B \-M, \-\-no\-defines
+For the Mercury programming language, tag both declarations and
+definitions.  Declarations start a line with \fI:\-\fP optionally followed by a
+quantifier over a variable (\fIsome [T]\fP or \fIall [T]\fP), then by
+a builtin operator like \fIpred\fP or \fIfunc\fP.
+Definitions are first rules of clauses, as in Prolog.
+Implies \-\-language=mercury.
+.TP
+.B \-m, \-\-declarations
+For the Mercury programming language, tag declarations as with \fB\-M\fP, but do not
+tag definitions. Implies \-\-language=mercury.
This is not what Francesco Potortì suggested to do.  He suggested that
you use the existing options --no-defines and --declarations, but give
them Mercury-specific meanings when processing Mercury source files.
IOW, let's not introduce the new -m and -M shorthands for these options,
and let's describe the Mercury-specific meaning of the existing
options where they are currently described in etags.1.  OK?

+** Etags support for the Mercury programming language (https://mercurylang.org).
+** New etags command line options '-M/-m' or --declarations/--no-defines'.
+Tags all Mercury declarations.  For compatibility with Prolog etags support,
+predicates and functions appearing first in clauses will be tagged if etags is
+run with the option '-M' or '--declarations'.  If run with '-m' or
+'--no-defines', declarations will be tagged but definitions will not.
+Both options imply --language=mercury.
This should be amended for the changes in the options I described
above.
As mentioned, let's not introduce -m and -M.

+      case 'M':
+	with_mercury_definitions = true; FALLTHROUGH;
+      case 'm':
+	{
+	  language lang =
+	    { "mercury", Mercury_help, Mercury_functions, Mercury_suffixes };
+
+	  argbuffer[current_arg].lang = &lang;
+	  argbuffer[current_arg].arg_type = at_language;
+	}
+	break;
Shouldn't be needed anymore.

diff --git a/lisp/speedbar.el b/lisp/speedbar.el
index 12e57b1108..63f3cd6ca1 100644
--- a/lisp/speedbar.el
+++ b/lisp/speedbar.el
@@ -3534,6 +3534,8 @@ speedbar-fetch-etags-parse-list
      speedbar-parse-c-or-c++tag)
     ("^\\.emacs$\\|.\\(el\\|l\\|lsp\\)\\'" .
      "def[^i]+\\s-+\\(\\(\\w\\|[-_]\\)+\\)\\s-*\C-?")
+      ("^\\.m$\\'" .
+     "\\(^:-\\)?\\s-*\\(\\(pred\\|func\\|type\\|instance\\|typeclass\\)+\\s+\\([a-z]+[a-zA-Z0-9_]*\\)+\\)\\s-*(?^?")
 ;    ("\\.\\([fF]\\|for\\|FOR\\|77\\|90\\)\\'" .
 ;      speedbar-parse-fortran77-tag)
     ("\\.tex\\'" . speedbar-parse-tex-string)
What about ObjC here? or are these keywords good for ObjC as well?

Last, but not least: if you can, please provide a test file for the
etags test suite, see test/manual/etags/.

Thanks again for working on this.
Reply | Threaded
Open this post in threaded view
|

bug#47408: Etags support for Mercury [v0.4]

fabrice nicol
Attached is the new patch that integrates your indications.

Please note two points:

1. Now that -m/-M have been done with, there is no use specifying any
Mercury-specific behavior for --no-defines.

Actually the Mercury community consensus is that all declarations should
be tagged in any case.

So --no-defines is just the default behavior of etags run without any
option and does not need to be used explicitly or specifically documented.

I followed your indications about --declarations. I also added a line to
etags.1 about --language=mercury or --language=objc, should the
heuristic test fail to detect the right language. Note, however, that
removing language-specific options comes at a price. The heuristic test
has now to be more complex. I had errless detection results against my
test base of 4,000 mercury files and 500 Obj.-C files. This looks
satisfactory but I had to tweak the heuristic test function
(test_objc_is_mercury) quite a bit to weed out detection failures.

I added the ChangeLog, the requested test file (array.m) under
test/manual/etags/merc-src and altered the corresponding Makefile
accordingly.

2. I removed by added line to speedbar.el, which in the end did not
prove very useful. It is located in a Xemacs compatibility layer that is
no longer used by most users.


Le 28/03/2021 à 18:22, Eli Zaretskii a écrit :

>> From: fabrice nicol <[hidden email]>
>> Date: Sun, 28 Mar 2021 17:49:20 +0200
>>
>> I left this couple of options in (following Francesco Potorti only for long options --declarations/--no-defines),
>> for two reasons:
>>
>> 1. The ambiguity between Objective C and Mercury
>>
>> Both languages having the same file extension .m, it was necessary to add in a heuristic test function, in the
>> absence of explicit language identification input from command line.
>>
>> Yet all heuristics may fail in rare cases. Tests show a fairly low failure rate on the Mercury compiler source
>> code.  Less than 0.5 % of .m files are not identified as Mercury files by the test (this should have been
>> documented somewhere).  File concerned by test failure are some Mercury test files and documentary test
>> files with only (or almost only) comments and blank lines.
>>
>> While this could be improved by tweaking the heuristic test, it would make it more complex, bug-prone and
>> ultimately hard to maintain.
>>
>> So -m/-M are useful to deal with these rare files, as they do not rely on the heuristic test function at all but on
>> their own semantics, which explicitly identifies Mercury.
>>
>> The only alternative I see is to explicitly warn users about adding '-l mercury' to command line when using
>> long options (in etags.1 and possibly other docs).
> I think "-l mercury" is indeed the way to tell etags this is a Mercury
> source.
>
> We never had language-specific options in etags, and I don't see a
> serious enough reason to introduce them now.  I do find it unfortunate
> that Mercury uses the same extension as ObjC, but that's water under
> the bridge.
>
> Of course, if the heuristic test could be improved to make it err
> less, it would also be good.
>
>>   diff --git a/lisp/speedbar.el b/lisp/speedbar.el
>> index 12e57b1108..63f3cd6ca1 100644
>> --- a/lisp/speedbar.el
>> +++ b/lisp/speedbar.el
>> @@ -3534,6 +3534,8 @@ speedbar-fetch-etags-parse-list
>>        speedbar-parse-c-or-c++tag)
>>       ("^\\.emacs$\\|.\\(el\\|l\\|lsp\\)\\'" .
>>        "def[^i]+\\s-+\\(\\(\\w\\|[-_]\\)+\\)\\s-*\C-?")
>> +      ("^\\.m$\\'" .
>> +     "\\(^:-\\)?\\s-*\\(\\(pred\\|func\\|type\\|instance\\|typeclass\\)+\\s+\\([a-z]+[a-zA-Z0-9_]*\\)+\\)\\s-*(?^?")
>>   ;    ("\\.\\([fF]\\|for\\|FOR\\|77\\|90\\)\\'" .
>>   ;      speedbar-parse-fortran77-tag)
>>       ("\\.tex\\'" . speedbar-parse-tex-string)
>>
>>   What about ObjC here? or are these keywords good for ObjC as well?
>>
>> has the following reply: Objective C .m files are not parsed by speedbar.el in current repository code, so the
>> added feature does not break anything.  Issues will only arise if/when Emacs maintainers for Objective C
>> support decide on adding this file format to the speedbar parser.   It would be premature (and out-of-place)
>> for me to settle this on my own.  Should this move happen, the heuristics used in etags.c (function
>> test_objc_is_mercury) could then be ported to elisp code.
> OK, so please add there a comment to say that .m is also Objective C,
> but Speedbar doesn't support it yet.
>
> Thanks.

0001-Add-etags-support-for-Mercury-v0.4.patch (143K) Download Attachment