Regexp linting scan

classic Classic list List threaded Threaded
9 messages Options
Reply | Threaded
Open this post in threaded view
|

Regexp linting scan

Mattias Engdegård-2
Another scan for regexp mistakes, based on a slightly improved detector that finds more nested repetitions.


relint.log (167K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Regexp linting scan

Richard Copley-2
On Sun, 1 Dec 2019 at 22:42, Mattias Engdegård <[hidden email]> wrote:
Another scan for regexp mistakes, based on a slightly improved detector that finds more nested repetitions.

Hi Mattias,
Many of these are already fixed, on master. Did you scan the wrong tree?

Reply | Threaded
Open this post in threaded view
|

Re: Regexp linting scan

Mattias Engdegård-2
2 dec. 2019 kl. 11.14 skrev Richard Copley <[hidden email]>:

> Many of these are already fixed, on master. Did you scan the wrong tree?

Wrong file attached, my apologies. Here is the right one.


relint.log (1K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Regexp linting scan

Paul Eggert
Thanks for the scan; I installed the attached.


0001-Fix-regex-repetition-of-repetitions.patch (4K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Regexp linting scan

Michael Welsh Duggan-5
Paul Eggert <[hidden email]> writes:

> Thanks for the scan; I installed the attached.
>
> diff --git a/lisp/mail/rfc2368.el b/lisp/mail/rfc2368.el
> index 05f27e4d99..b658ffab58 100644
> --- a/lisp/mail/rfc2368.el
> +++ b/lisp/mail/rfc2368.el
> @@ -61,7 +61,7 @@
>  ;; only an approximation?
>  ;; see rfc 1738
>  (defconst rfc2368-mailto-regexp
> -  "^\\(mailto:\\)\\([^?]+\\)*\\(\\?\\(.*\\)\\)*"
> +  "^\\(mailto:\\)\\([^?]+\\)?\\(\\?\\(.*\\)\\)*"
>    "Regular expression to match and aid in parsing a mailto url.")
>  
>  ;; describes 'mailto:'

Wouldn't "^\\(mailto:\\)\\([^?]*\\)\\(\\?\\(.*\\)\\)*" make more sense?
Maybe even without the grouping, as it doesn't seem to be used.  If I'm
not mistaken (and I might be) this matches "mailto:" followed by anything
as long as it contains at most one question mark.

> diff --git a/lisp/progmodes/verilog-mode.el b/lisp/progmodes/verilog-mode.el
> index 0afbdc3dd1..6ec8d995c1 100644
> --- a/lisp/progmodes/verilog-mode.el
> +++ b/lisp/progmodes/verilog-mode.el
> @@ -10973,7 +10973,7 @@ verilog-inject-inst
>   (t
>   ;; Delete identical interconnect
>                   (let ((case-fold-search nil))  ; So we don't convert upper-to-lower, etc
> -   (while (verilog-re-search-forward-quick "\\.\\s *\\([a-zA-Z0-9`_$]+\\)*\\s *(\\s *\\1\\s *)\\s *" end-pt t)
> +   (while (verilog-re-search-forward-quick "\\.\\s *\\([a-zA-Z0-9`_$]+\\)?\\s *(\\s *\\1\\s *)\\s *" end-pt t)
>       (delete-region (match-beginning 0) (match-end 0))
>                       (setq end-pt (- end-pt (- (match-end 0) (match-beginning 0))))  ; Keep it correct
>       (while (or (looking-at "[ \t\n\f,]+")

Similarly, in this one I think you could also replace the `+' with a `*'
and leave out the `?'.

--
Michael Welsh Duggan
([hidden email])

Reply | Threaded
Open this post in threaded view
|

Re: Regexp linting scan

Mattias Engdegård-2
In reply to this post by Paul Eggert
5 dec. 2019 kl. 01.57 skrev Paul Eggert <[hidden email]>:

> Thanks for the scan; I installed the attached.

Thank you, looks very reasonable.

Since repetition of capture groups (like "\\(a.\\)+" is usually a mistake I tried to locate those specifically, but it's hard to know for sure that a group is unused.

Bonus: fresh from newly committed Org changes!

lisp/org/org-agenda.el:7673:40: In call to string-match: Unescaped literal `+' (pos 1)
  "^+[-+]"
   .^
lisp/org/org.el:7828:34: In call to string-match: Duplicated `\' inside character alternative (pos 12)
  "\\`[ \t]*\\([\\+\\-]?[0-9]+\\)\\([dwmy]\\)[ \t]*\\'"
   ................^



Reply | Threaded
Open this post in threaded view
|

Re: Regexp linting scan

Mattias Engdegård-2
In reply to this post by Michael Welsh Duggan-5
5 dec. 2019 kl. 03.10 skrev Michael Welsh Duggan <[hidden email]>:

>> -  "^\\(mailto:\\)\\([^?]+\\)*\\(\\?\\(.*\\)\\)*"
>> +  "^\\(mailto:\\)\\([^?]+\\)?\\(\\?\\(.*\\)\\)*"
>
> Wouldn't "^\\(mailto:\\)\\([^?]*\\)\\(\\?\\(.*\\)\\)*" make more sense?
> Maybe even without the grouping, as it doesn't seem to be used.  If I'm
> not mistaken (and I might be) this matches "mailto:" followed by anything
> as long as it contains at most one question mark.

Group 2 is actually used (see rfc2368-mailto-prequery-index). Although your suggested change does not alter the set of matched strings, the code would need to be altered as well, since group 2 would then be the empty string instead of nil when it did not match anything.
(The group around mailto: is harder to justify.)

>> -   (while (verilog-re-search-forward-quick "\\.\\s *\\([a-zA-Z0-9`_$]+\\)*\\s *(\\s *\\1\\s *)\\s *" end-pt t)
>> +   (while (verilog-re-search-forward-quick "\\.\\s *\\([a-zA-Z0-9`_$]+\\)?\\s *(\\s *\\1\\s *)\\s *" end-pt t)
>
> Similarly, in this one I think you could also replace the `+' with a `*'
> and leave out the `?'.

Group 1 is used in a backref, which only matches if the group matched. With your change, the regexp would match ".()", which isn't matched at present.

The repetition-of-repetition check does not complain about \(X+\)? for these reasons. It is also not a performance concern.


Reply | Threaded
Open this post in threaded view
|

Re: Regexp linting scan

Mattias Engdegård-2
5 dec. 2019 kl. 12.38 skrev Mattias Engdegård <[hidden email]>:

>
> 5 dec. 2019 kl. 03.10 skrev Michael Welsh Duggan <[hidden email]>:
>
>>> -   (while (verilog-re-search-forward-quick "\\.\\s *\\([a-zA-Z0-9`_$]+\\)*\\s *(\\s *\\1\\s *)\\s *" end-pt t)
>>> +   (while (verilog-re-search-forward-quick "\\.\\s *\\([a-zA-Z0-9`_$]+\\)?\\s *(\\s *\\1\\s *)\\s *" end-pt t)
>>
>> Similarly, in this one I think you could also replace the `+' with a `*'
>> and leave out the `?'.
>
> Group 1 is used in a backref, which only matches if the group matched. With your change, the regexp would match ".()", which isn't matched at present.

Correction: the '?' can be removed (keeping the '+') since the \1 is matched unconditionally.


Reply | Threaded
Open this post in threaded view
|

Re: Regexp linting scan

Paul Eggert
On 12/5/19 4:11 AM, Mattias Engdegård wrote:

> Correction: the '?' can be removed (keeping the '+') since the \1 is matched unconditionally.

Thanks, I installed the attached patch to fix that and the two new Org
problems you mentioned today.


0001-2019-12-05-regexp-lint-fixes.patch (2K) Download Attachment