Another scan for regexp mistakes, based on a slightly improved detector that finds more nested repetitions.
|
On Sun, 1 Dec 2019 at 22:42, Mattias Engdegård <[hidden email]> wrote: Another scan for regexp mistakes, based on a slightly improved detector that finds more nested repetitions. Hi Mattias, Many of these are already fixed, on master. Did you scan the wrong tree? |
2 dec. 2019 kl. 11.14 skrev Richard Copley <[hidden email]>:
> Many of these are already fixed, on master. Did you scan the wrong tree? Wrong file attached, my apologies. Here is the right one. |
Thanks for the scan; I installed the attached.
|
Paul Eggert <[hidden email]> writes:
> Thanks for the scan; I installed the attached. > > diff --git a/lisp/mail/rfc2368.el b/lisp/mail/rfc2368.el > index 05f27e4d99..b658ffab58 100644 > --- a/lisp/mail/rfc2368.el > +++ b/lisp/mail/rfc2368.el > @@ -61,7 +61,7 @@ > ;; only an approximation? > ;; see rfc 1738 > (defconst rfc2368-mailto-regexp > - "^\\(mailto:\\)\\([^?]+\\)*\\(\\?\\(.*\\)\\)*" > + "^\\(mailto:\\)\\([^?]+\\)?\\(\\?\\(.*\\)\\)*" > "Regular expression to match and aid in parsing a mailto url.") > > ;; describes 'mailto:' Wouldn't "^\\(mailto:\\)\\([^?]*\\)\\(\\?\\(.*\\)\\)*" make more sense? Maybe even without the grouping, as it doesn't seem to be used. If I'm not mistaken (and I might be) this matches "mailto:" followed by anything as long as it contains at most one question mark. > diff --git a/lisp/progmodes/verilog-mode.el b/lisp/progmodes/verilog-mode.el > index 0afbdc3dd1..6ec8d995c1 100644 > --- a/lisp/progmodes/verilog-mode.el > +++ b/lisp/progmodes/verilog-mode.el > @@ -10973,7 +10973,7 @@ verilog-inject-inst > (t > ;; Delete identical interconnect > (let ((case-fold-search nil)) ; So we don't convert upper-to-lower, etc > - (while (verilog-re-search-forward-quick "\\.\\s *\\([a-zA-Z0-9`_$]+\\)*\\s *(\\s *\\1\\s *)\\s *" end-pt t) > + (while (verilog-re-search-forward-quick "\\.\\s *\\([a-zA-Z0-9`_$]+\\)?\\s *(\\s *\\1\\s *)\\s *" end-pt t) > (delete-region (match-beginning 0) (match-end 0)) > (setq end-pt (- end-pt (- (match-end 0) (match-beginning 0)))) ; Keep it correct > (while (or (looking-at "[ \t\n\f,]+") Similarly, in this one I think you could also replace the `+' with a `*' and leave out the `?'. -- Michael Welsh Duggan ([hidden email]) |
In reply to this post by Paul Eggert
5 dec. 2019 kl. 01.57 skrev Paul Eggert <[hidden email]>:
> Thanks for the scan; I installed the attached. Thank you, looks very reasonable. Since repetition of capture groups (like "\\(a.\\)+" is usually a mistake I tried to locate those specifically, but it's hard to know for sure that a group is unused. Bonus: fresh from newly committed Org changes! lisp/org/org-agenda.el:7673:40: In call to string-match: Unescaped literal `+' (pos 1) "^+[-+]" .^ lisp/org/org.el:7828:34: In call to string-match: Duplicated `\' inside character alternative (pos 12) "\\`[ \t]*\\([\\+\\-]?[0-9]+\\)\\([dwmy]\\)[ \t]*\\'" ................^ |
In reply to this post by Michael Welsh Duggan-5
5 dec. 2019 kl. 03.10 skrev Michael Welsh Duggan <[hidden email]>:
>> - "^\\(mailto:\\)\\([^?]+\\)*\\(\\?\\(.*\\)\\)*" >> + "^\\(mailto:\\)\\([^?]+\\)?\\(\\?\\(.*\\)\\)*" > > Wouldn't "^\\(mailto:\\)\\([^?]*\\)\\(\\?\\(.*\\)\\)*" make more sense? > Maybe even without the grouping, as it doesn't seem to be used. If I'm > not mistaken (and I might be) this matches "mailto:" followed by anything > as long as it contains at most one question mark. Group 2 is actually used (see rfc2368-mailto-prequery-index). Although your suggested change does not alter the set of matched strings, the code would need to be altered as well, since group 2 would then be the empty string instead of nil when it did not match anything. (The group around mailto: is harder to justify.) >> - (while (verilog-re-search-forward-quick "\\.\\s *\\([a-zA-Z0-9`_$]+\\)*\\s *(\\s *\\1\\s *)\\s *" end-pt t) >> + (while (verilog-re-search-forward-quick "\\.\\s *\\([a-zA-Z0-9`_$]+\\)?\\s *(\\s *\\1\\s *)\\s *" end-pt t) > > Similarly, in this one I think you could also replace the `+' with a `*' > and leave out the `?'. Group 1 is used in a backref, which only matches if the group matched. With your change, the regexp would match ".()", which isn't matched at present. The repetition-of-repetition check does not complain about \(X+\)? for these reasons. It is also not a performance concern. |
5 dec. 2019 kl. 12.38 skrev Mattias Engdegård <[hidden email]>:
> > 5 dec. 2019 kl. 03.10 skrev Michael Welsh Duggan <[hidden email]>: > >>> - (while (verilog-re-search-forward-quick "\\.\\s *\\([a-zA-Z0-9`_$]+\\)*\\s *(\\s *\\1\\s *)\\s *" end-pt t) >>> + (while (verilog-re-search-forward-quick "\\.\\s *\\([a-zA-Z0-9`_$]+\\)?\\s *(\\s *\\1\\s *)\\s *" end-pt t) >> >> Similarly, in this one I think you could also replace the `+' with a `*' >> and leave out the `?'. > > Group 1 is used in a backref, which only matches if the group matched. With your change, the regexp would match ".()", which isn't matched at present. Correction: the '?' can be removed (keeping the '+') since the \1 is matched unconditionally. |
On 12/5/19 4:11 AM, Mattias Engdegård wrote:
> Correction: the '?' can be removed (keeping the '+') since the \1 is matched unconditionally. Thanks, I installed the attached patch to fix that and the two new Org problems you mentioned today. |
Free forum by Nabble | Edit this page |