Inefficient code in xml.el

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

Inefficient code in xml.el

Kim Storm

I noticed that xml.el has several occurences of code like this:

           ((looking-at (concat "<!ENTITY[ \t\n\r]*\\(" xml-name-re
                                "\\)[ \t\n\r]*\\(" xml-entity-value-re
                                "\\)[ \t\n\r]*>"))
            (let ((name  (buffer-substring (nth 2 (match-data))
                                           (nth 3 (match-data))))
                  (value (buffer-substring (+ (nth 4 (match-data)) 1)
                                           (- (nth 5 (match-data)) 1))))
              (goto-char (nth 1 (match-data)))


Using (match-data) like that is VERY, VERY inefficient.

Use either match-beginning/match-end, or match-string instead.


--
Kim F. Storm <[hidden email]> http://www.cua.dk



_______________________________________________
Emacs-devel mailing list
[hidden email]
http://lists.gnu.org/mailman/listinfo/emacs-devel
Reply | Threaded
Open this post in threaded view
|

Re: Inefficient code in xml.el

David Kastrup
[hidden email] (Kim F. Storm) writes:

> I noticed that xml.el has several occurences of code like this:
>
>   ((looking-at (concat "<!ENTITY[ \t\n\r]*\\(" xml-name-re
> "\\)[ \t\n\r]*\\(" xml-entity-value-re
> "\\)[ \t\n\r]*>"))
>    (let ((name  (buffer-substring (nth 2 (match-data))
>   (nth 3 (match-data))))
>  (value (buffer-substring (+ (nth 4 (match-data)) 1)
>   (- (nth 5 (match-data)) 1))))
>      (goto-char (nth 1 (match-data)))
>
>
> Using (match-data) like that is VERY, VERY inefficient.

Uh, that is only half the story.  It is not just devastatingly
inefficient by itself.  Every single such call creates a whole slew of
markers, and those slow down _any_ text manipulation in the buffer
_afterwards_ until they get garbage-collected at some indeterminate
point of time in the future.

The code passage above creates 30 new markers _every_ time it is run.
All of these are maintained for every insertion/deletion in the buffer
until garbage collection finally removes them.

> Use either match-beginning/match-end, or match-string instead.

Yes, yes, YES!

--
David Kastrup, Kriemhildstr. 15, 44793 Bochum


_______________________________________________
Emacs-devel mailing list
[hidden email]
http://lists.gnu.org/mailman/listinfo/emacs-devel
Reply | Threaded
Open this post in threaded view
|

RE: Inefficient code in xml.el

klaus.berndl
In reply to this post by Kim Storm
Well, i get the point, but how an elisp-programmer should know this?!

The manual says:

 - Function: match-data
     This function returns a newly constructed list containing all the
     information on what text the last search matched.  Element zero is
     the position of the beginning of the match for the whole
     expression; element one is the position of the end of the match
     for the expression.  The next two elements are the positions of
     the beginning and end of the match for the first subexpression,
     and so on.  In general, element number 2N corresponds to
     `(match-beginning N)'; and element number 2N + 1 corresponds to
     `(match-end N)'.

     All the elements are markers or `nil' if matching was done on a
     buffer, and all are integers or `nil' if matching was done on a
     string with `string-match'.

     As always, there must be no possibility of intervening searches
     between the call to a search function and the call to `match-data'
     that is intended to access the match data for that search.

          (match-data)
               =>  (#<marker at 9 in foo>
                    #<marker at 17 in foo>
                    #<marker at 13 in foo>
                    #<marker at 17 in foo>)

What about to mention such inefficiency-problems in the documentation?!
The manual only says "... Corresponds to ....". IMHO the documentation
must mention the performnace-topic if it is so important you wrote in
these postings!

Ciao,
Klaus


David Kastrup wrote:

> [hidden email] (Kim F. Storm) writes:
>
>> I noticed that xml.el has several occurences of code like this:
>>
>>   ((looking-at (concat "<!ENTITY[ \t\n\r]*\\(" xml-name-re
>> "\\)[ \t\n\r]*\\(" xml-entity-value-re
>> "\\)[ \t\n\r]*>"))
>>    (let ((name  (buffer-substring (nth 2 (match-data))
>>   (nth 3 (match-data))))
>>  (value (buffer-substring (+ (nth 4 (match-data)) 1)
>>   (- (nth 5 (match-data)) 1))))
>>      (goto-char (nth 1 (match-data)))
>>
>>
>> Using (match-data) like that is VERY, VERY inefficient.
>
> Uh, that is only half the story.  It is not just devastatingly
> inefficient by itself.  Every single such call creates a whole slew of
> markers, and those slow down _any_ text manipulation in the buffer
> _afterwards_ until they get garbage-collected at some indeterminate
> point of time in the future.
>
> The code passage above creates 30 new markers _every_ time it is run.
> All of these are maintained for every insertion/deletion in the buffer
> until garbage collection finally removes them.
>
>> Use either match-beginning/match-end, or match-string instead.
>
> Yes, yes, YES!


_______________________________________________
Emacs-devel mailing list
[hidden email]
http://lists.gnu.org/mailman/listinfo/emacs-devel
Reply | Threaded
Open this post in threaded view
|

Re: Inefficient code in xml.el

Kevin Rodgers
[hidden email] wrote:
 > Well, i get the point, but how an elisp-programmer should know this?!
 >
 > The manual says:
 >
 >  - Function: match-data
 >      This function returns a newly constructed list containing all the
 >      information on what text the last search matched.  Element zero is
 >      the position of the beginning of the match for the whole
 >      expression; element one is the position of the end of the match
 >      for the expression.  The next two elements are the positions of
 >      the beginning and end of the match for the first subexpression,
 >      and so on.  In general, element number 2N corresponds to
 >      `(match-beginning N)'; and element number 2N + 1 corresponds to
 >      `(match-end N)'.
 >
 >      All the elements are markers or `nil' if matching was done on a
 >      buffer, and all are integers or `nil' if matching was done on a
 >      string with `string-match'.
 >
 >      As always, there must be no possibility of intervening searches
 >      between the call to a search function and the call to `match-data'
 >      that is intended to access the match data for that search.
 >
 >           (match-data)
 >                =>  (#<marker at 9 in foo>
 >                     #<marker at 17 in foo>
 >                     #<marker at 13 in foo>
 >                     #<marker at 17 in foo>)
 >
 > What about to mention such inefficiency-problems in the documentation?!
 > The manual only says "... Corresponds to ....". IMHO the documentation
 > must mention the performnace-topic if it is so important you wrote in
 > these postings!

Just change the first sentence to:

        This function returns a newly constructed list of newly
        allocated markers, containing all the information on what text
        the last search matched.

A reference to the Markers node there might be useful.  And this bit
from the Markers node should be emphasized:

        Insertion and deletion in a buffer must check all the markers
        and relocate them if necessary.  This slows processing in a
        buffer with a large number of markers.

--
Kevin Rodgers



_______________________________________________
Emacs-devel mailing list
[hidden email]
http://lists.gnu.org/mailman/listinfo/emacs-devel