Text property searching


Text property searching

Lars Ingebrigtsen
I've suggested a few times before that it would be nice to have search
functions that are... nicer... than the ones we have now
(`text-property-any' and `next-single-property-change'), and the
maintainer(s) at the time said "sure".  I think.

But I never implemented that wonderful function, because I could never
decide what it would look like.

But last night I think I got it: It should be just like search-forward,
only not.  (Hm.  I'm feeling a slight sense of deja vu while typing
this -- have I had this revelation before but forgotten about it?)

Anyway:

Let's say you have a region in the buffer that has the text property
`shr-url' with the value "http://fsf.org/", then:

(text-property-search-forward 'shr-url "http://fsf.org/" t)

would place point at the end of that region, and `match-beginning' and
`match-end' would point to the start and end.

The `t' there is the predicate: `t' means "equal", `nil' means "not
equal", and then you can write your own predicates for other uses.
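
As a minimal sketch of those three cases: the helper `my-prop-match-p'
below is hypothetical and not part of the proposal.  It only
illustrates how a predicate of `t', `nil', or a function might be
applied; the argument order for a custom predicate (search value
first, property value second) is an assumption.

```elisp
;; Hypothetical helper, for illustration only.
(defun my-prop-match-p (value prop-value predicate)
  "Return non-nil if PROP-VALUE matches VALUE according to PREDICATE.
PREDICATE is t (meaning `equal'), nil (meaning \"not equal\"), or a
function called with VALUE and PROP-VALUE (assumed order)."
  (cond
   ((eq predicate t) (equal value prop-value))
   ((null predicate) (not (equal value prop-value)))
   (t (funcall predicate value prop-value))))

(my-prop-match-p "http://fsf.org/" "http://fsf.org/" t)  ; => t
(my-prop-match-p nil "http://fsf.org/" nil)              ; => t
(my-prop-match-p 'bold 'bold #'eq)                       ; => t
```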

So, to collect all urls from text properties, you'd write:

(while (text-property-search-forward 'shr-url nil nil)
  (push (get-text-property (match-beginning 0) 'shr-url) urls))

and that's it.  Or to collect all images:

(while (text-property-search-forward 'display 'image
                                     (lambda (elem val)
                                       (and (consp elem)
                                            (eq (car elem) val))))
  (push (plist-get (cdr (get-text-property (match-beginning 0) 'display)) :data)
        images))

Does this look OK to everybody?

--
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no



Re: Text property searching

T.V Raman
LGTM -- it would make a lot of the code in emacspeak for EWW support a
lot nicer to start with:-)
--


Re: Text property searching

Dmitry Gutov
In reply to this post by Lars Ingebrigtsen
On 4/16/18 1:56 AM, Lars Ingebrigtsen wrote:

> Let's say you have a region in the buffer that has the text property
> `shr-url' with the value "http://fsf.org/", then:
>
> (text-property-search-forward 'shr-url "http://fsf.org/" t)
>
> would place point at the end of that region, and `match-beginning' and
> `match-end' would point to the start and end.

Sounds quite nice.

> The `t' there is the predicate: `t' means "equal", `nil' means "not
> equal", and then you can write your own predicates for other uses.

"Equals or includes" should be another popular predicate (think faces).


Re: Text property searching

Lars Ingebrigtsen
Dmitry Gutov <[hidden email]> writes:

>> The `t' there is the predicate: `t' means "equal", `nil' means "not
>> equal", and then you can write your own predicates for other uses.
>
> "Equals or includes" should be another popular predicate (think faces).

Yes, that's true...  We could have a special symbol for that, or would
it be confusing?



Re: Text property searching

Dmitry Gutov
On 4/16/18 3:01 PM, Lars Ingebrigtsen wrote:
> Dmitry Gutov <[hidden email]> writes:
>
>>> The `t' there is the predicate: `t' means "equal", `nil' means "not
>>> equal", and then you can write your own predicates for other uses.
>>
>> "Equals or includes" should be another popular predicate (think faces).
>
> Yes, that's true...  We could have a special symbol for that<...>
I'd like that.


RE: Text property searching

Drew Adams
In reply to this post by Lars Ingebrigtsen
> >> The `t' there is the predicate: `t' means "equal", `nil' means "not
> >> equal", and then you can write your own predicates for other uses.
> >
> > "Equals or includes" should be another popular predicate (think faces).
>
> Yes, that's true...  We could have a special symbol for that, or would
> it be confusing?

FWIW -

My library `isearch-prop.el' has long let you Isearch zones
that have arbitrary text-property or overlay-property values.

I agree that an eq/equal-or-memq/member predicate can be
useful.  But it's not really enough when it comes to dealing
with properties, including but not limited to `face' and
similar (whose values can combine for an accumulated effect).

Like what you propose, the code I use lets you use an
arbitrary predicate, but matching allows for matches that
involve overlap of property values, in this sense: If the
PROPERTY value is an atom then it must be a member of the
set of test VALUES, but if the PROPERTY value is a list,
then at least one of its elements must be a member of VALUES.
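
The overlap rule just described can be sketched on its own, separately
from the library code.  The helper name `my-prop-overlaps-p' is
hypothetical and is not taken from isearch-prop.el; it assumes values
are compared with `member' (that is, `equal') and that list values are
proper lists.

```elisp
;; Hypothetical helper illustrating the overlap rule only.
(defun my-prop-overlaps-p (prop-value values)
  "Return non-nil if PROP-VALUE overlaps the list VALUES.
An atomic PROP-VALUE must be a member of VALUES.  A list PROP-VALUE
must have at least one element that is a member of VALUES."
  (if (consp prop-value)
      (catch 'found
        (dolist (item prop-value)
          (when (member item values)
            (throw 'found t)))
        nil)
    (and (member prop-value values) t)))

(my-prop-overlaps-p 'bold '(bold italic))                ; => t
(my-prop-overlaps-p '(underline italic) '(bold italic))  ; => t
(my-prop-overlaps-p '(underline) '(bold italic))         ; => nil
```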

https://www.emacswiki.org/emacs/download/isearch-prop.el

---

This is the crux of the property-matching & predicate code:

(defun isearchp-property-matches-p (type property values
                                    match-fn position)
  "Return non-nil if POSITION has PROPERTY with a value matching VALUES.
TYPE is `overlay', `text', or nil, and specifies the type of property.
TYPE nil means look for both overlay and text properties.  Return
 non-nil if either matches.

Matching means finding text with a PROPERTY value that overlaps with
VALUES: If the value of PROPERTY is an atom, then it must be a member
of VALUES.  If it is a list, then at least one list element must be a
member of VALUES.

MATCH-FN is a binary predicate that is applied to each item of VALUES
and a zone of text with property PROP.  If it returns non-nil then the
zone is a search hit."
  (let* ((ov-matches-p   nil)
         (txt-matches-p  nil)
         (ovchk-p        (and (or (not type)  (eq type 'overlay))))
         (ovs            (and ovchk-p  (overlays-at position))))
    (when ovchk-p
      (setq ov-matches-p
            (catch 'i-p-m-p
              (dolist (ov  ovs)
                (when (isearchp-some
                       values (overlay-get ov property) match-fn)
                  (throw 'i-p-m-p t)))
              nil)))
    (when (and (or (not type)  (eq type 'text)))
      (setq txt-matches-p
            (isearchp-some
             values (get-text-property position property) match-fn)))
    (or ov-matches-p  txt-matches-p)))

(defun isearchp-property-filter-pred (type property values)
  "Return a predicate that uses `isearchp-property-matches-p'.
TYPE, PROPERTY, and VALUES are used by that function.
The predicate is suitable as a value of `isearch-filter-predicate'."
  (let ((tag  (make-symbol "isearchp-property-filter-pred")))
    `(lambda (beg end)
       (and (or (not (boundp 'isearchp-reg-beg))
                (not isearchp-reg-beg)
                (>= beg isearchp-reg-beg))
            (or (not (boundp 'isearchp-reg-end))
                (not isearchp-reg-end)
                (< end isearchp-reg-end))
            (or (isearch-filter-visible beg end)
                (not (or (eq search-invisible t)
                         (not (isearch-range-invisible beg end)))))
            (catch ',tag
              (while (< beg end)
                (let ((matches-p
                       (isearchp-property-matches-p
                        ',type ',property
                        ',values
                        (isearchp-property-default-match-fn ',property)
                        beg)))
                  (unless (if matches-p
                              (not isearchp-complement-domain-p)
                            isearchp-complement-domain-p)
                    (throw ',tag nil)))
                (setq beg  (1+ beg)))
              t)))))


Re: Text property searching

Lars Ingebrigtsen
In reply to this post by Dmitry Gutov
Dmitry Gutov <[hidden email]> writes:

>>> "Equals or includes" should be another popular predicate (think faces).
>>
>> Yes, that's true...  We could have a special symbol for that<...>
> I'd like that.

On the other hand, perhaps we should just have a general predicate for
this not-uncommon thing?  `equal-or-member'?  It would literally be

(defun equal-or-member (thing collection)
  (or (equal thing collection)
      (and (consp collection)
           (member thing collection))))

We have a lot of data structures that can be lists or atoms, so I think
it might be generally useful...
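
For instance, with `face'-like values, which may be a single symbol or
a list of symbols, the function would behave as follows.  The
definition is repeated so the snippet evaluates on its own, and the
sample values are made up for illustration.

```elisp
(defun equal-or-member (thing collection)
  (or (equal thing collection)
      (and (consp collection)
           (member thing collection))))

(equal-or-member 'bold 'bold)           ; => t
(equal-or-member 'bold '(italic bold))  ; => (bold)
(equal-or-member 'bold 'italic)         ; => nil
```

If the predicate is called with the search value first and the
property value second (an assumption at this point), it could be
passed directly as the third argument of the proposed search function.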



Re: Text property searching

João Távora
In reply to this post by Lars Ingebrigtsen
Hi,

On Sun, Apr 15, 2018 at 11:56 PM, Lars Ingebrigtsen <[hidden email]> wrote:

> (text-property-search-forward 'shr-url "http://fsf.org/" t)
>
> would place point at the end of that region, and `match-beginning' and
> `match-end' would point to the start and end.

Great idea, I've wanted this badly in the past, too. Two cents:

1. What should happen if search starts in the region where the
property is already set?

2. Can we generalize this to work for searches for regions where the
property is set to some constant value and also for regions where the
property is just present. What about "not-present"? Or do you envision
this to be handled by the second and third arguments? Perhaps, in
addition to the other type of value, both could also be passed a
function: the second one a function of one arg, the buffer position,
producing a value, and the third one a function of two values
returning a boolean (this is vaguely CL's :key and :test, obviously).

Bye,
João

Re: Text property searching

Lars Ingebrigtsen
João Távora <[hidden email]> writes:

> 1. What should happen if search starts in the region where the
> property is already set?

I think it should give you a match starting at point and ending where
the property ends.  
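
A rough sketch of that behaviour, using only the existing
`next-single-property-change': the helper name
`my-prop-region-from-point' is hypothetical, and the buffer contents
are made up for the example.

```elisp
;; Hypothetical helper, for illustration only.
(defun my-prop-region-from-point (prop)
  "Return (START . END) for the run of PROP that point is inside.
START is point itself; END is where the value of PROP next changes.
Return nil if PROP is nil or absent at point."
  (when (get-text-property (point) prop)
    (cons (point)
          (next-single-property-change (point) prop nil (point-max)))))

(with-temp-buffer
  ;; "see " is positions 1-4, the link text is 5-11, " today" is 12-17.
  (insert "see " (propertize "the FSF" 'shr-url "http://fsf.org/") " today")
  (goto-char 7)                          ; inside the link text
  (my-prop-region-from-point 'shr-url))  ; => (7 . 12)
```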

> 2. Can we generalize this to work for searches for regions where the
> property is set to some constant value and also for regions where the
> property is just present. What about "not-present"?

Well, that's what the two arguments do -- the match and the predicate,
so those are covered...

> Or do you envision this to be handled by the second and third
> arguments? Perhaps, in addition to the other type of value, both could
> also be passed a function: the second one a function of one arg, the
> buffer position, producing a value, and the third one a function of
> two values returning a boolean (this is vaguely CL's :key and :test,
> obviously).

Hm...  I don't quite see the need for the single-value function (i.e.,
the :key function) because the predicate can do whatever it wants.



Re: Text property searching

João Távora
[I missed emacs-devel in my last email, sorry. Should use Gnus :-)]

On Mon, Apr 16, 2018 at 5:32 PM, Lars Ingebrigtsen <[hidden email]> wrote:

>
> João Távora <[hidden email]> writes:
>
> > Just a CL-style convenience (wouldn't your reasoning apply to
> > the second argument in general?). Perhaps, as an example,
> > you could clarify a bit better what exactly is passed to the third
> >  function in case the second property [i meant argument] is nil.
>
> The arguments to the predicate will be the same in any case -- the first
> argument is the VALUE (in this case nil) and the second is the text
> property value.
>
> It'll probably become a bit clearer after I've written
> the function and some documentation and added some examples.
> I'm typing away as we speak.  I mean mail.  :-)
 
OK. In your example tho, I think you will need to distinguish the case where
the property's value is nil from the case where the property isn't set at all.

Re: Text property searching

Lars Ingebrigtsen
João Távora <[hidden email]> writes:

> OK. In your example tho, I think you will need to distinguish the case where
> the property's value is nil from the case where the property isn't set at
> all.

Hm...  is that a distinction that makes a difference anywhere?



Re: Text property searching

Lars Ingebrigtsen
In reply to this post by Lars Ingebrigtsen
Lars Ingebrigtsen <[hidden email]> writes:

> So, to collect all urls from text properties, you'd write:
>
> (while (text-property-search-forward 'shr-url nil nil)
>   (push (get-text-property (match-beginning 0) 'shr-url) urls))

Hm, I'm writing the documentation now, and I wonder whether it would be
cleaner and more convenient to just return a data structure here to
avoid messing with the match state...  It could be, for instance, a
structure with nice accessors like

(prop-match-start match)

and stuff...



Re: Text property searching

João Távora
In reply to this post by Lars Ingebrigtsen
On Mon, Apr 16, 2018 at 5:57 PM, Lars Ingebrigtsen <[hidden email]> wrote:
> João Távora <[hidden email]> writes:
>
> > OK. In your example tho, I think you will need to distinguish the case where
> > the property's value is nil from the case where the property isn't set at
> > all.
>
> Hm...  is that a distinction that makes a difference anywhere?
 
Perhaps we are misunderstanding each other. I asked you earlier
if the new function can grab regions of the buffer where a particular
property isn't present, which is different from that property being nil.
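
A small sketch of that difference, using a string rather than a buffer
and a made-up property: `get-text-property' returns nil in both cases,
while the property list from `text-properties-at' shows whether the
property is there at all.  Whether the new function should care about
this is a separate question.

```elisp
(let ((unset  "x")                              ; no properties at all
      (as-nil (propertize "x" 'shr-url nil)))   ; property present, value nil
  (list (get-text-property 0 'shr-url unset)
        (get-text-property 0 'shr-url as-nil)
        (plist-member (text-properties-at 0 unset) 'shr-url)
        (plist-member (text-properties-at 0 as-nil) 'shr-url)))
;; => (nil nil nil (shr-url nil))
```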

Re: Text property searching

Lars Ingebrigtsen
João Távora <[hidden email]> writes:

> Perhaps we are misunderstanding each other. I asked you earlier
> if the new function can grab regions of the buffer where a particular
> property isn't present, which is different from that property being nil.

I misunderstood.  No, I wasn't aware that there was any difference
between "not being present" and "being nil" for text properties.



Re: Text property searching

Eli Zaretskii
In reply to this post by Lars Ingebrigtsen
> From: Lars Ingebrigtsen <[hidden email]>
> Date: Mon, 16 Apr 2018 14:01:23 +0200
> Cc: [hidden email]
>
> > "Equals or includes" should be another popular predicate (think faces).
>
> Yes, that's true...  We could have a special symbol for that, or would
> it be confusing?

An alternative would be to define "meta-properties", like
'foreground-color', 'font', 'weight', etc.


Re: Text property searching

Lars Ingebrigtsen
In reply to this post by Lars Ingebrigtsen
I've now implemented this on the scratch/prop-search branch.

It got a bit more convoluted than I originally thought, but I think it
should do what you'd expect now.  The subtleties are between searching
for things that don't match, and searching for nothing that doesn't
match.

The known unknowns and the unknown knowns.

I'm sure.




Re: Text property searching

Eli Zaretskii
In reply to this post by Lars Ingebrigtsen
> From: Lars Ingebrigtsen <[hidden email]>
> Date: Mon, 16 Apr 2018 17:11:04 +0200
> Cc: [hidden email]
>
> (defun equal-or-member (thing collection)
>   (or (equal thing collection)
>       (and (consp collection)
>            (member thing collection))))

This seems to assume a flat one-level list, but some popular
properties are more complex.  E.g., see the 'display' properties.
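
To illustrate with made-up values (the definition is repeated from the
quoted text so the snippet evaluates on its own): a `display' value
can be a list of specs, and a symbol nested inside one of those specs
is not found, because `member' only compares whole top-level elements.

```elisp
(defun equal-or-member (thing collection)
  (or (equal thing collection)
      (and (consp collection)
           (member thing collection))))

;; A flat list of faces works as intended.
(equal-or-member 'bold '(italic bold))                   ; => (bold)

;; A list of display specs is nested, so the symbol is not found.
(equal-or-member 'raise '((height 1.5) (raise 0.3)))     ; => nil
```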


Re: Text property searching

Lars Ingebrigtsen
In reply to this post by Lars Ingebrigtsen
Below is a draft of the documentation of this function.  Does it all
make sense?  :-)

Should we perhaps go for a shorter name for this function?  It's a bit
of a mouthful, but I don't really have any ideas for a good, snappy name
here...

 -- Function: text-property-search-forward prop value predicate
     Search for the next region that has text property PROP set to VALUE
     according to PREDICATE.

     This function is modelled after ‘search-forward’ and friends in
     that it moves point, but it returns a structure that describes the
     match instead of returning it in ‘match-beginning’ and friends.

     If the text property can’t be found, the function returns ‘nil’.
     If it’s found, point is placed at the end of the region that has
     this text property match, and a ‘prop-match’ structure is returned.

     PREDICATE can either be ‘t’ (which is a synonym for ‘equal’), ‘nil’
     (which means “not equal”), or a predicate that will be called with
     two parameters: The first is VALUE, and the second is the value of
     the text property we’re inspecting.

     In the examples below, imagine that you’re in a buffer that looks
     like this:

          This is a bold and here's bolditalic and this is the end.

     That is, the “bold” words are the ‘bold’ face, and the “italic”
     word is in the ‘italic’ face.

     With point at the start:

          (while (setq match (text-property-search-forward 'face 'bold t))
            (push (buffer-substring (prop-match-beginning match) (prop-match-end match))
                  words))

     This will pick out all the words that use the ‘bold’ face.

          (while (setq match (text-property-search-forward 'face nil t))
            (push (buffer-substring (prop-match-beginning match) (prop-match-end match))
                  words))

     This will pick out all the bits that have no face properties, which
     will result in the list ‘("This is a " "and here's " "and this is
     the end")’ (only reversed, since we used ‘push’).

          (while (setq match (text-property-search-forward 'face nil nil))
            (push (buffer-substring (prop-match-beginning match) (prop-match-end match))
                  words))

     This will pick out all the regions where ‘face’ is set to
     something, but this is split up into where the properties change,
     so the result here will be ‘"bold" "bold" "italic"’.

     For a more realistic example where you might use this, consider
     that you have a buffer where certain sections represent URLs, and
     these are tagged with ‘shr-url’.

          (while (setq match (text-property-search-forward 'shr-url nil nil))
            (push (prop-match-value match) urls))

     This will give you a list of all those URLs.

---

Hm...  it strikes me now that the two last parameters should be
optional, since (text-property-search-forward 'shr-url) would then be
even more obvious in its meaning.




Re: Text property searching

Lars Ingebrigtsen
In reply to this post by Eli Zaretskii
Eli Zaretskii <[hidden email]> writes:

>> From: Lars Ingebrigtsen <[hidden email]>
>> Date: Mon, 16 Apr 2018 17:11:04 +0200
>> Cc: [hidden email]
>>
>> (defun equal-or-member (thing collection)
>>   (or (equal thing collection)
>>       (and (consp collection)
>>            (member thing collection))))
>
> This seems to assume a flat one-level list, but some popular
> properties are more complex.  E.g., see the 'display' properties.

Hm, yes that's true, so perhaps it wouldn't be all that useful here
anyway...



Re: Text property searching

Lars Ingebrigtsen
In reply to this post by Eli Zaretskii
Eli Zaretskii <[hidden email]> writes:

> An alternative would be to define "meta-properties", like
> 'foreground-color', 'font', 'weight', etc.

Yes, that might also be nice...

