Generation of tags for the current project on the fly

Dmitry Gutov
Here's an idea I've been working on. We generate tags for all files the
current project contains (except the ignored ones) when the user calls
one of the xref commands, but hasn't explicitly visited any tags table.

The result is used until they make a change in a file somewhere and save
the buffer, then the generated table is discarded.

I think it will be helpful for new users (who don't really know how to
generate tags), as well as people who are used to certain other editors
performing the indexing automatically, in small-to-medium sized
projects. With some effort, we could implement re-indexing and
invalidation on a more granular level (so it's usable in bigger projects
too), but transitioning to GNU Global would probably be better.

For reference, indexing the Emacs sources takes ~1.1sec here.

What do people think?

See the attached patch.

Attachment: project-auto-tags.diff (2K)
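
The policy described in this message (generate lazily on the first xref call, reuse the result until a save, then discard it) can be sketched independently of Emacs. The Python below is only an illustration and is not the attached patch; `ProjectTagsCache` and the `generate` callback are hypothetical names, and `generate` stands in for whatever actually runs etags.

```python
from typing import Callable, Dict


class ProjectTagsCache:
    """Lazily generated, per-project tags table that is dropped on save."""

    def __init__(self, generate: Callable[[str], str]) -> None:
        # `generate` is an assumed callback: it takes a project root and
        # returns the path of a freshly generated tags file.
        self._generate = generate
        self._tables: Dict[str, str] = {}

    def tags_for(self, project_root: str) -> str:
        # Generate only when a lookup is requested and nothing is cached.
        if project_root not in self._tables:
            self._tables[project_root] = self._generate(project_root)
        return self._tables[project_root]

    def on_save(self, project_root: str) -> None:
        # Coarse invalidation: any save in the project discards its table.
        self._tables.pop(project_root, None)
```

The trade-off is the one raised in the message: invalidation is trivially correct, at the cost of a full rescan on the next lookup after any save.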

Re: Generation of tags for the current project on the fly

Eli Zaretskii
> From: Dmitry Gutov <[hidden email]>
> Date: Fri, 12 Jan 2018 04:02:06 +0300
>
> Here's an idea I've been working on. We generate tags for all files the
> current project contains (except the ignored ones) when the user calls
> one of the xref commands, but hasn't explicitly visited any tags table.
>
> The result is used until they make a change in a file somewhere and save
> the buffer, then the generated table is discarded.

Why discard it after the first save?  The tags table is probably still
very much valid.  I'd not discard it until either of the following
happens:

  . we fail to find a tag
  . the user visits a tags table explicitly
  . the user switches to a different project(?)

> I think it will be helpful for new users (who don't really know how to
> generate tags), as well as people who are used to certain other editors
> performing the indexing automatically, in small-to-medium sized
> projects. With some effort, we could implement re-indexing and
> invalidation on a more granular level (so it's usable in bigger projects
> too), but transitioning to GNU Global would probably be better.

We could offer generating a tags table if we don't find one in the
tree, instead of generating it automatically.  I think this would be a
better UI and UX, especially given the time it could take to generate
TAGS (see below).

> For reference, indexing the Emacs sources takes ~1.1sec here.

Was that with cold cache or warm cache?

"make TAGS" takes about 9 sec here with a warm cache, and this is an
SSD disk.  On fencepost.gnu.org, a (somewhat slow) GNU/Linux system,
it took 12 sec with a cold cache and 4 sec with a warm cache.  And
Emacs is not a large project; I wonder what would happen in larger
ones, like GCC or glibc.

IOW, I don't think this is so fast that we could do that without user
approval.

> +         (extensions '("rb" "js" "py" "pl" "el" "c" "cpp" "cc" "h" "hh" "hpp"
> +                       "java" "go" "cl" "lisp" "prolog" "php" "erl" "hrl"
> +                       "F" "f" "f90" "for" "cs" "a" "asm" "ads" "adb" "ada"))
> +         (file-regexp (format "\\.%s\\'" (regexp-opt extensions))))
> +    (setq etags--project-tags-file (make-temp-file "emacs-project-tags-"))
> +    (with-temp-buffer
> +      (mapc (lambda (f)
> +              (when (string-match-p file-regexp f)
> +                (insert f "\n")))
> +            files)
> +      (shell-command-on-region (point-min) (point-max)
> +                               (format "%s - -o %s" etags-command etags--project-tags-file)
> +                               nil nil "*etags-project-tags-errors*" t))))

I don't understand why you didn't use the commonly used form:

   find . -name "*.rb" -o -name "*.js" ... | etags -o- -

Doing things the way you did raises issues with encoding of file
names, which could cause subtle problems in rare use cases.  I think
using 'find' is also faster.

More generally, I think doing this that way is not TRT, at least not
by default.  "make TAGS" in Emacs will produce a much richer tags
table than your method, because our Makefiles use regexps to augment
the automatic tagging in etags.  So I think we should first try to
invoke the TAGS target of a Makefile in the tree, if one exists, and
only use the naïve command as fallback.  And perhaps we should also
provide some customization for the command to be used (but that will
obviously not help newbies who didn't yet customize the project they
are working on).

Thanks.
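
For readers unfamiliar with the quoted Lisp, the file selection it performs amounts to roughly the following. This is a Python paraphrase offered as a sketch, with a shortened extension list; `select_source_files` and `etags_stdin_command` are hypothetical helper names. It relies on etags reading file names from standard input when given `-` as a file argument, as the quoted command line does.

```python
import re
from typing import Iterable, List

# Shortened, illustrative subset of the extension list in the quoted patch.
EXTENSIONS = ["rb", "js", "py", "el", "c", "h", "F", "f"]


def select_source_files(files: Iterable[str],
                        extensions: Iterable[str] = EXTENSIONS) -> List[str]:
    """Keep only files whose name ends in one of the given extensions."""
    alternatives = "|".join(re.escape(ext) for ext in extensions)
    # Anchored at the end of the string, like "\\'" in the Emacs regexp.
    pattern = re.compile(r"\.(?:%s)\Z" % alternatives)
    return [f for f in files if pattern.search(f)]


def etags_stdin_command(etags: str, output: str) -> List[str]:
    """Argument list telling etags to read file names from stdin."""
    return [etags, "-", "-o", output]
```

The match is case-sensitive, so "F" and "f" are distinct entries, as in the original list.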

Re: Generation of tags for the current project on the fly

Dmitry Gutov
On 1/12/18 12:01 PM, Eli Zaretskii wrote:

> Why discard it after the first save?  The tags table is probably still
> very much valid.

Indeed, it's a rough heuristic. I'm aiming for correctness here, not for
performance.

On the other hand, code navigation and editing are often fairly distinct
activities, you don't switch between the two too frequently. So waiting
a second or two when going from the latter to the former shouldn't be
too terrible.

> I'd not discard it until either of the following
> happens:
>
>    . we fail to find a tag

Not sure about this one. We can make this customizable, of course
(although the implementation might end up a bit convoluted), but IMO
it's not good for the default behavior.

Failing to find a tag is a valid result (some identifiers can be absent,
or defined somewhere else, e.g. in the libraries), and doing a rescan
each time that happens might be more annoying.

Further, some users will call C-u xref-find-definitions, look for the
new tag in the completion table, fail to find it there, and simply abort
without trying the search.

>    . the user visits a tags table explicitly

That, of course, works already.

>    . the user switches to a different project(?)

It's an omission currently, but yes, I fully intend to add this.

> We could offer generating a tags table if we don't find one in the
> tree, instead of generating it automatically.

And then what? Visit it? And make the user rescan manually every
time? I'm fine with this as an optional behavior (and it will also be an
improvement, of course, since generating tags is not exactly trivial for
new users, and even many not-so-new ones), but I don't want this for the
default.

> I think this would be a
> better UI and UX, especially given the time it could take to generate
> TAGS (see below).

Sublime Text, Atom and VS Code simply index the project code, AFAIK,
without extra prompts. I think we should try to show a similar
experience, even if it's not great for big projects. There are several
directions we can improve on it, but showing the user that "yes, we can
find-definition right away" is a good thing.

>> For reference, indexing the Emacs sources takes ~1.1sec here.
>
> Was that with cold cache or warm cache?

Warm, probably. But that's the relevant time, isn't it? We're mostly
wondering how long *reindexing* will take (because we're discussing
when to do it). The first indexing will take place anyway.

> "make TAGS" takes about 9 sec here with a warm cache, and this is an
> SSD disk.

'make tags' takes 1 second on my machine, with an NVMe disk.

> On fencepost.gnu.org, a (somewhat slow) GNU/Linux system,
> it took 12 sec with a cold cache and 4 sec with a warm cache.  And
> Emacs is not a large project; I wonder what would happen in larger
> ones, like GCC or glibc.

We can try to somehow detect very large projects, and helpfully offer to
visit a tags table instead. Anyway, M-x visit-tags-table still works.

> IOW, I don't think this is so fast that we could do that without user
> approval.

The argument here is that if the user called xref-find-definitions, it's
better to do a (long-ish) scan and show something, instead of failing.
They always have an option of C-g (we could also catch it and show
helpful instructions if the process took too long).

> I don't understand why you didn't use the commonly used form:
>
>     find . -name "*.rb" -o -name "*.js" ... | etags -o- -

Because the project API doesn't make this easy. Anyway, generating the
full list of files is relatively fast in comparison. At most, it took
like 30% of the whole time (and less in other cases). And we can speed
it up further independently (e.g. using git ls-files).

> Doing things the way you did raises issues with encoding of file
> names, which could cause subtle problems in rare use cases.

Well, I haven't seen them yet, and don't really understand how they're
going to happen. But we'll probably fix them, one way or another.

> I think
> using 'find' is also faster.

find is used under the covers. The difference is just that the
invocations of etags are only happening later.

> More generally, I think doing this that way is not TRT, at least not
> by default.  "make TAGS" in Emacs will produce a much richer tags
> table than your method, because our Makefiles use regexps to augment
> the automatic tagging in etags.  So I think we should first try to
> invoke the TAGS target of a Makefile in the tree, if one exists, and
> only use the naïve command as fallback.

'make tags' is very much specific to Emacs. We can introduce some kind
of protocol, of course, but my primary goal here is to improve the
out-of-the-box behavior.

Further, the task will have to write tags to stdout: the current code
saves the temporary tags file to /tmp, and there are reasons to do that.
Anyway, that part shouldn't be too hard.

A possible avenue for improvement is to somehow derive a multi-TAGS-files
structure (with their dependencies) from the project information. Still
thinking about it.

Re: Generation of tags for the current project on the fly

Eli Zaretskii
> Cc: [hidden email]
> From: Dmitry Gutov <[hidden email]>
> Date: Fri, 12 Jan 2018 16:52:21 +0300
>
> On the other hand, code navigation and editing are often fairly distinct
> activities, you don't switch between the two too frequently.

In my workflows, I do that all the time, because I don't always
remember the details of the functions I need to call in the code I'm
writing.

> >    . we fail to find a tag
>
> Not sure about this one. We can make this customizable, of course
> (although the implementation might end up a bit convoluted), but IMO
> it's not good for the default behavior.
>
> Failing to find a tag is a valid result (some identifiers can be absent,
> or defined somewhere else, e.g. in the libraries), and doing a rescan
> each time that happens might be more annoying.

If you maintain that scanning is fast, then the annoyance should be
minimal.

> > We could offer generating a tags table if we don't find one in the
> > tree, instead of generating it automatically.
>
> And then what? Visit it?

No, just do what you intended, but only after an approval.  It could
be that the user thought she already visited a tags table, or some
other mistake.

> >> For reference, indexing the Emacs sources takes ~1.1sec here.
> >
> > Was that with cold cache or warm cache?
>
> Warm, probably. But that's the relevant time, isn't it?

Not necessarily.  The first time a tree is scanned could well be
shortly after you start working on a project.

> We're mostly wondering how long *reindexing* will take (because
> we're discussing when to do it). The first indexing will take place
> anyway.
>
> > "make TAGS" takes about 9 sec here with a warm cache, and this is an
> > SSD disk.
>
> 'make tags' takes 1 second on my machine, with an NVMe disk.

I bet it will be even faster with a RAM disk.  But we shouldn't base
our decisions on such configurations, as that isn't the norm yet, I
think.

> > IOW, I don't think this is so fast that we could do that without user
> > approval.
>
> The argument here is that if the user called xref-find-definitions, it's
> better to do a (long-ish) scan and show something, instead of failing.

It could be a mistake, or the user could reconsider given the
question.  We do that with visiting large files, for example.

> > I don't understand why you didn't use the commonly used form:
> >
> >     find . -name "*.rb" -o -name "*.js" ... | etags -o- -
>
> Because the project API doesn't make this easy. Anyway, generating the
> full list of files is relatively fast in comparison.

Invoking 'find' will always be faster, as it's optimized for
traversing directory trees.

> > I think
> > using 'find' is also faster.
>
> find is used under the covers. The difference is just that the
> invocations of etags are only happening later.

No, the difference is also that in my example etags runs in parallel
with 'find', not in sequence.

> > More generally, I think doing this that way is not TRT, at least not
> > by default.  "make TAGS" in Emacs will produce a much richer tags
> > table than your method, because our Makefiles use regexps to augment
> > the automatic tagging in etags.  So I think we should first try to
> > invoke the TAGS target of a Makefile in the tree, if one exists, and
> > only use the naïve command as fallback.
>
> 'make tags' is very much specific to Emacs.

No, TAGS is a standard target in GNU Makefiles.

Re: Generation of tags for the current project on the fly

Dmitry Gutov
On 1/12/18 9:52 PM, Eli Zaretskii wrote:

> In my workflows, I do that all the time, because I don't always
> remember the details of the functions I need to call in the code I'm
> writing.

Sure, but not as often as you use completion-at-point, probably. Anyway,
what I said was an approximation/simplification. People's workflows are
bound to be different.

>> Failing to find a tag is a valid result (some identifiers can be absent,
>> or defined somewhere else, e.g. in the libraries), and doing a rescan
>> each time that happens might be more annoying.
>
> If you maintain that scanning is fast, then the annoyance should be
> minimal.

If scanning is fast, invalidate-on-save should be good enough. And it's
easier to implement (already is).

>>> We could offer generating a tags table if we don't find one in the
>>> tree, instead of generating it automatically.
>>
>> And then what? Visit it?
>
> No, just do what you intended, but only after an approval.  It could
> be that the user thought she already visited a tags table, or some
> other mistake.

OK, so if the user says yes, we "temporarily visit" the auto-generated
tags table. Then the user saves a file and that table gets invalidated
(or via some other mechanism), and we want to index it again. Ask again?

>>>> For reference, indexing the Emacs sources takes ~1.1sec here.
>>>
>>> Was that with cold cache or warm cache?
>>
>> Warm, probably. But that's the relevant time, isn't it?
>
> Not necessarily.  The first time a tree is scanned could well be
> shortly after you start working on a project.

Not sure what you mean. The tree has to be scanned *sometime* at least
once, hasn't it?

>> 'make tags' takes 1 second on my machine, with an NVMe disk.
>
> I bet it will be even faster with a RAM disk.  But we shouldn't base
> our decisions on such configurations, as that isn't the norm yet, I
> think.

NVMe is a bus for an actual storage device, though. Anyway, 1 second and
4 seconds are different, but not hugely different. And we haven't
optimized everything we could, yet.

For instance, could you try to see how long the generation of the
file list alone takes? And populating the buffer with it. But without
passing it to etags.

>>> IOW, I don't think this is so fast that we could do that without user
>>> approval.
>>
>> The argument here is that if the user called xref-find-definitions, it's
>> better to do a (long-ish) scan and show something, instead of failing.
>
> It could be a mistake, or the user could reconsider given the
> question.  We do that with visiting large files, for example.

That's a valid argument. On the other hand, they might not know how long
the indexing will take anyway.

>>> I don't understand why you didn't use the commonly used form:
>>>
>>>      find . -name "*.rb" -o -name "*.js" ... | etags -o- -
>>
>> Because the project API doesn't make this easy. Anyway, generating the
>> full list of files is relatively fast in comparison.
>
> Invoking 'find' will always be faster, as it's optimized for
> traversing directory trees.

'git ls-files' will probably be faster still.

>>> I think
>>> using 'find' is also faster.
>>
>> find is used under the covers. The difference is just that the
>> invocations of etags are only happening later.
>
> No, the difference is also that in my example etags runs in parallel
> with 'find', not in sequence.

That's what I was trying to say.

>> 'make tags' is very much specific to Emacs.
>
> No, TAGS is a standard target in GNU Makefiles.

OK, good to know. Two questions, then:

- Can we make it output the tags to stdout?
- Can we detect that a given Makefile has a proper TAGS target (that can
output to stdout)?

Not sure yet how to handle the TAGS files inclusions, though.

Re: Generation of tags for the current project on the fly

Eli Zaretskii
> Cc: [hidden email]
> From: Dmitry Gutov <[hidden email]>
> Date: Sun, 14 Jan 2018 05:05:04 +0300
>
> >>> We could offer generating a tags table if we don't find one in the
> >>> tree, instead of generating it automatically.
> >>
> >> And then what? Visit it?
> >
> > No, just do what you intended, but only after an approval.  It could
> > be that the user thought she already visited a tags table, or some
> > other mistake.
>
> OK, so if the user says yes, we "temporarily visit" the auto-generated
> tags table. Then the user saves a file and that table gets invalidated
> (or via some other mechanism), and we want to index it again. Ask again?

No, I think asking once per project should be enough.

> >> Warm, probably. But that's the relevant time, isn't it?
> >
> > Not necessarily.  The first time a tree is scanned could well be
> > shortly after you start working on a project.
>
> Not sure what you mean. The tree has to be scanned *sometime* at least
> once, hasn't it?

I mean the first time the tags table is required might very well be at
the beginning of working on a project, at which time the project
source tree is not yet in the cache.

> For instance, could you try to see how long the generation of the
> file list alone takes? And populating the buffer with it. But without
> passing it to etags.

What Lisp shall I use for that?

> > Invoking 'find' will always be faster, as it's optimized for
> > traversing directory trees.
>
> 'git ls-files' will probably be faster still.

Yes, but that only works in Git repositories.

> > No, TAGS is a standard target in GNU Makefiles.
>
> OK, good to know. Two questions, then:
>
> - Can we make it output the tags to stdout?

Not likely.  But you could just visit the TAGS file(s), no?

> - Can we detect that a given Makefile has a proper TAGS target (that can
> output to stdout)?

Maybe CEDET has something, but if not, searching for ^TAGS: should be
easy.

> Not sure yet how to handle the TAGS files inclusions, though.

"make TAGS" should handle it, as it does in Emacs.
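
Searching for `^TAGS:` could look like the following. This is a Python sketch only; `has_tags_target` is a hypothetical name. A plain text match like this does not see targets that come from included makefiles or generated rules, and the sketch additionally skips `TAGS:=` so that a variable assignment is not mistaken for a rule.

```python
import re

# A line starting with "TAGS:" that is not a ":=" variable assignment.
_TAGS_RULE = re.compile(r"^TAGS:(?!=)", re.MULTILINE)


def has_tags_target(makefile_text: str) -> bool:
    """Return True if the makefile text appears to define a TAGS rule."""
    return _TAGS_RULE.search(makefile_text) is not None
```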

Re: Generation of tags for the current project on the fly

Dmitry Gutov
On 1/14/18 7:21 PM, Eli Zaretskii wrote:

>> OK, so if the user says yes, we "temporarily visit" the auto-generated
>> tags table. Then the user saves a file and that table gets invalidated
>> (or via some other mechanism), and we want to index it again. Ask again?
>
> No, I think asking once per project should be enough.

Until the end of the current Emacs session? And ask again after restart?

What about if the user switches to a different project and then back?

> I mean the first time the tags table is required might very well be at
> the beginning of working on a project, at which time the project
> source tree is not yet in the cache.

Yes, and? The user will need it to be indexed either way, right?

There's also another optimization opportunity: performing reindexing in
an asynchronous fashion, in the background (maybe after a timeout, too),
after any file is changed and saved. This one comes with its own
tradeoffs, though.

>> For instance, could you try to see how long the generation of the
>> file list alone takes? And populating the buffer with it. But without
>> passing it to etags.
>
> What Lisp shall I use for that?

To measure the full time:

(benchmark 1 '(progn (etags--project-tags-cleanup)
                     (etags--maybe-use-project-tags)))

To measure the time to generate the list of files only:

(benchmark 1 '(all-completions "" (project-file-completion-table
                                   (project-current)
                                   (list default-directory))))

>>> Invoking 'find' will always be faster, as it's optimized for
>>> traversing directory trees.
>>
>> 'git ls-files' will probably be faster still.
>
> Yes, but that only works in Git repositories.

We can probably optimize for that use case these days. Git or some other
VCS is usually in place, especially in non-toy projects.

>>> No, TAGS is a standard target in GNU Makefiles.
>>
>> OK, good to know. Two questions, then:
>>
>> - Can we make it output the tags to stdout?
>
> Not likely.  But you could just visit the TAGS file(s), no?

Hmm, there are reasons not to do that in general, but if the way we
generate the files is known to be "right", they mostly disappear (except
for the implementation complexity: doing it this way and using temporary
files in the other case will require more code).

How do we figure which files to visit? Do we just visit src/TAGS and
expect the rest to be 'include'-d?

>> - Can we detect that a given Makefile has a proper TAGS target (that can
>> output to stdout)?
>
> Maybe CEDET has something, but if not, searching for ^TAGS: should be
> easy.
>
>> Not sure yet how to handle the TAGS files inclusions, though.
>
> "make TAGS" should handle it, as it does in Emacs.

So these questions have answers, good.

Here's another one: considering the reindexing costs are not always
negligible and depend on the size of a project, will there be actual
benefit to using the proposed scheme in GNU projects like Emacs, GCC and
others (those are the ones that use 'make TAGS')? Or is there a subset
of them, at least, which we expect to benefit?

Re: Generation of tags for the current project on the fly

John Yates-4
In reply to this post by Eli Zaretskii
> > > Invoking 'find' will always be faster, as it's optimized for
> > > traversing directory trees.
> >
> > 'git ls-files' will probably be faster still.
>
> Yes, but that only works in Git repositories.

The context of this discussion is _large_ projects.  My sense is
that git's efficiency relative to other SCM technologies means
that the larger the project the higher the likelihood of using git.

Also, when talking speed ripgrep has been a revelation:

  https://github.com/BurntSushi/ripgrep

Admittedly, ripgrep is written in Rust, but a scanner exploiting
similar ideas could change what we imagine to be a big project.

/john

Re: Generation of tags for the current project on the fly

Eli Zaretskii
In reply to this post by Dmitry Gutov
> Cc: [hidden email]
> From: Dmitry Gutov <[hidden email]>
> Date: Mon, 15 Jan 2018 04:44:58 +0300
>
> On 1/14/18 7:21 PM, Eli Zaretskii wrote:
>
> >> OK, so if the user says yes, we "temporarily visit" the auto-generated
> >> tags table. Then the user saves a file and that table gets invalidated
> >> (or via some other mechanism), and we want to index it again. Ask again?
> >
> > No, I think asking once per project should be enough.
>
> Until the end of the current Emacs session? And ask again after restart?

Yes.

> What about if the user switches to a different project and then back?

Ideally, don't ask anymore about that project.

> > I mean the first time the tags table is required might very well be at
> > the beginning of working on a project, at which time the project
> > source tree is not yet in the cache.
>
> Yes, and? The user will need it to be indexed either way, right?

Yes, but my point was that cold-cache times cannot be ignored.

> There's also another optimization opportunity: performing reindexing in
> an asynchronous fashion, in the background (maybe after a timeout, too),
> after any file is changed and saved. This one comes with its own
> tradeoffs, though.

Doing that asynchronously could be an automatic action, not in need
of any user confirmation.  It complicates the implementation a bit,
but perhaps not too much, so this could be a good design choice.

> To measure the full time:
>
> (benchmark 1 '(progn (etags--project-tags-cleanup)
>                      (etags--maybe-use-project-tags)))

5.5 sec with warm cache.  This is with an unoptimized Emacs, btw, but
most of the time is spent by external programs, so perhaps this
doesn't matter.

> To measure the time to generate the list of files only:
>
> (benchmark 1 '(all-completions "" (project-file-completion-table
>                                    (project-current)
>                                    (list default-directory))))

0.95 sec with cold cache, 0.23 with warm cache.

> >> 'git ls-files' will probably be faster still.
> >
> > Yes, but that only works in Git repositories.
>
> We can probably optimize for that use case these days. Git or some other
> VCS is usually in place, especially in non-toy projects.

For the projects using Git, yes.

> How do we figure which files to visit? Do we just visit src/TAGS and
> expect the rest to be 'include'-d?

I think just visit TAGS in the directory of the source whose symbol is
requested, or maybe use locate-dominating-file to look higher in the
tree if not found in the current directory.

> Here's another one: considering the reindexing costs are not always
> negligible and depend on the size of a project, will there be actual
> benefit to using the proposed scheme in GNU projects like Emacs, GCC and
> others (those are the ones that use 'make TAGS')? Or is there a subset
> of them, at least, which we expect to benefit?

That's a good question.  But if the tags table is automatically
produced in the background, the time this takes is much less
important, and having TAGS always up to date would be a valuable
feature.  FWIW, I do "make TAGS" in every large project I start
working on seriously, so at least for me this is important.
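
The lookup suggested above (TAGS in the directory of the current source file, otherwise the nearest ancestor directory that has one) is what `locate-dominating-file` provides in Emacs. A rough Python equivalent, given as a sketch under the hypothetical name `find_tags_file`:

```python
import os
from typing import Optional


def find_tags_file(start_dir: str, name: str = "TAGS") -> Optional[str]:
    """Return the nearest `name` file in start_dir or one of its ancestors."""
    current = os.path.abspath(start_dir)
    while True:
        candidate = os.path.join(current, name)
        if os.path.isfile(candidate):
            return candidate
        parent = os.path.dirname(current)
        if parent == current:
            # Reached the filesystem root without finding a tags file.
            return None
        current = parent
```

Because the nearest file wins, a per-directory TAGS shadows one higher in the tree, which matters when the lower file does not include the others.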

Re: Generation of tags for the current project on the fly

Eli Zaretskii
In reply to this post by John Yates-4
> From: John Yates <[hidden email]>
> Date: Sun, 14 Jan 2018 20:50:02 -0500
> Cc: Dmitry Gutov <[hidden email]>, Emacs developers <[hidden email]>
>
> > > > Invoking 'find' will always be faster, as it's optimized for
> > > > traversing directory trees.
> > >
> > > 'git ls-files' will probably be faster still.
> >
> > Yes, but that only works in Git repositories.
>
> The context of this discussion is _large_ projects.  My sense is
> that git's efficiency relative to other SCM technologies means
> that the larger the project the higher the likelihood of using git.

That's definitely true for personal and FLOSS environments, but not
elsewhere.  Where I earn my paycheck, they use TFS and even
ClearCase(!).

Re: Generation of tags for the current project on the fly

Dmitry Gutov
On 1/15/18 8:42 AM, Eli Zaretskii wrote:

> That's definitely true for personal and FLOSS environments, but not
> elsewhere.  Where I earn my paycheck, they use TFS and even
> ClearCase(!).

No counterparts to 'git ls-files' in those VC systems?

Re: Generation of tags for the current project on the fly

John Yates-4
In reply to this post by Eli Zaretskii
On Mon, Jan 15, 2018 at 12:42 AM, Eli Zaretskii <[hidden email]> wrote:
>  Where I earn my paycheck, they use TFS and even ClearCase(!).

My condolences :-)

Small aside: At Apollo Computer I contributed to DSEE - ClearCase's
progenitor - by designing SML, the System Model Language.  After our
acquisition by HP I opted not to join Atria, ending up instead in
the Alpha chip group at DEC (my second stint with that company).

/john

Re: Generation of tags for the current project on the fly

Eli Zaretskii
In reply to this post by Dmitry Gutov
> Cc: [hidden email]
> From: Dmitry Gutov <[hidden email]>
> Date: Mon, 15 Jan 2018 18:01:00 +0300
>
> On 1/15/18 8:42 AM, Eli Zaretskii wrote:
>
> > That's definitely true for personal and FLOSS environments, but not
> > elsewhere.  Where I earn my paycheck, they use TFS and even
> > ClearCase(!).
>
> No counterparts to 'git ls-files' in those VC systems?

Some, but they are not faster than 'find' running locally, AFAIR.

Re: Generation of tags for the current project on the fly

Dmitry Gutov
On 1/15/18 8:21 PM, Eli Zaretskii wrote:

>> No counterparts to 'git ls-files' in those VC systems?
>
> Some, but they are not faster than 'find' running locally, AFAIR.

Anyway, I don't think we support those projects via the VC project
backend because there are no VC backends for these AFAIK.

When someone creates a project backend for them, they can implement the
file listing speedup one way or another. Maybe by keeping the list of
files in memory and listening for file events, as one option.

Re: Generation of tags for the current project on the fly

Dmitry Gutov
In reply to this post by Eli Zaretskii
On 1/15/18 8:37 AM, Eli Zaretskii wrote:

>>> No, I think asking once per project should be enough.
>>
>> Until the end of the current Emacs session? And ask again after restart?
>
> Yes.
>
>> What about if the user switches to a different project and then back?
>
> Ideally, don't ask anymore about that project.

This is doable, ok.

>> There's also another optimization opportunity: performing reindexing in
>> an asynchronous fashion, in the background (maybe after a timeout, too),
>> after any file is changed and saved. This one comes with its own
>> tradeoffs, though.
>
> Doing that asynchronously could be an automatic action, not in need
> of any user confirmation.  It complicates the implementation a bit,
> but perhaps not too much, so this could be a good design choice.

It would shorten the waits, but do nothing for the CPU and disk usage.
Those I'm more worried about, actually, as a laptop user with a
not-so-great battery life on GNU/Linux.

If we just invalidate on save, the rescan doesn't happen until you
intend to use it again. And with the asynchronous approach, rescans will
occur again and again, just as you edit and save files. With large projects,
one CPU core might always be busy this way (or does etags parallelize?
more cores then).

So maybe someone would prefer this approach, but I'd go for it only
as a quality-of-life improvement when scans are already pretty short.

>> To measure the full time:
>>
>> (benchmark 1 '(progn (etags--project-tags-cleanup)
>>                      (etags--maybe-use-project-tags)))
>
> 5.5 sec with warm cache.  This is with an unoptimized Emacs, btw, but
> most of the time is spent by external programs, so perhaps this
> doesn't matter.

Probably doesn't, indeed.

>> To measure the time to generate the list of files only:
>>
>> (benchmark 1 '(all-completions "" (project-file-completion-table
>>                                    (project-current)
>>                                    (list default-directory))))
>
> 0.95 sec with cold cache, 0.23 with warm cache.

Thanks, so 1 second for file listing and 4.5 seconds for etags. Even if
we allowed etags to execute in parallel with find, it could only shave
it down to 4.5 seconds (and probably not even that).

>> How do we figure out which files to visit? Do we just visit src/TAGS
>> and expect the rest to be 'include'-d?
>
> I think just visit TAGS in the directory of the source whose symbol is
> requested, or maybe use locate-dominating-file to look higher in the
> tree if not found in the current directory.

That option is not as easy to code as what I suggested.

Further, if we just visit lisp/TAGS when in lisp/, and xref-etags-mode
is enabled, we won't be able to find the definition of 'car'.
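
For reference, the lookup Eli describes could be sketched roughly as below; `my/nearest-tags-file` is a hypothetical name. It finds the closest TAGS file at or above the starting directory, which also shows the limitation: starting in lisp/, it stops at lisp/TAGS and never reaches a table covering the C sources.

```elisp
(defun my/nearest-tags-file (&optional dir)
  "Return the nearest TAGS file at or above DIR, or nil if none exists.
DIR defaults to `default-directory'."
  (let ((root (locate-dominating-file (or dir default-directory) "TAGS")))
    (and root (expand-file-name "TAGS" root))))
```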

>> Here's another one: considering the reindexing costs are not always
>> negligible and depend on the size of a project, will there be actual
>> benefit to using the proposed scheme in GNU projects like Emacs, GCC and
>> others (those are the ones that use 'make TAGS')? Or is there a subset
>> of them, at least, which we expect to benefit?
>
> That's a good question.  But if the tags table is automatically
> produced in the background, the time this takes is much less
> important, and having TAGS always up to date would be a valuable
> feature.  FWIW, I do "make TAGS" in every large project I start
> working on seriously, so at least for me this is important.

You probably do it just once, though, and update very rarely. The old
way of operation is still going to work.

Can we improve the "warm" reindex times? In the first message of this
thread I mentioned GNU Global because it reportedly supports incremental
updates. Can we get such a feature in etags, too?

I more or less imagine how I'd implement such a feature using Lisp and
'etags --append', but that would do nothing to help when the tags are
generated by make.


Re: Generation of tags for the current project on the fly

Matthias Meulien-2
In reply to this post by Dmitry Gutov
> Anyway, I don't think we support those projects via the VC project
> backend because there are no VC backends for these AFAIK.

A few years ago, I was forced to use TFS... https://marmalade-repo.org/packages/vc-tfs
--
Matthias

Re: Generation of tags for the current project on the fly

Dmitry Gutov
On 1/15/18 11:56 PM, Matthias Meulien wrote:
>> Anyway, I don't think we support those projects via the VC project
>> backend because there are no VC backends for these AFAIK.
>
> A few years ago, I was forced to use TFS...
> https://marmalade-repo.org/packages/vc-tfs

That's pretty cool. But the second paragraph remains true.

We should add a VC backend action like 'ls-files', and all backends,
including yours, can provide their own implementation. The default will
use 'find'.
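
A rough sketch of that idea, written as a standalone dispatch table rather than through VC's real backend mechanism. The names `my/ls-files-functions`, `my/ls-files-default` and `my/ls-files` are hypothetical:

```elisp
(defvar my/ls-files-functions nil
  "Alist of (BACKEND . FUNCTION).
FUNCTION takes a directory and returns a list of file names.")

(defun my/ls-files-default (dir)
  "Fallback: list regular files under DIR with find(1)."
  (process-lines "find" (expand-file-name dir) "-type" "f"))

(defun my/ls-files (backend dir)
  "List the files in DIR using BACKEND's implementation, if it has one.
Backends without an entry in `my/ls-files-functions' use the default."
  (funcall (or (alist-get backend my/ls-files-functions)
               #'my/ls-files-default)
           dir))
```

A third-party backend would then only need to register its own function under its backend symbol to take part in file listing.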


Re: Generation of tags for the current project on the fly

Eli Zaretskii
In reply to this post by Dmitry Gutov
> Cc: [hidden email]
> From: Dmitry Gutov <[hidden email]>
> Date: Mon, 15 Jan 2018 21:50:33 +0300
>
> Can we improve the "warm" reindex times? In the first message of this
> thread I mentioned GNU Global because it reportedly supports incremental
> updates. Can we get such a feature in etags, too?

Incremental tagging needs to leave the record about what was tagged
somewhere, right?  Since there's no such feature in etags now, this
sounds like a project for which I won't have time any time soon.  Any
volunteers?

> I more or less imagine how I'd implement such a feature using Lisp and
> 'etags --append', but that would do nothing to help when the tags are
> generated by make.

It will also not help if Emacs is restarted, right?


Re: Generation of tags for the current project on the fly

Dmitry Gutov
On 1/16/18 20:50, Eli Zaretskii wrote:

> Incremental tagging needs to leave the record about what was tagged
> somewhere, right?  

The information is inside the TAGS file, isn't it? Even though it's in a
flat list, unsorted, spread throughout the file.

I was thinking that maybe we can add this feature simply using some
clever engineering, without changing the format of the file.

And I think it should be fairly easy (in terms of the algorithm, at
least) to implement incremental update for one-to-few files: you scan
through the file, remove the corresponding entries, and then scan the
files (ones that still exist) and add those entries at the end.

Might be slower to incrementally update when passed (almost) the same
list of files, like 'make tags' does. It has higher complexity on paper
(looking for/matching file names), but maybe it would still yield a
measurable improvement over a full reindex.
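
As a rough illustration of the one-to-few-files case, here is a Lisp-side sketch (not a change to etags itself). It relies on the TAGS layout where each file's entries form a section introduced by a form-feed line followed by a "NAME,SIZE" line, and it assumes FILES are spelled exactly as they appear in the table, relative to the table's directory. The function names are hypothetical:

```elisp
(defun my/tags-remove-file-sections (files)
  "Delete from the current buffer (a TAGS table) the sections for FILES."
  (save-excursion
    (goto-char (point-min))
    ;; A section header is a form-feed line, then "NAME,SIZE".
    (while (re-search-forward "^\f\n\\([^\n]+\\),[0-9]*\n" nil t)
      (let ((start (match-beginning 0))
            (name (match-string 1)))
        (when (member name files)
          ;; The section runs up to the next form-feed line or EOF.
          (let ((end (if (re-search-forward "^\f\n" nil t)
                         (match-beginning 0)
                       (point-max))))
            (delete-region start end)
            (goto-char start)))))))

(defun my/tags-update-files (tags-file files)
  "Refresh the entries for FILES in TAGS-FILE without a full rescan.
Stale sections are dropped, then FILES that still exist are
rescanned and their entries appended with `etags --append'."
  (let* ((tags-file (expand-file-name tags-file))
         (default-directory (file-name-directory tags-file))
         (coding-system-for-write 'no-conversion))
    (with-temp-buffer
      ;; Read and write raw bytes so untouched sections are preserved.
      (insert-file-contents-literally tags-file)
      (my/tags-remove-file-sections files)
      (write-region (point-min) (point-max) tags-file nil 'silent))
    (let ((existing (delq nil (mapcar (lambda (f) (and (file-exists-p f) f))
                                      files))))
      (when existing
        (apply #'call-process "etags" nil nil nil
               "--append" "-o" tags-file existing)))))
```

The removal pass is linear in the size of the table, so the win over a full reindex comes from running etags on only the changed files.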

> Since there's no such feature in etags now, this
> sounds like a project for which I won't have time any time soon.  Any
> volunteers?

Not volunteering yet. Could be something I might have time for a few
months from now, depending on whether we have a solid plan and you'll
want to provide some hand-holding.

>> I more or less imagine how I'd implement such a feature using Lisp and
>> 'etags --append', but that would do nothing to help when the tags are
>> generated by make.
>
> It will also not help if Emacs is restarted, right?

Right, but it will do a full scan after the restart. Spending a longer
amount of time just once per project per restart is more or less fine, I
think. Especially after an explicit prompt (I've added one now, you can
see it on the branch).



Re: Generation of tags for the current project on the fly

Dmitry Gutov
In reply to this post by Eli Zaretskii
On 1/15/18 08:37, Eli Zaretskii wrote:
>> To measure the full time:
>>
>> (benchmark 1 '(progn (etags--project-tags-cleanup)
>> (etags--maybe-use-project-tags)))
> 5.5 sec with warm cache.  This is with an unoptimized Emacs, btw, but
> most of the time is spent by external programs, so perhaps this
> doesn't matter.

BTW, I've just measured it on my older laptop with an SSD, bought around
2012. Just 2 seconds here.

The CPU is i7-3630QM, 4-core and more or less top-of-the-line for a
laptop back then, but not desktop grade anyway.
