FW: bug#32758: 26.1 emacs-mac 7.2; forward-sentence in eww

Van L

Hello,

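I am looking for a sentence-end rule that treats punctuation followed by a single space as the end of a sentence, so that M-e (forward-sentence) jumps by it.

Setting the following variable to nil makes M-e stop short at abbreviations such as Nov. or Gov. (see the passage below):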

┌────
│ (setq sentence-end-double-space nil)
└────

┌────
│ Haley’s departure also stoked speculation she could replace Lindsey Graham as the senator from
│ South Carolina, a possibility that Trump played down. Talk in Washington is that should Trump
│ replace Attorney General Jeff Sessions with Graham after the Nov. 6 congressional elections,
│ South Carolina Gov. Henry McMaster would be responsible for selecting a replacement to serve
│ until the 2020 election. McMaster was previously Haley’s No. 2 in the state.
└────

and I’m aware of names like A. B. C. Nurmagomedov which will stop early, too.

What rules would identify, almost perfectly, the end of a sentence that is followed by a single space, in a form that fits into the sentence-end function?

Is WordNet useful for this?

https://en.wikipedia.org/wiki/Wordnet

┌────
│ (defun sentence-end ()
│   "Return the regexp describing the end of a sentence.
│ 
│ This function returns either the value of the variable `sentence-end'
│ if it is non-nil, or the default value constructed from the
│ variables `sentence-end-base', `sentence-end-double-space',
│ `sentence-end-without-period' and `sentence-end-without-space'.
│ 
│ The default value specifies that in order to be recognized as the
│ end of a sentence, the ending period, question mark, or exclamation point
│ must be followed by two spaces, with perhaps some closing delimiters
│ in between.  See Info node `(elisp)Standard Regexps'."
│   (or sentence-end
│       ;; We accept non-break space along with space.
│       (concat (if sentence-end-without-period "\\w[ \u00a0][ \u00a0]\\|")
│               "\\("
│               sentence-end-base
│               (if sentence-end-double-space
│                   "\\($\\|[ \u00a0]$\\|\t\\|[ \u00a0][ \u00a0]\\)" "\\($\\|[\t \u00a0]\\)")
│               "\\|[" sentence-end-without-space "]+"
│               "\\)"
│               "[ \u00a0\t\n]*")))
└────
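
To see why M-e stops at "Nov.", it may help to ask the regexp directly where it first matches. The following is only a sketch: the helper name `my-first-sentence' is hypothetical, and it assumes the variable `sentence-end' is nil, so that the regexp is built from the defaults shown above.

```elisp
;; Hypothetical helper, not part of Emacs.
(defun my-first-sentence (text)
  "Return TEXT up to the first single-space sentence end.
Uses the regexp from the function `sentence-end' with
`sentence-end-double-space' bound to nil.  Closing delimiters
after the punctuation are not included in the result."
  (let* ((sentence-end-double-space nil)
         ;; Position of the punctuation character that starts the match.
         (pos (string-match (sentence-end) text)))
    (if pos
        (substring text 0 (1+ pos))
      text)))

;; The abbreviation is taken for the end of a sentence:
(my-first-sentence "after the Nov. 6 congressional elections, South Carolina")
```

With single-space sentence ends, any period followed by a space qualifies, so the regexp alone cannot tell an abbreviation from a real sentence end.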



Re: FW: bug#32758: 26.1 emacs-mac 7.2; forward-sentence in eww

Yuri Khan
On Wed, Oct 10, 2018 at 5:17 PM Van L <[hidden email]> wrote:

> I am looking for a one-space after punctuation sentence-ending for M-e to jump by.
>
> Disabling the following variable stops short at Nov. or Gov. (see below)
>
> ┌────
> │ (setq sentence-end-double-space nil)
> └────
>
> ┌────
> │ Haley’s departure also stoked speculation she could replace Lindsey Graham as the senator from
> │ South Carolina, a possibility that Trump played down. Talk in Washington is that should Trump
> │ replace Attorney General Jeff Sessions with Graham after the Nov. 6 congressional elections,
> │ South Carolina Gov. Henry McMaster would be responsible for selecting a replacement to serve
> │ until the 2020 election. McMaster was previously Haley’s No. 2 in the state.
> └────
>
> and I’m aware of names like A. B. C. Nurmagomedov which will stop early, too.

It might be a good idea to treat the sequence “period followed by a
single non-breaking space” as not ending a sentence. This, coupled
with the proper use of non-breaking spaces with abbreviations and
initials, will go a long way toward solving the above false positives
in sentence end detection.

> ┌────
> │ (defun sentence-end ()

> │       ;; We accept non-break space along with space.

> │               "[ \u00a0\t\n]*")))
> └────
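
One way to experiment with this suggestion without patching Emacs might be to set the variable `sentence-end' to a regexp that leaves NBSP out of the whitespace classes. This is a minimal sketch for single-space sentences only; the function name `my-sentence-end-no-nbsp' is hypothetical, and the regexp is adapted from the default quoted above rather than taken from any actual fix.

```elisp
;; Hypothetical helper, not part of Emacs.
(defun my-sentence-end-no-nbsp ()
  "Return a single-space sentence-end regexp that ignores NBSP.
A period followed by a non-breaking space is not treated as the
end of a sentence; a period followed by a plain space, a tab, or
the end of the line still is."
  (concat "\\("
          sentence-end-base
          ;; Plain space or tab only; NBSP is deliberately left out.
          "\\($\\|[\t ]\\)"
          "\\|[" sentence-end-without-space "]+"
          "\\)"
          ;; Trailing whitespace to skip, again without NBSP.
          "[ \t\n]*"))

;; Possible usage in the current buffer:
;; (setq-local sentence-end (my-sentence-end-no-nbsp))
```

This only helps when the text really has a non-breaking space after abbreviations and initials, which depends on the page being read.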


Re: bug#32758: 26.1 emacs-mac 7.2; forward-sentence in eww

Van L

>> ┌────
>> │ Haley’s departure also stoked speculation she could replace Lindsey Graham as the senator from
>> │ South Carolina, a possibility that Trump played down. Talk in Washington is that should Trump
>> │ replace Attorney General Jeff Sessions with Graham after the Nov. 6 congressional elections,
>> │ South Carolina Gov. Henry McMaster would be responsible for selecting a replacement to serve
>> │ until the 2020 election. McMaster was previously Haley’s No. 2 in the state.
>> └────
>>
>> and I’m aware of names like A. B. C. Nurmagomedov which will stop early, too.
>
> It might be a good idea to treat the sequence “period followed by a
> single non-breaking space” as not ending a sentence. This, coupled
> with the proper use of non-breaking spaces with abbreviations and
> initials, will go a long way in solving the above false positive in
> sentence end detection.

The text passage was generated in eww-mode after pressing R (eww-readable), in case that sparks any ideas.

Re: FW: bug#32758: 26.1 emacs-mac 7.2; forward-sentence in eww

Stefan Monnier
In reply to this post by Van L
> and I’m aware of names like A. B. C. Nurmagomedov which will stop early, too.

As mentioned by Yuri, NBSP can help in this case.  Another heuristic is to
assume sentences don't end with a single-capital-letter word.
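
The single-capital-letter heuristic could be tried out as a thin wrapper around `forward-sentence'. This is only a sketch: the command name `my-forward-sentence-skip-initials' is hypothetical, and the heuristic will misfire on sentences that really do end with a single capital letter.

```elisp
;; Hypothetical command, not part of Emacs.
(defun my-forward-sentence-skip-initials ()
  "Move forward one sentence, skipping ends that follow an initial.
A sentence end directly preceded by a single capital letter, as in
\"A. B. C. Nurmagomedov\", is assumed not to be a real sentence end."
  (interactive)
  (let ((case-fold-search nil)          ; make [A-Z] match capitals only
        (last nil))
    (forward-sentence)
    (while (and (not (eobp))
                ;; Stop if the previous `forward-sentence' made no progress.
                (not (eql last (point)))
                ;; Point is just after the period; check for "X." before it.
                (looking-back "\\b[A-Z]\\."
                              (max (point-min) (- (point) 3))))
      (setq last (point))
      (forward-sentence))))
```
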

This said, there's also the occasional "Mr. Foo" or "Dr. Bar".

I suggest you collect examples to add them to
test/lisp/textmodes/paragraphs-tests.el (and then write some Elisp code
that tries to handle them all correctly).
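
A collection of examples could start out as a small ERT test along these lines. This is a sketch only: the names `my-sentence-examples' and `my-forward-sentence-single-space' are hypothetical, the sample texts are shortened from the passage in this thread, and the test is marked as an expected failure because the default regexp stops at the abbreviations.

```elisp
(require 'ert)

;; Hypothetical examples: (TEXT . EXPECTED-FIRST-SENTENCE).
(defvar my-sentence-examples
  '(("after the Nov. 6 congressional elections. Next."
     . "after the Nov. 6 congressional elections.")
    ("South Carolina Gov. Henry McMaster would choose. Next."
     . "South Carolina Gov. Henry McMaster would choose.")
    ("A. B. C. Nurmagomedov won. Next."
     . "A. B. C. Nurmagomedov won."))
  "Single-space texts paired with their intended first sentence.")

(ert-deftest my-forward-sentence-single-space ()
  "Check that `forward-sentence' skips abbreviations and initials."
  ;; Remove this line once the handling code makes the test pass.
  :expected-result :failed
  (let ((sentence-end-double-space nil))
    (dolist (example my-sentence-examples)
      (with-temp-buffer
        (insert (car example))
        (goto-char (point-min))
        (forward-sentence)
        (should (equal (buffer-substring (point-min) (point))
                       (cdr example)))))))
```
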


        Stefan



Re: FW: bug#32758: 26.1 emacs-mac 7.2; forward-sentence in eww

Yuri Khan
On Wed, Oct 10, 2018 at 10:18 PM Stefan Monnier
<[hidden email]> wrote:

> > and I’m aware of names like A. B. C. Nurmagomedov which will stop early, too.
>
> As mentioned by Yuri NBSP can help this case.  Another heuristic is to
> assume sentences don't end with a single-capital-letter word.

They sometimes do; we need plan B.

> This said, there's also the occasional "Mr. Foo" or "Dr. Bar".

These are not exceptions to the NBSP rule. Neither are St. Patrick
and Mt. Fuji.
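
For text that was typed with plain spaces, the NBSP convention could be applied after the fact by a small command. This is a rough sketch: the names `my-nbsp-abbreviations' and `my-nbsp-after-abbreviations' are hypothetical, and the abbreviation list is an assumption drawn from the examples in this thread, not a complete set.

```elisp
;; Hypothetical list, taken from the examples in this thread.
(defvar my-nbsp-abbreviations '("Mr" "Dr" "St" "Mt" "Gov" "Nov")
  "Abbreviations whose period should be followed by NBSP.")

;; Hypothetical command, not part of Emacs.
(defun my-nbsp-after-abbreviations (beg end)
  "Replace the space after known abbreviations with NBSP.
Operates on the region between BEG and END."
  (interactive "r")
  (let ((case-fold-search nil)          ; match the capitalized forms only
        (re (concat "\\b" (regexp-opt my-nbsp-abbreviations t) "\\. "))
        (end (copy-marker end)))
    (save-excursion
      (goto-char beg)
      (while (re-search-forward re end t)
        ;; Keep the abbreviation and period, swap the space for NBSP.
        (replace-match "\\1.\u00a0" t)))))
```

A word such as "St." can still end a real sentence now and then, so the result is worth a quick review.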