bug#32281: shr.el align support patch

classic Classic list List threaded Threaded
15 messages Options
Reply | Threaded
Open this post in threaded view
|

bug#32281: shr.el align support patch

Bad Blue Bull
Here's a patch for shr.el that makes it handle "align" attribute for headers, paragraphs and lists, works only when monospace font is used (becoz of lame implementation of fill-paragraph). Also default value of shr-use-fonts set to nil so monospace fonts are used by default.
So now if shr-use-fonts is nil then page gets filled, if it's non nil then then it will be rendered like current version of shr.el does it, without filling and justification (see pics).
shr.el.diff is a diff file, shr.el is a patched version of shr.el

shr.el.diff (4K) Download Attachment
fill.png (185K) Download Attachment
no_fill.png (192K) Download Attachment
shr.el (116K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

bug#32281: shr.el align support patch

Noam Postavsky
severity 32281 wishlist
tags 32281 + patch
quit

Bad Blue Bull <[hidden email]> writes:

> Here's a patch for shr.el that makes it handle "align" attribute for
> headers, paragraphs and lists, works only when monospace font is used
> (becoz of lame implementation of fill-paragraph). Also default value
> of shr-use-fonts set to nil so monospace fonts are used by default.

Seems reasonable overall, but I'm not sure if changing the default is
justified.

> -(defcustom shr-use-fonts t
> +(defcustom shr-use-fonts nil ; set default value to nil because monospace fonts are better for Emacs

Although I do somewhat agree with the "monospace fonts are better for
Emacs" sentiment.

> +(defun shr-fill-paragraph-with-breaks (&optional justify bre-del bre-regexp)
> +  "Fill paragraph at or after point, breaking lines at positions matching regexp argument BRE-REGEXP.
> +If JUSTIFY is non-nil justify as well.
> +Delete matched text if BRE-DEL is non nil
> +BRE-REGEXP must be a regexp that determines positions where to break lines, default value is \\x2028."
> +  (unless bre-regexp (setq bre-regexp "\x2028"))
> +  (setq bre-regexp (concat bre-regexp "\\|\x2029"))
> +  (save-excursion
> +   (let ((line-begin) (paragraph-end))
> + (forward-paragraph)
> + (insert "\x2029") ; use \x2029 (unicode paragraph separator) to mark end of a paragraph
> + (backward-paragraph)
> + (setq line-begin (point))
> + (while (not paragraph-end)
> + (re-search-forward bre-regexp)
> + (if (equal (match-string 0) "\x2029") (setq paragraph-end t)) ; end of paragraph reached
> + (if (or bre-del paragraph-end) (replace-match ""))
> + (unless (equal (char-after) ?\n) (insert "\n")) ; I don't know why but two adjacent \n leave an empty line after fill
> + (fill-region-as-paragraph line-begin (- (point) 1) justify)
> + (setq line-begin (point)))
> + (delete-char -1)
> + )))

> +(defun shr-fill-paragraph (dom)
> +  "Fill paragraph"
> +  (when (not shr-use-fonts) ;fill-paragraph is useful only with monospace fonts
> + (shr-fill-paragraph-with-breaks
> + ((lambda (x)
> +   (cond
> + ((equal x "right") 'right)
> + ((equal x "center") 'center)
> + ((equal x "left") 'left)
> + ))
> + (cdr (assq 'align (dom-attributes dom)))) ; justify parameter
> + t) ; bre-del t
> + )
> +  )
> +

>  (defun shr-tag-li (dom)
> @@ -1770,6 +1813,7 @@
>   (put-text-property start (1+ start) 'shr-prefix-length (length bullet))
>   (shr-generic dom))))
>    (unless (bolp)
> + (unless shr-use-fonts (insert "\x2028")) ; insert a line separator
>      (insert "\n")))
>  
>  (defun shr-mark-fill (start)
> @@ -1785,7 +1829,8 @@
>       (or (not (bolp))
>   (and (> (- (point) 2) (point-min))
>        (not (= (char-after (- (point) 2)) ?\n)))))
> -    (insert "\n"))
> + (unless shr-use-fonts (insert "\x2028")) ; insert a line separator
> + (insert "\n"))
>    (shr-generic dom))

The indentation in your patch looks kind of off, are you not using
Emacs' builtin auto-indentation?  (also, don't leave hanging parens.)



Reply | Threaded
Open this post in threaded view
|

bug#32281: shr.el align support patch

Eli Zaretskii
> From: Noam Postavsky <[hidden email]>
> Date: Sun, 05 Aug 2018 22:52:35 -0400
> Cc: [hidden email]
>
> Bad Blue Bull <[hidden email]> writes:
>
> > Here's a patch for shr.el that makes it handle "align" attribute for
> > headers, paragraphs and lists, works only when monospace font is used
> > (becoz of lame implementation of fill-paragraph). Also default value
> > of shr-use-fonts set to nil so monospace fonts are used by default.
>
> Seems reasonable overall, but I'm not sure if changing the default is
> justified.

And I'm sure it isn't.  Not only do we not change past defaults so
easily, I personally don't think I'd like such a change, or consider
it correct.

> > +  "Fill paragraph at or after point, breaking lines at positions matching regexp argument BRE-REGEXP.

The first line of the doc string should not be longer than 78
characters, and should be a complete sentence.

> > +BRE-REGEXP must be a regexp that determines positions where to break lines, default value is \\x2028."
> > +  (unless bre-regexp (setq bre-regexp "\x2028"))
> > +  (setq bre-regexp (concat bre-regexp "\\|\x2029"))
> > +  (save-excursion
> > +   (let ((line-begin) (paragraph-end))
> > + (forward-paragraph)
> > + (insert "\x2029") ; use \x2029 (unicode paragraph separator) to mark end of a paragraph

Why did you decide to use u+2028 and u+2029 for these purposes?  Emacs
doesn't yet support these characters as Unicode intended, so using
them might have unexpected effects, and might produce different effect
if we start supporting them in the future.

This needs a NEWS entry, I think.

Thanks.



Reply | Threaded
Open this post in threaded view
|

bug#32281: shr.el align support patch

Noam Postavsky
In reply to this post by Bad Blue Bull
[forwarding to list, please use "Reply All" to keep [hidden email] on Cc]


> The indentation in your patch looks kind of off, are you not using
> Emacs' builtin auto-indentation? (also, don't leave hanging parens.)

I have (setq-default lisp-body-indent 2) in my ~/.emacs, things seemed
for me indented like the rest of the source, as for hanging parens -
there's a comment (that though should be put before paren as it relates
to the last argument), it's unnecessary.

As for monospace/proportional fonts: I don't insist on change of
shr-use-fonts default value, paragraph justification won't work with
proportional fonts though.

Current version of shr.el don't fill the text at all and I don't know
maybe smb likes it and won't welcome the change (see pics), although ERC
fills text for example.
Reply | Threaded
Open this post in threaded view
|

bug#32281: shr.el align support patch

Bad Blue Bull
In reply to this post by Eli Zaretskii
OKay, I've trimmed the source and reverted default value of shr-use-fonts to t.
 
06.08.2018, 18:14, "Eli Zaretskii" <[hidden email]>:
The first line of the doc string should not be longer than 78
characters, and should be a complete sentence.

I don't know what do you mean of "complete sentence", docs say it must start with a capital letter and end with a period.
First sentence now fills 2 lines I know it's not good for apropos, I can edit it again if needed.

 

Why did you decide to use u+2028 and u+2029 for these purposes? Emacs
doesn't yet support these characters as Unicode intended, so using
them might have unexpected effects, and might produce different effect
if we start supporting them in the future.

I need to use a character to mark places where lines must be split (specified by <br> tags and end of list items), also a character to mark end of a paragraph to be filled (a mark can be used for this purpose but docs warn against it). These chars will be removed when a paragraph gets filled, I don't see them cause any trouble in the future and those values can easily be altered to diffirent ones if it happens.



 


shr.el.diff (4K) Download Attachment
shr.el (116K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

bug#32281: shr.el align support patch

Bad Blue Bull
 
 
 
I need to use a character to mark places where lines must be split (specified by <br> tags and end of list items), also a character to mark end of a paragraph to be filled (a mark can be used for this purpose but docs warn against it).

This phrase looks confusing, sorry, I meant push-mark can be used for this purpose but it's no good.
Reply | Threaded
Open this post in threaded view
|

bug#32281: shr.el align support patch

Noam Postavsky
In reply to this post by Bad Blue Bull
Bad Blue Bull <[hidden email]> writes:

>>  The first line of the doc string should not be longer than 78
>>  characters, and should be a complete sentence.
>
> I don't know what do you mean of "complete sentence", docs say it must
> start with a capital letter and end with a period.  First sentence now
> fills 2 lines I know it's not good for apropos, I can edit it again if
> needed.

Yeah, that's all we mean there, the whole sentence (i.e., from capital
letter to period) needs to fit on the first line.

>>  Why did you decide to use u+2028 and u+2029 for these purposes? Emacs
>>  doesn't yet support these characters as Unicode intended, so using
>>  them might have unexpected effects, and might produce different effect
>>  if we start supporting them in the future.
>
> I need to use a character to mark places where lines must be split
> (specified by <br> tags and end of list items), also a character to
> mark end of a paragraph to be filled (a mark can be used for this
> purpose but docs warn against it).
[also quoting clarification from following message]
> This phrase looks confusing, sorry, I meant push-mark can be used for
> this purpose but it's no good.

By "docs warn against it", were you referring to

    Novice Emacs Lisp programmers often try to use the mark for the wrong
    purposes.  See the documentation of ‘set-mark’ for more information.

That's just about using that particular function for non-interactive use
cases.  If it does makes sense to use a mark for this, just don't use
push-mark or set-mark, but rather copy-mark and save the resulting mark
object to a local variable.

> +  (save-excursion
> +   (let ((line-begin) (paragraph-end))
> + (forward-paragraph)
> + (insert "\x2029") ; use \x2029 (unicode paragraph separator) to mark end
> +   ; of a paragraph
> + (backward-paragraph)
> + (setq line-begin (point))
> + (while (not paragraph-end)
> + (re-search-forward bre-regexp)
> + (if (equal (match-string 0) "\x2029") (setq paragraph-end t))
> + ; end of paragraph reached
> + (if (or bre-del paragraph-end) (replace-match ""))
> + (unless (equal (char-after) ?\n) (insert "\n"))
> + ; I don't know why but two adjacent \n
> + ; leave an empty line after fill
> + (fill-region-as-paragraph line-begin (- (point) 1)
> + justify)
> + (setq line-begin (point)))
> + (delete-char -1))))

Regarding the indentation, here's what I get after running indent-region
over the above code:

  (save-excursion
    (let ((line-begin) (paragraph-end))
      (forward-paragraph)
      (insert "\x2029") ; use \x2029 (unicode paragraph separator) to mark end
                                        ; of a paragraph
      (backward-paragraph)
      (setq line-begin (point))
      (while (not paragraph-end)
        (re-search-forward bre-regexp)
        (if (equal (match-string 0) "\x2029") (setq paragraph-end t))
                                        ; end of paragraph reached
        (if (or bre-del paragraph-end) (replace-match ""))
        (unless (equal (char-after) ?\n) (insert "\n"))
                                        ; I don't know why but two adjacent \n
                                        ; leave an empty line after fill
        (fill-region-as-paragraph line-begin (- (point) 1)
                                  justify)
        (setq line-begin (point)))
      (delete-char -1)))

Notice especially the difference in the while loop body.  You mentioned
lisp-body-indent, but that only affects indentation increase after a
defun line.  I do see that your indentation after the let seems to be
based on 4-space tabs, but Emacs uses 8-space tabs.  If you could set
indent-tabs-mode to nil (which is what the .dir-locals in the git
repository does), that would take care of it regardless.



Reply | Threaded
Open this post in threaded view
|

bug#32281: shr.el align support patch

Eli Zaretskii
In reply to this post by Bad Blue Bull
> From: Bad Blue Bull <[hidden email]>
> Cc: "[hidden email]" <[hidden email]>
> Date: Tue, 07 Aug 2018 03:51:52 +0300
>
>  Why did you decide to use u+2028 and u+2029 for these purposes? Emacs
>  doesn't yet support these characters as Unicode intended, so using
>  them might have unexpected effects, and might produce different effect
>  if we start supporting them in the future.
>
> I need to use a character to mark places where lines must be split (specified by <br> tags and end of list
> items), also a character to mark end of a paragraph to be filled (a mark can be used for this purpose but docs
> warn against it). These chars will be removed when a paragraph gets filled, I don't see them cause any trouble
> in the future and those values can easily be altered to diffirent ones if it happens.

I'm not sure I understand why you needed a character for that role.
fill-region-as-paragraph accepts buffer positions, and
re-search-forward can be told not to search beyond a certain buffer
position.  So you should be able to record the positions in some
variables, and use them instead of inserting characters that need to
be deleted afterwards.

The disadvantage of inserting characters that were not there in the
first place is that if the user types C-g at some unfortunate moment,
these characters might be left in the buffer (unless you complicate
the code by arranging for them to be deleted in that case).  Using
buffer positions avoids all those complications.

Am I missing some reason why you needed characters as markers?

Thanks.



Reply | Threaded
Open this post in threaded view
|

bug#32281: shr.el align support patch

Bad Blue Bull
07.08.2018, 18:10, "Eli Zaretskii" <[hidden email]>:
 From: Bad Blue Bull <[hidden email]>
 Cc: "[hidden email]" <[hidden email]>
 Date: Tue, 07 Aug 2018 03:51:52 +0300

  Why did you decide to use u+2028 and u+2029 for these purposes? Emacs
  doesn't yet support these characters as Unicode intended, so using
  them might have unexpected effects, and might produce different effect
  if we start supporting them in the future.

 I need to use a character to mark places where lines must be split (specified by <br> tags and end of list
 items), also a character to mark end of a paragraph to be filled (a mark can be used for this purpose but docs
 warn against it). These chars will be removed when a paragraph gets filled, I don't see them cause any trouble
 in the future and those values can easily be altered to diffirent ones if it happens.


I'm not sure I understand why you needed a character for that role.
fill-region-as-paragraph accepts buffer positions, and
re-search-forward can be told not to search beyond a certain buffer
position. So you should be able to record the positions in some
variables, and use them instead of inserting characters that need to
be deleted afterwards.

The disadvantage of inserting characters that were not there in the
first place is that if the user types C-g at some unfortunate moment,
these characters might be left in the buffer (unless you complicate
the code by arranging for them to be deleted in that case). Using
buffer positions avoids all those complications.

Am I missing some reason why you needed characters as markers?

Thanks.

 

I've escaped using of paragraph separator symbol by using narrowing (can't just use constant position as fill-region-as-paragraph will change the text before it and it won't correspond the end of paragraph anymore).
Using a marker symbol to mark line breaks is still necessary, any attempt to avoid it would lead to ridicously complicated solution. Also note that shr-fill-paragraph-with-breaks is a great function that deserves to be inluded into fill.el, for example it allows a user to fill lists without joining lines inside of them (maybe upgrade needed to handle fill-prefix in such case, but still it's a function that can be useful outside of shr.el).
There's a problem found with images inside paragraphs and also tables got broken now that I switched to using narrowing (see pics).
fill-region-as-paragraph disregards fill-collumn when aligns pic, all I can do is not to justify images at all (but I'm really puzzled how to do it, I could use other marker just to determine where image starts and ends and skip that region, but since you're against such approach... )
Justification is badly implemented in fill, it's useless for proportional fonts and you can see it's bad for images too. TBH would be good to make fill.el just ignore justify in such cases (maybe giving a warning message).

test.jpg (177K) Download Attachment
test_ff.jpg (196K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

bug#32281: shr.el align support patch

Bad Blue Bull
 
 
07.08.2018, 19:54, "Bad Blue Bull" <[hidden email]>:
 
There's a problem found with images inside paragraphs and also tables got broken now that I switched to using narrowing (see pics).
fill-region-as-paragraph disregards fill-collumn when aligns pic, all I can do is not to justify images at all (but I'm really puzzled how to do it, I could use other marker just to determine where image starts and ends and skip that region, but since you're against such approach... )
Justification is badly implemented in fill, it's useless for proportional fonts and you can see it's bad for images too. TBH would be good to make fill.el just ignore justify in such cases (maybe giving a warning message).

But also I think it would be alright just to leave it as it is because who uses text flowing around images inside aligned paragraphs?
Reply | Threaded
Open this post in threaded view
|

bug#32281: shr.el align support patch

Eli Zaretskii
In reply to this post by Bad Blue Bull
> From: Bad Blue Bull <[hidden email]>
> Cc: "[hidden email]" <[hidden email]>,
> "[hidden email]" <[hidden email]>
> Date: Tue, 07 Aug 2018 19:54:58 +0300
>
> I've escaped using of paragraph separator symbol by using narrowing

That's also fine.

> (can't just use constant position as
> fill-region-as-paragraph will change the text before it and it won't correspond the end of paragraph anymore).

We have markers for that.  But if narrowing solves your problem, it's
okay to use that.

> Using a marker symbol to mark line breaks is still necessary, any attempt to avoid it would lead to ridicously
> complicated solution.

Can you explain why?  I still don't think I understand.

> fill-region-as-paragraph disregards fill-collumn when aligns pic, all I can do is not to justify images at all (but I'm
> really puzzled how to do it, I could use other marker just to determine where image starts and ends and skip
> that region, but since you're against such approach... )
> Justification is badly implemented in fill, it's useless for proportional fonts and you can see it's bad for images
> too. TBH would be good to make fill.el just ignore justify in such cases (maybe giving a warning message).

To support proportional fonts, we need:

  . convert all the calculations to work in pixels instead of in
    columns (or, equivalently, in units of the canonical character
    width, in which case you then must use floating-point values)
  . put on some (or all?) SPC characters a :width display property,
    with the suitably calculated width in pixels

The infrastructure for all of that already exists, it's just a matter
of coding.



Reply | Threaded
Open this post in threaded view
|

bug#32281: shr.el align support patch

Bad Blue Bull
In reply to this post by Bad Blue Bull
 
 
07.08.2018, 20:20, "Eli Zaretskii" <[hidden email]>:
 From: Bad Blue Bull <[hidden email]>
 Cc: "[hidden email]" <[hidden email]>,
         "[hidden email]" <[hidden email]>
 Date: Tue, 07 Aug 2018 19:54:58 +0300
 
 Using a marker symbol to mark line breaks is still necessary, any attempt to avoid it would lead to ridicously
 complicated solution.


Can you explain why? I still don't think I understand.
 

Yea there's a simple alternative to store a point in a global variable in beginning of functions that handle block tags then every time line break is needed fill text between point stored in that var and current point and then go on till the end of the block and fill the final part before end of tag block tag handling function. I needa fix issue with tables first, then I'll try to implement this solution, it seems that it can be more simple that the current one.
Reply | Threaded
Open this post in threaded view
|

bug#32281: shr.el align support patch

Lars Ingebrigtsen
In reply to this post by Bad Blue Bull
Bad Blue Bull <[hidden email]> writes:

> + (fill-region-as-paragraph line-begin (- (point) 1)

My guess is that this would be so slow as to make shr/eww unusable on
typical pages.

shr lives within the constraints of being written in a slow language
(Emacs Lisp) and eschews features that makes it too slow for normal
usage.  There's a gazillion rendering features we could add, but we
haven't because it's not practical within normal timing constraints.

--
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no



Reply | Threaded
Open this post in threaded view
|

bug#32281: shr.el align support patch

Bad Blue Bull
In reply to this post by Bad Blue Bull
Lars Ingebrigtsen yeah I knew you'd say smth like that :-)
I wanted to make Emacs view fb2 files (ebooks), and I wanted to convert them into HTML and view them in EWW, but I've just found out that nov.el is a great package that displays epub files and I'll rather convert fb2 files to epub and view them in it.
So I withdraw this patch, can't say it was somehow slow on pages I tested though.
 
 
07.08.2018, 21:16, "Lars Ingebrigtsen" <[hidden email]>:

Bad Blue Bull <[hidden email]> writes:
 

 + (fill-region-as-paragraph line-begin (- (point) 1)


My guess is that this would be so slow as to make shr/eww unusable on
typical pages.

shr lives within the constraints of being written in a slow language
(Emacs Lisp) and eschews features that makes it too slow for normal
usage. There's a gazillion rendering features we could add, but we
haven't because it's not practical within normal timing constraints.
 

--
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no



 

Reply | Threaded
Open this post in threaded view
|

bug#32281: shr.el align support patch

Noam Postavsky
tags 32281 wontfix
close 32281
quit

Bad Blue Bull <[hidden email]> writes:

> Lars Ingebrigtsen yeah I knew you'd say smth like that :-)
> I wanted to make Emacs view fb2 files (ebooks), and I wanted to
> convert them into HTML and view them in EWW, but I've just found out
> that nov.el is a great package that displays epub files and I'll
> rather convert fb2 files to epub and view them in it.
> So I withdraw this patch, can't say it was somehow slow on pages I tested though.

Okay, I'll close the bug then.

By the way, it seems to me there was some miscommunication over the term
"marker": you were using it to describe actual characters inserted into
the buffer, but you might want to take a look at `(elisp) Markers' which
can be used to similar effect but without actual characters cluttering
things up.