VARUNA JAYASIRI

@vpj

Page Breaks

April 27, 2016

How to automatically add sensible page breaks?

In Wallaptta we model pagination as a cost minimization problem. That is, we try to find where to place page breaks so that there is no overflow and the cost is minimised. If the cost of adding a page break at a given point is know this can be easily solved with dynamic programming.

The question is how to determine the cost of adding a page-break at a given point.

First we calculated the cost based on the type of sections we split; e.g. list, paragraph, table, between to paragraphs, between two topics. This way the program would try to insert page breaks where the cost is less, like between two topics. Often inserting a page break will split multiple sections.

1 Item one
2 Item two
  <<<<< Page break
  Item two details
3 Item three

For instance the above page break splits a set of paragraphs as well as a list of items. It's important to know the hierarchy of the document when calculating this cost.

When we first tried with just the above cost, it gave two main problems:

  1. Not splitting at the best points

    e.g. when only 4 items would fit in a page

    1. Item 1
    <<<<< Page break
    2. Item 2
    3. Item 3
    4. Item 4
    5. Item 5

    The algorithm would place the break between 1 and 2, whilst the best place is between 4 and 5.

  2. Empty space

    Like in the above example it would leave a lot of empty space in initial pages while filling up later pages.

The first reaction to this was to add breaks at later points if the cost is the same. This didn't work out well as there were instances where the cost of breaking at a earlier point was slightly less - although the cost was less it didn't look nice.

So we introduced a cost based on the height of upper-part of the sections we are splitting. And another cost based on the height of the page left empty.

The cost based on height of the upper-part is an inversely proportionaly to the height; i.e. there's a very large cost if you break a section right after the start. And the cost of empty pages is also inversely proportional to the height filled with content; i.e. a large cost if only a small portion of a page is filled. The empty page cost is not added for the last page.

This algorithm seem to give decent results so far. Here are some example

Cost based on the type of sections getting splitted

Cost based on the height of upper-part of the sections getting splitted, and on the height of the page left empty

--How to automatically add sensible page breaks? In <<https://github.com/vpj/wallapatta(Wallaptta)>> we model pagination as a cost minimization problem. That is, we try to find where to place page breaks so that there is no overflow and the cost is minimised. If the cost of adding a page break at a given point is know this can be easily solved with --dynamic programming. The question is how to determine the cost of adding a page-break at a given point. First we calculated the cost based on the type of sections we split; e.g. --list--, --paragraph--, --table--, --between to paragraphs--, --between two topics--. This way the program would try to insert page breaks where the cost is less, like between two topics. Often inserting a page break will split multiple sections. >>> Cost based on the type of sections getting splitted ``` 1 Item one 2 Item two <<<<< Page break Item two details 3 Item three For instance the above page break splits a set of paragraphs as well as a list of items. It's important to know the hierarchy of the document when calculating this cost. When we first tried with just the above cost, it gave two main problems: - --Not splitting at the best points e.g. when only 4 items would fit in a page ``` 1. Item 1 <<<<< Page break 2. Item 2 3. Item 3 4. Item 4 5. Item 5 The algorithm would place the break between 1 and 2, whilst the best place is between 4 and 5. - --Empty space Like in the above example it would leave a lot of empty space in initial pages while filling up later pages. The first reaction to this was to add breaks at later points if the cost is the same. This didn't work out well as there were instances where the cost of breaking at a earlier point was slightly less - --although the cost was less it didn't look nice. So we introduced a cost based on the height of upper-part of the sections we are splitting. And another cost based on the height of the page left empty. >>> Cost based on the height of upper-part of the sections getting splitted, and on the height of the page left empty The cost based on height of the upper-part is an inversely proportionaly to the height; i.e. there's a very large cost if you break a section right after the start. And the cost of empty pages is also inversely proportional to the height filled with content; i.e. a large cost if only a small portion of a page is filled. --The empty page cost is not added for the last page. This algorithm seem to give decent results so far. Here are some example * <<http://vpj.github.io/images/posts/page_breaks/sample_a4.pdf(A guide - A4 size)>> * <<http://vpj.github.io/images/posts/page_breaks/sample_a5.pdf(Same guide - A5 size)>>