pretty permalink urls for flipbook reader

Bug #302656 reported by raj
Affects: Open Library
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

We need a nice URL for accessing the flipbook, e.g. http://www.archive.org/book/id

We also need a nice page-level permalink URL, e.g. http://www.archive.org/page/id/{index, leafnum, pagenum}

We need subpage linking, ISBN linking, OpenURL-style citation linking, etc.

Tags: qa-verified
raj (raj-archive)
Changed in gnubook:
assignee: nobody → raj-archive
importance: Undecided → High
mangtronix (mang)
Changed in gnubook:
assignee: raj-archive → mang
status: New → In Progress
mangtronix (mang) wrote :

Additionally, we need a way to specify at least search-term highlighting. See feature request https://bugs.launchpad.net/openlibrary/+bug/126611 in Open Library.

solrize (solrize) wrote :

We also need a full-text search API that the flipbook would use instead of the current hack of regexp searching through the ABBYY XML. Among other things, that would save a ton of disk space by letting us gzip the XML.

mangtronix (mang) wrote :

This is required for Archive.org EAD support.

mangtronix (mang) wrote :

Here's the latest version of the URL scheme (to be implemented): http://openlibrary.org/dev/docs/bookurls

Anand Chitipothu (anandology) wrote : Re: [Bug 302656] Re: pretty permalink urls for flipbook reader

2009/4/2 mangtronix <email address hidden>:
> Here's the latest version of the URL scheme (to be implemented):
> http://openlibrary.org/dev/docs/bookurls

Will the URL be changed if user scrolls to the next page?

mangtronix (mang) wrote :

It currently does, so that's the plan.

Anand Chitipothu (anandology) wrote :

> It currently does so that's the plan.

It changes only the anchor, not the whole url.

Changing the whole URL will require the page to be reloaded, which
will disrupt the reading experience of the user.

mangtronix (mang) wrote :

That's a good point. We could move the page number after the hash mark, though it wouldn't look as nice.

Anand Chitipothu (anandology) wrote :

2009/4/7 mangtronix <email address hidden>:
> That's a good point.  We could move the page number after the hash-mark
> though it wouldn't look as nice.

I don't think it is feasible to do it the other way.
Since you don't store the position of the page on the screen (in
single page mode), if you reload the page by changing the URL, it may
not be able to restore the page correctly.

raj (raj-archive) wrote :

One thought is you can append a page (or image) number to the hash, that overrides the page key. For example, scrolling down one page might lead to something like this:

http://www.archive.org/stream/aliceinwonderlan00carriala/page/23#page24

If someone were to copy and paste that URL, the bookreader, on init(), could redirect to /page/24

If scrolling updates the hash, do we need a 'cite this page' link that produces canonical urls?
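
To make the override idea concrete, here is a minimal sketch of how a URL such as the one above could be folded back into a plain /page/N URL. It is written in Python only to show the string handling; the function name and both patterns are assumptions based on the single example in this comment, not the bookreader's actual code.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Assumed formats, taken from the example above: a path ending in
# /page/<n> and a fragment of the form page<n>. Both are illustrative.
_PATH_PAGE = re.compile(r"/page/[^/]+$")
_HASH_PAGE = re.compile(r"^page(\w+)$")


def resolve_hash_override(url: str) -> str:
    """Return the URL with the hash page folded into the path, if present."""
    parts = urlsplit(url)
    match = _HASH_PAGE.match(parts.fragment)
    if not match:
        return url  # no override in the hash, leave the URL alone
    # Drop any existing /page/<n> suffix, then append the page from the hash.
    base = _PATH_PAGE.sub("", parts.path)
    path = f"{base}/page/{match.group(1)}"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, ""))
```

With this sketch, the URL ending in /page/23#page24 resolves to the same URL ending in /page/24, which is also the form a 'cite this page' link would presumably hand out.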

mangtronix (mang) wrote :

Hmmm, I wonder if we shouldn't just put everything after the hash when someone comes in for interactive reading. We want a 'cite this page' link but also to support bookmarking and copy-paste out of the location bar as well.

e.g. if you come in here:
http://www.archive.org/stream/aliceinwonderlan00carriala/page/23

we would redirect you to here:
http://www.archive.org/stream/aliceinwonderlan00carriala#page23

But then we need to figure out how to encode the other key-value pairs.

solrize (solrize) wrote :

Fragments (the parts of the URL after the #) can only be read with JavaScript, right? Even though the book reader is in JavaScript, maybe it's better to use query parameters, so that the URLs might be usable in non-JS implementations (e.g. ones that render the page on the server side).

http://blog.kfish.org/2009/04/discovery-and-fallback-for-media.html

may be of interest.

mangtronix (mang) wrote :

Thanks for the link. We want to dynamically update the URL in the JavaScript viewer. The URLs given out by the share/embed/link button could still use /page/n

mangtronix (mang) wrote :

YouTube uses the # fragment syntax, fwiw. E.g. http://www.youtube.com/watch?v=5RbbQmnonuU#t=0m18s

The part after the hash mark is only visible on the client-side (in general) so this technique precludes server-side fallback.
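
A small sketch of why that is: a browser builds the HTTP request line from the path and query only, so the fragment never leaves the client. The helper name below is made up for illustration and uses only Python's standard library.

```python
from urllib.parse import urlsplit


def request_target(url: str) -> str:
    """Return the path-and-query string a browser sends on the request line."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    # parts.fragment is deliberately left out: it is never sent to the server.
    return target
```

For the YouTube example above the server would only see /watch?v=5RbbQmnonuU, so a server-side renderer has no way to recover the t=0m18s part.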

mangtronix (mang) wrote :

Here's an idea:

1. Canonical URLs will not use fragment syntax (will not contain a hash mark). The embed/share functionality only generates canonical URLs.

2. User agents visiting a canonical URL that maps to a non-JS reader will read consecutive pages by loading a new page, so the reader url will be updated correctly in this case.
e.g. after retrieving http://www.archive.org/stream/alice/page/23 it will retrieve http://www.archive.org/stream/alice/page/24

3. User agents visiting a canonical URL that maps to a JS client side reader will be redirected to a URL where the trailing portion of the URL containing reader options is replaced with a hash fragment. We use a redirect to avoid multiple page loads.
e.g. the browser requests http://www.archive.org/stream/alice/page/23 and is redirected to http://www.archive.org/alice#page/23

The client-side reader will dynamically update the browser location using location.replace. This allows bookmarking, and if the window URL is copied and pasted, the correct page will load.

If the embed/share functionality is invoked a canonical URL is generated. This is more future proof and looks better in citations.

The problem with this scheme arises if we redirect someone to http://www.archive.org/alice#page/23: that URL will be served with a JS client-side reader, so if they *don't* have JavaScript we lose everything after the hash mark. Not sure how big a problem this is.
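
The redirect in step 3 could be computed roughly as follows. This is only a sketch under stated assumptions: it assumes the path layout /stream/<id>/<options...>, and it keeps the /stream prefix in the target (as in the /stream/alice#page/23 form used elsewhere in this thread) rather than dropping it as in the example above. The function name is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def to_reader_url(canonical: str) -> str:
    """Move the reader options at the end of a canonical URL into the hash."""
    parts = urlsplit(canonical)
    segments = parts.path.split("/")
    # Assumed layout: ["", "stream", "<id>", <option segments>...]
    base = segments[:3]
    options = [s for s in segments[3:] if s]
    if not options:
        return canonical  # nothing to move, no redirect needed
    return urlunsplit(
        (parts.scheme, parts.netloc, "/".join(base), parts.query, "/".join(options))
    )
```

The server would send this value in a Location header. Because the options then live only in the fragment, a client without JavaScript that follows the redirect cannot pass them back to the server.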

mangtronix (mang) wrote :

Hmm, it turns out going through our code path for /stream is pretty slow (it needs to do a bunch of checks to see what the type of the item is), so I'm hesitant to do a Location redirect as I suggested in #3 above.

I think I'll use Raj's suggestion and have the JS user agent append the current location using the hash mark and location.replace. This also solves the problem of sending a non-JS user agent to a URL where state is only encoded after the hash mark.

So browser-generated URLs will look like this as the user browses through the book:
http://www.archive.org/stream/aliceinwonderlan00carriala/page/23#page24

The canonical URL for this example as generated by the "share" functionality would be this:
http://www.archive.org/stream/aliceinwonderlan00carriala/page/24

mangtronix (mang) wrote :

I have GnuBook working at /stream on my dev host supporting the old #n page index functionality.

mangtronix (mang) wrote :

After further thought and hacking, for our AJAX bookreader (GnuBook, on /stream) using ONLY the hash mark seems to make more sense, since it can be updated dynamically, giving more consistent URLs (no /page/23 followed by an overriding #page/24). We don't currently have plans for a non-JS reader on /stream.

Some typical bookreader URLs:
http://www.archive.org/stream/alice#page/23
http://www.archive.org/stream/alice#i/30/mode/2up

mangtronix (mang)
Changed in gnubook:
milestone: none → 0.9.4
mangtronix (mang)
Changed in gnubook:
milestone: 0.9.4 → 0.9.6
mangtronix (mang)
Changed in gnubook:
milestone: 0.9.6 → 0.9.7
mangtronix (mang)
Changed in gnubook:
status: In Progress → Triaged
mangtronix (mang) wrote :

Please check that the page, index and mode portions of the bookreader URL support are working properly.

Here's an example URL:
http://www-mang.archive.org/stream/polarregionsofwe00snelrich#page/31/mode/1up

This doc explains how the URL handling should work:
http://openlibrary.org/dev/docs/bookurls

You should, for example:
* try changing the number after page to different (and invalid) values
* try to break the URL handling (repeating parts of the URL, bad formatting)
* see that the order of the options after the hash mark is correct as specified in the document

Changed in gnubook:
assignee: mangtronix (mang) → Bonnie Real (bonnie-archive)
tags: added: needs-qa
mangtronix (mang) wrote :

Only the mode, index (i) and page parts of the URL are currently implemented. Specifying the page index just using a number should also work.

e.g.
http://www-mang.archive.org/stream/polarregionsofwe00snelrich#32
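
For anyone testing, a rough model of the fragment handling may help. The sketch below parses the three implemented parts plus the bare-number form. It is not the GnuBook code: the function name, treating a bare number as the index (i), keeping page as a string, and rejecting bad input with an error are all assumptions made for illustration. The real reader may instead ignore or fall back on bad input.

```python
import re

# The parts described above as implemented; anything else is rejected here.
KNOWN_KEYS = ("page", "i", "mode")
_DIGITS = re.compile(r"[0-9]+")


def parse_fragment(fragment: str) -> dict:
    """Parse 'page/31/mode/1up', 'i/30/mode/2up' or a bare index like '32'."""
    fragment = fragment.lstrip("#")
    if _DIGITS.fullmatch(fragment):
        return {"i": int(fragment)}  # assumption: bare number means index
    tokens = [t for t in fragment.split("/") if t]
    if len(tokens) % 2:
        raise ValueError(f"key without a value in {fragment!r}")
    params = {}
    for key, value in zip(tokens[0::2], tokens[1::2]):
        if key not in KNOWN_KEYS:
            raise ValueError(f"unknown key {key!r}")
        if key in params:
            raise ValueError(f"repeated key {key!r}")
        params[key] = value
    if "i" in params:
        if not _DIGITS.fullmatch(params["i"]):
            raise ValueError("index must be a non-negative integer")
        params["i"] = int(params["i"])
    # page is kept as a string on the assumption that printed page
    # numbers are not always plain integers.
    return params
```

Under these assumptions, page/31/mode/1up yields a page of "31" and a mode of "1up", while a repeated or unknown key raises an error. The sketch does not check the order of the options, which is specified in the bookurls document.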

tags: added: qa-verified
removed: needs-qa
Changed in gnubook:
assignee: Bonnie Real (bonnie-archive) → mangtronix (mang)
mangtronix (mang)
Changed in gnubook:
status: Triaged → Fix Released
Changed in openlibrary:
assignee: nobody → mangtronix (mang)
assignee: mangtronix (mang) → nobody
Changed in openlibrary:
status: New → Fix Released