Searching the Oakland Library for books from your Goodreads Shelf

My only New Year's resolution this year was to read more books. Over the past couple of years, I had essentially stopped reading, and that realization was not a joyous one. Reading is a fun way to both learn and be entertained. Jhumpa Lahiri put it well in The Namesake: "That's the thing about books. They let you travel without moving your feet." I'm happy to say that, so far, I've kept my resolution: I've finished two books and am on my third.

Anyway, since I have a subscription to the Oakland Library, I figured it'd be better to borrow the books I want to read instead of buying them each time (even though filling a bookshelf is fun). At the same time, I started keeping track of the books I want to read on Goodreads (GR); its shelves, along with the ability to arrange the books on them in a custom order, make it a useful tool for me. The first book I borrowed from the library was Thinking, Fast and Slow, and the process of finding a copy left me wanting a better way to locate and hold a book. The routine went something like this: find the next book I want to read on my GR shelf, search the Oakland Library website for its availability, find out that it's not available, and rinse and repeat until a copy becomes available (that took me a few days). Putting the book on hold turned out to be particularly important: on one occasion I saw that a book was available at a particular branch and immediately left to pick it up (without putting it on hold), and by the time I got there (~15 minutes later), someone had already borrowed it. I eventually found a copy at a different branch, but I really wanted a better way of doing things.

The solution was to build an API that accesses the Oakland Library catalog (the library doesn't really have an API, so this module scrapes their website) and combine it with the published GR API (XML, really) to tell me whether the books on my shelf are available at the library (with more to come in the future). Working on this project also gave me the opportunity to learn and use Python 3, something I hadn't really done until now (for no good reason besides laziness).

Building an "API" for the Oakland Library Catalog

To search their catalog programmatically, I had to write a simple scraper. It uses the awesome requests library to make the HTTP requests and the equally awesome BeautifulSoup library to parse the HTML response and find the elements I'm looking for. The code is available on GitHub and should be fairly easy to follow. The oaklibapi module is meant to be a standalone API module, so it can be imported into other projects if needed. Here's some sample code showing how to use it:

In [1]: from oaklibapi import OaklandLibraryAPI

In [2]: oaklib = OaklandLibraryAPI(isbn="9780060892999") # you can only search by the book's ISBN

In [3]: oaklib.is_available() # check if the book is available
Out[3]: True

In [4]: oaklib.title() # retrieve the title as returned by the Oakland Library Catalog
Out[4]: 'A canticle for Leibowitz'

That's all there is to it for now. I still need to add a couple of enhancements, specifically retrieving the number of copies available and the branches they're available at.
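To make the scraping step concrete, here is a minimal sketch of parsing a catalog result page, including one possible shape for the copy-count and branch enhancements mentioned above. It swaps BeautifulSoup for the standard library's html.parser so it runs with no extra dependencies, and everything about the page is hypothetical: the class names (title, copy, branch, status) and the search URL are placeholders, not the real Oakland catalog markup.

```python
from html.parser import HTMLParser
from urllib.parse import quote
from urllib.request import urlopen

# Placeholder endpoint: NOT the real Oakland catalog URL.
SEARCH_URL = "https://catalog.example.org/search?isbn={isbn}"


def fetch_result_page(isbn):
    """Download the (hypothetical) search result page for one ISBN."""
    with urlopen(SEARCH_URL.format(isbn=quote(isbn))) as resp:
        return resp.read().decode("utf-8", errors="replace")


class CatalogResultParser(HTMLParser):
    """Collects the title and [branch, status] pairs from a result page.

    Assumes hypothetical markup where the fields are leaf elements
    (plain text, no nested tags) tagged with these class names.
    """

    def __init__(self):
        super().__init__()
        self.title = ""
        self.copies = []    # one [branch, status] pair per copy row
        self._field = None  # field whose text we are currently reading

    def handle_starttag(self, tag, attrs):
        # Attribute values can be None in html.parser, hence the `or ""`.
        classes = (dict(attrs).get("class") or "").split()
        if "copy" in classes:
            self.copies.append(["", ""])
        for name in ("title", "branch", "status"):
            if name in classes:
                self._field = name

    def handle_endtag(self, tag):
        self._field = None

    def handle_data(self, data):
        text = data.strip()
        if not text or self._field is None:
            return
        if self._field == "title":
            self.title += text
        elif self.copies:
            index = 0 if self._field == "branch" else 1
            self.copies[-1][index] += text


def summarize(html):
    """Title, availability, available-copy count and branches for one page."""
    parser = CatalogResultParser()
    parser.feed(html)
    available = [branch for branch, status in parser.copies
                 if status.lower() == "available"]
    return {
        "title": parser.title,
        "is_available": bool(available),
        "available_copies": len(available),
        "branches": sorted(set(available)),
    }


SAMPLE = """
<h1 class="title">A canticle for Leibowitz</h1>
<table>
  <tr class="copy"><td class="branch">Main Library</td><td class="status">Available</td></tr>
  <tr class="copy"><td class="branch">Rockridge</td><td class="status">Checked out</td></tr>
  <tr class="copy"><td class="branch">Temescal</td><td class="status">Available</td></tr>
</table>
"""

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

Keeping fetching and parsing separate means the parser can be exercised against a saved page without hitting the live site.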

One interesting thing to note: since version 4, BeautifulSoup has let users specify which parser to use. I went with the lxml parser since I needed it to parse the XML from the GR API anyway. I didn't benchmark anything, but lxml is documented as the faster option, and, based on my very unscientific observations, instantiating the OaklandLibraryAPI class did seem quicker.
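For reference, a small sketch of naming the parser explicitly, falling back to Python's built-in html.parser when lxml isn't installed. BeautifulSoup and FeatureNotFound are real bs4 names; the make_soup helper is hypothetical and not part of oaklibapi.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(markup):
    """Parse HTML with lxml when available, else Python's built-in parser.

    Naming the parser explicitly also avoids bs4's warning about no
    parser being specified.
    """
    try:
        return BeautifulSoup(markup, "lxml")
    except FeatureNotFound:
        # lxml is not installed; html.parser ships with Python.
        return BeautifulSoup(markup, "html.parser")


soup = make_soup("<html><body><h1 class='title'>A canticle for Leibowitz</h1></body></html>")
print(soup.find("h1", class_="title").get_text(strip=True))
```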

Integrating with the Goodreads API

Fortunately, GR does publish an API. It does feel rather antiquated, though: responses are in XML (a few resources can also return JSON, an odd inconsistency given that most return only XML), and there are no official client libraries. There are community libraries on GitHub, but none of them did quite what I wanted (admittedly, I only spent a short time looking), and since querying the API and parsing the response myself was fairly easy, I just did that instead.

My goal with this API was simply to retrieve my "to-read" shelf. It took me some time to figure out that the reviews.list resource is the one for this (go figure). The only other thing to note is that BeautifulSoup does not seem to have a default parser for XML documents. I had to explicitly specify lxml when parsing the API response (HTML, by contrast, does get a default parser, though BeautifulSoup throws a big, ugly warning if you don't name one explicitly).
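A minimal sketch of fetching and parsing the shelf, under stated assumptions. It swaps in the standard library's xml.etree.ElementTree instead of BeautifulSoup with lxml, and it assumes the reviews.list response nests each book as review/book with isbn13 and title children; the URL shape and parameters follow the legacy Goodreads API docs and should be treated as an assumption (Goodreads stopped issuing new API keys in 2020). The shelf_url and parse_shelf helpers are hypothetical, but they return dicts in the same form as the sample output below.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Assumed endpoint shape for the legacy reviews.list resource.
REVIEWS_LIST_URL = "https://www.goodreads.com/review/list/{user_id}.xml"


def shelf_url(user_id, access_key, shelf="to-read", page=1, per_page=200):
    """Build the request URL for one page of a user's shelf."""
    query = urlencode({"key": access_key, "v": 2, "shelf": shelf,
                       "page": page, "per_page": per_page})
    return REVIEWS_LIST_URL.format(user_id=user_id) + "?" + query


def parse_shelf(xml_text):
    """Return [{'isbn': ..., 'title': ...}] for each review/book entry."""
    root = ET.fromstring(xml_text)
    books = []
    for review in root.iter("review"):
        book = review.find("book")
        if book is None:
            continue
        books.append({
            # findtext gives "" for an empty element, None if it is missing.
            "isbn": (book.findtext("isbn13") or "").strip(),
            "title": (book.findtext("title") or "").strip(),
        })
    return books


# Assumed response shape (trimmed to the fields used above).
SAMPLE_XML = """<GoodreadsResponse>
  <reviews start="1" end="2" total="2">
    <review><book>
      <isbn13>9780802123459</isbn13>
      <title>The Sympathizer</title>
    </book></review>
    <review><book>
      <isbn13 nil="true"/>
      <title>Split Second</title>
    </book></review>
  </reviews>
</GoodreadsResponse>"""

if __name__ == "__main__":
    print(shelf_url("75811584", "YOUR_KEY"))
    for book in parse_shelf(SAMPLE_XML):
        print(book)
```

Books with no ISBN-13 come back with an empty string, which matches the empty 'isbn' entries in the sample output.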

Here's some sample code on how to use GoodreadsQueryAPI:

In [1]: from goodreads_api import GoodreadsQueryAPI

In [2]: gr = GoodreadsQueryAPI(user_id="75811584", access_key="") # insert your GR API access key for the second param

In [3]: for book in gr.get_books():
   ...:     print(book)
   ...:
{'isbn': '9780802123459', 'title': 'The Sympathizer'}
{'isbn': '9780525427575', 'title': 'Enlightenment Now: The Case for Reason, Science, Humanism, and Progress'}
{'isbn': '9780670022953', 'title': 'The Better Angels of Our Nature: Why Violence Has Declined'}
{'isbn': '9780812536355', 'title': 'A Deepness in the Sky (Zones of Thought, #2)'}
{'isbn': '', 'title': 'The Wandering Earth: Classic Science Fiction Collection'}
{'isbn': '9780765309402', 'title': "Old Man's War (Old Man's War, #1)"}
{'isbn': '9781585422784', 'title': 'Ultramarathon Man: Confessions of an All-Night Runner'}
{'isbn': '9780316547611', 'title': 'The Power'}
{'isbn': '', 'title': 'We Are Legion (We Are Bob) (Bobiverse, #1)'}
{'isbn': '', 'title': 'Split Second'}
{'isbn': '9781101886724', 'title': 'Waking Gods (Themis Files, #2)'}
{'isbn': '9781594631764', 'title': 'And the Mountains Echoed'}
{'isbn': '9780062273208', 'title': 'The Hard Thing About Hard Things: Building a Business When There Are No Easy Answers'}
{'isbn': '9781501126062', 'title': 'Sing, Unburied, Sing'}
{'isbn': '9780670026197', 'title': 'A Gentleman in Moscow'}
{'isbn': '9780553447439', 'title': 'Evicted: Poverty and Profit in the American City'}
{'isbn': '9781594204876', 'title': 'Grant'}
{'isbn': '9781501144318', 'title': 'Why We Sleep: Unlocking the Power of Sleep and Dreams'}

Of course, many improvements can be made to the interface, but this works for a v0.1 :). It would also be good to pull each book's author and to order the books in the response by the position the user set on the shelf.

Putting it all together

Having APIs for both the library and Goodreads makes it easy to put everything together. The script grabs the list of books on the GR shelf and then queries the library for each one's availability. GOODREADS_ACCESS_KEY and GOODREADS_USER_ID are read from settings.py (a template for it is included). For now, it simply prints each book along with its availability (like below). A good v1.0 would be to implement some form of notification system.

(oaklibapi) ~ [master ] $ python oaklibsearcher.py
Looking for title=The Sympathizer, ISBN=9780802123459
Book with title=The sympathizer : a novel is available
Looking for title=Enlightenment Now: The Case for Reason, Science, Humanism, and Progress, ISBN=9780525427575
Book with title=ENLIGHTENMENT NOW : THE CASE FOR REASON, SCIENCE, HUMANISM, AND PROGRESS is not available
Looking for title=The Better Angels of Our Nature: Why Violence Has Declined, ISBN=9780670022953
Book with title=The better angels of our nature : why violence has declined is available
Looking for title=A Deepness in the Sky (Zones of Thought, #2), ISBN=9780812536355
Book with title= is not available
Looking for title=The Wandering Earth: Classic Science Fiction Collection, ISBN=
No ISBN available. Skipping
Looking for title=Old Man's War (Old Man's War, #1), ISBN=9780765309402
Book with title=Old man's war is not available
Looking for title=Ultramarathon Man: Confessions of an All-Night Runner, ISBN=9781585422784
Book with title=Ultramarathon man : confessions of an all-night runner is available
Looking for title=The Power, ISBN=9780316547611
Book with title=The power : a novel is not available
Looking for title=We Are Legion (We Are Bob) (Bobiverse, #1), ISBN=
No ISBN available. Skipping
Looking for title=Split Second, ISBN=
No ISBN available. Skipping
Looking for title=Waking Gods (Themis Files, #2), ISBN=9781101886724
Book with title=Waking gods is available
Looking for title=And the Mountains Echoed, ISBN=9781594631764
Book with title=And the mountains echoed is available
Looking for title=The Hard Thing About Hard Things: Building a Business When There Are No Easy Answers, ISBN=9780062273208
Book with title=The hard thing about hard things : building a business when there are no easy answers is not available
Looking for title=Sing, Unburied, Sing, ISBN=9781501126062
Book with title=Sing, unburied, sing : a novel is not available
Looking for title=A Gentleman in Moscow, ISBN=9780670026197
Book with title=A gentleman in Moscow is not available
Looking for title=Evicted: Poverty and Profit in the American City, ISBN=9780553447439
Book with title=Evicted : poverty and profit in the American city is available
Looking for title=Grant, ISBN=9781594204876
Book with title=Grant is not available
Looking for title=Why We Sleep: Unlocking the Power of Sleep and Dreams, ISBN=9781501144318
Book with title=Why we sleep : unlocking the power of sleep and dreams is not available
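A minimal sketch of the loop behind this output. The lookup parameter is a hypothetical callable (ISBN in, (catalog_title, is_available) or None out) injected so the loop can be tested without network access; the line formats mirror the output above. One deliberate change, labeled as such: when the catalog has no record (as with A Deepness in the Sky, whose line reads "Book with title= is not available"), it reports a missing record rather than "not available", since those are different situations.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# Hypothetical lookup: ISBN -> (catalog_title, is_available), or None if the
# catalog has no record for that ISBN.
Lookup = Callable[[str], Optional[Tuple[str, bool]]]


def report(books: Iterable[Dict[str, str]], lookup: Lookup) -> List[str]:
    """Build status lines in the same format the script prints above."""
    lines = []
    for book in books:
        title, isbn = book["title"], book["isbn"]
        lines.append(f"Looking for title={title}, ISBN={isbn}")
        if not isbn:
            # Goodreads returned no ISBN, so there is nothing to search by.
            lines.append("No ISBN available. Skipping")
            continue
        result = lookup(isbn)
        if result is None:
            # Differs from the post's script: report a missing record
            # instead of lumping it in with "not available".
            lines.append(f"No catalog record for ISBN={isbn}")
            continue
        catalog_title, available = result
        status = "is available" if available else "is not available"
        lines.append(f"Book with title={catalog_title} {status}")
    return lines


# Wiring it to the post's classes might look like this (not executed here;
# assumes an empty title() means the catalog has no record):
#   import settings
#   gr = GoodreadsQueryAPI(user_id=settings.GOODREADS_USER_ID,
#                          access_key=settings.GOODREADS_ACCESS_KEY)
#   def lookup(isbn):
#       lib = OaklandLibraryAPI(isbn=isbn)
#       return (lib.title(), lib.is_available()) if lib.title() else None
#   print("\n".join(report(gr.get_books(), lookup)))
```

Passing the lookup in as a function keeps the reporting logic independent of scraping, which also makes it easy to swap in a notification step later.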

That's about all there is to it, for now. There are quite a few improvements that can be made, and I'll be working on some of them over the coming weeks; I'll post a follow-up as I learn new things.
