My only New Year's resolution this year was to read more books. Over the past couple of years, I had
essentially stopped reading books, and that realization was not a joyous one. Reading books
is a fun way to both learn and be entertained. Jhumpa Lahiri put it well in The Namesake:
"That's the thing about books. They let you travel without moving your feet."
I'm happy to say that, so far, I've kept my resolution: I've finished two books already
and am onto my third.
Anyway, since I have a subscription to the Oakland Library, I figured it'd be better for me to borrow
the books I want to read as opposed to buying them each time (even though it's fun to fill out a
bookshelf). At the same time, I started keeping track of books I want to read on
Goodreads (the shelves, along with the ability to order the books within them, make it a
useful tool for me). The first book I borrowed from the library was
Thinking, Fast and Slow, and the process of finding a copy to borrow left me wanting a
better way to locate and hold a book. The story goes something like this: find the
next book I want to read from my GR shelf, search the Oakland Library website for its
availability, find out that it's not available, rinse and repeat until one
becomes available (took me a few days). Putting the book on hold turned out to be particularly
important: on one occasion I saw that the book was available at a particular branch,
immediately left to go pick it up (without putting it on hold), and by the time I
got there (~15 minutes later), someone had already borrowed it. I eventually did find a copy
at a different branch, but I really wanted a better way of doing things.
The solution to my problem was to build an API that accesses the Oakland Library Catalog (they
don't really have an API, so this library scrapes their website) and combine it with the
published API from GR (XML, really) to tell me whether the books on my shelf are available
at the library or not (and more in the future). At the same time, working on this project
gave me the opportunity to learn and use Python 3, something I hadn't really done until now
(for no good reason besides laziness).
Building an "API" for the Oakland Library Catalog
To be able to search their catalog programmatically, I had to write a simple scraper. It uses
the awesome requests library to make the HTTP requests and the equally awesome BeautifulSoup
library to parse the HTML response and find the elements I'm looking for. The code for all of this
is available on GitHub and should be fairly easy to follow. The oaklibapi module is intended to be
an API module, though, and can be imported into other projects if needed. Some sample code showing
how to use it:
In [1]: from oaklibapi import OaklandLibraryAPI
In [2]: oaklib = OaklandLibraryAPI(isbn="9780060892999") # you can only search by the book's ISBN
In [3]: oaklib.is_available() # check if the book is available
Out[3]: True
In [4]: oaklib.title() # retrieve the title as returned by the Oakland Library Catalog
Out[4]: 'A canticle for Leibowitz'
That's all there is to it as of now. I need to add a couple more enhancements, specifically
around retrieving the number of copies available as well as the branches they're available at.
An interesting bit to note here is that BeautifulSoup, since v4, gives users the option to
specify a parser. I ended up using the lxml parser for this since I had to use it
to parse XML from the GR API anyway. I didn't benchmark things, but the documentation describes
lxml as the faster parser, and in my very non-scientific observations instantiating the
OaklandLibraryAPI class did feel quicker.
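To make the scraping step a bit more concrete, here is a minimal sketch of the parsing half. It is not the code from oaklibapi: the markup in SAMPLE_PAGE, the class names title and status, and the parse_catalog_page helper are all made up for illustration, and it swaps in the standard library's html.parser (instead of requests and BeautifulSoup) so that it runs without installing anything.

```python
from html.parser import HTMLParser

# Hypothetical markup: the real catalog page is laid out differently.
SAMPLE_PAGE = """
<html><body>
  <h1 class="title">A canticle for Leibowitz</h1>
  <table>
    <tr><td class="branch">Main</td><td class="status">DUE 02-14-18</td></tr>
    <tr><td class="branch">Rockridge</td><td class="status">AVAILABLE</td></tr>
  </table>
</body></html>
"""


class CatalogPageParser(HTMLParser):
    """Collects the text of elements whose class is 'title' or 'status'."""

    def __init__(self):
        super().__init__()
        self._current = None  # class of the tracked element we are inside, if any
        self.title = ""
        self.statuses = []

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if css_class in ("title", "status"):
            self._current = css_class

    def handle_data(self, data):
        text = data.strip()
        if not text or self._current is None:
            return
        if self._current == "title":
            self.title = text
        else:
            self.statuses.append(text)

    def handle_endtag(self, tag):
        # None of the tracked elements are nested, so any end tag closes them.
        self._current = None


def parse_catalog_page(html):
    """Return the title and whether at least one copy is marked AVAILABLE."""
    parser = CatalogPageParser()
    parser.feed(html)
    available = any(status.upper() == "AVAILABLE" for status in parser.statuses)
    return {"title": parser.title, "available": available}


print(parse_catalog_page(SAMPLE_PAGE))
```

The same idea carries over to BeautifulSoup, where the class-based lookups become much shorter; the per-copy status list is also where the branch and copy-count enhancements would hook in.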
Integrating with the Goodreads API
Fortunately, GR does publish an API. It does seem rather antiquated,
though: responses are in XML (some are in JSON, which is weird too; why do only some of the
resources return either XML or JSON while most only return XML?) and there are no official
client libraries. There are third-party libraries available on GitHub, but none of them really did
what I wanted (admittedly, I only spent a short amount of time looking), and since it was
fairly easy to query the API and parse the response myself, I just ended up doing that instead of
using one of the available libraries.
My goal with this API was simply to retrieve my "to-read" bookshelf. It took me
some time to figure out that I had to use the reviews.list
resource for this (go figure). The only other thing to note here is that BeautifulSoup does not seem
to have a default parser for XML documents. I had to explicitly specify lxml while parsing the response
from the API (as opposed to HTML, which does have a default parser, although it throws a big and ugly
warning when not overridden).
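As a rough illustration of the parsing step, the sketch below pulls ISBNs and titles out of a shelf response. The XML in SAMPLE_RESPONSE is a simplified, assumed shape rather than a verbatim Goodreads payload, books_from_shelf_xml is a hypothetical helper, and it uses the standard library's xml.etree.ElementTree rather than BeautifulSoup with lxml, purely so it runs without extra installs.

```python
import xml.etree.ElementTree as ET

# Simplified, assumed response shape; the real payload carries many more fields.
SAMPLE_RESPONSE = """<GoodreadsResponse>
  <reviews>
    <review>
      <book><isbn13>9780802123459</isbn13><title>The Sympathizer</title></book>
    </review>
    <review>
      <book><isbn13></isbn13><title>Split Second</title></book>
    </review>
  </reviews>
</GoodreadsResponse>
"""


def books_from_shelf_xml(xml_text):
    """Return one {'isbn', 'title'} dict per <book> element in the response."""
    root = ET.fromstring(xml_text)
    books = []
    for book in root.iter("book"):
        books.append({
            # Some books have no ISBN on record, so fall back to an empty string.
            "isbn": (book.findtext("isbn13") or "").strip(),
            "title": (book.findtext("title") or "").strip(),
        })
    return books


for book in books_from_shelf_xml(SAMPLE_RESPONSE):
    print(book)
```

Keeping the empty ISBN as an empty string (rather than dropping the book) lets the caller decide what to do with books that can't be looked up.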
Here's some sample code on how to use GoodreadsQueryAPI:
In [1]: from goodreads_api import GoodreadsQueryAPI
In [2]: gr = GoodreadsQueryAPI(user_id="75811584", access_key="") # insert your GR API access key for the second param
In [3]: for book in gr.get_books():
...:     print(book)
...:
{'isbn': '9780802123459', 'title': 'The Sympathizer'}
{'isbn': '9780525427575', 'title': 'Enlightenment Now: The Case for Reason, Science, Humanism, and Progress'}
{'isbn': '9780670022953', 'title': 'The Better Angels of Our Nature: Why Violence Has Declined'}
{'isbn': '9780812536355', 'title': 'A Deepness in the Sky (Zones of Thought, #2)'}
{'isbn': '', 'title': 'The Wandering Earth: Classic Science Fiction Collection'}
{'isbn': '9780765309402', 'title': "Old Man's War (Old Man's War, #1)"}
{'isbn': '9781585422784', 'title': 'Ultramarathon Man: Confessions of an All-Night Runner'}
{'isbn': '9780316547611', 'title': 'The Power'}
{'isbn': '', 'title': 'We Are Legion (We Are Bob) (Bobiverse, #1)'}
{'isbn': '', 'title': 'Split Second'}
{'isbn': '9781101886724', 'title': 'Waking Gods (Themis Files, #2)'}
{'isbn': '9781594631764', 'title': 'And the Mountains Echoed'}
{'isbn': '9780062273208', 'title': 'The Hard Thing About Hard Things: Building a Business When There Are No Easy Answers'}
{'isbn': '9781501126062', 'title': 'Sing, Unburied, Sing'}
{'isbn': '9780670026197', 'title': 'A Gentleman in Moscow'}
{'isbn': '9780553447439', 'title': 'Evicted: Poverty and Profit in the American City'}
{'isbn': '9781594204876', 'title': 'Grant'}
{'isbn': '9781501144318', 'title': 'Why We Sleep: Unlocking the Power of Sleep and Dreams'}
Of course, many improvements can be made to the interface, but this works for a v0.1 :). I think it'd also be good to
be able to pull the name of the author, as well as to order
the books in the response by the position defined by the user.
Putting it all together
Having an API for both the library and Goodreads makes it easy to put everything together. Grabbing the list of
books on the GR shelf and then querying each one to see its availability is
already done.
GOODREADS_ACCESS_KEY and GOODREADS_USER_ID are read in from settings.py (a template for that is
available). For now, it simply
spits out the list of books along with their availability (like below). A good v1.0 would be to implement
some form of a notification system.
(oaklibapi) ~ [master ] $ python oaklibsearcher.py
Looking for title=The Sympathizer, ISBN=9780802123459
Book with title=The sympathizer : a novel is available
Looking for title=Enlightenment Now: The Case for Reason, Science, Humanism, and Progress, ISBN=9780525427575
Book with title=ENLIGHTENMENT NOW : THE CASE FOR REASON, SCIENCE, HUMANISM, AND PROGRESS is not available
Looking for title=The Better Angels of Our Nature: Why Violence Has Declined, ISBN=9780670022953
Book with title=The better angels of our nature : why violence has declined is available
Looking for title=A Deepness in the Sky (Zones of Thought, #2), ISBN=9780812536355
Book with title= is not available
Looking for title=The Wandering Earth: Classic Science Fiction Collection, ISBN=
No ISBN available. Skipping
Looking for title=Old Man's War (Old Man's War, #1), ISBN=9780765309402
Book with title=Old man's war is not available
Looking for title=Ultramarathon Man: Confessions of an All-Night Runner, ISBN=9781585422784
Book with title=Ultramarathon man : confessions of an all-night runner is available
Looking for title=The Power, ISBN=9780316547611
Book with title=The power : a novel is not available
Looking for title=We Are Legion (We Are Bob) (Bobiverse, #1), ISBN=
No ISBN available. Skipping
Looking for title=Split Second, ISBN=
No ISBN available. Skipping
Looking for title=Waking Gods (Themis Files, #2), ISBN=9781101886724
Book with title=Waking gods is available
Looking for title=And the Mountains Echoed, ISBN=9781594631764
Book with title=And the mountains echoed is available
Looking for title=The Hard Thing About Hard Things: Building a Business When There Are No Easy Answers, ISBN=9780062273208
Book with title=The hard thing about hard things : building a business when there are no easy answers is not available
Looking for title=Sing, Unburied, Sing, ISBN=9781501126062
Book with title=Sing, unburied, sing : a novel is not available
Looking for title=A Gentleman in Moscow, ISBN=9780670026197
Book with title=A gentleman in Moscow is not available
Looking for title=Evicted: Poverty and Profit in the American City, ISBN=9780553447439
Book with title=Evicted : poverty and profit in the American city is available
Looking for title=Grant, ISBN=9781594204876
Book with title=Grant is not available
Looking for title=Why We Sleep: Unlocking the Power of Sleep and Dreams, ISBN=9781501144318
Book with title=Why we sleep : unlocking the power of sleep and dreams is not available
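The loop that produces output like the above can be sketched roughly as follows. This is not the actual oaklibsearcher.py: availability_report, fake_lookup and FAKE_CATALOG are hypothetical names, and the lookup is a stand-in dictionary (seeded with two results from the run above) instead of a real call to the library catalog, so the sketch runs offline.

```python
def availability_report(books, lookup):
    """Build the report lines for a shelf.

    books  -- iterable of {'isbn': ..., 'title': ...} dicts
    lookup -- callable taking an ISBN and returning (catalog_title, is_available)
    """
    lines = []
    for book in books:
        lines.append("Looking for title={}, ISBN={}".format(book["title"], book["isbn"]))
        if not book["isbn"]:
            # The catalog can only be searched by ISBN, so skip these.
            lines.append("No ISBN available. Skipping")
            continue
        catalog_title, available = lookup(book["isbn"])
        status = "available" if available else "not available"
        lines.append("Book with title={} is {}".format(catalog_title, status))
    return lines


# Stand-in for the library lookup: ISBN -> (catalog title, is it available?).
FAKE_CATALOG = {
    "9780802123459": ("The sympathizer : a novel", True),
    "9780316547611": ("The power : a novel", False),
}


def fake_lookup(isbn):
    return FAKE_CATALOG.get(isbn, ("", False))


shelf = [
    {"isbn": "9780802123459", "title": "The Sympathizer"},
    {"isbn": "", "title": "Split Second"},
    {"isbn": "9780316547611", "title": "The Power"},
]

for line in availability_report(shelf, fake_lookup):
    print(line)
```

Passing the lookup in as a callable keeps the loop independent of the scraper, which also makes it easy to test without hitting the library website.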
That's about all there is to it, for now. There are quite a few improvements that can be made to it, and I'll be working on
a few of them over the coming weeks; I'll post a follow-up as I learn new things.