[PW] List of blurbs

ADSGarson O'Toole adsgarsonotoole at gmail.com
Mon Feb 28 14:52:40 PST 2022


John Henderson wrote:
> A friend has a question about the blurbs on the dust jacket of
> books. Are they indexed anywhere? Are they somehow
> searchable?

John Henderson has raised an intriguing question. The capability to
search blurbs would be helpful to researchers, but I do not know of
any existing capability of that type.

A natural approach would be to segment the scans in the existing
databases and augment their metadata, which would make this kind of
search possible.

Currently, a few major databases containing books and periodicals
exist, e.g., the Internet Archive, HathiTrust, and Google Books. Each
book (or periodical) consists of a series of scans together with some
metadata.

Proposal number one (I suspect that others have already proposed this,
but it has not been accomplished):

The scans of each book should be partitioned within the database. This
partitioning should be performed for every book, even if a book is
under copyright and only snippets can be shown to database users.

Scan of cover
Scans of blurbs
Scan of title page
Scans of publisher information page
Scans of foreword
Scans of preface
Scans of introduction
Scans of main body
Scans of afterword

If the preface, introduction, or afterword is written by an author who
differs from the main author(s) then the metadata should be augmented
with the section name and author name(s).

If chapters have different authors then the scans should be
partitioned into chapters, and the metadata should record each chapter
name and the corresponding author name(s).

By "partition" I mean the logical organization of the database, not
the physical layout of the data across the memory hierarchy (e.g.,
RAM, flash memory, hard drive).
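
A minimal sketch of how such a logical partition might be recorded,
in Python. The class names, field names, and sample book are
hypothetical illustrations and do not describe any existing database
schema.

```python
from dataclasses import dataclass, field
from typing import List

# Section labels taken from the list in proposal number one, plus
# "chapter" for books whose chapters have different authors.
SECTION_KINDS = (
    "cover", "blurbs", "title page", "publisher information",
    "foreword", "preface", "introduction", "main body", "afterword",
    "chapter",
)


@dataclass
class Section:
    kind: str            # one of SECTION_KINDS
    first_scan: int      # index of the first scan in the section (inclusive)
    last_scan: int       # index of the last scan in the section (inclusive)
    name: str = ""       # e.g. a chapter name, when relevant
    # Filled in only when the section's author differs from the book's.
    authors: List[str] = field(default_factory=list)

    def __post_init__(self):
        if self.kind not in SECTION_KINDS:
            raise ValueError(f"unknown section kind: {self.kind!r}")
        if self.first_scan > self.last_scan:
            raise ValueError("first_scan must not exceed last_scan")


@dataclass
class Book:
    title: str
    authors: List[str]
    sections: List[Section] = field(default_factory=list)

    def section_authors(self, section: Section) -> List[str]:
        # Fall back to the book's main author(s) when none are recorded.
        return section.authors or self.authors


# Invented sample record, for illustration only.
book = Book(
    title="Example Title",
    authors=["A. Author"],
    sections=[
        Section("blurbs", 1, 2),
        Section("introduction", 5, 9, authors=["B. Writer"]),
        Section("main body", 10, 250),
    ],
)
```

The partition lives entirely in the metadata: the scans themselves
are untouched, and each section is just a labeled range of scans.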

When the blurbs for each book have been properly identified and
partitioned, it will be possible to simultaneously search the blurbs
of all the books (and only the blurbs). To go further, it would be
possible to partition each blurb section into individual blurbs, each
tagged with its author, but this would be a more extensive change to
the database.

When the introductions of each book have been properly identified and
partitioned, it will be possible to simultaneously search the
introductions of all the books (and only the introductions). It will
also be possible to specify the author of the introduction.

Overall, more sophisticated queries will be possible, and the results
from queries will be more informative.
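
To illustrate the kind of query this would allow, here is a small
Python sketch that searches one kind of section across all books and
optionally restricts the match to a named section author. The record
layout, the function name search_sections, and the sample data are
all assumptions made for the example; a real database would use a
full-text index rather than a linear scan.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple


@dataclass
class Section:
    kind: str                     # e.g. "blurbs" or "introduction"
    text: str                     # OCR text of the section's scans
    authors: List[str] = field(default_factory=list)


@dataclass
class Book:
    title: str
    sections: List[Section]


def search_sections(
    books: List[Book],
    kind: str,
    phrase: str,
    author: Optional[str] = None,
) -> Iterator[Tuple[str, Section]]:
    """Yield (book title, section) for each section of the given kind
    whose text contains the phrase, ignoring case. If author is given,
    only sections listing that author are considered."""
    needle = phrase.casefold()
    for book in books:
        for section in book.sections:
            if section.kind != kind:
                continue
            if author is not None and author not in section.authors:
                continue
            if needle in section.text.casefold():
                yield book.title, section


# Invented sample data, for illustration only.
books = [
    Book("First Example", [
        Section("blurbs", "A dazzling debut, says one reviewer."),
        Section("main body", "The word dazzling also appears here."),
    ]),
    Book("Second Example", [
        Section("introduction", "A dazzling study.", ["B. Writer"]),
    ]),
]

blurb_hits = [title for title, _ in search_sections(books, "blurbs", "dazzling")]
intro_hits = [
    title
    for title, _ in search_sections(
        books, "introduction", "dazzling", author="B. Writer"
    )
]
```

Here blurb_hits contains only "First Example": the match in that
book's main body and the match in the other book's introduction are
both ignored because they fall outside the blurb partition.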

For the Internet Archive, this partitioning process could be
crowdsourced. After borrowing a book (or periodical) through
controlled digital lending, a user would examine it and designate the
starting page and ending page of each section. The Internet Archive
would need to build a tool for this task and write a short tutorial.
Partitions will sometimes overlap (i.e., share pages).
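
A crowdsourcing tool of this kind would need to check each
submission and accept overlapping sections. The Python sketch below
shows one way that check might look. The function names and the
submission format (a mapping from section label to a 1-based,
inclusive page range) are assumptions for illustration and do not
describe any existing Internet Archive tool.

```python
from itertools import combinations
from typing import Dict, List, Tuple

PageRange = Tuple[int, int]  # (starting page, ending page), inclusive


def validate_ranges(ranges: Dict[str, PageRange], page_count: int) -> List[str]:
    """Return a list of problems found in a user's submission.
    An empty list means every range lies within the book."""
    errors = []
    for label, (start, end) in ranges.items():
        if not (1 <= start <= end <= page_count):
            errors.append(f"{label}: invalid range {start}-{end}")
    return errors


def shared_pages(ranges: Dict[str, PageRange]) -> Dict[Tuple[str, str], PageRange]:
    """Report which pairs of sections overlap and on which pages.
    Overlap is allowed, e.g. a preface that ends on the page where
    the introduction begins, so it is reported rather than rejected."""
    overlaps = {}
    for (a, (a0, a1)), (b, (b0, b1)) in combinations(ranges.items(), 2):
        first, last = max(a0, b0), min(a1, b1)
        if first <= last:
            overlaps[(a, b)] = (first, last)
    return overlaps


# Invented submission for a hypothetical 320-page book.
submission = {
    "preface": (7, 9),
    "introduction": (9, 14),
    "main body": (15, 300),
}

problems = validate_ranges(submission, page_count=320)
overlaps = shared_pages(submission)
```

In this example the preface and the introduction share page 9, which
the tool records without treating it as an error.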

Proposal number two (very important to fix the problem of misdating):

The scans of each volume of a periodical should be partitioned.

Scans of the first issue in the volume (specify the date)
Scans of the second issue in the volume (specify the date)
Scans of the third issue in the volume (specify the date)
Scans of each of the remaining issues in turn (specify each date)

The goal of this partitioning is to allow the precise determination of
the issue and date whenever the results of a query match are returned
to the user.
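
Once each issue's first scan and date are recorded, mapping a query
match to its issue is a simple lookup. The Python sketch below uses a
binary search over the issues' first scans. The table layout, the
function name issue_for_scan, and the sample dates are invented for
illustration.

```python
from bisect import bisect_right
from datetime import date
from typing import List, Tuple

# One row per issue in a volume, sorted by first scan:
# (index of the issue's first scan, issue number, issue date).
Issue = Tuple[int, int, date]

# Invented sample volume, for illustration only.
issues: List[Issue] = [
    (1, 1, date(1901, 1, 5)),
    (33, 2, date(1901, 1, 12)),
    (65, 3, date(1901, 1, 19)),
]


def issue_for_scan(issues: List[Issue], scan: int) -> Tuple[int, date]:
    """Return (issue number, issue date) for the issue containing the
    given scan. Assumes issues are sorted by first scan and that each
    issue runs until the next one begins."""
    starts = [first for first, _, _ in issues]
    i = bisect_right(starts, scan) - 1
    if i < 0:
        raise ValueError("scan precedes the first issue in the volume")
    _, number, issued = issues[i]
    return number, issued


# A query match on scan 40 falls in the second issue.
number, issued = issue_for_scan(issues, 40)
```

With this table in place, a match can be reported with the date of
its own issue instead of a single date for the whole volume.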

Garson O'Toole
Once a computer scientist, always a computer scientist
QuoteInvestigator.com

On Sat, Feb 19, 2022 at 5:32 AM John Henderson <jrhenderson9 at gmail.com> wrote:
>
> A friend has a question about the blurbs on the dust jacket of books. Are they indexed anywhere? Are they somehow searchable? Have any publishers compiled the blurbs in one place?
>
> Can you look up to see a list of what books that Neal Gaiman or Margo Livesay has contributed a blurb to?
>
> John Henderson
> Ithaca College Library, retired
> _______________________________________________
> Project Wombat - Project-wombat
> list at project-wombat.org
> http://www.project-wombat.org/

