
About

Feeds (v1.5) is a metadata service provided by Caltech Library. It contains metadata harvested from our institutional repositories, public directory and archival systems. It is part of an initiative to allow data science techniques to be applied to library- and archive-curated metadata for both internal and external needs.


The content of feeds can easily be hosted in a bucket-oriented object store (e.g. AWS S3, Minio buckets). All content is calculated and pre-rendered. This doesn’t mean the content cannot be made interactive. Modern browser applications can easily be built from the metadata if you know what you’re looking for and the path to retrieve the appropriate JSON document. The feed search implementation is a good example, as is our “widget builder”, which makes it easy to integrate feed content into your favorite CMS.
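Because every document is pre-rendered, a client only needs a path and a JSON parser. The following is a minimal sketch of that idea, not the actual feeds API: the base URL, the example path and the `title` and `year` fields are all hypothetical placeholders chosen for illustration.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen

# Hypothetical base URL; substitute the real location of the feeds bucket.
BASE_URL = "https://feeds.example.org/"


def feed_url(path, base=BASE_URL):
    """Join a relative document path onto the base URL of the static site."""
    return urljoin(base, path.lstrip("/"))


def fetch_json(path, base=BASE_URL):
    """Retrieve and decode one pre-rendered JSON document."""
    with urlopen(feed_url(path, base)) as response:
        return json.loads(response.read().decode("utf-8"))


def titles_since(records, year):
    """Filter on the client side, since a static bucket cannot run queries.

    Assumes each record is a dict with hypothetical "title" and "year" keys.
    """
    return [r["title"] for r in records if r.get("year", 0) >= year]
```

With a real path in hand, usage might look like `titles_since(fetch_json("people/example/recent.json"), 2020)`, where the path shown is again only a placeholder. All filtering happens in the client, which is what allows the content itself to live in a plain object store.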

The implementation first harvests from our curated data sources (i.e. CaltechAUTHORS, CaltechDATA, CaltechTHESIS, CaltechGROUPS and CaltechPEOPLE), storing the results as dataset collections. These are then aggregated via dsquery and Python programs. JSON documents are rendered first, then Markdown documents, BibTeX and RSS. Finally, Pandoc is used to render HTML and HTML include documents from the Markdown content.
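The rendering order described above can be sketched in a few lines of Python. This is an illustrative outline under stated assumptions, not the programs feeds actually runs: the record fields, the citation layout and the `@misc` entry type are hypothetical, and the harvesting and dsquery aggregation steps are omitted. Only the Pandoc options (`--from`, `--to`, `--output`) are standard command-line flags.

```python
import json

# Hypothetical aggregated record; real records carry many more fields.
record = {
    "id": "abc123",
    "title": "An Example Thesis",
    "author": "Doe, Jane",
    "year": 2020,
}


def render_json(rec):
    """Step 1: write the record as a JSON document."""
    return json.dumps(rec, indent=2, sort_keys=True)


def render_markdown(rec):
    """Step 2: render a one-line Markdown citation (layout is an assumption)."""
    return f"- {rec['author']} ({rec['year']}). *{rec['title']}*.\n"


def render_bibtex(rec):
    """Step 2, continued: render a BibTeX entry (the entry type is an assumption)."""
    return (
        f"@misc{{{rec['id']},\n"
        f"  title = {{{rec['title']}}},\n"
        f"  author = {{{rec['author']}}},\n"
        f"  year = {{{rec['year']}}}\n"
        "}\n"
    )


def pandoc_command(md_path, html_path):
    """Step 3: build the Pandoc invocation that turns Markdown into HTML."""
    return ["pandoc", "--from", "markdown", "--to", "html",
            "--output", html_path, md_path]
```

Passing the list returned by `pandoc_command` to `subprocess.run` would perform the final conversion, assuming Pandoc is installed. Keeping each output format in its own function mirrors the staged design: every later format is derived from the same aggregated record.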

More details on the changes in v1.5 can be found in the Changes Markdown and HTML documents.