Version 1.5 changes
Our feeds system has been updated as a result of the CaltechAUTHORS repository migrating from EPrints to Invenio RDM. As a result some things on feeds.library.caltech.edu had to be changed. This is a list of intended changes.
- Record ids changed between systems for CaltechAUTHORS so if you look at those links they look different
- In the recent directory there is no recent 25 for CaltechTHESIS content, just doesn’t make sense we graduate people in “class of” groupings, which 25 people should be listed? That has gone away as a result
- The People directory only lists Caltech People who have publications in CaltechAUTHORS
- With the migration from one system to another many Caltech GROUPs have revised names and identifiers, e.g.
/groups/TCCON
became /groups/Total-Carbon-Column-Observing-Network
- Minor HTML markup changes to make the feeds site more accessible (e.g. A to Z list in groups now uses a “menu” element instead of a paragraph with pipe delimiters)
- There are a few additional JSON documents included in the htdocs tree that are used to render content in Markdown, HTML and HTML include formats
- Dataset collections are no longer being published and the
*.keys
file are no longer generated
- Some legacy JSON documents have been preserved when possible but may go away in a future release
- Pandoc is used exclusively to render Markdown, HTML, HTML Includes, BibTeX and RSS files from JSON files rendered from the repositories and collections
- Pandoc templates can be found in the “templates” directory. Their file extensions correspond to the format they are intended to render
- The generated Markdown is used to render both HTML, HTML Includes
- HTML Include is generated directly by Pandoc without a template
- BibTeX and RSS required their own templates
- The
recent
directories and their content under individual groups and people are no longer being generated
- The software to generate the feeds website has been completely rewritten. There is invariably changes I have failed to catalog.
- The Caltech Groups list is based on a group having records in either CaltechAUTHORS, CaltechTHESIS or CaltechDATA and being identified within the metadata as a Caltech Group
- The Caltech People list is based on those individuals that are related to Caltech and have records in the CaltechAUTHORS repository
New Feature
- Pagefind provides searching of feed’s HTML pages
Organizational data flow changes for website content
- Everything in the “htdocs” tree is generated, this means that directory can be safely removed and recreated as needed
- Static files that need to be included in the “htdocs” tree can be found in the “static” directory (they are just copied into when needed)
- Generation order in “htdocs” tree is as follows
- static content is rendered into place
- JSON documents
- CSV documents
- Markdown (which is no longer linked in the feed pages)
- HTML/HTML Include
- BibTeX
- RSS
- PageFind indexing is done after tree is populated
System requirements
Feeds v1.5 requires the following software to be built
- irdmtools >= 0.0.59 (use the latest release)
- dataset >= 2.1.6 (use the latest release)
- py_dataset > 2.1.2 (use the latest release)
- pylint
- progressbar2
- PyYAML
- pybtex >= 0.24
- feedgen >= 0.9
- datatools >= 1.2.5 (use the latest release)
- Bash >= 3 (or equivalent POSIX shell)
- Pandoc >= 3
- Postgres >= 12 (prefer 16)
- Python >= 3.10
- PageFind >= v1.0.3
Bash scripts orchestrate most of the processing. Python is used to transformed the legacy data shapes into needed forms and to generate Markdown content via Pandoc.