FactMiners and The Softalk Apple Project are excited to announce that The Softalk Apple Project's digital collection of the Apple edition of Softalk magazine is now included in both "The Computer Magazine Archives" and "Magazine Rack" collections at the Internet Archive. Our projects also were granted full admin rights to the Softalk Apple collection at the Archive in support of our applied research.
And the first BIG NEWS made possible by our having full admin access is that The Softalk Apple Project collection is the FIRST (and so-far ONLY) digital magazine, newspaper, or serial publication at the Archive to provide XML-based FactMiners' MAGAZINE #GTS (Ground Truth Storage) metadata files for each issue of the magazine as well as a "master file" for the entire publication/collection!
"Ground Truth Storage" is a term that image-analysis and text-mining researchers use for metadata files that are human-curated and validated to be as close to 100% accurate as possible. The #GTS format that we developed at FactMiners is based on an 'ontological stack' of the #cidocCRM (the International Council of Museums' Conceptual Reference Model for Cultural Heritage), FRBRoo (the object-oriented version of the IFLA's Functional Requirements for Bibliographic Records), and PRESSoo (the IFLA model for serial publications). Rather than focusing on the within-page ground truth of individual page layout and text recognition, the FactMiners' MAGAZINE metadata format incorporates a comprehensive, publication-wide metadata model that integrates the complex Document Structure and Content Depiction models.
In an effort to keep our project collaborators and supporters informed, we made two short demo screencast video updates about our progress developing the Python-based ppg2leaf_ferret metadata discovery and validation tool:
with the second update showcasing our generalization of the ferret to handle bottom-margin page number spotting. The issue we quickly explore is the famous August 1981 issue of Byte magazine all about Smalltalk:
To take a look at the initial iteration of the FactMiners MAGAZINE #GTS (Ground Truth Storage) format metadata files at the Internet Archive, see here:
for the publication level MAGAZINE #GTS file, and here:
for an example of the issue level MAGAZINE #GTS metadata file.
Keep in mind that this initial publication of our MAGAZINE metadata files is very "thin" at the moment, with mostly empty placeholder tags that will be filled in with full models and associated datasets. The XML Schemas for the MAGAZINE format are published and available to all researchers via the FactMiners website. See the XML header of the above metadata files for the standard XML schema location reference to these files.
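For readers who want to find the schema reference programmatically, here is a minimal Python sketch of reading the standard `xsi:schemaLocation` attribute from an XML file's root element. The sample document, its `magazine` root element, and the `example.org` URLs are placeholders assumed for illustration; they are not the actual MAGAZINE #GTS namespace or schema addresses.

```python
import xml.etree.ElementTree as ET

# Standard W3C namespace for schema-instance attributes such as schemaLocation.
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"

# Hypothetical header: the root element name and all example.org URLs are
# placeholders, not the real MAGAZINE #GTS namespace or schema location.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<magazine xmlns="http://example.org/magazine"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://example.org/magazine http://example.org/schemas/magazine.xsd">
</magazine>
"""

def schema_locations(xml_text):
    """Return a dict of {namespace URI: schema URL} from xsi:schemaLocation."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    raw = root.get(f"{{{XSI_NS}}}schemaLocation", "")
    # The attribute holds whitespace-separated (namespace, location) pairs.
    tokens = raw.split()
    return dict(zip(tokens[0::2], tokens[1::2]))

for namespace, location in schema_locations(SAMPLE).items():
    print(namespace, "->", location)
```

The same function should work on a downloaded metadata file if its text is passed in place of `SAMPLE`, provided the file declares its schema with `xsi:schemaLocation`.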
The individual issue level MAGAZINE #GTS metadata files include a "ppg2leaf_map" that guarantees the relationship between Softalk's print page numbers and their respective "leaf" images in the digital copies at the Archive. While our MAGAZINE files are admittedly lean at the moment, The Softalk Apple Project already has extensive data "in the can": a curated Advertisers Index, mastheads, Tables of Contents, and lists of the Companies, People, Products, etc. that made the magazine or were covered editorially in it. We are currently writing the Python scripts to generate the XML metadata that will begin populating the publication level model and dataset metadata. Issue-specific subsets of our data will also be included in the issue level metadata files.
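To illustrate the idea behind a print-page-to-leaf map, here is a minimal Python sketch that parses a small XML fragment and looks up the leaf image for a given print page. The element and attribute names (`issue`, `leaf`, `num`, `ppg`) and the sample values are hypothetical, chosen only for illustration; consult the published schemas for the actual structure of the "ppg2leaf_map".

```python
import xml.etree.ElementTree as ET

# Hypothetical issue-level fragment: element names, attribute names, and
# values are illustrative assumptions, not the published MAGAZINE #GTS schema.
SAMPLE_XML = """
<issue id="example-issue">
  <ppg2leaf_map>
    <leaf num="0" ppg="cover"/>
    <leaf num="1" ppg=""/>
    <leaf num="2" ppg="1"/>
    <leaf num="3" ppg="2"/>
    <leaf num="4" ppg="3"/>
  </ppg2leaf_map>
</issue>
"""

def load_ppg2leaf(xml_text):
    """Return a dict mapping print page labels to leaf numbers."""
    root = ET.fromstring(xml_text.strip())
    mapping = {}
    for leaf in root.iter("leaf"):
        ppg = leaf.get("ppg", "")
        if ppg:  # skip leaves that carry no print page label
            mapping[ppg] = int(leaf.get("num"))
    return mapping

def leaf_for_page(mapping, page):
    """Look up the leaf index for a print page label; None if unmapped."""
    return mapping.get(str(page))

ppg2leaf = load_ppg2leaf(SAMPLE_XML)
print(leaf_for_page(ppg2leaf, 2))
```

Page labels are kept as strings in this sketch because covers, inserts, and unnumbered pages do not fit a purely numeric scheme.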
The publication of these MAGAZINE #GTS files is the subject of FactMiners' first paper submitted to #DATeCH2017, and the ppg2leaf mapping found in the issue level files is the subject of our second paper. This EuropeanaTech Digital Humanities research conference is scheduled to take place in Germany in early June.
The FactMiners #GTS format is being developed as a resource to support eResearch and machine learning at the Internet Archive. As always, comments and questions are welcome. Even better, we welcome volunteers who would like to become involved in our #CitizenScience and #CitizenHistory projects, FactMiners and The Softalk Apple Project. To express your interest, feel free to contact us through this website or via social media channels.
Pre-review PDFs of our #DATeCH2017 submissions are available to those with ResearchGate.net access: "Ground Truth & Softalk Magazine: Using Aletheia Web Edition to do FactMiners’ Text-mining" and "Print-Page Number to 'Leaf' ID Mapping in Support of eResearch and Machine-Learning at the Internet Archive". Others interested in our applied research may request personal communication copies via the contact form on this website or through any of the social media channels in which we are active.