
Quality

The physical condition of 18th-century books often leaves much to be desired. This has a variety of causes, for example:

  • Books are damp
  • Books are affected by vermin
  • Books are damaged (see photograph)

In the preparatory stage, the material preparers examine the books closely and decide whether they qualify for inclusion in the digitisation process. Books that are badly damaged or extremely fragile are not included in the selection.

Image Quality

The Koninklijke Bibliotheek pays a great deal of attention to the quality of the digital images. The project is intended not just as a portal to large quantities of material; it should also meet serious quality standards. At the preparatory stage, when selecting the company that will carry out the work, a careful survey must be made of the quality that the company should supply. While the project is running, these standards will be monitored closely by performing (computerised) quality controls on (a selection of) the delivered materials using an Oracle application. These controls check not only that the number of files, the file names and the links between files are correct, but also that the contents of the delivered files are correct and that the files are technically sound. If the quality standards are not met, the company is to redeliver the files.
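The automated part of such a delivery check can be sketched as follows. This is a minimal Python illustration, not the Oracle application used in the project; the file-name pattern, the pairing of `.tif` images with `.xml` OCR files and the function name `check_delivery` are all assumptions made for the example.

```python
import re
from pathlib import Path

# Hypothetical naming convention: <book id>_<4-digit page number>.<extension>
NAME_PATTERN = re.compile(r"^[a-z0-9]+_\d{4}\.(tif|xml)$")


def check_delivery(folder, expected_count):
    """Return a list of problems found in a delivered folder (empty if none)."""
    files = sorted(p for p in Path(folder).iterdir() if p.is_file())
    problems = []

    # 1. Correct number of files.
    if len(files) != expected_count:
        problems.append(f"expected {expected_count} files, found {len(files)}")

    # 2. Correct file names.
    for p in files:
        if not NAME_PATTERN.match(p.name):
            problems.append(f"unexpected file name: {p.name}")

    # 3. Correct links between files: every image needs an OCR file with
    #    the same base name, and vice versa.
    images = {p.stem for p in files if p.suffix == ".tif"}
    texts = {p.stem for p in files if p.suffix == ".xml"}
    for stem in sorted(images ^ texts):
        problems.append(f"no matching image/OCR pair for: {stem}")

    # 4. A crude technical check: no file may be empty.
    for p in files:
        if p.stat().st_size == 0:
            problems.append(f"empty file: {p.name}")

    return problems
```

A non-empty result would correspond to a delivery that does not meet the standards and has to be redelivered. Checking the contents of the files themselves would need format-specific validation that is left out here.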

Character Recognition Quality

Good-quality images are essential for optimal optical character recognition (OCR). Character recognition is of the utmost importance because it determines how well the materials can be made available to the public.
The better the character recognition, the more information the website visitor will ultimately be able to find. An initial set of limited tests of the accuracy of the character recognition gives a varied picture, as is to be expected given the wide range in quality of the materials we will be working with.
The OCR test (conducted with FineReader, version 8.0) was intended to determine the accuracy of the OCR at word level in a limited number of randomly chosen works from 1780-1800.
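Word-level accuracy can be expressed as the share of words in a manually corrected reference text that the OCR output reproduces in the same order. The sketch below shows one possible way to compute it in Python. It is not the procedure used in the test; the function name `word_accuracy` and the sample strings are invented for illustration.

```python
from difflib import SequenceMatcher


def word_accuracy(reference, ocr_output):
    """Fraction of reference words that the OCR output reproduces in order."""
    ref_words = reference.split()
    ocr_words = ocr_output.split()
    if not ref_words:
        return 0.0
    # Align the two word sequences and count the words that match.
    matcher = SequenceMatcher(a=ref_words, b=ocr_words, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(ref_words)


# Invented sample: two of the five words are misread.
reference = "de getrouwe hond redde het"
ocr_output = "de getrouvve hond reddc het"
print(f"{word_accuracy(reference, ocr_output):.0%}")  # 60%
```

Counting at word level is stricter than counting at character level: a single misread letter makes the whole word count as wrong.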

One example of poor character recognition and one of accurate recognition:

[Image: scanned page with its OCR result]

Source: Het geredde kind, of De getrouwe hond. = L'enfant arraché au peril. / [By Johann Jakob Kämmerer]. ; Translated from German

This page is marred by bleeding ink and the printing is not entirely straight, so character recognition is not optimal. It is a section of French text abounding in diacritical marks. On the left is the scanned image of the page and on the right the resulting OCR, with the correctly recognised words in green. In this case, only a little over half of the words have been recognised correctly.

[Image: scanned page with its OCR result]

Source: Beknopte geschiedenis der Fransche staats-omwenteling. / By J.P. Rabaud. ; Translated from the French and annotated

This text produced virtually letter-perfect character recognition; only a few words were not recognised correctly.

OCR in the Dutch Prints Online project has to contend with a number of specific problems, for example the spelling used in 1780-1800 and the use of specific characters such as the long s (ſ), which the software interprets as an f (see text fragment):

[Image: text fragment]
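The long s confusion can be illustrated with a short Python sketch. It assumes a small list of known word forms (invented here) and, for each OCR token, generates the variants obtained by reading an f as an s. This is only one conceivable way of dealing with the confusion, not a description of what the project does; the names `long_s_variants` and `resolve` are hypothetical.

```python
from itertools import product


def long_s_variants(token):
    """All spellings obtained by reading each 'f' either as 'f' or as 's'."""
    choices = [("f", "s") if ch == "f" else (ch,) for ch in token]
    return {"".join(combo) for combo in product(*choices)}


def resolve(token, lexicon):
    """Return the variants of an OCR token that occur in the lexicon."""
    return sorted(long_s_variants(token) & lexicon)


# Invented mini-lexicon of period word forms.
lexicon = {"geschiedenis", "staats", "fransche"}

print(resolve("gefchiedenis", lexicon))  # ['geschiedenis']
print(resolve("ftaats", lexicon))        # ['staats']
print(resolve("fransche", lexicon))      # ['fransche']
```

The number of variants doubles with every f in the token, so this approach is only practical for single words, and it depends entirely on a lexicon that covers the spelling of the period.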