Comparator Command Line Tool

About the Comparator

In digital preservation, migrating files to a new format is often unavoidable, especially when a file format is no longer supported by current software.
Unfortunately, there is a lack of tools that can check and verify the success of such data migrations effectively and comprehensively. The Comparator is meant to fill this gap. It is intended to compare large collections of files and to give reliable results on the quality of single migrated files as well as of the whole collection. For more about the first steps, compilation, and usage of the Comparator tool, please refer to the XCL Documentation.

The Comparator is currently available as a command-line tool. The latest version is 1.1 (May 2010). The software is distributed as platform-independent source code and additionally as a precompiled binary for Windows (built with the MinGW compiler on Windows XP Professional, SP3). For instructions on how to build the Comparator from source code, see the Compilation/Installation section below.

Download

The Comparator and the Extractor are available together in an 'easy-to-install' version for Windows.
To use the Comparator on Linux or Mac, or to build it yourself on Windows, download the Comparator source from Planets SourceForge and compile it on your machine.


Version 1.1: Download Comparator Version 1.1 (source and pre-compiled win32 binary, 31.05.2010)
Version 1.0: Download Comparator Version 1.0 (source and pre-compiled win32 binary, 30.09.2009)


Compilation/Installation

If you decide to build the software from source code, follow these steps.
  • The whole package is wrapped in a zip file, so unpack it to a local directory of your choice (you need a tool for unpacking zip files installed on your system, e.g., http://www.info-zip.org/).
  • After unpacking, you should have a directory structure like this:
    directory/   files and sub-directories contained in it
    doc/         various documentation files
    res/         various resources Comparator needs to work successfully:
                   schemas/config/     : comparatorConfig.xsd
                   schemas/xcdl/       : XCDLCore.xsd, XCDLBasicTypes.xsd, preserve.xsd
                   schemas/xcdl/image/ : XCDLImageProperties.xsd
                   schemas/xcdl/text/  : XCDLTextProperties.xsd
    src/         complete bundle of source code files
    test/        some files for testing Comparator (including the Comparator configuration files 'cocoImage.xml' and 'cocoText.xml')
    bin/         binary Windows distribution of Comparator: comparator.exe, xercesc.dll
    scripts/     various template scripts for building Comparator
  • Build the source code either by following the instructions of the compiler/IDE of your choice or by using one of the template scripts in the 'scripts/' directory to build from the command line. Two template scripts are currently available: 'buildOnMac.sh' for building the software on Mac OS and 'buildOnLinux.sh' for building it on Linux. You can use the scripts as templates for your own configuration. In many cases, only a few parameters need to be adjusted.
    In any case you need to adjust the include path (-I) that leads to the directory where xerces-c is located and the path to the xerces-c libraries (-L). Do not forget to add the name of the library itself (in the template 'buildOnLinux.sh' this is the entry: -lxerces-c). Please note that the name of this library may vary depending on the OS and compiler. Also note that the syntax for specifying include and library paths may vary depending on the compiler you use.
    If you build the source code from an IDE, you may also need to set the include paths and to link the xerces-c library. If you use one of the script templates, make sure that the entries for generating the object files (-o) actually point to the directories where the source files are located.
  • If you succeed in building the software, you will find the executable file in the location you declared in the script (first parameter after -o) or chose through your IDE.
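
As a rough illustration of the flags discussed above, the following sketch assembles a single g++ call and prints it for review instead of running it. It is not a copy of 'buildOnLinux.sh'. The paths '/usr/local/include' and '/usr/local/lib', the output name 'comparator', and the pattern 'src/*.cpp' are assumptions made for this example and will most likely need to be adapted to your system and to the actual source layout.

```shell
#!/bin/sh
# Sketch only: every path below is an assumption, not taken from the package.
XERCES_INC="${XERCES_INC:-/usr/local/include}"  # assumed location of the xerces-c headers (-I)
XERCES_LIB="${XERCES_LIB:-/usr/local/lib}"      # assumed location of the xerces-c libraries (-L)
SRC_DIR="${SRC_DIR:-src}"                       # assumed: all sources match $SRC_DIR/*.cpp
OUT="${OUT:-comparator}"                        # assumed name of the resulting executable (-o)

# Print the compiler call so that it can be checked before it is executed.
# The library name after -l may differ depending on OS and compiler.
build_command() {
    printf 'g++ -I%s -L%s -o %s %s/*.cpp -lxerces-c\n' \
        "$XERCES_INC" "$XERCES_LIB" "$OUT" "$SRC_DIR"
}

build_command
```

Once the printed command matches your installation, it can be pasted into the shell or into a copy of one of the template scripts.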

Running Comparator

The Comparator is currently available as a program that is run from the command-line interface of the given system. Please see the official XCL specification for comprehensive information on how to run and apply the Comparator. A short introduction follows below.
IMPORTANT: Before you run Comparator, please check that the following condition is met:
Make sure that Comparator is able to find the schema files located in the 'res/' directory. This is always the case if the Comparator executable file (in case of the Windows binary distribution also xercesc.dll) and the 'res/' directory are within the same directory. Please note that you must neither change the directory structure within 'res/' nor rename any of the directories or files included there (not even 'res/' itself). In the current version, Comparator depends on this layout in order to work properly, most notably for validation of the XML-based files.
For example, if your Comparator executable file lies in the directory ./xcl/comparator/, everything is fine as long as the 'res/' directory you obtained from the download is also located within this directory.
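
The condition above can be checked before the first run with a small helper such as the following sketch. The function name 'check_res_layout' is hypothetical. The list of schema files is taken from the package layout described in the Compilation/Installation section. The check only tests that the files exist, not that they are valid.

```shell
# check_res_layout DIR
# Succeeds if all schema files are present below DIR/res/, where DIR is the
# directory that holds the Comparator executable. Missing files are reported
# on standard error.
check_res_layout() {
    dir="$1"
    status=0
    for f in \
        schemas/config/comparatorConfig.xsd \
        schemas/xcdl/XCDLCore.xsd \
        schemas/xcdl/XCDLBasicTypes.xsd \
        schemas/xcdl/preserve.xsd \
        schemas/xcdl/image/XCDLImageProperties.xsd \
        schemas/xcdl/text/XCDLTextProperties.xsd
    do
        if [ ! -f "$dir/res/$f" ]; then
            echo "missing: $dir/res/$f" >&2
            status=1
        fi
    done
    return "$status"
}
```

For the example directory above, the call would be: check_res_layout ./xcl/comparator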
To run Comparator, type the following command in the command line:
pathToComparator [pathToSourceXCDLFile] [pathToTargetXCDLFile] [-c pathToComparatorConfigFile]
where:
  • 'pathToComparator' is the path to the Comparator executable file,
  • 'pathToSourceXCDLFile' is (the path to) an XCDL file,
  • 'pathToTargetXCDLFile' is (the path to) another XCDL file,
  • '-c pathToComparatorConfigFile' is a mandatory switch followed by the path to a Comparator configuration file.

For a more detailed explanation of the parameters, please check the official XCL specification.
E.g. (Windows):
c:\xcl\comparator\comparator.exe c:\xcl\comparator\test\XCDL1.xcdl c:\xcl\comparator\test\XCDL2.xcdl -c c:\xcl\comparator\test\testConfigImage.xml
If Comparator runs without I/O errors, it produces a result file called 'copra.xml' (for the structure of this file see below). By default, the file is written to the directory where the Comparator executable file lies. Please note that all three parameters (source XCDL file, target XCDL file and Comparator configuration file) are mandatory.
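
A script that calls Comparator may want to confirm that the result file was actually written. The following sketch derives the default location of 'copra.xml' from the path of the executable and tests whether a non-empty file exists there. The function names are hypothetical, and the check says nothing about the content of the file.

```shell
# default_result_path COMPARATOR
# Prints the default location of 'copra.xml', i.e. the directory that holds
# the Comparator executable.
default_result_path() {
    printf '%s/copra.xml\n' "$(dirname "$1")"
}

# has_result COMPARATOR
# Succeeds only if a non-empty 'copra.xml' exists at the default location.
has_result() {
    [ -s "$(default_result_path "$1")" ]
}
```

Note that a 'copra.xml' left over from an earlier run would also pass this check, so it may be worth deleting the old file before each run.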
If you wish to let Comparator write the results to a different directory, you can use an optional switch '-o'. For example,
-o c:\xcl\comparator\test\result\
forces Comparator to write the result file into the directory indicated.
Concerning this feature, please note the following for the current version of Comparator:
  1. Always end the directory path with a trailing slash (a backslash on Windows, as in the example above).
  2. The directory you specify must already exist.
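
Both points can be handled by a small wrapper, sketched here for a Unix-like shell. The function name 'run_comparator' is hypothetical, and placing '-o' after the '-c' switch is an assumption of this example, so please check the XCL specification if the call is rejected.

```shell
# run_comparator COMPARATOR SOURCE_XCDL TARGET_XCDL CONFIG OUTDIR
# Creates OUTDIR if necessary, appends the trailing slash if it is missing,
# and then calls Comparator with the three mandatory parameters and '-o'.
run_comparator() {
    comparator="$1"
    source_xcdl="$2"
    target_xcdl="$3"
    config="$4"
    outdir="$5"

    # Point 2: the directory must already exist before Comparator starts.
    mkdir -p "$outdir" || return 1

    # Point 1: the directory path must end with a slash.
    case "$outdir" in
        */) ;;
        *)  outdir="$outdir/" ;;
    esac

    "$comparator" "$source_xcdl" "$target_xcdl" -c "$config" -o "$outdir"
}
```

A call could then look like this: run_comparator ./comparator test/XCDL1.xcdl test/XCDL2.xcdl test/testConfigImage.xml test/result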

System Requirements and Dependencies

The software is written in C++ in a platform-neutral style, so it should compile and run on different systems. We have tested and run it under three configurations (the Windows configuration is also the development configuration):
operating system                     compiler
Windows XP Professional, SP3         MinGW (latest version)
Linux, openSUSE 10.2 (X86-64)        GNU (gcc, version 4.1.2 20061115, pre-release SUSE Linux)
Macintosh, Leopard, 10.5 (X86-32)    GNU (gcc, latest version)
There may be some implementation limitations imposed by the external libraries the Comparator uses.
The current version of the Comparator uses:
  • the widely adopted Standard Template Library for C++ (STL), which most compilers support by default and which should therefore not cause any problems,
  • the Xerces-c libraries for parsing and validating the XML structures.
To build the Comparator from source code, the Xerces-c libraries must be installed on your system. Xerces-c supports a large number of platforms. For download, configuration and installation of the libraries, as well as detailed information on supported platforms and compilers, please visit the project's homepage: http://xerces.apache.org/xerces-c/. We recommend using either version 2.5.0 or version 2.8.0, since both have been tested in conjunction with the Comparator. Other versions should also work, but this cannot be guaranteed.