Domino Scanner

Introduction

Scanner is the term used in migration-center for an input connector. Using the IBM Domino scanner module to read the data that needs processing into migration-center is the first step in a migration project, thus “scan” also refers to the process used to input data to migration-center.

The IBM Domino Scanner has been available since migration-center 3.2.5. It extracts documents, metadata and attachments from IBM Domino/Notes applications and uses them as input for migration-center. After the scan, the generated data can be processed and migrated to other systems supported by the various migration-center importers.

The scanner currently supports exporting documents as Domino XML (DXL), Hypertext Markup Language (HTML), ARPA Internet Text Message (RFC 822/EML) and HTML generated from the EML (EML2HTML). In addition, the scanner can generate a Portable Document Format (PDF) rendition of a document based on its DXL file.

The IBM Domino Scanner currently supports all IBM Notes/Domino versions 9.x and above. Documents from applications that have been built with older IBM Notes/Domino versions can be extracted without any limitation.

The module works as a job that can be run at any time and can even be executed repeatedly. For every run a detailed history and log file are created.

A Scanner is defined by a unique name, a set of configuration parameters and an optional description.

IBM Domino scanners can be created, configured, started and monitored through the migration-center Client, but the corresponding processes are executed by the migration-center Job Server.

Installation

Prerequisites

32 bit vs 64 bit

The scanner is available in 32 bit and 64 bit versions. Each version has different prerequisites and limitations. Both versions require additional software installed on the migration-center Jobserver.

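The 32-bit scanner requires:

  • IBM Notes (32-bit)

  • Microsoft Visual C++ 2017 Redistributable Package (x86)

The 64-bit scanner requires:

  • IBM Domino (64-bit)

  • Microsoft Visual C++ 2017 Redistributable Package (x64)

  • A 64-bit Java runtime for the Jobserver (see Switching between 32 bit and 64 bit)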

Because the 64-bit version uses the IBM Domino software, the scanner currently cannot generate any formats other than DXL and PDF.

Additional features

If you are scanning Domino documents containing Object Linking and Embedding (OLE) objects, Apache OpenOffice 4.1.5 or later must be installed. See section Exporting OLE objects.

For transforming the documents into PDFs an additional PDF Generation Module needs to be installed on a second system which acts as a Rendition Server. See section Generating PDF renditions.

The PDF Generation Module is licensed separately.

Installing the software

Regardless of using the 32-bit or 64-bit scanner, the installation steps are the same. All the steps should be performed on the Jobserver machine where the Domino scanner will be run.

  1. Install IBM Notes and/or IBM Domino software

  2. Add the folder containing the software's executables to the PATH environment variable

  3. Install the appropriate Microsoft Visual C++ 2017 Redistributable Package

  4. Install the migration-center Jobserver. See Installation guide

  5. Locate the mc-domino-scanner_windows-x86-x64_[ver].exe installer in the Domino package

  6. Run the installer with administrator rights (Run as administrator)

  7. Set the install location to the .../lib/mc-domino-scanner folder of the Jobserver

  8. Start the migration-center Jobserver service
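If the scanner later complains about missing libraries, a quick check is whether the Notes/Domino program folder really is on PATH (step 2). The following is a minimal sketch for that check. It assumes that "nnotes.dll" lies in the Notes/Domino program folder and can serve as a marker file; substitute another file from that folder if your installation differs. Note that the Jobserver service does not necessarily see the same PATH as an interactive session.

```python
import os
from pathlib import Path
from typing import Optional


def find_on_path(filename: str, path_value: Optional[str] = None) -> Optional[Path]:
    """Return the first directory listed in PATH that contains `filename`."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    for entry in path_value.split(os.pathsep):
        # Windows PATH entries are sometimes quoted; empty entries are skipped.
        entry = entry.strip().strip('"')
        if entry and (Path(entry) / filename).is_file():
            return Path(entry)
    return None


if __name__ == "__main__":
    # ASSUMPTION: "nnotes.dll" marks the Notes/Domino program folder.
    folder = find_on_path("nnotes.dll")
    if folder is None:
        print("No PATH entry contains nnotes.dll; check installation step 2.")
    else:
        print(f"Notes/Domino program folder on PATH: {folder}")
```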

Switching between 32 bit and 64 bit

By default, the Jobserver is configured to work with the 32-bit version of the Domino Scanner.

To use the 64-bit version, change x86 to x64 in the following lines of wrapper.conf:

wrapper.java.additional.4=-Djava.library.path=./lib/mc-dctm-adaptor;./lib/mc-outlook-adaptor;./lib/mc-domino-scanner/lib/x86;${path_env}

wrapper.app.env.path=./lib/mc-domino-scanner/lib/x86;${path_env}

In addition, switch the Jobserver to a 64-bit Java runtime by changing the JAVA_HOME or JRE_HOME environment variable accordingly, and reinstall the Jobserver service.
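The wrapper.conf change described above can be scripted. The following is a minimal sketch, not a tool shipped with migration-center. It assumes the library paths are written with forward slashes, as in the two lines shown above, and that wrapper.conf is UTF-8/ASCII encoded. The function names are hypothetical. Switching JAVA_HOME/JRE_HOME and reinstalling the service remain manual steps.

```python
import re
from pathlib import Path

# Matches the architecture folder of the Domino scanner libraries, e.g. the
# "x86" in "./lib/mc-domino-scanner/lib/x86" (forward slashes assumed).
_SCANNER_ARCH = re.compile(r"(mc-domino-scanner/lib/)(x86|x64)\b")


def switch_scanner_arch(conf_text: str, target: str) -> str:
    """Return `conf_text` with every Domino scanner library path set to `target`."""
    if target not in ("x86", "x64"):
        raise ValueError("target must be 'x86' or 'x64'")
    return _SCANNER_ARCH.sub(lambda m: m.group(1) + target, conf_text)


def switch_wrapper_conf(path: Path, target: str) -> None:
    """Rewrite wrapper.conf in place, keeping a .bak copy of the original."""
    text = path.read_text(encoding="utf-8")  # ASSUMPTION: UTF-8/ASCII file
    path.with_name(path.name + ".bak").write_text(text, encoding="utf-8")
    path.write_text(switch_scanner_arch(text, target), encoding="utf-8")
```

Call switch_wrapper_conf with the location of your Jobserver's wrapper.conf and "x64" (or "x86" to switch back). Other adaptor paths on the same lines, such as mc-outlook-adaptor, are left untouched.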

Timezone settings

IBM Domino stores all date and time information based on GMT/UTC internally. When a datetime value is converted into text for display purposes, the value is always displayed using the client’s current timezone settings.

Therefore, the timezone settings of the migration-center Jobserver are used to convert the values of datetime attributes.

If you require date and time values to be scanned based on a specific timezone, set the migration-center Jobserver’s timezone accordingly.

If you require “normalized” date and time values in migration-center, set the migration-center Jobserver’s timezone to GMT/UTC.
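The effect of the Jobserver timezone can be illustrated with plain Python datetime arithmetic. This is an illustration only, not Domino's API. It shows that the same UTC instant converts to different text, and even to a different calendar date, depending on the timezone used for conversion. Both timezones are modelled as fixed offsets, so daylight saving time is ignored.

```python
from datetime import datetime, timedelta, timezone

# Domino keeps the instant in GMT/UTC internally.
stored_value = datetime(2024, 3, 1, 23, 30, tzinfo=timezone.utc)

# Two example Jobserver timezones, modelled as fixed UTC offsets so the
# example does not depend on an installed timezone database.
UTC = timezone.utc
UTC_PLUS_1 = timezone(timedelta(hours=1), "UTC+01:00")


def as_text(value: datetime, jobserver_tz: timezone) -> str:
    """Convert a UTC instant to text in the given (Jobserver) timezone."""
    return value.astimezone(jobserver_tz).strftime("%Y-%m-%d %H:%M")


print(as_text(stored_value, UTC))         # 2024-03-01 23:30
print(as_text(stored_value, UTC_PLUS_1))  # 2024-03-02 00:30
```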

Exporting objects from an IBM Domino/Notes application

The IBM Domino Scanner connects to a specified IBM Domino/Notes application and can extract documents, content of richtext fields (composite items), metadata and attachments out of this application based on user-defined criteria. See section Scanner Parameters below for more information about the features and configuration parameters available in the IBM Domino Scanner.

After a scan has completed, the newly scanned documents along with their metadata, attachments and the content of the richtext fields they contain are available for further processing in migration-center.

Scanner Configuration

To create a new IBM Domino Scanner job, select “Domino” as the adapter type from the list of available connectors in the scanner properties window. Once the adapter type has been selected, the list of parameters is populated with the parameters specific to that adapter type.

The properties window of a scanner can be accessed by double-clicking a scanner in the list or by selecting the [Properties] button for the corresponding selected entry on the toolbar or context menu.

Scanner Parameters

The common adaptor parameters are described in Common Parameters.

The configuration parameters available for the IBM Domino Scanner are described below:

  • dominoServer The IBM Domino server used to connect to the application. If the application (“.nsf” file) is stored and accessed locally without using an IBM Domino server, leave this field empty.

  • dominoDatabase* The filename of the “.nsf” file that holds the application’s documents. If the “.nsf” file is stored inside the IBM Domino/Notes data directory, the path of the “.nsf” file relative to the IBM Domino/Notes data directory is sufficient, otherwise specify the fully qualified filename of the “.nsf” file.

    If PDF is used as either the primary format or one of the secondary formats and the PDF is to be generated from existing documents (see Generating PDF renditions), the values of “dominoServer” and “dominoDatabase” will be passed to the PDF Generation Module. Therefore, the database filename should be specified relative to the IBM Domino/Notes data directory.

  • idFilename* The filename of the ID file used to access the application.

    This ID must have full permissions for all documents that are due to be scanned.

  • password The password for the ID file referenced in parameter “idFilename”.

  • selectionFormula* An IBM Notes formula to select the documents that should be processed by the scanner.

    The default is “select @all”, which processes all documents.

  • profileName* The name of the profile used to extract information out of the IBM Domino/Notes application.

    The default value for this parameter is “mcProfile”, which causes the scanner to process the application according to the other scanner configuration parameters, e.g. extract document metadata, document contents and attachments. If the value is changed to “mcStatistics”, the scanner ignores most of the other scanner configuration parameters and, instead of processing each document (extracting metadata, document contents and attachments), generates a text file with statistical information about the application (forms, documents, attributes). The generated file is placed in the folder specified by scanner parameter “exportLocation” and named “<jobID>_statistics.txt”. The profile “mcStatistics” does not generate any objects in the migration-center database.

    This parameter’s value must not be changed to any other value than “mcProfile” or “mcStatistics” unless a customized profile has been developed to fulfill specific customer needs.

  • primaryDocumentFormat* The primary format used to extract the document. The resulting file will be treated as the primary document content in the mc database. Valid values are “dxl”, “html”, “eml”, “eml2html” and “pdf”.

    Details regarding the different formats can be found in chapter Document Formats.

    The default value is “dxl”.

    The 64-bit version of the scanner can only generate “DXL” and “PDF”. Configuring any other format will cause the scanner to fail.

  • secondaryDocumentFormats A list of all document formats that should be generated in addition to the primary document format (see “primaryDocumentFormat” above). Multiple values must be separated by “|” (pipe). Valid values are “dxl”, “html”, “eml”, “eml2html” and “pdf”.

    The resulting files will be associated with the mc object as secondary formats. Their (fully-qualified) filenames are made available using the mc object’s “secondaryFormats” attribute which is a multi-value attribute.

    Details regarding the different formats can be found in chapter Document Formats.

    The 64-bit version of the scanner can only generate “DXL” and “PDF”. Configuring any other format will cause the scanner to fail.

  • includeAttributes A list of all document attributes (metadata) that should be extracted from the IBM Domino/Notes application and made available inside the MC database. If all attributes should be extracted, leave this field empty.

  • excludedAttributeTypes A filter specifying Domino data types that should not be exported from the IBM Domino/Notes application.

    Please refer to chapter Domino attribute types for details.

    The default value is “1” (TYPE_COMPOSITE), which excludes all composite items from being exported to the migration-center database.

  • attributeSplitterMaxChunkSizeBytes The maximum size, in bytes, of the chunks into which large attribute values are split. Splitting handles multi-byte characters correctly (a character is never cut in half), which avoids SQL exceptions.

    migration-center uses Oracle’s “varchar2” datatype, which has a maximum size of 4,000 bytes.

  • exportCompositeItems* Specifies whether composite items (i.e. richtext fields) contained in an IBM Domino/Notes document (e.g. an e-mail’s “Body” element) should be extracted from the document and made available as separate richtext files (RTF format). Valid values are “false” and “true” as well as “0” and “1”.

    If this option is chosen, the scanner will generate one RTF file for each of an IBM Domino/Notes document’s composite items. The name of the file will be created as <document’s NoteID>_<item’s name>.rtf.

    This option is especially useful if the document’s contents (typically contained in richtext fields) should be editable once the document has been migrated into the target system.

    This feature is not supported with the 64-bit version of the scanner.

  • includedCompositeItems A list of names of composite items in a document (e.g. “Body”) that should be extracted into separate richtext files. Multiple values must be separated by “|” (pipe). If all composite items should be extracted, leave this field empty.

    If you want to exclude specific composite items, prefix each item name with a “!”.

    It is not possible to mix include and exclude operations. If one composite item’s name in the list is prefixed with “!”, then only those composite item names starting with “!” will be considered and the corresponding items will be excluded.

  • exportAttachments* Specifies whether attachments contained in the IBM Domino/Notes documents should be extracted from the document in their native format and made available as separate MC objects. Valid values are “false” and “true” as well as “0” and “1”.

  • embedAttachmentsIntoPDF* Determines whether the Domino documents’ attachments are extracted and embedded into a PDF rendition of the Domino document. If this parameter is set to true:

    - all attachments will automatically be extracted from the document independent of “exportAttachments” parameter’s value,

    - a PDF rendition will automatically be created even if it has not been requested according to the values of parameters “primaryDocumentFormat” or “secondaryDocumentFormats”.

  • embedLinksIntoPDF If a PDF rendition is requested and this parameter is set to true, links (Domino document links and URL links) contained in the original Domino document will be added as bookmarks to the PDF file.

    The default value is “false”.

  • initializingFoldersDocumentIsLinkedTo Defines whether the names of the folders that a document is linked to should be read when the document is opened and made available in migration-center.

  • cachingFoldersDocumentsAreLinkedTo Defines whether the lookup table for finding the folders that a document is linked to is read only once and cached, or read again for each document.

  • exportLocation* The location where the exported object content should be saved temporarily. It can be a local folder on the machine that runs the Job Server or a shared folder on the network.

    This folder must exist prior to launching the scanner and the MC user must have write permission for it. MC will not create this folder automatically. If the folder cannot be found, an appropriate error will be raised and logged.

    This path must be accessible by both scanner and importer. Therefore, if scanner and importer are running on different machines, using a shared network folder is advisable.

  • loggingLevel* See Common Parameters.

Parameters marked with an asterisk (*) are mandatory.
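Two of the parameters above describe behaviour that is easy to misread, so the following minimal sketch illustrates both. It is not the scanner's code, and the function names are hypothetical. split_utf8 shows the idea behind attributeSplitterMaxChunkSizeBytes: chunks are measured in bytes, and a multi-byte character is never cut in half. The byte counts assume the database stores text as UTF-8. select_composite_items shows the include/exclude rule of includedCompositeItems. It compares names exactly (case-sensitive), which is an assumption.

```python
from typing import List


def split_utf8(value: str, max_bytes: int = 4000) -> List[str]:
    """Split `value` into chunks whose UTF-8 size is at most `max_bytes`,
    without ever cutting a multi-byte character in half."""
    if max_bytes < 4:
        # One character can need up to 4 bytes in UTF-8.
        raise ValueError("max_bytes must be at least 4")
    chunks: List[str] = []
    current: List[str] = []
    size = 0
    for ch in value:
        n = len(ch.encode("utf-8"))
        if size + n > max_bytes:
            # Adding this character would exceed the limit: close the chunk.
            chunks.append("".join(current))
            current, size = [], 0
        current.append(ch)
        size += n
    if current:
        chunks.append("".join(current))
    return chunks


def select_composite_items(item_names: List[str], setting: str) -> List[str]:
    """Apply an includedCompositeItems-style value such as "Body|Notes" or "!Body"."""
    entries = [e.strip() for e in setting.split("|") if e.strip()]
    if not entries:
        # Empty setting: all composite items are extracted.
        return list(item_names)
    excluded = {e[1:] for e in entries if e.startswith("!")}
    if excluded:
        # As soon as one entry starts with "!", only the "!" entries count
        # and they are treated as exclusions.
        return [name for name in item_names if name not in excluded]
    included = set(entries)
    return [name for name in item_names if name in included]
```

For example, with a limit of 4 bytes, "€€" (3 bytes per character) becomes two chunks rather than one 4-byte chunk holding a broken character.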

Document Formats

The IBM Domino Scanner for fme migration-center supports the generation of different output formats for a Domino document. Each format has its advantages and disadvantages. Which format suits you best depends on your requirements, in particular on how users will work with the documents once they have been migrated to the new target system.

The formats currently supported will be described in detail in the following sections.

Creating the MSG and EML2HTML formats requires an additional license.

Domino XML (DXL)

The Domino XML (DXL) format is an XML format that has been defined by IBM/Lotus. It has been around for a while (at least since Domino version 6). A DXL file represents an entire Domino document including all its metadata, richtext elements and attachments.

The generation of DXL files from Domino documents relies on core functionality of Domino’s C-API as provided by IBM.

DXL files can be used to extract any document information from Domino applications. Using special helper applications that are not part of Domino/Notes, a DXL file can be re-imported into the original Domino application in order to read its content or otherwise work with the document at a later point in time.

DXL is especially useful whenever Domino documents should be transformed into PDF. The “PDF Generation Module” which is available as an add-on product for the IBM Domino Scanner makes use of the DXL format for PDF generation.

ARPA Internet Text Message (RFC 822/EML)

The ARPA Internet Text Message format (RFC 822) describes the syntax for messages that are sent among computer users (e-mail). The EML file format adheres to RFC 822.

Any Domino document – not only e-mails – can be transformed into EML format based on core functionality of Domino’s C-API as provided by IBM. An EML file contains the document’s content, its metadata as well as its attachments.

The EML format does not guarantee that the Domino document’s integrity is preserved. Information from the document may be lost or changed during conversion into EML (see the Domino C-API documentation).

The major benefit of EML is that – since version 8 of Notes – an EML file can be opened in Notes again without the need for special helper applications.

Hypertext Markup Language (HTML)

Hypertext Markup Language (HTML) files can be generated for Domino documents based on two different approaches both of which will now be described.

Hypertext Markup Language (HTML) – direct approach

The Domino C-API offers the ability to transform a Domino document directly into an HTML file.

As with the EML file format, direct HTML generation based on the Domino C-API does not always preserve all of the document’s information. For example, images embedded in richtext fields will not be visible in the generated HTML file.

EML to Hypertext Markup Language (EML2HTML) – indirect approach

Besides the direct approach described in the previous section, HTML can also be created from the EML format.

In most scenarios the Domino scanner has been tested on, the indirect approach produced results of much higher quality than the direct approach.

Generating EML2HTML requires a third-party library that needs to be purchased separately. Please contact your fme sales representative for details.

Microsoft Message Format (MSG)

The MSG format is the format that is used by Microsoft Outlook to store e-mails on the filesystem. It’s a container format that includes the e-mail and all its attachments.

Generating MSG requires a third-party library that needs to be purchased separately. Please contact your fme sales representative for details.

Richtext format (RTF)

The Domino scanner can extract the entire Domino document (not just the document’s richtext fields) as a single RTF file. This functionality is provided by the Domino C-API.

Portable Document Format (PDF, PDF/A-1a, PDF/A-1b)

Based on the add-on “PDF Generation Module” (see Generating PDF renditions), the Domino scanner can generate PDF, PDF/A-1a or PDF/A-1b files for any type of Domino document, independent of the application it originates from.

All the PDF formats preserve the Domino document in a read-only form that looks like the document opened in Notes.

The PDF Generation Module takes care of collapsible sections, fixed-width images and tables, and other Domino-specific features that PDF printing might interfere with.

If required, all of the Domino document’s attachments can be re-attached to the generated PDF file (see parameter “embedAttachmentsIntoPDF”). This way, the entire document (e.g. an e-mail together with its attachments) is preserved in a read-only format that can be viewed anywhere at any time with nothing more than a standard PDF reader.

Exporting OLE objects

If the IBM Domino documents contain embedded OLE objects, Apache OpenOffice 4.1.5 or later must be installed and configured on the migration-center job server in order to extract the OLE objects properly.

  1. Install Apache OpenOffice 4.1.5 or later on the migration-center job server.

  2. Add the folder containing the “soffice.exe” file to the system’s search path. This folder is typically:

     <Apache OpenOffice installation folder>/program

  3. Add the following entry to the file “wrapper.conf” inside the migration-center server components installation folder and replace <#> with an appropriate value for your installation:

     wrapper.java.classpath.<#>=<Apache OpenOffice installation folder>/program/classes/*.jar

  4. Open the configuration file “documentDirectoryRuntimeConfiguration.xml”, located in the subfolder “lib/mc-domino-scanner/conf” of your migration-center server components’ installation folder, in your preferred XML editor.

  5. Go to line 83 of the file, which looks like:

     <parameter name="exportOLEObjects">false</parameter>

     Replace “false” with “true”. The entry inside the configuration file should then look like:

     <parameter name="exportOLEObjects">true</parameter>

  6. If you want to use a port for the Apache OpenOffice server other than the default port (8100), go to line 84 of the file:

     <!--<parameter name="apacheOpenOfficePort">8100</parameter>-->

     Uncomment it and replace “8100” with the port number to use, e.g. “1234”. The entry inside the configuration file should then look like:

     <parameter name="apacheOpenOfficePort">1234</parameter>

  7. Save the configuration file.
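Steps 3, 5 and 6 can also be applied with a small script. The following is a minimal sketch with hypothetical function names, not a migration-center tool. next_classpath_index picks the lowest unused number for the new wrapper.java.classpath.<#> entry. Filling the first gap keeps the numbering consecutive, which is the safer choice if the wrapper stops reading at a gap. enable_ole_export locates the parameters by name rather than by line number. It works on the raw text with regular expressions, so comments and layout survive; re-serializing the file with an XML library would typically drop comments such as the commented-out port entry.

```python
import re
from typing import Optional


def next_classpath_index(wrapper_conf: str) -> int:
    """Return the lowest unused N for a new wrapper.java.classpath.N entry."""
    used = {
        int(n)
        for n in re.findall(r"^\s*wrapper\.java\.classpath\.(\d+)\s*=",
                            wrapper_conf, flags=re.MULTILINE)
    }
    n = 1
    while n in used:
        n += 1
    return n


def enable_ole_export(xml_text: str, port: Optional[int] = None) -> str:
    """Set exportOLEObjects to true and, optionally, activate a custom port."""
    text = re.sub(
        r'(<parameter\s+name="exportOLEObjects"\s*>)\s*false\s*(</parameter>)',
        r"\1true\2", xml_text)
    if port is not None:
        # Remove the comment markers around the apacheOpenOfficePort entry ...
        text = re.sub(
            r'<!--\s*(<parameter\s+name="apacheOpenOfficePort"\s*>.*?</parameter>)\s*-->',
            r"\1", text)
        # ... and write the requested port number into it.
        text = re.sub(
            r'(<parameter\s+name="apacheOpenOfficePort"\s*>)[^<]*(</parameter>)',
            lambda m: f"{m.group(1)}{port}{m.group(2)}", text)
    return text
```

Read each file as text, pass it through the corresponding function, and write the result back after keeping a backup copy.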

Generating PDF renditions

While PDF generation can be activated in the scanner’s configuration (parameters “primaryDocumentFormat”, “secondaryDocumentFormats” and “embedAttachmentsIntoPDF”), setting up PDF generation also requires the additional “PDF Generation Module”.

From a technical perspective, the “PDF Generation Module” requires an additional system (“rendition server”). This system will be used to print any IBM Notes document using a PDF printer driver based on IBM Notes’ standard print functionality. The process for PDF generation is as follows:

  1. The scanner submits a request to create a PDF rendition for an existing Domino document or a DXL file to PDF Generation Module on the rendition server.

  2. PDF Generation Module creates a PDF rendition of the document.

  3. If PDF generation was successful, PDF Generation Module will save the PDF to a shared network drive.

  4. PDF Generation Module will signal success or failure to the scanner.
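The four steps above can be modelled in-process to show the division of work and why the shared folder matters to both sides. The following is a hypothetical sketch only. The real PDF Generation Module's interface is not documented here, and rendition_server, scanner_request and the render callback are illustrative names. The rendition server creates the PDF and saves it to the shared folder. The scanner submits the request and then relies on the result it gets back.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Optional


@dataclass
class RenditionResult:
    ok: bool
    pdf_path: Optional[Path] = None
    error: str = ""


def rendition_server(source: Path, shared_folder: Path,
                     render: Callable[[Path], bytes]) -> Path:
    """Steps 2-3: create the PDF and save it to the shared folder."""
    pdf_path = shared_folder / (source.stem + ".pdf")
    pdf_path.write_bytes(render(source))
    return pdf_path


def scanner_request(source: Path, shared_folder: Path,
                    server: Callable[[Path, Path], Path]) -> RenditionResult:
    """Step 1: submit the request; step 4: interpret success or failure."""
    try:
        pdf_path = server(source, shared_folder)
    except Exception as exc:  # the server signalled failure
        return RenditionResult(ok=False, error=str(exc))
    if not pdf_path.is_file():
        # Success was signalled, but the PDF is not where the scanner expects it.
        return RenditionResult(ok=False, error=f"PDF not found: {pdf_path}")
    return RenditionResult(ok=True, pdf_path=pdf_path)
```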

Setting up the rendition server requires additional configuration. For each IBM Domino application/database template that was used to create documents, an empty database needs to be created from this template and made available either locally on the rendition server or on the IBM Domino server.

Each of these empty databases needs to be prepared for PDF printing. As necessary configuration steps vary depending on the application that is being worked on, they cannot be described here.

Please contact your fme representative should you wish to implement PDF generation for migration of an IBM Domino application/database.

Log files

A complete history is available for any IBM Domino Scanner job from the respective item’s history window. It is accessible through the [History] button/menu entry on the toolbar/context menu.

The History window displays a list of all runs for the selected job together with additional information, such as the number of processed objects, the start and ending time and the status.

Double-clicking an entry or clicking the [Open] button on the toolbar opens the log file created by that run. The log file contains more information about the selected job’s run:

  • version information of the migration-center Server Components the job was run with

  • the parameters the job was run with

  • the execution summary that contains the total number of objects processed, the number of documents and folders scanned or imported, the count of warnings and errors that occurred during runtime

Log files generated by the IBM Domino Scanner can be found in the server components installation folder of the machine where the job was run, e.g. …\fme AG\migration-center Server Components <Version>\logs

The amount of information written to the log files depends on the setting specified in the ‘loggingLevel’ start parameter for the respective job.
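When a log file is long, it can help to count entries per job and log level before reading it in detail. The following is a minimal sketch. The line layout it expects is inferred from the single sample line quoted under Known Issues and is an assumption, because the actual layout depends on the logging configuration. Lines that do not match are ignored.

```python
import re
from collections import Counter
from typing import Iterable, Optional, Tuple

# ASSUMPTION: layout inferred from the sample line under Known Issues:
# "<level> | jvm <n> | <date> <time> | <time>,<ms> <LEVEL> [Job <id>] <logger> - <message>"
_LOG_LINE = re.compile(
    r"^\s*\w+\s*\|\s*jvm \d+\s*\|\s*[\d/]+ [\d:]+\s*\|\s*"
    r"[\d:]+,\d+\s+(?P<level>[A-Z]+)\s+\[Job (?P<job>\d+)\]\s+(?P<message>.*)$"
)


def parse_log_line(line: str) -> Optional[Tuple[str, str, str]]:
    """Return (job id, log level, message) or None if the line does not match."""
    match = _LOG_LINE.match(line)
    if match is None:
        return None
    return match.group("job"), match.group("level"), match.group("message")


def count_by_job_and_level(lines: Iterable[str]) -> Counter:
    """Count matching lines per (job id, log level), e.g. to spot ERROR entries."""
    counts: Counter = Counter()
    for line in lines:
        parsed = parse_log_line(line)
        if parsed is not None:
            job, level, _message = parsed
            counts[(job, level)] += 1
    return counts
```

An open log file can be passed directly to count_by_job_and_level, since iterating over a file yields its lines.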

Troubleshooting

Common errors

Here are causes and solutions for some common errors when setting up the Domino Scanner:

DominoBackendJNI is missing dependent libraries; please check the installation prerequisites!

This error message means that at least one of the installation prerequisites is not met. Check the following and correct whatever is not set up properly:

  • Add the IBM Notes or IBM Domino installation path to the PATH environment variable.

  • Install the correct Microsoft Visual C++ Redistributable package.

  • Reinstall the Jobserver Windows service using UninstallWinNTService.bat and InstallWinNTService.bat.

Known Issues

The following issues with the MC Domino Scanner are known to exist and will be fixed in later releases:

  • The scanner requires that the temporary directory of the user running the MC Job Server service exists and that this user can write to it. If the directory does not exist or the user lacks write permission for it, creating temporary files during document and attachment extraction will fail. The log file will show error messages like:

    "INFO | jvm 1 | 2014/10/02 12:06:26 | 12:06:26,850 ERROR [Job 1351] com.think_e_solutions.application.documentdirectory… - java.io.IOException: The system cannot find the path specified"

    To work around this issue, make sure the temporary folder exists and the user has write permission for it. If the MC Job Server is started manually by a normal user, the temporary folder is typically C:\Users\<Username>\AppData\Local\Temp. If the MC Job Server runs as a service under the Local System account, the folder is one of the following:

    For the 32-bit version of Windows:

    C:\Windows\System32\config\systemprofile\AppData\Local\Temp

    For the 64-bit version of Windows:

    C:\Windows\SysWOW64\config\systemprofile\AppData\Local\Temp

  • If a document is exported from IBM Domino but the related entries in the mc database cannot be created (e.g. because an attribute’s value exceeds the maximum number of characters allowed for a field in the mc database), the related files can be found in the filesystem (inside the export directory). If this document is scanned again, it will be treated as a new document, not as an update.

  • If the scanner parameter “relationType” is set to “relation”, relations will be automatically deleted by migration-center if they do not exist anymore. If the scanner parameter “relationType” is set to “object”, objects representing relationships cannot be deleted if the relation is invalidated.

    Example: If a document had one attachment when scanned in scanner run #1 and that attachment was removed from the document before scanner run #2, the scanner cannot remove the object representing the “attachment” relation between document and attachment (created in scanner run #1) in scanner run #2.

  • If a PDF rendition is requested and DXLUtility receives the request to generate the rendition but isn’t able to import the DXL file into the appropriate IBM Domino database on the rendition server, it’s likely that the shared folder used to transfer DXL and PDF files between the scanner and PDF Generation Module cannot be read by the user running PDF Generation Module on the rendition server.

  • The scanner will crash the Java VM if the parameter “exportCompositeItems” is set to “true” and the log level in log4j.xml (located in subdirectory “conf” of the scanner installation directory) is set to “ERROR”.

  • The 64-bit version of the scanner relies on IBM Domino. As Domino lacks the required libraries to export “EML”, “HTML” or “RTF”, the 64-bit version of the scanner cannot export documents in any other format than “DXL” or “PDF”. If other formats are required, the scanner’s 32-bit version needs to be run based on IBM Notes instead.
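Because an unsupported format makes the 64-bit scanner fail at runtime, it can be worth checking a configuration beforehand. The following is a minimal sketch of such a check with hypothetical names. It encodes only the rules stated in this document: the valid values of primaryDocumentFormat and secondaryDocumentFormats, the “|” separator, the DXL/PDF restriction of the 64-bit scanner, and its lack of support for exportCompositeItems. Normalising values to lower case is an assumption.

```python
from typing import List

# Values taken from the parameter descriptions in this document.
VALID_FORMATS = {"dxl", "html", "eml", "eml2html", "pdf"}
FORMATS_64BIT = {"dxl", "pdf"}


def check_scanner_formats(primary: str, secondary: str = "", is_64bit: bool = False,
                          export_composite_items: bool = False) -> List[str]:
    """Return the problems found in a format configuration (empty list = none)."""
    problems: List[str] = []
    # secondaryDocumentFormats is a "|"-separated list; lower-casing the values
    # is an assumption, since the documentation lists them in lower case.
    requested = [primary.strip().lower()]
    requested += [f.strip().lower() for f in secondary.split("|") if f.strip()]
    for fmt in requested:
        if fmt not in VALID_FORMATS:
            problems.append(f"unknown format: {fmt!r}")
        elif is_64bit and fmt not in FORMATS_64BIT:
            problems.append(f"{fmt!r} is not supported by the 64-bit scanner")
    if is_64bit and export_composite_items:
        problems.append("exportCompositeItems is not supported by the 64-bit scanner")
    return problems
```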

Domino attribute types

The following table lists all (relevant) Domino attribute types.

The scanner parameter “excludedAttributeTypes” is a logical “OR” of all types that should be excluded from the scan.

Type                        Numeric value
TYPE_ACTION                 16
TYPE_ASSISTANT_INFO         17
TYPE_CALENDAR_FORMAT        24
TYPE_COLLATION              2
TYPE_COMPOSITE              1
TYPE_ERROR                  0
TYPE_FORMULA                1536
TYPE_HIGHLIGHTS             12
TYPE_HTML                   21
TYPE_ICON                   6
TYPE_INVALID_OR_UNKNOWN     0
TYPE_LS_OBJECT              20
TYPE_MIME_PART              25
TYPE_NOTELINK_LIST          7
TYPE_NOTEREF_LIST           4
TYPE_NUMBER                 768
TYPE_NUMBER_RANGE           769
TYPE_OBJECT                 3
TYPE_QUERY                  15
TYPE_RFC822_TEXT            1282
TYPE_SCHED_LIST             22
TYPE_SEAL                   9
TYPE_SEAL_LIST              11
TYPE_SEALDATA               10
TYPE_SIGNATURE              8
TYPE_TEXT                   1280
TYPE_TEXT_LIST              1281
TYPE_TIME                   1024
TYPE_TIME_RANGE             1025
TYPE_UNAVAILABLE            512
TYPE_USER_DATA              14
TYPE_USERID                 1792
TYPE_VIEW_FORMAT            5
TYPE_VIEWMAP_DATASET        18
TYPE_VIEWMAP_LAYOUT         19
TYPE_WORKSHEET_DATA         13