SharePoint Online Scanner

Introduction

The SharePoint Online Scanner allows extracting documents, folders and their related information from Microsoft SharePoint Online libraries.

Known issues and limitations

  • The SharePoint Online Scanner might receive a timeout error from SharePoint Online when scanning libraries with more than 5000 documents (#52865). This can be resolved by increasing the timeout value that applies to your situation.

    • The scanner can get a timeout exception in three situations:

      • The initialization phase: the Java component waits for the C# component to retrieve all the objects that satisfy the conditions. This is solved by increasing the value of the initialization_timeout property in the additionalConfig.properties file (...\lib\mc-sharepoint-online-scanner).

      • The communication between the Java component and the C# component: This is solved by increasing the value of the timeout property in the serverConfig.properties file (...\lib\mc-sharepoint-online-scanner).

      • The communication between the C# component and SharePoint: This is solved by increasing the value of the sharepointCommunicationTimeout variable in the de.fme.mc.spo.scanner.winservice.exe.config file (...\lib\mc-sharepoint-online-scanner\CSOM_service).
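As a rough illustration of raising the initialization timeout, the sketch below rewrites one key in the text of a Java-style .properties file. It assumes plain key=value lines; the helper names set_property and hours_to_ms and the sample file content are hypothetical, and the 12-hour value is only an example.

```python
def set_property(text: str, key: str, value: str) -> str:
    """Return properties text with `key` set to `value` (assumes key=value lines)."""
    lines = text.splitlines()
    replaced = False
    for i, line in enumerate(lines):
        stripped = line.strip()
        # Ignore blank lines and comments; compare the name before the first '='.
        if stripped and not stripped.startswith(("#", "!")) and "=" in stripped:
            name = stripped.split("=", 1)[0].strip()
            if name == key:
                lines[i] = f"{key}={value}"
                replaced = True
    if not replaced:
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"


def hours_to_ms(hours: float) -> int:
    """Convert hours to milliseconds, the unit used by initialization_timeout."""
    return int(hours * 60 * 60 * 1000)


# Hypothetical file content; the real file may contain other properties.
sample = "# sample content\ninitialization_timeout=21600000\n"
updated = set_property(sample, "initialization_timeout", str(hours_to_ms(12)))
print(updated)
```

Working on the file text rather than on a parsed dictionary keeps comments and the order of the other properties untouched.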

Prerequisites

CSOM service

The migration-center SharePoint Online Scanner requires installing an additional component.

This additional component requires .NET Framework 4.7.2. It is designed to run as a Windows service and must be installed on every machine where a Job Server is installed.

To install this additional component, run the installation file located within the SharePoint folder of the Job Server installation: ...\lib\mc-sharepoint-online-scanner\CSOM_Service\install

To install the service, run the install.bat file with administrative privileges. You need to start the service manually after the installation. Afterwards, the service is configured to start automatically at system startup.

The CSOM service must run under the same user account as the Job Server service so that it has the same access to the export location.

When running the CSOM service with a domain account, you might need to grant access to the account by running the following command:

netsh http add urlacl url=http://+:57096/ user=<your user>

<your user> can be in the format domain\username or username@domain.com.

To uninstall the service, run the uninstall.bat file with administrative privileges.

Before uninstalling the Job Server component, the CSOM service must be uninstalled as described above.

Port access

The app-only principal authentication used by the scanner calls the following HTTPS endpoints. Please ensure that the job server machine has access to those endpoints:

  • <tenant name>.sharepoint.com:443

  • accounts.accesscontrol.windows.net:443
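To check from the job server machine whether these endpoints are reachable, a small script along the following lines can help. It is only a sketch: the function names are hypothetical, "contoso" is a placeholder tenant name, and the check opens a direct TCP connection, so it says nothing about access through a proxy server.

```python
import socket


def required_endpoints(tenant_name: str) -> list[tuple[str, int]]:
    """Host/port pairs listed above for app-only principal authentication."""
    return [
        (f"{tenant_name}.sharepoint.com", 443),
        ("accounts.accesscontrol.windows.net", 443),
    ]


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a direct TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts.
        return False


if __name__ == "__main__":
    for host, port in required_endpoints("contoso"):  # placeholder tenant name
        print(f"{host}:{port} reachable: {can_connect(host, port)}")
```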

Authentication

The scanner supports only app-principal authentication for connecting to SharePoint Online. The app-principal authentication comes in two flavors:

  • Azure AD app-only principal authentication: Requires full control access for the migration-center application on your SharePoint Online tenant. This includes full control on ALL site collections of your tenant.

  • SharePoint app-only principal authentication: Can be set up to restrict the access of the migration-center application to certain site collections or sites.

Azure AD app-only

The migration-center SharePoint Online Scanner supports Azure AD app-only authentication. This is the authentication method for background processes accessing SharePoint Online recommended by Microsoft. When using SharePoint Online you can define applications in Azure AD and these applications can be granted permissions to your SharePoint Online tenant.

Follow these steps to set up your migration-center application in your Azure AD.

The information in this chapter is based on the following Microsoft guidelines: https://docs.microsoft.com/en-us/sharepoint/dev/solution-guidance/security-apponly-azuread

Step 1: Create a self-signed certificate for your migration-center Azure AD application

In Azure AD, app-only access typically uses a certificate to request access: anyone who has the certificate and its private key can use the app and the permissions granted to the app. The steps below walk you through the setup of this model.

To configure the Azure AD application for invoking SharePoint Online with an app-only access token, you must create and configure a self-signed X.509 certificate. This certificate is used to authenticate your migration-center application against Azure AD when requesting the app-only access token. The certificate can be created either with the makecert.exe tool that is available in the Windows SDK or with a provided PowerShell script, which does not depend on makecert. Using the PowerShell script is the preferred method and is explained in this chapter.

It is important that you run the scripts below with Administrator privileges.

To create a self-signed certificate, run the following script, which you can find in the <job server folder>\lib\mc-spo-batch-importer\scripts folder:

.\Create-SelfSignedCertificate.ps1 -CommonName "MyCompanyName" -StartDate 2020-07-01 -EndDate 2022-06-30

The dates are provided in ISO date format: YYYY-MM-dd

You will be asked to give a password to encrypt your private key, and both the .PFX file and .CER file will be exported to the current folder.

Save the password of the private key as you’ll need it later.
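If the validity period is computed rather than typed by hand, the command line can be assembled as in the following sketch. The script name and its three parameters are taken from the example above; the helper function certificate_command is hypothetical and only builds the text of the command, it does not run PowerShell.

```python
from datetime import date


def certificate_command(common_name: str, start: date, end: date) -> str:
    """Build the script invocation with both dates in ISO format."""
    if end <= start:
        raise ValueError("end date must be after start date")
    return (
        f'.\\Create-SelfSignedCertificate.ps1 -CommonName "{common_name}" '
        f"-StartDate {start.isoformat()} -EndDate {end.isoformat()}"
    )


print(certificate_command("MyCompanyName", date(2020, 7, 1), date(2022, 6, 30)))
```

date.isoformat() always produces a four-digit year followed by a two-digit month and day, which matches the date format the script expects.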

Step 2: Register the migration-center Azure AD application

The next step is registering an Azure AD application in the Azure Active Directory tenant that is linked to your Office 365 tenant. To do that, open the Office 365 Admin Center (https://admin.microsoft.com) with an account that is a member of the Tenant Global Admins group. Click on the "Azure Active Directory" link under the "Admin centers" group in the left-side tree view of the Office 365 Admin Center. The Microsoft Azure portal (https://portal.azure.com/) opens in a new browser tab. If this is the first time you access the Azure portal with your account, you will have to register a new Azure subscription, providing some information and a credit card for any payment need. Using Azure AD and registering an Office 365 application are free capabilities, so you will not be charged for them. Once you have access to the Azure portal, select the "Azure Active Directory" section and choose the option "App registrations". See the next figure for further details.

In the "App registrations" tab you will find the list of Azure AD applications registered in your tenant. Click the "New registration" button in the upper left part of the blade. Next, provide a name for your application, e.g. “migration-center” and click on "Register" at the bottom of the blade.

Once the application has been created copy the "Application (client) ID" as you’ll need it later.

Step 3: Configure necessary permissions for the migration-center application

Now click on "API permissions" in the left menu bar and click on the "Add a permission" button. A new blade will appear. Here you choose the permissions that are required by migration-center. Choose the following:

  • Microsoft APIs

    • SharePoint

      • Application permissions

        • Sites

          • Sites.FullControl.All

        • TermStore

          • TermStore.Read.All

        • User

          • User.Read.All

    • Graph

      • Application permissions

        • Sites

          • Sites.FullControl.All

Click on the blue "Add permissions" button at the bottom to add the permissions to your application. The "Application permissions" are those granted to the migration-center application when running as App Only.

Step 4: Uploading the self-signed certificate

The next step is “connecting” the certificate you created earlier to the application. Click on "Certificates & secrets" in the left menu bar. Click on the "Upload certificate" button, select the .CER file you generated earlier and click on "Add" to upload it.

Step 5: Granting admin consent

The “Sites.FullControl.All” application permission requires admin consent in a tenant before it can be used. To grant it, click on "API permissions" in the left menu again. At the bottom you will see a section "Grant consent". Click on the "Grant admin consent for" button and confirm the action by clicking on the "Yes" button that appears at the top.

Step 6: Setting the necessary parameters in the scanner

In order to use Azure AD app-only principal authentication with the SharePoint Online Scanner, you need to fill in the following scanner parameters with the information you gathered in the steps above:

Configuration parameters and their values:

  • appClientId: The ID of the migration-center Azure AD application.

  • appCertificatePath: The full path to the certificate .PFX file, which you have generated when setting up the Azure AD application.

  • appCertificatePassword: The password to read the certificate specified in appCertificatePath.

SharePoint app-only

SharePoint app-only authentication allows you to grant fine-grained access permissions on your SharePoint Online tenant for the migration-center application.

The information in this chapter is based on the following guidelines from Microsoft: https://docs.microsoft.com/en-us/sharepoint/dev/solution-guidance/security-apponly-azureacs and https://docs.microsoft.com/en-us/sharepoint/dev/sp-add-ins/add-in-permissions-in-sharepoint

Step 1: Create a self-signed certificate for your migration-center Azure AD application

In Azure AD, app-only access typically uses a certificate to request access: anyone who has the certificate and its private key can use the app and the permissions granted to the app. The steps below walk you through the setup of this model.

To configure the Azure AD application for invoking SharePoint Online with an app-only access token, you must create and configure a self-signed X.509 certificate. This certificate is used to authenticate your migration-center application against Azure AD when requesting the app-only access token. The certificate can be created either with the makecert.exe tool that is available in the Windows SDK or with a provided PowerShell script, which does not depend on makecert. Using the PowerShell script is the preferred method and is explained in this chapter.

It is important that you run the scripts below with Administrator privileges.

To create a self-signed certificate, run the following script, which you can find in the <job server folder>\lib\mc-spo-batch-importer\scripts folder:

.\Create-SelfSignedCertificate.ps1 -CommonName "MyCompanyName" -StartDate 2020-07-01 -EndDate 2022-06-30

The dates are provided in ISO date format: YYYY-MM-dd

You will be asked to give a password to encrypt your private key, and both the .PFX file and .CER file will be exported to the current folder.

Save the password of the private key as you’ll need it later.

Step 2: Register the migration-center Azure AD application

The next step is registering an Azure AD application in the Azure Active Directory tenant that is linked to your Office 365 tenant. To do that, open the Office 365 Admin Center (https://admin.microsoft.com) with an account that is a member of the Tenant Global Admins group. Click on the "Azure Active Directory" link under the "Admin centers" group in the left-side tree view of the Office 365 Admin Center. The Microsoft Azure portal (https://portal.azure.com/) opens in a new browser tab. If this is the first time you access the Azure portal with your account, you will have to register a new Azure subscription, providing some information and a credit card for any payment need. Using Azure AD and registering an Office 365 application are free capabilities, so you will not be charged for them. Once you have access to the Azure portal, select the "Azure Active Directory" section and choose the option "App registrations". See the next figure for further details.

In the "App registrations" tab you will find the list of Azure AD applications registered in your tenant. Click the "New registration" button in the upper left part of the blade. Next, provide a name for your application, e.g. “migration-center” and click on "Register" at the bottom of the blade.

Once the application has been created copy the "Application (client) ID" as you’ll need it later.

Step 3: Uploading the self-signed certificate and generating a secret key

The next step is “connecting” the certificate you created earlier to the application. Click on "Certificates & secrets" in the left menu bar. Click on the "Upload certificate" button, select the .CER file you generated earlier and click on "Add" to upload it.

After that, you need to create a secret key. Click on “New client secret” to generate a new secret key. Give it an appropriate description, e.g. “migration-center” and choose an expiration period that matches your migration project time frame. Click on “Add” to create the key.

Store the retrieved information (client ID and client secret), since you will need it later. Safeguard the created client ID/secret combination as if it were your administrator account. With this client ID and secret, one can read and update all data in your SharePoint Online environment!

Step 4: Granting permissions to the app-only principal

The next step is granting permissions to the newly created principal in SharePoint Online.

If you want to grant tenant scoped permissions, this can only be done via the “appinv.aspx” page on the tenant administration site. If your tenant administration site URL is https://contoso-admin.sharepoint.com, you can reach this page via https://contoso-admin.sharepoint.com/_layouts/15/appinv.aspx.

If you want to grant site collection scoped permissions, open the “appinv.aspx” on the specific site collection, e.g. https://contoso.sharepoint.com/sites/mysite/_layouts/15/appinv.aspx.
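The two URL patterns just described can be summarized in a short helper. This is a sketch only: the function name appinv_url is hypothetical, and it assumes the standard <tenant>.sharepoint.com and <tenant>-admin.sharepoint.com host names used in the examples.

```python
def appinv_url(tenant_name: str, site_path: str = "") -> str:
    """Return the appinv.aspx URL for tenant scope or for one site collection."""
    if site_path:
        # Site collection scope: the page lives under the site collection itself.
        path = "/" + site_path.strip("/")
        return f"https://{tenant_name}.sharepoint.com{path}/_layouts/15/appinv.aspx"
    # Tenant scope: the page lives on the tenant administration site.
    return f"https://{tenant_name}-admin.sharepoint.com/_layouts/15/appinv.aspx"


print(appinv_url("contoso"))
print(appinv_url("contoso", "/sites/mysite"))
```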

Once the page is loaded, add your client ID and look up the created principal by pressing the "Lookup" button.

Please enter “www.migration-center.com” in field “App Domain” and “https://www.migration-center.com” in field “Redirect URL”.

To grant permissions, you'll need to provide the permission XML that describes the needed permissions. The migration-center application will always need the “FullControl” permission. Use the following permission XML for granting tenant scoped permissions:

<AppPermissionRequests AllowAppOnlyPolicy="true">
 <AppPermissionRequest Scope="http://sharepoint/content/tenant" Right="FullControl" />
 <AppPermissionRequest Scope="http://sharepoint/taxonomy" Right="Read" />
</AppPermissionRequests>

Use this permission XML for granting site collection scoped permissions:

<AppPermissionRequests AllowAppOnlyPolicy="true">
 <AppPermissionRequest Scope="http://sharepoint/content/sitecollection" Right="FullControl" />
 <AppPermissionRequest Scope="http://sharepoint/taxonomy" Right="Read" />
</AppPermissionRequests>
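Both permission XML variants differ only in the content scope. If you prefer to generate the XML rather than copy it, the following sketch builds it with Python's standard xml.etree.ElementTree module. The function name permission_xml and the SCOPES dictionary are hypothetical; the element names, scopes and rights are the ones shown above.

```python
import xml.etree.ElementTree as ET

# Content scopes from the two permission XML variants above.
SCOPES = {
    "tenant": "http://sharepoint/content/tenant",
    "sitecollection": "http://sharepoint/content/sitecollection",
}


def permission_xml(scope: str) -> str:
    """Return the permission XML for 'tenant' or 'sitecollection' scope."""
    root = ET.Element("AppPermissionRequests", AllowAppOnlyPolicy="true")
    ET.SubElement(root, "AppPermissionRequest", Scope=SCOPES[scope], Right="FullControl")
    ET.SubElement(
        root, "AppPermissionRequest", Scope="http://sharepoint/taxonomy", Right="Read"
    )
    return ET.tostring(root, encoding="unicode")


print(permission_xml("sitecollection"))
```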

When you click on “Create”, you will be presented with a permission consent dialog. Press “Trust It” to grant the permissions.

Safeguard the created client ID/secret combination as if it were your administrator account. With these credentials, one can read and update all data in your SharePoint Online environment!

Step 5: Setting the necessary parameters in the scanner

In order to use SharePoint app-only principal authentication with the SharePoint Online Scanner, you need to fill in the following scanner parameters with the information you gathered in the steps above:

Configuration parameters and their values:

  • appClientId: The ID of the SharePoint application you have created.

  • appClientSecret: The client secret, which you have generated when setting up the SharePoint application.
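The two authentication flavors use different parameter combinations: a certificate path plus password for Azure AD app-only, and a client secret for SharePoint app-only. The sketch below shows one way to check a parameter set before entering it. The function detect_auth_mode and the returned labels are hypothetical, and rejecting a configuration that supplies both a certificate and a secret is an assumption of this sketch, not documented behavior.

```python
def detect_auth_mode(params: dict[str, str]) -> str:
    """Return which app-only flavor a parameter set describes."""
    if not params.get("appClientId"):
        raise ValueError("appClientId is required for both authentication flavors")
    has_cert = bool(params.get("appCertificatePath")) and bool(
        params.get("appCertificatePassword")
    )
    has_secret = bool(params.get("appClientSecret"))
    if has_cert and not has_secret:
        return "azure-ad-app-only"
    if has_secret and not has_cert:
        return "sharepoint-app-only"
    # Assumption: exactly one credential type should be supplied.
    raise ValueError("provide either certificate path and password, or a client secret")


print(detect_auth_mode({"appClientId": "my-client-id", "appClientSecret": "my-secret"}))
```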

Scanner Configuration

To create a new SharePoint Online Scanner, create a new scanner and select SharePoint Online from the Adapter Type drop-down. Once the adapter type has been selected, the Parameters list will be populated with the parameters specific to the selected adapter type. Mandatory parameters are marked with an *.

The Properties of an existing scanner can be accessed after creating the scanner by double-clicking the scanner in the list or by selecting the Properties button/menu item from the toolbar/context menu. A description is always displayed at the bottom of the window for the selected parameter.

Multiple scanners can be created for scanning different locations, provided each scanner has a unique name.

Scanner Parameters

The common adapter parameters are described in Common Parameters.

The configuration parameters available for the SharePoint Online Scanner are described below:

  • tenantName* The name of your SharePoint Online Tenant

    Example: Contoso

    Several websites explain how to determine a SharePoint Online tenant name, e.g. https://morgantechspace.com/2019/07/how-to-find-your-tenant-name-in-office-365.html

  • tenantURL* The URL of your SharePoint Online Tenant

    Example: https://contoso.sharepoint.com

  • siteName* The path to your target site collection.

    Example: /sites/My Site

  • appClientId* The ID of either the migration-center Azure AD application or the SharePoint application.

    Example: ab187da0-c04d-4f82-9f43-51f41c0a3bf0

  • appCertificatePath The full path to the certificate .PFX file, which you have generated when setting up the Azure AD application.

    Example: D:\migration-center\config\azure-ad-app-cert.pfx

  • appCertificatePassword The password to read the certificate specified in appCertificatePath.

  • appClientSecret The client secret, which you have generated when setting up the SharePoint application (SharePoint app-only principal authentication).

  • proxyServer The name or IP of the proxy server. Example: http://myProxy.com

  • proxyPort The port of the proxy server.

  • proxyUsername The username if required by the proxy server.

  • proxyPassword The password for the proxy username.

  • camlQuery CAML statement that will be used to retrieve the ids of objects that will be scanned.

    If this parameter is set, the parameters excludeListsAndLibraries, includeListsAndLibraries, scanSubsites and excludeSubsites must not be set.

  • excludeListsAndLibraries The list of paths of the lists and libraries to be excluded from scanning.

  • includeListsAndLibraries The list of lists and libraries the scanner should scan.

  • excludeSubsites The list of subsite paths to be excluded from scanning.

  • excludeContentTypes The list of content types to be excluded from scanning.

  • excludeFolders The list of folders to be excluded from scanning. All folders with the specified name in the site/subsite/library/list, depending on the scanner configuration, will be ignored by the scanner. To exclude one specific folder, specify its full path.

    Multiple values can be entered, separated with the “,” character.

    Example: with folder1, all folders named folder1 in the site/subsites/library/list will be excluded.

    With <Some_Library>/<Test_folder>/folder1, the scanner will exclude only the folder1 that is in Test_folder.

  • includeFolders The list of folders the scanner should scan. All folders with the specified name in the site/subsite/library/list, depending on the scanner configuration, will be scanned. To scan one specific folder, specify its full path.

    The values of the parameter “excludeFolders” will be ignored if this parameter contains values.

    Multiple values can be entered, separated with the “,” character.

    Example: with folder1, all folders named folder1 in the site/subsites/library/list will be scanned.

    With <Some_Library>/<Test_folder>/folder1, the scanner will scan only the folder1 that is in Test_folder.

  • scanSubsites Flag indicating whether the objects from subsites will be scanned.

  • scanDocuments Flag indicating whether the scanned documents will be added as migration-center objects.

  • scanFolders Flag indicating whether the scanned folders will be added as migration-center objects.

  • includeAttributes The internal attributes that will be scanned even if their value is null.

  • scanLatestVersionOnly Flag indicating if just the latest version of a document will be scanned.

  • computeChecksum If enabled, the scanner calculates a checksum for the content of every document it scans. These checksums can be compared during import against a second checksum computed while importing the documents. If the checksums differ, the content has been corrupted or otherwise altered, and the affected document is rolled back and transitioned to the “import error” status in migration-center.

  • hashAlgorithm Specifies the algorithm to be used if the "computeChecksum" parameter is checked. Supported algorithms: MD5, SHA-1, SHA-224, SHA-256, SHA-384 and SHA-512.

    Default algorithm is MD5.

  • hashEncoding The encoding type which will be used for checksum computation. Supported encoding types are HEX, Base32, Base64.

    Default encoding is HEX.

  • exportLocation* Folder path. The location where the exported object content is temporarily saved. It can be a local folder on the same machine as the Job Server or a shared folder on the network. This folder must exist before the scanner is launched and must be writable. migration-center will not create this folder automatically. If the folder cannot be found, an appropriate error will be raised and logged. This path must be accessible by both the scanner and the importer, so if they are running on different machines, it should be a shared folder.

  • numberOfThreads Maximum number of concurrent threads.

    Default is 10 and maximum allowed is 20.

  • loggingLevel*

    See: Common Parameters.

Parameters marked with an asterisk (*) are mandatory.
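To make the interplay of computeChecksum, hashAlgorithm and hashEncoding more concrete, the following sketch computes a checksum over a byte string with each supported algorithm and encoding, using Python's standard hashlib and base64 modules. It illustrates the concepts only and is not the scanner's implementation; in particular, the letter case of HEX output and the exact Base32/Base64 variants used by the scanner are assumptions.

```python
import base64
import hashlib

# Map the algorithm names from the parameter description to hashlib names.
ALGORITHMS = {
    "MD5": "md5",
    "SHA-1": "sha1",
    "SHA-224": "sha224",
    "SHA-256": "sha256",
    "SHA-384": "sha384",
    "SHA-512": "sha512",
}


def checksum(data: bytes, algorithm: str = "MD5", encoding: str = "HEX") -> str:
    """Hash `data` and return the digest in the requested text encoding."""
    digest = hashlib.new(ALGORITHMS[algorithm], data).digest()
    if encoding == "HEX":
        return digest.hex()
    if encoding == "Base32":
        return base64.b32encode(digest).decode("ascii")
    if encoding == "Base64":
        return base64.b64encode(digest).decode("ascii")
    raise ValueError(f"unsupported encoding: {encoding}")


content = b"example document content"
print(checksum(content))                       # MD5, HEX (the documented defaults)
print(checksum(content, "SHA-256", "Base64"))
```

A checksum comparison is only meaningful if both sides use the same algorithm and the same encoding.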

OneDrive

You can scan documents from OneDrive using the same parameters as for scanning SharePoint Online libraries but with a different format for some of them.

Scanning from OneDrive requires the Azure AD app-only principal authentication with the self-signed certificate and password. It does not work with SharePoint app-only credentials with the appClientSecret.

  • tenantName The name of your SharePoint Online tenant. Example: fme

  • tenantURL For scanning from a personal OneDrive, the URL format is https://<tenantName>-my.sharepoint.com, e.g. https://fme-my.sharepoint.com

  • siteName For scanning from a personal OneDrive, use /personal/<your personal site> as the site, e.g. /personal/john_doe_onmicrosoft_com

  • includeListsAndLibraries For scanning from a personal OneDrive, use Documents as the library.

Parameters not mentioned here are either not used when scanning from OneDrive or do not have any specific requirement.
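The OneDrive URL and personal site path formats shown above can be derived as in the following sketch. Both function names are hypothetical, and the rule that every non-alphanumeric character of the user's sign-in name becomes an underscore is an assumption based on the usual OneDrive naming pattern; always verify the result against the actual URL of the user's OneDrive.

```python
import re


def onedrive_url(tenant_name: str) -> str:
    """URL of the OneDrive host for a tenant, e.g. https://fme-my.sharepoint.com."""
    return f"https://{tenant_name}-my.sharepoint.com"


def personal_site_path(sign_in_name: str) -> str:
    """Personal site path, assuming non-alphanumerics map to underscores."""
    return "/personal/" + re.sub(r"[^A-Za-z0-9]", "_", sign_in_name)


print(onedrive_url("fme"))
print(personal_site_path("john.doe@fme.onmicrosoft.com"))
```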

CAML query

The SharePoint Online scanner can use SharePoint CAML queries for filtering which objects are to be scanned. Based on the entered query, the scanner scans documents and folders in the lists/libraries.

The queries used must only contain the content that would be placed inside the <Where> block. The scope is already set to recursive.

The following example shows a simple CAML query for scanning the contents of the Docs folder. In this example "mc" is the source site, "Versioning" a subsite, "VersionNumber" a library and "Docs" a folder.

<BeginsWith> 
   <FieldRef Name='FileDirRef'/> 
   <Value Type='Text'>mc/Versioning/VersionNumber/Docs</Value>
</BeginsWith>

More complex queries can also be used. The next example scans only documents that were created on or before a chosen date and that are located in one of two different folders.

<And>
   <Or>
      <BeginsWith>
         <FieldRef Name='FileDirRef'/>
         <Value Type='Text'>mc/Versioning/VersionNumber/Docs</Value>
      </BeginsWith>
      <BeginsWith>
         <FieldRef Name='FileDirRef'/>
         <Value Type='Text'>mc/docLib/SubFolder</Value>
      </BeginsWith>
   </Or>
   <Leq>
      <FieldRef Name='Created' />
      <Value IncludeTimeValue='TRUE' Type='DateTime'>2020-12-31T00:00:00Z</Value>
   </Leq>
</And>

For details on how to form CAML queries for each version of SharePoint please consult the official Microsoft MSDN documentation.
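Because CAML's And and Or elements each take exactly two operands, nested queries are easy to get wrong by hand. The following sketch assembles the second example programmatically with Python's standard xml.etree.ElementTree module. The helper function names are hypothetical; the element and attribute names are those used in the examples above, and ET.indent requires Python 3.9 or later.

```python
import xml.etree.ElementTree as ET


def begins_with(field: str, value: str) -> ET.Element:
    """<BeginsWith> condition on a text field."""
    node = ET.Element("BeginsWith")
    ET.SubElement(node, "FieldRef", Name=field)
    ET.SubElement(node, "Value", Type="Text").text = value
    return node


def created_on_or_before(iso_timestamp: str) -> ET.Element:
    """<Leq> condition on the Created field, including the time part."""
    node = ET.Element("Leq")
    ET.SubElement(node, "FieldRef", Name="Created")
    value = ET.SubElement(node, "Value", IncludeTimeValue="TRUE", Type="DateTime")
    value.text = iso_timestamp
    return node


def combine(tag: str, left: ET.Element, right: ET.Element) -> ET.Element:
    """Join exactly two conditions with <And> or <Or>."""
    node = ET.Element(tag)
    node.extend([left, right])
    return node


query = combine(
    "And",
    combine(
        "Or",
        begins_with("FileDirRef", "mc/Versioning/VersionNumber/Docs"),
        begins_with("FileDirRef", "mc/docLib/SubFolder"),
    ),
    created_on_or_before("2020-12-31T00:00:00Z"),
)
ET.indent(query)  # pretty-print
print(ET.tostring(query, encoding="unicode"))
```

The printed fragment contains only the content that belongs inside the <Where> block.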

When the CAML query parameter "camlQuery" is used, the parameters "excludeListsAndLibraries", "includeListsAndLibraries", "scanSubsites", "excludeSubsites", "excludeFolders" and "includeFolders" must not be set. Otherwise the scanner will fail to start with an error message.
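This restriction can be checked before a scanner configuration is saved, as in the sketch below. The function caml_conflicts is hypothetical and works on a plain dictionary of parameter values; treating any non-empty or true value as "set" is an assumption of this sketch.

```python
# Parameters that must stay unset when a CAML query is configured.
CONFLICTING_WITH_CAML = (
    "excludeListsAndLibraries",
    "includeListsAndLibraries",
    "scanSubsites",
    "excludeSubsites",
    "excludeFolders",
    "includeFolders",
)


def caml_conflicts(params: dict) -> list[str]:
    """Return the names of parameters that conflict with a configured CAML query."""
    if not params.get("camlQuery"):
        return []
    return [name for name in CONFLICTING_WITH_CAML if params.get(name)]


config = {"camlQuery": "<BeginsWith>...</BeginsWith>", "scanSubsites": True}
print(caml_conflicts(config))
```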

Permissions

The SharePoint Online scanner can extract permission information for documents and folders. Note that only unique permissions are extracted. Permissions inherited from parent objects are not extracted by the scanner.

Additional Settings

There is a configuration file for additional settings of the SharePoint Online Scanner. It is located in the …/lib/mc-sharepointonline-scanner/ folder of the Job Server install location and contains the following properties:

  • excluded_file_extensions List of file extensions that will be ignored by the scanner.

    Default: .aspx|.webpart|.dwp|.master|.preview.

  • excluded_attributes List of attributes that will be ignored by the scanner. Use "|" as a delimiter when specifying more than one attribute.

  • initialization_timeout Amount of time in milliseconds to wait before the scanner throws a timeout error during the initialization phase.

    Default: 21600000 ms
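The excluded_file_extensions value is a list delimited by "|". The following sketch shows how such a value could be parsed and applied to file names. It illustrates the format only: the function names are hypothetical, and the case-insensitive suffix comparison is an assumption, not a documented behavior of the scanner.

```python
# Default value of excluded_file_extensions as documented above.
DEFAULT_EXCLUDED = ".aspx|.webpart|.dwp|.master|.preview"


def parse_extensions(value: str) -> set[str]:
    """Split a '|'-delimited extension list into a set of lower-case entries."""
    return {ext.strip().lower() for ext in value.split("|") if ext.strip()}


def is_excluded(file_name: str, excluded: set[str]) -> bool:
    """True if the file name ends with one of the excluded extensions."""
    lowered = file_name.lower()
    return any(lowered.endswith(ext) for ext in excluded)


excluded = parse_extensions(DEFAULT_EXCLUDED)
print(is_excluded("Home.aspx", excluded))
print(is_excluded("report.docx", excluded))
```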

Additional logs

An additional log file is generated by the SharePoint Online Scanner.

This log file is located in the same folder as the regular SharePoint Online scanner log files and is named mc-sharepointonline-scanner.log.
