SharePoint 2010: Bulk Metadata Tagging

Bulk tagging uploads into SharePoint 2010 without third-party solutions

This topic details a method for moving content into SharePoint 2010 and automatically tagging it based on folder hierarchy, without resorting to potentially expensive third-party options.

I recently answered a question on the UK User Group forums about importing documents en masse from a scanning company. The user in question needed to move hundreds of documents from a DVD and apply metadata appropriately. Being a public sector organisation, they were looking for a cost-effective method that did not require developer resources.

The key requirement was that the organisation wanted to use Managed Metadata for some of the columns, which immediately ruled out using the Datasheet view to update documents in bulk.

Options for import

The various options available to move content into SharePoint

Upload each item one by one
  • Pros: Document tagged on upload. Can ensure key columns are filled in.
  • Cons: SLOW!

Use the Multiple Upload feature
  • Pros: Quicker. Can use Edit in Datasheet view to set basic metadata.
  • Cons: Have to revisit every document to add metadata for Managed Metadata values.

Drag/drop using Windows Explorer view
  • Pros: Quicker. Can use Edit in Datasheet view to set basic metadata.
  • Cons: Have to revisit every document to add metadata for Managed Metadata values.

Third-party migration tool
  • Pros: Fast way of moving lots of content into SharePoint. Generally intelligent tagging options available.
  • Cons: Cost. Training and setup time.

Custom upload tool
  • Pros: Tailored to your requirements. Business logic can be applied.
  • Cons: Cost. Development time.

Personally, I like to use out-of-the-box features wherever possible, and in this case SharePoint provides the location-based metadata defaults option in its document libraries, which may well solve this issue for us.

Example Scenario

In this example scenario, I'm going to take the contents of one of our Shared Drives that contains documents from our various suppliers and move the contents into SharePoint, tagging them with the supplier name sourced from the Managed Metadata Service.

The data currently exists on the file system in a series of folders, which is probably a fairly typical structure for shared drives:

  • f:\supplier documents
    • f:\supplier documents\adobe
    • f:\supplier documents\apple
    • f:\supplier documents\cisco
    • f:\supplier documents\hewlett packard
    • f:\supplier documents\microsoft

We're going to configure a SharePoint library to automatically tag our documents with the supplier name as they are uploaded, based on the location that we are copying them into. We'll do this by configuring the library folders with location-based metadata defaults.
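To make the rule concrete before we configure anything, here is a minimal Python sketch of the mapping we're after: the first folder beneath the shared drive root decides the supplier tag. It only models the idea and doesn't talk to SharePoint. The SUPPLIER_TERMS table and the supplier_for function are names I've made up for illustration.

```python
from pathlib import PureWindowsPath
from typing import Optional

# Hypothetical lookup: folder name on the shared drive -> term label.
SUPPLIER_TERMS = {
    "adobe": "Adobe",
    "apple": "Apple",
    "cisco": "Cisco",
    "hewlett packard": "Hewlett Packard",
    "microsoft": "Microsoft",
}


def supplier_for(path: str, root: str = r"f:\supplier documents") -> Optional[str]:
    """Return the supplier term for a file, based on its first folder under root."""
    try:
        relative = PureWindowsPath(path).relative_to(PureWindowsPath(root))
    except ValueError:
        return None  # the file is not under the shared drive root
    if len(relative.parts) < 2:
        return None  # the file sits directly in the root, so no supplier folder
    return SUPPLIER_TERMS.get(relative.parts[0].lower())


if __name__ == "__main__":
    print(supplier_for(r"f:\supplier documents\cisco\quote.pdf"))
```

Files that sit directly in the root, or in a folder with no matching term, come back as None, which mirrors a document landing somewhere in the library that has no default configured.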

Create a Term Set

The first thing we need to do is create a new term set (Managing Enterprise Metadata (MSDN)) to represent our Supplier names. This gives us the ability to use Synonyms for suppliers such as Hewlett Packard (HP) and also helps improve search with the refinement panel.
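If you have a lot of suppliers, typing each term in by hand gets tedious. The Term Store Management Tool can import a term set from a CSV file, and the sketch below builds one from a list of names. The column layout is my understanding of the sample import file that the tool offers for download, so treat it as an assumption and compare it with that sample before importing. The build_term_set_csv function and the "Suppliers" name are purely illustrative.

```python
import csv
import io

# Column headings as I understand them from the sample import file (assumption).
HEADER = [
    "Term Set Name", "Term Set Description", "LCID", "Available for Tagging",
    "Term Description", "Level 1 Term", "Level 2 Term", "Level 3 Term",
    "Level 4 Term", "Level 5 Term", "Level 6 Term", "Level 7 Term",
]


def build_term_set_csv(term_set_name, terms):
    """Return CSV text describing a flat term set with one level of terms."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(HEADER)
    for index, term in enumerate(terms):
        # Only the first data row carries the term set name.
        name = term_set_name if index == 0 else ""
        writer.writerow([name, "", "", "TRUE", "", term] + [""] * 6)
    return buffer.getvalue()


if __name__ == "__main__":
    suppliers = ["Adobe", "Apple", "Cisco", "Hewlett Packard", "Microsoft"]
    print(build_term_set_csv("Suppliers", suppliers))
```

The sketch only covers the term names themselves; synonyms such as HP would still need adding in the Term Store Management Tool afterwards.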

Create the document library and folder structure

Create a new library within the site and then add a new column to the library. Call this column Supplier and select Managed Metadata. Configure the column to accept values from the supplier term set that we created earlier. We could add more columns for use with this method if we wanted to; the only caveat is that they must accept default values in order to be populated automatically.

Now create a folder structure that mimics the shared drive that we're copying our data from. (Note: I'm only going one folder deep in this example. You could use more if you wanted, just keep an eye on URL length and folder depth restrictions.)
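If you do go deeper, it's worth checking the source paths before you copy anything. The Python sketch below projects each relative path onto a library URL and flags anything that looks too long. The limits of 260 characters for the URL and 128 for a single file or folder name are the commonly quoted SharePoint 2010 figures, so treat them as assumptions and confirm them for your own farm. The library URL and the check_path function are hypothetical.

```python
MAX_URL_LENGTH = 260   # assumed limit for the full URL
MAX_NAME_LENGTH = 128  # assumed limit for one file or folder name


def check_path(library_url, relative_path):
    """Return a list of problems for one file, or an empty list if it looks fine."""
    # Accept Windows or URL style separators in the relative path.
    segments = [s for s in relative_path.replace("\\", "/").split("/") if s]
    url = library_url.rstrip("/") + "/" + "/".join(segments)

    problems = []
    if len(url) > MAX_URL_LENGTH:
        problems.append(f"URL is {len(url)} characters (limit {MAX_URL_LENGTH})")
    for segment in segments:
        if len(segment) > MAX_NAME_LENGTH:
            problems.append(f"Name is {len(segment)} characters: {segment[:20]}...")
    return problems


if __name__ == "__main__":
    # Hypothetical library URL, used only for illustration.
    library = "http://intranet/sites/procurement/Supplier Documents"
    for path in [r"cisco\quote.pdf", "cisco/" + "x" * 130 + ".pdf"]:
        print(path[:30], check_path(library, path))
```

The check counts characters in the unencoded URL, which is a simplification; spaces and other escaped characters may count for more in practice.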

 

Apply the default column values

With the library configured, we can now configure the default column values for each of the folders. Click on library settings, then under General Settings, click on Column default value settings.

 

This opens up a hierarchical view of the library showing the folders that we created along with the columns that can be configured.

 

If we click on a folder in the left hand view, we are then able to select columns in the content pane and configure a default value which will affect ONLY that location, thus enabling us to have different values per folder.

For each of the folders, select it, click on Supplier and then configure the folder to default to the correct supplier name using the Managed Metadata field editor.

 

Once we return to the default column value screen, each of the folders with default values configured will have a green asterisk next to it.
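To picture what the per-folder settings mean, here is a small Python model of how a default might be resolved for a given location. It assumes that a subfolder picks up its parent's default unless it sets its own, which is my understanding of the feature rather than something taken from SharePoint's code. The folder_defaults structure and the resolve_default function are invented for this illustration.

```python
from posixpath import dirname


def resolve_default(folder_defaults, folder, column):
    """Walk up from folder to the library root and return the nearest default."""
    current = folder.strip("/")
    while True:
        value = folder_defaults.get(current, {}).get(column)
        if value is not None:
            return value
        if not current:
            return None  # reached the library root without finding a default
        current = dirname(current)


if __name__ == "__main__":
    # Library-relative folder path -> {column name: default value}
    folder_defaults = {
        "Adobe": {"Supplier": "Adobe"},
        "Cisco": {"Supplier": "Cisco"},
    }
    print(resolve_default(folder_defaults, "Adobe/Contracts", "Supplier"))
    print(resolve_default(folder_defaults, "", "Supplier"))
```

In this model the root of the library is the empty string, so a document uploaded there gets no Supplier value unless a default is set for the root itself.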

 

Testing the solution

The first test we'll make is a single upload. Using the browser, I'll navigate to the supplier document library, then into the Adobe folder and select Add new item. I'll choose a single document from my local drive and click upload.

The standard SharePoint edit file metadata modal dialog is shown, but you'll notice that our Supplier column has defaulted to Adobe. Click save to close the modal dialog and accept the defaults.

The second test is to use the multiple upload functionality through the Add new items link, this time into the Cisco folder. I'll select four documents from my local drive and select OK. This time, instead of seeing the document metadata dialog, we're returned directly to the library view, and you should see that all four documents have been tagged with the Cisco supplier tag.

 

 

 

The final test is using the Explorer view to drag and drop a selection of documents into the library. Browse to the Microsoft folder within the library and then select Open in explorer view from the ribbon. Using another Explorer window, select a group of documents and drag them into the Microsoft folder in the Explorer view, dropping them into SharePoint. Once complete, return to the SharePoint library in your browser and refresh the view. You should see all of the documents in the Microsoft folder tagged properly with Microsoft.

 

Removing the folders

One area that causes a lot of controversy in SharePoint is the use of folders in document libraries. This method obviously requires folders for the initial upload. However, if you are in the no-folders camp, you can quite easily move all of those freshly uploaded documents out of the source folder into the root of the document library without losing the metadata values on the documents. Just open the library in Explorer view and move the documents to the root.

 

 

Once the next crawl runs, your documents will be visible in search properly tagged.

How does it work?

Under the hood, SharePoint registers a synchronous event handler that applies the tags you select as a document is uploaded, so you should bear this in mind if you're doing anything in your own event handlers or with workflows. Also, consider dragging and dropping files in batches, as SharePoint will trigger the event handler for every document. Once the last default column value or custom location is removed, SharePoint unregisters the event handler.
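If you script the copy rather than dragging by hand, the batching advice is easy to follow. The Python sketch below splits a list of files into fixed-size groups so that each group can be copied and checked before the next one starts. The batches function is a made-up helper and the batch size is arbitrary; pick whatever your server copes with comfortably.

```python
from itertools import islice


def batches(items, size):
    """Yield successive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


if __name__ == "__main__":
    files = [f"document{n}.pdf" for n in range(1, 8)]
    for number, batch in enumerate(batches(files, 3), start=1):
        # Copy this batch into the library folder here, then check the tags.
        print(number, batch)
```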
