PowerShell: Add MSG file metadata to a SharePoint list
Introduction
SharePoint has an out-of-box feature to extract metadata such as To, From, CC and Subject from Exchange's EML file and save it in a column, however, SharePoint cannot do the same for MSG files hence this article describes how to use power shell to extract metadata from MSG files and update a SharePoint list.
Setting the scene
Consider a scenario where you have several MSG files that you want to store in a SharePoint document library and in the process extract metadata from it. The MSG files are in a document library and a list will have the metadata and somehow a relationship is built between them.
Requirements
1. PowerShell with SharePoint2010 or 2013 module
2. Notepad to edit the PS1 file
3. Temporary location to extract the Exchange MSG files
4. An open source application to unzip MSG files. In this example we are using 7-zip.
5. A SharePoint document library and list.
Document Library and List
A document library called 'EmailDocs' currently exists with several MSG files present there. We also have a custom list called 'EmailDocs2' with new columns added. Below are screenshots of both EmailDocs and EmailDocs2's settings.
EmailDocs
https://kzpi9q.dm2301.livefilestore.com/y2pwyp5Dqm-LHfF3TDzY5gV5ur6gy2JBcWNWRX2MVB46MgfcKwR-gsXBfSVUjG8xhe2cEtkQW_ECTc6LyDdqNE9RxbSmaURKNpVltT2RsahfHFdseljclqK6BRy31N-n0wf8xBJG6WLLR2WEdei9Xc4Kg/Capture.PNG?psid=1
Below is a screenshot of the Subject column. If you have created the EmailDocs with the Content Type as Document the you should have a 'Copy Source' column available.
https://kzpi9q.dm2301.livefilestore.com/y2p3anpYF9RmG7OdpTajwaGYGW4-2zpq47_qvWc9Gk8aUry1CTke9VNWtfAGjI7HHAYaKWZ7C4idAEq-QEqccxV4osqDfR7BWQ87Nrivu7EyKZ7cTlj1LvAGKrBRL06e36NMHyjYFI70T5N_R7_jwFvwQ/Capture2.PNG?psid=1
EmailDocs2
https://kzpi9q.dm2301.livefilestore.com/y2po_iat0hsIPAD37nxnrgja5R0RU4fTK6N2-CXkjamjh-TlGpKXl_O_AILO5_l4aqOKbCWg0gY0Sx8EL3plbI0Km79h9IwmcHxhlHIjcB2YQPcjecH_rsHmya1uPvSX6kaAdUnmp5QkSG5KPO0bmeoOw/Capture3.PNG?psid=1
The PowerShell scripts
There are two scripts, one is to download the MSG files from the document library and second is to extract the metadata and add them to the list. You can also CLICK HERE to down the ZIP file that has both the scripts combined.
Download the MSG files
In the below script, add the destination folder location, URL of your web application and complete URL of your document library.
######################## Start Variables ######################## $destination = "C:\Temp\scripts\msgextract\msgextract\msg" $webUrl = "http://sp13test.christianfamily.biz/" $listUrl = "http://sp13test.christianfamily.biz/EmailDocs/" ##############################################################
$web = Get-SPWeb -Identity $webUrl $list = $web.GetList($listUrl)
function ProcessFolder { param($folderUrl) $folder = $web.GetFolder($folderUrl) foreach ($file in $folder.Files) { #Ensure destination directory $destinationfolder = $destination if (!(Test-Path -path $destinationfolder)) { $dest = New-Item $destinationfolder -type directory } #Download file $binary = $file.OpenBinary() $stream = New-Object System.IO.FileStream($destinationfolder + "/" + $file.Name), Create $writer = New-Object System.IO.BinaryWriter($stream) $writer.write($binary) $writer.Close() } }
#Download root files ProcessFolder($list.RootFolder.Url) #Download files in folders foreach ($folder in $list.Folders) { ProcessFolder($folder.Url) }
Extract the metadata
For the script to run successfully the below three separate folders are created. They are, 1. MSG - This is where the MSG files are temporarily downloaded. 2. BIN - This is where the 7-Zip application in saved. 3. MSGEXTRACTED - This is where the MSG files are extracted. In order for this script to be reusable, this folder is created at the beginning and then deleted at the end.
#msgextract.ps1 #Version 1 by Daniel Christian
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue
#date $date = Get-Date -format F
#Set directories $bin = "C:\Temp\scripts\msgextract\msgextract\bin" $BaseDir = "C:\Temp\scripts\msgextract\msgextract" $CapDir = $hst + "_" + $date $msgDir = $BaseDir + "\msg\"
# Destination site collection $WebURL = "http://sp13test.christianfamily.biz"
# Destination list name $listName = "EmailDocs2"
#Get the SPWeb object and save it to a variable $web = Get-SPWeb -identity $WebURL
#Get the SPList object to retrieve the list $list = $web.Lists[$listName]
#Get all items in this list and save them to a variable $items = $list.items
#Array to Hold Result - PSObjects $ListItemCollection = @()
#Add a new folder name $foldernumber = 0
#7zip $7z = "C:\Temp\scripts\msgextract\msgextract\bin\7z.exe"
New-Item -Path C:\Temp\scripts\msgextract\msgextract\extractedmsg -ItemType directory
Function meta_extract($_msg){ New-Item -Path C:\Temp\scripts\msgextract\msgextract\extractedmsg\$foldernumber\ -ItemType directory $OutputDir = $BaseDir + "\extractedmsg\" + $foldernumber &$7z x $_msg -oC:\Temp\scripts\msgextract\msgextract\extractedmsg\$foldernumber\ |out-null }
Function fileSorter{ $msg_filters = @( @{ptrn='0037'; class="Subject"}, @{ptrn='0070'; class="Topic"}, @{ptrn='0044'; class="Rcvd repr name"}, @{ptrn='0C1A'; class="Sender name"}, @{ptrn='0E03'; class="CC"}, @{ptrn='0E04'; class="To"}, @{ptrn='1046'; class="From"} )
$data = gci C:\Temp\scripts\msgextract\msgextract\extractedmsg\$foldernumber
#Adding items to the list $newItem = New-Object PSObject $newItem = $list.items.Add()
foreach($f in $data){ foreach($c in $msg_filters){ if ($f -match $c.ptrn){ $OutName = "C:\Temp\scripts\msgextract\msgextract\extractedmsg\$foldernumber" + $c.class + ".txt" $msgdata = gc C:\Temp\scripts\msgextract\msgextract\extractedmsg\$foldernumber\$f|out-file $outName (gc $outName) -replace "`0", "" | sc $outName $tmphtml = "<div><msg>" + $c.class + "</msg>" $tmpdata = gc $outName $ListItemCollection += $tmpdata $tmphtml = "<p>" + $tmpdata + "</p></div>"
#Adding items to the list if ($c.class -match "subject"){ $newItem["Subject"] = $tmpdata }
if ($c.class -match "Rcvd repr name"){ $newItem["To"] = $tmpdata }
if ($c.class -match "CC"){ $newItem["CC"] = $tmpdata }
if ($c.class -match "sender name"){ $newItem["From"] = $tmpdata }
} } } $newItem["Topic"] = $_.Name $newItem.Update() Remove-Item -Path C:\Temp\scripts\msgextract\msgextract\extractedmsg\$foldernumber -recurse }
Function Main{ $files = Get-ChildItem $msgDir\*.msg $files|foreach{ meta_extract $_ Write-host $_.Name.replace(".msg","").replace("FW ","").replace("Fw ","").replace("RE ","").replace("Re ","") -foregroundcolor black -backgroundcolor yellow Write-host "Metadata Extracted!" -foregroundcolor red -backgroundcolor yellow fileSorter $_ $foldernumber += 1 } Remove-Item -Path C:\Temp\scripts\msgextract\msgextract\extractedmsg -recurse Remove-Item -Path C:\Temp\scripts\msgextract\msgextract\msg\*.msg -recurse } Main |
Video
Take a look at the below video which walks you through the different steps of the above two scripts. This video also demonstrates the difference between EML and MSG files:
(Click Here to view a high resolution view)
Conclusion
You can take the above method a step further to download only the new MSG files that have been added, extract the metadata from it and update the list. This can very easily be done by adding a Yes/No column with default No and calling it something similar to 'DataExtracted' such that after the data is extracted the script will change the No to Yes. Also, you will have to add both the scripts to a Task Schedule on the server to run as often as you would like it.