How to decompile Compiled HTML Help (.CHM) files and extract information using PowerShell

What is Compiled HTML Help (.CHM)?

Microsoft Compiled HTML Help is a Microsoft proprietary online help format, consisting of a collection of HTML pages, an index and other navigation tools. The files are compressed and deployed in a binary format with the extension .CHM, for Compiled HTML. The format is often used for software documentation, for example the help files of the Sysinternals tools.

How to decompile HTML help

Today a friend and I were looking for a way to decompile .chm files into HTML and then parse the HTML DOM to extract some information. After some research I found that HH.exe, a command-line utility that ships with the Windows operating system, can decompile .CHM files to HTML when called with its -decompile option.
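For reference, the underlying call can be sketched directly, without any wrapper. The paths below are hypothetical placeholders, and they are assumed to contain no spaces because the argument string is passed unquoted. `-Wait` is used so that the listing only runs after hh.exe has exited.

```powershell
# Hypothetical paths, used for illustration only
$chmFile     = 'C:\Temp\Sample.chm'
$destination = 'C:\Temp\Decompiled'
$hhExe       = Join-Path $env:windir 'hh.exe'

# Create the output folder if it does not exist yet
if (-not (Test-Path $destination)) {
    New-Item -ItemType Directory -Path $destination | Out-Null
}

if (Test-Path $chmFile) {
    # Argument order: -decompile <destination folder> <chm file>
    Start-Process -FilePath $hhExe -ArgumentList "-decompile $destination $chmFile" -Wait
    # List what was extracted
    Get-ChildItem -Path $destination -Recurse | Select-Object FullName
}
else {
    Write-Warning "CHM file not found: $chmFile"
}
```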

So I wrapped the command in a PowerShell function, like below:

<#
 PURPOSE : Script utilizes HH.exe to decompile the Compiled HTML Help file (.CHM)
 AUTHOR : Prateek Singh
 BLOG : http://Geekeefy.wordpress.com
#>
Function Get-DecompiledHTMLHelp
{
    [CmdletBinding()]
    param(
        [String] $Destination,
        [String] $Filename
    )

    # hh.exe lives in the Windows directory
    $EXE = Join-Path $env:windir 'hh.exe'

    If (-not (Test-Path $Destination))
    {
        "Destination folder doesn't exist"
    }
    elseIf (-not (Test-Path $Filename))
    {
        "Target .chm file not found, please make sure you're entering the full path and file name"
    }
    else
    {
        # -Wait blocks until hh.exe has exited, so the counts below see the extracted content
        Start-Process -FilePath $EXE -ArgumentList "-decompile $Destination $Filename" -Wait

        # Count the extracted folders and files; @() guarantees a count even for zero or one item
        $Items = @(Get-ChildItem $Destination -Recurse)
        $FolderCount = @($Items | Where-Object { $_.PSIsContainer }).Count
        $FileCount = @($Items | Where-Object { -not $_.PSIsContainer }).Count

        Write-Host "Decompiled into $FolderCount Folders and $FileCount Files to Destination $Destination" -ForegroundColor Yellow
    }
}

Pass the function the path to a Compiled HTML Help file (.CHM) and a destination folder for the decompiled content. It decompiles the file and saves the content at the target destination, as in the following image.

Extracting information from HTML files using HTML <tags>

To extract information from the HTML files, use the function Create-HTMLDomFromFile to create a DOM structure from the HTML content and pull the text residing under a specific HTML tag, like below
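As a rough illustration of this idea (not the Create-HTMLDomFromFile function itself), the sketch below loads an HTML file into the `HTMLFile` COM object on Windows and returns the text of every element with a given tag name. The function name `Get-HtmlTagText`, its parameters and the example path are assumptions made for this example.

```powershell
# Hypothetical helper, not part of the original script
function Get-HtmlTagText {
    param(
        [Parameter(Mandatory = $true)] [string] $Path,
        [Parameter(Mandatory = $true)] [string] $TagName
    )

    # Read the whole file as a single string (requires PowerShell 3.0 or later)
    $source = Get-Content -Path $Path -Raw

    # HTMLFile is the MSHTML document COM object available on Windows
    $dom = New-Object -ComObject 'HTMLFile'
    try {
        $dom.IHTMLDocument2_write($source)
    }
    catch {
        # Fallback for systems where IHTMLDocument2_write is not exposed
        $dom.write([System.Text.Encoding]::Unicode.GetBytes($source))
    }

    # Emit the trimmed inner text of each matching element
    foreach ($element in $dom.getElementsByTagName($TagName)) {
        $text = $element.innerText
        if ($text) { $text.Trim() }
    }
}

# Example call with a hypothetical path to a decompiled page:
# Get-HtmlTagText -Path 'C:\Temp\Decompiled\index.htm' -TagName 'h1'
```

Parsing through a DOM rather than with regular expressions keeps the extraction tolerant of attribute order and nested markup inside the tag.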