Sdílet prostřednictvím


Fuzzy Lookup Add-In for Excel

Data cleaning is often a big challenge when working with textual data. The Fuzzy Lookup Add-In for Excel is a new tool from Microsoft Research and BI Labs that helps with the problem of identifying and matching textually similar string data in Excel. It is robust to spelling mistakes, synonyms, missing or added words and a number of other data quality problems frequently encountered in the real world. It has support for most languages and works well across a wide variety of data domains. Common uses include cleaning up lists of names, addresses, products or other entity descriptions which contain fuzzy duplicates. It can also be used to fuzzy join two different tables together. For instance, you might clean and augment a table of dirty city, state data with a zip code by matching it against a clean reference table of city, state and zip codes. Give it a whirl and let us know how it works for your data!

Comments

  • Anonymous
    April 29, 2011
    An updated version of the add-in has just been released. It fixes a bug that was causing the tool to fail on systems where the default numeric format for decimal values is configured to use something other than "." as the decimal separator.

  • Anonymous
    April 29, 2011
    Can you use the Microsoft.DataIntegration DLL in C#? Are there any examples?Thank youDoug Finkehttp://dougfinke.com/blog

  • Anonymous
    April 29, 2011
    Hi Doug, at present the functionality is only available through Excel. We are looking at ways to make it more widely accessible to developers.

  • Anonymous
    May 02, 2011
    Just downloaded the addin and still get a COM addin error and it won't run.

  • Anonymous
    May 03, 2011
    Hi W.P., make sure that you installed the add-in by running setup.exe and not just just the .msi file; setup.exe installs a few prerequisites that your machine might require. Please send me an email at krisgan@microsoft.com and we can figure out what the problem is.

  • Anonymous
    May 05, 2011
    Hi Kris - Have you looked at FuzzyDupes by Kroll Software?  I've used that in the past.  Unfortunately, I'm only on Excel 2007 and my computer is locked down and prevents me from installing Dot Net 4.  Hope to eventually get a chance to try out your algorithms.

  • Anonymous
    May 09, 2011
    Hi Yilun, thanks for the pointer. For those interested, some papers highlighting the ongoing research and technical details behind fuzzy lookup and other data cleaning technologies can be found here: research.microsoft.com/.../datacleaning

  • Anonymous
    May 13, 2011
    Hi Kris,I can only get it to work very sporadically. A couple of times I have succesfully created lists which it has run on but, for the vast majority of the time, pressing the "Go" button has no effect. I can not establish any pattern as to when it will work or not.It doesn't even work on the Portfolio file supplied with the Add-in.Any ideas?

  • Anonymous
    May 16, 2011
    Hi Paul, someone else reported a similar problem where the GO button was consistently not doing anything. They had originally installed the add-in by launching the .msi instead of setup.exe. When they uninstalled the add-in and then re-installed via setup.exe it started working. You might give that a try. If you still are seeing problems, send me an email at krisgan@microsoft.com and we can debug offline.

  • Anonymous
    May 16, 2011
    Hi Kris,Will email.ThanksPaul

  • Anonymous
    June 14, 2011
    Hi Kris,I think this software will be very helpful for a current project.  I do, however, have Excel 2007.  Is there anyway I can use the fuzzy look-up with '07 Excel?Thank you

  • Anonymous
    June 16, 2011
    Hi Nick, someone reported that they were able to get it to run with Excel 2007. You might give it a try. If you can let us know whether it worked, that would be great.

  • Anonymous
    June 26, 2011
    The comment has been removed

  • Anonymous
    August 25, 2011
    HiI have installed this today; I am having the same "Go button doesn't do anything" problem as noted below.I have uninstalled and reinstalled using setup.exe but this doesn't seem to be having any effect.Help!Thanks.

  • Anonymous
    September 16, 2011
    Installed this plug-in in my 32-bit Excel in Windows 7 x64 environment and it's brilliant.I've only had it on my PC for an afternoon and already it's saved me time and I can see this will save me hours every week. I am delighted.I've also got a 64-bit Excel in Windows 7 x64 environment I use for my big number crunching work, it's seperate because most of my plug-ins are only available in 32-bit, and I hope it works in that too.

  • Anonymous
    March 12, 2012
    How to get this Addin to load?, I'm running Excel 2010 and I can see the Add In under Inactive Applications Addins but when I try to load it I get "Not Loaded. A runtime error occurred during the loading of the Com Add-In.

  • Anonymous
    March 13, 2012
    portfolio example works as advertised.Tried it on a table >200K rows and got a TITLE:Exception from HRESULT: 0x800A03ECBUTTONS:OK

  • Anonymous
    May 09, 2012
    Having issues with the transformations not working. Is there a good walk-through to show how to use these advanced configuration features properly?

  • Anonymous
    June 05, 2012
    Does this Add-In work with Excel 2007 over Windows XP (Spanish)?

  • Anonymous
    August 24, 2012
    Is there a way to access this Addin through Excel VBA?

  • Anonymous
    September 01, 2012
    Hello Kris,Job well done. It also works perfectly well with Excel 2007.However, as Jim Baldwin said, it would be ideal to access it through Excel VBA ...Do you have any plans ?

  • Anonymous
    September 05, 2012
    Hello Everybody,I am performing fuzzy match on 2 tables, one of 12000 rows, the other one with 18000. I match 4 colums into each tab (street, zip, city and name of a prospect).My problem is that my 12GB server is too slow at performing this. After 6h running, Excel frizz.Any idea on how to handle that?Thanks a lot,Adrien

  • Anonymous
    September 15, 2012
    DavidI had the samee problem matching two address files. It was caused by an unrecognised charecter in the data whci was the = sign. Clean the databases and the problem goes away

  • Anonymous
    February 19, 2013
    Can you point me to a White Paper that describes the logic used by this Fuzzy Lookup Add-In

  • Anonymous
    May 09, 2013
    HiAm new to the Fuzzy Lookup functionality. I tested it out on small tables and it seems to work fine. However when I tried applying it to my work where I need to compare a text string (paragraph of more than 1000 words) with a list of other such big paragraphs (say 1000 of them), the tool seems to hang. Does the add-in have such a limitation?

  • Anonymous
    June 12, 2013
    Hi all,I got it working using the setup file instead of the msi file.  Please note I have only excel 2007 (not 2010).  The installation downloaded some extra files from microsoft to make it work and work it did.  For those who had trouble getting the GO button to do anything, make sure your current cell is selected where you want the results to go.  I used the format suggested by the helpful pdf file that comes with it and put the current cell a few rows below one of my tables used in the matching.I've only used it for a quick project and the results were almost perfect... good luck!-JNFoo

  • Anonymous
    June 13, 2013
    HiI installed the Fuzzy Lookup Add-in, and it works great.But I am trying to solve a bit different problem than described in sample file Portfolio.xlsx.I would like to fuzzy compare two texts in each row:I have two different tables – let say table A and table B. But I don’t want to compare the first row from A to all rows of B, but just the first row of A to first row of B. And then do the same for all other rows. I can’t find a way to use this this Add-in in this way…I found some examples how to that with a custom macro but they do not seem robust enough unlike the Fuzzy Lookup Add-in.What I would like to know is, if there is any possibility to do what I need using Fuzzy Lookup Add-in? Can it be customized this way via Configure option? Any suggestions?Thank you for your answer.Regards,Jan

  • Anonymous
    July 04, 2013
    No results are inserted when I press the "Go" button. Do you have any idea what could be wrong?Office Standard 14.0.6112.5000 64-bit GermanWindows 7 Professional, 64-bit German

  • Anonymous
    July 09, 2013
    The problem I reported earlier has been resolved after uninstalling the add-in and installing it again.If you face the same problem, uninstall the add-in Thenand install it again by running setup.exe. Do not run the .msi file.

  • Anonymous
    November 05, 2013
    The comment has been removed

  • Anonymous
    March 13, 2014
    The comment has been removed

  • Anonymous
    March 14, 2014
    Hi,I installed de addin al ok, but de vlookup and hlookup traditional functions desappeared from my function list.Any idea?

  • Anonymous
    March 28, 2014
    Hi,Downloaded the fuzzy lookup addon from here : www.microsoft.com/.../details.aspxIt installs fine and i can start the lookup with the addon.After 7 rows of data the lookup always ends with this, this happens also in the example provided with the Fuzzy download.Tried another PC with Win8 and Excel2013 and same results.-Unable to set the NumberFormat property of the Range class-  at System.Dynamic.ComRuntimeHelpers.CheckThrowException(Int32 hresult, ExcepInfo& excepInfo, UInt32 argErr, String message)  at CallSite.Target(Closure , CallSite , ComObject , String )  at CallSite.Target(Closure , CallSite , Object , String )  at FuzzyLookupAddInForExcel.FuzzyJoinControl.MoveNextRow(Int32 outputBatchSize, Int32 r0, Int32 c0, Int32& r, Int32 similarityColumnIndex, Int32 similarityXmlColumnIndex, Int32 outputColumnCount, Worksheet activeWorksheet, Range& outputRange, Array& outputArray, Int32& outputRowCount)  at FuzzyLookupAddInForExcel.FuzzyJoinControl.FuzzyJoin(Worksheet worksheet)Any idea how to make it work?

  • Anonymous
    April 08, 2014
    Does anyone know how to install this Add-In into a terminal server/Citrix environment? The installation only sets it up for the user who runs the setup file.I've tried installing (in Install Mode) it into a common location (c:program files..) and also added appropriate registry entries under HKLMSoftwareMicrosoftOfficeExcelAddins however it does not load. I suspect here are further registry entries in the HKCU...Any help would be greatly appreciated!

  • Anonymous
    April 09, 2014
    Hello folks,I have installed fuzzy look up by using  set up file but the 'Go' button doesn't do anything. I uninstalled and re-installed it twice but still it does not work. I am using Excel professional  plus 2010. Thanks.

  • Anonymous
    April 23, 2014
    The comment has been removed

  • Anonymous
    April 30, 2014
    I am having a similar issue to the comments above where fuzzy lookup after pressing go just spins for a while then crashes excel. It was working when I first DL'd it about a month ago. Now I cannot get it to work at all.I uninstalled and reinstalled using the setup.exe.I am using excel 2013.I have not seen an answer to any further issues beyond uninstalling and reinstalling.Please help as I used this a few weeks ago with fantastic results, and now it inexplicably does not work.Thanks,

  • Anonymous
    May 13, 2014
    Hi,I´m having the same problem as fasterlower above.Quote:It installs fine and i can start the lookup with the addon.After 7 rows of data the lookup always ends with this, this happens also in the example provided with the Fuzzy download.Tried another PC with Win8 and Excel2013 and same results.-Unable to set the NumberFormat property of the Range class- at System.Dynamic.ComRuntimeHelpers.CheckThrowException(Int32 hresult, ExcepInfo& excepInfo, UInt32 argErr, String message) at CallSite.Target(Closure , CallSite , ComObject , String ) at CallSite.Target(Closure , CallSite , Object , String ) at FuzzyLookupAddInForExcel.FuzzyJoinControl.MoveNextRow(Int32 outputBatchSize, Int32 r0, Int32 c0, Int32& r, Int32 similarityColumnIndex, Int32 similarityXmlColumnIndex, Int32 outputColumnCount, Worksheet activeWorksheet, Range& outputRange, Array& outputArray, Int32& outputRowCount) at FuzzyLookupAddInForExcel.FuzzyJoinControl.FuzzyJoin(Worksheet worksheet)Any idea how to make it work?

  • Anonymous
    May 20, 2014
    Do we have any documentation for the Fuzzy Lookup which speaks about the features, limitations or the performance issues.

  • Anonymous
    July 03, 2014
    The "Unable to set the NumberFormat property of the Range class" issue seems to disappear when changing the regional settings of the computer to English (United States). It looks like this tool needs some debugging...

  • Anonymous
    August 04, 2014
    @antonioaThanks! You solved my problem.Both region and format needs to be english (US).

  • Anonymous
    September 25, 2014
    Hi there,I noticed that Fuzzy Lookup doesn't distinquish accents, e.g. similarity for Dom Perignon and Dom Pérignon is 1, while the difference is in letter "é". Is it possible to enable this functionality?Thank you

  • Anonymous
    September 28, 2014
    Fuzzy Lookup installer is corrupt: www.microsoft.com/.../confirmation.aspxMSFT, can you please check your download file!Thank youMatthias

  • Anonymous
    October 16, 2014
    Just downloaded this tool and read the license, it says the "The term of this agreement is until 30/07/2014 (day/month/year), or commercial release of the software, whichever is first", is this going to be extended or a commercial version of the tool made available?

  • Anonymous
    November 06, 2014
    Hi KrisI have a small query regarding the Microsoft Fuzzy Look-up Add-in for Excel. Is it possible to perfrom a duplicate search function with this Add-in on a Single column.To explain it in depth say I have 2 same entries in a column, but they are partically matched with eachother. Ex ABC Company LTD and Company ABC LTD. How can I find both the entries in column with this add-in.Regards,Jagdev

  • Anonymous
    November 14, 2014
    The comment has been removed

  • Anonymous
    November 19, 2014
    The comment has been removed

  • Anonymous
    November 19, 2014
    You have to first install Visual Studio 2010 Tools (using Link below) before Fuzzy will install properlywww.microsoft.com/.../details.aspx

  • Anonymous
    December 04, 2014
    Hi everyboy,I had an issue with Fuzzy Lookup Add-In. When I installed the product, it simply did not load and I had a COM addin error.I managed to fix it as following.When you install the product at the licence agreement step, click "advanced" button and choose "install for all users" instead of "only for me". I don't know why but it worked in my case. Hope it will help.J.

  • Anonymous
    December 16, 2014
    Great add-in that solves a very real problem. I have a question/suggestion - is there a quick way to uncheck all of the output columns in one go? I have a table with a large number of columns and I only want 2 or 3 of them in my output. It is taking a very long time to manually uncheck all of the options I don't want.

  • Anonymous
    December 16, 2014
    Also does anybody have any more detailed instructions on how to perform a search for duplicates within a single column? The instructions in the README simply say "Note that Fuzzy Lookup can also be used to identify matches in a single table by setting the left and right tables to be the same." However this is not giving me the results as expected. It appears to list every entry in the column whether there is a duplicate or not (matching a record to itself perhaps?)

  • Anonymous
    January 06, 2015
    I get different results from the same data set when I run with a Similarity threshold of 0 and .75.Something that should be an obvious match shows up with a .9 similarity threshold when I run it at 0, but shows up with a 0 similarity threshold when I run at .75.This is really concerning--I don't know if I can trust the tool anymore.

  • Anonymous
    February 12, 2015
    If I add an ExactMatch column, it returns non-exact matches.  What gives?

  • Anonymous
    February 13, 2015
    Date colums seems to always output in US date format (mm-dd-YYYY) even though the input is in Euro dd-mm-YYYY. This cause subsequent errors.Is there a way to control the output date format (or simply let it follow the format from the input)?

  • Anonymous
    February 15, 2015
    I've used the previous version so was keen to try 1.3 - especially the new Exact Match option.  However this still doesn't seem to force an exact match?My scenario is 2 input tables with 2 columns each - Company Code and Director Name. I want to exact match on Company Code and only consider fuzzy matches on Director Name when the Company Code matches exactly.  However it returns matches seemingly ignoring the Company Code?

  • Anonymous
    February 16, 2015
    I also has thatUnable to set the NumberFormat property of the Range classI switch Windows locale from Finnish to English in Reginoal settings and it started to work.

  • Anonymous
    March 24, 2015
    Sounds good but the software never installs, lost half-day messing with it and had to give up.

  • Anonymous
    May 05, 2015
    I have two tables with two columns each I need to join. However I would get better results if I could specify that the first word in one table's first column should be specifically matched to the first word in the other table's first column, although the spelling might be inexact.  I can post-filter my results, but that would be less efficient.  How can I modify the XML to do this?

  • Anonymous
    May 10, 2015
    How to configure so that the greatest weight is assigned to a name that contains what I'm matching to?  Test case: WalMart.  If I try to find names in my DB matching that, I will get all sorts of variations of that name that are most similar in length.  However the exact string "WalMart" does exist in my DB in much longer names such as WalMart Store number 123, etc. If I change my test case to "WalMart store" then all those names come back.  Which configuration setting(s) can help me control this so that I can get those names back by only looking for "WalMart"?

  • Anonymous
    May 19, 2015
    I am attempting to run some code to check if the add-in is installed and available but I can't seem to find the correct text that the VBA code can check on. Can anyone help with this? Thanks

  • Anonymous
    May 21, 2015
    When using in 2013 I am sometimes getting an error - Exception of type 'System.OutofMemoryException' was thrown.The 2 datasets I am tryign to match are 115k & 7k long.Thanks

  • Anonymous
    May 28, 2015
    The comment has been removed

  • Anonymous
    July 06, 2015
    I downloaded Fuzzy lookup today and it is not working. The download and intall went fine but when I click to open excel the plug in is working. In fact, I do not get the tab on excel. I am using excel 2010. Any pointers will help.

  • Anonymous
    July 07, 2015
    the solution to my issue was a different program needed to be installed for hte fuzzy lookup to work. this is the program: www.microsoft.com/.../details.aspx  (VSTO) Install fuzzy lookup and then this VSTO. Once you have both downloaded, open excel. You "should" (hopefully) see the fuzzy tab.

  • Anonymous
    July 15, 2015
    In this add-in’s demo worksheet; When I use Portfolio as the Left Table and SP-500 as the Right Table it finds all available matches, When I use SP-500 as the Left Table and Portfolio as the Right Table it does not find all available matches, Why does this happen? Please can you advise me on this issue, why it occurs, how to fix it or how to avoid it?

  • Anonymous
    August 06, 2015
    Hi, is it possible to use Fuzzy Lookup functionality within a VBA code module of an MS Access database by adding a reference to Fuzzy Lookup DLLs? Thanks Bodo

  • Anonymous
    August 10, 2015
    Anyone know the product code i could use to uninstall it?

  • Anonymous
    August 11, 2015
    Is it possible to save the Configuration settings - opening excel resets defaults.

  • Anonymous
    August 20, 2015
    I am trying to run Fuzzy Lookup against two tables to match inexact names, however I keep getting the error message, "Index was out of range. Must be non-negative and less than the size of the collection. Parameter name: index" I have no idea what the error is indicating or how to fix it and I cannot find any user manuals covering Fuzzy Lookup detailed troubleshooting. Any help would be greatly appreciated. Thank you.

  • Anonymous
    August 20, 2015
    After converting two groups of data into tables, setting up the Fuzzy Lookup parameters, and hitting "Go", I get the error, "Indes was out of range. Must be non-negative and less than the size of the collection. Parameter name:index." I have no idea what the error is referring to or how to fix the error and get an answer. Any help would be highly appreciated. Paul

  • Anonymous
    September 15, 2015
    Hi from Germany :-) The license terms contain the following sentence: "You may not test the software in a live operating environment unless Microsoft permits you to do so under another agreement." Am I restricted in the use of this tool at my PC at work? Is my PC at work a live operating environment??? Can someone please explain the meaning of this sentence in different words? Thanks, Oliver

  • Anonymous
    October 11, 2015
    Hi, to match between "prizes" fields it would be interesting to have a numerical algorithm, which does not interpret as a possible match $27.01 as $72.10 but $27.10 or 26.90 (+/- % margin). Any idea to implement this? Thanks in advance! Raúl

  • Anonymous
    October 18, 2015
    The comment has been removed

  • Anonymous
    October 25, 2015
    Are there instructions on using it on one table to find fuzzy duplicates?  It works great for joining tow tables, but I can't figure out how to use it to find duplicates in one table. Thanks

  • Anonymous
    October 30, 2015
    The comment has been removed

  • Anonymous
    February 15, 2016
    I just downloaded this add-in on Office 2016 on a machine running Windows 10. Everything seemed to go fine with the download, but when I ran the fuzzy lookup in the Portfolio sample file, it said, "A column with name 'Company' was not found in the schema table." I looked at both tables, though, and both of them do indeed have a column name "Company," which I am using for my left and right columns. Any ideas?

  • Anonymous
    February 23, 2016
    Seems nobody is able to answer any of the queries.

  • Anonymous
    February 24, 2016
    Hello I have an issue with MS Fuzzy lookup, I cannot see it in the excel ribbon: in fact Excel cannot load the com Addin Error message is not loaded. a runtime error occurred during the loading of the com add-in. I have Excel 2010 32bits, (windows 7 enterprise 64 bits)