Extensions Made Easy v1.12: Fuzzy searching

Once you’ve grabbed EME v1.12.1 or higher, you can explore a new functionality that I ported from a client project to the extension called Fuzzy searching…  Although created months ago, published weeks ago, I never found the time to blog about this gimmick until minutes ago.  Time to post a small example, in VB.Net

What is Fuzzy searching?

Suppose there’s a patient tracker application, written in LightSwitch.  When I’m seeing the doctor, he has to enter my exact family name, or part of it, to find my record between the many Patient entities.  Chances are, I’m seeing the doctor because I have a sour throat or allergic reaction on my mouth, and I’m not in the mood for spelling out my last name, again (I ALWAYS have to spell out my last name… Well, about 99% of the cases anyways, my wife is called Kundry Annys, she has to spell it out in 100% of all cases)…  If the patient tracker application developer implemented fuzzy searching, he could make the doctor and his/her patients lives easier by implementing fuzzy searching…

Fuzzy searching is a technique where searching a list of items or entities is not done based on exact string match, but on “partial” or “metaphonic” match.  Basically, if the user enters a search term like “Hagen”, it’s convenient in some very particular scenarios to return all entities with a property that sounds like “Hagen”, including “Haeghen”, “Haegen” and “Hägen”.

Although this isn’t something you’d want to do for each search screen, some user scenarios definitely justify the use of a Fuzzy search.  Take my name for example: “Jan Van der Haegen”.  Although it might sound like an exotic name to the non-Dutch speaking audience, my name is literally translated as “John of the Hedge”.  You can imagine that this is quite a common family name in Belgium and the Netherlands, and thus, comes in a variety of notations across different families, from “Haägen” to “Verhaeghe” to “Van der Haegen” to…

Setting up a sample application

To test this fuzzy searching implementation, create a new LightSwitch application.

Note to self: Application798? Really? Get a life!

In the application, create a new entity called Patient.

As you might expect, the patient entity will contain some common fields like FamilyName, FirstName, and a computed field called FullName, for displaying purposes, which is computed as:

Namespace LightSwitchApplication

    Public Class Patient

        Private Sub FullName_Compute(ByRef result As String)
            ' Set result to the desired field value
            result = Me.FamilyName + ", " + Me.FirstName
        End Sub
    End Class

End Namespace

Making the entities fuzzy

To speed up the fuzzy searching, we won’t loop over the entities during the actual search.  Instead, we’ll add an extra property to our Patient entity, which will never be displayed except for this demo, where we store some “fuzzy” version of our entity.  When the end-user hits search, we’ll make the search term fuzzy as well, and look for exact matches between the fuzzy version of the search term, and the fuzzy version of our entity.

Add an extra property to the Patient entity called “FuzzyName”.  Make sure the maximum length is high enough to contain a fuzzy version (512 characters will do since our FirstName and FamilyName properties are 255 characters each).

This would make a valid candidate to be a computed field, but since computed fields aren’t stored (they are computed on the tier wherever they are called), we’ll “manually” keep this field in sync with the other properties on save (both insert and update), by writing some code (from the Write Code dropdown).

The code we’ll need to add is this:

Imports ExtensionsMadeEasy.Utilities.Extensions.StringExtensions

Namespace LightSwitchApplication
    Public Class ApplicationDataService

        Private Sub Patients_Updating(entity As Patient)
            entity.FuzzyName = entity.FullName.MakeFuzzy()
        End Sub

        Private Sub Patients_Inserting(entity As Patient)
            entity.FuzzyName = entity.FullName.MakeFuzzy()
        End Sub
    End Class

End Namespace

The imports statement at the top (using directive in c#) makes sure you can call an extension method called .MakeFuzzy() on any string.

I added a new screen (Lists and Details Screen template) to show the Patient entities, and in the list I’m showing both the FullName computed property and the FuzzyName property.

Again, this is done for demo purposes only, you’d normally never display this field to the end-user.

What we have done so far, will result in a behavior as in the screenshot below: for each entity, a value is stored that contains the FullName, but without vowels, diacritics, lower case letters, non-alphabetic characters, and with some special attention to how consonants are pronounced (for example: both “Haeghen” and “Haägen” will be stored as “HGN”).

In case you are wondering, the “MakeFuzzy” .Net implementation (source code) is based on this SQL implementation by Diederik Krols.  It’s supposedly Dutch specific, but I found it to work for English as well.  If you disagree, feel free to export a better algorithm (just export a class from your common project that implements  IMetaphonicStringValueConverter), or better yet: send it to me and I’ll gladly include your locale in EME.

However, this doesn’t solve anything yet.  If I misspell my name as “hagen”, the search result list is still empty…

Making the search term fuzzy

The last step, is to also grab the search term that is used in the list (or grid) on the screen, and replace it with it’s fuzzy version before it hits the server, so it can be compared to our fuzzy entities…

The bad news is that to do this, you must subscribe to the NotifyPropertyChanged event on the screens IVisualCollection, find the SearchTerms property on the IScreenCollectionPropertyLoader, and swap out the SearchTerms for their fuzzy counterparts, and be careful about threading in the process.  Thanks to Justin Anderson, for helping me get access to the “Search Pipeline”.

The good news, is that you can implement this from any screen, in the Screen_Created method (from the “Write Code” dropdown), as a one-liner:

Namespace LightSwitchApplication

    Public Class PatientsListDetail

        Private Sub PatientsListDetail_Created()
            ' Write your code here.
            ExtensionsMadeEasy.Presentation.Screens.ScreenCollectionFuzzySearch.MakeFuzzy(
                Me.Details.Properties.Patients)
        End Sub
    End Class

End Namespace

 

The result is that whenever the end-user (our dear doctor) hears my name, he/she can enter “Haagen”, “Haägen”, “Haaaaaaeeeeeeeeeghen”, …, our implementation will replace it with “HGN”, and match a whole set of records that sound like “Hagen” (and thus also have “HGN” in their FuzzyName property).

Succes! The name-spelling-era is finally over!

Advertisements

2 thoughts on “Extensions Made Easy v1.12: Fuzzy searching

  1. Pingback: Windows Azure and Cloud Computing Posts for 5/7/2012+ - Windows Azure Blog

  2. Pingback: Beth Massi - Sharing the goodness

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s