

Anybody Wanna Peanut?

A long while ago in a galaxy far, far away, my children made me watch a movie: "The Princess Bride." (Had you going there, didn't I?) In it, two of the main characters are playing a rhyming game when one of them "tires" of the whole rhyming exercise and says, "Stop rhyming and I mean it." To which, of course, the response was:

"The title of this blog."

No really, the actual title of this blog. "Inoun, where are you going with all of this?" Hm, I was thinking I needed a few lines of text to describe how search vectors work.

Vector? Yes, hector a vector. I told my wife yesterday that I was going to be writing about vectors. Her natural and innate response was, "You lost me at vector." Understandable "fur sure". Vectors are actually pretty cool, but over the years, they have gotten a really bad rap. Kind of like the word spam. The original spam. The meat product has suffered ever since.

"You want me to eat what?"

At my house, we spearmint with a lot of things. Yes, spearmint. Everything is a spearmint. Including words. On a whim one day I said to the kids, "Hey, there's a can of spam in the pantry! Let's C what we can do with it!" It wasn't until I actually opened the can, looked at it, and shared my view of the contents with my daughter that she came to the full realization: I actually was serious! Or, as we often say in our home, "are you cereal?" So I fried up a few pieces, read the total fat content to my kids, and made every last solitary one of them actually "try" a piece. It was quite the spearmint in human nature.

And of course we couldn't help but start singing "THE" song. But I digress.

Vectors have always been looked down upon by just about every pre-pubescent alive (since we are all currently taking algebra, you know (notice the present tense here, the use of the term "we", and what that implies)), and this awful taste (algebra, that is) usually sticks with every solitary last one of us, well into, and far beyond, that abyss we call death (that's just a fancy way of saying that few people ever really get over it).

But vectors are really cool! I know, "Inoun, you already said that!"

And we use them all of the time. "Like, you know, like, dude, ..." when, after you ask for directions, some 1 tells you, "Oh, yeah, sure dude, go over, uh..., that way about 30 yards and you will see it." Or when someone at the grocery store says, "Clean up on aisle three."

I know I am overgeneralizing here, but most of the time we just don't realize how very cool vectors really are. And it is usually because we call them, well, something boring, like "directions".

So, Inoun, what does this have to do with search? Well, a "hole" lot, actually. Just like in real life, somewhere deep inside that little teeny tiny brain we call Bing, SharePoint Search, or FAST, a bajillion vector calculations get processed every time you search for something. It might be something, I don't know, so very useful, like, like, ah, I know, "Pictures of Britney Spears!" In my mind, those queries, and others like them that you keep sending "them", rank right up there with the "other" queries, like the great anti-query (a topic for another day).

Think of it! Millions and millions of little vectors come to life. They desire only to make us happy and to find the "correct" answer for "just us" and for all our many questions about the number 42. Then, in the end, they are sadly and completely wasted on pop stars and that great anti-query. But once again, Inoun (you digress).

In its simplest form, a vector tells you how to get from point A to point B. It gives you two things: which way to go (the direction) and how far to go (the magnitude).
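To make that concrete, here is a tiny Python sketch. The door and snack-counter coordinates are made up, and the function names are just illustrative. A vector is really only "how far across, how far up", and from those two numbers you get both the distance and the direction.

```python
import math

def vector_between(a, b):
    """The vector (dx, dy) that takes you from point a to point b."""
    return (b[0] - a[0], b[1] - a[1])

def length(v):
    """How far to go: the magnitude of the vector."""
    return math.hypot(v[0], v[1])

def heading_degrees(v):
    """Which way to go: angle counterclockwise from the positive x-axis."""
    return math.degrees(math.atan2(v[1], v[0]))

# Made-up example: you're standing at the door, (0, 0), and the snack
# counter is 30 yards east and 40 yards north, at (30, 40).
door = (0, 0)
snacks = (30, 40)

v = vector_between(door, snacks)
print(v)                              # (30, 40)
print(length(v))                      # 50.0
print(round(heading_degrees(v), 1))   # 53.1
```

So "go that way about 50 yards" is a vector. It's just missing the "that way" in degrees.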

Oh, so very, very useful for "stuff" in all kinds of situations. But even cooler, this works on words and documents too! Time for an Inoun example!

Quote:

   1        2        3      4       5      6      7     8     9      10  11     12
A  Stop     rhyming  and    I       mean   it     !
B  Anybody  want     a      peanut  !
C  I        do       not    think   that   means  what  you   think  it  means  .
D  I        will     never  turn    to     the    dark  side  !
E  As       you      wish   !
F  Come     back     here   and     fight  like   a     man   !

End quote.

 

So let's say that each of the phrases above is coming into our index from a different document. Each document/URL is represented by a letter, and each word's position is represented by a number. So, for instance, "rhyming" is at A2 and "dark" is at D7. You may also notice that if we count from "mean" at A5, it is three away from "rhyming". And "come" at F1 is four away from "fight" at F5. And if you have a hidden engineer inside of you, or if you "clam" to be one, then you will also notice that "fight" at F5 is also five away from "mean". This time, though, the distance runs straight down the column instead of along the row, which is 90 degrees from the direction we were counting before. Make cents?
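For the hidden engineer, here is a minimal Python sketch of that same grid. It assumes the rows are the six phrases from the quote. Rows are counted from 0 (A = 0, B = 1, ...) and columns from 1 to match the numbering above. The helper names are hypothetical. The 90 degrees comes from the dot product: when two vectors have a dot product of zero, they are at right angles.

```python
import math

# The six phrases from the quote, one "document" per row letter.
phrases = {
    "A": "Stop rhyming and I mean it !",
    "B": "Anybody want a peanut !",
    "C": "I do not think that means what you think it means .",
    "D": "I will never turn to the dark side !",
    "E": "As you wish !",
    "F": "Come back here and fight like a man !",
}
rows = sorted(phrases)  # "A".."F" become row indexes 0..5

def positions(word):
    """Every (row_index, column) where a word appears; columns start at 1."""
    found = []
    for r, letter in enumerate(rows):
        for c, token in enumerate(phrases[letter].split(), start=1):
            if token.lower() == word.lower():
                found.append((r, c))
    return found

def offset(p, q):
    """Vector from position p to position q, as (rows down, columns across)."""
    return (q[0] - p[0], q[1] - p[1])

def angle_between(u, v):
    """Angle in degrees between two vectors, using the dot product."""
    dot = u[0] * v[0] + u[1] * v[1]
    cos = dot / (math.hypot(*u) * math.hypot(*v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))

rhyming = positions("rhyming")[0]   # (0, 2) -> A2
mean = positions("mean")[0]         # (0, 5) -> A5
come = positions("come")[0]         # (5, 1) -> F1
fight = positions("fight")[0]       # (5, 5) -> F5

print(offset(rhyming, mean))   # (0, 3): three across
print(offset(come, fight))     # (0, 4): four across
print(offset(mean, fight))     # (5, 0): five down
print(round(angle_between(offset(rhyming, mean), offset(mean, fight)), 1))  # 90.0
```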

 

But what does any of this have to do with search?  Well, take a look at these.  More Inoun examples!

 

http://www.bing.com/search?q=people+versus+inoun

http://www.bing.com/search?q=%2bsharepoint+hiy

 

When I ran the first query on my machine, yours truly came back as the first result! For the second query, I came back as the third result. So what's the difference? Well, "SharePoint" is a pretty common word (getting commoner every day). And so is the word "people". "Versus" is less common, and "Inoun" is, well, how do I put this, a tree in the woods. And "hiy" actually turns out to be more common than I expected. So here is the real magic. The search engine takes the words that you type in and checks how many documents contain each of them (some vectors). It also counts the number of times each word shows up in each document (more fun with vectors). Then it "ranks", or stacks, the results by how close the words are to each other in each document (more vectors).
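Here is a rough Python sketch of those three counts. It runs over a hypothetical mini-index built from the quote's phrases, lowercased and with punctuation dropped. The function names are made up for illustration. Real engines like Bing, SharePoint Search, and FAST keep far richer indexes than this.

```python
from collections import Counter

# Hypothetical mini-index: the quote's phrases, lowercased, punctuation dropped.
docs = {
    "A": "stop rhyming and i mean it",
    "B": "anybody want a peanut",
    "C": "i do not think that means what you think it means",
    "D": "i will never turn to the dark side",
    "E": "as you wish",
    "F": "come back here and fight like a man",
}
tokens = {name: text.split() for name, text in docs.items()}

def document_frequency(word):
    """How many documents contain the word at least once."""
    return sum(1 for words in tokens.values() if word in words)

def term_frequency(word, doc):
    """How many times the word shows up in one document."""
    return Counter(tokens[doc])[word]

def min_distance(w1, w2, doc):
    """Smallest gap, in word positions, between w1 and w2 in doc (None if either is missing)."""
    p1 = [i for i, t in enumerate(tokens[doc]) if t == w1]
    p2 = [i for i, t in enumerate(tokens[doc]) if t == w2]
    if not p1 or not p2:
        return None
    return min(abs(a - b) for a in p1 for b in p2)

print(document_frequency("i"))            # 3: common, like "SharePoint"
print(document_frequency("peanut"))       # 1: rare, like "Inoun"
print(term_frequency("think", "C"))       # 2
print(min_distance("you", "think", "C"))  # 1: right next to each other
```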

 

Basically, what this means is that a document gets ranked, or stacked, higher when the words you searched for show up often in it and close to each other, but don't show up often in the other documents being compared. How far is Germany from Sweden? How far is SharePoint from Inoun? And it all works because Bing, SharePoint Search, and FAST all know how to work with this cool little thing called vectors.
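One textbook way to turn all of this into actual vectors is TF-IDF weighting plus cosine similarity, the classic vector space model. To be clear, this is a standard technique swapped in for illustration. It is not a description of how Bing, SharePoint Search, or FAST actually rank results, and the four documents are invented. Each document becomes a vector of word weights, and so does the query. The documents that point in nearly the same direction as the query rank highest.

```python
import math
from collections import Counter

# Hypothetical documents, invented for illustration only.
docs = {
    "doc1": "sharepoint search sharepoint people",
    "doc2": "people versus inoun",
    "doc3": "sharepoint people search fast",
    "doc4": "inoun writes about sharepoint search",
}
tokens = {name: text.split() for name, text in docs.items()}
N = len(docs)

# Document frequency: how many documents each word appears in.
df = Counter(word for words in tokens.values() for word in set(words))

def idf(word):
    """Rare words get big weights; unknown words get 0."""
    return math.log(N / df[word]) if df[word] else 0.0

def tfidf_vector(words):
    """Turn a list of words into a sparse vector {word: tf * idf}."""
    counts = Counter(words)
    return {w: c * idf(w) for w, c in counts.items()}

def cosine(u, v):
    """Cosine of the angle between two sparse vectors (1.0 means same direction)."""
    dot = sum(weight * v.get(w, 0.0) for w, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def rank(query):
    """Document names, best match first."""
    q = tfidf_vector(query.split())
    scores = {name: cosine(q, tfidf_vector(words)) for name, words in tokens.items()}
    return sorted(scores, key=scores.get, reverse=True)

print(rank("people versus inoun"))  # ['doc2', 'doc4', 'doc1', 'doc3']
```

Notice that "people" shows up in three of the four documents, so it barely counts. The rare words "versus" and "inoun" do most of the work. That is the tree-in-the-woods effect.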

 

I have the coolest, mostest favorite job in the whole world! And no, you can't have it!

 

My daughter, like every prior generation of pre-pubescents that has graced us with its 'center of the universe' mental state, said to me the other day, "I am never, ever going to use this stuff!" And sadly, so say I, in my current physically post-pubescent but mentally pre-pubescent state. But I am wrong. Vectors are cool!

 

And as my old boss many years ago said, "I always wanted to be an en-ga-neer."

 

"Now I are one."

 

Don't you wish you had paid more attention in class?