Taming Uncertainty
Sahil's Notepad
Locality Sensitive Hashing (LSH) and Min Hash
[Indyk-Motwani’98] Many distance-related questions (nearest neighbor, closest x, ...) can be...
Date: 06/11/2008
Set Similarity and Min Hash
Given two sets S1, S2, find similarity(S1, S2), based on Hamming distance (not Euclidean). Jaccard...
Date: 06/10/2008
Information Retrieval & Search - Basic IR Models
Our focus in the database world has primarily been on retrieving information from a structured...
Date: 03/05/2008
Information Theory (1) - The Science of Communication
IT is a beautiful sub-field of CS with applications across the gamut of scientific fields: coding...
Date: 02/21/2008
Random Sampling over Joins
Source: On Random Sampling over Joins. Surajit Chaudhuri, Rajeev Motwani, Vivek Narasayya, SIGMOD...
Date: 02/11/2008
Converting Between Random Sampling Methods
Sampling an f fraction out of n records. Sampling with replacement: the sample is a multi-set of fn...
Date: 02/05/2008
Reservoir Sampling
A simple random sampling strategy to produce a sample without replacement from a stream of data -...
Date: 02/05/2008
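
For a quick feel of the reservoir sampling idea, here is a minimal Python sketch of the textbook single-pass version (often called Algorithm R). It is an illustration under assumptions, not necessarily the variant described in the post: the function name reservoir_sample and the parameters k and rng are hypothetical names chosen for this example.

```python
import random

def reservoir_sample(stream, k, rng=None):
    """Return up to k items drawn without replacement from an iterable.

    Hypothetical helper for illustration; uses the classic one-pass scheme.
    """
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Pick a slot in [0, i]; the new item is kept with probability k/(i+1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir

sample = reservoir_sample(range(1000), 10)
```

The stream is read once and only k items are held in memory, which is why this approach suits data whose total length is not known in advance.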