Co-occurrence Approach to an Item Based Recommender Update

In a previous post I talked about a Co-occurrence Approach to an Item Based Recommender that utilized the Math.Net Numerics library. Recently the Math.Net Numerics library was updated to version 2.3.0. With this version of the library I was able to read the Sparse Matrix entries more efficiently, and I have updated the code to reflect these library changes:

https://code.msdn.microsoft.com/Co-occurrence-Approach-to-57027db7

The new Math.Net Numerics library changes were around the storage of the Vector and Matrix elements. As a result, I was able to access the storage directly and use the Compressed Sparse Row (CSR) matrix format to access the Sparse Matrix elements more efficiently.

The original code that accessed the elements of the Sparse Matrix was a simple row/column traversal, which visited every column of each row, including the empty ones:

let getQueue (products:int array) =         
    // Define the priority queue and lookup table
    let queue = PriorityQueue(coMatrix.ColumnCount)
    let lookup = HashSet(products)

    // Add the items into a priority queue
    products
    |> Array.iter (fun item ->
        let itemIdx = item - offset
        if itemIdx >= 0 && itemIdx < coMatrix.ColumnCount then
            seq {
                for idx = 0 to (coMatrix.ColumnCount - 1) do
                    let productIdx = idx + offset
                    let item = coMatrix.[itemIdx, idx]
                    if (not (lookup.Contains(productIdx))) && (item > 0.0) then
                        yield KeyValuePair(item, productIdx)
            }
            |> queue.Merge)
    // Return the queue
    queue

Now that the storage elements are accessible, I was able to access just the sparse element values more efficiently:

products
|> Array.iter (fun item ->
    let itemIdx = item - offset
    let sparse = coMatrix.Storage :?> SparseCompressedRowMatrixStorage<double>
    let last = sparse.RowPointers.Length - 1
    
    if itemIdx >= 0 && itemIdx <= last then
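        // Row i occupies indices RowPointers.[i] .. RowPointers.[i + 1] - 1 of Values.
        // The last row has no following pointer, so it runs to the end of the
        // stored values (assumes the storage's ValueCount field).
        let (startI, endI) =
            if itemIdx = last then
                (sparse.RowPointers.[itemIdx], sparse.ValueCount - 1)
            else
                (sparse.RowPointers.[itemIdx], sparse.RowPointers.[itemIdx + 1] - 1)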
        seq {
            for idx = startI to endI do
                let productIdx = sparse.ColumnIndices.[idx] + offset
                let item = sparse.Values.[idx]
                if (not (lookup.Contains(productIdx))) && (item > 0.0) then
                    yield KeyValuePair(item, productIdx)
        }
        |> queue.Merge)
// Return the queue
queue

In the new version of the code, the Values array provides access to the underlying non-empty values. The RowPointers array holds the index into Values at which each row starts. Finally, the ColumnIndices array holds the column index corresponding to each value.
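To make the relationship between the three arrays concrete, here is a minimal sketch that uses plain F# arrays rather than the Math.Net storage class. The matrix contents and the `rowEntries` helper are hypothetical, made up purely for illustration. The sketch assumes, as the code above implies, that the row pointer array has one entry per row, so the end of the last row has to come from the count of stored values.

```fsharp
// Minimal CSR sketch using plain arrays (no Math.Net dependency).
// Dense matrix being represented (3 rows x 4 columns):
//   row 0: [ 0.0; 2.0; 0.0; 1.0 ]
//   row 1: [ 0.0; 0.0; 0.0; 0.0 ]
//   row 2: [ 3.0; 0.0; 4.0; 0.0 ]
let values        = [| 2.0; 1.0; 3.0; 4.0 |]  // non-empty values, stored row by row
let columnIndices = [| 1; 3; 0; 2 |]          // column of each stored value
let rowPointers   = [| 0; 2; 2 |]             // index into values where each row starts
let valueCount    = values.Length             // marks the end of the last row

// Hypothetical helper: the (column, value) pairs stored for a single row.
let rowEntries (row: int) =
    let startI = rowPointers.[row]
    // Assumption: one pointer per row, so the last row ends at valueCount - 1.
    let endI =
        if row = rowPointers.Length - 1 then valueCount - 1
        else rowPointers.[row + 1] - 1
    // An empty row gives endI < startI, so the range below yields nothing.
    [ for idx in startI .. endI -> (columnIndices.[idx], values.[idx]) ]

printfn "%A" (rowEntries 0)
printfn "%A" (rowEntries 1)
printfn "%A" (rowEntries 2)
```

Here `rowEntries 0` yields the pairs (1, 2.0) and (3, 1.0), `rowEntries 1` yields an empty list, and `rowEntries 2` yields (0, 3.0) and (2, 4.0). Only the stored entries are visited, which is where the saving over the full column scan comes from.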

Other than this change, all other aspects of the library’s usage were effectively unchanged. This includes the MapReduce code (postings can be found here), as it uses a collection of Vector types. I did, however, update the job submission scripts.