Partilhar via


"Linq to Memory": another old idea

One of my readers asked if I had any particular plans around memory density for Visual Studio to go with my last posting and indeed I had one thought that I considered a very powerful way to create data structures that were more likely to be scalable.  Like most of my ideas it’s a very simple thought that I think is still effective, and I still give the same advice because I’m boring like that.

Basically, it’s this:  rather than design your data structures the way they taught you in your CS2xx class, design them like you were going to store them in a SQL database.  In fact, you are likely to be well served by using a schema designer.  Then store them exactly like that in memory.  RAM is the new disk.  Cache is the new RAM.

In fact, I suggested at that time that we write a thing I called “Linq To Memory” – a stark contrast from “Linq to Objects” to help facilitate this.  Basically Linq to Memory was a hypotheticial thing that would be more like an in-memory-database with “tables” that were based on dense structures like b-tree and such but no query language other than Linq needed.

Why?

Because these choices give you an “automatic” way to think about isolation, locking, multiple readers, and even multiple writers if needed.  And they do so comparatively economically.  And if you include the compiled query type options that are present in Linq to SQL or Linq to Entities then you can get something very efficient for read access.

Note that this design actively discourages the forest of pointers that you normally get in OOP and yet you get a nice object model to work with (like Linq to SQL gives you) with the slices you are actually working with hydrated into objects and the other parts densely stored.

This metaphor easily supports multiple ACID choices with as much or as little durability and isolation as you want.

It also facilitates things like 64 bit systems because you do not have to pay a 64 bit pointer or five when a smaller or variable sized index would do the job.  Yet you can have huge amounts of data if you need them.

Natural keys facilitate locality as do typical database style indices.

One time overhead for creating your compiled queries to access the data is amortized over the life of the program and saves due to compactness continue to yield benefits forever.  Lack of pointers in the primary storage also reduces garbage collection costs massively.

MANY systems are well suited for this choice.  And as a primary storage it provides both clarity and economy.

I haven’t seen anything like this but it’s totally buildable, I knew where to find the back-end for it in-house.  It's good not only for VS but for many applications with non-trivial data storage requirements.

Comments

  • Anonymous
    January 02, 2015
    That's interesting because right now I am trying to build such structures and a whole transactional engine on them. I chose this way exactly for the reasons you mentioned.

  • Anonymous
    January 02, 2015
    It's a solid choice

  • Anonymous
    January 02, 2015
    Isn't it amazing how many concurrency problems are well solved by relational databases? I sometimes use the implementation details of SQL Server as ideas to build custom solutions in C#. Especially the capability to snapshot data structures (snapshot isolation in SQL Server) is so powerful.

  • Anonymous
    January 02, 2015
    To extend this train of thought a bit: I suspect that the NoSQL kids just have no idea what they are missing. They don't know how many hard problems have been solved already and are now for them to solve as well. This statement goes for the users of NoSQL products as well as for the product designers. Many blunders are being made in this space by both parties.

  • Anonymous
    January 04, 2015
    This sounds nice but if you do repeated queries you get many temp objects back which are again a burden for the GC. With object reuse and streaming you can solve that but the system needs to be carefully designed so that you do not have your raw data and your deserialized objects in memory costing you more than before.

  • Anonymous
    January 04, 2015
    The key is for the temp objects to not make it to Gen2

  • Anonymous
    January 04, 2015
    Also the loose objects have to be a small slice of the total.  Very small.

  • Anonymous
    January 04, 2015
    You should take a look at "Simple Binary Encoding", there's a C# port of the original Java code weareadaptive.com/.../sbe-1. They use some of the ideas you are talking about, in particular data is densely packed and no GC is needed/caused when accessing objects, i.e. no temp-objects are created.

  • Anonymous
    January 04, 2015
    Interesting, how does is different from data tables? I assume data tables would have more overhead and they're not common there session based.

  • Anonymous
    January 05, 2015
    There were several good starting points for L2M but it never went anywhere