A teaser on how OneNote storage and replication work
The other day someone internally was asking how OneNote stores its files and how often a save actually happens. You know: if you were to pull the power cord on your computer, what would you lose and what wouldn't you? Well, Irina Yatsenko from the OneNote Test team wrote up the following to answer the question, and she wanted me to post it for all to see:
Now, I'll describe in more detail what we do in OneNote 2007:
- Internally, all data, from a single paragraph on a page up to a whole notebook, is represented in a graph, which is split into areas we call "graph spaces". This allows us to load and save incrementally, one graph space at a time, so when you open a notebook you see all the section tabs pop up almost immediately even though the pages inside those sections aren't loaded yet. When saving, we can also choose which piece to save rather than saving everything.
- We never save directly to the server hosting the files (even if it's the local machine). First we save into a local cache file. Because the cache is local and OneNote has exclusive access to it, we can guarantee that the save always succeeds (if it doesn't, OneNote will force an exit, because running without a cache means users might lose data, and we think it's better to exit than to lose data). The save into the cache happens every 30 sec or on exit. ([descapa] I have found this to be faster at times, though I am not pulling my power cord out.)
- To propagate the data from the cache back to the original location of the sections, we use a background process called replication (= sync). The sync schedule depends on the actual store: UNC servers and the local machine replicate every 30 sec, but for SharePoint the default is 10 min. If replication fails (e.g. because the machine has lost power), the cache will still have the data and OneNote will try to replicate again after it is restarted.
- The actual mechanics of the incremental save are rather technical. The bottom line is that we have our own binary format and all changes are stored in the form of "revisions", a sort of diff between the current state and the previously saved state. As these revisions grow, OneNote runs an optimization to clean up the revisions and update the main base state.
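To make the bullets above a bit more concrete, here is a minimal Python sketch of the cache-first idea. It is a toy model under stated assumptions, not OneNote's actual code or file format: the names `RevisionStore`, `Notebook`, `save_to_cache`, `replicate`, and `optimize` are hypothetical, the "remote" store is just a dictionary, and only the 30-second and 10-minute intervals come from the description above.

```python
from dataclasses import dataclass, field

# Intervals taken from the description above, in seconds.
CACHE_SAVE_INTERVAL_S = 30
SYNC_INTERVAL_S = {"local": 30, "unc": 30, "sharepoint": 600}


@dataclass
class RevisionStore:
    """Hypothetical store: a base state plus diffs saved since the base."""
    base: dict = field(default_factory=dict)
    revisions: list = field(default_factory=list)

    def save(self, changes: dict) -> None:
        # Incremental save: append only what changed, never rewrite the base.
        if changes:
            self.revisions.append(dict(changes))

    def current(self) -> dict:
        # Current state = base with every revision applied in order.
        state = dict(self.base)
        for revision in self.revisions:
            state.update(revision)
        return state

    def optimize(self) -> None:
        # Fold the accumulated revisions into the base and drop them.
        self.base = self.current()
        self.revisions.clear()


@dataclass
class Notebook:
    """Hypothetical notebook: edits go to the cache first, then replicate."""
    store_kind: str
    cache: RevisionStore = field(default_factory=RevisionStore)
    remote: dict = field(default_factory=dict)
    pending: list = field(default_factory=list)  # revisions not yet replicated

    def save_to_cache(self, changes: dict) -> None:
        self.cache.save(changes)
        if changes:
            self.pending.append(dict(changes))

    def replicate(self, remote_available: bool) -> bool:
        # On failure nothing is lost: pending revisions stay in the cache
        # and the next attempt (e.g. after a restart) sends them again.
        if not remote_available:
            return False
        for revision in self.pending:
            self.remote.update(revision)
        self.pending.clear()
        return True

    def sync_interval(self) -> int:
        return SYNC_INTERVAL_S[self.store_kind]


if __name__ == "__main__":
    nb = Notebook("sharepoint")
    nb.save_to_cache({"page1": "hello"})
    nb.save_to_cache({"page1": "hello world", "page2": "todo"})
    nb.replicate(remote_available=False)  # server unreachable, cache keeps data
    nb.replicate(remote_available=True)   # retry succeeds
    nb.cache.optimize()                   # revisions folded into the base
    print(nb.sync_interval(), nb.remote, nb.cache.base)
```

The point of the sketch is the ordering: every save lands in the local cache, replication is a separate step that can fail and be retried without losing anything, and optimization only changes how the state is stored, not what it contains.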
Hope this clears things up a bit; let me know if you have any questions.
Thanks Irina! So I hope this explains things like why we have a cache (which allows OneNote to go offline, merge changes and more), as well as why our app works the way it does. The storage tech is actually quite complex and innovative; I didn't really appreciate it until I had to deal with other sync technologies that make me choose which copy is the most up-to-date, etc. There is still a lot more going on under the covers, but this is a good overview. If you have more questions, please let us know.
Comments
Anonymous
February 20, 2007
The comment has been removed
Anonymous
February 20, 2007
Dave - As you can see in this blog post: http://blogs.msdn.com/descapa/archive/2006/08/02/686087.aspx with OneNote 2007 you can store all of your notes on a USB drive and sync between two computers.
Anonymous
February 21, 2007
Perhaps the periodic "Optimization of revisions" described above explains my biggest problem with OneNote 2007: sometimes OneNote's disk utilization jumps to constant (hard drive light full on) and stays that way for a couple of hours. During this time, CPU utilization hovers between 90 and 100 percent, although Task Manager claims that OneNote's CPU usage is very low. However, if I terminate OneNote, the disk and CPU usage immediately return to normal. If this optimization really is a possible culprit, I would be interested to know if there is anything I can do to "force" the optimization to be done at a certain time, so that it doesn't happen when I'm trying to take notes in a meeting, for instance.
Anonymous
February 21, 2007
I did not make my point clear. Do just the "deltas" go onto the USB drive? Like: Computer-A's OneNote ----> User makes change ----> delta goes to USB ----> USB plugged into Computer-B ----> Computer-B's OneNote synced. Without this, if your OneNote notebook is larger than the USB drive's capacity, there will be a problem.
Anonymous
February 22, 2007
Dave - No, the whole file is stored on the USB drive (which includes the deltas and the base). If you store your notes on USB then all of your notes will be there, but you can make changes on either computer, and when you plug in the USB drive OneNote will sync the changes to the device. If there are too many deltas then OneNote will optimize the files. Clearer now?
Anonymous
February 22, 2007
Blair - You can look in Tools-->Options under Save and there are some options in there. You can tell OneNote to run all of your optimizations when you click a button, and it should clean everything up. I never had problems with optimization except when I ran the beta release; in RTM I haven't had problems. Here is what I suggest: click the Optimize Now button and let OneNote finish. Then see if you get those errors again. Let us know if this fixes your problem.
Anonymous
February 22, 2007
The comment has been removed
Anonymous
February 24, 2007
The comment has been removed
Anonymous
February 24, 2007
By the way, I should have mentioned that these problems have been experienced using 2007 RTM.
Anonymous
March 01, 2007
Dan, do you know if there's a way to tailor the UNC sync interval? Here's why. I tried running against a non-IIS WebDAV server to share some work with buddies of mine. Despite all my best efforts to configure it correctly, the whole setup is just unstable. So now I've set up VPN access to a Samba share on a personal machine. Suffice it to say, due to cable upload speeds, access to the SMB share is pretty slow. So slow that OneNote ALWAYS indicates that it is syncing with the share. It'd be nice if I could tweak the registry or something to tell it to ping the server only every 10 minutes or so. Is this possible? Evan
Anonymous
March 04, 2007
Evan - I just looked and there are no policies/reg keys for the UNC sync interval, only for SharePoint. If you were to connect via http:// then it would be 10 minutes instead of 30 seconds. How about telling OneNote to work offline and then go back online? You can do this by going to File-->Sync-->Sync Status. You can choose Work Offline and then go online when you are ready to sync. Will this work for you?
Anonymous
March 07, 2007
Dan, that (going offline and back on later) is exactly what I and the others I'm working with are doing for the moment. I just don't have access to a SharePoint service (or IIS WebDAV) and am not too keen on investing in a pay-for service at this point. Of course, every once in a while someone forgets to go online when we're collaborating and has a "doh" moment after wondering why they're not seeing updates. So it would be nice if the OneNote team could consider adding per-notebook sync schedule tailoring regardless of the type of share used. Thanks.
Anonymous
March 07, 2007
Evan - Good feedback... have you thought about having just a simple account with Office Live? I believe they have a free service that will let you do SharePoint over the Internet. Perfect for what you are doing. Maybe this doesn't work for you, but it is a solution. Otherwise, good feedback.
Anonymous
March 08, 2007
I'll take a look into it. I did do a quick search on "free sharepoint" and "free webdav" a while ago but most outfits had ridiculously small disk space offerings. 500MB for the Office Live Basics might do me for a while.
Anonymous
July 30, 2007
Can we change the cache location programmatically?
Anonymous
August 01, 2007
No, you cannot do this with the OneNote API. That value is stored in the registry, so you could modify it via the registry APIs and then restart OneNote so that it uses the new cache location. Hope that helps.
Anonymous
August 02, 2007
Can we change the OneNote cache file path programmatically?
Anonymous
August 02, 2007
Sorry for posting the question again; I can't see the reply you gave.
Anonymous
August 02, 2007
No, you cannot do this with the OneNote API. That value is stored in the registry, so you could modify it via the registry APIs and then restart OneNote so that it uses the new cache location. Hope that helps.
Anonymous
August 02, 2007
Can we get the OneNote cache location using the OneNote API? In other words, I need to get (C:\Documents and Settings\....\OneNoteOfflineCache_Files), but programmatically at run time.
Anonymous
August 02, 2007
Thanks Dan. Can you tell me the registry location where we can find the offline cache files path for OneNote? I just looked in the registry, but only a few paths are available there, like the backup folder path and unfiledNoteSection, under HKEY_CURRENT_USER\software\Microsoft\Office\12.0\OneNote\Options\Paths.
Anonymous
August 03, 2007
It should be in there as well, but you haven't configured the option, so it doesn't appear in the registry: OneNote uses the default value. Please go to Tools-->Options, Save and choose to modify the cache location. Then look for that key in the registry. Now you have the key you can use. Hope this helps.
Anonymous
December 12, 2007
I love getting emails from readers of the blog, it is great to hear from so many of you. Of course I
Anonymous
February 14, 2008
One problem with this optimization scheme that I and many of my fellow students are suffering from is the following: if you are working on one page, say for a 1.5 hr lecture, then more than 45 min into the lecture, optimization kicks in (I believe) and fries your session. This happens consistently for many people. I believe it has to do with "percentage of unused space allowed in files without optimizing", but I am not entirely sure... This problem is particularly bad if you are also recording your lectures, as CPU usage will skyrocket to 100%, making OneNote unusable!
Anonymous
August 21, 2008
From the first time I used OneNote, knowing I don't have to remember to press Ctrl+S periodically (a habit from Word and almost every other program), I have loved it so much that I wish all programs were like that. Just let the computer worry about saving stuff; all the user needs to do is concentrate on the work, not worry about the possibility of losing it. So, can the other Microsoft Office programs at least (Word, Excel, etc.) adopt this saving behavior and free me from the Ctrl+S habit? Thanks!
Anonymous
May 07, 2009
Hi Dan, Great post. I've got a question about file sync services that operate outside of the OneNote environment to let you share OneNote notebooks (and other files) across multiple systems (including maintaining a backup in the cloud). Specifically I'm talking about Dropbox, but I guess the same would apply to FolderShare or Live Mesh. I'll call this service file-sync, as opposed to the onenote-sync that's built in. If I open a notebook from a local folder on my PC as a local notebook (c:\notebooks\fred.one), and that local folder happens to be one that's file-synchronized, then at any point the sync app could pop up and modify the file, possibly at the same time as OneNote is trying to modify it (given that OneNote doesn't seem to lock the file exclusively). As you mention above, OneNote actually saves to the local cache even when using a local file, so onenote-sync should take care of any inconsistencies. I could also open the local notebook by pretending it is a network file, i.e. by opening it as \\pc\notebooks\fred.one. In this case you get the sync icon, and you can work offline or online. This "feels safer", but given your comment above, I'm not sure if it is any different. Have you done any OneNote testing in this environment? Am I risking my notes by using it like this? Cheers, Andy.
Anonymous
April 25, 2015
Hi, does anyone know whether it is more storage-efficient to store content in OneNote or in Word? For example, if I have a file that is 5 MB and I copy and paste the exact content into OneNote, will the total file size increase by 5 MB or not? I tried Microsoft Answers but the site is not working. Thanks.