Caching objects using serialization

Do you run out of memory because everything is cached in memory and you hit the limits imposed by 32-bit machines?
Do you need to preserve objects when your application restarts?

If you answered yes, then I have a simple solution: use serialization to cache objects to disk. When an object is added to or updated in a collection, serialize it to disk, and de-serialize it when a retrieve request is made.

I started this project called PersistentCaching and created a class called PersistentArrayList. The first step was to define methods I wanted to support, and I decided on the following: Add, RemoveAt, Clear, Item, and Count.

Initializing
I created two streams for writing and one stream for reading.
Write Stream 1 - This stream is used to serialize all objects to a file. When an add or update occurs, the object must be serialized to disk using this stream.
Write Stream 2 - This stream gets all updates to the in-memory index and writes them to an index file. We need to preserve this information for quick recovery. It's much faster to recover from a separate index file than to read a large data file of objects to rebuild the in-memory index. I want my cache to recover in seconds, not minutes, and to make it even faster let us use a binary stream.
Read Stream 1 - Used when requests are made for objects cached on disk. We must position the file pointer, de-serialize the object from disk, and return it to the user.
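
As a rough illustration of the three streams, here is a minimal Python sketch. It is not the code from the attached project; the function name open_streams and the file names cache.dat and cache.idx are assumptions made up for this example.

```python
import os

def open_streams(directory):
    """Open two write streams and one read stream over two files."""
    data_path = os.path.join(directory, "cache.dat")   # hypothetical file name
    index_path = os.path.join(directory, "cache.idx")  # hypothetical file name

    data_out = open(data_path, "ab")    # write stream 1: serialized objects
    index_out = open(index_path, "ab")  # write stream 2: binary log of index changes
    data_in = open(data_path, "rb")     # read stream 1: seeks to an object and reads it
    return data_out, index_out, data_in
```

Opening the write streams in append mode means existing data survives a restart, which is what makes recovery possible later on.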

Add
The Add method must record the position of the stream before writing to disk. This will be very important when servicing requests for the object from our cache. Using this position we can quickly move the stream to the correct spot on disk and de-serialize the object.

The add must follow this order:
1. Record the stream position.
2. Serialize object to disk and flush the stream.
3. Write the add operation to our index file.
4. Add the information to our in-memory index.
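
The four steps above can be sketched in Python roughly as follows. This is an illustration under assumptions, not the attached implementation: pickle stands in for the serializer, the in-memory index is a plain list of file offsets, and the opcode OP_ADD and the 13-byte record layout are invented for this example.

```python
import io
import pickle
import struct

OP_ADD = 1                      # hypothetical opcode for the index log
RECORD = struct.Struct("<BIQ")  # assumed layout: opcode, list index, data-file offset

def add(obj, data_out, index_out, positions):
    """Append obj to the cache and return its list index."""
    offset = data_out.seek(0, io.SEEK_END)  # 1. record the stream position
    pickle.dump(obj, data_out)              # 2. serialize the object to disk...
    data_out.flush()                        #    ...and flush the stream
    index_out.write(RECORD.pack(OP_ADD, len(positions), offset))  # 3. log the add
    index_out.flush()
    positions.append(offset)                # 4. update the in-memory index
    return len(positions) - 1
```

The data is written before the index record, so a crash between the two steps leaves only unreferenced bytes in the data file, never an index entry that points at nothing.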

RemoveAt
RemoveAt is pretty simple. All we need to do is update our in-memory index and record the change to the index file. The serialized object continues to exist on disk, but because we removed it from our in-memory index it can never be referenced. This is similar to a logical delete: the data still exists on disk but will never be accessed.

Clear
Yet another simple operation. All we need to do is clear our in-memory index and record it to the index file.
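
Both RemoveAt and Clear touch only the index, never the data file. A minimal Python sketch of that idea, assuming the in-memory index is a list of offsets and using made-up opcodes and a made-up record layout:

```python
import io
import struct

OP_REMOVE, OP_CLEAR = 2, 3      # hypothetical opcodes for the index log
RECORD = struct.Struct("<BIQ")  # assumed layout: opcode, list index, data-file offset

def remove_at(i, index_out, positions):
    """Logical delete: drop the entry from the index and log it."""
    del positions[i]  # raises IndexError before anything is logged
    index_out.write(RECORD.pack(OP_REMOVE, i, 0))  # offset field is unused here
    index_out.flush()

def clear(index_out, positions):
    """Forget every entry and log a single clear record."""
    positions.clear()
    index_out.write(RECORD.pack(OP_CLEAR, 0, 0))
    index_out.flush()
```

One consequence of this design is that the data file only ever grows. Reclaiming the space would need a separate compaction step, which this sketch does not attempt.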

Item - Set
This is very similar to the add operation and the following steps are taken:
1. Record the starting position.
2. Serialize the object to disk.
3. Write the update operation to our index file.
4. Update the information in our in-memory index.
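
The set steps could look like this in Python. Again this is a sketch under assumptions (pickle as the serializer, a list of offsets as the index, an invented OP_UPDATE opcode and record layout), not the attached code. Note that the new version is appended at the end of the data file and the old bytes are simply abandoned.

```python
import io
import pickle
import struct

OP_UPDATE = 4                   # hypothetical opcode for the index log
RECORD = struct.Struct("<BIQ")  # assumed layout: opcode, list index, data-file offset

def set_item(i, obj, data_out, index_out, positions):
    """Replace the object at list index i with obj."""
    if not 0 <= i < len(positions):
        raise IndexError(i)
    offset = data_out.seek(0, io.SEEK_END)  # 1. record the starting position
    pickle.dump(obj, data_out)              # 2. serialize the object to disk
    data_out.flush()
    index_out.write(RECORD.pack(OP_UPDATE, i, offset))  # 3. log the update
    index_out.flush()
    positions[i] = offset                   # 4. repoint the in-memory index
```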

Item - Get
We finally get to use our in-memory index. When a request is made for an object, we must first look up its position on disk in the in-memory index. Once we have this position, we move the file pointer there and de-serialize the object from disk. The object has now been re-created in memory and can be returned.
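
A get is a lookup, a seek, and a de-serialize. A minimal Python sketch, assuming the index is a list of offsets and pickle is the serializer (the function name get_item is made up for this example):

```python
import pickle

def get_item(i, data_in, positions):
    """Return the object stored at list index i."""
    data_in.seek(positions[i])   # move the file pointer to the recorded position
    return pickle.load(data_in)  # reads exactly one serialized object
```

Because the serialized form carries its own end marker, the index only needs to store where each object starts, not how long it is.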

Recovery
Much time has been spent on adds and updates to our index file, and this is where it all pays off. I didn't want to recover by de-serializing every object in the large data file to rebuild the in-memory index. A more elegant approach is to use a separate file that records all in-memory index changes. On recovery we simply parse this binary file to rebuild the in-memory index, and that is it! Once the in-memory index is built, we can easily find where objects live in the big data file without ever touching it.
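
Recovery then amounts to replaying the index log from the start. The Python sketch below shows the idea; the opcodes and the fixed 13-byte record layout are assumptions chosen for illustration and are not the format used by the attached project.

```python
import struct

OP_ADD, OP_REMOVE, OP_CLEAR, OP_UPDATE = 1, 2, 3, 4  # hypothetical opcodes
RECORD = struct.Struct("<BIQ")  # assumed layout: opcode, list index, data-file offset

def recover(index_in):
    """Rebuild the in-memory index (a list of offsets) from the index log."""
    positions = []
    while True:
        chunk = index_in.read(RECORD.size)
        if len(chunk) < RECORD.size:  # end of log, or a torn final record
            break
        op, i, offset = RECORD.unpack(chunk)
        if op == OP_ADD:
            positions.append(offset)
        elif op == OP_UPDATE:
            positions[i] = offset
        elif op == OP_REMOVE:
            del positions[i]
        elif op == OP_CLEAR:
            positions.clear()
        else:
            raise ValueError(f"unknown opcode {op}")
    return positions
```

The log holds only a few bytes per operation regardless of how large the cached objects are, which is why replaying it is so much cheaper than scanning the data file.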

Wasn't this simple? I attached the source code in case something isn't clear.

Attachment               Size
PersistentCaching.zip    35.79 KB

Comments

Secondary, tertiary caching

Secondary, tertiary caching is often overlooked when it is perfectly valid - even preferable to a unilateral caching model. Very cool to have posted such a (re)useful library.

You might want to check out other so-called prevalence frameworks. One from the Java camp is Prevayler.