
Overview

In the early stages of every In-Memory Data Grid project, it is essential to get a correct estimate of memory requirements. How much memory will my domain objects consume once they are converted to entries? What will the indexing overhead be? The answers determine what JVM heap size to choose within the cluster and how many JVMs will be needed to store the whole dataset.

It is very hard to answer these questions theoretically, for the following reasons:

  • Java does not provide a straightforward way to accurately estimate the size of an object on the heap
  • the space doesn't store entries as heap objects; they are stored decomposed into fields
  • a String UID is generated and stored along with each entry
  • indexing overhead depends on the cardinality of the values in the indexed field.

Therefore it is much more practical to measure the entry footprint experimentally. Building a convenient toolkit for that is the goal of the Binary Calculator project. The main idea is quite simple:

  1. Connect to the remote space
  2. Get a batch of test entries from some entry source
  3. Write the batch to the remote space
  4. Trigger garbage collection on the remote space
  5. Measure memory usage
  6. Repeat from step 2
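
The loop above can be sketched in Java roughly as follows. This is a minimal illustration, not Binary Calculator's actual code: the EntrySource and RemoteSpace interfaces, the FakeSpace stand-in and its memory model are all hypothetical names introduced here so that the example runs without a real space. A real RemoteSpace would wrap the space proxy and a remote memory query.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class MeasurementLoop {

    /** Hypothetical entry source; stands in for whatever produces test entries. */
    interface EntrySource {
        List<Object> nextBatch(int batchSize);
    }

    /** Hypothetical facade over the remote space (an assumption, not a GigaSpaces API). */
    interface RemoteSpace {
        void writeBatch(List<Object> batch);
        void requestGc();
        long usedMemoryBytes();
    }

    /** Runs steps 2-6 and returns one {entriesWritten, memoryUsage} pair per batch. */
    static List<long[]> run(RemoteSpace space, EntrySource source,
                            int batchSize, long entriesToWrite, long gcPauseMillis) {
        List<long[]> points = new ArrayList<>();
        long written = 0;
        while (written < entriesToWrite) {
            int size = (int) Math.min(batchSize, entriesToWrite - written);
            List<Object> batch = source.nextBatch(size);      // step 2
            if (batch.isEmpty()) {
                break;                                        // source exhausted
            }
            space.writeBatch(batch);                          // step 3
            written += batch.size();
            space.requestGc();                                // step 4
            try {
                Thread.sleep(gcPauseMillis);                  // give the remote GC time to finish
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
            points.add(new long[] {written, space.usedMemoryBytes()}); // step 5
        }
        return points;
    }

    /** In-memory stand-in so the sketch runs without a real space. */
    static final class FakeSpace implements RemoteSpace {
        private long count;
        public void writeBatch(List<Object> batch) { count += batch.size(); }
        public void requestGc() { /* nothing to collect in the fake */ }
        public long usedMemoryBytes() { return 1_000_000L + 100L * count; } // made-up model
    }

    public static void main(String[] args) {
        EntrySource source = n -> Collections.<Object>nCopies(n, "entry");
        for (long[] p : run(new FakeSpace(), source, 100, 250, 0)) {
            System.out.println(p[0] + " entries -> " + p[1] + " bytes");
        }
    }
}
```

The pause after the GC request is a fixed wait, since the client has no direct way of knowing when the remote collection has finished. Too short a pause would inflate the measured memory usage.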

After a sufficient number of iterations, we get a set of data points of the form (entriesWritten, memoryUsage). A linear approximation of these points (e.g. a least-squares linear fit) gives the footprint of a single entry, including index overhead, as the slope of the fitted line.
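
As an illustration of the fitting step, here is a minimal ordinary least-squares sketch in Java. The class name FootprintFit and the sample numbers are made up for this example and are not taken from Binary Calculator. The slope of the fitted line is the estimated per-entry footprint in bytes. The intercept approximates the memory used by the space before any entries are written.

```java
final class FootprintFit {

    /**
     * Fits memory = intercept + slope * entries by ordinary least squares.
     * Returns {slope, intercept}; slope is the estimated bytes per entry.
     */
    static double[] fit(long[] entriesWritten, long[] memoryUsageBytes) {
        int n = entriesWritten.length;
        if (n < 2 || n != memoryUsageBytes.length) {
            throw new IllegalArgumentException("need two or more matching data points");
        }
        double meanX = 0;
        double meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += entriesWritten[i];
            meanY += memoryUsageBytes[i];
        }
        meanX /= n;
        meanY /= n;

        double sxy = 0; // sum of (x - meanX) * (y - meanY)
        double sxx = 0; // sum of (x - meanX)^2
        for (int i = 0; i < n; i++) {
            double dx = entriesWritten[i] - meanX;
            sxy += dx * (memoryUsageBytes[i] - meanY);
            sxx += dx * dx;
        }
        if (sxx == 0) {
            throw new IllegalArgumentException("all points have the same entry count");
        }
        double slope = sxy / sxx;
        return new double[] {slope, meanY - slope * meanX};
    }

    public static void main(String[] args) {
        // Made-up data points for illustration only.
        long[] entries = {10_000, 20_000, 30_000, 40_000};
        long[] memory = {52_000_000L, 54_000_000L, 56_000_000L, 58_000_000L};
        double[] line = fit(entries, memory);
        System.out.println("bytes per entry: " + line[0]);
        System.out.println("baseline bytes:  " + line[1]);
    }
}
```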

Binary Calculator toolkit

The initial version of Binary Calculator is a simple Swing application that provides basic support for the process described above. More is on the roadmap.

Version 0.2 is intended to be used as a toolkit, i.e. it requires some very simple coding to build a fully functional memory footprint measurement tool.

Usage pattern for 0.2 version:

  1. Provide your subject Entry implementation
  2. Implement the BatchedDataSource interface from the api module to generate batches of subject Entries. The data source can be, for example, a random data generator or a database connector.
  3. Wire your BatchedDataSource implementation into the batchWriter bean
  4. Compile and run Binary Calculator
  5. Run the remote space (with the gsInstance command)
  6. Start the memory usage experiment

Binary Calculator exposes a minimalistic GUI for setting the main runtime parameters of the experiment:

Property | Description
Jini URL | Jini URL of the space to connect to
Batch Size | number of entries written to the space in each batch
Entries to Write | maximum number of entries to write
GS pause | how long to wait (ms) until the remote GC completes
