Warning: This is serious machine learning stuff. You should find something else to do if you don’t know why you are here.

Bmp2Arff is a simple Java utility that allows you to create spatial datasets from bitmap files. The resulting dataset is in ARFF format which you can feed into your cool algorithms using tools such as WEKA.

The dataset will have 3 attributes: [x, y, class]. The x and y attributes are the pixel coordinates, and class is the color of the pixel, so a white pixel at (12,34) will become [12, 34, -1]. The class is -1 because opaque white is 0xffffffff as a 32-bit ARGB value, which is -1 when stored in a signed Java int. This means that all the pixels with the same color will be in the same class. The pixel at (0,0) defines the background color, and all pixels of this color will be excluded from the dataset.
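
To make the mapping concrete, here is a minimal sketch of the conversion described above. It is not the original Bmp2Arff source. The class name `Bmp2ArffSketch`, the method `toArff`, the relation name, and the choice to declare `class` as a nominal attribute are all assumptions made for illustration. It relies only on the standard `BufferedImage.getRGB` and `ImageIO.read` calls.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import javax.imageio.ImageIO;

// Hypothetical sketch, not the original Bmp2Arff code.
class Bmp2ArffSketch {

    // Builds ARFF text with the attributes x, y and class from an image.
    static String toArff(BufferedImage img, String relation) {
        // The pixel at (0,0) defines the background color.
        int background = img.getRGB(0, 0);

        Set<String> classes = new LinkedHashSet<>(); // distinct colors, in scan order
        List<String> rows = new ArrayList<>();

        for (int y = 0; y < img.getHeight(); y++) {
            for (int x = 0; x < img.getWidth(); x++) {
                // getRGB returns a 32-bit ARGB int, so opaque white is -1.
                int color = img.getRGB(x, y);
                if (color == background) {
                    continue; // background pixels are left out of the dataset
                }
                classes.add(Integer.toString(color));
                rows.add(x + "," + y + "," + color);
            }
        }

        StringBuilder out = new StringBuilder();
        out.append("@relation ").append(relation).append("\n\n");
        out.append("@attribute x numeric\n");
        out.append("@attribute y numeric\n");
        // Assumption: the class is declared nominal, one value per color.
        out.append("@attribute class {")
           .append(String.join(",", classes))
           .append("}\n\n");
        out.append("@data\n");
        for (String row : rows) {
            out.append(row).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java Bmp2ArffSketch <image-file>");
            return;
        }
        BufferedImage img = ImageIO.read(new File(args[0]));
        if (img == null) { // ImageIO.read returns null for unsupported formats
            System.err.println("unsupported image format: " + args[0]);
            return;
        }
        System.out.print(toArff(img, "bitmap"));
    }
}
```

With this sketch, a black image with a single white pixel at (1,1) yields one data row, `1,1,-1`, because every black pixel matches the background at (0,0) and is skipped.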

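It should work with any image type that Java can read, but “lossy” types (like JPEG) are totally not recommended: compression artifacts turn a flat color into many slightly different colors, and each of those becomes its own class.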

We have created this convenient utility because existing spatial datasets are rare, and hopefully it will be useful to you too.


Java Source, and some samples (replicated datasets from the DBSCAN paper).