(COPIED to plans/reference/db.raindrop)


Problem: both SQL and fixed schemas suck. Is there something better? (Yep, there's no such thing as abstract «good» or «bad», but anyway.)

Distributed

The general term «distributed database» is not quite what I want. I need something closer to the Google Wave concept, where only parts of the data are replicated: the parts that concern somebody who is authenticated in the given storage (= namespace) and is registered as such with the piece of data (i.e. takes part in a discussion).
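
A rough sketch of that selection rule, just to pin the idea down. Everything here is hypothetical (the `Document` class, `replica_for`, the sample threads); it only shows "replicate what the user takes part in", not any real replication protocol.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """A piece of data plus the users registered as taking part in it."""
    doc_id: str
    body: str
    participants: set = field(default_factory=set)


def replica_for(user, documents):
    """Return only the documents that `user` is registered with."""
    return [doc for doc in documents if user in doc.participants]


# Hypothetical contents of one storage (= namespace).
storage = [
    Document("thread-1", "Schema-less storage ideas", {"alice", "bob"}),
    Document("thread-2", "Deployment notes", {"bob"}),
    Document("thread-3", "Private draft", {"carol"}),
]

# Each user's node would receive only the threads that user takes part in.
alice_replica = replica_for("alice", storage)
bob_replica = replica_for("bob", storage)
```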

Mariposa

What is this: a solid replicating DB or rather a network?
Why did the project have no(?) success? It was active 1993-1999.

CouchDB

(TODO: gather notes scattered in blog, papers, etc.)

Evaluation:

  • benefits:
  • flaws:
    • though views can be saved, they are stored in the database and cannot be fully integrated into the application code (subclassed, decorated, etc.) → difficult to build a truly native API on top of CouchDB and to reuse existing code in different projects
      • otoh, see couchapp — easily upload Python code to CouchDB (but what are the limitations?)
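
To illustrate the flaw: as far as I understand, a saved view lives in a design document, and its map function is JavaScript source held as a plain string. The sketch below only builds such a document in Python and talks to no server; the names `articles` and `by_author` are made up.

```python
import json

# Hypothetical design document; "articles" and "by_author" are made-up names.
design_doc = {
    "_id": "_design/articles",
    "views": {
        "by_author": {
            # The map function is JavaScript source kept as a plain string.
            "map": "function (doc) { if (doc.author) { emit(doc.author, null); } }",
        },
    },
}

# What ends up in the database: JSON with the view code embedded as text.
payload = json.dumps(design_doc, indent=2)
view_source = json.loads(payload)["views"]["by_author"]["map"]
print(type(view_source).__name__)
```

From the application's side the view is just opaque text, so there is nothing to subclass or decorate.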

MongoDB

Evaluation:

  • benefits:
    • ...
    • mature python bindings
  • flaws:
    • not portable: fixed system-wide data directory (wanted: custom file)
      • whoops, wrong!! bin/mongod --dbpath some_path works fine.

StrokeD

(?) — and see example for model metadata usage

DBM

It should be noted that while dbm and its derivatives are pre-relational databases—effectively a hash fixed to disk—in practice they can offer a more practical solution for high-speed storage looked up by-key as they do not require the overhead of connecting and preparing queries. This is balanced by the fact that they can generally only be opened for writing by a single process at a time. While this can be addressed by the use of an agent daemon which can receive signals from multiple processes, this does, in practice, add back some of the overhead (though not all). In simpler terms, they may be old tech but they're fast. — http://en.wikipedia.org/wiki/Dbm

Evaluation:

  • benefits:
    • portable
    • fast
    • flexible structure (almost as in document-oriented databases)
  • flaws:
    • no indexing of values
    • string-only values → overhead on coercing types (e.g. calculating statistical data)
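
A small sketch of both flaws, assuming Python 3, where anydbm is called dbm. The file name and the sample records are made up. Lookup by key is direct, but a query by value has to scan every key, and numbers must be converted by hand in both directions.

```python
import dbm
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "stats.db")

with dbm.open(path, "c") as db:
    # Keys and values are stored as bytes; str is encoded automatically,
    # but numbers have to be converted to strings by hand.
    db["alice"] = str(31)
    db["bob"] = str(27)
    db["carol"] = str(31)

with dbm.open(path, "r") as db:
    # Lookup by key is direct...
    alice_age = int(db["alice"])
    # ...but there is no index on values, so a query by value scans every key,
    # and every value is coerced back to int before any arithmetic.
    ages = {key.decode(): int(db[key]) for key in db.keys()}

aged_31 = sorted(name for name, age in ages.items() if age == 31)
mean_age = sum(ages.values()) / len(ages)
```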

Standard Python libraries

  • shelve — automatic (un)pickling of values
  • anydbm — string-only values; used by shelve
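
For comparison, a minimal shelve sketch (Python 3 assumed; the file name and record are made up). Values are pickled on write, so structured data comes back with its types intact.

```python
import os
import shelve
import tempfile

path = os.path.join(tempfile.mkdtemp(), "notes.shelf")

with shelve.open(path) as shelf:
    # Values are pickled on write, so nested structures and numbers survive.
    shelf["thread-1"] = {"title": "Storage ideas", "replies": 3, "tags": ["db", "python"]}

with shelve.open(path) as shelf:
    record = shelf["thread-1"]

# No manual coercion needed: the reply count is still an int.
total = record["replies"] + 1
```

Keys are still plain strings, and there is still no way to look records up by value without reading them all.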

Tokyo Cabinet

Evaluation:

  • benefits:
    • very easy to install and deploy (especially compared to MongoDB)
    • compact (engine + server = 1 MB)
    • portable (each database sits in a single and freely movable file, as with SQLite. Tyrant server can be run to enable concurrent connections)
    • fast
    • flexible (multiple flavours, including something similar to DODB)
    • good Python API available
    • full-text search system
  • flaws:
    • documentation describes lots of C functions but offers no reader-friendly information on the HTTP API
    • (?) no transactions for table storage
    • (?) only *nix

Specs:

Python bindings:

Discussion:

Hint — how to query column values via a TDB-less version of pytyrant:

pytyrant.PyTyrant.open('127.0.0.1', 1978).misc('search', 0, ['addcond\x00foo\x000\x00bar'])

...where \x00 is the separator, and the 0 in the third NUL-separated field is the sequence number of the chosen query operation («string is equal») as defined in tctdb.h from tokyocabinet.
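
Typing the separators by hand is error-prone, so here is a tiny helper sketch for building the condition string. It assumes the separator is the NUL byte (written "\x00" in Python) and that operation 0 means «string is equal», as stated above. `build_addcond` and `OP_STREQ` are hypothetical names, not part of pytyrant.

```python
SEP = "\x00"  # NUL byte separating the fields of the condition string

# Operation number taken from the note above: 0 means "string is equal".
OP_STREQ = 0


def build_addcond(column, op, expr):
    """Join the pieces of an 'addcond' argument with NUL separators.

    Hypothetical helper, not part of pytyrant.
    """
    return SEP.join(["addcond", column, str(op), expr])


# The resulting list is what would go in as the last argument of
# misc('search', 0, args).
args = [build_addcond("foo", OP_STREQ, "bar")]
```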

For newer pytyrant with table support it's much easier (thanks to Eric):

pytyrant.PyTableTyrant.open('127.0.0.1', 1978).search.filter(foo__streq='bar')