(COPIED to plans/reference/db.raindrop)
Problem: both SQL and fixed schemas suck. Is there something better? (Yep, there's no such thing as abstract «good» or «bad», but anyway.)
Distributed
The general term «distributed database» is not what I want: I need something closer to the Google Wave concept, where only parts of the data are replicated, namely the parts that concern somebody who is authenticated in a given storage (= namespace) and is registered as such with the piece of data (i.e. takes part in a discussion).
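A minimal sketch of that partial-replication idea, just to make the requirement concrete. Everything here is hypothetical (the `participants` field, the `replica_for` helper, the document layout); it is not how Google Wave or any of the databases below actually work.

```python
# Hypothetical model: each document lists the users who take part in it,
# and a user's replica receives only the documents that concern that user.

def replica_for(user, documents):
    """Return the subset of documents that would be replicated to `user`."""
    return {
        doc_id: doc
        for doc_id, doc in documents.items()
        if user in doc["participants"]
    }

documents = {
    "thread-1": {"participants": {"alice", "bob"}, "body": "..."},
    "thread-2": {"participants": {"bob"}, "body": "..."},
}

alice_replica = replica_for("alice", documents)
bob_replica = replica_for("bob", documents)
```

Here alice's replica would hold only thread-1, while bob's would hold both threads.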
Mariposa
What's this: a solid replicating DB or rather a net?
Why did that project have no(?) success? It was active 1993-1999.
CouchDB
(TODO: gather notes scattered in blog, papers, etc.)
Evaluation:
- benefits:
  - ...
  - mature Python bindings
- flaws:
  - though views can be saved, they are stored in the database and cannot be fully integrated into the application code, subclassed/decorated, etc. → difficult to build a truly native API on top of CouchDB and to reuse existing code in different projects
    - otoh, see couchapp: easily upload Python code to CouchDB (but what are the limitations?)
MongoDB
Evaluation:
- benefits:
  - ...
  - mature Python bindings
- flaws:
  - not portable: fixed system-wide data directory (wanted: custom file)
    - whoops, wrong!! bin/mongod --dbpath some_path works fine.
StrokeD
(?) — and see example for model metadata usage
DBM
It should be noted that while dbm and its derivatives are pre-relational databases—effectively a hash fixed to disk—in practice they can offer a more practical solution for high-speed storage looked up by-key as they do not require the overhead of connecting and preparing queries. This is balanced by the fact that they can generally only be opened for writing by a single process at a time. While this can be addressed by the use of an agent daemon which can receive signals from multiple processes, this does, in practice, add back some of the overhead (though not all). In simpler terms, they may be old tech but they're fast. — http://en.wikipedia.org/wiki/Dbm
Evaluation:
- benefits:
  - portable
  - fast
  - flexible structure (almost as in document-oriented databases)
- flaws:
  - no indexing of values
  - string-only values → overhead on coercing types (e.g. calculating statistical data)
Standard Python libraries
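A small sketch of the «string-only values» flaw, using the `dbm` module from the Python 3 standard library. The file name and the `visits` key are made up for illustration; which backend `dbm.open` picks depends on the system.

```python
import dbm
import os
import tempfile

# Hypothetical file name; any writable path works.
path = os.path.join(tempfile.mkdtemp(), "stats")

with dbm.open(path, "c") as db:
    # Values are stored as bytes. A str is encoded automatically,
    # but a number has to be converted to a string by hand.
    db["visits"] = str(41)
    # Updating a counter means bytes -> int -> increment -> str again.
    db["visits"] = str(int(db["visits"]) + 1)
    raw = db["visits"]    # what the database actually returns (bytes)
    visits = int(raw)     # what the application wants (a number)
```

Every numeric read or write goes through this kind of conversion, which is the coercion overhead mentioned above.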
Tokyo Cabinet
Evaluation:
- benefits:
  - very easy to install and deploy (especially compared to MongoDB)
  - compact (engine + server = 1 MB)
  - portable (each database sits in a single and freely movable file, as with SQLite. Tyrant server can be run to enable concurrent connections)
  - fast
  - flexible (multiple flavours, including something similar to DODB)
  - good Python API available
  - full-text search system
- flaws:
  - documentation describes lots of C functions but has no reader-friendly information on the HTTP API
  - (?) no transactions for table storage
  - (?) only *nix
Specs:
- Cabinet — the base library
- Tyrant specs — database server for Cabinet
- Dystopia specs — a full-text search system for Cabinet
Python bindings:
- pytc — TC, original bindings (outdated?)
- tc — TC, no Tyrant
- python-tokyotyrant — Tyrant, no TDB
- pytyrant — Tyrant, no TDB
- issue #5 about TDB
- ericflo-pytyrant, a fork with claimed TDB support
- my fork :)
- pyrant another fork/rewrite of pytyrant (TDB, etc.)
- tokyocabinet — TC (claimed TDB support)
Discussion:
- Tokyo Cabinet: Beyond Key-Value Store — a nice overview article of TC and Tyrant with Ruby API examples
- discussion at stackoverflow: are there Python bindings for TDB?
- Why I abandoned Tokyo Tyrant — by Pete Warden
Hint — how to query column values via a TDB-less version of pytyrant:
pytyrant.PyTyrant.open('127.0.0.1', 1978).misc('search', 0, ['addcond\x00foo\x000\x00bar'])
...where \x00 is the separator, and the 3rd field of the condition string, 0, is the sequence number of the chosen query operation («string is equal») as defined in tctdb.h from tokyocabinet.
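To avoid typing the NUL separators by hand, the condition string can be assembled with a tiny helper. This is only a sketch: `build_addcond` and `STREQ` are hypothetical names, not part of pytyrant, and the value 0 for «string is equal» is taken from the note above.

```python
# Operation number for «string is equal», as described in the note above
# (see tctdb.h in tokyocabinet).
STREQ = 0

def build_addcond(column, op, value):
    """Join the parts of an 'addcond' search argument with NUL separators."""
    return "\x00".join(["addcond", column, str(op), value])

condition = build_addcond("foo", STREQ, "bar")
```

The resulting string can then be passed in the list given as the last argument of misc('search', ...).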
For newer pytyrant with table support it's much easier (thanks to Eric):
pytyrant.PyTableTyrant.open('127.0.0.1', 1978).search.filter(foo__streq='bar')