A rather long rambling email in which I make a case for the use of XML
as the PIM backend format.
* My major complaint against SQLite is that it feels like overkill for
such a simple application, especially if libxml is already being installed
as a dependency of libglade, which quite a few applications use.
* XML is human readable, and doesn't require a front-end to tweak it or
play with it. Just a simple text editor.
* I suspect that an XML backend could be made nearly as fast as an SQL
backend, although an indexing layer might need to be written. (A SAX-based
interface scans the file, generates an index on "searchable" attributes,
and stores <keyword,offset> pairs in an index file.) This would only apply
when rapid lookups are needed, and I'm not sure how often that is the case,
or whether a sequential scan would suffice.
Does SQLite build indexes? How does it speed up searches?
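To make the indexing idea above a bit more concrete, here is a minimal
sketch in Python. It uses the standard-library expat bindings as the
SAX-style event scanner, because they expose a byte offset for each
event. The function name build_index, the sample data, and the choice of
"fullname" as the searchable attribute are all assumptions for
illustration, and the index is kept in memory rather than written to an
index file.

```python
import xml.parsers.expat

def build_index(xml_bytes, element="contact", attribute="fullname"):
    """Scan the document once; map attribute value -> byte offset of the element."""
    index = {}
    parser = xml.parsers.expat.ParserCreate()

    def start_element(name, attrs):
        if name == element and attribute in attrs:
            # During a start-tag event, CurrentByteIndex is the position of
            # the '<' that opened the tag.
            index[attrs[attribute]] = parser.CurrentByteIndex

    parser.StartElementHandler = start_element
    parser.Parse(xml_bytes, True)  # True marks this as the final chunk
    return index

# Hypothetical sample data, not taken from any real PIM file.
SAMPLE = (
    b'<contacts>\n'
    b'<contact contact-id="0x1234" fullname="Ann Example"/>\n'
    b'<contact contact-id="0x1235" fullname="Bob Example"/>\n'
    b'</contacts>\n'
)

if __name__ == "__main__":
    for keyword, offset in build_index(SAMPLE).items():
        print(keyword, offset)
```

A lookup would then seek to the stored offset and parse just that one
record, instead of scanning the whole file again.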
If anyone takes issue with this, I suppose I can hack together some perf
numbers (memory usage and search-time scaling for a SAX-based scanner, a
DOM-based scanner, and an SQLite-based search). What are the common tasks
that the database needs to perform?
A proposal for an XML backend, that provides "hotsync" capabilities for
a daemon that may be ignorant of the actual schemas:
Basically:
<contacts>
<contact contact-id="0x1234" fullname="..." ...>
<phone location="work">(111)555-5555</phone>
<email location="work">foo_at_corp.net</email>
...
<img path="/root/.pim/contacts/blobs/0000001.blob"/>
</contact>
</contacts>
Lightweight data and metadata are kept in the XML tree. Anything bigger
(image blobs, memo blobs, XML blobs that are not useful for searching or
filtering) is kept in an individual flat file in a directory hierarchy for
the application: "$HOME/.pim/<appname>/blobs/blobID.blob".
If we don't care about searching/indexing our contacts data, we can save
a ton of memory (and still get the ease of use of DOM) like so:
<contacts>
<contact contact-id="0x1234" fullname="..." blobid="0002.blob"/>
</contacts>
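As a rough sketch of how an application might read such a lightweight
record and find its blob on disk (Python, standard library only): the
helper names blob_path and load_contacts and the sample values are
assumptions for illustration. Only the attribute names and the
"$HOME/.pim/<appname>/blobs/" layout come from the proposal above.

```python
import os
import xml.etree.ElementTree as ET

def blob_path(appname, blob_id, home=None):
    """Build $HOME/.pim/<appname>/blobs/<blob_id>."""
    home = home or os.path.expanduser("~")
    return os.path.join(home, ".pim", appname, "blobs", blob_id)

def load_contacts(xml_text, appname="contacts", home=None):
    """Return one small dict per <contact>; the blob itself is not read."""
    contacts = []
    for node in ET.fromstring(xml_text).iter("contact"):
        blob_id = node.get("blobid")
        contacts.append({
            "id": node.get("contact-id"),
            "fullname": node.get("fullname"),
            # Only the path is resolved here, so the heavy data stays on disk
            # until something actually needs it.
            "blob": blob_path(appname, blob_id, home) if blob_id else None,
        })
    return contacts

# Hypothetical sample record.
RECORD = """<contacts>
<contact contact-id="0x1234" fullname="Ann Example" blobid="0002.blob"/>
</contacts>"""

if __name__ == "__main__":
    for contact in load_contacts(RECORD):
        print(contact)
```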
We can provide a simple system for hotsyncing as well, by requiring that
each primary subdivision ("contact" in this example) carries a fixed set
of attributes:
<primary-subdivision sync-id="0x1234567" status=""/>
Where status is:
none = no changes since last sync
new = newly created, not sync'ed with peer
delete = delete on next sync (local and peer)
change = changed locally, not sync'ed
conflict = change has occurred on both hosts, resolve manually
* IDs are generated serially by the common sync daemon.
* IDs are assigned after the first sync (if status == new, sync-id="")
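A minimal sketch of the ID-assignment step, assuming the sync daemon
keeps a serial counter and only looks at the sync-id and status
attributes, so it needs no knowledge of the rest of the schema. The
function name assign_sync_ids, the starting counter value, the sample
records, and the reset to status "none" after assignment are my
assumptions, not part of any existing daemon.

```python
import itertools
import xml.etree.ElementTree as ET

STATUSES = {"none", "new", "delete", "change", "conflict"}

def assign_sync_ids(root, counter):
    """Give each status="new" record with an empty sync-id the next serial ID."""
    assigned = []
    for record in root:  # each child of the root is a primary subdivision
        status = record.get("status") or "none"
        if status not in STATUSES:
            raise ValueError("unknown status: %r" % status)
        if status == "new" and not record.get("sync-id"):
            record.set("sync-id", hex(next(counter)))
            record.set("status", "none")  # assumption: now in sync with the peer
            assigned.append(record)
    return assigned

# Hypothetical sample data.
SAMPLE = """<contacts>
<contact sync-id="0x1234567" status="none" fullname="Ann Example"/>
<contact sync-id="" status="new" fullname="Bob Example"/>
</contacts>"""

if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    assign_sync_ids(root, itertools.count(0x1234568))
    print(ET.tostring(root, encoding="unicode"))
```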
To sync, each host just generates a list of needed changes; the lists are
compared for conflicts and then executed. A similar database could track
files to be synced in a "my documents" type folder (check ctime >
last_sync, or save an MD5 sum, etc.).
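The comparison step could look roughly like this (a Python sketch, not a
finished protocol). I am assuming each host's change list is a mapping
from sync-id to status that contains only "change" and "delete" entries.
Records with status "new" have no ID yet, so they can never conflict and
are simply sent across. The name compare_changes and the rule that two
matching deletes are not a conflict are my assumptions.

```python
def compare_changes(local, remote):
    """Split two change lists into (apply_to_remote, apply_to_local, conflicts)."""
    apply_to_remote, apply_to_local, conflicts = {}, {}, []
    for sync_id in sorted(set(local) | set(remote)):
        mine, theirs = local.get(sync_id), remote.get(sync_id)
        if mine and theirs:
            if mine == theirs == "delete":
                # Both hosts want the record gone: delete on both sides.
                apply_to_remote[sync_id] = "delete"
                apply_to_local[sync_id] = "delete"
            else:
                # Touched on both hosts since the last sync: resolve manually.
                conflicts.append(sync_id)
        elif mine:
            apply_to_remote[sync_id] = mine
        else:
            apply_to_local[sync_id] = theirs
    return apply_to_remote, apply_to_local, conflicts

if __name__ == "__main__":
    # Hypothetical change lists.
    local = {"0x1": "change", "0x2": "delete", "0x3": "change"}
    remote = {"0x2": "delete", "0x3": "change", "0x4": "change"}
    print(compare_changes(local, remote))
```

Records that end up in the conflicts list would get status "conflict" on
both hosts and wait for manual resolution.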
Questions, comments, snide remarks?
Adam Lydick
Received on Sun Feb 10 2002 - 11:20:51 EST