Archive

Posts Tagged ‘syncing’

A week for csync

December 9, 2012 11 comments

On Friday I arrived back from Berlin where I had the pleasure to work with my great colleague Danimo and our friends from Woboq, Markus and Olivier, in the Woboq Headquarter in Berlin Kreuzberg for a week.

We thought that it might be fun to work together on csync, our sync engine under the hood of the ownCloud client. There were some issues that should be fixed and on the way we cleaned and improved quite some code in csync.

Here are some things we worked on:

  • We added a function that lets the program that uses the csync library pass arbitrary module parameters to the backend module. That way its more easy to steer the behaviour of the ownCloud modules from the calling app.
  • Error handling was improved, ie. if an http error happens, csync works errno based error reporting. We added custom errnos because not all error cases with http can be mapped to system errnos.
  • Formerly the csync ownCloud module was spooling files through an additional temporar file on client side. That step is skipped now which results in performance improvements as well as in more clean code.
  • We were able to reduce the number of HTTP requests that go over the wire even more. For example to check if there are changes on server side, now there is exactly one http propfind required. Also if files have to be synced, we could save some HTTP requests by improving caching of some requests.
  • Andreas recently changed the logging system in csync upstream master branch. We merged that back and now do not longer need the log4c framework. One build dependency less and a nice new logging framework.
  • Other bugs were fixed, such as a potential crash if a folder as deleted during it is synced, SSL handling shortcomings, code streamlining in handling compressed data streams and more.
  • We finalized a patch that uniforms the utf8 representations of characters over all platforms. That will fix problems we saw especially with MacOS and special filenames.

Ah, yes, we also did other things, more related to the ownCloud client. Danimo managed to implement a cross platform filesystem watcher class that is able to fulfill our requirements. That obsoletes polling for changes on the local file system, one of the most popular enhancement requests.

And finally there now is a API in csync thats reports file transmission progress if a callback is installed accordingly. So the client hopefully soon will tell ya what it’s doing for you. Also appreciated I guess…

Last but not least we added code to use QtKeyChain, a cross platform password storage library that stores password encrypted. For example on Linux QtKeyChain connects to kwallet. QtKeyChain was provided by
Frank Osterfeld, thanks a lot for that contribution.

Quite some stuff for a short week, note that stuff that fills a short line in this blog can be quite nifty to investigate, implement and test. Not everything is stable, polished and properly integrated but it was a great and productive week. The next release of ownCloud Client will be a nice one.

woboq_dinner2
And sice you can not always work, we had a nice dinner at a very cool italian restaurant. We met with other ownCloud employees located in Berlin, Arthur and Georg. Fun 🙂 And Berlin, yes, a great place to be, but finally I appreciated to arrive back to my snow covered home.

Many thanks to Olivier and Markus for hosting us and for the nice week.

Csync for ownCloud Client 1.1.0 – A New Sync Engine

October 11, 2012 18 comments

Along with todays ownCloud 4.5 release we released the new ownCloud Client 1.1.0 with a new syncing concept.

This blog will shed some light on the details. I apologize, it’s a long read.

Time Issues

ownCloud Client versions 1.0.x worked with csyncs traditional way of using the file modification times to detect updates between the two repositories that should be synced to each other. That works fine and conforms to our idea to ideally not use any other metadata in syncing than what the file system has anyway.

Time flies

However, there is one drawback which we all know from daily life: If at least two parties sync on time its important that all clocks are set exactly the same way. Remember good crime movies where a bank robbery always starts with a clock adjustment of all gangsters? We have exactly the same in ownClouds syncing: All involved have to have the same time setting, otherwise modification times of files can not be compared reliably.

There are solutions for computers to set the exact time (like ntp) so in general that works. However, in real life scenarios these are not reliable because either people do not have them started on the system or because the daemon updates the time once in a while and in that time span the clock skews already too much.

Users all the time reported problems with that and other experts continued to advise that we never get around that problems if we don’t change something fundamental and go away from pure time based syncing.

Well, we did that with our csync version 0.60.0 which is the sync engine for ownCloud Client 1.1.0.

An Unique Id

Now, every file and directory inside a sync directory has an unique Id associated. The idea is that the Id changes if the file changes. So in the sync process the need for a file update in either direction can be computed by comparing the two Ids of the file. If the id has changed on one repository the file was changed there and needs to be synced to the other side.

The Ids are generated on the ownCloud server and one challenge for the client is to always download the correct Id of a file. The Ids are just random tags for a file version. It is not associated to the file content as MD5 sums would be. Actually it was a frequent advise to use MD5 sums or a similar approach which digests the files content to detect updates. That would have come very handy because that means comparing file contents directly and, more important, it’s reproducable on either side. Also the client would have been able to recalculate the MD5-Sum of the local files and would not have depended on a local database with Ids that were pulled from the server before.

But we decided against hashes. Calculating MD5-Sums is costly in terms of CPU and time, especially for large files. The CPU problem is small on clients, but not on servers where a lot of clients connect to. Even though the sums can be calculated during upload, the problems remain for the case where the server does not see the upload stream, think of the “mount my Dropbox” case.

For files on the ownCloud server, the Id is always updated when the file gets updated. On the client side the last Id of a file is in the client database. It is invalidated in case the files modification time changed meanwhile to detect local changes.

Change Propagation

Another remarkable change in the 1.1.0 client is that change events in the file tree propagate up to the top directory on the owncloud server, ie. if a file changes in a directory, the id of the directory changes as well as the one of its parent directory etc.

That means that to detect if a file tree has changed, it’s enough to check the top most directories Id. If that has changed, ok, than the client needs to dig deeper, but in the not so rare case that nothing has changed, the one call is enough to detect that. That dramatically lowers the server load with clients because instead of digging through the whole directory structure what we did with the 1.0.x series
it is a few requests now.

CSync and ownCloud for Success

These are very intrusive changes to csync. For example, we had to add two additional fields to the database, add code that is able to build a representation
of the local file tree from the database and make csync query for the file Ids from
the server if needed. Deep under the hood the updater, reconciler and propagator code needed changes to work with the Ids. All these changes did not go back to csync upstream yet.

To not conflict with the upstream version of csync we decided to rename our csync version to ocsync. But: This is a temporar solution for the time we need to catch up with upstream again. That will take a while until everything is sorted again but we will work on that.

I am are very excited about the new version of csync. But obviously there are other changes in the ownCloud Client 1.1.0 which will be subject of another blog post.