Csync for ownCloud Client 1.1.0 – A New Sync Engine

October 11, 2012 dragotin

Along with today’s ownCloud 4.5 release, we released the new ownCloud Client 1.1.0 with a new syncing concept.

This blog post sheds some light on the details. I apologize, it’s a long read.

Time Issues

ownCloud Client versions 1.0.x worked the traditional csync way: file modification times were used to detect updates between the two repositories that should be synced with each other. That works fine and conforms to our ideal of not using any metadata for syncing other than what the file system provides anyway.

However, there is one drawback which we all know from daily life: if two or more parties synchronize based on time, it’s important that all clocks are set exactly the same. Remember good crime movies, where a bank robbery always starts with all gangsters synchronizing their watches? We have exactly the same situation in ownCloud’s syncing: everyone involved has to have the same time setting, otherwise the modification times of files cannot be compared reliably.

There are solutions for computers to set the exact time (like NTP), so in general that works. However, in real-life scenarios these are not reliable, either because people do not have them running on their systems, or because the daemon only updates the time once in a while and the clock already drifts too much in between.

Users reported problems with that all the time, and experts kept advising that we would never get around these problems unless we changed something fundamental and moved away from purely time-based syncing.

Well, we did that with our csync version 0.60.0 which is the sync engine for ownCloud Client 1.1.0.

A Unique Id

Now every file and directory inside a sync directory has a unique Id associated with it. The idea is that the Id changes whenever the file changes. So in the sync process, the need for a file update in either direction can be determined by comparing the two Ids of the file. If the Id has changed on one repository, the file was changed there and needs to be synced to the other side.
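As a minimal sketch of that comparison (not the actual csync code): assume the client remembers the Id a file had after the last successful sync and can obtain the Id that is currently valid on each side. The function name, the return values and the explicit "conflict" outcome are assumptions made for this illustration.

```python
from typing import Optional


def sync_direction(synced_id: str, local_id: Optional[str], remote_id: str) -> str:
    """Decide what to do with one file by comparing Ids.

    synced_id: Id recorded after the last successful sync (hypothetical bookkeeping).
    local_id:  Id currently valid for the local copy, None if it is unknown.
    remote_id: Id currently reported by the server.
    """
    local_changed = local_id != synced_id
    remote_changed = remote_id != synced_id

    if local_changed and remote_changed:
        return "conflict"   # both sides changed since the last sync
    if remote_changed:
        return "download"   # server copy changed, bring it to the client
    if local_changed:
        return "upload"     # local copy changed, push it to the server
    return "none"           # Ids match everywhere, nothing to transfer
```

Note that no clock is involved in this decision: only Ids are compared for equality.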

The Ids are generated on the ownCloud server, and one challenge for the client is to always download the correct Id of a file. The Ids are just random tags for a file version. They are not derived from the file content as MD5 sums would be. Actually, we were frequently advised to use MD5 sums or a similar approach that digests the file’s content to detect updates. That would have come in very handy, because it means comparing file contents directly and, more importantly, it is reproducible on either side. The client would also have been able to recalculate the MD5 sum of local files and would not have depended on a local database with Ids that were pulled from the server before.

But we decided against hashes. Calculating MD5 sums is costly in terms of CPU and time, especially for large files. The CPU problem is small on clients, but not on servers that a lot of clients connect to. Even though the sums can be calculated during upload, the problem remains in cases where the server does not see the upload stream; think of the “mount my Dropbox” case.

For files on the ownCloud server, the Id is always updated when the file gets updated. On the client side, the last known Id of a file is stored in the client database. To detect local changes, that Id is invalidated if the file’s modification time has changed in the meantime.
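The client-side invalidation rule can be sketched like this. It is an illustration only: the record layout and the names DbEntry and valid_local_id are made up here and do not reflect the real client database schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DbEntry:
    """Hypothetical client database record for one file."""
    file_id: str   # Id pulled from the server at the last sync
    mtime: int     # local modification time seen at the last sync


def valid_local_id(entry: Optional[DbEntry], current_mtime: int) -> Optional[str]:
    """Return the stored Id, or None if it can no longer be trusted."""
    if entry is None:
        return None            # file is not in the database yet
    if entry.mtime != current_mtime:
        return None            # mtime differs: the file was touched locally
    return entry.file_id
```

In a real client, current_mtime would come from a stat call on the file. The modification time is only compared with an earlier value taken on the same machine, so in this sketch the clocks of client and server do not have to agree.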

Change Propagation

Another remarkable change in the 1.1.0 client is that change events in the file tree propagate up to the top directory on the ownCloud server, i.e. if a file changes in a directory, the Id of the directory changes as well, as does the Id of its parent directory, and so on.

That means that to detect whether a file tree has changed, it’s enough to check the topmost directory’s Id. If that has changed, OK, then the client needs to dig deeper. But in the not-so-rare case that nothing has changed, that one call is enough to detect it. This dramatically lowers the server load caused by clients: instead of digging through the whole directory structure, as we did in the 1.0.x series, it takes only a few requests now.
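To illustrate the effect, here is a toy model, not the ownCloud server or csync implementation. It assumes random Ids from uuid4, an in-memory tree, and hypothetical names (Node, touch, snapshot, find_changes). A change bumps the Id of the file and of every ancestor; the client then descends only into directories whose Id differs from the one it remembers.

```python
import uuid
from typing import Dict, List, Optional


def new_id() -> str:
    # Random tag with no relation to file content (uuid4 is an assumption).
    return uuid.uuid4().hex


class Node:
    """A file or directory in a toy server-side tree."""

    def __init__(self, name: str, parent: Optional["Node"] = None) -> None:
        self.name = name
        self.parent = parent
        self.children: Dict[str, "Node"] = {}
        self.id = new_id()

    def add(self, name: str) -> "Node":
        child = Node(name, parent=self)
        self.children[name] = child
        self.touch()  # a new entry changes this directory and its ancestors
        return child

    def touch(self) -> None:
        """Record a change: give this node and every ancestor a fresh Id."""
        node: Optional["Node"] = self
        while node is not None:
            node.id = new_id()
            node = node.parent


def snapshot(node: Node, prefix: str = "") -> Dict[str, str]:
    """Map every path to its current Id, like a client database after a sync."""
    path = f"{prefix}/{node.name}" if prefix else node.name
    ids = {path: node.id}
    for child in node.children.values():
        ids.update(snapshot(child, path))
    return ids


def find_changes(node: Node, known: Dict[str, str], prefix: str = "",
                 visited: Optional[List[str]] = None) -> List[str]:
    """Return changed leaf paths; `visited` collects every node that was checked."""
    path = f"{prefix}/{node.name}" if prefix else node.name
    if visited is not None:
        visited.append(path)
    if known.get(path) == node.id:
        return []          # Id unchanged: nothing below this node changed
    if not node.children:
        return [path]      # changed leaf (a file or an empty directory)
    changed: List[str] = []
    for child in node.children.values():
        changed.extend(find_changes(child, known, path, visited))
    return changed
```

With a tree root/docs/a.txt and root/pics/b.jpg, an unchanged tree is settled by checking root alone. After a.txt changes, the walk checks root, root/docs and root/docs/a.txt, looks at root/pics once, and never descends below it.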

CSync and ownCloud for Success

These are very intrusive changes to csync. For example, we had to add two additional fields to the database, add code that is able to build a representation of the local file tree from the database, and make csync query the server for file Ids if needed. Deep under the hood, the updater, reconciler and propagator code needed changes to work with the Ids. None of these changes have gone back to csync upstream yet.

To avoid conflicts with the upstream version of csync, we decided to rename our csync version to ocsync. But this is a temporary solution for the time we need to catch up with upstream again. It will take a while until everything is sorted out, but we will work on that.

I am very excited about the new version of csync. But obviously there are other changes in ownCloud Client 1.1.0, which will be the subject of another blog post.

Categories: FOSS, ownCloud, Release Tags: Concept, csync, mirall, owncloud, ownCloud Client, status, syncing, Unique Id

Comments (16)

Xapa0

October 11, 2012 at 17:52

Just out of curiosity about the integrity of synchronized files:
– How does OC know what to do when both copies of a file (the one stored in OC and the one in the local copy of an OC client) have been modified?
– How does OC know what to do when the same file has been modified in multiple local copies (for example, a file modified on a smartphone and on a PC… which one of these 2 modified versions will be uploaded to OC)?
- dragotin
  
  October 13, 2012 at 14:26
  
  In the first case, the newer file (according to mtimes, which get corrected by the time difference between oC and client) is synced to the other side, where the other, changed file is saved as a “conflict” file so that the user can decide what happens.
  
  In the second case, the syncing still happens in sequence, so it’s either two normal sync cases one after the other or the case above.
jstaniek

October 11, 2012 at 20:50

Klaas, so nice to read about algorithms on planetkde, it’s a rare topic 🙂 As always, nice read!
Thijs

October 11, 2012 at 22:21

Awesome! While toying around with ownCloud, it turns out that a smooth client sync is the make-or-break feature. And the server load with the old client was just a bit too much, so I ceased to actively sync my cloud. Thanks for making this kind of progress!

I hope that the spurious conflict file issue is now also resolved – but it would make sense to think that it is.
d-fens_

October 11, 2012 at 23:34

great work so far and nice insights!
still i’m waiting for block based delta syncing; then owncloud will be truly top notch 🙂
- dragotin
  
  October 13, 2012 at 14:27
  
  I will start right away to work on smoother handling of big files, stay tuned on this one. But that will only be the first step in that direction, not yet the final beauty. But: one step after the other.
Bart Noordervliet

October 12, 2012 at 07:52

That seems like a very graceful solution. Good job! I was wondering if this system also allows for delete propagation. If for instance I have 2 computers synced to my owncloud and I delete a file on one of the computers, would it be possible to have the file deleted from owncloud as well as the other computer?
- dragotin
  
  October 13, 2012 at 14:28
  
  exactly that should work today.
rpedrica

October 12, 2012 at 17:25

Hi Dragotin, thank you for the new client. I have some questions:

1. when you say ‘add fields to database’, are you referring to the OC database?
2. if yes, then are there any manual steps required when upgrading 4.0.x to 4.5.x beyond the steps listed in the Upgrade doc, to use the new client?
3. will an upgrade of the client from 1.0.x preserve the user’s existing config?
4. OC 4.5 requires 1.1 client, but can OC < 4.5 work with new client in timer mode/file mod time?
5. IDs are now used instead of file modification but does the client still sync on a timer or is fs notify available across platforms?

Thank you

Robby
- dragotin
  
  October 13, 2012 at 14:31
  
  1. No, I meant the client database. However, the server DB probably had to change as well for the Id, but I didn’t do that part. Please ask on the mailing list for details.
  2. No, the update happens automatically (on the client).
  3. Yes.
  4. No, oCC 1.1.0 does not work with oC < 4.5, yet oCC 1.0.x works with oC 4.5.
  5. FS notification is only available on Linux (inotify) so far. The rest is timer driven.
sen

October 12, 2012 at 19:03

As your post shows, file syncing is anything but trivial. How about an option (plugin?) to let the owncloud server watch the filesystem (via inotify), to also make it play well with any type of ftp, ssh etc. updates?

Then we could also use one of the really fast, tried and proven two- or n-way syncing solutions like unison (+sucsynct) or csync^2 to have the syncing part covered properly.
- dragotin
  
  October 13, 2012 at 14:32
  
  TBH I don’t quite get your idea. Please try to be more specific, and have that discussion with us on the mailing list. Thanks.
RandolphCarter

October 13, 2012 at 16:10

I’m very interested in this Unique Id: So how is this unique id now assembled? You say it is not a Hash value for the file, but I didn’t see any info on how it is determined (or i just didn’t see it). Could you share how that is done with us? Thanks!

One remark on a possible improvement: Couldn’t you do it similar to e.g. unison: Calculate the unique id as hash of the file, and store that along with the last modified date of the file, and only recalculate the hash if the modified date of the file has changed? That way you would have the benefit of the hash, but also wouldn’t have to recalculate the hash each time the file tree is scanned!
- dragotin
  
  November 14, 2012 at 14:54
  
  The unique Id is just a string formed of random numbers. It has no relation to the file content so far. We decided against a hash so far because it would be costly to compute and is not always easily possible, for example if we mount Dropboxes into the ownCloud server.
  - Etienne
    
    September 12, 2013 at 14:13
    
    Hello,
    in order to avoid a large download on a new computer (that I want to sync), I prefer to copy all my files to the new computer before starting the sync. How could I copy this unique Id to my new computer?
    Thank you!
    Etienne
Cyryl Sochacki

October 16, 2012 at 10:11

Missing translations for language Polish