
Posts Tagged ‘syncing’

ownCloud and CryFS

August 17, 2019

It is a great idea to encrypt files on the client side before uploading them to an ownCloud server if that server is not running in a controlled environment, or if one just wants to be defensive and minimize risk.

Some people think it is a great idea to include the functionality in the sync client.

I don’t agree, because it combines two very complex topics in one code base and makes the code difficult to maintain. The risk of ending up with a code base that nobody is able to maintain properly any more is high. So let’s rather avoid that for ownCloud and look for alternatives.

A good way is to use a so-called encrypted overlay filesystem and let ownCloud sync the encrypted files. The downside is that you cannot use the encrypted files in the web interface, because it cannot easily decrypt them. To me, that is not overly important, because I want to sync files between different clients, which is probably the most common use case.

Encrypted overlay filesystems put the encrypted data in one directory called the cipher directory. A decrypted representation of the data is mounted to a different directory, in which the user works.

This is easy to set up and use, and in principle also well suited to file sync software like ownCloud, because, unlike some other solutions, it does not store the files in one huge container file that needs to be synced whenever a single bit changes.

To use it, the cipher directory must be configured as the local sync directory of the client. If a file is changed in the mounted directory, the overlay filesystem changes the encrypted files in the cipher directory, and these are then synced by the ownCloud client.

One of the solutions I tried is CryFS. It works nicely in general, but is unfortunately very slow together with ownCloud sync.

The reason is that CryFS chunks all files in the cipher directory into 16 kB blocks, which are spread over a set of directories. That is very beneficial, because file names and sizes cannot be reconstructed from the cipher directory, but it hits one of the weak spots of ownCloud sync: ownCloud is traditionally a bit slow with many small files spread over many directories. That shows dramatically in a test with CryFS: adding eleven new files with an overall size of around 45 MB to a CryFS filesystem directory makes the ownCloud client upload for 6 minutes 30 seconds.
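To make the effect on the sync client concrete, here is a minimal Python sketch of splitting a file into fixed-size blocks with random names spread over directories. It is not CryFS’s actual on-disk format: the 16 kB block size comes from the observation above, while the 32-character random block IDs, the three-character directory prefix and the zero padding of the last block are assumptions for illustration, and encryption is left out entirely.

```python
import math
import secrets

BLOCK_SIZE = 16 * 1024  # 16 kB, as observed in the cipher directory
PREFIX_LEN = 3          # hypothetical: directory named after the first ID characters


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Split data into equal-sized blocks, zero-padding the last one.

    Equal sizes mean a single block reveals nothing about the size of the
    file it belongs to (encryption is omitted in this sketch).
    """
    count = max(1, math.ceil(len(data) / block_size))
    padded = data.ljust(count * block_size, b"\0")
    return [padded[i * block_size:(i + 1) * block_size] for i in range(count)]


def block_paths(blocks: list[bytes]) -> list[str]:
    """Give every block a random ID and place it in a prefix directory."""
    paths = []
    for _ in blocks:
        block_id = secrets.token_hex(16)  # 32 random hex characters
        paths.append(f"{block_id[:PREFIX_LEN]}/{block_id[PREFIX_LEN:]}")
    return paths


data = bytes(100 * 1024)  # a 100 kB file
blocks = split_into_blocks(data)
for path in block_paths(blocks):
    print(path)  # one random path per block, each a separate small file to sync
```

The same function turns the 45 MB test set into 2880 data blocks, each of which the ownCloud client has to discover and upload as an individual small file in an essentially random directory.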

Adding another four files with a total size of a bit over 1 MB results in an upload of 130 files and directories, with an overall size of 1.1 MB.

A typical use case such as changing an existing office text document locally is not that bad. Here, CryFS splits an 8.2 kB LibreOffice text document into three 16 kB files in three directories. When one word is inserted, CryFS needs to create three new directories in the cipher directory and uploads four new 16 kB blocks.

My personal conclusion: CryFS is an interesting project. It has a nice integration into the KDE desktop with Plasma Vault. Splitting files into equal-sized blocks is good, because it prevents guessing anything about the data from names and sizes. However, for syncing with ownCloud, it is not the best partner.

If there is a way to improve the situation, I would be eager to learn about it. Maybe the block size can be increased, or the number of directories limited?
Also, the upcoming ownCloud sync client version 2.6.0 again has optimizations in the discovery and propagation of changes, and I am sure that will improve the situation.
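A quick back-of-the-envelope calculation in Python shows what a larger block size would change for the 45 MB test. It only counts data blocks, ignores per-block overhead and directories, and assumes 1 MB = 1024 kB; the candidate block sizes are illustrative, not settings I have tested with CryFS.

```python
import math


def block_count(total_bytes: int, block_size: int) -> int:
    """Lower bound on the number of ciphertext blocks: data only, no overhead."""
    return math.ceil(total_bytes / block_size)


total = 45 * 1024 * 1024  # the eleven files from the test, about 45 MB
for kib in (16, 64, 256, 1024):
    print(f"{kib:>4} kB blocks -> {block_count(total, kib * 1024):>4} files")

# The measured upload took 6:30 minutes, i.e. 390 seconds.
print(f"{block_count(total, 16 * 1024) / 390:.1f} blocks per second at 16 kB")
```

This prints 2880, 720, 180 and 45 files for the four block sizes, and about 7.4 blocks per second for the measured run. The trade-off is that larger blocks leak more about file sizes and make a one-word edit re-upload more data per changed block.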

Let’s see what other alternatives can be found.

Categories: FOSS, KDE, ownCloud

ownCloud Chunking NG Part 2: Announcing an Upload

July 10, 2015

The first part of this little blog series explained the basic operations of chunked file upload as we set it up for discussion. This part goes a bit further and talks about an addition to that, called announcing the upload.

With the processing described in the first part of the blog, the upload is done safely and with a clean approach, but it also has some drawbacks.

Most notably, the server does not know the target file name of the uploaded file up front. It also does not know the final size or MIME type of the target file. That is not a problem in general, but imagine the following situation: a big file that would exceed the user’s quota is to be uploaded. The user would only get an error once all chunks have been uploaded and the upload directory is about to be moved to the final file name.

To avoid useless file transfers like that, or to implement features like a file firewall, it would be good if the server knew this data at the start of the upload and could stop the upload if it cannot be accepted.

To achieve that, the client creates a file called _meta in /uploads/ before the upload of the chunks starts. The file contains information such as overall size, target file name and other meta information.

The server’s reply to the PUT of the _meta file can be an error result code with an error description, indicating that the upload will not be accepted due to certain server conditions. The client should check the result code to avoid an unnecessary upload of data whose final MOVE would fail anyway.
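A minimal Python sketch of the announcement step follows. Nothing here is decided protocol: the JSON layout of `_meta`, its field names, the hypothetical `chunk-N` names and the choice of HTTP 507 (Insufficient Storage) for a quota failure are assumptions for illustration, and the server check is simulated locally instead of going over WebDAV.

```python
import json
from dataclasses import dataclass
from http import HTTPStatus


@dataclass
class Announcement:
    """Hypothetical content of the _meta file; field names are assumptions."""
    target_name: str
    total_size: int
    mimetype: str

    def to_json(self) -> str:
        return json.dumps({"target": self.target_name,
                           "size": self.total_size,
                           "mimetype": self.mimetype})


def server_check_meta(meta_json: str, free_quota: int) -> tuple[HTTPStatus, str]:
    """Simulated server reply to the PUT of _meta (not a real ownCloud API)."""
    meta = json.loads(meta_json)
    if meta["size"] > free_quota:
        return HTTPStatus.INSUFFICIENT_STORAGE, "upload would exceed the user's quota"
    return HTTPStatus.CREATED, "upload accepted"


def plan_upload(announcement: Announcement, chunk_count: int, free_quota: int) -> list[str]:
    """List the requests the client sends; chunks only follow a successful _meta PUT."""
    requests = ["PUT /uploads/_meta"]
    status, reason = server_check_meta(announcement.to_json(), free_quota)
    if status != HTTPStatus.CREATED:
        # Stop here: no chunk data is wasted on an upload whose final MOVE would fail.
        requests.append(f"abort: {status.value} {reason}")
        return requests
    requests += [f"PUT /uploads/chunk-{i}" for i in range(chunk_count)]  # hypothetical names
    requests.append(f"MOVE /uploads/ -> {announcement.target_name}")
    return requests


movie = Announcement("holiday.mkv", total_size=5_000_000, mimetype="video/x-matroska")
print(plan_upload(movie, chunk_count=3, free_quota=1_000_000))
# ['PUT /uploads/_meta', "abort: 507 upload would exceed the user's quota"]
```

The point of the design is visible in the early return: the quota decision costs one small request instead of a complete chunked upload.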

This is just a collection of ideas for an improved big file chunking protocol; nothing is decided yet. But now is the time to discuss. We’re looking forward to hearing your input.

The third and last part will describe how this plays into delta sync, which is especially interesting for big files, which are usually chunked.

ownCloud Client 1.8.0 Released

March 17, 2015

Today, we’re happy to release the best ownCloud Desktop Client ever to our community and users! It is ownCloud Client 1.8.0 and it will push syncing with ownCloud to a new level of performance, stability and convenience.

The Share Dialog

This release brings a new integration into the operating system’s file manager. With 1.8.0, there is a new context menu entry that opens a dialog allowing the user to create a public link for a synced file. This link can be forwarded to other users, who then get access to the file via ownCloud.

Also, the client’s behavior when syncing files that are opened by other applications on Windows has been greatly improved. The file locking problems that some users saw, for example with MS Office apps, were fixed.

Another area of improvement is, again, performance. With the latest ownCloud servers, the client uses even more parallelized requests, now for all kinds of operations. Depending on the synced data structure, this can make a huge difference.

All the other changes, improvements and bug fixes are too many to count. After all, this release contains around 700 git commits since the previous release.

All this is only possible with the powerful and awesome community of ownClouders. We received a lot of very good contributions through the GitHub tracker, which helped us nail down a lot of issues and improve the client tremendously.

But this time we’d like to specifically point out the code contributions of Alfie “Azelphur” Day and Roeland Jago Douma who contributed significant code bits to the sharing dialog on the client and also some server code.

A great thanks goes out to all of you who helped with this release. It was a great experience again and it is big fun working with you!

We hope you enjoy 1.8.0! Get it from https://owncloud.org/install/#desktop

ownCloud ETags and FileIDs

March 13, 2015

Questions often come up about the meaning of FileIDs and ETags. Both values are metadata that the ownCloud server stores in its database for each file and directory. These values are fundamentally important for the integrity of data in the overall system.
Here are some thoughts about what they are and why they are so important. This is mainly from a client’s point of view, but there are other use cases as well.

ETags

ETags are strings that describe exactly one specific version of a file (example: 71a89a94b0846d53c17905a940b1581e).

Whenever the file changes, the ownCloud server makes sure that the ETag of that file changes as well. It does not matter how the ETag changes, and it does not even have to be strictly unique; what matters is that it changes reliably whenever the file changes, for whatever reason. However, ETags should not change if the file was not changed, otherwise the client will download that file again.

In addition, the ETags of the parent directories of the file have to change as well, all the way up to the root directory. That way client systems can detect changes that happen anywhere in the file tree. This is in contrast to normal computer file systems, where only the modification time of the direct parent of a file changes.
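The propagation rule can be sketched in a few lines of Python. This is a toy in-memory model, not ownCloud server code: paths are plain strings, and a fresh random hex value stands in for however the real server generates ETags, which per the rules above only have to change reliably.

```python
import posixpath
import uuid


class MetadataStore:
    """Toy server-side store mapping a path to its current ETag (illustration only)."""

    def __init__(self) -> None:
        self.etags: dict[str, str] = {"/": uuid.uuid4().hex}

    def touch(self, path: str) -> None:
        """Assign a fresh ETag to `path`; only the fact that it changes matters."""
        self.etags[path] = uuid.uuid4().hex

    def file_changed(self, path: str) -> None:
        """Record a file change and propagate new ETags up to the root directory."""
        self.touch(path)
        parent = posixpath.dirname(path)
        while True:
            self.touch(parent)
            if parent == "/":
                break
            parent = posixpath.dirname(parent)


store = MetadataStore()
store.file_changed("/docs/report/a.odt")
root_before = store.etags["/"]
store.file_changed("/docs/report/a.odt")
print(store.etags["/"] != root_before)  # True: the root reflects the deep change
```

Sibling directories such as `/other` keep their ETags, so a client that remembers ETags can skip every branch of the tree that was not touched.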

File IDs

FileIDs are also strings that are created once at the creation time of the file (example: 00003867ocobzus5kn6s).

But contrary to ETags, FileIDs should never ever change over the file’s lifetime: not when the file is edited, and also not when it is renamed or moved. One of the important uses of the FileID is to detect renames and moves of a file on the server.

The FileID is used as a unique key to identify a file. FileIDs need to be unique within one ownCloud instance, and in connections between ownCloud instances they must be compared together with the ownCloud server’s instance id.

Also, the FileIDs must never be recycled or reused.
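The following Python sketch shows how a client could use FileIDs to tell a server-side rename or move apart from a deletion plus a new file. The data layout (a mapping from FileID to path, once as remembered from the previous sync and once as reported by the server now) and the instance id `oc1` are hypothetical; following the rule above, the key pairs the FileID with the server instance id.

```python
from typing import NamedTuple


class FileKey(NamedTuple):
    instance_id: str  # FileIDs are only unique within one ownCloud instance
    file_id: str


def classify(previous: dict[FileKey, str], current: dict[FileKey, str]) -> dict[str, list]:
    """Compare the last known server listing with the current one, keyed by FileID."""
    result: dict[str, list] = {"moved": [], "new": [], "removed": []}
    for key, path in current.items():
        if key not in previous:
            result["new"].append(path)
        elif previous[key] != path:
            # Same FileID, different path: a rename or move, no re-download needed.
            result["moved"].append((previous[key], path))
    for key, path in previous.items():
        if key not in current:
            result["removed"].append(path)
    return result


before = {FileKey("oc1", "00003867ocobzus5kn6s"): "/a.txt", FileKey("oc1", "id-b"): "/b.txt"}
after = {FileKey("oc1", "00003867ocobzus5kn6s"): "/docs/a.txt", FileKey("oc1", "id-c"): "/c.txt"}
print(classify(before, after))
# {'moved': [('/a.txt', '/docs/a.txt')], 'new': ['/c.txt'], 'removed': ['/b.txt']}
```

This also shows why recycled FileIDs are dangerous: if the server handed `id-b` to the unrelated new file `/c.txt`, `classify` would report a move and the client would rename the wrong local file.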

Checksums?

Often ETags and FileIDs are confused with checksums such as MD5 or SHA1 sums over the file content.

They are neither, even though there are similarities: the ETag in particular can be seen as a kind of checksum over the file content. However, file checksums are far more costly to compute than a value that merely has to change somehow.

What happens if…?

Let’s do a thought experiment and consider what it would mean, especially for sync clients, if either the FileID or the ETag got lost from the server’s database.

If ETags are lost, clients lose the ability to decide whether files have changed since they last checked. What happens then is that the client downloads the files again, compares them byte by byte with the local files and, if they differ, uses the server file and creates a conflict file. Because the ETags were lost, the server will create new ETags on download. This could be improved by the server creating more predictable ETags based on the storage backend’s capabilities.
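The recovery path just described can be sketched as follows. This is an illustrative Python model, not the client’s actual code: the downloaded server copy is passed in as bytes, the `journal` dict stands in for the client database, and the `_conflict` naming scheme is a hypothetical placeholder.

```python
from pathlib import Path


def recover_without_etag(local_path: Path, server_bytes: bytes, new_etag: str,
                         journal: dict[str, str]) -> str:
    """Resolve a file whose ETag is unknown by comparing contents byte by byte."""
    local_bytes = local_path.read_bytes()
    if local_bytes == server_bytes:
        journal[str(local_path)] = new_etag  # identical: only remember the new ETag
        return "unchanged"
    # Contents differ: keep the local version as a conflict file (hypothetical
    # naming scheme) and put the server version under the original name.
    conflict = local_path.with_name(f"{local_path.stem}_conflict{local_path.suffix}")
    local_path.rename(conflict)
    local_path.write_bytes(server_bytes)
    journal[str(local_path)] = new_etag
    return "conflict"
```

Either way no data is lost: identical files only cost a download, and differing files end up side by side.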

If the ETags change without reason, for example because a backup was played back on the server, the clients will consider the files with changed ETags as changed and download them again. Conflict handling happens as described if there was a local change as well.

For the user, this means a lot of unnecessary downloads as well as potential conflicts. However, there will not be data loss.

If FileIDs get lost or changed, the problem is that renames or moves on the server side can no longer be detected. In the good case, that results in files being downloaded again. If, however, a FileID changes to a value that was used before, that can result in a rename that overwrites an unrelated file, because clients might still have that FileID associated with another file.

Hopefully this little post explains the importance of the additional metadata that we maintain in ownCloud.

Workshop at CERN

November 27, 2014

Last week, Thomas, Christian and I attended a workshop at CERN, the European Organization for Nuclear Research, in Geneva, Switzerland.

CERN is a very inspiring place, attracting intelligent people from all over the world to get behind the secrets of our being. I felt honored to be at the place where for example the world wide web was invented.

The event was called Workshop on Cloud Services for File Synchronisation and Sharing and was hosted by the CERN IT department. There were around 100 attendees.

I was giving a talk called The File Sync Algorithm of the ownCloud Desktop Clients, which was very well received. If you happen to be interested in the sync algorithm we’re using, the slides are a nice starting point.

What amazed me most was the great atmosphere and the very positive attitude towards ownCloud. Many of the representatives of educational organizations using ownCloud whom I talked to were very happy with the product from a technical point of view (even though there are problems here and there). Many interesting setups and environments were presented, which also showcased ownCloud’s flexibility to integrate into existing structures.

The attendees of the workshop also pointed out how important it is that ownCloud is open source. Non-free software does not stand a chance at all in that market. That was the very clear statement in the final discussion session of the workshop.

The keynote, titled Principles of Synchronization, was given by Prof. Benjamin Pierce from the University of Pennsylvania. He is the lead author of Unison, another open source sync project. Its sync engine is of very high quality, but, as he said, it is not “up-to-date software” any more.

I had the pleasure to spend quite some time with him to discuss syncing in general and our sync algorithms in particular, amongst other interesting things.

Atlas Detectors

As part of his work, he uses a tool called QuickCheck for advanced property-based testing. One night we sat in the cafeteria there, hacking to adapt the testing to the ownCloud client and server. The first results were very promising: for example, we revealed a “problem” in our sync core that I already knew of, which formally is a sync error, but is very, very unlikely to happen and was thus accepted for the sake of a simpler algorithm. It was impressive how fast the testing method identified that problem.
I would like to follow up on this testing method.
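QuickCheck generates random inputs, checks that a stated property holds for all of them and shrinks failing inputs to a minimal counterexample. The Python sketch below imitates only the first two steps with the standard `random` module, against a deliberately simplified, hypothetical three-way sync function that ignores deletions; it is not the test setup we hacked together at CERN.

```python
import random


def sync(base: dict, local: dict, remote: dict) -> dict:
    """Toy three-way sync of path -> content maps (hypothetical rules).

    A side that changed a file since `base` wins; if both sides changed it
    differently, the remote version keeps the name and the local version is
    kept as a conflict copy.
    """
    merged = {}
    for path in set(base) | set(local) | set(remote):
        old, loc, rem = base.get(path), local.get(path), remote.get(path)
        if loc == rem or rem == old:
            merged[path] = loc
        elif loc == old:
            merged[path] = rem
        else:
            merged[path] = rem
            merged[path + ".conflict"] = loc
    return merged


def random_edits(base: dict, rng: random.Random) -> dict:
    """Derive a replica from `base` by random edits and additions."""
    replica = dict(base)
    for _ in range(rng.randint(0, 3)):
        replica[f"f{rng.randint(0, 4)}"] = f"v{rng.randint(0, 3)}"
    return replica


def find_counterexample(trials: int = 1000, seed: int = 1):
    """Property: no version that was changed on either side gets lost."""
    rng = random.Random(seed)
    for _ in range(trials):
        base = {f"f{i}": "v0" for i in range(rng.randint(0, 3))}
        local, remote = random_edits(base, rng), random_edits(base, rng)
        merged = sync(base, local, remote)
        for side in (local, remote):
            for path, content in side.items():
                if content != base.get(path) and content not in merged.values():
                    return base, local, remote  # failing input
    return None


print(find_counterexample())  # None: the property held for every generated case
```

Replacing the conflict branch with a plain `merged[path] = rem` breaks the property, and `find_counterexample` should then return a failing input, which is the kind of quick feedback the real tool gave us.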

Furthermore, we met a whole variety of other interesting people: backend developers, operators of the huge datasets (100 petabytes), the director of CERN IT, a maintainer of Scientific Linux and others.

We also had the chance to visit the Atlas experiment. It is huge and lies 100 meters below the surface. That is where the particles accelerated by the LHC collide, and it was great to have the chance to see it.

The trip was a great experience and very motivating for me, and I think it should be for all of us working on ownCloud. Frank really struck a nerve when he planted the idea, and together we have made a nice product of it so far.

Let’s do more of this cool stuff!

Categories: Event, FOSS, ownCloud

After the 1.4.0 ownCloud Client Release

September 11, 2013

As you might have heard, ownCloud Client 1.4.0 was released last week. It is available for all major desktop platforms from our sync clients page; have a look at the Changelog for details.

Danimo’s Visual Guide has already outlined the new features in the release, so there is no need to repeat them here. You should install and try it; that seems to be the opinion of many people who tried it.

People who shared their critical view of the client very publicly in the past are also much more pleased with 1.4.0 now. One example is a recent blog post on BITBlokes, a blog about all kinds of topics around FOSS. I read it regularly and often share its opinions. He concludes very positively about the 1.4.0 client.

It is good to see the positive feedback overall. From my engineering point of view, it shows a couple of things: the concentrated work we continuously do on all parts of ownCloud pays off. That is obvious, of course, but still nice to see. And our (also obvious) measures to improve code quality, such as the consistent use of continuous integration, code reviews and the like, help to improve quality.

“People are always excited if releases come with GUI changes!” I heard people say. Well, maybe, but that’s not the whole truth. For me, it proves once again how important UI design and UX are. As a knee-deep developer, I have an interesting relationship with all UX topics: I always have an opinion. Often a strong opinion. But the results coming out of that have not always been, well, the most optimal. Fortunately, on the client we work together with our UX guy Jan, and the positive feedback also shows how good that is for the software.

But enough of release pride. There is more work to do: The bug tracker is still not empty, the list of feature ideas is long. We will continue to focus on correctness, stability and robustness of syncing, performance and useful features and work on a version 1.5 for you.

These are a couple of concrete points we’re focussing on for 1.5:

  1. We have already merged the client code onto the new upstream sync version in git.
  2. Performance improvements through a further reduction of the number of requests and more efficient database operations on the client.
  3. We are working on a new propagator component that allows us to make the changes mentioned in 2 more easily.
  4. File manager integration, which means having icons in Explorer, Dolphin and friends.

A more detailed list can be found at github.

Thank you for all your help and support. It’s big fun!

ownCloud Client 1.2.0 beta1

December 21, 2012

2012 is slowly coming to an end and we are all looking forward to a few quiet days around Christmas. But we did not want to leave for the holidays without adding another thing to your vacation experience: I am happy to announce the first beta of the upcoming ownCloud Client release 1.2.0, ready now for you to test and enjoy under the tree.

This is the first build with the new things we did in Berlin a couple of weeks ago, you will

  • discover that there is much better error reporting if something goes wrong.
  • probably feel like it syncs faster, yes, faster.
  • see that there are fewer HTTP requests to the server for a single sync run.
  • no longer see any issues with MacOSX and funny characters in filenames.
  • recognize a new icon set, which is not finalized yet (actually, not all sizes are there yet, which is why the status dialog looks a bit funny), but we thought it’s nice to already add it to the beta. It should fit nicely into your operating system environment.
  • realize that this client comes with a cross-platform file system watcher on the client side, so no more polling.
  • have your password stored in a secure keychain on all platforms, since we added qtkeychain to the client.
Maybe there is more, but we thought that’s already a nice beta release.

Please find packages for MacOSX, Windows and Linux distributions. Note that not all packages are finished yet. If the one for your distro is missing, please come back later or, even better, speak up at packaging@owncloud.org and help fix it 🙂

Of course, please note that this is an early beta: do not use it without a good backup of your data, and only on a test account without important data.

We would appreciate it if you let us know about your experience on the mailing list. If you find problems, please report them to the client’s bug tracker, mentioning client and server versions and, ideally, including useful logs.

With that we are happily vanishing to spend some time away from the computer, looking back on a very exciting and very busy year, working on an interesting topic with a lot of nice people.

Thanks and best Season’s Greetings!

A week for csync

December 9, 2012

On Friday I arrived back from Berlin, where I had the pleasure of working for a week with my great colleague Danimo and our friends from Woboq, Markus and Olivier, at the Woboq headquarters in Berlin Kreuzberg.

We thought it might be fun to work together on csync, our sync engine under the hood of the ownCloud client. There were some issues that needed to be fixed, and along the way we cleaned up and improved quite a bit of code in csync.

Here are some things we worked on:

  • We added a function that lets the program using the csync library pass arbitrary module parameters to the backend module. That way it is easier to steer the behaviour of the ownCloud modules from the calling app.
  • Error handling was improved: if an HTTP error happens, csync now uses errno-based error reporting. We added custom errno values because not all HTTP error cases can be mapped to system errnos.
  • Formerly the csync ownCloud module spooled files through an additional temporary file on the client side. That step is skipped now, which results in better performance as well as cleaner code.
  • We were able to reduce the number of HTTP requests that go over the wire even further. For example, to check whether there are changes on the server side, exactly one HTTP PROPFIND is now required. Also, if files have to be synced, we could save some HTTP requests by improving the caching of some requests.
  • Andreas recently changed the logging system in the csync upstream master branch. We merged that back and no longer need the log4c framework: one build dependency less and a nice new logging framework.
  • Other bugs were fixed, such as a potential crash if a folder was deleted while it was being synced, shortcomings in SSL handling, streamlined code for handling compressed data streams and more.
  • We finalized a patch that unifies the UTF-8 representation of characters across all platforms. That will fix problems we saw especially with MacOS and special filenames.
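The filename problem comes from the fact that the same visible name can be encoded in more than one way: macOS (HFS+) stores names in a variant of the decomposed form NFD, while other platforms usually see the composed form. A minimal Python sketch of normalizing names before comparing them; choosing NFC as the common form is an assumption here, since the patch description above does not say which form it uses.

```python
import unicodedata


def normalized_name(name: str) -> str:
    """Map a file name to one canonical Unicode form (NFC assumed here)."""
    return unicodedata.normalize("NFC", name)


composed = "K\u00f6ln.txt"     # 'ö' as one code point
decomposed = "Ko\u0308ln.txt"  # 'o' followed by a combining diaeresis
print(composed == decomposed)                                    # False
print(normalized_name(composed) == normalized_name(decomposed))  # True
print(len(composed.encode("utf-8")), len(decomposed.encode("utf-8")))  # 9 10
```

Without normalization, a sync engine compares the raw strings and treats the two spellings as different files, which is exactly the kind of problem seen with special filenames on MacOS.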

Ah, yes, we also did other things, more related to the ownCloud client. Danimo managed to implement a cross-platform filesystem watcher class that fulfills our requirements. That makes polling for changes on the local file system obsolete, which was one of the most popular enhancement requests.

And finally, csync now has an API that reports file transfer progress if a corresponding callback is installed. So the client will hopefully soon tell you what it is doing for you. Also appreciated, I guess…
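A callback-based progress API of that kind can be sketched as follows. This is Python rather than csync’s C API, and the function and parameter names (`transfer`, `progress`, `chunk_size`) are hypothetical; it only shows the pattern: the transfer code calls the installed callback after each chunk and does nothing extra if no callback is installed.

```python
import io
from typing import BinaryIO, Callable, Optional

ProgressCallback = Callable[[str, int, int], None]  # (file name, bytes done, total)


def transfer(name: str, source: BinaryIO, sink: BinaryIO, total: int,
             progress: Optional[ProgressCallback] = None,
             chunk_size: int = 64 * 1024) -> int:
    """Copy source to sink, reporting progress if a callback is installed."""
    done = 0
    while chunk := source.read(chunk_size):
        sink.write(chunk)
        done += len(chunk)
        if progress is not None:
            progress(name, done, total)
    return done


data = b"x" * 200_000
transfer("photo.jpg", io.BytesIO(data), io.BytesIO(), len(data),
         progress=lambda n, d, t: print(f"{n}: {100 * d // t}%"))
```

Keeping the callback optional means that callers who do not care about progress, such as command line tools, pay nothing for it.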

Last but not least, we added code to use QtKeyChain, a cross-platform password storage library that stores passwords encrypted. For example, on Linux QtKeyChain connects to KWallet. QtKeyChain was provided by Frank Osterfeld; thanks a lot for that contribution.

Quite a lot for a short week, and note that something that fills a short line in this blog can be quite tricky to investigate, implement and test. Not everything is stable, polished and properly integrated yet, but it was a great and productive week. The next release of ownCloud Client will be a nice one.

And since you cannot always work, we had a nice dinner at a very cool Italian restaurant, where we met other ownCloud employees located in Berlin, Arthur and Georg. Fun 🙂 And Berlin, yes, a great place to be, but in the end I was glad to arrive back at my snow-covered home.

Many thanks to Olivier and Markus for hosting us and for the nice week.

Csync for ownCloud Client 1.1.0 – A New Sync Engine

October 11, 2012

Along with today’s ownCloud 4.5 release, we released the new ownCloud Client 1.1.0 with a new syncing concept.

This blog will shed some light on the details. I apologize, it’s a long read.

Time Issues

ownCloud Client versions 1.0.x worked with csync’s traditional way of using file modification times to detect updates between the two repositories that are synced with each other. That works fine and fits our idea of ideally not using any metadata in syncing other than what the file system has anyway.


However, there is one drawback which we all know from daily life: if two or more parties synchronize based on time, it is important that all clocks are set exactly the same. Remember good crime movies where a bank robbery always starts with all the gangsters synchronizing their watches? We have exactly the same in ownCloud’s syncing: all parties involved must have the same time setting, otherwise modification times of files cannot be compared reliably.

There are solutions for setting computer clocks to the exact time (like NTP), so in general that works. However, in real-life scenarios they are not reliable, either because people do not have them running on their systems or because the daemon only updates the time once in a while, and the clock already drifts too much in between.
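To see why skewed clocks break purely time-based syncing, here is a small Python sketch of a naive “newer modification time wins” rule. The timestamps and the five-minute skew are made-up values for illustration.

```python
from datetime import datetime, timedelta


def newer_side(local_mtime: datetime, server_mtime: datetime) -> str:
    """Naive time-based rule: the side with the later modification time wins."""
    return "local" if local_mtime > server_mtime else "server"


true_local_edit = datetime(2012, 10, 11, 12, 0)   # user edits the file locally
true_server_edit = datetime(2012, 10, 11, 12, 3)  # a later edit on the server
server_skew = timedelta(minutes=-5)               # server clock runs five minutes late

winner = newer_side(true_local_edit, true_server_edit + server_skew)
print(winner)  # local: the older edit wins and the newer server edit is overwritten
```

With correct clocks the same rule picks the server version; a skew of a few minutes is enough to silently invert the decision.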

Users kept reporting problems with that, and experts kept advising that we would never get around these problems unless we changed something fundamental and moved away from purely time-based syncing.

Well, we did that with our csync version 0.60.0 which is the sync engine for ownCloud Client 1.1.0.

A Unique Id

Now every file and directory inside a sync directory has a unique Id associated with it. The idea is that the Id changes whenever the file changes. So in the sync process, the need for a file update in either direction can be computed by comparing the two Ids of the file: if the Id has changed on one repository, the file was changed there and needs to be synced to the other side.

The Ids are generated on the ownCloud server, and one challenge for the client is to always download the correct Id of a file. The Ids are just random tags for a file version; they are not derived from the file content the way MD5 sums would be. Using MD5 sums or a similar approach that digests the file content to detect updates was actually frequently advised. That would have been very handy, because it means comparing file contents directly and, more importantly, it is reproducible on either side. Also, the client would have been able to recalculate the MD5 sums of the local files and would not have depended on a local database with Ids that were pulled from the server before.

But we decided against hashes. Calculating MD5 sums is costly in terms of CPU and time, especially for large files. The CPU problem is small on clients, but not on servers to which a lot of clients connect. Even though the sums can be calculated during upload, the problem remains for the case where the server does not see the upload stream; think of the “mount my Dropbox” case.

For files on the ownCloud server, the Id is always updated when the file gets updated. On the client side, the last Id of a file is stored in the client database. It is invalidated if the file’s modification time has changed in the meantime, to detect local changes.
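Put together, the update detection described in this section can be sketched in Python as below. The record layout of the client database (last known Id plus the local modification time seen at that point) is an assumption for illustration, not csync’s actual schema.

```python
from dataclasses import dataclass


@dataclass
class DbRecord:
    """What the client remembers from the last sync (hypothetical layout)."""
    file_id: str  # Id the server reported for this file version
    mtime: int    # local modification time recorded at the last sync


def detect(record: DbRecord, local_mtime: int, server_id: str) -> str:
    """Decide the sync direction by comparing Ids; mtime is only compared locally."""
    local_changed = local_mtime != record.mtime  # a changed mtime invalidates the Id
    server_changed = server_id != record.file_id
    if local_changed and server_changed:
        return "conflict"
    if local_changed:
        return "upload"
    if server_changed:
        return "download"
    return "none"


record = DbRecord(file_id="abc", mtime=100)
print(detect(record, local_mtime=100, server_id="xyz"))  # download
```

The modification time is only ever compared with an earlier value from the same machine, so a skewed clock between client and server no longer matters.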

Change Propagation

Another remarkable change in the 1.1.0 client is that change events in the file tree propagate up to the top directory on the ownCloud server, i.e. if a file changes in a directory, the Id of that directory changes, as well as the Id of its parent directory, and so on.

That means that to detect whether a file tree has changed, it is enough to check the Id of the topmost directory. If that has changed, the client needs to dig deeper, but in the not so rare case that nothing has changed, that one call is enough to detect it. That dramatically lowers the server load caused by clients, because instead of digging through the whole directory structure, as we did with the 1.0.x series, it is only a few requests now.
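The effect on the number of requests can be sketched in Python: a toy client walks the server tree top-down and only descends into directories whose Id differs from the one it remembers. Counting one request per visited directory stands in for one PROPFIND; the tree layout and the Ids are made up.

```python
def discover(ids: dict[str, str], children: dict[str, list[str]],
             known: dict[str, str], path: str = "/") -> tuple[list[str], int]:
    """Walk the server tree top-down, descending only where an Id changed.

    Returns the changed paths and the number of requests needed, counting one
    request per visited directory (standing in for one PROPFIND).
    """
    requests = 1
    if known.get(path) == ids[path]:
        return [], requests  # unchanged Id: nothing below this directory changed
    changed = [path]
    for child in children.get(path, []):
        if known.get(child) == ids[child]:
            continue  # unchanged file or directory: skip it entirely
        if child in children:  # a changed directory: dig deeper
            sub_changed, sub_requests = discover(ids, children, known, child)
            changed += sub_changed
            requests += sub_requests
        else:
            changed.append(child)
    return changed, requests


# Hypothetical tree: only /b/y changed since the last sync.
ids = {"/": "r2", "/a": "a1", "/a/x": "x1", "/b": "b2", "/b/y": "y2"}
children = {"/": ["/a", "/b"], "/a": ["/a/x"], "/b": ["/b/y"]}
known = {"/": "r1", "/a": "a1", "/a/x": "x1", "/b": "b1", "/b/y": "y1"}
print(discover(ids, children, known))      # (['/', '/b', '/b/y'], 2)
print(discover(ids, children, dict(ids)))  # ([], 1)
```

The unchanged branch `/a` is never listed, and when nothing changed at all, the single request for the root is enough.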

CSync and ownCloud for Success

These are very intrusive changes to csync. For example, we had to add two additional fields to the database, add code that is able to build a representation of the local file tree from the database, and make csync query the server for the file Ids if needed. Deep under the hood, the updater, reconciler and propagator code needed changes to work with the Ids. None of these changes have gone back to csync upstream yet.

To avoid conflicting with the upstream version of csync, we decided to rename our csync version to ocsync. But this is a temporary solution until we catch up with upstream again. It will take a while until everything is sorted out, but we will work on that.

I am very excited about the new version of csync. But obviously there are other changes in the ownCloud Client 1.1.0, which will be the subject of another blog post.