Archive for the ‘Opinion’ Category

Open Search Foundation

June 14, 2020

Recently I learned about the Open Search Foundation on public broadcast radio (Bayern 2 Radio Article). That surprised me: I had not heard about the OSF before, even though I am active in the field of free software and culture. Yet this new foundation has already made it into mainstream broadcasting. Reason enough to take a closer look.

It is a very good sign to have the topic of internet search in the news. One company holds a gigantic market share in search, which is a real threat to the freedom of internet users. Being findable on the web is the key to success for whatever message or service a website offers, and all of that is controlled by a single enterprise driven by commercial interests. A broad audience should be aware of that.

The Open Search Foundation has the clear vision of building up a publicly owned search index as an alternative for Europe.

Geographical and Political Focus

The whitepaper talks about working on the search engine specifically for Europe. It mentions that there are search indexes in the US, China and Russia, but none rooted in Europe. While this is a geographical statement in the first place, it is of course also a political one, because some of the existing services are probably politically controlled.

It is good to start with a focus on Europe, but the idea of a free and publicly controlled project should not be limited to Europe’s borders. In fact, it will not stop there if it is attractive, because it might offer a way to escape from potentially controlled systems.

On the other hand, Europe (as opposed to any single European country alone) seems like a good base from which to start this huge effort, as it is able to come up with the needed resources.

Organization

The founding members of the Open Search Foundation are not well-known figures from the wider open source community. That is good, as it shows that the topics around a free internet do not only concern nerds in the typical communities, but also people who work for an open and future-proof society in other areas like academia, research and medicine.

On the other hand, an organization like, for example, Wikimedia e.V. might have been a more obvious candidate to address this topic. Neither on the website nor in the whitepaper did I find mentions of any of the “usual suspects” or of other organizations and companies who have already tried to set up alternative indices. I wonder if there have been discussions, cooperation or plans to work together?

I am very curious to see how the collaboration between the more “traditional” open data/open source community and the Open Search Foundation will develop, as I think it is crucial to bring all players in this area together without falling into the trap of endless discussions that never achieve tangible results. It is a question of building an efficient community.

Pillars of Success

Does the idea of the OSF have a realistic chance to succeed? The following four pillars might play an important role in the success of the idea of building a free search index for the internet:

1. Licenses and Governance

The legal framework has to be well defined and thought through, so that it remains resilient in the long term. As we are talking about a huge commercial potential in controlling this index, parties might well try to gain control of it.

Only a strong governance and legal framework can ensure that the idea lasts.

The OSF mentions in the whitepaper that setting this up is one of its first steps.

2. Resources

A search index requires large amounts of computing power in the wider sense, including storage, networking, redundancy and so on. Additionally, there need to be people who take care of all that, which in turn requires financial support for staffing, marketing, legal advice and more.

The whitepaper mentions ideas to collect the computing power from academia or from company donations.

For the financial backing, the OSF will have to find sources like EC money, funding from governments and academia, and maybe private fundraising. Organizations like Wikimedia already have experience with that.

If that is not enough, the idea of selling better search results for money or offering SEO help to fund development will quickly come up. These will be interesting discussions that require the strong governance mentioned above.

3. Technical Excellence

Who will use a search index that does not come up with reasonable search results?
To be able to compete with the existing solutions, which have even made it into our daily communication habits, the service needs to be simply great in terms of search results and user experience.

Many existing approaches that use the Google index as a backend have already shown that even then it is not easy to provide comparable results.

Users of the commercial competition trade their personal data for optimal search results, even if they don’t do so consciously. That is more difficult for a privacy-oriented service, so this is another handicap.

The whitepaper mentions ideas on how to work on this huge task and also accepts that it will be challenging. But that is no reason not to try. We all know plenty of examples where these kinds of tasks succeeded even though nobody believed in them at the beginning.

4. Community

To achieve all these points, a strong community is a key factor.

There need to be people who do technical work like administering the data centers, developers who code, technical writers for documentation, translators and much more. But that is only the technical part.

For financial, marketing and legal support, other people are needed, not to mention political lobbying and the like.

All these parts have to be built up, managed and kept intact long term.

The Linux kernel, which was mentioned as a model in the whitepaper, is different. Not even the technical work is comparable between the free search index and the Linux kernel.

The long term stable development of the Linux kernel is based on people who work full time on the kernel while being employed by certain companies who are actually competitors. But on the kernel, they collaborate.

This way, the companies share the cost of inevitable base development work. Their differentiators in the market do not depend on their work on the kernel, but on the layers above the kernel.

How does that look for the OSF? I fail to see how enough sustainable business can be built on an open, privacy-respecting search index for companies to be happy to fund engineers working on it.

Apart from that, the kernel had the benefit of strong companies like Red Hat, SUSE and IBM pushing Linux in the early days, so no special marketing budgets etc. were needed for the kernel specifically. That, too, is different for the OSF, as quite some money for marketing and community management will be required to get started.

Conclusion

Building a lasting, productive and well-established community will, in my opinion, be the vital question for the whole project. Offering a great idea, which this initiative without question is, will not be enough to motivate people to participate long term.

There has to be an interesting offer for potential contributors at all levels, from individuals and companies contributing work, to universities donating hardware, to governments and the European Community providing money. There needs to be some kind of benefit they gain from their engagement in the project. It will be interesting to see if the OSF can come up with a model that gets this kickstarted.

I very much hope that this gets traction, as it would be an important step towards a freer internet again. And I also hope that there will be collaboration on this topic with the traditional free culture communities and their foundations.

Categories: FOSS, KDE, Opinion

Eighty Percent ownCloud

December 23, 2018

Recently the German computer magazine C’t published an article about file sync solutions with native sync clients (“Unter eigener Regie”, C’t 23, 2018). The article was pretty positive about the FOSS solution of… Nextcloud! I was wondering why they had not chosen ownCloud’s client, as my feeling is that ownCloud is far more active and innovative in developing the desktop client for file synchronization together with the community.

[Figure: Code lines changed as of Nov. 10, 2018]

That motivated me to investigate what the Nextcloud client actually consists of (as of Nov. 10, 2018). I looked into the Nextcloud desktop client git repository and grouped the number of commits by people who can clearly be associated with either the ownCloud or the Nextcloud project, with “other communities”, or with machine commits. Since the number of commits can be misleading (maybe some commits are huge?), I did the same exercise with the number of changed lines of code.
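To make the method concrete, here is a minimal sketch of that kind of analysis in Python, assuming a local clone of the client repository. The repository path, the limit of 15 authors and the output format are arbitrary choices; grouping the authors into the ownCloud, Nextcloud, “other community” and machine buckets is a manual step that is not shown.

```python
# Minimal sketch: count commits and changed lines per author in a git repo.
import subprocess
from collections import defaultdict

def author_stats(repo_path):
    """Return {author: (number_of_commits, lines_changed)} for a repository."""
    log = subprocess.run(
        ["git", "-C", repo_path, "log", "--numstat", "--format=@%aN"],
        capture_output=True, text=True, check=True,
    ).stdout

    commits = defaultdict(int)
    lines = defaultdict(int)
    author = None
    for line in log.splitlines():
        if line.startswith("@"):            # one marker line per commit
            author = line[1:]
            commits[author] += 1
        elif line.strip() and author is not None:
            added, deleted, _path = line.split("\t", 2)
            if added != "-":                # "-" marks binary files in --numstat
                lines[author] += int(added) + int(deleted)
    return {a: (commits[a], lines[a]) for a in commits}

if __name__ == "__main__":
    stats = author_stats("nextcloud-desktop")   # path to a local clone
    ranked = sorted(stats.items(), key=lambda kv: kv[1][1], reverse=True)
    for author, (n_commits, n_lines) in ranked[:15]:
        print(f"{author:30.30} {n_commits:6d} commits {n_lines:9d} lines changed")
```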

Looking at the changed lines, the top six contributors to the Nextcloud desktop client are only active in the ownCloud project. Number seven is an “other community” contributor whose project the client was originally based on. Numbers eight to eleven go to Nextcloud, each with a low percentage.

[Figure: Number of commits to the Nextcloud desktop repository as of Nov. 10, 2018]

As a result, far more than 80% of the changed lines in the Nextcloud client are actually work done by ownClouders (not counting the machine commits), in the past and also today. The number would be even higher if it included all the commits that went into the Nextcloud repository with a Nextcloud author but are actually ownCloud patches whose original author got lost along the way when they were merged through a Nextcloud branch. It looks like the Nextcloud developers have so far added fewer commits to their client than all “other community” developers combined.

No wonder, it is a fork, you might think, and that is of course true. However, to my taste these numbers do not reflect a “constructive” fork that drives things forward when it comes to sync technology.

That is all fine, and I am proud that the work we do in ownCloud actually stimulates two projects, which nowadays have different focus areas. On the other hand, I would appreciate it if the users of the technology took a closer look to understand who really innovates, drives things forward and also fixes the nasty bugs in the stack. As a matter of fairness, that should be acknowledged. That is the motivation that keeps free software contributors busy and communities proud.

Change in Professional Life

November 23, 2018

This November has been very exciting for me so far, as I started a new job at a company called Heidolph. I left SUSE after working there for another two years. My role there was pretty far away from interesting technical work, which I missed more and more, so I decided to grab the opportunity and join a new adventure.

Heidolph is a mature German engineering company building premium laboratory equipment. It is based in Schwabach, Germany. For me it is the first time that I am working in a company that does not only do software. At Heidolph, software is just one building block besides mechanical and electronic parts and tons of specialized know-how. That is a very different situation with a lot for me to learn, but in a small, co-located team of great engineers I am able to catch up fast in this interesting area.

We build software for the next generation of Heidolph devices based on Linux and C++/Qt. Both technologies are at the center of my interest; over the years it has become more than clear to me that I want to continue with them and deepen my knowledge even more.

Since the meaning of open source has changed a lot since I started to contribute to free software and KDE in particular, it was a noticeable but not difficult step for me to move away from a self-proclaimed open source company towards a company that uses open source technologies as one part of its tooling and is interested in learning about the processes we use in open source to build great products. It is an exciting move for me, where I will learn a lot but also benefit from my experience. This of course does not mean that I will stop contributing to open source projects.

We are still building up the team and are looking for a Software Quality Engineer. If you are interested in working with us in an exciting environment, you might want to get in touch.

Categories: FOSS, KDE, Opinion, Qt

Kraft out of KDE

March 22, 2018

Following my last blog post about Kraft’s upcoming release 0.80, I got a lot of positive reactions.

There was one reaction, however, that puzzled me a bit, and I want to share my thoughts here. It was a comment on my announcement that I prefer to continue developing Kraft on Github. The commenter reminded me in a friendly way that there is still Kraft code on KDE infrastructure, and that switching to a different repository might waste people’s time when they work with the KDE repo.

That is a fair statement; of course I don’t want to waste people’s time. What sounds a bit strange to me is the second paragraph, which says that if I decide to stay with Github, I should let the KDE people know that I wish Kraft to no longer be a KDE project.

But … I never felt that Kraft should not be a KDE project any more.

A little History

Kraft has come a long way together with KDE. I started Kraft in (probably) 2004, gave a talk about it at Akademy 2006 in Dublin, and have maintained it with the best effort I could contribute until today. There is a small but loyal community around Kraft.

During all that time I received little substantial contribution to the code directly, with the exception of one cool developer who got interested for some time and made some very interesting contributions.

When I asked for the subdomain http://kraft.kde.org a long time ago, I got the reply that it is not in the interest of KDE to give every little project a subdomain. As a result I registered http://volle-kraft-voraus.de and have run it since then, happily showing a “Part of the KDE family” logo on it.

Besides the indirect contributions to libraries that Kraft uses, I shipped Kraft with the translations made by the KDE i18n team, for which I was always very grateful. Otherwise I received no other services from KDE.

Why Github?

Github’s workflow serves me well in my day job, and since I have only little time for Kraft, I like to use the tools that I know best and that give me the most efficiency.

I know that Github is not free software, and I am sceptical about that. But Github also does not lock us in, as we are still on git. We all know the arguments that usually come to the table at this point, so I am not elaborating here. One thing I do want to mention is that since I moved to Github publicly, I have already received two little pull requests with code contributions. That is a lot compared to what came in during the last twelve years of living on KDE infrastructure only.

Summary

Kraft is a small project, driven by me alone. My development turnaround is good with Github, as I am used to it. Even if no KDE developer ever looked at Github (which I know is not true), I have to say with a heavy heart that Kraft would not take much harm from leaving KDE’s infrastructure, based on the experience of the last 12 years.

If the KDE translation teams do not want to work with Github, I am fine with accepting that, and I wonder if there could be a solution other than switching to Transifex.

One point, however, I would like to make very clear: I did not wish to leave KDE, nor did I aim to move Kraft out.
I still have friends in the KDE community, I am still very interested in free software on the desktop and elsewhere, and my opinion is still that KDE is the best around.

If the KDE community feels that Kraft must not be a KDE project any longer because it is on Github, that is ok. I asked the KDE sysadmins to remove Kraft from the KDE git, and it has already been done.

Kraft now lives on, on Github.

Categories: FOSS, KDE, Kraft, Opinion, Qt

Raspberry based Private Cloud?

December 11, 2016

Here is something that might be a little outdated already, but I hope it still adds some interesting thoughts. Today’s rainy Sunday afternoon finally gives me the opportunity to write this little blog post.

Recently an ownCloud fork came up with a shiny little box with one hard disk that can be complemented with a Raspberry Pi and their software, promoting that as your private cloud.

While I like the idea of building a private cloud for everybody (I started to work on ownCloud because of that idea back in the day), I do not think that this kind of gear is a good solution for a private cloud.

In fact I believe that throwing this kind of implementation on the table is especially unfortunate, because if we come up with too many suboptimal proposals, we use up the willingness of users to try them. This idea should not target geeks who are willing to try ideas over and over. The idea of the private cloud needs to target every computer user who wants to store data safely but does not want to care about it any longer than necessary. And with them, I fear, we have only very few chances, if any at all, to introduce a private cloud solution before they go back to something that simply works.

Here are some points why I think solutions like the proposed one are not good enough:

Hardware

That is nothing new: the hardware of the Raspberry Pi was not designed for this kind of use case. It is simply too weak to drive ownCloud, which is a PHP app plus a database server with real demands on the server’s power. Even with PHP 7, which is faster, and the latest revisions of the mini computer, it might look ok in the beginning, but once all the necessary bells and whistles have been added to the installation and data has accumulated, it will turn out that the CPU power is simply not enough. Similar weaknesses apply to the networking capabilities, for example.

A user who finds that out after a couple of weeks of working with the system will be left angry and will probably go (back) to solutions that we do not fancy.

One Disk Setup

The solution comes as a one-disk setup: how safe can data be that sits on one single hard disk? A seriously engineered solution should at least recommend a way to store the data more securely and/or back it up, for example on a NAS at home.
That can be done, but it requires manual work and might require more network capability and CPU power.

Advanced Networking

Last, but for me the most important point: having such a box in the private network requires drilling a hole in the firewall to allow port forwarding. I know, that is nothing unusual for experienced people, and in theory it is a small problem.

But for people who are not so interested, that means they have to click a button in their router’s interface without understanding what it does, and maybe even enter data by following documentation that they simply have to trust. (That is not very different from downloading a script from somewhere and letting it make the changes, which I would not recommend either.)
Making mistakes here could have a huge impact on the network behind the router, without the person who made them even understanding it.

DynDNS is also needed: again, not a big problem in theory and for geeks, but in practice it is not easily done.

A good private cloud solution should not require that kind of setup.

Where to go from here?

There should be better ways to solve these problems with ownCloud, and I am sure ownCloud is the right tool for them. I will share some thought experiments that we did a while back to foster discussion on how we can use the Raspberry Pi with ownCloud (because it is a very attractive piece of hardware) and still solve these problems.

This will be the subject of an upcoming blog post here, so please stay tuned.

 

Categories: FOSS, Opinion, ownCloud

ownCloud Chunking NG Part 3: Incremental Syncing

November 13, 2015

This is the third and final part of a little blog series about a new chunking algorithm that we discussed in ownCloud. You might also be interested in reading the first two parts, ownCloud Chunking NG and Announcing an Upload.

This part sketches a couple of ideas for how the new chunking could be useful for a future incremental sync feature (also called delta sync) in ownCloud.

In preparation for delta sync, the server could provide another new WebDAV route: remote.php/dav/blocks.

For each file, remote.php/dav/blocks/file-id exists as long as the server has valid checksums for the blocks of the file, which is identified by its unique file id.

A successful reply to remote.php/dav/blocks/file-id returns a JSON-formatted data block with byte ranges and the respective checksums (and the checksum type) over the data blocks of the file. The client can use that information to calculate which blocks of data have changed and thus need to be uploaded.
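To illustrate, a reply for a file split into three blocks could look something like the sketch below. This is purely hypothetical; the actual field names and layout have not been specified anywhere yet.

```
{
  "checksumType": "SHA1",
  "blocks": [
    { "number": 0, "range": "0-10485759",        "checksum": "…" },
    { "number": 1, "range": "10485760-20971519", "checksum": "…" },
    { "number": 2, "range": "20971520-26214399", "checksum": "…" }
  ]
}
```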

If a file was changed on the server and the checksums are as a result no longer valid, access to remote.php/dav/blocks/file-id returns the 404 "not found" status code. The client needs to be able to handle missing checksum information at any time.

The server receives the checksums of the file blocks along with the upload of the chunks from the client. The server has no obligation to calculate checksums for data blocks that came in through channels other than the clients, yet it can do so if there is capacity.

To implement incremental sync, the following high-level processing could be used:

  1. The client downloads the blocklist of the file: GET remote.php/dav/blocks/file-id
  2. If GET succeeded: Client computes the local blocklist and computes changes
  3. If GET failed: All blocks of the file have to be uploaded.
  4. Client sends request MKCOL /uploads/transfer-id as described in an earlier part of the blog.
  5. For blocks that have changed: PUT data to /uploads/transfer-id/part-no
  6. For blocks that have NOT changed: COPY /blocks/file-id/block-no /uploads/transfer-id/part-no
  7. If all blocks are handled by either being uploaded or copied: Client sends MOVE /uploads/transfer-id /path/to/target-file to finalize the upload.

This would be an extension to the previously described upload of complete files. The PROPFIND semantic on /uploads/transfer-id remains valid.
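To make this more concrete, here is a rough client-side sketch in Python. It assumes the routes described in this series, the hypothetical JSON layout from above, SHA1 block checksums and a fixed block size, and it leaves out authentication and error handling; it illustrates the flow and is not actual ownCloud client code.

```python
# Rough sketch of the incremental upload flow. Routes, JSON field names and
# block size are illustrative assumptions only.
import hashlib
import requests

BASE = "https://cloud.example.com/remote.php/dav"   # hypothetical server
BLOCK_SIZE = 10 * 1024 * 1024                       # arbitrary choice

def compute_blocklist(path):
    """Return {block_no: (offset, length, sha1_checksum)} for a local file."""
    blocks, offset, no = {}, 0, 0
    with open(path, "rb") as f:
        while chunk := f.read(BLOCK_SIZE):
            blocks[no] = (offset, len(chunk), hashlib.sha1(chunk).hexdigest())
            offset += len(chunk)
            no += 1
    return blocks

def incremental_upload(session, local_path, file_id, transfer_id, target_path):
    # 1. Download the blocklist of the file.
    reply = session.get(f"{BASE}/blocks/{file_id}")
    local_blocks = compute_blocklist(local_path)

    # 2./3. Compute changed blocks, or upload everything if there is no blocklist.
    if reply.status_code == 200:
        remote = {b["number"]: b["checksum"] for b in reply.json()["blocks"]}
        changed = {no for no, (_, _, csum) in local_blocks.items()
                   if remote.get(no) != csum}
    else:
        changed = set(local_blocks)

    # 4. Announce the upload.
    session.request("MKCOL", f"{BASE}/uploads/{transfer_id}")

    # 5./6. PUT changed blocks, COPY unchanged ones on the server side.
    with open(local_path, "rb") as f:
        for no, (offset, length, _) in local_blocks.items():
            part_url = f"{BASE}/uploads/{transfer_id}/{no}"
            if no in changed:
                f.seek(offset)
                session.put(part_url, data=f.read(length))
            else:
                session.request("COPY", f"{BASE}/blocks/{file_id}/{no}",
                                headers={"Destination": part_url})

    # 7. Finalize the upload by moving it to the target path.
    session.request("MOVE", f"{BASE}/uploads/{transfer_id}",
                    headers={"Destination": f"{BASE}/files{target_path}"})
```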

Depending on the number of unchanged blocks, this could dramatically cut the amount of data that has to be uploaded. More information has to be collected to find out how much that is in practice.

Note that this is still at the idea and to-be-discussed stage, and not yet an agreed specification for a new chunking algorithm.

Please, as usual, share your feedback with us!

Incremental Sync in ownCloud

February 9, 2015

[Image: Nautilus shell, photo by David Bygott]


Incremental sync is probably the feature that most people ask for, or sometimes even cry for. Recently there was another wave of discussion about whether ownCloud is doing incremental sync or not.

I will try again (as in this issue) to explain why we decided to postpone that feature. Postponing means that it will be done later, not never, as has been claimed. It is just that we think that other things benefit the whole idea of ownCloud more. That has plain technical reasons. Let’s dive into it a bit.

RSync is great

Nobody will object here. In a nutshell, this is how rsync works: there is a file on the client and on the server. The idea is not to transfer the entire file from one side to the other when either side changes, but only the parts that have changed.

The original rsync does that by chopping the file into blocks of a given size and calculating a checksum for each of the blocks. The list of checksums is sent to the server and – here’s the trick – the server looks at its version of the file and, for each checksum in the list, checks whether it can find the same block somewhere in the file. That will often not be at the same position in the file, but maybe somewhere else. This is done for each block, and finally the server works out which parts of the file it already has and which are missing and have to be sent by the client.

By way of this clever algorithm, we will just have to transmit a very small fraction of the changed file, because most content did not change. And that is what we want! Yeah!
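As a toy illustration of that block matching (and nothing more), consider the following simplified sketch. Real rsync uses a cheap rolling checksum combined with a strong checksum instead of hashing the file at every offset, but the division of labour is the same.

```python
# Much simplified block matching: the client sends one checksum per block,
# the server searches its own version of the file for those blocks.
import hashlib

BLOCK = 4096  # block size in bytes, arbitrary for this illustration

def block_checksums(client_data):
    """Client side: one checksum per fixed-size block of the client's file."""
    return [hashlib.md5(client_data[i:i + BLOCK]).hexdigest()
            for i in range(0, len(client_data), BLOCK)]

def find_matching_blocks(server_data, client_checksums):
    """Server side: locate client blocks anywhere in the server's file.

    Returns {client_block_no: offset_in_server_file} for every block the
    server already has; only the remaining blocks need to be transferred.
    """
    wanted = {csum: no for no, csum in enumerate(client_checksums)}
    matches = {}
    for offset in range(max(len(server_data) - BLOCK, 0) + 1):
        csum = hashlib.md5(server_data[offset:offset + BLOCK]).hexdigest()
        if csum in wanted:
            matches.setdefault(wanted[csum], offset)
    return matches
```

Note that all the searching and hashing happens on the server, for every file and every sync run; that is exactly the architectural cost discussed next.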

Mission accomplished? No, not really. While there is basically nothing wrong with the idea in general, there is a severe architectural downside. The rsync algorithm depends on a strong server component which, for each file, searches around and calculates checksums. In an environment where we potentially have a lot of clients connecting to one server, that would create a huge load, which we need to avoid. So what if, instead of putting the burden on the server’s shoulders, we could make the clients take on that responsibility?

And guess what, there has been somebody thinking about that before and he says:

Use ZSync for this!

ZSync basically turns the idea of rsync upside down and shifts the calculation of checksums away from the server and onto the clients. That means that with zsync, the server can keep a static list of checksums for every block, specific to a version of a file. The list can, for example, be computed during the upload of the file to the server. From that point on it does not change, as long as the file does not change. That means less computation work for the server, and maybe even this job can be pushed to the client.
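A minimal sketch of that shift, under the same simplifying assumptions as above (fixed block positions, no rolling match): the checksum list is computed once when a version of the file is stored and is then served as static data, so clients can diff against it without any per-sync work on the server.

```python
# Simplified zsync-style flow: a static checksum list per file version.
import hashlib
import json

BLOCK = 4096  # same illustrative block size as above

def store_blocklist(file_path, blocklist_path):
    """Run once when the file version is stored; the resulting list is then
    served as static content without further computation on the server."""
    checksums = []
    with open(file_path, "rb") as f:
        while chunk := f.read(BLOCK):
            checksums.append(hashlib.md5(chunk).hexdigest())
    with open(blocklist_path, "w") as out:
        json.dump({"blockSize": BLOCK, "checksums": checksums}, out)

def changed_blocks(local_path, blocklist_path):
    """Client side: which local blocks differ from the stored version?"""
    with open(blocklist_path) as f:
        stored = json.load(f)["checksums"]
    changed, no = [], 0
    with open(local_path, "rb") as f:
        while chunk := f.read(BLOCK):
            if no >= len(stored) or hashlib.md5(chunk).hexdigest() != stored[no]:
                changed.append(no)
            no += 1
    return changed
```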

So far this sounds cool (even though some questions remain) and like something that can help us.

Unfortunately, the approach does not work very well for compressed files. The reason is that when a file is compressed, even if only a couple of bytes in the original file change, the compression algorithm usually changes data all over the entire file. As a result, the zsync algorithm can only compute a comparatively large diff. Given the cost of the computation, that can become inefficient quickly.

“But who uses compressed files?” you might argue. The problem is that almost every file in everyday life is stored compressed. This is true, for example, for Microsoft Office files and the Open Document files produced by LibreOffice and Apache OpenOffice. They are really renamed ZIP containers that hold the document with all its embedded files, etc.

Now of course you will reply that zsync has an improved algorithm for compressed files. Yes, true, that is a great thing. However, it requires the compressed file to be uncompressed so zsync can work on it, and to be compressed again afterwards. And that is the problem: since common compressors do not leave a hint behind about _how_ the file was compressed, it is not possible to reliably recreate a file that is equivalent to the original one. How will apps react to a file that has changed its compression scheme?

Results

As said above, yes, at some point we will implement something along the lines of the zsync algorithm. The explanations above should show, however, that at the current state of ownCloud, other features will improve ownCloud’s performance, stability and convenience more. And that is the important thing for us, more than pleasing the loudest barking dogs.

Here is a rough outline of how I would move on with this, open to your suggestions and critique:

The zsync algorithm is designed to improve downloads. We need it for both up- and downloads, and it needs to be thought through whether that is also possible. For the server-side functionality, there are a couple of open questions that have to be investigated carefully. Preferably, an app can be written that handles the zsync checksum lists. That has to be clarified and discussed, and it will take a while.

But as outlined above, this idea only pays off for a limited set of file types. So what I would suggest first is that we get an idea of the file types users usually store in their ownCloud, so that we can make a validated estimate of how much this feature would help. I will follow up on this first step.

Thanks for reading this long blog post. Thanks to Danimo for proofreading.