Welcome to the MDC-Blog. Find news on what we do and what we believe in. A lot of stuff in here is about Foswiki and how we make use of it in a corporate environment. But there are also random bits of things related to our work using internet technologies.

The sixth Foswiki Anniversary
all the best to you, my dear

19 Nov 2014 | Michael Daum

This morning somebody on IRC mentioned Gource, an excellent tool that visualizes the check-in history of a revision control system as a video. As we recently moved Foswiki to Git, it didn't take much to fire off Gource, add a bit of music by Sun Rai, and here you are. Enjoy.

Linux Sucks 2014
a talk on Open Source that sticks out

25 Jul 2014 | Michael Daum

Take a moment or three and watch this year's "Linux Sucks" session by Bryan Lunduke from start to end to make sure you get the full message. Most of it is about the Open Source ecosystem and the dynamics of Community & Commerce rather than about Linux or even the Linux kernel. So the title is actually a bit misleading, which Bryan admits in a side note during his talk. But hey, a rant is not a rant until all knobs are turned to 10. There's no "a little bit of a rant". That's the style of this talk: pretty rantish and full of spice, yet still professional. So quite enjoyable.

Let's say I'd second most of what Bryan says, especially about forking, Mir, Blink and Canonical. Frankly, I am a lot less enthusiastic about current trends. Most of them are experiments I'd rather opt out of if I could, and let the train pass. Don't get me wrong: visions are a necessity. However, some of them have far too many implications while being quite off the mark at the same time.

For many people, managing change in IT is more of a survival issue: they are confronted with a constant stream of changes, for the better or, more often, for the worse.

Recently, on Foswiki's support channel, Vicki Brown forwarded a question from one of her clients about sorting search results by relevance:
We've had request by some folks that search be more "relevant". We asked them "relevant to what". The thinking is that the search results be based on "metrics" like "most visited" shown first or something like that. Is that possible? Just wondering what our options are. (source)
Here's my answer.

Sorting search results by relevance is a bit more difficult to explain than sorting documents by their own static properties (e.g. sort by name). That's because relevance is a dynamic value computed at search time, whereas the properties of the result documents themselves are gathered at index time and are thus rather static. Relevance is a function that computes a score for each document in the result set based on the query, the person doing the query, and a few other factors that might play a role. Think of the relevance function as a weighted sum of factors, each contributing to the score of the result documents and thus influencing their sort order. When designing a relevance function, you'd like to model a kind of sorting that feels most natural to users: things that are considered more "interesting" or "relevant" to the user are sorted before others. Typical properties playing a role in relevance functions are:
  • best matching the query term: the fewer parts of the query match, the lower the score; (might also take fuzzy or phonetic matches into account)
  • distance of query terms found in a result document: a two-word query matches best when both words are found adjacent in a document rather than in distant places of the same document
  • most recently changed: documents that have been created or edited more recently can be considered more relevant
  • number of likes
  • number of times a document has been put into a favorites list
  • five-star rating done by users voting on the document's importance or quality
  • number of times a document has been viewed, fading out numbers of views in the past
  • featured documents: manually boost documents to appear at the top of certain queries
  • click-through: elevate result documents that have been actually clicked on
  • personal interests: metadata about the person doing the query, probably gathered explicitly as part of a skills management undertaking (fields of interest, current projects, past projects)
  • documents that "friends" or team members (or friends of friends) have liked or visited recently.
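To make the "weighted sum of factors" idea concrete, here is a minimal sketch in Python that combines a few of the factors listed above: text match, recency, time-faded views and a manual "featured" boost. The weights, the 30-day half-life and the `Doc` fields are hypothetical choices for illustration only; this is not how SolrPlugin or Solr compute scores, and in a real search engine the text-match part would come from the engine itself.

```python
import math
import time
from dataclasses import dataclass, field

# Hypothetical weights for each factor; a real deployment would tune them.
WEIGHTS = {
    "text_match": 1.0,  # how well the query matches the document
    "recency": 0.3,     # recently changed documents rank higher
    "views": 0.2,       # time-faded number of page views
    "featured": 2.0,    # manual boost for featured documents
}

# Assumption: an edit or a view loses half of its weight after 30 days.
HALF_LIFE_DAYS = 30.0
SECONDS_PER_DAY = 86400.0


@dataclass
class Doc:
    name: str
    text_match: float        # e.g. a 0..1 score delivered by the search engine
    changed: float           # time of last change, seconds since the epoch
    view_times: list = field(default_factory=list)  # timestamps of page views
    featured: bool = False


def decay(age_seconds: float) -> float:
    """Exponential fade-out: 1.0 for 'now', 0.5 after HALF_LIFE_DAYS."""
    age_days = max(age_seconds, 0.0) / SECONDS_PER_DAY
    return 0.5 ** (age_days / HALF_LIFE_DAYS)


def relevance(doc: Doc, now: float) -> float:
    """Relevance as a weighted sum of individual factors."""
    faded_views = sum(decay(now - t) for t in doc.view_times)
    factors = {
        "text_match": doc.text_match,
        "recency": decay(now - doc.changed),
        # log1p dampens very popular pages so views cannot dominate the score
        "views": math.log1p(faded_views),
        "featured": 1.0 if doc.featured else 0.0,
    }
    return sum(WEIGHTS[name] * value for name, value in factors.items())


def rank(docs, now=None):
    """Return docs sorted by descending relevance."""
    now = time.time() if now is None else now
    return sorted(docs, key=lambda d: relevance(d, now), reverse=True)


if __name__ == "__main__":
    day = SECONDS_PER_DAY
    now = 1000 * day
    docs = [
        Doc("OldButGood", text_match=0.9, changed=now - 400 * day),
        Doc("FreshNote", text_match=0.6, changed=now - day, view_times=[now - day] * 5),
        Doc("WebHome", text_match=0.2, changed=now - 10 * day, featured=True),
    ]
    print([d.name for d in rank(docs, now)])
    # → ['WebHome', 'FreshNote', 'OldButGood']
```

With these made-up weights, the featured topic wins despite a poor text match, and a fresh, frequently viewed note overtakes an older document that matches the query better. That is exactly the kind of distortion to watch out for on small wikis with sparse statistics.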
Screenshot: most relevant results when searching for "web server performance" using SolrPlugin
Some of these factors don't really play a role on small to medium size wikis, as they can distort the search order rather than help create one that feels "natural". Small wikis suffer from sparse statistics. In corporate intranet wikis, the single most important document might very well not be visited at all anymore, because every employee of the organization has already internalized its content. Yet this one document is still considered the most important one. How to configure a search engine is thus pretty specific to the use case and the data available.

Most of these factors can be implemented using SolrPlugin for Foswiki. Some of them need additional extensions interacting with Solr. For example, we don't have a proper LikePlugin yet: it wouldn't be hard to implement, we just didn't have the time to do so. Another missing feature that plays a role in other products is "follow" or "friends". Well-known social networking sites add these things to the soup as well.

The "sort by relevance" available in SolrPlugin out of the box shows the best matches first (relevance of documents vs. the user query) and breaks ties using a most-recently-changed metric. Matches in titles score higher than those found elsewhere, followed by a few other places where a match counts more (topic title, topic name, categories, tags, summary field, …). Anything else needs more customizing.

Besides its role in sorting search results, "relevance" is used to compute "similar documents", another feature of SolrPlugin. It allows you to add an extra navigation widget pointing to other documents in the wiki, based on the metadata considered relevant when comparing two documents. Following these navigation links lets you hop from topic to topic within the thematic field you are currently dealing with, without having to craft these links manually. All you have to do is assign proper metadata like categories and tags to your topics.
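The out-of-the-box behavior described above (best matches first, ties broken by the most recent change, matches in some fields counting more than others) can be expressed with standard request parameters of Solr's edismax query parser: `qf` for per-field boosts, `pf` for an extra boost when the query terms appear close together, and `sort` for the tie-breaker. The sketch below only builds the parameters and the URL. The field names (`title`, `topic`, `category`, `tag`, `summary`, `text`, `date`), the boost values and the core URL are assumptions, not SolrPlugin's actual configuration.

```python
from urllib.parse import urlencode

# Hypothetical field names and boosts; check the schema actually used by
# your Solr core before reusing them.
FIELD_BOOSTS = {
    "title": 5.0,
    "topic": 3.0,
    "category": 2.0,
    "tag": 2.0,
    "summary": 1.5,
    "text": 1.0,
}


def relevance_params(terms: str, rows: int = 10) -> dict:
    """Solr request parameters for a 'sort by relevance' search."""
    return {
        "q": terms,
        "defType": "edismax",  # extended DisMax query parser
        # query fields with per-field boosts: "title^5.0 topic^3.0 ..."
        "qf": " ".join(f"{name}^{boost}" for name, boost in FIELD_BOOSTS.items()),
        # phrase fields: extra boost when the terms appear close together
        "pf": "title^5.0 text^1.0",
        # best matches first, ties broken by the most recent change
        "sort": "score desc, date desc",
        "rows": rows,
        "fl": "id,title,score",
    }


def relevance_url(core_url: str, terms: str) -> str:
    """Full /select URL for the given (hypothetical) Solr core."""
    return f"{core_url}/select?{urlencode(relevance_params(terms))}"


if __name__ == "__main__":
    print(relevance_url("http://localhost:8983/solr/foswiki", "web server performance"))
```

Sorting on `score desc` first means that recency only matters when two documents score the same, which matches the tie-breaking described above.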
The snapshot below shows a topic about nginx on my personal wiki together with a few other topics in the web that are considered similar to it.
Screenshot: similar topics using the default relevance metric
Note that documents might be considered "similar" based on quite different properties of DataForm fields depending on the WikiApp the current topic is participating in.
Screenshot: similar topics using a customized relevance metric
Here the recent remake of RoboCop is considered most similar to the original RoboCop movie from 1987. Movies are considered similar based on their genre, director, writer and producer, in addition to the normal metadata gathered by the indexer.

Bottom line: relevance is a pretty central concept in search engines. It can help users find results more quickly by sorting documents the way they expect them to be sorted, whatever that means for the deployed system. Sort-by-relevance plays an important role even in small wikis used for personal knowledge management. It becomes more powerful the broader the scope of the relevance function is, for instance when it takes social factors into account. However, be warned: wikis behind a firewall suffer from sparse data. Extracting the wrong metric might hurt more than it helps by distorting the search order in unexpected ways.
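The idea behind DataForm-based similarity, as in the movie example, can be sketched without a search engine: compare the values two topics share in each field and weight the fields differently, so that sharing a director counts for more than sharing a tag. This is a hypothetical plain-Python illustration using a weighted Jaccard overlap, not the algorithm SolrPlugin uses (Solr itself offers a MoreLikeThis component for this kind of task); the field weights and sample data are made up.

```python
# Hypothetical field weights: sharing a director counts more than sharing a tag.
FIELD_WEIGHTS = {
    "genre": 1.0,
    "director": 3.0,
    "writer": 2.0,
    "producer": 1.5,
    "category": 1.0,
    "tag": 0.5,
}


def similarity(doc_a: dict, doc_b: dict) -> float:
    """Weighted Jaccard overlap of field values, between 0.0 and 1.0."""
    num = den = 0.0
    for fld, weight in FIELD_WEIGHTS.items():
        a, b = set(doc_a.get(fld, ())), set(doc_b.get(fld, ()))
        if a or b:  # skip fields that neither document uses
            num += weight * len(a & b) / len(a | b)
            den += weight
    return num / den if den else 0.0


def most_similar(current: dict, others: list, limit: int = 5) -> list:
    """Names of the topics most similar to `current`, best first."""
    scored = [
        (similarity(current, other), other["name"])
        for other in others
        if other["name"] != current["name"]
    ]
    # highest score first; names break ties alphabetically
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for score, name in scored[:limit] if score > 0]


if __name__ == "__main__":
    # Made-up sample data, not real movie credits.
    original = {"name": "MovieA", "genre": ["Action", "SciFi"],
                "director": ["D1"], "writer": ["W1", "W2"]}
    remake = {"name": "MovieARemake", "genre": ["Action", "SciFi"],
              "director": ["D2"], "writer": ["W1", "W2"]}
    comedy = {"name": "MovieB", "genre": ["Comedy"], "director": ["D3"]}
    print(most_similar(original, [original, remake, comedy]))
    # → ['MovieARemake']
```

Normalizing only over fields that at least one of the two topics uses keeps topics from different WikiApps, with different DataForm fields, from being penalized for fields they simply don't have.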

Just in case you missed this great talk by Karen McGrane, the (not-wannabe) President of WYSIWYG Haters: go for it and listen to the talk she gave at DrupalCon Portland this year. While it mainly focuses on the consequences and opportunities for Drupal as a CMS, her critique hits wikis in the guts as well, maybe even more than regular CMSes.


There are two main points made by Karen:

(1) WYSIWYG sucks. Better separate content from form, or as she coins it:

Get your HTML out of my butter.

(2) Page-oriented authoring sucks.

Both issues prevent content from being future-friendly, as it can only be presented in the context it was created for and is not reusable for other use cases or devices. Point (2) also means that inline editing sucks, as it enforces a mental model of blips of content that are only valid in a specific context or situation.

Most content authoring systems present a single WYSIWYG area where authors can go wild and do (almost) whatever they want. People love their WYSIWYG because it reminds them so much of their beloved MS Word. So what's wrong with giving them what they want? Basically, nothing. Yet they still have a problem: their content is not reusable. Its main value is lost in piles of markup dealing with presentation. Getting the real value out of it means stripping off markup that authors shouldn't have added in the first place; the content management system is much better suited to take care of presentation. It is this markup noise that prevents content from being reused in a different context, on multiple devices, or even on audio devices.
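To illustrate the "separate content from form" argument: if content is stored as named fields rather than as a blob of presentational markup, each output channel can get its own renderer. The `Article` type and the three renderers below are a hypothetical sketch, not Drupal's or Foswiki's content model.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class Article:
    """Content stored as meaningful fields, free of presentation markup."""
    title: str
    summary: str
    paragraphs: list


def render_html(article: Article) -> str:
    """Web page: the presentation markup is added here, not by the author."""
    body = "".join(f"<p>{escape(p)}</p>" for p in article.paragraphs)
    return (f"<article><h1>{escape(article.title)}</h1>"
            f'<p class="summary">{escape(article.summary)}</p>{body}</article>')


def render_teaser(article: Article, max_len: int = 80) -> str:
    """Short plain text, e.g. for a list view on a small screen."""
    text = f"{article.title}: {article.summary}"
    return text if len(text) <= max_len else text[: max_len - 1] + "…"


def render_speech(article: Article) -> str:
    """Plain sentences to feed a text-to-speech (audio) channel."""
    return " ".join([article.title + ".", article.summary, *article.paragraphs])


if __name__ == "__main__":
    a = Article("Wikiness", "Low overhead matters.", ["Click, edit, save.", "Done."])
    print(render_html(a))
    print(render_teaser(a))
    print(render_speech(a))
```

The author only supplies title, summary and paragraphs; whether they end up as HTML, a teaser or spoken text is decided by the system. The price, as the following paragraphs argue, is that the author no longer sees the content the way the reader does.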

Let's put that into the light of wikis. Wikis are built around the central notion of a Wiki Page. It serves as a container for a note or idea that people can create quickly and without any admin overhead: click - edit - save - done. In short, this is called "wikiness". Wikiness is a measure of the amount of overhead people face when creating content. Wikiness is high when there's barely any overhead. Wikiness is low when there's a lot of planning ahead, creating structures to be filled out in a guided, mostly form-based way. Of course there's a middle ground where you can do both to different degrees, increasing the amount of form-based input while decreasing the importance of blob content. Still, the basic storage unit in a wiki is a page where content is located. This location correlates directly with the URI the system uses to present the wiki content. So wikis tightly couple the location where content is authored with the location where it is presented. That's different in other content management systems such as Drupal, which come with a dedicated admin interface, a second layer behind the curtain where content is created. These admin shells are completely separated from the way the content is perceived on the front end.

And here's the problem: separating content authoring too much from the way content is perceived requires additional mental effort. It is far more challenging to build up a mental model of content chunks used in distant locations than to see hands-on what your content is going to be for the reader.

The important point here is that every writer is his own reader while writing. I am a bit concerned about separating content production in an admin shell from the way the content is perceived. This disturbs the process of creating content significantly. In the end, creating content requires an author to be productive. Productivity is highest when he is in a so-called flow state, similar to musicians improvising while listening to themselves, or a martial artist engaged in a body dialogue with an opponent (uke-nage). In that sense, separating writing too much from reading will hurt productivity. Likewise, carrying the burden of fancy content styling keeps the writer from capturing his thoughts in fluent language.

After having written a few chunks of content, a writer will regularly test the text by putting himself into the reader's role to see whether it makes any sense. The process of writing thus follows loops of continual typing, re-reading, simplifying, rephrasing and reordering paragraphs. The outcome is best when the writer's tools support his thinking and don't get in his way too much, either by (1) forcing him to deal with presentation issues or by (2) forcing him into chunks that don't match his mental model of the concepts he is thinking about. Both will inevitably cause the writer to drop out of the flow state. He then ends up playing CMS or WYSIWYG rather than producing valuable content.

To be fair, we are actually talking about different things here. Karen did not talk about the authoring process as such, but rather looked at it from the other end of the content management chain. She ends by noting that it is not so much the systems that need to change but rather our mental models of content. Agreed, but even more we have to support the creativity and productivity of authors. Chunking content for the sake of reusability puts that at risk, even more so when those chunks don't match the mental model of the concepts being conveyed.

Studies show that, for similar reasons, current mobile devices are counter-productive. They are the highway for snarfing in content snippets in real time. They are far from conducive to a flow state in content production, and neither, actually, are current CMSes like Drupal. That's where wikis still have an advantage.

Contact details

  • Michael Daum Consulting
  • Neumünstersche Straße 12
  • D-20251 Hamburg


Copyright © 1999-2016 by the contributing authors. All material on this collaboration platform is the property of the contributing authors. Ideas, requests, problems regarding Michael Daum Consulting? Send feedback.