Welcome to the MDC-Blog. Find news on what we do and what we believe in. A lot of stuff in here is about Foswiki and how we make use of it in a corporate environment. But there are also random bits of things related to our work using internet technologies.
We've had request by some folks that search be more "relevant". We asked them "relevant to what". The thinking is that the search results be based on "metrics" like "most visited" shown first or something like that. Is that possible? Just wondering what our options are. (source)
Here's my answer.
Relevance is a function that computes a score for each document in the result set based on the query, the person doing the query and a few other factors that might play a role. Think of the relevance function as a weighted sum of factors. Each factor contributes to the score of a result document and thereby influences the sort order.
When designing a relevance function you'd like to model a kind of sorting that feels most natural to users. Things that are considered to be more "interesting" or "relevant" to the user are sorted before others.
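To make the "weighted sum of factors" idea more tangible, here is a minimal sketch in Python. The factor names (`text_match`, `recency`, `popularity`) and the weights are assumptions made up for illustration; they are not what SolrPlugin or Solr actually compute.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    title: str
    text_match: float   # how well the document matches the query, 0..1
    recency: float      # 1.0 = changed just now, 0.0 = very old
    popularity: float   # normalized visit count, 0..1


# Hypothetical weights; choosing them is the actual design work.
WEIGHTS = {"text_match": 0.6, "recency": 0.25, "popularity": 0.15}


def relevance(doc: Doc, weights: dict[str, float] = WEIGHTS) -> float:
    """Weighted sum of all factors for one document."""
    return sum(w * getattr(doc, name) for name, w in weights.items())


def rank(docs: list[Doc]) -> list[Doc]:
    """Sort documents so that the highest score comes first."""
    return sorted(docs, key=relevance, reverse=True)
```

Raising the weight of `popularity` would move "most visited" documents up in the list, which is roughly what the question above asks for. Whether that feels natural to users depends on how much reliable visit data there is.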
Typical properties playing a role in relevance functions are
How to configure a search engine is thus pretty specific to the use case and the data available. Most of these factors can be implemented using SolrPlugin for Foswiki. Some of them need additional extensions interacting with Solr. For example, we don't have a proper LikePlugin yet: it is not hard to implement, we just didn't have the time to do it. Another missing feature that plays a role in other products is "follow" or "friends". These things are added into the soup as well on well-known social networking sites.
The "sort by relevance" available in SolrPlugin out of the box is: sort search results by showing best matches first (relevance of documents vs. user query) while breaking ties using a most-recently-changed metric. Matches in titles score higher than matches found elsewhere, followed by a few other places where a match counts more (topic title, topic name, categories, tags, summary field, ...).
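The "best matches first, ties broken by most recent change" ordering can be sketched as follows. This is a simplified illustration, not SolrPlugin's implementation: the boost values, the plain substring matching and the `changed` key are assumptions, and Solr's real scoring is considerably more elaborate.

```python
from datetime import datetime

# Hypothetical boosts: a match in the title counts more than one in the body.
FIELD_BOOSTS = {"title": 3.0, "tags": 2.0, "summary": 1.5, "text": 1.0}


def match_score(doc: dict, query: str) -> float:
    """Add up the boost of every field that contains the query term."""
    term = query.lower()
    return sum(
        boost
        for field, boost in FIELD_BOOSTS.items()
        if term in doc.get(field, "").lower()
    )


def sort_by_relevance(docs: list[dict], query: str) -> list[dict]:
    """Best matches first; equal scores are ordered by most recent change."""
    hits = [d for d in docs if match_score(d, query) > 0]
    return sorted(
        hits,
        key=lambda d: (match_score(d, query), d["changed"]),
        reverse=True,
    )
```

The sort key is a tuple, so the modification date only matters when two documents have the same match score.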
Anything else needs more customizing.
Besides its role in sorting search results, "relevance" is used to compute "similar documents", another feature of SolrPlugin. It allows you to add an extra navigation widget pointing towards other documents in the wiki, based on the meta data that is considered relevant when comparing two documents. Following these navigation links lets you hop from topic to topic within the thematic field that you are currently dealing with, without having to craft these links manually. All you have to do is assign proper meta data like categories and tags to your topics.
The snapshot below shows a topic about nginx on my personal wiki being similar to a few other topics in the web.
Note that documents might be considered "similar" based on quite different properties of DataForm fields depending on the WikiApp the current topic is participating in.
Here the recent remake of RoboCop is considered most similar to the original RoboCop movie from 1987. Movies are considered similar based on their genre, director, writer and producer, besides the normal meta data being gathered by the indexer.
Bottom line: relevance is a pretty central concept in search engines. It can help to find results quicker by sorting the documents found in the way users expect them to be sorted, whatever is applicable for the deployed system. Sort-by-relevance does play an important role even in small wikis for personal knowledge management. It becomes more powerful the broader the scope of the relevance function is, for instance by taking social factors into account. However, be warned: wikis behind a firewall suffer from sparse data. Extracting the wrong metric might hurt more than it helps by distorting the search order in unexpected ways.
There are two main points made by Karen:
(1) WYSIWYG sucks. Better separate content from form, or as she coins it:
Get your HTML out of my butter.
(2) Page-oriented authoring sucks.
Both issues prevent content from being future-friendly as it can only be presented in the context that it has been created for and is not reusable for other use cases or devices. Point (2) also means that inline-editing sucks as it enforces a mental model of blips of content only being valid in a specific context or situation.
Most content authoring systems present a single WYSIWYG area where authors can just go wild and do what they want (almost). People love their WYSIWYG because it reminds them so much of their beloved MS Word. So what's wrong with giving them what they want? Basically, nothing. Yet still they have a problem: their content is not reusable. Its main value is lost in piles of markup dealing with presentation. Getting the real value out of it means stripping off markup that authors shouldn't be adding in the first place. The content management system is much better suited to take care of it. It is this markup noise that prevents content from being reusable in a different context, on multiple devices or even on audio devices.
Let's put that into the light of wikis. Wikis are built around the central notion of a Wiki Page. It serves as a container for a note or idea that people can create quickly and without any admin overhead: click - edit - save - done. In short this is called "wikiness". Wikiness is a measure of the amount of overhead people have to face when creating content. Wikiness is high when there's barely any overhead. Wikiness is low when there's a lot of planning ahead, creating structures to be filled out in a guided, mostly form-based way. Of course there's a middle ground where you can do both to different degrees by increasing the amount of form-based input while decreasing the importance of blob content. Still, the basic storage unit in a wiki is one page where content is located. This location correlates directly with the URI that the system uses to present the wiki content. So wikis tightly couple the location where the content is authored with the location of its presentation. That's actually different in other content management systems such as Drupal, which come with a dedicated admin interface, a second layer behind the curtain where content is created. These admin shells are totally separated from the way the content is perceived on the front.
And here's the problem: separating content authoring too much from the way content is perceived requires additional mental effort. It is far more challenging to build up a mental model for content chunks being used in distant locations than to see hands-on what your content is going to be for the reader. The important point here is that every writer is his own reader while producing it. I am a bit concerned about separating content production in an admin shell from the way it is perceived. This disturbs the process of creating the content significantly.
In the end, creating content requires the author to be productive. Productivity is highest when he is in a so-called flow state, similar to musicians improvising while listening to themselves, or a martial artist being in a body dialog with an opponent (uke-nage). In that sense, separating writing too much from reading will hurt productivity. No less does carrying the burden of fancy content styling interrupt the writer in capturing his thoughts in fluent language. After having written a few chunks of content, a writer will regularly test the text by putting himself into the reader role to see whether the text makes any sense. The process of writing thus follows loops of continual typing, re-reading, simplifying, rephrasing and reordering paragraphs.
The outcome is best when the means available to the writer support his thinking and don't get in his way too much, either by (1) forcing him to deal with presentation issues or by (2) forcing him into chunks that don't match his mental model of the concepts he is thinking about. Both will inevitably cause the writer to drop out of flow state. So he ends up playing CMS or WYSIWYG rather than producing valuable content. To be fair, we are actually talking about different things here. Karen did not talk about the authoring process as such but looked at it from the other end of the content management chain.
She ends by noting that it is not so much the systems that need to change but more our mental models about content. Agreed, but even more do we have to support the creativity and productivity of its authors. Chunking content for the sake of reusability puts that in danger, even more so when those chunks don't match the mental model of the concepts being transported. Studies show that for similar reasons current mobile devices are counter-productive. They are the highway to snarf in content snippets in real time. They don't lend themselves to a flow state in content production by far, nor do current CMSes like Drupal, actually. That's where wikis still have an advantage.
This release comes with a set of important security and performance fixes:
There are a few fixes to the JQueryPlugin shipped with Foswiki. It now comes with jquery-1.10.1 and jquery-2.0.2. You can switch your site to jquery-2.0.2 and still serve jquery-1.10.1 to old Internet Explorers. All of the sub-modules in JQueryPlugin have been updated to the latest upstream packages available.
Foswiki-1.1.9 will be the last release on the 1.1.x branch, hopefully. We claimed that before but this time it is for real (fingers crossed). Work on the new 1.2.0 release starts from now on. This is where the train goes now. Foswiki-1.2.0 will have a couple of important fixes on board that didn't make it into 1.1.9, but that's another story. Stay tuned.
I hope that we will see great content on this blog in the future explaining the bits and bytes underneath.