A few months ago, Jamie Mahoney spent several tedious hours looking at a further 66 HEIs’ websites, recording their URI syntaxes. This builds on the work of Alex Bilbie, who looked at an initial 40 HEIs for the Linking You project. You can see the combined spreadsheet on Google Docs. It requires further analysis and colour coding, but may be of interest to some people in its current state.
URIs
The Evolution of the Address Bar
In the early browsers the address bar was simply a box where users typed the address of the webpage they needed to get to and then clicked a large ‘go’ button. As browsers developed, so did the functionality of the address bar. One of the basic updates came when browsers started remembering the user’s viewing history: when a user wanted to go back to a site they had visited in the past, the browser began to recognise the URL from the history as it was typed and returned suggestions.
Recently web development has been shifting as new technologies emerge; new standards such as HTML5 and CSS3, mixed with the increased use of AJAX techniques, have meant that browsers have had to change to keep up. One of the major changes that came to the browser around the shift to Web 2.0 was a major update to how users use the address bar.
The update of the address bar between Firefox 2 and Firefox 3 wasn’t just a change of name (it became unofficially known as the ‘Awesome Bar’) or a change in design. The address bar became more of a global search of your browser based upon the user’s bookmarks and page history, matching words and phrases to text within URLs, page titles and tags on the page, not just from the beginning of the URL but anywhere within it. The results were then ranked in the address bar drop-down based upon ‘frecency’: a mixture of how frequently the suggested page has been viewed and how recently it was visited.
This feature has been adapted and brought across to the other main browsers. Google implements a similar technique in Chrome’s ‘Omnibox’, but also extends it to the user’s search history and opens up the Omnibox API, allowing developers to write their own plugins to expand the address bar’s functions further.
Furthermore, the address bar is not just about remembering the user’s history. Chrome implemented the ability to search the internet straight from the address bar, bypassing the Google homepage. This, combined with several existing features, allowed Chrome firstly to suggest popular searches through Google Suggest (http://lncn.eu/fgq) and secondly to suggest previous searches from the user’s history (http://lncn.eu/em8). Some users have privacy concerns about this, as companies such as Google log the search queries. As a result Chrome has implemented Incognito mode, known in Firefox as Private Browsing, which stops the browser keeping these records of browsing and searching, among other things.
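To make the idea concrete, here is a minimal sketch of matching typed words anywhere in the URL or title and then ranking by a frecency-style score. It is not Firefox's actual algorithm: the scoring formula, the `Visit` fields and the example.org history entries are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Visit:
    url: str
    title: str
    visit_count: int               # how often the page was viewed
    days_since_last_visit: float   # how long ago it was last viewed


def frecency(visit: Visit) -> float:
    # Hypothetical score: frequency discounted by age (an assumption,
    # not the formula any real browser uses).
    return visit.visit_count / (1.0 + visit.days_since_last_visit)


def suggest(query: str, history: list[Visit], limit: int = 3) -> list[Visit]:
    # Every typed word must appear somewhere in the URL or the title,
    # not just at the beginning of the URL.
    terms = query.lower().split()
    matches = [
        v for v in history
        if all(t in f"{v.url} {v.title}".lower() for t in terms)
    ]
    return sorted(matches, key=frecency, reverse=True)[:limit]


# Made-up history entries for illustration only.
HISTORY = [
    Visit("http://example.org/abc/", "Home", 5, 1),
    Visit("http://example.org/news/mental-health", "Latest news", 2, 10),
    Visit("http://example.org/courses/42", "Criminal Justice and Mental Health", 8, 3),
]

if __name__ == "__main__":
    for visit in suggest("mental health", HISTORY):
        print(visit.url)
```

Typing "mental health" finds the second entry through its URL and the third through its title, with the more frequently and recently visited page listed first.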
During the production of Firefox 3, Mozilla’s Mike Beltzner said:
“I confidently predict that the Awesome Bar is going to change the way people navigate the web…”
In the comments on a Mozilla developer’s blog post about the Awesome Bar in the Firefox 3 beta release, users described just how much it had changed the way they use their browsers:
“Yes, I’ve found the biggest advantage is that you don’t have to redo web searches that you did before. And if you do want to redo a web search, you can just type in one or two of the keywords and firefox will find the search page in your history. Wonderful!” – David Nelson
“I have been using Firefox 3 since the first beta and after I type two or three letters in the URL bar, the page I want is usually in the top three results.” – Neelark
“AwesomeBar really made my life easier, no need to open bookmark, no need to search for history. Just simply type and enter.” – Karbonfootprint
The practical consequence of these developments in the address bar is that users no longer need to remember full URLs; instead they can simply remember keywords within the URL, page title or similar, and then use the address bar to get back to the page they had been on.
The implication of this for those building sitemaps is that URL design and query strings need to contain useful and meaningful information that relates to the page content. For example, acronyms may mean something to internal staff, but they are rarely memorable to the average external user.
Running through the list of URLs that Alex Bilbie posted (http://lncn.eu/i49), there are many URLs that make little sense to external users, such as http://www.lincoln.ac.uk/cjmh/. The parts of this URL that are most likely to be picked up for searching are ‘lincoln’ and ‘cjmh’; while the ‘lincoln’ element will almost certainly be remembered, ‘cjmh’ will not. Additionally the page heading, Criminal Justice and Mental Health, isn’t included in the page title, meaning that all the advantages of the Awesome Bar remembering keywords in the URL, title, etc. are lost. The URL and the page title, held within the HTML <title> tag, have to reflect the page content, allowing the user to benefit from the new features of the evolved address bar.
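A rough way to audit a page for this is to check which words from the page heading never appear in either the URL or the title. The sketch below does that with whole-word matching, which is a simplification; the stop-word list and the page title used in the example are assumptions, not values taken from the real page.

```python
import re

# Small, hypothetical stop-word list; extend as needed.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "the", "to"}


def keywords(text: str) -> set[str]:
    # Lower-case alphanumeric words, minus stop words.
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {w for w in words if w not in STOPWORDS}


def missing_keywords(heading: str, url: str, title: str) -> list[str]:
    # Heading words a user might type that the address bar could not
    # match against either the URL or the <title>.
    searchable = keywords(url) | keywords(title)
    return sorted(keywords(heading) - searchable)


if __name__ == "__main__":
    print(missing_keywords(
        "Criminal Justice and Mental Health",
        "http://www.lincoln.ac.uk/cjmh/",
        "University of Lincoln",  # placeholder title, assumed for the example
    ))
```

An empty list means every meaningful heading word is reachable from the address bar; anything returned is a keyword the user would have to remember some other way.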
Further Reading:
- Awesome Bar: FireFox’s Next Killer Feature – http://lncn.eu/inu
- Chrome OmniBox – http://lncn.eu/g46
- IE8 Smart Address Bar – http://lncn.eu/cip
BBC Online Strategy
In February 2010, BBC Online submitted its response to the BBC Strategy (read: budget) Review, announced the summer before.
Along with committing to reducing its budget by 25% by 2013, they’ve committed to halving the number of top-level directories (i.e. anything that falls after http://bbc.co.uk/, such as /eastenders or /drwho). The BBC currently has over 400 of these top-level directories (not including redirects) and by the end of this year 172 will be shut down, with their content moved to other areas of the site or archived offline.
The new online strategy focuses on doing “fewer things better” and they plan on grouping online content into one of ten categories:
Noticeable changes will include programmes no longer having their own top-level directory; for example, EastEnders will move from http://bbc.co.uk/eastenders to http://www.bbc.co.uk/programmes/b006m86d. Likewise, http://bbc.co.uk/cbeebies and http://bbc.co.uk/cbbc will probably become http://bbc.co.uk/children, which will then link off to CBBC/CBeebies and to teaching material such as Bitesize from the Knowledge and Learning product.
There’s already been some lively discussion of the issues around deleting and archiving the BBC websites facing removal, which kicked off with an initial post from Adactio blogger Jeremy Keith. He suggested that the BBC’s plans to halve its top-level directories were cultural vandalism. The tenor of the criticism that followed was the same: that the BBC is failing in its duty to preserve a record of its online past. Some sites, like http://www.bbc.co.uk/ww2peopleswar/, a collection of 47,000 memories and 15,000 images created by people who lived through World War 2, have been heavily debated as something that should be preserved regardless of age or irrelevance to the BBC’s new strategy, simply because of their historical and cultural value to people around the world.
This massive re-organisation that BBC Online are currently undertaking is very similar to our Linking You project; as we have discovered so far, higher education institutions’ websites (including our own) have also become monolithic beasts over the years. I think the huge success of iPlayer and the huge increase in second screen viewing (e.g. chatting to your friends on Facebook whilst watching TV) have made the BBC realise that they need to wake up a bit and embrace the digital age. The quote below by Erik Huggers (director of BBC Future Media and Technology) particularly emphasises the point:
“The BBC’s online strategy has, for many years, been to play a supporting role to our broadcast output. Programme first, website later. This is not the best way to deliver our public purposes in a digital age.”
Likewise, universities are slowly realising that their primary audiences (i.e. students) aren’t living in a world of paper handouts and prospectuses any more; they’re connected 24/7 and want real-time, personalised content. In an age of increased tuition fees, potential students are going to be more interested in HE websites that suggest courses to them based on the things they’ve “liked” on Facebook and email them a personalised prospectus than in institutions that ask for their address so they can send them a massive document in the post a fortnight later.
The recent redesign of the University of Lincoln’s homepage has already started the process of culling unnecessary links and grouping content into, not products, but areas of interest:
In terms of a URI model this could easily convert into:
/home
/undergraduate
/postgraduate
/business
/schools (or /departments)
/information
and maybe a few others such as:
/contact
/news
/current_students
From this BBC debate I think the thing that we’ve got to consider as we develop a model for HE websites is that we are going to have to make sacrifices, because physical value does not necessarily represent value on the web (e.g. a university may stand by its vice chancellor’s vision, but that doesn’t mean it is necessarily worth a top-level directory on an HE website at /vc_message). We also need to work out exactly what elements a university is made up of, such as courses, faculties and accommodation information, and then try to fit them into a group of core categories (similar to the BBC’s “online products”).
Other University of Lincoln services URIs
Following on from my post about the URI structure in our existing corporate website, I’m now taking a brief look at a number of other websites and web applications that we have at the University: WordPress, Blackboard (blackboard.lincoln.ac.uk), SharePoint 2003 (internal – portal.lincoln.ac.uk, external – visit.lincoln.ac.uk) and Posters at Lincoln (posters.lincoln.ac.uk).
WordPress
dev.lincoln.ac.uk
We have an active blogging community here at Lincoln with over 400 registered blogs running on a WordPress MU install. WordPress has built-in friendly URIs (permalinks in WordPress terminology), so I was expecting to see a good proportion of blogs with a good URI structure.
I wrote a script – https://gist.github.com/890378 – which simply grabs the permalink structure for each registered blog so that I can see an aggregated view of the settings that have been set up for each blog.
Here are the results:
Permalink structure | Number of blogs |
/%year%/%monthnum%/%day%/%postname%/ | 404 |
/%postname%/ | 8 |
no permalink structure | 3 |
/%year%/%postname%/ | 2 |
/articles/%postname%/ | 1 |
/%year%/%monthnum%/%postname%/ | 1 |
/%monthnum%/%year%/%postname%/ | 1 |
96% of blogs are running with the default permalink structure, which basically means you have URIs that look like http://example.dev.lincoln.ac.uk/2011/03/22/hello-world. In terms of readability this does result in URIs that can be easily understood. As for predicting a specific post’s URI, I think you’re perhaps better off doing a search from the blog’s home page. I did a Google search for “thoughts on WordPress permalinks” and there seems to be a consensus that the default permalink structure is “good enough” SEO-wise; however, there are performance gains from using it over %postname% because it reduces the number of queries WordPress has to do internally to return the correct post.
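For anyone wanting to check the arithmetic or see how a permalink structure turns into a URI path, here is a small sketch. The counts are copied from the table above; the `share` and `render` helpers are my own illustration and are not how WordPress itself expands permalinks.

```python
from datetime import date

# Counts copied from the table above; "" stands for "no permalink structure".
PERMALINK_COUNTS = {
    "/%year%/%monthnum%/%day%/%postname%/": 404,
    "/%postname%/": 8,
    "": 3,
    "/%year%/%postname%/": 2,
    "/articles/%postname%/": 1,
    "/%year%/%monthnum%/%postname%/": 1,
    "/%monthnum%/%year%/%postname%/": 1,
}


def share(structure: str) -> float:
    # Percentage of all surveyed blogs using the given structure.
    total = sum(PERMALINK_COUNTS.values())
    return 100 * PERMALINK_COUNTS[structure] / total


def render(structure: str, published: date, slug: str) -> str:
    # Illustrative expansion of the four tags seen in the table.
    replacements = {
        "%year%": f"{published.year:04d}",
        "%monthnum%": f"{published.month:02d}",
        "%day%": f"{published.day:02d}",
        "%postname%": slug,
    }
    path = structure
    for tag, value in replacements.items():
        path = path.replace(tag, value)
    return path


if __name__ == "__main__":
    default = "/%year%/%monthnum%/%day%/%postname%/"
    print(round(share(default)))                            # 96
    print(render(default, date(2011, 3, 22), "hello-world"))  # /2011/03/22/hello-world/
```

The table sums to 420 blogs, so 404 of them on the default structure works out at roughly 96%.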
Blackboard
blackboard.lincoln.ac.uk
The interface of Blackboard consists of an HTML frameset page which, if you read the source code, actually explains itself:
“The Blackboard Academic Suite environment includes a header frame with images and buttons customized by the institution and tabs that navigate to different areas within Blackboard Academic Suite. Clicking on a tab will open that area in the content frame. Web pages containing specific content, features, functions, and tools are accessed from the tab areas.”
Basically this means that the home page, regardless of whether or not you’re signed into Blackboard, is always http://blackboard.lincoln.ac.uk/webapps/portal/frameset.jsp, and it also means you can’t directly link to a Blackboard resource (as I’m about to) without screwing up the interface. You could say this doesn’t matter at all from an SEO standpoint because resources aren’t externally available; however, I can’t help sympathising with someone trying to explain how to access a Blackboard resource over the phone. I’d imagine it would go something like “click on this, now that, now sign in, now click the top link in the list on the left, now select the course you want, then this, that and you should now see what you want”, as opposed to “just click on the link I’ve just sent you”.
Taking the above into consideration (i.e. that Blackboard links are essentially irrelevant to the end user) it is interesting to see what some Blackboard URIs look like:
Blackboard Module | URI |
Announcements | http://blackboard.lincoln.ac.uk/bin/common/announcement.pl?action=LIST&context=mybb&scope=_all |
View Grades | http://blackboard.lincoln.ac.uk/webapps/gradebook/do/student/viewCourses |
Community | http://blackboard.lincoln.ac.uk/webapps/portal/tab/_3_1/index.jsp |
Logout | https://blackboard.lincoln.ac.uk/webapps/login?action=logout |
Help | http://blackboard.lincoln.ac.uk/webapps/portal/frameset.jsp?tab_id=_40_1 |
Example course | http://blackboard.lincoln.ac.uk/bin/common/course.pl?course_id=_46268_1 |
Course blog | http://blackboard.lincoln.ac.uk/webapps/blackboard/content/listContent.jsp?course_id=_46250_1&content_id=_444545_1&mode=reset |
Supervisor wiki | http://blackboard.lincoln.ac.uk/webapps/lobj-wiki-bb_bb60/wiki_home/Handler?course_id=_32711_1&content_id=_457614_1 |
A few URIs seem mildly related to their content, such as the example course and the gradebook; however, others, like the community home page, seem almost random.
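To see how little of each URI is meaningful to a person, you can split a few of them into path and query parameters. This is just a sketch using Python's standard `urllib.parse`; the `describe` helper is a name I've made up for the example.

```python
from urllib.parse import parse_qs, urlsplit


def describe(uri: str) -> tuple[str, dict[str, str]]:
    # Return the path and a flat dict of query parameters
    # (first value only, which is enough for these URIs).
    parts = urlsplit(uri)
    params = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return parts.path, params


if __name__ == "__main__":
    examples = [
        "http://blackboard.lincoln.ac.uk/bin/common/course.pl?course_id=_46268_1",
        "http://blackboard.lincoln.ac.uk/webapps/portal/tab/_3_1/index.jsp",
        "http://blackboard.lincoln.ac.uk/webapps/gradebook/do/student/viewCourses",
    ]
    for uri in examples:
        path, params = describe(uri)
        print(path, params)
```

The course URI at least exposes a `course_id` parameter, although its value is an opaque internal identifier, while the community tab gives a reader nothing to go on beyond `_3_1`.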
SharePoint 2003
portal.lincoln.ac.uk (internal) visit.lincoln.ac.uk (external)
Our SharePoint 2003 installation is our institution’s internal content repository and intranet. Every department and faculty has its own “site” (read: section), and providing you know exactly what you are looking for (searching doesn’t work) it is generally quite useful. Unlike Blackboard, however, SharePoint is not built using frames, so you can give out direct links to content, for example:
Resource | URI |
ICT department | https://portal.lincoln.ac.uk/C15/CS/default.aspx |
University Resource | https://portal.lincoln.ac.uk/C17/UniversityResources/default.aspx |
First aiders | https://portal.lincoln.ac.uk/C11/C0/First%20Aiders/default.aspx |
External news | https://portal.lincoln.ac.uk/External%20News/default.aspx |
FreeCycle | https://portal.lincoln.ac.uk/C13/C18/Freecycle/default.aspx |
At first glance the SharePoint URIs look as random as Blackboard’s; however, I’ve had it explained to me why this is. Basically SharePoint is made up of “sites”. In the 2003 version the first 20 sites can be named whatever you want, for example “External News” or “University Resources”; however, after that SharePoint insists on using folders that count up, prefixed with the letter “C”, e.g. C0, C1, C2, and so on up to C19. Inside these “sites” you can again have another 20 directories named whatever you want, and then the C directories start. This basically means that sites are capped at 40 directories per directory, only half of which can be given a name of your choosing. Confusing yes. Logical no. I’ve been informed that this is no longer the case in SharePoint 2007 onwards.
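The naming rule as it was explained to me can be written down in a few lines. This sketch models only the behaviour described above; it is not taken from SharePoint documentation, and the function name and constants are my own.

```python
NAMED_SLOTS = 20     # first 20 sites keep the name you choose
COUNTED_SLOTS = 20   # the next 20 become C0 .. C19


def folder_name(index: int, chosen_name: str) -> str:
    # Folder used for the index-th site (counting from 0) created
    # inside a parent, following the rule described in this post.
    if index < 0:
        raise ValueError("index must not be negative")
    if index < NAMED_SLOTS:
        return chosen_name
    if index < NAMED_SLOTS + COUNTED_SLOTS:
        return f"C{index - NAMED_SLOTS}"
    raise ValueError("a parent can only hold 40 sites")


if __name__ == "__main__":
    print(folder_name(0, "External News"))    # External News
    print(folder_name(20, "ICT department"))  # C0
    print(folder_name(39, "Freecycle"))       # C19
```

Under this rule the 21st site created in any parent loses its chosen name and lands at C0, whatever it was called.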
As a result you can potentially have sites that have a nice friendly URI structure (if you discount the /default.aspx at the end) e.g. https://portal.lincoln.ac.uk/Examples/HelloWorld/default.aspx. However for sites which are granted C-directories (such as the ICT department) the URL loses all contextual relevance. Again I think the best bet for users is to follow links through to the resource (or if they’re feeling brave, try searching for the content).
Posters at Lincoln
posters.lincoln.ac.uk
Posters at Lincoln was one of the first sites I worked on when I started working for the Online Services Team here at the University. Its development brought about the Common Web Design and a number of other projects we’ve worked on over the last year. Therefore forgive me if I’m a bit biased in this overview.
The site is split up into “groups”, such as ICT Department or Marketing and Communications, and “campaigns”, which are posters created for different events and public notices.
We designed the URI structure to be SEO friendly and semantically relevant with URLs like:
Resource | URL |
Home page | http://posters.online.lincoln.ac.uk/home |
About page | http://posters.lincoln.ac.uk/about |
All campaigns | http://posters.lincoln.ac.uk/all |
Marketing and Communications group | http://posters.lincoln.ac.uk/group/comms |
Get Satisfaction campaign | http://posters.lincoln.ac.uk/campaign/getsatisfaction |
Science Fair campaign | http://posters.lincoln.ac.uk/campaign/Science |
As you can see, the URI structure is very simple and clean in contrast to some of the other examples mentioned above. This is partly down to the fact that the framework we built the website in, CodeIgniter, has sexy URI support built in, but also because it’s trivial to make Apache serve up extensionless URIs.
This brief overview has hopefully outlined some of the differences between the URI construction of some of the online services we use at Lincoln.