The semantic web for publishers and bloggers
The semantic web is often referred to as the “next phase” of the world wide web. It’s also sometimes referred to, perhaps pretentiously, as “Web 3.0”. The semantic web carries a flavour of artificial intelligence, because it involves computers “understanding” content (e.g. teaching a machine that “Africa” is a continent and that “Barack Obama” is a person and a politician).
Semantic tagging for dummies
Adding semantic power to your website content essentially involves adding machine-readable metadata to your articles or posts that denotes relationships and meaning. This could involve tagging your content according to various categories, such as marking certain words in an article as referring to people, places, companies and/or types of technologies. This metadata could be stored as a database field or carried as XML (in an RSS feed, for example) attached to the content.
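To make this a little more concrete, here is a minimal sketch of what such metadata might look like. The class names (`SemanticTag`, `Article`), the field names and the sample article are all made up for illustration; a real CMS would have its own schema.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class SemanticTag:
    text: str      # the phrase as it appears in the article
    category: str  # what the phrase refers to, e.g. "person" or "continent"


@dataclass
class Article:
    title: str
    body: str
    tags: list[SemanticTag] = field(default_factory=list)


# Hypothetical article, tagged by hand for the sake of the example.
article = Article(
    title="Example article",
    body="Barack Obama spoke about trade with Africa.",
    tags=[
        SemanticTag("Barack Obama", "person"),
        SemanticTag("Africa", "continent"),
    ],
)

# Serialise the article and its tags to a machine-readable form.
metadata_json = json.dumps(asdict(article), indent=2)
print(metadata_json)
```

The point is simply that each tag carries a category alongside the text, which is what lets a program treat “Africa” differently from “Barack Obama”.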
Why do it?
It’s important to add semantic power to your content because it allows your servers to find, extract, share, and re-use the information. Tagging your content in the semantic sense will allow a computer to “know” that Tony Blair or George Bush in your article or post are in fact people, or that the United States of America is a country, and Africa a continent. It gives context to the tags in your articles — and allows you to automatically do more with your content, such as build up an index of people mentioned on your site or call up a map with the locations referred to in an article. In a search sense, it helps search engines deliver more relevant and accurate results.
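As a rough sketch of the “index of people” idea, assuming each article already carries (text, category) tags: the article titles, the data layout and the `build_index` helper below are hypothetical, not taken from any particular CMS.

```python
from collections import defaultdict

# Hypothetical tagged articles: (title, [(tag text, category), ...])
articles = [
    ("Article A", [("Tony Blair", "person"), ("United States of America", "country")]),
    ("Article B", [("George Bush", "person"), ("Tony Blair", "person")]),
    ("Article C", [("Africa", "continent")]),
]


def build_index(articles, category):
    """Map each tag of the given category to the articles that mention it."""
    index = defaultdict(list)
    for title, tags in articles:
        for text, tag_category in tags:
            # Only tags of the requested category make it into the index.
            if tag_category == category and title not in index[text]:
                index[text].append(title)
    # Sort alphabetically by tag text so the index reads A-Z.
    return dict(sorted(index.items()))


people_index = build_index(articles, "person")
print(people_index)
```

Because the category is stored with each tag, the same function could build an index of countries or continents just by passing a different category.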
What’s a practical example?
Here’s an example: When redesigning the M&G Online we decided to semantically tag our articles. As a start we chose just four simple categories: people, cities, countries and companies. We created fields in our Content Management System (CMS) for each article, where our journalists would pick out these tags. To save them time we used an automatic semantic tagging service called Open Calais (Read the blog here), which suggested tags to the journalists as they entered them. For our historical archive of hundreds of thousands of articles, we also used Calais to automatically sift through and tag the content.
Pulling out these fields allowed us to do the following:
- Build an index of topics A-Z
- Automatically pull in related articles or pictures, based on the tags
- Automatically pull in related content for each article from external (competitor) news media and the blogosphere
- Create news alerts on companies or people (useful for PR companies?)
- Pull out map images corresponding to the countries mentioned in articles
- Predict readers’ interests and suggest articles to read, based on their previous browsing habits (based on the tags)
- Create basic tag clouds, showing popular subjects, people and places
- Perform a basic SEO function, since intelligent semantic tagging makes the site more search-engine friendly
- …and many more applications…
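Two of the items in this list, related articles and tag clouds, can be sketched in a few lines. This is an illustration under simple assumptions rather than the code we actually ran: the article IDs and tag sets are invented, and relatedness is scored by nothing more than the number of shared tags.

```python
from collections import Counter

# Hypothetical article IDs mapped to their sets of semantic tags.
tagged = {
    "article-1": {"Tony Blair", "United States of America"},
    "article-2": {"Tony Blair", "George Bush"},
    "article-3": {"Africa"},
}


def related(article_id, tagged, limit=3):
    """Return IDs of other articles sharing at least one tag, best match first."""
    own = tagged[article_id]
    scores = []
    for other_id, other_tags in tagged.items():
        if other_id == article_id:
            continue
        shared = len(own & other_tags)  # number of tags in common
        if shared:
            scores.append((shared, other_id))
    # Most shared tags first; ties broken by ID for a stable order.
    scores.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other_id for _, other_id in scores[:limit]]


def tag_counts(tagged):
    """Count how many articles use each tag: the raw data behind a tag cloud."""
    return Counter(tag for tags in tagged.values() for tag in tags)


print(related("article-1", tagged))
print(tag_counts(tagged).most_common(1))
```

A tag cloud would then size each tag according to its count, and a recommendation feature could apply the same overlap scoring to the tags of articles a reader has already viewed.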
How could it work in a blogging context?
Recently I downloaded two plugins to add semantic power to my posts. The first was a plugin called Tagaroo, also by Open Calais. Based on the tags it pulls from my posts, it also recommends relevant pictures from Flickr that I can use. The second was a plugin called Simple Tags, which allowed me to do things like pull up related articles for each post automatically. However, it’s not as semantically “aware” as Calais.
How could this apply to a social media context?
Via Wired magazine, I came across Twine, which says it is powered by “semantic understanding”. Twine automatically organises information, learns about your interests and makes connections and recommendations. The more you use Twine, the better it understands your interests and the more useful it becomes. It’s still in beta, but the idea is a good one. One of the hallmarks of the digital age of cheap content production and distribution is too much information. Filters like Twine are needed to deliver relevant, quality content.