The ‘Chinese Room’ and why computers can’t figure out sarcasm


Econsultancy.com, which recently launched its Twitter for Business Guide, suggests that, contrary to popular belief, most of the sentiment posted about brands on Twitter is positive.

This finding also contradicts Brandwatch’s Customer Service Index, which states that of the 16 000 tweets it studied, 48 percent were negative, 16 percent positive and 36 percent neutral.

The Econsultancy survey found that 26 percent of consumers say they have complained about a brand on Twitter, compared to over half (58 percent) who have praised a brand on the site. Whilst the two documents clearly differ in their methodologies, what becomes apparent is that Twitter analysis is akin to John Searle’s ‘Chinese Room’ thought experiment:

Imagine a native English speaker who knows no Chinese locked in a room full of boxes of Chinese symbols (a data base) together with a book of instructions for manipulating the symbols (the program). Imagine that people outside the room send in other Chinese symbols which, unknown to the person in the room, are questions in Chinese (the input). And imagine that by following the instructions in the program the man in the room is able to pass out Chinese symbols which are correct answers to the questions (the output). The program enables the person in the room to pass the Turing Test for understanding Chinese but he does not understand a word of Chinese.

Searle goes on to say, “The point of the argument is this: if the man in the room does not understand Chinese on the basis of implementing the appropriate program for understanding Chinese then neither does any other digital computer solely on that basis because no computer, qua computer, has anything the man does not have.”

Consider the following tweet: “Checkers is a great grocery store, if you’re into 4 day-old chicken”. Now, whilst an ordinary human being can tell you this is stock-standard sarcasm, Twitter sentiment monitoring software is going to see the words ‘Checkers’ and ‘great’ and report back that this is a favourable tweet about the brand.
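To make the failure concrete, here is a minimal sketch of the kind of keyword counting described above. It is not how any particular monitoring product works; the word lists and the `naive_sentiment` function are hypothetical and deliberately tiny, assuming only that the software scores a tweet by the positive and negative words it contains.

```python
import re

# Hypothetical, tiny word lists for illustration only.
# Real sentiment tools rely on far larger lexicons and extra rules.
POSITIVE = {"great", "good", "love", "excellent"}
NEGATIVE = {"bad", "awful", "terrible", "hate"}


def naive_sentiment(text):
    """Label text by counting positive and negative words, ignoring context."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


tweet = "Checkers is a great grocery store, if you're into 4 day-old chicken"
print(naive_sentiment(tweet))  # prints "positive"
```

The scorer finds ‘great’, finds nothing on its negative list, and files the tweet as praise. Like the man in the room, it is shuffling symbols without knowing what they mean.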

Tim Shier, MD of sentiment analysis company Brandseye, mentions that whilst sarcasm is hard for computer programs to pick up in a social media context, it’s not impossible.

He pointed me to this rather academic-looking paper on semi-supervised recognition of sarcastic sentences, wherein syntactic and pattern-based features of sarcastic sentences are used to build an algorithm that identifies sarcasm in new sentences. The writers of the paper analysed hundreds of differently formed sarcastic sentences in terms of what words were used, the pattern in which they were presented and the accompanying punctuation.

After experimenting with a data set of 66 000 Amazon product reviews, their SASI algorithm came back with 77 percent accuracy in terms of identifying sarcasm, yet it couldn’t distinguish between a sentence which reads: “This book was really good until page 2!” and “This book was really good until page 302!”.

While the former is clearly sarcastic, the latter sentence (with the same syntactical and punctuation structure) merely intimates the book didn’t have a great ending.
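A rough sketch shows why the two sentences are so hard to separate on surface features alone. This is not the SASI algorithm from the paper; `surface_pattern` is a hypothetical helper that assumes a sentence is reduced to its words and punctuation, with any number replaced by a placeholder.

```python
import re


def surface_pattern(sentence):
    """Reduce a sentence to lowercase words and punctuation, masking numbers."""
    # Tokens are runs of digits, runs of letters, or single punctuation marks.
    tokens = re.findall(r"\d+|[A-Za-z']+|[^\sA-Za-z\d]", sentence)
    return ["<NUM>" if t.isdigit() else t.lower() for t in tokens]


early = "This book was really good until page 2!"
late = "This book was really good until page 302!"

print(surface_pattern(early))
print(surface_pattern(early) == surface_pattern(late))  # prints True
```

Once the page number is masked, both sentences collapse into the same pattern. Telling them apart takes the knowledge that page 2 is the start of a book and page 302 is probably near the end, which is exactly what word and punctuation patterns do not carry.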

The question is: is that good enough?

Image: applebutter
