
The ‘Chinese Room’ and why computers can’t figure out sarcasm

Econsultancy.com, which recently launched its Twitter for Business Guide, suggests that, contrary to popular belief, most of the sentiment posted about brands on Twitter is positive.

This finding also contradicts Brandwatch’s Customer Service Index, which states that of the 16 000 tweets it studied, 48 percent were negative, 16 percent positive and 36 percent neutral.

The survey found that 26 percent of consumers say they have complained about a brand on Twitter, compared to over half (58 percent) who have praised a brand on the site. Whilst the two documents definitely differ in terms of their methodologies, what becomes apparent is that Twitter analysis is akin to John Searle’s ‘Chinese Room’ thought experiment:

Imagine a native English speaker who knows no Chinese locked in a room full of boxes of Chinese symbols (a data base) together with a book of instructions for manipulating the symbols (the program). Imagine that people outside the room send in other Chinese symbols which, unknown to the person in the room, are questions in Chinese (the input). And imagine that by following the instructions in the program the man in the room is able to pass out Chinese symbols which are correct answers to the questions (the output). The program enables the person in the room to pass the Turing Test for understanding Chinese but he does not understand a word of Chinese.

Searle goes on to say, “The point of the argument is this: if the man in the room does not understand Chinese on the basis of implementing the appropriate program for understanding Chinese then neither does any other digital computer solely on that basis because no computer, qua computer, has anything the man does not have.”

Consider the following tweet: “Checkers is a great grocery store, if you’re into 4 day-old chicken”. Now, whilst an ordinary human being can tell you this is stock-standard sarcasm, Twitter sentiment monitoring software is going to see the words ‘Checkers’ and ‘great’ and report back that this is a favourable tweet about the brand.
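To see why, here is a minimal sketch of keyword-based sentiment scoring — not any vendor’s actual algorithm, and the word lists are invented for illustration — that shows how the Checkers tweet comes back looking positive:

```python
# Toy keyword-based sentiment scorer: counts positive vs negative words.
# Sarcasm is invisible to it, because 'great' is all it sees.

POSITIVE = {"great", "love", "awesome", "good"}
NEGATIVE = {"terrible", "awful", "bad", "hate"}

def keyword_sentiment(tweet: str) -> str:
    words = {w.strip(".,!?'\"").lower() for w in tweet.split()}
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

tweet = "Checkers is a great grocery store, if you're into 4 day-old chicken"
print(keyword_sentiment(tweet))  # prints "positive"
```

The scorer has no notion of the ‘if’ clause that flips the meaning; it simply tallies word matches.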

Tim Shier, MD of sentiment analysis company Brandseye, mentions that whilst sarcasm is hard for computer programs to pick up in a social media context, it’s not impossible.

He pointed me to this rather academic-looking paper on semi-supervised recognition of sarcastic sentences, wherein syntactic and pattern-based features of sarcastic sentences are used to create an algorithm to identify sarcasm in the future. The writers of the paper analysed hundreds of differently formed sarcastic sentences in terms of what words were used, the pattern in which they were presented and the accompanying punctuation.

After experimenting with a data set of 66 000 Amazon product reviews, their SASI algorithm came back with 77 percent accuracy in identifying sarcasm, yet it couldn’t discern between a sentence which reads “This book was really good until page 2!” and “This book was really good until page 302!”.

While the former is clearly sarcastic, the latter sentence (with the same syntactical and punctuation structure) merely intimates the book didn’t have a great ending.
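The reason the two sentences are indistinguishable becomes obvious if you sketch the pattern-abstraction idea behind approaches like SASI (the details here are illustrative, not the paper’s exact feature set): keep the common function words, and replace content words and numbers with slots.

```python
# Abstract a sentence into a pattern: function words survive, content
# words become [CW] slots and numbers become [NUM] slots.

COMMON = {"this", "was", "really", "until", "the", "a", "is"}

def abstract_pattern(sentence: str) -> str:
    tokens = sentence.rstrip("!.?").lower().split()
    slots = []
    for tok in tokens:
        if tok in COMMON:
            slots.append(tok)
        elif tok.isdigit():
            slots.append("[NUM]")
        else:
            slots.append("[CW]")  # content-word slot
    return " ".join(slots) + (" !" if sentence.endswith("!") else "")

a = abstract_pattern("This book was really good until page 2!")
b = abstract_pattern("This book was really good until page 302!")
print(a == b)  # prints True: both reduce to the same pattern
```

Both sentences collapse to `this [CW] was really [CW] until [CW] [NUM] !`, so a purely pattern-based model has nothing left to separate them with.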

The question is: is that good enough?

Image: applebutter

Author | Graeme Lipschitz: Columnist

Graeme Lipschitz is the co-founder of digital innovation agency Wonderland Works where he heads up Social Media, Search and Product Development. He was previously the business development manager at Clicks2Customers.com and has worked for Google UK and South Africa as a lead account strategist, growing Adwords accounts in the...
  • Hi Graeme.

    Interesting piece and thanks a lot for the reference to our report.

    I should add that while Brandwatch does include automatic sentiment analysis, we had our analysts manually verify each mention over the period this study was conducted, so they should be as accurate as is humanly possible!

    It’s also worth considering that the tweets we looked at were solely about customer service (i.e. they had to include a customer service related phrase near the brand mention).

    So, the overall conclusion from our report can be interpreted in two ways: either sentiment about customer service is negative because customer service as represented online is currently not meeting overall consumer demands, OR people are more likely to tweet about bad customer service experiences than good ones.

    My hunch is that it’s a combination of both…



  • Nice piece, but I do think that this may be like the pursuit of the Holy Grail in some respects. I am not sure that we are ever going to be able to get programming to extrapolate sentiment with 100% accuracy; we should see it improve, but this is going to be one element that will always require some sort of human analysis.

  • Rowan Puttergill

    Hi Graeme

    Firstly, thanks for the article. I am currently working on a project where we are running into precisely these kinds of difficulties with natural language processing, so picking up on other work that is being done in the area is extremely useful. I am apologising in advance for the length of my response to your article. ;-)

    With regard to the Chinese Room Experiment, not everybody buys into Searle’s view, which is strongly opposed to the idea that any machine might ever become sentient. It has taken me a long time to finally reach a point where I am willing to commit to one camp or the other, but I must admit that I have found myself quite firmly situated on the other side of the fence. Nonetheless, I must point out that the Chinese Room Experiment is not wholly applicable to the problem that you are discussing. Sure, you may argue that understanding sarcasm requires some level of semantic understanding, but in this, I think you will find that detecting sarcasm is really about placing a statement into context and determining whether or not it is uttered falsely.

    Part of the problem that you describe about sarcasm is that sarcastic statements require context that may not be provided in the statement that is issued. As a vegetarian, I am not sure whether 4-day old chicken tastes better than 1-day old chicken. Certainly, I know that in order for venison or game to taste good, it is meant to hang for some time. Without your cultural frame of reference, I can only guess as to whether you are being sarcastic or not. Sure, I am likely to make a good guess, but I am still only guessing.

    A better example would be a statement by myself to the effect of “I really love software patents.” If you know me, you will know that I am a big advocate of open-source software and am most likely really opposed to software patents. But you really need to know about me in order to pick up on the sarcasm. The jump that an algorithm would need to make would include mining all of the relevant data about me and then evaluating whether I have made this statement in contravention of all of the other things that apply. That job can become enormous because it is tangled up with obtaining all of the information about the culture that I belong to. Certainly, my completely non-geeky partner would have no idea whether I was being sarcastic or not, even though she knows I love open-source software. That’s because she doesn’t have the information about how patenting affects open-source development, not because she doesn’t understand the meaning of ‘software patents’ or ‘open-source software’. Armed with the correct information, she could apply the following logic:

    1) Software patenting is bad for open-source software development (research)
    2) Rowan supports open-source software development (prior information)
    3) Rowan has stated that he loves software patenting (statement to be evaluated)
    4) Rowan’s statement is false (sarcasm)

    This process could be done with almost no semantic understanding of the concepts involved, which is why I don’t believe Searle’s Chinese Room Experiment really applies to the problem. For text-based content, the process is admittedly very complex, because any algorithm designed to work out sarcasm will need to mine for additional context, and this can be so time- and processor-intensive that it is just not worth pursuing that path at all. There are other cues that you can look for, though. For instance, the fact that you append an ‘if’ clause to your original statement is an indicator that there is uncertainty about the original statement. A reasonable algorithm can be designed to look out for these sorts of indicators, increasing the likelihood of success.
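    The four-step contradiction check above can be sketched as a toy lookup — the knowledge base and implication table here are invented for illustration, and any real system would have to mine them from data:

```python
# Toy contradiction-based sarcasm check, following the four steps:
# research -> prior information -> statement -> evaluate for contradiction.

knowledge = {
    # (speaker, topic) -> known stance
    ("rowan", "open-source software"): "supports",
}
implications = {
    # topic -> the topic it harms
    "software patents": "open-source software",
}

def looks_sarcastic(speaker: str, topic: str, stated_stance: str) -> bool:
    harmed = implications.get(topic)            # step 1: research
    if harmed is None:
        return False                            # no context, can't judge
    prior = knowledge.get((speaker, harmed))    # step 2: prior information
    # steps 3-4: professing love for something that harms a thing the
    # speaker supports contradicts the prior, so flag the statement
    return stated_stance == "loves" and prior == "supports"

print(looks_sarcastic("rowan", "software patents", "loves"))  # prints True
```

    For an unknown speaker the check falls back to “not sarcastic”, which mirrors the point about the non-geeky partner: without the prior information, no verdict is possible.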

    More complicated than sarcasm is the use of slang in social media. I have seen tweets like “cherry coke is the shit” and “my new Nike trainers are sick”. It is very difficult to determine whether these words are being used pejoratively or not. However, for a non-native English speaker, the same problem exists, so I don’t think software that is getting a 70% success rate is doing too badly.
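    The slang problem can also be sketched in a few lines — the lexicons below are invented for illustration: a standard lexicon scores “sick” as negative, and a slang table has to override it before the plain word lookup runs:

```python
# Toy lexicon scorer with slang overrides: slang phrases are scored
# first and removed so the standard lexicon doesn't double-count them.

STANDARD = {"sick": -1, "shit": -1, "great": 1}
SLANG_OVERRIDES = {"sick": 1, "the shit": 1}

def score(text: str, use_slang: bool = False) -> int:
    text = text.lower()
    total = 0
    if use_slang:
        for phrase, value in SLANG_OVERRIDES.items():
            if phrase in text:
                total += value
                text = text.replace(phrase, "")  # consume the phrase
    for word in text.split():
        total += STANDARD.get(word.strip(".,!"), 0)
    return total

print(score("my new Nike trainers are sick"))                  # prints -1
print(score("my new Nike trainers are sick", use_slang=True))  # prints 1
```

    The same sentence flips polarity depending on which lexicon wins, which is exactly the ambiguity a non-native speaker faces.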

    All in all, I don’t think that creating software that can properly evaluate the polarity of a statement is completely impossible; it is just incredibly complicated and probably a long way off. It is easy to envisage the man in the Chinese room being handed another set of rules that will output the correct information for this, even if he still does not grasp the semantics behind the statements. Either way, this isn’t about whether a system passes the Turing Test, only whether it can correctly evaluate whether statements are positive or not.

  • Now I understand why computers don’t feel

  • Kira

    Indeed, sentiment is such a subjective matter – what could be positive for someone or a department might be the contrary to others…

