Finding an app on the Google Play Store can be an exercise in frustration, especially if it’s an eagerly anticipated app or new release. Things don’t get much better for mega-popular apps, as the search results are often cluttered with irrelevant entries.
Fortunately, Google is working on a solution, using machine learning to get better results. The company detailed its efforts on the Google Research Blog.
“Searches by topic require more than simply indexing apps by query terms; they require an understanding of the topics associated with an app,” the team of software engineers wrote. The work required machine-learning approaches, but one big challenge was the small amount of labeled data available for most topics.
“While for some popular topics such as ‘social networking’ we had many labeled apps to learn from, the majority of topics had only a handful of examples,” the team explained.
The neural network for Google Play search initially used the title and description to predict topics for an app
The first solution was to build a deep neural network (DNN) trained to predict topics for an app based on words/phrases used in the title and description.
“For example, if the app description mentioned ‘frightening’, ‘very scary’, and ‘fear’ then associate the ‘horror game’ topic with it. However, given the learning capacity of DNNs, it completely ‘memorised’ the topics for the apps in our small training data and failed to generalise to new apps it hadn’t seen before,” Google’s engineers elaborated.
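The failure described above, perfect recall of training examples but no ability to handle new ones, can be illustrated with a deliberately crude sketch. This is not Google's DNN: the `MemorizingClassifier` class, the sample descriptions and the topic labels are all hypothetical, and the "model" is just a lookup table standing in for a network with enough capacity to memorise a tiny training set.

```python
def tokenize(text):
    """Lowercase a description and split it into words."""
    return text.lower().split()


class MemorizingClassifier:
    """Toy stand-in for an over-capacity model (assumption, not Google's DNN).

    It stores every training description verbatim, so it can only label
    descriptions it has already seen.
    """

    def __init__(self):
        self.seen = {}

    def fit(self, examples):
        # examples: list of (description, topic) pairs
        for description, topic in examples:
            self.seen[tuple(tokenize(description))] = topic

    def predict(self, description):
        # Returns None when the exact description was never seen in training.
        return self.seen.get(tuple(tokenize(description)))


# Hypothetical training data: one example per topic, as with rare topics.
train = [
    ("a frightening and very scary game", "horror game"),
    ("chat with friends and share photos", "social networking"),
]

model = MemorizingClassifier()
model.fit(train)

# Every training example is recalled correctly...
train_accuracy = sum(model.predict(d) == t for d, t in train) / len(train)

# ...but a new horror-themed description gets no topic at all.
unseen_prediction = model.predict("face your fear in this scary maze")
```

Here `train_accuracy` comes out perfect while `unseen_prediction` is empty, which is the gap between memorising and generalising that the engineers describe.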
The team then revised its machine-learning approach, using the way humans learn as inspiration. It noted, as an example, that humans need to see very few ‘horror game’ app descriptions before learning how to associate new apps with that genre.
“To emulate this, we tried a very rough approximation of this language-centric learning. We trained a neural network to learn how language was used to describe apps,” the team said of its overhauled approach, which is currently in use on the Google Play Store.
The engineers said that their new approach yielded “reasonable results” but would over-generalise at times. “For instance, it might associate Facebook with ‘dating’ or Plants vs Zombies with ‘educational games’,” they wrote.
Far from just using humans as inspiration, the team is actually using humans in the process too.
“We built a pipeline to have human raters evaluate the classifier output and fed consensus results back as training data. This process allowed us to bootstrap from our existing system, giving us a path to steadily improve classifier performance.”
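The blog post does not spell out how the rater consensus is computed, so the following is only a minimal sketch of the general idea. The function name `consensus_labels`, the vote format, and the `min_raters` and `min_agreement` thresholds are all assumptions made for illustration: several raters judge each app and topic pairing proposed by the classifier, and only pairings with strong agreement become new training examples.

```python
from collections import Counter, defaultdict


def consensus_labels(ratings, min_raters=3, min_agreement=0.8):
    """Turn individual rater votes into consensus training labels.

    ratings: iterable of (app, topic, is_relevant) votes, one per rater.
    Returns a list of (app, topic, label) tuples for pairings where enough
    raters voted and a large enough share of them agreed. The thresholds
    are illustrative assumptions, not values from Google's pipeline.
    """
    votes = defaultdict(list)
    for app, topic, is_relevant in ratings:
        votes[(app, topic)].append(bool(is_relevant))

    labeled = []
    for (app, topic), judgments in votes.items():
        if len(judgments) < min_raters:
            continue  # too few opinions to trust
        verdict, count = Counter(judgments).most_common(1)[0]
        if count / len(judgments) >= min_agreement:
            labeled.append((app, topic, verdict))
    return labeled


# Hypothetical rater votes on classifier output.
ratings = [
    ("Facebook", "dating", False),
    ("Facebook", "dating", False),
    ("Facebook", "dating", False),
    ("Facebook", "social networking", True),
    ("Facebook", "social networking", True),
    ("Facebook", "social networking", True),
    ("Plants vs Zombies", "educational games", True),
    ("Plants vs Zombies", "educational games", False),
    ("Plants vs Zombies", "educational games", False),
]

new_training_data = consensus_labels(ratings)
```

In this sketch the two Facebook pairings reach unanimous agreement and are kept, one as a negative example and one as a positive example, while the split vote on Plants vs Zombies falls below the threshold and is left out rather than fed back as a noisy label.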