Google changed the world with its PageRank algorithm, creating a new kind of internet search engine that could instantly sift through the world’s online information and, in many cases, show us just what we wanted to see. But that was a long time ago. As the volume of online documents continues to increase, we need still newer ways of finding what we want.
That’s why Google is now running its search engine with help from machine learning, augmenting its predetermined search rules with deep neural networks that can learn to identify the best search results by analyzing vast amounts of existing search data. And it’s not just Google. Microsoft is pushing its Bing search engine in the same direction, and so are others beyond the biggest names in tech.
This morning, the Allen Institute for Artificial Intelligence, a not-for-profit created by Microsoft co-founder Paul Allen, unveiled a search engine it calls Semantic Scholar. It uses machine learning and other AI in an effort to significantly improve the way the academic world searches through the increasingly enormous corpus of published research. Pointing to recent improvements to the Google search engine, Amazon’s product recommendation engine, and the Facebook News Feed, Allen Institute CEO Oren Etzioni says the organization is trying to leverage many of the same techniques for the academic community.
Etzioni likes to talk about “the Moore’s Law of scientific publication.” Judging from the research papers already indexed by the new search engine, the volume of academic research is increasing at an exponential rate. One independent study says the number of papers is growing about 4 or 5 percent a year, with 2.5 million published in 2014. That means researchers just don’t have the time to look through everything. They need some help.
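To get a feel for what that growth rate implies, here is a back-of-the-envelope sketch in Python. It takes the study's figures at face value (2.5 million papers in 2014, growing 4 or 5 percent a year) and assumes, purely for illustration, that the rate holds constant. The function names are hypothetical and the projections are arithmetic, not forecasts from the study. Under those assumptions, annual output doubles roughly every 14 to 18 years.

```python
import math

# Figure quoted in the article: papers published in 2014.
PAPERS_2014 = 2_500_000


def doubling_time(annual_growth_rate):
    """Years for annual output to double at a constant growth rate."""
    return math.log(2) / math.log(1 + annual_growth_rate)


def projected_papers(base_count, annual_growth_rate, years):
    """Compound growth: papers per year after `years` at a constant rate."""
    return base_count * (1 + annual_growth_rate) ** years


if __name__ == "__main__":
    # Assumption: the 4-5 percent rate stays fixed, which the study does not promise.
    for rate in (0.04, 0.05):
        years = doubling_time(rate)
        in_ten = projected_papers(PAPERS_2014, rate, 10)
        print(f"{rate:.0%} growth: doubles in {years:.1f} years, "
              f"about {in_ten:,.0f} papers a year after a decade")
```

Even at the lower rate, a researcher entering a field today would face about twice as many new papers each year by the later stages of a career.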
“They need a way to deal with this overload,” says Marti Hearst, a professor at the University of California, Berkeley, whose research focuses on search engines. “And with improvements in user interface design and AI, we are seeing tools that can make it easier.”
The Allen Institute search engine is designed not only to help scholars find the papers they’re looking for but also to surface the specific results and images that can serve their own research. It does this with a variety of techniques, including natural language processing algorithms that can better understand what a paper is saying and computer vision technology that can identify tables and photos in a paper and extract them. “We want to transform from a keyword exercise to something that is using semantics and AI,” Etzioni says.
Initially, the new search engine will focus on neuroscience and computer science research, covering over 10 million papers, but the organization plans on expanding into other subjects. By next year, it says, the service will cover all biomedical literature as defined by PubMed, the existing medical and science database. The techniques that underpin Semantic Scholar are hardly groundbreaking, but the tool at least points in the right direction. And after so many recent advances in machine learning across the tech world, the promise is that we can get there.