Charles Arthur’s Overspill blog recently linked to Million Short, a site that aims to help you find the search results that get lost or buried when you use Google, DuckDuckGo or Bing. It tries to do this by allowing you to discard the first 100, 1,000, 10,000, 100,000 or million sites from the results, or to exclude certain types of site, such as ecommerce. I’ve written before about how algorithmic search has developed to the point where it tends to produce lots of very similar hits, giving the impression that it may be missing many high quality and relevant but idiosyncratic results.
My initial impression of Million Short’s approach is that it’s too crude a measure to turn up the good stuff reliably (or indeed often). When Google first appeared, its algorithm ranked each page on the basis of the number of links it received from other pages with a high reputation or standing. These factors were proxies for relevance. The more “high quality” links a page had, the more likely it was to be worth your while. So was born the industry of Search Engine Optimization.
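To make the idea of links as a proxy for relevance concrete, here is a minimal Python sketch of link-based ranking in the spirit of the original PageRank idea. It is a simplification for illustration only, not Google’s actual algorithm: the function name `pagerank`, the damping value of 0.85, the iteration count and the four-page `toy_web` are all my own assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Rank pages by incoming links, weighted by the rank of the linking page.

    `links` maps each page name to the list of pages it links to.
    All names and parameter values here are illustrative assumptions.
    """
    # Collect every page that appears as a source or a target of a link.
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with equal standing

    for _ in range(iterations):
        # Every page keeps a small baseline share regardless of links.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its standing on, split evenly among its links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its standing evenly.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# A hypothetical four-page web: nobody links to "obscure".
toy_web = {
    "home": ["essay", "shop"],
    "essay": ["home"],
    "shop": ["home"],
    "obscure": ["essay"],
}

ranks = pagerank(toy_web)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page}: {score:.3f}")
```

In this toy web the page called "obscure" ends up at the bottom simply because nothing links to it, whatever the quality of its writing. That is the limitation of a link-based proxy in miniature.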
The existence of sites like Million Short indicates that such proxies for relevance aren’t always very good at finding what we’re looking for. We need something better. My own preference would be for a search engine that rewarded the quality (and, in some cases, the originality) of the writing. The crudeness of the Million Short approach made me think it might be the kind of problem that’s susceptible to a solution based on neural networks, machine learning and big data (AI for short).
It’s widely recognized that the quality of Google Translate improved immeasurably when Google introduced an AI model (Neural Machine Translation) in place of its previous statistical approach to translation. Search seems ripe for an analogous change of approach. Instead of proxies for relevance, search could be based on the identification by suitably trained neural networks of writing that is coherent, well informed, thoughtful and illuminating. It can’t happen soon enough for me.
With its enormous resources and its experience of building an AI-based machine translation system, as well as its history of dominating the search ecosystem, Google would appear to be better placed to undertake this task than almost any other company. Strange as it might seem, the incumbent may be our best hope for overdue innovative disruption in search.
Posted by Art on 13-Nov-2019.