State-of-the-Art Mobile Search: Introduction

Rod Smith

State-of-the-Art Mobile Search is a series exploring how to implement advanced mobile search.

Mobile apps that support offline reading need advanced search, but most fall well short of the state of the art. Their search either works only online or lacks key features, which frustrates users who are trying to quickly find natural-language content (e.g., articles, brochures, documents, e-books, email, guides, help content, knowledge bases, legal briefs, medical abstracts, news, policies, recipes, reviews, and social media updates). This blog series helps remove those barriers to customer engagement and advocacy by showing how to implement advanced search in mobile apps.

In the general search model, a search engine satisfies a user’s information need by answering a query with relevant search results from a corpus (the searchable collection) of unstructured documents. Optionally, the user may repeatedly refine the query and search again for more relevant results. Part 1 of this series shows how to implement such a search engine appropriate for an offline-capable mobile app. The rest of the series explains how to minimize the number of times the user has to refine the query and search again.

Queries provided by users are subject to misconception and misformulation. For example, consider a user who needs to know the title of the song “Scream & Shout”:

  • The user may misconceive the information need, resulting in a query like the following: “What song did Black Eyed Peas release in 2012 featuring a British female singer?” The song was not released by the Black Eyed Peas, and no British women were involved. It was actually by Will.i.am and Britney Spears, who sang with a British accent.
  • The user may misformulate the search query, e.g., “What’s the name of the song that Will.i.am and Britney Spears sang together?” A naïve search engine might exclude documents that say “featuring” rather than “sang together” and documents without the words “What’s” or “name.”
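
The misformulation problem can be made concrete with a minimal sketch of a naïve search engine, assuming it returns only documents that contain every literal word of the query. The function names, the two-document corpus, and the matching rule are illustrative assumptions, not the implementation developed in this series.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def naive_search(query, corpus):
    """Return the ids of documents that contain every literal query token."""
    query_terms = set(tokenize(query))
    return [doc_id for doc_id, text in corpus.items()
            if query_terms <= set(tokenize(text))]


# Hypothetical two-document corpus, invented for illustration.
corpus = {
    "review": "Scream & Shout is a song by will.i.am featuring Britney Spears.",
    "forum": "What's the name of the song that will.i.am and Britney Spears "
             "sang together? I heard it on the radio.",
}

query = ("What's the name of the song that Will.i.am and Britney Spears "
         "sang together?")

# Only the forum post, which repeats the question, matches every query word.
print(naive_search(query, corpus))  # ['forum']
```

The review, which is the only document that actually names the song, is excluded because it says “featuring” and lacks words such as “name” and “together.”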

A search engine can address misformulation by canonicalizing terms and by ranking results with a good tf-idf (term frequency-inverse document frequency) model, i.e., by returning the documents whose text best matches the most important terms in the search query. Ranking is explained in parts 2 and 3 of this series, and term canonicalization in part 5. Semantic search can resolve some trivial forms of misconception, but full semantic search requires approaches that are beyond the scope of this blog series (e.g., machine learning and sophisticated language models that are impractical in mobile apps today). However, a first-order approximation to semantic search will be illustrated in part 5.
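
As a rough illustration of tf-idf ranking, the sketch below scores each document by summing, over the query terms it contains, the term's count in the document multiplied by log(N / df), where N is the number of documents and df is the number of documents containing the term. This is one simple textbook weighting, chosen here as an assumption; the models discussed later in the series may differ, and the corpus and function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank(query, corpus):
    """Rank document ids by the summed tf-idf weight of matching query terms."""
    term_counts = {doc_id: Counter(tokenize(text))
                   for doc_id, text in corpus.items()}
    n_docs = len(corpus)

    # Document frequency: the number of documents that contain each term.
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())

    scores = {}
    for doc_id, counts in term_counts.items():
        score = 0.0
        for term in set(tokenize(query)):
            if counts[term]:
                # Rare terms (low df) get a larger weight than common ones.
                score += counts[term] * math.log(n_docs / doc_freq[term])
        if score > 0:
            scores[doc_id] = score
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical three-document corpus, invented for illustration.
corpus = {
    "scream": "Scream & Shout is a song by will.i.am featuring Britney Spears.",
    "recipe": "What's the name of the soup we made together?",
    "news": "The band announced the name of the tour and the dates.",
}

query = ("What's the name of the song that Will.i.am and Britney Spears "
         "sang together?")
print(rank(query, corpus))
```

With this toy corpus, the document about “Scream & Shout” ranks first even though it lacks “name” and “sang together,” because the terms it does share with the query (the artist names and “song”) are rare in the corpus and therefore carry the most weight.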

Following the lead of the major web-based search engines, websites of all sizes have significantly improved their search in the past several years by adopting server-based search products, including Apache Lucene (which powers search at LinkedIn, Twitter, and Wikipedia), its companion product Apache Solr (which powers search at Apple, Buy.com, Disney, and Netflix), and the very popular Google Search Appliance. Mobile apps cannot use server-based search products while running offline, but the features of those products highlight opportunities to improve offline search. This series of blog posts explores approaches to implementing those features and other advanced search components in an offline mobile app:

  • Part 1: Offline search explains how to implement fast offline search, which is a key component of mobile apps that support offline reading.
  • Part 2: Ranked retrieval discusses how to sort results by relevance with tf-idf models, as server-based search products do.
  • Part 3: Information content weighting and document focus normalization explains how to make information-rich terms influence the result ranking more than less informative terms, and how to make documents that focus primarily on the search query rank higher than longer documents that only incidentally mention the search terms.
  • Part 4: Fields and phrases talks about awarding a search bonus to implicit phrases and to matches in fields other than the main document body.
  • Part 5: Term canonicalization explores the offline version of server-based search features such as matching search terms on word stems rather than literal substrings and indexing related terms.
  • Part 6: Faceted search filters and context snippets shares how to bring server-based search execution features to offline mobile apps, including faceted search, category filtering, and context snippets.
  • Part 7: Spelling correction explains how to augment misspelled search query words with suggested spelling corrections in an offline mobile app, similar to the spelling suggestion feature that is common in server-based search products.
  • Part 8: Analytics explores how to evaluate search results in a mobile search engine, including the analytics that are standard in server-based search products.

About rodasmith
Rod is a Slalom Consulting Solution Architect and software developer specializing in mobile applications for HTML5, Android, and iOS, with a passion for natural language processing and machine learning.
