Search at Calm

Calm’s Search feature is an integral part of our app experience. In 2021, we refreshed our app to accommodate the scale of our ever-expanding content library. With a larger content library, it became more important than ever to provide a world-class search experience.

We had a few goals in mind for building an improved version of Search: (1) give ourselves flexibility to expand functionality in the future, (2) reduce client-side custom logic, and (3) reuse existing UI components.

Challenges

We quickly realized that we could not reuse the approach from our first version, and we faced a few hurdles, including:

  • How were we going to provide search in 7 languages?
  • How could we improve search relevance?
  • How could we make Search fast and avoid exceeding the request limits of our third-party search engine?
  • How would we rethink the original data model to support all content types? (examples: breathe bubbles, narrator profiles, nested tracks, soundscapes and more)

Before tackling these questions, let’s review our original implementation.

Search V1 Flow Review

We needed to index data in our third-party search engine, which meant storing searchable data there so the engine could run queries against our searchable library.

To index data in the search engine, we built a nested object with all searchable programs.

A program is like an album that might include nested tracks, which we at Calm refer to as guides. In the first search version, we only indexed the program and not the nested guides. V1 only supported English, so buildSearchablePrograms filtered out all international content.

// Reindex search content (V1): every call replaces the entire index.
async index() {
  const programs = await buildSearchablePrograms();
  await this.index.replaceAllObjects(programs, {
    autoGenerateObjectIDIfNotExist: false,
  });
  return { success: true };
}

The indexing process was invoked by Calm content administrators on our internal admin site. Whenever they created or updated content in our admin portal, we triggered a reindex of all data in the third-party search engine. If a single administrator changed one word in a sleep story description, we replaced all of the searchable data stored in the search engine. If multiple administrators made changes at the same time, or back to back throughout the day, we reindexed constantly, which was expensive and unnecessary.

V2: Reimagining data models for indexing



In search V1, we defined a very limited data schema, limiting the format of data that we would store in the search engine. For example, the program Train Your Mind has nested guides like Intro to Mental Fitness, Managing Emotions, and more. V1 only indexed the outer program Train Your Mind, so users were unable to search for these nested guides. With V2, we wanted to index nearly everything including guides.
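As a rough illustration of that difference, the sketch below flattens a program and its nested guides into individual searchable records. The `Program`, `Guide`, and `SearchRecord` shapes and the `flattenProgram` helper are hypothetical names invented for this example; they are not Calm's actual schema.

```typescript
// Hypothetical shapes for illustration only.
interface Guide {
  id: string;
  title: string;
}

interface Program {
  id: string;
  title: string;
  guides: Guide[];
}

interface SearchRecord {
  objectID: string;
  title: string;
  kind: 'program' | 'guide';
  parentProgramId?: string; // set only for nested guides
}

// V1-style indexing would keep only the first record;
// a V2-style approach emits one record per guide as well.
function flattenProgram(program: Program): SearchRecord[] {
  const programRecord: SearchRecord = {
    objectID: program.id,
    title: program.title,
    kind: 'program',
  };
  const guideRecords = program.guides.map(
    (guide): SearchRecord => ({
      objectID: guide.id,
      title: guide.title,
      kind: 'guide',
      parentProgramId: program.id,
    }),
  );
  return [programRecord, ...guideRecords];
}
```

With records shaped like this, a query for "Managing Emotions" can match the guide directly and still link back to its parent program.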

Package Records (Packs) 📦

We did not want to build custom data formats to support all content types.

We abandoned the V1 approach and adopted our emerging content data model, called packs. Less than a year prior, the team had started refactoring the client and server-side data models for content to use packs. A Pack is a container of closely related content pieces, selected and translated appropriately based on language. For example, each of the carousels on the home screen is a pack, a collection of content. We wrote SQL scripts to generate search packs, and a cron job that creates them every 10 minutes (more on this in a few paragraphs!). As a result, our search engine stays up to date with Calm's latest content offerings.

An additional benefit to updating search to use the packs data model was that, since the mobile apps were already using packs, we could reuse a lot of existing components to display search results. This decision to index with packs allowed us to efficiently expose the majority of our content library through search without having to refactor large parts of the backend and client-side codebases.

Backend Implementation

Search cron ⏲️

As mentioned earlier, we use a cron to automatically index searchable content every ten minutes. The search cron runs after our pack creation cron. The pack cron runs SQL files to generate package records such as all narrators, all programs, all nested guides, all check-ins, and more. Each piece of content can be marked as available in all seven languages that Calm supports or in a specific language. The search cron takes these package records and enriches them with additional metadata. Translated strings are used where appropriate (Music titles and Narrator names, for example, are never translated). Then, the cron indexes the translated data in the corresponding international search index. We support seven languages, so we maintain seven indices, one per language.
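The routing step of that cron can be sketched roughly as follows. This is a minimal illustration that assumes each record carries a `languages` field that is either `'all'` or a list of language codes; the `PackRecord` shape and `groupByLanguage` helper are hypothetical, and metadata enrichment and translation are left out.

```typescript
// Hypothetical record shape; field names are assumptions for illustration.
interface PackRecord {
  id: string;
  title: string;
  // 'all' means the item is available in every supported language.
  languages: 'all' | string[];
}

// Build one list of records per language, ready to send to that language's index.
function groupByLanguage(
  records: PackRecord[],
  supportedLanguages: string[],
): Map<string, PackRecord[]> {
  const byLanguage = new Map<string, PackRecord[]>();
  for (const language of supportedLanguages) {
    byLanguage.set(
      language,
      records.filter(
        (record) =>
          record.languages === 'all' || record.languages.includes(language),
      ),
    );
  }
  return byLanguage;
}
```

Each entry of the resulting map would then be written to the index for that language.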

Caching

To reduce search requests to the third-party search engine and reduce latency, we built caching. We use ioredis and have ElastiCache Redis clusters in prod and dev. After implementing caching, the p50 (median) latency of the Search V2 endpoint dropped from an average of 450ms to 50ms. The number of search requests to the third-party engine fell by about 70%, from an average of 1.3 million requests per month to 377K.
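The post does not describe how cache keys or expiry are chosen, so the following is only a sketch of one plausible approach: normalize the query so that equivalent searches share a key, and expire entries after a fixed time so the cache does not lag far behind the ten-minute indexing cron. `buildCacheKey`, `TtlCache`, the key format, and the TTL are all assumptions, and an in-memory map stands in for Redis so the example runs on its own.

```typescript
interface SearchParams {
  query: string;
  language: string;
}

// Normalize so that "Sleep ", "sleep" and "SLEEP" map to the same cache entry.
function buildCacheKey({ query, language }: SearchParams): string {
  const normalized = query.trim().toLowerCase().replace(/\s+/g, ' ');
  return `search:v2:${language}:${normalized}`;
}

// In-memory stand-in for a Redis cache with per-entry expiry.
class TtlCache<T> {
  private readonly entries = new Map<string, { value: T; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): T | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (entry.expiresAt <= this.now()) {
      // Expired: drop the entry and report a miss.
      this.entries.delete(key);
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: T): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}
```

The clock is injectable so expiry can be exercised without waiting in real time.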

Endpoint: GET /search/v2?query

This is the new endpoint for V2. We did not want to introduce breaking changes to our mobile clients using the V1 backend, so we simply introduced a new endpoint for V2.

We first check for an active language to select the correct index to query. Then, we create a search client by calling the makeClient method, which initializes a search client once and caches it for subsequent requests. This is an application of the singleton pattern.

export class SearchV2Client {
  #searchProxy: SearchProxy<HydratedSearchPackItem[] | PackItem[]>;
  static _client: SearchV2Client;
  constructor(proxy: SearchProxy<HydratedSearchPackItem[] | PackItem[]>) {
    this.#searchProxy = proxy;
  }
  // queryTerm
  // makeClient - initialize a client once after the first search request
  static async makeClient(): Promise<SearchV2Client> {
    if (this._client === undefined) {
      // 1. Get search indices
      // 2. Create an ioredis instance (singleton, cached for subsequent requests)
      const redis = IORedisClient.instance('search-query');
      const proxy = new CompositeSearchProxy({ redis, indices });
      this._client = new SearchV2Client(proxy);
    }
    return this._client;
  }
}

We created a CompositeSearchProxy to handle searching either in Redis (the cache) or in our third-party engine. If we fail to pass a Redis instance, the CompositeSearchProxy continues searching in our third-party engine. This is an application of the proxy design pattern.

export class CompositeSearchProxy
  implements SearchProxy<HydratedSearchPackItem[] | PackItem[]> {
  RedisSearchProxy: RedisSearchProxy;
  ThirdPartySearchProxy: ThirdPartySearchProxy;

  constructor({ redis, indices }: CompositeSearchProxyConstructor) {
    this.RedisSearchProxy = new RedisSearchProxy(redis);
    this.ThirdPartySearchProxy = new ThirdPartySearchProxy(indices);
  }

  async get(
    searchParams: SearchParams,
    { userToken, attributesToHighlight, hitsPerPage }: ThirdPartyParams,
  ): Promise<HydratedSearchPackItem[] | PackItem[]> {
    const should_use_cache = await isCacheEnabled();
    if (should_use_cache) {
      const redis_results = await this.RedisSearchProxy.get(searchParams);
      if (redis_results) {
        return redis_results;
      }
    }

    const third_party_results = await this.ThirdPartySearchProxy.get(
      searchParams,
      {
        userToken,
        attributesToHighlight,
        hitsPerPage,
      },
    );
    if (should_use_cache) {
      this.set(searchParams, third_party_results);
    }
    return third_party_results;
  }

  private set(
    searchParams: SearchParams,
    third_party_results: HydratedSearchPackItem[],
  ): void {
    this.RedisSearchProxy.set(searchParams, third_party_results);
  }
}

In the CompositeSearchProxy.get() method, we check the caching feature flag to enable or disable search via the Redis cache. If we were to experience any issues with caching, we could simply disable the Redis cache through a feature flag instead of making code changes.

After getting back results from either Redis or the third-party engine, we prepare the data for clients by removing unnecessary fields and enriching the metadata. We also filter the search results to exclude content that may not be playable due to the user's device, location, or subscription tier.
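A filter of this kind could look roughly like the sketch below. The `SearchResult` and `UserContext` fields are hypothetical stand-ins for whatever playability rules the real service applies.

```typescript
// Hypothetical fields describing where a piece of content can be played.
interface SearchResult {
  id: string;
  title: string;
  requiresSubscription: boolean;
  blockedCountries: string[];
  supportedPlatforms: string[];
}

interface UserContext {
  isSubscriber: boolean;
  country: string;
  platform: string;
}

// Keep only results that the current user can actually play.
function filterPlayable(
  results: SearchResult[],
  user: UserContext,
): SearchResult[] {
  return results.filter(
    (result) =>
      (!result.requiresSubscription || user.isSubscriber) &&
      !result.blockedCountries.includes(user.country) &&
      result.supportedPlatforms.includes(user.platform),
  );
}
```

Applying the filter after the cache lookup, rather than before storing results, keeps cached entries reusable across users with different devices and subscriptions.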

Supporting search across seven languages

For the first time, our users are able to search for content in their preferred language within the Calm apps. We currently support seven languages, including English, Korean, French, Japanese, Portuguese, and German. For each language, we create a dedicated search index named with the format “calmsearch_prod<language>”. If a user selects a different language in the apps, we switch to the language-specific index that matches their preference. This allows our users to search for content relevant to their language. The Calm Localization team also gets a language-specific view of top searches, results, and searches that yield no result. This allows our team at Calm to iterate and deliver a better experience to users worldwide.
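Selecting the index for a request can be sketched as a small lookup. In this illustration, the `resolveIndexName` helper, the locale handling, and the fallback to English are assumptions; the prefix is passed in as a parameter rather than hard-coded.

```typescript
// Pick the language-specific index for a request.
function resolveIndexName(
  activeLanguage: string | undefined,
  indexedLanguages: ReadonlySet<string>,
  prefix: string,
  fallbackLanguage = 'en', // assumption: unknown languages fall back to English
): string {
  // Reduce a locale such as "pt-BR" to its base language code "pt".
  const [base = ''] = (activeLanguage ?? '').toLowerCase().split('-');
  const language = indexedLanguages.has(base) ? base : fallbackLanguage;
  return `${prefix}${language}`;
}
```

For example, with a hypothetical prefix of `example_search_`, a user whose active language is `ko` would be routed to `example_search_ko`.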

One of the challenges that we faced was how to translate all content. Going with Packs solved this problem because we already translate all packs data. After all the packs have been generated by a cron and recorded in the database, the API service pulls records from the tables that need translation and creates a JSON file to send to the third-party translation service. That service then translates the JSON into all seven languages that we support. A cron uploads the translated JSON files to AWS S3 for the Calm API service to pull from.

Our database only has to store content in English. Before returning content data to the client, we enrich each content piece with additional metadata and use the translated fields from the downloaded Smartling JSON. This process is largely automated, so new content marked for certain languages is translated as expected.
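The enrichment step can be pictured as overlaying translated fields on top of the English record. The sketch below assumes translations are keyed by content ID and that missing translations fall back to English; the `ContentItem` shape, the `localize` helper, and the `'music'` content type are hypothetical.

```typescript
interface ContentItem {
  id: string;
  contentType: string;
  title: string;
  description: string;
  narratorName: string; // never translated
}

interface TranslatedFields {
  title?: string;
  description?: string;
}

// Translations for one language, keyed by content ID (assumed layout).
type Translations = Record<string, TranslatedFields>;

// Content types whose titles stay in the original language.
const UNTRANSLATED_TITLE_TYPES = new Set(['music']);

function localize(item: ContentItem, translations: Translations): ContentItem {
  const fields: TranslatedFields = translations[item.id] ?? {};
  const keepOriginalTitle = UNTRANSLATED_TITLE_TYPES.has(item.contentType);
  return {
    ...item,
    title: keepOriginalTitle ? item.title : (fields.title ?? item.title),
    description: fields.description ?? item.description,
  };
}
```

Because the English record is the base and translations are optional, content that has not been translated yet still renders instead of disappearing.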

This is at the beginning of search V2 launch.

Fine-tuning search results 🔧

Searchable attributes

We utilized our third-party search engine to set searchable attributes. Attributes at the top have a higher weight than attributes at the bottom.

For each piece of content, we added a “search_category” field, which categorizes the content and increases search relevancy. Currently, we tie search_category and title in one ranking to improve results for sleep content. We also use search_category in the app to section search results by category. The same search_category values are used for the navigation tags in the app.

search_category: navigation tags

We learned that by ignoring word positions, we get better search results.
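The post does not name the search engine or show its settings, so the following is only a sketch of how such a configuration might look if the engine accepts Algolia-style `searchableAttributes` entries. In that syntax, comma-separated attributes share one priority level, and `unordered(...)` tells the engine to ignore where in the attribute a word matches. The `narrator_name` and `description` fields and the `priorityTier` helper are hypothetical.

```typescript
interface IndexSettings {
  // Earlier entries carry more weight than later ones.
  searchableAttributes: string[];
}

const exampleSettings: IndexSettings = {
  searchableAttributes: [
    'title,search_category', // tied: both attributes share the top priority
    'unordered(narrator_name)', // word position within the attribute is ignored
    'unordered(description)',
  ],
};

// Illustrative helper: report which priority tier an attribute falls in (0 = highest, -1 = not searchable).
function priorityTier(settings: IndexSettings, attribute: string): number {
  return settings.searchableAttributes.findIndex((entry) =>
    entry
      .split(',')
      .map((name) => name.trim().replace(/^unordered\((.*)\)$/, '$1'))
      .includes(attribute),
  );
}
```

Here `title` and `search_category` both land in tier 0, which mirrors the tie described above.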

Tie Breaker Rules

We rely on custom tie-breaker rules to boost certain content. For example, we prioritize returning newer content higher in search results. The product team also wanted the ability to boost certain categories of content. So for each indexed item, we added a field “boosted” that we use in our third-party search engine as a custom ranking.

As a last resort tie-breaker, we sort titles of content alphabetically.
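In practice the search engine applies these rules, but their effect can be illustrated with a plain comparator. The sketch below assumes that boosted content comes first, then newer content, then alphabetical order by title; the relative order of the first two rules, the numeric `boosted` value, and the `publishedAt` timestamp are assumptions.

```typescript
interface RankedItem {
  title: string;
  boosted: number; // higher means more strongly boosted
  publishedAt: number; // Unix timestamp in milliseconds
}

// Order items that are otherwise equally relevant.
function compareTieBreakers(a: RankedItem, b: RankedItem): number {
  if (a.boosted !== b.boosted) return b.boosted - a.boosted; // boosted first
  if (a.publishedAt !== b.publishedAt) return b.publishedAt - a.publishedAt; // newer first
  return a.title.localeCompare(b.title); // last resort: alphabetical
}
```

Passing this comparator to `Array.prototype.sort` on a list of equally relevant hits yields the order described above.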

🌎 Internationalization (i18n) & Synonyms

The team had a lot of fun building with i18n in mind. For the first time, we were able to test search results in many languages. Our mighty i18n team developed a test plan to search for the top popular terms in each language and reported back any results that needed tuning. For example, in the Calm Korean app, we searched for celebrity names in Hangeul to see if our search results would bring up content for those artists. We also thought about banned expressions. The search engine that we use has a feature for uploading banned expressions or profanity terms. We uploaded banned expressions for each language. We would use these lists later on when implementing dynamic suggestions, which is another feature of the search engine.

🎉 Conclusion 🎉

While we are still early in our journey to improve the Calm search experience, we are very excited to see a positive impact on our users’ experience thus far. In the three months since we rolled out Search v2, we’ve seen a 4% decrease in users’ reliance on suggested search terms, which indicates that more searches have become custom queries.

Click Through Rate (CTR) increased by about 20%.
“Session began” metric increased by about 14%.
Completion of sessions clicked through search increased by 11%.

We’ve also noticed an increase in English searches due to higher repeat search rates. Search V2 users have also been clicking on a wider variety of content types. We’ve also learned about queries that yield no search results, which enables us to home in on the content that our users want. Since the V2 launch in August 2021, Search V2 has supported over 6 million search requests.

This project was such a fun ride. Improving Calm’s search experience was a big goal, but approaching this project with the user experience in mind helped us bring it to fruition. We started out by asking ourselves how we could present more searchable content to the users and quickly realized that we could utilize the pack system – which had already demonstrated a 10x decrease in CPU, faster API responses, and simpler client component development – to expose our searchable library. Then, we asked ourselves how we could bring Search to all languages. This question helped us focus on fine-tuning results in each language and exposing search packs in the right language. After answering both of these questions, we moved on to making search queries fast. Caching reduced search request time, making search about 8 times faster, and significantly decreased the number of requests made to the third-party search engine.

We believe that every Calm user should be able to quickly explore our vast content collection. A much improved search experience empowers them to do so. Happy searching!