2020 Jul 6;9(7):e16543. doi: 10.2196/16543

Table 1.

Developing a list of top search queries.

Step Description
1.1 Begin with a list of regions to explore and a single, broad, initial search term, such as birth control.
1.2 For each region, make a request to Google Trends’ getTopTopics function to obtain the most searched-for topics for the initial search term. The function returns a list of the topics most closely related to that term, each with a value from 0 to 100 that denotes how strongly the topic is linked to the initial term: 100 is the most closely associated and 0 is the least. This list of top topics serves only to validate the top queries, by examining similarities between the top topics and the top queries.
1.3 Next, make a call to Google Trends’ getTopQueries function to get a list of the search queries most related to the initial search term in Step 1.1 for a given region, such as the United States. Each response from the getTopQueries method contains a title, or query, and a value attribute, which is a number from 0 to 100 and represents how related the query is to the provided initial search term in the United States: 100 is the most associated and 0 is the least. The data are presented in the form of a JSON (JavaScript Object Notation)-encoded mapping (see Figure MA2-1 in Multimedia Appendix 2), which can easily be converted into a graph via Python or exported to a CSV (Comma-Separated Values) file. If there are other regions of interest (eg, US states), this step must be repeated for all other regions. Each region will have a final list variable that stores all the top queries for that region. Once all final lists are generated for all regions of interest, they will be combined to create a master list that includes the top queries for every region of interest (Figure 1 shows an example). Figure MA2-2 in Multimedia Appendix 2 shows a snippet of the Python code.
1.4 For every query generated in Step 1.3, send a request to getTopQueries to obtain follow-up terms. Only queries with a value attribute greater than or equal to 70 are added to the follow-up queries list, as this indicates a high level of correlation between the terms. Irrelevant searches relating to pop culture should be manually filtered from the results. Step 1.4 should be executed recursively (the follow-up queries become the base set at each iteration) until no new queries can be added to the base set. During this step, how the queries are related to one another (ie, how a query ended up in the set of queries) should be recorded. The step terminates when requests to getTopQueries return no queries that have not already been received in the simulation for this region.
1.5 Then, generate a graph using the Graphviz package for Python 2.7 [38] that illustrates how the search queries in the final list and the follow-up queries list are related to one another. As shown in Figure 2, every node in the graph is a search query, and those in the first level will be included in the final list of search queries for the simulation. If a node is enclosed in a double circle, it represents an overarching topic coded for internal organizational purposes within the Google application programming interface (API) and is not included in the final list or the follow-up queries list. Every directed edge (arrow) in the graph represents a relationship between two search queries (nodes). Note that with the current cutoff value of 70 in Step 1.4, other intermediate terms may exist that are not captured in the graph. Figure MA2-3 in Multimedia Appendix 2 shows the Python function used.
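
Steps 1.3 to 1.5 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from Multimedia Appendix 2: the names `expand_queries`, `to_dot`, and `toy_fetch` are hypothetical, the call to getTopQueries is replaced by an injected function that is assumed to return (query, value) pairs, and the toy responses are invented for demonstration and are not Google Trends data. The sketch is written for Python 3 and emits plain Graphviz DOT text rather than using the Graphviz package for Python 2.7 cited in Step 1.5.

```python
from collections import deque

CUTOFF = 70  # value threshold for follow-up queries (Step 1.4)


def expand_queries(seed, region, fetch, cutoff=CUTOFF, exclude=frozenset()):
    """Return (first_level, follow_ups, edges) for one region.

    `fetch(query, region)` is a stand-in for a getTopQueries request and is
    ASSUMED to return a list of (query, value) pairs with value in 0..100.
    `exclude` holds queries removed by manual filtering (e.g. pop culture).
    """
    # Step 1.3: every top query for the seed term enters the final list.
    first_level = [q for q, _ in fetch(seed, region) if q not in exclude]
    seen = {seed, *first_level}
    edges = [(seed, q) for q in first_level]
    edge_set = set(edges)
    follow_ups = []

    # Step 1.4: repeat until no request yields a query not already seen.
    frontier = deque(first_level)
    while frontier:
        parent = frontier.popleft()
        for query, value in fetch(parent, region):
            if value < cutoff or query in exclude or query == parent:
                continue
            # Record how the query was reached, even if it is already known.
            if (parent, query) not in edge_set:
                edge_set.add((parent, query))
                edges.append((parent, query))
            if query not in seen:
                seen.add(query)
                follow_ups.append(query)
                frontier.append(query)
    return first_level, follow_ups, edges


def to_dot(edges, topics=frozenset()):
    """Step 1.5: render the recorded relationships as Graphviz DOT text."""
    def quote(name):
        return '"' + name.replace('"', '\\"') + '"'

    lines = ["digraph queries {"]
    for topic in sorted(topics):
        # Overarching topics are drawn with a double circle.
        lines.append(f"    {quote(topic)} [shape=doublecircle];")
    for src, dst in edges:
        lines.append(f"    {quote(src)} -> {quote(dst)};")
    lines.append("}")
    return "\n".join(lines)


# Invented toy responses, for illustration only (not real Trends values).
TOY = {
    "birth control": [("birth control pill", 100), ("iud", 85)],
    "birth control pill": [("the pill", 90), ("iud", 75), ("plan b", 40)],
    "iud": [("birth control pill", 80), ("copper iud", 72)],
    "the pill": [("birth control pill", 95)],
}


def toy_fetch(query, region):
    return TOY.get(query, [])


first_level, follow_ups, edges = expand_queries("birth control", "US", toy_fetch)
dot = to_dot(edges)
print(first_level, follow_ups)
print(dot)
```

Running the expansion once per region and concatenating the per-region results would give the master list described in Step 1.3. The loop uses a queue rather than literal recursion, which keeps the termination condition of Step 1.4 explicit: it stops once every request returns only queries that are already in the set.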