Effective Information Searches

Caroline M. Eastman

August 2002

 

 

The Internet has rapidly become one of the most important sources of information. Computer-based searching was once the domain of information specialists who searched on-line databases on behalf of end users; it is now a much less tightly controlled environment in which most searches are conducted by the users themselves through search engines or portals. Such searches often return overwhelming numbers of hits, and users do not always find the information they seek. This problem has been addressed by attempts to improve both the performance of search engines and the searching abilities of users.

 

Our research in this area has focused on the effectiveness of various searching options in Web search engines. Search specialists have traditionally used a variety of techniques to improve their searches, including Boolean operators, truncation, and field restrictions, and they often advise users to learn these techniques so that queries can be specified more precisely and search results improved. Most Web searchers, however, continue to use very short, unstructured queries, usually consisting of one or more words, and when they do use more advanced features, they often use them incorrectly. Our research indicates that these users are, in fact, behaving rationally by asking simple queries. The algorithms used by search engines today are far more sophisticated than those used by the online retrieval systems of a few years ago. Search engines now use matching and ranking algorithms that render many of the traditional searching techniques obsolete in the current environment.

 

A study of searches conducted by students in information retrieval courses found that the use of advanced operators did not always improve the results of a search and sometimes produced inferior results, as measured by top 10 precision, the fraction of the first 10 hits that are relevant (Eastman, 2001; Eastman, 2002). Operators intended to allow more precise queries, such as phrases and Boolean AND, usually produced fewer hits but also produced lower top 10 precision in about one third of the searches. We found this behavior across a variety of query topics and search engines. A follow-up study with Jim Jansen of Pennsylvania State University is examining this behavior in more detail using queries selected from Excite query logs. One hundred queries with advanced operators were run both in the form originally posed by Excite users and in a simplified form consisting only of the keywords used. Preliminary results indicate that the use of advanced operators by users does not in general result in improved top 10 precision.
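
The comparison described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the code used in the studies. The function names, the operator set (uppercase AND, OR, NOT, quotation marks, parentheses, and leading + or -), and the relevance judgments in the usage example are all assumptions made for illustration.

```python
import re

# Assumed operator vocabulary; real search engines differ in what they accept.
BOOLEAN_OPERATORS = {"AND", "OR", "NOT"}


def precision_at_k(relevance, k=10):
    """Return the fraction of the first k ranked results judged relevant.

    `relevance` is a list of booleans in rank order. Dividing by k rather
    than by the number of results returned is an assumption: a search that
    returns fewer than k hits is penalized for the missing ones.
    """
    return sum(1 for judged in relevance[:k] if judged) / k


def simplify_query(query):
    """Reduce a query with advanced operators to its bare keywords."""
    # Replace phrase quotes and grouping parentheses with spaces.
    tokens = re.sub(r'["()]', " ", query).split()
    # Drop Boolean operator words and strip leading +/- markers.
    keywords = [t.lstrip("+-") for t in tokens if t not in BOOLEAN_OPERATORS]
    return " ".join(word for word in keywords if word)


if __name__ == "__main__":
    original = '"low fat" AND +diet'
    print(simplify_query(original))  # low fat diet

    # Hypothetical relevance judgments for the top 10 hits of each form.
    judged_original = [True, False, True, True, False,
                       False, True, False, False, False]
    judged_simplified = [True, True, False, True, True,
                         False, True, False, True, False]
    print(precision_at_k(judged_original))    # 0.4
    print(precision_at_k(judged_simplified))  # 0.6
```

In this made-up example the simplified query scores higher, which is the kind of outcome the studies report for a substantial minority of searches; the numbers themselves carry no empirical weight.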

 

A related study with Susan Doran (MS student in Computer Science and Engineering) is investigating the effectiveness of searches in nutrition web sites. A set of 76 nutrition-related queries drawn from the same Excite query log has been used to search selected nutrition web sites. Sites were selected from those reviewed and included on the Tufts University Nutrition Navigator website; both higher rated and lower rated sites were included. Matches to only about a third of the queries (34.5%) were found on the nutrition sites searched, including the highly rated sites. In many cases, these matches were not found in the first 10 hits but required navigation through the site. The matches were judged on the basis of topic, so it is not known whether the original user's information need would have been met. A search in Excite found a much higher percentage of matches. These preliminary results indicate that the information needs of users searching for nutrition information are not well met by web sites specifically established to provide this information. A further examination of this apparent mismatch is looking at the sites' FAQs to see whether they in fact address frequently asked questions. We are attempting to obtain some site query logs to investigate this in more detail.

 

Future work in the area of nutrition searching will consider a broader range of nutrition-related queries and examine ways in which nutrition web sites and portals can better meet the information needs of their users. It may be that some of the sites are not in fact attempting to serve the general public but rather a more specialized population. An advantage of the highly rated nutrition sites is that the information they make available is of relatively high quality. It may be possible to further automate the screening of information used by these sites, which is currently primarily manual. More general work in the area of effective information search investigates methods and algorithms that match information needs, rather than simply topics, through more detailed consideration of context and intended tasks.

 

Eastman, C. M. (2001) Finding it on the Web: Effective retrieval with Internet search engines. ACM Southeast Conference, Athens, GA, March 16-17, 2001, pp. 231-232.

 

Eastman, C. M. (2002) 30,000 hits may be better than 300: Precision anomalies in Internet searches. Journal of the American Society for Information Science and Technology, 53(11), 879-882.