Rethinking Search in Science

1. Popularity scores: a hindrance to new discoveries

Groundbreaking science is often hindered by popularity bias. Search engines rank content by general appeal and prior user interactions, which conflicts with the scientific goal of surfacing new discoveries. Useful but unpopular information includes:

Forefront research that hasn’t yet gained attention
Niche studies and findings
Information valid only under special conditions
Complex information that is hard to understand

2. Keyword searches

Scientific concepts and their interrelations can rarely be reduced to simple keywords without losing essential context. Researchers must sift through large numbers of loosely related documents while repeatedly refining their search terms, which is tedious. The result is a high rate of false positives (irrelevant hits), and each refinement may also increase the likelihood of missing critical information (false negatives).
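
The effect can be illustrated with a minimal sketch of boolean keyword matching. The three documents, their ids, and the function names below are invented for illustration and do not come from any real search engine; actual engines add ranking, stemming, and synonym handling on top of this.

```python
import re

# Hypothetical three-document corpus, invented for illustration only.
DOCS = {
    "doc_a": "Catalyst degradation was observed at high temperature in the reactor.",
    "doc_b": "The conference venue had a high temperature and poor ventilation.",
    "doc_c": "Above 600 K the catalyst lost activity within hours.",
}


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_search(query, docs):
    """Return the sorted ids of documents containing every query term (boolean AND)."""
    terms = tokenize(query)
    return sorted(doc_id for doc_id, text in docs.items() if terms <= tokenize(text))


# First attempt: doc_b matches although it is irrelevant (false positive),
# and doc_c is missed although it is relevant (false negative).
first_hits = keyword_search("high temperature", DOCS)

# Refined attempt: the extra term removes doc_b but still misses doc_c,
# because doc_c describes the same effect in different words.
refined_hits = keyword_search("catalyst high temperature", DOCS)

print(first_hits)
print(refined_hits)
```

In this toy corpus, refining the terms removes the irrelevant hit but never recovers the document that phrases the same finding differently.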

3. Semantic searches

Semantic search is a step forward, but it is not flawless. Today’s vectorization tools cannot losslessly convert running text into vector representations. Scientific articles are particularly affected because they often discuss observations that border on off-topic: such details get lost during vectorization, so a query for one of those details cannot surface the document.
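
One way to picture this loss is a toy model in which a document vector is the average of its sentence vectors. This is only a sketch under strong assumptions: real embedding models are far more complex and do not necessarily pool this way, and the two-dimensional "topic space" and all names below are hypothetical.

```python
import math


def mean_pool(vectors):
    """Average a list of equal-length vectors into a single vector."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms


# Toy 2-d topic space (an assumption for illustration):
# axis 0 = the article's main topic, axis 1 = a side observation.
MAIN = [1.0, 0.0]
SIDE = [0.0, 1.0]

# Hypothetical article: nine sentences on the main topic, one side observation.
article = mean_pool([MAIN] * 9 + [SIDE])

# A query that asks only about the side observation.
query = SIDE

similarity_to_sentence = cosine(query, SIDE)     # the detail on its own
similarity_to_article = cosine(query, article)   # the detail after pooling

print(similarity_to_sentence, similarity_to_article)
```

In this toy setting the side observation matches the query perfectly as a single sentence, but once it is pooled into one vector for the whole article, the similarity falls to roughly 0.11, low enough that the article would likely rank below documents that are about the side topic throughout.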

4. A novel solution: graph-based document representation

Each scientific document can be conceptualized as a network of interconnected pieces of information rather than a linear block of text. Each observation, result, and concept becomes a node; the relationships become edges. This allows nuanced indexing based on the strength and nature of connections, not mere occurrence.
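
As a rough sketch of what such a representation could look like in code, the example below models nodes and edges with plain Python dataclasses. The class names, the `kind`, `relation`, and `strength` fields, and the sample content are all assumptions made for illustration; the text above does not prescribe a particular schema.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    node_id: str
    kind: str   # e.g. "observation", "result", or "concept"
    text: str


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    relation: str    # nature of the connection (hypothetical labels)
    strength: float  # strength of the connection, assumed to lie in [0, 1]


@dataclass
class DocumentGraph:
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)

    def add_node(self, node):
        self.nodes[node.node_id] = node

    def add_edge(self, edge):
        # Refuse edges that point at nodes the graph does not know about.
        if edge.source not in self.nodes or edge.target not in self.nodes:
            raise KeyError("both endpoints must be added before the edge")
        self.edges.append(edge)

    def neighbors(self, node_id, min_strength=0.0):
        """Return (relation, node) pairs for outgoing edges at or above min_strength."""
        return [
            (edge.relation, self.nodes[edge.target])
            for edge in self.edges
            if edge.source == node_id and edge.strength >= min_strength
        ]


# Hypothetical content, invented for illustration only.
graph = DocumentGraph()
graph.add_node(Node("n1", "observation", "Catalyst activity drops above 600 K"))
graph.add_node(Node("n2", "concept", "Sintering"))
graph.add_node(Node("n3", "result", "Oxide support slows the activity loss"))
graph.add_edge(Edge("n1", "n2", "explained_by", 0.8))
graph.add_edge(Edge("n1", "n3", "mitigated_by", 0.4))

for relation, node in graph.neighbors("n1", min_strength=0.5):
    print(relation, "->", node.text)
```

Filtering on `min_strength` is one simple way to index by the strength and nature of connections rather than by mere co-occurrence of terms.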

Advantages:

Graph search — navigate interconnected concepts intuitively, with targeted searches free of irrelevant noise, via edge traversal and node querying.
Enhanced semantic resolution — preserve the rich context around each piece of information for more accurate retrieval based on depth, not keyword or vector similarity.
Dynamic discovery — the search system doubles as a recommender, showing what else links to the queried information; follow chains of related information to discover new insights or validate hypotheses.
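
Edge traversal and chain following, as named in the list above, can be sketched with a breadth-first search over labeled edges. The edge list, relation labels, and function names are hypothetical and chosen only for illustration; a production system would more likely use a graph database and its query language.

```python
from collections import deque

# Hypothetical edges as (source, relation, target), invented for illustration.
EDGES = [
    ("catalyst deactivation", "observed_under", "temperature above 600 K"),
    ("catalyst deactivation", "explained_by", "sintering"),
    ("sintering", "mitigated_by", "oxide support"),
    ("oxide support", "studied_in", "paper B"),
]


def build_index(edges):
    """Map each source node to its list of (relation, target) pairs."""
    index = {}
    for source, relation, target in edges:
        index.setdefault(source, []).append((relation, target))
    return index


def related_chains(index, start, max_hops=2):
    """Breadth-first traversal returning every path of 1..max_hops edges from start.

    Each path alternates node, relation, node, ... so nodes sit at even positions.
    """
    chains = []
    queue = deque([(start, [start])])
    while queue:
        node, path = queue.popleft()
        if len(path) // 2 == max_hops:  # hop limit reached for this path
            continue
        for relation, target in index.get(node, []):
            if target in path[::2]:  # skip nodes already on this path (cycles)
                continue
            new_path = path + [relation, target]
            chains.append(new_path)
            queue.append((target, new_path))
    return chains


index = build_index(EDGES)
chains = related_chains(index, "catalyst deactivation", max_hops=2)
for chain in chains:
    print(" -> ".join(chain))
```

Raising `max_hops` lets the same query act as a recommender: each additional hop exposes information that is linked to the starting point only indirectly.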

Although implementing graph-based representations can be resource-intensive, the potential for transforming scientific research is profound.

Conclusion

The adoption of graph-based document representations challenges the status quo of document search and opens a pathway to more effective scientific inquiry — ensuring crucial information is no longer obscured by the limitations of conventional search. This leap could well be a pivotal moment in accelerating scientific progress.
