Evergreen | June 9, 2026

How the Finch Innovation Index Defines, Tracks, and Scores 73 Investable Technology Themes From Preprint Data

AI, Biotech, Climate Tech, Quantum

Most innovation datasets start from market activity and work backward. The Finch Innovation Index starts from research output and works forward. That distinction matters because it determines what you can see and when you can see it. This post explains how the index is built: how themes are defined, how preprints are classified, how momentum is scored, and why the resulting signals sit 2 to 5 years ahead of patent or market indicators.

Defining 73 Investable Themes From Research Taxonomy

The Finch Innovation Index tracks 73 investable technology themes spanning AI, biotech, climate tech, quantum computing, advanced materials, energy storage, robotics, and dozens of other verticals. These are not arbitrary labels. Each theme corresponds to a cluster of research activity that maps onto a plausible investment vertical, defined through a combination of keyword taxonomy, citation network structure, and expert validation.

Theme construction begins with a seed vocabulary drawn from established research taxonomies and augmented by emerging terminology detected through rising keyword analysis. The goal is to capture not just what a field calls itself today, but the linguistic precursors of fields that do not yet have consensus names. This is why the index can surface theme emergence signals before they appear in industry reports or patent filings.
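The post does not specify how rising keywords are detected. As a rough illustration only, one simple approach compares how often a term appears in a recent window against an earlier one. The function name `rising_keywords`, the `min_count` and `min_ratio` parameters, and the sample terms below are all hypothetical, not part of the Finch methodology.

```python
from collections import Counter
from typing import Dict, List


def rising_keywords(
    earlier: List[str],
    recent: List[str],
    min_count: int = 3,
    min_ratio: float = 2.0,
) -> Dict[str, float]:
    """Return terms whose recent count is at least min_ratio times the
    earlier count (with add-one smoothing so brand-new terms are handled).

    Hypothetical sketch: thresholds and smoothing are assumptions.
    """
    before = Counter(earlier)
    after = Counter(recent)
    rising: Dict[str, float] = {}
    for term, count in after.items():
        if count < min_count:
            continue  # ignore terms too rare to be a reliable signal
        ratio = count / (before[term] + 1)  # +1 avoids division by zero
        if ratio >= min_ratio:
            rising[term] = ratio
    return rising


# Illustrative keyword occurrences from two periods (made-up data).
earlier_terms = ["lithium-ion"] * 5 + ["sulfide electrolyte"] * 1
recent_terms = (
    ["lithium-ion"] * 6
    + ["sulfide electrolyte"] * 6
    + ["halide electrolyte"] * 1
)

rising = rising_keywords(earlier_terms, recent_terms)
```

In this toy data only "sulfide electrolyte" is flagged: the established term barely moves, and the newest term is still too rare to clear the minimum count. A production version would likely normalize by corpus size in each period, which this sketch omits.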

Each theme is scoped to be specific enough to be actionable for an investor or technology scout, but broad enough to capture meaningful volume. A theme like "solid-state batteries" is narrower than "energy storage" but wide enough to include relevant work in electrolyte chemistry, interface engineering, and cathode design. The 73 themes are reviewed periodically to absorb new clusters and retire themes that have merged or become commercially mature. For context on how themes map to readiness stages, see the analysis of research maturity across the innovation lifecycle.

Classifying Over 1 Million Preprints

The Finch Innovation Index processes over 1 million classified preprints sourced from major open-access repositories including arXiv, bioRxiv, medRxiv, and others. Classification is multi-label: a single paper can belong to more than one theme, which reflects the reality that much frontier work sits at the intersection of established categories.

The classification pipeline uses a combination of supervised models trained on expert-labeled corpora and rule-based filters to handle edge cases. Precision is prioritized over recall at the theme level, meaning the index tolerates missing some marginally relevant papers rather than inflating theme counts with noise. This design choice is deliberate. For investors, false positives in a theme's publication count are more dangerous than false negatives, because they create the illusion of momentum where none exists.
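The precision-over-recall tradeoff can be pictured as a confidence threshold on multi-label assignment. The sketch below is an assumption about how such a step might look, not the index's actual pipeline: `assign_themes`, the threshold values, and the sample scores are hypothetical.

```python
from typing import Dict, List


def assign_themes(scores: Dict[str, float], threshold: float = 0.8) -> List[str]:
    """Return every theme whose classifier score meets the threshold.

    Multi-label: zero, one, or several themes may be returned.
    Hypothetical sketch; the threshold value is an assumption.
    """
    return sorted(theme for theme, score in scores.items() if score >= threshold)


# Made-up per-theme confidence scores for a single preprint.
paper_scores = {
    "solid-state batteries": 0.93,
    "energy storage": 0.85,
    "quantum computing": 0.41,
}

# A strict threshold favors precision: only confident labels are kept.
strict_labels = assign_themes(paper_scores, threshold=0.8)

# A loose threshold favors recall: marginal labels are admitted too.
loose_labels = assign_themes(paper_scores, threshold=0.4)
```

With the strict setting the paper counts toward two themes and the marginal "quantum computing" label is dropped, which is the kind of missed-but-harmless case the text describes. Lowering the threshold would admit it and inflate that theme's count.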

The Finch Innovation Index classifies preprints rather than journal articles because preprints represent the earliest public record of research direction. As explored in the discussion of why preprints matter for investors, journal publication typically lags preprint posting by 6 to 18 months, and patent filings lag even further.

Scoring Momentum: Rate of Change, Not Absolute Volume

Raw publication counts tell you the size of a field. They do not tell you whether a field is accelerating, decelerating, or plateauing. The Finch Innovation Index generates monthly momentum scores that capture the rate of change in research output within each theme, normalized for seasonal variation and overall growth in preprint volume.

Across all 73 themes, the momentum scores measure how fast research output is changing rather than how large it already is. A theme with 200 papers per month growing at 40% year-over-year scores higher than a theme with 2,000 papers per month growing at 3%. This is the core analytical insight: momentum scores surface emerging areas where research intensity is shifting, not just large established fields. For a deeper treatment of the scoring model, see the post on how momentum scoring works in research intelligence.
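The comparison between the two themes can be worked through in a few lines. This is a minimal sketch of year-over-year growth plus one possible way to normalize for overall preprint growth. The exact scoring formula is not given in this post, so `share_adjusted_growth` and the corpus totals below are illustrative assumptions.

```python
def yoy_growth(current: float, year_ago: float) -> float:
    """Fractional year-over-year growth, e.g. 0.40 for +40%."""
    if year_ago <= 0:
        raise ValueError("year_ago must be positive")
    return current / year_ago - 1.0


def share_adjusted_growth(
    theme_now: float, theme_then: float, total_now: float, total_then: float
) -> float:
    """Growth in a theme's share of all preprints.

    Hypothetical normalization: dividing by total volume removes growth
    that only reflects the corpus itself getting bigger.
    """
    return yoy_growth(theme_now / total_now, theme_then / total_then)


# The two themes from the text: monthly counts now and one year earlier.
small_theme = yoy_growth(200, 200 / 1.40)    # about +40%
large_theme = yoy_growth(2000, 2000 / 1.03)  # about +3%

# Assumed corpus totals (made up): overall preprint volume grew 10%.
small_theme_vs_corpus = share_adjusted_growth(
    200, 200 / 1.40, total_now=110_000, total_then=100_000
)
```

Under these assumed totals, the small theme's 40% raw growth corresponds to roughly a 27% gain in its share of all preprints, still far ahead of the large theme.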

Momentum scores are computed monthly and presented as rolling averages to smooth short-term noise. Smoothing matters because a spike can indicate genuine acceleration, conference-driven clustering, or a one-time event such as a major dataset release. The methodology distinguishes sustained momentum shifts from transient spikes by requiring elevated output over multiple consecutive windows before flagging a theme as "rising." Filtered this way, the Finch Innovation Index provides a 2 to 5 year signal advantage over traditional patent or market indicators.
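A rough sketch of the smoothing and the consecutive-window rule follows. The window length, threshold, and number of required windows are not stated in this post, so the values here are assumptions, as are the function names `rolling_mean` and `is_rising`.

```python
from typing import List


def rolling_mean(values: List[float], window: int) -> List[float]:
    """Simple trailing rolling average; output has len(values) - window + 1 items."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


def is_rising(monthly_growth: List[float], threshold: float, min_consecutive: int) -> bool:
    """True only if each of the last min_consecutive months exceeds threshold.

    Hypothetical rule: a single elevated month never triggers the flag.
    """
    if len(monthly_growth) < min_consecutive:
        return False
    return all(g > threshold for g in monthly_growth[-min_consecutive:])


# Made-up monthly growth readings for two themes.
spike = [0.02, 0.03, 0.60, 0.02, 0.03, 0.02]      # one-off jump, then back to baseline
sustained = [0.02, 0.10, 0.25, 0.30, 0.35, 0.40]  # steadily elevated

smoothed_sustained = rolling_mean(sustained, window=3)  # values for presentation

spike_flag = is_rising(spike, threshold=0.2, min_consecutive=3)
sustained_flag = is_rising(sustained, threshold=0.2, min_consecutive=3)
```

The flag here is checked against the raw monthly readings rather than the smoothed series, because a rolling average spreads a single spike across several windows and could make it look sustained.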

Geographic Intelligence and Institutional Mapping

Research does not happen uniformly. The Finch Innovation Index maps publication output by country and institution, revealing geographic concentration patterns that have direct implications for supply chain strategy, talent sourcing, and geopolitical risk assessment.

The Finch Innovation Index covers AI, biotech, climate tech, quantum, advanced materials, and dozens of other verticals with geographic resolution. For each theme, the index tracks which countries contribute the highest share of publications, how that share is changing over time, and whether output is concentrated in a few institutions or distributed broadly. A theme where 80% of output comes from three labs in one country carries different strategic implications than a theme with distributed global participation.
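Concentration of this kind can be summarized with a top-k share or a Herfindahl-Hirschman index (HHI). The post does not say which measure the index uses, so both functions below are offered as plausible sketches, and the lab names and counts are invented.

```python
from typing import Dict


def top_k_share(counts: Dict[str, int], k: int) -> float:
    """Fraction of total output produced by the k largest contributors."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    largest = sorted(counts.values(), reverse=True)[:k]
    return sum(largest) / total


def hhi(counts: Dict[str, int]) -> float:
    """Herfindahl-Hirschman index: sum of squared shares.

    Ranges from 1/n (evenly spread over n contributors) to 1.0 (single contributor).
    """
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum((c / total) ** 2 for c in counts.values())


# Made-up publication counts per institution for two themes.
concentrated = {"Lab A": 40, "Lab B": 25, "Lab C": 15, "Lab D": 10, "Lab E": 10}
distributed = {f"Lab {i}": 10 for i in range(10)}

concentrated_top3 = top_k_share(concentrated, k=3)
distributed_top3 = top_k_share(distributed, k=3)
concentrated_hhi = hhi(concentrated)
distributed_hhi = hhi(distributed)
```

In the first theme three labs account for 80% of output, matching the scenario in the text. In the second the same three-lab share is 30% and the HHI is much lower. The same functions apply to country-level counts.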

Geographic signals from the Finch Innovation Index complement momentum scores by adding spatial context. A rising momentum score in a theme dominated by Chinese academic institutions suggests a different investment posture than the same score in a theme led by US national labs or European research consortia. This layer of analysis is especially relevant for sovereign wealth funds and multinational R&D organizations operating across jurisdictions.

What the Methodology Enables

The combination of theme definition, multi-label classification, momentum scoring, and geographic mapping produces a research intelligence layer that sits upstream of patent and funding signals. The dataset provides monthly momentum scores, geographic intelligence, rising keywords, and theme emergence signals across all 73 themes. Patent filings reflect what was invented 18 to 36 months ago. Venture funding reflects what investors believed 6 to 12 months ago. Preprint momentum reflects what researchers are working on right now, which is the closest available proxy for what will be investable in 2 to 5 years.

The methodology is designed to be transparent, reproducible, and useful for the analysts who need to make allocation decisions before consensus forms. That is the point of the entire exercise. You can explore the full methodology and signal definitions on the Finch Innovation Index platform.
