How the Finch Innovation Index Defines, Tracks, and Scores 73 Investable Technology Themes From Preprint Data
Most innovation tracking relies on patents, venture funding rounds, or market reports, all of which reflect decisions already made. The Finch Innovation Index takes a fundamentally different approach: it classifies over one million scientific preprints into 73 investable technology themes and scores each theme on research momentum, geographic concentration, and keyword emergence. The result is a structured signal layer that sits 2 to 5 years upstream of commercialization. This post explains exactly how each stage of that methodology works.
Defining the 73 Investable Themes
The Finch Innovation Index organizes research output into 73 distinct technology themes spanning AI, biotech, climate tech, quantum computing, advanced materials, energy storage, robotics, and dozens of other verticals. These are not arbitrary academic categories. Each theme is defined by its investability: it maps to identifiable market segments, venture capital deal categories, or corporate R&D budget lines where capital allocation decisions are actively being made.
Theme definitions are constructed from curated keyword taxonomies that combine domain terminology, method-level descriptors, and application-layer language. A theme like "solid-state batteries" is not simply a keyword match on that phrase; it includes related terms for electrolyte compositions, interfacial engineering, and scalable fabrication methods that researchers actually use in titles and abstracts. This layered taxonomy approach reduces both false positives (unrelated papers matching a surface keyword) and false negatives (relevant work using non-obvious terminology). For a fuller picture of how these themes distribute across innovation verticals, see our analysis of research momentum differences across biotech, AI, and climate tech.
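To make the layered idea concrete, here is a minimal sketch of how a single theme's taxonomy might be matched against a title or abstract. It is not the Finch implementation: the `ThemeTaxonomy` class, the `context_terms` layer, the matching rule, and every term listed are assumptions chosen for illustration.

```python
import re
from dataclasses import dataclass, field


@dataclass
class ThemeTaxonomy:
    """Hypothetical layered taxonomy for one theme (illustrative terms only)."""
    name: str
    domain_terms: set[str] = field(default_factory=set)       # core domain terminology
    method_terms: set[str] = field(default_factory=set)       # method-level descriptors
    application_terms: set[str] = field(default_factory=set)  # application-layer language
    context_terms: set[str] = field(default_factory=set)      # assumed anchor terms


def _contains(text: str, term: str) -> bool:
    # Whole-word match, so a term does not fire inside a longer word.
    return re.search(r"\b" + re.escape(term) + r"\b", text) is not None


def matches_theme(text: str, taxonomy: ThemeTaxonomy) -> bool:
    """Assumed rule: a domain term matches on its own; a method or
    application term only counts when a context term is also present."""
    text = text.lower()

    def has(terms: set[str]) -> bool:
        return any(_contains(text, t) for t in terms)

    if has(taxonomy.domain_terms):
        return True
    secondary = taxonomy.method_terms | taxonomy.application_terms
    return has(secondary) and has(taxonomy.context_terms)


solid_state = ThemeTaxonomy(
    name="solid-state batteries",
    domain_terms={"solid-state battery", "solid-state batteries"},
    method_terms={"sulfide electrolyte", "interfacial engineering"},
    application_terms={"scalable fabrication"},
    context_terms={"battery", "batteries", "lithium"},
)

# Relevant work that never uses the theme's surface phrase (avoids a false negative).
hit = matches_theme("Interfacial engineering at the lithium metal anode", solid_state)
# Same method term in an unrelated field (avoids a false positive).
miss = matches_theme("Interfacial engineering in perovskite solar cells", solid_state)
print(hit, miss)
```

Requiring a context term alongside method-level language is one simple way to trade off the two error types; a production taxonomy would likely need richer rules and far larger term lists.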
Themes are periodically reviewed and updated. When a new research cluster reaches sufficient volume and coherence, it can be promoted from a rising keyword signal to a full tracked theme. Conversely, themes that become commoditized or mature beyond the preprint stage may be flagged as late-cycle.
Classifying Preprints at Scale
The classification pipeline ingests preprints from major repositories including arXiv, bioRxiv, medRxiv, and others. The Finch Innovation Index processes over one million classified preprints to generate monthly intelligence across all 73 themes. Each paper is assigned to one or more themes based on its title, abstract, and metadata. Multi-label classification is essential because real research rarely fits neatly into a single category: a paper on graph neural networks for molecular property prediction belongs to both AI and drug discovery themes simultaneously.
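Multi-label assignment can be sketched in a few lines. The snippet below is an illustration under stated assumptions, not the production pipeline: the `THEME_KEYWORDS` mapping, the theme names, and the plain substring match are all hypothetical simplifications.

```python
# Hypothetical theme -> keyword mapping (illustrative, far smaller than a real taxonomy).
THEME_KEYWORDS: dict[str, set[str]] = {
    "ai": {"graph neural network", "transformer"},
    "drug discovery": {"molecular property prediction", "binding affinity"},
    "quantum computing": {"qubit"},
}


def classify(title: str, abstract: str,
             theme_keywords: dict[str, set[str]]) -> list[str]:
    """Return every theme whose keywords appear in the title or abstract."""
    text = f"{title} {abstract}".lower()
    return sorted(
        theme
        for theme, keywords in theme_keywords.items()
        if any(keyword in text for keyword in keywords)
    )


labels = classify(
    "Graph neural networks for molecular property prediction",
    "We benchmark message-passing models on small-molecule datasets.",
    THEME_KEYWORDS,
)
print(labels)  # ['ai', 'drug discovery']
```

Because the function returns a list rather than a single label, a paper that spans two fields is counted in both themes' volumes.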
Classification accuracy is maintained through a combination of rule-based taxonomy matching and periodic validation against expert-labeled samples. The system prioritizes recall within each theme, on the principle that missing a relevant paper is more costly than including a borderline one. Downstream scoring and aggregation smooth out individual classification noise at the theme level.
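Validation against an expert-labeled sample comes down to per-theme precision and recall. The following sketch assumes both the classifier output and the expert labels are available as mappings from a paper ID to a set of theme names; the function name and the sample data are hypothetical.

```python
def theme_precision_recall(predicted: dict[str, set[str]],
                           expert: dict[str, set[str]],
                           theme: str) -> tuple[float, float]:
    """Precision and recall for one theme over the expert-labeled papers."""
    tp = fp = fn = 0
    for paper_id, gold in expert.items():
        pred = predicted.get(paper_id, set())
        if theme in pred and theme in gold:
            tp += 1   # correctly assigned
        elif theme in pred:
            fp += 1   # borderline paper included
        elif theme in gold:
            fn += 1   # relevant paper missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up sample: four papers labeled by an expert, and the classifier's output.
expert_labels = {"p1": {"ai"}, "p2": {"ai", "drug discovery"},
                 "p3": {"quantum computing"}, "p4": {"ai"}}
predicted_labels = {"p1": {"ai"}, "p2": {"ai"},
                    "p3": {"ai", "quantum computing"}, "p4": {"ai"}}

precision, recall = theme_precision_recall(predicted_labels, expert_labels, "ai")
print(precision, recall)  # 0.75 1.0
```

In this made-up sample the "ai" theme picks up one borderline paper (precision 0.75) but misses none (recall 1.0), which is the trade-off a recall-first system accepts.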
Geographic attribution is extracted from author affiliations, enabling country-level and institution-level analysis of where research concentration is shifting. This geographic layer is particularly valuable for sovereign wealth fund researchers and technology scouts tracking national competitiveness patterns.
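One way to turn affiliation data into a concentration figure is sketched below. The post does not specify how Finch measures concentration, so both the fractional counting of multi-country papers and the Herfindahl-Hirschman index (HHI) used here are assumptions; the input is assumed to be country codes already parsed from affiliations.

```python
from collections import Counter


def country_shares(papers: list[list[str]]) -> dict[str, float]:
    """Fractional counting: a paper with n distinct countries gives 1/n to each."""
    weights: Counter = Counter()
    for countries in papers:
        distinct = set(countries)
        for country in distinct:
            weights[country] += 1 / len(distinct)
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()} if total else {}


def concentration_hhi(shares: dict[str, float]) -> float:
    """Sum of squared shares: 1.0 means one country, near 0 means widely spread."""
    return sum(s * s for s in shares.values())


# Each inner list holds the countries of one paper's author affiliations (made-up data).
theme_papers = [["US", "US"], ["CN"], ["US", "CN"], ["DE"]]
shares = country_shares(theme_papers)
hhi = concentration_hhi(shares)
print(shares, hhi)
```

Tracking the same index month over month would show whether a theme's research base is consolidating in a few countries or dispersing.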
Scoring Momentum, Not Just Volume
Raw publication counts tell you something, but not enough. A theme with steady output is not the same as a theme where output is accelerating. The Finch Innovation Index momentum score captures the rate of change in publication volume, weighted by recency, and normalized against each theme's own historical baseline. A momentum score of 1.0 represents a theme growing at its own long-run average. Scores above 1.0 indicate acceleration; scores below 1.0 indicate deceleration.
Because each score is normalized against the theme's own historical baseline, momentum is comparable across themes of very different absolute sizes. A niche theme with 200 papers per month can register the same momentum score as a massive theme with 10,000 papers per month if both are accelerating at the same relative rate. For a deeper technical discussion, see how momentum scoring works in research intelligence.
The momentum signal is complemented by keyword emergence data. Rising keywords within a theme indicate where the frontier is moving before new sub-themes fully crystallize. Together, momentum scores and rising keywords provide a structured, quantitative alternative to expert intuition.
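Keyword emergence can be approximated by comparing how often a term appears in a recent window versus an earlier one. The post does not describe Finch's method, so the add-one smoothing, the `min_recent` threshold, and the sample keywords below are assumptions made for illustration.

```python
from collections import Counter


def rising_keywords(prior_docs: list[list[str]],
                    recent_docs: list[list[str]],
                    min_recent: int = 2,
                    top_n: int = 5) -> list[tuple[str, float]]:
    """Rank keywords by the ratio of recent to prior document frequency."""
    prior = Counter(kw for doc in prior_docs for kw in set(doc))
    recent = Counter(kw for doc in recent_docs for kw in set(doc))
    n_prior, n_recent = len(prior_docs), len(recent_docs)

    scored = []
    for keyword, count in recent.items():
        if count < min_recent:
            continue  # ignore one-off terms
        # Add-one smoothing keeps brand-new keywords from dividing by zero.
        recent_rate = (count + 1) / (n_recent + 1)
        prior_rate = (prior[keyword] + 1) / (n_prior + 1)
        scored.append((keyword, recent_rate / prior_rate))

    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_n]


# Each inner list is the keyword set of one paper within a theme (made-up data).
prior_window = [["sulfide electrolyte"], ["sulfide electrolyte", "dendrite"],
                ["dendrite"], ["sulfide electrolyte"]]
recent_window = [["halide electrolyte", "dendrite"], ["halide electrolyte"],
                 ["sulfide electrolyte", "halide electrolyte"], ["dendrite"]]

ranked = rising_keywords(prior_window, recent_window)
print(ranked)
```

In this made-up example "halide electrolyte" ranks first because it is absent from the earlier window, while "dendrite" scores 1.0 because its frequency is unchanged.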
Why This Methodology Matters for Capital Allocators
Research intelligence from preprints offers a 2 to 5 year signal advantage over patent filings and market reports. Patents reflect R&D decisions made months or years earlier. Market reports reflect analyst consensus, which by definition lags the frontier. Preprints capture what researchers are working on right now, before IP is filed and long before products ship.
The Finch Innovation Index methodology converts this raw signal into structured, comparable data across 73 themes. For venture capital analysts, this means identifying momentum shifts in specific technology areas before deal flow materializes. For corporate R&D strategists, it means benchmarking internal research portfolios against the global frontier. For sovereign wealth funds with decade-long horizons, it means tracking national research competitiveness across every theme that matters.
The Finch methodology is not a prediction engine. It is a measurement system. It tells you where scientific attention is concentrating, where it is accelerating, and where geographic patterns are shifting. What you do with that information depends on your thesis, but the data itself is available to examine.