1,010,889 papers indexed

Two years ahead of the money.

Preprints surface days after a discovery. Years before the patents, the funding rounds, the analyst coverage. Finch tracks that signal across 73 themes, every month.

Explore Sample Data
Fig. 01 · Momentum Index · Sample
Speech & Audio AI: 81
AI Agents & Reasoning: 78
LLMs & NLP: 72
AR/VR & Immersive: 66
Checkpoint Inhibitors: 64
Next-Gen Vaccines: 62
Themes tracked: 73
Papers classified: 1.01M
Classification rate: 99.3%
Historical depth: 66 months
Countries: 19
Venture Capital
Corporate Strategy
M&A Advisory
Hedge Funds
Consulting
Data Platforms
01
Signal Pipeline

Systematic signal at stage one.

Scientific papers appear within days of a discovery — years before patents, funding rounds, and analyst coverage. The Finch Innovation Index captures that signal systematically, every month.

1. Research: Day 0 ← FII
2. Journal: +6–18 mo
3. Patent: +1–2 yr
4. Startup: +2–3 yr
5. VC Funding: +2–4 yr
6. Product: +3–5 yr
7. Bloomberg: +4–6 yr
By the time it's on Bloomberg, you've already missed it. The Finch Innovation Index monitors preprint velocity across 73 themes, giving you a systematic leading indicator at the first stage of the innovation pipeline — before any other data source can see it.
02
Rising Keywords

Emerging signals before they hit the market.

Each month the index scans a corpus of 1.01M research abstracts to surface technical concepts surging in frequency: the building blocks of tomorrow's products, before they have ticker symbols.

Explore Sample Data →
Rising terms · sample data
Bigram | Growth | Type
latent actions | ×24.3 | novel
attention sink | ×23.1 | novel
computational budgets | ×22.4 | novel
gated attention | ×21.0 | surging
coded caching | ×35.2 | novel
test-time scaling | ×19.7 | surging
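As a rough illustration of how bigram frequencies like those above might be counted, here is a minimal Python sketch. The tokenizer, the `count_bigrams` helper, and the sample abstracts are all hypothetical; the page does not describe Finch's actual extraction method.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (hyphenated words kept whole)."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())


def count_bigrams(abstracts):
    """Count how many abstracts contain each adjacent word pair.

    Each bigram is counted at most once per abstract, so the result is a
    document frequency rather than a raw occurrence count.
    """
    counts = Counter()
    for abstract in abstracts:
        tokens = tokenize(abstract)
        seen = {" ".join(pair) for pair in zip(tokens, tokens[1:])}
        counts.update(seen)
    return counts


# Hypothetical abstracts, for illustration only.
abstracts = [
    "We study test-time scaling for reasoning models.",
    "Gated attention removes the attention sink in long contexts.",
    "Test-time scaling improves accuracy under fixed computational budgets.",
]

bigram_counts = count_bigrams(abstracts)
print(bigram_counts["test-time scaling"])  # 2
print(bigram_counts["attention sink"])     # 1
```

Counting per abstract rather than per occurrence keeps a single repetitive paper from inflating a term. A production pipeline would also need stop-word filtering and a month-by-month breakdown, both omitted here.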
03
Historical Validation

The signal was always there.

Looking back across six years of preprint data, the precursor signals for every major technology wave were visible at the research stage — years before capital arrived.

Early 2021 · 2–3 yr lead
GLP-1 obesity research spike +180%

Preprint velocity in GLP-1 receptor agonists surged 180% before any major pharma coverage. The signal was unambiguous two years before Ozempic became a household name.

Novo Nordisk stock return
Late 2019 · 3–4 yr lead
AI-accelerated drug discovery velocity

Research output at the intersection of deep learning and molecular biology began compounding well before the sector attracted major venture capital attention.

$4B+ · Sector market cap created
2018–2019 · 3–4 yr lead
Transformer architecture adoption curve

Preprint publication rates for transformer-based models began an exponential climb in 2018–2019. The GPT revolution and its commercial consequences followed on schedule.

$100B+ · Market value created
2018 · 2–3 yr lead
mRNA lipid nanoparticle delivery methods

Delivery mechanism research for mRNA therapeutics was compounding quietly in 2018. Moderna and BioNTech were building on a preprint signal that had been accumulating for years.

$50B+ · Combined market cap peak
2015–2016 · 2–3 yr lead
CRISPR therapeutic applications spike

CRISPR preprint volume doubled across two consecutive years before therapeutic applications entered clinical development and attracted institutional capital.

$890M · CRISPR Tx IPO raise
2020–2021 · 2–3 yr lead
Perovskite solar efficiency research doubled

Efficiency research publications in perovskite photovoltaics doubled over two years, well ahead of the commercial investment wave that followed into the sector.

$200M+ · Oxford PV total raised
04
The Datasets

Four datasets. Delivered monthly.

Structured CSV and Parquet files updated on the 1st of each month. Every dataset covers all 73 themes across 66 months of history.

Monthly Momentum · Sample
theme | momentum_score | paper_count | mom_change | rank | sector
Speech & Audio AI | 81 | 322 | +3.2 | 1 | AI
AI Agents & Reasoning | 78 | 2,052 | +1.8 | 2 | AI
LLMs & NLP | 72 | 3,759 | +0.4 | 3 | AI
AR/VR & Immersive | 66 | 824 | -0.6 | 4 | HW
Checkpoint Inhibitors | 64 | 420 | -1.2 | 5 | LS
Next-Gen Vaccines | 62 | 677 | -0.3 | 6 | LS
Quantum Error Correction | 61 | 318 | +0.9 | 7 | HW
Federated Learning | 59 | 589 | -1.4 | 8 | AI
73 rows/month · 7 fields
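For readers who want to see how a Monthly Momentum file might be consumed, here is a minimal Python sketch using only the standard library. It assumes the CSV carries the six column names shown in the sample header and stores plain numeric values without thousands separators; the actual file layout, including the seventh field mentioned above, is not specified here. The inline `SAMPLE_CSV` string and the `load_momentum` helper are hypothetical stand-ins for a real delivery.

```python
import csv
import io

# Inline stand-in for a monthly file; a real delivery would be read from disk.
# Values are copied from the sample rows, with assumed plain-number formatting.
SAMPLE_CSV = """theme,momentum_score,paper_count,mom_change,rank,sector
Speech & Audio AI,81,322,3.2,1,AI
AI Agents & Reasoning,78,2052,1.8,2,AI
LLMs & NLP,72,3759,0.4,3,AI
AR/VR & Immersive,66,824,-0.6,4,HW
Checkpoint Inhibitors,64,420,-1.2,5,LS
"""


def load_momentum(handle):
    """Parse Monthly Momentum rows, converting numeric columns from text."""
    rows = []
    for row in csv.DictReader(handle):
        rows.append({
            "theme": row["theme"],
            "momentum_score": int(row["momentum_score"]),
            "paper_count": int(row["paper_count"]),
            "mom_change": float(row["mom_change"]),
            "rank": int(row["rank"]),
            "sector": row["sector"],
        })
    return rows


rows = load_momentum(io.StringIO(SAMPLE_CSV))

# Themes whose score rose month over month, strongest gain first.
rising = sorted(
    (r for r in rows if r["mom_change"] > 0),
    key=lambda r: r["mom_change"],
    reverse=True,
)
for r in rising:
    print(f'{r["theme"]}: {r["momentum_score"]} ({r["mom_change"]:+.1f})')
```

The same function accepts an open file handle, so swapping `io.StringIO(SAMPLE_CSV)` for `open(path, newline="")` should be the only change needed for a file on disk, provided the column names match.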
Geographic Intelligence · Sample
theme | country | paper_count | inst_count | share_pct | rank
AI Agents | 🇺🇸 USA | 8,420 | 1,011 | 24.1% | 1
LLMs & NLP | 🇨🇳 China | 5,620 | 674 | 18.3% | 2
Perovskite Solar | 🇨🇳 China | 3,840 | 461 | 33.4% | 3
Next-Gen Vaccines | 🇬🇧 UK | 2,220 | 266 | 14.8% | 4
Checkpoint Inhibitors | 🇺🇸 USA | 4,068 | 488 | 52.1% | 5
Remote Monitoring | 🇬🇧 UK | 2,897 | 348 | 29.3% | 6
AI Agents | 🇨🇦 Canada | 1,320 | 158 | 3.8% | 7
LLMs & NLP | 🇮🇳 India | 2,860 | 343 | 9.3% | 8
~1,330 rows/month · 7 fields · 19 countries
Rising Keywords · Sample
keyword_bigram | growth_mult | tag | primary_theme | paper_count | prior_12m_avg
latent actions | ×24.3 | novel | AI Agents & Reasoning | 214 | 8.8
attention sink | ×23.1 | novel | LLMs & NLP | 187 | 8.1
computational budgets | ×22.4 | novel | Chip Architecture | 176 | 7.9
gated attention | ×21.0 | surging | LLMs & NLP | 310 | 14.8
coded caching | ×35.2 | novel | Federated Learning | 141 | 4.0
test-time scaling | ×19.7 | surging | AI Agents & Reasoning | 289 | 14.7
reward shaping | ×18.4 | novel | AI Agents & Reasoning | 198 | 10.8
vector quantization | ×17.1 | surging | Speech & Audio AI | 162 | 9.5
~3,900 rows/month · CSV + Parquet
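In the sample rows, `growth_mult` is close to `paper_count` divided by `prior_12m_avg` (for example, 214 / 8.8 ≈ 24.3), although not every row matches to the last decimal. Treat the Python sketch below as an inference from the sample, not a documented formula. The `growth_multiple` helper is hypothetical, and the rule that assigns the `novel` and `surging` tags is not stated, so it is left out.

```python
def growth_multiple(paper_count, prior_12m_avg):
    """Ratio of the current month's count to the trailing 12-month average.

    Assumed definition, inferred from the sample rows rather than documented.
    """
    if prior_12m_avg <= 0:
        raise ValueError("prior_12m_avg must be positive")
    return paper_count / prior_12m_avg


# (keyword_bigram, paper_count, prior_12m_avg) taken from the sample rows.
sample = [
    ("latent actions", 214, 8.8),
    ("attention sink", 187, 8.1),
    ("coded caching", 141, 4.0),
    ("test-time scaling", 289, 14.7),
]

# Rank the sample bigrams by their recomputed multiple, largest first.
ranked = sorted(
    ((name, growth_multiple(count, avg)) for name, count, avg in sample),
    key=lambda item: item[1],
    reverse=True,
)
for name, mult in ranked:
    print(f"{name}: x{mult:.1f}")
```

A ratio against a small trailing average is sensitive to noise: a term averaging 4.0 papers a month needs far fewer new papers to post a large multiple than one averaging 14.7, which is worth keeping in mind when comparing rows.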
Theme Emergence · Sample
theme | emerged | months | cum_growth | accel_3mo | sig_score | momentum
AI Agents & Reasoning | Jan 2020 | 73 | +337% | +12.4% | 78 | 81
mRNA Therapeutics | Apr 2020 | 70 | +512% | +8.1% | 58 | 55
Speech & Audio AI | Jun 2019 | 80 | +621% | +15.2% | 81 | 84
Solid-State Batteries | Mar 2021 | 59 | +218% | +6.3% | 66 | 57
Federated Learning | Feb 2021 | 60 | +284% | +4.9% | 59 | 52
Perovskite Solar | Sep 2020 | 65 | +193% | +3.7% | 54 | 51
Checkpoint Inhibitors | Nov 2019 | 75 | +141% | +2.1% | 64 | 48
Quantum Error Correction | Aug 2021 | 54 | +256% | +7.8% | 61 | 59
73 themes · updated monthly · since Jan 2019
05
Methodology

How the index works.

A systematic, four-stage pipeline converts raw preprint publications into structured investment signals — no manual curation, no subjective scoring.

01
Source Collection

Monthly ingestion of academic preprints from the largest open-access repositories for frontier research. Over 1M papers processed since January 2019.

02
Theme Classification

Each abstract is classified into one or more of 73 investable technology themes using a proprietary taxonomy validated against expert panels. 99.3% classification accuracy.

03
Momentum Scoring

Monthly publication velocity is normalised into a 0–100 momentum score per theme. Scores account for baseline volume, growth rate, and 3-month acceleration to separate signal from noise.
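The exact scoring formula is not published on this page. Purely as an illustration of the idea, the Python sketch below combines growth against a 12-month baseline with 3-month acceleration and maps the result onto a 0–100 scale by percentile rank across themes. The window lengths, the 60/40 weights, the percentile normalisation, and all function names are assumptions made for this example, not Finch's method.

```python
from statistics import mean


def theme_features(monthly_counts):
    """Derive growth and acceleration from at least 15 monthly paper counts.

    growth       : latest 3-month mean relative to the 12 months before it
    acceleration : latest 3-month mean relative to the previous 3-month mean
    """
    if len(monthly_counts) < 15:
        raise ValueError("need at least 15 months of counts")
    recent = mean(monthly_counts[-3:])
    previous = mean(monthly_counts[-6:-3])
    baseline = mean(monthly_counts[-15:-3])
    if baseline <= 0 or previous <= 0:
        raise ValueError("baseline and previous windows must be positive")
    return recent / baseline - 1.0, recent / previous - 1.0


def percentile_scores(values):
    """Map each value to 0-100 by its rank among all values (ties share the lower rank)."""
    order = sorted(values)
    top = max(len(values) - 1, 1)
    return [100.0 * order.index(v) / top for v in values]


def momentum_scores(counts_by_theme, growth_weight=0.6, accel_weight=0.4):
    """Blend growth and acceleration percentiles into one 0-100 score per theme."""
    themes = list(counts_by_theme)
    feats = [theme_features(counts_by_theme[t]) for t in themes]
    growth_pct = percentile_scores([f[0] for f in feats])
    accel_pct = percentile_scores([f[1] for f in feats])
    return {
        t: round(growth_weight * g + accel_weight * a, 1)
        for t, g, a in zip(themes, growth_pct, accel_pct)
    }


# Hypothetical monthly paper counts for three made-up themes (15 months each).
counts = {
    "flat": [100] * 15,
    "rising": [100] * 9 + [110, 120, 130, 150, 180, 220],
    "fading": [100] * 12 + [90, 80, 70],
}
scores = momentum_scores(counts)
print(scores)
```

Because every step is a pure function of the input counts, the same inputs always give the same scores. Ranking themes against each other, rather than scoring each in isolation, is one way to keep a high-volume theme from dominating on raw paper count alone.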

04
Signal Generation

Four structured datasets produced on the 1st of each month: Momentum rankings, Geographic Intelligence across 19 countries, Rising Keywords via bigram analysis, and Theme Emergence tracking.

Reproducible: Same inputs, same outputs. Every score is deterministic.
Transparent: Full methodology documentation included with every dataset delivery.
No opinions: Momentum scores are derived from publication data, not analyst sentiment.
06
Who This Is For

Built for decision-makers with long time horizons.

The index is used by investors, strategists, and advisors who need a quantitative foundation for technology thesis work — not a narrative, a dataset.

Venture Capital

Identify emerging categories 2–4 years before deal flow appears. Build data-backed thesis documents before the sector has a name.

Corporate Strategy

Scan adjacent technical domains for threats and opportunities. Build R&D pipeline maps grounded in publication velocity, not analyst opinion.

M&A Advisory

Validate target company positioning against underlying research trends. Identify sectors approaching peak publication velocity before valuation follows.

Hedge Funds

Systematic, monthly signals across 73 themes. Integrate preprint momentum into quantitative models as a leading factor for technology-sector positioning.

Management Consulting

Deliver technology landscape assessments backed by publication data, not keyword searches. Differentiate strategy reports with proprietary signal intelligence.

Financial Data Platforms

License structured monthly feeds to enrich alternative data products. Four clean datasets, consistent schema, CSV and Parquet on a monthly cadence.

07
Request Access

Ready to see the signal early?

Over one million papers classified. 73 investable themes. Rising Keywords, Geographic Intelligence, and Theme Emergence — updated every month.

Explore Sample Data
Index Summary
Themes: 73 investable themes
Papers classified: 1,010,889
Datasets: Monthly Momentum + Geographic Intelligence + Rising Keywords + Theme Emergence
Format: CSV + Parquet · monthly cadence
Coverage: 66 months · Jan 2019–present
Sources: Academic preprints