A training-free, deterministic protein stability classifier built on Recursive Coherence Field Theory. No neural networks. No GPUs. No learned parameters. Pure mathematical coherence from sequence alone.
Named in loving memory of the creator's mother — because the best science is built on love, not just logic.
Select from the sample library or submit your own sequence. Pre-computed results demonstrate the Angelika Fold classification system across multiple protein categories.
Register to access the interactive Protein Coherence Engine. Free access for researchers, developers, and collaborators.
Everything you'd want to ask about Angelika Fold — what it does, what it doesn't, where it stands, and where it's going. No marketing language. Just the science.
Let's be direct about this. Angelika Fold is a protein stability classifier — not a structure predictor. It does not attempt to predict 3D coordinates, backbone angles, or contact maps. That's AlphaFold's territory, and they do it extraordinarily well with billions of dollars and thousands of GPUs behind them.
What Angelika Fold does is different: given a raw amino acid sequence, it computes a single coherence value — Ω (Omega) — that reflects the sequence's intrinsic tendency toward structural stability. If Ω exceeds the golden threshold τ = φ⁻² ≈ 0.382, the protein is classified as stable. Below that, it's flagged as intrinsically disordered or aggregation-prone.
The output is a binary classification with a continuous confidence score. Not a folded structure. Not a ΔG prediction. A stability signal derived from first principles — no training data, no neural networks, no learned parameters.
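The decision rule above can be sketched in a few lines. This is a minimal illustration on the web engine's normalized 0–1 Ω scale; `classify` is our naming for illustration, not the engine's actual API, and the Ω values come from the examples quoted elsewhere on this page.

```python
import math

# Golden-ratio threshold from the text: tau = phi^(-2), about 0.382.
PHI = (1 + math.sqrt(5)) / 2
TAU = PHI ** -2  # ~0.3820

def classify(omega: float) -> str:
    """Binary stability call on the normalized 0-1 Omega scale."""
    return "STABLE" if omega > TAU else "DISORDERED/AGGREGATION-PRONE"

# Illustrative values from the live engine (normalized scale):
classify(0.46)  # Ubiquitin        -> "STABLE"
classify(0.29)  # Alpha-Synuclein  -> "DISORDERED/AGGREGATION-PRONE"
```

Note that τ = φ⁻² is equivalent to (3 − √5)/2, so the threshold is a fixed constant, not a tuned hyperparameter.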
The benchmark framework is designed around a reproducible 1,000-protein target sourced from UniProt (reviewed/Swiss-Prot entries). Currently, the live engine demonstrates 8 reference proteins across 5 categories, with the full 1,000-protein benchmark in active development. The dataset is deliberately adversarial — it's not designed to make RCFT look good. It's designed to break it.
| CATEGORY | COUNT | DESCRIPTION | EXPECTED CLASS |
|---|---|---|---|
| A | 150 | Extremophile & hyperstable (thermophiles, disulfide-rich) | STABLE |
| B | 150 | Intrinsically disordered / LLPS (α-synuclein, Tau, FUS, TDP-43) | DISORDERED |
| C | 150 | Membrane proteins (GPCRs, channels, transporters) | CONTEXT-DEP |
| D | 150 | Amyloid / aggregation-prone (Aβ, prion, IAPP, transthyretin) | UNSTABLE |
| E | 150 | Giant multi-domain (titin, dystrophin, BRCA2, mTOR) | MIXED |
| F | 100 | Designed / de novo engineered proteins | STABLE |
| G | 100 | Viral proteins (SARS-CoV-2 spike, HIV gp120, influenza HA) | MIXED |
| H | 50 | Hard negative controls (shuffled/reversed from Category A) | UNSTABLE |
Every sequence is publicly identified by UniProt accession. None of them are proprietary. The dataset includes force-included "curveball" proteins — yeast prion-like sequences in Category B, bacterial beta-barrel OMPs in Category C, disease mutants in Category D, ankyrin/TPR repeats in Category E — specifically chosen because they're hard edge cases that trip up simpler methods.
Category H deserves special attention: these are 50 sequences taken from Category A (known stable proteins), then shuffled, reversed, or window-randomized with a fixed seed. They have identical amino acid composition to real stable proteins but destroyed sequence order. If RCFT is just measuring composition, it would classify these as stable too. That's the test.
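Generating such controls is straightforward to sketch. The snippet below is illustrative, not the benchmark's actual script; it preserves amino acid composition while destroying order, seeded with the fixed seed (4242) mentioned in the reproducibility notes.

```python
import random

def make_controls(seq: str, seed: int = 4242) -> dict:
    """Composition-preserving negative controls in the spirit of
    Category H: identical residue counts, destroyed sequence order.
    Seeded so the same controls are generated every run."""
    rng = random.Random(seed)
    shuffled = list(seq)
    rng.shuffle(shuffled)
    return {
        "shuffled": "".join(shuffled),  # random permutation of residues
        "reversed": seq[::-1],          # exact reversal of the chain
    }
```

A composition-only predictor scores these identically to the parent sequence; a sequence-order-sensitive one should not.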
Here's the part where most pitch decks would show you a cherry-picked accuracy number and move on. We're not going to do that.
The RCFT engine consistently assigns higher Ω values to known stable proteins and lower values to disordered and amyloid-prone proteins. In the live web engine (which uses a normalized 0–1 Ω scale), hyperstable proteins like Ubiquitin score Ω ≈ 0.46, intrinsically disordered proteins like Alpha-Synuclein score Ω ≈ 0.29, and Category D (amyloid-prone) proteins consistently score lowest, which is exactly what biophysics would predict. The ordering is correct; the system sees a real signal.
But we'll be the first to say: the calibration is still maturing. The desktop benchmark pipeline (C++/Python) uses a raw Ω scale, while the web demonstration engine uses a normalized 0–1 scale with the golden threshold at τ = φ⁻² ≈ 0.382. Both pipelines produce the same ranking; only the normalization differs. Threshold tuning across protein categories remains an open problem, not something we're hiding behind marketing language.
What we can say with confidence: the system detects sequence-order-dependent coherence patterns that correlate with known stability classes. Shuffled controls (Category H) produce different Ω distributions than their parent sequences. The signal is real. The calibration is ongoing.
We're measuring stability classification accuracy — not RMSD, not TM-score, not GDT_TS. Those are structure prediction metrics, and we're not predicting structures. Our metric is binary: does the system correctly classify a protein as stable or disordered/aggregation-prone?
The benchmark pipeline computes ROC/AUC using Category A (expected stable) as positives and Categories B+D (expected unstable) as negatives. When the Ω values are used as a continuous score rather than a binary threshold, the ranking performance is meaningful — amyloid-prone proteins consistently score lower than hyperstable ones.
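The AUC in this setting reduces to a rank statistic. Here is a self-contained sketch of the rank-based (Mann-Whitney) formulation; the actual benchmark pipeline may use a library implementation, and the scores below are illustrative values, not benchmark results.

```python
def rank_auc(pos_scores, neg_scores) -> float:
    """Rank-based AUC: the probability that a positive (Category A)
    outscores a negative (Categories B+D) on the continuous Omega
    score, counting ties as half-wins."""
    wins = ties = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1
            elif p == n:
                ties += 1
    return (wins + 0.5 * ties) / (len(pos_scores) * len(neg_scores))

# Perfect separation of stable vs. disordered scores gives AUC = 1.0:
rank_auc([0.46, 0.44, 0.41], [0.29, 0.31, 0.25])  # -> 1.0
```

An AUC of 0.5 would mean the Ω ranking carries no information; 1.0 means every stable protein outranks every unstable one.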
As for baselines: the most trivial predictor would be "classify everything as stable" (which would get Category A right and everything else wrong) or a composition-based predictor (which would fail on Category H controls). RCFT's advantage is that it's sequence-order-sensitive — it doesn't just count amino acids, it processes their arrangement through a coherence field. That's what makes the shuffled controls a meaningful test.
We have not yet submitted to CASP. That's a deliberate choice — CASP evaluates structure prediction, and we're not claiming to predict structures. A more appropriate benchmark would be against experimental ΔG databases or disorder prediction competitions (like CAID). That comparison is planned.
The general architecture is no secret: each amino acid is encoded into a multi-dimensional property vector (hydrophobicity, charge, size, flexibility, aromaticity). A sliding window scans the sequence, feeding these vectors into the RCFT engine, which evolves an 11-dimensional state through recursive compression dynamics. The coherence value Ω emerges from this evolution.
The Harmonic Catalyst (H) modulates the threshold based on golden-ratio frequency alignment — sequences whose internal periodicity resonates with φ-based harmonics get a slight threshold adjustment. This captures something real about protein secondary structure periodicity (α-helices repeat every ~3.6 residues, which is close to a φ-harmonic).
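The φ-harmonic resonance test itself is proprietary, but the underlying idea, probing a per-residue signal for ~3.6-residue periodicity, can be sketched with a single-frequency Fourier probe. This is our construction for illustration, not the engine's method.

```python
import cmath
import math

def periodicity_power(values, period: float) -> float:
    """Normalized spectral power of a per-residue signal at a given
    period (in residues). Alpha-helices repeat every ~3.6 residues,
    so a helical hydropathy pattern peaks near period 3.6."""
    omega = 2 * math.pi / period
    s = sum(v * cmath.exp(1j * omega * k) for k, v in enumerate(values))
    return abs(s) / len(values)
```

A synthetic signal with exact 3.6-residue periodicity scores far higher at period 3.6 than at an off-resonance period, which is the kind of contrast a threshold modulator can exploit.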
What we're not going to share here is the specific field evolution equations, the compression operators, or the exact encoding weights. Those are the subject of a provisional patent (filed 2026), and they represent the core intellectual property of this system. If you're a researcher interested in collaboration or an investor interested in the details, that conversation happens under NDA.
Angelika Fold runs on a single CPU core. No GPU. No cloud compute. No training phase. A typical 200-residue protein processes in under a second on commodity hardware. The C++ core engine handles the heavy computation; Python bindings provide the interface.
Runtime scales linearly with sequence length — O(n) where n is the number of residues. The 1,000-protein benchmark (including proteins up to 35,000 residues like titin) completes in minutes, not hours. This matters because if you want to screen millions of sequences, you need something faster than a neural network inference pass.
Same sequence in, same Ω out. Every time. No stochastic elements, no random initialization, no dropout. The benchmark uses a fixed random seed (4242) for control generation, and all UniProt queries are sorted by accession for deterministic retrieval.
The benchmark configuration, fetch script, and runner are all in the repository. Anyone with access can reproduce the exact same 1,000-protein dataset and results (subject to UniProt database updates, which we version-lock in the manifest).
Membrane proteins (Category C) are genuinely hard. Their stability is context-dependent — a transmembrane helix is "stable" in a lipid bilayer but "unstable" in aqueous solution. RCFT currently treats all sequences in the same context, which means membrane proteins are an acknowledged limitation. We're exploring environment-dependent threshold modulation, but it's not implemented yet.
Very short peptides (under ~30 residues) don't give the sliding window enough signal to produce reliable Ω values. Very long multi-domain proteins (Category E) produce a single global Ω that may mask domain-level instability. Per-domain analysis is a planned feature.
Non-standard amino acids and post-translational modifications are not currently handled — the encoder maps the 20 canonical amino acids. Selenocysteine, pyrrolysine, and modified residues are stripped or ignored. Multi-chain complexes are processed as individual chains.
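The stripping behavior described above amounts to a one-line sanitizer. This sketch mirrors that behavior but is not the engine's actual preprocessing code.

```python
# The 20 canonical amino acids the encoder maps.
CANONICAL = set("ACDEFGHIKLMNPQRSTVWY")

def sanitize(seq: str) -> str:
    """Drop non-canonical residues (e.g. U = selenocysteine,
    O = pyrrolysine, X = unknown) before encoding."""
    return "".join(a for a in seq.upper() if a in CANONICAL)

sanitize("MKLUVXO")  # -> "MKLV"
```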
We also don't claim to predict mutation effects with single-residue resolution — yet. The framework could theoretically compute ΔΩ for point mutations, but we haven't validated that against experimental ΔΔG data. Claiming it works without that validation would be irresponsible.
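If that validation ever lands, the computation itself would be simple. A sketch, where `omega_fn` is a placeholder for the proprietary Ω computation and the toy scorer exists only so the example runs:

```python
def mutate(seq: str, pos: int, new_aa: str) -> str:
    """Apply a single point mutation at a 1-based position
    (e.g. pos=2, new_aa='L' turns 'MKLV' into 'MLLV')."""
    if not 1 <= pos <= len(seq):
        raise ValueError("mutation position out of range")
    return seq[:pos - 1] + new_aa + seq[pos:]

def delta_omega(seq, pos, new_aa, omega_fn) -> float:
    """Unvalidated sketch: delta-Omega = Omega(mutant) - Omega(wild-type).
    omega_fn stands in for the engine's Omega computation."""
    return omega_fn(mutate(seq, pos, new_aa)) - omega_fn(seq)

# Toy scorer for illustration only (fraction of leucines):
toy_omega = lambda s: s.count("L") / len(s)
delta_omega("MKLV", 2, "L", toy_omega)  # -> 0.25
```

Until ΔΩ is benchmarked against experimental ΔΔG data, treat this as machinery, not a validated predictor.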
The underlying premise is that protein stability is not random — it's a consequence of sequence-encoded coherence. Stable proteins have amino acid arrangements that produce self-reinforcing interaction patterns. Disordered proteins don't. This isn't controversial — it's essentially what the hydrophobic core model, secondary structure propensities, and contact order all describe from different angles.
RCFT's contribution is a unified mathematical framework for measuring this coherence. Instead of separate predictors for hydrophobicity, charge distribution, and secondary structure propensity, RCFT processes all of these simultaneously through a single field evolution. The golden ratio enters because φ-based compression is mathematically optimal for preserving information under recursive transformation — and protein folding is, at its core, a recursive compression of a 1D sequence into a 3D structure.
Is this proven? No. It's a hypothesis with promising early results and a rigorous mathematical foundation. The connection between φ-optimal compression and biological stability is the core claim that needs further validation. We believe it's real. We're building the evidence. We're not pretending the evidence is already complete.
Angelika Fold has not yet been tested against newly determined experimental structures or wet-lab stability measurements (melting temperatures, ΔG from calorimetry, etc.). That's the next milestone, and it's the one that matters most.
Planned validation targets include: correlation with ProTherm experimental ΔG values, comparison against the CAID disorder prediction benchmark, blind testing on proteins deposited in PDB after our benchmark was locked, and collaboration with experimental labs for prospective validation on de novo designed proteins.
This is where we need partners. An independent researcher, a university lab, or a pharmaceutical company willing to run Angelika Fold predictions against their internal experimental stability data — under NDA, with proper controls — would produce the kind of evidence that moves this from "interesting mathematical framework" to "validated tool." That's the collaboration we're actively seeking.
Angelika Fold is a working system with a novel mathematical foundation, a reproducible benchmark, and honest limitations. It's not AlphaFold — it doesn't try to be. It's a fundamentally different approach to a related problem: can you determine protein stability from sequence alone, without training data, without GPUs, and without billions of parameters?
The early signal says yes. The threshold calibration says we have more work to do. The mathematical framework says the approach is sound. The experimental validation says we need collaborators.
Angelika Fold needs experimental validation partners and funding to convert provisional patents to utility patents. Here's what we're looking for.
Run Angelika Fold predictions against your experimental stability data (ΔG, Tm, disorder annotations). We provide the engine under NDA; you provide the ground truth. Co-authorship on validation papers.
Two provisional patents with a 12-month conversion window. A working engine with a novel mathematical foundation. A clear path to validation and commercialization in the $4B+ protein engineering market.
Screen millions of candidate sequences in minutes on commodity hardware. No GPU clusters. No training pipelines. Deterministic, reproducible results for regulatory documentation.