ProteinMPNN (message passing neural network) is a deep learning-based protein sequence design method developed to generate plausible protein sequences that fit the 3D structure of natural and artificial protein backbones. While natural proteins function optimally in their biological context, they often lose function under the conditions required for lab- and industrial-scale biotechnological use. This paper from a team at the Institute for Protein Design, University of Washington details a design strategy using ProteinMPNN that improves protein expression, stability, and function outside the native biological context.

Improving Protein Expression, Stability, and Function with ProteinMPNN. https://lnkd.in/gEH-xUtD

Methods overview: The authors chose a design space that preserved the catalytic machinery and substrate-binding site of the original protein. They then generated sequences with ProteinMPNN, predicted the structures with AlphaFold2, and filtered by the predicted local distance difference test score (pLDDT) and Cα root-mean-square deviation (RMSD) to the input structure.

Myoglobin design and results: The authors generated 60 novel sequences with ProteinMPNN and evaluated them using AlphaFold2 single-sequence predictions. They then generated two distinct sets of designs with structural remodeling and selected a total of 20 sequences for experimental testing. All 20 novel designs were expressed in E. coli and purified via size exclusion chromatography. Thirteen designs showed higher total soluble protein yield than native myoglobin, with up to a 4.1-fold increase. All designs maintained heme-binding spectra similar to native myoglobin. Thermal stability testing revealed that all eight tested designs had higher melting temperatures than native myoglobin, with six remaining fully folded at 95°C. The authors also evaluated heme binding over a temperature gradient, finding that all designs preserved heme binding at higher temperatures than native myoglobin.

TEV protease design and results: TEV protease is a widely used enzyme in biotechnology with suboptimal properties. A total of 144 sequences were generated and selected for experimental testing. After expression and purification, catalytic activity was evaluated using a known enzyme substrate, with 64 designs displaying substrate turnover. The authors performed detailed kinetic analysis on three highly active designs, which displayed improved catalytic efficiencies compared to the parent sequence, with up to 26-fold improvements. They also tested the most active designs with a fusion protein substrate, finding that two designs exhibited significantly higher rates of cleavage than the parent and other published TEV variants.
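The filtering step described in the methods overview (keep designs whose predicted structure is confident and close to the input backbone) can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the `Design` record, the cutoff values `min_plddt=85.0` and `max_rmsd=2.0`, and the assumption that the Cα coordinates have already been superimposed on the reference are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Design:
    """Hypothetical record for one designed sequence and its predicted structure."""
    name: str
    mean_plddt: float   # mean pLDDT of the predicted structure (0-100)
    ca_coords: list     # predicted C-alpha (x, y, z) tuples, assumed pre-superimposed


def ca_rmsd(coords_a, coords_b):
    """C-alpha RMSD between two equal-length, already superimposed coordinate lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return sqrt(total / len(coords_a))


def filter_designs(designs, reference_ca, min_plddt=85.0, max_rmsd=2.0):
    """Keep designs that pass both cutoffs (cutoff values are illustrative assumptions)."""
    kept = []
    for design in designs:
        rmsd = ca_rmsd(design.ca_coords, reference_ca)
        if design.mean_plddt >= min_plddt and rmsd <= max_rmsd:
            kept.append(design)
    return kept


# Toy usage with a three-residue "backbone"
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
candidates = [
    Design("close_confident", 92.0, [(x, y + 0.5, z) for x, y, z in reference]),
    Design("low_confidence", 60.0, list(reference)),
    Design("drifted", 95.0, [(x, y + 3.0, z) for x, y, z in reference]),
]
passing = [d.name for d in filter_designs(candidates, reference)]
print(passing)
```

In practice the predicted and input structures would first need to be superimposed (for example with a Kabsch alignment) before computing the RMSD, and the cutoffs would be tuned per target.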
Key Techniques for Protein Design
Summary
Protein design involves engineering proteins with specific structures and functions, often using advanced computational methods and tools to create or modify protein sequences for desired purposes. Recent innovations, including techniques like ProteinMPNN, ProteinGenerator (PG), and SPARKS, are revolutionizing this complex field by enabling the development of proteins with enhanced stability, functionality, and novel properties.
- Incorporate AI-driven tools: Explore advanced technologies like ProteinMPNN and ProteinGenerator to design proteins with improved stability, activity, and tailored functionalities for industrial and research applications.
- Optimize structural and functional traits: Focus on co-design approaches that simultaneously generate protein structures and sequences, helping you achieve specific objectives like thermostability, solubility, or bioactivity.
- Leverage new design insights: Utilize newly discovered protein design principles, such as length-dependent stability and frustration zones, to avoid challenges like instability and create more robust protein designs.
-
New Baker Lab protocol alert! Introducing ProteinGenerator (PG): a new de novo protein design method that gives you more control over sequence types than the current “best approach” using RFdiffusion and ProteinMPNN.

How is it different? The traditional pipeline separates structure and sequence generation:
-- RFdiffusion creates stable, soluble protein backbones tailored to your binding mode but focuses solely on structure.
-- ProteinMPNN predicts sequences that fold into these backbones, starting from a poly-glycine template.

Enter ProteinGenerator (PG). PG generates both structure and sequence simultaneously, allowing you to:
-- Enrich sequences with specific amino acids.
-- Adjust net charge or hydrophobicity.
-- Use experimental activity data to improve functionality.

By co-generating structures and sequences, PG creates designs that are better optimized for the desired features, potentially improving on the RFdiffusion+ProteinMPNN approach.

Highlights from PG’s initial designs:
-- Thermostable Proteins: Designed proteins enriched with rare amino acids like cysteine and tryptophan. 68 out of 96 expressed designs were soluble, with many stable up to 95°C.
-- Multistate Proteins: Designed sequences that adopt different folds under specific conditions, validated with NMR and AlphaFold2.
-- Bioactive Peptide Cages: Scaffolded bioactive peptides (e.g., melittin), demonstrating activity release upon proteolytic cleavage.
-- Repeat Proteins: Created repeat proteins with secondary structure constraints, including a crystal structure matching the design with a 1.38 Å RMSD.
-- Intrinsic Barcodes: Embedded short peptide barcodes for efficient mass spectrometry-based identification.
-- Guided Functional Design: Used PG with experimental data to optimize IgG-binding activity, outperforming Bayesian optimization baselines.

This unified approach simplifies the pipeline and gives you more control. https://lnkd.in/gWE77csi
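To make the sequence-level properties mentioned above (amino acid enrichment, net charge, hydrophobicity) concrete, here is a small sketch of how they might be measured on a candidate sequence. It does not reproduce how PG applies its guidance; the function names are hypothetical, the net charge is a crude side-chain count that ignores histidine, termini, and pH, and hydrophobicity is taken as the mean Kyte-Doolittle hydropathy.

```python
# Kyte-Doolittle hydropathy scale (higher values are more hydrophobic)
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def net_charge(sequence):
    """Approximate net charge: (+1 per K/R) and (-1 per D/E); ignores His, termini, pH."""
    seq = sequence.upper()
    positive = sum(seq.count(res) for res in "KR")
    negative = sum(seq.count(res) for res in "DE")
    return positive - negative


def mean_hydropathy(sequence):
    """Mean Kyte-Doolittle hydropathy over the sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return sum(KYTE_DOOLITTLE[res] for res in seq) / len(seq)


def residue_fraction(sequence, residues):
    """Fraction of positions occupied by any of the given residue types."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return sum(1 for res in seq if res in residues) / len(seq)


# Toy sequence, made up for illustration only
toy = "MKCWCEKLLRWCDK"
print(net_charge(toy), round(mean_hydropathy(toy), 2), residue_fraction(toy, "CW"))
```

Metrics like these could serve to check whether generated sequences actually hit a requested composition or charge target before moving on to structure prediction.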
-
Big breakthrough: A few months ago my lab at MIT introduced SPARKS, our autonomous scientific discovery model. Since then we have demonstrated its applicability to broad problem spaces across domains, from proteins and bio-inspired materials to inorganic materials.

SPARKS learns by doing, thinks by critiquing itself & creates knowledge through recursive interaction; not just with data, but with the physical & logical consequences of its own ideas. It closes the entire scientific loop (hypothesis generation, data retrieval, coding, simulation, critique, refinement, & detailed manuscript drafting) without prompts, manual tuning, or human oversight.

SPARKS is fundamentally different from frontier models. While models like o3-pro and o3 deep research can produce summaries, they stop short of full discovery. SPARKS conducts the entire scientific process autonomously, generating & validating falsifiable hypotheses, interpreting results & refining its approach until a reproducible, fully validated evidence-based discovery emerges. This is the first time we've seen AI discover new science.

SPARKS is orders of magnitude more capable than frontier models & even when comparing just the writing, SPARKS still outperforms: in our benchmark evaluation, it scored 1.6× higher than o3-pro and over 2.5× higher than o3 deep research, not because it writes more, but because it writes with purpose, grounded in original, validated compositional reasoning from start to finish.

We benchmarked SPARKS on several case studies, where it uncovered two previously unknown protein design rules:

1. Length-dependent mechanical crossover. β-sheet-rich peptides outperform α-helices, but only once chains exceed ~80 amino acids. Below that, helices dominate. No prior systematic study had exposed this crossover, leaving protein designers without a quantitative rule for sizing sheet-rich materials. This discovery resolves a long-standing ambiguity in molecular design and provides a principle to guide the structural tuning of biomaterials and protein-based nanodevices based on mechanical strength.

2. A stability “frustration zone”. At intermediate lengths (~50-70 residues) with balanced α/β content, peptide stability becomes highly variable. SPARKS mapped this volatile region and explained its cause: competing folding nuclei and exposed edge strands that destabilize structure. This insight pinpoints a failure regime in protein design where instability arises not from randomness, but from well-defined physical constraints, giving designers new levers to avoid brittle configurations or engineer around them. This gives engineers and biologists a roadmap for avoiding stability traps in de novo design, especially when exploring hybrid motifs.

Stay tuned for more updates & examples, papers and more details.
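As a rough illustration of how the two reported rules might be used as a pre-screen, the sketch below encodes them as simple checks. The ~80-residue crossover and the ~50-70 residue zone come from the post; everything else is an assumption made for the example, including the function name, the treatment of those approximate values as hard cutoffs, and the `balance_tol` definition of "balanced" α/β content.

```python
def flag_design_risks(length, helix_frac, sheet_frac,
                      crossover_len=80, zone=(50, 70), balance_tol=0.15):
    """Apply the two reported heuristics to a candidate peptide.

    length:      number of residues
    helix_frac:  fraction of residues in alpha-helix (0-1)
    sheet_frac:  fraction of residues in beta-sheet (0-1)
    balance_tol: hypothetical tolerance for calling alpha/beta content "balanced"
    """
    # Rule 1: sheet-rich designs are reported to be mechanically stronger only
    # above the crossover length; at or below it, helices are favored.
    favored = "beta-sheet-rich" if length > crossover_len else "alpha-helical"

    # Rule 2: intermediate length plus balanced alpha/beta content falls in the
    # reported "frustration zone" of highly variable stability.
    in_length_window = zone[0] <= length <= zone[1]
    balanced = (helix_frac > 0 and sheet_frac > 0
                and abs(helix_frac - sheet_frac) <= balance_tol)

    return {
        "favored_for_strength": favored,
        "frustration_zone": in_length_window and balanced,
    }


# Toy candidates, made up for illustration
print(flag_design_risks(60, 0.40, 0.40))
print(flag_design_risks(120, 0.10, 0.60))
```

Because the reported boundaries are approximate, a real screen would likely treat them as soft ranges and not as sharp thresholds.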