Really nice post, Niko, and a pretty wild result. After the Evo 2 paper, I thought viable genome generation was still further out. However, I think you're glossing over the downsides a tad too quickly.
You correctly point out that this result has biosecurity folks worried since Evo 2 is an open-weights model and many viruses of concern are only a few kb longer than ΦX174.
But your bottom line from that is, to me, a little bit disconnected: "Although the risk of training on human viruses seems troubling, the real barriers to moving from phages to larger organisms are data and atoms."
It is no doubt correct that larger organisms are much, much more challenging to generate and synthesize, but that has little to do with the aforementioned worries about human viruses, and I do worry a lot at this point!
I think that a method that was able to de novo generate very sequence-divergent, viable viruses warrants a pretty thorough dual-use discussion, especially after we've seen what a little bit of genetic distance and a few kb more did in late 2019...
You're right that this is clearly dual-use, and there should be much larger discussions about this sort of thing. My transition to data and atoms was not intended to convey that this is not a valid concern.
"Of course, this paper will probably be alarming to folks in the biosecurity community. The authors point out that Evo 2 excludes human viruses from its pretraining data"
This is unfortunately not entirely true, at least based on my reading of the methods section of the Evo 2 paper; they *tried* to exclude human viruses, but in fact I believe the training set included a large amount of viral sequence. This new result is quite impressive, but I really wish they had consulted a bit more widely before releasing it. I don't think it is an immediate biosecurity threat at this point, but (as someone who is typically fairly dismissive of AI biosecurity concerns) this paper is genuinely concerning to me.
Thanks for this! What a fascinating read! As a cancer survivor, my mind always gravitates to potential cancer applications whenever I even see the term biotech or synthetic bio etc. Recently had a chat with a gal pal working on cancer research with engineered proteins and that, plus learning lately about everything that is coming from GenAI in the biology field is just getting me so excited for the future of medical technology!
What struck me most in this paper was Evo-Φ36: the model rescued a protein swap that normally cripples the phage. That feels like it goes beyond a proof of concept for an AI-designed genome.
If genAI can consistently compensate for otherwise nonfunctional changes, then maybe an underrated advance here could be in making structural motifs from outside the local evolutionary tree accessible?
Perhaps turning previously impossible swaps into workable modules - something akin to a compatibility engine for synthetic biology?
Curious what others here think!
I'm usually optimistic about technology, but it's hard not to imagine a horrendous outcome if the wrong person hatches the wrong plan.
Great take: it acknowledges the “AI can design whole genomes” wow factor whilst highlighting how the model compensates for otherwise fatal changes!
I’d love to discuss more below, if anyone’s interest is equally piqued. 😊
This is like when Profluent made the generative CRISPR with the same or better activity but a very different sequence. But this time, with life.