How much information is *outside* of the genome? E.g. does the developmental code live in the genome, or does the embryo have behaviour that is not reducible to genes?
I believe it's substantial!
"DNA is a “blueprint” for a cell. But information is needed to interpret that blueprint. Imagine a machine that could take in a DNA sequence and build a human cell. How many bits would be needed to describe that machine? A lot, right?
Of course, there’s a recursive “chicken and the egg” issue here: The machines that actually make human cells from DNA are… other human cells. But you need some information to get the loop started!"
(Although I have no idea how to quantify it.)
https://dynomight.net/data-wall/#its-not-just-dna
I guess it depends on your reference... the zygote obviously has everything it needs. The information or potential is... somewhere. “Embodied information”, as you say. But an external observer trying to define the full information content of that program - across development, physiology, etc. - would need to capture basically every activity and state of every cell/tissue/organ at every time point through development? Feels like it could be orders of magnitude beyond however many bytes we say are in DNA.
Do you think it will be possible for software to simulate development just from genome + perfect simulation of zygote?
I certainly agree that if you're thinking about an adult human, information comes from everywhere. But I think it's reasonable to think about how much of that comes from (A) DNA in the gametes, (B) the physical structure of the gametes, and (C) everything else. So I guess the "phenotypic Kolmogorov complexity" would only be looking at category A, category B seems hard, and category C seems REALLY hard?
I'm not sure the zygote even has everything it needs. There's a lot of interaction between the developing embryo and the mother that can influence the final phenotype. Another large well of information is the microbiome, much of which is transferred from the mother but has many different species with their own DNA. Given how the microbiome can influence overall health and cognition, I think it's necessary to at least consider it. The specific composition of the microbiome does seem somewhat fungible, though, so that info may be quite compressible.
yeah I'd maybe distinguish between "necessary information" and "other inputs" that can influence the developing organism but aren't essential to development - teratogens, nutritional deficiencies, toxins, microbiome constituents...
I'm not sure if there are any maternal developmental factors necessary for the code to run or if the womb is really just an incubator
If only someone had written a book about that subject. We wouldn't have to guess so much. A book like this one perhaps. https://press.princeton.edu/books/paperback/9780691241142/the-evolution-of-biological-information
I wonder if you could get an estimate of your "phenotypic Kolmogorov complexity" from gene essentiality or better yet one of those reductionist synthetic biology attempts to make a minimal cell.
Though you run into the problem of defining "phenotypically identical". Proving that a genetic element is *never* useful seems impossible: maybe you simply haven't tested the condition where it is used. E.g. lab strains of S. cerevisiae commonly have genes required for sporulation knocked out, but you'd never conclude those genes were necessary (or "informative") unless you deprived the cells of nutrients: then you'd see that the wild type made spores and the mutants did not. As the old saw goes: "The knockout mouse has no phenotype", "Well, did you take it to the opera?" (the implication being that the phenotype may only be observable under opera conditions). So maybe the information figure would be contingent on a "reasonable" phenotyping panel?
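To make the panel-dependence concrete, here is a toy sketch. Everything in it is made up for illustration: the gene names, the conditions, and the `informative_genes` helper are hypothetical, not real data or an established method. It only shows that the set of genes you would count as "informative" can grow as the phenotyping panel grows.

```python
# Hypothetical toy data: for each knockout, the set of conditions under which
# it differs from wild type. Gene names and conditions are invented.
knockout_phenotypes = {
    "geneA": {"rich_media", "starvation"},
    "geneB": {"starvation"},   # e.g. a sporulation-like gene
    "geneC": {"opera"},        # only visible under an exotic condition
    "geneD": set(),            # no phenotype in any condition we know of
}


def informative_genes(panel):
    """Return the genes whose knockout shows a phenotype in at least one
    condition of the given phenotyping panel."""
    tested = set(panel)
    return {gene for gene, conds in knockout_phenotypes.items() if conds & tested}


# The count of "informative" genes depends on which conditions are tested.
for panel in (["rich_media"], ["rich_media", "starvation"],
              ["rich_media", "starvation", "opera"]):
    print(panel, sorted(informative_genes(panel)))
```

Note that `geneD` never shows up, but that only means no condition in the panel reveals it, which is the original problem again.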
Also a minor nitpick: there is X-Y crossing-over on the PARs, I believe.
Pretty informative! This reminds me of a paper entitled "The Genomic Code: The genome instantiates a generative model of the organism".
They argue that the genome is a latent representation of an organism, which is conceptually similar to this compressed view of DNA. Interestingly, some stretches of "useless" DNA could be there for structural regulation (TADs, LADs, etc.).
I think some simple organisms have much more DNA than humans. Any idea why that is and how that fits with your information estimates?
If you take (say) lungfish DNA, it has many more repetitive elements / jumping genes than human DNA. The exact cause of this seems to be unclear. But the impact in terms of information is that while lungfish have more "storage space", I doubt they actually have more "information". That is, I speculate that you could theoretically engineer DNA to create a lungfish-like organism with a vastly smaller genome.
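A rough way to see the "storage space" vs. "information" distinction is to use compressed size as a crude upper bound on information content. The sketch below is only an illustration under loud assumptions: the sequences are random synthetic strings (not real human or lungfish DNA), the sizes are arbitrary, and zlib is a stand-in for a proper compressor, let alone true Kolmogorov complexity.

```python
import random
import zlib


def compressed_bits(seq: str) -> int:
    """Size in bits of the zlib-compressed sequence: a crude upper bound
    on how much information the sequence carries."""
    return 8 * len(zlib.compress(seq.encode("ascii"), 9))


rng = random.Random(0)  # fixed seed so the toy example is reproducible

# A synthetic "genome" of unique sequence.
unique = "".join(rng.choice("ACGT") for _ in range(2_000))

# The same sequence padded with 100 copies of one 200-base repeat element,
# making it 11x longer in raw length.
repeat_unit = "".join(rng.choice("ACGT") for _ in range(200))
padded = unique + repeat_unit * 100

small = compressed_bits(unique)
large = compressed_bits(padded)

print(f"raw length:      {len(unique)} vs {len(padded)} bases")
print(f"compressed size: {small} vs {large} bits")
```

The padded sequence is eleven times longer, but its compressed size should grow by far less than that, since the repeats add length without adding much that the compressor has to describe. Real repetitive elements are diverged copies rather than exact ones, so real genomes would compress less neatly than this.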
Nice, tightly compressed bit of information here.