[
    {
        "id": "thesis:17729",
        "collection": "thesis",
        "collection_id": "17729",
        "cite_using_url": "https://resolver.caltech.edu/CaltechTHESIS:10232025-192458502",
        "primary_object_url": {
            "basename": "ADThesisResubmit20251023.pdf",
            "content": "final",
            "filesize": 73758354,
            "license": "other",
            "mime_type": "application/pdf",
            "url": "/17729/9/ADThesisResubmit20251023.pdf",
            "version": "v5.0.0"
        },
        "type": "thesis",
        "title": "Resolving and Mathematizing Energetic Gradients That\r\nFacilitate Cytoskeletal Self-Assembly",
        "author": [
            {
                "family_name": "Duarte",
                "given_name": "Ana Isabel",
                "orcid": "0000-0003-3726-3018",
                "clpid": "Duarte-Ana-Isabel"
            }
        ],
        "thesis_advisor": [
            {
                "family_name": "Phillips",
                "given_name": "Robert B.",
                "orcid": "0000-0003-3082-2809",
                "clpid": "Phillips-R"
            }
        ],
        "thesis_committee": [
            {
                "family_name": "Thomson",
                "given_name": "Matthew",
                "orcid": "0000-0003-1021-1234",
                "clpid": "Thomson-M-W"
            },
            {
                "family_name": "Hsieh",
                "given_name": "David",
                "orcid": "0000-0002-0812-955X",
                "clpid": "Hsieh-David"
            },
            {
                "family_name": "Patterson",
                "given_name": "Ryan B.",
                "orcid": "0000-0002-5787-9517",
                "clpid": "Patterson-R-B"
            },
            {
                "family_name": "Phillips",
                "given_name": "Robert B.",
                "orcid": "0000-0003-3082-2809",
                "clpid": "Phillips-R"
            }
        ],
        "local_group": [
            {
                "literal": "div_pma"
            }
        ],
        "abstract": "In the thesis that follows, I describe three interconnected stories. These threads strive to paint a unified picture of the energetic and mechanical assembly of motor proteins and microtubules into structures that resemble mitotic spindles, the complex molecular machines that segregate chromosomes during cell division. In the work described, we introduce a new method for direct measurement of ATP molecules in space and time, building upon the field\u2019s excitement towards witnessing gradients in isolated processes. We additionally write mathematical models exploring the physics of building and maintaining gradients in non-equilibrium steady states. And, in the spirit of comprehensively understanding our system, we explore the material properties of dynamic network formation.",
        "doi": "10.7907/1ptv-0r61",
        "publication_date": "2026",
        "thesis_type": "phd",
        "thesis_year": "2026"
    },
    {
        "id": "thesis:17566",
        "collection": "thesis",
        "collection_id": "17566",
        "cite_using_url": "https://resolver.caltech.edu/CaltechTHESIS:07282025-205651638",
        "primary_object_url": {
            "basename": "white_elephants_and_cash_cows.pdf",
            "content": "final",
            "filesize": 12002392,
            "license": "other",
            "mime_type": "application/pdf",
            "url": "/17566/1/white_elephants_and_cash_cows.pdf",
            "version": "v4.0.0"
        },
        "type": "thesis",
        "title": "White Elephants and Cash Cows: Economically Wrangling the Zoo of AI Models",
        "author": [
            {
                "family_name": "Zellinger",
                "given_name": "Michael J.",
                "orcid": "0009-0001-7499-148X",
                "clpid": "Zellinger-Michael-J"
            }
        ],
        "thesis_advisor": [
            {
                "family_name": "Thomson",
                "given_name": "Matthew",
                "orcid": "0000-0003-1021-1234",
                "clpid": "Thomson-M-W"
            },
            {
                "family_name": "B\u00fchlmann",
                "given_name": "Peter",
                "orcid": "0000-0002-1782-6015",
                "clpid": "B\u00fchlmann-Peter"
            }
        ],
        "thesis_committee": [
            {
                "family_name": "Perona",
                "given_name": "Pietro",
                "orcid": "0000-0002-7583-5809",
                "clpid": "Perona-P"
            },
            {
                "family_name": "Wierman",
                "given_name": "Adam C.",
                "orcid": "0000-0002-5923-0199",
                "clpid": "Wierman-A-C"
            },
            {
                "family_name": "Thomson",
                "given_name": "Matthew",
                "orcid": "0000-0003-1021-1234",
                "clpid": "Thomson-M-W"
            },
            {
                "family_name": "B\u00fchlmann",
                "given_name": "Peter",
                "orcid": "0000-0002-1782-6015",
                "clpid": "B\u00fchlmann-Peter"
            }
        ],
        "local_group": [
            {
                "literal": "div_eng"
            }
        ],
        "abstract": "The capabilities of artificial intelligence are rapidly expanding, but deploying AI systems in practice still poses significant challenges. Specifically, practitioners find limited guidance on selecting the most suitable AI model for a concrete use case, balancing the economics of an AI deployment, and managing the risk of AI errors. These challenges call for a unified framework addressing pain points in a conceptually clear and statistically sound manner. In this thesis, we present several components of such a framework: 1) uncertainty-aware system optimization, 2) economic evaluation, 3) error reduction with human-in-the-loop, and 4) a proof-of-concept system for synthetic data generation. Our work presents novel technical and conceptual approaches for orchestrating natural language-based systems, advancing the economical and reliable deployment of artificial intelligence.",
        "doi": "10.7907/xj31-xm14",
        "publication_date": "2026",
        "thesis_type": "phd",
        "thesis_year": "2026"
    },
    {
        "id": "thesis:17550",
        "collection": "thesis",
        "collection_id": "17550",
        "cite_using_url": "https://resolver.caltech.edu/CaltechTHESIS:07232025-195327620",
        "primary_object_url": {
            "basename": "Thesis_Enrique (4).pdf",
            "content": "final",
            "filesize": 30948826,
            "license": "cc_by_nc_sa",
            "mime_type": "application/pdf",
            "url": "/17550/1/Thesis_Enrique (4).pdf",
            "version": "v4.0.0"
        },
        "type": "thesis",
        "title": "Designing Intelligent Agents for Real-Time Experimental Control and Multi-Task Generalization",
        "author": [
            {
                "family_name": "Amaya Perez",
                "given_name": "Enrique",
                "orcid": "0000-0003-3166-8583",
                "clpid": "Amaya-Perez-Enrique"
            }
        ],
        "thesis_advisor": [
            {
                "family_name": "Thomson",
                "given_name": "Matthew",
                "orcid": "0000-0003-1021-1234",
                "clpid": "Thomson-M-W"
            }
        ],
        "thesis_committee": [
            {
                "family_name": "Sternberg",
                "given_name": "Paul W.",
                "orcid": "0000-0002-7699-0173",
                "clpid": "Sternberg-P-W"
            },
            {
                "family_name": "Rutishauser",
                "given_name": "Ueli",
                "orcid": "0000-0002-9207-7069",
                "clpid": "Rutishauser-U"
            },
            {
                "family_name": "Thomson",
                "given_name": "Matthew",
                "orcid": "0000-0003-1021-1234",
                "clpid": "Thomson-M-W"
            },
            {
                "family_name": "Van Valen",
                "given_name": "David A.",
                "orcid": "0000-0001-7534-7621",
                "clpid": "Van-Valen-D"
            }
        ],
        "local_group": [
            {
                "literal": "div_bbe"
            }
        ],
        "abstract": "<p>Scientific discovery has traditionally relied on human-led iterative loops of observation, modeling, and intervention. This thesis explores the possibility of automating components of this loop using artificial intelligence (AI), particularly in systems characterized by non-equilibrium dynamics, high dimensionality, and emergent behaviors. Two foundational challenges are addressed: automating physical modeling and enabling adaptive interaction with dynamic experimental systems, and generalizing agent behavior across tasks and contexts without retraining.</p>\r\n\r\n<p>To address the first challenge, we introduce a hierarchical AI framework for controlling active biomolecular matter, exemplified by microtubule\u2013kinesin networks driven by light-activated motors. At the foundation are predictive models that learn the system\u2019s response to static light patterns, enabling inverse design by selecting inputs that yield desired structural outcomes. Building on this, dynamic models construct low-dimensional representations of the system\u2019s evolving state under time-varying stimuli, supporting forward simulation and real-time tracking. At the highest level, reinforcement learning agents\u2014trained in simulation\u2014discover and execute closed-loop control policies that achieve fine-grained manipulation objectives. These agents are deployed across ~100 parallel experimental setups, demonstrating autonomous operation with robustness, scalability, and reliable transfer.</p>\r\n\r\n<p>To address the second challenge, we investigate how generalist reinforcement learning agents can be constructed by leveraging the geometry of policy parameter space. We show that agents trained on distinct tasks self-organize into functionally segregated regions of weight space that encode both task identity and strategic variability. 
This insight enables the design of a hypernetwork\u2014a network that generates the weights of other networks\u2014that can interpolate smoothly between tasks and strategies via a single scalar input. Combined with a meta-controller, this architecture enables real-time modulation of agent behavior\u2014ranging from conservative to risk-seeking\u2014without retraining.</p>\r\n\r\n<p>Together, these contributions demonstrate that intelligent systems can both design and control physical experiments in real time, and adapt cognitive strategies across tasks through principled representations in policy space. This work establishes a foundation for closed-loop scientific autonomy, programmable biomaterials, and generalist AI agents, converging at the intersection of machine learning, biophysics, and automation.</p>",
        "doi": "10.7907/nmvs-7b59",
        "publication_date": "2026",
        "thesis_type": "phd",
        "thesis_year": "2026"
    },
    {
        "id": "thesis:17682",
        "collection": "thesis",
        "collection_id": "17682",
        "cite_using_url": "https://resolver.caltech.edu/CaltechTHESIS:09162025-184128136",
        "primary_object_url": {
            "basename": "Subramanian_Arjuna_thesis_vFINAL.pdf",
            "content": "final",
            "filesize": 98788613,
            "license": "other",
            "mime_type": "application/pdf",
            "url": "/17682/1/Subramanian_Arjuna_thesis_vFINAL.pdf",
            "version": "v5.0.0"
        },
        "type": "thesis",
        "title": "Rewriting the Sequence and Structure Rules of Deep Protein Space",
        "author": [
            {
                "family_name": "Subramanian",
                "given_name": "Arjuna Michael",
                "orcid": "0009-0004-2790-0209",
                "clpid": "Subramanian-Arjuna-Michael"
            }
        ],
        "thesis_advisor": [
            {
                "family_name": "Thomson",
                "given_name": "Matthew",
                "orcid": "0000-0003-1021-1234",
                "clpid": "Thomson-M-W"
            }
        ],
        "thesis_committee": [
            {
                "family_name": "Mayo",
                "given_name": "Stephen L.",
                "orcid": "0000-0002-9785-5018",
                "clpid": "Mayo-S-L"
            },
            {
                "family_name": "Murray",
                "given_name": "Richard M.",
                "orcid": "0000-0002-5785-7481",
                "clpid": "Murray-R-M"
            },
            {
                "family_name": "Thomson",
                "given_name": "Matthew",
                "orcid": "0000-0003-1021-1234",
                "clpid": "Thomson-M-W"
            },
            {
                "family_name": "Winfree",
                "given_name": "Erik",
                "orcid": "0000-0002-5899-7523",
                "clpid": "Winfree-E"
            }
        ],
        "local_group": [
            {
                "literal": "div_bbe"
            }
        ],
        "abstract": "<p>With a 20-letter alphabet, conceivable protein sequence-space is enormous; sparks of structure and function are vanishingly rare. Despite massive advances in AI-guided protein design, we remain largely ignorant of the sequences and structures that populate the depths of protein space more than a handful of mutations away from what nature has tried. In this work, we leverage the potential of one specific class of AI protein model \u2014 the protein language model, or PLM \u2014 to internalize the essential features of the protein sequence-structure map while retaining the capacity to explore its extremes. Guided by a \"novelty first, fitness next\" mentality, we harness this balance towards systematic discovery of new-to-nature sequences and structures throughout deep protein space.</p>\r\n\r\n<p> In the first section, we dissect the ability of PLMs to explore natural and novel regimes of sequence and structure during free generation. We find that while these models readily emit novel sequences encoding artificial proteins that appear biophysically feasible in silico, they fail to completely or representatively capture the known distribution of natural protein structures. We expose a fundamental tradeoff between the ability of a PLM to generate with sequence novelty or structural coverage but not both simultaneously; prioritizing sampling of far-from-natural sequences triggers a collapse to a handful of simple structural motifs and disordered regions. </p>\r\n\r\n<p> Turning this sequence novelty vs. structural breadth tradeoff to our advantage, the second section is devoted to the development of \"foldtuning\" \u2014 a structure-preserving, sequence-remodeling engine for navigating the far corners of sequence-space with PLM-based probes. 
We successfully scale and deploy foldtuning for &gt;700 targets, pushing artificial sequences past the point of detectable homology to any real protein documented in nature, discovering novel sequence-level semantics and grammar for mimicking known protein folds, and accessing potential reservoirs of downstream structural and functional innovation. Experimental validation of select targets reveals that foldtuning produces realizable and functional binders in contexts including a toxin/antitoxin system and peptide hormone signaling. </p>\r\n\r\n<p> Shifting to focus on structural novelty, the final section introduces two PLM-driven methods for the discovery of new-to-nature structures. We show that with appropriate steering functions, PLMs readily yield well-structured  domains (featuring diverse secondary and supersecondary elements) outside the several thousand such families cataloged from among known proteins. Overall, this work makes substantial inroads towards the challenge of locating viable far-from-natural regions of protein density across the global sequence-structure map, and revises our notions of the physical constraints on sequence and structure in valid proteins. Moreover, it sets the stage for future assembly of synthetic biological systems composed fully of new-to-nature parts and ultimately for modeling efforts that close the design loop from sequence all the way to complex phenotype.</p>",
        "doi": "10.7907/p4st-m614",
        "publication_date": "2026",
        "thesis_type": "phd",
        "thesis_year": "2026"
    },
    {
        "id": "thesis:17785",
        "collection": "thesis",
        "collection_id": "17785",
        "cite_using_url": "https://resolver.caltech.edu/CaltechTHESIS:12042025-191849199",
        "primary_object_url": {
            "basename": "251231_Thesis_Duncan_Chadly.pdf",
            "content": "final",
            "filesize": 10290195,
            "license": "other",
            "mime_type": "application/pdf",
            "url": "/17785/2/251231_Thesis_Duncan_Chadly.pdf",
            "version": "v5.0.0"
        },
        "type": "thesis",
        "title": "High-Resolution Phylogenetic Lineage Recording with CRISPR Base Editors",
        "author": [
            {
                "family_name": "Chadly",
                "given_name": "Duncan Matthew",
                "orcid": "0000-0002-8417-1522",
                "clpid": "Chadly-Duncan-Matthew"
            }
        ],
        "thesis_advisor": [
            {
                "family_name": "Elowitz",
                "given_name": "Michael B.",
                "orcid": "0000-0002-1221-0967",
                "clpid": "Elowitz-M-B"
            }
        ],
        "thesis_committee": [
            {
                "family_name": "Cai",
                "given_name": "Long",
                "orcid": "0000-0002-7154-5361",
                "clpid": "Cai-Long"
            },
            {
                "family_name": "Elowitz",
                "given_name": "Michael B.",
                "orcid": "0000-0002-1221-0967",
                "clpid": "Elowitz-M-B"
            },
            {
                "family_name": "Guttman",
                "given_name": "Mitchell",
                "orcid": "0000-0003-4748-9352",
                "clpid": "Guttman-M"
            },
            {
                "family_name": "Lois",
                "given_name": "Carlos",
                "orcid": "0000-0002-7305-2317",
                "clpid": "Lois-Carlos"
            },
            {
                "family_name": "Thomson",
                "given_name": "Matthew",
                "orcid": "0000-0003-1021-1234",
                "clpid": "Thomson-M-W"
            }
        ],
        "local_group": [
            {
                "literal": "div_bbe"
            }
        ],
        "abstract": "Dividing and differentiating cells form exquisitely organized structures across every facet of multicellular life. If we could measure the complete history of cells as they divide, change transcriptional state, and move spatially, we could address critical questions about stem cell differentiation, development, and the onset of disease. However, determining cellular ontologies is challenging except in rare cases where continual optical access is possible. Base editing technology enables the generation of stochastic, heritable mutations into genomic DNA while cells grow and divide. Comparing mutation patterns between cells allows inference of their lineage relationships in a manner analogous to evolutionary phylogenetic reconstruction. Here, we present two phylogenetic recording systems that enable high resolution lineage reconstruction over long time scales. In the first system, termed baseMEMOIR, we introduce a multiplexed, genomically dispersed set of editable targets that can be read out by imaging in situ. This system preserves spatial organization of cells and is compatible with downstream transcriptional measurements. In the second system, which we term the hypercascade, we take advantage of the predictability of A-to-G base editing to create a system in which edits not only alter bases but also generate new editable target sites in synthetic sequences. This behavior linearizes the rate at which mutations accumulate, improving lineage reconstruction. These methods enable analysis of temporal dynamics in diverse biological contexts.",
        "doi": "10.7907/0afd-8p19",
        "publication_date": "2026",
        "thesis_type": "phd",
        "thesis_year": "2026"
    },
    {
        "id": "thesis:17643",
        "collection": "thesis",
        "collection_id": "17643",
        "cite_using_url": "https://resolver.caltech.edu/CaltechTHESIS:08212025-191338101",
        "primary_object_url": {
            "basename": "olson_blade_2026_thesis.pdf",
            "content": "final",
            "filesize": 8714374,
            "license": "other",
            "mime_type": "application/pdf",
            "url": "/17643/6/olson_blade_2026_thesis.pdf",
            "version": "v7.0.0"
        },
        "type": "thesis",
        "title": "Synthetic Antigen-Presenting Vesicles for Selective Immunomodulation",
        "author": [
            {
                "family_name": "Olson",
                "given_name": "Blade A.",
                "orcid": "0000-0002-1526-1399",
                "clpid": "Olson-Blade-A"
            }
        ],
        "thesis_advisor": [
            {
                "family_name": "Mayo",
                "given_name": "Stephen L.",
                "orcid": "0000-0002-9785-5018",
                "clpid": "Mayo-S-L"
            },
            {
                "family_name": "Murray",
                "given_name": "Richard M.",
                "orcid": "0000-0002-5785-7481",
                "clpid": "Murray-R-M"
            }
        ],
        "thesis_committee": [
            {
                "family_name": "Thomson",
                "given_name": "Matthew",
                "orcid": "0000-0003-1021-1234",
                "clpid": "Thomson-M-W"
            },
            {
                "family_name": "Mayo",
                "given_name": "Stephen L.",
                "orcid": "0000-0002-9785-5018",
                "clpid": "Mayo-S-L"
            },
            {
                "family_name": "Murray",
                "given_name": "Richard M.",
                "orcid": "0000-0002-5785-7481",
                "clpid": "Murray-R-M"
            },
            {
                "family_name": "Bjorkman",
                "given_name": "Pamela J.",
                "orcid": "0000-0002-2277-3990",
                "clpid": "Bjorkman-P-J"
            },
            {
                "family_name": "Gradinaru",
                "given_name": "Viviana",
                "orcid": "0000-0001-5868-348X",
                "clpid": "Gradinaru-V"
            }
        ],
        "local_group": [
            {
                "literal": "div_bbe"
            }
        ],
        "abstract": "<p>The rapid advancement of generative artificial intelligence has enabled unprecedented progress for the field of computational protein design. A forthcoming challenge for generative protein design algorithms is the immunocompatibility of these de novo designed molecules with organism physiology, namely humans. A separate, but related, aspirational goal for synthetic biology is to perform cellular reprogramming in vivo so that cell-based therapies and biologics are generated endogenously by patients rather than being externally manufactured or expanded before delivery, as is the case with biologics, T cell therapies, and stem cell therapies; again, a major hurdle for the in vivo production of these therapies and in vivo cellular reprogramming is immunogenicity.</p>\r\n\r\n<p>To address these challenges, we first demonstrate a cell-like, cell-free approach for in vivo cellular reprogramming with the induced release of pMHCI and pMHCII-loaded synthetic antigen-presenting vesicles that are secreted from non-immune cells by DNA and mRNA transfection to facilitate the selective expansion or silencing of immune responses. Next, we show initial results for the use of human tonsil organoids as a quantitative assay for adenoviral vector immunogenicity, enabling future directed evolution approaches for immunogenicity reduction as well as generation of an immunogenicity dataset to tailor modern computational protein design algorithms for human immunocompatibility. Together, these projects represent complementary methods to control protein immunogenicity, either through rationally engineered or directedly evolved modifications identified by physiologically-relevant in vitro models, or with an administered mRNA therapeutic that selectively modifies the immune response to a protein that cannot be computationally redesigned.</p>",
        "doi": "10.7907/fxzx-yn04",
        "publication_date": "2026",
        "thesis_type": "phd",
        "thesis_year": "2026"
    },
    {
        "id": "thesis:17794",
        "collection": "thesis",
        "collection_id": "17794",
        "cite_using_url": "https://resolver.caltech.edu/CaltechTHESIS:12102025-235050354",
        "primary_object_url": {
            "basename": "Thesis, Yameng Zhang.pdf",
            "content": "final",
            "filesize": 5533519,
            "license": "other",
            "mime_type": "application/pdf",
            "url": "/17794/1/Thesis, Yameng Zhang.pdf",
            "version": "v4.0.0"
        },
        "type": "thesis",
        "title": "Neural Circuits Underlying Salt-Taste Valence",
        "author": [
            {
                "family_name": "Zhang",
                "given_name": "Yameng",
                "orcid": "0009-0005-7038-7049",
                "clpid": "Zhang-Yameng"
            }
        ],
        "thesis_advisor": [
            {
                "family_name": "Oka",
                "given_name": "Yuki",
                "orcid": "0000-0003-2686-0677",
                "clpid": "Oka-Yuki"
            }
        ],
        "thesis_committee": [
            {
                "family_name": "Anderson",
                "given_name": "David J.",
                "orcid": "0000-0001-6175-3872",
                "clpid": "Anderson-D-J"
            },
            {
                "family_name": "Lois",
                "given_name": "Carlos",
                "orcid": "0000-0002-7305-2317",
                "clpid": "Lois-Carlos"
            },
            {
                "family_name": "Thomson",
                "given_name": "Matthew",
                "orcid": "0000-0003-1021-1234",
                "clpid": "Thomson-M-W"
            },
            {
                "family_name": "Oka",
                "given_name": "Yuki",
                "orcid": "0000-0003-2686-0677",
                "clpid": "Oka-Yuki"
            }
        ],
        "local_group": [
            {
                "literal": "div_bbe"
            }
        ],
        "abstract": "<p>Salt consumption is unique among the five basic tastes. The perception of salty taste revealed a concentration-dependent and internal-state-dependent valence pattern. Low concentrations of salt trigger sodium-specific taste receptors while high concentrations recruit bitter and sour pathways, which have been investigated by previous research and proven with daily experience. I focus on the dynamic nature of salt perception, in the perspective of physiological states.</p>\r\n\r\n<p>In calorie restricted state, food cues become more appetitive, but in sodium depletion state, the hedonic value of salt fundamentally reversed. High concentrations of salt induce innate aversion under sated states, whereas such aversive stimuli transform into appetitive ones under sodium depletion. Neural mechanisms underlying this state-dependent salt valence switch are poorly understood. Using transcriptomics state-to-cell-type mapping and neural manipulations, we show that positive and negative valences of salt are controlled by anatomically distinct neural circuits in the mammalian brain. The hindbrain interoceptive circuit regulates sodium-specific appetitive drive, whereas behavioral tolerance of aversive salts is encoded by a dedicated class of neurons in the forebrain lamina terminalis (LT) expressing prostaglandin E2 (PGE2) receptor, Ptger3. We show that these LT neurons regulate salt tolerance by selectively modulating aversive taste sensitivity, partly through a PGE2-Ptger3 axis. These results reveal the bimodal regulation of appetitive and tolerance signals toward salt, which together dictate the amount of sodium consumption under different internal states.</p>\r\n\r\n<p>Maintaining the fluid balance requires complex crosstalk within the neural circuits and endocrine systems through the brain-body axis. Despite the prevalence of fluid balance dysregulation and salt overconsumption in modern life, its health relevance remains underappreciated. 
I have been drawn to the global salt overconsumption crisis since the start of my Ph.D. studies. The current approaches to regulate salt intake rely heavily on imperfect salt substitutes. The PGE2-Ptger3 brain-body axis offers a new top-down approach to reduce salt intake. Interestingly, PGE2 is a critical biomarker of the pro-inflammatory state. High salt intake has been reported to induce chronic inflammation, desensitization of salty tastes, and intensified craving for consumption. I regard my research on the tolerance circuit as an entry point and hope future research into inflammation and its role in salt intake can be translated into concrete salt reduction solutions.</p>",
        "doi": "10.7907/kp37-4v28",
        "publication_date": "2026",
        "thesis_type": "phd",
        "thesis_year": "2026"
    },
    {
        "id": "thesis:17820",
        "collection": "thesis",
        "collection_id": "17820",
        "cite_using_url": "https://resolver.caltech.edu/CaltechTHESIS:01152026-202412598",
        "primary_object_url": {
            "basename": "Thesis_RongrongDu_20260115.pdf",
            "content": "final",
            "filesize": 42260770,
            "license": "other",
            "mime_type": "application/pdf",
            "url": "/17820/1/Thesis_RongrongDu_20260115.pdf",
            "version": "v4.0.0"
        },
        "type": "thesis",
        "title": "Build Synthetic Circuits at Different Scales",
        "author": [
            {
                "family_name": "Du",
                "given_name": "Rongrong",
                "orcid": "0009-0003-4942-3020",
                "clpid": "Du-Rongrong"
            }
        ],
        "thesis_advisor": [
            {
                "family_name": "Elowitz",
                "given_name": "Michael B.",
                "orcid": "0000-0002-1221-0967",
                "clpid": "Elowitz-M-B"
            }
        ],
        "thesis_committee": [
            {
                "family_name": "Bronner",
                "given_name": "Marianne E.",
                "orcid": "0000-0003-4274-1862",
                "clpid": "Bronner-M-E"
            },
            {
                "family_name": "Cai",
                "given_name": "Long",
                "orcid": "0000-0002-7154-5361",
                "clpid": "Cai-Long"
            },
            {
                "family_name": "Thomson",
                "given_name": "Matthew",
                "orcid": "0000-0003-1021-1234",
                "clpid": "Thomson-M-W"
            },
            {
                "family_name": "Elowitz",
                "given_name": "Michael B.",
                "orcid": "0000-0002-1221-0967",
                "clpid": "Elowitz-M-B"
            }
        ],
        "local_group": [
            {
                "literal": "div_bbe"
            }
        ],
        "abstract": "<p>Multicellular organisms rely on the coordinated actions of diverse organs to sustain life. Each organ comprises cells that communicate with each other to execute physiological functions, and each cell encodes gene regulatory networks that shape its gene expression programs. The intrinsic complexity of biological systems, including features such as redundancy that endow them with robustness, also makes them difficult to study using reductionist approaches alone.</p>\r\n\r\n<p>To elucidate quantitative design principles underlying multicellular organization, I adopted a bottom-up approach and built synthetic circuits at multiple scales. In the first project, I engineered a single-gene incoherent feedforward circuit that leverages multispecific microRNA targeting to achieve dosage-invariant and tunable protein expression across wide ranges of gene copy numbers. In the second project, I constructed a multicellular reaction\u2013diffusion circuit that integrates juxtacrine and paracrine signaling to generate self-organized, periodic Turing patterns.</p>\r\n \r\n<p>Together, these studies introduce new tools for engineering regulatory behaviors, reveal general principles that govern biological organization across scales, and pave the way for potential translational applications.</p>",
        "doi": "10.7907/1ksn-sb30",
        "publication_date": "2026",
        "thesis_type": "phd",
        "thesis_year": "2026"
    },
    {
        "id": "thesis:17880",
        "collection": "thesis",
        "collection_id": "17880",
        "cite_using_url": "https://resolver.caltech.edu/CaltechThesis:02102026-091429391",
        "primary_object_url": {
            "basename": "Thesis_final_CF.pdf",
            "content": "final",
            "filesize": 8081793,
            "license": "other",
            "mime_type": "application/pdf",
            "url": "/17880/1/Thesis_final_CF.pdf",
            "version": "v4.0.0"
        },
        "type": "thesis",
        "title": "Biophysical Modeling for Gene Expression and Evolution",
        "author": [
            {
                "family_name": "Felce",
                "given_name": "Catherine E.",
                "orcid": "0009-0009-9909-6711",
                "clpid": "Felce-Catherine-E"
            }
        ],
        "thesis_advisor": [
            {
                "family_name": "Pachter",
                "given_name": "Lior S.",
                "orcid": "0000-0002-9164-6231",
                "clpid": "Pachter-L"
            }
        ],
        "thesis_committee": [
            {
                "family_name": "Phillips",
                "given_name": "Robert B.",
                "orcid": "0000-0003-3082-2809",
                "clpid": "Phillips-R"
            },
            {
                "family_name": "Thomson",
                "given_name": "Matthew",
                "orcid": "0000-0003-1021-1234",
                "clpid": "Thomson-M-W"
            },
            {
                "family_name": "Pennell",
                "given_name": "Matthew",
                "orcid": "0000-0002-2886-3970",
                "clpid": "Pennell-Matthew"
            },
            {
                "family_name": "Pachter",
                "given_name": "Lior S.",
                "orcid": "0000-0002-9164-6231",
                "clpid": "Pachter-L"
            }
        ],
        "local_group": [
            {
                "literal": "div_pma"
            }
        ],
        "abstract": "Principled biophysical modeling is a necessary foundation for analyzing RNA sequencing data. In recent years, higher quality data for other data modalities at single-cell resolution have become available. I present joint biophysical models combining two of these modalities, chromatin accessibility measurements (ATAC-seq) and protein counts, individually with single-cell transcriptomic data, and give preliminary data results. I consider the extension of biophysically motivated models to the field of phylogenetics. I present competing mechanistic hypotheses for gene expression evolution and test them via parametrized single-cell cross-species data. I also consider a physics-inspired model for population-level evolution via maternal effects and interacting subpopulations.",
        "doi": "10.7907/chmp-kt37",
        "publication_date": "2026",
        "thesis_type": "phd",
        "thesis_year": "2026"
    },
    {
        "id": "thesis:17647",
        "collection": "thesis",
        "collection_id": "17647",
        "cite_using_url": "https://resolver.caltech.edu/CaltechTHESIS:08252025-232825764",
        "primary_object_url": {
            "basename": "Manisha_Kapasiawala_Caltech_PhD_Thesis.pdf",
            "content": "final",
            "filesize": 35470940,
            "license": "other",
            "mime_type": "application/pdf",
            "url": "/17647/1/Manisha_Kapasiawala_Caltech_PhD_Thesis.pdf",
            "version": "v5.0.0"
        },
        "type": "thesis",
        "title": "Design Considerations for Synthetic Cells",
        "author": [
            {
                "family_name": "Kapasiawala",
                "given_name": "Manisha Kaushik",
                "orcid": "0000-0002-0302-2921",
                "clpid": "Kapasiawala-Manisha-Kaushik"
            }
        ],
        "thesis_advisor": [
            {
                "family_name": "Murray",
                "given_name": "Richard M.",
                "orcid": "0000-0002-5785-7481",
                "clpid": "Murray-R-M"
            }
        ],
        "thesis_committee": [
            {
                "family_name": "Thomson",
                "given_name": "Matthew",
                "orcid": "0000-0003-1021-1234",
                "clpid": "Thomson-M-W"
            },
            {
                "family_name": "Murray",
                "given_name": "Richard M.",
                "orcid": "0000-0002-5785-7481",
                "clpid": "Murray-R-M"
            },
            {
                "family_name": "Voorhees",
                "given_name": "Rebecca M.",
                "orcid": "0000-0003-1640-2293",
                "clpid": "Voorhees-R-M"
            },
            {
                "family_name": "Winfree",
                "given_name": "Erik",
                "orcid": "0000-0002-5899-7523",
                "clpid": "Winfree-E"
            }
        ],
        "local_group": [
            {
                "literal": "div_bbe"
            }
        ],
        "abstract": "<p>Efforts to understand life as we know it and life as it can be have culminated in the field of synthetic cell research, which aims to build life from the bottom up using individual biological components. Recent progress in the field has enabled the reconstitution of many functions of living cells in synthetic cells, from cell-cell communication to membrane protein expression and function. However, future progress in the field is limited by many challenges, including irreproducibility, lack of predictability, difficulties in integrating existing synthetic cell modules (or subsystems), and the need for autonomous functionalities.</p>\r\n\r\n<p>In this work, I describe my efforts towards addressing these challenges. In Chapter 2, I describe sources of variability in transcription-translation (TX-TL) systems, the biological machinery used to implement biomolecular programs in synthetic cells. In Chapter 3, I describe a novel methodology for readily building more predictive models of TX-TL performance. In Chapter 4, I present a design for a proof-of-concept for integrating an energy regeneration subsystem and a motility subsystem to achieve autonomous programmable motility and highlight some early successes towards achieving that goal. Throughout this work, I highlight many design principles for building synthetic cells reproducibly, more predictably, and with novel functionalities.</p>",
        "doi": "10.7907/zfhy-bk03",
        "publication_date": "2026",
        "thesis_type": "phd",
        "thesis_year": "2026"
    },
    {
        "id": "thesis:17740",
        "collection": "thesis",
        "collection_id": "17740",
        "cite_using_url": "https://resolver.caltech.edu/CaltechTHESIS:10312025-170502666",
        "type": "thesis",
        "title": "Algae as a Platform for Sustainable Biocomposites: Process\u2013Structure\u2013Property Relations",
        "author": [
            {
                "family_name": "Wexler",
                "given_name": "Helen",
                "orcid": "0000-0003-4030-9603",
                "clpid": "Wexler-Helen"
            }
        ],
        "thesis_advisor": [
            {
                "family_name": "Daraio",
                "given_name": "Chiara",
                "orcid": "0000-0001-5296-4440",
                "clpid": "Daraio-C"
            }
        ],
        "thesis_committee": [
            {
                "family_name": "Thomson",
                "given_name": "Matthew",
                "orcid": "0000-0003-1021-1234",
                "clpid": "Thomson-M-W"
            },
            {
                "family_name": "Daraio",
                "given_name": "Chiara",
                "orcid": "0000-0001-5296-4440",
                "clpid": "Daraio-C"
            },
            {
                "family_name": "Burdick",
                "given_name": "Joel Wakeman",
                "orcid": "0000-0002-3091-540X",
                "clpid": "Burdick-J-W"
            },
            {
                "family_name": "McAniff",
                "given_name": "Peter",
                "clpid": "McAniff-Peter-J"
            },
            {
                "family_name": "Saigal",
                "given_name": "Anil",
                "orcid": "0000-0001-7911-8674",
                "clpid": "Saigal-Anil"
            }
        ],
        "local_group": [
            {
                "literal": "div_bbe"
            }
        ],
        "abstract": "This thesis investigates whole-cell algae as a possible binder matrix for fully bio-based composites and evaluates agricultural residues as reinforcing fillers. The study quantifies how feedstock morphology and preprocessing govern microstructure and mechanical response under compression molding that uses only water, heat, and pressure. Candidate algal feedstocks include food grade algae, single strain wastewater algae, and mixed wastewater communities.",
        "doi": "10.7907/ba11-m769",
        "publication_date": "2026",
        "thesis_type": "phd",
        "thesis_year": "2026"
    },
    {
        "id": "thesis:16912",
        "collection": "thesis",
        "collection_id": "16912",
        "cite_using_url": "https://resolver.caltech.edu/CaltechTHESIS:12092024-223834150",
        "type": "thesis",
        "title": "Quantitative Nucleic Acid Measurements Inform Strategies to Mitigate Viral Outbreaks",
        "author": [
            {
                "family_name": "Viloria Winnett",
                "given_name": "Alexander",
                "orcid": "0000-0002-7338-5605",
                "clpid": "Viloria-Winnett-Alexander"
            }
        ],
        "thesis_advisor": [
            {
                "family_name": "Ismagilov",
                "given_name": "Rustem F.",
                "orcid": "0000-0002-3680-4399",
                "clpid": "Ismagilov-R-F"
            }
        ],
        "thesis_committee": [
            {
                "family_name": "Thomson",
                "given_name": "Matthew",
                "orcid": "0000-0003-1021-1234",
                "clpid": "Thomson-M-W"
            },
            {
                "family_name": "Rothenberg",
                "given_name": "Ellen V.",
                "orcid": "0000-0002-3901-347X",
                "clpid": "Rothenberg-E-V"
            },
            {
                "family_name": "Arboleda",
                "given_name": "Valerie",
                "orcid": "0000-0002-9687-9122",
                "clpid": "Aboleda-V-A"
            },
            {
                "family_name": "Ismagilov",
                "given_name": "Rustem F.",
                "orcid": "0000-0002-3680-4399",
                "clpid": "Ismagilov-R-F"
            }
        ],
        "local_group": [
            {
                "literal": "3MT Competition (Caltech)"
            },
            {
                "literal": "div_bbe"
            }
        ],
        "abstract": "Humans have always been and continue to be at risk of infection by pathogens that surround us. However, recent advancements in quantitative nucleic acid technologies have allowed for more detailed study of these pathogens, how they spread among individuals, and how our immune systems respond to infection. In this thesis, I describe the design and execution of the Caltech COVID-19 Study, which used quantitative nucleic acid measurements to investigate the natural history of SARS-CoV-2 infection and inform strategies for diagnostics and vaccine development to reduce viral transmission. The Caltech COVID-19 Study enrolled participants in the Los Angeles area between September 2020 and April 2022 who were at risk of SARS-CoV-2 infection due to recent exposure to a household contact with acute infection. Participants collected paired upper respiratory specimens (saliva, nasal swabs, and throat swabs) daily or twice daily for approximately two weeks. These specimens underwent SARS-CoV-2 viral load quantification to assess transmission risk and determine whether to extend or terminate study enrollment. For participants who initially tested negative for SARS-CoV-2 RNA but later developed sustained infection, we tracked viral load from the very start of infection. These measurements were then used to evaluate the performance of various COVID-19 diagnostic tests. Our findings revealed a significant advantage of high-analytical-sensitivity tests over those with lower sensitivity, as well as the benefit of testing both the throat and nose rather than just the nose. In addition to viral load quantification, we sequenced human mRNA from these specimens to assess gene expression. Analyzing these changes allowed us to study how the mucosal immune system responds to acute viral infection across multiple anatomical sites over time, providing insights that could improve mucosal vaccine design. 
Notably, our data showed that, contrary to current models of localized paracrine interferon signaling, distinct compartments of the upper respiratory mucosa exhibited synchronized interferon stimulation during early infection\u2014even in the absence of detectable local viral replication. Mucosal vaccines capable of triggering this coordinated interferon response, maintaining CD8+ T memory cells to rapidly execute effector functions upon viral exposure, may be key to achieving sterilizing immunity. Findings from quantitative nucleic acid measurements in this thesis inform strategies to more effectively mitigate viral outbreaks.",
        "doi": "10.7907/qe3a-a670",
        "publication_date": "2025",
        "thesis_type": "phd",
        "thesis_year": "2025"
    },
    {
        "id": "thesis:16533",
        "collection": "thesis",
        "collection_id": "16533",
        "cite_using_url": "https://resolver.caltech.edu/CaltechTHESIS:07052024-170119371",
        "primary_object_url": {
            "basename": "Thesis.pdf",
            "content": "final",
            "filesize": 43525786,
            "license": "other",
            "mime_type": "application/pdf",
            "url": "/16533/1/Thesis.pdf",
            "version": "v4.0.0"
        },
        "type": "thesis",
        "title": "Active Acquisition Methods for Single Cell Genomics",
        "author": [
            {
                "family_name": "Chen",
                "given_name": "Xiaoqiao",
                "orcid": "0000-0003-4685-3466",
                "clpid": "Chen-Xiaoqiao"
            }
        ],
        "thesis_advisor": [
            {
                "family_name": "Thomson",
                "given_name": "Matthew",
                "orcid": "0000-0003-1021-1234",
                "clpid": "Thomson-M-W"
            }
        ],
        "thesis_committee": [
            {
                "family_name": "Cai",
                "given_name": "Long",
                "orcid": "0000-0002-7154-5361",
                "clpid": "Cai-Long"
            },
            {
                "family_name": "Thomson",
                "given_name": "Matthew",
                "orcid": "0000-0003-1021-1234",
                "clpid": "Thomson-M-W"
            },
            {
                "family_name": "Yue",
                "given_name": "Yisong",
                "orcid": "0000-0001-9127-1989",
                "clpid": "Yue-Yisong"
            },
            {
                "family_name": "Bouman",
                "given_name": "Katherine L.",
                "orcid": "0000-0003-0077-4367",
                "clpid": "Bouman-K-L"
            }
        ],
        "local_group": [
            {
                "literal": "div_eng"
            }
        ],
        "abstract": "<p>We introduce two novel computational methodologies, ActiveSVM and Active Cell Inference, aimed at reducing the costs and enhancing the efficiency of single-cell mRNA sequencing and spatial transcriptomics, respectively. ActiveSVM employs an active learning approach to identify minimal yet highly informative gene sets for cell-type classification, physiological state identification, and genetic perturbation responses in single-cell datasets. By focusing on misclassified cells through an iterative process, ActiveSVM efficiently scales to analyze over a million cells, demonstrating around 90% accuracy across various datasets, including cell atlas and disease characterization studies.</p>\r\n\r\n<p>Active Cell Inference complements this by utilizing ordered gene sets, developed through ActiveSVM, to streamline spatial genomics measurements. This end-to-end pipeline significantly reduces measurement time and costs by up to 100-fold in scientific and clinical settings. It optimizes the gene probing process by identifying well-classified cells early, allowing for targeted gene application based on cell classification certainty. This method's efficacy is further enhanced by a temporal scaling calibration scheme, improving calibration accuracy throughout its iterative process.</p>\r\n\r\n<p>Both methodologies were rigorously tested on the expansive Human Cell Atlas dataset, using the advanced computational tool, CellxGene-Census, involving over 60 million cells. This integration facilitated the creation of precise gene sets for various human tissues, dramatically improving the efficiency and reliability of these cutting-edge genomic techniques. 
Together, ActiveSVM and Active Cell Inference represent significant advancements in the application of genomics to clinical diagnostics, therapeutic discovery, and genetic screens, promising substantial reductions in the operational complexities and costs associated with next-generation sequencing technologies.</p>",
        "doi": "10.7907/nsn8-nd79",
        "publication_date": "2025",
        "thesis_type": "phd",
        "thesis_year": "2025"
    },
    {
        "id": "thesis:16525",
        "collection": "thesis",
        "collection_id": "16525",
        "cite_using_url": "https://resolver.caltech.edu/CaltechTHESIS:06152024-132652470",
        "primary_object_url": {
            "basename": "Wang_Zitong_2025.pdf",
            "content": "final",
            "filesize": 14747054,
            "license": "other",
            "mime_type": "application/pdf",
            "url": "/16525/2/Wang_Zitong_2025.pdf",
            "version": "v3.0.0"
        },
        "type": "thesis",
        "title": "Theoretical and Computational Analysis of Cell Migration in Complex Tissue Environments",
        "author": [
            {
                "family_name": "Wang",
                "given_name": "Zitong (Jerry)",
                "orcid": "0000-0001-8008-7318",
                "clpid": "Wang-Zitong-Jerry"
            }
        ],
        "thesis_advisor": [
            {
                "family_name": "Thomson",
                "given_name": "Matthew",
                "orcid": "0000-0003-1021-1234",
                "clpid": "Thomson-M-W"
            }
        ],
        "thesis_committee": [
            {
                "family_name": "Cai",
                "given_name": "Long",
                "orcid": "0000-0002-7154-5361",
                "clpid": "Cai-Long"
            },
            {
                "family_name": "Eberhardt",
                "given_name": "Frederick",
                "clpid": "Eberhardt-Frederick"
            },
            {
                "family_name": "Merchant",
                "given_name": "Akil Abid",
                "orcid": "0000-0001-7472-822X",
                "clpid": "Merchant-Akil-Abid"
            },
            {
                "family_name": "Thomson",
                "given_name": "Matthew",
                "orcid": "0000-0003-1021-1234",
                "clpid": "Thomson-M-W"
            }
        ],
        "local_group": [
            {
                "literal": "div_bbe"
            }
        ],
        "abstract": "<p>Cells sense and respond in spatially structured environments, including soils and tissue. My Ph.D. projects centered on developing new theoretical models and computational methods to understand how cells migrate in complex environments.</p> \r\n   \r\n<p>The first project is more theoretical in nature, leveraging information theory to study how the spatial organization of cell signaling pathways are adapted to the cell's natural environment. In tissue and soil, cells must localize to their targets by navigating distributions of extracellular ligands that are spatially discontinuous, consisting of local concentration peaks, due to binding a non-uniform network of ECM fibers. It is unclear how cells navigate patchy environments while not getting trapped in local concentration peaks. To answer this question, we framed navigation as a problem of maximizing mutual information in space and developed a computational algorithm for computing signaling pathway architectures that maximize mutual information in simulated natural environments. We found that for cells in tissues and soils, dynamic localization of membrane receptors dramatically boosts sensing precision and enables cells to navigate to chemical sources 30 times faster, but this receptor localization strategy is relatively inconsequential for cells in purely diffusive environments. Further, we found that anisotropic receptor dynamics previously observed in immune cells and growth cones are nearly optimal as predicted by our model.</p>\r\n\r\n<p>The second project is more computational in nature, leveraging multiplexed tissue imaging to understand T-cell migration in tumor microenvironments. Immunotherapies can halt or slow down cancer progression by activating either endogenous or engineered T-cells to detect and kill cancer cells. T-cells must infiltrate the tumor core for immunotherapies to be effective. 
However, many solid tumors resist T-cell infiltration, challenging the efficacy of current therapies. In collaboration with clinician scientists at Cedars-Sinai Medical Center, we developed an integrated deep learning framework, Morpheus, that takes large-scale spatial omics profiles of patient tumors, and combines a formulation of T-cell infiltration prediction as a self-supervised machine learning problem with a counterfactual optimization strategy to generate minimal tumor perturbations predicted to boost T-cell infiltration. We applied Morpheus to 368 metastatic melanoma and colorectal cancer samples assayed using 40-plex imaging mass cytometry, discovering cohort-dependent, combinatorial perturbations, involving CXCL9, CXCL10, CCL22 and CCL18 for melanoma and CXCR4, PD-1, PD-L1 and CYR61 for colorectal cancer, predicted to support T-cell infiltration across large patient cohorts. Using only raw image data, Morpheus also identified distinct therapeutic strategies for different patient strata such as cancer stage or fatty liver presence. Our work presents a paradigm for counterfactual-based prediction and design of cancer therapeutics using spatial omics data.</p>",
        "doi": "10.7907/mj08-b258",
        "publication_date": "2025",
        "thesis_type": "phd",
        "thesis_year": "2025"
    },
    {
        "id": "thesis:17429",
        "collection": "thesis",
        "collection_id": "17429",
        "cite_using_url": "https://resolver.caltech.edu/CaltechTHESIS:06092025-113042101",
        "primary_object_url": {
            "basename": "visual-systems-and-the-forces-that-shape-them.pdf",
            "content": "final",
            "filesize": 111299864,
            "license": "other",
            "mime_type": "application/pdf",
            "url": "/17429/2/visual-systems-and-the-forces-that-shape-them.pdf",
            "version": "v5.0.0"
        },
        "type": "thesis",
        "title": "Visual Systems and the Forces That Shape Them",
        "author": [
            {
                "family_name": "McGill",
                "given_name": "Mason Benjamin",
                "orcid": "0000-0002-2782-3977",
                "clpid": "McGill-Mason-Benjamin"
            }
        ],
        "thesis_advisor": [
            {
                "family_name": "Perona",
                "given_name": "Pietro",
                "orcid": "0000-0002-7583-5809",
                "clpid": "Perona-P"
            }
        ],
        "thesis_committee": [
            {
                "family_name": "Thomson",
                "given_name": "Matthew",
                "orcid": "0000-0003-1021-1234",
                "clpid": "Thomson-M-W"
            },
            {
                "family_name": "Parker",
                "given_name": "Joseph",
                "orcid": "0000-0001-9598-2454",
                "clpid": "Parker-J"
            },
            {
                "family_name": "Rutishauser",
                "given_name": "Ueli",
                "orcid": "0000-0002-9207-7069",
                "clpid": "Rutishauser-U"
            },
            {
                "family_name": "Perona",
                "given_name": "Pietro",
                "orcid": "0000-0002-7583-5809",
                "clpid": "Perona-P"
            }
        ],
        "local_group": [
            {
                "literal": "div_bbe"
            }
        ],
        "abstract": "Vision neuroscience provides a unique opportunity to draw a correspondence between the physical world and its neural representation. But despite the amazing advances in neural recording technology that have occurred over the past two decades, we can't yet simultaneously record from more than a tiny fraction of the neurons in most of the visual systems currently being studied, which limits our ability to develop a holistic cause-and-effect understanding of how they operate. So it may make sense, as a complement to directly studying a visual system found in nature, to also study synthetic visual systems that in some way resemble it but are easier to inspect. This document describes four lines of work aimed at improving our ability to learn about biological visual systems using models optimized in ways that are analogous to the selective pressures that biological visual systems face, like the pressures to relay accurate information about the world, minimize energy consumption, and withstand perturbation. The first two of these lines of work---discussed in chapters 2 and 3---focus on expanding the space of selective forces that can be factored into optimization-guided models, and the other two---discussed in chapters 4 and 5---focus on modeling particular visual systems (in the macaque and the fruit fly, respectively). Taken together, optimization-guided modeling is shown to be a promising approach to advancing our understanding of visual processing across the animal kingdom, allowing us to leverage hypotheses about the high-level properties of visual systems to amplify the value of sparse neural data.",
        "doi": "10.7907/y27w-m760",
        "publication_date": "2025",
        "thesis_type": "phd",
        "thesis_year": "2025"
    },
    {
        "id": "thesis:17389",
        "collection": "thesis",
        "collection_id": "17389",
        "cite_using_url": "https://resolver.caltech.edu/CaltechTHESIS:06032025-002120461",
        "primary_object_url": {
            "basename": "Thesis.pdf",
            "content": "final",
            "filesize": 25197916,
            "license": "other",
            "mime_type": "application/pdf",
            "url": "/17389/1/Thesis.pdf",
            "version": "v5.0.0"
        },
        "type": "thesis",
        "title": "A Biophysical Approach to Normalization and Trajectory Inference in Single-Cell RNA Sequencing Data Analysis",
        "author": [
            {
                "family_name": "Fang",
                "given_name": "Meichen",
                "orcid": "0000-0002-8217-0710",
                "clpid": "Fang-Meichen"
            }
        ],
        "thesis_advisor": [
            {
                "family_name": "Pachter",
                "given_name": "Lior S.",
                "orcid": "0000-0002-9164-6231",
                "clpid": "Pachter-L"
            }
        ],
        "thesis_committee": [
            {
                "family_name": "Thomson",
                "given_name": "Matthew",
                "orcid": "0000-0003-1021-1234",
                "clpid": "Thomson-M-W"
            },
            {
                "family_name": "Pachter",
                "given_name": "Lior S.",
                "orcid": "0000-0002-9164-6231",
                "clpid": "Pachter-L"
            },
            {
                "family_name": "Bois",
                "given_name": "Justin S.",
                "orcid": "0000-0001-7137-8746",
                "clpid": "Bois-J-S"
            },
            {
                "family_name": "Chong",
                "given_name": "Shasha",
                "orcid": "0000-0002-5372-311X",
                "clpid": "Chong-Shasha"
            }
        ],
        "local_group": [
            {
                "literal": "div_bbe"
            }
        ],
        "abstract": "<p>Single-cell genomics assays, particularly single-cell RNA sequencing that enables genome-wide profiling of gene expression, have been driven forward by a combination of technological and computational advances. While producing extraordinary large amounts of data for biological discovery, methods for mining results currently rely heavily on heuristics and lack of modeling has resulted in limited mechanistic biological insight. This thesis presents two models for normalization and trajectory inference in single-cell RNA sequencing analysis to demonstrate how biophysical modeling, when combined with principled statistical inference, can yield interpretable insights grounded in rigorous theoretical frameworks.</p>\r\n\r\n<p>We begin by explaining the two cultures in single-cell RNA sequencing analysis. Next, we present the chemical master equation, which forms the theoretical foundation for biophysically informed stochastic models of gene expression, and explore an existing gap in developing uniform approximations over time under the large-volume limit. Returning to single-cell RNA sequencing data analysis, we introduce two mechanistic models for normalization and trajectory inference, which are essential components of single-cell RNA sequencing analysis.</p>",
        "doi": "10.7907/asek-t904",
        "publication_date": "2025",
        "thesis_type": "phd",
        "thesis_year": "2025"
    },
    {
        "id": "thesis:17383",
        "collection": "thesis",
        "collection_id": "17383",
        "cite_using_url": "https://resolver.caltech.edu/CaltechTHESIS:06022025-234901151",
        "type": "thesis",
        "title": "The Topology of Cellular Ontogeny",
        "author": [
            {
                "family_name": "Flores-Bautista",
                "given_name": "Emanuel",
                "orcid": "0000-0002-2810-1757",
                "clpid": "Flores-Bautista-Emanuel"
            }
        ],
        "thesis_advisor": [
            {
                "family_name": "Thomson",
                "given_name": "Matthew",
                "orcid": "0000-0003-1021-1234",
                "clpid": "Thomson-M-W"
            }
        ],
        "thesis_committee": [
            {
                "family_name": "Prober",
                "given_name": "David A.",
                "orcid": "0000-0002-7371-4675",
                "clpid": "Prober-D-A"
            },
            {
                "family_name": "Marcolli",
                "given_name": "Matilde",
                "orcid": "0000-0002-2045-2907",
                "clpid": "Marcolli-M"
            },
            {
                "family_name": "Pachter",
                "given_name": "Lior S.",
                "orcid": "0000-0002-9164-6231",
                "clpid": "Pachter-L"
            },
            {
                "family_name": "Thomson",
                "given_name": "Matthew",
                "orcid": "0000-0003-1021-1234",
                "clpid": "Thomson-M-W"
            }
        ],
        "local_group": [
            {
                "literal": "div_bbe"
            }
        ],
        "abstract": "A fundamental goal of modern biology is to build global, predictive models of gene regulation that encompass diverse physiological contexts. Single-cell transcriptomics has enabled the creation of developmental cell atlases--detailed catalogs of gene expression patterns and differentiation trajectories at an organismal scale. The widespread availability of  cell atlases across metazoan model organisms presents an opportunity to construct global theories of cell-state control. In this thesis, we introduce a framework that uses persistent homology to decompose cell atlases into topological structures that provide signatures of gene regulation at the scale of an organism. Using this framework, we found that the topological structure of a broad set of developmental atlases contains only a discrete set of topological structures\u2014such as clusters, trees, and loops\u2014-revealing the recurrent use of global gene regulatory strategies. Our analysis revealed that the tree topology, while predominant, is not universal. Indeed, we identified non-trivial topologies containing loops in the development of human immune cells, seam-hypodermal cells in \\textit{C. elegans}, and the cnidocytes of multiple cnidarians. Analysis of cell-state manifolds with non-trivial topology demonstrated an important role of convergent structures in increasing cellular diversity along paths to a common cell fate, and of cyclic structures in self-renewal of progenitor-like states. Together, this work provides a global perspective on principles of cell-state regulation, and suggests that loops are important organizing structures for controlling cell differentiation.",
        "doi": "10.7907/t8hc-yq15",
        "publication_date": "2025",
        "thesis_type": "phd",
        "thesis_year": "2025"
    },
    {
        "id": "thesis:16607",
        "collection": "thesis",
        "collection_id": "16607",
        "cite_using_url": "https://resolver.caltech.edu/CaltechTHESIS:08022024-005547280",
        "type": "thesis",
        "title": "Studies on Scaling Throughput in Protein Engineering",
        "author": [
            {
                "family_name": "Schaus",
                "given_name": "Lucas Jean Nicolas",
                "orcid": "0000-0002-6094-7402",
                "clpid": "Schaus-Lucas-Jean-Nicolas"
            }
        ],
        "thesis_advisor": [
            {
                "family_name": "Mayo",
                "given_name": "Stephen L.",
                "orcid": "0000-0002-9785-5018",
                "clpid": "Mayo-S-L"
            }
        ],
        "thesis_committee": [
            {
                "family_name": "Rees",
                "given_name": "Douglas C.",
                "orcid": "0000-0003-4073-1185",
                "clpid": "Rees-D-C"
            },
            {
                "family_name": "Bjorkman",
                "given_name": "Pamela J.",
                "orcid": "0000-0002-2277-3990",
                "clpid": "Bjorkman-P-J"
            },
            {
                "family_name": "Thomson",
                "given_name": "Matthew",
                "orcid": "0000-0003-1021-1234",
                "clpid": "Thomson-M-W"
            },
            {
                "family_name": "Mayo",
                "given_name": "Stephen L.",
                "orcid": "0000-0002-9785-5018",
                "clpid": "Mayo-S-L"
            }
        ],
        "local_group": [
            {
                "literal": "Resnick Sustainability Institute"
            },
            {
                "literal": "div_chem"
            }
        ],
        "abstract": "<p>In this work we present three studies in protein engineering. While all three protein classes that have been targeted for engineering tasks are very different, the studies have a focus on scaling-up the throughput in protein engineering.</p>\r\n\r\n<p>The first study concerns machine learning (ML) based antibody humanization techniques. Achieving a reduction of patient anti-drug antibody responses in clinical trials is the goal of antibody humanization. To measure this however, one needs to pass significant scientific, bureaucratic, and financial hurdles, which is very rarely done and especially never at scale. Most existing ML-based antibody humanization techniques claim that they work without providing any experimental evidence. We developed Mousify as an in silico antibody humanization platform to place existing models into one framework for wet-laboratory validation. We demonstrate that even the best models have a fundamental flaw in that they only generate a single antibody. We use Mousify and Markov chains to show that using ML-based antibody humanization models for library generation is not only feasible but produces both stable and functional variants. Learning the lessons from our wet-laboratory experiments, we then developed a variational autoencoder model with properties that hopefully improve the outcomes of antibody humanization experiments.</p>\r\n \r\n<p>In the second study, we outline our plans and initial results to develop a bioelectrocatalytic system for the conversion of N2 to ammonia using nitrogenase. Most of the world\u2019s ammonia is used for agricultural purposes and is produced via the environmentally damaging Haber-Bosch process. Engineering nitrogenase for the bioelectrocatalytic production of ammonia is not trivial and a high throughput is not guaranteed. 
We present preliminary results on how throughput can be increased through diazotrophic pre-selection of nitrogenase variants, as well as a quest to find the ideal starting point for engineering using a combination of ancestral sequence reconstruction and generative protein language models.</p>\r\n\r\n<p>In the third and final study we present a directed evolution campaign to evolve protoglobins for the enantioselective catalytic formation of cis-trifluoromethyl substituted cyclopropanes, the first such reaction in both the chemical and biological world. Not only is the enzyme ApePgb LQ capable of efficiently performing carbene insertions into double bonds, but it also shows a much more diverse substrate scope than similar enantioselective formations of trans-trifluoromethyl substituted cyclopropanes. After demonstrating that ApePgb LQ reactions can be increased to a 1-mmol scale, we investigated the nature of protoglobin cis-selectivity using various computational methods.</p>",
        "doi": "10.7907/jqng-x012",
        "publication_date": "2025",
        "thesis_type": "phd",
        "thesis_year": "2025"
    },
    {
        "id": "thesis:17344",
        "collection": "thesis",
        "collection_id": "17344",
        "cite_using_url": "https://resolver.caltech.edu/CaltechTHESIS:06012025-204136107",
        "primary_object_url": {
            "basename": "Methods_for_long_read_RNA_seq_transcriptomics_Loving_Rebekah_2025.pdf",
            "content": "final",
            "filesize": 51348381,
            "license": "other",
            "mime_type": "application/pdf",
            "url": "/17344/1/Methods_for_long_read_RNA_seq_transcriptomics_Loving_Rebekah_2025.pdf",
            "version": "v5.0.0"
        },
        "type": "thesis",
        "title": "Methods for Long Read RNA-Seq Transcriptomics",
        "author": [
            {
                "family_name": "Loving Ngo",
                "given_name": "Rebekah Kiana",
                "orcid": "0000-0001-8725-0376",
                "clpid": "Loving-Ngo-Rebekah-Kiana"
            }
        ],
        "thesis_advisor": [
            {
                "family_name": "Pachter",
                "given_name": "Lior S.",
                "orcid": "0000-0002-9164-6231",
                "clpid": "Pachter-L"
            }
        ],
        "thesis_committee": [
            {
                "family_name": "Thomson",
                "given_name": "Matthew",
                "orcid": "0000-0003-1021-1234",
                "clpid": "Thomson-M-W"
            },
            {
                "family_name": "Wold",
                "given_name": "Barbara J.",
                "orcid": "0000-0003-3235-8130",
                "clpid": "Wold-B-J"
            },
            {
                "family_name": "Perona",
                "given_name": "Pietro",
                "orcid": "0000-0002-7583-5809",
                "clpid": "Perona-P"
            },
            {
                "family_name": "Mortazavi",
                "given_name": "Ali",
                "orcid": "0000-0002-4259-6362",
                "clpid": "Mortazavi-Ali"
            },
            {
                "family_name": "Pachter",
                "given_name": "Lior S.",
                "orcid": "0000-0002-9164-6231",
                "clpid": "Pachter-L"
            }
        ],
        "local_group": [
            {
                "literal": "div_bbe"
            }
        ],
        "abstract": "While short read RNA-seq dominated the field for decades, long read RNA-seq is particularly useful for isoform-level expression analysis, genome annotation, detecting novelly splicing transcripts, identifying exact breakpoints in gene fusions, and discovering chimeric RNAs. Long read RNA-seq has rapidly scaled to the point of producing terabytes of data from a single set of experiments. Technological advances in RNA and DNA sequencing library preparation, chemistry used in the Oxford nanopores, and basecalling algorithms have reduced long read sequencing error rates to sub-1% error. Further, the cost of long read sequencing has dropped to about one hundred US dollars per human genome. These two factors have lead to the mass production of high-throughput, long read, and single-cell RNA-seq data. While recent tools for long read RNA-seq have been developed, they have not kept pace in scalability and accuracy with long read RNA-seq in the fashion that short read RNA-seq tools have met computational scalability and accuracy challenges. To address this, in this thesis, we leverage long k-mers and pseudoalignment for mapping and quantifying long reads in the novel algorithm implemented within lr-kallisto, which yields both efficiency and higher accuracy for long read mapping and quantification than previous tools. We demonstrate that long read RNA-seq has reached sufficient depth and accuracy to yield accurate quantification of isoform-level expression for differential expression analysis. Furthermore, we explore the feasibilty of also utilizing long k-mers and pseudoalignment in both transcript discovery in dn-kallisto and gene fusion and immune receptor sequence discovery with fugi with measured success. 
Thus, our tools will enable a more complete, accurate, and scalable analysis of single-cell and bulk RNA-seq than has hitherto been possible in both quantifications and differential expression analysis as well as investigation of gene fusions, chimeric RNAs, and immune receptor sequences without bias.",
        "doi": "10.7907/3nz8-3c83",
        "publication_date": "2025",
        "thesis_type": "phd",
        "thesis_year": "2025"
    },
    {
        "id": "thesis:17326",
        "collection": "thesis",
        "collection_id": "17326",
        "cite_using_url": "https://resolver.caltech.edu/CaltechTHESIS:05312025-073148050",
        "primary_object_url": {
            "basename": "Caltech_Thesis___Alec_Lourenc\u0327o-1.pdf",
            "content": "final",
            "filesize": 9832833,
            "license": "other",
            "mime_type": "application/pdf",
            "url": "/17326/2/Caltech_Thesis___Alec_Lourenc\u0327o-1.pdf",
            "version": "v5.0.0"
        },
        "type": "thesis",
        "title": "Building Closed-Loop Frameworks for AI-Guided Protein Design",
        "author": [
            {
                "family_name": "Louren\u00e7o",
                "given_name": "Alexandre Luiz",
                "orcid": "0009-0005-0758-2968",
                "clpid": "Louren\u00e7o-Alexandre-Luiz"
            }
        ],
        "thesis_advisor": [
            {
                "family_name": "Thomson",
                "given_name": "Matthew",
                "orcid": "0000-0003-1021-1234",
                "clpid": "Thomson-M-W"
            }
        ],
        "thesis_committee": [
            {
                "family_name": "Mayo",
                "given_name": "Stephen L.",
                "orcid": "0000-0002-9785-5018",
                "clpid": "Mayo-S-L"
            },
            {
                "family_name": "Zinn",
                "given_name": "Kai George",
                "orcid": "0000-0002-6706-5605",
                "clpid": "Zinn-K-G"
            },
            {
                "family_name": "Bjorkman",
                "given_name": "Pamela J.",
                "orcid": "0000-0002-2277-3990",
                "clpid": "Bjorkman-P-J"
            },
            {
                "family_name": "Thomson",
                "given_name": "Matthew",
                "orcid": "0000-0003-1021-1234",
                "clpid": "Thomson-M-W"
            }
        ],
        "local_group": [
            {
                "literal": "div_bbe"
            }
        ],
        "abstract": "<p>The design of proteins with tailored properties remains a central challenge in protein engineering, with profound implications for therapeutics, sustainable manufacturing, and environmental remediation. Recent advances in artificial intelligence have dramatically improved our ability to design novel proteins, yet the precision required for many applications remains elusive. This thesis details the development and implementation of closed-loop frameworks that integrate AI-guided protein design with quantitative experimental data to iteratively improve design outcomes.</p>\r\n\r\n<p>First, I present Protein CREATE (Computational Redesign via an Experiment-Augmented Training Engine), a high-throughput platform that combines phage display with molecular counting techniques to generate quantitative binding data at scale. This platform enables rapid evaluation of thousands (and is in the process of being scaled to millions) of designed protein variants against multiple targets simultaneously.</p>\r\n\r\n<p>In subsequent chapters, I explore two separate strands of protein design as they reach for each other to close the loop. One thread focuses on collecting data on binders I engineered to the interleukin 7 receptor alpha (IL7RA) and Insulin receptor while the other investigates the value data, even when limited, adds to improve the design process of enzymes to solve a pressing environmental remediation problem: cleaning up per and polyfluoroalkyl substances (PFAS).</p>\r\n\r\n<p>While all of the targets discussed so far have benefited from developments in artificial intelligence, I explore one target where the benefits are limited, the human sweet taste receptor. 
Here, I leverage alternative computational methods coupled to experimental testing to chart a course for design.</p>\r\n\r\n<p>Finally, I discuss the technologies we are integrating within the Protein CREATE framework to enable rapid in vitro and in vivo testing.</p>\r\n\r\n<p>Throughout my PhD, I have been bringing the two threads of computational design and experimental characterization closer together for not only theoretically interesting, but also practically relevant, engineering cases. The methodologies developed here represent a significant advancement in our ability to design proteins with precisely tailored properties for diverse applications.</p>",
        "doi": "10.7907/8can-jz97",
        "publication_date": "2025",
        "thesis_type": "phd",
        "thesis_year": "2025"
    },
    {
        "id": "thesis:17314",
        "collection": "thesis",
        "collection_id": "17314",
        "cite_using_url": "https://resolver.caltech.edu/CaltechTHESIS:05302025-191432965",
        "primary_object_url": {
            "basename": "caltech_thesis_Yujing.pdf",
            "content": "final",
            "filesize": 28670824,
            "license": "other",
            "mime_type": "application/pdf",
            "url": "/17314/1/caltech_thesis_Yujing.pdf",
            "version": "v6.0.0"
        },
        "type": "thesis",
        "title": "Exploring Cell Diversity in Complex Tissues through Spatial Genomics and Spatial Transcriptomics",
        "author": [
            {
                "family_name": "Yang",
                "given_name": "Yujing",
                "orcid": "0000-0002-2338-6263",
                "clpid": "Yang-Yujing"
            }
        ],
        "thesis_advisor": [
            {
                "family_name": "Cai",
                "given_name": "Long",
                "orcid": "0000-0002-7154-5361",
                "clpid": "Cai-Long"
            }
        ],
        "thesis_committee": [
            {
                "family_name": "Guttman",
                "given_name": "Mitchell",
                "orcid": "0000-0003-4748-9352",
                "clpid": "Guttman-M"
            },
            {
                "family_name": "Thomson",
                "given_name": "Matthew",
                "orcid": "0000-0003-1021-1234",
                "clpid": "Thomson-M-W"
            },
            {
                "family_name": "Van Valen",
                "given_name": "David A.",
                "orcid": "0000-0001-7534-7621",
                "clpid": "Van-Valen-D"
            },
            {
                "family_name": "Cai",
                "given_name": "Long",
                "orcid": "0000-0002-7154-5361",
                "clpid": "Cai-Long"
            }
        ],
        "local_group": [
            {
                "literal": "div_bbe"
            }
        ],
        "abstract": "The study of cellular diversity is a fundamental requirement for understanding how multicellular organisms function. During the development of multicellular organisms, cells differentiate into various cell types with different molecular compositions, exhibit different phenotypes, and show distinct morphologies. Each single cell occupies a specific spatial location within different tissues and organs and performs a unique function. A holistic understanding of cells requires the integration of multiple \u201comics\u201d modalities, including genomics, epigenomics, transcriptomics, and proteomics. Current well-established single-cell sequencing methods have been used to build enormous single-cell transcriptomic atlases. While single-cell sequencing methods are now capable of multi-omic profiling, they all require cell dissociation, during which important spatial context information is lost. To study cellular diversity within its native spatial context, our lab has developed innovative spatial genomics and transcriptomics tools that enable multi-omics profiling at single-cell resolution while preserving intact tissue organization. This thesis presents two projects that leverage these tools to investigate cellular diversity in complex tissues across different biological scales, from subnuclear to tissue-level organization. In Chapter 2, we applied spatial multi-omics to the mouse cerebellum, achieving single-cell resolution profiling of 100,049 genomic loci, 17,856 nascent transcripts, 60 mature mRNAs, and 28 immunofluorescently labeled subnuclear structures. To achieve this, we developed innovative two-layer barcodes for DNA sequential fluorescence in situ hybridization (seqFISH). Combining cell-type information from nascent and mature transcriptomes, we captured the three-dimensional genomic architecture and its interactions with subnuclear compartments in a cell-type-specific manner. 
Our findings show that repressive chromatin compartments have greater cell-type specificity than active chromatin compartments in the mouse cerebellum. In Chapter 3, we integrated single-cell multiome sequencing, which profiles single-nucleus RNA and chromatin accessibility (ATAC) from the same cells, with seqFISH spatial transcriptomics. This approach was applied to the 17- to 18-week-old human fetal kidney, targeting 224 marker genes. By combining sequencing and spatial profiling data, we constructed a comprehensive developmental atlas of human kidney organogenesis, providing new insights into the tissue organization and gene expression patterns during kidney development.",
        "doi": "10.7907/r85x-qs80",
        "publication_date": "2025",
        "thesis_type": "phd",
        "thesis_year": "2025"
    },
    {
        "id": "thesis:17117",
        "collection": "thesis",
        "collection_id": "17117",
        "cite_using_url": "https://resolver.caltech.edu/CaltechThesis:03312025-203601435",
        "primary_object_url": {
            "basename": "KatsuyaColon_Thesis_Final.pdf",
            "content": "final",
            "filesize": 13949697,
            "license": "other",
            "mime_type": "application/pdf",
            "url": "/17117/1/KatsuyaColon_Thesis_Final.pdf",
            "version": "v4.0.0"
        },
        "type": "thesis",
        "title": "In Situ Signal Amplification for Spatial Transcriptomics Using Programmable DNA Assemblies",
        "author": [
            {
                "family_name": "Col\u00f3n",
                "given_name": "Katsuya Lex",
                "orcid": "0000-0002-7347-6128",
                "clpid": "Col\u00f3n-Katsuya-Lex"
            }
        ],
        "thesis_advisor": [
            {
                "family_name": "Cai",
                "given_name": "Long",
                "orcid": "0000-0002-7154-5361",
                "clpid": "Cai-Long"
            }
        ],
        "thesis_committee": [
            {
                "family_name": "Ismagilov",
                "given_name": "Rustem F.",
                "orcid": "0000-0002-3680-4399",
                "clpid": "Ismagilov-R-F"
            },
            {
                "family_name": "Shapiro",
                "given_name": "Mikhail G.",
                "orcid": "0000-0002-0291-4215",
                "clpid": "Shapiro-M-G"
            },
            {
                "family_name": "Thomson",
                "given_name": "Matthew",
                "orcid": "0000-0003-1021-1234",
                "clpid": "Thomson-M-W"
            },
            {
                "family_name": "Cai",
                "given_name": "Long",
                "orcid": "0000-0002-7154-5361",
                "clpid": "Cai-Long"
            }
        ],
        "local_group": [
            {
                "literal": "div_chem"
            }
        ],
        "abstract": "Sequential Fluorescent In Situ Hybridization (seqFISH) has been an invaluable tool in imaging-based spatial transcriptomics, aiding researchers in elucidating spatially-resolved, gene expression patterns in intact tissues and cell culture models. However, methods that rely on smFISH, such as seqFISH, suffer from poor signal-to-noise ratio in certain tissue types or target RNA, require many fluorescently labeled RNA targeting probes which prohibits imaging of small RNA species, and exhibit poor sample throughput due to the need of high magnification objective or long exposure times. Herein, we develop solutions to these limitations by developing and utilizing a robust signal amplification strategy. While various amplification technologies exist, their limitations often hinder broad applicability. Moreover, we desire an amplification platform that is amenable to the denaturing wash conditions used in seqFISH. We will begin Chapter I by discussing the background, technical challenges, and utility of various in situ signal amplification technologies. Chapter II details the exploration and technical limitations of rolling circle amplification (RCA) and branched DNA (bDNA) assembly utilizing ssDNA padlock amplifier strands. Chapter III discusses the design and development of a novel amplification strategy called Signal amPlicAtion by Recursive Crosslinking (SPARC), which builds upon the knowledge gained from Chapter II. We highlight SPARC as a unique photochemical signal amplification method that iteratively deposits amplifier strands near the primary probe target for linear signal amplification. Then, the deposited amplifier strands act as a scaffold for branched DNA assembly, leading to an exponential signal amplification. Through each deposition and assembly step, amplifier strands are photo-crosslinked to the extracellular matrix, forming highly stable DNA nanostructures that can withstand harsh denaturing wash conditions. 
We demonstrate the utility of SPARC in amplifying signal of both single-molecule transcripts and proteins.",
        "doi": "10.7907/pp5f-pk64",
        "publication_date": "2025",
        "thesis_type": "phd",
        "thesis_year": "2025"
    },
    {
        "id": "thesis:17070",
        "collection": "thesis",
        "collection_id": "17070",
        "cite_using_url": "https://resolver.caltech.edu/CaltechTHESIS:03182025-023751137",
        "type": "thesis",
        "title": "RNA-Mediated Toxicity In Neurodegeneration: The Mechanistic Role Of The C9ORF72 Repeat Expansion In ALS Molecular Pathogenesis",
        "author": [
            {
                "family_name": "Bhattacharya",
                "given_name": "Paulomi",
                "clpid": "Bhattacharya-Paulomi"
            }
        ],
        "thesis_advisor": [
            {
                "family_name": "Guttman",
                "given_name": "Mitchell",
                "orcid": "0000-0003-4748-9352",
                "clpid": "Guttman-M"
            }
        ],
        "thesis_committee": [
            {
                "family_name": "Shapiro",
                "given_name": "Mikhail G.",
                "orcid": "0000-0002-0291-4215",
                "clpid": "Shapiro-M-G"
            },
            {
                "family_name": "Ichida",
                "given_name": "Justin K.",
                "orcid": "0000-0002-8827-8087",
                "clpid": "Ichida-Justin-K"
            },
            {
                "family_name": "Lester",
                "given_name": "Henry A.",
                "orcid": "0000-0002-5470-5255",
                "clpid": "Lester-H-A"
            },
            {
                "family_name": "Thomson",
                "given_name": "Matthew",
                "orcid": "0000-0003-1021-1234",
                "clpid": "Thomson-M-W"
            }
        ],
        "local_group": [
            {
                "literal": "div_bbe"
            }
        ],
        "abstract": "The G4C2 hexanucleotide repeat expansion in the first intron of the C9ORF72 gene is the most common genetic mutation linked to ALS, accounting for ~40 percent of familial and 10 percent of sporadic cases. Yet, its functional contribution to molecular pathogenesis remains unknown. The prevailing model is that this expansion leads to transcription of a novel RNA (C9-repeat RNA) that leads to disease either through its RNA product or translation of dipeptide repeat proteins it encodes (\u201cgain-of-function\u201d). However, recent attempts to degrade the C9-repeat RNA in several major clinical trials have failed to show any improvement in C9-ALS patients, raising questions about what role, if any, the C9-repeat RNA plays in ALS pathogenesis. Here, we demonstrate that the C9-repeat RNA is not detectable in C9-ALS patient-derived iPSNs or postmortem brain tissue. We show that transcription of the C9ORF72 gene initiates downstream of the G4C2 repeat sequence with the repeat expansion residing at a promoter-proximal region and displaying chromatin signatures of an enhancer. Because this region is GC-rich and has been reported to be preferentially methylated in C9-ALS patients, we explored whether this repeat expansion might lead to reduced C9ORF72 gene expression. We show that the C9-repeat is associated with reduced allele-specific expression of the C9ORF72 gene, consistent with the GC-rich features of the repeat expansion and previous reports of preferential DNA methylation in C9-ALS patients. Taken together, our findings challenge the prevailing gain-of-function models in C9-ALS and instead suggest that the repeat expansion region may function as a regulatory element that silences C9ORF72 expression from the mutant allele.",
        "doi": "10.7907/2ywx-7a47",
        "publication_date": "2025",
        "thesis_type": "phd",
        "thesis_year": "2025"
    },
    {
        "id": "thesis:17219",
        "collection": "thesis",
        "collection_id": "17219",
        "cite_using_url": "https://resolver.caltech.edu/CaltechTHESIS:05112025-035044867",
        "type": "thesis",
        "title": "Bridging Space and Time: Resolving the Temporal Dynamics of the Seminiferous Epithelial Cycle Using Spatial Transcriptomics",
        "author": [
            {
                "family_name": "Chakravorty",
                "given_name": "Arun",
                "orcid": "0000-0003-2890-0855",
                "clpid": "Chakravorty-Arun"
            }
        ],
        "thesis_advisor": [
            {
                "family_name": "Cai",
                "given_name": "Long",
                "orcid": "0000-0002-7154-5361",
                "clpid": "Cai-Long"
            }
        ],
        "thesis_committee": [
            {
                "family_name": "Guttman",
                "given_name": "Mitchell",
                "orcid": "0000-0003-4748-9352",
                "clpid": "Guttman-M"
            },
            {
                "family_name": "Elowitz",
                "given_name": "Michael B.",
                "orcid": "0000-0002-1221-0967",
                "clpid": "Elowitz-M-B"
            },
            {
                "family_name": "Thomson",
                "given_name": "Matthew",
                "orcid": "0000-0003-1021-1234",
                "clpid": "Thomson-M-W"
            },
            {
                "family_name": "Cai",
                "given_name": "Long",
                "orcid": "0000-0002-7154-5361",
                "clpid": "Cai-Long"
            }
        ],
        "local_group": [
            {
                "literal": "div_bbe"
            }
        ],
        "abstract": "Biology is inherently spatial, with tissue architecture and cell\u2013cell interactions shaping dynamic developmental and homeostatic processes. In this thesis, we harness high-resolution spatial transcriptomics via RNA seqFISH+ to show how spatial information can be used to resolve temporal information in complex tissues, using adult mouse spermatogenesis as a model. By profiling 2,638 genes in over 216,000 cells, we find that each seminiferous tubule cross-section represents a distinct timepoint of the seminiferous epithelial cycle, and collectively all tubules form a circular topology in gene expression space that precisely aligns with the known 12-stage progression. Intriguingly, Sertoli cells exhibit a robust cyclic transcriptional program synchronized with germ cell differentiation, raising the question of whether this cycle is driven solely by germ cells or whether Sertoli cells display an intrinsic cyclic expression profile. To address this, we ablate differentiating germ cells using a DNA alkylating agent, busulfan. In this model, despite the lack of differentiating germ cells, Sertoli cells maintain much of their cyclic expression suggesting an autonomous cycle that partially dephases without germ cell input. Integrative analyses suggest that the underlying mechanism of this oscillation may involve an innate retinoic acid metabolic cycle and/or an interconnected transcription factor network. Finally, we discuss how these findings broaden our understanding of tissue processes and propose that spatial transcriptomics can be adopted to reconstruct temporal dynamics for many tissues from static snapshots.",
        "doi": "10.7907/2rcd-0v79",
        "publication_date": "2025",
        "thesis_type": "phd",
        "thesis_year": "2025"
    },
    {
        "id": "thesis:16459",
        "collection": "thesis",
        "collection_id": "16459",
        "cite_using_url": "https://resolver.caltech.edu/CaltechTHESIS:06012024-054725051",
        "primary_object_url": {
            "basename": "240531_PB_thesis_final.pdf",
            "content": "final",
            "filesize": 44817586,
            "license": "other",
            "mime_type": "application/pdf",
            "url": "/16459/1/240531_PB_thesis_final.pdf",
            "version": "v4.0.0"
        },
        "type": "thesis",
        "title": "Modeling and Design of Synthetic Biochemical Circuits for Biological Phenotypes",
        "author": [
            {
                "family_name": "Bhamidipati",
                "given_name": "Pranav Subramanyam",
                "orcid": "0000-0002-6199-6505",
                "clpid": "Bhamidipati-Pranav-Subramanyam"
            }
        ],
        "thesis_advisor": [
            {
                "family_name": "Thomson",
                "given_name": "Matthew",
                "orcid": "0000-0003-1021-1234",
                "clpid": "Thomson-M-W"
            }
        ],
        "thesis_committee": [
            {
                "family_name": "Elowitz",
                "given_name": "Michael B.",
                "orcid": "0000-0002-1221-0967",
                "clpid": "Elowitz-M-B"
            },
            {
                "family_name": "Bois",
                "given_name": "Justin S.",
                "orcid": "0000-0001-7137-8746",
                "clpid": "Bois-J-S"
            },
            {
                "family_name": "Barr",
                "given_name": "Alan H.",
                "clpid": "Barr-A-H"
            },
            {
                "family_name": "Thomson",
                "given_name": "Matthew",
                "orcid": "0000-0003-1021-1234",
                "clpid": "Thomson-M-W"
            }
        ],
        "local_group": [
            {
                "literal": "div_bbe"
            }
        ],
        "abstract": "<p>Biological behaviors arise from the dynamical interactions of biochemical networks. For example, the various immune responses to damage are manifestations of signaling networks between immune cell types. A central goal in systems and synthetic biology is to elucidate the design principles of these networks, or circuits, both in the sense of dissecting how function arises from structure in the natural context and in the sense of understanding the guidelines for optimal engineering of synthetic biological systems. The study of design principles in both senses is aided by mathematical modeling and simulation, which provide a self-consistent framework for evaluating the theoretical implications of biological hypotheses as well as a testbed for the development of novel circuits for desired biological phenotypes. This thesis pertains to two related challenges in this field, namely the scaling of computational design to larger circuits and the engineering of global phenotypes that emerge nonlinearly from local interactions.</p> \r\n    \r\n<p>The first section of this thesis presents a novel design platform for biological circuits, called CircuiTree, that uses a game-playing paradigm to overcome the combinatorial complexity of \\textit{de novo} circuit design. This platform treats circuit design as a game of circuit assembly and traverses the tree of possible assemblies using Monte Carlo tree search (MCTS). Borrowed from artificial intelligence (AI) agents that have mastered complex games, MCTS is a reinforcement learning (RL)-based search algorithm that efficiently searches for the most effective design strategies and naturally discovers design principles in the form of network motifs, which appear as clusters of solutions in the search tree. 
Finally, when tasked with designing fault-tolerant oscillators with five components, CircuiTree finds a novel design strategy, which we call motif multiplexing, in which multiple sub-oscillators are interleaved so as to render the circuit highly resistant to deletions and knockdowns. This design principle, which may be responsible for the multiple oscillatory loops observed in eukaryotic circadian clocks, opens the possibility of engineering synthetic circuits at a larger scale and suggests that larger biological circuits contain yet-unknown design features that are not simply extensions of smaller circuits.</p>\r\n\r\n<p>The second section describes a novel mechanosensitive property of the SynNotch synthetic chimeric receptor and uses a multicellular modeling framework to show how it can be used to control spatiotemporal patterning \\textit{in vitro}. Modified from the endogenous juxtacrine receptor Notch, SynNotch binds to an arbitrary extracellular ligand and, in response, releases an arbitrary transcription factor, thus acting as a user-defined signal transducer. We show that, in mouse fibroblasts, a simple sender-receiver SynNotch circuit ceases to transduce a membrane-bound GFP signal at high cell densities in 2D culture. Because of this feature, a lawn of cells expressing a signal-relay circuit, which we call the transceiver circuit, can undergo spatially limited activation, where the signal propagates in a wave outward from a GFP-expressing sender cell until, due to cell division, the cell density crosses a threshold value and the signaling system shuts down. Using a multicellular lattice-based model combined with experiments, we demonstrate that perturbations of growth parameters can be used to control the size of activated spots. 
Finally, we achieve spatiotemporal patterns of activation by seeding the growth dish nonuniformly, creating a wave of activation at the millimeter scale that recapitulates the kinematic wave patterning phenomenon observed during vertebrate somitogenesis.</p>\r\n\r\n<p>Together, this body of work represents an advance in the use of computational methods and mathematical modeling to guide the design and control of complex biological phenotypes. Advances in these methods promise to catalyze the development of more advanced cell-based therapies and engineered tissues.</p>",
        "doi": "10.7907/gpc6-hb40",
        "publication_date": "2024-06-14",
        "thesis_type": "phd",
        "thesis_year": "2024"
    },
    {
        "id": "thesis:16431",
        "collection": "thesis",
        "collection_id": "16431",
        "cite_using_url": "https://resolver.caltech.edu/CaltechTHESIS:05282024-221603734",
        "primary_object_url": {
            "basename": "MorganSchwartz_Thesis_20240601.pdf",
            "content": "final",
            "filesize": 21199702,
            "license": "other",
            "mime_type": "application/pdf",
            "url": "/16431/2/MorganSchwartz_Thesis_20240601.pdf",
            "version": "v5.0.0"
        },
        "type": "thesis",
        "title": "Accelerating Biological Discovery with Deep Learning and Spatial Optical Barcodes",
        "author": [
            {
                "family_name": "Schwartz",
                "given_name": "Morgan Sarah",
                "orcid": "0000-0001-8131-9125",
                "clpid": "Schwartz-Morgan-Sarah"
            }
        ],
        "thesis_advisor": [
            {
                "family_name": "Van Valen",
                "given_name": "David A.",
                "orcid": "0000-0001-7534-7621",
                "clpid": "Van-Valen-D"
            }
        ],
        "thesis_committee": [
            {
                "family_name": "Rothenberg",
                "given_name": "Ellen V.",
                "orcid": "0000-0002-3901-347X",
                "clpid": "Rothenberg-E-V"
            },
            {
                "family_name": "Thomson",
                "given_name": "Matthew",
                "orcid": "0000-0003-1021-1234",
                "clpid": "Thomson-M-W"
            },
            {
                "family_name": "Cai",
                "given_name": "Long",
                "orcid": "0000-0002-7154-5361",
                "clpid": "Cai-Long"
            },
            {
                "family_name": "Sternberg",
                "given_name": "Paul W.",
                "orcid": "0000-0002-7699-0173",
                "clpid": "Sternberg-P-W"
            },
            {
                "family_name": "Van Valen",
                "given_name": "David A.",
                "orcid": "0000-0001-7534-7621",
                "clpid": "Van-Valen-D"
            }
        ],
        "local_group": [
            {
                "literal": "div_bbe"
            }
        ],
        "abstract": "Methodological advances in biology have given us a powerful suite of tools for measuring the state of the cell. Among these methods, next-generation sequencing, including single-cell methods, enables comprehensive measurement of gene expression; however, sequencing-based methods often preclude the collection of other visible phenotypic information. In contrast, light microscopy supports many different measurements that can be acquired in sequential rounds of labeling and imaging because light microscopy does not destroy the sample. Furthermore, light microscopy supports live cell imaging, including the use of fluorescent reporters to observe signaling dynamics in real time. In order to fully understand cellular function, multimodal data collection is needed that encompasses live cell response, end-point phenotypes, and finally perturbations to test the components of relevant signaling networks. In this thesis, I present key advances to create a unified experimental platform for interrogating the cell state. This platform uses light microscopy to collect multimodal measurements of cell state while supporting high-throughput perturbation screening. This platform is supported by a suite of deep learning analysis tools to enable quantitative analysis of these high-dimensional datasets. In Chapter 2, I introduce Caliban, our deep learning method for nuclear segmentation and tracking. In Chapter 3, I present a new method of optical barcodes to enable microscopy-based pooled perturbation screens. Finally, in Chapter 4, I describe preliminary work that leverages the previously described cell tracking and barcoding methodologies to explore the interdependencies of signaling pathway dynamics.",
        "doi": "10.7907/55c7-8142",
        "publication_date": "2024",
        "thesis_type": "phd",
        "thesis_year": "2024"
    },
    {
        "id": "thesis:16213",
        "collection": "thesis",
        "collection_id": "16213",
        "cite_using_url": "https://resolver.caltech.edu/CaltechTHESIS:10232023-184021847",
        "primary_object_url": {
            "basename": "saladi-dissertation.pdf",
            "content": "final",
            "filesize": 327264663,
            "license": "other",
            "mime_type": "application/pdf",
            "url": "/16213/1/saladi-dissertation.pdf",
            "version": "v6.0.0"
        },
        "type": "thesis",
        "title": "Some Computer Studies of Membrane Proteins, Molecular Chaperones, and Color",
        "author": [
            {
                "family_name": "Saladi",
                "given_name": "Shyam Madhukar",
                "orcid": "0000-0001-9701-3059",
                "clpid": "Saladi-Shyam-Madhukar"
            }
        ],
        "thesis_advisor": [
            {
                "family_name": "Clemons",
                "given_name": "William M.",
                "orcid": "0000-0002-0021-889X",
                "clpid": "Clemons-W-M"
            }
        ],
        "thesis_committee": [
            {
                "family_name": "Murray",
                "given_name": "Richard M.",
                "orcid": "0000-0002-5785-7481",
                "clpid": "Murray-R-M"
            },
            {
                "family_name": "Clemons",
                "given_name": "William M.",
                "orcid": "0000-0002-0021-889X",
                "clpid": "Clemons-W-M"
            },
            {
                "family_name": "Thomson",
                "given_name": "Matthew",
                "orcid": "0000-0003-1021-1234",
                "clpid": "Thomson-M-W"
            },
            {
                "family_name": "Rees",
                "given_name": "Douglas C.",
                "orcid": "0000-0003-4073-1185",
                "clpid": "Rees-D-C"
            },
            {
                "family_name": "Hoelz",
                "given_name": "Andre",
                "orcid": "0000-0003-0923-3284",
                "clpid": "Hoelz-A"
            }
        ],
        "local_group": [
            {
                "literal": "div_chem"
            }
        ],
        "abstract": "This thesis shares a series of stories on seemingly disparate topics united by my efforts and love of computers. Initially, I discuss how the challenge of membrane protein expression provided an initial impetus for research. I channeled efforts towards developing a predictive (machine-learning) model for heterologous overexpression in E. coli. While we made strides to extend this model to other systems (not discussed here), my time was refocused onto questions of more fundamental biochemical interest: the biogenesis of tail-anchored membrane proteins. I built structural, predictive, and phylogenetic models to better understand how the C-terminal domain of co-chaperone Sgt2 functioned, refined the definition of the wider Sti1 family which includes Sgt2-C, and extended our understanding of those features of tail-anchored proteins that determine successful targeting in Yeast and Human cells. I developed a deep phylogeny of Get3, a chaperone involved in tail-anchored protein biogenesis, and helped specifically place Get3 proteins of photosynthesising organisms into evolutionary context. Along the way, I developed a parallel and compelling theme around data visualization, specifically around the use of colormaps across the life sciences. In particular, I built an application to screen and notify preprint authors when their manuscript had poor colormap usage. This was the first time automated software has been used to help authors improve their work at the preprint stage, an area that has grown significantly since my initial work. Finally, I brought together structural biology and data visualization by making perceptually uniform colormaps available in popular molecular visualization software tools to advocate for more thoughtful color usage in the field.",
        "doi": "10.7907/40cw-kn70",
        "publication_date": "2024",
        "thesis_type": "phd",
        "thesis_year": "2024"
    },
    {
        "id": "thesis:16486",
        "collection": "thesis",
        "collection_id": "16486",
        "cite_using_url": "https://resolver.caltech.edu/CaltechTHESIS:06032024-182223499",
        "primary_object_url": {
            "basename": "Thesis_Draft_final_final.pdf",
            "content": "final",
            "filesize": 21944874,
            "license": "other",
            "mime_type": "application/pdf",
            "url": "/16486/1/Thesis_Draft_final_final.pdf",
            "version": "v4.0.0"
        },
        "type": "thesis",
        "title": "Revealing Regulatory Network Organization Through Single-Cell Perturbation Profiling and Maximum Entropy Models",
        "author": [
            {
                "family_name": "Jiang",
                "given_name": "Jialong",
                "orcid": "0000-0001-8560-8397",
                "clpid": "Jiang-Jialong"
            }
        ],
        "thesis_advisor": [
            {
                "family_name": "Thomson",
                "given_name": "Matthew",
                "orcid": "0000-0003-1021-1234",
                "clpid": "Thomson-M-W"
            }
        ],
        "thesis_committee": [
            {
                "family_name": "Elowitz",
                "given_name": "Michael B.",
                "orcid": "0000-0002-1221-0967",
                "clpid": "Elowitz-M-B"
            },
            {
                "family_name": "Phillips",
                "given_name": "Robert B.",
                "orcid": "0000-0003-3082-2809",
                "clpid": "Phillips-R"
            },
            {
                "family_name": "Pachter",
                "given_name": "Lior S.",
                "orcid": "0000-0002-9164-6231",
                "clpid": "Pachter-L"
            },
            {
                "family_name": "Thomson",
                "given_name": "Matthew",
                "orcid": "0000-0003-1021-1234",
                "clpid": "Thomson-M-W"
            }
        ],
        "local_group": [
            {
                "literal": "div_bbe"
            }
        ],
        "abstract": "Gene regulatory networks within cells modulate the expression of the genome in response to signals and changing environmental conditions. Reconstructions of gene regulatory networks can reveal the information processing and control principles used by cells to maintain homeostasis and execute cell-state transitions. In this thesis, we introduce a computational framework, D-SPIN, that generates quantitative models of gene regulatory networks from single-cell mRNA-seq datasets collected across thousands of distinct perturbation conditions. D-SPIN models the cell as a collection of interacting gene-expression programs, and constructs a probabilistic model to infer regulatory interactions between gene-expression programs and external perturbations. Using large Perturb-seq and drug-response datasets, we demonstrate that D-SPIN models reveal the organization of cellular pathways, sub-functions of macromolecular complexes, and the logic of cellular regulation of transcription, translation, metabolism, and protein degradation in response to gene knockdown perturbations. D-SPIN can also be applied to dissect drug response mechanisms in heterogeneous cell populations, elucidating how combinations of immunomodulatory drugs can induce novel cell states through additive recruitment of gene expression programs. D-SPIN provides a computational framework for constructing interpretable models of gene-regulatory networks to reveal principles of cellular information processing and physiological control.",
        "doi": "10.7907/5zta-9818",
        "publication_date": "2024",
        "thesis_type": "phd",
        "thesis_year": "2024"
    },
    {
        "id": "thesis:16105",
        "collection": "thesis",
        "collection_id": "16105",
        "cite_using_url": "https://resolver.caltech.edu/CaltechTHESIS:06112023-211027828",
        "type": "thesis",
        "title": "Stem Cell-Derived Embryo Models in Mouse and Human to Illuminate the \u201cBlack Box\u201d of Pre- to Post-Implantation Development",
        "author": [
            {
                "family_name": "Jorgensen",
                "given_name": "Victoria Lynn",
                "orcid": "0000-0002-4205-6198",
                "clpid": "Jorgensen-Victoria-Lynn"
            }
        ],
        "thesis_advisor": [
            {
                "family_name": "Zernicka-Goetz",
                "given_name": "Magdalena",
                "orcid": "0000-0002-7004-2471",
                "clpid": "Zernicka-Goetz-M"
            }
        ],
        "thesis_committee": [
            {
                "family_name": "Guttman",
                "given_name": "Mitchell",
                "orcid": "0000-0003-4748-9352",
                "clpid": "Guttman-M"
            },
            {
                "family_name": "Hay",
                "given_name": "Bruce A.",
                "orcid": "0000-0002-5486-0482",
                "clpid": "Hay-B-A"
            },
            {
                "family_name": "Parker",
                "given_name": "Joseph",
                "orcid": "0000-0001-9598-2454",
                "clpid": "Parker-J"
            },
            {
                "family_name": "Thomson",
                "given_name": "Matthew",
                "orcid": "0000-0003-1021-1234",
                "clpid": "Thomson-M-W"
            },
            {
                "family_name": "Zernicka-Goetz",
                "given_name": "Magdalena",
                "orcid": "0000-0002-7004-2471",
                "clpid": "Zernicka-Goetz-M"
            }
        ],
        "local_group": [
            {
                "literal": "div_bbe"
            }
        ],
        "abstract": "<p>Mammalian development is a complex and highly regulated process by which a single cell, the totipotent zygote, gives rise to all lineages of the future organism.  While incredible advancements have been made to study and understand the earliest events of our life, many questions are still unanswered. Moreover, the most precarious stage of development, implantation, remains a \u201cblack box\u201d to researchers due to inaccessibility of the embryo within the uterus of the mother. In the last decade, however, the emergence of stem cell derived embryos represents an exciting alternative avenue to study these dynamic stages.</p>\r\n\r\n<p>During my PhD, I worked to establish two pre-implantation stem cell models, one in human and one in mouse, to better understand the earliest days of mammalian development. These models replicate the blastocyst stage of development; at this point in time the embryo is ready to implant into the uterus and contains all embryonic and extra-embryonic tissues needed to form the future organism: the epiblast, the hypoblast, and the trophectoderm. Beginning with my human model, I demonstrate the ability of a single cell type, expanded potential stem cells (EPSCs), to give rise to structures that replicate the natural blastocyst in size, morphology, and initiation of lineage segregation. Furthermore, these human blastocyst-like structures can undergo the very beginning of post-implantation remodeling by forming an epiblast rosette and initiating lumenogenesis.  Nevertheless, single cell RNA-seq (scRNA-seq) analysis reveals that lineages are not fully committed in this model, perhaps explaining why development is limited in these structures up to about Day 7/8. 
In the context of my mouse model, I combine not one but three distinct cell types to generate blastocyst-like structures: 1) wildtype embryonic stem cells (ESCs) to form the epiblast, 2) trophoblast stem cells (TSCs) to form the trophectoderm, and 3) Gata4-inducible ESCs to form the primitive endoderm. Again, these structures mimic the natural mouse blastocyst in morphology and lineage segregation and demonstrate the ability to transition to post-implantation stages. Development of the three blastocyst lineages was further confirmed via global scRNA-seq analysis comparing our Gata4i-Blastoids to natural embryos; importantly, however, this analysis also showed that differentiation of the mural trophectoderm, the tissue responsible for uterine invasion, is lacking in our stem cell model, which likely explains the inability of these blastoids to implant <i>in vivo</i>.</p>\r\n\r\n<p>Altogether, this dissertation explains key aspects of pre- to post-implantation development and highlights the incredible power of stem cell-derived embryos to self-organize into structures that closely mimic the natural embryo.</p>",
        "doi": "10.7907/t1fe-3915",
        "publication_date": "2023",
        "thesis_type": "phd",
        "thesis_year": "2023"
    },
    {
        "id": "thesis:14986",
        "collection": "thesis",
        "collection_id": "14986",
        "cite_using_url": "https://resolver.caltech.edu/CaltechTHESIS:07252022-061122576",
        "primary_object_url": {
            "basename": "Thesis Ronghui Zhu.pdf",
            "content": "final",
            "filesize": 31903908,
            "license": "other",
            "mime_type": "application/pdf",
            "url": "/14986/1/Thesis Ronghui Zhu.pdf",
            "version": "v4.0.0"
        },
        "type": "thesis",
        "title": "Multicellular Circuit Design in Mammalian Cells",
        "author": [
            {
                "family_name": "Zhu",
                "given_name": "Ronghui",
                "orcid": "0000-0001-8171-482X",
                "clpid": "Zhu-Ronghui"
            }
        ],
        "thesis_advisor": [
            {
                "family_name": "Elowitz",
                "given_name": "Michael B.",
                "orcid": "0000-0002-1221-0967",
                "clpid": "Elowitz-M-B"
            }
        ],
        "thesis_committee": [
            {
                "family_name": "Hay",
                "given_name": "Bruce A.",
                "orcid": "0000-0002-5486-0482",
                "clpid": "Hay-B-A"
            },
            {
                "family_name": "Bjorkman",
                "given_name": "Pamela J.",
                "orcid": "0000-0002-2277-3990",
                "clpid": "Bjorkman-P-J"
            },
            {
                "family_name": "Murray",
                "given_name": "Richard M.",
                "orcid": "0000-0002-5785-7481",
                "clpid": "Murray-R-M"
            },
            {
                "family_name": "Thomson",
                "given_name": "Matthew",
                "orcid": "0000-0003-1021-1234",
                "clpid": "Thomson-M-W"
            },
            {
                "family_name": "Elowitz",
                "given_name": "Michael B.",
                "orcid": "0000-0002-1221-0967",
                "clpid": "Elowitz-M-B"
            }
        ],
        "local_group": [
            {
                "literal": "div_bbe"
            }
        ],
        "abstract": "<p>Multicellular circuits control the development of multicellular organisms, through programming processes such as cell proliferation, cell differentiation, cell movement, and cell signaling. A fundamental goal of biology is to understand the design principles of these multicellular circuits, and use these principles to design synthetic multicellular systems for therapeutic purposes. Top-down approaches, for example analyzing embryos bearing genetic mutations, have identified key genes in many multicellular circuits, but are challenging to study these circuits in an isolated context and in a quantitative and systematic manner. An alternative, complementary approach is to engineer or reconstitute multicellular circuits from bottom-up, which allows us to overcome the limitations of top-down approach and gain quantitative insights into multicellular circuit design. In this thesis, we use this bottom-up approach to explore the design principles of two multicellular circuits. In the first project, we took inspiration from two prevalent features from natural multistable circuits, namely competitive protein-protein interactions and positive autoregulation, to design a synthetic multistable circuit architecture called MultiFate. Both in the model and in the experiment, MultiFate circuits generate multiple cellular states, each stable for weeks, allow control over state-switching and state stability, and can be easily expanded to generate more states. In the second project, we use a gradient reconstitution system to systematically analyze a gradient modulation circuit consisting of BMP4 and its modulators, Chordin, Twsg and BMP-1. We found that the circuit can give rise to diverse gradient modulation capabilities. In particular, the full circuit is sufficient for active ligand shuttling and generation of non-monotonic displaced gradient. 
These multicellular circuits could provide a foundation for engineering synthetic multicellular systems in mammalian cells.</p>",
        "doi": "10.7907/p0fn-qa56",
        "publication_date": "2023",
        "thesis_type": "phd",
        "thesis_year": "2023"
    },
    {
        "id": "thesis:15008",
        "collection": "thesis",
        "collection_id": "15008",
        "cite_using_url": "https://resolver.caltech.edu/CaltechTHESIS:08252022-153300158",
        "primary_object_url": {
            "basename": "hirokawa_soichi_thesis.pdf",
            "content": "final",
            "filesize": 23936329,
            "license": "other",
            "mime_type": "application/pdf",
            "url": "/15008/4/hirokawa_soichi_thesis.pdf",
            "version": "v7.0.0"
        },
        "type": "thesis",
        "title": "Dynamics of Protein-Mediated Polymer Coupling and their Implications in Antibody Production and Emergent Patterning",
        "author": [
            {
                "family_name": "Hirokawa",
                "given_name": "Soichi",
                "orcid": "0000-0001-5584-2676",
                "clpid": "Hirokawa-Soichi"
            }
        ],
        "thesis_advisor": [
            {
                "family_name": "Phillips",
                "given_name": "Robert B.",
                "orcid": "0000-0003-3082-2809",
                "clpid": "Phillips-R"
            }
        ],
        "thesis_committee": [
            {
                "family_name": "Schwab",
                "given_name": "Keith C.",
                "orcid": "0000-0001-8216-4815",
                "clpid": "Schwab-K-C"
            },
            {
                "family_name": "Phillips",
                "given_name": "Robert B.",
                "orcid": "0000-0003-3082-2809",
                "clpid": "Phillips-R"
            },
            {
                "family_name": "Thomson",
                "given_name": "Matthew",
                "orcid": "0000-0003-1021-1234",
                "clpid": "Thomson-M-W"
            },
            {
                "family_name": "Hsieh",
                "given_name": "David",
                "orcid": "0000-0002-0812-955X",
                "clpid": "Hsieh-David"
            }
        ],
        "local_group": [
            {
                "literal": "div_eng"
            }
        ],
        "abstract": "<p>Proteins serve a wide range of functions in and out of the cell, from signaling and gene regulation to transport and structural reinforcement. These functions are usually carried out from interactions with other molecules in the surrounding medium such as other proteins, small molecules, or DNA. One such class of proteins are what I will call polymer-coupling proteins: these proteins intentionally link identical polymers or two regions of the same polymer together so that their coupled interactions critically affect the state of the biological system. A vast array of such proteins exist in nature with roles such as the looping of DNA to physically inhibit the expression of a gene or the formation of the cytoskeleton which provides a cell with its shape. In this thesis, I use <i>in vitro</i> experimental methods to explore two cases of coupling proteins and understand their roles not only in reorganizing their complementary polymers but influencing the final state of their respective systems. </p>\r\n\r\n<p>In Chapter 2, I examine the starting process for the assembly of an antibody-encoding gene in developing immune cells. Motivated by data suggesting that some antibodies are less likely to be made than others, I explore how the early steps of constructing an antibody-encoding gene affect this uneven frequency of assembly. To initiate recombination, the recombination-activating gene (RAG) protein complex simultaneously binds and cuts two well-recognized sequences neighboring two antibody-encoding gene segments in order to allow other proteins to combine these exposed segments together. The sequences to which the RAG protein performs its binding and cutting functions have certain identifiable sequence patterns but can still vary. 
Through a single-molecule experimental method known as tethered particle motion (TPM), I show how changes to the binding site sequence can enhance or diminish the propensity of the RAG protein to bind and cut the DNA and thus explore the consequences of these altered interactions in the unequal selection for certain antibody gene segments over others. </p>\r\n\r\n<p>In Chapter 3, I turn to questions of the emergence of order from self-organization in biological systems. From the molecular to the population scale, biology constantly demonstrates that with an injection of energy, systems can be driven out of equilibrium, allowing their constituents to organize. A case of such organization in cells is the coupling of microtubules by motor proteins to create and maintain the mitotic spindle, a critical biological architecture for ensuring that each cell obtains a copy of the genome during division. <i>In vitro</i> experiments that exploit similar motor-microtubule interactions have become a convenient way to identify the effects of perturbing a key player such as motor properties or boundary conditions of the system on the spatiotemporal extent of organization. However, in many instances, the dynamics under which such cytoskeletal systems reduce their entropy over the course of creating order have not been carefully examined in experimental systems. Here, I use engineered light-dimerizable motors that can give rise to the formation of a highly connected network that compacts to form a dense, organized structure, and through the use of a noninvasive imaging technique observe how the polymers that make up the network continually reorganize in the bulk during a global contraction of the network.</p>",
        "doi": "10.7907/fpmm-a552",
        "publication_date": "2023",
        "thesis_type": "phd",
        "thesis_year": "2023"
    },
    {
        "id": "thesis:15041",
        "collection": "thesis",
        "collection_id": "15041",
        "cite_using_url": "https://resolver.caltech.edu/CaltechTHESIS:10132022-000100592",
        "primary_object_url": {
            "basename": "bernstein_thesis.pdf",
            "content": "final",
            "filesize": 2113002,
            "license": "other",
            "mime_type": "application/pdf",
            "url": "/15041/1/bernstein_thesis.pdf",
            "version": "v5.0.0"
        },
        "type": "thesis",
        "title": "Optimisation & Generalisation in Networks of Neurons",
        "author": [
            {
                "family_name": "Bernstein",
                "given_name": "Jeremy David",
                "orcid": "0000-0001-9110-7476",
                "clpid": "Bernstein-Jeremy-David"
            }
        ],
        "thesis_advisor": [
            {
                "family_name": "Yue",
                "given_name": "Yisong",
                "orcid": "0000-0001-9127-1989",
                "clpid": "Yue-Yisong"
            }
        ],
        "thesis_committee": [
            {
                "family_name": "Tropp",
                "given_name": "Joel A.",
                "orcid": "0000-0003-1024-1791",
                "clpid": "Tropp-J-A"
            },
            {
                "family_name": "Liu",
                "given_name": "Ming-Yu",
                "orcid": "0000-0002-2951-2398",
                "clpid": "Liu-Ming-Yu"
            },
            {
                "family_name": "Meister",
                "given_name": "Markus",
                "orcid": "0000-0003-2136-6506",
                "clpid": "Meister-M"
            },
            {
                "family_name": "Thomson",
                "given_name": "Matthew",
                "orcid": "0000-0003-1021-1234",
                "clpid": "Thomson-M-W"
            },
            {
                "family_name": "Yue",
                "given_name": "Yisong",
                "orcid": "0000-0001-9127-1989",
                "clpid": "Yue-Yisong"
            }
        ],
        "local_group": [
            {
                "literal": "div_bbe"
            }
        ],
        "abstract": "<p>The goal of this thesis is to develop the optimisation and generalisation theoretic foundations of learning in artificial neural networks. The thesis tackles two central questions. Given training data and a network architecture:</p>\r\n\r\n<ol>\r\n<li style=\"text-align:left\"><span style=\"padding-left:10px\">Which weight setting will generalise best to unseen data, and why?</span></li>\r\n<li style=\"text-align:left\"><span style=\"padding-left:10px\">What optimiser should be used to recover this weight setting?</span></li>\r\n</ol>\r\n\r\n<p>On optimisation, an essential feature of neural network training is that the network weights affect the loss function only indirectly through their appearance in the network architecture. This thesis proposes a three-step framework for deriving novel \u201carchitecture aware\u201d optimisation algorithms. The first step\u2014termed <em>functional majorisation</em>\u2014is to majorise a series expansion of the loss function in terms of functional perturbations. The second step is to derive <em>architectural perturbation bounds</em> that relate the size of functional perturbations to the size of weight perturbations. The third step is to substitute these architectural perturbation bounds into the functional majorisation of the loss and to obtain an optimisation algorithm via minimisation. This constitutes an application of the <em>majorise-minimise meta-algorithm</em> to neural networks.</p>\r\n\r\n<p>On generalisation, a promising recent line of work has applied PAC-Bayes theory to derive non-vacuous generalisation guarantees for neural networks. Since these guarantees control the average risk of ensembles of networks, they do not address which individual network should generalise best. To close this gap, the thesis rekindles an old idea from the kernels literature: the <em>Bayes point machine</em>. 
A Bayes point machine is a single classifier that approximates the aggregate prediction of an ensemble of classifiers. Since aggregation reduces the variance of ensemble predictions, Bayes point machines tend to generalise better than other ensemble members. The thesis shows that the space of neural networks consistent with a training set concentrates on a Bayes point machine if both the network width and normalised margin are sent to infinity. This motivates the practice of returning a wide network of large normalised margin.</p>\r\n\r\n<p>Potential applications of these ideas include novel methods for uncertainty quantification, more efficient numerical representations for neural hardware, and optimisers that transfer hyperparameters across learning problems.</p>",
        "doi": "10.7907/1jz8-5t85",
        "publication_date": "2023",
        "thesis_type": "phd",
        "thesis_year": "2023"
    },
    {
        "id": "thesis:15123",
        "collection": "thesis",
        "collection_id": "15123",
        "cite_using_url": "https://resolver.caltech.edu/CaltechTHESIS:03172023-050019811",
        "primary_object_url": {
            "basename": "Guru_PhD_thesis_v2.pdf",
            "content": "final",
            "filesize": 16253785,
            "license": "other",
            "mime_type": "application/pdf",
            "url": "/15123/1/Guru_PhD_thesis_v2.pdf",
            "version": "v5.0.0"
        },
        "type": "thesis",
        "title": "Engineering Artificial Systems with Natural Intelligence",
        "author": [
            {
                "family_name": "Raghavan",
                "given_name": "Guruprasad",
                "orcid": "0000-0002-1970-9963",
                "clpid": "Raghavan-Guruprasad"
            }
        ],
        "thesis_advisor": [
            {
                "family_name": "Thomson",
                "given_name": "Matthew",
                "orcid": "0000-0003-1021-1234",
                "clpid": "Thomson-M-W"
            }
        ],
        "thesis_committee": [
            {
                "family_name": "Winfree",
                "given_name": "Erik",
                "orcid": "0000-0002-5899-7523",
                "clpid": "Winfree-E"
            },
            {
                "family_name": "Rutishauser",
                "given_name": "Ueli",
                "orcid": "0000-0002-9207-7069",
                "clpid": "Rutishauser-U"
            },
            {
                "family_name": "Lois",
                "given_name": "Carlos",
                "orcid": "0000-0002-7305-2317",
                "clpid": "Lois-Carlos"
            },
            {
                "family_name": "Thomson",
                "given_name": "Matthew",
                "orcid": "0000-0003-1021-1234",
                "clpid": "Thomson-M-W"
            }
        ],
        "local_group": [
            {
                "literal": "div_bbe"
            }
        ],
        "abstract": "<p>Although Deep neural networks achieve human-like performance on a variety of perceptual and decision-making tasks, they perform poorly when confronted with changing tasks or goals, and broadly fail to match the flexibility and robustness of human intelligence. Additionally, artificial neural networks rely heavily on human-designed, hand-programmed architectures for their remarkable performance. In this thesis, I work towards achieving two goals: (i) development of a set of mathematical frameworks inspired by facets of natural intelligence, to endow artificial networks with flexibility and robustness, two key traits of natural intelligence; and (ii) inspired by the development of the biological vision system, I propose an algorithm that can \u2018grow\u2019 a functional, layered neural network from a single initial cell, with the aim of enabling autonomous development of artificial networks akin to living neural networks.</p>\r\n\r\n<p>For the first goal of endowing networks with flexibility and robustness, I propose a mathematical framework to enable continuous training of neural networks on a range of objectives by constructing path connected sets of networks, resulting in the discovery of a series of networks with equivalent functional performance on a given machine learning task. In this framework, I view the weight space of a neural network as a curved Riemannian manifold and move a network along a functionally invariant path in weight space while searching for networks that satisfy secondary objectives. A path-sampling algorithm trains computer vision and natural language processing networks with millions of weight parameters to learn a series of classification tasks without performance loss while accommodating secondary objectives including network sparsification, incremental task learning, and increased adversarial robustness. 
Broadly, for achieving this goal, I conceptualize a neural network as a mathematical object that can be iteratively transformed into distinct configurations by the path-sampling algorithm to define a sub-manifold of networks that can be harnessed to achieve user goals.</p>\r\n\r\n<p>For the second goal of \u2018growing\u2019 artificial neural networks in a manner similar to living neural networks, I develop an approach inspired by the mechanisms employed by the early visual system to wire the retina to the lateral geniculate nucleus (LGN), days before animals open their eyes. I find that the key ingredients for robust self-organization are (a) an emergent spontaneous spatiotemporal activity wave in the first layer and (b) a local learning rule in the second layer that \u2018learns\u2019 the underlying activity pattern in the first layer. As the bio-inspired developmental rule is adaptable to a wide range of input-layer geometries and robust to malfunctioning units in the first layer, it can be used to successfully grow and self-organize pooling architectures of different pool-sizes and shapes. The algorithm provides a primitive procedure for constructing layered neural networks through growth and self-organization. Finally, I also demonstrate that networks grown from a single unit perform as well as hand-crafted networks on a wide variety of static (MNIST recognition) and dynamic (gesture-recognition) tasks. Broadly, the work in the second section of this thesis shows that biologically inspired developmental algorithms can be applied to autonomously grow functional \u2018brains\u2019 in silico.</p>",
        "doi": "10.7907/374f-1202",
        "publication_date": "2023",
        "thesis_type": "phd",
        "thesis_year": "2023"
    },
    {
        "id": "thesis:15235",
        "collection": "thesis",
        "collection_id": "15235",
        "cite_using_url": "https://resolver.caltech.edu/CaltechTHESIS:05302023-215054202",
        "type": "thesis",
        "title": "Diversity in Notch Ligand-Receptor Signaling Interactions",
        "author": [
            {
                "family_name": "Kuintzle",
                "given_name": "Rachael Christine",
                "orcid": "0000-0002-1035-4983",
                "clpid": "Kuintzle-Rachael-Christine"
            }
        ],
        "thesis_advisor": [
            {
                "family_name": "Elowitz",
                "given_name": "Michael B.",
                "orcid": "0000-0002-1221-0967",
                "clpid": "Elowitz-M-B"
            }
        ],
        "thesis_committee": [
            {
                "family_name": "Pachter",
                "given_name": "Lior S.",
                "orcid": "0000-0002-9164-6231",
                "clpid": "Pachter-L"
            },
            {
                "family_name": "Thomson",
                "given_name": "Matthew",
                "orcid": "0000-0003-1021-1234",
                "clpid": "Thomson-M-W"
            },
            {
                "family_name": "Bronner",
                "given_name": "Marianne E.",
                "orcid": "0000-0003-4274-1862",
                "clpid": "Bronner-M-E"
            },
            {
                "family_name": "Hay",
                "given_name": "Bruce A.",
                "orcid": "0000-0002-5486-0482",
                "clpid": "Hay-B-A"
            },
            {
                "family_name": "Elowitz",
                "given_name": "Michael B.",
                "orcid": "0000-0002-1221-0967",
                "clpid": "Elowitz-M-B"
            }
        ],
        "local_group": [
            {
                "literal": "div_chem"
            }
        ],
        "abstract": "The ability to understand and predict signaling between different cell types is a major challenge in biology. The Notch pathway enables direct signaling through membrane-bound ligands and receptors, and is used in diverse contexts. While its canonical molecular signaling mechanism is well characterized, its many-to-many interacting pathway components, the complexity of their expression patterns, and the presence of same-cell (cis) as well as inter-cellular (trans) receptor-ligand interactions, have made it difficult to predict how a given cell will signal to others. Here, we use a cell-based approach, with Chinese hamster ovary (CHO-K1) cells and C2C12 mouse myoblasts, to systematically characterize trans-activation, cis-inhibition, and cis-activation efficiencies for the essential receptors (Notch1 and Notch2) and activating ligands (Dll1, Dll4, Jag1, and Jag2), in the presence of Lunatic Fringe (Lfng) or the enzymatically dead Lfng D289E mutant. All ligands trans-activate Notch1 and Notch2, except for Jag1, which competitively inhibits Notch1 signaling, and whose Notch1 binding strength is potentiated by Lfng. For Notch1, cis-activation is generally weaker than trans-activation, but for Notch2, cis-activation by Delta ligands is much stronger than trans-activation, and Notch2 cis-activation by Jag1 is similar in strength to trans-activation. Cis-inhibition is associated with weak cis-activation, as Dll1 and Dll4 do not cis-inhibit Notch2. Lfng expression potentiates trans-activation of both Notch1 and Notch2 by the Delta ligands and weakens trans-activation of both receptors by the Jagged ligands. The map of receptor-ligand-Fringe interaction outcomes revealed here should help guide rational perturbation and control of the Notch pathway.",
        "doi": "10.7907/w8gj-jb92",
        "publication_date": "2023",
        "thesis_type": "phd",
        "thesis_year": "2023"
    },
    {
        "id": "thesis:15247",
        "collection": "thesis",
        "collection_id": "15247",
        "cite_using_url": "https://resolver.caltech.edu/CaltechTHESIS:05312023-213322223",
        "primary_object_url": {
            "basename": "moses_lambda_2023.pdf",
            "content": "final",
            "filesize": 48841902,
            "license": "other",
            "mime_type": "application/pdf",
            "url": "/15247/1/moses_lambda_2023.pdf",
            "version": "v4.0.0"
        },
        "type": "thesis",
        "title": "Computation Foundations of Spatial Transcriptomics",
        "author": [
            {
                "family_name": "Moses",
                "given_name": "Lambda",
                "orcid": "0000-0002-7092-9427",
                "clpid": "Moses-Lambda"
            }
        ],
        "thesis_advisor": [
            {
                "family_name": "Pachter",
                "given_name": "Lior S.",
                "orcid": "0000-0002-9164-6231",
                "clpid": "Pachter-L"
            }
        ],
        "thesis_committee": [
            {
                "family_name": "Van Valen",
                "given_name": "David A.",
                "orcid": "0000-0001-7534-7621",
                "clpid": "Van-Valen-D"
            },
            {
                "family_name": "Thomson",
                "given_name": "Matthew",
                "orcid": "0000-0003-1021-1234",
                "clpid": "Thomson-M-W"
            },
            {
                "family_name": "Wold",
                "given_name": "Barbara J.",
                "orcid": "0000-0003-3235-8130",
                "clpid": "Wold-B-J"
            },
            {
                "family_name": "Pimentel",
                "given_name": "Harold",
                "orcid": "0000-0001-8556-2499",
                "clpid": "Pimentel-Harold"
            },
            {
                "family_name": "Pachter",
                "given_name": "Lior S.",
                "orcid": "0000-0002-9164-6231",
                "clpid": "Pachter-L"
            }
        ],
        "local_group": [
            {
                "literal": "div_bbe"
            }
        ],
        "abstract": "<p>Single-cell and spatial transcriptomics have come of age in the past few years; datasets and data analysis software packages have proliferated. With the increasing sizes of datasets, proliferating new data collection technologies, and mainstreaming of high-throughput technologies, the software can be improved for better speed and memory efficiency, standardized and consistent user interface for multiple technologies, and in documentation to onboard new users. First, I collected a database of spatial transcriptomics literature and analyzed the data on trends and sociology in this field. Based on the database and data analyses, I wrote a comprehensive book both qualitatively and quantitatively documenting the history of the field since the 1960s and reviewing more recent developments, which informed the software and methods I later developed. Then, to address the challenges with the pre-processing large datasets, we developed \\texttt{kallisto} \\texttt{bustools}  for fast and modular pseudoalignment of sequencing reads to the transcriptome in single-cell RNA-seq (scRNA-seq), giving consistent results with the established and much more computationally demanding alignment method Cell Ranger. Briefly summarized are my attempt to map dissociated cells in scRNA-seq to a spatial gene expression reference and to build a image processing pipeline for image based spatial transcriptomics data analysis. Finally, to address the challenges in downstream analyses of spatial -omics data, I first wrote the new \\texttt{SpatialFeatureExperiment} (SFE) data structure to represent and operate on geometries in spatial transcriptomics data and to organize results from spatial analyses. Based on SFE, I wrote Voyager, which brings decades of research in geospatial data analysis to spatial transcriptomics, to better utilize the opportunities from spatial information to gain novel biological insights. 
To reduce the user learning curve, Voyager conforms to SCE styles and conventions and has a comprehensive documentation website and a consistent user interface to many geospatial methods.</p>",
        "doi": "10.7907/rt24-pq60",
        "publication_date": "2023",
        "thesis_type": "phd",
        "thesis_year": "2023"
    },
    {
        "id": "thesis:15132",
        "collection": "thesis",
        "collection_id": "15132",
        "cite_using_url": "https://resolver.caltech.edu/CaltechTHESIS:04132023-015900885",
        "primary_object_url": {
            "basename": "Ma_Yitong_2023.pdf",
            "content": "final",
            "filesize": 17632710,
            "license": "other",
            "mime_type": "application/pdf",
            "url": "/15132/1/Ma_Yitong_2023.pdf",
            "version": "v4.0.0"
        },
        "type": "thesis",
        "title": "Multicellular Synthetic Biology in Mammalian Systems",
        "author": [
            {
                "family_name": "Ma",
                "given_name": "Yitong",
                "orcid": "0000-0003-4446-7326",
                "clpid": "Ma-Yitong"
            }
        ],
        "thesis_advisor": [
            {
                "family_name": "Elowitz",
                "given_name": "Michael B.",
                "orcid": "0000-0002-1221-0967",
                "clpid": "Elowitz-M-B"
            }
        ],
        "thesis_committee": [
            {
                "family_name": "Guttman",
                "given_name": "Mitchell",
                "orcid": "0000-0003-4748-9352",
                "clpid": "Guttman-M"
            },
            {
                "family_name": "Murray",
                "given_name": "Richard M.",
                "orcid": "0000-0002-5785-7481",
                "clpid": "Murray-R-M"
            },
            {
                "family_name": "Thomson",
                "given_name": "Matthew",
                "orcid": "0000-0003-1021-1234",
                "clpid": "Thomson-M-W"
            },
            {
                "family_name": "Elowitz",
                "given_name": "Michael B.",
                "orcid": "0000-0002-1221-0967",
                "clpid": "Elowitz-M-B"
            }
        ],
        "local_group": [
            {
                "literal": "div_bbe"
            }
        ],
        "abstract": "<p>In multicellular organisms, different types of cells use intercellular signals to communicate and regulate population dynamics, and further coordinate complex behaviors. This presents a rarely tapped potential for mammalian synthetic biology, which in the past was largely restricted to engineering a single cell type, to mimic and use similar multicellular designs to achieve more functionalities. However, with current synthetic biology tools and designs, there are several major challenges to achieving a multicellular circuit. Challenges include precise and tunable control over cell type switching, having an orthogonal cell-cell communication signal, and robust control of cell populations.</p>\r\n\r\n<p>To address these challenges, this thesis presents a system for tunable regulation of gene expression with DNA methylation, an auxin-based module for mammalian cell-cell communication, and a robust circuit for population control in mammalian cells. I further applied this work to engineering immune cells to show the potential of multicellular circuits in immunotherapies. Together, these works demonstrate the possibility of constructing multicellular circuits in mammalian systems, and that multicellular circuits can further extend the scope of synthetic biology to achieve more complex functions.</p>",
        "doi": "10.7907/w0q1-7s17",
        "publication_date": "2023",
        "thesis_type": "phd",
        "thesis_year": "2023"
    },
    {
        "id": "thesis:16081",
        "collection": "thesis",
        "collection_id": "16081",
        "cite_using_url": "https://resolver.caltech.edu/CaltechTHESIS:06042023-195408313",
        "primary_object_url": {
            "basename": "galvezmerchan_angel_2023_thesis.pdf",
            "content": "final",
            "filesize": 15283976,
            "license": "other",
            "mime_type": "application/pdf",
            "url": "/16081/1/galvezmerchan_angel_2023_thesis.pdf",
            "version": "v4.0.0"
        },
        "type": "thesis",
        "title": "Studies of mRNA Expression and Degradation",
        "author": [
            {
                "family_name": "G\u00e1lvez Merch\u00e1n",
                "given_name": "\u00c1ngel",
                "orcid": "0000-0001-7420-8697",
                "clpid": "G\u00e1lvez-Merch\u00e1n-\u00c1ngel"
            }
        ],
        "thesis_advisor": [
            {
                "family_name": "Pachter",
                "given_name": "Lior S.",
                "orcid": "0000-0002-9164-6231",
                "clpid": "Pachter-L"
            },
            {
                "family_name": "Voorhees",
                "given_name": "Rebecca M.",
                "orcid": "0000-0003-1640-2293",
                "clpid": "Voorhees-R-M"
            }
        ],
        "thesis_committee": [
            {
                "family_name": "Aravin",
                "given_name": "Alexei A.",
                "orcid": "0000-0002-6956-8257",
                "clpid": "Aravin-A-A"
            },
            {
                "family_name": "Thomson",
                "given_name": "Matthew",
                "orcid": "0000-0003-1021-1234",
                "clpid": "Thomson-M-W"
            },
            {
                "family_name": "Pachter",
                "given_name": "Lior S.",
                "orcid": "0000-0002-9164-6231",
                "clpid": "Pachter-L"
            },
            {
                "family_name": "Voorhees",
                "given_name": "Rebecca M.",
                "orcid": "0000-0003-1640-2293",
                "clpid": "Voorhees-R-M"
            }
        ],
        "local_group": [
            {
                "literal": "div_bbe"
            }
        ],
        "abstract": "<p>Part 1: Protein degradation coupled to Nonsense-mediated mRNA decay</p>\r\n\r\n<p>Translation of mRNAs containing premature termination codons (PTCs) results in truncated protein products with deleterious effects. Nonsense-mediated decay (NMD) is a surveillance pathway responsible for detecting PTC-containing transcripts. While the molecular mechanisms governing mRNA degradation have been extensively studied, the fate of the nascent protein product remains largely uncharacterized. In part 1 of this thesis, we use a fluorescent reporter system in mammalian cells to reveal a selective degradation pathway specifically targeting the protein product of an NMD mRNA. We show that this process is post-translational, and dependent on the ubiquitin proteasome system. To systematically uncover factors involved in NMD-linked protein quality control, we conducted genome-wide flow cytometry-based screens. Our screens recovered known NMD factors, but suggested protein degradation did not depend on the canonical ribosome-quality control (RQC) pathway. A subsequent arrayed screen demonstrated that protein and mRNA branches of NMD rely on a shared recognition event. Our results establish the existence of a targeted pathway for nascent protein degradation from PTC-containing mRNAs, and provide a reference for the field to identify and characterize required factors.</p>\r\n\r\n<p>Part 2: The Commons Cell Atlas</p>\r\n\r\n<p>Current cell atlas projects aim to curate representative datasets, cell-types, and marker genes for tissues across an organism. Despite their ubiquity, atlas projects rely on duplicated and manual effort to curate marker genes and annotate cell-types. Importantly, the lack of data-compatible tools and a fixed representation of the atlas make their reanalysis near-impossible. To overcome these challenges, we present a collection of data, algorithms, and tools to automate cataloging and analyzing cell-types across all tissues in an organism. 
We leveraged this work to build a Human Commons Cell Atlas comprising 2.9 million cells across 27 tissues that can be easily updated and that is structured to facilitate custom analyses. To showcase the flexibility of the atlas, we demonstrate that it can be used for isoform analyses. In particular, we study cell-type specificity of isoforms of OAS1, which has recently been shown to offer SARS-CoV-2 protection in certain individuals that display higher expression of the p46 isoform. Using our Commons Cell Atlas, we localize the OAS1 p44b isoform to the testis, and find that it is specific to germ line cells. By virtue of enabling customized analyses via a modular and dynamic atlas structure, the Commons Cell Atlas should be useful for exploratory analyses that are intractable within the rigid framework of current gene-centric static atlases.</p>",
        "doi": "10.7907/esxk-ch24",
        "publication_date": "2023",
        "thesis_type": "phd",
        "thesis_year": "2023"
    },
    {
        "id": "thesis:14517",
        "collection": "thesis",
        "collection_id": "14517",
        "cite_using_url": "https://resolver.caltech.edu/CaltechTHESIS:03162022-173632582",
        "primary_object_url": {
            "basename": "Abdel-haq_Reem_2022_thesis.pdf",
            "content": "final",
            "filesize": 44751420,
            "license": "other",
            "mime_type": "application/pdf",
            "url": "/14517/1/Abdel-haq_Reem_2022_thesis.pdf",
            "version": "v4.0.0"
        },
        "type": "thesis",
        "title": "Gut Microbiome Modulates Microglia Physiology in Homeostatic and Disease States",
        "author": [
            {
                "family_name": "Abdel-Haq",
                "given_name": "Reem",
                "orcid": "0000-0002-7418-5736",
                "clpid": "Abdel-Haq-Reem"
            }
        ],
        "thesis_advisor": [
            {
                "family_name": "Mazmanian",
                "given_name": "Sarkis K.",
                "orcid": "0000-0003-2713-1513",
                "clpid": "Mazmanian-S-K"
            }
        ],
        "thesis_committee": [
            {
                "family_name": "Gradinaru",
                "given_name": "Viviana",
                "orcid": "0000-0001-5868-348X",
                "clpid": "Gradinaru-V"
            },
            {
                "family_name": "Mazmanian",
                "given_name": "Sarkis K.",
                "orcid": "0000-0003-2713-1513",
                "clpid": "Mazmanian-S-K"
            },
            {
                "family_name": "Thomson",
                "given_name": "Matthew",
                "orcid": "0000-0003-1021-1234",
                "clpid": "Thomson-M-W"
            },
            {
                "family_name": "Chan",
                "given_name": "David C.",
                "orcid": "0000-0002-0191-2154",
                "clpid": "Chan-D-C"
            }
        ],
        "local_group": [
            {
                "literal": "div_bbe"
            }
        ],
        "abstract": "The gastrointestinal tract (GI) harbors a complex community of ~100 trillion bacteria, fungi, and viruses collectively referred to as the gut microbiome. Through direct and indirect signaling mechanisms, the gut microbiome exerts its effects on almost every organ system, including the brain. Constant, bi-directional communication along the gut-brain axis is required for the normal and healthy development of the host Central Nervous System (CNS). One of the cell types in the CNS shaped by microbial-derived cues is microglia, the resident immune cells in the brain. Aberrant microglia activity is a driving force of several neurological diseases in which the gut microbiome plays a role, including Parkinson\u2019s disease (PD). \r\n\r\nIn this thesis, we explore the interplay between gut microbiota signaling and microglia physiology during homeostatic and disease states. We first detail how microbial signaling along the gut-brain axis shapes microglial development and function. Next, we explore how the gut microbiome composition influences microglial activation states in the context of disease. Leveraging a preclinical mouse model of PD, we show that dietary-driven changes to the gut microbiome through the use of prebiotics attenuate motor deficits and \u03b1-synuclein aggregation. These effects result from changes in microglial gene expression and activation status. Collectively, these findings have broad implications for the gut microbiome research community and highlight potential for development of microbiome-based therapies for diseases of the brain.",
        "doi": "10.7907/ht1j-2461",
        "publication_date": "2022",
        "thesis_type": "phd",
        "thesis_year": "2022"
    },
    {
        "id": "thesis:14496",
        "collection": "thesis",
        "collection_id": "14496",
        "cite_using_url": "https://resolver.caltech.edu/CaltechTHESIS:02132022-064810187",
        "primary_object_url": {
            "basename": "David_Brown_Thesis_V4.pdf",
            "content": "final",
            "filesize": 10109719,
            "license": "other",
            "mime_type": "application/pdf",
            "url": "/14496/1/David_Brown_Thesis_V4.pdf",
            "version": "v5.0.0"
        },
        "type": "thesis",
        "title": "Principles of Massively Parallel Sequencing for Engineering and Characterizing Gene Delivery",
        "author": [
            {
                "family_name": "Brown",
                "given_name": "David",
                "orcid": "0000-0002-9757-1744",
                "clpid": "Brown-David"
            }
        ],
        "thesis_advisor": [
            {
                "family_name": "Gradinaru",
                "given_name": "Viviana",
                "orcid": "0000-0001-5868-348X",
                "clpid": "Gradinaru-V"
            },
            {
                "family_name": "Thomson",
                "given_name": "Matthew",
                "orcid": "0000-0003-1021-1234",
                "clpid": "Thomson-M-W"
            }
        ],
        "thesis_committee": [
            {
                "family_name": "Yue",
                "given_name": "Yisong",
                "orcid": "0000-0001-9127-1989",
                "clpid": "Yue-Yisong"
            },
            {
                "family_name": "Arnold",
                "given_name": "Frances Hamilton",
                "orcid": "0000-0002-4027-364X",
                "clpid": "Arnold-F-H"
            },
            {
                "family_name": "Gradinaru",
                "given_name": "Viviana",
                "orcid": "0000-0001-5868-348X",
                "clpid": "Gradinaru-V"
            },
            {
                "family_name": "Thomson",
                "given_name": "Matthew",
                "orcid": "0000-0003-1021-1234",
                "clpid": "Thomson-M-W"
            }
        ],
        "local_group": [
            {
                "literal": "div_bbe"
            }
        ],
        "abstract": "<p>The advent of massively parallel sequencing and synthesis technologies has ushered in a new paradigm of biology, where high throughput screening of billions of nucleic acid molecules and production of libraries of millions of genetic mutants are now routine in labs and clinics. During my Ph.D., I worked to develop data analysis and experimental methods that take advantage of the scale of this data, while making the minimal assumptions necessary for deriving value from their application. My Ph.D. work began with the development of software and principles for analyzing deep mutational scanning data of libraries of engineered AAV capsids. By looking at not only the top variant in a round of directed evolution, but instead a broad distribution of the variants and their phenotypes, we were able to identify AAV variants with enhanced ability to transduce specific cells in the brain after intravenous injection. I then shifted to better understand the phenotypic profile of these engineered variants. To that end, I turned to single-cell RNA sequencing to seek to identify, with high resolution, the delivery profile of these variants in all cell types present in the cortex of a mouse brain. I began by developing infrastructure and tools for dealing with the data analysis demands of these experiments. Then, by delivering an engineered variant to the animal, I was able to use the single-cell RNA sequencing profile, coupled with a sequencing readout of the delivered genetic cargo present in each cell type, to define the variant\u2019s tropism across the full spectrum of cell types in a single step. To increase the throughput of this experimental paradigm, I then worked to develop a multiplexing strategy for delivering up to 7 engineered variants in a single animal, and obtain the same high resolution readout for each variant in a single experiment. 
Finally, to take a step towards translation to human diagnostics, I leveraged the tools I built for scaling single-cell RNA sequencing studies and worked to develop a protocol for obtaining single-cell immune profiles of low volumes of self-collected blood. This study enabled repeat sampling in a short period of time, and revealed an incredible richness in individual variability and time-of-day dependence of human immune gene expression. Together, my Ph.D. work provides strategies for employing massively parallel sequencing and synthesis for new biological applications, and builds towards a future paradigm where personalized, high-resolution sequencing might be coupled with modular, customized gene therapy delivery.</p>",
        "doi": "10.7907/yqjm-6609",
        "publication_date": "2022",
        "thesis_type": "phd",
        "thesis_year": "2022"
    },
    {
        "id": "thesis:14435",
        "collection": "thesis",
        "collection_id": "14435",
        "cite_using_url": "https://resolver.caltech.edu/CaltechTHESIS:11282021-042001335",
        "primary_object_url": {
            "basename": "Rachel_Caltech_PhD_Thesis_V2-6.pdf",
            "content": "final",
            "filesize": 93199721,
            "license": "other",
            "mime_type": "application/pdf",
            "url": "/14435/11/Rachel_Caltech_PhD_Thesis_V2-6.pdf",
            "version": "v6.0.0"
        },
        "type": "thesis",
        "title": "Experimental and Theoretical Studies of Non-Equilibrium Systems: Motor-Microtubule Assemblies and the Human-Earth System",
        "author": [
            {
                "family_name": "Banks",
                "given_name": "Rachel A.",
                "orcid": "0000-0003-2028-2925",
                "clpid": "Banks-Rachel-A"
            }
        ],
        "thesis_advisor": [
            {
                "family_name": "Phillips",
                "given_name": "Robert B.",
                "orcid": "0000-0003-3082-2809",
                "clpid": "Phillips-R"
            }
        ],
        "thesis_committee": [
            {
                "family_name": "Van Valen",
                "given_name": "David A.",
                "orcid": "0000-0001-7534-7621",
                "clpid": "Van-Valen-D"
            },
            {
                "family_name": "Bois",
                "given_name": "Justin S.",
                "orcid": "0000-0001-7137-8746",
                "clpid": "Bois-J-S"
            },
            {
                "family_name": "Thomson",
                "given_name": "Matthew",
                "orcid": "0000-0003-1021-1234",
                "clpid": "Thomson-M-W"
            },
            {
                "family_name": "Phillips",
                "given_name": "Robert B.",
                "orcid": "0000-0003-3082-2809",
                "clpid": "Phillips-R"
            }
        ],
        "local_group": [
            {
                "literal": "div_chem"
            }
        ],
        "abstract": "<p>Systems out of equilibrium are pervasive around us. In fact, being out of equilibrium is a key property of life, as described by Erwin Schr\u00f6dinger in his series of essays \"What is life?\". Through the consumption of energy, i.e. food, living organisms achieve ordered states that would be very unlikely to occur at equilibrium, such as the mitotic spindle during cell division, swarms of bacteria, or flocks of starlings. The Earth system is another example of a non-equilibrium system. The state of the Earth has been evolving for billions of years, often under the influence of life. Today, humanity is a dominant influence forcing the Earth system to new states. Understanding these non-equilibrium systems has posed many challenges; in this thesis, we work towards quantitatively dissecting and gaining an intuition for the functioning of both a molecular scale and planetary scale non-equilibrium system. </p>\r\n\r\n<p>Underlying many cellular functions such as cell division and transportation of organelles is the cytoskeleton composed of motor proteins and their constituent filaments. Among its key components are kinesin motors, which consume chemical energy to walk along and reorganize microtubules. Collections of these motors and microtubules are able to form organized structures. Understanding how these structures are formed has remained an open question. In Chapter 2, we develop a system of kinesin motors and microtubules wherein motor activity is controlled by light, thereby gaining spatiotemporal control over the formation of motor-microtubule assemblies. We demonstrate the creation of a variety of structures of different sizes and geometry, and measure how length and time scales of these assemblies depend on the activated region. </p>\r\n\r\n<p>A remaining question was how the microscopic details of the interaction between motors and microtubules affect the dynamics and steady-state structure formed. 
In Chapter 3, with our scheme for light control in hand, we extended the system to a variety of motor proteins that have different speeds, processivities (how many steps they take before unbinding from the microtubule), directionalities (which end of the microtubule they walk towards), and forces they are able to exert. We found that the size of steady-state structures, distribution of motors within assemblies, and rate of contraction of networks depend on motor properties. Further, we demonstrate that various structures can be formed by combining different motors. This work begins to build a connection between the detailed microscopic interactions of cytoskeletal components and the larger scale structures they form. </p>\r\n\r\n<p>Chapter 4 begins our work on understanding the state of the human-Earth system. A major hurdle to quantitatively understanding this system is the difficulty of finding and parsing the relevant data, which is often within long, complicated reports. In order to facilitate access to this data, we created the Human Impacts Database, which houses a collection of more than 300 carefully curated values related to human impacts on the Earth. In this chapter, we describe the format of the database as well as demonstrate how it can be harnessed to gain a more holistic perspective on humanity's influence on the Earth.</p>\r\n\r\n<p>Having this data is only a starting point towards deciphering the ways that humans are altering the state of the Earth, though. In Chapter 5, we combine these quantitative measurements with simple order-of-magnitude estimates to gain an intuition for the magnitude of several of the values. In this way, we show that many of the ways humanity is affecting the Earth can be tied back to how much land, water, and power we use. 
We further contextualize the magnitude of human influence by comparing human activities to natural analogs, finding that humans currently rival natural processes in influencing the state of the Earth system.</p>",
        "doi": "10.7907/5ee6-j454",
        "publication_date": "2022-06-10",
        "thesis_type": "phd",
        "thesis_year": "2022"
    },
    {
        "id": "thesis:14424",
        "collection": "thesis",
        "collection_id": "14424",
        "cite_using_url": "https://resolver.caltech.edu/CaltechTHESIS:11102021-210013472",
        "type": "thesis",
        "title": "Compilation and Inference with Chemical Reaction Networks",
        "author": [
            {
                "family_name": "Poole",
                "given_name": "William",
                "orcid": "0000-0002-2958-6776",
                "clpid": "Poole-William"
            }
        ],
        "thesis_advisor": [
            {
                "family_name": "Winfree",
                "given_name": "Erik",
                "orcid": "0000-0002-5899-7523",
                "clpid": "Winfree-E"
            },
            {
                "family_name": "Murray",
                "given_name": "Richard M.",
                "orcid": "0000-0002-5785-7481",
                "clpid": "Murray-R-M"
            }
        ],
        "thesis_committee": [
            {
                "family_name": "Thomson",
                "given_name": "Matthew",
                "orcid": "0000-0003-1021-1234",
                "clpid": "Thomson-M-W"
            },
            {
                "family_name": "Winfree",
                "given_name": "Erik",
                "orcid": "0000-0002-5899-7523",
                "clpid": "Winfree-E"
            },
            {
                "family_name": "Murray",
                "given_name": "Richard M.",
                "orcid": "0000-0002-5785-7481",
                "clpid": "Murray-R-M"
            },
            {
                "family_name": "Phillips",
                "given_name": "Robert B.",
                "orcid": "0000-0003-3082-2809",
                "clpid": "Phillips-R"
            }
        ],
        "local_group": [
            {
                "literal": "div_bbe"
            }
        ],
        "abstract": "<p>The successful advancement and deployment of technologies in the field of synthetic biology will require sophisticated computational infrastructure coupled with new theoretical ideas in order to more effectively engineer and reverse engineer biochemical networks. This thesis argues that the field of machine learning can inform the development of these underlying principles and techniques. First, software for compiling diverse chemical reaction network models of biological circuits from simple specifications is described. Second, three chemical reaction network implementations of a powerful machine learning model called a Boltzmann machine are analyzed and compared. Third, the class of detailed balanced chemical reaction networks is proven to be capable of probabilistic inference and, when coupled to a driven chemical system, autonomous learning. Finally, the use of machine learning to interpret and understand biological systems is explored in an experimental case study modeling <i>E. coli</i> cell extract metabolism.</p>",
        "doi": "10.7907/x3qc-je74",
        "publication_date": "2022",
        "thesis_type": "phd",
        "thesis_year": "2022"
    },
    {
        "id": "thesis:14938",
        "collection": "thesis",
        "collection_id": "14938",
        "cite_using_url": "https://resolver.caltech.edu/CaltechTHESIS:06022022-201232129",
        "primary_object_url": {
            "basename": "Final Thesis version Eduardo da Veiga Beltrame.pdf",
            "content": "final",
            "filesize": 17501518,
            "license": "other",
            "mime_type": "application/pdf",
            "url": "/14938/1/Final Thesis version Eduardo da Veiga Beltrame.pdf",
            "version": "v4.0.0"
        },
        "type": "thesis",
        "title": "Stories in Single Cell RNA Sequencing",
        "author": [
            {
                "family_name": "da Veiga Beltrame",
                "given_name": "Eduardo",
                "orcid": "0000-0002-1529-9207",
                "clpid": "da-Veiga-Beltrame-Eduardo"
            }
        ],
        "thesis_advisor": [
            {
                "family_name": "Sternberg",
                "given_name": "Paul W.",
                "orcid": "0000-0002-7699-0173",
                "clpid": "Sternberg-P-W"
            }
        ],
        "thesis_committee": [
            {
                "family_name": "Thomson",
                "given_name": "Matthew",
                "orcid": "0000-0003-1021-1234",
                "clpid": "Thomson-M-W"
            },
            {
                "family_name": "Pachter",
                "given_name": "Lior S.",
                "orcid": "0000-0002-9164-6231",
                "clpid": "Pachter-L"
            },
            {
                "family_name": "Van Valen",
                "given_name": "David A.",
                "orcid": "0000-0001-7534-7621",
                "clpid": "Van-Valen-D"
            },
            {
                "family_name": "Sternberg",
                "given_name": "Paul W.",
                "orcid": "0000-0002-7699-0173",
                "clpid": "Sternberg-P-W"
            }
        ],
        "local_group": [
            {
                "literal": "div_bbe"
            }
        ],
        "abstract": "<p>This thesis describes the projects I have worked on since starting the Caltech bioengineering program in fall 2017. The general theme of my projects is that they are all about single cell RNA sequencing (scRNA-seq), spanning the experimental and computational realms.</p> \r\n\r\n<p>Chapter 1 is an introduction explaining the essential concepts and is meant to be readable by a wide audience. For the other chapters, each one describes a separate project in a succinct manner, including links to the related preprint, published paper or code repositories at the start of each chapter.</p>\r\n\r\n<p>Chapter 2 describes the scVI generative model for scRNA-seq data and the scvi-tools framework, which forms the basis of many of my computational projects.</p> \r\n\r\n<p>Chapter 3 describes an open source 3D printable syringe pump system that was developed envisioning facilitating many kinds of experiments, in particular droplet based scRNA-seq.</p> \r\n\r\n<p>Chapter 4 describes a new way of fabricating hydrogel beads with unique DNA barcodes that are used for scRNA-seq experiments.</p> \r\n\r\n<p>Chapter 5 describes a database listing most published scRNA-seq studies that I helped create, and provides a useful overview of the state of the field.</p> \r\n\r\n<p>Chapter 6 describes the kallisto bus workflow, which is used for pre-processing scRNA-seq data, going from FASTQ file to gene count matrix in a very efficient manner.</p> \r\n\r\n<p>Chapter 7 describes a new way of using scVI to quantify the trade-off in the quality of scRNA-seq of a given dataset when surveying more cells or sequencing more reads per cell.</p> \r\n\r\n<p>Chapter 8 describes tools developed for the WormBase users to leverage scRNA-seq data on <i>C. 
elegans</i>, and which can be deployed with any other scRNA-seq dataset.</p> \r\n\r\n<p>Chapter 9 describes a remarkably successful offshoot of the development of these tools: a simple scVI based analysis and visualization strategy for finding candidate marker genes using <i>C. elegans</i> scRNA-seq data, which was experimentally validated by members of the Sternberg lab.</p>",
        "doi": "10.7907/4kgh-8420",
        "publication_date": "2022",
        "thesis_type": "phd",
        "thesis_year": "2022"
    },
    {
        "id": "thesis:14934",
        "collection": "thesis",
        "collection_id": "14934",
        "cite_using_url": "https://resolver.caltech.edu/CaltechTHESIS:06022022-032024376",
        "primary_object_url": {
            "basename": "Thesis_ChristinaSu_2022.pdf",
            "content": "final",
            "filesize": 5427101,
            "license": "other",
            "mime_type": "application/pdf",
            "url": "/14934/1/Thesis_ChristinaSu_2022.pdf",
            "version": "v4.0.0"
        },
        "type": "thesis",
        "title": "Principles of Addressing Specificity in Promiscuous Ligand-Receptor Systems",
        "author": [
            {
                "family_name": "Su",
                "given_name": "Christina Janet",
                "orcid": "0000-0002-9223-9777",
                "clpid": "Su-Christina-Janet"
            }
        ],
        "thesis_advisor": [
            {
                "family_name": "Elowitz",
                "given_name": "Michael B.",
                "orcid": "0000-0002-1221-0967",
                "clpid": "Elowitz-M-B"
            }
        ],
        "thesis_committee": [
            {
                "family_name": "Chan",
                "given_name": "David C.",
                "orcid": "0000-0002-0191-2154",
                "clpid": "Chan-D-C"
            },
            {
                "family_name": "Elowitz",
                "given_name": "Michael B.",
                "orcid": "0000-0002-1221-0967",
                "clpid": "Elowitz-M-B"
            },
            {
                "family_name": "Goentoro",
                "given_name": "Lea A.",
                "orcid": "0000-0002-3904-0195",
                "clpid": "Goentoro-L-A"
            },
            {
                "family_name": "Thomson",
                "given_name": "Matthew",
                "orcid": "0000-0003-1021-1234",
                "clpid": "Thomson-M-W"
            }
        ],
        "local_group": [
            {
                "literal": "div_bbe"
            }
        ],
        "abstract": "<p>In multicellular organisms, a relatively small number of highly conserved signaling pathways are used to enable intercellular communication. While the underlying molecular components and interactions are increasingly well understood, a fundamental mystery is how the diverse cell types of the body can be so precisely coordinated by so few pathways. It has long been known that different cell types exhibit varied responses to molecular signals, and it is unclear how this cell type specificity arises. In this work, we take a different perspective on this question and explore how cell type specificity can be generated at the level of intracellular signal. We refer to this ability to selectively activate different cell types as \"addressing.\" By eliminating the complexity of considering downstream pathway effectors, we are able to more comprehensively understand how cell type specificity can arise in spite of\u2014or because of\u2014promiscuity in ligand-receptor interactions. We focus on the bone morphogenetic protein (BMP) pathway as an ideal example. This pathway is essential in development, is of therapeutic interest in an array of pathologies, and has proven amenable to theoretical and experimental analysis. We first describe a minimal model of the pathway and identify what types of response functions can be achieved. We show that each layer of computation, from the formation of signaling complexes to the activation of downstream second messenger, can provide nontrivial integrations of ligand inputs. We then extend this analysis to systems with multiple cell types that may vary in receptor expression profile. The diverse response functions of this pathway enable systems in which different cell types or sets of cell types may be addressed with high specificity. In particular, the BMP pathway can address multiple cell types with high capacity, flexibility, and robustness. 
Taken together, these results provide a framework for understanding how molecular promiscuity in signaling pathways can, in fact, enable cellular specificity in pathway responses.</p>",
        "doi": "10.7907/z7dv-m192",
        "publication_date": "2022",
        "thesis_type": "phd",
        "thesis_year": "2022"
    },
    {
        "id": "thesis:14409",
        "collection": "thesis",
        "collection_id": "14409",
        "cite_using_url": "https://resolver.caltech.edu/CaltechTHESIS:10282021-191743624",
        "type": "thesis",
        "title": "Quantitative Sequencing and its Application to Studies of the Human Small-Intestine Microbiota",
        "author": [
            {
                "family_name": "Barlow",
                "given_name": "Jacob T.",
                "orcid": "0000-0002-1842-4835",
                "clpid": "Barlow-Jacob-T"
            }
        ],
        "thesis_advisor": [
            {
                "family_name": "Ismagilov",
                "given_name": "Rustem F.",
                "orcid": "0000-0002-3680-4399",
                "clpid": "Ismagilov-R-F"
            }
        ],
        "thesis_committee": [
            {
                "family_name": "Mazmanian",
                "given_name": "Sarkis K.",
                "orcid": "0000-0003-2713-1513",
                "clpid": "Mazmanian-S-K"
            },
            {
                "family_name": "Thomson",
                "given_name": "Matthew",
                "orcid": "0000-0003-1021-1234",
                "clpid": "Thomson-M-W"
            },
            {
                "family_name": "Cai",
                "given_name": "Long",
                "orcid": "0000-0002-7154-5361",
                "clpid": "Cai-Long"
            },
            {
                "family_name": "Ismagilov",
                "given_name": "Rustem F.",
                "orcid": "0000-0002-3680-4399",
                "clpid": "Ismagilov-R-F"
            }
        ],
        "local_group": [
            {
                "literal": "div_bbe"
            }
        ],
        "abstract": "<p>Our understanding of the interplay between microbial species and the hosts they live on and in is continually expanding. New insights have focused not only microorganisms that drive specific disease states but also those that help maintain human health. As research drives towards mechanistic understanding of host-microbe relationships new quantitative tools are needed to help interrogate these complex interactions. Chapter I of this thesis discusses formulation of a method for rapid detection of antibiotic resistance in <i>Neisseria gonorrhoeae</i>. Our approach identified RNA signatures from transcriptional profiling of Neisseria gonorrhoeae after 10-minute antibiotic exposure. Utilization of these RNA markers allowed for rapid identification of antibiotic susceptibility or resistance to the antibiotic ciprofloxacin. Chapter II shifts focus to the development of a quantitative sequencing technique for the measurement of absolute taxon abundances in complex microbial communities. Combining the precision of digital PCR with the high-throughput nature of 16S rRNA gene amplicon sequencing allowed for simultaneous quantitative profiling of all bacterial taxa in host-associated microbial communities. We extensively characterized our quantitative sequencing methodology in the presence of high host nucleic acid levels and low microbial loads to understand the limits of quantification and detection in complex sample types. Last, Chapter III applies the quantitative sequencing technology from Chapter II to investigate the microbial community of the human small intestine, specifically the duodenum. Data from the duodenum of 250 individuals revealed a wide range of total microbial loads and a distinct subset of microbes, termed disruptor taxa, that were associated with small intestinal bacterial overgrowth (SIBO) and GI symptom severity.</p>",
        "doi": "10.7907/ca28-fk21",
        "publication_date": "2022",
        "thesis_type": "phd",
        "thesis_year": "2022"
    },
    {
        "id": "thesis:14399",
        "collection": "thesis",
        "collection_id": "14399",
        "cite_using_url": "https://resolver.caltech.edu/CaltechTHESIS:10172021-215439860",
        "primary_object_url": {
            "basename": "Dobreva_Tatyana_2021_v7.pdf",
            "content": "final",
            "filesize": 10892340,
            "license": "other",
            "mime_type": "application/pdf",
            "url": "/14399/1/Dobreva_Tatyana_2021_v7.pdf",
            "version": "v4.0.0"
        },
        "type": "thesis",
        "title": "Engineering Tools to Probe and Manipulate the Immune System at Single-Cell Resolution",
        "author": [
            {
                "family_name": "Dobreva",
                "given_name": "Tatyana",
                "orcid": "0000-0002-2625-8873",
                "clpid": "Dobreva-Tatyana"
            }
        ],
        "thesis_advisor": [
            {
                "family_name": "Thomson",
                "given_name": "Matthew",
                "orcid": "0000-0003-1021-1234",
                "clpid": "Thomson-M-W"
            },
            {
                "family_name": "Gradinaru",
                "given_name": "Viviana",
                "orcid": "0000-0001-5868-348X",
                "clpid": "Gradinaru-V"
            }
        ],
        "thesis_committee": [
            {
                "family_name": "Gao",
                "given_name": "Wei",
                "orcid": "0000-0002-8503-4562",
                "clpid": "Gao-Wei"
            },
            {
                "family_name": "Gradinaru",
                "given_name": "Viviana",
                "orcid": "0000-0001-5868-348X",
                "clpid": "Gradinaru-V"
            },
            {
                "family_name": "Elowitz",
                "given_name": "Michael B.",
                "orcid": "0000-0002-1221-0967",
                "clpid": "Elowitz-M-B"
            },
            {
                "family_name": "Thomson",
                "given_name": "Matthew",
                "orcid": "0000-0003-1021-1234",
                "clpid": "Thomson-M-W"
            }
        ],
        "local_group": [
            {
                "literal": "div_eng"
            }
        ],
        "abstract": "<p>My thesis focuses on developing experimental and computational tools to probe and manipulate cellular transcriptomes in the context of human health and disease. Chapter 1 and 2 focus on published work where we leverage single-cell RNA sequencing (scRNA-seq) to understand human immune variability, characterize cell-type specific biases of multiple viral variants within an animal, and assess temporal immune response in the brain to delivery of genetic cargo via an adeno-associated virus (AAV). Chapter 3 and 4 present progress I have made on tools for exporting RNA extracellularly and engineering of a transcription factor for modulating macrophage state.</p>\r\n\r\n<p>For probing cellular transcriptome states, we have developed a platform using multiplexed single-cell sequencing and out-of-clinic capillary blood extraction to understand temporal and inter-individual variability of gene expression within immune cell types. Our platform enables simplified, cost-effective profiling of the human immune system across subjects and time at single-cell resolution. To demonstrate the power of our platform, we performed a three day time-of-day study of four healthy individuals, generating gene expression data for 24,087 cells across 22 samples. We detected genes with cell type-specific time-of-day expression and identified robust genes and pathways particular to each individual, all of which could have been missed if analyzed with bulk RNA-sequencing. Also, using scRNA-seq, we have developed a method to screen and characterize cellular tropism of multiple AAV variants. Additionally, I have looked at AAV-mediated transcriptomic changes in animals injected with AAV-PHP.eB three days and twenty-five days post-injection. 
I have found that genes involved in p53 signaling are upregulated in endothelial cells three days post-injection.</p>\r\n\r\n<p>In the context of manipulating cellular transcriptomic states, I demonstrate that a fusion between the RNA-targeting enzyme dCas13 and the capsid-forming neuronal protein Arc is able to form a capsid-like structure capable of encapsulating RNA. I also present methods and preliminary data for tuning macrophage states through mutations in transcription factor EB (TFEB) using scRNA-seq as a readout.</p>",
        "doi": "10.7907/n3rs-ft69",
        "publication_date": "2022",
        "thesis_type": "phd",
        "thesis_year": "2022"
    },
    {
        "id": "thesis:14339",
        "collection": "thesis",
        "collection_id": "14339",
        "cite_using_url": "https://resolver.caltech.edu/CaltechTHESIS:08242021-212959886",
        "primary_object_url": {
            "basename": "thesis_vahe_galstyan_final.pdf",
            "content": "final",
            "filesize": 45962171,
            "license": "cc_by",
            "mime_type": "application/pdf",
            "url": "/14339/1/thesis_vahe_galstyan_final.pdf",
            "version": "v6.0.0"
        },
        "type": "thesis",
        "title": "Studies in Physical Biology: Exploring Allosteric Regulation, Enzymatic Error Correction, and Cytoskeletal Self-Organization Using Theory and Modeling",
        "author": [
            {
                "family_name": "Galstyan",
                "given_name": "Vahe",
                "orcid": "0000-0001-7073-9175",
                "clpid": "Galstyan-Vahe"
            }
        ],
        "thesis_advisor": [
            {
                "family_name": "Phillips",
                "given_name": "Robert B.",
                "orcid": "0000-0003-3082-2809",
                "clpid": "Phillips-R"
            }
        ],
        "thesis_committee": [
            {
                "family_name": "Thomson",
                "given_name": "Matthew",
                "orcid": "0000-0003-1021-1234",
                "clpid": "Thomson-M-W"
            },
            {
                "family_name": "Van Valen",
                "given_name": "David A.",
                "orcid": "0000-0001-7534-7621",
                "clpid": "Van-Valen-D"
            },
            {
                "family_name": "Winfree",
                "given_name": "Erik",
                "orcid": "0000-0002-5899-7523",
                "clpid": "Winfree-E"
            },
            {
                "family_name": "Phillips",
                "given_name": "Robert B.",
                "orcid": "0000-0003-3082-2809",
                "clpid": "Phillips-R"
            }
        ],
        "local_group": [
            {
                "literal": "div_chem"
            }
        ],
        "abstract": "<p>Physical biology offers powerful tools for quantitatively dissecting the various aspects of cellular life that one cannot attribute to inanimate matter. Signature examples of living matter include adaptation, self-organization, and division. In this thesis, we explore different interconnected facets of these processes using statistical mechanics, nonequilibrium thermodynamics, and biophysical modeling.</p>\r\n\r\n<p>One of the key mechanisms underlying physiological and evolutionary adaptation is allosteric regulation. It allows cells to dynamically respond to changes in the state of the environment often expressed through altered levels of different environmental cues. The first thread of our work is dedicated to exploring the combinatorial diversity of responses available to allosteric proteins that are subject to multi-ligand regulation. We demonstrate that proteins characterized through the Monod-Wyman-Changeux model of allostery and operating at thermodynamic equilibrium are capable of eliciting a wide range of response behaviors which include the kinds known from the field of digital circuits (e.g., NAND logic response), as well as more sophisticated computations such as ratiometric sensing. </p>\r\n\r\n<p>Despite the fact that biomolecules at thermodynamic equilibrium are able to orchestrate a variety of fascinating behaviors, the cell is ultimately 'alive' because it constantly metabolizes nutrients and generates energy to drive functions that cannot be sustained in the absence of energy consumption. One prominent example of such a function is nonequilibrium error correction present in high-fidelity processes such as protein synthesis, DNA replication, or pathogen recognition. We begin the second thread of our work by providing a conceptual understanding of the prevailing mechanism used in explaining this high-fidelity behavior, namely that of kinetic proofreading. 
Specifically, we develop an allostery-based mechanochemical model of a kinetic proofreader where chemical driving is replaced with a mechanical engine with tunable knobs which allow modulating the amount of dissipation in a transparent way. We demonstrate how varying levels of error correction can be attained at different regimes of dissipation and offer intuitive interpretations for the conditions required for efficient biological proofreading.</p>\r\n\r\n<p>We then extend the notion of error correction to equilibrium enzymes not endowed with structural features typically required for proofreading. We show that, under physiological conditions, purely diffusing enzymes can take advantage of the existing nonequilibrium organization of their substrates in space and enhance the fidelity of catalysis. Our proposed mechanism called spatial proofreading offers a novel perspective on spatial structures and compartmentalization in cells as a route to specificity.</p>\r\n\r\n<p>In the last thread of the thesis, we make a transition from molecular-scale studies to the mesoscopic scale, and explore the principles of self-organization in nonequilibrium structures formed in reconstituted microtubule-motor mixtures. In particular, we develop a theoretical framework that predicts the spatial distribution of kinesin motors in radially symmetric microtubule asters formed under various conditions using optogenetic control. The model manages to accurately recapitulate the experimentally measured motor profiles through effective parameters that are specific for each kind of kinesin motor used. 
Our theoretical work of rigorously assessing the motor distribution therefore offers an avenue for understanding the link between the microscopic motor properties (e.g., processivity or binding affinity) and the large-scale structures they create.</p>\r\n\r\n<p>In all, the thesis encompasses a series of case studies with shared themes of allostery and nonequilibrium, highlighting the capacity of living matter to perform remarkable tasks inaccessible to nonliving materials.</p>",
        "doi": "10.7907/1fzr-1240",
        "publication_date": "2022",
        "thesis_type": "phd",
        "thesis_year": "2022"
    },
    {
        "id": "thesis:14338",
        "collection": "thesis",
        "collection_id": "14338",
        "cite_using_url": "https://resolver.caltech.edu/CaltechTHESIS:08242021-212609828",
        "primary_object_url": {
            "basename": "thesis.pdf",
            "content": "final",
            "filesize": 36577433,
            "license": "other",
            "mime_type": "application/pdf",
            "url": "/14338/1/thesis.pdf",
            "version": "v4.0.0"
        },
        "type": "thesis",
        "title": "Physical Biology of Cellular Information Processing",
        "author": [
            {
                "family_name": "Razo-Mejia",
                "given_name": "Manuel",
                "orcid": "0000-0002-9510-0527",
                "clpid": "Razo-Mejia-Manuel"
            }
        ],
        "thesis_advisor": [
            {
                "family_name": "Phillips",
                "given_name": "Robert B.",
                "orcid": "0000-0003-3082-2809",
                "clpid": "Phillips-R"
            }
        ],
        "thesis_committee": [
            {
                "family_name": "Newman",
                "given_name": "Dianne K.",
                "orcid": "0000-0003-1647-1918",
                "clpid": "Newman-D-K"
            },
            {
                "family_name": "Thomson",
                "given_name": "Matthew",
                "orcid": "0000-0003-1021-1234",
                "clpid": "Thomson-M-W"
            },
            {
                "family_name": "Goentoro",
                "given_name": "Lea A.",
                "orcid": "0000-0002-3904-0195",
                "clpid": "Goentoro-L-A"
            },
            {
                "family_name": "Pachter",
                "given_name": "Lior S.",
                "orcid": "0000-0002-9164-6231",
                "clpid": "Pachter-L"
            },
            {
                "family_name": "Phillips",
                "given_name": "Robert B.",
                "orcid": "0000-0003-3082-2809",
                "clpid": "Phillips-R"
            }
        ],
        "local_group": [
            {
                "literal": "div_chem"
            }
        ],
        "abstract": "<p>The state of matter that we define as <em>life</em> is different from anything else we have encountered so far in the universe. Living systems not only perpetuate their existence out of equilibrium against the will of the second law of thermodynamics, but they do so while keeping up with an ever-changing environment. A key part of this capacity to adapt to environmental changes is the ability of organisms to gather information from their surroundings to put together an adequate response to the challenges presented to them. This thesis presents an effort to understand, from first principles, this fundamental feature of information gathering that all life on earth shares. We dig into the physics behind one of the most pervasive mechanisms through which living systems sense and respond to the environment\u2013the ability to turn <em>on</em> and <em>off</em> genes. In doing so, we hope to uncover general principles of how organisms deal with the problem of collecting information about the world that surrounds them.</p>\r\n\r\n<p>In Chapter 1, we develop the theoretical and conceptual tools to navigate the rest of the thesis. I introduce the idea of gene regulation, as well as different theoretical models of this pervasive biological phenomenon. We also delve into the realm of information theory and learn how the plastic concept of information can be mathematically defined and quantified.</p>\r\n\r\n<p>The second stop in our exploration (Chapter 2) asks the following question: can we understand, from first principles, how it is that proteins allow cells to regulate their genes on-demand upon sensing environmental cues? For this, we explore the physics behind transcriptional control due to allosteric transcription factors. 
Using simple quasi-equilibrium models of the two processes involved in this type of regulation\u2014the regulation of the gene by the binding and unbinding of the transcription factor, and the regulation of the activity of the transcription factor itself by the binding and unbinding of an effector molecule\u2014we are able to predict the input-output function of a simple genetic circuit, and compare such predictions with experimental determinations of the mean response of a population of bacterial cells.</p>\r\n\r\n<p>We then expand on these insights to ask questions about the inescapable cell-to-cell variability that isogenic cells encounter. For this, we have to leave behind the pure thermodynamic framework and work in the language of chemical kinetics. This allows us to make predictions beyond the mean input-output gene expression response of cells by reconstructing full gene expression distributions. With these probabilistic input-output functions, in Chapter 3 we formalize the question of the <em>amount of information</em> that cells can gather from the environment. For this, we turn to information-theoretic concepts of maximal mutual information (otherwise known as channel capacity) between the state of the environment and the gene expression response from bacterial cells. Finally, we compare our predictions of the maximum amount of information\u2014measured in bits\u2014that cells can gather with single-cell inferences of this quantity.</p>",
        "doi": "10.7907/kpc2-b345",
        "publication_date": "2022",
        "thesis_type": "phd",
        "thesis_year": "2022"
    },
    {
        "id": "thesis:14327",
        "collection": "thesis",
        "collection_id": "14327",
        "cite_using_url": "https://resolver.caltech.edu/CaltechTHESIS:08182021-053622635",
        "primary_object_url": {
            "basename": "beeler_suzannah_thesis_2021.pdf",
            "content": "final",
            "filesize": 8849776,
            "license": "cc_by",
            "mime_type": "application/pdf",
            "url": "/14327/1/beeler_suzannah_thesis_2021.pdf",
            "version": "v4.0.0"
        },
        "type": "thesis",
        "title": "Deciphering Regulation in Escherichia coli: From Genes to Genomes",
        "author": [
            {
                "family_name": "Beeler",
                "given_name": "Suzannah Michelle",
                "orcid": "0000-0002-1930-4827",
                "clpid": "Beeler-Suzannah-Michelle"
            }
        ],
        "thesis_advisor": [
            {
                "family_name": "Phillips",
                "given_name": "Robert B.",
                "orcid": "0000-0003-3082-2809",
                "clpid": "Phillips-R"
            }
        ],
        "thesis_committee": [
            {
                "family_name": "Rothenberg",
                "given_name": "Ellen V.",
                "orcid": "0000-0002-3901-347X",
                "clpid": "Rothenberg-E-V"
            },
            {
                "family_name": "Goentoro",
                "given_name": "Lea A.",
                "orcid": "0000-0002-3904-0195",
                "clpid": "Goentoro-L-A"
            },
            {
                "family_name": "Thomson",
                "given_name": "Matthew",
                "orcid": "0000-0003-1021-1234",
                "clpid": "Thomson-M-W"
            },
            {
                "family_name": "Phillips",
                "given_name": "Robert B.",
                "orcid": "0000-0003-3082-2809",
                "clpid": "Phillips-R"
            }
        ],
        "local_group": [
            {
                "literal": "div_bbe"
            }
        ],
        "abstract": "<p>Advances in DNA sequencing have revolutionized our ability to read genomes. However, even in the most well-studied of organisms, the bacterium <i>Escherichia coli</i>, for \u2248 65% of promoters we remain ignorant of their regulation. Until we crack this regulatory Rosetta Stone, efforts to read and write genomes will remain haphazard. We introduce a new method, Reg-Seq, that links massively-parallel reporter assays with mass spectrometry to produce a base pair resolution dissection of more than 100 <i>E. coli</i> promoters in 12 growth conditions. We demonstrate that the method recapitulates known regulatory information. Then, we examine regulatory architectures for more than 80 promoters which previously had no known regulatory information. In many cases, we also identify which transcription factors mediate their regulation. This method clears a path for highly multiplexed investigations of the regulatory genome of model organisms, with the potential of moving to an array of microbes of ecological and medical relevance.</p>",
        "doi": "10.7907/p3rg-m937",
        "publication_date": "2022",
        "thesis_type": "phd",
        "thesis_year": "2022"
    },
    {
        "id": "thesis:14617",
        "collection": "thesis",
        "collection_id": "14617",
        "cite_using_url": "https://resolver.caltech.edu/CaltechTHESIS:05252022-172145394",
        "type": "thesis",
        "title": "Mechanical Approach to Active Matter: Reverse Osmotic Effect and Motility-Induced Phase Separation",
        "author": [
            {
                "family_name": "Row",
                "given_name": "Hyeongjoo",
                "orcid": "0000-0003-3623-512X",
                "clpid": "Row-Hyeongjoo"
            }
        ],
        "thesis_advisor": [
            {
                "family_name": "Brady",
                "given_name": "John F.",
                "orcid": "0000-0001-5817-9128",
                "clpid": "Brady-J-F"
            }
        ],
        "thesis_committee": [
            {
                "family_name": "Shapiro",
                "given_name": "Mikhail G.",
                "orcid": "0000-0002-0291-4215",
                "clpid": "Shapiro-M-G"
            },
            {
                "family_name": "Brady",
                "given_name": "John F.",
                "orcid": "0000-0001-5817-9128",
                "clpid": "Brady-J-F"
            },
            {
                "family_name": "Wang",
                "given_name": "Zhen-Gang",
                "orcid": "0000-0002-3361-6114",
                "clpid": "Wang-Zhen-Gang"
            },
            {
                "family_name": "Thomson",
                "given_name": "Matthew",
                "orcid": "0000-0003-1021-1234",
                "clpid": "Thomson-M-W"
            }
        ],
        "local_group": [
            {
                "literal": "div_chem"
            }
        ],
        "abstract": "The defining feature of active matter, self-propulsion requires constant consumption of energy to be maintained. As a result, active matter systems are inherently out of equilibrium and some principles that are accepted as common knowledge, particularly from thermodynamics, do not apply to the active matter systems. Arguably the most popular example is the motility-induced phase separation (MIPS) -- active matter can spontaneously phase separate into liquid-like dense phase and gas-like sparse phase even without any attractive interactions between the self-propelling constituents. In this thesis, I demonstrate the utility of a mechanical perspective in revealing and understanding the underlying physics of seemingly confounding behaviors of active matter systems. In Chapters 2 and 3, I consider the mechanics of a suspension of active colloidal particles when the transport properties (self-propelling speed and diffusivities) vary spatially. The mechanical analysis reveals the reverse-osmotic nature of active matter systems with a spatial variation in activity. I provide an explanation for why physical processes governed by the osmotic pressure of particles can appear in a reversed manner in active matter systems, e.g. a fluid can flow from regions of high concentration to low in a suspension of active colloids. In Chapter 4, I develop a mechanical theory of phase coexistence that applies to both equilibrium and nonequilibrium systems. By applying the mechanical theory to MIPS, I find phase coexistence conditions of the MIPS that allow a construction of a phase diagram, which excellently agrees with the results from computer simulations. The mechanical theory also allows access to the microscopic structure of phase interfaces. By investigating the interfacial structure, I discover interesting nonequilibrium interfacial behavior of the MIPS. 
I find that the width of the MIPS interface varies nonmonotonically with the activity of particles and provide a mechanical explanation for this phenomenon.",
        "doi": "10.7907/qef0-e420",
        "publication_date": "2022",
        "thesis_type": "phd",
        "thesis_year": "2022"
    },
    {
        "id": "thesis:14551",
        "collection": "thesis",
        "collection_id": "14551",
        "cite_using_url": "https://resolver.caltech.edu/CaltechTHESIS:04162022-233242577",
        "primary_object_url": {
            "basename": "tang_weiyi_2022_thesis.pdf",
            "content": "final",
            "filesize": 28039388,
            "license": "other",
            "mime_type": "application/pdf",
            "url": "/14551/1/tang_weiyi_2022_thesis.pdf",
            "version": "v4.0.0"
        },
        "type": "thesis",
        "title": "Retroviral Lineage Analysis of the Vagal Neural Crest Reveals Multipotency Towards the Cardiac and Enteric Fates",
        "author": [
            {
                "family_name": "Tang",
                "given_name": "Weiyi",
                "orcid": "0000-0002-1279-1001",
                "clpid": "Tang-Weiyi"
            }
        ],
        "thesis_advisor": [
            {
                "family_name": "Bronner",
                "given_name": "Marianne E.",
                "orcid": "0000-0003-4274-1862",
                "clpid": "Bronner-M-E"
            }
        ],
        "thesis_committee": [
            {
                "family_name": "Stathopoulos",
                "given_name": "Angelike",
                "orcid": "0000-0001-6597-2036",
                "clpid": "Stathopoulos-A"
            },
            {
                "family_name": "Rothenberg",
                "given_name": "Ellen V.",
                "orcid": "0000-0002-3901-347X",
                "clpid": "Rothenberg-E-V"
            },
            {
                "family_name": "Thomson",
                "given_name": "Matthew",
                "orcid": "0000-0003-1021-1234",
                "clpid": "Thomson-M-W"
            },
            {
                "family_name": "Bronner",
                "given_name": "Marianne E.",
                "orcid": "0000-0003-4274-1862",
                "clpid": "Bronner-M-E"
            }
        ],
        "local_group": [
            {
                "literal": "div_bbe"
            }
        ],
        "abstract": "<p>The neural crest is a migratory stem cell population that gives rise to the craniofacial skeleton, heart septa, pigment cells, and peripheral nervous system.  Defects in neural crest development can lead to a broad range of congenital diseases, e.g., persistent truncus arteriosus, characterized by a mixture of oxygenated and deoxygenated blood, is related to the absence of the neural crest-derived outflow tract septum. Thus, a thorough understanding about neural crest migration, differentiation, and cell fate can shine lights on diagnosis and treatment of many congenital defects. A long-standing question is whether neural crest cells are composed of multipotent cells capable of giving rise to a wide range of cell types, or a mixture of fate-determined cells migrating to their destinations. Avian embryos resemble humans during neural crest development, but are more accessible to experimental manipulations than mammalian models, making them an ideal model to study the neural crest. Despite the abundance of information obtained from elegant experiments through interspecies grafting, the avian model lacks a direct tool to determine whether these cells are multipotent <i>in vivo</i>.</p>\r\n\r\n<p>Here, we present a new clonal analysis tool that takes advantage of Replication Incompetent Avian retroviruses (RIAs). We validate the method <i>in vitro</i> and present the potential application in the chick embryo to test the multipotency of the trunk neural crest. Next, we perform RIA-mediated lineage tracing at a population level and uncover cardiomyocytes as a previously unknown cardiac neural crest derivative in both chicken and mouse. Furthermore, we utilize RIA-mediated clonal analysis to identify individual premigratory vagal neural crest cells as a multipotent stem cell that forms cell types in both the heart and the gut. 
We then confirm the results with a single-cell photoconversion assay, which further shows that migrating neural crest cells are also multipotent. Time-lapse imaging shows that stochastic post-mitotic migration is a cellular mechanism underlying multipotency. Finally, molecular perturbation experiments show that CXCR4 and RET are essential guidance cues for migratory neural crest cells to enter the heart and the gut, respectively. Together, these results demonstrate the utility of using RIA viruses to tackle questions regarding the lineage, developmental potential, and migratory pathways followed by neural crest cells in avian embryos.</p>",
        "doi": "10.7907/qakz-vm04",
        "publication_date": "2022",
        "thesis_type": "phd",
        "thesis_year": "2022"
    },
    {
        "id": "thesis:14223",
        "collection": "thesis",
        "collection_id": "14223",
        "cite_using_url": "https://resolver.caltech.edu/CaltechTHESIS:06012021-203020365",
        "primary_object_url": {
            "basename": "Shashank_Gandhi_Thesis_Final.pdf",
            "content": "final",
            "filesize": 80122487,
            "license": "other",
            "mime_type": "application/pdf",
            "url": "/14223/1/Shashank_Gandhi_Thesis_Final.pdf",
            "version": "v4.0.0"
        },
        "type": "thesis",
        "title": "Molecular Mechanisms Underlying Cardiac Neural Crest Development in Avian Embryos",
        "author": [
            {
                "family_name": "Gandhi",
                "given_name": "Shashank",
                "orcid": "0000-0002-4081-4338",
                "clpid": "Gandhi-Shashank"
            }
        ],
        "thesis_advisor": [
            {
                "family_name": "Bronner",
                "given_name": "Marianne E.",
                "orcid": "0000-0003-4274-1862",
                "clpid": "Bronner-M-E"
            }
        ],
        "thesis_committee": [
            {
                "family_name": "Rothenberg",
                "given_name": "Ellen V.",
                "orcid": "0000-0002-3901-347X",
                "clpid": "Rothenberg-E-V"
            },
            {
                "family_name": "Thomson",
                "given_name": "Matthew",
                "orcid": "0000-0003-1021-1234",
                "clpid": "Thomson-M-W"
            },
            {
                "family_name": "Parker",
                "given_name": "Joseph",
                "orcid": "0000-0001-9598-2454",
                "clpid": "Parker-J"
            },
            {
                "family_name": "Zernicka-Goetz",
                "given_name": "Magdalena",
                "orcid": "0000-0002-7004-2471",
                "clpid": "Zernicka-Goetz-M"
            },
            {
                "family_name": "Bronner",
                "given_name": "Marianne E.",
                "orcid": "0000-0003-4274-1862",
                "clpid": "Bronner-M-E"
            }
        ],
        "local_group": [
            {
                "literal": "div_bbe"
            }
        ],
        "abstract": "<p>The neural crest is a multipotent, vertebrate-specific stem cell population that gives rise to diverse cell types in the developing embryo, including craniofacial cartilage, enteric ganglia, and cardiac septa. Neural crest cells that originate from a given axial level in the embryo give rise to a characteristic array of progeny and follow distinct pathways from those arising at other levels. One of these subpopulations, called the cardiac neural crest, originates in the dorsal hindbrain and migrates into the developing heart, where it forms the aorticopulmonary septum, cardiac ganglion, and part of the interventricular septum. Mutations in or loss of these cells causes heart defects that are among the most common birth defects in the general population. For my thesis, I sought to identify the mechanisms that underlie the formation of neural crest cells, and confer cardiac neural crest cells with their unique developmental potential.</p>\r\n\r\n<p>To enable interrogation of epistatic relationships between key neural crest genes during neural crest induction and crest specification, I first optimized the CRISPR-Cas9 system for genome editing in gastrula and neurula-stage chicken embryos. I then further improved the CRISPR toolbox by devising an all-in-one single-plasmid strategy that harnesses the self-cleavage properties of ribozymes for the simultaneous delivery of Cas9, gRNAs, and fluorescent reporters in transfected cells. This has enabled live tracking of wildtype and mutant neural crest cells as they migrate to their terminal locations.</p>\r\n\r\n<p>Prior to their induction at the neural plate border, precursors in the neural plate border are transcriptionally primed toward multiple cell fates, including neural tube, neural crest, epidermis, and placode. 
While this priming has been thought to involve epigenetic regulation, chromatin remodeler genes have been overlooked in the context of neural crest formation given their concomitant expression in surrounding cell types. By combining single-cell transcriptional profiling of the early chick embryonic hindbrain with temporally-controlled knockouts, I uncovered a novel bimodal mechanism whereby the chromatin remodeler gene <i>Hmga1</i> first regulates <i>Pax7</i>-dependent neural crest induction at the neural plate border, and later modulates Wnt signaling in the dorsal neural tube to control neural crest delamination. These results established <i>Hmga1</i> as a direct regulator of neural crest induction and emigration.</p>\r\n\r\n<p>Finally, given that amongst distinct neural crest subpopulations designated as cranial, cardiac/vagal, and trunk, only cardiac crest has the ability to contribute to heart development, and that neither trunk nor cranial neural crest subpopulations can rescue the loss of cardiac crest, I investigated the genetic logic that imbues cardiac crest with its unique ability to form cardiovascular derivatives. To this end, I combined surgical ablations, bulk and single-cell transcriptional profiling, RNA labeling, CRISPR-Cas9-mediated gene editing, transcription factor binding motif mutation analysis, and transgenic tissue grafting approaches to uncover and characterize a cardiac-neural-crest-specific subcircuit comprised of the transcription factors <i>Sox8</i>, <i>Tgif1</i>, and <i>Ets1</i>. 
I demonstrated that ectopic expression of this subcircuit in trunk neural crest cells reprogrammed them towards a cardiac-crest-like fate, and transplanting these reprogrammed cells in place of ablated cardiac crest restored cardiac-crest-like migration patterns and rescued outflow tract septation defects.</p>\r\n\r\n<p>Taken together, my thesis work has not only built a genome engineering toolbox for a key model system in developmental biology, but has also expanded our understanding of the genetic circuits that govern the formation of the cardiac neural crest and underlie its unique ability to contribute to the heart.</p>",
        "doi": "10.7907/y1e4-d090",
        "publication_date": "2021",
        "thesis_type": "phd",
        "thesis_year": "2021"
    },
    {
        "id": "thesis:13837",
        "collection": "thesis",
        "collection_id": "13837",
        "cite_using_url": "https://resolver.caltech.edu/CaltechTHESIS:07072020-154545363",
        "type": "thesis",
        "title": "Mechanism and Scaling of Eukaryotic Transcription Activation",
        "author": [
            {
                "family_name": "Quintero Cadena",
                "given_name": "Porfirio",
                "orcid": "0000-0003-0067-5844",
                "clpid": "Quintero-Cadena-Porfirio"
            }
        ],
        "thesis_advisor": [
            {
                "family_name": "Sternberg",
                "given_name": "Paul W.",
                "orcid": "0000-0002-7699-0173",
                "clpid": "Sternberg-P-W"
            }
        ],
        "thesis_committee": [
            {
                "family_name": "Rothenberg",
                "given_name": "Ellen V.",
                "orcid": "0000-0002-3901-347X",
                "clpid": "Rothenberg-E-V"
            },
            {
                "family_name": "Guttman",
                "given_name": "Mitchell",
                "orcid": "0000-0003-4748-9352",
                "clpid": "Guttman-M"
            },
            {
                "family_name": "Thomson",
                "given_name": "Matthew",
                "orcid": "0000-0003-1021-1234",
                "clpid": "Thomson-M-W"
            },
            {
                "family_name": "Sternberg",
                "given_name": "Paul W.",
                "orcid": "0000-0002-7699-0173",
                "clpid": "Sternberg-P-W"
            }
        ],
        "local_group": [
            {
                "literal": "div_bbe"
            }
        ],
        "abstract": "<p>Transcription activation is a universal process by which living cells adapt. Decades of work in this field have produced an intelligible paradigm of transcription activation that provides fundamental insights into its underlying molecular mechanisms. This thesis attempts to extend such paradigm to explain how transcription activation can be implemented across the diversity of molecular environments found in eukaryotic nuclei. Specifically, this diversity calls for an explanation of how this process scales throughout a range of genome sizes that spans five orders of magnitude, and of how to think about this subject in the increasingly relevant context of liquid-liquid phase-separation. We leverage data from RNA-seq, smFISH, growth-rate, fluorescence microscopy, computer simulations and literature to identify an appropriate and useful level of abstraction in which to grow our current paradigm. We propose scaling and phase-separation, two seemingly disparate aspects of transcription, are explained and intrinsically linked by a novel molecular state in which multiple RNA polymerases can bind the transcription complex. We provide support and rationale for this addition to the transcription model, and generate testable hypotheses that may further clarify the mechanism and evolution of eukaryotic transcription activation.</p>",
        "doi": "10.7907/m21w-8461",
        "publication_date": "2021",
        "thesis_type": "phd",
        "thesis_year": "2021"
    },
    {
        "id": "thesis:14084",
        "collection": "thesis",
        "collection_id": "14084",
        "cite_using_url": "https://resolver.caltech.edu/CaltechTHESIS:02192021-010538691",
        "primary_object_url": {
            "basename": "chour_william_2021_thesis.pdf",
            "content": "final",
            "filesize": 140220437,
            "license": "other",
            "mime_type": "application/pdf",
            "url": "/14084/1/chour_william_2021_thesis.pdf",
            "version": "v6.0.0"
        },
        "type": "thesis",
        "title": "Molecular Technologies for Antigen-Based Immunity",
        "author": [
            {
                "family_name": "Chour",
                "given_name": "William",
                "orcid": "0000-0003-1817-0123",
                "clpid": "Chour-William"
            }
        ],
        "thesis_advisor": [
            {
                "family_name": "Heath",
                "given_name": "James R.",
                "orcid": "0000-0001-5356-4385",
                "clpid": "Heath-J-R"
            }
        ],
        "thesis_committee": [
            {
                "family_name": "Shapiro",
                "given_name": "Mikhail G.",
                "orcid": "0000-0002-0291-4215",
                "clpid": "Shapiro-M-G"
            },
            {
                "family_name": "Heath",
                "given_name": "James R.",
                "orcid": "0000-0001-5356-4385",
                "clpid": "Heath-J-R"
            },
            {
                "family_name": "Rothenberg",
                "given_name": "Ellen V.",
                "orcid": "0000-0002-3901-347X",
                "clpid": "Rothenberg-E-V"
            },
            {
                "family_name": "Yang",
                "given_name": "Changhuei",
                "orcid": "0000-0001-8791-0354",
                "clpid": "Yang-Changhuei"
            },
            {
                "family_name": "Thomson",
                "given_name": "Matthew",
                "orcid": "0000-0003-1021-1234",
                "clpid": "Thomson-M-W"
            }
        ],
        "local_group": [
            {
                "literal": "div_bbe"
            }
        ],
        "abstract": "<p>The presence and proliferation antigen-specific T cells is a defining characteristic of an adaptive immune response against various disease types (autoimmune, cancer, and infectious). The use of Class I and Class II peptide-major histocompatibility complex (pMHC) reagents to identify such cells, however, is technically difficult and expensive, and it has been challenging to refine synthesis protocols for higher yield and more efficient assembly to accommodate large-scale applications. This achievement would enable high-throughput capture of corresponding T cell receptors (TCR), which may be further used in clinical applications such as adoptive cell transfer therapies. Overcoming this hurdle requires the development and integration of various molecular technologies and analytical methods.</p>\r\n\r\n<p>Toward this end, the bulk of my thesis work, covered in Chapter 2, introduces these developments in the context of pMHCs, where the three subunits of each reagent are covalent linked together and expressed as a single protein. These single-chain trimer (SCT) technologies primarily consist of traditional DNA cloning and protein production techniques which have been streamlined for applications requiring output on the scale of 10<sup>2</sup>-10<sup>3</sup> of reagents. This chapter serves as the foundation for much of the methodology discussed throughout the rest of my thesis, and thus should serve as a reference point. The generated constructs are also functionally validated here, and potential future research directions are outlined.</p>\r\n\r\n<p>In Chapter 3, I explore the use of this technology in the context of COVID-19 to enumerate antigen specificity of the CD8+ T cell immune response. Class I SCTs were constructed to present peptides across several SARS-CoV-2 protein domains, using various HLA alleles to match haplotyped participant blood samples. 
These reagents were then used to capture SARS-CoV-2-specific T cells through flow and nanoparticle cytometry to demonstrate HLA-dependent, domain-dependent immune responses. Identified TCRs were cloned into T cells for confirmation of antigen specificity and functional cytotoxicity.</p>\r\n\r\n<p>In Chapters 4 and 5, I explore potential pMHC applications in cancer antigen contexts, covering both tumor-associated and tumor-specific antigens. Through various collaborations across the West Coast (UCLA, Parker Institute, Fred Hutchinson Cancer Research Center), I make use of the SCT platform to showcase new assays to discover and rank key tumor targets (Chapter 4). Finally, Chapter 5 is a reproduction of our lab\u2019s published work concerning identification of antigen-specific CD8+ T cells from melanoma patients.</p>\r\n\r\n<p>In summary, the adaptation of SCTs in a high-throughput format allows for the rapid enumeration of antigen-specific T-cell receptor sequences. As demonstrated in the contexts of COVID-19 and cancer, this SCT platform enables subsequent downstream applications, such as single-cell, antigen-specific immunophenotypic mapping/analysis and target discovery for personalized immunotherapies.</p>",
        "doi": "10.7907/z20t-nq62",
        "publication_date": "2021",
        "thesis_type": "phd",
        "thesis_year": "2021"
    },
    {
        "id": "thesis:14204",
        "collection": "thesis",
        "collection_id": "14204",
        "cite_using_url": "https://resolver.caltech.edu/CaltechTHESIS:05302021-051953086",
        "primary_object_url": {
            "basename": "CheeHuat(Linus)Eng_thesis.pdf",
            "content": "final",
            "filesize": 6063063,
            "license": "other",
            "mime_type": "application/pdf",
            "url": "/14204/1/CheeHuat(Linus)Eng_thesis.pdf",
            "version": "v4.0.0"
        },
        "type": "thesis",
        "title": "Plus Ultra: Genome-Wide Spatial Transcriptomics with RNA seqFISH+",
        "author": [
            {
                "family_name": "Eng",
                "given_name": "Chee Huat (Linus)",
                "orcid": "0000-0002-2521-9696",
                "clpid": "Eng-Chee-Huat-Linus"
            }
        ],
        "thesis_advisor": [
            {
                "family_name": "Cai",
                "given_name": "Long",
                "orcid": "0000-0002-7154-5361",
                "clpid": "Cai-Long"
            }
        ],
        "thesis_committee": [
            {
                "family_name": "Ismagilov",
                "given_name": "Rustem F.",
                "orcid": "0000-0002-3680-4399",
                "clpid": "Ismagilov-R-F"
            },
            {
                "family_name": "Thomson",
                "given_name": "Matthew",
                "orcid": "0000-0003-1021-1234",
                "clpid": "Thomson-M-W"
            },
            {
                "family_name": "Guttman",
                "given_name": "Mitchell",
                "orcid": "0000-0003-4748-9352",
                "clpid": "Guttman-M"
            },
            {
                "family_name": "Cai",
                "given_name": "Long",
                "orcid": "0000-0002-7154-5361",
                "clpid": "Cai-Long"
            }
        ],
        "local_group": [
            {
                "literal": "div_chem"
            }
        ],
        "abstract": "<p>Visualizing single cells and their organization in intact tissue is crucial to understanding their governing biological function. Even though single cell RNA sequencing has provided many insights into the heterogeneity and gene expression profiles across many tissue types, the dissociation process which loses the spatial information is hindering our deeper understanding of how these transcriptional distinct cell types are organized and interacting in their native tissue environment.</p>\r\n\r\n<p>The thesis begins by giving a background on how single cell RNA sequencing has transformed biology and the emergence of spatial technology such as sequential fluorescence in situ hybridization (seqFISH).  While spatial methods are useful for mapping the cell types identified from single cell RNA sequencing, the need for turning spatial technology such as seqFISH, which has high detection efficiency of the transcriptome with spatial information, into an in situ discovery tool is discussed as the scientific community\u2019s goal heads towards building spatial atlases for every human tissues and organs such as the brain.</p>\r\n \r\n<p>While seqFISH has high detection efficiency, it is still limited in the number of genes capable of profiling at once. The major obstacle is the optical crowding problems when more RNA species are targeted and imaged using a fluorescence microscope. In Chapter 2, we first investigated, if the RNA molecules are instead captured on a coverslip and profiled with sequential barcoding strategy, the FISH-based method will reliably characterize the transcriptome when molecular crowding is not an issue.</p>\r\n \r\n<p>Finally, in Chapter 3, we demonstrate the barcoding strategy to break through the molecular crowding limit of multiplexed FISH. 
Whereas the multiplexed FISH methods available in the field at the time could profile hundreds to a thousand genes, we succeeded in profiling 10,000 genes with RNA seqFISH+, an evolved version of seqFISH, in various intact tissue sections, turning seqFISH+ into a spatial discovery technology with genome-wide coverage and high detection efficiency. The work described in this part of the thesis is highlighted in Nature Methods\u2019 Method of the Year 2020 article on spatially resolved transcriptomics.</p>",
        "doi": "10.7907/nvfe-5j74",
        "publication_date": "2021",
        "thesis_type": "phd",
        "thesis_year": "2021"
    },
    {
        "id": "thesis:14260",
        "collection": "thesis",
        "collection_id": "14260",
        "cite_using_url": "https://resolver.caltech.edu/CaltechTHESIS:06082021-005042886",
        "type": "thesis",
        "title": "Statistical Mechanics of Problems in Transcription Regulation",
        "author": [
            {
                "family_name": "Morrison",
                "given_name": "Muir",
                "orcid": "0000-0002-0768-7234",
                "clpid": "Morrison-Muir"
            }
        ],
        "thesis_advisor": [
            {
                "family_name": "Phillips",
                "given_name": "Robert B.",
                "orcid": "0000-0003-3082-2809",
                "clpid": "Phillips-R"
            }
        ],
        "thesis_committee": [
            {
                "family_name": "Roukes",
                "given_name": "Michael Lee",
                "orcid": "0000-0002-2916-6026",
                "clpid": "Roukes-M-L"
            },
            {
                "family_name": "Phillips",
                "given_name": "Robert B.",
                "orcid": "0000-0003-3082-2809",
                "clpid": "Phillips-R"
            },
            {
                "family_name": "Thomson",
                "given_name": "Matthew",
                "orcid": "0000-0003-1021-1234",
                "clpid": "Thomson-M-W"
            },
            {
                "family_name": "Van Valen",
                "given_name": "David A.",
                "orcid": "0000-0001-7534-7621",
                "clpid": "Van-Valen-D"
            }
        ],
        "local_group": [
            {
                "literal": "div_pma"
            }
        ],
        "abstract": "<p>As the quantity of sequenced genome data continues to multiply, our understanding of the transcriptional regulation of genomes has lagged behind. This deficit impinges on research throughout biology, from fundamental questions of how evolution proceeds to eminently practical questions such as how antibiotic resistance arises.</p>\r\n\r\n<p>In this thesis we present three threads that address the question of transcriptional regulation from distinct perspectives. The first thread focuses on the simplest nontrivial regulation motif common in bacteria. We analyze in turn a sampling of the myriad mathematical models previously proposed in the literature for this system. We attempt to shine light on the similarities and differences of the models\u2019 predictions, clarify their microscopic interpretations, and offer guidance as to situations when one model or another should be preferred or even distinguishable.</p>\r\n\r\n<p>The second thread considers a substantially more complicated genetic circuit, for which we build a minimal phenomenological model that retains intuitive microscopic meaning for all its parameters. The model neatly explains recent experimental observations of bistability in the circuit, and suggests natural generalizations to other metabolically important gene circuits with qualitatively similar architectures.</p>\r\n\r\n<p>Motivation for the third thread comes from even more complicated transcriptional regulation problems with a multitude of regulatory proteins and binding sites, where even enumerating all possible DNA-protein complexes manually is a formidable challenge. Here we propose a method to tackle this complexity that uses ideas from quantum field theory to encode assembly rules for macromolecular complexes. By specifying a small set of rules, we avoid manual enumeration of the much larger set of complexes, allowing the formalism to automatically generate this set for us.</p>",
        "doi": "10.7907/d042-rp26",
        "publication_date": "2021",
        "thesis_type": "phd",
        "thesis_year": "2021"
    },
    {
        "id": "thesis:13838",
        "collection": "thesis",
        "collection_id": "13838",
        "cite_using_url": "https://resolver.caltech.edu/CaltechTHESIS:07082020-113341068",
        "type": "thesis",
        "title": "Guiding Self-Organization in Active Matter with Spatiotemporal Boundary Conditions",
        "author": [
            {
                "family_name": "Ross",
                "given_name": "Tyler David",
                "orcid": "0000-0002-7872-3992",
                "clpid": "Ross-Tyler-David"
            }
        ],
        "thesis_advisor": [
            {
                "family_name": "Thomson",
                "given_name": "Matthew",
                "orcid": "0000-0003-1021-1234",
                "clpid": "Thomson-M-W"
            }
        ],
        "thesis_committee": [
            {
                "family_name": "Winfree",
                "given_name": "Erik",
                "orcid": "0000-0002-5899-7523",
                "clpid": "Winfree-E"
            },
            {
                "family_name": "Rothemund",
                "given_name": "Paul W. K.",
                "orcid": "0000-0002-1653-3202",
                "clpid": "Rothemund-P-W-K"
            },
            {
                "family_name": "Qian",
                "given_name": "Lulu",
                "orcid": "0000-0003-4115-2409",
                "clpid": "Qian-Lulu"
            },
            {
                "family_name": "Phillips",
                "given_name": "Robert B.",
                "orcid": "0000-0003-3082-2809",
                "clpid": "Phillips-R"
            },
            {
                "family_name": "Brady",
                "given_name": "John F.",
                "orcid": "0000-0001-5817-9128",
                "clpid": "Brady-J-F"
            },
            {
                "family_name": "Shapiro",
                "given_name": "Mikhail G.",
                "orcid": "0000-0002-0291-4215",
                "clpid": "Shapiro-M-G"
            },
            {
                "family_name": "Thomson",
                "given_name": "Matthew",
                "orcid": "0000-0003-1021-1234",
                "clpid": "Thomson-M-W"
            }
        ],
        "local_group": [
            {
                "literal": "div_bbe"
            }
        ],
        "abstract": "<p>In this thesis, I demonstrate that self-organized structures and forces can be guided by modulating the interactions between force-generating molecules in space and time. The physics of self-organizing systems is an open frontier. We do not have a complete set of principles that can describe how a dynamic structure forms based on the non-equilibrium dynamics of its constituent components. Yet, living systems appear to depend on some set of rules of self-organization in order to reliably carry out their mechanical functions. Force-generating, active, molecules in the form of motor proteins and filamentous polymers are responsible for performing fundamental tasks in living matter, such as locomotion and division. While it is known that the regulation of motor-filament interactions is necessary to achieve the dynamic structures that drive movement and propagation, the role of spatial and temporal patterning in self-organizing systems has not been explored. I design a artificial system of purified molecules where the interactions between motors and filaments are toggled with light. By patterning molecular interactions in space and time, I show that it is possible to localize the formation of spherically symmetric asters, which can be moved, merged, and used to generate advective fluid flows. The ability to pattern molecular interactions in space and time offers a new perspective in the search for principles of active self-organization. Spatial and temporal control makes it possible to start distilling how the interactions between active molecules determine the mesoscopic behaviors of self-organized structures. These rules ultimately govern the physics of living matter and may eventually be harnessed to build new materials and cell-like machines.</p>",
        "doi": "10.7907/q85h-j730",
        "publication_date": "2021",
        "thesis_type": "phd",
        "thesis_year": "2021"
    },
    {
        "id": "thesis:14111",
        "collection": "thesis",
        "collection_id": "14111",
        "cite_using_url": "https://resolver.caltech.edu/CaltechTHESIS:03262021-160841703",
        "primary_object_url": {
            "basename": "thesis.pdf",
            "content": "final",
            "filesize": 3545856,
            "license": "other",
            "mime_type": "application/pdf",
            "url": "/14111/1/thesis.pdf",
            "version": "v7.0.0"
        },
        "type": "thesis",
        "title": "Signal Amplification in Synthetic Bacterial Communication",
        "author": [
            {
                "family_name": "Parkin",
                "given_name": "James Michael",
                "orcid": "0000-0002-4058-2338",
                "clpid": "Parkin-James-Michael"
            }
        ],
        "thesis_advisor": [
            {
                "family_name": "Murray",
                "given_name": "Richard M.",
                "orcid": "0000-0002-5785-7481",
                "clpid": "Murray-R-M"
            }
        ],
        "thesis_committee": [
            {
                "family_name": "Winfree",
                "given_name": "Erik",
                "orcid": "0000-0002-5899-7523",
                "clpid": "Winfree-E"
            },
            {
                "family_name": "Leadbetter",
                "given_name": "Jared R.",
                "orcid": "0000-0002-7033-0844",
                "clpid": "Leadbetter-J-R"
            },
            {
                "family_name": "Bois",
                "given_name": "Justin S.",
                "orcid": "0000-0001-7137-8746",
                "clpid": "Bois-J-S"
            },
            {
                "family_name": "Thomson",
                "given_name": "Matthew",
                "orcid": "0000-0003-1021-1234",
                "clpid": "Thomson-M-W"
            },
            {
                "family_name": "Murray",
                "given_name": "Richard M.",
                "orcid": "0000-0002-5785-7481",
                "clpid": "Murray-R-M"
            }
        ],
        "local_group": [
            {
                "literal": "div_bbe"
            }
        ],
        "abstract": "<p>Synthetic biology will one day enable embedded control of a variety of chemical and biological contexts, from the human gastrointestinal tract to crop roots. Groups of engineered organisms, also known as synthetic consortia, can inhabit niches of interest while monitoring and intervening according to their genetic design. However, the spatial structure of the deployment environments can obstruct coordination between cosortia members. The mechanisms engineered bacteria use to communicate must contend with these adversarial conditions to maximize group performance.</p>\r\n\r\n<p>Coordination between synthetic bacteria is typically achieved using small molecules that can traverse cell membranes through passive transport. Cell communicate by producing and sensing these small molecules. In cell-cell signaling relationships composed of a sender population and a receiver population, the concentration of signaling molecule sensed by the receiver cells depends on the spatial patterning of the two groups, the geometry of the diffusive environment, and the sender population\u2019s signal secretion rate.</p>\r\n\r\n<p>To make sender-receiver communication more robust to these environmental features, we introduce a third consortium strain that transiently amplifies local signaling molecule concentrations. These amplifier cells employ a synchronized pulse-generating circuit built using Lux-type quorum sensing components and an IFFL transcriptional architecture. When applied to sender-receiver consortia growing on semi-solid media, these amplifier cells respond to sender-secreted signaling molecules by contributing a small amount themselves. The support of amplifier cells enables communication over longer distances than can be achieved by sender cells alone and can partially recover coordination in small consortia where the sender population is too small to successfully signal its receiver population alone. 
We extend these results using simulation to investigate the benefit that amplifier cells confer to consortia of varying complexity.</p>",
        "doi": "10.7907/50p8-bd89",
        "publication_date": "2021",
        "thesis_type": "phd",
        "thesis_year": "2021"
    },
    {
        "id": "thesis:13664",
        "collection": "thesis",
        "collection_id": "13664",
        "cite_using_url": "https://resolver.caltech.edu/CaltechTHESIS:03262020-092455420",
        "type": "thesis",
        "title": "A Quantitative and High-Throughput Approach to Gene Regulation in Escherichia coli",
        "author": [
            {
                "family_name": "Ireland",
                "given_name": "William Thornton",
                "orcid": "0000-0003-0971-2904",
                "clpid": "Ireland-William-Thornton"
            }
        ],
        "thesis_advisor": [
            {
                "family_name": "Phillips",
                "given_name": "Robert B.",
                "orcid": "0000-0003-3082-2809",
                "clpid": "Phillips-R"
            }
        ],
        "thesis_committee": [
            {
                "family_name": "Thomson",
                "given_name": "Matthew",
                "orcid": "0000-0003-1021-1234",
                "clpid": "Thomson-M-W"
            },
            {
                "family_name": "Phillips",
                "given_name": "Robert B.",
                "orcid": "0000-0003-3082-2809",
                "clpid": "Phillips-R"
            },
            {
                "family_name": "Roukes",
                "given_name": "Michael Lee",
                "orcid": "0000-0002-2916-6026",
                "clpid": "Roukes-M-L"
            },
            {
                "family_name": "Orphan",
                "given_name": "Victoria J.",
                "orcid": "0000-0002-5374-6178",
                "clpid": "Orphan-V-J"
            }
        ],
        "local_group": [
            {
                "literal": "div_pma"
            }
        ],
        "abstract": "<p>Measurements in biology have reached a level of precision that demands quantitative modeling. This is particularly true in the field of gene regulation, where concepts from physics such as thermodynamics have allowed for accurate models to be made.</p>\r\n   \r\n<p>Many issues remain. DNA sequencing is routine enough to sequence new genomes in days and cheap enough to use deep sequencing to perform precision measurements, but our ability to interpret the wealth of genomic data is lagging behind, especially in the realm of gene regulation. The primary reason is that we lack any information whatsoever as to the basic regulatory details of approximately 65 percent of operons even in <i>E. coli</i>, the best understood organism in biology. As a result, we cannot use our hard-won modeling efforts to understand any of these operons.</p>\r\n  \r\n<p>This work takes steps to address these issues. First, we use 30 LacI mutants as a test case to prove that we can make quantitatively accurate models of gene expression and sequence-dependent binding energies of transcription factors and RNA polymerase.</p>\r\n\r\n<p>Next, we note that much of the quantitative insight available on transcriptional regulation relies on work on only a few model regulatory systems, such as LacI as considered above. We develop an approach, through a combination of massively parallel reporter assays, mass spectrometry, and information-theoretic modeling, that can be used to dissect bacterial promoters in a systematic and scalable way. We demonstrate that we can uncover a qualitative list of transcription factor binding sites as well as their associated quantitative details from both well-studied and previously uncharacterized promoters in <i>E. coli</i>.</p>\r\n\r\n<p>Finally, we extend the above method to over 100 <i>E. coli</i> promoters using over 12 growth conditions. We show the method recapitulates known regulatory information. 
Then, we examine regulatory architectures for more than 80 promoters which previously had no known regulation. In many cases, we identify which transcription factors mediate their regulation. The method introduced clears a path for fully characterizing the regulatory genome of <i>E. coli</i> and advances toward the goal of using this method on a wide variety of other organisms, including other prokaryotes and eukaryotes such as <i>Drosophila melanogaster</i>.</p>",
        "doi": "10.7907/0sk3-hd69",
        "publication_date": "2020",
        "thesis_type": "phd",
        "thesis_year": "2020"
    },
    {
        "id": "thesis:13609",
        "collection": "thesis",
        "collection_id": "13609",
        "cite_using_url": "https://resolver.caltech.edu/CaltechTHESIS:12162019-183140887",
        "primary_object_url": {
            "basename": "Thesis_Dong-Wook_Kim_v3.pdf",
            "content": "final",
            "filesize": 17443112,
            "license": "other",
            "mime_type": "application/pdf",
            "url": "/13609/1/Thesis_Dong-Wook_Kim_v3.pdf",
            "version": "v5.0.0"
        },
        "type": "thesis",
        "title": "Multimodal Analysis of Cell Types in a Hypothalamic Node Controlling Social Behavior in Mice",
        "author": [
            {
                "family_name": "Kim",
                "given_name": "Dong-Wook",
                "orcid": "0000-0002-5497-5853",
                "clpid": "Kim-Dong-Wook"
            }
        ],
        "thesis_advisor": [
            {
                "family_name": "Anderson",
                "given_name": "David J.",
                "orcid": "0000-0001-6175-3872",
                "clpid": "Anderson-D-J"
            }
        ],
        "thesis_committee": [
            {
                "family_name": "Pachter",
                "given_name": "Lior S.",
                "orcid": "0000-0002-9164-6231",
                "clpid": "Pachter-L"
            },
            {
                "family_name": "Anderson",
                "given_name": "David J.",
                "orcid": "0000-0001-6175-3872",
                "clpid": "Anderson-D-J"
            },
            {
                "family_name": "Oka",
                "given_name": "Yuki",
                "orcid": "0000-0003-2686-0677",
                "clpid": "Oka-Yuki"
            },
            {
                "family_name": "Thomson",
                "given_name": "Matthew",
                "orcid": "0000-0003-1021-1234",
                "clpid": "Thomson-M-W"
            }
        ],
        "local_group": [
            {
                "literal": "div_bbe"
            }
        ],
        "abstract": "<p>The advent and recent advances of single-cell RNA sequencing (scRNA-seq) have yielded transformative insights into our understanding of cellular diversity in the central nervous system (CNS) with unprecedented detail. However, due to current experimental and computational limitations on defining transcriptomic cell types (T-types) and the multiple phenotypic features of cell types in the CNS, an integrative and multimodal approach is required for the comprehensive classification of cell types.</p>\r\n\r\n<p>To this end, performing multimodal analysis of scRNA-seq in the hypothalamus would be very beneficial, in that the hypothalamus, which controls homeostatic and innate survival behaviors that are known to be highly conserved across a wide range of species and encoded in hard-wired brain circuits, is likely to display a more straightforward relationship between transcriptomic identity, axonal projections, and behavioral activation. In my dissertation, I have focused on the cell type characterization of a hypothalamic node controlling innate social behavior in mice, the ventrolateral subdivision of the ventromedial hypothalamus (VMHvl). VMHvl contains only ~4,000 neurons per hemisphere in mice, but because of its behavioral, anatomical, and molecular heterogeneity, it is largely unknown which T-types in VMHvl are related to connectivity and behavioral function.</p>\r\n\r\n<p>In Chapter II, I describe my main thesis work, performing scRNA-seq in VMHvl using two independent platforms: SMART-seq2 (~4,500 neurons sequenced) and 10x (~78,000 neurons sequenced). Specifically, 17 joint VMHvl T-types, including several sexually dimorphic clusters, were identified by canonical correlation analysis (CCA) in Seurat, and the majority of them were validated by multiplexed single-molecule FISH (seqFISH). Correspondence between transcriptomic identity and axonal projections or behavioral activation was also investigated. 
Immediate early gene analysis identified T-types exhibiting preferential responses to intruder males versus females but only rare examples of behavior-specific activation. Unexpectedly, many VMHvl T-types comprise a mixed population of neurons with different projection target preferences. Overall, our analysis revealed that, surprisingly, few VMHvl T-types exhibit a clear correspondence with behavior-specific activation and connectivity.</p>\r\n\r\n<p>In Chapter III, I discuss future directions for a deeper and better understanding of VMHvl cell types. Briefly, my previous data from whole-cell patch clamp recording in VMHvl slices suggested that there were at least 4 distinct electrophysiological cell types (E-types). Additionally, two distinct neuromodulatory effects on VMHvl were observed (persistent activation by vasopressin/oxytocin vs. silencing by nitric oxide) by monitoring population activity using two-photon Ca2+ imaging in slices. Based on the results from the first part and combined with advanced molecular techniques (e.g. Patch-seq and CRISPR-Cas9), we can further dissect out the cellular diversity in VMHvl and its functional implications.</p>",
        "doi": "10.7907/RGVK-9962",
        "publication_date": "2020",
        "thesis_type": "phd",
        "thesis_year": "2020"
    },
    {
        "id": "thesis:13709",
        "collection": "thesis",
        "collection_id": "13709",
        "cite_using_url": "https://resolver.caltech.edu/CaltechTHESIS:05182020-141933604",
        "primary_object_url": {
            "basename": "NeumannAdam2020Thesis.pdf",
            "content": "final",
            "filesize": 13775647,
            "license": "other",
            "mime_type": "application/pdf",
            "url": "/13709/1/NeumannAdam2020Thesis.pdf",
            "version": "v12.0.0"
        },
        "type": "thesis",
        "title": "Towards Single Molecule Imaging Using Nanoelectromechanical Systems",
        "author": [
            {
                "family_name": "Neumann",
                "given_name": "Adam Patrick",
                "orcid": "0000-0002-2961-7640",
                "clpid": "Neumann-Adam-Patrick"
            }
        ],
        "thesis_advisor": [
            {
                "family_name": "Roukes",
                "given_name": "Michael Lee",
                "orcid": "0000-0002-2916-6026",
                "clpid": "Roukes-M-L"
            }
        ],
        "thesis_committee": [
            {
                "family_name": "Phillips",
                "given_name": "Robert B.",
                "orcid": "0000-0003-3082-2809",
                "clpid": "Phillips-R"
            },
            {
                "family_name": "Roukes",
                "given_name": "Michael Lee",
                "orcid": "0000-0002-2916-6026",
                "clpid": "Roukes-M-L"
            },
            {
                "family_name": "Beauchamp",
                "given_name": "Jesse L.",
                "orcid": "0000-0001-8839-4822",
                "clpid": "Beauchamp-J-L"
            },
            {
                "family_name": "Thomson",
                "given_name": "Matthew",
                "orcid": "0000-0003-1021-1234",
                "clpid": "Thomson-M-W"
            },
            {
                "family_name": "Sader",
                "given_name": "John E.",
                "orcid": "0000-0002-7096-0627",
                "clpid": "Sader-J-E"
            }
        ],
        "local_group": [
            {
                "literal": "div_bbe"
            }
        ],
        "abstract": "<p>We incorporate nanoelectromechanical systems (NEMS) into a state-of-the-art commercial mass spectrometer (Q Exactive Plus with Orbitrap detection). This unique hybrid instrument is capable of ionizing molecules up to 4.5 MDa in their intact native state, isolating molecules of interest according to their mass-to-charge ratio, performing high resolution mass spectrometry (MS), and delivering those molecules to the NEMS. We use NEMS optimized for detecting the inertial mass of adsorbed species directly, which contrasts with indirect measurements of the mass-to-charge ratio performed with typical instruments. This unique form of mass spectrometry, NEMS-MS, with its single-molecule sensitivity, has promising applications to the fields of proteomics and native mass spectrometry, including deep proteomic profiling, single-cell proteomics, mass spectrometry-based imaging, or identifying viruses in their <i>in vivo</i> state.</p>\r\n\r\n<p>We analyze intact <i>E. coli</i> GroEL chaperonin, a noncovalent 801 kDa complex consisting of 14 identical subunits. GroEL was sent to NEMS operated with the first two vibrational modes monitored in real time. Molecules physisorbing to the NEMS cause an abrupt shift in its resonance frequencies. The change in resonance frequencies is used to calculate the mass of each molecule. A mass spectrum is compiled with a main peak of 846 kDa, close to the expected value, and a secondary peak resolved near twice the mass of GroEL.</p>\r\n<p>Measurements are then performed operating the first three modes simultaneously. Using a technique called inertial imaging, frequency shifts are used to calculate the first three mass moments: mass, position, and variance (size). 
This is used to distinguish between adsorbates arriving in a single, point-like distribution or a more extended distribution, thus demonstrating a rudimentary form of molecular imaging.</p>\r\n\r\n<p>Two new theories are presented for analyzing frequency-shift data. The first offers a more streamlined approach for calculating the mass moments. This approach is used to improve the mass spectrum of GroEL calculated using three-mode data, producing a main peak almost fully resolved at 805 kDa. An entirely different approach is presented that allows for obtaining the mass density distribution of an adsorbed molecule (i.e., imaging) with a higher number of modes.</p>",
        "doi": "10.7907/n4ap-7h91",
        "publication_date": "2020",
        "thesis_type": "phd",
        "thesis_year": "2020"
    },
    {
        "id": "thesis:11226",
        "collection": "thesis",
        "collection_id": "11226",
        "cite_using_url": "https://resolver.caltech.edu/CaltechTHESIS:10102018-143313907",
        "type": "thesis",
        "title": "Statistical Methods for Gene Differential Expression Analysis of RNA-Sequencing",
        "author": [
            {
                "family_name": "Yi",
                "given_name": "Lynn Donglin",
                "orcid": "0000-0003-4575-0158",
                "clpid": "Yi-Lynn-Donglin"
            }
        ],
        "thesis_advisor": [
            {
                "family_name": "Pachter",
                "given_name": "Lior S.",
                "orcid": "0000-0002-9164-6231",
                "clpid": "Pachter-L"
            }
        ],
        "thesis_committee": [
            {
                "family_name": "Chan",
                "given_name": "David C.",
                "orcid": "0000-0002-0191-2154",
                "clpid": "Chan-D-C"
            },
            {
                "family_name": "Thomson",
                "given_name": "Matthew",
                "orcid": "0000-0003-1021-1234",
                "clpid": "Thomson-M-W"
            },
            {
                "family_name": "Pachter",
                "given_name": "Lior S.",
                "orcid": "0000-0002-9164-6231",
                "clpid": "Pachter-L"
            },
            {
                "family_name": "Chandrasekaran",
                "given_name": "Venkat",
                "clpid": "Chandrasekaran-V"
            }
        ],
        "local_group": [
            {
                "literal": "div_bbe"
            }
        ],
        "abstract": "<p>RNA-Sequencing (\"RNA-Seq\") is performed to measure gene expression, often to ask the question of what genes are differentially expressed across various biological conditions. Statistical methods have been used to model RNA-Seq quantifications in order to determine differential expression, and have traditionally been divided into gene-level methods and transcript-level methods. There has been little attempt to connect the statistical divide, although transcript expression and gene expression are biologically inextricably linked. In this thesis, we provide a case study of a comparative differential expression analysis, demonstrating that many differential expression events happen at the isoform level, and that performing an analysis using only summarized gene quantifications would fail to capture these events. Furthermore, we develop statistical methods that unify the transcript-level and gene-level analysis. In bulk RNA-Seq, by using p-value aggregation methods, we are able to translate transcript-level results into gene-level results under a unified framework. For single cell RNA-Seq, we propose using multiple logistic regression, leveraging the high dimensionality of the data in order to determine if the transcript quantifications pertaining to a gene are able to constitute a linear discriminant for cell type. This method combines differential transcript expression analysis and differential gene expression analysis into a unified framework which we call \u201cgene differential expression.\u201d Lastly, we demonstrate that our methods could be used on transcript compatibility counts instead of transcript quantifications in order to bypass ambiguous read assignment and improve accuracy. We show that transcript compatibility counts obtained via transcriptome pseudoalignment are comparable in quantification accuracy to quantifications from genome alignment methods.</p>",
        "doi": "10.7907/0YE6-2217",
        "publication_date": "2019",
        "thesis_type": "phd",
        "thesis_year": "2019"
    },
    {
        "id": "thesis:11161",
        "collection": "thesis",
        "collection_id": "11161",
        "cite_using_url": "https://resolver.caltech.edu/CaltechTHESIS:08262018-213846283",
        "primary_object_url": {
            "basename": "vipul_singhal_thesis_2018.pdf",
            "content": "final",
            "filesize": 5525152,
            "license": "other",
            "mime_type": "application/pdf",
            "url": "/11161/1/vipul_singhal_thesis_2018.pdf",
            "version": "v8.0.0"
        },
        "type": "thesis",
        "title": "Modeling, Computation, and Characterization to Accelerate the Development of Synthetic Gene Circuits in Cell-Free Extracts",
        "author": [
            {
                "family_name": "Singhal",
                "given_name": "Vipul",
                "orcid": "0000-0003-1670-1824",
                "clpid": "Singhal-Vipul"
            }
        ],
        "thesis_advisor": [
            {
                "family_name": "Murray",
                "given_name": "Richard M.",
                "clpid": "Murray-R-M"
            }
        ],
        "thesis_committee": [
            {
                "family_name": "Winfree",
                "given_name": "Erik",
                "clpid": "Winfree-E"
            },
            {
                "family_name": "Goentoro",
                "given_name": "Lea A.",
                "clpid": "Goentoro-L-A"
            },
            {
                "family_name": "Thomson",
                "given_name": "Matthew",
                "clpid": "Thomson-M-W"
            },
            {
                "family_name": "Murray",
                "given_name": "Richard M.",
                "clpid": "Murray-R-M"
            }
        ],
        "local_group": [
            {
                "literal": "div_bbe"
            }
        ],
        "abstract": "<p>Synthetic biology may be defined as an attempt at using engineering principles to design and build novel biological functionalities. An important class of such functionalities involves the bottom-up design of genetic networks (or 'circuits') to control cellular behavior. Performing design iterations on these circuits in vivo is often a time-consuming process. One approach that has been developed to address these long design times is to use E. coli cell extracts as simplified circuit prototyping environments. The analogy with similar approaches in engineering, such as prototyping using wind tunnels and breadboards, may be extended by developing accompanying computer-aided design tools. In this thesis, we discuss the development of computational and mathematical tools to accelerate circuit prototyping in the TX-TL cell-free prototyping platform, and demonstrate some applications of these tools.</p>\r\n\r\n<p>We start by discussing the problem of reducing circuit behavior variability between different batches of TX-TL cell extracts. To this end, we demonstrate a model-based methodology for calibrating extract batches, and for using the calibrations to 'correct' the behavior of genetic circuits between batches. We also look at the interaction of this methodology with the phenomenon of parameter non-identifiability, which occurs when the parameter identification inverse problem has multiple solutions. In particular, we derive conditions under which parameter non-identifiability does not hinder our modeling objectives, and subsequently demonstrate the use of such non-identifiable models in performing data variability reduction.</p> \r\n\r\n<p>Next, we describe <b>txtlsim</b>, a MATLAB Simbiology-based toolbox for automatically generating models of genetic circuits in TX-TL, and for using these models for part characterization and circuit behavior prediction. 
Large genetic circuits can have non-negligible resource usage needs, leading to unintended interactions between circuit nodes arising due to the loading of cellular machinery, transcription factors or other regulatory elements. The usage of consumable resources like nucleotides and amino acids can also have non-trivial effects on complex genetic circuits. These types of effects are handled by the modeling framework of <b>txtlsim</b> in a natural way.</p>\r\n\r\n<p>We also highlight <b>mcmc-simbio</b>, a smaller toolbox within <b>txtlsim</b> for performing concurrent Bayesian parameter inference on Simbiology models. Concurrent inference here means that a common set of parameters can be identified using data from an ensemble of different circuits and experiments, with each experiment informing a subset of the parameters. The combination of the concurrence feature with the fact that Markov chain Monte Carlo based Bayesian inference methods allow for the direct visualization of parameter non-identifiability enables the design of ensembles of experiments that reduce such non-identifiability.</p>\r\n\r\n<p>Finally, we end with a method for performing model order reduction on transcription and translation elongation models while maintaining the ability of these models to track resource consumption. We show that due to their network topology, our models cannot be brought into the two-timescale form of singular perturbation theory when written in species concentration coordinates. We identify a coordinate system in which singular perturbation theory may be applied to chemical reaction networks more naturally, and use this to achieve the desired model reduction.</p>",
        "doi": "10.7907/g31j-ch52",
        "publication_date": "2019",
        "thesis_type": "phd",
        "thesis_year": "2019"
    },
    {
        "id": "thesis:11243",
        "collection": "thesis",
        "collection_id": "11243",
        "cite_using_url": "https://resolver.caltech.edu/CaltechTHESIS:10232018-150005837",
        "primary_object_url": {
            "basename": "AngelesAlbores_David_2019.pdf",
            "content": "final",
            "filesize": 8920493,
            "license": "other",
            "mime_type": "application/pdf",
            "url": "/11243/1/AngelesAlbores_David_2019.pdf",
            "version": "v6.0.0"
        },
        "type": "thesis",
        "title": "A Theory of Genetic Analysis Using Transcriptomic Phenotypes",
        "author": [
            {
                "family_name": "Angeles-Albores",
                "given_name": "David",
                "orcid": "0000-0001-5497-8264",
                "clpid": "Angeles-Albores-David"
            }
        ],
        "thesis_advisor": [
            {
                "family_name": "Sternberg",
                "given_name": "Paul W.",
                "orcid": "0000-0002-7699-0173",
                "clpid": "Sternberg-P-W"
            }
        ],
        "thesis_committee": [
            {
                "family_name": "Newman",
                "given_name": "Dianne K.",
                "orcid": "0000-0003-1647-1918",
                "clpid": "Newman-D-K"
            },
            {
                "family_name": "Meyerowitz",
                "given_name": "Elliot M.",
                "orcid": "0000-0003-4798-5153",
                "clpid": "Meyerowitz-E-M"
            },
            {
                "family_name": "Thomson",
                "given_name": "Matthew",
                "orcid": "0000-0003-1021-1234",
                "clpid": "Thomson-M-W"
            },
            {
                "family_name": "Sternberg",
                "given_name": "Paul W.",
                "orcid": "0000-0002-7699-0173",
                "clpid": "Sternberg-P-W"
            }
        ],
        "local_group": [
            {
                "literal": "WormBase"
            },
            {
                "literal": "div_bbe"
            }
        ],
        "abstract": "<p>This thesis deals with the conceptual and computational framework required to use transcriptomes as effective phenotypes for genetic analysis. I demonstrate that there are powerful theoretical reasons why Batesonian epistasis should feature prominently in transcriptional phenotypes. I also show how to compute and interpret the aggregate statistics for transcriptome-wide epistasis and transcriptome-wide dominance using whole-organism transcriptomic profiles of C. elegans mutants. Finally, I developed the WormBase Enrichment Suite for enrichment analysis of genomic data.</p>\r\n\r\n<p>RNA-seq as a tool has enormous potential because it relies on protocols that are fast, simple and increasingly cheap. In spite of their potential, transcriptomes have seen their use largely limited to single-factor experiments. Even when many transcriptomes are collected, the main analytic approach is to apply clustering algorithms that correlate responses but do not have any power to identify causal mechanisms.</p>\r\n\r\n<p>I demonstrate that if a complete genetic experimental design is used (in the form of a full two-factor matrix), transcriptomes can establish genetic interactions between a pair of genes without the need for clustering algorithms. Surprisingly, when we performed epistasis analyses of hypoxia pathway mutants in C. elegans we did not simply observe a generalized epistatic interaction between the mutants. In fact, the transcriptomes recapitulated the same Batesonian epistatic relationship that had been observed using classical phenotypes. In other words, we observed that the transcriptomic phenotype of one gene can be masked by the transcriptomic phenotype of a second gene, such that a double mutant of these two genes has exactly the same phenotype as a single mutant of the epistatic gene. Motivated by this observation, we developed methods to recognize and interpret Batesonian epistasis at the transcriptomic level. 
This method relies on the calculation of a single aggregate coefficient that we named the transcriptome-wide epistasis coefficient.</p>\r\n\r\n<p>The observation that Batesonian epistasis could be reproduced on a transcriptomic level was surprising. To explain how transcriptome-wide epistasis can arise, I studied a simplified model of transcriptional regulation using statistical mechanics. These studies demonstrate that epistatic analysis is equivalent to a perturbative analysis of the partition function of a promoter. Moreover, these studies revealed that a sufficient condition for Batesonian epistasis to occur is if the two genes encode variables that are transformed and multiplied together to form an effective single compound variable. Finally, these studies clearly demonstrate the connection between statistical (or generalized) epistasis and Batesonian epistasis and establish a physical basis for genetic logic.</p>\r\n\r\n<p>Genetic analyses of gene functional units can also be carried out using allelic series in tandem with complementation (also known as dominance) tests. I developed a statistical coefficient known as transcriptome-wide dominance to enable analyses of allelic series using expression profiles. A crucial aspect of allelic series is the ability to enumerate the independent phenotypes associated with an arbitrary set of alleles. I developed the concept of phenotypic classes as a transcriptomic analogue of classical phenotypes for this purpose. Briefly, a phenotypic class is a set of transcripts that are differentially expressed in a specific set of genotypes. Thus, an allelic series consisting of two mutant alleles (and a wild-type) can at most result in 7 phenotypic classes. However, some of these phenotypic classes may be artifactual as a result of the significant false positive and false negative rates that are associated with RNA-seq. 
I developed a simple algorithm that tries to identify phenotypic classes that are artifactual, though often these classes may also be identified through a critical evaluation of their biological implications. I applied these concepts to a small allelic series of the dpy-22 gene, which encodes a Mediator subunit in C. elegans, and identified 3\u20134 functional units along with their sequence requirements.</p>\r\n\r\n<p>Finally, I developed the WormBase Enrichment Suite by implementing a hypergeometric test on the tissue, gene and phenotype ontologies for C. elegans. The importance of this tool derives mainly from its integration into WormBase, the repository of all C. elegans knowledge, which means that the databases that are tested will undergo continuous improvement and curation, and thus will yield the most accurate results.</p>",
        "doi": "10.7907/JRNS-NS05",
        "publication_date": "2019",
        "thesis_type": "phd",
        "thesis_year": "2019"
    },
    {
        "id": "thesis:11396",
        "collection": "thesis",
        "collection_id": "11396",
        "cite_using_url": "https://resolver.caltech.edu/CaltechTHESIS:02192019-004236200",
        "type": "thesis",
        "title": "The Changing Mouse Embryo Transcriptome at Whole Tissue and Single-Cell Resolution",
        "author": [
            {
                "family_name": "He",
                "given_name": "Peng",
                "orcid": "0000-0002-2457-3554",
                "clpid": "He-Peng"
            }
        ],
        "thesis_advisor": [
            {
                "family_name": "Wold",
                "given_name": "Barbara J.",
                "orcid": "0000-0003-3235-8130",
                "clpid": "Wold-B-J"
            }
        ],
        "thesis_committee": [
            {
                "family_name": "Pachter",
                "given_name": "Lior S.",
                "orcid": "0000-0002-9164-6231",
                "clpid": "Pachter-L"
            },
            {
                "family_name": "Sternberg",
                "given_name": "Paul W.",
                "orcid": "0000-0002-7699-0173",
                "clpid": "Sternberg-P-W"
            },
            {
                "family_name": "Fejes Toth",
                "given_name": "Katalin",
                "orcid": "0000-0001-6558-2636",
                "clpid": "Fejes-Toth-K"
            },
            {
                "family_name": "Thomson",
                "given_name": "Matthew",
                "orcid": "0000-0003-1021-1234",
                "clpid": "Thomson-M-W"
            },
            {
                "family_name": "Guttman",
                "given_name": "Mitchell",
                "orcid": "0000-0003-4748-9352",
                "clpid": "Guttman-M"
            },
            {
                "family_name": "Wold",
                "given_name": "Barbara J.",
                "orcid": "0000-0003-3235-8130",
                "clpid": "Wold-B-J"
            }
        ],
        "local_group": [
            {
                "literal": "div_bbe"
            }
        ],
        "abstract": "<p>Mammalian histogenesis is a sophisticated process of coordinated changes of cellular composition governed by selective gene expression. This thesis focuses on the systematic application of modern RNA-seq methods to histogenesis processes in developing mouse embryos. Most of the work presented here is conducted as part of the ENCODE (ENCyclopedia Of DNA Elements) Project. Chapter 1 introduces the current advances of transcriptome studies on tissue development. Chapter 2 discusses a large-scale study on the whole-tissue transcriptome of 12 embryonic tissues at up to 8 time points and 5 additional perinatal tissues. Coherent themes of biological function and underlying regulatory mechanisms are revealed from the large-scale analysis. Chapter 3 presents a high-resolution single-cell RNAseq study focused on the developing forelimb of the mouse embryo. This approach enables the assignment of differential genes to corresponding lineages and provides an even more accurate picture of RNA level patterns and regulatory modes. Finally, whole-tissue and single-cell methods are compared, contrasted, and integrated in Chapter 4 to extrapolate from the main discoveries of this thesis.</p>",
        "doi": "10.7907/35S4-HG18",
        "publication_date": "2019",
        "thesis_type": "phd",
        "thesis_year": "2019"
    },
    {
        "id": "thesis:10623",
        "collection": "thesis",
        "collection_id": "10623",
        "cite_using_url": "https://resolver.caltech.edu/CaltechTHESIS:01052018-221609680",
        "type": "thesis",
        "title": "Quantitative Dissection of the Allosteric and Sequence-Dependent Regulatory Genome in E. coli",
        "author": [
            {
                "family_name": "Belliveau",
                "given_name": "Nathan Maurice",
                "orcid": "0000-0002-1536-1963",
                "clpid": "Belliveau-Nathan-Maurice"
            }
        ],
        "thesis_advisor": [
            {
                "family_name": "Phillips",
                "given_name": "Robert B.",
                "orcid": "0000-0003-3082-2809",
                "clpid": "Phillips-R"
            }
        ],
        "thesis_committee": [
            {
                "family_name": "Phillips",
                "given_name": "Robert B.",
                "orcid": "0000-0003-3082-2809",
                "clpid": "Phillips-R"
            },
            {
                "family_name": "Elowitz",
                "given_name": "Michael B.",
                "orcid": "0000-0002-1221-0967",
                "clpid": "Elowitz-M-B"
            },
            {
                "family_name": "Thomson",
                "given_name": "Matthew",
                "orcid": "0000-0003-1021-1234",
                "clpid": "Thomson-M-W"
            },
            {
                "family_name": "Van Valen",
                "given_name": "David A.",
                "orcid": "0000-0001-7534-7621",
                "clpid": "Van-Valen-D"
            }
        ],
        "local_group": [
            {
                "literal": "div_bbe"
            }
        ],
        "abstract": "<p>Transcriptional regulation of gene expression is one of the most ubiquitous processes in biology. But while the catalog of bacterial genomes continues to expand rapidly, we remain ignorant about how almost all of the genes in these genomes are regulated. One of the ways genes are regulated is through external signals. To that end, we begin by presenting a general theory of allosteric transcriptional regulation using a statistical formulation of the Monod-Wyman-Changeux model, which we rigorously test using the ubiquitous simple repression motif in <i>Escherichia coli</i>.  We then move to consider the consequence of the regulatory sequences themselves on gene expression. Here we apply a massively parallel reporter assay, Sort-Seq, to build models that describe the sequence-dependent binding energies of transcription factors and RNA polymerase to DNA. By coupling such models to our thermodynamic models of regulation, we construct a genotype to phenotype mapping that predicts gene expression as a function of regulatory sequence. We first  demonstrate this approach in the context of the allosteric simple repression motif, and then show how it can be applied broadly across a bacterial genome, in conjunction with mass spectrometry, to uncover how genes are regulated.</p>",
        "doi": "10.7907/Z9DN438T",
        "publication_date": "2018",
        "thesis_type": "phd",
        "thesis_year": "2018"
    },
    {
        "id": "thesis:10958",
        "collection": "thesis",
        "collection_id": "10958",
        "cite_using_url": "https://resolver.caltech.edu/CaltechTHESIS:05292018-133205686",
        "primary_object_url": {
            "basename": "barnes_stephanie_2018.pdf",
            "content": "final",
            "filesize": 61047841,
            "license": "other",
            "mime_type": "application/pdf",
            "url": "/10958/1/barnes_stephanie_2018.pdf",
            "version": "v4.0.0"
        },
        "type": "thesis",
        "title": "Decoding the Regulatory Genome: Quantitative Analysis of Transcriptional Regulation in Escherichia coli",
        "author": [
            {
                "family_name": "Barnes",
                "given_name": "Stephanie Loos",
                "orcid": "0000-0002-5237-603X",
                "clpid": "Barnes-Stephanie-Loos"
            }
        ],
        "thesis_advisor": [
            {
                "family_name": "Phillips",
                "given_name": "Robert B.",
                "orcid": "0000-0003-3082-2809",
                "clpid": "Phillips-R"
            }
        ],
        "thesis_committee": [
            {
                "family_name": "Newman",
                "given_name": "Dianne K.",
                "orcid": "0000-0003-1647-1918",
                "clpid": "Newman-D-K"
            },
            {
                "family_name": "Elowitz",
                "given_name": "Michael B.",
                "orcid": "0000-0002-1221-0967",
                "clpid": "Elowitz-M-B"
            },
            {
                "family_name": "Thomson",
                "given_name": "Matthew",
                "orcid": "0000-0003-1021-1234",
                "clpid": "Thomson-M-W"
            },
            {
                "family_name": "Phillips",
                "given_name": "Robert B.",
                "orcid": "0000-0003-3082-2809",
                "clpid": "Phillips-R"
            }
        ],
        "local_group": [
            {
                "literal": "div_bbe"
            }
        ],
        "abstract": "<p>Over the past decades DNA sequencing has become significantly cheaper and faster, which has enabled the accumulation of a huge amount of genomic data. However, much of this genomic data is illegible to us. For noncoding regions of the genome in particular, it is difficult to determine what role is played by specific DNA sequences. Here we focus on regions of DNA that play a role in transcriptional regulation. We develop models and techniques that allow us to discover new regulatory sequences and better understand how DNA sequence determines regulatory output.</p>\r\n\r\n<p>We start by considering how quantitative models serve as a powerful tool for testing our understanding of biological systems. We apply a statistical mechanical framework that incorporates the Monod-Wyman-Changeux model to analyze the effects of allostery in simple repression, using the lac operon as a test case. By fitting our model to experimental data, we are able to determine the values of the unknown parameter values in our model. We then show that we can use the model to accurately predict the induction responses of an array of simple repression constructs with a variety of repressor copy numbers and repressor binding energies.</p>\r\n\r\n<p>Next, we consider how the DNA sequence of a promoter region can provide details about how the promoter is regulated. We begin by describing an approach for discovering regulatory architectures for promoters whose regulation has not previously been studied. We focus on six promoters from E. coli including three well-studied promoters (rel, mar, and lac) to serve as test cases. We use the massively parallel reporter assay Sort-Seq to identify transcription factor binding sites with base-pair resolution, determine the regulatory role of each binding site, and infer energy matrices for each binding site. 
Then, we use DNA affinity chromatography and mass spectrometry to identify each transcription factor.</p>\r\n\r\n<p>We conclude with an in vivo approach for analyzing the sequence-dependence of transcription factor binding energies. Again using Sort-Seq, we show that we can represent transcription factor binding sites using energy matrices in absolute energy units. We then show that these energy matrices can be used to accurately predict the binding energies of mutated binding sites. We provide several examples of how understanding the relationship between DNA sequence and transcription factor binding provides us with a foundation for addressing additional scientific topics, such as the co-evolution of transcription factors and their binding sites.</p>",
        "doi": "10.7907/D13T-7868",
        "publication_date": "2018",
        "thesis_type": "phd",
        "thesis_year": "2018"
    }
]