[ { "id": "https://authors.library.caltech.edu/records/4bdzc-r3y87", "eprint_status": "archive", "datestamp": "2024-02-01 22:33:10", "lastmod": "2024-02-01 22:34:19", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Gorin-Gennady", "name": { "family": "Gorin", "given": "Gennady" }, "orcid": "0000-0001-6097-2029" }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "New and notable: Revisiting the \"two cultures\" through extrinsic noise", "ispublished": "pub", "full_text_status": "public", "keywords": "Biophysics", "note": "
G.G. and L.P. wrote the article. G.G. implemented the simulation in Fig. 1.
\n\nThe authors declare no competing interests.
\n\nIn a classic article (1), Leo Breiman bears witness to the divergence between “two cultures” of statistics that emerged in the wake of readily accessible computing technology: the data modeling culture, which concerns itself with developing and fitting stochastic models, and the algorithmic modeling culture, which concerns itself with improving predictive accuracy without delving into unknown (and perhaps unknowable) mechanisms. More than two decades later, the distinct cultures of statistics are evident in approaches to single-molecule transcriptomics. The biophysics subfield focuses on assays that target a small number of genes and develops increasingly sophisticated mechanistic models, whereas the sequence census subfield uses descriptive, data-scientific methods such as those championed by Breiman.
", "date": "2024-01-02", "date_type": "published", "publication": "Biophysical Journal", "volume": "123", "number": "1", "publisher": "Cell Press", "pagerange": "1-3", "issn": "0006-3495", "official_url": "https://authors.library.caltech.edu/records/4bdzc-r3y87", "funders": { "items": [ {} ] }, "local_group": { "items": [ { "id": "Division-of-Biology-and-Biological-Engineering" } ] }, "doi": "10.1016/j.bpj.2023.11.3400", "resource_type": "article", "pub_year": "2024", "author_list": "Gorin, Gennady and Pachter, Lior" }, { "id": "https://authors.library.caltech.edu/records/axtf4-cb576", "eprint_status": "archive", "datestamp": "2024-01-12 20:13:49", "lastmod": "2024-01-12 20:13:49", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Booeshaghi-A-Sina", "name": { "family": "Booeshaghi", "given": "A. Sina" }, "orcid": "0000-0002-6442-4502" }, { "name": { "family": "Min", "given": "Kyung Hoi (Joseph)" }, "orcid": "0000-0003-0894-4017" }, { "id": "Gehring-Jase", "name": { "family": "Gehring", "given": "Jase" }, "orcid": "0000-0002-3894-9495" }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "Quantifying orthogonal barcodes for sequence census assays", "ispublished": "pub", "full_text_status": "public", "keywords": "Computer Science Applications; Genetics; Molecular Biology; Structural Biology", "note": " \n\n\nSupplementary data are available at Bioinformatics Advances online.
\nNone declared.
\nBarcode-based sequence census assays utilize custom or random oligonucleotide sequences to label various biological features, such as cell-surface proteins or CRISPR perturbations. These assays all rely on barcode quantification, a task that is complicated by barcode design and technical noise. We introduce a modular approach to quantifying barcodes that achieves speed and memory improvements over existing tools. We also introduce a set of quality control metrics, and an accompanying tool, for validating barcode designs.
\n\u00a9 2023 Elsevier.
\n\nThe authors thank members of the McMahon laboratory for helpful comments on experimental design and members of the Kim laboratory for useful discussion on single-cell analyses. We thank Dr. Ron Korstanje for orientating us to, and sharing data from, his group's analysis of multiple organs in a diversity outbred cross. Work in A.P.M.'s laboratory is supported by a grant from the National Institutes of Health (R01 DK126925). A.L.M. acknowledges support from the National Institutes of Health (R35GM143019) and the National Science Foundation (DMS2045327).
\n\nA.P.M. conceived the study. Funding support was generated by A.P.M., J.K., and L.P. Data were collected by J.L., K.K., and J.-J.G. and analyzed by L.X., J.L., S.Y.H., M.R., Z.M., F.G., and I.B.H. in consultation with A.P.M., A.L.M., J.K., and L.P. L.X., J.L., and A.P.M. wrote the manuscript incorporating comments from all participants.
\n\nMammalian organs exhibit distinct physiology, disease susceptibility, and injury responses between the sexes. In the mouse kidney, sexually dimorphic gene activity maps predominantly to proximal tubule (PT) segments. Bulk RNA sequencing (RNA-seq) data demonstrated that sex differences were established from 4 and 8 weeks after birth under gonadal control. Hormone injection studies and genetic removal of androgen and estrogen receptors demonstrated androgen receptor (AR)-mediated regulation of gene activity in PT cells as the regulatory mechanism. Interestingly, caloric restriction feminizes the male kidney. Single-nuclear multiomic analysis identified putative cis-regulatory regions and cooperating factors mediating PT responses to AR activity in the mouse kidney. In the human kidney, a limited set of genes showed conserved sex-linked regulation, whereas analysis of the mouse liver underscored organ-specific differences in the regulation of sexually dimorphic gene expression. These findings raise interesting questions on the evolution, physiological significance, disease, and metabolic linkage of sexually dimorphic gene activity.
", "date": "2023-11-06", "date_type": "published", "publication": "Developmental Cell", "volume": "58", "number": "21", "publisher": "Cell Press", "pagerange": "2338-2358.e5", "issn": "1878-1551", "official_url": "https://authors.library.caltech.edu/records/ntw84-7bx60", "funders": { "items": [ { "grant_number": "R01 DK126925" }, { "grant_number": "R35GM143019" }, { "grant_number": "DMS-2045327" } ] }, "local_group": { "items": [ { "id": "Division-of-Biology-and-Biological-Engineering" } ] }, "doi": "10.1016/j.devcel.2023.08.010", "primary_object": { "basename": "ScienceDirect_files_31Oct2023_17-40-25.611.zip", "url": "https://authors.library.caltech.edu/records/ntw84-7bx60/files/ScienceDirect_files_31Oct2023_17-40-25.611.zip" }, "resource_type": "article", "pub_year": "2023", "author_list": "Xiong, Lingyun; Liu, Jing; et el." }, { "id": "https://authors.library.caltech.edu/records/5z5v2-jjy66", "eprint_status": "archive", "datestamp": "2023-10-30 21:36:43", "lastmod": "2024-01-09 22:18:57", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Gorin-Gennady", "name": { "family": "Gorin", "given": "Gennady" }, "orcid": "0000-0001-6097-2029" }, { "id": "Yoshida-Shawn-R", "name": { "family": "Yoshida", "given": "Shawn" }, "orcid": "0000-0002-0866-2741" }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "Assessing Markovian and Delay Models for Single-Nucleus RNA Sequencing", "ispublished": "pub", "full_text_status": "public", "keywords": "Computational Theory and Mathematics; General Agricultural and Biological Sciences; Pharmacology; General Environmental Science; General Biochemistry, Genetics and Molecular Biology; General Mathematics; Immunology; General Neuroscience", "note": "Springer Nature or its licensor (e.g. 
a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
\n\nG.G. thanks Dr. John J. Vastola and Catherine Felce for valuable discussions. The MCMC inference procedure was based on the algorithm developed by Dr. John J. Vastola and Meichen Fang (Gorin et al. 2022). G.G. and L.P. were partially funded by the National Institutes of Health Grants U19MH114830 and 5UM1HG012077-02. S.Y. was supported by the National Science Foundation Graduate Research Fellowship under Grant No. DGE-1745301. Some of the reported results were obtained during a Data Sciences Co-op with Celsius Therapeutics, Inc. The DNA and RNA illustrations are derived from the DNA Twemoji by Twitter, Inc., used under CC-BY 4.0.
\n\nThe authors declare no conflict of interests.
", "abstract": "The serial nature of reactions involved in the RNA life-cycle motivates the incorporation of delays in models of transcriptional dynamics. The models couple a transcriptional process to a fairly general set of delayed monomolecular reactions with no feedback. We provide numerical strategies for calculating the RNA copy number distributions induced by these models, and solve several systems with splicing, degradation, and catalysis. An analysis of single-cell and single-nucleus RNA sequencing data using these models reveals that the kinetics of nuclear export do not appear to require invocation of a non-Markovian waiting time.
", "date": "2023-11", "date_type": "published", "publication": "Bulletin of Mathematical Biology", "volume": "85", "number": "11", "publisher": "Springer Nature", "pagerange": "114", "issn": "0092-8240", "official_url": "https://authors.library.caltech.edu/records/5z5v2-jjy66", "funders": { "items": [ { "agency": "National Institutes of Health", "grant_number": "U19MH114830" }, { "agency": "NIH", "grant_number": "5UM1HG012077-02" }, { "grant_number": "DGE-1745301" } ] }, "local_group": { "items": [ { "id": "Division-of-Biology-and-Biological-Engineering" } ] }, "doi": "10.1007/s11538-023-01213-9", "primary_object": { "basename": "11538_2023_1213_MOESM1_ESM.pdf", "url": "https://authors.library.caltech.edu/records/5z5v2-jjy66/files/11538_2023_1213_MOESM1_ESM.pdf" }, "related_objects": [ { "basename": "11538_2023_1213_MOESM2_ESM.xlsx", "url": "https://authors.library.caltech.edu/records/5z5v2-jjy66/files/11538_2023_1213_MOESM2_ESM.xlsx" } ], "resource_type": "article", "pub_year": "2023", "author_list": "Gorin, Gennady; Yoshida, Shawn; et el." }, { "id": "https://authors.library.caltech.edu/records/x9gbd-0gk44", "eprint_status": "archive", "datestamp": "2023-10-30 20:31:47", "lastmod": "2024-01-09 22:20:27", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Gorin-Gennady", "name": { "family": "Gorin", "given": "Gennady" }, "orcid": "0000-0001-6097-2029" }, { "id": "Vastola-John-J", "name": { "family": "Vastola", "given": "John J." }, "orcid": "0000-0002-5625-2106" }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "Studying stochastic systems biology of the cell with single-cell genomics data", "ispublished": "pub", "full_text_status": "public", "keywords": "Cell Biology; Histology; Pathology and Forensic Medicine", "note": "\u00a9 2023 Elsevier.
\n\nG.G. and L.P. were partially funded by NIH 5UM1HG012077-02 and NIH U19MH114830. J.J.V. was partially funded by NIH 1U19NS118246-01. The RNA, DNA, and cDNA illustrations were derived from the DNA Twemoji by Twitter, Inc., used under the CC-BY 4.0 license. The authors thank Dr. A. Sina Booeshaghi, Maria Carilli, Tara Chari, Taleen Dilanyan, Dr. Kristj\u00e1n Eldj\u00e1rn Hj\u00f6rleifsson, Meichen Fang, Catherine Felce, and Delaney Sullivan for fruitful discussions of co-regulation, contamination, transient behaviors, catalysis, fragmentation, genomic alignment, and a variety of other phenomena and processes. Part of this work was performed during G.G.'s Data Sciences Co-op with Celsius Therapeutics, Inc.
\n\nG.G. performed all computational experiments. G.G. and J.J.V. developed the theoretical framework. All authors conceptualized the work and wrote the manuscript.
\n\nThe authors declare no competing interests.
Recent experimental developments in genome-wide RNA quantification hold considerable promise for systems biology. However, rigorously probing the biology of living cells requires a unified mathematical framework that accounts for single-molecule biological stochasticity in the context of technical variation associated with genomics assays. We review models for a variety of RNA transcription processes, as well as the encapsulation and library construction steps of microfluidics-based single-cell RNA sequencing, and present a framework to integrate these phenomena by the manipulation of generating functions. Finally, we use simulated scenarios and biological data to illustrate the implications and applications of the approach.
", "date": "2023-10-18", "date_type": "published", "publication": "Cell Systems", "volume": "14", "number": "10", "publisher": "Cell Press", "pagerange": "822-843.e22", "issn": "2405-4712", "official_url": "https://authors.library.caltech.edu/records/x9gbd-0gk44", "funders": { "items": [ { "grant_number": "1U19NS118246-01" }, { "grant_number": "5UM1HG012077-02" }, { "grant_number": "U19MH114830" } ] }, "local_group": { "items": [ { "id": "Division-of-Biology-and-Biological-Engineering" } ] }, "doi": "10.1016/j.cels.2023.08.004", "pmcid": "PMC10725240", "primary_object": { "basename": "ScienceDirect_files_30Oct2023_20-30-14.752.zip", "url": "https://authors.library.caltech.edu/records/x9gbd-0gk44/files/ScienceDirect_files_30Oct2023_20-30-14.752.zip" }, "resource_type": "article", "pub_year": "2023", "author_list": "Gorin, Gennady; Vastola, John J.; et el." }, { "id": "https://authors.library.caltech.edu/records/hs9mc-jb762", "eprint_status": "archive", "datestamp": "2023-09-27 22:28:43", "lastmod": "2024-01-18 17:25:54", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Booeshaghi-A-Sina", "name": { "family": "Booeshaghi", "given": "A. Sina" }, "orcid": "0000-0002-6442-4502" }, { "id": "Beltrame-Eduardo-da-Veiga", "name": { "family": "Beltrame", "given": "Eduardo da Veiga" }, "orcid": "0000-0002-1529-9207" }, { "id": "Bannon-Dylan", "name": { "family": "Bannon", "given": "Dylan" } }, { "id": "Gehring-Jase", "name": { "family": "Gehring", "given": "Jase" }, "orcid": "0000-0002-3894-9495" }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "Author Correction: Principles of open source bioinstrumentation applied to the poseidon syringe pump system", "ispublished": "pub", "full_text_status": "public", "keywords": "Multidisciplinary", "note": "\u00a9 The Author(s) 2023. 
This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
", "abstract": "Correction to: Scientific Reports https://doi.org/10.1038/s41598-019-48815-9, published online 27 August 2019
This Article contains an error in Figure 4, where the replotting of a subset of data in Figure 4a, which pertain to the Harvard dataset, is incorrect in panels (1) and (3). The correct Figure 4 and accompanying legend appear below.
", "date": "2023-09-08", "date_type": "published", "publication": "Scientific Reports", "volume": "13", "publisher": "Nature Publishing Group", "pagerange": "14834", "issn": "2045-2322", "official_url": "https://authors.library.caltech.edu/records/hs9mc-jb762", "local_group": { "items": [ { "id": "Division-of-Biology-and-Biological-Engineering" } ] }, "doi": "10.1038/s41598-023-42035-y", "pmcid": "PMC10491597", "primary_object": { "basename": "s41598-023-42035-y.pdf", "url": "https://authors.library.caltech.edu/records/hs9mc-jb762/files/s41598-023-42035-y.pdf" }, "resource_type": "article", "pub_year": "2023", "author_list": "Booeshaghi, A. Sina; Beltrame, Eduardo da Veiga; et el." }, { "id": "https://authors.library.caltech.edu/records/ewrjt-pbk58", "eprint_status": "archive", "datestamp": "2023-11-09 22:35:21", "lastmod": "2024-01-09 22:20:30", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Jackson-Kayla-C", "name": { "family": "Jackson", "given": "Kayla C." }, "orcid": "0000-0001-6483-0108" }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "A standard for sharing spatial transcriptomics data", "ispublished": "pub", "full_text_status": "public", "keywords": "Genetics; Biochemistry, Genetics and Molecular Biology (miscellaneous)", "note": "\u00a9 2023 The Authors. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
\n\nThe authors declare no competing interests.
", "abstract": "Spatial transcriptomic technologies have the potential to reveal critical relationships between the function of genes and cells and their spatial organization. Here, we provide a sharing model for spatial transcriptomics data with the aim of establishing a set of primary data and metadata needed to reproduce analyses and facilitate computational methods development.
", "date": "2023-08-09", "date_type": "published", "publication": "Cell Genomics", "volume": "3", "number": "8", "publisher": "Cell Press", "pagerange": "100374", "issn": "2666-979X", "official_url": "https://authors.library.caltech.edu/records/ewrjt-pbk58", "local_group": { "items": [ { "id": "Division-of-Biology-and-Biological-Engineering" } ] }, "doi": "10.1016/j.xgen.2023.100374", "pmcid": "PMC10435375", "primary_object": { "basename": "ScienceDirect_files_09Nov2023_22-34-13.342.zip", "url": "https://authors.library.caltech.edu/records/ewrjt-pbk58/files/ScienceDirect_files_09Nov2023_22-34-13.342.zip" }, "related_objects": [ { "basename": "main.pdf", "url": "https://authors.library.caltech.edu/records/ewrjt-pbk58/files/main.pdf" } ], "resource_type": "article", "pub_year": "2023", "author_list": "Jackson, Kayla C. and Pachter, Lior" }, { "id": "https://authors.library.caltech.edu/records/fzh9v-hjh15", "eprint_status": "archive", "datestamp": "2023-11-09 20:03:17", "lastmod": "2024-01-09 22:20:47", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Chari-Tara", "name": { "family": "Chari", "given": "Tara" }, "orcid": "0000-0002-6953-4313" }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "The specious art of single-cell genomics", "ispublished": "pub", "full_text_status": "public", "keywords": "Computational Theory and Mathematics; Cellular and Molecular Neuroscience; Genetics; Molecular Biology; Ecology; Modeling and Simulation; Ecology, Evolution, Behavior and Systematics", "note": "\u00a9 2023 Chari, Pachter. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
\n\nL.P. received the National Institutes of Health (nih.gov) award U19MH114830, administered by the National Institute of Mental Health (nimh.nih.gov). T.C. and L.P. were partially funded by this award. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
\n\nData Availability: Download links for the original data used to generate the figures and results in the paper are listed in Table A in S1 Text. Processed and normalized versions of the count matrices are available on CaltechData, with links provided in Table B in S1 Text. All analysis code used to generate the figures and results in the paper is available at https://github.com/pachterlab/CP_2023 and deposited at Zenodo (DOI https://doi.org/10.5281/zenodo.8087950). Code is provided in Colab notebooks which can be run for free on the Google cloud.
\n\nThe authors have declared that no competing interests exist.
", "abstract": "Dimensionality reduction is standard practice for filtering noise and identifying relevant features in large-scale data analyses. In biology, single-cell genomics studies typically begin with reduction to 2 or 3 dimensions to produce \"all-in-one\" visuals of the data that are amenable to the human eye, and these are subsequently used for qualitative and quantitative exploratory analysis. However, there is little theoretical support for this practice, and we show that extreme dimension reduction, from hundreds or thousands of dimensions to 2, inevitably induces significant distortion of high-dimensional datasets. We therefore examine the practical implications of low-dimensional embedding of single-cell data and find that extensive distortions and inconsistent practices make such embeddings counter-productive for exploratory, biological analyses. In lieu of this, we discuss alternative approaches for conducting targeted embedding and feature exploration to enable hypothesis-driven biological discovery.", "date": "2023-08", "date_type": "published", "publication": "PLOS Computational Biology", "volume": "19", "number": "8", "publisher": "Public Library of Science", "pagerange": "e1011288", "issn": "1553-7358", "editors": { "items": [ { "name": { "family": "Papin", "given": "Jason A." 
} } ] }, "official_url": "https://authors.library.caltech.edu/records/fzh9v-hjh15", "funders": { "items": [ { "agency": "National Institutes of Health", "grant_number": "U19MH114830" } ] }, "local_group": { "items": [ { "id": "Division-of-Biology-and-Biological-Engineering" } ] }, "doi": "10.1371/journal.pcbi.1011288", "pmcid": "PMC10434946", "primary_object": { "basename": "pcbi.1011288.pdf", "url": "https://authors.library.caltech.edu/records/fzh9v-hjh15/files/pcbi.1011288.pdf" }, "related_objects": [ { "basename": "pcbi.1011288.s001.pdf", "url": "https://authors.library.caltech.edu/records/fzh9v-hjh15/files/pcbi.1011288.s001.pdf" } ], "resource_type": "article", "pub_year": "2023", "author_list": "Chari, Tara and Pachter, Lior" }, { "id": "https://authors.library.caltech.edu/records/nvj6h-bzw14", "eprint_id": 121277, "eprint_status": "archive", "datestamp": "2023-08-22 20:28:20", "lastmod": "2023-12-22 23:16:51", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Fenelon-Kelli-D", "name": { "family": "Fenelon", "given": "Kelli D." }, "orcid": "0000-0002-1294-9200" }, { "id": "Gao-Fan", "name": { "family": "Gao", "given": "Fan" }, "orcid": "0000-0001-6832-3402" }, { "id": "Borad-Priyanshi", "name": { "family": "Borad", "given": "Priyanshi" }, "orcid": "0000-0001-8446-5312" }, { "id": "Abbasi-Shiva", "name": { "family": "Abbasi", "given": "Shiva" }, "orcid": "0000-0002-6470-335X" }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" }, { "id": "Koromila-Theodora", "name": { "family": "Koromila", "given": "Theodora" }, "orcid": "0000-0001-5504-1369" } ] }, "title": "Cell-specific occupancy dynamics between the pioneer-like factor Opa/ZIC and Ocelliless/OTX regulate early head development in embryos", "ispublished": "pub", "full_text_status": "public", "keywords": "Cell Biology; Developmental Biology", "note": "\u00a9 2023 Fenelon, Gao, Borad, Abbasi, Pachter and Koromila. 
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. \n\nWe are grateful to Rhea Datta and Angela Stathopoulos for generously providing us with fly lines, antibodies, and ChIPseq data. We would further like to thank Anupama Chandrasekhar for her help with the StarkLab database; Hinduja Sathishkumar and Saubia Zareen, students in the Koromila Lab, for their assistance with administrative tasks and fly husbandry; and Mounia Lagha for helpful discussions. This work was made possible by funding from the UTA STARS program and the Bioinformatics Resource Center at the Beckman Institute of Caltech. \n\nThis work was made possible by funding from the UTA STARS program. \n\nAuthor contributions. TK conceived and directed the project. TK and KF planned the experimental approaches and oversaw the computational approach. KF performed in situ hybridizations and designed the image analysis pipeline. KF and PB performed the imaging and image analyses. FG wrote the bioinformatic scripts and carried out the bioinformatic analyses. LP gave input for writing the manuscript. KF, SA, and TK compiled embryo images from the Stark database, made ChIP peak alignments and designed figures. TK and KF analyzed the data and wrote the manuscript. \n\nData availability statement. The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/Supplementary Material. 
\n\nThe authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.\n\nPublished - fcell-11-1126507.pdf
Supplemental Material - Data_Sheet_1.pdf
", "abstract": "During development, embryonic patterning systems direct a set of initially uncommitted pluripotent cells to differentiate into a variety of cell types and tissues. A core network of transcription factors, such as Zelda/POU5F1, Odd-paired (Opa)/ZIC3 and Ocelliless (Oc)/OTX2, is conserved across animals. While Opa is essential for a second wave of zygotic activation after Zelda, it is unclear whether Opa drives head cell specification in the Drosophila embryo. Our hypothesis is that Opa and Oc are interacting with distinct cis-regulatory regions for shaping cell fates in the embryonic head. Super-resolution microscopy and meta-analysis of single-cell RNAseq datasets show that opa's and oc's overlapping expression domains are dynamic in the head region, with both factors being simultaneously transcribed at the blastula stage. Additionally, analysis of single-embryo RNAseq data reveals a subgroup of Opa-bound genes to be Opa-independent in the cellularized embryo. Interrogation of these genes against Oc ChIPseq combined with in situ data suggests that Opa is competing with Oc for the regulation of a subgroup of genes later in gastrulation. Specifically, we find that Oc binds to late, head-specific enhancers independently and activates them in a head-specific wave of zygotic transcription, suggesting distinct roles for Oc in the blastula and gastrula stages.", "date": "2023-03-27", "date_type": "published", "publication": "Frontiers in Cell and Developmental Biology", "volume": "11", "publisher": "Frontiers Media", "pagerange": "Art. No. 
1126507", "id_number": "CaltechAUTHORS:20230502-708586900.1", "issn": "2296-634X", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20230502-708586900.1", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "University of Texas at Arlington" }, { "agency": "Caltech Beckman Institute" } ] }, "local_group": { "items": [ { "id": "Division-of-Biology-and-Biological-Engineering" } ] }, "doi": "10.3389/fcell.2023.1126507", "pmcid": "PMC10083704", "primary_object": { "basename": "Data_Sheet_1.pdf", "url": "https://authors.library.caltech.edu/records/nvj6h-bzw14/files/Data_Sheet_1.pdf" }, "related_objects": [ { "basename": "fcell-11-1126507.pdf", "url": "https://authors.library.caltech.edu/records/nvj6h-bzw14/files/fcell-11-1126507.pdf" } ], "resource_type": "article", "pub_year": "2023", "author_list": "Fenelon, Kelli D.; Gao, Fan; et el." }, { "id": "https://authors.library.caltech.edu/records/nz49t-npq98", "eprint_id": 122526, "eprint_status": "archive", "datestamp": "2023-08-22 18:40:40", "lastmod": "2023-12-22 23:07:03", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Luebbert-Laura", "name": { "family": "Luebbert", "given": "Laura" }, "orcid": "0000-0003-1379-2927" }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "Efficient querying of genomic reference databases with gget", "ispublished": "pub", "full_text_status": "public", "keywords": "Computational Mathematics; Computational Theory and Mathematics; Computer Science Applications; Molecular Biology; Biochemistry; Statistics and Probability", "note": "\u00a9 The Author(s) 2023. Published by Oxford University Press. 
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. \n\nWe thank Kyung Hoi (Joseph) Min for advice on the command line interface, Matteo Guareschi for advice on Windows operability, and A. Sina Booeshaghi, Alessandro Groaz, Kristj\u00e1n Eldj\u00e1rn Hj\u00f6rleifsson and \u00c1ngel G\u00e1lvez-Merch\u00e1n for insightful discussions about gget. Illustrations in Fig. 1 and Supplementary Figure S1 were created with BioRender.com. \n\nThis work was supported by funding from the Biology and Bioengineering Division at the California Institute of Technology and the Chen Graduate Innovator Grant [CHEN.SYS3.CGIAFY21 to L.L.]; in part by National Institutes of Health (NIH) [U19MH114830 to L.P.]. \n\nConflict of Interest: none declared.\n\nPublished - btac836.pdf
Supplemental Material - btac836_supplementary_data.zip
", "abstract": "Motivation: A recurring challenge in interpreting genomic data is the assessment of results in the context of existing reference databases. With the increasing number of command line and Python users, there is a need for tools implementing automated, easy programmatic access to curated reference information stored in a diverse collection of large, public genomic databases. \n\nResults: gget is a free and open-source command line tool and Python package that enables efficient querying of genomic reference databases, such as Ensembl. gget consists of a collection of separate but interoperable modules, each designed to facilitate one type of database querying required for genomic data analysis in a single line of code.", "date": "2023-01-01", "date_type": "published", "publication": "Bioinformatics", "volume": "39", "number": "1", "publisher": "Oxford University Press", "pagerange": "Art. No. btac836", "id_number": "CaltechAUTHORS:20230725-706344000.34", "issn": "1367-4811", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20230725-706344000.34", "funders": { "items": [ { "agency": "Caltech Division of Biology and Biological Engineering" }, { "agency": "Tianqiao and Chrissy Chen Institute for Neuroscience", "grant_number": "CHEN.SYS3.CGIAFY21" }, { "agency": "NIH", "grant_number": "U19MH114830" } ] }, "local_group": { "items": [ { "id": "Tianqiao-and-Chrissy-Chen-Institute-for-Neuroscience" }, { "id": "Division-of-Biology-and-Biological-Engineering" } ] }, "doi": "10.1093/bioinformatics/btac836", "pmcid": "PMC9835474", "primary_object": { "basename": "btac836.pdf", "url": "https://authors.library.caltech.edu/records/nz49t-npq98/files/btac836.pdf" }, "related_objects": [ { "basename": "btac836_supplementary_data.zip", "url": "https://authors.library.caltech.edu/records/nz49t-npq98/files/btac836_supplementary_data.zip" } ], "resource_type": "article", "pub_year": "2023", "author_list": "Luebbert, Laura and Pachter, Lior" }, { "id": 
"https://authors.library.caltech.edu/records/ryza3-50d52", "eprint_id": 120722, "eprint_status": "archive", "datestamp": "2023-08-22 18:39:22", "lastmod": "2023-12-22 23:17:05", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "G\u00e1lvez-Merch\u00e1n-\u00c1ngel", "name": { "family": "G\u00e1lvez-Merch\u00e1n", "given": "\u00c1ngel" }, "orcid": "0000-0001-7420-8697" }, { "id": "Min-Kyung-Hoi-Joseph", "name": { "family": "Min", "given": "Kyung Hoi (Joseph)" }, "orcid": "0000-0003-0894-4017" }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" }, { "id": "Booeshaghi-A-Sina", "name": { "family": "Booeshaghi", "given": "A. Sina" }, "orcid": "0000-0002-6442-4502" } ] }, "title": "Metadata retrieval from sequence databases with ffq", "ispublished": "pub", "full_text_status": "public", "keywords": "Computational Mathematics; Computational Theory and Mathematics; Computer Science Applications; Molecular Biology; Biochemistry; Statistics and Probability", "note": "\u00a9 The Author(s) 2023. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. \n\nThis work was motivated by the need to obtain metadata for Booeshaghi and Pachter (2020). We thank Ali Mortazavi for his suggestion to include ffq querying of the ENCODE database and Anders Goncalves da Silva, Andrea Telatin, Laura Luebbert and Phil Ewels for their contributions to the code base. \n\nThis work was supported in part by National Institutes of Health (NIH) [U19MH114830]. \n\nData availability. All data and code associated with this manuscript is available at https://github.com/pachterlab/ffq. \n\nConflict of Interest: none declared.\n\nPublished - btac667.pdf
", "abstract": "Motivation: Several genomic databases host data and metadata for an ever-growing collection of sequence datasets. While these databases have a shared hierarchical structure, there are no tools specifically designed to leverage it for metadata extraction.\n\nResults: We present a command-line tool, called ffq, for querying user-generated data and metadata from sequence databases. Given an accession or a paper's DOI, ffq efficiently fetches metadata and links to raw data in JSON format. ffq's modularity and simplicity make it extensible to any genomic database exposing its data for programmatic access.\n\nAvailability and implementation: ffq is free and open source, and the code can be found here: https://github.com/pachterlab/ffq.", "date": "2023-01", "date_type": "published", "publication": "Bioinformatics", "volume": "39", "number": "1", "publisher": "Oxford University Press", "pagerange": "Art. No. btac667", "id_number": "CaltechAUTHORS:20230411-694477200.2", "issn": "1367-4811", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20230411-694477200.2", "funders": { "items": [ { "agency": "NIH", "grant_number": "U19MH114830" } ] }, "local_group": { "items": [ { "id": "Division-of-Biology-and-Biological-Engineering" } ] }, "doi": "10.1093/bioinformatics/btac667", "pmcid": "PMC9883619", "primary_object": { "basename": "btac667.pdf", "url": "https://authors.library.caltech.edu/records/ryza3-50d52/files/btac667.pdf" }, "resource_type": "article", "pub_year": "2023", "author_list": "G\u00e1lvez-Merch\u00e1n, \u00c1ngel; Min, Kyung Hoi (Joseph); et al." 
}, { "id": "https://authors.library.caltech.edu/records/d6ppt-egs07", "eprint_id": 121966, "eprint_status": "archive", "datestamp": "2023-08-22 18:26:51", "lastmod": "2023-12-22 23:16:49", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Gorin-Gennady", "name": { "family": "Gorin", "given": "Gennady" }, "orcid": "0000-0001-6097-2029" }, { "id": "Vastola-John-J", "name": { "family": "Vastola", "given": "John J." }, "orcid": "0000-0002-5625-2106" }, { "id": "Fang-Meichen", "name": { "family": "Fang", "given": "Meichen" }, "orcid": "0000-0002-8217-0710" }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "Interpretable and tractable models of transcriptional noise for the rational design of single-molecule quantification experiments", "ispublished": "pub", "full_text_status": "public", "keywords": "General Physics and Astronomy; General Biochemistry, Genetics and Molecular Biology; General Chemistry; Multidisciplinary", "note": "\u00a9 The Author(s) 2022. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. \n\nThe DNA, pre-mRNA, and mature mRNA used in Fig. 
1 are derivatives of the DNA Twemoji by Twitter, Inc., used under CC-BY 4.0. G.G. acknowledges the help of Victor Rohde in exploration of the stochastic process literature. G.G., M.F., and L.P. were partially funded by NIH U19MH114830. J.J.V. was supported by NSF Grant # DMS 1562078. \n\nThese authors contributed equally: Gennady Gorin and John J. Vastola. \n\nAuthor contributions. J.J.V. and G.G. conceived of the work, derived the mathematical results, and drafted the manuscript. G.G., M.F., and J.J.V. worked on simulating the models and numerically implementing their analytic solutions. G.G. and M.F. fit the single-cell data. L.P. supervised the work. All authors reviewed and edited the manuscript. \n\nData availability. Publicly available data were downloaded from the NeMO archive. The metadata were obtained from http://data.nemoarchive.org/biccn/grant/u19_zeng/zeng/transcriptome/scell/10x_v3/mouse/processed/analysis/10X_cells_v3_AIBS/. Raw FASTQs were obtained from http://data.nemoarchive.org/biccn/grant/u19_zeng/zeng/transcriptome/scell/10x_v3/mouse/raw/MOp/. Pre-built genome references were obtained from the 10\u00d7 Genomics website, at https://support.10xgenomics.com/single-cell-gene-expression/software/downloads/latest. The FASTQ files were used to generate loom files with spliced and unspliced count matrices. These count matrices are available in the Zenodo package 10.5281/zenodo.7262328. The results of the fits generated with the Monod package, the SDE gradient descent fit, and the MCMC fit are available at https://github.com/pachterlab/GVFP_2021, as well as the Zenodo package 10.5281/zenodo.7262328. All synthetic data, generated using custom stochastic simulation code, as well as the simulation parameters, are deposited in the GitHub and Zenodo repositories. \n\nCode availability. Single-cell RNA sequencing data were pseudoaligned using kallisto\u2223bustools 0.26.0, wrapping kallisto 0.46.2 and bustools 0.40.0. 
Dataset filtering, reduced model fits, and Akaike information criterion computation were performed using Monod 0.2.4.0. MCMC parameter inference was performed using PyMC3 3.11.4, dependent on Theano-PyMC 1.1.2. Data input/output were performed using loompy 3.0.7. Numerical procedures, such as gradient descent and quadrature, were performed using SciPy 1.4.1 and NumPy 1.21.5. The algorithms were implemented in the framework of Python 3.7.12. All code is available at https://github.com/pachterlab/GVFP_2021 and the associated Zenodo package 10.5281/zenodo.7262328. The GitHub and Zenodo repositories include scripts used to construct a mouse genome reference, pseudoalign datasets, and generate all figures. They are modular: the analysis can be restarted at a set of intermediate steps. The outputs of certain steps, viz. pseudoaligned count matrices, results of the Monod pipeline, the list of genes of interest, results of the gradient descent procedure, and results of the Bayes factor computation procedure can be recomputed, or loaded in based on files available in the repositories. Synthetic data generated by simulation, as well as the routines used to generate the data, are available in the repositories. The CIR simulation is implemented in Python 3.7.12. The Gamma-OU simulation was developed using MATLAB 2020a, and executed in the Python wrapper for Octave, using versions oct2py 5.4.3 and octave-kernel 0.34.1. \n\nThe authors declare no competing interests.\n\nPublished - 41467_2022_Article_34857.pdf
Supplemental Material - 41467_2022_34857_MOESM1_ESM.pdf
", "abstract": "The question of how cell-to-cell differences in transcription rate affect RNA count distributions is fundamental for understanding biological processes underlying transcription. Answering this question requires quantitative models that are both interpretable (describing concrete biophysical phenomena) and tractable (amenable to mathematical analysis). This enables the identification of experiments which best discriminate between competing hypotheses. As a proof of principle, we introduce a simple but flexible class of models involving a continuous stochastic transcription rate driving a discrete RNA transcription and splicing process, and compare and contrast two biologically plausible hypotheses about transcription rate variation. One assumes variation is due to DNA experiencing mechanical strain, while the other assumes it is due to regulator number fluctuations. We introduce a framework for numerically and analytically studying such models, and apply Bayesian model selection to identify candidate genes that show signatures of each model in single-cell transcriptomic data from mouse glutamatergic neurons.", "date": "2022-12-09", "date_type": "published", "publication": "Nature Communications", "volume": "13", "publisher": "Nature Publishing Group", "pagerange": "Art. No. 
7620", "id_number": "CaltechAUTHORS:20230622-883274000.1", "issn": "2041-1723", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20230622-883274000.1", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NIH", "grant_number": "U19MH114830" }, { "agency": "NSF", "grant_number": "DMS-1562078" } ] }, "local_group": { "items": [ { "id": "Division-of-Biology-and-Biological-Engineering" } ] }, "doi": "10.1038/s41467-022-34857-7", "pmcid": "PMC9734650", "primary_object": { "basename": "41467_2022_34857_MOESM1_ESM.pdf", "url": "https://authors.library.caltech.edu/records/d6ppt-egs07/files/41467_2022_34857_MOESM1_ESM.pdf" }, "related_objects": [ { "basename": "41467_2022_Article_34857.pdf", "url": "https://authors.library.caltech.edu/records/d6ppt-egs07/files/41467_2022_Article_34857.pdf" } ], "resource_type": "article", "pub_year": "2022", "author_list": "Gorin, Gennady; Vastola, John J.; et al." 
}, { "id": "https://authors.library.caltech.edu/records/sgfnt-bab08", "eprint_id": 116968, "eprint_status": "archive", "datestamp": "2023-08-22 17:32:31", "lastmod": "2023-12-22 23:17:09", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Gorin-Gennady", "name": { "family": "Gorin", "given": "Gennady" }, "orcid": "0000-0001-6097-2029" }, { "id": "Fang-Meichen", "name": { "family": "Fang", "given": "Meichen" }, "orcid": "0000-0002-8217-0710" }, { "id": "Chari-Tara", "name": { "family": "Chari", "given": "Tara" }, "orcid": "0000-0002-6953-4313" }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "RNA velocity unraveled", "ispublished": "pub", "full_text_status": "public", "keywords": "Computational Theory and Mathematics; Cellular and Molecular Neuroscience; Genetics; Molecular Biology; Ecology; Modeling and Simulation; Ecology, Evolution, Behavior and Systematics", "note": "G.G. thanks Dr. John J. Vastola for fruitful discussions about landscape representations of biophysical systems.\n\nL.P. received the National Institutes of Health (nih.gov) award U19MH114830, administered by the National Institute of Mental Health (nimh.nih.gov). G.G., M.F., T.C., and L.P. were partially funded by this award. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.", "abstract": "We perform a thorough analysis of RNA velocity methods, with a view towards understanding the suitability of the various assumptions underlying popular implementations. In addition to providing a self-contained exposition of the underlying mathematics, we undertake simulations and perform controlled experiments on biological datasets to assess workflow sensitivity to parameter choices and underlying biology. 
Finally, we argue for a more rigorous approach to RNA velocity, and present a framework for Markovian analysis that points to directions for improvement and mitigation of current problems.", "date": "2022-09", "date_type": "published", "publication": "PLOS Computational Biology", "volume": "18", "number": "9", "publisher": "Public Library of Science", "pagerange": "Art. No. e1010492", "id_number": "CaltechAUTHORS:20220916-665804000.785", "issn": "1553-7358", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20220916-665804000.785", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NIH", "grant_number": "U19MH114830" } ] }, "local_group": { "items": [ { "id": "Division-of-Biology-and-Biological-Engineering" } ] }, "doi": "10.1371/journal.pcbi.1010492", "pmcid": "PMC9499228", "resource_type": "article", "pub_year": "2022", "author_list": "Gorin, Gennady; Fang, Meichen; et al." }, { "id": "https://authors.library.caltech.edu/records/m4wef-4m072", "eprint_id": 109117, "eprint_status": "archive", "datestamp": "2023-08-22 15:16:09", "lastmod": "2023-12-22 23:16:30", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Moses-Lambda", "name": { "family": "Moses", "given": "Lambda" }, "orcid": "0000-0002-7092-9427" }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "Museum of spatial transcriptomics", "ispublished": "pub", "full_text_status": "public", "keywords": "Fluorescence in situ hybridization; RNA sequencing; Software", "note": "\u00a9 2022 Nature Publishing Group. \n\nReceived 28 June 2021; Accepted 26 January 2022; Published 10 March 2022. \n\nThis work was supported by a grant from the National Institute of Mental Health (NIMH), National Institutes of Health (NIH), of the U.S. Department of Health & Human Services (number U19MH114830, L.P.). 
We thank the following people for providing feedback for earlier versions of this paper and the supplement: D. Furth from the Cold Spring Harbor Laboratories, L. Cai from the California Institute of Technology, and G. Victora from the Rockefeller University. \n\nData availability: The database of spatial transcriptomics literature can be accessed at https://docs.google.com/spreadsheets/d/1sJDb9B7AtYmfKv4-m8XR7uc3XXw_k4kGSout8cqZ8bY/edit#gid=1363594152. The version used as of writing is in the metadata.xlsx file in the frozen DOI version of the GitHub repository to reproduce the figures in this paper and render the supplementary website: https://doi.org/10.5281/zenodo.5774128. \n\nCode availability: All code used to generate figures in this paper and render the supplementary website is in the GitHub repository: https://github.com/pachterlab/LP_2021. The frozen DOI version of the repository as of final submission of this paper is on Zenodo: https://doi.org/10.5281/zenodo.5774129. \n\nContributions: L.P. suggested the project. L.M. curated the database, performed the analyses of the metadata, and wrote the manuscript and the supplement, which have been proofread and edited by L.P. \n\nThe authors declare no competing interests. \n\nPeer review information: Nature Methods thanks Sten Linnarsson, Quan Nguyen and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Lei Tang was the primary editor on this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.\n\nMoses, L., Pachter, L. Publisher Correction: Museum of spatial transcriptomics. Nat Methods (2022). https://doi.org/10.1038/s41592-022-01494-3\n\nSubmitted - 2021.05.11.443152v2.full.pdf
Supplemental Material - 41592_2022_1409_MOESM1_ESM.pdf
", "abstract": "The function of many biological systems, such as embryos, liver lobules, intestinal villi, and tumors, depends on the spatial organization of their cells. In the past decade, high-throughput technologies have been developed to quantify gene expression in space, and computational methods have been developed that leverage spatial gene expression data to identify genes with spatial patterns and to delineate neighborhoods within tissues. To comprehensively document spatial gene expression technologies and data-analysis methods, we present a curated review of literature on spatial transcriptomics dating back to 1987, along with a thorough analysis of trends in the field, such as usage of experimental techniques, species, tissues studied, and computational approaches used. Our Review places current methods in a historical context, and we derive insights about the field that can guide current research strategies. A companion supplement offers a more detailed look at the technologies and methods analyzed: https://pachterlab.github.io/LP_2021/.", "date": "2022-05", "date_type": "published", "publication": "Nature Methods", "volume": "19", "number": "5", "publisher": "Nature Publishing Group", "pagerange": "534-546", "id_number": "CaltechAUTHORS:20210513-122736659", "issn": "1548-7091", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20210513-122736659", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NIH", "grant_number": "U19MH114830" } ] }, "local_group": { "items": [ { "id": "Division-of-Biology-and-Biological-Engineering" } ] }, "doi": "10.1038/s41592-022-01409-2", "primary_object": { "basename": "41592_2022_1409_MOESM1_ESM.pdf", "url": "https://authors.library.caltech.edu/records/m4wef-4m072/files/41592_2022_1409_MOESM1_ESM.pdf" }, "related_objects": [ { "basename": "2021.05.11.443152v2.full.pdf", "url": 
"https://authors.library.caltech.edu/records/m4wef-4m072/files/2021.05.11.443152v2.full.pdf" } ], "resource_type": "article", "pub_year": "2022", "author_list": "Moses, Lambda and Pachter, Lior" }, { "id": "https://authors.library.caltech.edu/records/myke3-zsv09", "eprint_id": 108549, "eprint_status": "archive", "datestamp": "2023-08-20 07:17:41", "lastmod": "2023-12-22 23:16:53", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Gorin-Gennady", "name": { "family": "Gorin", "given": "Gennady" }, "orcid": "0000-0001-6097-2029" }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "Modeling bursty transcription and splicing with the chemical master equation", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2022 Biophysical Society. \n\nReceived 30 July 2021, Accepted 3 February 2022, Available online 7 February 2022. \n\nThe spectral solution was derived by Meichen Fang. G.G. and L.P. were partially funded by NIH U19MH114830. The DNA illustration used in Figs. S1, S2, and S4, modified from (70), is a derivative of the DNA Twemoji by Twitter, Inc., used under CC-BY 4.0. The directed acyclic graph generation code was adapted from the IPython Parallel reference documentation: https://ipyparallel.readthedocs.io/en/latest/dag_dependencies.html. \n\nAuthor contributions. L.P. and G.G. designed the research and wrote the article. G.G. derived and implemented the analytical solutions, validated them against simulations, and performed the sequencing data analysis. \n\nCode availability. Google Colab Python notebooks that reproduce the analyses and benchmarking are available at https://github.com/pachterlab/GP_2021_2.\n\nAccepted Version - 1-s2.0-S0006349522001047-main_acc.pdf
Submitted - 2021.03.24.436847v2.full.pdf
Supplemental Material - 1-s2.0-S0006349522001047-mmc1.pdf
", "abstract": "Splicing cascades that alter gene products posttranscriptionally also affect expression dynamics. We study a class of processes and associated distributions that emerge from models of bursty promoters coupled to directed acyclic graphs of splicing. These solutions provide full time-dependent joint distributions for an arbitrary number of species with general noise behaviors and transient phenomena, offering qualitative and quantitative insights about how splicing can regulate expression dynamics. Finally, we derive a set of quantitative constraints on the minimum complexity necessary to reproduce gene coexpression patterns using synchronized burst models. We validate these findings by analyzing long-read sequencing data, where we find evidence of expression patterns largely consistent with these constraints.", "date": "2022-03-15", "date_type": "published", "publication": "Biophysical Journal", "volume": "121", "number": "6", "publisher": "Cell Press", "pagerange": "1056-1069", "id_number": "CaltechAUTHORS:20210325-075042340", "issn": "0006-3495", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20210325-075042340", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NIH", "grant_number": "U19MH114830" } ] }, "local_group": { "items": [ { "id": "Division-of-Biology-and-Biological-Engineering" } ] }, "doi": "10.1016/j.bpj.2022.02.004", "primary_object": { "basename": "1-s2.0-S0006349522001047-main_acc.pdf", "url": "https://authors.library.caltech.edu/records/myke3-zsv09/files/1-s2.0-S0006349522001047-main_acc.pdf" }, "related_objects": [ { "basename": "1-s2.0-S0006349522001047-mmc1.pdf", "url": "https://authors.library.caltech.edu/records/myke3-zsv09/files/1-s2.0-S0006349522001047-mmc1.pdf" }, { "basename": "2021.03.24.436847v2.full.pdf", "url": "https://authors.library.caltech.edu/records/myke3-zsv09/files/2021.03.24.436847v2.full.pdf" } ], 
"resource_type": "article", "pub_year": "2022", "author_list": "Gorin, Gennady and Pachter, Lior" }, { "id": "https://authors.library.caltech.edu/records/jy3nx-j5b58", "eprint_id": 108947, "eprint_status": "archive", "datestamp": "2023-08-22 13:32:15", "lastmod": "2023-12-22 23:17:01", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Gayoso-Adam", "name": { "family": "Gayoso", "given": "Adam" }, "orcid": "0000-0001-9537-0845" }, { "id": "Lopez-Romain", "name": { "family": "Lopez", "given": "Romain" }, "orcid": "0000-0003-0495-738X" }, { "id": "Xing-Galen", "name": { "family": "Xing", "given": "Galen" }, "orcid": "0000-0001-7376-6312" }, { "id": "Boyeau-Pierre", "name": { "family": "Boyeau", "given": "Pierre" }, "orcid": "0000-0003-4549-3972" }, { "id": "Amiri-Valeh-Valiollah-Pour", "name": { "family": "Amiri", "given": "Valeh Valiollah Pour" }, "orcid": "0000-0002-2008-5297" }, { "id": "Hong-Justin", "name": { "family": "Hong", "given": "Justin" }, "orcid": "0000-0003-2115-9101" }, { "id": "Wu-Katherine", "name": { "family": "Wu", "given": "Katherine" }, "orcid": "0000-0001-7562-4545" }, { "id": "Jayasuriya-Michael", "name": { "family": "Jayasuriya", "given": "Michael" }, "orcid": "0000-0003-2366-841X" }, { "id": "Mehlman-Edouard", "name": { "family": "Mehlman", "given": "Edouard" }, "orcid": "0000-0001-6351-2220" }, { "id": "Langevin-Maxime", "name": { "family": "Langevin", "given": "Maxime" }, "orcid": "0000-0002-5498-4661" }, { "id": "Liu-Yining", "name": { "family": "Liu", "given": "Yining" }, "orcid": "0000-0002-8779-2906" }, { "id": "Samaran-Jules", "name": { "family": "Samaran", "given": "Jules" }, "orcid": "0000-0001-7317-8190" }, { "id": "Misrachi-Gabriel", "name": { "family": "Misrachi", "given": "Gabriel" }, "orcid": "0000-0002-6020-4641" }, { "id": "Nazaret-Achille", "name": { "family": "Nazaret", "given": "Achille" }, "orcid": "0000-0002-5428-9810" }, { "id": "Clivio-Oscar", "name": { "family": "Clivio", "given": 
"Oscar" }, "orcid": "0000-0001-8668-4535" }, { "id": "Xu-Chenling", "name": { "family": "Xu", "given": "Chenling" }, "orcid": "0000-0001-9610-7627" }, { "id": "Ashuach-Tal", "name": { "family": "Ashuach", "given": "Tal" }, "orcid": "0000-0003-1939-0865" }, { "id": "Gabitto-Mariano", "name": { "family": "Gabitto", "given": "Mariano" }, "orcid": "0000-0001-6911-344X" }, { "id": "Lotfollahi-Mohammad", "name": { "family": "Lotfollahi", "given": "Mohammad" }, "orcid": "0000-0001-6858-7985" }, { "id": "Svensson-Valentine", "name": { "family": "Svensson", "given": "Valentine" }, "orcid": "0000-0002-9217-2330" }, { "id": "da-Veiga-Beltrame-Eduardo", "name": { "family": "da Veiga Beltrame", "given": "Eduardo" }, "orcid": "0000-0002-1529-9207" }, { "id": "Kleshchevnikov-Vitalii", "name": { "family": "Kleshchevnikov", "given": "Vitalii" }, "orcid": "0000-0001-9110-7441" }, { "id": "Talavera-L\u00f3pez-Carlos", "name": { "family": "Talavera-L\u00f3pez", "given": "Carlos" }, "orcid": "0000-0001-8590-2393" }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" }, { "id": "Theis-Fabian-J", "name": { "family": "Theis", "given": "Fabian J." }, "orcid": "0000-0002-2419-1943" }, { "id": "Streets-Aaron-M", "name": { "family": "Streets", "given": "Aaron" }, "orcid": "0000-0002-3909-8389" }, { "id": "Jordan-Michael-I", "name": { "family": "Jordan", "given": "Michael I." }, "orcid": "0000-0001-8935-817X" }, { "id": "Regier-Jeffrey", "name": { "family": "Regier", "given": "Jeffrey" }, "orcid": "0000-0002-1472-5235" }, { "id": "Yosef-Nir", "name": { "family": "Yosef", "given": "Nir" }, "orcid": "0000-0001-9004-1225" } ] }, "title": "A Python library for probabilistic analysis of single-cell omics data", "ispublished": "pub", "full_text_status": "public", "keywords": "Computational models; Machine learning; Software; Statistical methods", "note": "\u00a9 2022 Nature Publishing Group. \n\nPublished 07 February 2022. 
\n\nWe acknowledge members of the Streets and Yosef laboratories for general feedback. We thank all the GitHub users who contributed code to scvi-tools over the years. We thank Nicholas Everetts for help with the analysis of the Drosophila data. We thank David Kelley and Nick Bernstein for help implementing Solo. We thank Marco Wagenstetter and Sergei Rybakov for help with the transition of the scGen package to use scvi-tools, as well as feedback on the scArches implementation. We thank Hector Roux de B\u00e9zieux for insightful discussions about the R ecosystem. We thank Kieran Campbell and Allen Zhang for clarifying aspects of the original CellAssign implementation. We thank the Pyro team, including Eli Bingham, Martin Jankowiak and Fritz Obermeyer, for help integrating Pyro in scvi-tools. Research reported in this manuscript was supported by the NIGMS of the National Institutes of Health under award number R35GM124916 and by the Chan-Zuckerberg Foundation Network under grant number 2019-02452. O.C. is supported by the EPSRC Centre for Doctoral Training in Modern Statistics and Statistical Machine Learning (EP/S023151/1, studentship 2420649). A.G. is supported by NIH Training Grant 5T32HG000047-19. A.S. and N.Y. are Chan Zuckerberg Biohub investigators. \n\nContributions: A.G., R.L and G.X. contributed equally. A.G. designed the scvi-tools application programming interface with input from G.X. and R.L. G.X. and A.G. led development of scvi-tools with input from R.L. G.X. reimplemented scVI, totalVI, AutoZI and scANVI with input from A.G. R.L. implemented Stereoscope with input from A.G. Data analysis in this manuscript was led by A.G., R.L. and G.X, with input from N.Y. A.G., R.L., P.B., E.M., M. Langevin., Y.L., J.S., G.M. and A.N., O.C. worked on the initial version of the codebase (scvi package), with input from M.I.J, J.R. and N.Y. R.L., E.M. and C.X. contributed the scANVI model, with input from J.R. and N.Y. A.G. implemented totalVI with input from A.S. 
and N.Y. T.A. implemented peakVI with input from A.G. A.G implemented scArches with input from M. Lotfollahi., F.J.T and N.Y. V.S. made several contributions to the codebase, including the LDVAE model. P.B. contributed the differential expression programming interface. E.d.V.B. and C.T.-L. provided tutorials on differential expression and deconvolution of spatial transcriptomics, with input from L.P. K.W. implemented CellAssign in the codebase with input from A.G. V.V.P.A., J.H. and M.J. made general code contributions and helped maintain scvi-tools. J.H. implemented LDA. T.A. and M.G. implemented MultiVI. V.K. improved Pyro support in scvi-tools and ported Cell2Location to use scvi-tools. N.Y. supervised all research. A.G., R.L., G.X., J.R. and N.Y. wrote the manuscript. \n\nCompeting interests: V.S. is a full-time employee of Serqet Therapeutics and has ownership interest in Serqet Therapeutics. F.J.T. reports consulting fees from Roche Diagnostics GmbH and Cellarity Inc., and ownership interest in Cellarity, Inc. N.Y. is an advisor to and/or has equity in Cellarity, Celsius Therapeutics and Rheos Medicines. The remaining authors declare no competing interests. \n\nPeer review information: Nature Biotechnology thanks Martin Hemberg and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.\n\nSubmitted - 2021.04.28.441833v1.full.pdf
Supplemental Material - 41587_2021_1206_MOESM1_ESM.pdf
", "abstract": "Methods for analyzing single-cell data perform a core set of computational tasks. These tasks include dimensionality reduction, cell clustering, cell-state annotation, removal of unwanted variation, analysis of differential expression, identification of spatial patterns of gene expression, and joint analysis of multi-modal omics data. Many of these methods rely on likelihood-based models to represent variation in the data; we refer to these as 'probabilistic models'. Probabilistic models provide principled ways to capture uncertainty in biological systems and are convenient for decomposing the many sources of variation that give rise to omics data.", "date": "2022-02", "date_type": "published", "publication": "Nature Biotechnology", "volume": "40", "number": "2", "publisher": "Nature Publishing Group", "pagerange": "163-166", "id_number": "CaltechAUTHORS:20210503-142332959", "issn": "1087-0156", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20210503-142332959", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NIH", "grant_number": "R35GM124916" }, { "agency": "Chan-Zuckerberg Foundation", "grant_number": "2019-02452" }, { "agency": "Engineering and Physical Sciences Research Council (EPSRC)", "grant_number": "EP/S023151/1" }, { "agency": "Engineering and Physical Sciences Research Council (EPSRC)", "grant_number": "2420649" }, { "agency": "NIH Predoctoral Fellowship", "grant_number": "5T32HG000047-19" } ] }, "local_group": { "items": [ { "id": "Division-of-Biology-and-Biological-Engineering" } ] }, "doi": "10.1038/s41587-021-01206-w", "primary_object": { "basename": "2021.04.28.441833v1.full.pdf", "url": "https://authors.library.caltech.edu/records/jy3nx-j5b58/files/2021.04.28.441833v1.full.pdf" }, "related_objects": [ { "basename": "41587_2021_1206_MOESM1_ESM.pdf", "url": 
"https://authors.library.caltech.edu/records/jy3nx-j5b58/files/41587_2021_1206_MOESM1_ESM.pdf" } ], "resource_type": "article", "pub_year": "2022", "author_list": "Gayoso, Adam; Lopez, Romain; et al." }, { "id": "https://authors.library.caltech.edu/records/5afgq-qgg51", "eprint_id": 107730, "eprint_status": "archive", "datestamp": "2023-08-20 06:02:40", "lastmod": "2023-12-22 23:32:48", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Chari-Tara", "name": { "family": "Chari", "given": "Tara" }, "orcid": "0000-0002-6953-4313" }, { "id": "Weissbourd-Brandon", "name": { "family": "Weissbourd", "given": "Brandon" }, "orcid": "0000-0001-5422-3873" }, { "id": "Gehring-Jase", "name": { "family": "Gehring", "given": "Jase" }, "orcid": "0000-0002-3894-9495" }, { "id": "Ferraioli-Anna", "name": { "family": "Ferraioli", "given": "Anna" }, "orcid": "0000-0003-1817-6891" }, { "id": "Lecl\u00e8re-Lucas", "name": { "family": "Lecl\u00e8re", "given": "Lucas" }, "orcid": "0000-0002-7440-0467" }, { "id": "Herl-Makenna", "name": { "family": "Herl", "given": "Makenna" }, "orcid": "0000-0001-8518-5179" }, { "id": "Gao-Fan", "name": { "family": "Gao", "given": "Fan" }, "orcid": "0000-0001-6832-3402" }, { "id": "Chevalier-Sandra", "name": { "family": "Chevalier", "given": "Sandra" }, "orcid": "0000-0002-2717-6925" }, { "id": "Copley-Richard-R", "name": { "family": "Copley", "given": "Richard R." }, "orcid": "0000-0001-7846-4954" }, { "id": "Houliston-Evelyn", "name": { "family": "Houliston", "given": "Evelyn" }, "orcid": "0000-0001-9264-2585" }, { "id": "Anderson-D-J", "name": { "family": "Anderson", "given": "David J." 
}, "orcid": "0000-0001-6175-3872" }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "Whole-animal multiplexed single-cell RNA-seq reveals transcriptional shifts across Clytia medusa cell types", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2021 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works. Distributed under a Creative Commons Attribution NonCommercial License 4.0 (CC BY-NC). \n\nReceived: 20 February 2021. Accepted: 6 October 2021. \n\nWe thank X. Da and X. Wang for technical assistance, T. Momose for assistance with the single-cell experimentation, the Caltech Single-Cell Profiling and Engineering Center for the use of their single-cell and sequencing tools, and the Caltech Bioinformatics Resource Center for transcriptome assembly and annotation analysis. We additionally thank the Caltech Center for Evolutionary Science for the bioinformatics resources to create a local UCSC Genome Browser. We thank A. S. Booeshaghi for help with kallisto, bustools, and the kITE demultiplexing of the ClickTag reads and for rescuing the stimulation experiment sequencing data. We thank J. Malamy for helping to establish Clytia work at Caltech. We thank S. Peron for initial characterization of some of the cell type marker genes, P. Lap\u00e9bie for identification of novel neuropeptide sequences, M. Jager for valuable advice on the in situ protocol, and J. R. Mateu for providing pp11 probe. \n\nJ.G., M.H., and L.P. were supported in part by a seed grant from the Chen Institute at the California Institute of Technology. T.C., J.G., and L.P. were supported in part by NIH U19MH114830 and NIH RF1AG062324A. We thank the Marine Resources Centre (CRBM and PIV imaging platform) of Institut de la Mer de Villefranche (IMEV), supported by EMBRC-France. 
The French state funds of EMBRC-France are managed by the ANR within the investments of the Future program. L.L. was supported by the Agence Nationale de la Recherche (ANR-19-CE13-0003). A.F., R.R.C., and E.H. were supported by the H2020/Marie Sk\u0142odowska-Curie ITN \"EvoCell\" Grant agreement no. 766053. B.W. was supported in part by a Howard Hughes Medical Institute Fellowship of the Life Sciences Research Foundation and by NIH K99NS119749. This work was in part supported by the Whitman Center of the Marine Biological Laboratory in Woods Hole, MA and a visiting grant from EMBRC-France. D.J.A. is an Investigator of the Howard Hughes Medical Institute. \n\nAuthor contributions: Conceived of the experiments: T.C., B.W., J.G., R.R.C., E.H., D.J.A., and L.P. Developed cell dissociation, fixation, and labeling procedures compatible with the 10X Genomics platform: J.G. and M.H. Performed the single-cell experiments: T.C., B.W., and J.G. Performed the in situ hybridization and other microscopy experiments: B.W., A.F., L.L., and S.C. Performed whole-organism qPCR: T.C. Performed bioinformatics analysis including assembly and annotation of the transcriptome: F.G. and R.R.C. Wrote scripts for processing the data and code for the analysis: T.C. and J.G. Developed the Google Colab notebooks: T.C. Analyzed and interpreted the data: T.C., B.W., J.G., A.F., L.L., R.R.C., E.H., D.J.A., and L.P. Writing and editing the manuscript: T.C., B.W., J.G., A.F., L.L., R.R.C., E.H., D.J.A., and L.P. \n\nThe authors declare that they have no competing interests. \n\nData and materials availability: All data needed to evaluate the conclusions in the paper are present in the paper and/or the Supplementary Materials. All raw sequencing and processed data files used for analysis are available from CaltechData, (https://data.caltech.edu/search?page=1&size=25&ln=en&q=clytia), with links additionally provided via the notebooks in the code repository. 
The sequencing read alignments are available at http://evolution.caltech.edu/genomebrowser/cgi-bin/hgTracks?db=hub_135_clyHem1&lastVirtModeType=default&lastVirtModeExtraState=&virtModeType=default&virtMode=0&nonVirtPosition=&position=scaffold_1%3A1%2D30003&hgsid=3413_LS7OcP5N7VA2rApGOfk8iaX2kVFR, and an interactive browser for gene expression visualization (http://131.215.78.40/cb), is publicly hosted on a UCSC Genome Browser by the Caltech Bioinformatics Resource Center. The softwares used are as follows: Cell Ranger 3.0.1, Trinity-v2.8.4, Cufflinks v2.2.1, kallisto v0.46.2, bustools v0.40.0, anndata 0.7.5, louvain 0.7.0, rpy2 3.4.2, scanpy 1.6.0, biopython 1.78, pysam 0.16.0.1, fuzzywuzzy 0.18.0, numpy 0.19.5, pandas 1.1.5, matplotlib 3.2.2, sklearn 0.0, scipy 1.4.1, seaborn 0.11.1, requests 2.23.0, tqdm 4.41.1, multiprocess 0.70.11.1, DESeq2 1.3.0, topGO 2.42.0, and UpSet 1.4.0. Code availability: All the codes used to perform the analyses and generate the results and figures are available in Google Colab notebooks archived with Zenodo at https://zenodo.org/record/5519756#.YUonytNKgUE and directly available at https://github.com/pachterlab/CWGFLHGCCHAP_2021. The notebooks, which include the complete preprocessing of the raw data and a walkthrough of the code, provide a transparent implementation of the methods and can be run for free in the Google cloud.\n\nPublished - sciadv.abh1683.pdf
Submitted - 2021.01.22.427844v2.full.pdf
Supplemental Material - sciadv.abh1683_sm.pdf
Supplemental Material - sciadv.abh1683_tables_s3_and_s5.zip
", "abstract": "We present an organism-wide, transcriptomic cell atlas of the hydrozoan medusa Clytia hemisphaerica and describe how its component cell types respond to perturbation. Using multiplexed single-cell RNA sequencing, in which individual animals were indexed and pooled from control and perturbation conditions into a single sequencing run, we avoid artifacts from batch effects and are able to discern shifts in cell state in response to organismal perturbations. This work serves as a foundation for future studies of development, function, and regeneration in a genetically tractable jellyfish species. Moreover, we introduce a powerful workflow for high-resolution, whole-animal, multiplexed single-cell genomics that is readily adaptable to other traditional or nontraditional model organisms.", "date": "2021-11-26", "date_type": "published", "publication": "Science Advances", "volume": "7", "number": "48", "publisher": "American Association for the Advancement of Science", "pagerange": "Art. No. 
eabh1683", "id_number": "CaltechAUTHORS:20210126-133110736", "issn": "2375-2548", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20210126-133110736", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "Caltech Center for Evolutionary Science" }, { "agency": "Tianqiao and Chrissy Chen Institute for Neuroscience" }, { "agency": "NIH", "grant_number": "U19MH114830" }, { "agency": "NIH", "grant_number": "RF1AG062324A" }, { "agency": "European Marine Biological Resource Centre" }, { "agency": "Agence Nationale pour la Recherche (ANR)", "grant_number": "ANR-19-CE13-0003" }, { "agency": "Marie Curie Fellowship", "grant_number": "766053" }, { "agency": "Howard Hughes Medical Institute (HHMI)" }, { "agency": "Life Sciences Research Foundation" }, { "agency": "NIH", "grant_number": "K99NS119749" }, { "agency": "Marine Biological Laboratory" } ] }, "local_group": { "items": [ { "id": "Tianqiao-and-Chrissy-Chen-Institute-for-Neuroscience" }, { "id": "Division-of-Biology-and-Biological-Engineering" } ] }, "doi": "10.1126/sciadv.abh1683", "pmcid": "PMC8626072", "primary_object": { "basename": "sciadv.abh1683_sm.pdf", "url": "https://authors.library.caltech.edu/records/5afgq-qgg51/files/sciadv.abh1683_sm.pdf" }, "related_objects": [ { "basename": "sciadv.abh1683_tables_s3_and_s5.zip", "url": "https://authors.library.caltech.edu/records/5afgq-qgg51/files/sciadv.abh1683_tables_s3_and_s5.zip" }, { "basename": "2021.01.22.427844v2.full.pdf", "url": "https://authors.library.caltech.edu/records/5afgq-qgg51/files/2021.01.22.427844v2.full.pdf" }, { "basename": "sciadv.abh1683.pdf", "url": "https://authors.library.caltech.edu/records/5afgq-qgg51/files/sciadv.abh1683.pdf" } ], "resource_type": "article", "pub_year": "2021", "author_list": "Chari, Tara; Weissbourd, Brandon; et al." 
}, { "id": "https://authors.library.caltech.edu/records/mn91e-p6r12", "eprint_id": 111143, "eprint_status": "archive", "datestamp": "2023-08-20 06:00:01", "lastmod": "2023-12-22 23:16:22", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Rahman-Atif", "name": { "family": "Rahman", "given": "Atif" }, "orcid": "0000-0003-1805-3971" }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "SWALO: scaffolding with assembly likelihood optimization", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 The Author(s) 2021. Published by Oxford University Press on behalf of Nucleic Acids Research. This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. \n\nReceived: 22 December 2020; Revision received: 16 June 2021; Accepted: 16 August 2021; Published: 20 August 2021. \n\nWe thank Dan Rokhsar, P\u00e1ll Melsted, Harold Pimentel, Shannon McCurdy and Nicolas Bray for helpful conversations during the development of SWALO. \n\nFunding: NIH [R01 HG006129 to L.P., in part]; Fulbright Science & Technology Fellowship [15093630 to A.R., in part]. Funding for open access charge: NIH [R01 HG006129]. \n\nConflict of interest statement: None declared.\n\nPublished - gkab717.pdf
Submitted - 081786v2.full.pdf
Supplemental Material - gkab717_supplemental_file.pdf
", "abstract": "Scaffolding, i.e. ordering and orienting contigs is an important step in genome assembly. We present a method for scaffolding using second generation sequencing reads based on likelihoods of genome assemblies. A generative model for sequencing is used to obtain maximum likelihood estimates of gaps between contigs and to estimate whether linking contigs into scaffolds would lead to an increase in the likelihood of the assembly. We then link contigs if they can be unambiguously joined or if the corresponding increase in likelihood is substantially greater than that of other possible joins of those contigs. The method is implemented in a tool called SWALO with approximations to make it efficient and applicable to large datasets. Analysis on real and simulated datasets reveals that it consistently makes more or similar number of correct joins as other scaffolders while linking very few contigs incorrectly, thus outperforming other scaffolders and demonstrating that substantial improvement in genome assembly may be achieved through the use of statistical models. SWALO is freely available for download at https://atifrahman.github.io/SWALO/.", "date": "2021-11-18", "date_type": "published", "publication": "Nucleic Acids Research", "volume": "49", "number": "20", "publisher": "Oxford University Press", "pagerange": "Art. No. 
e117", "id_number": "CaltechAUTHORS:20210930-221100053", "issn": "0305-1048", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20210930-221100053", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NIH", "grant_number": "R01 HG006129" }, { "agency": "Fulbright Foundation", "grant_number": "15093630" } ] }, "local_group": { "items": [ { "id": "Division-of-Biology-and-Biological-Engineering" } ] }, "doi": "10.1093/nar/gkab717", "primary_object": { "basename": "gkab717.pdf", "url": "https://authors.library.caltech.edu/records/mn91e-p6r12/files/gkab717.pdf" }, "related_objects": [ { "basename": "gkab717_supplemental_file.pdf", "url": "https://authors.library.caltech.edu/records/mn91e-p6r12/files/gkab717_supplemental_file.pdf" }, { "basename": "081786v2.full.pdf", "url": "https://authors.library.caltech.edu/records/mn91e-p6r12/files/081786v2.full.pdf" } ], "resource_type": "article", "pub_year": "2021", "author_list": "Rahman, Atif and Pachter, Lior" }, { "id": "https://authors.library.caltech.edu/records/6zspp-xxk39", "eprint_id": 106288, "eprint_status": "archive", "datestamp": "2023-08-22 11:31:32", "lastmod": "2023-12-22 23:16:28", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Adkins-Ricky-S", "name": { "family": "Adkins", "given": "Ricky S." }, "orcid": "0000-0002-7983-5486" }, { "name": { "family": "Aldridge", "given": "Andrew I." }, "orcid": "0000-0003-1962-8802" }, { "name": { "family": "Allen", "given": "Shona" }, "orcid": "0000-0003-0186-0574" }, { "name": { "family": "Ament", "given": "Seth A." }, "orcid": "0000-0001-6443-7509" }, { "name": { "family": "An", "given": "Xu" }, "orcid": "0000-0003-3386-5521" }, { "name": { "family": "Armand", "given": "Ethan" }, "orcid": "0000-0002-4516-6317" }, { "name": { "family": "Ascoli", "given": "Giorgio A." 
}, "orcid": "0000-0002-0964-676X" }, { "name": { "family": "Bakken", "given": "Trygve E." }, "orcid": "0000-0003-3373-7386" }, { "name": { "family": "Bandrowski", "given": "Anita" }, "orcid": "0000-0002-5497-0243" }, { "name": { "family": "Banerjee", "given": "Samik" }, "orcid": "0000-0003-2325-1489" }, { "name": { "family": "Barkas", "given": "Nikolaos" }, "orcid": "0000-0002-4675-0718" }, { "name": { "family": "Bartlett", "given": "Anna" }, "orcid": "0000-0001-7059-4033" }, { "name": { "family": "Bateup", "given": "Helen S." }, "orcid": "0000-0002-0135-0972" }, { "name": { "family": "Behrens", "given": "M. Margarita" }, "orcid": "0000-0002-7168-8186" }, { "name": { "family": "Berens", "given": "Philipp" }, "orcid": "0000-0002-0199-4727" }, { "name": { "family": "Berg", "given": "Jim" }, "orcid": "0000-0002-3300-5399" }, { "name": { "family": "Bernabucci", "given": "Matteo" }, "orcid": "0000-0003-4458-117X" }, { "name": { "family": "Bernaerts", "given": "Yves" }, "orcid": "0000-0003-4948-0423" }, { "name": { "family": "Bertagnolli", "given": "Darren" }, "orcid": "0000-0002-6626-1567" }, { "name": { "family": "Biancalani", "given": "Tommaso" }, "orcid": "0000-0001-9104-9755" }, { "name": { "family": "Boggeman", "given": "Lara" } }, { "id": "Booeshaghi-A-Sina", "name": { "family": "Booeshaghi", "given": "A. Sina" }, "orcid": "0000-0002-6442-4502" }, { "name": { "family": "Bowman", "given": "Ian" }, "orcid": "0000-0001-7366-9192" }, { "name": { "family": "Bravo", "given": "H\u00e9ctor Corrada" }, "orcid": "0000-0002-1255-4444" }, { "name": { "family": "Cadwell", "given": "Cathryn Ren\u00e9" }, "orcid": "0000-0003-1963-8285" }, { "name": { "family": "Callaway", "given": "Edward M." 
}, "orcid": "0000-0002-6366-5267" }, { "name": { "family": "Carlin", "given": "Benjamin" }, "orcid": "0000-0002-9360-9143" }, { "name": { "family": "O'Connor", "given": "Carolyn" }, "orcid": "0000-0002-3301-7912" }, { "name": { "family": "Carter", "given": "Robert" }, "orcid": "0000-0003-0937-8141" }, { "name": { "family": "Casper", "given": "Tamara" }, "orcid": "0000-0003-1638-3651" }, { "name": { "family": "Castanon", "given": "Rosa G." }, "orcid": "0000-0003-1791-002X" }, { "name": { "family": "Castro", "given": "Jesus Ramon" }, "orcid": "0000-0002-6628-980X" }, { "name": { "family": "Chance", "given": "Rebecca K." }, "orcid": "0000-0001-7059-6119" }, { "name": { "family": "Chatterjee", "given": "Apaala" }, "orcid": "0000-0003-1170-8971" }, { "name": { "family": "Chen", "given": "Huaming" }, "orcid": "0000-0001-5289-7882" }, { "name": { "family": "Chun", "given": "Jerold" }, "orcid": "0000-0003-3964-0921" }, { "name": { "family": "Colantuoni", "given": "Carlo" }, "orcid": "0000-0001-6818-6380" }, { "name": { "family": "Crabtree", "given": "Jonathan" }, "orcid": "0000-0002-7286-5690" }, { "name": { "family": "Creasy", "given": "Heather" }, "orcid": "0000-0002-1369-6882" }, { "name": { "family": "Crichton", "given": "Kirsten" }, "orcid": "0000-0002-7869-1492" }, { "name": { "family": "Crow", "given": "Megan" }, "orcid": "0000-0002-1172-5897" }, { "name": { "family": "D'Orazi", "given": "Florence D." }, "orcid": "0000-0002-7354-4725" }, { "name": { "family": "Daigle", "given": "Tanya L." 
}, "orcid": "0000-0001-9700-8452" }, { "name": { "family": "Dalley", "given": "Rachel" }, "orcid": "0000-0001-7461-7845" }, { "name": { "family": "Dee", "given": "Nick" }, "orcid": "0000-0002-2831-9254" }, { "name": { "family": "Degatano", "given": "Kylee" }, "orcid": "0000-0002-0945-3300" }, { "name": { "family": "Dichter", "given": "Benjamin" }, "orcid": "0000-0001-5725-6910" }, { "name": { "family": "Diep", "given": "Dinh" }, "orcid": "0000-0001-6057-4119" }, { "name": { "family": "Ding", "given": "Liya" }, "orcid": "0000-0002-1209-875X" }, { "name": { "family": "Ding", "given": "Song-Lin" }, "orcid": "0000-0002-7072-5272" }, { "name": { "family": "Dominguez", "given": "Bertha" }, "orcid": "0000-0002-9470-7300" }, { "name": { "family": "Dong", "given": "Hong-Wei" }, "orcid": "0000-0001-9972-3177" }, { "name": { "family": "Dong", "given": "Weixiu" }, "orcid": "0000-0003-1059-5653" }, { "name": { "family": "Dougherty", "given": "Elizabeth L." }, "orcid": "0000-0001-8922-5078" }, { "name": { "family": "Dudoit", "given": "Sandrine" }, "orcid": "0000-0002-6069-8629" }, { "name": { "family": "Ecker", "given": "Joseph R." }, "orcid": "0000-0001-5799-5895" }, { "name": { "family": "Eichhorn", "given": "Stephen W." }, "orcid": "0000-0002-6410-4699" }, { "name": { "family": "Fang", "given": "Rongxin" }, "orcid": "0000-0003-0107-7504" }, { "name": { "family": "Felix", "given": "Victor" }, "orcid": "0000-0002-9773-0629" }, { "name": { "family": "Feng", "given": "Guoping" }, "orcid": "0000-0002-8021-277X" }, { "name": { "family": "Feng", "given": "Zhao" }, "orcid": "0000-0001-5035-7655" }, { "name": { "family": "Fischer", "given": "Stephan" }, "orcid": "0000-0002-7034-4103" }, { "name": { "family": "Fitzpatrick", "given": "Conor" }, "orcid": "0000-0003-2625-6277" }, { "name": { "family": "Fong", "given": "Olivia" }, "orcid": "0000-0002-7091-9667" }, { "name": { "family": "Foster", "given": "Nicholas N." 
}, "orcid": "0000-0003-1740-9788" }, { "name": { "family": "Galbavy", "given": "William" }, "orcid": "0000-0003-0948-9538" }, { "name": { "family": "Gee", "given": "James C." }, "orcid": "0000-0002-2258-0187" }, { "name": { "family": "Ghosh", "given": "Satrajit S." }, "orcid": "0000-0002-5312-6729" }, { "name": { "family": "Giglio", "given": "Michelle" }, "orcid": "0000-0001-7628-5565" }, { "name": { "family": "Gillespie", "given": "Thomas H." }, "orcid": "0000-0002-7509-4801" }, { "name": { "family": "Gillis", "given": "Jesse" }, "orcid": "0000-0002-0936-9774" }, { "name": { "family": "Goldman", "given": "Melissa" }, "orcid": "0000-0003-1469-5360" }, { "name": { "family": "Goldy", "given": "Jeff" }, "orcid": "0000-0001-5140-6922" }, { "name": { "family": "Gong", "given": "Hui" }, "orcid": "0000-0001-5519-6248" }, { "name": { "family": "Gou", "given": "Lin" }, "orcid": "0000-0002-3109-1879" }, { "name": { "family": "Grauer", "given": "Michael" }, "orcid": "0000-0002-4167-1076" }, { "name": { "family": "Halchenko", "given": "Yaroslav O." }, "orcid": "0000-0003-3456-2493" }, { "name": { "family": "Harris", "given": "Julie A." }, "orcid": "0000-0003-0820-2021" }, { "name": { "family": "Hartmanis", "given": "Leonard" }, "orcid": "0000-0002-4922-8781" }, { "name": { "family": "Hatfield", "given": "Joshua T." }, "orcid": "0000-0002-1639-7212" }, { "name": { "family": "Hawrylycz", "given": "Mike" }, "orcid": "0000-0002-5741-8024" }, { "name": { "family": "Helba", "given": "Brian" }, "orcid": "0000-0003-2628-805X" }, { "name": { "family": "Herb", "given": "Brian R." }, "orcid": "0000-0002-5910-9647" }, { "name": { "family": "Hertzano", "given": "Ronna" }, "orcid": "0000-0002-8093-6567" }, { "name": { "family": "Hintiryan", "given": "Houri" }, "orcid": "0000-0002-9721-6785" }, { "name": { "family": "Hirokawa", "given": "Karla E." 
}, "orcid": "0000-0002-9954-5515" }, { "name": { "family": "Hockemeyer", "given": "Dirk" }, "orcid": "0000-0002-5598-5092" }, { "name": { "family": "Hodge", "given": "Rebecca D." }, "orcid": "0000-0002-5784-9668" }, { "name": { "family": "Hood", "given": "Greg" }, "orcid": "0000-0001-9871-7154" }, { "name": { "family": "Horwitz", "given": "Gregory D." }, "orcid": "0000-0001-5130-5259" }, { "name": { "family": "Hou", "given": "Xiaomeng" }, "orcid": "0000-0002-5453-9015" }, { "name": { "family": "Hu", "given": "Lijuan" }, "orcid": "0000-0003-1869-0372" }, { "name": { "family": "Hu", "given": "Qiwen" }, "orcid": "0000-0003-2798-919X" }, { "name": { "family": "Huang", "given": "Z. Josh" }, "orcid": "0000-0003-0592-028X" }, { "name": { "family": "Huo", "given": "Bingxing" }, "orcid": "0000-0002-9389-2591" }, { "name": { "family": "Ito-Cole", "given": "Tony" }, "orcid": "0000-0001-5898-3108" }, { "name": { "family": "Jacobs", "given": "Matthew" }, "orcid": "0000-0002-3004-8553" }, { "name": { "family": "Jia", "given": "Xueyan" }, "orcid": "0000-0002-1221-6357" }, { "name": { "family": "Jiang", "given": "Shengdian" }, "orcid": "0000-0002-2277-263X" }, { "name": { "family": "Jiang", "given": "Tao" }, "orcid": "0000-0002-4487-299X" }, { "name": { "family": "Jiang", "given": "Xiaolong" }, "orcid": "0000-0001-8066-1383" }, { "name": { "family": "Jin", "given": "Xin" }, "orcid": "0000-0002-1106-4013" }, { "name": { "family": "Jorstad", "given": "Nikolas L." }, "orcid": "0000-0001-7906-9470" }, { "name": { "family": "Kalmbach", "given": "Brian E." }, "orcid": "0000-0003-3136-8097" }, { "name": { "family": "Kancherla", "given": "Jayaram" }, "orcid": "0000-0001-5855-5031" }, { "name": { "family": "Keene", "given": "C. 
Dirk" }, "orcid": "0000-0002-5291-1469" }, { "name": { "family": "Kelly", "given": "Kathleen" }, "orcid": "0000-0003-2334-9785" }, { "name": { "family": "Khajouei", "given": "Farzaneh" }, "orcid": "0000-0002-0148-9122" }, { "name": { "family": "Kharchenko", "given": "Peter V." }, "orcid": "0000-0002-6036-5875" }, { "name": { "family": "Kim", "given": "Gukhan" }, "orcid": "0000-0002-3338-5045" }, { "name": { "family": "Ko", "given": "Andrew L." }, "orcid": "0000-0002-6253-9891" }, { "name": { "family": "Kobak", "given": "Dmitry" }, "orcid": "0000-0002-5639-7209" }, { "name": { "family": "Konwar", "given": "Kishori" }, "orcid": "0000-0001-5152-4777" }, { "name": { "family": "Kramer", "given": "Daniel J." }, "orcid": "0000-0003-4241-3586" }, { "name": { "family": "Krienen", "given": "Fenna M." }, "orcid": "0000-0002-1400-6820" }, { "name": { "family": "Kroll", "given": "Matthew" }, "orcid": "0000-0002-0126-7618" }, { "name": { "family": "Kuang", "given": "Xiuli" }, "orcid": "0000-0001-7569-7605" }, { "name": { "family": "Kuo", "given": "Hsien-Chi" }, "orcid": "0000-0002-0215-2302" }, { "name": { "family": "Lake", "given": "Blue B." }, "orcid": "0000-0002-8637-9044" }, { "name": { "family": "Larsen", "given": "Rachael" }, "orcid": "0000-0003-0178-003X" }, { "name": { "family": "Lathia", "given": "Kanan" }, "orcid": "0000-0003-0080-1951" }, { "name": { "family": "Laturnus", "given": "Sophie" }, "orcid": "0000-0001-9532-788X" }, { "name": { "family": "Lee", "given": "Angus Y." }, "orcid": "0000-0002-7649-2705" }, { "name": { "family": "Lee", "given": "Cheng-Ta" }, "orcid": "0000-0001-6183-2319" }, { "name": { "family": "Lee", "given": "Kuo-Fen" }, "orcid": "0000-0003-2224-2708" }, { "name": { "family": "Lein", "given": "Ed S." 
}, "orcid": "0000-0001-9012-6552" }, { "name": { "family": "Lesnar", "given": "Phil" }, "orcid": "0000-0002-2152-604X" }, { "name": { "family": "Li", "given": "Anan" }, "orcid": "0000-0002-5877-4813" }, { "name": { "family": "Li", "given": "Xiangning" }, "orcid": "0000-0002-3747-2824" }, { "name": { "family": "Li", "given": "Xu" } }, { "name": { "family": "Li", "given": "Yang Eric" }, "orcid": "0000-0001-6997-6018" }, { "name": { "family": "Li", "given": "Yaoyao" }, "orcid": "0000-0001-5468-9876" }, { "name": { "family": "Li", "given": "Yuanyuan" }, "orcid": "0000-0002-0897-5270" }, { "name": { "family": "Lim", "given": "Byungkook" }, "orcid": "0000-0002-3766-5415" }, { "name": { "family": "Linnarsson", "given": "Sten" }, "orcid": "0000-0002-3491-3444" }, { "name": { "family": "Liu", "given": "Christine S." }, "orcid": "0000-0002-1239-4612" }, { "name": { "family": "Liu", "given": "Hanqing" }, "orcid": "0000-0002-5114-6048" }, { "name": { "family": "Liu", "given": "Lijuan" }, "orcid": "0000-0002-9548-6183" }, { "name": { "family": "Lucero", "given": "Jacinta D." }, "orcid": "0000-0001-7578-6624" }, { "name": { "family": "Luo", "given": "Chongyuan" }, "orcid": "0000-0002-8541-0695" }, { "name": { "family": "Luo", "given": "Qingming" }, "orcid": "0000-0002-6725-9311" }, { "name": { "family": "Macosko", "given": "Evan Z." }, "orcid": "0000-0002-2794-5165" }, { "name": { "family": "Mahurkar", "given": "Anup" }, "orcid": "0000-0002-4999-2296" }, { "name": { "family": "Martone", "given": "Maryann E." }, "orcid": "0000-0002-8406-3871" }, { "name": { "family": "Matho", "given": "Katherine S." }, "orcid": "0000-0002-6105-4219" }, { "name": { "family": "McCarroll", "given": "Steven A." 
}, "orcid": "0000-0002-6954-8184" }, { "name": { "family": "McCracken", "given": "Carrie" }, "orcid": "0000-0002-8038-9727" }, { "name": { "family": "McMillen", "given": "Delissa" }, "orcid": "0000-0002-3413-4424" }, { "name": { "family": "Miranda", "given": "Elanine" }, "orcid": "0000-0002-1633-9303" }, { "name": { "family": "Mitra", "given": "Partha P" }, "orcid": "0000-0001-8818-6804" }, { "name": { "family": "Miyazaki", "given": "Paula Assakura" }, "orcid": "0000-0003-1295-8710" }, { "name": { "family": "Mizrachi", "given": "Judith" }, "orcid": "0000-0003-2195-8210" }, { "name": { "family": "Mok", "given": "Stephanie" }, "orcid": "0000-0002-2688-1569" }, { "name": { "family": "Mukamel", "given": "Eran A." }, "orcid": "0000-0003-3203-9535" }, { "name": { "family": "Mulherkar", "given": "Shalaka" }, "orcid": "0000-0001-8736-527X" }, { "name": { "family": "Nadaf", "given": "Naeem M." }, "orcid": "0000-0002-7805-8523" }, { "name": { "family": "Naeemi", "given": "Maitham" }, "orcid": "0000-0001-9139-3548" }, { "name": { "family": "Narasimhan", "given": "Arun" }, "orcid": "0000-0002-0246-6301" }, { "name": { "family": "Nery", "given": "Joseph R." }, "orcid": "0000-0003-0153-5659" }, { "name": { "family": "Ng", "given": "Lydia" }, "orcid": "0000-0002-7499-3514" }, { "name": { "family": "Ngai", "given": "John" }, "orcid": "0000-0002-1191-8971" }, { "name": { "family": "Nguyen", "given": "Thuc Nghi" }, "orcid": "0000-0002-6466-5883" }, { "name": { "family": "Nickel", "given": "Lance" }, "orcid": "0000-0002-5836-3571" }, { "name": { "family": "Nicovich", "given": "Philip R." 
}, "orcid": "0000-0002-8517-4469" }, { "name": { "family": "Niu", "given": "Sheng-Yong" }, "orcid": "0000-0002-7734-1191" }, { "name": { "family": "Ntranos", "given": "Vasilis" }, "orcid": "0000-0002-2477-0670" }, { "name": { "family": "Nunn", "given": "Michael" }, "orcid": "0000-0002-6771-9912" }, { "name": { "family": "Olley", "given": "Dustin" }, "orcid": "0000-0001-8685-0839" }, { "name": { "family": "Orvis", "given": "Joshua" }, "orcid": "0000-0002-5705-5710" }, { "name": { "family": "Osteen", "given": "Julia K." }, "orcid": "0000-0001-7058-3297" }, { "name": { "family": "Osten", "given": "Pavel" }, "orcid": "0000-0002-6385-7541" }, { "name": { "family": "Owen", "given": "Scott F." }, "orcid": "0000-0001-6294-7513" }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" }, { "name": { "family": "Palaniswamy", "given": "Ramesh" }, "orcid": "0000-0003-4322-2407" }, { "name": { "family": "Palmer", "given": "Carter R." }, "orcid": "0000-0002-2385-2068" }, { "name": { "family": "Pang", "given": "Yan" }, "orcid": "0000-0003-3323-5052" }, { "name": { "family": "Peng", "given": "Hanchuan" }, "orcid": "0000-0002-3478-3942" }, { "name": { "family": "Pham", "given": "Thanh" }, "orcid": "0000-0002-4738-5062" }, { "name": { "family": "Pinto-Duarte", "given": "Antonio" }, "orcid": "0000-0002-2215-7653" }, { "name": { "family": "Plongthongkum", "given": "Nongluk" }, "orcid": "0000-0002-1305-285X" }, { "name": { "family": "Poirion", "given": "Olivier" }, "orcid": "0000-0002-0429-7003" }, { "name": { "family": "Preissl", "given": "Sebastian" }, "orcid": "0000-0001-8971-5616" }, { "name": { "family": "Purdom", "given": "Elizabeth" }, "orcid": "0000-0001-9455-7990" }, { "name": { "family": "Qu", "given": "Lei" }, "orcid": "0000-0002-2129-5253" }, { "name": { "family": "Rashid", "given": "Mohammad" }, "orcid": "0000-0002-7884-4954" }, { "name": { "family": "Reed", "given": "Nora M." 
}, "orcid": "0000-0003-0408-1568" }, { "name": { "family": "Regev", "given": "Aviv" }, "orcid": "0000-0003-3293-3158" }, { "name": { "family": "Ren", "given": "Bing" }, "orcid": "0000-0002-5435-1127" }, { "name": { "family": "Ren", "given": "Miao" }, "orcid": "0000-0002-5555-5279" }, { "name": { "family": "Rimorin", "given": "Christine" }, "orcid": "0000-0003-1491-8552" }, { "name": { "family": "Risso", "given": "Davide" }, "orcid": "0000-0001-8508-5012" }, { "name": { "family": "Rivkin", "given": "Angeline C." }, "orcid": "0000-0003-0399-9043" }, { "name": { "family": "Mu\u00f1oz-Casta\u00f1eda", "given": "Rodrigo" }, "orcid": "0000-0002-1176-7421" }, { "name": { "family": "Romanow", "given": "William J." }, "orcid": "0000-0002-3808-6482" }, { "name": { "family": "Ropelewski", "given": "Alexander J." }, "orcid": "0000-0001-6874-4477" }, { "name": { "family": "Roux de B\u00e9zieux", "given": "Hector" }, "orcid": "0000-0002-1489-8339" }, { "name": { "family": "Ruan", "given": "Zongcai" }, "orcid": "0000-0003-1547-165X" }, { "name": { "family": "Sandberg", "given": "Rickard" }, "orcid": "0000-0001-6473-1740" }, { "name": { "family": "Savoia", "given": "Steven" }, "orcid": "0000-0003-4514-7367" }, { "name": { "family": "Scala", "given": "Federico" }, "orcid": "0000-0002-2680-8572" }, { "name": { "family": "Schor", "given": "Michael" }, "orcid": "0000-0002-4493-7992" }, { "name": { "family": "Shen", "given": "Elise" }, "orcid": "0000-0002-3295-3928" }, { "name": { "family": "Siletti", "given": "Kimberly" }, "orcid": "0000-0001-7620-8973" }, { "name": { "family": "Smith", "given": "Jared B." }, "orcid": "0000-0002-0273-4898" }, { "name": { "family": "Smith", "given": "Kimberly" }, "orcid": "0000-0002-3142-1970" }, { "name": { "family": "Somasundaram", "given": "Saroja" }, "orcid": "0000-0002-3729-9849" }, { "name": { "family": "Song", "given": "Yuanyuan" }, "orcid": "0000-0002-9183-5884" }, { "name": { "family": "Sorensen", "given": "Staci A." 
}, "orcid": "0000-0002-6799-2126" }, { "name": { "family": "Stafford", "given": "David A." }, "orcid": "0000-0002-3310-5402" }, { "name": { "family": "Street", "given": "Kelly" }, "orcid": "0000-0001-6379-5013" }, { "name": { "family": "Sulc", "given": "Josef" }, "orcid": "0000-0002-4928-7183" }, { "name": { "family": "Sunkin", "given": "Susan" }, "orcid": "0000-0001-9893-3834" }, { "id": "Svensson-Valentine", "name": { "family": "Svensson", "given": "Valentine" }, "orcid": "0000-0002-9217-2330" }, { "name": { "family": "Tan", "given": "Pengcheng" }, "orcid": "0000-0001-7276-0381" }, { "name": { "family": "Tan", "given": "Zheng Huan" }, "orcid": "0000-0002-1886-2421" }, { "name": { "family": "Tasic", "given": "Bosiljka" }, "orcid": "0000-0002-6861-4506" }, { "name": { "family": "Thompson", "given": "Carol" }, "orcid": "0000-0003-1528-3237" }, { "name": { "family": "Tian", "given": "Wei" }, "orcid": "0000-0002-2146-1717" }, { "name": { "family": "Tickle", "given": "Timothy L." }, "orcid": "0000-0002-6592-6272" }, { "name": { "family": "Tieu", "given": "Michael" }, "orcid": "0000-0001-9286-5623" }, { "name": { "family": "Ting", "given": "Jonathan T." }, "orcid": "0000-0001-8266-0392" }, { "name": { "family": "Tolias", "given": "Andreas Savas" }, "orcid": "0000-0002-4305-6376" }, { "name": { "family": "Torkelson", "given": "Amy" }, "orcid": "0000-0002-9465-4202" }, { "name": { "family": "Tung", "given": "Herman" }, "orcid": "0000-0002-0812-3318" }, { "name": { "family": "Vaishnav", "given": "Eeshit Dhaval" }, "orcid": "0000-0003-3720-8051" }, { "name": { "family": "Van den Berge", "given": "Koen" }, "orcid": "0000-0002-1833-8478" }, { "name": { "family": "van Velthoven", "given": "Cindy T.J." }, "orcid": "0000-0001-5120-4546" }, { "name": { "family": "Vanderburg", "given": "Charles R." }, "orcid": "0000-0001-8979-5054" }, { "name": { "family": "Veldman", "given": "Matthew B." 
}, "orcid": "0000-0002-0328-5916" }, { "name": { "family": "Vu", "given": "Minh" }, "orcid": "0000-0003-4154-5659" }, { "name": { "family": "Wakeman", "given": "Wayne" }, "orcid": "0000-0002-3693-3609" }, { "name": { "family": "Wang", "given": "Peng" }, "orcid": "0000-0003-1181-5558" }, { "name": { "family": "Wang", "given": "Quanxin" }, "orcid": "0000-0002-0007-7935" }, { "name": { "family": "Wang", "given": "Xinxin" }, "orcid": "0000-0001-6393-2276" }, { "name": { "family": "Wang", "given": "Yimin" }, "orcid": "0000-0003-2515-6602" }, { "name": { "family": "Wang", "given": "Yun" }, "orcid": "0000-0001-5501-8433" }, { "name": { "family": "Welch", "given": "Joshua D." }, "orcid": "0000-0002-5869-2391" }, { "name": { "family": "White", "given": "Owen" }, "orcid": "0000-0003-2407-7320" }, { "name": { "family": "Williams", "given": "Elora" }, "orcid": "0000-0002-0178-5511" }, { "name": { "family": "Xie", "given": "Fangming" }, "orcid": "0000-0001-5232-1648" }, { "name": { "family": "Xie", "given": "Peng" }, "orcid": "0000-0002-9509-7268" }, { "name": { "family": "Xiong", "given": "Feng" }, "orcid": "0000-0002-6927-8903" }, { "name": { "family": "Yang", "given": "X. 
William" }, "orcid": "0000-0003-3705-7935" }, { "name": { "family": "Yanny", "given": "Anna Marie" }, "orcid": "0000-0001-7250-8450" }, { "name": { "family": "Yao", "given": "Zizhen" }, "orcid": "0000-0002-9361-5607" }, { "name": { "family": "Yin", "given": "Lulu" }, "orcid": "0000-0003-2932-6349" }, { "name": { "family": "Yu", "given": "Yang" }, "orcid": "0000-0002-4340-430X" }, { "name": { "family": "Yuan", "given": "Jing" }, "orcid": "0000-0001-9050-4496" }, { "name": { "family": "Zeng", "given": "Hongkui" }, "orcid": "0000-0002-0326-5878" }, { "name": { "family": "Zhang", "given": "Kun" }, "orcid": "0000-0002-7596-5224" }, { "name": { "family": "Zhang", "given": "Meng" }, "orcid": "0000-0002-9753-0635" }, { "name": { "family": "Zhang", "given": "Zhuzhu" }, "orcid": "0000-0002-2661-4700" }, { "name": { "family": "Zhao", "given": "Sujun" }, "orcid": "0000-0001-7807-7495" }, { "name": { "family": "Zhao", "given": "Xuan" }, "orcid": "0000-0002-5778-5422" }, { "name": { "family": "Zhou", "given": "Jingtian" }, "orcid": "0000-0003-2060-1922" }, { "name": { "family": "Zhuang", "given": "Xiaowei" }, "orcid": "0000-0002-6034-7853" }, { "name": { "family": "Zingg", "given": "Brian" }, "orcid": "0000-0001-8657-8863" } ] }, "title": "A multimodal cell census and atlas of the mammalian primary motor cortex", "ispublished": "pub", "full_text_status": "public", "keywords": "Cellular neuroscience; Molecular neuroscience; Motor cortex; Neural circuits", "note": "\u00a9 The Author(s) 2021. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. 
The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. \n\nReceived 04 October 2020; Accepted 25 August 2021; Published 06 October 2021. \n\nWe thank additional members of our laboratories and institutions who contributed to the experimental and analytical components of this project. This work was supported by grants from the National Institute of Mental Health (NIMH) of the National Institutes of Health (NIH) under: U24MH114827, U19MH114821, U19MH114830, U19MH114831, U01MH117072, U01MH114829, U01MH121282, U01MH117023, U01MH114825, U01MH114819, U01MH114812, U01MH121260, U01MH114824, U01MH117079, U01MH116990, U01MH114828, R24MH117295, R24MH114793, R24MH114788, R24MH114815. We thank NIH BICCN program officers, in particular Yong Yao, for their guidance and support throughout this study. Additional support: NIH grants R01NS39600 and R01NS86082 to G.A.A. H.S.B. is a Chan Zuckerberg Biohub Investigator. Deutsche Forschungsgemeinschaft through a Heisenberg Professorship (BE5601/4-1), the Cluster of Excellence Machine Learning\u2014New Perspectives for Science (EXC 2064, project number 390727645) and the Collaborative Research Center 1233 Robust Vision (project number 276693517), the German Federal Ministry of Education and Research (FKZ 01GQ1601 and 01IS18039A) to P.B. This work was supported in part by the Flow Cytometry Core Facility of the Salk Institute with funding from NIH-NCI CCSG: P30 014195 and Shared Instrumentation Grant S10-OD023689. NIH grant R01MH094360 to H.-W.D. We thank M. Becerra, T. Boesen, C. Cao, M. Fayzullina, K. 
Cotter, L. Gao, L. Gacia, L. Korobkova, D. Lo, C. Mun, S. Yamashita and M. Zhu for their technical and informatics support. Hearing Health Foundation Hearing Restoration Project grant to R.H. NIH grant OD010425 to G.D.H. NIH grant RF1MH114126 to E.S.L. and J.T.T. National Natural Science Foundation of China (NNSFC) grant 61890953 to H.G. NNSFC grant 81827901 to Q.L. This project was supported in part by NIH grants P51OD010425 from the Office of Research Infrastructure Programs (ORIP) and UL1TR000423 from the National Center for Advancing Translational Sciences (NCATS). Its contents are solely the responsibility of the authors and do not necessarily represent the official view of NIH, ORIP, NCATS or the Institute of Translational Health Sciences at the Washington National Primate Research Center. NNSFC grant 61871411 and the University Synergy Innovation Program of Anhui Province GXXT-2019-008 to L.Q. Howard Hughes Medical Institute and the Klarman Cell Observatory for A.R. Howard Hughes Medical Institute for J.R.E. and X. Zhuang. NNSFC grant 32071367 and NSF Shanghai grant 20ZR1420100 to Yimin Wang. NIH grants R01EY023173 and U01MH105982 to H.Z. Researchers from Allen Institute for Brain Science wish to thank the Allen Institute founder, P. G. Allen, for his vision, encouragement and support. \n\nData availability: Primary data are accessible through the Brain Cell Data Center and data archives. Brain Cell Data Center (BCDC), Overall BICCN organization and data, www.biccn.org. Neuroscience Multi-omic Data Archive (NeMO), RRID:SCR_016152. Brain Image Library (BIL), RRID:SCR_017272. Distributed Archives for Neurophysiology Data Integration (DANDI), RRID:SCR_017571. Publicly used databases in study: NCBI Homologene, 11/22/2019, https://www.ncbi.nlm.nih.gov/homologene, GENCODE mm10 (v16), https://www.gencodegenes.org, JASPAR 2020 database, http://jaspar.genereg.net. 
All data resources associated with this publication are available as listed at: https://github.com/BICCN/CellCensusMotorCortex and https://doi.org/10.5281/zenodo.4726182. \n\nCode availability: All code and libraries used in the manuscript are available at https://github.com/BICCN/CellCensusMotorCortex and https://doi.org/10.5281/zenodo.4726182. \n\nAuthor Contributions: BICCN contributing principal investigators: G.A.A., M.M.B., E.M.C., J. Chun, J.R.E., G.F., J.C.G., S.S.G., Y.O.H., M.J.H., R.H., H.-W.D., Z.J.H., E.S.L., B.K.L., M.E.M., L. Ng, P.O., L.P., A.J.R., T.L.T., A.S.T., O.W., X.W.Y., H.Z., K.Z., X. Zhuang and J.N. Principal manuscript editors: Z.J.H., E.S.L. and H.Z. Manuscript writing and figure generation: G.A.A., T.E.B., P.B., E.M.C., T.L.D., J.A.H., J.R.E., M.J.H., H.-W.D., Z.J.H., N.L.J., B.E.K., D.K., E.S.L., Y.E.L., H.L., K.S.M., E.A.M., M. Naeemi, B.Z., P.O., B.R., F.S., P.T., J.T.T., A.S.T., F. Xie, H.Z., M.Z., Z.Z., J.Z., X. Zhuang and J.N. Analysis coordination: T.E.B., E.M.C., J.A.H., J.R.E., M.J.H., H.-W.D., Z.J.H., E.S.L., E.A.M., P.O., B.R., A.S.T., H.Z., X. Zhuang and J.N. Integrated data analysis: E.A., T.E.B., P.B., J.A.H., J.R.E., H.-W.D., Z.J.H., N.L.J., B.E.K., D.K., E.S.L., Y.E.L., H.L., E.A.M., P.O., B.R., F.S., P.T., A.S.T., F. Xie, Z.Y., H.Z., M.Z., Z.Z., J.Z. and X. Zhuang. scRNA-seq and snRNA-seq data generation and processing: D.B., T.N.N., T.C., J. Chun, K.C., N.D., D.D., S.D., W.D., E.L.D., G.F., O.F., M. Goldman, J. Goldy, R.D.H., L. Hu, C.D.K., F.M.K., M.K., B.B.L., K.L., E.S.L., S. Linnarsson, C.S.L., E.Z.M., S.A.M., D.M., N.M.N., C.R.P., T.P., N.P., N.M.R., A.R., C.R., W.J.R., S. Savoia, K. Siletti, K. Smith, J.S., B.T., M.T., A.T., H.T., C.T.J.v.V., C.R.V., A.M.Y., H.Z. and K.Z. ATAC-seq data generation and processing: M.M.B., J. Chun, D.D., W.D., R.F., X.H., B.B.L., Y.E.L., C.S.L., J.D.L., J.K.O., C.R.P., A.P.-D., N.P., O.P., S.P., B.R., W.J.R., X.W. and K.Z. Methylcytosine data production and analysis: A.I.A., A. 
Bartlett, M.M.B., L.B., C.O., R.G.C., H.C., J.R.E., C.F., C.L., H.L., J.D.L., J.R.N., M. Nunn, J.K.O., A.P.-D., A.C.R., W.T. and J.Z. Epi-retro-seq data generation and processing: A. Bartlett, M.M.B., L.B., E.M.C., C.O., R.G.C., B. Dominguez, J.R.E., C.F., T.I.-C., M.J., X. Jin, C.L., K.L., P.A.M., E.A.M., J.R.N., M. Nunn, Y.P., A.P.-D., M. Rashid, A.C.R., J.B.S., P.T., M.V., E.W., Z.Z. and J.Z. 'Omics data analysis: E.A., T.E.B., T.B., A.S.B., M.C., D.D., S.D., J.R.E., R.F., S.F., O.F., J. Gillis, J. Goldy, Q.H., N.L.J., P.V.K., F.M.K., B.B.L., E.S.L., Y.E.L., S. Linnarsson, H.L., E.Z.M., E.A.M., S.-Y.N., V.N., L.P., O.P., E.P., A.R., D.R., H.R.d.B., K. Siletti, K. Smith, S. Somasundaram, K. Street, V.S., B.T., W.T., E.D.V., K.V.d.B., C.T.J.v.V., J.D.W., F. Xie, Z.Y., H.Z., J.Z. and J.N. Tracing and connectivity data generation: X.A., H.S.B., R.K.C., J.A.H., K.E.H., W.G., H.G., J.T.H., I.B., H.-W.D., Z.J.H., G.K., D.J.K., A.L., Xiangning Li, B.K.L., Q.L., K.S.M., L. Ng, L.G., H.H., B.Z., R.M.-C., D.A.S., H.Z. and J.N. Morphology data generation and reconstruction: T.L.D., J.A.H., Z.F., K.E.H., H.G., H.-W.D., Z.J.H., X. Jia, S.J., T.J., X.K., R.L., P.L., Xiangning Li, Yaoyao Li, Yuanyuan Li, L.L., Q.L., H.P., L.Q., M. Ren, Z.R., E.S., Y.S., W.W., P.W., Yimin Wang, Yun Wang, L.Y., J.Y., H.Z., S.Z. and X. Zhao. OLST/STPT and other data generation: X.A., W.G., J.T.H., Z.J.H., G.K., K.S.M., A.N., P.O., R.P. and R.M.-C. Morphology, connectivity and imaging analysis: X.A., G.A.A., S.B., L.D., J.A.H., Z.F., W.G., H.G., J.T.H., H.-W.D., Z.J.H., D. Huilgol, B. Huo, X. Jia, G.K., H.-C.K., S. Laturnus, A.L., Xu Li, N.N.F., K.S.M., P.P.M., J.M., M. Naeemi, A.N., L. Ng, P.O., R.P., H.P., R.M.-C., Q.W., Yimin Wang, Yun Wang, P.X., F. Xiong, Y.Y. and H.Z. Spatially resolved single-cell transcriptomics (MERFISH): M.Z., S.W.E., B.Z., Z.Y., H.Z., H.-W.D. and X. Zhuang. Multimodal profiling (Patch-seq): P.B., J.B., M.B., Y.B., C.R.C., J.R.C., R.D., P.R.N., L. Hartmanis, G.D.H., X. 
Jiang, B.E.K., C.D.K., A.L.K., D.K., S. Laturnus, E.S.L., E.M., S. Mulherkar, S.F.O., R.S., F.S., K. Smith, S.A.S., Z.H.T., J.T.T., A.S.T. and H.Z. Transgenic tools: S.A., X.A., H.S.B., R.K.C., T.L.D., W.G., J.T.H., D. Hockemeyer, Z.J.H., D. Huilgol, G.K., D.J.K., A.Y.L., K.S.M., D.A.S., B.T., M.B.V., X.W.Y., Z.Y., H.Z. and J.N. NeMO archive and analytics: R.S.A., S.A.A., H.C.B., R.C., A.C., C.C., J. Crabtree, H.C., V.F., M. Giglio, B.R.H., R.H., J.K., A.M., C.M., L. Nickel, D.O., J.O., M.S. and O.W. Brain Image Library (BIL) archive: G.H. and A.J.R. DANDI archive: B. Dichter, S.S.G., M. Grauer, Y.O.H. and B. Helba. Brain Cell Data Center (BCDC): A. Bandrowski, N.B., B.C., F.D.D., K.D., J.C.G., T.H.G., M.J.H., F.K., K. Konwar, M.E.M., L. Ng, C.T. and T.L.T. Project management: F.D.D., H.G., K. Kelly, B.B.L., K.S.M., S. Mok, H.H., M. Nunn, S. Sunkin and C.T. Manuscript correspondence: H.Z. \n\nCompeting interests: A. Bandrowski is a co-founder of SciCrunch, a company devoted to improving scientific communication. J.R.E. is a member of Zymo Research SAB. J.A.H., K.E.H., T.N.N. and P.R.N. are currently employed by Cajal Neuroscience. P.V.K. serves on the Scientific Advisory Board of Celsius Therapeutics Inc. M.E.M. is a founder and CSO of SciCrunch Inc., a UCSD tech start-up that produces tools in support of reproducibility including RRIDs. A.R. is a founder and equity holder of Celsius Therapeutics, an equity holder in Immunitas Therapeutics and until 31 August 2020 was a member of the scientific advisory board of Syros Pharmaceuticals, Neogene Therapeutics, Asimov and ThermoFisher Scientific. From 1 August 2020, A.R. has been an employee of Genentech. B.R. is a co-founder of Arima Genomics, Inc. and Epigenome Technologies, Inc. K.Z. is a co-founder, equity holder and serves on the Scientific Advisory Board of Singlera Genomics. X. Zhuang is a co-founder and consultant of Vizgen. 
\n\nPeer review information: Nature thanks Peter Jones, Manolis Kellis and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.\n\nPublished - s41586-021-03950-0.pdf
Submitted - 2020.10.19.343129v1.full.pdf
Supplemental Material - 41586_2021_3950_Fig10_ESM.webp
Supplemental Material - 41586_2021_3950_Fig11_ESM.webp
Supplemental Material - 41586_2021_3950_Fig12_ESM.webp
Supplemental Material - 41586_2021_3950_Fig13_ESM.webp
Supplemental Material - 41586_2021_3950_Fig14_ESM.webp
Supplemental Material - 41586_2021_3950_Fig15_ESM.webp
Supplemental Material - 41586_2021_3950_Fig16_ESM.webp
Supplemental Material - 41586_2021_3950_MOESM1_ESM.pdf
Supplemental Material - 41586_2021_3950_MOESM2_ESM.xlsx
Supplemental Material - 41586_2021_3950_MOESM3_ESM.xlsx
Supplemental Material - 41586_2021_3950_MOESM4_ESM.pdf
Supplemental Material - 41586_2021_3950_Tab1_ESM.jpg
Supplemental Material - 41586_2021_3950_Tab2_ESM.jpg
", "abstract": "Here we report the generation of a multimodal cell census and atlas of the mammalian primary motor cortex as the initial product of the BRAIN Initiative Cell Census Network (BICCN). This was achieved by coordinated large-scale analyses of single-cell transcriptomes, chromatin accessibility, DNA methylomes, spatially resolved single-cell transcriptomes, morphological and electrophysiological properties and cellular resolution input\u2013output mapping, integrated through cross-modal computational analysis. Our results advance the collective knowledge and understanding of brain cell-type organization. First, our study reveals a unified molecular genetic landscape of cortical cell types that integrates their transcriptome, open chromatin and DNA methylation maps. Second, cross-species analysis achieves a consensus taxonomy of transcriptomic types and their hierarchical organization that is conserved from mouse to marmoset and human. Third, in situ single-cell transcriptomics provides a spatially resolved cell-type atlas of the motor cortex. Fourth, cross-modal analysis provides compelling evidence for the transcriptomic, epigenomic and gene regulatory basis of neuronal phenotypes such as their physiological and anatomical properties, demonstrating the biological validity and genomic underpinning of neuron types. We further present an extensive genetic toolset for targeting glutamatergic neuron types towards linking their molecular and developmental identity to their circuit function. 
Together, our results establish a unifying and mechanistic framework of neuronal cell-type organization that integrates multi-layered molecular genetic and spatial information with multi-faceted phenotypic properties.", "date": "2021-10-07", "date_type": "published", "publication": "Nature", "volume": "598", "number": "7879", "publisher": "Nature Publishing Group", "pagerange": "86-102", "id_number": "CaltechAUTHORS:20201027-075126222", "issn": "0028-0836", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20201027-075126222", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NIH", "grant_number": "U24MH114827" }, { "agency": "NIH", "grant_number": "U19MH114821" }, { "agency": "NIH", "grant_number": "U19MH114830" }, { "agency": "NIH", "grant_number": "U19MH114831" }, { "agency": "NIH", "grant_number": "U01MH117072" }, { "agency": "NIH", "grant_number": "U01MH114829" }, { "agency": "NIH", "grant_number": "U01MH121282" }, { "agency": "NIH", "grant_number": "U01MH117023" }, { "agency": "NIH", "grant_number": "U01MH114825" }, { "agency": "NIH", "grant_number": "U01MH114819" }, { "agency": "NIH", "grant_number": "U01MH114812" }, { "agency": "NIH", "grant_number": "U01MH121260" }, { "agency": "NIH", "grant_number": "U01MH114824" }, { "agency": "NIH", "grant_number": "U01MH117079" }, { "agency": "NIH", "grant_number": "U01MH116990" }, { "agency": "NIH", "grant_number": "U01MH114828" }, { "agency": "NIH", "grant_number": "R24MH117295" }, { "agency": "NIH", "grant_number": "R24MH114793" }, { "agency": "NIH", "grant_number": "R24MH114788" }, { "agency": "NIH", "grant_number": "R24MH114815" }, { "agency": "NIH", "grant_number": "R01NS39600" }, { "agency": "NIH", "grant_number": "R01NS86082" }, { "agency": "Chan Zuckerberg Initiative" }, { "agency": "Deutsche Forschungsgemeinschaft (DFG)", "grant_number": "BE5601/4-1" }, { "agency": "Deutsche Forschungsgemeinschaft 
(DFG)", "grant_number": "EXC 2064" }, { "agency": "Deutsche Forschungsgemeinschaft (DFG)", "grant_number": "390727645" }, { "agency": "Deutsche Forschungsgemeinschaft (DFG)", "grant_number": "276693517" }, { "agency": "Bundesministerium f\u00fcr Bildung und Forschung (BMBF)", "grant_number": "FKZ 01GQ1601" }, { "agency": "Bundesministerium f\u00fcr Bildung und Forschung (BMBF)", "grant_number": "FKZ 01IS18039A" }, { "agency": "NIH", "grant_number": "P30 014195" }, { "agency": "NIH", "grant_number": "S10-OD023689" }, { "agency": "NIH", "grant_number": "R01MH094360" }, { "agency": "NIH", "grant_number": "OD010425" }, { "agency": "NIH", "grant_number": "RF1MH114126" }, { "agency": "National Natural Science Foundation of China", "grant_number": "61890953" }, { "agency": "National Natural Science Foundation of China", "grant_number": "81827901" }, { "agency": "NIH", "grant_number": "P51OD010425" }, { "agency": "NIH", "grant_number": "UL1TR000423" }, { "agency": "National Natural Science Foundation of China", "grant_number": "61871411" }, { "agency": "Anhui Province", "grant_number": "GXXT-2019-008" }, { "agency": "Howard Hughes Medical Institute (HHMI)" }, { "agency": "Klarman Cell Observatory" }, { "agency": "National Natural Science Foundation of China", "grant_number": "32071367" }, { "agency": "National Natural Science Foundation of China", "grant_number": "20ZR1420100" }, { "agency": "NIH", "grant_number": "R01EY023173" }, { "agency": "NIH", "grant_number": "U01MH105982" }, { "agency": "Allen Institute for Brain Science" }, { "agency": "Salk Institute" } ] }, "local_group": { "items": [ { "id": "Division-of-Biology-and-Biological-Engineering" } ] }, "corp_creators": { "items": [ "BRAIN Initiative Cell Census Network (BICCN)" ] }, "doi": "10.1038/s41586-021-03950-0", "primary_object": { "basename": "41586_2021_3950_MOESM1_ESM.pdf", "url": "https://authors.library.caltech.edu/records/6zspp-xxk39/files/41586_2021_3950_MOESM1_ESM.pdf" }, "related_objects": [ { 
"basename": "41586_2021_3950_Fig11_ESM.webp", "url": "https://authors.library.caltech.edu/records/6zspp-xxk39/files/41586_2021_3950_Fig11_ESM.webp" }, { "basename": "41586_2021_3950_Fig14_ESM.webp", "url": "https://authors.library.caltech.edu/records/6zspp-xxk39/files/41586_2021_3950_Fig14_ESM.webp" }, { "basename": "41586_2021_3950_Fig15_ESM.webp", "url": "https://authors.library.caltech.edu/records/6zspp-xxk39/files/41586_2021_3950_Fig15_ESM.webp" }, { "basename": "41586_2021_3950_MOESM3_ESM.xlsx", "url": "https://authors.library.caltech.edu/records/6zspp-xxk39/files/41586_2021_3950_MOESM3_ESM.xlsx" }, { "basename": "41586_2021_3950_MOESM4_ESM.pdf", "url": "https://authors.library.caltech.edu/records/6zspp-xxk39/files/41586_2021_3950_MOESM4_ESM.pdf" }, { "basename": "41586_2021_3950_Tab2_ESM.jpg", "url": "https://authors.library.caltech.edu/records/6zspp-xxk39/files/41586_2021_3950_Tab2_ESM.jpg" }, { "basename": "41586_2021_3950_Fig10_ESM.webp", "url": "https://authors.library.caltech.edu/records/6zspp-xxk39/files/41586_2021_3950_Fig10_ESM.webp" }, { "basename": "41586_2021_3950_Fig13_ESM.webp", "url": "https://authors.library.caltech.edu/records/6zspp-xxk39/files/41586_2021_3950_Fig13_ESM.webp" }, { "basename": "41586_2021_3950_Tab1_ESM.jpg", "url": "https://authors.library.caltech.edu/records/6zspp-xxk39/files/41586_2021_3950_Tab1_ESM.jpg" }, { "basename": "41586_2021_3950_Fig12_ESM.webp", "url": "https://authors.library.caltech.edu/records/6zspp-xxk39/files/41586_2021_3950_Fig12_ESM.webp" }, { "basename": "41586_2021_3950_Fig16_ESM.webp", "url": "https://authors.library.caltech.edu/records/6zspp-xxk39/files/41586_2021_3950_Fig16_ESM.webp" }, { "basename": "s41586-021-03950-0.pdf", "url": "https://authors.library.caltech.edu/records/6zspp-xxk39/files/s41586-021-03950-0.pdf" }, { "basename": "2020.10.19.343129v1.full.pdf", "url": "https://authors.library.caltech.edu/records/6zspp-xxk39/files/2020.10.19.343129v1.full.pdf" }, { "basename": 
"41586_2021_3950_MOESM2_ESM.xlsx", "url": "https://authors.library.caltech.edu/records/6zspp-xxk39/files/41586_2021_3950_MOESM2_ESM.xlsx" } ], "resource_type": "article", "pub_year": "2021", "author_list": "Adkins, Ricky S.; Aldridge, Andrew I.; et el." }, { "id": "https://authors.library.caltech.edu/records/13dbk-bh430", "eprint_id": 101686, "eprint_status": "archive", "datestamp": "2023-08-22 11:31:03", "lastmod": "2023-12-22 23:16:58", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Yao-Zizhen", "name": { "family": "Yao", "given": "Zizhen" }, "orcid": "0000-0002-9361-5607" }, { "name": { "family": "Liu", "given": "Hanqing" }, "orcid": "0000-0002-5114-6048" }, { "name": { "family": "Xie", "given": "Fangming" }, "orcid": "0000-0001-5232-1648" }, { "name": { "family": "Fischer", "given": "Stephan" }, "orcid": "0000-0002-7034-4103" }, { "name": { "family": "Adkins", "given": "Ricky S." }, "orcid": "0000-0002-7983-5486" }, { "id": "Aldrige-Andrew-I", "name": { "family": "Aldrige", "given": "Andrew I." }, "orcid": "0000-0003-1962-8802" }, { "name": { "family": "Ament", "given": "Seth A." }, "orcid": "0000-0001-6443-7509" }, { "name": { "family": "Bartlett", "given": "Anna" }, "orcid": "0000-0001-7059-4033" }, { "name": { "family": "Behrens", "given": "M. Margarita" }, "orcid": "0000-0002-7168-8186" }, { "name": { "family": "Van den Berge", "given": "Koen" }, "orcid": "0000-0002-1833-8478" }, { "name": { "family": "Bertagnolli", "given": "Darren" }, "orcid": "0000-0002-6626-1567" }, { "name": { "family": "de Bezieux", "given": "Hector Roux" }, "orcid": "0000-0002-1489-8339" }, { "name": { "family": "Biancalani", "given": "Tommaso" }, "orcid": "0000-0001-9104-9755" }, { "id": "Booeshaghi-A-Sina", "name": { "family": "Booeshaghi", "given": "A. 
Sina" }, "orcid": "0000-0002-6442-4502" }, { "name": { "family": "Corrada Bravo", "given": "Hector" }, "orcid": "0000-0002-1255-4444" }, { "name": { "family": "Casper", "given": "Tamara" }, "orcid": "0000-0003-1638-3651" }, { "name": { "family": "Colantuoni", "given": "Carlo" }, "orcid": "0000-0001-6818-6380" }, { "name": { "family": "Crabtree", "given": "Jonathan" }, "orcid": "0000-0002-7286-5690" }, { "name": { "family": "Creasy", "given": "Heather" }, "orcid": "0000-0002-1369-6882" }, { "name": { "family": "Crichton", "given": "Kirsten" }, "orcid": "0000-0002-7869-1492" }, { "name": { "family": "Crow", "given": "Megan" }, "orcid": "0000-0002-1172-5897" }, { "name": { "family": "Dee", "given": "Nick" }, "orcid": "0000-0002-2831-9254" }, { "name": { "family": "Dougherty", "given": "Elizabeth L." }, "orcid": "0000-0001-8922-5078" }, { "name": { "family": "Doyle", "given": "Wayne I." }, "orcid": "0000-0001-8276-2591" }, { "name": { "family": "Dudoit", "given": "Sandrine" }, "orcid": "0000-0002-6069-8629" }, { "name": { "family": "Fang", "given": "Rongxin" }, "orcid": "0000-0003-0107-7504" }, { "name": { "family": "Felix", "given": "Victor" }, "orcid": "0000-0002-9773-0629" }, { "name": { "family": "Fong", "given": "Olivia" }, "orcid": "0000-0002-7091-9667" }, { "name": { "family": "Giglio", "given": "Michelle" }, "orcid": "0000-0001-7628-5565" }, { "name": { "family": "Goldy", "given": "Jeff" }, "orcid": "0000-0001-5140-6922" }, { "name": { "family": "Hawrylycz", "given": "Michael" }, "orcid": "0000-0002-5741-8024" }, { "name": { "family": "Herb", "given": "Brian R." 
}, "orcid": "0000-0002-5910-9647" }, { "name": { "family": "Hertzano", "given": "Ronna" }, "orcid": "0000-0002-8093-6567" }, { "name": { "family": "Hou", "given": "Xiaomeng" }, "orcid": "0000-0002-5453-9015" }, { "name": { "family": "Hu", "given": "Qiwen" }, "orcid": "0000-0003-2798-919X" }, { "name": { "family": "Kancherla", "given": "Jayaram" }, "orcid": "0000-0001-5855-5031" }, { "name": { "family": "Kroll", "given": "Matthew" }, "orcid": "0000-0002-0126-7618" }, { "name": { "family": "Lathia", "given": "Kanan" }, "orcid": "0000-0003-0080-1951" }, { "name": { "family": "Li", "given": "Yang Eric" }, "orcid": "0000-0001-6997-6018" }, { "name": { "family": "Lucero", "given": "Jacinta D." }, "orcid": "0000-0001-7578-6624" }, { "name": { "family": "Luo", "given": "Chongyuan" }, "orcid": "0000-0002-8541-0695" }, { "name": { "family": "Mahurkar", "given": "Anup" }, "orcid": "0000-0002-4999-2296" }, { "name": { "family": "McMillen", "given": "Delissa" }, "orcid": "0000-0002-3413-4424" }, { "name": { "family": "Nadaf", "given": "Naeem M." }, "orcid": "0000-0002-7805-8523" }, { "name": { "family": "Nery", "given": "Joseph R." }, "orcid": "0000-0003-0153-5659" }, { "name": { "family": "Nguyen", "given": "Thuc Nghi" }, "orcid": "0000-0002-6466-5883" }, { "name": { "family": "Niu", "given": "Sheng-Yong" }, "orcid": "0000-0002-7734-1191" }, { "name": { "family": "Ntranos", "given": "Vasilis" }, "orcid": "0000-0002-2477-0670" }, { "name": { "family": "Orvis", "given": "Joshua" }, "orcid": "0000-0002-5705-5710" }, { "name": { "family": "Osteen", "given": "Julia K." 
}, "orcid": "0000-0001-7058-3297" }, { "name": { "family": "Pham", "given": "Thanh" }, "orcid": "0000-0002-4738-5062" }, { "name": { "family": "Pinto-Duarte", "given": "Antonio" }, "orcid": "0000-0002-2215-7653" }, { "name": { "family": "Poirion", "given": "Olivier" }, "orcid": "0000-0002-0429-7003" }, { "name": { "family": "Preissl", "given": "Sebastian" }, "orcid": "0000-0001-8971-5616" }, { "name": { "family": "Purdom", "given": "Elizabeth" }, "orcid": "0000-0001-9455-7990" }, { "name": { "family": "Rimorin", "given": "Christine" }, "orcid": "0000-0003-1491-8552" }, { "name": { "family": "Risso", "given": "Davide" }, "orcid": "0000-0001-8508-5012" }, { "id": "Rivkin-Angeline-C", "name": { "family": "Rivkin", "given": "Angeline C." }, "orcid": "0000-0003-0399-9043" }, { "name": { "family": "Smith", "given": "Kimberly" }, "orcid": "0000-0002-3142-1970" }, { "name": { "family": "Street", "given": "Kelly" }, "orcid": "0000-0001-6379-5013" }, { "name": { "family": "Sulc", "given": "Josef" }, "orcid": "0000-0002-4928-7183" }, { "id": "Svensson-Valentine", "name": { "family": "Svensson", "given": "Valentine" }, "orcid": "0000-0002-9217-2330" }, { "name": { "family": "Tieu", "given": "Michael" }, "orcid": "0000-0001-9286-5623" }, { "name": { "family": "Torkelson", "given": "Amy" }, "orcid": "0000-0002-9465-4202" }, { "name": { "family": "Tung", "given": "Herman" }, "orcid": "0000-0002-0812-3318" }, { "name": { "family": "Vaishnav", "given": "Eeshit Dhaval" }, "orcid": "0000-0003-3720-8051" }, { "name": { "family": "Vanderburg", "given": "Charles R." }, "orcid": "0000-0001-8979-5054" }, { "name": { "family": "van Velthoven", "given": "Cindy" }, "orcid": "0000-0001-5120-4546" }, { "name": { "family": "Wang", "given": "Xinxin" }, "orcid": "0000-0001-6393-2276" }, { "name": { "family": "White", "given": "Owen R." }, "orcid": "0000-0003-2407-7320" }, { "name": { "family": "Huang", "given": "Z. 
Josh" }, "orcid": "0000-0003-0592-028X" }, { "name": { "family": "Kharchenko", "given": "Peter V." }, "orcid": "0000-0002-6036-5875" }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" }, { "name": { "family": "Ngai", "given": "John" }, "orcid": "0000-0002-1191-8971" }, { "name": { "family": "Regev", "given": "Aviv" }, "orcid": "0000-0003-3293-3158" }, { "name": { "family": "Tasic", "given": "Bosiljka" }, "orcid": "0000-0002-6861-4506" }, { "name": { "family": "Welch", "given": "Joshua D." }, "orcid": "0000-0002-5869-2391" }, { "name": { "family": "Gillis", "given": "Jesse" }, "orcid": "0000-0002-0936-9774" }, { "name": { "family": "Macosko", "given": "Evan Z." }, "orcid": "0000-0002-2794-5165" }, { "name": { "family": "Ren", "given": "Bing" }, "orcid": "0000-0002-5435-1127" }, { "id": "Ecker-Joseph-R", "name": { "family": "Ecker", "given": "Joseph R." }, "orcid": "0000-0001-5799-5895" }, { "name": { "family": "Zeng", "given": "Hongkui" }, "orcid": "0000-0002-0326-5878" }, { "name": { "family": "Mukamel", "given": "Eran A." }, "orcid": "0000-0003-3203-9535" } ] }, "title": "A transcriptomic and epigenomic cell atlas of the mouse primary motor cortex", "ispublished": "pub", "full_text_status": "public", "keywords": "Cellular neuroscience; Epigenomics; Gene expression profiling; Molecular neuroscience; Motor cortex", "note": "\u00a9 2021 The Author(s). This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in a credit line to the material. 
If material is not included in the article's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. \n\nReceived 05 March 2020; Accepted 26 March 2021; Published 06 October 2021. \n\nWe are grateful to A. Bandrowski and Y. Yao for their insightful comments. This work was funded by the NIH BRAIN Initiative (U19MH114830 to H.Z., U19MH121282 to J.R.E., U19MH114821 to Z.J.H., R24MH114788 to O.R.W., U24MH114827 to M.H., R24MH114815 to R.H. and O.R.W., and NIH NIDCD DC013817 to R.H.), the Hearing Restoration project Hearing Health Foundation (R.H.) and the NIH NIGMS (GM114267 to H.C.B.). \n\nData availability: The BICCN MOp data (RRID: SCR_015820) can be accessed via the NeMO archive (RRID: SCR_016152) at: https://assets.nemoarchive.org/dat-ch1nqb7. Visualization and analysis resources can be found at: NeMO analytics (https://nemoanalytics.org/), Genome browser (https://brainome.ucsd.edu/BICCN_MOp) and Epiviz browser (https://epiviz.nemoanalytics.org/biccn_mop). 
\n\nCode availability: The codes used for data analysis: scrattch.hicat (hierarchical, iterative clustering for analysis of transcriptomics) for RNA clustering (https://github.com/AllenInstitute/scrattch.hicat); SnapTools for ATAC-seq analysis (https://github.com/r3fang/SnapTools); YAP (Yet Another Pipeline) and ALLCools for DNA methylation (snmC-seq2) mapping and cluster-level aggregation (https://github.com/lhqing/cemba_data; documentation: cemba-data.rtfd.io; https://lhqing.github.io/ALLCools); MetaNeighbor for cluster reproducibility analysis (https://github.com/gillislab/MetaNeighbor-BICCN); LIGER for multimodal integration, embedding and clustering (https://github.com/welch-lab/liger); SingleCellFusion for multimodal integration, embedding and clustering (https://github.com/mukamel-lab/SingleCellFusion); Conos for cluster reproducibility analysis (https://github.com/kharchenkolab/conos); STAR v2.5.3 for RNA-seq alignment49; and Bismark for DNA methylation (snmC-seq2) alignment55. \n\nThese authors contributed equally: Zizhen Yao, Hanqing Liu, Fangming Xie, Stephan Fischer. \n\nAuthor Contributions: A.R., A.T., B.T., C.R., C.R.V., D.B., D.M., E.L.D., E.Z.M., H.T., H.Z., J. Goldy, J.S., K.C., K.L., K. Smith, M.K., M.T., N.D., N.M.N., O.F., T.C., T.N.N. and T.P. contributed to RNA data generation. A.B., A.C.R., A.I.A., A.P.-D., C.L., H.L., J.D.L., J.K.O., J.R.E., J.R.N., M.M.B., S.-Y.N. and Y.E.L. contributed to DNA methylation (snmC-seq2) data generation. A.P.-D., B.R., J.D.L., J.K.O., M.M.B., S.P., X.H., X.W. and Y.E.L. contributed to snATAC data generation. A.M., B.R.H., C.C., C.v.V., E.A.M., F.X., H.C., H.C.B., J.C., J. Goldy, J.K., J.O., M.G., M.H., O.R.W., R.F., R.H., R.S.A., S.A.A., S.-Y.N., V.F., W.I.D. and Z.Y. contributed to data archive/infrastructure. A.R., A.S.B., B.T., D.R., E.A.M., E.D.V., E.P., E.Z.M., F.X., H.L., H.R.d.B., H.Z., J.D.W., J. Goldy, J. Gillis, J.O., K. Smith, K. 
Street, K.V.d.B., L.P., M.C., O.F., O.P., P.V.K., Q.H., R.F., S.D., S.F., S.-Y.N., T.B., V.N., V.S., W.I.D., Y.E.L. and Z.Y. contributed to data analysis. A.R., B.R., B.T., C.L., E.A.M., E.D.V., E.Z.M., F.X., H.L., H.Z., J.D.W., J. Gillis, J.N., M.C., M.M.B., P.V.K., Q.H., R.F., S.F., T.B., Y.E.L. and Z.Y. contributed to data interpretation. A.S.B., E.A.M., F.X., H.L., H.Z., J.D.W., J. Gillis, L.P., M.C., Q.H., S.F., Z.J.H. and Z.Y. contributed to writing the manuscript. \n\nCompeting interests: B.R. is a shareholder of Arima Genomics, Inc. P.V.K. serves on the Scientific Advisory Board to Celsius Therapeutics, Inc. A.R. is an equity holder and founder of Celsius Therapeutics, an equity holder in Immunitas, and a Scientific Advisory Board member to Syros Pharmaceuticals, Neogene Therapeutics, Asimov and Thermo Fisher Scientific. \n\nPeer review information: Nature thanks Andrew Adey, Aaron D. Gitler and John Krakauer for their contribution to the peer review of this work. Peer reviewer reports are available.\n\nPublished - s41586-021-03500-8.pdf
Submitted - 2020.02.29.970558v1.full.pdf
Supplemental Material - 41586_2021_3500_Fig10_ESM.webp
Supplemental Material - 41586_2021_3500_Fig11_ESM.webp
Supplemental Material - 41586_2021_3500_Fig12_ESM.webp
Supplemental Material - 41586_2021_3500_Fig5_ESM.webp
Supplemental Material - 41586_2021_3500_Fig6_ESM.webp
Supplemental Material - 41586_2021_3500_Fig7_ESM.webp
Supplemental Material - 41586_2021_3500_Fig8_ESM.webp
Supplemental Material - 41586_2021_3500_Fig9_ESM.webp
Supplemental Material - 41586_2021_3500_MOESM1_ESM.pdf
Supplemental Material - 41586_2021_3500_MOESM2_ESM.pdf
Supplemental Material - 41586_2021_3500_MOESM3_ESM.pdf
Supplemental Material - 41586_2021_3500_MOESM4_ESM.zip
", "abstract": "Single-cell transcriptomics can provide quantitative molecular signatures for large, unbiased samples of the diverse cell types in the brain. With the proliferation of multi-omics datasets, a major challenge is to validate and integrate results into a biological understanding of cell-type organization. Here we generated transcriptomes and epigenomes from more than 500,000 individual cells in the mouse primary motor cortex, a structure that has an evolutionarily conserved role in locomotion. We developed computational and statistical methods to integrate multimodal data and quantitatively validate cell-type reproducibility. The resulting reference atlas\u2014containing over 56 neuronal cell types that are highly replicable across analysis methods, sequencing technologies and modalities\u2014is a comprehensive molecular and genomic account of the diverse neuronal and non-neuronal cell types in the mouse primary motor cortex. The atlas includes a population of excitatory neurons that resemble pyramidal cells in layer 4 in other cortical regions. We further discovered thousands of concordant marker genes and gene regulatory elements for these cell types. 
Our results highlight the complex molecular regulation of cell types in the brain and will directly enable the design of reagents to target specific cell types in the mouse primary motor cortex for functional analysis.", "date": "2021-10-07", "date_type": "published", "publication": "Nature", "volume": "598", "number": "7879", "publisher": "Nature Publishing Group", "pagerange": "103-110", "id_number": "CaltechAUTHORS:20200303-153620082", "issn": "0028-0836", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20200303-153620082", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NIH", "grant_number": "U19MH114830" }, { "agency": "NIH", "grant_number": "U19MH121282" }, { "agency": "NIH", "grant_number": "U19MH114821" }, { "agency": "NIH", "grant_number": "R24MH114788" }, { "agency": "NIH", "grant_number": "U24MH114827" }, { "agency": "NIH", "grant_number": "R24MH114815" }, { "agency": "NIH", "grant_number": "DC013817" }, { "agency": "Hearing Health Foundation" }, { "agency": "NIH", "grant_number": "GM114267" } ] }, "local_group": { "items": [ { "id": "Division-of-Biology-and-Biological-Engineering" } ] }, "corp_creators": { "items": [ "BRAIN Initiative Cell Census Network (BICCN)" ] }, "doi": "10.1038/s41586-021-03500-8", "pmcid": "PMC8494649", "primary_object": { "basename": "2020.02.29.970558v1.full.pdf", "url": "https://authors.library.caltech.edu/records/13dbk-bh430/files/2020.02.29.970558v1.full.pdf" }, "related_objects": [ { "basename": "41586_2021_3500_Fig9_ESM.webp", "url": "https://authors.library.caltech.edu/records/13dbk-bh430/files/41586_2021_3500_Fig9_ESM.webp" }, { "basename": "41586_2021_3500_MOESM2_ESM.pdf", "url": "https://authors.library.caltech.edu/records/13dbk-bh430/files/41586_2021_3500_MOESM2_ESM.pdf" }, { "basename": "41586_2021_3500_Fig7_ESM.webp", "url": 
"https://authors.library.caltech.edu/records/13dbk-bh430/files/41586_2021_3500_Fig7_ESM.webp" }, { "basename": "s41586-021-03500-8.pdf", "url": "https://authors.library.caltech.edu/records/13dbk-bh430/files/s41586-021-03500-8.pdf" }, { "basename": "41586_2021_3500_Fig10_ESM.webp", "url": "https://authors.library.caltech.edu/records/13dbk-bh430/files/41586_2021_3500_Fig10_ESM.webp" }, { "basename": "41586_2021_3500_Fig11_ESM.webp", "url": "https://authors.library.caltech.edu/records/13dbk-bh430/files/41586_2021_3500_Fig11_ESM.webp" }, { "basename": "41586_2021_3500_Fig8_ESM.webp", "url": "https://authors.library.caltech.edu/records/13dbk-bh430/files/41586_2021_3500_Fig8_ESM.webp" }, { "basename": "41586_2021_3500_MOESM4_ESM.zip", "url": "https://authors.library.caltech.edu/records/13dbk-bh430/files/41586_2021_3500_MOESM4_ESM.zip" }, { "basename": "41586_2021_3500_MOESM3_ESM.pdf", "url": "https://authors.library.caltech.edu/records/13dbk-bh430/files/41586_2021_3500_MOESM3_ESM.pdf" }, { "basename": "41586_2021_3500_Fig12_ESM.webp", "url": "https://authors.library.caltech.edu/records/13dbk-bh430/files/41586_2021_3500_Fig12_ESM.webp" }, { "basename": "41586_2021_3500_Fig5_ESM.webp", "url": "https://authors.library.caltech.edu/records/13dbk-bh430/files/41586_2021_3500_Fig5_ESM.webp" }, { "basename": "41586_2021_3500_Fig6_ESM.webp", "url": "https://authors.library.caltech.edu/records/13dbk-bh430/files/41586_2021_3500_Fig6_ESM.webp" }, { "basename": "41586_2021_3500_MOESM1_ESM.pdf", "url": "https://authors.library.caltech.edu/records/13dbk-bh430/files/41586_2021_3500_MOESM1_ESM.pdf" } ], "resource_type": "article", "pub_year": "2021", "author_list": "Yao, Zizhen; Liu, Hanqing; et el." 
}, { "id": "https://authors.library.caltech.edu/records/mb2qc-b8r09", "eprint_id": 101743, "eprint_status": "archive", "datestamp": "2023-08-20 05:31:22", "lastmod": "2023-12-22 23:38:51", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Booeshaghi-A-Sina", "name": { "family": "Booeshaghi", "given": "A. Sina" }, "orcid": "0000-0002-6442-4502" }, { "id": "Yao-Zizhen", "name": { "family": "Yao", "given": "Zizhen" }, "orcid": "0000-0002-9361-5607" }, { "id": "van-Velthoven-Cindy", "name": { "family": "van Velthoven", "given": "Cindy" }, "orcid": "0000-0001-5120-4546" }, { "id": "Smith-Kimberly", "name": { "family": "Smith", "given": "Kimberly" }, "orcid": "0000-0002-3142-1970" }, { "id": "Tasic-Bosiljka", "name": { "family": "Tasic", "given": "Bosiljka" }, "orcid": "0000-0002-6861-4506" }, { "id": "Zeng-Hongkui", "name": { "family": "Zeng", "given": "Hongkui" }, "orcid": "0000-0002-0326-5878" }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "Isoform cell-type specificity in the mouse primary motor cortex", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 The Author(s) 2021. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. 
To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. \n\nReceived 09 March 2020; Accepted 27 August 2021; Published 06 October 2021. \n\nWe thank members of the BICCN consortium, especially the Mini-MOp analysis group, for helpful conversations related to transcriptome analysis of the MOp. We thank N. Volovich, V. Ntranos and P. Melsted for help with a preliminary quantification of the SMART-seq data. Figure 1 was created from scratch using the tools available on Biorender.com. Extended Data Fig. 6a was obtained from http://atlas.brain-map.org/atlas. This work was funded by the NIH Brain Initiative via grant U19MH114930 to H.Z. and L.P. \n\nData availability: The single-cell RNA-seq data used in this study were generated as part of the BICCN consortium22. The 10xv3 and SMART-seq data can be downloaded from http://data.nemoarchive.org/biccn/lab/zeng/transcriptome/scell/. The MERFISH data are available at https://caltech.box.com/shared/static/dzqt6ryytmjbgyai356s1z0phtnsbaol.gz. All cell annotations and cluster labels are available at https://github.com/pachterlab/BYVSTZP_2020/tree/master/reference. \n\nCode availability: The software used to generate the results and figures of the paper is available at https://github.com/pachterlab/BYVSTZP_2020. \n\nAuthor Contributions: A.S.B. and L.P. conceived the study. A.S.B. implemented the methods and produced the results and figures. A.S.B. and L.P. analysed the data and wrote the manuscript. Z.Y., C.v.V., K.S., B.T. and H.Z. produced the SMART-seq and 10xv3 data. \n\nThe authors declare no competing interests. \n\nPeer review information: Nature thanks Chris Burge and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.\n\nPublished - s41586-021-03969-3.pdf
Submitted - 2020.03.05.977991v3.full.pdf
Supplemental Material - 41586_2021_3969_Fig10_ESM.webp
Supplemental Material - 41586_2021_3969_Fig11_ESM.webp
Supplemental Material - 41586_2021_3969_Fig12_ESM.webp
Supplemental Material - 41586_2021_3969_Fig13_ESM.webp
Supplemental Material - 41586_2021_3969_Fig14_ESM.webp
Supplemental Material - 41586_2021_3969_Fig5_ESM.webp
Supplemental Material - 41586_2021_3969_Fig6_ESM.webp
Supplemental Material - 41586_2021_3969_Fig7_ESM.webp
Supplemental Material - 41586_2021_3969_Fig8_ESM.webp
Supplemental Material - 41586_2021_3969_Fig9_ESM.webp
Supplemental Material - 41586_2021_3969_MOESM10_ESM.xlsx
Supplemental Material - 41586_2021_3969_MOESM11_ESM.xlsx
Supplemental Material - 41586_2021_3969_MOESM12_ESM.xlsx
Supplemental Material - 41586_2021_3969_MOESM13_ESM.xlsx
Supplemental Material - 41586_2021_3969_MOESM14_ESM.xlsx
Supplemental Material - 41586_2021_3969_MOESM15_ESM.xlsx
Supplemental Material - 41586_2021_3969_MOESM1_ESM.pdf
Supplemental Material - 41586_2021_3969_MOESM2_ESM.pdf
Supplemental Material - 41586_2021_3969_MOESM3_ESM.pdf
Supplemental Material - 41586_2021_3969_MOESM4_ESM.xlsx
Supplemental Material - 41586_2021_3969_MOESM5_ESM.xlsx
Supplemental Material - 41586_2021_3969_MOESM6_ESM.xlsx
Supplemental Material - 41586_2021_3969_MOESM7_ESM.xlsx
Supplemental Material - 41586_2021_3969_MOESM8_ESM.xlsx
Supplemental Material - 41586_2021_3969_MOESM9_ESM.xlsx
", "abstract": "Full-length SMART-seq single-cell RNA sequencing can be used to measure gene expression at isoform resolution, making possible the identification of specific isoform markers for different cell types. Used in conjunction with spatial RNA capture and gene-tagging methods, this enables the inference of spatially resolved isoform expression for different cell types. Here, in a comprehensive analysis of 6,160 mouse primary motor cortex cells assayed with SMART-seq, 280,327 cells assayed with MERFISH and 94,162 cells assayed with 10x Genomics sequencing3, we find examples of isoform specificity in cell types\u2014including isoform shifts between cell types that are masked in gene-level analysis\u2014as well as examples of transcriptional regulation. Additionally, we show that isoform specificity helps to refine cell types, and that a multi-platform analysis of single-cell transcriptomic data leveraging multiple measurements provides a comprehensive atlas of transcription in the mouse primary motor cortex that improves on the possibilities offered by any single technology.", "date": "2021-10-07", "date_type": "published", "publication": "Nature", "volume": "598", "number": "7879", "publisher": "Nature Publishing Group", "pagerange": "195-199", "id_number": "CaltechAUTHORS:20200306-130944112", "issn": "0028-0836", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20200306-130944112", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NIH", "grant_number": "U19MH114930" } ] }, "local_group": { "items": [ { "id": "Tianqiao-and-Chrissy-Chen-Institute-for-Neuroscience" }, { "id": "Division-of-Biology-and-Biological-Engineering" } ] }, "doi": "10.1038/s41586-021-03969-3", "pmcid": "PMC8494650", "primary_object": { "basename": "41586_2021_3969_MOESM9_ESM.xlsx", "url": 
"https://authors.library.caltech.edu/records/mb2qc-b8r09/files/41586_2021_3969_MOESM9_ESM.xlsx" }, "related_objects": [ { "basename": "s41586-021-03969-3.pdf", "url": "https://authors.library.caltech.edu/records/mb2qc-b8r09/files/s41586-021-03969-3.pdf" }, { "basename": "41586_2021_3969_MOESM4_ESM.xlsx", "url": "https://authors.library.caltech.edu/records/mb2qc-b8r09/files/41586_2021_3969_MOESM4_ESM.xlsx" }, { "basename": "41586_2021_3969_Fig7_ESM.webp", "url": "https://authors.library.caltech.edu/records/mb2qc-b8r09/files/41586_2021_3969_Fig7_ESM.webp" }, { "basename": "41586_2021_3969_MOESM2_ESM.pdf", "url": "https://authors.library.caltech.edu/records/mb2qc-b8r09/files/41586_2021_3969_MOESM2_ESM.pdf" }, { "basename": "41586_2021_3969_Fig13_ESM.webp", "url": "https://authors.library.caltech.edu/records/mb2qc-b8r09/files/41586_2021_3969_Fig13_ESM.webp" }, { "basename": "41586_2021_3969_MOESM13_ESM.xlsx", "url": "https://authors.library.caltech.edu/records/mb2qc-b8r09/files/41586_2021_3969_MOESM13_ESM.xlsx" }, { "basename": "41586_2021_3969_MOESM6_ESM.xlsx", "url": "https://authors.library.caltech.edu/records/mb2qc-b8r09/files/41586_2021_3969_MOESM6_ESM.xlsx" }, { "basename": "41586_2021_3969_MOESM12_ESM.xlsx", "url": "https://authors.library.caltech.edu/records/mb2qc-b8r09/files/41586_2021_3969_MOESM12_ESM.xlsx" }, { "basename": "41586_2021_3969_MOESM14_ESM.xlsx", "url": "https://authors.library.caltech.edu/records/mb2qc-b8r09/files/41586_2021_3969_MOESM14_ESM.xlsx" }, { "basename": "41586_2021_3969_MOESM5_ESM.xlsx", "url": "https://authors.library.caltech.edu/records/mb2qc-b8r09/files/41586_2021_3969_MOESM5_ESM.xlsx" }, { "basename": "41586_2021_3969_MOESM8_ESM.xlsx", "url": "https://authors.library.caltech.edu/records/mb2qc-b8r09/files/41586_2021_3969_MOESM8_ESM.xlsx" }, { "basename": "41586_2021_3969_Fig14_ESM.webp", "url": "https://authors.library.caltech.edu/records/mb2qc-b8r09/files/41586_2021_3969_Fig14_ESM.webp" }, { "basename": 
"41586_2021_3969_Fig9_ESM.webp", "url": "https://authors.library.caltech.edu/records/mb2qc-b8r09/files/41586_2021_3969_Fig9_ESM.webp" }, { "basename": "41586_2021_3969_MOESM3_ESM.pdf", "url": "https://authors.library.caltech.edu/records/mb2qc-b8r09/files/41586_2021_3969_MOESM3_ESM.pdf" }, { "basename": "41586_2021_3969_Fig10_ESM.webp", "url": "https://authors.library.caltech.edu/records/mb2qc-b8r09/files/41586_2021_3969_Fig10_ESM.webp" }, { "basename": "41586_2021_3969_MOESM15_ESM.xlsx", "url": "https://authors.library.caltech.edu/records/mb2qc-b8r09/files/41586_2021_3969_MOESM15_ESM.xlsx" }, { "basename": "41586_2021_3969_Fig5_ESM.webp", "url": "https://authors.library.caltech.edu/records/mb2qc-b8r09/files/41586_2021_3969_Fig5_ESM.webp" }, { "basename": "41586_2021_3969_Fig6_ESM.webp", "url": "https://authors.library.caltech.edu/records/mb2qc-b8r09/files/41586_2021_3969_Fig6_ESM.webp" }, { "basename": "41586_2021_3969_MOESM11_ESM.xlsx", "url": "https://authors.library.caltech.edu/records/mb2qc-b8r09/files/41586_2021_3969_MOESM11_ESM.xlsx" }, { "basename": "41586_2021_3969_Fig11_ESM.webp", "url": "https://authors.library.caltech.edu/records/mb2qc-b8r09/files/41586_2021_3969_Fig11_ESM.webp" }, { "basename": "41586_2021_3969_Fig12_ESM.webp", "url": "https://authors.library.caltech.edu/records/mb2qc-b8r09/files/41586_2021_3969_Fig12_ESM.webp" }, { "basename": "41586_2021_3969_MOESM10_ESM.xlsx", "url": "https://authors.library.caltech.edu/records/mb2qc-b8r09/files/41586_2021_3969_MOESM10_ESM.xlsx" }, { "basename": "41586_2021_3969_MOESM1_ESM.pdf", "url": "https://authors.library.caltech.edu/records/mb2qc-b8r09/files/41586_2021_3969_MOESM1_ESM.pdf" }, { "basename": "41586_2021_3969_MOESM7_ESM.xlsx", "url": "https://authors.library.caltech.edu/records/mb2qc-b8r09/files/41586_2021_3969_MOESM7_ESM.xlsx" }, { "basename": "2020.03.05.977991v3.full.pdf", "url": "https://authors.library.caltech.edu/records/mb2qc-b8r09/files/2020.03.05.977991v3.full.pdf" }, { "basename": 
"41586_2021_3969_Fig8_ESM.webp", "url": "https://authors.library.caltech.edu/records/mb2qc-b8r09/files/41586_2021_3969_Fig8_ESM.webp" } ], "resource_type": "article", "pub_year": "2021", "author_list": "Booeshaghi, A. Sina; Yao, Zizhen; et el." }, { "id": "https://authors.library.caltech.edu/records/06pjn-hnt60", "eprint_id": 107798, "eprint_status": "archive", "datestamp": "2023-08-22 11:19:11", "lastmod": "2023-12-22 23:16:16", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Booeshaghi-A-Sina", "name": { "family": "Booeshaghi", "given": "A. Sina" }, "orcid": "0000-0002-6442-4502" }, { "id": "Kil-Yeokyoung", "name": { "family": "Kil", "given": "Yeokyoung" }, "orcid": "0000-0002-1235-7379" }, { "id": "Min-Kyung-Hoi", "name": { "family": "Min", "given": "Kyung Hoi" }, "orcid": "0000-0003-0894-4017" }, { "id": "Gehring-Jase", "name": { "family": "Gehring", "given": "Jase" }, "orcid": "0000-0002-3894-9495" }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "Low-cost, scalable, and automated fluid sampling for fluidics applications", "ispublished": "pub", "full_text_status": "public", "keywords": "Fraction collector; Fluidics; 3D-printing", "note": "\u00a9 2021 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/). \n\nReceived 24 February 2021, Revised 3 May 2021, Accepted 8 May 2021, Available online 31 May 2021, Version of Record 5 June 2021. \n\nWe thank Justin Bois for naming the colosseum instrument. We also thank the Caltech Library Techlab for helping us 3D print parts. We thank Taleen Dilanyan for wet lab training and support, and Eduardo da Veiga Beltrame for assistance with 3D printing. The authors received no specific funding for this work. \n\nAuthor contributions: ASB, YK, and LP designed the fraction collector. JG helped set instrument specifications. 
YK assembled and built the fraction collector and performed the experiments. ASB, YK, and KHM designed the GUI. KHM coded the installable GUI and YK, KHM, and ASB coded the web-browser GUI. ASB coded the browser-serial package. ASB, YK, and KHM wrote the documentation. ASB and YK analyzed the data and made figures. ASB, YK, and LP wrote the manuscript. \n\nData & software availability: All data and software to reproduce the results in this manuscript can be found in Zenodo: https://doi.org/10.5281/zenodo.4677604. The project can be found in the GitHub repository: https://github.com/pachterlab/colosseum. \n\nThe authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.\n\nPublished - 1-s2.0-S2468067221000304-main.pdf
Submitted - 2021.01.27.428538v2.full.pdf
Supplemental Material - 1-s2.0-S2468067221000304-mmc1.docx
Supplemental Material - 1-s2.0-S2468067221000304-mmc2.docx
Supplemental Material - 1-s2.0-S2468067221000304-mmc3.zip
Supplemental Material - 1-s2.0-S2468067221000304-mmc4.zip
Supplemental Material - 1-s2.0-S2468067221000304-mmc5.zip
Supplemental Material - 1-s2.0-S2468067221000304-mmc6.zip
", "abstract": "We present colosseum, a low-cost, modular, and automated fluid sampling device for scalable fluidic applications. The colosseum fraction collector uses a single motor, can be built for less than $100 using off-the-shelf and 3D-printed components, and can be assembled in less than an hour. Build Instructions and source files are available at https://doi.org/10.5281/zenodo.4677604.", "date": "2021-10", "date_type": "published", "publication": "HardwareX", "volume": "10", "publisher": "Elsevier", "pagerange": "Art. No. e00201", "id_number": "CaltechAUTHORS:20210129-070700034", "issn": "2468-0672", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20210129-070700034", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "local_group": { "items": [ { "id": "Division-of-Biology-and-Biological-Engineering" } ] }, "doi": "10.1016/j.ohx.2021.e00201", "primary_object": { "basename": "1-s2.0-S2468067221000304-mmc1.docx", "url": "https://authors.library.caltech.edu/records/06pjn-hnt60/files/1-s2.0-S2468067221000304-mmc1.docx" }, "related_objects": [ { "basename": "1-s2.0-S2468067221000304-mmc2.docx", "url": "https://authors.library.caltech.edu/records/06pjn-hnt60/files/1-s2.0-S2468067221000304-mmc2.docx" }, { "basename": "1-s2.0-S2468067221000304-mmc3.zip", "url": "https://authors.library.caltech.edu/records/06pjn-hnt60/files/1-s2.0-S2468067221000304-mmc3.zip" }, { "basename": "1-s2.0-S2468067221000304-mmc4.zip", "url": "https://authors.library.caltech.edu/records/06pjn-hnt60/files/1-s2.0-S2468067221000304-mmc4.zip" }, { "basename": "1-s2.0-S2468067221000304-mmc5.zip", "url": "https://authors.library.caltech.edu/records/06pjn-hnt60/files/1-s2.0-S2468067221000304-mmc5.zip" }, { "basename": "1-s2.0-S2468067221000304-mmc6.zip", "url": "https://authors.library.caltech.edu/records/06pjn-hnt60/files/1-s2.0-S2468067221000304-mmc6.zip" }, { "basename": "2021.01.27.428538v2.full.pdf", "url": 
"https://authors.library.caltech.edu/records/06pjn-hnt60/files/2021.01.27.428538v2.full.pdf" }, { "basename": "1-s2.0-S2468067221000304-main.pdf", "url": "https://authors.library.caltech.edu/records/06pjn-hnt60/files/1-s2.0-S2468067221000304-main.pdf" } ], "resource_type": "article", "pub_year": "2021", "author_list": "Booeshaghi, A. Sina; Kil, Yeokyoung; et el." }, { "id": "https://authors.library.caltech.edu/records/jp119-47j39", "eprint_id": 103346, "eprint_status": "archive", "datestamp": "2023-08-20 04:35:37", "lastmod": "2023-12-22 23:31:56", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Booeshaghi-A-Sina", "name": { "family": "Booeshaghi", "given": "A. Sina" }, "orcid": "0000-0002-6442-4502" }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "Normalization of single-cell RNA-seq counts by log(x+1)* or log(1+x)*", "ispublished": "pub", "full_text_status": "public", "keywords": "single-cell, log1p, normalization, ACE2", "note": "\u00a9 The Author(s) 2021. Published by Oxford University Press.\nThis is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. \n\nReceived: 14 October 2020; Revision received: 23 December 2020; Editorial decision: 29 January 2021; Accepted: 01 March 2021; Published: 02 March 2021. \n\nWe thank Charles Herring, Michael Hoffman, Johan Gustafsson,\nHarold Pimentel, Jeffrey Spence, and Valentine Svensson for\nhelpful comments. \n\nA.S.B. and L.P. were partially funded by NIH U19MH114830. \n\nData Availability Statement. Data and code that reproduce the results in this paper are available here: https://github.com/pachterlab/BP_2021_2. \n\nConflict of Interest: none declared.\n\nPublished - btab085.pdf
Submitted - 2020.05.19.100214v3.full.pdf
", "abstract": "Single-cell RNA-seq technologies have been successfully employed over the past decade to generate many high resolution cell atlases. These have proved invaluable in recent efforts aimed at understanding the cell type specificity of host genes involved in SARS-CoV-2 infections. While single-cell atlases are based on well-sampled highly-expressed genes, many of the genes of interest for understanding SARS-CoV-2 can be expressed at very low levels. Common assumptions underlying standard single-cell analyses don't hold when examining low-expressed genes, with the result that standard workflows can produce misleading results.", "date": "2021-08-01", "date_type": "published", "publication": "Bioinformatics", "volume": "37", "number": "15", "publisher": "Oxford University Press", "pagerange": "2223-2224", "id_number": "CaltechAUTHORS:20200520-084505912", "issn": "1367-4803", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20200520-084505912", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NIH", "grant_number": "U19MH114830" } ] }, "local_group": { "items": [ { "id": "COVID-19" }, { "id": "Division-of-Biology-and-Biological-Engineering" } ] }, "doi": "10.1093/bioinformatics/btab085", "pmcid": "PMC7989636", "primary_object": { "basename": "btab085.pdf", "url": "https://authors.library.caltech.edu/records/jp119-47j39/files/btab085.pdf" }, "related_objects": [ { "basename": "2020.05.19.100214v3.full.pdf", "url": "https://authors.library.caltech.edu/records/jp119-47j39/files/2020.05.19.100214v3.full.pdf" } ], "resource_type": "article", "pub_year": "2021", "author_list": "Booeshaghi, A. 
Sina and Pachter, Lior" }, { "id": "https://authors.library.caltech.edu/records/05kjw-6t056", "eprint_id": 106738, "eprint_status": "archive", "datestamp": "2024-01-29 19:48:02", "lastmod": "2024-01-29 19:48:02", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Bloom-Joshua-S", "name": { "family": "Bloom", "given": "Joshua S." }, "orcid": "0000-0002-7241-1648" }, { "id": "Sathe-Laila", "name": { "family": "Sathe", "given": "Laila" }, "orcid": "0000-0003-1016-3295" }, { "id": "Munugala-Chetan", "name": { "family": "Munugala", "given": "Chetan" } }, { "id": "Jones-Eric-M", "name": { "family": "Jones", "given": "Eric M." } }, { "id": "Gasperini-Molly", "name": { "family": "Gasperini", "given": "Molly" }, "orcid": "0000-0003-4559-8432" }, { "id": "Lubock-Nathan-B", "name": { "family": "Lubock", "given": "Nathan B." }, "orcid": "0000-0001-8064-2465" }, { "id": "Yarza-Fauna", "name": { "family": "Yarza", "given": "Fauna" }, "orcid": "0000-0002-2512-6182" }, { "id": "Thompson-Erin-M", "name": { "family": "Thompson", "given": "Erin M." }, "orcid": "0000-0002-6085-3051" }, { "id": "Kovary-Kyle-M", "name": { "family": "Kovary", "given": "Kyle M." }, "orcid": "0000-0002-7616-2968" }, { "id": "Park-Jimin", "name": { "family": "Park", "given": "Jimin" } }, { "id": "Marquette-Dawn", "name": { "family": "Marquette", "given": "Dawn" }, "orcid": "0000-0002-3964-7683" }, { "id": "Kay-Stephania", "name": { "family": "Kay", "given": "Stephania" } }, { "id": "Lucas-Mark", "name": { "family": "Lucas", "given": "Mark" } }, { "id": "Love-TreQuan", "name": { "family": "Love", "given": "TreQuan" } }, { "id": "Booeshaghi-A-Sina", "name": { "family": "Booeshaghi", "given": "A. Sina" }, "orcid": "0000-0002-6442-4502" }, { "id": "Brandenberg-Oliver-F", "name": { "family": "Brandenberg", "given": "Oliver F." 
}, "orcid": "0000-0001-5662-1234" }, { "id": "Guo-Longhua", "name": { "family": "Guo", "given": "Longhua" }, "orcid": "0000-0001-9690-9750" }, { "id": "Boocock-James", "name": { "family": "Boocock", "given": "James" }, "orcid": "0000-0003-0323-8818" }, { "id": "Hochman-Myles", "name": { "family": "Hochman", "given": "Myles" }, "orcid": "0000-0001-5172-6395" }, { "id": "Simpkins-Scott-W", "name": { "family": "Simpkins", "given": "Scott W." }, "orcid": "0000-0002-5997-2838" }, { "id": "Lin-Isabella", "name": { "family": "Lin", "given": "Isabella" }, "orcid": "0000-0002-7102-6879" }, { "id": "LaPierre-Nathan", "name": { "family": "LaPierre", "given": "Nathan" }, "orcid": "0000-0003-2394-8868" }, { "id": "Hong-Duke", "name": { "family": "Hong", "given": "Duke" } }, { "id": "Zhang-Yi", "name": { "family": "Zhang", "given": "Yi" } }, { "id": "Oland-Gabriel", "name": { "family": "Oland", "given": "Gabriel" }, "orcid": "0000-0002-6941-3060" }, { "id": "Choe-Bianca-Judy", "name": { "family": "Choe", "given": "Bianca Judy" } }, { "id": "Chandrasekaran-Sukantha", "name": { "family": "Chandrasekaran", "given": "Sukantha" }, "orcid": "0000-0002-6232-5535" }, { "id": "Hilt-Evann-E", "name": { "family": "Hilt", "given": "Evann E." } }, { "id": "Butte-Manish-J", "name": { "family": "Butte", "given": "Manish J." }, "orcid": "0000-0002-4490-5595" }, { "id": "Damoiseaux-Robert", "name": { "family": "Damoiseaux", "given": "Robert" }, "orcid": "0000-0002-7611-7534" }, { "id": "Kravit-Clifford", "name": { "family": "Kravit", "given": "Clifford" }, "orcid": "0000-0002-0624-5514" }, { "id": "Cooper-Aaron-R", "name": { "family": "Cooper", "given": "Aaron R." }, "orcid": "0000-0003-4588-2513" }, { "id": "Yin-Yi", "name": { "family": "Yin", "given": "Yi" }, "orcid": "0000-0003-0963-2672" }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" }, { "id": "Garner-Omai-B", "name": { "family": "Garner", "given": "Omai B." 
}, "orcid": "0000-0002-7366-2692" }, { "id": "Flint-Jonathan", "name": { "family": "Flint", "given": "Jonathan" }, "orcid": "0000-0002-9427-4429" }, { "id": "Eskin-Eleazar", "name": { "family": "Eskin", "given": "Eleazar" }, "orcid": "0000-0003-1149-4758" }, { "id": "Luo-Chongyuan", "name": { "family": "Luo", "given": "Chongyuan" }, "orcid": "0000-0002-8541-0695" }, { "id": "Kosuri-Sriram", "name": { "family": "Kosuri", "given": "Sriram" }, "orcid": "0000-0002-4661-0600" }, { "id": "Kruglyak-Leonid", "name": { "family": "Kruglyak", "given": "Leonid" }, "orcid": "0000-0002-8065-3057" }, { "id": "Arboleda-Valerie-A", "name": { "family": "Arboleda", "given": "Valerie A." }, "orcid": "0000-0002-9687-9122" } ] }, "title": "Massively scaled-up testing for SARS-CoV-2 RNA via next-generation sequencing of pooled and barcoded nasal and saliva samples", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2021 Nature Publishing Group. \n\nReceived 20 November 2020; Accepted 20 May 2021; Published 01 July 2021. \n\nWe thank J. Semel for her support. We also thank the staff at the Held Foundation and the Carol Moss Foundation for their support of this project; staff at the UCLA David Geffen School of Medicine's Dean's Office for their support; Fast Grants Inc. for funding this work; L. Starita, B. Martin, J. Gehring, S. Srivatsan, J. Shendure and the members of the Covid Testing Scaleup Slack for their input, guidance and openness in sharing their processes; M. Berro for her guidance with the FDA EUA201963; the clinical laboratory scientists at the UCLA Clinical Microbiology laboratory for their assistance in collecting and processing the remnant specimens and data; our staff at the UCLA SwabSeq COVID19 Testing laboratory for deploying our CLIA test; and L. Yost and A. Martin for their advice and guidance during our scaling process. This work was supported by funding from the Howard Hughes Medical Institute (to L.K.) and DP5OD024579 (to V.A.A.). I.L. 
is supported by T32GM008042. Figures 1a,f and 3b created with BioRender.com. \n\nData availability: The main data supporting the results in this study are available within the paper and its Supplementary Information. Source data for all figures are available at GitHub (https://github.com/joshsbloom/swabseq). All protocols and primers are available under an Open COVID License online (https://www.notion.so/Octant-COVID-License-816b04b442674433a2a58bff2d8288df). Videos of the workflow for SwabSeq assay are available at Figshare (https://figshare.com/projects/Additional_SwabSeq_Data/113643). \n\nCode availability: All code can be accessed at GitHub (https://github.com/joshsbloom/swabseq). An R package to automate the diagnosis of patient samples is available at GitHub (https://github.com/joshsbloom/swabseqr). Codes for primer design and for the analysis of cross-reactivity can be found at GitHub (https://github.com/octantbio/SwabSeq). The core technology has been made available under the Open COVID Pledge, and software and data under the MIT license (UCLA) and Apache 2.0 license (Octant). \n\nAuthor Contributions: J.S.B. and V.A.A. wrote the manuscript with assistance from C.L., J.F., L.K., E.E., E.M.J., A.R.C., N.B.L., M.G. and S.Kosuri. E.M.J., A.R.C., N.B.L., M.G., S.W.S., J.S.B. and S.Kosuri designed barcodes and performed early testing and analysis of protocols and reagents. C.L., Y.Y., Y.Z., L.G., R.D. and M.J.B. provided early guidance and key automation resources. E.E., D.H., N.L. and C.K. developed the registration webapp and IT infrastructure. L.S., C.M., M.G., E.M.J., N.B.L., S.Kosuri, I.L., O.F.B., V.A.A. and J.S.B. performed and analysed experiments. A.S.B. and L.P. analysed misassignment of index barcodes. V.A.A., O.B.G., S.C., E.E.H., G.O. and B.J.C. collected and processed clinical samples. D.M. optimized operational protocols and D.M., S.Kay, M.L., T.L. and E.E. optimized scale up. E.E., L.K., J.F., C.L., Y.Y., Y.Z. and J.B. 
provided helpful insights into protocols, software, and development and optimization of our specimen collection and handling. F.Y., E.M.T., K.M.K., J.P. and M.H. developed the diversified S standard mixture, N1 primers and flu primers. \n\nCompeting interests: E.M.J., M.G., N.B.L., S.W.S., F.Y., E.M.T., K.M.K., J.P. and S.Kosuri are employed by and hold equity in Octant Inc., J.S.B. consults for and holds equity in Octant Inc., and A.R.C. holds equity in Octant Inc, which initially developed SwabSeq and has filed for patents for some of the work here, although they have been made available under the Open COVID License (https://www.notion.so/Octant-COVID-License-816b04b442674433a2a58bff2d8288df). \n\nPeer review information: Nature Biomedical Engineering thanks Enzo Poirier and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.\n\nSubmitted - 2020.08.04.20167874v4.full.pdf
Supplemental Material - 41551_2021_754_MOESM1_ESM.pdf
Supplemental Material - 41551_2021_754_MOESM2_ESM.pdf
Supplemental Material - 41551_2021_754_MOESM3_ESM.xlsx
Supplemental Material - 41551_2021_754_MOESM4_ESM.xlsx
Supplemental Material - 41551_2021_754_MOESM5_ESM.xlsx
Supplemental Material - 41551_2021_754_MOESM6_ESM.xlsx
Supplemental Material - 41551_2021_754_MOESM7_ESM.xlsx
", "abstract": "Frequent and widespread testing of members of the population who are asymptomatic for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is essential for the mitigation of the transmission of the virus. Despite the recent increases in testing capacity, tests based on quantitative polymerase chain reaction (qPCR) assays cannot be easily deployed at the scale required for population-wide screening. Here, we show that next-generation sequencing of pooled samples tagged with sample-specific molecular barcodes enables the testing of thousands of nasal or saliva samples for SARS-CoV-2 RNA in a single run without the need for RNA extraction. The assay, which we named SwabSeq, incorporates a synthetic RNA standard that facilitates end-point quantification and the calling of true negatives, and that reduces the requirements for automation, purification and sample-to-sample normalization. We used SwabSeq to perform 80,000 tests, with an analytical sensitivity and specificity comparable to or better than traditional qPCR tests, in less than two months with turnaround times of less than 24\u2009h. SwabSeq could be rapidly adapted for the detection of other pathogens.", "date": "2021-07", "date_type": "published", "publication": "Nature Biomedical Engineering", "volume": "5", "number": "7", "publisher": "Nature Publishing Group", "pagerange": "657-665", "id_number": "CaltechAUTHORS:20201119-132151980", "issn": "2157-846X", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20201119-132151980", "funders": { "items": [ { "agency": "Held Foundation" }, { "agency": "Carol Moss Foundation" }, { "agency": "UCLA" }, { "agency": "Fast Grants, Inc." 
}, { "agency": "Howard Hughes Medical Institute (HHMI)" }, { "agency": "NIH", "grant_number": "DP5OD024579" }, { "agency": "NIH Predoctoral Fellowship", "grant_number": "T32GM008042" } ] }, "local_group": { "items": [ { "id": "COVID-19" }, { "id": "Division-of-Biology-and-Biological-Engineering" } ] }, "doi": "10.1038/s41551-021-00754-5", "pmcid": "PMC7480060", "primary_object": { "basename": "nihms-1957064.pdf", "url": "https://authors.library.caltech.edu/records/05kjw-6t056/files/nihms-1957064.pdf" }, "related_objects": [ { "basename": "41551_2021_754_MOESM7_ESM.xlsx", "url": "https://authors.library.caltech.edu/records/05kjw-6t056/files/41551_2021_754_MOESM7_ESM.xlsx" }, { "basename": "41551_2021_754_MOESM1_ESM.pdf", "url": "https://authors.library.caltech.edu/records/05kjw-6t056/files/41551_2021_754_MOESM1_ESM.pdf" }, { "basename": "41551_2021_754_MOESM2_ESM.pdf", "url": "https://authors.library.caltech.edu/records/05kjw-6t056/files/41551_2021_754_MOESM2_ESM.pdf" }, { "basename": "41551_2021_754_MOESM3_ESM.xlsx", "url": "https://authors.library.caltech.edu/records/05kjw-6t056/files/41551_2021_754_MOESM3_ESM.xlsx" }, { "basename": "41551_2021_754_MOESM4_ESM.xlsx", "url": "https://authors.library.caltech.edu/records/05kjw-6t056/files/41551_2021_754_MOESM4_ESM.xlsx" }, { "basename": "41551_2021_754_MOESM5_ESM.xlsx", "url": "https://authors.library.caltech.edu/records/05kjw-6t056/files/41551_2021_754_MOESM5_ESM.xlsx" }, { "basename": "41551_2021_754_MOESM6_ESM.xlsx", "url": "https://authors.library.caltech.edu/records/05kjw-6t056/files/41551_2021_754_MOESM6_ESM.xlsx" } ], "resource_type": "article", "pub_year": "2021", "author_list": "Bloom, Joshua S.; Sathe, Laila; et al." 
}, { "id": "https://authors.library.caltech.edu/records/ykq5c-d8h82", "eprint_id": 106738, "eprint_status": "archive", "datestamp": "2023-08-20 03:54:55", "lastmod": "2024-01-29 17:08:41", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Bloom-Joshua-S", "name": { "family": "Bloom", "given": "Joshua S." }, "orcid": "0000-0002-7241-1648" }, { "id": "Sathe-Laila", "name": { "family": "Sathe", "given": "Laila" }, "orcid": "0000-0003-1016-3295" }, { "id": "Munugala-Chetan", "name": { "family": "Munugala", "given": "Chetan" } }, { "id": "Jones-Eric-M", "name": { "family": "Jones", "given": "Eric M." } }, { "id": "Gasperini-Molly", "name": { "family": "Gasperini", "given": "Molly" }, "orcid": "0000-0003-4559-8432" }, { "id": "Lubock-Nathan-B", "name": { "family": "Lubock", "given": "Nathan B." }, "orcid": "0000-0001-8064-2465" }, { "id": "Yarza-Fauna", "name": { "family": "Yarza", "given": "Fauna" }, "orcid": "0000-0002-2512-6182" }, { "id": "Thompson-Erin-M", "name": { "family": "Thompson", "given": "Erin M." }, "orcid": "0000-0002-6085-3051" }, { "id": "Kovary-Kyle-M", "name": { "family": "Kovary", "given": "Kyle M." }, "orcid": "0000-0002-7616-2968" }, { "id": "Park-Jimin", "name": { "family": "Park", "given": "Jimin" } }, { "id": "Marquette-Dawn", "name": { "family": "Marquette", "given": "Dawn" }, "orcid": "0000-0002-3964-7683" }, { "id": "Kay-Stephania", "name": { "family": "Kay", "given": "Stephania" } }, { "id": "Lucas-Mark", "name": { "family": "Lucas", "given": "Mark" } }, { "id": "Love-TreQuan", "name": { "family": "Love", "given": "TreQuan" } }, { "id": "Booeshaghi-A-Sina", "name": { "family": "Booeshaghi", "given": "A. Sina" }, "orcid": "0000-0002-6442-4502" }, { "id": "Brandenberg-Oliver-F", "name": { "family": "Brandenberg", "given": "Oliver F." 
}, "orcid": "0000-0001-5662-1234" }, { "id": "Guo-Longhua", "name": { "family": "Guo", "given": "Longhua" }, "orcid": "0000-0001-9690-9750" }, { "id": "Boocock-James", "name": { "family": "Boocock", "given": "James" }, "orcid": "0000-0003-0323-8818" }, { "id": "Hochman-Myles", "name": { "family": "Hochman", "given": "Myles" }, "orcid": "0000-0001-5172-6395" }, { "id": "Simpkins-Scott-W", "name": { "family": "Simpkins", "given": "Scott W." }, "orcid": "0000-0002-5997-2838" }, { "id": "Lin-Isabella", "name": { "family": "Lin", "given": "Isabella" }, "orcid": "0000-0002-7102-6879" }, { "id": "LaPierre-Nathan", "name": { "family": "LaPierre", "given": "Nathan" }, "orcid": "0000-0003-2394-8868" }, { "id": "Hong-Duke", "name": { "family": "Hong", "given": "Duke" } }, { "id": "Zhang-Yi", "name": { "family": "Zhang", "given": "Yi" } }, { "id": "Oland-Gabriel", "name": { "family": "Oland", "given": "Gabriel" }, "orcid": "0000-0002-6941-3060" }, { "id": "Choe-Bianca-Judy", "name": { "family": "Choe", "given": "Bianca Judy" } }, { "id": "Chandrasekaran-Sukantha", "name": { "family": "Chandrasekaran", "given": "Sukantha" }, "orcid": "0000-0002-6232-5535" }, { "id": "Hilt-Evann-E", "name": { "family": "Hilt", "given": "Evann E." } }, { "id": "Butte-Manish-J", "name": { "family": "Butte", "given": "Manish J." }, "orcid": "0000-0002-4490-5595" }, { "id": "Damoiseaux-Robert", "name": { "family": "Damoiseaux", "given": "Robert" }, "orcid": "0000-0002-7611-7534" }, { "id": "Kravit-Clifford", "name": { "family": "Kravit", "given": "Clifford" }, "orcid": "0000-0002-0624-5514" }, { "id": "Cooper-Aaron-R", "name": { "family": "Cooper", "given": "Aaron R." }, "orcid": "0000-0003-4588-2513" }, { "id": "Yin-Yi", "name": { "family": "Yin", "given": "Yi" }, "orcid": "0000-0003-0963-2672" }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" }, { "id": "Garner-Omai-B", "name": { "family": "Garner", "given": "Omai B." 
}, "orcid": "0000-0002-7366-2692" }, { "id": "Flint-Jonathan", "name": { "family": "Flint", "given": "Jonathan" }, "orcid": "0000-0002-9427-4429" }, { "id": "Eskin-Eleazar", "name": { "family": "Eskin", "given": "Eleazar" }, "orcid": "0000-0003-1149-4758" }, { "id": "Luo-Chongyuan", "name": { "family": "Luo", "given": "Chongyuan" }, "orcid": "0000-0002-8541-0695" }, { "id": "Kosuri-Sriram", "name": { "family": "Kosuri", "given": "Sriram" }, "orcid": "0000-0002-4661-0600" }, { "id": "Kruglyak-Leonid", "name": { "family": "Kruglyak", "given": "Leonid" }, "orcid": "0000-0002-8065-3057" }, { "id": "Arboleda-Valerie-A", "name": { "family": "Arboleda", "given": "Valerie A." }, "orcid": "0000-0002-9687-9122" } ] }, "title": "Massively scaled-up testing for SARS-CoV-2 RNA via next-generation sequencing of pooled and barcoded nasal and saliva samples", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2021 Nature Publishing Group. \n\nReceived 20 November 2020; Accepted 20 May 2021; Published 01 July 2021. \n\nWe thank J. Semel for her support. We also thank the staff at the Held Foundation and the Carol Moss Foundation for their support of this project; staff at the UCLA David Geffen School of Medicine's Dean's Office for their support; Fast Grants Inc. for funding this work; L. Starita, B. Martin, J. Gehring, S. Srivatsan, J. Shendure and the members of the Covid Testing Scaleup Slack for their input, guidance and openness in sharing their processes; M. Berro for her guidance with the FDA EUA201963; the clinical laboratory scientists at the UCLA Clinical Microbiology laboratory for their assistance in collecting and processing the remnant specimens and data; our staff at the UCLA SwabSeq COVID19 Testing laboratory for deploying our CLIA test; and L. Yost and A. Martin for their advice and guidance during our scaling process. This work was supported by funding from the Howard Hughes Medical Institute (to L.K.) and DP5OD024579 (to V.A.A.). I.L. 
is supported by T32GM008042. Figures 1a,f and 3b created with BioRender.com. \n\nData availability: The main data supporting the results in this study are available within the paper and its Supplementary Information. Source data for all figures are available at GitHub (https://github.com/joshsbloom/swabseq). All protocols and primers are available under an Open COVID License online (https://www.notion.so/Octant-COVID-License-816b04b442674433a2a58bff2d8288df). Videos of the workflow for SwabSeq assay are available at Figshare (https://figshare.com/projects/Additional_SwabSeq_Data/113643). \n\nCode availability: All code can be accessed at GitHub (https://github.com/joshsbloom/swabseq). An R package to automate the diagnosis of patient samples is available at GitHub (https://github.com/joshsbloom/swabseqr). Codes for primer design and for the analysis of cross-reactivity can be found at GitHub (https://github.com/octantbio/SwabSeq). The core technology has been made available under the Open COVID Pledge, and software and data under the MIT license (UCLA) and Apache 2.0 license (Octant). \n\nAuthor Contributions: J.S.B. and V.A.A. wrote the manuscript with assistance from C.L., J.F., L.K., E.E., E.M.J., A.R.C., N.B.L., M.G. and S.Kosuri. E.M.J., A.R.C., N.B.L., M.G., S.W.S., J.S.B. and S.Kosuri designed barcodes and performed early testing and analysis of protocols and reagents. C.L., Y.Y., Y.Z., L.G., R.D. and M.J.B. provided early guidance and key automation resources. E.E., D.H., N.L. and C.K. developed the registration webapp and IT infrastructure. L.S., C.M., M.G., E.M.J., N.B.L., S.Kosuri, I.L., O.F.B., V.A.A. and J.S.B. performed and analysed experiments. A.S.B. and L.P. analysed misassignment of index barcodes. V.A.A., O.B.G., S.C., E.E.H., G.O. and B.J.C. collected and processed clinical samples. D.M. optimized operational protocols and D.M., S.Kay, M.L., T.L. and E.E. optimized scale up. E.E., L.K., J.F., C.L., Y.Y., Y.Z. and J.B. 
provided helpful insights into protocols, software, and development and optimization of our specimen collection and handling. F.Y., E.M.T., K.M.K., J.P. and M.H. developed the diversified S standard mixture, N1 primers and flu primers. \n\nCompeting interests: E.M.J., M.G., N.B.L., S.W.S., F.Y., E.M.T., K.M.K., J.P. and S.Kosuri are employed by and hold equity in Octant Inc., J.S.B. consults for and holds equity in Octant Inc., and A.R.C. holds equity in Octant Inc, which initially developed SwabSeq and has filed for patents for some of the work here, although they have been made available under the Open COVID License (https://www.notion.so/Octant-COVID-License-816b04b442674433a2a58bff2d8288df). \n\nPeer review information: Nature Biomedical Engineering thanks Enzo Poirier and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.\n\nSubmitted - 2020.08.04.20167874v4.full.pdf
Supplemental Material - 41551_2021_754_MOESM1_ESM.pdf
Supplemental Material - 41551_2021_754_MOESM2_ESM.pdf
Supplemental Material - 41551_2021_754_MOESM3_ESM.xlsx
Supplemental Material - 41551_2021_754_MOESM4_ESM.xlsx
Supplemental Material - 41551_2021_754_MOESM5_ESM.xlsx
Supplemental Material - 41551_2021_754_MOESM6_ESM.xlsx
Supplemental Material - 41551_2021_754_MOESM7_ESM.xlsx
", "abstract": "Frequent and widespread testing of members of the population who are asymptomatic for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is essential for the mitigation of the transmission of the virus. Despite the recent increases in testing capacity, tests based on quantitative polymerase chain reaction (qPCR) assays cannot be easily deployed at the scale required for population-wide screening. Here, we show that next-generation sequencing of pooled samples tagged with sample-specific molecular barcodes enables the testing of thousands of nasal or saliva samples for SARS-CoV-2 RNA in a single run without the need for RNA extraction. The assay, which we named SwabSeq, incorporates a synthetic RNA standard that facilitates end-point quantification and the calling of true negatives, and that reduces the requirements for automation, purification and sample-to-sample normalization. We used SwabSeq to perform 80,000 tests, with an analytical sensitivity and specificity comparable to or better than traditional qPCR tests, in less than two months with turnaround times of less than 24\u2009h. SwabSeq could be rapidly adapted for the detection of other pathogens.", "date": "2021-07", "date_type": "published", "publication": "Nature Biomedical Engineering", "volume": "5", "number": "7", "publisher": "Nature Publishing Group", "pagerange": "657-665", "id_number": "CaltechAUTHORS:20201119-132151980", "issn": "2157-846X", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20201119-132151980", "funders": { "items": [ { "agency": "Held Foundation" }, { "agency": "Carol Moss Foundation" }, { "agency": "UCLA" }, { "agency": "Fast Grants, Inc." 
}, { "agency": "Howard Hughes Medical Institute (HHMI)" }, { "agency": "NIH", "grant_number": "DP5OD024579" }, { "agency": "NIH Predoctoral Fellowship", "grant_number": "T32GM008042" } ] }, "local_group": { "items": [ { "id": "COVID-19" }, { "id": "Division-of-Biology-and-Biological-Engineering" } ] }, "doi": "10.1038/s41551-021-00754-5", "pmcid": "PMC7480060", "primary_object": { "basename": "41551_2021_754_MOESM3_ESM.xlsx", "url": "https://authors.library.caltech.edu/records/ykq5c-d8h82/files/41551_2021_754_MOESM3_ESM.xlsx" }, "related_objects": [ { "basename": "41551_2021_754_MOESM4_ESM.xlsx", "url": "https://authors.library.caltech.edu/records/ykq5c-d8h82/files/41551_2021_754_MOESM4_ESM.xlsx" }, { "basename": "41551_2021_754_MOESM5_ESM.xlsx", "url": "https://authors.library.caltech.edu/records/ykq5c-d8h82/files/41551_2021_754_MOESM5_ESM.xlsx" }, { "basename": "41551_2021_754_MOESM6_ESM.xlsx", "url": "https://authors.library.caltech.edu/records/ykq5c-d8h82/files/41551_2021_754_MOESM6_ESM.xlsx" }, { "basename": "41551_2021_754_MOESM7_ESM.xlsx", "url": "https://authors.library.caltech.edu/records/ykq5c-d8h82/files/41551_2021_754_MOESM7_ESM.xlsx" }, { "basename": "2020.08.04.20167874v4.full.pdf", "url": "https://authors.library.caltech.edu/records/ykq5c-d8h82/files/2020.08.04.20167874v4.full.pdf" }, { "basename": "41551_2021_754_MOESM1_ESM.pdf", "url": "https://authors.library.caltech.edu/records/ykq5c-d8h82/files/41551_2021_754_MOESM1_ESM.pdf" }, { "basename": "41551_2021_754_MOESM2_ESM.pdf", "url": "https://authors.library.caltech.edu/records/ykq5c-d8h82/files/41551_2021_754_MOESM2_ESM.pdf" } ], "resource_type": "article", "pub_year": "2021", "author_list": "Bloom, Joshua S.; Sathe, Laila; et al." 
}, { "id": "https://authors.library.caltech.edu/records/363j8-nw138", "eprint_id": 108622, "eprint_status": "archive", "datestamp": "2023-08-22 10:20:43", "lastmod": "2023-12-22 23:16:14", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Melsted-P\u00e1ll", "name": { "family": "Melsted", "given": "P\u00e1ll" }, "orcid": "0000-0002-8418-6724" }, { "id": "Booeshaghi-A-Sina", "name": { "family": "Booeshaghi", "given": "A. Sina" }, "orcid": "0000-0002-6442-4502" }, { "id": "Liu-Lauren", "name": { "family": "Liu", "given": "Lauren" } }, { "id": "Gao-Fan", "name": { "family": "Gao", "given": "Fan" } }, { "id": "Lu-Lambda", "name": { "family": "Lu", "given": "Lambda" }, "orcid": "0000-0002-7092-9427" }, { "id": "Min-Kyung-Hoi", "name": { "family": "Min", "given": "Kyung Hoi" }, "orcid": "0000-0003-0894-4017" }, { "id": "da-Veiga-Beltrame-Eduardo", "name": { "family": "da Veiga Beltrame", "given": "Eduardo" }, "orcid": "0000-0002-1529-9207" }, { "id": "Hjorleifsson-Kristj\u00e1n-E", "name": { "family": "Hjorleifsson", "given": "Kristj\u00e1n Eldj\u00e1rn" }, "orcid": "0000-0002-7851-1818" }, { "id": "Gehring-Jase", "name": { "family": "Gehring", "given": "Jase" }, "orcid": "0000-0002-3894-9495" }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "Modular, efficient and constant-memory single-cell RNA-seq preprocessing", "ispublished": "pub", "full_text_status": "public", "keywords": "Genome informatics; Software; Transcriptomics", "note": "\u00a9 2021 Nature Publishing Group. \n\nReceived 07 August 2019; Accepted 09 February 2021; Published 01 April 2021. \n\nWe thank V. Ntranos and V. Svensson for helpful suggestions and comments. We thank J. Farrell for the D. rerio gene annotation used to process SRR6956073, J. Schiefelbein for the A. thaliana gene annotation used to process SRR8257100, J. Fear for the D. melanogaster gene annotation used to process SRR8513910, and J. 
Kim and Q. Zhu for the C. elegans gene annotation used to process SRR8611943. The benchmarking work was made possible, in part, thanks to support from the Beckman Institute Caltech Bioinformatics Resource Center. A.S.B. and L.P. were funded in part by NIH U19MH114830. \n\nData availability: A diverse set of 20 datasets was compiled for the purpose of benchmarking preprocessing workflows. Datasets produced and distributed by 10x Genomics were downloaded from the 10x Genomics data downloads page: https://support.10xgenomics.com/single-cell-gene-expression/datasets. Six v3 chemistry datasets and two v2 chemistry datasets were downloaded and processed (Supplementary Table 3). Another 12 datasets were obtained from either the SRA or the European Nucleotide Archive; all were produced with 10x Genomics v2 chemistry. For six of the datasets (SRR6956073, SRR6998058, SRR7299563, SRR8206317, SRR8327928 and SRR8524760), the BAM files were downloaded and the Cell Ranger utility bamtofastq was run to produce FASTQ files for preprocessing from Cell Ranger\u2013structured BAM files. FASTQ files were downloaded directly for the datasets E-MTAB-7320, SRR8257100, SRR8513910, SRR8599150 (available at https://github.com/bustools/getting_started/releases/download/getting_started/SRR8599150_S1_L001_R1_001.fastq.gz and https://github.com/bustools/getting_started/releases/download/getting_started/SRR8599150_S1_L001_R2_001.fastq.gz), SRR8611943 and SRR8639063. \n\nCode availability: The software versions used for the results in the paper were: Alevin v0.13.1, bustools v0.39.1, Cell Ranger v3.0.0, DropletUtils v1.6.1, kallisto v0.46.0, Python 3.7, R v3.5.2, Scanpy v1.4.1, scvelo 0.1.17, Seurat v3.0, snakemake v5.3.0, STARsolo v2.7.0e, velocyto v0.17.17, wc v8.22 (GNU coreutils) and zcat v1.5 (gzip). All programs were run with default options unless otherwise specified. 
The code to reproduce the findings of this paper is available at https://github.com/pachterlab/MBLGLMBHGP_2021/, kallisto is available at https://github.com/pachterlab/kallisto/ and bustools is available at https://github.com/BUStools/bustools/. Documentation and tutorials for using the kallisto bustools scRNA-seq workflow are available at http://pachterlab.github.io/kallistobustools. \n\nDetails of all datasets and their accession numbers can be found in Supplementary Table 3. All genome annotations and reference transcriptomes can be found at https://doi.org/10.22002/D1.1876. \n\nThese authors contributed equally: P\u00e1ll Melsted, A. Sina Booeshaghi. \n\nAuthor Contributions: P.M., A.S.B., L. Liu and L.P. developed the algorithms for bustools and P.M., A.S.B. and L. Liu wrote the software. A.S.B. conceived of and performed the UMI and barcode calculations motivating the algorithms. F.G. implemented and performed the benchmarking procedure, and curated indices for the datasets. A.S.B. and E.d.V.B. designed and produced the comparisons between Cell Ranger and kallisto bustools. L. Lu investigated in detail the performance of different workflows on the \"10k mouse neuron\" data and produced the analysis of that dataset. A.S.B. designed the RNA velocity workflow and performed the RNA velocity analyses. K.M.H contributed to the development of the reproducible workflow. K.E.H. developed and investigated the effect of reference transcriptome sequences for pseudoalignment. J.G. interpreted results and helped to supervise the research. A.S.B. planned, organized and prepared figures. A.S.B., E.d.V.B., P.M. and L.P. planned the manuscript. A.S.B. and L.P. wrote the manuscript. \n\nThe authors declare no competing interests. \n\nPeer review information: Nature Biotechnology thanks the anonymous reviewers for their contribution to the peer review of this work.\n\nSupplemental Material - 41587_2021_870_MOESM1_ESM.pdf
Supplemental Material - 41587_2021_870_MOESM2_ESM.pdf
Supplemental Material - 41587_2021_870_MOESM3_ESM.xlsx
Supplemental Material - 41587_2021_870_MOESM4_ESM.xlsx
", "abstract": "We describe a workflow for preprocessing of single-cell RNA-sequencing data that balances efficiency and accuracy. Our workflow is based on the kallisto and bustools programs, and is near optimal in speed with a constant memory requirement providing scalability for arbitrarily large datasets. The workflow is modular, and we demonstrate its flexibility by showing how it can be used for RNA velocity analyses.", "date": "2021-07", "date_type": "published", "publication": "Nature Biotechnology", "volume": "39", "number": "7", "publisher": "Nature Publishing Group", "pagerange": "813-818", "id_number": "CaltechAUTHORS:20210405-142728694", "issn": "1087-0156", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20210405-142728694", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "Caltech Beckman Institute" }, { "agency": "NIH", "grant_number": "U19MH114830" } ] }, "local_group": { "items": [ { "id": "Division-of-Biology-and-Biological-Engineering" } ] }, "doi": "10.1038/s41587-021-00870-2", "primary_object": { "basename": "41587_2021_870_MOESM1_ESM.pdf", "url": "https://authors.library.caltech.edu/records/363j8-nw138/files/41587_2021_870_MOESM1_ESM.pdf" }, "related_objects": [ { "basename": "41587_2021_870_MOESM2_ESM.pdf", "url": "https://authors.library.caltech.edu/records/363j8-nw138/files/41587_2021_870_MOESM2_ESM.pdf" }, { "basename": "41587_2021_870_MOESM3_ESM.xlsx", "url": "https://authors.library.caltech.edu/records/363j8-nw138/files/41587_2021_870_MOESM3_ESM.xlsx" }, { "basename": "41587_2021_870_MOESM4_ESM.xlsx", "url": "https://authors.library.caltech.edu/records/363j8-nw138/files/41587_2021_870_MOESM4_ESM.xlsx" } ], "resource_type": "article", "pub_year": "2021", "author_list": "Melsted, P\u00e1ll; Booeshaghi, A. Sina; et el." 
}, { "id": "https://authors.library.caltech.edu/records/anq5b-hzp82", "eprint_id": 104254, "eprint_status": "archive", "datestamp": "2023-08-22 10:08:43", "lastmod": "2023-12-22 23:16:47", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Gustafsson-Johan", "name": { "family": "Gustafsson", "given": "Johan" }, "orcid": "0000-0001-5072-2659" }, { "id": "Robinson-Jonathan", "name": { "family": "Robinson", "given": "Jonathan" }, "orcid": "0000-0001-8567-5960" }, { "id": "Nielsen-Jens", "name": { "family": "Nielsen", "given": "Jens" }, "orcid": "0000-0002-9955-6003" }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "BUTTERFLY: addressing the pooled amplification paradox with unique molecular identifiers in single-cell RNA-seq", "ispublished": "pub", "full_text_status": "public", "keywords": "Single-cell RNA-Seq; UMI; Droplet-based; PCR; Bias; Amplification; Batch correction; Correction", "note": "\u00a9 The Author(s). 2021. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. 
The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. \n\nReceived 27 August 2020; Accepted 21 May 2021; Published 08 June 2021. \n\nWe thank Pall Melsted, Sina Booeshaghi, and Joseph Min for helpful suggestions on the project and on the integration of BUTTERFLY in bustools. \n\nAvailability of data and materials: Means to access the datasets analyzed during the current study are listed in Additional file 1: Table S2. The source code as well as Jupyter notebooks for generating the figures is available in GitHub [36], as well as the source code for the branch of bustools used in this project [37]. Snapshots of the repositories are available in Zenodo [38]. Jupyter notebooks have not been produced for Fig. 5 and the data generation for Additional file 1: Fig. S6, S24, and S25, due to difficulties in setting up the right versions of R packages, but the code is directly available in the GitHub repository. \n\nAll code is released under the BSD 2-clause license, except for the R file \"modZTNB.R,\" which is released under a GPL3 license to comply with the GPL3 license of PreseqR. \n\nReview history: The review history is available as Additional file 3. \n\nPeer review information: Barbara Cheifet was the primary editor of this article and managed its editorial process and peer review in collaboration with the rest of the editorial team. \n\nThis work was supported by funding from the Knut and Alice Wallenberg foundation (J.N.), the National Cancer Institute of the National Institutes of Health under award number F32CA220848 (J.R.), and NIH U19MH114830 (L.P.). \n\nAuthor Contributions: Conceptualization, J.G., L.P.; Methodology, J.G., L.P.; software, J.G. 
; writing\u2014original draft, J.G., L.P.; writing\u2014review and editing, J.G., L.P., J.R., J.N.; supervision, L.P., J.R., J.N.; funding acquisition, L.P., J.R., J.N. The authors read and approved the final manuscript. \n\nEthics approval and consent to participate: Not applicable. \n\nConsent for publication: Not applicable. \n\nThe authors declare that they have no competing interests.\n\nPublished - s13059-021-02386-z.pdf
Submitted - 2020.07.06.188003v2.full.pdf
Supplemental Material - 13059_2021_2386_MOESM1_ESM.pdf
Supplemental Material - 13059_2021_2386_MOESM2_ESM.xlsx
Supplemental Material - 13059_2021_2386_MOESM3_ESM.docx
", "abstract": "The incorporation of unique molecular identifiers (UMIs) in single-cell RNA-seq assays makes possible the identification of duplicated molecules, thereby facilitating the counting of distinct molecules from sequenced reads. However, we show that the na\u00efve removal of duplicates can lead to a bias due to a \"pooled amplification paradox,\" and we propose an improved quantification method based on unseen species modeling. Our correction called BUTTERFLY uses a zero truncated negative binomial estimator implemented in the kallisto bustools workflow. We demonstrate its efficacy across cell types and genes and show that in some cases it can invert the relative abundance of genes.", "date": "2021-06-08", "date_type": "published", "publication": "Genome Biology", "volume": "22", "publisher": "BioMed Central", "pagerange": "Art. No. 174", "id_number": "CaltechAUTHORS:20200707-114817234", "issn": "1474-760X", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20200707-114817234", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "Knut and Alice Wallenberg Foundation" }, { "agency": "NIH Postdoctoral Fellowship", "grant_number": "F32CA220848" }, { "agency": "NIH", "grant_number": "U19MH114830" } ] }, "local_group": { "items": [ { "id": "Division-of-Biology-and-Biological-Engineering" } ] }, "doi": "10.1186/s13059-021-02386-z", "pmcid": "PMC8188791", "primary_object": { "basename": "13059_2021_2386_MOESM1_ESM.pdf", "url": "https://authors.library.caltech.edu/records/anq5b-hzp82/files/13059_2021_2386_MOESM1_ESM.pdf" }, "related_objects": [ { "basename": "13059_2021_2386_MOESM2_ESM.xlsx", "url": "https://authors.library.caltech.edu/records/anq5b-hzp82/files/13059_2021_2386_MOESM2_ESM.xlsx" }, { "basename": "13059_2021_2386_MOESM3_ESM.docx", "url": "https://authors.library.caltech.edu/records/anq5b-hzp82/files/13059_2021_2386_MOESM3_ESM.docx" }, { 
"basename": "2020.07.06.188003v2.full.pdf", "url": "https://authors.library.caltech.edu/records/anq5b-hzp82/files/2020.07.06.188003v2.full.pdf" }, { "basename": "s13059-021-02386-z.pdf", "url": "https://authors.library.caltech.edu/records/anq5b-hzp82/files/s13059-021-02386-z.pdf" } ], "resource_type": "article", "pub_year": "2021", "author_list": "Gustafsson, Johan; Robinson, Jonathan; et al." }, { "id": "https://authors.library.caltech.edu/records/t6026-vtf40", "eprint_id": 108918, "eprint_status": "archive", "datestamp": "2023-08-20 01:50:33", "lastmod": "2023-12-22 23:16:55", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Gorin-Gennady", "name": { "family": "Gorin", "given": "Gennady" }, "orcid": "0000-0001-6097-2029" }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "Analysis of Length Biases in Single-Cell RNA Sequencing of Unspliced mRNA by Markov Modeling", "ispublished": "pub", "full_text_status": "restricted", "note": "\u00a9 2021 Biophysical Society. \n\nAvailable online 12 February 2021.", "abstract": "Recent experimental advances in single-cell RNA sequencing (scRNA-seq) have enabled the quantification of transcriptomes with single-molecule resolution. However, thus far, the stochastic modeling of transcription has been separate from the discussion of the statistics of the sequencing process, leading to simplifications that may obfuscate transcriptional dynamics, and technical artifacts in the assays. For example, imputation, normalization, and smoothing, used to correct for stochastic sequencing phenomena, make experimental molecule count data incompatible with a discrete representation, thus rendering the data uninterpretable in the context of conventional Chemical Master Equation (CME) models. 
Models of gene expression - such as the negative binomial count model - are used with limited physical justification, whereas models for multimodal data are under-explored. Conversely, more detailed CME descriptions of gene expression do not directly address the complexities of the sequencing process. We demonstrate that modeling both phenomena reveals a pervasive gene length-based effect in the detection of unspliced mRNA: long genes are substantially more likely to have higher average unspliced mRNA expression. To explain this effect, we build a stochastic model that accounts for physiological and experimental events, and jointly infer hundreds of gene-specific as well as transcriptome-wide parameters. Specifically, we extend a joint model of mRNA processing described by Singh and Bokes (Biophys. J., 2012) to incorporate downstream Poisson sampling, representing cDNA library construction and sequencing. The explicit inclusion of sampling yields mechanistically interpretable results for the gene expression parameters, and suggests extensions to more complex models.", "date": "2021-02-12", "date_type": "published", "publication": "Biophysical Journal", "volume": "120", "number": "3", "publisher": "Biophysical Society", "pagerange": "81A", "id_number": "CaltechAUTHORS:20210503-100056268", "issn": "0006-3495", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20210503-100056268", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "local_group": { "items": [ { "id": "Division-of-Biology-and-Biological-Engineering" } ] }, "doi": "10.1016/j.bpj.2020.11.706", "resource_type": "article", "pub_year": "2021", "author_list": "Gorin, Gennady and Pachter, Lior" }, { "id": "https://authors.library.caltech.edu/records/8h643-s8r26", "eprint_id": 108921, "eprint_status": "archive", "datestamp": "2023-08-20 01:50:55", "lastmod": "2023-12-22 23:16:20", "type": "article", "metadata_visibility": "show", 
"creators": { "items": [ { "id": "Vastola-John-J", "name": { "family": "Vastola", "given": "John J." } }, { "id": "Gorin-Gennady", "name": { "family": "Gorin", "given": "Gennady" }, "orcid": "0000-0001-6097-2029" }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" }, { "id": "Holmes-William-R", "name": { "family": "Holmes", "given": "William R." } } ] }, "title": "Learning the Dynamics of Bursty Transcription and Splicing using Ultra-Fast Parameter Inference and New Analytical Solutions of the Chemical Master Equation", "ispublished": "pub", "full_text_status": "restricted", "note": "\u00a9 2021 Biophysical Society. \n\nAvailable online 12 February 2021.", "abstract": "Single cell RNA counts data is increasingly available, and can in principle be used to extract mechanistic insight about transcription and splicing dynamics. In order to infer numbers related to processes of biophysical interest---for example, splicing rates, RNA production rates, RNA degradation rates, and the number of splicing steps involved in processing some particular kind of RNA---it is necessary to compare the predictions of quantitative models with counts data. In practice, this involves generating model predictions for an enormous number of parameter sets, and using some measure of goodness of fit to determine reasonable parameter ranges; because this procedure tends to be extremely computationally expensive, one can typically fit only very simple models involving a small state space and small number of parameters. We report on a new approach to fitting the dynamics of bursty transcription and splicing, which uses newly derived analytical solutions to the chemical master equation to greatly speed up parameter inference. 
The associated speedup, which we have found on simulated counts data to be many orders of magnitude in some cases, comes from not using stochastic simulations or numerical approaches like finite state projection, but the aforementioned closed-form mathematical formulas. Our approach applies to models of splicing involving arbitrarily many splicing steps, introns that can be removed in an arbitrary order, and arbitrarily many downstream alternatively spliced variants. Moreover, it scales extremely well as one's splicing model gets increasingly complicated (e.g. more splicing steps, more alternative splicing branches). We comment on some of the issues associated with using these algorithms to learn parameters from real counts data, including identifiability problems.", "date": "2021-02-12", "date_type": "published", "publication": "Biophysical Journal", "volume": "120", "number": "3", "publisher": "Biophysical Society", "pagerange": "135A", "id_number": "CaltechAUTHORS:20210503-102227319", "issn": "0006-3495", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20210503-102227319", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "local_group": { "items": [ { "id": "Division-of-Biology-and-Biological-Engineering" } ] }, "doi": "10.1016/j.bpj.2020.11.1018", "resource_type": "article", "pub_year": "2021", "author_list": "Vastola, John J.; Gorin, Gennady; et al." }, { "id": "https://authors.library.caltech.edu/records/r5anw-4eh31", "eprint_id": 103585, "eprint_status": "archive", "datestamp": "2023-08-22 07:58:55", "lastmod": "2023-12-22 23:31:49", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Booeshaghi-A-Sina", "name": { "family": "Booeshaghi", "given": "A. Sina" }, "orcid": "0000-0002-6442-4502" }, { "id": "Lubock-Nathan-B", "name": { "family": "Lubock", "given": "Nathan B." 
}, "orcid": "0000-0001-8064-2465" }, { "id": "Cooper-Aaron-R", "name": { "family": "Cooper", "given": "Aaron R." }, "orcid": "0000-0003-4588-2513" }, { "id": "Simpkins-Scott-W", "name": { "family": "Simpkins", "given": "Scott W." }, "orcid": "0000-0002-5997-2838" }, { "id": "Bloom-Joshua-S", "name": { "family": "Bloom", "given": "Joshua S." }, "orcid": "0000-0002-7241-1648" }, { "id": "Gehring-Jase", "name": { "family": "Gehring", "given": "Jase" }, "orcid": "0000-0002-3894-9495" }, { "id": "Luebbert-Laura", "name": { "family": "Luebbert", "given": "Laura" }, "orcid": "0000-0003-1379-2927" }, { "id": "Kosuri-Sriram", "name": { "family": "Kosuri", "given": "Sriram" }, "orcid": "0000-0002-4661-0600" }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "Reliable and accurate diagnostics from highly multiplexed sequencing assays", "ispublished": "pub", "full_text_status": "public", "keywords": "Computational biology and bioinformatics; Infectious diseases", "note": "\u00a9 2020 The Author(s). This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. \n\nReceived 02 November 2020; Accepted 24 November 2020; Published 10 December 2020. 
\n\nWe thank P\u00e1ll Melsted for assistance with bustools. \n\nAuthor Contributions: A.S.B. and L.P. developed the kallisto|bustools approach to processing and analyzing HMSA data. A.S.B. adapted kallisto and bustools to process SwabSeq, LAMP-seq, covE-seq, and TRB-seq data. A.S.B. performed the analyses and collected results for the paper. N.L. developed the bcl2fastq\u2009+\u2009starcode processing approach with assistance from A.R.C., S.W.S., and J.S.B. J.G. assisted with technical aspects of the SwabSeq assay and in assessing the kallisto|bustools workflow results. L.L. created Fig. 1 and explored the sample index structures of LAMP-seq, TRB-seq, and SwabSeq. S.K., N.L.B., A.R.C., S.W.S. and J.S.B. developed SwabSeq. A.S.B., L.L., and L.P. wrote the manuscript. \n\nCompeting interests: A.S.B., J.G., L.L., and L.P. declare no conflicts of interest. S.K., N.L.B., A.R.C., S.W.S. and J.S.B. are employees of Octant, which developed SwabSeq. SwabSeq is released under the terms of the Octant Covid License20.\n\nPublished - s41598-020-78942-7.pdf
Submitted - 2020.05.13.20100131v1.full.pdf
Supplemental Material - 41598_2020_78942_MOESM1_ESM.docx
Supplemental Material - 41598_2020_78942_MOESM2_ESM.docx
", "abstract": "Scalable, inexpensive, and secure testing for SARS-CoV-2 infection is crucial for control of the novel coronavirus pandemic. Recently developed highly multiplexed sequencing assays (HMSAs) that rely on high-throughput sequencing can, in principle, meet these demands, and present promising alternatives to currently used RT-qPCR-based tests. However, reliable analysis, interpretation, and clinical use of HMSAs requires overcoming several computational, statistical and engineering challenges. Using recently acquired experimental data, we present and validate a computational workflow based on kallisto and bustools, that utilizes robust statistical methods and fast, memory efficient algorithms, to quickly, accurately and reliably process high-throughput sequencing data. We show that our workflow is effective at processing data from all recently proposed SARS-CoV-2 sequencing based diagnostic tests, and is generally applicable to any diagnostic HMSA.", "date": "2020-12-10", "date_type": "published", "publication": "Scientific Reports", "volume": "10", "publisher": "Nature Publishing Group", "pagerange": "Art. No. 
21759", "id_number": "CaltechAUTHORS:20200601-101849395", "issn": "2045-2322", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20200601-101849395", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "local_group": { "items": [ { "id": "COVID-19" }, { "id": "Division-of-Biology-and-Biological-Engineering" } ] }, "doi": "10.1038/s41598-020-78942-7", "pmcid": "PMC7730459", "primary_object": { "basename": "2020.05.13.20100131v1.full.pdf", "url": "https://authors.library.caltech.edu/records/r5anw-4eh31/files/2020.05.13.20100131v1.full.pdf" }, "related_objects": [ { "basename": "41598_2020_78942_MOESM1_ESM.docx", "url": "https://authors.library.caltech.edu/records/r5anw-4eh31/files/41598_2020_78942_MOESM1_ESM.docx" }, { "basename": "41598_2020_78942_MOESM2_ESM.docx", "url": "https://authors.library.caltech.edu/records/r5anw-4eh31/files/41598_2020_78942_MOESM2_ESM.docx" }, { "basename": "s41598-020-78942-7.pdf", "url": "https://authors.library.caltech.edu/records/r5anw-4eh31/files/s41598-020-78942-7.pdf" } ], "resource_type": "article", "pub_year": "2020", "author_list": "Booeshaghi, A. Sina; Lubock, Nathan B.; et al." }, { "id": "https://authors.library.caltech.edu/records/fvv0w-xy358", "eprint_id": 98067, "eprint_status": "archive", "datestamp": "2023-08-20 00:29:29", "lastmod": "2023-12-22 23:16:38", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Svensson-Valentine", "name": { "family": "Svensson", "given": "Valentine" }, "orcid": "0000-0002-9217-2330" }, { "id": "da-Veiga-Beltrame-Eduardo", "name": { "family": "da Veiga Beltrame", "given": "Eduardo" }, "orcid": "0000-0002-1529-9207" }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "A curated database reveals trends in single cell transcriptomics", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 The Author(s) 2020. 
Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. \n\nReceived: 14 October 2019; Revision received: 10 July 2020; Editorial decision: 03 August 2020; Accepted: 16 November 2020; Published: 28 November 2020. \n\nWe thank Carlos Talavera-L\u00f3pez for helpful feedback on the manuscript.\n\nCloud infrastructure was partially funded through the Google Cloud Platform research credits program. The work was partly funded by NIH U19MH114830.\n\nPublished - baaa073.pdf
Submitted - 742304.full.pdf
Supplemental Material - baaa073_supp.zip
Supplemental Material - media-1.tsv
", "abstract": "The more than 1000 single-cell transcriptomics studies that have been published to date constitute a valuable and vast resource for biological discovery. While various 'atlas' projects have collated some of the associated datasets, most questions related to specific tissue types, species or other attributes of studies require identifying papers through manual and challenging literature search. To facilitate discovery with published single-cell transcriptomics data, we have assembled a near exhaustive, manually curated database of single-cell transcriptomics studies with key information: descriptions of the type of data and technologies used, along with descriptors of the biological systems studied. Additionally, the database contains summarized information about analysis in the papers, allowing for analysis of trends in the field. As an example, we show that the number of cell types identified in scRNA-seq studies is proportional to the number of cells analysed.", "date": "2020-11-28", "date_type": "published", "publication": "Database: The Journal of Biological Databases and Curation", "volume": "2020", "publisher": "Oxford University Press", "pagerange": "Art. No. 
baaa073", "id_number": "CaltechAUTHORS:20190821-092511308", "issn": "1758-0463", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20190821-092511308", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "Google Cloud Platform" }, { "agency": "NIH", "grant_number": "U19MH114830" } ] }, "local_group": { "items": [ { "id": "Division-of-Biology-and-Biological-Engineering" } ] }, "doi": "10.1093/database/baaa073", "pmcid": "PMC7698659", "primary_object": { "basename": "baaa073.pdf", "url": "https://authors.library.caltech.edu/records/fvv0w-xy358/files/baaa073.pdf" }, "related_objects": [ { "basename": "742304.full.pdf", "url": "https://authors.library.caltech.edu/records/fvv0w-xy358/files/742304.full.pdf" }, { "basename": "baaa073_supp.zip", "url": "https://authors.library.caltech.edu/records/fvv0w-xy358/files/baaa073_supp.zip" }, { "basename": "media-1.tsv", "url": "https://authors.library.caltech.edu/records/fvv0w-xy358/files/media-1.tsv" } ], "resource_type": "article", "pub_year": "2020", "author_list": "Svensson, Valentine; da Veiga Beltrame, Eduardo; et al." 
}, { "id": "https://authors.library.caltech.edu/records/bkjds-vsh60", "eprint_id": 108356, "eprint_status": "archive", "datestamp": "2023-08-20 00:17:38", "lastmod": "2023-12-22 23:16:18", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Mehrab-Zakaria", "name": { "family": "Mehrab", "given": "Zakaria" } }, { "id": "Mobin-Jaiaid", "name": { "family": "Mobin", "given": "Jaiaid" } }, { "id": "Tahmid-Ibrahim-Asadullah", "name": { "family": "Tahmid", "given": "Ibrahim Asadullah" } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" }, { "id": "Rahman-Atif", "name": { "family": "Rahman", "given": "Atif" }, "orcid": "0000-0003-1805-3971" } ] }, "title": "A faster implementation of association mapping from k-mers", "ispublished": "pub", "full_text_status": "public", "keywords": "Association mapping, Genome wide association studies (GWAS), Reference free, k-mer", "note": "\u00a9 2020 Copyright Mehrab et al. This article is distributed under the terms of the Creative Commons Attribution License (CC BY 4.0). \n\nLior Pachter and Atif Rahman were funded in part by NIH R21 HG006583. This paper describes the protocol of a method originally presented in the paper \"Association mapping from sequencing reads using k-mers\" by Atif Rahman, Ingileif Hallgr\u00edmsd\u00f3ttir, Michael Eisen and Lior Pachter, and extended in \"A faster implementation of association mapping from k-mers\" by Zakaria Mehrab, Jaiaid Mobin, Ibrahim Asadullah Tahmid and Atif Rahman.\n\nThe authors declare no competing interests.\n\nPublished - Bio-protocol3815.pdf
", "abstract": "Association mapping is the process of linking phenotypes with genotypes. In genome wide association studies (GWAS), individuals are first genotyped using microarrays or by aligning sequenced reads to reference genomes. However, both these approaches rely on reference genomes which limits their application to organisms with no or incomplete reference genomes. To address this, reference free association mapping methods have been developed. Here we present the protocol of an alignment free method for association studies which is based on counting k-mers in sequenced reads, testing for associations between k-mers and the phenotype of interest, and local assembly of the k-mers of statistical significance. The method can map associations of categorical phenotypes to sequence and structural variations without requiring prior sequencing of reference genomes.", "date": "2020-11-05", "date_type": "published", "publication": "Bio-protocol", "volume": "10", "number": "21", "publisher": "Bio-Protocol", "pagerange": "Art. No. e3815", "id_number": "CaltechAUTHORS:20210309-074448590", "issn": "2331-8325", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20210309-074448590", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NIH", "grant_number": "R21 HG006583" } ] }, "local_group": { "items": [ { "id": "Division-of-Biology-and-Biological-Engineering" } ] }, "doi": "10.21769/bioprotoc.3815", "primary_object": { "basename": "Bio-protocol3815.pdf", "url": "https://authors.library.caltech.edu/records/bkjds-vsh60/files/Bio-protocol3815.pdf" }, "resource_type": "article", "pub_year": "2020", "author_list": "Mehrab, Zakaria; Mobin, Jaiaid; et al." 
}, { "id": "https://authors.library.caltech.edu/records/ampr3-ja254", "eprint_id": 105305, "eprint_status": "archive", "datestamp": "2023-08-19 22:43:51", "lastmod": "2023-12-22 23:16:42", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Gorin-G", "name": { "family": "Gorin", "given": "Gennady" }, "orcid": "0000-0001-6097-2029" }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "Special function methods for bursty models of transcription", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2020 Published by the American Physical Society under the terms of the Creative Commons Attribution 4.0 International license. Further distribution of this work must maintain attribution to the author(s) and the published article's title, journal citation, and DOI. \n\nReceived 4 April 2020; accepted 10 August 2020; published 31 August 2020. \n\nThe DNA, pre-mRNA, and mature mRNA used in Fig. 1(a) are derivatives of the DNA Twemoji by Twitter, Inc., used under CC-BY 4.0. The routine for computing the Taylor approximation coefficient \u03a9_(j,i) uses a function by Ben Barrowes [80], translated from the FORTRAN original by Zhang and Jin [81]. The routine for computing the Taylor series approximation to the exponential integral E\u2081 (z) is a heavily modified version of a function by Ben Barrowes [80], translated from the FORTRAN original by Zhang and Jin [81]. The subplots in supplemental Figs. 2\u20134 [38] were aligned using a function by Pekka Kumpulainen [82]. G.G. and L.P. were partially funded by NIH U19MH114830.\n\nPublished - PhysRevE.102.022409.pdf
Accepted Version - 2003.12919.pdf
Supplemental Material - PRE_SI_200701.pdf
", "abstract": "We explore a Markov model used in the analysis of gene expression, involving the bursty production of pre-mRNA, its conversion to mature mRNA, and its consequent degradation. We demonstrate that the integration used to compute the solution of the stochastic system can be approximated by the evaluation of special functions. Furthermore, the form of the special function solution generalizes to a broader class of burst distributions. In light of the broader goal of biophysical parameter inference from transcriptomics data, we apply the method to simulated data, demonstrating effective control of precision and runtime. Finally, we propose and validate a non-Bayesian approach for parameter estimation based on the characteristic function of the target joint distribution of pre-mRNA and mRNA.", "date": "2020-08", "date_type": "published", "publication": "Physical Review E", "volume": "102", "number": "2", "publisher": "American Physical Society", "pagerange": "Art. No. 022409", "id_number": "CaltechAUTHORS:20200909-153753998", "issn": "2470-0045", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20200909-153753998", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NIH", "grant_number": "U19MH114830" } ] }, "local_group": { "items": [ { "id": "Division-of-Biology-and-Biological-Engineering" } ] }, "doi": "10.1103/physreve.102.022409", "primary_object": { "basename": "2003.12919.pdf", "url": "https://authors.library.caltech.edu/records/ampr3-ja254/files/2003.12919.pdf" }, "related_objects": [ { "basename": "PRE_SI_200701.pdf", "url": "https://authors.library.caltech.edu/records/ampr3-ja254/files/PRE_SI_200701.pdf" }, { "basename": "PhysRevE.102.022409.pdf", "url": "https://authors.library.caltech.edu/records/ampr3-ja254/files/PhysRevE.102.022409.pdf" } ], "resource_type": "article", "pub_year": "2020", "author_list": "Gorin, Gennady and Pachter, 
Lior" }, { "id": "https://authors.library.caltech.edu/records/nn6dw-xb480", "eprint_id": 100041, "eprint_status": "archive", "datestamp": "2023-08-19 22:31:24", "lastmod": "2023-12-22 23:34:08", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Koromila-Theodora", "name": { "family": "Koromila", "given": "Theodora" }, "orcid": "0000-0001-5504-1369" }, { "id": "Gao-Fan", "name": { "family": "Gao", "given": "Fan" } }, { "id": "Iwasaki-Yasuno", "name": { "family": "Iwasaki", "given": "Yasuno" } }, { "id": "He-Peng", "name": { "family": "He", "given": "Peng" } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" }, { "id": "Gergen-J-Peter", "name": { "family": "Gergen", "given": "J. Peter" } }, { "id": "Stathopoulos-A", "name": { "family": "Stathopoulos", "given": "Angelike" }, "orcid": "0000-0001-6597-2036" } ] }, "title": "Odd-paired is a pioneer-like factor that coordinates with Zelda to control gene expression in embryos", "ispublished": "pub", "full_text_status": "public", "keywords": "Drosophila melanogaster, Odd-paired (Opa), Zelda, maternal-to-zygotic transition (MZT), midblastula transition (MBT), short gastrulation (sog), ChIP-seq, RNA-seq, ATAC-seq, Histone mark", "note": "\u00a9 2020 Koromila et al. This article is distributed under the terms of the Creative Commons Attribution License permitting unrestricted use and redistribution provided that the original author and source are credited. \n\nReceived: 26 November 2019; Accepted: 22 July 2020; Published: 23 July 2020. 
\n\nWe thank Chris Rushlow and Deborah Hursh for sharing fly stocks, Igor Antoshechkin and Henry Amrhein at the Millard and Muriel Jacobs Genetics and Genomics Laboratory at the California Institute of Technology for sequencing support, the lab of Josh Dubnau for assistance with Bioanalyzer samples, David Carlson and the Institute for Advanced Computational Science at the Stony Brook University, and Susie Newcomb, Leslie Dunipace and Frank Macabenta for assistance with experiments and comments on the manuscript. This study was supported by funding from NIH R35GM118146 and R03HD097535 to AS, the Bioinformatics Resource Center at the Beckman Institute of Caltech to FG and LP, and the Stony Brook University College of Arts and Sciences to JPG. \n\nThe funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication. \n\nAuthor contributions: Theodora Koromila, Conceived the project and planned the experimental approach, performed wet experiments except ChIP-seq, oversaw computational approach, carried out quantitative analysis of imaging data, analyzed data, wrote manuscript with input and editing help from FG, YI, PH, LP and JPG.; Fan Gao, Oversaw computational approach, performed all computational analysis except normalization of ATAC-seq data for visualization of individual loci, ATAC-seq peak calling, and nucleosome signature, analyzed data, gave input and editing help for writing the manuscript.; Yasuno Iwasaki, Performed ChIP-seq experiments with support of the Caltech genomics core, conducted an initial, independent analysis of the Opa-ChIP-seq data that first identified the new 7 bp consensus binding motif for Opa, gave input and editing help for writing the manuscript.; Peng He, Oversaw computational approach, conducted normalization of ATAC-seq data for visualization of individual loci, ATAC-seq peak calling, and nucleosome signature, analyzed data, gave input and editing help for writing the 
manuscript.; Lior Pachter, J Peter Gergen, Gave input and editing help for writing the manuscript.; Angelike Stathopoulos, Conceived the project and planned the experimental approach, directed the project, analyzed data, wrote manuscript with input and editing help from FG, YI, PH, LP and JPG. \n\nData availability: GEO accession number SuperSeries GSE153329. SubSeries: ChIP-seq and single-end ATAC-seq (GSE140722), and RNA-seq and paired-end ATAC-seq data access (GSE153328). The code for RNA-seq, Opa ChIP-seq and ATAC-seq processing (alignment and peak calling) was uploaded to GitHub: https://github.com/caltech-bioinformatics-resource-center/Stathopoulos_Lab (copy archived at https://github.com/elifesciences-publications/Stathopoulos_Lab).\n\nPublished - elife-59610-v3.pdf
Submitted - 853028.full.pdf
Supplemental Material - elife-59610-supp-v2.zip
Supplemental Material - elife-59610-transrepform-v3.docx
Supplemental Material - elife-59610-video1.mp4
", "abstract": "Pioneer factors such as Zelda (Zld) help initiate zygotic transcription in Drosophila early embryos, but whether other factors support this dynamic process is unclear. Odd-paired (Opa), a zinc-finger transcription factor expressed at cellularization, controls the transition of genes from pair-rule to segmental patterns along the anterior-posterior axis. Finding that Opa also regulates expression through enhancer sog_Distal along the dorso-ventral axis, we hypothesized Opa's role is more general. Chromatin-immunoprecipitation (ChIP-seq) confirmed its in vivo binding to sog_Distal but also identified widespread binding throughout the genome, comparable to Zld. Furthermore, chromatin assays (ATAC-seq) demonstrate that Opa, like Zld, influences chromatin accessibility genome-wide at cellularization, suggesting both are pioneer factors with common as well as distinct targets. Lastly, embryos lacking opa exhibit widespread, late patterning defects spanning both axes. Collectively, these data suggest Opa is a general timing factor and likely late-acting pioneer factor that drives a secondary wave of zygotic gene expression.", "date": "2020-07-23", "date_type": "published", "publication": "eLife", "volume": "9", "publisher": "eLife Sciences Publications", "pagerange": "Art. No. 
e59610", "id_number": "CaltechAUTHORS:20191125-141648000", "issn": "2050-084X", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20191125-141648000", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NIH", "grant_number": "R35GM118146" }, { "agency": "NIH", "grant_number": "R03HD097535" }, { "agency": "Caltech Beckman Institute" }, { "agency": "Stony Brook University" } ] }, "local_group": { "items": [ { "id": "Millard-and-Muriel-Jacobs-Genetics-and-Genomics-Laboratory" }, { "id": "Division-of-Biology-and-Biological-Engineering" } ] }, "doi": "10.7554/eLife.59610", "pmcid": "PMC7417190", "primary_object": { "basename": "853028.full.pdf", "url": "https://authors.library.caltech.edu/records/nn6dw-xb480/files/853028.full.pdf" }, "related_objects": [ { "basename": "elife-59610-supp-v2.zip", "url": "https://authors.library.caltech.edu/records/nn6dw-xb480/files/elife-59610-supp-v2.zip" }, { "basename": "elife-59610-transrepform-v3.docx", "url": "https://authors.library.caltech.edu/records/nn6dw-xb480/files/elife-59610-transrepform-v3.docx" }, { "basename": "elife-59610-v3.pdf", "url": "https://authors.library.caltech.edu/records/nn6dw-xb480/files/elife-59610-v3.pdf" }, { "basename": "elife-59610-video1.mp4", "url": "https://authors.library.caltech.edu/records/nn6dw-xb480/files/elife-59610-video1.mp4" } ], "resource_type": "article", "pub_year": "2020", "author_list": "Koromila, Theodora; Gao, Fan; et al." 
}, { "id": "https://authors.library.caltech.edu/records/zc0e4-j1959", "eprint_id": 103638, "eprint_status": "archive", "datestamp": "2023-08-19 21:45:45", "lastmod": "2023-12-22 23:16:09", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Mao-Shunfu", "name": { "family": "Mao", "given": "Shunfu" }, "orcid": "0000-0002-8203-0507" }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" }, { "id": "Tse-David-N", "name": { "family": "Tse", "given": "David" } }, { "id": "Kannan-S", "name": { "family": "Kannan", "given": "Sreeram" } } ] }, "title": "RefShannon: A genome-guided transcriptome assembler using sparse flow decomposition", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2020 Mao et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. \n\nReceived: October 18, 2019; Accepted: April 24, 2020; Published: June 2, 2020. \n\nThe authors would like to thank Joseph Hui and Kayvon Mazooji for their support at the initial stage of the project. \n\nData Availability Statement: All relevant data are within the manuscript and its Supporting Information files. \n\nThis project is funded by NIH award 1R01HG008164, NSF CCF-1651236, and NSF CIF-1703403. \n\nThe authors have declared that no competing interests exist. 
\n\nAuthor Contributions: \nConceptualization: Lior Pachter, David Tse, Sreeram Kannan.\nData curation: Sreeram Kannan.\nFormal analysis: Shunfu Mao.\nFunding acquisition: Lior Pachter, David Tse, Sreeram Kannan.\nInvestigation: Shunfu Mao, Lior Pachter, David Tse, Sreeram Kannan.\nMethodology: Shunfu Mao, Lior Pachter, David Tse, Sreeram Kannan.\nProject administration: Lior Pachter, David Tse, Sreeram Kannan.\nSoftware: Shunfu Mao, Sreeram Kannan.\nSupervision: Sreeram Kannan.\nValidation: Shunfu Mao.\nVisualization: Shunfu Mao.\nWriting \u2013 original draft: Shunfu Mao.\nWriting \u2013 review & editing: Shunfu Mao, Sreeram Kannan.\n\nPublished - journal.pone.0232946.pdf
Supplemental Material - journal.pone.0232946.s001.pdf
Supplemental Material - journal.pone.0232946.s002.pdf
Supplemental Material - journal.pone.0232946.s003.pdf
Supplemental Material - journal.pone.0232946.s004.pdf
Supplemental Material - journal.pone.0232946.s005.pdf
Supplemental Material - journal.pone.0232946.s006.pdf
Supplemental Material - journal.pone.0232946.s007.pdf
", "abstract": "High throughput sequencing of RNA (RNA-Seq) has become a staple in modern molecular biology, with applications not only in quantifying gene expression but also in isoform-level analysis of the RNA transcripts. To enable such an isoform-level analysis, a transcriptome assembly algorithm is utilized to stitch together the observed short reads into the corresponding transcripts. This task is complicated due to the complexity of alternative splicing - a mechanism by which the same gene may generate multiple distinct RNA transcripts. We develop a novel genome-guided transcriptome assembler, RefShannon, that exploits the varying abundances of the different transcripts, in enabling an accurate reconstruction of the transcripts. Our evaluation shows RefShannon is able to improve sensitivity effectively (up to 22%) at a given specificity in comparison with other state-of-the-art assemblers. RefShannon is written in Python and is available from Github (https://github.com/shunfumao/RefShannon).", "date": "2020-06-02", "date_type": "published", "publication": "PLoS ONE", "volume": "15", "number": "6", "publisher": "Public Library of Science", "pagerange": "Art. No. 
e0232946", "id_number": "CaltechAUTHORS:20200602-124021279", "issn": "1932-6203", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20200602-124021279", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NIH", "grant_number": "1R01HG008164" }, { "agency": "NSF", "grant_number": "CCF-1651236" }, { "agency": "NSF", "grant_number": "CIF-1703403" } ] }, "local_group": { "items": [ { "id": "Division-of-Biology-and-Biological-Engineering" } ] }, "doi": "10.1371/journal.pone.0232946", "pmcid": "PMC7266320", "primary_object": { "basename": "journal.pone.0232946.s002.pdf", "url": "https://authors.library.caltech.edu/records/zc0e4-j1959/files/journal.pone.0232946.s002.pdf" }, "related_objects": [ { "basename": "journal.pone.0232946.s003.pdf", "url": "https://authors.library.caltech.edu/records/zc0e4-j1959/files/journal.pone.0232946.s003.pdf" }, { "basename": "journal.pone.0232946.s004.pdf", "url": "https://authors.library.caltech.edu/records/zc0e4-j1959/files/journal.pone.0232946.s004.pdf" }, { "basename": "journal.pone.0232946.s005.pdf", "url": "https://authors.library.caltech.edu/records/zc0e4-j1959/files/journal.pone.0232946.s005.pdf" }, { "basename": "journal.pone.0232946.s006.pdf", "url": "https://authors.library.caltech.edu/records/zc0e4-j1959/files/journal.pone.0232946.s006.pdf" }, { "basename": "journal.pone.0232946.s007.pdf", "url": "https://authors.library.caltech.edu/records/zc0e4-j1959/files/journal.pone.0232946.s007.pdf" }, { "basename": "journal.pone.0232946.pdf", "url": "https://authors.library.caltech.edu/records/zc0e4-j1959/files/journal.pone.0232946.pdf" }, { "basename": "journal.pone.0232946.s001.pdf", "url": "https://authors.library.caltech.edu/records/zc0e4-j1959/files/journal.pone.0232946.s001.pdf" } ], "resource_type": "article", "pub_year": "2020", "author_list": "Mao, Shunfu; Pachter, Lior; et al." 
}, { "id": "https://authors.library.caltech.edu/records/2pyfk-v8764", "eprint_id": 97957, "eprint_status": "archive", "datestamp": "2023-08-19 21:33:43", "lastmod": "2023-12-22 23:16:40", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Svensson-Valentine", "name": { "family": "Svensson", "given": "Valentine" }, "orcid": "0000-0002-9217-2330" }, { "id": "Gayoso-Adam", "name": { "family": "Gayoso", "given": "Adam" }, "orcid": "0000-0001-9537-0845" }, { "id": "Yosef-Nir", "name": { "family": "Yosef", "given": "Nir" } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "Interpretable factor models of single-cell RNA-seq via variational autoencoders", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2020 The Author(s). Published by Oxford University Press.\nThis is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. \n\nReceived: 13 September 2019; Revision received: 03 February 2020; Accepted: 20 February 2020; Published: 16 March 2020. \n\nWe thank Eduardo da Veiga Beltrame and Romain Lopez for helpful feedback on the manuscript. Sina Booeshaghi provided useful comments on the LDVAE software. Additionally, we thank the users of scVI who provided helpful discussion about the implementation on Github.\n\nFunding:\nThis work was supported by the National Institutes of Health [U19MH114830 to V.S. and L.P.]; and the Chan Zuckerberg Foundation [CZF2019-002454 to A.G. and N.Y.].\n\nConflict of Interest: none declared.\n\nPublished - btaa169.pdf
Submitted - 737601.full.pdf
Supplemental Material - btaa169_supplementary_data.zip
", "abstract": "Motivation: Single-cell RNA-seq makes possible the investigation of variability in gene expression among cells, and dependence of variation on cell type. Statistical inference methods for such analyses must be scalable, and ideally interpretable. \n\nResults: We present an approach based on a modification of a recently published highly scalable variational autoencoder framework that provides interpretability without sacrificing much accuracy. We demonstrate that our approach enables identification of gene programs in massive datasets. Our strategy, namely the learning of factor models with the auto-encoding variational Bayes framework, is not domain specific and may be useful for other applications. \n\nAvailability and implementation: The factor model is available in the scVI package hosted at https://github.com/YosefLab/scVI/.", "date": "2020-06", "date_type": "published", "publication": "Bioinformatics", "volume": "36", "number": "11", "publisher": "Oxford University Press", "pagerange": "3418-3421", "id_number": "CaltechAUTHORS:20190816-135915873", "issn": "1367-4803", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20190816-135915873", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NIH", "grant_number": "U19MH114830" }, { "agency": "Chan Zuckerberg Foundation", "grant_number": "CZF2019-002454" } ] }, "local_group": { "items": [ { "id": "Division-of-Biology-and-Biological-Engineering" } ] }, "doi": "10.1093/bioinformatics/btaa169", "pmcid": "PMC7267837", "primary_object": { "basename": "737601.full.pdf", "url": "https://authors.library.caltech.edu/records/2pyfk-v8764/files/737601.full.pdf" }, "related_objects": [ { "basename": "btaa169.pdf", "url": "https://authors.library.caltech.edu/records/2pyfk-v8764/files/btaa169.pdf" }, { "basename": "btaa169_supplementary_data.zip", "url": 
"https://authors.library.caltech.edu/records/2pyfk-v8764/files/btaa169_supplementary_data.zip" } ], "resource_type": "article", "pub_year": "2020", "author_list": "Svensson, Valentine; Gayoso, Adam; et al." }, { "id": "https://authors.library.caltech.edu/records/akk2e-ngb36", "eprint_id": 96201, "eprint_status": "archive", "datestamp": "2023-08-19 20:00:05", "lastmod": "2023-12-22 23:16:45", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Gorin-G", "name": { "family": "Gorin", "given": "Gennady" }, "orcid": "0000-0001-6097-2029" }, { "id": "Svensson-V", "name": { "family": "Svensson", "given": "Valentine" }, "orcid": "0000-0002-9217-2330" }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "RNA velocity and protein acceleration from single-cell multiomics experiments", "ispublished": "pub", "full_text_status": "public", "keywords": "Protein acceleration, Protein velocity, RNA velocity, Transcriptomics, Multiomics, Bioinformatics,\nComputational biology", "note": "\u00a9 2020 The Author(s). Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. \n\nReceived 09 July 2019; Accepted 24 January 2020; Published 18 February 2020. \n\nWe thank the authors of Mimitou et al. [5] for providing velocyto pipeline outputs for ECCITE-seq datasets. \n\nReview history: The review history is available as Additional file 2. 
\n\nPeer review information: Barbara Cheifet was the primary editor of this article and managed its editorial process and peer review in collaboration with the rest of the editorial team. \n\nGG, VS, and LP were partially funded by NIH U19MH114830. \n\nAvailability of data and materials: CITE-seq RNA and protein data were acquired from Gene Expression Omnibus samples GSM2695381 and GSM2695382 [31]. REAP-seq RNA and protein data were acquired from GSM2685238 and GSM2685243 [32]. ECCITE-seq control protein data were acquired from GSM3596096 [33]. ECCITE-seq CTCL protein data were acquired from GSM3596101 [33]. Due to patient privacy concerns, raw ECCITE-seq RNA data (GSM3596095 and GSM3596100) were not available, and the gene count matrices generated by velocyto were acquired by personal request. 10X Genomics 1k and 10k PBMC datasets were acquired from the 10X Genomics website [20, 21]. \n\nThe datasets generated during this study are available on figshare [27,28,29,30]. The Jupyter scripts used to analyze them are available on GitHub [26]. The protaccel Python package is available for installation through PyPi [24], and may be acquired as a script from GitHub [26] or Zenodo [34] under the BSD-2-Clause license. \n\nEthics declarations: Ethics approval and consent to participate: Not applicable. \n\nConsent for publication: Not applicable. \n\nThe authors declare that they have no competing interests.\n\nPublished - s13059-020-1945-3.pdf
Submitted - 658401.full.pdf
Supplemental Material - 13059_2020_1945_MOESM1_ESM.docx
Supplemental Material - 13059_2020_1945_MOESM2_ESM.docx
", "abstract": "The simultaneous quantification of protein and RNA makes possible the inference of past, present, and future cell states from single experimental snapshots. To enable such temporal analysis from multimodal single-cell experiments, we introduce an extension of the RNA velocity method that leverages estimates of unprocessed transcript and protein abundances to extrapolate cell states. We apply the model to six datasets and demonstrate consistency among cell landscapes and phase portraits. The analysis software is available as the protaccel Python package.", "date": "2020-02-18", "date_type": "published", "publication": "Genome Biology", "volume": "21", "publisher": "BioMed Central", "pagerange": "Art. No. 39", "id_number": "CaltechAUTHORS:20190607-122759859", "issn": "1465-6906", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20190607-122759859", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NIH", "grant_number": "U19MH114830" } ] }, "local_group": { "items": [ { "id": "Division-of-Biology-and-Biological-Engineering" } ] }, "doi": "10.1186/s13059-020-1945-3", "pmcid": "PMC7029606", "primary_object": { "basename": "13059_2020_1945_MOESM1_ESM.docx", "url": "https://authors.library.caltech.edu/records/akk2e-ngb36/files/13059_2020_1945_MOESM1_ESM.docx" }, "related_objects": [ { "basename": "13059_2020_1945_MOESM2_ESM.docx", "url": "https://authors.library.caltech.edu/records/akk2e-ngb36/files/13059_2020_1945_MOESM2_ESM.docx" }, { "basename": "658401.full.pdf", "url": "https://authors.library.caltech.edu/records/akk2e-ngb36/files/658401.full.pdf" }, { "basename": "s13059-020-1945-3.pdf", "url": "https://authors.library.caltech.edu/records/akk2e-ngb36/files/s13059-020-1945-3.pdf" } ], "resource_type": "article", "pub_year": "2020", "author_list": "Gorin, Gennady; Svensson, Valentine; et al." 
}, { "id": "https://authors.library.caltech.edu/records/m07dw-3jq86", "eprint_id": 90524, "eprint_status": "archive", "datestamp": "2023-08-22 03:23:36", "lastmod": "2023-10-23 16:59:58", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Gehring-J", "name": { "family": "Gehring", "given": "Jase" }, "orcid": "0000-0002-3894-9495" }, { "id": "Park-Jong-Hwee", "name": { "family": "Park", "given": "Jong Hwee" } }, { "id": "Chen-Sisi", "name": { "family": "Chen", "given": "Sisi" }, "orcid": "0000-0001-9448-9713" }, { "id": "Thomson-M-W", "name": { "family": "Thomson", "given": "Matthew" } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "Highly multiplexed single-cell RNA-seq by DNA oligonucleotide tagging of cellular proteins", "ispublished": "pub", "full_text_status": "public", "keywords": "DNA; Molecular biology; Sequencing", "note": "\u00a9 2019 Nature Publishing Group. \n\nReceived 07 August 2018; Accepted 27 November 2019; Published 23 December 2019. \n\nWe thank Z. Gartner and C. McGinnis for helpful feedback regarding the ClickTag protocol and V. Svensson for suggestions regarding analysis of multiplexed datasets. Thanks to P. Melsted and S. Booeshaghi for developing the 'kallisto | bustools' functions used in the preprocessing workflow and to P. Rivaud for assistance with 10x data processing. Additional support was provided by the Caltech Bioinformatics Resource Center and the Single Cell Profiling and Engineering Center (SPEC) in the Beckman Institute at Caltech. \n\nData availability: Sequencing data from these experiments can be obtained from CaltechDATA at https://doi.org/10.22002/D1.1311. \n\nCode availability: Code and tutorials for the kITE demultiplexing workflow can be found at https://www.kallistobus.tools/kite_tutorial.html. 
Python notebooks used to process data and generate figures are available on GitHub at https://github.com/pachterlab/GPCTP_2019. The same GitHub repository also contains a fully reproducible reanalysis using 'kallisto | bustools' transcript alignments and a Google Colab notebook. \n\nAuthor Contributions: J.G. conceived and developed the ClickTag multiplexing strategy. J.G., J.H.P. and S.C. designed the scRNA-seq experiments and J.G. and J.H.P. performed the experiments. J.H.P. performed all tissue culture operations and J.G. developed the kITE demultiplexing workflow and analyzed the scRNA-seq data. J.G., J.H.P., S.C, M.T. and L.P. contributed to the interpretation of the results and writing of the manuscript. \n\nCompeting interests: J.G., L.P., S.C. and J.H.P. are listed as co-inventors on a patent application related to this work (US patent application 16/296,075).\n\nSubmitted - 315333.full.pdf
Supplemental Material - 41587_2019_372_MOESM1_ESM.pdf
Supplemental Material - 41587_2019_372_MOESM2_ESM.pdf
Supplemental Material - 41587_2019_372_MOESM3_ESM.xlsx
Supplemental Material - 41587_2019_372_MOESM4_ESM.xlsx
", "abstract": "We describe a universal sample multiplexing method for single-cell RNA sequencing in which fixed cells are chemically labeled by attaching identifying DNA oligonucleotides to cellular proteins. Analysis of a 96-plex perturbation experiment revealed changes in cell population structure and transcriptional states that cannot be discerned from bulk measurements, establishing an efficient method for surveying cell populations from large experiments or clinical samples with the depth and resolution of single-cell RNA sequencing.", "date": "2020-01", "date_type": "published", "publication": "Nature Biotechnology", "volume": "38", "number": "1", "publisher": "Nature Publishing Group", "pagerange": "35-38", "id_number": "CaltechAUTHORS:20181030-145533155", "issn": "1087-0156", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20181030-145533155", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "Caltech Beckman Institute" } ] }, "doi": "10.1038/s41587-019-0372-z", "primary_object": { "basename": "315333.full.pdf", "url": "https://authors.library.caltech.edu/records/m07dw-3jq86/files/315333.full.pdf" }, "related_objects": [ { "basename": "41587_2019_372_MOESM1_ESM.pdf", "url": "https://authors.library.caltech.edu/records/m07dw-3jq86/files/41587_2019_372_MOESM1_ESM.pdf" }, { "basename": "41587_2019_372_MOESM2_ESM.pdf", "url": "https://authors.library.caltech.edu/records/m07dw-3jq86/files/41587_2019_372_MOESM2_ESM.pdf" }, { "basename": "41587_2019_372_MOESM3_ESM.xlsx", "url": "https://authors.library.caltech.edu/records/m07dw-3jq86/files/41587_2019_372_MOESM3_ESM.xlsx" }, { "basename": "41587_2019_372_MOESM4_ESM.xlsx", "url": "https://authors.library.caltech.edu/records/m07dw-3jq86/files/41587_2019_372_MOESM4_ESM.xlsx" } ], "resource_type": "article", "pub_year": "2020", "author_list": "Gehring, Jase; Park, Jong Hwee; et al." 
}, { "id": "https://authors.library.caltech.edu/records/x39ya-h3a68", "eprint_id": 100522, "eprint_status": "archive", "datestamp": "2023-08-19 18:56:48", "lastmod": "2023-10-18 20:58:54", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Gao-Fan", "name": { "family": "Gao", "given": "Fan" } }, { "id": "da-Veiga-Beltrame-E", "name": { "family": "da Veiga Beltrame", "given": "Eduardo" }, "orcid": "0000-0002-1529-9207" }, { "id": "Gehring-J-A", "name": { "family": "Gehring", "given": "Jase A." } }, { "id": "Hjoerleifsson-K-E-E", "name": { "family": "Hjoerleifsson", "given": "Kristin E. Edljarn" } }, { "id": "Lu-Lambda", "name": { "family": "Lu", "given": "Lambda" }, "orcid": "0000-0002-7092-9427" }, { "id": "Melsted-P", "name": { "family": "Melsted", "given": "Paull" }, "orcid": "0000-0002-8418-6724" }, { "id": "Ntranos-V", "name": { "family": "Ntranos", "given": "Vasilis" }, "orcid": "0000-0002-2477-0670" }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" }, { "id": "Svensson-V", "name": { "family": "Svensson", "given": "Valentine" }, "orcid": "0000-0002-9217-2330" } ] }, "title": "The BUS Format for Single-Cell RNA-Seq Processing and Analysis", "ispublished": "pub", "full_text_status": "restricted", "note": "\u00a9 2019 Association of Biomolecular Resource Facilities. \n\nPresenter: Fan Gao. \n\nThis is joint work with Eduardo da Veiga Beltrame, Jase A. Gehring, Kristj\u00e1n E. Edljarn Hjoerleifsson, Lambda Lu, Paull Melsted, Vasilis Ntranos, Lior Pachter, and Valentine Svensson.", "abstract": "The Barcode-UMI-Set format (BUS) is a recently developed format for representing pseudoalignments of reads from single-cell RNA-seq experiments. The format can be used with most single-cell RNA-seq technologies, can be generated efficiently, and allows for development of modular and robust workflows for processing and analysis of single-cell RNA-seq reads. 
To demonstrate the utility of BUS, we processed 381,992,071 single-cell RNA-Seq reads from a 1:1 mixture of fresh frozen human cells (HEK293T) and mouse cells (NIH3T3) produced with 10x technology and hosted on the 10x Genomics website. The generation of BUS format using a new command in the kallisto program took 984 seconds for this data (in comparison with 55,745 seconds with the 10x Genomics CellRanger software). I will present results showing that this workflow not only produces comparable results to the existing standard workflow, but is flexible and useful for many other applications.", "date": "2019-12", "date_type": "published", "publication": "Journal of Biomolecular Techniques", "volume": "30", "number": "S1", "publisher": "Association of Biomolecular Resource Facilities", "pagerange": "S62", "id_number": "CaltechAUTHORS:20200106-081503232", "issn": "1524-0215", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20200106-081503232", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "pmcid": "PMC6938108", "resource_type": "article", "pub_year": "2019", "author_list": "Gao, Fan; da Veiga Beltrame, Eduardo; et al." }, { "id": "https://authors.library.caltech.edu/records/jk166-h3s87", "eprint_id": 91273, "eprint_status": "archive", "datestamp": "2023-08-19 18:37:24", "lastmod": "2023-10-20 22:15:19", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Melsted-P", "name": { "family": "Melsted", "given": "P\u00e1ll" }, "orcid": "0000-0002-8418-6724" }, { "id": "Ntranos-V", "name": { "family": "Ntranos", "given": "Vasilis" }, "orcid": "0000-0002-2477-0670" }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "Barcode, UMI, Set format and BUStools", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2019 The Author(s). 
Published by Oxford University Press.\nThis article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model) \n\nReceived: 26 November 2018; Revision Received: 15 February 2019; Accepted: 11 April 2019; Published: 09 May 2019. \n\nWe thank Fan Gao for helping with the benchmarking of 'kallisto bus' and BUStools. Valentine Svensson provided valuable suggestions, and we relied on his compilation of scRNA-seq read encodings (Svensson et al. 2017). Jase Gehring, Lynn Yi and Tina Wang provided valuable feedback on an initial kallisto-based scRNA-seq workflow, which motivated the development of the BUS format.\n\nPublished - btz279.pdf
Submitted - 472571.full.pdf
Supplemental Material - btz279_supplmentary.pdf
", "abstract": "We introduce the Barcode-UMI-Set format (BUS) for representing pseudoalignments of reads from single-cell RNA-seq experiments. The format can be used with all single-cell RNA-seq technologies, and we show that BUS files can be efficiently generated. BUStools is a suite of tools for working with BUS files and facilitates rapid quantification and analysis of single-cell RNA-seq data. The BUS format therefore makes possible the development of modular, technology-specific and robust workflows for single-cell RNA-seq analysis.", "date": "2019-11-01", "date_type": "published", "publication": "Bioinformatics", "volume": "35", "number": "21", "publisher": "Oxford University Press", "pagerange": "4472-4473", "id_number": "CaltechAUTHORS:20181128-093526289", "issn": "1367-4803", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20181128-093526289", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "doi": "10.1093/bioinformatics/btz279", "primary_object": { "basename": "btz279_supplmentary.pdf", "url": "https://authors.library.caltech.edu/records/jk166-h3s87/files/btz279_supplmentary.pdf" }, "related_objects": [ { "basename": "472571.full.pdf", "url": "https://authors.library.caltech.edu/records/jk166-h3s87/files/472571.full.pdf" }, { "basename": "btz279.pdf", "url": "https://authors.library.caltech.edu/records/jk166-h3s87/files/btz279.pdf" } ], "resource_type": "article", "pub_year": "2019", "author_list": "Melsted, P\u00e1ll; Ntranos, Vasilis; et al." 
}, { "id": "https://authors.library.caltech.edu/records/e3g77-5qr43", "eprint_id": 99325, "eprint_status": "archive", "datestamp": "2023-08-22 02:44:08", "lastmod": "2023-10-18 18:13:57", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Kim-Dong-Wook", "name": { "family": "Kim", "given": "Dong-Wook" }, "orcid": "0000-0002-5497-5853" }, { "id": "Yao-Zizhen", "name": { "family": "Yao", "given": "Zizhen" }, "orcid": "0000-0002-9361-5607" }, { "id": "Graybuck-Lucas-T", "name": { "family": "Graybuck", "given": "Lucas T." }, "orcid": "0000-0002-8814-6818" }, { "id": "Kim-Tae-Kyung", "name": { "family": "Kim", "given": "Tae Kyung" } }, { "id": "Nguyen-Thuc-Nghi", "name": { "family": "Nguyen", "given": "Thuc Nghi" } }, { "id": "Smith-Kimberly-A", "name": { "family": "Smith", "given": "Kimberly A." } }, { "id": "Fong-Olivia", "name": { "family": "Fong", "given": "Olivia" } }, { "id": "Yi-Lynn", "name": { "family": "Yi", "given": "Lynn" }, "orcid": "0000-0003-4575-0158" }, { "id": "Koulena-Noushin", "name": { "family": "Koulena", "given": "Noushin" }, "orcid": "0000-0002-9419-5712" }, { "id": "Pierson-Nico-G", "name": { "family": "Pierson", "given": "Nico" }, "orcid": "0000-0002-2451-0633" }, { "id": "Shah-Sheel", "name": { "family": "Shah", "given": "Sheel" } }, { "id": "Lo-Liching", "name": { "family": "Lo", "given": "Liching" } }, { "id": "Pool-Allan-Hermann", "name": { "family": "Pool", "given": "Allan-Hermann" }, "orcid": "0000-0002-0811-9861" }, { "id": "Oka-Yuki", "name": { "family": "Oka", "given": "Yuki" } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" }, { "id": "Cai-Long", "name": { "family": "Cai", "given": "Long" }, "orcid": "0000-0002-7154-5361" }, { "id": "Tasic-Bosiljka", "name": { "family": "Tasic", "given": "Bosiljka" }, "orcid": "0000-0002-6861-4506" }, { "id": "Zeng-Hongkui", "name": { "family": "Zeng", "given": "Hongkui" }, "orcid": "0000-0002-0326-5878" }, { 
"id": "Anderson-D-J", "name": { "family": "Anderson", "given": "David J." }, "orcid": "0000-0001-6175-3872" } ] }, "title": "Multimodal Analysis of Cell Types in a Hypothalamic Node Controlling Social Behavior", "ispublished": "pub", "full_text_status": "public", "keywords": "hypothalamus; cell types; VMH; estrogen receptor; social behavior; single-cell RNA sequencing; aggression; mating; sexual dimorphism; metabolism", "note": "\u00a9 2019 Elsevier Inc. \n\nReceived 15 January 2019, Revised 28 July 2019, Accepted 20 September 2019, Available online 17 October 2019. \n\nWe thank A. Jones and C. Koch for support at the Allen Institute for Brain Sciences during the writing of this manuscript, J.-S. Chang for cell counting, Y. Huang for genotyping, G. Mancuso for administrative assistance, C. Chiu for lab management, S. Diamond for assistance with FACS, the Single Cell Profiling and Engineering Center (SPEC) in the Beckman Institute at Caltech for initial help for 10x scRNA-seq experiments, S. Pease for assistance with transgenic mouse strains, J. Costanza for mouse colony management, members of the Anderson laboratory for helpful comments on this project, and an anonymous reviewer for suggesting social fear testing in group-housed mice. This work was supported by US National Institutes of Health (NIH) BRAIN Initiative grants U01MH105982 and U19MH114830 to H.Z. and D.J.A. and NIH grants MH070053 and TR01 OD024686 to D.J.A. and L.C., respectively. D.-W.K. was supported by a Howard Hughes Medical Institute International Student Research Fellowship. D.J.A. is an investigator of the Howard Hughes Medical Institute. \n\nAuthor Contributions: D.-W.K., B.T., H.Z., and D.J.A. contributed to the study design. D.-W.K. performed most of the experiments. T.K.K. and K.A.S. prepared sequencing libraries for SMART-seq scRNA-seq. L.T.G and O.F. contributed data visualization. D.-W.K. and T.N.N performed Retro-seq experiments. D.-W.K., Z.Y, L.T.G, L.Y., and L.P. 
analyzed the scRNA-seq data. D.-W.K. and N.K. performed seqFISH experiments. D.-W.K., N.P., S.S., and L.C. analyzed the seqFISH data. D.-W.K. and L.L. performed retrograde labeling with c-fos immunohistochemistry experiments. A.-H.P. and Y.O. developed the tissue preparation protocols for 10x Act-seq. D.J.A. supervised the project. D.-W.K. and D.J.A. wrote the manuscript with contributions from B.T. and H.Z. All authors discussed and commented on the manuscript. \n\nThe authors declare no competing interests.\n\nAccepted Version - nihms-1629366.pdf
Supplemental Material - 1-s2.0-S0092867419310712-mmc1.xlsx
Supplemental Material - 1-s2.0-S0092867419310712-mmc2.xlsx
Supplemental Material - 1-s2.0-S0092867419310712-mmc3.xlsx
", "abstract": "The ventrolateral subdivision of the ventromedial hypothalamus (VMHvl) contains \u223c4,000 neurons that project to multiple targets and control innate social behaviors including aggression and mounting. However, the number of cell types in VMHvl and their relationship to connectivity and behavioral function are unknown. We performed single-cell RNA sequencing using two independent platforms\u2014SMART-seq (\u223c4,500 neurons) and 10x (\u223c78,000 neurons)\u2014and investigated correspondence between transcriptomic identity and axonal projections or behavioral activation, respectively. Canonical correlation analysis (CCA) identified 17 transcriptomic types (T-types), including several sexually dimorphic clusters, the majority of which were validated by seqFISH. Immediate early gene analysis identified T-types exhibiting preferential responses to intruder males versus females but only rare examples of behavior-specific activation. Unexpectedly, many VMHvl T-types comprise a mixed population of neurons with different projection target preferences. 
Overall our analysis revealed that, surprisingly, few VMHvl T-types exhibit a clear correspondence with behavior-specific activation and connectivity.", "date": "2019-10-17", "date_type": "published", "publication": "Cell", "volume": "179", "number": "3", "publisher": "Cell Press", "pagerange": "713-728", "id_number": "CaltechAUTHORS:20191017-094121433", "issn": "0092-8674", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20191017-094121433", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NIH", "grant_number": "U01MH105982" }, { "agency": "NIH", "grant_number": "U19MH114830" }, { "agency": "NIH", "grant_number": "MH070053" }, { "agency": "NIH", "grant_number": "TR01 OD024686" }, { "agency": "Howard Hughes Medical Institute (HHMI)" } ] }, "local_group": { "items": [ { "id": "Tianqiao-and-Chrissy-Chen-Institute-for-Neuroscience" } ] }, "doi": "10.1016/j.cell.2019.09.020", "pmcid": "PMC7534821", "primary_object": { "basename": "1-s2.0-S0092867419310712-mmc1.xlsx", "url": "https://authors.library.caltech.edu/records/e3g77-5qr43/files/1-s2.0-S0092867419310712-mmc1.xlsx" }, "related_objects": [ { "basename": "1-s2.0-S0092867419310712-mmc2.xlsx", "url": "https://authors.library.caltech.edu/records/e3g77-5qr43/files/1-s2.0-S0092867419310712-mmc2.xlsx" }, { "basename": "1-s2.0-S0092867419310712-mmc3.xlsx", "url": "https://authors.library.caltech.edu/records/e3g77-5qr43/files/1-s2.0-S0092867419310712-mmc3.xlsx" }, { "basename": "nihms-1629366.pdf", "url": "https://authors.library.caltech.edu/records/e3g77-5qr43/files/nihms-1629366.pdf" } ], "resource_type": "article", "pub_year": "2019", "author_list": "Kim, Dong-Wook; Yao, Zizhen; et el." 
}, { "id": "https://authors.library.caltech.edu/records/gyqah-rvc67", "eprint_id": 101174, "eprint_status": "archive", "datestamp": "2023-08-19 18:09:11", "lastmod": "2023-10-19 22:30:24", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Wright-M", "name": { "family": "Wright", "given": "Matthew" } }, { "id": "Goin-D-E", "name": { "family": "Goin", "given": "Dana" } }, { "id": "Smed-M-K", "name": { "family": "Smed", "given": "Mette" } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" }, { "id": "Nelson-J-L", "name": { "family": "Nelson", "given": "J. Lee" } }, { "id": "Jewell-N-P", "name": { "family": "Jewell", "given": "Nicholas" } }, { "id": "Olsen-J", "name": { "family": "Olsen", "given": "J\u00f8rn" } }, { "id": "Hetland-M-L", "name": { "family": "Hetland", "given": "Merete Lund" } }, { "id": "Zoffmann-V", "name": { "family": "Zoffmann", "given": "Vibeke" } }, { "id": "Jawaheer-D", "name": { "family": "Jawaheer", "given": "Damini" } } ] }, "title": "Investigating the Post-Partum Flare in Rheumatoid Arthritis Using Transcriptome Analysis", "ispublished": "pub", "full_text_status": "restricted", "note": "\u00a9 2019 American College of Rheumatology. \n\nIssue Online: 29 October 2019; Version of Record online: 29 October 2019. \n\nDisclosure: M. Wright, None; D. Goin, None; M. Smed, None; L. Pachter, None; J. Nelson, None; N. Jewell, None; J. Olsen, None; M. Lund Hetland, Abbvie, 2, AbbVie, 2, Biogen, 2, BMS, 2, CellTrion, 2, 9, MSD, 2, Novartis, 2, Orion, 2, Pfizer, 2, Samsung, 2, UCB, 2; V. Zoffmann, None; D. Jawaheer, None.", "abstract": "Women with Rheumatoid arthritis (RA) tend to have a predictable flare of disease activity in the months after childbirth. The mechanism(s) underlying this post-partum flare are as yet unknown. 
Using our pregnancy cohort, we (a) examined gene expression changes associated with a flare of RA disease activity post-partum, (b) determined how those changes compare to post-partum changes observed among healthy women, and (c) examined whether expression profiles by 3 months post-partum differed from those before pregnancy.", "date": "2019-10", "date_type": "published", "publication": "Arthritis and Rheumatology", "volume": "71", "number": "S10", "publisher": "Wiley", "pagerange": "Art. No. 1940", "id_number": "CaltechAUTHORS:20200206-130336877", "issn": "2326-5191", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20200206-130336877", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "doi": "10.1002/art.41108", "resource_type": "article", "pub_year": "2019", "author_list": "Wright, Matthew; Goin, Dana; et al." }, { "id": "https://authors.library.caltech.edu/records/rcg7y-rez41", "eprint_id": 101173, "eprint_status": "archive", "datestamp": "2023-08-19 18:09:04", "lastmod": "2023-10-19 22:30:19", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Pathi-A", "name": { "family": "Pathi", "given": "Amogh" } }, { "id": "Smed-M-K", "name": { "family": "Smed", "given": "Mette" } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" }, { "id": "Purdom-E", "name": { "family": "Purdom", "given": "Elizabeth" } }, { "id": "Wright-M", "name": { "family": "Wright", "given": "Matthew" } }, { "id": "Jewell-N-P", "name": { "family": "Jewell", "given": "Nicholas" } }, { "id": "Nelson-J-L", "name": { "family": "Nelson", "given": "J. 
Lee" } }, { "id": "Olsen-J", "name": { "family": "Olsen", "given": "J\u00f8rn" } }, { "id": "Hetland-M-L", "name": { "family": "Hetland", "given": "Merete Lund" } }, { "id": "Zoffmann-V", "name": { "family": "Zoffmann", "given": "Vibeke" } }, { "id": "Jawaheer-D", "name": { "family": "Jawaheer", "given": "Damini" } } ] }, "title": "The Pre-pregnancy Rheumatoid Arthritis Gene Expression Signature Correlates with Improvement or Worsening of Disease Activity During Pregnancy: A Pilot Study", "ispublished": "pub", "full_text_status": "restricted", "note": "\u00a9 2019 American College of Rheumatology. \n\nIssue Online: 29 October 2019; Version of Record online: 29 October 2019. \n\nDisclosure: A. Pathi, None; M. Smed, None; L. Pachter, None; E. Purdom, None; M. Wright, None; N. Jewell, None; J. Nelson, None; J. Olsen, None; M. Lund Hetland, Abbvie, 2, AbbVie, 2, Biogen, 2, BMS, 2, CellTrion, 2, 9, MSD, 2, Novartis, 2, Orion, 2, Pfizer, 2, Samsung, 2, UCB, 2; V. Zoffmann, None; D. Jawaheer, None.", "abstract": "Pregnancy is known to induce a natural improvement of Rheumatoid Arthritis (RA) symptoms in 50-75% of patients as gestation progresses. However, the underlying mechanisms are not well understood and no biomarkers have been identified that predict whether a woman will improve or worsen during pregnancy. In this study, we aimed to identify RA-associated pre-pregnancy gene expression signatures to determine if they correlated with the subsequent improvement or worsening of RA during pregnancy.", "date": "2019-10", "date_type": "published", "publication": "Arthritis and Rheumatology", "volume": "71", "number": "S10", "publisher": "Wiley", "pagerange": "Art. No. 
1938", "id_number": "CaltechAUTHORS:20200206-125453251", "issn": "2326-5191", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20200206-125453251", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "doi": "10.1002/art.41108", "resource_type": "article", "pub_year": "2019", "author_list": "Pathi, Amogh; Smed, Mette; et al." }, { "id": "https://authors.library.caltech.edu/records/b2hv9-6t971", "eprint_id": 96223, "eprint_status": "archive", "datestamp": "2023-08-22 02:28:00", "lastmod": "2023-10-20 20:59:47", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "McCurdy-S-R", "name": { "family": "McCurdy", "given": "Shannon" }, "orcid": "0000-0001-5555-4156" }, { "id": "Molinaro-A-M", "name": { "family": "Molinaro", "given": "Annette" }, "orcid": "0000-0002-9854-7404" }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "Factor analysis for survival time prediction with informative censoring and diverse covariates", "ispublished": "pub", "full_text_status": "public", "keywords": "diffuse lower\u2010grade glioma; exponential proportional hazards; factor analysis; glioblastoma multiforme; informative censoring; integrative models; latent variables; lung adenocarcinoma; lung squamous cell carcinoma", "note": "\u00a9 2019 John Wiley & Sons, Ltd. \n\nVersion of Record online: 04 June 2019; Manuscript accepted: 03 March 2019; Manuscript revised: 15 January 2019; Manuscript received: 23 January 2018. \n\nFunding Information: National Human Genome Research Institute of the National Institutes of Health. Grant Number: F32HG008713.\n\nSupplemental Material - sim_8151-supp-0001-survival_suppmat.pdf
", "abstract": "Fulfilling the promise of precision medicine requires accurately and precisely classifying disease states. For cancer, this includes prediction of survival time from a surfeit of covariates. Such data presents an opportunity for improved prediction, but also a challenge due to high dimensionality. Furthermore, disease populations can be heterogeneous. Integrative modeling is sensible, as the underlying hypothesis is that joint analysis of multiple covariates provides greater explanatory power than separate analyses. We propose an integrative latent variable model that combines factor analysis for various data types and an exponential proportional hazards (EPH) model for continuous survival time with informative censoring. The factor and EPH models are connected through low\u2010dimensional latent variables that can be interpreted and visualized to identify subpopulations. We use this model to predict survival time. We demonstrate this model's utility in simulation and on four Cancer Genome Atlas datasets: diffuse lower\u2010grade glioma, glioblastoma multiforme, lung adenocarcinoma, and lung squamous cell carcinoma. These datasets have small sample sizes, high\u2010dimensional diverse covariates, and high censorship rates. We compare the predictions from our model to three alternative models. Our model outperforms in simulation and is competitive on real datasets. 
Furthermore, the low\u2010dimensional visualization for diffuse lower\u2010grade glioma displays known subpopulations.", "date": "2019-09-10", "date_type": "published", "publication": "Statistics in Medicine", "volume": "38", "number": "20", "publisher": "Wiley", "pagerange": "3719-3732", "id_number": "CaltechAUTHORS:20190610-075805993", "issn": "0277-6715", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20190610-075805993", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NIH Postdoctoral Fellowship", "grant_number": "F32HG008713" } ] }, "doi": "10.1002/sim.8151", "primary_object": { "basename": "sim_8151-supp-0001-survival_suppmat.pdf", "url": "https://authors.library.caltech.edu/records/b2hv9-6t971/files/sim_8151-supp-0001-survival_suppmat.pdf" }, "resource_type": "article", "pub_year": "2019", "author_list": "McCurdy, Shannon; Molinaro, Annette; et al." }, { "id": "https://authors.library.caltech.edu/records/f2956-nyy34", "eprint_id": 98267, "eprint_status": "archive", "datestamp": "2023-08-22 02:18:50", "lastmod": "2024-01-18 17:26:18", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Booeshaghi-A-S", "name": { "family": "Booeshaghi", "given": "A. 
Sina" }, "orcid": "0000-0002-6442-4502" }, { "id": "da-Veiga-Beltrame-E", "name": { "family": "da Veiga Beltrame", "given": "Eduardo" }, "orcid": "0000-0002-1529-9207" }, { "id": "Bannon-D", "name": { "family": "Bannon", "given": "Dylan" } }, { "id": "Gehring-J", "name": { "family": "Gehring", "given": "Jase" }, "orcid": "0000-0002-3894-9495" }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "Principles of open source bioinstrumentation applied to the poseidon syringe pump system", "ispublished": "pub", "full_text_status": "public", "keywords": "Lab-on-a-chip; Mechanical engineering", "note": "\u00a9 2019 The Author(s). This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. \n\nReceived 07 February 2019; Accepted 08 August 2019; Published 27 August 2019. \n\nData Availability: Testing data is available at https://github.com/pachterlab/poseidon. \n\nWe thank Nicolas Bray and Kersh Theva for testing prototypes of the poseidon system and for valuable feedback. Thanks to Shannon Hateley for initial help with 3D printing and Zaid Adel Zayyad for designing the icons in Fig. 5. \n\nAuthor Contributions: J.G. 
conceived of the project and developed the initial design for the syringe pumps. A.S.B. designed the syringe pump system and microscope, and implemented the poseidon software. E.V.B. helped with the design of the poseidon system and oversaw hardware printing and design. A.S.B. and E.V.B. tested the poseidon system. J.G., A.S.B. and E.V.B. formulated the design principles. D.B. developed an initial version of the software. A.S.B., E.V.B., J.G. and L.P. wrote the manuscript. \n\nThe authors declare no competing interests.\n\nPublished - s41598-019-48815-9.pdf
Submitted - 521096v1.full.pdf
Supplemental Material - 41598_2019_48815_MOESM1_ESM.pdf
", "abstract": "The poseidon syringe pump and microscope system is an open source alternative to commercial systems. It costs less than $400 and can be assembled in under an hour using the instructions and source files available at https://pachterlab.github.io/poseidon. We describe the poseidon system and use it to illustrate design principles that can facilitate the adoption and development of open source bioinstruments. The principles are functionality, robustness, safety, simplicity, modularity, benchmarking, and documentation.", "date": "2019-08-27", "date_type": "published", "publication": "Scientific Reports", "volume": "9", "publisher": "Nature Publishing Group", "pagerange": "12385", "id_number": "CaltechAUTHORS:20190827-103540654", "issn": "2045-2322", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20190827-103540654", "doi": "10.1038/s41598-019-48815-9", "pmcid": "PMC6711986", "primary_object": { "basename": "s41598-019-48815-9.pdf", "url": "https://authors.library.caltech.edu/records/f2956-nyy34/files/s41598-019-48815-9.pdf" }, "related_objects": [ { "basename": "41598_2019_48815_MOESM1_ESM.pdf", "url": "https://authors.library.caltech.edu/records/f2956-nyy34/files/41598_2019_48815_MOESM1_ESM.pdf" }, { "basename": "521096v1.full.pdf", "url": "https://authors.library.caltech.edu/records/f2956-nyy34/files/521096v1.full.pdf" } ], "resource_type": "article", "pub_year": "2019", "author_list": "Booeshaghi, A. Sina; da Veiga Beltrame, Eduardo; et el." 
}, { "id": "https://authors.library.caltech.edu/records/cpbyk-8yg76", "eprint_id": 92421, "eprint_status": "archive", "datestamp": "2023-08-22 00:48:15", "lastmod": "2023-10-20 15:40:54", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Ntranos-V", "name": { "family": "Ntranos", "given": "Vasilis" }, "orcid": "0000-0002-2477-0670" }, { "id": "Yi-Lynn", "name": { "family": "Yi", "given": "Lynn" }, "orcid": "0000-0003-4575-0158" }, { "id": "Melsted-P", "name": { "family": "Melsted", "given": "P\u00e1ll" }, "orcid": "0000-0002-8418-6724" }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "A discriminative learning approach to differential expression analysis for single-cell RNA-seq", "ispublished": "pub", "full_text_status": "public", "keywords": "Computational biology and bioinformatics; Gene expression; Sequencing; Statistical methods", "note": "\u00a9 2019 Springer Nature Publishing AG. \n\nReceived 06 July 2018; Accepted 13 December 2018; Published\n21 January 2019. \n\nCode availability: The code required to conduct the simulations and reproduce the analyses is available at https://github.com/pachterlab/NYMP_2018. We also have provided the Github repository that was zipped at the time of manuscript acceptance as Supplementary Software. \n\nData availability: The myogenesis dataset (Trapnell et al.(10)) is available on the conquer database and on GEO as series GSE52529. The dataset on embryogenesis is available on the conquer database (Petropoulos et al.(22). The 10x PBMC dataset is available from the 10x Genomics Support website(19). \n\nWe thank N. Bray, J. Gehring and V. Svensson for discussion and comments on the manuscript, and H. Pimentel for assisting with the simulations. We thank A. Butler and R. Satija for implementing this method in Seurat. V.N., L.Y. and L.P. are partially funded by NIH R012017-0569. \n\nAuthor Contributions: V.N. 
developed the model during discussions with L.Y. and L.P., and analyzed the 10x PBMC dataset. L.Y. performed the simulations and analyzed the embryo SMART-Seq dataset. P.M. developed kallisto genomebam and assisted with analysis. All authors contributed extensively to the interpretation of the results and writing of the manuscript. \n\nThe authors declare no competing interests.\n\nSupplemental Material - 41592_2018_303_MOESM1_ESM.pdf
Supplemental Material - 41592_2018_303_MOESM2_ESM.pdf
Supplemental Material - 41592_2018_303_MOESM3_ESM.zip
", "abstract": "Single-cell RNA-seq makes it possible to characterize the transcriptomes of cell types across different conditions and to identify their transcriptional signatures via differential analysis. Our method detects changes in transcript dynamics and in overall gene abundance in large numbers of cells to determine differential expression. When applied to transcript compatibility counts obtained via pseudoalignment, our approach provides a quantification-free analysis of 3\u2032 single-cell RNA-seq that can identify previously undetectable marker genes.", "date": "2019-02", "date_type": "published", "publication": "Nature Methods", "volume": "16", "number": "2", "publisher": "Nature Publishing Group", "pagerange": "163-166", "id_number": "CaltechAUTHORS:20190123-095919155", "issn": "1548-7091", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20190123-095919155", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NIH", "grant_number": "R012017-0569" } ] }, "doi": "10.1038/s41592-018-0303-9", "primary_object": { "basename": "41592_2018_303_MOESM1_ESM.pdf", "url": "https://authors.library.caltech.edu/records/cpbyk-8yg76/files/41592_2018_303_MOESM1_ESM.pdf" }, "related_objects": [ { "basename": "41592_2018_303_MOESM2_ESM.pdf", "url": "https://authors.library.caltech.edu/records/cpbyk-8yg76/files/41592_2018_303_MOESM2_ESM.pdf" }, { "basename": "41592_2018_303_MOESM3_ESM.zip", "url": "https://authors.library.caltech.edu/records/cpbyk-8yg76/files/41592_2018_303_MOESM3_ESM.zip" } ], "resource_type": "article", "pub_year": "2019", "author_list": "Ntranos, Vasilis; Yi, Lynn; et el." 
}, { "id": "https://authors.library.caltech.edu/records/26j43-gs111", "eprint_id": 90471, "eprint_status": "archive", "datestamp": "2023-08-19 14:00:02", "lastmod": "2023-10-20 21:59:00", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "McCurdy-S-R", "name": { "family": "McCurdy", "given": "Shannon R." }, "orcid": "0000-0001-5555-4156" }, { "id": "Ntranos-V", "name": { "family": "Ntranos", "given": "Vasilis" }, "orcid": "0000-0002-2477-0670" }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "Deterministic column subset selection for single-cell RNA-Seq", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2019 McCurdy et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. \n\nReceived: March 27, 2018; Accepted: December 26, 2018; Published: January 25, 2019. \n\nData Availability: All the single-cell gene expression files are available from the NCBI Sequence Read Archive (mouse brain: accession number SRA SRP045452, mouse bone marrow: accession number SRA SRP063520). The Python package containing code to perform the methods described in the article can be found at https://github.com/srmcc/dcss_single_cell.git. The package also contains code to download the datasets used as examples in the article. \n\nSRM is funded by Award Number F32HG008713 from the National Human Genome Research Institute. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. \n\nThe authors have declared that no competing interests exist. \n\nSRM would like to acknowledge Ilan Shomorony, Elaine Angelino, and Robert Tunney for useful comments. \n\nAuthor Contributions:\nConceptualization: Shannon R. 
McCurdy, Vasilis Ntranos.\nFormal analysis: Shannon R. McCurdy.\nMethodology: Shannon R. McCurdy, Vasilis Ntranos.\nSoftware: Shannon R. McCurdy.\nSupervision: Lior Pachter.\nValidation: Shannon R. McCurdy, Vasilis Ntranos.\nVisualization: Shannon R. McCurdy.\nWriting \u2013 original draft: Shannon R. McCurdy.\nWriting \u2013 review & editing: Shannon R. McCurdy, Vasilis Ntranos.\n\nPublished - journal.pone.0210571.pdf
Submitted - 159079.full.pdf
Supplemental Material - journal.pone.0210571.s001.pdf
", "abstract": "Analysis of single-cell RNA sequencing (scRNA-Seq) data often involves filtering out uninteresting or poorly measured genes and dimensionality reduction to reduce noise and simplify data visualization. However, techniques such as principal components analysis (PCA) fail to preserve non-negativity and sparsity structures present in the original matrices, and the coordinates of projected cells are not easily interpretable. Commonly used thresholding methods to filter genes avoid those pitfalls, but ignore collinearity and covariance in the original matrix. We show that a deterministic column subset selection (DCSS) method possesses many of the favorable properties of common thresholding methods and PCA, while avoiding pitfalls from both. We derive new spectral bounds for DCSS. We apply DCSS to two measures of gene expression from two scRNA-Seq experiments with different clustering workflows, and compare to three thresholding methods. In each case study, the clusters based on the small subset of the complete gene expression profile selected by DCSS are similar to clusters produced from the full set. The resulting clusters are informative for cell type.", "date": "2019-01-25", "date_type": "published", "publication": "PLoS ONE", "volume": "14", "number": "1", "publisher": "Public Library of Science", "pagerange": "Art. No. 
e0210571", "id_number": "CaltechAUTHORS:20181029-133340286", "issn": "1932-6203", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20181029-133340286", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NIH Postdoctoral Fellowship", "grant_number": "F32HG008713" } ] }, "doi": "10.1371/journal.pone.0210571", "pmcid": "PMC6347249", "primary_object": { "basename": "159079.full.pdf", "url": "https://authors.library.caltech.edu/records/26j43-gs111/files/159079.full.pdf" }, "related_objects": [ { "basename": "journal.pone.0210571.pdf", "url": "https://authors.library.caltech.edu/records/26j43-gs111/files/journal.pone.0210571.pdf" }, { "basename": "journal.pone.0210571.s001.pdf", "url": "https://authors.library.caltech.edu/records/26j43-gs111/files/journal.pone.0210571.s001.pdf" } ], "resource_type": "article", "pub_year": "2019", "author_list": "McCurdy, Shannon R.; Ntranos, Vasilis; et al." }, { "id": "https://authors.library.caltech.edu/records/a5jaw-w3h87", "eprint_id": 90476, "eprint_status": "archive", "datestamp": "2023-08-22 00:42:54", "lastmod": "2023-10-23 16:01:21", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Tambe-Akshay", "name": { "family": "Tambe", "given": "Akshay" } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "Barcode identification for single cell genomics", "ispublished": "pub", "full_text_status": "public", "keywords": "Single-cell; Barcodes; Barcode identification; de Bruijn graph; Circularization; K-mer counting", "note": "\u00a9 2019 The Author(s). 
This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. \n\nReceived: 23 May 2017; Accepted: 7 January 2019; Published: 17 January 2019. \n\nWe thank Jase Gehring and Vasilis Ntranos for helpful comments and feedback during the development of the method. \n\nFunding: None. \n\nAvailability of data and materials: The datasets analyzed here were obtained from previously published datasets, which are available at the NCBI Sequence Read Archive. SRA accession numbers used in this paper are SRR1873277 and SRR5250839. \n\nAuthors' contributions: AT and LP conceived of the project. AT wrote the software and analyzed data. AT and LP wrote the manuscript. All authors read and approved the final manuscript. \n\nEthics approval: Not applicable. \n\nConsent for publication: Not applicable. \n\nThe authors declare that they have no competing interests.\n\nPublished - s12859-019-2612-0.pdf
Submitted - 136242.full.pdf
Supplemental Material - 12859_2019_2612_MOESM1_ESM.pdf
", "abstract": "Background: Single-cell sequencing experiments use short DNA barcode 'tags' to identify reads that originate from the same cell. In order to recover single-cell information from such experiments, reads must be grouped based on their barcode tag, a crucial processing step that precedes other computations. However, this step can be difficult due to high rates of mismatch and deletion errors that can afflict barcodes. \n\nResults: Here we present an approach to identify and error-correct barcodes by traversing the de Bruijn graph of circularized barcode k-mers. Our approach is based on the observation that circularizing a barcode sequence can yield error-free k-mers even when the size of k is large relative to the length of the barcode sequence, a regime which is typical single-cell barcoding applications. This allows for assignment of reads to consensus fingerprints constructed from k-mers. \n\nConclusion: We show that for single-cell RNA-Seq circularization improves the recovery of accurate single-cell transcriptome estimates, especially when there are a high number of errors per read. This approach is robust to the type of error (mismatch, insertion, deletion), as well as to the relative abundances of the cells. Sircel, a software package that implements this approach is described and publically available.", "date": "2019-01-17", "date_type": "published", "publication": "BMC Bioinformatics", "volume": "20", "publisher": "BioMed Central", "pagerange": "Art. No. 
32", "id_number": "CaltechAUTHORS:20181029-144423877", "issn": "1471-2105", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20181029-144423877", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "doi": "10.1186/s12859-019-2612-0", "pmcid": "PMC6337828", "primary_object": { "basename": "12859_2019_2612_MOESM1_ESM.pdf", "url": "https://authors.library.caltech.edu/records/a5jaw-w3h87/files/12859_2019_2612_MOESM1_ESM.pdf" }, "related_objects": [ { "basename": "136242.full.pdf", "url": "https://authors.library.caltech.edu/records/a5jaw-w3h87/files/136242.full.pdf" }, { "basename": "s12859-019-2612-0.pdf", "url": "https://authors.library.caltech.edu/records/a5jaw-w3h87/files/s12859-019-2612-0.pdf" } ], "resource_type": "article", "pub_year": "2019", "author_list": "Tambe, Akshay and Pachter, Lior" }, { "id": "https://authors.library.caltech.edu/records/wt9yb-wny62", "eprint_id": 90174, "eprint_status": "archive", "datestamp": "2023-08-19 13:22:18", "lastmod": "2023-10-23 15:48:09", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Brown-B-C", "name": { "family": "Brown", "given": "Brielin C." }, "orcid": "0000-0001-5569-5223" }, { "id": "Bray-N-L", "name": { "family": "Bray", "given": "Nicolas L." } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "Expression reflects population structure", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2018 Brown et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. \n\nReceived: July 30, 2018; Accepted: November 20, 2018; Published: December 19, 2018. \n\nThe authors would like to thank Shannon McCurdy for invaluable feedback on this manuscript. 
\n\nLP and NB were funded by National Institutes of Health grant R01HG008164. LP was also funded by National Institutes of Health grant DK094699. BB was funded by the National Science Foundation Graduate Research Fellowship Program. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. \n\nData Availability: GEUVADIS project RNA-seq reads are available at the European Nucleotide Archive (accession number ENA: ERP001942). 1000 genomes genotypes are available from cog-genomics (https://www.cog-genomics.org/plink/1.9/resources#1kg). Analysis software are available on github (https://github.com/pachterlab/PCCA/). Gencode v27 transcripts are available at ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_27/gencode.v27.pc_transcripts.fa.gz. Gencode v27 GTF is available at ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_27/gencode.v27.annotation.gtf.gz. \n\nThe authors have declared that no competing interests exist.\n\nPublished - journal.pgen.1007841.pdf
Submitted - 364448.full.pdf
Supplemental Material - journal.pgen.1007841.s001.pdf
Supplemental Material - journal.pgen.1007841.s002.png
Supplemental Material - journal.pgen.1007841.s003.png
Supplemental Material - journal.pgen.1007841.s004.png
Supplemental Material - journal.pgen.1007841.s005.png
Supplemental Material - journal.pgen.1007841.s006.png
Supplemental Material - journal.pgen.1007841.s007.png
Supplemental Material - journal.pgen.1007841.s008.png
Supplemental Material - journal.pgen.1007841.s009.png
", "abstract": "Population structure in genotype data has been extensively studied, and is revealed by looking at the principal components of the genotype matrix. However, no similar analysis of population structure in gene expression data has been conducted, in part because a na\u00efve principal components analysis of the gene expression matrix does not cluster by population. We identify a linear projection that reveals population structure in gene expression data. Our approach relies on the coupling of the principal components of genotype to the principal components of gene expression via canonical correlation analysis. Our method is able to determine the significance of the variance in the canonical correlation projection explained by each gene. We identify 3,571 significant genes, only 837 of which had been previously reported to have an associated eQTL in the GEUVADIS results. We show that our projections are not primarily driven by differences in allele frequency at known cis-eQTLs and that similar projections can be recovered using only several hundred randomly selected genes and SNPs. Finally, we present preliminary work on the consequences for eQTL analysis. We observe that using our projection co-ordinates as covariates results in the discovery of slightly fewer genes with eQTLs, but that these genes replicate in GTEx matched tissue at a slightly higher rate.", "date": "2018-12-19", "date_type": "published", "publication": "PLoS Genetics", "volume": "14", "number": "12", "publisher": "Public Library of Science", "pagerange": "Art. No. 
e1007841", "id_number": "CaltechAUTHORS:20181008-162020262", "issn": "1553-7390", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20181008-162020262", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NIH", "grant_number": "R01 HG008164" }, { "agency": "NIH", "grant_number": "DK094699" }, { "agency": "NSF Graduate Research Fellowship" } ] }, "doi": "10.1371/journal.pgen.1007841", "pmcid": "PMC6317812", "primary_object": { "basename": "364448.full.pdf", "url": "https://authors.library.caltech.edu/records/wt9yb-wny62/files/364448.full.pdf" }, "related_objects": [ { "basename": "journal.pgen.1007841.pdf", "url": "https://authors.library.caltech.edu/records/wt9yb-wny62/files/journal.pgen.1007841.pdf" }, { "basename": "journal.pgen.1007841.s003.png", "url": "https://authors.library.caltech.edu/records/wt9yb-wny62/files/journal.pgen.1007841.s003.png" }, { "basename": "journal.pgen.1007841.s004.png", "url": "https://authors.library.caltech.edu/records/wt9yb-wny62/files/journal.pgen.1007841.s004.png" }, { "basename": "journal.pgen.1007841.s005.png", "url": "https://authors.library.caltech.edu/records/wt9yb-wny62/files/journal.pgen.1007841.s005.png" }, { "basename": "journal.pgen.1007841.s006.png", "url": "https://authors.library.caltech.edu/records/wt9yb-wny62/files/journal.pgen.1007841.s006.png" }, { "basename": "journal.pgen.1007841.s007.png", "url": "https://authors.library.caltech.edu/records/wt9yb-wny62/files/journal.pgen.1007841.s007.png" }, { "basename": "journal.pgen.1007841.s001.pdf", "url": "https://authors.library.caltech.edu/records/wt9yb-wny62/files/journal.pgen.1007841.s001.pdf" }, { "basename": "journal.pgen.1007841.s002.png", "url": "https://authors.library.caltech.edu/records/wt9yb-wny62/files/journal.pgen.1007841.s002.png" }, { "basename": "journal.pgen.1007841.s008.png", "url": 
"https://authors.library.caltech.edu/records/wt9yb-wny62/files/journal.pgen.1007841.s008.png" }, { "basename": "journal.pgen.1007841.s009.png", "url": "https://authors.library.caltech.edu/records/wt9yb-wny62/files/journal.pgen.1007841.s009.png" } ], "resource_type": "article", "pub_year": "2018", "author_list": "Brown, Brielin C.; Bray, Nicolas L.; et el." }, { "id": "https://authors.library.caltech.edu/records/qhp6e-5ta13", "eprint_id": 90130, "eprint_status": "archive", "datestamp": "2023-08-19 11:52:21", "lastmod": "2023-10-18 23:13:39", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Svensson-V", "name": { "family": "Svensson", "given": "Valentine" }, "orcid": "0000-0002-9217-2330" }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "RNA Velocity: Molecular Kinetics from Single-Cell RNA-Seq", "ispublished": "pub", "full_text_status": "restricted", "note": "\u00a9 2018 Elsevier. \n\nAvailable online 4 October 2018.", "abstract": "Applying a kinetic model of RNA transcription and splicing, La Manno et al. 
(2018) predict changes in mRNA levels of individual cells from single-cell RNA-seq data.", "date": "2018-10-04", "date_type": "published", "publication": "Molecular Cell", "volume": "72", "number": "1", "publisher": "Elsevier", "pagerange": "7-9", "id_number": "CaltechAUTHORS:20181004-091624887", "issn": "1097-2765", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20181004-091624887", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "doi": "10.1016/j.molcel.2018.09.026", "resource_type": "article", "pub_year": "2018", "author_list": "Svensson, Valentine and Pachter, Lior" }, { "id": "https://authors.library.caltech.edu/records/qbnnk-nye38", "eprint_id": 86015, "eprint_status": "archive", "datestamp": "2023-08-19 09:58:32", "lastmod": "2023-10-23 15:43:43", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Tunney-R-J", "name": { "family": "Tunney", "given": "Robert" } }, { "id": "McGlincy-N-J", "name": { "family": "McGlincy", "given": "Nicholas J." }, "orcid": "0000-0003-1412-2298" }, { "id": "Graham-M-E", "name": { "family": "Graham", "given": "Monica E." } }, { "id": "Naddaf-N", "name": { "family": "Naddaf", "given": "Nicki" } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" }, { "id": "Lareau-L-F", "name": { "family": "Lareau", "given": "Liana F." }, "orcid": "0000-0003-3223-3426" } ] }, "title": "Accurate design of translational output by a neural network model of ribosome distribution", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2018 Springer Nature Limited. \n\nReceived 21 November 2017; Accepted 11 May 2018; Published 02 July 2018. \n\nWe are grateful to N. Ingolia and S. McCurdy for discussion. 
This work was supported by the National Cancer Institute of the National Institutes of Health, under award R21CA202960 to L.F.L., and by the National Institute of General Medical Sciences of the National Institutes of Health, under award P50GM102706 to the Berkeley Center for RNA Systems Biology. R.T. was supported by the Department of Defense through the National Defense Science & Engineering Graduate Fellowship (NDSEG) Program. This work made use of the Vincent J. Coates Genomics Sequencing Laboratory at the University of California, Berkeley, supported by National Institutes of Health S10 Instrumentation grant OD018174, and the UC Berkeley flow cytometry core facilities. \n\nAuthor Contributions: L.F.L., R.T., and N.J.M. designed the study, with input from L.P. R.T. developed the software and performed modeling, and R.T., L.P., and L.F.L. analyzed and interpreted the computational results. N.J.M. designed and created the yeast strains and performed expression experiments, with assistance from M.E.G. and N.N. M.E.G. performed yeast ribosome profiling. N.J.M. and L.F.L. analyzed and interpreted the experimental data. R.T. and L.F.L. wrote the manuscript, with input from all authors. \n\nThe authors declare no competing interests. \n\nData availability: Ribosome profiling sequence data generated in this study have been deposited in the NCBI GEO database under accession number GSE106572. All I\u03c7nos software and analysis scripts, including a complete workflow of analyses in this paper and all analyzed data used to create figures, can be found at https://github.com/lareaulab/iXnos/.\n\nAccepted Version - nihms967462.pdf
Submitted - 201517.full.pdf
Supplemental Material - 201517-1.pdf
Supplemental Material - 201517-2.txt
Supplemental Material - 201517-3.txt
Supplemental Material - 201517-4.txt
Supplemental Material - 41594_2018_80_MOESM1_ESM.pdf
Supplemental Material - 41594_2018_80_MOESM2_ESM.pdf
Supplemental Material - 41594_2018_80_MOESM3_ESM.csv
Supplemental Material - 41594_2018_80_MOESM4_ESM.txt
", "abstract": "Synonymous codon choice can have dramatic effects on ribosome speed and protein expression. Ribosome profiling experiments have underscored that ribosomes do not move uniformly along mRNAs. Here, we have modeled this variation in translation elongation by using a feed-forward neural network to predict the ribosome density at each codon as a function of its sequence neighborhood. Our approach revealed sequence features affecting translation elongation and characterized large technical biases in ribosome profiling. We applied our model to design synonymous variants of a fluorescent protein spanning the range of translation speeds predicted with our model. Levels of the fluorescent protein in budding yeast closely tracked the predicted translation speeds across their full range. We therefore demonstrate that our model captures information determining translation dynamics in vivo; that this information can be harnessed to design coding sequences; and that control of translation elongation alone is sufficient to produce large quantitative differences in protein output.", "date": "2018-07", "date_type": "published", "publication": "Nature Structural & Molecular Biology", "volume": "25", "number": "7", "publisher": "Nature Publishing Group", "pagerange": "577-582", "id_number": "CaltechAUTHORS:20180423-152534642", "issn": "1545-9985", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20180423-152534642", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NIH", "grant_number": "R21CA202960" }, { "agency": "NIH", "grant_number": "P50GM102706" }, { "agency": "National Defense Science and Engineering Graduate (NDSEG) Fellowship" }, { "agency": "NIH", "grant_number": "OD018174" } ] }, "doi": "10.1038/s41594-018-0080-2", "pmcid": "PMC6457438", "primary_object": { "basename": "201517-1.pdf", "url": 
"https://authors.library.caltech.edu/records/qbnnk-nye38/files/201517-1.pdf" }, "related_objects": [ { "basename": "201517-2.txt", "url": "https://authors.library.caltech.edu/records/qbnnk-nye38/files/201517-2.txt" }, { "basename": "201517-3.txt", "url": "https://authors.library.caltech.edu/records/qbnnk-nye38/files/201517-3.txt" }, { "basename": "201517-4.txt", "url": "https://authors.library.caltech.edu/records/qbnnk-nye38/files/201517-4.txt" }, { "basename": "201517.full.pdf", "url": "https://authors.library.caltech.edu/records/qbnnk-nye38/files/201517.full.pdf" }, { "basename": "41594_2018_80_MOESM1_ESM.pdf", "url": "https://authors.library.caltech.edu/records/qbnnk-nye38/files/41594_2018_80_MOESM1_ESM.pdf" }, { "basename": "41594_2018_80_MOESM2_ESM.pdf", "url": "https://authors.library.caltech.edu/records/qbnnk-nye38/files/41594_2018_80_MOESM2_ESM.pdf" }, { "basename": "41594_2018_80_MOESM3_ESM.csv", "url": "https://authors.library.caltech.edu/records/qbnnk-nye38/files/41594_2018_80_MOESM3_ESM.csv" }, { "basename": "41594_2018_80_MOESM4_ESM.txt", "url": "https://authors.library.caltech.edu/records/qbnnk-nye38/files/41594_2018_80_MOESM4_ESM.txt" }, { "basename": "nihms967462.pdf", "url": "https://authors.library.caltech.edu/records/qbnnk-nye38/files/nihms967462.pdf" } ], "resource_type": "article", "pub_year": "2018", "author_list": "Tunney, Robert; McGlincy, Nicholas J.; et el." 
}, { "id": "https://authors.library.caltech.edu/records/q0xz8-h1p26", "eprint_id": 95211, "eprint_status": "archive", "datestamp": "2023-08-19 09:47:48", "lastmod": "2023-10-20 21:56:26", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Rahman-Atif", "name": { "family": "Rahman", "given": "Atif" }, "orcid": "0000-0003-1805-3971" }, { "id": "Hallgr\u00edmsd\u00f3ttir-Ingileif", "name": { "family": "Hallgr\u00edmsd\u00f3ttir", "given": "Ingileif" } }, { "id": "Eisen-Michael-B", "name": { "family": "Eisen", "given": "Michael" }, "orcid": "0000-0002-7528-738X" }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "Association mapping from sequencing reads using k-mers", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2018 Rahman et al. This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited. \n\nReceived: 18 October 2017; Accepted: 08 June 2018; Published: 13 June 2018. \n\nWe thank Faraz Tavakoli, Harold Pimentel, Brielin Brown and Nicolas Bray for helpful conversations in the development of the method for association mapping from sequencing reads using k-mers. \n\nAR, IH, MBE and LP were funded in part by NIH R21 HG006583. AR was funded in part by Fulbright Science and Technology Fellowship 15093630. \n\nThe authors declare that no competing interests exist. 
\n\nAuthor contributions:\nAtif Rahman, Conceptualization, Resources, Data curation, Software, Formal analysis, Validation, Investigation, Visualization, Methodology, Writing\u2014original draft, Writing\u2014review and editing; Ingileif Hallgr\u00edmsd\u00f3ttir, Validation, Methodology, Writing\u2014review and editing; Michael Eisen, Conceptualization, Supervision, Funding acquisition, Methodology, Project administration, Writing\u2014review and editing; Lior Pachter, Conceptualization, Formal analysis, Supervision, Funding acquisition, Validation, Methodology, Project administration, Writing\u2014review and editing. \n\nData availability: All data generated or analysed during this study are included in the manuscript and supporting files. Source data files have been provided for Figures 5.\n\nPublished - elife-32920.pdf
Submitted - 141267.full.pdf
Supplemental Material - elife-32920-transrepform-v2.pdf
", "abstract": "Genome wide association studies (GWAS) rely on microarrays, or more recently mapping of sequencing reads, to genotype individuals. The reliance on prior sequencing of a reference genome limits the scope of association studies, and also precludes mapping associations outside of the reference. We present an alignment free method for association studies of categorical phenotypes based on counting k-mers in whole-genome sequencing reads, testing for associations directly between k-mers and the trait of interest, and local assembly of the statistically significant k-mers to identify sequence differences. An analysis of the 1000 genomes data show that sequences identified by our method largely agree with results obtained using the standard approach. However, unlike standard GWAS, our method identifies associations with structural variations and sites not present in the reference genome. We also demonstrate that population stratification can be inferred from k-mers. Finally, application to an E.coli dataset on ampicillin resistance validates the approach.", "date": "2018-06-13", "date_type": "published", "publication": "eLife", "volume": "7", "publisher": "eLife Sciences Publications", "pagerange": "Art. No. 
e32920", "id_number": "CaltechAUTHORS:20190503-134759852", "issn": "2050-084X", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20190503-134759852", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NIH", "grant_number": "R21 HG006583" }, { "agency": "Fulbright Foundation", "grant_number": "15093630" } ] }, "doi": "10.7554/elife.32920", "pmcid": "PMC6044908", "primary_object": { "basename": "141267.full.pdf", "url": "https://authors.library.caltech.edu/records/q0xz8-h1p26/files/141267.full.pdf" }, "related_objects": [ { "basename": "elife-32920-transrepform-v2.pdf", "url": "https://authors.library.caltech.edu/records/q0xz8-h1p26/files/elife-32920-transrepform-v2.pdf" }, { "basename": "elife-32920.pdf", "url": "https://authors.library.caltech.edu/records/q0xz8-h1p26/files/elife-32920.pdf" } ], "resource_type": "article", "pub_year": "2018", "author_list": "Rahman, Atif; Hallgr\u00edmsd\u00f3ttir, Ingileif; et al." }, { "id": "https://authors.library.caltech.edu/records/st76a-kg126", "eprint_id": 85872, "eprint_status": "archive", "datestamp": "2023-08-21 23:10:48", "lastmod": "2023-10-23 15:57:17", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Yi-Lynn", "name": { "family": "Yi", "given": "Lynn" }, "orcid": "0000-0003-4575-0158" }, { "id": "Pimentel-H", "name": { "family": "Pimentel", "given": "Harold" } }, { "id": "Bray-N-L", "name": { "family": "Bray", "given": "Nicolas L." 
} }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "Gene-level differential analysis at transcript-level resolution", "ispublished": "pub", "full_text_status": "public", "keywords": "RNA-sequencing; Differential expression; Meta-analysis P value aggregation; Lancaster method; Fisher's method; \u0160id\u00e1k correction; RNA-seq quantification; RNA-seq alignment; Pseudoalignment; Transcript compatibility counts; Gene ontology", "note": "\u00a9 2018 The Author(s). This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. \n\nWe thank Jase Gehring, P\u00e1ll Melsted, and Vasilis Ntranos for discussion and feedback during development of the methods. Conversations with Cole Trapnell regarding the challenges of functional characterization of individual isoforms were instrumental in launching the project.\n\nLY was partially funded by the UCLA-Caltech Medical Science Training Program, NIH T32 GM07616, and the Lee Ramo Fund. Harold Pimentel was partially funded by NIH R01 HG008140. \n\nAvailability of data and materials: Scripts to reproduce the figures and results of the paper are available at http://github.com/pachterlab/aggregationDE/, which is under GNU General Public License v3.0 [33]. The RNA-seq datasets used in the analysis can be found at GEO GSE89024 [21] and GEO GSE95363 [25]. \n\nAuthors' contributions: LY, NLB, and LP devised the methods. LY analyzed the biological data. 
LY and LP performed computational experiments. HP developed and implemented the simulation framework. LY and LP wrote the paper. NLB and LP supervised the research. All authors read and approved the final manuscript. \n\nEthics approval and consent to participate: No data from humans were used in this manuscript. \n\nThe authors declare that they have no competing interests.\n\nPublished - s13059-018-1419-z.pdf
Submitted - 190199.full.pdf
Supplemental Material - 13059_2018_1419_MOESM1_ESM.pdf
", "abstract": "Compared to RNA-sequencing transcript differential analysis, gene-level differential expression analysis is more robust and experimentally actionable. However, the use of gene counts for statistical analysis can mask transcript-level dynamics. We demonstrate that 'analysis first, aggregation second,' where the p values derived from transcript analysis are aggregated to obtain gene-level results, increase sensitivity and accuracy. The method we propose can also be applied to transcript compatibility counts obtained from pseudoalignment of reads, which circumvents the need for quantification and is fast, accurate, and model-free. The method generalizes to various levels of biology and we showcase an application to gene ontologies.", "date": "2018-04-12", "date_type": "published", "publication": "Genome Biology", "volume": "19", "publisher": "BioMed Central", "pagerange": "Art. No. 53", "id_number": "CaltechAUTHORS:20180416-090553011", "issn": "1474-760X", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20180416-090553011", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "Caltech- Medical Science Training Program" }, { "agency": "NIH Predoctoral Fellowship", "grant_number": "T32 GM07616" }, { "agency": "Lee Ramo Fund" }, { "agency": "NIH", "grant_number": "R01 HG008140" } ] }, "doi": "10.1186/s13059-018-1419-z", "pmcid": "PMC5896116", "primary_object": { "basename": "s13059-018-1419-z.pdf", "url": "https://authors.library.caltech.edu/records/st76a-kg126/files/s13059-018-1419-z.pdf" }, "related_objects": [ { "basename": "13059_2018_1419_MOESM1_ESM.pdf", "url": "https://authors.library.caltech.edu/records/st76a-kg126/files/13059_2018_1419_MOESM1_ESM.pdf" }, { "basename": "190199.full.pdf", "url": "https://authors.library.caltech.edu/records/st76a-kg126/files/190199.full.pdf" } ], "resource_type": "article", "pub_year": "2018", "author_list": 
"Yi, Lynn; Pimentel, Harold; et el." }, { "id": "https://authors.library.caltech.edu/records/nf54s-9xm95", "eprint_id": 83162, "eprint_status": "archive", "datestamp": "2023-08-19 05:23:31", "lastmod": "2023-10-17 22:57:03", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Goin-D-E", "name": { "family": "Goin", "given": "Dana E." } }, { "id": "Smed-M-K", "name": { "family": "Smed", "given": "Mette" } }, { "id": "Jewell-N-P", "name": { "family": "Jewell", "given": "Nicholas" } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" }, { "id": "Nelson-J-L", "name": { "family": "Nelson", "given": "J. Lee" } }, { "id": "Kjaergaard-H", "name": { "family": "Kjaergaard", "given": "Hanne" } }, { "id": "Olsen-J", "name": { "family": "Olsen", "given": "J\u00f8rn" } }, { "id": "Hetland-M-L", "name": { "family": "Hetland", "given": "Merete Lund" } }, { "id": "Ottesen-B", "name": { "family": "Ottesen", "given": "Bent" } }, { "id": "Zoffmann-V", "name": { "family": "Zoffmann", "given": "Vibeke" } }, { "id": "Jawaheer-D", "name": { "family": "Jawaheer", "given": "Damini" } } ] }, "title": "Longitudinal Changes in Gene Expression Associated with Disease Activity during Pregnancy and Post-Partum Among Women with Rheumatoid Arthritis", "ispublished": "pub", "full_text_status": "restricted", "keywords": "Disease Activity, Gene Expression, pregnancy and rheumatoid arthritis (RA), RNA", "note": "\u00a9 2017 American College of Rheumatology. \n\nIssue online: 27 September 2017; Version of record online: 27 September 2017. \n\nDisclosure: D. E. Goin, None; M. Smed, None; N. Jewell, None; L. Pachter, None; J. L. Nelson, None; H. Kjaergaard, None; J. Olsen, None; M. Lund Hetland, None; B. Ottesen, None; V. Zoffmann, None; D. 
Jawaheer, None.", "abstract": "Background/Purpose: Many women with rheumatoid arthritis (RA) experience an improvement in disease activity during pregnancy, and a predictable flare in the months after they give birth. The cause of these changes is unknown. We hypothesized that understanding biological changes (through gene expression) that occur from pre-pregnancy through the pregnancy and post-partum periods will contribute important evidence to our knowledge of the drivers of disease activity in RA during and after pregnancy.\nMethods: We have established a prospective RA pregnancy cohort, with clinical data and blood samples collected at pre-pregnancy (T0), each trimester of pregnancy and every 3 months up to a year post-partum (up to 8 time points). Disease activity at each time point was assessed using disease activity scores (DAS28CRP4); women who showed an improvement during pregnancy were selected for analysis (n=9). Global gene expression profiles for each sample were generated using RNA-sequencing (RNA-seq). Raw reads were pseudo-aligned and quantified using kallisto. Random effects regression models were used to estimate the effects of changes in gene expression on disease activity (a) from T0 through the pregnancy period (P1), and (b) in the post-partum period (P2). The models were adjusted for age, medication status at baseline and batch effects. Significance was assessed using a threshold of q<0.05 (FDR-adjusted). Functional enrichment analysis was performed using WebGestalt.\nResults: During pregnancy, 1,174 genes had expression patterns significantly associated with disease activity. While these were not significantly enriched in specific pathways, the genes whose increased expression was associated with the largest decrease (improvement) in disease activity during pregnancy were immune-related, and included ERAP1, CSNK2A1 and FAM175B. 
ERAP1 is involved in trimming peptides for presentation on MHC class I molecules; CSNK2A1 regulates cellular processes including cellular response to viral infection; FAM175B is involved in interferon-signaling. In the post-partum period, 4,693 genes had expression patterns significantly associated with disease activity. These were enriched (p<1x10^(-6)) in numerous immune-related pathways including MAPK signaling, T cell receptor signaling, osteoclast differentiation, hematopoietic cell lineage, B cell receptor signaling, Toll-like receptor signaling and leukocyte trans-endothelial migration, in addition to several pathways related to cancer. The genes whose increased expression were associated with larger increases in disease activity included EI24, CMTM7, PPP2CB and BFAR which are related to tumor suppression and/or regulation of apoptosis.\nConclusion: In this pilot RA pregnancy cohort study with longitudinal RNA-seq data, several candidate genes were identified as significantly associated with improvement in disease activity during pregnancy, and others were associated with post-partum flares. These results warrant further investigations into possible roles of these genes in modulating RA disease activity in a larger cohort.", "date": "2017-10", "date_type": "published", "publication": "Arthritis and Rheumatology", "volume": "69", "number": "S10", "publisher": "Wiley", "pagerange": "Art. No. 2432", "id_number": "CaltechAUTHORS:20171113-143256428", "issn": "2326-5191", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20171113-143256428", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "doi": "10.1002/art.40321", "resource_type": "article", "pub_year": "2017", "author_list": "Goin, Dana E.; Smed, Mette; et al." 
}, { "id": "https://authors.library.caltech.edu/records/3hhxt-h4x56", "eprint_id": 83164, "eprint_status": "archive", "datestamp": "2023-08-19 05:23:39", "lastmod": "2023-10-17 22:57:08", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Goin-D-E", "name": { "family": "Goin", "given": "Dana E." } }, { "id": "Smed-M-K", "name": { "family": "Smed", "given": "Mette" } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" }, { "id": "Purdom-E", "name": { "family": "Purdom", "given": "Elizabeth" } }, { "id": "Nelson-J-L", "name": { "family": "Nelson", "given": "J. Lee" } }, { "id": "Kjaergaard-H", "name": { "family": "Kjaergaard", "given": "Hanne" } }, { "id": "Olsen-J", "name": { "family": "Olsen", "given": "J\u00f8rn" } }, { "id": "Hetland-M-L", "name": { "family": "Hetland", "given": "Merete Lund" } }, { "id": "Ottesen-B", "name": { "family": "Ottesen", "given": "Bent" } }, { "id": "Zoffmann-V", "name": { "family": "Zoffmann", "given": "Vibeke" } }, { "id": "Jawaheer-D", "name": { "family": "Jawaheer", "given": "Damini" } } ] }, "title": "Transcriptome Analysis in Women with Rheumatoid Arthritis Who Improve or Worsen during Pregnancy", "ispublished": "pub", "full_text_status": "restricted", "note": "\u00a9 2017 American College of Rheumatology. \n\nIssue online: 27 September 2017; Version of record online: 27 September 2017. \n\nDisclosure: D. E. Goin, None; M. Smed, None; L. Pachter, None; E. Purdom, None; J. L. Nelson, None; H. Kjaergaard, None; J. Olsen, None; M. Lund Hetland, None; B. Ottesen, None; V. Zoffmann, None; D. Jawaheer, None.", "abstract": "Background/Purpose: Gene expression changes induced by pregnancy in women with rheumatoid arthritis (RA) and healthy women have not been examined. The few studies previously conducted did not have pre-pregnancy samples available as baseline. 
We have established a cohort of RA and healthy women followed prospectively from pre-pregnancy. In this study, we aimed to identify pregnancy-induced changes in gene expression among women with RA and healthy women, and to assess how those changes may differ between RA women who improve or worsen during pregnancy.\nMethods: Clinical data and samples collected from a subset of 11 women with RA and 5 healthy women from our cohort before pregnancy (T0) and at the third trimester (T3) were analyzed. Disease activity scores were used to determine whether the RA women improved or worsened during pregnancy. Global gene expression profiles were generated by RNA sequencing (RNA-seq). The raw RNA-seq reads were pseudo-aligned to the reference transcriptome and expression levels were estimated with kallisto. Differential expression analysis of normalized expression levels was performed using edgeR to identify genes differentially expressed within each group of women (T3 vs T0), using a fold-change cut-off of 2 and a significance threshold of q<0.05 (FDR-adjusted). Functional enrichment analysis was performed using WebGestalt.\nResults: Of the 11 women with RA, 8 showed an improvement in disease activity by T3 (RA_(improved)), while 3 worsened (RA_(worsened)). In the RA_(improved) group, a total of 161 genes were differentially expressed (DE) between T3 and T0. These included several genes whose expression have previously been associated with RA (e.g. S100A12, SLC14A1) as well as genes involved in the innate immune system (e.g. type I interferon-inducible genes). The majority of these genes (108 of 161) were also DE among healthy women. Of interest, most genes (30 of 31) that were significantly DE in both of the RA groups were also DE among healthy women (e.g. \u03b1-defensin genes). There were also differences between the RA_(improved) and RA_(worsened) groups. 
A set of IFN-inducible genes was over-expressed at T3 (vs T0) in the RA_(improved) but not the RA_(worsened) women. Additionally, some interesting candidate genes whose expression have previously been associated with RA (e.g. MMP9, PADI4 and PGLYRP1) were over-expressed at T3 (vs. T0) among RA_(worsened) but not among RA_(improved) women.\nConclusion: Pregnancy-induced gene expression changes common between RA women who improved and those who worsened appeared to be normal pregnancy-related changes that were also observed among healthy women. Other genes that demonstrated different patterns of expression between the two RA groups are potential candidates that could be involved in the natural pregnancy-induced amelioration of RA.", "date": "2017-10", "date_type": "published", "publication": "Arthritis and Rheumatology", "volume": "69", "number": "S10", "publisher": "Wiley", "pagerange": "Art. No. 2433", "id_number": "CaltechAUTHORS:20171113-145148333", "issn": "2326-5191", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20171113-145148333", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "doi": "10.1002/art.40321", "resource_type": "article", "pub_year": "2017", "author_list": "Goin, Dana E.; Smed, Mette; et al." }, { "id": "https://authors.library.caltech.edu/records/15tce-gp284", "eprint_id": 74793, "eprint_status": "archive", "datestamp": "2023-08-19 04:08:59", "lastmod": "2023-10-24 23:17:50", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Schaeffer-L-V", "name": { "family": "Schaeffer", "given": "L." } }, { "id": "Pimentel-H", "name": { "family": "Pimentel", "given": "H." } }, { "id": "Bray-N-L", "name": { "family": "Bray", "given": "N." } }, { "id": "Melsted-P", "name": { "family": "Melsted", "given": "P." }, "orcid": "0000-0002-8418-6724" }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "L." 
}, "orcid": "0000-0002-9164-6231" } ] }, "title": "Pseudoalignment for metagenomic read assignment", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 The Author 2017. Published by Oxford University Press.\nThis article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/about_us/legal/notices). \n\nReceived on October 18, 2016; revised on January 23, 2017; editorial decision on February 15, 2017; accepted on February 17, 2017. Published: 21 February 2017. \n\nWe thank readers of preprints of this manuscript for helpful suggestions that have improved our method and its description in the paper. \n\nH.P. was supported by an NSF graduate research fellowship. P.M. was partially supported by a Fulbright fellowship. L.S and L.P. were partially supported by NIH R01 HG006129 and NIH R01 DK094699. \n\nConflict of Interest: none declared.\n\nSubmitted - 1510.07371.pdf
", "abstract": "Motivation: Read assignment is an important first step in many metagenomic analysis workflows, providing the basis for identification and quantification of species. However ambiguity among the sequences of many strains makes it difficult to assign reads at the lowest level of taxonomy, and reads are typically assigned to taxonomic levels where they are unambiguous. We explore connections between metagenomic read assignment and the quantification of transcripts from RNA-Seq data in order to develop novel methods for rapid and accurate quantification of metagenomic strains. \n\nResults: We find that the recent idea of pseudoalignment introduced in the RNA-Seq context is highly applicable in the metagenomics setting. When coupled with the Expectation-Maximization (EM) algorithm, reads can be assigned far more accurately and quickly than is currently possible with state of the art software, making it possible and practical for the first time to analyze abundances of individual genomes in metagenomics projects.", "date": "2017-07-15", "date_type": "published", "publication": "Bioinformatics", "volume": "33", "number": "14", "publisher": "Oxford University Press", "pagerange": "2082-2088", "id_number": "CaltechAUTHORS:20170306-131027010", "issn": "1367-4803", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170306-131027010", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NSF Graduate Research Fellowship" }, { "agency": "Fulbright Foundation" }, { "agency": "NIH", "grant_number": "R01 HG006129" }, { "agency": "NIH", "grant_number": "R01 DK094699" } ] }, "doi": "10.1093/bioinformatics/btx106", "pmcid": "PMC5870846", "primary_object": { "basename": "1510.07371.pdf", "url": "https://authors.library.caltech.edu/records/15tce-gp284/files/1510.07371.pdf" }, "resource_type": "article", "pub_year": "2017", "author_list": "Schaeffer, L.; Pimentel, 
H.; et al." }, { "id": "https://authors.library.caltech.edu/records/kmsft-gtd43", "eprint_id": 78089, "eprint_status": "archive", "datestamp": "2023-08-19 03:51:45", "lastmod": "2023-10-25 23:44:45", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Pimentel-H", "name": { "family": "Pimentel", "given": "Harold" } }, { "id": "Bray-N-L", "name": { "family": "Bray", "given": "Nicolas L." } }, { "id": "Puente-S", "name": { "family": "Puente", "given": "Suzette" } }, { "id": "Melsted-P", "name": { "family": "Melsted", "given": "P\u00e1ll" }, "orcid": "0000-0002-8418-6724" }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "Differential analysis of RNA-seq incorporating quantification uncertainty", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2017 Macmillan Publishers Limited, part of Springer Nature. \n\nReceived 27 January 2017; Accepted 04 May 2017; Published online 05 June 2017. \n\nH.P. and L.P. were partially supported by NIH grant nos. R01 DK094699 and R01 HG006129. We thank D. Li, A. Tseng, and P. Sturmfels for help with implementing some of the interactive features in sleuth. \n\nAuthor Contributions: H.P. led the development of the sleuth statistical model and was assisted by S.P., N.L.B., P.M., and L.P. The method comparison and testing framework was designed by H.P., N.L.B., P.M., and L.P. The interactive sleuth live software was designed and implemented by H.P., as was the sleuth R package. H.P. automated production of the results. H.P., N.L.B., P.M., and L.P. analyzed results and wrote the paper. \n\nData availability statement: The Bottomly data set is available at the NCBI Gene Expression Omnibus (GSE26024, accession nos. SRR099223\u2013SRR099243). The Trapnell et al. data set (Fig. 1b) is available at the NCBI Gene Expression Omnibus (GSE37704, accession nos. SRR493366\u2013SRR493371). 
The GEUVADIS data set is available at the European Nucleotide Archive (accession no. ERP001942). \n\nThe authors declare no competing financial interests.\n\nSubmitted - 058164.full.pdf
Supplemental Material - nmeth.4324-S1.pdf
Supplemental Material - nmeth.4324-S2.zip
Supplemental Material - nmeth_4324-SF1.jpg
Supplemental Material - nmeth_4324-SF2.jpg
Supplemental Material - nmeth_4324-SF3.jpg
Supplemental Material - nmeth_4324-SF4.jpg
Supplemental Material - nmeth_4324-SF5.jpg
Supplemental Material - nmeth_4324-SF6.jpg
Supplemental Material - nmeth_4324-SF7.jpg
Supplemental Material - nmeth_4324-SF8.jpg
", "abstract": "We describe sleuth (http://pachterlab.github.io/sleuth), a method for the differential analysis of gene expression data that utilizes bootstrapping in conjunction with response error linear modeling to decouple biological variance from inferential variance. sleuth is implemented in an interactive shiny app that utilizes kallisto quantifications and bootstraps for fast and accurate analysis of data from RNA-seq experiments.", "date": "2017-07", "date_type": "published", "publication": "Nature Methods", "volume": "14", "number": "7", "publisher": "Nature Publishing Group", "pagerange": "687-690", "id_number": "CaltechAUTHORS:20170612-084553487", "issn": "1548-7091", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170612-084553487", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NIH", "grant_number": "R01 DK094699" }, { "agency": "NIH", "grant_number": "R01 HG006129" } ] }, "doi": "10.1038/nmeth.4324", "primary_object": { "basename": "nmeth_4324-SF2.jpg", "url": "https://authors.library.caltech.edu/records/kmsft-gtd43/files/nmeth_4324-SF2.jpg" }, "related_objects": [ { "basename": "nmeth_4324-SF3.jpg", "url": "https://authors.library.caltech.edu/records/kmsft-gtd43/files/nmeth_4324-SF3.jpg" }, { "basename": "nmeth_4324-SF5.jpg", "url": "https://authors.library.caltech.edu/records/kmsft-gtd43/files/nmeth_4324-SF5.jpg" }, { "basename": "nmeth_4324-SF6.jpg", "url": "https://authors.library.caltech.edu/records/kmsft-gtd43/files/nmeth_4324-SF6.jpg" }, { "basename": "nmeth_4324-SF8.jpg", "url": "https://authors.library.caltech.edu/records/kmsft-gtd43/files/nmeth_4324-SF8.jpg" }, { "basename": "058164.full.pdf", "url": "https://authors.library.caltech.edu/records/kmsft-gtd43/files/058164.full.pdf" }, { "basename": "nmeth.4324-S1.pdf", "url": "https://authors.library.caltech.edu/records/kmsft-gtd43/files/nmeth.4324-S1.pdf" }, { "basename": 
"nmeth.4324-S2.zip", "url": "https://authors.library.caltech.edu/records/kmsft-gtd43/files/nmeth.4324-S2.zip" }, { "basename": "nmeth_4324-SF1.jpg", "url": "https://authors.library.caltech.edu/records/kmsft-gtd43/files/nmeth_4324-SF1.jpg" }, { "basename": "nmeth_4324-SF4.jpg", "url": "https://authors.library.caltech.edu/records/kmsft-gtd43/files/nmeth_4324-SF4.jpg" }, { "basename": "nmeth_4324-SF7.jpg", "url": "https://authors.library.caltech.edu/records/kmsft-gtd43/files/nmeth_4324-SF7.jpg" } ], "resource_type": "article", "pub_year": "2017", "author_list": "Pimentel, Harold; Bray, Nicolas L.; et el." }, { "id": "https://authors.library.caltech.edu/records/jbjbb-4as73", "eprint_id": 77860, "eprint_status": "archive", "datestamp": "2023-08-19 03:14:37", "lastmod": "2023-10-25 23:31:13", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Goin-D-E", "name": { "family": "Goin", "given": "Dana E." } }, { "id": "Smed-M-K", "name": { "family": "Smed", "given": "Mette Kiel" } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" }, { "id": "Purdom-E", "name": { "family": "Purdom", "given": "Elizabeth" } }, { "id": "Nelson-J-L", "name": { "family": "Nelson", "given": "J. Lee" } }, { "id": "Kj\u00e6rgaard-H", "name": { "family": "Kj\u00e6rgaard", "given": "Hanne" } }, { "id": "Olsen-J", "name": { "family": "Olsen", "given": "J\u00f8rn" } }, { "id": "Hetland-M-L", "name": { "family": "Hetland", "given": "Merete Lund" } }, { "id": "Zoffmann-V", "name": { "family": "Zoffmann", "given": "Vibeke" } }, { "id": "Ottesen-B", "name": { "family": "Ottesen", "given": "Bent" } }, { "id": "Jawaheer-D", "name": { "family": "Jawaheer", "given": "Damini" } } ] }, "title": "Pregnancy-induced gene expression changes in vivo among women with rheumatoid arthritis: a pilot study", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2017 The Author(s). 
This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. \n\nReceived: 16 January 2017; Accepted: 2 May 2017; Published: 25 May 2017. \n\nWe are immensely grateful to the study subjects for their participation in the study. We thank Mr. Kurt Stig Jensen at the Juliane Marie Center for his support. The Rheumatology departments at the following hospitals in Denmark facilitated collection of data and samples: Rigshospitalet (Glostrup), Odense Universitetshospital, Kong Christian X's Gigthospital (Gr\u00e5sten), Aarhus University Hospital NBG, and Regionshospitalet Viborg. We thank all members of our project team for making this work possible: Anne-Grethe Rasmussen, Charlotte Sch\u00f6n Frengler, Dorte Heide, Randi Petersen, Tove Thorup Rasmussen, Lone Thomasen, Britta Hvidberg Nielsen, Teresa Rozenfeldt, Kirsten Junker, Lis Kastberg Schubert, Lis Lund, Jette Barlach, Helle Bendtsen, Helle Andersen, and Marjo Westerdahl for their contribution to data and sample collection; and Rikke Godtkj\u00e6r Andersen, Mie Rams Rasmussen, Pia Pedersen, Stine Birkelund, Louise Mielke, and Andreas Smed for management of data and samples. We also greatly appreciate the valuable assistance provided by Majbritt Norman Nielsen and DANBIO personnel. 
\n\nThis work was supported in part by funds from the National Institute of Arthritis, Musculoskeletal and Skin Diseases (NIAMS), USA (grant R21 AR057931); Gigtforeningen, Denmark (grant R87-A1477-B512); and the Juliane Marie Center, Rigshospitalet (Denmark). These funders did not have any role in conducting this study or in the interpretation and reporting of results. \n\nAvailability of data and materials: The data are governed by Danish privacy laws. The authors are legally forbidden from publicly sharing data under the terms of their agreement with the Danish Data Protection Agency. Data are available upon request from the corresponding author, after approval is granted by the Danish Data Protection Agency. \n\nAuthors' contributions: DEG analyzed the data, interpreted the results, and contributed to manuscript writing. MKS was responsible for acquisition of the data. LP and EP contributed to the analysis and interpretation of the data. JLN, HK, and JO contributed to the conception and design of the study. HK was also responsible for data acquisition. MLH, VZ, and BO contributed to the data acquisition. DJ was involved in the conception and design of the experiments, in the analysis and interpretation of the data, and in writing the manuscript. All authors contributed to critically revising the manuscript for important intellectual content. All authors read and approved the final manuscript. HK passed away before the submission of the final version of the manuscript; DJ accepts responsibility for the integrity and validity of the data collected and analyzed. \n\nThe authors declare that they have no competing interests. \n\nConsent for publication: Not applicable. \n\nEthics approval and consent to participate: This study was approved by the ethics committee for Region Hovedstaden (Denmark), the Danish Data Protection Agency, and the Children's Hospital Oakland Research Institute Institutional Review Board. 
All subjects provided written informed consent prior to enrollment.\n\nPublished - art_3A10.1186_2Fs13075-017-1312-2.pdf
Supplemental Material - 13075_2017_1312_MOESM1_ESM.pdf
Supplemental Material - 13075_2017_1312_MOESM2_ESM.pdf
", "abstract": "Background: Little is known about gene expression changes induced by pregnancy in women with rheumatoid arthritis (RA) and healthy women because the few studies previously conducted did not have pre-pregnancy samples available as baseline. We have established a cohort of women with RA and healthy women followed prospectively from a pre-pregnancy baseline. In this study, we tested the hypothesis that pregnancy-induced changes in gene expression among women with RA who improve during pregnancy (pregDAS_(improved)) overlap substantially with changes observed among healthy women and differ from changes observed among women with RA who worsen during pregnancy (pregDAS_(worse)). \n\nMethods: Global gene expression profiles were generated by RNA sequencing (RNA-seq) from 11 women with RA and 5 healthy women before pregnancy (T0) and at the third trimester (T3). Among the women with RA, eight showed an improvement in disease activity by T3, whereas three worsened. Differential expression analysis was used to identify genes demonstrating significant changes in expression within each of the RA and healthy groups (T3 vs T0), as well as between the groups at each time point. Gene set enrichment was assessed in terms of Gene Ontology processes and protein networks. \n\nResults: A total of 1296 genes were differentially expressed between T3 and T0 among the 8 pregDAS_(improved) women, with 161 genes showing at least two-fold change (FC) in expression by T3. The majority (108 of 161 genes) were also differentially expressed among healthy women (q<0.05, FC\u22652). Additionally, a small cluster of genes demonstrated contrasting changes in expression between the pregDAS_(improved) and pregDAS_(worse) groups, all of which were inducible by type I interferon (IFN). These IFN-inducible genes were over-expressed at T3 compared to the T0 baseline among the pregDAS_(improved) women. 
\n\nConclusions: In our pilot RNA-seq dataset, increased pregnancy-induced expression of type I IFN-inducible genes was observed among women with RA who improved during pregnancy, but not among women who worsened. These findings warrant further investigation into expression of these genes in RA pregnancy and their potential role in modulation of disease activity. These results are nevertheless preliminary and should be interpreted with caution until replicated in a larger sample.", "date": "2017-05-25", "date_type": "published", "publication": "Arthritis Research and Therapy", "volume": "19", "number": "1", "publisher": "BioMed Central", "pagerange": "Art. No. 104", "id_number": "CaltechAUTHORS:20170531-131446160", "issn": "1478-6362", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170531-131446160", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NIH", "grant_number": "R21 AR057931" }, { "agency": "Gigtforeningen", "grant_number": "R87-A1477-B512" }, { "agency": "Juliane Marie Center, Rigshospitalet" } ] }, "doi": "10.1186/s13075-017-1312-2", "pmcid": "PMC5445464", "primary_object": { "basename": "13075_2017_1312_MOESM1_ESM.pdf", "url": "https://authors.library.caltech.edu/records/jbjbb-4as73/files/13075_2017_1312_MOESM1_ESM.pdf" }, "related_objects": [ { "basename": "13075_2017_1312_MOESM2_ESM.pdf", "url": "https://authors.library.caltech.edu/records/jbjbb-4as73/files/13075_2017_1312_MOESM2_ESM.pdf" }, { "basename": "art_3A10.1186_2Fs13075-017-1312-2.pdf", "url": "https://authors.library.caltech.edu/records/jbjbb-4as73/files/art_3A10.1186_2Fs13075-017-1312-2.pdf" } ], "resource_type": "article", "pub_year": "2017", "author_list": "Goin, Dana E.; Smed, Mette Kiel; et al." 
}, { "id": "https://authors.library.caltech.edu/records/7czc6-vfq51", "eprint_id": 77353, "eprint_status": "archive", "datestamp": "2023-08-21 21:05:54", "lastmod": "2023-10-25 22:06:47", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Li-Bo", "name": { "family": "Li", "given": "Bo" }, "orcid": "0000-0002-8019-8891" }, { "id": "Tambe-Akshay", "name": { "family": "Tambe", "given": "Akshay" } }, { "id": "Aviran-Sharon", "name": { "family": "Aviran", "given": "Sharon" } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "PROBer Provides a General Toolkit for Analyzing Sequencing-Based Toeprinting Assays", "ispublished": "pub", "full_text_status": "public", "keywords": "post-transcriptional regulation; toeprinting by high-throughput sequencing; bioinformatics; RNA structure probing; post-transcriptional modification of RNA nucleotides; RNA-protein interactions", "note": "\u00a9 2017 Elsevier Inc. \n\nReceived 6 July 2016, Revised 19 December 2016, Accepted 13 April 2017, Available online 10 May 2017. \n\nWe thank Yiliang Ding, Yin Tang, Joel McManus, and Thomas Carlile for discussions and clarifications on the StructureFold, Mod-seeker, and Pseudo-seq methods. We thank Yeon Lee, Julian K\u00f6nig, Eric Van Nostrand, Gabriel Pratt, and Gene Yeo for discussions on the iCLIP and eCLIP protocols. This work is supported by NIH grants R01 HG006129 to L.P. and R00 HG006860 to S.A., and by the Center for RNA Systems Biology at UC Berkeley (NIH P50GM102706 grant) to B.L. A.T. was partially supported by NIH Molecular Biophysics Training grant (NIH GM08295).\n\nAccepted Version - nihms875397.pdf
Supplemental Material - 1-s2.0-S2405471217301394-mmc1.pdf
Supplemental Material - 1-s2.0-S2405471217301394-mmc2.pdf
", "abstract": "A number of sequencing-based transcriptase drop-off assays have recently been developed to probe post-transcriptional dynamics of RNA-protein interaction, RNA structure, and RNA modification. Although these assays survey a diverse set of epitranscriptomic marks, we use the term toeprinting assays since they share methodological similarities. Their interpretation is predicated on addressing a similar computational challenge: how to learn isoform-specific chemical modification profiles in the face of complex read multi-mapping. We introduce PROBer, a statistical model and associated software, that addresses this challenge for the analysis of toeprinting assays. PROBer takes sequencing data as input and outputs estimated transcript abundances and isoform-specific modification profiles. Results on both simulated and biological data demonstrate that PROBer significantly outperforms individual methods tailored for specific toeprinting assays. Since the space of toeprinting assays is ever expanding and these assays are likely to be performed and analyzed together, we believe PROBer's unified data analysis solution will be valuable to the RNA community.", "date": "2017-05-24", "date_type": "published", "publication": "Cell Systems", "volume": "4", "number": "5", "publisher": "Cell Press", "pagerange": "568-574", "id_number": "CaltechAUTHORS:20170510-142406445", "issn": "2405-4712", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170510-142406445", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NIH", "grant_number": "R01 HG006129" }, { "agency": "NIH", "grant_number": "R00 HG006860" }, { "agency": "NIH", "grant_number": "P50GM102706" }, { "agency": "NIH Predoctoral Fellowship", "grant_number": "GM08295" } ] }, "doi": "10.1016/j.cels.2017.04.007", "pmcid": "PMC5758053", "primary_object": { "basename": "1-s2.0-S2405471217301394-mmc2.pdf", "url": 
"https://authors.library.caltech.edu/records/7czc6-vfq51/files/1-s2.0-S2405471217301394-mmc2.pdf" }, "related_objects": [ { "basename": "nihms875397.pdf", "url": "https://authors.library.caltech.edu/records/7czc6-vfq51/files/nihms875397.pdf" }, { "basename": "1-s2.0-S2405471217301394-mmc1.pdf", "url": "https://authors.library.caltech.edu/records/7czc6-vfq51/files/1-s2.0-S2405471217301394-mmc1.pdf" } ], "resource_type": "article", "pub_year": "2017", "author_list": "Li, Bo; Tambe, Akshay; et el." }, { "id": "https://authors.library.caltech.edu/records/tfn4w-6em19", "eprint_id": 77219, "eprint_status": "archive", "datestamp": "2023-08-19 02:45:05", "lastmod": "2023-10-23 15:08:24", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Yi-Lynn", "name": { "family": "Yi", "given": "Lynn" }, "orcid": "0000-0003-4575-0158" }, { "id": "Pimentel-H", "name": { "family": "Pimentel", "given": "Harold" } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "Zika infection of neural progenitor cells perturbs transcription in neurodevelopmental pathways", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2017 Yi et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and\nsource are credited. \n\nReceived: October 11, 2016; Accepted: March 30, 2017; Published: April 27, 2017. \n\nData Availability Statement: The data analysis can be repeated using the provided scripts at http://www.github.com/pachterlab/zika/. The preloaded sleuth Shiny app can be found via http://128.32.142.223/tang16/. These links direct to all the information necessary to replicate the study. 
\n\nLY was supported by funding from the UCLA/Caltech Medical Scientist Training Program, The Walter and Sylvia Treadway Endowment, and the NIH T32 (NRSA). LP was partially funded by NIH R01 DK094699. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. \n\nAuthor Contributions: \nConceptualization: LP.\nData curation: LY HP.\nFormal analysis: LY HP LP.\nFunding acquisition: LP.\nInvestigation: LY.\nMethodology: HP LY.\nProject administration: LP.\nResources: LP.\nSoftware: HP LY.\nSupervision: LP.\nValidation: LY.\nVisualization: LY HP.\nWriting \u2013 original draft: LY LP.\nWriting \u2013 review & editing: LY HP LP. \n\nThe authors have declared that no competing interests exist.\n\nPublished - journal.pone.0175744.pdf
Submitted - 072439.full.pdf
Supplemental Material - journal.pone.0175744.s001.csv
Supplemental Material - journal.pone.0175744.s002.csv
Supplemental Material - journal.pone.0175744.s003.csv
Supplemental Material - journal.pone.0175744.s004.xls
Supplemental Material - journal.pone.0175744.s005.xls
", "abstract": "Background: A recent study of the gene expression patterns of Zika virus (ZIKV) infected human neural progenitor cells (hNPCs) revealed transcriptional dysregulation and identified cell cycle-related pathways that are affected by infection. However deeper exploration of the information present in the RNA-Seq data can be used to further elucidate the manner in which Zika infection of hNPCs affects the transcriptome, refining pathway predictions and revealing isoform-specific dynamics. \n\nMethodology/Principal findings: We analyzed data published by Tang et al. using state-of-the-art tools for transcriptome analysis. By accounting for the experimental design and estimation of technical and inferential variance we were able to pinpoint Zika infection affected pathways that highlight Zika's neural tropism. The examination of differential genes reveals cases of isoform divergence. \n\nConclusions: Transcriptome analysis of Zika infected hNPCs has the potential to identify the molecular signatures of Zika infected neural cells. These signatures may be useful for diagnostics and for the resolution of infection pathways that can be used to harvest specific targets for further study.", "date": "2017-04-27", "date_type": "published", "publication": "PLoS ONE", "volume": "12", "number": "4", "publisher": "Public Library of Science", "pagerange": "Art. No. 
e0175744", "id_number": "CaltechAUTHORS:20170505-103858288", "issn": "1932-6203", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170505-103858288", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "UCLA-Caltech Medical Scientist Training Program" }, { "agency": "Walter and Sylvia Treadway Endowment" }, { "agency": "NIH", "grant_number": "R01 DK094699" } ] }, "doi": "10.1371/journal.pone.0175744", "pmcid": "PMC5407828", "primary_object": { "basename": "journal.pone.0175744.pdf", "url": "https://authors.library.caltech.edu/records/tfn4w-6em19/files/journal.pone.0175744.pdf" }, "related_objects": [ { "basename": "journal.pone.0175744.s001.csv", "url": "https://authors.library.caltech.edu/records/tfn4w-6em19/files/journal.pone.0175744.s001.csv" }, { "basename": "journal.pone.0175744.s002.csv", "url": "https://authors.library.caltech.edu/records/tfn4w-6em19/files/journal.pone.0175744.s002.csv" }, { "basename": "journal.pone.0175744.s003.csv", "url": "https://authors.library.caltech.edu/records/tfn4w-6em19/files/journal.pone.0175744.s003.csv" }, { "basename": "journal.pone.0175744.s004.xls", "url": "https://authors.library.caltech.edu/records/tfn4w-6em19/files/journal.pone.0175744.s004.xls" }, { "basename": "journal.pone.0175744.s005.xls", "url": "https://authors.library.caltech.edu/records/tfn4w-6em19/files/journal.pone.0175744.s005.xls" }, { "basename": "072439.full.pdf", "url": "https://authors.library.caltech.edu/records/tfn4w-6em19/files/072439.full.pdf" } ], "resource_type": "article", "pub_year": "2017", "author_list": "Yi, Lynn; Pimentel, Harold; et al." 
}, { "id": "https://authors.library.caltech.edu/records/nbtgs-bmj74", "eprint_id": 74831, "eprint_status": "archive", "datestamp": "2023-08-22 03:41:35", "lastmod": "2023-10-24 23:20:42", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "McAuliffe-J-D", "name": { "family": "McAuliffe", "given": "Jon D." } }, { "id": "Jordan-M-I", "name": { "family": "Jordan", "given": "Michael I." } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "Subtree power analysis finds optimal species for comparative genomics", "ispublished": "pub", "full_text_status": "public", "keywords": "hypothesis testing; likelihood ratio; sequence analysis", "note": "\u00a9 2005 The National Academy of Sciences. \n\nCommunicated by Peter J. Bickel, University of California, Berkeley, CA, April 6, 2005 (received for review December 13, 2004) \n\nWe thank Peter Bickel and Adam Siepel for helpful comments. M.I.J. was supported by National Institutes of Health Grant R33-HG003070. L.P. was supported by National Institutes of Health Grant R01-HG2362-3, a Sloan Foundation Research Fellowship, and National Science Foundation Career Award CCF-0347992. \n\nAuthor contributions: J.D.M., M.I.J., and L.P. designed research; J.D.M., M.I.J., and L.P. performed research; J.D.M., M.I.J., and L.P. contributed new reagents/analytic tools; J.D.M. analyzed data; and J.D.M. wrote the paper.\n\nPublished - PNAS-2005-McAuliffe-7900-5.pdf
Submitted - 0412012.pdf
Supplemental Material - 02790Fig3.pdf
", "abstract": "Sequence comparison across multiple organisms aids in the detection of regions under selection. However, resource limitations require a prioritization of genomes to be sequenced. This prioritization should be grounded in two considerations: the lineal scope encompassing the biological phenomena of interest, and the optimal species within that scope for detecting functional elements. We introduce a statistical framework for optimal species subset selection, based on maximizing power to detect conserved sites. Analysis of a phylogenetic star topology shows theoretically that the optimal species subset is not in general the most evolutionarily diverged subset. We then demonstrate this finding empirically in a study of vertebrate species. Our results suggest that marsupials are prime sequencing candidates.", "date": "2017-03-07", "date_type": "published", "publication": "Proceedings of the National Academy of Sciences of the United States of America", "volume": "102", "number": "22", "publisher": "National Academy of Sciences", "pagerange": "7900-7905", "id_number": "CaltechAUTHORS:20170307-090057219", "issn": "0027-8424", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170307-090057219", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NIH", "grant_number": "R01-HG2362-3" }, { "agency": "Alfred P. 
Sloan Foundation" }, { "agency": "NSF", "grant_number": "CCF-0347992" } ] }, "doi": "10.1073/pnas.0502790102", "pmcid": "PMC1142384", "primary_object": { "basename": "0412012.pdf", "url": "https://authors.library.caltech.edu/records/nbtgs-bmj74/files/0412012.pdf" }, "related_objects": [ { "basename": "PNAS-2005-McAuliffe-7900-5.pdf", "url": "https://authors.library.caltech.edu/records/nbtgs-bmj74/files/PNAS-2005-McAuliffe-7900-5.pdf" }, { "basename": "02790Fig3.pdf", "url": "https://authors.library.caltech.edu/records/nbtgs-bmj74/files/02790Fig3.pdf" } ], "resource_type": "article", "pub_year": "2017", "author_list": "McAuliffe, Jon D.; Jordan, Michael I.; et al." }, { "id": "https://authors.library.caltech.edu/records/fw493-kaf67", "eprint_id": 95256, "eprint_status": "archive", "datestamp": "2023-08-19 00:27:32", "lastmod": "2023-10-20 19:02:42", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Pimentel-H", "name": { "family": "Pimentel", "given": "Harold" } }, { "id": "Sturmfels-P", "name": { "family": "Sturmfels", "given": "Pascal" } }, { "id": "Bray-N-L", "name": { "family": "Bray", "given": "Nicolas" } }, { "id": "Melsted-P", "name": { "family": "Melsted", "given": "P\u00e1ll" }, "orcid": "0000-0002-8418-6724" }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "The Lair: a resource for exploratory analysis of published RNA-Seq data", "ispublished": "pub", "full_text_status": "public", "keywords": "RNA-Seq, Sequence read archive, Exploratory data analysis, Shiny, Interactive visualization, Reanalysis, Reproducibility, Kallisto, Sleuth", "note": "\u00a9 2016 The Author(s). 
This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. \n\nReceived: 3 June 2016; Accepted: 19 November 2016; Published: 1 December 2016. \n\nWe thank the members of the Pachter Lab for contributing design and feature ideas and for suggesting the initial datasets with which to prototype The Lair. We thank the reviewers for testing our software and providing helpful comments and suggestions. \n\nHP and LP were partially supported by NIH grants R01 HG006129, R01 DK094699 and R01 HG008164. \n\nAvailability of data and materials: The main user website and database is at http://pachterlab.github.io/lair. The workflow software is at https://github.com/pachterlab/bears_analyses and the website code is at http://pachterlab.github.io/lair. Access to all webpages is free of charge. All software is released under GPLv3. \n\nAuthors' contributions: HP, NB, PM and LP conceived the idea. HP and PS developed The Lair system. HP, PS and LP analyzed data. HP and LP wrote the manuscript. All authors read and approved the final manuscript. \n\nAuthors' information: Not applicable.\n\nThe authors declare that they have no competing interests. \n\nConsent for publication: Not applicable. \n\nEthics approval and consent to participate: Not applicable.\n\nPublished - 12859_2016_Article_1357.pdf
", "abstract": "Increased emphasis on reproducibility of published research in the last few years has led to the large-scale archiving of sequencing data. While this data can, in theory, be used to reproduce results in papers, it is difficult to use in practice. We introduce a series of tools for processing and analyzing RNA-Seq data in the Sequence Read Archive, that together have allowed us to build an easily extendable resource for analysis of data underlying published papers. Our system makes the exploration of data easily accessible and usable without technical expertise. Our database and associated tools can be accessed at The Lair: http://pachterlab.github.io/lair", "date": "2016-12-01", "date_type": "published", "publication": "BMC Bioinformatics", "volume": "17", "number": "1", "publisher": "BioMed Central", "pagerange": "Art. No. 490", "id_number": "CaltechAUTHORS:20190506-141348569", "issn": "1471-2105", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20190506-141348569", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NIH", "grant_number": "R01 HG006129" }, { "agency": "NIH", "grant_number": "R01 DK094699" }, { "agency": "NIH", "grant_number": "R01 HG008164" } ] }, "doi": "10.1186/s12859-016-1357-2", "pmcid": "PMC5131447", "primary_object": { "basename": "12859_2016_Article_1357.pdf", "url": "https://authors.library.caltech.edu/records/fw493-kaf67/files/12859_2016_Article_1357.pdf" }, "resource_type": "article", "pub_year": "2016", "author_list": "Pimentel, Harold; Sturmfels, Pascal; et el." 
}, { "id": "https://authors.library.caltech.edu/records/cz0cp-gge47", "eprint_id": 74808, "eprint_status": "archive", "datestamp": "2023-08-22 19:13:45", "lastmod": "2023-10-24 23:18:42", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Fu-Audrey-Qiuyan", "name": { "family": "Fu", "given": "Audrey Qiuyan" } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "Estimating intrinsic and extrinsic noise from single-cell gene expression measurements", "ispublished": "pub", "full_text_status": "public", "keywords": "gene expression; noise; optimal estimators; single cell", "note": "\u00a9 2016 Walter de Gruyter GmbH, Berlin/Boston. \n\nPublished Online: 2016-11-22; Published in Print: 2016-12-01.\n\nThis project began as a result of discussion during a journal club meeting of Prof. Jonathan Pritchard's group that A.F. was attending. We thank Michael Elowitz, Peter Swain, Nam Ki Lee and Sora Yang for sharing their data from Elowitz et al. (2002) and from Yang et al. (2014), respectively. We also thank helpful comments we have received since posting the manuscript online. In particular, we thank Arjun Raj for bringing up the 1- vs 2-copy experiment, and Erik van Nimwegen for helpful discussions. We also thank Editor in Chief Prof. Michael Stumpf and two anonymous reviewers for insightful comments that led to a significantly enriched version. A.F. was partially supported by K99 HG007368 and R00 HG007368 (NIH/NHGRI). L.P. was partially supported by NIH grants R01 HG006129 and R01 DK094699.\n\nAccepted Version - nihms873109.pdf
Submitted - 1601.03334.pdf
", "abstract": "Gene expression is stochastic and displays variation (\"noise\") both within and between cells. Intracellular (intrinsic) variance can be distinguished from extracellular (extrinsic) variance by applying the law of total variance to data from two-reporter assays that probe expression of identically regulated gene pairs in single cells. We examine established formulas [Elowitz, M. B., A. J. Levine, E. D. Siggia and P. S. Swain (2002): \"Stochastic gene expression in a single cell,\" Science, 297, 1183\u20131186.] for the estimation of intrinsic and extrinsic noise and provide interpretations of them in terms of a hierarchical model. This allows us to derive alternative estimators that minimize bias or mean squared error. We provide a geometric interpretation of these results that clarifies the interpretation in [Elowitz, M. B., A. J. Levine, E. D. Siggia and P. S. Swain (2002): \"Stochastic gene expression in a single cell,\" Science, 297, 1183\u20131186.]. We also demonstrate through simulation and re-analysis of published data that the distribution assumptions underlying the hierarchical model have to be satisfied for the estimators to produce sensible results, which highlights the importance of normalization.", "date": "2016-12", "date_type": "published", "publication": "Statistical Applications in Genetics and Molecular Biology", "volume": "15", "number": "6", "publisher": "De Gruyter", "pagerange": "447-471", "id_number": "CaltechAUTHORS:20170306-150123048", "issn": "2194-6302", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170306-150123048", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NIH", "grant_number": "K99 HG007368" }, { "agency": "NIH", "grant_number": "R00 HG007368" }, { "agency": "NIH", "grant_number": "R01 HG006129" }, { "agency": "NIH", "grant_number": "R01 DK094699" } ] }, "doi": "10.1515/sagmb-2016-0002", 
"pmcid": "PMC5518956", "primary_object": { "basename": "1601.03334.pdf", "url": "https://authors.library.caltech.edu/records/cz0cp-gge47/files/1601.03334.pdf" }, "related_objects": [ { "basename": "_Statistical_Applications_in_Genetics_and_Molecular_Biology__Estimating_intrinsic_and_extrinsic_noise_from_single-cell_gene_expression_measurements.pdf", "url": "https://authors.library.caltech.edu/records/cz0cp-gge47/files/_Statistical_Applications_in_Genetics_and_Molecular_Biology__Estimating_intrinsic_and_extrinsic_noise_from_single-cell_gene_expression_measurements.pdf" }, { "basename": "nihms873109.pdf", "url": "https://authors.library.caltech.edu/records/cz0cp-gge47/files/nihms873109.pdf" } ], "resource_type": "article", "pub_year": "2016", "author_list": "Fu, Audrey Qiuyan and Pachter, Lior" }, { "id": "https://authors.library.caltech.edu/records/pv1dd-91s03", "eprint_id": 95221, "eprint_status": "archive", "datestamp": "2023-08-22 19:01:41", "lastmod": "2023-10-20 18:41:12", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Chen-Xi-BIO", "name": { "family": "Chen", "given": "Xi" }, "orcid": "0000-0003-2648-3146" }, { "id": "Love-J-C", "name": { "family": "Love", "given": "J. Christopher" } }, { "id": "Navin-N-E", "name": { "family": "Navin", "given": "Nicholas E." } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" }, { "id": "Stubbington-M-J-T", "name": { "family": "Stubbington", "given": "Michael J. T." } }, { "id": "Svensson-V", "name": { "family": "Svensson", "given": "Valentine" }, "orcid": "0000-0002-9217-2330" }, { "id": "Sweedler-J-V", "name": { "family": "Sweedler", "given": "Jonathan V." } }, { "id": "Teichmann-S-A", "name": { "family": "Teichmann", "given": "Sarah A." 
}, "orcid": "0000-0002-6294-6366" } ] }, "title": "Single-cell analysis at the threshold", "ispublished": "pub", "full_text_status": "restricted", "keywords": "Cell biology; Cytological techniques", "note": "\u00a9 2016 Nature Publishing Group. \n\nPublished 08 November 2016.", "abstract": "A discussion of some of the challenges and promise of single-cell technology.", "date": "2016-11", "date_type": "published", "publication": "Nature Biotechnology", "volume": "34", "number": "11", "publisher": "Nature Publishing Group", "pagerange": "1111-1118", "id_number": "CaltechAUTHORS:20190503-153138885", "issn": "1087-0156", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20190503-153138885", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "doi": "10.1038/nbt.3721", "resource_type": "article", "pub_year": "2016", "author_list": "Chen, Xi; Love, J. Christopher; et al." }, { "id": "https://authors.library.caltech.edu/records/jj12q-40t75", "eprint_id": 75011, "eprint_status": "archive", "datestamp": "2023-08-22 18:49:02", "lastmod": "2023-10-25 14:39:32", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Hateley-S", "name": { "family": "Hateley", "given": "Shannon" } }, { "id": "Hosamani-R", "name": { "family": "Hosamani", "given": "Ravikumar" } }, { "id": "Bhardwaj-S-R", "name": { "family": "Bhardwaj", "given": "Shilpa R." } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" }, { "id": "Bhattacharya-S", "name": { "family": "Bhattacharya", "given": "Sharmila" } } ] }, "title": "Transcriptomic response of Drosophila melanogaster pupae developed in hypergravity", "ispublished": "pub", "full_text_status": "public", "keywords": "Hypergravity; Drosophila melanogaster; Pupae; Transcriptome; Metamorphosis; RNA-Seq", "note": "\u00a9 2016 Published by Elsevier Inc. 
This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). \n\nReceived 11 May 2016, Revised 12 August 2016, Accepted 8 September 2016, Available online 10 September 2016. \n\nThis work was funded by NASA grants to SB (NNX15AB42G and NNX13AN38G). RH was supported by a NASA Post-Doctoral Program (NPP) Fellowship. SH was supported by the NSF Graduate Research Fellowship. This work used the Vincent J. Coates Genomics Sequencing Laboratory at UC Berkeley, supported by NIH S10 Instrumentation Grants S10RR029668 and S10RR027303. \n\nAuthors' contributions: SH, RH, and SB wrote the manuscript text. RH and SRB carried out the experiments. SH performed the sequencing library preparation. SH performed bioinformatic analysis including QC, mapping, expression estimates, and differential expression analysis; pathway analysis was performed jointly by SH and RH. RH and SRB performed qRT-PCR and corresponding analysis. SB and LP supervised the research and edited the manuscript text. All authors read and approved the final manuscript. \n\nThe authors declare that they have no competing interests.\n\nPublished - 1-s2.0-S088875431630088X-main.pdf
Supplemental Material - mmc1.pdf
Supplemental Material - mmc10.docx
Supplemental Material - mmc2.pdf
Supplemental Material - mmc3.pdf
Supplemental Material - mmc4.zip
Supplemental Material - mmc5.xlsx
Supplemental Material - mmc6.txt
Supplemental Material - mmc7.txt
Supplemental Material - mmc8.txt
Supplemental Material - mmc9.txt
", "abstract": "Altered gravity can perturb normal development and induce corresponding changes in gene expression. Understanding this relationship between the physical environment and a biological response is important for NASA's space travel goals. We use RNA-Seq and qRT-PCR techniques to profile changes in early Drosophila melanogaster pupae exposed to chronic hypergravity (3 g, or three times Earth's gravity). During the pupal stage, D. melanogaster rely upon gravitational cues for proper development. Assessing gene expression changes in the pupae under altered gravity conditions helps highlight gravity-dependent genetic pathways. A robust transcriptional response was observed in hypergravity-treated pupae compared to controls, with 1513 genes showing a significant (q < 0.05) difference in gene expression. Five major biological processes were affected: ion transport, redox homeostasis, immune response, proteolysis, and cuticle development. \n\nThis outlines the underlying molecular and biological changes occurring in Drosophila pupae in response to hypergravity; gravity is important for many biological processes on Earth.", "date": "2016-10", "date_type": "published", "publication": "Genomics", "volume": "108", "number": "3-4", "publisher": "Elsevier", "pagerange": "158-167", "id_number": "CaltechAUTHORS:20170309-154829565", "issn": "0888-7543", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170309-154829565", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NASA", "grant_number": "NNX15AB42G" }, { "agency": "NASA", "grant_number": "NNX13AN38G" }, { "agency": "NASA Postdoctoral Program" }, { "agency": "NSF Graduate Research Fellowship" }, { "agency": "NIH", "grant_number": "S10RR029668" }, { "agency": "NIH", "grant_number": "S10RR02730" } ] }, "doi": "10.1016/j.ygeno.2016.09.002", "primary_object": { "basename": "mmc5.xlsx", "url": 
"https://authors.library.caltech.edu/records/jj12q-40t75/files/mmc5.xlsx" }, "related_objects": [ { "basename": "mmc6.txt", "url": "https://authors.library.caltech.edu/records/jj12q-40t75/files/mmc6.txt" }, { "basename": "mmc8.txt", "url": "https://authors.library.caltech.edu/records/jj12q-40t75/files/mmc8.txt" }, { "basename": "mmc9.txt", "url": "https://authors.library.caltech.edu/records/jj12q-40t75/files/mmc9.txt" }, { "basename": "1-s2.0-S088875431630088X-main.pdf", "url": "https://authors.library.caltech.edu/records/jj12q-40t75/files/1-s2.0-S088875431630088X-main.pdf" }, { "basename": "mmc10.docx", "url": "https://authors.library.caltech.edu/records/jj12q-40t75/files/mmc10.docx" }, { "basename": "mmc2.pdf", "url": "https://authors.library.caltech.edu/records/jj12q-40t75/files/mmc2.pdf" }, { "basename": "mmc4.zip", "url": "https://authors.library.caltech.edu/records/jj12q-40t75/files/mmc4.zip" }, { "basename": "mmc1.pdf", "url": "https://authors.library.caltech.edu/records/jj12q-40t75/files/mmc1.pdf" }, { "basename": "mmc3.pdf", "url": "https://authors.library.caltech.edu/records/jj12q-40t75/files/mmc3.pdf" }, { "basename": "mmc7.txt", "url": "https://authors.library.caltech.edu/records/jj12q-40t75/files/mmc7.txt" } ], "resource_type": "article", "pub_year": "2016", "author_list": "Hateley, Shannon; Hosamani, Ravikumar; et el." }, { "id": "https://authors.library.caltech.edu/records/3dew4-m0c86", "eprint_id": 95224, "eprint_status": "archive", "datestamp": "2023-08-22 18:01:22", "lastmod": "2023-10-20 19:00:39", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Ntranos-V", "name": { "family": "Ntranos", "given": "Vasilis" }, "orcid": "0000-0002-2477-0670" }, { "id": "Kamath-G-M", "name": { "family": "Kamath", "given": "Govinda M." } }, { "id": "Zhang-Jesse-M", "name": { "family": "Zhang", "given": "Jesse M." 
} }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" }, { "id": "Tse-David-N", "name": { "family": "Tse", "given": "David N." } } ] }, "title": "Fast and accurate single-cell RNA-seq analysis by clustering of transcript-compatibility counts", "ispublished": "pub", "full_text_status": "public", "keywords": "Minimum Span Tree; Affinity Propagation; Read Alignment; Affinity Propagation Algorithm; Pairwise Distance Matrix", "note": "\u00a9 2016 Ntranos et al. Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. \n\nReceived: 24 February 2016; Accepted: 29 April 2016; Published: 26 May 2016. \n\nAvailability of data and materials: The code used to generate the results presented in this paper is available online on GitHub [49]. All sequencing reads for the Zeisel et al. dataset [7] are available through Gene Expression Omnibus [GEO:GSE60361] and for the Trapnell et al. dataset [12] through [GEO:GSE52529]. The method is publically available on GitHub (https://github.com/govinda-kamath/clustering_on_transcript_compatibility_counts) under the MIT license. \n\nEthics: No ethics approval was required for this study. \n\nWe thank P\u00e1ll Melsted for implementing the pseudo command in kallisto. This is the command that allows for direct output of transcript-compatibility counts via pseudoalignment. 
We would also like to thank Bo Li, Allon Wagner, and Nir Yosef for useful discussions about single-cell RNA-seq assays and their biases. \n\nThe authors declare that they have no competing interests. \n\nAuthors' contributions: VN, GMK, and JZ conceived the idea of clustering without quantification, performed analyses of data, analyzed and interpreted results, and wrote the manuscript. DNT and LP interpreted results, supervised the project, and wrote the manuscript. All authors read and approved the final manuscript. \n\nGMK and JZ are supported by the Center for Science of Information, an NSF Science and Technology Center, under grant agreement CCF-0939370. VN is supported in part by the Center for Science of Information and in part by a gift from Qualcomm Inc. LP is supported in part by the National Human Genome Research Institute of the National Institutes of Health under award number R01HG006129. DNT is supported in part by the Center of Science of Information and in part by the National Human Genome Research Institute of the National Institutes of Health under award number R01HG008164.\n\nPublished - 13059_2016_Article_970.pdf
Submitted - 036863.full.pdf
Supplemental Material - 13059_2016_970_MOESM1_ESM.pdf
", "abstract": "Current approaches to single-cell transcriptomic analysis are computationally intensive and require assay-specific modeling, which limits their scope and generality. We propose a novel method that compares and clusters cells based on their transcript-compatibility read counts rather than on the transcript or gene quantifications used in standard analysis pipelines. In the reanalysis of two landmark yet disparate single-cell RNA-seq datasets, we show that our method is up to two orders of magnitude faster than previous approaches, provides accurate and in some cases improved results, and is directly applicable to data from a wide variety of assays.", "date": "2016-05-26", "date_type": "published", "publication": "Genome Biology", "volume": "17", "number": "1", "publisher": "BioMed Central", "pagerange": "Art. No. 112", "id_number": "CaltechAUTHORS:20190503-155957743", "issn": "1474-760X", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20190503-155957743", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NSF", "grant_number": "CCF-0939370" }, { "agency": "Center for Science of Information (CSoI)" }, { "agency": "Qualcomm Inc." 
}, { "agency": "NIH", "grant_number": "R01HG006129" }, { "agency": "NIH", "grant_number": "R01HG008164" } ] }, "doi": "10.1186/s13059-016-0970-8", "pmcid": "PMC4881296", "primary_object": { "basename": "036863.full.pdf", "url": "https://authors.library.caltech.edu/records/3dew4-m0c86/files/036863.full.pdf" }, "related_objects": [ { "basename": "13059_2016_970_MOESM1_ESM.pdf", "url": "https://authors.library.caltech.edu/records/3dew4-m0c86/files/13059_2016_970_MOESM1_ESM.pdf" }, { "basename": "13059_2016_Article_970.pdf", "url": "https://authors.library.caltech.edu/records/3dew4-m0c86/files/13059_2016_Article_970.pdf" } ], "resource_type": "article", "pub_year": "2016", "author_list": "Ntranos, Vasilis; Kamath, Govinda M.; et al." }, { "id": "https://authors.library.caltech.edu/records/9cqwz-arv02", "eprint_id": 95246, "eprint_status": "archive", "datestamp": "2023-08-20 11:34:36", "lastmod": "2023-10-23 15:25:46", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Bray-N-L", "name": { "family": "Bray", "given": "Nicolas L." } }, { "id": "Pimentel-H", "name": { "family": "Pimentel", "given": "Harold" } }, { "id": "Melsted-P", "name": { "family": "Melsted", "given": "P\u00e1ll" }, "orcid": "0000-0002-8418-6724" }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "Near-optimal probabilistic RNA-seq quantification", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2016 Springer Nature Publishing AG. \n\nReceived 15 October 2015; Accepted 25 February 2016; Published 04 April 2016. \n\nN.L.B., H.P. and L.P. were partially funded by NIH R01 HG006129. P.M. was partially funded by a Fulbright fellowship. \n\nAuthor Contributions: N.L.B. and L.P. developed the concept of pseudoalignment and conceived the idea for applying it to RNA-seq quantification. P.M. conceived the implementation using De Bruijn graphs. N.L.B., H.P., P.M. and L.P. 
designed the kallisto software and N.L.B. implemented a prototype. H.P. and P.M. wrote the current kallisto implementation. N.B. and H.P. automated production of the results. N.L.B., H.P., P.M. and L.P. analyzed results and wrote the paper. \n\nThe authors declare no competing financial interests.\n\nIn the version of this article initially published, in the HTML version only, the equation \"\u03b1_tN > 0.01\" was written as \"\u03b1_(tN) > 0.01.\" In addition, in the Figure 1 legend, the formatting of the nodes was incorrect (v_1, etc., rather than v1). The errors have been corrected in the HTML and PDF versions of the article.\n\nSupplemental Material - nbt.3519-S1.pdf
Supplemental Material - nbt.3519-S2.xlsx
Supplemental Material - nbt.3519-S3.xlsx
Supplemental Material - nbt.3519-S4.xlsx
Supplemental Material - nbt.3519-S5.zip
Supplemental Material - nbt.3519-S6.zip
Supplemental Material - nbt.3519-SF1.jpg
Supplemental Material - nbt.3519-SF10.jpg
Supplemental Material - nbt.3519-SF11.jpg
Supplemental Material - nbt.3519-SF2.jpg
Supplemental Material - nbt.3519-SF3.jpg
Supplemental Material - nbt.3519-SF4.jpg
Supplemental Material - nbt.3519-SF5.jpg
Supplemental Material - nbt.3519-SF6.jpg
Supplemental Material - nbt.3519-SF7.jpg
Supplemental Material - nbt.3519-SF8.jpg
Supplemental Material - nbt.3519-SF9.jpg
", "abstract": "We present kallisto, an RNA-seq quantification program that is two orders of magnitude faster than previous approaches and achieves similar accuracy. Kallisto pseudoaligns reads to a reference, producing a list of transcripts that are compatible with each read while avoiding alignment of individual bases. We use kallisto to analyze 30 million unaligned paired-end RNA-seq reads in <10 min on a standard laptop computer. This removes a major computational bottleneck in RNA-seq analysis.", "date": "2016-05", "date_type": "published", "publication": "Nature Biotechnology", "volume": "34", "number": "5", "publisher": "Nature Publishing Group", "pagerange": "525-527", "id_number": "CaltechAUTHORS:20190506-110012992", "issn": "1087-0156", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20190506-110012992", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NIH", "grant_number": "R01 HG006129" }, { "agency": "Fulbright Foundation" } ] }, "doi": "10.1038/nbt.3519", "primary_object": { "basename": "nbt.3519-SF10.jpg", "url": "https://authors.library.caltech.edu/records/9cqwz-arv02/files/nbt.3519-SF10.jpg" }, "related_objects": [ { "basename": "nbt.3519-SF4.jpg", "url": "https://authors.library.caltech.edu/records/9cqwz-arv02/files/nbt.3519-SF4.jpg" }, { "basename": "nbt.3519-SF5.jpg", "url": "https://authors.library.caltech.edu/records/9cqwz-arv02/files/nbt.3519-SF5.jpg" }, { "basename": "nbt.3519-S4.xlsx", "url": "https://authors.library.caltech.edu/records/9cqwz-arv02/files/nbt.3519-S4.xlsx" }, { "basename": "nbt.3519-S5.zip", "url": "https://authors.library.caltech.edu/records/9cqwz-arv02/files/nbt.3519-S5.zip" }, { "basename": "nbt.3519-SF3.jpg", "url": "https://authors.library.caltech.edu/records/9cqwz-arv02/files/nbt.3519-SF3.jpg" }, { "basename": "nbt.3519-SF7.jpg", "url": 
"https://authors.library.caltech.edu/records/9cqwz-arv02/files/nbt.3519-SF7.jpg" }, { "basename": "nbt.3519-SF8.jpg", "url": "https://authors.library.caltech.edu/records/9cqwz-arv02/files/nbt.3519-SF8.jpg" }, { "basename": "nbt.3519-S2.xlsx", "url": "https://authors.library.caltech.edu/records/9cqwz-arv02/files/nbt.3519-S2.xlsx" }, { "basename": "nbt.3519-S3.xlsx", "url": "https://authors.library.caltech.edu/records/9cqwz-arv02/files/nbt.3519-S3.xlsx" }, { "basename": "nbt.3519-S6.zip", "url": "https://authors.library.caltech.edu/records/9cqwz-arv02/files/nbt.3519-S6.zip" }, { "basename": "nbt.3519-SF1.jpg", "url": "https://authors.library.caltech.edu/records/9cqwz-arv02/files/nbt.3519-SF1.jpg" }, { "basename": "nbt.3519-SF11.jpg", "url": "https://authors.library.caltech.edu/records/9cqwz-arv02/files/nbt.3519-SF11.jpg" }, { "basename": "nbt.3519-SF2.jpg", "url": "https://authors.library.caltech.edu/records/9cqwz-arv02/files/nbt.3519-SF2.jpg" }, { "basename": "nbt.3519-SF6.jpg", "url": "https://authors.library.caltech.edu/records/9cqwz-arv02/files/nbt.3519-SF6.jpg" }, { "basename": "nbt.3519-S1.pdf", "url": "https://authors.library.caltech.edu/records/9cqwz-arv02/files/nbt.3519-S1.pdf" }, { "basename": "nbt.3519-SF9.jpg", "url": "https://authors.library.caltech.edu/records/9cqwz-arv02/files/nbt.3519-SF9.jpg" } ], "resource_type": "article", "pub_year": "2016", "author_list": "Bray, Nicolas L.; Pimentel, Harold; et el." }, { "id": "https://authors.library.caltech.edu/records/6h7fw-ebc29", "eprint_id": 74705, "eprint_status": "archive", "datestamp": "2023-08-20 10:02:35", "lastmod": "2023-10-24 22:52:13", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Pimentel-H", "name": { "family": "Pimentel", "given": "Harold" } }, { "id": "Parra-M", "name": { "family": "Parra", "given": "Marilyn" } }, { "id": "Gee-S-L", "name": { "family": "Gee", "given": "Sherry L." 
} }, { "id": "Mohandas-N", "name": { "family": "Mohandas", "given": "Narla" } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" }, { "id": "Conboy-J-G", "name": { "family": "Conboy", "given": "John G." } } ] }, "title": "A dynamic intron retention program enriched in RNA processing genes regulates gene expression during terminal erythropoiesis", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. \n\nReceived April 10, 2015; Revised October 5, 2015; Accepted October 21, 2015. \n\nAuthor Contributions: J.G.C. and L.P. designed the research; H.P., M.P., S.L.G. performed research and analyzed data; and J.G.C., H.P., N.M., and L.P. wrote the article. \n\nFUNDING: National Institutes of Health [DK094699 to J.G.C., L.P.; DK32094 and DK26263 to N.M.]; Director, Office of Science and Office of Biological & Environmental Research of the US Department of Energy [DE-AC02-05CH1123]. Funding for open access charge: National Institutes of Health. \n\nConflict of interest statement. None declared.\n\nPublished - gkv1168.pdf
", "abstract": "Differentiating erythroblasts execute a dynamic alternative splicing program shown here to include extensive and diverse intron retention (IR) events. Cluster analysis revealed hundreds of developmentally-dynamic introns that exhibit increased IR in mature erythroblasts, and are enriched in functions related to RNA processing such as SF3B1 spliceosomal factor. Distinct, developmentally-stable IR clusters are enriched in metal-ion binding functions and include mitoferrin genes SLC25A37 and SLC25A28 that are critical for iron homeostasis. Some IR transcripts are abundant, e.g. comprising \u223c50% of highly-expressed SLC25A37 and SF3B1 transcripts in late erythroblasts, and thereby limiting functional mRNA levels. IR transcripts tested were predominantly nuclear-localized. Splice site strength correlated with IR among stable but not dynamic intron clusters, indicating distinct regulation of dynamically-increased IR in late erythroblasts. Retained introns were preferentially associated with alternative exons with premature termination codons (PTCs). High IR was observed in disease-causing genes including SF3B1 and the RNA binding protein FUS. Comparative studies demonstrated that the intron retention program in erythroblasts shares features with other tissues but ultimately is unique to erythropoiesis. 
We conclude that IR is a multi-dimensional set of processes that post-transcriptionally regulate diverse gene groups during normal erythropoiesis, misregulation of which could be responsible for human disease.", "date": "2016-01-29", "date_type": "published", "publication": "Nucleic Acids Research", "volume": "44", "number": "2", "publisher": "Oxford University Press", "pagerange": "838-851", "id_number": "CaltechAUTHORS:20170303-131213123", "issn": "0305-1048", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170303-131213123", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NIH", "grant_number": "DK094699" }, { "agency": "NIH", "grant_number": "DK32094" }, { "agency": "NIH", "grant_number": "DK26263" }, { "agency": "Department of Energy (DOE)", "grant_number": "DE-AC02-05CH11231" } ] }, "doi": "10.1093/nar/gkv1168", "pmcid": "PMC4737145", "primary_object": { "basename": "gkv1168.pdf", "url": "https://authors.library.caltech.edu/records/6h7fw-ebc29/files/gkv1168.pdf" }, "resource_type": "article", "pub_year": "2016", "author_list": "Pimentel, Harold; Parra, Marilyn; et al." }, { "id": "https://authors.library.caltech.edu/records/2n03m-r5660", "eprint_id": 74702, "eprint_status": "archive", "datestamp": "2023-08-20 09:31:33", "lastmod": "2023-10-24 22:52:04", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Mittal-A", "name": { "family": "Mittal", "given": "Anuradha" } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" }, { "id": "Nelson-J-L", "name": { "family": "Nelson", "given": "J. Lee" } }, { "id": "Kj\u00e6rgaard-H", "name": { "family": "Kj\u00e6rgaard", "given": "Hanne" } }, { "id": "Smed-M-K", "name": { "family": "Smed", "given": "Mette Kiel" } }, { "id": "Gildengorin-V-L", "name": { "family": "Gildengorin", "given": "Virginia L." 
} }, { "id": "Zoffmann-V", "name": { "family": "Zoffmann", "given": "Vibeke" } }, { "id": "Hetland-M-L", "name": { "family": "Hetland", "given": "Merete Lund" } }, { "id": "Jewell-N-P", "name": { "family": "Jewell", "given": "Nicholas P." } }, { "id": "Olsen-J", "name": { "family": "Olsen", "given": "J\u00f8rn" } }, { "id": "Jawaheer-D", "name": { "family": "Jawaheer", "given": "Damini" } } ] }, "title": "Pregnancy-Induced Changes in Systemic Gene Expression among Healthy Women and Women with Rheumatoid Arthritis", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2015 Mittal et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. \n\nReceived: September 24, 2015; Accepted: November 30, 2015; Published: December 18, 2015. \n\nThis work was supported in part by funds from the National Institute of Arthritis and Musculoskeletal and Skin Diseases (NIAMS, http://www.niams.nih.gov/), USA (grant R21AR057931); and Gigtforeningen (https://www.gigtforeningen.dk/), Denmark (grant R87-A1477-B512). Author DJ received the funding. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. \n\nWe are grateful to the study subjects for their participation in the study. \n\nThe Rheumatology departments at the following hospitals in Denmark facilitated collection of data and samples: Rigshospitalet (Glostrup), Odense Universitetshospital, Kong Christian X's Gigthospital (Gr\u00e5sten), \u00c5rhus University Hospital NBG and Regionshospitalet Viborg. 
\n\nWe thank Anne-Grethe Rasmussen, Charlotte Sch\u00f6n Frengler, Dorte Heide, Randi Petersen, Tove Thorup Rasmussen, Lone Thomasen, Britta Hvidberg Nielsen, Teresa Rozenfeldt, Kirsten Junker, Lis Kastberg Schubert, Lis Lund, and Jette Barlach for their contribution with data and sample collection, and Rikke Godtkj\u00e6r Andersen, Mie Rasmussen and Tashnia Hossain for management of data and samples. \n\nWe also greatly appreciate valuable assistance provided by Majbritt Norman Nielsen, and DANBIO personnel. \n\nDr. Hanne Kj\u00e6rgaard passed away before the submission of the final version of this manuscript. Dr. Damini Jawaheer accepts responsibility for the integrity and validity of the data collected and analyzed. \n\nAuthor Contributions: Conceived and designed the experiments: AM JLN HK NPJ JO DJ. Performed the experiments: HK MKS VZ DJ. Analyzed the data: AM LP VLG NPJ DJ. Contributed reagents/materials/analysis tools: AM LP HK MKS VZ MLH NPJ DJ. Wrote the paper: AM LP JLN MKS VLG VZ MLH NPJ JO DJ. \n\nData Availability: Data are governed by Danish privacy laws. The authors are legally forbidden from publicly sharing data under the terms of their agreement with the Danish Data Protection Agency. Data are available upon request to the corresponding author, after approval is granted by the Danish Data Protection Agency. \n\nThe authors have declared that no competing interests exist.\n\nPublished - journal.pone.0145204.PDF
Supplemental Material - journal.pone.0145204.s001.TIF
Supplemental Material - journal.pone.0145204.s002.TIF
", "abstract": "Background: Pregnancy induces drastic biological changes systemically, and has a beneficial effect on some autoimmune conditions such as rheumatoid arthritis (RA). However, specific systemic changes that occur as a result of pregnancy have not been thoroughly examined in healthy women or women with RA. The goal of this study was to identify genes with expression patterns associated with pregnancy, compared to pre-pregnancy as baseline and determine whether those associations are modified by presence of RA. \n\nResults: In our RNA sequencing (RNA-seq) dataset from 5 healthy women and 20 women with RA, normalized expression levels of 4,710 genes were significantly associated with pregnancy status (pre-pregnancy, first, second and third trimesters) over time, irrespective of presence of RA (False Discovery Rate (FDR)-adjusted p value<0.05). These genes were enriched in pathways spanning multiple systems, as would be expected during pregnancy. A subset of these genes (n = 256) showed greater than two-fold change in expression during pregnancy compared to baseline levels, with distinct temporal trends through pregnancy. Another 98 genes involved in various biological processes including immune regulation exhibited expression patterns that were differentially associated with pregnancy in the presence or absence of RA. \n\nConclusions: Our findings support the hypothesis that the maternal immune system plays an active role during pregnancy, and also provide insight into other systemic changes that occur in the maternal transcriptome during pregnancy compared to the pre-pregnancy state. Only a small proportion of genes modulated by pregnancy were influenced by presence of RA in our data.", "date": "2015-12-18", "date_type": "published", "publication": "PLOS ONE", "volume": "10", "number": "12", "publisher": "Public Library of Science", "pagerange": "Art. No. 
e0145204", "id_number": "CaltechAUTHORS:20170303-124807271", "issn": "1932-6203", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170303-124807271", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NIH", "grant_number": "R21AR057931" }, { "agency": "Gigtforeningen", "grant_number": "R87-A1477-B512" } ] }, "doi": "10.1371/journal.pone.0145204", "pmcid": "PMC4684291", "primary_object": { "basename": "journal.pone.0145204.PDF", "url": "https://authors.library.caltech.edu/records/2n03m-r5660/files/journal.pone.0145204.PDF" }, "related_objects": [ { "basename": "journal.pone.0145204.s001.TIF", "url": "https://authors.library.caltech.edu/records/2n03m-r5660/files/journal.pone.0145204.s001.TIF" }, { "basename": "journal.pone.0145204.s002.TIF", "url": "https://authors.library.caltech.edu/records/2n03m-r5660/files/journal.pone.0145204.s002.TIF" } ], "resource_type": "article", "pub_year": "2015", "author_list": "Mittal, Anuradha; Pachter, Lior; et el." }, { "id": "https://authors.library.caltech.edu/records/nrkwk-ccq57", "eprint_id": 74703, "eprint_status": "archive", "datestamp": "2023-08-20 09:26:46", "lastmod": "2023-10-24 22:52:09", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Hanchate-N-K", "name": { "family": "Hanchate", "given": "Naresh K." 
} }, { "id": "Kondoh-Kunio", "name": { "family": "Kondoh", "given": "Kunio" } }, { "id": "Lu-Zhonglua", "name": { "family": "Lu", "given": "Zhonglua" } }, { "id": "Kuand-Donghui", "name": { "family": "Kuang", "given": "Donghui" } }, { "id": "Ye-Xiaolan", "name": { "family": "Ye", "given": "Xiaolan" } }, { "id": "Qiu-Xiaojie", "name": { "family": "Qiu", "given": "Xiaojie" } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" }, { "id": "Trapnell-C", "name": { "family": "Trapnell", "given": "Cole" } }, { "id": "Buck-L-B", "name": { "family": "Buck", "given": "Linda B." } } ] }, "title": "Single-cell transcriptomics reveals receptor transformations during olfactory neurogenesis", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2015 American Association for the Advancement of Science. \n\n13 August 2015; accepted 27 October 2015. \n\nWe thank J. Delrow, A. Marty, and A. Dawson at the Fred Hutchinson Cancer Research Center (FHCRC) Genomics Facility for assistance with RNA-seq; M. Fitzgibbon and J. Davidson at the FHCRC Bioinformatics Resource for early assistance with sequence analyses; and J. Vasquez and the FHCRC Scientific Imaging Facility for help with confocal microscopy. We also thank members of the Buck laboratory for helpful discussions. This work was supported by the Howard Hughes Medical Institute (L.B.B.), NIH grants R01 DC009324 (L.B.B.) and DP2 HD088158 (C.T.), an Alfred P. Sloan Fellowship (C.T.), and a Dale F. Frey Award for Breakthrough Scientists from the Damon Runyon Cancer Research Foundation (C.T.). L.B.B. is on the Board of Directors of International Flavors & Fragrances. The supplementary materials contain additional data. N.K.H., C.T., and L.B.B. designed the research; N.K.H. and C.T. performed the research; N.K.H., C.T., K.K., Z.L., D.K., X.Y., X.Q., and L.B.B. analyzed the data; L.P. provided guidance; and N.K.H, C.T., and L.B.B. wrote the paper. 
Raw sequencing data related to this study have been archived in the Gene Expression Omnibus (GEO) database under accession number GSE75413 (available at www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE75413).\n\nAccepted Version - nihms909331.pdf
Supplemental Material - 04/science.aad2456.DC1/aad2456-Hanchate-SM.pdf
", "abstract": "The sense of smell allows chemicals to be perceived as diverse scents. We used single neuron RNA-Sequencing (RNA-Seq) to explore developmental mechanisms that shape this ability as nasal olfactory neurons mature in mice. Most mature neurons expressed only one of the roughly 1000 odorant receptor genes (Olfrs) available, and that at high levels. However, many immature neurons expressed low levels of multiple Olfrs. Coexpressed Olfrs localized to overlapping zones of the nasal epithelium, suggesting regional biases, but not to single genomic loci. A single immature neuron could express Olfrs from up to seven different chromosomes. The mature state in which expression of Olfr genes is restricted to one per neuron emerges over a developmental progression that appears independent of neuronal activity requiring sensory transduction molecules.", "date": "2015-12-04", "date_type": "published", "publication": "Science", "volume": "350", "number": "6265", "publisher": "American Association for the Advancement of Science", "pagerange": "1251-1255", "id_number": "CaltechAUTHORS:20170303-130217211", "issn": "0036-8075", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170303-130217211", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "Howard Hughes Medical Institute (HHMI)" }, { "agency": "NIH", "grant_number": "R01 DC009324" }, { "agency": "NIH", "grant_number": "DP2 HD088158" }, { "agency": "Alfred P. Sloan Foundation" }, { "agency": "Damon Runyon Cancer Research Foundation" } ] }, "doi": "10.1126/science.aad2456", "pmcid": "PMC5642900", "primary_object": { "basename": "nihms909331.pdf", "url": "https://authors.library.caltech.edu/records/nrkwk-ccq57/files/nihms909331.pdf" }, "resource_type": "article", "pub_year": "2015", "author_list": "Hanchate, Naresh K.; Kondoh, Kunio; et el." 
}, { "id": "https://authors.library.caltech.edu/records/bj6ys-b0g23", "eprint_id": 58956, "eprint_status": "archive", "datestamp": "2023-08-20 08:49:48", "lastmod": "2023-10-23 19:54:38", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Paten-B", "name": { "family": "Paten", "given": "Benedict" } }, { "id": "Diekhans-M", "name": { "family": "Diekhans", "given": "Mark" } }, { "id": "Druker-B-J", "name": { "family": "Druker", "given": "Brian J." } }, { "id": "Friend-S", "name": { "family": "Friend", "given": "Stephen" } }, { "id": "Guinney-J", "name": { "family": "Guinney", "given": "Justin" } }, { "id": "Gassner-N", "name": { "family": "Gassner", "given": "Nadine" } }, { "id": "Guttman-M", "name": { "family": "Guttman", "given": "Mitchell" }, "orcid": "0000-0003-4748-9352" }, { "id": "Kent-W-J", "name": { "family": "Kent", "given": "W. James" } }, { "id": "Mantey-P", "name": { "family": "Mantey", "given": "Patrick" } }, { "id": "Margolin-A-A", "name": { "family": "Margolin", "given": "Adam A." } }, { "id": "Massie-M", "name": { "family": "Massie", "given": "Matt" } }, { "id": "Novak-A-M", "name": { "family": "Novak", "given": "Adam M." } }, { "id": "Nothaft-F", "name": { "family": "Nothaft", "given": "Frank" } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" }, { "id": "Patterson-D", "name": { "family": "Patterson", "given": "David" } }, { "id": "Smuga-Otto-M", "name": { "family": "Smuga-Otto", "given": "Maciej" } }, { "id": "Stuart-J-M", "name": { "family": "Stuart", "given": "Joshua M." 
} }, { "id": "Van't-Veer-L", "name": { "family": "Van't Veer", "given": "Laura" } }, { "id": "Wold-B-J", "name": { "family": "Wold", "given": "Barbara" }, "orcid": "0000-0003-3235-8130" }, { "id": "Haussler-D", "name": { "family": "Haussler", "given": "David" } } ] }, "title": "The NIH BD2K center for big data in translational genomics", "ispublished": "pub", "full_text_status": "restricted", "note": "\u00a9 2015 The Author. Published by Oxford University Press on behalf of the American Medical Informatics Association. \n\nFirst published online: 14 July 2015. \n\nWe would like to thank the reviewers for their helpful comments and suggestions. \n\nFunding: This work was supported by the National Human Genome Research Institute of the National Institutes of Health under Award Number U54HG007990. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. \n\nContributors: BP and DH wrote the manuscript with assistance from other authors. FN and MS created the figures. All the authors edited the manuscript.", "abstract": "The world's genomics data will never be stored in a single repository \u2013 rather, it will be distributed among many sites in many countries. No one site will have enough data to explain genotype to phenotype relationships in rare diseases; therefore, sites must share data. To accomplish this, the genetics community must forge common standards and protocols to make sharing and computing data among many sites a seamless activity. Through the Global Alliance for Genomics and Health, we are pioneering the development of shared application programming interfaces (APIs) to connect the world's genome repositories. In parallel, we are developing an open source software stack (ADAM) that uses these APIs. This combination will create a cohesive genome informatics ecosystem. 
Using containers, we are facilitating the deployment of this software in a diverse array of environments. Through benchmarking efforts and big data driver projects, we are ensuring ADAM's performance and utility.", "date": "2015-11", "date_type": "published", "publication": "Journal of the American Medical Informatics Association", "volume": "22", "number": "6", "publisher": "American Medical Informatics Association", "pagerange": "1143-1147", "id_number": "CaltechAUTHORS:20150721-085701041", "issn": "1067-5027", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20150721-085701041", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "National Human Genome Research Institute", "grant_number": "U54HG007990" }, { "agency": "Howard Hughes Medical Institute (HHMI)" } ] }, "doi": "10.1093/jamia/ocv047", "pmcid": "PMC5009913", "resource_type": "article", "pub_year": "2015", "author_list": "Paten, Benedict; Diekhans, Mark; et el." }, { "id": "https://authors.library.caltech.edu/records/vg27x-zhk63", "eprint_id": 74706, "eprint_status": "archive", "datestamp": "2023-08-20 07:01:26", "lastmod": "2023-10-24 22:52:18", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Brat-D-J", "name": { "family": "Brat", "given": "Daniel J." } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "Comprehensive, Integrative Genomic Analysis of Diffuse Lower-Grade Gliomas", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2015 Massachusetts Medical Society. \n\nThis article was published on June 10, 2015, at NEJM.org. 
\n\nWe thank personnel at the genome sequencing centers and cancer genome characterization centers for data generation and analyses, members of the National Cancer Institute and National Human Genome Research Institute project teams for coordination of project activities, and personnel at the M.D. Anderson Functional Proteomics Core for RPPA data and analysis. \n\nThe authors are members of the Cancer Genome Atlas Research Network, and their names, affiliations, and roles are listed in Supplementary Appendix 1, available at NEJM.org. \n\nSupported by grants (U24CA143883, U24CA143858, U24CA143840, U24CA143799, U24CA143835, U24CA143845, U24CA143882, U24CA143867, U24CA143866, U24CA143848, U24CA144025, U54HG003067, U54HG003079, U54HG003273, U24CA126543, U24CA126544, U24CA126546, U24CA126551, U24CA126554, U24CA126561, U24CA126563, U24CA143731, U24CA143843, and P30CA016672) from the National Institutes of Health. \n\nDisclosure forms provided by the authors are available with the full text of this article at NEJM.org. \n\nThe views expressed in this article are those of the authors and do not reflect the official policy of the National Institutes of Health.\n\nAccepted Version - nihms711768.pdf
", "abstract": "BACKGROUND: Diffuse low-grade and intermediate-grade gliomas (which together make up the lower-grade gliomas, World Health Organization grades II and III) have highly variable clinical behavior that is not adequately predicted on the basis of histologic class. Some are indolent; others quickly progress to glioblastoma. The uncertainty is compounded by interobserver variability in histologic diagnosis. Mutations in IDH, TP53, and ATRX and codeletion of chromosome arms 1p and 19q (1p/19q codeletion) have been implicated as clinically relevant markers of lower-grade gliomas. \n\nMETHODS: We performed genomewide analyses of 293 lower-grade gliomas from adults, incorporating exome sequence, DNA copy number, DNA methylation, messenger RNA expression, microRNA expression, and targeted protein expression. These data were integrated and tested for correlation with clinical outcomes. \n\nRESULTS: Unsupervised clustering of mutations and data from RNA, DNA-copy-number, and DNA-methylation platforms uncovered concordant classification of three robust, nonoverlapping, prognostically significant subtypes of lower-grade glioma that were captured more accurately by IDH, 1p/19q, and TP53 status than by histologic class. Patients who had lower-grade gliomas with an IDH mutation and 1p/19q codeletion had the most favorable clinical outcomes. Their gliomas harbored mutations in CIC, FUBP1, NOTCH1, and the TERT promoter. Nearly all lower-grade gliomas with IDH mutations and no 1p/19q codeletion had mutations in TP53 (94%) and ATRX inactivation (86%). The large majority of lower-grade gliomas without an IDH mutation had genomic aberrations and clinical behavior strikingly similar to those found in primary glioblastoma. \n\nCONCLUSIONS: The integration of genomewide data from multiple platforms delineated three molecular classes of lower-grade gliomas that were more concordant with IDH, 1p/19q, and TP53 status than with histologic class. 
Lower-grade gliomas with an IDH mutation either had 1p/19q codeletion or carried a TP53 mutation. Most lower-grade gliomas without an IDH mutation were molecularly and clinically similar to glioblastoma. (Funded by the National Institutes of Health.)", "date": "2015-06-25", "date_type": "published", "publication": "New England Journal of Medicine", "volume": "372", "number": "26", "publisher": "Massachusetts Medical Society", "pagerange": "2481-2498", "id_number": "CaltechAUTHORS:20170303-132106100", "issn": "0028-4793", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170303-132106100", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NIH", "grant_number": "U24CA143883" }, { "agency": "NIH", "grant_number": "U24CA143858" }, { "agency": "NIH", "grant_number": "U24CA143840" }, { "agency": "NIH", "grant_number": "U24CA143799" }, { "agency": "NIH", "grant_number": "U24CA143835" }, { "agency": "NIH", "grant_number": "U24CA143845" }, { "agency": "NIH", "grant_number": "U24CA143882" }, { "agency": "NIH", "grant_number": "U24CA143867" }, { "agency": "NIH", "grant_number": "U24CA143866" }, { "agency": "NIH", "grant_number": "U24CA143848" }, { "agency": "NIH", "grant_number": "U24CA144025" }, { "agency": "NIH", "grant_number": "U54HG003067" }, { "agency": "NIH", "grant_number": "U54HG003079" }, { "agency": "NIH", "grant_number": "U54HG003273" }, { "agency": "NIH", "grant_number": "U24CA126543" }, { "agency": "NIH", "grant_number": "U24CA126544" }, { "agency": "NIH", "grant_number": "U24CA126546" }, { "agency": "NIH", "grant_number": "U24CA126551" }, { "agency": "NIH", "grant_number": "U24CA126554" }, { "agency": "NIH", "grant_number": "U24CA126561" }, { "agency": "NIH", "grant_number": "U24CA126563" }, { "agency": "NIH", "grant_number": "U24CA143731" }, { "agency": "NIH", "grant_number": "U24CA143843" }, { "agency": "NIH", "grant_number": "P30CA016672" } ] }, 
"corp_creators": { "items": [ "Cancer Genome Atlas Research Network" ] }, "doi": "10.1056/NEJMoa1402121", "pmcid": "PMC4530011", "primary_object": { "basename": "nihms711768.pdf", "url": "https://authors.library.caltech.edu/records/vg27x-zhk63/files/nihms711768.pdf" }, "resource_type": "article", "pub_year": "2015", "author_list": "Brat, Daniel J. and Pachter, Lior" }, { "id": "https://authors.library.caltech.edu/records/3y9vj-fab67", "eprint_id": 74707, "eprint_status": "archive", "datestamp": "2023-08-22 15:37:02", "lastmod": "2023-10-24 22:52:20", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Singer-M", "name": { "family": "Singer", "given": "Meromit" } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "Controlling for conservation in genome-wide DNA methylation studies", "ispublished": "pub", "full_text_status": "public", "keywords": "Averaging; Conservation; Comparative analysis; Missing data; DNA methylation; Junctions; Intron; Exon; Coding", "note": "\u00a9 Singer and Pachter; licensee BioMed Central. 2015. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. \n\nReceived: 15 April 2015. Accepted: 1 May 2015. Published: 30 May 2015. \n\nWe thank Yael Mandel-Gutfreund, Idit Kosti and Asaf Zemach for helpful feedback, as well as Nicolas Bray and other members of the Pachter lab for many insightful discussions. L.P. and M.S. were partially funded by NIH R01 HG006129. 
\n\nAuthors' contributions: MS and LP conceived the study and conducted the mathematical characterization, statistical analysis and design of correction methods. MS implemented the COMPARE software and conducted the data analysis. MS and LP wrote the manuscript. Both authors read and approved the final manuscript. \n\nThe authors declare that they have no competing interests.\n\nPublished - art_3A10.1186_2Fs12864-015-1604-3.pdf
Supplemental Material - 12864_2015_1604_MOESM1_ESM.pdf
Supplemental Material - 12864_2015_1604_MOESM2_ESM.pdf
", "abstract": "BACKGROUND: A commonplace analysis in high-throughput DNA methylation studies is the comparison of methylation extent between different functional regions, computed by averaging methylation states within region types and then comparing averages between regions. For example, it has been reported that methylation is more prevalent in coding regions as compared to their neighboring introns or UTRs, leading to hypotheses about novel forms of epigenetic regulation. \n\nRESULTS: We have identified and characterized a bias present in these seemingly straightforward comparisons that results in the false detection of differences in methylation intensities across region types. This bias arises due to differences in conservation rates, rather than methylation rates, and is broadly present in the published literature. When controlling for conservation at coding start sites the differences in DNA methylation rates disappear. Moreover, a re-evaluation of methylation rates at intron-exon junctions reveals that the magnitude of previously reported differences is greatly exaggerated. We introduce two correction methods to address this bias, an inference-based matrix completion algorithm and an averaging approach, tailored to address different underlying biological questions. We evaluate how analysis using these corrections affects the detection of differences in DNA methylation across functional boundaries. \n\nCONCLUSIONS: We report here on a bias in DNA methylation comparative studies that originates in conservation rate differences and manifests itself in the false discovery of differences in DNA methylation intensities and their extents. We have characterized this bias and its broad implications, and show how to control for it so as to enable the study of a variety of biological questions.", "date": "2015-05-30", "date_type": "published", "publication": "BMC Genomics", "volume": "16", "publisher": "BioMed Central", "pagerange": "Art. No. 
420", "id_number": "CaltechAUTHORS:20170303-133219740", "issn": "1471-2164", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170303-133219740", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NIH", "grant_number": "R01 HG006129" } ] }, "doi": "10.1186/s12864-015-1604-3", "pmcid": "PMC4448855", "primary_object": { "basename": "12864_2015_1604_MOESM1_ESM.pdf", "url": "https://authors.library.caltech.edu/records/3y9vj-fab67/files/12864_2015_1604_MOESM1_ESM.pdf" }, "related_objects": [ { "basename": "12864_2015_1604_MOESM2_ESM.pdf", "url": "https://authors.library.caltech.edu/records/3y9vj-fab67/files/12864_2015_1604_MOESM2_ESM.pdf" }, { "basename": "art_3A10.1186_2Fs12864-015-1604-3.pdf", "url": "https://authors.library.caltech.edu/records/3y9vj-fab67/files/art_3A10.1186_2Fs12864-015-1604-3.pdf" } ], "resource_type": "article", "pub_year": "2015", "author_list": "Singer, Meromit and Pachter, Lior" }, { "id": "https://authors.library.caltech.edu/records/qbbbd-k1850", "eprint_id": 74708, "eprint_status": "archive", "datestamp": "2023-08-22 15:23:14", "lastmod": "2023-10-24 22:52:23", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Singer-M", "name": { "family": "Singer", "given": "Meromit" } }, { "id": "Kosti-I", "name": { "family": "Kosti", "given": "Idit" } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" }, { "id": "Mandel-Gutfreund-Y", "name": { "family": "Mandel-Gutfreund", "given": "Yael" } } ] }, "title": "A diverse epigenetic landscape at human exons with implication for expression", "ispublished": "pub", "full_text_status": "public", "keywords": "dna methylation; exons; genes; histones; introns; methylation; epigenetics", "note": "\u00a9 The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research. 
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. \n\nReceived May 19, 2014; Revised January 31, 2015; Accepted February 16, 2015. \n\nWe would like to thank Asaf Hellman and Dvir Aran for helpful discussions. \n\nFUNDING: Israeli Science Foundation [1623/12 to Y.M.G.]. Funding for open access charge: Israeli Science Foundation [1623/12 to Y.M.G.]. \n\nConflict of interest statement. None declared.\n\nPublished - gkv153.pdf
", "abstract": "DNA methylation is an important epigenetic marker associated with gene expression regulation in eukaryotes. While promoter methylation is relatively well characterized, the role of intragenic DNA methylation remains unclear. Here, we investigated the relationship of DNA methylation at exons and flanking introns with gene expression and histone modifications generated from a human fibroblast cell-line and primary B cells. Consistent with previous work we found that intragenic methylation is positively correlated with gene expression and that exons are more highly methylated than their neighboring intronic environment. Intriguingly, in this study we identified a unique subset of hypomethylated exons that demonstrate significantly lower methylation levels than their surrounding introns. Furthermore, we observed a negative correlation between exon methylation and the density of the majority of histone modifications. Specifically, we demonstrate that hypo-methylated exons at highly expressed genes are associated with open chromatin and have a characteristic histone code comprised of significantly high levels of histone markings. Overall, our comprehensive analysis of the human exome supports the presence of regulatory hypomethylated exons in protein coding genes. 
In particular our results reveal a previously unrecognized diverse and complex role of the epigenetic landscape within the gene body.", "date": "2015-04-20", "date_type": "published", "publication": "Nucleic Acids Research", "volume": "43", "number": "7", "publisher": "Oxford University Press", "pagerange": "3498-3508", "id_number": "CaltechAUTHORS:20170303-133948010", "issn": "0305-1048", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170303-133948010", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "Israel Science Foundation", "grant_number": "1623/12" } ] }, "doi": "10.1093/nar/gkv153", "pmcid": "PMC4402514", "primary_object": { "basename": "gkv153.pdf", "url": "https://authors.library.caltech.edu/records/qbbbd-k1850/files/gkv153.pdf" }, "resource_type": "article", "pub_year": "2015", "author_list": "Singer, Meromit; Kosti, Idit; et el." }, { "id": "https://authors.library.caltech.edu/records/acq1h-k9e77", "eprint_id": 74709, "eprint_status": "archive", "datestamp": "2023-08-20 03:53:14", "lastmod": "2023-10-24 22:52:25", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Aviran-S", "name": { "family": "Aviran", "given": "Sharon" } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "Rational experiment design for sequencing-based RNA structure mapping", "ispublished": "pub", "full_text_status": "public", "keywords": "next-generation sequencing, RNA structure, structure mapping, genomic big data, high-throughput genomics", "note": "\u00a9 2014 Aviran and Pachter; Published by Cold Spring Harbor Laboratory Press for the RNA Society. This article is distributed exclusively by the RNA Society for the first 12 months after the full-issue publication date (see http://rnajournal.cshlp.org/site/misc/terms.xhtml). 
After 12 months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/. \n\nReceived December 8, 2013. Accepted September 7, 2014. \n\nWe thank Yiliang Ding, Julius Lucks, Shujun Luo, Stefanie Mortimer, and Silvi Rouskin for many discussions and for clarifications on the methods they developed. This work is supported by National Institutes of Health (NIH) grants R00 HG006860 to S.A. and R01 HG006129 to L.P.\n\nPublished - 1864.pdf
", "abstract": "Structure mapping is a classic experimental approach for determining nucleic acid structure that has gained renewed interest in recent years following advances in chemistry, genomics, and informatics. The approach encompasses numerous techniques that use different means to introduce nucleotide-level modifications in a structure-dependent manner. Modifications are assayed via cDNA fragment analysis, using electrophoresis or next-generation sequencing (NGS). The recent advent of NGS has dramatically increased the throughput, multiplexing capacity, and scope of RNA structure mapping assays, thereby opening new possibilities for genome-scale, de novo, and in vivo studies. From an informatics standpoint, NGS is more informative than prior technologies by virtue of delivering direct molecular measurements in the form of digital sequence counts. Motivated by these new capabilities, we introduce a novel model-based in silico approach for quantitative design of large-scale multiplexed NGS structure mapping assays, which takes advantage of the direct and digital nature of NGS readouts. We use it to characterize the relationship between controllable experimental parameters and the precision of mapping measurements. Our results highlight the complexity of these dependencies and shed light on relevant tradeoffs and pitfalls, which can be difficult to discern by intuition alone. We demonstrate our approach by quantitatively assessing the robustness of SHAPE-Seq measurements, obtained by multiplexing SHAPE (selective 2\u2032-hydroxyl acylation analyzed by primer extension) chemistry in conjunction with NGS. 
We then utilize it to elucidate design considerations in advanced genome-wide approaches for probing the transcriptome, which recently obtained in vivo information using dimethyl sulfate (DMS) chemistry.", "date": "2014-12", "date_type": "published", "publication": "RNA", "volume": "20", "number": "12", "publisher": "RNA Society", "pagerange": "1864-1877", "id_number": "CaltechAUTHORS:20170303-134607827", "issn": "1355-8382", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170303-134607827", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NIH", "grant_number": "R00 HG006860" }, { "agency": "NIH", "grant_number": "R01 HG006129" } ] }, "doi": "10.1261/rna.043844.113", "pmcid": "PMC4238353", "primary_object": { "basename": "1864.pdf", "url": "https://authors.library.caltech.edu/records/acq1h-k9e77/files/1864.pdf" }, "resource_type": "article", "pub_year": "2014", "author_list": "Aviran, Sharon and Pachter, Lior" }, { "id": "https://authors.library.caltech.edu/records/57pcw-w6930", "eprint_id": 74714, "eprint_status": "archive", "datestamp": "2023-08-20 01:21:16", "lastmod": "2023-10-20 23:05:13", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Forster-R", "name": { "family": "Forster", "given": "Ryan" } }, { "id": "Chiba-Kunitoshi", "name": { "family": "Chiba", "given": "Kunitoshi" } }, { "id": "Schaeffer-L-V", "name": { "family": "Schaeffer", "given": "Lorian" } }, { "id": "Regalado-S-G", "name": { "family": "Regalado", "given": "Samuel G." } }, { "id": "Lai-Christine-S", "name": { "family": "Lai", "given": "Christine S." } }, { "id": "Gao-Qing", "name": { "family": "Gao", "given": "Qing" } }, { "id": "Kiana-S", "name": { "family": "Kiani", "given": "Samira" } }, { "id": "Farin-H-F", "name": { "family": "Farin", "given": "Henner F." 
} }, { "id": "Clevers-H", "name": { "family": "Clevers", "given": "Hans" } }, { "id": "Cost-G-J", "name": { "family": "Cost", "given": "Gregory J." } }, { "id": "Chan-Andy", "name": { "family": "Chan", "given": "Andy" } }, { "id": "Rebar-E-J", "name": { "family": "Rebar", "given": "Edward J." } }, { "id": "Urnov-F-D", "name": { "family": "Urnov", "given": "Fyodor D." } }, { "id": "Gregory-P-D", "name": { "family": "Gregory", "given": "Philip D." } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" }, { "id": "Jaenisch-R", "name": { "family": "Jaenisch", "given": "Rudolf" } }, { "id": "Hockemeyer-D", "name": { "family": "Hockemeyer", "given": "Dirk" } } ] }, "title": "Human Intestinal Tissue with Adult Stem Cell Properties Derived from Pluripotent Stem Cells", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2014 The Authors. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/3.0/). \n\nReceived 24 November 2013, Revised 4 May 2014, Accepted 5 May 2014, Available online 3 June 2014. \n\nWe thank R. Alagappan, P. Xu, Dong Dong Wu, and Lei Zhang and the Sangamo Production group for expert technical assistance. We thank Frank Soldner, Thomas Sandmann, and Helen Bateup for helpful comments during the design of the experiment. We thank Nicki Watson from the Keck Imaging Facility at the Whitehead Institute for performing the EM analysis. R.F. is supported by the National Science Foundation Graduate Research Fellowship Program (GRFP) under grant DGE 1106400 and NIH training grant 2T32GM007232-36. K.C. was supported by a fellowship from the Nakajima Foundation. R.J. was supported by NIH grants R37-CA084198, RO1-CA087869, and RO1-HD045022, and by a grant from the HHMI. R.J. is an adviser to Stemgent and a cofounder of Fate Therapeutics. D.H. 
is a New Scholar in Aging of the Ellison Medical Foundation and is supported by the Glenn Foundation and the Shurl and Kay Curci Foundation. G.J.C., A.C., E.J.R., P.D.G., and F.D.U. are full-time employees of Sangamo BioSciences. \n\nAccession Numbers: The GEO accession number for the RNA-seq data reported in this paper is GSE56930.\n\nRyan Forster, Kunitoshi Chiba, Lorian Schaeffer, Samuel G. Regalado, Christine S. Lai, Qing Gao, Samira Kiani, Henner F. Farin, Hans Clevers, Gregory J. Cost, Andy Chan, Edward J. Rebar, Fyodor D. Urnov, Philip D. Gregory, Lior Pachter, Rudolf Jaenisch, Dirk Hockemeyer, Human Intestinal Tissue with Adult Stem Cell Properties Derived from Pluripotent Stem Cells, Stem Cell Reports, Volume 3, Issue 1, 8 July 2014, Page 215, ISSN 2213-6711, http://dx.doi.org/10.1016/j.stemcr.2014.06.014.\n(http://www.sciencedirect.com/science/article/pii/S2213671114002008)\n\nPublished - main.pdf
Supplemental Material - mmc1.pdf
Supplemental Material - mmc2.xls
Supplemental Material - mmc3.mp4
Erratum - 1-s2.0-S2213671114002008-main.pdf
", "abstract": "Genetically engineered human pluripotent stem cells (hPSCs) have been proposed as a source for transplantation therapies and are rapidly becoming valuable tools for human disease modeling. However, many applications are limited due to the lack of robust differentiation paradigms that allow for the isolation of defined functional tissues. Here, using an endogenous LGR5-GFP reporter, we derived adult stem cells from hPSCs that gave rise to functional human intestinal tissue comprising all major cell types of the intestine. Histological and functional analyses revealed that such human organoid cultures could be derived with high purity and with a composition and morphology similar to those of cultures obtained from human biopsies. Importantly, hPSC-derived organoids responded to the canonical signaling pathways that control self-renewal and differentiation in the adult human intestinal stem cell compartment. This adult stem cell system provides a platform for studying human intestinal disease in vitro using genetically engineered hPSCs.", "date": "2014-06-03", "date_type": "published", "publication": "Stem Cell Reports", "volume": "2", "number": "6", "publisher": "Elsevier", "pagerange": "838-852", "id_number": "CaltechAUTHORS:20170303-141031221", "issn": "2213-6711", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170303-141031221", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NSF Graduate Research Fellowship", "grant_number": "DGE-1106400" }, { "agency": "NIH Predoctoral Fellowship", "grant_number": "2T32GM007232-36" }, { "agency": "Nakajima Foundation" }, { "agency": "NIH", "grant_number": "R37-CA084198" }, { "agency": "NIH", "grant_number": "RO1-CA087869" }, { "agency": "NIH", "grant_number": "RO1-HD045022" }, { "agency": "Howard Hughes Medical Institute (HHMI)" }, { "agency": "Ellison Medical Foundation" }, { "agency": "Glenn 
Foundation" }, { "agency": "Shurl and Kay Curci Foundation" } ] }, "doi": "10.1016/j.stemcr.2014.05.001", "pmcid": "PMC4050346", "primary_object": { "basename": "1-s2.0-S2213671114002008-main.pdf", "url": "https://authors.library.caltech.edu/records/57pcw-w6930/files/1-s2.0-S2213671114002008-main.pdf" }, "related_objects": [ { "basename": "main.pdf", "url": "https://authors.library.caltech.edu/records/57pcw-w6930/files/main.pdf" }, { "basename": "mmc1.pdf", "url": "https://authors.library.caltech.edu/records/57pcw-w6930/files/mmc1.pdf" }, { "basename": "mmc2.xls", "url": "https://authors.library.caltech.edu/records/57pcw-w6930/files/mmc2.xls" }, { "basename": "mmc3.mp4", "url": "https://authors.library.caltech.edu/records/57pcw-w6930/files/mmc3.mp4" } ], "resource_type": "article", "pub_year": "2014", "author_list": "Forster, Ryan; Chiba, Kunitoshi; et el." }, { "id": "https://authors.library.caltech.edu/records/ffw23-m8n84", "eprint_id": 95302, "eprint_status": "archive", "datestamp": "2023-08-20 00:55:32", "lastmod": "2023-10-20 22:58:22", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Wong-Valerie-L", "name": { "family": "Wong", "given": "Valerie L." } }, { "id": "Ellison-C-E", "name": { "family": "Ellison", "given": "Christopher E." } }, { "id": "Eisen-M-B", "name": { "family": "Eisen", "given": "Michael B." }, "orcid": "0000-0002-7528-738X" }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" }, { "id": "Brem-R-B", "name": { "family": "Brem", "given": "Rachel B." } } ] }, "title": "Structural Variation among Wild and Industrial Strains of Penicillium chrysogenum", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2014 Wong et al. 
This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. \n\nReceived: November 30, 2013; Accepted: April 11, 2014; Published: May 13, 2014. \n\nThe authors have no support or funding to report. \n\nCompeting interests: MBE is a member of the PLOS Board of Directors. This does not alter the authors' adherence to all the PLoS ONE policies on sharing data and materials.\n\nDevin Scannell produced the genomic libraries for sequencing. Daniel Henk graciously provided strains. We thank two reviewers for their helpful comments. \n\nAuthor Contributions: Conceived and designed the experiments: VLW CEE RBB. Performed the experiments: VLW CEE. Analyzed the data: VLW CEE. Contributed reagents/materials/analysis tools: VLW CEE MBE LP RBB. Wrote the paper: VLW CEE RBB. Supervised the research: MBE LP RBB.\n\nPublished - pone.0096784.pdf
Supplemental Material - journal.pone.0096784.s001.DOCX
Supplemental Material - journal.pone.0096784.s002.DOCX
Supplemental Material - journal.pone.0096784.s003.DOCX
Supplemental Material - journal.pone.0096784.s004.DOCX
", "abstract": "Strain selection and strain improvement are the first, and arguably most important, steps in the industrial production of biological compounds by microorganisms. While traditional methods of mutagenesis and selection have been effective in improving production of compounds at a commercial scale, the genetic changes underpinning the altered phenotypes have remained largely unclear. We utilized high-throughput Illumina short read sequencing of a wild Penicillium chrysogenum strain in order to make whole genome comparisons to a sequenced improved strain (WIS 54\u20131255). We developed an assembly-free method of identifying chromosomal rearrangements and validated the in silico predictions with a PCR-based assay and Sanger sequencing. Despite many rounds of mutagen treatment and artificial selection, WIS 54\u20131255 differs from its wild progenitor at only one of the identified rearrangements. We suggest that natural variants predisposed for high penicillin production were instrumental in the success of WIS 54\u20131255 as an industrial strain. In addition to finding a previously published inversion in the penicillin biosynthesis cluster, we located several genes related to penicillin production associated with these rearrangements. By comparing the configuration of rearrangement events among several historically important strains known to be high penicillin producers to a collection of recently isolated wild strains, we suggest that wild strains with rearrangements similar to those in known high penicillin producers may be viable candidates for further improvement efforts.", "date": "2014-05-13", "date_type": "published", "publication": "PLoS ONE", "volume": "9", "number": "5", "publisher": "Public Library of Science", "pagerange": "Art. No. 
e96784", "id_number": "CaltechAUTHORS:20190507-112223935", "issn": "1932-6203", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20190507-112223935", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "doi": "10.1371/journal.pone.0096784", "pmcid": "PMC4019546", "primary_object": { "basename": "journal.pone.0096784.s001.DOCX", "url": "https://authors.library.caltech.edu/records/ffw23-m8n84/files/journal.pone.0096784.s001.DOCX" }, "related_objects": [ { "basename": "journal.pone.0096784.s002.DOCX", "url": "https://authors.library.caltech.edu/records/ffw23-m8n84/files/journal.pone.0096784.s002.DOCX" }, { "basename": "journal.pone.0096784.s003.DOCX", "url": "https://authors.library.caltech.edu/records/ffw23-m8n84/files/journal.pone.0096784.s003.DOCX" }, { "basename": "journal.pone.0096784.s004.DOCX", "url": "https://authors.library.caltech.edu/records/ffw23-m8n84/files/journal.pone.0096784.s004.DOCX" }, { "basename": "pone.0096784.pdf", "url": "https://authors.library.caltech.edu/records/ffw23-m8n84/files/pone.0096784.pdf" } ], "resource_type": "article", "pub_year": "2014", "author_list": "Wong, Valerie L.; Ellison, Christopher E.; et el." 
}, { "id": "https://authors.library.caltech.edu/records/xw760-6ct89", "eprint_id": 74716, "eprint_status": "archive", "datestamp": "2023-08-20 00:50:49", "lastmod": "2023-10-24 22:52:47", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Takayama-Sachiko", "name": { "family": "Takayama", "given": "Sachiko" } }, { "id": "Dhahbi-Joseph", "name": { "family": "Dhahbi", "given": "Joseph" } }, { "id": "Roberts-A", "name": { "family": "Roberts", "given": "Adam" } }, { "id": "Mao-Guanxiong", "name": { "family": "Mao", "given": "Guanxiong" } }, { "id": "Heo-Seok-Jin", "name": { "family": "Heo", "given": "Seok-Jin" } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" }, { "id": "Martin-David-I-K", "name": { "family": "Martin", "given": "David I. K." } }, { "id": "Boffelli-D", "name": { "family": "Boffelli", "given": "Dario" } } ] }, "title": "Genome methylation in D. melanogaster is found at specific short motifs and is independent of DNMT2 activity", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2014 Takayama et al. This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see http://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/. \n\nReceived June 21, 2013. Accepted February 18, 2014. \n\nWe thank Mark Biggin and Xiaoyong Li for the Canton-S Drosophila melanogaster DNA; Gunther Reuter and Frank Lyko for the Mt2 mutant strains and DNA; Sarah Siegrist for fly stocks, husbandry and other resources, and technical help and discussions. This work was supported by the NIH grants HL084474 (D.B.), ES016581 (D.I.K.M.), CA115768 (D.I.K.M.). J.D. was a Scholar of the California Institute of Regenerative Medicine. 
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. \n\nAuthor contributions: D.I.K.M. conceived the study; J.D. designed and performed the MeDIP-bisulfite deep sequencing experiments; S.T. performed the fly husbandry, obtained unfertilized oocytes, and carried out bisulfite PCR experiments; S.J.H. constructed the bisulfite-PCR sequencing libraries; S.T., G.M., and D.B. designed and performed bioinformatic analyses; A.R. and L.P. carried out the association between methylation and gene expression; D.I.K.M. and D.B. wrote the paper. \n\nData access: Sequencing data from this study have been submitted to the NCBI Gene Expression Omnibus (GEO; http://www.ncbi.nlm.nih.gov/geo/) under accession number GSE34425. Bisulfite sequencing data have been submitted to the NCBI Sequence Read Archive (SRA; http://www.ncbi.nlm.nih.gov/sra) under accession nos. SRR1191286, SRR1191445, SRR1191486, SRR1191487, SRR1191684, and SRR1191728.\n\nPublished - Genome_Res.-2014-Takayama-821-30.pdf
Supplemental Material - Supplemental_Material.pdf
Supplemental Material - Supplemental_primers.xls
", "abstract": "Cytosine methylation in the genome of Drosophila melanogaster has been elusive and controversial: Its location and function have not been established. We have used a novel and highly sensitive genomewide cytosine methylation assay to detect and map genome methylation in stage 5 Drosophila embryos. The methylation we observe with this method is highly localized and strand asymmetrical, limited to regions covering \u223c1% of the genome, dynamic in early embryogenesis, and concentrated in specific 5-base sequence motifs that are CA- and CT-rich but depleted of guanine. Gene body methylation is associated with lower expression, and many genes containing methylated regions have developmental or transcriptional functions. The only known DNA methyltransferase in Drosophila is the DNMT2 homolog MT2, but lines deficient for MT2 retain genomic methylation, implying the presence of a novel methyltransferase. The association of methylation with a lower expression of specific developmental genes at stage 5 raises the possibility that it participates in controlling gene expression during the maternal-zygotic transition.", "date": "2014-05", "date_type": "published", "publication": "Genome Research", "volume": "24", "number": "5", "publisher": "Cold Spring Harbor Laboratory Press", "pagerange": "821-830", "id_number": "CaltechAUTHORS:20170303-142644662", "issn": "1088-9051", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170303-142644662", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NIH", "grant_number": "HL084474" }, { "agency": "NIH", "grant_number": "ES016581" }, { "agency": "NIH", "grant_number": "CA115768" }, { "agency": "California Institute of Regenerative Medicine" } ] }, "doi": "10.1101/gr.162412.113", "pmcid": "PMC4009611", "primary_object": { "basename": "Genome_Res.-2014-Takayama-821-30.pdf", "url": 
"https://authors.library.caltech.edu/records/xw760-6ct89/files/Genome_Res.-2014-Takayama-821-30.pdf" }, "related_objects": [ { "basename": "Supplemental_Material.pdf", "url": "https://authors.library.caltech.edu/records/xw760-6ct89/files/Supplemental_Material.pdf" }, { "basename": "Supplemental_primers.xls", "url": "https://authors.library.caltech.edu/records/xw760-6ct89/files/Supplemental_primers.xls" } ], "resource_type": "article", "pub_year": "2014", "author_list": "Takayama, Sachiko; Dhahbi, Joseph; et el." }, { "id": "https://authors.library.caltech.edu/records/jjxxg-g1y43", "eprint_id": 74722, "eprint_status": "archive", "datestamp": "2023-08-20 00:17:16", "lastmod": "2023-10-24 22:53:03", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Pimentel-H", "name": { "family": "Pimentel", "given": "Harold" } }, { "id": "Parra-M", "name": { "family": "Parra", "given": "Marilynn" } }, { "id": "Gee-S", "name": { "family": "Gee", "given": "Sherry" } }, { "id": "Ghanem-D", "name": { "family": "Ghanem", "given": "Dana" } }, { "id": "An-Xiuli", "name": { "family": "An", "given": "Xiuli" } }, { "id": "Li-Jie", "name": { "family": "Li", "given": "Jie" }, "orcid": "0000-0002-3733-4587" }, { "id": "Mohandas-N", "name": { "family": "Mohandas", "given": "Narla" } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" }, { "id": "Conboy-J-G", "name": { "family": "Conboy", "given": "John G." } } ] }, "title": "A dynamic alternative splicing program regulates gene expression during terminal erythropoiesis", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 The Author(s) 2014. 
Published by Oxford University Press.\nThis is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. \n\nReceived October 29, 2013; Revised December 16, 2013; Accepted December 17, 2013. \n\nJ.G.C., L.P., N.M. and X.A. designed the research; H.P., M.P., S.L.G., D.G. and J.L. performed research and analyzed data; and J.G.C., N.M., X.A., L.P. and H.P. wrote the article. \n\nFUNDING: The National Institutes of Health (NIH) [DK094699 and DK032094]. Director, Office of Science, and Office of Biological & Environmental Research of the US Department of Energy under Contract No. DE-AC02-05CH11231. Funding for open access charge: NIH [DK094699]. \n\nConflict of interest statement. None declared.\n\nPublished - gkt1388.pdf
", "abstract": "Alternative pre-messenger RNA splicing remodels the human transcriptome in a spatiotemporal manner during normal development and differentiation. Here we explored the landscape of transcript diversity in the erythroid lineage by RNA-seq analysis of five highly purified populations of morphologically distinct human erythroblasts, representing the last four cell divisions before enucleation. In this unique differentiation system, we found evidence of an extensive and dynamic alternative splicing program encompassing genes with many diverse functions. Alternative splicing was particularly enriched in genes controlling cell cycle, organelle organization, chromatin function and RNA processing. Many alternative exons exhibited differentiation-associated switches in splicing efficiency, mostly in late-stage polychromatophilic and orthochromatophilic erythroblasts, in concert with extensive cellular remodeling that precedes enucleation. A subset of alternative splicing switches introduces premature translation termination codons into selected transcripts in a differentiation stage-specific manner, supporting the hypothesis that alternative splicing-coupled nonsense-mediated decay contributes to regulation of erythroid-expressed genes as a novel part of the overall differentiation program. 
We conclude that a highly dynamic alternative splicing program in terminally differentiating erythroblasts plays a major role in regulating gene expression to ensure synthesis of appropriate proteome at each stage as the cells remodel in preparation for production of mature red cells.", "date": "2014-04", "date_type": "published", "publication": "Nucleic Acids Research", "volume": "42", "number": "6", "publisher": "Oxford University Press", "pagerange": "4031-4042", "id_number": "CaltechAUTHORS:20170303-143819723", "issn": "0305-1048", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170303-143819723", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NIH", "grant_number": "DK094699" }, { "agency": "NIH", "grant_number": "DK032094" }, { "agency": "Department of Energy (DOE)", "grant_number": "DE-AC02-05CH11231" } ] }, "doi": "10.1093/nar/gkt1388", "pmcid": "PMC3973340", "primary_object": { "basename": "gkt1388.pdf", "url": "https://authors.library.caltech.edu/records/jjxxg-g1y43/files/gkt1388.pdf" }, "resource_type": "article", "pub_year": "2014", "author_list": "Pimentel, Harold; Parra, Marilynn; et el." }, { "id": "https://authors.library.caltech.edu/records/v52pv-abs78", "eprint_id": 74725, "eprint_status": "archive", "datestamp": "2023-08-19 22:31:50", "lastmod": "2023-10-24 22:53:08", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Roberts-A", "name": { "family": "Roberts", "given": "Adam" } }, { "id": "Feng-Harvey", "name": { "family": "Feng", "given": "Harvey" } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "Fragment assignment in the cloud with eXpress-D", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 Roberts et al.; licensee BioMed Central Ltd. 2013. This article is published under license to BioMed Central Ltd. 
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. \n\nReceived: 13 September 2013. Accepted: 18 November 2013. Published: 7 December 2013. \n\nWe thank Matei Zaharia, Kristal Curtis, and Reynold Xin for discussions on the feasibility of the Spark implementation. Adam Roberts was supported by an NSF graduate research fellowship. Lior Pachter was partially supported by NIH HG006129. \n\nAuthors' contributions: AR developed the method. AR and HF implemented the method and analyzed the results. AR, HF, and LP wrote the manuscript. All authors read and approved the final manuscript. \n\nAvailability and usage: eXpress-D and Spark are open source software that can be downloaded from their respective websites, http://github.com/adarob/express-d and http://spark.incubator.apache.org/. For ease of use, the eXpress-D source code includes a copy of a Spark script that allows users to launch, set up, and manage EC2 clusters running Spark and HDFS. The script can be used to launch all nodes in the cluster using a customized Amazon Machine Image (AMI)-a type of templated operating system [24]-that is preloaded with eXpress-D source and binaries. Target and fragment datasets can then be loaded into HDFS or S3 for distributed execution. The eXpress-D wiki page includes more detail about using the script to launch clusters, as well as notes on cluster configuration and tuning. \n\nThe authors declare that they have no competing interest.\n\nPublished - art_3A10.1186_2F1471-2105-14-358.pdf
Supplemental Material - 12859_2013_6238_MOESM1_ESM.ZIP
Supplemental Material - 12859_2013_6238_MOESM2_ESM.pdf
Supplemental Material - 12859_2013_6238_MOESM3_ESM.pdf
Supplemental Material - 12859_2013_6238_MOESM4_ESM.pdf
", "abstract": "Background: Probabilistic assignment of ambiguously mapped fragments produced by high-throughput sequencing experiments has been demonstrated to greatly improve accuracy in the analysis of RNA-Seq and ChIP-Seq, and is an essential step in many other sequence census experiments. A maximum likelihood method using the expectation-maximization (EM) algorithm for optimization is commonly used to solve this problem. However, batch EM-based approaches do not scale well with the size of sequencing datasets, which have been increasing dramatically over the past few years. Thus, current approaches to fragment assignment rely on heuristics or approximations for tractability. \n\nResults: We present an implementation of a distributed EM solution to the fragment assignment problem using Spark, a data analytics framework that can scale by leveraging compute clusters within datacenters\u2013\"the cloud\". We demonstrate that our implementation easily scales to billions of sequenced fragments, while providing the exact maximum likelihood assignment of ambiguous fragments. The accuracy of the method is shown to be an improvement over the most widely used tools available and can be run in a constant amount of time when cluster resources are scaled linearly with the amount of input data. \n\nConclusions: The cloud offers one solution for the difficulties faced in the analysis of massive high-thoughput sequencing data, which continue to grow rapidly. Researchers in bioinformatics must follow developments in distributed systems\u2013such as new frameworks like Spark\u2013for ways to port existing methods to the cloud and help them scale to the datasets of the future. Our software, eXpress-D, is freely available at: http://github.com/adarob/express-d.", "date": "2013-12-07", "date_type": "published", "publication": "BMC Bioinformatics", "volume": "14", "publisher": "BioMed Central", "pagerange": "Art. No. 
358", "id_number": "CaltechAUTHORS:20170303-144424899", "issn": "1471-2105", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170303-144424899", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NSF Graduate Research Fellowship" }, { "agency": "NIH", "grant_number": "HG006129" } ] }, "doi": "10.1186/1471-2105-14-358", "pmcid": "PMC3881492", "primary_object": { "basename": "12859_2013_6238_MOESM1_ESM.ZIP", "url": "https://authors.library.caltech.edu/records/v52pv-abs78/files/12859_2013_6238_MOESM1_ESM.ZIP" }, "related_objects": [ { "basename": "12859_2013_6238_MOESM2_ESM.pdf", "url": "https://authors.library.caltech.edu/records/v52pv-abs78/files/12859_2013_6238_MOESM2_ESM.pdf" }, { "basename": "12859_2013_6238_MOESM3_ESM.pdf", "url": "https://authors.library.caltech.edu/records/v52pv-abs78/files/12859_2013_6238_MOESM3_ESM.pdf" }, { "basename": "12859_2013_6238_MOESM4_ESM.pdf", "url": "https://authors.library.caltech.edu/records/v52pv-abs78/files/12859_2013_6238_MOESM4_ESM.pdf" }, { "basename": "art_3A10.1186_2F1471-2105-14-358.pdf", "url": "https://authors.library.caltech.edu/records/v52pv-abs78/files/art_3A10.1186_2F1471-2105-14-358.pdf" } ], "resource_type": "article", "pub_year": "2013", "author_list": "Roberts, Adam; Feng, Harvey; et el." 
}, { "id": "https://authors.library.caltech.edu/records/55r0e-gyg95", "eprint_id": 74738, "eprint_status": "archive", "datestamp": "2023-08-19 20:40:32", "lastmod": "2023-10-24 22:53:43", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Roberts-A", "name": { "family": "Roberts", "given": "Adam" } }, { "id": "Schaeffer-L-V", "name": { "family": "Schaeffer", "given": "Lorian" } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "Updating RNA-Seq analyses after re-annotation", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 The Author 2013. Published by Oxford University Press. \n\nReceived and revised on April 11, 2013; accepted on April 21, 2013. \n\nWe thank Isabelle Stanton for her advice on graph partitioning. \n\nFunding: AR was partly funded by an NSF graduate fellowship. AR and LP were partially funded by NIH R01 HG006129. \n\nConflict of Interest: none declared. \n\nAvailability and implementation: Our methods are implemented in software called ReXpress and are freely available, together with source code, at http://bio.math.berkeley.edu/ReXpress/.\n\nPublished - btt197.pdf
", "abstract": "The estimation of isoform abundances from RNA-Seq data requires a time-intensive step of mapping reads to either an assembled or previously annotated transcriptome, followed by an optimization procedure for deconvolution of multi-mapping reads. These procedures are essential for downstream analysis such as differential expression. In cases where it is desirable to adjust the underlying annotation, for example, on the discovery of novel isoforms or errors in existing annotations, current pipelines must be rerun from scratch. This makes it difficult to update abundance estimates after re-annotation, or to explore the effect of changes in the transcriptome on analyses. We present a novel efficient algorithm for updating abundance estimates from RNA-Seq experiments on re-annotation that does not require re-analysis of the entire dataset. Our approach is based on a fast partitioning algorithm for identifying transcripts whose abundances may depend on the added or deleted isoforms, and on a fast follow-up approach to re-estimating abundances for all transcripts. We demonstrate the effectiveness of our methods by showing how to synchronize RNA-Seq abundance estimates with the daily RefSeq incremental updates. 
Thus, we provide a practical approach to maintaining relevant databases of RNA-Seq derived abundance estimates even as annotations are being constantly revised.", "date": "2013-07-01", "date_type": "published", "publication": "Bioinformatics", "volume": "29", "number": "13", "publisher": "Oxford University Press", "pagerange": "1631-1637", "id_number": "CaltechAUTHORS:20170303-154642805", "issn": "1367-4803", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170303-154642805", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NSF Graduate Research Fellowship" }, { "agency": "NIH", "grant_number": "R01 HG006129" } ] }, "doi": "10.1093/bioinformatics/btt197", "pmcid": "PMC3694665", "primary_object": { "basename": "btt197.pdf", "url": "https://authors.library.caltech.edu/records/55r0e-gyg95/files/btt197.pdf" }, "resource_type": "article", "pub_year": "2013", "author_list": "Roberts, Adam; Schaeffer, Lorian; et el." }, { "id": "https://authors.library.caltech.edu/records/h7bt6-gaq50", "eprint_id": 74745, "eprint_status": "archive", "datestamp": "2023-08-22 08:50:09", "lastmod": "2023-10-24 22:54:08", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Kleinman-A", "name": { "family": "Kleinman", "given": "Aaron" } }, { "id": "Harel-M", "name": { "family": "Harel", "given": "Matan" } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "Affine and Projective Tree Metric Theorems", "ispublished": "pub", "full_text_status": "public", "keywords": "hierarchy; Gromov product; Kalmanson metric; Robinsonian metric; PC-tree; PQ-tree; phylogenetics; pyramid; ultrametric", "note": "\u00a9 2012 Springer Basel. \n\nReceived March 4, 2011. \n\nWe thank Laxmi Parida for introducing us to the applications of PQ-trees in biology during a visit to UC Berkeley in 2008. 
Thanks also to an anonymous reviewer whose careful reading of an initial draft of the paper helped us greatly. AK was funded by an NSF graduate research fellowship.\n\nSubmitted - 1103.2384.pdf
", "abstract": "The tree metric theorem provides a combinatorial four-point condition that characterizes dissimilarity maps derived from pairwise compatible split systems. A related weaker four point condition characterizes dissimilarity maps derived from circular split systems known as Kalmanson metrics. The tree metric theorem was first discovered in the context of phylogenetics and forms the basis of many tree reconstruction algorithms, whereas Kalmanson metrics were first considered by computer scientists, and are notable in that they are a non-trivial class of metrics for which the traveling salesman problem is tractable. We present a unifying framework for these theorems based on combinatorial structures that are used for graph planarity testing. These are (projective) PC-trees, and their affine analogs, PQ-trees. In the projective case, we generalize a number of concepts from clustering theory, including hierarchies, pyramids, ultrametrics, and Robinsonian matrices, and the theorems that relate them. As with tree metrics and ultrametrics, the link between PC-trees and PQ-trees is established via the Gromov product.", "date": "2013-03", "date_type": "published", "publication": "Annals of Combinatorics", "volume": "17", "number": "1", "publisher": "Springer", "pagerange": "205-228", "id_number": "CaltechAUTHORS:20170303-162557287", "issn": "0218-0006", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170303-162557287", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NSF Graduate Research Fellowship" } ] }, "doi": "10.1007/s00026-012-0173-2", "primary_object": { "basename": "1103.2384.pdf", "url": "https://authors.library.caltech.edu/records/h7bt6-gaq50/files/1103.2384.pdf" }, "resource_type": "article", "pub_year": "2013", "author_list": "Kleinman, Aaron; Harel, Matan; et al." 
}, { "id": "https://authors.library.caltech.edu/records/x5dqz-e3w34", "eprint_id": 74740, "eprint_status": "archive", "datestamp": "2023-08-22 08:27:13", "lastmod": "2023-10-24 22:53:50", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Rahman-A", "name": { "family": "Rahman", "given": "Atif" }, "orcid": "0000-0003-1805-3971" }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "CGAL: computing genome assembly likelihoods", "ispublished": "pub", "full_text_status": "public", "keywords": "Genome assembly; evaluation; likelihood; sequencing", "note": "\u00a9 2013 Rahman and Pachter, licensee Springer. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. \n\nReceived: 23 August 2012. Accepted: 29 January 2013. Published: 29 January 2013. \n\nWe thank Michael Eisen, Aaron Kleinman, Harold Pimentel and Adam Roberts for helpful conversations in the development of the likelihood-based approach for assembly evaluation. LP was funded in part by NIH R21 HG006583. AR was funded in part by Fulbright Science & Technology Fellowship 15093630. \n\nAuthors' contributions: AR and LP conceived the project and developed the methodology. AR implemented the method in the CGAL software and obtained the results of the paper. AR and LP wrote the manuscript. All authors read and approved the final manuscript. \n\nThe authors have no competing interests.\n\nPublished - art_3A10.1186_2Fgb-2013-14-1-r8.pdf
Supplemental Material - 13059_2012_3037_MOESM10_ESM.eps
Supplemental Material - 13059_2012_3037_MOESM11_ESM.eps
Supplemental Material - 13059_2012_3037_MOESM1_ESM.PDF
Supplemental Material - 13059_2012_3037_MOESM2_ESM.eps
Supplemental Material - 13059_2012_3037_MOESM3_ESM.eps
Supplemental Material - 13059_2012_3037_MOESM4_ESM.eps
Supplemental Material - 13059_2012_3037_MOESM5_ESM.eps
Supplemental Material - 13059_2012_3037_MOESM6_ESM.eps
Supplemental Material - 13059_2012_3037_MOESM7_ESM.eps
Supplemental Material - 13059_2012_3037_MOESM8_ESM.eps
Supplemental Material - 13059_2012_3037_MOESM9_ESM.eps
", "abstract": "Assembly algorithms have been extensively benchmarked using simulated data so that results can be compared to ground truth. However, in de novo assembly, only crude metrics such as contig number and size are typically used to evaluate assembly quality. We present CGAL, a novel likelihood-based approach to assembly assessment in the absence of a ground truth. We show that likelihood is more accurate than other metrics currently used for evaluating assemblies, and describe its application to the optimization and comparison of assembly algorithms. Our methods are implemented in software that is freely available at http://bio.math.berkeley.edu/cgal/.", "date": "2013-01-29", "date_type": "published", "publication": "Genome Biology", "volume": "14", "number": "1", "publisher": "BioMed Central", "pagerange": "Art. No. R8", "id_number": "CaltechAUTHORS:20170303-155431520", "issn": "1465-6906", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170303-155431520", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NIH", "grant_number": "R21 HG006583" }, { "agency": "Fulbright Foundation", "grant_number": "15093630" } ] }, "doi": "10.1186/gb-2013-14-1-r8", "pmcid": "PMC3663106", "primary_object": { "basename": "13059_2012_3037_MOESM6_ESM.eps", "url": "https://authors.library.caltech.edu/records/x5dqz-e3w34/files/13059_2012_3037_MOESM6_ESM.eps" }, "related_objects": [ { "basename": "art_3A10.1186_2Fgb-2013-14-1-r8.pdf", "url": "https://authors.library.caltech.edu/records/x5dqz-e3w34/files/art_3A10.1186_2Fgb-2013-14-1-r8.pdf" }, { "basename": "13059_2012_3037_MOESM11_ESM.eps", "url": "https://authors.library.caltech.edu/records/x5dqz-e3w34/files/13059_2012_3037_MOESM11_ESM.eps" }, { "basename": "13059_2012_3037_MOESM1_ESM.PDF", "url": "https://authors.library.caltech.edu/records/x5dqz-e3w34/files/13059_2012_3037_MOESM1_ESM.PDF" }, { "basename": 
"13059_2012_3037_MOESM5_ESM.eps", "url": "https://authors.library.caltech.edu/records/x5dqz-e3w34/files/13059_2012_3037_MOESM5_ESM.eps" }, { "basename": "13059_2012_3037_MOESM4_ESM.eps", "url": "https://authors.library.caltech.edu/records/x5dqz-e3w34/files/13059_2012_3037_MOESM4_ESM.eps" }, { "basename": "13059_2012_3037_MOESM7_ESM.eps", "url": "https://authors.library.caltech.edu/records/x5dqz-e3w34/files/13059_2012_3037_MOESM7_ESM.eps" }, { "basename": "13059_2012_3037_MOESM8_ESM.eps", "url": "https://authors.library.caltech.edu/records/x5dqz-e3w34/files/13059_2012_3037_MOESM8_ESM.eps" }, { "basename": "13059_2012_3037_MOESM9_ESM.eps", "url": "https://authors.library.caltech.edu/records/x5dqz-e3w34/files/13059_2012_3037_MOESM9_ESM.eps" }, { "basename": "13059_2012_3037_MOESM10_ESM.eps", "url": "https://authors.library.caltech.edu/records/x5dqz-e3w34/files/13059_2012_3037_MOESM10_ESM.eps" }, { "basename": "13059_2012_3037_MOESM2_ESM.eps", "url": "https://authors.library.caltech.edu/records/x5dqz-e3w34/files/13059_2012_3037_MOESM2_ESM.eps" }, { "basename": "13059_2012_3037_MOESM3_ESM.eps", "url": "https://authors.library.caltech.edu/records/x5dqz-e3w34/files/13059_2012_3037_MOESM3_ESM.eps" } ], "resource_type": "article", "pub_year": "2013", "author_list": "Rahman, Atif and Pachter, Lior" }, { "id": "https://authors.library.caltech.edu/records/3jby4-nxg71", "eprint_id": 74743, "eprint_status": "archive", "datestamp": "2023-08-19 14:15:31", "lastmod": "2023-10-24 22:53:59", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Trapnell-C", "name": { "family": "Trapnell", "given": "Cole" } }, { "id": "Hendrickson-D-G", "name": { "family": "Hendrickson", "given": "David G." } }, { "id": "Sauvageau-M", "name": { "family": "Sauvageau", "given": "Martin" } }, { "id": "Goff-L-A", "name": { "family": "Goff", "given": "Loyal" } }, { "id": "Rinn-J-L", "name": { "family": "Rinn", "given": "John L." 
} }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "Differential analysis of gene regulation at transcript resolution with RNA-seq", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2012 Macmillan Publishers Limited. \n\nReceived 04 May 2012. Accepted 09 November 2012. Published online 09 December 2012. \n\nWe are grateful to D. Kelley for a careful reading of the manuscript, and B. Wold for sharing the hESC RNA-seq data. We are also thankful for the ongoing development efforts of A. Roberts, B. Langmead, D. Kim, G. Pertea, H. Pimentel and S. Salzberg. C.T. and D.G.H. are Damon Runyon Postdoctoral Fellows. J.L.R. is a Damon Runyon-Rachleff Innovator fellow. This work was supported by US National Institutes of Health grants DP2OD006670, P01GM099117, P50HG006193 and RO1ES020260 (to J.L.R.) and R01 HG006129 and R01 DK094699 (to L.P.). \n\nThese authors contributed equally to this work. Cole Trapnell & David G Hendrickson \n\nThese authors contributed equally to this work. John L Rinn & Lior Pachter \n\nAuthor Contributions: C.T. and L.P. developed the mathematics and statistics. D.G.H. and M.S. performed the experiments. D.G.H. and C.T. designed the experiments and performed the analysis. C.T. and L.G. implemented the software. L.P., J.L.R., D.G.H. and C.T. conceived the research. All authors wrote and approved the manuscript. \n\nThe authors declare no competing financial interests.\n\nAccepted Version - nihms439296.pdf
Supplemental Material - nbt.2450-S1.pdf
", "abstract": "Differential analysis of gene and transcript expression using high-throughput RNA sequencing (RNA-seq) is complicated by several sources of measurement variability and poses numerous statistical challenges. We present Cuffdiff 2, an algorithm that estimates expression at transcript-level resolution and controls for variability evident across replicate libraries. Cuffdiff 2 robustly identifies differentially expressed transcripts and genes and reveals differential splicing and promoter-preference changes. We demonstrate the accuracy of our approach through differential analysis of lung fibroblasts in response to loss of the developmental transcription factor HOXA1, which we show is required for lung fibroblast and HeLa cell cycle progression. Loss of HOXA1 results in significant expression level changes in thousands of individual transcripts, along with isoform switching events in key regulators of the cell cycle. Cuffdiff 2 performs robust differential analysis in RNA-seq experiments at transcript resolution, revealing a layer of regulation not readily observable with other high-throughput technologies.", "date": "2013-01", "date_type": "published", "publication": "Nature Biotechnology", "volume": "31", "number": "1", "publisher": "Nature Publishing Group", "pagerange": "46-53", "id_number": "CaltechAUTHORS:20170303-161532491", "issn": "1087-0156", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170303-161532491", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "Damon Runyon Cancer Research Foundation" }, { "agency": "NIH", "grant_number": "DP2OD006670" }, { "agency": "NIH", "grant_number": "P01GM099117" }, { "agency": "NIH", "grant_number": "P50HG006193" }, { "agency": "NIH", "grant_number": "RO1ES020260" }, { "agency": "NIH", "grant_number": "R01 HG006129" }, { "agency": "NIH", "grant_number": "R01 DK094699" } ] }, "doi": 
"10.1038/nbt.2450", "pmcid": "PMC3869392", "primary_object": { "basename": "nbt.2450-S1.pdf", "url": "https://authors.library.caltech.edu/records/3jby4-nxg71/files/nbt.2450-S1.pdf" }, "related_objects": [ { "basename": "nihms439296.pdf", "url": "https://authors.library.caltech.edu/records/3jby4-nxg71/files/nihms439296.pdf" } ], "resource_type": "article", "pub_year": "2013", "author_list": "Trapnell, Cole; Hendrickson, David G.; et al." }, { "id": "https://authors.library.caltech.edu/records/fa8tb-0xm37", "eprint_id": 74746, "eprint_status": "archive", "datestamp": "2023-08-19 14:15:44", "lastmod": "2023-10-24 22:54:14", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Roberts-Adam", "name": { "family": "Roberts", "given": "Adam" } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "Streaming fragment assignment for real-time analysis of sequencing experiments", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2012 Macmillan Publishers Limited. \n\nReceived 30 April 2012. Accepted 26 October 2012. Published online 18 November 2012. Corrected online 04 December 2012. \n\nThis work was supported by US National Institutes of Health grant R01HG006129. A.R. was supported in part by a National Science Foundation graduate research fellowship. We thank H. Pimentel for developing Map2GTF for converting genome mappings to transcriptome mappings and incorporating it into TopHat to help with our analysis. \n\nAuthor Contributions: A.R. and L.P. developed the mathematics and statistics and designed the algorithms. A.R. implemented the method in eXpress. A.R. and L.P. tested the software and performed the analysis. A.R. and L.P. wrote the manuscript. 
\n\nThe authors declare no competing financial interests.\n\nCorrected online 04 December 2012\nIn the HTML version of this article initially published online, errors in mathematical terms were present in the Online Methods section. The errors have been corrected in the HTML version.\n\nAccepted Version - nihms417955.pdf
Supplemental Material - nmeth.2251-S1.pdf
Supplemental Material - nmeth.2251-S2.zip
", "abstract": "We present eXpress, a software package for efficient probabilistic assignment of ambiguously mapping sequenced fragments. eXpress uses a streaming algorithm with linear run time and constant memory use. It can determine abundances of sequenced molecules in real time and can be applied to ChIP-seq, metagenomics and other large-scale sequencing data. We demonstrate its use on RNA-seq data and show that eXpress achieves greater efficiency than other quantification methods.", "date": "2013-01", "date_type": "published", "publication": "Nature Methods", "volume": "10", "number": "1", "publisher": "Nature Publishing Group", "pagerange": "71-73", "id_number": "CaltechAUTHORS:20170303-163300268", "issn": "1548-7091", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170303-163300268", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NIH", "grant_number": "R01 HG006129" }, { "agency": "NSF Graduate Research Fellowship" } ] }, "doi": "10.1038/nmeth.2251", "pmcid": "PMC3880119", "primary_object": { "basename": "nihms417955.pdf", "url": "https://authors.library.caltech.edu/records/fa8tb-0xm37/files/nihms417955.pdf" }, "related_objects": [ { "basename": "nmeth.2251-S1.pdf", "url": "https://authors.library.caltech.edu/records/fa8tb-0xm37/files/nmeth.2251-S1.pdf" }, { "basename": "nmeth.2251-S2.zip", "url": "https://authors.library.caltech.edu/records/fa8tb-0xm37/files/nmeth.2251-S2.zip" } ], "resource_type": "article", "pub_year": "2013", "author_list": "Roberts, Adam and Pachter, Lior" }, { "id": "https://authors.library.caltech.edu/records/eas5h-yyq14", "eprint_id": 74748, "eprint_status": "archive", "datestamp": "2023-08-19 13:11:00", "lastmod": "2023-10-24 22:54:21", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Hower-V", "name": { "family": "Hower", "given": "Valerie" } }, { "id": "Starfield-R", "name": { 
"family": "Starfield", "given": "Richard" } }, { "id": "Roberts-Adam", "name": { "family": "Roberts", "given": "Adam" } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "Quantifying uniformity of mapped reads", "ispublished": "pub", "full_text_status": "restricted", "note": "\u00a9 The Author 2012. Published by Oxford University Press on behalf of The Society for Financial Studies. \n\nReceived: 20 September 2012. Revision Received: 04 July 2012. Accepted: 05 July 2012. \n\nV.H. was funded in part by NSF fellowship DMS-0902723. A.R and L.P. were funded in part by NIH R01 HG006129. A.R. was also funded in part by an NSF graduate research fellowship. \n\nConflict of Interest: none declared.", "abstract": "We describe a tool for quantifying the uniformity of mapped reads in high-throughput sequencing experiments. Our statistic directly measures the uniformity of both read position and fragment length, and we explain how to compute a P-value that can be used to quantify biases arising from experimental protocols and mapping procedures. Our method is useful for comparing different protocols in experiments such as RNA-Seq. 
\n\nAvailability and implementation: We provide a freely available and open source python script that can be used to analyze raw read data or reads mapped to transcripts in BAM format at http://www.math.miami.edu/~vhower/ReadSpy.html", "date": "2012-10-15", "date_type": "published", "publication": "Bioinformatics", "volume": "28", "number": "20", "publisher": "Oxford University Press", "pagerange": "2680-2682", "id_number": "CaltechAUTHORS:20170303-164056261", "issn": "1367-4803", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170303-164056261", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NSF Graduate Research Fellowship", "grant_number": "DMS-0902723" }, { "agency": "NIH", "grant_number": "R01 HG006129" } ] }, "doi": "10.1093/bioinformatics/bts451", "pmcid": "PMC3467739", "resource_type": "article", "pub_year": "2012", "author_list": "Hower, Valerie; Starfield, Richard; et al." }, { "id": "https://authors.library.caltech.edu/records/dnfee-gqw75", "eprint_id": 74749, "eprint_status": "archive", "datestamp": "2023-08-19 10:10:19", "lastmod": "2023-10-24 22:54:25", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "A closer look at RNA editing", "ispublished": "pub", "full_text_status": "restricted", "note": "\u00a9 2012 Macmillan Publishers Limited. \n\nThe author declares no competing financial interests.", "abstract": "Recent advances in high-throughput sequencing technology have made it possible to study RNA editing on a genome-wide scale. But realizing the potential of this approach requires stringent data analysis methods that control for genomic variation, sequencing errors and biases introduced by read-mapping procedures. In this issue, Peng et al. 
introduce such methods and apply them to conduct a careful, large-scale study of RNA editing in the transcriptome of a Han Chinese individual. These data provide the first reliable map of RNA edits in a person.", "date": "2012-03", "date_type": "published", "publication": "Nature Biotechnology", "volume": "30", "number": "3", "publisher": "Nature Publishing Group", "pagerange": "246-247", "id_number": "CaltechAUTHORS:20170303-164721967", "issn": "1087-0156", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170303-164721967", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "doi": "10.1038/nbt.2156", "resource_type": "article", "pub_year": "2012", "author_list": "Pachter, Lior" }, { "id": "https://authors.library.caltech.edu/records/x492k-9n169", "eprint_id": 74752, "eprint_status": "archive", "datestamp": "2023-08-19 10:10:29", "lastmod": "2023-10-23 15:43:28", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Trapnell-C", "name": { "family": "Trapnell", "given": "Cole" } }, { "id": "Roberts-Adam", "name": { "family": "Roberts", "given": "Adam" } }, { "id": "Goff-L-A", "name": { "family": "Goff", "given": "Loyal" } }, { "id": "Pertea-G", "name": { "family": "Pertea", "given": "Geo" } }, { "id": "Kim-Daehwan", "name": { "family": "Kim", "given": "Daehwan" } }, { "id": "Kelley-D-R", "name": { "family": "Kelley", "given": "David R." } }, { "id": "Pimentel-H", "name": { "family": "Pimentel", "given": "Harold" } }, { "id": "Salzberg-S-L", "name": { "family": "Salzberg", "given": "Steven L." } }, { "id": "Rinn-J-L", "name": { "family": "Rinn", "given": "John L." 
} }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2012 Macmillan Publishers Limited. \n\nPublished online 1 March 2012; corrected after print 7 August 2014; doi:10.1038/nprot.2012.016. \n\nWe are grateful to D. Hendrickson, M. Cabili and B. Langmead for helpful technical discussions. The TopHat and Cufflinks projects are supported by US National Institutes of Health grants R01-HG006102 (to S.L.S.) and R01-HG006129-01 (to L.P.). C.T. is a Damon Runyon Cancer Foundation Fellow. L.G. is a National Science Foundation Postdoctoral Fellow. A.R. is a National Science Foundation Graduate Research Fellow. J.L.R. is a Damon Runyon-Rachleff, Searle, and Smith Family Scholar, and is supported by Director's New Innovator Awards (1DP2OD00667-01). This work was funded in part by the Center of Excellence in Genome Science from the US National Human Genome Research Institute (J.L.R.). J.L.R. is an investigator of the Merkin Foundation for Stem Cell Research at the Broad Institute. \n\nAuthor Contributions: C.T. is the lead developer for the TopHat and Cufflinks projects. L.G. designed and wrote CummeRbund. D.K., H.P. and G.P. are developers of TopHat. A.R. and G.P. are developers of Cufflinks and its accompanying utilities. C.T. developed the protocol, generated the example experiment and performed the analysis. L.P., S.L.S. and C.T. conceived the TopHat and Cufflinks software projects. C.T., D.R.K. and J.L.R. wrote the manuscript. \n\nThe authors declare no competing financial interests.\n\nAccepted Version - nihms-366741.pdf
", "abstract": "Recent advances in high-throughput cDNA sequencing (RNA-seq) can reveal new genes and splice variants and quantify expression genome-wide in a single assay. The volume and complexity of data from RNA-seq experiments necessitate scalable, fast and mathematically principled analysis software. TopHat and Cufflinks are free, open-source software tools for gene discovery and comprehensive expression analysis of high-throughput mRNA sequencing (RNA-seq) data. Together, they allow biologists to identify new genes and new splice variants of known ones, as well as compare gene and transcript expression under two or more conditions. This protocol describes in detail how to use TopHat and Cufflinks to perform such analyses. It also covers several accessory tools and utilities that aid in managing data, including CummeRbund, a tool for visualizing RNA-seq analysis results. Although the procedure assumes basic informatics skills, these tools assume little to no background with RNA-seq analysis and are meant for novices and experts alike. The protocol begins with raw sequencing reads and produces a transcriptome assembly, lists of differentially expressed and regulated genes and transcripts, and publication-quality visualizations of analysis results. 
The protocol's execution time depends on the volume of transcriptome sequencing data and available computing resources but takes less than 1 d of computer time for typical experiments and ~1 h of hands-on time.", "date": "2012-03", "date_type": "published", "publication": "Nature Protocols", "volume": "7", "number": "3", "publisher": "Nature Publishing Group", "pagerange": "562-578", "id_number": "CaltechAUTHORS:20170303-165006599", "issn": "1754-2189", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170303-165006599", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NIH", "grant_number": "R01-HG006102" }, { "agency": "NIH", "grant_number": "R01-HG006129-01" }, { "agency": "Damon Runyon Cancer Research Foundation" }, { "agency": "NSF Postdoctoral Fellowship" }, { "agency": "NSF Graduate Research Fellowship" }, { "agency": "Searle Scholars Program" }, { "agency": "Smith Family Foundation" }, { "agency": "NIH", "grant_number": "1DP2OD00667-01" }, { "agency": "Merkin Foundation for Stem Cell Research" } ] }, "doi": "10.1038/nprot.2012.016", "pmcid": "PMC3334321", "primary_object": { "basename": "nihms-366741.pdf", "url": "https://authors.library.caltech.edu/records/x492k-9n169/files/nihms-366741.pdf" }, "resource_type": "article", "pub_year": "2012", "author_list": "Trapnell, Cole; Roberts, Adam; et al." }, { "id": "https://authors.library.caltech.edu/records/1bb4r-e4p39", "eprint_id": 74770, "eprint_status": "archive", "datestamp": "2023-08-19 08:59:35", "lastmod": "2023-10-24 23:13:44", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Martin-David-I-K", "name": { "family": "Martin", "given": "David I. K." 
} }, { "id": "Singer-M", "name": { "family": "Singer", "given": "Meromit" } }, { "id": "Dhahbi-Joseph", "name": { "family": "Dhahbi", "given": "Joseph" } }, { "id": "Mao-Guanxiong", "name": { "family": "Mao", "given": "Guanxiong" } }, { "id": "Zhang-Lu", "name": { "family": "Zhang", "given": "Lu" } }, { "id": "Schroth-G-P", "name": { "family": "Schroth", "given": "Gary P." } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" }, { "id": "Boffelli-D", "name": { "family": "Boffelli", "given": "Dario" } } ] }, "title": "Phyloepigenomic comparison of great apes reveals a correlation between somatic and germline methylation states", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2011 Cold Spring Harbor Laboratory Press. The Authors acknowledge that six months after the full-issue publication date, the Article will be distributed under a Creative Commons CC-BY-NC License (Attribution-NonCommercial 4.0 International License, http://creativecommons.org/licenses/by-nc/4.0/). \n\nReceived February 28, 2011; accepted in revised form September 6, 2011. \n\nThis work was supported by NIH grants HL084474 (D.B.), ES016581 (D.I.K.M.), and CA115768 (D.I.K.M.). J.D. was supported by the California Institute of Regenerative Medicine. This study used biological materials obtained from the Southwest National Primate Research Center, which is supported by NIH-NCRR grant P51 RR013986. We thank Cole Trapnell for help with Bowtie alignments, and Cath Suter for helpful comments. \n\nData access: The sequence data used in this study have been submitted to the NCBI Gene Expression Omnibus (GEO) (http://www.ncbi.nlm.nih.gov/geo/) under accession number GSE22376.\n\nPublished - Genome_Res.-2011-Martin-2049-57.pdf
Supplemental Material - Singer_Supporting_Information_REVISED.pdf
", "abstract": "We have determined methylation state differences in the epigenomes of uncultured cells purified from human, chimpanzee, and orangutan, using digestion with a methylation-sensitive enzyme, deep sequencing, and computational analysis of the sequence data. The methylomes show a high degree of conservation, but the methylation states of approximately 10% of CpG island-like regions differ significantly between human and chimp. The differences are not associated with changes in CG content, and recapitulate the known phylogenetic relationship of the three species, indicating that they are stably maintained within each species. Inferences about the relationship between somatic and germline methylation states can be made by an analysis of CG decay, derived from methylation and sequence data. This indicates that somatic methylation states are highly related to germline states, and that the methylation differences between human and chimp have occurred in the germline. These results provide evidence for epigenetic changes that occur in the germline and distinguish closely related species, and suggest that germline epigenetic states might constrain somatic states.", "date": "2011-12-21", "date_type": "published", "publication": "Genome Research", "volume": "21", "number": "12", "publisher": "Cold Spring Harbor Laboratory Press", "pagerange": "2049-2057", "id_number": "CaltechAUTHORS:20170306-101015596", "issn": "1088-9051", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170306-101015596", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NIH", "grant_number": "HL084474" }, { "agency": "NIH", "grant_number": "ES016581" }, { "agency": "NIH", "grant_number": "CA115768" }, { "agency": "California Institute for Regenerative Medicine (CIRM)" }, { "agency": "NIH", "grant_number": "P51 RR013986" } ] }, "doi": "10.1101/gr.122721.111", "pmcid": "PMC3227095", 
"primary_object": { "basename": "Genome_Res.-2011-Martin-2049-57.pdf", "url": "https://authors.library.caltech.edu/records/1bb4r-e4p39/files/Genome_Res.-2011-Martin-2049-57.pdf" }, "related_objects": [ { "basename": "Singer_Supporting_Information_REVISED.pdf", "url": "https://authors.library.caltech.edu/records/1bb4r-e4p39/files/Singer_Supporting_Information_REVISED.pdf" } ], "resource_type": "article", "pub_year": "2011", "author_list": "Martin, David I. K.; Singer, Meromit; et al." }, { "id": "https://authors.library.caltech.edu/records/fbtq8-9m762", "eprint_id": 74766, "eprint_status": "archive", "datestamp": "2023-08-19 08:42:38", "lastmod": "2023-10-24 23:12:08", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Roberts-Adam", "name": { "family": "Roberts", "given": "Adam" } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "RNA-Seq and find: entering the RNA deep field", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2011 BioMed Central Ltd. \n\nPublished: 22 November 2011. \n\nAR is supported by a graduate research fellowship from the National Science Foundation. LP is supported in part by grant NIH R01 HG006129-01. We thank Meromit Singer for comments on the curse of deep sequencing. \n\nThe authors declare that they have no competing interests.\n\nPublished - gm290.pdf
", "abstract": "Initial high-throughput RNA sequencing (RNA-Seq) experiments have revealed a complex and dynamic transcriptome, but because it samples transcripts in proportion to their abundances, assessing the extent and nature of low-level transcription using this technique has been difficult. A new assay, RNA CaptureSeq, addresses this limitation of RNA-Seq by enriching for low-level transcripts with cDNA tiling arrays prior to high-throughput sequencing. This approach reveals a plethora of transcripts that have been previously dismissed as 'noise', and hints at single-cell transcription fingerprints that may be crucial in defining cellular function in normal and disease states.", "date": "2011-11-22", "date_type": "published", "publication": "Genome Medicine", "volume": "3", "number": "11", "publisher": "BioMed Central", "pagerange": "Art. No. 74", "id_number": "CaltechAUTHORS:20170306-093511106", "issn": "1756-994X", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170306-093511106", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NSF Graduate Research Fellowship" }, { "agency": "NIH", "grant_number": "R01 HG006129-01" } ] }, "doi": "10.1186/gm290", "pmcid": "PMC3308029", "primary_object": { "basename": "gm290.pdf", "url": "https://authors.library.caltech.edu/records/fbtq8-9m762/files/gm290.pdf" }, "resource_type": "article", "pub_year": "2011", "author_list": "Roberts, Adam and Pachter, Lior" }, { "id": "https://authors.library.caltech.edu/records/khkgx-4cj35", "eprint_id": 74768, "eprint_status": "archive", "datestamp": "2023-08-19 08:42:01", "lastmod": "2023-10-24 23:12:15", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Meacham-F", "name": { "family": "Meacham", "given": "Frazer" } }, { "id": "Boffelli-D", "name": { "family": "Boffelli", "given": "Dario" } }, { "id": "Dhahbi-Joseph", "name": { "family": 
"Dhahbi", "given": "Joseph" } }, { "id": "Martin-David-I-K", "name": { "family": "Martin", "given": "David I. K." } }, { "id": "Singer-Meromit", "name": { "family": "Singer", "given": "Meromit" } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "Identification and correction of systematic error in high-throughput sequence data", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2011 Meacham et al. This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. \n\nReceived: 25 May 2011. Accepted: 21 November 2011. Published: 21 November 2011. \n\nWe thank Professor Yun Song and Dr. Wei-Chun Kao from UC Berkeley for the phiX174 dataset and the associated naiveBayesCall output. Dario Boffelli was partially funded by NIH grant HL084474, David Martin by NIH grant ES016581, and Meromit Singer and Lior Pachter by NIH grant 1R01HG006129-01. \n\nAuthors' contributions: FM, MS and LP formulated the problem of searching for systematic errors by studying discordant read pairs and designed a research plan. FM and MS conducted the research. DB, JD and DM performed the sequencing and contributed the datasets analyzed, and FM, MS and LP wrote the manuscript. All authors read and approved the final manuscript.\n\nPublished - art_3A10.1186_2F1471-2105-12-451.pdf
Supplemental Material - 12859_2011_5050_MOESM1_ESM.pdf
Supplemental Material - 12859_2011_5050_MOESM2_ESM.pdf
Supplemental Material - 12859_2011_5050_MOESM3_ESM.pdf
Supplemental Material - 12859_2011_5050_MOESM4_ESM.pdf
Supplemental Material - 12859_2011_5050_MOESM5_ESM.pdf
Supplemental Material - 12859_2011_5050_MOESM6_ESM.pdf
Supplemental Material - 12859_2011_5050_MOESM7_ESM.png
Supplemental Material - 12859_2011_5050_MOESM8_ESM.pdf
", "abstract": "Background: A feature common to all DNA sequencing technologies is the presence of base-call errors in the sequenced reads. The implications of such errors are application specific, ranging from minor informatics nuisances to major problems affecting biological inferences. Recently developed \"next-gen\" sequencing technologies have greatly reduced the cost of sequencing, but have been shown to be more error prone than previous technologies. Both position specific (depending on the location in the read) and sequence specific (depending on the sequence in the read) errors have been identified in Illumina and Life Technology sequencing platforms. We describe a new type of systematic error that manifests as statistically unlikely accumulations of errors at specific genome (or transcriptome) locations.\n\nResults: We characterize and describe systematic errors using overlapping paired reads from high-coverage data. We show that such errors occur in approximately 1 in 1000 base pairs, and that they are highly replicable across experiments. We identify motifs that are frequent at systematic error sites, and describe a classifier that distinguishes heterozygous sites from systematic error. Our classifier is designed to accommodate data from experiments in which the allele frequencies at heterozygous sites are not necessarily 0.5 (such as in the case of RNA-Seq), and can be used with single-end datasets. \n\nConclusions: Systematic errors can easily be mistaken for heterozygous sites in individuals, or for SNPs in population analyses. Systematic errors are particularly problematic in low coverage experiments, or in estimates of allele-specific expression from RNA-Seq data. Our characterization of systematic error has allowed us to develop a program, called SysCall, for identifying and correcting such errors. 
We conclude that correction of systematic errors is important to consider in the design and interpretation of high-throughput sequencing experiments.", "date": "2011-11-21", "date_type": "published", "publication": "BMC Bioinformatics", "volume": "12", "publisher": "BioMed Central", "pagerange": "Art. No. 451", "id_number": "CaltechAUTHORS:20170306-095020310", "issn": "1471-2105", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170306-095020310", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NIH", "grant_number": "HL084474" }, { "agency": "NIH", "grant_number": "ES016581" }, { "agency": "NIH", "grant_number": "1R01HG006129-01" } ] }, "doi": "10.1186/1471-2105-12-451", "pmcid": "PMC3295828", "primary_object": { "basename": "12859_2011_5050_MOESM1_ESM.pdf", "url": "https://authors.library.caltech.edu/records/khkgx-4cj35/files/12859_2011_5050_MOESM1_ESM.pdf" }, "related_objects": [ { "basename": "12859_2011_5050_MOESM3_ESM.pdf", "url": "https://authors.library.caltech.edu/records/khkgx-4cj35/files/12859_2011_5050_MOESM3_ESM.pdf" }, { "basename": "12859_2011_5050_MOESM5_ESM.pdf", "url": "https://authors.library.caltech.edu/records/khkgx-4cj35/files/12859_2011_5050_MOESM5_ESM.pdf" }, { "basename": "12859_2011_5050_MOESM8_ESM.pdf", "url": "https://authors.library.caltech.edu/records/khkgx-4cj35/files/12859_2011_5050_MOESM8_ESM.pdf" }, { "basename": "art_3A10.1186_2F1471-2105-12-451.pdf", "url": "https://authors.library.caltech.edu/records/khkgx-4cj35/files/art_3A10.1186_2F1471-2105-12-451.pdf" }, { "basename": "12859_2011_5050_MOESM2_ESM.pdf", "url": "https://authors.library.caltech.edu/records/khkgx-4cj35/files/12859_2011_5050_MOESM2_ESM.pdf" }, { "basename": "12859_2011_5050_MOESM4_ESM.pdf", "url": "https://authors.library.caltech.edu/records/khkgx-4cj35/files/12859_2011_5050_MOESM4_ESM.pdf" }, { "basename": "12859_2011_5050_MOESM6_ESM.pdf", "url": 
"https://authors.library.caltech.edu/records/khkgx-4cj35/files/12859_2011_5050_MOESM6_ESM.pdf" }, { "basename": "12859_2011_5050_MOESM7_ESM.png", "url": "https://authors.library.caltech.edu/records/khkgx-4cj35/files/12859_2011_5050_MOESM7_ESM.png" } ], "resource_type": "article", "pub_year": "2011", "author_list": "Meacham, Frazer; Boffelli, Dario; et el." }, { "id": "https://authors.library.caltech.edu/records/4dgap-m4a79", "eprint_id": 74769, "eprint_status": "archive", "datestamp": "2023-08-22 03:44:22", "lastmod": "2023-10-24 23:12:18", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Singer-Meromit", "name": { "family": "Singer", "given": "Meromit" } }, { "id": "Engstr\u00f6m-A", "name": { "family": "Engstr\u00f6m", "given": "Alexander" } }, { "id": "Sch\u00f6nhuth-A", "name": { "family": "Sch\u00f6nhuth", "given": "Alexander" } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "Determining Coding CpG Islands by Identifying Regions Significant for Pattern Statistics on Markov Chains", "ispublished": "pub", "full_text_status": "public", "keywords": "Markov chains; pattern statistics; CpG islands; genomics", "note": "\u00a9 2011 by Walter de Gruyter GmbH.", "abstract": "Recent experimental and computational work confirms that CpGs can be unmethylated inside coding exons, thereby showing that codons may be subjected to both genomic and epigenomic constraint. It is therefore of interest to identify coding CpG islands (CCGIs) that are regions inside exons enriched for CpGs. The difficulty in identifying such islands is that coding exons exhibit sequence biases determined by codon usage and constraints that must be taken into account. \n\nWe present a method for finding CCGIs that showcases a novel approach we have developed for identifying regions of interest that are significant (with respect to a Markov chain) for the counts of any pattern. 
Our method begins with the exact computation of tail probabilities for the number of CpGs in all regions contained in coding exons, and then applies a greedy algorithm for selecting islands from among the regions. We show that the greedy algorithm provably optimizes a biologically motivated criterion for selecting islands while controlling the false discovery rate. \n\nWe applied this approach to the human genome (hg18) and annotated CpG islands in coding exons. The statistical criterion we apply to evaluating islands reduces the number of false positives in existing annotations, while our approach to defining islands reveals significant numbers of undiscovered CCGIs in coding exons. Many of these appear to be examples of functional epigenetic specialization in coding exons.", "date": "2011-09-23", "date_type": "published", "publication": "Statistical Applications in Genetics and Molecular Biology", "volume": "10", "number": "1", "publisher": "De Gruyter", "pagerange": "Art. No. 43", "id_number": "CaltechAUTHORS:20170306-100428893", "issn": "2194-6302", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170306-100428893", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "doi": "10.2202/1544-6115.1677", "resource_type": "article", "pub_year": "2011", "author_list": "Singer, Meromit; Engstr\u00f6m, Alexander; et al." 
}, { "id": "https://authors.library.caltech.edu/records/381cc-ttw35", "eprint_id": 74772, "eprint_status": "archive", "datestamp": "2023-08-19 07:59:47", "lastmod": "2023-10-24 23:13:51", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Roberts-A", "name": { "family": "Roberts", "given": "Adam" } }, { "id": "Pimentel-H", "name": { "family": "Pimentel", "given": "Harold" } }, { "id": "Trapnell-C", "name": { "family": "Trapnell", "given": "Cole" } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "Identification of novel transcripts in annotated genomes using RNA-Seq", "ispublished": "pub", "full_text_status": "restricted", "note": "\u00a9 The Author 2011. Published by Oxford University Press. \n\nAR was funded in part by an NSF graduate fellowship. \n\nConflict of Interest: none declared.", "abstract": "Summary: We describe a new 'reference annotation based transcript assembly' problem for RNA-Seq data that involves assembling novel transcripts in the context of an existing annotation. This problem arises in the analysis of expression in model organisms, where it is desirable to leverage existing annotations for discovering novel transcripts. We present an algorithm for reference annotation-based transcript assembly and show how it can be used to rapidly investigate novel transcripts revealed by RNA-Seq in comparison with a reference annotation. \n\nAvailability: The methods described in this article are implemented in the Cufflinks suite of software for RNA-Seq, freely available from http://bio.math.berkeley.edu/cufflinks. 
The software is released under the BOOST license.", "date": "2011-09-01", "date_type": "published", "publication": "Bioinformatics", "volume": "27", "number": "17", "publisher": "Oxford University Press", "pagerange": "2325-2329", "id_number": "CaltechAUTHORS:20170306-101954304", "issn": "1367-4803", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170306-101954304", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NSF Graduate Research Fellowship" } ] }, "doi": "10.1093/bioinformatics/btr355", "resource_type": "article", "pub_year": "2011", "author_list": "Roberts, Adam; Pimentel, Harold; et al." }, { "id": "https://authors.library.caltech.edu/records/t7zmb-bhs07", "eprint_id": 74786, "eprint_status": "archive", "datestamp": "2023-08-22 03:22:49", "lastmod": "2023-10-24 23:16:45", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Levy-D", "name": { "family": "Levy", "given": "Dan" } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "The neighbor-net algorithm", "ispublished": "pub", "full_text_status": "public", "keywords": "Neighbor-net; Neighbor-joining; Circular decomposable metric; Traveling salesman problem; Kalmanson conditions; Balanced length; Minimum evolution; Splits network", "note": "\u00a9 2010 Elsevier. \n\nReceived 18 February 2007, Accepted 3 June 2007, Available online 5 November 2010. \n\nFig. 1(e) is inspired by Fig. 13(b) from [23]. We thank David Bryant, Vincent Moulton and Andreas Spillner for kindly sharing with us a preprint of [10], Lu Luo for useful comments on a preliminary version of this manuscript, and Radu Mihaescu for suggestions in the proof of Theorem 29. 
Lior Pachter was supported in part by an NSF CAREER award (CCF-0347992) and thanks Jotun Hein and Philip Maini for hosting him while on sabbatical when this work was performed. Dan Levy was supported by a grant from the Biotechnology and Biological Sciences Research Council of the UK (BB/D005418/1).\n\nSubmitted - 0702515.pdf
", "abstract": "The neighbor-joining algorithm is a popular phylogenetics method for constructing trees from dissimilarity maps. The neighbor-net algorithm is an extension of the neighbor-joining algorithm and is used for constructing split networks. We begin by describing the output of neighbor-net in terms of the tessellation of M\u00af_0^n(R) by associahedra. This highlights the fact that neighbor-net outputs a tree in addition to a circular ordering and we explain when the neighbor-net tree is the neighbor-joining tree. A key observation is that the tree constructed in existing implementations of neighbor-net is not a neighbor-joining tree. Next, we show that neighbor-net is a greedy algorithm for finding circular split systems of minimal balanced length. This leads to an interpretation of neighbor-net as a greedy algorithm for the traveling salesman problem. The algorithm is optimal for Kalmanson matrices, from which it follows that neighbor-net is consistent and has optimal radius 12. We also provide a statistical interpretation for the balanced length for a circular split system as the length based on weighted least squares estimates of the splits. 
We conclude with applications of these results and demonstrate the implications of our theorems for a recently published comparison of Papuan and Austronesian languages.", "date": "2011-08", "date_type": "published", "publication": "Advances in Applied Mathematics", "volume": "47", "number": "2", "publisher": "Elsevier", "pagerange": "240-258", "id_number": "CaltechAUTHORS:20170306-113043756", "issn": "0196-8858", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170306-113043756", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NSF", "grant_number": "CCF-0347992" }, { "agency": "Biotechnology and Biological Sciences Research Council (BBSRC)", "grant_number": "BB/D005418/1" } ] }, "doi": "10.1016/j.aam.2010.09.002", "primary_object": { "basename": "0702515.pdf", "url": "https://authors.library.caltech.edu/records/t7zmb-bhs07/files/0702515.pdf" }, "resource_type": "article", "pub_year": "2011", "author_list": "Levy, Dan and Pachter, Lior" }, { "id": "https://authors.library.caltech.edu/records/kg7s1-59c84", "eprint_id": 74774, "eprint_status": "archive", "datestamp": "2023-08-19 07:38:59", "lastmod": "2023-10-24 23:13:58", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Snir-S", "name": { "family": "Snir", "given": "Sagi" } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "Tracing the Most Parsimonious Indel History", "ispublished": "pub", "full_text_status": "restricted", "keywords": "algorithms, biology, computational molecular biology evolution", "note": "\u00a9 2012 Mary Ann Liebert, Inc.", "abstract": "Sequence alignment (the grouping of homologous bases into one column) is fundamental to almost any task in comparative genomics. 
This translates to positing gaps in the genomic sequences to account for events of insertions and deletions (indels). The interrelationship between sequence alignment and phylogenetic reconstruction has drawn substantial attention recently with works showing the significance of differences in alignments. One of the plausible approaches in this direction is to grade the suitability of a tree to an associated alignment and vice versa. We here present a combinatorial (as opposed to statistical) approach based on the indel history. We show\u2014both by simulations and by using real biological data from the Encyclopedia of DNA Elements (ENCODE)\u2014that this criterion is sound. The novelty of our approach is the distinguishing between insertions and deletions, and augmenting the analysis with a dimension of \"depth,\" extending it from the sequence space to the phylogenetic space. Using this approach, we perform a comprehensive study of indel characteristic behavior among mammals in both coding and non-coding regions. Our results show significant differences in indel patterns between coding and non-coding regions. 
We also show other characteristic patterns of indel evolution in the depth of the underlying phylogeny.", "date": "2011-08", "date_type": "published", "publication": "Journal of Computational Biology", "volume": "18", "number": "8", "publisher": "Mary Ann Liebert, Inc.", "pagerange": "967-986", "id_number": "CaltechAUTHORS:20170306-102549258", "issn": "1066-5277", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170306-102549258", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "doi": "10.1089/cmb.2010.0325", "resource_type": "article", "pub_year": "2011", "author_list": "Snir, Sagi and Pachter, Lior" }, { "id": "https://authors.library.caltech.edu/records/c0y2x-zwa90", "eprint_id": 74776, "eprint_status": "archive", "datestamp": "2023-08-22 03:09:47", "lastmod": "2023-10-24 23:14:59", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Aviran-S", "name": { "family": "Aviran", "given": "Sharon" } }, { "id": "Trapnell-C", "name": { "family": "Trapnell", "given": "Cole" } }, { "id": "Lucks-J-B", "name": { "family": "Lucks", "given": "Julius B." } }, { "id": "Mortimer-S-A", "name": { "family": "Mortimer", "given": "Stefanie A." } }, { "id": "Luo-Shujun", "name": { "family": "Luo", "given": "Shujun" } }, { "id": "Schroth-G-P", "name": { "family": "Schroth", "given": "Gary P." } }, { "id": "Doudna-J-A", "name": { "family": "Doudna", "given": "Jennifer A." } }, { "id": "Arkin-A-P", "name": { "family": "Arkin", "given": "Adam P." } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "Modeling and automation of sequencing-based characterization of RNA structure", "ispublished": "pub", "full_text_status": "public", "keywords": "signal processing; next generation sequencing; chemical mapping; RNA sequencing; RNA folding", "note": "\u00a9 2011 National Academy of Sciences. 
Freely available online through the PNAS open access option. \n\nContributed by Jennifer A. Doudna, April 29, 2011 (sent for review February 13, 2011) \n\nS.A., J.B.L., and A.P.A. acknowledge support from the Synthetic Biology Engineering Research Center under NSF Grant 04-570/0540879. J.A.D. is a Howard Hughes Medical Institute (HHMI) Investigator, and this work was supported in part by the HHMI. S.A.M. is a fellow of the Leukemia and Lymphoma Society. J.B.L. and L.P. thank the Miller Institute for financial support and a stimulating environment in which this work was conceived. \n\nAuthor contributions: S.A., C.T., J.B.L., S.A.M., S.L., G.P.S., J.A.D., A.P.A., and L.P. designed research; S.A., C.T., J.B.L., S.A.M., and L.P. performed research; S.A., C.T., J.B.L., and L.P. contributed new reagents/analytic tools; S.A., C.T., J.B.L., S.A.M., S.L., and L.P. analyzed data; and S.A., C.T., J.B.L., S.A.M., S.L., G.P.S., J.A.D., A.P.A., and L.P. wrote the paper. \n\nThis article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1106541108/-/DCSupplemental. \n\nThe authors declare no conflict of interest.\n\nPublished - PNAS-2011-Aviran-11069-74.pdf
Supplemental Material - pnas.1106541108_SI.pdf
", "abstract": "Sequence census methods reduce molecular measurements such as transcript abundance and protein-nucleic acid interactions to counting problems via DNA sequencing. We focus on a novel assay utilizing this approach, called selective 2\u2032-hydroxyl acylation analyzed by primer extension sequencing (SHAPE-Seq), that can be used to characterize RNA secondary and tertiary structure. We describe a fully automated data analysis pipeline for SHAPE-Seq analysis that includes read processing, mapping, and structural inference based on a model of the experiment. Our methods rely on the solution of a series of convex optimization problems for which we develop efficient and effective numerical algorithms. Our results can be easily extended to other chemical probes of RNA structure, and also generalized to modeling polymerase drop-off in other sequence census-based experiments.", "date": "2011-07-05", "date_type": "published", "publication": "Proceedings of the National Academy of Sciences of the United States of America", "volume": "108", "number": "27", "publisher": "National Academy of Sciences", "pagerange": "11069-11074", "id_number": "CaltechAUTHORS:20170306-103050792", "issn": "0027-8424", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170306-103050792", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NSF", "grant_number": "04-570/0540879" }, { "agency": "Howard Hughes Medical Institute (HHMI)" }, { "agency": "Leukemia and Lymphoma Society" }, { "agency": "Miller Institute for Basic Research in Science" } ] }, "doi": "10.1073/pnas.1106541108", "pmcid": "PMC3131376", "primary_object": { "basename": "PNAS-2011-Aviran-11069-74.pdf", "url": "https://authors.library.caltech.edu/records/c0y2x-zwa90/files/PNAS-2011-Aviran-11069-74.pdf" }, "related_objects": [ { "basename": "pnas.1106541108_SI.pdf", "url": 
"https://authors.library.caltech.edu/records/c0y2x-zwa90/files/pnas.1106541108_SI.pdf" } ], "resource_type": "article", "pub_year": "2011", "author_list": "Aviran, Sharon; Trapnell, Cole; et el." }, { "id": "https://authors.library.caltech.edu/records/8ca3h-68a74", "eprint_id": 74778, "eprint_status": "archive", "datestamp": "2023-08-22 03:09:54", "lastmod": "2023-10-24 23:15:02", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Lucks-J-B", "name": { "family": "Lucks", "given": "Julius B." } }, { "id": "Mortimer-S-A", "name": { "family": "Mortimer", "given": "Stefanie A." } }, { "id": "Trapnell-C", "name": { "family": "Trapnell", "given": "Cole" } }, { "id": "Luo-Shujun", "name": { "family": "Luo", "given": "Shujun" } }, { "id": "Aviran-S", "name": { "family": "Aviran", "given": "Sharon" } }, { "id": "Schroth-G-P", "name": { "family": "Schroth", "given": "Gary P." } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" }, { "id": "Doudna-J-A", "name": { "family": "Doudna", "given": "Jennifer A." } }, { "id": "Arkin-A-P", "name": { "family": "Arkin", "given": "Adam P." } } ] }, "title": "Multiplexed RNA structure characterization with selective 2'-hydroxyl acylation analyzed by primer extension sequencing (SHAPE-Seq)", "ispublished": "pub", "full_text_status": "public", "keywords": "chemical probing; RNA sequencing; RNA folding genomics", "note": "\u00a9 2011 National Academy of Sciences. Freely available online through the PNAS open access option. \n\nContributed by Jennifer A. Doudna, May 1, 2011 (sent for review February 9, 2011) \n\nThe authors thank Michael Eisen, Jacqueline Villalta, Oh Kyu Yoon, Leath Tonkin, Devin Scannell, Jennifer Kuehl, and Keith Keller for advice and assistance. We thank Rhiju Das for insightful reading of the manuscript. 
We also thank Phil Homan (University of North Carolina, Chapel Hill, NC) and Kevin Weeks (University of North Carolina, Chapel Hill, NC) for the generous gift of 1M7. J.A.D. is a Howard Hughes Medical Institute (HHMI) Investigator, and this work was supported in part by the HHMI. S.A.M. is a fellow of the Leukemia and Lymphoma Society. A.P.A., J.B.L., and S.A. acknowledge support from the Synthetic Biology Engineering Research Center under National Science Foundation Grant 04-570/0540879. J.B.L. and L.P. thank the Miller Institute for financial support, and a stimulating environment in which this work was conceived. \n\nAuthor contributions: J.B.L., S.A.M., C.T., S.L., S.A., G.P.S., L.P., J.A.D., and A.P.A. designed research; J.B.L., S.A.M., C.T., S.L., and S.A. performed research; J.B.L., S.A.M., C.T., S.L., and S.A. contributed new reagents/analytic tools; J.B.L., S.A.M., C.T., S.L., S.A., and L.P. analyzed data; and J.B.L., S.A.M., C.T., S.L., S.A., G.P.S., L.P., J.A.D., and A.P.A. wrote the paper. \n\nThis article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1106501108/-/DCSupplemental. \n\nThe authors declare no conflict of interest.\n\nPublished - 11063.full.pdf
Supplemental Material - pnas.1106501108_SI.pdf
", "abstract": "New regulatory roles continue to emerge for both natural and engineered noncoding RNAs, many of which have specific secondary and tertiary structures essential to their function. Thus there is a growing need to develop technologies that enable rapid characterization of structural features within complex RNA populations. We have developed a high-throughput technique, SHAPE-Seq, that can simultaneously measure quantitative, single nucleotide-resolution secondary and tertiary structural information for hundreds of RNA molecules of arbitrary sequence. SHAPE-Seq combines selective 2\u2032-hydroxyl acylation analyzed by primer extension (SHAPE) chemistry with multiplexed paired-end deep sequencing of primer extension products. This generates millions of sequencing reads, which are then analyzed using a fully automated data analysis pipeline, based on a rigorous maximum likelihood model of the SHAPE-Seq experiment. We demonstrate the ability of SHAPE-Seq to accurately infer secondary and tertiary structural information, detect subtle conformational changes due to single nucleotide point mutations, and simultaneously measure the structures of a complex pool of different RNA molecules. 
SHAPE-Seq thus represents a powerful step toward making the study of RNA secondary and tertiary structures high throughput and accessible to a wide array of scientific pursuits, from fundamental biological investigations to engineering RNA for synthetic biological systems.", "date": "2011-07-05", "date_type": "published", "publication": "Proceedings of the National Academy of Sciences of the United States of America", "volume": "108", "number": "27", "publisher": "National Academy of Sciences", "pagerange": "11063-11068", "id_number": "CaltechAUTHORS:20170306-104204159", "issn": "0027-8424", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170306-104204159", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "Howard Hughes Medical Institute (HHMI)" }, { "agency": "Leukemia and Lymphoma Society" }, { "agency": "NSF", "grant_number": "04-570/0540879" }, { "agency": "Miller Institute for Basic Research in Science" } ] }, "doi": "10.1073/pnas.1106501108", "pmcid": "PMC3131332", "primary_object": { "basename": "pnas.1106501108_SI.pdf", "url": "https://authors.library.caltech.edu/records/8ca3h-68a74/files/pnas.1106501108_SI.pdf" }, "related_objects": [ { "basename": "11063.full.pdf", "url": "https://authors.library.caltech.edu/records/8ca3h-68a74/files/11063.full.pdf" } ], "resource_type": "article", "pub_year": "2011", "author_list": "Lucks, Julius B.; Mortimer, Stefanie A.; et al." 
}, { "id": "https://authors.library.caltech.edu/records/tkpa3-pje63", "eprint_id": 74779, "eprint_status": "archive", "datestamp": "2023-08-19 05:53:07", "lastmod": "2023-10-24 23:16:24", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Roberts-A", "name": { "family": "Roberts", "given": "Adam" } }, { "id": "Trapnell-C", "name": { "family": "Trapnell", "given": "Cole" } }, { "id": "Donaghey-J", "name": { "family": "Donaghey", "given": "Julie" } }, { "id": "Rinn-J-L", "name": { "family": "Rinn", "given": "John L." } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "Improving RNA-Seq expression estimates by correcting for fragment bias", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2011 Roberts et al.; licensee BioMed Central Ltd. This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. \n\nReceived: 4 December 2010. Accepted: 16 March 2011. Published: 16 March 2011. \n\nWe thank Joshua Levin and Mitchell Guttman for their help with the NanoString experiment. Anat Caspi was instrumental in helping us obtain the SOLiD data. Adam Roberts was supported by an NSF graduate research fellowship. \n\nAuthors' contributions: AR, CT and LP developed the bias correction approach. AR implemented the improvements to the Cufflinks software. JLR provided reagents and guidance. JD performed the NanoString experiment. AR performed the analysis. AR and LP wrote the paper. All authors read and approved the final manuscript.\n\nPublished - art_3A10.1186_2Fgb-2011-12-3-r22.pdf
Supplemental Material - 13059_2010_2498_MOESM10_ESM.pdf
Supplemental Material - 13059_2010_2498_MOESM11_ESM.pdf
Supplemental Material - 13059_2010_2498_MOESM12_ESM.pdf
Supplemental Material - 13059_2010_2498_MOESM13_ESM.pdf
Supplemental Material - 13059_2010_2498_MOESM14_ESM.tiff
Supplemental Material - 13059_2010_2498_MOESM15_ESM.pdf
Supplemental Material - 13059_2010_2498_MOESM1_ESM.PDF
Supplemental Material - 13059_2010_2498_MOESM2_ESM.tgz
Supplemental Material - 13059_2010_2498_MOESM3_ESM.PDF
Supplemental Material - 13059_2010_2498_MOESM4_ESM.py
Supplemental Material - 13059_2010_2498_MOESM5_ESM.pdf
Supplemental Material - 13059_2010_2498_MOESM6_ESM.pdf
Supplemental Material - 13059_2010_2498_MOESM7_ESM.pdf
Supplemental Material - 13059_2010_2498_MOESM8_ESM.pdf
Supplemental Material - 13059_2010_2498_MOESM9_ESM.pdf
", "abstract": "The biochemistry of RNA-Seq library preparation results in cDNA fragments that are not uniformly distributed within the transcripts they represent. This non-uniformity must be accounted for when estimating expression levels, and we show how to perform the needed corrections using a likelihood based approach. We find improvements in expression estimates as measured by correlation with independently performed qRT-PCR and show that correction of bias leads to improved replicability of results across libraries and sequencing technologies.", "date": "2011-03-16", "date_type": "published", "publication": "Genome Biology", "volume": "12", "number": "3", "publisher": "BioMed Central", "pagerange": "Art. No. R22", "id_number": "CaltechAUTHORS:20170306-105110860", "issn": "1465-6906", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170306-105110860", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NSF Graduate Research Fellowship" } ] }, "doi": "10.1186/gb-2011-12-3-r22", "pmcid": "PMC3129672", "primary_object": { "basename": "13059_2010_2498_MOESM1_ESM.PDF", "url": "https://authors.library.caltech.edu/records/tkpa3-pje63/files/13059_2010_2498_MOESM1_ESM.PDF" }, "related_objects": [ { "basename": "13059_2010_2498_MOESM4_ESM.py", "url": "https://authors.library.caltech.edu/records/tkpa3-pje63/files/13059_2010_2498_MOESM4_ESM.py" }, { "basename": "13059_2010_2498_MOESM8_ESM.pdf", "url": "https://authors.library.caltech.edu/records/tkpa3-pje63/files/13059_2010_2498_MOESM8_ESM.pdf" }, { "basename": "13059_2010_2498_MOESM9_ESM.pdf", "url": "https://authors.library.caltech.edu/records/tkpa3-pje63/files/13059_2010_2498_MOESM9_ESM.pdf" }, { "basename": "13059_2010_2498_MOESM7_ESM.pdf", "url": "https://authors.library.caltech.edu/records/tkpa3-pje63/files/13059_2010_2498_MOESM7_ESM.pdf" }, { "basename": "13059_2010_2498_MOESM11_ESM.pdf", "url": 
"https://authors.library.caltech.edu/records/tkpa3-pje63/files/13059_2010_2498_MOESM11_ESM.pdf" }, { "basename": "13059_2010_2498_MOESM13_ESM.pdf", "url": "https://authors.library.caltech.edu/records/tkpa3-pje63/files/13059_2010_2498_MOESM13_ESM.pdf" }, { "basename": "13059_2010_2498_MOESM14_ESM.tiff", "url": "https://authors.library.caltech.edu/records/tkpa3-pje63/files/13059_2010_2498_MOESM14_ESM.tiff" }, { "basename": "13059_2010_2498_MOESM6_ESM.pdf", "url": "https://authors.library.caltech.edu/records/tkpa3-pje63/files/13059_2010_2498_MOESM6_ESM.pdf" }, { "basename": "13059_2010_2498_MOESM10_ESM.pdf", "url": "https://authors.library.caltech.edu/records/tkpa3-pje63/files/13059_2010_2498_MOESM10_ESM.pdf" }, { "basename": "13059_2010_2498_MOESM12_ESM.pdf", "url": "https://authors.library.caltech.edu/records/tkpa3-pje63/files/13059_2010_2498_MOESM12_ESM.pdf" }, { "basename": "13059_2010_2498_MOESM2_ESM.tgz", "url": "https://authors.library.caltech.edu/records/tkpa3-pje63/files/13059_2010_2498_MOESM2_ESM.tgz" }, { "basename": "art_3A10.1186_2Fgb-2011-12-3-r22.pdf", "url": "https://authors.library.caltech.edu/records/tkpa3-pje63/files/art_3A10.1186_2Fgb-2011-12-3-r22.pdf" }, { "basename": "13059_2010_2498_MOESM15_ESM.pdf", "url": "https://authors.library.caltech.edu/records/tkpa3-pje63/files/13059_2010_2498_MOESM15_ESM.pdf" }, { "basename": "13059_2010_2498_MOESM3_ESM.PDF", "url": "https://authors.library.caltech.edu/records/tkpa3-pje63/files/13059_2010_2498_MOESM3_ESM.PDF" }, { "basename": "13059_2010_2498_MOESM5_ESM.pdf", "url": "https://authors.library.caltech.edu/records/tkpa3-pje63/files/13059_2010_2498_MOESM5_ESM.pdf" } ], "resource_type": "article", "pub_year": "2011", "author_list": "Roberts, Adam; Trapnell, Cole; et al." 
}, { "id": "https://authors.library.caltech.edu/records/3k62z-3jg56", "eprint_id": 74785, "eprint_status": "archive", "datestamp": "2023-08-19 05:16:29", "lastmod": "2023-10-24 23:16:43", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Hower-V", "name": { "family": "Hower", "given": "Valerie" } }, { "id": "Evans-S-N", "name": { "family": "Evans", "given": "Steven N." } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "Shape-based peak identification for ChIP-Seq", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2011 Hower et al; licensee BioMed Central Ltd. This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. \n\nReceived: 3 June 2010. Accepted: 12 January 2011. Published: 12 January 2011. \n\nSNE is supported in part by NSF grant DMS-0907630 and VH is funded by NSF fellowship DMS-0902723. \n\nAuthors' contributions: LP proposed the problem of using the shape of a putative peak to determine binding sites in ChIP-Seq. SNE developed the probability theory. VH explored ideas from topological data analysis, implemented the algorithm, and analyzed the ChIP-Seq data. VH, SNE and LP worked together to develop the peak calling algorithm, and all contributed to writing the manuscript. All authors read and approved the final manuscript.\n\nPublished - art_3A10.1186_2F1471-2105-12-15.pdf
Supplemental Material - 12859_2010_4332_MOESM1_ESM.pdf
Supplemental Material - 12859_2010_4332_MOESM2_ESM.pdf
Supplemental Material - 12859_2010_4332_MOESM3_ESM.pdf
Supplemental Material - 12859_2010_4332_MOESM4_ESM.pdf
Supplemental Material - 12859_2010_4332_MOESM5_ESM.pdf
Supplemental Material - 12859_2010_4332_MOESM6_ESM.pdf
", "abstract": "Background: The identification of binding targets for proteins using ChIP-Seq has gained popularity as an alternative to ChIP-chip. Sequencing can, in principle, eliminate artifacts associated with microarrays, and cheap sequencing offers the ability to sequence deeply and obtain a comprehensive survey of binding. A number of algorithms have been developed to call \"peaks\" representing bound regions from mapped reads. Most current algorithms incorporate multiple heuristics, and despite much work it remains difficult to accurately determine individual peaks corresponding to distinct binding events. \n\nResults: Our method for identifying statistically significant peaks from read coverage is inspired by the notion of persistence in topological data analysis and provides a non-parametric approach that is statistically sound and robust to noise in experiments. Specifically, our method reduces the peak calling problem to the study of tree-based statistics derived from the data. We validate our approach using previously published data and show that it can discover previously missed regions. \n\nConclusions: The difficulty in accurately calling peaks for ChIP-Seq data is partly due to the difficulty in defining peaks, and we demonstrate a novel method that improves on the accuracy of previous methods in resolving peaks. Our introduction of a robust statistical test based on ideas from topological data analysis is also novel. Our methods are implemented in a program called T-PIC (Tree shape Peak Identification for ChIP-Seq), which is available at http://bio.math.berkeley.edu/tpic/.", "date": "2011-01-12", "date_type": "published", "publication": "BMC Bioinformatics", "volume": "12", "publisher": "BioMed Central", "pagerange": "Art. No. 
15", "id_number": "CaltechAUTHORS:20170306-111327579", "issn": "1471-2105", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170306-111327579", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NSF", "grant_number": "DMS-0907630" }, { "agency": "NSF Graduate Research Fellowship", "grant_number": "DMS-0902723" } ] }, "doi": "10.1186/1471-2105-12-15", "pmcid": "PMC3032669", "primary_object": { "basename": "12859_2010_4332_MOESM4_ESM.pdf", "url": "https://authors.library.caltech.edu/records/3k62z-3jg56/files/12859_2010_4332_MOESM4_ESM.pdf" }, "related_objects": [ { "basename": "12859_2010_4332_MOESM5_ESM.pdf", "url": "https://authors.library.caltech.edu/records/3k62z-3jg56/files/12859_2010_4332_MOESM5_ESM.pdf" }, { "basename": "12859_2010_4332_MOESM6_ESM.pdf", "url": "https://authors.library.caltech.edu/records/3k62z-3jg56/files/12859_2010_4332_MOESM6_ESM.pdf" }, { "basename": "art_3A10.1186_2F1471-2105-12-15.pdf", "url": "https://authors.library.caltech.edu/records/3k62z-3jg56/files/art_3A10.1186_2F1471-2105-12-15.pdf" }, { "basename": "12859_2010_4332_MOESM1_ESM.pdf", "url": "https://authors.library.caltech.edu/records/3k62z-3jg56/files/12859_2010_4332_MOESM1_ESM.pdf" }, { "basename": "12859_2010_4332_MOESM2_ESM.pdf", "url": "https://authors.library.caltech.edu/records/3k62z-3jg56/files/12859_2010_4332_MOESM2_ESM.pdf" }, { "basename": "12859_2010_4332_MOESM3_ESM.pdf", "url": "https://authors.library.caltech.edu/records/3k62z-3jg56/files/12859_2010_4332_MOESM3_ESM.pdf" } ], "resource_type": "article", "pub_year": "2011", "author_list": "Hower, Valerie; Evans, Steven N.; et al." 
}, { "id": "https://authors.library.caltech.edu/records/445ny-3rj08", "eprint_id": 74787, "eprint_status": "archive", "datestamp": "2023-08-19 04:10:23", "lastmod": "2023-10-24 23:16:50", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Willerth-S-M", "name": { "family": "Willerth", "given": "Stephanie M." } }, { "id": "Pedro-H-A-M", "name": { "family": "Pedro", "given": "H\u00e9lder A. M." } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" }, { "id": "Humeau-L-M", "name": { "family": "Humeau", "given": "Laurent M." } }, { "id": "Arkin-A-P", "name": { "family": "Arkin", "given": "Adam P." } }, { "id": "Schaffer-D-V", "name": { "family": "Schaffer", "given": "David V." } } ] }, "title": "Development of a Low Bias Method for Characterizing Viral Populations Using Next Generation Sequencing Technology", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2010 Willerth et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. \n\nReceived: May 27, 2010; Accepted: October 1, 2010; Published: October 22, 2010. \n\nThis work was supported by VIRxSYS corporation and NIH R01 GM073058. H.P. is sponsored by Fundacao Para a Ciencia e a Tecnologia [FCT], Fundacao Calouste Gulbenkian, Siemens SA Portugal and was supported by FCT fellowship SFRH/BD/33204/2007. The funders (specifically the VIRxSYS corporation) provided samples for analysis and input in the experimental design of this project. \n\nThe authors would like to thank Priya Shah, Katherine Hermans in the University of California-Berkeley Functional Genomics Laboratory, and Leath Tonkin of the Vincent J. Coates Genomic Sequencing Laboratory for their help. 
\n\nAuthor Contributions: Conceived and designed the experiments: SMW HAMP LP LMH APA DVS. Performed the experiments: SMW. Analyzed the data: HAMP. Contributed reagents/materials/analysis tools: HAMP LMH. Wrote the paper: SMW HAMP. Reviewed and commented on drafts of the manuscript: DVS. Revised and approved the final version of the article: LP LMH APA. \n\nCompeting interests: The first author (Stephanie M. Willerth) and the fourth author (Laurent M. Humeau) had their salaries paid for by the VIRxSYS corporation. The second author (H\u00e9lder A.M. Pedro) was sponsored in part by Siemens SA Portugal. The funding provided by these companies did not alter the authors' adherence to all the PLoS ONE policies on sharing data and materials.\n\nPublished - journal.pone.0013564.PDF
Supplemental Material - Figure_S1.tif
Supplemental Material - journal.pone.0013564.s002.TIF
Supplemental Material - journal.pone.0013564.s003.DOC
Supplemental Material - journal.pone.0013564.s004.DOC
", "abstract": "Background: With an estimated 38 million people worldwide currently infected with human immunodeficiency virus (HIV), and an additional 4.1 million people becoming infected each year, it is important to understand how this virus mutates and develops resistance in order to design successful therapies. \n\nMethodology/Principal Findings: We report a novel experimental method for amplifying full-length HIV genomes without the use of sequence-specific primers for high throughput DNA sequencing, followed by assembly of full length viral genome sequences from the resulting large dataset. Illumina was chosen for sequencing due to its ability to provide greater coverage of the HIV genome compared to prior methods, allowing for more comprehensive characterization of the heterogeneity present in the HIV samples analyzed. Our novel amplification method in combination with Illumina sequencing was used to analyze two HIV populations: a homogenous HIV population based on the canonical NL4-3 strain and a heterogeneous viral population obtained from a HIV patient's infected T cells. In addition, the resulting sequence was analyzed using a new computational approach to obtain a consensus sequence and several metrics of diversity. \n\nSignificance: This study demonstrates how a lower bias amplification method in combination with next generation DNA sequencing provides in-depth, complete coverage of the HIV genome, enabling a stronger characterization of the quasispecies present in a clinically relevant HIV population as well as future study of how HIV mutates in response to a selective pressure.", "date": "2010-10-22", "date_type": "published", "publication": "PLOS ONE", "volume": "5", "number": "10", "publisher": "Public Library of Science", "pagerange": "Art. No. 
e13564", "id_number": "CaltechAUTHORS:20170306-114844988", "issn": "1932-6203", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170306-114844988", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "VIRxSYS Corporation" }, { "agency": "NIH", "grant_number": "R01 GM073058" }, { "agency": "Funda\u00e7\u00e3o para a Ci\u00eancia e a Tecnologia (FCT)", "grant_number": "SFRH/BD/33204/2007" }, { "agency": "Funda\u00e7\u00e3o Calouste Gulbenkian" }, { "agency": "Siemens SA Portugal" } ] }, "doi": "10.1371/journal.pone.0013564", "pmcid": "PMC2962647", "primary_object": { "basename": "journal.pone.0013564.s002.TIF", "url": "https://authors.library.caltech.edu/records/445ny-3rj08/files/journal.pone.0013564.s002.TIF" }, "related_objects": [ { "basename": "journal.pone.0013564.s003.DOC", "url": "https://authors.library.caltech.edu/records/445ny-3rj08/files/journal.pone.0013564.s003.DOC" }, { "basename": "journal.pone.0013564.s004.DOC", "url": "https://authors.library.caltech.edu/records/445ny-3rj08/files/journal.pone.0013564.s004.DOC" }, { "basename": "Figure_S1.tif", "url": "https://authors.library.caltech.edu/records/445ny-3rj08/files/Figure_S1.tif" }, { "basename": "journal.pone.0013564.PDF", "url": "https://authors.library.caltech.edu/records/445ny-3rj08/files/journal.pone.0013564.PDF" } ], "resource_type": "article", "pub_year": "2010", "author_list": "Willerth, Stephanie M.; Pedro, H\u00e9lder A. M.; et al." }, { "id": "https://authors.library.caltech.edu/records/egmr2-mf591", "eprint_id": 74789, "eprint_status": "archive", "datestamp": "2023-08-19 03:34:11", "lastmod": "2023-10-24 23:16:56", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Evans-S-N", "name": { "family": "Evans", "given": "Steven N." 
} }, { "id": "Hower-V", "name": { "family": "Hower", "given": "Valerie" } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "Coverage statistics for sequence census methods", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2010 Evans et al; licensee BioMed Central Ltd. This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. \n\nReceived: 23 April 2010. Accepted: 18 August 2010. Published: 18 August 2010. \n\nSNE is supported in part by NSF grant DMS-0907630 and VH is funded by NSF fellowship DMS-0902723. We thank Adam Roberts for his help in making Figure 6. \n\nAuthors' contributions: LP proposed the problem of understanding the random behaviour of coverage functions in the context of sequence census methods. VH investigated the coverage function and lattice path excursions based on ideas from topological data analysis. SE developed the probability theory and identified the relevance of Theorem 1. SNE, VH and LP worked together on all aspects of the paper and wrote the manuscript. All authors read and approved the final manuscript.\n\nPublished - art_3A10.1186_2F1471-2105-11-430.pdf
Submitted - 1004.5587.pdf
Supplemental Material - 12859_2010_3887_MOESM1_ESM.pdf
Supplemental Material - 12859_2010_3887_MOESM2_ESM.png
Supplemental Material - 12859_2010_3887_MOESM3_ESM.png
Supplemental Material - 12859_2010_3887_MOESM4_ESM.pdf
Supplemental Material - 12859_2010_3887_MOESM5_ESM.pdf
Supplemental Material - 12859_2010_3887_MOESM6_ESM.pdf
", "abstract": "Background: We study the statistical properties of fragment coverage in genome sequencing experiments. In an extension of the classic Lander-Waterman model, we consider the effect of the length distribution of fragments. We also introduce a coding of the shape of the coverage depth function as a tree and explain how this can be used to detect regions with anomalous coverage. This modeling perspective is especially germane to current high-throughput sequencing experiments, where both sample preparation protocols and sequencing technology particulars can affect fragment length distributions. \n\nResults: Under the mild assumptions that fragment start sites are Poisson distributed and successive fragment lengths are independent and identically distributed, we observe that, regardless of fragment length distribution, the fragments produced in a sequencing experiment can be viewed as resulting from a two-dimensional spatial Poisson process. We then study the successive jumps of the coverage function, and show that they can be encoded as a random tree that is approximately a Galton-Watson tree with generation-dependent geometric offspring distributions whose parameters can be computed. \n\nConclusions: We extend standard analyses of shotgun sequencing that focus on coverage statistics at individual sites, and provide a null model for detecting deviations from random coverage in high-throughput sequence census based experiments. Our approach leads to explicit determinations of the null distributions of certain test statistics, while for others it greatly simplifies the approximation of their null distributions by simulation. Our focus on fragments also leads to a new approach to visualizing sequencing data that is of independent interest.", "date": "2010-08-18", "date_type": "published", "publication": "BMC Bioinformatics", "volume": "11", "publisher": "BioMed Central", "pagerange": "Art. No. 
430", "id_number": "CaltechAUTHORS:20170306-122114222", "issn": "1471-2105", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170306-122114222", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NSF", "grant_number": "DMS-0907630" }, { "agency": "NSF Graduate Research Fellowship", "grant_number": "DMS-0902723" } ] }, "doi": "10.1186/1471-2105-11-430", "pmcid": "PMC2940910", "primary_object": { "basename": "12859_2010_3887_MOESM2_ESM.png", "url": "https://authors.library.caltech.edu/records/egmr2-mf591/files/12859_2010_3887_MOESM2_ESM.png" }, "related_objects": [ { "basename": "12859_2010_3887_MOESM3_ESM.png", "url": "https://authors.library.caltech.edu/records/egmr2-mf591/files/12859_2010_3887_MOESM3_ESM.png" }, { "basename": "12859_2010_3887_MOESM4_ESM.pdf", "url": "https://authors.library.caltech.edu/records/egmr2-mf591/files/12859_2010_3887_MOESM4_ESM.pdf" }, { "basename": "12859_2010_3887_MOESM5_ESM.pdf", "url": "https://authors.library.caltech.edu/records/egmr2-mf591/files/12859_2010_3887_MOESM5_ESM.pdf" }, { "basename": "12859_2010_3887_MOESM6_ESM.pdf", "url": "https://authors.library.caltech.edu/records/egmr2-mf591/files/12859_2010_3887_MOESM6_ESM.pdf" }, { "basename": "art_3A10.1186_2F1471-2105-11-430.pdf", "url": "https://authors.library.caltech.edu/records/egmr2-mf591/files/art_3A10.1186_2F1471-2105-11-430.pdf" }, { "basename": "1004.5587.pdf", "url": "https://authors.library.caltech.edu/records/egmr2-mf591/files/1004.5587.pdf" }, { "basename": "12859_2010_3887_MOESM1_ESM.pdf", "url": "https://authors.library.caltech.edu/records/egmr2-mf591/files/12859_2010_3887_MOESM1_ESM.pdf" } ], "resource_type": "article", "pub_year": "2010", "author_list": "Evans, Steven N.; Hower, Valerie; et al." 
}, { "id": "https://authors.library.caltech.edu/records/vzhec-44b69", "eprint_id": 74788, "eprint_status": "archive", "datestamp": "2023-08-19 03:24:50", "lastmod": "2023-10-24 23:16:53", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Singer-M", "name": { "family": "Singer", "given": "Meromit" } }, { "id": "Boffelli-D", "name": { "family": "Boffelli", "given": "Dario" } }, { "id": "Dhahbi-Joseph", "name": { "family": "Dhahbi", "given": "Joseph" } }, { "id": "Sch\u00f6nhuth-A", "name": { "family": "Sch\u00f6nhuth", "given": "Alexander" } }, { "id": "Schroth-G-P", "name": { "family": "Schroth", "given": "Gary P." } }, { "id": "Martin-David-I-K", "name": { "family": "Martin", "given": "David I. K." } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "MetMap Enables Genome-Scale Methyltyping for Determining Methylation States in Populations", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2010 Singer et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. \n\nReceived: April 1, 2010; Accepted: July 15, 2010; Published: August 19, 2010. \n\nThis work was supported by the NIH grants HL084474 (DB), ES016581 (DM), CA115768 (DM). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Meromit Singer and Lior Pachter received no funding for this work. \n\nWe thank Lu Zhang from Illumina, Inc., Hayward, CA for the MethylSeq data used in this manuscript, the ENCODE project for generation of the FAIRE datasets, Sriram Sankararaman for many enlightening discussions and careful feedback on the manuscript, and Cole Trapnell for critical reading of the manuscript. 
\n\nAuthor Contributions: Conceived and designed the experiments: MS DB DIKM LP. Performed the experiments: MS JD AS GPS DIKM. Analyzed the data: MS DB JD AS DIKM LP. Contributed reagents/materials/analysis tools: GPS. Wrote the paper: MS DB DIKM LP. \n\nThe authors have declared that no competing interests exist.\n\nPublished - journal.pcbi.1000888.PDF
Supplemental Material - journal.pcbi.1000888.s001.PDF
Supplemental Material - journal.pcbi.1000888.s002.PDF
Supplemental Material - journal.pcbi.1000888.s003.PDF
Supplemental Material - journal.pcbi.1000888.s004.PDF
Supplemental Material - journal.pcbi.1000888.s005.PDF
", "abstract": "The ability to assay genome-scale methylation patterns using high-throughput sequencing makes it possible to carry out association studies to determine the relationship between epigenetic variation and phenotype. While bisulfite sequencing can determine a methylome at high resolution, cost inhibits its use in comparative and population studies. MethylSeq, based on sequencing of fragment ends produced by a methylation-sensitive restriction enzyme, is a method for methyltyping (survey of methylation states) and is a site-specific and cost-effective alternative to whole-genome bisulfite sequencing. Despite its advantages, the use of MethylSeq has been restricted by biases in MethylSeq data that complicate the determination of methyltypes. Here we introduce a statistical method, MetMap, that produces corrected site-specific methylation states from MethylSeq experiments and annotates unmethylated islands across the genome. MetMap integrates genome sequence information with experimental data, in a statistically sound and cohesive Bayesian Network. It infers the extent of methylation at individual CGs and across regions, and serves as a framework for comparative methylation analysis within and among species. We validated MetMap's inferences with direct bisulfite sequencing, showing that the methylation status of sites and islands is accurately inferred. We used MetMap to analyze MethylSeq data from four human neutrophil samples, identifying novel, highly unmethylated islands that are invisible to sequence-based annotation strategies. The combination of MethylSeq and MetMap is a powerful and cost-effective tool for determining genome-scale methyltypes suitable for comparative and association studies.", "date": "2010-08", "date_type": "published", "publication": "PLOS Computational Biology", "volume": "6", "number": "8", "publisher": "Public Library of Science", "pagerange": "Art. No. 
e1000888", "id_number": "CaltechAUTHORS:20170306-121310345", "issn": "1553-7358", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170306-121310345", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NIH", "grant_number": "HL084474" }, { "agency": "NIH", "grant_number": "ES016581" }, { "agency": "NIH", "grant_number": "CA115768" } ] }, "doi": "10.1371/journal.pcbi.1000888", "pmcid": "PMC2924245", "primary_object": { "basename": "journal.pcbi.1000888.PDF", "url": "https://authors.library.caltech.edu/records/vzhec-44b69/files/journal.pcbi.1000888.PDF" }, "related_objects": [ { "basename": "journal.pcbi.1000888.s001.PDF", "url": "https://authors.library.caltech.edu/records/vzhec-44b69/files/journal.pcbi.1000888.s001.PDF" }, { "basename": "journal.pcbi.1000888.s002.PDF", "url": "https://authors.library.caltech.edu/records/vzhec-44b69/files/journal.pcbi.1000888.s002.PDF" }, { "basename": "journal.pcbi.1000888.s003.PDF", "url": "https://authors.library.caltech.edu/records/vzhec-44b69/files/journal.pcbi.1000888.s003.PDF" }, { "basename": "journal.pcbi.1000888.s004.PDF", "url": "https://authors.library.caltech.edu/records/vzhec-44b69/files/journal.pcbi.1000888.s004.PDF" }, { "basename": "journal.pcbi.1000888.s005.PDF", "url": "https://authors.library.caltech.edu/records/vzhec-44b69/files/journal.pcbi.1000888.s005.PDF" } ], "resource_type": "article", "pub_year": "2010", "author_list": "Singer, Meromit; Boffelli, Dario; et al." }, { "id": "https://authors.library.caltech.edu/records/1sq2y-r7n90", "eprint_id": 74790, "eprint_status": "archive", "datestamp": "2023-08-19 03:20:11", "lastmod": "2023-10-24 23:16:58", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Levin-T-C", "name": { "family": "Levin", "given": "Tera C." } }, { "id": "Glazer-A-M", "name": { "family": "Glazer", "given": "Andrew M." 
} }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" }, { "id": "Brem-R-B", "name": { "family": "Brem", "given": "Rachel B." } }, { "id": "Eisen-M-B", "name": { "family": "Eisen", "given": "Michael B." }, "orcid": "0000-0002-7528-738X" } ] }, "title": "Exploring the Genetic Basis of Variation in Gene Predictions with a Synthetic Association Study", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2010 Levin et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. \n\nReceived: February 25, 2010; Accepted: June 8, 2010; Published: July 29, 2010. \n\nAuthor Contributions: Conceived and designed the experiments: TL AMG MBE. Performed the experiments: TL AMG. Analyzed the data: TL AMG. Contributed reagents/materials/analysis tools: TL AMG. Wrote the paper: TL AMG RB MBE. Supervised the research: MBE LP RB. \n\nThe authors have no support or funding to report. \n\nCompeting interests: MBE is a member of the PLoS Board of Directors. This does not alter the authors' adherence to all the PLoS ONE policies on sharing data and materials.\n\nPublished - journal.pone.0011645.PDF
", "abstract": "Identifying DNA polymorphisms that affect molecular processes like transcription, splicing, or translation typically requires genotyping and experimentally characterizing tissue from large numbers of individuals, which remains expensive and time consuming. Here we introduce an alternative strategy: a \"synthetic association study\" in which we computationally predict molecular phenotypes on artificial genomes containing randomly sampled combinations of polymorphic alleles, and perform a classical association study to identify genotypes underlying variation in these computationally predicted annotations. We applied this method to characterize the effects on gene structure of 32,792 single-nucleotide polymorphisms between two strains of the antibiotic-producing fungus Penicillium chrysogenum. Although these SNPs represent only 0.1 percent of the nucleotides in the genome, they collectively altered 1.8 percent of predicted gene models between these strains. To determine which SNPs or combinations of SNPs were responsible for this variation, we predicted protein-coding genes in 500 intermediate genomes, each identical except for randomly chosen alleles at each SNP position. Of 30,468 gene models in the genome, 557 varied across these 500 genomes. 226 of these polymorphic gene models (40%) were perfectly correlated with individual SNPs, all of which were within or immediately proximal to the affected gene. The genetic architectures of the other 321 were more complex, with several examples of SNP epistasis that would have been difficult to predict a priori. We expect that many of the SNPs that affect computational gene structure reflect a biologically unrealistic sensitivity of the gene prediction algorithm to sequence changes, and we propose that genome annotation algorithms could be improved by minimizing their sensitivity to natural polymorphisms. 
However, many of the SNPs we identified are likely to affect transcript structure in vivo, and the synthetic association study approach can be easily generalized to any computed genome annotation to uncover relationships between genotype and important molecular phenotypes.", "date": "2010-07-29", "date_type": "published", "publication": "PLOS ONE", "volume": "5", "number": "7", "publisher": "Public Library of Science", "pagerange": "Art. No. e11645", "id_number": "CaltechAUTHORS:20170306-123004736", "issn": "1932-6203", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170306-123004736", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "doi": "10.1371/journal.pone.0011645", "pmcid": "PMC2912228", "primary_object": { "basename": "journal.pone.0011645.PDF", "url": "https://authors.library.caltech.edu/records/1sq2y-r7n90/files/journal.pone.0011645.PDF" }, "resource_type": "article", "pub_year": "2010", "author_list": "Levin, Tera C.; Glazer, Andrew M.; et al." 
}, { "id": "https://authors.library.caltech.edu/records/q1bwm-dea54", "eprint_id": 74791, "eprint_status": "archive", "datestamp": "2023-08-19 03:05:51", "lastmod": "2023-10-24 23:17:41", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Lapuk-A", "name": { "family": "Lapuk", "given": "Anna" } }, { "id": "Marr-H", "name": { "family": "Marr", "given": "Henry" } }, { "id": "Jakkula-L", "name": { "family": "Jakkula", "given": "Lakshmi" } }, { "id": "Pedro-H-A-M", "name": { "family": "Pedro", "given": "Helder" } }, { "id": "Bhattacharya-S", "name": { "family": "Bhattacharya", "given": "Sanchita" } }, { "id": "Purdom-E", "name": { "family": "Purdom", "given": "Elizabeth" } }, { "id": "Hu-Zhi", "name": { "family": "Hu", "given": "Zhi" } }, { "id": "Simpson-K", "name": { "family": "Simpson", "given": "Ken" } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" }, { "id": "Durinck-S", "name": { "family": "Durinck", "given": "Steffen" } }, { "id": "Wang-Nicholas", "name": { "family": "Wang", "given": "Nicholas" } }, { "id": "Parvin-B", "name": { "family": "Parvin", "given": "Bahram" } }, { "id": "Fontenay-G", "name": { "family": "Fontenay", "given": "Gerald" } }, { "id": "Speed-T", "name": { "family": "Speed", "given": "Terence" } }, { "id": "Garbe-J-C", "name": { "family": "Garbe", "given": "James" } }, { "id": "Stampfer-Martha", "name": { "family": "Stampfer", "given": "Martha" } }, { "id": "Bayandorian-H", "name": { "family": "Bayandorian", "given": "Hovig" } }, { "id": "Dorton-S", "name": { "family": "Dorton", "given": "Shannon" } }, { "id": "Clark-T-A", "name": { "family": "Clark", "given": "Tyson A." 
} }, { "id": "Schweitzer-A", "name": { "family": "Schweitzer", "given": "Anthony" } }, { "id": "Wyrobek-A", "name": { "family": "Wyrobek", "given": "Andrew" } }, { "id": "Feller-H", "name": { "family": "Feiler", "given": "Heidi" } }, { "id": "Spellman-P", "name": { "family": "Spellman", "given": "Paul" } }, { "id": "Conboy-J-G", "name": { "family": "Conboy", "given": "John" } }, { "id": "Gray-J-W", "name": { "family": "Gray", "given": "Joe W." } } ] }, "title": "Exon-Level Microarray Analyses Identify Alternative Splicing Programs in Breast Cancer", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2010 American Association for Cancer Research. \n\nReceived 12/07/2009; revised 05/21/2010; accepted 06/01/2010; published OnlineFirst 07/06/2010. \n\nGrant Support: Director, Office of Science, Office of Biological & Environmental Research, of the U.S. Department of Energy under contract no. DE-AC02-05CH11231, USAMRMC W81XWH-07-1-0663 and NIH grants CA58207, CA112970, and CA 126477 (J.G. Conboy) by NIH grant HL045182 (J.W. Gray); and by the FCT SFRH/BD 33203 2007 (H. Pedro). \n\nDisclosure of Potential Conflicts of Interest: J.W. Gray received early access to microarrays from Affymetrix. \n\nThe costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.\n\nAccepted Version - nihms212752.pdf
Supplemental Material - 10/1541-7786.MCR-09-0528.DC1/Supplementary_Data.pdf
Supplemental Material - 10/1541-7786.MCR-09-0528.DC1/Supplementary_Table_S1.xlsx
", "abstract": "Protein isoforms produced by alternative splicing (AS) of many genes have been implicated in several aspects of cancer genesis and progression. These observations motivated a genome-wide assessment of AS in breast cancer. We accomplished this by measuring exon level expression in 31 breast cancer and nonmalignant immortalized cell lines representing luminal, basal, and claudin-low breast cancer subtypes using Affymetrix Human Junction Arrays. We analyzed these data using a computational pipeline specifically designed to detect AS with a low false-positive rate. This identified 181 splice events representing 156 genes as candidates for AS. Reverse transcription-PCR validation of a subset of predicted AS events confirmed 90%. Approximately half of the AS events were associated with basal, luminal, or claudin-low breast cancer subtypes. Exons involved in claudin-low subtype\u2013specific AS were significantly associated with the presence of evolutionarily conserved binding motifs for the tissue-specific Fox2 splicing factor. Small interfering RNA knockdown of Fox2 confirmed the involvement of this splicing factor in subtype-specific AS. The subtype-specific AS detected in this study likely reflects the splicing pattern in the breast cancer progenitor cells in which the tumor arose and suggests the utility of assays for Fox-mediated AS in cancer subtype definition and early detection. 
These data also suggest the possibility of reducing the toxicity of protein-targeted breast cancer treatments by targeting protein isoforms that are not present in limiting normal tissues.", "date": "2010-07", "date_type": "published", "publication": "Molecular Cancer Research", "volume": "8", "number": "7", "publisher": "American Association for Cancer Research", "pagerange": "961-974", "id_number": "CaltechAUTHORS:20170306-123614363", "issn": "1541-7786", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170306-123614363", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "Department of Energy (DOE)", "grant_number": "DE-AC02-05CH11231" }, { "agency": "US Army Medical Research and Materiel Command (USAMRMC)", "grant_number": "W81XWH-07-1-0663" }, { "agency": "NIH", "grant_number": "CA58207" }, { "agency": "NIH", "grant_number": "CA112970" }, { "agency": "NIH", "grant_number": "CA126477" }, { "agency": "NIH", "grant_number": "HL045182" }, { "agency": "Funda\u00e7\u00e3o para a Ci\u00eancia e a Tecnologia (FCT)", "grant_number": "SFRH/BD 33203 2007" } ] }, "doi": "10.1158/1541-7786.MCR-09-0528", "pmcid": "PMC2911965", "primary_object": { "basename": "nihms212752.pdf", "url": "https://authors.library.caltech.edu/records/q1bwm-dea54/files/nihms212752.pdf" }, "resource_type": "article", "pub_year": "2010", "author_list": "Lapuk, Anna; Marr, Henry; et el." }, { "id": "https://authors.library.caltech.edu/records/wvkv3-hv456", "eprint_id": 18505, "eprint_status": "archive", "datestamp": "2023-08-19 02:20:05", "lastmod": "2023-10-20 16:27:51", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Trapnell-C", "name": { "family": "Trapnell", "given": "Cole" } }, { "id": "Williams-B-A", "name": { "family": "Williams", "given": "Brian A." 
} }, { "id": "Pertea-G", "name": { "family": "Pertea", "given": "Geo" } }, { "id": "Mortazavi-A", "name": { "family": "Mortazavi", "given": "Ali" }, "orcid": "0000-0002-4259-6362" }, { "id": "Kwan-Gordon", "name": { "family": "Kwan", "given": "Gordon" } }, { "id": "van-Baren-M-J", "name": { "family": "van Baren", "given": "Marijke J." } }, { "id": "Salzberg-S-L", "name": { "family": "Salzberg", "given": "Steven L." } }, { "id": "Wold-B-J", "name": { "family": "Wold", "given": "Barbara J." }, "orcid": "0000-0003-3235-8130" }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2010 Nature Publishing Group. \n\nReceived 02 February 2010; Accepted 22 March 2010; Published online 02 May 2010. \n\nThis work was supported in part by the US National Institutes of Health (NIH) grants R01-LM006845 and ENCODE U54-HG004576, as well as the Beckman Foundation, the Bren Foundation, the Moore Foundation (Cell Center Program) and the Miller Research Institute. We thank I. Antosechken and L. Schaeffer of the Caltech Jacobs Genome Center for DNA sequencing, and D. Trout, B. King and H. Amrhein for data pipeline and database design, operation and display. We are grateful to R. K. Bradley, K. Datchev, I. Hallgr\u00edmsd\u00f3ttir, J. Landolin, B. Langmead, A. Roberts, M. Schatz and D. Sturgill for helpful discussions. \n\nAuthor Contributions: C.T. and L.P. developed the mathematics and statistics and designed the algorithms; B.A.W. and G.K. performed the RNA-Seq and B.A.W. designed and executed experimental validations; C.T. implemented Cufflinks and Cuffdiff; G.P. implemented Cuffcompare; M.J.v.B. and A.M. tested the software; C.T., G.P. and A.M. performed the analysis; L.P., A.M. and B.J.W. 
conceived the project; C.T., L.P., A.M., B. J.W. and S.L.S. wrote the manuscript. \n\nSoftware availability. TopHat (http://tophat.cbcb.umd.edu) is freely available as source code. It takes a reference genome (as a Bowtie29 index) and RNA-Seq reads as FASTA or FASTQ and produces alignments in SAM30 format. TopHat is distributed under the Artistic License and runs on Linux and Mac OS X. \n\nThe Cufflinks assembler and abundance estimation algorithms (http://cufflinks.cbcb.umd.edu/) are open-source C++ programs and are freely available in both source and binary. The package includes the assembler along with utilities to structurally compare Cufflinks output between samples (Cuffcompare) and to perform differential expression testing (Cuffdiff). Cufflinks is distributed under the Boost License and runs on Linux and Mac OS X. The source code for Cufflinks version 0.8.0 is provided in Supplementary Data 3. \n\nThe authors declare no competing financial interests.\n\nAccepted Version - nihms190938.pdf
Supplemental Material - nbt.1621-S1.pdf
Supplemental Material - nbt.1621-S2.xls
Supplemental Material - nbt.1621-S3.zip
", "abstract": "High-throughput mRNA sequencing (RNA-Seq) promises simultaneous transcript discovery and abundance estimation. However, this would require algorithms that are not restricted by prior gene annotations and that account for alternative transcription and splicing. Here we introduce such algorithms in an open-source software program called Cufflinks. To test Cufflinks, we sequenced and analyzed >430 million paired 75-bp RNA-Seq reads from a mouse myoblast cell line over a differentiation time series. We detected 13,692 known transcripts and 3,724 previously unannotated ones, 62% of which are supported by independent expression data or by homologous genes in other species. Over the time series, 330 genes showed complete switches in the dominant transcription start site (TSS) or splice isoform, and we observed more subtle shifts in 1,304 other genes. These results suggest that Cufflinks can illuminate the substantial regulatory flexibility and complexity in even this well-studied model of muscle development and that it can improve transcriptome-based genome annotation.", "date": "2010-05", "date_type": "published", "publication": "Nature Biotechnology", "volume": "28", "number": "5", "publisher": "Nature Publishing Group", "pagerange": "511-515", "id_number": "CaltechAUTHORS:20100601-111602154", "issn": "1087-0156", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20100601-111602154", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NIH", "grant_number": "R01-LM006845" }, { "agency": "NIH", "grant_number": "ENCODE U54-HG004576" }, { "agency": "Arnold and Mabel Beckman Foundation" }, { "agency": "Bren Foundation" }, { "agency": "Gordon and Betty Moore Foundation" }, { "agency": "Miller Institute for Basic Research in Science" } ] }, "doi": "10.1038/nbt.1621", "pmcid": "PMC3146043", "primary_object": { "basename": "nihms190938.pdf", "url": 
"https://authors.library.caltech.edu/records/wvkv3-hv456/files/nihms190938.pdf" }, "related_objects": [ { "basename": "nbt.1621-S1.pdf", "url": "https://authors.library.caltech.edu/records/wvkv3-hv456/files/nbt.1621-S1.pdf" }, { "basename": "nbt.1621-S2.xls", "url": "https://authors.library.caltech.edu/records/wvkv3-hv456/files/nbt.1621-S2.xls" }, { "basename": "nbt.1621-S3.zip", "url": "https://authors.library.caltech.edu/records/wvkv3-hv456/files/nbt.1621-S3.zip" } ], "resource_type": "article", "pub_year": "2010", "author_list": "Trapnell, Cole; Williams, Brian A.; et el." }, { "id": "https://authors.library.caltech.edu/records/y0yeq-t0p17", "eprint_id": 74792, "eprint_status": "archive", "datestamp": "2023-08-19 01:51:55", "lastmod": "2023-10-24 23:17:46", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Bradley-R-K", "name": { "family": "Bradley", "given": "Robert K." } }, { "id": "Li-Xiao-Yong", "name": { "family": "Li", "given": "Xiao-Yong" } }, { "id": "Trapnell-C", "name": { "family": "Trapnell", "given": "Cole" } }, { "id": "Davidson-S", "name": { "family": "Davidson", "given": "Stuart" } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" }, { "id": "Chu-Hou-Cheng", "name": { "family": "Chu", "given": "Hou Cheng" } }, { "id": "Tonkin-L-A", "name": { "family": "Tonkin", "given": "Leath A." } }, { "id": "Biggin-M-D", "name": { "family": "Biggin", "given": "Mark D." } }, { "id": "Eisen-M-B", "name": { "family": "Eisen", "given": "Michael B." 
}, "orcid": "0000-0002-7528-738X" } ] }, "title": "Binding Site Turnover Produces Pervasive Quantitative Changes in Transcription Factor Binding between Closely Related Drosophila Species", "ispublished": "pub", "full_text_status": "public", "note": "This is an open-access article distributed under the terms of the Creative Commons Public Domain declaration which stipulates that, once placed in the public domain, this work may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. \n\nReceived: October 26, 2009; Accepted: February 17, 2010; Published: March 23, 2010. \n\nExperimental work described here was supported by a Howard Hughes Medical Institute Investigator award to MBE and by National Institutes of Health (NIH) grant GM704403 to MBE and MDB. Computational analyses were supported in part by NIH grant HG002779 to MBE. Work at Lawrence Berkeley National Laboratory was conducted under Department of Energy contract DE-AC02-05CH11231. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. \n\nWe thank Colin Dewey for providing a Mercator orthology mapping for D. melanogaster and D. yakuba and producing the alignment with FSA. We thank the members of the Eisen lab for many helpful discussions and comments on the manuscript. \n\nAuthor Contributions: The author(s) have made the following declarations about their contributions: Conceived and designed the experiments: RKB XYL MDB MBE. Performed the experiments: XYL HCC LAT. Analyzed the data: RKB CT SD LP MBE. Contributed reagents/materials/analysis tools: RKB CT LAT. Wrote the paper: RKB MBE. \n\nCompeting interests: MBE is a co-founder and member of the Board of Directors of PLoS.\n\nPublished - journal.pbio.1000343.PDF
Supplemental Material - 144267.zip
", "abstract": "Changes in gene expression play an important role in evolution, yet the molecular mechanisms underlying regulatory evolution are poorly understood. Here we compare genome-wide binding of the six transcription factors that initiate segmentation along the anterior-posterior axis in embryos of two closely related species: Drosophila melanogaster and Drosophila yakuba. Where we observe binding by a factor in one species, we almost always observe binding by that factor to the orthologous sequence in the other species. Levels of binding, however, vary considerably. The magnitude and direction of the interspecies differences in binding levels of all six factors are strongly correlated, suggesting a role for chromatin or other factor-independent forces in mediating the divergence of transcription factor binding. Nonetheless, factor-specific quantitative variation in binding is common, and we show that it is driven to a large extent by the gain and loss of cognate recognition sequences for the given factor. We find only a weak correlation between binding variation and regulatory function. These data provide the first genome-wide picture of how modest levels of sequence divergence between highly morphologically similar species affect a system of coordinately acting transcription factors during animal development, and highlight the dominant role of quantitative variation in transcription factor binding over short evolutionary distances.", "date": "2010-03", "date_type": "published", "publication": "PLoS Biology", "volume": "8", "number": "3", "publisher": "Public Library of Science", "pagerange": "Art. No. 
e1000343", "id_number": "CaltechAUTHORS:20170306-125457377", "issn": "1545-7885", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170306-125457377", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "Howard Hughes Medical Institute (HHMI)" }, { "agency": "NIH", "grant_number": "GM704403" }, { "agency": "NIH", "grant_number": "HG002779" }, { "agency": "Department of Energy (DOE)", "grant_number": "DE-AC02-05CH11231" } ] }, "doi": "10.1371/journal.pbio.1000343", "pmcid": "PMC2843597", "primary_object": { "basename": "144267.zip", "url": "https://authors.library.caltech.edu/records/y0yeq-t0p17/files/144267.zip" }, "related_objects": [ { "basename": "journal.pbio.1000343.PDF", "url": "https://authors.library.caltech.edu/records/y0yeq-t0p17/files/journal.pbio.1000343.PDF" } ], "resource_type": "article", "pub_year": "2010", "author_list": "Bradley, Robert K.; Li, Xiao-Yong; et el." 
}, { "id": "https://authors.library.caltech.edu/records/2jf89-zme75", "eprint_id": 74794, "eprint_status": "archive", "datestamp": "2023-08-19 01:23:32", "lastmod": "2023-10-24 23:17:53", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Hilty-Markus", "name": { "family": "Hilty", "given": "Markus" } }, { "id": "Burke-Conor", "name": { "family": "Burke", "given": "Conor" } }, { "id": "Pedro-Helder-A-M", "name": { "family": "Pedro", "given": "Helder" } }, { "id": "Cardenas-Paul", "name": { "family": "Cardenas", "given": "Paul" } }, { "id": "Bush-Andy", "name": { "family": "Bush", "given": "Andy" } }, { "id": "Bossley-Cara", "name": { "family": "Bossley", "given": "Cara" } }, { "id": "Davies-Jane", "name": { "family": "Davies", "given": "Jane" }, "orcid": "0000-0002-4108-4357" }, { "id": "Ervine-Aaron", "name": { "family": "Ervine", "given": "Aaron" } }, { "id": "Poulter-Len", "name": { "family": "Poulter", "given": "Len" } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" }, { "id": "Moffatt-Miriam-F", "name": { "family": "Moffatt", "given": "Miriam F." } }, { "id": "Cookson-William-O-C", "name": { "family": "Cookson", "given": "William O. C." } } ] }, "title": "Disordered Microbial Communities in Asthmatic Airways", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2010 Hilty et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. \n\nReceived: October 8, 2009; Accepted: December 12, 2009; Published: January 5, 2010. \n\nThe study was funded by the GABRIEL FP6 Integrated Project of the European Commission and the Wellcome Trust. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. 
\n\nWe thank Jan Henson at the National Jewish Hospital in Denver for her encouragement to pursue this study. \n\nAuthor Contributions: Conceived and designed the experiments: MH CB LP MFM WOCC. Performed the experiments: MH CB PC CB JCD AE LP. Analyzed the data: MH HP PC LP MFM WOCC. Contributed reagents/materials/analysis tools: CB AB CB JCD. Wrote the paper: MH MFM WOCC. \n\nSequences are stored as GenBank accession GQ360090-GQ365143. \n\nThe authors have declared that no competing interests exist.\n\nPublished - journal.pone.0008578.PDF
", "abstract": "Background: A rich microbial environment in infancy protects against asthma [1], [2] and infections precipitate asthma exacerbations [3]. We compared the airway microbiota at three levels in adult patients with asthma, the related condition of COPD, and controls. We also studied bronchial lavage from asthmatic children and controls. \n\nPrincipal Findings: We identified 5,054 16S rRNA bacterial sequences from 43 subjects, detecting >70% of species present. The bronchial tree was not sterile, and contained a mean of 2,000 bacterial genomes per cm2 surface sampled. Pathogenic Proteobacteria, particularly Haemophilus spp., were much more frequent in bronchi of adult asthmatics or patients with COPD than controls. We found similar highly significant increases in Proteobacteria in asthmatic children. Conversely, Bacteroidetes, particularly Prevotella spp., were more frequent in controls than adult or child asthmatics or COPD patients. \n\nSignificance: The results show the bronchial tree to contain a characteristic microbiota, and suggest that this microbiota is disturbed in asthmatic airways.", "date": "2010-01-05", "date_type": "published", "publication": "PLOS ONE", "volume": "5", "number": "1", "publisher": "Public Library of Science", "pagerange": "Art. No. 
e8578", "id_number": "CaltechAUTHORS:20170306-131319458", "issn": "1932-6203", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170306-131319458", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "European Commission", "grant_number": "GABRIEL" }, { "agency": "Wellcome Trust" } ] }, "doi": "10.1371/journal.pone.0008578", "pmcid": "PMC2798952", "primary_object": { "basename": "journal.pone.0008578.PDF", "url": "https://authors.library.caltech.edu/records/2jf89-zme75/files/journal.pone.0008578.PDF" }, "resource_type": "article", "pub_year": "2010", "author_list": "Hilty, Markus; Burke, Conor; et el." }, { "id": "https://authors.library.caltech.edu/records/fbdkt-n0b22", "eprint_id": 74799, "eprint_status": "archive", "datestamp": "2023-08-20 02:17:06", "lastmod": "2023-10-24 23:18:09", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Morton-J", "name": { "family": "Morton", "given": "Jason" } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" }, { "id": "Shiu-Anne", "name": { "family": "Shiu", "given": "Anne" } }, { "id": "Sturmfels-B", "name": { "family": "Sturmfels", "given": "Bernd" } }, { "id": "Wienand-O", "name": { "family": "Wienand", "given": "Oliver" } } ] }, "title": "Convex Rank Tests and Semigraphoids", "ispublished": "pub", "full_text_status": "public", "keywords": "braid arrangement, graphical model, permutohedron, polyhedral fan, rank test, semigraphoid, submodular function, symmetric group", "note": "\u00a9 2009 Society for Industrial and Applied Mathematics. \n\nSubmitted: 16 February 2008. Accepted: 22 January 2009. Published online: 10 July 2009. \n\nThis paper extends the note Geometry of Rank Tests, in Proceedings of the Probabilistic Graphical Models (PGM 3) conference, Prague, 2006. 
\n\nResearch for this author [JM] was supported as part of the DARPA Program Fundamental Laws of Biology. \n\nResearch for the second and fourth authors was supported as part of the DARPA Program Fundamental Laws of Biology. Research for the third author was supported by a Lucent Technologies Bell Labs Graduate Research Fellowship. \n\nResearch for this author [OW] was supported by the Wipprecht Foundation. \n\nOur research on rank tests originated in discussions with Olivier Pourqui\u00e9 and Mary-Lee Dequ\u00e9ant as part of the DARPA Program Fundamental Laws of Biology. We thank Milan Studen\u00fd and Franti\u0161ek Mat\u00fa\u0161 for helpful comments.\n\nPublished - 080715822.pdf
Submitted - 0702564.pdf
", "abstract": "Convex rank tests are partitions of the symmetric group which have desirable geometric properties. The statistical tests defined by such partitions involve counting all permutations in the equivalence classes. Each class consists of the linear extensions of a partially ordered set specified by data. Our methods refine existing rank tests of nonparametric statistics, such as the sign test and the runs test, and are useful for exploratory analysis of ordinal data. We establish a bijection between convex rank tests and probabilistic conditional independence structures known as semigraphoids. The subclass of submodular rank tests is derived from faces of the cone of submodular functions or from Minkowski summands of the permutohedron. We enumerate all small instances of such rank tests. Of particular interest are graphical tests, which correspond to both graphical models and to graph associahedra.", "date": "2009-07-10", "date_type": "published", "publication": "SIAM Journal on Discrete Mathematics", "volume": "23", "number": "3", "publisher": "Society for Industrial and Applied Mathematics", "pagerange": "1117-1134", "id_number": "CaltechAUTHORS:20170306-133921221", "issn": "0895-4801", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170306-133921221", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "Defense Advanced Research Projects Agency (DARPA)" }, { "agency": "Lucent Technologies Bell Labs" }, { "agency": "Wipprecht Foundation" } ] }, "doi": "10.1137/080715822", "primary_object": { "basename": "080715822.pdf", "url": "https://authors.library.caltech.edu/records/fbdkt-n0b22/files/080715822.pdf" }, "related_objects": [ { "basename": "0702564.pdf", "url": "https://authors.library.caltech.edu/records/fbdkt-n0b22/files/0702564.pdf" } ], "resource_type": "article", "pub_year": "2009", "author_list": "Morton, Jason; Pachter, Lior; et el." 
}, { "id": "https://authors.library.caltech.edu/records/6d8hx-6w537", "eprint_id": 74802, "eprint_status": "archive", "datestamp": "2023-08-20 01:38:58", "lastmod": "2023-10-24 23:18:20", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Trapnell-C", "name": { "family": "Trapnell", "given": "Cole" } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" }, { "id": "Salzberg-S-L", "name": { "family": "Salzberg", "given": "Steven L." } } ] }, "title": "TopHat: discovering splice junctions with RNA-Seq", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2009 The Author(s). This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited. \n\nWe thank Adam Phillippy, Geo Pertea, Ben Langmead, Kasper Hansen, Angela Brooks and Ali Mortazavi for helpful technical discussions. We thank Diane Trout, Ali Mortazavi, Brian Williams, Kenneth McCue, Lorian Schaeffer and Barbara Wold for making their data available for our case study. \n\nFunding: National Institutes of Health (R01-LM06845, R01-GM083873 to S.L.S.); National Science Foundation (CCF 0347992 to L.P.). \n\nConflict of Interest: none declared.\n\nPublished - btp120.pdf
", "abstract": "Motivation: A new protocol for sequencing the messenger RNA in a cell, known as RNA-Seq, generates millions of short sequence fragments in a single run. These fragments, or 'reads', can be used to measure levels of gene expression and to identify novel splice variants of genes. However, current software for aligning RNA-Seq data to a genome relies on known splice junctions and cannot identify novel ones. TopHat is an efficient read-mapping algorithm designed to align reads from an RNA-Seq experiment to a reference genome without relying on known splice sites. \n\nResults: We mapped the RNA-Seq reads from a recent mammalian RNA-Seq experiment and recovered more than 72% of the splice junctions reported by the annotation-based software from that study, along with nearly 20 000 previously unreported junctions. The TopHat pipeline is much faster than previous systems, mapping nearly 2.2 million reads per CPU hour, which is sufficient to process an entire RNA-Seq experiment in less than a day on a standard desktop computer. We describe several challenges unique to ab initio splice site discovery from RNA-Seq reads that will require further algorithm development. 
\n\nAvailability: TopHat is free, open-source software available from http://tophat.cbcb.umd.edu", "date": "2009-05-01", "date_type": "published", "publication": "Bioinformatics", "volume": "25", "number": "9", "publisher": "Oxford University Press", "pagerange": "1105-1111", "id_number": "CaltechAUTHORS:20170306-141357019", "issn": "1367-4803", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170306-141357019", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NIH", "grant_number": "R01-LM06845" }, { "agency": "NIH", "grant_number": "R01-GM083873" }, { "agency": "NSF", "grant_number": "CCF-0347992" } ] }, "doi": "10.1093/bioinformatics/btp120", "pmcid": "PMC2672628", "primary_object": { "basename": "btp120.pdf", "url": "https://authors.library.caltech.edu/records/6d8hx-6w537/files/btp120.pdf" }, "resource_type": "article", "pub_year": "2009", "author_list": "Trapnell, Cole; Pachter, Lior; et el." }, { "id": "https://authors.library.caltech.edu/records/rbmj2-r9288", "eprint_id": 74800, "eprint_status": "archive", "datestamp": "2023-08-20 01:36:30", "lastmod": "2023-10-24 23:18:12", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Bradley-R-K", "name": { "family": "Bradley", "given": "Robert K." 
} }, { "id": "Roberts-A", "name": { "family": "Roberts", "given": "Adam" } }, { "id": "Smoot-M", "name": { "family": "Smoot", "given": "Michael" } }, { "id": "Juvekar-S", "name": { "family": "Juvekar", "given": "Sudeep" } }, { "id": "Do-Jaeyoung", "name": { "family": "Do", "given": "Jaeyoung" } }, { "id": "Dewey-C-N", "name": { "family": "Dewey", "given": "Colin" } }, { "id": "Holmes-I", "name": { "family": "Holmes", "given": "Ian" } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "Fast Statistical Alignment", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2009 Bradley et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. \n\nReceived: October 22, 2008; Accepted: April 20, 2009; Published: May 29, 2009. \n\nIan Holmes was partially supported by NIH/NHGRI grant 1R01GM076705. Lior Pachter was partially supported by NSF CAREER award CCF 03-47992. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. \n\nFSA borrows heavily from previous work, both in its code base and its intellectual foundations. \n\nIdeas. The distance-based approach to multiple alignment was proposed in [13],[14]. This included the idea of modifying the accuracy criterion suggested [53] and [54] to include gaps and the demonstration that the resulting modified expected accuracy could be used to control the expected sensitivity and specificity. Furthermore, [13],[14] introduced the sequence annealing approach to building multiple alignments, via the description of alignments using partially ordered sets [31],[55],[56]. The graph-based approach to alignment was formalized by [57] and these results were used in the DIALIGN program [58]. 
\n\nThe query-specific learning method for re-estimating alignment parameters on the fly was inspired by [15] and conversations with Joseph Bradley about query-specific structure learning of graphical models. \n\nThe iterative refinement technique is based on ideas in [59]. \n\nThe FSA algorithm was parallelized using a modification of the approach in MW [60]. \n\nThe coloring in the GUI according to posterior probabilities of alignment is inspired by the AU viewer of BAli-Phy [9]. \n\nSoftware. The sequence annealing implementation in FSA is based on Ariel Schwartz's AMAP program [14], which implements the Pearce-Kelly algorithm [61]. \n\nFSA's query-specific learning uses code created with Gerton Lunter's HMMoC compiler for HMMs [15]. The \"aligner\" example distributed with HMMoC, which implements a learning procedure for gap parameters, was an inspiration for FSA's learning strategies. FSA's banding code is taken directly from the \"aligner\" example. \n\nFSA's sequence and alignment representation code was inspired by similar code in the dart library [62]. Several Perl packages distributed with FSA are derived from packages of the same name in dart. \n\nAuthor Contributions: Wrote the paper: RKB CD LP. Led the development of FSA, wrote most of the code base, and developed the query-specific learning method: RKB. Redesigned the sequence annealing algorithm, constituted the core development team, and managed the project: RKB CD LP. Developed the GUI: AR. Developed a preliminary version of the GUI: MS. Developed the iterative refinement technique: SJ. Developed the parallelization and database modes: JD CD. Provided advice on the dart library, including its algorithms, programming and software components: IH. Created the FSA webserver: LP. \n\nThe authors have declared that no competing interests exist.\n\nPublished - journal.pcbi.1000392.PDF
Supplemental Material - Text_S1.pdf
", "abstract": "We describe a new program for the alignment of multiple biological sequences that is both statistically motivated and fast enough for problem sizes that arise in practice. Our Fast Statistical Alignment program is based on pair hidden Markov models which approximate an insertion/deletion process on a tree and uses a sequence annealing algorithm to combine the posterior probabilities estimated from these models into a multiple alignment. FSA uses its explicit statistical model to produce multiple alignments which are accompanied by estimates of the alignment accuracy and uncertainty for every column and character of the alignment\u2014previously available only with alignment programs which use computationally-expensive Markov Chain Monte Carlo approaches\u2014yet can align thousands of long sequences. Moreover, FSA utilizes an unsupervised query-specific learning procedure for parameter estimation which leads to improved accuracy on benchmark reference alignments in comparison to existing programs. The centroid alignment approach taken by FSA, in combination with its learning procedure, drastically reduces the amount of false-positive alignment on biological data in comparison to that given by other methods. The FSA program and a companion visualization tool for exploring uncertainty in alignments can be used via a web interface at http://orangutan.math.berkeley.edu/fsa/, and the source code is available at http://fsa.sourceforge.net/.", "date": "2009-05", "date_type": "published", "publication": "PLOS Computational Biology", "volume": "5", "number": "5", "publisher": "Public Library of Science", "pagerange": "Art. No. 
e1000392", "id_number": "CaltechAUTHORS:20170306-135830452", "issn": "1553-7358", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170306-135830452", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NIH", "grant_number": "1R01GM076705" }, { "agency": "NSF", "grant_number": "CCF-0347992" } ] }, "doi": "10.1371/journal.pcbi.1000392", "pmcid": "PMC2684580", "primary_object": { "basename": "Text_S1.pdf", "url": "https://authors.library.caltech.edu/records/rbmj2-r9288/files/Text_S1.pdf" }, "related_objects": [ { "basename": "journal.pcbi.1000392.PDF", "url": "https://authors.library.caltech.edu/records/rbmj2-r9288/files/journal.pcbi.1000392.PDF" } ], "resource_type": "article", "pub_year": "2009", "author_list": "Bradley, Robert K.; Roberts, Adam; et el." }, { "id": "https://authors.library.caltech.edu/records/8c606-67x34", "eprint_id": 74834, "eprint_status": "archive", "datestamp": "2023-08-21 21:22:03", "lastmod": "2023-10-24 23:20:53", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Mihaescu-R", "name": { "family": "Mihaescu", "given": "Radu" } }, { "id": "Levy-D", "name": { "family": "Levy", "given": "Dan" } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "Why neighbor-joining works", "ispublished": "pub", "full_text_status": "public", "keywords": "Distance methods; Edge radius; Neighbor-joining; Quartets", "note": "\u00a9 2007 Springer Science+Business Media, LLC. \n\nReceived: 25 December 2006 / Accepted: 15 October 2007 / Published online: 4 December 2007. \n\nRadu Mihaescu was supported by a National Science Foundation graduate fellowship, and partially by the Fannie and John Hertz Foundation. Lior Pachter was partially supported by NIH grant R01HG2362 and NSF grant CCF0347992. 
Dan Levy was supported by NIH grant GM68423.\n\nSubmitted - 0602041.pdf
", "abstract": "We show that the neighbor-joining algorithm is a robust quartet method for constructing trees from distances. This leads to a new performance guarantee that contains Atteson's optimal radius bound as a special case and explains many cases where neighbor-joining is successful even when Atteson's criterion is not satisfied. We also provide a proof for Atteson's conjecture on the optimal edge radius of the neighbor-joining algorithm. The strong performance guarantees we provide also hold for the quadratic time fast neighbor-joining algorithm, thus providing a theoretical basis for inferring very large phylogenies with neighbor-joining.", "date": "2009-05", "date_type": "published", "publication": "Algorithmica", "volume": "54", "number": "1", "publisher": "Springer", "pagerange": "1-24", "id_number": "CaltechAUTHORS:20170307-094400293", "issn": "0178-4617", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170307-094400293", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NSF Graduate Research Fellowship" }, { "agency": "Fannie and John Hertz Foundation" }, { "agency": "NIH", "grant_number": "R01HG2362" }, { "agency": "NSF", "grant_number": "CCF-0347992" }, { "agency": "NIH", "grant_number": "GM68423" } ] }, "doi": "10.1007/s00453-007-9116-4", "primary_object": { "basename": "0602041.pdf", "url": "https://authors.library.caltech.edu/records/8c606-67x34/files/0602041.pdf" }, "resource_type": "article", "pub_year": "2009", "author_list": "Mihaescu, Radu; Levy, Dan; et el." }, { "id": "https://authors.library.caltech.edu/records/64b05-1m385", "eprint_id": 74844, "eprint_status": "archive", "datestamp": "2023-08-20 00:08:21", "lastmod": "2023-10-24 23:21:22", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Bradley-R-K", "name": { "family": "Bradley", "given": "Robert K." 
} }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" }, { "id": "Holmes-I", "name": { "family": "Holmes", "given": "Ian" } } ] }, "title": "Specific alignment of structured RNA: stochastic grammars and sequence annealing", "ispublished": "pub", "full_text_status": "restricted", "note": "\u00a9 The Author 2008. Published by Oxford University Press. \n\nReceived on June 20, 2008; revised and accepted on September 15, 2008. Advance Access publication September 16, 2008. \n\nWe thank Lars Barquist for computer support and Ariel Schwartz for the original development and implementation of the sequence annealing technique. \n\nFunding: NIH/NHGRI (grant 1R01GM076705); NSF CAREER award (CCF 03-47992 to L.P.); NSF Graduate Research Fellowship (to R.K.B.). \n\nConflict of Interest: none declared.", "abstract": "Motivation: Whole-genome screens suggest that eukaryotic genomes are dense with non-coding RNAs (ncRNAs). We introduce a novel approach to RNA multiple alignment which couples a generative probabilistic model of sequence and structure with an efficient sequence annealing approach for exploring the space of multiple alignments. This leads to a new software program, Stemloc-AMA, that is both accurate and specific in the alignment of multiple related RNA sequences. \n\nResults: When tested on the benchmark datasets BRalibase II and BRalibase 2.1, Stemloc-AMA has comparable sensitivity to and better specificity than the best competing methods. We use a large-scale random sequence experiment to show that while most alignment programs maximize sensitivity at the expense of specificity, even to the point of giving complete alignments of non-homologous sequences, Stemloc-AMA aligns only sequences with detectable homology and leaves unrelated sequences largely unaligned. 
Such accurate and specific alignments are crucial for comparative-genomics analysis, from inferring phylogeny to estimating substitution rates across different lineages. \n\nAvailability: Stemloc-AMA is available from http://biowiki.org/StemLocAMA as part of the dart software package for sequence analysis.", "date": "2008-12-01", "date_type": "published", "publication": "Bioinformatics", "volume": "24", "number": "23", "publisher": "Oxford University Press", "pagerange": "2677-2683", "id_number": "CaltechAUTHORS:20170307-103825678", "issn": "1367-4803", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170307-103825678", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NIH", "grant_number": "1R01GM076705" }, { "agency": "NSF", "grant_number": "CCF-0347992" }, { "agency": "NSF Graduate Research Fellowship" } ] }, "doi": "10.1093/bioinformatics/btn495", "pmcid": "PMC2732270", "resource_type": "article", "pub_year": "2008", "author_list": "Bradley, Robert K.; Pachter, Lior; et el." }, { "id": "https://authors.library.caltech.edu/records/newbg-mqk69", "eprint_id": 74805, "eprint_status": "archive", "datestamp": "2023-08-19 23:37:49", "lastmod": "2023-10-24 23:18:30", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Mihaescu-R", "name": { "family": "Mihaescu", "given": "Radu" } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "Combinatorics of least squares trees", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2008 National Academy of Sciences. \n\nEdited by Peter J. Bickel, University of California, Berkeley, CA, and approved May 21, 2008 (received for review March 3, 2007) \n\nR.M. was supported by a National Science Foundation (NSF) Graduate Fellowship and partially by the Fannie and John Hertz Foundation. \n\nAuthor contributions: R.M. 
and L.P. designed research, performed research, and wrote the paper. \n\nThe authors declare no conflict of interest. \n\nThis article is a PNAS Direct Submission.\n\nPublished - 25464022.pdf
Submitted - 0802.2395.pdf
", "abstract": "A recurring theme in the least squares approach to phylogenetics has been the discovery of elegant combinatorial formulas for the least squares estimates of edge lengths. These formulas have proved useful for the development of efficient algorithms, and have also been important for understanding connections among popular phylogeny algorithms. For example, the selection criterion of the neighbor-joining algorithm is now understood in terms of the combinatorial formulas of Pauplin for estimating tree length. We highlight a phylogenetically desirable property that weighted least squares methods should satisfy, and provide a complete characterization of methods that satisfy the property. The necessary and sufficient condition is a multiplicative four point condition that the the variance matrix needs to satisfy. The proof is based on the observation that the Lagrange multipliers in the proof of the Gauss\u2013Markov theorem are tree-additive. Our results generalize and complete previous work on ordinary least squares, balanced minimum evolution and the taxon weighted variance model. 
They also provide a time optimal algorithm for computation.", "date": "2008-09-09", "date_type": "published", "publication": "Proceedings of the National Academy of Sciences of the United States of America", "volume": "105", "number": "36", "publisher": "National Academy of Sciences", "pagerange": "13206-13211", "id_number": "CaltechAUTHORS:20170306-144249240", "issn": "0027-8424", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170306-144249240", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NSF Graduate Research Fellowship" }, { "agency": "Fannie and John Hertz Foundation" } ] }, "doi": "10.1073/pnas.0802089105", "pmcid": "PMC2533170", "primary_object": { "basename": "0802.2395.pdf", "url": "https://authors.library.caltech.edu/records/newbg-mqk69/files/0802.2395.pdf" }, "related_objects": [ { "basename": "25464022.pdf", "url": "https://authors.library.caltech.edu/records/newbg-mqk69/files/25464022.pdf" } ], "resource_type": "article", "pub_year": "2008", "author_list": "Mihaescu, Radu and Pachter, Lior" }, { "id": "https://authors.library.caltech.edu/records/xmecw-c0d24", "eprint_id": 74845, "eprint_status": "archive", "datestamp": "2023-08-19 23:27:21", "lastmod": "2023-10-24 23:21:27", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Dequ\u00e9ant-M-L", "name": { "family": "Dequ\u00e9ant", "given": "Mary-Lee" } }, { "id": "Ahnert-S", "name": { "family": "Ahnert", "given": "Sebastian" } }, { "id": "Edelsbrunner-H", "name": { "family": "Edelsbrunner", "given": "Herbert" } }, { "id": "Fink-T-M-A", "name": { "family": "Fink", "given": "Thomas M. A." } }, { "id": "Glynn-E-F", "name": { "family": "Glynn", "given": "Earl F." 
} }, { "id": "Hattem-G", "name": { "family": "Hattem", "given": "Gaye" } }, { "id": "Kudlicki-A", "name": { "family": "Kudlicki", "given": "Andrzej" } }, { "id": "Mileyko-Y", "name": { "family": "Mileyko", "given": "Yuriy" } }, { "id": "Morton-J", "name": { "family": "Morton", "given": "Jason" } }, { "id": "Mushegian-A-R", "name": { "family": "Mushegian", "given": "Arcady R." } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" }, { "id": "Rowicka-M", "name": { "family": "Rowicka", "given": "Maga" } }, { "id": "Shiu-Anne", "name": { "family": "Shiu", "given": "Anne" } }, { "id": "Sturmfels-B", "name": { "family": "Sturmfels", "given": "Bernd" } }, { "id": "Pourqui\u00e9-O", "name": { "family": "Pourqui\u00e9", "given": "Olivier" } } ] }, "title": "Comparison of Pattern Detection Methods in Microarray Time Series of the Segmentation Clock", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2008 Dequ\u00e9ant et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. \n\nReceived: December 19, 2007; Accepted: April 26, 2008; Published: August 6, 2008. \n\nThis research was partially supported by DARPA grant HR 0011-05-1-0057. HE and YM mathematical work was supported by DARPA grant HR0011-05-1-0007. AS research was supported by a Lucent Technologies Bell Labs Graduate Research Fellowship; AK and MR research was supported by NIH grant GM U54 GM74942; and SA research was supported by Association pour la Recherche sur le Cancer (ARC), France. OP, AM, MLD, EG and GH research was supported by the Stowers Institute for Medical Research. OP is a Howard Hughes Medical Institute Investigator. \n\nThe authors thank Z. Otwinowski for helpful discussions, J. Chatfield for editorial assistance and S. Esteban for artwork. 
\n\nAuthor Contributions: Performed the experiments: MLD. Analyzed the data: MLD. Wrote the paper: MLD HE TMAF GH AK AM MR OP. Prepared the microarray data: MD. Conceived, designed and implemented the algorithm of the Address reduction method: SA TMAF. Conceived and designed the algorithm of the Stable persistence method: HE YM. Implemented the algorithm of the Lomb Scargle method: EFG. Implemented the automated PubMed Search: GH. Conceived, designed and implemented the algorithm of the Phase consistency method: AK MR. Conceived, designed and implemented the algorithm of the Cyclohedron test method: JM AS. Conceived and designed the algorithm of the Cyclohedron test method: LP BS. \n\nCompeting interests: Thomas M.A. Fink and Sebastian Ahnert have filed U.S. patent 20070086635, Method of identifying pattern in a series of data.\n\nPublished - journal.pone.0002856.PDF
Supplemental Material - 149911.zip
", "abstract": "While genome-wide gene expression data are generated at an increasing rate, the repertoire of approaches for pattern discovery in these data is still limited. Identifying subtle patterns of interest in large amounts of data (tens of thousands of profiles) associated with a certain level of noise remains a challenge. A microarray time series was recently generated to study the transcriptional program of the mouse segmentation clock, a biological oscillator associated with the periodic formation of the segments of the body axis. A method related to Fourier analysis, the Lomb-Scargle periodogram, was used to detect periodic profiles in the dataset, leading to the identification of a novel set of cyclic genes associated with the segmentation clock. Here, we applied to the same microarray time series dataset four distinct mathematical methods to identify significant patterns in gene expression profiles. These methods are called: Phase consistency, Address reduction, Cyclohedron test and Stable persistence, and are based on different conceptual frameworks that are either hypothesis- or data-driven. Some of the methods, unlike Fourier transforms, are not dependent on the assumption of periodicity of the pattern of interest. Remarkably, these methods identified blindly the expression profiles of known cyclic genes as the most significant patterns in the dataset. Many candidate genes predicted by more than one approach appeared to be true positive cyclic genes and will be of particular interest for future research. In addition, these methods predicted novel candidate cyclic genes that were consistent with previous biological knowledge and experimental validation in mouse embryos. 
Our results demonstrate the utility of these novel pattern detection strategies, notably for detection of periodic profiles, and suggest that combining several distinct mathematical approaches to analyze microarray datasets is a valuable strategy for identifying genes that exhibit novel, interesting transcriptional patterns.", "date": "2008-08-06", "date_type": "published", "publication": "PLOS ONE", "volume": "3", "number": "8", "publisher": "Public Library of Science", "pagerange": "Art. No. e2856", "id_number": "CaltechAUTHORS:20170307-104539290", "issn": "1932-6203", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170307-104539290", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "Defense Advanced Research Projects Agency (DARPA)", "grant_number": "HR0011-05-1-0057" }, { "agency": "Defense Advanced Research Projects Agency (DARPA)", "grant_number": "HR0011-05-1-0007" }, { "agency": "Lucent Technologies Bell Labs" }, { "agency": "NIH", "grant_number": "U54 GM7494" }, { "agency": "Association pour la Recherche sur le Cancer" }, { "agency": "Stowers Institute for Medical Research" }, { "agency": "Howard Hughes Medical Institute (HHMI)" } ] }, "doi": "10.1371/journal.pone.0002856", "pmcid": "PMC2481401", "primary_object": { "basename": "149911.zip", "url": "https://authors.library.caltech.edu/records/xmecw-c0d24/files/149911.zip" }, "related_objects": [ { "basename": "journal.pone.0002856.PDF", "url": "https://authors.library.caltech.edu/records/xmecw-c0d24/files/journal.pone.0002856.PDF" } ], "resource_type": "article", "pub_year": "2008", "author_list": "Dequ\u00e9ant, Mary-Lee; Ahnert, Sebastian; et el." 
}, { "id": "https://authors.library.caltech.edu/records/pc0jh-45q03", "eprint_id": 74847, "eprint_status": "archive", "datestamp": "2023-08-19 22:43:25", "lastmod": "2023-10-24 23:21:30", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Satija-R", "name": { "family": "Satija", "given": "Rahul" } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" }, { "id": "Hein-J", "name": { "family": "Hein", "given": "Jotun" } } ] }, "title": "Combining statistical alignment and phylogenetic footprinting to detect regulatory elements", "ispublished": "pub", "full_text_status": "restricted", "note": "\u00a9 The Author 2008. Published by Oxford University Press. \n\nReceived on January 21, 2008; revised on February 21, 2008; accepted on March 17, 2008. Advance Access publication March 18, 2008. \n\nWe thank Istv\u00e1n Mikl\u00f3s, Rune Lyngs\u00f8 and Gerton Lunter for helpful discussion. R.S. is funded by the Rhodes Trust, UK. \n\nConflict of Interest: none declared.", "abstract": "Motivation: Traditional alignment-based phylogenetic footprinting approaches make predictions on the basis of a single assumed alignment. The predictions are therefore highly sensitive to alignment errors or regions of alignment uncertainty. Alternatively, statistical alignment methods provide a framework for performing phylogenetic analyses by examining a distribution of alignments. \n\nResults: We developed a novel algorithm for predicting functional elements by combining statistical alignment and phylogenetic footprinting (SAPF). SAPF simultaneously performs both alignment and annotation by combining phylogenetic footprinting techniques with a hidden Markov model (HMM) transducer-based multiple alignment model, and can analyze sequence data from multiple sequences. 
We assessed SAPF's predictive performance on two simulated datasets and three well-annotated cis-regulatory modules from newly sequenced Drosophila genomes. The results demonstrate that removing the traditional dependence on a single alignment can significantly augment the predictive performance, especially when there is uncertainty in the alignment of functional regions. \n\nAvailability: SAPF is freely available to download online at http://www.stats.ox.ac.uk/~satija/SAPF/", "date": "2008-05-15", "date_type": "published", "publication": "Bioinformatics", "volume": "24", "number": "10", "publisher": "Oxford University Press", "pagerange": "1236-1242", "id_number": "CaltechAUTHORS:20170307-111036850", "issn": "1367-4803", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170307-111036850", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "Rhodes Trust" } ] }, "doi": "10.1093/bioinformatics/btn104", "resource_type": "article", "pub_year": "2008", "author_list": "Satija, Rahul; Pachter, Lior; et el." 
}, { "id": "https://authors.library.caltech.edu/records/096a7-06d87", "eprint_id": 74797, "eprint_status": "archive", "datestamp": "2023-08-19 22:38:33", "lastmod": "2023-10-24 23:18:04", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Tesler-G", "name": { "family": "Tesler", "given": "Glenn" } }, { "id": "Eriksson-N", "name": { "family": "Eriksson", "given": "Nicholas" } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" }, { "id": "Mitsuya-Yumi", "name": { "family": "Mitsuya", "given": "Yumi" } }, { "id": "Rhee-Soo-Yon", "name": { "family": "Rhee", "given": "Soo-Yon" } }, { "id": "Wang-Chunlin", "name": { "family": "Wang", "given": "Chunlin" } }, { "id": "Gharizadeh-B", "name": { "family": "Gharizadeh", "given": "Baback" } }, { "id": "Ronaghi-M", "name": { "family": "Ronaghi", "given": "Mostafa" } }, { "id": "Shafer-R-W", "name": { "family": "Shafer", "given": "Robert W." } }, { "id": "Beerenwinkel-N", "name": { "family": "Beerenwinkel", "given": "Niko" } } ] }, "title": "Viral Population Estimation Using Pyrosequencing", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2008 Eriksson et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. \n\nReceived: July 2, 2007; Accepted: March 27, 2008; Published: May 9, 2008. \n\nN. Eriksson and L. Pachter were partially supported by the NSF (grants DMS-0603448 and CCF-0347992, respectively). N. Beerenwinkel was funded by a grant from the Bill and Melinda Gates Foundation through the Grand Challenges in Global Health Initiative. The NSF has played no role in any part of this work. \n\nThe authors have declared that no competing interests exist. \n\nAuthor Contributions. Performed the experiments: YM SR CW BG MR RS. 
Analyzed the data: NE LP NB. Wrote the paper: NE LP NB\n\nPublished - journal.pcbi.1000074_1_.PDF
Submitted - 0707.0114.pdf
Supplemental Material - Figure_S3_1_.pdf
Supplemental Material - Figure_S4.pdf
Supplemental Material - Figure_Supp2.pdf
Supplemental Material - upp.1.pdf
", "abstract": "The diversity of virus populations within single infected hosts presents a major difficulty for the natural immune response as well as for vaccine design and antiviral drug therapy. Recently developed pyrophosphate-based sequencing technologies (pyrosequencing) can be used for quantifying this diversity by ultra-deep sequencing of virus samples. We present computational methods for the analysis of such sequence data and apply these techniques to pyrosequencing data obtained from HIV populations within patients harboring drug-resistant virus strains. Our main result is the estimation of the population structure of the sample from the pyrosequencing reads. This inference is based on a statistical approach to error correction, followed by a combinatorial algorithm for constructing a minimal set of haplotypes that explain the data. Using this set of explaining haplotypes, we apply a statistical model to infer the frequencies of the haplotypes in the population via an expectation\u2013maximization (EM) algorithm. We demonstrate that pyrosequencing reads allow for effective population reconstruction by extensive simulations and by comparison to 165 sequences obtained directly from clonal sequencing of four independent, diverse HIV populations. Thus, pyrosequencing can be used for cost-effective estimation of the structure of virus populations, promising new insights into viral evolutionary dynamics and disease control strategies.", "date": "2008-05", "date_type": "published", "publication": "PLoS Computational Biology", "volume": "4", "number": "5", "publisher": "Public Library of Science", "pagerange": "Art. No. 
e1000074", "id_number": "CaltechAUTHORS:20170306-133352205", "issn": "1553-7358", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170306-133352205", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NSF", "grant_number": "DMS-0603448" }, { "agency": "NSF", "grant_number": "CCF-0347992" }, { "agency": "Bill and Melinda Gates Foundation" } ] }, "doi": "10.1371/journal.pcbi.1000074", "pmcid": "PMC2323617", "primary_object": { "basename": "journal.pcbi.1000074_1_.PDF", "url": "https://authors.library.caltech.edu/records/096a7-06d87/files/journal.pcbi.1000074_1_.PDF" }, "related_objects": [ { "basename": "upp.1.pdf", "url": "https://authors.library.caltech.edu/records/096a7-06d87/files/upp.1.pdf" }, { "basename": "0707.0114.pdf", "url": "https://authors.library.caltech.edu/records/096a7-06d87/files/0707.0114.pdf" }, { "basename": "Figure_S3_1_.pdf", "url": "https://authors.library.caltech.edu/records/096a7-06d87/files/Figure_S3_1_.pdf" }, { "basename": "Figure_S4.pdf", "url": "https://authors.library.caltech.edu/records/096a7-06d87/files/Figure_S4.pdf" }, { "basename": "Figure_Supp2.pdf", "url": "https://authors.library.caltech.edu/records/096a7-06d87/files/Figure_Supp2.pdf" } ], "resource_type": "article", "pub_year": "2008", "author_list": "Tesler, Glenn; Eriksson, Nicholas; et el." 
}, { "id": "https://authors.library.caltech.edu/records/mdrpa-46960", "eprint_id": 74804, "eprint_status": "archive", "datestamp": "2023-08-19 22:34:34", "lastmod": "2023-10-24 23:18:27", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Eickmeyer-K", "name": { "family": "Eickmeyer", "given": "Kord" } }, { "id": "Huggins-P", "name": { "family": "Huggins", "given": "Peter" } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" }, { "id": "Yoshida-Ruriko", "name": { "family": "Yoshida", "given": "Ruriko" } } ] }, "title": "On the optimality of the neighbor-joining algorithm", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 Eickmeyer et al; licensee BioMed Central Ltd. 2008. This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. \n\nReceived: 13 November 2007. Accepted: 30 April 2008. Published: 30 April 2008. \n\nThe authors declare that they have no competing interests.\n\nPublished - art_3A10.1186_2F1748-7188-3-5.pdf
Submitted - 0710.5142.pdf
Supplemental Material - 13015_2007_46_MOESM1_ESM.pdf
Supplemental Material - 13015_2007_46_MOESM2_ESM.pdf
Supplemental Material - 13015_2007_46_MOESM3_ESM.pdf
Supplemental Material - 13015_2007_46_MOESM4_ESM.pdf
", "abstract": "The popular neighbor-joining (NJ) algorithm used in phylogenetics is a greedy algorithm for finding the balanced minimum evolution (BME) tree associated to a dissimilarity map. From this point of view, NJ is \"optimal\" when the algorithm outputs the tree which minimizes the balanced minimum evolution criterion. We use the fact that the NJ tree topology and the BME tree topology are determined by polyhedral subdivisions of the spaces of dissimilarity maps \u211b^(^n _2)_+ to study the optimality of the neighbor-joining algorithm. In particular, we investigate and compare the polyhedral subdivisions for n \u2264 8. This requires the measurement of volumes of spherical polytopes in high dimension, which we obtain using a combination of Monte Carlo methods and polyhedral algorithms. Our results include a demonstration that highly unrelated trees can be co-optimal in BME reconstruction, and that NJ regions are not convex. We obtain the l_2 radius for neighbor-joining for n = 5 and we conjecture that the ability of the neighbor-joining algorithm to recover the BME tree depends on the diameter of the BME tree.", "date": "2008-04-30", "date_type": "published", "publication": "Algorithms for Molecular Biology", "volume": "3", "publisher": "BioMed Central", "pagerange": "Art. No. 
5", "id_number": "CaltechAUTHORS:20170306-143033087", "issn": "1748-7188", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170306-143033087", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "doi": "10.1186/1748-7188-3-5", "pmcid": "PMC2430562", "primary_object": { "basename": "0710.5142.pdf", "url": "https://authors.library.caltech.edu/records/mdrpa-46960/files/0710.5142.pdf" }, "related_objects": [ { "basename": "13015_2007_46_MOESM1_ESM.pdf", "url": "https://authors.library.caltech.edu/records/mdrpa-46960/files/13015_2007_46_MOESM1_ESM.pdf" }, { "basename": "13015_2007_46_MOESM2_ESM.pdf", "url": "https://authors.library.caltech.edu/records/mdrpa-46960/files/13015_2007_46_MOESM2_ESM.pdf" }, { "basename": "13015_2007_46_MOESM3_ESM.pdf", "url": "https://authors.library.caltech.edu/records/mdrpa-46960/files/13015_2007_46_MOESM3_ESM.pdf" }, { "basename": "13015_2007_46_MOESM4_ESM.pdf", "url": "https://authors.library.caltech.edu/records/mdrpa-46960/files/13015_2007_46_MOESM4_ESM.pdf" }, { "basename": "art_3A10.1186_2F1748-7188-3-5.pdf", "url": "https://authors.library.caltech.edu/records/mdrpa-46960/files/art_3A10.1186_2F1748-7188-3-5.pdf" } ], "resource_type": "article", "pub_year": "2008", "author_list": "Eickmeyer, Kord; Huggins, Peter; et el." }, { "id": "https://authors.library.caltech.edu/records/crp2q-p2j11", "eprint_id": 74849, "eprint_status": "archive", "datestamp": "2023-08-19 21:25:40", "lastmod": "2023-10-24 23:21:35", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Stark-A", "name": { "family": "Stark", "given": "Alexander" } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "Discovery of functional elements in 12 Drosophila genomes using evolutionary signatures", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2007 Macmillan Publishers Limited. 
\n\nReceived 21 July 2007; Accepted 4 October 2007. \n\nWe thank the National Human Genome Research Institute (NHGRI) for continued support. A.S. was supported in part by the Schering AG/Ernst Schering Foundation and in part by the Human Frontier Science Program Organization (HFSPO). P.K. was supported in part by a National Science Foundation Graduate Research Fellowship. J.S.P. thanks B. Raney and R. Baertsch, and the Danish Medical Research Council and the National Cancer Institute for support. J.B. thanks the Schering AG/Ernst Schering Foundation for a postdoctoral fellowship. L.Parts thanks J. Vilo. S.R. was supported by a HHMI-NIH/NIBIB Interfaces Training Grant and thanks T. Lane and M. Werner-Washburne. D.H., D.P.B., G.J.H. and T.C.K. are Investigators of the Howard Hughes Medical Institute, and B.P., J.G.R., E.H. and J.B. are affiliated with these investigators. J.W.C. and S.E.C. were supported by the NHGRI. M.K. was supported by start-up funds from the MIT Electrical Engineering and Computer Science Laboratory, the Broad Institute of MIT and Harvard, and the MIT Computer Science and Artificial Intelligence Laboratory, and by the Distinguished Alumnus (1964) Career Development Professorship. \n\nAuthor Contributions -- Organizing committee: Manolis Kellis, William Gelbart, Doug Smith, Andrew G. Clark, Michael E. Eisen, Thomas C. Kaufman; protein-coding gene prediction: Michael F. Lin, Ameya N. Deoras, Mira V. Han, Matthew W. Hahn, Donald G. Gilbert, Michael Weir, Michael Rice, Manolis Kellis; manual curation of protein-coding genes: Madeline A. Crosby, Harvard FlyBase curators, William M. Gelbart; validation of protein-coding genes: Joseph W. Carlson, Berkeley Drosophila Genome Project, Susan E. Celniker; non-coding RNA gene prediction: Jakob S. 
Pedersen, David Haussler, Yongkyu Park, Seung-Won Park, Manolis Kellis; microRNA gene prediction: Alexander Stark, Pouya Kheradpour, Leopold Parts, Manolis Kellis; microRNA cloning and sequencing: Julius Brennecke, Emily Hodges, Gregory J. Hannon; microRNA target prediction: Alexander Stark, J. Graham Ruby, Manolis Kellis, Eric C. Lai, David P. Bartel; motif identification: Alexander Stark, Pouya Kheradpour, Manolis Kellis; motif instance prediction: Alexander Stark, Pouya Kheradpour, Sushmita Roy, Morgan L. Maeder, Benjamin J. Polansky, Bryanne E. Robson, Deborah A. Eastman, Stein Aerts, Bassem Hassan, Jacques van Helden, Manolis Kellis; genome alignments: Angie S. Hinrichs, W. James Kent, Anat Caspi, Lior Pachter, Colin N. Dewey, Benedict Paten; phylogeny and branch length estimation: Matthew D. Rasmussen, Manolis Kellis; final manuscript preparation: Alexander Stark, Michael F. Lin, Pouya Kheradpour, Jakob Pedersen, Manolis Kellis. \n\nThe authors declare no competing financial interests.\n\nAccepted Version - nihms40032.pdf
Supplemental Material - nature06340-s1.pdf
", "abstract": "Sequencing of multiple related species followed by comparative genomics analysis constitutes a powerful approach for the systematic understanding of any genome. Here, we use the genomes of 12 Drosophila species for the de novo discovery of functional elements in the fly. Each type of functional element shows characteristic patterns of change, or 'evolutionary signatures', dictated by its precise selective constraints. Such signatures enable recognition of new protein-coding genes and exons, spurious and incorrect gene annotations, and numerous unusual gene structures, including abundant stop-codon readthrough. Similarly, we predict non-protein-coding RNA genes and structures, and new microRNA (miRNA) genes. We provide evidence of miRNA processing and functionality from both hairpin arms and both DNA strands. We identify several classes of pre- and post-transcriptional regulatory motifs, and predict individual motif instances with high confidence. We also study how discovery power scales with the divergence and number of species compared, and we provide general guidelines for comparative studies.", "date": "2007-11-08", "date_type": "published", "publication": "Nature", "volume": "450", "number": "7167", "publisher": "Nature Publishing Group", "pagerange": "219-232", "id_number": "CaltechAUTHORS:20170307-112926954", "issn": "0028-0836", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170307-112926954", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "National Human Genome Research Institute" }, { "agency": "Ernst Schering Foundation" }, { "agency": "Human Frontier Science Program" }, { "agency": "NSF Graduate Research Fellowship" }, { "agency": "Danish Medical Research Council" }, { "agency": "National Cancer Institute" }, { "agency": "Howard Hughes Medical Institute (HHMI)" }, { "agency": "National Institute of Biomedical Imaging and 
Bioengineering" }, { "agency": "Massachusetts Institute of Technology (MIT)" }, { "agency": "Broad Institute of MIT and Harvard" } ] }, "doi": "10.1038/nature06340", "pmcid": "PMC2474711", "primary_object": { "basename": "nihms40032.pdf", "url": "https://authors.library.caltech.edu/records/crp2q-p2j11/files/nihms40032.pdf" }, "related_objects": [ { "basename": "nature06340-s1.pdf", "url": "https://authors.library.caltech.edu/records/crp2q-p2j11/files/nature06340-s1.pdf" } ], "resource_type": "article", "pub_year": "2007", "author_list": "Stark, Alexander and Pachter, Lior" }, { "id": "https://authors.library.caltech.edu/records/04wc6-q2j80", "eprint_id": 74850, "eprint_status": "archive", "datestamp": "2023-08-19 21:25:51", "lastmod": "2023-10-24 23:21:40", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Clark-A-G", "name": { "family": "Clark", "given": "Andrew G." } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "Evolution of genes and genomes on the Drosophila phylogeny", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2007 Macmillan Publishers Limited. \n\nReceived 19 July 2007; Accepted 5 October 2007. \n\nAgencourt Bioscience Corporation, The Broad Institute of MIT and Harvard and the Washington University Genome Sequencing Center were supported by grants and contracts from the National Human Genome Research Institute (NHGRI). T.C. Kaufman acknowledges support from the Indian Genomics Initiative. \n\nAuthor Contributions The laboratory groups of A. G. Clark (including A. M. Larracuente, T. B. Sackton, and N. D. Singh) and Michael B. Eisen (including V. N. Iyer and D. A. Pollard) played the part of coordinating the primary writing and editing of the manuscript with the considerable help of D. R. Smith, C. M. Bergman, W. M. Gelbart, B. Oliver, T. A. Markow, T. C. Kaufman and M. Kellis. D. R. 
Smith served as primary coordinator for the assemblies. The remaining authors contributed either through their efforts in sequence production, assembly and annotation, or in the analysis of specific topics that served as the focus of more than 40 companion papers. \n\nThe author declares no competing financial interests.\n\nSupplemental Material - nature06341-s1.pdf
Supplemental Material - nature06341-s2.xls
", "abstract": "Comparative analysis of multiple genomes in a phylogenetic framework dramatically improves the precision and sensitivity of evolutionary inference, producing more robust results than single-genome analyses can provide. The genomes of 12 Drosophila species, ten of which are presented here for the first time (sechellia, simulans, yakuba, erecta, ananassae, persimilis, willistoni, mojavensis, virilis and grimshawi), illustrate how rates and patterns of sequence divergence across taxa can illuminate evolutionary processes on a genomic scale. These genome sequences augment the formidable genetic tools that have made Drosophila melanogaster a pre-eminent model for animal genetics, and will further catalyse fundamental research on mechanisms of development, cell biology, genetics, disease, neurobiology, behaviour, physiology and evolution. Despite remarkable similarities among these Drosophila species, we identified many putatively non-neutral changes in protein-coding genes, non-coding RNA genes, and cis-regulatory regions. 
These may prove to underlie differences in the ecology and behaviour of these diverse species.", "date": "2007-11-08", "date_type": "published", "publication": "Nature", "volume": "450", "number": "7167", "publisher": "Nature Publishing Group", "pagerange": "203-218", "id_number": "CaltechAUTHORS:20170307-113900780", "issn": "0028-0836", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170307-113900780", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "National Human Genome Research Institute" }, { "agency": "Indian Genomics Initiative" } ] }, "corp_creators": { "items": [ "Drosophila 12 Genomes Consortium" ] }, "doi": "10.1038/nature06341", "primary_object": { "basename": "nature06341-s1.pdf", "url": "https://authors.library.caltech.edu/records/04wc6-q2j80/files/nature06341-s1.pdf" }, "related_objects": [ { "basename": "nature06341-s2.xls", "url": "https://authors.library.caltech.edu/records/04wc6-q2j80/files/nature06341-s2.xls" }, { "basename": "nature06341-s3.xls", "url": "https://authors.library.caltech.edu/records/04wc6-q2j80/files/nature06341-s3.xls" } ], "resource_type": "article", "pub_year": "2007", "author_list": "Clark, Andrew G. and Pachter, Lior" }, { "id": "https://authors.library.caltech.edu/records/qj3nn-jb436", "eprint_id": 74852, "eprint_status": "archive", "datestamp": "2023-08-19 21:23:13", "lastmod": "2023-10-24 23:21:48", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Begun-David-J", "name": { "family": "Begun", "given": "David J." } }, { "id": "Holloway-Alisha-K", "name": { "family": "Holloway", "given": "Alisha K." } }, { "id": "Stevens-Kristian", "name": { "family": "Stevens", "given": "Kristian" } }, { "id": "Hillier-LaDeana-W", "name": { "family": "Hillier", "given": "LaDeana W." 
} }, { "id": "Poh-Yu-Ping", "name": { "family": "Poh", "given": "Yu-Ping" } }, { "id": "Hahn-Matthew-W", "name": { "family": "Hahn", "given": "Matthew W." } }, { "id": "Nista-Phillip-M", "name": { "family": "Nista", "given": "Phillip M." } }, { "id": "Jones-Corbin-D", "name": { "family": "Jones", "given": "Corbin D." } }, { "id": "Kern-Andrew-D", "name": { "family": "Kern", "given": "Andrew D." } }, { "id": "Dewey-Colin-N", "name": { "family": "Dewey", "given": "Colin N." } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" }, { "id": "Myers-Eugene-W", "name": { "family": "Myers", "given": "Eugene" } }, { "id": "Langley-Charles-H", "name": { "family": "Langley", "given": "Charles H." } } ] }, "title": "Population Genomics: Whole-Genome Analysis of Polymorphism and Divergence in Drosophila simulans", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2007 Begun et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. \n\nReceived: March 19, 2007; Accepted: September 26, 2007; Published: November 6, 2007. \n\nDJB was supported by National Institutes of Health (NIH) Grant R01 GM071926. CHL was supported by NIH R01HG2107\u20133 and NIH HG02942-01A1. AKH was supported by a National Science Foundation (NSF) Postdoctoral Fellowship in Biological Informatics (Grant No. 0434670). MWH and PMN were supported by an Indiana University-Purdue University Collaborations in Life Sciences and Informatics Research grant to MWH. MWH was also supported by an NSF Postdoctoral Fellowship in Biological Informatics while he was at UC-Davis. CDJ was supported by NSF grant DEB 0512106. ADK was supported by a Howard Hughes Predoctoral Fellowship. Y-P. 
Poh was supported by a graduate student fellowship (Academia Sinica) and Grant NSC 9402917-1-007\u2013011. LP and CD were supported by NIH R01-HG02362\u201303 and NSF grant CCF 03\u201347992. Generation of the D. simulans and D. yakuba sequences was supported by grants from the National Human Genome Research Institute. \n\nWe gratefully acknowledge the following members of the Washington University Medical School Genome Sequencing Center: Asif T. Chinwalla, Holland Cordum, Lucinda A. Fulton, Robert S. Fulton, Elaine M. Mardis, Joanne Nelson, John Osborne, Kymberlie H. Pepin, Patrick Minx, John Spieth, Rick Wilson, Jian Xu, Shiaw-Pyng Yang, and especially Sandra W. Clifton for her role in shepherding this project. D. Barbash and H. Lindfors provided assistance in the laboratory at UC-Davis. J. Anderson alerted us to the possibility that the sim4 and sim6 libraries could be cross-contaminated and also provided many useful comments on the paper. We also thank M. Noor, T. Mitchell-Olds, and three anonymous reviewers for comments. \n\nAuthor Contributions: The project was conceived by DJB and CHL. The D. yakuba assembly was produced by LWH. The D. yakuba/D. melanogaster alignment was by CND and LP. KS created the D. simulans assembly used in the analysis; its quality was evaluated by AKH and empirically tested by CDJ. An early version of the D. simulans assembly was produced by EM. Population genetic analysis was performed by AKH, YPP, CHL, DJB, MWH, PMN, CDJ, KS, and ADK. The paper was written by DJB, AKH, and CHL, with assistance from several co-authors. \n\nAccession Numbers: The GenBank (http://www.ncbi.nlm.nih.gov/Genbank/) accession number for D. yakuba is AAEU01000000 (version 1) and for the D. simulans w501 whole-genome shotgun assembly is TBS-AAEU01000000 (version 1). \n\nThe authors have declared that no competing interests exist.\n\nPublished - journal.pbio.0050310.PDF
Supplemental Material - 151547.zip
", "abstract": "The population genetic perspective is that the processes shaping genomic variation can be revealed only through simultaneous investigation of sequence polymorphism and divergence within and between closely related species. Here we present a population genetic analysis of Drosophila simulans based on whole-genome shotgun sequencing of multiple inbred lines and comparison of the resulting data to genome assemblies of the closely related species, D. melanogaster and D. yakuba. We discovered previously unknown, large-scale fluctuations of polymorphism and divergence along chromosome arms, and significantly less polymorphism and faster divergence on the X chromosome. We generated a comprehensive list of functional elements in the D. simulans genome influenced by adaptive evolution. Finally, we characterized genomic patterns of base composition for coding and noncoding sequence. These results suggest several new hypotheses regarding the genetic and biological mechanisms controlling polymorphism and divergence across the Drosophila genome, and provide a rich resource for the investigation of adaptive evolution and functional variation in D. simulans.", "date": "2007-11", "date_type": "published", "publication": "PLoS Biology", "volume": "5", "number": "11", "publisher": "Public Library of Science", "pagerange": "Art. No. 
e310", "id_number": "CaltechAUTHORS:20170307-114846354", "issn": "1545-7885", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170307-114846354", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NIH", "grant_number": "R01 GM071926" }, { "agency": "NIH", "grant_number": "R01HG2107\u20133" }, { "agency": "NIH", "grant_number": "HG02942-01A1" }, { "agency": "NSF Postdoctoral Fellowship", "grant_number": "0434670" }, { "agency": "Indiana University-Purdue University" }, { "agency": "NSF", "grant_number": "DEB-0512106" }, { "agency": "Howard Hughes Medical Institute (HHMI)" }, { "agency": "Academia Sinica" }, { "agency": "National Science Council (Taipei)", "grant_number": "9402917-1-007\u2013011" }, { "agency": "NIH", "grant_number": "R01-HG02362\u201303" }, { "agency": "NSF", "grant_number": "CCF-0347992" }, { "agency": "National Human Genome Research Institute" } ] }, "doi": "10.1371/journal.pbio.0050310", "pmcid": "PMC2062478", "primary_object": { "basename": "151547.zip", "url": "https://authors.library.caltech.edu/records/qj3nn-jb436/files/151547.zip" }, "related_objects": [ { "basename": "journal.pbio.0050310.PDF", "url": "https://authors.library.caltech.edu/records/qj3nn-jb436/files/journal.pbio.0050310.PDF" } ], "resource_type": "article", "pub_year": "2007", "author_list": "Begun, David J.; Holloway, Alisha K.; et al." 
}, { "id": "https://authors.library.caltech.edu/records/yenc3-ndk14", "eprint_id": 74837, "eprint_status": "archive", "datestamp": "2023-08-19 21:23:06", "lastmod": "2023-10-24 23:21:03", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Huggins-P", "name": { "family": "Huggins", "given": "Peter" } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" }, { "id": "Sturmfels-B", "name": { "family": "Sturmfels", "given": "Bernd" } } ] }, "title": "Towards the Human Genotope", "ispublished": "pub", "full_text_status": "public", "keywords": "ENCODE project \u00b7 Genotope \u00b7 Human variation \u00b7 Polytope \u00b7 Single nucleotide polymorphism", "note": "\u00a9 2007 Society for Mathematical Biology. \n\nReceived: 10 January 2007 / Accepted: 15 June 2007 / Published online: 15 September 2007. \n\nThis work was supported in part by the Defense Advanced Research Projects Agency (DARPA) under grant HR0011-05-1-0057.\n\nSubmitted - 0611032.pdf
", "abstract": "The human genotope is the convex hull of all allele frequency vectors that can be obtained from the genotypes present in the human population. In this paper, we take a few initial steps toward a description of this object, which may be fundamental for future population based genetics studies. Here we use data from the HapMap Project, restricted to two ENCODE regions, to study a subpolytope of the human genotope. We study three different approaches for obtaining informative low-dimensional projections of this subpolytope. The projections are specified by projection onto few tag SNPs, principal component analysis, and archetypal analysis. We describe the application of our geometric approach to identifying structure in populations based on single nucleotide polymorphisms.", "date": "2007-11", "date_type": "published", "publication": "Bulletin of Mathematical Biology", "volume": "69", "number": "8", "publisher": "Springer", "pagerange": "2723-2735", "id_number": "CaltechAUTHORS:20170307-095818473", "issn": "0092-8240", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170307-095818473", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "Defense Advanced Research Projects Agency (DARPA)", "grant_number": "HR0011-05-1-0057" } ] }, "doi": "10.1007/s11538-007-9244-7", "primary_object": { "basename": "0611032.pdf", "url": "https://authors.library.caltech.edu/records/yenc3-ndk14/files/0611032.pdf" }, "resource_type": "article", "pub_year": "2007", "author_list": "Huggins, Peter; Pachter, Lior; et al." 
}, { "id": "https://authors.library.caltech.edu/records/znssb-fyb07", "eprint_id": 74835, "eprint_status": "archive", "datestamp": "2023-08-19 21:03:18", "lastmod": "2023-10-24 23:20:55", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Beerenwinkel-N", "name": { "family": "Beerenwinkel", "given": "Niko" } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" }, { "id": "Sturmfels-B", "name": { "family": "Sturmfels", "given": "Bernd" } } ] }, "title": "Epistasis and Shapes of Fitness Landscapes", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2007 Institute of Statistical Science, Academia Sinica. \n\n(Received April 2006; accepted August 2006) \n\nThis paper was inspired by the DARPA workshop on Fitness Landscapes held at Berkeley in February 2006. We thank Sebastian Bonhoeffer, Richard Lenski and Sally Otto for helpful discussions. N.B. was supported by the Deutsche Forschungsgemeinschaft (BE 3217/1-1). L.P. and B.S. were supported by the DARPA program \"Fundamental Laws in Biology\" (HR0011-05-1-0057).\n\nPublished - A17n43.pdf
Submitted - 0603034.pdf
", "abstract": "The relationship between the shape of a fitness landscape and the underlying gene interactions, or epistasis, has been extensively studied in the two-locus case. Gene interactions among multiple loci are usually reduced to two-way interactions. We present a geometric theory of shapes of fitness landscapes for multiple loci. A central concept is the genotope, which is the convex hull of all possible allele frequencies in populations. Triangulations of the genotope correspond to different shapes of fitness landscapes and reveal all the gene interactions. The theory is applied to fitness data from HIV and Drosophila melanogaster. In both cases, our findings refine earlier analyses and reveal previously undetected gene interactions.", "date": "2007-10", "date_type": "published", "publication": "Statistica Sinica", "volume": "17", "number": "4", "publisher": "Institute of Statistical Science, Academia Sinica", "pagerange": "1317-1342", "id_number": "CaltechAUTHORS:20170307-094958898", "issn": "1017-0405", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170307-094958898", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "Deutsche Forschungsgemeinschaft (DFG)", "grant_number": "BE 3217/1-1" }, { "agency": "Defense Advanced Research Projects Agency (DARPA)", "grant_number": "HR0011-05-1-0057" } ] }, "doi": "10.48550/arXiv.0603034", "primary_object": { "basename": "0603034.pdf", "url": "https://authors.library.caltech.edu/records/znssb-fyb07/files/0603034.pdf" }, "related_objects": [ { "basename": "A17n43.pdf", "url": "https://authors.library.caltech.edu/records/znssb-fyb07/files/A17n43.pdf" } ], "resource_type": "article", "pub_year": "2007", "author_list": "Beerenwinkel, Niko; Pachter, Lior; et al." 
}, { "id": "https://authors.library.caltech.edu/records/47a83-v6n34", "eprint_id": 74813, "eprint_status": "archive", "datestamp": "2023-08-22 09:40:25", "lastmod": "2023-10-24 23:18:47", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Morton-J", "name": { "family": "Morton", "given": "Jason" } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" }, { "id": "Shiu-Anne", "name": { "family": "Shiu", "given": "Anne" } }, { "id": "Sturmfels-B", "name": { "family": "Sturmfels", "given": "Bernd" } } ] }, "title": "The Cyclohedron Test for Finding Periodic Genes in Time Course Expression Studies", "ispublished": "pub", "full_text_status": "public", "keywords": "time course microarray; biological clock; somitogenesis; multiple testing", "note": "\u00a9 2007 by Walter de Gruyter GmbH. \n\nWe are grateful to Mary-Lee Dequ\u00e9ant and Olivier Pourqui\u00e9 for many helpful discussions, and to Mary-Lee for preparing the data for our use. We thank Oliver Wienand for helping us implement the cyclohedron test. The collaboration was facilitated by the DARPA Fundamental Laws of Biology Program which supported our research. Anne Shiu was supported by a Lucent Technologies Bell Labs Graduate Research Fellowship. We thank an anonymous referee for helpful comments.\n\nSubmitted - 0702049.pdf
", "abstract": "The problem of finding periodically expressed genes from time course microarray experiments is at the center of numerous efforts to identify the molecular components of biological clocks. We present a new approach to this problem based on the cyclohedron test, which is a rank test inspired by recent advances in algebraic combinatorics. The test has the advantage of being robust to measurement errors, and can be used to ascertain the significance of top-ranked genes. We apply the test to recently published measurements of gene expression during mouse somitogenesis and find 32 genes that collectively are significant. Among these are previously identified periodic genes involved in the Notch/FGF and Wnt signaling pathways, as well as novel candidate genes that may play a role in regulating the segmentation clock. These results confirm that there are an abundance of exceptionally periodic genes expressed during somitogenesis. The emphasis of this paper is on the statistics and combinatorics that underlie the cyclohedron test and its implementation within a multiple testing framework.", "date": "2007-08", "date_type": "published", "publication": "Statistical Applications in Genetics and Molecular Biology", "volume": "6", "publisher": "De Gruyter", "pagerange": "Art. No. 21", "id_number": "CaltechAUTHORS:20170306-154012533", "issn": "2194-6302", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170306-154012533", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "Defense Advanced Research Projects Agency (DARPA)" }, { "agency": "Lucent Technologies Bell Labs" } ] }, "doi": "10.2202/1544-6115.1286", "primary_object": { "basename": "0702049.pdf", "url": "https://authors.library.caltech.edu/records/47a83-v6n34/files/0702049.pdf" }, "resource_type": "article", "pub_year": "2007", "author_list": "Morton, Jason; Pachter, Lior; et al." 
}, { "id": "https://authors.library.caltech.edu/records/q89db-2cr95", "eprint_id": 74857, "eprint_status": "archive", "datestamp": "2023-08-22 09:26:38", "lastmod": "2023-10-24 23:22:04", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Chatterji-S", "name": { "family": "Chatterji", "given": "Sourav" } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "Patterns of gene duplication and intron loss in the ENCODE regions suggest a confounding factor", "ispublished": "pub", "full_text_status": "public", "keywords": "Gene evolution; Exon\u2013intron structure; Intron loss; Gene duplication; Gene structure evolution; Mammalian genome evolution; ENCODE", "note": "\u00a9 2007 Elsevier. \n\nReceived 29 November 2006, Accepted 22 March 2007, Available online 11 May 2007. \n\nWe thank Colin Dewey for providing Mercator maps of the ENCODE regions. We also thank the GENCODE and HAVANA teams for organizing the EGASP workshop during which we began work on this project. S.C. and L.P. were partially funded by NIH Grants R01:HG02632-1 and U01:HG003150-01.\n\nAccepted Version - nihms26716.pdf
Supplemental Material - mmc1.txt
Supplemental Material - mmc2.txt
", "abstract": "The exon\u2013intron structure of eukaryotic genes allows for phenomena such as alternative splicing, nonsense-mediated decay, and regulation through untranslated regions. However, the evolution of the exon structure of genes is not well elucidated because of limited and phylogenetically sparse data sets. In this study, we use the phylogenetically diverse sequencing of the ENCODE regions to study gene structure evolution in mammalian genomes. This first phylogenetically diverse study of gene structure changes offers insights into the mode and tempo of mammalian gene structure evolution. The genes undergoing structure changes appear to be moderately to highly expressed in germline cells and show levels of selection similar to those of other ENCODE genes. Patterns of gene duplication of the affected genes are more complex than expected. The number of sampled genomes is sufficiently dense to infer that certain gene duplications happened after intron loss. Thus, although gene duplication is highly correlated with intron loss, we conclude that structural changes in genes are not necessarily due to a loss of constraint following gene duplication as previously suggested.", "date": "2007-07", "date_type": "published", "publication": "Genomics", "volume": "90", "number": "1", "publisher": "Elsevier", "pagerange": "44-48", "id_number": "CaltechAUTHORS:20170307-130932831", "issn": "0888-7543", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170307-130932831", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NIH", "grant_number": "R01 HG02632-1" }, { "agency": "NIH", "grant_number": "U01 HG003150-01" } ] }, "doi": "10.1016/j.ygeno.2007.03.008", "pmcid": "PMC2034525", "primary_object": { "basename": "mmc1.txt", "url": "https://authors.library.caltech.edu/records/q89db-2cr95/files/mmc1.txt" }, "related_objects": [ { "basename": "mmc2.txt", "url": 
"https://authors.library.caltech.edu/records/q89db-2cr95/files/mmc2.txt" }, { "basename": "nihms26716.pdf", "url": "https://authors.library.caltech.edu/records/q89db-2cr95/files/nihms26716.pdf" } ], "resource_type": "article", "pub_year": "2007", "author_list": "Chatterji, Sourav and Pachter, Lior" }, { "id": "https://authors.library.caltech.edu/records/0dpmy-1bc16", "eprint_id": 74855, "eprint_status": "archive", "datestamp": "2023-08-19 20:28:08", "lastmod": "2023-10-24 23:21:56", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Birney-E", "name": { "family": "Birney", "given": "Ewan" } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2007 Macmillan Publishers Limited. \n\nReceived 2 March 2007; Accepted 23 April 2007. \n\nWe thank D. Leja for providing graphical expertise and support. Funding support is acknowledged from the following sources: National Institutes of Health, The European Union BioSapiens NoE, Affymetrix, Swiss National Science Foundation, the Spanish Ministerio de Educaci\u00f3n y Ciencia, Spanish Ministry of Education and Science, CIBERESP, Genome Spain and Generalitat de Catalunya, Ministry of Education, Culture, Sports, Science and Technology of Japan, the NCCR Frontiers in Genetics, the J\u00e9r\u00f4me Lejeune Foundation, the Childcare Foundation, the Novartis Foundations, the Danish Research Council, the Swedish Research Council, the Knut and Alice Wallenberg Foundation, the Wellcome Trust, the Howard Hughes Medical Institute, the Bio-X Institute, the RIKEN Institute, the US Army, National Science Foundation, the Deutsche Forschungsgemeinschaft, the Austrian Gen-AU program, the BBSRC and The European Molecular Biology Laboratory. 
We thank the Barcelona SuperComputing Center and the NIH Biowulf cluster for computer facilities. The Consortium thanks the ENCODE Scientific Advisory Panel for their advice on the project: G. Weinstock, M. Cherry, G. Churchill, M. Eisen, S. Elgin, J. Lis, J. Rine, M. Vidal and P. Zamore. \n\nThe author declares no competing financial interests.\n\nAccepted Version - nihms27513.pdf
Supplemental Material - nature05874-s1.pdf
Supplemental Material - nature05874-s2.pdf
", "abstract": "We report the generation and analysis of functional data from multiple, diverse experiments performed on a targeted 1% of the human genome as part of the pilot phase of the ENCODE Project. These data have been further integrated and augmented by a number of evolutionary and computational analyses. Together, our results advance the collective knowledge about human genome function in several major areas. First, our studies provide convincing evidence that the genome is pervasively transcribed, such that the majority of its bases can be found in primary transcripts, including non-protein-coding transcripts, and those that extensively overlap one another. Second, systematic examination of transcriptional regulation has yielded new understanding about transcription start sites, including their relationship to specific regulatory sequences and features of chromatin accessibility and histone modification. Third, a more sophisticated view of chromatin structure has emerged, including its inter-relationship with DNA replication and transcriptional regulation. Finally, integration of these new sources of information, in particular with respect to mammalian evolution based on inter- and intra-species sequence comparisons, has yielded new mechanistic and evolutionary insights concerning the functional landscape of the human genome. 
Together, these studies are defining a path for pursuit of a more comprehensive characterization of human genome function.", "date": "2007-06-14", "date_type": "published", "publication": "Nature", "volume": "447", "number": "7146", "publisher": "Nature Publishing Group", "pagerange": "799-816", "id_number": "CaltechAUTHORS:20170307-121815071", "issn": "0028-0836", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170307-121815071", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "corp_creators": { "items": [ "ENCODE Project Consortium" ] }, "doi": "10.1038/nature05874", "pmcid": "PMC2212820", "primary_object": { "basename": "nihms27513.pdf", "url": "https://authors.library.caltech.edu/records/0dpmy-1bc16/files/nihms27513.pdf" }, "related_objects": [ { "basename": "nature05874-s1.pdf", "url": "https://authors.library.caltech.edu/records/0dpmy-1bc16/files/nature05874-s1.pdf" }, { "basename": "nature05874-s2.pdf", "url": "https://authors.library.caltech.edu/records/0dpmy-1bc16/files/nature05874-s2.pdf" } ], "resource_type": "article", "pub_year": "2007", "author_list": "Birney, Ewan and Pachter, Lior" }, { "id": "https://authors.library.caltech.edu/records/jz1bx-4kd75", "eprint_id": 74856, "eprint_status": "archive", "datestamp": "2023-08-19 20:22:39", "lastmod": "2023-10-24 23:21:59", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Margulies-E-H", "name": { "family": "Margulies", "given": "Elliott H." } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "Analyses of deep mammalian sequence alignments and constraint predictions for 1% of the human genome", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2007, Cold Spring Harbor Laboratory Press. Freely available online through the Genome Research Open Access option. 
The Authors acknowledge that six months after the full-issue publication date, the Article will be distributed under a Creative Commons CC-BY-NC License (Attribution-NonCommercial 4.0 International License, http://creativecommons.org/licenses/by-nc/4.0/). \n\nReceived October 12, 2006. Accepted February 15, 2007. \n\nWe thank F. Collins for critical review of the manuscript; all other ENCODE analysis subgroups for their camaraderie and collaboration; P. Good, E. Feingold, and L. Liefer for ENCODE Consortium guidance and administrative assistance; the Wellcome Trust Sanger Institute, the Max Planck Institute for Developmental Biology, and The Netherlands Institute for Developmental Biology for providing a draft zebrafish genome sequence prior to publication; the DOE Joint Genome Institute for providing a draft Xenopus sequence prior to publication; G. Schuler for making ENCODE comparative sequence data available at NCBI; D. Church for coordinating the identification of finished mouse sequence orthologous to ENCODE regions; and the anonymous reviewers of this manuscript for their constructive comments on previous drafts. This research was supported in part by the Intramural Research Program of the National Human Genome Research Institute, National Institutes of Health (E.H.M., J.C.M., and E.D.G.). G.M.C. was a Howard Hughes Medical Institute pre-doctoral Fellow. G.A. is a Bio-X Graduate Student Fellow. D.J.T. is supported by NIH 1 P41 HG02371-05. C.N.D. is supported in part by NIH HG003150. M.H., J.T., and W.M. are supported in part by R01:HG002238. T.M. was supported by BBSRC grant 721/BEP17055. I.H. was funded in part by NIH/NHGRI grant 1R01GM076705-01. S.E.A., S.N., and J.I.M. thank the \"Vital IT\" computational platform and are supported by grants from NIH ENCODE, Swiss National Science Foundation, European Union, and the ChildCare Foundation. L.P. is supported in part by R01:HG02632 and U01:HG003150. N.G. was supported in part by the Wellcome Trust. D.H. and A. 
Sidow are supported by funds from NHGRI. A. Siepel was supported by the UCBREP GREAT fellowship (University of California Biotechnology Research and Education Program Graduate Research and Education in Adaptive Biotechnology).\n\nPublished - Genome_Res.-2007-Margulies-760-74.pdf
Supplemental Material - margulies_figS1.pdf
Supplemental Material - margulies_figS2.pdf
Supplemental Material - margulies_newGR_supplement_revised_revised.pdf
", "abstract": "A key component of the ongoing ENCODE project involves rigorous comparative sequence analyses for the initially targeted 1% of the human genome. Here, we present orthologous sequence generation, alignment, and evolutionary constraint analyses of 23 mammalian species for all ENCODE targets. Alignments were generated using four different methods; comparisons of these methods reveal large-scale consistency but substantial differences in terms of small genomic rearrangements, sensitivity (sequence coverage), and specificity (alignment accuracy). We describe the quantitative and qualitative trade-offs concomitant with alignment method choice and the levels of technical error that need to be accounted for in applications that require multisequence alignments. Using the generated alignments, we identified constrained regions using three different methods. While the different constraint-detecting methods are in general agreement, there are important discrepancies relating to both the underlying alignments and the specific algorithms. However, by integrating the results across the alignments and constraint-detecting methods, we produced constraint annotations that were found to be robust based on multiple independent measures. Analyses of these annotations illustrate that most classes of experimentally annotated functional elements are enriched for constrained sequences; however, large portions of each class (with the exception of protein-coding sequences) do not overlap constrained regions. The latter elements might not be under primary sequence constraint, might not be constrained across all mammals, or might have expendable molecular functions. Conversely, 40% of the constrained sequences do not overlap any of the functional elements that have been experimentally identified. 
Together, these findings demonstrate and quantify how many genomic functional elements await basic molecular characterization.", "date": "2007-06", "date_type": "published", "publication": "Genome Research", "volume": "17", "number": "6", "publisher": "Cold Spring Harbor Laboratory Press", "pagerange": "760-774", "id_number": "CaltechAUTHORS:20170307-125526948", "issn": "1088-9051", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170307-125526948", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "National Human Genome Research Institute" }, { "agency": "Howard Hughes Medical Institute (HHMI)" }, { "agency": "Bio-X Fellowship" }, { "agency": "NIH", "grant_number": "1 P41 HG02371-05" }, { "agency": "NIH", "grant_number": "HG003150" }, { "agency": "NIH", "grant_number": "R01 HG002238" }, { "agency": "Biotechnology and Biological Sciences Research Council (BBSRC)", "grant_number": "721/BEP17055" }, { "agency": "NIH", "grant_number": "1R01GM076705-01" }, { "agency": "Swiss National Science Foundation (SNSF)" }, { "agency": "European Union" }, { "agency": "ChildCare Foundation" }, { "agency": "NIH", "grant_number": "R01 HG02632" }, { "agency": "NIH", "grant_number": "U01 HG003150" }, { "agency": "Wellcome Trust" }, { "agency": "University of California" } ] }, "corp_creators": { "items": [ "ENCODE Project Consortium" ] }, "doi": "10.1101/gr.6034307", "pmcid": "PMC1891336", "primary_object": { "basename": "margulies_figS2.pdf", "url": "https://authors.library.caltech.edu/records/jz1bx-4kd75/files/margulies_figS2.pdf" }, "related_objects": [ { "basename": "margulies_newGR_supplement_revised_revised.pdf", "url": "https://authors.library.caltech.edu/records/jz1bx-4kd75/files/margulies_newGR_supplement_revised_revised.pdf" }, { "basename": "Genome_Res.-2007-Margulies-760-74.pdf", "url": 
"https://authors.library.caltech.edu/records/jz1bx-4kd75/files/Genome_Res.-2007-Margulies-760-74.pdf" }, { "basename": "margulies_figS1.pdf", "url": "https://authors.library.caltech.edu/records/jz1bx-4kd75/files/margulies_figS1.pdf" } ], "resource_type": "article", "pub_year": "2007", "author_list": "Margulies, Elliott H. and Pachter, Lior" }, { "id": "https://authors.library.caltech.edu/records/tmz64-b0883", "eprint_id": 74858, "eprint_status": "archive", "datestamp": "2023-08-19 20:22:46", "lastmod": "2023-10-24 23:22:07", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "Interpreting the unculturable majority", "ispublished": "pub", "full_text_status": "restricted", "note": "\u00a9 2007 Nature Publishing Group. \n\nThe author declares no competing financial interests.", "abstract": "New methods are necessary for the analysis and interpretation of massive amounts of metagenomic data.", "date": "2007-06", "date_type": "published", "publication": "Nature Methods", "volume": "4", "number": "6", "publisher": "Nature Publishing Group", "pagerange": "479-480", "id_number": "CaltechAUTHORS:20170307-131612759", "issn": "1548-7091", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170307-131612759", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "doi": "10.1038/nmeth0607-479", "resource_type": "article", "pub_year": "2007", "author_list": "Pachter, Lior" }, { "id": "https://authors.library.caltech.edu/records/1wmws-xht65", "eprint_id": 74859, "eprint_status": "archive", "datestamp": "2023-08-19 20:02:22", "lastmod": "2023-10-24 23:22:13", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Beerenwinkel-N", "name": { "family": "Beerenwinkel", "given": "Niko" } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": 
"Lior" }, "orcid": "0000-0002-9164-6231" }, { "id": "Sturmfels-B", "name": { "family": "Sturmfels", "given": "Bernd" } }, { "id": "Elena-S-F", "name": { "family": "Elena", "given": "Santiago F." }, "orcid": "0000-0001-8249-5593" }, { "id": "Lenski-R-E", "name": { "family": "Lenski", "given": "Richard E." }, "orcid": "0000-0002-1064-8375" } ] }, "title": "Analysis of epistatic interactions and fitness landscapes using a new geometric approach", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 Beerenwinkel et al; licensee BioMed Central Ltd. 2007. This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. \n\nReceived: 17 October 2006. Accepted: 13 April 2007. Published: 13 April 2007. \n\nThis work was supported by the \"FunBio\" grant from DARPA to Simon Levin (Princeton University). We thank Simon Levin and Ben Mann (DARPA) for facilitating this math-bio collaboration, Peter Bates and Charles Ofria for helpful discussions, Peter Malkin for an improved program for computing circuits, Mike Stillman for writing and improving the Macaulay 2 code, and three anonymous reviewers for suggestions. Collection of the dataset used in this study was funded by a fellowship from the Spanish MEC to S.F.E. and a grant from the NSF to R.E.L. N.B. was funded by a grant from the Bill & Melinda Gates foundation through the Grand Challenges in Global Health Initiative.\n\nPublished - art_3A10.1186_2F1471-2148-7-60.pdf
Supplemental Material - 12862_2006_354_MOESM10_ESM.pdf
Supplemental Material - 12862_2006_354_MOESM1_ESM.pdf
Supplemental Material - 12862_2006_354_MOESM2_ESM.m2
Supplemental Material - 12862_2006_354_MOESM3_ESM.zip
Supplemental Material - 12862_2006_354_MOESM4_ESM.txt
Supplemental Material - 12862_2006_354_MOESM5_ESM.txt
Supplemental Material - 12862_2006_354_MOESM6_ESM.pdf
Supplemental Material - 12862_2006_354_MOESM7_ESM.pdf
Supplemental Material - 12862_2006_354_MOESM8_ESM.pdf
Supplemental Material - 12862_2006_354_MOESM9_ESM.pdf
", "abstract": "Background: Understanding interactions between mutations and how they affect fitness is a central problem in evolutionary biology that bears on such fundamental issues as the structure of fitness landscapes and the evolution of sex. To date, analyses of fitness landscapes have focused either on the overall directional curvature of the fitness landscape or on the distribution of pairwise interactions. In this paper, we propose and employ a new mathematical approach that allows a more complete description of multi-way interactions and provides new insights into the structure of fitness landscapes. \n\nResults: We apply the mathematical theory of gene interactions developed by Beerenwinkel et al. to a fitness landscape for Escherichia coli obtained by Elena and Lenski. The genotypes were constructed by introducing nine mutations into a wild-type strain and constructing a restricted set of 27 double mutants. Despite the absence of mutants higher than second order, our analysis of this genotypic space points to previously unappreciated gene interactions, in addition to the standard pairwise epistasis. Our analysis confirms Elena and Lenski's inference that the fitness landscape is complex, so that an overall measure of curvature obscures a diversity of interaction types. We also demonstrate that some mutations contribute disproportionately to this complexity. In particular, some mutations are systematically better than others at mixing with other mutations. We also find a strong correlation between epistasis and the average fitness loss caused by deleterious mutations. In particular, the epistatic deviations from multiplicative expectations tend toward more positive values in the context of more deleterious mutations, emphasizing that pairwise epistasis is a local property of the fitness landscape. Finally, we determine the geometry of the fitness landscape, which reflects many of these biologically interesting features. 
\n\nConclusion: A full description of complex fitness landscapes requires more information than the average curvature or the distribution of independent pairwise interactions. We have proposed a mathematical approach that, in principle, allows a complete description and, in practice, can suggest new insights into the structure of real fitness landscapes. Our analysis emphasizes the value of non-independent genotypes for these inferences.", "date": "2007-04-13", "date_type": "published", "publication": "BMC Evolutionary Biology", "volume": "7", "publisher": "BioMed Central", "pagerange": "Art. No. 60", "id_number": "CaltechAUTHORS:20170307-131928305", "issn": "1471-2148", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170307-131928305", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "Defense Advanced Research Projects Agency (DARPA)" }, { "agency": "Ministerio de Educaci\u00f3n y Ciencia (MEC)" }, { "agency": "NSF" }, { "agency": "Bill and Melinda Gates Foundation" } ] }, "doi": "10.1186/1471-2148-7-60", "pmcid": "PMC1865543", "primary_object": { "basename": "12862_2006_354_MOESM10_ESM.pdf", "url": "https://authors.library.caltech.edu/records/1wmws-xht65/files/12862_2006_354_MOESM10_ESM.pdf" }, "related_objects": [ { "basename": "12862_2006_354_MOESM2_ESM.m2", "url": "https://authors.library.caltech.edu/records/1wmws-xht65/files/12862_2006_354_MOESM2_ESM.m2" }, { "basename": "12862_2006_354_MOESM9_ESM.pdf", "url": "https://authors.library.caltech.edu/records/1wmws-xht65/files/12862_2006_354_MOESM9_ESM.pdf" }, { "basename": "12862_2006_354_MOESM1_ESM.pdf", "url": "https://authors.library.caltech.edu/records/1wmws-xht65/files/12862_2006_354_MOESM1_ESM.pdf" }, { "basename": "12862_2006_354_MOESM3_ESM.zip", "url": "https://authors.library.caltech.edu/records/1wmws-xht65/files/12862_2006_354_MOESM3_ESM.zip" }, { "basename": "12862_2006_354_MOESM4_ESM.txt", 
"url": "https://authors.library.caltech.edu/records/1wmws-xht65/files/12862_2006_354_MOESM4_ESM.txt" }, { "basename": "12862_2006_354_MOESM5_ESM.txt", "url": "https://authors.library.caltech.edu/records/1wmws-xht65/files/12862_2006_354_MOESM5_ESM.txt" }, { "basename": "12862_2006_354_MOESM6_ESM.pdf", "url": "https://authors.library.caltech.edu/records/1wmws-xht65/files/12862_2006_354_MOESM6_ESM.pdf" }, { "basename": "12862_2006_354_MOESM7_ESM.pdf", "url": "https://authors.library.caltech.edu/records/1wmws-xht65/files/12862_2006_354_MOESM7_ESM.pdf" }, { "basename": "12862_2006_354_MOESM8_ESM.pdf", "url": "https://authors.library.caltech.edu/records/1wmws-xht65/files/12862_2006_354_MOESM8_ESM.pdf" }, { "basename": "art_3A10.1186_2F1471-2148-7-60.pdf", "url": "https://authors.library.caltech.edu/records/1wmws-xht65/files/art_3A10.1186_2F1471-2148-7-60.pdf" } ], "resource_type": "article", "pub_year": "2007", "author_list": "Beerenwinkel, Niko; Pachter, Lior; et el." }, { "id": "https://authors.library.caltech.edu/records/737t0-a8613", "eprint_id": 74830, "eprint_status": "archive", "datestamp": "2023-08-19 19:35:24", "lastmod": "2023-10-24 23:20:40", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" }, { "id": "Sturmfels-B", "name": { "family": "Sturmfels", "given": "Bernd" } } ] }, "title": "The Mathematics of Phylogenomics", "ispublished": "pub", "full_text_status": "public", "keywords": "genomics, phylogenetics, genetic code, algebraic statistics, hidden Markov model, sequence alignment, ultraconservation", "note": "\u00a9 2007 Society for Industrial and Applied Mathematics. \n\nReceived by the editors May 29, 2005; accepted for publication (in revised form) September 30, 2005; published electronically January 30, 2007. 
\n\nThe first author was supported by a grant from the NIH (R01-HG2362-3), a Sloan Foundation Research Fellowship, and an NSF CAREER award (CCF-0347992). The second author was supported by the NSF (DMS-0200729, DMS-0456960). \n\nThe vertebrate whole genome alignments we have analyzed were assembled by Nicolas Bray and Colin Dewey. We also thank Sourav Chatterji and Von Bing Yap for their help in searching through the alignments.\n\nPublished - 050632634.pdf
Submitted - 0409132.pdf
", "abstract": "The grand challenges in biology today are being shaped by powerful high\u2010throughput technologies that have revealed the genomes of many organisms, global expression patterns of genes, and detailed information about variation within populations. We are therefore able to ask, for the first time, fundamental questions about the evolution of genomes, the structure of genes and their regulation, and the connections between genotypes and phenotypes of individuals. The answers to these questions are all predicated on progress in a variety of computational, statistical, and mathematical fields. The rapid growth in the characterization of genomes has led to the advancement of a new discipline called phylogenomics. This discipline results from the combination of two major fields in the life sciences: genomics, i.e., the study of the function and structure of genes and genomes; and molecular phylogenetics, i.e., the study of the hierarchical evolutionary relationships among organisms and their genomes. The objective of this article is to offer mathematicians a first introduction to this emerging field, and to discuss specific mathematical problems and developments arising from phylogenomics.", "date": "2007-01-30", "date_type": "published", "publication": "SIAM Review", "volume": "49", "number": "1", "publisher": "Society for Industrial and Applied Mathematics", "pagerange": "3-31", "id_number": "CaltechAUTHORS:20170307-085456375", "issn": "0036-1445", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170307-085456375", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NIH", "grant_number": "R01-HG2362-3" }, { "agency": "Alfred P. 
Sloan Foundation" }, { "agency": "NSF", "grant_number": "CCF-0347992" }, { "agency": "NSF", "grant_number": "DMS-0200729" }, { "agency": "NSF", "grant_number": "DMS-0456960" } ] }, "doi": "10.1137/050632634", "primary_object": { "basename": "0409132.pdf", "url": "https://authors.library.caltech.edu/records/737t0-a8613/files/0409132.pdf" }, "related_objects": [ { "basename": "050632634.pdf", "url": "https://authors.library.caltech.edu/records/737t0-a8613/files/050632634.pdf" } ], "resource_type": "article", "pub_year": "2007", "author_list": "Pachter, Lior and Sturmfels, Bernd" }, { "id": "https://authors.library.caltech.edu/records/pmyqk-7z667", "eprint_id": 74861, "eprint_status": "archive", "datestamp": "2023-08-19 19:32:30", "lastmod": "2023-10-24 23:22:18", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Schwartz-A-S", "name": { "family": "Schwartz", "given": "Ariel S." } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "Multiple alignment by sequence annealing", "ispublished": "pub", "full_text_status": "restricted", "note": "\u00a9 The Author 2006. Published by Oxford University Press. \n\nAriel Schwartz was supported by NSF grant EF 03-31494. Lior Pachter was supported by NIH grant R01HG2362 and NSF grant CCF0347992.", "abstract": "MOTIVATION: We introduce a novel approach to multiple alignment that is based on an algorithm for rapidly checking whether single matches are consistent with a partial multiple alignment. This leads to a sequence annealing algorithm, which is an incremental method for building multiple sequence alignments one match at a time. Our approach improves significantly on the standard progressive alignment approach to multiple alignment. \n\nRESULTS: The sequence annealing algorithm performs well on benchmark test sets of protein sequences. 
It is not only sensitive, but also specific, drastically reducing the number of incorrectly aligned residues in comparison to other programs. The method allows for adjustment of the sensitivity/specificity tradeoff and can be used to reliably identify homologous regions among protein sequences. \n\nAVAILABILITY: An implementation of the sequence annealing algorithm is available at http://bio.math.berkeley.edu/amap/", "date": "2007-01-15", "date_type": "published", "publication": "Bioinformatics", "volume": "23", "number": "2", "publisher": "Oxford University Press", "pagerange": "e24-e29", "id_number": "CaltechAUTHORS:20170307-134453947", "issn": "1367-4803", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170307-134453947", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NSF", "grant_number": "EF-0331494" }, { "agency": "NIH", "grant_number": "R01HG2362" }, { "agency": "NSF", "grant_number": "CCF-0347992" } ] }, "doi": "10.1093/bioinformatics/btl311", "resource_type": "article", "pub_year": "2007", "author_list": "Schwartz, Ariel S. and Pachter, Lior" }, { "id": "https://authors.library.caltech.edu/records/kk2y6-qk074", "eprint_id": 74833, "eprint_status": "archive", "datestamp": "2023-08-19 17:57:05", "lastmod": "2023-10-24 23:20:48", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Dewey-C-N", "name": { "family": "Dewey", "given": "Colin N." } }, { "id": "Huggins-P-M", "name": { "family": "Huggins", "given": "Peter M." } }, { "id": "Woods-K", "name": { "family": "Woods", "given": "Kevin" } }, { "id": "Sturmfels-B", "name": { "family": "Sturmfels", "given": "Bernd" } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "Parametric Alignment of Drosophila Genomes", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2006 Dewey et al. 
This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. \n\nReceived: December 7, 2005; Accepted: May 10, 2006; Published: June 23, 2006. \n\nCND was supported by the NIH (HG003150), PMH was supported by an ARCS Foundation fellowship, and KW was supported by the NSF (DMS-040214). BS was supported by the NSF (DMS-0456960), and LP was supported by the NIH (R01-HG2362-3 and HG003150) and an NSF CAREER award (CCF-0347992). \n\nAuthor Contributions: CND, PMH, KW, BS, and LP conceived and designed the experiments. CND and PMH performed the experiments. CND, PMH, KW, BS, and LP analyzed the data. CND, PMH, KW, BS, and LP wrote the paper. \n\nThe authors have declared that no competing interests exist.\n\nPublished - journal.pcbi.0020073.PDF
Submitted - 0512008.pdf
", "abstract": "The classic algorithms of Needleman\u2013Wunsch and Smith\u2013Waterman find a maximum a posteriori probability alignment for a pair hidden Markov model (PHMM). To process large genomes that have undergone complex genome rearrangements, almost all existing whole genome alignment methods apply fast heuristics to divide genomes into small pieces that are suitable for Needleman\u2013Wunsch alignment. In these alignment methods, it is standard practice to fix the parameters and to produce a single alignment for subsequent analysis by biologists. As the number of alignment programs applied on a whole genome scale continues to increase, so does the disagreement in their results. The alignments produced by different programs vary greatly, especially in non-coding regions of eukaryotic genomes where the biologically correct alignment is hard to find. Parametric alignment is one possible remedy. This methodology resolves the issue of robustness to changes in parameters by finding all optimal alignments for all possible parameters in a PHMM. Our main result is the construction of a whole genome parametric alignment of Drosophila melanogaster and Drosophila pseudoobscura. This alignment draws on existing heuristics for dividing whole genomes into small pieces for alignment, and it relies on advances we have made in computing convex polytopes that allow us to parametrically align non-coding regions using biologically realistic models. We demonstrate the utility of our parametric alignment for biological inference by showing that cis-regulatory elements are more conserved between Drosophila melanogaster and Drosophila pseudoobscura than previously thought. 
We also show how whole genome parametric alignment can be used to quantitatively assess the dependence of branch length estimates on alignment parameters.", "date": "2006-06", "date_type": "published", "publication": "PLoS Computational Biology", "volume": "2", "number": "6", "publisher": "Public Library of Science", "pagerange": "Art. No. e73", "id_number": "CaltechAUTHORS:20170307-090954418", "issn": "1553-734X", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170307-090954418", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NIH", "grant_number": "HG003150" }, { "agency": "ARCS Foundation" }, { "agency": "NSF", "grant_number": "DMS-040214" }, { "agency": "NSF", "grant_number": "DMS-0456960" }, { "agency": "NIH", "grant_number": "R01-HG2362-3" }, { "agency": "NIH", "grant_number": "HG003150" }, { "agency": "NSF", "grant_number": "CCF-0347992" } ] }, "doi": "10.1371/journal.pcbi.0020073", "pmcid": "PMC1480539", "primary_object": { "basename": "journal.pcbi.0020073.PDF", "url": "https://authors.library.caltech.edu/records/kk2y6-qk074/files/journal.pcbi.0020073.PDF" }, "related_objects": [ { "basename": "0512008.pdf", "url": "https://authors.library.caltech.edu/records/kk2y6-qk074/files/0512008.pdf" } ], "resource_type": "article", "pub_year": "2006", "author_list": "Dewey, Colin N.; Huggins, Peter M.; et al." }, { "id": "https://authors.library.caltech.edu/records/9rzxt-bad89", "eprint_id": 74872, "eprint_status": "archive", "datestamp": "2023-08-19 17:41:41", "lastmod": "2023-10-24 23:45:11", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Dewey-C-N", "name": { "family": "Dewey", "given": "Colin N." 
} }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "Evolution at the nucleotide level: the problem of multiple whole-genome alignment", "ispublished": "pub", "full_text_status": "restricted", "note": "\u00a9 The Author 2006. Published by Oxford University Press. \n\nReceived February 1, 2006; Revised and Accepted March 9, 2006. \n\nC.N.D. was supported by the NIH (HG003150). L.P. was supported by the NIH (R01-HG2362-3 and HG003150) and an NSF CAREER award (CCF-0347992). \n\nConflict of Interest statement. None declared.", "abstract": "With the genome sequences of numerous species at hand, we have the opportunity to discover how evolution has acted at each and every nucleotide in our genome. To this end, we must identify sets of nucleotides that have descended from a common ancestral nucleotide. The problem of identifying evolutionary-related nucleotides is that of sequence alignment. When the sequences under consideration are entire genomes, we have the problem of multiple whole-genome alignment. In this paper, we first state a series of definitions for homology and its subrelations between single nucleotides. Within this framework, we review the current methods available for the alignment of multiple large genomes. We then describe a subset of tools that make biological inferences from multiple whole-genome alignments.", "date": "2006-04-15", "date_type": "published", "publication": "Human Molecular Genetics", "volume": "15", "number": "Suppl. 
1", "publisher": "Oxford University Press", "pagerange": "R51-R56", "id_number": "CaltechAUTHORS:20170307-162320251", "issn": "0964-6906", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170307-162320251", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NIH", "grant_number": "HG003150" }, { "agency": "NIH", "grant_number": "R01-HG2362-3" }, { "agency": "NSF", "grant_number": "CCF-0347992" } ] }, "doi": "10.1093/hmg/ddl056", "resource_type": "article", "pub_year": "2006", "author_list": "Dewey, Colin N. and Pachter, Lior" }, { "id": "https://authors.library.caltech.edu/records/4g2cj-jb389", "eprint_id": 74873, "eprint_status": "archive", "datestamp": "2023-08-19 17:40:51", "lastmod": "2023-10-24 23:45:14", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Chatterji-S", "name": { "family": "Chatterji", "given": "Sourav" } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "Reference based annotation with GeneMapper", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 Chatterji and Pachter; licensee BioMed Central Ltd. 2006. This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. \n\nReceived: 24 November 2005. Accepted: 3 March 2006. Published: 5 April 2006. \n\nWe thank Colin Dewey and Narayanan Manikandan for their helpful suggestions and comments. The work was partially funded by NIH grants R01:HG02632-1 and U01:HG003150-01.\n\nPublished - art_3A10.1186_2Fgb-2006-7-4-r29.pdf
Supplemental Material - 13059_2005_1297_MOESM1_ESM.tgz
Supplemental Material - 13059_2005_1297_MOESM2_ESM.tgz
Supplemental Material - 13059_2005_1297_MOESM3_ESM.tgz
Supplemental Material - 13059_2005_1297_MOESM4_ESM.eps
Supplemental Material - 13059_2005_1297_MOESM5_ESM.eps
Supplemental Material - 13059_2005_1297_MOESM6_ESM.eps
Supplemental Material - 13059_2005_1297_MOESM7_ESM.eps
", "abstract": "We introduce GeneMapper, a program for transferring annotations from a well annotated genome to other genomes. Drawing on high quality curated annotations, GeneMapper enables rapid and accurate annotation of newly sequenced genomes and is suitable for both finished and draft genomes. GeneMapper uses a profile based approach for mapping genes into multiple species, improving upon the standard pairwise approach. GeneMapper is freely available for academic use.", "date": "2006-04-05", "date_type": "published", "publication": "Genome Biology", "volume": "7", "number": "4", "publisher": "BioMed Central", "pagerange": "Art. No. R29", "id_number": "CaltechAUTHORS:20170307-162728814", "issn": "1465-6906", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170307-162728814", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NIH", "grant_number": "R01 HG02632-1" }, { "agency": "NIH", "grant_number": "U01 HG003150-01" } ] }, "doi": "10.1186/gb-2006-7-4-r29", "pmcid": "PMC1557983", "primary_object": { "basename": "art_3A10.1186_2Fgb-2006-7-4-r29.pdf", "url": "https://authors.library.caltech.edu/records/4g2cj-jb389/files/art_3A10.1186_2Fgb-2006-7-4-r29.pdf" }, "related_objects": [ { "basename": "13059_2005_1297_MOESM1_ESM.tgz", "url": "https://authors.library.caltech.edu/records/4g2cj-jb389/files/13059_2005_1297_MOESM1_ESM.tgz" }, { "basename": "13059_2005_1297_MOESM2_ESM.tgz", "url": "https://authors.library.caltech.edu/records/4g2cj-jb389/files/13059_2005_1297_MOESM2_ESM.tgz" }, { "basename": "13059_2005_1297_MOESM3_ESM.tgz", "url": "https://authors.library.caltech.edu/records/4g2cj-jb389/files/13059_2005_1297_MOESM3_ESM.tgz" }, { "basename": "13059_2005_1297_MOESM4_ESM.eps", "url": "https://authors.library.caltech.edu/records/4g2cj-jb389/files/13059_2005_1297_MOESM4_ESM.eps" }, { "basename": "13059_2005_1297_MOESM5_ESM.eps", "url": 
"https://authors.library.caltech.edu/records/4g2cj-jb389/files/13059_2005_1297_MOESM5_ESM.eps" }, { "basename": "13059_2005_1297_MOESM6_ESM.eps", "url": "https://authors.library.caltech.edu/records/4g2cj-jb389/files/13059_2005_1297_MOESM6_ESM.eps" }, { "basename": "13059_2005_1297_MOESM7_ESM.eps", "url": "https://authors.library.caltech.edu/records/4g2cj-jb389/files/13059_2005_1297_MOESM7_ESM.eps" } ], "resource_type": "article", "pub_year": "2006", "author_list": "Chatterji, Sourav and Pachter, Lior" }, { "id": "https://authors.library.caltech.edu/records/rfdxr-1as30", "eprint_id": 74877, "eprint_status": "archive", "datestamp": "2023-08-19 17:29:34", "lastmod": "2023-10-24 23:45:30", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Lall-S", "name": { "family": "Lall", "given": "Sabbi" } }, { "id": "Gr\u00fcn-D", "name": { "family": "Gr\u00fcn", "given": "Dominic" } }, { "id": "Krek-A", "name": { "family": "Krek", "given": "Azra" } }, { "id": "Chen-Kevin-Bio", "name": { "family": "Chen", "given": "Kevin" } }, { "id": "Wang-Yi-Lu", "name": { "family": "Wang", "given": "Yi-Lu" } }, { "id": "Dewey-C-N", "name": { "family": "Dewey", "given": "Colin N." } }, { "id": "Sood-P", "name": { "family": "Sood", "given": "Pranidhi" } }, { "id": "Colombo-T", "name": { "family": "Colombo", "given": "Teresa" } }, { "id": "Bray-N-L", "name": { "family": "Bray", "given": "Nicolas" } }, { "id": "MacMenamin-P", "name": { "family": "MacMenamin", "given": "Philip" } }, { "id": "Kao-Huey-Ling", "name": { "family": "Kao", "given": "Huey-Ling" } }, { "id": "Gunsalus-K-C", "name": { "family": "Gunsalus", "given": "Kristin C." } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" }, { "id": "Piano-F", "name": { "family": "Piano", "given": "Fabio" } }, { "id": "Rajewsky-N", "name": { "family": "Rajewsky", "given": "Nikolaus" } } ] }, "title": "A Genome-Wide Map of Conserved MicroRNA Targets in C. 
elegans", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2006 Elsevier. \n\nReceived 19 December 2005, Revised 19 January 2006, Accepted 24 January 2006, Available online 2 February 2006. \n\nWe would like to thank Marc Vidal and Denis Dupuy for their kind gift of promoterome constructs used in this study. In addition, we would like to thank Anita Fernandez for discussions and comments on the manuscript and Frank Slack and Oliver Hobert for help with Table S1. We thank Thadeous Kacmarczyk for administration of our computers. We also thank Tamer A. Hadi for excellent technical support for this project. D.G. was funded by a scholarship from the German Academic Exchange Service. This work was supported by NIH Grants R21-HD049435 to N.R. and F.P. and R01-HD046236 to F.P. and by a HHMI grant through the Undergraduate Biological Sciences Education Program to NYU. L.P. is supported by Sloan Research Fellowship, NSF grant CCF 03-47992, and L.P., C.N.D., N.B., and K.C. are supported by NIH Grant R01-HG02362.\n\nSupplemental Material - mmc1.pdf
", "abstract": "Background: Metazoan miRNAs regulate protein-coding genes by binding the 3\u2032 UTR of cognate mRNAs. Identifying targets for the 115 known C. elegans miRNAs is essential for understanding their function. \n\nResults: By using a new version of PicTar and sequence alignments of three nematodes, we predict that miRNAs regulate at least 10% of C. elegans genes through conserved interactions. We have developed a new experimental pipeline to assay 3\u2032 UTR-mediated posttranscriptional gene regulation via an endogenous reporter expression system amenable to high-throughput cloning, demonstrating the utility of this system using one of the most intensely studied miRNAs, let-7. Our expression analyses uncover several new potential let-7 targets and suggest a new let-7 activity in head muscle and neurons. To explore genome-wide trends in miRNA function, we analyzed functional categories of predicted target genes, finding that one-third of C. elegans miRNAs target gene sets are enriched for specific functional annotations. We have also integrated miRNA target predictions with other functional genomic data from C. elegans. \n\nConclusions: At least 10% of C. elegans genes are predicted miRNA targets, and a number of nematode miRNAs seem to regulate biological processes by targeting functionally related genes. We have also developed and successfully utilized an in vivo system for testing miRNA target predictions in likely endogenous expression domains. 
The thousands of genome-wide miRNA target predictions for nematodes, humans, and flies are available from the PicTar website and are linked to an accessible graphical network-browsing tool allowing exploration of miRNA target predictions in the context of various functional genomic data resources.", "date": "2006-03-07", "date_type": "published", "publication": "Current Biology", "volume": "16", "number": "5", "publisher": "Cell Press", "pagerange": "460-471", "id_number": "CaltechAUTHORS:20170307-164849681", "issn": "0960-9822", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170307-164849681", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "Deutscher Akademischer Austauschdienst (DAAD)" }, { "agency": "NIH", "grant_number": "R21-HD049435" }, { "agency": "NIH", "grant_number": "R01-HD046236" }, { "agency": "Howard Hughes Medical Institute (HHMI)" }, { "agency": "Alfred P. Sloan Foundation" }, { "agency": "NSF", "grant_number": "CCF-0347992" }, { "agency": "NIH", "grant_number": "R01-HG02362" } ] }, "doi": "10.1016/j.cub.2006.01.050", "primary_object": { "basename": "mmc1.pdf", "url": "https://authors.library.caltech.edu/records/rfdxr-1as30/files/mmc1.pdf" }, "resource_type": "article", "pub_year": "2006", "author_list": "Lall, Sabbi; Gr\u00fcn, Dominic; et al." 
}, { "id": "https://authors.library.caltech.edu/records/1hgt0-34a86", "eprint_id": 74876, "eprint_status": "archive", "datestamp": "2023-08-19 17:25:25", "lastmod": "2023-10-24 23:45:28", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Levy-D", "name": { "family": "Levy", "given": "Dan" } }, { "id": "Yoshida-Ruriko", "name": { "family": "Yoshida", "given": "Ruriko" } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "Beyond Pairwise Distances: Neighbor-Joining with Phylogenetic Diversity Estimates", "ispublished": "pub", "full_text_status": "restricted", "note": "\u00a9 The Author 2005. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. \n\nAccepted: 19 October 2005. Published: 09 November 2005. \n\nWe thank the anonymous referees for comments that improved the manuscript. This work was partially funded by the National Institutes of Health (NIH) grant (R01HG2362). L.P. was also supported by a Sloan foundation fellowship. D.L. was also supported by NIH grant (GM68423).", "abstract": "The \"neighbor-joining algorithm\" is a recursive procedure for reconstructing trees that is based on a transformation of pairwise distances between leaves. We present a generalization of the neighbor-joining transformation, which uses estimates of phylogenetic diversity rather than pairwise distances in the tree. This leads to an improved neighbor-joining algorithm whose total running time is still polynomial in the number of taxa. On simulated data, the method outperforms other distance-based methods. 
We have implemented neighbor-joining for subtree weights in a program called MJOIN which is freely available under the Gnu Public License at http://bio.math.berkeley.edu/mjoin/.", "date": "2006-03", "date_type": "published", "publication": "Molecular Biology and Evolution", "volume": "23", "number": "3", "publisher": "Oxford University Press", "pagerange": "491-498", "id_number": "CaltechAUTHORS:20170307-164418033", "issn": "0737-4038", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170307-164418033", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NIH", "grant_number": "R01HG2362" }, { "agency": "Alfred P. Sloan Foundation" }, { "agency": "NIH", "grant_number": "GM68423" } ] }, "doi": "10.1093/molbev/msj059", "resource_type": "article", "pub_year": "2006", "author_list": "Levy, Dan; Yoshida, Ruriko; et al." }, { "id": "https://authors.library.caltech.edu/records/y9ss2-0wq61", "eprint_id": 74895, "eprint_status": "archive", "datestamp": "2023-08-19 17:17:53", "lastmod": "2023-10-24 23:46:38", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Caspi-A", "name": { "family": "Caspi", "given": "Anat" }, "orcid": "0000-0001-8702-8273" }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "Identification of transposable elements using multiple alignments of related genomes", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2006 Cold Spring Harbor Laboratory Press. The Authors acknowledge that six months after the full-issue publication date, the Article will be distributed under a Creative Commons CC-BY-NC License (Attribution-NonCommercial 4.0 International License, http://creativecommons.org/licenses/by-nc/4.0/). \n\nAccepted September 19, 2005. Received June 29, 2005. Published in Advance December 14, 2005. 
\n\nWe thank Roger Hoskins for elucidating the problem of position-effect variegation for us. We also thank Colin Dewey for providing the MERCATOR alignments and access to his software library; and Sue Celniker, Michael Ashburner, and the anonymous reviewers for useful discussion and comments. Finally, we thank the Washington University in St. Louis Genome Sequencing Center and Agencourt for the draft assemblies of D. yakuba and D. virilis. \n\nSupplemental material is available at http://baboon.math.berkeley.edu/~caspian/DrosTEs/. The site includes a table showing the coordinates, length, classification, and characterization of genomic environment of the new identified insertion regions of known TE families; a table showing the coordinates, length, and environmental characterization of the new TE families in D. melanogaster euchromatin resulting from our case study; and a table showing the genomic environmental characterization of the BDGP annotated TE instances. Our reported Wilcoxon rank test results are also available. Additionally, the alignments of all the new families and the known TE families with new insertions are available on that site.\n\nPublished - Genome_Res.-2006-Caspi-260-70.pdf
", "abstract": "Accurate genome-wide cataloging of transposable elements (TEs) will facilitate our understanding of mobile DNA evolution, expose the genomic effects of TEs on the host genome, and improve the quality of assembled genomes. Using the availability of several nearly complete Drosophila genomes and developments in whole genome alignment methods, we introduce a large-scale comparative method for identifying repetitive mobile DNA regions. These regions are highly enriched for transposable elements. Our method has two main features distinguishing it from other repeat-finding methods. First, rather than relying on sequence similarity to determine the location of repeats, the genomic artifacts of the transposition mechanism itself are systematically tracked in the context of multiple alignments. Second, we can derive bounds on the age of each repeat instance based on the phylogenetic species tree. We report results obtained using both complete and draft sequences of four closely related Drosophila genomes and validate our results with manually curated TE annotations in the Drosophila melanogaster euchromatin. We show the utility of our findings in exploring both transposable elements and their host genomes: In the study of TEs, we offer predictions for novel families, annotate new insertions of known families, and show data that support the hypothesis that all known TE families in D. 
melanogaster were recently active; in the study of the host, we show how our findings can be used to determine shifts in the eu-heterochromatin junction in the pericentric chromosome regions.", "date": "2006-02", "date_type": "published", "publication": "Genome Research", "volume": "16", "number": "2", "publisher": "Cold Spring Harbor Laboratory Press", "pagerange": "260-270", "id_number": "CaltechAUTHORS:20170308-111109017", "issn": "1088-9051", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170308-111109017", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "doi": "10.1101/gr.4361206", "pmcid": "PMC1361722", "primary_object": { "basename": "Genome_Res.-2006-Caspi-260-70.pdf", "url": "https://authors.library.caltech.edu/records/y9ss2-0wq61/files/Genome_Res.-2006-Caspi-260-70.pdf" }, "resource_type": "article", "pub_year": "2006", "author_list": "Caspi, Anat and Pachter, Lior" }, { "id": "https://authors.library.caltech.edu/records/czhmv-7p908", "eprint_id": 74902, "eprint_status": "archive", "datestamp": "2023-08-19 16:02:44", "lastmod": "2023-10-24 23:46:57", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Chen-Kevin-Bio", "name": { "family": "Chen", "given": "Kevin" } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "Bioinformatics for Whole-Genome Shotgun Sequencing of Microbial Communities", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2005 Chen and Pachter. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. \n\nPublished: July 12, 2005. 
\n\nWe thank Eric Allen, Jill Banfield, Susannah Tringe, and Gene Tyson for introducing us to the field of metagenomics and for helpful discussions while preparing the manuscript. We also thank Richard Karp and Satish Rao for useful discussions on bioinformatics issues, and the anonymous reviewers for their comments on an earlier version of this paper. Some of the data we have used were provided by JGI and EMBL. KC was supported by National Science Foundation (NSF) grant EF 03\u201331494. LP was supported by a Sloan Research Fellowship, NSF grant CCF 03\u201347992, and National Institutes of Health grant R01-HG02362\u201303.\n\nPublished - journal.pcbi.0010024.PDF
", "abstract": "The application of whole-genome shotgun sequencing to microbial communities represents a major development in metagenomics, the study of uncultured microbes via the tools of modern genomic analysis. In the past year, whole-genome shotgun sequencing projects of prokaryotic communities from an acid mine biofilm, the Sargasso Sea, Minnesota farm soil, three deep-sea whale falls, and deep-sea sediments have been reported, adding to previously published work on viral communities from marine and fecal samples. The interpretation of this new kind of data poses a wide variety of exciting and difficult bioinformatics problems. The aim of this review is to introduce the bioinformatics community to this emerging field by surveying existing techniques and promising new approaches for several of the most interesting of these computational problems.", "date": "2005-07", "date_type": "published", "publication": "PLoS Computational Biology", "volume": "1", "number": "2", "publisher": "Public Library of Science", "pagerange": "Art. No. e24", "id_number": "CaltechAUTHORS:20170308-124940796", "issn": "1553-734X", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170308-124940796", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NSF", "grant_number": "EF-0331494" }, { "agency": "Alfred P. 
Sloan Foundation" }, { "agency": "NSF", "grant_number": "CCF-0347992" }, { "agency": "NIH", "grant_number": "R01-HG02362\u201303" } ] }, "doi": "10.1371/journal.pcbi.0010024", "pmcid": "PMC1185649", "primary_object": { "basename": "journal.pcbi.0010024.PDF", "url": "https://authors.library.caltech.edu/records/czhmv-7p908/files/journal.pcbi.0010024.PDF" }, "resource_type": "article", "pub_year": "2005", "author_list": "Chen, Kevin and Pachter, Lior" }, { "id": "https://authors.library.caltech.edu/records/yf5ce-tav61", "eprint_id": 74903, "eprint_status": "archive", "datestamp": "2023-08-19 16:02:50", "lastmod": "2023-10-24 23:47:02", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Chatterji-S", "name": { "family": "Chatterji", "given": "Sourav" } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "Large Multiple Organism Gene Finding by Collapsed Gibbs Sampling", "ispublished": "pub", "full_text_status": "restricted", "note": "\u00a9 2005 Mary Ann Liebert, Inc. \n\nThanks to Simon Cawley for helpful discussions and comments. This work was partially funded with a grant from the NIH (R01: HG2362-1).", "abstract": "The Gibbs sampling method has been widely used for sequence analysis after it was successfully applied to the problem of identifying regulatory motif sequences upstream of genes. Since then, numerous variants of the original idea have emerged: however, in all cases the application has been to finding short motifs in collections of short sequences (typically less than 100 nucleotides long). In this paper, we introduce a Gibbs sampling approach for identifying genes in multiple large genomic sequences up to hundreds of kilobases long. This approach leverages the evolutionary relationships between the sequences to improve the gene predictions, without explicitly aligning the sequences. 
We have applied our method to the analysis of genomic sequence from 14 genomic regions, totaling roughly 1.8 Mb of sequence in each organism. We show that our approach compares favorably with existing ab initio approaches to gene finding, including pairwise comparison based gene prediction methods which make explicit use of alignments. Furthermore, excellent performance can be obtained with as little as four organisms, and the method overcomes a number of difficulties of previous comparison based gene finding approaches: it is robust with respect to genomic rearrangements, can work with draft sequence, and is fast (linear in the number and length of the sequences). It can also be seamlessly integrated with Gibbs sampling motif detection methods.", "date": "2005-07", "date_type": "published", "publication": "Journal of Computational Biology", "volume": "12", "number": "6", "publisher": "Mary Ann Liebert, Inc.", "pagerange": "599-608", "id_number": "CaltechAUTHORS:20170308-125857130", "issn": "1066-5277", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170308-125857130", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NIH", "grant_number": "R01 HG2362-1" } ] }, "doi": "10.1089/cmb.2005.12.599", "resource_type": "article", "pub_year": "2005", "author_list": "Chatterji, Sourav and Pachter, Lior" }, { "id": "https://authors.library.caltech.edu/records/dejyt-dqe25", "eprint_id": 95217, "eprint_status": "archive", "datestamp": "2023-08-22 03:41:40", "lastmod": "2023-10-20 18:40:56", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "McAuliffe-J-D", "name": { "family": "McAuliffe", "given": "Jon D." } }, { "id": "Jordan-M-I", "name": { "family": "Jordan", "given": "Michael I." 
} }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "Subtree power analysis and species selection for comparative genomics", "ispublished": "pub", "full_text_status": "public", "keywords": "hypothesis testing; likelihood ratio; sequence analysis", "note": "\u00a9 2005 The National Academy of Sciences. \n\nCommunicated by Peter J. Bickel, University of California, Berkeley, CA, April 6, 2005 (received for review December 13, 2004). \n\nWe thank Peter Bickel and Adam Siepel for helpful comments. M.I.J. was supported by National Institutes of Health Grant R33-HG003070. L.P. was supported by National Institutes of Health Grant R01-HG2362-3, a Sloan Foundation Research Fellowship, and National Science Foundation Career Award CCF-0347992. \n\nAuthor contributions: J.D.M., M.I.J., and L.P. designed research; J.D.M., M.I.J., and L.P. performed research; J.D.M., M.I.J., and L.P. contributed new reagents/analytic tools; J.D.M. analyzed data; and J.D.M. wrote the paper.\n\nPublished - pnas-0502790102.pdf
", "abstract": "Sequence comparison across multiple organisms aids in the detection of regions under selection. However, resource limitations require a prioritization of genomes to be sequenced. This prioritization should be grounded in two considerations: the lineal scope encompassing the biological phenomena of interest, and the optimal species within that scope for detecting functional elements. We introduce a statistical framework for optimal species subset selection, based on maximizing power to detect conserved sites. Analysis of a phylogenetic star topology shows theoretically that the optimal species subset is not in general the most evolutionarily diverged subset. We then demonstrate this finding empirically in a study of vertebrate species. Our results suggest that marsupials are prime sequencing candidates.", "date": "2005-05-31", "date_type": "published", "publication": "Proceedings of the National Academy of Sciences of the United States of America", "volume": "102", "number": "22", "publisher": "National Academy of Sciences", "pagerange": "7900-7905", "id_number": "CaltechAUTHORS:20190503-150942109", "issn": "0027-8424", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20190503-150942109", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NIH", "grant_number": "R33-HG003070" }, { "agency": "NIH", "grant_number": "R01-HG2362-3" }, { "agency": "Alfred P. Sloan Foundation" }, { "agency": "NSF", "grant_number": "CCF-0347992" } ] }, "doi": "10.1073/pnas.0502790102", "pmcid": "PMC1142384", "primary_object": { "basename": "pnas-0502790102.pdf", "url": "https://authors.library.caltech.edu/records/dejyt-dqe25/files/pnas-0502790102.pdf" }, "resource_type": "article", "pub_year": "2005", "author_list": "McAuliffe, Jon D.; Jordan, Michael I.; et al." 
}, { "id": "https://authors.library.caltech.edu/records/tr080-6rv26", "eprint_id": 74905, "eprint_status": "archive", "datestamp": "2023-08-19 14:51:55", "lastmod": "2023-10-20 21:57:45", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Hillier-L-W", "name": { "family": "Hillier", "given": "LaDeana W." } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2004 Macmillan Publishers Limited. \n\nReceived 19 July 2004; Accepted 1 November 2004. \n\nThe Gallus gallus sequence and map generation at Washington University School of Medicine's Genome Sequencing Center was supported by grants from the National Human Genome Research Institute (NHGRI). For work from other groups, we acknowledge the support of the Biotechnology and Biological Sciences Research Council, Center for Integrative Genomics Funds, Childcare and Lejeune Foundations, Chinese Academy of Sciences and Ministry of Science and Technology, Department of Energy, Desiree and Niels Yde Foundation, European Union, European Molecular Biology Laboratory, Fonds Quebe\u00e7ois de la Recherche sur la Nature et les Technologies, Howard Hughes Medical Institute, National Institute for Diabetes and Digestive and Kidney Diseases, NHGRI, National Institutes of Health, National Natural Science Foundation of China, National Science Foundation, Novo Nordisk Foundation, Stowers Institute for Medical Research, Swedish Research Council, Swiss NCCR Frontiers in Genetics, Swiss National Science Foundation, USDA/CSREES National Research Initiative, USDA/CSREES National Animal Genome Research Program, Wallenberg Consortium North and the AgriFunGen program at the Swedish University of Agricultural Sciences, UK Medical Research Council, University of 
California Presidential Chair Fund, University of Pennsylvania Genomics Institute Award, University of Texas at Arlington, and the Wellcome Trust. Resources for exploring the sequence and annotation data are available on browser displays available at Ensembl (http://www.ensembl.org), UCSC (http://genome.ucsc.edu) and the NCBI (http://www.ncbi.nlm.nih.gov). We thank R. Waterston for advice regarding the manuscript. \n\nCorrespondence and requests for materials should be addressed to R.K.W. (Email: rwilson@watson.wustl.edu). This G. gallus whole-genome shotgun project has been deposited at DDBJ/EMBL/GenBank under the project accession number AADN00000000. The version described in this paper is the first version, AADN01000000.\n\nIn Table 5 of this Article, the last four values listed in the 'Copy number' column were incorrect. These should be: LTR elements, 30,000; DNA transposons, 20,000; simple repeats, 140,000; and satellites, 4,000. These errors do not affect any of the conclusions in our paper.\n\nSupplemental Material - nature03154-s1.pdf
Supplemental Material - nature03154-s2.pdf
Supplemental Material - nature03154-s3.pdf
", "abstract": "We present here a draft genome sequence of the red jungle fowl, Gallus gallus. Because the chicken is a modern descendant of the dinosaurs and the first non-mammalian amniote to have its genome sequenced, the draft sequence of its genome\u2014composed of approximately one billion base pairs of sequence and an estimated 20,000\u201323,000 genes\u2014provides a new perspective on vertebrate genome evolution, while also improving the annotation of mammalian genomes. For example, the evolutionary distance between chicken and human provides high specificity in detecting functional elements, both non-coding and coding. Notably, many conserved non-coding sequences are far from genes and cannot be assigned to defined functional classes. In coding regions the evolutionary dynamics of protein domains and orthologous groups illustrate processes that distinguish the lineages leading to birds and mammals. The distinctive properties of avian microchromosomes, together with the inferred patterns of conserved synteny, provide additional insights into vertebrate chromosome architecture.", "date": "2004-12-09", "date_type": "published", "publication": "Nature", "volume": "432", "number": "7018", "publisher": "Nature Publishing Group", "pagerange": "695-716", "id_number": "CaltechAUTHORS:20170308-130340353", "issn": "0028-0836", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170308-130340353", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "National Human Genome Research Institute" }, { "agency": "Biotechnology and Biological Sciences Research Council (BBSRC)" }, { "agency": "Center for Integrative Genomics Funds" }, { "agency": "ChildCare Foundation" }, { "agency": "Lejeune Foundation" }, { "agency": "Chinese Academy of Sciences" }, { "agency": "Ministry of Science and Technology (China)" }, { "agency": "Department of Energy (DOE)" }, { "agency": 
"Desiree and Niels Yde Foundation" }, { "agency": "European Union" }, { "agency": "European Molecular Biology Laboratory (EMBL)" }, { "agency": "Fonds Quebe\u00e7ois de la Recherche sur la Nature et les Technologies (FQRNT)" }, { "agency": "Howard Hughes Medical Institute (HHMI)" }, { "agency": "National Institute for Diabetes and Digestive and Kidney Diseases" }, { "agency": "NIH" }, { "agency": "NSF" }, { "agency": "National Natural Science Foundation of China" }, { "agency": "Novo Nordisk Foundation" }, { "agency": "Stowers Institute for Medical Research" }, { "agency": "Swedish Research Council" }, { "agency": "Swiss NCCR Frontiers in Genetics" }, { "agency": "Swiss National Science Foundation (SNSF)" }, { "agency": "U.S. Department of Agriculture" }, { "agency": "Swedish University of Agricultural Sciences" }, { "agency": "Medical Research Council (UK)" }, { "agency": "University of California" }, { "agency": "University of Pennsylvania" }, { "agency": "University of Texas at Arlington" }, { "agency": "Wellcome Trust" } ] }, "corp_creators": { "items": [ "International Chicken Genome Sequencing Consortium" ] }, "doi": "10.1038/nature03154", "primary_object": { "basename": "nature03154-s1.pdf", "url": "https://authors.library.caltech.edu/records/tr080-6rv26/files/nature03154-s1.pdf" }, "related_objects": [ { "basename": "nature03154-s2.pdf", "url": "https://authors.library.caltech.edu/records/tr080-6rv26/files/nature03154-s2.pdf" }, { "basename": "nature03154-s3.pdf", "url": "https://authors.library.caltech.edu/records/tr080-6rv26/files/nature03154-s3.pdf" } ], "resource_type": "article", "pub_year": "2004", "author_list": "Hillier, LaDeana W. 
and Pachter, Lior" }, { "id": "https://authors.library.caltech.edu/records/q9x6k-rkc63", "eprint_id": 74911, "eprint_status": "archive", "datestamp": "2023-08-19 14:46:35", "lastmod": "2023-10-24 23:47:26", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Boffelli-D", "name": { "family": "Boffelli", "given": "Dario" } }, { "id": "Weer-C-V", "name": { "family": "Weer", "given": "Claire V." } }, { "id": "Weng-Li", "name": { "family": "Weng", "given": "Li" } }, { "id": "Lewis-K-D", "name": { "family": "Lewis", "given": "Keith D." } }, { "id": "Shoukry-M", "name": { "family": "Shoukry", "given": "Malak I." } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" }, { "id": "Keys-D-N", "name": { "family": "Keys", "given": "David N." } }, { "id": "Rubin-E-M", "name": { "family": "Rubin", "given": "Edward M." } } ] }, "title": "Intraspecies sequence comparisons for annotating genomes", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2004 Cold Spring Harbor Laboratory Press. The Authors acknowledge that six months after the full-issue publication date, the Article will be distributed under a Creative Commons CC-BY-NC License (Attribution-NonCommercial 4.0 International License, http://creativecommons.org/licenses/by-nc/4.0/). \n\nReceived May 18, 2004; accepted in revised form October 5, 2004. \n\nWe thank Shigeki Fujiwara, Arjan Gittenberger, Kevin Heasman, Helene Huelvan, Di Jiang, Shungo Kano, Aimee Phillippi, Andy Sexton, and Seb Shimeld for providing C. intestinalis samples. Research was conducted at the E.O. Lawrence Berkeley National Laboratory and at the Joint Genome Institute, with support by grants from the Programs for Genomic Application, NHLBI (E.M.R.) and NIH (L.P.), and performed under Dept. of Energy Contract DE-AC0378SF00098, Univ. of California. \n\nGenBank accession numbers: Forkhead region: AY667314\u2013AY667347. 
Snail region: AY667371\u2013AY667407. Col5a1 region: AY667278\u2013AY667313. Patched region: AY667348\u2013AY667370.\n\nPublished - Genome_Res.-2004-Boffelli-2406-11.pdf
Supplemental Material - Supplementary_Information.doc
", "abstract": "Analysis of sequence variation among members of a single species offers a potential approach to identify functional DNA elements responsible for biological features unique to that species. Due to its high rate of allelic polymorphism and ease of genetic manipulability, we chose the sea squirt, Ciona intestinalis, to explore intraspecies sequence comparisons for genome annotation. A large number of C. intestinalis specimens were collected from four continents, and a set of genomic intervals were amplified, resequenced, and analyzed to determine the mutation rates at each nucleotide in the sequence. We found that regions with low mutation rates efficiently demarcated functionally constrained sequences: these include a set of noncoding elements, which we showed in C. intestinalis transgenic assays to act as tissue-specific enhancers, as well as the location of coding sequences. This illustrates that comparisons of multiple members of a species can be used for genome annotation, suggesting a path for the annotation of the sequenced genomes of organisms occupying uncharacterized phylogenetic branches of the animal kingdom. 
It also raises the possibility that the resequencing of a large number of Homo sapiens individuals might be used to annotate the human genome and identify sequences defining traits unique to our species.", "date": "2004-12", "date_type": "published", "publication": "Genome Research", "volume": "14", "number": "12", "publisher": "Cold Spring Harbor Laboratory Press", "pagerange": "2406-2411", "id_number": "CaltechAUTHORS:20170308-131651119", "issn": "1088-9051", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170308-131651119", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "National Heart, Lung, and Blood Institute" }, { "agency": "NIH" }, { "agency": "Department of Energy (DOE)", "grant_number": "DE-AC03-78SF00098" } ] }, "doi": "10.1101/gr.3199704", "pmcid": "PMC534664", "primary_object": { "basename": "Genome_Res.-2004-Boffelli-2406-11.pdf", "url": "https://authors.library.caltech.edu/records/q9x6k-rkc63/files/Genome_Res.-2004-Boffelli-2406-11.pdf" }, "related_objects": [ { "basename": "Supplementary_Information.doc", "url": "https://authors.library.caltech.edu/records/q9x6k-rkc63/files/Supplementary_Information.doc" } ], "resource_type": "article", "pub_year": "2004", "author_list": "Boffelli, Dario; Weer, Claire V.; et al." }, { "id": "https://authors.library.caltech.edu/records/hn6be-bya11", "eprint_id": 74828, "eprint_status": "archive", "datestamp": "2023-08-19 14:41:36", "lastmod": "2023-10-24 23:20:31", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" }, { "id": "Sturmfels-B", "name": { "family": "Sturmfels", "given": "Bernd" } } ] }, "title": "Parametric Inference for Biological Sequence Analysis", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2004 The National Academy of Sciences. 
\n\nCommunicated by Stephen E. Fienberg, Carnegie Mellon University, Pittsburgh, PA, September 10, 2004 (received for review January 25, 2004) \n\nL.P. was supported in part by National Institutes of Health Grant R01-HG02362-02. B.S. was supported by a Hewlett Packard Visiting Research Professorship 2003/2004 at the Mathematical Sciences Research Institute (MSRI) at the University of California, Berkeley, and in part by National Science Foundation Grant DMS-0200729.\n\nPublished - PNAS-2004-Pachter-16138-43.pdf
Submitted - 0401033.pdf
", "abstract": "One of the major successes in computational biology has been the unification, by using the graphical model formalism, of a multitude of algorithms for annotating and comparing biological sequences. Graphical models that have been applied to these problems include hidden Markov models for annotation, tree models for phylogenetics, and pair hidden Markov models for alignment. A single algorithm, the sum-product algorithm, solves many of the inference problems that are associated with different statistical models. This article introduces the polytope propagation algorithm for computing the Newton polytope of an observation from a graphical model. This algorithm is a geometric version of the sum-product algorithm and is used to analyze the parametric behavior of maximum a posteriori inference calculations for graphical models.", "date": "2004-11-16", "date_type": "published", "publication": "Proceedings of the National Academy of Sciences of the United States of America", "volume": "101", "number": "46", "publisher": "National Academy of Sciences", "pagerange": "16138-16143", "id_number": "CaltechAUTHORS:20170307-081738298", "issn": "0027-8424", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170307-081738298", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NIH", "grant_number": "R01-HG02362-02" }, { "agency": "Hewlett-Packard Company" }, { "agency": "NSF", "grant_number": "DMS-0200729" } ] }, "doi": "10.1073/pnas.0406011101", "pmcid": "PMC528961", "primary_object": { "basename": "0401033.pdf", "url": "https://authors.library.caltech.edu/records/hn6be-bya11/files/0401033.pdf" }, "related_objects": [ { "basename": "PNAS-2004-Pachter-16138-43.pdf", "url": "https://authors.library.caltech.edu/records/hn6be-bya11/files/PNAS-2004-Pachter-16138-43.pdf" } ], "resource_type": "article", "pub_year": "2004", "author_list": "Pachter, Lior and 
Sturmfels, Bernd" }, { "id": "https://authors.library.caltech.edu/records/9npck-fag88", "eprint_id": 74825, "eprint_status": "archive", "datestamp": "2023-08-19 14:41:29", "lastmod": "2023-10-24 23:20:21", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" }, { "id": "Sturmfels-B", "name": { "family": "Sturmfels", "given": "Bernd" } } ] }, "title": "Tropical Geometry of Statistical Models", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2004 The National Academy of Sciences. \n\nCommunicated by Stephen E. Fienberg, Carnegie Mellon University, Pittsburgh, PA, September 10, 2004 (received for review January 25, 2004) \n\nWe thank Komei Fukuda, Michael Joswig, and Kristian Ranestad for their help in obtaining the computational results reported in section 2. L.P. was supported in part by National Institutes of Health Grant R01-HG02362-02. B.S. was supported by a Hewlett Packard Visiting Research Professorship 2003/2004 at the Mathematical Sciences Research Institute (MSRI) at the University of California, Berkeley, and in part by National Science Foundation Grant DMS-0200729.\n\nPublished - PNAS-2004-Pachter-16132-7.pdf
Submitted - 0311009.pdf
", "abstract": "This article presents a unified mathematical framework for inference in graphical models, building on the observation that graphical models are algebraic varieties. From this geometric viewpoint, observations generated from a model are coordinates of a point in the variety, and the sum-product algorithm is an efficient tool for evaluating specific coordinates. Here, we address the question of how the solutions to various inference problems depend on the model parameters. The proposed answer is expressed in terms of tropical algebraic geometry. The Newton polytope of a statistical model plays a key role. Our results are applied to the hidden Markov model and the general Markov model on a binary tree.", "date": "2004-11-16", "date_type": "published", "publication": "Proceedings of the National Academy of Sciences of the United States of America", "volume": "101", "number": "46", "publisher": "National Academy of Sciences", "pagerange": "16132-16137", "id_number": "CaltechAUTHORS:20170307-073504137", "issn": "0027-8424", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170307-073504137", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NIH", "grant_number": "R01-HG02362-02" }, { "agency": "Hewlett-Packard Company" }, { "agency": "NSF", "grant_number": "DMS-0200729" } ] }, "doi": "10.1073/pnas.0406010101", "pmcid": "PMC528960", "primary_object": { "basename": "0311009.pdf", "url": "https://authors.library.caltech.edu/records/9npck-fag88/files/0311009.pdf" }, "related_objects": [ { "basename": "PNAS-2004-Pachter-16132-7.pdf", "url": "https://authors.library.caltech.edu/records/9npck-fag88/files/PNAS-2004-Pachter-16132-7.pdf" } ], "resource_type": "article", "pub_year": "2004", "author_list": "Pachter, Lior and Sturmfels, Bernd" }, { "id": "https://authors.library.caltech.edu/records/h1p73-wax92", "eprint_id": 74915, "eprint_status": 
"archive", "datestamp": "2023-08-19 14:33:37", "lastmod": "2023-10-24 23:47:56", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Feingold-E-A", "name": { "family": "Feingold", "given": "E. A." } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "L." }, "orcid": "0000-0002-9164-6231" } ] }, "title": "The ENCODE (ENCyclopedia Of DNA Elements) Project", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2004 American Association for the Advancement of Science. \n\nThe Consortium thanks the ENCODE Scientific Advisory Panel for their helpful advice on the project: G. Weinstock, G. Churchill, M. Eisen, S. Elgin, S. Elledge, J. Rine, and M. Vidal. We thank D. Leja, and M. Cichanowski for their work in creating figures for this paper. Supported by the National Human Genome Research Institute, the National Library of Medicine, the Wellcome Trust, and the Howard Hughes Medical Institute.\n\nSupplemental Material - Feingold.SOM.DC1.pdf
Supplemental Material - Feingold.SOM.DC2.pdf
", "abstract": "The ENCyclopedia Of DNA Elements (ENCODE) Project aims to identify all functional elements in the human genome sequence. The pilot phase of the Project is focused on a specified 30 megabases (\u223c1%) of the human genome sequence and is organized as an international consortium of computational and laboratory-based scientists working to develop and apply high-throughput approaches for detecting all sequence elements that confer biological function. The results of this pilot phase will guide future efforts to analyze the entire human genome.", "date": "2004-10-22", "date_type": "published", "publication": "Science", "volume": "306", "number": "5696", "publisher": "American Association for the Advancement of Science", "pagerange": "636-640", "id_number": "CaltechAUTHORS:20170308-133717579", "issn": "0036-8075", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170308-133717579", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "National Human Genome Research Institute" }, { "agency": "National Library of Medicine" }, { "agency": "Wellcome Trust" }, { "agency": "Howard Hughes Medical Institute (HHMI)" } ] }, "corp_creators": { "items": [ "ENCODE Project Consortium" ] }, "doi": "10.1126/science.1105136", "primary_object": { "basename": "Feingold.SOM.DC1.pdf", "url": "https://authors.library.caltech.edu/records/h1p73-wax92/files/Feingold.SOM.DC1.pdf" }, "related_objects": [ { "basename": "Feingold.SOM.DC2.pdf", "url": "https://authors.library.caltech.edu/records/h1p73-wax92/files/Feingold.SOM.DC2.pdf" } ], "resource_type": "article", "pub_year": "2004", "author_list": "Feingold, E. A. and Pachter, L." 
}, { "id": "https://authors.library.caltech.edu/records/h1v4b-fzh55", "eprint_id": 74917, "eprint_status": "archive", "datestamp": "2023-08-19 14:06:33", "lastmod": "2023-10-24 23:48:04", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "McAuliffe-J-D", "name": { "family": "McAuliffe", "given": "Jon D." } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" }, { "id": "Jordan-M-I", "name": { "family": "Jordan", "given": "Michael I." } } ] }, "title": "Multiple-sequence functional annotation and the generalized hidden Markov phylogeny", "ispublished": "pub", "full_text_status": "restricted", "note": "\u00a9 2004 Oxford University Press. \n\nReceived on September 20, 2003; revised on January 4, 2004; accepted on January 20, 2004. Advance Access publication February 26, 2004. \n\nWe thank Dario Boffelli and Eddy Rubin for the sequence data that we have analyzed as well as many helpful discussions about the concepts of phylogenetic shadowing. We are also grateful to the anonymous referees for comments that led to several improvements. L.P. was supported in part by a grant from the NIH (R01-HG02362-02). M.J. was supported by a grant from the NSF (IIS-9988642).", "abstract": "Motivation: Phylogenetic shadowing is a comparative genomics principle that allows for the discovery of conserved regions in sequences from multiple closely related organisms. We develop a formal probabilistic framework for combining phylogenetic shadowing with feature-based functional annotation methods. The resulting model, a generalized hidden Markov phylogeny (GHMP), applies to a variety of situations where functional regions are to be inferred from evolutionary constraints. \n\nResults: We show how GHMPs can be used to predict complete shared gene structures in multiple primate sequences. We also describe shadower, our implementation of such a prediction system. 
We find that shadower outperforms previously reported ab initio gene finders, including comparative human\u2013mouse approaches, on a small sample of diverse exonic regions. Finally, we report on an empirical analysis of shadower's performance which reveals that as few as five well-chosen species may suffice to attain maximal sensitivity and specificity in exon demarcation. \n\nAvailability: A Web server is available at http://bonaire.lbl.gov/shadower", "date": "2004-08-12", "date_type": "published", "publication": "Bioinformatics", "volume": "20", "number": "12", "publisher": "Oxford University Press", "pagerange": "1850-1860", "id_number": "CaltechAUTHORS:20170308-135943475", "issn": "1367-4803", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170308-135943475", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NIH", "grant_number": "R01-HG02362-02" }, { "agency": "NSF", "grant_number": "IIS-9988642" } ] }, "doi": "10.1093/bioinformatics/bth153", "resource_type": "article", "pub_year": "2004", "author_list": "McAuliffe, Jon D.; Pachter, Lior; et al." }, { "id": "https://authors.library.caltech.edu/records/w4ajn-sw675", "eprint_id": 74918, "eprint_status": "archive", "datestamp": "2023-08-19 13:52:28", "lastmod": "2023-10-24 23:48:06", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Frazier-K-A", "name": { "family": "Frazer", "given": "Kelly A." } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" }, { "id": "Poliakov-A-N-B", "name": { "family": "Poliakov", "given": "Alexander" } }, { "id": "Rubin-E-M", "name": { "family": "Rubin", "given": "Edward M." 
} }, { "id": "Dubchak-I", "name": { "family": "Dubchak", "given": "Inna" } } ] }, "title": "VISTA: computational tools for comparative genomics", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2004, the authors. The online version of this article has been published under an open access model. Users are entitled to use, reproduce, disseminate, or display the open access version of this article provided that: the original authorship is properly and fully attributed; the Journal and Oxford University Press are attributed as the original place of publication with the correct citation details given; if an article is subsequently reproduced or disseminated not in its entirety but only in part or as a derivative work this must be clearly indicated. \n\nReceived February 16, 2004; Revised and Accepted April 26, 2004. \n\nWe are grateful to Simon Minovitsky, Shyam Prabhakar, Michael Teplitsky and Dmitriy Ryaboy for ongoing development, maintenance and support of all our tools. The VISTA project is a collaborative effort of a large group of scientists and engineers. Among contributors are Nicolas Bray (the author of AVID and contributor to the design of whole-genome alignment strategies), Michael Brudno (contributor to MLAGAN and multiple whole-genome alignment), Nameeta Shah (the author of Phylo-VISTA), Olivier Couronne (contributor to whole-genome alignment), Gabriela Loots (rVISTA), Ivan Ovcharenko (contributor to rVISTA and whole-genome alignment), Chris Mayor (contributor to mVISTA) and Brian Klock and Lila Tretikov (VISTA Browser). Our special thanks to the biologists of the Genomics Division at LBNL (Dario Boffelli, Jim Bristow, Jan-Fang Cheng, Marcelo Nobrega, Len Pennacchio, James Priest and many others) for their help, support and critical comments. 
This project was partially supported by the Programs for Genomic Applications grant from the NHLBI/NIH and the Office of Biological and Environmental Research, Office of Science, US Department of Energy.\n\nPublished - gkh458.pdf
", "abstract": "Comparison of DNA sequences from different species is a fundamental method for identifying functional elements in genomes. Here, we describe the VISTA family of tools created to assist biologists in carrying out this task. Our first VISTA server at http://www-gsd.lbl.gov/vista/ was launched in the summer of 2000 and was designed to align long genomic sequences and visualize these alignments with associated functional annotations. Currently the VISTA site includes multiple comparative genomics tools and provides users with rich capabilities to browse pre-computed whole-genome alignments of large vertebrate genomes and other groups of organisms with VISTA Browser, to submit their own sequences of interest to several VISTA servers for various types of comparative analysis and to obtain detailed comparative analysis results for a set of cardiovascular genes. We illustrate capabilities of the VISTA site by the analysis of a 180 kb interval on human chromosome 5 that encodes for the kinesin family member 3A (KIF3A) protein.", "date": "2004-07-01", "date_type": "published", "publication": "Nucleic Acids Research", "volume": "32", "number": "Suppl. 2", "publisher": "Oxford University Press", "pagerange": "W273-W279", "id_number": "CaltechAUTHORS:20170308-140652072", "issn": "0305-1048", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170308-140652072", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "National Heart, Lung, and Blood Institute" }, { "agency": "Department of Energy (DOE)" } ] }, "doi": "10.1093/nar/gkh458", "pmcid": "PMC441596", "primary_object": { "basename": "gkh458.pdf", "url": "https://authors.library.caltech.edu/records/w4ajn-sw675/files/gkh458.pdf" }, "resource_type": "article", "pub_year": "2004", "author_list": "Frazer, Kelly A.; Pachter, Lior; et el." 
}, { "id": "https://authors.library.caltech.edu/records/2v174-rtt49", "eprint_id": 74827, "eprint_status": "archive", "datestamp": "2023-08-22 02:01:46", "lastmod": "2023-10-24 23:20:26", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Pachter-L", "name": { "family": "Pachter", "given": "L." }, "orcid": "0000-0002-9164-6231" }, { "id": "Speyer-D", "name": { "family": "Speyer", "given": "D." } } ] }, "title": "Reconstructing Trees from Subtree Weights", "ispublished": "pub", "full_text_status": "public", "keywords": "Phylogenetics; Tree; Reconstruction; Algorithm; Tropical", "note": "\u00a9 2004 Elsevier. \n\n(Received December 2003; accepted January 2004)\n\nWe thank B. Sturmfels for many comments which improved the manuscript. L. Pachter was partially supported by a Grant from the NIH (R01-HG02362-02).\n\nSubmitted - 0311156.pdf
", "abstract": "The tree-metric theorem provides a necessary and sufficient condition for a dissimilarity matrix to be a tree metric, and has served as the foundation for numerous distance-based reconstruction methods in phylogenetics. Our main result is an extension of the tree-metric theorem to more general dissimilarity maps. In particular, we show that a tree with n leaves is reconstructible from the weights of the m-leaf subtrees provided that n \u2265 2m - 1.", "date": "2004-06", "date_type": "published", "publication": "Applied Mathematics Letters", "volume": "17", "number": "6", "publisher": "Elsevier", "pagerange": "615-621", "id_number": "CaltechAUTHORS:20170307-080948323", "issn": "0893-9659", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170307-080948323", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NIH", "grant_number": "R01-HG02362-02" } ] }, "doi": "10.1016/S0893-9659(04)90095-X", "primary_object": { "basename": "0311156.pdf", "url": "https://authors.library.caltech.edu/records/2v174-rtt49/files/0311156.pdf" }, "resource_type": "article", "pub_year": "2004", "author_list": "Pachter, L. and Speyer, D." }, { "id": "https://authors.library.caltech.edu/records/t95d7-66x59", "eprint_id": 74926, "eprint_status": "archive", "datestamp": "2023-08-19 13:24:31", "lastmod": "2023-10-24 23:48:40", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Gibbs-R-A", "name": { "family": "Gibbs", "given": "Richard A." } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "Genome sequence of the Brown Norway rat yields insights into mammalian evolution", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2004 Macmillan Publishers Limited. \n\nReceived 31 December 2003; Accepted 20 February 2004. 
\n\nWork at Baylor College of Medicine was supported by a grant from the NHGRI and NHLBI to R.A.G. Work at Genome Therapeutics was supported by grants from the NHGRI to D.S. A.S. acknowledges support from the NIGMS. M.B. acknowledges support from the NIH. N.H. was supported by the NGFN/BMBF (German Ministry for Research and Education). B.J.T. and J.M.Y. are supported by an NIH grant from the NIDCD. K.M.R. and G.M.C. are Howard Hughes Medical Institute Predoctoral Fellows. L.M.D'S., K.M. and K.J.K. are supported by training fellowships from the W. M. Keck Foundation to the Gulf Coast Consortia through the Keck Center for Computational and Structural Biology. Work at Case Western Reserve was supported in part by NIH grants to E.E.E. Work at IMIM was supported by a grant from Plan Nacional de I + D (Spain). M.M.A. acknowledges support from programme Ram\u00f3n y Cajal and a grant from the Spanish Ministry of Science and Technology. Work at Universidad de Oviedo was supported by grants from the European Union, Obra Social Cajastur and Gobierno del Principado de Asturias. Work at Penn State University was supported by NHGRI grants. Work at the University of California Berkeley was supported by a grant from the NIH. Work at the Washington University School of Medicine Genome Sequencing Center and the British Columbia Cancer Agency Genome Sciences Centre was supported by an NIH grant. Work at UCSC and CHORI was supported by the NHGRI. \n\nThe author declares no competing financial interests.\n\nSupplemental Material - nature02426-s1.doc
Supplemental Material - nature02426-s10.doc
Supplemental Material - nature02426-s11.doc
Supplemental Material - nature02426-s12.doc
Supplemental Material - nature02426-s2.pdf
Supplemental Material - nature02426-s3.jpg
Supplemental Material - nature02426-s4.pdf
Supplemental Material - nature02426-s5.jpg
Supplemental Material - nature02426-s6.jpg
Supplemental Material - nature02426-s7.pdf
Supplemental Material - nature02426-s8.pdf
Supplemental Material - nature02426-s9.pdf
", "abstract": "The laboratory rat (Rattus norvegicus) is an indispensable tool in experimental medicine and drug development, having made inestimable contributions to human health. We report here the genome sequence of the Brown Norway (BN) rat strain. The sequence represents a high-quality 'draft' covering over 90% of the genome. The BN rat sequence is the third complete mammalian genome to be deciphered, and three-way comparisons with the human and mouse genomes resolve details of mammalian evolution. This first comprehensive analysis includes genes and proteins and their relation to human disease, repeated sequences, comparative genome-wide studies of mammalian orthologous chromosomal regions and rearrangement breakpoints, reconstruction of ancestral karyotypes and the events leading to existing species, rates of variation, and lineage-specific and lineage-independent evolutionary events such as expansion of gene families, orthology relations and protein evolution.", "date": "2004-04-01", "date_type": "published", "publication": "Nature", "volume": "428", "number": "6982", "publisher": "Nature Publishing Group", "pagerange": "493-521", "id_number": "CaltechAUTHORS:20170308-145138306", "issn": "0028-0836", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170308-145138306", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "National Human Genome Research Institute" }, { "agency": "National Heart, Lung, and Blood Institute" }, { "agency": "National Institute of General Medical Sciences" }, { "agency": "Bundesministerium f\u00fcr Bildung und Forschung (BMBF)" }, { "agency": "NIH" }, { "agency": "Howard Hughes Medical Institute (HHMI)" }, { "agency": "W. M. 
Keck Foundation" }, { "agency": "Plan Nacional de I + D (Spain)" }, { "agency": "Ram\u00f3n y Cajal Programme" }, { "agency": "Ministerio de Ciencia y Tecnolog\u00eda (MYCT)" }, { "agency": "European Union" }, { "agency": "Obra Social Cajastur" }, { "agency": "Gobierno del Principado de Asturias" } ] }, "corp_creators": { "items": [ "Rat Genome Sequencing Project Consortium" ] }, "doi": "10.1038/nature02426", "primary_object": { "basename": "nature02426-s1.doc", "url": "https://authors.library.caltech.edu/records/t95d7-66x59/files/nature02426-s1.doc" }, "related_objects": [ { "basename": "nature02426-s12.doc", "url": "https://authors.library.caltech.edu/records/t95d7-66x59/files/nature02426-s12.doc" }, { "basename": "nature02426-s4.pdf", "url": "https://authors.library.caltech.edu/records/t95d7-66x59/files/nature02426-s4.pdf" }, { "basename": "nature02426-s5.jpg", "url": "https://authors.library.caltech.edu/records/t95d7-66x59/files/nature02426-s5.jpg" }, { "basename": "nature02426-s6.jpg", "url": "https://authors.library.caltech.edu/records/t95d7-66x59/files/nature02426-s6.jpg" }, { "basename": "nature02426-s7.pdf", "url": "https://authors.library.caltech.edu/records/t95d7-66x59/files/nature02426-s7.pdf" }, { "basename": "nature02426-s10.doc", "url": "https://authors.library.caltech.edu/records/t95d7-66x59/files/nature02426-s10.doc" }, { "basename": "nature02426-s11.doc", "url": "https://authors.library.caltech.edu/records/t95d7-66x59/files/nature02426-s11.doc" }, { "basename": "nature02426-s2.pdf", "url": "https://authors.library.caltech.edu/records/t95d7-66x59/files/nature02426-s2.pdf" }, { "basename": "nature02426-s3.jpg", "url": "https://authors.library.caltech.edu/records/t95d7-66x59/files/nature02426-s3.jpg" }, { "basename": "nature02426-s8.pdf", "url": "https://authors.library.caltech.edu/records/t95d7-66x59/files/nature02426-s8.pdf" }, { "basename": "nature02426-s9.pdf", "url": 
"https://authors.library.caltech.edu/records/t95d7-66x59/files/nature02426-s9.pdf" } ], "resource_type": "article", "pub_year": "2004", "author_list": "Gibbs, Richard A. and Pachter, Lior" }, { "id": "https://authors.library.caltech.edu/records/9kwva-gf330", "eprint_id": 74924, "eprint_status": "archive", "datestamp": "2023-08-19 13:21:43", "lastmod": "2023-10-24 23:48:31", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Dewey-C-N", "name": { "family": "Dewey", "given": "Colin" } }, { "id": "Wu-Jia-Qian", "name": { "family": "Wu", "given": "Jia Qian" } }, { "id": "Cawley-S", "name": { "family": "Cawley", "given": "Simon" } }, { "id": "Alexandersson-M", "name": { "family": "Alexandersson", "given": "Marina" } }, { "id": "Gibbs-R-A", "name": { "family": "Gibbs", "given": "Richard" } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "Accurate Identification of Novel Human Genes Through Simultaneous Gene Prediction in Human, Mouse, and Rat", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2004 Cold Spring Harbor Laboratory Press. The Authors acknowledge that six months after the full-issue publication date, the Article will be distributed under a Creative Commons CC-BY-NC License (Attribution-NonCommercial 4.0 International License, http://creativecommons.org/licenses/by-nc/4.0/). \n\nAccepted January 26, 2004. Received November 5, 2003. \n\nL.P. and C.D. were partially supported by NIH grant R01 HG2362-2. The whole-genome SLAM runs were performed on the Affymetrix computing cluster. R.G. and J.Q.W. were partially supported by grants from the NHGRI/NHLBI (1 U54 HG02345) and NCI/SAIC (20XS182A). \n\nThe publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked \"advertisement\" in accordance with 18 USC section 1734 solely to indicate this fact.\n\nPublished - 661.full.pdf
", "abstract": "We describe a new method for simultaneously identifying novel homologous genes with identical structure in the human, mouse, and rat genomes by combining pairwise predictions made with the SLAM gene-finding program. Using this method, we found 3698 gene triples in the human, mouse, and rat genomes which are predicted with exactly the same gene structure. We show, both computationally and experimentally, that the introns of these triples are predicted accurately as compared with the introns of other ab initio gene prediction sets. Computationally, we compared the introns of these gene triples, as well as those from other ab initio gene finders, with known intron annotations. We show that a unique property of SLAM, namely that it predicts gene structures simultaneously in two organisms, is key to producing sets of predictions that are highly accurate in intron structure when combined with other programs. Experimentally, we performed reverse transcription-polymerase chain reaction (RT-PCR) in both the human and rat to test the exon pairs flanking introns from a subset of the gene triples for which the human gene had not been previously identified. 
By performing RT-PCR on orthologous introns in both the human and rat genomes, we additionally explore the validity of using RT-PCR as a method for confirming gene predictions.", "date": "2004-04", "date_type": "published", "publication": "Genome Research", "volume": "14", "number": "4", "publisher": "Cold Spring Harbor Laboratory Press", "pagerange": "661-664", "id_number": "CaltechAUTHORS:20170308-144150791", "issn": "1088-9051", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170308-144150791", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NIH", "grant_number": "R01 HG2362-2" }, { "agency": "NIH", "grant_number": "1 U54 HG02345" }, { "agency": "National Human Genome Research Institute" }, { "agency": "National Heart, Lung and Blood Institute" }, { "agency": "National Cancer Institute", "grant_number": "20XS182A" } ] }, "doi": "10.1101/gr.1939804", "pmcid": "PMC383310", "primary_object": { "basename": "661.full.pdf", "url": "https://authors.library.caltech.edu/records/9kwva-gf330/files/661.full.pdf" }, "resource_type": "article", "pub_year": "2004", "author_list": "Dewey, Colin; Wu, Jia Qian; et al." }, { "id": "https://authors.library.caltech.edu/records/vf6w0-hrb56", "eprint_id": 74925, "eprint_status": "archive", "datestamp": "2023-08-19 13:21:48", "lastmod": "2023-10-24 23:48:34", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Yap-Von-Bing", "name": { "family": "Yap", "given": "Von Bing" } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "Identification of Evolutionary Hotspots in the Rodent Genomes", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2004 Cold Spring Harbor Laboratory Press. 
The Authors acknowledge that six months after the full-issue publication date, the Article will be distributed under a Creative Commons CC-BY-NC License (Attribution-NonCommercial 4.0 International License, http://creativecommons.org/licenses/by-nc/4.0/). \n\nAccepted December 27, 2003. Received September 11, 2003. \n\nWe thank Nicolas Bray and Colin Dewey for generating the whole-genome alignments; and Greg Cooper, Ross Hardison, David Haussler, Webb Miller, and Arend Sidow for extensive discussions. Special appreciation goes to Krishna Roskin for reconciling the findings of the different groups. We also thank the referees for many helpful comments and suggestions. L.P. and V.B.Y. were partially funded by a grant from the NIH (R01-HG02362-01). \n\nThe publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked \"advertisement\" in accordance with 18 USC section 1734 solely to indicate this fact.\n\nPublished - 574.full.pdf
", "abstract": "We describe a whole-genome comparative analysis of the human, mouse, and rat genomes to describe the average substitution patterns of four genomic regions: ancient repeats, rodent-specific DNA, exons, and conserved (coding and noncoding) regions, and to identify rodent evolutionary hotspots. In all types of regions, except the rodent-specific DNA, the rat branch is slightly longer than the mouse branch. Moreover, the mouse\u2013rat distance is longer in the rodent-specific DNA than in the ancient repeats. Analysis of individual conserved regions with different substitution models yielded the conclusion that the Jukes\u2013Cantor model is inadequate, and the Hasegawa\u2013Kishino\u2013Yano model is almost as good as the REV model. Using human as an outgroup, we identified 5055 evolutionary hotspots, which are highly conserved subalignment blocks (each consisting of at least 100 aligned sites and a small fraction of gaps) with a large and statistically significant difference in the branch lengths of the rodent species. The cutoffs used to identify the hotspots are partially based on estimates of the average rates of substitution. The fractions of hotspots overlapping with the rodent RefSeq genes, RefSeq exons, and ESTs are all higher than expected. Still, more than half of the hotspots lie in noncoding regions of the mouse genome. 
We believe that the hotspots represent biologically interesting regions in the rodent genomes.", "date": "2004-04", "date_type": "published", "publication": "Genome Research", "volume": "14", "number": "4", "publisher": "Cold Spring Harbor Laboratory Press", "pagerange": "574-579", "id_number": "CaltechAUTHORS:20170308-144803591", "issn": "1088-9051", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170308-144803591", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NIH", "grant_number": "R01-HG02362-01" } ] }, "doi": "10.1101/gr.1967904", "pmcid": "PMC383301", "primary_object": { "basename": "574.full.pdf", "url": "https://authors.library.caltech.edu/records/vf6w0-hrb56/files/574.full.pdf" }, "resource_type": "article", "pub_year": "2004", "author_list": "Yap, Von Bing and Pachter, Lior" }, { "id": "https://authors.library.caltech.edu/records/nb32q-zt780", "eprint_id": 74826, "eprint_status": "archive", "datestamp": "2023-08-19 13:21:34", "lastmod": "2023-10-24 23:20:23", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Bray-N-L", "name": { "family": "Bray", "given": "Nicolas" } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "MAVID: Constrained ancestral alignment of multiple sequences", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2004 Cold Spring Harbor Laboratory Press. The Authors acknowledge that six months after the full-issue publication date, the Article will be distributed under a Creative Commons CC-BY-NC License (Attribution-NonCommercial 4.0 International License, http://creativecommons.org/licenses/by-nc/4.0/). \n\nAccepted November 17, 2003. Received September 10, 2003. \n\nograms. We thank Von Bing Yap for helping with the evolutionary models used in MAVID. 
Thanks to Ingileif Brynd\u00eds Hallgr\u00edmsd\u00f3ttir for her help throughout the project and for her comments on the final manuscript. The data used in the multiple alignment of the CFTR region was generated by the NIH Intramural Sequencing Center (www.nisc.nih.gov), and was used subject to their 6-mo hold policy. The HIV sequences were downloaded from the HIV database (hiv-web.lanl.gov). Thanks also to the Rat Sequencing Consortium, both for providing the rat sequence to align, and for facilitating helpful collaborations and discussions. Finally, we thank the anonymous reviewers for their insightful comments and suggestions. This work was partially supported by funding from the NIH (grant R01-HG02362-01) and the Berkeley PGA grant from the NHLBI. \n\nThe publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked \"advertisement\" in accordance with 18 USC section 1734 solely to indicate this fact.\n\nPublished - 693.full.pdf
Submitted - 0311018.pdf
", "abstract": "We describe a new global multiple-alignment program capable of aligning a large number of genomic regions. Our progressive-alignment approach incorporates the following ideas: maximum-likelihood inference of ancestral sequences, automatic guide-tree construction, protein-based anchoring of ab-initio gene predictions, and constraints derived from a global homology map of the sequences. We have implemented these ideas in the MAVID program, which is able to accurately align multiple genomic regions up to megabases long. MAVID is able to effectively align divergent sequences, as well as incomplete unfinished sequences. We demonstrate the capabilities of the program on the benchmark CFTR region, which consists of 1.8 Mb of human sequence and 20 orthologous regions in marsupials, birds, fish, and mammals. Finally, we describe two large MAVID alignments, an alignment of all the available HIV genomes and a multiple alignment of the entire human, mouse, and rat genomes.", "date": "2004-04", "date_type": "published", "publication": "Genome Research", "volume": "14", "number": "4", "publisher": "Cold Spring Harbor Laboratory Press", "pagerange": "693-699", "id_number": "CaltechAUTHORS:20170307-074220313", "issn": "1088-9051", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170307-074220313", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NIH", "grant_number": "R01-HG02362-01" }, { "agency": "National Heart, Lung, and Blood Institute" } ] }, "doi": "10.1101/gr.1960404", "pmcid": "PMC383315", "primary_object": { "basename": "0311018.pdf", "url": "https://authors.library.caltech.edu/records/nb32q-zt780/files/0311018.pdf" }, "related_objects": [ { "basename": "693.full.pdf", "url": "https://authors.library.caltech.edu/records/nb32q-zt780/files/693.full.pdf" } ], "resource_type": "article", "pub_year": "2004", "author_list": "Bray, Nicolas and 
Pachter, Lior" }, { "id": "https://authors.library.caltech.edu/records/bzm2q-a6208", "eprint_id": 74921, "eprint_status": "archive", "datestamp": "2023-08-19 13:21:39", "lastmod": "2023-10-24 23:48:19", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Chakrabarti-K", "name": { "family": "Chakrabarti", "given": "Kushal" } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "Visualization of Multiple Genome Annotations and Alignments With the K-BROWSER", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2004 Cold Spring Harbor Laboratory Press. The Authors acknowledge that six months after the full-issue publication date, the Article will be distributed under a Creative Commons CC-BY-NC License (Attribution-NonCommercial 4.0 International License, http://creativecommons.org/licenses/by-nc/4.0/). \n\nAccepted November 17, 2003. Received September 10, 2003. \n\nWe thank Nicolas Bray and Colin Dewey for suggestions and help with the alignments. Yin Lau helped with the Web site design. L.P. was partially supported by the NIH (R02-HG02362-01), and K.C. was partially supported by a COR grant from UC Berkeley. \n\nThe publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked \"advertisement\" in accordance with 18 USC section 1734 solely to indicate this fact. \n\nThe K-BROWSER can be accessed at http://hanuman.math.berkeley.edu/kbrowser/. The source code is available upon request.\n\nPublished - 716.full.pdf
", "abstract": "We introduce a novel genome browser application, the K-BROWSER, that allows intuitive visualization of biological information across an arbitrary number of multiply aligned genomes. In particular, the K-BROWSER simultaneously displays an arbitrary number of genomes both through overlaid annotations and predictions that describe their respective characteristics, and through the multiple alignment that describes their global relationship to one another. The browsing environment has been designed to allow users seamless access to information available in every genome and, furthermore, to allow easy navigation within and between genomes. As of the date of publication, the K-BROWSER has been set up on the human, mouse, and rat genomes.", "date": "2004-04", "date_type": "published", "publication": "Genome Research", "volume": "14", "number": "4", "publisher": "Cold Spring Harbor Laboratory Press", "pagerange": "716-720", "id_number": "CaltechAUTHORS:20170308-142044905", "issn": "1088-9051", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170308-142044905", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NIH", "grant_number": "R02-HG02362-01" }, { "agency": "University of California, Berkeley" } ] }, "doi": "10.1101/gr.1957004", "pmcid": "PMC383318", "primary_object": { "basename": "716.full.pdf", "url": "https://authors.library.caltech.edu/records/bzm2q-a6208/files/716.full.pdf" }, "resource_type": "article", "pub_year": "2004", "author_list": "Chakrabarti, Kushal and Pachter, Lior" }, { "id": "https://authors.library.caltech.edu/records/dkg16-32f71", "eprint_id": 74929, "eprint_status": "archive", "datestamp": "2023-08-19 12:03:49", "lastmod": "2023-10-24 23:48:53", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Cawley-S", "name": { "family": "Cawley", "given": "Simon L." 
} }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "HMM sampling and applications to gene finding and alternative splicing", "ispublished": "pub", "full_text_status": "public", "keywords": "suboptimal parses, sampling, hidden Markov model, conserved alternative splicing", "note": "\u00a9 2003 Oxford University Press. \n\nReceived on March 17, 2003; accepted on June 9, 2003. \n\nThe authors would like to thank Michael Siani-Rose for useful discussions. This work was partially supported by NIH grant R01-HG02362-01.\n\nPublished - btg1057.pdf
", "abstract": "The standard method of applying hidden Markov models to biological problems is to find a Viterbi (maximal weight) path through the HMM graph. The Viterbi algorithm reduces the problem of finding the most likely hidden state sequence that explains given observations, to a dynamic programming problem for corresponding directed acyclic graphs. For example, in the gene finding application, the HMM is used to find the most likely underlying gene structure given a DNA sequence. In this note we discuss the applications of sampling methods for HMMs. The standard sampling algorithm for HMMs is a variant of the common forward-backward and backtrack algorithms, and has already been applied in the context of Gibbs sampling methods. Nevertheless, the practice of sampling state paths from HMMs does not seem to have been widely adopted, and important applications have been overlooked. We show how sampling can be used for finding alternative splicings for genes, including alternative splicings that are conserved between genes from related organisms. We also show how sampling from the posterior distribution is a natural way to compute probabilities for predicted exons and gene structures being correct under the assumed model. Finally, we describe a new memory efficient sampling algorithm for certain classes of HMMs which provides a practical sampling alternative to the Hirschberg algorithm for optimal alignment. 
The ideas presented have applications not only to gene finding and HMMs but more generally to stochastic context free grammars and RNA structure prediction.", "date": "2003-09-27", "date_type": "published", "publication": "Bioinformatics", "volume": "19", "number": "Suppl 2", "publisher": "Oxford University Press", "pagerange": "ii36-ii41", "id_number": "CaltechAUTHORS:20170308-151105303", "issn": "1367-4803", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170308-151105303", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NIH", "grant_number": "R01-HG02362-01" } ] }, "doi": "10.1093/bioinformatics/btg1057", "primary_object": { "basename": "btg1057.pdf", "url": "https://authors.library.caltech.edu/records/dkg16-32f71/files/btg1057.pdf" }, "resource_type": "article", "pub_year": "2003", "author_list": "Cawley, Simon L. and Pachter, Lior" }, { "id": "https://authors.library.caltech.edu/records/n5qhb-91b08", "eprint_id": 74931, "eprint_status": "archive", "datestamp": "2023-08-19 11:46:05", "lastmod": "2023-10-24 23:49:04", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Lam-Fumei", "name": { "family": "Lam", "given": "Fumei" } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "Forcing numbers of stop signs", "ispublished": "pub", "full_text_status": "restricted", "note": "\u00a9 2002 Elsevier.", "abstract": "Let G be a graph with a perfect matching M. The forcing number of M is the smallest number of edges in a subset S\u2282M such that S is contained in no other perfect matching of G. We present methods for determining bounds on forcing numbers and apply these methods to find bounds for the forcing numbers of stop signs. 
A consequence of our main result is that every perfect matching of a stop sign of size (n,k) contains at least n disjoint alternating cycles.", "date": "2003-07-15", "date_type": "published", "publication": "Theoretical Computer Science", "volume": "303", "number": "2-3", "publisher": "Elsevier", "pagerange": "409-416", "id_number": "CaltechAUTHORS:20170308-151538288", "issn": "0304-3975", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170308-151538288", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "doi": "10.1016/S0304-3975(02)00499-1", "resource_type": "article", "pub_year": "2003", "author_list": "Lam, Fumei and Pachter, Lior" }, { "id": "https://authors.library.caltech.edu/records/07qjz-fdj71", "eprint_id": 74932, "eprint_status": "archive", "datestamp": "2023-08-19 11:43:50", "lastmod": "2023-10-24 23:49:06", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Bray-N-L", "name": { "family": "Bray", "given": "Nicolas" } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "MAVID multiple alignment server", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2003 Oxford University Press. \n\nReceived February 15, 2003; Revised April 4, 2003; Accepted April 16, 2003.\n\nPublished - gkg623.pdf
", "abstract": "MAVID is a multiple alignment program suitable for many large genomic regions. The MAVID web server allows biomedical researchers to quickly obtain multiple alignments for genomic sequences and to subsequently analyse the alignments for conserved regions. MAVID has been successfully used for the alignment of closely related species such as primates and also for the alignment of more distant organisms such as human and fugu. The server is fast, capable of aligning hundreds of kilobases in less than a minute. The multiple alignment is used to build a phylogenetic tree for the sequences, which is subsequently used as a basis for identifying conserved regions in the alignment. The server can be accessed at http://baboon.math.berkeley.edu/mavid/.", "date": "2003-07-01", "date_type": "published", "publication": "Nucleic Acids Research", "volume": "31", "number": "13", "publisher": "Oxford University Press", "pagerange": "3525-3526", "id_number": "CaltechAUTHORS:20170308-152103068", "issn": "1362-4962", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170308-152103068", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "doi": "10.1093/nar/gkg623", "pmcid": "PMC169029", "primary_object": { "basename": "gkg623.pdf", "url": "https://authors.library.caltech.edu/records/07qjz-fdj71/files/gkg623.pdf" }, "resource_type": "article", "pub_year": "2003", "author_list": "Bray, Nicolas and Pachter, Lior" }, { "id": "https://authors.library.caltech.edu/records/s4hgs-n2n71", "eprint_id": 74934, "eprint_status": "archive", "datestamp": "2023-08-19 11:43:55", "lastmod": "2023-10-24 23:49:15", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Cawley-S", "name": { "family": "Cawley", "given": "Simon" } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" }, { "id": "Alexandersson-M", "name": { "family": 
"Alexandersson", "given": "Marina" } } ] }, "title": "SLAM web server for comparative gene finding and alignment", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2003 Oxford University Press. \n\nReceived February 14, 2003; Revised and Accepted April 3, 2003. \n\nWe thank Nicolas Bray for helping to set up the web server site. Colin Dewey has helped in performing and analyzing the SLAM whole genome runs. L.P. and S.C. are partially supported by a grant from the NIH (R01-HG02362-01). M.A. is supported by the Swedish Foundation for Strategic Research.\n\nPublished - gkg583.pdf
", "abstract": "SLAM is a program that simultaneously aligns and annotates pairs of homologous sequences. The SLAM web server integrates SLAM with repeat masking tools and the AVID alignment program to allow for rapid alignment and gene prediction in user submitted sequences. Along with annotations and alignments for the submitted sequences, users obtain a list of predicted conserved non-coding sequences (and their associated alignments). The web site also links to whole genome annotations of the human, mouse and rat genomes produced with the SLAM program. The server can be accessed at http://bio.math.berkeley.edu/slam.", "date": "2003-07-01", "date_type": "published", "publication": "Nucleic Acids Research", "volume": "31", "number": "13", "publisher": "Oxford University Press", "pagerange": "3507-3509", "id_number": "CaltechAUTHORS:20170308-152527629", "issn": "1362-4962", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170308-152527629", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NIH", "grant_number": "R01-HG02362-01" }, { "agency": "Swedish Foundation for Strategic Research" } ] }, "doi": "10.1093/nar/gkg583", "pmcid": "PMC168989", "primary_object": { "basename": "gkg583.pdf", "url": "https://authors.library.caltech.edu/records/s4hgs-n2n71/files/gkg583.pdf" }, "resource_type": "article", "pub_year": "2003", "author_list": "Cawley, Simon; Pachter, Lior; et el." 
}, { "id": "https://authors.library.caltech.edu/records/azh9j-rwv53", "eprint_id": 74936, "eprint_status": "archive", "datestamp": "2023-08-19 11:42:21", "lastmod": "2023-10-24 23:49:22", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Lam-Fumei", "name": { "family": "Lam", "given": "Fumei" } }, { "id": "Alexandersson-M", "name": { "family": "Alexandersson", "given": "Marina" } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "Picking Alignments from (Steiner) Trees", "ispublished": "pub", "full_text_status": "restricted", "keywords": "alignment, Steiner tree, hidden Markov model", "note": "\u00a9 2003 Mary Ann Liebert, Inc. \n\nM.A. was partially supported by STINT, the Swedish Foundation for International Cooperation in Research and Higher Education, and the Center for Pure and Applied Mathematics at U.C. Berkeley. Thanks to Nick Bray for helping with some of the implementation of SLIM and to Simon Cawley for helpful discussions and suggestions.", "abstract": "The application of Needleman\u2013Wunsch alignment techniques to biological sequences is complicated by two serious problems when the sequences are long: the running time, which scales as the product of the lengths of sequences, and the difficulty in obtaining suitable parameters that produce meaningful alignments. The running time problem is often corrected by reducing the search space, using techniques such as banding, or chaining of high-scoring pairs. The parameter problem is more difficult to fix, partly because the probabilistic model, which Needleman\u2013Wunsch is equivalent to, does not capture a key feature of biological sequence alignments, namely the alternation of conserved blocks and seemingly unrelated nonconserved segments. 
We present a solution to the problem of designing efficient search spaces for pair hidden Markov models that align biological sequences by taking advantage of their associated features. Our approach leads to an optimization problem, for which we obtain a 2-approximation algorithm, and that is based on the construction of Manhattan networks, which are close relatives of Steiner trees. We describe the underlying theory and show how our methods can be applied to alignment of DNA sequences in practice, successfully reducing the Viterbi algorithm search space of alignment PHMMs by three orders of magnitude.", "date": "2003-07", "date_type": "published", "publication": "Journal of Computational Biology", "volume": "10", "number": "3-4", "publisher": "Mary Ann Liebert, Inc.", "pagerange": "509-520", "id_number": "CaltechAUTHORS:20170308-153301235", "issn": "1066-5277", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170308-153301235", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "Swedish Foundation for International Cooperation in Research and Higher Education (STINT)" }, { "agency": "University of California, Berkeley" } ] }, "doi": "10.1089/10665270360688156", "resource_type": "article", "pub_year": "2003", "author_list": "Lam, Fumei; Alexandersson, Marina; et al." 
}, { "id": "https://authors.library.caltech.edu/records/ahkxa-cwz69", "eprint_id": 74938, "eprint_status": "archive", "datestamp": "2023-08-19 11:08:46", "lastmod": "2023-10-24 23:49:32", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Alexandersson-M", "name": { "family": "Alexandersson", "given": "Marina" } }, { "id": "Cawley-S", "name": { "family": "Cawley", "given": "Simon" } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "SLAM: Cross-Species Gene Finding and Alignment with a Generalized Pair Hidden Markov Model", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2003 Cold Spring Harbor Laboratory Press. The Authors acknowledge that six months after the full-issue publication date, the Article will be distributed under a Creative Commons CC-BY-NC License (Attribution-NonCommercial 4.0 International License, http://creativecommons.org/licenses/by-nc/4.0/). \n\nReceived May 13, 2002. Accepted December 3, 2002. \n\nWe thank Terry Speed and David Kulp for helpful suggestions and support, and James Harley Gorrell for technical computing advice. Marina Alexandersson was supported by STINT, the Swedish Foundation for International Cooperation in Research and Higher Education. This work was partially supported by NIH grant R01 HG02362-01. \n\nThe publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked \"advertisement\" in accordance with 18 USC section 1734 solely to indicate this fact.\n\nPublished - 496.full.pdf
", "abstract": "Comparative-based gene recognition is driven by the principle that conserved regions between related organisms are more likely than divergent regions to be coding. We describe a probabilistic framework for gene structure and alignment that can be used to simultaneously find both the gene structure and alignment of two syntenic genomic regions. A key feature of the method is the ability to enhance gene predictions by finding the best alignment between two syntenic sequences, while at the same time finding biologically meaningful alignments that preserve the correspondence between coding exons. Our probabilistic framework is the generalized pair hidden Markov model, a hybrid of (1) generalized hidden Markov models, which have been used previously for gene finding, and (2) pair hidden Markov models, which have applications to sequence alignment. We have built a gene finding and alignment program called SLAM, which aligns and identifies complete exon/intron structures of genes in two related but unannotated sequences of DNA. SLAM is able to reliably predict gene structures for any suitably related pair of organisms, most notably with fewer false-positive predictions compared to previous methods (examples are provided for Homo sapiens/Mus musculus and Plasmodium falciparum/Plasmodium vivax comparisons). Accuracy is obtained by distinguishing conserved noncoding sequence (CNS) from conserved coding sequence. 
CNS annotation is a novel feature of SLAM and may be useful for the annotation of UTRs, regulatory elements, and other noncoding features.", "date": "2003-03-01", "date_type": "published", "publication": "Genome Research", "volume": "13", "number": "3", "publisher": "Cold Spring Harbor Laboratory Press", "pagerange": "496-502", "id_number": "CaltechAUTHORS:20170308-154151410", "issn": "1088-9051", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170308-154151410", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "Swedish Foundation for International Cooperation in Research and Higher Education (STINT)" }, { "agency": "NIH", "grant_number": "R01-HG02362-01" } ] }, "doi": "10.1101/gr.424203", "pmcid": "PMC430255", "primary_object": { "basename": "496.full.pdf", "url": "https://authors.library.caltech.edu/records/ahkxa-cwz69/files/496.full.pdf" }, "resource_type": "article", "pub_year": "2003", "author_list": "Alexandersson, Marina; Cawley, Simon; et al." }, { "id": "https://authors.library.caltech.edu/records/hty8p-scq52", "eprint_id": 74939, "eprint_status": "archive", "datestamp": "2023-09-22 17:07:02", "lastmod": "2023-10-23 23:21:44", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Boffelli-D", "name": { "family": "Boffelli", "given": "Dario" } }, { "id": "McAuliffe-J-D", "name": { "family": "McAuliffe", "given": "Jon" } }, { "id": "Ovcharenko-D", "name": { "family": "Ovcharenko", "given": "Dmitriy" } }, { "id": "Lewis-K-D", "name": { "family": "Lewis", "given": "Keith D." } }, { "id": "Ovcharenko-I", "name": { "family": "Ovcharenko", "given": "Ivan" } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" }, { "id": "Rubin-E-M", "name": { "family": "Rubin", "given": "Edward M." 
} } ] }, "title": "Phylogenetic Shadowing of Primate Sequences to Find Functional Regions of the Human Genome", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2003 American Association for the Advancement of Science. \n\n9 December 2002; accepted 14 January 2003. \n\nWe thank J.-F. Cheng for support with the sequencing infrastructure, the Zoological Society of San Diego for providing primate DNA samples, and I. Udalova and L. Pennacchio for useful discussions. M. Jordan contributed useful suggestions concerning statistical methods. This work was performed under the auspices of the U.S. Department of Energy's Office of Science, Biological and Environmental Research; by the University of California, Lawrence Berkeley National Laboratory under Contract No. DEAC0376SF00098; supported by Grant #HL66728, Berkeley-PGA, under the Programs for Genomic Application, funded by National Heart, Lung, and Blood Institute, USA. L.P. was partially supported by a grant from NIH (R01-HG02362-01).\n\nSupplemental Material - 26/299.5611.1391.DC1/Boffelli.SOM.pdf
", "abstract": "Nonhuman primates represent the most relevant model organisms to understand the biology of Homo sapiens. The recent divergence and associated overall sequence conservation between individual members of this taxon have nonetheless largely precluded the use of primates in comparative sequence studies. We used sequence comparisons of an extensive set of Old World and New World monkeys and hominoids to identify functional regions in the human genome. Analysis of these data enabled the discovery of primate-specific gene regulatory elements and the demarcation of the exons of multiple genes. Much of the information content of the comprehensive primate sequence comparisons could be captured with a small subset of phylogenetically close primates. These results demonstrate the utility of intraprimate sequence comparisons to discover common mammalian as well as primate-specific functional elements in the human genome, which are unattainable through the evaluation of more evolutionarily distant species.", "date": "2003-02-28", "date_type": "published", "publication": "Science", "volume": "299", "number": "5611", "publisher": "American Association for the Advancement of Science", "pagerange": "1391-1394", "id_number": "CaltechAUTHORS:20170308-154624549", "issn": "0036-8075", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170308-154624549", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "Department of Energy (DOE)", "grant_number": "DE-AC03-76SF00098" }, { "agency": "NIH", "grant_number": "HL66728" }, { "agency": "NIH", "grant_number": "R01-HG02362-01" } ] }, "doi": "10.1126/science.1081331", "primary_object": { "basename": "Boffelli.SOM.pdf", "url": "https://authors.library.caltech.edu/records/hty8p-scq52/files/Boffelli.SOM.pdf" }, "resource_type": "article", "pub_year": "2003", "author_list": "Boffelli, Dario; McAuliffe, Jon; et al." 
}, { "id": "https://authors.library.caltech.edu/records/ps1wq-rss05", "eprint_id": 74943, "eprint_status": "archive", "datestamp": "2023-08-19 10:49:52", "lastmod": "2023-10-24 23:49:41", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Bray-N-L", "name": { "family": "Bray", "given": "Nick" } }, { "id": "Dubchak-I", "name": { "family": "Dubchak", "given": "Inna" } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "AVID: A Global Alignment Program", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2003 Cold Spring Harbor Laboratory Press. The Authors acknowledge that six months after the full-issue publication date, the Article will be distributed under a Creative Commons CC-BY-NC License (Attribution-NonCommercial 4.0 International License, http://creativecommons.org/licenses/by-nc/4.0/). \n\nReceived September 9, 2002. Accepted November 7, 2002. \n\nWe thank Alex Poliakov for helping in setting up the AVID Web servers and provided extensive debugging support and assistance. We also thank Jody Schwartz for help in testing and debugging AVID and Jim Lord who helped in developing overlap identification methods for draft contigs. Thanks also to the Mouse Sequencing Consortium for generating whole genome mouse sequence, which helped greatly in refining and streamlining AVID. Some of the sequence data used to benchmark the alignment programs were generated by the NIH Intramural Sequencing Center (www.nisc.nih.gov). This project was supported in part by a Program in Genomic Applications grant (PGA) from the National Heart Lung and Blood Institute and a grant from the NIH (R01-HG02362-01). \n\nThe publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked \"advertisement\" in accordance with 18 USC section 1734 solely to indicate this fact.\n\nPublished - 97.full.pdf
", "abstract": "In this paper we describe a new global alignment method called AVID. The method is designed to be fast, memory efficient, and practical for sequence alignments of large genomic regions up to megabases long. We present numerous applications of the method, ranging from the comparison of assemblies to alignment of large syntenic genomic regions and whole genome human/mouse alignments. We have also performed a quantitative comparison of AVID with other popular alignment tools. To this end, we have established a format for the representation of alignments and methods for their comparison. These formats and methods should be useful for future studies. The tools we have developed for the alignment comparisons, as well as the AVID program, are publicly available. See Web Site References section for AVID Web address and Web addresses for other programs discussed in this paper.", "date": "2003-01-01", "date_type": "published", "publication": "Genome Research", "volume": "13", "number": "1", "publisher": "Cold Spring Harbor Laboratory Press", "pagerange": "97-102", "id_number": "CaltechAUTHORS:20170308-155631683", "issn": "1088-9051", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170308-155631683", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NIH", "grant_number": "R01-HG02362-01" } ] }, "doi": "10.1101/gr.789803", "pmcid": "PMC430967", "primary_object": { "basename": "97.full.pdf", "url": "https://authors.library.caltech.edu/records/ps1wq-rss05/files/97.full.pdf" }, "resource_type": "article", "pub_year": "2003", "author_list": "Bray, Nick; Dubchak, Inna; et al." 
}, { "id": "https://authors.library.caltech.edu/records/2yrxj-wqw78", "eprint_id": 74944, "eprint_status": "archive", "datestamp": "2023-08-19 10:49:57", "lastmod": "2023-10-24 23:49:44", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Couronne-O", "name": { "family": "Couronne", "given": "Olivier" } }, { "id": "Poliakov-A-N-B", "name": { "family": "Poliakov", "given": "Alexander" } }, { "id": "Bray-N-L", "name": { "family": "Bray", "given": "Nicolas" } }, { "id": "Ishkhanov-T", "name": { "family": "Ishkhanov", "given": "Tigran" } }, { "id": "Ryaboy-D", "name": { "family": "Ryaboy", "given": "Dmitriy" } }, { "id": "Rubin-E-M", "name": { "family": "Rubin", "given": "Edward" } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" }, { "id": "Dubchak-I", "name": { "family": "Dubchak", "given": "Inna" } } ] }, "title": "Strategies and Tools for Whole-Genome Alignments", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2003 Cold Spring Harbor Laboratory Press. The Authors acknowledge that six months after the full-issue publication date, the Article will be distributed under a Creative Commons CC-BY-NC License (Attribution-NonCommercial 4.0 International License, http://creativecommons.org/licenses/by-nc/4.0/). \n\nReceived September 4, 2002. Accepted November 6, 2002. \n\nWe thank the Mouse Genome Sequencing Consortium for the opportunity to work with the mouse genome during the sequencing phases and in the subsequent analysis phase. The analysis group, comprising many individuals and teams from around the world, was particularly helpful not only in providing crucial suggestions and advice as the project unfolded, but also in contributing many independent ideas. Special thanks go to Jim Kent, who coordinated the alignment efforts of the mouse sequencing consortium analysis group and designed the filtering methods for calculating alignment coverage. 
Thanks also to the Penn State Group (Laura Elnitsky, Ross Hardison, Webb Miller, Scott Schwartz, and others) and the PatternHunter Group (Ming Li, Mike Zody, and others), who developed different alignment strategies which we compared. We thank Ivan Ovcharenko for initiating the project and developing the prototype. We also thank Serafim Batzoglou for his help with generating simulated reads and assemblies for our test sets. The project was partially supported by a Program for Genomic Applications grant from the National Heart Lung and Blood Institute. This work was supported by the Director, Office of Science, of the U.S. Department of Energy under Contract No. DE-AC03-76SF00098. \n\nThe publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked \"advertisement\" in accordance with 18 USC section 1734 solely to indicate this fact.\n\nPublished - 73.full.pdf
", "abstract": "The availability of the assembled mouse genome makes possible, for the first time, an alignment and comparison of two large vertebrate genomes. We investigated different strategies of alignment for the subsequent analysis of conservation of genomes that are effective for assemblies of different quality. These strategies were applied to the comparison of the working draft of the human genome with the Mouse Genome Sequencing Consortium assembly, as well as other intermediate mouse assemblies. Our methods are fast and the resulting alignments exhibit a high degree of sensitivity, covering more than 90% of known coding exons in the human genome. We obtained such coverage while preserving specificity. With a view towards the end user, we developed a suite of tools and Web sites for automatically aligning and subsequently browsing and working with whole-genome comparisons. We describe the use of these tools to identify conserved non-coding regions between the human and mouse genomes, some of which have not been identified by other methods.", "date": "2003-01-01", "date_type": "published", "publication": "Genome Research", "volume": "13", "number": "1", "publisher": "Cold Spring Harbor Laboratory Press", "pagerange": "73-80", "id_number": "CaltechAUTHORS:20170308-160145750", "issn": "1088-9051", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170308-160145750", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "National Heart, Lung and Blood Institute" }, { "agency": "Department of Energy (DOE)", "grant_number": "DE-AC03-76SF00098" } ] }, "doi": "10.1101/gr.762503", "pmcid": "PMC430965", "primary_object": { "basename": "73.full.pdf", "url": "https://authors.library.caltech.edu/records/2yrxj-wqw78/files/73.full.pdf" }, "resource_type": "article", "pub_year": "2003", "author_list": "Couronne, Olivier; Poliakov, Alexander; et al." 
}, { "id": "https://authors.library.caltech.edu/records/0x565-8ph71", "eprint_id": 74965, "eprint_status": "archive", "datestamp": "2023-08-19 10:25:23", "lastmod": "2023-10-24 23:51:14", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Waterston-R-H", "name": { "family": "Waterston", "given": "Robert H." } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "Initial sequencing and comparative analysis of the mouse genome", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2002 Macmillan Publishers Limited. \n\nReceived 18 September 2002; Accepted 31 October 2002. \n\nWe thank J. Takahashi and M. Johnston for comments on the manuscript; the Mouse Liaison Group for strategic advice; L. Gaffney, D. Leja and K.-S. Toh for graphical help; B. Graham and G. Roberts for administrative work on sequencing of individual mouse BACs; and P. Kassos and M. McMurtry for secretarial assistance. We thank D. Hill and L. Corbani of the Mouse Genome Informatics Group for their contributions to the GO analysis for mouse and human, and the members of the Bork group at EMBL for discussions. 
Funding was provided by the National Institutes of Health (National Human Genome Research Institute, National Cancer Institute, National Institute of Dental and Craniofacial Research, National Institute of Diabetes and Digestive and Kidney Diseases, National Institute of General Medical Sciences, National Eye Institute, National Institute of Environmental Health Sciences, National Institute of Aging, National Institute of Arthritis and Musculoskeletal and Skin Diseases, National Institute on Deafness and Other Communication Disorders, National Institute of Mental Health, National Institute on Drug Abuse, National Center for Research Resources, the National Heart Lung and Blood Institute and The Fogarty International Center); the Wellcome Trust; the Howard Hughes Medical Institute; the United States Department of Energy; the National Science Foundation; the Medical Research Council; NSERC; BMBF (German Ministry for Research and Education); the European Molecular Biology Laboratory; Plan Nacional de I + D and Instituto Carlos III; Swiss National Science Foundation, NCCR Frontiers in Genetics, the Swiss Cancer League and the 'Childcare' and 'J. Lejeune' Foundations; and the Ministry of Education, Culture, Sports, Science and Technology of Japan. The initial threefold sequence coverage was partly supported by the Mouse Sequencing Consortium (GlaxoSmithKline, Merck and Affymetrix) through the Foundation for the National Institutes of Health. We acknowledge A. Holden for coordinating the Mouse Sequencing Consortium. We thank the Sanger Institute systems group for maintenance and provision of the computer resource. The MGSC also used Hewlett-Packard Company's BioCluster, a configuration of 27 HP AlphaServer ES40 systems with 100 CPUs and 1 terabyte of storage. The BioCluster is housed in Hewlett-Packard's IQ Solutions Center, and was accessed remotely. The computing resource greatly accelerated the analysis. 
\n\nAuthors' contributions: The following authors contributed to project leadership: R. H. Waterston, K. Lindblad-Toh, E. Birney, J. Rogers, M. R. Brent, F. S. Collins, R. Guig\u00f3, R. C. Hardison, D. Haussler, D. B. Jaffe, W. J. Kent, W. Miller, C. P. Ponting, A. Smit, M. C. Zody and E. S. Lander. \n\nAvailability of sequence and assembly data: Unprocessed sequence reads are available from the NCBI trace archive (ftp://ftp.ncbi.nih.gov/pub/TraceDB/mus_musculus/). Raw assembly data (before removal of contaminants, anchoring to chromosomes, and addition of finished sequence) are available from the Whitehead Institute for Biomedical Research (WIBR) (ftp://wolfram.wi.mit.edu/pub/mouse_contigs/Mar10_02/). The released assembly MGSCv3 is available from Ensembl (http://www.ensembl.org/Mus_musculus/), NCBI (ftp://ftp.ncbi.nih.gov/genomes/M_musculus/MGSCv3_Release1/), UCSC (http://genome.ucsc.edu/downloads.html) and WIBR (ftp://wolfram.wi.mit.edu/pub/mouse_contigs/MGSC_V3/). (See Supplementary Information for detailed Methods.) \n\nThe author declares no competing financial interests.\n\nSupplemental Material - nature01262-s1.doc
Supplemental Material - nature01262-s10.doc
Supplemental Material - nature01262-s11.jpg
Supplemental Material - nature01262-s12.jpg
Supplemental Material - nature01262-s13.jpg
Supplemental Material - nature01262-s14.jpg
Supplemental Material - nature01262-s15.jpg
Supplemental Material - nature01262-s16.jpg
Supplemental Material - nature01262-s2.doc
Supplemental Material - nature01262-s3.doc
Supplemental Material - nature01262-s4.doc
Supplemental Material - nature01262-s5.doc
Supplemental Material - nature01262-s6.doc
Supplemental Material - nature01262-s7.doc
Supplemental Material - nature01262-s8.doc
Supplemental Material - nature01262-s9.doc
", "abstract": "The sequence of the mouse genome is a key informational tool for understanding the contents of the human genome and a key experimental tool for biomedical research. Here, we report the results of an international collaboration to produce a high-quality draft sequence of the mouse genome. We also present an initial comparative analysis of the mouse and human genomes, describing some of the insights that can be gleaned from the two sequences. We discuss topics including the analysis of the evolutionary forces shaping the size, structure and sequence of the genomes; the conservation of large-scale synteny across most of the genomes; the much lower extent of sequence orthology covering less than half of the genomes; the proportions of the genomes under selection; the number of protein-coding genes; the expansion of gene families related to reproduction and immunity; the evolution of proteins; and the identification of intraspecies polymorphism.", "date": "2002-12-05", "date_type": "published", "publication": "Nature", "volume": "420", "number": "6915", "publisher": "Nature Publishing Group", "pagerange": "520-562", "id_number": "CaltechAUTHORS:20170309-090859678", "issn": "0028-0836", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170309-090859678", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NIH" }, { "agency": "Wellcome Trust" }, { "agency": "Howard Hughes Medical Institute (HHMI)" }, { "agency": "Department of Energy (DOE)" }, { "agency": "NSF" }, { "agency": "Medical Research Council (UK)" }, { "agency": "Natural Sciences and Engineering Research Council of Canada (NSERC)" }, { "agency": "Bundesministerium f\u00fcr Bildung und Forschung (BMBF)" }, { "agency": "European Molecular Biology Laboratory (EMBL)" }, { "agency": "Plan Nacional de I + D (Spain)" }, { "agency": "Instituto Carlos III" }, { "agency": "Swiss National Science 
Foundation (SNSF)" }, { "agency": "Swiss Cancer League" }, { "agency": "ChildCare Foundation" }, { "agency": "Jerome Lejeune Foundation" }, { "agency": "Ministry of Education, Culture, Sports, Science and Technology (MEXT)" }, { "agency": "Mouse Sequencing Consortium" } ] }, "corp_creators": { "items": [ "Mouse Genome Sequencing Consortium" ] }, "doi": "10.1038/nature01262", "primary_object": { "basename": "nature01262-s1.doc", "url": "https://authors.library.caltech.edu/records/0x565-8ph71/files/nature01262-s1.doc" }, "related_objects": [ { "basename": "nature01262-s10.doc", "url": "https://authors.library.caltech.edu/records/0x565-8ph71/files/nature01262-s10.doc" }, { "basename": "nature01262-s14.jpg", "url": "https://authors.library.caltech.edu/records/0x565-8ph71/files/nature01262-s14.jpg" }, { "basename": "nature01262-s16.jpg", "url": "https://authors.library.caltech.edu/records/0x565-8ph71/files/nature01262-s16.jpg" }, { "basename": "nature01262-s3.doc", "url": "https://authors.library.caltech.edu/records/0x565-8ph71/files/nature01262-s3.doc" }, { "basename": "nature01262-s4.doc", "url": "https://authors.library.caltech.edu/records/0x565-8ph71/files/nature01262-s4.doc" }, { "basename": "nature01262-s5.doc", "url": "https://authors.library.caltech.edu/records/0x565-8ph71/files/nature01262-s5.doc" }, { "basename": "nature01262-s7.doc", "url": "https://authors.library.caltech.edu/records/0x565-8ph71/files/nature01262-s7.doc" }, { "basename": "nature01262-s12.jpg", "url": "https://authors.library.caltech.edu/records/0x565-8ph71/files/nature01262-s12.jpg" }, { "basename": "nature01262-s13.jpg", "url": "https://authors.library.caltech.edu/records/0x565-8ph71/files/nature01262-s13.jpg" }, { "basename": "nature01262-s2.doc", "url": "https://authors.library.caltech.edu/records/0x565-8ph71/files/nature01262-s2.doc" }, { "basename": "nature01262-s6.doc", "url": "https://authors.library.caltech.edu/records/0x565-8ph71/files/nature01262-s6.doc" }, { "basename": 
"nature01262-s8.doc", "url": "https://authors.library.caltech.edu/records/0x565-8ph71/files/nature01262-s8.doc" }, { "basename": "nature01262-s9.doc", "url": "https://authors.library.caltech.edu/records/0x565-8ph71/files/nature01262-s9.doc" }, { "basename": "nature01262-s11.jpg", "url": "https://authors.library.caltech.edu/records/0x565-8ph71/files/nature01262-s11.jpg" }, { "basename": "nature01262-s15.jpg", "url": "https://authors.library.caltech.edu/records/0x565-8ph71/files/nature01262-s15.jpg" } ], "resource_type": "article", "pub_year": "2002", "author_list": "Waterston, Robert H. and Pachter, Lior" }, { "id": "https://authors.library.caltech.edu/records/nvtfr-fpx06", "eprint_id": 74967, "eprint_status": "archive", "datestamp": "2023-08-19 09:45:23", "lastmod": "2023-10-24 23:51:22", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" }, { "id": "Alexandersson-M", "name": { "family": "Alexandersson", "given": "Marina" } }, { "id": "Cawley-S", "name": { "family": "Cawley", "given": "Simon" } } ] }, "title": "Applications of Generalized Pair Hidden Markov Models to Alignment and Gene Finding Problems", "ispublished": "pub", "full_text_status": "public", "keywords": "hidden Markov model, alignment, gene finding, comparative genomics", "note": "\u00a9 2002 Mary Ann Liebert, Inc. \n\nWe thank Terry Speed for valuable comments. M.A. was supported by STINT, the Swedish Foundation for International Cooperation in Research and Higher Education.\n\nPublished - 10665270252935520.pdf
", "abstract": "Hidden Markov models (HMMs) have been successfully applied to a variety of problems in molecular biology, ranging from alignment problems to gene finding and annotation. Alignment problems can be solved with pair HMMs, while gene finding programs rely on generalized HMMs in order to model exon lengths. In this paper, we introduce the generalized pair HMM (GPHMM), which is an extension of both pair and generalized HMMs. We show how GPHMMs, in conjunction with approximate alignments, can be used for cross-species gene finding and describe applications to DNA\u2013cDNA and DNA\u2013protein alignment. GPHMMs provide a unifying and probabilistically sound theory for modeling these problems.", "date": "2002-07", "date_type": "published", "publication": "Journal of Computational Biology", "volume": "9", "number": "2", "publisher": "Mary Ann Liebert, Inc.", "pagerange": "389-399", "id_number": "CaltechAUTHORS:20170309-092737635", "issn": "1066-5277", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170309-092737635", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "Swedish Foundation for International Cooperation in Research and Higher Education (STINT)" } ] }, "doi": "10.1089/10665270252935520", "primary_object": { "basename": "10665270252935520.pdf", "url": "https://authors.library.caltech.edu/records/nvtfr-fpx06/files/10665270252935520.pdf" }, "resource_type": "article", "pub_year": "2002", "author_list": "Pachter, Lior; Alexandersson, Marina; et al." 
}, { "id": "https://authors.library.caltech.edu/records/mg9tz-23986", "eprint_id": 74973, "eprint_status": "archive", "datestamp": "2023-08-19 09:26:05", "lastmod": "2023-10-24 23:51:42", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Benos-P", "name": { "family": "Benos", "given": "Panayiotis" } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "From First Base: The Sequence of the Tip of the X Chromosome of Drosophila melanogaster, a Comparison of Two Sequencing Strategies", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2002 Cold Spring Harbor Laboratory Press. The Authors acknowledge that six months after the full-issue publication date, the Article will be distributed under a Creative Commons CC-BY-NC License (Attribution-NonCommercial 4.0 International License, http://creativecommons.org/licenses/by-nc/4.0/). \n\nReceived December 10, 2000. Accepted February 16, 2001. \n\nThis work was supported by a Contract from the European Commission under Framework Programme 4 (coordinator D.M. Glover), by a grant from the Medical Research Council, London to M.A. and D.M.G., by a grant from the Direcci\u00f3n General de Investigacion Cient\u0131\u0301fica y T\u00e9cnica to J.M., by a grant from the Hellenic Secretariat General for Science and Technology to K.L., and by a grant from the Deutsche Humangenomprojekt to H.J. R.D.C.S. was supported by a Wellcome Trust Senior Fellowship. We thank many colleagues for their help. We are grateful to Gerry Rubin and his colleagues at the BDGP, particularly Suzanna Lewis, Sima Misra, and Susan Celniker (and, of course, Gerry himself) for the exchange of materials, information, and ideas over the years. Greg Helt of the BDGP was very helpful in providing us with the initial Drosophila gene training set. 
We also thank Rolf Apweiler and his SWISS-PROT/TrEMBL team at the EBI, particularly Alexander Kanapin and Wolfgang Fleischmann for their help with the protein motif analysis. We also thank Rolf Apweiler, head of that team, for his blessings. Richard Durbin's group at the Sanger Center have been extraordinarily helpful; in particular, Daniel Lawson gave tremendous help with ACeDB despite having to bend double at times. Kim Rutherford of the Pathogen Sequencing Unit at the Sanger Center provided the software to draw Figure 1; without this we may have been lost. We thank Brian Oliver of the NIH, Bethesda for a pre-print copy of his paper on testis ESTs, Leyla Bayraktaroglou (FlyBase group, Harvard) for her help in the curation of reference sequence data sets, and David Judge of the Cambridge School of Biological Sciences Biocomputing Unit for help. \n\nThe publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked \"advertisement\" in accordance with 18 USC section 1734 solely to indicate this fact. \n\n[All of the sequences analyzed in this paper have been deposited in the EMBL-Bank database under the following accession nos.: AL009146, AL009147, AL009171, AL009188\u2013AL009196, AL021067, AL021086, AL021106\u2013AL021108, AL021726, AL021728, AL022017, AL022018, AL022139, AL023873, AL023874, AL023893, AL024453, AL024455\u2013AL024457, AL024485, AL030993, AL030994, AL031024\u2013AL031028, AL031128, AL031173, AL031366, AL031367, AL031581\u2013AL031583, AL031640, AL031765, AL031883, AL031884, AL034388, AL034544, AL035104, AL035105, AL035207, AL035245, AL035331, AL035632, AL049535, AL050231, AL050232, AL109630, AL121804, AL121806, AL132651, AL132792, AL132797, AL133503\u2013AL133506, AL138678, AL138971, AL138972, and Z98269. A single file (FASTA format) of the 2.6-Mb contig is available from ftp://ftp.ebi.ac.uk/pub/databases/edgp/contigs/contig_1.fa.] 
\n\nSupplementary data are available from ftp://ebi.ac.uk/pub/databases/edgp/EDGP-GenomeResearch_suppdata_2001.\n\nPublished - 710.full.pdf
", "abstract": "We present the sequence of a contiguous 2.63 Mb of DNA extending from the tip of the X chromosome of Drosophila melanogaster. Within this sequence, we predict 277 protein coding genes, of which 94 had been sequenced already in the course of studying the biology of their gene products, and examples of 12 different transposable elements. We show that an interval between bands 3A2 and 3C2, believed in the 1970s to show a correlation between the number of bands on the polytene chromosomes and the 20 genes identified by conventional genetics, is predicted to contain 45 genes from its DNA sequence. We have determined the insertion sites of P-elements from 111 mutant lines, about half of which are in a position likely to affect the expression of novel predicted genes, thus representing a resource for subsequent functional genomic analysis. We compare the European Drosophila Genome Project sequence with the corresponding part of the independently assembled and annotated Joint Sequence determined through \"shotgun\" sequencing. Discounting differences in the distribution of known transposable elements between the strains sequenced in the two projects, we detected three major sequence differences, two of which are probably explained by errors in assembly; the origin of the third major difference is unclear. In addition there are eight sequence gaps within the Joint Sequence. At least six of these eight gaps are likely to be sites of transposable elements; the other two are complex. 
Of the 275 genes in common to both projects, 60% are identical within 1% of their predicted amino-acid sequence and 31% show minor differences such as in choice of translation initiation or termination codons; the remaining 9% show major differences in interpretation.", "date": "2002-05", "date_type": "published", "publication": "Genome Research", "volume": "11", "number": "5", "publisher": "Cold Spring Harbor Laboratory Press", "pagerange": "710-730", "id_number": "CaltechAUTHORS:20170309-100309238", "issn": "1088-9051", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170309-100309238", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "European Commission" }, { "agency": "Medical Research Council (UK)" }, { "agency": "Direcci\u00f3n General de Investigacion Cient\u0131\u0301fica y T\u00e9cnica" }, { "agency": "Hellenic Secretariat General for Science and Technology" }, { "agency": "Deutsche Humangenomprojekt" }, { "agency": "Wellcome Trust" } ] }, "doi": "10.1101/gr.173801", "pmcid": "PMC311117", "primary_object": { "basename": "710.full.pdf", "url": "https://authors.library.caltech.edu/records/mg9tz-23986/files/710.full.pdf" }, "resource_type": "article", "pub_year": "2002", "author_list": "Benos, Panayiotis and Pachter, Lior" }, { "id": "https://authors.library.caltech.edu/records/0947j-v2n07", "eprint_id": 74968, "eprint_status": "archive", "datestamp": "2023-08-19 09:26:00", "lastmod": "2023-10-24 23:51:25", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Loots-G-G", "name": { "family": "Loots", "given": "Gabriela G." 
} }, { "id": "Ovcharenko-I", "name": { "family": "Ovcharenko", "given": "Ivan" } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" }, { "id": "Dubchak-I", "name": { "family": "Dubchak", "given": "Inna" } }, { "id": "Rubin-E-M", "name": { "family": "Rubin", "given": "Edward M." } } ] }, "title": "rVista for Comparative Sequence-Based Discovery of Functional Transcription Factor Binding Sites", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2002 Cold Spring Harbor Laboratory Press. The Authors acknowledge that six months after the full-issue publication date, the Article will be distributed under a Creative Commons CC-BY-NC License (Attribution-NonCommercial 4.0 International License, http://creativecommons.org/licenses/by-nc/4.0/). \n\nReceived November 27, 2001. Accepted March 7, 2002. \n\nWe are grateful to Moshe Malkin, Jody Schwartz, Alexander Fabrikant, and Michael Brudno for technical assistance. We thank the Rubin Laboratory for insightful comments on the manuscript. This work was supported by the Program for Genomic Applications (PGAs) funded by the National Heart, Lung, and Blood Institute (NHLBI/NIH); G.G. Loots was supported by the Department of Energy Alexander Hollaender Fellowship. \n\nThe publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked \"advertisement\" in accordance with 18 USC section 1734 solely to indicate this fact.\n\nPublished - 832.full.pdf
", "abstract": "Identifying transcriptional regulatory elements represents a significant challenge in annotating the genomes of higher vertebrates. We have developed a computational tool, rVISTA, for high-throughput discovery of cis-regulatory elements that combines clustering of predicted transcription factor binding sites (TFBSs) and the analysis of interspecies sequence conservation to maximize the identification of functional sites. To assess the ability of rVISTA to discover true positive TFBSs while minimizing the prediction of false positives, we analyzed the distribution of several TFBSs across 1 Mb of the well-annotated cytokine gene cluster (Hs5q31; Mm11). Because a large number of AP-1, NFAT, and GATA-3 sites have been experimentally identified in this interval, we focused our analysis on the distribution of all binding sites specific for these transcription factors. The exploitation of the orthologous human\u2013mouse dataset resulted in the elimination of >95% of the \u223c58,000 binding sites predicted on analysis of the human sequence alone, whereas it identified 88% of the experimentally verified binding sites in this region.", "date": "2002-05", "date_type": "published", "publication": "Genome Research", "volume": "12", "number": "5", "publisher": "Cold Spring Harbor Laboratory Press", "pagerange": "832-839", "id_number": "CaltechAUTHORS:20170309-093325277", "issn": "1088-9051", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170309-093325277", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "National Heart, Lung, and Blood Institute" }, { "agency": "Department of Energy (DOE)" } ] }, "doi": "10.1101/gr.225502", "pmcid": "PMC186580", "primary_object": { "basename": "832.full.pdf", "url": "https://authors.library.caltech.edu/records/0947j-v2n07/files/832.full.pdf" }, "resource_type": "article", "pub_year": "2002", "author_list": "Loots, 
Gabriela G.; Ovcharenko, Ivan; et al." }, { "id": "https://authors.library.caltech.edu/records/3f346-d0z81", "eprint_id": 74969, "eprint_status": "archive", "datestamp": "2023-08-19 09:09:39", "lastmod": "2023-10-24 23:51:29", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Dubchak-I", "name": { "family": "Dubchak", "given": "Inna" } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "The computational challenges of applying comparative-based computational methods to whole genomes", "ispublished": "pub", "full_text_status": "restricted", "keywords": "comparative genomics, whole genomes, sequence alignment, gene finding, regulatory elements", "note": "\u00a9 2002 Henry Stewart Publications. \n\nWe thank Dr Edward Rubin for helpful discussions. This work was supported by one of the 11 Programs for Genomic Applications (PGAs) funded by the National Heart, Lung, and Blood Institute (NHLBI).", "abstract": "The explosion in genomic sequence available in public databases has resulted in an unprecedented opportunity for computational whole genome analyses. A number of promising comparative-based approaches have been developed for gene finding, regulatory element discovery and other purposes, and it is clear that these tools will play a fundamental role in analysing the enormous amount of new data that is currently being generated. The synthesis of computationally intensive comparative computational approaches with the requirement for computational scientists. 
We focus on a few of these challenges, using by way of example the problems of alignment, gene finding and regulatory element discovery, and discuss the issues that have arisen in attempts to solve these problems in the context of whole genome analysis pipelines.", "date": "2002-03", "date_type": "published", "publication": "Briefings in Bioinformatics", "volume": "3", "number": "1", "publisher": "Oxford University Press", "pagerange": "18-22", "id_number": "CaltechAUTHORS:20170309-093904962", "issn": "1467-5463", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170309-093904962", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "National Heart, Lung, and Blood Institute" } ] }, "doi": "10.1093/bib/3.1.18", "resource_type": "article", "pub_year": "2002", "author_list": "Dubchak, Inna and Pachter, Lior" }, { "id": "https://authors.library.caltech.edu/records/hnyxk-9bm85", "eprint_id": 74971, "eprint_status": "archive", "datestamp": "2023-08-19 08:52:05", "lastmod": "2023-10-24 23:51:39", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Peter-A", "name": { "family": "Peter", "given": "Annette" } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "Mapping and identification of essential gene functions on the X chromosome of Drosophila", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2002 European Molecular Biology Organization. \n\nReceived October 11, 2001; revised and accepted November 22, 2001. \n\nWe thank many colleagues for their help. This work was supported by the German Human Genome Project (grant 01 KW 9632/9) and a contract from the European Commission.\n\nSupplemental Material - inline-supplementary-material-1.pdf
", "abstract": "The Drosophila melanogaster genome consists of four chromosomes that contain 165 Mb of DNA, 120 Mb of which are euchromatic. The two Drosophila Genome Projects, in collaboration with Celera Genomics Systems, have sequenced the genome, complementing the previously established physical and genetic maps. In addition, the Berkeley Drosophila Genome Project has undertaken large\u2010scale functional analysis based on mutagenesis by transposable P element insertions into autosomes. Here, we present a large\u2010scale P element insertion screen for vital gene functions and a BAC tiling map for the X chromosome. A collection of 501 X\u2010chromosomal P element insertion lines was used to map essential genes cytogenetically and to establish short sequence tags (STSs) linking the insertion sites to the genome. The distribution of the P element integration sites, the identified genes and transcription units as well as the expression patterns of the P\u2010element\u2010tagged enhancers is described and discussed.", "date": "2002-01", "date_type": "published", "publication": "EMBO Reports", "volume": "3", "number": "1", "publisher": "European Molecular Biology Organization", "pagerange": "34-38", "id_number": "CaltechAUTHORS:20170309-095230836", "issn": "1469-221X", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170309-095230836", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "Deutsche Humangenomprojekt", "grant_number": "01 KW 9632/9" }, { "agency": "European Commission" } ] }, "doi": "10.1093/embo-reports/kvf012", "pmcid": "PMC1083931", "primary_object": { "basename": "inline-supplementary-material-1.pdf", "url": "https://authors.library.caltech.edu/records/hnyxk-9bm85/files/inline-supplementary-material-1.pdf" }, "resource_type": "article", "pub_year": "2002", "author_list": "Peter, Annette and Pachter, Lior" }, { "id": 
"https://authors.library.caltech.edu/records/tjked-pcn03", "eprint_id": 74977, "eprint_status": "archive", "datestamp": "2023-08-19 06:29:30", "lastmod": "2023-10-24 23:51:51", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Mayor-C", "name": { "family": "Mayor", "given": "Chris" } }, { "id": "Brudno-M", "name": { "family": "Brudno", "given": "Michael" } }, { "id": "Schwartz-J-R", "name": { "family": "Schwartz", "given": "Jody R." } }, { "id": "Poliakov-A-N-B", "name": { "family": "Poliakov", "given": "Alexander" } }, { "id": "Rubin-E-M", "name": { "family": "Rubin", "given": "Edward M." } }, { "id": "Frazier-K-A", "name": { "family": "Frazer", "given": "Kelly A." } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior S." }, "orcid": "0000-0002-9164-6231" }, { "id": "Dubchak-I", "name": { "family": "Dubchak", "given": "Inna" } } ] }, "title": "VISTA : visualizing global DNA sequence alignments of arbitrary length", "ispublished": "pub", "full_text_status": "restricted", "note": "\u00a9 2000 Oxford University Press. \n\nReceived on June 1, 2000; revised on July 19, 2000; accepted on August 2, 2000. \n\nThis work has been supported by the Department of Energy contract DE-AC03-76SF00098, NIH GM-5748202 (KAF).", "abstract": "VISTA is a program for visualizing global DNA sequence alignments of arbitrary length. It has a clean output, allowing for easy identification of similarity, and is easily configurable, enabling the visualization of alignments of various lengths at different levels of resolution. It is currently available on the web, thus allowing for easy access by all researchers. \n\nAvailability: VISTA server is available on the web at http://www-gsd.lbl.gov/vista. 
The source code is available upon request.", "date": "2000-11", "date_type": "published", "publication": "Bioinformatics", "volume": "16", "number": "11", "publisher": "Oxford University Press", "pagerange": "1046-1047", "id_number": "CaltechAUTHORS:20170309-104254761", "issn": "1367-4803", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170309-104254761", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "Department of Energy (DOE)", "grant_number": "DE-AC03-76SF00098" }, { "agency": "NIH", "grant_number": "GM-5748202" } ] }, "doi": "10.1093/bioinformatics/16.11.1046", "resource_type": "article", "pub_year": "2000", "author_list": "Mayor, Chris; Brudno, Michael; et al." }, { "id": "https://authors.library.caltech.edu/records/yas84-r6j70", "eprint_id": 74979, "eprint_status": "archive", "datestamp": "2023-08-19 06:15:36", "lastmod": "2023-10-25 14:35:31", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Dubchak-I", "name": { "family": "Dubchak", "given": "Inna" } }, { "id": "Brudno-M", "name": { "family": "Brudno", "given": "Michael" } }, { "id": "Loots-G-G", "name": { "family": "Loots", "given": "Gabriela G." } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" }, { "id": "Mayor-C", "name": { "family": "Mayor", "given": "Chris" } }, { "id": "Rubin-E-M", "name": { "family": "Rubin", "given": "Edward M." } }, { "id": "Frazier-K-A", "name": { "family": "Frazer", "given": "Kelly A." } } ] }, "title": "Active Conservation of Noncoding Sequences Revealed by Three-Way Species Comparisons", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2000 Cold Spring Harbor Laboratory Press. 
The Authors acknowledge that six months after the full-issue publication date, the Article will be distributed under a Creative Commons CC-BY-NC License (Attribution-NonCommercial 4.0 International License, http://creativecommons.org/licenses/by-nc/4.0/). \n\nReceived March 28, 2000. Accepted July 12, 2000. \n\nWe thank Keith Lewis, Willow Dean, and Cathy Blankespoor for DNA sequencing and Nila Patil for valuable remarks on the manuscript. This work was supported by the following grants: U.S. Department of Energy contract DE-AC376SF00098 and NIH GM-5748202 (K.A.F.) \n\nThe publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked \"advertisement\" in accordance with 18 USC section 1734 solely to indicate this fact.\n\nPublished - 1304.full.pdf
", "abstract": "Human and mouse genomic sequence comparisons are being increasingly used to search for evolutionarily conserved gene regulatory elements. Large-scale human\u2013mouse DNA comparison studies have discovered numerous conserved noncoding sequences of which only a fraction has been functionally investigated A question therefore remains as to whether most of these noncoding sequences are conserved because of functional constraints or are the result of a lack of divergence time. \n\n[The sequence data described in this paper have been submitted to the GenBank data library under accession nos. AF276990.]", "date": "2000-09", "date_type": "published", "publication": "Genome Research", "volume": "10", "number": "9", "publisher": "Cold Spring Harbor Laboratory Press", "pagerange": "1304-1306", "id_number": "CaltechAUTHORS:20170309-111029375", "issn": "1088-9051", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170309-111029375", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "Department of Energy (DOE)", "grant_number": "DE-AC03-76SF00098" }, { "agency": "NIH", "grant_number": "GM-5748202" } ] }, "doi": "10.1101/gr.142200", "pmcid": "PMC310906", "primary_object": { "basename": "1304.full.pdf", "url": "https://authors.library.caltech.edu/records/yas84-r6j70/files/1304.full.pdf" }, "resource_type": "article", "pub_year": "2000", "author_list": "Dubchak, Inna; Brudno, Michael; et el." 
}, { "id": "https://authors.library.caltech.edu/records/9zeww-dy998", "eprint_id": 74978, "eprint_status": "archive", "datestamp": "2023-08-19 06:01:21", "lastmod": "2023-10-24 23:51:57", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Batzoglou-S", "name": { "family": "Batzoglou", "given": "Serafim" } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" }, { "id": "Mesirov-J-P", "name": { "family": "Mesirov", "given": "Jill P." } }, { "id": "Berger-B", "name": { "family": "Berger", "given": "Bonnie" } }, { "id": "Lander-E-S", "name": { "family": "Lander", "given": "Eric S." }, "orcid": "0000-0003-2662-4631" } ] }, "title": "Human and Mouse Gene Structure: Comparative Analysis and Application to Exon Prediction", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2000 Cold Spring Harbor Laboratory Press. The Authors acknowledge that six months after the full-issue publication date, the Article will be distributed under a Creative Commons CC-BY-NC License (Attribution-NonCommercial 4.0 International License, http://creativecommons.org/licenses/by-nc/4.0/). \n\nReceived February 15, 2000. Accepted May 2, 2000. \n\nS.B., B.B., and this work were supported in part by Merck. L.P. was supported in part by a graduate fellowship from the Program in Mathematics and Molecular Biology and by a National Institutes of Health training grant. E.S.L. and J.M. were supported in part by a grant from the National Human Genome Research Institute. We thank Bruce Birren, Ken Dewar, and Daniel Kleitman for helpful discussions. We thank Eric Banks for support with software development. \n\nS.B. and L.P. contributed equally to this work. \n\nThe publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked \"advertisement\" in accordance with 18 USC section 1734 solely to indicate this fact.\n\nPublished - 950.full.pdf
", "abstract": "We describe a novel analytical approach to gene recognition based on cross-species comparison. We first undertook a comparison of orthologous genomic loci from human and mouse, studying the extent of similarity in the number, size and sequence of exons and introns. We then developed an approach for recognizing genes within such orthologous regions by first aligning the regions using an iterative global alignment system and then identifying genes based on conservation of exonic features at aligned positions in both species. The alignment and gene recognition are performed by new programs calledGLASS and ROSETTA, respectively.ROSETTA performed well at exact identification of coding exons in 117 orthologous pairs tested.", "date": "2000-07", "date_type": "published", "publication": "Genome Research", "volume": "10", "number": "7", "publisher": "Cold Spring Harbor Laboratory Press", "pagerange": "950-958", "id_number": "CaltechAUTHORS:20170309-110441139", "issn": "1088-9051", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170309-110441139", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NIH Predoctoral Fellowship" }, { "agency": "National Human Genome Research Institute" } ] }, "doi": "10.1101/gr.10.7.950", "pmcid": "PMC310911", "primary_object": { "basename": "950.full.pdf", "url": "https://authors.library.caltech.edu/records/9zeww-dy998/files/950.full.pdf" }, "resource_type": "article", "pub_year": "2000", "author_list": "Batzoglou, Serafim; Pachter, Lior; et el." 
}, { "id": "https://authors.library.caltech.edu/records/vh6dd-k6133", "eprint_id": 74981, "eprint_status": "archive", "datestamp": "2023-08-19 04:30:28", "lastmod": "2023-10-25 14:35:38", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" }, { "id": "Batzoglou-S", "name": { "family": "Batzoglou", "given": "Serafim" } }, { "id": "Spitkovsky-V-I", "name": { "family": "Spitkovsky", "given": "Valentin I." } }, { "id": "Banks-E", "name": { "family": "Banks", "given": "Eric" } }, { "id": "Lander-E-S", "name": { "family": "Lander", "given": "Eric S." }, "orcid": "0000-0003-2662-4631" }, { "id": "Kleitman-D-J", "name": { "family": "Kleitman", "given": "Daniel J." } }, { "id": "Berger-B", "name": { "family": "Berger", "given": "Bonnie" } } ] }, "title": "A Dictionary-Based Approach for Gene Annotation", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 1999 Mary Ann Liebert, Inc.", "abstract": "This paper describes a fast and fully automated dictionary-based approach to gene annotation and exon prediction. Two dictionaries are constructed, one from the nonredundant protein OWL database and the other from the dbEST database. These dictionaries are used to obtain O(1) time lookups of tuples in the dictionaries (4 tuples for the OWL database and 11 tuples for the dbEST database). These tuples can be used to rapidly find the longest matches at every position in an input sequence to the database sequences. Such matches provide very useful information pertaining to locating common segments between exons, alternative splice sites, and frequency data of long tuples for statistical purposes. 
These dictionaries also provide the basis for both homology determination and statistical approaches to exon prediction.", "date": "1999-07", "date_type": "published", "publication": "Journal of Computational Biology", "volume": "6", "number": "3-4", "publisher": "Mary Ann Liebert, Inc.", "pagerange": "419-430", "id_number": "CaltechAUTHORS:20170309-113000311", "issn": "1066-5277", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170309-113000311", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "doi": "10.1089/106652799318364", "resource_type": "article", "pub_year": "1999", "author_list": "Pachter, Lior; Batzoglou, Serafim; et al." }, { "id": "https://authors.library.caltech.edu/records/8cx0q-kp250", "eprint_id": 74995, "eprint_status": "archive", "datestamp": "2023-08-22 13:04:33", "lastmod": "2023-10-25 14:38:44", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" }, { "id": "Kim-Peter", "name": { "family": "Kim", "given": "Peter" } } ] }, "title": "Forcing matchings on square grids", "ispublished": "pub", "full_text_status": "restricted", "keywords": "Forcing number of a matching; Domino tiling; Feedback arc set", "note": "\u00a9 1998 Elsevier. \n\nReceived 20 September 1996; revised 17 June 1997; accepted 13 October 1997. \n\nWe thank Ken Halpern for discovering the example in Fig. 3. Dave Finberg helped in checking Conjecture 2. Lior Pachter was supported by DOE grant number 63564. Peter Kim was supported by the Center for Excellence in Education, while working at the RSI summer program.", "abstract": "Let G be a graph that admits a perfect matching. The forcing number of a perfect matching M of G is defined as the smallest number of edges in a subset S \u2282 M, such that S is in no other perfect matching. 
We show that for the 2n \u00d7 2n square grid, the forcing number of any perfect matching is bounded below by n and above by n^2. Both bounds are sharp. We also establish a connection between the forcing problem and the minimum feedback set problem. Finally, we present some conjectures about forcing numbers in other graphs.", "date": "1998-08-28", "date_type": "published", "publication": "Discrete Mathematics", "volume": "190", "number": "1-3", "publisher": "Elsevier", "pagerange": "287-294", "id_number": "CaltechAUTHORS:20170309-141622723", "issn": "0012-365X", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170309-141622723", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "Department of Energy (DOE)", "grant_number": "63564" }, { "agency": "Center for Excellence in Education" } ] }, "doi": "10.1016/S0012-365X(97)00266-5", "resource_type": "article", "pub_year": "1998", "author_list": "Pachter, Lior and Kim, Peter" }, { "id": "https://authors.library.caltech.edu/records/xbfv1-a5n80", "eprint_id": 74983, "eprint_status": "archive", "datestamp": "2023-08-19 02:33:42", "lastmod": "2023-10-25 14:35:44", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Kleitman-D-J", "name": { "family": "Kleitman", "given": "D." } }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "L." }, "orcid": "0000-0002-9164-6231" } ] }, "title": "Finding Convex Sets Among Points in the Plane", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 1998 Springer-Verlag. \n\nReceived January 1, 1997, and in revised form June 6, 1997. \n\nWe thank G\u00e9za T\u00f3th and Pavel Valtr for contributing the lower bound construction. We also thank the referee for numerous helpful suggestions and comments.\n\nPublished - art_3A10.1007_2FPL00009358.pdf
", "abstract": "Let g(n) denote the least value such that any g(n) points in the plane in general position contain the vertices of a convex n-gon. In 1935, Erd\u0151s and Szekeres showed that g(n) exists, and they obtained the bounds 2^(n\u22122) + 1 \u2264 g(n) \u2264 (^(2n\u22124)_(n\u22122)) + 1. Chung and Graham have recently improved the upper bound by 1; the first improvement since the original Erd\u0151s\u2014Szekeres paper. We show that g(n) \u2264 (^(2n\u22124)_(n\u22122)) + 7 \u2212 2n.", "date": "1998-03", "date_type": "published", "publication": "Discrete and Computational Geometry", "volume": "19", "number": "3", "publisher": "Springer", "pagerange": "405-410", "id_number": "CaltechAUTHORS:20170309-114305555", "issn": "0179-5376", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170309-114305555", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "doi": "10.1007/PL00009358", "primary_object": { "basename": "art_3A10.1007_2FPL00009358.pdf", "url": "https://authors.library.caltech.edu/records/xbfv1-a5n80/files/art_3A10.1007_2FPL00009358.pdf" }, "resource_type": "article", "pub_year": "1998", "author_list": "Kleitman, D. and Pachter, L." }, { "id": "https://authors.library.caltech.edu/records/jxxfj-d1y91", "eprint_id": 74996, "eprint_status": "archive", "datestamp": "2023-08-19 02:15:17", "lastmod": "2023-10-25 14:38:48", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Batzoglou-S", "name": { "family": "Batzoglou", "given": "Serafim" } }, { "id": "Berger-B", "name": { "family": "Berger", "given": "Bonnie" } }, { "id": "Kleitman-D-J", "name": { "family": "Kleitman", "given": "Daniel J." } }, { "id": "Lander-E-S", "name": { "family": "Lander", "given": "Eric S." 
}, "orcid": "0000-0003-2662-4631" }, { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "Recent Developments in Computational Gene Recognition", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 1998 Documenta Mathematica. \n\nWe thank Eric Banks, William Beebee, John Dunagan, Nick Feamster, Aram Harrow, Julia Lipman, Valentin Spitkovsky, Tina Tyan and Bill Wallis for helping in countless ways with the implementation of the ideas outlined in this paper. This project has been supported by Merck. Pachter has been partially supported by an NIH training grant and a Program in Mathematics and Molecular Biology graduate fellowship.\n\nPublished - 3Berger.MAN.ps
", "abstract": "We survey recent mathematical and computational work in the field of gene recognition, focusing on the techniques that have been developed to tackle the problem of identifying protein coding regions in genes. We also present a new approach to gene recognition which is based on a variety of tools we have developed.", "date": "1998", "date_type": "published", "publication": "Documenta Mathematica", "volume": "ICM I", "publisher": "Deutsche Mathematiker-Vereinigung (DMV)", "pagerange": "649-658", "id_number": "CaltechAUTHORS:20170309-142440565", "issn": "1431-0635", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170309-142440565", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "Merck" }, { "agency": "NIH Predoctoral Fellowship" }, { "agency": "Program in Mathematics and Molecular Biology" } ] }, "primary_object": { "basename": "3Berger.MAN.ps", "url": "https://authors.library.caltech.edu/records/jxxfj-d1y91/files/3Berger.MAN.ps" }, "resource_type": "article", "pub_year": "1998", "author_list": "Batzoglou, Serafim; Berger, Bonnie; et el." }, { "id": "https://authors.library.caltech.edu/records/ccpk8-h2947", "eprint_id": 74998, "eprint_status": "archive", "datestamp": "2023-08-19 02:07:17", "lastmod": "2023-10-25 14:38:56", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "Constructing status injective graphs", "ispublished": "pub", "full_text_status": "restricted", "note": "\u00a9 1997 Elsevier. \n\nReceived 15 July 1996; revised 21 October 1996.", "abstract": "The status, or distance sum, of a given vertex v in a graph is defined by s(v) = \u2211_(u \u2260 v)d(u, v) where d(u, v) is the distance from a vertex u to v. 
We show that every graph is the induced subgraph of a graph whose vertices all have distinct stati. Using this result we then construct a family of graphs which have consecutive integers for their stati. This settles the question raised by Harary and Buckley about whether there exist graphs whose stati are consecutive integers. We also use the above constructions to find families of non-isomorphic graphs with the same stati.", "date": "1997-12-05", "date_type": "published", "publication": "Discrete Applied Mathematics", "volume": "80", "number": "1", "publisher": "Elsevier", "pagerange": "107-113", "id_number": "CaltechAUTHORS:20170309-143338663", "issn": "0166-218X", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170309-143338663", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "doi": "10.1016/S0166-218X(97)00073-5", "resource_type": "article", "pub_year": "1997", "author_list": "Pachter, Lior" }, { "id": "https://authors.library.caltech.edu/records/mxdck-d5b36", "eprint_id": 75001, "eprint_status": "archive", "datestamp": "2023-08-19 02:02:08", "lastmod": "2023-10-25 14:39:03", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Pachter-L", "name": { "family": "Pachter", "given": "Lior" }, "orcid": "0000-0002-9164-6231" } ] }, "title": "Combinatorial Approaches and Conjectures for 2-Divisibility Problems Concerning Domino Tilings of Polyominoes", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 1997 The Author. \n\nSubmitted: September 24, 1997; Accepted: November 8, 1997. \n\nWe thank Joshua Bao and Jim Propp for helpful suggestions and comments. Special thanks go to Glenn Tesler for helping to draw the tiling pictures and to David Wilson for providing his program vax.el with which all the conjectures were tested. 
Finally, we are indebted to the anonymous referee for excellent suggestions which greatly helped in improving the final version of the paper.\n\nPublished - 1314-1393-1-PB.pdf
", "abstract": "We give the first complete combinatorial proof of the fact that the number of domino tilings of the 2n\u00d72n square grid is of the form 2^n(2k + 1)^2, thus settling a question raised by John, Sachs, and Zernitz. The proof lends itself naturally to some interesting generalizations, and leads to a number of new conjectures.", "date": "1997-11-08", "date_type": "published", "publication": "Electronic Journal of Combinatorics", "volume": "4", "number": "1", "publisher": "Electronic Journal of Combinatorics", "pagerange": "Art. No. R29", "id_number": "CaltechAUTHORS:20170309-144854496", "issn": "1077-8926", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170309-144854496", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "primary_object": { "basename": "1314-1393-1-PB.pdf", "url": "https://authors.library.caltech.edu/records/mxdck-d5b36/files/1314-1393-1-PB.pdf" }, "resource_type": "article", "pub_year": "1997", "author_list": "Pachter, Lior" } ]