[ { "id": "https://authors.library.caltech.edu/records/ch9jq-hc145", "eprint_status": "archive", "datestamp": "2024-01-10 20:00:52", "lastmod": "2024-01-10 20:00:52", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Zhou-Tingtao", "name": { "family": "Zhou", "given": "Tingtao" }, "orcid": "0000-0002-1766-719X" }, { "id": "Wan-Xuan", "name": { "family": "Wan", "given": "Xuan" }, "orcid": "0000-0002-6165-6340" }, { "id": "Huang-Daniel-Zhengyu", "name": { "family": "Huang", "given": "Daniel Zhengyu" } }, { "id": "Li-Zongyi", "name": { "family": "Li", "given": "Zongyi" }, "orcid": "0000-0003-2081-9665" }, { "id": "Peng-Zhiwei", "name": { "family": "Peng", "given": "Zhiwei" }, "orcid": "0000-0002-9486-2837" }, { "id": "Anandkumar-A", "name": { "family": "Anandkumar", "given": "Anima" }, "orcid": "0000-0002-6974-6797" }, { "id": "Brady-J-F", "name": { "family": "Brady", "given": "John F." }, "orcid": "0000-0001-5817-9128" }, { "id": "Sternberg-P-W", "name": { "family": "Sternberg", "given": "Paul W." }, "orcid": "0000-0002-7699-0173" }, { "id": "Daraio-C", "name": { "family": "Daraio", "given": "Chiara" }, "orcid": "0000-0001-5296-4440" } ] }, "title": "AI-aided geometric design of anti-infection catheters", "ispublished": "pub", "full_text_status": "public", "keywords": "Multidisciplinary", "note": "
© 2024 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works. Distributed under a Creative Commons Attribution License 4.0 (CC BY).
\n\nThis work was supported by the following: the Donna and Benjamin M. Rosen Bioengineering Center Pilot Research Grant (J.F.B. and C.D.), the Heritage Medical Institute at Caltech (C.D.), and the National Science Foundation, Center to Stream Healthcare in Place (C2SHIP), award no. 2052827 (C.D.). D.Z.H. is supported by the generosity of Eric and Wendy Schmidt by recommendation of the Schmidt Futures program. Z.L. is supported in part by the PIMCO Fellowship and Amazon AI4Science Fellowship. A.A. and P.W.S. are supported by Bren Professorships.
\n\nX.W., T.Z., P.W.S., and C.D. designed experiments. X.W. and T.Z. performed experiments and analyzed data. T.Z. and Z.P. performed simulations. D.Z.H. and Z.L. designed the AI model and performed optimization. A.A. conceptualized and planned the AI framework. T.Z., J.F.B., and C.D. conceived the project. P.W.S. and C.D. supervised the project. All authors discussed the results and contributed to the manuscript writing.
\n\nAll data needed to evaluate the conclusions in the paper are present in the paper and/or the Supplementary Materials. Supplementary data for the optimization process are included in the following link: https://data.caltech.edu/records/mdj7m-ajv14.
\n\nCalifornia Institute of Technology (Caltech) has a patent pending related to the discoveries in this manuscript. Patent status: Pending. Name of organization issuing patent: The United States Patent and Trademark Office (USPTO). All authors are inventors. Filing date: 13 March 2023. Serial number: 63/451,788. The authors declare that they have no other competing interests.
", "abstract": "Bacteria can swim upstream in a narrow tube and pose a clinical threat of urinary tract infection to patients implanted with catheters. Coatings and structured surfaces have been proposed to repel bacteria, but no such approach thoroughly addresses the contamination problem in catheters. Here, on the basis of the physical mechanism of upstream swimming, we propose a novel geometric design, optimized by an artificial intelligence model. Using\n Escherichia coli\n , we demonstrate the anti-infection mechanism in microfluidic experiments and evaluate the effectiveness of the design in three-dimensionally printed prototype catheters under clinical flow rates. Our catheter design shows that one to two orders of magnitude improved suppression of bacterial contamination at the upstream end, potentially prolonging the in-dwelling time for catheter use and reducing the overall risk of catheter-associated urinary tract infection.", "date": "2024-01-05", "date_type": "published", "publication": "Science Advances", "volume": "10", "number": "1", "publisher": "American Association for the Advancement of Science", "pagerange": "eadj1741", "issn": "2375-2548", "official_url": "https://authors.library.caltech.edu/records/ch9jq-hc145", "funders": { "items": [ { "grant_number": "Donna and Benjamin M. 
Rosen Bioengineering Center" }, { "grant_number": "Heritage Medical Research Institute" }, { "grant_number": "CNS-2052827" }, {}, { "grant_number": "Amazon AI4Science Fellowship" }, { "grant_number": "Bren Professor of Computing and Mathematical Sciences" } ] }, "local_group": { "items": [ { "id": "Division-of-Biology-and-Biological-Engineering" }, { "id": "Rosen-Bioengineering-Center" }, { "id": "Heritage-Medical-Research-Institute" } ] }, "doi": "10.1126/sciadv.adj1741", "pmcid": "PMC10776022", "primary_object": { "basename": "sciadv.adj1741.pdf", "url": "https://authors.library.caltech.edu/records/ch9jq-hc145/files/sciadv.adj1741.pdf" }, "related_objects": [ { "basename": "sciadv.adj1741_movies_s1_to_s3.zip", "url": "https://authors.library.caltech.edu/records/ch9jq-hc145/files/sciadv.adj1741_movies_s1_to_s3.zip" }, { "basename": "sciadv.adj1741_sm.pdf", "url": "https://authors.library.caltech.edu/records/ch9jq-hc145/files/sciadv.adj1741_sm.pdf" } ], "resource_type": "article", "pub_year": "2024", "author_list": "Zhou, Tingtao; Wan, Xuan; et al." }, { "id": "https://authors.library.caltech.edu/records/ewmpw-3r017", "eprint_status": "archive", "datestamp": "2023-12-18 18:07:26", "lastmod": "2023-12-18 18:07:26", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Zheng-Zhiling", "name": { "family": "Zheng", "given": "Zhiling" }, "orcid": "0000-0001-6090-2258" }, { "id": "Alawadhi-Ali-H", "name": { "family": "Alawadhi", "given": "Ali H." }, "orcid": "0000-0003-2680-5221" }, { "id": "Chheda-Saumil", "name": { "family": "Chheda", "given": "Saumil" }, "orcid": "0000-0002-0989-5707" }, { "id": "Neumann-S-Ephraim", "name": { "family": "Neumann", "given": "S. 
Ephraim" }, "orcid": "0000-0002-8515-9621" }, { "id": "Rampal-Nakul", "name": { "family": "Rampal", "given": "Nakul" }, "orcid": "0000-0002-6187-5631" }, { "id": "Liu-Shengchao", "name": { "family": "Liu", "given": "Shengchao" }, "orcid": "0000-0003-2030-2367" }, { "id": "Nguyen-Ha-L", "name": { "family": "Nguyen", "given": "Ha L." }, "orcid": "0000-0002-4977-925X" }, { "id": "Lin-Yen-hsu", "name": { "family": "Lin", "given": "Yen-hsu" } }, { "id": "Rong-Zichao", "name": { "family": "Rong", "given": "Zichao" }, "orcid": "0000-0002-9014-9540" }, { "id": "Siepmann-Joern-Ilja", "name": { "family": "Siepmann", "given": "J. Ilja" }, "orcid": "0000-0003-2534-4507" }, { "id": "Gagliardi-Laura", "name": { "family": "Gagliardi", "given": "Laura" }, "orcid": "0000-0001-5227-1396" }, { "id": "Anandkumar-A", "name": { "family": "Anandkumar", "given": "Anima" }, "orcid": "0000-0002-6974-6797" }, { "id": "Borgs-Christian", "name": { "family": "Borgs", "given": "Christian" }, "orcid": "0000-0001-5653-0498" }, { "id": "Chayes-Jennifer-T", "name": { "family": "Chayes", "given": "Jennifer T." }, "orcid": "0000-0003-4020-8618" }, { "id": "Yaghi-Omar-M", "name": { "family": "Yaghi", "given": "Omar M." }, "orcid": "0000-0002-5611-3325" } ] }, "title": "Shaping the Water-Harvesting Behavior of Metal\u2013Organic Frameworks Aided by Fine-Tuned GPT Models", "ispublished": "pub", "full_text_status": "public", "keywords": "Colloid and Surface Chemistry; Biochemistry; General Chemistry; Catalysis", "note": "\u00a9 2023 American Chemical Society.
\n\n
This material is based upon work supported by the Defense Advanced Research Projects Agency (DARPA) under contract HR0011-21-C-0020. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of DARPA. The computational work is partially supported by the Department of Energy (DOE), Office of Basic Energy Sciences, Division of Chemical Sciences, Geosciences, and Biosciences, under award DE-SC0023454. In addition, the National Science Foundation (NSF), Division of Chemistry, Chemical Structure, Dynamics, and Mechanisms A (CSDM\u2013A), provided support for the computational resources, award number: CHE-2223442. The authors also extend their gratitude to the Research Computing Center at the University of Chicago for providing computational resources. Additionally, this research utilized the facilities of the Advanced Light Source, a DOE Office of Science User Facility, under contract no. DE-AC02-05CH11231. The study made use of instruments located in the College of Chemistry Nuclear Magnetic Resonance (NMR) Facility, partially supported by NIH S10OD024998. The authors are grateful to Dr. Seth Cohen (DARPA) and Dr. David Moore (General Electric) for their helpful comments and suggestions on this work. Moreover, Z.Z. expresses gratitude to Drs. Nikita Hanikel and Daria Kurandina, Ms. Oufan Zhang, and Mr. Boyu Qie for their valuable discussions. Z.Z. also acknowledges financial support from a Kavli ENSI Graduate Student Fellowship.
\n\nThe manuscript was written through contributions of all authors. All authors have given approval to the final version of the manuscript.
\n\nThe authors declare the following competing financial interest(s): Omar M. Yaghi is co-founder of ATOCO Inc., aiming at commercializing related technologies.
\n\nCCDC 2302011 (LAMOF-2) contains the supplementary crystallographic data for this paper.
", "abstract": "We construct a data set of metal\u2013organic framework (MOF) linkers and employ a fine-tuned GPT assistant to propose MOF linker designs by mutating and modifying the existing linker structures. This strategy allows the GPT model to learn the intricate language of chemistry in molecular representations, thereby achieving an enhanced accuracy in generating linker structures compared with its base models. Aiming to highlight the significance of linker design strategies in advancing the discovery of water-harvesting MOFs, we conducted a systematic MOF variant expansion upon state-of-the-art MOF-303 utilizing a multidimensional approach that integrates linker extension with multivariate tuning strategies. We synthesized a series of isoreticular aluminum MOFs, termed Long-Arm MOFs (LAMOF-1 to LAMOF-10), featuring linkers that bear various combinations of heteroatoms in their five-membered ring moiety, replacing pyrazole with either thiophene, furan, or thiazole rings or a combination of two. Beyond their consistent and robust architecture, as demonstrated by permanent porosity and thermal stability, the LAMOF series offers a generalizable synthesis strategy. Importantly, these 10 LAMOFs establish new benchmarks for water uptake (up to 0.64 g g\u207b\u00b9) and operational humidity ranges (between 13 and 53%), thereby expanding the diversity of water-harvesting MOFs.
", "date": "2023-12-13", "date_type": "published", "publication": "Journal of the American Chemical Society", "publisher": "American Chemical Society", "issn": "0002-7863", "official_url": "https://authors.library.caltech.edu/records/ewmpw-3r017", "funders": { "items": [ { "grant_number": "HR0011-21-C-0020" }, { "grant_number": "DE-SC0023454" }, { "grant_number": "CHE-2223442" }, { "grant_number": "DE-AC02-05CH11231" }, { "grant_number": "S10OD024998" }, {} ] }, "doi": "10.1021/jacs.3c12086", "primary_object": { "basename": "ja3c12086_si_001.pdf", "url": "https://authors.library.caltech.edu/records/ewmpw-3r017/files/ja3c12086_si_001.pdf" }, "related_objects": [ { "basename": "ja3c12086_si_002.zip", "url": "https://authors.library.caltech.edu/records/ewmpw-3r017/files/ja3c12086_si_002.zip" } ], "resource_type": "article", "pub_year": "2023", "author_list": "Zheng, Zhiling; Alawadhi, Ali H.; et el." }, { "id": "https://authors.library.caltech.edu/records/wygt1-n8w76", "eprint_status": "archive", "datestamp": "2023-12-19 20:25:45", "lastmod": "2023-12-19 20:25:45", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Liu-Shengchao", "name": { "family": "Liu", "given": "Shengchao" }, "orcid": "0000-0003-2030-2367" }, { "id": "Nie-Weili", "name": { "family": "Nie", "given": "Weili" } }, { "id": "Wang-Chengpeng", "name": { "family": "Wang", "given": "Chengpeng" }, "orcid": "0000-0002-9196-2613" }, { "id": "Lu-Jiarui", "name": { "family": "Lu", "given": "Jiarui" } }, { "id": "Qiao-Zhuoran", "name": { "family": "Qiao", "given": "Zhuoran" } }, { "id": "Liu-Ling", "name": { "family": "Liu", "given": "Ling" } }, { "id": "Tang-Jian", "name": { "family": "Tang", "given": "Jian" } }, { "id": "Xiao-Chaowei", "name": { "family": "Xiao", "given": "Chaowei" } }, { "id": "Anandkumar-A", "name": { "family": "Anandkumar", "given": "Animashree" }, "orcid": "0000-0002-6974-6797" } ] }, "title": "Multi-modal molecule structure\u2013text model for text-based 
retrieval and editing", "ispublished": "pub", "full_text_status": "public", "keywords": "Artificial Intelligence; Computer Networks and Communications; Computer Vision and Pattern Recognition; Human-Computer Interaction; Software", "note": "
\u00a9 The Author(s), under exclusive licence to Springer Nature Limited 2023.
\n\nThis work was done during S.L.'s internship at NVIDIA Research. We thank the insightful comments from M. L. Gill, A. Stern and other team members from AIAlgo and Clara team at NVIDIA. We also thank the kind help from T. Dierks, E. Bolton, P. Thiessen and others from PubChem for confirming the PubChem license.
\n\nThese authors jointly supervised this work: Jian Tang, Chaowei Xiao, Animashree Anandkumar.
S.L., W.N., C.W., Z.Q., C.X. and A.A. conceived and designed the experiments. S.L. performed the experiments. S.L. and C.W. analysed the data. S.L., C.W. and J.L. contributed analysis tools. S.L., W.N., C.W., J.L., Z.Q., L.L., J.T., C.X. and A.A. wrote the paper. J.T., C.X. and A.A. contributed equally to advising this project.
\n\nAll the datasets are provided on Hugging Face at https://huggingface.co/datasets/chao1224/MoleculeSTM/tree/main. Specifically for the release of PubChemSTM, we encountered a big challenge regarding the textual data license. As confirmed with the PubChem group, performing research on these data does not violate their license; however, PubChem does not possess the license for the textual data, which necessitates an extensive evaluation of the license for each of the 280 structure\u2013text pairs in PubChemSTM. This has hindered the release of PubChemSTM. Nevertheless, we have (1) described the detailed preprocessing steps in Supplementary Section A.1, (2) provided the molecules with CID file (https://huggingface.co/datasets/chao1224/MoleculeSTM/blob/main/PubChemSTM_data/raw/CID2SMILES.csv) in PubChemSTM and (3) have also provided the detailed preprocessing scripts (https://github.com/chao1224/MoleculeSTM/tree/main/preprocessing/PubChemSTM). By utilizing these scripts, users can easily reconstruct the PubChemSTM dataset.
\n\nThe source code can be found on GitHub (https://github.com/chao1224/MoleculeSTM/tree/main) and Zenodo62. The scripts for pretraining and three downstream tasks are provided at https://github.com/chao1224/MoleculeSTM/tree/main/scripts. The checkpoints of the pretrained models are provided on Hugging Face at https://huggingface.co/chao1224/MoleculeSTM/tree/main. Beyond the methods described so far, to help users try our MoleculeSTM model, this release includes demos in notebooks (https://github.com/chao1224/MoleculeSTM). Furthermore, users can customize their own datasets by checking the datasets folder (https://github.com/chao1224/MoleculeSTM/tree/main/MoleculeSTM/datasets).
\n\nThe authors declare no competing interests.
", "abstract": "There is increasing adoption of artificial intelligence in drug discovery. However, existing studies use machine learning to mainly utilize the chemical structures of molecules but ignore the vast textual knowledge available in chemistry. Incorporating textual knowledge enables us to realize new drug design objectives, adapt to text-based instructions and predict complex biological activities. Here we present a multi-modal molecule structure\u2013text model, MoleculeSTM, by jointly learning molecules' chemical structures and textual descriptions via a contrastive learning strategy. To train MoleculeSTM, we construct a large multi-modal dataset, namely, PubChemSTM, with over 280,000 chemical structure\u2013text pairs. To demonstrate the effectiveness and utility of MoleculeSTM, we design two challenging zero-shot tasks based on text instructions, including structure\u2013text retrieval and molecule editing. MoleculeSTM has two main properties: open vocabulary and compositionality via natural language. In experiments, MoleculeSTM obtains the state-of-the-art generalization ability to novel biochemical concepts across various benchmarks.
", "date": "2023-12", "date_type": "published", "publication": "Nature Machine Intelligence", "volume": "5", "number": "12", "publisher": "Nature Publishing Group", "pagerange": "1447-1457", "issn": "2522-5839", "official_url": "https://authors.library.caltech.edu/records/wygt1-n8w76", "funders": { "items": [ { "agency": "Bren Named Chair" } ] }, "doi": "10.1038/s42256-023-00759-6", "primary_object": { "basename": "42256_2023_759_MOESM3_ESM.txt", "url": "https://authors.library.caltech.edu/records/wygt1-n8w76/files/42256_2023_759_MOESM3_ESM.txt" }, "related_objects": [ { "basename": "42256_2023_759_MOESM1_ESM.pdf", "url": "https://authors.library.caltech.edu/records/wygt1-n8w76/files/42256_2023_759_MOESM1_ESM.pdf" }, { "basename": "42256_2023_759_MOESM2_ESM.txt", "url": "https://authors.library.caltech.edu/records/wygt1-n8w76/files/42256_2023_759_MOESM2_ESM.txt" } ], "resource_type": "article", "pub_year": "2023", "author_list": "Liu, Shengchao; Nie, Weili; et el." }, { "id": "https://authors.library.caltech.edu/records/yn54g-8d682", "eprint_id": 121054, "eprint_status": "archive", "datestamp": "2023-08-22 21:01:53", "lastmod": "2023-10-18 18:08:02", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Kiyasseh-Dani", "name": { "family": "Kiyasseh", "given": "Dani" }, "orcid": "0000-0002-2898-1790" }, { "id": "Ma-Runzhuo", "name": { "family": "Ma", "given": "Runzhuo" }, "orcid": "0000-0001-6381-2661" }, { "id": "Haque-Taseen-F", "name": { "family": "Haque", "given": "Taseen F." }, "orcid": "0000-0002-7165-6539" }, { "id": "Miles-Brian-J", "name": { "family": "Miles", "given": "Brian J." }, "orcid": "0000-0001-7927-9873" }, { "id": "Wagner-Christian", "name": { "family": "Wagner", "given": "Christian" } }, { "id": "Donoho-Daniel-A", "name": { "family": "Donoho", "given": "Daniel A." 
}, "orcid": "0000-0002-0531-1436" }, { "id": "Anandkumar-A", "name": { "family": "Anandkumar", "given": "Animashree" }, "orcid": "0000-0002-6974-6797" }, { "id": "Hung-Andrew-J", "name": { "family": "Hung", "given": "Andrew J." }, "orcid": "0000-0002-7201-6736" } ] }, "title": "A vision transformer for decoding surgeon activity from surgical videos", "ispublished": "pub", "full_text_status": "public", "keywords": "Computer Science Applications; Biomedical Engineering; Medicine (miscellaneous); Bioengineering; Biotechnology", "note": "\u00a9 2023. The Author(s). This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. \n\nWe are grateful to T. Chu for the annotation of videos with gestures. We also thank J. Laca and J. Nguyen for early feedback on the presentation of the manuscript. A.J.H. discloses support for the research described in this study from the National Cancer Institute under award no. R01CA251579-01A1 and a multi-year Intuitive Surgical Clinical Research Grant. \n\nContributions. D.K. and A.J.H. contributed to the conception of the study. D.K. contributed to the study design, developed the deep learning models and wrote the manuscript. R.M. and T.H. 
provided annotations for the video samples. D.A.D. provided extensive feedback on the manuscript. B.J.M. provided data for the study. C.W. collected data from SAH and provided feedback on the manuscript. A.J.H. and A.A. provided supervision and contributed to edits of the manuscript. \n\nData availability. Data supporting the results in this study involve surgeon and patient data. As such, while the data from SAH and HMH are not publicly available, de-identified data from USC can be made available upon reasonable request from the authors. \n\nCode availability. Code is made available at https://github.com/danikiyasseh/SAIS.\n\nCompeting interests. D.K. is a paid employee of Vicarious Surgical and a consultant of Flatiron Health. C.W. is a paid consultant of Intuitive Surgical. A.A. is an employee of Nvidia. A.J.H. is a consultant of Intuitive Surgical. The other authors declare no competing interests.\n\nPublished - 41551_2023_Article_1010.pdf
Supplemental Material - 41551_2023_1010_MOESM1_ESM.pdf
", "abstract": "The intraoperative activity of a surgeon has substantial impact on postoperative outcomes. However, for most surgical procedures, the details of intraoperative surgical actions, which can vary widely, are not well understood. Here we report a machine learning system leveraging a vision transformer and supervised contrastive learning for the decoding of elements of intraoperative surgical activity from videos commonly collected during robotic surgeries. The system accurately identified surgical steps, actions performed by the surgeon, the quality of these actions and the relative contribution of individual video frames to the decoding of the actions. Through extensive testing on data from three different hospitals located in two different continents, we show that the system generalizes across videos, surgeons, hospitals and surgical procedures, and that it can provide information on surgical gestures and skills from unannotated videos. Decoding intraoperative activity via accurate machine learning systems could be used to provide surgeons with feedback on their operating skills, and may allow for the identification of optimal surgical behaviour and for the study of relationships between intraoperative factors and postoperative outcomes.", "date": "2023-06", "date_type": "published", "publication": "Nature Biomedical Engineering", "volume": "7", "number": "6", "publisher": "Nature Publishing Group", "pagerange": "780-796", "id_number": "CaltechAUTHORS:20230420-711199500.6", "issn": "2157-846X", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20230420-711199500.6", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NIH", "grant_number": "R01CA251579-01A1" } ] }, "doi": "10.1038/s41551-023-01010-8", "pmcid": "PMC10307635", "primary_object": { "basename": "41551_2023_1010_MOESM1_ESM.pdf", "url": 
"https://authors.library.caltech.edu/records/yn54g-8d682/files/41551_2023_1010_MOESM1_ESM.pdf" }, "related_objects": [ { "basename": "41551_2023_Article_1010.pdf", "url": "https://authors.library.caltech.edu/records/yn54g-8d682/files/41551_2023_Article_1010.pdf" } ], "resource_type": "article", "pub_year": "2023", "author_list": "Kiyasseh, Dani; Ma, Runzhuo; et el." }, { "id": "https://authors.library.caltech.edu/records/ehv4k-4pn80", "eprint_id": 121091, "eprint_status": "archive", "datestamp": "2023-08-22 20:36:25", "lastmod": "2023-10-18 18:09:06", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Kiyasseh-Dani", "name": { "family": "Kiyasseh", "given": "Dani" }, "orcid": "0000-0002-2898-1790" }, { "id": "Laca-Jasper-A", "name": { "family": "Laca", "given": "Jasper" } }, { "id": "Haque-Taseen-F", "name": { "family": "Haque", "given": "Taseen F." }, "orcid": "0000-0002-7165-6539" }, { "id": "Otiato-Maxwell-X", "name": { "family": "Otiato", "given": "Maxwell" }, "orcid": "0000-0001-6979-6316" }, { "id": "Miles-Brian-J", "name": { "family": "Miles", "given": "Brian J." }, "orcid": "0000-0001-7927-9873" }, { "id": "Wagner-Christian", "name": { "family": "Wagner", "given": "Christian" } }, { "id": "Donoho-Daniel-A", "name": { "family": "Donoho", "given": "Daniel A." }, "orcid": "0000-0002-0531-1436" }, { "id": "Trinh-Quoc-Dien", "name": { "family": "Trinh", "given": "Quoc-Dien" }, "orcid": "0000-0003-3857-9276" }, { "id": "Anandkumar-A", "name": { "family": "Anandkumar", "given": "Animashree" }, "orcid": "0000-0002-6974-6797" }, { "id": "Hung-Andrew-J", "name": { "family": "Hung", "given": "Andrew J." 
}, "orcid": "0000-0002-7201-6736" } ] }, "title": "Human visual explanations mitigate bias in AI-based assessment of surgeon skills", "ispublished": "pub", "full_text_status": "public", "keywords": "Health Information Management; Health Informatics; Computer Science Applications; Medicine (miscellaneous)", "note": "\u00a9 The Author(s) 2023. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. \n\nResearch reported in this publication was supported by the National Cancer Institute under Award No. R01CA251579-01A1. \n\nContributions. D.K. contributed to the conception of the study and the study design, developed the deeplearning models, and wrote the manuscript. J.L. collected the data from the training environment. D.K., J.L., T.F.H., and M.O. provided annotations for the video samples. D.A.D. and Q.-D.T. provided feedback on the manuscript. C.W. collected data from St. Antonius Hospital and B.J.M. collected data from Houston Methodist Hospital and provided feedback on the manuscript. A.J.H. and A.A. provided supervision and contributed to edits of the manuscript. \n\nData availability. The videos of live surgical procedures from the University of Southern California, St. 
Antonius Hospital, and Houston Methodist Hospital are not publicly available. However, the videos and the corresponding annotations of the suturing activities performed by medical students in the training environment are available upon reasonable request from the authors. \n\nCode availability. All models were developed using Python and standard deeplearning libraries such as PyTorch61. The code for the underlying model (SAIS) can be accessed at https://github.com/danikiyasseh/SAIS and that for TWIX can be accessed at https://github.com/danikiyasseh/TWIX. \n\nCompeting interests. The authors declare no competing non-financial interests but the following competing financial interests: D.K. is a paid consultant of Flatiron Health and an employee of Vicarious Surgical, C.W. is a paid consultant of Intuitive Surgical, A.A. is an employee of Nvidia, and A.J.H is a consultant of Intuitive Surgical.\n\nPublished - 41746_2023_Article_766.pdf
Supplemental Material - 41746_2023_766_MOESM1_ESM.pdf
", "abstract": "Artificial intelligence (AI) systems can now reliably assess surgeon skills through videos of intraoperative surgical activity. With such systems informing future high-stakes decisions such as whether to credential surgeons and grant them the privilege to operate on patients, it is critical that they treat all surgeons fairly. However, it remains an open question whether surgical AI systems exhibit bias against surgeon sub-cohorts, and, if so, whether such bias can be mitigated. Here, we examine and mitigate the bias exhibited by a family of surgical AI systems\u2014SAIS\u2014deployed on videos of robotic surgeries from three geographically-diverse hospitals (USA and EU). We show that SAIS exhibits an underskilling bias, erroneously downgrading surgical performance, and an overskilling bias, erroneously upgrading surgical performance, at different rates across surgeon sub-cohorts. To mitigate such bias, we leverage a strategy \u2014TWIX\u2014which teaches an AI system to provide a visual explanation for its skill assessment that otherwise would have been provided by human experts. We show that whereas baseline strategies inconsistently mitigate algorithmic bias, TWIX can effectively mitigate the underskilling and overskilling bias while simultaneously improving the performance of these AI systems across hospitals. We discovered that these findings carry over to the training environment where we assess medical students' skills today. Our study is a critical prerequisite to the eventual implementation of AI-augmented global surgeon credentialing programs, ensuring that all surgeons are treated fairly.", "date": "2023-04-04", "date_type": "published", "publication": "npj Digital Medicine", "volume": "6", "publisher": "Nature Publishing Group", "pagerange": "Art. No. 
54", "id_number": "CaltechAUTHORS:20230420-614686900.13", "issn": "2398-6352", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20230420-614686900.13", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NIH", "grant_number": "R01CA251579-01A1" } ] }, "doi": "10.1038/s41746-023-00766-2", "pmcid": "PMC10063676", "primary_object": { "basename": "41746_2023_766_MOESM1_ESM.pdf", "url": "https://authors.library.caltech.edu/records/ehv4k-4pn80/files/41746_2023_766_MOESM1_ESM.pdf" }, "related_objects": [ { "basename": "41746_2023_Article_766.pdf", "url": "https://authors.library.caltech.edu/records/ehv4k-4pn80/files/41746_2023_Article_766.pdf" } ], "resource_type": "article", "pub_year": "2023", "author_list": "Kiyasseh, Dani; Laca, Jasper; et el." }, { "id": "https://authors.library.caltech.edu/records/25yyn-6ch14", "eprint_id": 117191, "eprint_status": "archive", "datestamp": "2023-08-22 20:29:45", "lastmod": "2023-10-24 22:01:50", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Inouye-Daniel-A", "name": { "family": "Inouye", "given": "Daniel A." }, "orcid": "0000-0001-7202-4800" }, { "id": "Ma-Runzhuo", "name": { "family": "Ma", "given": "Runzhuo" }, "orcid": "0000-0001-6381-2661" }, { "id": "Nguyen-Jessica-H", "name": { "family": "Nguyen", "given": "Jessica H." }, "orcid": "0000-0003-0454-8463" }, { "id": "Laca-Jasper-A", "name": { "family": "Laca", "given": "Jasper" } }, { "id": "Kocielnik-Rafal", "name": { "family": "Kocielnik", "given": "Rafal" }, "orcid": "0000-0001-5602-6056" }, { "id": "Anandkumar-A", "name": { "family": "Anandkumar", "given": "Anima" }, "orcid": "0000-0002-6974-6797" }, { "id": "Hung-Andrew-J", "name": { "family": "Hung", "given": "Andrew J." 
}, "orcid": "0000-0002-7201-6736" } ] }, "title": "Assessing the efficacy of dissection gestures in robotic surgery", "ispublished": "pub", "full_text_status": "public", "keywords": "Health Informatics; Surgery", "note": "\u00a9 2023 Springer Nature. \n\nThe authors declare that no support, financial or otherwise, were received for the preparation of this manuscript. \n\nContributions. D Inouye: project development, data collection and management, data analysis, manuscript writing and editing. R Ma: project development, data management, data analysis, manuscript writing and editing. J Nguyen: project development, data analysis, manuscript writing and editing. J Laca: project development, data analysis. R Kocielnik: project development, data analysis. A Anandkumar: project development. A Hung: project development, data analysis, manuscript writing and editing. \n\nEthics approval. This study was approved by the University of Southern California's Institutional Review Board (protocol HS-16\u201300,318). \n\nInformed consent was obtained per Institutional Review Board protocol. \n\nConflict of interest. Andrew J. Hung has financial disclosures with Intuitive Surgical, Inc.", "abstract": "Our group previously defined a dissection gesture classification system that deconstructs robotic tissue dissection into its most elemental yet meaningful movements. The purpose of this study was to expand upon this framework by adding an assessment of gesture efficacy (ineffective, effective, or erroneous) and analyze dissection patterns between groups of surgeons of varying experience. We defined three possible gesture efficacies as ineffective (no meaningful effect on the tissue), effective (intended effect on the tissue), and erroneous (unintended disruption of the tissue). Novices (0 prior robotic cases), intermediates (1\u201399 cases), and experts (\u2265\u2009100 cases) completed a robotic dissection task in a dry-lab training environment. 
Video recordings were reviewed to classify each gesture and determine its efficacy, then dissection patterns between groups were analyzed. 23 participants completed the task, with 9 novices, 8 intermediates with median caseload 60 (IQR 41\u201380), and 6 experts with median caseload 525 (IQR 413\u2013900). For gesture selection, we found increasing experience associated with increasing proportion of overall dissection gestures (p\u2009=\u20090.009) and decreasing proportion of retraction gestures (p\u2009=\u20090.009). For gesture efficacy, novices performed the greatest proportion of ineffective gestures (9.8%, p\u2009<\u20090.001), intermediates commit the greatest proportion of erroneous gestures (26.8%, p\u2009<\u20090.001), and the three groups performed similar proportions of overall effective gestures, though experts performed the greatest proportion of effective retraction gestures (85.6%, p\u2009<\u20090.001). Between groups of experience, we found significant differences in gesture selection and gesture efficacy. These relationships may provide insight into further improving surgical training.", "date": "2023-04", "date_type": "published", "publication": "Journal of Robotic Surgery", "volume": "17", "number": "2", "publisher": "Springer", "pagerange": "597-603", "id_number": "CaltechAUTHORS:20220930-482429300.5", "issn": "1863-2491", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20220930-482429300.5", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "doi": "10.1007/s11701-022-01458-x", "resource_type": "article", "pub_year": "2023", "author_list": "Inouye, Daniel A.; Ma, Runzhuo; et al." 
}, { "id": "https://authors.library.caltech.edu/records/2ybre-3g121", "eprint_id": 120697, "eprint_status": "archive", "datestamp": "2023-08-22 20:31:15", "lastmod": "2023-10-23 20:29:34", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Wen-Gege", "name": { "family": "Wen", "given": "Gege" }, "orcid": "0000-0003-1668-3777" }, { "id": "Li-Zongyi", "name": { "family": "Li", "given": "Zongyi" }, "orcid": "0000-0003-2081-9665" }, { "id": "Long-Qirui", "name": { "family": "Long", "given": "Qirui" }, "orcid": "0000-0002-6572-4021" }, { "id": "Azizzadenesheli-Kamyar", "name": { "family": "Azizzadenesheli", "given": "Kamyar" }, "orcid": "0000-0001-8507-1868" }, { "id": "Anandkumar-A", "name": { "family": "Anandkumar", "given": "Anima" }, "orcid": "0000-0002-6974-6797" }, { "id": "Benson-Sally-M", "name": { "family": "Benson", "given": "Sally M." }, "orcid": "0000-0002-3733-4296" } ] }, "title": "Real-time high-resolution CO\u2082 geological storage prediction using nested Fourier neural operators", "ispublished": "pub", "full_text_status": "public", "keywords": "Pollution; Nuclear Energy and Engineering; Renewable Energy, Sustainability and the Environment; Environmental Chemistry", "note": "\u00a9 The Royal Society of Chemistry 2023. \n\nThe authors gratefully acknowledge Yanhua Yuan from ExxonMobil for many helpful conversations and suggestions. G. W. and S. B. gratefully acknowledge the support by ExxonMobil through the Strategic Energy Alliance at Stanford University and the Stanford Center for Carbon Storage. Z. L. gratefully acknowledges the financial support from the Kortschak Scholars, PIMCO Fellows, and Amazon AI4Science Fellows programs. A. A. is supported in part by Bren endowed chair. \n\nAuthor contributions. G. W. conceptualization, methodology, software, data acquisition, data curation, formal analysis, investigation, validation, visualization, writing \u2013 original draft, writing \u2013 review & editing. Z. L. 
methodology, investigation, validation, writing \u2013 original draft, writing \u2013 review & editing. Q. L. data acquisition. K. A. methodology, software, investigation, validation, writing \u2013 review & editing. A. A. funding acquisition, supervision, writing \u2013 review & editing. S. B. conceptualization, formal analysis, funding acquisition, methodology, resources, supervision, writing \u2013 review & editing. \n\nData and code availability. The python code for the Nested FNO model architecture and the data set used in training will be available at GitHub repository (https://github.com/gegewen/nested-fno). \n\nWeb application. The trained Nested FNO model will be hosted in web application https://CCSNet.ai (https://ccsnet.ai) to provide real-time predictions upon the publication of this manuscript. Please also see this link for a demonstration of publicly accessible web application for our previous works. \n\nThere are no conflicts to declare.", "abstract": "Carbon capture and storage (CCS) plays an essential role in global decarbonization. Scaling up CCS deployment requires accurate and high-resolution modeling of the storage reservoir pressure buildup and the gaseous plume migration. However, such modeling is very challenging at scale due to the high computational costs of existing numerical methods. This challenge leads to significant uncertainties in evaluating storage opportunities, which can delay the pace of large-scale CCS deployment. We introduce Nested Fourier Neural Operator (FNO), a machine-learning framework for high-resolution dynamic 3D CO\u2082 storage modeling at a basin scale. Nested FNO produces forecasts at different refinement levels using a hierarchy of FNOs and speeds up flow prediction nearly 700\u2006000 times compared to existing methods. 
By learning the solution operator for the family of governing partial differential equations, Nested FNO creates a general-purpose numerical simulator alternative for CO\u2082 storage with diverse reservoir conditions, geological heterogeneity, and injection schemes. Our framework enables unprecedented real-time modeling and probabilistic simulations that can support the scale-up of global CCS deployment.", "date": "2023-04", "date_type": "published", "publication": "Energy and Environmental Science", "volume": "16", "number": "4", "publisher": "Royal Society of Chemistry", "pagerange": "1732-1741", "id_number": "CaltechAUTHORS:20230404-448520900.5", "issn": "1754-5692", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20230404-448520900.5", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "ExxonMobil" }, { "agency": "Stanford University" }, { "agency": "Kortschak Scholars Program" }, { "agency": "PIMCO" }, { "agency": "Amazon AI4Science Fellowship" }, { "agency": "Bren Professor of Computing and Mathematical Sciences" } ] }, "doi": "10.1039/d2ee04204e", "resource_type": "article", "pub_year": "2023", "author_list": "Wen, Gege; Li, Zongyi; et al." }, { "id": "https://authors.library.caltech.edu/records/w21s1-f9826", "eprint_id": 121271, "eprint_status": "archive", "datestamp": "2023-08-20 16:45:22", "lastmod": "2023-10-20 15:23:59", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Kiyasseh-Dani", "name": { "family": "Kiyasseh", "given": "Dani" }, "orcid": "0000-0002-2898-1790" }, { "id": "Laca-Jasper-A", "name": { "family": "Laca", "given": "Jasper" } }, { "id": "Haque-Taseen-F", "name": { "family": "Haque", "given": "Taseen F." }, "orcid": "0000-0002-7165-6539" }, { "id": "Miles-Brian-J", "name": { "family": "Miles", "given": "Brian J." 
}, "orcid": "0000-0001-7927-9873" }, { "id": "Wagner-Christian", "name": { "family": "Wagner", "given": "Christian" } }, { "id": "Donoho-Daniel-A", "name": { "family": "Donoho", "given": "Daniel A." }, "orcid": "0000-0002-0531-1436" }, { "id": "Anandkumar-A", "name": { "family": "Anandkumar", "given": "Animashree" }, "orcid": "0000-0002-6974-6797" }, { "id": "Hung-Andrew-J", "name": { "family": "Hung", "given": "Andrew J." }, "orcid": "0000-0002-7201-6736" } ] }, "title": "A multi-institutional study using artificial intelligence to provide reliable and fair feedback to surgeons", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 The Author(s) 2023. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. \n\nResearch reported in this publication was supported by the National Cancer Institute under Award No. R01CA251579-01A1. \n\nContributions. D.K. contributed to the conception of the study and the study design, developed the deep learning models, and wrote the manuscript. J.L. collected the data from the training environment. D.K., J.L., T.H., and M.O. provided annotations for the video samples. D.A.D. provided feedback on the manuscript. C.W. collected data from St. 
Antonius-Hospital and B.J.M. collected data from Houston Methodist Hospital, and provided feedback on the manuscript. A.J.H., and A.A. provided supervision and contributed to edits of the manuscript. \n\nData availability. As the data contain protected health information, the videos of live surgical procedures and the patients' corresponding demographic information from the University of Southern California, St. Antonius Hospital, and Houston Methodist Hospital are not publicly available. However, since the data from the training environment do not involve patients, those videos and annotations are available on Zenodo (https://zenodo.org/record/7221656#.Y-ZIfi_MI2y) upon reasonable request from the authors. Source data for Fig. 1 is in Supplementary Data 1. Source data for Fig. 3 is in Supplementary Data 2. Source data for Fig. 4 is in Supplementary Data 3 and 4. Source data for Fig. 5 is in Supplementary Data 5. \n\nCode availability. While SAIS, the underlying AI system, can be accessed at https://github.com/danikiyasseh/SAIS, the code for the existing study can be found at https://github.com/danikiyasseh/TWIX. \n\nCompeting interests. The authors declare the following competing interests: D.K. is a paid consultant of Flatiron Health and an employee of Vicarious Surgical. C.W. is a paid consultant of Intuitive Surgical. A.A. is an employee of Nvidia. A.J.H is a consultant of Intuitive Surgical. The remaining authors declare no competing interests.\n\nPublished - 43856_2023_Article_263.pdf
Supplemental Material - 43856_2023_263_MOESM1_ESM.csv
Supplemental Material - 43856_2023_263_MOESM2_ESM.csv
Supplemental Material - 43856_2023_263_MOESM3_ESM.xlsx
Supplemental Material - 43856_2023_263_MOESM4_ESM.xlsx
Supplemental Material - 43856_2023_263_MOESM5_ESM.xlsx
Supplemental Material - 43856_2023_263_MOESM6_ESM.pdf
Supplemental Material - 43856_2023_263_MOESM7_ESM.pdf
", "abstract": "Background. Surgeons who receive reliable feedback on their performance quickly master the skills necessary for surgery. Such performance-based feedback can be provided by a recently-developed artificial intelligence (AI) system that assesses a surgeon's skills based on a surgical video while simultaneously highlighting aspects of the video most pertinent to the assessment. However, it remains an open question whether these highlights, or explanations, are equally reliable for all surgeons. \n \nMethods. Here, we systematically quantify the reliability of AI-based explanations on surgical videos from three hospitals across two continents by comparing them to explanations generated by humans experts. To improve the reliability of AI-based explanations, we propose the strategy of training with explanations \u2013TWIX \u2013which uses human explanations as supervision to explicitly teach an AI system to highlight important video frames. \n \nResults. We show that while AI-based explanations often align with human explanations, they are not equally reliable for different sub-cohorts of surgeons (e.g., novices vs. experts), a phenomenon we refer to as an explanation bias. We also show that TWIX enhances the reliability of AI-based explanations, mitigates the explanation bias, and improves the performance of AI systems across hospitals. These findings extend to a training environment where medical students can be provided with feedback today. \n \nConclusions. Our study informs the impending implementation of AI-augmented surgical training and surgeon credentialing programs, and contributes to the safe and fair democratization of surgery.", "date": "2023-03-30", "date_type": "published", "publication": "Communications Medicine", "volume": "3", "publisher": "Nature Publishing Group", "pagerange": "Art. No. 
42", "id_number": "CaltechAUTHORS:20230502-987371300.6", "issn": "2730-664X", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20230502-987371300.6", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NIH", "grant_number": "R01CA251579-01A1" } ] }, "doi": "10.1038/s43856-023-00263-3", "pmcid": "PMC10063640", "primary_object": { "basename": "43856_2023_Article_263.pdf", "url": "https://authors.library.caltech.edu/records/w21s1-f9826/files/43856_2023_Article_263.pdf" }, "related_objects": [ { "basename": "43856_2023_263_MOESM1_ESM.csv", "url": "https://authors.library.caltech.edu/records/w21s1-f9826/files/43856_2023_263_MOESM1_ESM.csv" }, { "basename": "43856_2023_263_MOESM2_ESM.csv", "url": "https://authors.library.caltech.edu/records/w21s1-f9826/files/43856_2023_263_MOESM2_ESM.csv" }, { "basename": "43856_2023_263_MOESM3_ESM.xlsx", "url": "https://authors.library.caltech.edu/records/w21s1-f9826/files/43856_2023_263_MOESM3_ESM.xlsx" }, { "basename": "43856_2023_263_MOESM4_ESM.xlsx", "url": "https://authors.library.caltech.edu/records/w21s1-f9826/files/43856_2023_263_MOESM4_ESM.xlsx" }, { "basename": "43856_2023_263_MOESM5_ESM.xlsx", "url": "https://authors.library.caltech.edu/records/w21s1-f9826/files/43856_2023_263_MOESM5_ESM.xlsx" }, { "basename": "43856_2023_263_MOESM6_ESM.pdf", "url": "https://authors.library.caltech.edu/records/w21s1-f9826/files/43856_2023_263_MOESM6_ESM.pdf" }, { "basename": "43856_2023_263_MOESM7_ESM.pdf", "url": "https://authors.library.caltech.edu/records/w21s1-f9826/files/43856_2023_263_MOESM7_ESM.pdf" } ], "resource_type": "article", "pub_year": "2023", "author_list": "Kiyasseh, Dani; Laca, Jasper; et al." 
}, { "id": "https://authors.library.caltech.edu/records/32nad-tmr69", "eprint_id": 117830, "eprint_status": "archive", "datestamp": "2023-08-22 19:06:49", "lastmod": "2023-10-24 22:40:39", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Hung-Andrew-J", "name": { "family": "Hung", "given": "Andrew J." }, "orcid": "0000-0002-7201-6736" }, { "id": "Bao-Richard", "name": { "family": "Bao", "given": "Richard" } }, { "id": "Sunmola-Idris-O", "name": { "family": "Sunmola", "given": "Idris O." } }, { "id": "Huang-De-An", "name": { "family": "Huang", "given": "De-An" }, "orcid": "0000-0002-6945-7768" }, { "id": "Nguyen-Jessica-H", "name": { "family": "Nguyen", "given": "Jessica H." }, "orcid": "0000-0003-0454-8463" }, { "id": "Anandkumar-A", "name": { "family": "Anandkumar", "given": "Anima" }, "orcid": "0000-0002-6974-6797" } ] }, "title": "Capturing fine-grained details for video-based automation of suturing skills assessment", "ispublished": "pub", "full_text_status": "public", "keywords": "Health Informatics; Radiology, Nuclear Medicine and imaging; General Medicine; Surgery; Computer Graphics and Computer-Aided Design; Computer Science Applications; Computer Vision and Pattern Recognition; Biomedical Engineering", "note": "\u00a9 2023 Springer Nature. \n\nWe thank Daniel Sanford, Balint Der, Ryan Hakim, Runzhuo Ma, and Taseen Haque for data collection and grading of technical skill scores through video review. Mimic Technologies, Inc. provided access to the raw kinematic instrument data for each exercise. Center for Robotic Simulation & Education, Catherine & Joseph Aresty Department of Urology, USC Institute of Urology, University of Southern California, Los Angeles, California.\n\nResearch reported in this publication was supported in part by the National Cancer Institute under Award No. R01CA251579-01A1. 
\n\nEthics approval: All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards. Our study complied with protocols put forth by the University of Southern California's IRB. \n\nInformed consent was obtained from all individual participants included in the study. \n\nStatement and Declarations: Andrew J. Hung has financial disclosures with Intuitive Surgical, Inc.\n\nAccepted Version - nihms-1870690.pdf
", "abstract": "Objectives: Manually-collected suturing technical skill scores are strong predictors of continence recovery after robotic radical prostatectomy. Herein, we automate suturing technical skill scoring through computer vision (CV) methods as a scalable method to provide feedback.\n\nMethods: Twenty-two surgeons completed a suturing exercise three times on the Mimic\u2122 Flex VR simulator. Instrument kinematic data (XYZ coordinates of each instrument and pose) were captured at 30 Hz. After standardized training, three human raters manually video segmented suturing task into four sub-stitch phases (Needle handling, Needle targeting, Needle driving, Needle withdrawal) and labeled the corresponding technical skill domains (Needle positioning, Needle entry, Needle driving, and Needle withdrawal). The CV framework extracted RGB features and optical flow frames using a pre-trained AlexNet. Additional CV strategies including auxiliary supervision (using kinematic data during training only) and attention mechanisms were implemented to improve performance.\n\nResults: This study included data from 15 expert surgeons (median caseload 300 [IQR 165\u2013750]) and 7 training surgeons (0 [IQR 0\u20138]). In all, 226 virtual sutures were captured. Automated assessments for Needle positioning performed best with the simplest approach (1 s video; AUC 0.749). Remaining skill domains exhibited improvements with the implementation of auxiliary supervision and attention mechanisms when deployed separately (AUC 0.604\u20130.794). All techniques combined produced the best performance, particularly for Needle driving and Needle withdrawal (AUC 0.959 and 0.879, respectively).\n\nConclusions: This study demonstrated the best performance of automated suturing technical skills assessment to date using advanced CV techniques. 
Future work will determine if a \"human in the loop\" is necessary to verify surgeon evaluations.", "date": "2023-03", "date_type": "published", "publication": "International Journal of Computer Assisted Radiology and Surgery", "volume": "18", "number": "3", "publisher": "Springer", "pagerange": "545-552", "id_number": "CaltechAUTHORS:20221110-430801400.16", "issn": "1861-6429", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20221110-430801400.16", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NIH", "grant_number": "R01CA251579-01A1" } ] }, "doi": "10.1007/s11548-022-02778-x", "pmcid": "PMC9975072", "primary_object": { "basename": "nihms-1870690.pdf", "url": "https://authors.library.caltech.edu/records/32nad-tmr69/files/nihms-1870690.pdf" }, "resource_type": "article", "pub_year": "2023", "author_list": "Hung, Andrew J.; Bao, Richard; et al." }, { "id": "https://authors.library.caltech.edu/records/rhktp-a6270", "eprint_id": 117475, "eprint_status": "archive", "datestamp": "2023-08-22 18:33:26", "lastmod": "2023-10-24 22:32:31", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Dommer-Abigail", "name": { "family": "Dommer", "given": "Abigail" }, "orcid": "0000-0003-4847-4136" }, { "id": "Casalino-Lorenzo", "name": { "family": "Casalino", "given": "Lorenzo" }, "orcid": "0000-0003-3581-1148" }, { "id": "Kearns-Fiona", "name": { "family": "Kearns", "given": "Fiona" }, "orcid": "0000-0002-5469-9035" }, { "name": { "family": "Rosenfeld", "given": "Mia" }, "orcid": "0000-0002-8961-8231" }, { "name": { "family": "Wauer", "given": "Nicholas" }, "orcid": "0000-0002-1230-9166" }, { "name": { "family": "Ahn", "given": "Surl-Hee" }, "orcid": "0000-0002-3422-805X" }, { "name": { "family": "Russo", "given": "John" }, "orcid": "0000-0002-2813-6554" }, { "name": { "family": "Oliveira", "given": "Sofia" }, "orcid": 
"0000-0001-8753-4950" }, { "name": { "family": "Morris", "given": "Clare" }, "orcid": "0000-0002-4314-5387" }, { "name": { "family": "Bogetti", "given": "Anthony" }, "orcid": "0000-0003-0610-2879" }, { "name": { "family": "Trifan", "given": "Anda" }, "orcid": "0000-0003-4808-9502" }, { "name": { "family": "Brace", "given": "Alexander" }, "orcid": "0000-0001-9873-9177" }, { "name": { "family": "Sztain", "given": "Terra" }, "orcid": "0000-0002-1327-8541" }, { "name": { "family": "Clyde", "given": "Austin" }, "orcid": "0000-0002-3697-7070" }, { "name": { "family": "Ma", "given": "Heng" }, "orcid": "0000-0002-7667-922X" }, { "name": { "family": "Chennubhotla", "given": "Chakra" }, "orcid": "0000-0002-0024-1627" }, { "name": { "family": "Lee", "given": "Hyungro" }, "orcid": "0000-0002-4221-7094" }, { "name": { "family": "Turilli", "given": "Matteo" }, "orcid": "0000-0003-0527-1435" }, { "name": { "family": "Khalid", "given": "Syma" }, "orcid": "0000-0002-3694-5044" }, { "name": { "family": "Tamayo-Mendoza", "given": "Teresa" } }, { "name": { "family": "Welborn", "given": "Matthew" }, "orcid": "0000-0001-8659-6535" }, { "name": { "family": "Christensen", "given": "Anders S." }, "orcid": "0000-0002-7253-6897" }, { "name": { "family": "Smith", "given": "Daniel G. A." }, "orcid": "0000-0001-8626-0900" }, { "id": "Qiao-Zhuoran", "name": { "family": "Qiao", "given": "Zhuoran" }, "orcid": "0000-0002-5704-7331" }, { "name": { "family": "Sirumalla", "given": "Sai K." 
} }, { "name": { "family": "O'Connor", "given": "Michael" } }, { "name": { "family": "Manby", "given": "Frederick" }, "orcid": "0000-0001-7611-714X" }, { "id": "Anandkumar-A", "name": { "family": "Anandkumar", "given": "Anima" }, "orcid": "0000-0002-6974-6797" }, { "name": { "family": "Hardy", "given": "David" } }, { "name": { "family": "Phillips", "given": "James" }, "orcid": "0000-0002-2296-3591" }, { "name": { "family": "Stern", "given": "Abraham" } }, { "name": { "family": "Romero", "given": "Josh" } }, { "name": { "family": "Clark", "given": "David" } }, { "name": { "family": "Dorrell", "given": "Mitchell" } }, { "name": { "family": "Maiden", "given": "Tom" } }, { "name": { "family": "Huang", "given": "Lei" } }, { "name": { "family": "McCalpin", "given": "John" }, "orcid": "0000-0002-2535-1355" }, { "name": { "family": "Woods", "given": "Christopher" }, "orcid": "0000-0001-6563-9903" }, { "name": { "family": "Gray", "given": "Alan" } }, { "name": { "family": "Williams", "given": "Matt" }, "orcid": "0000-0003-2198-1058" }, { "name": { "family": "Barker", "given": "Bryan" } }, { "name": { "family": "Rajapaksha", "given": "Harinda" } }, { "name": { "family": "Pitts", "given": "Richard" }, "orcid": "0000-0002-2037-3360" }, { "name": { "family": "Gibbs", "given": "Tom" } }, { "name": { "family": "Stone", "given": "John" }, "orcid": "0000-0001-7215-762X" }, { "name": { "family": "Zuckerman", "given": "Daniel M." }, "orcid": "0000-0001-7662-2031" }, { "name": { "family": "Mulholland", "given": "Adrian J." 
}, "orcid": "0000-0003-1015-4567" }, { "id": "Miller-T-F-III", "name": { "family": "Miller", "given": "Thomas F., III" }, "orcid": "0000-0002-1882-5380" }, { "name": { "family": "Jha", "given": "Shantenu" }, "orcid": "0000-0002-5040-026X" }, { "name": { "family": "Ramanathan", "given": "Arvind" }, "orcid": "0000-0002-1622-5488" }, { "name": { "family": "Chong", "given": "Lillian" }, "orcid": "0000-0002-0590-483X" }, { "name": { "family": "Amaro", "given": "Rommie E." }, "orcid": "0000-0002-9275-9553" } ] }, "title": "#COVIDisAirborne: AI-enabled multiscale computational microscopy of delta SARS-CoV-2 in a respiratory aerosol", "ispublished": "pub", "full_text_status": "public", "keywords": "Hardware and Architecture; Theoretical Computer Science; Software", "note": "\u00a9 The Author(s) 2022. This article is distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 License (https://creativecommons.org/licenses/by-nc/4.0/) which permits non-commercial use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access page (https://us.sagepub.com/en-us/nam/open-access-at-sage). \n\nWe thank Prof. Kim Prather for inspiring and informative discussions about aerosols and for her commitment to convey the airborne nature of SARS-CoV-2. We thank D. Veesler for sharing the Delta spike NTD coordinates in advance of publication. We thank B. Messer, D. Maxwell, and the Oak Ridge Leadership Computing Facility at Oak Ridge National Laboratory supported by the DOE under Contract DE-AC05-00OR22725. We thank the Texas Advanced Computing Center Frontera team, especially D. Stanzione and T. Cockerill, and for compute time made available through a Director's Discretionary Allocation (NSF OAC-1818253). We thank the Argonne Leadership Computing Facility supported by the DOE under DE-AC02-06CH11357. 
We thank the Pittsburgh Supercomputer Center for providing priority queues on Bridges-2 through the XSEDE allocation NSF TG-CHE060063. We thank N. Kern and J. Lee of the CHARMM-GUI support team for help converting topologies between NAMD and GROMACS. We thank J. Copperman, G. Simpson, D. Aristoff, and J. Leung for valuable discussions and support from NIH grant GM115805. NAMD and VMD are funded by NIH P41-GM104601. This work was supported by the NSF Center for Aerosol Impacts on Chemistry of the Environment (CAICE), National Science Foundation Center for Chemical Innovation (NSF CHE-1801971), as well as NIH GM132826, NSF RAPID MCB-2032054, an award from the RCSA Research Corp., a UC San Diego Moore's Cancer Center 2020 SARS-CoV-2 seed grant, to R.E.A. This work was also supported by Oracle Cloud credits and related resources provided by the Oracle for Research program. AJM and ASFO receive funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (PREDACTED Advanced Grant, Grant agreement No.: 101021207). \n\nThe author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by National Science Foundation (CHE- 1801971); National Science Foundation (MCB- 2032054); National Science Foundation (OAC-1818253); National Science Foundation (TG-CHE060063); U.S. Department of Energy (DE-AC02-06CH11357); U.S. Department of Energy (DE-AC05- 00OR22725); National Institutes of Health (P41-GM104601); National Institutes of Health (R01-GM132826). \n\nThe author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.\n\nPublished - 10943420221128233.pdf
", "abstract": "We seek to completely revise current models of airborne transmission of respiratory viruses by providing never-before-seen atomic-level views of the SARS-CoV-2 virus within a respiratory aerosol. Our work dramatically extends the capabilities of multiscale computational microscopy to address the significant gaps that exist in current experimental methods, which are limited in their ability to interrogate aerosols at the atomic/molecular level and thus obscure our understanding of airborne transmission. We demonstrate how our integrated data-driven platform provides a new way of exploring the composition, structure, and dynamics of aerosols and aerosolized viruses, while driving simulation method development along several important axes. We present a series of initial scientific discoveries for the SARS-CoV-2 Delta variant, noting that the full scientific impact of this work has yet to be realized.", "date": "2023-01", "date_type": "published", "publication": "International Journal of High Performance Computing Applications", "volume": "37", "number": "1", "publisher": "SAGE Publications", "pagerange": "28-44", "id_number": "CaltechAUTHORS:20221017-15547800.39", "issn": "1094-3420", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20221017-15547800.39", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NSF", "grant_number": "CHE-1801971" }, { "agency": "NSF", "grant_number": "MCB-2032054" }, { "agency": "NSF", "grant_number": "OAC-1818253" }, { "agency": "NSF", "grant_number": "TG-CHE060063" }, { "agency": "Department of Energy (DOE)", "grant_number": "DE-AC02-06CH11357" }, { "agency": "Department of Energy (DOE)", "grant_number": "DE-AC05-00OR22725" }, { "agency": "NIH", "grant_number": "P41-GM104601" }, { "agency": "NIH", "grant_number": "R01-GM132826" }, { "agency": "European Research Council (ERC)", "grant_number": "101021207" } ] }, 
"local_group": { "items": [ { "id": "COVID-19" } ] }, "doi": "10.1177/10943420221128233", "pmcid": "PMC9527558", "primary_object": { "basename": "10943420221128233.pdf", "url": "https://authors.library.caltech.edu/records/rhktp-a6270/files/10943420221128233.pdf" }, "resource_type": "article", "pub_year": "2023", "author_list": "Dommer, Abigail; Casalino, Lorenzo; et al." }, { "id": "https://authors.library.caltech.edu/records/qw8kb-57104", "eprint_id": 119179, "eprint_status": "archive", "datestamp": "2023-08-22 18:31:32", "lastmod": "2023-10-24 23:50:38", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Ma-Runzhuo", "name": { "family": "Ma", "given": "Runzhuo" }, "orcid": "0000-0001-6381-2661" }, { "id": "Ramaswamy-Ashwin", "name": { "family": "Ramaswamy", "given": "Ashwin" }, "orcid": "0000-0002-8816-7838" }, { "id": "Xu-Jiashu", "name": { "family": "Xu", "given": "Jiashu" }, "orcid": "0000-0003-4093-2315" }, { "id": "Trinh-Loc", "name": { "family": "Trinh", "given": "Loc" } }, { "id": "Kiyasseh-Dani", "name": { "family": "Kiyasseh", "given": "Dani" }, "orcid": "0000-0002-2898-1790" }, { "id": "Chu-Timothy-N", "name": { "family": "Chu", "given": "Timothy N." } }, { "id": "Wong-Elyssa-Y", "name": { "family": "Wong", "given": "Elyssa Y." } }, { "id": "Lee-Ryan-S", "name": { "family": "Lee", "given": "Ryan S." } }, { "id": "Rodriguez-Ivan", "name": { "family": "Rodriguez", "given": "Ivan" } }, { "id": "DeMeo-Gina", "name": { "family": "DeMeo", "given": "Gina" } }, { "id": "Desai-Aditya", "name": { "family": "Desai", "given": "Aditya" } }, { "id": "Otiato-Maxwell-X", "name": { "family": "Otiato", "given": "Maxwell X." }, "orcid": "0000-0001-6979-6316" }, { "id": "Roberts-Sidney-I", "name": { "family": "Roberts", "given": "Sidney I." } }, { "id": "Nguyen-Jessica-H", "name": { "family": "Nguyen", "given": "Jessica H." 
}, "orcid": "0000-0003-0454-8463" }, { "id": "Laca-Jasper-A", "name": { "family": "Laca", "given": "Jasper" } }, { "id": "Liu-Yan", "name": { "family": "Liu", "given": "Yan" }, "orcid": "0000-0002-5837-4908" }, { "id": "Urbanova-Katarina", "name": { "family": "Urbanova", "given": "Katarina" } }, { "id": "Wagner-Christian", "name": { "family": "Wagner", "given": "Christian" } }, { "id": "Anandkumar-A", "name": { "family": "Anandkumar", "given": "Animashree" }, "orcid": "0000-0002-6974-6797" }, { "id": "Hu-Jim-C", "name": { "family": "Hu", "given": "Jim C." }, "orcid": "0000-0003-2562-8024" }, { "id": "Hung-Andrew-J", "name": { "family": "Hung", "given": "Andrew J." }, "orcid": "0000-0002-7201-6736" } ] }, "title": "Surgical gestures as a method to quantify surgical performance and predict patient outcomes", "ispublished": "pub", "full_text_status": "public", "keywords": "Health Information Management; Health Informatics; Computer Science Applications; Medicine (miscellaneous)", "note": "This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. \n\nThis study was supported in part by the National Cancer Institute under Award No. R01CA273031. \n\nContributions. A.J.H. conceived of the study. A.J.H. 
and J.C.H. obtained the funding. A.J.H., R.M., J.L., J.H.N., and C.W. designed and provided oversight for the administration and implementation of the study. R.M., T.N.C., I.R., G.D., A.D., M.X.O., K.U., S.I.R., and C.W. collected the data and annotated the surgical videos. R.M., J.X., L.T., and D.K. performed the data analysis and visualization. A.A. and Y.L. provided data analysis guidance and supervision. R.M., A.R., and R.S.L. wrote the draft of the manuscript. \n\nData availability. The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request. \n\nCode availability. The code of this article can be found by: https://github.com/crseusc/NS-Gestures-and-EF-outcomes. \n\nCompeting interests. C.W. declares no competing non-financial interests but reports financial disclosures with Intuitive Surgical, Inc. A.A. declares no competing non-financial interests but is a paid employee of Nvidia. J.C.H. declares no competing non-financial interests but the following competing financial interests: salary support from the Frederick J. and Theresa Dow Wallace Fund of the New York and from Prostate Cancer Foundation Challenge Award. Also salary support from NIH R01 CA241758 and R01 CA259173, PCORI CER-2019C1-15682 and CER-2019C2-17372. A.J.H. declares no competing non-financial interests but reports financial disclosures with Intuitive Surgical, Inc. The remaining authors declare no competing interests.\n\nPublished - s41746-022-00738-y.pdf
", "abstract": "How well a surgery is performed impacts a patient's outcomes; however, objective quantification of performance remains an unsolved challenge. Deconstructing a procedure into discrete instrument-tissue \"gestures\" is a emerging way to understand surgery. To establish this paradigm in a procedure where performance is the most important factor for patient outcomes, we identify 34,323 individual gestures performed in 80 nerve-sparing robot-assisted radical prostatectomies from two international medical centers. Gestures are classified into nine distinct dissection gestures (e.g., hot cut) and four supporting gestures (e.g., retraction). Our primary outcome is to identify factors impacting a patient's 1-year erectile function (EF) recovery after radical prostatectomy. We find that less use of hot cut and more use of peel/push are statistically associated with better chance of 1-year EF recovery. Our results also show interactions between surgeon experience and gesture types\u2014similar gesture selection resulted in different EF recovery rates dependent on surgeon experience. To further validate this framework, two teams independently constructe distinct machine learning models using gesture sequences vs. traditional clinical features to predict 1-year EF. In both models, gesture sequences are able to better predict 1-year EF (Team 1: AUC 0.77, 95% CI 0.73\u20130.81; Team 2: AUC 0.68, 95% CI 0.66\u20130.70) than traditional clinical features (Team 1: AUC 0.69, 95% CI 0.65\u20130.73; Team 2: AUC 0.65, 95% CI 0.62\u20130.68). Our results suggest that gestures provide a granular method to objectively indicate surgical performance and outcomes. Application of this methodology to other surgeries may lead to discoveries on methods to improve surgery.", "date": "2022-12-22", "date_type": "published", "publication": "npj Digital Medicine", "volume": "5", "publisher": "Springer Science and Business Media LLC", "pagerange": "Art. No. 
187", "id_number": "CaltechAUTHORS:20230209-988069100.14", "issn": "2398-6352", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20230209-988069100.14", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NIH", "grant_number": "R01CA273031" } ] }, "doi": "10.1038/s41746-022-00738-y", "pmcid": "PMC9780308", "primary_object": { "basename": "s41746-022-00738-y.pdf", "url": "https://authors.library.caltech.edu/records/qw8kb-57104/files/s41746-022-00738-y.pdf" }, "resource_type": "article", "pub_year": "2022", "author_list": "Ma, Runzhuo; Ramaswamy, Ashwin; et el." }, { "id": "https://authors.library.caltech.edu/records/vwezt-wv943", "eprint_id": 118207, "eprint_status": "archive", "datestamp": "2023-08-22 18:19:13", "lastmod": "2023-10-23 20:10:29", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Zhao-Jiawei", "name": { "family": "Zhao", "given": "Jiawei" }, "orcid": "0000-0002-5726-6040" }, { "id": "Dai-Steve", "name": { "family": "Dai", "given": "Steve" }, "orcid": "0000-0002-5045-1964" }, { "id": "Venkatesan-Rangharajan", "name": { "family": "Venkatesan", "given": "Rangharajan" } }, { "id": "Zimmer-Brian", "name": { "family": "Zimmer", "given": "Brian" }, "orcid": "0000-0001-9997-3141" }, { "id": "Ali-Mustafa", "name": { "family": "Ali", "given": "Mustafa" }, "orcid": "0000-0002-4452-6464" }, { "id": "Liu-Ming-Yu", "name": { "family": "Liu", "given": "Ming-Yu" }, "orcid": "0000-0002-2951-2398" }, { "id": "Khailany-Brucek", "name": { "family": "Khailany", "given": "Brucek" }, "orcid": "0000-0002-7584-3489" }, { "id": "Dally-William-J", "name": { "family": "Dally", "given": "William J." 
}, "orcid": "0000-0003-4632-2876" }, { "id": "Anandkumar-A", "name": { "family": "Anandkumar", "given": "Anima" }, "orcid": "0000-0002-6974-6797" } ] }, "title": "LNS-Madam: Low-Precision Training in Logarithmic Number System using Multiplicative Weight Update", "ispublished": "pub", "full_text_status": "public", "keywords": "Computational Theory and Mathematics; Hardware and Architecture; Theoretical Computer Science; Software", "abstract": "Representing deep neural networks (DNNs) in low-precision is a promising approach to enable efficient acceleration and memory reduction. Previous methods that train DNNs in low-precision typically keep a copy of weights in high-precision during the weight updates. Directly training with low-precision weights leads to accuracy degradation due to complex interactions between the low-precision number systems and the learning algorithms. To address this issue, we develop a co-designed low-precision training framework, termed LNS-Madam, in which we jointly design a logarithmic number system (LNS) and a multiplicative weight update algorithm (Madam). We prove that LNS-Madam results in low quantization error during weight updates, leading to stable performance even if the precision is limited. We further propose a hardware design of LNS-Madam that resolves practical challenges in implementing an efficient datapath for LNS computations. Our implementation effectively reduces energy overhead incurred by LNS-to-integer conversion and partial sum accumulation. Experimental results show that LNS-Madam achieves comparable accuracy to full-precision counterparts with only 8 bits on popular computer vision and natural language tasks. 
Compared to FP32 and FP8, LNS-Madam reduces the energy consumption by over 90% and 55%, respectively.", "date": "2022-12", "date_type": "published", "publication": "IEEE Transactions on Computers", "volume": "71", "number": "12", "publisher": "IEEE", "pagerange": "3179-3190", "id_number": "CaltechAUTHORS:20221202-906480600.2", "issn": "0018-9340", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20221202-906480600.2", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "doi": "10.1109/tc.2022.3202747", "resource_type": "article", "pub_year": "2022", "author_list": "Zhao, Jiawei; Dai, Steve; et el." }, { "id": "https://authors.library.caltech.edu/records/c9g79-b2898", "eprint_id": 118004, "eprint_status": "archive", "datestamp": "2023-08-22 18:18:49", "lastmod": "2023-10-24 22:45:13", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Laca-Jasper-A", "name": { "family": "Laca", "given": "Jasper A." } }, { "id": "Kocielnik-Rafal", "name": { "family": "Kocielnik", "given": "Rafal" }, "orcid": "0000-0001-5602-6056" }, { "id": "Nguyen-Jessica-H", "name": { "family": "Nguyen", "given": "Jessica H." }, "orcid": "0000-0003-0454-8463" }, { "id": "You-Jonathan", "name": { "family": "You", "given": "Jonathan" } }, { "id": "Tsang-Ryan", "name": { "family": "Tsang", "given": "Ryan" } }, { "id": "Wong-Elyssa-Y", "name": { "family": "Wong", "given": "Elyssa Y." } }, { "id": "Shtulman-Andrew", "name": { "family": "Shtulman", "given": "Andrew" }, "orcid": "0000-0002-4687-3099" }, { "id": "Anandkumar-A", "name": { "family": "Anandkumar", "given": "Anima" }, "orcid": "0000-0002-6974-6797" }, { "id": "Hung-Andrew-J", "name": { "family": "Hung", "given": "Andrew J." 
}, "orcid": "0000-0002-7201-6736" } ] }, "title": "Using Real-time Feedback To Improve Surgical Performance on a Robotic Tissue Dissection Task", "ispublished": "pub", "full_text_status": "public", "keywords": "Urology", "note": "\u00a9 2022 The Author(s). Published by Elsevier B.V. on behalf of European Association of Urology Under a Creative Commons license. Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) \n\nThis work was supported by the National Science Foundation under grant #2030859 to the Computing Research Association for the CIFellows Project. The sponsor played a role in analysis and interpretation of the data and review of the manuscript.\n\nPublished - main.pdf
", "abstract": "Background: There is no standard for the feedback that an attending surgeon provides to a training surgeon, which may lead to variable outcomes in teaching cases. \n\nObjective: To create and administer standardized feedback to medical students in an attempt to improve performance and learning. \n\nDesign, setting, and participants: A cohort of 45 medical students was recruited from a single medical school. Participants were randomly assigned to two groups. Both completed two rounds of a robotic surgical dissection task on a da Vinci Xi surgical system. The first round was the baseline assessment. In the second round, one group received feedback and the other served as the control (no feedback). \n\nOutcome measurements and statistical analysis: Video from each round was retrospectively reviewed by four blinded raters and given a total error tally (primary outcome) and a technical skills score (Global Evaluative Assessment of Robotic Surgery [GEARS]). Generalized linear models were used for statistical modeling. According to their initial performance, each participant was categorized as either an innate performer or an underperformer, depending on whether their error tally was above or below the median. \n\nResults and limitations: In round 2, the intervention group had a larger decrease in error rate than the control group, with a risk ratio (RR) of 1.51 (95% confidence interval [CI] 1.07\u20132.14; p = 0.02). The intervention group also had a greater increase in GEARS score in comparison to the control group, with a mean group difference of 2.15 (95% CI 0.81\u20133.49; p < 0.01). The interaction effect between innate performers versus underperformers and the intervention was statistically significant for the error rates, at F(1,38) = 5.16 (p = 0.03). 
Specifically, the intervention had a statistically significant effect on the error rate for underperformers (RR 2.23, 95% CI 1.37\u20133.62; p < 0.01) but not for innate performers (RR 1.03, 95% CI 0.63\u20131.68; p = 0.91). \n\nConclusions: Real-time feedback improved performance globally compared to the control. The benefit of real-time feedback was stronger for underperformers than for trainees with innate skill. \n\nPatient summary: We found that real-time feedback during a training task using a surgical robot improved the performance of trainees when the task was repeated. This feedback approach could help in training doctors in robotic surgery.", "date": "2022-12", "date_type": "published", "publication": "European Urology Open Science", "volume": "46", "publisher": "Elsevier", "pagerange": "15-21", "id_number": "CaltechAUTHORS:20221122-564647900.20", "issn": "2666-1683", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20221122-564647900.20", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NSF", "grant_number": "CCF-2030859" } ] }, "doi": "10.1016/j.euros.2022.09.015", "pmcid": "PMC9732447", "primary_object": { "basename": "main.pdf", "url": "https://authors.library.caltech.edu/records/c9g79-b2898/files/main.pdf" }, "resource_type": "article", "pub_year": "2022", "author_list": "Laca, Jasper A.; Kocielnik, Rafal; et al." 
}, { "id": "https://authors.library.caltech.edu/records/dsq3y-8gb92", "eprint_id": 117343, "eprint_status": "archive", "datestamp": "2023-08-22 17:16:39", "lastmod": "2023-10-24 22:29:28", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Trifan-Anda", "name": { "family": "Trifan", "given": "Anda" }, "orcid": "0000-0003-4808-9502" }, { "name": { "family": "Gorgun", "given": "Defne" } }, { "name": { "family": "Salim", "given": "Michael" } }, { "id": "Li-Zongyi", "name": { "family": "Li", "given": "Zongyi" } }, { "name": { "family": "Brace", "given": "Alexander" } }, { "name": { "family": "Zvyagin", "given": "Maxim" } }, { "name": { "family": "Ma", "given": "Heng" } }, { "name": { "family": "Clyde", "given": "Austin" } }, { "name": { "family": "Clark", "given": "David" } }, { "name": { "family": "Hardy", "given": "David J." } }, { "name": { "family": "Burnley", "given": "Tom" } }, { "name": { "family": "Huang", "given": "Lei" } }, { "name": { "family": "McCalpin", "given": "John" } }, { "name": { "family": "Emani", "given": "Murali" } }, { "name": { "family": "Yoo", "given": "Hyenseung" } }, { "name": { "family": "Yin", "given": "Junqi" } }, { "name": { "family": "Tsaris", "given": "Aristeidis" } }, { "name": { "family": "Subbiah", "given": "Vishal" } }, { "name": { "family": "Raza", "given": "Tanveer" } }, { "name": { "family": "Liu", "given": "Jessica" } }, { "name": { "family": "Trebesch", "given": "Noah" } }, { "name": { "family": "Wells", "given": "Geoffrey" } }, { "name": { "family": "Mysore", "given": "Venkatesh" } }, { "name": { "family": "Gibbs", "given": "Thomas" } }, { "name": { "family": "Phillips", "given": "James" } }, { "name": { "family": "Chennubhotla", "given": "S. 
Chakra" } }, { "name": { "family": "Foster", "given": "Ian" }, "orcid": "0000-0003-2129-5269" }, { "name": { "family": "Stevens", "given": "Rick" } }, { "id": "Anandkumar-A", "name": { "family": "Anandkumar", "given": "Anima" }, "orcid": "0000-0002-6974-6797" }, { "name": { "family": "Vishwanath", "given": "Venkatram" } }, { "name": { "family": "Stone", "given": "John E." } }, { "name": { "family": "Tajkhorshid", "given": "Emad" } }, { "name": { "family": "Harris", "given": "Sarah A." } }, { "name": { "family": "Ramanathan", "given": "Arvind" }, "orcid": "0000-0002-1622-5488" } ] }, "title": "Intelligent resolution: Integrating Cryo-EM with AI-driven multi-resolution simulations to observe the severe acute respiratory syndrome coronavirus-2 replication-transcription machinery in action", "ispublished": "pub", "full_text_status": "public", "keywords": "Hardware and Architecture; Theoretical Computer Science; Software", "abstract": "The severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) replication transcription complex (RTC) is a multi-domain protein responsible for replicating and transcribing the viral mRNA inside a human cell. Attacking RTC function with pharmaceutical compounds is a pathway to treating COVID-19. Conventional tools, e.g. cryo-electron microscopy and all-atom molecular dynamics (AAMD), do not provide sufficiently high resolution or timescale to capture important dynamics of this molecular machine. Consequently, we develop an innovative workflow that bridges the gap between these resolutions, using mesoscale fluctuating finite element analysis (FFEA) continuum simulations and a hierarchy of AI-methods that continually learn and infer features for maintaining consistency between AAMD and FFEA simulations. We leverage a multi-site distributed workflow manager to orchestrate AI, FFEA, and AAMD jobs, providing optimal resource utilization across HPC centers. 
Our study provides unprecedented access to study the SARS-CoV-2 RTC machinery, while providing general capability for AI-enabled multi-resolution simulations at scale.", "date": "2022-10-12", "date_type": "published", "publication": "International Journal of High Performance Computing Applications", "publisher": "SAGE Publications", "id_number": "CaltechAUTHORS:20221011-459145000.39", "issn": "1094-3420", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20221011-459145000.39", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "Department of Energy (DOE)" }, { "agency": "NIH" } ] }, "doi": "10.1177/10943420221113513", "resource_type": "article", "pub_year": "2022", "author_list": "Trifan, Anda; Gorgun, Defne; et el." }, { "id": "https://authors.library.caltech.edu/records/2hkzz-hy091", "eprint_id": 115593, "eprint_status": "archive", "datestamp": "2023-08-20 08:35:46", "lastmod": "2023-10-24 16:36:32", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Hoeller-David", "name": { "family": "Hoeller", "given": "David" }, "orcid": "0000-0001-8010-9011" }, { "id": "Rudin-Nikita", "name": { "family": "Rudin", "given": "Nikita" }, "orcid": "0000-0001-5893-0348" }, { "id": "Choy-Christopher", "name": { "family": "Choy", "given": "Christopher" }, "orcid": "0000-0002-6566-3193" }, { "id": "Anandkumar-A", "name": { "family": "Anandkumar", "given": "Animashree" }, "orcid": "0000-0002-6974-6797" }, { "id": "Hutter-Marco", "name": { "family": "Hutter", "given": "Marco" }, "orcid": "0000-0001-9049-534X" } ] }, "title": "Neural Scene Representation for Locomotion on Structured Terrain", "ispublished": "pub", "full_text_status": "public", "keywords": "Representation learning, deep learning for visual perception", "note": "\u00a9 2022 IEEE. \n\nManuscript received 24 February 2022; accepted 6 June 2022. 
Date of publication 20 June 2022; date of current version 18 July 2022. \n\nThis letter was recommended for publication by Associate Editor D. Sadigh and Editor J. Kober upon evaluation of the reviewers' comments. This work was supported in part by NVIDIA, the Swiss National Science Foundation (SNSF) under Project 188596, in part by the National Centre of Competence in Research Robotics (NCCR Robotics), and in part by the European Union's Horizon 2020 Research and Innovation Program under Grant Agreement 780883. This work was also conducted as part of ANYmal Research, a community to advance legged robotics.\n\nAccepted Version - 2206.08077.pdf
", "abstract": "We propose a learning-based method to reconstruct the local terrain for locomotion with a mobile robot traversing urban environments. Using a stream of depth measurements from the onboard cameras and the robot's trajectory, the algorithm estimates the topography in the robot's vicinity. The raw measurements from these cameras are noisy and only provide partial and occluded observations that in many cases do not show the terrain the robot stands on. Therefore, we propose a 3D reconstruction model that faithfully reconstructs the scene, despite the noisy measurements and large amounts of missing data coming from the blind spots of the camera arrangement. The model consists of a 4D fully convolutional network on point clouds that learns the geometric priors to complete the scene from the context and an auto-regressive feedback to leverage spatio-temporal consistency and use evidence from the past. The network can be solely trained with synthetic data, and due to extensive augmentation, it is robust in the real world, as shown in the validation on a quadrupedal robot, ANYmal, traversing challenging settings. 
We run the pipeline on the robot's onboard low-power computer using an efficient sparse tensor implementation and show that the proposed method outperforms classical map representations.", "date": "2022-10", "date_type": "published", "publication": "IEEE Robotics and Automation Letters", "volume": "7", "number": "4", "publisher": "IEEE", "pagerange": "8667-8674", "id_number": "CaltechAUTHORS:20220714-224603901", "issn": "2377-3766", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20220714-224603901", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NVIDIA Corporation" }, { "agency": "Swiss National Science Foundation (SNSF)", "grant_number": "188596" }, { "agency": "National Centre of Competence in Research Robotics" }, { "agency": "European Research Council (ERC)", "grant_number": "780883" } ] }, "doi": "10.1109/LRA.2022.3184779", "primary_object": { "basename": "2206.08077.pdf", "url": "https://authors.library.caltech.edu/records/2hkzz-hy091/files/2206.08077.pdf" }, "resource_type": "article", "pub_year": "2022", "author_list": "Hoeller, David; Rudin, Nikita; et el." }, { "id": "https://authors.library.caltech.edu/records/p7bmp-vrd31", "eprint_id": 112837, "eprint_status": "archive", "datestamp": "2023-08-20 08:29:16", "lastmod": "2023-10-23 22:48:25", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Pangal-Dhiraj-J", "name": { "family": "Pangal", "given": "Dhiraj J." 
}, "orcid": "0000-0001-7391-9825" }, { "id": "Kugener-Guillaume", "name": { "family": "Kugener", "given": "Guillaume" }, "orcid": "0000-0002-4697-2847" }, { "id": "Cardinal-Tyler", "name": { "family": "Cardinal", "given": "Tyler" }, "orcid": "0000-0001-8277-6942" }, { "id": "Lechtholz-Zey-Elizabeth", "name": { "family": "Lechtholz-Zey", "given": "Elizabeth" } }, { "id": "Collet-Casey", "name": { "family": "Collet", "given": "Casey" } }, { "id": "Lasky-Sasha", "name": { "family": "Lasky", "given": "Sasha" } }, { "id": "Sundaram-Shivani", "name": { "family": "Sundaram", "given": "Shivani" }, "orcid": "0000-0003-2863-9204" }, { "id": "Zhu-Yichao", "name": { "family": "Zhu", "given": "Yichao" } }, { "id": "Roshannai-Arman", "name": { "family": "Roshannai", "given": "Arman" } }, { "id": "Chan-Justin", "name": { "family": "Chan", "given": "Justin" } }, { "id": "Sinha-Aditya", "name": { "family": "Sinha", "given": "Aditya" } }, { "id": "Hung-Andrew-J", "name": { "family": "Hung", "given": "Andrew J." }, "orcid": "0000-0002-7201-6736" }, { "id": "Anandkumar-A", "name": { "family": "Anandkumar", "given": "Animashree" }, "orcid": "0000-0002-6974-6797" }, { "id": "Zada-Gabriel", "name": { "family": "Zada", "given": "Gabriel" }, "orcid": "0000-0001-5821-902X" }, { "id": "Donoho-Daniel-A", "name": { "family": "Donoho", "given": "Daniel A." }, "orcid": "0000-0002-0531-1436" } ] }, "title": "Use of surgical video\u2013based automated performance metrics to predict blood loss and success of simulated vascular injury control in neurosurgery: a pilot study", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2021 American Association of Neurological Surgeons. \n\nOnline Publication Date: 31 Dec 2021. \n\nAuthor Contributions\nConception and design: Pangal, Kugener, Zhu, Zada, Donoho. Acquisition of data: Pangal, Kugener, Lechtholz-Zey, Collet, Lasky, Sundaram, Chan, Zada, Donoho. 
Analysis and interpretation of data: Pangal, Kugener, Lechtholz-Zey, Collet, Lasky, Sundaram, Zhu, Roshannai, Chan, Donoho. Drafting the article: Pangal, Kugener, Donoho. Critically revising the article: Pangal, Kugener, Cardinal, Roshannai, Chan, Sinha, Hung, Anandkumar, Zada, Donoho. Reviewed submitted version of manuscript: Pangal, Kugener, Cardinal, Sinha, Hung, Anandkumar, Zada, Donoho. Approved the final version of the manuscript on behalf of all authors: Pangal. Statistical analysis: Pangal, Kugener, Cardinal, Donoho. Administrative/technical/material support: Zada, Donoho. Study supervision: Zada, Donoho. \n\nDisclosures. Dr. Hung is a consultant for Johnson and Johnson, Mimic Technologies, and Quantgene.", "abstract": "Objective: Experts can assess surgeon skill using surgical video, but a limited number of expert surgeons are available. Automated performance metrics (APMs) are a promising alternative but have not been created from operative videos in neurosurgery to date. The authors aimed to evaluate whether video-based APMs can predict task success and blood loss during endonasal endoscopic surgery in a validated cadaveric simulator of vascular injury of the internal carotid artery. \n\nMethods: Videos of cadaveric simulation trials by 73 neurosurgeons and otorhinolaryngologists were analyzed and manually annotated with bounding boxes to identify the surgical instruments in the frame. APMs in five domains were defined\u2014instrument usage, time-to-phase, instrument disappearance, instrument movement, and instrument interactions\u2014on the basis of expert analysis and task-specific surgical progressions. Bounding-box data of instrument position were then used to generate APMs for each trial. Multivariate linear regression was used to test for the associations between APMs and blood loss and task success (hemorrhage control in less than 5 minutes). The APMs of 93 successful trials were compared with the APMs of 49 unsuccessful trials. 
\n\nResults: In total, 29,151 frames of surgical video were annotated. Successful simulation trials had superior APMs in each domain, including proportionately more time spent with the key instruments in view (p 2 value of 0.87 (p < 0.001). \n\nConclusions: Video-based APMs were superior predictors of simulation trial success and blood loss than surgeon characteristics such as case volume and attending status. Surgeon educators can use APMs to assess competency, quantify performance, and provide actionable, structured feedback in order to improve patient outcomes. Validation of APMs provides a benchmark for further development of fully automated video assessment pipelines that utilize machine learning and computer vision.", "date": "2022-09", "date_type": "published", "publication": "Journal of Neurosurgery", "volume": "137", "number": "3", "publisher": "American Association of Neurological Surgeons", "pagerange": "840-849", "id_number": "CaltechAUTHORS:20220112-7446100", "issn": "0022-3085", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20220112-7446100", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "doi": "10.3171/2021.10.jns211064", "resource_type": "article", "pub_year": "2022", "author_list": "Pangal, Dhiraj J.; Kugener, Guillaume; et el." }, { "id": "https://authors.library.caltech.edu/records/5dfyx-1yq85", "eprint_id": 116626, "eprint_status": "archive", "datestamp": "2023-08-22 17:32:12", "lastmod": "2023-10-24 21:08:23", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Markarian-Nicholas", "name": { "family": "Markarian", "given": "Nicholas" } }, { "id": "Kugener-Guillaume", "name": { "family": "Kugener", "given": "Guillaume" }, "orcid": "0000-0002-4697-2847" }, { "id": "Pangal-Dhiraj-J", "name": { "family": "Pangal", "given": "Dhiraj J." 
}, "orcid": "0000-0001-7391-9825" }, { "id": "Unadkat-Vyom", "name": { "family": "Unadkat", "given": "Vyom" } }, { "id": "Sinha-Aditya", "name": { "family": "Sinha", "given": "Aditya" } }, { "id": "Zhu-Yichao", "name": { "family": "Zhu", "given": "Yichao" } }, { "id": "Roshannai-Arman", "name": { "family": "Roshannai", "given": "Arman" } }, { "id": "Chan-Justin", "name": { "family": "Chan", "given": "Justin" } }, { "id": "Hung-Andrew-J", "name": { "family": "Hung", "given": "Andrew J." }, "orcid": "0000-0002-7201-6736" }, { "id": "Wrobel-Bozena-B", "name": { "family": "Wrobel", "given": "Bozena B." } }, { "id": "Anandkumar-A", "name": { "family": "Anandkumar", "given": "Animashree" }, "orcid": "0000-0002-6974-6797" }, { "id": "Zada-Gabriel", "name": { "family": "Zada", "given": "Gabriel" }, "orcid": "0000-0001-5821-902X" }, { "id": "Donoho-Daniel-A", "name": { "family": "Donoho", "given": "Daniel A." }, "orcid": "0000-0002-0531-1436" } ] }, "title": "Validation of Machine Learning-Based Automated Surgical Instrument Annotation Using Publicly Available Intraoperative Video", "ispublished": "pub", "full_text_status": "public", "keywords": "Neurology (clinical); Surgery", "abstract": "BACKGROUND: Intraoperative tool movement data have been demonstrated to be clinically useful in quantifying surgical performance. However, collecting this information from intraoperative video requires laborious hand annotation. The ability to automatically annotate tools in surgical video would advance surgical data science by eliminating a time-intensive step in research. \n\nOBJECTIVE: To identify whether machine learning (ML) can automatically identify surgical instruments contained within neurosurgical video. \n\nMETHODS: A ML model which automatically identifies surgical instruments in frame was developed and trained on multiple publicly available surgical video data sets with instrument location annotations. 
A total of 39\u2009693 frames from 4 data sets were used (endoscopic endonasal surgery [EEA] [30\u2009015 frames], cataract surgery [4670], laparoscopic cholecystectomy [2532], and microscope-assisted brain/spine tumor removal [2476]). A second model trained only on EEA video was also developed. Intraoperative EEA videos from YouTube were used for test data (3 videos, 1239 frames). \n\nRESULTS: The YouTube data set contained 2169 total instruments. Mean average precision (mAP) for instrument detection on the YouTube data set was 0.74. The mAP for each individual video was 0.65, 0.74, and 0.89. The second model trained only on EEA video also had an overall mAP of 0.74 (0.62, 0.84, and 0.88 for individual videos). Development costs were $130 for manual video annotation and under $100 for computation. \n\nCONCLUSION: Surgical instruments contained within endoscopic endonasal intraoperative video can be detected using a fully automated ML model. The addition of disparate surgical data sets did not improve model performance, although these data sets may improve generalizability of the model in other use cases.", "date": "2022-09", "date_type": "published", "publication": "Operative Neurosurgery", "volume": "23", "number": "3", "publisher": "Wolters Kluwer", "pagerange": "235-240", "id_number": "CaltechAUTHORS:20220908-194215690", "issn": "2332-4252", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20220908-194215690", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "doi": "10.1227/ons.0000000000000274", "resource_type": "article", "pub_year": "2022", "author_list": "Markarian, Nicholas; Kugener, Guillaume; et al." 
}, { "id": "https://authors.library.caltech.edu/records/hpyps-c1q28", "eprint_id": 116869, "eprint_status": "archive", "datestamp": "2023-08-22 17:10:44", "lastmod": "2023-10-23 20:08:17", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Patti-Taylor-L", "name": { "family": "Patti", "given": "Taylor L." }, "orcid": "0000-0002-4242-6072" }, { "id": "Kossaifi-Jean", "name": { "family": "Kossaifi", "given": "Jean" }, "orcid": "0000-0002-4445-3429" }, { "id": "Anandkumar-A", "name": { "family": "Anandkumar", "given": "Animashree" }, "orcid": "0000-0002-6974-6797" }, { "id": "Yelin-Susanne-F", "name": { "family": "Yelin", "given": "Susanne F." } } ] }, "title": "Variational quantum optimization with multibasis encodings", "ispublished": "pub", "full_text_status": "public", "keywords": "General Physics and Astronomy", "note": "This work was done during T.L.P.'s internship at NVIDIA. At CalTech, A.A. is supported in part by the Bren endowed chair, and Microsoft, Google, Adobe faculty fellowships. S.F.Y. thanks the AFOSR and the NSF for funding. The authors would like to thank Brucek Khailany, Johnnie Gray, Garnet Chan, Andreas Hehn, and Adam Jedrych for conversations.", "abstract": "Despite extensive research efforts, few quantum algorithms for classical optimization demonstrate a realizable quantum advantage. The utility of many quantum algorithms is limited by high requisite circuit depth and nonconvex optimization landscapes. We tackle these challenges by introducing a variational quantum algorithm that benefits from two innovations: multibasis graph encodings using single-qubit expectation values and nonlinear activation functions. Our technique results in increased observed optimization performance and a factor-of-two reduction in requisite qubits. 
While the classical simulation of many qubits with traditional quantum formalism is impossible due to its exponential scaling, we mitigate this limitation with exact circuit representations using factorized tensor rings. In particular, the shallow circuits permitted by our technique, combined with efficient factorized tensor-based simulation, enable us to successfully optimize the MaxCut of the 512-vertex DIMACS library graphs on a single GPU. By improving the performance of quantum optimization algorithms while requiring fewer quantum resources and utilizing shallower, more error-resistant circuits, we offer tangible progress for variational quantum optimization.", "date": "2022-08", "date_type": "published", "publication": "Physical Review Research", "volume": "4", "number": "3", "publisher": "American Physical Society", "pagerange": "Art. No. 4.033142", "id_number": "CaltechAUTHORS:20220909-232706000", "issn": "2643-1564", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20220909-232706000", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "Air Force Office of Scientific Research (AFOSR)" }, { "agency": "NSF" }, { "agency": "Bren Professor of Computing and Mathematical Sciences" }, { "agency": "Microsoft Faculty Fellowship" }, { "agency": "Google Faculty Research Award" }, { "agency": "Adobe" } ] }, "doi": "10.1103/physrevresearch.4.033142", "resource_type": "article", "pub_year": "2022", "author_list": "Patti, Taylor L.; Kossaifi, Jean; et el." 
}, { "id": "https://authors.library.caltech.edu/records/kemxx-q5m20", "eprint_id": 110646, "eprint_status": "archive", "datestamp": "2023-08-22 16:58:50", "lastmod": "2023-10-23 19:47:01", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Qiao-Zhuoran", "name": { "family": "Qiao", "given": "Zhuoran" }, "orcid": "0000-0002-5704-7331" }, { "id": "Christensen-Anders-S", "name": { "family": "Christensen", "given": "Anders S." }, "orcid": "0000-0002-7253-6897" }, { "id": "Welborn-Matthew-G", "name": { "family": "Welborn", "given": "Matthew" }, "orcid": "0000-0001-8659-6535" }, { "id": "Manby-Frederick-R", "name": { "family": "Manby", "given": "Frederick R." }, "orcid": "0000-0001-7611-714X" }, { "id": "Anandkumar-A", "name": { "family": "Anandkumar", "given": "Anima" }, "orcid": "0000-0002-6974-6797" }, { "id": "Miller-T-F-III", "name": { "family": "Miller", "given": "Thomas F., III" }, "orcid": "0000-0002-1882-5380" } ] }, "title": "Informing geometric deep learning with electronic interactions to accelerate quantum chemistry", "ispublished": "pub", "full_text_status": "public", "keywords": "quantum chemistry; machine learning; equivariance", "note": "\u00a9 2022 the Author(s). Published by PNAS. This article is distributed under Creative Commons Attribution-NonCommercial-NoDerivatives License 4.0 (CC BY-NC-ND). \n\nEdited by Klavs Jensen, Massachusetts Institute of Technology, Cambridge, MA; received April 1, 2022; accepted June 6, 2022. Published July 28, 2022. \n\nZ.Q. acknowledges graduate research funding from Caltech\nand partial support from the Amazon\u2013Caltech AI4Science fellowship. A.A. and T.F.M. acknowledge partial support from the Caltech DeLogi fund, and A.A. acknowledges support from a Caltech Bren professorship. Z.Q. acknowledges Bo\nLi, Vignesh Bhethanabotla, Dani Kiyasseh, Hongkai Zheng, Sahin Lale, and Rafal Kocielnik for proofreading and helpful comments on the manuscript. 
\n\nAuthor contributions: Z.Q., F.R.M., A.A., and T.F.M. designed research; Z.Q. performed research; A.S.C. and M.W. contributed new reagents/analytic tools; Z.Q. and A.S.C. analyzed data; F.R.M. and A.A. contributed to the theoretical results; and Z.Q., A.A., and T.F.M. wrote the paper. \n\nCompeting interest statement: A patent application related to this work has been filed. A.S.C., M.W., F.R.M., and T.F.M. are employees of Entos, Inc. or its affiliates. The software used for computing input features and gradients is proprietary to Entos, Inc. \n\nData Availability: Source data for results described in the text and SI Appendix, the training dataset, code, and evaluation examples have been deposited in\nZenodo (https://zenodo.org/record/6568518#.YrtTKHbMK38) (99). \n\nThis article is a PNAS Direct Submission.\n\nPublished - pnas.2205221119.pdf
Submitted - 2105.14655.pdf
Supplemental Material - pnas.2205221119.sapp.pdf
", "abstract": "Predicting electronic energies, densities, and related chemical properties can facilitate the discovery of novel catalysts, medicines, and battery materials. However, existing machine learning techniques are challenged by the scarcity of training data when exploring unknown chemical spaces. We overcome this barrier by systematically incorporating knowledge of molecular electronic structure into deep learning. By developing a physics-inspired equivariant neural network, we introduce a method to learn molecular representations based on the electronic interactions among atomic orbitals. Our method, OrbNet-Equi, leverages efficient tight-binding simulations and learned mappings to recover high-fidelity physical quantities. OrbNet-Equi accurately models a wide spectrum of target properties while being several orders of magnitude faster than density functional theory. Despite only using training samples collected from readily available small-molecule libraries, OrbNet-Equi outperforms traditional semiempirical and machine learning\u2013based methods on comprehensive downstream benchmarks that encompass diverse main-group chemical processes. Our method also describes interactions in challenging charge-transfer complexes and open-shell systems. We anticipate that the strategy presented here will help to expand opportunities for studies in chemistry and materials science, where the acquisition of experimental or reference training data is costly.", "date": "2022-07-28", "date_type": "published", "publication": "Proceedings of the National Academy of Sciences", "volume": "119", "number": "31", "publisher": "National Academy of Science", "pagerange": "Art. No. 
e2205221119", "id_number": "CaltechAUTHORS:20210831-203900979", "issn": "0027-8424", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20210831-203900979", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "Amazon AI4Science Fellowship" }, { "agency": "Caltech DeLogi Fund" }, { "agency": "Bren Professor of Computing and Mathematical Sciences" } ] }, "doi": "10.1073/pnas.2205221119", "pmcid": "PMC9351474", "primary_object": { "basename": "2105.14655.pdf", "url": "https://authors.library.caltech.edu/records/kemxx-q5m20/files/2105.14655.pdf" }, "related_objects": [ { "basename": "pnas.2205221119.pdf", "url": "https://authors.library.caltech.edu/records/kemxx-q5m20/files/pnas.2205221119.pdf" }, { "basename": "pnas.2205221119.sapp.pdf", "url": "https://authors.library.caltech.edu/records/kemxx-q5m20/files/pnas.2205221119.sapp.pdf" } ], "resource_type": "article", "pub_year": "2022", "author_list": "Qiao, Zhuoran; Christensen, Anders S.; et el." }, { "id": "https://authors.library.caltech.edu/records/g79fh-yqz70", "eprint_id": 115574, "eprint_status": "archive", "datestamp": "2023-08-20 08:02:27", "lastmod": "2023-10-24 16:35:44", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Xu-Pan", "name": { "family": "Xu", "given": "Pan" } }, { "id": "Zheng-Hongkai", "name": { "family": "Zheng", "given": "Hongkai" } }, { "id": "Mazumdar-Eric", "name": { "family": "Mazumdar", "given": "Eric V." }, "orcid": "0000-0002-1815-269X" }, { "id": "Azizzadenesheli-Kamyar", "name": { "family": "Azizzadenesheli", "given": "Kamyar" }, "orcid": "0000-0001-8507-1868" }, { "id": "Anandkumar-A", "name": { "family": "Anandkumar", "given": "Anima" }, "orcid": "0000-0002-6974-6797" } ] }, "title": "Langevin Monte Carlo for Contextual Bandits", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2022 by the author(s). 
\n\nThe authors would like to thank the anonymous reviewers for their invaluable comments. PX is supported by PIMCO Postdoctoral Fellowship. AA is partially supported by Bren Named Chair Professorship at Caltech.\n\nPublished - xu22p.pdf
Accepted Version - 2206.11254.pdf
", "abstract": "We study the efficiency of Thompson sampling for contextual bandits. Existing Thompson sampling-based algorithms need to construct a Laplace approximation (i.e., a Gaussian distribution) of the posterior distribution, which is inefficient to sample in high dimensional applications for general covariance matrices. Moreover, the Gaussian approximation may not be a good surrogate for the posterior distribution for general reward generating functions. We propose an efficient posterior sampling algorithm, viz., Langevin Monte Carlo Thompson Sampling (LMC-TS), that uses Markov Chain Monte Carlo (MCMC) methods to directly sample from the posterior distribution in contextual bandits. Our method is computationally efficient since it only needs to perform noisy gradient descent updates without constructing the Laplace approximation of the posterior distribution. We prove that the proposed algorithm achieves the same sublinear regret bound as the best Thompson sampling algorithms for a special case of contextual bandits, viz., linear contextual bandits. 
We conduct experiments on both synthetic data and real-world datasets on different contextual bandit models, which demonstrates that directly sampling from the posterior is both computationally efficient and competitive in performance.", "date": "2022-06-22", "date_type": "published", "publication": "Proceedings of Machine Learning Research", "volume": "162", "publisher": "ML Research Press", "pagerange": "24830-24850", "id_number": "CaltechAUTHORS:20220714-212437915", "issn": "2640-3498", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20220714-212437915", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "PIMCO" }, { "agency": "Bren Professor of Computing and Mathematical Sciences" } ] }, "doi": "10.48550/arXiv.2206.11254", "primary_object": { "basename": "2206.11254.pdf", "url": "https://authors.library.caltech.edu/records/g79fh-yqz70/files/2206.11254.pdf" }, "related_objects": [ { "basename": "xu22p.pdf", "url": "https://authors.library.caltech.edu/records/g79fh-yqz70/files/xu22p.pdf" } ], "resource_type": "article", "pub_year": "2022", "author_list": "Xu, Pan; Zheng, Hongkai; et al." 
}, { "id": "https://authors.library.caltech.edu/records/87tww-zn973", "eprint_id": 115576, "eprint_status": "archive", "datestamp": "2023-08-20 08:00:07", "lastmod": "2023-10-24 16:35:50", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Kargin-Taylan", "name": { "family": "Kargin", "given": "Taylan" }, "orcid": "0000-0001-6744-654X" }, { "id": "Lale-Sahin", "name": { "family": "Lale", "given": "Sahin" }, "orcid": "0000-0002-7191-346X" }, { "id": "Azizzadenesheli-Kamyar", "name": { "family": "Azizzadenesheli", "given": "Kamyar" }, "orcid": "0000-0001-8507-1868" }, { "id": "Anandkumar-A", "name": { "family": "Anandkumar", "given": "Anima" }, "orcid": "0000-0002-6974-6797" }, { "id": "Hassibi-B", "name": { "family": "Hassibi", "given": "Babak" }, "orcid": "0000-0002-1375-5838" } ] }, "title": "Thompson Sampling Achieves \u00d5(\u221aT) Regret in Linear Quadratic Control", "ispublished": "pub", "full_text_status": "public", "keywords": "Thompson sampling, adaptive control, linear quadratic control, regret", "note": "\u00a9 2022 T. Kargin, S. Lale, K. Azizzadenesheli, A. Anandkumar & B. Hassibi.\n\nPublished - kargin22a.pdf
", "abstract": "Thompson Sampling (TS) is an efficient method for decision-making under uncertainty, where an action is sampled from a carefully prescribed distribution which is updated based on the observed data. In this work, we study the problem of adaptive control of stabilizable linear-quadratic regulators (LQRs) using TS, where the system dynamics are unknown. Previous works have established that \u00d5(\u221aT) frequentist regret is optimal for the adaptive control of LQRs. However, the existing methods either work only in restrictive settings, require a priori known stabilizing controllers, or utilize computationally intractable approaches. We propose an efficient TS algorithm for the adaptive control of LQRs, TS-based Adaptive Control, TSAC, that attains \u00d5(\u221aT)regret, even for multidimensional systems, thereby solving the open problem posed in Abeille and Lazaric (2018). TSAC does not require a priori known stabilizing controller and achieves fast stabilization of the underlying system by effectively exploring the environment in the early stages. Our result hinges on developing a novel lower bound on the probability that the TS provides an optimistic sample. By carefully prescribing an early exploration strategy and a policy update rule, we show that TS achieves order-optimal regret in adaptive control of multidimensional stabilizable LQRs. 
We empirically demonstrate the performance and the efficiency of TSAC in several adaptive control tasks.", "date": "2022-06-17", "date_type": "published", "publication": "Proceedings of Machine Learning Research", "volume": "178", "publisher": "ML Research Press", "pagerange": "3235-3284", "id_number": "CaltechAUTHORS:20220714-212445251", "issn": "2640-3498", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20220714-212445251", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "doi": "10.48550/arXiv.2206.08520", "primary_object": { "basename": "kargin22a.pdf", "url": "https://authors.library.caltech.edu/records/87tww-zn973/files/kargin22a.pdf" }, "resource_type": "article", "pub_year": "2022", "author_list": "Kargin, Taylan; Lale, Sahin; et el." }, { "id": "https://authors.library.caltech.edu/records/gt5nn-4yp94", "eprint_id": 114284, "eprint_status": "archive", "datestamp": "2023-08-22 15:51:07", "lastmod": "2023-10-23 23:28:29", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Kugener-Guillaume", "name": { "family": "Kugener", "given": "Guillaume" }, "orcid": "0000-0002-4697-2847" }, { "id": "Zhu-Yichao", "name": { "family": "Zhu", "given": "Yichao" } }, { "id": "Pangal-Dhiraj-J", "name": { "family": "Pangal", "given": "Dhiraj J." }, "orcid": "0000-0001-7391-9825" }, { "id": "Sinha-Aditya", "name": { "family": "Sinha", "given": "Aditya" } }, { "id": "Markarian-Nicholas", "name": { "family": "Markarian", "given": "Nicholas" } }, { "id": "Roshannai-Arman", "name": { "family": "Roshannai", "given": "Arman" } }, { "id": "Chan-Justin", "name": { "family": "Chan", "given": "Justin" } }, { "id": "Anandkumar-A", "name": { "family": "Anandkumar", "given": "Animashree" }, "orcid": "0000-0002-6974-6797" }, { "id": "Hung-Andrew-J", "name": { "family": "Hung", "given": "Andrew J." 
}, "orcid": "0000-0002-7201-6736" }, { "id": "Wrobel-Bozena-B", "name": { "family": "Wrobel", "given": "Bozena B." } }, { "id": "Zada-Gabriel", "name": { "family": "Zada", "given": "Gabriel" }, "orcid": "0000-0001-5821-902X" }, { "id": "Donoho-Daniel-A", "name": { "family": "Donoho", "given": "Daniel A." }, "orcid": "0000-0002-0531-1436" } ] }, "title": "Deep Neural Networks Can Accurately Detect Blood Loss and Hemorrhage Control Task Success From Video", "ispublished": "pub", "full_text_status": "public", "keywords": "Artificial intelligence, Complication management, Hemorrhage control, Machine learning, Video; Neurology (clinical); Surgery", "note": "\u00a9 2022 Congress of Neurological Surgeons.", "abstract": "Background: Deep neural networks (DNNs) have not been proven to detect blood loss (BL) or predict surgeon performance from video. \n\nObjective: To train a DNN using video from cadaveric training exercises of surgeons controlling simulated internal carotid hemorrhage to predict clinically relevant outcomes. \n\nMethods: Video was input as a series of images; deep learning networks were developed, which predicted BL and task success from images alone (automated model) and images plus human-labeled instrument annotations (semiautomated model). These models were compared against 2 reference models, which used average BL across all trials as its prediction (control 1) and a linear regression with time to hemostasis (a metric with known association with BL) as input (control 2). The root-mean-square error (RMSE) and correlation coefficients were used to compare the models; lower RMSE indicates superior performance. \n\nResults: One hundred forty-three trials were used (123 for training and 20 for testing). Deep learning models outperformed controls (control 1: RMSE 489 mL, control 2: RMSE 431 mL, R2 = 0.35) at BL prediction. The automated model predicted BL with an RMSE of 358 mL (R2 = 0.4) and correctly classified outcome in 85% of trials. 
The RMSE and classification performance of the semiautomated model improved to 260 mL and 90%, respectively. \n\nConclusion: BL and task outcome classification are important components of an automated assessment of surgical performance. DNNs can predict BL and outcome of hemorrhage control from video alone; their performance is improved with surgical instrument presence data. The generalizability of DNNs trained on hemorrhage control tasks should be investigated.", "date": "2022-06", "date_type": "published", "publication": "Neurosurgery", "volume": "90", "number": "6", "publisher": "Lippincott, Williams & Wilkins", "pagerange": "823-829", "id_number": "CaltechAUTHORS:20220413-607067100", "issn": "0148-396X", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20220413-607067100", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "doi": "10.1227/neu.0000000000001906", "resource_type": "article", "pub_year": "2022", "author_list": "Kugener, Guillaume; Zhu, Yichao; et el." }, { "id": "https://authors.library.caltech.edu/records/dkz22-hkm86", "eprint_id": 113084, "eprint_status": "archive", "datestamp": "2023-08-22 15:38:31", "lastmod": "2023-10-23 19:51:06", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Pangal-Dhiraj-J", "name": { "family": "Pangal", "given": "Dhiraj J." }, "orcid": "0000-0001-7391-9825" }, { "id": "Kugener-Guillaume", "name": { "family": "Kugener", "given": "Guillaume" }, "orcid": "0000-0002-4697-2847" }, { "id": "Zhu-Yichao", "name": { "family": "Zhu", "given": "Yichao" } }, { "id": "Sinha-Aditya", "name": { "family": "Sinha", "given": "Aditya" } }, { "id": "Unadkat-Vyom", "name": { "family": "Unadkat", "given": "Vyom" } }, { "id": "Cote-David-J", "name": { "family": "Cote", "given": "David J." 
} }, { "id": "Strickland-Ben-A", "name": { "family": "Strickland", "given": "Ben" }, "orcid": "0000-0002-4620-9542" }, { "id": "Rutkowski-Martin", "name": { "family": "Rutkowski", "given": "Martin" }, "orcid": "0000-0002-5188-3419" }, { "id": "Hung-Andrew-J", "name": { "family": "Hung", "given": "Andrew" }, "orcid": "0000-0002-7201-6736" }, { "id": "Anandkumar-A", "name": { "family": "Anandkumar", "given": "Animashree" }, "orcid": "0000-0002-6974-6797" }, { "id": "Han-X-Y", "name": { "family": "Han", "given": "X. Y." } }, { "id": "Papyan-Vardan", "name": { "family": "Papyan", "given": "Vardan" }, "orcid": "0000-0002-5028-2144" }, { "id": "Wrobel-Bozena-B", "name": { "family": "Wrobel", "given": "Bozena" } }, { "id": "Zada-Gabriel", "name": { "family": "Zada", "given": "Gabriel" }, "orcid": "0000-0001-5821-902X" }, { "id": "Donoho-Daniel-A", "name": { "family": "Donoho", "given": "Daniel A." }, "orcid": "0000-0002-0531-1436" } ] }, "title": "Expert surgeons and deep learning models can predict the outcome of surgical hemorrhage from 1 min of video", "ispublished": "pub", "full_text_status": "public", "keywords": "Machine learning; Outcomes research", "note": "\u00a9 The Author(s) 2022. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. 
To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. \n\nReceived 22 December 2021; Accepted 18 April 2022; Published 17 May 2022. \n\nData availability: The datasets generated during and/or analyzed during the current study are available in the figshare repository, link: https://doi.org/10.6084/m9.figshare.15132468.v1. \n\nContributions: Study design: D.J.P., G.K., A.S., G.Z., D.A.D. Data acquisition: D.J.P., G.K., B.S., M.R., G.Z., D.A.D. Model development: D.J.P., G.K., A.S., V.U., X.H., V.P., D.A.D. Statistical analysis: D.J.P., G.K., D.A.D. Writing\u2014original draft: D.J.P., G.K., D.A.D. Writing\u2014revisions: All authors. Final approval: All authors. Study supervision: G.Z., D.A.D. \n\nThe authors declare no competing interests.\n\nPublished - s41598-022-11549-2.pdf
Submitted - 2022.01.22.22269640v1.full.pdf
Supplemental Material - 41598_2022_11549_MOESM1_ESM.docx
Supplemental Material - 41598_2022_11549_MOESM2_ESM.pdf
", "abstract": "Major vascular injury resulting in uncontrolled bleeding is a catastrophic and often fatal complication of minimally invasive surgery. At the outset of these events, surgeons do not know how much blood will be lost or whether they will successfully control the hemorrhage (achieve hemostasis). We evaluate the ability of a deep learning neural network (DNN) to predict hemostasis control ability using the first minute of surgical video and compare model performance with human experts viewing the same video. The publicly available SOCAL dataset contains 147 videos of attending and resident surgeons managing hemorrhage in a validated, high-fidelity cadaveric simulator. Videos are labeled with outcome and blood loss (mL). The first minute of 20 videos was shown to four, blinded, fellowship trained skull-base neurosurgery instructors, and to SOCALNet (a DNN trained on SOCAL videos). SOCALNet architecture included a convolutional network (ResNet) identifying spatial features and a recurrent network identifying temporal features (LSTM). Experts independently assessed surgeon skill, predicted outcome and blood loss (mL). Outcome and blood loss predictions were compared with SOCALNet. Expert inter-rater reliability was 0.95. Experts correctly predicted 14/20 trials (Sensitivity: 82%, Specificity: 55%, Positive Predictive Value (PPV): 69%, Negative Predictive Value (NPV): 71%). SOCALNet correctly predicted 17/20 trials (Sensitivity 100%, Specificity 66%, PPV 79%, NPV 100%) and correctly identified all successful attempts. Expert predictions of the highest and lowest skill surgeons and expert predictions reported with maximum confidence were more accurate. Experts systematically underestimated blood loss (mean error \u2212 131 mL, RMSE 350 mL, R2 0.70) and fewer than half of expert predictions identified blood loss\u2009>\u2009500 mL (47.5%, 19/40). 
SOCALNet had superior performance (mean error \u2212 57 mL, RMSE 295 mL, R\u00b2 0.74) and detected most episodes of blood loss\u2009>\u2009500 mL (80%, 8/10). In validation experiments, SOCALNet evaluation of a critical on-screen surgical maneuver and high/low-skill composite videos were concordant with expert evaluation. Using only the first minute of video, experts and SOCALNet can predict outcome and blood loss during surgical hemorrhage. Experts systematically underestimated blood loss, and SOCALNet had no false negatives. DNNs can provide accurate, meaningful assessments of surgical video. We call for the creation of datasets of surgical adverse events for quality improvement research.", "date": "2022-05-17", "date_type": "published", "publication": "Scientific Reports", "volume": "12", "publisher": "Nature Publishing Group", "pagerange": "Art. No. 8137", "id_number": "CaltechAUTHORS:20220124-214564000", "issn": "2045-2322", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20220124-214564000", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "doi": "10.1038/s41598-022-11549-2", "pmcid": "PMC9114003", "primary_object": { "basename": "2022.01.22.22269640v1.full.pdf", "url": "https://authors.library.caltech.edu/records/dkz22-hkm86/files/2022.01.22.22269640v1.full.pdf" }, "related_objects": [ { "basename": "41598_2022_11549_MOESM1_ESM.docx", "url": "https://authors.library.caltech.edu/records/dkz22-hkm86/files/41598_2022_11549_MOESM1_ESM.docx" }, { "basename": "41598_2022_11549_MOESM2_ESM.pdf", "url": "https://authors.library.caltech.edu/records/dkz22-hkm86/files/41598_2022_11549_MOESM2_ESM.pdf" }, { "basename": "s41598-022-11549-2.pdf", "url": "https://authors.library.caltech.edu/records/dkz22-hkm86/files/s41598-022-11549-2.pdf" } ], "resource_type": "article", "pub_year": "2022", "author_list": "Pangal, Dhiraj J.; Kugener, Guillaume; et el." 
}, { "id": "https://authors.library.caltech.edu/records/dq8ck-xrt79", "eprint_id": 115623, "eprint_status": "archive", "datestamp": "2023-08-20 07:43:04", "lastmod": "2023-10-24 16:37:34", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Nie-Weili", "name": { "family": "Nie", "given": "Weili" } }, { "id": "Guo-Brandon", "name": { "family": "Guo", "given": "Brandon" } }, { "id": "Huang-Yujia", "name": { "family": "Huang", "given": "Yujia" }, "orcid": "0000-0001-7667-8342" }, { "id": "Xiao-Chaowei", "name": { "family": "Xiao", "given": "Chaowei" }, "orcid": "0000-0002-7043-4926" }, { "id": "Vahdat-Arash", "name": { "family": "Vahdat", "given": "Arash" } }, { "id": "Anandkumar-A", "name": { "family": "Anandkumar", "given": "Anima" }, "orcid": "0000-0002-6974-6797" } ] }, "title": "Diffusion Models for Adversarial Purification", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2022 by the author(s). \n\nWe would like to thank the AIALGO team at NVIDIA and Anima Anandkumar's research group at Caltech for reading the paper and providing fruitful suggestions. We also thank the anonymous reviewers for helpful comments.\n\nPublished - nie22a.pdf
Submitted - 2205.07460.pdf
", "abstract": "Adversarial purification refers to a class of defense methods that remove adversarial perturbations using a generative model. These methods do not make assumptions on the form of attack and the classification model, and thus can defend pre-existing classifiers against unseen threats. However, their performance currently falls behind adversarial training methods. In this work, we propose DiffPure that uses diffusion models for adversarial purification: Given an adversarial example, we first diffuse it with a small amount of noise following a forward diffusion process, and then recover the clean image through a reverse generative process. To evaluate our method against strong adaptive attacks in an efficient and scalable way, we propose to use the adjoint method to compute full gradients of the reverse generative process. Extensive experiments on three image datasets including CIFAR-10, ImageNet and CelebA-HQ with three classifier architectures including ResNet, WideResNet and ViT demonstrate that our method achieves the state-of-the-art results, outperforming current adversarial training and adversarial purification methods, often by a large margin. 
Project page:\nhttps://diffpure.github.io.", "date": "2022-05-16", "date_type": "published", "publication": "Proceedings of Machine Learning Research", "volume": "162", "publisher": "ML Research Press", "pagerange": "16805-16827", "id_number": "CaltechAUTHORS:20220715-174841781", "issn": "2640-3498", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20220715-174841781", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "doi": "10.48550/arXiv.2205.07460", "primary_object": { "basename": "2205.07460.pdf", "url": "https://authors.library.caltech.edu/records/dq8ck-xrt79/files/2205.07460.pdf" }, "related_objects": [ { "basename": "nie22a.pdf", "url": "https://authors.library.caltech.edu/records/dq8ck-xrt79/files/nie22a.pdf" } ], "resource_type": "article", "pub_year": "2022", "author_list": "Nie, Weili; Guo, Brandon; et el." }, { "id": "https://authors.library.caltech.edu/records/q3grb-3vz72", "eprint_id": 114603, "eprint_status": "archive", "datestamp": "2023-08-22 15:30:09", "lastmod": "2023-10-24 15:03:32", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "O'Connell-Michael", "name": { "family": "O'Connell", "given": "Michael" }, "orcid": "0000-0001-6681-8823" }, { "id": "Shi-Guanya", "name": { "family": "Shi", "given": "Guanya" }, "orcid": "0000-0002-9075-3705" }, { "id": "Shi-Xichen", "name": { "family": "Shi", "given": "Xichen" }, "orcid": "0000-0002-5366-9256" }, { "id": "Azizzadenesheli-Kamyar", "name": { "family": "Azizzadenesheli", "given": "Kamyar" }, "orcid": "0000-0001-8507-1868" }, { "id": "Anandkumar-A", "name": { "family": "Anandkumar", "given": "Anima" }, "orcid": "0000-0002-6974-6797" }, { "id": "Yue-Yisong", "name": { "family": "Yue", "given": "Yisong" }, "orcid": "0000-0001-9127-1989" }, { "id": "Chung-Soon-Jo", "name": { "family": "Chung", "given": "Soon-Jo" }, "orcid": "0000-0002-6657-3907" } ] }, "title": "Neural-Fly enables rapid learning 
for agile flight in strong winds", "ispublished": "pub", "full_text_status": "public", "keywords": "Artificial Intelligence; Control and Optimization; Computer Science Applications; Mechanical Engineering", "note": "\u00a9 2022 The Authors, some rights reserved; exclusive licensee\nAmerican Association for the Advancement of Science. No claim to original U.S. Government Works. \n\nSubmitted 11 October 2021; Accepted 12 April 2022; Published 4 May 2022. \n\nA.A. is also affiliated with NVIDIA Corporation, and Y.Y. is also with associated Argo AI. K.A. is currently affiliated with Purdue University. We thank J. Burdick and J.-J. E. Slotine for their helpful discussions. We thank M. Anderson for help with configuring the quadrotor platform, and M. Anderson and P. Spieler for help with hardware troubleshooting. We also thank N. Badillo and L. Pabon Madrid for help in experiments. \n\nThis research was developed with funding from the Defense Advanced Research Projects Agency (DARPA). This research was also conducted in part with funding from Raytheon Technologies. The views, opinions, and/or findings expressed are those of the authors and should not be interpreted as representing the official views or policies of the Department of Defense or the U.S. Government. The experiments reported in this article were conducted at Caltech's Center for Autonomous Systems and Technologies (CAST). \n\nAuthor contributions: S.-J.C. and Y.Y. directed the research activities. G.S. and M.O. designed and implemented the metalearning algorithm under the guidance of Y.Y., K.A., A.A., and S.-J.C., while the last-layer adaptation idea was started with a discussion by G.S., M.O., X.S., and S.-J.C. M.O. and G.S. designed and implemented the adaptive control algorithm with inputs from S.-J.C. and X.S. M.O. and G.S. performed experiments and evaluated the results. M.O. conducted the theoretical analysis of the meta-learning based adaptive controller with input from S.-J.C., G.S., and X.S. G.S. 
analyzed the learning algorithm with feedback from Y.Y., K.A., A.A., and S.-J.C. G.S. and M.O. created all the figures and videos with input from the other authors. All authors prepared the manuscript. \n\nThe authors declare that they have no competing interests. \n\nData and materials availability: All data needed to evaluate the conclusions in the article are present in the article or in the Supplementary Materials. We have provided the machine learning model training code, training data, and experimental data at github.com/aerorobotics/neural-fly.\n\nAccepted Version - 2205.06908.pdf
Supplemental Material - scirobotics.abm6597_sm.pdf
", "abstract": "Executing safe and precise flight maneuvers in dynamic high-speed winds is important for the ongoing commoditization of uninhabited aerial vehicles (UAVs). However, because the relationship between various wind conditions and its effect on aircraft maneuverability is not well understood, it is challenging to design effective robot controllers using traditional control design methods. We present Neural-Fly, a learning-based approach that allows rapid online adaptation by incorporating pretrained representations through deep learning. Neural-Fly builds on two key observations that aerodynamics in different wind conditions share a common representation and that the wind-specific part lies in a low-dimensional space. To that end, Neural-Fly uses a proposed learning algorithm, domain adversarially invariant meta-learning (DAIML), to learn the shared representation, only using 12 minutes of flight data. With the learned representation as a basis, Neural-Fly then uses a composite adaptation law to update a set of linear coefficients for mixing the basis elements. When evaluated under challenging wind conditions generated with the Caltech Real Weather Wind Tunnel, with wind speeds up to 43.6 kilometers/hour (12.1 meters/second), Neural-Fly achieves precise flight control with substantially smaller tracking error than stateof-the-art nonlinear and adaptive controllers. In addition to strong empirical performance, the exponential stability of Neural-Fly results in robustness guarantees. Last, our control design extrapolates to unseen wind conditions, is shown to be effective for outdoor flights with only onboard sensors, and can transfer across drones with minimal performance degradation.", "date": "2022-05-04", "date_type": "published", "publication": "Science Robotics", "volume": "7", "number": "66", "publisher": "American Association for the Advancement of Science", "pagerange": "Art. No. 
eabm6597", "id_number": "CaltechAUTHORS:20220505-792409800", "issn": "2470-9476", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20220505-792409800", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "Defense Advanced Research Projects Agency (DARPA)" }, { "agency": "Raytheon Company" } ] }, "local_group": { "items": [ { "id": "Center-for-Autonomous-Systems-and-Technologies-(CAST)" }, { "id": "GALCIT" } ] }, "doi": "10.1126/scirobotics.abm6597", "primary_object": { "basename": "2205.06908.pdf", "url": "https://authors.library.caltech.edu/records/q3grb-3vz72/files/2205.06908.pdf" }, "related_objects": [ { "basename": "scirobotics.abm6597_sm.pdf", "url": "https://authors.library.caltech.edu/records/q3grb-3vz72/files/scirobotics.abm6597_sm.pdf" } ], "resource_type": "article", "pub_year": "2022", "author_list": "O'Connell, Michael; Shi, Guanya; et el." }, { "id": "https://authors.library.caltech.edu/records/96rss-qa524", "eprint_id": 116901, "eprint_status": "archive", "datestamp": "2023-08-22 15:28:20", "lastmod": "2023-10-24 21:10:45", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Roberts-Sidney-I", "name": { "family": "Roberts", "given": "Sidney I." } }, { "id": "Cen-Steven-Y", "name": { "family": "Cen", "given": "Steven Y." }, "orcid": "0000-0002-7859-8909" }, { "id": "Nguyen-Jessica-H", "name": { "family": "Nguyen", "given": "Jessica H." }, "orcid": "0000-0003-0454-8463" }, { "id": "Perez-Laura-C", "name": { "family": "Perez", "given": "Laura C." } }, { "id": "Medina-Luis-G", "name": { "family": "Medina", "given": "Luis G." 
} }, { "id": "Ma-Runzhuo", "name": { "family": "Ma", "given": "Runzhuo" }, "orcid": "0000-0001-6381-2661" }, { "id": "Marshall-Sandra", "name": { "family": "Marshall", "given": "Sandra" } }, { "id": "Kocielnik-Rafal", "name": { "family": "Kocielnik", "given": "Rafal" }, "orcid": "0000-0001-5602-6056" }, { "id": "Anandkumar-A", "name": { "family": "Anandkumar", "given": "Anima" }, "orcid": "0000-0002-6974-6797" }, { "id": "Hung-Andrew-J", "name": { "family": "Hung", "given": "Andrew J." }, "orcid": "0000-0002-7201-6736" } ] }, "title": "The Relationship Between Technical Skills, Cognitive Workload, and Errors During Robotic Surgical Exercises", "ispublished": "pub", "full_text_status": "public", "keywords": "Urology", "note": "\u00a9 2022, Mary Ann Liebert, Inc. \n\nResearch reported in this publication was supported in part by the National Institute of Biomedical Imaging and Bioengineering of the National Institutes of Health under Award No. K23EB026493, and by the National Cancer Institute under Award No. 1 R01CA251579-01A1. \n\nAuthors' Contributions. S.I.R. was in charge of project development, data collection and preparation, and article writing and editing. S.Y.C. performed data analysis and article writing and editing. J.H.N. performed article editing and project management. L.C.P. performed data collection and preparation and article editing. L.G.M. performed data collection and article editing. R.M. performed data collection, data analysis, and article editing. S.M. performed data analysis and article editing. R.K. performed data analysis and article editing. A.A. performed data analysis and article editing. A.J.H. was in charge of project development, data management, and article writing and editing. \n\nIRB Approval and Human and Animal Rights. Our study complied with protocols was approved by the University of Southern California's IRB. 
All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards. This article does not contain any studies with animals performed by any of the authors. \n\nInformed consent was obtained from all individuals included in the study. \n\nAuthor Disclosure Statement. A.J.H. is a consultant for Mimic, Quantagene, and Johnson & Johnson. The study was not funded by any of these companies. Other authors have no conflict of interest.\n\nPublished - end.2021.0790.pdf
", "abstract": "Purpose: We attempt to understand the relationship between surgeon technical skills, cognitive workload, and errors during a simulated robotic dissection task.\n\nMaterials and Methods: Participant surgeons performed a robotic surgery dissection exercise. Participants were grouped based on surgical experience. Technical skills were evaluated utilizing the validated Global Evaluative Assessment of Robotic Skills (GEARS) assessment tool. The dissection task was evaluated for errors during active dissection or passive retraction maneuvers. We quantified cognitive workload of surgeon participants as an index of cognitive activity (ICA), derived from task-evoked pupillary response metrics; ICA ranged 0 to 1, with 1 representing maximum ICA. Generalized estimating equation (GEE) was used for all modelings to establish relationships between surgeon technical skills, cognitive workload, and errors.\n\nResults: We found a strong association between technical skills as measured by multiple GEARS domains (depth perception, force sensitivity, and robotic control) and passive errors, with higher GEARS scores associated with a lower relative risk of errors (all p\u2009<\u20090.01). For novice surgeons, as average GEARS scores increased, the average estimated ICA decreased. In contrast, as average GEARS increased for expert surgeons, the average estimated ICA increased. When exhibiting optimal technical skill (maximal GEARS scores), novices and experts reached a similar range of ICA scores (ICA: 0.47 and 0.42, respectively).\n\nConclusions: This study found that there is an optimal cognitive workload level for surgeons of all experience levels during our robotic surgical exercise. Select technical skill domains were strong predictors of errors. 
Future research will explore whether an ideal cognitive workload range truly optimizes surgical training and reduces surgical errors.", "date": "2022-05", "date_type": "published", "publication": "Journal of Endourology", "volume": "36", "number": "5", "publisher": "Mary Ann Liebert Inc", "pagerange": "712-720", "id_number": "CaltechAUTHORS:20220912-920381000", "issn": "0892-7790", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20220912-920381000", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NIH", "grant_number": "K23EB026493" }, { "agency": "NIH", "grant_number": "1 R01CA251579-01A1" } ] }, "doi": "10.1089/end.2021.0790", "pmcid": "PMC9145254", "primary_object": { "basename": "end.2021.0790.pdf", "url": "https://authors.library.caltech.edu/records/96rss-qa524/files/end.2021.0790.pdf" }, "resource_type": "article", "pub_year": "2022", "author_list": "Roberts, Sidney I.; Cen, Steven Y.; et el." }, { "id": "https://authors.library.caltech.edu/records/0m99v-t3788", "eprint_id": 114256, "eprint_status": "archive", "datestamp": "2023-08-22 15:21:48", "lastmod": "2023-10-23 19:50:53", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Wen-Gege", "name": { "family": "Wen", "given": "Gege" }, "orcid": "0000-0003-1668-3777" }, { "id": "Li-Zongyi", "name": { "family": "Li", "given": "Zongyi" }, "orcid": "0000-0003-2081-9665" }, { "id": "Azizzadenesheli-Kamyar", "name": { "family": "Azizzadenesheli", "given": "Kamyar" }, "orcid": "0000-0001-8507-1868" }, { "id": "Anandkumar-A", "name": { "family": "Anandkumar", "given": "Anima" }, "orcid": "0000-0002-6974-6797" }, { "id": "Benson-Sally-M", "name": { "family": "Benson", "given": "Sally M." 
}, "orcid": "0000-0002-3733-4296" } ] }, "title": "U-FNO\u2014An enhanced Fourier neural operator-based deep-learning model for multiphase flow", "ispublished": "pub", "full_text_status": "public", "keywords": "Multiphase flow; Fourier neural operator; Convolutional neural network; Carbon capture and storage; Deep learning; Water Science and Technology", "note": "\u00a9 2022 Elsevier. \n\nReceived 30 August 2021, Revised 7 February 2022, Accepted 25 March 2022, Available online 5 April 2022. \n\nG. Wen and S. M. Benson gratefully acknowledges the supported by ExxonMobil through the Strategic Energy Alliance at Stanford University and the Stanford Center for Carbon Storage . Z. Li gratefully acknowledges the financial support from the Kortschak Scholars Program. A. Anandkumar is supported in part by Bren endowed chair, LwLL grants, Beyond Limits, Raytheon, Microsoft, Google, Adobe faculty fellowships, and DE Logi grant. The authors would like to acknowledge the reviewers and editors for the constructive comments. \n\nCode and data availability: The python code for U-FNO model architecture and the data set used in training is available at https://github.com/gegewen/ufno. Web application https://ccsnet.ai hosts the trained U-FNO models to provide real time predictions. \n\nCRediT authorship contribution statement: Gege Wen: Conceptualization, Methodology, Software, Data curation, Formal analysis, Investigation, Validation, Visualization, Writing \u2013 original draft, Writing \u2013 review & editing. Zongyi Li: Conceptualization, Methodology, Software, Investigation, Validation, Writing \u2013 review & editing. Kamyar Azizzadenesheli: Methodology, Software, Investigation, Validation, Writing \u2013 review & editing. Anima Anandkumar: Funding acquisition, Supervision, Writing \u2013 review & editing. Sally M. Benson: Conceptualization, Formal analysis, Funding acquisition, Methodology, Resources, Supervision, Writing \u2013 review & editing. 
\n\nThe authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.\n\nSubmitted - 2109.03697.pdf
", "abstract": "Numerical simulation of multiphase flow in porous media is essential for many geoscience applications. Machine learning models trained with numerical simulation data can provide a faster alternative to traditional simulators. Here we present U-FNO, a novel neural network architecture for solving multiphase flow problems with superior accuracy, speed, and data efficiency. U-FNO is designed based on the newly proposed Fourier neural operator (FNO), which has shown excellent performance in single-phase flows. We extend the FNO-based architecture to a highly complex CO\u2082-water multiphase problem with wide ranges of permeability and porosity heterogeneity, anisotropy, reservoir conditions, injection configurations, flow rates, and multiphase flow properties. The U-FNO architecture is more accurate in gas saturation and pressure buildup predictions than the original FNO and a state-of-the-art convolutional neural network (CNN) benchmark. Meanwhile, it has superior data utilization efficiency, requiring only a third of the training data to achieve the equivalent accuracy as CNN. U-FNO provides superior performance in highly heterogeneous geological formations and critically important applications such as gas saturation and pressure buildup \"fronts\" determination. The trained model can serve as a general-purpose alternative to routine numerical simulations of 2D-radial CO\u2082 injection problems with significant speed-ups than traditional simulators.", "date": "2022-05", "date_type": "published", "publication": "Advances in Water Resources", "volume": "163", "publisher": "Elsevier", "pagerange": "Art. No. 
104180", "id_number": "CaltechAUTHORS:20220412-15492000", "issn": "0309-1708", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20220412-15492000", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "ExxonMobil Research and Engineering Company" }, { "agency": "Kortschak Scholars Program" }, { "agency": "Bren Professor of Computing and Mathematical Sciences" }, { "agency": "Learning with Less Labels (LwLL)" }, { "agency": "Beyond Limits" }, { "agency": "Raytheon Company" }, { "agency": "Microsoft Faculty Fellowship" }, { "agency": "Google Faculty Research Award" }, { "agency": "Adobe" }, { "agency": "Caltech De Logi Fund" } ] }, "doi": "10.1016/j.advwatres.2022.104180", "primary_object": { "basename": "2109.03697.pdf", "url": "https://authors.library.caltech.edu/records/0m99v-t3788/files/2109.03697.pdf" }, "resource_type": "article", "pub_year": "2022", "author_list": "Wen, Gege; Li, Zongyi; et el." }, { "id": "https://authors.library.caltech.edu/records/xk16q-c4k63", "eprint_id": 115585, "eprint_status": "archive", "datestamp": "2023-08-20 07:33:51", "lastmod": "2023-10-24 16:36:07", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Zhou-Daquan", "name": { "family": "Zhou", "given": "Daquan" } }, { "id": "Yu-Zhiding", "name": { "family": "Yu", "given": "Zhiding" } }, { "id": "Xie-Enze", "name": { "family": "Xie", "given": "Enze" } }, { "id": "Xiao-Chaowei", "name": { "family": "Xiao", "given": "Chaowei" } }, { "id": "Anandkumar-A", "name": { "family": "Anandkumar", "given": "Anima" }, "orcid": "0000-0002-6974-6797" }, { "id": "Feng-Jiashi", "name": { "family": "Feng", "given": "Jiashi" } }, { "id": "Alvarez-Jose-M", "name": { "family": "Alvarez", "given": "Jose M." 
} } ] }, "title": "Understanding The Robustness in Vision Transformers", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2022 by the author(s).\n\nPublished - zhou22m.pdf
Submitted - 2204.12451.pdf
", "abstract": "Recent studies show that Vision Transformers(ViTs) exhibit strong robustness against various corruptions. Although this property is partly attributed to the self-attention mechanism, there is still a lack of systematic understanding. In this paper, we examine the role of self-attention in learning robust representations. Our study is motivated by the intriguing properties of the emerging visual grouping in Vision Transformers, which indicates that self-attention may promote robustness through improved mid-level representations. We further propose a family of fully attentional networks (FANs) that strengthen this capability by incorporating an attentional channel processing design. We validate the design comprehensively on various hierarchical backbones. Our model achieves a state of-the-art 87.1% accuracy and 35.8% mCE on ImageNet-1k and ImageNet-C with 76.8M parameters. We also demonstrate state-of-the-art accuracy and robustness in two downstream tasks: semantic segmentation and object detection. Code will be available at this https URL.", "date": "2022-04-26", "date_type": "published", "publication": "Proceedings of Machine Learning Research", "volume": "162", "publisher": "ML Research Press", "pagerange": "27378-27394", "id_number": "CaltechAUTHORS:20220714-212518736", "issn": "2640-3498", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20220714-212518736", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "doi": "10.48550/arXiv.2204.12451", "primary_object": { "basename": "2204.12451.pdf", "url": "https://authors.library.caltech.edu/records/xk16q-c4k63/files/2204.12451.pdf" }, "related_objects": [ { "basename": "zhou22m.pdf", "url": "https://authors.library.caltech.edu/records/xk16q-c4k63/files/zhou22m.pdf" } ], "resource_type": "article", "pub_year": "2022", "author_list": "Zhou, Daquan; Yu, Zhiding; et el." 
}, { "id": "https://authors.library.caltech.edu/records/09mnq-t5j04", "eprint_id": 114119, "eprint_status": "archive", "datestamp": "2023-08-22 14:22:17", "lastmod": "2023-10-23 23:22:26", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Kugener-Guillaume", "name": { "family": "Kugener", "given": "Guillaume" }, "orcid": "0000-0002-4697-2847" }, { "id": "Pangal-Dhiraj-J", "name": { "family": "Pangal", "given": "Dhiraj J." }, "orcid": "0000-0001-7391-9825" }, { "id": "Cardinal-Tyler", "name": { "family": "Cardinal", "given": "Tyler" } }, { "id": "Collet-Casey", "name": { "family": "Collet", "given": "Casey" } }, { "id": "Lechtholz-Zey-Elizabeth", "name": { "family": "Lechtholz-Zey", "given": "Elizabeth" } }, { "id": "Lasky-Sasha", "name": { "family": "Lasky", "given": "Sasha" } }, { "id": "Sundaram-Shivani", "name": { "family": "Sundaram", "given": "Shivani" }, "orcid": "0000-0003-2863-9204" }, { "id": "Markarian-Nicholas", "name": { "family": "Markarian", "given": "Nicholas" } }, { "id": "Zhu-Yichao", "name": { "family": "Zhu", "given": "Yichao" } }, { "id": "Roshannai-Arman", "name": { "family": "Roshannai", "given": "Arman" } }, { "id": "Sinha-Aditya", "name": { "family": "Sinha", "given": "Aditya" } }, { "id": "Han-X-Y", "name": { "family": "Han", "given": "X. Y." } }, { "id": "Papyan-Vardan", "name": { "family": "Papyan", "given": "Vardan" }, "orcid": "0000-0002-5028-2144" }, { "id": "Hung-Andrew-J", "name": { "family": "Hung", "given": "Andrew J." }, "orcid": "0000-0002-7201-6736" }, { "id": "Anandkumar-A", "name": { "family": "Anandkumar", "given": "Animashree" }, "orcid": "0000-0002-6974-6797" }, { "id": "Wrobel-Bozena-B", "name": { "family": "Wrobel", "given": "Bozena" } }, { "id": "Zada-Gabriel", "name": { "family": "Zada", "given": "Gabriel" }, "orcid": "0000-0001-5821-902X" }, { "id": "Donoho-Daniel-A", "name": { "family": "Donoho", "given": "Daniel A." 
}, "orcid": "0000-0002-0531-1436" } ] }, "title": "Utility of the Simulated Outcomes Following Carotid Artery Laceration Video Data Set for Machine Learning Applications", "ispublished": "pub", "full_text_status": "public", "keywords": "General Medicine", "note": "\u00a9 2022 Kugener G et al. JAMA Network Open. This is an open access article distributed under the terms of the CC-BY License. \n\nAccepted for Publication: January 31, 2022. Published: March 21, 2022. \n\nAuthor Contributions: Dr Donoho had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis. \n\nConcept and design: Kugener, Pangal, Roshannai, Papyan, Hung, Anandkumar, Wrobel, Zada, Donoho. \n\nAcquisition, analysis, or interpretation of data: Kugener, Pangal, Cardinal, Collet, Lechtholz-Zey, Lasky, Sundaram, Markarian, Zhu, Roshannai, Sinha, Han, Wrobel, Donoho. \n\nDrafting of the manuscript: Kugener, Pangal, Lechtholz-Zey, Sundaram, Roshannai, Donoho. \n\nCritical revision of the manuscript for important intellectual content: Kugener, Pangal, Cardinal, Collet, Lasky, Markarian, Zhu, Sinha, Han, Papyan, Hung, Anandkumar, Wrobel, Zada, Donoho. \n\nStatistical analysis: Kugener, Pangal, Collet, Lechtholz-Zey, Zhu, Roshannai, Anandkumar, Donoho. \n\nObtained funding: Zada. \n\nAdministrative, technical, or material support: Kugener, Pangal, Lasky, Sundaram, Markarian, Zhu, Sinha, Han, Wrobel, Zada, Donoho. \n\nSupervision: Kugener, Pangal, Papyan, Hung, Wrobel, Zada, Donoho. \n\nConflict of Interest Disclosures: None reported.\n\nPublished - kugener_2022_oi_220124_1646862767.52045.pdf
Supplemental Material - zoi220124supp1_prod_1646862767.53045.pdf
", "abstract": "Importance. Surgical data scientists lack video data sets that depict adverse events, which may affect model generalizability and introduce bias. Hemorrhage may be particularly challenging for computer vision\u2013based models because blood obscures the scene. \n\nObjective. To assess the utility of the Simulated Outcomes Following Carotid Artery Laceration (SOCAL)\u2014a publicly available surgical video data set of hemorrhage complication management with instrument annotations and task outcomes\u2014to provide benchmarks for surgical data science techniques, including computer vision instrument detection, instrument use metrics and outcome associations, and validation of a SOCAL-trained neural network using real operative video. \n\nDesign, Setting, and Participants. For this quailty improvement study, a total of 75 surgeons with 1 to 30 years' experience (mean, 7 years) were filmed from January 1, 2017, to December 31, 2020, managing catastrophic surgical hemorrhage in a high-fidelity cadaveric training exercise at nationwide training courses. Videos were annotated from January 1 to June 30, 2021. \n\nInterventions. Surgeons received expert coaching between 2 trials. \n\nMain Outcomes and Measures. Hemostasis within 5 minutes (task success, dichotomous), time to hemostasis (in seconds), and blood loss (in milliliters) were recorded. Deep neural networks (DNNs) were trained to detect surgical instruments in view. Model performance was measured using mean average precision (mAP), sensitivity, and positive predictive value. \n\nResults. SOCAL contains 31\u202f443 frames with 65\u202f071 surgical instrument annotations from 147 trials with associated surgeon demographic characteristics, time to hemostasis, and recorded blood loss for each trial. Computer vision\u2013based instrument detection methods using DNNs trained on SOCAL achieved a mAP of 0.67 overall and 0.91 for the most common surgical instrument (suction). 
Hemorrhage control challenges standard object detectors: detection of some surgical instruments remained poor (mAP,\u20090.25). On real intraoperative video, the model achieved a sensitivity of 0.77 and a positive predictive value of 0.96. Instrument use metrics derived from the SOCAL video were significantly associated with performance (blood loss). \n\nConclusions and Relevance. Hemorrhage control is a high-stakes adverse event that poses unique challenges for video analysis, but no data sets of hemorrhage control exist. The use of SOCAL, the first data set to depict hemorrhage control, allows the benchmarking of data science applications, including object detection, performance metric development, and identification of metrics associated with outcomes. In the future, SOCAL may be used to build and validate surgical data science models.", "date": "2022-03", "date_type": "published", "publication": "JAMA Network Open", "volume": "5", "number": "3", "publisher": "American Medical Association", "pagerange": "Art. No. e223177", "id_number": "CaltechAUTHORS:20220329-772928599", "issn": "2574-3805", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20220329-772928599", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "doi": "10.1001/jamanetworkopen.2022.3177", "pmcid": "PMC8938712", "primary_object": { "basename": "kugener_2022_oi_220124_1646862767.52045.pdf", "url": "https://authors.library.caltech.edu/records/09mnq-t5j04/files/kugener_2022_oi_220124_1646862767.52045.pdf" }, "related_objects": [ { "basename": "zoi220124supp1_prod_1646862767.53045.pdf", "url": "https://authors.library.caltech.edu/records/09mnq-t5j04/files/zoi220124supp1_prod_1646862767.53045.pdf" } ], "resource_type": "article", "pub_year": "2022", "author_list": "Kugener, Guillaume; Pangal, Dhiraj J.; et el." 
}, { "id": "https://authors.library.caltech.edu/records/j2js9-txr69", "eprint_id": 108205, "eprint_status": "archive", "datestamp": "2023-08-22 12:59:03", "lastmod": "2023-10-23 16:32:08", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Liu-Burigede", "name": { "family": "Liu", "given": "Burigede" }, "orcid": "0000-0002-6518-3368" }, { "id": "Kovachki-Nikola-B", "name": { "family": "Kovachki", "given": "Nikola" }, "orcid": "0000-0002-3650-2972" }, { "id": "Li-Zongyi", "name": { "family": "Li", "given": "Zongyi" } }, { "id": "Azizzadenesheli-Kamyar", "name": { "family": "Azizzadenesheli", "given": "Kamyar" }, "orcid": "0000-0001-8507-1868" }, { "id": "Anandkumar-A", "name": { "family": "Anandkumar", "given": "Anima" } }, { "id": "Stuart-A-M", "name": { "family": "Stuart", "given": "Andrew M." }, "orcid": "0000-0001-9091-7266" }, { "id": "Bhattacharya-K", "name": { "family": "Bhattacharya", "given": "Kaushik" }, "orcid": "0000-0003-2908-5469" } ] }, "title": "A learning-based multiscale method and its application to inelastic impact problems", "ispublished": "pub", "full_text_status": "public", "keywords": "Multiscale modeling; Machine learning; Crystal plasticity", "note": "\u00a9 2021 Elsevier Ltd. \n\nReceived 11 June 2021, Revised 30 September 2021, Accepted 6 October 2021, Available online 22 October 2021. \n\nWe are grateful to Dennis Kochmann for discussion and for providing us with the 2DFFT and the 3D Taylor code to generate the data. This research was sponsored by the Army Research Laboratory, United States and was accomplished under Cooperative Agreement Number W911NF-12-2-0022. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Army Research Laboratory or the U.S. Government. The U.S. 
Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation herein. ZL is supported by the Kortschak Scholars Program. AA is supported in part by Bren endowed chair and De Logi grant. AMS is also partially supported by NSF, United States grant DMS-1818977. \n\nData availability: The data and scripts needed to evaluate the conclusions of this paper are available in the GitHub repository \"Learning based multiscale\" (https://github.com/Burigede/Learning_based_multiscale.git). \n\nCRediT authorship contribution statement: Burigede Liu: Conceived the work, Developed the framework, Lead in implementing the framework, Obtaining the numerical results, Discussions during the course of this work and in interpreting the results, Lead in drafting the manuscript, Finalizing. Nikola Kovachki: Conceived the work, Developed the framework, Discussions during the course of this work and in interpreting the results, Finalizing. Zongyi Li: Discussions during the course of this work and in interpreting the results, Finalizing. Kamyar Azizzadenesheli: Discussions during the course of this work and in interpreting the results, Finalizing. Anima Anandkumar: Discussions during the course of this work and in interpreting the results, Finalizing. Andrew M. Stuart: Conceived the work, Developed the framework, Discussions during the course of this work and in interpreting the results, Finalizing. Kaushik Bhattacharya: Conceived the work, Developed the framework, Discussions during the course of this work and in interpreting the results, Lead in drafting the manuscript, Finalizing. \n\nThe authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.\n\nSubmitted - 2102.07256.pdf
", "abstract": "The macroscopic properties of materials that we observe and exploit in engineering application result from complex interactions between physics at multiple length and time scales: electronic, atomistic, defects, domains etc. Multiscale modeling seeks to understand these interactions by exploiting the inherent hierarchy where the behavior at a coarser scale regulates and averages the behavior at a finer scale. This requires the repeated solution of computationally expensive finer-scale models, and often a priori knowledge of those aspects of the finer-scale behavior that affect the coarser scale (order parameters, state variables, descriptors, etc.). We address this challenge in a two-scale setting where we learn the fine-scale behavior from off-line calculations and then use the learnt behavior directly in coarse scale calculations. The approach builds on the recent success of deep neural networks by combining their approximation power in high dimensions with ideas from model reduction. It results in a neural network approximation that has high fidelity, is computationally inexpensive, is independent of the need for a priori knowledge, and can be used directly in the coarse scale calculations. We demonstrate the approach on problems involving the impact of magnesium, a promising light-weight structural and protective material.", "date": "2022-01", "date_type": "published", "publication": "Journal of the Mechanics and Physics of Solids", "volume": "158", "publisher": "Elsevier", "pagerange": "Art. No. 
104668", "id_number": "CaltechAUTHORS:20210225-132721680", "issn": "0022-5096", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20210225-132721680", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "Army Research Laboratory", "grant_number": "W911NF-12-2-0022" }, { "agency": "Kortschak Scholars Program" }, { "agency": "Bren Professor of Computing and Mathematical Sciences" }, { "agency": "Caltech De Logi Fund" }, { "agency": "NSF", "grant_number": "DMS-1818977" } ] }, "doi": "10.1016/j.jmps.2021.104668", "primary_object": { "basename": "2102.07256.pdf", "url": "https://authors.library.caltech.edu/records/j2js9-txr69/files/2102.07256.pdf" }, "resource_type": "article", "pub_year": "2022", "author_list": "Liu, Burigede; Kovachki, Nikola; et al." }, { "id": "https://authors.library.caltech.edu/records/cwhg8-t6c31", "eprint_id": 110655, "eprint_status": "archive", "datestamp": "2023-08-20 06:03:01", "lastmod": "2023-10-23 15:32:18", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Christensen-Anders-S", "name": { "family": "Christensen", "given": "Anders S." }, "orcid": "0000-0002-7253-6897" }, { "id": "Sirumalla-Sai-Krishna", "name": { "family": "Sirumalla", "given": "Sai Krishna" }, "orcid": "0000-0002-1875-2062" }, { "id": "Qiao-Zhuoran", "name": { "family": "Qiao", "given": "Zhuoran" }, "orcid": "0000-0002-5704-7331" }, { "id": "O'Connor-Michael-B", "name": { "family": "O'Connor", "given": "Michael B." } }, { "id": "Smith-Daniel-G-A", "name": { "family": "Smith", "given": "Daniel G. A." }, "orcid": "0000-0001-8626-0900" }, { "id": "Ding-Feizhi", "name": { "family": "Ding", "given": "Feizhi" } }, { "id": "Bygrave-Peter-J", "name": { "family": "Bygrave", "given": "Peter J." 
}, "orcid": "0000-0002-5505-5637" }, { "id": "Anandkumar-A", "name": { "family": "Anandkumar", "given": "Animashree" }, "orcid": "0000-0002-6974-6797" }, { "id": "Welborn-Matthew-G", "name": { "family": "Welborn", "given": "Matthew" }, "orcid": "0000-0001-8659-6535" }, { "id": "Manby-Frederick-R", "name": { "family": "Manby", "given": "Frederick R." }, "orcid": "0000-0001-7611-714X" }, { "id": "Miller-T-F-III", "name": { "family": "Miller", "given": "Thomas F., III" }, "orcid": "0000-0002-1882-5380" } ] }, "title": "OrbNet Denali: A machine learning potential for biological and organic chemistry with semi-empirical cost and DFT accuracy", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2021 Author(s). Published under an exclusive license by AIP Publishing. \n\nSubmitted: 1 July 2021; Accepted: 26 October 2021; Published Online: 23 November 2021. \n\nZ.Q. acknowledges graduate research funding from Caltech and partial support from the Amazon\u2013Caltech AI4Science fellowship. T.F.M. and A.A. acknowledge partial support from the Caltech DeLogi fund, and A.A. acknowledges support from a Caltech Bren professorship. The authors acknowledge NVIDIA, including Abe Stern, Thorsten Kurth, Josh Romero, and Tom Gibbs, for helpful discussions regarding GPU implementations of graph neural networks. Computational resources were provided by the National Energy Research Scientific Computing Center (NERSC), a DOE Office of Science User Facility supported by the DOE Office of Science, under Contract No. DE-AC02-05CH11231. This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725. \n\nConflict of Interest: Nine of the authors (A.S.C., S.K.S., M.B.O., D.G.A.S., F.D., P.J.B., M.W., F.R.M., and T.F.M.) are employees of Entos, Inc., or its affiliates. \n\nAuthor Contributions: A.S.C. and S.K.S. 
contributed equally to this work. \n\nData Availability: The 2.3 \u00d7 10\u2076 geometries and energy labels in the OrbNet Denali training set are openly available in FigShare at https://doi.org/10.6084/m9.figshare.14883867.\n\nPublished - 5.0061990.pdf
Submitted - 2107.00299.pdf
Supplemental Material - si.pdf
", "abstract": "We present OrbNet Denali, a machine learning model for an electronic structure that is designed as a drop-in replacement for ground-state density functional theory (DFT) energy calculations. The model is a message-passing graph neural network that uses symmetry-adapted atomic orbital features from a low-cost quantum calculation to predict the energy of a molecule. OrbNet Denali is trained on a vast dataset of 2.3 \u00d7 10\u2076 DFT calculations on molecules and geometries. This dataset covers the most common elements in biochemistry and organic chemistry (H, Li, B, C, N, O, F, Na, Mg, Si, P, S, Cl, K, Ca, Br, and I) and charged molecules. OrbNet Denali is demonstrated on several well-established benchmark datasets, and we find that it provides accuracy that is on par with modern DFT methods while offering a speedup of up to three orders of magnitude. For the GMTKN55 benchmark set, OrbNet Denali achieves WTMAD-1 and WTMAD-2 scores of 7.19 and 9.84, on par with modern DFT functionals. For several GMTKN55 subsets, which contain chemical problems that are not present in the training set, OrbNet Denali produces a mean absolute error comparable to those of DFT methods. For the Hutchison conformer benchmark set, OrbNet Denali has a median correlation coefficient of R\u00b2 = 0.90 compared to the reference DLPNO-CCSD(T) calculation and R\u00b2 = 0.97 compared to the method used to generate the training data (\u03c9B97X-D3/def2-TZVP), exceeding the performance of any other method with a similar cost. Similarly, the model reaches chemical accuracy for non-covalent interactions in the S66x10 dataset. 
For torsional profiles, OrbNet Denali reproduces the torsion profiles of \u03c9B97X-D3/def2-TZVP with an average mean absolute error of 0.12 kcal/mol for the potential energy surfaces of the diverse fragments in the TorsionNet500 dataset.", "date": "2021-11-28", "date_type": "published", "publication": "Journal of Chemical Physics", "volume": "155", "number": "20", "publisher": "American Institute of Physics", "pagerange": "Art. No. 204103", "id_number": "CaltechAUTHORS:20210831-203931813", "issn": "0021-9606", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20210831-203931813", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "Amazon Web Services" }, { "agency": "Caltech De Logi Fund" }, { "agency": "Bren Professor of Computing and Mathematical Sciences" }, { "agency": "NVIDIA Corporation" }, { "agency": "Department of Energy (DOE)", "grant_number": "DE-AC02-05CH11231" }, { "agency": "Department of Energy (DOE)", "grant_number": "DE-AC05-00OR22725" } ] }, "doi": "10.1063/5.0061990", "primary_object": { "basename": "2107.00299.pdf", "url": "https://authors.library.caltech.edu/records/cwhg8-t6c31/files/2107.00299.pdf" }, "related_objects": [ { "basename": "5.0061990.pdf", "url": "https://authors.library.caltech.edu/records/cwhg8-t6c31/files/5.0061990.pdf" }, { "basename": "si.pdf", "url": "https://authors.library.caltech.edu/records/cwhg8-t6c31/files/si.pdf" } ], "resource_type": "article", "pub_year": "2021", "author_list": "Christensen, Anders S.; Sirumalla, Sai Krishna; et al." 
}, { "id": "https://authors.library.caltech.edu/records/emvcf-kqt83", "eprint_id": 115604, "eprint_status": "archive", "datestamp": "2023-08-20 05:58:39", "lastmod": "2023-10-24 16:37:01", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Lee-Youngwoon", "name": { "family": "Lee", "given": "Youngwoon" } }, { "id": "Lim-Joseph-J", "name": { "family": "Lim", "given": "Joseph J." } }, { "id": "Anandkumar-A", "name": { "family": "Anandkumar", "given": "Anima" }, "orcid": "0000-0002-6974-6797" }, { "id": "Zhu-Yuke", "name": { "family": "Zhu", "given": "Yuke" }, "orcid": "0000-0002-9198-2227" } ] }, "title": "Adversarial Skill Chaining for Long-Horizon Robot Manipulation via Terminal State Regularization", "ispublished": "pub", "full_text_status": "public", "keywords": "Long-Horizon Manipulation, Skill Chaining, Reinforcement Learning", "note": "This work was initiated when Youngwoon Lee worked at NVIDIA Research as an intern. This research is also supported by the Annenberg Fellowship from USC and the Google Cloud Research Credits program with the award GCP19980904. We would like to thank Byron Boots for initial discussion, Jim Fan, De-An Huang, Christopher B. Choy, and NVIDIA AI Algorithms team for their insightful feedback, and the USC CLVR lab members for constructive feedback.\n\nPublished - lee22a.pdf
Accepted Version - 2111.07999.pdf
Supplemental Material - lee22a-supp.zip
", "abstract": "Skill chaining is a promising approach for synthesizing complex behaviors by sequentially combining previously learned skills. Yet, a naive composition of skills fails when a policy encounters a starting state never seen during its training. For successful skill chaining, prior approaches attempt to widen the policy's starting state distribution. However, these approaches require larger state distributions to be covered as more policies are sequenced, and thus are limited to short skill sequences. In this paper, we propose to chain multiple policies without excessively large initial state distributions by regularizing the terminal state distributions in an adversarial learning framework. We evaluate our approach on two complex long-horizon manipulation tasks of furniture assembly. Our results have shown that our method establishes the first model-free reinforcement learning algorithm to solve these tasks; whereas prior skill chaining approaches fail. The code and videos are available at https://clvrai.com/skill-chaining.", "date": "2021-11-15", "date_type": "published", "publication": "Proceedings of Machine Learning Research", "volume": "164", "publisher": "ML Research Press", "pagerange": "406-416", "id_number": "CaltechAUTHORS:20220714-224643553", "issn": "2640-3498", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20220714-224643553", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "University of Southern California" }, { "agency": "Google Cloud", "grant_number": "GCP19980904" } ] }, "doi": "10.48550/arXiv.2111.07999", "primary_object": { "basename": "2111.07999.pdf", "url": "https://authors.library.caltech.edu/records/emvcf-kqt83/files/2111.07999.pdf" }, "related_objects": [ { "basename": "lee22a-supp.zip", "url": "https://authors.library.caltech.edu/records/emvcf-kqt83/files/lee22a-supp.zip" }, { "basename": "lee22a.pdf", 
"url": "https://authors.library.caltech.edu/records/emvcf-kqt83/files/lee22a.pdf" } ], "resource_type": "article", "pub_year": "2021", "author_list": "Lee, Youngwoon; Lim, Joseph J.; et al." }, { "id": "https://authors.library.caltech.edu/records/qxbfh-mre98", "eprint_id": 111293, "eprint_status": "archive", "datestamp": "2023-08-20 05:08:17", "lastmod": "2023-10-23 20:30:10", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Hung-Andrew-J", "name": { "family": "Hung", "given": "Andrew J." }, "orcid": "0000-0002-7201-6736" }, { "id": "Liu-Yan", "name": { "family": "Liu", "given": "Yan" }, "orcid": "0000-0002-5837-4908" }, { "id": "Anandkumar-A", "name": { "family": "Anandkumar", "given": "Animashree" }, "orcid": "0000-0002-6974-6797" } ] }, "title": "Deep Learning to Automate Technical Skills Assessment in Robotic Surgery", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2021 American Medical Association.", "abstract": "Surgeon performance affects patient outcomes. To improve patient outcomes, we must identify poor surgical performance. However, surgeons may not always associate a specific surgical act with its consequential outcome unless the error is egregious and the outcome is immediate. Today, there is little formal structure for surgeons to receive specific technical skills feedback after formal training. 
Current hurdles for surgeons to obtain and maintain hospital privileges to perform an operative procedure include peer proctoring and evaluation, which are arguably insufficient when juxtaposed to the potentially devastating outcomes that can occur if surgical errors arise.", "date": "2021-09-15", "date_type": "published", "publication": "JAMA Surgery", "volume": "156", "number": "11", "publisher": "American Medical Association", "pagerange": "1059-1060", "id_number": "CaltechAUTHORS:20211008-183538597", "issn": "2168-6254", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20211008-183538597", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "doi": "10.1001/jamasurg.2021.3651", "resource_type": "article", "pub_year": "2021", "author_list": "Hung, Andrew J.; Liu, Yan; et al." }, { "id": "https://authors.library.caltech.edu/records/03nhs-3kv35", "eprint_id": 110808, "eprint_status": "archive", "datestamp": "2023-08-22 10:45:12", "lastmod": "2023-10-23 19:53:00", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Chan-Justin", "name": { "family": "Chan", "given": "Justin" } }, { "id": "Pangal-Dhiraj-J", "name": { "family": "Pangal", "given": "Dhiraj J." 
}, "orcid": "0000-0001-7391-9825" }, { "id": "Cardinal-Tyler", "name": { "family": "Cardinal", "given": "Tyler" }, "orcid": "0000-0001-8277-6942" }, { "id": "Kugener-Guillaume", "name": { "family": "Kugener", "given": "Guillaume" }, "orcid": "0000-0002-4697-2847" }, { "id": "Zhu-Yichao", "name": { "family": "Zhu", "given": "Yichao" } }, { "id": "Roshannai-Arman", "name": { "family": "Roshannai", "given": "Arman" } }, { "id": "Markarian-Nicholas", "name": { "family": "Markarian", "given": "Nicholas" } }, { "id": "Sinha-Aditya", "name": { "family": "Sinha", "given": "Aditya" } }, { "id": "Anandkumar-A", "name": { "family": "Anandkumar", "given": "Anima" }, "orcid": "0000-0002-6974-6797" }, { "id": "Hung-Andrew-J", "name": { "family": "Hung", "given": "Andrew J." }, "orcid": "0000-0002-7201-6736" }, { "id": "Zada-Gabriel", "name": { "family": "Zada", "given": "Gabriel" }, "orcid": "0000-0001-5821-902X" }, { "id": "Donoho-Daniel-A", "name": { "family": "Donoho", "given": "Daniel A." }, "orcid": "0000-0002-0531-1436" } ] }, "title": "A systematic review of virtual reality for the assessment of technical skills in neurosurgery", "ispublished": "pub", "full_text_status": "public", "keywords": "virtual reality; augmented reality; technical assessment", "note": "\u00a9 AANS 2021. \n\nSubmitted March 31, 2021. Accepted May 19, 2021. \n\nAuthor Contributions: Conception and design: Chan, Pangal. Acquisition of data: Chan, Pangal. Analysis and interpretation of data: Chan, Pangal. Drafting the article: Chan, Pangal, Cardinal. Critically revising the article: all authors. Reviewed submitted version of manuscript: all authors. Approved the final version of the manuscript on behalf of all authors: Zada. \n\nAdministrative/technical/material support: Zada, Chan, Pangal, Cardinal, Donoho. Study supervision: Zada, Donoho. \n\nDisclosures: Dr. Hung is a consultant for Johnson & Johnson, Quantgene, and Mimic Technologies.\n\nSupplemental Material - SupplementalTables1-6_FOCUS21-210.pdf
", "abstract": "Objective: Virtual reality (VR) and augmented reality (AR) systems are increasingly available to neurosurgeons. These systems may provide opportunities for technical rehearsal and assessments of surgeon performance. The assessment of neurosurgeon skill in VR and AR environments and the validity of VR and AR feedback has not been systematically reviewed. \n\nMethods: A systematic review following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines was conducted through MEDLINE and PubMed. Studies published in English between January 1990 and February 2021 describing the use of VR or AR to quantify surgical technical performance of neurosurgeons without the use of human raters were included. The types and categories of automated performance metrics (APMs) from each of these studies were recorded. \n\nResults: Thirty-three VR studies were included in the review; no AR studies met inclusion criteria. VR APMs were categorized as either distance to target, force, kinematics, time, blood loss, or volume of resection. Distance and time were the most well-studied APM domains, although all domains were effective at differentiating surgeon experience levels. Distance was successfully used to track improvements with practice. Examining volume of resection demonstrated that attending surgeons removed less simulated tumor but preserved more normal tissue than trainees. More recently, APMs have been used in machine learning algorithms to predict level of training with a high degree of accuracy. Key limitations to enhanced-reality systems include limited AR usage for automated surgical assessment and lack of external and longitudinal validation of VR systems. \n\nConclusions: VR has been used to assess surgeon performance across a wide spectrum of domains. The VR environment can be used to quantify surgeon performance, assess surgeon proficiency, and track training progression. 
AR systems have not yet been used to provide metrics for surgeon performance assessment despite potential for intraoperative integration. VR-based APMs may be especially useful for metrics that are difficult to assess intraoperatively, including blood loss and extent of resection.", "date": "2021-08", "date_type": "published", "publication": "Neurosurgical Focus", "volume": "51", "number": "2", "publisher": "American Association of Neurological Surgeons", "pagerange": "Art. No. E15", "id_number": "CaltechAUTHORS:20210910-182725636", "issn": "1092-0684", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20210910-182725636", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "doi": "10.3171/2021.5.focus21210", "primary_object": { "basename": "SupplementalTables1-6_FOCUS21-210.pdf", "url": "https://authors.library.caltech.edu/records/03nhs-3kv35/files/SupplementalTables1-6_FOCUS21-210.pdf" }, "resource_type": "article", "pub_year": "2021", "author_list": "Chan, Justin; Pangal, Dhiraj J.; et al." 
}, { "id": "https://authors.library.caltech.edu/records/na91v-x7h45", "eprint_id": 110645, "eprint_status": "archive", "datestamp": "2023-08-20 04:02:06", "lastmod": "2023-10-23 19:46:58", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Liu-Bo", "name": { "family": "Liu", "given": "Bo" } }, { "id": "Liu-Qiang", "name": { "family": "Liu", "given": "Qiang" } }, { "id": "Stone-Peter", "name": { "family": "Stone", "given": "Peter" } }, { "id": "Garg-Animesh", "name": { "family": "Garg", "given": "Animesh" }, "orcid": "0000-0003-0482-4296" }, { "id": "Zhu-Yuke", "name": { "family": "Zhu", "given": "Yuke" }, "orcid": "0000-0002-9198-2227" }, { "id": "Anandkumar-A", "name": { "family": "Anandkumar", "given": "Animashree" } } ] }, "title": "Coach-Player Multi-agent Reinforcement Learning for Dynamic Team Composition", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2021 The authors.\n\nPublished - liu21m.pdf
Accepted Version - 2105.08692.pdf
Supplemental Material - liu21m-supp.pdf
", "abstract": "In real-world multi-agent systems, agents with different capabilities may join or leave without altering the team's overarching goals. Coordinating teams with such dynamic composition is challenging: the optimal team strategy varies with the composition. We propose COPA, a coach-player framework to tackle this problem. We assume the coach has a global view of the environment and coordinates the players, who only have partial views, by distributing individual strategies. Specifically, we 1) adopt the attention mechanism for both the coach and the players; 2) propose a variational objective to regularize learning; and 3) design an adaptive communication method to let the coach decide when to communicate with the players. We validate our methods on a resource collection task, a rescue game, and the StarCraft micromanagement tasks. We demonstrate zero-shot generalization to new team compositions. Our method achieves comparable or better performance than the setting where all players have a full view of the environment. 
Moreover, we see that the performance remains high even when the coach communicates as little as 13% of the time using the adaptive communication strategy.", "date": "2021-07", "date_type": "published", "publication": "Proceedings of Machine Learning Research", "volume": "139", "publisher": "ML Research Press", "pagerange": "6860-6870", "id_number": "CaltechAUTHORS:20210831-203857558", "issn": "2640-3498", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20210831-203857558", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "doi": "10.48550/arXiv.2105.08692", "primary_object": { "basename": "2105.08692.pdf", "url": "https://authors.library.caltech.edu/records/na91v-x7h45/files/2105.08692.pdf" }, "related_objects": [ { "basename": "liu21m-supp.pdf", "url": "https://authors.library.caltech.edu/records/na91v-x7h45/files/liu21m-supp.pdf" }, { "basename": "liu21m.pdf", "url": "https://authors.library.caltech.edu/records/na91v-x7h45/files/liu21m.pdf" } ], "resource_type": "article", "pub_year": "2021", "author_list": "Liu, Bo; Liu, Qiang; et al." }, { "id": "https://authors.library.caltech.edu/records/qydjs-btm85", "eprint_id": 109038, "eprint_status": "archive", "datestamp": "2023-08-20 03:57:15", "lastmod": "2023-10-23 17:31:38", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Chang-Nadine", "name": { "family": "Chang", "given": "Nadine" } }, { "id": "Yu-Zhiding", "name": { "family": "Yu", "given": "Zhiding" } }, { "id": "Wang-Yu-Xiong", "name": { "family": "Wang", "given": "Yu-Xiong" } }, { "id": "Anandkumar-A", "name": { "family": "Anandkumar", "given": "Animashree" } }, { "id": "Fidler-Sanja", "name": { "family": "Fidler", "given": "Sanja" } }, { "id": "Alvarez-Jose-M", "name": { "family": "Alvarez", "given": "Jose M." } } ] }, "title": "Image-Level or Object-Level? 
A Tale of Two Resampling Strategies for Long-Tailed Detection", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2021 by the author(s). \n\nWe would like to sincerely thank Achal Dave, Kenneth Marino, Senthil Purushwalkam and other NVIDIA colleagues for the discussion and constructive suggestions.\n\nPublished - chang21c.pdf
Submitted - 2104.05702.pdf
", "abstract": "Training on datasets with long-tailed distributions has been challenging for major recognition tasks such as classification and detection. To deal with this challenge, image resampling is typically introduced as a simple but effective approach. However, we observe that long-tailed detection differs from classification since multiple classes may be present in one image. As a result, image resampling alone is not enough to yield a sufficiently balanced distribution at the object-level. We address object-level resampling by introducing an object-centric sampling strategy based on a dynamic, episodic memory bank. Our proposed strategy has two benefits: 1) convenient object-level resampling without significant extra computation, and 2) implicit feature-level augmentation from model updates. We show that image-level and object-level resamplings are both important, and thus unify them with a joint resampling strategy. Our method achieves state-of-the-art performance on the rare categories of LVIS, with 1.89% and 3.13% relative improvements over Forest R-CNN on detection and instance segmentation.", "date": "2021-07", "date_type": "published", "publication": "Proceedings of Machine Learning Research", "volume": "139", "publisher": "ML Research Press", "pagerange": "1463-1472", "id_number": "CaltechAUTHORS:20210510-134322482", "issn": "2640-3498", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20210510-134322482", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "doi": "10.48550/arXiv.2104.05702", "primary_object": { "basename": "2104.05702.pdf", "url": "https://authors.library.caltech.edu/records/qydjs-btm85/files/2104.05702.pdf" }, "related_objects": [ { "basename": "chang21c.pdf", "url": "https://authors.library.caltech.edu/records/qydjs-btm85/files/chang21c.pdf" } ], "resource_type": "article", "pub_year": "2021", "author_list": "Chang, Nadine; Yu, Zhiding; et al." 
}, { "id": "https://authors.library.caltech.edu/records/jb40p-2w034", "eprint_id": 110651, "eprint_status": "archive", "datestamp": "2023-08-20 04:02:22", "lastmod": "2023-10-23 19:47:12", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Fan-Linxi-Jim", "name": { "family": "Fan", "given": "Linxi" }, "orcid": "0000-0001-7393-3125" }, { "id": "Wang-Guanzhi", "name": { "family": "Wang", "given": "Guanzhi" } }, { "id": "Huang-De-An", "name": { "family": "Huang", "given": "De-An" }, "orcid": "0000-0002-6945-7768" }, { "id": "Yu-Zhiding", "name": { "family": "Yu", "given": "Zhiding" }, "orcid": "0000-0003-1776-996X" }, { "id": "Fei-Fei-Li", "name": { "family": "Fei-Fei", "given": "Li" }, "orcid": "0000-0002-7481-0810" }, { "id": "Zhu-Yuke", "name": { "family": "Zhu", "given": "Yuke" }, "orcid": "0000-0002-9198-2227" }, { "id": "Anandkumar-A", "name": { "family": "Anandkumar", "given": "Anima" }, "orcid": "0000-0002-6974-6797" } ] }, "title": "SECANT: Self-Expert Cloning for Zero-Shot Generalization of Visual Policies", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2021 by the author(s). \n\nWe are extremely grateful to Chris Choy, Jean Kossaifi, Shikun Liu, Zhiyuan \"Jerry\" Lin, Josiah Wong, Huaizu Jiang, Guanya Shi, Jacob Austin, Ismail Elezi, Ajay Mandlekar, Fei Xia, Agrim Gupta, Shyamal Buch, and many other colleagues for their helpful feedback and insightful discussions.\n\nPublished - fan21c.pdf
Accepted Version - 2106.09678.pdf
", "abstract": "Generalization has been a long-standing challenge for reinforcement learning (RL). Visual RL, in particular, can be easily distracted by irrelevant factors in high-dimensional observation space. In this work, we consider robust policy learning which targets zero-shot generalization to unseen visual environments with large distributional shift. We propose SECANT, a novel self-expert cloning technique that leverages image augmentation in two stages to *decouple* robust representation learning from policy optimization. Specifically, an expert policy is first trained by RL from scratch with weak augmentations. A student network then learns to mimic the expert policy by supervised learning with strong augmentations, making its representation more robust against visual variations compared to the expert. Extensive experiments demonstrate that SECANT significantly advances the state of the art in zero-shot generalization across 4 challenging domains. Our average reward improvements over prior SOTAs are: DeepMind Control (+26.5%), robotic manipulation (+337.8%), vision-based autonomous driving (+47.7%), and indoor object navigation (+15.8%). 
Code release and video are available at https://linxifan.github.io/secant-site/.", "date": "2021-07", "date_type": "published", "publication": "Proceedings of Machine Learning Research", "volume": "139", "publisher": "ML Research Press", "pagerange": "3088-3099", "id_number": "CaltechAUTHORS:20210831-203918113", "issn": "2640-3498", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20210831-203918113", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "doi": "10.48550/arXiv.2106.09678", "primary_object": { "basename": "2106.09678.pdf", "url": "https://authors.library.caltech.edu/records/jb40p-2w034/files/2106.09678.pdf" }, "related_objects": [ { "basename": "fan21c.pdf", "url": "https://authors.library.caltech.edu/records/jb40p-2w034/files/fan21c.pdf" } ], "resource_type": "article", "pub_year": "2021", "author_list": "Fan, Linxi; Wang, Guanzhi; et al." }, { "id": "https://authors.library.caltech.edu/records/6a12y-8yq46", "eprint_id": 110647, "eprint_status": "archive", "datestamp": "2023-08-20 04:02:14", "lastmod": "2023-10-23 19:47:04", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Mahajan-Anuj", "name": { "family": "Mahajan", "given": "Anuj" } }, { "id": "Samvelyan-Mikayel", "name": { "family": "Samvelyan", "given": "Mikayel" } }, { "id": "Mao-Lei", "name": { "family": "Mao", "given": "Lei" } }, { "id": "Makoviychuk-Viktor", "name": { "family": "Makoviychuk", "given": "Viktor" } }, { "id": "Garg-Animesh", "name": { "family": "Garg", "given": "Animesh" }, "orcid": "0000-0003-0482-4296" }, { "id": "Kossaifi-Jean", "name": { "family": "Kossaifi", "given": "Jean" }, "orcid": "0000-0002-4445-3429" }, { "id": "Whiteson-Shimon", "name": { "family": "Whiteson", "given": "Shimon" } }, { "id": "Zhu-Yuke", "name": { "family": "Zhu", "given": "Yuke" }, "orcid": "0000-0002-9198-2227" }, { "id": "Anandkumar-A", "name": { "family": "Anandkumar", "given": "Animashree" 
} } ] }, "title": "Tesseract: Tensorised Actors for Multi-Agent Reinforcement Learning", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2021 The authors. \n\nAM is funded by the J.P. Morgan A.I. fellowship. Part of this work was done during AM's internship at NVIDIA. This project has received funding from the European Research Council under the European Union's Horizon 2020 research and innovation programme (grant agreement number 637713). The experiments were made possible by generous equipment grant from NVIDIA.\n\nPublished - mahajan21a.pdf
Accepted Version - 2106.00136.pdf
Supplemental Material - mahajan21a-supp.pdf
", "abstract": "Reinforcement Learning in large action spaces is a challenging problem. This is especially true for cooperative multi-agent reinforcement learning (MARL), which often requires tractable learning while respecting various constraints like communication budget and information about other agents. In this work, we focus on the fundamental hurdle affecting both value-based and policy-gradient approaches: an exponential blowup of the action space with the number of agents. For value-based methods, it poses challenges in accurately representing the optimal value function for value-based methods, thus inducing suboptimality. For policy gradient methods, it renders the critic ineffective and exacerbates the problem of the lagging critic. We show that from a learning theory perspective, both problems can be addressed by accurately representing the associated action-value function with a low-complexity hypothesis class. This requires accurately modelling the agent interactions in a sample efficient way. To this end, we propose a novel tensorised formulation of the Bellman equation. This gives rise to our method Tesseract, which utilises the view of Q-function seen as a tensor where the modes correspond to action spaces of different agents. Algorithms derived from Tesseract decompose the Q-tensor across the agents and utilise low-rank tensor approximations to model the agent interactions relevant to the task. We provide PAC analysis for Tesseract based algorithms and highlight their relevance to the class of rich observation MDPs. 
Empirical results in different domains confirm the gains in sample efficiency using Tesseract as supported by the theory.", "date": "2021-07", "date_type": "published", "publication": "Proceedings of Machine Learning Research", "volume": "139", "publisher": "ML Research Press", "pagerange": "7301-7312", "id_number": "CaltechAUTHORS:20210831-203904421", "issn": "2640-3498", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20210831-203904421", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "J.P. Morgan A.I. fellowship" }, { "agency": "European Research Council (ERC)", "grant_number": "637713" }, { "agency": "NVIDIA Corporation" } ] }, "doi": "10.48550/arXiv.2106.00136", "primary_object": { "basename": "2106.00136.pdf", "url": "https://authors.library.caltech.edu/records/6a12y-8yq46/files/2106.00136.pdf" }, "related_objects": [ { "basename": "mahajan21a-supp.pdf", "url": "https://authors.library.caltech.edu/records/6a12y-8yq46/files/mahajan21a-supp.pdf" }, { "basename": "mahajan21a.pdf", "url": "https://authors.library.caltech.edu/records/6a12y-8yq46/files/mahajan21a.pdf" } ], "resource_type": "article", "pub_year": "2021", "author_list": "Mahajan, Anuj; Samvelyan, Mikayel; et al." 
}, { "id": "https://authors.library.caltech.edu/records/rszc7-7g943", "eprint_id": 110023, "eprint_status": "archive", "datestamp": "2023-08-20 03:32:16", "lastmod": "2023-10-23 18:15:30", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Lale-Sahin", "name": { "family": "Lale", "given": "Sahin" }, "orcid": "0000-0002-7191-346X" }, { "id": "Azizzadenesheli-Kamyar", "name": { "family": "Azizzadenesheli", "given": "Kamyar" }, "orcid": "0000-0001-8507-1868" }, { "id": "Hassibi-B", "name": { "family": "Hassibi", "given": "Babak" }, "orcid": "0000-0002-1375-5838" }, { "id": "Anandkumar-A", "name": { "family": "Anandkumar", "given": "Animashree" }, "orcid": "0000-0002-6974-6797" } ] }, "title": "Finite-time System Identification and Adaptive Control in Autoregressive Exogenous Systems", "ispublished": "pub", "full_text_status": "public", "keywords": "ARX systems, system identification, adaptive control, regret", "note": "\u00a9 2021 S. Lale, K. Azizzadenesheli, B. Hassibi & A. Anandkumar.\n\nPublished - lale21b.pdf
", "abstract": "Autoregressive exogenous (ARX) systems are the general class of input-output dynamical system used for modeling stochastic linear dynamical system (LDS) including partially observable LDS such as LQG systems. In this work, we study the problem of system identification and adaptive control of unknown ARX systems. We provide finite-time learning guarantees for the ARX systems under both open-loop and closed-loop data collection. Using these guarantees, we design adaptive control algorithms for unknown ARX systems with arbitrary strongly convex or non-strongly convex quadratic regulating costs. Under strongly convex cost functions, we design an adaptive control algorithm based on online gradient descent to design and update the controllers that are constructed via a convex controller reparametrization. We show that our algorithm has \u00d5(\u221aT) regret via explore and commit approach and if the model estimates are updated in epochs using closed-loop data collection, it attains the optimal regret of polylog(T) after T time-steps of interaction. For the case of non-strongly convex quadratic cost functions, we propose an adaptive control algorithm that deploys the optimism in the face of uncertainty principle to design the controller. 
In this setting, we show that the explore and commit approach has a regret upper bound of \u00d5(\u221aT^(2/3)), and the adaptive control with continuous model estimate updates attains \u00d5(\u221aT) regret after T time-steps.", "date": "2021-06", "date_type": "published", "publication": "Proceedings of Machine Learning Research", "volume": "144", "publisher": "PMLR", "pagerange": "967-979", "id_number": "CaltechAUTHORS:20210727-162630002", "issn": "2640-3498", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20210727-162630002", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "primary_object": { "basename": "lale21b.pdf", "url": "https://authors.library.caltech.edu/records/rszc7-7g943/files/lale21b.pdf" }, "resource_type": "article", "pub_year": "2021", "author_list": "Lale, Sahin; Azizzadenesheli, Kamyar; et al." }, { "id": "https://authors.library.caltech.edu/records/0nxef-bvv19", "eprint_id": 110027, "eprint_status": "archive", "datestamp": "2023-08-20 03:32:21", "lastmod": "2023-10-23 18:15:38", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Yu-Jing", "name": { "family": "Yu", "given": "Jing" } }, { "id": "Gehring-Clement", "name": { "family": "Gehring", "given": "Clement" } }, { "id": "Sch\u00e4fer-Florian", "name": { "family": "Sch\u00e4fer", "given": "Florian" } }, { "id": "Anandkumar-A", "name": { "family": "Anandkumar", "given": "Animashree" } } ] }, "title": "Robust Reinforcement Learning: A Constrained Game-theoretic Approach", "ispublished": "pub", "full_text_status": "public", "keywords": "robust reinforcement learning, zero-sum game, adversarial training, competitive optimization, policy gradient", "note": "\u00a9 2021 J. Yu, C. Gehring, F. Sch\u00e4fer & A. Anandkumar. \n\nWe thank the anonymous referees for their valuable feedback. 
CG gratefully acknowledges support from NSF grant 1723381; from AFOSR grant FA9550-17-1-0165; from ONR grant N00014-18-1-2847 and from the MIT-IBM Watson Lab. FS gratefully acknowledges support by the Air Force Office of Scientific Research under award number FA9550-18-1-0271 (Games for Computation and Learning) and the Ronald and Maxine Linde Institute of Economic and Management Sciences at Caltech. AA is supported in part by the Bren endowed chair, Microsoft, Google, Facebook and Adobe faculty fellowships.\n\nPublished - yu21a.pdf
", "abstract": "Deep reinforcement learning (RL) methods provide state-of-art performance in complex control tasks. However, it has been widely recognized that RL methods often fail to generalize due to unaccounted uncertainties. In this work, we propose a game theoretic framework for robust reinforcement learning that comprises many previous works as special cases. We formulate robust RL as a constrained minimax game between the RL agent and an environmental agent which represents uncertainties such as model parameter variations and adversarial disturbances. To solve the competitive optimization problems arising in our framework, we propose to use competitive mirror descent (CMD). This method accounts for the interactive nature of the game at each iteration while using Bregman divergences to adapt to the global structure of the constraint set. We demonstrate an RRL policy gradient algorithm that leverages Lagrangian duality and CMD. We empirically show that our algorithm is stable for large step sizes, resulting in faster convergence on linear quadratic games.", "date": "2021-06", "date_type": "published", "publication": "Proceedings of Machine Learning Research", "volume": "144", "publisher": "PMLR", "pagerange": "1242-1254", "id_number": "CaltechAUTHORS:20210727-172214672", "issn": "2640-3498", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20210727-172214672", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NSF", "grant_number": "IIS-1723381" }, { "agency": "Air Force Office of Scientific Research (AFOSR)", "grant_number": "FA9550-17-1-0165" }, { "agency": "Office of Naval Research (ONR)", "grant_number": "N00014-18-1-2847" }, { "agency": "Massachusetts Institute of Technology (MIT)" }, { "agency": "Air Force Office of Scientific Research (AFOSR)", "grant_number": "FA9550-18-1-0271" }, { "agency": "Linde Institute of Economic and Management 
Science" }, { "agency": "Bren Professor of Computing and Mathematical Sciences" }, { "agency": "Microsoft" }, { "agency": "Google" }, { "agency": "Facebook" }, { "agency": "Adobe" } ] }, "primary_object": { "basename": "yu21a.pdf", "url": "https://authors.library.caltech.edu/records/0nxef-bvv19/files/yu21a.pdf" }, "resource_type": "article", "pub_year": "2021", "author_list": "Yu, Jing; Gehring, Clement; et al." }, { "id": "https://authors.library.caltech.edu/records/12k1k-p2p46", "eprint_id": 108207, "eprint_status": "archive", "datestamp": "2023-08-20 03:25:44", "lastmod": "2023-10-23 16:32:17", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Lale-Sahin", "name": { "family": "Lale", "given": "Sahin" }, "orcid": "0000-0002-7191-346X" }, { "id": "Teke-Oguzhan", "name": { "family": "Teke", "given": "Oguzhan" }, "orcid": "0000-0002-1131-5206" }, { "id": "Hassibi-B", "name": { "family": "Hassibi", "given": "Babak" }, "orcid": "0000-0002-1375-5838" }, { "id": "Anandkumar-A", "name": { "family": "Anandkumar", "given": "Anima" }, "orcid": "0000-0002-6974-6797" } ] }, "title": "Stability and Identification of Random Asynchronous Linear Time-Invariant Systems", "ispublished": "pub", "full_text_status": "public", "keywords": "Random asynchronous systems, linear systems, stability, system identification", "note": "\u00a9 2021 S. Lale, O. Teke, B. Hassibi & A. Anandkumar.\n\nPublished - lale21a.pdf
Submitted - 2012.04160.pdf
", "abstract": "In many computational tasks and dynamical systems, asynchrony and randomization are naturally present and have been considered as ways to increase the speed and reduce the cost of computation while compromising the accuracy and convergence rate. In this work, we show the additional benefits of randomization and asynchrony on the stability of linear dynamical systems. We introduce a natural model for random asynchronous linear time-invariant (LTI) systems which generalizes the standard (synchronous) LTI systems. In this model, each state variable is updated randomly and asynchronously with some probability according to the underlying system dynamics. We examine how the mean-square stability of random asynchronous LTI systems vary with respect to randomization and asynchrony. Surprisingly, we show that the stability of random asynchronous LTI systems does not imply or is not implied by the stability of the synchronous variant of the system and an unstable synchronous system can be stabilized via randomization and/or asynchrony. We further study a special case of the introduced model, namely randomized LTI systems, where each state element is updated randomly with some fixed but unknown probability. We consider the problem of system identification of unknown randomized LTI systems using the precise characterization of mean-square stability via extended Lyapunov equation. For unknown randomized LTI systems, we propose a systematic identification method to recover the underlying dynamics. Given a single input/output trajectory, our method estimates the model parameters that govern the system dynamics, the update probability of state variables, and the noise covariance using the correlation matrices of collected data and the extended Lyapunov equation. 
Finally, we empirically demonstrate that the proposed method consistently recovers the underlying system dynamics with optimal rate.", "date": "2021-06", "date_type": "published", "publication": "Proceedings of Machine Learning Research", "volume": "144", "publisher": "PMLR", "pagerange": "651-663", "id_number": "CaltechAUTHORS:20210225-132728423", "issn": "2640-3498", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20210225-132728423", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "doi": "10.48550/arXiv.2012.04160", "primary_object": { "basename": "2012.04160.pdf", "url": "https://authors.library.caltech.edu/records/12k1k-p2p46/files/2012.04160.pdf" }, "related_objects": [ { "basename": "lale21a.pdf", "url": "https://authors.library.caltech.edu/records/12k1k-p2p46/files/lale21a.pdf" } ], "resource_type": "article", "pub_year": "2021", "author_list": "Lale, Sahin; Teke, Oguzhan; et al." }, { "id": "https://authors.library.caltech.edu/records/bz7g6-w7j65", "eprint_id": 109025, "eprint_status": "archive", "datestamp": "2023-08-20 03:26:56", "lastmod": "2023-10-23 17:31:06", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Qu-Guannan", "name": { "family": "Qu", "given": "Guannan" }, "orcid": "0000-0002-5466-3550" }, { "id": "Shi-Yuanyuan", "name": { "family": "Shi", "given": "Yuanyuan" }, "orcid": "0000-0002-6182-7664" }, { "id": "Lale-Sahin", "name": { "family": "Lale", "given": "Sahin" }, "orcid": "0000-0002-7191-346X" }, { "id": "Anandkumar-A", "name": { "family": "Anandkumar", "given": "Animashree" }, "orcid": "0000-0002-6974-6797" }, { "id": "Wierman-A", "name": { "family": "Wierman", "given": "Adam" }, "orcid": "0000-0002-5923-0199" } ] }, "title": "Stable Online Control of Linear Time-Varying Systems", "ispublished": "pub", "full_text_status": "public", "keywords": "Time-varying systems, online linear quadratic control, stability guarantee", "note": 
"\u00a9 2021 G. Qu, Y. Shi, S. Lale, A. Anandkumar & A. Wierman.\n\nPublished - qu21a.pdf
Submitted - 2104.14134.pdf
", "abstract": "Linear time-varying (LTV) systems are widely used for modeling real-world dynamical systems due to their generality and simplicity. Providing stability guarantees for LTV systems is one of the central problems in control theory. However, existing approaches that guarantee stability typically lead to significantly sub-optimal cumulative control cost in online settings where only current or short-term system information is available. In this work, we propose an efficient online control algorithm, COvariance Constrained Online Linear Quadratic (COCO-LQ) control, that guarantees input-to-state stability for a large class of LTV systems while also minimizing the control cost. The proposed method incorporates a state covariance constraint into the semi-definite programming (SDP) formulation of the LQ optimal controller. We empirically demonstrate the performance of COCO-LQ in both synthetic experiments and a power system frequency control example.", "date": "2021-06", "date_type": "published", "publication": "Proceedings of Machine Learning Research", "volume": "144", "publisher": "PMLR", "pagerange": "742-753", "id_number": "CaltechAUTHORS:20210510-092451106", "issn": "2640-3498", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20210510-092451106", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "doi": "10.48550/arXiv.2104.14134", "primary_object": { "basename": "2104.14134.pdf", "url": "https://authors.library.caltech.edu/records/bz7g6-w7j65/files/2104.14134.pdf" }, "related_objects": [ { "basename": "qu21a.pdf", "url": "https://authors.library.caltech.edu/records/bz7g6-w7j65/files/qu21a.pdf" } ], "resource_type": "article", "pub_year": "2021", "author_list": "Qu, Guannan; Shi, Yuanyuan; et al." 
}, { "id": "https://authors.library.caltech.edu/records/kasz1-0dp07", "eprint_id": 105591, "eprint_status": "archive", "datestamp": "2023-08-20 02:54:36", "lastmod": "2023-10-20 22:09:03", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Luongo-Francisco-J", "name": { "family": "Luongo", "given": "Francisco" } }, { "id": "Hakim-Ryan", "name": { "family": "Hakim", "given": "Ryan" } }, { "id": "Nguyen-Jessica-H", "name": { "family": "Nguyen", "given": "Jessica H." }, "orcid": "0000-0003-0454-8463" }, { "id": "Anandkumar-A", "name": { "family": "Anandkumar", "given": "Animashree" }, "orcid": "0000-0002-6974-6797" }, { "id": "Hung-Andrew-J", "name": { "family": "Hung", "given": "Andrew J." }, "orcid": "0000-0002-7201-6736" } ] }, "title": "Deep learning-based computer vision to recognize and classify suturing gestures in robot-assisted surgery", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2020 Elsevier Inc. \n\nAccepted 6 August 2020, Available online 26 September 2020. \n\nWe would like to acknowledge Jian Chen, Shubham Bhatia, Kartik Aron, and Vijay Damerla for procedure segmentation and suturing gesture labeling. \n\nConflict of interest/Disclosure: Andrew J. Hung has financial disclosures with Quantgene, Inc (consultant), Mimic Technologies, Inc (consultant), and Johnson & Johnson (consultant). \n\nThis study is supported in part by the National Institute of Biomedical Imaging and Bioengineering of the National Institutes of Health under award number K23EB026493.\n\nAccepted Version - nihms-1625729.pdf
Submitted - 2008.11833.pdf
", "abstract": "Background: Our previous work classified a taxonomy of needle driving gestures during a vesicourethral anastomosis of robotic radical prostatectomy in association with tissue tears and patient outcomes. Herein, we train deep learning-based computer vision to automate the identification and classification of suturing gestures for needle driving attempts. \n\nMethods: Two independent raters manually annotated live suturing video clips to label timepoints and gestures. Identification (2,395 videos) and classification (511 videos) datasets were compiled to train computer vision models to produce 2- and 5-class label predictions, respectively. Networks were trained on inputs of raw red/blue/green pixels as well as optical flow for each frame. We explore the effect of different recurrent models (long short-term memory versus convolutional long short-term memory). All models were trained on 80/20 train/test splits. \n\nResults: We observe that all models are able to reliably predict either the presence of a gesture (identification, area under the curve: 0.88) as well as the type of gesture (classification, area under the curve: 0.87) at significantly above chance levels. For both gesture identification and classification datasets, we observed no effect of recurrent classification model choice on performance. \n\nConclusion: Our results demonstrate computer vision's ability to recognize features that not only can identify the action of suturing but also distinguish between different classifications of suturing gestures. 
This demonstrates the potential to utilize deep learning computer vision toward future automation of surgical skill assessment.", "date": "2021-05", "date_type": "published", "publication": "Surgery", "volume": "169", "number": "5", "publisher": "Elsevier", "pagerange": "1240-1244", "id_number": "CaltechAUTHORS:20200928-140721280", "issn": "0039-6060", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20200928-140721280", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NIH", "grant_number": "K23EB026493" } ] }, "doi": "10.1016/j.surg.2020.08.016", "pmcid": "PMC7994208", "primary_object": { "basename": "2008.11833.pdf", "url": "https://authors.library.caltech.edu/records/kasz1-0dp07/files/2008.11833.pdf" }, "related_objects": [ { "basename": "nihms-1625729.pdf", "url": "https://authors.library.caltech.edu/records/kasz1-0dp07/files/nihms-1625729.pdf" } ], "resource_type": "article", "pub_year": "2021", "author_list": "Luongo, Francisco; Hakim, Ryan; et al." }, { "id": "https://authors.library.caltech.edu/records/gaad6-9qt19", "eprint_id": 109489, "eprint_status": "archive", "datestamp": "2023-08-20 03:02:57", "lastmod": "2023-10-23 18:00:38", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Panagakis-Yannis", "name": { "family": "Panagakis", "given": "Yannis" }, "orcid": "0000-0003-0153-5210" }, { "id": "Kossaifi-Jean", "name": { "family": "Kossaifi", "given": "Jean" }, "orcid": "0000-0002-4445-3429" }, { "id": "Chrysos-Grigorios-G", "name": { "family": "Chrysos", "given": "Grigorios G." }, "orcid": "0000-0002-0650-1856" }, { "id": "Oldfield-James", "name": { "family": "Oldfield", "given": "James" }, "orcid": "0000-0002-7000-5179" }, { "id": "Nicolaou-Mihalis-A", "name": { "family": "Nicolaou", "given": "Mihalis A." 
}, "orcid": "0000-0001-9175-477X" }, { "id": "Anandkumar-A", "name": { "family": "Anandkumar", "given": "Anima" } }, { "id": "Zafeiriou-Stefanos", "name": { "family": "Zafeiriou", "given": "Stefanos" }, "orcid": "0000-0002-5222-1740" } ] }, "title": "Tensor Methods in Computer Vision and Deep Learning", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2021 IEEE. \n\nManuscript received August 8, 2020; revised December 23, 2020 and March 10, 2021; accepted April 12, 2021. Date of current version April 30, 2021. \n\nThe work of Stefanos Zafeiriou was supported in part by the Engineering and Physical Sciences Research Council (EPSRC) Fellowship DEFORM: Large Scale Shape Analysis of Deformable Models of Humans under Grant EP/S010203/1.\n\nAccepted Version - 2107.03436.pdf
", "abstract": "Tensors, or multidimensional arrays, are data structures that can naturally represent visual data of multiple dimensions. Inherently able to efficiently capture structured, latent semantic spaces and high-order interactions, tensors have a long history of applications in a wide span of computer vision problems. With the advent of the deep learning paradigm shift in computer vision, tensors have become even more fundamental. Indeed, essential ingredients in modern deep learning architectures, such as convolutions and attention mechanisms, can readily be considered as tensor mappings. In effect, tensor methods are increasingly finding significant applications in deep learning, including the design of memory and compute efficient network architectures, improving robustness to random noise and adversarial attacks, and aiding the theoretical understanding of deep networks. This article provides an in-depth and practical review of tensors and tensor methods in the context of representation learning and deep learning, with a particular focus on visual data analysis and computer vision applications. Concretely, besides fundamental work in tensor-based visual data analysis methods, we focus on recent developments that have brought on a gradual increase in tensor methods, especially in deep learning architectures and their implications in computer vision applications. 
To further enable the newcomer to grasp such concepts quickly, we provide companion Python notebooks, covering key aspects of this article and implementing them, step-by-step with TensorLy.", "date": "2021-05", "date_type": "published", "publication": "Proceedings of the IEEE", "volume": "109", "number": "5", "publisher": "IEEE", "pagerange": "863-890", "id_number": "CaltechAUTHORS:20210611-152119929", "issn": "0018-9219", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20210611-152119929", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "Engineering and Physical Sciences Research Council (EPSRC)", "grant_number": "EP/S010203/1" } ] }, "doi": "10.1109/jproc.2021.3074329", "primary_object": { "basename": "2107.03436.pdf", "url": "https://authors.library.caltech.edu/records/gaad6-9qt19/files/2107.03436.pdf" }, "resource_type": "article", "pub_year": "2021", "author_list": "Panagakis, Yannis; Kossaifi, Jean; et al." }, { "id": "https://authors.library.caltech.edu/records/p1h5b-5rx70", "eprint_id": 108161, "eprint_status": "archive", "datestamp": "2023-08-20 02:39:36", "lastmod": "2023-10-20 23:07:59", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Kashinath-Karthik", "name": { "family": "Kashinath", "given": "K." }, "orcid": "0000-0002-9311-5215" }, { "id": "Mustafa-M", "name": { "family": "Mustafa", "given": "M." } }, { "id": "Albert-A", "name": { "family": "Albert", "given": "A." } }, { "id": "Wu-J-L", "name": { "family": "Wu", "given": "J-L." } }, { "id": "Jiang-C", "name": { "family": "Jiang", "given": "C." } }, { "id": "Esmaeilzadeh-Soheil", "name": { "family": "Esmaeilzadeh", "given": "S." }, "orcid": "0000-0001-6122-9122" }, { "id": "Azizzadenesheli-Kamyar", "name": { "family": "Azizzadenesheli", "given": "K." }, "orcid": "0000-0001-8507-1868" }, { "id": "Wang-R", "name": { "family": "Wang", "given": "R." 
} }, { "id": "Chattopadhyay-A", "name": { "family": "Chattopadhyay", "given": "A." } }, { "id": "Singh-A", "name": { "family": "Singh", "given": "A." } }, { "id": "Manepalli-A", "name": { "family": "Manepalli", "given": "A." } }, { "id": "Chirila-Dragos", "name": { "family": "Chirila", "given": "D." }, "orcid": "0000-0002-6394-4688" }, { "id": "Yu-R", "name": { "family": "Yu", "given": "R." } }, { "id": "Walters-R", "name": { "family": "Walters", "given": "R." } }, { "id": "White-Brian", "name": { "family": "White", "given": "B." }, "orcid": "0000-0002-3739-9604" }, { "id": "Xiao-H", "name": { "family": "Xiao", "given": "H." } }, { "id": "Tchelepi-Hamdi-A", "name": { "family": "Tchelepi", "given": "H. A." }, "orcid": "0000-0002-3084-6635" }, { "id": "Marcus-P", "name": { "family": "Marcus", "given": "P." } }, { "id": "Anandkumar-A", "name": { "family": "Anandkumar", "given": "A." } }, { "id": "Hassanzadeh-Pedram", "name": { "family": "Hassanzadeh", "given": "P." }, "orcid": "0000-0001-9425-8085" } ] }, "title": "Physics-informed machine learning: case studies for weather and climate modelling", "ispublished": "pub", "full_text_status": "restricted", "keywords": "neural networks, physical constraints, turbulent flows, physics-informed machine learning, weather and climate modeling", "note": "\u00a9 2021 The Author(s). Published by the Royal Society. \n\nManuscript accepted 24/11/2020; Published online 15/02/2021;\nPublished in print 05/04/2021. \n\nThis article is part of the theme issue 'Machine learning for weather and climate modelling'. 
\n\nData accessibility: Data, code and supporting materials are publicly available via the following links: https://github.com/jinlong83/statistical-constrained-GANS; https://github.com/maxjiang93/space_time_pde; https://github.com/Rose-STL-Lab/Turbulent-Flow-Net; https://github.com/Rui1521/Equivariant-Neural-Nets; https://github.com/ashesh6810/Deep-Spatial-Transformers; https://resources.marine.copernicus.eu/?option=com_csw&view=details&product_id=GLOBAL_ANALYSIS_FORECAST_PHY_001_024; https://portal.edirepository.org/nis/mapbrowse?packageid=edi.200.6; https://doi.org/10.6073/pasta/8f19c5d19d816857e55077ba20570265; https://prism.oregonstate.edu/; https://github.com/arkadaw9/PGA_LSTM; https://lter.limnology.wisc.edu/data; https://gitlab.com/mspritch/spcam3.0-neural-net; https://doi.org/10.5281/zenodo.2559313. \n\nAuthors' contributions: K.K. conceived the idea and designed the structure of the manuscript, wrote the manuscript, and responded to reviewer comments. K.K., M.M., and A.A. led the majority of the research reviewed as case studies in this article. The rest of the authors contributed to the research reviewed as case studies or provided feedback on sections of the manuscript. K.K. dedicates this work to A.A., a colleague and dear friend, who unfortunately was killed in a hit-and-run road accident while he was biking, during the course of preparation of this manuscript. \n\nWe declare we have no competing interests. \n\nNo funding has been received for this article.", "abstract": "Machine learning (ML) provides novel and powerful ways of accurately and efficiently recognizing complex patterns, emulating nonlinear dynamics, and predicting the spatio-temporal evolution of weather and climate processes. Off-the-shelf ML models, however, do not necessarily obey the fundamental governing laws of physical systems, nor do they generalize well to scenarios on which they have not been trained. 
We survey systematic approaches to incorporating physics and domain knowledge into ML models and distill these approaches into broad categories. Through 10 case studies, we show how these approaches have been used successfully for emulating, downscaling, and forecasting weather and climate processes. The accomplishments of these studies include greater physical consistency, reduced training time, improved data efficiency, and better generalization. Finally, we synthesize the lessons learned and identify scientific, diagnostic, computational, and resource challenges for developing truly robust and reliable physics-informed ML models for weather and climate processes.", "date": "2021-04-05", "date_type": "published", "publication": "Philosophical Transactions A: Mathematical, Physical and Engineering Sciences", "volume": "379", "number": "2194", "publisher": "Royal Society of London", "pagerange": "Art. No. 20200093", "id_number": "CaltechAUTHORS:20210223-154127043", "issn": "1364-503X", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20210223-154127043", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "doi": "10.1098/rsta.2020.0093", "resource_type": "article", "pub_year": "2021", "author_list": "Kashinath, K.; Mustafa, M.; et al." 
}, { "id": "https://authors.library.caltech.edu/records/vat05-w9c33", "eprint_id": 106577, "eprint_status": "archive", "datestamp": "2023-08-20 02:28:47", "lastmod": "2023-10-20 23:36:10", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Zhao-Eric", "name": { "family": "Zhao", "given": "Eric" }, "orcid": "0000-0002-9595-0150" }, { "id": "Liu-Anqi", "name": { "family": "Liu", "given": "Anqi" } }, { "id": "Anandkumar-A", "name": { "family": "Anandkumar", "given": "Animashree" } }, { "id": "Yue-Yisong", "name": { "family": "Yue", "given": "Yisong" }, "orcid": "0000-0001-9127-1989" } ] }, "title": "Active Learning under Label Shift", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2021 by the author(s). \n\nAnqi Liu is supported by the PIMCO Postdoctoral Fellowship. Prof. Anandkumar is supported by Bren endowed Chair, faculty awards from Microsoft, Google, and Adobe, Beyond Limits, and LwLL grants. This work is also supported by funding from Raytheon and NASA TRISH.\n\nPublished - zhao21b.pdf
Submitted - 2007.08479.pdf
Supplemental Material - zhao21b-supp.pdf
", "abstract": "We address the problem of active learning under label shift: when the class proportions of source and target domains differ. We introduce a \"medial distribution\" to incorporate a tradeoff between importance weighting and class-balanced sampling and propose their combined usage in active learning. Our method is known as Mediated Active Learning under Label Shift (MALLS). It balances the bias from class-balanced sampling and the variance from importance weighting. We prove sample complexity and generalization guarantees for MALLS which show active learning reduces asymptotic sample complexity even under arbitrary label shift. We empirically demonstrate MALLS scales to high-dimensional datasets and can reduce the sample complexity of active learning by 60% in deep active learning tasks.", "date": "2021-04", "date_type": "published", "publication": "Proceedings of Machine Learning Research", "volume": "130", "publisher": "PMLR", "pagerange": "3412-3420", "id_number": "CaltechAUTHORS:20201110-074357009", "issn": "2640-3498", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20201110-074357009", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "PIMCO Postdoctoral Fellowship" }, { "agency": "Bren Professor of Computing and Mathematical Sciences" }, { "agency": "Microsoft Faculty Fellowship" }, { "agency": "Google Faculty Research Award" }, { "agency": "Adobe" }, { "agency": "Learning with Less Labels (LwLL)" }, { "agency": "Raytheon Company" }, { "agency": "NASA" } ] }, "doi": "10.48550/arXiv.2007.08479", "primary_object": { "basename": "2007.08479.pdf", "url": "https://authors.library.caltech.edu/records/vat05-w9c33/files/2007.08479.pdf" }, "related_objects": [ { "basename": "zhao21b-supp.pdf", "url": "https://authors.library.caltech.edu/records/vat05-w9c33/files/zhao21b-supp.pdf" }, { "basename": "zhao21b.pdf", "url": 
"https://authors.library.caltech.edu/records/vat05-w9c33/files/zhao21b.pdf" } ], "resource_type": "article", "pub_year": "2021", "author_list": "Zhao, Eric; Liu, Anqi; et al." }, { "id": "https://authors.library.caltech.edu/records/m40dq-4s262", "eprint_id": 103472, "eprint_status": "archive", "datestamp": "2023-08-20 02:27:33", "lastmod": "2023-10-20 16:23:53", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Nakka-Yashwanth-K", "name": { "family": "Nakka", "given": "Yashwanth Kumar" }, "orcid": "0000-0001-7897-3644" }, { "id": "Liu-Anqi", "name": { "family": "Liu", "given": "Anqi" } }, { "id": "Shi-Guanya", "name": { "family": "Shi", "given": "Guanya" }, "orcid": "0000-0002-9075-3705" }, { "id": "Anandkumar-A", "name": { "family": "Anandkumar", "given": "Anima" } }, { "id": "Yue-Yisong", "name": { "family": "Yue", "given": "Yisong" }, "orcid": "0000-0001-9127-1989" }, { "id": "Chung-Soon-Jo", "name": { "family": "Chung", "given": "Soon-Jo" }, "orcid": "0000-0002-6657-3907" } ] }, "title": "Chance-Constrained Trajectory Optimization for Safe Exploration and Learning of Nonlinear Systems", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2020 IEEE. \n\nManuscript received May 8, 2020; accepted October 1, 2020. Date of publication December 10, 2020; date of current version December 28, 2020. \n\nThis letter was recommended for publication by Associate Editor L. Tapia and Editor N. Amato upon evaluation of the reviewers' comments. This work was supported by the Jet Propulsion Laboratory, Caltech and the Raytheon Company. The work of Anqi Liu was supported by a PIMCO Postdoctoral Fellowship. \n\nWe acknowledge the contribution of Irene S. Crowell in\nimplementing Info-SNOC.\n\nSubmitted - 2005.04374.pdf
", "abstract": "Learning-based control algorithms require data collection with abundant supervision for training. Safe exploration algorithms ensure the safety of this data collection process even when only partial knowledge is available. We present a new approach for optimal motion planning with safe exploration that integrates chance-constrained stochastic optimal control with dynamics learning and feedback control. We derive an iterative convex optimization algorithm that solves an Information-cost Stochastic Nonlinear Optimal Control problem (Info-SNOC). The optimization objective encodes control cost for performance and exploration cost for learning, and the safety is incorporated as distributionally robust chance constraints. The dynamics are predicted from a robust regression model that is learned from data. The Info-SNOC algorithm is used to compute a sub-optimal pool of safe motion plans that aid in exploration for learning unknown residual dynamics under safety constraints. A stable feedback controller is used to execute the motion plan and collect data for model learning. We prove the safety of rollout from our exploration method and reduction in uncertainty over epochs, thereby guaranteeing the consistency of our learning method. We validate the effectiveness of Info-SNOC by designing and implementing a pool of safe trajectories for a planar robot. 
We demonstrate that our approach has higher success rate in ensuring safety when compared to a deterministic trajectory optimization approach.", "date": "2021-04", "date_type": "published", "publication": "IEEE Robotics and Automation Letters", "volume": "6", "number": "2", "publisher": "IEEE", "pagerange": "389-396", "id_number": "CaltechAUTHORS:20200526-150616242", "issn": "2377-3766", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20200526-150616242", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "JPL/Caltech" }, { "agency": "Raytheon Company" }, { "agency": "PIMCO" } ] }, "local_group": { "items": [ { "id": "GALCIT" }, { "id": "Center-for-Autonomous-Systems-and-Technologies-(CAST)" } ] }, "doi": "10.1109/LRA.2020.3044033", "primary_object": { "basename": "2005.04374.pdf", "url": "https://authors.library.caltech.edu/records/m40dq-4s262/files/2005.04374.pdf" }, "resource_type": "article", "pub_year": "2021", "author_list": "Nakka, Yashwanth Kumar; Liu, Anqi; et al." }, { "id": "https://authors.library.caltech.edu/records/355ty-9zz43", "eprint_id": 102676, "eprint_status": "archive", "datestamp": "2023-08-19 23:33:17", "lastmod": "2023-10-20 00:24:03", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Chu-Linda-C", "name": { "family": "Chu", "given": "Linda C." } }, { "id": "Anandkumar-A", "name": { "family": "Anandkumar", "given": "Animashree" }, "orcid": "0000-0002-6974-6797" }, { "id": "Shin-Hoo-Chang", "name": { "family": "Shin", "given": "Hoo Chang" } }, { "id": "Fishman-Elliot-K", "name": { "family": "Fishman", "given": "Elliot K." } } ] }, "title": "The Potential Dangers of Artificial Intelligence for Radiology and Radiologists", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2020 American College of Radiology. \n\nAvailable online 17 April 2020. 
\n\nThe authors thank senior science editor Edmund Weisberg, MS, MBE, for his editorial assistance. The authors state that they have no conflict of interest related to the material discussed in this article.\n\nPublished - Chu_2020p1309.pdf
", "abstract": "With the advent of artificial intelligence (AI) across many fields and subspecialties, there are considerable expectations for transformative impact. However, there are also concerns regarding the potential abuse of AI. Many scientists have been worried about the dangers of AI leading to \"biased\" conclusions, in part because of the enthusiasm of the inventor or overenthusiasm among the general public. Here, though, we consider some scenarios in which people may intend to cause potential errors within data sets of analyzed information, resulting in incorrect conclusions and leading to potential problems with patient care and outcomes.", "date": "2020-10", "date_type": "published", "publication": "Journal of the American College of Radiology", "volume": "17", "number": "10", "publisher": "Elsevier", "pagerange": "1309-1311", "id_number": "CaltechAUTHORS:20200420-154710784", "issn": "1546-1440", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20200420-154710784", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "doi": "10.1016/j.jacr.2020.04.010", "pmcid": "PMC7164850", "primary_object": { "basename": "Chu_2020p1309.pdf", "url": "https://authors.library.caltech.edu/records/355ty-9zz43/files/Chu_2020p1309.pdf" }, "resource_type": "article", "pub_year": "2020", "author_list": "Chu, Linda C.; Anandkumar, Animashree; et al." 
}, { "id": "https://authors.library.caltech.edu/records/4m70t-56j02", "eprint_id": 104991, "eprint_status": "archive", "datestamp": "2023-08-19 23:30:13", "lastmod": "2023-10-20 21:02:06", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Qiao-Zhuoran", "name": { "family": "Qiao", "given": "Zhuoran" }, "orcid": "0000-0002-5704-7331" }, { "id": "Welborn-Matthew-G", "name": { "family": "Welborn", "given": "Matthew" }, "orcid": "0000-0001-8659-6535" }, { "id": "Anandkumar-A", "name": { "family": "Anandkumar", "given": "Animashree" }, "orcid": "0000-0002-6974-6797" }, { "id": "Manby-Frederick-R", "name": { "family": "Manby", "given": "Frederick R." }, "orcid": "0000-0001-7611-714X" }, { "id": "Miller-T-F-III", "name": { "family": "Miller", "given": "Thomas F., III" }, "orcid": "0000-0002-1882-5380" } ] }, "title": "OrbNet: Deep learning for quantum chemistry using symmetry-adapted atomic-orbital features", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2020 Published under license by AIP Publishing. \n\nSubmitted: 16 July 2020; Accepted: 7 September 2020; Published Online: 25 September 2020. \n\nThe authors thank Lixue Sherry Cheng for providing geometries for the DrugBank-T dataset and Anders Christensen for helpful comments on the manuscript. Z.Q. acknowledges the graduate research funding from Caltech. T.F.M. and A.A. acknowledge partial support from the Caltech DeLogi fund, and A.A. acknowledges support from a Caltech Bren professorship.\n\nPublished - 5.0021955.pdf
Submitted - 2007.08026.pdf
Supplemental Material - drugbank-t_geometries.zip
Supplemental Material - splits.zip
", "abstract": "We introduce a machine learning method in which energy solutions from the Schr\u00f6dinger equation are predicted using symmetry adapted atomic orbital features and a graph neural-network architecture. OrbNet is shown to outperform existing methods in terms of learning efficiency and transferability for the prediction of density functional theory results while employing low-cost features that are obtained from semi-empirical electronic structure calculations. For applications to datasets of drug-like molecules, including QM7b-T, QM9, GDB-13-T, DrugBank, and the conformer benchmark dataset of Folmsbee and Hutchison [Int. J. Quantum Chem. (published online) (2020)], OrbNet predicts energies within chemical accuracy of density functional theory at a computational cost that is 1000-fold or more reduced.", "date": "2020-09-28", "date_type": "published", "publication": "Journal of Chemical Physics", "volume": "153", "number": "12", "publisher": "American Institute of Physics", "pagerange": "Art. No. 
124111", "id_number": "CaltechAUTHORS:20200818-095759329", "issn": "0021-9606", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20200818-095759329", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "Caltech De Logi Fund" }, { "agency": "Bren Professor of Computing and Mathematical Sciences" } ] }, "doi": "10.1063/5.0021955", "primary_object": { "basename": "2007.08026.pdf", "url": "https://authors.library.caltech.edu/records/4m70t-56j02/files/2007.08026.pdf" }, "related_objects": [ { "basename": "5.0021955.pdf", "url": "https://authors.library.caltech.edu/records/4m70t-56j02/files/5.0021955.pdf" }, { "basename": "drugbank-t_geometries.zip", "url": "https://authors.library.caltech.edu/records/4m70t-56j02/files/drugbank-t_geometries.zip" }, { "basename": "splits.zip", "url": "https://authors.library.caltech.edu/records/4m70t-56j02/files/splits.zip" } ], "resource_type": "article", "pub_year": "2020", "author_list": "Qiao, Zhuoran; Welborn, Matthew; et al." }, { "id": "https://authors.library.caltech.edu/records/3gm5w-asa82", "eprint_id": 106483, "eprint_status": "archive", "datestamp": "2023-08-19 22:45:12", "lastmod": "2023-10-20 23:32:06", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Ren-Hongyu", "name": { "family": "Ren", "given": "Hongyu" } }, { "id": "Zhu-Yuke", "name": { "family": "Zhu", "given": "Yuke" }, "orcid": "0000-0002-9198-2227" }, { "id": "Leskovec-J", "name": { "family": "Leskovec", "given": "Jure" }, "orcid": "0000-0002-5411-923X" }, { "id": "Anandkumar-A", "name": { "family": "Anandkumar", "given": "Anima" } }, { "id": "Garg-Animesh", "name": { "family": "Garg", "given": "Animesh" }, "orcid": "0000-0003-0482-4296" } ] }, "title": "OCEAN: Online Task Inference for Compositional Tasks with Context Adaptation", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 The authors and PMLR 2020. 
\nA.G. is a CIFAR AI chair and also acknowledges Vector Institute for computing support. J. L. is a Chan Zuckerberg Biohub investigator. We gratefully acknowledge the support of DARPA under Nos. FA865018C7880 (ASED), N660011924033 (MCS); ARO under Nos. W911NF-16-1-0342 (MURI), W911NF-16-1-0171 (DURIP); NSF under Nos. OAC-1835598 (CINES), OAC-1934578 (HDR), CCF-1918940 (Expeditions), IIS-2030477 (RAPID); Stanford Data Science Initiative, Wu Tsai Neurosciences Institute, Chan Zuckerberg Biohub, Amazon, Boeing, Chase, Docomo, Hitachi, Huawei, JD.com, NVIDIA, Dell. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views, policies, or endorsements, either expressed or implied, of DARPA, NIH, ARO, or the U.S. Government.\n\nPublished - ren20a.pdf
Accepted Version - 2008.07087.pdf
Supplemental Material - ren20a-supp.pdf
", "abstract": "Real-world tasks often exhibit a compositional structure that contains a sequence of simpler sub-tasks. For instance, opening a door requires reaching, grasping, rotating, and pulling the door knob. Such compositional tasks require an agent to reason about the sub-task at hand while orchestrating global behavior accordingly. This can be cast as an online task inference problem, where the current task identity, represented by a context variable, is estimated from the agent's past experiences with probabilistic inference. Previous approaches have employed simple latent distributions, e.g., Gaussian, to model a single context for the entire task. However, this formulation lacks the expressiveness to capture the composition and transition of the sub-tasks. We propose a variational inference framework OCEAN to perform online task inference for compositional tasks. OCEAN models global and local context variables in a joint latent space, where the global variables represent a mixture of sub-tasks required for the task, while the local variables capture the transitions between the sub-tasks. Our framework supports flexible latent distributions based on prior knowledge of the task structure and can be trained in an unsupervised manner. 
Experimental results show that OCEAN provides more effective task inference with sequential context adaptation and thus leads to a performance boost on complex, multi-stage tasks.", "date": "2020-08", "date_type": "published", "publication": "Proceedings of Machine Learning Research", "volume": "124", "publisher": "PMLR", "pagerange": "1378-1387", "id_number": "CaltechAUTHORS:20201106-120151731", "issn": "2640-3498", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20201106-120151731", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "Canadian Institute for Advanced Research (CIFAR)" }, { "agency": "Chan-Zuckerberg Biohub" }, { "agency": "Defense Advanced Research Projects Agency (DARPA)", "grant_number": "FA865018C7880" }, { "agency": "Defense Advanced Research Projects Agency (DARPA)", "grant_number": "N660011924033" }, { "agency": "Army Research Office (ARO)", "grant_number": "W911NF-16-1-0342" }, { "agency": "Army Research Office (ARO)", "grant_number": "W911NF-16-1-0171" }, { "agency": "NSF", "grant_number": "OAC-1835598" }, { "agency": "NSF", "grant_number": "OAC-1934578" }, { "agency": "NSF", "grant_number": "CCF-1918940" }, { "agency": "NSF", "grant_number": "IIS-2030477" }, { "agency": "Stanford University" }, { "agency": "Wu Tsai Neurosciences Institute" }, { "agency": "Amazon" }, { "agency": "Boeing Corporation" }, { "agency": "Chase Manhattan Bank" }, { "agency": "Docomo" }, { "agency": "Hitachi" }, { "agency": "Huawei" }, { "agency": "JD.com" }, { "agency": "NVIDIA Corporation" }, { "agency": "Dell Inc." 
} ] }, "doi": "10.48550/arXiv.2008.07087", "primary_object": { "basename": "ren20a-supp.pdf", "url": "https://authors.library.caltech.edu/records/3gm5w-asa82/files/ren20a-supp.pdf" }, "related_objects": [ { "basename": "ren20a.pdf", "url": "https://authors.library.caltech.edu/records/3gm5w-asa82/files/ren20a.pdf" }, { "basename": "2008.07087.pdf", "url": "https://authors.library.caltech.edu/records/3gm5w-asa82/files/2008.07087.pdf" } ], "resource_type": "article", "pub_year": "2020", "author_list": "Ren, Hongyu; Zhu, Yuke; et al." }, { "id": "https://authors.library.caltech.edu/records/rztks-hw818", "eprint_id": 94168, "eprint_status": "archive", "datestamp": "2023-08-19 22:28:55", "lastmod": "2023-10-20 17:44:56", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Kossaifi-J", "name": { "family": "Kossaifi", "given": "Jean" } }, { "id": "Lipton-Z-C", "name": { "family": "Lipton", "given": "Zachary C." } }, { "id": "Kolbeinsson-A", "name": { "family": "Kolbeinsson", "given": "Arinbj\u00f6rn" } }, { "id": "Khanna-A", "name": { "family": "Khanna", "given": "Aran" } }, { "id": "Furlanello-T", "name": { "family": "Furlanello", "given": "Tommaso" } }, { "id": "Anandkumar-A", "name": { "family": "Anandkumar", "given": "Anima" } } ] }, "title": "Tensor Regression Networks", "ispublished": "pub", "full_text_status": "public", "keywords": "Machine Learning, Tensor Methods, Tensor Regression Networks, Low-Rank Regression, Tensor Regression Layers, Deep Learning, Tensor Contraction", "note": "\u00a9 2020 Jean Kossaifi, Zachary C. Lipton, Arinbj\u00f6rn Kolbeinsson, Aran Khanna, Tommaso Furlanello and Anima Anandkumar. License: CC-BY 4.0, see https://creativecommons.org/licenses/by/4.0/. Attribution requirements are provided at http://jmlr.org/papers/v21/18-503.html. \n\nSubmitted 7/18; Published 7/20. \n\nThis research has been conducted using the UK Biobank Resource under Application Number 18545. 
The authors would like to thank the editor and anonymous reviewers for the constructive feedback which helped improve this manuscript.\n\nPublished - 18-503.pdf
Submitted - 1707.08308.pdf
", "abstract": "Convolutional neural networks typically consist of many convolutional layers followed by one or more fully connected layers. While convolutional layers map between high-order activation tensors, the fully connected layers operate on flattened activation vectors. Despite empirical success, this approach has notable drawbacks. Flattening followed by fully connected layers discards multilinear structure in the activations and requires many parameters. We address these problems by incorporating tensor algebraic operations that preserve multilinear structure at every layer. First, we introduce Tensor Contraction Layers (TCLs) that reduce the dimensionality of their input while preserving their multilinear structure using tensor contraction. Next, we introduce Tensor Regression Layers (TRLs), which express outputs through a low-rank multilinear mapping from a high-order activation tensor to an output tensor of arbitrary order. We learn the contraction and regression factors end-to-end, and produce accurate nets with fewer parameters. Additionally, our layers regularize networks by imposing low-rank constraints on the activations (TCL) and regression weights (TRL). Experiments on ImageNet show that, applied to VGG and ResNet architectures, TCLs and TRLs reduce the number of parameters compared to fully connected layers by more than 65% while maintaining or increasing accuracy. In addition to the space savings, our approach's ability to leverage topological structure can be crucial for structured data such as MRI. 
In particular, we demonstrate significant performance improvements over comparable architectures on three tasks associated with the UK Biobank dataset.", "date": "2020-07-20", "date_type": "published", "publication": "Journal of Machine Learning Research", "volume": "21", "publisher": "Journal of Machine Learning Research", "pagerange": "1-21", "id_number": "CaltechAUTHORS:20190327-085728859", "issn": "1533-7928", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20190327-085728859", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "doi": "10.48550/arXiv.1707.08308", "primary_object": { "basename": "1707.08308.pdf", "url": "https://authors.library.caltech.edu/records/rztks-hw818/files/1707.08308.pdf" }, "related_objects": [ { "basename": "18-503.pdf", "url": "https://authors.library.caltech.edu/records/rztks-hw818/files/18-503.pdf" } ], "resource_type": "article", "pub_year": "2020", "author_list": "Kossaifi, Jean; Lipton, Zachary C.; et al." }, { "id": "https://authors.library.caltech.edu/records/y0yqt-30w39", "eprint_id": 106487, "eprint_status": "archive", "datestamp": "2023-08-19 22:24:16", "lastmod": "2023-10-20 23:32:24", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Chen-Wuyang", "name": { "family": "Chen", "given": "Wuyang" } }, { "id": "Yu-Zhiding", "name": { "family": "Yu", "given": "Zhiding" } }, { "id": "Wang-Zhangyang", "name": { "family": "Wang", "given": "Zhangyang" }, "orcid": "0000-0002-2050-5693" }, { "id": "Anandkumar-A", "name": { "family": "Anandkumar", "given": "Anima" } } ] }, "title": "Automated Synthetic-to-Real Generalization", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2020 by the author(s). \n\nWork done during internship at NVIDIA. 
We appreciate the\ncomputing power supported by NVIDIA GPU infrastructure.\nWe also thank for the discussion and suggestions from four\nanonymous reviewers and the help from Yang Zou for the\ndomain adaptation experiments. The research of Z. Wang\nwas partially supported by NSF Award RI-1755701.\n\nPublished - chen20x.pdf
Accepted Version - 2007.06965.pdf
", "abstract": "Models trained on synthetic images often face degraded generalization to real data. As a convention, these models are often initialized with ImageNet pretrained representation. Yet the role of ImageNet knowledge is seldom discussed despite common practices that leverage this knowledge to maintain the generalization ability. An example is the careful hand-tuning of early stopping and layer-wise learning rates, which is shown to improve synthetic-to-real generalization but is also laborious and heuristic. In this work, we explicitly encourage the synthetically trained model to maintain similar representations with the ImageNet pretrained model, and propose a learning-to-optimize (L2O) strategy to automate the selection of layer-wise learning rates. We demonstrate that the proposed framework can significantly improve the synthetic-to-real generalization performance without seeing and training on real data, while also benefiting downstream tasks such as domain adaptation. Code is available at: https://github.com/NVlabs/ASG.", "date": "2020-07-14", "date_type": "published", "publication": "Proceedings of Machine Learning Research", "volume": "119", "publisher": "ML Research Press", "pagerange": "1746-1756", "id_number": "CaltechAUTHORS:20201106-120205331", "issn": "2640-3498", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20201106-120205331", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NSF", "grant_number": "IIS-1755701" } ] }, "doi": "10.48550/arXiv.2007.06965", "primary_object": { "basename": "chen20x.pdf", "url": "https://authors.library.caltech.edu/records/y0yqt-30w39/files/chen20x.pdf" }, "related_objects": [ { "basename": "2007.06965.pdf", "url": "https://authors.library.caltech.edu/records/y0yqt-30w39/files/2007.06965.pdf" } ], "resource_type": "article", "pub_year": "2020", "author_list": "Chen, Wuyang; Yu, Zhiding; et al." 
}, { "id": "https://authors.library.caltech.edu/records/8k3ah-wcs70", "eprint_id": 100577, "eprint_status": "archive", "datestamp": "2023-08-19 22:02:13", "lastmod": "2023-10-18 21:37:06", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Chen-Beidi", "name": { "family": "Chen", "given": "Beidi" } }, { "id": "Liu-Weiyang", "name": { "family": "Liu", "given": "Weiyang" } }, { "id": "Yu-Zhiding", "name": { "family": "Yu", "given": "Zhiding" } }, { "id": "Kautz-Jan", "name": { "family": "Kautz", "given": "Jan" } }, { "id": "Shrivastava-Anshumali", "name": { "family": "Shrivastava", "given": "Anshumali" } }, { "id": "Garg-Animesh", "name": { "family": "Garg", "given": "Animesh" }, "orcid": "0000-0003-0482-4296" }, { "id": "Anandkumar-A", "name": { "family": "Anandkumar", "given": "Anima" } } ] }, "title": "Angular Visual Hardness", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2020 by the author(s). \n\nWork done during internship at NVIDIA. We would like to thank Shiyu Liang, Yue Zhu and Yang Zou for the valuable discussions that enlighten our research. We are also grateful to the anonymous reviewers for their constructive comments that significantly helped to improve our paper. Weiyang Liu is partially supported by Baidu scholarship and NVIDIA GPU grant. This work was supported by NSF-1652131, NSF-BIGDATA 1838177, AFOSR-YIPFA9550-18-1-0152, Amazon Research Award, and ONR BRC grant for Randomized Numerical Linear Algebra.\n\nPublished - chen20n.pdf
Submitted - 1912.02279.pdf
Supplemental Material - chen20n-supp.pdf
", "abstract": "Recent convolutional neural networks (CNNs) have led to impressive performance but often suffer from poor calibration. They tend to be overconfident, with the model confidence not always reflecting the underlying true ambiguity and hardness. In this paper, we propose angular visual hardness (AVH), a score given by the normalized angular distance between the sample feature embedding and the target classifier to measure sample hardness. We validate this score with an in-depth and extensive scientific study, and observe that CNN models with the highest accuracy also have the best AVH scores. This agrees with an earlier finding that state-of-art models improve on the classification of harder examples. We observe that the training dynamics of AVH is vastly different compared to the training loss. Specifically, AVH quickly reaches a plateau for all samples even though the training loss keeps improving. This suggests the need for designing better loss functions that can target harder examples more effectively. We also find that AVH has a statistically significant correlation with human visual hardness. 
Finally, we demonstrate the benefit of AVH to a variety of applications such as self-training for domain adaptation and domain generalization.", "date": "2020-07", "date_type": "published", "publication": "Proceedings of Machine Learning Research", "volume": "119", "publisher": "ML Research Press", "pagerange": "1637-1648", "id_number": "CaltechAUTHORS:20200109-084932688", "issn": "2640-3498", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20200109-084932688", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "Baidu Scholarship" }, { "agency": "NVIDIA Corporation" }, { "agency": "NSF", "grant_number": "IIS-1652131" }, { "agency": "NSF", "grant_number": "IIS-1838177" }, { "agency": "Air Force Office of Scientific Research (AFOSR)", "grant_number": "FA9550-18-1-0152" }, { "agency": "Amazon Research Award" }, { "agency": "Office of Naval Research (ONR)" } ] }, "doi": "10.48550/arXiv.1912.02279", "primary_object": { "basename": "1912.02279.pdf", "url": "https://authors.library.caltech.edu/records/8k3ah-wcs70/files/1912.02279.pdf" }, "related_objects": [ { "basename": "chen20n-supp.pdf", "url": "https://authors.library.caltech.edu/records/8k3ah-wcs70/files/chen20n-supp.pdf" }, { "basename": "chen20n.pdf", "url": "https://authors.library.caltech.edu/records/8k3ah-wcs70/files/chen20n.pdf" } ], "resource_type": "article", "pub_year": "2020", "author_list": "Chen, Beidi; Liu, Weiyang; et al." }, { "id": "https://authors.library.caltech.edu/records/0tppf-v5s28", "eprint_id": 98453, "eprint_status": "archive", "datestamp": "2023-08-22 03:44:51", "lastmod": "2023-10-18 17:22:50", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Ross-Z-E", "name": { "family": "Ross", "given": "Zachary E." }, "orcid": "0000-0002-6343-8400" }, { "id": "Trugman-Daniel-T", "name": { "family": "Trugman", "given": "Daniel T." 
}, "orcid": "0000-0002-9296-4223" }, { "id": "Azizzadenesheli-Kamyar", "name": { "family": "Azizzadenesheli", "given": "Kamyar" }, "orcid": "0000-0001-8507-1868" }, { "id": "Anandkumar-A", "name": { "family": "Anandkumar", "given": "Anima" }, "orcid": "0000-0002-6974-6797" } ] }, "title": "Directivity Modes of Earthquake Populations with Unsupervised Learning", "ispublished": "pub", "full_text_status": "public", "keywords": "rupture directivity; earthquake source properties; machine learning; unsupervised learning", "note": "\u00a9 2020 American Geophysical Union. \n\nReceived 1 JUL 2019; Accepted 31 JAN 2020; Accepted article online 5 FEB 2020. \n\nWe thank Val\u00e8re Lambert for helpful discussions. The waveform and catalog data used in this study are publicly available from the Southern California Earthquake Data Center (scedc.caltech.edu) and the Northern California Earthquake Data Center (ncedc.org). D. Trugman acknowledges institutional support from the Laboratory Directed Research and Development (LDRD) program of Los Alamos National Laboratory under project number 20180700PRD1. A. Anandkumar is supported in part by Bren endowed chair, Darpa PAI, Raytheon, and Microsoft, Google, and Adobe faculty fellowships. K. Azizzadenesheli is supported in part by NSF Career Award CCF\u20101254106 and AFOSR YIPFA9550\u201015\u20101\u20100221.\n\nPublished - 2019JB018299.pdf
Submitted - 1907.00496.pdf
Supplemental Material - jgrb54024-sup-0001-2019jb018299-text_si-s01.pdf
Supplemental Material - jgrb54024-sup-0002-2019jb018299-data_set_si-s01.txt
Supplemental Material - jgrb54024-sup-0003-2019jb018299-data_set_si-s02.txt
Supplemental Material - jgrb54024-sup-0004-2019jb018299-data_set_si-s03.txt
Supplemental Material - jgrb54024-sup-0005-2019jb018299-data_set_si-s04.txt
Supplemental Material - jgrb54024-sup-0006-2019jb018299-data_set_si-s05.txt
Supplemental Material - jgrb54024-sup-0007-2019jb018299-data_set_si-s06.txt
Supplemental Material - jgrb54024-sup-0008-2019jb018299-data_set_si-s07.txt
Supplemental Material - jgrb54024-sup-0009-2019jb018299-data_set_si-s08.txt
", "abstract": "We present a novel approach for resolving modes of rupture directivity in large populations of earthquakes. A seismic spectral decomposition technique is used to first produce relative measurements of radiated energy for earthquakes in a spatially compact cluster. The azimuthal distribution of energy for each earthquake is then assumed to result from one of several distinct modes of rupture propagation. Rather than fitting a kinematic rupture model to determine the most likely mode of rupture propagation, we instead treat the modes as latent variables and learn them with a Gaussian mixture model. The mixture model simultaneously determines the number of events that best identify with each mode. The technique is demonstrated on four datasets in California, each with compact clusters of several thousand earthquakes with comparable slip mechanisms. We show that the datasets naturally decompose into distinct rupture propagation modes that correspond to different rupture directions, and the fault plane is unambiguously identified for all cases. We find that these small earthquakes exhibit unilateral ruptures 63\u201373% of the time on average. The results provide important observational constraints on the physics of earthquakes and faults.", "date": "2020-02", "date_type": "published", "publication": "Journal of Geophysical Research. Solid Earth", "volume": "125", "number": "2", "publisher": "American Geophysical Union", "pagerange": "Art. No. 
e2019JB018299", "id_number": "CaltechAUTHORS:20190905-154247884", "issn": "2169-9313", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20190905-154247884", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "Los Alamos National Laboratory", "grant_number": "20180700PRD1" }, { "agency": "Bren Professor of Computing and Mathematical Sciences" }, { "agency": "Defense Advanced Research Projects Agency (DARPA)" }, { "agency": "Raytheon" }, { "agency": "Microsoft Faculty Fellowship" }, { "agency": "Google Faculty Research Award" }, { "agency": "Adobe" }, { "agency": "NSF", "grant_number": "CCF\u20101254106" }, { "agency": "Air Force Office of Scientific Research (AFOSR)", "grant_number": "YIPFA9550\u201015\u20101\u20100221" } ] }, "local_group": { "items": [ { "id": "Seismological-Laboratory" }, { "id": "Division-of-Geological-and-Planetary-Sciences" } ] }, "doi": "10.1029/2019JB018299", "primary_object": { "basename": "jgrb54024-sup-0001-2019jb018299-text_si-s01.pdf", "url": "https://authors.library.caltech.edu/records/0tppf-v5s28/files/jgrb54024-sup-0001-2019jb018299-text_si-s01.pdf" }, "related_objects": [ { "basename": "jgrb54024-sup-0002-2019jb018299-data_set_si-s01.txt", "url": "https://authors.library.caltech.edu/records/0tppf-v5s28/files/jgrb54024-sup-0002-2019jb018299-data_set_si-s01.txt" }, { "basename": "jgrb54024-sup-0003-2019jb018299-data_set_si-s02.txt", "url": "https://authors.library.caltech.edu/records/0tppf-v5s28/files/jgrb54024-sup-0003-2019jb018299-data_set_si-s02.txt" }, { "basename": "jgrb54024-sup-0004-2019jb018299-data_set_si-s03.txt", "url": "https://authors.library.caltech.edu/records/0tppf-v5s28/files/jgrb54024-sup-0004-2019jb018299-data_set_si-s03.txt" }, { "basename": "jgrb54024-sup-0005-2019jb018299-data_set_si-s04.txt", "url": 
"https://authors.library.caltech.edu/records/0tppf-v5s28/files/jgrb54024-sup-0005-2019jb018299-data_set_si-s04.txt" }, { "basename": "jgrb54024-sup-0009-2019jb018299-data_set_si-s08.txt", "url": "https://authors.library.caltech.edu/records/0tppf-v5s28/files/jgrb54024-sup-0009-2019jb018299-data_set_si-s08.txt" }, { "basename": "1907.00496.pdf", "url": "https://authors.library.caltech.edu/records/0tppf-v5s28/files/1907.00496.pdf" }, { "basename": "2019JB018299.pdf", "url": "https://authors.library.caltech.edu/records/0tppf-v5s28/files/2019JB018299.pdf" }, { "basename": "jgrb54024-sup-0006-2019jb018299-data_set_si-s05.txt", "url": "https://authors.library.caltech.edu/records/0tppf-v5s28/files/jgrb54024-sup-0006-2019jb018299-data_set_si-s05.txt" }, { "basename": "jgrb54024-sup-0007-2019jb018299-data_set_si-s06.txt", "url": "https://authors.library.caltech.edu/records/0tppf-v5s28/files/jgrb54024-sup-0007-2019jb018299-data_set_si-s06.txt" }, { "basename": "jgrb54024-sup-0008-2019jb018299-data_set_si-s07.txt", "url": "https://authors.library.caltech.edu/records/0tppf-v5s28/files/jgrb54024-sup-0008-2019jb018299-data_set_si-s07.txt" } ], "resource_type": "article", "pub_year": "2020", "author_list": "Ross, Zachary E.; Trugman, Daniel T.; et al." }, { "id": "https://authors.library.caltech.edu/records/j16vt-na095", "eprint_id": 103456, "eprint_status": "archive", "datestamp": "2023-08-19 18:49:27", "lastmod": "2023-10-20 16:22:39", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Janzamin-M", "name": { "family": "Janzamin", "given": "Majid" } }, { "id": "Ge-Rong", "name": { "family": "Ge", "given": "Rong" } }, { "id": "Kossaifi-J", "name": { "family": "Kossaifi", "given": "Jean" } }, { "id": "Anandkumar-A", "name": { "family": "Anandkumar", "given": "Anima" } } ] }, "title": "Spectral Learning on Matrices and Tensors", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2019 M. Janzamin, R. Ge, J. Kossaifi and A. Anandkumar. 
\n\nPublication Date: 28 Nov 2019. \n\nThe authors are grateful to anonymous reviewers for valuable comments that have significantly improved the manuscript.\n\nAccepted Version - 2004.07984.pdf
", "abstract": "Spectral methods have been the mainstay in several domains such as machine learning, applied mathematics and scientific computing. They involve finding a certain kind of spectral decomposition to obtain basis functions that can capture important structures or directions for the problem at hand. The most common spectral method is the principal component analysis (PCA). It utilizes the principal components or the top eigenvectors of the data covariance matrix to carry out dimensionality reduction as one of its applications. This data pre-processing step is often effective in separating signal from noise. PCA and other spectral techniques applied to matrices have several limitations. By limiting to only pairwise moments, they are effectively making a Gaussian approximation on the underlying data. Hence, they fail on data with hidden variables which lead to non-Gaussianity. However, in almost any data set, there are latent effects that cannot be directly observed, e.g., topics in a document corpus, or underlying causes of a disease. By extending the spectral decomposition methods to higher order moments, we demonstrate the ability to learn a wide range of latent variable models efficiently. Higher-order moments can be represented by tensors, and intuitively, they can encode more information than just pairwise moment matrices. More crucially, tensor decomposition can pick up latent effects that are missed by matrix methods. For instance, tensor decomposition can uniquely identify non-orthogonal components. Exploiting these aspects turns out to be fruitful for provable unsupervised learning of a wide range of latent variable models. We also outline the computational techniques to design efficient tensor decomposition methods. They are embarrassingly parallel and thus scalable to large data sets. Whilst there exist many optimized linear algebra software packages, efficient tensor algebra packages are also beginning to be developed. 
We introduce Tensorly, which has a simple python interface for expressing tensor operations. It has a flexible back-end system supporting NumPy, PyTorch, TensorFlow and MXNet amongst others. This allows it to carry out multi-GPU and CPU operations, and can also be seamlessly integrated with deep-learning functionalities.", "date": "2019-11-28", "date_type": "published", "publication": "Foundations and Trends in Machine Learning", "volume": "12", "number": "5-6", "publisher": "Now Publishers", "pagerange": "393-536", "id_number": "CaltechAUTHORS:20200526-130837701", "issn": "1935-8237", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20200526-130837701", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "doi": "10.1561/2200000057", "primary_object": { "basename": "2004.07984.pdf", "url": "https://authors.library.caltech.edu/records/j16vt-na095/files/2004.07984.pdf" }, "resource_type": "article", "pub_year": "2019", "author_list": "Janzamin, Majid; Ge, Rong; et el." }, { "id": "https://authors.library.caltech.edu/records/dx4yc-jbh70", "eprint_id": 112776, "eprint_status": "archive", "datestamp": "2023-08-19 16:40:25", "lastmod": "2023-10-23 22:46:27", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Huang-Furong", "name": { "family": "Huang", "given": "Furong" } }, { "id": "Naresh-Niranjan-Uma", "name": { "family": "Naresh", "given": "Niranjan Uma" } }, { "id": "Perros-Ioakeim", "name": { "family": "Perros", "given": "Ioakeim" } }, { "id": "Chen-Robert", "name": { "family": "Chen", "given": "Robert" } }, { "id": "Sun-Jimeng", "name": { "family": "Sun", "given": "Jimeng" } }, { "id": "Anandkumar-A", "name": { "family": "Anandkumar", "given": "Animashree" } } ] }, "title": "Guaranteed Scalable Learning of Latent Tree Models", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2020 by the author(s). 
\n\nHuang is supported by startup fund from Department\nof Computer Science, University of Maryland, National\nScience Foundation IIS-1850220 CRII Award 030742-\n00001, and Adobe, Capital One and JP Morgan faculty\nfellowships. Sun is supported by the National\nScience Foundation award IIS-1418511, CCF-1533768\nand IIS-1838042, the National Institute of Health award\n1R01MD011682-01 and R56HL138415. Anandkumar\nis supported in part by Bren endowed chair, Darpa PAI,\nRaytheon, and Microsoft, Google and Adobe faculty fellowships.\n\nPublished - huang20b.pdf
Submitted - 1406.4566.pdf
Supplemental Material - huang20b-supp.pdf
", "abstract": "We present an integrated approach to structure and parameter estimation in latent tree graphical models, where some nodes are hidden. Our overall approach follows a \"divide-and-conquer\" strategy that learns models over small groups of variables and iteratively merges into a global solution. The structure learning involves combinatorial operations such as minimum spanning tree construction and local recursive grouping; the parameter learning is based on the method of moments and on tensor decompositions. Our method is guaranteed to correctly recover the unknown tree structure and the model parameters with low sample complexity for the class of linear multivariate latent tree models which includes discrete and Gaussian distributions, and Gaussian mixtures. Our bulk asynchronous parallel algorithm is implemented in parallel and scales logarithmically with the number of variables and linearly with dimensionality of each variable.", "date": "2019-07", "date_type": "published", "publication": "Proceedings of Machine Learning Research", "volume": "115", "publisher": "ML Research Press", "pagerange": "883-893", "id_number": "CaltechAUTHORS:20220107-163918011", "issn": "2640-3498", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20220107-163918011", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "University of Maryland" }, { "agency": "NSF", "grant_number": "IIS-1850220" }, { "agency": "NSF", "grant_number": "030742-00001" }, { "agency": "Adobe" }, { "agency": "Capital One" }, { "agency": "JP Morgan" }, { "agency": "NSF", "grant_number": "IIS-1418511" }, { "agency": "NSF", "grant_number": "CCF-1533768" }, { "agency": "NSF", "grant_number": "IIS-1838042" }, { "agency": "NIH", "grant_number": "1R01MD011682-01" }, { "agency": "NIH", "grant_number": "R56HL138415" }, { "agency": "Bren Professor of Computing and Mathematical Sciences" }, { "agency": 
"Defense Advanced Research Projects Agency (DARPA)" }, { "agency": "Raytheon Company" }, { "agency": "Microsoft" }, { "agency": "Google" } ] }, "doi": "10.48550/arXiv.1406.4566", "primary_object": { "basename": "huang20b.pdf", "url": "https://authors.library.caltech.edu/records/dx4yc-jbh70/files/huang20b.pdf" }, "related_objects": [ { "basename": "1406.4566.pdf", "url": "https://authors.library.caltech.edu/records/dx4yc-jbh70/files/1406.4566.pdf" }, { "basename": "huang20b-supp.pdf", "url": "https://authors.library.caltech.edu/records/dx4yc-jbh70/files/huang20b-supp.pdf" } ], "resource_type": "article", "pub_year": "2019", "author_list": "Huang, Furong; Naresh, Niranjan Uma; et el." }, { "id": "https://authors.library.caltech.edu/records/ey15f-s3m29", "eprint_id": 94180, "eprint_status": "archive", "datestamp": "2023-08-19 16:00:56", "lastmod": "2023-10-20 17:45:32", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Cvitkovic-M", "name": { "family": "Cvitkovic", "given": "Milan" } }, { "id": "Singh-Badal", "name": { "family": "Singh", "given": "Badal" } }, { "id": "Anandkumar-A", "name": { "family": "Anandkumar", "given": "Anima" } } ] }, "title": "Open Vocabulary Learning on Source Code with a Graph-Structured Cache", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2019 by the author(s). \n\nProceedings of the 36th International Conference on Machine Learning, Long Beach, California, PMLR 97, 2019. \n\nMany thanks to Miltos Allamanis and Hyokun Yun for their advice and useful conversations.\n\nPublished - cvitkovic19b.pdf
Submitted - 1810.08305.pdf
Supplemental Material - cvitkovic19b-supp.pdf
", "abstract": "Machine learning models that take computer program source code as input typically use Natural Language Processing (NLP) techniques. However, a major challenge is that code is written using an open, rapidly changing vocabulary due to, e.g., the coinage of new variable and method names. Reasoning over such a vocabulary is not something for which most NLP methods are designed. We introduce a Graph-Structured Cache to address this problem; this cache contains a node for each new word the model encounters with edges connecting each word to its occurrences in the code. We find that combining this graph-structured cache strategy with recent Graph-Neural-Network-based models for supervised learning on code improves the models' performance on a code completion task and a variable naming task \u2014 with over 100% relative improvement on the latter \u2014 at the cost of a moderate increase in computation time.", "date": "2019-06", "date_type": "published", "publication": "Proceedings of Machine Learning Research", "volume": "97", "publisher": "PMLR", "pagerange": "1475-1485", "id_number": "CaltechAUTHORS:20190327-085810844", "issn": "2640-3498", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20190327-085810844", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "doi": "10.48550/arXiv.1810.08305", "primary_object": { "basename": "1810.08305.pdf", "url": "https://authors.library.caltech.edu/records/ey15f-s3m29/files/1810.08305.pdf" }, "related_objects": [ { "basename": "cvitkovic19b-supp.pdf", "url": "https://authors.library.caltech.edu/records/ey15f-s3m29/files/cvitkovic19b-supp.pdf" }, { "basename": "cvitkovic19b.pdf", "url": "https://authors.library.caltech.edu/records/ey15f-s3m29/files/cvitkovic19b.pdf" } ], "resource_type": "article", "pub_year": "2019", "author_list": "Cvitkovic, Milan; Singh, Badal; et el." 
}, { "id": "https://authors.library.caltech.edu/records/7gpw5-j1d54", "eprint_id": 101651, "eprint_status": "archive", "datestamp": "2023-08-19 15:29:49", "lastmod": "2023-10-19 22:55:43", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Kwok-Roberta", "name": { "family": "Kwok", "given": "Roberta" } }, { "id": "Ranade-G", "name": { "family": "Ranade", "given": "Gireeja" } }, { "id": "Anandkumar-A", "name": { "family": "Anandkumar", "given": "Anima" } }, { "id": "Maskey-S", "name": { "family": "Maskey", "given": "Sameer" } }, { "id": "Mohaghegh-M", "name": { "family": "Mohaghegh", "given": "Mahsa" } }, { "id": "Vreeken-J", "name": { "family": "Vreeken", "given": "Jilles" } }, { "id": "Herman-H", "name": { "family": "Herman", "given": "Herman" } } ] }, "title": "Junior AI researchers are in demand by universities and industry", "ispublished": "pub", "full_text_status": "restricted", "note": "\u00a9 2020 Springer Nature Limited.", "abstract": "Opportunities for moving between academia and business are expanding for scientists as companies step up recruitment.", "date": "2019-04-23", "date_type": "published", "publication": "Nature", "volume": "568", "number": "7753", "publisher": "Nature Publishing Group", "pagerange": "581-583", "id_number": "CaltechAUTHORS:20200302-111944472", "issn": "0028-0836", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20200302-111944472", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "doi": "10.1038/d41586-019-01248-w", "resource_type": "article", "pub_year": "2019", "author_list": "Kwok, Roberta; Ranade, Gireeja; et al." 
}, { "id": "https://authors.library.caltech.edu/records/ne64r-z5789", "eprint_id": 93356, "eprint_status": "archive", "datestamp": "2023-08-19 14:06:59", "lastmod": "2023-10-20 17:04:23", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Kossaifi-J", "name": { "family": "Kossaifi", "given": "Jean" } }, { "id": "Panagakis-Y", "name": { "family": "Panagakis", "given": "Yannis" } }, { "id": "Anandkumar-A", "name": { "family": "Anandkumar", "given": "Animashree" } }, { "id": "Pantic-M", "name": { "family": "Pantic", "given": "Maja" } } ] }, "title": "TensorLy: Tensor Learning in Python", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2019 Jean Kossaifi, Yannis Panagakis, Anima Anandkumar and Maja Pantic. License: CC-BY 4.0, see https://creativecommons.org/licenses/by/4.0/. Attribution requirements are provided at http://jmlr.org/papers/v20/18-277.html. \n\nSubmitted 5/18; Revised 10/18; Published 2/19.\n\nPublished - 18-277.pdf
Submitted - 1610.09555v1.pdf
", "abstract": "Tensors are higher-order extensions of matrices. While matrix methods form the cornerstone of traditional machine learning and data analysis, tensor methods have been gaining increasing traction. However, software support for tensor operations is not on the same footing. In order to bridge this gap, we have developed TensorLy, a Python library that provides a high-level API for tensor methods and deep tensorized neural networks. TensorLy aims to follow the same standards adopted by the main projects of the Python scientific community, and to seamlessly integrate with them. Its BSD license makes it suitable for both academic and commercial applications. TensorLy's backend system allows users to perform computations with several libraries such as NumPy or PyTorch to name but a few. They can be scaled on multiple CPU or GPU machines. In addition, using the deep-learning frameworks as backend allows to easily design and train deep tensorized neural networks. TensorLy is available at https://github.com/tensorly/tensorly", "date": "2019-02", "date_type": "published", "publication": "Journal of Machine Learning Research", "volume": "20", "number": "26", "publisher": "Journal of Machine Learning Research", "pagerange": "1-6", "id_number": "CaltechAUTHORS:20190228-133230688", "issn": "1533-7928", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20190228-133230688", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "doi": "10.48550/arXiv.1610.09555", "primary_object": { "basename": "1610.09555v1.pdf", "url": "https://authors.library.caltech.edu/records/ne64r-z5789/files/1610.09555v1.pdf" }, "related_objects": [ { "basename": "18-277.pdf", "url": "https://authors.library.caltech.edu/records/ne64r-z5789/files/18-277.pdf" } ], "resource_type": "article", "pub_year": "2019", "author_list": "Kossaifi, Jean; Panagakis, Yannis; et el." 
}, { "id": "https://authors.library.caltech.edu/records/gyb97-qmz56", "eprint_id": 94176, "eprint_status": "archive", "datestamp": "2023-08-19 10:07:05", "lastmod": "2023-10-20 17:45:21", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Furlanello-T", "name": { "family": "Furlanello", "given": "Tommaso" } }, { "id": "Lipton-Z-C", "name": { "family": "Lipton", "given": "Zachary C." } }, { "id": "Tschannen-M", "name": { "family": "Tschannen", "given": "Michael" } }, { "id": "Itti-L", "name": { "family": "Itti", "given": "Laurent" } }, { "id": "Anandkumar-A", "name": { "family": "Anandkumar", "given": "Anima" } } ] }, "title": "Born Again Neural Networks", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2018 by the author(s). \n\nThis work was supported by the National Science Foundation (grant numbers CCF-1317433 and CNS-1545089), C-BRIC (one of six centers in JUMP, a Semiconductor Research Corporation (SRC) program sponsored by DARPA), and the Intel Corporation. The authors affirm that the views expressed herein are solely their own, and do not represent the views of the United States government or any agency thereof.\n\nPublished - furlanello18a.pdf
", "abstract": "Knowledge Distillation (KD) consists of transferring \"knowledge\" from one machine learning model (the teacher) to another (the student). Commonly, the teacher is a high-capacity model with formidable performance, while the student is more compact. By transferring knowledge, one hopes to benefit from the student's compactness, without sacrificing too much performance. We study KD from a new perspective: rather than compressing models, we train students parameterized identically to their teachers. Surprisingly, these Born-Again Networks (BANs), outperform their teachers significantly, both on computer vision and language modeling tasks. Our experiments with BANs based on DenseNets demonstrate state-of-the-art performance on the CIFAR-10 (3.5%) and CIFAR-100 (15.5%) datasets, by validation error. Additional experiments explore two distillation objectives: (i) Confidence-Weighted by Teacher Max (CWTM) and (ii) Dark Knowledge with Permuted Predictions (DKPP). Both methods elucidate the essential components of KD, demonstrating the effect of the teacher outputs on both predicted and non-predicted classes.", "date": "2018-07", "date_type": "published", "publication": "Proceedings of Machine Learning Research", "volume": "80", "publisher": "PMLR", "pagerange": "1607-1616", "id_number": "CaltechAUTHORS:20190327-085757099", "issn": "2640-3498", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20190327-085757099", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NSF", "grant_number": "CCF-1317433" }, { "agency": "NSF", "grant_number": "CNS-1545089" }, { "agency": "Center for Brain-inspired Computing Enabling Autonomous Intelligence (C-BRIC)" }, { "agency": "Intel" } ] }, "doi": "10.48550/arXiv.1805.04770", "primary_object": { "basename": "furlanello18a.pdf", "url": "https://authors.library.caltech.edu/records/gyb97-qmz56/files/furlanello18a.pdf" }, 
"resource_type": "article", "pub_year": "2018", "author_list": "Furlanello, Tommaso; Lipton, Zachary C.; et al." }, { "id": "https://authors.library.caltech.edu/records/cnyfz-31290", "eprint_id": 94171, "eprint_status": "archive", "datestamp": "2023-08-19 10:06:54", "lastmod": "2023-10-20 17:45:04", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Tschannen-M", "name": { "family": "Tschannen", "given": "Michael" } }, { "id": "Khanna-A", "name": { "family": "Khanna", "given": "Aran" } }, { "id": "Anandkumar-A", "name": { "family": "Anandkumar", "given": "Animashree" } } ] }, "title": "StrassenNets: Deep Learning with a Multiplication Budget", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2018 by the author(s). \n\nThe authors would like to thank Eirikur Agustsson, Helmut B\u00f6lcskei, Lukas Cavigelli, Asmus Hetzel, Risi Kondor, Andrew Lavin, Michael Lerjen, Zachary Lipton, Weitang Liu, Andrea Olgiati, John Owens, Sheng Zha, and Zhi Zhang for inspiring discussions and comments. This work was supported by the \"AWS Cloud Credits for Research\" program.\n\nPublished - tschannen18a.pdf
", "abstract": "A large fraction of the arithmetic operations required to evaluate deep neural networks (DNNs) consists of matrix multiplications, in both convolution and fully connected layers. We perform end-to-end learning of low-cost approximations of matrix multiplications in DNN layers by casting matrix multiplications as 2-layer sum-product networks (SPNs) (arithmetic circuits) and learning their (ternary) edge weights from data. The SPNs disentangle multiplication and addition operations and enable us to impose a budget on the number of multiplication operations. Combining our method with knowledge distillation and applying it to image classification DNNs (trained on ImageNet) and language modeling DNNs (using LSTMs), we obtain a first-of-a-kind reduction in number of multiplications (over 99.5%) while maintaining the predictive performance of the full-precision models. Finally, we demonstrate that the proposed framework is able to rediscover Strassen's matrix multiplication algorithm, learning to multiply 2\u00d72 matrices using only 7 multiplications instead of 8.", "date": "2018-07", "date_type": "published", "publication": "Proceedings of Machine Learning Research", "volume": "80", "publisher": "PMLR", "pagerange": "4985-4994", "id_number": "CaltechAUTHORS:20190327-085739295", "issn": "2640-3498", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20190327-085739295", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "Amazon Web Services" } ] }, "doi": "10.48550/arXiv.1712.03942", "primary_object": { "basename": "tschannen18a.pdf", "url": "https://authors.library.caltech.edu/records/cnyfz-31290/files/tschannen18a.pdf" }, "resource_type": "article", "pub_year": "2018", "author_list": "Tschannen, Michael; Khanna, Aran; et el." 
}, { "id": "https://authors.library.caltech.edu/records/jm2vp-qjb50", "eprint_id": 94172, "eprint_status": "archive", "datestamp": "2023-08-19 10:07:01", "lastmod": "2023-10-20 17:45:07", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Bernstein-Jeremy-D", "name": { "family": "Bernstein", "given": "Jeremy" }, "orcid": "0000-0001-9110-7476" }, { "id": "Wang-Yu-Xiang", "name": { "family": "Wang", "given": "Yu-Xiang" }, "orcid": "0000-0002-6403-212X" }, { "id": "Azizzadenesheli-Kamyar", "name": { "family": "Azizzadenesheli", "given": "Kamyar" }, "orcid": "0000-0001-8507-1868" }, { "id": "Anandkumar-A", "name": { "family": "Anandkumar", "given": "Anima" }, "orcid": "0000-0002-6974-6797" } ] }, "title": "signSGD: Compressed Optimisation for Non-Convex Problems", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2018 by the author(s). \n\nThe authors are grateful to the anonymous reviewers for their helpful comments, as well as Jiawei Zhao, Michael Tschannen, Julian Salazar, Tan Nguyen, Fanny Yang, Mu Li, Aston Zhang and Zack Lipton for useful discussions. Thanks to Ryan Tibshirani for pointing out the connection to steepest descent. \n\nKA is supported in part by NSF Career Award CCF-1254106 and Air Force FA9550-15-1-0221. AA is supported in part by Microsoft Faculty Fellowship, Google Faculty Research Award, Adobe Grant, NSF Career Award CCF-1254106, and AFOSR YIP FA9550-15-1-0221.\n\nPublished - bernstein18a.pdf
Supplemental Material - bernstein18a-supp.pdf
", "abstract": "Training large neural networks requires distributing learning across multiple workers, where the cost of communicating gradients can be a significant bottleneck. signSGD alleviates this problem by transmitting just the sign of each minibatch stochastic gradient. We prove that it can get the best of both worlds: compressed gradients and SGD-level convergence rate. The relative \u2113_1/\u2113_2 geometry of gradients, noise and curvature informs whether signSGD or SGD is theoretically better suited to a particular problem. On the practical side we find that the momentum counterpart of signSGD is able to match the accuracy and convergence speed of Adam on deep Imagenet models. We extend our theory to the distributed setting, where the parameter server uses majority vote to aggregate gradient signs from each worker enabling 1-bit compression of worker-server communication in both directions. Using a theorem by Gauss we prove that majority vote can achieve the same reduction in variance as full precision distributed SGD. Thus, there is great promise for sign-based optimisation schemes to achieve fast communication and fast convergence. 
Code to reproduce experiments is to be found at https://github.com/jxbz/signSGD.", "date": "2018-07", "date_type": "published", "publication": "Proceedings of Machine Learning Research", "volume": "80", "publisher": "PMLR", "pagerange": "560-569", "id_number": "CaltechAUTHORS:20190327-085742729", "issn": "2640-3498", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20190327-085742729", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NSF", "grant_number": "CCF-1254106" }, { "agency": "Air Force Office of Scientific Research (AFOSR)", "grant_number": "FA9550-15-1-0221" }, { "agency": "Microsoft Faculty Fellowship" }, { "agency": "Google Faculty Research Award" }, { "agency": "Adobe" } ] }, "doi": "10.48550/arXiv.1802.04434", "primary_object": { "basename": "bernstein18a-supp.pdf", "url": "https://authors.library.caltech.edu/records/jm2vp-qjb50/files/bernstein18a-supp.pdf" }, "related_objects": [ { "basename": "bernstein18a.pdf", "url": "https://authors.library.caltech.edu/records/jm2vp-qjb50/files/bernstein18a.pdf" } ], "resource_type": "article", "pub_year": "2018", "author_list": "Bernstein, Jeremy; Wang, Yu-Xiang; et el." 
}, { "id": "https://authors.library.caltech.edu/records/16dxg-bb611", "eprint_id": 94330, "eprint_status": "archive", "datestamp": "2023-08-19 04:01:03", "lastmod": "2023-10-20 17:52:59", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Anandkumar-A", "name": { "family": "Anandkumar", "given": "Anima" } }, { "id": "Deng-Yuan", "name": { "family": "Deng", "given": "Yuan" } }, { "id": "Ge-Rong", "name": { "family": "Ge", "given": "Rong" } }, { "id": "Mobahi-H", "name": { "family": "Mobahi", "given": "Hossein" } } ] }, "title": "Homotopy Analysis for Tensor PCA", "ispublished": "pub", "full_text_status": "public", "keywords": "Tensor PCA, homotopy, continuation, Gaussian smoothing, nonconvex optimization, global optimization", "note": "\u00a9 2017 A. Anandkumar, Y. Deng, R. Ge & H. Mobahi.\n\nPublished - anandkumar17a.pdf
Accepted Version - 1610.09322.pdf
", "abstract": "Developing efficient and guaranteed nonconvex algorithms has been an important challenge in modern machine learning. Algorithms with good empirical performance such as stochastic gradient descent often lack theoretical guarantees. In this paper, we analyze the class of homotopy or continuation methods for global optimization of nonconvex functions. These methods start from an objective function that is efficient to optimize (e.g. convex), and progressively modify it to obtain the required objective, and the solutions are passed along the homotopy path. For the challenging problem of tensor PCA, we prove global convergence of the homotopy method in the \"high noise\" regime. The signal-to-noise requirement for our algorithm is tight in the sense that it matches the recovery guarantee for the \\em best degree-4 sum-of-squares algorithm. In addition, we prove a phase transition along the homotopy path for tensor PCA. This allows us to simplify the homotopy method to a local search algorithm, viz., tensor power iterations, with a specific initialization and a noise injection procedure, while retaining the theoretical guarantees.", "date": "2017-07", "date_type": "published", "publication": "Proceedings of Machine Learning Research", "volume": "65", "publisher": "PMLR", "pagerange": "79-104", "id_number": "CaltechAUTHORS:20190401-123333151", "issn": "2640-3498", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20190401-123333151", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "doi": "10.48550/arXiv.1610.09322", "primary_object": { "basename": "anandkumar17a.pdf", "url": "https://authors.library.caltech.edu/records/16dxg-bb611/files/anandkumar17a.pdf" }, "related_objects": [ { "basename": "1610.09322.pdf", "url": "https://authors.library.caltech.edu/records/16dxg-bb611/files/1610.09322.pdf" } ], "resource_type": "article", "pub_year": "2017", "author_list": "Anandkumar, Anima; 
Deng, Yuan; et el." }, { "id": "https://authors.library.caltech.edu/records/x6ay2-sy046", "eprint_id": 81620, "eprint_status": "archive", "datestamp": "2023-08-19 00:54:20", "lastmod": "2023-10-17 20:54:40", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Agarwal-A", "name": { "family": "Agarwal", "given": "Alekh" } }, { "id": "Anandkumar-A", "name": { "family": "Anandkumar", "given": "Animashree" } }, { "id": "Netrapalli-P", "name": { "family": "Netrapalli", "given": "Praneeth" } } ] }, "title": "A Clustering Approach to Learning Sparsely Used Overcomplete Dictionaries", "ispublished": "pub", "full_text_status": "public", "keywords": "Dictionary learning, sparse coding, overcomplete dictionaries, incoherence, lasso", "note": "\u00a9 2017 IEEE. \n\nManuscript received July 6, 2014; revised June 6, 2016; accepted September 11, 2016. Date of publication September 30, 2016; date of current version December 20, 2016. \n\nA. Anandkumar was supported in part by the Microsoft Faculty Fellowship, in part by the NSF Career Award under Grant CCF1254106, in part by the NSF Award under Grant CCF-1219234, and in part by the ARO YIP Award under Grant W911NF-13-1-0084. This paper was presented at the 2014 COLT.\n\nSubmitted - 1309.1952.pdf
", "abstract": "We consider the problem of learning over complete dictionaries in the context of sparse coding, where each sample selects a sparse subset of dictionary elements. Our main result is a strategy to approximately recover the unknown dictionary using an efficient algorithm. Our algorithm is a clustering-style procedure, where each cluster is used to estimate a dictionary element. The resulting solution can often be further cleaned up to obtain a high accuracy estimate, and we provide one simple scenario where \u2113_1-regularized regression can be used for such a second stage.", "date": "2017-01", "date_type": "published", "publication": "IEEE Transactions on Information Theory", "volume": "63", "number": "1", "publisher": "IEEE", "pagerange": "575-592", "id_number": "CaltechAUTHORS:20170920-111802806", "issn": "0018-9448", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170920-111802806", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "Microsoft Research" }, { "agency": "NSF", "grant_number": "CCF-1254106" }, { "agency": "NSF", "grant_number": "CCF-1219234" }, { "agency": "Army Research Office (ARO)", "grant_number": "W911NF-13-1-0084" } ] }, "doi": "10.1109/TIT.2016.2614684", "primary_object": { "basename": "1309.1952.pdf", "url": "https://authors.library.caltech.edu/records/x6ay2-sy046/files/1309.1952.pdf" }, "resource_type": "article", "pub_year": "2017", "author_list": "Agarwal, Alekh; Anandkumar, Animashree; et el." 
}, { "id": "https://authors.library.caltech.edu/records/b6q6r-7yf44", "eprint_id": 81619, "eprint_status": "archive", "datestamp": "2023-08-19 00:42:17", "lastmod": "2023-10-17 20:54:35", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Anandkumar-A", "name": { "family": "Anandkumar", "given": "Animashree" } }, { "id": "Ge-Rong", "name": { "family": "Ge", "given": "Rong" } }, { "id": "Janzamin-M", "name": { "family": "Janzamin", "given": "Majid" } } ] }, "title": "Analyzing Tensor Power Method Dynamics in Overcomplete Regime", "ispublished": "pub", "full_text_status": "public", "keywords": "tensor decomposition, tensor power iteration, overcomplete representation, unsupervised learning, latent variable models", "note": "\u00a9 2017 Animashree Anandkumar, Rong Ge, and Majid Janzamin.\nLicense: CC-BY 4.0, see https://creativecommons.org/licenses/by/4.0/. \n\nA. Anandkumar is supported in part by Microsoft Faculty Fellowship, NSF Career award CCF-1254106, NSF award CCF-1219234, ONR award N00014-14-1-0665, ARO YIP award W911NF-13-1-0084, and AFOSR YIP award FA9550-15-1-0221. M. Janzamin is supported by NSF Award CCF-1219234.\n\nPublished - 15-486.pdf
Submitted - 1411.1488.pdf
", "abstract": "We present a novel analysis of the dynamics of tensor power iterations in the overcomplete regime where the tensor CP rank is larger than the input dimension. Finding the CP decomposition of an overcomplete tensor is NP-hard in general. We consider the case where the tensor components are randomly drawn, and show that the simple power iteration recovers the components with bounded error under mild initialization conditions. We apply our analysis to unsupervised learning of latent variable models, such as multi-view mixture models and spherical Gaussian mixtures. Given the third order moment tensor, we learn the parameters using tensor power iterations. We prove it can correctly learn the model parameters when the number of hidden components k is much larger than the data dimension d, up to k=o(d^(1.5)). We initialize the power iterations with data samples and prove its success under mild conditions on the signal-to-noise ratio of the samples. Our analysis significantly expands the class of latent variable models where spectral methods are applicable. 
Our analysis also deals with noise in the input tensor leading to sample complexity result in the application to learning latent variable models.", "date": "2017", "date_type": "published", "publication": "Journal of Machine Learning Research", "volume": "18", "number": "22", "publisher": "Journal of Machine Learning Research", "pagerange": "1-40", "id_number": "CaltechAUTHORS:20170920-110910164", "issn": "1533-7928", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170920-110910164", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "Microsoft Research" }, { "agency": "NSF", "grant_number": "CCF-1254106" }, { "agency": "NSF", "grant_number": "CCF-1219234" }, { "agency": "Office of Naval Research (ONR)", "grant_number": "N00014-14-1-0665" }, { "agency": "Army Research Office (ARO)", "grant_number": "W911NF-13-1-0084" }, { "agency": "Air Force Office of Scientific Research (AFOSR)", "grant_number": "FA9550-15-1-0221" } ] }, "doi": "10.48550/arXiv.1411.1488", "primary_object": { "basename": "15-486.pdf", "url": "https://authors.library.caltech.edu/records/b6q6r-7yf44/files/15-486.pdf" }, "related_objects": [ { "basename": "1411.1488.pdf", "url": "https://authors.library.caltech.edu/records/b6q6r-7yf44/files/1411.1488.pdf" } ], "resource_type": "article", "pub_year": "2017", "author_list": "Anandkumar, Animashree; Ge, Rong; et al." 
}, { "id": "https://authors.library.caltech.edu/records/06qe9-9s868", "eprint_id": 81869, "eprint_status": "archive", "datestamp": "2023-08-19 00:29:08", "lastmod": "2023-10-17 21:53:23", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Agarwal-A", "name": { "family": "Agarwal", "given": "Alekh" } }, { "id": "Anandkumar-A", "name": { "family": "Anandkumar", "given": "Animashree" } }, { "id": "Jain-P", "name": { "family": "Jain", "given": "Prateek" } }, { "id": "Netrapalli-P", "name": { "family": "Netrapalli", "given": "Praneeth" } } ] }, "title": "Learning Sparsely Used Overcomplete Dictionaries via Alternating Minimization", "ispublished": "pub", "full_text_status": "public", "keywords": "dictionary learning, sparse coding, alternating minimization, RIP, incoherence, lasso", "note": "\u00a9 2016 Society for Industrial and Applied Mathematics. \n\nReceived by the editors July 29, 2014; accepted for publication (in revised form) September 12, 2016; published electronically December 8, 2016. \n\nPart of this work was done when P. Netrapalli was a student at UT Austin and A. Anandkumar and P. Netrapalli were visiting Microsoft Research. An extended abstract containing an earlier version of these results appears in Proceedings of COLT 2014. \n\nThe second author is supported in part by Microsoft Faculty Fellowship, Google Faculty Award, NSF Career Award CCF-1254106, ONR Award N00014-14-1-0665, and AFOSR YIP FA9550-15-1-0221.\n\nPublished - 140979861.pdf
Submitted - 1310.7991.pdf
", "abstract": "We consider the problem of sparse coding, where each sample consists of a sparse linear combination of a set of dictionary atoms, and the task is to learn both the dictionary elements and the mixing coefficients. Alternating minimization is a popular heuristic for sparse coding, where the dictionary and the coefficients are estimated in alternate steps, keeping the other fixed. Typically, the coefficients are estimated via \u2113_1 minimization, keeping the dictionary fixed, and the dictionary is estimated through least squares, keeping the coefficients fixed. In this paper, we establish local linear convergence for this variant of alternating minimization and establish that the basin of attraction for the global optimum (corresponding to the true dictionary and the coefficients) is O(1/s^2), where s is the sparsity level in each sample and the dictionary satisfies restricted isometry property. Combined with the recent results of approximate dictionary estimation, this yields provable guarantees for exact recovery of both the dictionary elements and the coefficients, when the dictionary elements are incoherent.", "date": "2016-12-08", "date_type": "published", "publication": "SIAM Journal of Optimization", "volume": "26", "number": "4", "publisher": "Society for Industrial and Applied Mathematics", "pagerange": "2775-2799", "id_number": "CaltechAUTHORS:20170927-090108498", "issn": "1052-6234", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170927-090108498", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "Microsoft Research" }, { "agency": "Google" }, { "agency": "NSF", "grant_number": "CCF-1254106" }, { "agency": "Office of Naval Research (ONR)", "grant_number": "N00014-14-1-0665" }, { "agency": "Air Force Office of Scientific Research (AFOSR)", "grant_number": "FA9550-15-1-0221" } ] }, "doi": "10.1137/140979861", "primary_object": { 
"basename": "1310.7991.pdf", "url": "https://authors.library.caltech.edu/records/06qe9-9s868/files/1310.7991.pdf" }, "related_objects": [ { "basename": "140979861.pdf", "url": "https://authors.library.caltech.edu/records/06qe9-9s868/files/140979861.pdf" } ], "resource_type": "article", "pub_year": "2016", "author_list": "Agarwal, Alekh; Anandkumar, Animashree; et el." }, { "id": "https://authors.library.caltech.edu/records/tmaq7-9k471", "eprint_id": 94328, "eprint_status": "archive", "datestamp": "2023-08-20 11:59:04", "lastmod": "2023-10-20 17:52:53", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Azizzadenesheli-Kamyar", "name": { "family": "Azizzadenesheli", "given": "Kamyar" }, "orcid": "0000-0001-8507-1868" }, { "id": "Lazaric-Alessandro", "name": { "family": "Lazaric", "given": "Alessandro" }, "orcid": "0000-0002-8970-413X" }, { "id": "Anandkumar-A", "name": { "family": "Anandkumar", "given": "Animashree" }, "orcid": "0000-0002-6974-6797" } ] }, "title": "Open Problem: Approximate Planning of POMDPs in the class of Memoryless Policies", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2016 K. Azizzadenesheli, A. Lazaric & A. Anandkumar.\n\nPublished - azizzadenesheli16b.pdf
", "abstract": "Planning plays an important role in the broad class of decision theory. Planning has drawn much attention in recent work in the robotics and sequential decision making areas. Recently, Reinforcement Learning (RL), as an agent-environment interaction problem, has brought further attention to planning methods. Generally in RL, one can assume a generative model, e.g. graphical models, for the environment, and then the task for the RL agent is to learn the model parameters and find the optimal strategy based on these learnt parameters. Based on environment behavior, the agent can assume various types of generative models, e.g. Multi Armed Bandit for a static environment, or Markov Decision Process (MDP) for a dynamic environment. The advantage of these popular models is their simplicity, which results in tractable methods of learning the parameters and finding the optimal policy. The drawback of these models is again their simplicity: these models usually underfit and underestimate the actual environment behavior. For example, in robotics, the agent usually has noisy observations of the environment inner state and MDP is not a suitable model. \n\nMore complex models like Partially Observable Markov Decision Process (POMDP) can compensate for this drawback. Fitting this model to the environment, where the partial observation is given to the agent, generally gives dramatic performance improvement, sometimes unbounded improvement, compared to MDP. In general, finding the optimal policy for the POMDP model is computationally intractable and fully non convex, even for the class of memoryless policies. 
The open problem is to come up with a method to find an exact or an approximate optimal stochastic memoryless policy for POMDP models.", "date": "2016-06", "date_type": "published", "publication": "Proceedings of Machine Learning Research", "volume": "49", "publisher": "PMLR", "pagerange": "1639-1642", "id_number": "CaltechAUTHORS:20190401-123326217", "issn": "2640-3498", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20190401-123326217", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "doi": "10.48550/arXiv.1608.04996", "primary_object": { "basename": "azizzadenesheli16b.pdf", "url": "https://authors.library.caltech.edu/records/tmaq7-9k471/files/azizzadenesheli16b.pdf" }, "resource_type": "article", "pub_year": "2016", "author_list": "Azizzadenesheli, Kamyar; Lazaric, Alessandro; et al." }, { "id": "https://authors.library.caltech.edu/records/ttqvx-6ps30", "eprint_id": 94324, "eprint_status": "archive", "datestamp": "2023-08-20 11:59:00", "lastmod": "2023-10-20 17:52:43", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Azizzadenesheli-Kamyar", "name": { "family": "Azizzadenesheli", "given": "Kamyar" }, "orcid": "0000-0001-8507-1868" }, { "id": "Lazaric-Alessandro", "name": { "family": "Lazaric", "given": "Alessandro" }, "orcid": "0000-0002-8970-413X" }, { "id": "Anandkumar-A", "name": { "family": "Anandkumar", "given": "Animashree" }, "orcid": "0000-0002-6974-6797" } ] }, "title": "Reinforcement Learning of POMDPs using Spectral Methods", "ispublished": "pub", "full_text_status": "public", "keywords": "Spectral Methods, Method of Moments, Partially Observable Markov Decision Process, Latent Variable Model, Upper Confidence Reinforcement Learning", "note": "\u00a9 2016 K. Azizzadenesheli, A. Lazaric & A. Anandkumar. \n\nK. Azizzadenesheli is supported in part by NSF Career award CCF-1254106 and ONR Award N00014-14-1-0665. \n\nA. 
Lazaric is supported in part by a grant from CPER Nord-Pas de Calais/FEDER DATA Advanced data science and technologies 2015-2020, CRIStAL (Centre de Recherche en Informatique et Automatique de Lille), and the French National Research Agency (ANR) under project ExTra-Learn n.ANR-14-CE24-0010-01. \n\nA. Anandkumar is supported in part by Microsoft Faculty Fellowship, NSF Career award CCF-1254106, ONR Award N00014-14-1-0665, ARO YIP Award W911NF-13-1-0084 and AFOSR YIP FA9550-15-1-0221.\n\nPublished - azizzadenesheli16a.pdf
", "abstract": "We propose a new reinforcement learning algorithm for partially observable Markov decision processes (POMDP) based on spectral decomposition methods. While spectral methods have been previously employed for consistent learning of (passive) latent variable models such as hidden Markov models, POMDPs are more challenging since the learner interacts with the environment and possibly changes the future observations in the process. We devise a learning algorithm running through episodes, in each episode we employ spectral techniques to learn the POMDP parameters from a trajectory generated by a fixed policy. At the end of the episode, an optimization oracle returns the optimal memoryless planning policy which maximizes the expected reward based on the estimated POMDP model. We prove an order-optimal regret bound w.r.t. the optimal memoryless policy and efficient scaling with respect to the dimensionality of observation and action spaces.", "date": "2016-06", "date_type": "published", "publication": "Proceedings of Machine Learning Research", "volume": "49", "publisher": "PMLR", "pagerange": "193-256", "id_number": "CaltechAUTHORS:20190401-123310700", "issn": "2640-3498", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20190401-123310700", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NSF", "grant_number": "CCF-1254106" }, { "agency": "Office of Naval Research (ONR)", "grant_number": "N00014-14-1-0665" }, { "agency": "Contrat de plan Etat-r\u00e9gion Nord - Pas-de-Calais" }, { "agency": "Fondo Europeo de Desarrollo Regional (FEDER)" }, { "agency": "Centre de Recherche en Informatique et Automatique de Lille" }, { "agency": "Agence Nationale pour la Recherche (ANR)", "grant_number": "ANR-14-CE24-0010-01" }, { "agency": "Microsoft Faculty Fellowship" }, { "agency": "Office of Naval Research (ONR)", "grant_number": "N00014-14-1-0665" }, { 
"agency": "Army Research Office (ARO)", "grant_number": "W911NF-13-1-0084" }, { "agency": "Air Force Office of Scientific Research (AFOSR)", "grant_number": "FA9550-15-1-0221" } ] }, "doi": "10.48550/arXiv.1602.07764", "primary_object": { "basename": "azizzadenesheli16a.pdf", "url": "https://authors.library.caltech.edu/records/ttqvx-6ps30/files/azizzadenesheli16a.pdf" }, "resource_type": "article", "pub_year": "2016", "author_list": "Azizzadenesheli, Kamyar; Lazaric, Alessandro; et el." }, { "id": "https://authors.library.caltech.edu/records/vdyfb-6na98", "eprint_id": 81879, "eprint_status": "archive", "datestamp": "2023-08-20 09:21:43", "lastmod": "2023-10-17 21:53:43", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Huang-Furong", "name": { "family": "Huang", "given": "Furong" } }, { "id": "Niranjan-U-N", "name": { "family": "Niranjan", "given": "U. N." } }, { "id": "Hakeem-M-U", "name": { "family": "Hakeem", "given": "Mohammad Umar" } }, { "id": "Anandkumar-A", "name": { "family": "Anandkumar", "given": "Animashree" } } ] }, "title": "Online Tensor Methods for Learning Latent Variable Models", "ispublished": "pub", "full_text_status": "public", "keywords": "mixed membership stochastic blockmodel, topic modeling, tensor method, stochastic gradient descent, parallel implementation, large datasets", "note": "\u00a9 2015 Furong Huang, U. N. Niranjan, Mohammad Umar Hakeem, and Animashree Anandkumar. \n\nSubmitted 3/14; Revised 9/14; Published 12/15. \n\nThe first author is supported by NSF BIGDATA IIS-1251267, the second author is supported in part by UCI graduate fellowship and NSF Award CCF-1219234, and the last author is supported in part by Microsoft Faculty Fellowship, NSF Career award CCF-1254106, NSF Award CCF-1219234, and ARO YIP Award W911NF-13-1-0084. 
The authors acknowledge insightful discussions with Prem Gopalan, David Mimno, David Blei, Qirong Ho, Eric Xing, Carter Butts, Blake Foster, Rui Wang, Sridhar Mahadevan, and the CULA team. Special thanks to Prem Gopalan and David Mimno for providing the variational code and answering all our questions. The authors also thank Daniel Hsu and Sham Kakade for initial discussions regarding the implementation of the tensor method. We also thank Dan Melzer for helping us with the system-related issues.\n\nPublished - huang15a.pdf
Submitted - 1309.0787.pdf
", "abstract": "We introduce an online tensor decomposition based approach for two latent variable modeling problems namely, (1) community detection, in which we learn the latent communities that the social actors in social networks belong to, and (2) topic modeling, in which we infer hidden topics of text articles. We consider decomposition of moment tensors using stochastic gradient descent. We conduct optimization of multilinear operations in SGD and avoid directly forming the tensors, to save computational and storage costs. We present optimized algorithm in two platforms. Our GPU-based implementation exploits the parallelism of SIMD architectures to allow for maximum speed-up by a careful optimization of storage and data transfer, whereas our CPU-based implementation uses efficient sparse matrix computations and is suitable for large sparse data sets. For the community detection problem, we demonstrate accuracy and computational efficiency on Facebook, Yelp and DBLP data sets, and for the topic modeling problem, we also demonstrate good performance on the New York Times data set. 
We compare our results to the state-of-the-art algorithms such as the variational method, and report a gain of accuracy and a gain of several orders of magnitude in the execution time.", "date": "2015-12", "date_type": "published", "publication": "Journal of Machine Learning Research", "volume": "16", "publisher": "Journal of Machine Learning Research", "pagerange": "2797-2835", "id_number": "CaltechAUTHORS:20170927-111140656", "issn": "1533-7928", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170927-111140656", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NSF", "grant_number": "IIS-1251267" }, { "agency": "University of California, Irvine" }, { "agency": "NSF", "grant_number": "CCF-1219234" }, { "agency": "Microsoft Research" }, { "agency": "NSF", "grant_number": "CCF-1254106" }, { "agency": "Army Research Office (ARO)", "grant_number": "W911NF-13-1-0084" } ] }, "doi": "10.48550/arXiv.1309.0787", "primary_object": { "basename": "1309.0787.pdf", "url": "https://authors.library.caltech.edu/records/vdyfb-6na98/files/1309.0787.pdf" }, "related_objects": [ { "basename": "huang15a.pdf", "url": "https://authors.library.caltech.edu/records/vdyfb-6na98/files/huang15a.pdf" } ], "resource_type": "article", "pub_year": "2015", "author_list": "Huang, Furong; Niranjan, U. N.; et al." 
}, { "id": "https://authors.library.caltech.edu/records/vt9tt-kd996", "eprint_id": 81884, "eprint_status": "archive", "datestamp": "2023-08-20 09:21:49", "lastmod": "2023-10-17 21:53:56", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Anandkumar-A", "name": { "family": "Anandkumar", "given": "Animashree" } }, { "id": "Hsu-Daniel", "name": { "family": "Hsu", "given": "Daniel" } }, { "id": "Janzamin-M", "name": { "family": "Janzamin", "given": "Majid" } }, { "id": "Kakade-S-M", "name": { "family": "Kakade", "given": "Sham" } } ] }, "title": "When Are Overcomplete Topic Models Identifiable? Uniqueness of Tensor Tucker Decompositions with Structured Sparsity", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2015 Animashree Anandkumar, Daniel Hsu, Majid Janzamin and Sham Kakade. \n\nThe authors acknowledge useful discussions with Sina Jafarpour, Adel Javanmard, Alex Dimakis, Moses Charikar, Sanjeev Arora, Ankur Moitra and Kamalika Chaudhuri. Sham Kakade thanks the Washington Research Foundation. A. Anandkumar is supported in part by Microsoft Faculty Fellowship, NSF Career award CCF-1254106, NSF Award CCF-1219234, ARO Award W911NF-12-1-0404, and ARO YIP Award W911NF-13-1-0084. M. Janzamin is supported by NSF Award CCF-1219234, ARO Award W911NF-12-1-0404 and ARO YIP Award W911NF-13-1-0084.\n\nPublished - p2643-anandkumar.pdf
Submitted - 1308.2853.pdf
", "abstract": "Overcomplete latent representations have been very popular for unsupervised feature learning in recent years. In this paper, we specify which overcomplete models can be identified given observable moments of a certain order. We consider probabilistic admixture or topic models in the overcomplete regime, where the number of latent topics can greatly exceed the size of the observed word vocabulary. While general overcomplete topic models are not identifiable, we establish generic identifiability under a constraint, referred to as topic persistence. Our sufficient conditions for identifiability involve a novel set of \"higher order\" expansion conditions on the topic-word matrix or the population structure of the model. This set of higher-order expansion conditions allow for overcomplete models, and require the existence of a perfect matching from latent topics to higher order observed words. We establish that random structured topic models are identifiable w.h.p. in the overcomplete regime. Our identifiability results allows for general (non-degenerate) distributions for modeling the topic proportions, and thus, we can handle arbitrarily correlated topics in our framework. 
Our identifiability results imply uniqueness of a class of tensor decompositions with structured sparsity which is contained in the class of Tucker decompositions, but is more general than the Candecomp/Parafac (CP) decomposition.", "date": "2015-12", "date_type": "published", "publication": "Journal of Machine Learning Research", "volume": "16", "publisher": "Journal of Machine Learning Research", "pagerange": "2643-2694", "id_number": "CaltechAUTHORS:20170927-144026647", "issn": "1533-7928", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170927-144026647", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "Washington Research Foundation" }, { "agency": "Microsoft Research" }, { "agency": "NSF", "grant_number": "CCF-1254106" }, { "agency": "NSF", "grant_number": "CCF-1219234" }, { "agency": "Army Research Office (ARO)", "grant_number": "W911NF-12-1-0404" }, { "agency": "Army Research Office (ARO)", "grant_number": "W911NF-13-1-0084" } ] }, "doi": "10.48550/arXiv.1308.2853", "primary_object": { "basename": "1308.2853.pdf", "url": "https://authors.library.caltech.edu/records/vt9tt-kd996/files/1308.2853.pdf" }, "related_objects": [ { "basename": "p2643-anandkumar.pdf", "url": "https://authors.library.caltech.edu/records/vt9tt-kd996/files/p2643-anandkumar.pdf" } ], "resource_type": "article", "pub_year": "2015", "author_list": "Anandkumar, Animashree; Hsu, Daniel; et al." }, { "id": "https://authors.library.caltech.edu/records/ahvyn-gsn72", "eprint_id": 81632, "eprint_status": "archive", "datestamp": "2023-08-22 15:29:32", "lastmod": "2023-10-17 20:55:13", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Anandkumar-A", "name": { "family": "Anandkumar", "given": "Animashree" }, "orcid": "0000-0002-6974-6797" }, { "id": "Foster-Dean-P", "name": { "family": "Foster", "given": "Dean P." 
}, "orcid": "0000-0002-8503-0270" }, { "id": "Hsu-Daniel", "name": { "family": "Hsu", "given": "Daniel" }, "orcid": "0000-0002-3495-7113" }, { "id": "Kakade-Sham-M", "name": { "family": "Kakade", "given": "Sham M." } }, { "id": "Liu-Yi-Kai", "name": { "family": "Liu", "given": "Yi-Kai" }, "orcid": "0000-0001-7458-4721" } ] }, "title": "A Spectral Algorithm for Latent Dirichlet Allocation", "ispublished": "pub", "full_text_status": "public", "keywords": "Topic models; Mixture models; Method of moments; Latent Dirichlet allocation", "note": "\u00a9 2014 Springer Science+Business Media New York. \n\nReceived: 01 October 2013; Accepted: 12 June 2014; First Online: 03 July 2014. \n\nWe thank Kamalika Chaudhuri, Adam Kalai, Percy Liang, Chris Meek, David Sontag, and Tong Zhang for valuable insights. We also thank Rong Ge for sharing preliminary results (in [8]) and the anonymous reviewers for their comments, suggestions, and pointers to references. Part of this work was completed while DH was a postdoctoral researcher at Microsoft Research New England, and while DPF, YKL, and AA were visiting the same lab. AA is supported in part by Microsoft Faculty Fellowship, NSF Career award CCF-1254106, NSF Award CCF-1219234, NSF BIGDATA IIS-1251267 and ARO YIP Award W911NF-13-1-0084.\n\nSubmitted - 1204.6703.pdf
", "abstract": "Topic modeling is a generalization of clustering that posits that observations (words in a document) are generated by multiple latent factors (topics), as opposed to just one. The increased representational power comes at the cost of a more challenging unsupervised learning problem for estimating the topic-word distributions when only words are observed, and the topics are hidden. This work provides a simple and efficient learning procedure that is guaranteed to recover the parameters for a wide class of multi-view models and topic models, including latent Dirichlet allocation (LDA). For LDA, the procedure correctly recovers both the topic-word distributions and the parameters of the Dirichlet prior over the topic mixtures, using only trigram statistics (i.e., third order moments, which may be estimated with documents containing just three words). The method is based on an efficiently computable orthogonal tensor decomposition of low-order moments.", "date": "2015-05", "date_type": "published", "publication": "Algorithmica", "volume": "72", "number": "1", "publisher": "Springer", "pagerange": "193-214", "id_number": "CaltechAUTHORS:20170920-142816744", "issn": "0178-4617", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170920-142816744", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "Microsoft Research" }, { "agency": "NSF", "grant_number": "CCF-1254106" }, { "agency": "NSF", "grant_number": "CCF-1219234" }, { "agency": "NSF", "grant_number": "IIS-1251267" }, { "agency": "Army Research Office (ARO)", "grant_number": "W911NF-13-1-0084" } ] }, "doi": "10.1007/s00453-014-9909-1", "primary_object": { "basename": "1204.6703.pdf", "url": "https://authors.library.caltech.edu/records/ahvyn-gsn72/files/1204.6703.pdf" }, "resource_type": "article", "pub_year": "2015", "author_list": "Anandkumar, Animashree; Foster, Dean P.; et el." 
}, { "id": "https://authors.library.caltech.edu/records/kj7vp-een86", "eprint_id": 94345, "eprint_status": "archive", "datestamp": "2023-08-20 04:00:38", "lastmod": "2023-10-20 17:53:38", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Sedghi-H", "name": { "family": "Sedghi", "given": "Hanie" } }, { "id": "Janzamin-M", "name": { "family": "Janzamin", "given": "Majid" } }, { "id": "Anandkumar-A", "name": { "family": "Anandkumar", "given": "Anima" } } ] }, "title": "Provable Tensor Methods for Learning Mixtures of Generalized Linear Models", "ispublished": "pub", "full_text_status": "public", "keywords": "Mixture of generalized linear models, score function, spectral/tensor decomposition", "note": "\u00a9 2016 by the authors. \n\nThis work was done while H. Sedghi was a visiting researcher at UC Irvine and was supported by NSF Career award FG15890. M. Janzamin is supported by NSF BIGDATA award FG16455. A. Anandkumar is supported in part by Microsoft Faculty Fellowship, NSF Career award CCF-1254106, and ONR Award N00014-14-1-0665.\n\nPublished - sedghi16.pdf
Accepted Version - 1412.3046.pdf
Supplemental Material - sedghi16-supp.pdf
", "abstract": "We consider the problem of learning mixtures of generalized linear models (GLM) which arise in classification and regression problems. Typical learning approaches such as expectation maximization (EM) or variational Bayes can get stuck in spurious local optima. In contrast, we present a tensor decomposition method which is guaranteed to correctly recover the parameters. The key insight is to employ certain feature transformations of the input, which depend on the input generative model. Specifically, we employ score function tensors of the input and compute their cross-correlation with the response variable. We establish that the decomposition of this tensor consistently recovers the parameters, under mild non-degeneracy conditions. We demonstrate that the computational and sample complexity of our method is a low order polynomial of the input and the latent dimensions.", "date": "2014-12-09", "date_type": "published", "publication": "Proceedings of Machine Learning Research", "volume": "51", "publisher": "PMLR", "pagerange": "1223-1231", "id_number": "CaltechAUTHORS:20190401-162921773", "issn": "2640-3498", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20190401-162921773", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NSF", "grant_number": "FG15890" }, { "agency": "NSF", "grant_number": "FG16455" }, { "agency": "Microsoft Faculty Fellowship" }, { "agency": "NSF", "grant_number": "CCF-1254106" }, { "agency": "Office of Naval Research (ONR)", "grant_number": "N00014-14-1-0665" } ] }, "doi": "10.48550/arXiv.1412.3046", "primary_object": { "basename": "sedghi16-supp.pdf", "url": "https://authors.library.caltech.edu/records/kj7vp-een86/files/sedghi16-supp.pdf" }, "related_objects": [ { "basename": "sedghi16.pdf", "url": "https://authors.library.caltech.edu/records/kj7vp-een86/files/sedghi16.pdf" }, { "basename": "1412.3046.pdf", "url": 
"https://authors.library.caltech.edu/records/kj7vp-een86/files/1412.3046.pdf" } ], "resource_type": "article", "pub_year": "2014", "author_list": "Sedghi, Hanie; Janzamin, Majid; et el." }, { "id": "https://authors.library.caltech.edu/records/9wngc-yb438", "eprint_id": 81881, "eprint_status": "archive", "datestamp": "2023-08-20 02:22:19", "lastmod": "2023-10-17 21:53:46", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Anandkumar-A", "name": { "family": "Anandkumar", "given": "Animashree" } }, { "id": "Ge-Rong", "name": { "family": "Ge", "given": "Rong" } }, { "id": "Hsu-Daniel", "name": { "family": "Hsu", "given": "Daniel" } }, { "id": "Kakade-S-M", "name": { "family": "Kakade", "given": "Sham M." } }, { "id": "Telgarsky-M", "name": { "family": "Telgarsky", "given": "Matus" } } ] }, "title": "Tensor Decompositions for Learning Latent Variable Models", "ispublished": "pub", "full_text_status": "public", "keywords": "latent variable models, tensor decompositions, mixture models, topic models, method of moments, power method", "note": "\u00a9 2014 Animashree Anandkumar, Rong Ge, Daniel Hsu, Sham M. Kakade, and Matus Telgarsky. \n\nSubmitted 2/13; Revised 3/14; Published 8/14. \n\nWe thank Boaz Barak, Dean Foster, Jon Kelner, and Greg Valiant for helpful discussions. We are also grateful to Hanzhang Hu, Drew Bagnell, and Martial Hebert for alerting us of an issue with Theorem 4.2 and suggesting a simple fix. This work was completed while DH was a postdoctoral researcher at Microsoft Research New England, and partly while AA, RG, and MT were visiting the same lab. AA is supported in part by the NSF Award CCF-1219234, AFOSR Award FA9550-10-1-0310 and the ARO Award W911NF-12-1-0404.\n\nPublished - anandkumar14b.pdf
Submitted - 1210.7559.pdf
", "abstract": "This work considers a computationally and statistically efficient parameter estimation method for a wide class of latent variable models---including Gaussian mixture models, hidden Markov models, and latent Dirichlet allocation---which exploits a certain tensor structure in their low-order observable moments (typically, of second- and third-order). Specifically, parameter estimation is reduced to the problem of extracting a certain (orthogonal) decomposition of a symmetric tensor derived from the moments; this decomposition can be viewed as a natural generalization of the singular value decomposition for matrices. Although tensor decompositions are generally intractable to compute, the decomposition of these specially structured tensors can be efficiently obtained by a variety of approaches, including power iterations and maximization approaches (similar to the case of matrices). A detailed analysis of a robust tensor power method is provided, establishing an analogue of Wedin's perturbation theorem for the singular vectors of matrices. 
This implies a robust and computationally tractable estimation approach for several popular latent variable models.", "date": "2014-08", "date_type": "published", "publication": "Journal of Machine Learning Research", "volume": "15", "publisher": "Journal of Machine Learning Research", "pagerange": "2773-2832", "id_number": "CaltechAUTHORS:20170927-134735763", "issn": "1533-7928", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170927-134735763", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "Microsoft Research" }, { "agency": "NSF", "grant_number": "CCF-1219234" }, { "agency": "Air Force Office of Scientific Research (AFOSR)", "grant_number": "FA9550-10-1-0310" }, { "agency": "Army Research Office (ARO)", "grant_number": "W911NF-12-1-0404" } ] }, "doi": "10.48550/arXiv.1210.7559", "primary_object": { "basename": "1210.7559.pdf", "url": "https://authors.library.caltech.edu/records/9wngc-yb438/files/1210.7559.pdf" }, "related_objects": [ { "basename": "anandkumar14b.pdf", "url": "https://authors.library.caltech.edu/records/9wngc-yb438/files/anandkumar14b.pdf" } ], "resource_type": "article", "pub_year": "2014", "author_list": "Anandkumar, Animashree; Ge, Rong; et el." }, { "id": "https://authors.library.caltech.edu/records/0ewyv-97w09", "eprint_id": 81871, "eprint_status": "archive", "datestamp": "2023-08-20 01:16:51", "lastmod": "2023-10-17 21:53:28", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Anandkumar-A", "name": { "family": "Anandkumar", "given": "Animashree" } }, { "id": "Ge-Rong", "name": { "family": "Ge", "given": "Rong" } }, { "id": "Hsu-Daniel", "name": { "family": "Hsu", "given": "Daniel" } }, { "id": "Kakade-S-M", "name": { "family": "Kakade", "given": "Sham M." 
} } ] }, "title": "A Tensor Approach to Learning Mixed Membership Community Models", "ispublished": "pub", "full_text_status": "public", "keywords": "community detection, spectral methods, tensor methods, moment-based estimation, mixed membership models", "note": "\u00a9 2014 Anima Anandkumar, Rong Ge, Daniel Hsu, Sham Kakade. \n\nSubmitted 7/13; Revised 11/13; Published 10/00. \n\nWe thank the JMLR Action Editor Nathan Srebro and the anonymous reviewers for comments which significantly improved this manuscript. We thank Jure Leskovec for helpful discussions regarding various community models. Part of this work was done when AA, RG, and DH were at MSR New England. AA is supported in part by the Microsoft faculty fellowship, NSF Career award CCF-1254106, NSF Award CCF-1219234 and the ARO YIP Award W911NF-13-1-0084.\n\nPublished - anandkumar14a.pdf
Submitted - 1302.2684.pdf
", "abstract": "Community detection is the task of detecting hidden communities from observed interactions. Guaranteed community detection has so far been mostly limited to models with non-overlapping communities such as the stochastic block model. In this paper, we remove this restriction, and provide guaranteed community detection for a family of probabilistic network models with overlapping communities, termed as the mixed membership Dirichlet model, first introduced by Airoldi et al. (2008). This model allows for nodes to have fractional memberships in multiple communities and assumes that the community memberships are drawn from a Dirichlet distribution. Moreover, it contains the stochastic block model as a special case. We propose a unified approach to learning these models via a tensor spectral decomposition method. Our estimator is based on low-order moment tensor of the observed network, consisting of 33-star counts. Our learning method is fast and is based on simple linear algebraic operations, e.g., singular value decomposition and tensor power iterations. We provide guaranteed recovery of community memberships and model parameters and present a careful finite sample analysis of our learning method. 
As an important special case, our results match the best known scaling requirements for the (homogeneous) stochastic block model.", "date": "2014-06", "date_type": "published", "publication": "Journal of Machine Learning Research", "volume": "15", "publisher": "Journal of Machine Learning Research", "pagerange": "2239-2312", "id_number": "CaltechAUTHORS:20170927-093022023", "issn": "1533-7928", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170927-093022023", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "Microsoft Research" }, { "agency": "NSF", "grant_number": "CCF-1254106" }, { "agency": "NSF", "grant_number": "CCF-1219234" }, { "agency": "Army Research Office (ARO)", "grant_number": "W911NF-13-1-0084" } ] }, "doi": "10.48550/arXiv.1302.2684", "primary_object": { "basename": "1302.2684.pdf", "url": "https://authors.library.caltech.edu/records/0ewyv-97w09/files/1302.2684.pdf" }, "related_objects": [ { "basename": "anandkumar14a.pdf", "url": "https://authors.library.caltech.edu/records/0ewyv-97w09/files/anandkumar14a.pdf" } ], "resource_type": "article", "pub_year": "2014", "author_list": "Anandkumar, Animashree; Ge, Rong; et el." }, { "id": "https://authors.library.caltech.edu/records/dh3b9-e9480", "eprint_id": 81805, "eprint_status": "archive", "datestamp": "2023-08-20 00:25:45", "lastmod": "2023-10-17 21:50:21", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Sattari-P", "name": { "family": "Sattari", "given": "Pegah" } }, { "id": "Kurant-M", "name": { "family": "Kurant", "given": "Maciej" } }, { "id": "Anandkumar-A", "name": { "family": "Anandkumar", "given": "Animashree" } }, { "id": "Markopoulou-A", "name": { "family": "Markopoulou", "given": "Athina" } }, { "id": "Rabbat-M-G", "name": { "family": "Rabbat", "given": "Michael G." 
} } ] }, "title": "Active Learning of Multiple Source Multiple Destination Topologies", "ispublished": "pub", "full_text_status": "public", "keywords": "Active hypothesis testing, adaptive sensing algorithms,\napplications of statistical signal processing techniques, inference and estimation on graphs, Internet, network monitoring,\nsequential learning, tomography", "note": "\u00a9 2014 IEEE.\n\nManuscript received July 27, 2013; accepted January 16, 2014. Date of publication February 04, 2014; date of current version March 17, 2014. \n\nThe associate editor coordinating the review of this manuscript and approving it for publication was Prof. Shuguang (Robert) Cui. This work was supported by an NSF Award 1028394, AFOSR Award FA9550-10-1-0310 and AFOSR MURI FA9550-09-0643. The work of M. Rabbat was supported in part by the Natural Sciences and Engineering Research Council of Canada.\n\nSubmitted - 1212.2310.pdf
", "abstract": "We consider the problem of inferring the topology of a network with M sources and N receivers (an M-by- N network), by sending probes between the sources and receivers. Prior work has shown that this problem can be decomposed into two parts: first, infer smaller subnetwork components (1-by- N's or 2-by-2's) and then merge them to identify the M-by- N topology. We focus on the second part, which had previously received less attention in the literature. We assume that a 1-by- N topology is given and that all 2-by-2 components can be queried and learned using end-to-end probes. The problem is which 2-by-2's to query and how to merge them with the given 1-by- N, so as to exactly identify the 2-by- N topology, and optimize a number of performance metrics, including the number of queries (which directly translates into measurement bandwidth), time complexity, and memory usage. We provide a lower bound, [N/2], on the number of 2-by-2's required by any active learning algorithm and propose two greedy algorithms. The first algorithm follows the framework of multiple hypothesis testing, in particular Generalized Binary Search (GBS). The second algorithm is called the Receiver Elimination Algorithm (REA) and follows a bottom-up approach. It requires exactly N-1 steps, which is much less than all (2N) possible 2-by-2's. 
Simulation results demonstrate that both algorithms correctly identify the 2-by-N topology and are near-optimal, but REA is more efficient in practice.", "date": "2014-04-15", "date_type": "published", "publication": "IEEE Transactions on Signal Processing", "volume": "62", "number": "8", "publisher": "IEEE", "pagerange": "1926-1937", "id_number": "CaltechAUTHORS:20170925-101553300", "issn": "1053-587X", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170925-101553300", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NSF", "grant_number": "OIA-1028394" }, { "agency": "Air Force Office of Scientific Research (AFOSR)", "grant_number": "FA9550-10-1-0310" }, { "agency": "Air Force Office of Scientific Research (AFOSR)", "grant_number": "FA9550-09-0643" }, { "agency": "Natural Sciences and Engineering Research Council of Canada (NSERC)" } ] }, "doi": "10.1109/TSP.2014.2304431", "primary_object": { "basename": "1212.2310.pdf", "url": "https://authors.library.caltech.edu/records/dh3b9-e9480/files/1212.2310.pdf" }, "resource_type": "article", "pub_year": "2014", "author_list": "Sattari, Pegah; Kurant, Maciej; et al." 
}, { "id": "https://authors.library.caltech.edu/records/02zya-rxd68", "eprint_id": 81883, "eprint_status": "archive", "datestamp": "2023-08-20 00:17:37", "lastmod": "2023-10-17 21:53:53", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Janzamin-M", "name": { "family": "Janzamin", "given": "Majid" } }, { "id": "Anandkumar-A", "name": { "family": "Anandkumar", "given": "Animashree" } } ] }, "title": "High-Dimensional Covariance Decomposition into Sparse Markov and Independence Models", "ispublished": "pub", "full_text_status": "public", "keywords": "high-dimensional covariance estimation, sparse graphical model selection, sparse covariance models, sparsistency, convex optimization", "note": "\u00a9 2014 Majid Janzamin and Animashree Anandkumar. \n\nWe thank Karthik Mohan for helpful discussions on running experiments. We also acknowledge useful discussions with Max Welling, Babak Hassibi and Martin Wainwright. We also thank Bin Yu and the JMLR reviewers for valuable comments that have significantly improved the manuscript. M. Janzamin is supported by NSF Award CCF-1219234 and ARO Award W911NF-12-1-0404. A. Anandkumar is supported in part by Microsoft Faculty Fellowship, NSF Career award CCF-1254106, NSF Award CCF-1219234, AFOSR Award FA9550-10-1-0310, and ARO Award W911NF-12-1-0404.\n\nPublished - p1549-janzamin.pdf
Submitted - 1211.0919.pdf
", "abstract": "Fitting high-dimensional data involves a delicate tradeoff between faithful representation and the use of sparse models. Too often, sparsity assumptions on the fitted model are too restrictive to provide a faithful representation of the observed data. In this paper, we present a novel framework incorporating sparsity in different domains. We decompose the observed covariance matrix into a sparse Gaussian Markov model (with a sparse precision matrix) and a sparse independence model (with a sparse covariance matrix). Our framework incorporates sparse covariance and sparse precision estimation as special cases and thus introduces a richer class of high-dimensional models. We posit the observed data as generated from a linear combination of a sparse Gaussian Markov model (with a sparse precision matrix) and a sparse Gaussian independence model (with a sparse covariance matrix). We characterize sufficient conditions for identifiability of the two models, viz., Markov and independence models. We propose an efficient decomposition method based on a modification of the popular \u2113_1-penalized maximum- likelihood estimator (\u2113_1-MLE). We establish that our estimator is consistent in both the domains, i.e., it successfully recovers the supports of both Markov and independence models, when the number of samples n scales as n=\u03a9(d^2log p), where p is the number of variables and d is the maximum node degree in the Markov model. 
Our experiments validate these results and also demonstrate that our models have better inference accuracy under simple algorithms such as loopy belief propagation.", "date": "2014-04", "date_type": "published", "publication": "Journal of Machine Learning Research", "volume": "15", "publisher": "Journal of Machine Learning Research", "pagerange": "1549-1591", "id_number": "CaltechAUTHORS:20170927-142820777", "issn": "1533-7928", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170927-142820777", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NSF", "grant_number": "CCF-1219234" }, { "agency": "Army Research Office (ARO)", "grant_number": "W911NF-12-1-0404" }, { "agency": "Microsoft Research" }, { "agency": "NSF", "grant_number": "CCF-1254106" }, { "agency": "Air Force Office of Scientific Research (AFOSR)", "grant_number": "FA9550-10-1-0310" } ] }, "doi": "10.48550/arXiv.1211.0919", "primary_object": { "basename": "1211.0919.pdf", "url": "https://authors.library.caltech.edu/records/02zya-rxd68/files/1211.0919.pdf" }, "related_objects": [ { "basename": "p1549-janzamin.pdf", "url": "https://authors.library.caltech.edu/records/02zya-rxd68/files/p1549-janzamin.pdf" } ], "resource_type": "article", "pub_year": "2014", "author_list": "Janzamin, Majid and Anandkumar, Animashree" }, { "id": "https://authors.library.caltech.edu/records/nas3a-knz03", "eprint_id": 81631, "eprint_status": "archive", "datestamp": "2023-08-22 11:01:54", "lastmod": "2023-10-17 20:55:08", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Anandkumar-A", "name": { "family": "Anandkumar", "given": "Animashree" } }, { "id": "He-Ting", "name": { "family": "He", "given": "Ting" } }, { "id": "Bisdikian-C", "name": { "family": "Bisdikian", "given": "Chatschik" } }, { "id": "Agrawal-D", "name": { "family": "Agrawal", "given": "Dakshi" } } ] }, "title": "Seeing 
through black boxes: Tracking transactions through queues under monitoring resource constraints", "ispublished": "pub", "full_text_status": "public", "keywords": "Probabilistic transaction monitoring; Queueing networks; Stochastic comparison; Bipartite matching", "note": "\u00a9 2013 Elsevier B.V. \n\nReceived 10 February 2010, Revised 1 August 2011, Accepted 3 August 2013, Available online 24 August 2013. \n\nThe authors thank R. Nunez Queija for discussions on the processor-sharing queue and Varun Gupta for discussions on the notion of convex order at the MAMA 2009 workshop.\n\nSubmitted - 1006.1674.pdf
", "abstract": "The problem of optimal allocation of monitoring resources for tracking transactions progressing through a distributed system, modeled as a queueing network, is considered. Two forms of monitoring information are considered, viz., locally unique transaction identifiers, and arrival and departure timestamps of transactions at each processing queue. The timestamps are assumed to be available at all the queues but in the absence of identifiers, only enable imprecise tracking since parallel processing can result in out-of-order departures. On the other hand, identifiers enable precise tracking but are not available without proper instrumentation. Given an instrumentation budget, only a subset of queues can be selected for the production of identifiers, while the remaining queues have to resort to imprecise tracking using timestamps. The goal is then to optimally allocate the instrumentation budget to maximize the overall tracking accuracy. The challenge is that the optimal allocation strategy depends on accuracies of timestamp-based tracking at different queues, which has complex dependencies on the arrival and service processes, and the queueing discipline. We propose two simple heuristics for allocation by predicting the order of timestamp-based tracking accuracies of different queues. We derive sufficient conditions for these heuristics to achieve optimality through the notion of the stochastic comparison of queues. 
Simulations show that our heuristics are close to optimality, even when the parameters deviate from these conditions.", "date": "2013-12", "date_type": "published", "publication": "Performance Evaluation", "volume": "70", "number": "12", "publisher": "Elsevier", "pagerange": "1090-1110", "id_number": "CaltechAUTHORS:20170920-142253537", "issn": "0166-5316", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170920-142253537", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "doi": "10.1016/j.peva.2013.08.003", "primary_object": { "basename": "1006.1674.pdf", "url": "https://authors.library.caltech.edu/records/nas3a-knz03/files/1006.1674.pdf" }, "resource_type": "article", "pub_year": "2013", "author_list": "Anandkumar, Animashree; He, Ting; et el." }, { "id": "https://authors.library.caltech.edu/records/w15ds-38039", "eprint_id": 81626, "eprint_status": "archive", "datestamp": "2023-08-22 10:03:18", "lastmod": "2023-10-17 20:54:54", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Anandkumar-A", "name": { "family": "Anandkumar", "given": "Animashree" } }, { "id": "Hassidim-A", "name": { "family": "Hassidim", "given": "Avinatan" } }, { "id": "Kelner-J", "name": { "family": "Kelner", "given": "Jonathan" } } ] }, "title": "Topology discovery of sparse random graphs with few participants", "ispublished": "pub", "full_text_status": "public", "keywords": "topology discovery; sparse random graphs; end-to-end measurements; hidden nodes; quartet tests", "note": "\u00a9 2012 Wiley Periodicals, Inc. \n\nIssue online: 20 June 2013; Version of record online: 27 April 2012; Manuscript Accepted: 17 February 2012; Manuscript Received: 23 March 2011. \n\nSupported in part by the setup funds at UCI and the AFOSR Award (FA9550-10-1-0310). \n\nA shorter version appears in Proceedings of ACM SIGMETRICS, June 2011.\n\nSubmitted - 1102.5063.pdf
", "abstract": "We consider the task of topology discovery of sparse random graphs using end-to-end random measurements (e.g., delay) between a subset of nodes, referred to as the participants. The rest of the nodes are hidden, and do not provide any information for topology discovery. We consider topology discovery under two routing models: (a) the participants exchange messages along the shortest paths and obtain end-to-end measurements, and (b) additionally, the participants exchange messages along the second shortest path. For scenario (a), our proposed algorithm results in a sub-linear edit-distance guarantee using a sub-linear number of uniformly selected participants. For scenario (b), we obtain a much stronger result, and show that we can achieve consistent reconstruction when a sub-linear number of uniformly selected nodes participate. This implies that accurate discovery of sparse random graphs is tractable using an extremely small number of participants. We finally obtain a lower bound on the number of participants required by any algorithm to reconstruct the original random graph up to a given edit distance. 
We also demonstrate that while consistent discovery is tractable for sparse random graphs using a small number of participants, in general, there are graphs which cannot be discovered by any algorithm even with a significant number of participants, and with the availability of end-to-end information along all the paths between the participants.", "date": "2013-08", "date_type": "published", "publication": "Random Structures & Algorithms", "volume": "43", "number": "1", "publisher": "Wiley", "pagerange": "16-48", "id_number": "CaltechAUTHORS:20170920-132342501", "issn": "1042-9832", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170920-132342501", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "University of California, Irvine" }, { "agency": "Air Force Office of Scientific Research (AFOSR)", "grant_number": "FA9550-10-1-0310" } ] }, "doi": "10.1002/rsa.20420", "primary_object": { "basename": "1102.5063.pdf", "url": "https://authors.library.caltech.edu/records/w15ds-38039/files/1102.5063.pdf" }, "resource_type": "article", "pub_year": "2013", "author_list": "Anandkumar, Animashree; Hassidim, Avinatan; et el." }, { "id": "https://authors.library.caltech.edu/records/04z10-a8295", "eprint_id": 81877, "eprint_status": "archive", "datestamp": "2023-08-19 14:06:11", "lastmod": "2023-10-17 21:53:39", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Anandkumar-A", "name": { "family": "Anandkumar", "given": "Animashree" } }, { "id": "Valluvan-R", "name": { "family": "Valluvan", "given": "Ragupathyraj" } } ] }, "title": "Learning loopy graphical models with latent variables: Efficient methods and guarantees", "ispublished": "pub", "full_text_status": "public", "keywords": "Graphical model selection, latent variables, quartet methods", "note": "\u00a9 2013 Institute of Mathematical Statistics. 
\n\nReceived March 2012; revised October 2012. \n\nSupported in part by NSF Award CCF-1219234, AFOSR Award FA9550-10-1-0310 and ARO Award W911NF-12-1-0404. \n\nSupported by the ONR Award N00014-08-1-1015. \n\nThe authors thank E. Mossel (Berkeley) for detailed discussions in the beginning regarding problem formulation, modeling and algorithmic approaches and Padhraic Smyth (UCI) and David Newman (UCI) for evaluation measures for topic models. The authors also thank the editor Tony Cai (Wharton) and anonymous reviewers whose comments substantially improved the paper. An abridged version of this work appears in the Proceedings of NIPS 2012.\n\nPublished - euclid.aos.1366138196.pdf
Submitted - 1203.3887.pdf
Supplemental Material - euclid.aos.1366138196_si.pdf
", "abstract": "The problem of structure estimation in graphical models with latent variables is considered. We characterize conditions for tractable graph estimation and develop efficient methods with provable guarantees. We consider models where the underlying Markov graph is locally tree-like, and the model is in the regime of correlation decay. For the special case of the Ising model, the number of samples n required for structural consistency of our method scales as n=\u03a9(\u03b8^(\u2212\u03b4\u03b7(\u03b7+1)\u22122)_(min)log p), where p is the number of variables, \u03b8_(min) is the minimum edge potential, \u03b4 is the depth (i.e., distance from a hidden node to the nearest observed nodes), and \u03b7 is a parameter which depends on the bounds on node and edge potentials in the Ising model. Necessary conditions for structural consistency under any algorithm are derived and our method nearly matches the lower bound on sample requirements. Further, the proposed method is practical to implement and provides flexibility to control the number of latent variables and the cycle lengths in the output graph.", "date": "2013", "date_type": "published", "publication": "Annals of Statistics", "volume": "41", "number": "2", "publisher": "Institute of Mathematical Statistics", "pagerange": "401-435", "id_number": "CaltechAUTHORS:20170927-104250746", "issn": "0090-5364", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170927-104250746", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "NSF", "grant_number": "CCF-1219234" }, { "agency": "Air Force Office of Scientific Research (AFOSR)", "grant_number": "FA9550-10-1-0310" }, { "agency": "Army Research Office (ARO)", "grant_number": "W911NF-12-1-0404" }, { "agency": "Office of Naval Research (ONR)", "grant_number": "N00014-08-1-1015" } ] }, "doi": "10.48550/arXiv.1203.3887", "primary_object": { "basename": 
"1203.3887.pdf", "url": "https://authors.library.caltech.edu/records/04z10-a8295/files/1203.3887.pdf" }, "related_objects": [ { "basename": "euclid.aos.1366138196.pdf", "url": "https://authors.library.caltech.edu/records/04z10-a8295/files/euclid.aos.1366138196.pdf" }, { "basename": "euclid.aos.1366138196_si.pdf", "url": "https://authors.library.caltech.edu/records/04z10-a8295/files/euclid.aos.1366138196_si.pdf" } ], "resource_type": "article", "pub_year": "2013", "author_list": "Anandkumar, Animashree and Valluvan, Ragupathyraj" }, { "id": "https://authors.library.caltech.edu/records/96w9p-6n432", "eprint_id": 33349, "eprint_status": "archive", "datestamp": "2023-08-22 06:16:28", "lastmod": "2023-10-18 19:00:30", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Liu-Ying", "name": { "family": "Liu", "given": "Ying" } }, { "id": "Chandrasekaran-V", "name": { "family": "Chandrasekaran", "given": "Venkat" } }, { "id": "Anandkumar-A", "name": { "family": "Anandkumar", "given": "Animashree" } }, { "id": "Willsky-A-S", "name": { "family": "Willsky", "given": "Alan S." } } ] }, "title": "Feedback Message Passing for Inference in Gaussian Graphical Models", "ispublished": "pub", "full_text_status": "public", "keywords": "Belief propagation; feedback vertex set; Gaussian graphical models; graphs with cycles; Markov random field", "note": "\u00a9 2012 IEEE. \n\nManuscript received May 09, 2011; revised January 09, 2012; accepted April 03, 2012. Date of publication May 03, 2012; date of current version July 10, 2012. \n\nThe associate editor coordinating the review of this manuscript and approving it for publication was Prof. Raviv Raich. This research was supported in part by AFOSR through Grant FA9550-08-1-1080 and in part by Shell International Exploration and Production, Inc. This paper was presented in part at the International Symposium of Information Theory, Austin, Texas, 2010. \n\nThe authors would like to thank D. Shah, J. 
Dauwels, V. Tan\nfor helpful discussions, and the reviewers for their constructive comments.\n\nSubmitted - 1105.1853.pdf
", "abstract": "While loopy belief propagation (LBP) performs reasonably well for inference in some Gaussian graphical models with cycles, its performance is unsatisfactory for many others. In particular for some models LBP does not converge, and in general when it does converge, the computed variances are incorrect (except for cycle-free graphs for which belief propagation (BP) is non-iterative and exact). In this paper we propose feedback message passing (FMP), a message-passing algorithm that makes use of a special set of vertices (called a feedback vertex set or FVS) whose removal results in a cycle-free graph. In FMP, standard BP is employed several times on the cycle-free subgraph excluding the FVS while a special message-passing scheme is used for the nodes in the FVS. The computational complexity of exact inference is O(k^(2)n), where is the number of feedback nodes, and is the total number of nodes. When the size of the FVS is very large, FMP is computationally costly. Hence we propose approximate FMP, where a pseudo-FVS is used instead of an FVS, and where inference in the non-cycle-free graph obtained by removing the pseudo-FVS is carried out approximately using LBP. We show that, when approximate FMP converges, it yields exact means and variances on the pseudo-FVS and exact means throughout the remainder of the graph. We also provide theoretical results on the convergence and accuracy of approximate FMP. In particular, we prove error bounds on variance computation. Based on these theoretical results, we design efficient algorithms to select a pseudo-FVS of bounded size. The choice of the pseudo-FVS allows us to explicitly trade off between efficiency and accuracy. 
Experimental results show that using a pseudo-FVS of size no larger than log (n), this procedure converges much more often, more quickly, and provides more accurate results than LBP on the entire graph.", "date": "2012-08", "date_type": "published", "publication": "IEEE Transactions on Signal Processing", "volume": "60", "number": "8", "publisher": "IEEE", "pagerange": "4135-4150", "id_number": "CaltechAUTHORS:20120820-094221711", "issn": "1053-587X", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20120820-094221711", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "Air Force Office of Scientific Research (AFOSR)", "grant_number": "FA9550-08-1-1080" }, { "agency": "Shell International Exploration and Production, Inc." } ] }, "doi": "10.1109/TSP.2012.2195656", "primary_object": { "basename": "1105.1853.pdf", "url": "https://authors.library.caltech.edu/records/96w9p-6n432/files/1105.1853.pdf" }, "resource_type": "article", "pub_year": "2012", "author_list": "Liu, Ying; Chandrasekaran, Venkat; et el." }, { "id": "https://authors.library.caltech.edu/records/a5bhb-28r93", "eprint_id": 81870, "eprint_status": "archive", "datestamp": "2023-08-19 12:00:00", "lastmod": "2023-10-17 21:53:26", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Anandkumar-A", "name": { "family": "Anandkumar", "given": "Animashree" } }, { "id": "Tan-Vincent-Y-F", "name": { "family": "Tan", "given": "Vincent Y. F." } }, { "id": "Huang-Furong", "name": { "family": "Huang", "given": "Furong" } }, { "id": "Willsky-A-S", "name": { "family": "Willsky", "given": "Alan S." } } ] }, "title": "High-Dimensional Gaussian Graphical Model Selection: Walk Summability and Local Separation Criterion", "ispublished": "pub", "full_text_status": "public", "note": "\u00a9 2012 Animashree Anandkumar, Vincent Tan, Furong Huang and Alan Willsky. 
\n\nSubmitted 7/11; Revised 4/12; Published 8/12. \n\nAn abridged version of this paper appeared in the Proceedings of NIPS 2011. The first author is supported in part by the setup funds at UCI and the AFOSR Award FA9550-10-1-0310, the second author is supported by A*STAR, Singapore and the third author is supported in part by AFOSR under Grant FA9550-08-1-1080. The authors thank Venkat Chandrasekaran (UC Berkeley) for discussions on walk-summable models, Elchanan Mossel (UC Berkeley) for discussions on the necessary conditions for model selection and Divyanshu Vats (U. Minn.) for extensive comments. \n\nThe authors thank the Associate Editor Martin Wainwright (Berkeley) and the anonymous reviewers for comments which significantly improved this manuscript.\n\nPublished - anandkumar12a.pdf
Submitted - 1107.1270.pdf
", "abstract": "We consider the problem of high-dimensional Gaussian graphical model selection. We identify a set of graphs for which an efficient estimation algorithm exists, and this algorithm is based on thresholding of empirical conditional covariances. Under a set of transparent conditions, we establish structural consistency (or sparsistency) for the proposed algorithm, when the number of samples n=\u03a9(J_(min)^(-2) log p), where p is the number of variables and J_(min) is the minimum (absolute) edge potential of the graphical model. The sufficient conditions for sparsistency are based on the notion of walk-summability of the model and the presence of sparse local vertex separators in the underlying graph. We also derive novel non-asymptotic necessary conditions on the number of samples required for sparsistency.", "date": "2012-08", "date_type": "published", "publication": "Journal of Machine Learning Research", "volume": "13", "publisher": "Journal of Machine Learning Research", "pagerange": "2293-2337", "id_number": "CaltechAUTHORS:20170927-091743601", "issn": "1533-7928", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170927-091743601", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "University of California, Irvine" }, { "agency": "Air Force Office of Scientific Research (AFOSR)", "grant_number": "FA9550-10-1-0310" }, { "agency": "Agency for Science, Technology and Research (A*STAR)" }, { "agency": "Air Force Office of Scientific Research (AFOSR)", "grant_number": "FA9550-08-1-1080" } ] }, "doi": "10.48550/arXiv.1107.1270", "primary_object": { "basename": "1107.1270.pdf", "url": "https://authors.library.caltech.edu/records/a5bhb-28r93/files/1107.1270.pdf" }, "related_objects": [ { "basename": "anandkumar12a.pdf", "url": "https://authors.library.caltech.edu/records/a5bhb-28r93/files/anandkumar12a.pdf" } ], "resource_type": "article", 
"pub_year": "2012", "author_list": "Anandkumar, Animashree; Tan, Vincent Y. F.; et el." }, { "id": "https://authors.library.caltech.edu/records/zm5sy-k6p16", "eprint_id": 81876, "eprint_status": "archive", "datestamp": "2023-08-19 11:13:58", "lastmod": "2023-10-17 21:53:36", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Anandkumar-A", "name": { "family": "Anandkumar", "given": "Animashree" }, "orcid": "0000-0002-6974-6797" }, { "id": "Tan-Vincent-Y-F", "name": { "family": "Tan", "given": "Vincent Y. F." } }, { "id": "Huang-Furong", "name": { "family": "Huang", "given": "Furong" } }, { "id": "Willsky-Alan-S", "name": { "family": "Willsky", "given": "Alan S." } } ] }, "title": "High-dimensional structure estimation in Ising models: Local separation criterion", "ispublished": "pub", "full_text_status": "public", "keywords": "Ising models, graphical model selection, local-separation property", "note": "\u00a9 2012 Institute of Mathematical Statistics. \n\nSupported by the setup funds at UCI and the AFOSR Award FA9550-10-1-0310. \n\nSupported in part by A*STAR, Singapore. \n\nSupported in part by AFOSR under Grant FA9550-08-1-1080. \n\nThe authors thank Sujay Sanghavi (U.T. Austin), Elchanan Mossel (UC Berkeley), Martin Wainwright (UC Berkeley), Sebastien Roch (UCLA), Rui Wu (UIUC) and Divyanshu Vats (U. Minn.) for extensive comments, and B\u00e9la Bollob\u00e1s (Cambridge) for discussions on random graphs. \n\nThe authors thank the anonymous reviewers and the co-editor Peter B\u00fchlmann (ETH) for valuable comments that significantly improved this manuscript.\n\nPublished - euclid.aos.1344610586.pdf
Accepted Version - 1107.1736.pdf
Supplemental Material - euclid.aos.1344610586_si.pdf
", "abstract": "We consider the problem of high-dimensional Ising (graphical) model selection. We propose a simple algorithm for structure estimation based on the thresholding of the empirical conditional variation distances. We introduce a novel criterion for tractable graph families, where this method is efficient, based on the presence of sparse local separators between node pairs in the underlying graph. For such graphs, the proposed algorithm has a sample complexity of n=\u03a9(J^(\u22122)_(min)log p), where p is the number of variables, and J_(min) is the minimum (absolute) edge potential in the model. We also establish nonasymptotic necessary and sufficient conditions for structure estimation.", "date": "2012-06", "date_type": "published", "publication": "Annals of Statistics", "volume": "40", "number": "3", "publisher": "Institute of Mathematical Statistics", "pagerange": "1346-1375", "id_number": "CaltechAUTHORS:20170927-101515951", "issn": "0090-5364", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170927-101515951", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "Air Force Office of Scientific Research (AFOSR)", "grant_number": "FA9550-10-1-0310" }, { "agency": "Agency for Science, Technology and Research (A*STAR)" }, { "agency": "Air Force Office of Scientific Research (AFOSR)", "grant_number": "FA9550-08-1-1080" }, { "agency": "University of California, Irvine" } ] }, "doi": "10.48550/arXiv.1107.1736", "primary_object": { "basename": "1107.1736.pdf", "url": "https://authors.library.caltech.edu/records/zm5sy-k6p16/files/1107.1736.pdf" }, "related_objects": [ { "basename": "euclid.aos.1344610586.pdf", "url": "https://authors.library.caltech.edu/records/zm5sy-k6p16/files/euclid.aos.1344610586.pdf" }, { "basename": "euclid.aos.1344610586_si.pdf", "url": 
"https://authors.library.caltech.edu/records/zm5sy-k6p16/files/euclid.aos.1344610586_si.pdf" } ], "resource_type": "article", "pub_year": "2012", "author_list": "Anandkumar, Animashree; Tan, Vincent Y. F.; et el." }, { "id": "https://authors.library.caltech.edu/records/hsndz-x7g55", "eprint_id": 81801, "eprint_status": "archive", "datestamp": "2023-08-19 08:33:43", "lastmod": "2023-10-17 21:50:01", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Anandkumar-Amod-J-G", "name": { "family": "Anandkumar", "given": "Amod J. G." } }, { "id": "Anandkumar-A", "name": { "family": "Anandkumar", "given": "Animashree" } }, { "id": "Lambotharan-S", "name": { "family": "Lambotharan", "given": "Sangarapillai" } }, { "id": "Chambers-J-A", "name": { "family": "Chambers", "given": "Jonathon A." } } ] }, "title": "Robust Rate Maximization Game Under Bounded Channel Uncertainty", "ispublished": "pub", "full_text_status": "restricted", "keywords": "Channel-state information (CSI) uncertainty, game theory, Nash equilibrium, rate maximization, robust games, waterfilling", "note": "\u00a9 2011 IEEE. \n\nManuscript received March 1, 2011; revised July 10, 2011; accepted September 2, 2011. Date of publication October 10, 2011; date of current version December 9, 2011. \n\nThis work was supported in part by the Engineering\nand Physical Sciences Research Council (EPSRC) under Grant EP/F065477/1. The work of A. Anandkumar was supported in part by ARO Grant W911NF-06-1-0076 and in part by setup funds at UCI and AFOSR award FA9550-10-1-0310. This paper was presented in part at the 2010 IEEE International Conference on Acoustics, Speech, and Signal Processing and the 43rd Asilomar Conference on Signals, Systems and Computers. The review of this paper was coordinated by Dr. S. Zhong. \n\nThe authors would like to thank Dr. I. Menache of Microsoft\nResearch for his input on the robust game theory, Dr. G. 
Scutari of the University of Illinois at Urbana-Champaign for his initial guidance and advice on waterfilling algorithms, P. von Wrycza of the Royal Institute of Technology (KTH) and Dr. M. R. Bhavani Shankar of the University of Luxembourg for pointing out a typographical error in an early version of the proofs, Prof. B. Ottersten of KTH for the valuable discussions, and the anonymous reviewers for their valuable feedback.", "abstract": "We consider the problem of decentralized power allocation for competitive rate maximization in a frequency-selective Gaussian interference channel under bounded channel uncertainty. We formulate a distribution-free robust framework for the rate maximization game. We present the robust optimization equilibrium for this game and derive sufficient conditions for its existence and uniqueness. We show that an iterative waterfilling algorithm converges to this equilibrium under certain sufficient conditions. We analyze the social properties of the equilibrium under varying channel uncertainty bounds for the two-user case. We also observe an interesting phenomenon that the equilibrium moves toward a frequency-division multiple-access solution for any set of channel coefficients under increasing channel uncertainty bounds. We further prove that increasing channel uncertainty can lead to a more efficient equilibrium and, hence, a better sum rate in certain two-user communication systems. 
Finally, we confirm, through simulations, that this improvement in equilibrium efficiency is also observed in systems with a higher number of users.", "date": "2011-11", "date_type": "published", "publication": "IEEE Transactions on Vehicular Technology", "volume": "60", "number": "9", "publisher": "IEEE", "pagerange": "4471-4486", "id_number": "CaltechAUTHORS:20170925-094601829", "issn": "0018-9545", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170925-094601829", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "Engineering and Physical Sciences Research Council (EPSRC)", "grant_number": "EP/F065477/1" }, { "agency": "Army Research Office (ARO)", "grant_number": "W911NF-06-1-0076" }, { "agency": "University of California, Irvine" }, { "agency": "Air Force Office of Scientific Research (AFOSR)", "grant_number": "FA9550-10-1-0310" } ] }, "doi": "10.1109/TVT.2011.2171011", "resource_type": "article", "pub_year": "2011", "author_list": "Anandkumar, Amod J. G.; Anandkumar, Animashree; et el." }, { "id": "https://authors.library.caltech.edu/records/c3saa-sgs90", "eprint_id": 81885, "eprint_status": "archive", "datestamp": "2023-08-19 07:01:51", "lastmod": "2023-10-17 21:53:59", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Tan-Vincent-Y-F", "name": { "family": "Tan", "given": "Vincent Y. F." } }, { "id": "Anandkumar-A", "name": { "family": "Anandkumar", "given": "Animashree" } }, { "id": "Willsky-A-S", "name": { "family": "Willsky", "given": "Alan S." } } ] }, "title": "Learning High-Dimensional Markov Forest Distributions: Analysis of Error Rates", "ispublished": "pub", "full_text_status": "public", "keywords": "graphical models, forest distributions, structural consistency, risk consistency, method of types", "note": "\u00a9 2011 Vincent Tan, Animashree Anandkumar and Alan Willsky. 
\n\nThis work was supported by a AFOSR funded through Grant FA9559-08-1-1080, a MURI funded through ARO Grant W911NF-06-1-0076 and a MURI funded through AFOSR Grant FA9550-06-1-0324. V. Tan is also funded by A*STAR, Singapore. The authors would like to thank Sanjoy Mitter, Lav Varshney, Matt Johnson and James Saunderson for discussions. The authors would also like to thank Rui Wu (UIUC) for pointing out an error in the proof of Theorem 3.\n\nPublished - tan11a.pdf
Submitted - 1005.0766.pdf
", "abstract": "The problem of learning forest-structured discrete graphical models from i.i.d. samples is considered. An algorithm based on pruning of the Chow-Liu tree through adaptive thresholding is proposed. It is shown that this algorithm is both structurally consistent and risk consistent and the error probability of structure learning decays faster than any polynomial in the number of samples under fixed model size. For the high-dimensional scenario where the size of the model d and the number of edges k scale with the number of samples n, sufficient conditions on (n,d,k) are given for the algorithm to satisfy structural and risk consistencies. In addition, the extremal structures for learning are identified; we prove that the independent (resp., tree) model is the hardest (resp., easiest) to learn using the proposed algorithm in terms of error rates for structure learning.", "date": "2011-06", "date_type": "published", "publication": "Journal of Machine Learning Research", "volume": "12", "publisher": "Journal of Machine Learning Research", "pagerange": "1617-1653", "id_number": "CaltechAUTHORS:20170927-144736867", "issn": "1533-7928", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170927-144736867", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "Air Force Office of Scientific Research (AFOSR)", "grant_number": "FA9559-08-1-1080" }, { "agency": "Army Research Office (ARO)", "grant_number": "W911NF-06-1-0076" }, { "agency": "Air Force Office of Scientific Research (AFOSR)", "grant_number": "FA9550-06-1-0324" }, { "agency": "Agency for Science, Technology and Research (A*STAR)" } ] }, "doi": "10.48550/arXiv.1005.0766", "primary_object": { "basename": "1005.0766.pdf", "url": "https://authors.library.caltech.edu/records/c3saa-sgs90/files/1005.0766.pdf" }, "related_objects": [ { "basename": "tan11a.pdf", "url": 
"https://authors.library.caltech.edu/records/c3saa-sgs90/files/tan11a.pdf" } ], "resource_type": "article", "pub_year": "2011", "author_list": "Tan, Vincent Y. F.; Anandkumar, Animashree; et el." }, { "id": "https://authors.library.caltech.edu/records/fx9my-swj04", "eprint_id": 81875, "eprint_status": "archive", "datestamp": "2023-08-19 06:26:40", "lastmod": "2023-10-17 21:53:33", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Choi-Myung Jin", "name": { "family": "Choi", "given": "Myung Jin" } }, { "id": "Tan-Vincent-Y-F", "name": { "family": "Tan", "given": "Vincent Y. F." } }, { "id": "Anandkumar-A", "name": { "family": "Anandkumar", "given": "Animashree" } }, { "id": "Willsky-A-S", "name": { "family": "Willsky", "given": "Alan S." } } ] }, "title": "Learning Latent Tree Graphical Models", "ispublished": "pub", "full_text_status": "public", "keywords": "graphical models, Markov random fields, hidden variables, latent tree models, structure learning", "note": "\u00a9 2011 Myung Jin Choi, Vincent Y. F. Tan, Animashree Anandkumar and Alan S. Willsky. \n\nSubmitted 9/10; Revised 2/11; Published 5/11. \n\nThis research was supported in part by Shell International Exploration and Production, Inc. and in part by the Air Force Office of Scientific Research under Award No. FA9550-06-1-0324. This work was also supported in part by AFOSR under Grant FA9550-08-1-1080 and in part by MURI under AFOSR Grant FA9550-06-1-0324. Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the author(s) and do not necessarily reflect the views of the Air Force. Vincent Tan and Animashree Anandkumar are supported by A*STAR, Singapore and by the setup funds at U.C. Irvine respectively.\n\nPublished - choi11b.pdf
Submitted - 1009.2722.pdf
", "abstract": "We study the problem of learning a latent tree graphical model where samples are available only from a subset of variables. We propose two consistent and computationally efficient algorithms for learning minimal latent trees, that is, trees without any redundant hidden nodes. Unlike many existing methods, the observed nodes (or variables) are not constrained to be leaf nodes. Our algorithms can be applied to both discrete and Gaussian random variables and our learned models are such that all the observed and latent variables have the same domain (state space). Our first algorithm, recursive grouping, builds the latent tree recursively by identifying sibling groups using so-called information distances. One of the main contributions of this work is our second algorithm, which we refer to as CLGrouping. CLGrouping starts with a pre-processing procedure in which a tree over the observed variables is constructed. This global step groups the observed nodes that are likely to be close to each other in the true latent tree, thereby guiding subsequent recursive grouping (or equivalent procedures such as neighbor-joining) on much smaller subsets of variables. This results in more accurate and efficient learning of latent trees. We also present regularized versions of our algorithms that learn latent tree approximations of arbitrary distributions. We compare the proposed algorithms to other methods by performing extensive numerical experiments on various latent tree graphical models such as hidden Markov models and star graphs. 
In addition, we demonstrate the applicability of our methods on real-world data sets by modeling the dependency structure of monthly stock returns in the S&P index and of the words in the 20 newsgroups data set.", "date": "2011-05", "date_type": "published", "publication": "Journal of Machine Learning Research", "volume": "12", "publisher": "Journal of Machine Learning Research", "pagerange": "1771-1812", "id_number": "CaltechAUTHORS:20170927-100701408", "issn": "1533-7928", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170927-100701408", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "Shell International Exploration and Production, Inc." }, { "agency": "Air Force Office of Scientific Research (AFOSR)", "grant_number": "FA9550-06-1-0324" }, { "agency": "Air Force Office of Scientific Research (AFOSR)", "grant_number": "FA9550-08-1-1080" }, { "agency": "Air Force Office of Scientific Research (AFOSR)", "grant_number": "FA9550-06-1-0324" }, { "agency": "Agency for Science, Technology and Research (A*STAR)" }, { "agency": "University of California, Irvine" } ] }, "doi": "10.48550/arXiv.1009.2722", "primary_object": { "basename": "1009.2722.pdf", "url": "https://authors.library.caltech.edu/records/fx9my-swj04/files/1009.2722.pdf" }, "related_objects": [ { "basename": "choi11b.pdf", "url": "https://authors.library.caltech.edu/records/fx9my-swj04/files/choi11b.pdf" } ], "resource_type": "article", "pub_year": "2011", "author_list": "Choi, Myung Jin; Tan, Vincent Y. F.; et el." 
}, { "id": "https://authors.library.caltech.edu/records/pbdd0-km498", "eprint_id": 81748, "eprint_status": "archive", "datestamp": "2023-08-19 06:06:01", "lastmod": "2023-10-17 21:19:26", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Anandkumar-A", "name": { "family": "Anandkumar", "given": "Animashree" } }, { "id": "Michael-N", "name": { "family": "Michael", "given": "Nithin" } }, { "id": "Tang-Ao", "name": { "family": "Tang", "given": "Ao Kevin" }, "orcid": "0000-0001-6296-644X" }, { "id": "Swami-A", "name": { "family": "Swami", "given": "Ananthram" } } ] }, "title": "Distributed Algorithms for Learning and Cognitive Medium Access with Logarithmic Regret", "ispublished": "pub", "full_text_status": "public", "keywords": "Cognitive medium access control, multi-armed bandits, distributed algorithms, logarithmic regret", "note": "\u00a9 2011 IEEE. \n\nManuscript received 1 December 2009; revised 4 June 2010. \n\nDuring the stint of this work, the first author was supported by MURI through AFOSR Grant FA9550-06-1-0324. The second and the third authors are supported in part through NSF grant CCF-0835706. Parts of this paper were presented at [1].\n\nSubmitted - 1006.1673v1.pdf
", "abstract": "The problem of distributed learning and channel access is considered in a cognitive network with multiple secondary users. The availability statistics of the channels are initially unknown to the secondary users and are estimated using sensing decisions. There is no explicit information exchange or prior agreement among the secondary users and sensing and access decisions are undertaken by them in a completely distributed manner. We propose policies for distributed learning and access which achieve order-optimal cognitive system throughput (number of successful secondary transmissions) under self play, i.e., when implemented at all the secondary users. Equivalently, our policies minimize the sum regret in distributed learning and access, which is the loss in secondary throughput due to learning and distributed access. For the scenario when the number of secondary users is known to the policy, we prove that the total regret is logarithmic in the number of transmission slots. This policy achieves order-optimal regret based on a logarithmic lower bound for regret under any uniformly-good learning and access policy. We then consider the case when the number of secondary users is fixed but unknown, and is estimated at each user through feedback. 
We propose a policy whose sum regret grows only slightly faster than logarithmic in the number of transmission slots.", "date": "2011-04", "date_type": "published", "publication": "IEEE Journal on Selected Areas in Communications", "volume": "29", "number": "4", "publisher": "IEEE", "pagerange": "731-745", "id_number": "CaltechAUTHORS:20170922-133040888", "issn": "0733-8716", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170922-133040888", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "Air Force Office of Scientific Research (AFOSR)", "grant_number": "FA9550-06-1-0324" }, { "agency": "NSF", "grant_number": "CCF-0835706" } ] }, "doi": "10.1109/JSAC.2011.110406", "primary_object": { "basename": "1006.1673v1.pdf", "url": "https://authors.library.caltech.edu/records/pbdd0-km498/files/1006.1673v1.pdf" }, "resource_type": "article", "pub_year": "2011", "author_list": "Anandkumar, Animashree; Michael, Nithin; et el." }, { "id": "https://authors.library.caltech.edu/records/4464m-myq24", "eprint_id": 81737, "eprint_status": "archive", "datestamp": "2023-08-19 05:47:07", "lastmod": "2023-10-17 21:18:47", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Tan-Vincent-Y-F", "name": { "family": "Tan", "given": "Vincent Y. F." } }, { "id": "Anandkumar-A", "name": { "family": "Anandkumar", "given": "Animashree" } }, { "id": "Tong-Lang", "name": { "family": "Tong", "given": "Lang" } }, { "id": "Willsky-A-S", "name": { "family": "Willsky", "given": "Alan S." } } ] }, "title": "A Large-Deviation Analysis of the Maximum-Likelihood Learning of Markov Tree Structures", "ispublished": "pub", "full_text_status": "public", "keywords": "Error exponent, Euclidean information theory,\nlarge-deviations principle, Markov structure, maximum-likelihood (ML) distribution estimation, tree-structured distributions", "note": "\u00a9 2011 IEEE. 
\n\nManuscript received May 06, 2009; revised October 19, 2010; accepted November 18, 2010. Date of current version February 18, 2011. \n\nThis work was supported in part by A*STAR, Singapore, by a MURI funded through ARO Grant W911NF-06-1-0076 and by AFOSR Grant FA9550-08-1-0180 and in part by the Army Research Office MURI Program under award W911NF-08-1-0238. The material in this paper was presented in part at the International Symposium on Information Theory (ISIT), Seoul, Korea, June 2009. V. Y. F. Tan performed this work while at MIT. \n\nThe authors would like to thank the anonymous referees and\nAssociate Editor A. Krzyzak who have helped to improve the\nexposition. One reviewer, in particular, helped highlight the connection of this work with robust hypothesis testing, leading to Section V-D. The authors would also like to thank Prof. L. Zheng, M. Agrawal, and A. Olshevsky for many stimulating discussions.\n\nPublished - 05714274.pdf
Submitted - 0905.0940.pdf
", "abstract": "The problem of maximum-likelihood (ML) estimation of discrete tree-structured distributions is considered. Chow and Liu established that ML-estimation reduces to the construction of a maximum-weight spanning tree using the empirical mutual information quantities as the edge weights. Using the theory of large-deviations, we analyze the exponent associated with the error probability of the event that the ML-estimate of the Markov tree structure differs from the true tree structure, given a set of independently drawn samples. By exploiting the fact that the output of ML-estimation is a tree, we establish that the error exponent is equal to the exponential rate of decay of a single dominant crossover event. We prove that in this dominant crossover event, a non-neighbor node pair replaces a true edge of the distribution that is along the path of edges in the true tree graph connecting the nodes in the non-neighbor pair. Using ideas from Euclidean information theory, we then analyze the scenario of ML-estimation in the very noisy learning regime and show that the error exponent can be approximated as a ratio, which is interpreted as the signal-to-noise ratio (SNR) for learning tree distributions. 
We show via numerical experiments that in this regime, our SNR approximation is accurate.", "date": "2011-03", "date_type": "published", "publication": "IEEE Transactions on Information Theory", "volume": "57", "number": "3", "publisher": "IEEE", "pagerange": "1714-1735", "id_number": "CaltechAUTHORS:20170922-092634649", "issn": "0018-9448", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170922-092634649", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "Agency for Science, Technology and Research (A*STAR)" }, { "agency": "Army Research Office (ARO)", "grant_number": "W911NF-06-1-0076" }, { "agency": "Air Force Office of Scientific Research (AFOSR)", "grant_number": "FA9550-08-1-0180" }, { "agency": "Army Research Office (ARO)", "grant_number": "W911NF-08-1-0238" } ] }, "doi": "10.1109/TIT.2011.2104513", "primary_object": { "basename": "0905.0940.pdf", "url": "https://authors.library.caltech.edu/records/4464m-myq24/files/0905.0940.pdf" }, "related_objects": [ { "basename": "05714274.pdf", "url": "https://authors.library.caltech.edu/records/4464m-myq24/files/05714274.pdf" } ], "resource_type": "article", "pub_year": "2011", "author_list": "Tan, Vincent Y. F.; Anandkumar, Animashree; et el." }, { "id": "https://authors.library.caltech.edu/records/9at92-pns60", "eprint_id": 81730, "eprint_status": "archive", "datestamp": "2023-08-19 02:28:30", "lastmod": "2023-10-17 21:11:56", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Tan-Vincent-Y-F", "name": { "family": "Tan", "given": "Vincent Y. F." } }, { "id": "Anandkumar-A", "name": { "family": "Anandkumar", "given": "Animashree" } }, { "id": "Willsky-A-S", "name": { "family": "Willsky", "given": "Alan S." 
} } ] }, "title": "Learning Gaussian Tree Models: Analysis of Error Exponents and Extremal Structures", "ispublished": "pub", "full_text_status": "public", "keywords": "Error exponents, Euclidean information theory,\nGauss-Markov random fields, Gaussian graphical models, large\ndeviations, structure learning, tree distributions", "note": "\u00a9 2010 IEEE. \n\nManuscript received September 28, 2009; accepted January 21, 2010. Date of publication February 05, 2010; date of current version April 14, 2010. \n\nThe associate editor coordinating the review of this manuscript and approving it for publication was Dr. Deniz Erdogmus. This work was presented in part at the Allerton Conference on Communication, Control, and Computing, Monticello, IL, September 2009. This work was supported in part by a AFOSR through Grant FA9550-08-1-1080, in part by a MURI funded through ARO Grant W911NF-06-1-0076, and in part under a MURI through AFOSR Grant FA9550-06-1-0324. The work of V. Tan was supported by A*STAR, Singapore.\n\nPublished - 05406101.pdf
Submitted - 0909.5216.pdf
", "abstract": "The problem of learning tree-structured Gaussian graphical models from independent and identically distributed (i.i.d.) samples is considered. The influence of the tree structure and the parameters of the Gaussian distribution on the learning rate as the number of samples increases is discussed. Specifically, the error exponent corresponding to the event that the estimated tree structure differs from the actual unknown tree structure of the distribution is analyzed. Finding the error exponent reduces to a least-squares problem in the very noisy learning regime. In this regime, it is shown that the extremal tree structure that minimizes the error exponent is the star for any fixed set of correlation coefficients on the edges of the tree. If the magnitudes of all the correlation coefficients are less than 0.63, it is also shown that the tree structure that maximizes the error exponent is the Markov chain. In other words, the star and the chain graphs represent the hardest and the easiest structures to learn in the class of tree-structured Gaussian graphical models. 
This result can also be intuitively explained by correlation decay: pairs of nodes which are far apart, in terms of graph distance, are unlikely to be mistaken as edges by the maximum-likelihood estimator in the asymptotic regime.", "date": "2010-05", "date_type": "published", "publication": "IEEE Transactions on Signal Processing", "volume": "58", "number": "5", "publisher": "IEEE", "pagerange": "2701-2714", "id_number": "CaltechAUTHORS:20170922-082655078", "issn": "1053-587X", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170922-082655078", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "Air Force Office of Scientific Research (AFOSR)", "grant_number": "FA9550-08-1-1080" }, { "agency": "Army Research Office (ARO)", "grant_number": "W911NF-06-1-0076" }, { "agency": "Air Force Office of Scientific Research (AFOSR)", "grant_number": "FA9550-06-1-0324" }, { "agency": "Agency for Science, Technology and Research (A*STAR)" } ] }, "doi": "10.1109/TSP.2010.2042478", "primary_object": { "basename": "05406101.pdf", "url": "https://authors.library.caltech.edu/records/9at92-pns60/files/05406101.pdf" }, "related_objects": [ { "basename": "0909.5216.pdf", "url": "https://authors.library.caltech.edu/records/9at92-pns60/files/0909.5216.pdf" } ], "resource_type": "article", "pub_year": "2010", "author_list": "Tan, Vincent Y. F.; Anandkumar, Animashree; et el." }, { "id": "https://authors.library.caltech.edu/records/z3yyh-d3t74", "eprint_id": 81715, "eprint_status": "archive", "datestamp": "2023-08-19 00:00:13", "lastmod": "2023-10-17 21:11:17", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Anandkumar-A", "name": { "family": "Anandkumar", "given": "Animashree" } }, { "id": "Yukich-J-E", "name": { "family": "Yukich", "given": "Joseph E." 
} }, { "id": "Tong-Lang", "name": { "family": "Tong", "given": "Lang" } }, { "id": "Swami-A", "name": { "family": "Swami", "given": "Ananthram" } } ] }, "title": "Energy scaling laws for distributed inference in random fusion networks", "ispublished": "pub", "full_text_status": "public", "keywords": "Distributed inference, graphical models, Euclidean random graphs, stochastic geometry and data fusion", "note": "\u00a9 2009 IEEE. \n\nManuscript received 25 August 2008; revised 1 February 2009. \n\nParts of this paper were presented at [1], [2]. This work was supported in part through collaborative participation in Communications and Networks Consortium sponsored by the U. S. Army Research Laboratory under the Collaborative\nTechnology Alliance Program, Cooperative Agreement DAAD19-01-2-0011 and by the Army Research Office under Grant ARO-W911NF-06-1-0346. The first author is supported by the IBM Ph.D Fellowship for the year 2008-09 and is currently a visiting student at MIT, Cambridge, MA 02139. The second author was partially supported by NSA grant H98230-06-1-0052 and NSF grant DMS-0805570. The U. S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation thereon.\n\nPublished - 05226971.pdf
Submitted - 0809.0686.pdf
", "abstract": "The energy scaling laws of multihop data fusion networks for distributed inference are considered. The fusion network consists of randomly located sensors distributed i.i.d. according to a general spatial distribution in an expanding region. Under Markov random field (MRF) hypotheses, among the class of data-fusion policies which enable optimal statistical inference at the fusion center using all the sensor measurements, the policy with the minimum average energy consumption is bounded below by the average energy of fusion along the minimum spanning tree, and above by a suboptimal policy, referred to as Data Fusion for Markov Random Fields (DFMRF). Scaling laws are derived for the energy consumption of the optimal and suboptimal fusion policies. It is shown that the average asymptotic energy of the DFMRF scheme is strictly finite for a class of MRF models with Euclidean stabilizing dependency graphs.", "date": "2009-09", "date_type": "published", "publication": "IEEE Journal on Selected Areas in Communications", "volume": "27", "number": "7", "publisher": "IEEE", "pagerange": "1203-1217", "id_number": "CaltechAUTHORS:20170921-155701400", "issn": "0733-8716", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170921-155701400", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "Army Research Laboratory (ARL)", "grant_number": "DAAD19-01-2-0011" }, { "agency": "Army Research Office (ARO)", "grant_number": "W911NF-06-1-0346" }, { "agency": "IBM" }, { "agency": "National Security Agency", "grant_number": "H98230-06-1-0052" }, { "agency": "NSF", "grant_number": "DMS-0805570" } ] }, "doi": "10.1109/JSAC.2009.090916", "primary_object": { "basename": "05226971.pdf", "url": "https://authors.library.caltech.edu/records/z3yyh-d3t74/files/05226971.pdf" }, "related_objects": [ { "basename": "0809.0686.pdf", "url": 
"https://authors.library.caltech.edu/records/z3yyh-d3t74/files/0809.0686.pdf" } ], "resource_type": "article", "pub_year": "2009", "author_list": "Anandkumar, Animashree; Yukich, Joseph E.; et el." }, { "id": "https://authors.library.caltech.edu/records/h718a-79k59", "eprint_id": 81659, "eprint_status": "archive", "datestamp": "2023-08-20 00:49:54", "lastmod": "2023-10-17 20:56:16", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Anandkumar-A", "name": { "family": "Anandkumar", "given": "Animashree" } }, { "id": "Tong-Lang", "name": { "family": "Tong", "given": "Lang" } }, { "id": "Swami-A", "name": { "family": "Swami", "given": "Ananthram" } } ] }, "title": "Detection of Gauss-Markov Random Fields With Nearest-Neighbor Dependency", "ispublished": "pub", "full_text_status": "public", "keywords": "Detection and estimation, error exponent, Gauss\u2013Markov random fields, law of large numbers", "note": "\u00a9 2009 IEEE. \n\nManuscript received January 02, 2007; revised January 31, 2008. Current version published February 04, 2009. \n\nThis work was supported in part through the collaborative participation in the Communications and Networks Consortium sponsored by the U.S. Army Research Laboratory under the Collaborative Technology Alliance Program, Cooperative Agreement DAAD19-01-2-0011 and by the Army Research Office under Grant ARO-W911NF-06-1-0346. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation thereon. The material in this\npaper was presented in part at IEEE International Conference on Acoustics, Speech and Signal Processing, Hawaii, April 2007.\n\nPublished - 04777634.pdf
Submitted - 0706.1588.pdf
", "abstract": "The problem of hypothesis testing against independence for a Gauss-Markov random field (GMRF) is analyzed. Assuming an acyclic dependency graph, an expression for the log-likelihood ratio of detection is derived. Assuming random placement of nodes over a large region according to the Poisson or uniform distribution and nearest-neighbor dependency graph, the error exponent of the Neyman-Pearson detector is derived using large-deviations theory. The error exponent is expressed as a dependency-graph functional and the limit is evaluated through a special law of large numbers for stabilizing graph functionals. The exponent is analyzed for different values of the variance ratio and correlation. It is found that a more correlated GMRF has a higher exponent at low values of the variance ratio whereas the situation is reversed at high values of the variance ratio.", "date": "2009-02", "date_type": "published", "publication": "IEEE Transactions on Information Theory", "volume": "55", "number": "2", "publisher": "IEEE", "pagerange": "816-827", "id_number": "CaltechAUTHORS:20170920-162015185", "issn": "0018-9448", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170920-162015185", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "Army Research Laboratory (ARL)", "grant_number": "DAAD19-01-2-0011" }, { "agency": "Army Research Office (ARO)", "grant_number": "W911NF-06-1-0346" } ] }, "doi": "10.1109/TIT.2008.2009855", "primary_object": { "basename": "04777634.pdf", "url": "https://authors.library.caltech.edu/records/h718a-79k59/files/04777634.pdf" }, "related_objects": [ { "basename": "0706.1588.pdf", "url": "https://authors.library.caltech.edu/records/h718a-79k59/files/0706.1588.pdf" } ], "resource_type": "article", "pub_year": "2009", "author_list": "Anandkumar, Animashree; Tong, Lang; et el." 
}, { "id": "https://authors.library.caltech.edu/records/c0hac-kqd07", "eprint_id": 81654, "eprint_status": "archive", "datestamp": "2023-08-19 23:44:02", "lastmod": "2023-10-17 20:56:06", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Anandkumar-A", "name": { "family": "Anandkumar", "given": "Animashree" } }, { "id": "Tong-Lang", "name": { "family": "Tong", "given": "Lang" } }, { "id": "Swami-A", "name": { "family": "Swami", "given": "Ananthram" } } ] }, "title": "Optimal Node Density for Detection in Energy-Constrained Random Networks", "ispublished": "pub", "full_text_status": "public", "keywords": "Distributed detection, error exponent, Gauss-Markov random fields (GMRF), routing, sensor networks", "note": "\u00a9 2008 IEEE. \n\nManuscript received October 25, 2007; revised June 5, 2008. First published July 15, 2008; current version published September 17, 2008. \n\nThis work was supported in part through the collaborative participation in the Communications and Networks Consortium sponsored by the U. S. Army Research Laboratory under the Collaborative Technology Alliance Program, Cooperative Agreement DAAD19-01-2-0011 and by the Army Research Office by Grant ARO-W911NF-06-1-0346. The work of A. Swami was supported in part by the DARPA ITMANET program. The U. S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation thereon. Parts of this work were presented at the 45th Allerton Conf. on Communication, Control and Computing Monticello, NY, Sep. 2007, and CISS\n'07 Baltimore, MD, Mar. 2007. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Danilo P. Mandic.\n\nPublished - 04564187.pdf
", "abstract": "The problem of optimal node density maximizing the Neyman-Pearson detection error exponent subject to a constraint on average (per node) energy consumption is analyzed. The spatial correlation among the sensor measurements is incorporated through a Gauss-Markov random field (GMRF) model with Euclidean nearest-neighbor dependency graph. A constant density deployment of sensors under the uniform or Poisson distribution is assumed. It is shown that the optimal node density crucially depends on the ratio between the measurement variances under the two hypotheses and displays a threshold behavior. Below the threshold value of the variance ratio, the optimal node density tends to infinity under any feasible average energy constraint. On the other hand, when the variance ratio is above the threshold, the optimal node density is the minimum value at which it is feasible to process and deliver the likelihood ratio (sufficient statistic) of the sensor measurements to the fusion center. In this regime of the variance ratio, an upper bound on the optimal node density based on a proposed 2-approximation fusion scheme and a lower bound based on the minimum spanning tree are established. 
Under an alternative formulation where the energy consumption per unit area is constrained, the optimal node density is shown to be strictly finite for all values of the variance ratio and bounds on this optimal node density are provided.", "date": "2008-10", "date_type": "published", "publication": "IEEE Transactions on Signal Processing", "volume": "56", "number": "10", "publisher": "IEEE", "pagerange": "5232-5245", "id_number": "CaltechAUTHORS:20170920-160524319", "issn": "1053-587X", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170920-160524319", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "Army Research Laboratory (ARL)", "grant_number": "DAAD19-01-2-0011" }, { "agency": "Army Research Office (ARO)", "grant_number": "W911NF-06-1-0346" }, { "agency": "Defense Advanced Research Projects Agency (DARPA)" } ] }, "doi": "10.1109/TSP.2008.928514", "primary_object": { "basename": "04564187.pdf", "url": "https://authors.library.caltech.edu/records/c0hac-kqd07/files/04564187.pdf" }, "resource_type": "article", "pub_year": "2008", "author_list": "Anandkumar, Animashree; Tong, Lang; et al." }, { "id": "https://authors.library.caltech.edu/records/yts23-dqp36", "eprint_id": 81652, "eprint_status": "archive", "datestamp": "2023-08-19 23:05:13", "lastmod": "2023-10-17 20:56:02", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Anandkumar-A", "name": { "family": "Anandkumar", "given": "Animashree" } }, { "id": "Tong-Lang", "name": { "family": "Tong", "given": "Lang" } }, { "id": "Swami-A", "name": { "family": "Swami", "given": "Ananthram" } } ] }, "title": "Distributed Estimation Via Random Access", "ispublished": "pub", "full_text_status": "public", "keywords": "Distributed inference, random-access communications, sensor networks", "note": "\u00a9 2008 IEEE. 
\n\nManuscript received August 24, 2006; revised October 1, 2007. \n\nThis work was supported in part through the collaborative participation in the Communications and Networks Consortium sponsored by the U.S. Army Research Laboratory under the Collaborative Technology Alliance Program, Cooperative Agreement DAAD19-01-2-0011 and by the Army Research Office under Grant ARO-W911NF-06-1-0346. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation thereon.\n\nPublished - 04544950.pdf
", "abstract": "In this correspondence, the problem of distributed Bayesian estimation is considered in the context of a wireless sensor network. The Bayesian estimation performance is analyzed in terms of the expected Fisher information normalized by the transmission rate of the sensors. The sensors use a communication scheme known as the type-based random access (TBRA) scheme. Under a constraint on the expected transmission energy, an optimal spatio-temporal allocation scheme that maximizes the performance metric is characterized. It is shown that the performance metric is crucially dependent on the fading parameter known as the channel coherence index. For channels with low coherence indices, sensor transmissions tend to cancel each other, and there exists an optimal finite mean transmission rate that maximizes the performance metric. On the other hand, for channels with high coherence indices, there should be as many simultaneous transmissions as allowed by the network. The presence of a critical coherence index where the change from one behavior to another occurs is established.", "date": "2008-07", "date_type": "published", "publication": "IEEE Transactions on Information Theory", "volume": "54", "number": "7", "publisher": "IEEE", "pagerange": "3175-3181", "id_number": "CaltechAUTHORS:20170920-155855969", "issn": "0018-9448", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170920-155855969", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "Army Research Laboratory (ARL)", "grant_number": "DAAD19-01-2-0011" }, { "agency": "Army Research Office (ARO)", "grant_number": "W911NF-06-1-0346" } ] }, "doi": "10.1109/TIT.2008.924652", "primary_object": { "basename": "04544950.pdf", "url": "https://authors.library.caltech.edu/records/yts23-dqp36/files/04544950.pdf" }, "resource_type": "article", "pub_year": "2008", "author_list": "Anandkumar, Animashree; 
Tong, Lang; et el." }, { "id": "https://authors.library.caltech.edu/records/c5kfj-s1x74", "eprint_id": 81648, "eprint_status": "archive", "datestamp": "2023-08-19 21:03:34", "lastmod": "2023-10-17 20:55:52", "type": "article", "metadata_visibility": "show", "creators": { "items": [ { "id": "Anandkumar-A", "name": { "family": "Anandkumar", "given": "Animashree" } }, { "id": "Tong-Lang", "name": { "family": "Tong", "given": "Lang" } } ] }, "title": "Type-Based Random Access for Distributed Detection Over Multiaccess Fading Channels", "ispublished": "pub", "full_text_status": "public", "keywords": "signal processing for communications, Distributed detection, multisensor systems, performance analysis", "note": "\u00a9 2007 IEEE. \n\nManuscript received December 24, 2005; revised December 28, 2006. \n\nThis work was supported in part through the collaborative participation in the Communications and Networks Consortium sponsored by the U. S. Army Research Laboratory under the Collaborative Technology Alliance Program, Cooperative\nAgreement DAAD19-01-2-0011 and by the Army Research Office under Grant ARO-W911NF-06-1-0346. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Jaume Riba.\n\nPublished - 04305425.pdf
", "abstract": "The problem of distributed detection in a sensor network over multiaccess fading channels is considered. A random-access transmission scheme referred to as the type-based random access (TBRA) is proposed and analyzed. Error exponents of TBRA under noncoherent detection are characterized with respect to the mean transmission rate and the channel-coherence index. For the zero-mean multiaccess fading channels, it is shown that there exists an optimal mean-transmission rate that maximizes the detection-error exponents. The optimal mean-transmission rate can be calculated numerically or estimated using the Gaussian approximation, and it gives a sensor-activation strategy that achieves an optimal allocation of transmission energy to spatial and temporal domains. Numerical examples and simulations are used to compare TBRA with the conventional centralized time-division multiple access (TDMA) scheme. It is shown that for the zero-mean multiaccess fading channels, TBRA gives substantial improvement in the low signal-to-noise ratio (SNR) regime whereas for the nonzero mean fading channels, TBRA performs better over a wide range of SNR.", "date": "2007-10", "date_type": "published", "publication": "IEEE Transactions on Signal Processing", "volume": "55", "number": "10", "publisher": "IEEE", "pagerange": "5032-5043", "id_number": "CaltechAUTHORS:20170920-153723886", "issn": "1053-587X", "official_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170920-153723886", "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "funders": { "items": [ { "agency": "Army Research Laboratory (ARL)", "grant_number": "DAAD19-01-2-0011" }, { "agency": "Army Research Office (ARO)", "grant_number": "W911NF-06-1-0346" } ] }, "doi": "10.1109/TSP.2007.896302", "primary_object": { "basename": "04305425.pdf", "url": "https://authors.library.caltech.edu/records/c5kfj-s1x74/files/04305425.pdf" }, "resource_type": "article", 
"pub_year": "2007", "author_list": "Anandkumar, Animashree and Tong, Lang" } ]