Semi Parametric Estimation In Network Data And Tools For Conducting Complex Simulation Studies In Causal Inference


Author : Oleg Sofrygin
language : en
Publisher:
Release Date : 2016



This dissertation is concerned with the application of robust semi-parametric methods to problems of estimation in network-dependent data and the conduct of large-scale simulation studies for causal inference research in epidemiological and medical data. Specifically, Chapter 1 presents a modern semi-parametric approach to estimation of causal effects in a population connected by a single social network. The connectivity of the population units will typically imply that the observed data on these units are no longer independent and identically distributed. Moreover, such social settings typically result in high-dimensional data. This chapter contributes to current statistical methodology by presenting an approach that allows valid estimation and inference and addresses the statistical issues specific to such networked population datasets. The framework of semi-parametric estimation, called targeted maximum likelihood estimation (TMLE), is presented. This framework improves upon the existing methods by offering robustness, weakened sensitivity to near positivity violations, as well as the ability to deal with high-dimensionality issues of social network data. In particular, this approach relies on the accurate reflection of the background knowledge available for a given scientific problem, allowing estimation and inference without having to make unrealistic assumptions about the structure of the data. In addition, this chapter generalizes previous work describing estimation of complex causal parameters, such as the direct treatment effects under interference and the causal effects of interventions on social network structure. Although the past decade has produced many contributions towards estimation of causal effects in social network settings, there has been considerably less research on the topic of variance estimation for such highly-dependent data.
This chapter presents an approach to constructing valid inference, providing a variance estimator that is scalable to very large datasets with highly-connected observations. The efficient open-source software implementation of these methods also accompanies this chapter. Chapter 2 presents open-source software tools for the conduct of reproducible simulation studies for complex parameters that emerge from application of causal inference methods in epidemiological and medical research. This simulation software is built on the framework of non-parametric structural equation modeling. This chapter also studies simulation-based testing of statistical methods in causal inference for longitudinal data with time-varying exposure and confounding. It contributes to existing literature by presenting a unified syntax for non-parametrically defining complex causal parameters, which can be used as the model-free and agnostic gold standard for comparison of different statistical methods for causal inference. For instance, this chapter provides various examples of specification and evaluation of causal parameters that arise naturally in longitudinal causal effect analyses when using marginal structural models (MSMs). The application of these newly developed software tools to replication of several previously published simulation studies in causal inference is also described. Chapter 3 builds on the work described in Chapter 2 and addresses the issue of dependent data simulation for causal inference research in social network data. In particular, it provides a model-free approach to test the validity of various estimation procedures in simulated network settings. This chapter first outlines a non-parametric causal model for units connected by a network and provides various applied examples of simulations with social network data.
This chapter also showcases a possible application of the highly scalable open-source software implementation of the semi-parametric estimation methods described in Chapter 1. In particular, a large-scale social network simulation study is described, and the performance of three dependent-data estimators from Chapter 1 is examined. This simulation study also examines the problem of inference for network-dependent data, specifically, by comparing the performance of the dependent-data TMLE variance estimator from Chapter 1 to the true TMLE variance derived from simulations. Finally, Chapter 3 concludes with a simulation study of an HIV epidemic described in terms of a longitudinal process which evolves over a static network in discrete time-steps among several highly inter-connected communities. The abstracts of the three works which make up this dissertation are reproduced below. Chapter 1: This chapter describes the robust semi-parametric approach towards estimation and inference for the sample average treatment-specific mean in observational settings where data are collected on a single network of connected units (e.g., in the presence of interference or spillover). Despite recent advances, many of the currently used statistical methods rely on the assumption of a specific parametric model for the outcome, even though some of the most important statistical assumptions required by these models are most likely violated in the observational network data settings, resulting in invalid and anti-conservative statistical inference. In this chapter, we rely on the recent methodological advances for the targeted maximum likelihood estimation (TMLE) for data collected on a single population of causally connected units, to describe an estimation approach that permits more realistic classes of data-generative models and provides valid statistical inference in the context of such network-dependent data.
The approach is applied to an observational setting with a single time point stochastic intervention. We start by assuming that the true observed data-generating distribution belongs to a large class of semi-parametric statistical models. We then impose some restrictions on the possible set of the data-generative distributions that may belong to our statistical model. For example, we assume that the dependence among units can be fully described by the known network, and that the dependence on other units can be summarized via some known (but otherwise arbitrary) summary measures. We show that under our modeling assumptions, our estimand is equivalent to an estimand in a hypothetical IID data distribution, where the latter distribution is a function of the observed network data-generating distribution. With this key insight in mind, we show that the TMLE for our estimand in dependent network data can be described as a certain IID data TMLE algorithm, also resulting in a new simplified approach to conducting statistical inference. We demonstrate the validity of our approach in a network simulation study. We also extend prior work on dependent-data TMLE towards estimation of novel causal parameters, e.g., the unit-specific direct treatment effects under interference and the effects of interventions that modify the initial network structure. Chapter 2: This chapter introduces the simcausal R package, an open-source software tool for specification and simulation of complex longitudinal data structures that are based on non-parametric structural equation models. The package aims to provide a flexible tool for simplifying the conduct of transparent and reproducible simulation studies, with a particular emphasis on the types of data and interventions frequently encountered in real-world causal inference problems, such as observational data with time-dependent confounding, selection bias, and random monitoring processes.
The package interface allows for concise expression of complex functional dependencies between a large number of nodes, where each node may represent a measurement at a specific time point. The package allows for specification and simulation of counterfactual data under various user-specified interventions (e.g., static, dynamic, deterministic, or stochastic). In particular, the interventions may represent exposures to treatment regimens, the occurrence or non-occurrence of right-censoring events, or of clinical monitoring events. Finally, the package enables the computation of a selected set of user-specified features of the distribution of the counterfactual data that represent common causal quantities of interest, such as treatment-specific means, average treatment effects, and coefficients from working marginal structural models. The applicability of simcausal is demonstrated by replicating the results of two published simulation studies. Chapter 3: The past decade has seen an increasing body of literature devoted to the estimation of causal effects in network-dependent data. However, the validity of many classical statistical methods in such data is often questioned. There is an emerging need for objective and practical ways to assess which causal methodologies might be applicable and valid in such novel network-based datasets. In this chapter we describe a set of tools implemented as part of the simcausal R package that allow simulating data based on the non-parametric structural equation model for connected units. We also provide examples of how these simulations may be applied to evaluation of different statistical methods for estimation of causal effects in such data. In particular, these simulation tools are targeted to the types of data and interventions frequently encountered in real-world causal inference research in social networks, such as observational studies with spill-over or interference.
We developed a novel R language interface which simplifies the specification of network-based functional relationships between connected units. Moreover, this network-based syntax can be combined with.
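
The simulation workflow described in the Chapter 2 abstract (specify a structural equation model, simulate observed data, then simulate counterfactual data under an intervention to obtain a gold-standard causal quantity) can be sketched in plain Python. This is only an illustrative sketch: it does not use or reproduce the simcausal interface, and the nodes W, A, Y, the coefficients, and the helper names simulate and mean_outcome are all hypothetical.

```python
import math
import random


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def simulate(n, seed, intervention=None):
    """Draw n units from a toy structural equation model (hypothetical).

    W is a baseline confounder, A a binary treatment, Y a continuous outcome.
    If intervention is 0 or 1, the structural equation for A is replaced by
    that constant (a static intervention); otherwise A follows its observed
    mechanism.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        w = 1 if rng.random() < 0.5 else 0
        # The exogenous error for A is always drawn, so the same seed yields
        # the same W and outcome noise under every intervention.
        u_a = rng.random()
        if intervention is None:
            a = 1 if u_a < expit(-1.0 + 2.0 * w) else 0
        else:
            a = intervention
        y = 1.0 * a + 2.0 * w + rng.gauss(0.0, 1.0)
        rows.append((w, a, y))
    return rows


def mean_outcome(rows):
    return sum(y for _, _, y in rows) / len(rows)


n = 50_000
observed = simulate(n, seed=1)

# Gold-standard average treatment effect from counterfactual simulations.
true_ate = (mean_outcome(simulate(n, seed=2, intervention=1))
            - mean_outcome(simulate(n, seed=2, intervention=0)))

# Naive contrast in the observed data, confounded by W.
treated = [y for _, a, y in observed if a == 1]
control = [y for _, a, y in observed if a == 0]
naive = sum(treated) / len(treated) - sum(control) / len(control)

print(f"true ATE: {true_ate:.3f}, naive contrast: {naive:.3f}")
```

In this toy model the counterfactual contrast equals the treatment coefficient by construction, while the naive treated-versus-untreated difference is biased upward because W raises both the probability of treatment and the outcome. A candidate estimator can then be judged against the simulated gold standard rather than against a parametric model.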



Towards Distribution Free Interpretation Inference And Network Estimation


Author : Yue Gao (Ph.D.)
language : en
Publisher:
Release Date : 2023



In the era of AI, statistical or machine learning methods towards distribution-free assumptions are becoming increasingly important due to the growing amount of data that is being collected and analyzed. Traditional parametric methods may not always be appropriate or may lead to model mis-specification and inaccurate results when dealing with large or complex data sets. Besides, as specific distributional assumptions or parametric modeling are removed, the challenge of model interpretation and prediction inference arises and is currently at the forefront of research efforts. One problem of interest to us in this regard is non-parametric or semi-parametric network estimation for data that are not independent. Specifically, influence network estimation from a multi-variate point process or time series data is a problem of fundamental importance. Prior work has focused on parametric approaches that require a known parametric model, which makes estimation procedures less robust to model mis-specification, non-linearities and heterogeneities. In Chapter 2, we develop a semi-parametric approach based on the monotone single-index multi-variate autoregressive model (SIMAM) which addresses these challenges. In particular, rather than using standard parametric approaches, we use the monotone single index model (SIM) for network estimation. We provide theoretical guarantees for dependent data, and an alternating projected gradient descent algorithm. Significantly, we achieve rates of the form $O(T^{-1/3}\sqrt{s\log(TM)})$ (optimal in the independent design case), where s is the number of edges in the influence network that indicates the sparsity level, M is the number of actors and T is the number of time points. In addition, we demonstrate the performance of SIMAM both on simulated data and two real data examples, and show it outperforms state-of-the-art parametric methods both in terms of prediction and network estimation.
Another aspect important for distribution-free or model-free learning is the interpretation, i.e. to make the complicated non-parametric predictive models explainable. A number of model-agnostic methods for measuring variable importance (VI) have emerged in recent times, which assess the difference in predictive power between a full model trained on all variables and a reduced model that omits the variable(s) of interest. However, these methods typically encounter a bottleneck when estimating the reduced model for each variable or subset of variables, which is both costly and lacks theoretical guarantees. To address this problem, Chapter 3 proposes an efficient and adaptable approach for approximating the reduced model while ensuring important inferential guarantees. Specifically, we replace the need for fully retraining a wide neural network with a linearization that is initiated using the full model parameters. By including a ridge-like penalty to make the problem convex, we establish that our method can estimate the variable importance measure with an error rate of $O(1/\sqrt{n})$, where n represents the number of training samples, provided that the ridge penalty parameter is adequately large. Furthermore, we demonstrate that our estimator is asymptotically normal, enabling us to provide confidence bounds for the VI estimates. Finally, we demonstrate the method's speed and accuracy under different data-generating regimes and showcase its applicability in a real-world seasonal climate forecasting example. In addition to semi-parametric network estimation and fast estimation of variable importance for interpretation, an efficient method for prediction inference without specific distributional assumptions on the data is also of interest to us.
In Chapter 4, we present a novel, computationally-efficient algorithm for predictive inference (PI) that requires no distributional assumptions on the data and can be computed faster than existing bootstrap-type methods for neural networks. Specifically, if there are $n$ training samples, bootstrap methods require training a model on each of the $n$ subsamples of size $n-1$; for large models like neural networks, this process can be computationally prohibitive. In contrast, the proposed method trains one neural network on the full dataset with $(\epsilon, \delta)$-differential privacy (DP) and then approximates each leave-one-out model efficiently using a linear approximation around the neural network estimate. With exchangeable data, we prove that our approach has a rigorous coverage guarantee that depends on the preset privacy parameters and the stability of the neural network, regardless of the data distribution. Simulations and experiments on real data demonstrate that our method satisfies the coverage guarantees with substantially reduced computation compared to bootstrap methods.
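
To make the cost mentioned in the Chapter 4 abstract concrete, the following sketch implements the expensive baseline, in which a model is refit on each of the n subsamples of size n-1 and the leave-one-out residuals set the width of a prediction interval. It is not the proposed differentially private method: a closed-form one-dimensional least-squares line stands in for the neural network, and the data-generating process and the names fit_line and jackknife_radius are hypothetical.

```python
import math
import random


def fit_line(xs, ys):
    """Closed-form ordinary least squares for y = b0 + b1 * x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return my - b1 * mx, b1


def jackknife_radius(xs, ys, alpha=0.1):
    """Half-width of a jackknife-style prediction interval.

    The model is refit n times, each time leaving one sample out, and the
    absolute error on the held-out sample is recorded. The radius is an
    upper empirical quantile of those leave-one-out errors.
    """
    n = len(xs)
    errors = []
    for i in range(n):
        b0, b1 = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        errors.append(abs(ys[i] - (b0 + b1 * xs[i])))
    errors.sort()
    k = min(n, math.ceil((1 - alpha) * (n + 1)))
    return errors[k - 1]


# Toy training data (hypothetical): y = 2x + Gaussian noise.
rng = random.Random(0)
xs = [rng.random() for _ in range(200)]
ys = [2.0 * x + rng.gauss(0.0, 0.5) for x in xs]

b0, b1 = fit_line(xs, ys)               # model fit on the full data
radius = jackknife_radius(xs, ys, 0.1)  # n additional refits

x_new = 0.3
center = b0 + b1 * x_new
print(f"90% prediction interval at x={x_new}: "
      f"[{center - radius:.2f}, {center + radius:.2f}]")
```

The loop in jackknife_radius is the step that becomes prohibitive when each refit is a neural network, which is the cost the abstract's linear approximation of the leave-one-out models is meant to avoid.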



An Introduction To Causal Inference


Author : Judea Pearl
language : en
Publisher: Createspace Independent Publishing Platform
Release Date : 2015

Category : Causation


This paper summarizes recent advances in causal inference and underscores the paradigmatic shifts that must be undertaken in moving from traditional statistical analysis to causal analysis of multivariate data. Special emphasis is placed on the assumptions that underlie all causal inferences, the languages used in formulating those assumptions, the conditional nature of all causal and counterfactual claims, and the methods that have been developed for the assessment of such claims. These advances are illustrated using a general theory of causation based on the Structural Causal Model (SCM) described in Pearl (2000a), which subsumes and unifies other approaches to causation, and provides a coherent mathematical foundation for the analysis of causes and counterfactuals. In particular, the paper surveys the development of mathematical tools for inferring (from a combination of data and assumptions) answers to three types of causal queries: (1) queries about the effects of potential interventions (also called "causal effects" or "policy evaluation"); (2) queries about probabilities of counterfactuals (including assessment of "regret," "attribution" or "causes of effects"); and (3) queries about direct and indirect effects (also known as "mediation"). Finally, the paper defines the formal and conceptual relationships between the structural and potential-outcome frameworks and presents tools for a symbiotic analysis that uses the strong features of both. The tools are demonstrated in the analyses of mediation, causes of effects, and probabilities of causation.



Targeted Minimum Loss Based Estimation


Author : Samuel David Lendle
language : en
Publisher:
Release Date : 2015



Causal inference generally requires making some assumptions on a causal mechanism followed by statistical estimation. The statistical estimation problem in causal inference is often that of estimating a pathwise differentiable parameter in a semiparametric or nonparametric model. Targeted minimum loss-based estimation (TMLE) is a framework for constructing an asymptotically linear plug-in estimator for such parameters. The natural direct effect (NDE) is a parameter that quantifies how some treatment affects some outcome directly, as opposed to indirectly through some mediator value between the treatment and outcome on the causal pathway. In Chapter 2, we introduce the NDE among the untreated and show that under some assumptions the NDE among the untreated is identifiable and equivalent to a statistical parameter as the so-called average treatment effect among the untreated. We then present a locally efficient, doubly robust TMLE for the statistical target parameter and apply it to the estimation of the NDE among the untreated in simulations and of the NDE in a data set from an RCT. Some estimators that adjust for the propensity score (PS) nonparametrically, such as PS matching or stratification by the PS, are robust to slight misspecification of the PS estimator. In particular, if the PS estimator fails to estimate the true propensity score, but still approximates some other balancing score, such methods are still consistent for the average treatment effect (ATE). In Chapter 3, we extend a traditional TMLE for the ATE to have this property while still being locally efficient and doubly robust and investigate the performance of the proposed estimator in a simulation study. Online estimators are estimators that process a relatively small piece of a data set at a time, and can be updated as more data becomes available.
Typically, online estimators are used in the large scale machine learning literature, but to our knowledge, have not been used to estimate statistical parameters associated with causal parameters. In Chapter 4, we propose two online estimators for the ATE that are asymptotically efficient and doubly robust in a single pass through a data set. The first is similar to the augmented inverse probability of treatment weighting estimator in the batch setting, and the second involves an additional targeting step inspired by TMLE, which improves performance in some cases. We investigate the performance of both in a simulation study.
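
For orientation, the batch augmented inverse probability of treatment weighting (AIPW) estimator that the Chapter 4 summary uses as its point of comparison can be sketched as follows. This is a minimal sketch under stated assumptions, not the dissertation's online estimators: the toy data-generating process, the stratum-mean nuisance estimators (possible here only because the single covariate W is binary), and the names simulate and aipw_ate are all hypothetical.

```python
import math
import random


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def simulate(n, seed):
    """Toy observational data: W confounds treatment A and outcome Y."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        w = 1 if rng.random() < 0.5 else 0
        a = 1 if rng.random() < expit(-1.0 + 2.0 * w) else 0
        y = 1.0 * a + 2.0 * w + rng.gauss(0.0, 1.0)  # true ATE is 1
        data.append((w, a, y))
    return data


def mean(values):
    return sum(values) / len(values)


def aipw_ate(data):
    """Batch AIPW estimate of the average treatment effect.

    g[w] estimates the propensity score P(A=1 | W=w) and q[(a, w)] estimates
    the outcome regression E[Y | A=a, W=w], both by stratum means.
    """
    g = {w: mean([a for w_i, a, _ in data if w_i == w]) for w in (0, 1)}
    q = {(a, w): mean([y for w_i, a_i, y in data if a_i == a and w_i == w])
         for a in (0, 1) for w in (0, 1)}
    terms = []
    for w, a, y in data:
        plug_in = q[(1, w)] - q[(0, w)]
        # Inverse-probability-weighted residuals augment the plug-in term.
        correction = ((a / g[w]) * (y - q[(1, w)])
                      - ((1 - a) / (1 - g[w])) * (y - q[(0, w)]))
        terms.append(plug_in + correction)
    return mean(terms)


data = simulate(20_000, seed=7)
ate_hat = aipw_ate(data)
print(f"AIPW estimate of the ATE: {ate_hat:.3f}")
```

The estimate should land close to the treatment coefficient of 1 used in the simulation. An online variant would update the nuisance estimates and the running mean of the terms as each new batch arrives instead of recomputing them over the whole data set.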



Semiparametric Estimation In Network Formation Models With Homophily And Degree Heterogeneity


Author : Peter Toth
language : en
Publisher:
Release Date : 2017



This paper considers a semiparametric version of the network formation model of Graham (2017). The two-way fixed-effects binary choice model allows for homophily and degree heterogeneity, but unlike Graham (2017) leaves the distribution of pair-specific unobservables unspecified. Identification of the slope parameters and fixed effects follows from a novel approach that does not rely on distributional assumptions. The identification strategy suggests an estimator for the slope parameters based upon tetrads of nodes within the network. A computationally simple version of this estimator is shown to be consistent with a non-parametric convergence rate. A consistent estimator of the fixed effects is also provided. Partial identification, for the case of discrete covariate support, and an extension to nonlinear fixed effects are also considered.



Targeted Learning


Author : Mark J. van der Laan
language : en
Publisher: Springer Science & Business Media
Release Date : 2011-06-17

Category : Mathematics


The statistics profession is at a unique point in history. The need for valid statistical tools is greater than ever; data sets are massive, often measuring hundreds of thousands of measurements for a single subject. The field is ready to move towards clear objective benchmarks under which tools can be evaluated. Targeted learning allows (1) the full generalization and utilization of cross-validation as an estimator selection tool so that the subjective choices made by humans are now made by the machine, and (2) targeting the fitting of the probability distribution of the data toward the target parameter representing the scientific question of interest. This book is aimed at both statisticians and applied researchers interested in causal inference and general effect estimation for observational and experimental data. Part I is an accessible introduction to super learning and the targeted maximum likelihood estimator, including related concepts necessary to understand and apply these methods. Parts II-IX handle complex data structures and topics applied researchers will immediately recognize from their own research, including time-to-event outcomes, direct and indirect effects, positivity violations, case-control studies, censored data, longitudinal data, and genomic studies.



Unbroken Circles


Author : Cecilia B. Loving
language : en
Publisher:
Release Date : 2020-03-31



A book of poetry dedicated to the restorative justice practice of circle-keeping.



The Economics Of Artificial Intelligence


Author : Ajay Agrawal
language : en
Publisher: University of Chicago Press
Release Date : 2024-03-05

Category : Business & Economics


A timely investigation of the potential economic effects, both realized and unrealized, of artificial intelligence within the United States healthcare system. In sweeping conversations about the impact of artificial intelligence on many sectors of the economy, healthcare has received relatively little attention. Yet it seems unlikely that an industry that represents nearly one-fifth of the economy could escape the efficiency and cost-driven disruptions of AI. The Economics of Artificial Intelligence: Health Care Challenges brings together contributions from health economists, physicians, philosophers, and scholars in law, public health, and machine learning to identify the primary barriers to entry of AI in the healthcare sector. Across original papers and in wide-ranging responses, the contributors analyze barriers of four types: incentives, management, data availability, and regulation. They also suggest that AI has the potential to improve outcomes and lower costs. Understanding both the benefits of and barriers to AI adoption is essential for designing policies that will affect the evolution of the healthcare system.



Causal Inference In Statistics


Author : Judea Pearl
language : en
Publisher: John Wiley & Sons
Release Date : 2016-01-25

Category : Mathematics


Causal Inference in Statistics: A Primer. Causality is central to the understanding and use of data. Without an understanding of cause–effect relationships, we cannot use data to answer questions as basic as "Does this treatment harm or help patients?" But though hundreds of introductory texts are available on statistical methods of data analysis, until now, no beginner-level book has been written about the exploding arsenal of methods that can tease causal information from data. Causal Inference in Statistics fills that gap. Using simple examples and plain language, the book lays out how to define causal parameters; the assumptions necessary to estimate causal parameters in a variety of situations; how to express those assumptions mathematically; whether those assumptions have testable implications; how to predict the effects of interventions; and how to reason counterfactually. These are the foundational tools that any student of statistics needs to acquire in order to use statistical methods to answer causal questions of interest. This book is accessible to anyone with an interest in interpreting data, from undergraduates, professors, and researchers to the interested layperson. Examples are drawn from a wide variety of fields, including medicine, public policy, and law; a brief introduction to probability and statistics is provided for the uninitiated; and each chapter comes with study questions to reinforce the reader's understanding.