Understanding Black-box Predictions via Influence Functions
Pang Wei Koh and Percy Liang. International Conference on Machine Learning (ICML), 2017; recipient of an ICML 2017 best paper award.

With the rapid adoption of machine learning systems in sensitive applications, there is an increasing need to make black-box models explainable. How can we explain the predictions of a black-box model? In this paper, we use influence functions -- a classic technique from robust statistics -- to trace a model's prediction through the learning algorithm and back to its training data, thereby identifying the training points most responsible for a given prediction. To scale up influence functions to modern machine learning settings, we develop a simple, efficient implementation that requires only oracle access to gradients and Hessian-vector products. On linear models and convolutional neural networks, we demonstrate that influence functions are useful for multiple purposes: understanding model behavior, debugging models, detecting dataset errors, and even creating visually-indistinguishable training-set attacks. We show that even on non-convex and non-differentiable models where the theory breaks down, approximations to influence functions can still provide valuable information.

The setup: we are given training points $z_1, \ldots, z_n$, where $z_i = (x_i, y_i) \in \mathcal{X} \times \mathcal{Y}$, and the model parameters are obtained by minimizing the empirical risk. A running example is logistic regression, where $p(y \mid x) = \sigma(y \, \theta^\top x)$. Influence functions efficiently estimate the effect of removing a single training data point on a model's learned parameters; the paper's appendix provides a standard derivation of the influence function $\mathcal{I}_{\text{up,params}}$ in the context of loss minimization (M-estimation). The paper also plots $\mathcal{I}_{\text{up,loss}}$ against variants that drop its component terms, showing that those terms are necessary for picking up the truly influential training points.
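As a rough illustration of the quantity involved, the sketch below estimates $\mathcal{I}_{\text{up,loss}}(z, z_{\text{test}}) = -\nabla_\theta L(z_{\text{test}})^\top H^{-1} \nabla_\theta L(z)$ for a toy PyTorch model, approximating the inverse Hessian-vector product with a LiSSA-style recursion. This is a minimal sketch under made-up data and constants, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Toy stand-in for a trained model: a linear softmax classifier on random data.
n, d, k = 200, 10, 3
X, y = torch.randn(n, d), torch.randint(0, k, (n,))
x_test, y_test = torch.randn(1, d), torch.tensor([0])
W = torch.zeros(d, k, requires_grad=True)
params = [W]

def loss_on(x, t):
    return F.cross_entropy(x @ W, t)

def flat_grad(loss, create_graph=False):
    g = torch.autograd.grad(loss, params, create_graph=create_graph)
    return torch.cat([gi.reshape(-1) for gi in g])

def hvp(loss, vec):
    # Hessian-vector product H @ vec via double backprop.
    grad = flat_grad(loss, create_graph=True)
    return flat_grad(grad @ vec)

# s_test = H^{-1} grad_theta L(z_test), approximated with a LiSSA-style recursion.
# damp and scale are stability constants; depth controls the accuracy of the estimate.
v = flat_grad(loss_on(x_test, y_test))
h, damp, scale, depth = v.clone(), 0.01, 10.0, 50
for _ in range(depth):
    idx = torch.randint(0, n, (32,))
    h = v + h - (hvp(loss_on(X[idx], y[idx]), h) + damp * h) / scale
s_test = h / scale

# I_up,loss(z, z_test) = -grad L(z)^T H^{-1} grad L(z_test) for every training point z.
influences = torch.stack([-flat_grad(loss_on(X[i:i + 1], y[i:i + 1])) @ s_test
                          for i in range(n)])
# The most negative scores mark points whose removal would most increase the test loss.
print("most influential (helpful) training ids:", influences.argsort()[:5].tolist())
```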
A natural follow-up question concerns groups of points rather than single points: often we want to identify an influential group of training samples behind a particular test prediction of a given machine learning model, a setting studied in the follow-up paper "On the Accuracy of Influence Functions for Measuring Group Effects" by Koh*, Ang*, Teo*, and Liang.

The authors' original code release, the kohpangwei/influence-release repository on GitHub, replicates the experiments from the paper, and a reproducible, executable, and Dockerized version of these scripts is available on CodaLab Worksheets. A recorded talk is available at https://www.microsoft.com/en-us/research/video/understanding-black-box-predictions-via-influence-functions/. There is also an unofficial Chainer implementation of the paper.

The implementation notes that follow describe a PyTorch reimplementation of influence functions from this ICML 2017 best paper; a link to the reference implementation is given in the project README. Dependencies: NumPy, SciPy, scikit-learn, and Pandas, in addition to PyTorch itself.
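The reimplementation is used roughly as sketched below. Apart from ptif.get_default_config(), which is described in the notes that follow, the package import name, init_logging, and the calc_img_wise entry point (and its return values) are recalled from the project README rather than verified here, so treat them as assumptions and check them against the version you install.

```python
import torch.nn as nn
import pytorch_influence_functions as ptif   # assumed package name behind the `ptif` alias
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Stand-in classifier; in practice you would load your own trained model here.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))

transform = transforms.ToTensor()
trainloader = DataLoader(
    datasets.CIFAR10("data", train=True, download=True, transform=transform), batch_size=64)
testloader = DataLoader(
    datasets.CIFAR10("data", train=False, download=True, transform=transform), batch_size=1)

ptif.init_logging()                  # assumed logging helper from the project README
config = ptif.get_default_config()   # dict of calculation parameters (see below)

# Assumed entry point: computes the influence of every training image on the chosen test
# images and returns the influence values plus the harmful/helpful rankings described below.
influences, harmful, helpful = ptif.calc_img_wise(config, model, trainloader, testloader)
```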
config is a dict which contains the parameters used to calculate the influence functions, and you can get the default config by calling ptif.get_default_config(). I recommend changing the parameters that control the s_test approximation (for example its recursion depth and how many repetitions are averaged) to your liking, since they trade accuracy against run time.

By default, the values s_test and grad_z for each training image are computed on the fly, which can mean tens of thousands of gradient calculations. Most importantly, however, s_test depends only on the test sample and grad_z depends only on the training sample, so both can be precomputed and cached. Caching can speed up the calculation significantly, because no duplicate calculations take place, which saves computation time and reduces memory requirements. The cached values can take significant amounts of disk space (100s of GBs), but with a fast SSD the trade-off is usually worthwhile.
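For instance, one might tighten or loosen the s_test approximation by editing the returned config before passing it to the calculation. The key names below are recalled from the project README and should be treated as assumptions; print the dict to see the actual keys and defaults in your installed version.

```python
config = ptif.get_default_config()
print(config)                      # inspect the real parameter names and defaults

# Illustrative edits (key names are assumptions; adjust to what print(config) shows):
config["gpu"] = 0                  # which GPU to run on
config["recursion_depth"] = 5000   # depth of the s_test (inverse-HVP) recursion
config["r_averaging"] = 10         # number of s_test estimates averaged per test image
```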
The output separates helpful from harmful training points. Helpful is a list of numbers, which are the IDs of the training data samples ranked by how much they support the test prediction; harmful is the corresponding ranking of samples that hurt it. If there are n training samples, each influence value can be interpreted on a 1/n scale, since removing a single point corresponds to reweighting it by -1/n. When testing for a single test image, you can rank the training data directly by its influence on that prediction; if the influence function is calculated for multiple test images, the helpfulness is ordered by average helpfulness to the prediction outcomes across those images, and the harmfulness is ordered by average harmfulness. The same procedure scales to the prediction outcomes of an entire dataset, or even more than 1,000 test samples.

Here, we used CIFAR-10 as the dataset. The first figure shows which images from the training dataset were the most helpful for a given test prediction and which were the most harmful; the next figure shows the same but for a different model, DenseNet-100/12. Thus, you can easily find mislabeled images in your dataset, or compress your dataset slightly down to the images that are most influential for your individual test dataset.
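Continuing the usage sketch above, one way to act on these rankings is to dump the top-ranked harmful training images to disk and eyeball them for label noise; harmful is assumed to be a list of training-set indices, as described in the text.

```python
from torchvision.utils import save_image

# `trainloader` and `harmful` come from the usage sketch above.
train_set = trainloader.dataset
for rank, idx in enumerate(harmful[:10]):
    img, label = train_set[idx]
    # Inspect these files by hand: harmful, high-influence points are often mislabeled.
    save_image(img, f"harmful_{rank:02d}_label{label}.png")
```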
CSC2541 Winter 2021 - Department of Computer Science, University of Toronto

The remaining notes come from CSC2541 (Winter 2021), a University of Toronto graduate course on neural net training dynamics. Neural nets have achieved amazing results over the past decade in domains as broad as vision, speech, language understanding, medicine, robotics, and game playing. One would have expected this success to require overcoming significant obstacles that had been theorized to exist; in many cases, these networks have far more than enough parameters to memorize the data, so why do they generalize well? Why neural nets generalize despite their enormous capacity is intimately tied to the dynamics of training. Some of the ideas have been established decades ago (and perhaps forgotten by much of the community), and others are just beginning to be understood today; I'll attempt to convey our best modern understanding, as incomplete as it may be. Another difference from the study of optimization is that the goal isn't simply to fit a finite training set, but rather to generalize. Neither is this the sort of theory class where we prove theorems for the sake of proving theorems; for modern neural nets, the analysis is more often descriptive, taking the procedures practitioners are already using and figuring out why they (seem to) work. Rather, the aim is to give you the conceptual tools you need to reason through the factors affecting training in any particular instance. For a condensed overview, check out "CSC2541 for the Busy."

For this class, we'll use Python and the JAX deep learning framework. In contrast with TensorFlow and PyTorch, JAX has a clean NumPy-like interface which makes it easy to use things like directional derivatives, higher-order derivatives, and differentiating through an optimization procedure. There are several neural net libraries built on top of JAX; a fuller-featured one is a better choice if you want all the bells-and-whistles of a near-state-of-the-art model.
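As a small illustration of why the course uses JAX (a toy function, not course code): gradients, forward-mode directional derivatives, and higher-order derivatives all compose with ordinary NumPy-style code.

```python
import jax
import jax.numpy as jnp

def f(x):
    return jnp.sum(jnp.sin(x) ** 2)   # any smooth scalar function of a vector

x = jnp.arange(3.0)
v = jnp.ones(3)

g = jax.grad(f)(x)                     # gradient at x
_, dir_deriv = jax.jvp(f, (x,), (v,))  # directional derivative along v (forward mode)
H = jax.hessian(f)(x)                  # higher-order derivatives compose just as easily
print(g, dir_deriv, H.shape)
```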
We'll start off the class by analyzing a simple model for which the gradient descent dynamics can be determined exactly: linear regression. Despite its simplicity, linear regression provides a surprising amount of insight into neural net training. So far, we've assumed gradient descent optimization, but we can get faster convergence by considering more general dynamics, in particular momentum; we also see how to approximate second-order updates using conjugate gradient or Kronecker-factored approximations. The unit on infinite limits and overparameterization covers two complementary views: a classic result by Radford Neal showed that (using proper scaling) the distribution of functions of random neural nets approaches a Gaussian process, and the more recent Neural Tangent Kernel gives an elegant way to understand gradient descent dynamics in function space; time permitting, we'll also consider the limit of infinite depth. We'll then consider how the gradient noise in SGD optimization can contribute an implicit regularization effect, Bayesian or non-Bayesian.

Later in the term, we'll consider bilevel optimization in the context of the ideas covered thus far in the course. The canonical example in machine learning is hyperparameter optimization, but nested optimization also arises when we explicitly build optimization into the architecture, as in MAML or Deep Equilibrium Models. We'll consider the two most common techniques for bilevel optimization: implicit differentiation, and unrolling (a minimal unrolling sketch follows this overview). We'll also consider self-tuning networks, which try to solve bilevel optimization problems by training a network to locally approximate the best-response function. Things get more complicated when there are multiple networks being trained simultaneously to different cost functions; we'll mostly focus on minimax optimization, or zero-sum games. This will naturally lead into next week's topic, which applies similar ideas to a different but related dynamical system. This is a tentative schedule, which will likely change as the course goes on.
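Returning to the bilevel-optimization topic above, here is a minimal sketch of the unrolling approach in the hyperparameter-optimization setting the course uses as its canonical example. The toy data, step counts, and learning rates are made up for illustration: we differentiate a validation loss through a few unrolled inner SGD steps to get a hypergradient for a weight-decay parameter.

```python
import jax
import jax.numpy as jnp

# Toy train/validation split for ridge-style regression.
X_tr = jnp.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y_tr = jnp.array([1.0, 2.0, 3.0])
X_va = jnp.array([[2.0, 1.0]])
y_va = jnp.array([4.0])

def inner_loss(w, lam):
    # Training objective: squared error plus weight decay controlled by the hyperparameter lam.
    return jnp.mean((X_tr @ w - y_tr) ** 2) + lam * jnp.sum(w ** 2)

def val_loss_after_training(lam, steps=20, lr=0.1):
    w = jnp.zeros(2)
    for _ in range(steps):                        # unrolled inner optimization (plain SGD)
        w = w - lr * jax.grad(inner_loss)(w, lam)
    return jnp.mean((X_va @ w - y_va) ** 2)       # outer (validation) objective

# Hypergradient: d(validation loss)/d(lam), backpropagated through all unrolled steps.
hypergrad = jax.grad(val_loss_after_training)(0.1)
print(hypergrad)
```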
Logistics: the course is delivered online. Lectures will be delivered synchronously via Zoom, and recorded for asynchronous viewing by enrolled students. All information about attending virtual lectures, tutorials, and office hours will be sent to enrolled students through Quercus. Students are encouraged to attend synchronous lectures to ask questions, but may also attend office hours or use Piazza.

The marking scheme is as follows. The problem set will give you a chance to practice the content of the first three lectures, and will be due on Feb 10. The project proposal is due on Feb 17, and is primarily a way for us to give you feedback on your project idea; more details can be found in the project handout. A Colab notebook and paper presentation is worth 25%: your job will be to read and understand the paper, and then to produce a Colab notebook which demonstrates one of the key ideas from the paper; a sign-up sheet will be distributed via email. Presentation groups this term include Haoping Xu, Zhihuan Yu, and Jingcheng Niu; Alex Adam, Keiran Paster, and Jenny (Jingyi) Liu; James Tu, Yangjun Ruan, and Jonah Philion; Kelvin Wong, Siva Manivasagam, and Amanjit Singh Kainth; and Chris Zhang, Dami Choi, and Anqi (Joyce) Yang (presenting Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks).