I'm a Senior Research Scientist at Salesforce Research in Palo Alto, where I work on Deep Learning and its applications to Natural Language Processing. I am particularly interested in efficient training methods and issues pertaining to generalization and scalability, as well as Deep Learning systems and tooling. I received my PhD from Northwestern University in 2017 under the supervision of Prof. Jorge Nocedal and Prof. Andreas Waechter. My PhD research focused on efficiently solving Mathematical Optimization problems that are nonsmooth or stochastic, a class that includes several problems in Machine Learning and Deep Learning.

Email: keskar.nitish@gmail.com

**CTRL: A Conditional Transformer Language Model for Controllable Generation.** __N. Keskar__, B. McCann, L. Varshney, C. Xiong & R. Socher

Paper: arXiv preprint, blog post

Code: GitHub

**Pretrained AI Models: Performativity, Mobility, and Change.** L. Varshney, __N. Keskar__ & R. Socher

Paper: arXiv preprint

**Neural Text Summarization: A Critical Evaluation.** W. Kryściński, __N. Keskar__, B. McCann, C. Xiong & R. Socher

Paper: EMNLP 2019

**XLDA: Cross-Lingual Data Augmentation for Natural Language Inference and Question Answering.** J. Singh, B. McCann, __N. Keskar__, C. Xiong & R. Socher

Paper: arXiv preprint

**Unifying Question Answering and Text Classification via Span Extraction.** __N. Keskar__, B. McCann, C. Xiong & R. Socher

Paper: arXiv preprint

**Coarse-Grain Fine-Grain Coattention Network for Multi-Evidence Question Answering.** V. Zhong, C. Xiong, __N. Keskar__ & R. Socher

Paper: ICLR 2019, arXiv preprint

**A Closer Look at Deep Learning Heuristics: Learning Rate Restarts, Warmup and Distillation.** A. Gotmare, __N. Keskar__, C. Xiong & R. Socher

Paper: ICLR 2019, arXiv preprint

**Identifying Generalization Properties in Neural Networks.** H. Wang, __N. Keskar__, C. Xiong & R. Socher

Paper: arXiv preprint, blog post

**The Natural Language Decathlon: Multitask Learning as Question Answering.** B. McCann, __N. Keskar__, C. Xiong & R. Socher

Paper: arXiv preprint, blog post

Code: GitHub

Press: VentureBeat, ZDNet, FAZ (German), SiliconAngle

**Using Mode Connectivity for Loss Landscape Analysis.** A. Gotmare, __N. Keskar__, C. Xiong & R. Socher

Paper: arXiv preprint

**An Analysis of Neural Language Modeling at Multiple Scales.** S. Merity, __N. Keskar__ & R. Socher

Paper: arXiv preprint

Code: GitHub

**Regularizing and Optimizing LSTM Language Models.** S. Merity, __N. Keskar__ & R. Socher

Paper: ICLR 2018, arXiv preprint

Code: GitHub

**Scalable Language Modeling: WikiText-103 on a Single GPU in 12 Hours.** S. Merity, __N. Keskar__, J. Bradbury & R. Socher

Paper: SysML 2018 (PDF)

**Improving Generalization Performance by Switching from Adam to SGD.** __N. Keskar__ & R. Socher

Paper: arXiv preprint

**Weighted Transformer Network for Machine Translation.** K. Ahmed, __N. Keskar__ & R. Socher

Paper: arXiv preprint

**Balancing Communication and Computation in Distributed Optimization.** A. Berahas, R. Bollapragada, __N. Keskar__ & E. Wei

Paper: IEEE Transactions on Automatic Control, arXiv preprint

**On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima.** __N. Keskar__, D. Mudigere, J. Nocedal, M. Smelyanskiy & P. T. P. Tang

Paper: ICLR 2017, arXiv preprint

Code: GitHub

This paper was selected for an oral presentation at ICLR 2017, one of only 15 papers chosen for oral presentation that year.

**A Limited-Memory Quasi-Newton Algorithm for Bound-Constrained Nonsmooth Optimization.** __N. Keskar__ & A. Waechter

Paper: Optimization Methods & Software, arXiv preprint

Code: GitHub

**adaQN: An Adaptive Quasi-Newton Algorithm for Training RNNs.** __N. Keskar__ & A. Berahas

Paper: Proceedings of ECML-PKDD 2016, arXiv preprint

Code: GitHub

**A Second-Order Method for Convex L1-Regularized Optimization with Active Set Prediction.** __N. Keskar__, J. Nocedal, F. Oztoprak & A. Waechter

Paper: Optimization Methods & Software, arXiv preprint

Code: GitHub

This paper won the 2016 Charles Broyden Prize for the best paper published in the Optimization Methods & Software journal.

**A Nonmonotone Learning Rate Strategy for SGD Training of Deep Neural Networks.** __N. Keskar__ & G. Saon

Paper: Proceedings of IEEE ICASSP (2015)

Interests: Deep Learning, Natural Language Processing, Research Tooling, Nonlinear Optimization, Scientific Computing