Welcome to the 14th issue of the Papers with Code newsletter. In this edition, we cover:
- a summary of some of the latest advances in the lottery ticket hypothesis,
- state-of-the-art object detectors (YOLOX series),
- top trending papers of July 2021,
- ... and much more.
Trending Papers with Code 📄
Latest Advances in the Lottery Ticket Hypothesis
The Lottery Ticket Hypothesis suggests that dense, randomly-initialized deep neural networks (DNNs) contain subnetworks ("winning lottery tickets") that, when trained in isolation, match the test accuracy of the original network in a comparable number of iterations.
The hypothesis has inspired a range of ideas and findings aimed at improving deep neural networks, in particular by delivering significant savings in memory footprint and inference time.
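To make the procedure concrete, below is a minimal sketch of iterative magnitude pruning (IMP), the standard way winning tickets are found. The `model.weights` dictionary and the `train` function are assumed placeholders, not a real API:

```python
import copy
import numpy as np

def imp(model, train, rounds=5, prune_frac=0.2):
    """Iterative magnitude pruning sketch: after each round of training,
    remove the smallest-magnitude surviving weights and rewind the
    remaining weights to their original initialization."""
    init = copy.deepcopy(model.weights)              # w_0, kept for rewinding
    mask = {k: np.ones_like(w) for k, w in init.items()}
    for _ in range(rounds):
        train(model, mask)                           # train with mask applied
        for k, w in model.weights.items():
            alive = np.abs(w[mask[k] == 1])
            thresh = np.quantile(alive, prune_frac)  # prune lowest fraction
            mask[k] *= (np.abs(w) >= thresh)
        model.weights = {k: init[k] * mask[k] for k in init}  # rewind
    return mask
```

Repeating this train-prune-rewind loop is what makes finding tickets expensive, a point the multi-prize work below addresses directly.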
In this issue, we highlight some of the recent advances and findings involving the lottery ticket hypothesis:
A Generalized Lottery Ticket Hypothesis
CIFAR-10 test accuracy plotted against compression ratio when iterative magnitude pruning (IMP) is used for non-canonical dictionaries. LEFT: Random dictionary. RIGHT: Using bottleneck dictionaries with either identity blocks (ID) or discrete cosine transform (DCT). Figure source: Alabdulmohsin et al. (2021).
The original work by Frankle and Carbin shows that over 95% of model parameters can be pruned without affecting accuracy, and that a critical compression threshold exists beyond which test accuracy decreases rapidly. However, the original lottery ticket hypothesis considers unstructured sparsity, which in practice may not lead to significant speedups because current accelerators are optimized for dense matrix operations. Alabdulmohsin et al. (2021) present the "Generalized Lottery Ticket Hypothesis", which encompasses both unstructured and structured pruning under a common framework.
Key takeaways: The authors note that in the original lottery ticket hypothesis the notion of sparsity ("many weights are zero") stems from a particular choice of basis in the DNN parameter space (the canonical basis). The "Generalized Lottery Ticket Hypothesis" provides evidence that the same phenomenon holds for arbitrary choices of basis and their implied notions of sparsity, effectively relaxing what "sparse" means. Given these observations, the authors suggest that better compression can be achieved with a more appropriate basis, such as the discrete cosine transform (DCT), and that structured pruning can be obtained by carefully selecting the basis while reusing algorithms originally developed for unstructured sparsity, e.g., iterative magnitude pruning (IMP). A minimal sketch of one such pruning step follows below.
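Here is a hedged illustration (an assumption on our part, not the authors' code) of a single magnitude-pruning step carried out in a DCT basis rather than the canonical weight basis; `keep_ratio` is a hypothetical hyperparameter:

```python
import numpy as np
from scipy.fft import dct, idct

def prune_in_dct_basis(w, keep_ratio=0.05):
    """Express a weight vector in the DCT basis, keep only the
    largest-magnitude coefficients, and map back to weight space."""
    c = dct(w, norm="ortho")                # change of basis
    k = max(1, int(c.size * keep_ratio))    # number of coefficients to keep
    thresh = np.sort(np.abs(c))[-k]
    c[np.abs(c) < thresh] = 0.0             # "sparsity" in the new basis
    return idct(c, norm="ortho")            # back to the canonical basis
```

With the identity transform in place of the DCT this reduces to ordinary magnitude pruning; swapping in a different orthonormal basis changes the implied notion of sparsity, which is the generalization the paper studies.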
Multi-Prize Lottery Ticket Hypothesis
Multi-prize tickets (MPT), obtained only by pruning and binarizing random networks, outperform trained full precision and state-of-the-art binary weight networks. Figure source: Diffenderfer and Kailkhura (2021).
Another challenge with the lottery ticket hypothesis is efficiently finding winning tickets (high-performing subnetworks): the search is expensive, requiring repeated rounds of training and pruning. Diffenderfer and Kailkhura (2021) recently proposed the "Multi-Prize Lottery Ticket Hypothesis", suggesting that a "sufficiently over-parameterized neural network with random weights contains several subnetworks that (a) have comparable accuracy to a dense target network with learned weights (prize 1), (b) do not require any further training to achieve prize 1 (prize 2), and (c) is robust to extreme forms of quantization (prize 3)".
Main findings: This work provides a framework for learning compact, highly accurate binary neural networks simply by pruning and quantizing randomly weighted full-precision networks. The authors also propose an algorithm for finding multi-prize tickets, evaluated on CIFAR-10 and ImageNet. Results show that as networks grow deeper and wider, multi-prize tickets approach the test accuracy of their full-precision counterparts. Without ever updating the weight values, these multi-prize tickets set new state-of-the-art results for binary neural networks on CIFAR-10; a sketch of the prune-and-binarize step appears below. More results here.
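As a rough illustration of the prune-and-binarize idea (a sketch under assumptions, not the authors' algorithm), suppose we have a layer's frozen random weights `w` and a learned per-weight score array `scores`, both hypothetical:

```python
import numpy as np

def multi_prize_ticket(w, scores, keep_ratio=0.5):
    """Keep the top fraction of weights ranked by score and binarize
    the survivors to sign(w) scaled by their mean magnitude."""
    k = max(1, int(w.size * keep_ratio))
    thresh = np.sort(np.abs(scores), axis=None)[-k]
    mask = np.abs(scores) >= thresh         # subnetwork selection (prizes 1-2)
    alpha = np.abs(w[mask]).mean()          # per-layer scaling factor
    return alpha * np.sign(w) * mask        # binarized weights (prize 3)
```

In practice the scores would be learned with a straight-through estimator while the random weights stay frozen; that training loop is omitted here for brevity.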
Lottery Ticket Hypothesis: A Collection of Recent Papers
Below we provide a collection of further readings on the lottery ticket hypothesis:
📄 The Lottery Ticket Hypothesis for Supervised and Self-supervised Pre-training in Computer Vision Models - Chen et al. (2021)
📄 The Elastic Lottery Ticket Hypothesis - Chen et al. (2021)
📄 A Lottery Ticket Hypothesis Framework for Low-Complexity Device-Robust Neural Acoustic Scene Classification - Yang et al. (2021)
📄 A Unified Lottery Ticket Hypothesis for Graph Neural Networks - Chen et al. (2021)
📄 The Lottery Ticket Hypothesis for Pre-trained BERT Networks - Chen et al. (2020)
📄 Linear Mode Connectivity and the Lottery Ticket Hypothesis - Frankle et al. (2020)
📄 Winning Lottery Tickets in Deep Generative Models - Kalibhat et al. (2020)
📄 It's Hard for Neural Networks to Learn the Game of Life - Springer and Kenyon (2020)
📄 Stabilizing the Lottery Ticket Hypothesis - Frankle et al. (2019)
📄 One ticket to win them all - Morcos et al. (2019)
YOLOX: Exceeding YOLO Series in 2021
LEFT: Speed-accuracy tradeoff of several models. RIGHT: Size-accuracy curve of lite models on mobile devices. YOLOX denotes the proposed models, compared against other state-of-the-art object detectors. Figure source: Ge et al. (2021).
In recent years, YOLO-type models have driven many advances in object detection, aiming for an optimal speed-accuracy tradeoff in real-time applications. Models in the YOLO series have always leveraged the most advanced techniques available at the time (e.g., residual networks for YOLOv3). Ge et al. (2021) recently proposed improving the YOLO series with recent techniques such as anchor-free detection and advanced label assignment strategies.
Main findings: The proposed high-performance models switch the YOLO detector to an anchor-free design. They also adopt other advanced detection techniques, such as a decoupled head and the leading label assignment strategy, SimOTA. State-of-the-art results are attained across a wide range of model sizes. For instance, YOLOX-Nano (a 0.91M-parameter model with 1.08G FLOPs) achieves 25.3% AP on COCO, surpassing other models like NanoDet. YOLOX-L attains 50.0% AP on COCO, outperforming the similarly sized YOLOv5-L. Overall, the YOLOX models achieve a better speed-accuracy trade-off than competing models across all model sizes; a sketch of the decoupled head is given below.
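To illustrate one of these ingredients, here is a minimal PyTorch sketch of a decoupled detection head, where classification and box regression run in separate branches instead of sharing one convolution. The layer widths and the combined box-plus-objectness output are our own simplifications, not the paper's exact architecture:

```python
import torch.nn as nn

class DecoupledHead(nn.Module):
    """Sketch of a decoupled head: a shared 1x1 stem followed by
    parallel classification and regression branches."""
    def __init__(self, in_ch, num_classes, width=256):
        super().__init__()
        self.stem = nn.Conv2d(in_ch, width, 1)
        self.cls_branch = nn.Sequential(
            nn.Conv2d(width, width, 3, padding=1), nn.SiLU(),
            nn.Conv2d(width, num_classes, 1))  # per-location class scores
        self.reg_branch = nn.Sequential(
            nn.Conv2d(width, width, 3, padding=1), nn.SiLU(),
            nn.Conv2d(width, 4 + 1, 1))        # box offsets + objectness

    def forward(self, x):
        x = self.stem(x)
        return self.cls_branch(x), self.reg_branch(x)
```

Separating the two tasks resolves the conflict between classification and regression features that a single coupled head forces together, which the paper reports speeds up convergence and improves accuracy.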
Top Trending Papers of July 2021 🏆
Below we highlight the top trending papers of July 2021:
📄 YOLOX - Ge et al. (2021)
📄 CBNetV2 - Liang et al. (2021)
📄 Per-Pixel Classification is Not All You Need for Semantic Segmentation - Cheng et al. (2021)
📄 Focal Self-Attention for Local-Global Interactions in Vision Transformers - Yang et al. (2021)
📄 Polarized Self-Attention - Liu et al. (2021)
📄 Depth-supervised NeRF - Deng et al. (2021)
📄 Real-ESRGAN - Wang et al. (2021)
📄 Global Filter Networks for Image Classification - Rao et al. (2021)
📄 CSWin Transformer - Dong et al. (2021)
📄 Codex: GPT Language Model Fine-Tuned on Code - Chen et al. (2021)
Trending Datasets and Libraries 🛠
Trending datasets
WikiGraphs - a dataset of Wikipedia articles each paired with a knowledge graph, to facilitate research in conditional text generation, graph generation and graph representation learning.
MultiBench - a unified large-scale benchmark for multimodal learning spanning 15 datasets, 10 modalities, 20 prediction tasks, and 6 research areas.
QVHighlights - a dataset for detecting customized moments and highlights from videos given natural language.
Trending libraries/tools
Small-text - a Python-based active learning library offering pool-based active learning for text classification.
Fedlearn-Algo - an open-source privacy preserving machine learning platform.
HW2VEC - an open-source graph learning tool that lowers the threshold for newcomers to research hardware security applications with graphs.
Community Highlights ✍️
We would like to thank:
- @muhaochen for contributing to Tasks and Datasets, including the addition of results to a recent paper on entity alignment for knowledge graphs.
- @byunghak for contributing to Tasks, including the addition of the Medical Code Prediction task.
- @raysonlaroca for contributing to Datasets, including the Vehicle-Rear dataset.
- @mrudolph, @yzhang, @nairouz, and @kirillova-anastasia for contributing to several benchmarks.
- @donovanOng for contributing to several datasets and benchmarks.
Special thanks to the hundreds of other contributors for their ongoing contributions to Papers with Code.
More on Papers with Code 🔗
🔥 Hot Research on Papers with Code 🔥
We have recently released a new feature on Papers with Code that allows you to keep track of trending papers discussed and shared by the community.
An Overview of Distributed Methods
We recently put together a compilation of distributed methods for scaling deep learning to very large models.
---
We would be happy to hear your thoughts and suggestions on the newsletter. Please reply to elvis@paperswithcode.com.