I am a postdoctoral researcher at the Department of Electronic Engineering, Tsinghua University, working with Prof. Bowen Zhou.
I received my Ph.D. from the Department of Computer Science and Technology, Tsinghua University, in 2023, advised by Prof. Hai-Tao Zheng and co-advised by Prof. Zhiyuan Liu.
Research
My research spans natural language processing and machine learning. At the current stage, I am particularly interested in the advanced stimulation of language models. My research aims to develop theory, tools, and algorithms that effectively and efficiently drive language models (especially large ones), and to establish a deeper understanding of them by observing model behaviors.
Parameter-efficient Fine-tuning of Large-scale Pre-trained Language Models
Ning Ding,
Yujia Qin,
Guang Yang,
Fuchao Wei,
Zonghan Yang,
Yusheng Su,
Shengding Hu,
Yulin Chen,
Chi-Min Chan,
Weize Chen,
Jing Yi,
Weilin Zhao,
Zhiyuan Liu,
Hai-Tao Zheng,
Jianfei Chen,
Yang Liu,
Jie Tang,
Juanzi Li,
and Maosong Sun
Nature Machine Intelligence
Cover Article of Nature Machine Intelligence's March Issue
World Artificial Intelligence Conference Youth Outstanding Paper Award
As pre-trained language models (PLMs) have become the fundamental infrastructure for various NLP tasks and researchers have readily enjoyed themselves in the pre-training and fine-tuning paradigm, evidence from emerging research has continuously proven that larger models tend to yield better performance. However, despite the welcome outcome, the process of fine-tuning large-scale PLMs brings prohibitive adaptation costs. In fact, fine-tuning all the parameters of a colossal model and retaining separate instances for different tasks are practically infeasible. This necessitates a new branch of research focusing on the parameter-efficient adaptation of PLMs. In order to unleash the imagination of the possible advantages of such methods, not limited to parameter efficiency, we coined a new term, delta tuning, from a morphological point of view to refer to the original "parameter-efficient tuning". In contrast with standard fine-tuning, delta tuning only fine-tunes a small portion of the model parameters while keeping the rest untouched, largely reducing both the computation and storage costs. Recent studies have demonstrated that a series of delta tuning methods with distinct tuned parameter selections could achieve performance on a par with full-parameter fine-tuning, suggesting a new promising way of stimulating large-scale PLMs. In this paper, we first formally describe the problem of delta tuning and then comprehensively review recent delta tuning approaches. We also propose a unified categorization criterion that divides existing delta tuning methods into three groups: addition-based, specification-based, and reparameterization-based methods. Though initially proposed as an efficient way to steer large models, we believe that some of the fascinating evidence discovered along with delta tuning could help further reveal the mechanisms of PLMs and even deep neural networks. To this end, we discuss the theoretical principles underlying the effectiveness of delta tuning and propose frameworks to interpret it from the perspectives of optimization and optimal control, respectively. Furthermore, we provide a holistic empirical study of representative methods, where results on over 100 NLP tasks demonstrate a comprehensive performance comparison of different approaches. The experimental results also cover the analysis of the combinatorial, scaling, and transferable properties of delta tuning. To facilitate research on delta tuning, we are also developing an open-source toolkit, OpenDelta, which enables practitioners to efficiently and flexibly implement delta tuning on PLMs. Finally, we discuss a series of real-world applications of delta tuning.
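To make the idea concrete, below is a minimal PyTorch sketch of one member of the reparameterization-based family, a LoRA-style low-rank update: the pre-trained weights stay frozen and only a small "delta" is trained. This is an illustrative sketch under assumed layer sizes, not the OpenDelta API or any specific method from the paper.

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wraps a frozen linear layer with a trainable low-rank delta: W x + (B A) x."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                      # keep pre-trained weights untouched
        self.lora_A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x):
        # Frozen path plus the low-rank update; only lora_A / lora_B receive gradients here.
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling

# Toy "backbone" with hypothetical dimensions: after injection, only the delta
# parameters and the small task head remain trainable.
backbone = nn.Sequential(nn.Linear(768, 768), nn.ReLU(), nn.Linear(768, 2))
backbone[0] = LoRALinear(backbone[0], rank=8)

trainable = sum(p.numel() for p in backbone.parameters() if p.requires_grad)
total = sum(p.numel() for p in backbone.parameters())
print(f"{trainable} trainable / {total} total parameters")

Storing only the low-rank matrices per task is what makes the adaptation cheap: the shared frozen backbone is kept once, while each task contributes a delta that is orders of magnitude smaller.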
OpenPrompt: An Open-source Framework for Prompt-learning
Ning Ding,
Shengding Hu,
Weilin Zhao,
Yulin Chen,
Zhiyuan Liu,
Hai-Tao Zheng,
and Maosong Sun
ACL System Demonstration
2022
Best Demo Paper Award
Prompt-learning has become a new paradigm in modern natural language processing, which directly adapts pre-trained language models (PLMs) to cloze-style prediction, autoregressive modeling, or sequence-to-sequence generation, resulting in promising performances on various tasks. However, no standard implementation framework of prompt-learning has been proposed yet, and most existing prompt-learning codebases, often unregulated, only provide limited implementations for specific scenarios. Since many details, such as the templating strategy, the initializing strategy, and the verbalizing strategy, need to be considered in prompt-learning, practitioners face impediments to quickly adapting the desired prompt-learning methods to their applications. In this paper, we present OpenPrompt, a unified, easy-to-use toolkit to conduct prompt-learning over PLMs. OpenPrompt is a research-friendly framework that is equipped with efficiency, modularity, and extensibility, and its combinability allows the freedom to combine different PLMs, task formats, and prompting modules in a unified paradigm. Users can expediently deploy prompt-learning frameworks and evaluate their generalization on different NLP tasks without constraints.
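For readers unfamiliar with the paradigm, here is a hand-rolled sketch of cloze-style prompt-learning using Hugging Face Transformers rather than OpenPrompt's own Template/Verbalizer abstractions; the sentiment task, template text, label words, and model name are all assumptions chosen for illustration.

import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

template = "{text} It was {mask}."                            # templating strategy
verbalizer = {"positive": "great", "negative": "terrible"}    # verbalizing strategy

text = "The movie had stunning visuals and a moving story."
prompt = template.format(text=text, mask=tokenizer.mask_token)

inputs = tokenizer(prompt, return_tensors="pt")
mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]

with torch.no_grad():
    logits = model(**inputs).logits[0, mask_pos]              # logits over the vocabulary at [MASK]

# Score each class by the logit of its label word at the masked position.
scores = {label: logits[0, tokenizer.convert_tokens_to_ids(word)].item()
          for label, word in verbalizer.items()}
print(max(scores, key=scores.get), scores)

OpenPrompt packages exactly these moving parts (PLM, template, verbalizer, and their combinations) behind modular, swappable classes so that such pipelines do not have to be rewritten for every new task format.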
Enhancing Chat Language Models by Scaling High-quality Instructional Conversations
Ning Ding,
Yulin Chen,
Bokai Xu,
Yujia Qin,
Shengding Hu,
Zhiyuan Liu,
Maosong Sun,
and Bowen Zhou
EMNLP
2023
The Ultra series also includes other works such as UltraFeedback, UltraInteract, and UltraMedical.
Recognizing relations between entities is a pivotal task of relational learning. Learning relation representations from distantly-labeled datasets is difficult because of the abundant label noise and complicated expressions in human language. This paper aims to learn predictive, interpretable, and robust relation representations from distantly-labeled data that are effective in different settings, including supervised, distantly supervised, and few-shot learning. Instead of solely relying on the supervision from noisy labels, we propose to learn prototypes for each relation from contextual information to best explore the intrinsic semantics of relations. Prototypes are representations in the feature space abstracting the essential semantics of relations between entities in sentences. We learn prototypes based on objectives with clear geometric interpretation, where the prototypes are unit vectors uniformly dispersed in a unit ball, and statement embeddings are centered at the end of their corresponding prototype vectors on the surface of the ball. This approach allows us to learn meaningful, interpretable prototypes for the final classification. Results on several relation learning tasks show that our model significantly outperforms the previous state-of-the-art models. We further demonstrate the robustness of the encoder and the interpretability of prototypes with extensive experiments.
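For intuition only, the following PyTorch sketch illustrates the kind of geometry described above: unit-norm prototypes attract the statement embeddings of their relation and repel one another. The losses here are simplified stand-ins that I am assuming for illustration, not the paper's exact objectives; the encoder outputs are replaced by random tensors.

import torch
import torch.nn.functional as F

def prototype_loss(embeddings, labels, prototypes, temperature=0.1):
    """Pull L2-normalized statement embeddings toward the unit-norm prototype
    of their relation and away from the other prototypes."""
    z = F.normalize(embeddings, dim=-1)          # statements on the unit sphere
    p = F.normalize(prototypes, dim=-1)          # one unit vector per relation
    logits = z @ p.T / temperature               # cosine similarity to each prototype
    return F.cross_entropy(logits, labels)

def uniformity_loss(prototypes):
    """Encourage prototypes to disperse over the unit ball via pairwise repulsion."""
    p = F.normalize(prototypes, dim=-1)
    off_diag = p @ p.T - torch.eye(len(p))
    return off_diag.pow(2).mean()

# Toy usage with random tensors standing in for sentence-encoder outputs.
num_relations, dim = 10, 256
prototypes = torch.nn.Parameter(torch.randn(num_relations, dim))
embeddings = torch.randn(32, dim)
labels = torch.randint(0, num_relations, (32,))
loss = prototype_loss(embeddings, labels, prototypes) + uniformity_loss(prototypes)
print(loss.item())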
Awards
Yunfan Award of WAIC, 2024.
Young Elite Scientists Sponsorship Program by CAST, 2023.
World Artificial Intelligence Conference Youth Outstanding Paper Award, 2023.
Shuimu Tsinghua Scholar Program, 2023.
Zhang Keqian Scholar Program, 2023.
Outstanding Doctoral Dissertation of Tsinghua University, 2023.
Outstanding Graduate of DCST, Tsinghua University, 2023.