My research interests are Natural Language Processing (NLP) and Machine Learning (ML). In particular, I focus on developing NLP algorithms and techniques that deal with the interaction between syntax and semantics. I am also strongly interested in investigating whether the large language models (LMs) developed in recent years, such as BERT and the GPT models, can learn structural representations of text. Beyond techniques, I am also interested in working with text from specific domains, for example biomedical text and privacy policies.
I am currently working on the following projects. For previous projects, please see the publications page.
The goal of this project is to develop NLP techniques that extract relations, in particular the cause-effect relation, from biomedical text, using transfer learning from large pre-trained language models. Our approach extracts entities (e.g. drug, effect, and procedure names) and the relations between them (e.g. 'drugA causes effectB') at the same time, which is known as joint extraction.
The goal of this project is to develop an automatic privacy policy summarization system. The system analyzes privacy policies posted online and displays a summary of each policy made up of the important sentences it extracts. We started the project a few years ago and have been improving the system continuously since then. Our current work is a complete rebuild of the system, from back end to front end, using the state-of-the-art AI, ML, and NLP technologies developed in recent years.
The goal of this project is to develop an efficient technique for controlling Natural Language Generation (NLG). Recently, numerous articles and papers have documented racial, religious, gender, and other biases in the text generated by NLG models. Clearly, NLG must be controlled so that it is steered away from generating such text. The technique we aim to develop will not only be effective at mitigating biased and toxic language but also be computationally efficient. It will also be general, applicable to a variety of NLG tasks such as dialogue (conversation) and story generation. We will evaluate the technique using both automatic and human evaluation methods.