Regression (and Scoring) Aware Inference with LLMs
Michal Lukasik, Harikrishna Narasimhan, Aditya Krishna Menon, Felix Yu, Sanjiv Kumar.
In
EMNLP (findings),
2024.
What do larger image classifiers memorise?
Michal Lukasik, Vaishnavh Nagarajan, Ankit Singh Rawat, Aditya Krishna Menon, Sanjiv Kumar.
In
TMLR,
2024.
It's an Alignment, Not a Trade-off: Revisiting Bias and Variance in Deep Models
Lin Chen, Michal Lukasik, Wittawat Jitkrittum, Chong You, Sanjiv Kumar.
In
ICLR (spotlight presentation),
2024.
Two-stage LLM Fine-tuning with Less Specialization and More Generalization
Yihan Wang, Si Si, Daliang Li, Michal Lukasik, Felix Yu, Cho-Jui Hsieh, Inderjit S Dhillon, Sanjiv Kumar.
In
ICLR,
2024.
ResMem: Learn what you can and memorize the rest
Zitong Yang, Michal Lukasik, Vaishnavh Nagarajan, Zonglin Li, Ankit Rawat, Manzil Zaheer, Aditya Menon, Sanjiv Kumar.
In
NeurIPS,
2023.
Large language models with controllable working memory
Daliang Li, Ankit Singh Rawat, Manzil Zaheer, Xin Wang, Michal Lukasik, Andreas Veit, Felix Yu, Sanjiv Kumar.
In
ACL (findings),
2023.
Robust distillation for worst-class performance: on the interplay between teacher and student objectives
Serena Wang, Harikrishna Narasimhan, Yichen Zhou, Sara Hooker, Michal Lukasik, Aditya Krishna Menon.
In
UAI,
2023.
Teacher's pet: understanding and mitigating biases in distillation
Michal Lukasik, Srinadh Bhojanapalli, Aditya Krishna Menon, Sanjiv Kumar.
In
TMLR,
2022.
Semantic Label Smoothing for Sequence to Sequence Problems
Michal Lukasik, Himanshu Jain, Aditya Krishna Menon, Seungyeon Kim, Srinadh Bhojanapalli, Felix Yu, Sanjiv Kumar.
In
EMNLP,
2020.
Text Segmentation by Cross Segment Attention
Michal Lukasik, Boris Dadachev, Gonçalo Simões, Kishore Papineni.
In
EMNLP,
2020.
Does label smoothing mitigate label noise?
Michal Lukasik, Srinadh Bhojanapalli, Aditya Krishna Menon, Sanjiv Kumar.
In
ICML,
2020.
Scaling Graph Neural Networks with Approximate PageRank
Aleksandar Bojchevski, Johannes Klicpera, Bryan Perozzi, Amol Kapoor, Martin Blais, Benedek Rózemberczki, Michal Lukasik, Stephan Günnemann.
In
KDD,
2020.
Longitudinal Modeling of Social Media with Hawkes Process based on Users and Networks
P.K. Srijith, Michal Lukasik, Kalina Bontcheva and Trevor Cohn.
In
The IEEE/ACM International Conference on Social Networks Analysis and Mining, ASONAM,
2017. Abstract
Online social networks provide a platform for sharing information at an unprecedented scale. Users generate information which propagates across the network resulting in information cascades. In this paper, we study the evolution of information cascades in Twitter using a point process model of user activity. We develop several Hawkes process models considering various properties including conversational structure, users’ connections and general features of users including the textual information, and show how they are helpful in modeling the social network activity. We consider low-rank embeddings of users and user features, and learn the features helpful in identifying the influence and susceptibility of users. Evaluation on Twitter data sets associated with civil unrest shows that incorporating richer properties improves the performance in predicting future activity of users and memes.
Computational approach to dendritic spine taxonomy and shape transition analysis
Grzegorz Bokota, Marta Magnowska, Tomasz Kusmierczyk, Michal Lukasik, Matylda Roszkowska, Dariusz Plewczynski.
In
Frontiers in Computational Neuroscience,
2017. Abstract
The common approach in morphological analysis of dendritic spines of mammalian neuronal cells is to categorize spines into subpopulations based on whether they are stubby, mushroom, thin, or filopodia shaped. The corresponding cellular models of synaptic plasticity, long-term potentiation, and long-term depression associate the synaptic strength with either spine enlargement or spine shrinkage. Although a variety of automatic spine segmentation and feature extraction methods were developed recently, no approaches allowing for an automatic and unbiased distinction between dendritic spine subpopulations and detailed computational models of spine behavior exist. We propose an automatic and statistically based method for the unsupervised construction of spine shape taxonomy based on arbitrary features. The taxonomy is then utilized in the newly introduced computational model of behavior, which relies on transitions between shapes. Models of different populations are compared using supplied bootstrap-based statistical tests. We compared two populations of spines at two time points. The first population was stimulated with long-term potentiation, and the other in the resting state was used as a control. The comparison of shape transition characteristics allowed us to identify the differences between population behaviors. Although some extreme changes were observed in the stimulated population, statistically significant differences were found only when whole models were compared.
Stance classification in Rumours as a Sequential Task Exploiting the Tree Structure of Social Media Conversations
Arkaitz Zubiaga, Elena Kochkina, Maria Liakata, Rob Procter, Michal Lukasik.
In
26th International Conference on Computational Linguistics, COLING,
2016. Abstract
Rumour stance classification, the task that determines if each tweet in a collection discussing a rumour is supporting, denying, questioning or simply commenting on the rumour, has been attracting substantial interest. Here we introduce a novel approach that makes use of the sequence of transitions observed in tree-structured conversation threads in Twitter. The conversation threads are formed by harvesting users’ replies to one another, which results in a nested tree-like structure. Previous work addressing the stance classification task has treated each tweet as a separate unit. Here we analyse tweets by virtue of their position in a sequence and test two sequential classifiers, Linear-Chain CRF and Tree CRF, each of which makes different assumptions about the conversational structure. We experiment with eight Twitter datasets, collected during breaking news, and show that exploiting the sequential structure of Twitter conversations achieves significant improvements over the non-sequential methods. Our work is the first to model Twitter conversations as a tree structure in this manner, introducing a novel way of tackling NLP tasks on Twitter conversations.
Hawkes Processes for Continuous Time Sequence Classification: an Application to Rumour Stance Classification in Twitter
Michal Lukasik, P. K. Srijith, Duy Vu, Kalina Bontcheva, Arkaitz Zubiaga, Trevor Cohn.
In
Proceedings of the 54th annual meeting of the Association for Computational Linguistics, ACL,
2016. Abstract
Classification of temporal textual data sequences is a common task in various domains such as social media and the Web. In this paper we propose to use Hawkes Processes for classifying sequences of temporal textual data, which exploit both temporal and textual information. Our experiments on rumour stance classification on four Twitter datasets show the importance of using the temporal information of tweets along with the textual content.
Metrics for Evaluation of Word-level Machine Translation Quality Estimation
Varvara Logacheva, Michal Lukasik and Lucia Specia.
In
Proceedings of the 54th annual meeting of the Association for Computational Linguistics, ACL,
2016. Abstract
The aim of this paper is to investigate suitable evaluation strategies for the task of word-level quality estimation of machine translation. We suggest various metrics to replace F1-score for the “BAD” class, which is currently used as main metric. We compare the metrics’ performance on real system outputs and synthetically generated datasets and suggest a reliable alternative to the F1-BAD score — the multiplication of F1-scores for different classes. Other metrics have lower discriminative power and are biased by unfair labellings.
Convolution Kernels for Discriminative Learning from Streaming Text
Michal Lukasik, Trevor Cohn.
In
Proceedings of the Thirtieth AAAI Conference. AAAI,
2016. Abstract
Time series modeling is an important problem with many applications in different domains. Here we consider discriminative learning from time series, where we seek to predict an output response variable based on time series input. We develop a method based on convolution kernels to model discriminative learning over streams of text. Our method outperforms competitive baselines in three synthetic and two real datasets, rumour frequency modeling and popularity prediction tasks.
Classifying Tweet Level Judgements of Rumours in Social Media
Michal Lukasik, Trevor Cohn and Kalina Bontcheva.
In
Proceedings of Empirical Methods of Natural Language Processing, EMNLP,
2015. Abstract
Social media is a rich source of rumours and corresponding community reactions. Rumours reflect different characteristics, some shared and some individual. We formulate the problem of classifying tweet level judgements of rumours as a supervised learning task. Both supervised and unsupervised domain adaptation are considered, in which tweets from a rumour are classified on the basis of other annotated rumours. We demonstrate how multi-task learning helps achieve good results on rumours from the 2011 England riots.
Modeling Tweet Arrival Times using Log-Gaussian Cox Processes
Michal Lukasik, Srijith Prabhakaran Nair Kusumam, Trevor Cohn and Kalina Bontcheva.
In
Proceedings of Empirical Methods of Natural Language Processing, EMNLP,
2015. Abstract
Research on modeling time series text corpora has typically focused on predicting what text will come next, but less well studied is predicting when the next text event will occur. In this paper we address the latter case, framed as modeling continuous inter-arrival times under a log-Gaussian Cox process, a form of inhomogeneous Poisson process which captures the varying rate at which the tweets arrive over time. In an application to rumour modeling of tweets surrounding the 2014 Ferguson riots, we show how interarrival times between tweets can be accurately predicted, and that incorporating textual features further improves predictions.
Point process modelling of rumour dynamics in social media
Michal Lukasik, Trevor Cohn and Kalina Bontcheva.
In
Proceedings of the 53rd annual meeting of the Association for Computational Linguistics, ACL,
2015. Abstract
Rumours on social media exhibit complex temporal patterns. This paper develops a model of rumour prevalence using a point process, namely a log-Gaussian Cox process, to infer an underlying continuous temporal probabilistic model of post frequencies. To generalize over different rumours, we present a multi-task learning method parametrized by the text in posts, to allow data statistics to be shared between groups of similar rumours. Our experiments demonstrate that our model outperforms several strong baseline methods for rumour frequency prediction evaluated on tweets from the 2014 Ferguson riots.
Hierarchical, Multi-label Classification of Scholarly Publications: Modifications of ML-KNN Algorithm
Michal Lukasik, Tomasz Kusmierczyk, Lukasz Bolikowski, Hung Son Nguyen.
In
Intelligent Tools for Building a Scientific Information Platform,
2013. Abstract
One of the common problems when dealing with digital libraries is the lack of classification codes in some of the documents. In the following publication we deal with this problem in the multi-label, hierarchical case of the Mathematics Subject Classification System. We develop modifications of the ML-KNN algorithm and show how they improve the results given by the algorithm on the example of Springer textual data.
Evaluation of Features for Author Name Disambiguation Using Linear Support Vector Machines
Piotr Jan Dendek, Lukasz Bolikowski, Michal Lukasik.
In
Document Analysis Systems, DAS,
2012. Abstract
Author name disambiguation makes it possible to distinguish between two or more authors sharing the same name. In a previous paper, we proposed a name disambiguation framework in which for each author name in each article we build a context consisting of classification codes, bibliographic references, co-authors, etc. Then, by pairwise comparison of contexts, we group contributions likely referring to the same people. In this paper we examine which elements of the context are most effective in author name disambiguation. We employ linear Support Vector Machines (SVM) to find the most influential features.