Inferring the posteriors in LDA through Gibbs sampling

In my last blog post, which was about a million years ago, I described the generative nature of LDA and left the inferential step open. In this blog post, I will explain one method to compute estimates of the topic distribution θ and the term distribution ϕ. This approach, first applied to LDA by Griffiths and Steyvers (2004), is to use Gibbs sampling, a common algorithm within the Markov Chain Monte Carlo (MCMC) family of sampling algorithms. Before applying Gibbs sampling to LDA, I will first give a short introduction to Gibbs sampling more generally.
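
As a taste of what the post builds toward, here is a minimal sketch of a Gibbs sampler in Python. The bivariate normal target, the burn-in length, and the function name `gibbs_bivariate_normal` are my own illustrative choices, not from the post; the core idea is the same, though: repeatedly draw each variable from its conditional distribution given the current values of the others.

```python
import numpy as np

def gibbs_bivariate_normal(rho, n_samples=5000, burn_in=500, seed=0):
    """Gibbs sampler for a standard bivariate normal with correlation rho.

    The full conditionals are known in closed form:
        x | y ~ N(rho * y, 1 - rho^2)
        y | x ~ N(rho * x, 1 - rho^2)
    """
    rng = np.random.default_rng(seed)
    x, y = 0.0, 0.0          # arbitrary starting point
    samples = []
    for i in range(n_samples + burn_in):
        # Alternately draw each variable from its full conditional
        x = rng.normal(rho * y, np.sqrt(1 - rho**2))
        y = rng.normal(rho * x, np.sqrt(1 - rho**2))
        if i >= burn_in:     # discard the burn-in phase
            samples.append((x, y))
    return np.array(samples)

samples = gibbs_bivariate_normal(rho=0.8)
print(np.corrcoef(samples.T))  # empirical correlation should be close to 0.8
```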

more ...

Understanding the generative nature of LDA with R

Topic modeling refers to a suite of algorithms that discover latent topics in large corpora of texts. To better understand what topic modeling does, I'll explain the conceptual background behind the algorithm(s). Topic modeling finds topics that summarize a document in a "compressed" manner - as a weighted …
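
The full post works through this in R; as a language-agnostic preview, here is a minimal sketch in Python of LDA's generative story. The toy vocabulary size, the hyperparameter values, and the function name `generate_document` are my own illustrative choices: a document is generated by drawing a topic mixture θ, then drawing a topic and a word for each token.

```python
import numpy as np

rng = np.random.default_rng(42)

n_topics, vocab_size, doc_length = 3, 8, 20
alpha = 0.5   # Dirichlet prior on the per-document topic mixture theta
beta = 0.1    # Dirichlet prior on the per-topic term distribution phi

# Per-topic term distributions phi (one categorical over the vocabulary per topic)
phi = rng.dirichlet(np.full(vocab_size, beta), size=n_topics)

def generate_document():
    # Draw this document's topic mixture theta from the Dirichlet prior
    theta = rng.dirichlet(np.full(n_topics, alpha))
    words = []
    for _ in range(doc_length):
        z = rng.choice(n_topics, p=theta)       # draw a topic for this token
        w = rng.choice(vocab_size, p=phi[z])    # draw a word from that topic
        words.append(w)
    return words

print(generate_document())  # a document as a list of word indices
```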

more ...

Identifying outliers and influential cases

With experimental data, you commonly have to deal with "outliers", that is, data points that behave differently from the rest of the data for some reason. These outliers can influence the analysis and thus the interpretation of the data. In this blog post, we will look at these outliers and …
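
As a preview of the diagnostics the post works toward, here is a minimal sketch in Python. The simulated data, the planted outlier, and the hand-rolled Cook's distance computation are my own illustration, not the post's code: standardized residuals flag outliers, and Cook's distance flags influential cases in a simple linear regression.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated regression data with one planted outlier
x = rng.normal(size=30)
y = 2 * x + rng.normal(scale=0.5, size=30)
y[5] += 6  # make one observation an outlier

X = np.column_stack([np.ones_like(x), x])      # design matrix with intercept
H = X @ np.linalg.inv(X.T @ X) @ X.T           # hat (projection) matrix
leverage = np.diag(H)

beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta_hat
n, p = X.shape                                  # observations, parameters
sigma2 = resid @ resid / (n - p)                # residual variance estimate

# Internally studentized residuals: large values flag outliers
r = resid / np.sqrt(sigma2 * (1 - leverage))

# Cook's distance: combines residual size and leverage to flag influence
cooks_d = r**2 * leverage / (p * (1 - leverage))

print("suspected outliers:", np.where(np.abs(r) > 2)[0])     # |r| > 2 rule of thumb
print("influential cases:", np.where(cooks_d > 4 / n)[0])    # D > 4/n rule of thumb
```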

more ...

An Introduction to Gradient Descent in Python

Gradient descent is an optimization algorithm used to find the local minimum of a function. It is commonly used in many different machine learning algorithms. In this blog post, I will explain the principles behind gradient descent using Python, starting with a simple example of how gradient descent can be used to find the local minimum of a quadratic function, and then progressing to applying gradient descent to linear regression. By the end of the post, you should be able to code your own version of gradient descent and understand the concept behind it.
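
For a flavor of the first example, here is a minimal sketch; the function f(x) = x² - 4x + 4, the learning rate, and the stopping rule are my own illustrative choices. The idea is to repeatedly step in the direction of the negative gradient until the steps become negligible.

```python
def gradient_descent(grad, x0, learning_rate=0.1, tol=1e-8, max_iters=10_000):
    """Minimize a 1-D function by following its negative gradient."""
    x = x0
    for _ in range(max_iters):
        step = learning_rate * grad(x)
        x -= step
        if abs(step) < tol:   # stop once the updates become negligible
            break
    return x

# Example: f(x) = x^2 - 4x + 4 has its minimum at x = 2, and f'(x) = 2x - 4
minimum = gradient_descent(grad=lambda x: 2 * x - 4, x0=10.0)
print(minimum)  # approximately 2.0
```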

more ...