Understanding the generative nature of LDA with R

Topic modeling is a suite of algorithms that discover latent topics in large corpora of texts. To better understand what topic modeling does, I'll explain the conceptual background behind the algorithm(s). Topic modeling finds topics in a document that summarize the document in a "compressed" manner - as a weighted …

Identifying outliers and influential cases

With experimental data, you commonly have to deal with "outliers", that is, data points that behave differently than the rest of the data for some reason. These outliers can influence the analysis and thus the interpretation of the data. In this blog post, we will look at these outliers and …

Accessing MongoDB from R with mongolite

Recently, I moved away from text files for data storage and started using MongoDB. While there are already two R packages (RMongo and rmongodb) interfacing with MongoDB, I was never completely satisfied with them - especially in comparison to the excellent PyMongo. A couple of days ago, a new package …