Motivation for nonparametric Bayesian models

Original post by mb649b884ce232e (51CTO blog), 2023-06-29 10:03:40. Copyright belongs to the author; contact the author for permission to repost, as unauthorized reposting may be subject to legal action.

Example 1: GMM

Suppose we are given a set of observed data forming two clusters, generated by a GMM with two components, and we need to find a model that fits them.

With a parametric model, we can either inspect the observed data or assume empirically that the model is a GMM with two components.

However, suppose we add more training data that contains an additional cluster, giving three clusters in total. A parametric GMM with two components can only re-estimate the parameters of those two components (for example, the component weights); it cannot add a third component. Obviously, it cannot fit the data well.

A nonparametric model, in contrast, can change its structure, or its number of parameters, depending on the size of the data.
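The limitation of a fixed number of components can be checked numerically. The following is only a minimal sketch under stated assumptions: the data are a made-up 1-D toy set with three clusters, and the names `fit_gmm_1d`, `offsets`, and `three_clusters` are hypothetical helpers written for this illustration, not part of any library. It fits a GMM by plain EM with two and with three components and compares the average log-likelihood. The three-component fit is included only as a reference for what a model with enough components can achieve.

```python
import math

def fit_gmm_1d(xs, k, n_iter=200):
    """Fit a 1-D Gaussian mixture with a FIXED number k of components by EM."""
    xs = sorted(xs)
    n = len(xs)
    # Initialise means at evenly spaced quantiles, with equal weights
    # and the overall data variance for every component.
    means = [xs[int((j + 0.5) * n / k)] for j in range(k)]
    mean_all = sum(xs) / n
    var_all = sum((x - mean_all) ** 2 for x in xs) / n
    variances = [var_all] * k
    weights = [1.0 / k] * k

    def density(x, j):
        # Weighted Gaussian density of component j at x.
        norm = math.sqrt(2.0 * math.pi * variances[j])
        return weights[j] * math.exp(-0.5 * (x - means[j]) ** 2 / variances[j]) / norm

    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            p = [density(x, j) for j in range(k)]
            s = sum(p)
            resp.append([pj / s for pj in p])
        # M-step: re-estimate weight, mean and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var_j, 1e-3)  # floor to avoid collapse

    avg_ll = sum(math.log(sum(density(x, j) for j in range(k))) for x in xs) / n
    return weights, means, variances, avg_ll

# Toy data (an assumption for illustration): three tight clusters near 0, 5 and 10.
offsets = [-0.4, -0.2, 0.0, 0.2, 0.4]
three_clusters = [c + o for c in (0.0, 5.0, 10.0) for o in offsets]

_, means2, _, ll2 = fit_gmm_1d(three_clusters, k=2)
_, means3, _, ll3 = fit_gmm_1d(three_clusters, k=3)
print("K=2 means:", means2, "avg log-likelihood:", ll2)
print("K=3 means:", means3, "avg log-likelihood:", ll3)
```

With only two components, at least one component has to stretch over more than one cluster, so its average log-likelihood is lower than that of the three-component fit. A parametric model needs the analyst to change K by hand; a nonparametric model lets the data decide.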

Example 2: topic modelling

Similarly, suppose we have a training set of 6 words w1-w6, and each word belongs to a distinct topic (z1-z6 are all different). If theta has 6 components, it can fit these words well. Now suppose we add a word w7 with a new topic z7, so that z1-z7 are all different. If theta still has 6 components, it cannot fit the data well, and LDA may generate a topic-word distribution such as the following example:

topic 1: w1

topic 2: w2

topic 3: w3

topic 4: w4, w7

topic 5: w5

topic 6: w6

As a result, topic 4 is not meaningful: its topic words are not coherent!

With a nonparametric method, theta can be expanded to 7 components, depending on the prior process.

For nonparametric models, we usually place a prior process on the distribution that we want to estimate, such as a Gaussian process (GP), Dirichlet process (DP), beta process (BP), or Indian buffet process (IBP).
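To make "the number of components depends on the data" concrete, here is a minimal sketch of the Chinese restaurant process (CRP), the clustering scheme induced by a DP prior. It is an illustration chosen for this note, not something taken from the examples above; the function name `crp_assignments`, the concentration value `alpha = 1.0`, and the seed are all assumptions. Item i (counting from 0) joins an existing cluster k with probability counts[k] / (i + alpha) and opens a new cluster with probability alpha / (i + alpha).

```python
import random

def crp_assignments(n, alpha, seed=0):
    """Sample cluster labels for n items from a Chinese restaurant process."""
    rng = random.Random(seed)
    counts = []   # counts[k] = number of items currently in cluster k
    labels = []   # labels[i] = cluster index assigned to item i
    for i in range(n):
        # Draw a point in [0, i + alpha): the first i units are split among
        # existing clusters in proportion to their sizes, the last alpha
        # units correspond to opening a new cluster.
        r = rng.random() * (i + alpha)
        acc = 0.0
        for k, c in enumerate(counts):
            acc += c
            if r < acc:
                counts[k] += 1
                labels.append(k)
                break
        else:
            # No existing cluster was chosen: open a new one.
            counts.append(1)
            labels.append(len(counts) - 1)
    return labels, counts

alpha = 1.0  # assumed concentration parameter
for n in (10, 100, 1000):
    labels, counts = crp_assignments(n, alpha, seed=0)
    print(n, "items ->", len(counts), "clusters")
```

Because the same seed is reused, the first 10 assignments are identical in all three runs, so the number of clusters can only stay the same or grow as more items arrive. No upper limit on the number of clusters is fixed in advance, which is exactly what a fixed 6-component theta lacks.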

Note: nonparametric does not mean that the model has no parameters! It means that the number of parameters is not fixed in advance and can grow with the data.