File:Example of Thompson sampling.webp
Summary
Description |
English: Visualization of Thompson sampling in a simplified simulated setting. We want to evaluate the efficacies of several treatments (our unknowns) in an efficient way; this is an instance of the basic multi-armed bandit problem. The outcome is simplified to either success or failure, and each treatment has its own real probability of success, unknown to us (indicated by the rotated squares). At each step a patient comes in, and Thompson sampling is applied to choose which treatment to give: 1) for each treatment, a random number is drawn from our current Bayesian belief about that treatment's actual probability of success; 2) the treatment for which the largest of these numbers was drawn (the argmax) is chosen and applied; 3) once we observe the result (success or failure), our belief about that treatment is updated accordingly, and we move on to the next step. (A short code sketch of these steps follows the summary below.)
The number below each treatment's rotated square is the number of patients who have received that treatment so far. The more often a treatment is applied, the less uncertainty we have about its efficacy (its distribution becomes narrower). We can see that here, Thompson sampling rapidly abandons the "bad" treatments and favors the good ones. |
Date | |
Source | Own work |
Author | Nguiard |
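
The three steps described above can be written as a short simulation. The sketch below is a minimal illustration, not the code used to produce the animation: it assumes Beta priors for the Bayesian belief over each treatment's success probability (a common choice for success/failure outcomes), and the true probabilities, number of steps, and function names are made up for the example.

import random

def thompson_sampling(true_probs, n_steps, seed=0):
    rng = random.Random(seed)
    n_arms = len(true_probs)
    # Beta(1, 1) prior for every treatment: parameters are
    # (successes + 1, failures + 1).
    alpha = [1] * n_arms
    beta = [1] * n_arms
    pulls = [0] * n_arms

    for _ in range(n_steps):
        # 1) Draw one sample per treatment from its current posterior belief.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(n_arms)]
        # 2) Apply the treatment whose sample is the largest (argmax).
        chosen = max(range(n_arms), key=lambda i: samples[i])
        # 3) Observe success or failure and update that treatment's belief.
        if rng.random() < true_probs[chosen]:
            alpha[chosen] += 1
        else:
            beta[chosen] += 1
        pulls[chosen] += 1

    return pulls

if __name__ == "__main__":
    # Four hypothetical treatments with success rates unknown to the algorithm.
    counts = thompson_sampling([0.2, 0.4, 0.6, 0.75], n_steps=500)
    print("patients per treatment:", counts)

Running the sketch shows the behaviour described above: the counts concentrate on the treatments with the highest true success probabilities, while the poorer treatments are tried only a handful of times.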
Licensing
I, the copyright holder of this work, hereby publish it under the following license:
This file is licensed under the Creative Commons Attribution 4.0 International license.
- You are free:
- to share – to copy, distribute and transmit the work
- to remix – to adapt the work
- Under the following conditions:
- attribution – You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.