Can the posterior information about a parameter be more extreme than both the prior information and the data evidence (the likelihood function)? In one of our consulting and collaborative projects with J&J, we encountered exactly this counter-intuitive (paradoxical?) phenomenon, as illustrated in the figures below. The counter-intuitive result for the parameter of interest is caused by "marginalization" of the joint distributions and the joint likelihood function; it is not, however, the same "marginalization paradox" described by Berger (2006) and Dawid et al. (1973). A brief description and discussion of the problem is on a separate page.
Caption: The figure on the left contains contour plots of a bi-beta prior [in blue], the joint likelihood function [in black], and the (simulated) posterior distribution [in red] of the binomial parameters (p0, p1). These three two-dimensional distributions are projected onto the direction of the parameter of interest, d = p1 - p0 (the off-diagonal 45-degree line pointing towards the upper-left corner). This yields the figure on the right, in which the marginal posterior of d = p1 - p0 [in red] is more extreme than both its prior [in blue] and the data evidence [in black]!
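The projection described in the caption can be sketched by Monte Carlo: sample (p0, p1) from each two-dimensional distribution and take d = p1 - p0. The sketch below is not the method used for the figure and does not reproduce it. It assumes independent Beta priors in place of the bi-beta prior, treats the likelihood as a density by normalizing it (equivalent to Beta(x + 1, n - x + 1) per arm), and uses hypothetical counts; the names `PRIOR_P0`, `PRIOR_P1`, `sample_d` and `marginals_of_d` are illustrative only. For these made-up inputs the posterior mean of d lands above both the prior mean and the likelihood-based mean.

```python
import numpy as np

# Hypothetical inputs for illustration only (not the project data).
# Independent Beta priors stand in for the bi-beta prior used in the figure.
PRIOR_P0 = (1.0, 1.0)    # weak prior on p0, mean 0.5
PRIOR_P1 = (90.0, 10.0)  # strong, skewed prior on p1, mean 0.9
X0, N0 = 10, 100         # arm 0: 10 responders out of 100
X1, N1 = 5, 10           # arm 1: 5 responders out of 10


def sample_d(beta_p0, beta_p1, size, rng):
    """Project a product of two Beta densities onto d = p1 - p0 by Monte Carlo."""
    p0 = rng.beta(*beta_p0, size=size)
    p1 = rng.beta(*beta_p1, size=size)
    return p1 - p0


def marginals_of_d(size=200_000, seed=0):
    """Return Monte Carlo samples of d under the prior, likelihood and posterior."""
    rng = np.random.default_rng(seed)
    # Prior on (p0, p1), projected onto d.
    prior = sample_d(PRIOR_P0, PRIOR_P1, size, rng)
    # Likelihood normalized as a density in (p0, p1): Beta(x + 1, n - x + 1) per arm.
    lik = sample_d((X0 + 1, N0 - X0 + 1), (X1 + 1, N1 - X1 + 1), size, rng)
    # Conjugate posterior per arm: Beta(a + x, b + n - x).
    post = sample_d(
        (PRIOR_P0[0] + X0, PRIOR_P0[1] + N0 - X0),
        (PRIOR_P1[0] + X1, PRIOR_P1[1] + N1 - X1),
        size,
        rng,
    )
    return prior, lik, post


if __name__ == "__main__":
    prior, lik, post = marginals_of_d()
    for name, d in [("prior", prior), ("likelihood", lik), ("posterior", post)]:
        print(f"{name:>10}: mean of d = {d.mean():.3f}")
```

The printed means should be close to the analytic values 0.9 - 0.5 = 0.400 (prior), 6/12 - 11/102 ≈ 0.392 (likelihood) and 95/110 - 11/102 ≈ 0.756 (posterior). Histograms of the three sample arrays give the one-dimensional marginals corresponding to the right-hand panel.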
Remark: When the prior is less skewed, this counter-intuitive result may be somewhat mitigated, depending on the structure of the likelihood function and the prior. But the phenomenon is mathematical, and it is here to stay as long as skewed distributions are involved!