synthval.metrics.KLDivergenceEstimation#

class synthval.metrics.KLDivergenceEstimation(drop_duplicates=True)#

Bases: SimilarityMetric

Similarity metric computing an estimation of the Kullback-Leibler divergence based on the nearest-neighbour methodology proposed in the referenced paper. Because the estimator relies on nearest-neighbour distances between samples, duplicate samples yield zero distances and may cause a division-by-zero error; see the sketch after the references below.

Parameters:

drop_duplicates (bool)

drop_duplicates#

Flag controlling whether duplicate samples in the input distributions are dropped automatically before estimation (default: True).

Type:

bool, optional

References

Pérez-Cruz, F., "Kullback-Leibler divergence estimation of continuous distributions," IEEE International Symposium on Information Theory, 2008.
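
The sketch below outlines the 1-nearest-neighbour estimator from the reference above and makes the division-by-zero hazard concrete: a duplicate row makes a nearest-neighbour distance zero, so the ratio inside the logarithm is undefined. It is a minimal illustration under that reading of the paper, not the library's internal implementation; the function name knn_kl_estimate is an assumption.

import numpy as np
from scipy.spatial import cKDTree

def knn_kl_estimate(real: np.ndarray, synth: np.ndarray) -> float:
    # Estimate D_KL(real || synth) from samples of shape (n, d) and (m, d),
    # following the 1-NN construction in Perez-Cruz (2008):
    #   D_hat = (d / n) * sum_i log(nu_i / rho_i) + log(m / (n - 1))
    n, d = real.shape
    m = synth.shape[0]
    # rho_i: distance from each real sample to its nearest *other* real sample
    # (query k=2 because the closest point to x_i in `real` is x_i itself).
    rho = cKDTree(real).query(real, k=2)[0][:, 1]
    # nu_i: distance from each real sample to its nearest synthetic sample.
    nu = cKDTree(synth).query(real, k=1)[0]
    # Duplicate rows make rho_i (or nu_i) zero, so nu / rho divides by zero:
    # the failure mode that drop_duplicates=True guards against.
    return (d / n) * np.sum(np.log(nu / rho)) + np.log(m / (n - 1))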

calculate(real_dist_df, synth_dist_df)#

Compute an estimation of the Kullback-Leibler divergence between two sets of samples originating from two multivariate distributions, real_dist and synth_dist.

Parameters:
  • real_dist_df (pandas.DataFrame) – Set of samples representing distribution real_dist.

  • synth_dist_df (pandas.DataFrame) – Set of samples representing distribution synth_dist.

Returns:

A numpy array containing the estimated value of the Kullback-Leibler divergence.

Return type:

numpy.ndarray
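
A hypothetical usage sketch follows, assuming the class is importable from synthval.metrics exactly as documented above; the DataFrames, column names, and sample values are illustrative.

import pandas as pd
from synthval.metrics import KLDivergenceEstimation

# Samples from the real and synthetic multivariate distributions.
real_dist_df = pd.DataFrame({"age": [23, 45, 31, 52],
                             "income": [40.0, 82.5, 55.1, 97.3]})
synth_dist_df = pd.DataFrame({"age": [25, 44, 30, 50],
                              "income": [42.0, 80.0, 54.0, 95.0]})

metric = KLDivergenceEstimation(drop_duplicates=True)
kl = metric.calculate(real_dist_df, synth_dist_df)
print(kl)  # numpy.ndarray containing the estimated divergence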