BLEU

  • Bilingual Evaluation Understudy
  • Automatic machine translation (MT) quality metric
  • Precision-based (modified n-gram precision)
  • $p_n = \frac{\sum_{p \in hyp} \sum_{n\text{-gram} \in p} Count_{clip}(n\text{-gram})}{\sum_{p \in hyp} \sum_{n\text{-gram} \in p} Count(n\text{-gram})}$
  • $Count_{clip}(n\text{-gram}) = \min\left(\text{n-gram count in the hypothesis sentence},\ \max_{r \in Ref} (\text{n-gram count in } r)\right)$, i.e. an n-gram is credited at most as many times as it appears in any single reference
  • Combining the $p_n$: n-gram precision decays roughly exponentially as n grows, so BLEU averages the logarithms of the precisions, i.e. takes a weighted geometric mean:
$$\sqrt[\sum_{n=1}^N w_n]{\prod_{n=1}^N p_n^{w_n}} = \exp\left(\frac{1}{\sum_{n=1}^N w_n} \sum_{n=1}^N w_n \ln p_n\right) = \exp\left(\frac{1}{N} \sum_{n=1}^N \ln p_n\right) \quad \text{(uniform weights)}$$
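A minimal Python sketch of the clipped precision $p_n$ and the weighted geometric mean above, assuming pre-tokenized input (lists of strings) and corpus-level counting (numerator and denominator summed over all hypothesis sentences before dividing). The names `ngrams`, `modified_precision`, and `geometric_mean` are illustrative, not from any library. Returning 0 when some $p_n$ is 0 is also an assumption of this sketch; real toolkits often apply smoothing instead.

```python
from collections import Counter
import math


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(hyps, refs_list, n):
    """Corpus-level clipped n-gram precision p_n.

    hyps: list of hypothesis sentences (token lists).
    refs_list: for each hypothesis, a list of reference sentences (token lists).
    """
    clipped, total = 0, 0
    for hyp, refs in zip(hyps, refs_list):
        hyp_counts = ngrams(hyp, n)
        # Largest count of each n-gram in any single reference.
        max_ref = Counter()
        for ref in refs:
            for gram, cnt in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], cnt)
        # Numerator: Count_clip = min(hypothesis count, max reference count).
        clipped += sum(min(cnt, max_ref[gram]) for gram, cnt in hyp_counts.items())
        # Denominator: every n-gram occurrence in the hypothesis.
        total += sum(hyp_counts.values())
    return clipped / total if total else 0.0


def geometric_mean(precisions, weights=None):
    """exp(sum w_n ln p_n / sum w_n), with uniform weights by default.

    Returns 0.0 if any p_n is 0 (ln undefined); no smoothing is applied.
    """
    if weights is None:
        weights = [1.0] * len(precisions)
    if any(p == 0 for p in precisions):
        return 0.0
    log_sum = sum(w * math.log(p) for w, p in zip(weights, precisions))
    return math.exp(log_sum / sum(weights))


# Clipping example: a degenerate hypothesis that repeats "the".
hyp = "the the the the the the the".split()
refs = ["the cat is on the mat".split(), "there is a cat on the mat".split()]
p1 = modified_precision([hyp], [refs], 1)  # 2/7: "the" is clipped to 2
p2 = modified_precision([hyp], [refs], 2)  # 0: no reference contains "the the"
```

Without clipping, the repeated-"the" hypothesis would get a unigram precision of 7/7; clipping against the maximum reference count is what brings it down to 2/7.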

Details

  • Brevity Penalty ($c$: total candidate length, $r$: effective reference length)
  • $BP = \begin{cases} 1 & \text{if } c > r \\ e^{1 - r/c} & \text{if } c \leq r \end{cases}$
  • $BLEU = BP \cdot \exp\left(\sum_{n=1}^N w_n \log p_n\right)$
  • $\log BLEU = \min\left(1 - \frac{r}{c}, 0\right) + \sum_{n=1}^N w_n \log p_n$
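A small Python sketch of the brevity penalty and the final score, taking the precisions $p_n$ and the lengths $c$, $r$ as already-computed inputs. The helper names (`brevity_penalty`, `bleu`, `log_bleu`), the default uniform weights $w_n = 1/N$, the precision values, and the choice to score an empty candidate as 0 are all assumptions made for illustration. The sketch also checks that the log form agrees with $\log(BP \cdot \exp(\cdot))$: when $c > r$ the term $\min(1 - r/c, 0)$ is 0 (BP = 1), otherwise it equals $\log BP$.

```python
import math


def brevity_penalty(c, r):
    """BP = 1 if c > r, else exp(1 - r/c).

    c: total candidate length, r: effective reference length (in tokens).
    """
    if c > r:
        return 1.0
    if c == 0:
        return 0.0  # assumption: an empty candidate scores 0
    return math.exp(1 - r / c)


def uniform(n):
    """Uniform weights w_n = 1/N."""
    return [1.0 / n] * n


def bleu(precisions, c, r, weights=None):
    """BLEU = BP * exp(sum_n w_n log p_n)."""
    if weights is None:
        weights = uniform(len(precisions))
    if any(p == 0 for p in precisions):
        return 0.0  # log p_n undefined; this sketch applies no smoothing
    log_avg = sum(w * math.log(p) for w, p in zip(weights, precisions))
    return brevity_penalty(c, r) * math.exp(log_avg)


def log_bleu(precisions, c, r, weights=None):
    """log BLEU = min(1 - r/c, 0) + sum_n w_n log p_n (assumes c > 0, all p_n > 0)."""
    if weights is None:
        weights = uniform(len(precisions))
    return min(1 - r / c, 0.0) + sum(w * math.log(p) for w, p in zip(weights, precisions))


p = [0.6, 0.4, 0.3, 0.2]       # hypothetical p_1..p_4
short = bleu(p, c=9, r=10)     # candidate shorter than reference: BP = exp(1 - 10/9)
long_ = bleu(p, c=12, r=10)    # candidate longer than reference: BP = 1
```

Because BP only penalizes candidates that are too short, the same precisions yield a lower score for the shorter candidate; overly long candidates are already penalized by lower precision.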

ROUGE

Recall-Oriented Understudy for Gisting Evaluation

ROUGE-N

An n-gram recall between a candidate summary and a set of reference summaries

$Recall = \frac{TP}{TP + FN}$ (TP: reference n-grams that also appear in the candidate; TP + FN: all n-grams in the references)
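A minimal Python sketch of ROUGE-N as n-gram recall, assuming tokenized input, a multi-reference form in which matched and total counts are summed over all references, and a rule that a reference n-gram counts as matched at most as many times as it occurs in the candidate. The function names are hypothetical. Compared with the BLEU precision, the denominator here counts reference n-grams instead of candidate n-grams.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, references, n=1):
    """ROUGE-N recall: matched reference n-grams / total reference n-grams."""
    cand = ngrams(candidate, n)
    matched, total = 0, 0
    for ref in references:
        ref_counts = ngrams(ref, n)
        # TP: reference n-grams also present in the candidate (capped by candidate count).
        matched += sum(min(cnt, cand[g]) for g, cnt in ref_counts.items())
        # TP + FN: every n-gram occurrence in the reference.
        total += sum(ref_counts.values())
    return matched / total if total else 0.0


cand = "the cat sat on the mat".split()
ref = "the cat is on the mat".split()
r1 = rouge_n(cand, [ref], 1)  # 5/6: only "is" is missed
r2 = rouge_n(cand, [ref], 2)  # 3/5: "cat is" and "is on" are missed
```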

Overview