CS 7545 Fall 2018 Notes 2

Game Theory

Def: A two player (finite strategy) game is given by a pair of matrices

\[N \in \mathbb{R}^{n\times m}, M \in \mathbb{R}^{n\times m}\]

where

\[M_{i,j} = \text{payoff to player 1 if } p_1 \text{ selects action } i \text{ and } p_2 \text{ selects action } j\]

Let’s draw \(M\) here

\[M = \begin{bmatrix} m_{1,1} & \cdots & m_{1,m} \\ m_{2,1} & \cdots & m_{2,m} \\ m_{3,1} & \cdots & m_{3,m} \\ m_{4,1} & \cdots & m_{4,m} \\ m_{5,1} & \cdots & m_{5,m} \\ \vdots & \vdots & \vdots \\ m_{n,1} & \cdots & m_{n,m} \\ \end{bmatrix}\]

Note: \(\textbf{p}^T M \textbf{q}\) is the expected gain of player 1 if \(p_i\) is probability of prayer 1 taking action \(i\) and \(q_j\) is the probability of player 2 taking action \(j\)

Def: A game is zero sum if

\[N = -M\]

Def: A Nash equilibrium is a pair \(\widetilde{p} \in \Delta_n, \widetilde{q} \in \Delta_m,\) s.t.

\[\forall p \in \Delta_n, \widetilde{p}^T M \widetilde{q} \geq p^T M \widetilde{q}\] \[\forall q \in \Delta_m, \widetilde{p}^TN\widetilde{q} \geq \widetilde{p}^TNq\]

Nash’s theorem: There exist a (possibly non-unique) Nash equilibrium for any 2-player game.

Von Neumann’s min-max theorem:

\[∀M \in \mathbb{R}^{n× m}, \min_{p\in \Delta_n} \max_{q\in \Delta_m} p^T M q = \max_{q\in \Delta_m} \min_{p\in \Delta_n} p^T M q\]

No regret algorithm

We say that an algorithm \(\mathcal{A}\) is no-regret if \(\forall \ell_1 \ldots \ell_T \ldots \in [0,1]\) with \(\textbf{p}^t \in \Delta_n\) chosen as \(\textbf{p}^t \leftarrow \mathcal{A}(\ell_1,\ldots,\ell_{t-1})\)

\[\frac{1}{T} \left( \sum_{t=1}^T{\textbf{p}^t \cdot \boldsymbol{\ell}^t} - \min_{p\in \Delta_n}{\sum_{t=1}^{T}{\textbf{p} \cdot \boldsymbol{\ell}^t}} \right) = \epsilon_T = O(1)\]

Observe:
\[\min_{p\in \Delta_n}{\sum_{t=1}^{T}{\textbf{p}^t \cdot \boldsymbol{\ell}_t}} = \min_{i=1\ldots n}{\sum_{t=1}^{T}{\textbf{e}_i \cdot \boldsymbol{\ell}^t}}\]
Note:
- EWA is as no-regret algorithm with \(\epsilon_T \leq \frac{\log N + \sqrt{2 T \log N}}{T} = \frac{\log N}{T} + \sqrt{\frac{2\log N}{T}}\)
- No regret algorithm performs well even in worst case ( e.g. loss chosen against p )

No-Regret Happy theorem

Let \(M\) be \(\mathbb{R}^{n\times m},\,\, \mathcal{A}\) be a no-regret algorithm.
For \(t = 1 \ldots T\),
- \(\textbf{p}^t\) is chosen as \(\mathcal{A}(\ell_1,\ldots,\ell_{t-1}), \text{ where } \ell_s = Mq_s (s = 1\ldots t-1)\)
- \(\textbf{q}^t\) is chosen as \(\textbf{q}^t = \operatorname*{argmax}_{\textbf{q}\in \Delta_m}{\textbf{p}^t \cdot \textbf{M} \textbf{q}}\;\;\;\; \text{ a.k.a.most adversary nature}\)
Q1: How happy is q
\[\begin{aligned} \frac{1}{T}\sum_{t=1}^{T}{\textbf{p}^t \cdot \textbf{M} \textbf{q}^t } &= \frac{1}{T} \sum_{t=1}^{T}{\max_{\textbf{q}}\textbf{p}^t\cdot \textbf{M} \textbf{q}} \\ &≥ \frac{1}{T}\max_{\textbf{q}}{\sum_{t=1}^{T}{(\textbf{p}^t \cdot \textbf{M} \textbf{q})}} \\ &= \frac{1}{T}\max_{\textbf{q}}{\sum_{t=1}^{T}{(\textbf{p}^t)}} \cdot \textbf{M} \textbf{q} = \max_{\textbf{q}}{ \bar{\textbf{p}} } \cdot \textbf{M} \textbf{q} \\ &≥ \min_{\textbf{p}}\max_{\textbf{q}} \textbf{p}\cdot \textbf{M} \textbf{q} \end{aligned}\]
Q2: How happy is p
\[\begin{aligned} \frac{1}{T}\sum_{t=1}^{T}{\textbf{p}^t \cdot \textbf{M} \textbf{q}^t} &= \frac{1}{T}\sum_{t=1}^{T}{\textbf{p}^t \cdot \boldsymbol{\ell}^t} & \\ &= \frac{1}{T}\min_{\textbf{p}}{\sum_{t=1}^{T}{\textbf{p}\cdot \boldsymbol{\ell}^t}} + \epsilon_T & \text{ by definition of no regret} \\ &= \min_{\textbf{p}}{\frac{1}{T} \sum_{t=1}^{T}{\textbf{p} \cdot \textbf{M} \textbf{q}^t}} + \epsilon_T & \\ &= \min_{\textbf{p}}{\textbf{p} \cdot \textbf{M} \bar{\textbf{q}}} + \epsilon_T & \\ &≤ \max_{\textbf{q}} \min_{\textbf{p}} \textbf{p} \cdot \textbf{M} \textbf{q} + \epsilon_T \end{aligned}\]
To summarize:

\[\begin{aligned} \nu^* &= \min_{\textbf{p}}\max_{\textbf{q}} \textbf{p}\cdot \textbf{M} \textbf{q} \\ &\leq \max_{\textbf{q}}{ \bar{\textbf{p}} } \cdot \textbf{M} \textbf{q} \\ &\leq \frac{1}{T}\sum_{t=1}^{T}{\textbf{p}^t \cdot \textbf{M} \textbf{q}^t} \\ &\leq \min_{\textbf{p}}{\textbf{p} \cdot \textbf{M} \bar{\textbf{q}}} + \epsilon_T \\ &\leq \max_{\textbf{q}} \min_{\textbf{p}} \textbf{p} \cdot \textbf{M} \textbf{q} + \epsilon_T \\ &= \nu^*+\epsilon_T \end{aligned}\]

Corollary:

\(\bar{\textbf{p}}\) and \(\overline{\textbf{q}}\) are \(\epsilon_T\)-optimal Nash eq.

Boosting

Given \(\textbf{x}_1,\ldots,\textbf{x}_n \in \mathcal{X}\), \(\textbf{y}_1,\ldots, \textbf{y}_n \in \{-1,1\}\), Hypothesis class \(H = \{ h_1,\ldots,h_m \}\) where \(h : \mathcal{X} \mapsto \{ -1, 1 \}\)

Weak Learner Assumption:
\[∀ \textbf{p} \in \Delta_n,\, ∃ h \in H,\,\text{s.t. if } \textbf{x}_i \text{ show up with probability } p_i,\text{ then }\] \[\operatorname{Pr}\{ h(\textbf{x}_i) \neq y_i \} \leq \frac{1}{2} - \frac{\gamma}{2},\;\; \gamma > 0\]
Which is: n \(∀ \textbf{p} \in \Delta_n,\, ∃ h \in H,\,\text{s.t. } \sum_{i}{p_i\frac{1 - y_ih(\textbf{x}_i) }{2}} \leq \frac{1}{2} - \frac{\gamma}{2}\)

Alternatively:
\[∀ \textbf{p} \in \Delta_n,\, ∃ h \in H,\,\text{s.t. } \gamma \leq \sum_{i}{p_iy_ih(\textbf{x}_i)}\]
Proof of \(WLA \implies SLA\)

Define \(\textbf{M} \in \{ -1, 1 \}^{n×m}\), \(\textbf{M}_{i,j} = h_j(\textbf{x}_i)y_i\), then
\[\sum_{i}{p_iy_ih_j(\textbf{x}_i)} = \textbf{p} \cdot \textbf{M} \textbf{e}_j\]
WLA says for any \(\textbf{p}\) this is a \(j\), we have
\[\gamma \leq \min_{\textbf{p} \in \Delta_n}{\textbf{p} \cdot \textbf{M} \textbf{e}_j} \leq \min_{\textbf{p} \in \Delta_n}\max_{\textbf{q} \in \Delta_m}{\textbf{p} \cdot \textbf{M} q}\]
So
\[0 < \gamma \leq \max_{\textbf{q} \in \Delta_m}\min_{\textbf{p} \in \Delta_n}{\textbf{p} \cdot \textbf{M} q}\]
which is strong Learner assumption:
\[\exists q \in \Delta_m \text{ s.t. } 0 < \min_{\textbf{p} \in \Delta_n}{\textbf{p}^T \textbf{M} q}\]
Strong Learning Assumption: exist \(\textbf{q} \in \Delta_m\), s.t. \(∀ i = 1\ldots n,\\)
\[\sum_{h\in H}{\textbf{q}_h \cdot h(\textbf{x}_i) y_i} > 0\]
How to find \(\textbf{q}\)

If we use a no-regret algorithm to learn p that maximize error of prediction (a.k.a minimizing \(\textbf{p⋅Mq}\)), and we choose \(\textbf{q}\) according to \(\textbf{p}\) to maximize \(\textbf{p⋅Mq}\), then by no regret happy theorem
\[\gamma - \epsilon_T = \min_{\textbf{p}}\max_{\textbf{q}} \textbf{p}\cdot \textbf{M} \textbf{q} - \epsilon_T \leq \min_{\textbf{p}}{\textbf{p} \cdot \textbf{M} \overline{\textbf{q}}}\]
So, whenever \(\epsilon_T < \gamma, \overline{\textbf{q}}\) is what we need.
Boosting by Majority Algorithm:

We use EWA as the no-regret algorithm. (Note: EWA requires that \(\textbf{M} \in [0,1]^{n\times m}\) but here \(\textbf{M} \in \{-1,1\}^{n\times m}\). the professor promise it will work somehow. My thought is that let \(\textbf{M}' = \frac{\textbf{M}+\textbf{1}}{2}\), then \(\textbf{p} \cdot \textbf{M}' \textbf{q} = \textbf{p} \cdot \frac{\textbf{M}+\textbf{1}}{2} \textbf{q} = \frac{1}{2} \textbf{p} \cdot \textbf{Mq} + \underbrace{\textbf{p} \cdot \textbf{1q}}_{=1!}\), so optimal \(\textbf{q}\) for \(\textbf{M}'\) is also optimal for \(\textbf{M}\) )

Let \(T > \frac{2\log N}{\gamma^2}\) (which somehow \(\epsilon_T < \gamma\) at this point), \(\textbf{w}^1 = 1\), For \(t = 1 \ldots T\), Let
\[\begin{aligned} \textbf{p}^t &= \frac{\textbf{w}^t}{\| \textbf{w}^t \|_1} & \\ h_t &= \operatorname*{argmax}_{h\in \mathcal{H}}{\sum_{i=1}^{N}{\textbf{p}^t_ih(\textbf{x}_i)y_i}} & \text{ we should choose q to maximize } \textbf{p}\cdot \textbf{Mq} \\ & &\text { but optimal value always happen at corner } \\ & &\text { which is equivalent to choose best } h_t \\ \textbf{w}^{t+1}_i &= \textbf{w}^t_i \exp{ \left( -\eta h_t(\textbf{x}_i)y_i \right) } & \end{aligned}\]
Output \(\overline{h_T} = \frac{1}{T}\sum_{t=1}^{T}{h_t}\)

Online Convex Optimization

Settings

Let a set \(\mathcal{K} \subset \mathbb{R}^d\) be convex and compact.
- For \(t = 1\ldots T\),
  - Algorithm select \(\textbf{x}_t \in \mathcal{K}\)
  - Nature select convex function \(f_t : \mathcal{K} \mapsto \mathbb{R}\)
Let Regret be \(\left(\sum_{1}^{T}{f_t(\textbf{x}_t)} \right) - \min_{\textbf{x}\in \mathcal{K}}{\sum_{t=1}^{T}{f_t(\textbf{x})}}\)
- Note:
  - This is more general than experts setting (hedge setting)
  - e.g.: set \(\mathcal{K} = \Delta_n,\, f_t(\textbf{x}) = \ell_t \cdot \textbf{x}\)
Online Gradient Descent Algorithm (OGD)
- Define
  \[\operatorname{Proj}_{\mathcal{K}}{x} = \operatorname*{argmin}_{y\in \mathcal{K}}{\|y-x\|_2}\]
  Note: \(\forall \textbf{z} \in \mathcal{K}, \forall \textbf{y}\):
  \[\| \operatorname{Proj}(\textbf{y}) - z\|_2 \leq \|y-z\|_2\]
- OGD Algorithm
  
  Let \(\textbf{x}_0\) be arbitrary \(\textbf{x} \in \mathcal{K}\),
  \[\textbf{x}_{t+1} = \operatorname{Proj}_{\mathcal{K}}{x_t-\eta \nabla_t \text{ where } \nabla_t = \nabla f_t(\textbf{x}_t)}\]
- Theorem
  
  Assume \(\| \nabla f(\textbf{x}_t) \| \leq G,\, \|\textbf{x}_0 - \textbf{x}^* \| \leq D \,(\forall \textbf{x}^* \in \mathcal{K})\), then
  \[\operatorname{Regret}_T(\text{OGD}) \leq GD\sqrt{T}\]
- Proof
  
  Notice that
  \[\begin{aligned} \frac{1}{2} \| \textbf{x}_{t+1} - \textbf{x}^* \|^2 &= \frac{1}{2} \| \operatorname{Proj}_{\mathcal{K}}{\textbf{x}_t - \eta \nabla_t} - \textbf{x}^* \|^2 \\ &\leq \frac{1}{2} \| \textbf{x}_t-\eta \nabla_t - \textbf{x}^* \|^2 \\ &= \frac{1}{2} (\textbf{x}_t - \textbf{x}^* - \eta \nabla_t) \cdot (\textbf{x}_t - \textbf{x}^* - \eta \nabla_t) \\ &= \frac{1}{2} \| \textbf{x}_t - \textbf{x}^* \|^2 + \frac{\eta^2}{2}\| \nabla_t\|^2 - \eta \nabla_t \cdot ( \textbf{x}_t - \textbf{x}^* ) \\ & & \\ \eta \nabla_t \cdot ( \textbf{x}_t - \textbf{x}^* ) &\leq \frac{1}{2} \left( \| \textbf{x}_t - \textbf{x}^* \|^2 - \| \textbf{x}_{t+1} - \textbf{x}^* \|^2 \right) + \frac{\eta^2}{2}\| \nabla_t\|^2 \end{aligned}\]
  Also notice that if \(f\) is convex, then \(f(\textbf{x}^*) - f(\textbf{x}) \geq \nabla f(\textbf{x})(\textbf{x}^* - \textbf{x})\), so
  \[\nabla_t \cdot ( \textbf{x}_t - \textbf{x}^* ) \geq f(\textbf{x}_t) - f(\textbf{x}^*)\]
  So
  \[\begin{aligned} \operatorname{Regret}_T(\text{OGD}) &= \sum { f(\textbf{x}_t) - f(\textbf{x}^*) } \\ &\leq \sum_{t=1}^{T} {\nabla_t \cdot ( \textbf{x}_t - \textbf{x}^* ) } \\ &\leq \sum_{t=1}^{T} { \left( \frac{1}{2\eta} \left( \| \textbf{x}_t - \textbf{x}^* \|^2 - \| \textbf{x}_{t+1} - \textbf{x}^* \|^2 \right) + \frac{\eta}{2}\| \nabla_t\|^2 \right) } \\ &\leq \sum_{t=1}^{T} { \frac{1}{2\eta} \left( \| \textbf{x}_t - \textbf{x}^* \|^2 - \| \textbf{x}_{t+1} - \textbf{x}^* \|^2 \right) } + \frac{\eta}{2} TG^2 \\ &\leq \frac{1}{2\eta} \left( (\underbrace{\| \textbf{x}_1 - \textbf{x}^* \|^2}_{\leq D^2} + \underbrace{ - \| \textbf{x}_{T+1} - \textbf{x}^* \|^2 }_{\leq 0} \right) + \frac{\eta}{2} TG^2 \\ &\leq \frac{1}{2\eta} D^2 + \frac{\eta}{2} TG^2 \\ \end{aligned}\]
  Set \(\eta = \frac{D}{G\sqrt{T}}\), we have
  \[\operatorname{Regret}_T(\text{OGD}) \leq DG\sqrt{T}\]