§5.4: Degree-1 weight

In this section we prove two theorems about the degree-$1$ Fourier weight of boolean functions: \[ \mathbf{W}^{1}[f] = \sum_{i=1}^n \widehat{f}(i)^2. \]

This important quantity can be given a combinatorial interpretation thanks to the noise stability formula $\mathbf{Stab}_\rho[f] = \sum_{k\geq 0}\rho^k \cdot \mathbf{W}^{k}[f]$: \[ \text{For $f : \{-1,1\}^n \to {\mathbb R}$,} \quad \mathbf{W}^{1}[f] = \frac{d}{d \rho} \mathbf{Stab}_\rho[f]\Bigm|_{\rho = 0}. \] Thinking of $\|f\|_2$ as constant and $\rho \to 0$, the noise stability formula implies \[ \mathbf{Stab}_\rho[f] = \mathop{\bf E}[f]^2 + \mathbf{W}^{1}[f]\rho \pm O(\rho^2), \] or equivalently, \[ \mathop{\bf Cov}_{\substack{({\boldsymbol{x}}, \boldsymbol{y}) \\ \text{ $\rho$-correlated}}}[f({\boldsymbol{x}}),f(\boldsymbol{y})] = \mathbf{W}^{1}[f] \rho \pm O(\rho^2). \] In other words, for $f : \{-1,1\}^n \to \{-1,1\}$ the degree-$1$ weight quantifies the extent to which $\mathop{\bf Pr}[f({\boldsymbol{x}}) = f(\boldsymbol{y})]$ increases when ${\boldsymbol{x}}$ and $\boldsymbol{y}$ go from being uncorrelated to being slightly correlated.

There is an additional viewpoint if we think of $f$ as the indicator of a subset $A \subseteq \{-1,1\}^n$ and its noise sensitivity $\mathbf{NS}_{\delta}[f]$ as a notion of $A$’s “surface area”, or “noisy boundary size”. For nearly maximal noise rates — i.e., $\delta = \tfrac{1}{2} – \tfrac{1}{2} \rho$ where $\rho$ is small — we have that $A$’s noisy boundary size is “small” if and only if $\mathbf{W}^{1}[f]$ is “large” (vis-a-vis $A$’s measure). Two examples suggest themselves when thinking of subsets of the Hamming cube with small “boundary”: subcubes and Hamming balls.

Proposition 21 Let $f : {\mathbb F}_2^n \to \{0,1\}$ be the indicator of a subcube of codimension $k \geq 1$ (e.g., the $\mathrm{AND}_k$ function). Then $\mathop{\bf E}[f] = 2^{-k}$, $\mathbf{W}^{1}[f] = k 2^{-2k}$.

Proposition 22 Fix $t \in {\mathbb R}$. Consider the sequence of LTFs $f_n : \{-1,1\}^n \to \{0,1\}$ defined by $f_n(x) = 1$ if and only if $\sum_{i=1}^n \tfrac{1}{\sqrt{n}} x_i > t$. (I.e., $f_n$ is the indicator of the Hamming ball $\{x : \mathrm{Dist}(x, (1, \dots, 1)) < \frac{n}{2} – \frac{t}{2}\sqrt{n}\}$.) Then \[ \lim_{n \to \infty} \mathop{\bf E}[f_n] = \overline{\Phi}(t), \qquad \lim_{n \to \infty} \mathbf{W}^{1}[f_n] = \phi(t)^2. \]

You are asked to verify these facts in the exercises. Regarding Proposition 22, it’s natural for $\phi(t)$ to arise since $\mathbf{W}^{1}[f_n]$ is related to the influences of $f_n$, and coordinates are influential for $f_n$ if and only if $\sum_{i=1}^n \tfrac{1}{\sqrt{n}} x_i {\approx t}$. If we write $\alpha = \lim_{n \to \infty} \mathop{\bf E}[f_n]$ then this proposition can be thought of as saying that $\mathbf{W}^{1}[f_n] \to \mathcal{U}(\alpha)^2$, where $\mathcal{U}$ is defined as follows:

Definition 23 The Gaussian isoperimetric function $\mathcal{U} : [0,1] \to [0,\frac{1}{\sqrt{2\pi}}]$ is defined by $\mathcal{U} = \phi \circ \Phi^{-1}$. This function is symmetric about $1/2$; i.e., $\mathcal{U} = \phi \circ \overline{\Phi}{}^{-1}$.

The name of this function will be explained when we prove the Gaussian isoperimetric inequality in Chapter 11. For now we’ll just use the following fact:

Proposition 24 For $\alpha \to 0^+$, $\mathcal{U}(\alpha) \sim \alpha \sqrt{2 \ln(1/\alpha)}$.

Proof: Write $\alpha = \overline{\Phi}(t)$, where $t \to \infty$. We use the fact that $\overline{\Phi}(t) \sim \phi(t)/t$. Thus \begin{gather*} \alpha \sim \tfrac{1}{\sqrt{2\pi} t}\exp(-t^2/2) \quad \Rightarrow \quad t \sim \sqrt{2\ln (1/\alpha)} \\ \phi(t) \sim \overline{\Phi}(t) \cdot t \quad \Rightarrow \quad \mathcal{U}(\alpha) \sim \alpha \cdot t \sim \alpha \sqrt{2\ln(1/\alpha)}. \Box \end{gather*}

Given Propositions 21 and 22, let’s consider the degree-$1$ Fourier weight of subcubes and Hamming balls asymptotically as their “volume” $\alpha = \mathop{\bf E}[f]$ tends to $0$. For the subcubes we have $\mathbf{W}^{1}[f] = \alpha^2 \log(1/\alpha)$. For the Hamming balls we have $\mathbf{W}^{1}[f_n] \to \mathcal{U}(\alpha)^2 \sim 2\alpha^2 \ln(1/\alpha)$. So in both cases we have an upper bound of $O(\alpha^2 \log(1/\alpha))$.

You should think of this upper bound $O(\alpha^2 \log(1/\alpha))$ as being unusually small. The obvious a priori upper bound, given that $f : \{-1,1\}^n \to \{0,1\}$ has $\mathop{\bf E}[f] = \alpha$, is \[ \mathbf{W}^{1}[f] \leq \mathop{\bf Var}[f] = \alpha(1-\alpha) \sim \alpha. \] Yet subcubes and Hamming balls have degree-$1$ weight which is almost quadratically smaller. In fact the first theorem we will show in this section is the following:

Level-1 Inequality Let $f : \{-1,1\}^n \to \{0,1\}$ have mean $\mathop{\bf E}[f] = \alpha \leq 1/2$. Then \[ \mathbf{W}^{1}[f] \leq O(\alpha^2\log(1/\alpha)). \]

(For the case $\alpha \geq 1/2$, replace $f$ by $1-f$.) Thus all small subsets of $\{-1,1\}^n$ have unusually small $\mathbf{W}^{1}[f]$; or equivalently (in some sense), unusually large “noisy boundary”. This is another key illustration of the idea that the Hamming cube is a “small-set expander”.

Remark 25 The bound in the Level-1 Inequality has a sharp form, $\mathbf{W}^{1}[f] \leq (2+o_\alpha(1)) \alpha^2\ln(1/\alpha)$. Thus Hamming balls are in fact the “asymptotic maximizers” of $\mathbf{W}^{1}[f]$ among sets of small volume $\alpha$. Also, the inequality holds more generally for $f : \{-1,1\}^n \to [-1,1]$ with $\alpha = \mathop{\bf E}[|f|]$.

Remark 26 The name “Level-1 Inequality” is not completely standard; e.g., in additive combinatorics the result would be called Chang’s Inequality. We use this name because we will also generalize to “Level-$k$ Inequalities” in Chapter 10.

So far we considered maximizing degree-$1$ weight among subsets of the Hamming cube of a fixed small volume, $\alpha$. The second theorem in this section is concerned with what happens when there is no volume constraint. In this case, maximizing examples tend to have volume $\alpha = 1/2$; switching the notation to $f : \{-1,1\}^n \to \{-1,1\}$, this corresponds to $f$ being unbiased ($\mathop{\bf E}[f] = 0$). The unbiased Hamming ball is $\mathrm{Maj}_n$, which we know has $\mathbf{W}^{1}[\mathrm{Maj}_n] \to \tfrac{2}{\pi}$. This is quite large. But unbiased subcubes are just the dictators $\chi_i$ and their negations; these have $\mathbf{W}^{1}[\pm \chi_i] = 1$ which is obviously maximal.

Thus the question of which $f : \{-1,1\}^n \to \{-1,1\}$ maximizes $\mathbf{W}^{1}[f]$ has a trivial answer. But this answer is arguably unsatisfactory, since dictators (and their negations) are not “really” functions of $n$ bits. Indeed, when we studied social choice in Chapter 2 we were motivated to rule out functions $f$ having a coordinate with unfairly large influence. And in fact Proposition 2.57 showed that if all $\widehat{f}(i)$ are equal (and hence small) then $\mathbf{W}^{1}[f] \leq \frac{2}{\pi} + o_n(1)$. The second theorem of this section significantly generalizes Proposition 2.57:

The ${\frac{\mathbf 2}{\boldsymbol{\pi}}}$ Theorem Let $f : \{-1,1\}^n \to \{-1,1\}$ satisfy $|\widehat{f}(i)| \leq \epsilon$ for all $i \in [n]$. Then \begin{equation} \label{eqn:regular-W1-bound} \mathbf{W}^{1}[f] \leq \tfrac{2}{\pi} +O(\epsilon). \end{equation} Further, if $\mathbf{W}^{1}[f] \geq \tfrac{2}{\pi} – \epsilon$ then $f$ is $O(\sqrt{\epsilon})$-close to the LTF $\mathrm{sgn}(f^{=1})$.

Functions $f$ with $|\widehat{f}(i)| \leq \epsilon$ for all $i \in [n]$ are called $(\epsilon,1)$-regular; see Chapter 6. So the $\frac{2}{\pi}$ Theorem says (roughly speaking) that within the class of $(\epsilon,1)$-regular functions, the maximal degree-$1$ weight is $\frac{2}{\pi}$, and any function achieving this is an unbiased LTF. Further, from Theorem 14 we know that all unbiased LTFs which are $(\epsilon,1)$-regular achieve this.

Remark 27 Since we have $\mathbf{Stab}_\rho[f] \approx \mathbf{W}^{1}[f] \rho$ and $\frac{2}{\pi} \arcsin \rho \approx \frac{2}{\pi} \rho$ when $\rho$ is small, the $\frac{2}{\pi}$ Theorem gives the Majority Is Stablest Theorem in the limit $\rho \to 0^+$.

Let’s now discuss how we’ll prove our two theorems about degree-$1$ weight. Let $f : \{-1,1\}^n \to \{0,1\}$ and $\alpha = \mathop{\bf E}[f]$; we think of $\alpha$ as small for the Level-1 Inequality and $\alpha = 1/2$ for the $\frac{2}{\pi}$ Theorem. By Plancherel, $\mathbf{W}^{1}[f] = \mathop{\bf E}[f({\boldsymbol{x}}) L({\boldsymbol{x}})]$, where \[ L(x) = f^{=1}(x) = \widehat{f}(1)x_1 + \cdots + \widehat{f}(n) x_n. \] To upper-bound $\mathop{\bf E}[f({\boldsymbol{x}}) L({\boldsymbol{x}})]$, consider that as ${\boldsymbol{x}}$ varies the real number $L({\boldsymbol{x}})$ may be rather large or small, but $f({\boldsymbol{x}})$ is always $0$ or $1$. Given that $f({\boldsymbol{x}})$ is $1$ on only a $\alpha$ fraction of ${\boldsymbol{x}}$’s, the “worst case” for $\mathop{\bf E}[f({\boldsymbol{x}}) L({\boldsymbol{x}})]$ would be if $f(x)$ were $1$ precisely on the $\alpha$ fraction of $x$’s where $L(x)$ is largest. In other words, \begin{equation} \label{eqn:talagrand1} \mathbf{W}^{1}[f] = \mathop{\bf E}[f({\boldsymbol{x}}) L({\boldsymbol{x}})] \leq \mathop{\bf E}[\boldsymbol{1}_{\{L({\boldsymbol{x}}) \geq t\}} \cdot L({\boldsymbol{x}})], \end{equation} where $t$ is chosen so that \begin{equation} \label{eqn:what-is-t} \mathop{\bf Pr}[L({\boldsymbol{x}}) \geq t] \approx \alpha. \end{equation} But now we can analyze \eqref{eqn:talagrand1} quite effectively using tools such as Hoeffding’s bound and the Central Limit Theorem, since $L({\boldsymbol{x}})$ is just a linear combination of independent $\pm 1$ random bits. In particular $L({\boldsymbol{x}})$ has mean $0$ and standard deviation $\sigma = \sqrt{\mathbf{W}^{1}[f]}$ so by the CLT it acts like the Gaussian $\boldsymbol{Z} \sim \mathrm{N}(0,\sigma^2)$, at least if we assume all $|\widehat{f}(i)|$ are small. If we are thinking of $\alpha = 1/2$, then $t = 0$ and we get \[ \sigma^2 = \mathbf{W}^{1}[f] \leq \mathop{\bf E}[\boldsymbol{1}_{\{L({\boldsymbol{x}}) \geq 0\}} \cdot L({\boldsymbol{x}})] \approx \mathop{\bf E}[\boldsymbol{1}_{\{\boldsymbol{Z} \geq 0\}} \cdot \boldsymbol{Z}] = \tfrac{1}{\sqrt{2\pi}} \sigma; \] This implies $\sigma^2 \lessapprox \tfrac{1}{2\pi}$, as claimed in the $\frac{2}{\pi}$ Theorem (after adjusting $f$’s range to $\{-1,1\}$). If we are instead thinking of $\alpha$ as small then \eqref{eqn:what-is-t} suggest taking $t \sim \sigma \sqrt{2\ln(1/\alpha)}$ so that $\mathop{\bf Pr}[\boldsymbol{Z} \geq t] \approx \alpha$. Then a calculation akin to the one in Proposition 24 implies \[ \mathbf{W}^{1}[f] \leq \mathop{\bf E}[\boldsymbol{1}_{\{L({\boldsymbol{x}}) \geq t\}} \cdot L({\boldsymbol{x}})] \approx \alpha \cdot \sigma \sqrt{2\ln(1/\alpha)}, \] from which the Level-1 Inequality follows. In fact, we don’t even need all $|\widehat{f}(i)|$ small for this latter analysis; for large $t$ it’s possible to upper-bound \eqref{eqn:talagrand1} using only Hoeffding’s bound:

Lemma 28 Let $\ell(x) = a_1 x_1 + \cdots + a_n x_n$, where $\sum_{i} a_i^2 = 1$. Then for any $s \geq 1$, \[ \mathop{\bf E}[\boldsymbol{1}_{\{|\ell({\boldsymbol{x}})| > s\}} \cdot |\ell({\boldsymbol{x}})|] \leq (2s+2)\exp(-\tfrac{s^2}{2}). \]

Proof: We have \begin{align*} \mathop{\bf E}[\boldsymbol{1}_{\{|\ell({\boldsymbol{x}})| > s\}} \cdot |\ell({\boldsymbol{x}})|] &= s\mathop{\bf Pr}[|\ell({\boldsymbol{x}})| > s] + \int_{s}^\infty \mathop{\bf Pr}[|\ell({\boldsymbol{x}})| > u]\,du \\ &\leq 2s\exp(-\tfrac{s^2}{2}) + \int_{s}^\infty 2\exp(-\tfrac{u^2}{2})\,du, \end{align*} using Hoeffding’s bound. But for $s \geq 1$, \[ \int_{s}^\infty 2\exp(-\tfrac{u^2}{2})\,du \leq \int_{s}^\infty u \cdot 2\exp(-\tfrac{u^2}{2})\,du = 2\exp(-\tfrac{s^2}{2}). \Box \]

We now give formal proofs of the two theorems, commenting that rather than $L(x)$ it’s more convenient to work with \[ \ell(x) = \tfrac{1}{\sigma} f^{=1}(x) = \tfrac{\widehat{f}(1)}{\sigma}x_1 + \cdots + \tfrac{\widehat{f}(n)}{\sigma}x_n. \]

Proof of the Level-1 Inequality: Following Remark 25 we let $f : \{-1,1\}^n \to [-1,1]$ and $\alpha = \mathop{\bf E}[|f|]$. We may assume $\sigma = \sqrt{\mathbf{W}^{1}[f]} > 0$. Writing $\ell = \frac{1}{\sigma} f^{=1}$ we have $ \langle f, \ell \rangle = \tfrac{1}{\sigma} \langle f, f^{=1} \rangle = \tfrac{1}{\sigma} \mathbf{W}^{1}[f] = \sigma $ and hence \[ \sigma = \langle f, \ell \rangle = \mathop{\bf E}[\boldsymbol{1}_{\{|\ell({\boldsymbol{x}})| \leq s\}} \cdot f({\boldsymbol{x}}) \ell({\boldsymbol{x}})] + \mathop{\bf E}[\boldsymbol{1}_{\{|\ell({\boldsymbol{x}})| > s\}} \cdot f({\boldsymbol{x}}) \ell({\boldsymbol{x}})] \] holds for any $s \geq 1$. The first expectation above is at most $\mathop{\bf E}[s |f({\boldsymbol{x}})|] = \alpha s$, and the second is at most $(2+2s)\exp(-s^2/2) \leq 4s\exp(-s^2/2)$ by Lemma 28. Hence \[ \sigma \leq \alpha s + 4s \exp(-s^2/2). \] The optimal choice of $s$ is $s = (\sqrt{2} + o_\alpha(1)) \sqrt{\ln(1/\alpha)}$, yielding \[ \sigma \leq (\sqrt{2} + o(1)) \alpha\sqrt{\ln(1/\alpha)}. \] Squaring this establishes the claim $\sigma^2 \leq (2+o_\alpha(1)) \alpha^2\ln(1/\alpha)$. $\Box$

Proof of the $\frac{2}{\pi}$ Theorem: We may assume $\sigma = \sqrt{\mathbf{W}^{1}[f]} \geq 1/2$: for the theorem’s first statement this is because otherwise there is nothing to prove; for the theorem’s second statement this is because we may assume $\epsilon$ sufficiently small.

We start by proving \eqref{eqn:regular-W1-bound}. Let $\ell = \frac{1}{\sigma} f^{=1}$, so $\|\ell\|_2 = 1$ and $|\widehat{\ell}(i)| \leq 2\epsilon$ for all $i \in [n]$. We have \begin{equation} \label{eqn:ltf-test1} \sigma = \langle f, \ell \rangle \leq \mathop{\bf E}[|\ell|] \leq \sqrt{\tfrac{2}{\pi}} + C \epsilon \end{equation} for some constant $C$, where we used Theorem 13. Squaring this proves \eqref{eqn:regular-W1-bound}. We observe that \eqref{eqn:regular-W1-bound} therefore holds even for $f : \{-1,1\}^n \to [-1,1]$.

Now suppose we also have $\mathbf{W}^{1}[f] \geq \tfrac{2}{\pi} – \epsilon$; i.e., \[ \sigma \geq \sqrt{\tfrac{2}{\pi} - \epsilon} \geq \sqrt{\tfrac{2}{\pi}} - 2\epsilon. \] Thus the first inequality in \eqref{eqn:ltf-test1} must be close to tight; specifically, \begin{equation} \label{eqn:ltf-test-contra} (C+2)\epsilon \geq \mathop{\bf E}[|\ell|] – \langle f, \ell\rangle = \mathop{\bf E}[(\mathrm{sgn}(\ell({\boldsymbol{x}})) - f({\boldsymbol{x}})) \cdot \ell({\boldsymbol{x}})]. \end{equation} By the Berry–Esseen Theorem (and Remark 12 and an exercise), \[ \mathop{\bf Pr}[|\ell| \leq K\sqrt{\epsilon}] \leq \mathop{\bf Pr}[|\mathrm{N}(0,1)| \leq K\sqrt{\epsilon}] + .56 \cdot 2\epsilon \leq \tfrac{1}{\sqrt{2\pi}} \cdot 2K\sqrt{\epsilon} + 1.12 \epsilon \leq 2K\sqrt{\epsilon} \] for any constant $K \geq 1$. We therefore have the implication \begin{align*} \mathop{\bf Pr}[f \neq \mathrm{sgn}(\ell)] \geq 3K\sqrt{\epsilon} &\Rightarrow \mathop{\bf Pr}[f({\boldsymbol{x}}) \neq \mathrm{sgn}(\ell({\boldsymbol{x}}))\ \wedge\ |\ell({\boldsymbol{x}})| > K\sqrt{\epsilon}] \geq K \sqrt{\epsilon} \\ &\Rightarrow \mathop{\bf E}[(\mathrm{sgn}(\ell({\boldsymbol{x}})) - f({\boldsymbol{x}})) \cdot \ell({\boldsymbol{x}})] \geq K\sqrt{\epsilon} \cdot 2(K\sqrt{\epsilon}) = 2K^2 \epsilon. \end{align*} This contradicts \eqref{eqn:ltf-test-contra} for $K = \sqrt{C+2}$, say. Thus $\mathop{\bf Pr}[f \neq \mathrm{sgn}(\ell)] \leq 3\sqrt{C+2}\sqrt{\epsilon}$, completing the proof. $\Box$

For an interpolation between these two theorems, see the exercises.

We conclude this section with an application of the Level-1 Inequality. First, a quick corollary which we leave for the exercises:

Corollary 29 Let $f : \{-1,1\}^n \to \{-1,1\}$ have $|\mathop{\bf E}[f]| \geq 1 – \delta \geq 0$. Then $\mathbf{W}^{1}[f] \leq 4\delta^2 \log(2/\delta)$.

In Chapter 2.5 we stated the FKN Theorem, which says that if $f : \{-1,1\}^n \to \{-1,1\}$ has $\mathbf{W}^{1}[f] \geq 1 – \delta$ then it must be $O(\delta)$-close to a dictator or negated-dictator. The following theorem shows that once the FKN Theorem is proved, it can be strengthened to give an essentially optimal (see the exercises) closeness bound:

Theorem 30 Suppose the FKN Theorem holds with closeness bound $C\delta$, where $C \geq 1$ is a universal constant. Then in fact it holds with bound $\delta/4 + \eta$, where $\eta = 16C^2 \delta^2 \max(\log(1/C\delta),1)$.

Proof: Suppose $f : \{-1,1\}^n \to \{-1,1\}$ has $\mathbf{W}^{1}[f] \geq 1 – \delta \geq 0$. By assumption $f$ is $C\delta$-close to $\pm \chi_i$ for some $i \in [n]$, say $i = n$. Thus we have \[ |\widehat{f}(n)| \geq 1 - 2C\delta \] and our task is to show that in fact $|\widehat{f}(n)| \geq 1 – \delta/2 – 2\eta$. We may assume $\delta \leq \frac{1}{10C}$ as otherwise $1 – \delta/2 – 2\eta < 0$ (exercise) and there is nothing to prove. By employing the trick from Exercise 2.42 we may also assume $\mathop{\bf E}[f] = 0$.

Consider the restriction of $f$ given by fixing coordinate $n$ to $b \in \{-1,1\}$; i.e., $f_{[n-1] \mid b}$. For both choices of $b$ we have $|\mathop{\bf E}[f_{[n-1] \mid b}]| \geq 1 – 2C\delta$ and so Corollary 29 implies $\mathbf{W}^{1}[f_{[n-1] \mid b}] \leq 16C^2\delta^2 \log(1/C\delta)$. Thus \[ 16C^2\delta^2 \log(1/C\delta) \geq \mathop{\bf E}_{\boldsymbol{b}}[\mathbf{W}^{1}[f_{[n-1] \mid \boldsymbol{b}}]] = \sum_{j < n} (\widehat{f}(\{j\})^2 + \widehat{f}(\{j, n\})^2) \geq \sum_{j < n} \widehat{f}(j)^2, \] by Corollary 3.22. It follows that \[ \widehat{f}(n)^2 = \mathbf{W}^{1}[f] – \sum_{j<n}\widehat{f}(j)^2 \geq 1- \delta – 16C^2\delta^2 \log(1/C\delta), \] and the proof is completed by the fact that \[ 1- \delta - 16C^2\delta^2 \log(1/C\delta) \geq (1-\delta/2 - 2\eta)^2 \] when $\delta \leq \frac{1}{10C}$ (exercise). $\Box$

5 comments to §5.4: Degree-1 weight

Leave a Reply




You can use most standard LaTeX, as well as these HTML tags

<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>