A key building block in MT SGL

2023-01-28

A key building block in MT-SGL is the computation of the proximal operator in (12) when is the multi-task sparse group lasso regularizer given byFor MT-SGL, the iterates x ≡ Θ are matrices, and the proximal operator is computed at z ≡ Z = Θ + α(Θ − Θ). For the loss function L(·) corresponding to Gaussian (least squares) and Poisson regression, and , respectively. The loss gradients are given bywith , the problem of computing the proximal operator is given by The goal is to be able to compute efficiently. For simple regularizers, such as ℓ-regularization with p ∈  1, 2, ∞ , proximal operators are available in closed-form and can easily be computed (Combettes and Pesquet, 2011). On the other hand, proximal operators for complex regularizers are non-trivial and usually resort to inner iterative subroutine, which becomes frustratingly slow (Yu, 2013a). When complex regularizers are composed of the sum of simple regularizers, researchers have looked for ways to leverage the fact that proximal operators of the summands are easy to compute. In the following, two proximal operators combination strategies are discussed: shape proximal average (Bauschke et al., 2008, Yu, 2013a) and shape proximal composition (Jenatton et al., 2011, Yu, 2013b). Proximal average: It simply averages the solutions from the proximal operator for each simple regularizer, that is, . Besides having nice properties (Bauschke et al., 2008), it DCC-2036 has been shown that proximal average acts as a surrogate function and is a good approximation for the original composite regularizer (Yu, 2013a). Additionally, as proximal operators can be computed independently, it is suitable for parallel proximal gradient algorithms. For the MT-SGL, the proximal average operator is computed aswhere P and are the proximal operators for the ℓ2,1 and G2,1 regularizers, respectively, and are discussed later in this section. Proximal composition:Yu (2013) investigated the proximal operator of the sum of multiple non-smooth functions as the composition of the proximal operators for individual regularizers, that is, . Yu shows sufficient conditions for the decomposition hold (see Theorem 1 in (Yu, 2013b)). Although he also shows that Intervening sequence needs not hold in general, as we show in Section 4.2, such strategy works well in practice. Proximal composition for the MT-SGL composite regularizer (6) is expressed as: , or more specifically Both shape proximal average and shape proximal composition take the proximal operators for the individual regularization terms and make a combination of them. Next, we discuss the proximal operators of the individual regularizers: ℓ2,1 and G2,1, which compose the MT-SGL formulation. We also show that they can be executed efficiently using suitable extensions of soft-thresholding. The proximal map of the ℓ2,1 regularization can be written asSince , the problem decomposes over the rows u, j = 1, …, p. Following (Liu et al., 2009), the row-wise updates can be done by soft-thresholding aswhere u, v are the jth rows of U, V respectively. As for the G2,1 regularization, the proximal map is defined asSince , the problem decomposes as updates over the task specific groups . Following (Yuan et al., 2013), the task-specific group updates can be done by soft-thresholding aswhere are parameters for task-specific groups for group j and task h in Θ and U, respectively. In practice, since the Lipschitz constant κ may be unknown, we perform a backtracking line-search procedure to ensure function minimization. The pseudocode of MT-SGL is summarized in Algorithm 1, where F(Θ) denotes the objective function of MT-SGL as in Eq. (6), Q(Θ1, Θ2) denotes the quadratic approximation as in Eq. (8) for the MT-SGL objective, and denotes the proximal operator for the MT-SGL regularization as in Eq. (16) or (17). The algorithm can be stopped if the change of the function values corresponding to adjacent iterations is within a small value, say 10−4.