Day 3 -- In the Mix
Wednesday, March 17, 2010 at 10:08AM Posting live from the Dictionary Learning for Sparse Representation special session:
“An l1 Criterion for Dictionary Learning by Subspace Identification” Florent Jaillet, Remi Gribonval, Mark D. Plumbley, Hadi Zayyani
- First on the topic of sparse representation.
- Usefulness of sprase representations “Such as Compressed sensing, which is a very active area
- Challenge is finding sparse representations for the new kinds of data we encounter: “What is a sparse representation for social network data?”
- How do we find sparse representations for data sets without any kind of a priori knowledge.
- The idea for dictionary learning (of course) is that we've got some set of data, we assume that there is some underlying "dictionary" which represents the data sparsely that we can recover from some subset/training data.
- In subspace models, we desire to find a set of orthogonal vectors which can be used as a basis for transforming the data into a sparse domain.
- Proposed is the identification of normal vectors, vectors which are orthogonal to many training samples. I.e. finding a vector that is orthogonal to a large enough subset of the training samples will probably be orthogonal to the set as a whole.
- The propose approach is find the best set of orthogonal vectors which represent the training data with the smallest l1 magnitude
- We're looking at some 2D empirical examples now, which consist of something like a gaussian blob of random points with some "lines" of random points drawn through the center of the 2D gaussian. A vector is placed at the mean of this set and then rotated around (like you do in polar coordinate system) and calculate the l1 magnitude at each angle and make a plot and find the minimum
- This is cool, he is showing the same thing in 3D and you get this crazy looking blob plot.
- The point is being made with all this is that the problem is non-convex, which of course makes the minimization problematic, so the method must rely on steepest descent, which is implemented as a quadratic program in the paper. I.e. the method is hampered by local minima.
- Some theoretical garuntees: If the number of training samples is greater than a function of the dimensionality, then the local minima exist at the correct solution with high probability.
- The authors are now progressing on to higher dimension problems and to develop a full algorithmic framework.
- What happens when the subspaces are close how robust is the method? "The greater the coherence, the less robust the system is to outliers."
- Why try to identify orthogonal vectors rather than the atoms of the dictionary? "The algorithm needs to be derived to perform the optimization tends to be too complex in that situation, and looking at the orthogonal vectors provides an easier analysis."
- What about the number of atoms or subspaces, is it assumed to be known? "We are assuming here that we know it, it is a given input of the problem. What I believe is that based on our theoretical guarantee so far...provided the subspaces are not too coorelated, and given a sufficent number of training samples, we can identify the local minima. So if we do not know the dimensionality, given enough training samples, you should be able to identify the number of dimensions from the number of local minimia [but not in a robust fashion]. This is an open problem."
"Structured and Incoherent Parametric Disctionary Design" Mehrdad Yaghooobi, Laurent Daudet, Michael Davies
- Defining the problem as trying to find the dictionary, D, which gives us the best sparsity.
- Some previous methods: concatenation of orthonormal bases, dictionary learning using a set of training samples, but for this paper they focus on dictionary design subject to a certain property such as the RIP, minimum coherence, etc.
- One important goal for this work is to find a dictionary whose atoms have very small coherence.
- One kind of incoherent basis is the Equiangular Tight Frame (ETF), and is given as an "ideal" dictionary.
- Parametric dictionaries: a mapping from a set of parameters to a set of atoms (i.e. a functionally produced dictionary)
- Structured dictionaries: a dictionary is structured if the atoms are geometrically related.
- And the, of course, they put these two together with the structured property of being shift-invariant.
- The formulation is to minimize the difference between a couple of Gramm matrices in the l-inf norm.
- Okay a lot of crazy math just happened when it came to relaxing the formulation but the usm of that is the fact "that this can be solved very efficiently." I'll have to take his word on it
- How close to being an ETF has this methodology gotten? "We are still quite far".
- But there is still gain the performance of signal recovery using the parametric dictionaries.
- Future work: investigating structures other than shift-invariance, and applying the parametric dictionary model to the dictionary learning model.
"A Union of Incoherent Spaces Model for Classification" K. Schnass, Pierre Vandergheynst
- Some times, classification of large dimensionality problems can be problematic due to training time and storage space, so some kind of dimensionality reduction usually needs to be employed.
- In general, when we do this, we apply the same matrix over the entire training set for dimensionality reduction
- What is proposed is a more general model, where each class is embedded into the lower dimension using class-specific features.
- So how do we choose which features are the best features to check for each class? This is the core of the proposed method, and the given formulation is relaxed using the Grassmanian space theory for low rank feature matrices.
- The optimal feature matrix is found via alternate projections between two sets, which we want to find the intersection of. I'm not really quite clear on what these two sets are...ahh one is the set of orthonormal matrices, and the other is the set of matrices that give stability to the classification.
- They tested this method on the Yale B and AR databased on identifying subjects. Out of 2414 images, the best success rate of classification was 98.87%...slightly better result than some state of the art method run on the Yale set.
- There is no convergence proof (the objective is non-convex in nature).
"Adaptive Compressed Sensing: New Class of Self-Organizing Models for Neuroscience" Will Coulter, Christopher Hillar, Guy Isely, Friedrich Sommer
- Problem: do agents (populations of neurons) communicate structured information across regions of space in the brain.
- Some population of neurons is teasing out the sparse structure information of some visual signal in some way.
- Can this be applied to CS? I.e. is reasonable for CS to be run in the environment of the brain.
- Here, we aren't really interested in reconstructing the whole image, just kind of getting the impression of it.
- The big idea in classification in past few years isn't carving out some complex decision boundry, but rather to rotate the information into a higher dimensional space where the classes might be linearly seperable...which is something that we can see biologically.
- The idea is that encoding data in pixels is "unnecessary", we just need sparse elements, and very few of those if that.
- This neurological compression destroys the original signal, but how can we recover the structure from the original signal.
- In the brain, neurons respond to different "structures", just as in sparse coding.
- In computation, we can alternate between estimating the sparse basis and the minimization/reconstruction and come up with something that looks like the neuron perception fields found biologically.
- There is a big discussion over whether or not information is sparsely represented in the brain or not. This model makes the statement that it in fact, does both.
- Use receptive fields to measure the quality of their work, i.e. the same way the biological tests are employed. I.e. characterized with a 2D stimulus response.
- They used this also to train up Psi which sparsifies the best in an unsupervised procedure.
- Main point, even though the original signal isn't being preserved, the structure of the signal is somehow being preserved through multiple stages of neuro-processing.
- The content of images changes to learned sparse dictionaries, i.e. sending scenes of buildings will give the first stage a "building" basis, and this will propagate to all other bases in the pipeline.
"Improved Sparse Recovery Thresholds with Two Step Reweighted l1 Minimization" Weiyu Xu, Amin Khajehnejad, Amir Salman Avestimehr, Babak Hassibi
- This is a much more CS oriented talk..going through the standard background procedure now.
- For a given subrate (delta), if you use l1 minimization to discover the original signal, there is some kind of critical threshold of sparsity, mu(delta), so that almost all k-sparse x are recoverable via l1 minimization.
- So the question is is there a linear time algorithm that has a weak threshold strictly better than l1 minimization?
- For an arbitrary distribution, that is for a universal improvement, it is an open problem. But for certain classes of distributions, we can find a threshold improvement.
- The procedure is a two step one, first, the signal is recovered using standard l1 minimization, and then an approximation of the support set of x corresponding to the significant coefficients, then the reweighted method is employed using this set of weights, penalizing values outside the set of significant coefficients.
- What is shown by this algorithm is that for a fixed subrate, there exists some value, eps, which is greater than zero such that the threshold is moved to (1 + eps)*mu(delta).
- Candes, Wakin, & Boyd proposed a different reweighted method, but the one used in this method is a little bit different because of theirs doesn't exactly lend itself to mathematical analysis.
- There is a weak sense of robustness in this case, so there is an analytical boundary on the reocverablility of a signal given that it isn't exactly sparse.
- This is pretty cool, the phase transition diagram for this method looks more like a nike-swoosh than diagonal (i.e. much greater threshold performance at low subrate/sparsity pairs)
- The quality of the phase transition boundary is relative to the distribution of the coefficient magnitudes, specifically the distribution near the origin. If there are a large number of near-zero coefficients (but not quite) this will be problematic for the reconstruction (as these will be lost along the way).
Eric.Tramel |
1 Comment |
Conference,
ICASSP2010,
Research in
Research
Reader Comments (1)
I follow you VIA GFC and I love your blog! exhzbk exhzbk - moncler jacket doudoune.