Friday
Dec 09, 2011

PCA, Compressive Measurements, and Video

 

So, I've been having a little bit of fun with visualizations today. The outcomes were pretty nice and potentially artistic, so I thought I'd share them with everyone. Let me explain what you're looking at here:

I've been investigating (for a while now) the recovery of video from compressive measurements. We presented one of these techniques at DCC earlier this year, and it is also part of another article submission. One of the core procedures of this technique involves harvesting all of the possible blocks within a search window in a reference frame surrounding a target block location. All of these blocks are then ranked according to their proximity to the target block using the \(\ell_2\) norm. Those of you familiar with traditional video coding will recognize this as block matching from ME/MC (motion estimation/motion compensation). The basic idea here is that these proximities are approximately preserved under random projection, so we can compare projected reference blocks against block compressive measurements and come up with some decent frame predictions. That is, for a reference block dataset \( \mathcal{H} = \{ h_1, \dots, h_K \} \) with \( | \mathcal{H} | = K \), the set of all pairwise distances, \( \{ \| h_i - h_j \|_2 ; i, j \in 1, \dots, K \} \), has its salient structure preserved under random projection, that is, in the set \( \{ \| \Phi (h_i - h_j) \|_2 ; i, j \in 1, \dots, K \} \). This observation is nothing more than a restatement of portions of the work on the JL (Johnson-Lindenstrauss) Lemma. The application to block matching comes when we can say with high likelihood that $$ \underset{i \in 1, \dots, K}{\arg\min} \| x - h_i \|_2 \approx \underset{i \in 1, \dots, K}{\arg\min} \| \Phi ( x - h_i ) \|_2 .$$

This ends up working out pretty well in practice.
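To make the comparison concrete, here is a minimal NumPy sketch of block matching in the pixel domain versus the measurement domain. It is not the code behind the actual experiments: the reference blocks are synthetic Gaussian vectors rather than blocks harvested from a frame, and the sizes `n`, `K`, `m`, the helper `block_match`, and the \(1/\sqrt{m}\) scaling of \(\Phi\) are all assumptions chosen for illustration.

```python
import numpy as np

def block_match(target, refs):
    """Return the index of the row of `refs` closest to `target` in the l2 norm."""
    return int(np.argmin(np.linalg.norm(refs - target, axis=1)))

rng = np.random.default_rng(0)
n, K, m = 64, 200, 32                 # block dimension, number of references, measurements

H = rng.standard_normal((K, n))       # stand-in reference blocks, one h_i per row
x = H[17] + 0.05 * rng.standard_normal(n)   # target block: a slightly perturbed h_17

Phi = rng.standard_normal((m, n)) / np.sqrt(m)   # random Gaussian projection

# Pixel-domain match: argmin_i || x - h_i ||_2
best_pixel = block_match(x, H)

# Measurement-domain match: argmin_i || Phi x - Phi h_i ||_2
y = Phi @ x                 # block compressive measurements of the target
projected_refs = H @ Phi.T  # row i holds Phi h_i
best_meas = block_match(y, projected_refs)

print(best_pixel, best_meas)
```

With a target this close to one of the references, both searches should land on the same index. The interesting regime in practice is when several references are nearly equidistant from the target, which is where the number of measurements starts to matter.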

For these visualizations, I originally wanted to see if there was any kind of clustering going on within the set of reference blocks, \( \mathcal{H}\). In order to visualize this high-dimensional data, I just did the most direct thing: I took the first three principal components of the reference dataset and then scatter plotted the coefficients of each reference in the dataset. In other words, if the matrix \( P \) contains the first three PCs of \( \mathcal{H} \), then each point in these images, \( p_i \), is calculated as $$ p_i = P^T h_i .$$ I used the same procedure for the set of projected reference blocks as well, except here the principal components, \( Q \), are calculated from the set \( \{ \Phi h_i ; i \in 1, \dots, K \} \). Then, each point \( q_i \) is calculated as $$ q_i = Q^T \Phi h_i .$$
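The two sets of 3D points can be computed along these lines. This is only a sketch under assumptions: the data is synthetic, the helper `top_pcs` is hypothetical, and it finds the principal components from mean-centered data via the SVD (the post does not say whether the blocks were centered), while the coefficients themselves follow the formulas above as written.

```python
import numpy as np

def top_pcs(data, k=3):
    """First k principal components of the rows of `data`, returned as columns."""
    centered = data - data.mean(axis=0)          # assumption: center before PCA
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:k].T                              # shape (dimension, k)

rng = np.random.default_rng(1)
n, K, m = 64, 500, 16                 # block dimension, number of references, measurements

H = rng.standard_normal((K, n))       # stand-in reference blocks, one h_i per row
Phi = rng.standard_normal((m, n))     # Gaussian projection

P = top_pcs(H)                        # n x 3, PCs of the reference set
p = H @ P                             # row i is p_i = P^T h_i

projected = H @ Phi.T                 # row i is Phi h_i
Q = top_pcs(projected)                # m x 3, PCs of the projected set
q = projected @ Q                     # row i is q_i = Q^T Phi h_i
```

The rows of `p` and `q` are what would be handed to a 3D scatter plot, one point per reference block.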

In order to fit some extra information in there, I color coded the data points. The points \( p_i \) are mapped on the white->red->black scale. The points on the white->blue scale represent the points \( q_i \). Increasing color intensity represents increasing proximity to the true/target block, that is, a smaller \( \| x - h_i \|_2 \). This same set of proximities was also used for the colors of \( q_i \), to give some consistency across both. The size of each point also increases as its proximity to the true/target block increases. Also, for reference, I just used projections whose entries are drawn from a standard normal distribution, \( \Phi_{ij} \sim \mathcal{N}(0,1) \).

What is really cool about these visualizations is that they show the similarity between point clouds and their random projections. I also included a little animation that shows one such cloud unfolding as the number of measurements increases. 

The other interesting aspect that these visualizations show is that most of these reference blocks seem to lie on some kind of manifold. This makes some sense, because we can think of these blocks as articulated; that is, they can be described by a set of parameters, namely the pixel location of each block. This is why manifold theory might be productive for super-resolution or sub-pixel ME/MC. I also find the intricate structures that show up in these visualizations especially cool. I would not have expected to see some of the shapes that are shown here. Maybe I need to spend some more time with manifolds!

 

Sunday
Jan 02, 2011

Education on Higher Education

I ran across this article in The Economist the other day, and found it quite an interesting read. The Economist, as I'm sure many of you know, can be supremely pessimistic at times, but I am interested in hearing what the readers of this site think with regard to the current state of higher/doctorate education.

Do you see anything that needs changing? If so, is there anything that could be done, or do we just have to sit in it? 

Thursday
Dec 16, 2010

I DCC and You Can Too

Hi everyone! 

This coming spring, we'll be attending that bastion of old guard source coders, the Data Compression Conference (DCC). I'm curious to know if anyone from the CS community will be in attendance. Maybe we could talk CS and where things are going over some hot cocoa? (Scratch that...dark-as-night coffee only for me).

There hasn't been any specific schedule released for the conference yet, so I am unsure if there will be any presentation session specific to CS. There are no parallel sessions at DCC, so it’s definitely a great opportunity to "spread the gospel" as it were.

In reading some of the reviews we got back on our paper (and I don't know if others are seeing this too), there seem to be some misunderstandings that folks working on traditional systems have when it comes to CS. This group always seems to ask about a lack of "rate-distortion" comparisons when it comes to CS. Also, there is an insistence on comparing against traditional coding systems (e.g., JPEG, JPEG2000, H.264). This topic is something that I really want to hit during our talk at DCC: that CS isn't a traditional coding system.

I know that seems really obvious to the CS community, but it seems that this misunderstanding is standing in the way of wider acceptance of CS methodology. The fact that CS offers not just computationally light encoding, but no-computation encoding, seems to be lost on many in the traditional community. It will take some time for the thinking to adjust, however, because hard-hitting, heavy-computation encoders have been such a fact of life in the field that the assumption that sensing systems must have some kind of computational overhead is simply taken for granted.

Because of this assumption, many simply flip to the back of many a CS paper and say “But wait...JPEG/H.264 can do better than this, why would I use this convoluted system? What a bunch of hype!” and then throw the journal to the ground and stomp away angrily to write blog entries and poetry (this is my imagination, now).

It’s like fighting the misconception that CS is “in-painting” (thank you, Wired…): we’ve got to tackle these false assumptions that people have about what CS is. Once they understand and have that “Aha!” moment when they say “Wait…this means I can…”, then we have a new CS author :)

Thursday
Mar 18, 2010

Day 4 -- A Veritable Cornucopia of Compressed Sensing

If you were going to come to ICASSP for one day, this would probably have been the day you chose to come (if you read these blogs, anyway). The morning consisted of a CS poster session and a talk by Michael Wakin. The afternoon consisted of an entire lecture session: Compressive Sensing: Theory and Methods. Yes, it seems IEEE has decided on the -ive rather than the -ed. Also, I think today will consist of fewer pictures, because most of the ones I got tended towards blurry shots of the backs of folks' heads. Those poster crowds can get rough.

The first poster I saw was "Empirical Quantization for Sparse Sampling Systems" by Michael Lexa out of the University of Edinburgh, formerly out of Rice. (Rice really pumps out the DSP, don't they?) He's a Nuit Blanche-er, so Igor can go ahead and give a little fist pump :P As for the work, I find it pretty cool that folks are beginning to look at the quantization problems. This method of quantization, Michael said, is best understood in the context of "quantization for classification." It seems that the target for this technique is something akin to Mishali & Eldar's Sub-Nyquist wideband sampler. I'm pretty far from thoroughly understanding the work, so I'm afraid that any further explanation from me won't do it much justice. Be sure to check it out. If you're handy with the Kullback-Leibler, you should feel pretty comfortable with this paper. Great work, Michael, hopefully that "spouse's pay check" grant will let us see more of the same :)

After talking for a bit about what we were working on, Michael pointed me towards Jason Laska, from Rice, and Marco Duarte, formerly of Rice and now at Princeton, who both worked on the single-pixel camera. I'm eager to see one of these things in action, and maybe I won't have to wait very long. Jason told me that a startup has recently formed to produce these devices, and that they have put together a portable version, so the Single-Pixel Camera is finally out of the lab and capturing IR in the natural world. I'm glad to hear that work is progressing on it! The new versions are apparently running at high resolution (1024x768, I believe), and thanks to some optimization folks, Jason says the reconstructions are "under a minute". I believe they're still using the l1 BP approach, but I can't be sure. He also hinted at some future work on "in the loop" reconstruction feedback to the encoder for distilled image enhancement, which sounds complicatedly wonderful.

Marco also reminisced about the first six months of building the camera, saying they spent that time reconfiguring the reconstructions, inserting proposals from new research all along the way to try to correct the erroneous results they were getting, only to find out that the problem all along was faulty optics. Ouch! Thank you both for the conversation.

Wakin's talk, "Concentration of Measure for Block Diagonal Measurement Matrices", I found to be particularly interesting, since the blocked approach we've been using amounts to the same thing. I imagine that we'll be making use of this theoretical framework in the future, and I'd love to see it expanded into a journal article. I've been looking at some similar phenomena and wonder if they can be fit into this same framework. The only difference is that I've been phrasing what I've been doing as a correlation between block energies rather than a concentration. I'll have to fiddle around some more.
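For anyone wondering why block-based sensing and a block-diagonal measurement matrix are the same thing, here is a small NumPy sketch. It only illustrates the matrix structure, not the concentration result from the talk; the block count `J`, the block sizes, and the Gaussian entries are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(2)
J, n_b, m_b = 4, 16, 6        # number of blocks, block length, measurements per block

# One independent Gaussian measurement matrix per block.
blocks = [rng.standard_normal((m_b, n_b)) / np.sqrt(m_b) for _ in range(J)]

# Assemble the full block-diagonal matrix explicitly.
Phi = np.zeros((J * m_b, J * n_b))
for j, B in enumerate(blocks):
    Phi[j * m_b:(j + 1) * m_b, j * n_b:(j + 1) * n_b] = B

x = rng.standard_normal(J * n_b)   # signal made of J concatenated blocks

# Measuring the whole signal with Phi ...
y_full = Phi @ x

# ... matches measuring each block separately and stacking the results.
y_blocks = np.concatenate(
    [B @ x[j * n_b:(j + 1) * n_b] for j, B in enumerate(blocks)]
)
```

Because every off-diagonal block of `Phi` is zero, each measurement only ever sees one block of the signal, which is why the energy of the measurements depends on how the signal energy is spread across blocks.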

Marco also gave a talk at the afternoon session, "Kronecker Product Matrices for Compressed Sensing", which was also very interesting because of his kind of "fusion" of basis functions to act as a sparse 3D basis for the joint reconstruction of hyperspectral datacubes. I'm wondering if this would be useful at all with Dr. Fowler's Compressive Projection PCA. I also really liked Silvia Gandy & Isao Yamada's "Alternating Minimization Techniques for the Efficient Recovery of a Sparsely Corrupted Low-Rank Matrix". It's a little bit of a different application of l1 minimization, but the end result is very interesting, and I would like to see more examples of it. The one given was separating shadowing and specular effects from faces given a database of a face under different lighting conditions. I had seen something based on this last year and I'm wondering if this was the same work.
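As a rough illustration of the Kronecker "fusion" of bases mentioned above, the sketch below combines two small orthonormal bases into one joint basis with `np.kron`. It is a simplified two-factor stand-in (one "spatial" and one "spectral" dimension) rather than the construction from the talk, and the random orthonormal bases, sizes, and coefficient values are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
n_spatial, n_spectral = 8, 5

# Random orthonormal bases standing in for, e.g., a spatial and a spectral basis.
Psi_s, _ = np.linalg.qr(rng.standard_normal((n_spatial, n_spatial)))
Psi_l, _ = np.linalg.qr(rng.standard_normal((n_spectral, n_spectral)))

# Joint basis for the vectorized data: the Kronecker product of the two.
Psi = np.kron(Psi_s, Psi_l)            # shape (40, 40), still orthonormal

# A sparse coefficient array in the joint basis (three nonzeros).
C = np.zeros((n_spatial, n_spectral))
C[1, 2], C[4, 0], C[6, 3] = 3.0, -2.0, 1.5

# Synthesis with the big Kronecker matrix ...
data_kron = Psi @ C.ravel()

# ... equals applying each basis along its own dimension (row-major vec identity).
data_sep = (Psi_s @ C @ Psi_l.T).ravel()
```

The second form is the practical one: the full Kronecker matrix never needs to be stored, since each factor can be applied along its own axis.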

I also took a look at the "Bayesian Compressed Sensing Imaging Using a Gaussian Scale Mixture" poster shown by George Tzagkarakis. It seemed pretty solid, though I didn't get many details on the reconstruction time, and the quality measures were a little bit hard to directly compare to some other methods. However, this was one of a few papers that I saw today that make use of the GSM as a sparsifying prior, rather than a Laplace or Gaussian (which does a terrible job) prior. There was another interesting prior for the Bayesian crowd that was shown in the CS lecture session: the Inverse Gamma prior, used in "Efficient Sparse Bayesian Learning via Gibbs Sampling" by Xing Tan, Jian Li, and Peter Stoica. I think there was a Jeffreys prior thrown into the mix sometime today, as well. All I can say is, all you BCS folks are insane.

Okay, that's all for now. Check back later this evening or tomorrow :)

 

Wednesday
Mar 17, 2010

Day 3 -- A Break

There was one more talk at the Dictionary Learning session that I wasn't able to post about thanks to some really wonky internet at the conference. Right at the end it gave out and I lost all the notes that I made...guess I learned my lesson!

"Ultrasound Tomography with Learned Dictionaries" Ivana Tosic, Ivana Jovanovic, Pascal Frossard, Martin Vetterli, Neb Duric

It was on reconstructing sound speed from ultrasound tomography samples, using l1 regularization as an added constraint to get better performance than a least squares approach.

I have some apologies to make:

Firstly, I missed the session on Video and Motion Analysis last night, which is unfortunate because there were two papers presented during that session of particular interest to us (though Sungkwang did attend, and I'm pressing him to make a write up :P ). The two papers were

"Motion Estimation from Compressed Linear Measurements" Vijayarghavan Thirumalai, Pascal Frossard

"Compressive Sensing and Differential Image Motion Estimation" Nathan Jacobs, Stephen Schuh, Robert Pless

Also, this morning I had intended to attend a talk in the Target Detection & Localization session by the folks over at Duke:

"Hyperspectral Target Detection from Incoherent Projections" Kalyani Krishnamurthy, Maxim Raginsky, Rebecca Willett

Unfortunately, I got caught up in the Dictionary Learning session and didn't make it in time for this talk, but Zach Harmany, one of their colleagues at Duke, told me it was a pretty good presentation, and I'm sorry I missed it.

Last night, Sungkwang and I were having pizza out at this great little Italian bar down the road and talking about CS and the nature of research. At a big event like this, it's easy to see how "far" research has come. Yes, I did use quotation marks there. Maybe my brain is wired a bit differently, but I'm a big fan of quality over quantity, which is a topic that I think many researchers have been moaning about for many decades. The problem, of course, is how to quantify the effectiveness of research dollars spent, and ultimately a quantity measure has been adopted because of its objectivity. I'm not sure it's something that can be easily changed, and the solution is not exactly well defined.

Also, after looking at many posters the past couple of days and mulling over everything, I'm beginning to wonder about the nature of the transition in research from finding answers to challenging problems to having challenging and complex solutions and finding problems to apply them to. There seems to be this major disconnect with what the "point" of all this is. Perhaps I'm talking much more like a PhD student and much less like a future professor, and that's okay, I suppose. But I have an investment in all this, too; it's my literal bread and butter as well.

What I'm getting at is this: I can see myself getting very tired, very fast with some of the self-serving nature of research. In the end, it cannot be about the complexity of your contribution but its meaningfulness. Of course, many significant contributions to the community and humanity as a whole are inherently complex. The theory underlying wavelets can be thought of as "complex", but the final solution is, in fact, an elegant one. I suppose I would just like to see the pursuit of elegance in engineering research as it is in many scientific pursuits. 

I want research to consist of wooded vales, tobacco pipes, benches, good outside conversation and collaboration (without the nagging, constant fear of theft), conferences of under 200, and a constant focus on solving the present problems definitively. A pipe dream, I suppose. I'm probably about 70 or 80 years too late for that.

This kind of got me thinking about a conference for strictly CS-related topics. I haven't heard of one yet; maybe it's already happened and I've missed the boat. Many things in the field are still in what I'd call a "high-entropy" state and haven't yet coalesced into some standard models of which problems we're solving and what context we are solving them in. It seems like every paper has a different results metric, and it's not clear what to take seriously and what not. I can't tell you how many times I've started reading a CS paper with a very intriguing abstract only to find it fairly light and completely impractical. There seem to be a lot of people walking around with their hammers giving a one-go at the CS "nail."

I'm still very much in favor of seeing more thought going into the hardware behind CS. Until there are some very robust sampling methodologies for a particular signal type, CS will remain on the sidelines of that field. The medical imaging community has done an excellent job making a case for CS, which I think is great. The imaging community, I think, needs a bit more work. Sure, the single pixel camera has been made, but that's far from "solving" the image and video signal sampling problem. There are so many other things to look at and validate, especially in the field of video. I'm actually really surprised that there hasn't been more hardware work to come out of Rice after the single pixel camera. I am becoming more and more convinced that we've got to be tying CS in with some kind of hardware framework. In other words, can we stop sampling in the wavelet domain, please? How does that make any physical sense? Where are these magical coefficient-capturing cameras?

CS is now beginning to be better understood analytically, which is great, and gives us insight into future implementations and what CS is capable of, but our empirical & physical results need a lot of catching up. Perhaps over the next year we'll see more of this :)

Okay, that's all for now. I hope you can bear with my idealism from time to time!