Tantalizingly close

So close and yet so far. That is how it feels at the moment. I found the bug I mentioned in the last post, which I had hoped would be the last piece of the puzzle before camera ego motion calculations were working, but it wasn't to be.

I now have a problem with the build-up of error over consecutive frames. My sanity check to see if the ego motion calculation is working is to run the system completely stationary (the only option at the moment anyway). The ego motion calculated between frames should therefore come out as 0, 0, 0, 0, i.e. no movement in any of the three spatial dimensions and no rotation. Unfortunately I am getting results that look more like this after one iteration:
—————————————–
New position after ego motion estimation:
X = -0.044358336825996
Y = 0.025654321647698
Z = -0.003337081060976
Theta = 0.431281320335258

Obviously subsequent iterations compound this error and it quickly spirals out of control.
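To make the compounding concrete, here is a rough sketch (an illustration, not my actual code, and ignoring Z for simplicity) of how per-frame ego motion deltas get folded into the global pose estimate. Even a small constant bias in dx, dy or dTheta drifts without bound when the updates are chained like this:

using System;

// Minimal 2D pose accumulator: each frame's ego motion estimate
// (dx, dy, dTheta, expressed in the camera frame) is composed onto
// the global pose. A constant per-frame bias accumulates forever.
class PoseAccumulator
{
    public double X, Y, Theta;

    public void Apply(double dx, double dy, double dTheta)
    {
        // Rotate the frame-local translation into world coordinates
        // before adding it, then accumulate the rotation.
        X += dx * Math.Cos(Theta) - dy * Math.Sin(Theta);
        Y += dx * Math.Sin(Theta) + dy * Math.Cos(Theta);
        Theta += dTheta;
    }
}

Feed the numbers above in every frame while the robot is sitting still and you can watch X, Y and Theta wander off.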

Now I realise that in the real system the odometry will feed into this, and the odometry and ego motion will be combined in a Kalman filter to get the actual estimated motion. In this way the cumulative error won't matter so much, as it will be damped by the odometry each iteration.
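I haven't written that filter yet, so the following is only a sketch of the sort of thing I have in mind: a one-dimensional Kalman update per axis that blends the odometry prediction with the vision estimate according to assumed noise levels. The variances here are invented placeholders, and the "measurement" is the previous estimate plus the vision ego motion delta.

using System;

// A minimal one-dimensional Kalman filter: predict with the odometry,
// correct with the vision-derived estimate of the same quantity.
// The noise values are placeholders, not measured figures.
class ScalarKalman
{
    double x;          // state estimate (e.g. position along one axis)
    double p = 1.0;    // estimate variance

    const double Q = 0.01; // process (odometry) noise variance
    const double R = 0.05; // measurement (vision) noise variance

    public double Step(double odometryDelta, double visionMeasurement)
    {
        // Predict: move the state by the odometry and grow the uncertainty.
        x += odometryDelta;
        p += Q;

        // Update: blend in the vision measurement, weighted by the Kalman gain.
        double k = p / (p + R);
        x += k * (visionMeasurement - x);
        p *= (1 - k);
        return x;
    }
}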

My concern is that this error is far too big to start with. I can accept (and was expecting) some error between frames, but in the above example the system believes it has moved 4cm to the left. 4cm is too much. What I have to do now is identify the source of this error. I suspect it may be something to do with inaccuracy in the SIFT feature 3D position calculations: any error between calculated feature locations could be exacerbated by the least squares minimisation. That at least is my uneducated guess at this point. More investigation will hopefully shed some light on the issue.

Nearly there?

At the risk of jinxing myself completely, I think I am nearly at the point where my vision system is correctly calculating camera ego motion! Now if that didn't make you excited, it should, because it is very, very cool.

I managed to spend a bit of time hacking on the weekend and made a huge amount of progress. Now that I have identified what I suspect (but haven't yet confirmed) are libsift's shortcomings, and moved to using Lowe's example binary to extract SIFT features, things are starting to work. The matching algorithm I was using to match features between stereo images and between frames was also a bit broken, so I ported Lowe's example matching algorithm from C to C#, which greatly improved matching performance. So now I have one bug to squash and ego motion calculations will be working. I'm pretty sure I know where the problem lies, so hopefully I'll get that nailed in the next day or two.
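For anyone curious, the heart of Lowe's matching approach is simple: for each descriptor in one image, find its nearest and second-nearest neighbours in the other image (Euclidean distance over the 128-element descriptors) and only accept the match if the nearest is clearly closer than the runner-up. My port boils down to something like this (a simplified sketch, not my exact code; Lowe's demo uses a distance ratio of about 0.6, which I've kept):

// For each 128-element SIFT descriptor in setA, find the best candidate in
// setB and apply Lowe's distance ratio test: accept only if the nearest
// neighbour is much closer than the second nearest. -1 means no match.
static int[] MatchDescriptors(double[][] setA, double[][] setB)
{
    const double ratio = 0.6;
    int[] matches = new int[setA.Length];
    for (int i = 0; i < setA.Length; i++)
    {
        double best = double.MaxValue, secondBest = double.MaxValue;
        int bestIndex = -1;
        for (int j = 0; j < setB.Length; j++)
        {
            double distSq = 0;
            for (int k = 0; k < setA[i].Length; k++)
            {
                double diff = setA[i][k] - setB[j][k];
                distSq += diff * diff;
            }
            if (distSq < best) { secondBest = best; best = distSq; bestIndex = j; }
            else if (distSq < secondBest) { secondBest = distSq; }
        }
        // Working with squared distances, so the ratio is squared too.
        matches[i] = (best < ratio * ratio * secondBest) ? bestIndex : -1;
    }
    return matches;
}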

At that point I will have one remaining task before I can declare phase one complete: I still have to figure out how to use the camera extrinsic parameters to add accuracy to the stereo triangulation calculations. Once that is done the system will be able to accurately build a map of SIFT features observed in its environment, and simultaneously localise itself within that environment.

Breakthrough! …Sort of

Well, about Wednesday night I was looking at one of the SIFT-annotated images output by my vision system and I finally snapped. It just looked wrong. It had been bothering me for a while that it didn't seem to be selecting very interesting features and was taking an unreasonable amount of interest in the bedroom wall, so I broke out the example binary provided by Professor Lowe himself (the inventor of the SIFT algorithm) for comparison.

As I suspected, running the two SIFT implementations against identical images yielded completely different results, with Lowe's implementation finding many more, and far more interesting, features than the version I had been using for months.

This discovery was simultaneously bad and good news. Bad in that it meant I had a bunch of work to do, but good in that I had finally discovered the source of all my frustration.

My new plan of attack has been to write a bunch of C# code to invoke Lowe's binary out of process and parse the resulting key file back into my program. It's all a bit messy really, but this whole project is still just a proof of concept so I'm not too stressed about performance at the moment. There is no doubt in my mind that C# is orders of magnitude slower than C for this sort of work, but development time is significantly less, and for a prototype that's what matters.
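The glue code itself is nothing clever: shell out to the sift binary with the PGM image on stdin and read the key data back from stdout. Lowe's key format starts with the number of keypoints and the descriptor length (128), then gives each keypoint's row, column, scale and orientation followed by its 128 descriptor values. A rough sketch of the wrapper (the class name and paths are placeholders, not my actual code):

using System;
using System.Diagnostics;
using System.IO;

// Runs Lowe's sift binary on a PGM image and parses the key data it
// writes to stdout. Class names and paths are illustrative only.
class SiftRunner
{
    public class SiftKey
    {
        public double Row, Col, Scale, Orientation;
        public int[] Descriptor = new int[128];
    }

    public static SiftKey[] ExtractKeys(string pgmPath, string siftExePath)
    {
        ProcessStartInfo psi = new ProcessStartInfo(siftExePath);
        psi.UseShellExecute = false;
        psi.RedirectStandardInput = true;
        psi.RedirectStandardOutput = true;

        using (Process p = Process.Start(psi))
        {
            // Lowe's demo reads the PGM from stdin...
            byte[] image = File.ReadAllBytes(pgmPath);
            p.StandardInput.BaseStream.Write(image, 0, image.Length);
            p.StandardInput.BaseStream.Close();

            // ...and writes "numKeys descriptorLength" followed by the keys.
            string[] tokens = p.StandardOutput.ReadToEnd().Split(
                new char[] { ' ', '\t', '\r', '\n' },
                StringSplitOptions.RemoveEmptyEntries);
            p.WaitForExit();

            int pos = 0;
            int count = int.Parse(tokens[pos++]);
            int length = int.Parse(tokens[pos++]);   // should be 128

            SiftKey[] keys = new SiftKey[count];
            for (int i = 0; i < count; i++)
            {
                SiftKey k = new SiftKey();
                k.Row = double.Parse(tokens[pos++]);
                k.Col = double.Parse(tokens[pos++]);
                k.Scale = double.Parse(tokens[pos++]);
                k.Orientation = double.Parse(tokens[pos++]);
                for (int d = 0; d < length; d++)
                    k.Descriptor[d] = int.Parse(tokens[pos++]);
                keys[i] = k;
            }
            return keys;
        }
    }
}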

I still have a bunch of work to do before I fully understand stereo vision and the associated camera calibration requirements but I am getting there.

Trying to find the motivation…

It's been ages since I did any work on my vision project and I find myself struggling to find the motivation. I have been stuck on a tricky problem for a while now and I think I need a breakthrough to get some interest back. The problem is that the calculations I am using to derive the real-world 3D coordinates of features are giving completely nonsensical answers. I am having trouble establishing whether the formulas I am using are wrong, or whether the feature matching isn't working properly and I am trying to work out pose information from a bad stereo feature match…

If only I could find a site on the web somewhere with a definitive discussion of calculating real-world 3D coordinates from stereo images; that would at least help rule out the first option.
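For my own notes, the formulation I keep coming back to is the ideal parallel-camera case: with focal length f (in pixels), baseline B between the cameras, and a feature at column xl in the left image and xr in the right, the disparity d = xl - xr gives depth Z = f * B / d, and then X = (xl - cx) * Z / f and Y = (yl - cy) * Z / f relative to the left camera, where (cx, cy) is the image centre. Whether my cameras are anywhere near that ideal geometry is another question entirely. The kind of helper I have in mind looks something like this (the ideal-geometry assumption is mine):

using System;

// Ideal parallel-camera stereo triangulation. Assumes the two cameras are
// separated only along the x axis by 'baseline', and that the focal length
// and principal point (cx, cy) are given in pixels.
static class StereoTriangulation
{
    public static bool Triangulate(
        double xLeft, double yLeft, double xRight,
        double focalLength, double baseline, double cx, double cy,
        out double X, out double Y, out double Z)
    {
        X = Y = Z = 0;
        double disparity = xLeft - xRight;
        if (disparity <= 0)
            return false; // zero or negative disparity means a bad match

        Z = focalLength * baseline / disparity;
        X = (xLeft - cx) * Z / focalLength;
        Y = (yLeft - cy) * Z / focalLength;
        return true;
    }
}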

It’s the debugging that’s the killer

I've had tons of fun over the last few days trying to debug my vision system. To be fair I haven't really spent that much time on it yet, but it is a fairly complicated system, and trying to guess which features in the image the computer is interested in, and why, is a bit of a challenge. To make things easier I have been annotating the images as the feature detection process runs and saving them to disk so I can analyse them later. This has required wrapping more functionality of the OpenCV library, so yet more fun with marshalling and P/Invoke!
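The extra wrapping is mostly just more DllImport declarations. Something along these lines is roughly what I mean: draw a circle at each feature location and write the annotated frame to disk. This assumes the OpenCV 1.x C API, and the DLL names vary between OpenCV builds, so treat them as placeholders.

using System;
using System.Runtime.InteropServices;

// Minimal P/Invoke declarations for annotating and saving an IplImage.
// DLL names are placeholders for whichever OpenCV build is installed.
static class CvAnnotate
{
    [StructLayout(LayoutKind.Sequential)]
    public struct CvPoint { public int x, y; }

    [StructLayout(LayoutKind.Sequential)]
    public struct CvScalar { public double v0, v1, v2, v3; }

    [DllImport("cxcore.dll", CallingConvention = CallingConvention.Cdecl)]
    static extern void cvCircle(IntPtr img, CvPoint center, int radius,
        CvScalar color, int thickness, int lineType, int shift);

    [DllImport("highgui.dll", CallingConvention = CallingConvention.Cdecl,
        CharSet = CharSet.Ansi)]
    static extern int cvSaveImage(string filename, IntPtr image);

    // Draw a small circle at each detected feature and save the result.
    public static void SaveAnnotated(IntPtr iplImage, CvPoint[] features, string path)
    {
        CvScalar green = new CvScalar { v0 = 0, v1 = 255, v2 = 0, v3 = 0 }; // BGR order
        foreach (CvPoint p in features)
            cvCircle(iplImage, p, 3, green, 1, 8, 0);
        cvSaveImage(path, iplImage);
    }
}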

The other thing that may be complicating matters is that I'm not sure how well the SIFT detection works in low light conditions, and since I do all my hacking in the evening…

Maybe a bit of dedicated hacking this weekend will help.

In other news I am off to see Billy Corgan tomorrow night so am getting really amped for that!

Fun weekend but still stuck

Anthea and I had a fun weekend with a trip to Leeds Castle and some wine tasting at the oldest winery in Kent (yeah, apparently the English make wine 😉). Leeds Castle was great. It has really beautiful grounds, the castle itself is very pretty, and we managed to conquer the maze, which was fun. They also had about 20 or so birds of prey sitting out on their perches waiting for the falconry display at 1:30 (which we missed unfortunately) and it was really cool to see some of these magnificent birds up close.

I received a little parcel from Amazon on Friday: I've finally got my second camera, so I should be doing real stereo vision soon. I am still COMPLETELY stuck on deriving the rotation between two sets of points using least squares, however. I have a set of 3D coordinates representing a set of features observed at time t1, and another set of 3D coordinates representing the same features at time t2. What I am trying to do is establish the translation and rotation of the camera platform between t1 and t2. The translation is easy, but the rotation is giving me a real headache! I know I need to use a least squares minimisation, but the actual implementation escapes me. It is getting pretty annoying now, as this has held me up for a couple of weeks. In the unlikely event that anyone reading this knows how to do this, please contact me!
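For what it's worth, my current best guess (which I haven't managed to verify yet, so take it with a large grain of salt) is that after subtracting the centroids from both point sets, the rotation that best aligns them in the least squares sense has a closed form. Since my platform only rotates about the vertical axis, that should reduce to a single atan2 over summed cross and dot terms of the ground-plane coordinates, something like:

using System;

// Least squares estimate of the planar rotation (about the vertical axis)
// and translation between two sets of corresponding points. The points are
// projected onto the ground plane: element [0] and [1] of each point are the
// two horizontal coordinates.
static class RigidMotion2D
{
    public static void Estimate(double[][] before, double[][] after,
        out double theta, out double tx, out double tz)
    {
        int n = before.Length;
        double bx = 0, bz = 0, ax = 0, az = 0;
        for (int i = 0; i < n; i++)
        {
            bx += before[i][0]; bz += before[i][1];
            ax += after[i][0];  az += after[i][1];
        }
        bx /= n; bz /= n; ax /= n; az /= n;   // centroids of each set

        // Accumulate the cross and dot terms of the centred points.
        double sCross = 0, sDot = 0;
        for (int i = 0; i < n; i++)
        {
            double px = before[i][0] - bx, pz = before[i][1] - bz;
            double qx = after[i][0] - ax,  qz = after[i][1] - az;
            sCross += px * qz - pz * qx;
            sDot   += px * qx + pz * qz;
        }
        theta = Math.Atan2(sCross, sDot);

        // Translation maps the rotated 'before' centroid onto the 'after' centroid.
        tx = ax - (bx * Math.Cos(theta) - bz * Math.Sin(theta));
        tz = az - (bx * Math.Sin(theta) + bz * Math.Cos(theta));
    }
}

For a full 3D rotation the usual recipe apparently involves an SVD of the point cross-covariance matrix, but I haven't got my head around that yet.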

Ah well, back to reading MathWorld and my perpetual state of confusion :-/

Some productive hacking

Over the weekend I made a pretty decent amount of progress on my vision project. I have created a simulated stereo system by taking images with the camera on the left-hand side, moving the camera, and then taking images of the same scene from the right-hand side. I then load these images from disk simultaneously to simulate the effect of having two cameras. Hopefully I will get my other camera soon, as this solution won't work for long.

I am now at the point where I can match features between the stereo images and extract depth information from the stereo nature of the images. I have also completed the matrix mathematics that will give an estimate of the image coordinates of a given feature in the next frame given the camera translation between frames. This last bit of work was a bit challenging as it required me to relearn a bunch of linear algebra and trigonometry which I hadn’t used since University.

My next challenge is to use least squares minimisation over the features successfully matched between frames to find the error between the projected feature locations and the actual feature locations. This will then feed back into the camera location calculations as a sort of error term. I am thinking I may even use a Kalman filter to integrate the robot odometry and the calculated camera ego motion.
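To be concrete about what I mean by the error term: for each matched feature I want the difference between where the motion estimate says the feature should appear and where it was actually detected, summed over all matches as a squared reprojection error; the minimisation then hunts for the motion parameters that make this number smallest. A minimal sketch (predicted and observed are image coordinates, one [u, v] pair per matched feature):

// Sum of squared differences between predicted and observed image
// coordinates for a set of matched features. This is the quantity the
// least squares minimisation over the camera motion would drive down.
static double ReprojectionError(double[][] predicted, double[][] observed)
{
    double total = 0;
    for (int i = 0; i < predicted.Length; i++)
    {
        double du = predicted[i][0] - observed[i][0];
        double dv = predicted[i][1] - observed[i][1];
        total += du * du + dv * dv;
    }
    return total;
}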

I still have a long way to go before this prototype is finished and I know whether the system will even work or not, but it is pretty exciting to be able to do what I have already accomplished!

Some progress

Well, I've made a little progress tonight. I managed to get the conversion from an OpenCV image (IplImage) to an ImageMap (essentially a 2D array of doubles) working. This means that all my work using libsift so far now works nicely with OpenCV!
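The conversion itself is nothing fancy: walk the IplImage row by row, remembering that each row is widthStep bytes wide (which may include padding beyond the image width), and copy the pixel values into the double array. A rough sketch, assuming a single-channel 8-bit image and that the values get scaled to the 0-1 range (adjust if the library wants raw grey levels):

using System;
using System.Runtime.InteropServices;

// Copies a single-channel 8-bit IplImage into a 2D array of doubles.
// The width, height, widthStep and imageData values come from the
// managed wrapper around the underlying C struct.
static double[,] ToImageMap(int width, int height, int widthStep, IntPtr imageData)
{
    double[,] map = new double[height, width];
    for (int y = 0; y < height; y++)
    {
        for (int x = 0; x < width; x++)
        {
            // Rows are widthStep bytes apart, which may include padding.
            byte pixel = Marshal.ReadByte(imageData, y * widthStep + x);
            map[y, x] = pixel / 255.0;   // scale to 0-1; drop if not wanted
        }
    }
    return map;
}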

I also managed to get a simple C# test app running that continuously grabs frames from my Logitech QuickCam Pro 4000 and displays them to the user. It essentially just calls the OpenCV and highgui functions using P/Invoke, but it's working well.
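Stripped right down, the capture loop is just a handful of P/Invoke calls into the highgui C API; something like the following sketch (again, the DLL name depends on which OpenCV build is installed, so treat it as a placeholder):

using System;
using System.Runtime.InteropServices;

// A bare-bones webcam preview loop using the OpenCV 1.x highgui C API.
static class CaptureLoop
{
    [DllImport("highgui.dll", CallingConvention = CallingConvention.Cdecl)]
    static extern IntPtr cvCreateCameraCapture(int index);

    [DllImport("highgui.dll", CallingConvention = CallingConvention.Cdecl)]
    static extern IntPtr cvQueryFrame(IntPtr capture);

    [DllImport("highgui.dll", CallingConvention = CallingConvention.Cdecl, CharSet = CharSet.Ansi)]
    static extern int cvNamedWindow(string name, int flags);

    [DllImport("highgui.dll", CallingConvention = CallingConvention.Cdecl, CharSet = CharSet.Ansi)]
    static extern void cvShowImage(string name, IntPtr image);

    [DllImport("highgui.dll", CallingConvention = CallingConvention.Cdecl)]
    static extern int cvWaitKey(int delay);

    [DllImport("highgui.dll", CallingConvention = CallingConvention.Cdecl)]
    static extern void cvReleaseCapture(ref IntPtr capture);

    static void Main()
    {
        IntPtr capture = cvCreateCameraCapture(0);   // first camera
        cvNamedWindow("preview", 1);                 // 1 == CV_WINDOW_AUTOSIZE

        // Grab and display frames until a key is pressed.
        while (cvWaitKey(10) < 0)
        {
            IntPtr frame = cvQueryFrame(capture);    // frame is owned by OpenCV
            if (frame != IntPtr.Zero)
                cvShowImage("preview", frame);
        }
        cvReleaseCapture(ref capture);
    }
}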

It’s finally looking like I might start making some progress again!

A little progress

Well, I managed to get the QuickCam displaying the full resolution image using OpenCV. It turns out that it was capturing at 640×480 the whole time; it just wasn't displaying the whole image. A bit of tweaking of the capture properties and everything started working correctly. Specifically, I needed to add these lines:

cvSetCaptureProperty(capture, CV_CAP_PROP_FRAME_WIDTH, 640);
cvSetCaptureProperty(capture, CV_CAP_PROP_FRAME_HEIGHT, 480);
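(On the C# side the corresponding P/Invoke declaration looks roughly like this; it needs System and System.Runtime.InteropServices, the property IDs come from highgui.h, and the DLL name depends on the OpenCV build.)

// cvSetCaptureProperty takes the capture handle, a property id and a double.
// In highgui.h, CV_CAP_PROP_FRAME_WIDTH == 3 and CV_CAP_PROP_FRAME_HEIGHT == 4.
[DllImport("highgui.dll", CallingConvention = CallingConvention.Cdecl)]
static extern int cvSetCaptureProperty(IntPtr capture, int propertyId, double value);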

I was also having a little bit of trouble getting at the raw image data from C#, as in C you access it like this:

pix = ((uchar*)(img->imageData + img->widthStep*y))[x];

I ended up writing a function that looks like this (where imageData is an IntPtr):


/// This method is used to get the value of a single pixel from the
/// underlying C array.
private unsafe byte GetPixel(int x, int y)
{
    // the image data is in one long contiguous array so we need to
    // skip y rows of widthStep values before adding our x value to
    // index to the right x point in the right row.
    return ((byte*)this.Image.imageData + this.Image.widthStep * y)[x];
}

Hey… it works 🙂

I should have paid more attention in class!

So it turns out that all that matrix math I learnt at Uni is actually useful. Unfortunately I can’t remember most of it now so I am going through the very painful process of relearning it all.

I am trying to estimate what the image coordinates of an object will be after I perform a translation and rotation of the camera viewpoint. I think I've got the real-world coordinates of the object relative to the new camera position; now it's just a matter of turning that into an image reference (pixel location).
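Written down, the step I'm after is just the pinhole projection: once the point is expressed in the new camera's coordinate frame, divide by depth and scale by the focal length to get back to pixels. A minimal sketch (the focal length and image centre are in pixels and would come from the camera calibration):

// Pinhole projection of a point already expressed in the camera's
// coordinate frame (X right, Y down, Z forward) into pixel coordinates.
static bool ProjectToImage(double X, double Y, double Z,
    double focalLength, double cx, double cy,
    out double u, out double v)
{
    u = v = 0;
    if (Z <= 0)
        return false; // point is behind the camera

    u = focalLength * X / Z + cx;
    v = focalLength * Y / Z + cy;
    return true;
}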

Once I have this working I should be able to match features between frames. This will then enable me to estimate camera ego motion. Then I’ll be well on my way to my SLAM goal 🙂