Worth your time.
An interesting article at FastCompany about PlaceRaider, an experimental smartphone trojan designed by Indiana University and the U.S. Navy. It’s Android malware designed to build 3-D models of users’ apartments.
PlaceRaider, which was summarized in a recent arXiv paper, is a piece of “visual malware” which uses smartphone cameras, accelerometers, and gyroscopes to reconstruct victims’ rooms and offices. As pictures are uploaded to the central server, they are knitted together into a 3D model of the indoor location where the pics were taken.
Some really great work out of MSR.
Combination Depth Camera/Projector/IMU allows for some really novel interactions.
Worth a look.
A very cool SIGGRAPH ’04 paper on automatic colorization using marked images. A fairly simple algorithm with very impressive results.
One of the authors of this paper is also an author of the recently quite popular ‘Depixelizing Pixel Art’.
Definitely worth a look.
Many people have experimented with using the Kinect for more than just user interaction. One thing that I have been very interested in is extracting point clouds from the device.
People at the ROS (ros.org) project have gone to some trouble to determine empirical calibration parameters for the depth camera (disparity to real-world depth) here, and Nicolas Burrus has posted parameters for the RGB camera (relationship between the depth image and the RGB image) here.
Putting those together, one can take the depth image from the Kinect and turn it into a metric point cloud with real distances. Then, those points can be projected back to the RGB camera centre to determine which RGB pixel corresponds to each depth point, and hence arrive at a colour for each point in the cloud. This lets the surfaces captured in the image appear textured. With a bit of coding I came up with this:
A coloured metric point cloud taken inside my house. (That’s my hand in the foreground.)
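For anyone wanting to reproduce this, the per-pixel conversion is just the two calibration steps above. Here is a minimal Python sketch (the project itself is C#/.NET, so this is illustrative only) using the approximate published values; the constants are empirical and you should substitute your own calibration:

```python
import numpy as np

# Approximate published calibration values: the disparity-to-depth constants
# come from the ROS project's empirical fit, and the depth-camera intrinsics
# are in the ballpark of Nicolas Burrus's published parameters.
FX, FY = 594.21, 591.04   # focal lengths, pixels
CX, CY = 339.3, 242.7     # principal point, pixels

def depth_to_point_cloud(raw_depth):
    """Convert an 11-bit Kinect depth frame (480x640) to an Nx3 metric cloud."""
    v, u = np.indices(raw_depth.shape)                     # pixel coordinates
    z = 1.0 / (raw_depth * -0.0030711016 + 3.3309495161)   # metres (ROS fit)
    valid = (raw_depth < 2047) & (z > 0)                   # 2047 means "no reading"
    x = (u - CX) * z / FX                                  # back-project through
    y = (v - CY) * z / FY                                  # the pinhole model
    return np.column_stack((x[valid], y[valid], z[valid]))
```

Colouring the cloud is then a matter of projecting each (x, y, z) point into the RGB camera using Burrus’s extrinsics and sampling that pixel.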
One thing that I haven’t seen explored much is how the Kinect fares collecting point clouds outside. The claimed max range of the Kinect is 4m, but my device has been able to reach more than 5.5m inside.
Because the Kinect operates using infrared structured-light, infrared interference can reduce the range significantly or even result in no depth image at all. This is a problem when using the device outside as sunlight during the day plays havoc with the depth image returned by the Kinect. Of course, in the dark you will get a great depth image but no RGB image to colour the cloud!
There is a YouTube video posted by some robotics students showing how the sensor operates in sunlight:
Inspired by the video I decided to try it for myself – so I attached a Kinect to my car…
Using the software I had already written, I could capture point clouds with metric distances relative to the Kinect. However, since the Kinect itself was moving, I wanted a different output: a real-world point cloud that spans many depth frames. Collecting all the information needed to reconstruct a spatially located 3D world meant I had to write a bit more software…
To spatially locate my car (and hence the Kinect itself) I used the GPS in my Google Nexus One along with BlueNMEA. This allowed me to get NMEA strings from the GPS in the phone via a TCP connection and log them. Using that information I could locate each depth frame and image frame and build a point cloud in a real-world coordinate system (so every point has the equivalent of a latitude, longitude, and altitude).
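For the curious, extracting a position from the logged stream mostly means parsing `$GPGGA` sentences, which carry the fix in `ddmm.mmmm` format plus the altitude. A quick Python sketch (my logger was .NET, so this is illustrative only):

```python
def parse_gga(sentence):
    """Extract (lat, lon, alt) in degrees/metres from a $GPGGA NMEA sentence."""
    fields = sentence.split(',')
    if not fields[0].endswith('GGA'):
        return None
    # Latitude arrives as ddmm.mmmm, longitude as dddmm.mmmm
    lat = float(fields[2][:2]) + float(fields[2][2:]) / 60.0
    if fields[3] == 'S':
        lat = -lat
    lon = float(fields[4][:3]) + float(fields[4][3:]) / 60.0
    if fields[5] == 'W':
        lon = -lon
    alt = float(fields[9])   # antenna altitude above mean sea level
    return lat, lon, alt
```

Each depth frame is then timestamped against the most recent fix.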
My software talks to the Kinect and Phone in real-time and logs all the data needed to export a point cloud. I wrote an exporter for the PLY format so I could easily view the data in the awesome open source MeshLab.
In the end I was able to capture some pretty cool looking things like this nice white picket fence:
Combining a sequence of depth frames gives an idea of the power of the method. Here is a 26 m section of road captured while travelling at speed:
These points are all in real-world coordinates and could be placed, say, on Google Earth and would appear in the right place. The point cloud is a bit messy because I did not have easy access to gyroscopes or accelerometers to track the motion of the car. Perhaps this is a good excuse to purchase a Nexus S! I did not bother to access the accelerometer in the Nexus One because the phone doesn’t have a gyro, so its outputs are of limited use for dead reckoning.
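The translation-only merge (no heading, which is part of why the cloud is messy) is simple to sketch: turn each GPS fix into metres from an origin fix with a flat-earth approximation, then shift that frame’s points. A hypothetical Python version, where each frame is a `(points, (lat, lon, alt))` pair — the real code uses Proj.NET for the spatial transforms:

```python
import numpy as np

EARTH_R = 6378137.0  # WGS84 equatorial radius, metres

def gps_to_local(lat, lon, alt, origin):
    """Approximate a GPS fix as metres east/north/up of an origin fix
    (flat-earth approximation, fine over a short stretch of road)."""
    lat0, lon0, alt0 = origin
    east = np.radians(lon - lon0) * EARTH_R * np.cos(np.radians(lat0))
    north = np.radians(lat - lat0) * EARTH_R
    return np.array([east, north, alt - alt0])

def accumulate(frames):
    """Shift each frame's sensor-relative points by the car's position at
    capture time and merge them into one world-frame cloud. Note: this
    ignores the car's heading, so turns will smear the cloud."""
    origin = frames[0][1]
    return np.vstack([pts + gps_to_local(*fix, origin) for pts, fix in frames])
```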
The project uses libfreenect and the .NET wrapper for same, along with OpenTK for the Matrix maths and Proj.NET for the spatial transforms. All amazing libraries and I’m in awe of the developers who spend their time maintaining them.
The code will live on GitHub here. It’s very hacky and mostly useful as a proof-of-concept. If you’re going to do anything with it, please wear a helmet.
Update #1: Reaction to Slashdotting
Trimensional is a new app available on the iTunes store for 3D Scanning.
You can see a video of the app here:
It works only on the iPhone 4, using the front-facing camera, and requires you to turn the lights off and set the screen brightness to maximum.
That might give you a clue…
Yes – it’s our old friend structured light!
(For more information on structured light 3D scanning, check out my post on the Kinect).
I don’t have an iPhone 4 so I can’t try it myself, but I have done my own 3D scanning using structured light.
The algorithm is relatively simple to implement if you want to do it yourself (if I find time to clean up my own C# implementation I will post it on GitHub for the curious), but if coding is not your thing you can do it yourself with a camera and a projector using Kyle McDonald’s processing implementation.
There is a good instructable that will show you how to do 3D scans using the three phase technique yourself step by step.
If you want to take your scanning to the next level the great folks at MakerBot Industries sell a nice kit containing laser cut wooden camera and projector mounts that will allow you to get calibrated 3D scans.
It’s a really great idea for an app. I wonder if anyone is working on an implementation for Android (don’t look at me…my Nexus One doesn’t have a front facing camera).
At E3 2009 Microsoft showed a prototype video of ‘Project Natal’, their next-generation controller technology. It’s pretty cool:
Early speculation was that Natal used a time-of-flight (TOF) camera. This excited a lot of hobbyists (including myself!), as TOF cameras are unique in their ability to capture 3D images from a single sensor, removing the need for the multiple sensors, complicated calibration, and software otherwise required to capture a 3D scene. The prospect of one being included with a videogame accessory meant that there would be an inexpensive option for people wanting to play with a 3D camera at home.
There is a nice debunking of this here, where the blogger shows some videos that demonstrate the technique used. Structured Light 3D scanning is not something introduced by the Kinect, but it’s definitely the most impressive implementation I’ve ever seen of the technique. I’ll explain a bit more about how structured light scanning works in the following paragraphs.
In conventional stereo vision, two cameras are placed in different locations both looking at the same scene. A point-matching algorithm is used to identify identical points in the images resulting from both cameras. The distance between the cameras and the location of the matched pixels in each image can be used to triangulate the depth of the object at each pixel location.
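For rectified cameras (image rows aligned), that triangulation collapses to a single well-known relation: depth is focal length times baseline divided by disparity. A toy illustration:

```python
def stereo_depth(focal_px, baseline_m, disparity_px):
    """Depth of a matched point for a rectified stereo pair: z = f * B / d."""
    return focal_px * baseline_m / disparity_px

# A 10-pixel disparity with a 600 px focal length and a 10 cm baseline puts
# the point 6 m away; halving the disparity doubles the depth, which is why
# depth resolution degrades with range.
```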
The basic idea of structured light scanning is instead of using two cameras and a point-matching (correspondence) algorithm, we use one camera and one projector. If you could project a unique colour on to every pixel column of the scene that the camera sees and then pick those colours in the resultant image, you have a virtual correspondence between two ‘cameras’ in different locations. If this is difficult to understand there is an alternate technique (using a moving pattern) using a similar concept that is pretty intuitive and can be seen in this video:
This is not how the Kinect works but hopefully it paints a picture of how encoding information in the projected image can help us retrieve depth information from the single camera.
If you want to try this out for yourself, it is possible to encode all the information necessary to reconstruct the scene in just three images, so that we don’t have to use a moving pattern.
If you want to do this (with your own webcam and projector) there is a great Processing implementation by Kyle McDonald here.
This implementation is based on a Three Phase technique developed by Song Zhang for his PhD thesis. If you’re interested in developing your own implementation, the paper you want to look at is probably S. Zhang, “Recent progresses on real-time 3-D shape measurement using digital fringe projection techniques”.
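The heart of the three-phase technique is compact enough to show: project three fringe patterns shifted by 120°, and the wrapped phase at each pixel falls out of an arctangent. A Python sketch of the standard three-step relation (the full method also needs phase unwrapping, omitted here):

```python
import numpy as np

def wrapped_phase(i1, i2, i3):
    """Wrapped phase from three fringe images offset by 120 degrees, assuming
    the intensity model I_k = A + B * cos(phi + (k - 2) * 2*pi/3).
    This is the standard three-step phase-shifting relation."""
    return np.arctan2(np.sqrt(3.0) * (i1 - i3), 2.0 * i2 - i1 - i3)
```

The recovered phase encodes which projector column lit each pixel, giving the camera–projector correspondence needed for triangulation.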
The Kinect proves that real-time structured light scanners can be made compact and performant, and gives TOF cameras a run for their money. TOF or not, the Kinect is still the most inexpensive 3D camera available, and it does structured light so well that it doesn’t really matter.
The Kinect has several advantages over a home-made 3D scanner:
Almost as soon as the Kinect was released an effort to develop drivers to repurpose the device was underway. People from all over the world collaborated and within a few days there were already functioning open source drivers. The awesome OpenKinect people have developed the libfreenect library which has become the standard library for Kinect hacking. There are drivers available for the 3 major operating systems along with wrappers for several programming languages.
One look at Engadget is proof enough that the Kinect Hacks community is prolific and vibrant. So many creative and talented people have made use of the amazing potential of the device in different ways.
One of the reasons a 3D camera is nice to use as an input device is that it makes it easier to tackle a tricky problem in computer vision: background subtraction (i.e. tell which pixels are in the foreground and which are in the background). With that problem solved, things like hand tracking and pose estimation become easier.
If you’re interested in extracting real-world coordinates from the Kinect, you probably want to look at Matthew Fisher’s site where he posts example code for the transform including some empirical calibration factors. Depending on what wrapper you use those transforms may be transparent when you use the wrapper API to get the depth information from the device.
I bought a Kinect on release day but until recently I had only connected it to my computer a couple of times to see how the driver development was progressing. Yesterday I started hacking up a quick implementation of the classic ‘Pong’ game, but controlled by the hands of two players.
Here’s a video of an early prototype:
The video is very jerky, but that only happens when something is capturing the OpenGL window.
The source is available on GitHub here.
The ‘Pong’ game is setup as a single Box2D world with the ball, paddles and walls all as Box2D bodies. Body movement and collision detection is all handled by the physics engine with the bodies rendered using openFrameworks.
The goals are also Box2D bodies that aren’t rendered to screen, and on collision with the goals the score is incremented and a new game starts.
The controls take the depth image from the Kinect and threshold it between a near and far plane, leaving only items that are a configurable distance from the sensor appearing in the image.
The depth image is then processed to find contours that are in a certain size range, and those contours are tested for their curvature to try to determine if they contain fingers. If they do, they are chosen as a ‘control blob’ (hand) and depending on what half of the Kinect’s vision they are in, assigned to a player’s paddle.
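The thresholding step is the easy part; the contour and curvature analysis is where the real work is. Here is a stripped-down Python sketch of just the near/far thresholding and the per-half “control blob” assignment, using a simple centroid in place of the contour-and-curvature test (the function and plane values are illustrative, not the game’s actual code):

```python
import numpy as np

def paddle_positions(depth_mm, near=600, far=900):
    """Keep only pixels between the near and far planes, then take the
    centroid of the surviving pixels on each half of the image as a rough
    stand-in for each player's hand position. Returns a vertical coordinate
    per player, or None if nothing is in the control zone on that side."""
    mask = (depth_mm > near) & (depth_mm < far)   # items in the control zone
    h, w = mask.shape
    positions = []
    for half in (mask[:, : w // 2], mask[:, w // 2 :]):  # left / right player
        ys, xs = np.nonzero(half)
        positions.append(ys.mean() if ys.size else None)
    return positions
```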
(All trademarks and registered trademarks are the property of their respective owners.)