The easiest solution to the localization problem is dead reckoning: if we know how fast and for how long we have been moving, we can estimate approximately how far we are from our starting position. Alternatively, we can count the number and length of our steps, or the number of wheel turns of our vehicle, to do the same. Indeed, ships used to navigate this way: their speed was estimated from how long a knot in a rope dropped from the bow of the vessel took to reach the stern (1), and, together with the heading, new positions were estimated from how long that speed had been kept. Naturally, odometry only works for a while, because errors accumulate without bound. If no external reference is used to correct the position from time to time, the vehicle eventually gets lost. One could try, for example, to precompute the path to a spot 10 m away and then walk there blindfolded. Chances are one would end up fairly far from the intended destination, especially if turns are involved, or, at worst, bump into an unexpected obstacle. To avoid this problem we can simply open our eyes, which, in the robot's case, means combining different information sources with methods such as Kalman filters or Monte Carlo techniques to remove, or at least reduce, uncertainty.
Ok, so maybe dead reckoning was not such a good idea...
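The unbounded drift of dead reckoning, and how an occasional external reference bounds it, can be illustrated with a one-dimensional sketch. It assumes a robot moving along a line at constant speed, odometry that reports that speed with Gaussian noise, and a hypothetical absolute position fix (for instance from a beacon) every few steps. The correction is the textbook scalar Kalman filter, not the authors' implementation, and all function names and noise values are illustrative assumptions.

```python
import random


def predict(x, p, u, dt, q):
    """Dead-reckoning step: integrate the measured speed u over dt.
    Without corrections the variance p grows by q on every call."""
    return x + u * dt, p + q


def correct(x, p, z, r):
    """Scalar Kalman update with an absolute position fix z of variance r."""
    k = p / (p + r)  # gain: 0 = ignore the fix, 1 = trust it completely
    return x + k * (z - x), (1.0 - k) * p


def simulate(steps, dt=0.1, true_speed=1.0, odo_std=0.05,
             fix_every=None, fix_std=0.1, seed=0):
    """Return (true position, estimate, estimate variance) after `steps`.
    Noise levels and the fix period are hypothetical example values."""
    rng = random.Random(seed)
    q = (odo_std * dt) ** 2  # position variance added per step by speed noise
    r = fix_std ** 2
    true_x, x, p = 0.0, 0.0, 0.0
    for k in range(1, steps + 1):
        true_x += true_speed * dt
        measured_speed = true_speed + rng.gauss(0.0, odo_std)  # noisy odometry
        x, p = predict(x, p, measured_speed, dt, q)
        if fix_every and k % fix_every == 0:
            z = true_x + rng.gauss(0.0, fix_std)  # hypothetical external fix
            x, p = correct(x, p, z, r)
    return true_x, x, p
```

With the default values, `simulate(1000)` ends with a variance that has grown linearly with the number of steps (roughly 0.025), whereas `simulate(1000, fix_every=10)` keeps it below the variance of the fix itself. A Monte Carlo (particle) filter would replace the single Gaussian with a set of weighted samples, which also copes with multimodal beliefs.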
Of course, active beacons might not be available; GPS, for example, cannot be used indoors. In that case, robots can rely on known features of the environment (natural or artificial landmarks) to estimate where they are, in the same way we might spot a metro station and find it on a map to work out where we are. In these cases, cameras are typically used not as range sensors but to detect the expected landmarks within their field of view (e.g. Urdiales et al., 2009).
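When a camera only reports the direction in which each known landmark appears, the position can be recovered by triangulation. The sketch below assumes that the robot's heading is known (e.g. from a compass), that bearings are measured counter-clockwise relative to that heading, and that each detection has already been matched to its map coordinates. The function name is hypothetical, and this least-squares formulation is one standard option, not the method of the cited work.

```python
import math


def locate_from_bearings(landmarks, bearings, heading):
    """Least-squares robot position from bearings to known landmarks.

    landmarks: list of (x, y) map positions of the recognised landmarks
    bearings:  angle (rad) at which each landmark is seen, relative to the
               robot's heading, counter-clockwise positive
    heading:   robot heading in the map frame (rad), assumed known
    """
    # Each observation says the landmark lies on the line from the robot at
    # absolute angle phi = heading + bearing, which is linear in (x, y):
    #   x*sin(phi) - y*cos(phi) = lx*sin(phi) - ly*cos(phi)
    sxx = sxy = syy = bx = by = 0.0
    for (lx, ly), beta in zip(landmarks, bearings):
        phi = heading + beta
        a1, a2 = math.sin(phi), -math.cos(phi)
        b = lx * a1 + ly * a2
        # Accumulate the normal equations (A^T A) p = A^T b.
        sxx += a1 * a1
        sxy += a1 * a2
        syy += a2 * a2
        bx += a1 * b
        by += a2 * b
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        raise ValueError("bearings are (nearly) parallel; position is ambiguous")
    # Solve the 2x2 system by Cramer's rule.
    x = (bx * syy - sxy * by) / det
    y = (sxx * by - sxy * bx) / det
    return x, y
```

For example, a robot at (2, 1) that sees landmarks at (5, 1), (2, 4) and (0, 0) recovers its position up to rounding when given exact bearings; with noisy bearings, extra landmarks average out part of the error. Two landmarks suffice in principle, as long as they are not aligned with the robot.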
Omnidirectional cameras (omnicameras) are a special case: their optics are designed so that all points in the world are projected through a single center of projection. They capture a 360° field of view in a single frame, from which fairly reliable landmarks can usually be extracted (see Keith Price's bibliography on the subject).
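One reason omnicameras suit landmark-based localization is that, for a rotationally symmetric central camera whose mirror axis is aligned with the camera's optical axis, the azimuth of a scene point equals the angle of its image point around the image center, up to a fixed offset and possibly a reflection that depend on how the camera is mounted. The sketch assumes the image center (cu, cv) and those two calibration parameters are known; the function and parameter names are hypothetical.

```python
import math


def omni_pixel_to_bearing(u, v, cu, cv, offset=0.0, mirrored=False):
    """Azimuth (rad, wrapped to [-pi, pi)) of the scene point imaged at (u, v).

    cu, cv:   image center, assumed known from calibration
    offset:   hypothetical angle between the image axis and the robot's heading
    mirrored: hypothetical flag for setups where the mirror flips handedness
    """
    # Image rows grow downwards, so negate (v - cv) to obtain a conventional
    # counter-clockwise angle around the image center.
    angle = math.atan2(-(v - cv), u - cu)
    if mirrored:
        angle = -angle
    angle += offset
    # Wrap the result to [-pi, pi).
    return (angle + math.pi) % (2 * math.pi) - math.pi
```

Applied to every landmark detected in a single frame, this yields bearings all around the robot, which is the input a bearing-based triangulation needs.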
If cameras are not available or are not to be used, geometric landmarks can be detected instead. These are significant, distinctive locations that can be detected with range sensors, such as a corridor with two open doors on each side. This is equivalent to working out where we are by touching the walls when the lights go out. However, similar landmarks may exist in different areas of the environment, so perceptions must be disambiguated by tracking the robot's position while several landmarks are observed in a row, using a statistical method such as a Markov model (e.g. Baltzakis, 2003; Fox, 1998).
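The disambiguation described above can be sketched as a minimal grid (histogram) version of Markov localization in a one-dimensional corridor. The map, the sensor probabilities `p_hit`/`p_miss` and the motion probabilities are illustrative assumptions, the corridor is treated as cyclic only to keep the code short, and this is not necessarily the formulation used in the cited works.

```python
def normalize(belief):
    """Rescale the belief so that it sums to one."""
    total = sum(belief)
    return [b / total for b in belief]


def sense(belief, world, observed, p_hit=0.8, p_miss=0.2):
    """Bayes update: weight each cell by how well it explains the observation."""
    return normalize([b * (p_hit if cell == observed else p_miss)
                      for b, cell in zip(belief, world)])


def move(belief, step, p_exact=0.8, p_under=0.1, p_over=0.1):
    """Prediction: shift the belief by `step` cells, allowing the robot to
    undershoot or overshoot by one cell (cyclic corridor for simplicity)."""
    n = len(belief)
    return [p_exact * belief[(i - step) % n]
            + p_under * belief[(i - step + 1) % n]
            + p_over * belief[(i - step - 1) % n]
            for i in range(n)]
```

For a hypothetical corridor `["door", "wall", "wall", "door", "door", "wall", "wall", "door", "wall", "wall"]`, starting from a uniform belief and sensing a door leaves the four door cells equally likely. Moving one cell at a time and sensing a door and then a wall singles out cell 5 when motion is exact (`p_exact=1.0`, `p_under=p_over=0.0`), since it is the only position consistent with the whole sequence; noisy motion spreads the belief but follows the same logic.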
In the absence of a model of the environment, the robot must correct its position, store the detected landmarks and build a new model, all at the same time. Think, for example, of trying to draw a map of a city we have never visited by marking every distinctive building we see and guessing the distance and relative position between each pair of them. This problem is known as SLAM (Simultaneous Localization and Mapping). SLAM is a complex research field in its own right and is beyond the scope of our work.
(1) In fact, the idea was to sing a song and write down in the logbook the word at which it had stopped. One can only hope that the crew was able to keep the rhythm.