One of the most time-consuming tasks when I was working on the Ushahidi platform in Chile was the geo-location of events. This is a critical issue in emergency situations because knowing the location of people in need is crucial for the delivery of relief aid. The problem of geo-locating information in emergencies is tightly bound to the need for real-time information, which leads to a dilemma: time is crucial, but so is the accuracy of the geographic location.
Geo-locating a report manually takes time, and the more time you need to find the exact location, the more you delay the delivery of relief aid, which can sometimes make the difference between life and death. The Haiti Ushahidi platform and its use to rescue people trapped under buildings is a good example of this dynamic. The platform was successful because hundreds of volunteers were working on the manual geo-location of information in near real time. The problem here is evident: you don’t necessarily have hundreds of volunteers working on geo-location, and Haiti was an exception, not the norm.
I was recently asked to look at a mapping website related to health issues, namely Health Map, which mines online news media and public-health sources to monitor and visualize disease outbreak alerts in real time.
The similarities between this system and Ushahidi are numerous: both deal with emergencies, both require as much accuracy as possible as well as real-time information, and both collect information from different sources such as SMS, e-mail, media and web submissions.
While trying to find out more about this platform, I read a very interesting article published in November 2009: “Automated vocabulary discovery for geo-parsing online epidemic intelligence,” written by Keller, Freifeld and Brownstein.
In the article the authors address a familiar problem: the “critical task of the last phase of the information processing scheme: the geographic parsing (“geo-parsing“) of a disease outbreak alert or the extraction from one such textual document of its related geographic information.”
The similarities with the information we dealt with on the Ushahidi platform become even more evident when the authors of the article note that “each alert is uniquely characterized by its disease category, a set period in time and its precise geographic location.”
Of course, the Ushahidi SIPA Team, like the Fletcher Team, had volunteers dealing with these issues, who categorized, mapped and time-stamped the information, but Health Map doesn’t have a pool of volunteers doing this. For this reason, the geo-location of information on Health Map was done by assigning each incoming alert “to a low resolution geographic description such as its country, and in some cases its immediately lower geographic designation.” But of course, this was not enough.
So the Health Map Team worked on a different system, which I find extremely interesting. They created a “specially crafted gazetteer, which was built incrementally by adding relevant geographic phrases extracted from the specific kind of news report intended for mapping,” which was then used in a “look-up tree algorithm which tries to find a match between the sequences of words in the alert and the sequences of words in the entries of the gazetteer. The system also implement a set of rules which use the position of the phrase in the alert to decide whether or not the phrase is related to the reported diseases.”
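To make the look-up-tree idea concrete, here is a minimal sketch of gazetteer matching with a word-level trie. The tiny hand-made gazetteer and the function names are my own illustrative stand-ins; HealthMap’s real gazetteer and positional rules are far richer than this.

```python
def build_trie(gazetteer):
    """Build a nested-dict trie keyed by the lowercase words of each phrase."""
    root = {}
    for phrase in gazetteer:
        node = root
        for word in phrase.lower().split():
            node = node.setdefault(word, {})
        node["$end"] = phrase  # mark the end of a complete gazetteer entry
    return root

def find_locations(text, trie):
    """Scan the alert text, returning the longest gazetteer matches found."""
    words = text.lower().replace(",", "").split()
    matches = []
    i = 0
    while i < len(words):
        node, j, last = trie, i, None
        # Walk the trie as long as the next word continues a gazetteer phrase.
        while j < len(words) and words[j] in node:
            node = node[words[j]]
            j += 1
            if "$end" in node:
                last = (node["$end"], j)  # remember the longest match so far
        if last:
            matches.append(last[0])
            i = last[1]  # resume scanning after the matched phrase
        else:
            i += 1
    return matches

trie = build_trie(["Haiti", "Santiago de Chile", "Port-au-Prince"])
print(find_locations("Outbreak reported in Santiago de Chile and rural Haiti", trie))
```

Because the trie is keyed word by word, a single left-to-right scan finds multi-word place names without re-checking every gazetteer entry against every position in the text.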
Even more interesting here is the learning process of the algorithm, related to a simulation of a real reader, “based on the intuition that a human reader presented with a text containing a phrase that is out of his vocabulary would most likely be able to guess whether this phrase refers to a geographic location or not. This reader would infer the semantic role of the phrase with a certain accuracy, because he has prior knowledge of the syntactic context in which geographic references appear, maybe also of their particular character distribution or the fact that they generally begin with a capital letter, etc.”
In short (and a bit more comprehensibly), the machine learns a process by simulating a very specific human task and extracting the general rules behind it, which are then turned into a generic scheme that allows automatic application in a more general environment.
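A toy illustration of that “simulated reader” intuition: guess whether an unknown phrase is a geographic reference from surface cues such as capitalization and the surrounding context words. The cue list, scoring and threshold below are invented stand-ins for illustration, not the features or weights the authors actually learned.

```python
import re

# Prepositions that often introduce a place name ("in Haiti", "near Santiago").
LOCATION_CUES = {"in", "near", "at", "from", "across", "outside"}

def looks_geographic(phrase, preceding_word):
    """Score a phrase on simple surface cues a human reader might use."""
    score = 0
    # Geographic references generally begin with a capital letter.
    if all(word[0].isupper() for word in phrase.split()):
        score += 1
    # A preceding preposition like "in" or "near" suggests a location.
    if preceding_word.lower() in LOCATION_CUES:
        score += 1
    # Digits rarely appear inside place names.
    if re.search(r"\d", phrase):
        score -= 2
    return score >= 2

print(looks_geographic("Ouanaminthe", "in"))   # an out-of-vocabulary place name
print(looks_geographic("cholera", "the"))      # a disease term, not a place
```

Even these crude cues let the system make a reasonable guess about a phrase that is out of its vocabulary, which is exactly the behavior the authors wanted to reproduce when growing the gazetteer automatically.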
The Health Map approach can indeed inform further studies on the application of learning algorithms to the automatic geo-location of information. As the authors explain in their conclusions: “The results of this analysis provide a framework for future automated global surveillance that reduce manual efforts and improve timeliness of reporting. Ultimately, the automated content analysis of news media and other nontraditional sources of surveillance data can facilitate early warning of emerging disease threats and improve timeliness of response and intervention.”
I think that experiments like this open up exciting perspectives for Crisis Mappers, but also that automated tasks performed by machines raise more questions about the necessary role of the human component in dealing with emergencies and with sensitive issues like disease outbreaks. The dilemma is still there: Time vs. Accuracy is not going to be an easy battle!