June 04, 2003
towards a london data garden
the problem
Collaborative mapping is a fine idea - but how (or where) do you start? One person cannot map an area large enough to be useful to anyone other than themselves. People cannot collaborate without some kind of structure to organise activity, prevent duplication, and cross-check data quality.
chunking
London is big. Really, really big. Every one of us is small. Really, really small. How can one person contribute effort to a larger system? Simply letting people record whatever they want will cause data gluts in some areas and paucity in others. People will not contribute if they feel their effort is wasted, merely repeating work others have already completed. Also, the effort required to contribute must be small enough to carry out almost on a whim.
The solution to our problem is to chunk geographically into arbitrary units of data that, joined together, form a useful whole. This raises another problem - how do you chunk geographically when you have no map to start with? Like compiling a compiler, you have to start somewhere. Whilst actual map data is copyright, metadata about the maps is not (IANAL), especially when used in abstract form. We have a good, universal map to start with - the A-Z (other countries and cities may have comparable universal maps, or may get around the situation another way - grid cities could use streets as bounding entities for blocks).
In particular, the pink/red central pages offer squares that are very geographically manageable (less than 0.5 km x 0.5 km, I think - I haven't got an A-Z to hand). The squares are referenced by page number, a number and a letter (plus an edition number - squares in each edition cover the same geographic area, but some manipulation of page/square co-ordinates will be needed). The idea is that each participant only has to cover the area in one square for each data collection project. Rather than collecting all the data wanted at the same time, the task is chunked into small, separate pieces of useful data. Each piece should take 2-3 hours at most.
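To make the chunking concrete, here is a minimal sketch of how a square reference might be turned into one stable key, so the same patch of ground gets the same id whichever edition of the A-Z it was read from. It assumes, purely for illustration, that editions differ only by a constant page-number shift; the class, field names and shift values are all hypothetical, and the real page/square manipulation would have to be worked out from the books themselves.

```python
from dataclasses import dataclass


# Hypothetical A-Z grid reference. The field names are illustrative only.
@dataclass(frozen=True)
class SquareRef:
    edition: int  # A-Z edition the reference was read from
    page: int     # page number within that edition
    number: int   # grid number on the page
    letter: str   # grid letter on the page


# ASSUMPTION: editions differ only by a constant page-number shift relative
# to a chosen reference edition. These values are made up.
REFERENCE_EDITION = 1
PAGE_SHIFT = {1: 0, 2: 0, 3: 2}


def canonical(ref: SquareRef) -> SquareRef:
    """Translate a reference from any edition into the reference edition."""
    shift = PAGE_SHIFT[ref.edition]
    return SquareRef(REFERENCE_EDITION, ref.page - shift, ref.number, ref.letter.upper())


def square_id(ref: SquareRef) -> str:
    """Stable key for a square, e.g. '62-2C', whatever edition it came from."""
    c = canonical(ref)
    return f"{c.page}-{c.number}{c.letter}"
```

With these made-up shifts, square_id(SquareRef(3, 64, 2, "c")) and square_id(SquareRef(1, 62, 2, "C")) both come out as "62-2C", which is the property every later project relies on.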
The initial data collection would consist of street names, street numbers and road intersections. Other information would be gathered in additional data collection projects. These could be:
GPS readings of roads/boundaries
public transport nodes (bus stops, tube, rail, river, taxi ranks)
postcodes
photos of every building
wireless LAN points (and open access)
mobile phone cell IDs
information for openguides (pubs, restaurants etc.)
Each project is separate, but based around the same grid system, so data from different projects can be interrelated.
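A rough sketch of why a shared grid makes projects interrelatable: if every record, whatever the project, carries the id of the square it was collected in, then a per-square view across projects is just a grouping on that id. The record layout, the square ids and the street and stop names below are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical records from two separate collection projects. Each record
# carries the (made-up) id of the square it was collected in; everything
# else is project-specific.
streets = [
    {"square": "62-2C", "street": "Example Street"},
    {"square": "62-2C", "street": "Sample Road"},
    {"square": "62-3A", "street": "Placeholder Lane"},
]
bus_stops = [
    {"square": "62-2C", "stop": "Stop A"},
]


def by_square(*projects):
    """Merge any number of (name, records) projects into one view per square."""
    view = defaultdict(lambda: defaultdict(list))
    for name, records in projects:
        for rec in records:
            view[rec["square"]][name].append(rec)
    return view


combined = by_square(("streets", streets), ("bus_stops", bus_stops))
```

Here combined["62-2C"] holds both the streets and the bus stop for that square, while combined["62-3A"] only has streets so far, which is also a handy way of spotting which projects still need planters in a square.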
It is very important that this information is captured on-the-street, or by word-of-mouth (e.g. asking people for their postcodes). This information *must* not be copied from existing maps or databases.
organisation
To ensure data quality, there has to be some level of organisation. The data set is only as useful as its least accurate part. I suggest devolved local power for most decisions, with a big stick for those that cause trouble.
My analogy is a garden, like Kew or Hampton Court. They cover a large area, require lots of care and attention (continually), and lots of different skills. The whole is better than the parts, but the parts are pretty damn fine too.
- gardeners
Gardeners are in charge of a particular square on the map. They input the initial data about the square and are responsible for checking new data about it. Gardeners arbitrate any disputes over data quality. There is one gardener per square, but people may garden more than one square. However, if they do not keep their work up to date, they may lose gardening privileges, and new gardeners would be found.
- planters
Planters provide the majority of the information in the system. Each new project is open to be taken up in every square - different projects may require different skills or technology, such as a GPS or a digital camera. All data input by planters is checked by the gardener before it becomes available for collecting. Gardeners may plant in their own square and in any other; planters can provide information in any square.
- weeders
Any information present in the system can be questioned, disputed, or replaced by a weeder. Weeders do not need to plant or garden. The gardener of the square where data is disputed decides whether the data needs to be replaced, re-collected, or destroyed.
- head gardener
If there are big disputes between a gardener and a planter or weeder, the data will be referred to a head gardener. They may get independent corroboration of data quality. Head gardener decisions are final.
- visitors
Visitors do nothing but look at the data. They can register to receive updates if information in a particular square changes.
- collectors
Whilst most of the effort has been expended on the data entry system, what is actually most important is that the data can be reused in other projects, and that adequate data exchange facilities are available. Collectors have access to the data through APIs, and also to the raw databases themselves. They can download a snapshot of the entire database. They can also propose additional projects, provided they write the code and db table to handle them and there is a general consensus that It Is A Good Thing.
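The roles above imply a small workflow for each piece of data. One way to sketch it is as a state machine: a planter's data waits for the square's gardener, a weeder can dispute published data, the gardener decides whether to keep, replace, re-collect or destroy it, and a planter or weeder who disagrees can escalate to the head gardener. The state names, role strings and action names are assumptions made up for this sketch, not a worked-out design.

```python
from enum import Enum


class State(Enum):
    PENDING = "pending"        # planted, waiting for the square's gardener
    PUBLISHED = "published"    # checked, visible to visitors and collectors
    DISPUTED = "disputed"      # questioned by a weeder
    ESCALATED = "escalated"    # referred to the head gardener
    REPLACED = "replaced"      # superseded by newer data
    RECOLLECT = "recollect"    # to be gathered again on the street
    DESTROYED = "destroyed"


# Outcomes available when a dispute is settled (hypothetical action names).
_DECISIONS = {"keep": State.PUBLISHED, "replace": State.REPLACED,
              "recollect": State.RECOLLECT, "destroy": State.DESTROYED}

# (state, role, action) -> next state
TRANSITIONS = {
    (State.PENDING, "gardener", "approve"): State.PUBLISHED,
    (State.PENDING, "gardener", "reject"): State.DESTROYED,
    (State.PUBLISHED, "weeder", "dispute"): State.DISPUTED,
    (State.DISPUTED, "planter", "escalate"): State.ESCALATED,
    (State.DISPUTED, "weeder", "escalate"): State.ESCALATED,
}
# The gardener settles ordinary disputes; only the head gardener settles escalated ones.
for action, outcome in _DECISIONS.items():
    TRANSITIONS[(State.DISPUTED, "gardener", action)] = outcome
    TRANSITIONS[(State.ESCALATED, "head gardener", action)] = outcome


def act(state, role, action, actor, square_gardener):
    """Apply one action; only the square's own gardener may act as gardener."""
    if role == "gardener" and actor != square_gardener:
        raise PermissionError(f"{actor} does not garden this square")
    try:
        return TRANSITIONS[(state, role, action)]
    except KeyError:
        raise ValueError(f"a {role} cannot {action} {state.value} data") from None
```

Keeping the rules in a single table has the nice side effect that proposing a new role or action (say, from a collector) is a data change rather than a code change.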
data quality/refresh
For each project, a period of time is set after which the information has to be collected again (and a period after which the data should be thrown away). The re-collection tasks would appear on the square's page as they fall due, and the gardener and previous planters/visitors would be informed.
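The refresh rule boils down to comparing the age of a square's data against two per-project periods. Below is a minimal sketch; the Project fields, the status strings and the 90/365-day periods for wireless LAN points are invented for illustration, not proposed values.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Project:
    name: str
    refresh_after: timedelta  # age at which the data should be collected again
    discard_after: timedelta  # age at which the data should be thrown away


def status(project: Project, collected_on: date, today: date) -> str:
    """Classify one square's data for a project by its age."""
    age = today - collected_on
    if age >= project.discard_after:
        return "discard"
    if age >= project.refresh_after:
        return "due for refresh"
    return "current"


def who_to_notify(gardener: str, planters: list[str], watchers: list[str]) -> set[str]:
    """The gardener, previous planters and subscribed visitors, without duplicates."""
    return {gardener, *planters, *watchers}


# Made-up periods: wireless LAN points change often, so refresh them quickly.
wifi = Project("wireless LAN points", timedelta(days=90), timedelta(days=365))
```

For example, wireless data collected on 4 June 2003 would still be "current" on 1 July 2003, "due for refresh" by 1 October 2003, and due to be discarded a year after collection.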
ownership of data
(this bit is controversial, and I could do with feedback)
Data is given to the garden through a with-attribution licence. This means that the planter and gardener are shown on pages containing data they produced.
Other people/projects can use the data in the garden under a non-commercial/with-attribution/with-similar-rights licence (i.e. other projects have to credit the data garden project, and cannot pass the data on to a paid app under different terms).
There will be times and places where this data would be useful in commercial applications (pay-for sites and software). I kind of feel this should be paid for, with money divided between those that have provided the data (gardeners and planters). I want to avoid a CDDB-style calamity though.
why would someone participate?
I don't have a complete answer to this. Some people just will (I know I would). Others may be forced to - this would make some nice cross-curricular school projects (*prods Jo about Nesta*). It would be nice to offer something to those that participate - some community stuff, message boards/mailing lists - but it would be great if projects that use the data could offer some extra functionality/beta testing to participants. I've also thought about league tables of those who participate most. Nothing like stroking egos to get things done. The potential monetary remuneration may also be compelling, but I don't want people to participate because of that.
changing scale
Another worry is what to do when all of central London is mapped. Sure, there's always more data to collect. But how do we move on to the blue-square pages? I don't think there's any way to correlate the two... start the blue map as a completely separate entity?
next steps
God knows. It's just an idea. Please feedback. I'd love to program it, but my skills are very rusty. I have drawn out pretty detailed screen flows and wireframes. I'd like people to spec out db schemas and useful interfaces to get data out (and maybe in). Then it's a case of programming muscle. If people want to get involved, holler.
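As a starting point for the db schema question, here is one possible minimal shape, sketched with SQLite through Python's standard sqlite3 module. Every table and column name is an assumption: squares with a gardener, people, projects with their refresh/discard periods, and individual records tied to a project, a square and a planter. The snapshot function uses the standard Connection.iterdump() to produce the kind of whole-database download collectors would want.

```python
import sqlite3

# Hypothetical schema; all table and column names are assumptions.
SCHEMA = """
CREATE TABLE person (
    id    INTEGER PRIMARY KEY,
    name  TEXT NOT NULL
);
CREATE TABLE square (
    id        TEXT PRIMARY KEY,              -- grid id, format to be decided
    gardener  INTEGER REFERENCES person(id)  -- one gardener per square
);
CREATE TABLE project (
    id                 INTEGER PRIMARY KEY,
    name               TEXT NOT NULL UNIQUE,
    refresh_after_days INTEGER NOT NULL,
    discard_after_days INTEGER NOT NULL
);
CREATE TABLE record (
    id           INTEGER PRIMARY KEY,
    project      INTEGER NOT NULL REFERENCES project(id),
    square       TEXT NOT NULL REFERENCES square(id),
    planter      INTEGER NOT NULL REFERENCES person(id),
    state        TEXT NOT NULL DEFAULT 'pending',  -- pending/published/...
    collected_on TEXT NOT NULL,                    -- ISO date
    payload      TEXT NOT NULL                     -- project-specific data
);
"""


def open_garden(path=":memory:"):
    """Create a database with the hypothetical schema."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def published_in_square(conn, square_id):
    """Everything visitors may see in one square, with planter attribution."""
    return conn.execute(
        """SELECT project.name, person.name, record.payload
           FROM record
           JOIN project ON project.id = record.project
           JOIN person  ON person.id  = record.planter
           WHERE record.square = ? AND record.state = 'published'
           ORDER BY record.id""",
        (square_id,),
    ).fetchall()


def snapshot(conn):
    """Whole-database dump as SQL text, for collectors."""
    return "\n".join(conn.iterdump())
```

Pulling the planter's name into every query result also makes the with-attribution licence easy to honour on every page that shows data.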