How to download Wikipedia
So you're looking for some dummy data? Well, how about downloading Wikipedia?
There are over 2 million pages on Wikipedia. Don't try to crawl the site; it won't let you. No robots allowed!
Go to https://download.wikipedia.org and you'll see a list of all the databases. If you're looking for the English one, it's "enwiki". You can then choose from a whole bunch of files, but the one you generally want is "pages-articles.xml.bz2". It contains the current versions of article content, and is the archive most mirror sites will probably want. The latest version at the time of writing is 1.7 GB.
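Once the archive is on disk, you don't have to unpack it before using it. Here is a minimal Python sketch that streams pages straight out of the compressed file. It assumes the dump follows the usual MediaWiki export layout (a `<page>` element containing a `<title>` and a `<text>`). The names `iter_pages` and `local_name` are made up for this illustration, so treat it as a starting point rather than a finished importer.

```python
import bz2
import xml.etree.ElementTree as ET


def local_name(tag):
    """Reduce a namespaced tag such as '{http://...}page' to plain 'page'."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(dump_path):
    """Yield (title, text) for each page, decompressing the dump on the fly.

    Assumes each <page> holds one <title> and one <text> element, which is
    how the "current versions only" dumps are normally laid out.
    """
    with bz2.open(dump_path, "rb") as stream:
        title, text = None, None
        # Only "end" events are requested, so every element is complete
        # by the time we look at it.
        for _event, elem in ET.iterparse(stream, events=("end",)):
            name = local_name(elem.tag)
            if name == "title":
                title = elem.text
            elif name == "text":
                text = elem.text or ""
            elif name == "page":
                yield title, text
                title, text = None, None
                # Release the page's content so the article text
                # does not pile up in memory.
                elem.clear()
```

To feed a search engine or prototype, loop over `iter_pages("pages-articles.xml.bz2")` (adjust the path to wherever you saved the file) and stop after as many pages as you need.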
Now you can run some decent content through your search engine or proof-of-concept application!
Comments
- Anonymous
October 05, 2006
Thank you for highlighting this, this is so cool!
- Anonymous
October 05, 2006
Not a problem, Hannes :)
- Anonymous
December 02, 2006
Very cool. Don't forget you can use DataDude to generate data too... I believe it's now out as CTP 7.
- Anonymous
December 02, 2006
Update: DataDude is now RTM 1.0 :)