Description of the files/resources accompanying the ACL 2018 publication "Which Melbourne? Augmenting Geocoding with Maps" by Milan Gritta et al.

WEIGHTS.ZIP - A trained model (with a slight experimental tweak) that I have included for you. You need it to geocode other text with the geoparse.py script. See the instructions in README.md (on GitHub) and inside the script itself. To train new models, take a look at the OPTIONAL FILES below. If you come up with something interesting, let me know! Thanks.

GEONAMES.DB - The SQLite database built from allCountries.txt and used throughout training and testing. You definitely need this one!

------------------------------------------OPTIONAL FILES-----------------------------------------------

GEOWIKI.TXT - The Wikipedia-derived **RAW** data, i.e. geotagged Wikipedia articles in plain text. You won't need this file for replication. However, if you wish to modify the training data to your liking or recreate it from scratch, use this dataset together with the helper methods in preprocessing.py.

TRAIN_WIKI_UNIFORM.TXT - The machine-readable training data, ready to go if you wish to train your own model with different parameters. Required by the train.py script.

ALL_COUNTRIES.TXT - Slightly redundant, as you can download this file from geonames.org, but I spared you the effort. You need it to (re)build the SQLite database; a rough parsing sketch is included at the end of this description.

DEV_WIKI_UNIFORM.TXT - The machine-readable DEV (validation) data used to test the model during development. It is a disjoint partition of the Wikipedia-derived data, with the same DOMAIN and FORMAT as the training data but drawn from different Wikipedia articles.

GLOVE.TWITTER.50D.TXT.ZIP - The embeddings used in the experiments, downloaded from the GloVe website. In practice, I found that many vocabulary words have no pre-trained vector, so their embeddings have to be learnt from scratch; see the sketch at the end of this description. I don't think there is any performance loss if you don't initialise the embeddings at all.
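If you want to rebuild the database yourself, the sketch below shows one way to load the tab-separated allCountries.txt dump into SQLite. The output file name, table name (places) and column selection are placeholders for illustration only; the repository's own scripts define the actual schema behind GEONAMES.DB.

    # Minimal sketch: load the tab-separated GeoNames dump into SQLite.
    # Table name, columns and output file are illustrative placeholders.
    import sqlite3

    conn = sqlite3.connect("geonames_example.db")
    cur = conn.cursor()
    cur.execute("""CREATE TABLE IF NOT EXISTS places (
                       geonameid INTEGER PRIMARY KEY,
                       name TEXT,
                       latitude REAL,
                       longitude REAL,
                       country_code TEXT,
                       population INTEGER)""")

    with open("allCountries.txt", encoding="utf-8") as f:
        rows = []
        for line in f:
            # allCountries.txt columns (tab-separated): geonameid, name, asciiname,
            # alternatenames, latitude, longitude, feature class, feature code,
            # country code, cc2, admin1-4, population, elevation, dem, timezone, date
            cols = line.rstrip("\n").split("\t")
            rows.append((int(cols[0]), cols[1], float(cols[4]), float(cols[5]),
                         cols[8], int(cols[14] or 0)))
            if len(rows) >= 10000:  # insert in batches to keep memory bounded
                cur.executemany("INSERT OR REPLACE INTO places VALUES (?,?,?,?,?,?)", rows)
                rows = []
        if rows:
            cur.executemany("INSERT OR REPLACE INTO places VALUES (?,?,?,?,?,?)", rows)

    conn.commit()
    conn.close()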
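As for the missing GloVe vectors, one common way to handle them is to copy the pre-trained vectors where available and randomly initialise the rest, keeping the whole embedding matrix trainable so that the missing words are learnt from scratch. The sketch below assumes a hypothetical word_index mapping (word -> integer id); it is not the exact code or variable naming used in train.py.

    # Sketch: build an embedding matrix from GloVe, randomly initialising any
    # word that has no pre-trained vector (those are then learnt during training).
    # `word_index` is a placeholder for whatever vocabulary mapping you produce.
    import numpy as np

    EMB_DIM = 50  # the Twitter GloVe vectors in this archive are 50-dimensional

    def load_glove(path):
        vectors = {}
        with open(path, encoding="utf-8") as f:
            for line in f:
                parts = line.rstrip().split(" ")
                vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
        return vectors

    def build_embedding_matrix(word_index, glove_path):
        glove = load_glove(glove_path)
        # index 0 reserved for padding, hence the +1
        matrix = np.random.uniform(-0.05, 0.05,
                                   (len(word_index) + 1, EMB_DIM)).astype(np.float32)
        missing = 0
        for word, idx in word_index.items():
            vec = glove.get(word)
            if vec is not None:
                matrix[idx] = vec
            else:
                missing += 1  # stays randomly initialised, learnt from scratch
        print(f"{missing} of {len(word_index)} words had no pre-trained vector")
        return matrix

The resulting matrix can then be used as the initial weights of a trainable embedding layer (e.g. a Keras Embedding layer with weights=[matrix]); leaving it trainable is what lets the randomly initialised rows be learnt.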