Downloading Datastore Data from a Java Application on Google App Engine

Posted on Monday, September 20, 2010


One of the most frequently talked barriers for getting onto the cloud is the data lock-in. In this post we would try to look at how we can download data as a CSV from the app engine datastore. This would help us get the data back from the app engine, just in case you want to move away from the datastore.

So there exists a tool which can help you get data, that is the good part. The not so good part is that there is nothing available in Java (as of now) to accomplish this. We have to depend on the python sdk. That should give you a hint that we would need to download the python sdk.

It might also be worthwhile to check that you have python installed on your system. When I type python on the command prompt for my Ubuntu 10.04 system, I get

vikas@vikas-laptop:~$ python
Python 2.6.5 (r265:79063, Apr 16 2010, 13:09:56)
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.

So, good to go. Assuming that you have downloaded the python sdk, you would find the file in the root of the distribution.

Now, we have to set up a remote servlet with our Java application with which would be able to communicate with the app engine remotely. The following needs to be set up in your web.xml



Deploy your application to the app engine.

Now we would connect to the app engine and generate a bulkloader configuration file. A configuration file could either be auto-generated, the easier way or be written by hand. It contains information about all our entities and their fields.

There are 2 ways to generate the configuration file, either by using or Bothe the examples are as create_bulkloader_config –url=http://<your app> –application=<your app_id> --filename=config.yml --create_config –url=http://<your app> –application=<your app_id> --filename=generated_bulkloader.yaml

Once you execute this command, after the necessary authentication the following file is generated

# Autogenerated bulkloader.yaml file.
# You must edit this file before using it. TODO: Remove this line when done.
# At a minimum address the items marked with TODO:
#  * Fill in connector and connector_options
#  * Review the property_map.
#    - Ensure the 'external_name' matches the name of your CSV column,
#      XML tag, etc.
#    - Check that __key__ property is what you want. Its value will become
#      the key name on import, and on export the value will be the Key
#      object.  If you would like automatic key generation on import and
#      omitting the key on export, you can remove the entire __key__
#      property from the property map.

# If you have module(s) with your model classes, add them here. Also
# change the kind properties to model_class.
- import: base64
- import: re
- import: google.appengine.ext.bulkload.transform
- import: google.appengine.ext.bulkload.bulkloader_wizard
- import: google.appengine.ext.db
- import: google.appengine.api.datastore
- import: google.appengine.api.users


- kind: Audit
  connector: # TODO: Choose a connector here: csv, simplexml, etc...
    # TODO: Add connector options here--these are specific to each connector.
    - property: __key__
      external_name: key
      export_transform: transform.key_id_or_name_as_string

    - property: auditActionType
      external_name: auditActionType
      # Type: String Stats: 146 properties of this type in this kind.

    - property: date
      external_name: date
      # Type: Date/Time Stats: 146 properties of this type in this kind.
      import_transform: transform.import_date_time('%Y-%m-%dT%H:%M:%S')
      export_transform: transform.export_date_time('%Y-%m-%dT%H:%M:%S')

As you would notice, it would create configurations for all the entities and then leave out some TODOs for you to change. So for example we replace the

  connector: # TODO: Choose a connector here: csv, simplexml, etc...
    # TODO: Add connector options here--these are specific to each connector.


  connector: csv
    encoding: utf-8
    columns: from_header

Once our configuration file is ready, you can start downloading / uploading data to the engine using the configuration file.

To download data, again use any one of the following commands download_data --config_file=config.yml --filename=UserRole.csv --kind=UserRole –url=http://<your app> –application=<your app_id> --download --url= --application=bookhours --config_file=generated_bulkloader.yaml --kind=UserRole –filename=UserRole.csv

When the command completes, you would get a csv file for the UserRole which would be something similar to this depending on your entity.

Report role,ROLE_REPORT,90002

There is also a command, which lets you download all the entities all at once download_data –application= --url= –filename=data.csv

One of the problems of this usage is that the csv file is not a text file. It seems to be a sqlite file. I have to figure out whether I can open it with sqlite.

Another issue that I found with bulkloader is that it seems to ignore the relationships in the generated configuration file. For example, if there is a relationship between User and UserRole then that is not evident from the auto generated configuration file. Refer to my issue raised on GAE forums.

Nevertheless, we are able to get data back from the app engine, albeit for each individual entity as a csv file. Once we have the csv, it becomes easy for us to put the concerns about data lock-in to rest.

Posted in: Architecture, Cloud, Java