Recently I wanted to run some jobs. I'm a huge advocate of using Docker, so naturally I was going to build a Docker image to run my Python scripts, then schedule that job to run once in a while. Doing so on AWS is pretty easy using Lambda and Step Functions; however, since this wasn't a paid gig, I couldn't get someone to foot the bill. Enter Google Cloud!
Google Cloud Platform (GCP) is, in a way, the newer kid on the block. AWS has a long history as a cloud platform and excellent customer support, whereas Google customer service is a bit like Bigfoot: you've heard of it, some people say they've seen it, but it doesn't really exist... BUT Google is still an amazing technology company; they release early, then improve things until they rock (i.e. Android). Best of all, they offer $300 in free credits. So I decided to go with Google. How bad can it be?
In this post I'll talk about how I set up Google Cloud to work for me, in a rather cool way. It took lots of blood, sweat and tears, but I got it working. I schedule a job once in a while, spin up a cluster of instances, run the job, then shut it down! Not only is that cool (ya, I'm a geek), it's also quite cost-effective.
I will outline what I did, and even try to share my code with you.
Here goes:
Step 1 – Build a Docker image and push it to the Google Cloud private registry
The first step is the easiest and most trivial. It is pretty much the same as on AWS.
Create a build docker image
Let's start with creating a build image. GitLab CI allows you to use your own image as your build machine, which is cool. If you're using a different CI, I leave it to you to adjust this to your own system.
FROM docker:latest
RUN apk add --no-cache python py2-pip curl bash
RUN curl -sSL https://sdk.cloud.google.com | bash
ENV PATH $PATH:/root/google-cloud-sdk/bin
RUN pip install docker-compose
This is the Dockerfile for the build machine. It starts from the official docker image, installs Python and pip, and installs the gcloud SDK.
Then I push this build image to Docker Hub. If you haven't done this before, you need to:
1) Sign up to Docker Hub at https://hub.docker.com and remember your username.
2) In the build machine folder, run docker build . -t <your-username>/build-machine
3) run:
$ docker login
$ docker push <your-username>/build-machine:latest
Create a GCP service account
You have to create a service account, give it access to the registry, then export the key file as JSON. This is a very simple step. If you're unsure how to do it, just click through IAM & Admin in the console: create a service account, assign it a role, and export the key. Very easy.
Customize the CI script to push to the private registry
Once this is all done and you have your build machine, we can work on your CI script. I will show you how to do this on GitLab CI, but you can adapt this to your own environment. First, create a build environment variable called CLOUDSDK_JSON and paste the contents of the JSON key you created in the previous step as its value. Then add the following .gitlab-ci.yml file to your project.
image: <your-username>/build-machine

services:
  - docker:dind

stages:
  - build
  - test
  - deploy

before_script:
  - apk add --no-cache python py2-pip
  - pip install --no-cache-dir docker-compose
  - docker version
  - docker-compose version
  - gcloud version

build_image:
  stage: build
  except:
    - develop
    - master
  script:
    - docker build -t <job-image-name>:latest .

deploy:
  stage: deploy
  only:
    - develop
    - master
  script:
    - docker build -t <job-image-name>:latest .
    - echo $CLOUDSDK_JSON > key.json
    - gcloud auth activate-service-account <service-account-name> --key-file=key.json
    - docker tag <job-image-name>:latest $PRIVATE_REGISTRY/<job-image-name>:latest
    - gcloud docker -- push $PRIVATE_REGISTRY/<job-image-name>:latest
    - gcloud auth revoke
Please adjust <job-image-name> to your job's Docker image name, <service-account-name> to the service account name you created, and the build image to the one you pushed to Docker Hub. You will also need a $PRIVATE_REGISTRY CI variable pointing at your private registry (typically gcr.io/<your-project-id>). This YAML file is directed at a Python job, but you can change it to any other language.
I have 3 stages: build, test and deploy.
I build and test on all branches, but only deploy on develop and master. GitLab CI has a quirk: each stage can run on a different machine, so the image from my build stage isn't kept around for the deploy stage, which forced me to rebuild it in the deploy stage.
Once this is done, your CI system should be pushing your image to your Google private registry. Well done!
Step 2 – Running Jobs in a Temp Cluster
Here comes the tricky part. Since jobs only need to run every so often, and only for a limited period, they would ideally run as Google Cloud Functions. However, those are limited in execution time and can only be written in JavaScript (AWS supports multiple languages with Lambda, plus state machines with Step Functions). And since I didn't want to pay for a cluster running full-time, I had to develop my own way to run jobs.
Kubernetes Services
Controlling jobs in a cluster, and the cluster itself, can be achieved using Kubernetes. This is one part of GCP that really shines: it lets you define services, jobs, and pods (collections of containers), and run them.
To do this, I wrote a KubernetesService class in python that will:
– Spin up / create a cluster.
– Launch docker containers on the cluster.
– Once jobs finish, shutdown the cluster.
import datetime
import logging

import kubernetes
from googleapiclient.discovery import build
from kubernetes.client.rest import ApiException

class KubernetesService():
    def __init__(self, namespace='default'):
        self.api_instance = kubernetes.client.BatchV1Api()
        service = build('container', 'v1')
        self.nodes = service.projects().zones().clusters().nodePools()
        self.namespace = namespace
This is the class and its constructor. The full code for this class has more configuration and env variables, as it is part of the App Engine cron project. I will include the repo if you want full details on how to achieve this.
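For reference, here is a hedged sketch of the kind of module-level constants the methods below rely on. The names match the snippets, but the values are made up; in the real project they come from the app's configuration:
# Hypothetical values for illustration only; set these to match your own project.
PROJECT_ID = 'my-gcp-project'      # your GCP project id
ZONE = 'europe-west1-b'            # the zone your cluster lives in
CLUSTER_ID = 'jobs-cluster'        # the GKE cluster name
NODE_POOL_ID = 'default-pool'      # the node pool that gets resized
With those in place, the next method resizes the cluster's node pool: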
    def setClusterSize(self, newSize):
        logging.info("resizing cluster {} to {}".format(CLUSTER_ID, newSize))
        self.nodes.setSize(projectId=PROJECT_ID, zone=ZONE,
                           clusterId=CLUSTER_ID, nodePoolId=NODE_POOL_ID,
                           body={"nodeCount": newSize}).execute()
This function controls the cluster size. It can spin the cluster up before jobs need to run, then shut it down after:
    def kubernetes_job(self, containers_info, job_name='default_job', shutdown_on_finish=True):
        # Scale the Kubernetes cluster to 3 nodes
        self.setClusterSize(3)
        timestampped_job_name = "{}-{:%Y-%m-%d-%H-%M-%S}".format(job_name, datetime.datetime.now())
        # Adding the containers to a pod definition
        pod = kubernetes.client.V1PodSpec()
        pod.containers = self.create_containers(containers_info)
        pod.name = "p-{}".format(timestampped_job_name)
        pod.restart_policy = 'OnFailure'
        # Adding the pod to a Job template
        template = kubernetes.client.V1PodTemplateSpec()
        template_metadata = kubernetes.client.V1ObjectMeta()
        template_metadata.name = "tpl-{}".format(timestampped_job_name)
        template.metadata = template_metadata
        template.spec = pod
        # Adding the Job template to the Job spec
        spec = kubernetes.client.V1JobSpec()
        spec.template = template
        # Adding the final Job spec to the top-level Job object
        body = kubernetes.client.V1Job()
        body.api_version = "batch/v1"
        body.kind = "Job"
        metadata = kubernetes.client.V1ObjectMeta()
        metadata.name = timestampped_job_name
        body.metadata = metadata
        body.spec = spec
        try:
            # Creating the job
            api_response = self.api_instance.create_namespaced_job(self.namespace, body)
            logging.info('job creation result: {}'.format(api_response))
        except ApiException as e:
            print("Exception when calling BatchV1Api->create_namespaced_job: %s\n" % e)
The kubernetes_job function creates containers via create_containers (an additional helper that builds container objects with env variables). Containers are then part of a pod, that pod is part of a job template, and the template is part of a job spec. You can read more about it in the Kubernetes docs.
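The create_containers helper itself isn't shown in this post (the real one is in the repo), but here is a minimal sketch, assuming it simply maps each containers_info dict (image, name, env_vars) onto a Kubernetes container object:
    def create_containers(self, containers_info):
        # Sketch only: turn each job description dict into a V1Container
        containers = []
        for info in containers_info:
            env = [kubernetes.client.V1EnvVar(name=var['name'], value=var['value'])
                   for var in info.get('env_vars', [])]
            containers.append(kubernetes.client.V1Container(
                name=info['name'], image=info['image'], env=env))
        return containers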
    def shutdown_cluster_on_jobs_complete(self):
        api_response = self.api_instance.list_namespaced_job(self.namespace)
        if next((item for item in api_response.items if item.status.succeeded != 1), None) is None:
            logging.info("no running jobs found, shutting down cluster")
            self.setClusterSize(0)
        else:
            logging.info("found running jobs, keeping cluster up")
If you don't want the code to sit and wait for the jobs, you can poll for completion instead, and that is what shutdown_cluster_on_jobs_complete is for. It shuts down the cluster once there are no running jobs.
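To make the flow concrete, here is a usage sketch; the image name and env var below are made up for illustration:
service = KubernetesService()
containers_info = [{
    "image": "gcr.io/my-project/my-job:latest",              # hypothetical image
    "name": "my-job",
    "env_vars": [{"name": "TARGET", "value": "production"}]  # hypothetical env var
}]
# Scales the node pool up and submits the job to the cluster
service.kubernetes_job(containers_info, job_name="my-job", shutdown_on_finish=False)
# Later (e.g. from a cron-triggered endpoint): scale back down once nothing is running
service.shutdown_cluster_on_jobs_complete()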
This class successfully handles all the job scheduling and execution.
It's part of an App Engine app (however, it can be used independently).
Next, we need this script to be scheduled or triggered.
And that is the job of our cron scheduler task.
Cron scheduler App Engine service
Sadly, Google doesn't give you an easy way to run code in the cloud; you actually have to write more code to run code (silly, right?).
The concept is that App Engine provides you with a cron web scheduler that calls your own app's endpoints at given intervals.
First you add a cron.yaml to your project and configure which endpoints to hit and at what intervals:
cron:
- description: task to kick off all updates
  url: /events/run-jobs
  schedule: every 2 hours
- description: task to shutdown jobs when finished
  url: /events/shutdown-jobs
  schedule: every 5 minutes
Then we can add handlers to kick off the jobs and to shut them down.
class RunJobsHandler(webapp2.RequestHandler):
    def get(self):
        try:
            logging.info("running jobs")
            jobs_list = Settings.get("JOBS_LIST").split()
            for job_name in jobs_list:
                job_name = job_name.replace("_", "-")  # Kubernetes names cannot contain underscores
                logging.info('about to publish job {}'.format(job_name))
                containers_info = [
                    {
                        "image": Settings.get("IMAGE_NAME"),
                        "name": job_name,
                        "env_vars": [
                            {"name": "SOME_ENV_VAR", "value": "some_value"}  # example hard-coded env var
                        ]
                    }
                ]
                job_env_vars = Settings.get('JOB_ENV_VARS').split()
                for env_var in job_env_vars:
                    logging.info('adding container var {}'.format(env_var))
                    containers_info[0]['env_vars'].append({
                        "name": env_var,
                        "value": Settings.get(env_var)
                    })
                # kuberService is a module-level KubernetesService() instance
                kuberService.kubernetes_job(containers_info, job_name, False)
            self.response.status = 204
        except Exception as e:
            logging.exception(e)
            self.response.status = 500
            self.response.write("error running jobs, check logs for more details.")
        else:
            self.response.write("jobs published successfully")
Last, we want to add a Settings class to load env-like variables from the Datastore:
import os
from google.appengine.ext import ndb

if os.getenv('SERVER_SOFTWARE', '').startswith('Google App Engine/'):
    PROD = True
else:
    PROD = False

class Settings(ndb.Model):
    name = ndb.StringProperty()
    value = ndb.StringProperty()

    @staticmethod
    def get(name):
        NOT_SET_VALUE = "NOT SET"
        retval = Settings.query(Settings.name == name).get()
        if not retval:
            retval = Settings()
            retval.name = name
            retval.value = NOT_SET_VALUE
            retval.put()
        if retval.value == NOT_SET_VALUE:
            raise Exception(('Setting %s not found in the database. A placeholder ' +
                             'record has been created. Go to the Developers Console for your app ' +
                             'in App Engine, look up the Settings record with name=%s and enter ' +
                             'its value in that record\'s value field.') % (name, name))
        return retval.value
Note that most of the app depends on the Datastore. Sadly, Google doesn't give you an easy way to set environment variables, but you can store env-like variables in the Datastore.
That is what the Settings class is for.
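The Settings records have to exist in the Datastore before the handlers can read them. Here is a hedged example of seeding one, say from the App Engine interactive console; the job names are hypothetical:
# Create or update the JOBS_LIST setting that RunJobsHandler reads
setting = Settings.query(Settings.name == "JOBS_LIST").get() or Settings(name="JOBS_LIST")
setting.value = "job_one job_two"   # space separated, since the handler calls .split()
setting.put()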
Then we just bind the route handler:
import webapp2

app = webapp2.WSGIApplication([('/events/run-jobs', RunJobsHandler)],
                              debug=True)
This should allow our app to spin up a cluster, launch the containers, and then shut the cluster down. In my code I also added a handler for the shutdown.
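That shutdown handler isn't shown above (it's in the repo), but a minimal sketch of what it could look like, reusing shutdown_cluster_on_jobs_complete, would be:
class ShutdownJobsHandler(webapp2.RequestHandler):
    def get(self):
        try:
            # Poll the running jobs and scale the cluster to zero if everything finished
            kuberService.shutdown_cluster_on_jobs_complete()
            self.response.status = 204
        except Exception as e:
            logging.exception(e)
            self.response.status = 500
            self.response.write("error shutting down jobs, check logs for more details.")
Remember to register it alongside the other route, e.g. ('/events/shutdown-jobs', ShutdownJobsHandler), so the cron.yaml entry above has something to hit.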
Then make sure you have gcloud installed (here is how), deploy the App Engine app using the gcloud app deploy command (here is how), and you should be good to go.
While my example runs the same Docker image and just performs different operations with different env variables, you can easily adjust this code to suit whatever needs you might have.
Here is the full git repo:
Hope you find it useful!