In this tutorial, we provide instructions on how to set up a personal deep learning environment on the Google Cloud Platform (GCP), and how to create and run a computing instance on a virtual machine (VM) that reuses an instructor-designed image. Please follow the steps in detail.
Important note: this setup is part of Assignment 0.
Go to the Google cloud console (https://cloud.google.com/) and sign in with your LionMail account (yourUNI@columbia.edu).
If you sign up with some other account, you will not be able to use the Google coupons provided by the instructors. If you are a new user of Google Cloud, you can get $300 in free credits by clicking 'Get started for free'. You can explore GCP for a while with the free credits. After the add/drop period, students will get educational coupons from the instructors to cover course-related Google Cloud expenses.
After you have received a Google Cloud coupon/code, please follow the instructions below to redeem it. (Google Cloud coupon distribution method TBD.)
Go to https://console.cloud.google.com/education
Select your LionMail account on the top right and your project (if created) at the top left. If you do not have a project right now, that is fine. You can first redeem the coupon and then create the project following the instructions.
Enter the required information and click 'ACCEPT AND CONTINUE'.
If the coupon is successfully redeemed, you will be redirected to the overview page of your billing account.
All billing accounts created using educational coupons are named 'Billing Account for Education'. If you have redeemed other coupons before, there will be multiple billing accounts with the same name.
A billing account can be linked to different projects. To find out which project is linked to one billing account, you can click 'Account management' and see the details. You can also rename the billing account for clearer usage in the future.
Now, you have successfully redeemed the coupon.
Please note that charges for using a GPU can be approximately $1/hour, so please manage your computational resources wisely. A good way to do this is to create a deep learning environment on your local computer, debug your code there, and only run it in Google Cloud when more powerful computational resources are needed. Note that some assignments can be executed even on non-GPU personal computers.
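To make the budgeting advice concrete, here is a minimal sketch of the arithmetic. The function name and the example workload are hypothetical; the default rate is the approximate $1/hour figure quoted above, and actual GCP pricing depends on the GPU type and region.

```python
def estimate_gpu_cost(hours_per_session, sessions, hourly_rate_usd=1.0):
    """Return the estimated charge in USD for a given amount of GPU time.

    hourly_rate_usd defaults to the approximate $1/hour mentioned in the
    tutorial; it is an assumption, not an official price.
    """
    if hours_per_session < 0 or sessions < 0 or hourly_rate_usd < 0:
        raise ValueError("all inputs must be non-negative")
    return hours_per_session * sessions * hourly_rate_usd


# Hypothetical workload: ten training sessions of three hours each.
print(f"Estimated cost: ${estimate_gpu_cost(3, 10):.2f}")
```

Comparing such an estimate with the remaining coupon balance before a long training run can help avoid unexpected charges.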
Go to Google cloud dashboard
Create your project
Click 'create project'. Make sure that you create the project under the organization 'columbia.edu'.
For administrative reasons, we request that you use 'ecbm4040-yourUNI' as your project name.
Upgrade your billing account (skip if already upgraded)
If this is your first time using GCP with your Columbia ID, you need to upgrade your account to get access to all GCP features. Click 'Navigation Menu' on the top-left of the homepage.
Verify your GPU quota(s)
Make sure to select the project that you just created 'ecbm4040-yourUNI'.
Click on "Filter" again. Find "Metric : compute.googleapis.com/gpus_all_regions".
If you have not used this service before, then your value for this quota is 0.
You should change the limit to at least 1.
Change the limit
Select the quota you want to edit and then click 'EDIT QUOTAS'.
Fill in the info and then submit the request.
Wait for a moment to let Google process your request.
You should receive an e-mail from Google informing you that they received the request. You will receive another e-mail after your quota request is approved.
Note that a quota request is typically processed in one or two business days. However, the actual waiting period can vary from minutes to a few hours to 4 days or even longer, depending on the general quota demand. Google typically takes longer to process requests at the end of the semester, so plan the time for your end-of-semester project experiments accordingly.
Wait for a few more minutes and check that the quota has been successfully edited.
Verify the quotas for the compute engine (using NVIDIA T4 as an example).
Find "Metric: compute.googleapis.com/nvidia_t4_gpus".
You will see that the limit for each region is already 1. The region indicates where the compute engine is located.
You can change the limit according to your future needs.
Create a new GCP virtual machine (VM) instance.
Before creating a GCP VM, make sure that you have enough quotas for gpus_all_regions and nvidia_t4_gpus. During the process of creating a VM, you can also find out how many quotas are needed and whether your quotas are sufficient. We will show this later.
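The quota check described above can also be scripted. The sketch below assumes you have saved the output of `gcloud compute project-info describe --format=json` and parsed it into a dictionary, and that it contains a `quotas` list whose entries have `metric`, `limit`, and `usage` fields. That layout, the upper-case metric name, and the helper names are assumptions for illustration; verify them against your own output.

```python
def find_quota(project_info, metric):
    """Return (limit, usage) for a quota metric, or None if it is absent."""
    for quota in project_info.get("quotas", []):
        if quota.get("metric") == metric:
            return quota.get("limit"), quota.get("usage")
    return None


def has_headroom(project_info, metric, requested=1):
    """Return True if at least `requested` units of the quota are unused."""
    found = find_quota(project_info, metric)
    if found is None:
        return False
    limit, usage = found
    return (limit - usage) >= requested


# Hypothetical parsed output for a project with the default GPU quota of 0.
sample = {"quotas": [{"metric": "GPUS_ALL_REGIONS", "limit": 0.0, "usage": 0.0}]}
print(has_headroom(sample, "GPUS_ALL_REGIONS"))
```

A result of `False` here corresponds to the situation in which a quota increase has to be requested before a GPU VM can be created.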
There are 2 options for creating a VM instance:
Creating a VM instance in your GCP project, based on a pre-created custom image.
Go to "Compute Engine" -> "VM instances", click CREATE INSTANCE.
Note: Check GPU availability in various zones on this site. You may need to experiment with different zones.
Select from CUSTOM IMAGES, source project ecbm4040-ta for the image "ecbm4040-imageforstudents-tf24".
Click "Select". This image is pre-installed with the following specs:
Under 'Firewall', check "Allow HTTP traffic" and "Allow HTTPS traffic".
Note: You can later create additional instances with different computational power for your project. The procedure is the same.
Wait for several minutes, and the newly created VM instance will be running after the creation.
Possible issues during the creation.
If you do not have enough quota for the computing resource you want, a reminder will appear in the top right corner after you select the machine, informing you that the quota is insufficient and telling you the required amount (using NVIDIA A100 80G as an example). You can adjust the quota accordingly to ensure that you have enough to create the VM.
Sometimes after you create the VM, the system will prompt you with 'quota maximum in region has been exceeded.' This is also due to insufficient quotas. Please increase the quota value.
Due to limited computing resources, sometimes the resources you want may be scarce in a certain region, making it impossible to allocate a compute engine. In this case, after creating the VM, the system will prompt you with '', and you can click 'RETRY' multiple times, which might resolve the issue, or you can switch to another available region.
If it is hard to find an available region, you can use gpu-finder (described later in this tutorial), which automatically attempts to create instances in different regions.
There are two methods to establish a connection to your GCP instance from your personal computer: one uses the GCP dashboard, and the other uses the Google Cloud SDK.
Method 1 - Connect to the instance directly from the GCP dashboard (online): Go to the list of compute instances on the GCP dashboard. Click the 'SSH' button next to your running instance and wait for several minutes.
After the connection is established, a new cmd window will appear. You have to change your Linux user name to 'ecbm4040'. That can be done by clicking on the settings button in the command window, as shown in the screenshot below.
Method 2 - Connect to the instance using the Google Cloud SDK:
Install the Google Cloud SDK onto your laptop first.
After the installation, the instructions for interacting with your VM instance in the cloud can be called via SDK command lines from the local computer. Open the SDK console (cmd window) on your local computer, and initialize your gcloud account using the gcloud init command.
If this is the first activity after you installed the SDK, you will be directed to a website. Information such as the zone or project ID should conform with your previous online settings.
After the initialization, type gcloud init again, and you should see something like the following:
Now you can use ssh tools provided by Google Cloud SDK to connect to your instance (running in the cloud) with the following command:
gcloud compute ssh ecbm4040@your_instance_name
This will open up a remote connection in another cmd window, and provide a command line access to your google cloud instance.
When you want to close the connection, type exit.
To learn more about the Google Cloud CLI, you can browse the gcloud CLI guide.
Note: The custom image which instructors provide to you contains all the tools, installed under the username 'ecbm4040'. If you ssh into another user name, many components will not work.
This step will check whether the tools have been properly installed, and if they are available in your environment.
CUDA tool verification: First, check whether a GPU device is available:
ecbm4040@your-instance-name: $ nvidia-smi
If a GPU is available, the output will show some basic information about your GPU device.
Python virtual environments are used for managing python and other software versions. (Note that for the local computer setup, we described the installation of Anaconda instead.) In the instructor's custom image, a conda environment called 'envTF24' has been set up. You need to use the instruction below to activate it. It is recommended that you use the same environment for your future assignments. If you need additional tools, they can be added by using pip install commands.
Switch user to ecbm4040 if you have not already done so and navigate to environment directory.
(base) username@your-instance-name: $ sudo su ecbm4040
(base) ecbm4040@your-instance-name: $ cd ../ecbm4040
Activate the envTF24 virtual environment
(base) ecbm4040@your-instance-name: $ source envTF24/bin/activate
After the activation of the environment, you can review which packages are currently installed using the command pip list:
(envTF24) ecbm4040@your-instance-name: $ pip list
Note: If you need to deactivate the environment, type
deactivate
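As a complement to scanning the full pip list output, the following sketch reports the installed version of a few selected packages from inside the activated environment. It relies on the standard-library module importlib.metadata, which assumes Python 3.8 or newer; the helper name and the package names passed to it are only examples.

```python
from importlib import metadata


def installed_versions(package_names):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in package_names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Example package names; replace them with the tools your assignment needs.
for name, version in installed_versions(["tensorflow", "notebook"]).items():
    print(f"{name}: {version if version is not None else 'not installed'}")
```

A package reported as not installed can then be added with pip install, as described above.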
TensorFlow is an open-source library for deep learning created by Google. The version of TensorFlow in the cloud image which is provided by the instructors is 2.4. That is the version that should be used to complete the assignments for E4040 in Fall 2021.
To check the installation of TensorFlow 2.4, type python and run the following code inside the Python prompt. (Do not confuse the Python prompt >>> with the Linux command prompt $. If you want to exit Python, type exit() to get back to the Linux prompt.)
(envTF24) ecbm4040@your-instance-name: $ python
>>> import tensorflow as tf
>>> tf.__version__
'2.4'
To verify that the GPU is configured and CUDA libraries are successfully loaded:
tf.config.list_physical_devices()
You should see PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU') printed to the console, along with other messages indicating the opening of CUDA libraries.
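The same check can be wrapped in a small script, which may be convenient at the top of a training program. This is a sketch under stated assumptions: the helper names are hypothetical, and the only TensorFlow call used is tf.config.list_physical_devices(), whose results expose the name and device_type fields shown above. TensorFlow is imported inside the last function so that the other helpers work without it.

```python
def summarize_devices(devices):
    """Group device names by device type, e.g. {'CPU': [...], 'GPU': [...]}."""
    summary = {}
    for device in devices:
        summary.setdefault(device.device_type, []).append(device.name)
    return summary


def gpu_available(devices):
    """Return True if at least one device in the list is a GPU."""
    return any(device.device_type == "GPU" for device in devices)


def check_tensorflow_gpu():
    """Query TensorFlow for its physical devices and print a short report."""
    import tensorflow as tf  # imported lazily; requires the envTF24 environment

    devices = tf.config.list_physical_devices()
    print(summarize_devices(devices))
    print("GPU available:", gpu_available(devices))
```

Calling check_tensorflow_gpu() inside the activated environment prints the grouped device names; if the GPU line reports False, recheck the nvidia-smi output from the earlier step.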
We next describe (a) how to start a Jupyter server in your Google Cloud VM instance, and (b) how to open/access your Jupyter Notebook.
There are two ways to accomplish this:
(i) Method 1 - Using the console of the Google SDK running on your laptop;
(ii) Method 2 - Configuring a firewall from the GCP dashboard.
Jupyter tools have been installed in the 'envTF24' virtual environment in the GCP instance.
Configuring and starting the Jupyter server on GCP
Configure your Jupyter Notebook on the server side
First, generate a new configuration file:
(envTF24)ecbm4040@your-instance-name: $ jupyter notebook --generate-config
Open that configuration file:
(envTF24)ecbm4040@your-instance-name: $ vi ~/.jupyter/jupyter_notebook_config.py
Add the following lines into the file. (If you are new to Linux and do not know how to use the vi editor, see this tutorial: https://www.cs.colostate.edu/helpdocs/vi.html).
c = get_config()
c.NotebookApp.ip = '*'
c.NotebookApp.open_browser = False
c.NotebookApp.port = 9999 # or other port number
Generate your Jupyter login password. Press Enter for no password.
(envTF24)ecbm4040@your-instance-name: $ jupyter notebook password
Enter password:
Verify password:
[NotebookPasswordApp] Wrote hashed password to /home/ecbm4040/.jupyter/jupyter_notebook_config.json
Start Jupyter server in your Google Cloud VM instance
(envTF24)ecbm4040@your-instance-name: $ jupyter notebook
Opening the Jupyter Notebook:
Your Jupyter server is running remotely in your GCP instance. You need to connect your local computer to that remote server in order to view, edit and run your Jupyter Notebook files from a browser on your laptop (Chrome, Firefox, etc.).
Method 1: Open Jupyter Notebook using the Google Cloud SDK
Open an SDK console and use SSH to connect to the Jupyter Notebook.
Type in the following command to set up a connection with your remote instance. Note that in -L 9999:localhost:9999, the first "9999" is your local port, and you can set another port number if you want. The second "9999" is the remote port number, and it should be the same as the port that the Jupyter Notebook server is using.
gcloud compute ssh --ssh-flag="-L 9999:localhost:9999" --zone "us-east1-d" "ecbm4040@your-instance-name"
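To illustrate how the two port numbers fit into the command, here is a minimal sketch that assembles the same gcloud compute ssh invocation from its parts. The function name and its parameters are hypothetical; the flags themselves are the ones used in the command above.

```python
import shlex


def build_tunnel_command(instance, zone, local_port=9999, remote_port=9999,
                         user="ecbm4040"):
    """Return the gcloud command (as an argument list) that forwards a port.

    local_port is the port opened on your laptop; remote_port must match the
    port that the Jupyter server uses on the VM.
    """
    for port in (local_port, remote_port):
        if not 1 <= port <= 65535:
            raise ValueError(f"invalid port number: {port}")
    forward = f"-L {local_port}:localhost:{remote_port}"
    return ["gcloud", "compute", "ssh", f"--ssh-flag={forward}",
            "--zone", zone, f"{user}@{instance}"]


# Example: use local port 8888 while Jupyter listens on 9999 on the VM.
command = build_tunnel_command("your-instance-name", "us-east1-d", local_port=8888)
print(shlex.join(command))
```

With this example you would browse to localhost:8888 on your laptop, because only the local side of the tunnel changed.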
Open a browser on your laptop (Chrome, Safari, etc.).
Go to http://localhost:9999 or https://localhost:9999, and you will be directed to your remote Jupyter server. Type in the Jupyter password that you created before, and you can then browse your home directory on the Linux virtual machine running in GCP.
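If the browser cannot reach the page, it can help to first confirm that something is listening on the local end of the tunnel. The sketch below uses only the Python standard library; the function name is hypothetical, and port 9999 is the example port from the steps above.

```python
import socket


def port_is_open(host="localhost", port=9999, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


print("Tunnel reachable:", port_is_open("localhost", 9999))
```

A False result usually means that the SSH session with port forwarding is not running, or that the Jupyter server on the VM uses a different port than the one forwarded.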
Method 2: Open Jupyter Notebook by configuring a firewall from the GCP dashboard
GitHub, a tool that you can use to manage your assignments.
How to use GitHub for working with GCP:
Activate the virtual environment
ecbm4040@your-instance-name:~$ source envTF24/bin/activate
Check if Git is installed
(envTF24) ecbm4040@your-instance-name:~$ git --version
This command will output the version of Git installed if it is present. For example, you might see an output like:
git version 2.17.1
Configure Git
(envTF24) ecbm4040@your-instance-name:~$ git config --global user.name "Your Name"
(envTF24) ecbm4040@your-instance-name:~$ git config --global user.email "your.email@example.com"
You can check your configuration settings with:
(envTF24) ecbm4040@your-instance-name:~$ git config --list
Generate SSH Key
(envTF24) ecbm4040@your-instance-name:~$ ssh-keygen -t rsa -b 4096 -C "your.email@example.com"
Press Enter to accept the default file location and provide a passphrase if you want extra security. After generating the key, add it to the SSH agent:
(envTF24) ecbm4040@your-instance-name:~$ eval "$(ssh-agent -s)"
(envTF24) ecbm4040@your-instance-name:~$ ssh-add ~/.ssh/id_rsa
Add SSH Key to GitHub. You need to add your SSH key to your GitHub account. Copy your SSH key to the clipboard:
(envTF24) ecbm4040@your-instance-name:~$ cat ~/.ssh/id_rsa.pub
Log in to your GitHub account, go to Settings -> SSH and GPG keys -> New SSH key, and paste your key there.
Clone the repository to your workspace
To clone your repository to your workspace, copy the SSH URL from your repository on GitHub and run:
(envTF24) ecbm4040@your-instance-name:~$ git clone git@github.com:your-username/your-repository.git
Now you can navigate into your repository and start using Git. Here are some basic Git commands:
To check the status of your repository:
git status
To add files to your repository:
git add .
To commit your changes:
git commit -m "Your commit message"
To push your changes to GitHub:
git push
Pull Changes from GitHub:
git pull
gpu-finder, a useful tool that attempts to make it easier to find and provision Compute Engine instances with GPUs.
How to use gpu-finder:
Before using this tool, make sure that you have already installed the gcloud CLI on your workstation. If not, please follow the installation guidance.
Clone the repository to your workstation and follow the README carefully.
Download the service account key file and set the environment variable.
Follow the linked guidance to create a service account key.
After creating the service account key, set the environment variable as described in the linked guidance.
Install the Google API client library by running the command below:
pip install -r requirements.txt
Modify the gpu-config.json file.
This file is used to set the appropriate configuration parameters, similar to how you adjust the VM configuration when creating a new VM on the web page. Here is the recommended setting.
Change the project_id and the name of the instance according to the requirements.
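The two changes can also be applied programmatically. The sketch below assumes that gpu-config.json has a top-level project_id key and an instance_config section containing name; the latter matches how the script reads compute_config['instance_config']['name'], but the exact layout of the file is an assumption here, so check it against your copy. The function names and example values are hypothetical.

```python
import copy
import json


def update_gpu_config(config, project_id, instance_name):
    """Return a copy of the config with the project ID and instance name set."""
    updated = copy.deepcopy(config)
    updated["project_id"] = project_id
    updated.setdefault("instance_config", {})["name"] = instance_name
    return updated


def rewrite_config_file(path, project_id, instance_name):
    """Load the JSON config at `path`, update it, and write it back."""
    with open(path) as handle:
        config = json.load(handle)
    with open(path, "w") as handle:
        json.dump(update_gpu_config(config, project_id, instance_name),
                  handle, indent=4)


# Hypothetical starting config and values, for illustration only.
example = {"project_id": "", "instance_config": {"name": "", "zone": []}}
print(update_gpu_config(example, "ecbm4040-yourUNI", "your-instance-name"))
```

All other keys are left untouched, so the remaining recommended settings are preserved.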
Modify gpu-finder.py
According to our needs, we need to make the following modifications to the function create_instance.
def create_instance(compute, project, config, zone_list):
    ...
    for j in range(compute_config['number_of_instances']):
        print(f"Creating instance number {instances+1} of {compute_config['number_of_instances']} in {zone_config['zone']}, "
              f"zone {zones_attempted+1} out of {len(zones)} attempted.")
        image_project = compute_config['instance_config']['image_project']
        # Original lookup by image family, now disabled:
        # image_family = compute_config['instance_config']['image_family']
        # image_response = compute.images().getFromFamily(
        #     project=image_project, family=image_family).execute()
        # Look the image up by name instead; the 'image_family' field of the
        # config now holds the image name.
        image_name = compute_config['instance_config']['image_family']
        image_response = compute.images().get(project=image_project, image=image_name).execute()
        source_disk_image = image_response['selfLink']
        # Use the configured name directly, without the instance number and zone suffix:
        # instance_name = compute_config['instance_config']['name'] + '-' + str(instances+1) + '-' + zone_config['zone']
        instance_name = compute_config['instance_config']['name']
        # Configure the machine
        ...
Run gpu-finder.py. Wait until it finds an available machine for you.
If you do not want to delete the instance, you can simply close the terminal.
The machine could be unavailable the next time you want to use it. It is highly recommended that you use GitHub to manage your projects.
Tmux, a terminal multiplexer. It allows you to run multiple programs in multiple window panes within one terminal. That capability makes Tmux a popular tool for working on a remote server (such as GCP) while connected from a personal computer. If you want to explore more applications of Tmux, read this note: http://deeplearning.lipingyang.org/2017/06/28/tmux-resources/
How to use Tmux for working with GCP:
(base) ecbm4040@your-instance-name: $ conda activate envTF24
Next, create a Tmux session:
(envTF24) ecbm4040@your-instance-name: $ tmux new -s session1
Then you will be in the session named 'session1'.
(envTF24) ecbm4040@your-instance-name: $ jupyter notebook
For more Tmux commands, refer to this link: https://www.hamvocke.com/blog/a-quick-and-easy-guide-to-tmux/.
The biggest advantage of Tmux is that it allows a process to keep running even when your laptop is disconnected from your instance in the cloud. If your network connection is accidentally broken, or you need to close your laptop, the process will keep running in the cloud session unless you kill the whole session. We highly recommend that you train time-consuming deep learning models in a Tmux session.
(envTF24) ecbm4040@your-instance-name: $ pip install runipy
Suppose that you opened a Jupyter Notebook, and used SSH to connect to it. Attach to your created Tmux session ('session1' here). Split the window panes. Switch to the new window pane, then you can use runipy to run your .ipynb file.
For more details, see http://deeplearning.lipingyang.org/2018/03/29/run-jupyter-notebook-from-terminal-with-tmux/.
ECBM E4040 Neural Networks and Deep Learning
Columbia University