I have an SSH.py script whose goal is to connect to many servers over SSH and run a Python script (worker.py) on each. I am using Paramiko, but I am very new to it and learning as I go. On each server I SSH into, I need to keep the Python script running: this is for training a model in parallel, so the script needs to run on all machines to update model parameters and train jointly. The Python script on the servers needs to keep running, so either the SSH connections must all stay open or I have to figure out a way for the script to keep running on the servers even if I close the connection.
From extensive googling, it looks like you can achieve this with nohup or:
client = paramiko.SSHClient()
# connect()'s second positional parameter is the port, so pass credentials by keyword
client.connect(ip_address, username=username, password=password)
transport = client.get_transport()
channel = transport.open_session()
channel.exec_command("python worker.py > /logs/'command output' 2>&1")
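For concreteness, the nohup variant I have in mind would wrap the command roughly as below. This is only a sketch: it assumes a POSIX shell on the remote side (nohup does not exist on Windows hosts), and `build_nohup_command` and the log path are hypothetical names of mine, not part of Paramiko.

```python
import shlex


def build_nohup_command(script, log_path):
    """Build a shell line that starts `script` detached from the SSH session.

    Hypothetical helper; assumes the remote login shell is POSIX (sh/bash).
    """
    # nohup makes the process ignore the hangup signal sent when the session ends;
    # redirecting stdin/stdout/stderr keeps the process from holding the channel open;
    # the trailing & returns control to the shell immediately.
    return (
        f"nohup python {shlex.quote(script)} "
        f"> {shlex.quote(log_path)} 2>&1 < /dev/null &"
    )


command = build_nohup_command("worker.py", "/logs/command output/worker.log")
print(command)
```

The resulting string would then be passed to channel.exec_command(command) in place of the plain command.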
However, what is unclear to me is how to close all the SSH connections. I am running SSH.py from cmd.exe; would closing cmd.exe be enough to stop all the remote processes?
In addition, is my use of client.close() correct for my purposes?
Please see below what I have for my code.
# SSH.py
import paramiko
import argparse
import os
path = "path"
python_script = "worker.py"
# definitions for ssh connection and cluster
ip_list = ['XXX.XXX.XXX.XXX', 'XXX.XXX.XXX.XXX', 'XXX.XXX.XXX.XXX']
port_list = [':XXXX', ':XXXX', ':XXXX']
user_list = ['user', 'user', 'user']
password_list = ['pass', 'pass', 'pass']
node_list = [f'-node{i + 1} ' for i in range(len(ip_list))]
cluster = ' '.join(node + ip + port for node, ip, port in zip(node_list, ip_list, port_list))
# run script on command line of local machine
# (os.system blocks until this local worker process exits)
os.system(f"cd {path} && python {python_script} {cluster} -type worker -index 0 -batch 64 > {path}/logs/'command output'/{ip_list[0]}.log 2>&1")
# loop for IP and password
for i, (ip, user, password) in enumerate(zip(ip_list[1:], user_list[1:], password_list[1:]), 1):
    try:
        print("Open session in: " + ip + "...")
        client = paramiko.SSHClient()
        # connect()'s second positional parameter is the port, so pass credentials by keyword
        client.connect(ip, username=user, password=password)
        transport = client.get_transport()
        channel = transport.open_session()
    except paramiko.SSHException:
        print("Connection Failed")
        quit()
    try:
        # Channel.exec_command() takes only the command; the timeout is set on the channel
        channel.settimeout(30)
        channel.exec_command(f"cd {path} && python {python_script} {cluster} -type worker -index {i} -batch 64 > {path}/logs/'command output'/{ip_list[i]}.log 2>&1")
        client.close() # here I am closing connection but above command should be running, my question is can I safely close cmd.exe on which I am running SSH.py?
    except paramiko.SSHException:
        print("Cannot run file. Continue with other IPs in list...")
        client.close()
        continue
The code is based on "Running process of remote SSH server in the background using Python Paramiko".
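To make the arguments concrete, this is how the cluster string passed to worker.py comes out for some made-up addresses. The IPs and ports below are placeholders for illustration, not my real ones.

```python
# Made-up addresses and ports, purely for illustration.
ip_list = ['10.0.0.1', '10.0.0.2', '10.0.0.3']
port_list = [':2222', ':2222', ':2222']

# One "-nodeN " prefix per machine, numbered from 1.
node_list = [f'-node{i + 1} ' for i in range(len(ip_list))]

# Each entry becomes "-nodeN ip:port"; entries are separated by single spaces.
cluster = ' '.join(node + ip + port for node, ip, port in zip(node_list, ip_list, port_list))
print(cluster)
# -node1 10.0.0.1:2222 -node2 10.0.0.2:2222 -node3 10.0.0.3:2222
```

Every machine receives the same cluster string; only the -index value differs between them.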
Edit: It seems that channel.exec_command() is not executing the command:
f"cd {path} && python {python_script} {cluster} -type worker -index {i} -batch 64 > {path}/logs/'command output'/{ip_list[i]}.log 2>&1"
So I wonder if it is because of client.close(). What would happen if I commented out all the lines with client.close()? Would this help? Is it dangerous? When I quit my local Python script, would that close all my SSH connections, so that client.close() is not needed?
Also, all my machines run Windows.
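To make the alternative from my edit concrete, this is roughly what keeping the connections open would look like: start the command on every host, hold on to each client and channel, wait for the remote exit status, and only then close. It is a sketch that I have not run against my cluster. `run_on_all`, `command_for` and `open_client` are hypothetical names of mine, and `open_client` is assumed to return an already connected paramiko.SSHClient.

```python
def run_on_all(hosts, command_for, open_client):
    """Run a command on every host; close connections only after each one exits.

    hosts       -- iterable of host identifiers (e.g. IP strings)
    command_for -- hypothetical callable: host -> command string
    open_client -- hypothetical callable: host -> connected paramiko.SSHClient
    """
    clients = []
    running = []
    exit_codes = {}
    try:
        # Phase 1: start the command everywhere without waiting for it to finish.
        for host in hosts:
            client = open_client(host)
            clients.append(client)
            channel = client.get_transport().open_session()
            channel.exec_command(command_for(host))
            running.append((host, channel))

        # Phase 2: block until each remote process exits and record its exit code.
        for host, channel in running:
            exit_codes[host] = channel.recv_exit_status()
    finally:
        # Close every connection that was opened, even if something failed above.
        for client in clients:
            client.close()
    return exit_codes
```

With this shape, SSH.py itself has to stay alive for the whole training run, so closing cmd.exe early would still end the sessions.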