I'm new to bash. I installed "seqtk" and "magicblast" by running "conda install" in a terminal, and they work properly when I call them directly in the terminal, or when I put them in a bash script (test.sh) and run that script from the terminal.

This is test.sh

#!/bin/bash

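# Derive the output name by replacing .fastq.gz with .fa
var1=$(sed 's/\.fastq\.gz/\.fa/' <<< "$1")
# Convert FASTQ to FASTA; quoting keeps paths with spaces intact
seqtk seq -a "$1" > "$(basename "$var1")"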

and this works:

$ ./test.sh sample_folder/sample_R1.fastq.gz

But when I try to run the script from Python (test.ipynb):

import os

os.system("./test.sh sample_folder/sample_R1.fastq.gz")

it gives me "seqtk: command not found"

I searched on Google, and I think the cause is that I installed the commands in my shell, while 'os.system' starts a new shell and runs the commands there. That would explain why test.ipynb reports "command not found" even though the commands work properly in the terminal.

How can I solve this issue?

Thanks in advance!

I also tried to solve this with "source", by calling os.system("source HOME/.bash_aliases"), but this failed with "source: not found". I don't really know how to use this command.

  • The problem seems to be with seqtk, not Python. Check that it's a known name by typing which seqtk or seqtk --version. You certainly don't need the source call you are making. Commented Jul 24, 2023 at 15:53
  • What's the output of type setq (in bash)? If it's an alias or a shell function, that's not expected to work in a noninteractive shell. If it's an executable, make sure it's located in a directory listed in print(os.environ['PATH']) (in Python). Commented Jul 24, 2023 at 16:17
  • BTW, os.system('source anything') isn't expected to work. First, source is a bash built-in and system invokes sh, not bash. Second, anything that the source command does only lasts as long as the shell does, and by the time os.system returns the shell it invoked has already exited. Third, aliases are only enabled by default in interactive copies of bash, so even if you run the alias command in a noninteractive shell it does nothing unless you've gone out of your way to turn that feature on. Commented Jul 24, 2023 at 16:18
  • Anyhow -- if you can tell us how the alias is defined, we can tell you how to use subprocess.Popen (which is the modern replacement; nobody should be using os.system anymore) to invoke the target directly without depending on the alias or using sh or bash at all. Commented Jul 24, 2023 at 16:22
  • (err, we need the output in bash of type seqtk, rather) Commented Jul 24, 2023 at 16:24
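
The check suggested in the comments above can be run from inside the notebook itself. This is only a minimal sketch: the helper name diagnose is made up for illustration, and it only tells you whether the Python process can see an executable called seqtk on its own PATH. It cannot detect aliases or shell functions, which exist only inside an interactive shell.

```python
import os
import shutil


def diagnose(command):
    """Report whether `command` is an executable visible to this Python process.

    `diagnose` is a hypothetical helper, not part of any library.
    """
    # Directories this process (and any child it starts) will search.
    path_dirs = os.environ.get("PATH", "").split(os.pathsep)
    # shutil.which returns the full path of the executable, or None.
    resolved = shutil.which(command)
    return {"command": command, "resolved": resolved, "path_dirs": path_dirs}


if __name__ == "__main__":
    info = diagnose("seqtk")
    if info["resolved"] is None:
        print("seqtk is not on this process's PATH. Directories searched:")
        for directory in info["path_dirs"]:
            print("  ", directory)
    else:
        print("seqtk found at", info["resolved"])
```

If this reports that seqtk is missing while which seqtk succeeds in your terminal, the two processes simply have different PATH values.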

2 Answers

I see you named the file containing your Python program test.ipynb, which suggests you are running it in a Jupyter notebook or an IDE. Mamba (like Anaconda) is a virtual environment manager. You have to activate a virtual environment to use it. Activation means updating environment variables, most importantly PATH, which is used to locate external commands. Environment variables are private to each process, so activating a Mamba environment in one shell does not activate it in any other process.
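
To make the point about per-process environment variables concrete, here is a small sketch. The function child_path and the directory /hypothetical/env/bin are invented for the demonstration. It starts a child Python process with a modified copy of the environment and shows that the child sees the change while the parent's own PATH is untouched.

```python
import os
import subprocess
import sys


def child_path(extra_dir=None):
    """Return the PATH seen by a child process (hypothetical helper)."""
    # Work on a copy, so the parent's environment is never modified.
    env = os.environ.copy()
    if extra_dir is not None:
        env["PATH"] = extra_dir + os.pathsep + env.get("PATH", "")
    result = subprocess.run(
        [sys.executable, "-c", "import os; print(os.environ.get('PATH', ''))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # "/hypothetical/env/bin" is a placeholder, not a real install location.
    print("child sees: ", child_path("/hypothetical/env/bin"))
    print("parent sees:", os.environ.get("PATH", ""))
```

The same thing happens in reverse when you activate an environment in a terminal: only that shell and the programs started from it inherit the updated PATH.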

You could do what @pts suggested in their answer but I recommend not doing that. It's brittle and just introduces another source of confusing behavior. You will be better served by changing your workflow to launch your IDE from the shell where you activated the Mamba environment and otherwise learn to run any program that depends on a Mamba environment from that shell.

2 Comments

I actually connect remotely to a server using VS Code and do my work there, so I'm not sure whether your suggestion also works in my situation. Could you please give an example of how to activate a virtual environment by updating env vars? Is it the same as what @pts suggested? I'm new to bash and the shell, so this is a bit confusing for me. Thank you!
@PetrichorWang, What pts suggested does not activate a virtual environment. It is a workaround for not having activated the virtual environment: it explicitly modifies the PATH env var to include the directory of the virtual environment that contains its programs. That is sometimes appropriate but probably not in your case. Furthermore, given what you just wrote, it is unclear if it would even solve your problem. Sorry, but it is going to take a lot of one-on-one hand holding to help you, which is something a SO question isn't suited for.
A quick workaround is doing this in Python:

import os

# Prepend the environment's bin directory so child shells can find its programs.
os.environ['PATH'] = os.pathsep.join((
    '/mnt/data/my_folder/mambaforge/bin', os.environ['PATH']))
os.system('seqtk ...')  # It will work.
os.system('./test.sh ...')  # It will work.

However, for long term maintainability, you should activate the virtual environment instead. Doing so will take care of modifying PATH and other relevant environment variables. See the answer by @KurtisRader for details.
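
If relaunching the notebook from an activated shell is not practical, one alternative is to let conda set up the environment for a single command through its run subcommand. The sketch below rests on several assumptions: the conda executable itself must be findable by the Python process, the environment name "base" is a placeholder, and build_env_command and run_in_env are hypothetical helpers. A Mamba install may offer an equivalent subcommand, but that is not verified here.

```python
import shutil
import subprocess


def build_env_command(env_name, command, runner="conda"):
    """Prefix `command` so that it executes inside the named environment."""
    return [runner, "run", "-n", env_name, *command]


def run_in_env(env_name, command, runner="conda"):
    """Run `command` inside `env_name`; raises if the runner is not on PATH."""
    if shutil.which(runner) is None:
        raise FileNotFoundError(f"{runner!r} is not on PATH for this process")
    # A list argument avoids starting an intermediate sh to parse a string.
    return subprocess.run(build_env_command(env_name, command, runner), check=True)


# Show the command that would be executed; "base" is a placeholder name.
print(build_env_command("base", ["./test.sh", "sample_folder/sample_R1.fastq.gz"]))
```

Calling run_in_env("base", ["./test.sh", "sample_folder/sample_R1.fastq.gz"]) would then execute the script with that environment's PATH, without changing the notebook's own environment.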

2 Comments

Btw, is there any way to make the program run faster? Using seqtk and magicblast to combine sample1_R1.fastq.gz and sample1_R2.fastq.gz takes about 2 min per sample.
@PetrichorWang, You should open a new question regarding your performance problem. Each StackOverflow question should be focused on a single issue.
