
Here is my sample data:

Option failonnomatch on
Option batch on
Option confirm Off
open sftp://username:password@host.name.net:22 hostkey="ssh-rsa 1024 00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00"

get File*.txt \local\path\Client\File.txt
mv File*.txt /remote/archive/

close
exit

I would like to create a PowerShell script to extract pieces of information from this text file.

List of items I need:

  • Username
  • Password
  • Host
  • Port
  • ssh key
  • File Name
  • Local Path
  • Remote Path

I'm hoping that if I learn how to do a couple of these, the method will be applicable to all items. I attempted to extract the ssh key with the following PowerShell/regex:

$doc -match '(?<=hostkey=")(.*)(?=")' 

$doc being the sample data

but it appears to be returning the whole line. Any help would be greatly appreciated. Thank you.

  • If they're all key/value pairs like that, just use (?<=\bkey=")([^"]*)(?=") Or, you could do a global match using (?<=\b\w+=")([^"]*)(?=") Commented Nov 15, 2018 at 21:56
  • Your command will only return $true/$false. To return a value, you need to evaluate the $Matches collection. Also, which file do you refer to? Edit your question to include some sample data. Commented Nov 15, 2018 at 22:12
  • what part of the last line is the "file" and what part is the "path"? the File*.txt looks like a file specification. the next part seems to be the full file name. i presume you want that broken into \SERVER\Path\Client & File.txt but i'm unsure of that. Commented Nov 15, 2018 at 22:20

2 Answers


If -match is returning a whole line, the implication is that the LHS of your -match operation is an array, which in turn suggests that you used Get-Content without -Raw, which yields the input as an array of lines, in which case -match acts as a filter.

Instead, read your file as a single, multi-line string with Get-Content -Raw; with a scalar LHS, -match then returns a [bool], and the results of the matching operation are reported in the automatic variable $Matches (a hashtable whose 0 entry contains the overall match, 1 what the 1st capture group matched, ...):

# Read file as a whole, into a single, multi-line string.
$doc = Get-Content -Raw file.txt 

if ($doc -match '(?<=hostkey=")(.*)(?=")') {
   # Output what the 1st capture group captured
   $Matches[1]
}

With your sample input, the above yields
ssh-rsa 1024 00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00
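
To make the array-vs-scalar difference concrete, here is a minimal sketch. The $lines array is a hypothetical stand-in for what Get-Content without -Raw would return, and the host key is shortened for brevity:

```powershell
# Hypothetical stand-in for the lines Get-Content (without -Raw) would return.
$lines = 'Option batch on',
         'open sftp://username:password@host.name.net:22 hostkey="ssh-rsa 1024 00:00"',
         'close'

$pattern = '(?<=hostkey=")(.*)(?=")'

# Array LHS: -match acts as a filter and returns the matching elements (whole lines).
$filtered = $lines -match $pattern

# Scalar LHS: -match returns a [bool] and populates $Matches.
$text  = $lines -join "`n"
$found = $text -match $pattern
$key   = $Matches[1]

$filtered   # the whole 'open ...' line
$found      # True
$key        # ssh-rsa 1024 00:00
```

Joining the lines with "`n" is just a way to simulate -Raw here; with a real file, Get-Content -Raw is the simpler route.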


You can then extend the approach to capture multiple tokens, in which case I suggest using named capture groups ((?<name>...)); the following example uses such named capture groups to extract several of the tokens of interest:

if ($doc -match '(?<=sftp://)(?<username>[^:]+):(?<password>[^@]+)@(?<host>[^:]+)'){
  # Output the named capture-group values.
  # Note that index notation (['username']) and property
  # notation (.username) can be used interchangeably.
  $Matches.username
  $Matches.password
  $Matches.host
}

With your sample input, the above yields:

username
password
host.name.net

You can extend the above to capture all tokens of interest.
Note that . by default doesn't match \n (newline) characters.
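
As a quick illustration of that default, and of the inline s (Singleline) option that changes it, consider this small sketch with a made-up two-line string:

```powershell
# Made-up two-line input, for illustration only.
$text = "first`nsecond"

# By default, . stops at the newline.
$default = [regex]::Match($text, 'f.+').Value

# With the inline s (Singleline) option, . also matches the newline.
$singleLine = [regex]::Match($text, '(?s)f.+').Value

$default      # first
$singleLine   # both lines
```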


Optional reading: Using the x (IgnorePatternWhitespace) option to make regexes more readable:

Extracting that many tokens can result in a complex regex that is hard to read, in which case the x (IgnorePatternWhitespace) regex option can help (as an inline option, (?x) at the start of the regex):

if ($doc -match '(?x)
    (?<=sftp://)(?<username>[^:]+)
    :(?<password>[^@]+)
    @(?<host>[^:]+)
    :(?<port>\d+)
    \s+hostkey="(?<sshkey>.+?)"
    \n+get\ File\*\.txt\ (?<localpath>.+)
    \nmv\ File\*\.txt\ (?<remotepath>.+)
  '){
    # Output the named capture-group values.
    $Matches.GetEnumerator() | ? Key -ne 0
}

Note how the whitespace used for making the regex more readable (spreading it across multiple lines) is ignored while matching, whereas whitespace to be matched in the input must be escaped (e.g., to match a single space, use a backslash followed by a space, or [ ]; use \s to match any whitespace char).

With your sample input, the above yields the following:

Name                           Value
----                           -----
host                           host.name.net
localpath                      \local\path\Client\File.txt
port                           22
sshkey                         ssh-rsa 1024 00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00
remotepath                     /remote/archive/
password                       password
username                       username

Note that the reason the capture groups are out of order is that $Matches is a hash table (of type [hashtable]), whose key enumeration order is an implementation artifact: no particular enumeration order is guaranteed.

However, random access to capture groups works just fine; e.g., $Matches.port will return 22.
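
One caveat: the \n in the regex above assumes LF-only line endings. If the file uses CRLF (common on Windows), the \r that precedes each \n would prevent \n+get from matching right after the closing quote. The following is a sketch of a more tolerant variant, not a definitive pattern: it uses \s+ between lines, excludes \r and \n from the path captures, adds a hypothetical filename group for the file spec, and collects the values in a [pscustomobject] so that they are output in a fixed order. The sample string is constructed with explicit CRLF line endings and a shortened host key.

```powershell
# Hypothetical sample with explicit CRLF line endings and a shortened host key.
$doc = "open sftp://username:password@host.name.net:22 hostkey=`"ssh-rsa 1024 00:00`"`r`n`r`n" +
       "get File*.txt \local\path\Client\File.txt`r`n" +
       "mv File*.txt /remote/archive/`r`n"

$pattern = '(?x)
    (?<=sftp://)(?<username>[^:]+)
    :(?<password>[^@]+)
    @(?<host>[^:]+)
    :(?<port>\d+)
    \s+hostkey="(?<sshkey>[^"]+)"
    \s+get\ +(?<filename>\S+)\ +(?<localpath>[^\r\n]+)
    \s+mv\ +\S+\ +(?<remotepath>[^\r\n]+)
'

$result = $null
if ($doc -match $pattern) {
    # A custom object preserves the property order given here.
    $result = [pscustomobject]@{
        Username   = $Matches.username
        Password   = $Matches.password
        Host       = $Matches.host
        Port       = $Matches.port
        SshKey     = $Matches.sshkey
        FileName   = $Matches.filename
        LocalPath  = $Matches.localpath
        RemotePath = $Matches.remotepath
    }
}
$result
```

Because \s matches both \r and \n, the same pattern should also work on LF-only input.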


4 Comments

I like this method as the regex seems to make a little more sense but I'm getting stuck when I go down to grab the file name. I think it's because I'm moving to a new line but I'm not sure how to include that in the regex. Thank you. (?<=sftp://)(?<username>[^:]+):(?<password>[^@]+)@(?<host>[^:]+):(?<port>[^-]+) -hostkey="(?<sshkey>[^"]+)(?<=get )(?<filename>[^/])
@MichaelSPalatsi: You need to match intervening whitespace as well (and, as stated, by default . doesn't match \n (newlines)). Please see my update for using the IgnorePatternWhitespace regex option to make complex expressions more manageable.
Awesome! That will certainly clean things up. I believe I have one last question. Say I have a group of files and I intend on using this regex against all of those files BUT in some files, one of my groupings is likely to not match anything. How can I handle that?
Glad to hear it, @MichaelSPalatsi. As for your follow-up question: that's hard to answer in the abstract. I suggest you create a new question that focuses just on that problem with specific examples. Feel free to ping me here once you have done so, and I'm happy to take a look.

this uses named matches with flags set to singleline, multiline, case insensitive and then uses $Matches.MatchName to get the items into a custom object.

# fake reading in a text file as one string
#    in real life, use Get-Content -Raw
$InStuff = @'
open sftp://username:password@host.name.net:22 hostkey="ssh-rsa 1024 00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00"

get File*.txt \SERVER\Path\Client\File.txt
'@

$Null = $InStuff -match '(?smi).+//(?<UserName>.+):(?<Password>.+)@(?<HostName>.+):(?<Port>.+) hostkey="(?<SshKey>.+)".+get .+ (?<FullFileName>\\.+)$'

[PSCustomObject]@{
    UserName = $Matches.UserName
    Password = $Matches.Password
    Port = $Matches.Port
    SshKey = $Matches.SshKey
    PathName = Split-Path -Path $Matches.FullFileName -Parent
    FileName = Split-Path -Path $Matches.FullFileName -Leaf
    }

output ...

UserName : username
Password : password
Port     : 22
SshKey   : ssh-rsa 1024 00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00
PathName : \SERVER\Path\Client
FileName : File.txt

6 Comments

It's an effective solution (+1), but if you provide a complete solution that is specific to the OP's exact scenario without addressing the misconceptions implied by the question (around how -match works), you'll make the OP very happy, but future readers with similar misconceptions - but different scenarios - won't necessarily benefit.
@mklement0 - i see what you mean ... i thought the mention of "as one string" covered that idea. yours is far more detailed on the subject. i'll try to keep that in mind. [grin]
Hi Lee, I failed to mention I have additional lines preceding and following the given sample. How can I accommodate those lines? Thank you.
@MichaelSPalatsi - you will need to add a complete text to your original post so that folks can have a realistic sample to code against. if the text is too long, post it to Pastebin or Gist.GitHub and add a link to it into your OP.
@Lee_Dailey That makes sense. Sorry, I'm new. :) I've updated the OP.