I have a large number of files on which I want to run a word analysis: counting how often each word appears in each file. As the final output I want a CSV file with the file names in the header and, for each file, two columns: the word and its count.

file1 word, file1 count, file2 word, file2 count, ...
hello, 4, world, 5, ...
password, 10, save, 2, ...

To achieve this, I open each file (they are PowerPoint presentations) and store the word counts in a hash table. Because each hash table has a different length (a different number of unique words), I put each file's results into its own DataTable and collect those in a DataSet for the export.
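The counting step described above can be tried out independently of PowerPoint. This is a minimal sketch, assuming the text has already been extracted into a plain string; the function name Get-WordCount is hypothetical, and splitting on \W+ is an assumption (it treats apostrophes and hyphens as word separators).

```powershell
# Hypothetical helper: counts how often each word occurs in a piece of text.
# Assumption: words are separated by anything that is not a letter, digit or underscore.
function Get-WordCount {
    param([string]$Text)

    # @{} compares keys case-insensitively, so "Hello" and "hello" share one entry
    $counts = @{}
    foreach ($word in ($Text -split '\W+')) {
        if ($word -eq '') { continue }   # -split can yield empty strings at the edges
        # A missing key returns $null, and $null + 1 evaluates to 1
        $counts[$word] += 1
    }
    return $counts
}

$counts = Get-WordCount -Text 'hello world, hello again'
$counts.GetEnumerator() | Sort-Object Value -Descending
```

If case should matter, a table created with [hashtable]::new([StringComparer]::Ordinal) can be used instead of @{}.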

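# PowerPoint COM application used to open the presentations
$ppt = New-Object -ComObject PowerPoint.Application
$file = Get-ChildItem -Recurse -File

$out = New-Object System.Data.DataSet "ResultsSet"

foreach ($f in $file) {
    # Open read-only, untitled, without a window
    $pres = $ppt.Presentations.Open($f.FullName, $true, $true, $false)
    $id = $f.Name.Substring(0, 5)

    $results = @{} # Hash table for this file
    # Slides 1-3 are skipped
    for ($i = 4; $i -le $pres.Slides.Count; $i++) {
        $s = $pres.Slides.Item($i)
        $shapes = $s.Shapes
        # Only text boxes with more than 100 characters are analyzed
        $textBox = $shapes | Where-Object { $_.TextFrame.TextRange.Length -gt 100 }

        if ($null -ne $textBox) {
            # Count each word; a missing key starts at $null, and $null + 1 is 1
            $textBox.TextFrame.TextRange.Words() |
                ForEach-Object { $_.Text.Trim() } |
                ForEach-Object { $results[$_] += 1 }
        }
    }

    $pres.Close()

    $dt = New-Object System.Data.DataTable
    $dt.TableName = $id
    # [void] discards the objects Add() returns; the second argument sets the column type
    [void]$dt.Columns.Add("$id Word", [string])
    [void]$dt.Columns.Add("$id Count", [int])
    foreach ($r in ($results.GetEnumerator() | Sort-Object Value)) {
        [void]$dt.Rows.Add($r.Key, $r.Value)
    }
    $out.Tables.Add($dt)
}
$ppt.Quit()

$out | Export-Csv results.csv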

There are two main issues:

  1. The number of unique words differs between files, so the hash tables have different lengths.
  2. Files are read one by one, so the results for each file need to be cached before they are exported.
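For issue 2, one option is to keep every file's hash table in a single ordered dictionary and export only after the loop has finished. A minimal sketch, assuming the strings in $texts stand in for the text extracted from each presentation ($texts and $allResults are hypothetical names):

```powershell
# Assumption: $texts stands in for the text extracted from each file, keyed by file id
$texts = [ordered]@{
    file1 = 'hello hello world'
    file2 = 'save password password password'
}

$allResults = [ordered]@{}            # file id -> hash table of word counts, in processing order
foreach ($id in $texts.Keys) {
    $results = @{}                    # a new table on every pass, so cached tables are not overwritten
    foreach ($word in ($texts[$id] -split '\s+')) {
        $results[$word] += 1
    }
    $allResults[$id] = $results       # cache this file's counts until the export
}

# Every table is still available after the loop
$allResults.Keys
$allResults['file2']['password']
```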

However, instead of the output I want, I only get metadata. How can I achieve the correct output?
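The metadata most likely comes from what Export-Csv receives: it writes one column per property of each input object. A DataSet is not enumerated by the pipeline, so it arrives as a single object and its own properties (DataSetName, Tables, and so on) become the columns. A small sketch to check this, using ConvertTo-Csv so nothing is written to disk (the sample rows are made up):

```powershell
# A DataSet passes through the pipeline as one object; its properties become the CSV columns
$ds = New-Object System.Data.DataSet 'ResultsSet'
$dsHeader = $ds | ConvertTo-Csv -NoTypeInformation | Select-Object -First 1
$dsHeader   # header row listing the DataSet's properties, not words and counts

# Objects whose only properties are the wanted columns serialize as intended
$rows = @(
    [pscustomobject]@{ 'file1 word' = 'hello';    'file1 count' = 4 }
    [pscustomobject]@{ 'file1 word' = 'password'; 'file1 count' = 10 }
)
$csv = $rows | ConvertTo-Csv -NoTypeInformation
$csv
```

Piping a DataTable instead gets closer, because PowerShell enumerates its rows, but each DataRow also exposes RowError, RowState, Table, ItemArray and HasErrors, which appear as extra columns unless removed with Select-Object -ExcludeProperty. Either way, Export-Csv writes one object per line, so separate tables cannot end up side by side; they have to be merged into row objects first.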

  • So if you have 3 files, each with 30 unique words, you want to end up with 180 columns? Commented Oct 1, 2020 at 13:10
  • Use Export-Csv -NoTypeInformation to stop it from displaying metadata Commented Oct 1, 2020 at 13:28
  • @Doug: No. In that case I would want 6 columns (a "word" and a "count" column for each of the 3 files) with 30 rows. Commented Oct 1, 2020 at 13:42
  • @Theo -NoTypeInformation only removes the first line with the type information, not the metadata output. Commented Oct 1, 2020 at 13:43
  • Then show us your current output and explain better what the desired output should be Commented Oct 1, 2020 at 13:50

1 Answer

I wrote a simulation of your situation, using hard-coded hash tables in place of the word counts your loop produces.

# File names. The number of files should match the number of hash tables
$Files = 'file1', 'file2', 'file3', 'file4', 'file5'
# Hash table results per file (simulated)
$HashPerFile = [ordered]@{ hello = 4; goodbye = 3; what = 1; is = 7; this = 4 },
    [ordered]@{ password = 2; hope = 1; they = 3; are = 2; not = 5; plain = 2; text = 18 },
    [ordered]@{ help = 6; me = 2; please = 5 },
    [ordered]@{ decrypt = 1; the = 3; problem = 1 },
    [ordered]@{ because = 2; I = 5; cannot = 9 }
# Headers for the object output
$properties = $Files | ForEach-Object { "$_ word"; "$_ count" }

# Determine the maximum number of rows from the longest hash table
$MaxRows = [Linq.Enumerable]::Max([int[]]($HashPerFile | ForEach-Object { $_.Count }))

# Pre-create the result array $r
$r = 1..$MaxRows | ForEach-Object { "" | Select-Object $properties }

# Index into $properties. It selects the correct 'file word' and 'file count' property
$pIndex = 0

# Loop through each file and its hash table
for ($i = 0; $i -lt $Files.Count; $i++) {

    # $rIndex is the index into the $r array.
    # When a new file is selected, it is reset to 0 so we start at the top of $r again.
    $rIndex = 0

    # Iterate the hash table that matches the file. Index $i ensures this.
    $HashPerFile[$i].GetEnumerator() | ForEach-Object {
        $r[$rIndex].$($properties[$pIndex]) = $_.Key
        $r[$rIndex++].$($properties[$pIndex + 1]) = $_.Value
    }

    # Add 2 because there are two properties per file
    $pIndex += 2
}

$r # Output
$r | Export-Csv output.csv -NoTypeInformation # CSV output
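The same idea can be wrapped in a reusable function that takes the real per-file results. This is a sketch, assuming the results are collected in an ordered dictionary keyed by file id; the function name ConvertTo-SideBySide is hypothetical. Unlike the simulation, it sorts each file's words by count, highest first, and it builds every row object directly instead of filling pre-created ones.

```powershell
# Hypothetical helper: turns an ordered dictionary of file id -> word-count hash table
# into one object per CSV row, with an "<id> word" and "<id> count" column per file.
function ConvertTo-SideBySide {
    param(
        [System.Collections.Specialized.OrderedDictionary]$CountsPerFile
    )

    # The longest hash table decides how many rows are needed
    $maxRows = ($CountsPerFile.Values | ForEach-Object { $_.Count } | Measure-Object -Maximum).Maximum

    # Sort each file's entries once, highest count first
    $sorted = [ordered]@{}
    foreach ($name in $CountsPerFile.Keys) {
        $sorted[$name] = @($CountsPerFile[$name].GetEnumerator() | Sort-Object Value -Descending)
    }

    for ($row = 0; $row -lt $maxRows; $row++) {
        $line = [ordered]@{}
        foreach ($name in $sorted.Keys) {
            $entries = $sorted[$name]
            if ($row -lt $entries.Count) {
                $line["$name word"] = $entries[$row].Key
                $line["$name count"] = $entries[$row].Value
            }
            else {
                # Shorter tables leave empty cells in the lower rows
                $line["$name word"] = $null
                $line["$name count"] = $null
            }
        }
        [pscustomobject]$line
    }
}

# Sample data standing in for the $results table built for each file
$sample = [ordered]@{
    file1 = @{ hello = 4; password = 10 }
    file2 = @{ world = 5; save = 2; again = 1 }
}
$rows = ConvertTo-SideBySide -CountsPerFile $sample
$rows | Export-Csv output.csv -NoTypeInformation
```

In the question's script, the DataTable part could then be replaced by storing each table, for example $allResults[$id] = $results with $allResults = [ordered]@{} declared before the foreach, followed by a single ConvertTo-SideBySide -CountsPerFile $allResults | Export-Csv call after the loop.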