How can I extract the TypeScript code from this HTML page? The page has a p element with the file name and a pre element whose tokens are wrapped in spans with the classes "synStatement", "synIdentifier", "synConstant", and "synType". I am learning jsoup. The output of my Java jsoup program is incomplete and not formatted properly.

<p>Currency.ts</p>

<pre class="code lang-typescript" data-lang="typescript" data-unlink><span class="synStatement">export</span> <span class="synStatement">type</span> Currency <span class="synStatement">=</span> <span class="synIdentifier">{</span>
  unit: <span class="synConstant">'EUR'</span> | <span class="synConstant">'GBP'</span> | <span class="synConstant">'JPY'</span> | <span class="synConstant">'USD'</span>
  value: <span class="synType">number</span>
<span class="synIdentifier">}</span>

<span class="synStatement">export</span> <span class="synStatement">const</span> Currency <span class="synStatement">=</span> <span class="synIdentifier">{</span>
  <span class="synStatement">from(</span>value: <span class="synType">number</span><span class="synStatement">,</span> unit: Currency<span class="synIdentifier">[</span><span class="synConstant">'unit'</span><span class="synIdentifier">]</span> <span class="synStatement">=</span> <span class="synConstant">'USD'</span><span class="synStatement">)</span>: Currency <span class="synIdentifier">{</span>
    <span class="synStatement">return</span> <span class="synIdentifier">{</span> unit<span class="synStatement">,</span> value <span class="synIdentifier">}</span>
  <span class="synIdentifier">}</span>
<span class="synIdentifier">}</span>
</pre>

Desired output:

Currency.ts

export type Currency = {
  unit: 'EUR' | 'GBP' | 'JPY' | 'USD'
  value: number
}

export const Currency = {
  from(value: number, unit: Currency['unit'] = 'USD'): Currency {
    return { unit, value }
  }
}

I tried:

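import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;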

public class Currency
{
    public static void main( String[] args )
    {
        try {
            File input = new File("Currency.html");
            Document doc = Jsoup.parse(input, "UTF-8", "");
            List<String> typescriptCode = new ArrayList<String>();
            String strs[] = {
                "synStatement",
                "synIdentifier",
                "synConstant",
                "synType",
            };
            for (String str : strs) {
                Elements spansWithsynStatementElements = doc.select("span." + str);
                if (spansWithsynStatementElements != null) {
                    for (Element e : spansWithsynStatementElements) {
                        String text = "";
                        text += e.ownText();
                        typescriptCode.add(text);
                    }
                }
            }
            
            int size = typescriptCode.size();
            for (int i = 0; i < size; i++) {
                System.out.println(typescriptCode.get(i));
                System.out.println("");
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

1 Answer

If formatting is not an issue for you, you can simply extract and print the text:

String script = doc.text(); 
System.out.println(script);

The output is:

Currency.ts export type Currency = { unit: 'EUR' | 'GBP' | 'JPY' | 'USD' value: number}export const Currency = { from(value: number, unit: Currency['unit'] = 'USD'): Currency { return { unit, value } }}

If you want formatted output, you'll have to use a pretty-printing library. You can look here for example.
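As an alternative to a separate pretty-printer, the original line breaks and indentation inside the pre element can usually be kept by reading its un-normalized text. The following is a minimal sketch, not a definitive solution. It assumes a reasonably recent jsoup release on the classpath (Element.wholeText() and selectFirst() are not available in very old versions), and the class name ExtractTypescript, the helper extractCode, and the shortened inline HTML are made up for illustration:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ExtractTypescript {

    // Hypothetical helper: returns the raw text of the first <pre class="code ..."> block.
    // wholeText() concatenates all descendant text nodes without collapsing whitespace,
    // so the newlines and indentation that text() would normalize are kept.
    static String extractCode(String html) {
        Document doc = Jsoup.parse(html);
        Element pre = doc.selectFirst("pre.code");
        return pre == null ? "" : pre.wholeText();
    }

    public static void main(String[] args) {
        // Shortened stand-in for Currency.html (assumption: same structure as in the question).
        String html = "<p>Currency.ts</p>\n"
            + "<pre class=\"code lang-typescript\"><span class=\"synStatement\">export</span> "
            + "<span class=\"synStatement\">type</span> Currency <span class=\"synStatement\">=</span> "
            + "<span class=\"synIdentifier\">{</span>\n"
            + "  value: <span class=\"synType\">number</span>\n"
            + "<span class=\"synIdentifier\">}</span>\n"
            + "</pre>";

        Document doc = Jsoup.parse(html);
        Element fileName = doc.selectFirst("p");

        // File name first, then a blank line, then the code with its layout intact.
        if (fileName != null) {
            System.out.println(fileName.text());
            System.out.println();
        }
        System.out.print(extractCode(html));
    }
}
```

Selecting the pre element as a whole also avoids the problem in the question's attempt: looping over one span class at a time groups the tokens by class instead of document order, and it drops the text that sits between spans (such as "Currency" or "unit:").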
