Suppose I receive an HTML-formatted string and want to count the number of words and characters in it. For example, I am getting:

var HTML =  '<p>BODY&nbsp;Article By Archie(Used By Story Tool)</p>';

Now I want to get the number of words and characters.

The above HTML renders as:

BODY Article By Archie(Used By Story Tool)

IMPORTANT

  1. I want to ignore HTML tags when counting words or characters.
  2. Entities such as **&nbsp;** should be ignored as well.
  3. For the current example, the words and characters should be counted for:
    BODY Article By Archie(Used By Story Tool)

Please help.
Thank you.

3 Answers

3

To give an example for adamantium's suggestion:

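var e = document.createElement("span");
// Let the browser parse the markup: tags are dropped and &nbsp; is decoded.
e.innerHTML = '<p>BODY&nbsp;Article By Archie(Used By Story Tool)</p>';
var text = e.textContent || e.innerText;

var characterCount = text.length;
// Split on runs of whitespace, periods, parentheses and commas.
// A delimiter at the very start or end leaves an empty string in the array.
var wordCount = text.split(/[\s\.\(\),]+/).length;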

Update: Added other word-stop characters
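
The split above can return an empty string when the text starts or ends with a delimiter, which inflates the count by one. A minimal sketch of one way around that, assuming the plain text has already been extracted; `countWords` is a hypothetical helper name, not something from the original answer:

```javascript
// Hypothetical helper: counts words in already-extracted plain text.
function countWords(text) {
  // Split on runs of whitespace, periods, parentheses and commas,
  // then drop the empty strings left by leading or trailing delimiters.
  return text.split(/[\s.(),]+/).filter(Boolean).length;
}

// "\u00a0" is the character that &nbsp; decodes to; \s matches it in JavaScript.
const sample = "BODY\u00a0Article By Archie(Used By Story Tool)";

console.log(countWords(sample)); // 8
console.log(sample.length);      // 42
```

Filtering after the split keeps the delimiter list in one place, so adding more word-stop characters only means editing the character class.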

4 Comments

Thanks for replying. I have a follow-up question: words can be separated by a blank space, full stop, comma, opening round bracket (, closing round bracket ), etc. How can that be handled? The above code gives a word count of 9; I think it should be 8. Please reply.
Your earlier example worked with K Prime's idea of using .match() with a RegExp. Anyway, thanks for the reply.
Can you also tell me how to identify a number (digit) in the same kind of example, i.e. e.innerHTML ='<p>Super By My 4 Story</p>'; I want to leave digits out of the word count.
To eliminate numeric sections, you'd use something like: /\s+[0-9]+\s+|[\s\.\(\),]+/
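
The single-regex split above only skips a number that has whitespace on both sides, so a number at the very start or end of the text would still be counted. A rough alternative, under the assumption that "digit" means a token made up only of 0-9, is to split first and filter afterwards; `countNonNumericWords` is a hypothetical name used for illustration:

```javascript
// Hypothetical helper: counts words, skipping tokens that are digits only.
function countNonNumericWords(text) {
  return text
    .split(/[\s.(),]+/)
    // Drop empty strings and tokens consisting solely of digits.
    .filter((token) => token !== "" && !/^[0-9]+$/.test(token))
    .length;
}

console.log(countNonNumericWords("Super By My 4 Story")); // 4
console.log(countNonNumericWords("4 Story 2"));           // 1
```

Mixed tokens such as "4th" still count as words here; whether they should is a judgment call the question does not settle.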
2
  1. Use a hidden HTML element that can render text, such as a span or p.

  2. Assign the string to the innerHTML of the hidden element.

  3. Count the characters using the length property of innerText/textContent.

To get the word count you can:

  1. Split the innerText/textContent on whitespace.

  2. Get the length of the returned array.

3 Comments

Instead of split, I'd recommend using RegExp to take into account non-space breaks - for example, text.match(/\b\w/g).length
Note that split() can also take a regular expression as an argument: developer.mozilla.org/En/Core_JavaScript_1.5_Reference/Objects/…
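
A small sketch of the match-based count suggested above, with a guard for the case where nothing matches (match() returns null then, so reading .length directly would throw). `countWordStarts` is a hypothetical name, and the input is assumed to be plain text already:

```javascript
// Hypothetical helper: counts positions where a word character follows a word boundary.
function countWordStarts(text) {
  const matches = text.match(/\b\w/g);
  // match() with the g flag returns null, not an empty array, when nothing matches.
  return matches === null ? 0 : matches.length;
}

console.log(countWordStarts("BODY\u00a0Article By Archie(Used By Story Tool)")); // 8
console.log(countWordStarts("")); // 0
```

Note that \w covers only ASCII letters, digits and underscore, so a word like "don't" is counted as two word starts with this pattern.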
Can you also tell me how to identify a number (digit) in the same kind of example, i.e. e.innerHTML ='<p>Super By My 4 Story</p>'; I want to leave digits out of the word count.
0

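Algorithm:

  • Sweep through the entire HTML string
  • Perform regex replaces
    • replace <[^>]*> (a regex for anything that sits within <>) with nothing; a greedy <.*> would swallow everything from the first < to the last >
    • replace /&nbsp;/g with a single space, so that the words on either side stay separate
  • Tip: this can be done with the replace function in JavaScript; look it up on w3schools.com

Now you have the clutter out!

Then perform a simple word/character count.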
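
The steps above might be sketched as follows. This is an illustration under stated assumptions, not a general HTML parser: it only decodes &nbsp;, it will misbehave if a > appears inside an attribute value, and it swaps each tag for a space (rather than nothing) so that text in adjacent block elements does not run together. `htmlToText` and `countTextStats` are hypothetical names:

```javascript
// Hypothetical helper: crude tag and &nbsp; removal using regex replaces.
function htmlToText(html) {
  return html
    .replace(/<[^>]*>/g, " ") // non-greedy-style tag match: stops at the first ">"
    .replace(/&nbsp;/g, " ")  // treat the non-breaking space entity as a space
    .replace(/\s+/g, " ")     // collapse runs of whitespace
    .trim();
}

// Hypothetical helper: character and word counts for the stripped text.
function countTextStats(html) {
  const text = htmlToText(html);
  const words = text.split(/[\s.(),]+/).filter(Boolean).length;
  return { characters: text.length, words: words };
}

const stats = countTextStats("<p>BODY&nbsp;Article By Archie(Used By Story Tool)</p>");
console.log(stats.characters); // 42
console.log(stats.words);      // 8
```

Where a DOM is available, letting the browser parse the markup through innerHTML and textContent is usually more robust than regex stripping.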
