I have data like the following:

  1. 12 x ATG 370 g, 12 x 720 ml, 1 Glas = 0.97, 1 kg = 2.03
  2. versch. Sorten, 2 x 250 g, 1 Packung = 1.-, 100 g = 0.40
  3. 2 x 950 g, 1 Packung = 4.98, 1 kg = 4.47, tiefgekühlt
  4. versch. Sorten, 2 x 500 g, 1 Packung = 0.65, 1 kg = 1.-
  5. 3,5 % Fett, 3 x 1 Liter, 1 Packung = 0.76, 1 Liter = 0.60
  6. Krönung Balance gemahlen oder Krönung Aroma ganze Kaffeebohnen, 500 g, 1 kg = 6.44
  7. versch. Sorten, 400 g, 1 kg = 5.60
  8. 400 g, versch. Sorten, 1 kg = 5.60

Expected Outcome

  1. 12 x 720 ml => { pack: 12, weight:720 , unit: ml }
  2. 2 x 250 g => { pack: 2, weight:250 , unit: g }
  3. 2 x 950 g => { pack: 2, weight:950 , unit: g }
  4. 2 x 500 g => { pack: 2, weight:500 , unit: g }
  5. 3 x 1 Liter => { pack: 3, weight:1 , unit: Liter }
  6. 500 g => { pack: 1, weight:500 , unit: g }
  7. 400 g => { pack: 1, weight:400 , unit: g }
  8. 400 g => { pack: 1, weight:400 , unit: g }

I tried the following code

const re = /^(\d+x)?([\d,]+)([a-z]+)/gm;

str.split(",").forEach(v => {
   const value = v.replace(/\s/g, "")
   let arr = [...value.matchAll(re)];
   console.log(arr[0]);
})

Results for the input strings using the code above:

  1. 12 x ATG 370 g, 12 x 720 ml, 1 Glas = 0.97, 1 kg = 2.03

["12x", undefined, "12", "x"] ["12x720ml", "12x", "720", "ml"] undefined ["1kg", undefined, "1", "kg"]

  2. versch. Sorten, 2 x 250 g, 1 Packung = 1.-, 100 g = 0.40

undefined ["2x250g", "2x", "250", "g"] undefined ["100g", undefined, "100", "g"]

and so on...

I can't figure out how to extract the desired data, or whether it is even possible, since the required data does not appear at a fixed position in the string.

EDIT ( NEW )

Wiktor Stribiżew's solution works perfectly for the above cases.

New Requirement -

  1. 12 x ATG 370 g, 12 x 720 ml, 1 Glas = 0.97, 1 kg = 2.03
  2. versch. Sorten, 2 x 250 g, 1 Packung = 1.-, 100 g = 0.40
  3. 2 x 950 g, 1 Packung = 4.98, 1 kg = 4.47, tiefgekühlt
  4. versch. Sorten, 2 x 500 g, 1 Packung = 0.65, 1 kg = 1.-
  5. 3,5 % Fett, 3 x 1 Liter, 1 Packung = 0.76, 1 Liter = 0.60
  6. Krönung Balance gemahlen oder Krönung Aroma ganze Kaffeebohnen, 400 - 500 g, 1 kg = 6.44 ( Range )
  7. versch. Sorten, 400 g, 1 kg = 5.60
  8. 100 - 400 g, versch. Sorten, 1 kg = 5.60 ( Range )

Expected Outcome

  1. 12 x 720 ml => { pack: 12, minweight:720 , maxweight: 0, unit: ml }
  2. 2 x 250 g => { pack: 2, minweight:250 , maxweight: 0, unit: g }
  3. 2 x 950 g => { pack: 2, minweight:950 , maxweight: 0, unit: g }
  4. 2 x 500 g => { pack: 2, minweight:500 , maxweight: 0, unit: g }
  5. 3 x 1 Liter => { pack: 3, minweight:1 , maxweight: 0, unit: Liter }
  6. 400 - 500 g => { pack: 1, minweight:400 , maxweight: 500, unit: g }
  7. 400 g => { pack: 1, minweight:400 , maxweight: 0, unit: g }
  8. 100 - 400 g => { pack: 1, minweight:100 , maxweight: 400, unit: g }
  • Can you have multiple valid sub-strings in your string? Something like "1 x 1 mg, 1000 l"? Why not use two separate regular expressions? Commented Jan 3, 2021 at 20:04
  • @PM77-1 I highly doubt it, but for the time being I can assume no. Can you please elaborate on your suggestion to use two separate regular expressions? Commented Jan 3, 2021 at 20:06

1 Answer

You can use

const arr = ['12 x ATG 370 g, 12 x 720 ml, 1 Glas = 0.97, 1 kg = 2.03','versch. Sorten, 2 x 250 g, 1 Packung = 1.-, 100 g = 0.40','2 x 950 g, 1 Packung = 4.98, 1 kg = 4.47, tiefgekühlt','versch. Sorten, 2 x 500 g, 1 Packung = 0.65, 1 kg = 1.-','3,5 % Fett, 3 x 1 Liter, 1 Packung = 0.76, 1 Liter = 0.60','Krönung Balance gemahlen oder Krönung Aroma ganze Kaffeebohnen, 400 - 500 g, 1 kg = 6.44','versch. Sorten, 400 g, 1 kg = 5.60','100 - 400 g, versch. Sorten, 1 kg = 5.60'];
// Optional "N x" pack prefix, then a weight (a single number or a "min - max" range),
// then a unit, all inside one comma-delimited segment.
const re = /(?:,\s*|^)(?:(\d+)\s*x\s*)?(\d+(?:\s*-\s*\d+)?)\s*([a-zA-Z]+)(?:$|,)/;
arr.forEach(str => {
   const m = str.match(re);
   if (!m) return; // skip strings without a size segment instead of throwing on destructuring
   let [, pack, weight, unit] = m;
   pack = pack || 1; // default to 1 when there is no "N x" prefix
   console.log(str, { pack, weight, unit });
});

The regex matches:

  • (?:,\s*|^) - either a comma followed by zero or more whitespace characters, or the start of the string
  • (?:(\d+)\s*x\s*)? - an optional sequence of
    • (\d+) - Capturing group 1 (pack): one or more digits
    • \s*x\s* - x surrounded by zero or more whitespace characters
  • (\d+(?:\s*-\s*\d+)?) - Capturing group 2 (weight): one or more digits, optionally followed by a - surrounded by zero or more whitespace characters and then one or more digits
  • \s* - zero or more whitespace characters
  • ([a-zA-Z]+) - Capturing group 3 (unit): one or more letters
  • (?:$|,) - either the end of the string or a comma

See the regex demo.
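
The snippet above returns a range as a single string such as "400 - 500". To get the minweight / maxweight shape from the edited question, one option is to capture the upper bound in its own group. The following is only a sketch under assumptions: the helper name parseSize and the pattern name sizeRe are hypothetical, and the pattern is a variation of the one above, not the original answer's code.

```javascript
// Variation of the pattern above: the upper bound of a range gets its own group.
// Groups: 1 = pack (optional), 2 = min weight, 3 = max weight (optional), 4 = unit.
const sizeRe = /(?:,\s*|^)(?:(\d+)\s*x\s*)?(\d+)(?:\s*-\s*(\d+))?\s*([a-zA-Z]+)(?:$|,)/;

// Hypothetical helper: returns the parsed size, or null when no segment matches.
function parseSize(str) {
  const m = str.match(sizeRe);
  if (!m) return null;
  const [, pack, min, max, unit] = m;
  return {
    pack: pack ? Number(pack) : 1,     // no "N x" prefix means a single pack
    minweight: Number(min),
    maxweight: max ? Number(max) : 0,  // 0 when the weight is not a range
    unit,
  };
}

console.log(parseSize('12 x ATG 370 g, 12 x 720 ml, 1 Glas = 0.97, 1 kg = 2.03'));
// { pack: 12, minweight: 720, maxweight: 0, unit: 'ml' }
console.log(parseSize('100 - 400 g, versch. Sorten, 1 kg = 5.60'));
// { pack: 1, minweight: 100, maxweight: 400, unit: 'g' }
```

Converting with Number keeps pack and the weights numeric in every case, rather than a mix of strings and the numeric default.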


2 Comments

Super thanks. Works like a charm. One more question - Please see the edited question ( last section )
@SaurabhKumar It is just a matter of adding an optional group, (?:\s*-\s*\d+)?
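
To illustrate the effect of that optional group, here is a small comparison. It assumes the earlier pattern was identical apart from the missing range group, which this page does not confirm; the names withoutRange, withRange, and sample are hypothetical.

```javascript
// Assumed earlier form: the weight is a single run of digits.
const withoutRange = /(?:,\s*|^)(?:(\d+)\s*x\s*)?(\d+)\s*([a-zA-Z]+)(?:$|,)/;
// Same pattern with the optional group (?:\s*-\s*\d+)? added to the weight.
const withRange = /(?:,\s*|^)(?:(\d+)\s*x\s*)?(\d+(?:\s*-\s*\d+)?)\s*([a-zA-Z]+)(?:$|,)/;

const sample = '100 - 400 g, versch. Sorten, 1 kg = 5.60';

const matchedWithout = withoutRange.test(sample); // false: "100" is followed by " - ", not a unit
const weight = sample.match(withRange)[2];        // "100 - 400"

console.log(matchedWithout, weight);
```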
