Partial regexp matching

# Isiah Meadows (4 months ago)

I've been working on a test framework, and I'd love to implement support for matching tests via regexp-based selectors. It's basically impossible without the ability to execute a regular expression and test if it matched positively, negatively, or incompletely.

  • If the regexp does not have an end marker, this, of course, can't generate a negative match, only positive/incomplete ones.
  • If the regexp does have an end marker, this is where I actually need native support. (This is especially true if group references get involved.)
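To make the gap concrete, here is a small illustration with a made-up selector. `RegExp.prototype.test` answers `false` both for a name that could still grow into a match and for one that never can, so the "incomplete" outcome is lost.

```javascript
// A hypothetical selector for the test named "foo bar".
const selector = /^foo bar$/;

// "foo" could still become "foo bar" once a child name is appended...
console.log(selector.test("foo")); // false
// ...while "qux" can never match, whatever follows it.
console.log(selector.test("qux")); // false
// Only the complete name is reported as a match.
console.log(selector.test("foo bar")); // true
```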

Any chance this could get added?


Isiah Meadows me at isiahmeadows.com

Looking for web consulting? Or a new website? Send me an email and we can get started. www.isiahmeadows.com

# Isiah Meadows (4 months ago)

Here are a few examples:

Any of these would suffice to solve my issue.

Also, I'm not the first to request this: esdiscuss.org/topic/partial-matching-a-string-against-a-regex


# Mike Samuel (4 months ago)

So you want an API that can indicate "might match if there were more input"?

Is it ok if it is conservative? I.e. it won't incorrectly say "definitely wouldn't match given more input" but could tolerate errors the other way?

For example, /^food(?!)/ would have to say no for "foop" but we might tolerate a maybe for "foo".
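To make the "might match if there were more input" idea concrete, here is a minimal sketch, not a proposed spec. The `partialMatch` helper and its `"yes"`/`"maybe"`/`"no"` results are hypothetical. It simulates an NFA over a deliberately tiny pattern subset and answers "no" only once no NFA state is left alive. Lookaheads such as the `(?!)` above are left out on purpose. Supporting them, or backreferences, is where a real implementation would have to fall back to a conservative "maybe".

```javascript
// Hypothetical helper (not a real API): classifies `input` against a pattern
// in a tiny regexp subset: literal characters, "." (any character), postfix
// "?", "*", "+", an optional leading "^" and an optional trailing "$".
// Groups, classes, escapes, and lookarounds are not supported.
// Returns "yes" (matches now), "maybe" (some longer input could match),
// or "no" (no extension of `input` can ever match).
function partialMatch(pattern, input) {
  let src = pattern;
  const anchoredStart = src.startsWith("^");
  if (anchoredStart) src = src.slice(1);
  const anchoredEnd = src.endsWith("$");
  if (anchoredEnd) src = src.slice(0, -1);

  // Tokenize; "x+" is rewritten as "x" followed by "x*".
  const tokens = [];
  for (let i = 0; i < src.length; i++) {
    const ch = src[i];
    const following = src[i + 1];
    const quant = following === "?" || following === "*" || following === "+" ? following : "";
    if (quant) i++;
    if (quant === "+") tokens.push({ ch, quant: "" }, { ch, quant: "*" });
    else tokens.push({ ch, quant });
  }

  // NFA state k means "about to match tokens[k]"; k === accept means done.
  const accept = tokens.length;

  // Epsilon closure: a "?" or "*" token may be skipped without consuming input.
  const closure = (states) => {
    const out = new Set();
    for (let k of states) {
      out.add(k);
      while (k < accept && tokens[k].quant !== "") out.add(++k);
    }
    return out;
  };

  let current = closure([0]);
  let matchedSomewhere = current.has(accept);
  for (const c of input) {
    const next = [];
    for (const k of current) {
      if (k < accept && (tokens[k].ch === "." || tokens[k].ch === c)) {
        next.push(tokens[k].quant === "*" ? k : k + 1);
      }
    }
    // Without "^", a new match attempt may begin after every character.
    if (!anchoredStart) next.push(0);
    current = closure(next);
    if (current.has(accept)) matchedSomewhere = true;
  }

  // With "$" the match must end exactly at the end of the input; without it,
  // any prefix that reached the accepting state already counts as a match.
  const matchesNow = anchoredEnd ? current.has(accept) : matchedSomewhere;
  if (matchesNow) return "yes";
  // In this subset every live state can still reach `accept`, so "maybe" is exact.
  return current.size > 0 ? "maybe" : "no";
}

console.log(partialMatch("^foo bar$", "foo"));     // maybe
console.log(partialMatch("^foo bar$", "foop"));    // no
console.log(partialMatch("^foo bar$", "foo bar")); // yes
console.log(partialMatch("foo$", "xfo"));          // maybe (unanchored start never says "no")
```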

On Feb 15, 2018 8:12 AM, "Isiah Meadows" <isiahmeadows at gmail.com> wrote:

> I've been working on a test framework, and I'd love to implement support for matching tests via regexp-based selectors. It's basically impossible without the ability to execute a regular expression and test if it matched positively, negatively, or incompletely.
>
>   • If the regexp does not have an end marker, this, of course, can't generate a negative match, only positive/incomplete ones.
>   • If the regexp does have an end marker, this is where I actually need native support. (This is especially true if group references get involved.)
>
> Any chance this could get added?


# Isiah Meadows (4 months ago)

Yes, and I specifically want the conservative variant - I'd want a "maybe" for "foo" in that example.

(For context, my test framework determines while running a test whether that test has children, and checks whether to allocate them when defining them. For me, I only need "yes/maybe" and "no", but splitting "yes" and "maybe" could be beneficial to others.)


# Mike Samuel (4 months ago)

On Mon, Feb 19, 2018 at 11:43 AM, Isiah Meadows <isiahmeadows at gmail.com> wrote:

> Yes, and I specifically want the conservative variant - I'd want a "maybe" for "foo" in that example.
>
> (For context, my test framework determines while running a test whether that test has children, and checks whether to allocate them when defining them. For me, I only need "yes/maybe" and "no", but splitting "yes" and "maybe" could be beneficial to others.)

Ok, so you're trying to decide whether to prune a graph search based on a regexp that classifies paths through the graph.
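Here is a minimal sketch of that pruning. It assumes, hypothetically, that a test's full name is its ancestors' names joined with spaces. Because there is no built-in partial-match primitive to call, a literal full name stands in for the regexp selector. The walk descends on "yes" or "maybe" and prunes on "no", which is the yes/maybe versus no split described above.

```javascript
// Hypothetical test tree; a test's full name is its path joined with spaces.
const tree = {
  name: "parser",
  children: [
    { name: "numbers", children: [{ name: "integers" }, { name: "floats" }] },
    { name: "strings", children: [{ name: "escapes" }] },
  ],
};

// Stand-in classifier for a literal selector (a real one would take a regexp):
// "yes" if `path` is the selected test, "maybe" if the selected test lies
// somewhere below `path`, otherwise "no".
function classify(selector, path) {
  if (path === selector) return "yes";
  return selector.startsWith(path + " ") ? "maybe" : "no";
}

// Depth-first walk: record every visited path, collect "yes" matches, and
// skip the whole subtree as soon as the classifier answers "no".
function selectTests(node, selector, parentPath = "", out = { selected: [], visited: [] }) {
  const path = parentPath === "" ? node.name : `${parentPath} ${node.name}`;
  out.visited.push(path);
  const verdict = classify(selector, path);
  if (verdict === "no") return out; // prune this subtree
  if (verdict === "yes") out.selected.push(path);
  for (const child of node.children ?? []) {
    selectTests(child, selector, path, out);
  }
  return out;
}

const result = selectTests(tree, "parser numbers floats");
console.log(result.selected);       // [ 'parser numbers floats' ]
console.log(result.visited.length); // 5 ("parser strings escapes" is never visited)
```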

We can't base a distinction between yes and maybe on whether a zero-width $ assertion is triggered if there are paths to completion that do not pass through that assertion.

    const re = /^foo($|d).?/

    // This variant uses the $ assertion
    console.log(re.exec("foo")[0] === 'foo')
    // yes would be inappropriate, but yes|maybe would be because

    // This variant uses the d
    console.log(re.exec("food")[0] === 'food')
    // and yes|maybe would be appropriate here since

    console.log(re.exec("foods")[0] === 'foods')
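The capture group shows which alternative fired. In this small check, the match for "foo" goes through `$`, yet appending "d" still produces a match. A successful `$` therefore cannot justify a final "yes" on its own.

```javascript
const re = /^foo($|d).?/;

// For "foo", group 1 captures "" because the $ alternative matched.
console.log(re.exec("foo")[1] === "");   // true
// Appending "d" still matches, now through the d alternative.
console.log(re.exec("food")[1] === "d"); // true
```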

So IIUC, the yes/maybe distinction could be based on a bit that is set on success of a $ assertion and erased on exit from any of ( ...|... , ...? , ...* , ...{0,...} , (?!...) ). That only works when we know that the start of the match is stable, though, because:
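A rough static version of that bit can be computed from the pattern itself. The sketch below uses a hypothetical AST whose node kinds are invented here, not taken from any engine. It assumes no `m` flag and a start pinned by `^`. `mustEnd` reports whether every way of completing the match passes through `$`. If so, a match can be treated as final rather than "maybe". This only approximates the per-match bit described above.

```javascript
// Hypothetical AST node kinds for this sketch:
//   { type: "char", ch }            a single character ("." for any)
//   { type: "end" }                 the $ assertion (no "m" flag)
//   { type: "seq", items: [...] }   concatenation
//   { type: "alt", options: [...] } alternation a|b
//   { type: "opt", body }           ?, *, {0,n}: may be skipped entirely
//   { type: "neglook", body }       (?!...): consumes nothing, forces nothing

// True if every successful path through `node` passes a $ assertion.
function mustEnd(node) {
  switch (node.type) {
    case "end":
      return true;
    case "seq":
      // Once $ has matched, whatever follows can only match empty input.
      return node.items.some(mustEnd);
    case "alt":
      // Each alternative must force the end on its own.
      return node.options.every(mustEnd);
    case "char":
    case "opt":
    case "neglook":
      return false;
    default:
      throw new Error(`unknown node type: ${node.type}`);
  }
}

const ch = (c) => ({ type: "char", ch: c });

// /^foo($|d).?/ with the ^ anchor left implicit
const optionalEnd = {
  type: "seq",
  items: [
    ch("f"), ch("o"), ch("o"),
    { type: "alt", options: [{ type: "end" }, ch("d")] },
    { type: "opt", body: ch(".") },
  ],
};

// /^foo$/ with the ^ anchor left implicit
const forcedEnd = { type: "seq", items: [ch("f"), ch("o"), ch("o"), { type: "end" }] };

console.log(mustEnd(optionalEnd)); // false: a match for "foo" is only a "maybe"
console.log(mustEnd(forcedEnd));   // true: a match is final
```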

    const re = /foo$/
    console.log(re.test('foo'))
    console.log(re.test('foofoo'))
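Printing the match offsets makes the instability visible. Both inputs match, but for "foofoo" the match starts at index 3. The `$` success on "foo" therefore said nothing about longer inputs.

```javascript
const re = /foo$/;

// Both inputs match, but the match starts at a different offset.
console.log(re.exec("foo").index);    // 0
console.log(re.exec("foofoo").index); // 3
```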

It would hold for the common case /^...$/, but in that case you already know the answer, and you can check for it at runtime by testing whether myRegExp.source matches a meta-pattern like /^\^([^\\]|\\[\s\S])*\$$/.
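Here is a sketch of that runtime check. It assumes the meta-pattern is `/^\^([^\\]|\\[\s\S])*\$$/`: the source must start with `^`, end with an unescaped `$`, and contain only ordinary characters or backslash escapes in between. `isFullyAnchored` is a hypothetical helper name.

```javascript
// Source starts with "^", then any mix of non-backslash characters or
// backslash escapes, then ends with an unescaped "$".
const anchoredBothEnds = /^\^([^\\]|\\[\s\S])*\$$/;

const isFullyAnchored = (re) => anchoredBothEnds.test(re.source);

console.log(isFullyAnchored(/^foo$/));  // true
console.log(isFullyAnchored(/^foo\$/)); // false: the final $ is an escaped literal
console.log(isFullyAnchored(/foo$/));   // false: no leading ^
```

This is a heuristic only. `/^a|b$/` passes even though `^` and `$` belong to different alternatives, and with the `m` flag `$` can also match before a line break.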

# Isiah Meadows (4 months ago)

Inline.

On Mon, Feb 19, 2018 at 12:39 PM, Mike Samuel <mikesamuel at gmail.com> wrote:

> On Mon, Feb 19, 2018 at 11:43 AM, Isiah Meadows <isiahmeadows at gmail.com> wrote:
>
>> Yes, and I specifically want the conservative variant - I'd want a "maybe" for "foo" in that example.
>>
>> (For context, my test framework determines while running a test whether that test has children, and checks whether to allocate them when defining them. For me, I only need "yes/maybe" and "no", but splitting "yes" and "maybe" could be beneficial to others.)
>
> Ok, so you're trying to decide whether to prune a graph search based on a regexp that classifies paths through the graph.

That would be the correct understanding. (I typically avoid CS jargon since I never got the formal education and I rarely converse with people well-educated in it, so apologies if my not using the technical term made things any harder.)

> We can't base a distinction between yes and maybe on whether a zero-width $ assertion is triggered if there are paths to completion that do not pass through that assertion.
>
>     const re = /^foo($|d).?/
>
>     // This variant uses the $ assertion
>     console.log(re.exec("foo")[0] === 'foo')
>     // yes would be inappropriate, but yes|maybe would be because
>
>     // This variant uses the d
>     console.log(re.exec("food")[0] === 'food')
>     // and yes|maybe would be appropriate here since
>
>     console.log(re.exec("foods")[0] === 'foods')

This particular scenario would not matter to me directly, because all I need is "could this match now or potentially later?". The optional end would be fine, since I'd have the invariant that when I check each child, I'll be adding a space along with the next test's name anyway (and thus won't have a d to worry about).
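Here is a quick check of that invariant, with "bar" as a made-up child name. The parent's own name matches. A child path, however, always continues with a space, which neither the `$` alternative nor the `d` alternative accepts.

```javascript
const re = /^foo($|d).?/;

// The parent test's own name matches...
console.log(re.test("foo"));     // true
// ...but a child path continues with a space, so no child can match.
console.log(re.test("foo bar")); // false
```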

As for whether it should consider it "ended", I think that's something that could probably be spec'd out in a proposal repo, and I doubt that'd be a blocker for stage 1 (that's typically a stage 2 concern).

> So IIUC, the yes/maybe distinction could be based on a bit that is set on success of a $ assertion and erased on exit from any of ( ...|... , ...? , ...* , ...{0,...} , (?!...) ). That only works when we know that the start of the match is stable, though, because:
>
>     const re = /foo$/
>     console.log(re.test('foo'))
>     console.log(re.test('foofoo'))
>
> It would hold for the common case /^...$/, but in that case you already know the answer, and you can check for it at runtime by testing whether myRegExp.source matches a meta-pattern like /^\^([^\\]|\\[\s\S])*\$$/.


# Mike Samuel (4 months ago)

On Mon, Feb 19, 2018 at 1:15 PM, Isiah Meadows <isiahmeadows at gmail.com> wrote:

>> We can't base a distinction between yes and maybe on whether a zero-width $ assertion is triggered if there are paths to completion that do not pass through that assertion.
>>
>>     const re = /^foo($|d).?/
>>
>>     // This variant uses the $ assertion
>>     console.log(re.exec("foo")[0] === 'foo')
>>     // yes would be inappropriate, but yes|maybe would be because
>>
>>     // This variant uses the d
>>     console.log(re.exec("food")[0] === 'food')
>>     // and yes|maybe would be appropriate here since
>>
>>     console.log(re.exec("foods")[0] === 'foods')
>
> This particular scenario would not matter to me directly, because all I need is "could this match now or potentially later?". The optional end would be fine, since I'd have the invariant that when I check each child, I'll be adding a space along with the next test's name anyway (and thus won't have a d to worry about).
>
> As for whether it should consider it "ended", I think that's something that could probably be spec'd out in a proposal repo, and I doubt that'd be a blocker for stage 1 (that's typically a stage 2 concern).

Fwiw, it sounds like a fine idea to me.