Tuesday, March 28, 2017

C# Regex that either captures or excludes a whole line depending on the characters it contains

I've been trying to build a Regex that analyzes the content of multiple HTML pages and checks whether they contain any accented letters such as "á à â ã". The pattern should capture the whole line of code if it detects any accented letter, AND ignore accented letters that appear in comments.

Here's an example:

<li><a href="#prepaid-plan" data-toggle="tab">I want to capture this á</a </li> 
//I don't want to capture this á

The example above should capture only:

<li><a href="#prepaid-plan" data-toggle="tab">I want to capture this á</a </li>

I've made this pattern so far:

(\W(?<!\/\/)(?=\w*[á|â|ã|à|é|ê|è|í|î|ì|ó|ô|õ|ò|ú|û|ù])\S*)

But it fails when the word with the accented letter does not immediately follow the "//", and it captures only that word, not the entire line.

Can you guys help me? Thanks in advance!
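
One possible direction, offered only as a sketch: anchor the match at the start of the line, walk forward only while no "//" begins, require one accented letter, then take the rest of the line. Two assumptions are baked in: everything after the first "//" on a line is treated as a comment (so a URL such as "http://..." appearing before the accented letter would wrongly suppress the line), and only the lowercase letters listed in the pattern above are checked. The names AccentedLineFinder and FindAccentedLines are made up for illustration. Note also that inside a character class the "|" is a literal pipe, not an alternation, so it is dropped here.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class AccentedLineFinder
{
    // ^                      start of a line (RegexOptions.Multiline)
    // (?:(?!//)[^\r\n])*     any characters, as long as no "//" starts here
    // [áâã...]               one accented letter, necessarily before any "//"
    // [^\r\n]*               the rest of the line, without the line terminator
    private static readonly Regex AccentedOutsideComment = new Regex(
        @"^(?:(?!//)[^\r\n])*[áâãàéêèíîìóôõòúûù][^\r\n]*",
        RegexOptions.Multiline);

    // Returns every line that has an accented letter before any "//".
    public static List<string> FindAccentedLines(string html)
    {
        var lines = new List<string>();
        foreach (Match m in AccentedOutsideComment.Matches(html))
        {
            lines.Add(m.Value);
        }
        return lines;
    }
}
```

Called on the two example lines above, FindAccentedLines should return only the <li> line, because on the second line the "//" comes before the accented letter.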
