UBot Underground

Remove Unwanted Lines



Not sure how to go about this...

 

I have a #var filled with text

 

info1

info2

info3

info-i-don't-want

info1

info2

info3

info-i-don't-want

info1

info2

info3

info-i-don't-want

 

The problem is that there is no way for me to know in advance if that unwanted line of text will be in the #var or not, and if it is there, I have no way to know what the actual text might be.

 

I was looking for a regex solution that deletes every line after info3 up to (but not including) the next info1, but I couldn't find anything that worked for me.

Any ideas? Am I overcomplicating this?


Software,

 

     I'm not too clear on what information you have that you can zero in on to test. It also seems like your info is in a list and not a var.

 

     Given your example:

 

     You could use the $substring function to read the fifth character of each row (the digit right after "info") and delete the row if that character is not 1, 2, or 3.

 

     Hope that helps.

 

Andy (Arunner26)
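
A minimal sketch of the check Andy describes, written in Python rather than UBot script so it can be run anywhere. It assumes every wanted row really does look like info1, info2, or info3, so the fifth character is the digit; the function name keep_wanted_rows and the sample text are made up for illustration.

```python
def keep_wanted_rows(text):
    """Keep only rows whose fifth character is 1, 2, or 3."""
    kept = []
    for row in text.splitlines():
        row = row.strip()
        # Index 4 is the fifth character (Python counts from zero).
        if len(row) >= 5 and row[4] in "123":
            kept.append(row)
    return "\n".join(kept)


sample = "info1\ninfo2\ninfo3\ninfo-i-don't-want\ninfo1\ninfo2\ninfo3"
print(keep_wanted_rows(sample))
```

This only works while the wanted rows share that fixed prefix; if the real text varies, the test on the fifth character would need to be replaced with whatever actually distinguishes the rows.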


If I understand correctly, this is what you have:

---------------------------------------------------------------------------------------------

info1 eferojeerjrmolwwkr

info2 rwrjwjwje w ekhwb

info3 kke nfiejrrjoeerkrkr

info4  -i-don't-want

info5  -i-don't-want
info6  -i-don't-want
info1 eferojeerjrmolwwkr

info2 rwrjwjwje w ekhwb

info3 kke nfiejrrjoeerkrkr

info4  -i-don't-want

info5  -i-don't-want
info6  -i-don't-want

info7  -i-don't-want555
info8  -i-don't-want4444
info5  -i-don't-wantyyy
info6  -i-don't-want4444
info1 555

info2 rwrjwjwje w ekhwb6666

info3 kke nfiejrrjoeerkrkr88888

-----------------------------------------------------------------------------------

And this is what you want:

 

info1 eferojeerjrmolwwkr
info2 rwrjwjwje w ekhwb
info3 kke nfiejrrjoeerkrkr
info1 eferojeerjrmolwwkr
info2 rwrjwjwje w ekhwb
info3 kke nfiejrrjoeerkrkr
info1 555
info2 rwrjwjwje w ekhwb6666
info3 kke nfiejrrjoeerkrkr88888

-----------------------------------------------------------------

Use find regex with the pattern below; it returns only the lines you want to keep:

info[123].*
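
To see what that pattern returns, here is a rough Python equivalent of a "find all matches" step (Python stands in for UBot here, and the sample is a shortened copy of the text above). It assumes the wanted lines always begin with info1, info2, or info3.

```python
import re

sample = """info1 eferojeerjrmolwwkr
info2 rwrjwjwje w ekhwb
info3 kke nfiejrrjoeerkrkr
info4  -i-don't-want
info5  -i-don't-want
info1 555
info2 rwrjwjwje w ekhwb6666
info3 kke nfiejrrjoeerkrkr88888"""

# "info", then one of 1/2/3, then the rest of the line.
wanted = re.findall(r"info[123].*", sample)

# Join the matches back into a single block of text.
cleaned = "\n".join(wanted)
print(cleaned)
```

Collecting the matches and joining them back together drops every line the pattern does not match, which has the same effect as deleting the unwanted lines.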


Thanks for jumping in, Andy. No, it's not in a list; it's in a variable at that point. I need to clear out the unwanted stuff before I add it to a list, because the list isn't a one-item-per-line type of list.


Yeah man, that's basically what I want to achieve, but I don't understand your regex.


I think he is assuming that your text starts with "info", but as you said, the text changes, so I suppose he misunderstood.

 

I believe your best bet is to scrape the HTML that surrounds your text as well, and then use regex on that (usually, if some text is of a different type than the text you want to scrape, the HTML surrounding it differs too).

 

That's why you should first post the HTML (or give us a URL) so we can help you....

 

In case I'm wrong and you get that as a plain string (no HTML), you'll have to find some differences between the "infoX" and "infoX-i-don't-want" texts, and then use those findings in a regex to recognize the bad lines. If you can't even do that, then I think you'll have a hard time. However, for this approach you would need to post the actual text that's there; otherwise it's impossible to help you.


I'm guessing that each line starts with info, so that part is a constant (it isn't going to change).

The [123] matches exactly one character from the set between the brackets: 1, 2, or 3.

The . is any character except a line break.

In this case it's any character after info[123].

The * after the . means zero or more of those characters, so .* runs to the end of the line (it does not stop at a blank space).

Keep in mind that this is not a perfect solution.

Because .* accepts anything after the first digit, the pattern also matches lines that start with info12, info23, or info31.

If what follows info[123] is a sentence that has letters, numbers, and spaces, I would replace the .* with this:

[\d\w\s\'\"\.\-\,\;\:\&\!\?]*

It will match any run of those characters. Note that \s also matches line breaks, so use a plain space in its place if the match has to stop at the end of the line.

I struggle with regex too.

Search the users for

HelloInsomnia

He has a tool for regex; it helps me out a lot.
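
A few quick checks of how the pieces of info[123].* behave, sketched in Python (the test strings are invented; UBot's regex engine may differ in small details, so treat this as an illustration rather than a guarantee).

```python
import re

line_pattern = re.compile(r"info[123].*")

# [123] consumes exactly one character: "1", "2", or "3".
one_digit = re.fullmatch(r"info[123]", "info2") is not None
two_digits = re.fullmatch(r"info[123]", "info23") is not None

# With .* appended, the second digit is simply absorbed by .*
absorbed = line_pattern.fullmatch("info23 extra") is not None

# . does not match a newline, so the match stops at the end of the line.
first_line = line_pattern.search("info1 keep me\ninfo4 drop me").group(0)

print(one_digit, two_digits, absorbed, first_line)
```

The second check is the reason the pattern is not perfect: the bracket itself never matches two digits, but the trailing .* will happily swallow a second one.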


Let's go at it differently...

Let's say I have a bunch of random text in the #var, but two constants:

Sales: Confirmed  ... this is constant 1

<modified> ... this is constant 2.

There are several sets of the two constants scattered among the text in the #var, like this:

<modified> bunch of text for me to keep Sales: Confirmed and another bunch of crap I don't need <modified>

I tried using this regex, but it didn't help:

(?<=Sales: Confirmed).*?(?=<modified>)

Basically, I'm just trying to delete everything AFTER constant #1 and BEFORE constant #2.
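
The pattern itself looks reasonable. One possible reason it did not help (an assumption, since the real text was not posted) is that . does not match line breaks by default, so a match cannot span lines. A small Python sketch with invented sample text, using re.DOTALL to lift that limit:

```python
import re

text = (
    "<modified> text to keep A Sales: Confirmed junk one\n"
    "more junk <modified> text to keep B Sales: Confirmed junk two <modified>"
)

# re.DOTALL lets . cross line breaks; *? stops at the nearest <modified>.
pattern = re.compile(r"(?<=Sales: Confirmed).*?(?=<modified>)", re.DOTALL)

# Replace every match with nothing, keeping both constants in place.
cleaned = pattern.sub("", text)
print(cleaned)
```

In regex engines that support inline flags, putting (?s) at the start of the pattern has the same effect as re.DOTALL. Whether this is the actual cause here depends on the text in the #var.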
