UBot Underground

Scraping a variable



Hmmm... I am a bit rusty ... :)

 

I have a file that contains the following, and I want to extract the rows inside the "descriptions" array (highlighted in red in the original post) to another variable. The number of these rows varies from case to case: there might be two, or there might be 30, 50 or even 367.

 

<snip>

 

    "english" => array(

        "descriptions" => array(
        
        "button"         => "Button Translations",
        "date"             => "Date & Time Translations",

    ),    

    // UPDATE 27TH JULY 2013
    "button" => array(            
        "1"             => "Add",
        "2"             => "Edit",
        "3"             => "Delete",
    ),
 

 

</snip>

 

 

If this were a page, I guess I could do something like:

set(#myExtractedText, $page scrape("\"descriptions\" => array\\(", "),"), "Global")

 

 

I have fiddled a lot with regex to try to solve this (reading the file into a variable and then running "$find regular expression" on that variable), but I can't get it right. I am stuck and would appreciate some help.

 

 

 

Thank you!

 

 

For a little clarification.

The data you're trying to extract is:

"button"         => "Button Translations",
"date"             => "Date & Time Translations",

And you're saying there could be more rows than just two.
Are the rows you're trying to extract in the same format as your example?

one way

set(#A, "

<snip>

 

    \"english\" => array(

        \"descriptions\" => array(
        
        \"button\"         => \"Button Translations\",
        \"date\"             => \"Date & Time Translations\",
    ),    

    // UPDATE 27TH JULY 2013
    \"button\" => array(            
        \"1\"             => \"Add\",
        \"2\"             => \"Edit\",
        \"3\"             => \"Delete\",
    ),", "Global")
set(#replace, $replace regular expression($find regular expression(#A, "(?<=\").*?(?=\",)"), "\\d.*", $nothing), "Global")

CD
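
To see how this find-then-strip-digits approach behaves on the fuller file, here is a rough Python sketch of the same two steps. Python's `re` module stands in for UBot's regex engine, which is an assumption, and the sample text is a shortened mix of the rows quoted in this thread.

```python
import re

# Shortened sample; the "64" and "text_label" rows appear in the fuller
# example of the file posted later in the thread.
SAMPLE = '''
    "english" => array(
        "descriptions" => array(
        "button" => "Button Translations",
        "64" => "Numerical label",
    ),
    // UPDATE 27TH JULY 2013
    "button" => array(
        "1" => "Add",
        "text_label" => "My string",
    ),
'''

# Step 1: same pattern as in the post. On each line it grabs the text that
# follows a double quote, up to (but not including) a closing `",`.
rows = re.findall(r'(?<=").*?(?=",)', SAMPLE)

# Step 2: the post then deletes everything from the first digit onward.
cleaned = [re.sub(r'\d.*', '', row) for row in rows]

for row in cleaned:
    print(repr(row))
```

On this sample the "64" row inside "descriptions" is wiped out by the `\d.*` replacement, while the "text_label" row from the later "button" array survives. The pattern by itself cannot tell the two arrays apart, so the text has to be narrowed down to the wanted array first.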


 


 

 

Yes, there could be one or several rows. Those that are to be extracted are always between

 

"descriptions" => array(

 

and the closing

 

),

 

 

The file could look like this:

 

<snip>

 

    "english" => array(

 

        "descriptions" => array(

        

        "button"         => "Button Translations",

        "date"             => "Date & Time Translations",

        "64"             => "Numerical label",

        "mixed_character_label2"             => "a label with mixed characters: this text also contains a (, ) and a .",

    ),    

 

    // UPDATE 27TH JULY 2013

    "button" => array(            

        "1"             => "Add",

        "2"             => "Edit",

        "3"             => "Delete",

        "text_label"             => "My string",

    ),

 

 

</snip>



 

 

Thank you! It could be a start.

 

The problem is that it extracts everything, not only what is between:

 

"descriptions" => array(

 

and

 

),


Put it into a list, using ), as the delimiter, then take list item 0 and run the regex on it:

set(#A, "<snip>

 

    \"english\" => array(

        \"descriptions\" => array(
        
        \"button\"         => \"Button Translations\",
        \"date\"             => \"Date & Time Translations\",
        \"64\"             => \"Numerical label\",
        \"mixed_character_label2\"             => \"a label with mixed characters: this text also contains a (, ) and a .\",
    ),    

    // UPDATE 27TH JULY 2013
    \"button\" => array(            
        \"1\"             => \"Add\",
        \"2\"             => \"Edit\",
        \"3\"             => \"Delete\",
        \"text_label\"             => \"My string\",
    ),", "Global")
set(#B, $list item($list from text(#A, "),"), 0), "Global")
set(#find, $find regular expression(#B, "(?<=\").*?(?=\",)"), "Global")
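
The split-then-regex idea above can be sketched in Python as follows. This is only an illustration, under the assumptions that the wanted array is the first one in the text and that no value contains the two characters `),` back to back; `rows_before_first_close` is a made-up helper name.

```python
import re

SAMPLE = '''
    "english" => array(
        "descriptions" => array(
        "button"         => "Button Translations",
        "date"             => "Date & Time Translations",
        "64"             => "Numerical label",
        "mixed_character_label2"             => "a label with mixed characters: this text also contains a (, ) and a .",
    ),
    // UPDATE 27TH JULY 2013
    "button" => array(
        "1"             => "Add",
        "text_label"             => "My string",
    ),
'''

def rows_before_first_close(text, delimiter="),"):
    # Roughly what $list item($list from text(text, "),"), 0) does:
    # keep only the text in front of the first delimiter.
    first_block = text.split(delimiter)[0]
    # Same row pattern as in the post, applied to that first block only.
    return re.findall(r'(?<=").*?(?=",)', first_block)

rows = rows_before_first_close(SAMPLE)
for row in rows:
    print(row)
```

Each returned row still lacks its leading `"` and trailing `",`, because the lookbehind and lookahead exclude them from the match. To reach an array that is not the first one, a different list item would be needed, which is why this only works when the position of the array is known.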


Hmmm...

Not sure what you want, then.

This is what I get, and it seems to return what I think you want.

The only things missing are the leading " and the trailing ", which is easy to fix.

Can you post a screenshot?

I made it with 5.5.1; it should work in 4 too.

[Attached screenshot: post-5979-0-86631600-1416423078_thumb.png]


The goal is to be able to set a "keystring", e.g. "descriptions", and scrape every key-value pair in between

"<keystring>" => array(

and

),

but ONLY those, not the key-value pairs that follow after the closing ),. With your code, those later pairs are extracted as well.

 

So in the below case, for the keystring "descriptions", it should only extract:

"button" => "Button Translations",
"date" => "Date & Time Translations",
"64" => "Numerical label",
"mixed_character_label2" => "a label with mixed characters: this text also contains a (, ) and a .",

What it extracts now is:

        "button"         => "Button Translations",
        "date"             => "Date & Time Translations",
        "64"             => "Numerical label",
        "mixed_character_label2"             => "a label with mixed characters: this text also contains a (, ) and a .",
        "1"             => "Add",
        "2"             => "Edit",
        "3"             => "Delete",
        "text_label"             => "My string",

...which is not correct.

 

 

---

 

Original code:

 "english" => array(

        "descriptions" => array(
        
        "button"         => "Button Translations",
        "date"             => "Date & Time Translations",
        "64"             => "Numerical label",
        "mixed_character_label2"             => "a label with mixed characters: this text also contains a (, ) and a .",
    ),    

    // UPDATE 27TH JULY 2013
    "button" => array(            
        "1"             => "Add",
        "2"             => "Edit",
        "3"             => "Delete",
        "text_label"             => "My string",
    ),
 
)  // my comment:  end of the "english" array

Sorry, my mother tongue is not English, so maybe I am expressing myself a bit oddly. I hope you understand. :)

 

Thank you for all your help so far!
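
One possible way to get the keystring behaviour described above is to locate the opening line of the requested array and then read row lines until a line that starts with `)`. The Python sketch below only illustrates that logic and is not UBot code: `scrape_array` is a hypothetical helper, and it assumes one key-value pair per line and a closing `),` on its own line, as in the sample.

```python
import re

SAMPLE = '''
 "english" => array(

        "descriptions" => array(

        "button"         => "Button Translations",
        "date"             => "Date & Time Translations",
        "64"             => "Numerical label",
        "mixed_character_label2"             => "a label with mixed characters: this text also contains a (, ) and a .",
    ),

    // UPDATE 27TH JULY 2013
    "button" => array(
        "1"             => "Add",
        "2"             => "Edit",
        "3"             => "Delete",
        "text_label"             => "My string",
    ),
'''

def scrape_array(text, keystring):
    """Return (key, value) pairs listed inside "<keystring>" => array( ... ),"""
    # Find the opening of the requested array; re.escape keeps any special
    # characters in the keystring from being read as regex syntax.
    opening = re.search(r'"' + re.escape(keystring) + r'"\s*=>\s*array\(', text)
    if opening is None:
        return []
    pairs = []
    for line in text[opening.end():].splitlines():
        stripped = line.strip()
        # A line starting with ")" closes the array (assumption: the closing
        # "),"" sits on its own line).
        if stripped.startswith(")"):
            break
        # A row looks like: "key" => "value",
        row = re.fullmatch(r'"([^"]*)"\s*=>\s*"(.*)",?', stripped)
        if row:
            pairs.append((row.group(1), row.group(2)))
    return pairs

for key, value in scrape_array(SAMPLE, "descriptions"):
    print(key, "=>", value)
```

Because the closing line is detected per line rather than by searching for `),` anywhere in the text, the parentheses inside the "mixed_character_label2" value do not end the block early. Nested arrays, such as asking for "english", are not handled by this sketch.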
