Scrape a string between two strings


3 replies to this topic

#1 yoram

yoram

    Member

  • UBot Users
  • PipPip
  • 29 posts
  • OS:Windows 7
  • Total Memory:< 1Gb
  • Framework:v3.5
  • License:Pro

Posted 30 April 2012 - 02:33 PM

I am trying to scrape data from an HTML page (code sample below).
The data I want to scrape is between two td tags:
Starting at: <td align="center" bgcolor="#F3F7F8" >
Closing at: </td>
I need to get the data between those tags. After two days of trying to learn regex, I still haven't managed to do it.
Does anyone have any suggestions?

Thanks



          <td align="center" bgcolor="#F3F7F8" >		
            <b><font face="Arial" size="2">
            <img border =0 name="DDD" src="../Images/esh.gif" 
            style="cursor: hand;"
            onClick="MyPopUpWinJV('פרטים על מומחיות  ד\'ר אדלמן בוריס','18204','www.old.health.gov.il');"
            
            >
            </font></b>
			</td>

          <td align="right" bgcolor="#F3F7F8" width="105"><font face="Arial" size="2" color="#0000FF">בתוקף       </font></td>


          <td align="right" bgcolor="#F3F7F8" width="80"><font face="Arial" size="2" color="#0000FF">רשיון קבוע                   </font></td>

          <td align="right" bgcolor="#F3F7F8"><font face="Arial" size="2" color="#0000FF">03/05/1971</font></td>
          <td align="right" bgcolor="#F3F7F8"><font face="Arial" size="2" color="#0000FF">תל אביב - יפו</font></td>
          <td align="right" bgcolor="#F3F7F8"><font face="Arial" size="2" color="#0000FF">1-8755</font></td>
          <td align="right" bgcolor="#F3F7F8"><font face="Arial" size="2" color="#0000FF">בוריס</font></td>
          <td align="right" dir=rtl bgcolor="#F3F7F8"><font face="Arial" size="2" color="#0000FF">אדלמן</font></td>
          <td bgcolor="#369CCD" align="center"><font color="#FFFFFF" face="Arial" size="1">55</font></td>
        </tr>

        <tr>

	<td align="center" bgcolor="#F3F7F8" >
	</td>

          <td align="right" bgcolor="#F3F7F8" width="105"><font face="Arial" size="2" color="#0000FF">בתוקף       </font></td>


          <td align="right" bgcolor="#F3F7F8" width="80"><font face="Arial" size="2" color="#0000FF">רשיון קבוע                   </font></td>

          <td align="right" bgcolor="#F3F7F8"><font face="Arial" size="2" color="#0000FF">30/03/2003</font></td>
          <td align="right" bgcolor="#F3F7F8"><font face="Arial" size="2" color="#0000FF">תל אביב - יפו</font></td>
          <td align="right" bgcolor="#F3F7F8"><font face="Arial" size="2" color="#0000FF">1-34977</font></td>
          <td align="right" bgcolor="#F3F7F8"><font face="Arial" size="2" color="#0000FF">אליהו</font></td>
          <td align="right" dir=rtl bgcolor="#F3F7F8"><font face="Arial" size="2" color="#0000FF">אדלמן</font></td>
          <td bgcolor="#369CCD" align="center"><font color="#FFFFFF" face="Arial" size="1">56</font></td>
        </tr>

        <tr>

 
          <td align="center" bgcolor="#F3F7F8" >		
            <b><font face="Arial" size="2">
            <img border =0 name="DDD" src="../Images/esh.gif" 
            style="cursor: hand;"
            onClick="MyPopUpWinJV('פרטים על מומחיות  ד\'ר אדלמן אליאס גבריאל אדריאן','86478','www.old.health.gov.il');"
            
            >
            </font></b>
			</td>
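
For readers who want to see the "text between two strings" idea outside UBot, here is a minimal sketch in Python. It is not the method from either attachment in this thread. The `html` sample is a shortened stand-in for the page above, and the names `START`, `END`, and `cells` are made up for illustration. The approach assumes the opening tag appears exactly as written in the post, including the space before `>`.

```python
import re

# Shortened stand-in for the page in the post (illustrative data, not the real page).
html = '''
<td align="center" bgcolor="#F3F7F8" >
  <b><font face="Arial" size="2">
  <img border =0 name="DDD" src="../Images/esh.gif"
  onClick="MyPopUpWinJV('details','18204','www.old.health.gov.il');"
  >
  </font></b>
</td>
<td align="right" bgcolor="#F3F7F8" width="105"><font face="Arial">valid</font></td>
<td align="center" bgcolor="#F3F7F8" >
</td>
'''

START = '<td align="center" bgcolor="#F3F7F8" >'
END = '</td>'

# re.escape treats both delimiters as literal text.
# (.*?) is non-greedy, so each match stops at the first closing tag.
# re.DOTALL lets '.' match line breaks, which the cell contents contain.
pattern = re.compile(re.escape(START) + r'(.*?)' + re.escape(END),
                     re.DOTALL | re.IGNORECASE)

# One entry per matching cell; an empty cell becomes an empty string.
cells = [content.strip() for content in pattern.findall(html)]

for index, content in enumerate(cells, start=1):
    print(index, repr(content))
```

The non-greedy group is the important part: a greedy `(.*)` would run from the first opening tag to the last `</td>` on the page and return one huge match instead of one match per cell.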



#2 k1lv9h

k1lv9h

    Advanced Member

  • UBot Users
  • PipPipPip
  • 253 posts
  • Location: Pennsylvania
  • OS:Windows 7
  • Total Memory:8Gb
  • Framework:v3.5 & v4.0
  • License:Dev

Posted 30 April 2012 - 03:54 PM

Hi,

I know it is not regex.

This should work.
Attached File  select-td-data-share.ubot   1.04K   4 downloads

Kevin

#3 yoram


Posted 30 April 2012 - 10:40 PM


Thank you, Kevin, for your reply. The code you wrote scrapes only part of the code between the tags,
and unfortunately it's not the part I need (I need the onclick code in brackets, or blank if there is nothing in that specific cell):
MyPopUpWinJV('פרטים על מומחיות  ד\'ר אדלמן בוריס','18204','www.old.health.gov.il')

It seems that the UBot page scrape function has trouble scraping this code because of all the special characters in it.
That's why I thought regex was the solution (I hope I am wrong, am I?).
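
To make the requirement above concrete (the onclick call for each cell, or blank when the cell is empty), here is a rough sketch in Python rather than UBot script. It is only an illustration under assumptions: the `html` sample is a cut-down copy of the rows in the first post, the helper name `onclick_calls` is hypothetical, and the pattern assumes the onclick value is wrapped in double quotes and contains no double quotes itself, as in the sample.

```python
import re

# Cut-down copy of two cells from the first post (illustrative data only).
html = r"""
<td align="center" bgcolor="#F3F7F8" >
  <b><font face="Arial" size="2">
  <img border =0 name="DDD" src="../Images/esh.gif"
  style="cursor: hand;"
  onClick="MyPopUpWinJV('פרטים על מומחיות  ד\'ר אדלמן בוריס','18204','www.old.health.gov.il');"
  >
  </font></b>
</td>
<td align="center" bgcolor="#F3F7F8" >
</td>
"""

# Matches one target cell at a time (non-greedy, spanning line breaks).
CELL = re.compile(r'<td align="center" bgcolor="#F3F7F8" >(.*?)</td>',
                  re.DOTALL | re.IGNORECASE)

# Captures everything inside the double-quoted onclick attribute value.
# The escaped single quote (\') inside the value is not a problem, because
# the pattern only stops at the next double quote.
ONCLICK = re.compile(r'onclick\s*=\s*"([^"]*)"', re.IGNORECASE)


def onclick_calls(page):
    """Return one string per target cell: the onclick call, or '' if absent."""
    results = []
    for cell in CELL.findall(page):
        match = ONCLICK.search(cell)
        # Drop the trailing ';' so only the call itself remains.
        results.append(match.group(1).strip().rstrip(';') if match else '')
    return results


results = onclick_calls(html)
for value in results:
    print(repr(value))
```

Splitting the work into two patterns keeps the rows aligned: every target cell yields exactly one result, so an empty cell produces a blank entry instead of being skipped.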

#4 k1lv9h


Posted 04 May 2012 - 05:13 AM

Hi,

You could try something like this. I used regex to select onclick.

Attached File  select-td-data-004-share.ubot   1.07K   5 downloads

Kevin



