How to extract URLs (href property) from HTML

January 5th, 2008

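// Requires: using System.Collections; using System.Text.RegularExpressions;
// Returns the value of every href attribute found in txtIn.
protected ArrayList getURL(string txtIn)
{
    ArrayList outURL = new ArrayList();
    // Matches href="..." (double-quoted) or href=value (unquoted, ending at whitespace or '>').
    Regex r = new Regex("href\\s*=\\s*(?:\"(?<url>[^\"]*)\"|(?<url>[^\\s>]+))", RegexOptions.IgnoreCase);
    MatchCollection mc1 = r.Matches(txtIn);

    foreach (Match m1 in mc1)
    {
        // Add only the named "url" group. Looping over m1.Groups would also
        // add group 0, the whole match including the href= prefix.
        outURL.Add(m1.Groups["url"].Value);
    }

    return outURL;
}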
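
To try the method outside a page class, here is a minimal self-contained sketch of the same idea. The class name HrefExtractor, the method name GetUrls, and the sample HTML are made up for illustration and are not part of the original snippet. It also assumes you want single-quoted values handled, which the method above does not do. A regular expression is only a rough approach here; it will not cope with every piece of real-world HTML.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class HrefExtractor
{
    // Hypothetical helper: matches double-quoted, single-quoted, or unquoted href values.
    private static readonly Regex HrefPattern = new Regex(
        "href\\s*=\\s*(?:\"(?<url>[^\"]*)\"|'(?<url>[^']*)'|(?<url>[^\\s>]+))",
        RegexOptions.IgnoreCase);

    public static List<string> GetUrls(string html)
    {
        var urls = new List<string>();
        foreach (Match m in HrefPattern.Matches(html))
        {
            // Only the named "url" group holds the attribute value.
            urls.Add(m.Groups["url"].Value);
        }
        return urls;
    }

    public static void Main()
    {
        // Sample input covering the three quoting styles.
        string html = "<a href=\"http://example.com/a\">A</a> " +
                      "<a HREF='b.html'>B</a> <a href=c.html>C</a>";
        foreach (string url in GetUrls(html))
        {
            Console.WriteLine(url);
        }
    }
}
```

Run against the sample string, this should print http://example.com/a, b.html and c.html, one per line. A generic List<string> is used instead of ArrayList so callers do not need to cast each item.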


