Tool to Get URLs from Text Chunk and De-duplicate entries

This tool was created to help Community Architect partners manually build a list of their most recent members' personal websites. Both of the uses that prompted this little app involved extracting URLs from HTML generated to display database output. Ideally, you would write your program to output the URLs in a portable, useful format, but with CA tools, sometimes just getting output to the screen is the best you can hope for. (Hopefully this will change with more devoted minds and resources, but for now this will have to do.)

To use this tool: view the source of the HTML page that contains the URLs, then copy and paste it into the left-hand input. Adjust the configuration options to suit what you know about the data, then click the button to get the URLs. Depending on how large a chunk you are parsing, this may take some time. To remove duplicate URLs from the parsed list, use the de-dupe button. The tool also handles a few Community Architect-specific quirks. For example, [alt] is the delimiter the CA Wrap language uses to separate items in a list/array.
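The extract-then-de-dupe flow described above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual source: the function names extractUrls and dedupeUrls and the assumeHttpPrefix option name are hypothetical, and the sketch assumes the URLs appear as quoted href or src attribute values.

```javascript
// Hypothetical helpers for illustration; the original tool's code is not shown on this page.

// Collect quoted href/src attribute values from a chunk of HTML source.
function extractUrls(html, options = {}) {
  const { assumeHttpPrefix = false } = options;
  const urls = [];
  // Matches href="..." or src='...' (quoted values only, in this sketch).
  const attrPattern = /\b(?:href|src)\s*=\s*(["'])(.*?)\1/gi;
  for (const match of html.matchAll(attrPattern)) {
    let url = match[2].trim();
    if (url === "") continue;
    const hasScheme = /^[a-z][a-z0-9+.-]*:/i.test(url);
    const isRelative = /^[\/#?]/.test(url);
    // Optionally treat scheme-less values such as "www.example.org" as http:// URLs.
    if (assumeHttpPrefix && !hasScheme && !isRelative) {
      url = "http://" + url;
    }
    urls.push(url);
  }
  return urls;
}

// Remove duplicates, keeping the first occurrence of each URL.
function dedupeUrls(urls) {
  return [...new Set(urls)];
}

// Usage: extract, de-dupe, then join with the CA Wrap [alt] delimiter.
const sample =
  '<a href="http://example.com/a">A</a> ' +
  '<a href="www.example.org">B</a> ' +
  '<a href="http://example.com/a">A again</a>';
const found = dedupeUrls(extractUrls(sample, { assumeHttpPrefix: true }));
console.log(found.join("[alt]"));
// prints "http://example.com/a[alt]http://www.example.org"
```

A Set keeps insertion order, so the de-duped list preserves the order in which URLs first appeared in the pasted source. A regular expression is a rough fit for HTML; for messy markup, parsing the source with the browser's DOM would likely be more reliable.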

If you have questions or suggestions, contact the author. Best wishes.

1. Paste your view-source here
2. Options:

- Assume "http://" prefix.

3. Get extracted URLs here

Who is Community Architect? Community Architect has been one of the best-kept secrets in reseller-like, affiliate-like, white-label web hosting over the last seven or so years. United Online (NASDAQ:UNTD) acquired it from About.com/Primedia (NYSE:PRM) a few years ago and has released a handful of useful tools for members, but has been slow to recognize the value of the CA program and to build and release tools for partners. Category leaders are sometimes hard to recognize, but CA is definitely one.

Additional Keywords: Parse URLs from inputs. A JavaScript tool to get URLs from a chunk of text.