PHP Regex Help Needed

Status
Not open for further replies.

plepco

New member
May 24, 2007
New Orleans
www.massindexer.com
I need to take a list of URLs like this for example:

www.mysite.com all about my site
whatever.net things and stuff
http://www.mycoolsite.com cool, coolness, and cool stuff
randomsite.org
RadStuff.com

and turn them into a list of links. If there are keywords like some of the above have, the keywords or phrases should be the link without the URL being shown. If no keywords are specified, it should go ahead and show the URL as a link.

I have like 1000 URLs so I need this to be automated.

I'm a novice at PHP and regex confuses the hell outta me, but here's what I have so far:
PHP:
<?php
$urls=$_POST['urls'];  
$urls=str_replace('http://www.', 'http://', $urls);
$urls=preg_replace("/(?:(http:\/\/)|(www\.))(\S+\b\/?)([ [:punct:]]*)(\s|$)/i", 
    "<a href=\"http://$2$3\">$1$2$3</a>$4$5", $urls);
$urls=stripslashes($urls);
echo $urls;
?>
So I have a form where I paste my URL list, and it outputs the URLs as links. I can't figure out how to get the keywords to be links without showing the URLs!

So how do I get this working?
 


Try splitting the list into lines that contain only a link and lines that contain a link plus keywords. That way you can make rules based on whether or not you should use keywords.
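
A rough sketch of that split, assuming one entry per line and that a space after the first token means keywords follow. The `$list` sample and the variable names are made up for illustration:

```php
<?php
// Hypothetical sample input; in the original form this would come from $_POST['urls'].
$list = "www.mysite.com all about my site\nrandomsite.org\nwhatever.net things and stuff";

$withKeywords = array();
$urlOnly = array();

// Split on any run of CR/LF characters and skip empty pieces.
foreach (preg_split('/[\r\n]+/', $list, -1, PREG_SPLIT_NO_EMPTY) as $line) {
    $line = trim($line);
    if ($line === '') {
        continue; // line held only whitespace
    }
    // A space inside the trimmed line means something follows the URL.
    if (strpos($line, ' ') !== false) {
        $withKeywords[] = $line;
    } else {
        $urlOnly[] = $line;
    }
}

print_r($withKeywords);
print_r($urlOnly);
```

Each group can then get its own rule: keywords as the link text for the first, the URL itself for the second.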
 
Here is a sample of how to build a link from a line that has the URL followed by some keywords. I didn't use regexp, as it's much too early in the morning for that ;)

Code:
<?php
	$t = 'http://www.mycoolsite.com cool, coolness, and cool stuff';
	
	$t = explode(' ', $t);
	$url = array_shift($t);
	$t = implode(' ', $t);
	echo '<a href="'.$url.'">'.$t.'</a>';
?>
 
Code:
(\/[0-9A-Za-z._&?=%\/~-]+\.html)

That will grab all URLs ending in .html. Change .html to whatever you want to grab, or tweak it to grab all extensions. Regex is a bit confusing at first but super powerful once you understand it, even though you're not supposed to use them often because it slows down your run time.
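
A small sketch of how that pattern could be run with preg_match_all. The `$html` sample is invented, and `#` is used as the delimiter only so the slashes in the pattern don't clash with it:

```php
<?php
// Hypothetical sample markup with site-relative links.
$html = '<a href="/docs/intro.html">Intro</a> '
      . '<a href="/about/team.html">Team</a> '
      . '<a href="/img/logo.png">Logo</a>';

// The pattern from the post, wrapped in # delimiters.
$pattern = '#(\/[0-9A-Za-z._&?=%\/~-]+\.html)#';

// $matches[1] collects everything captured by the first group.
preg_match_all($pattern, $html, $matches);

print_r($matches[1]);
```

Note that on a full URL such as http://example.com/page.html the match would start at the first slash, so the `http:` part is not captured.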
 
...you're not supposed to use them often because it slows down your run time.

Yeah. I still resort to implodes/explodes as often as I can. Not just because it should be faster, but because I haven't fully grokked regexp yet.
 
edit: lol.. Ignore the previous msg. I did not fully understand your initial question.


Since you don't need to find out what the anchor text is, I don't see a need
to extract the url. Just extract the entire <a> tag.

Code:
<?php

//$urls=$_POST['urls']; 
$urls='<br><a href="http://www.mysite.com">www.mysite.com</a> all about my site<br>whatever.net things and stuff<br><a href="http://www.mycoolsite.com" target="_blank">http://www.mycoolsite.com</a> cool, coolness, and cool stuff<br>randomsite.org<br><a href="http://radstuff.com" target="_blank">RadStuff.com</a>';  

echo "Links <br><ul>";

$status= preg_match_all('/<a (.*?)>(.*?)<\/a>/i', $urls, $regs);
$links = $regs[0];
foreach ($links as $value){
 echo "<li>$value</li>";
}
echo "</ul>"; 

?>
Cheers
 
I hope I understood your question this time. :)

Code:
<?php

$urls = 'www.mysite.com all about my site
whatever.net things and stuff
http://www.mycoolsite.com cool, coolness, and cool stuff
randomsite.org
RadStuff.com';

// split() was removed in PHP 7; preg_split() does the same job.
$lines = preg_split('/[\r\n]+/', $urls, -1, PREG_SPLIT_NO_EMPTY);
foreach ($lines as $line) {
    // The first space-separated token is the URL; the rest, if any, is the link text.
    $parts = explode(' ', trim($line), 2);
    $url = $parts[0];
    $text = isset($parts[1]) ? trim($parts[1]) : '';
    if ($text === '') {
        $text = $url; // no keywords: show the URL itself
    }

    // Add a scheme only when the URL does not already start with one.
    $href = preg_match('/^https?:\/\//i', $url) ? $url : 'http://' . $url;

    echo "<br><a href='" . htmlspecialchars($href, ENT_QUOTES) . "'>"
        . htmlspecialchars($text, ENT_QUOTES) . "</a><br>";
}
?>
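
Since the original question asked for regex, here is a rough regex-only variant for comparison. It is a sketch under a few assumptions: one entry per line, the first token is the URL, and PHP 5.3+ for the anonymous function. The function name `linkify_list` is made up, not something from this thread:

```php
<?php
// Hypothetical helper: turns "url [keywords]" lines into <a> tags.
function linkify_list($list)
{
    // Normalise line endings so the multi-line anchors behave the same everywhere.
    $list = str_replace(array("\r\n", "\r"), "\n", $list);

    // Per line: optional scheme (group 1), rest of the URL (group 2),
    // optional keywords (group 3).
    $pattern = '/^[ \t]*((?:https?:\/\/)?)(\S+)[ \t]*(.*?)[ \t]*$/mi';

    return preg_replace_callback($pattern, function ($m) {
        // Default to http:// when the line gave no scheme.
        $scheme = $m[1] !== '' ? $m[1] : 'http://';
        $href = $scheme . $m[2];
        // Keywords become the link text; otherwise show the URL as typed.
        $text = $m[3] !== '' ? $m[3] : $m[1] . $m[2];

        return '<a href="' . htmlspecialchars($href, ENT_QUOTES) . '">'
            . htmlspecialchars($text, ENT_QUOTES) . '</a>';
    }, $list);
}

$urls = "www.mysite.com all about my site\nrandomsite.org\n"
      . "http://www.mycoolsite.com cool, coolness, and cool stuff";

// nl2br() keeps one link per line in the browser.
echo nl2br(linkify_list($urls));
```

Blank lines are left alone because the pattern needs at least one non-space character to match.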
 
Yeah. I still resort to implodes/explodes as often as I can. Not just because it should be faster, but because I haven't fully grokked regexp yet.

The explode function still works well across multiple hosting accounts, but you should avoid the implode function for certain things. I find that certain hosting accounts (mainly GoDaddy) will give you really fucked up results when doing shit with the implode function. Instead you should use CURL for better cross-hosting compatibility.
 
Use CURL instead of implode? ... WTF you on?

Besides, curl is most often disabled on shared hosting servers.
 
Yeah. Using Curl is overkill to join an array together ;)

It might be overkill and it might be disabled on shared hosting, but go try and deal with it when you've got GoDaddy as a hosting provider and you're trying to implode something inside of an RSS feed... You'll be fucked. That's the only reason why I just don't use it. For me it takes a whole 30 seconds to write up a CURL command over an implode, or if CURL isn't right for the situation either, I'll go off and use something that is.

Maybe I'm the only one who's had bad experiences with implode, who knows, haha.
 
Are we talking about the same command? I use implode (PHP: implode - Manual) to join an array of bits together into a string. CURL is for getting files off the net.

I have never heard of a host that didn't support the implode command.

Oh, and I would love to see you use CURL to join the array below with spaces between the bits.

Code:
$array = array('Joined', 'by', 'curl.');
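
For reference, a minimal sketch of the implode version of that join. implode() is part of PHP's core string functions, so it should not depend on any extension being enabled:

```php
<?php
$array = array('Joined', 'by', 'curl.');

// Glue the pieces together with a single space between them.
$joined = implode(' ', $array);

echo $joined; // prints: Joined by curl.
```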
 
Haha, I know it sounds fucked up, and you're 100% right: implode joins strings together and CURL grabs pages. But I've had a situation before where I could not use implode because of the way GoDaddy has their shit set up. I don't know what it was, but it was fucked from the start.
 