Punctuation in URLs

Hello, all.

Google's webmaster tools suite is reporting an ever-larger number (currently 22) of incorrect links pointing to my site, and most of them originate from the Hydrogenaudio forums. I think the problem is that the forum software tries to automatically turn plain (not marked up) text into links, and accidentally includes the punctuation that follows the URL, such as periods and closing parentheses or brackets. This sends users to my site looking for a page that ends with ".html)" or ".html." instead of just ".html", so they get a 404 "Not found" error page. I'm guessing that other webmasters are seeing the same thing.

I've now tarted up my error page so that my site does its best to guess where the user is trying to get to, and offers them a valid link. So it's not so much a problem for me now, but it probably harms Hydrogenaudio's reputation in search engines, because the web crawlers will likely report that the forums contain dud links.
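For anyone wanting to do something similar on their own error page, here is a rough sketch of the guessing step in Python. It is not the code my site actually runs; the function name `suggest_path`, the `TRAILING` character set and the idea of keeping a set of known paths are all assumptions made for illustration.

```python
from typing import Optional, Set

# Characters that often get glued onto the end of an auto-linked URL (assumed list).
TRAILING = ".,;:!?)]}>'\""


def suggest_path(requested: str, known_paths: Set[str]) -> Optional[str]:
    """Drop trailing punctuation one character at a time and return the
    first shortened path that exists on the site, or None if none does."""
    candidate = requested
    while candidate and candidate[-1] in TRAILING:
        candidate = candidate[:-1]
        if candidate in known_paths:
            return candidate
    return None


if __name__ == "__main__":
    pages = {"/docs/page.html"}
    print(suggest_path("/docs/page.html)", pages))
    print(suggest_path("/docs/page.html...", pages))
```

The error page can then offer the returned path as a "did you mean" link, and fall back to the plain 404 text when the function returns None.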

So I wondered whether there was a way to tweak the HA forum software so that it tested whether its best guess at what constitutes a URL is actually valid.
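To make the idea concrete, here is a minimal sketch in Python of what such a test might look like. I don't know how the forum software is written, so everything here is hypothetical: `candidates`, `first_valid` and the `SUSPECT` character set are made-up names, and `is_live` stands in for whatever check the software could afford, for example an HTTP HEAD request made when the post is saved.

```python
from typing import Callable, Iterator, Optional

# Trailing characters that might be sentence punctuation rather than part of the URL (assumed list).
SUSPECT = ".,;:!?)]}"


def candidates(raw: str) -> Iterator[str]:
    """Yield the text as typed, then progressively shorter versions
    with one more trailing punctuation character removed each time."""
    yield raw
    while raw and raw[-1] in SUSPECT:
        raw = raw[:-1]
        yield raw


def first_valid(raw: str, is_live: Callable[[str], bool]) -> Optional[str]:
    """Return the first candidate that the supplied check accepts, or None."""
    for url in candidates(raw):
        if is_live(url):
            return url
    return None


if __name__ == "__main__":
    # A stand-in check: pretend only this one page exists.
    existing = {"http://example.com/page.html"}
    print(first_valid("http://example.com/page.html).", existing.__contains__))
```

Trying the text exactly as typed first means a URL that really does end in punctuation is left alone, and the shorter guesses are only used when the longer one fails.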

I did search to see whether this had been mentioned before, but didn't find anything. If this has been covered here at length before now, please point me to the relevant thread and accept my apologies.

Reply #1
Well, there is code in the forum software that automatically cleans up URLs ending in punctuation characters, but it only handles a trailing . , ? or !
Also, it doesn't catch something like ".html..." anyway.
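For illustration only, a cleanup that also copes with repeated dots and stray closing brackets could look roughly like the Python sketch below. This is not the forum's actual code; `trim_url`, the extra `;` and `:` characters and the bracket-balancing rule are all assumptions.

```python
# Closing brackets mapped to their opening partners.
CLOSERS = {")": "(", "]": "[", "}": "{"}
# The four characters mentioned above, plus ; and : (the last two are an assumption).
PUNCT = ".,?!;:"


def trim_url(url: str) -> str:
    """Strip trailing punctuation repeatedly, and strip a trailing closing
    bracket only when the URL has more closers than matching openers."""
    while url:
        last = url[-1]
        if last in PUNCT:
            url = url[:-1]
        elif last in CLOSERS and url.count(last) > url.count(CLOSERS[last]):
            url = url[:-1]
        else:
            break
    return url


if __name__ == "__main__":
    print(trim_url("http://example.com/page.html..."))
    print(trim_url("http://example.com/page.html)"))
    print(trim_url("http://example.com/Foo_(bar)"))
```

Looping until nothing more can be removed is what handles the ".html..." case, and counting brackets keeps URLs that legitimately end in a balanced ")" intact.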
Full-quoting makes you scroll past the same junk over and over.