Thanks to this answer over here, I have been using the following code to validate URLs. The problem is that there are so many possible options now, with the new .anything
domains appearing lately, that a hardcoded list of TLDs cannot keep up. So I figured that whatever Twitter treats as a URL (while posting a tweet), I will treat as a URL too, to follow the standard, so to say!
I want to know how Twitter validates a URL. Is there a library that Twitter uses and that I could use as well? Please help me solve this common problem. Thanks a ton!
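    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Returns every substring of input that the pattern below treats as a URL.
    // Note: the pattern requires leading whitespace, which is part of each match.
    public static List<String> extractUrls(String input) {
        List<String> result = new ArrayList<String>();
        Pattern pattern = Pattern.compile(
            // leading whitespace, then an optional scheme or "www." prefix
            "(\\s)+\\b(((ht|f)tp(s?)\\:\\/\\/|~\\/|\\/)|(www\\.)?)" +
            // optional user:password, host labels, hardcoded TLD list, optional port
            "(\\w+:\\w+)?(([-\\w]+\\.)+(com|org|net|gov" +
            "|mil|biz|info|mobi|name|aero|jobs|museum|club" +
            "|travel|[a-z]{2}))(:[\\d]{1,5})?" +
            // path segments
            "(((\\/([-\\w~!$+|.,=]|%[a-f\\d]{2})+)+|\\/)+|\\?|#)?" +
            // query string: first parameter, then further &-separated parameters
            "((\\?([-\\w~!$+|.,*:]|%[a-f\\d]{2})+=?" +
            "([-\\w~!$+|.,*:=]|%[a-f\\d]{2})*)" +
            "(&(?:[-\\w~!$+|.,*:]|%[a-f\\d]{2})+=?" +
            "([-\\w~!$+|.,*:=]|%[a-f\\d]{2})*)*)*" +
            // fragment
            "(#([-\\w~!$+|.,*:=]|%[a-f\\d]{2})*)?\\b");
        Matcher matcher = pattern.matcher(input);
        while (matcher.find()) {
            result.add(matcher.group());
        }
        return result;
    }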
Twitter publishes the twitter-text library, which offers a lot of text-processing options, including URL extraction. Here is the relevant Java code: https://github.com/twitter/twitter-text/tree/master/java. If you want to do this on the client side, you can use the code from https://github.com/twitter/twitter-text
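
If pulling in the library is not an option, the hardcoded TLD list from the question can at least be relaxed. What follows is only a rough, stdlib-only sketch and is not how twitter-text works: the class name `UrlSketch` and the `CANDIDATE` pattern are made up for illustration. It accepts any alphabetic suffix of two or more letters as a TLD and then lets `java.net.URI` reject candidates that have no parsable host.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of twitter-text.
final class UrlSketch {

    // Deliberately loose: optional scheme, dot-separated host labels,
    // any alphabetic TLD of 2+ letters, optional port, optional path/query/fragment.
    private static final Pattern CANDIDATE = Pattern.compile(
        "(?i)\\b(?:(?:https?|ftp)://)?(?:[a-z0-9-]+\\.)+[a-z]{2,}"
        + "(?::\\d{1,5})?(?:[/?#][^\\s<>\"]*)?");

    static List<String> extractUrls(String input) {
        List<String> result = new ArrayList<>();
        Matcher matcher = CANDIDATE.matcher(input);
        while (matcher.find()) {
            // Drop sentence punctuation that the greedy path part swallowed.
            String candidate = matcher.group().replaceAll("[.,;:!?)]+$", "");
            // java.net.URI needs a scheme to recognise the host part.
            boolean hasScheme = candidate.matches("(?i)^(?:https?|ftp)://.*");
            String withScheme = hasScheme ? candidate : "http://" + candidate;
            try {
                // getHost() is null when the authority is not a valid host name.
                if (new URI(withScheme).getHost() != null) {
                    result.add(candidate);
                }
            } catch (URISyntaxException e) {
                // Not parsable as a URI at all: skip this candidate.
            }
        }
        return result;
    }
}
```

Because any alphabetic suffix counts as a TLD here, a string such as `file.txt` also passes. Ruling those out needs a maintained list of real TLDs, which is the main reason to prefer a maintained library over a hand-rolled pattern.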