I am collecting data from Twitter and processing it, but I have a problem: the text is dirty.
Example:
String dirtyText="this*is#a*&very_dirty&String";
Another example:
String dirtyText="All f dis happnd bcause u gave ur time, talent n passion.";
Please, I want the solution to be as simple as possible.
This is not an easy problem to solve. "All f dis happnd" could be "cleaned" to produce "All *of* this happened" or "All *if* this happened". For the first example, you can merely replace all non-alphabetic characters with spaces. See this question for how to do that.
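As a minimal sketch of the replacement for the first example, assuming plain ASCII letters are all you want to keep (the class name TweetCleaner and the method stripSymbols are made up for illustration):

```java
public class TweetCleaner {
    // Replace every run of non-letter characters with a single space,
    // then trim any space left at the ends.
    // Assumption: only ASCII letters count as "clean" characters.
    static String stripSymbols(String dirty) {
        return dirty.replaceAll("[^A-Za-z]+", " ").trim();
    }

    public static void main(String[] args) {
        String dirtyText = "this*is#a*&very_dirty&String";
        System.out.println(stripSymbols(dirtyText)); // prints "this is a very dirty String"
    }
}
```

If the tweets may contain accented or non-Latin letters, the pattern "[^\\p{L}]+" is probably a better fit than "[^A-Za-z]+".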
Otherwise, I think you would need a natural language processor, or at the very least a spell checker. Guessing what a tweet should be in correct English is an extremely complex problem. Take a look at Jazzy for an open source spell checker.
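If a full spell checker is more than you need, a crude stand-in is a hand-made lookup table of common abbreviations. The sketch below is not Jazzy and not real spell checking; the class name SlangExpander and the table entries are assumptions chosen to match the second example, and it needs Java 9 or later for Map.of:

```java
import java.util.Map;
import java.util.StringJoiner;

public class SlangExpander {
    // Hypothetical, hand-picked table; a usable one would be far larger.
    // "f" is deliberately left out because it could mean "of" or "if".
    static final Map<String, String> SLANG = Map.of(
        "dis", "this",
        "happnd", "happened",
        "bcause", "because",
        "u", "you",
        "ur", "your",
        "n", "and");

    // Split on whitespace and swap each token that appears in the table.
    // Tokens with attached punctuation (such as "time,") are left unchanged.
    static String expand(String text) {
        StringJoiner out = new StringJoiner(" ");
        for (String token : text.split("\\s+")) {
            out.add(SLANG.getOrDefault(token.toLowerCase(), token));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String dirtyText = "All f dis happnd bcause u gave ur time, talent n passion.";
        // prints "All f this happened because you gave your time, talent and passion."
        System.out.println(expand(dirtyText));
    }
}
```

Note that the ambiguous "f" survives untouched, which is exactly the kind of case where a dictionary alone cannot decide and some language context is needed.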