I'm trying to find certain words on the phone screen and display a bounding box around them if they are present. I follow these steps:
1. Capture a screenshot of the whole screen.
2. Pass the screenshot to the TextRecognizer object from the Google ML Kit.
3. Use the Rect returned by the ML Kit to draw on the screen.
It almost works. Here is a screenshot of the detection finding and highlighting the word hello in the notepad app:
As you can see, the semi-transparent yellow boxes are not quite on the hellos.
Here are the relevant code samples. Passing the screenshot bitmap to the ML Kit:
InputImage image = InputImage.fromBitmap(screenshotBitmap, 0);
//I checked: image, screen, and overlay view dimensions are exactly the same.
TextRecognizer recognizer = TextRecognition.getClient();
recognizer.process(image)
        .addOnSuccessListener(this::processText);
The processText method, which gets the recognized words:
for (Text.Element element : getElements()) {
    String elementText = element.getText();
    Rect bounds = element.getBoundingBox(); // Getting the bounding box
    if (elementText.equalsIgnoreCase("hello")) { // hello is hardcoded for now
        addHighlightCard(bounds.left, bounds.top, bounds.width(), bounds.height());
    }
}
And finally, the addHighlightCard method, which creates and positions the views you see on the screenshot. It uses a fullscreen overlay with a RelativeLayout, because this layout allows me to specify the exact location and width of child views.
public void addHighlightCard(int x, int y, int width, int height) {
    View highlightCard = inflater.inflate(R.layout.highlight_card, overlayRoot, false);
    RelativeLayout.LayoutParams params = new RelativeLayout.LayoutParams(width, height);
    params.leftMargin = x;
    params.topMargin = y;
    highlightCard.setLayoutParams(params);
    overlayRoot.addView(highlightCard, params);
}
As you can see, there is no scaling going on whatsoever: I capture the whole screen, and I use a layout which fills the whole screen (even the toolbar). So I thought the coordinates returned by the ML Kit should be directly usable to draw to the screen. But clearly I'm wrong. It seems the image is getting scaled down somewhere, but I can't figure out where.
SOLUTION: It turned out that the incorrect size of the Media Projection API virtual display caused the misaligned bounding boxes. Instead of making this question even longer, I will post a link here to a GitHub repository, where you can find a sample app which shows a working way of using the Media Projection API and performing text recognition on the screenshots.
Sample app: test-text-recognition
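To illustrate why a wrongly sized virtual display shifts the boxes: ML Kit reports coordinates in the pixels of the image it was given, so if that image is not the same size as the overlay, each box would have to be scaled by the ratio of the two sizes before drawing. The following is only a plain-Java sketch of that arithmetic. The class name BoxScaler, the method scaleToOverlay, and the dimensions are made up for illustration and are not taken from the sample app, where the actual fix is to create the virtual display at the correct size.

```java
import java.util.Arrays;

final class BoxScaler {
    /**
     * Maps a box given as {left, top, right, bottom} from image pixels
     * to overlay pixels by scaling each axis independently.
     */
    static int[] scaleToOverlay(int[] box, int imageW, int imageH,
                                int overlayW, int overlayH) {
        double sx = (double) overlayW / imageW;
        double sy = (double) overlayH / imageH;
        return new int[] {
            (int) Math.round(box[0] * sx),
            (int) Math.round(box[1] * sy),
            (int) Math.round(box[2] * sx),
            (int) Math.round(box[3] * sy)
        };
    }

    public static void main(String[] args) {
        // Hypothetical sizes: the capture is 1080x1920, the overlay is 1080x2160.
        int[] box = {100, 200, 300, 240};
        int[] scaled = scaleToOverlay(box, 1080, 1920, 1080, 2160);
        System.out.println(Arrays.toString(scaled)); // [100, 225, 300, 270]
    }
}
```

With equal widths and a shorter capture, only the vertical coordinates move, which matches the kind of drift visible when boxes sit slightly above the words.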
I see 4 potential problems with your code.
When you create your highlight card here:
public void addHighlightCard(int x, int y, int width, int height) {
    ...
    params.leftMargin = x;
    params.topMargin = y;
    ...
}
You assign absolute (screen) coordinates x and y rather than coordinates relative to your RelativeLayout. That is wrong whenever the RelativeLayout is itself offset relative to the device screen.
To assign correct coordinates, calculate the screen coordinates of your RelativeLayout first, and then adjust x and y based on those coordinates. For instance:
public void addHighlightCard(int x, int y, int width, int height) {
    ...
    int[] screenCoordinates = new int[2];
    overlayRoot.getLocationOnScreen(screenCoordinates);
    int xOffset = screenCoordinates[0];
    int yOffset = screenCoordinates[1];
    params.leftMargin = x - xOffset;
    params.topMargin = y - yOffset;
    ...
}
However, if your root View takes over the whole screen, it shouldn't be a problem.
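As a worked example of that adjustment, here is a plain-Java sketch of the subtraction with made-up numbers. OverlayOffset and toParentMargins are hypothetical names, and the 63-pixel offset is just an assumed status bar height, not a value measured from your app.

```java
import java.util.Arrays;

final class OverlayOffset {
    /**
     * Converts a point in screen coordinates into {leftMargin, topMargin}
     * relative to a parent whose top-left corner is at (parentX, parentY)
     * on the screen.
     */
    static int[] toParentMargins(int screenX, int screenY, int parentX, int parentY) {
        return new int[] {screenX - parentX, screenY - parentY};
    }

    public static void main(String[] args) {
        // Assumption: the parent starts below a 63 px status bar.
        int[] margins = toParentMargins(120, 400, 0, 63);
        System.out.println(Arrays.toString(margins)); // [120, 337]

        // A parent that really starts at (0, 0) leaves the coordinates unchanged.
        System.out.println(Arrays.toString(toParentMargins(120, 400, 0, 0))); // [120, 400]
    }
}
```

If the offset is skipped in the first case, the highlight is drawn 63 pixels too low, which is the kind of constant shift this point is about.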
Using a RelativeLayout for the overlay may also be a problem: if you want to add a new View on top of another one, a FrameLayout should be used instead. However, I cannot say for sure that this is the cause, because I do not see the full code.
You haven't shown how exactly you capture the screen with MediaProjection, so that could also be a problem. I used a different way to capture the screen, which you can see below.
You're inflating a View with the LayoutInflater to highlight the found text. For a test, I did it a bit differently, by combining a ShapeDrawable and a View like this:
...
ShapeDrawable drawable = new ShapeDrawable();
drawable.getPaint().setColor(Color.YELLOW);
drawable.getPaint().setStyle(Paint.Style.STROKE);
drawable.getPaint().setStrokeWidth(5f);
View shapeView = new View(decorView.getContext());
shapeView.setBackground(drawable);
...
The full code will be provided below.
Since you mentioned that your RelativeLayout takes over the whole screen, I decided to create a sample project to demonstrate that a project similar to yours works just fine.
Below is the explanation and relevant code.
plugins {
    id 'com.android.application'
}

android {
    compileSdkVersion 30
    buildToolsVersion "30.0.2"

    defaultConfig {
        applicationId "com.example.myapplication"
        minSdkVersion 24
        targetSdkVersion 30
        versionCode 1
        versionName "1.0"
        testInstrumentationRunner "androidx.test.runner.AndroidJUnitRunner"
    }

    buildTypes {
        release {
            minifyEnabled false
            proguardFiles getDefaultProguardFile('proguard-android-optimize.txt'), 'proguard-rules.pro'
        }
    }

    compileOptions {
        sourceCompatibility JavaVersion.VERSION_1_8
        targetCompatibility JavaVersion.VERSION_1_8
    }
}

dependencies {
    implementation 'androidx.appcompat:appcompat:1.2.0'
    implementation 'com.google.android.material:material:1.3.0'
    implementation 'com.google.android.gms:play-services-mlkit-text-recognition:16.1.3'
    testImplementation 'junit:junit:4.+'
    androidTestImplementation 'androidx.test.ext:junit:1.1.2'
    androidTestImplementation 'androidx.test.espresso:espresso-core:3.3.0'
}
Here, to take a screenshot, I'm using the following piece of code:
Bitmap bitmap = Bitmap.createBitmap(decorView.getWidth(),
decorView.getHeight(), Bitmap.Config.ARGB_8888);
Canvas canvas = new Canvas(bitmap);
decorView.draw(canvas);
InputImage image = InputImage.fromBitmap(bitmap, 0);
I'm doing this in an OnGlobalLayoutListener to make sure that the decor view has its proper width and height. The full code for the class is below:
import android.content.Context;
import android.graphics.Bitmap;
import android.graphics.Canvas;
import android.graphics.Color;
import android.graphics.Paint;
import android.graphics.Rect;
import android.graphics.drawable.ShapeDrawable;
import android.os.Bundle;
import android.view.LayoutInflater;
import android.view.View;
import android.view.ViewGroup;
import android.view.ViewTreeObserver;
import android.widget.FrameLayout;
import android.widget.TextView;

import androidx.annotation.NonNull;
import androidx.annotation.Nullable;
import androidx.appcompat.app.AppCompatActivity;
import androidx.recyclerview.widget.LinearLayoutManager;
import androidx.recyclerview.widget.RecyclerView;

import com.google.android.gms.tasks.OnSuccessListener;
import com.google.mlkit.vision.common.InputImage;
import com.google.mlkit.vision.text.Text;
import com.google.mlkit.vision.text.TextRecognition;
import com.google.mlkit.vision.text.TextRecognizer;

public class MainActivity extends AppCompatActivity {

    @Override
    protected void onCreate(@Nullable Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);

        RecyclerView recyclerView = findViewById(R.id.recycler_view);
        recyclerView.setAdapter(new RecyclerViewAdapter(this));
        recyclerView.setLayoutManager(new LinearLayoutManager(this));

        View decorView = getWindow().getDecorView();
        decorView.getViewTreeObserver().addOnGlobalLayoutListener(new ViewTreeObserver.OnGlobalLayoutListener() {
            @Override
            public void onGlobalLayout() {
                // Run once, after the decor view has been measured and laid out
                decorView.getViewTreeObserver().removeOnGlobalLayoutListener(this);

                // Take a screenshot of the window by drawing the decor view into a bitmap
                Bitmap bitmap = Bitmap.createBitmap(decorView.getWidth(),
                        decorView.getHeight(), Bitmap.Config.ARGB_8888);
                Canvas canvas = new Canvas(bitmap);
                decorView.draw(canvas);

                InputImage image = InputImage.fromBitmap(bitmap, 0);
                TextRecognizer recognizer = TextRecognition.getClient();
                recognizer.process(image).addOnSuccessListener(new OnSuccessListener<Text>() {
                    @Override
                    public void onSuccess(Text text) {
                        for (Text.TextBlock textBlock : text.getTextBlocks()) {
                            if ("hello".equalsIgnoreCase(textBlock.getText())) {
                                Rect box = textBlock.getBoundingBox();
                                int left = box.left;
                                int top = box.top;
                                int right = box.right;
                                int bottom = box.bottom;

                                // Yellow outline used as the highlight
                                ShapeDrawable drawable = new ShapeDrawable();
                                drawable.getPaint().setColor(Color.YELLOW);
                                drawable.getPaint().setStyle(Paint.Style.STROKE);
                                drawable.getPaint().setStrokeWidth(5f);

                                View shapeView = new View(decorView.getContext());
                                shapeView.setBackground(drawable);

                                // Convert the box position to coordinates relative to the root view
                                FrameLayout rootView = findViewById(R.id.root_view);
                                int[] location = new int[2];
                                rootView.getLocationOnScreen(location);

                                FrameLayout.LayoutParams params = new FrameLayout.LayoutParams(right - left,
                                        bottom - top);
                                // The size is fixed above, so only the left and top margins position the view
                                params.setMargins(left - location[0],
                                        top - location[1],
                                        0,
                                        0);
                                rootView.addView(shapeView, params);
                            }
                        }
                    }
                });
            }
        });
    }

    private static class RecyclerViewAdapter extends RecyclerView.Adapter<RecyclerViewAdapter.RecyclerViewHolder> {

        private final Context context;
        private final String[] elements = new String[] {"Hello", "Hello", "Bye", "Hello", "Hi there", "Hello"};

        private RecyclerViewAdapter(Context context) {
            this.context = context;
        }

        @NonNull
        @Override
        public RecyclerViewHolder onCreateViewHolder(@NonNull ViewGroup parent, int viewType) {
            View item = LayoutInflater.from(context)
                    .inflate(R.layout.list_item, parent, false);
            return new RecyclerViewHolder(item);
        }

        @Override
        public void onBindViewHolder(@NonNull RecyclerViewHolder holder, int position) {
            holder.textView.setText(elements[position]);
        }

        @Override
        public int getItemCount() {
            return elements.length;
        }

        public static class RecyclerViewHolder extends RecyclerView.ViewHolder {

            private final TextView textView;

            public RecyclerViewHolder(@NonNull View itemView) {
                super(itemView);
                this.textView = itemView.findViewById(R.id.element_view);
            }
        }
    }
}
<?xml version="1.0" encoding="utf-8"?>
<FrameLayout xmlns:android="http://schemas.android.com/apk/res/android"
    android:id="@+id/root_view"
    android:layout_width="match_parent"
    android:layout_height="match_parent">

    <LinearLayout xmlns:android="http://schemas.android.com/apk/res/android"
        android:layout_width="match_parent"
        android:layout_height="match_parent"
        android:paddingStart="30dp"
        android:orientation="vertical">

        <androidx.recyclerview.widget.RecyclerView
            android:id="@+id/recycler_view"
            android:layout_width="match_parent"
            android:layout_height="match_parent"
            android:scrollbars="vertical" />
    </LinearLayout>
</FrameLayout>
As you can see, I'm using a FrameLayout as the root view.
<?xml version="1.0" encoding="utf-8"?>
<LinearLayout xmlns:android="http://schemas.android.com/apk/res/android"
    android:layout_width="match_parent"
    android:layout_height="wrap_content"
    android:background="?android:attr/selectableItemBackground"
    android:orientation="vertical">

    <TextView
        android:id="@+id/element_view"
        android:layout_width="match_parent"
        android:layout_height="wrap_content"
        android:paddingTop="16dp"
        android:paddingBottom="8dp"
        android:fontFamily="google-sans-medium"/>

    <View
        android:layout_width="match_parent"
        android:layout_height="1dp"
        android:background="#000"/>
</LinearLayout>
Nothing special with the layout, just a simple one for the RecyclerView items.
All 4 "Hello" results are highlighted in yellow.
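One caveat about the sample: it compares the text of whole TextBlocks, which works here because every list row contains a single word. For a word inside longer text you would need to go down to the elements, as in the question (a TextBlock exposes getLines(), and each Text.Line exposes getElements()). The plain-Java sketch below only models that difference. WordMatcher and its methods are hypothetical names, and splitting on whitespace is a stand-in for ML Kit's own segmentation, not how ML Kit actually works internally.

```java
import java.util.Arrays;
import java.util.List;

final class WordMatcher {
    /** Counts blocks whose entire text equals the target, ignoring case. */
    static int countBlockMatches(List<String> blocks, String target) {
        int count = 0;
        for (String block : blocks) {
            if (target.equalsIgnoreCase(block)) {
                count++;
            }
        }
        return count;
    }

    /** Counts individual words equal to the target, ignoring case. */
    static int countWordMatches(List<String> blocks, String target) {
        int count = 0;
        for (String block : blocks) {
            // Whitespace splitting stands in for line/element segmentation.
            for (String word : block.trim().split("\\s+")) {
                if (target.equalsIgnoreCase(word)) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> blocks = Arrays.asList("Hello", "Hi there", "Hello world");
        System.out.println(countBlockMatches(blocks, "hello")); // 1
        System.out.println(countWordMatches(blocks, "hello"));  // 2
    }
}
```

The block-level comparison misses the "Hello" inside "Hello world", while the word-level comparison finds both occurrences.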
Make sure you retrieve the display size correctly when you are not doing it from an Activity (in your GitHub project you retrieve it from a Service), because you need the real display size, not a size with the system decorations subtracted. So, do it as below:
// get width and height
WindowManager wm = (WindowManager) getApplicationContext().getSystemService(Context.WINDOW_SERVICE);
Display display = wm.getDefaultDisplay();
Point size = new Point();
display.getRealSize(size);
mWidth = size.x;
mHeight = size.y;
So, in your sample you have to change your method to:
private void createVirtualDisplay() {
    // get width and height
    WindowManager wm = (WindowManager) getApplicationContext().getSystemService(Context.WINDOW_SERVICE);
    Display display = wm.getDefaultDisplay();
    Point size = new Point();
    display.getRealSize(size);
    mWidth = size.x;
    mHeight = size.y;
    ...
}
That's it.