android overlay text-recognition google-mlkit

Android: locating words on the screen. Google ML Kit bounding boxes are off a bit


I'm trying to find certain words on the phone screen and, if they are present, display a bounding box around them. I follow these steps:

  1. Capture entire screen contents (with MediaProjection API).
  2. Pass this screenshot to a TextRecognizer object from the Google ML Kit.
  3. Check the detected words and, when one matches, use the Rect returned by ML Kit to draw on the screen.

It almost works. Here is a screenshot of the detection finding and highlighting the word hello in the notepad app:

screenshot

As you can see, the semi-transparent yellow boxes are not quite on the hellos.

Here are the relevant code samples. Passing the screenshot bitmap to the ML Kit:

InputImage image = InputImage.fromBitmap(screenshotBitmap, 0);
//I checked: image, screen, and overlay view dimensions are exactly the same.
TextRecognizer recognizer = TextRecognition.getClient();
recognizer.process(image)
          .addOnSuccessListener(this::processText);

The processText method which gets the recognized words:

 for (Text.Element element : getElements()) {
      String elementText = element.getText(); 
      Rect bounds = element.getBoundingBox(); //Getting the bounding box
      if (elementText.equalsIgnoreCase("hello")) { //hello is hardcoded for now
          addHighlightCard(bounds.left, bounds.top, bounds.width(), bounds.height());
      }
 }

And finally, the addHighlightCard, which creates and positions the views you see on the screenshot. It uses a fullscreen overlay, with a RelativeLayout, because this layout allows me to specify the exact location and width of child views.

public void addHighlightCard(int x, int y, int width, int height) {
    View highlightCard = inflater.inflate(R.layout.highlight_card, overlayRoot, false);
    RelativeLayout.LayoutParams params = new RelativeLayout.LayoutParams(width, height);
    params.leftMargin = x;
    params.topMargin = y;
    highlightCard.setLayoutParams(params);
    overlayRoot.addView(highlightCard, params);
}

As you can see, there is no scaling going on whatsoever: I capture the whole screen, and I use a layout which fills the whole screen (even the toolbar). So I thought the coordinates returned by ML Kit should be directly usable to draw on the screen. But clearly I'm wrong. It seems the image is getting scaled down somewhere, but I can't figure out where.

SOLUTION: It turned out that the incorrect size of the Media Projection API virtual display caused the misaligned bounding boxes. Instead of making this question even longer, I will post a link here to a GitHub repository, where you can find a sample app which shows a working way of using the Media Projection API and performing text recognition on the screenshots.

Sample app: test-text-recognition


Solution

  • Analysis

    I see 4 potential problems with your code.

    Usage of screen coordinates

    When you create your highlight card here:

    public void addHighlightCard(int x, int y, int width, int height) {
        ...
        params.leftMargin = x;
        params.topMargin = y;
        ...
    } 
    

    You assign absolute (screen) coordinates x and y rather than coordinates relative to your RelativeLayout. That is wrong whenever the RelativeLayout is offset from the top-left corner of the device screen, because margins are measured from the layout's own origin.

    To assign correct coordinates, calculate screen coordinates for your RelativeLayout first, and then adjust x and y based on those coordinates. For instance:

    public void addHighlightCard(int x, int y, int width, int height) {
        ...
        int[] screenCoordinates = new int[2];
        overlayRoot.getLocationOnScreen(screenCoordinates);
        int xOffset = screenCoordinates[0];
        int yOffset = screenCoordinates[1];        
    
        params.leftMargin = x - xOffset;
        params.topMargin = y - yOffset;
        ...
    } 
    

    However, if your root View takes over the whole screen, it shouldn't be a problem.
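
The offset arithmetic can be checked without a device. The following is a minimal sketch in plain Java; the class OverlayOffset, its method toOverlayMargins, and the sample numbers are made up for illustration, and the int[] argument stands in for the array filled by getLocationOnScreen.

```java
/** Converts a point in screen coordinates into margins relative to an overlay (illustrative sketch). */
final class OverlayOffset {
    private OverlayOffset() {}

    /**
     * screenX and screenY are the box corner in screen coordinates.
     * overlayOnScreen is the overlay's top-left corner on screen, i.e. the
     * two-element array that View.getLocationOnScreen would fill in.
     * Returns {leftMargin, topMargin} relative to the overlay.
     */
    static int[] toOverlayMargins(int screenX, int screenY, int[] overlayOnScreen) {
        return new int[] {screenX - overlayOnScreen[0], screenY - overlayOnScreen[1]};
    }

    public static void main(String[] args) {
        // Hypothetical values: the overlay starts 63 px below the top of the screen.
        int[] margins = toOverlayMargins(120, 400, new int[] {0, 63});
        System.out.println(margins[0] + ", " + margins[1]); // prints "120, 337"
    }
}
```

With an overlay that really starts at (0, 0), the margins equal the screen coordinates, which is why the offset does not matter for a true full-screen overlay.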

    Usage of RelativeLayout

    I believe this may be a problem: if you want to add a new View on top of another, a FrameLayout should be used instead. However, I cannot say for sure because I do not see the full code.

    Usage of MediaProjection for screen capturing

    You haven't shown us exactly how you capture the screen with MediaProjection, so the problem may also lie there. I used a different way to capture the screen, which you can see below.

    Highlighting the text

    You're inflating a View with a LayoutInflater to highlight the found text. For a test, I did it a bit differently by combining a ShapeDrawable and a View, like this:

    ...
    ShapeDrawable drawable = new ShapeDrawable();
    drawable.getPaint().setColor(Color.YELLOW);
    drawable.getPaint().setStyle(Paint.Style.STROKE);
    drawable.getPaint().setStrokeWidth(5f);
    View shapeView = new View(decorView.getContext());
    shapeView.setBackground(drawable);
    ...
    

    The full code will be provided below.

    Solution

    Since you mentioned that your RelativeLayout takes over the whole screen, I decided to create a sample project to demonstrate that a project similar to yours is working just fine.

    Below is the explanation and relevant code.

    build.gradle

    plugins {
        id 'com.android.application'
    }
    
    android {
        compileSdkVersion 30
        buildToolsVersion "30.0.2"
    
        defaultConfig {
            applicationId "com.example.myapplication"
            minSdkVersion 24
            targetSdkVersion 30
            versionCode 1
            versionName "1.0"
    
            testInstrumentationRunner "androidx.test.runner.AndroidJUnitRunner"
        }
    
        buildTypes {
            release {
                minifyEnabled false
                proguardFiles getDefaultProguardFile('proguard-android-optimize.txt'), 'proguard-rules.pro'
            }
        }
        compileOptions {
            sourceCompatibility JavaVersion.VERSION_1_8
            targetCompatibility JavaVersion.VERSION_1_8
        }
    }
    
    dependencies {
    
        implementation 'androidx.appcompat:appcompat:1.2.0'
        implementation 'com.google.android.material:material:1.3.0'
        implementation 'com.google.android.gms:play-services-mlkit-text-recognition:16.1.3'
        testImplementation 'junit:junit:4.+'
        androidTestImplementation 'androidx.test.ext:junit:1.1.2'
        androidTestImplementation 'androidx.test.espresso:espresso-core:3.3.0'
    }
    

    MainActivity.java

    Here, to take a screenshot, I'm using the following piece of code:

                    Bitmap bitmap = Bitmap.createBitmap(decorView.getWidth(),
                            decorView.getHeight(), Bitmap.Config.ARGB_8888);
                    Canvas canvas = new Canvas(bitmap);
                    decorView.draw(canvas);
                    InputImage image = InputImage.fromBitmap(bitmap, 0);
    

    I'm doing this in OnGlobalLayoutListener to make sure that the decor view has proper width and height. OK, the full code for the class is below:

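    import android.content.Context;
    import android.graphics.Bitmap;
    import android.graphics.Canvas;
    import android.graphics.Color;
    import android.graphics.Paint;
    import android.graphics.Rect;
    import android.graphics.drawable.ShapeDrawable;
    import android.os.Bundle;
    import android.view.LayoutInflater;
    import android.view.View;
    import android.view.ViewGroup;
    import android.view.ViewTreeObserver;
    import android.widget.FrameLayout;
    import android.widget.TextView;

    import androidx.annotation.NonNull;
    import androidx.annotation.Nullable;
    import androidx.appcompat.app.AppCompatActivity;
    import androidx.recyclerview.widget.LinearLayoutManager;
    import androidx.recyclerview.widget.RecyclerView;

    import com.google.android.gms.tasks.OnSuccessListener;
    import com.google.mlkit.vision.common.InputImage;
    import com.google.mlkit.vision.text.Text;
    import com.google.mlkit.vision.text.TextRecognition;
    import com.google.mlkit.vision.text.TextRecognizer;

    public class MainActivity extends AppCompatActivity {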
        @Override
        protected void onCreate(@Nullable Bundle savedInstanceState) {
            super.onCreate(savedInstanceState);
    
            setContentView(R.layout.activity_main);
    
            RecyclerView recyclerView = findViewById(R.id.recycler_view);
            recyclerView.setAdapter(new RecyclerViewAdapter(this));
            recyclerView.setLayoutManager(new LinearLayoutManager(this));
    
            View decorView = getWindow().getDecorView();
            decorView.getViewTreeObserver().addOnGlobalLayoutListener(new ViewTreeObserver.OnGlobalLayoutListener() {
                @Override
                public void onGlobalLayout() {
                    decorView.getViewTreeObserver().removeOnGlobalLayoutListener(this);
    
                    // take a screenshot of your screen
                    Bitmap bitmap = Bitmap.createBitmap(decorView.getWidth(),
                            decorView.getHeight(), Bitmap.Config.ARGB_8888);
                    Canvas canvas = new Canvas(bitmap);
                    decorView.draw(canvas);
                    InputImage image = InputImage.fromBitmap(bitmap, 0);
    
                    TextRecognizer recognizer = TextRecognition.getClient();
                    recognizer.process(image).addOnSuccessListener(new OnSuccessListener<Text>() {
                        @Override
                        public void onSuccess(Text text) {
                            for (Text.TextBlock textBlock : text.getTextBlocks()) {
                                if ("hello".equalsIgnoreCase(textBlock.getText())) {
                                    Rect box = textBlock.getBoundingBox();
                                    int left = box.left;
                                    int top = box.top;
                                    int right = box.right;
                                    int bottom = box.bottom;
    
                                    ShapeDrawable drawable = new ShapeDrawable();
                                    drawable.getPaint().setColor(Color.YELLOW);
                                    drawable.getPaint().setStyle(Paint.Style.STROKE);
                                    drawable.getPaint().setStrokeWidth(5f);
                                    View shapeView = new View(decorView.getContext());
                                    shapeView.setBackground(drawable);
    
                                    FrameLayout rootView = findViewById(R.id.root_view);
                                    int[] location = new int[2];
                                    rootView.getLocationOnScreen(location);
    
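                                    FrameLayout.LayoutParams params = new FrameLayout.LayoutParams(right - left,
                                            bottom - top);
                                    // Only the left and top margins position the box. The right and bottom
                                    // margins are extra space on those sides of the view, not coordinates,
                                    // so they stay 0.
                                    params.setMargins(left - location[0],
                                            top - location[1],
                                            0,
                                            0);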
    
                                    rootView.addView(shapeView, params);
                                }
                            }
                        }
                    });
                }
            });
        }
    
        private static class RecyclerViewAdapter extends RecyclerView.Adapter<RecyclerViewAdapter.RecyclerViewHolder> {
            private final Context context;
            private final String[] elements = new String[] {"Hello", "Hello", "Bye", "Hello", "Hi there", "Hello"};
    
            private RecyclerViewAdapter(Context context) {
                this.context = context;
            }
    
            @NonNull
            @Override
            public RecyclerViewHolder onCreateViewHolder(@NonNull ViewGroup parent, int viewType) {
                View item = LayoutInflater.from(context).
                        inflate(R.layout.list_item, parent, false);
                return new RecyclerViewHolder(item);
            }
    
            @Override
            public void onBindViewHolder(@NonNull RecyclerViewHolder holder, int position) {
                holder.textView.setText(elements[position]);
            }
    
            @Override
            public int getItemCount() {
                return elements.length;
            }
    
            public static class RecyclerViewHolder extends RecyclerView.ViewHolder {
                private final TextView textView;
    
                public RecyclerViewHolder(@NonNull View itemView) {
                    super(itemView);
    
                    this.textView = itemView.findViewById(R.id.element_view);
                }
            }
        }
    }
    

    activity_main.xml

    <?xml version="1.0" encoding="utf-8"?>
    <FrameLayout xmlns:android="http://schemas.android.com/apk/res/android"
        android:id="@+id/root_view"
        android:layout_width="match_parent"
        android:layout_height="match_parent">
    
        <LinearLayout
            android:layout_width="match_parent"
            android:layout_height="match_parent"
            android:paddingStart="30dp"
            android:orientation="vertical">
    
            <androidx.recyclerview.widget.RecyclerView
                android:id="@+id/recycler_view"
                android:layout_width="match_parent"
                android:layout_height="match_parent"
                android:scrollbars="vertical" />
    
        </LinearLayout>
    </FrameLayout>
    

    As you can see, I'm using FrameLayout as the root view.

    list_item.xml

    <?xml version="1.0" encoding="utf-8"?>
    <LinearLayout xmlns:android="http://schemas.android.com/apk/res/android"
        android:layout_width="match_parent"
        android:layout_height="wrap_content"
        android:background="?android:attr/selectableItemBackground"
        android:orientation="vertical">
    
        <TextView
            android:id="@+id/element_view"
            android:layout_width="match_parent"
            android:layout_height="wrap_content"
            android:paddingTop="16dp"
            android:paddingBottom="8dp"
            android:fontFamily="google-sans-medium"/>
    
        <View
            android:layout_width="match_parent"
            android:layout_height="1dp"
            android:background="#000"/>
    
    </LinearLayout>
    

    Nothing special with the layout - just a simple one for RecyclerView.

    Result

    All 4 "Hello" results are highlighted in yellow.

    Update

    If you retrieve the display size from somewhere other than an Activity (in your GitHub project you retrieve it from a Service), make sure you get the real display size rather than a size that may exclude system decorations such as the navigation bar. Do it as below:

    // get width and height
    WindowManager wm = (WindowManager) getApplicationContext().getSystemService(Context.WINDOW_SERVICE);
    Display display = wm.getDefaultDisplay();
    Point size = new Point();
    display.getRealSize(size);
    mWidth = size.x;
    mHeight = size.y;
    

    So, in your sample you have to change your method to:

        private void createVirtualDisplay() {
            // get width and height
            WindowManager wm = (WindowManager) getApplicationContext().getSystemService(Context.WINDOW_SERVICE);
            Display display = wm.getDefaultDisplay();
            Point size = new Point();
            display.getRealSize(size);
            mWidth = size.x;
            mHeight = size.y;
            ...
        }
    

    That's it.
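
To see why the virtual display size matters, consider what happens when the capture is smaller than the real screen: every coordinate ML Kit reports is in capture pixels, so a box used unchanged on the overlay is drawn in the wrong place. The sketch below is plain Java with stand-in types (CaptureScale and Box are hypothetical names, with Box imitating android.graphics.Rect), and the numbers are invented. It assumes the capture is a plainly stretched copy of the whole screen, which may not hold if the system letterboxes the mirrored content. Creating the virtual display with the real size, as above, avoids needing this mapping at all.

```java
/** Maps a box from captured-image pixels to screen pixels (illustrative sketch only). */
final class CaptureScale {
    private CaptureScale() {}

    /** Minimal stand-in for android.graphics.Rect so the sketch runs on a plain JVM. */
    static final class Box {
        final int left, top, right, bottom;

        Box(int left, int top, int right, int bottom) {
            this.left = left;
            this.top = top;
            this.right = right;
            this.bottom = bottom;
        }
    }

    /** True when the capture and the screen have different pixel sizes. */
    static boolean sizesDiffer(int imageW, int imageH, int screenW, int screenH) {
        return imageW != screenW || imageH != screenH;
    }

    /** Scales each edge by screen/image; assumes the image is a stretched copy of the screen. */
    static Box toScreen(Box box, int imageW, int imageH, int screenW, int screenH) {
        double sx = (double) screenW / imageW;
        double sy = (double) screenH / imageH;
        return new Box(
                (int) Math.round(box.left * sx),
                (int) Math.round(box.top * sy),
                (int) Math.round(box.right * sx),
                (int) Math.round(box.bottom * sy));
    }

    public static void main(String[] args) {
        // Hypothetical numbers: a 1080x2088 capture of a 1080x2220 screen.
        Box onImage = new Box(100, 1044, 300, 1100);
        Box onScreen = toScreen(onImage, 1080, 2088, 1080, 2220);
        // prints 66: used unchanged, the box would sit 66 px too high
        System.out.println(onScreen.top - onImage.top);
    }
}
```

A quick call to sizesDiffer with the bitmap size and the real display size is a cheap way to spot this kind of mismatch before drawing anything.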