How to create a rule based on a Regexp to scan emails with PDF and image files

Problem

How can you create a routing rule that uses a Regexp to scan PDF attachments or image files?

Environment

  • Gmail web interface
  • Admin console

Solution

A Regexp does not scan images, even if they contain numbers or text. If you create a compliance rule using a Regexp, only message text and attachments that contain text are scanned.
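As an illustration of why a Regexp finds nothing in an image, the minimal sketch below runs a pattern against message text and against an image with no text layer. The pattern (12 digits, optionally grouped in fours) and the sample values are assumptions made for this example only; they are not taken from the Admin console.

```python
import re

# Hypothetical pattern for illustration: 12 digits, optionally grouped
# in fours by a space or hyphen.
AADHAAR_LIKE = re.compile(r"\b\d{4}[ -]?\d{4}[ -]?\d{4}\b")


def text_matches(text: str) -> bool:
    """Return True if the pattern occurs anywhere in the given text."""
    return AADHAAR_LIKE.search(text) is not None


# Sample message body (made-up value).
body_text = "Applicant ID: 1234 5678 9012"

# An image stores pixels, not characters, so a Regexp has no text to search.
# An empty string stands in for that missing text layer.
image_text_layer = ""

print(text_matches(body_text))         # True
print(text_matches(image_text_layer))  # False
```

The same number shown inside a screenshot or a scanned PDF page would not be matched, because the pattern only ever sees characters.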

Workaround
Create another compliance rule based on predefined content detectors. Note that this solution applies only to the following editions:
  • Education Fundamentals
  • Education Standard
  • Teaching and Learning Upgrade
  • Education Plus
Steps:
  1. Go to the Google Workspace Admin console.
  2. Hover over the menu on the left, then click Apps > Google Workspace > Gmail > Compliance.
  3. On the left, select the organizational unit you want to affect.
  4. Under Content compliance, click Add another rule or Configure.
  5. Add a name to the rule.
  6. Check the Inbound and Internal sending boxes if needed.
  7. Under Add expressions that describe the content you want to search for in each message, go to Expressions and click Add.
  8. From the drop-down menu, select Predefined content match. New options become available.
  9. From the next drop-down menu, choose India Aadhaar Number. It is the last option.
  10. Under Minimum number of matches, select how many times the specific content (in this case, an Aadhaar number) must appear in an email for the rule to be triggered.
  11. Under Confidence threshold, choose High or Medium (see the note below for details on each).
  12. Click Save.
  13. Under If the above expressions match, do the following, choose the action to apply, such as sending the message to a different recipient or quarantining it.
Note:
  • High: Fewer messages exceed the threshold, so fewer messages trigger the action. This might result in more false negatives: More messages being delivered when they shouldn't be. As a result, use this setting if you want messages to be delivered at the expense of occasionally letting messages through when they should trigger the action.
  • Medium: More messages exceed the threshold, so more messages trigger the action. This might result in more false positives: More messages triggering the action when they should simply be delivered. Use this setting if you're not sensitive to messages occasionally triggering the action when they should be delivered.
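
To make the Minimum number of matches setting more concrete, the sketch below counts occurrences in a message and compares the count with a minimum. It is only an approximation: the predefined detector in the Admin console is not a plain Regexp, and the pattern, function name, and sample message here are assumptions made for illustration.

```python
import re

# Hypothetical stand-in for the predefined detector: 12 digits,
# optionally grouped in fours by a space or hyphen.
AADHAAR_LIKE = re.compile(r"\b\d{4}[ -]?\d{4}[ -]?\d{4}\b")


def rule_triggers(message_text: str, minimum_matches: int) -> bool:
    """Return True when the text contains at least minimum_matches hits."""
    hits = AADHAAR_LIKE.findall(message_text)
    return len(hits) >= minimum_matches


# Made-up sample message containing two number-like values.
message = "IDs on file: 1234 5678 9012 and 2345-6789-0123"

print(rule_triggers(message, 2))  # True
print(rule_triggers(message, 3))  # False
```

A higher minimum makes the rule stricter, since a message with a single number no longer triggers the action.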

Cause

Regular expressions do not scan numbers or text contained in images. This is working as expected.