gmail Adaptor developer README.md
Source: https://github.com/OpenFn/adaptors/tree/main/packages/gmail
Gmail Adaptor
What it does
This adaptor extracts specific content from Gmail messages using custom "content" configurations. The sample code shows how to query Gmail for messages and identify the desired attachments and metadata.
Without any parameters, the getContentsFromMessages() function will return an array containing every message in the account of the authenticated user, including from, date and subject.
A number of options are available to isolate the desired messages and to customize the output.
Options
Optional parameters include: contents, query, email, processedIds, maxResults.
options.contents
Use the options.contents array to specify the content to retrieve from each message. Always included are from, date, and subject.
Each item can be a simple string (e.g., 'body', 'subject') or a MessageContent object offering advanced configuration.
Basic metadata
The following types of content can be extracted:
- body: Extracts the message body.
- subject: Extracts the email subject.
- date: Extracts the timestamp of the email.
- from: Extracts the sender's information.
Optionally, each of these content strings can be expanded to include additional specifications:
const mySubject = {
  type: 'subject',
  name: 'email-title',
  maxLength: 25,
};
- The type property instructs the function which content type to extract.
- The name property allows you to add a custom name to this information.
- The maxLength property allows you to limit the length of the content returned.
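To illustrate how the string form and the object form relate, here is a minimal sketch that expands each entry into the object form and applies maxLength to a sample value. The helpers normalizeContent and applyMaxLength are hypothetical, and both the default name (falling back to the type) and character-based truncation are assumptions made for illustration, not confirmed adaptor behavior.

```javascript
// Hypothetical helper: expand a content entry into the object form.
// Assumption: when no name is given, the type doubles as the name.
function normalizeContent(entry) {
  const spec = typeof entry === 'string' ? { type: entry } : { ...entry };
  return { name: spec.type, ...spec };
}

// Hypothetical helper: cap a string at maxLength characters, if one is set.
function applyMaxLength(text, maxLength) {
  return typeof maxLength === 'number' ? text.slice(0, maxLength) : text;
}

const subjectSpec = { type: 'subject', name: 'email-title', maxLength: 25 };
const specs = ['body', subjectSpec].map(normalizeContent);

const sampleSubject = 'Fwd: FW: Facility Anomaly Report (Summary Data)';
const title = applyMaxLength(sampleSubject, specs[1].maxLength);

console.log(specs[0].name); // body
console.log(specs[1].name); // email-title
console.log(title); // Fwd: FW: Facility Anomaly
```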
Attachment: basic file
Extract content from a file attachment.
- file: Identify the attachment by providing its name as a string or using a regular expression to match a pattern.
const myMetadata = {
  type: 'file',
  name: 'metadata',
  file: /^summary\.txt$/,
};
// Alternative: match the exact file name and cap the length of the returned content.
const myMetadataByName = {
  type: 'file',
  file: 'summary.txt',
  maxLength: 500,
};
Attachment: archived file
Extract content from a file embedded in an archive attachment.
- archive: Specify the file name of the archive using either a string for an exact match or a regular expression to match a pattern.
- file: Identify the specific file inside the archive by providing its name as a string or using a regular expression to match a pattern.
const myArchivedFile = {
  type: 'archive',
  name: 'data',
  archive: 'devicedata.zip',
  file: /_CURRENT_DATA_\w*?\.json$/,
  maxLength: 5000,
};
options.contents = [mySubject, 'body', myMetadata, myArchivedFile];
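The difference between string and regular-expression matching can be checked locally before running a job. The sketch below uses a hypothetical matchesPattern helper that treats a string as an exact match and a RegExp as a pattern test; it mirrors the description above but is not the adaptor's own code.

```javascript
// Hypothetical helper: a string must equal the file name exactly,
// while a RegExp only needs to match according to its own anchors.
function matchesPattern(pattern, filename) {
  return pattern instanceof RegExp
    ? pattern.test(filename)
    : pattern === filename;
}

const exactHit = matchesPattern('summary.txt', 'summary.txt');
const exactMiss = matchesPattern('summary.txt', 'daily_summary.txt');
const suffixHit = matchesPattern(/summary\.txt$/, 'daily_summary.txt');
const anchoredMiss = matchesPattern(/^summary\.txt$/, 'daily_summary.txt');
const archivedHit = matchesPattern(
  /_CURRENT_DATA_\w*?\.json$/,
  '0031_CURRENT_DATA_P100DT9H45M46S_20241115T102926Z.json'
);

console.log({ exactHit, exactMiss, suffixHit, anchoredMiss, archivedHit });
```

Note that an anchored pattern such as /^summary\.txt$/ rejects prefixed names like daily_summary.txt, whereas /summary\.txt$/ accepts them.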
options.query
Use a query parameter to filter the messages returned.
The query syntax supports the same query format as the Gmail search box.
options.query = 'from:someuser@example.com rfc822msgid:<somemsgid@example.com> is:unread';
A full list of supported search operations can be found here: Refine searches in Gmail
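When a query combines several operators, it can help to assemble the string from parts. The following is a minimal sketch; buildQuery is a hypothetical helper, not part of the adaptor, and it simply joins operator:value pairs with spaces.

```javascript
// Hypothetical helper: join operator/value pairs into a Gmail search string.
// Entries whose value is undefined or null are left out.
function buildQuery(terms) {
  return Object.entries(terms)
    .filter(([, value]) => value !== undefined && value !== null)
    .map(([operator, value]) => `${operator}:${value}`)
    .join(' ');
}

const builtQuery = buildQuery({
  from: 'someuser@example.com',
  is: 'unread',
  newer_than: '2d',
  subject: undefined, // skipped
});

console.log(builtQuery); // from:someuser@example.com is:unread newer_than:2d
```

The resulting string can then be passed as options.query.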
options.email
Optionally specify the email address used for the Gmail account. This is almost always the email address associated with the authenticated user, so the parameter can usually be omitted.
options.email = '<EMAIL>';
options.processedIds
In some scenarios, it may be necessary to skip certain messages to prevent the retrieval of duplicate data. Passing an array of messageIds will allow the function to skip these messages if any of the ids are encountered in the returned messages.
options.processedIds = [
  '194e3cf1ca0ccd66',
  '283e2df2ca0ecd75',
  '572e1af3ca0bcd84',
];
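One way to build this array is to collect the messageId values returned by an earlier run. The sketch below assumes the earlier results are available as an array of message objects carrying a messageId property; the previousMessages variable and its contents are made up for illustration.

```javascript
// Messages from an earlier run (illustrative data only).
const previousMessages = [
  { messageId: '194e3cf1ca0ccd66', subject: 'Report A' },
  { messageId: '283e2df2ca0ecd75', subject: 'Report B' },
  { messageId: '194e3cf1ca0ccd66', subject: 'Report A' }, // duplicate entry
];

// Collect the unique ids so the next run can skip these messages.
const processedIds = [...new Set(previousMessages.map(m => m.messageId))];

const nextRunOptions = { query: 'in:inbox newer_than:2d', processedIds };

console.log(nextRunOptions.processedIds);
```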
options.maxResults
To prevent inadvertent retrieval of a massive number of messages, you can limit the number of results returned. The default value is 1000.
This works in conjunction with the options.processedIds parameter. For example:
- The account contains messages [1, 2, 3].
options.processedIds = [1];
options.maxResults = 1;
- This skips message #1, so the resulting dataset contains a single message, #2.
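The interplay between the two options can be modelled with plain arrays. This sketch only reproduces the behavior described in the example (skip processed ids first, then apply the limit); selectMessages is a hypothetical function and the adaptor's internal logic may differ.

```javascript
// Hypothetical model: drop already-processed ids, then cap the result count.
function selectMessages(messageIds, { processedIds = [], maxResults = 1000 } = {}) {
  const skip = new Set(processedIds);
  return messageIds.filter(id => !skip.has(id)).slice(0, maxResults);
}

const selected = selectMessages([1, 2, 3], { processedIds: [1], maxResults: 1 });

console.log(selected); // [ 2 ]
```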
Example jobs
const query = 'in:inbox newer_than:2d';
const contents = ['body'];
const maxResults = 200;
getContentsFromMessages({ query, contents, maxResults });
// Use a global regex so every space is replaced, not just the first one.
const subject = 'device data summary'.replace(/ /g, '+');
const query = `in:inbox subject:${subject} newer_than:1m`;
const email = 'special_assigned_delegate@gmail.com';
const metadataFile = {
  type: 'file',
  name: 'metadata',
  file: /summary\.txt$/,
  maxLength: 500,
};
const dataFile = {
  type: 'archive',
  name: 'data',
  archive: /_device_data\.zip$/,
  file: /_CURRENT_DATA_\w*?\.json$/,
};
const contents = [metadataFile, dataFile];
getContentsFromMessages({ query, email, contents });
Sample state.data
Output
For each matched message, the extracted content is returned as a message object of content properties. Here's an example state.data for a single matched message:
[
  {
    messageId: '1934c017c1752c01',
    from: 'Friendly Sender <sender@gmail.com>',
    date: '2024-11-20T23:56:08.000Z',
    subject: 'Fwd: FW: Facility Anomaly Report (Summary Data)',
    metadata: {
      filename: 'daily_summary.txt',
      content: '{ "appInfo": { "isAutomaticTime": true }',
    },
    data: {
      archiveFilename: '0031_device_data.zip',
      filename: '0031_CURRENT_DATA_P100DT9H45M46S_20241115T102926Z.json',
      content: '{ "AMOD": "VL45", "AMFR": "ICECO" }',
    },
  },
];
Each property on the message object represents a specific piece of information extracted:
- from: Sender's email and name.
- date: The timestamp when the email was sent.
- subject: Contains the email subject.
- metadata: Metadata-named file content, with its matched file name.
- data: Data-named archive file content, with its matched archive name and file name.
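Because file content arrives as a string, a later step will often need to parse it. The sketch below walks an array shaped like the example above and parses JSON content defensively; parseContent is a hypothetical helper, and returning null for unparseable text (for instance, content cut short by maxLength) is an assumption about how you might want to handle that case.

```javascript
// Sample shaped like the example state.data (values copied from it).
const messages = [
  {
    messageId: '1934c017c1752c01',
    metadata: {
      filename: 'daily_summary.txt',
      content: '{ "appInfo": { "isAutomaticTime": true }', // incomplete JSON
    },
    data: {
      filename: '0031_CURRENT_DATA_P100DT9H45M46S_20241115T102926Z.json',
      content: '{ "AMOD": "VL45", "AMFR": "ICECO" }',
    },
  },
];

// Hypothetical helper: parse JSON content, returning null when it is invalid.
function parseContent(item) {
  try {
    return JSON.parse(item.content);
  } catch (error) {
    return null;
  }
}

const parsed = messages.map(m => ({
  messageId: m.messageId,
  metadata: parseContent(m.metadata),
  data: parseContent(m.data),
}));

console.log(parsed[0].data.AMOD); // VL45
console.log(parsed[0].metadata); // null
```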
Acquiring an access token
The Gmail adaptor implicitly uses the Gmail account of the Google account that is used to authenticate the application.
Allowing the Gmail adaptor to access a Gmail account is a multi-step process.
Create an OAuth 2.0 client ID
Follow the instructions found here: https://support.google.com/googleapi/answer/6158849
- Go to Google Cloud Platform Console
- Go to "APIs & Services"
- Click "Credentials"
- Click "Create Credentials"
- Select "OAuth client ID"
- Select "Create OAuth client ID"
- Select Application type "Web application"
- Add a uniquely-identifiable name
- Click "Create"
- On the resulting popup screen, find and click "DOWNLOAD JSON" and save this file to a secure location.
Use the Postman application to query the OAuth endpoint and retrieve an access token
Initially, you'll need to configure an authentication request using Postman's built-in OAuth 2.0 implementation:
- Open Postman
- Create a new request
- Switch to the "Authorization" tab
- On the left side, select Type OAuth 2.0
- On the right side, scroll down to the "Configure New Token" section
- Fill out the form using information from the JSON file downloaded in the previous section
- Token Name: Google Oauth
- Grant Type: Authorization Code
- Auth URL: (found in the JSON file as auth_uri)
- Access Token URL: (found in the JSON file as token_uri)
- Client ID: (found in the json file as client_id)
- Client Secret: (found in the json file as client_secret)
- Scope: https://www.googleapis.com/auth/gmail.readonly
- State: (any random string is fine)
- Client Authentication: Send as Basic Auth header
Once the form is filled out, repeat these steps each hour to retrieve a new access token:
- Click on "Get New Access Token"
- A browser will open and you'll be asked to authenticate with your Google Account
- Accept the request to allow this OAuth session to access your Google Account.
- In the MANAGE ACCESS TOKENS popup, find and copy the new Access Token
- This access token will be valid for 1 hour.
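Since the token lasts one hour, a script that reuses it may want to check its age before starting a run. This is a minimal sketch under the assumption that you record the time at which the token was copied from Postman; isTokenExpired and TOKEN_LIFETIME_MS are hypothetical names, and nothing here contacts Google.

```javascript
// One hour, matching the token lifetime stated above.
const TOKEN_LIFETIME_MS = 60 * 60 * 1000;

// Hypothetical helper: true once the token is at least one hour old.
// Both arguments are millisecond timestamps.
function isTokenExpired(acquiredAt, now = Date.now()) {
  return now - acquiredAt >= TOKEN_LIFETIME_MS;
}

const acquiredAt = Date.parse('2024-11-20T10:00:00Z');
const stillFresh = isTokenExpired(acquiredAt, Date.parse('2024-11-20T10:59:00Z'));
const tooOld = isTokenExpired(acquiredAt, Date.parse('2024-11-20T11:00:00Z'));

console.log(stillFresh); // false
console.log(tooOld); // true
```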
Configure OpenFn CLI to find the access token
The Gmail adaptor looks for the access token in the configuration section under access_token.
Example configuration using a workflow:
{
  "workflow": {
    "steps": [
      {
        "id": "getGmailContent",
        "adaptors": [
          "gmail"
        ],
        "expression": "path/to/gmail.js",
        "configuration": {
          "access_token": "(access token acquired from Postman)"
        }
      }
    ]
  }
}
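Because the token changes every hour, it may be convenient to generate this file rather than edit it by hand. The sketch below builds the same structure in Node.js; buildWorkflow and the GMAIL_ACCESS_TOKEN environment variable are hypothetical names chosen for this example, not something the OpenFn CLI defines.

```javascript
// Hypothetical helper: produce the workflow object with the given token.
function buildWorkflow(accessToken) {
  if (!accessToken) {
    throw new Error('Missing access token');
  }
  return {
    workflow: {
      steps: [
        {
          id: 'getGmailContent',
          adaptors: ['gmail'],
          expression: 'path/to/gmail.js',
          configuration: { access_token: accessToken },
        },
      ],
    },
  };
}

// Assumption: the token is supplied through an environment variable;
// a placeholder is used when it is not set.
const token = process.env.GMAIL_ACCESS_TOKEN || 'placeholder-token';
const workflowJson = JSON.stringify(buildWorkflow(token), null, 2);

console.log(workflowJson);
```

Writing workflowJson to a file keeps the token out of hand-edited configuration; treat that file as a secret.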