Ingesting Documents through the Amazon Kendra S3 Connector
- Create an S3 bucket to store your documents and create the folder whitepapers.
- Unzip the documents and upload them into that folder.
- Now that you have the documents in an S3 bucket, go to the Amazon Kendra console, open your index, and click on Data sources.
- Select S3 and click on Add Connector.
- Enter a name for your connector (in my case Workshop_S3_connector) and click on Next.
- Click on Browse S3 and select the S3 bucket where you uploaded the documents.
- Create a New IAM Role.
- In the Additional configuration section, where you can define inclusion and exclusion patterns, add the S3 folder where you uploaded the documents and click Add.
- On the Set sync run schedule step, click on Next.
- Click on Create.
- After the creation process is complete, click on Sync Now.
- Now that you have ingested some documents, you can navigate to Kendra’s built-in Search Console to test queries, for example: what service has eleven nines of durability?
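The console steps above can also be scripted. The following is a minimal sketch using boto3, not the workshop's own procedure. The index ID, bucket name, and role ARN are placeholders you would supply, and the function names are hypothetical. Unlike the console, which can create a new IAM role for you, this sketch assumes the role already exists. It also assumes the inclusion prefix is whitepapers/ and that no sync schedule is set, so the data source syncs only on demand.

```python
import time


def build_s3_data_source_request(index_id, bucket, role_arn,
                                 name="Workshop_S3_connector",
                                 prefix="whitepapers/"):
    """Build keyword arguments for the Kendra CreateDataSource call.

    All argument values are placeholders; 'prefix' is assumed to be the
    folder created in the first step.
    """
    return {
        "IndexId": index_id,
        "Name": name,
        "Type": "S3",
        "RoleArn": role_arn,  # assumed to be an existing IAM role
        "Configuration": {
            "S3Configuration": {
                "BucketName": bucket,
                "InclusionPrefixes": [prefix],
            }
        },
        # No "Schedule" key: the data source is synced only on demand.
    }


def create_and_sync(index_id, bucket, role_arn, poll_seconds=10):
    """Create the S3 data source, wait until it is active, then start a sync."""
    import boto3  # imported here so the request builder works without AWS access

    kendra = boto3.client("kendra")
    request = build_s3_data_source_request(index_id, bucket, role_arn)
    data_source_id = kendra.create_data_source(**request)["Id"]

    # Creation is asynchronous, so poll until the data source is ready.
    while True:
        status = kendra.describe_data_source(
            Id=data_source_id, IndexId=index_id
        )["Status"]
        if status == "ACTIVE":
            break
        if status == "FAILED":
            raise RuntimeError("Data source creation failed")
        time.sleep(poll_seconds)

    # Equivalent of clicking Sync Now in the console.
    job = kendra.start_data_source_sync_job(Id=data_source_id, IndexId=index_id)
    return data_source_id, job["ExecutionId"]
```

Calling create_and_sync with your own index ID, bucket name, and role ARN would return the new data source ID and the ID of the sync job it started. Keeping the request builder separate from the AWS calls makes it possible to inspect the configuration before anything is created.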
As you can see, this is the basic process to ingest documents into Kendra from Amazon S3.
However, we don’t have much flexibility in terms of filtering, faceting, and boosting, because the documents carry no additional metadata. For example, I am not able to show only the results for a specific Category like Machine Learning. This will be addressed in the next section.
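The test query from the Search Console can likewise be sent through the Query API. This is a rough sketch with boto3; the index ID is a placeholder and the helper names are hypothetical. The Query call also accepts an AttributeFilter parameter, but with no metadata ingested there is no Category attribute to filter on yet, so the sketch sends only the query text.

```python
def summarize_results(response, limit=3):
    """Return (type, title, excerpt) tuples from a Kendra Query response."""
    rows = []
    for item in response.get("ResultItems", [])[:limit]:
        title = item.get("DocumentTitle", {}).get("Text", "")
        excerpt = item.get("DocumentExcerpt", {}).get("Text", "")
        rows.append((item.get("Type", ""), title, excerpt))
    return rows


def search(index_id, text):
    """Send a plain-text query to the index and summarize the top results."""
    import boto3  # imported here so summarize_results works without AWS access

    kendra = boto3.client("kendra")
    response = kendra.query(IndexId=index_id, QueryText=text)
    return summarize_results(response)
```

For instance, search(index_id, "what service has eleven nines of durability?") would return up to three result tuples for the question used above.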