Building a robot with computer vision and speech

In this two-part series, I show you how I built a robot using a Raspberry Pi, the CamJam EduKit #3, and Amazon Web Services.

First, let’s take a look at a demo that describes the robot’s capabilities:

https://youtu.be/4GH_0LEwjPo

The robot works in two modes:

  1. ‘Remote Control’ mode: The robot’s navigation can be controlled remotely through a web-based console, and it can be made to “speak” out what it “sees”.
  2. Chatbot control: Control the robot using your own voice and text-based chat. I am still working on this mode; it will be featured in Part 2 of this series.

In this post, I explain the physical build of the robot as well as preparation of the operating system and software.

Components required

Before we begin, let’s take a look at what components went into my robot:

Raspberry Pi 3 Model B

Available here and on Amazon.com

The lower-priced Raspberry Pi Zero W could also work but it will be complicated to connect the speakers as the Pi Zero does not have an audio jack.

CamJam EduKit #3 – Robotics

I found this kit extremely useful. It has (nearly) all the components you need to get started building a robot instead of having to figure out everything by yourself. The EduKit #3 includes:

  • 2 x DC motors 
  • 1 x DC motor controller board
  • 2 x wheels
  • 1 x ball castor (‘front wheel’ of the robot)
  • 1 x small breadboard
  • 1 x battery box for 4 AA batteries to drive the motor
  • 1 x ultrasonic distance sensor
  • A line follower sensor (not required for this project)
  • Resistors and jumper cables

You can buy one at thePiHut

Power bank – small size

You probably already have one of these lying around. This is used to power your Raspberry Pi. I used an old Nokia power bank.

Batteries

You will also need four (4) AA batteries to drive the motors.

Powered speaker with 3.5mm jack, and an audio cable

Your robot will speak through this. You will need a rechargeable/powered speaker. Try to get a light one so that you do not add stress to the motors. I used a Nokia Bluetooth speaker that I had lying around.

Raspberry Pi Camera Module

Your robot will see using this. 

You can buy one here or on Amazon.com

microUSB cable

To connect your power bank to your Raspberry Pi. I recommend a very short cable, with right angle connectors to save space inside the chassis. Speaking of which…

A chassis – a plastic or cardboard box, OR access to a 3D Printer.

You’ll need a chassis for your robot. Print one or use the cardboard box that came with the CamJam EduKit (this is what I did).  Use your imagination!

Amazon Web Services (AWS) account.

Create one at aws.amazon.com if you don’t already have one. We’ll be using Amazon S3, Amazon Polly, and Amazon Rekognition for this project.

Assembling your robot

I’ve put together a short video on the components that went into the robot. This is to supplement the already detailed documentation available on the CamJam website.

In addition to the CamJam components, you’ll need to connect a Raspberry Pi camera module, mount the camera on a ZeroView camera mount minus the Pi Zero (or other mounting arrangement), and connect a speaker to the 3.5mm headphone jack.

https://youtu.be/oyg_JdpbKf4

Configuring the Raspberry Pi

Now that you have finished putting together your robot, it’s time to prepare your Raspberry Pi. Below are some of the things you’ll need to do. You’ll find steps for these in the documentation for Raspberry Pi – I’ve added links.

  1. Configure the Raspberry Pi with a static IP on your wireless network. [documentation] You’ll need internet access to download updates and some of the packages.
  2. Enable SSH (recommended for convenience) [documentation]
  3. Install the latest updates. [documentation]
  4. Install Python 3 [documentation]. The code provided is not compatible with Python 2.x.
  5. You may want to change the default hostname and change the default password.
  6. Download and install webbot on your Pi from GitHub. We’ll be modifying code from this project.
  7. Install pygame on your Raspberry Pi [documentation]
  8. Enable the camera interface on your Raspberry Pi. (Tip: type sudo raspi-config)
  9. While you’re there, change the audio configuration of the Pi so that it plays through the speakers you’ve connected to the 3.5mm socket, instead of the HDMI port. [documentation]
  10. You can also set the time and timezone in raspi-config. (Tip: look under Localization/Localisation options)
  11. Sign up for an Amazon Web Services (AWS) account if you don’t already have one.
  12. Install the AWS Command Line Interface (CLI) on the Raspberry Pi. [documentation]
  13. Install boto3. That’s the Amazon Web Services SDK for Python.
  14. On the machine that you’re using to SSH into the Raspberry Pi, install an SCP client (like WinSCP) if you don’t already have it installed. This is not a required step, but it will make it much easier for you to edit code on your PC and have it synchronized with your Pi during testing.
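
Once you have worked through the list above, a quick sanity check can save some debugging later. The snippet below is only a sketch: check_environment is a helper name I made up, and it simply reports whether the Python modules and command-line tools mentioned in this post can be found on the Pi.

```python
import importlib.util
import shutil
import sys


def check_environment(modules=("boto3", "pygame"), commands=("aws", "raspistill")):
    """Return a dict mapping each requirement to True (found) or False (missing)."""
    results = {"python3": sys.version_info.major >= 3}
    for name in modules:
        # find_spec returns None when a top-level module is not installed
        results[name] = importlib.util.find_spec(name) is not None
    for name in commands:
        # shutil.which returns None when the command is not on the PATH
        results[name] = shutil.which(name) is not None
    return results


for requirement, ok in check_environment().items():
    print(f"{requirement:12} {'ok' if ok else 'MISSING'}")
```

Anything reported as MISSING points back to the corresponding step in the list.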

Configuring Amazon Web Services (AWS)

Here is a diagram that describes the process at a high level.

Creating and configuring an S3 bucket
  1. Create an S3 bucket. In my example, I am calling it rekorobot.
  2. Since this is meant to be an experiment, you could configure the bucket permissions to allow public read. Important: pictures taken by the robot will be publicly accessible if you use this method, so do not store sensitive content. I recommend restricting read access more tightly, but that requires changes beyond the scope of this post. Public write is not required for this project and should NOT be enabled.
  3. Optional: Create a lifecycle rule to delete all objects in the bucket after one day. This ensures that pictures uploaded by the robot are deleted every day and you don’t incur unnecessary charges for images that are no longer needed.
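
If you prefer to script the optional lifecycle rule instead of clicking through the console, it could look roughly like this with boto3. This is a sketch under assumptions: the rule ID and the two function names are mine, and the s3 argument is expected to be a boto3 S3 client created with credentials that are allowed to manage the bucket.

```python
def expire_after_one_day_rule():
    """Build a lifecycle configuration that deletes every object after one day."""
    return {
        "Rules": [
            {
                "ID": "delete-robot-images-after-one-day",  # hypothetical rule name
                "Filter": {"Prefix": ""},  # an empty prefix matches all objects
                "Status": "Enabled",
                "Expiration": {"Days": 1},
            }
        ]
    }


def apply_lifecycle(s3, bucket="rekorobot"):
    """Attach the rule to the bucket using a boto3 S3 client."""
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration=expire_after_one_day_rule(),
    )
```

You would call it as apply_lifecycle(boto3.client("s3")) from a machine where the AWS credentials are configured.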

Configuring access

1. Create an IAM user (say robotuser) with the following AWS Managed Policies attached: AmazonRekognitionFullAccess, AmazonPollyFullAccess. Note the Access Key ID and Secret Access Key. You’ll need both later.

2. Create a Managed Policy that allows the user access to put and get objects on the rekorobot bucket you just created. [documentation]

3. On the Raspberry Pi, configure the AWS CLI using the aws configure command. Enter the Access Key ID and the Secret Access Key for the robotuser IAM user.
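
For reference, the policy that grants put and get access on the bucket could look something like the document below. This is a minimal sketch, not necessarily the exact policy I used: bucket_object_policy is a made-up helper, and the printed JSON is what you would paste into the policy editor in the IAM console.

```python
import json


def bucket_object_policy(bucket):
    """Build an IAM policy document allowing put and get on objects in one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:PutObject", "s3:GetObject"],
                # the trailing /* scopes the permission to objects inside the bucket
                "Resource": f"arn:aws:s3:::{bucket}/*",
            }
        ],
    }


print(json.dumps(bucket_object_policy("rekorobot"), indent=2))
```

Replace rekorobot with the name of your own bucket.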

Configuring Amazon Rekognition

There isn’t much to configure here. Once you’ve allowed access to Amazon Rekognition for robotuser, your code can call the Amazon Rekognition APIs right away using the boto3 SDK!
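
To give you an idea of what such a call looks like, here is a small sketch. The function name, the wording of the sentence, and the default limits are my own choices for illustration; the rekognition argument is assumed to be a boto3 Rekognition client, and the image is assumed to be already stored in the S3 bucket.

```python
def describe_image(rekognition, bucket, key, max_labels=5, min_confidence=70.0):
    """Ask Rekognition for labels and return a sentence, or None if nothing was found."""
    response = rekognition.detect_labels(
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        MaxLabels=max_labels,
        MinConfidence=min_confidence,
    )
    # each entry in "Labels" carries a "Name" and a "Confidence" score
    names = [label["Name"] for label in response.get("Labels", [])]
    if not names:
        return None
    return "I can see " + ", ".join(names) + "."
```

On the Pi you would pass boto3.client("rekognition") as the first argument, followed by the bucket name and the object key of the uploaded picture.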

Configuring Amazon Polly

There isn’t much to configure here. Once you’ve allowed access to Amazon Polly for robotuser, your code can call the Amazon Polly APIs right away!

However, you’ll need Polly to generate a speech file for you to copy to the robot so it can say something like “I am currently not able to identify any objects” in case nothing was “seen” or recognized. To do this, open the Amazon Polly console, type the text, and choose Download MP3. You can customize the voice. I chose Salli.

Save this file to the /webbot folder on your Raspberry Pi. Name the file notfound_Salli.mp3.

Follow the same steps to create another file named robotready_Salli.mp3 that can play a message like “Hello. Robot is now ready” when the robot starts up. Copy this file to the /webbot folder.
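
If you would rather script this than download the files by hand, the same two files could be generated with boto3. Treat this as a sketch: the helper names are mine, the polly argument is assumed to be a boto3 Polly client, and the message texts are just the examples from this post.

```python
import os

# file name -> text to speak (the example messages from this post)
MESSAGES = {
    "notfound_Salli.mp3": "I am currently not able to identify any objects",
    "robotready_Salli.mp3": "Hello. Robot is now ready",
}


def synthesize_to_file(polly, text, out_path, voice_id="Salli"):
    """Turn text into an MP3 file with Amazon Polly; return the number of bytes written."""
    response = polly.synthesize_speech(Text=text, OutputFormat="mp3", VoiceId=voice_id)
    audio = response["AudioStream"].read()  # the audio arrives as a byte stream
    with open(out_path, "wb") as handle:
        handle.write(audio)
    return len(audio)


def generate_messages(polly, folder="/webbot"):
    """Create every pre-recorded message in the given folder."""
    for filename, text in MESSAGES.items():
        synthesize_to_file(polly, text, os.path.join(folder, filename))
```

Calling generate_messages(boto3.client("polly")) on the Pi would write both files straight into the /webbot folder.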

Writing and modifying code

You will need to modify the index.html file in the /webbot/public folder to make cosmetic changes and add controls for the additional functionality we need. Take a look at the modified index.html below:

Here’s what the modified interface looks like when accessed from my phone:

Modified Webbot Interface

Next, you’ll need to modify the webbot.py code using your favorite text editor so that it does the extra bits:

1. Take a picture programmatically using the Pi Camera and the raspistill utility when a button is clicked in the web interface provided by webbot.

2. Upload the image from the camera to S3.

3. Have the image analyzed by Amazon Rekognition and obtain the response.

4. Send the response text to Amazon Polly and obtain the speech response.

5. Play out the speech through the speaker using pygame. 

6. Some basic error handling.
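
To show how these steps fit together, here is a rough sketch of the flow. It is not the actual webbot.py code: the function names and the /tmp file paths are assumptions of mine, s3 is expected to be a boto3 S3 client, and describe and synthesize are hypothetical helpers that wrap the Amazon Rekognition and Amazon Polly calls.

```python
import subprocess
import time
from pathlib import Path

BUCKET = "rekorobot"  # bucket name used in this post
FALLBACK_MP3 = Path("/webbot") / "notfound_Salli.mp3"  # pre-recorded message


def capture_image(path):
    """Step 1: take a still picture with the raspistill command-line utility."""
    # -t 1000 waits one second before capturing; -o sets the output file
    subprocess.run(["raspistill", "-t", "1000", "-o", str(path)], check=True)


def upload_image(s3, path, key):
    """Step 2: upload the picture to the S3 bucket."""
    s3.upload_file(str(path), BUCKET, key)


def play_mp3(path):
    """Step 5: play an MP3 file through the speaker using pygame."""
    import pygame  # imported here so the module still loads without pygame

    pygame.mixer.init()
    pygame.mixer.music.load(str(path))
    pygame.mixer.music.play()
    while pygame.mixer.music.get_busy():  # wait until playback finishes
        time.sleep(0.1)


def see_and_speak(s3, describe, synthesize, capture=capture_image, play=play_mp3):
    """Run steps 1 to 5; if anything fails, play the fallback message (step 6).

    describe(bucket, key) returns a sentence or None (hypothetical helper).
    synthesize(text, out_path) writes an MP3 file (hypothetical helper).
    """
    image = Path("/tmp/robot_view.jpg")
    speech = Path("/tmp/robot_speech.mp3")
    try:
        capture(image)
        upload_image(s3, image, image.name)
        sentence = describe(BUCKET, image.name)  # step 3
        if not sentence:
            play(FALLBACK_MP3)
            return False
        synthesize(sentence, speech)  # step 4
        play(speech)
        return True
    except Exception as error:  # basic error handling only
        print(f"Could not describe the scene: {error}")
        play(FALLBACK_MP3)
        return False
```

Passing the capture and play functions as arguments is a design choice that lets you try the flow on your PC with stand-ins before running it on the robot.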

I like to use WinSCP and Notepad++ on my PC for editing code on the Raspberry Pi. Below is the modified code. I have commented the code so things are clear.

See Part 2 – Remotely control this robot using a chatbot, serverless compute and IoT!

Bonus content

In case you’re wondering what “the view from the robot” was like when I recorded the first video above, here it is:

I hope you find this helpful. If you do, be sure to post a comment below. Have fun!

Don’t forget to read Part 2 – Remotely control this robot using a chatbot, serverless compute and IoT
