Building Skills for Amazon Echo

Irena Shaigorodsky posted November 6, 2016

I am a proud owner of a pre-ordered Amazon Echo (or Alexa). It’s a lot of fun for guests and family alike. It claims to have been born on Nov. 6, 2015, and generally tries to act like an AI. Its voice recognition is pretty decent, even with accents. When it was released, there was nothing like it; now Google, Apple, and various other startups are rushing to fill the void. Andrew Ng, Baidu’s chief scientist and head of Baidu Research, predicted in 2014: “In five years, we think 50 percent of queries will be on speech or images.” I am going to take you through what it takes to put a voice interface in place, how it can be useful for you, and my observations and conclusions, so we can all get into the future faster.

This summer, at a Vacation Rentals hackathon, Ahmed Abdelmeged and I got to put rental searches on this device. This article uses an externally published sample built with the lessons learned from that experiment.

Let’s see what it takes to create an Alexa add-on, known as a custom skill. The add-on will use the voice interface and home card on the Alexa companion app. It will link your Alexa (Amazon) account to a Google account. With this add-on, you will be able to check for upcoming birthdays and other important dates for your contacts. The complete demo project is posted to GitHub.

Designing the Voice Interface

The first thing to figure out is what question(s) the skill will be able to answer. The demo skill will be able to look up important dates by first name, a particular calendar date, and/or the type of the reminder. The skill can either take all the information upfront or gather it in a Q&A session with the user. It should also provide a help message about itself and walk the user through the necessary setup with Google.

Schema of Intents

First, it needs a schema of intents (actions) that the skill will support. Intents can be built-in, such as AMAZON.HelpIntent, AMAZON.StopIntent, AMAZON.PreviousIntent, AMAZON.NextIntent, or custom ones. The skill code should be able to fulfill the declared actions.

Each declared intent can have a number of placeholders of built-in (‘AMAZON.US_FIRST_NAME’, ‘AMAZON.DATE’) or custom (‘LIST_OF_REMINDERS’) types. Custom types are just lists of acceptable values, such as ‘birthday’, ‘birthdays’, ‘anniversary’, ‘anniversaries’, ‘reminder’, or ‘reminders’ for LIST_OF_REMINDERS.
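For reference, a minimal schema along these lines might look as follows. The intent and slot names here are illustrative guesses; the real schema ships as IntentSchema.json in the demo repository.

    {
      "intents": [
        {
          "intent": "GetReminderIntent",
          "slots": [
            { "name": "FirstName", "type": "AMAZON.US_FIRST_NAME" },
            { "name": "Date", "type": "AMAZON.DATE" },
            { "name": "ReminderType", "type": "LIST_OF_REMINDERS" }
          ]
        },
        { "intent": "AMAZON.HelpIntent" },
        { "intent": "AMAZON.StopIntent" }
      ]
    }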

Sample Utterances

This is a list of phrases with slots (placeholders) for Alexa to fill in. The acceptable list of values for each slot is based on custom lists or supported types that appear in the intent schema.
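Using the illustrative intent above, the sample utterances file could contain lines like these; each line pairs an intent name with a phrase whose {slots} Alexa fills in. The real list lives in SampleUtterances.txt in the repository.

    GetReminderIntent when is the {ReminderType} for {FirstName}
    GetReminderIntent what {ReminderType} do I have on {Date}
    GetReminderIntent tell me about {ReminderType} for {FirstName} on {Date}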

Invocation Name

Once the voice interface is sketched, it’s time to pick a good invocation name. One should consider the guidelines from Amazon as well as the limitations of voice recognition that will come up during testing. If the invocation name cannot be easily recognized by Alexa, it should not be used. ‘google reminder‘ is the invocation name that will be used for the demo. It’s not too long, and it’s easy to recognize.

As you can see by now, there is no AI magic involved, just well-scripted conversations that Echo can carry on. If you go outside the known scripts, Echo will fall back on catch-all phrases.

AWS Lambda: The Fastest Way to Get the Skill Going

There are multiple options for building the service that backs a skill, but Node.js code running on AWS Lambda is the fastest way to get the demo going. A real-life implementation could instead be an externally hosted service, or a Lambda function written in a different language.

The Amazon team published a number of JavaScript-based examples; however, none of them use linked accounts or cards with images. The demo reuses the auxiliary alexaDateUtil (as-is) and AlexaSkill (with some modifications) to support those use cases.

The details that had to be filled into the main section of the code (index.js) are the event and intent handlers. Each intent declared in the schema has a matching named function that gets activated, as sketched below.
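A rough sketch of that wiring, following the pattern of Amazon’s JavaScript samples; the intent and handler names are illustrative rather than the demo’s exact ones.

    var AlexaSkill = require('./AlexaSkill');
    var APP_ID = 'amzn1.echo-sdk-ams.app.[your-app-id]'; // from the skill's Information tab

    var GoogleReminder = function () {
        AlexaSkill.call(this, APP_ID);
    };
    GoogleReminder.prototype = Object.create(AlexaSkill.prototype);
    GoogleReminder.prototype.constructor = GoogleReminder;

    // Each intent declared in the schema maps to a named handler function.
    GoogleReminder.prototype.intentHandlers = {
        'GetReminderIntent': function (intent, session, response) {
            // look up important dates by name, date, and/or reminder type
        },
        'AMAZON.HelpIntent': function (intent, session, response) {
            // explain the skill, or start account linking (see below)
        }
    };

    // Lambda entry point; matches the index.handler setting used during deployment.
    exports.handler = function (event, context) {
        new GoogleReminder().execute(event, context);
    };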

Intent Handling

Let’s look closely at handling the help intent and surfacing one of the reminders (for the complete project, go to GitHub).

The help message either takes the user through the Q&A session, if a link with a Google account is already established (an access token is present in the session), or takes the user through the account-linking process. For this purpose, a modified version of the “ask” auxiliary method is used. It takes the type of the home card along with the title, content, and images at various scales.
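A minimal sketch of that branching, assuming the modified AlexaSkill exposes helpers along these lines (the method names and messages here are guesses, not the project’s exact API):

    GoogleReminder.prototype.intentHandlers['AMAZON.HelpIntent'] =
        function (intent, session, response) {
            if (session.user && session.user.accessToken) {
                // Already linked: start the Q&A session.
                response.ask(
                    'You can ask for birthdays by name, date, or reminder type. ' +
                    'Whose birthday would you like to check?',
                    'Whose birthday would you like to check?');
            } else {
                // Not linked yet: a LinkAccount home card in the companion
                // app starts the OAuth 2.0 handshake with Google.
                response.tellWithLinkAccountCard(
                    'Please link your Google account in the Alexa companion app.');
            }
        };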

Serving the reminder is done with a home card as well. If the contact has an associated image, the companion app will present it to the user.
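A sketch of that response, assuming tellWithCard was modified to accept a full card object; a “Standard” card carries the image, while a “Simple” card is used when no photo is available:

    function tellReminder(response, name, photoUrl, dateSpoken) {
        var speech = name + "'s birthday is " + dateSpoken + '.';
        var card = { type: 'Simple', title: 'Birthday Reminder', content: speech };
        if (photoUrl) {
            card = {
                type: 'Standard',
                title: 'Birthday Reminder',
                text: speech,
                image: { smallImageUrl: photoUrl, largeImageUrl: photoUrl }
            };
        }
        response.tellWithCard(speech, card);
    }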

And last, but not least, there is the OAuth 2.0 security handshake. The good news is that Amazon Echo takes care of the account linking and handshake with a little additional code and some configuration, explained in the setup details below. The actual code just needs to check whether the access token is already present and use it, or prompt for account linking if it is not (as shown above).

Maintenance and Development Considerations

There is an abundance of console.log statements in the provided code. They will come in handy when you start talking to the device, which will simply blink and stop responding when something goes wrong.

When the function is deployed, the logs are captured by CloudWatch and can be accessed for further inspection. While excessive logging is great during development, the production version may turn some of it off while still logging error conditions with all the information possible. Monitoring metrics can be hooked up to the logs as the situation requires. For more information, read this blog.
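One possible way to keep that balance is a small logging shim; the DEBUG environment variable here is an assumption, not part of the demo project:

    // Verbose logging only when DEBUG is set; errors are always logged in full.
    var DEBUG = process.env.DEBUG === 'true';

    function debugLog() {
        if (DEBUG) {
            console.log.apply(console, arguments); // lands in CloudWatch
        }
    }

    function errorLog(message, err, event) {
        // Keep as much context as possible for error conditions.
        console.error(message, err, JSON.stringify(event));
    }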

Testing Considerations

One can test the code through Amazon’s Service Simulator. If the skill is going to use account linking, this actually means there should be real and mocked versions of the code: the Service Simulator does not let you go through the linking.

When testing with the real code, tools such as Postman can be used to make the calls and observe the outputs.
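A request body along these lines can be posted to the service (the IDs are shortened placeholders, and the intent and slot names match the illustrative schema above):

    {
      "version": "1.0",
      "session": {
        "new": true,
        "sessionId": "amzn1.echo-api.session.[unique-id]",
        "application": { "applicationId": "amzn1.echo-sdk-ams.app.[unique-id]" },
        "user": {
          "userId": "amzn1.account.[unique-id]",
          "accessToken": "[google-oauth-token]"
        }
      },
      "request": {
        "type": "IntentRequest",
        "requestId": "amzn1.echo-api.request.[unique-id]",
        "timestamp": "2016-11-06T12:00:00Z",
        "intent": {
          "name": "GetReminderIntent",
          "slots": {
            "FirstName": { "name": "FirstName", "value": "John" }
          }
        }
      }
    }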

As with any QA process, checking all the utterances (scripted conversations), as well as acceptable and unacceptable slot inputs, is a must.

Try testing on a real device — it can be interesting to observe how the scripted conversations are aligned with the way we actually speak. One may need to go back and tweak the utterances to better represent common speech.

Putting it All Together

There are a number of moving parts in this demo that need to be set up or deployed before the example starts working on your device. While deploying, you might be puzzled by a chicken-and-egg problem: one part of the setup needs information, such as an ID, from another part that cannot be completed until the first is in place. My hope is that someday this complexity will go away or be automated.

Amazon Account Setup

First, you will need an Amazon developer account. Just go to https://developer.amazon.com/ and sign up with the Amazon ID used for the device. If this is not possible, no worries: a device can be used by multiple accounts linked to a family, as described here. Just remember to switch to the account used for development purposes while working on the skill.

Google Account Setup

To make the Google APIs accessible from the Lambda code, there have to be an API key and OAuth client credentials.

  • Go to your Google API Console and create a new project via the drop-down. Name it Alexa Skill Demo.
  • Select the project created. Go to the Credentials section.
    • Create credential: API key. For production code, it is advised to restrict the key to the IP/domain of the service, such as the hosted Lambda.
    • Create a credential OAuth client ID for web applications:
      • Name: alexa-skill.
      • Once the skill is in place, use the redirect URL specified for account linking. Your redirect URL may look like:
        https://pitangui.amazon.com/spa/skill/account-linking-status.html?vendorId=AAAAAAAAAAAAAA&state=xyz&code=SplxlOBeZQQYbYS6WxSbIA
  • Go to Dashboard. Click Enable API. Allow access to the Calendar and Contacts APIs.

See this article for additional details about setting your account up.
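Once the credentials exist, the Lambda code can call the Google APIs with the access token Alexa passes in the session. A rough sketch, assuming the API key is available as a config value and using Google’s built-in contacts birthday calendar (the helper below is hypothetical, not the demo’s exact code):

    var https = require('https');

    // Fetch upcoming birthday events from the linked Google account.
    function fetchBirthdays(accessToken, apiKey, callback) {
        var calendarId = encodeURIComponent(
            'addressbook#contacts@group.v.calendar.google.com');
        https.get({
            host: 'www.googleapis.com',
            path: '/calendar/v3/calendars/' + calendarId + '/events?key=' + apiKey,
            headers: { Authorization: 'Bearer ' + accessToken }
        }, function (res) {
            var body = '';
            res.on('data', function (chunk) { body += chunk; });
            res.on('end', function () { callback(null, JSON.parse(body)); });
        }).on('error', callback);
    }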

Building the Code

  1. To get the build going, copy alexa_config.bash.sample to alexa_config.bash.
  2. Fill in the Google API key.
  3. Once the skill is configured (see below), make sure you also fill in the Skill application ID.
  4. Create a demo.zip package under the bin directory of the project by running the package.sh script.
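The finished config might look something like this. APP_ID is named in the skill-configuration step below; the Google key variable name is a guess, so check alexa_config.bash.sample for the real one.

    # alexa_config.bash: values picked up when package.sh builds demo.zip
    export GOOGLE_API_KEY="[your-google-api-key]"
    export APP_ID="amzn1.echo-sdk-ams.app.[unique-id]"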

Deploying the Lambda

  1. Go to the AWS Console and click on the Lambda link. Note: Ensure you are in us-east or you won’t be able to use Alexa with Lambda.
  2. Click on the Create a Lambda Function or Get Started Now button.
  3. Skip the blueprint.
  4. Name the Lambda Function “Google-Birthday-Reminder-Example-Skill”.
  5. Select the runtime as Node.js.
  6. Select the code entry type as “Upload a ZIP file”, then upload the bin/demo.zip file to Lambda. Use this to re-upload any code changes.
  7. Keep the Handler as index.handler (this refers to the main JS file in the ZIP).
  8. Create a basic execution role and click create.
  9. Leave the Advanced settings as the defaults.
  10. Click “Next” and review the settings, then click “Create Function”.
  11. Click the “Triggers” tab and select “Add trigger”.
  12. Set the Trigger type as Alexa Skills Kit. Click Submit.
  13. Copy the ARN from the top right to be used later in the Alexa Skill Setup.

Configuring the Skill

  1. Go to the Alexa Console and click Add a New Skill.
  2. Set “Google Birthday Reminder” for the skill name and “google reminder” as the invocation name. This is what is used to activate your skill. For example, you would say: “Alexa, Ask google reminder when John’s birthday is.”
  3. Copy the Intent Schema from the included IntentSchema.json.
  4. Copy the custom slot types from the customSlotTypes folder. Each file in the folder represents a new custom slot type. The name of the file is the name of the custom slot type, and the values in the file are the values for the custom slot.
  5. Copy the Sample Utterances from the included SampleUtterances.txt. Click Next.
  6. Select the Lambda ARN for the skill Endpoint and paste the ARN copied from above. Click Next.
  7. Account linking: Yes.
    1. Authorization URL: https://accounts.google.com/o/oauth2/auth
    2. Access Token URI: https://accounts.google.com/o/oauth2/token
    3. Scope:
      1. https://www.googleapis.com/auth/calendar.readonly
      2. https://www.googleapis.com/auth/contacts.readonly
    4. Use client ID and secret as created for the Google Dev project.
  8. Go back to the skill’s Information tab and copy the appId. Paste the appId into the alexa_config.bash file for the variable APP_ID, then update the Lambda source ZIP file with this change by running package.sh. Finally, upload to Lambda again; this step makes sure the Lambda function only serves requests from an authorized source (see the sketch after this list).
  9. You are now able to start testing your sample skill! You should be able to go to the Echo webpage and see your skill enabled.
  10. In order to test it, try some of the Sample Utterances.
  11. Your skill is now saved, and once you are finished testing, you can continue to publish your skill.
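For reference, the appId check mentioned in step 8 follows the pattern of Amazon’s AlexaSkill.js sample; a sketch:

    // Reject any request whose applicationId does not match our skill's APP_ID.
    AlexaSkill.prototype.execute = function (event, context) {
        var requestAppId = event.session.application.applicationId;
        if (this._appId && requestAppId !== this._appId) {
            console.error('Rejecting request from application: ' + requestAppId);
            context.fail('Invalid applicationId');
            return;
        }
        // ... dispatch to the event and intent handlers ...
    };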

Publishing the Skill

Fill in the publishing information tab on the Alexa Console with category information, long and short descriptions, testing instructions and sample utterances, keywords, and images. Use the following documentation for additional details.

Once you’re sure that the submission checklist is covered, submit the skill for certification. Good luck, and I hope to use yours soon!

Why Invest in the Skill

Alexa Skills categories range from Games, Trivia & Accessories to Business & Finance. Alexa provides integration glue for connected cars and smart homes. It can help with productivity and improve various social aspects of your life. A skill makes life easier and expands your user base. Think about all the people it can help, including all those who struggle with regular interfaces.

Even users who can use traditional interfaces prefer voice interfaces for their speed and ease of use. For instance, the skill trending the week of Oct. 3, 2016, was the Hurricane Tracker, as Hurricane Matthew was approaching the southeastern United States. The time has come to put voice interfaces on the roadmap to be ready by 2019. Even more promising, you can put the Alexa Voice Service on any connected device with a speaker and a microphone.
