15 KiB
NDB API Guide
This guide aims to provide a better help than what the USDA gives, by describing both all of the available data and the ways to access it.
The Database
The Nutritional Database is split in two parts:
- The Standard Release database, or SR: It holds nutritional information for common foods with no associated brands; useful to answer requests like "regular oatmeal". This part of the database is released yearly in multiple formats, including an Access Database.
- The Branded Foods database: Holds nutritional information for branded food items from US manufacturers; useful to answer more specific requests like "McFlurry with Oreo cookies".
There are a few Python packages to provide ways to make use of the Standard Release database, but they only work with the yearly exports as a starting point; not with the API. Furthermore, the API provides access to both databases, while the yearly exports only include the Standard Release. This is why python-usda was made.
Basic items
These items can be accessed using list endpoints. They provide the basics to later access nutritional information.
Food items
One of the simplest items. A food item has an ID, also called a
ndbno
(a Nutritional Database number), and a name. A search
endpoint is available to search food items by name.
Food groups
Food items may belong in food groups; requesting for a food item will only give you the food group's name, but it is possible to list food groups themselves and get an ID linked to their name.
Nutrients
Nutrients also can be listed, and only have an ID and a name; list endpoints only provide you with IDs and names. However, they can hold measurement data when they are returned inside a report.
Derivation codes
Those codes can be listed and provide information as of how a
nutrient's measured value has been derived from multiple measurements.
This information is not fully supported by python-usda but can
still be obtained when requesting a report in the
Statistics
mode as raw JSON. Nutrients will hold indicator
codes that can be linked to descriptions using the list endpoint for
derivation codes.
Reports
To get actual nutritional information, as list endpoints will not give you anything of that sort, you need to ask for a report. There are two types of reports available.
Food Reports
Food Reports are what you would find on a product's packaging; all the nutritional facts for a given food item.
Types
There are three types of Food Reports that you can request for:
- Basic
-
The most common nutritional information; exactly what you would find on an actual product's packaging.
- Full
-
Every single available nutrient for this item.
- Statistics
-
Get more statistics-related information about the nutrient's measurements; their standard error, the way their values have been derived from multiple measurements, etc. This is not fully supported by python-usda.
In python-usda, those report types are represented by the
usda.enums.UsdaNdbReportType
enum.
Measurements
In each Food Report, you will find a list of nutrients. Those
nutrients will not only have an ID and a name, they will also hold a
value
and a unit
which express the nutrient's
quantity in 100 grams of the food item. They also have a
group
to let you regroup nutrients in nutrient
groups; those are different from food groups and cannot be
listed anywhere else.
Nutrients will also hold measures: their value is their "main measurement" but there can be more than one measurement, usually performed on another volume of the food item or in different conditions.
Those measurements will have a label
which describes the
measurement itself; most of the time, it just states the volume of food
used to perform the measurement.
The official documentation differs from what the API actually returns; what we have is a measured quantity as a decimal value with a missing unit, and a 100-gram equivalent for the measurement. python-usda tries to handle this misconception simply by abstracting away the problem and using as properties what the API actually says.
Versions
There are two versions of Food Reports:
- Version 1 Food Reports provide foot notes as a list of strings that you have to deal with yourself; you cannot link them to any data. It is only possible to request for one Version 1 food report at once.
- Version 2 Food Reports are provided with another endpoint that lets you request up to 25 reports at once, saving some time, and give you footnotes with unique IDs and a new list of Sources that are more easily handled by code.
Sources
Version 2 Food Reports provide a new sources
property; a
list of sources, mostly articles, for the measurements returned in the
report.
Sources are mostly designed to hold information about scientific publications: they have an ID, a title, a year of publication, names of the volume and issue they were first published in, and a list of authors as a long string formatted like in a bibliography citation. While this is perfectible, it is already easier to toy with those sources than with raw footnotes.
Nutrient Reports
The Nutritional Database API provides another kind of report; the Nutrient Report. They actually use a list endpoint, not a report endpoint, because they return a list of food items.
For up to 20 nutrients, you can fetch pages and pages of food items with associated nutrients and measurements data. This is perfect to get statistics about a great number of food items and a reduced set of nutrients.
python-usda handles nutrient reports by letting you iterate over them seamlessly, without ever caring about those pages and lists. You can then get food items with an added attribute for a nutrients list, that contain the same kind of information you would get in a Food Report.
API endpoints
This section goes deeper in detail about the API endpoints themselves and the implementation in python-usda, for those who want to understand some of the design choices or use the API themselves without the assistance of this Python API client.
There are many quirks that are not described in the API documentation and that are important to know to deal with this API properly, as with many other APIs that do not follow standard practices.
First of all, every endpoint requires you to give an API key as an
?api_key=
parameter. For basic testing while doing
development, you may use the DEMO_KEY
API key; but this key
is strongly rate-limited and should not be used in production. Instead,
go get a free Data.gov API key. All you need is to have a name, an
e-mail address and to go
here.
List endpoints
There are three list endpoints: /list
,
/search
and /nutrients
.
/list
-
List food items, food groups, nutrients and derivation codes.
/search
-
Search food items only, by name.
/nutrients
-
Get a Nutrient Report.
List parameters
You can perform GET requests on the /list
endpoint with
the following parameters:
lt
-
The list type. Defaults to
f
.d
for derivation codes;f
for food items;g
for food groups;n
for nutrients;nr
for all nutrients in the Standard Release database;ns
for nutrients that are not in the Standard Release database, also known as specialty nutrients.
In python-usda, this setting is represented by the
usda.enums.UsdaNdbListType
enum. max
-
Maximum number of items to return with each page. Defaults to 50. The official documentation states you can get up to 1,500 items at once; however the API actually limits to 500.
offset
-
Zero-based index of the first item that should be returned. Defaults to 0. You can use this to perform pagination ; if you got a page with the 50 first results, you can get the next pages by setting this parameter to 50, then 100, then 150, etc.
sort
-
Field to sort items on.
n
for name ori
for ID. Defaults ton
. format
-
The response return format,
xml
orjson
. Defaults tojson
. Can also be set using the HTTP Accept header on the request.
Search parameters
You can perform GET requests on the /search
endpoint
with the following parameters:
q
-
The search query. If left empty, the endpoints acts like
/list
. ds
-
A data source to restrict results to. If left empty, nutrients from all data sources are returned. The two exact following strings can be used:
Standard Reference
Branded Food Products
fg
-
A food group ID to restrict results to. If left empty, no filtering on the food group is performed.
max
-
Maximum number of items to return with each page. Defaults to 50. The official documentation states you can get up to 1,500 items at once; however the API actually limits to 500.
offset
-
Zero-based index of the first item that should be returned. Defaults to 0. You can use this to perform pagination; if you got a page with the 50 first results, you can get the next pages by setting this parameter to 50, then 100, then 150, etc.
sort
-
Field to sort items on.
n
for name orr
for relevance to the query. Defaults tor
. format
-
The response return format,
xml
orjson
. Defaults tojson
. Can also be set using the HTTP Accept header on the request.
Nutrient Report
You can perform GET requests on the /nutrient
endpoint
with the following parameters:
nutrients
-
A list of up to 20 nutrient IDs to use for the nutrient report.
ndbno
-
Optionally restrict the nutrient report to a single food item by ID.
fg
-
A list of up to 10 food group IDs to restrict results to. If left empty, no filtering on the food group is performed.
subset
-
Boolean: set this to
1
to restrict to an abridged list of about 1,000 most commonly consumed food items in the United States. Defaults to0
— show all results. max
-
Maximum number of items to return with each page. Defaults to 50. The official documentation states you can get up to 1,500 items at once; however the API actually limits to 150.
offset
-
Zero-based index of the first item that should be returned. Defaults to 0. You can use this to perform pagination; if you got a page with the 50 first results, you can get the next pages by setting this parameter to 50, then 100, then 150, etc.
sort
-
Field to sort items on.
f
for food item orc
for nutrient content. Defaults tof
. format
-
The response return format,
xml
orjson
. Defaults tojson
. Can also be set using the HTTP Accept header on the request.
Responses
List endpoint JSON responses are formatted in the following way:
{
"list": {
"start": "100",
"end": "150",
"total": "50",
"item": [...]
}
}
The list.item
array will hold all the items you
requested for. list.start
and list.end
are the
start and end indexes on this page, and list.total
is the
length of the list.item
array, not the total
number of results. The list
objects will also usually
contain other arguments depending on what you have specified in your
request, which could make it possible to write a generic parser for any
response, entirely detached from any request.
python-usda uses the usda.pagination.RawPaginator
class to provide
seamless iteration over such paginated endpoints. This class returns raw
JSON data which can then be parsed using the usda.Pagination.ModelPaginator
wrapper.
However, the Nutrient Report endpoint returns responses in the following way:
{
"report": {
"start": "100",
"end": "150",
"total": "50",
"foods": [...]
}
}
For everything else, this endpoint works just like the other list
endpoints, but the most important parts of the response, the
list
object and its item
array, are replaced
by report
and foods
.
python-usda solves this by using a custom class to paginate
over this endpoint: usda.pagination.RawNutrientReportPaginator
.
Reports endpoints
Two endpoints are available for food reports:
/reports
-
Request a single Food Report version 1 at once
/V2/reports
-
Request up to 25 Food Reports version 2 at once. Version 2 Reports add more data on sources and better footnotes.
Both endpoints can be requested using the same parameters:
ndbno
-
On Food Reports version 1, ID of a single food item to get a report for. On Food Reports version 2, a list of up to 25 food item IDs to get reports for.
type
-
The report type. Defaults to
b
.b
: Basic report type; what you could find on an actual product's packaging.f
: Full report type; every nutrient available for the food item.s
: Stats report type; additional statistics information from the Standard Release database.
In python-usda, this parameter is represented by the
usda.enums.UsdaNdbReportType
enum. format
-
The response return format,
xml
orjson
. Defaults tojson
. Can also be set using the HTTP Accept header on the request.
Errors
The API returns errors in a very inconsistent way. First of all, a warning:
Warning
Do not trust the HTTP status codes.
This API often returns HTTP 200 statuses when there actually are errors. The easiest way to handle errors is to first check for a JSON body; if there is one, parse it and see if there is an error or if it is an actual result; if there is none, then try checking the status code.
The error JSON bodies are of multiple shapes depending on the kind of error. What follows is a non-exhaustive list of errors, as it is impossible to make sure all errors are covered without a very thorough usage of the API.
API rate limit exceeded
{
"errors": {
"error": [
{
"code": "OVER_RATE_LIMIT",
"message": "..."
}
]
}
}
This error is the only known error type where there is an
errors
object that holds an error
array. A developer must have been coding under influence
here.
Invalid API key
{
"error": {
"code": "API_KEY_INVALID",
"message": "..."
}
}
Parameter error
This error occurs when one of the GET parameters in a request is invalid. This may be the most useful error message, as it usually also describes the correct values for the parameter in a way easier to understand than the official documentation.
Note that in this case, the code
property is a number
corresponding to an actual HTTP status code that should be returned as
the response's status code, but isn't.
{
"error": {
"code": 400,
"parameter": "...",
"message": "..."
}
}