Add the Nutritional Database API Guide

2018-10-26 22:02:27 +02:00 · 2018-10-26 22:02:27 +02:00 · a3ac976c13
parent 7bf66d06fc
commit a3ac976c13
1 changed files with 442 additions and 0 deletions
--- a/docs/guide.rst
+++ b/docs/guide.rst
@ -0,0 +1,442 @@
+NDB API Guide
+=============
+
+This guide aims to provide a better help than what the USDA gives, by
+describing both all of the available data and the ways to access it.
+
+.. contents::
+   :local:
+   :backlinks: none
+
+The Database
+------------
+
+The Nutritional Database is split in two parts:
+
+* The **Standard Release** database, or **SR**: It holds nutritional
+  information for common foods with no associated brands; useful to answer
+  requests like "regular oatmeal". This part of the database is released
+  yearly in multiple formats, including an Access Database.
+* The **Branded Foods** database: Holds nutritional information for branded
+  food items from US manufacturers; useful to answer more specific requests
+  like "McFlurry with Oreo cookies".
+
+There are a few Python packages to provide ways to make use of the Standard
+Release database, but they only work with the yearly exports as a starting
+point; not with the API. Furthermore, the API provides access to both
+databases, while the yearly exports only include the Standard Release.
+This is why *python-usda* was made.
+
+Basic items
+-----------
+
+These items can be accessed using list endpoints. They provide the basics to
+later access nutritional information.
+
+Food items
+^^^^^^^^^^
+
+One of the simplest items. A food item has an ID, also called a ``ndbno``
+(a Nutritional Database number), and a name. A search endpoint is available to
+search food items by name.
+
+Food groups
+^^^^^^^^^^^
+
+Food items may belong in food groups; requesting for a food item will only
+give you the food group's name, but it is possible to list food groups
+themselves and get an ID linked to their name.
+
+Nutrients
+^^^^^^^^^
+
+Nutrients also can be listed, and only have an ID and a name; list endpoints
+only provide you with IDs and names. However, they can hold measurement data
+when they are returned inside a report.
+
+Derivation codes
+^^^^^^^^^^^^^^^^
+
+Those codes can be listed and provide information as of how a nutrient's
+measured value has been derived from multiple measurements. This information
+is not fully supported by *python-usda* but can still be obtained when
+requesting a report in the ``Statistics`` mode as raw JSON.
+Nutrients will hold indicator codes that can be linked to descriptions using
+the list endpoint for derivation codes.
+
+Reports
+-------
+
+To get actual nutritional information, as list endpoints will not give you
+anything of that sort, you need to ask for a report. There are two types of
+reports available.
+
+Food Reports
+^^^^^^^^^^^^
+
+Food Reports are what you would find on a product's packaging; all the
+nutritional facts for a given food item.
+
+Types
+'''''
+
+There are three types of Food Reports that you can request for:
+
+Basic
+   The most common nutritional information; exactly what you would find on an
+   actual product's packaging.
+Full
+   Every single available nutrient for this item.
+Statistics
+   Get more statistics-related information about the nutrient's measurements;
+   their standard error, the way their values have been derived from multiple
+   measurements, etc. This is not fully supported by *python-usda*.
+
+In *python-usda*, those report types are represented by the
+:class:`usda.enums.UsdaNdbReportType` enum.
+
+Measurements
+''''''''''''
+
+In each Food Report, you will find a list of nutrients. Those nutrients will
+not only have an ID and a name, they will also hold a ``value`` and a
+``unit`` which express the nutrient's quantity in 100 grams of the food item.
+They also have a ``group`` to let you regroup nutrients in *nutrient groups*;
+those are different from *food groups* and cannot be listed anywhere else.
+
+Nutrients will also hold **measures**: their value is their "main measurement"
+but there can be more than one measurement, usually performed on another
+volume of the food item or in different conditions.
+
+Those measurements will have a ``label`` which describes the measurement
+itself; most of the time, it just states the volume of food used to perform
+the measurement.
+
+The official documentation differs from what the API actually returns; what
+we have is a measured quantity as a decimal value with a missing unit, and a
+100-gram equivalent for the measurement. *python-usda* tries to handle this
+misconception simply by abstracting away the problem and using as properties
+what the API actually says.
+
+Versions
+''''''''
+
+There are two versions of Food Reports:
+
+* **Version 1** Food Reports provide foot notes as a list of strings that you
+  have to deal with yourself; you cannot link them to any data. It is only
+  possible to request for one Version 1 food report at once.
+* **Version 2** Food Reports are provided with another endpoint that lets you
+  request up to 25 reports at once, saving some time, and give you footnotes
+  with unique IDs and a new list of Sources that are more easily handled by
+  code.
+
+Sources
+'''''''
+
+Version 2 Food Reports provide a new ``sources`` property; a list of sources,
+mostly articles, for the measurements returned in the report.
+
+Sources are mostly designed to hold information about scientific publications:
+they have an ID, a title, a year of publication, names of the volume and issue
+they were first published in, and a list of authors as a long string formatted
+like in a bibliography citation. While this is perfectible, it is already
+easier to toy with those sources than with raw footnotes.
+
+Nutrient Reports
+^^^^^^^^^^^^^^^^
+
+The Nutritional Database API provides another kind of report; the Nutrient
+Report. They actually use a list endpoint, not a report endpoint, because they
+return a list of **food items**.
+
+For up to 20 nutrients, you can fetch pages and pages of food items with
+associated nutrients and measurements data. This is perfect to get statistics
+about a great number of food items and a reduced set of nutrients.
+
+*python-usda* handles nutrient reports by letting you iterate over them
+seamlessly, without ever caring about those pages and lists. You can then get
+food items with an added attribute for a nutrients list, that contain the
+same kind of information you would get in a Food Report.
+
+API endpoints
+-------------
+
+This section goes deeper in detail about the API endpoints themselves and the
+implementation in *python-usda*, for those who want to understand some of the
+design choices or use the API themselves without the assistance of this Python
+API client.
+
+There are many quirks that are not described in the API documentation and that
+are important to know to deal with this API properly, as with many other APIs
+that do not follow standard practices.
+
+First of all, every endpoint requires you to give an API key as an
+``?api_key=`` parameter. For basic testing while doing development, you may
+use the ``DEMO_KEY`` API key; but this key is strongly rate-limited and should
+not be used in production. Instead, go get a free Data.gov API key. All you
+need is to have a name, an e-mail address and to
+`go here <https://api.data.gov/signup/>`_.
+
+List endpoints
+^^^^^^^^^^^^^^
+
+There are three list endpoints: ``/list``, ``/search`` and ``/nutrients``.
+
+``/list``
+   List food items, food groups, nutrients and derivation codes.
+``/search``
+   Search food items only, by name.
+``/nutrients``
+   Get a Nutrient Report.
+
+List parameters
+'''''''''''''''
+
+You can perform GET requests on the ``/list`` endpoint with the following
+parameters:
+
+``lt``
+   The list type. Defaults to ``f``.
+
+   * ``d`` for derivation codes;
+   * ``f`` for food items;
+   * ``g`` for food groups;
+   * ``n`` for nutrients;
+   * ``nr`` for all nutrients in the Standard Release database;
+   * ``ns`` for nutrients that are not in the Standard Release database,
+     also known as *specialty nutrients*.
+
+   In *python-usda*, this setting is represented by the
+   :class:`usda.enums.UsdaNdbListType` enum.
+``max``
+   Maximum number of items to return with each page. Defaults to 50.
+   The official documentation states you can get up to 1,500 items at once;
+   however the API actually limits to 500.
+``offset``
+   Zero-based index of the first item that should be returned.
+   Defaults to 0. You can use this to perform pagination ;
+   if you got a page with the 50 first results, you can get the next pages by
+   setting this parameter to 50, then 100, then 150, etc.
+``sort``
+   Field to sort items on. ``n`` for name or ``i`` for ID. Defaults to ``n``.
+``format``
+   The response return format, ``xml`` or ``json``. Defaults to ``json``.
+   Can also be set using the HTTP Accept header on the request.
+
+Search parameters
+'''''''''''''''''
+
+You can perform GET requests on the ``/search`` endpoint with the following
+parameters:
+
+``q``
+   The search query. If left empty, the endpoints acts like ``/list``.
+``ds``
+   A data source to restrict results to. If left empty, nutrients from all
+   data sources are returned. The two exact following strings can be used:
+
+   * ``Standard Reference``
+   * ``Branded Food Products``
+``fg``
+   A food group ID to restrict results to. If left empty, no filtering on the
+   food group is performed.
+``max``
+   Maximum number of items to return with each page. Defaults to 50.
+   The official documentation states you can get up to 1,500 items at once;
+   however the API actually limits to 500.
+``offset``
+   Zero-based index of the first item that should be returned.
+   Defaults to 0. You can use this to perform pagination;
+   if you got a page with the 50 first results, you can get the next pages by
+   setting this parameter to 50, then 100, then 150, etc.
+``sort``
+   Field to sort items on. ``n`` for name or ``r`` for relevance to the query.
+   Defaults to ``r``.
+``format``
+   The response return format, ``xml`` or ``json``. Defaults to ``json``.
+   Can also be set using the HTTP Accept header on the request.
+
+Nutrient Report
+'''''''''''''''
+
+You can perform GET requests on the ``/nutrient`` endpoint with the following
+parameters:
+
+``nutrients``
+   A list of up to 20 nutrient IDs to use for the nutrient report.
+``ndbno``
+   Optionally restrict the nutrient report to a single food item by ID.
+``fg``
+   A list of up to 10 food group IDs to restrict results to.
+   If left empty, no filtering on the food group is performed.
+``subset``
+   Boolean: set this to ``1`` to restrict to an abridged list of about 1,000
+   most commonly consumed food items in the United States.
+   Defaults to ``0`` — show all results.
+``max``
+   Maximum number of items to return with each page. Defaults to 50.
+   The official documentation states you can get up to 1,500 items at once;
+   however the API actually limits to 150.
+``offset``
+   Zero-based index of the first item that should be returned.
+   Defaults to 0. You can use this to perform pagination;
+   if you got a page with the 50 first results, you can get the next pages by
+   setting this parameter to 50, then 100, then 150, etc.
+``sort``
+   Field to sort items on. ``f`` for food item or ``c`` for nutrient content.
+   Defaults to ``f``.
+``format``
+   The response return format, ``xml`` or ``json``. Defaults to ``json``.
+   Can also be set using the HTTP Accept header on the request.
+
+Responses
+'''''''''
+
+List endpoint JSON responses are formatted in the following way:
+
+.. code:: json
+
+   {
+       "list": {
+           "start": "100",
+           "end": "150",
+           "total": "50",
+           "item": [...]
+       }
+   }
+
+The ``list.item`` array will hold all the items you requested for.
+``list.start`` and ``list.end`` are the start and end indexes on this page,
+and ``list.total`` is the length of the ``list.item`` array, *not* the total
+number of results. The ``list`` objects will also usually contain other
+arguments depending on what you have specified in your request, which could
+make it possible to write a generic parser for any response, entirely
+detached from any request.
+
+*python-usda* uses the :class:`usda.pagination.RawPaginator` class to provide
+seamless iteration over such paginated endpoints.
+This class returns raw JSON data which can then be parsed using the
+:class:`usda.Pagination.ModelPaginator` wrapper.
+
+However, the Nutrient Report endpoint returns responses in the following way:
+
+.. code:: json
+
+   {
+       "report": {
+           "start": "100",
+           "end": "150",
+           "total": "50",
+           "foods": [...]
+       }
+   }
+
+For everything else, this endpoint works just like the other list endpoints,
+but the most important parts of the response, the ``list`` object and its
+``item`` array, are replaced by ``report`` and ``foods``.
+
+*python-usda* solves this by using a custom class to paginate over this
+endpoint: :class:`usda.pagination.RawNutrientReportPaginator`.
+
+Reports endpoints
+^^^^^^^^^^^^^^^^^
+
+Two endpoints are available for food reports:
+
+``/reports``
+   Request a single Food Report version 1 at once
+``/V2/reports``
+   Request up to 25 Food Reports version 2 at once. Version 2 Reports add
+   more data on sources and better footnotes.
+
+Both endpoints can be requested using the same parameters:
+
+``ndbno``
+   On Food Reports version 1, ID of a single food item to get a report for.
+   On Food Reports version 2, a list of up to 25 food item IDs to get
+   reports for.
+``type``
+   The report type. Defaults to ``b``.
+
+   * ``b``: Basic report type; what you could find on an actual product's
+     packaging.
+   * ``f``: Full report type; every nutrient available for the food item.
+   * ``s``: Stats report type; additional statistics information from the
+     Standard Release database.
+
+   In *python-usda*, this parameter is represented by the
+   :class:`usda.enums.UsdaNdbReportType` enum.
+``format``
+   The response return format, ``xml`` or ``json``. Defaults to ``json``.
+   Can also be set using the HTTP Accept header on the request.
+
+Errors
+^^^^^^
+
+The API returns errors in a very inconsistent way. First of all, a warning:
+
+.. warning:: Do not trust the HTTP status codes.
+
+This API often returns HTTP 200 statuses when there actually are errors. The
+easiest way to handle errors is to first check for a JSON body; if there is
+one, parse it and see if there is an error or if it is an actual result; if
+there is none, *then* try checking the status code.
+
+The error JSON bodies are of multiple shapes depending on the kind of error.
+What follows is a non-exhaustive list of errors, as it is impossible to make
+sure all errors are covered without a very thorough usage of the API.
+
+API rate limit exceeded
+'''''''''''''''''''''''
+
+.. code:: json
+
+   {
+       "errors": {
+           "error": [
+               {
+                   "code": "OVER_RATE_LIMIT",
+                   "message": "..."
+               }
+           ]
+       }
+   }
+
+This error is the only known error type where there is an ``errors`` *object*
+that holds an ``error`` *array*. A developer must have been coding under
+influence here.
+
+Invalid API key
+'''''''''''''''
+
+.. code:: json
+
+   {
+       "error": {
+           "code": "API_KEY_INVALID",
+           "message": "..."
+       }
+   }
+
+Parameter error
+'''''''''''''''
+
+This error occurs when one of the GET parameters in a request is invalid.
+This may be the most useful error message, as it usually also describes the
+correct values for the parameter in a way easier to understand than the
+official documentation.
+
+Note that in this case, the ``code`` property is a number corresponding to an
+actual HTTP status code that should be returned as the response's status code,
+but isn't.
+
+.. code:: json
+
+   {
+       "error": {
+           "code": 400,
+           "parameter": "...",
+           "message": "..."
+       }
+   }