NDB API Guide
=============

This guide aims to provide better help than the official USDA documentation
by describing both all of the available data and the ways to access it.

.. contents::
   :local:
   :backlinks: none

The Database
------------

The Nutritional Database is split into two parts:

* The **Standard Reference** database, or **SR**: it holds nutritional
  information for common, unbranded foods; useful to answer requests like
  "regular oatmeal". This part of the database is released yearly in multiple
  formats, including an Access database.
* The **Branded Foods** database: it holds nutritional information for branded
  food items from US manufacturers; useful to answer more specific requests
  like "McFlurry with Oreo cookies".

A few Python packages already make use of the Standard Reference database,
but they start from the yearly exports rather than from the API. Furthermore,
the API provides access to both databases, while the yearly exports only
include the Standard Reference. This is why *python-usda* was made.

Basic items
-----------

These items can be accessed using list endpoints. They provide the basics
needed to later access nutritional information.

Food items
^^^^^^^^^^

One of the simplest items. A food item has an ID, also called an ``ndbno``
(a Nutritional Database number), and a name. A search endpoint is available to
search food items by name.

Food groups
^^^^^^^^^^^

Food items may belong to food groups. Requesting a food item only gives you
the food group's name, but it is possible to list the food groups themselves
and get the ID linked to each name.

Nutrients
^^^^^^^^^

Nutrients can also be listed; list endpoints only provide their IDs and names.
However, they can hold measurement data when they are returned inside a report.

Derivation codes
^^^^^^^^^^^^^^^^

Derivation codes can be listed. They describe how a nutrient's measured value
has been derived from multiple measurements. This information is not fully
supported by *python-usda*, but it can still be obtained as raw JSON by
requesting a report in the ``Statistics`` mode. Nutrients will hold indicator
codes that can be linked to descriptions using the list endpoint for
derivation codes.

Reports
-------

List endpoints do not return any actual nutritional information. To get it,
you need to request a report. Two kinds of reports are available.

Food Reports
^^^^^^^^^^^^

Food Reports are what you would find on a product's packaging: all the
nutritional facts for a given food item.

Types
'''''

There are three types of Food Reports that you can request:

Basic
  The most common nutritional information; exactly what you would find on an
  actual product's packaging.
Full
  Every single available nutrient for this item.
Statistics
  More statistics-related information about the nutrients' measurements:
  their standard error, the way their values have been derived from multiple
  measurements, etc. This is not fully supported by *python-usda*.

In *python-usda*, these report types are represented by the
:class:`usda.enums.UsdaNdbReportType` enum.

Measurements
''''''''''''

Each Food Report contains a list of nutrients. These nutrients not only have
an ID and a name; they also hold a ``value`` and a ``unit``, which express the
nutrient's quantity per 100 grams of the food item. They also have a ``group``
that lets you regroup nutrients into *nutrient groups*. Nutrient groups are
different from *food groups* and cannot be listed anywhere else.
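
Because every nutrient ``value`` is expressed per 100 grams of the food item,
scaling it to another portion is a simple proportion. The following sketch
assumes that a nutrient has already been extracted from a Food Report as a
plain dictionary with ``name``, ``value`` and ``unit`` keys. The ``Nutrient``
class and the ``amount_in_portion`` function are hypothetical illustrations,
not part of *python-usda*.

.. code:: python

   from dataclasses import dataclass
   from typing import Any, Mapping


   @dataclass
   class Nutrient:
       """Hypothetical stand-in for a nutrient found in a Food Report."""

       name: str
       value: float  # quantity per 100 grams of the food item
       unit: str

       @classmethod
       def from_json(cls, raw: Mapping[str, Any]) -> "Nutrient":
           # Assumption: the raw nutrient uses "name", "value" and "unit"
           # keys; float() accepts both numbers and numeric strings.
           return cls(name=raw["name"], value=float(raw["value"]),
                      unit=raw["unit"])


   def amount_in_portion(nutrient: Nutrient, grams: float) -> float:
       """Scale a per-100-gram nutrient value to a portion of ``grams``."""
       if grams < 0:
           raise ValueError("the portion weight cannot be negative")
       return nutrient.value * grams / 100


   # Illustrative values only, not real USDA data.
   fiber = Nutrient.from_json({"name": "Fiber", "value": "10", "unit": "g"})
   print(amount_in_portion(fiber, 40), fiber.unit)

With these made-up values, a 40-gram portion holds 4.0 grams of the nutrient,
since 10 * 40 / 100 = 4.0.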

Nutrients also hold **measures**. The nutrient's ``value`` is its "main
measurement", but there can be more than one measurement. Additional
measurements are usually performed on another amount of the food item or in
different conditions.

Each measure has a ``label`` that describes the measurement itself. Most of
the time, it just states the amount of food used to perform the measurement.

The official documentation differs from what the API actually returns. What
we actually get is a measured quantity as a decimal value with a missing unit,
and a 100-gram equivalent for the measurement. *python-usda* handles this
discrepancy by exposing as properties what the API actually returns.

Versions
''''''''

There are two versions of Food Reports:

* **Version 1** Food Reports provide footnotes as a list of strings that you
  have to deal with yourself; you cannot link them to any data. Only one
  Version 1 Food Report can be requested at a time.
* **Version 2** Food Reports are provided by another endpoint that lets you
  request up to 25 reports at once, saving some time. They also provide
  footnotes with unique IDs and a new list of sources, both of which are
  easier to handle in code.

Sources
'''''''

Version 2 Food Reports provide a new ``sources`` property: a list of sources,
mostly articles, for the measurements returned in the report.

Sources are mostly designed to hold information about scientific publications.
They have an ID, a title, a year of publication, the names of the volume and
issue they were first published in, and a list of authors. The authors are
given as a single long string formatted like a bibliography citation. While
this is far from perfect, sources are already easier to work with than raw
footnotes.

Nutrient Reports
^^^^^^^^^^^^^^^^

The Nutritional Database API provides another kind of report: the Nutrient
Report.
It actually uses a list endpoint, not a report endpoint, because it returns a
list of **food items**.

For up to 20 nutrients, you can fetch pages and pages of food items with the
associated nutrient and measurement data. This is ideal for computing
statistics about a large number of food items over a reduced set of
nutrients.

*python-usda* handles Nutrient Reports by letting you iterate over them
seamlessly, without ever caring about pages and lists. You then get food items
with an added attribute holding a list of nutrients. This list contains the
same kind of information you would get in a Food Report.

API endpoints
-------------

This section goes into more detail about the API endpoints themselves and
their implementation in *python-usda*. It is meant for those who want to
understand some of the design choices, or who want to use the API without the
assistance of this Python API client.

As with many other APIs that do not follow standard practices, many quirks are
not described in the API documentation, yet they are important to know to use
this API properly.

First of all, every endpoint requires an API key, passed as the ``api_key``
query parameter. For basic testing during development, you may use the
``DEMO_KEY`` API key. This key is strongly rate-limited and should not be used
in production. Instead, get a free Data.gov API key: all you need to provide
is a name and an e-mail address, then `go here `_.

List endpoints
^^^^^^^^^^^^^^

There are three list endpoints: ``/list``, ``/search`` and ``/nutrients``.

``/list``
  List food items, food groups, nutrients and derivation codes.
``/search``
  Search food items only, by name.
``/nutrients``
  Get a Nutrient Report.

List parameters
'''''''''''''''

You can perform GET requests on the ``/list`` endpoint with the following
parameters:

``lt``
  The list type. Defaults to ``f``.

  * ``d`` for derivation codes;
  * ``f`` for food items;
  * ``g`` for food groups;
  * ``n`` for nutrients;
  * ``nr`` for all nutrients in the Standard Reference database;
  * ``ns`` for nutrients that are not in the Standard Reference database,
    also known as *specialty nutrients*.

  In *python-usda*, this setting is represented by the
  :class:`usda.enums.UsdaNdbListType` enum.
``max``
  Maximum number of items to return in each page. Defaults to 50.
  The official documentation states you can get up to 1,500 items at once;
  however, the API actually caps this at 500.
``offset``
  Zero-based index of the first item that should be returned.
  Defaults to 0. You can use this to perform pagination:
  if you got a page with the first 50 results, you can get the next pages by
  setting this parameter to 50, then 100, then 150, etc.
``sort``
  Field to sort items on: ``n`` for name or ``i`` for ID. Defaults to ``n``.
``format``
  The response format, ``xml`` or ``json``. Defaults to ``json``.
  Can also be set using the HTTP ``Accept`` header on the request.

Search parameters
'''''''''''''''''

You can perform GET requests on the ``/search`` endpoint with the following
parameters:

``q``
  The search query. If left empty, the endpoint acts like ``/list``.
``ds``
  A data source to restrict results to. If left empty, food items from all
  data sources are returned. Exactly one of the two following strings can be
  used:

  * ``Standard Reference``
  * ``Branded Food Products``

``fg``
  A food group ID to restrict results to. If left empty, no filtering on the
  food group is performed.
``max``
  Maximum number of items to return in each page. Defaults to 50.
  The official documentation states you can get up to 1,500 items at once;
  however, the API actually caps this at 500.
``offset``
  Zero-based index of the first item that should be returned.
  Defaults to 0.
  You can use this to perform pagination:
  if you got a page with the first 50 results, you can get the next pages by
  setting this parameter to 50, then 100, then 150, etc.
``sort``
  Field to sort items on: ``n`` for name or ``r`` for relevance to the query.
  Defaults to ``r``.
``format``
  The response format, ``xml`` or ``json``. Defaults to ``json``.
  Can also be set using the HTTP ``Accept`` header on the request.

Nutrient Report
'''''''''''''''

You can perform GET requests on the ``/nutrients`` endpoint with the following
parameters:

``nutrients``
  A list of up to 20 nutrient IDs to use for the Nutrient Report.
``ndbno``
  Optionally restrict the Nutrient Report to a single food item by ID.
``fg``
  A list of up to 10 food group IDs to restrict results to.
  If left empty, no filtering on the food group is performed.
``subset``
  Boolean: set this to ``1`` to restrict results to an abridged list of about
  1,000 of the most commonly consumed food items in the United States.
  Defaults to ``0``, which returns all results.
``max``
  Maximum number of items to return in each page. Defaults to 50.
  The official documentation states you can get up to 1,500 items at once;
  however, the API actually caps this at 150.
``offset``
  Zero-based index of the first item that should be returned.
  Defaults to 0. You can use this to perform pagination:
  if you got a page with the first 50 results, you can get the next pages by
  setting this parameter to 50, then 100, then 150, etc.
``sort``
  Field to sort items on: ``f`` for food item or ``c`` for nutrient content.
  Defaults to ``f``.
``format``
  The response format, ``xml`` or ``json``. Defaults to ``json``.
  Can also be set using the HTTP ``Accept`` header on the request.

Responses
'''''''''

List endpoint JSON responses are formatted in the following way:

.. code:: json

   {
     "list": {
       "start": "100",
       "end": "150",
       "total": "50",
       "item": [...]
+ } + } + +The ``list.item`` array will hold all the items you requested for. +``list.start`` and ``list.end`` are the start and end indexes on this page, +and ``list.total`` is the length of the ``list.item`` array, *not* the total +number of results. The ``list`` objects will also usually contain other +arguments depending on what you have specified in your request, which could +make it possible to write a generic parser for any response, entirely +detached from any request. + +*python-usda* uses the :class:`usda.pagination.RawPaginator` class to provide +seamless iteration over such paginated endpoints. +This class returns raw JSON data which can then be parsed using the +:class:`usda.Pagination.ModelPaginator` wrapper. + +However, the Nutrient Report endpoint returns responses in the following way: + +.. code:: json + + { + "report": { + "start": "100", + "end": "150", + "total": "50", + "foods": [...] + } + } + +For everything else, this endpoint works just like the other list endpoints, +but the most important parts of the response, the ``list`` object and its +``item`` array, are replaced by ``report`` and ``foods``. + +*python-usda* solves this by using a custom class to paginate over this +endpoint: :class:`usda.pagination.RawNutrientReportPaginator`. + +Reports endpoints +^^^^^^^^^^^^^^^^^ + +Two endpoints are available for food reports: + +``/reports`` + Request a single Food Report version 1 at once +``/V2/reports`` + Request up to 25 Food Reports version 2 at once. Version 2 Reports add + more data on sources and better footnotes. + +Both endpoints can be requested using the same parameters: + +``ndbno`` + On Food Reports version 1, ID of a single food item to get a report for. + On Food Reports version 2, a list of up to 25 food item IDs to get + reports for. +``type`` + The report type. Defaults to ``b``. + + * ``b``: Basic report type; what you could find on an actual product's + packaging. 
  * ``f``: Full report type; every nutrient available for the food item.
  * ``s``: Stats report type; additional statistical information from the
    Standard Reference database.

  In *python-usda*, this parameter is represented by the
  :class:`usda.enums.UsdaNdbReportType` enum.
``format``
  The response format, ``xml`` or ``json``. Defaults to ``json``.
  Can also be set using the HTTP ``Accept`` header on the request.

Errors
^^^^^^

The API reports errors in a very inconsistent way. First of all, a warning:

.. warning:: Do not trust the HTTP status codes.

This API often returns HTTP 200 statuses when there actually are errors. The
easiest way to handle errors is to first check for a JSON body. If there is
one, parse it and check whether it is an error or an actual result. If there
is none, *then* check the status code.

The error JSON bodies come in multiple shapes depending on the kind of error.
What follows is a non-exhaustive list of errors. It is impossible to make sure
all errors are covered without a very thorough usage of the API.

API rate limit exceeded
'''''''''''''''''''''''

.. code:: json

   {
     "errors": {
       "error": [
         {
           "code": "OVER_RATE_LIMIT",
           "message": "..."
         }
       ]
     }
   }

This error is the only known error type where there is an ``errors`` *object*
that holds an ``error`` *array*. All other known errors use a single ``error``
object.

Invalid API key
'''''''''''''''

.. code:: json

   {
     "error": {
       "code": "API_KEY_INVALID",
       "message": "..."
     }
   }

Parameter error
'''''''''''''''

This error occurs when one of the GET parameters in a request is invalid.
It may be the most useful error message, as it usually also describes the
correct values for the parameter. These descriptions are often easier to
understand than the official documentation.

Note that in this case, the ``code`` property is a number corresponding to an
actual HTTP status code that should be returned as the response's status code,
but isn't.

.. code:: json

   {
     "error": {
       "code": 400,
       "parameter": "...",
       "message": "..."
     }
   }
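
A client can follow the advice from the Errors section by looking for one of
these three error shapes before ever looking at the status code. The following
is a minimal sketch using only the Python standard library. It assumes you
already have the response's status code and body text from any HTTP client.
``NdbApiError``, ``extract_error`` and ``check_response`` are hypothetical
names, not part of *python-usda*. Only the error shapes listed above are
recognized.

.. code:: python

   import json
   from typing import Any, Optional


   class NdbApiError(Exception):
       """Hypothetical exception for errors reported by the NDB API."""

       def __init__(self, code: Any, message: str,
                    parameter: Optional[str] = None) -> None:
           super().__init__(f"{code}: {message}")
           self.code = code  # a string such as "API_KEY_INVALID", or a number
           self.message = message
           self.parameter = parameter  # only set for parameter errors


   def extract_error(body: Any) -> Optional[NdbApiError]:
       """Return an NdbApiError if ``body`` matches a known error shape."""
       if not isinstance(body, dict):
           return None
       # Rate limit errors: an "errors" object holding an "error" array.
       errors = body.get("errors")
       if isinstance(errors, dict):
           items = errors.get("error")
           first = items[0] if isinstance(items, list) and items else {}
           if not isinstance(first, dict):
               first = {}
           return NdbApiError(first.get("code"), first.get("message", ""))
       # Invalid API keys and parameter errors: a single "error" object.
       error = body.get("error")
       if isinstance(error, dict):
           return NdbApiError(error.get("code"), error.get("message", ""),
                              error.get("parameter"))
       return None


   def check_response(status: int, text: str) -> Any:
       """Return the parsed JSON body (or raw text), raising on errors."""
       try:
           body = json.loads(text)
       except ValueError:  # json.JSONDecodeError subclasses ValueError
           body = None
       if body is not None:
           # A JSON body is trusted over the status code, which is often 200.
           error = extract_error(body)
           if error is not None:
               raise error
           return body
       # No usable JSON body: only now fall back on the HTTP status code.
       if status >= 400:
           raise NdbApiError(status, "HTTP error without a JSON body")
       return text  # e.g. an XML response requested with format=xml


   # Example: an invalid API key reported with an HTTP 200 status.
   try:
       check_response(200, '{"error": {"code": "API_KEY_INVALID", "message": "..."}}')
   except NdbApiError as exc:
       print(exc.code)  # prints API_KEY_INVALID

Successful responses without a JSON body are returned as raw text, which
keeps the sketch usable with ``format=xml``. A stricter client could raise
instead.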