Module netCDF4 :: Class Dataset

Class Dataset

object --+
         |
        Dataset

Dataset(self, filename, mode="r", clobber=True, diskless=False, persist=False, format='NETCDF4')

A netCDF Dataset is a collection of dimensions, groups, variables and attributes. Together they describe the meaning of data and relations among data fields stored in a netCDF file.

Parameters:

filename - Name of netCDF file to hold dataset.

Keywords:

mode - access mode. 'r' means read-only; no data can be modified. 'w' means write; a new file is created, and an existing file with the same name is deleted. 'a' and 'r+' mean append (in analogy with serial files); an existing file is opened for reading and writing. Appending 's' to modes 'w', 'r+' or 'a' enables unbuffered shared access to NETCDF3_CLASSIC or NETCDF3_64BIT formatted files. Unbuffered access may be useful even if you don't need shared access, since it may be faster for programs that don't access data sequentially. The 's' option is ignored for NETCDF4 and NETCDF4_CLASSIC formatted files.

clobber - if True (default), opening a file with mode='w' will clobber an existing file with the same name. If False, an exception will be raised if a file with the same name already exists.

format - underlying file format (one of 'NETCDF4', 'NETCDF4_CLASSIC', 'NETCDF3_CLASSIC' or 'NETCDF3_64BIT'). Only relevant if mode = 'w' (if mode = 'r', 'a' or 'r+' the file format is automatically detected). Default 'NETCDF4', which means the data is stored in an HDF5 file, using netCDF 4 API features. Setting format='NETCDF4_CLASSIC' will create an HDF5 file, using only netCDF 3 compatible API features. netCDF 3 clients must be recompiled and linked against the netCDF 4 library to read files in NETCDF4_CLASSIC format. 'NETCDF3_CLASSIC' is the classic netCDF 3 file format that does not handle 2+ GB files very well. 'NETCDF3_64BIT' is the 64-bit offset version of the netCDF 3 file format, which fully supports 2+ GB files, but is only compatible with clients linked against netCDF version 3.6.0 or later.

diskless - create diskless (in memory) file. This is an experimental feature added to the C library after the netcdf-4.2 release.

persist - if diskless=True, persist file to disk when closed (default False).

Returns:

a Dataset instance. All further operations on the netCDF Dataset are accomplished via Dataset instance methods.

A list of attribute names corresponding to global netCDF attributes defined for the Dataset can be obtained with the ncattrs() method. These attributes can be created by assigning to an attribute of the Dataset instance. A dictionary containing all the netCDF attribute name/value pairs is provided by the __dict__ attribute of a Dataset instance.

The instance variables dimensions, variables, groups, cmptypes, data_model, disk_format and path are read-only (and should not be modified by the user).

Instance Methods
 
__delattr__(...)
x.__delattr__('name') <==> del x.name
 
__enter__(...)
 
__exit__(...)
 
__getattr__(...)
 
__getattribute__(...)
x.__getattribute__('name') <==> x.name
 
__init__(self, filename, mode="r", clobber=True, diskless=False, persist=False, format='NETCDF4')
x.__init__(...) initializes x; see help(type(x)) for signature

__new__(T, S, ...)
Returns a new object with type S, a subtype of T.
 
__setattr__(...)
x.__setattr__('name', value) <==> x.name = value
 
__str__(x)
str(x)
 
__unicode__(...)
 
close(self)
Close the Dataset.
 
createCompoundType(self, datatype, datatype_name)
Creates a new compound data type named datatype_name from the numpy dtype object datatype.
 
createDimension(self, dimname, size=None)
Creates a new dimension with the given dimname and size.
 
createGroup(self, groupname)
Creates a new Group with the given groupname.
 
createVLType(self, datatype, datatype_name)
Creates a new VLEN data type named datatype_name from a numpy dtype object datatype.
 
createVariable(self, varname, datatype, dimensions=(), zlib=False, complevel=4, shuffle=True, fletcher32=False, contiguous=False, chunksizes=None, endian='native', least_significant_digit=None, fill_value=None)
Creates a new variable with the given varname, datatype, and dimensions.
 
delncattr(self, name)
delete a netCDF dataset or group attribute.
 
filepath(self)
Get the file system path (or the opendap URL) which was used to open/create the Dataset.
 
getncattr(self, name)
retrieve a netCDF dataset or group attribute.
 
ncattrs(self)
return netCDF global attribute names for this Dataset or Group in a list.
 
renameAttribute(self, oldname, newname)
rename a Dataset or Group attribute named oldname to newname.
 
renameDimension(self, oldname, newname)
rename a Dimension named oldname to newname.
 
renameGroup(self, oldname, newname)
rename a Group named oldname to newname (requires netcdf >= 4.3.1).
 
renameVariable(self, oldname, newname)
rename a Variable named oldname to newname.
 
set_fill_off(self)
Sets the fill mode for a Dataset open for writing to off.
 
set_fill_on(self)
Sets the fill mode for a Dataset open for writing to on.
 
setncattr(self, name, value)
set a netCDF dataset or group attribute using name,value pair.
 
setncatts(self, attdict)
set a bunch of netCDF dataset or group attributes at once using a python dictionary.
 
sync(self)
Writes all buffered data in the Dataset to the disk file.

Inherited from object: __format__, __hash__, __reduce__, __reduce_ex__, __repr__, __sizeof__, __subclasshook__

Instance Variables
  cmptypes
The cmptypes dictionary maps the names of compound types defined for the Group or Dataset to instances of the CompoundType class.
  data_model
The data_model attribute describes the netCDF data model version, one of NETCDF3_CLASSIC, NETCDF4, NETCDF4_CLASSIC or NETCDF3_64BIT.
  dimensions
The dimensions dictionary maps the names of dimensions defined for the Group or Dataset to instances of the Dimension class.
  disk_format
The disk_format attribute describes the underlying file format, one of NETCDF3, HDF5, HDF4, PNETCDF, DAP2, DAP4 or UNDEFINED.
  groups
The groups dictionary maps the names of groups created for this Dataset or Group to instances of the Group class (the Dataset class is simply a special case of the Group class which describes the root group in the netCDF file).
  path
The path attribute shows the location of the Group in the Dataset in a unix directory format (the names of groups in the hierarchy separated by forward slashes).
  variables
The variables dictionary maps the names of variables defined for this Dataset or Group to instances of the Variable class.
Properties
  file_format
  maskanscale
  parent
  vltypes

Inherited from object: __class__

Method Details

__delattr__(...)

 

x.__delattr__('name') <==> del x.name

Overrides: object.__delattr__

__getattribute__(...)

 

x.__getattribute__('name') <==> x.name

Overrides: object.__getattribute__

__init__(self, filename, mode="r", clobber=True, diskless=False, persist=False, format='NETCDF4')
(Constructor)

 

x.__init__(...) initializes x; see help(type(x)) for signature

Overrides: object.__init__

__new__(T, S, ...)

 
Returns: a new object with type S, a subtype of T
Overrides: object.__new__

__setattr__(...)

 

x.__setattr__('name', value) <==> x.name = value

Overrides: object.__setattr__

__str__(x)
(Informal representation operator)

 

str(x)

Overrides: object.__str__

createCompoundType(self, datatype, datatype_name)

 

Creates a new compound data type named datatype_name from the numpy dtype object datatype.

Attention: If the new compound data type contains other compound data types (i.e. it is a 'nested' compound type, where not all of the elements are homogeneous numeric data types), then the 'inner' compound types must be created first.

The return value is the CompoundType class instance describing the new datatype.

createDimension(self, dimname, size=None)

 

Creates a new dimension with the given dimname and size.

size must be a positive integer or None, which stands for "unlimited" (default is None). Specifying a size of 0 also results in an unlimited dimension. The return value is the Dimension class instance describing the new dimension. To determine the current maximum size of the dimension, use the len function on the Dimension instance. To determine if a dimension is 'unlimited', use the isunlimited() method of the Dimension instance.

createGroup(self, groupname)

 

Creates a new Group with the given groupname.

The return value is a Group class instance describing the new group.

createVLType(self, datatype, datatype_name)

 

Creates a new VLEN data type named datatype_name from a numpy dtype object datatype.

The return value is the VLType class instance describing the new datatype.

createVariable(self, varname, datatype, dimensions=(), zlib=False, complevel=4, shuffle=True, fletcher32=False, contiguous=False, chunksizes=None, endian='native', least_significant_digit=None, fill_value=None)

 

Creates a new variable with the given varname, datatype, and dimensions. If dimensions are not given, the variable is assumed to be a scalar.

The datatype can be a numpy datatype object, or a string that describes a numpy dtype object (like the dtype.str attribute of a numpy array). Supported specifiers include: 'S1' or 'c' (NC_CHAR), 'i1' or 'b' or 'B' (NC_BYTE), 'u1' (NC_UBYTE), 'i2' or 'h' or 's' (NC_SHORT), 'u2' (NC_USHORT), 'i4' or 'i' or 'l' (NC_INT), 'u4' (NC_UINT), 'i8' (NC_INT64), 'u8' (NC_UINT64), 'f4' or 'f' (NC_FLOAT), 'f8' or 'd' (NC_DOUBLE). datatype can also be a CompoundType instance (for a structured, or compound array), a VLType instance (for a variable-length array), or the python str builtin (for a variable-length string array).

Data from netCDF variables is presented to python as numpy arrays with the corresponding data type.

dimensions must be a tuple containing dimension names (strings) that have been defined previously using createDimension. The default value is an empty tuple, which means the variable is a scalar.

If the optional keyword zlib is True, the data will be compressed in the netCDF file using gzip compression (default False).

The optional keyword complevel is an integer between 1 and 9 describing the level of compression desired (default 4). Ignored if zlib=False.

If the optional keyword shuffle is True, the HDF5 shuffle filter will be applied before compressing the data (default True). This significantly improves compression. Ignored if zlib=False.

If the optional keyword fletcher32 is True, the Fletcher32 HDF5 checksum algorithm is activated to detect errors. Default False.

If the optional keyword contiguous is True, the variable data is stored contiguously on disk. Default False. Setting to True for a variable with an unlimited dimension will trigger an error.

The optional keyword chunksizes can be used to manually specify the HDF5 chunksizes for each dimension of the variable. A detailed discussion of HDF chunking and I/O performance is available here. Basically, you want the chunk size for each dimension to match as closely as possible the size of the data block that users will read from the file. chunksizes cannot be set if contiguous=True.

The optional keyword endian can be used to control whether the data is stored in little or big endian format on disk. Possible values are little, big or native (default). The library will automatically handle endian conversions when the data is read, but if the data is always going to be read on a computer with the opposite byte order from the one used to create the file, there may be some performance advantage to be gained by setting the endian-ness.

The zlib, complevel, shuffle, fletcher32, contiguous, chunksizes and endian keywords are silently ignored for netCDF 3 files that do not use HDF5.

The optional keyword fill_value can be used to override the default netCDF _FillValue (the value that the variable gets filled with before any data is written to it, defaults given in netCDF4.default_fillvals). If fill_value is set to False, then the variable is not pre-filled.

If the optional keyword parameter least_significant_digit is specified, variable data will be truncated (quantized). In conjunction with zlib=True this produces 'lossy', but significantly more efficient compression. For example, if least_significant_digit=1, data will be quantized using numpy.around(scale*data)/scale, where scale = 2**bits, and bits is determined so that a precision of 0.1 is retained (in this case bits=4). From http://www.cdc.noaa.gov/cdc/conventions/cdc_netcdf_standard.shtml: "least_significant_digit -- power of ten of the smallest decimal place in unpacked data that is a reliable value." Default is None, or no quantization, or 'lossless' compression.
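The quantization formula above can be sketched in plain Python. This is an illustration only, not the library's implementation: the rule used to pick bits (the smallest integer with 2**-bits <= 10**-least_significant_digit) is an assumption that reproduces bits=4 for least_significant_digit=1, the built-in round stands in for numpy.around, and the helper names are hypothetical.

```python
import math


def quantization_bits(least_significant_digit):
    """Assumed rule: smallest bits with 2**-bits <= 10**-least_significant_digit."""
    return math.ceil(math.log2(10.0 ** least_significant_digit))


def quantize(value, least_significant_digit):
    """Apply round(scale * value) / scale with scale = 2**bits."""
    scale = 2.0 ** quantization_bits(least_significant_digit)
    return round(scale * value) / scale


bits = quantization_bits(1)         # precision of 0.1 requested
quantized = quantize(3.14159, 1)    # value snapped to a multiple of 1/16
error = abs(quantized - 3.14159)    # stays below the requested precision
```

Because the quantized values are multiples of a power of two, their trailing mantissa bits are zero, which is what lets zlib compress them more effectively.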

When creating variables in a NETCDF4 or NETCDF4_CLASSIC formatted file, HDF5 creates something called a 'chunk cache' for each variable. The default size of the chunk cache may be large enough to completely fill available memory when creating thousands of variables. The optional keyword chunk_cache allows you to reduce (or increase) the size of the default chunk cache when creating a variable. The setting only persists as long as the Dataset is open - you can use the set_var_chunk_cache method to change it the next time the Dataset is opened. Warning - messing with this parameter can seriously degrade performance.

The return value is the Variable class instance describing the new variable.

A list of names corresponding to netCDF variable attributes can be obtained with the Variable method ncattrs(). A dictionary containing all the netCDF attribute name/value pairs is provided by the __dict__ attribute of a Variable instance.

Variable instances behave much like array objects. Data can be assigned to or retrieved from a variable with indexing and slicing operations on the Variable instance. A Variable instance has five standard attributes: dimensions, dtype, shape, ndim and least_significant_digit. Application programs should never modify these attributes. The dimensions attribute is a tuple containing the names of the dimensions associated with this variable. The dtype attribute is a string describing the variable's data type (i4, f8, S1, etc). The shape attribute is a tuple describing the current sizes of all the variable's dimensions. The least_significant_digit attribute describes the power of ten of the smallest decimal place in the data that contains a reliable value. If None, the data is not truncated. The ndim attribute is the number of variable dimensions.

delncattr(self, name)

 

delete a netCDF dataset or group attribute. Only use if you need to delete a netCDF attribute with the same name as one of the reserved python attributes.

filepath(self)

 

Get the file system path (or the opendap URL) which was used to open/create the Dataset. Requires netcdf >= 4.1.2.

getncattr(self, name)

 

retrieve a netCDF dataset or group attribute. Only use if you need to get a netCDF attribute with the same name as one of the reserved python attributes.

set_fill_off(self)

 

Sets the fill mode for a Dataset open for writing to off.

This will prevent the data from being pre-filled with fill values, which may result in some performance improvements. However, you must then make sure the data is actually written before being read.

set_fill_on(self)

 

Sets the fill mode for a Dataset open for writing to on.

This causes data to be pre-filled with fill values. The fill values can be controlled by the variable's _FillValue attribute, but it is usually sufficient to use the netCDF default _FillValue (defined separately for each variable type). The default behavior of the netCDF library corresponds to set_fill_on. Data which are equal to the _FillValue indicate that the variable was created, but never written to.

setncattr(self, name, value)

 

set a netCDF dataset or group attribute using name,value pair. Only use if you need to set a netCDF attribute with the same name as one of the reserved python attributes.

setncatts(self, attdict)

 

set several netCDF dataset or group attributes at once using a python dictionary. This may be faster when setting a lot of attributes for a NETCDF3 formatted file, since nc_redef/nc_enddef is not called in between setting each attribute.


Instance Variable Details

disk_format

The disk_format attribute describes the underlying file format, one of NETCDF3, HDF5, HDF4, PNETCDF, DAP2, DAP4 or UNDEFINED. Only available if using netcdf C library version >= 4.3.1, otherwise will always return UNDEFINED.

path

The path attribute shows the location of the Group in the Dataset in a unix directory format (the names of groups in the hierarchy separated by forward slashes). A Dataset instance is the root group, so the path is simply '/'.