如何在 DynamoDB 中添加列

有没有一种方法可以在 Amazon 的 AWS 中的 DynamoDB 中向现有表添加一个新列?

谷歌帮不上忙,

http://docs.aws.amazon.com/cli/latest/reference/dynamodb/update-table.html?highlight=update%20table中的 UpdateTable 查询没有任何与添加新列相关的信息。

110946 次浏览

DynamoDB does not require schema definition, and so there is no such thing as a "column". You can just add a new item with a new attribute.

A way to add a new column to existing table in DynamoDB in Amazon's AWS:

We can store the values in DynamoDb in 2 ways, (i) In an RDBMS Type of Structure for the DynamoDB, we can add a new Coulmn by executing the same command keeping the "new Column" entry within which the Records in the Existing Table has been created. we can use DynamoDb with the Records/ Rows having Values for certain Columns while other columns does not have Values.

(ii) In a NoSQL kind of Structure; where we store a Json String within a Column to keep all the attributes as per the Requirement. Here we are generating a json string and we have to add the new Attribute into the json String which can then be inserted into the same Column but with the new Attribute.

Well, let's not get dragged away in the semantic discussion about the difference between "fields" and "columns". The word "column" does remind us of relational databases, which dynamodb is not. In essence that means that dynamodb does not have foreign keys.

Dynamodb does have "primary partition keys" and "index partition keys" though, just as with relational databases. (Although there are other strategies as well)

You do need to respect those keys when you add data. But aside from those requirements, you don't have to predefine your fields (except for those partition keys mentioned earlier).

Assuming that you are new to this, some additional good practices:

  • Add a numeric field to each record, to store the time of creation in seconds. Dynamodb has optional cleaning features, which require this type of field in your data.
  • You cannot use dates in dynamodb, so you have to store those as numeric fields or as strings. Given the previously mentioned remark, you may prefer a numeric type for them.
  • Don't store big documents in it, because there is a maximum fetch size of 16MB, and a maximum record size of 400KB. Fortunately, AWS has S3 storage and other kind of databases (e.g. DocumentDB).

There are many strategies for table keys:

  • If you only declare the partition-key, then it acts like a primary key (e.g. partition-key=invoiceId). That's fine.
  • If your object has a parent reference. (e.g. invoices have a customer), then you probably want to add a sort-key. (e.g. partition-key=customerId;sort-key=invoiceId) Together they behave like a composed key. The advantage is that you can do a lookup using both keys, or just using the partition-key. (e.g. request a specific invoice for a specific customer, or all invoices for that customer)

I installed NoSQL Workbench then connected to existing DynamoDB Table and tried to update existing Item by adding a new attribute. I figured out that we can only add a new attribute with one of these types - "SS", "NS", "BS" (String Set, Number Set, Binary Set"). In Workbench, we can generate code for the chosen operation.

enter image description here

I scanned my dynamodb Table and for each item added new attribute with type "SS" then I scanned again and updated recently added new attribute to type - "S" in order create a global secondary index (GSI) with a primary key - "pk2NewAttr".

NoSQL Workbench related video - https://www.youtube.com/watch?v=Xn12QSNa4RE&feature=youtu.be&t=3666

Example in Python "how to scan all dynamodb Table" - https://gist.github.com/pgolding

This script will either delete a record, or add a ttl field. You might want to tailor it to your column name and remove the delete stuff.

Usage:

usage: add_ttl.py [-h] --profile PROFILE [-d] [--force_delete_all] [-v] [-q]


Procedurally modify DynamoDB


optional arguments:
-h, --help          show this help message and exit
--profile PROFILE   AWS profile name
-d, --dryrun        Dry run, take no action
--force_delete_all  Delete all records, including valid, unexpired
-v, --verbose       set loglevel to DEBUG
-q, --quiet         set loglevel to ERROR

Script:

#!/usr/bin/env python3
# pylint:disable=duplicate-code


import argparse
import logging
import sys
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from functools import cached_property
from typing import Dict, Optional


import boto3
from dateutil.parser import isoparse
from tqdm import tqdm


LOGGER = logging.getLogger(__name__)


DATE_FORMAT = "%Y-%m-%d %H:%M:%S"
LOG_FORMAT = "[%(asctime)s] %(levelname)s:%(name)s:%(message)s"




def setup_logging(loglevel=None, date_format=None, log_format=None):
"""Setup basic logging.


Args:
loglevel (int): minimum loglevel for emitting messages
"""


logging.basicConfig(
level=loglevel or logging.INFO,
stream=sys.stdout,
format=log_format or LOG_FORMAT,
datefmt=date_format or DATE_FORMAT,
)




def parse_args():
"""
Extract the CLI arguments from argparse
"""
parser = argparse.ArgumentParser(description="Procedurally modify DynamoDB")


parser.add_argument(
"--profile",
help="AWS profile name",
required=True,
)


parser.add_argument(
"-d",
"--dryrun",
action="store_true",
default=False,
help="Dry run, take no action",
)


parser.add_argument(
"--force_delete_all",
action="store_true",
default=False,
help="Delete all records, including valid, unexpired",
)


parser.add_argument(
"-v",
"--verbose",
dest="loglevel",
help="set loglevel to DEBUG",
action="store_const",
const=logging.DEBUG,
)


parser.add_argument(
"-q",
"--quiet",
dest="loglevel",
help="set loglevel to ERROR",
action="store_const",
const=logging.ERROR,
)


return parser.parse_args()




def query_yes_no(question, default="yes"):
"""Ask a yes/no question via input() and return their answer.


"question" is a string that is presented to the user.
"default" is the presumed answer if the user just hits <Enter>.
It must be "yes" (the default), "no" or None (meaning
an answer is required of the user).


The "answer" return value is True for "yes" or False for "no".
"""
valid = {"yes": True, "y": True, "ye": True, "no": False, "n": False}
if default is None:
prompt = " [y/n] "
elif default == "yes":
prompt = " [Y/n] "
elif default == "no":
prompt = " [y/N] "
else:
raise ValueError("invalid default answer: '%s'" % default)


while True:
sys.stdout.write(question + prompt)
choice = input().lower()
if default is not None and choice == "":
return valid[default]


if choice in valid:
return valid[choice]


sys.stdout.write("Please respond with 'yes' or 'no' " "(or 'y' or 'n').\n")




@dataclass
class Table:
"""Class that wraps dynamodb and simplifies pagination as well as counting."""


region_name: str
table_name: str
_counter: Optional[Counter] = None


def __str__(self):
out = "\n" + ("=" * 80) + "\n"
for key, value in self.counter.items():
out += "{:<20} {:<2}\n".format(key, value)
return out


def str_table(self):
keys = list(self.counter.keys())
# Set the names of the columns.
fmt = "{:<20} " * len(keys)


return f"\n\n{fmt}\n".format(*keys) + f"{fmt}\n".format(
*list(self.counter.values())
)


@cached_property
def counter(self):
if not self._counter:
self._counter = Counter()
return self._counter


@cached_property
def client(self):
return boto3.client("dynamodb", region_name=self.region_name)


@cached_property
def table(self):
dynamodb = boto3.resource("dynamodb", region_name=self.region_name)
return dynamodb.Table(self.table_name)


@property
def items(self):
response = self.table.scan()
self.counter["Fetched Pages"] += 1
data = response["Items"]


with tqdm(desc="Fetching pages") as pbar:
while "LastEvaluatedKey" in response:
response = self.table.scan(
ExclusiveStartKey=response["LastEvaluatedKey"]
)
self.counter["Fetched Pages"] += 1
data.extend(response["Items"])
pbar.update(500)


self.counter["Fetched Items"] = len(data)
return data


@cached_property
def item_count(self):
response = self.client.describe_table(TableName=self.table_name)
breakpoint()
count = int(response["Table"]["ItemCount"])
self.counter["Total Rows"] = count
return count




def delete_item(table, item):
return table.table.delete_item(
Key={
"tim_id": item["tim_id"],
}
)




def update_item(table: Table, item: Dict, ttl: int):
return table.table.update_item(
Key={"tim_id": item["tim_id"]},
UpdateExpression="set #t=:t",
ExpressionAttributeNames={
"#t": "ttl",
},
ExpressionAttributeValues={
":t": ttl,
},
ReturnValues="UPDATED_NEW",
)




def main():
setup_logging()
args = parse_args()


if not query_yes_no(
f"Performing batch operations with {args.profile}; is this correct?"
):
sys.exit(1)


sys.stdout.write(f"Setting up connection with {args.profile}\n")


boto3.setup_default_session(profile_name=args.profile)


table = Table(region_name="us-west-2", table_name="TimManager")


now = datetime.utcnow().replace(microsecond=0).astimezone(timezone.utc)
buffer = timedelta(days=7)


# @TODO list comprehension
to_update = []
to_delete = []


for item in tqdm(table.items, desc="Inspecting items"):
ttl_dt = isoparse(item["delivery_stop_time"])
if ttl_dt > now - buffer and not args.force_delete_all:
to_update.append(item)
else:
to_delete.append(item)


table.counter["Identified for update"] = len(to_update)
table.counter["Identified for delete"] = len(to_delete)
table.counter["Performed Update"] = 0
table.counter["Performed Delete"] = 0


if to_update and query_yes_no(
f"Located {len(to_update)} records to update with {args.profile}"
):
for item in tqdm(to_update, desc="Updating items"):
if not args.dryrun:
ttl_dt = isoparse(item["delivery_stop_time"])
response = update_item(table, item, int((ttl_dt + buffer).timestamp()))
if response.get("ResponseMetadata", {}).get("HTTPStatusCode") == 200:
table.counter["Updated"] += 1


if to_delete and query_yes_no(
f"Located {len(to_delete)} records to delete with {args.profile}"
):
for item in tqdm(to_delete, desc="Deleting items"):
if not args.dryrun:
table.counter["Deleted"] += 1
response = delete_item(table, item)
if response.get("ResponseMetadata", {}).get("HTTPStatusCode") == 200:
table.counter["Deleted"] += 1


sys.stdout.write(str(table))




if __name__ == "__main__":
main()

You can achieve the same by doing the following,

  1. Open Dynamodb and click on the tables option in the left sidebar menu.
  2. Search your table by name and click on your table
  3. Now select the orange button named Explore table items
  4. Scroll down and Click on Create item
  5. Now you will see an editor with JSON Value, Click on Form button on the right side to add a new column and its type.

Note: this will insert 1 new record and you can see now the new column as well.