# Python Notes

## Environment Issues

#### [VS Code Python extension pitfall: pressing F5 to debug Python 3.6 flashes and exits with no response (solved)](https://blog.csdn.net/weixin_39916966/article/details/125737069)

> Also turn off automatic updates for the extension.

Configuring a miniconda Python environment inside Docker

![](/files/Z98R83a1Vd5OVE0S333B)

Fix: Python simply had to be added to `PATH`.

1. [launch.json: debugging a program with arguments in VS Code, choosing the GPU and the Python interpreter](https://blog.csdn.net/Answer3664/article/details/111992151)
2. [Selecting the Python interpreter in VS Code](https://blog.csdn.net/chaipp0607/article/details/119000497)

```
pip3 install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
```

## [General tutorial](https://github.com/jackfrued/Python-100-Days): [**Advanced object-oriented programming**](https://github.com/jackfrued/Python-100-Days/blob/master/Day01-15/09.%E9%9D%A2%E5%90%91%E5%AF%B9%E8%B1%A1%E8%BF%9B%E9%98%B6.md)

Learn X in Y minutes, where X=Python: <https://learnxinyminutes.com/docs/zh-cn/python-cn/>

## Code demos

### Basic data structures

Tuples

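string, list, and tuple are all sequence types.

**Notes:**

* 1. Like strings, the elements of a tuple **cannot be modified**.
* 2. Tuples can be indexed and sliced in the same way.
* 3. Mind the special syntax for building tuples with 0 or 1 elements.
* 4. Tuples can be concatenated with the `+` operator.
* Although a tuple's elements cannot be reassigned, a tuple [can contain mutable objects](https://blog.csdn.net/qq_29168809/article/details/103384840) such as a list.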
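
The tuple notes above can be checked with a short sketch. The variable names are illustrative only; everything used is built-in Python behaviour.

```python
# Special syntax for 0- and 1-element tuples.
empty = ()
single = (42,)        # the trailing comma makes this a tuple
not_a_tuple = (42)    # without the comma this is just the int 42

# Tuples concatenate with +, producing a new tuple.
joined = (1, 2) + (3,)

# A tuple's slots cannot be reassigned ...
nested = (1, [2, 3])
try:
    nested[0] = 99
    reassign_failed = False
except TypeError:
    reassign_failed = True

# ... but a mutable object stored inside the tuple can still change.
nested[1].append(4)

print(type(single), type(not_a_tuple), joined, nested, reassign_failed)
```

The last two steps show the distinction made in the final bullet: the tuple itself is fixed, while the list it references is not.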

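### [list](https://www.runoob.com/python/python-lists.html)

[The list.extend(seq) method](https://www.runoob.com/python/att-list-extend.html): appends all values from another sequence to the end of the list in one call (extending the original list with the new one).

### Batching in Python

```python
for i in range(0, len(total_infer_data), BATCH_SIZE):
    batch_infer_data = total_infer_data[i:i + BATCH_SIZE]
```

### [Iteration](https://www.liaoxuefeng.com/wiki/1016959663602400/1017316949097888)

> In Python, iteration is done with `for ... in`, whereas in many languages such as C, iterating over a `list` is done by index.
>
> Python's `for` loop is therefore more abstract than C's `for` loop: it works not only on a `list` or `tuple` but on any other **iterable object**.
>
> A `list` has indexes, but many other data types do not. Even so, **as long as an object is iterable**, it can be iterated whether or not it has indexes.

### [**List comprehensions**](https://www.liaoxuefeng.com/wiki/1016959663602400/1017317609699776)

**One-liners**

> `[x * x for x in range(1, 11)]`
>
> output: `[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]`
>
> When writing a list comprehension, put the element to generate, `x * x`, first and follow it with the `for` loop; that is enough to build the list.
>
> **With if**
>
> `[x for x in range(1, 11) if x % 2 == 0]`
>
> However, the trailing `if` cannot take an `else`, because an `if` after the `for` is a **filter condition**; with an `else` there would be nothing left to filter.
>
> **An `if` written before the `for` must have an `else`**
>
> The part before the `for` is an expression, and it must produce a result from `x`.
>
> **Summary**
>
> In a list comprehension, `if ... else` before the `for` is an expression, while the `if` after the `for` is a filter condition and cannot take an `else`.

### [**Generators**](https://www.liaoxuefeng.com/wiki/1016959663602400/1017318207388128)

If the **elements** of a list can be derived by some rule, can we **keep computing** later elements while looping? Then there is no need to build the complete list, which saves a lot of space. In Python, this **compute-while-looping mechanism** is called a generator. A **generator stores the algorithm**: each call to `next(g)` computes the next element of `g`, and once the last element has been produced and no more remain, it raises `StopIteration`.

> Generators are very powerful. When the **derivation is too complex** for a comprehension-style `for` loop, it can be **implemented as a function** instead.
>
> The `fib` function defines the rule for deriving the Fibonacci sequence: starting from the first element, any later element can be derived. This logic is very close to a generator.
>
> To turn the `fib` function into a generator function, **all that is needed** is to change `print(b)` to `yield b`.
>
> If a function definition contains the `yield` keyword, it is no longer an ordinary function but a **generator function**; calling a generator function returns a generator.
>
> A generator function does not execute like an ordinary function. An ordinary function runs sequentially and returns at a `return` statement or after its last line. A generator function runs each time `next()` is called, **returns at a `yield`**, and on the next call resumes from the `yield` where it last returned.
>
> When a generator is consumed with a `for` loop, the value of its `return` statement is not available. To get it, catch `StopIteration`; the **return value is carried** in the `value` attribute of `StopIteration`.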
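
A minimal sketch of that last point follows. The `fib` function is an illustrative reconstruction (the original listing is not reproduced in these notes), and the `'done'` return value is an assumption chosen for the demo.

```python
def fib(max_count):
    """Yield the first max_count Fibonacci numbers, then return 'done'."""
    n, a, b = 0, 0, 1
    while n < max_count:
        yield b
        a, b = b, a + b
        n += 1
    return 'done'

# A for loop (or list()) consumes the yielded values but drops the return value.
values = list(fib(6))

# Driving the generator with next() exposes the return value
# through the value attribute of StopIteration.
g = fib(3)
collected = []
while True:
    try:
        collected.append(next(g))
    except StopIteration as stop:
        return_value = stop.value
        break

print(values, collected, return_value)
```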

yield

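#### **We can draw the following conclusions:**

A function containing yield is a generator. Unlike an ordinary function, creating a generator looks like a **function call** but **executes none of the function's code** until **next() is called on it** (a for loop calls next() automatically). Execution still follows the function's normal flow, but at each yield statement it is **suspended** and **returns** an iteration value; the next call resumes from the statement after that yield. It looks as if a normally executing function were interrupted by yield several times, each interruption handing back the current iteration value.

The benefit of yield is clear: rewriting a **function** as a generator gives it the **ability to iterate**. Compared with **storing state in a class instance** to compute the value for the next next() call, the **code is more concise** and the execution **flow is much clearer**.

#### What return does

In a generator function, if there is no return, execution runs to the end of the function; a **return during execution raises StopIteration immediately and ends the iteration.**

#### Another example

Another yield example comes from reading files. Calling read() directly on a file object can **cause unpredictable memory usage**. A better approach is to read the file repeatedly through a **fixed-length buffer**. With yield, there is no need to write an iterator class for reading files, and file reading becomes easy.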
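
The file-reading idea can be sketched as below. The function name `read_in_chunks`, the 1024-byte block size, and the temporary demo file are all assumptions made for illustration, not code from the original article.

```python
import os
import tempfile

def read_in_chunks(path, block_size=1024):
    """Yield successive blocks of at most block_size bytes from a file."""
    with open(path, 'rb') as f:
        while True:
            block = f.read(block_size)   # fixed-length buffer
            if not block:
                return                   # end of file: the iteration stops
            yield block

# Demo on a throwaway file of 2500 bytes.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b'x' * 2500)
    demo_path = tmp.name

chunk_sizes = [len(block) for block in read_in_chunks(demo_path)]
os.remove(demo_path)
print(chunk_sizes)
```

Only one block is held in memory at a time, so memory use stays bounded by `block_size` regardless of the file size.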

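> #### [The send() method](https://blog.csdn.net/qq_28915777/article/details/108186963)
>
> Summary
>
> * **For a statement such as `y = yield x`**, the right-hand side runs first and its result is then assigned to y. The generator is frozen as soon as the yield executes, so the assignment only happens when the generator is next resumed, and a plain `next()` resumes it with the value None.
> * The send method **passes a value into the generator**; `send(None)` is equivalent to `next()`.
> * If send is used the first time the generator is run, its argument must be None.

### [**Iterators**](https://www.liaoxuefeng.com/wiki/1016959663602400/1017323698112640)

> Objects that can be used directly in a `for` loop are collectively called iterables: `Iterable`.
>
> An object that can be passed to `next()` and keeps returning the next value is called an iterator: `Iterator`.
>
> All generators are `Iterator` objects, but `list`, `dict`, and `str`, although `Iterable`, are not `Iterator`.
>
> Use the `iter()` function to **turn** an `Iterable` such as `list`, `dict`, or `str` into an `Iterator`.
>
> You may ask why `list`, `dict`, `str`, and similar types are not `Iterator`.
>
> The reason is that a Python `Iterator` represents a data stream: it can be passed to `next()` and keeps returning the next item until no data remains, at which point it raises `StopIteration`. The stream can be viewed as an ordered sequence, but **its length cannot be known in advance**; the next item can only be computed on demand through `next()`. An `Iterator` is therefore **lazy**: it computes only when the next item has to be returned.
>
> An `Iterator` can even represent an infinite data stream, such as all natural numbers, which a list could never store (memory is finite).
>
> In Python, iterators (including generators) are a common and convenient data structure. Compared with a list, their biggest advantage is lazy, on-demand evaluation, which improves both the development experience and runtime efficiency. That is why in Python 3 operations such as **map** and **filter** return iterators rather than lists. For **reading large files or infinite collections**, an iterator is the better choice.

### The \* in Python

Python [starred expressions](https://blog.csdn.net/DawnRanger/article/details/78028171)

> A starred expression `*args` in a function's parameter list collects the positional arguments passed in and stores them in args.
>
> def fun1(\*args, \*\*kwargs):
>
> print(args, kwargs)

Python [a comprehensive summary of ways to pass function arguments](https://zhuanlan.zhihu.com/p/132693168)

> Keyword-only parameters
>
> Unlike ordinary keyword arguments, keyword-only parameters are marked off with \*: parameters after the \* must be passed by keyword.
>
> ```
> # Example 6
> def func(a, b, *, c):
>     print("args:", a, b, c)
>
> if __name__ == "__main__":
>     func(2, 3, c=4)  # output: args: 2 3 4
> ```

#### Variable arguments

A function defined with \*args or \*\*kwargs accepts a variable number of arguments.

> \*args collects unmatched positional arguments as a tuple.
>
> \*\*kwargs collects unmatched keyword arguments as a dict; it only applies to arguments passed by keyword.
>
> Like \*args, \*\*kwargs must come after the positional parameters in the function definition.

### Regular expressions

In the AI era: [tutorial](https://github.com/cdoco/learn-regex-zh) [visualizer](https://regexr.com/) [online tester](https://c.runoob.com/front-end/854?optionGlobl=global) [code examples](https://blog.csdn.net/weixin_42793426/article/details/88545939)

FlashText: [比正则快 M 倍以上！Python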
替换字符串的新姿势](https://mp.weixin.qq.com/s/am8Tat3Z3OnkQ4eN5r9Ztg) [知乎汇总](https://www.zhihu.com/question/48219401/answer/2266599494) ```python import os import re from pathlib import Path f_list = os.listdir(".") s_path = Path(__file__) f_list.remove(s_path.name) # print(f_list) for i in f_list: regex = re.compile('\d+') x = regex.findall(i) if len(x) != 0: num = x[0] dst = f"放傲骨贤妻第一季_{num}.ts" print(f"{i}, {dst}") os.rename(i, dst) ``` #### python获取ffmpeg输出的视频时间 ```python class FFmpegGetVideoTimesHandler(BaseRequestHandler): def post(self): ffmpeg_select = self.get_argument('ffmpeg_select') result = subprocess.Popen("ffmpeg -i "+ ffmpeg_select, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, shell=True) dictionary = {} for x in result.stdout.readlines(): if b"Duration" in x: x = re.search(rb"Duration.+?(\d{2}):(\d{2}):(\d{2})", x) # print(x.group(1),x.group(2),x.group(3),) dictionary['times'] = [str(x.group(1))[2:-1],str(x.group(2))[2:-1],str(x.group(3))[2:-1]] dictionary_json = json.dumps(dictionary) self.finish(dictionary_json) ``` ### 文件格式化 6 位 000001 不再采用自写函数封装。 ```python a = 1 print( " %08d " % int(a)) # 0 代表不足的以0补齐 ``` ### **struct** > 准确地讲，[Python没有专门处理字节的数据类型](https://www.liaoxuefeng.com/wiki/1016959663602400/1017685387246080) > > 好在Python提供了一个`struct`模块来解决`bytes`和其他二进制数据类型的转换。 ```python import struct src = "123.bmp" with open(src , "rb") as f: sss = f.read(30) aaa = struct.unpack(' byte就是字节 1byte=8bit ， python 的 ”rb“ 就是读字节，8bit。 > > 再依据对应文件的结构顺序解析，如 BMP格式采用小端方式存储数据，文件头的结构按顺序。正常而言，107406 的 4字节整数 unsigned int， ```python import struct aaa = struct.pack('>I', 107406) print(aaa) # output b'\x00\x01\xa3\x8e' ``` > 二进制为： 00000000 00000001 10100011 10001110

> Normal (big-endian) storage: \x00 \x01 \xa3 \x8e
>
> In hexadecimal: 0x0001A38E

> > 小端存储为 \x8e\xa3\x01\x00 （理解为原本正常字节颠倒顺序。）理解大端小端： [字节序探析：大端与小端的比较](https://www.ruanyifeng.com/blog/2022/06/endianness-analysis.html) [struct.unpack的用法](https://blog.csdn.net/gracioushe/article/details/5915900) ， [格式化数据类型字符参考](https://docs.python.org/zh-cn/3.9/library/struct.html#format-characters) ，输出结果使用参考：[bmpinfo.py](https://blog.csdn.net/qq_41800366/article/details/85801009) ### json [json.dump()与json\_dumps()区别](https://blog.csdn.net/lizhixin705/article/details/82344209) ### 相对位置解释 ```python images = os.listdir(r'./') ``` 为命令行所在路径。同代码文件存放位置无关。 ### 原地游标刷新 [Python 实现秒表功能](https://www.runoob.com/python3/python-simplestopwatch.html) ### lambda 对列表(list)和字典(dict)排序 [链接1](https://www.cnblogs.com/shgq0811/p/11142855.html) [链接2](https://www.polarxiong.com/archives/Python-%E4%BD%BF%E7%94%A8lambda%E5%BA%94%E5%AF%B9%E5%90%84%E7%A7%8D%E5%A4%8D%E6%9D%82%E6%83%85%E5%86%B5%E7%9A%84%E6%8E%92%E5%BA%8F-%E5%8C%85%E6%8B%AClist%E5%B5%8C%E5%A5%97dict.html) ### python 利用字典去重和计数 ```python import time import json from collections import defaultdict # 过滤结果, 去重和计数 def filter_result(all_result_list): s_time = time.strftime('%Y%m%d',time.localtime(time.time())) with open(f"{s_time}.json", "w") as f: json.dump(all_result_list, f, indent=4) # with open(f"{s_time}.json", "r") as f: # all_result_list = json.load(f) # 数据根据data_id去重, 并记录出现次数 mind_dict = {} # 利用字典去重 mind_dict_num = defaultdict(int) # 计算出现次数 for per in all_result_list: key = per["data_id"] conf = per["conf"] mind_dict_num[key] += 1 try: a = mind_dict[key] mind_dict[key] = a if a > conf else conf # 相同tag,取最大分 except: mind_dict[key] = conf ``` ### 扩展数据结构 [namedtuple(具名元组)](https://www.runoob.com/note/25726) [双向队列 deque](https://blog.csdn.net/chl183/article/details/106958004) 是"double-end queue"的简称； ### [如何学Python？](https://www.kawabangga.com/how-to-learn-python) Python的面试题1: [语法点](https://github.com/taizilongxu/interview_python) Python的面试题2： [面试题](https://github.com/kenwoodjw/python_interview_question) 谈谈[Python 
for循环的作用域](https://www.kawabangga.com/posts/2632) Python [LEGB规则](https://www.jianshu.com/p/3b72ba5a209c) 理解[Python对象的属性和描述器](https://www.kawabangga.com/posts/2302) ### class

class

**self :** [**表示实例化类后的地址id**](https://www.runoob.com/python/python-func-classmethod.html) ```python class A(object): # 属性默认为类属性（可以给直接被类本身调用） num = "类属性" # 实例化方法（必须实例化类之后才能被调用） def func1(self): # self : 表示实例化类后的地址id print(self) # 类方法（不需要实例化类就可以被类本身调用） @classmethod def func2(cls): # cls : 表示没用被实例化的类本身 print("func2") print(cls) print(cls.num) cls().func1() # 不传递传递默认self参数的方法（该方法也是可以直接被类调用的，但是这样做不标准） def func3(): print("func3") print(A.num) # 属性是可以直接用类本身调用的 # A.func1() 这样调用是会报错：因为func1()调用时需要默认传递实例化类后的地址id参数，如果不实例化类是无法调用的 a = A() print(A) print(a) a.func1() print(a.func1) print(a.func2) # output <__main__.A object at 0x0000017C89EFFDC0> <__main__.A object at 0x0000017C89EFFDC0> > > ``` #### 注意类属性和init属性的区别 init 实例化才会有的变量 ``` class A(object): # 属性默认为类属性（可以给直接被类本身调用） num = "类属性" def __init__(self) -> None: self.name = "Jerry" self.age = 18 # 不传递传递默认self参数的方法（该方法也是可以直接被类调用的，但是这样做不标准） def func3(): print("func3") print(A.num) # 属性是可以直接用类本身调用的 print(A.num) print(A.name) print(A.age) # output 类属性 Traceback (most recent call last): File "c:\Users\18618\Desktop\木链\temp\test3.py", line 15, in print(A.name) AttributeError: type object 'A' has no attribute 'name' ```
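
As a compact sketch of the two points above (what `self` and `cls` refer to, and class attributes versus attributes created in `__init__`), the class below mirrors the demo with illustrative method names `show` and `make` that are not in the original.

```python
class A:
    num = "class attribute"      # lives on the class itself

    def __init__(self):
        self.name = "Jerry"      # created per instance, only after instantiation

    def show(self):
        return self              # self is the instance the method was called on

    @classmethod
    def make(cls):
        return cls()             # cls is the class, so cls() builds an instance

a = A()
same_object = a.show() is a                  # self is exactly the instance
built = A.make()                             # no instance needed to call a classmethod
class_has_name = hasattr(A, "name")          # the class never ran __init__
instance_has_name = hasattr(a, "name")
print(same_object, type(built).__name__, class_has_name, instance_has_name)
```

`A.num` is reachable from both the class and its instances, while `A.name` raises `AttributeError`, matching the traceback shown above.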

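### Special class members (attributes and methods)

Controlling what print shows for a custom class: [the \_\_**repr\_\_**() method: displaying attributes](http://c.biancheng.net/view/2367.html)

\_\_new\_\_(): [detailed guide 1](https://blog.csdn.net/sj2050/article/details/81172022) [detailed guide 2](http://c.biancheng.net/view/5484.html)

### Decorators

### [classmethod](https://www.runoob.com/python/python-func-classmethod.html)

> A function decorated with **classmethod** does not require instantiation and takes no self parameter,
>
> but **its first parameter must be cls, representing the class itself**, which can be used to **access the class's own** attributes and methods, create instances, and so on.
>
> The decorated function needs no instance; my personal understanding is that it is closer to a C++ static member function than to an inline function.

### Handling GIFs in Python

```python
from PIL import Image
import os

"""
Split a GIF animation into frames saved in the given folder
src_path: path of the GIF to split
dest_path: folder where the frames are saved
"""
def gifSplit(src_path, dest_path, suffix="png"):
    img = Image.open(src_path)
    for i in range(img.n_frames):
        img.seek(i)
        new = Image.new("RGBA", img.size)
        new.paste(img)
        # Save the frame locally
        new.save(os.path.join(dest_path, "%d.%s" % (i, suffix)))
        # Or return the frame as an OpenCV image (new is RGBA, hence RGBA2BGR)
        # cv2_img = cv2.cvtColor(np.asanyarray(new), cv2.COLOR_RGBA2BGR)
        # return cv2_img

gifSplit('tiga.gif', r'./pics')

"""
seek: moves to the given frame in this sequence file. Seeking past the end
of the sequence raises an ``EOFError``. When a sequence file is opened,
the library automatically seeks to frame 0.
"""
```

```python
import cv2
import numpy as np
from PIL import Image

img_pil = Image.open("/path/to/gif").convert('RGB')
img_pil.seek(0)
img = cv2.cvtColor(np.asarray(img_pil), cv2.COLOR_RGB2BGR)
# These three lines are enough to turn a GIF into an OpenCV BGR image
```

### python base64

[Converting between OpenCV images and Base64](https://blog.csdn.net/weixin_41967600/article/details/119543964)

> Note the form: data:image/png;base64,iVBORw0\*\*\*\*\*
>
> Python's base64 decoder does not parse the image/png;base64, prefix, so take only the part after the comma.

```
import base64

# Drop the "data:image/png;base64," prefix when present.
if "," in base64_code:
    origStr = base64_code.split(",", 1)[1]
else:
    app_log.error(f"base64_code: {base64_code[:20]}")  # app_log is the project's logger
    origStr = base64_code
# Base64 text must have a length that is a multiple of 4; restore missing "=" padding.
origStr += "=" * (-len(origStr) % 4)
# base64 decode
img_data = base64.b64decode(origStr)
```

### Differences between PIL and OpenCV image handling

#### 1. Format difference

Images read with OpenCV are in BGR order; images read with PIL are in RGB order.

#### 2. Data access difference

PIL: note that `.size` only reports two dimensions, as (width, height). Image.open() only keeps the image in an opened **state**; the actual pixel data has not been read yet, so combine it with numpy.array().

```python
I = Image.open(img_path)  # PIL reads channels in RGB order; the array layout is HWC
print(np.array(I).shape)
```

OpenCV: get the image size with **ndarray.shape**, which is (height, width, channels). The loaded image can be manipulated pixel by pixel directly.

```python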
cv2_img = cv2.imread('./data/1.jpg') # numpy数组，元素类型是uinit8 0~255 读取通道的顺序是BGR 尺寸是HWC ``` #### 3、相互转换 ```python #将PIL类型转成numpy类型, numpy数组格式（H，W，C=3） np_img = np.asanyarray(PIL_img) # 将PIL类型转成numpy类型，数据类型是uint8, (H, W, C) # 其他转换方式 #PIL.Image转Opencv cv2_img = cv2.cvtColor(numpy.asarray(Img_img), cv2.COLOR_RGB2BGR) #Opencv转PIL.Image pil_img = Image.fromarray(cv2.cvtColor(cv_img,cv2.COLOR_BGR2RGB)) ``` ### cv2 resize demo > 图片缩放脚本，图片输入给SDK，防止图片过大，显存爆掉 > > 以后最好是python读图，包括GIF，再用从内存读的接口推理，这样灵活些, 还有过大图片的预处理 ```python import cv2 import os import tqdm root = "image3-rename" files = [] for fn in os.listdir(root): files.append(os.path.join(root, fn)) max_width_height = 1000 max_width_height = 1000 # 读图 removes = [] for pth in tqdm.tqdm(files): try: img1 = cv2.imread(pth) if img1 is None: removes.append(pth) continue # 保留长宽比缩放至最大输入大小 if img1.shape[0] > max_width_height or img1.shape[1] > max_width_height: # 按长边，缩放至max_width_height，为应对横竖版图片 ratio = min(img1.shape[0] / img1.shape[1], img1.shape[1] / img1.shape[0]) if img1.shape[0] >= img1.shape[1]: img1 = cv2.resize(img1, (int(ratio * max_width_height), max_width_height)) else: img1 = cv2.resize(img1, (max_width_height, int(ratio * max_width_height))) cv2.imwrite(pth, img1) except: removes.append(pth) continue for f in removes: os.remove(f) ``` ### opencv读图异常处理 ```python def load_image(img_path, max_width_height=1000): img = cv2.imread(img_path) # opencv读图异常处理 if img is None: try: img = Image.open(img_path) img.seek(0) new = Image.new("RGBA", img.size) new.paste(img) img = cv2.cvtColor(np.asanyarray(new), cv2.COLOR_RGB2BGR) except: # print("Fail {}".format(img_path)) img = np.zeros(shape=(8, 8, 3), dtype=np.uint8) # print("{}, {}".format(img_path, img.shape)) # 图像过小 if img.shape[0] == 0 or img.shape[1] == 0: img = np.zeros(shape=(8, 8, 3), dtype=np.uint8) # 长宽比异常 if img.shape[0] / img.shape[1] > 5: img = img[:img.shape[1], :img.shape[1], :] if img.shape[1] / img.shape[0] > 5: img = img[:img.shape[0], :img.shape[0], :] # 图片过大 if 
img.shape[0] > max_width_height or img.shape[1] > max_width_height: # 按长边，等比缩放至max_width_height ratio = min(img.shape[0] / img.shape[1], img.shape[1] / img.shape[0]) if img.shape[0] >= img.shape[1]: new_width = int(ratio * max_width_height) if new_width <= 1: img = np.zeros(shape=(8, 8, 3), dtype=np.uint8) else: img = cv2.resize(img, (new_width, max_width_height)) else: new_height = int(ratio * max_width_height) if new_height <= 1: img = np.zeros(shape=(8, 8, 3), dtype=np.uint8) else: img = cv2.resize(img, (max_width_height, new_height)) # 图像过小 if img.shape[0] <= 1 or img.shape[1] <= 1: img = np.zeros(shape=(8, 8, 3), dtype=np.uint8) # 内存不连续 if not img.flags['C_CONTIGUOUS']: img = np.ascontiguousarray(img, dtype=img.dtype) # 返回结果 return img ``` ### shutil demo ```python #coding: utf-8 import os import random import shutil # 原始数据 xty = "" # 通讯院图片地址 rmzk = "" # 自有图片地址 dst = "" # 输出图片地址 num = 500 # 遍历目录 xty_files = [] for root, dirs, fns in os.walk(xty): for fn in fns: xty_files.append(os.path.join(root, fn)) rmzk_files = [] for root, dirs, fns in os.walk(rmzk): for fn in fns: rmzk_files.append(os.path.join(root, fn)) # 随机数 xty_test_files = random.sample(xty_files, num) rmzk_test_files = random.sample(rmzk_files, num) # 拷贝 for p in xty_test_files: shutil.copy(p, dst) for p in rmzk_test_files: shutil.copy(p, dst) ``` ### numpy #### 计算相似度比对 A dot B , eg: A,1000×512 B,1×512 B·A(T) = 1×1000 ```python # 相似度比对 #例子1 # 主图特征 feat_dim = ivcpd.GetFeatureSize() query_feat_ptr_value = GetSharedPtrAddrValue(img_info_vec[0].feature_) query_feat_ctypes_ptr = ctypes.cast(query_feat_ptr_value, ctypes.POINTER(ctypes.c_float)) query_feat_npy = np.copy(np.ctypeslib.as_array(query_feat_ctypes_ptr, shape=(feat_dim, ))) # type: numpy.ndarray # 商品图特征 db_feats_np = np.ndarray(dtype=np.float32, shape=(len(img_info_vec) - 1, 256)) for db_img_idx in range(1, len(img_info_vec)): db_feat_ptr_value = GetSharedPtrAddrValue(img_info_vec[db_img_idx].feature_) db_feat_ctypes_ptr = 
ctypes.cast(db_feat_ptr_value, ctypes.POINTER(ctypes.c_float)) db_feats_np[db_img_idx - 1] = np.copy(np.ctypeslib.as_array(db_feat_ctypes_ptr, shape=(feat_dim, ))) # 计算相似度 sims = np.dot(query_feat_npy, db_feats_np.T) sims_indexes = [] for idx, sim in enumerate(sims): sims_indexes.append((sim, idx)) sims_indexes = sorted(sims_indexes) sims_indexes.reverse() #例子2 import numpy as np #eg: random_a = np.random.random((1000,512)) random_b = np.random.random((2,512)) print(random_a.shape) print(random_b.shape) query_feat_npy = random_b db_feats_np = random_a # 计算相似度 sims = np.dot(query_feat_npy, db_feats_np.T) sims_indexes = [] for idx, sim in np.ndenumerate(sims): sims_indexes.append((sim, idx)) sims_indexes = sorted(sims_indexes) sims_indexes.reverse() print(sims_indexes) ``` ### python 线程 threading 注意： 1、如何获取线程返回结果，仅能通过参数，预定义好结果存储变量，在线程内赋值，从而间接拿到结果。 2、如果所需结果需要保顺序，可以预定义一个结构，实现定义好顺序，线程结束后再拼接所需顺序。 ```python import threading BATCH_SIZE = 100 save_dict = {} threads = [] code_list = [] for idx, i in enumerate(list(range(0, len(data_ids), BATCH_SIZE))): inner_b_datas = data_ids[i:i + BATCH_SIZE] save_dict[idx] = [] thread = threading.Thread(name='{idx}',target=Model.thread_milvusOps_query_by_info, args=(inner_b_datas, group_id, idx, save_dict, code_list)) thread.start() threads.append(thread) app_log.info(f" thread start, idx:{idx}") for i in range(len(threads)): threads[i].join() for j in range(len(threads)): b_feature_list += save_dict[j] for per_code in code_list: if per_code != 200: code = per_code else: code = per_code ``` ### Queue > 有queue的库众多： > > from ray.util.queue import Queue python的多进程/线程 Queue存取性能很低， 1. 减少队列元素大小。尤其不能将图片数据存入Queue中，最好是存路径等简单数据，避免不必要数据拷贝。 2. 
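Reduce the number of Queue reads. Batch items so that each read returns more data: for example, write 128 image paths into one JSON and enqueue that once instead of enqueueing 128 times. Please keep this in mind in future development.

### Python [seek() and tell() explained](http://c.biancheng.net/view/4780.html)

### Exiting from code: [os.\_exit()](https://www.jb51.net/article/116968.htm)

| Code | Effect |
| ---------- | ------------------------- |
| sys.exit() | Raises SystemExit and exits the current thread; with a single process and a single thread, the program exits completely |
| os.\_exit() | Exits the process; the program terminates completely |

## Third-party libraries

### Databases

#### Installation

Install the commonly used database packages:

* pip3 install pymilvus
* pip3 install mysqlclient ([fixing OSError: mysql\_config not found](https://blog.csdn.net/zy_whynot/article/details/106960087))
* pip3 install dbutils

#### Usage

#### mysql

> import MySQLdb
>
> from dbutils.pooled\_db import PooledDB

MySQLdb installation problems (centos7):

> yum install mysql-devel gcc ( gcc-devel python-devel )
>
> pip3 install mysqlclient

#### Postgre

Connecting to a database installed on the server (screenshot):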

First connect to the server over ssh.

#### [redis](https://www.runoob.com/redis/redis-data-types.html)

[Blog post on Redis features](https://mp.weixin.qq.com/s/EmIhZaXrrTiUsoh1bs5VJw) [Common Redis questions](https://mp.weixin.qq.com/s/o2zmKMd9xwEQnp081otZyw) [Python usage tutorial](https://www.runoob.com/w3cnote/python-redis-intro.html) [Start at boot, add as a service](https://www.cnblogs.com/yunqing/p/10605934%20.html) [Check the service](https://www.cnblogs.com/kevin-yang123/p/9946808.html)

Connecting to a redis installed locally on the server (screenshot):

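Connect to the server over ssh first, then enter redis from there. Command line: [redis-cli](https://www.runoob.com/redis/redis-commands.html)

List the databases that have keys defined: INFO keyspace

At most 16 databases, numbered 0\~15: CONFIG GET databases

```python
import redis

# Named Redis1 to match its use below; the original snippet assigned the client
# to `redis`, which shadowed the module.
Redis1 = redis.StrictRedis(host='localhost', port=6379, db='0', password='Rmzk@1234', decode_responses=True)

# 和行行 demo
def run(self):
    '''
    Consume the data stored in redis
    :return
    '''
    while 1:
        # spop returns None once the set is empty, so drop the Nones
        data_raw = [d for d in (Redis1.spop("RMZK:CRAWLER:XML:data") for i in range(100)) if d is not None]
        if not data_raw:
            continue
        pool = threadpool.ThreadPool(1)
        requests = threadpool.makeRequests(self.paring, data_raw)
        [pool.putRequest(req) for req in requests]
        pool.wait()
# For spop usage, see the runoob tutorial
```

Folders within a db: this is simple. When storing data, separate the parts of the key name with colons, in the form namespace:key, for example vehicle:car1 and vehicle:car2. From what I found, this effectively splits the key into several levels, and it is indeed called a namespace.

Newcomer: [**ClickHouse**](https://mp.weixin.qq.com/s/HClINM69QxGZm8aWhhMXEA)

#### [kafka](https://blog.csdn.net/luanpeng825485697/article/details/81036028)

(Should our standardized service consider replacing kafka with Pulsar?)

### time library

[delta](https://www.baidu.com/s?ie=UTF-8\&wd=delta) - Δ, a time interval

[The datetime.timedelta class](https://blog.csdn.net/sunjinjuan/article/details/79113120) [Conversions](https://www.cnblogs.com/alfred0311/p/7885349.html)

### Numpy

#### [Converting data types in numpy: do not change the dtype of the original data directly! Use the astype() function.](https://www.cnblogs.com/hhh5460/p/5129032.html)

numpy in Python defaults to 8-byte floats (float64), while a C++ `float` is 4 bytes (float32), which leads to differing values.

### Pandas

[5 ways to create a DataFrame](https://blog.csdn.net/u010199356/article/details/85697860)

Converting between json and DataFrame: [link](https://blog.csdn.net/qq_41780234/article/details/84990551)

### Gevent

gevent is a coroutine-based Python networking library. [A concise Gevent tutorial](https://www.jianshu.com/p/4dca99ffc0b4)

### [asyncio](https://docs.python.org/zh-cn/3/library/asyncio.html) coroutines

> Purpose: within one thread, while a coroutine is waiting on I/O, the thread can use the waiting time to do other work.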
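
A minimal sketch of that idea, using only the standard `asyncio` API. The coroutine name `fake_io` and the 0.2-second delays are assumptions standing in for real I/O waits.

```python
import asyncio
import time

async def fake_io(label, delay):
    """Stand-in for an I/O call: the thread is free while this sleeps."""
    await asyncio.sleep(delay)
    return label

async def main():
    # The three waits overlap inside one thread instead of running back to back.
    return await asyncio.gather(
        fake_io("a", 0.2),
        fake_io("b", 0.2),
        fake_io("c", 0.2),
    )

start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start
print(results, round(elapsed, 2))
```

Run sequentially, the three waits would take about 0.6 seconds; because each `await` hands control back to the event loop, the total stays close to a single wait.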

Processes, threads, coroutines, async: a comparison

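#### Processes, threads, coroutines, async

There are currently four approaches to concurrent (not parallel) programming: multiprocessing, multithreading, coroutines, and asynchronous I/O.

* Multiprocessing: Python has os.fork, similar to C, and the higher-level multiprocessing standard library
* Multithreading: Python has thread (`_thread` in Python 3) and threading
* Asynchronous programming: on Linux there are three main implementations: select, poll, and epoll
* Coroutines: in Python this usually means yield; coroutine libraries include greenlet, stackless, gevent, eventlet, and others.

**Processes**

* Share no state
* Scheduled by the operating system
* Have their own memory space (a context switch has to save the stack, CPU registers, virtual memory, open handles, and so on, which is expensive)
* Communicate mainly by passing signals (there are several mechanisms, such as semaphores, pipes, and events; all of them go through the kernel, so efficiency is low)

**Threads**

* Share variables (this removes the communication hassle, but access to the variables needs locking)
* Scheduled by the operating system (because memory is shared, context switches are efficient)
* A process can have multiple threads, each sharing the parent process's resources (creating a thread costs far less than creating a process, and many more can be created)
* Besides the inter-process mechanisms, threads can communicate through shared memory (much faster than going through the kernel)

**Coroutines**

* Scheduling is controlled entirely by the user
* A thread (process) can have multiple coroutines
* Each thread (process) loops through a given task list in order (when a task blocks, the next one runs; when it is ready again, execution returns to it; switching tasks only saves the task's context, with no kernel overhead, and global variables can be accessed without locks)
* Coroutines must be non-blocking and must not depend on each other
* Coroutines generally cannot communicate synchronously; they mostly use asynchronous message passing, which is efficient

**Summary**

* A process has its own heap and stack and shares neither; processes are scheduled by the operating system
* A thread has its own stack and a shared heap: the heap is shared, the stack is not; threads are also scheduled by the operating system (for standard threads)
* A coroutine, like a thread, shares the heap but not the stack; coroutines are scheduled explicitly by the programmer in the coroutine code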
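
The thread bullet about shared variables needing locks can be illustrated with a short sketch. The names `state` and `add_many` and the thread and iteration counts are assumptions chosen for the demo.

```python
import threading

def add_many(state, lock, times):
    """Increment the shared counter, taking the lock around each update."""
    for _ in range(times):
        with lock:                      # protects the read-modify-write
            state["counter"] += 1

state = {"counter": 0}                  # shared by all threads (same heap)
lock = threading.Lock()

threads = [
    threading.Thread(target=add_many, args=(state, lock, 10000))
    for _ in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(state["counter"])
```

All four threads update the same dictionary without any message passing, which is the convenience of shared memory; the lock is the price paid for it. Coroutines in one thread would not need the lock, since only one runs between suspension points.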

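Python [async/await](https://blog.csdn.net/qq_43380180/article/details/111573642)

> I want to buy 50 potatoes, taking one at a time from the shelf and putting it in the basket. When the shelf runs short, all I can do is wait, and in the example above no amount of waiting produces a result (because everything **is synchronous**). This could perhaps be **solved with multiple processes or threads, but real life looks more like this**: once the shelf is empty, I can **ask** the supermarket for more potatoes and then wait a while until the producer finishes producing. When the producer is done and returns, execution continues from **where the await suspended it** and the consuming completes. This whole process is the flow of iterating an asynchronous generator.
>
> [async/await](https://www.cnblogs.com/tashanzhishi/p/10774515.html)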

Potato-buying code

```python
import time
import random
import asyncio


class Potato:
    @classmethod
    def make(cls, num, *args, **kws):
        print(cls)
        print(num)
        potatos = []
        for i in range(num):
            potatos.append(cls.__new__(cls, *args, **kws))
        return potatos


# Synchronous version: spins forever once the shelf is empty
def take_potatos(num):
    count = 0
    while True:
        if len(all_potatos) == 0:
            time.sleep(.1)
            print(time.time())
        else:
            potato = all_potatos.pop()
            yield potato
            count += 1
            if count == num:
                break


# Producer: restock the shelf after a random delay
async def ask_for_potato():
    await asyncio.sleep(random.random())
    all_potatos.extend(Potato.make(random.randint(1, 10)))


# Asynchronous generator: awaits a restock when the shelf is empty
async def take_potatos2(num):
    count = 0
    while True:
        if len(all_potatos) == 0:
            await ask_for_potato()
        potato = all_potatos.pop()
        yield potato
        count += 1
        if count == num:
            break


def buy_potatos():
    bucket = []
    for p in take_potatos(50):
        bucket.append(p)


async def buy_potatos2():
    bucket = []
    async for p in take_potatos2(50):
        bucket.append(p)
        print(f'Got potato {id(p)}...')


all_potatos = Potato.make(5)
# buy_potatos()   # synchronous version, never finishes


def main():
    # asyncio.run creates the event loop, runs the coroutine, and closes the loop
    asyncio.run(buy_potatos2())


main()
```

### celery

[Introduction to Celery and basic usage](https://zhuanlan.zhihu.com/p/64595171) [Using celery with django](https://www.celerycn.io/fu-lu/django)

Code example

[Celery, a handy task scheduler](https://www.liaoxuefeng.com/article/903701468278784)

```python
# tasks.py
import time
from celery import Celery

celery = Celery('tasks',
                broker='redis://localhost:6379/8',
                backend='redis://localhost:6379/9',
                )

@celery.task
def sendmail(mail):
    print('sending mail to %s...' % mail['to'])
    # time.sleep(2.0)
    print('mail sent.')
```

celery -A tasks worker --loglevel=info (starts the worker)

```python
from tasks import sendmail
from celery.result import AsyncResult

result = sendmail.delay(dict(to='celery@python.org'))
task_id = result.id
print(result.id)

result = AsyncResult(task_id)
if result.ready():
    print(result.result)
else:
    print("Task not finished yet")
```

### ffmpeg

Install: pip3 install ffmpeg-python

```python
import os
import ffmpeg

# root and path_part are defined elsewhere in the project
video_path = os.path.join(root, path_part)
info = ffmpeg.probe(video_path)
time_dura = info['format']['duration']
print('time_dura', time_dura)
```

### pickle --- Python object serialization

[Official docs](https://docs.python.org/zh-cn/3/library/pickle.html) [Blog](https://www.cnblogs.com/baby-lily/p/10990026.html)

### MoviePy: [automating video editing with Python](https://www.jb51.net/article/202423.htm)

[Converting video to GIF with Python](https://zhuanlan.zhihu.com/p/416844100) [Tutorial](https://blog.csdn.net/ucsheep/article/details/80999939)

### tornado

Distinguishing how Tornado obtains parameters: header parameters, form parameters, body parameters

```python
class CdfSearch(baseAPI):
    def post(self):
        # try:
        argument_data = self.request.body
        # self.get_body_argument()
        # self.get_argument("data", [])
        # self.request.body()
```

### django

[Django documentation contents](https://docs.djangoproject.com/zh-hans/4.1/contents/) [Understanding WSGI and ASGI in one article](https://blog.csdn.net/p515659704/article/details/110411508)

nginx dispatch example

子nginx配置 ```nginx server{ client_max_body_size 100M; client_body_buffer_size 20M; proxy_connect_timeout 1m; server_tokens off; listen 443 ssl; server_name backend; index index.html index.htm; ssl_certificate /etc/nginx/cert/server.crt; ssl_certificate_key /etc/nginx/cert/server.key; ssl_session_timeout 5m; ssl_ciphers ECDHE-RSA-AES128-GCM-SHA256:ECDHE:ECDH:AES:HIGH:!NULL:!aNULL:!MD5:!ADH:!RC4; ssl_protocols TLSv1 TLSv1.1 TLSv1.2; ssl_prefer_server_ciphers on; location / { if ($request_uri = "/") { return 301 "/index"; } try_files $uri $uri/ @router; root /mnt/bolean/front; index index.html; } location /v2/ { include uwsgi_params; uwsgi_pass 127.0.0.1:10001; } location /ws/ { proxy_pass http://127.0.0.1:10002; proxy_http_version 1.1; proxy_set_header Upgrade $http_upgrade; proxy_set_header Connection "upgrade"; proxy_redirect off; proxy_set_header Host $host; proxy_set_header X-Real-IP $remote_addr; proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; proxy_set_header X-Forwarded-Host $server_name; } location @router { rewrite ^.*$ /index.html last; } } ``` 分发两个端口supevisor子配置 ``` // asgi.conf [fcgi-program:asgi] # TCP socket used by Nginx backend upstream socket=tcp://127.0.0.1:10002 # Directory where your site's project files are located directory=/mnt/bolean/sdp-backend # Each process needs to have a separate socket file, so we use process_num # Make sure to update "mysite.asgi" to match your project name command=/mnt/bolean/venv/bin/daphne -u /run/daphne%(process_num)d.sock --fd 0 --access-log - --proxy-headers backend.asgi:application # Number of processes to startup, roughly the number of CPUs you have numprocs=1 # Give each process a unique name so they can be told apart process_name=asgi%(process_num)d # Automatically start and recover processes autostart=true autorestart=true # Choose where you want your log to go stdout_logfile=/home/bolean/logs/asgi.log redirect_stderr=true ``` ``` // backend.conf [program:backend] command=sh -c "(while ! 
pg_isready>/dev/null; do sleep 1; echo waiting pg; done) && /mnt/bolean/venv/bin/uwsgi --socket 127.0.0.1:10001 --master --die-on-term --process 4 --threads 2 --wsgi-file backend/wsgi.py --buffer-size 30000" stopasgroup=true killasgroup=true stopwaitsecs=30 user=bolean ; User to run as directory=/mnt/bolean/sdp-backend stdout_logfile=/home/bolean/logs/backend.log ; Where to write log messages autostart=true autorestart=true redirect_stderr=true ; Save stderr in the same log environment=LANG=en_US.UTF-8,LC_ALL=en_US.UTF-8,HOME="/home/bolean",USER="bolean" ; priority=10 ```

### pyinstaller: a library for packaging into an exe

[Blog](https://www.cnblogs.com/Wl55387370/p/13939881.html)

> Spares coding novices the environment setup.

### from multiprocessing import Lock, Process

### Ray: [a distributed multiprocessing framework for Python](https://blog.csdn.net/luanpeng825485697/article/details/88242020)

[Official tutorial](https://docs.ray.io/en/latest/ray-core/walkthrough.html)

Ray, a distributed execution framework: [basic usage](https://blog.csdn.net/weixin_43255962/article/details/84453520)

ray.wait(): waiting on a batch of tasks. [ray.wait code examples](https://vimsky.com/examples/detail/python-method-ray.wait.html)

纯净天空 - [tech blog](https://vimsky.com/examples/)

## Python syntax

### [Using del in Python](https://blog.csdn.net/windscloud/article/details/79732014)

When Python code leaks (keeps growing) memory, the cause may be that the garbage collector has not reclaimed variables in time; you can reset them explicitly, e.g. video\_frames = \[ ]

```python
import ctypes
import numpy as np

# GenSharedPtrFromPtrValue, GetSharedPtrAddrValue and frame_info come from the project SDK
random_data = np.random.random((1, 256))[0]
_range = np.max(random_data) - np.min(random_data)
random_data_insert = (random_data - np.min(random_data)) / _range  # min-max normalize to [0, 1]
random_data_insert = random_data_insert.astype("float32")
GenSharedPtrFromPtrValue(random_data_insert.ctypes.data, 256, frame_info.feature_)
print(random_data_insert.ctypes.data)
print(frame_info.feature_)

feat = GetSharedPtrAddrValue(frame_info.feature_)
feat_buffer = ctypes.cast(feat, ctypes.POINTER(ctypes.c_float))
feat_arr2 = np.copy(np.ctypeslib.as_array(feat_buffer, shape=(256, )))
print("before search :{}".format(feat_arr2))
```

[`ctypes`](https://docs.python.org/zh-cn/3.9/library/ctypes.html#module-ctypes) is Python's foreign function library. It provides C-compatible data types and allows calling functions in DLLs or shared libraries. It can be used to wrap these libraries in pure Python.

## Ways to install packages

* pip install
* python3 -m pip install package-name (use this when pip install fails)
* Third-party build: [python setup.py build\_ext --inplace](https://blog.csdn.net/xiqi4145/article/details/114216162)

Exporting dependencies

> [What the requirements.txt file is for in Python](https://www.jianshu.com/p/ee7a1bcf0937)
>
> [pipreqs (finds the packages the current project depends on)](https://www.cnblogs.com/believepd/p/10423094.html)

## Setting environment variables: .so linking problems when running the rmzk sdk

> export LD\_LIBRARY\_PATH=../lib:$LD\_LIBRARY\_PATH
>
> ( export PYTHONPATH=$PYTHONPATH (needed to import other self-written packages) )

## conda

[Python] [Three ways to create a Python virtual environment](https://blog.csdn.net/ARPOSPF/article/details/113616988)

* [Environment installation](https://zhuanlan.zhihu.com/p/336429888)
* conda [changing mirrors 1](https://blog.csdn.net/qq_42951560/article/details/109152114) [changing mirrors 2](https://mirror.tuna.tsinghua.edu.cn/help/anaconda/)
* [Adding your own environment variables on ubuntu](https://blog.csdn.net/weixin_43506858/article/details/91492363)
* conda basics (after `source ~/.bashrc` or a user login, the `(base)` environment is entered automatically)
  * Activate the anaconda environment: source activate
  * Leave the anaconda environment: source deactivate
  * Create a conda environment: conda create -n yolov5-env python=3.8
  * Activate a conda environment: conda activate yolov5-env
  * Leave a conda environment: conda deactivate
* Common commands
  * List existing conda environments: conda info -e

## Cython

### What is Cython?

Cython is a programming language for writing C extensions in a Python-like syntax that can then be called from Python. It keeps Python's fast development style, lets code run about as fast as C, and makes it easy to call C libraries.

[Cython 3.0 documentation (Chinese)](https://cython.apachecn.org/#/docs/3) [Basic Cython usage](https://zhuanlan.zhihu.com/p/24311879) [Cython beginner tutorial](https://www.jianshu.com/p/cfcc2c04a6f5)