Sekyoro的博客小屋 (Sekyoro's Blog)

Learning Hugging Face NLP

Original text in Chinese, about 5,800 characters; roughly a 14-minute read.

Summary

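This article introduces techniques commonly used in deep learning, including the Hugging Face libraries and Transformer models, along with the task arguments accepted by the pipeline function. It also covers how embeddings are generated.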
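
As a rough illustration of what a pipeline does, the sketch below chains a preprocessing step, a model step, and a postprocessing step into one callable. It is a toy built on assumptions: TOY_WEIGHTS, preprocess, model, postprocess, and toy_pipeline are hypothetical names invented here, and the word-weight "model" is a stand-in, not a Transformer and not the transformers API.

```python
from typing import Dict, List

# Hypothetical toy weights (word -> sentiment score), invented for illustration.
TOY_WEIGHTS: Dict[str, float] = {
    "love": 2.0,
    "great": 1.5,
    "hate": -2.0,
    "boring": -1.5,
}


def preprocess(text: str) -> List[str]:
    """Stand-in for tokenization: lowercase, drop simple punctuation, split on spaces."""
    for mark in ".,!?":
        text = text.replace(mark, " ")
    return text.lower().split()


def model(tokens: List[str]) -> float:
    """Stand-in for the model forward pass: sum the toy weights into one raw score."""
    return sum(TOY_WEIGHTS.get(token, 0.0) for token in tokens)


def postprocess(score: float) -> Dict[str, object]:
    """Turn the raw score into a readable label plus the score itself."""
    label = "POSITIVE" if score >= 0 else "NEGATIVE"
    return {"label": label, "score": score}


def toy_pipeline(text: str) -> Dict[str, object]:
    """Chain the three stages so the caller only ever handles raw text."""
    return postprocess(model(preprocess(text)))


if __name__ == "__main__":
    print(toy_pipeline("I love this course!"))
    print(toy_pipeline("I hate this boring movie."))
```

In the transformers library itself, the comparable entry point is pipeline("sentiment-analysis"), imported from transformers, which returns a list of dictionaries with label and score keys. The toy above only mirrors that overall shape.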

Key Points

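Techniques commonly used in deep learning include the Hugging Face libraries and Transformer models.
The transformers library handles a range of NLP tasks; its most basic object is the pipeline function.
The pipeline function connects a model with its preprocessing and postprocessing steps, so you can pass in raw text and get an answer directly.
The pipeline function accepts many task arguments, such as sentiment analysis, named entity recognition, and text generation.
Zero-shot classification classifies unlabeled text against candidate labels that you specify directly.
Text generation completes a prompt by generating the remaining text, much like predictive text on a phone.
Named entity recognition asks the model to identify entities in the text, such as people and places.
Question answering answers a question using information from the supplied context.
Summarization condenses a long text into its key information.
Translation converts text from one language into another.
Transformer models need a tokenizer to convert text into a numeric form the model can process.
An embedding represents meaning as numbers; a common approach is to compute a context-aware representation of each word.
Multimodal embedding maps images and text into the same space; the CLIP model is one example.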
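
The tokenizer and embedding points above can be made concrete with a small sketch: text is mapped to integer IDs, each ID is looked up as a vector, the vectors are averaged into one sentence embedding, and two embeddings are compared with cosine similarity. Everything here is an assumption for illustration: VOCAB, VECTORS, tokenize, embed, and cosine are hypothetical names, the 3-dimensional vectors are invented, and the vectors are fixed per word rather than context-aware as a real Transformer's would be.

```python
import math
from typing import Dict, List

# Hypothetical toy vocabulary and vectors, invented for illustration only.
VOCAB: Dict[str, int] = {"[UNK]": 0, "cat": 1, "dog": 2, "car": 3, "a": 4}
VECTORS: List[List[float]] = [
    [0.0, 0.0, 0.0],  # [UNK]
    [0.9, 0.1, 0.0],  # cat
    [0.8, 0.2, 0.0],  # dog
    [0.0, 0.1, 0.9],  # car
    [0.1, 0.1, 0.1],  # a
]


def tokenize(text: str) -> List[int]:
    """Map each whitespace-separated word to its ID; unknown words map to [UNK]."""
    return [VOCAB.get(word, VOCAB["[UNK]"]) for word in text.lower().split()]


def embed(text: str) -> List[float]:
    """Average the per-token vectors into a single sentence vector (mean pooling)."""
    ids = tokenize(text)
    if not ids:
        raise ValueError("cannot embed empty text")
    dim = len(VECTORS[0])
    return [sum(VECTORS[i][d] for i in ids) / len(ids) for d in range(dim)]


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norm


if __name__ == "__main__":
    print(tokenize("a cat"))
    print(cosine(embed("a cat"), embed("a dog")))
    print(cosine(embed("a cat"), embed("a car")))
```

With these invented vectors, "a cat" lands closer to "a dog" than to "a car". The same comparison is what a shared image-text space such as CLIP's makes possible across modalities, although this sketch handles text only.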

Tags

embedding, Hugging Face, pipeline function, Transformer models, deep learning
